Xianglong He*
Chunli Peng*
Zexiang Liu*
Boyang Wang*†
Yifan Zhang
Qi Cui
Fei Kang
Biao Jiang
Mengyin An
Yangyang Ren
Baixin Xu
Hao-Xiang Guo
Kaixiong Gong
Xuchen Song
Yang Liu‡
Eric Li‡
Yahui Zhou
Skywork AI
The foundation model is derived from WanX. With the text branch removed and action modules added, the model predicts the next frames solely from visual content and the corresponding actions.
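The interface described above (visual history plus actions in, next frame out, no text prompt) can be pictured with a toy rollout loop. This is only an illustrative sketch, not the released implementation: `toy_predictor`, `rollout`, and `context_len` are hypothetical names, frames are reduced to short lists of floats, and the placeholder predictor stands in for the learned network.

```python
from typing import Callable, List, Sequence

Frame = List[float]        # toy stand-in for an image or latent frame
Action = Sequence[float]   # toy stand-in for a keyboard/mouse signal


def toy_predictor(context: List[Frame], action: Action) -> Frame:
    """Hypothetical placeholder for the learned model: shifts the most
    recent frame by the sum of the action vector."""
    shift = sum(action)
    return [value + shift for value in context[-1]]


def rollout(first_frame: Frame,
            actions: List[Action],
            predict: Callable[[List[Frame], Action], Frame],
            context_len: int = 4) -> List[Frame]:
    """Generate one frame per action, feeding each prediction back in
    as part of the visual context for the next step."""
    frames = [first_frame]
    for action in actions:
        # Conditioning uses only recent frames and the current action;
        # there is no text input anywhere in the loop.
        context = frames[-context_len:]
        frames.append(predict(context, action))
    return frames


frames = rollout([0.0, 0.0],
                 [[1.0, 0.0], [0.0, 0.5], [0.0, 0.0]],
                 toy_predictor)
print(len(frames))   # 4: the initial frame plus one frame per action
print(frames[-1])    # [1.5, 1.5]
```

The bounded context window (`context_len`) is an assumption made for the sketch; it simply shows how a rollout can continue for arbitrarily many steps while the per-step input stays a fixed size.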
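On the GameWorld Score benchmark in Minecraft scenes, Matrix-Game 2.0 outperforms Oasis on six of the eight metrics, with the largest gains in mouse accuracy (0.95 vs. 0.56) and object consistency (0.64 vs. 0.18). Oasis scores marginally higher on motion smoothness and scenario consistency.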
Model | Image Quality ↑ | Aesthetic Quality ↑ | Temporal Cons. ↑ | Motion Smooth. ↑ | Keyboard Acc. ↑ | Mouse Acc. ↑ | Object Cons. ↑ | Scenario Cons. ↑ |
---|---|---|---|---|---|---|---|---|
Oasis | 0.27 | 0.27 | 0.82 | 0.99 | 0.73 | 0.56 | 0.18 | 0.84 |
Ours | 0.61 | 0.50 | 0.94 | 0.98 | 0.91 | 0.95 | 0.64 | 0.80 |
Matrix-Game 2.0 demonstrates strong generative capabilities across diverse scene styles, featuring varying visual aesthetics and terrains.
Matrix-Game 2.0 can generate precisely controlled videos in GTA scenarios, while also showing the capability to model scene dynamics.
Matrix-Game 2.0 demonstrates strong auto-regressive generation capabilities for producing long videos.
Matrix-Game 2.0 demonstrates strong generative capabilities in Minecraft scenes, adapting to diverse visual styles and terrains.
Matrix-Game 2.0 can also be applied to generate interactive videos in TempleRun scenes.
We would like to express our gratitude to:
We are grateful to the broader research community for their open exploration and contributions to the field of interactive world generation.