WorldMem enables long-term consistent world simulation with a memory mechanism. We can control agents to explore diverse and consistent worlds with an expansive action space, crafting environments by placing objects such as a pumpkin light or roaming freely. Most importantly, after exploring for a while and glancing back, we find that the objects we placed are still there, and the light has melted the surrounding snow, indicating the passage of time!
[Videos: Initial View and Revisited View, three examples]
By conditioning on the memory bank, our framework accurately generates diverse and dynamic worlds that remain consistent with past states.
We can interact with the world by placing hay in the desert or planting wheat in the plains. Meanwhile, these changes are recorded and reproduced in revisited views. Over time, we can observe transformations such as wheat growing.
[Videos: Initial View and Revisited View, two examples]
Without timestamps as a condition, the model struggles to distinguish between memory units representing the same location at different time points, leading to incorrect generations. In contrast, with time conditioning, the model effectively aligns with the updated world state, ensuring consistent outputs.
[Videos: Initial View and Revisited View, two examples]
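The role of the timestamp can be illustrated with a toy memory bank. This is a minimal sketch, not the WorldMem implementation: the `MemoryUnit` fields, the `retrieve` function, and the rule of preferring the most recent unit at a location are all assumptions made for illustration. The page describes the model as conditioned on the memory bank, whereas this sketch replaces that with an explicit lookup so the ambiguity is easy to see.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MemoryUnit:
    """Hypothetical memory unit: what was seen, where, and when."""
    position: Tuple[int, int]  # location the view was recorded at
    timestamp: int             # step index at which it was recorded
    content: str               # stand-in for the stored frame


def retrieve(bank: List[MemoryUnit], position: Tuple[int, int],
             use_time: bool) -> Optional[MemoryUnit]:
    """Return a memory unit recorded at `position`, or None if there is none.

    Without time information, units at the same location cannot be told
    apart, so the first match is returned. With time information, the most
    recent unit is preferred.
    """
    matches = [u for u in bank if u.position == position]
    if not matches:
        return None
    if use_time:
        return max(matches, key=lambda u: u.timestamp)
    return matches[0]


# Toy bank: the same location (3, 5) is observed before and after planting.
bank = [
    MemoryUnit(position=(3, 5), timestamp=10, content="bare soil"),
    MemoryUnit(position=(8, 1), timestamp=25, content="desert with hay"),
    MemoryUnit(position=(3, 5), timestamp=40, content="planted wheat"),
]

stale = retrieve(bank, (3, 5), use_time=False)
fresh = retrieve(bank, (3, 5), use_time=True)
print(stale.content)  # bare soil
print(fresh.content)  # planted wheat
```

In this toy setting, ignoring time returns the outdated "bare soil" view, while using the timestamp returns the updated state of the same location.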
In the 360-degree consistency test, our approach with memory successfully returns to the original location without losing previously generated details.
[Videos: Initial View and Revisited View, two examples]
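One way such a test could be scored is sketched below. It is an illustration under stated assumptions, not the evaluation protocol used by the authors: the 15-degree step size, the `final_yaw` helper, and the choice of PSNR as the similarity measure are all hypothetical, and frames are represented as flat lists of pixel values in [0, 1].

```python
import math
from typing import List


def psnr(a: List[float], b: List[float], peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio between two equally sized frames."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and the same size")
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)


def final_yaw(start_deg: float, turns_deg: List[float]) -> float:
    """Apply a sequence of yaw turns and return the heading in [0, 360)."""
    return (start_deg + sum(turns_deg)) % 360.0


# 24 turns of 15 degrees make one full rotation (hypothetical step size).
turns = [15.0] * 24
assert final_yaw(90.0, turns) == 90.0  # the agent faces the start heading again

initial = [0.2, 0.4, 0.6, 0.8]    # toy pixel values for the initial view
revisited = [0.2, 0.4, 0.6, 0.7]  # revisited view with one drifted pixel

print(psnr(initial, initial))              # inf
print(round(psnr(initial, revisited), 2))  # 26.02
```

A higher score between the initial and revisited frames indicates that more of the originally generated detail survived the full rotation.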
Our method generates consistent results along customized trajectories.
[Videos: Initial View and Revisited View, two examples]
@misc{xiao2025worldmemlongtermconsistentworld,
title={WORLDMEM: Long-term Consistent World Simulation with Memory},
author={Zeqi Xiao and Yushi Lan and Yifan Zhou and Wenqi Ouyang and Shuai Yang and Yanhong Zeng and Xingang Pan},
year={2025},
eprint={2504.12369},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2504.12369},
}