Genie can generate playable worlds from images, photographs, and even sketches. It is trained on a large dataset of publicly available internet videos. From these videos, Genie learns fine-grained controls without needing any action labels. This allows Genie to be used with images it has never seen before. In the future, Genie could be used to train generalist AI agents.