It looks like OpenAI has finally caught up to Google’s Veo 3 with the introduction of Sora 2, its most advanced video and audio generation model to date.
The second iteration of the model comes just 10 months after the original Sora release. OpenAI describes the original Sora as a “GPT-1 moment for video” and says Sora 2 represents the “GPT-3.5 moment” in video generation. Sora 2 is capable of producing sophisticated scenes that obey the laws of physics in ways previous models could not.
OpenAI says that Sora 2 models physical dynamics more realistically than prior video generation systems, which often distorted reality, for example by teleporting a basketball into the hoop after a missed shot. The model can generate intricate movements like Olympic gymnastics routines, paddleboarding backflips that respect buoyancy and rigidity, and even complex triple axels performed by virtual characters, all while maintaining synchronized dialogue and sound effects. The model’s ability to depict failure accurately, not just success, is a key advancement toward realistic world simulation.
One of the key areas where Sora 2 excels is controllability. The model is capable of following detailed multi-shot instructions and maintaining consistent world states across video sequences. It also supports multiple visual styles, including realistic, cinematic, and anime aesthetics, enhancing versatility for creators.
Perhaps one of the most innovative features of Sora 2 is the ability to inject real-world elements into generated videos via “cameos.” By uploading a short video and audio sample, users can insert faithful digital representations of themselves, friends, or objects into any Sora-generated environment.
OpenAI has also launched a new Sora app on iOS that will allow users to create and remix each other’s videos and include each other’s cameos.
Sora 2’s iOS app will feature a video feed of AI-generated content. OpenAI says that it is aware of concerns about doomscrolling, addiction, isolation, and RL-sloptimized feeds, and added that the app prioritizes content creation over passive consumption. The company has also outlined its safety measures and its approach to the Sora feed in a separate post.
The company says that the app uses AI-driven recommender algorithms that can be controlled through natural language instructions, and that it includes features like well-being check-ins and adjustable feed algorithms. Teen users face default limits on content consumption, while parents can use ChatGPT-powered controls to manage settings for younger users.
The Sora app is available to download on iOS today; users can sign up for an invite and will be notified once they get access. Once they have access, users can also use Sora 2 on the Sora website, free of charge during the early rollout stages. ChatGPT Pro users additionally get a higher-quality Sora 2 Pro model along with the pre-existing Sora 1 Turbo model. OpenAI says that the model will also be accessible via API soon.