Earlier this year, Google introduced Veo 3, its latest video generation model, with greater realism and fidelity. Compared to Veo 2, Veo 3 offered improved prompt adherence and new levels of control and consistency when generating videos.
Today, Google announced the Veo 3.1 and Veo 3.1 Fast models with several improvements. Google claims that the new models generate better native audio, including natural conversations and synchronized sound effects. Developers can also take advantage of the models' improved understanding of cinematic styles. Google further claims that the models maintain better character consistency across multiple scenes.
Google is also introducing new ways for developers to guide the Veo 3.1 models. First, developers can provide up to three reference images of a character, object, or scene. This helps the model keep that subject consistent across multiple shots.
Second, the new scene extension feature allows developers to create longer videos by generating new clips that connect to the previous video. Previously, developers were only able to create 30-second videos, and keeping characters consistent across two separate 30-second videos was a difficult task. With this new feature, each new clip is generated based on the final second of the previous clip to maintain visual continuity.
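The chaining idea behind scene extension can be illustrated with a small sketch. It is not the Gemini API: `Clip`, `extend_video`, and `fake_generate_clip` are hypothetical names, the 10-second clip length is an arbitrary placeholder, and the only detail taken from the announcement is that each new clip is seeded by the final second of what came before.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# From the announcement: each extension builds on the final second of the previous clip.
SEED_SECONDS = 1.0

# (start, end) in seconds on the running timeline of the combined video.
Window = Tuple[float, float]


@dataclass
class Clip:
    prompt: str
    duration_s: float
    seed_window: Optional[Window]  # None for the first clip, which has no predecessor


def extend_video(
    prompts: List[str],
    generate_clip: Callable[[str, Optional[Window]], float],
) -> List[Clip]:
    """Chain clips so that each one is seeded by the tail of the video so far.

    `generate_clip` is a hypothetical stand-in for a real video-generation call.
    It receives the prompt and the seed window, and returns the new clip's duration.
    """
    clips: List[Clip] = []
    total = 0.0
    for prompt in prompts:
        # The first clip has nothing to continue from; later clips use the last second.
        seed = None if not clips else (max(0.0, total - SEED_SECONDS), total)
        duration = generate_clip(prompt, seed)
        clips.append(Clip(prompt, duration, seed))
        total += duration
    return clips


def fake_generate_clip(prompt: str, seed_window: Optional[Window]) -> float:
    """Placeholder generator: pretends every clip is 10 seconds long."""
    return 10.0


clips = extend_video(["opening shot", "chase", "ending"], fake_generate_clip)
```

In this sketch the second clip is seeded by the window from 9 to 10 seconds and the third by the window from 19 to 20 seconds, which is what lets a sequence of short generations read as one continuous video.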
Finally, developers can now provide a starting and an ending image and ask Veo 3.1 to generate the transition between them along with audio.
Both the Veo 3.1 and Veo 3.1 Fast models are now available to developers via the Gemini API in Google AI Studio and Vertex AI. Even with these improved capabilities, Veo 3.1 will cost developers the same as Veo 3. Developers can learn more about these new Veo 3.1 models here. Consumers can access Veo 3.1 through both the Gemini app and Flow.