
During the Cloud Next event last week, Google announced that the Gemini 2.5 Flash model would be arriving soon with major improvements. Today, Google announced the rollout of the Gemini 2.5 Flash preview in the Gemini API via Google AI Studio and Vertex AI. The new model is also available to Gemini users via the model picker and can be used with Canvas to refine documents and code.
Following in the footsteps of its predecessor, Gemini 2.0 Flash, Gemini 2.5 Flash brings significant improvements in reasoning capability without incurring high cost or latency. Google claims the new model has an excellent performance-to-cost ratio. The pricing details are below:
- $0.15 per 1 million input tokens
- $0.60 per 1 million output tokens without reasoning
- $3.50 per 1 million output tokens with reasoning
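To put those rates in perspective, here is a minimal back-of-the-envelope sketch in Python. The token counts are hypothetical, and the constants simply restate the preview pricing listed above.

```python
# Hypothetical cost estimate for a single Gemini 2.5 Flash request,
# using the preview pricing quoted above (illustrative numbers only).

INPUT_PRICE = 0.15 / 1_000_000                 # $ per input token
OUTPUT_PRICE_NO_REASONING = 0.60 / 1_000_000   # $ per output token, thinking off
OUTPUT_PRICE_REASONING = 3.50 / 1_000_000      # $ per output token, thinking on

def request_cost(input_tokens: int, output_tokens: int, reasoning: bool) -> float:
    """Estimate the dollar cost of one request at the preview rates."""
    output_price = OUTPUT_PRICE_REASONING if reasoning else OUTPUT_PRICE_NO_REASONING
    return input_tokens * INPUT_PRICE + output_tokens * output_price

# Example: 10,000 input tokens and 2,000 output tokens.
print(f"Thinking off: ${request_cost(10_000, 2_000, reasoning=False):.4f}")  # $0.0027
print(f"Thinking on:  ${request_cost(10_000, 2_000, reasoning=True):.4f}")   # $0.0085
```

Even with reasoning enabled, a request of that size comes in well under a cent at the quoted rates.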
This is an early version of 2.5 Flash, but it already shows huge gains over 2.0 Flash.
You can fully turn off thinking if needed and use this model as a drop in replacement for 2.0 Flash.
It’s available across the Gemini API, AI Studio, Vertex, and the Gemini app!
— Logan Kilpatrick (@OfficialLoganK) April 17, 2025
Gemini 2.5 Flash is the first fully hybrid reasoning model from Google, letting developers turn reasoning on or off. This is said to help developers balance response quality against cost and latency (a short API sketch at the end of this article illustrates the toggle). Check out the benchmarks for this new model below.

As shown in the table above, despite its low cost, Gemini 2.5 Flash seems to hold its own against frontier models from Anthropic and xAI. OpenAI's recently released o4-mini appears to perform better than the Gemini 2.5 Flash preview, but it costs significantly more.
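Since the thinking toggle is aimed at developers, the sketch below shows one way it could be exercised from Python. It is a minimal sketch assuming the google-genai SDK and a preview model ID; the exact parameter names, in particular thinking_budget, should be treated as assumptions rather than confirmed API surface.

```python
from google import genai
from google.genai import types

# Assumed setup: google-genai SDK with an API key from Google AI Studio.
client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",  # assumed preview model ID
    contents="Explain the difference between TCP and UDP in two sentences.",
    config=types.GenerateContentConfig(
        # A budget of 0 is assumed to disable thinking entirely,
        # making the model behave as a drop-in replacement for 2.0 Flash.
        thinking_config=types.ThinkingConfig(thinking_budget=0)
    ),
)
print(response.text)
```

Setting a non-zero budget would re-enable reasoning, which is how the quality, cost, and latency trade-off described above is meant to be tuned.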