Gemini 2.5 Computer Use model outperforms leading alternatives on multiple AI benchmarks

Google has announced Gemini 2.5 Computer Use, a new AI model that allows agents to interact with user interfaces on websites and mobile apps.

During Google I/O earlier this year, Google revealed that it would be bringing computer use capabilities to the Gemini API. Today, Google announced Gemini 2.5 Computer Use, a new specialized model to power agents that can interact with user interfaces (UIs). Google claims that this new model outperforms other similar models on multiple web and mobile control benchmarks.

Here's how the Gemini API computer_use tool works:

  • Developers send the user request to the tool as input, together with a screenshot of the environment and a history of recent actions.
  • Developers can also exclude functions from the full list of supported UI actions, or add custom functions of their own.
  • The model will analyze the received inputs and generate a response, which will be one of the UI actions, such as clicking or typing.
  • If the model is unsure, it may even request end-user confirmation. For example, if the action is related to purchasing an item, user confirmation will be required.
  • The client-side code then executes the received action, such as clicking a button or displaying an end-user confirmation.
  • Once the action is completed, a new screenshot of the current GUI and the current URL are sent back to the Computer Use model as a function response, restarting the loop.
  • Until the main task objective is reached, the above steps are repeated.
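
The loop described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not Google's client code: it makes no real Gemini API calls, and every name in it (Observation, Action, run_agent_loop, demo_history, the "done" action) is a hypothetical stand-in. The model and the browser are replaced by simple fake functions so the sketch runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Observation:
    """What the client sends each turn: a screenshot plus the current URL."""
    screenshot: bytes
    url: str


@dataclass
class Action:
    """One UI action proposed by the model (names are illustrative only)."""
    name: str                          # e.g. "click", "type_text", "done"
    args: dict = field(default_factory=dict)
    needs_confirmation: bool = False   # e.g. an action tied to a purchase


def run_agent_loop(
    goal: str,
    propose_action: Callable[[str, Observation, list], Action],
    execute: Callable[[Action], Observation],
    confirm: Callable[[Action], bool],
    first_observation: Observation,
    max_steps: int = 20,
) -> List[Tuple[str, Action]]:
    """Repeat propose -> (confirm) -> execute until the task ends."""
    history: List[Tuple[str, Action]] = []
    observation = first_observation
    for _ in range(max_steps):
        # Model side: request + screenshot + recent actions in, one action out.
        action = propose_action(goal, observation, history)
        if action.name == "done":
            break
        # Client side: ask the end user before sensitive actions.
        if action.needs_confirmation and not confirm(action):
            history.append(("declined", action))
            break
        # Client side: run the action, then capture the new screenshot and URL.
        observation = execute(action)
        history.append(("executed", action))
    return history


def demo_history(approve: bool) -> List[Tuple[str, Action]]:
    """Drive the loop with a scripted fake model and a fake browser."""
    script = iter([
        Action("click", {"x": 10, "y": 20}),
        Action("type_text", {"text": "hello"}),
        Action("click", {"x": 5, "y": 5}, needs_confirmation=True),
        Action("done"),
    ])

    def fake_model(goal: str, obs: Observation, history: list) -> Action:
        return next(script)

    def fake_execute(action: Action) -> Observation:
        return Observation(b"", "https://example.com/after-" + action.name)

    return run_agent_loop(
        goal="complete the checkout form",
        propose_action=fake_model,
        execute=fake_execute,
        confirm=lambda action: approve,
        first_observation=Observation(b"", "https://example.com"),
    )
```

In a real integration, propose_action would wrap the call to the Computer Use model and execute would drive an actual browser. The max_steps cap is an added safeguard in this sketch, not something Google's description specifies.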

While the Gemini 2.5 Computer Use model is optimized for web browsers, Google claims that it also performs well on mobile UI control tasks. Google notes that the model is not yet optimized for desktop OS-level control. As the benchmarks below show, Gemini 2.5 Computer Use delivers state-of-the-art results on several key benchmarks.

[Image: Gemini 2.5 Computer Use benchmark results]

The Gemini 2.5 Computer Use model is now available in public preview, and developers can access it via the Gemini API on Google AI Studio and Vertex AI.
