When you purchase through links on our site, we may earn an affiliate commission. Here’s how it works.

Google announces Gemini Robotics, a Gemini 2.0 model optimized for robots

Gemini Robotics

Google DeepMind has been making steady progress in the field of AI with regular updates to Gemini, Imagen, Veo, Gemma, and AlphaFold. Today, the Google DeepMind team entered the robotics industry with two new Gemini 2.0-based models: Gemini Robotics and Gemini Robotics-ER.

Gemini Robotics is an advanced vision-language-action (VLA) model which is based on Gemini 2.0, with the addition of physical actions as a new output modality to control robots. Google claims that this new model can understand situations that it has never seen before in training.

Compared to other state-of-the-art vision-language-action models, Gemini Robotics performs twice as well on a comprehensive generalization benchmark. Since Gemini Robotics is built on the Gemini 2.0 model, it features natural language understanding capabilities in different languages. So, it can understand people’s commands in a much better way.

When it comes to dexterity, Google claims that Gemini Robotics can handle extremely complex, multi-step tasks that require precise manipulation. For example, this model can perform origami folding or put a snack into a Ziploc bag.

Gemini Robotics-ER is an advanced vision-language model that focuses on spatial reasoning and allows roboticists to connect it with their existing low-level controllers. Using this model, roboticists will have all the steps to control a robot right out of the box, which includes perception, state estimation, spatial understanding, planning, and code generation.

Google is partnering with Apptronik to build humanoid robots based on Gemini 2.0 models. Google is also working with select trusted testers, including Agile Robots, Agility Robotics, Boston Dynamics, and Enchanted Tools, on the future of Gemini Robotics-ER.

By enabling robots to understand and execute complex tasks with greater precision and adaptability, Google DeepMind is paving the way for a future where robots can seamlessly integrate into various aspects of our lives.

Report a problem with article
The Intel logo
Next Article

Intel appoints Lip-Bu Tan as its new CEO

Scopely Niantic
Previous Article

Niantic confirms sale of Pokemon GO, its other games, and development teams for $3.5 billion

Join the conversation!

Login or Sign Up to read and post a comment.

3 Comments - Add comment