Published Date: 10/10/2025
Google DeepMind has announced a significant advancement in artificial intelligence (AI) with the introduction of two new Gemini Robotics models designed to enable robots to perform complex tasks and reason in ways that were previously out of reach. The release builds on the earlier Gemini Robotics model, which allowed machines to reason about and execute simple tasks in physical environments.
Earlier this year, DeepMind introduced Gemini Robotics, which could follow simple instructions like 'place this banana in the basket' and guide a robotic arm to complete the task. Now, with the new models, robots can handle more sophisticated tasks. For instance, a pair of robotic arms (Aloha 2) can sort a selection of fruits into individual containers based on color. The robot not only performs the task but also explains its actions in natural language, offering a step-by-step breakdown of its reasoning.
Jie Tan, a senior staff research scientist at DeepMind, highlighted the significance of this development in a video. 'We enable it to think,' he said. 'It can perceive the environment, think step-by-step, and then finish this multistep task. Although this example seems very simple, the idea behind it is really powerful. The same model is going to power more sophisticated humanoid robots to do more complicated daily tasks.'
The AI models, Gemini Robotics-ER 1.5 (the 'brain') and Gemini Robotics 1.5 (the 'hands and eyes'), work together much like a supervisor and a worker. Gemini Robotics-ER 1.5, a vision-language model (VLM), gathers information about the environment and objects, processes natural-language commands, and uses advanced reasoning to send instructions to Gemini Robotics 1.5, a vision-language-action (VLA) model. The VLA model then matches those instructions to its own visual understanding of the space, builds a plan, and executes the task while reporting back on its progress and reasoning.
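To make the supervisor/worker split concrete, here is a minimal Python sketch of that handoff. The class names, method signatures, and string-based plan format are illustrative assumptions for this article, not DeepMind's actual API: the real models reason over camera images and drive motor commands rather than printing text.

from dataclasses import dataclass


@dataclass
class Step:
    """One small, executable instruction produced by the planner."""
    description: str


class OrchestratorVLM:
    """Stands in for the 'brain' (Gemini Robotics-ER 1.5): it reads the scene
    and the user's command and emits a step-by-step plan."""

    def plan(self, command: str, scene: list[str]) -> list[Step]:
        # A real model would condition on the command and camera images;
        # here we hard-code a plausible decomposition of the fruit-sorting demo.
        return [
            Step(f"pick up the {obj} and place it in the matching-color container")
            for obj in scene
        ]


class ExecutorVLA:
    """Stands in for the 'hands and eyes' (Gemini Robotics 1.5): it grounds
    each instruction in its own visual understanding and acts on it."""

    def execute(self, step: Step) -> str:
        # A real VLA would drive the arms; we just report back, mirroring the
        # natural-language feedback the article describes.
        return f"done: {step.description}"


brain, hands = OrchestratorVLM(), ExecutorVLA()
for step in brain.plan("sort the fruit by color", ["banana", "lime", "apple"]):
    print(hands.execute(step))

The design point mirrored here is that the planner never touches the actuators; it only emits small, verifiable steps that the executor grounds in its own view of the scene.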
One of the key advancements is the models' ability to use tools such as Google Search to complete tasks. For example, a researcher asked Aloha to sort objects into compost, recycling, and trash bins based on local recycling rules. The robot recognized that the user was in San Francisco, found the relevant rules online, and sorted the trash accordingly.
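A rough sketch of how such a tool call could slot into the planning step, with made-up function names and a toy rule table standing in for a real search integration:

def search_recycling_rules(city: str) -> dict[str, str]:
    """Stand-in for the web-search tool; maps items to the correct bin."""
    # In the demo, the model looked up San Francisco's actual guidelines online.
    return {"banana peel": "compost", "soda can": "recycling", "chip bag": "trash"}


def plan_waste_sorting(items: list[str], city: str) -> list[str]:
    """Consult local rules first, then emit one executable step per item."""
    rules = search_recycling_rules(city)
    return [f"place the {item} in the {rules.get(item, 'trash')} bin" for item in items]


for step in plan_waste_sorting(["banana peel", "soda can", "chip bag"], "San Francisco"):
    print(step)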
Another significant feature is the models' ability to transfer what they learn across multiple robotic systems. DeepMind representatives stated that skills learned on the Aloha 2 robot (a pair of robotic arms), the Apollo humanoid robot, and the bi-arm Franka robot can be applied to the other systems because of the generalized way the models learn.
'General-purpose robots need a deep understanding of the physical world, advanced reasoning, and general and dexterous control,' the Gemini Robotics Team explained in a technical report. This generalized reasoning allows the models to approach problems with a broad understanding of physical spaces and interactions, breaking tasks down into small, manageable steps that can be easily executed.
The team also demonstrated the robot's adaptability in a real-world scenario. They presented an Apollo robot with two bins and asked it to sort clothes by color, with whites going into one bin and other colors into the other. They then raised the difficulty by moving the clothes and bins around mid-task, forcing the robot to reevaluate the physical space and adjust its plan, which it did successfully.
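That behavior amounts to a closed perception-action loop: re-observe the scene before each step so the plan always reflects where things actually are. The sketch below illustrates the idea with toy data; the scene representation and function names are assumptions for this article, not the models' internals.

import random

BINS = {"white": "bin A", "colored": "bin B"}
GARMENTS = ["white shirt", "red sock", "blue towel"]


def observe_locations(garments: list[str]) -> dict[str, str]:
    """Stand-in for perception: where each remaining garment currently is."""
    # Random locations mimic the researchers moving items around mid-task.
    return {g: random.choice(["table", "left pile", "right pile"]) for g in garments}


def sort_step(garment: str, location: str) -> str:
    color = "white" if "white" in garment else "colored"
    return f"pick the {garment} from the {location} and place it in {BINS[color]}"


remaining = list(GARMENTS)
while remaining:
    locations = observe_locations(remaining)  # re-perceive before every action
    garment = remaining.pop(0)
    print(sort_step(garment, locations[garment]))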
This breakthrough in AI-powered robotics opens up new possibilities for automation in various industries, from manufacturing to household tasks, and marks a significant step towards the development of truly intelligent machines.
Q: What is Gemini Robotics?
A: Gemini Robotics is a pair of AI models developed by Google DeepMind that enable robots to perform complex tasks and reason in ways that were previously out of reach. The pair consists of Gemini Robotics-ER 1.5 (the 'brain') and Gemini Robotics 1.5 (the 'hands and eyes').
Q: How does Gemini Robotics differ from earlier AI models?
A: Gemini Robotics can perform more sophisticated tasks, such as sorting fruits by color and explaining its actions in natural language. It can also use tools like Google Search to complete tasks and apply learning across multiple robotics systems.
Q: What are the key capabilities of the new AI models?
A: The key capabilities include advanced reasoning, the ability to use tools like Google Search, and the capacity to learn and apply that learning across different robotics systems. The models can also reevaluate and adapt to changes in the physical environment.
Q: How do the 'brain' and 'hands and eyes' models work together?
A: The 'brain' (Gemini Robotics-ER 1.5) gathers information about the environment, processes natural-language commands, and sends instructions to the 'hands and eyes' (Gemini Robotics 1.5). The 'hands and eyes' model then matches those instructions to its visual understanding of the space, builds a plan, and executes the tasks.
Q: What are the potential real-world applications of this technology?
A: The technology has potential applications in various industries, including manufacturing, household tasks, and environmental management. For example, robots can help sort recycling, manage inventory, and perform complex tasks in industrial settings.