Published Date: 26/06/2025
Google DeepMind has announced the launch of a new artificial intelligence model tailored for robotics, capable of functioning entirely on a local device without requiring an active data connection. Named Gemini Robotics On-Device, the model is designed to enable bi-arm robots to carry out complex tasks in real-world environments by combining vision, language, and action (VLA) processing.
In a blog post, Carolina Parada, Senior Director and Head of Robotics at Google DeepMind, introduced the new model, highlighting its low-latency performance and flexibility. As it operates independently of the cloud, the model is especially suited to latency-sensitive environments and real-time applications where constant internet connectivity is not feasible.
Currently, access to the model is restricted to participants of Google’s trusted tester programme. Developers can experiment with the AI system through the Gemini Robotics software development kit (SDK) and the company's MuJoCo physics simulator.
Although Google has not disclosed specific details about the model’s architecture or training methodology, it has outlined the model’s robust capabilities. Designed for bi-arm robotic platforms, Gemini Robotics On-Device requires minimal computing resources. Remarkably, the system can adapt to new tasks using only 50 to 100 demonstrations, a feature that significantly accelerates deployment in diverse settings.
In internal trials, the model demonstrated the ability to interpret natural language commands and perform a wide array of sophisticated tasks, from folding clothes and unzipping bags to handling unfamiliar objects. It also successfully completed precision tasks such as industrial belt assembly, showcasing a high level of dexterity.
Though originally trained on ALOHA robotic systems, Gemini Robotics On-Device has also been adapted to work with other bi-arm robots, including Franka Emika’s FR3 and Apptronik’s Apollo humanoid robot. According to the American tech giant, the model exhibited consistent generalisation performance across different platforms, even when faced with out-of-distribution tasks or multi-step instructions.
Google DeepMind’s Gemini Robotics On-Device represents a significant advancement in the field of robotics, offering a powerful tool for developers and researchers to explore the potential of AI in real-world applications. The model’s ability to operate locally and perform complex tasks with minimal resources opens up new possibilities for industries ranging from manufacturing to healthcare.
Q: What is Gemini Robotics On-Device?
A: Gemini Robotics On-Device is a new AI model developed by Google DeepMind that enables bi-arm robots to perform complex tasks in real-world environments. It operates locally on the device without requiring an active internet connection.
Q: How does Gemini Robotics On-Device work?
A: The model combines vision, language, and action (VLA) processing to interpret natural language commands and perform a wide array of sophisticated tasks, from folding clothes to handling unfamiliar objects. It requires minimal computing resources and can adapt to new tasks with just 50 to 100 demonstrations.
Q: Who can access Gemini Robotics On-Device?
A: Currently, access to the model is restricted to participants of Google’s trusted tester programme. Developers can experiment with the AI system through the Gemini Robotics SDK and the company's MuJoCo physics simulator.
Q: What are the benefits of Gemini Robotics On-Device?
A: The benefits include low-latency performance, flexibility, and the ability to operate in latency-sensitive environments without constant internet connectivity. It is particularly useful for real-time applications in industries like manufacturing and healthcare.
Q: Which robots can use Gemini Robotics On-Device?
A: The model has been adapted to work with various bi-arm robots, including Franka Emika’s FR3 and Apptronik’s Apollo humanoid robot. It has shown consistent performance across different platforms, even when faced with out-of-distribution tasks or multi-step instructions.