Google DeepMind introduces Gemini Robotics 1.5: robots that plan, analyse, and act

Google DeepMind has launched two new Gemini Robotics AI models, Gemini Robotics-ER 1.5 and Gemini Robotics 1.5, to enhance general-purpose robots. ER 1.5 plans tasks using reasoning and external tools, while 1.5 executes actions from visual input and instructions, enabling robots to complete complex multi-step tasks under natural language control.

Govind Choudhary
Updated: 27 Sep 2025, 03:09 PM IST

Google DeepMind has introduced two new artificial intelligence (AI) models in its Gemini Robotics family, aimed at enhancing the capabilities of general-purpose robots. The models, named Gemini Robotics-ER 1.5 and Gemini Robotics 1.5, are designed to operate together to improve reasoning, vision, and action in real-world environments.

Two-Model System for Planning and Execution

According to a blog post from DeepMind, Gemini Robotics-ER 1.5 serves as the planner or orchestrator, while Gemini Robotics 1.5 is responsible for executing tasks based on natural language instructions. The two-model system is intended to address limitations seen in earlier AI models, where a single system both planned and performed actions, often leading to errors or delays in execution.
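In practice, that division of labour can be pictured with a short sketch. The function and object names below (plan_task, generate_action, robot.execute and so on) are hypothetical placeholders, not part of any DeepMind interface; they only illustrate the planner/executor split the blog post describes.

```python
# Hypothetical sketch of the two-model loop described above.
# None of these names come from DeepMind's API; they only illustrate
# the split between the planner (ER 1.5) and the executor (1.5).

def run_task(instruction: str, robot, planner, vla):
    """Plan with the orchestrator model, then act with the VLA model."""
    # 1. The orchestrator breaks the instruction into steps,
    #    optionally consulting tools such as Google Search.
    plan = planner.plan_task(instruction, scene=robot.camera_image())

    # 2. The VLA model turns each step plus the current camera view
    #    into motor commands for the robot's own control stack.
    for step in plan.steps:
        action = vla.generate_action(step, image=robot.camera_image())
        robot.execute(action)             # robot-specific control stack
        print(vla.explain_last_action())  # natural-language rationale
```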

Gemini Robotics-ER 1.5: The Planner

The ER 1.5 model functions as a vision-language model (VLM) capable of advanced reasoning and tool integration. It can create multi-step plans for a given task and is reported to perform strongly on spatial understanding benchmarks. The model can also access external tools, such as Google Search, to gather information for decision-making in physical environments.

Gemini Robotics 1.5: Task Execution

Once a plan is formulated, Gemini Robotics 1.5, a vision-language-action (VLA) model, translates instructions and visual input into motor commands, enabling the robot to carry out the task. The model assesses the most efficient path to complete an action and executes it, while also offering explanations of its decision-making in natural language.


Handling Complex Multi-Step Tasks

The system is designed to allow robots to handle complex, multi-step commands in a seamless process. For example, a robot could sort items into compost, recycling, and trash bins after consulting local recycling guidelines online, analysing the objects, planning the sorting process, and then executing the actions.

DeepMind states that the AI models are adaptable to robots of various shapes and sizes due to their spatial awareness and flexible design. At present, the orchestrator model, Gemini Robotics-ER 1.5, is accessible to developers via the Gemini API in Google AI Studio, while the VLA model is limited to select partners.
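For developers, calling the orchestrator through the Gemini API looks much like calling any other Gemini model with Google's google-genai Python SDK. The snippet below is a minimal sketch: the model identifier is an assumption and should be checked against the model list in Google AI Studio.

```python
# Minimal sketch: querying the orchestrator model through the Gemini API
# using Google's google-genai Python SDK. The model name is assumed;
# verify the current identifier in Google AI Studio before use.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",   # assumed identifier
    contents=(
        "Here is a photo of my desk. List the steps a robot arm should "
        "take to sort the items into compost, recycling and trash bins."
    ),
    # ER 1.5 can consult external tools such as Google Search while planning.
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
```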

This development marks a step towards integrating generative AI into robotics, replacing traditional control interfaces with natural language-driven commands and separating planning from execution to reduce errors.
