
DeepMind’s SIMA 2 Uses Gemini AI to Play Games, Navigate Complex Virtual Worlds, and Handle Multi-Step Tasks Like a Human Player

DeepMind has unveiled SIMA 2, an advanced AI agent powered by its Gemini model that can operate in 3D virtual worlds with human-like reasoning and a degree of independence. It can interpret complex instructions, adapt to unfamiliar environments, and learn autonomously, which the company describes as a significant step towards Artificial General Intelligence (AGI) and future robotics.

Google DeepMind Pushes Further Into Virtual Intelligence With SIMA 2
Google DeepMind has unveiled SIMA 2, an upgraded AI agent designed to operate inside 3D virtual worlds with a level of independence and reasoning that the company says brings it closer to future real-world robotics.

The agent builds on last year’s SIMA model but now runs on Google’s Gemini AI, allowing it to plan, explain decisions, learn through experience, and collaborate with users in a way the original system could not.
DeepMind describes SIMA 2 as a “companion” in virtual environments—an AI that can talk, interpret high-level goals, and carry out tasks using simulated keyboard and mouse controls.
The company said,
“This is a significant step in the direction of Artificial General Intelligence (AGI), with important implications for the future of robotics and AI-embodiment in general.”
A More Capable Agent Built On Gemini
The upgrade to Gemini is central to SIMA 2’s progress.
With multimodal abilities, the agent can respond to text, voice, sketches, and even emojis while taking actions in real time.
Google DeepMind wrote on X that “SIMA 2 is our most capable AI agent for virtual 3D worlds… meaning you can talk to it through text, voice, or even images.”
This shift allows the agent to interpret complex instructions, ask clarifying questions, and describe the steps it intends to take.
The model can also adapt its behaviour to tasks it has never encountered before by analysing on-screen visuals alone; no internal game data is accessed.
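DeepMind has not published SIMA 2's internals, but its description implies a pixels-in, keyboard-and-mouse-out control loop. A minimal Python sketch of that pattern, in which every class and function name is an illustrative assumption rather than DeepMind's actual interface, might look like this:

```python
# Illustrative sketch only: DeepMind has not published SIMA 2's API.
# All names here (Action, VisionLanguagePolicy, run_episode, and the
# screen/controls interfaces) are hypothetical stand-ins.

import time
from dataclasses import dataclass, field


@dataclass
class Action:
    """One step of simulated keyboard-and-mouse output."""
    keys: list = field(default_factory=list)  # e.g. ["w"] to walk forward
    mouse_dx: float = 0.0                     # horizontal camera turn
    mouse_dy: float = 0.0                     # vertical camera turn
    click: bool = False


class VisionLanguagePolicy:
    """Stand-in for a multimodal model mapping (frame, instruction) -> Action."""

    def act(self, frame, instruction):
        # A real agent would run a vision-language model on the raw pixels here.
        return Action(keys=["w"])


def run_episode(policy, screen, controls, instruction, max_steps=1000):
    """Perceive-act loop: the agent sees only pixels, never internal game state."""
    for _ in range(max_steps):
        frame = screen.capture()                    # on-screen visuals only
        action = policy.act(frame, instruction)
        controls.press(action.keys)                 # simulated keyboard
        controls.move_mouse(action.mouse_dx, action.mouse_dy)
        if action.click:
            controls.click()                        # simulated mouse
        time.sleep(1 / 30)                          # act at roughly frame rate
```

The key constraint is visible in run_episode: the agent's only view of the game is screen.capture(), which mirrors the claim that no internal game data is accessed.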
How Does SIMA 2 Work Across Games It Has Never Seen?
During tests, SIMA 2 solved significantly more tasks in unfamiliar environments, including MineDojo and ASKA.
Success rates ranged between 45 and 75%, compared with SIMA 1’s 15 to 30% in the same settings.
Across all benchmarks, the newer agent completed 65% of tasks, more than double SIMA 1’s 31%.
One user on X, CHRIS FIRST (@chrisfirst), summed up the progression: "SIMA 1 learned 600+ in-game skills by watching the screen and using a virtual keyboard and mouse. Now SIMA 2 goes past simple instructions. With Gemini at its core it can understand your goal, think through it, and take actions inside complex 3D worlds."
DeepMind found that SIMA 2 could transfer concepts from one game to another—for instance, treating “harvesting” in a building game as similar to “mining” in an exploration game.
That level of abstraction is one of the features researchers hope could eventually translate into robotics.
Joe Marino, a research scientist at DeepMind, said even basic actions require layered reasoning.
“It’s a really complex set of tasks you need to solve to progress.”
The multi-step challenges found in games mirror the sequential, compound demands of tasks in physical robotics.
Learning Through Experience, Not Just Human Demonstrations
SIMA 2’s training began with human gameplay footage across eight commercial titles, including No Man’s Sky and Goat Simulator 3, as well as three custom-built environments.
But the more notable advancement is the agent’s ability to improve without human-labelled data.
After initial demonstrations, the system switched to self-directed learning.

Gemini generated new tasks, evaluated SIMA 2’s attempts, and provided tips after each failure.
Over repeated attempts, SIMA 2 adjusted behaviour and produced its own trajectory data, forming a loop that helped it refine skills autonomously.
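DeepMind has not released details of this loop, but the reported structure (propose, attempt, grade, retry) can be sketched as follows. The agent, task_generator, and judge objects are hypothetical stand-ins, with a Gemini model assumed behind the latter two:

```python
# Hedged sketch of the reported self-improvement loop; DeepMind has not
# published its implementation. agent, task_generator, and judge are
# hypothetical stand-ins for components it has not detailed publicly.

def self_improvement_loop(agent, task_generator, judge, iterations=100):
    experience = []  # self-generated trajectory data
    for _ in range(iterations):
        task = task_generator.propose()                       # Gemini invents a new task
        trajectory = agent.attempt(task)                      # agent plays from pixels
        success, feedback = judge.evaluate(task, trajectory)  # Gemini grades the attempt
        if success:
            experience.append(trajectory)                     # keep successes as data
        else:
            agent.incorporate_hint(feedback)                  # "tips after each failure"
    agent.fine_tune(experience)                               # refine skills, no human labels
    return agent
```

The notable choice in DeepMind's account is that both the tasks and the reward signal come from Gemini rather than human annotators, so the trajectory data the loop accumulates is entirely self-generated.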
Testing SIMA 2 Inside Worlds Generated From A Single Image
DeepMind also tested SIMA 2 in experimental worlds created by Genie 3, a project that can generate 3D environments from just one image or text prompt.
In a follow-up post on X, Google DeepMind wrote: "We tested SIMA 2's abilities in simulated 3D worlds created by our world model Genie 3. It demonstrated unprecedented adaptability by navigating its surroundings and took meaningful steps toward goals."
Dropped into these unfamiliar worlds moments after they were formed, the agent was able to orient itself, interpret goals, and take meaningful actions—behaviour that researchers say they did not observe in SIMA 1.
Marino called this adaptability a “fundamental” step toward AGI and future robotics: a flexible agent that can navigate, use tools, and collaborate with people in unpredictable environments.
How Far Can This Technology Go? Experts Weigh In
Some researchers say SIMA 2’s achievements stand out because controlling multiple games from raw visual input has long been a challenge.
Julian Togelius, an AI researcher at New York University, noted that earlier multi-game systems, such as DeepMind's Gato, struggled with this.
“Playing in real time from visual input only is ‘hard mode’.”
Others remain sceptical about its real-world impact.
Matthew Guzdial from the University of Alberta said it is unsurprising that SIMA 2 performs well on many games, since most rely on similar keyboard and mouse controls.
“If you put a game with weird input in front of it, I don’t think it’d be able to perform well.”
Others were more bullish. On X, Bilawal Sidhu (@bilawalsidhu) wrote: "DeepMind's generalist AI agent SIMA 2 evolved from basic instruction-following to actual reasoning companion. Uses vision and keyboard/mouse like a human player, works across dozens of games without touching game code. The robotics angle is obvious - if you can generalize…"
Guzdial also questioned whether visual understanding learned in games would transfer smoothly to physical robots, where camera data is far messier than video game graphics.
What Still Limits SIMA 2 Today
DeepMind openly acknowledges the system’s ongoing weaknesses.
SIMA 2 struggles with very long, multi-step tasks and retains only short-term context to keep interactions responsive.
Its simulated keyboard and mouse control is less precise than a human player’s, and its visual interpretation still fails in busy or cluttered 3D scenes.
These gaps reveal how far current systems remain from general-purpose intelligence.
For now, SIMA 2 remains a research project available only to select academics and developers.
Could SIMA 2 Lead To Better Robots One Day?
DeepMind believes the skills SIMA 2 is learning—navigation, tool use, reasoning, collaboration—form the foundations of future general-purpose robots.
The team hopes to combine SIMA’s trial-and-error learning with Genie 3’s limitless virtual worlds, creating an ongoing training loop where the agent continuously improves.
Marino said,
“We’ve kind of just scratched the surface of what’s possible.”
The virtual learning environments might eventually bridge the gap between simulation and physical robotics.

