Google Just Turned Street View Into a Video Game

Here’s what Google shipped at I/O, what they haven’t, and why a Korean research paper shows where they might go next.

May 21, 2026

At I/O 2026 this week, while everyone was focused on agentic AI and debating whether Gemini Flash 3.5 lived up to the hype, DeepMind quietly shipped what might be the most consequential spatial intelligence release of the year.

They connected Project Genie to Google Maps. Specifically, Genie 3 -- their real-time world model -- can now generate explorable, interactive 3D worlds anchored to any of the 280 billion Street View images Google has captured over two decades across 110 countries.

Pick a place on Earth. Choose a style. Drop in a character. Walk around.

For the first time, Street View became playable -- not just navigable.

What “Maps Imagery Grounding” actually means

The workflow is shockingly simple. Open the Genie interface (Google AI Ultra subscription required, $200/month). Hit the new “Choose location from Google Maps” button. You land in a familiar Street View pane -- Pegman in the corner, panoramas at every coordinate. Pick a location. Click “use this location.”

Now choose a style. Preset list includes Desert Sands, Stone Age, Ocean World, Black & White Film. Or write your own. Then describe a character. Mine was “mobster in a nice suit walking around 1920s SF.”

Click Create World. The model spins up a 60-second interactive scene anchored to the panorama you picked.

I dropped in front of the Ferry Building. The Bay Bridge was right where it should be. The 1920s film grain rendered correctly over everything. My mobster walked the street.

Now remember, this isn’t a Gaussian splat. There’s no pre-rendered 3D scene. It’s an autoregressive video model that has essentially seen most of YouTube and now knows what should be around you at any given Street View coordinate. It generates each frame on the fly, conditioned on the panorama you selected. And let me tell you, this is just the beginning…
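If "autoregressive, conditioned on the panorama" sounds abstract, here is a toy sketch of the loop shape. To be clear, this is not Genie 3's code or API: `toy_model`, `rollout`, and `context_len` are hypothetical names, and the "model" is a placeholder that just labels frames so you can watch the loop run.

```python
from collections import deque
from typing import Callable, Deque, List

Frame = str  # stand-in for an image tensor


def toy_model(panorama: str, context: List[Frame], action: str) -> Frame:
    """Hypothetical placeholder for a learned video model.

    A real model would predict pixels; this one only records what it was
    conditioned on: the seed panorama, the user action, and how many
    previous frames it could see.
    """
    return f"{panorama}:{action}:{len(context)}"


def rollout(
    panorama: str,
    actions: List[str],
    model: Callable[[str, List[Frame], str], Frame],
    context_len: int = 4,
) -> List[Frame]:
    """Generate one frame per action, feeding recent frames back in."""
    context: Deque[Frame] = deque(maxlen=context_len)  # sliding window of past frames
    frames: List[Frame] = []
    for action in actions:
        # Each new frame depends on the seed panorama, recent frames, and the input.
        frame = model(panorama, list(context), action)
        context.append(frame)
        frames.append(frame)
    return frames


frames = rollout("ferry_building", ["forward", "left", "forward"], toy_model, context_len=2)
```

The point of the sketch: the only real-world anchor is the single panorama passed in at the start. Everything after that is the model feeding on its own output.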

Now for the fun part

I ran this thing through its paces. And apparently my commentary ended up on TBPN too.

A Google Maps-branded F1 car whizzing around the Las Vegas Strip. Speedometer in the corner. Checkpoints generated automatically. It plays exactly the way that framing suggests -- and it points at where games go when the world is the dataset.

A raccoon on a scooter around the Palace of Fine Arts. Shadows landing roughly correctly. Architecture mostly accurate.

A tattooed runner around Lady Bird Lake in Austin (I had to). The skyline checked out. The Google building was right where it should be.

Pegman returning to the Ferry Building, for old times’ sake. The former Google Maps nerd in me could not resist.

A character walking around the inside of the White House -- yes, Google has indoor Street View special collects, and yes, you can generate worlds from them.

Why a Korean lab got here first

Google didn’t invent this approach. In March, Naver -- South Korea’s Google -- published Seoul World Model (SWM): the first city-scale world model grounded in a real metropolis.

The trick was retrieval. SWM doesn’t just hallucinate forward; it continuously pulls the nearest Street View panoramas as the camera moves and re-grounds the generation against them. The result is faithful, multi-kilometer video of actual Seoul streets.
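To make the retrieval idea concrete, here is a minimal sketch of the lookup half: for each camera position along a path, find the nearest panorama to re-ground against. This is my illustration, not SWM's actual implementation. `Panorama`, `nearest_panorama`, and `grounding_schedule` are hypothetical names, the coordinates are made up, and a real system would use a spatial index rather than a brute-force scan.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Panorama:
    pano_id: str
    lat: float  # degrees
    lon: float  # degrees


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/lon points."""
    earth_radius_m = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def nearest_panorama(panos: Sequence[Panorama], lat: float, lon: float) -> Panorama:
    """Brute-force nearest neighbor; fine for a sketch, too slow at city scale."""
    return min(panos, key=lambda p: haversine_m(lat, lon, p.lat, p.lon))


def grounding_schedule(
    panos: Sequence[Panorama], camera_path: Sequence[Tuple[float, float]]
) -> List[str]:
    """For each camera position, the ID of the panorama to re-ground against."""
    return [nearest_panorama(panos, lat, lon).pano_id for lat, lon in camera_path]


# Made-up panoramas and camera path, roughly in central Seoul.
panos = [Panorama("A", 37.5665, 126.9780), Panorama("B", 37.5700, 126.9830)]
path = [(37.5666, 126.9781), (37.5699, 126.9829)]
schedule = grounding_schedule(panos, path)
```

As the camera moves from near A to near B, the grounding image switches with it. That switch is what a seed-once approach lacks.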

Google’s implementation appears to skip the retrieval step: seed with one panorama, generate from there. How do I know? In my Genie demo, the area behind the Palace of Fine Arts is full of phantom houses. In reality, that’s the Marina -- water, sailboats, the bay. A model that re-grounded against nearby panoramas would have seen that.

I’d bet Google adds retrieval next. They have the database. They have the GPU budget. The path is obvious.

And lest we forget, this is ground-level panoramas only. The moment Google fuses aerial + ground (and they own the best aerial dataset on Earth), faithful 1:1 reconstruction becomes possible. That's the next obvious step.

Where this goes

To me this is the geospatial substrate becoming legible. For two decades, Street View was a passive archive -- a reference layer. Genie 3 changes its job description.

For creators: a one-prompt path from any place on Earth to a playable scene.
For robotics: simulators conditioned on real cities, not synthetic ones. Waymo’s already using Genie 3 for rare-event sims -- tornadoes, elephants on roads -- and perspective-swap lets them train pedestrians and delivery robots, not just AV viewpoints.
For training data: edge cases anchored in geographies that actually exist.
For everyone else: the closest we’ve come to a holodeck for the real world, running on the largest ground-truth dataset humanity has assembled.

Want to go deeper?

I suspect you will not stop hearing about world models this year, and you probably want to get up to speed on what’s going on. Here are two videos that will help you with exactly that.

My sit-down with the Genie 3 team:

The broader world models primer:

If this gave you something to think about, share it with fellow reality mappers.

Map the World by Bilawal Sidhu
