a thousand world models
The launch of Google's Genie 3 last week nudged me back to a favourite idea: the overlap between these ground-breaking world models and theories of biological intelligence. Since reading Jeff Hawkins's A Thousand Brains a few years ago, I've been drawn to that crossover: the way it hints at how brains handle genuinely new situations without having seen a million related examples in the training data.
World models are like the AI equivalent of daydreaming, but useful. Imagine robots practising inside their own heads before risking reality, or AI agents rehearsing actions in virtual worlds before being let loose. They're a step on from token generators, reaching for physical intuition, long-term planning, and real-world grounding.
It's the difference between giving an AI something like a realistic video game to live inside, and teaching it to navigate that world the way our brains actually do, by constantly anchoring experience to internal reference frames. And as someone who frequently relies on the tricks of mind-palaces to remember my presentation material, I'm instinctively drawn to the idea of connecting thoughts and concepts with the physical world.
Genie 3 feels like a genuine leap forward. In this model, the AI doesn’t just watch, it acts. It changes its environment and sees those changes persist over minutes. That continuity and interactivity makes it easier to imagine an AI that experiments, makes mistakes, adapts, and learns, rather than following a script from its training set. We’ve gone from short, throwaway video snippets in systems like Veo and Grok to something closer to a small, persistent dream-world. Still flawed, but much richer – and evolving at a staggering pace.
Jeff Hawkins's Thousand Brains theory complements this nicely. Hawkins posits that our brains don't build one perfect model of the world. Instead, thousands of cortical columns build their own individual models, each one centred around a "reference frame": a way of anchoring what we perceive to something stable and consistent. He links it to the way our brains' "grid cells" help humans and animals navigate physical spaces. Every time we move or interact, our brain updates these reference frames, keeping our internal representation of the world coherent and navigable.
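To make "reference frame" a little more concrete, here's a deliberately toy Python sketch (every name here is my own invention, not anything from Hawkins or Numenta): a map that anchors observed features to locations, and keeps itself coherent by updating its own position from self-motion, loosely in the spirit of grid-cell path integration.

```python
from dataclasses import dataclass, field

@dataclass
class ReferenceFrame:
    """A toy reference frame: features anchored to locations,
    with position updated by self-motion (crude path integration)."""
    position: tuple = (0, 0)                      # where we are in this frame
    anchors: dict = field(default_factory=dict)   # location -> observed feature

    def move(self, dx, dy):
        # Update position from our own movement, so the map stays
        # coherent even without external landmarks.
        x, y = self.position
        self.position = (x + dx, y + dy)

    def observe(self, feature):
        # Anchor what we perceive to where we currently are in the frame.
        self.anchors[self.position] = feature

# Walk around a "cup": each observation is stored relative to the frame,
# so returning to a location recovers what was anchored there.
frame = ReferenceFrame()
frame.observe("handle")
frame.move(1, 0)
frame.observe("rim")
frame.move(-1, 0)  # return to the start; the anchor still holds
```

The point of the sketch is only the shape of the idea: perception isn't a bag of features, it's features tied to locations in a self-updating coordinate system.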
This idea of reference frames feels intuitively important. It explains how we can effortlessly understand objects and concepts from multiple perspectives and contexts, because each column of the cortex maintains its own flexible but coherent map. Our brain then reconciles these many perspectives, seamlessly blending them into a stable sense of reality. It's not a competition between models but rather a negotiation, a collaborative effort to decide what's real, where things are, and how we should respond.
Now imagine taking this biological insight and applying it to something like Genie 3. Instead of training a single monolithic model to handle every scenario in every possible context, you give an AI multiple smaller models, each specialised in a particular kind of interaction, object, or concept: doors, cups, language, social interactions. Each of these modules doesn't just learn patterns; it learns patterns anchored to its own reference frames, much as our brain anchors visual or tactile experiences to internal maps.
Then, when the AI encounters new situations in its interactive world, these modules collaborate, sharing their understanding. If one module’s interpretation is slightly off, others help correct it by offering alternative perspectives. This isn’t just voting or averaging predictions; it's a deeper integration where each module’s understanding is tied to its own consistent internal coordinate system.
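As a starting point for what that collaboration might look like, here's a minimal Python sketch, again entirely hypothetical. It only goes as far as confidence-weighted pooling; the deeper integration described above, where corrections flow back into each module's own coordinate system, would build on top of something like this.

```python
# Toy "negotiation" between modules: each module interprets the same scene
# through its own lens and reports a label with a confidence. Rather than
# a flat one-module-one-vote count, agreement is weighted by certainty.
from collections import defaultdict

def negotiate(interpretations):
    """interpretations: list of (module_name, label, confidence) tuples."""
    scores = defaultdict(float)
    for _, label, confidence in interpretations:
        scores[label] += confidence      # weight each reading by its certainty
    return max(scores, key=scores.get)   # the agreed-upon interpretation

interpretations = [
    ("vision", "door", 0.9),   # the visual module is fairly sure
    ("touch",  "door", 0.6),
    ("motion", "wall", 0.4),   # one module disagrees...
]
consensus = negotiate(interpretations)
print(consensus)  # ...but is outweighed: "door"
```

In a fuller version, the dissenting module wouldn't just be outweighed; the consensus would feed back to update its internal map, which is where this stops being an ensemble trick and starts resembling the negotiation Hawkins describes.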
This approach lets the AI adapt quickly and robustly. It can explore its environment actively, asking questions through interaction rather than passive observation, in the same way we move and manipulate objects to understand them better (or move through a physical representation of a space when using a mind-palace to remember things). When faced with unexpected scenarios or changed environments, an AI built on reference frames can adjust rapidly by shifting its internal maps, just like our brains do when we're exploring a new place.
Obviously there are challenges here. Genie 3's memory spans minutes, not hours, days or years, limiting the complexity of tasks it can tackle. Additionally, to achieve truly emergent intelligence, these world models may need richer causal structures, understanding not just "what is" but "what could be." Managing the coordination between many specialised reference-frame-based modules might need new methods to mimic the nuanced negotiations happening in our brains.
Still, I can't help but feel drawn to the idea. Combining interactive world models like Genie 3 with the Thousand Brains theory's emphasis on flexible, reference-frame-anchored perception could be one possible path towards a novel intelligence architecture that's practical and biologically inspired, giving rise to AIs that are more adaptable, capable, and aligned with how natural intelligence actually operates.
This isn't about one groundbreaking demo or one neat idea; it's about recognising that intelligence, whether artificial or biological, emerges when agents have coherent internal maps that make sense of persistent, interactive environments. Google's Genie 3 provides the environment, and Hawkins gives us a powerful way to navigate it.