Originally published in Bessemer’s Atlas; co-authored by Janelle Teng Wade, Lance Co Ting Keh, Talia Goldberg, David Cowan, Grace Ma, Bhavik Nagda, Brandon Nydick, and Bar Weiner.

The first generation of AI was built for a world where the model was the product, and progress meant bigger weights, more data, and stellar benchmarks. AI infrastructure mirrored this reality, fueling the rise of giants in foundation models, compute capacity, training techniques, and data ops. This was the focus of our 2024 AI Infrastructure Roadmap, which drove our investments in companies such as Anthropic, Fal AI, Supermaven (acquired by Cursor), and VAPI as the AI infrastructure revolution unfolded.

But the landscape has changed. Big labs are moving beyond chasing benchmark gains to designing AI that interfaces with the real world, and enterprises are graduating from POCs to production. The infrastructure that got us here — which was optimized for scale and efficiency — won’t get us to the next phase. What’s needed now is infrastructure for grounding AI in operational contexts, real-world experience, and continuous learning.

The stage is being set for a new wave of AI infrastructure tools to enable AI to operate in the real world. We’ve identified five frontiers that will define this next wave, each addressing a structural limitation that needs to be solved beyond model scaling.

Five cutting-edge frontiers for next-gen AI infrastructure

1. “Harness” infrastructure

As AI deployments shift from single models to compound systems, infrastructure designed to “harness” models — unlocking their full potential — becomes more important than ever.

Take memory and context management. Most enterprise AI systems suffer from organizational amnesia. While basic Retrieval-Augmented Generation (RAG) solved the connection problem between models and data sources, compound AI systems now require more sophisticated memory infrastructure. Enterprises hold vast amounts of historical data and organizational knowledge — from proprietary documents to CRM records — that AI systems must access to avoid hallucinations and stay grounded in company-specific reality.

Reliable AI deployment depends not just on raw model horsepower, but on orchestrating components like knowledge retrieval, cross-session context management, and planning. As models become commoditized, differentiation shifts to the memory and context layer. What developers once built from scratch — custom vector databases and retrieval systems — is now emerging as its own infrastructure category. Startups and Big Tech alike now offer plug-and-play semantic layers that maintain conversation context, user preferences, and long-term memory across sessions.
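To make the memory-layer idea concrete, here is a minimal, hypothetical sketch of cross-session memory. The bag-of-words "embedding" and the `MemoryStore` class are stand-ins of our own invention; production semantic layers use learned dense embeddings and vector databases, but the store-facts-then-retrieve-by-similarity shape is the same:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words term-frequency vector.
    # Real memory layers use learned dense embeddings instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class MemoryStore:
    """Minimal cross-session memory: store facts, retrieve by similarity."""
    def __init__(self):
        self.entries = []  # (text, vector) pairs

    def remember(self, text: str):
        self.entries.append((text, embed(text)))

    def recall(self, query: str, k: int = 2):
        qv = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(qv, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = MemoryStore()
store.remember("Customer Acme prefers quarterly billing")
store.remember("Acme's CRM record lists Jane Doe as the primary contact")
store.remember("Office plants are watered on Fridays")
context = store.recall("What billing cadence does Acme use", k=1)
```

An agent would prepend `context` to its prompt on the next session, grounding its answer in company-specific records rather than hallucinating one.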

Novel evaluation and observability present another critical infrastructure challenge, one that didn’t exist in prior software development paradigms. Consider teams shipping conversational AI agents to production. Traditional monitoring tracks completion rates, latency, error codes, and thumbs up/down feedback. But conversational AI fails differently. When a chatbot gives a confident wrong answer, gradually drifts from the user’s actual question, or misunderstands the request while producing something plausible, users often don’t react: no complaint, no thumbs down, no error signal. The conversation looks fine in dashboards, but the AI has quietly failed.

An estimated 78% of AI failures are invisible: the AI gets something wrong, but no one catches it. Not the user, not traditional monitoring, not even sentiment analysis. These failures cluster into recurring patterns that persist across 93% of cases even with more powerful models, because they stem from interaction dynamics (how models present outputs and how users communicate intent), not capability gaps.

New infrastructure is emerging to address this. Platforms like Bigspin.ai provide not just pre-deployment testing but real-time production monitoring of model outputs against golden datasets and user feedback. We’re also moving beyond traditional analytics toward semantic metrics: new platforms such as Braintrust and Judgment Labs, along with techniques such as LLM-as-a-judge, are emerging for high-quality evals and metrics definition.
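The shape of such an eval harness can be sketched in a few lines. Nothing here reflects any particular vendor’s API; `keyword_judge` is a deliberately crude stand-in for what would, in production, be an LLM-as-a-judge call scoring semantic agreement with a golden answer:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    question: str
    golden_answer: str

def keyword_judge(golden: str, answer: str) -> bool:
    # Crude stand-in judge: pass if the answer covers the golden answer's terms.
    # A real harness would call an LLM-as-a-judge for semantic scoring.
    return set(golden.lower().split()) <= set(answer.lower().split())

def run_eval(cases, get_answer: Callable[[str], str], judge=keyword_judge):
    """Score a model against a golden dataset; surface silent failures."""
    failures = []
    for case in cases:
        answer = get_answer(case.question)
        if not judge(case.golden_answer, answer):
            failures.append((case.question, answer))
    return {"pass_rate": (len(cases) - len(failures)) / len(cases),
            "failures": failures}

# A toy "model" that answers one question correctly and drifts on the other.
def toy_model(q):
    return {"refund window?": "returns accepted within 30 days",
            "support hours?": "our team is friendly and helpful"}[q]

report = run_eval(
    [EvalCase("refund window?", "30 days"),
     EvalCase("support hours?", "9am to 5pm weekdays")],
    toy_model,
)
```

The second answer is exactly the kind of silent failure described above: fluent, plausible, no error code, and wrong, caught only because the harness compares it against a golden dataset.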

These examples illustrate evolving needs for AI harness infrastructure. For more on environments, runtime, orchestration, protocols, and frameworks, see our Software 3.0 roadmap.

2. Continual learning systems

Today’s AI models face a fundamental constraint: frozen weights prevent true learning after deployment. While context management strategies like compaction are powerful, and we see many big labs use them for long-running agents, in-context learning enables only surface-level adaptation through rote memorization, not the acquisition of new skills. It also becomes prohibitively expensive as contexts grow, since the KV cache scales linearly with added context. From both technical and economic perspectives, it’s infeasible to build AI systems that remember everything and continuously improve over years of use.
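The linear cost is easy to see with back-of-the-envelope arithmetic. Assuming a hypothetical 7B-class transformer (32 layers, 8 KV heads, head dimension 128, fp16 values), the KV cache grows by a fixed amount per token of context:

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # Per token, each layer stores one key and one value vector (factor of 2)
    # for every KV head, so cache size grows linearly with context length.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len

# Hypothetical 7B-class model: 32 layers, 8 KV heads, head_dim 128, fp16.
per_token = kv_cache_bytes(1, 32, 8, 128)        # 131072 bytes = 128 KiB/token
at_128k = kv_cache_bytes(128_000, 32, 8, 128)    # ~16.8 GB for a 128k context
```

At roughly 128 KiB per token, a single 128k-token session consumes on the order of 16 GB of accelerator memory just for cache, before serving a second user, which is why "remember everything in context forever" doesn’t scale economically.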

This is where continual learning offers a solution. It enables AI to accumulate knowledge and skills across tasks over time, maintaining earlier capabilities while acquiring new ones. Unlike traditional models trained once and deployed statically, continual learning systems evolve in production — getting smarter with each interaction while avoiding catastrophic forgetting. Researchers and practitioners are pursuing this through innovations at both pre-training and post-training stages.

These efforts span architectural approaches that fundamentally rethink how models learn, as well as near-term, production-ready techniques that are already emerging.

The spectrum of approaches we’ve seen for continual learning ranges from high-risk architectural moonshots that could redefine the field entirely to production-ready techniques that incrementally improve existing transformers. We’re eager to meet founders across this spectrum.

Production deployment of continual learning requires new governance primitives that don’t yet exist in standard ML workflows. Rollback mechanisms enable reversion to stable checkpoints when updates introduce regressions, requiring full lineage tracking of weights, data, and hyperparameters. Isolation techniques allow safe experimentation without affecting core capabilities. Creating benchmarks, beyond needle-in-the-haystack tests, to gauge the performance of continual learning systems versus in-context learning will also be critical.
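As a sketch of what such a governance primitive might look like (the class and field names here are our own, not a standard ML-workflow API), a rollback mechanism pairs every weight update with its full lineage so a regression can be reverted to a known-good checkpoint:

```python
import copy

class CheckpointedLearner:
    """Sketch of rollback governance for a continually learning model:
    every update records weights, data reference, and hyperparameters."""
    def __init__(self, weights):
        self.weights = weights
        self.lineage = [{"version": 0, "weights": copy.deepcopy(weights),
                         "data": None, "hyperparams": None}]

    def update(self, new_weights, data_ref, hyperparams):
        version = len(self.lineage)
        self.lineage.append({"version": version,
                             "weights": copy.deepcopy(new_weights),
                             "data": data_ref, "hyperparams": hyperparams})
        self.weights = new_weights
        return version

    def rollback(self, version):
        # Revert to a stable checkpoint when an eval detects a regression.
        entry = self.lineage[version]
        self.weights = copy.deepcopy(entry["weights"])
        return entry

learner = CheckpointedLearner({"w": 1.0})
learner.update({"w": 1.3}, data_ref="batch-2026-01", hyperparams={"lr": 1e-4})
learner.update({"w": 0.2}, data_ref="batch-2026-02", hyperparams={"lr": 1e-3})
learner.rollback(1)  # the second update regressed; restore version 1
```

The lineage log is what makes the rollback auditable: every checkpoint can be traced back to the exact data and hyperparameters that produced it.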

3. Reinforcement learning platforms

With data quality fundamentally determining AI capabilities, the old machine learning axiom of “garbage in, garbage out” has never been more relevant. Data platforms such as Mercor, Turing, and micro1 have been instrumental in the AI revolution’s first wave by mobilizing human expertise to create high-quality datasets. But we believe that as AI systems evolve from pattern recognition to autonomous decision-making, a critical limitation has emerged: human-generated labeled data is no longer enough to enable production-grade AI. It cannot teach AI systems how to navigate complex, multi-step tasks with delayed consequences and compounding decisions.

This is where reinforcement learning (RL) becomes essential: AI must learn through interaction rather than static datasets, grounding itself in “experience.” Leveraging an RL stack is now a cornerstone of AI infrastructure tooling, teaching agents complex behaviors without the cost and risk of real-world trial and error, and a new generation of platforms is emerging to serve this stack.
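To ground the distinction between learning from labels and learning from experience, here is a minimal tabular Q-learning loop (our own toy construction, not any platform’s API) on a chain environment where reward arrives only after several correct steps in a row, exactly the kind of delayed consequence a static labeled dataset can’t teach:

```python
import random

# Toy chain environment: the agent must move "right" several steps before any
# reward arrives -- a stand-in for multi-step tasks with delayed consequences.
N_STATES, GOAL = 5, 4
ACTIONS = ["left", "right"]

def step(state, action):
    nxt = min(state + 1, GOAL) if action == "right" else max(state - 1, 0)
    reward = 1.0 if nxt == GOAL else 0.0  # reward only at the very end
    return nxt, reward, nxt == GOAL

random.seed(0)
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.3

for _ in range(600):  # episodes of interaction, not static labels
    s, done = 0, False
    while not done:
        if random.random() < epsilon:
            a = random.choice(ACTIONS)           # explore
        else:
            a = max(ACTIONS, key=lambda act: q[(s, act)])  # exploit
        s2, r, done = step(s, a)
        best_next = max(q[(s2, a2)] for a2 in ACTIONS)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
        s = s2

policy = {s: max(ACTIONS, key=lambda act: q[(s, act)]) for s in range(N_STATES)}
```

After training, the learned policy moves right at every state even though only the final transition ever pays out: the value of the delayed reward has propagated backward through interaction alone.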

4. Inference inflection point

Model deployment and inference optimization emerged as a critical infrastructure layer in our 2024 roadmap, when vendors like Fal, Together, Baseten, and Fireworks pioneered efficient serving solutions. At that time, capital-intensive model training consumed the majority of compute resources across the AI stack. Today, we’re witnessing a fundamental shift in the compute center of gravity. As AI agents and applications transition from prototype to production at scale, inference workloads now rival — and in many cases exceed — training in both compute demand and economic importance. As NVIDIA’s Jensen Huang stated in his GTC 2026 keynote, “Finally, AI is able to do productive work, and therefore the inflection point of inference has arrived.”

This inflection point reflects a maturing market where the cost and performance of running AI systems continuously matter just as much as the initial investment in building them.

A new generation of infrastructure startups is addressing this production imperative through specialized optimization across the inference stack. Companies like TensorMesh are leveraging LMCache to eliminate redundant re-computation, RadixArk is advancing SGLang-based routing and scheduling for multi-turn conversations, and Inferact is pushing vLLM performance boundaries for high-throughput serving. Gimlet Labs and even hyperscalers like NVIDIA are working on heterogeneous inference innovations purpose-built for complex agentic systems. These innovations translate cutting-edge systems research into measurable production gains: faster response times and lower costs.
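The core idea behind eliminating redundant re-computation can be sketched with a toy prefix cache. This illustrates the concept only and is not LMCache’s actual design or API: requests that share a prompt prefix, such as a common system prompt, pay only for their uncached suffix:

```python
import hashlib

class PrefixCache:
    """Toy illustration of prefix KV-cache reuse: if two requests share a
    prompt prefix, recompute only the suffix tokens."""
    def __init__(self):
        self.cache = {}
        self.tokens_computed = 0

    def _key(self, tokens):
        return hashlib.sha256(" ".join(tokens).encode()).hexdigest()

    def compute(self, tokens):
        # Find the longest already-cached prefix, longest first.
        for cut in range(len(tokens), 0, -1):
            if self._key(tokens[:cut]) in self.cache:
                cached = cut
                break
        else:
            cached = 0
        # "Compute" (here: just count) only the uncached suffix tokens.
        self.tokens_computed += len(tokens) - cached
        for cut in range(cached + 1, len(tokens) + 1):
            self.cache[self._key(tokens[:cut])] = True
        return len(tokens) - cached

system = "You are a helpful support agent for Acme".split()
cache = PrefixCache()
first = cache.compute(system + "How do I reset my password ?".split())
second = cache.compute(system + "What is your refund policy ?".split())
```

The first request computes all 15 tokens; the second reuses the 8-token system prefix and computes only its 6-token suffix. At production scale, where thousands of requests share long system prompts and multi-turn histories, this reuse is where the latency and cost gains come from.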

We’re also seeing innovations in inference for novel deployments, with edge and on-device as one prime example. As AI proliferates across all sectors of the economy, from robotics to consumer, AI deployments need to meet users where they are, which isn’t always in the cloud. We’re seeing companies such as WebAI, FemtoAI, PolarGrid, Aizip Mirai, and OpenInfer build at the very “edge” of what’s possible for on-device AI deployments in consumer devices. On-device innovations from model vendors such as Perceptron are also important for physical AI, and we expect more in this space, as we outlined in our thinking on intelligent robotics.

Edge AI is also critical for industries such as defense, where comms are jammed or denied; companies such as TurbineOne, Dominion Dynamics, Picogrid, and Breaker are leading the charge on providing the infrastructure tooling for warfighters to harness the power of AI even in the most austere environments.

5. World models

The model layer is one of the most dynamic and hotly contested layers within the AI infrastructure stack. While LLMs have taken over language intelligence, a new class of models — world models — has emerged to deliver intelligence for the physical world.

As AI moves from our screens to our physical realities, new challenges arise: how does an AI “brain” develop intuition for physics and the world if it has no “body”? World models offer a solution. At the core, these are AI systems trained on real-world data — video, sensors, GPS, and more — that learn to predict how the world evolves given a current situation and action. Rather than describing reality, they simulate it.
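At its simplest, a world model is a learned transition function: given the current state and an action, predict the next state. The following toy sketch (our own construction, far removed from production video-scale world models) learns the dynamics of a 1-D point mass purely from observed transitions, never being told the physics:

```python
import random

# Toy "world model": learn to predict the next state of a 1-D point mass
# from (position, velocity, action) transitions. True dynamics, unknown to
# the model: pos' = pos + 0.1*vel,  vel' = vel + 0.1*action.
random.seed(1)
DT = 0.1

def true_step(pos, vel, act):
    return pos + DT * vel, vel + DT * act

# Linear transition model next_state = W @ [pos, vel, act], fit by SGD.
W = [[random.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(2)]
lr = 0.01
for _ in range(20000):
    pos, vel, act = (random.uniform(-1, 1) for _ in range(3))
    x = [pos, vel, act]
    target = true_step(pos, vel, act)
    pred = [sum(W[i][j] * x[j] for j in range(3)) for i in range(2)]
    for i in range(2):
        err = pred[i] - target[i]
        for j in range(3):
            W[i][j] -= lr * err * x[j]

# The learned model can now "imagine" rollouts without touching the real world.
pred = [sum(W[i][j] * v for j, v in enumerate([0.5, 1.0, 0.0])) for i in range(2)]
```

Having absorbed the dynamics, the model predicts the next state of a coasting point (position 0.5, velocity 1.0, no action) without querying the environment, which is the essence of simulating reality rather than describing it.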

Out of this newer research, three broad architectural paradigms have emerged, and in practice companies are also beginning to explore hybrids that combine elements of each.

This commercial opportunity for world models is expansive. We recently shared our view of world models in robotics, as this sector has been among the most visible early applications. By generating unlimited synthetic training environments, world models solve the data scarcity problem that has bottlenecked physical AI for decades. Autonomous driving is proving this as Waymo and Wayve use world models to simulate rare edge cases that no real-world test program could economically replicate. The same core capability unlocks even more, such as high-stakes simulation in defense, healthcare, industrial operations, and enterprise planning.

World models are not a vertical-specific kind of tool — they’re a new substrate for machine intelligence, analogous to what LLMs did for text-based reasoning. The industries that build on top of them early will have a significant head start on deploying agents that work in the real world. We’re excited about companies building the architectures and simulators that make world models possible across industries.

Building infrastructure for AI to experience and enter the real world

While the first generation of AI infrastructure companies built the engines of intelligence — the models, compute clusters, and training pipelines that proved AI’s capability — the next generation must build the nervous system and harnesses that allow AI to sense, remember, adapt, and operate continuously in the real world. These frontiers represent more than incremental improvements to existing infrastructure. The companies building in these spaces aren’t just optimizing latency or reducing costs; they’re solving the fundamental challenges that separate impressive demos from reliable systems that create enduring value.

We believe 2026 will be the year when AI infrastructure’s center of gravity definitively shifts, reimagining what AI-native operations look like for this year and beyond.

This post and the information presented are intended for informational purposes only. The views expressed herein are the author’s alone and do not constitute an offer to sell, or a recommendation to purchase, or a solicitation of an offer to buy, any security, nor a recommendation for any investment product or service. While certain information contained herein has been obtained from sources believed to be reliable, neither the author nor any of her employers or their affiliates have independently verified this information, and its accuracy and completeness cannot be guaranteed. Accordingly, no representation or warranty, express or implied, is made as to, and no reliance should be placed on the fairness, accuracy, timeliness or completeness of this information. The author and all employers and their affiliated persons assume no liability for this information and no obligation to update the information or analysis contained herein in the future.
