Retrieval-Augmented Generation (RAG) and in-context learning have been among the most exciting developments in AI since about 2020. These techniques promised to revolutionize how enterprises and app developers leverage customer data, allowing them to tap into powerful models without retraining or fine-tuning. By simply “feeding” the model relevant data in the prompt, companies could instantly apply AI to their own data.
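
To make the pattern concrete, here is a minimal sketch of retrieve-then-prompt in Python. Everything in it is an illustrative stand-in: the word-overlap `score` retriever, the toy corpus, and the `build_prompt` helper (real systems use embedding models and a vector store). The point is only that relevant data is assembled into the prompt at query time, with no changes to the model itself.

```python
# Toy illustration of the RAG pattern: retrieve the most relevant
# documents for a query, then "feed" them to the model in the prompt.
# The scoring here is a deliberately simple word-overlap heuristic;
# production systems use embedding models and vector indexes.

def score(query: str, doc: str) -> int:
    """Count shared words between query and document (toy retriever)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k documents that best match the query."""
    return sorted(corpus, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Assemble a RAG prompt: retrieved context first, then the question."""
    context = "\n".join(retrieve(query, corpus))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

corpus = [
    "Refunds are processed within 5 business days.",
    "Our headquarters are in Seattle.",
    "Premium support is available 24/7 for enterprise customers.",
]
print(build_prompt("How long do refunds take?", corpus))
# The resulting prompt can be sent to any off-the-shelf LLM,
# with no retraining or fine-tuning required.
```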

It’s a breakthrough that has made getting started faster and easier for companies of all sizes. Today, the RAG pattern is at the center of AI strategies for enterprises, app developers, and startups alike.

But the future doesn’t stand still, especially in AI.

While RAG has captured the imagination (and budgets) of the industry, 2025 will reveal its limits. AI’s ability to reason about a customer’s data is only as good as the data the models were originally trained on. And here’s the catch: If your data doesn’t resemble the training set, even the most advanced off-the-shelf models fall short. The gap grows even wider as companies grapple with more diverse and rapidly changing data — and seek cost-effective, smaller models that sacrifice generality for speed and efficiency.

To truly unlock AI’s potential, businesses will need to build on RAG as a foundational tool while integrating specialized training, custom fine-tuning, and domain-specific optimization. RAG remains essential, and leaders like Unstructured.io are making this possible by transforming complex enterprise documents into high-quality data that these systems can understand and reason about.

The next evolution of enterprise AI will require combining RAG with other approaches to fully address the challenges of scale and complexity. And the shift is already underway.

Practitioners now have a broad spectrum of approaches for optimizing AI systems: pre-training builds the foundation on broad datasets, mid-training introduces specialized data during base model development, post-training applies techniques like reinforcement learning after base training is complete, fine-tuning adapts models to specific domains, and test-time compute enhances reasoning capabilities with longer inference cycles. Each approach offers different tradeoffs between generalization, specialization, resource requirements, and processing time.
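
As one illustration of the fine-tuning option, here is a sketch of launching a supervised fine-tuning job with the OpenAI Python SDK (not the reinforcement variant discussed later). The training file name is a placeholder, and the base model must be one OpenAI currently allows for fine-tuning; the JSONL file would hold domain-specific example conversations.

```python
# Sketch: adapting an off-the-shelf model to a specific domain via
# supervised fine-tuning (OpenAI Python SDK). The training file name is
# a placeholder; each JSONL line holds one example conversation, e.g.
# {"messages": [{"role": "user", "content": "..."},
#               {"role": "assistant", "content": "..."}]}
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload domain-specific training examples (hypothetical file name).
training_file = client.files.create(
    file=open("domain_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# Launch the fine-tuning job against a fine-tunable base model.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",
)
print(job.id, job.status)
```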

2025 will show that it’s not the end of history for the intersection of data and AI. This applies to enterprises and app developers alike. For example, when Mastercard wanted to build a GenAI digital assistant that uses RAG to reason about data in Mastercard’s own data schema, an off-the-shelf model didn’t understand that data well enough, so Mastercard chose to fine-tune a model to better understand its own data. Likewise, Glean and Read.ai are planning to tune custom models for each customer organization to maximize accuracy for those customers’ own experiences. Contextual AI, co-founded by one of the creators of RAG, is extending the traditional RAG architecture with fine-tuning and specialization, leading to what it calls specialized RAG agents. And Ello is just one example of a company that is not just tuning but training its own models to best fit the data and behaviors it cares most about.

For founders, here’s the good news:

First, as compute costs continue to fall and tools like OpenAI’s Reinforcement Fine-Tuning democratize advanced training techniques, sophisticated AI architectures are becoming accessible to a broader set of practitioners. The success of companies like Glean, Ello, and Read.ai shows that startups of all sizes can effectively train and tune their own models, especially when focused on specific domains.

Second, advances in test-time compute create a powerful flywheel effect. These techniques enhance model reasoning by spending more time on inference when deeper analysis is needed. This makes the returns from specialized training and domain optimization even more valuable – enhanced reasoning means better understanding of domain-specific data and contexts. As compute costs continue to fall, this virtuous cycle becomes increasingly practical for production deployments.
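One common form of test-time compute is self-consistency: sample several independent answers and keep the majority vote. The sketch below is a toy version, where `sample_answer` is a hypothetical stand-in for a real stochastic model call (temperature above zero).

```python
# Toy sketch of one test-time-compute technique: self-consistency.
# Instead of a single forward pass, spend more inference on n
# independent samples and return the most common answer.
import random
from collections import Counter

def sample_answer(question: str) -> str:
    """Placeholder for a stochastic LLM call (temperature > 0)."""
    return random.choice(["42", "42", "42", "41"])  # usually correct

def self_consistent_answer(question: str, n: int = 9) -> str:
    """Sample n answers and return the majority vote."""
    votes = Counter(sample_answer(question) for _ in range(n))
    return votes.most_common(1)[0][0]

print(self_consistent_answer("What is 6 * 7?"))  # almost always "42"
```

The tradeoff is exactly the one the paragraph above describes: more inference cycles buy more reliable reasoning, which becomes increasingly affordable as compute costs fall.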

Third, the shift toward smaller models creates its own reinforcing cycle. As companies choose these models for performance and cost reasons, more of their data naturally falls “out of domain” – smaller models simply can’t maintain the broad knowledge of their larger counterparts. This increases the returns from fine-tuning and specialization, making domain-specific optimization even more valuable.

The convergence of these trends means no single approach will dominate. Instead, we’re entering an era where RAG becomes one tool in a broader toolkit, combining specialized training, sophisticated retrieval, and test-time compute optimization. The companies that enable and capitalize on this shift – while deeply understanding how these approaches work together – will be the ones who best apply AI to their customers’ data, helping enterprises and app developers deliver for their customers and make the future happen faster.

Madrona is actively investing in AI+data architecture, from infrastructure to applications. We have backed multiple companies in this area and will continue to do so. We would love to meet you if you’re building in this space. You can reach out to us directly at: jonturow@madrona.com
