Happy Sunday and welcome to Investing in AI. Be sure to check out our AI in NYC podcast to hear the latest analysis on AI news and interesting applied AI startups. And also check out the newly launched Small Model Marketplace from Neurometric. 100+ models performing specific tasks – with no token costs, just a flat monthly fee.

Today I want to ask an important question every investor should be thinking about:

If inference is going to zero, where is the alpha?

That question is rattling around every AI-focused fund right now. And it’s the wrong question—because it assumes that a falling price means a shrinking market. History says the opposite. What we’re actually watching is the transition from the “Speculative Build” phase, where the entire investment thesis was “buy H100s at any cost and figure it out later,” to the “Utility Efficiency” phase, where the money shifts from betting on the best brain to betting on the best margin. That’s a fundamentally different game, and most investors haven’t updated their playbooks yet.

This piece is my attempt to lay out how the inference market matures, where the defensible positions are, and where capital should flow as AI stops being a science experiment and starts being an industrial utility.

The Economic Engine: Jevons Paradox and the Demand Explosion

In the 1860s, William Stanley Jevons observed that as steam engines became more fuel-efficient, total coal consumption increased. Cheaper energy didn’t reduce demand—it unlocked use cases that were previously uneconomical. The same dynamic is playing out in inference right now.

When inference is expensive, AI is a chatbot. It sits behind a text box, waits for a human to type something, and returns a response. That’s a tool. When inference is effectively free, AI becomes a background autonomous agent—monitoring, deciding, acting, continuously, without a human in the loop. That’s not a tool. That’s a workforce.

The surface area of economically viable AI use cases is expanding faster than prices are falling. Every 10x reduction in inference cost doesn’t just make existing applications cheaper; it makes entirely new categories of applications possible. Think about logistics optimization running inference on every package in a supply chain, or financial compliance checking every transaction in real time, or customer service agents that don’t just answer tickets but proactively resolve issues before the customer even notices.
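The Jevons argument above has a simple quantitative core: if demand for inference is price-elastic (elasticity greater than 1), total spend rises as the unit price falls. Here is a minimal sketch of that mechanism, using a constant-elasticity demand curve; the elasticity value and prices are illustrative assumptions, not market data.

```python
# Jevons-style demand sketch: quantity responds to price with elasticity eps.
# When eps > 1 (elastic demand), total spend RISES as the price falls.
# All numbers here are hypothetical, for illustration only.

def total_spend(price, eps, base_price=1.0, base_qty=1.0):
    """Constant-elasticity demand: q = base_qty * (price / base_price) ** -eps."""
    qty = base_qty * (price / base_price) ** -eps
    return price * qty

# Inference price falls 10x; assume an elasticity of 1.5 (hypothetical).
before = total_spend(price=1.0, eps=1.5)  # baseline spend
after = total_spend(price=0.1, eps=1.5)   # spend after a 10x price drop
```

Under these assumed numbers, a 10x price drop roughly triples total spend rather than shrinking it, which is the whole point: the market expands faster than the price collapses.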

For investors, this means the question isn’t “will the inference market shrink?” It’s “which segments of this exploding market will capture durable margin?” That’s where the real analysis begins.

Segment 1: The Frontier Labs (The “Intelligence” Layer)

OpenAI, Anthropic, Google DeepMind—these are the research-heavy, high-CAPEX players racing to build the most capable models on earth. Their business model is selling maximum reasoning at a premium.

The defensibility here is narrower than most people think. It comes down to the last 5% of intelligence—the marginal capability that separates a model that can draft a decent email from one that can navigate a complex legal strategy or contribute meaningfully to drug discovery. For high-stakes, high-value decisions where error rates have real consequences, frontier capability commands a genuine premium.

The second moat is data recency. Exclusive partnerships with platforms like Reddit and news organizations give frontier labs access to real-time “world data” that smaller players can’t replicate. When your model needs to reason about what happened yesterday, not just what’s in a static training corpus, that pipeline matters.

But here’s the investor warning: for general tasks—summarization, translation, basic Q&A, content generation—frontier models are dramatically over-provisioned. The churn risk is real. As open-source and distilled models close the gap on the 80th percentile of use cases, frontier labs face a strategic fork: dominate the premium tier or watch their volume business erode. Frontier or bust. There is no comfortable middle.

Segment 2: Action-as-a-Service (The “Task” Layer)

This is where I think the most interesting investment opportunity lives (hence my work on Neurometric), and it’s the segment most investors are still underweighting.

The thesis is simple: instead of selling raw model access (tokens in, tokens out), you host functional tasks. Bank reconciliation. Lead qualification. Medical coding review. Contract clause extraction. The customer doesn’t buy inference—they buy a completed unit of work, priced per task.

The economics are beautiful from an investor’s perspective, and here’s why. Revenue is fixed on a per-task basis. The customer pays $0.15 for a reconciled transaction or $2.00 for a qualified lead. But your cost to deliver that task is variable and declining. You’re running optimized small language models, fine-tuned specifically for that task, on depreciated or secondary-market hardware. Every quarter, your cost basis drops while your price holds. That’s expanding margin without raising prices—the PE investor’s dream.
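To make the margin-expansion dynamic concrete, here is a back-of-envelope model of the per-task economics described above. The task price comes from the text; the starting cost and quarterly decline rate are assumptions I've picked for illustration.

```python
# Per-task unit economics sketch: price is fixed per task, while the cost
# to deliver each task declines every quarter. The price is from the text;
# the starting cost and decline rate are hypothetical assumptions.

PRICE_PER_TASK = 0.15   # e.g. one reconciled bank transaction
cost = 0.06             # assumed starting cost to serve one task
DECLINE = 0.15          # assumed 15% cost reduction per quarter

margins = []
for quarter in range(8):
    margins.append((PRICE_PER_TASK - cost) / PRICE_PER_TASK)
    cost *= (1 - DECLINE)  # cheaper models + depreciated hardware each quarter

# Gross margin expands every quarter while the price never moves.
```

Under these assumed numbers, gross margin starts at 60% and climbs every quarter with no price increase — the structural advantage the segment is built on.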

This is SaaS 2.0. Traditional SaaS sells software that enables a human to work. Action-as-a-Service sells software that is the work. The human is removed from the loop entirely on the task itself, which means your unit economics aren’t constrained by labor costs anymore.

The defensibility comes from two places. First, vertical specialization. A sentiment analyzer tuned for medical malpractice litigation is a genuinely different product than a general sentiment model—different training data, different error tolerance, different regulatory context. That specificity creates switching costs. Second, and more importantly, data gravity. Once a task host has processed 100 million insurance claims, the feedback loops built into their fine-tuning process create an accuracy moat that no amount of general compute can overcome. The model gets better because it’s running in production, which means every new customer widens the gap.

There’s also a hardware arbitrage story here that matters. The NVIDIA A100/H100 secondary market is starting to develop real liquidity. Distilled, task-specific models don’t need the latest silicon—they need enough silicon, cheaply. Running a 7B parameter model fine-tuned for invoice processing on a used A100 is a fraction of the cost of running GPT-5 for the same task, and for that narrow task, the accuracy is comparable or better. That’s the kind of structural cost advantage that compounds over time.
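The hardware arbitrage is easy to sanity-check with a rough cost-per-task comparison. Every figure below — tokens per task, GPU hourly cost, throughput, API pricing — is an assumption for illustration, not a quoted price.

```python
# Back-of-envelope task-cost comparison: a small self-hosted model on
# secondary-market hardware vs. a frontier API. All figures are assumed.

TOKENS_PER_TASK = 2_000        # assumed tokens to process one invoice

# Self-hosted 7B model on a used A100 (assumed figures):
GPU_COST_PER_HOUR = 1.00       # amortized hardware + power
TOKENS_PER_SECOND = 1_500      # batched small-model throughput
self_hosted = TOKENS_PER_TASK / (TOKENS_PER_SECOND * 3600) * GPU_COST_PER_HOUR

# Frontier API at an assumed blended rate of $5 per million tokens:
frontier = TOKENS_PER_TASK / 1_000_000 * 5.00

print(f"self-hosted: ${self_hosted:.6f}/task, frontier: ${frontier:.6f}/task")
```

Under these assumptions the self-hosted path comes in well over an order of magnitude cheaper per task, and the gap widens as secondary-market hardware prices fall.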

From a PE perspective, this is the most investable segment in the inference stack. Predictable unit economics, high switching costs, expanding margins, and a clear path to profitability without requiring frontier-scale R&D budgets.

Segment 3: The Hyperscalers and Sovereign Clouds (The “Utility” Layer)

Azure, AWS, GCP, and increasingly regional government-backed clouds—these are playing a different game entirely. Their moat isn’t intelligence. It’s infrastructure.

Two factors make this segment durable. The first is the power wall. Building and operating data centers at scale requires control over energy and physical land that can’t be replicated quickly. Permitting alone takes years. The second is sovereignty. Domestic hosting mandates are proliferating across the EU, Middle East, and Southeast Asia. When a government says “this data doesn’t leave our borders,” the hyperscaler with local presence wins by default.

For investors, this is a safe yield play. These are the utility stocks of the 21st century—not exciting, not high-growth, but essential infrastructure with regulatory moats. You’re buying the pipes, not the water.

Segment 4: Edge Inference (The “Distributed” Layer)

The final segment worth watching is the move to push compute onto local devices—phones, vehicles, factory sensors, wearables.

The drivers are straightforward: latency (some decisions can’t wait for a round trip to the cloud), privacy (some data shouldn’t leave the device), and cost (using the customer’s electricity instead of the provider’s). As models get smaller and more efficient, the range of tasks that can run on-device keeps expanding.

Defensibility here lives in hardware-software co-optimization. If your model is the only one that runs efficiently on an Apple Watch’s neural engine or a Tesla FSD chip, you have a moat that’s defined by silicon architecture, not just model quality. The second advantage is data proximity—being “on the wire” where data is generated means you can act on information before it ever hits the cloud.

This is still early, but the trajectory is clear. Edge inference will absorb a significant share of high-frequency, low-complexity tasks over the next five years.

Why Consolidation Is Not Guaranteed

The instinct from previous tech cycles is to assume consolidation—that inference will follow the storage market’s path toward a handful of commodity providers. I think that’s wrong, and the reason is qualitative.

Storage consolidated because bits are bits. A byte on a Seagate drive is identical to a byte on a Western Digital drive. Storage is horizontal—the product is undifferentiated. But AI reasoning is opinionated. The “logic” required for a lawyer reviewing contracts is fundamentally different from the “logic” required for a chemist analyzing molecular interactions. Inference is vertical by nature, which means it fragments rather than consolidates.

Regulatory dynamics reinforce this. Antitrust scrutiny and data residency laws act as a firewall against total market capture by one or two players. The inference market is going to look more like specialized manufacturing than commodity cloud storage.

The Playbook: Where to Allocate Capital

Go long on frontier labs, but only if post-IPO valuations are reasonable.

Go long on Action-as-a-Service providers with high task stickiness, expanding data gravity, and low-cost hardware strategies. These are the businesses with the clearest path to durable, expanding margins.

Go long on regional cloud providers in sovereignty-hungry nations. The regulatory tailwind here is strong and accelerating.

Watch edge AI, and go long once it starts to take off. It isn’t there yet, but it will be.

Be cautious on mid-tier model providers who lack the scale of a hyperscaler and the accuracy of a frontier lab. This is the “death zone” of the inference market—too expensive to compete on cost, too generic to compete on quality.

Summary

The maturation of the inference market isn’t about the death of margin. It’s about the birth of specialization. The winners won’t be the companies that “solved AI” in some grand, general sense. They’ll be the ones that industrialized it—turning intelligence into reliable, cheap, invisible units of work that run millions of times a day without anyone thinking about it.

That’s not a dystopia for investors. That’s the best kind of market: one where the product disappears into the infrastructure of everyday business, and the meter never stops running.

Thanks for reading.
