DeepSeek’s R1 release over the past week sent shockwaves not just through the Silicon Valley ecosystem, but also through the public markets and among consumers worldwide (chart below). Everyone had a take, ranging from geopolitics to the small vs big model debate, to the open vs closed source debate, to the RL vs SFT debate…
As a VC focused on the infrastructure layer, I am unsurprisingly rooting my (obligatory) DeepSeek post in the AI infrastructure build-out debate. This has been an ongoing hot topic within the VC world, with the pre-DeepSeek discussion centered on issues such as estimating how much compute would ultimately be needed to support an AI-driven future, dealing with capacity constraints, and gaining access to infrastructure. Throughout the discourse, one tenet was held almost universally: compute was a highly limited resource that needed to be amassed through means such as ownership or proprietary access.
Driven by this philosophy, it’s no secret that Big Tech capex has reached record-breaking levels (chart below) as hyperscalers invest in AI infrastructure to capitalize on seemingly insatiable demand. Capex has remained a major topic on the earnings calls held over the past week.

If the absolute numbers weren’t eye-watering enough, Big Tech capex spending is also jaw-dropping on a relative basis, and could come with an opportunity cost as less investment is funneled into other areas such as M&A:

Model layer companies are not sitting on the sidelines either. Last week, coinciding with the DeepSeek news cycle, Project Stargate was announced: a massive joint venture between OpenAI, SoftBank, Oracle, and MGX to invest up to $500bn over the next four years in new AI infrastructure in the US. The motivation here is slightly distinct from that of the hyperscalers: for model layer companies, owning infrastructure could drive significant competitive advantage across growth, margin, and business resiliency vectors. Sam Altman has previously commented that compute constraints are limiting training potential, and it’s been speculated that prices for SOTA models (especially compute-heavy reasoning models) have to be kept high in order to artificially suppress demand, since there is not enough capacity to handle widespread usage.
While demand certainly outpaces supply today, the capex investment race has reached such staggering numbers that questions have been raised around the ultimate return on these capital outlays.

DeepSeek’s sudden entry has intensified the ongoing debate by calling into question recent AI infrastructure spending trends. Notably, DeepSeek’s reasoning model R1 was comparable in performance to OpenAI’s o1, and achieved this not only at substantially lower inference costs (chart above) but also at a fraction of the training cost, since DeepSeek-V3 (the base model for R1) was purportedly trained in roughly two months for just $5.6MM:

Debates have ensued over the actual cost of training, since the DeepSeek-V3 paper notes that “the aforementioned costs include only the official training of DeepSeek-V3, excluding the costs associated with prior research and ablation experiments on architectures, algorithms, or data”. Regardless of what the “real” total turns out to be, the shocking cost-differential headline has sparked widespread discussion around how much compute is actually required to develop cutting-edge models, and whether there has been overspending on AI infrastructure build-outs.
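For context on where the headline figure comes from: the DeepSeek-V3 paper derives it by multiplying reported H800 GPU-hours by an assumed $2/GPU-hour rental price. A rough reconstruction of that arithmetic is sketched below (the GPU-hour breakdown and rental rate are as reported in the paper; treat them as the paper’s own assumptions rather than independently verified figures):

```python
# Back-of-the-envelope reconstruction of DeepSeek-V3's reported training cost.
# GPU-hour figures and the $2/hour rental assumption are taken from the paper.
pretraining_gpu_hours = 2_664_000    # H800 GPU-hours for pre-training
context_ext_gpu_hours = 119_000      # context-length extension
post_training_gpu_hours = 5_000      # SFT / RL post-training
rental_price_per_gpu_hour = 2.00     # assumed H800 rental price (USD)

total_gpu_hours = (pretraining_gpu_hours
                   + context_ext_gpu_hours
                   + post_training_gpu_hours)
total_cost = total_gpu_hours * rental_price_per_gpu_hour
print(f"{total_gpu_hours:,} GPU-hours -> ${total_cost/1e6:.2f}M")
# ~2.79M GPU-hours -> ~$5.58M
```

As the excerpt above notes, this figure excludes prior research and ablation runs, which is precisely why the “real” all-in number remains debated.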
As I highlighted in my interview with LinkedIn News, the debate this has inspired around better inference and training cost management will be a wake-up call for the industry, and could create positive ripple effects as companies build upon efficiency innovations and begin focusing on margins rather than just sheer growth and/or performance. Researchers have already begun to leverage DeepSeek’s efficiency techniques in their own work, and the effects could compound in what may be a “Google moment” for the AI model layer.
If companies become more compute-efficient, does this mean that the impending investments in AI infrastructure are all for naught? Perhaps not, since some believe we may see the opposite effect. As Satya Nadella highlighted in response to the DeepSeek news, more efficiency and lower prices may promote increased adoption per Jevons paradox:

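To make the Jevons paradox argument concrete, here is a deliberately simplified back-of-the-envelope sketch. The 10x efficiency gain and the 25x demand response are purely hypothetical numbers chosen for illustration, not forecasts; the point is only that if demand grows faster than unit costs fall, total compute consumption rises.

```python
# Illustrative Jevons-paradox arithmetic (all numbers are hypothetical).
# If cost per query drops 10x but usage grows 25x in response,
# total compute consumption rises, not falls.
baseline_queries = 1_000_000        # queries served today (hypothetical)
baseline_cost_per_query = 0.10      # USD per query today (hypothetical)

efficiency_gain = 10                # assumed 10x cheaper inference
demand_multiplier = 25              # assumed demand response to lower prices

new_cost_per_query = baseline_cost_per_query / efficiency_gain
new_queries = baseline_queries * demand_multiplier

baseline_total = baseline_queries * baseline_cost_per_query   # $100,000
new_total = new_queries * new_cost_per_query                  # $250,000
print(baseline_total, new_total)  # total spend rises despite cheaper queries
```

Whether real-world demand elasticity actually exceeds the efficiency gains is, of course, exactly the open question on which the build-out debate hinges.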
Furthermore, expansion of TAM is just one vector that could drive a surge in compute demand. Another factor is the nature of the models being deployed. We are moving into a new paradigm of reasoning models that emphasize test-time compute, and such models tend to be compute-hungry at inference (a rough sketch of the arithmetic follows below). In fact, while a large share of today’s AI infrastructure capacity is currently channeled toward training use cases, the AI industry’s biggest bulls believe that the inference market could be much larger than expected, driven not just by production use cases but also by the popularity of inference-time scaling models. We may just be at the start of this shift, so instead of gaining clarity, I expect more complexity ahead as we grapple with the AI infrastructure build-out question in the coming months.
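As a closing illustration of why test-time scaling changes the inference math, here is a toy calculation. All token counts below are hypothetical; the only assumption being leaned on is that inference cost scales roughly with the number of tokens generated per request.

```python
# Hypothetical illustration: inference compute scales roughly with tokens
# generated, so a reasoning model that "thinks" at length before answering
# costs far more per request. All token counts are made up for illustration.
standard_output_tokens = 500        # typical chat completion (hypothetical)
reasoning_chain_tokens = 8_000      # chain-of-thought tokens (hypothetical)
reasoning_answer_tokens = 500       # final answer tokens (hypothetical)

standard_cost_units = standard_output_tokens
reasoning_cost_units = reasoning_chain_tokens + reasoning_answer_tokens

print(reasoning_cost_units / standard_cost_units)  # ~17x more tokens per request
```

Under these toy numbers, a single reasoning-style request consumes roughly 17x the generated tokens of a standard completion, which is the dynamic that could make the inference market far larger than training-era projections assumed.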