Open vs. Closed Source AI: The Monetizable Spread Argument
Summary
- Open-source models now trail frontier closed models by only ~3 months, down from ~1 year in late 2024
- The author's 'monetizable spread' concept holds that markets are mispricing the erosion of the closed-source AI premium
- DeepSeek R1 matched OpenAI o1 at roughly 3% of the cost, illustrating the capability compression mechanism
- The 'good enough' line for open models moves up each quarter, eroding closed-model premium revenue
Details
Author introduces 'monetizable spread' to reframe closed-source AI valuations
The author argues the raw capability spread (benchmark delta between best closed and best open model) is the wrong metric for investors. What matters is the 'monetizable spread': the subset of that delta customers will actually pay a premium for. The monetizable spread is declining faster than the headline capability spread, a divergence the author contends markets have not priced.
MMLU benchmark gap closed from ~17.5 pp to near zero in roughly 2 years
In late 2023, the best closed model scored ~88% on MMLU while the best open model scored ~70.5%, a gap of ~17.5 percentage points. By early 2026, the gap is effectively zero on knowledge benchmarks and in the single digits on most reasoning tasks.
Epoch AI: open-weight models trail frontier by ~3 months, down from ~1 year in late 2024
The time lag between frontier closed models and the best open-weight equivalents compressed dramatically in roughly 18 months, implying continued rapid erosion of any capability-based moat for closed-source labs.
DeepSeek V3 trained in 2.6M GPU hours vs. 30.8M for Llama 3 405B, roughly a 12x difference
DeepSeek's R1 reasoning model, built on V3, matched OpenAI's o1 at roughly 3% of the cost — illustrating how open-source teams are achieving capability parity through efficiency breakthroughs rather than proportional compute spend.
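The ~12x figure in the headline above follows directly from the two GPU-hour numbers quoted. A minimal sketch of that arithmetic is below; the constant and function names are illustrative, and the comparison treats GPU hours as interchangeable across hardware, which is a simplifying assumption.

```python
# GPU-hour figures as quoted in the text above, in millions.
DEEPSEEK_V3_GPU_HOURS_M = 2.6
LLAMA3_405B_GPU_HOURS_M = 30.8


def compute_ratio(baseline: float, efficient: float) -> float:
    """Return how many times more compute the baseline run used."""
    if efficient <= 0:
        raise ValueError("efficient must be positive")
    return baseline / efficient


# Assumption: GPU hours on different hardware are treated as comparable.
ratio = compute_ratio(LLAMA3_405B_GPU_HOURS_M, DEEPSEEK_V3_GPU_HOURS_M)
print(f"Llama 3 405B used about {ratio:.1f}x the GPU hours of DeepSeek V3")
```

The ratio comes out just under 12, consistent with the rounded figure in the headline.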
The 'good enough' line — where open models become interchangeable with closed ones — moves up each quarter
Each time the 'good enough' threshold rises to cover another task tier, a slice of paying customers loses the economic justification for paying the closed-model premium. The monetizable spread is the gap between that threshold and the top of the task stack, weighted by the revenue density of each layer in between.
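One way to read this definition is as a sum of revenue over the task tiers that still sit above the 'good enough' line. The sketch below is a minimal illustration of that reading, not the author's model: the tier names, difficulty levels, and revenue shares are all hypothetical placeholders, since the article notes that no published dataset maps revenue to task difficulty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskTier:
    name: str             # hypothetical tier label
    difficulty: int       # position in the task stack (higher = harder)
    revenue_share: float  # hypothetical fraction of closed-model revenue


def monetizable_spread(tiers, good_enough_level):
    """Sum the revenue share of tiers still above the 'good enough' line."""
    return sum(t.revenue_share for t in tiers if t.difficulty > good_enough_level)


# Illustrative numbers only; these are NOT figures from the article.
TIERS = [
    TaskTier("routine", 1, 0.40),
    TaskTier("mid-complexity", 2, 0.35),
    TaskTier("agentic / long-horizon", 3, 0.25),
]

# Raise the threshold one tier at a time and watch the spread shrink.
for level in range(0, 4):
    spread = monetizable_spread(TIERS, level)
    print(f"good-enough level {level}: monetizable spread = {spread:.2f}")
```

Under this reading, the headline capability gap can stay positive (the top tier is still out of reach for open models) while the monetizable spread falls each time the threshold absorbs a revenue-dense tier.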
Anthropic's Economic Index: routine coding and math tasks = 36% of API usage
A Menlo Ventures enterprise survey similarly found that code completion and productivity tools, not agentic systems, drove category revenue, supporting the thesis that most current AI revenue sits in task tiers already reachable by open models.
Frontier closed models still lead on complex agentic coding and long-horizon reliability tasks
The author is not arguing open models have won overall. The capability spread at the top of the stack — complex multi-step tool chaining, long-horizon workflows — remains real. The argument is that the list of tasks where closed models meaningfully lead is shrinking each quarter.
Framework rests on unverified assumptions about revenue density by task complexity
No published dataset cleanly maps AI revenue to task difficulty. The revenue density claims at each tier are the author's assertions, not measurements. The author explicitly flags this caveat, noting that the proxy data is 'strongly directional' but not conclusive.
What This Means
If the author's thesis holds, investors evaluating closed-source AI lab valuations should scrutinize not just benchmark leadership but revenue concentration in tasks where open models already suffice. The rapid compression of the capability lag — from one year to three months in roughly 18 months — suggests the window for premium pricing on mid-complexity tasks is narrowing faster than market consensus assumes. For AI practitioners, open-weight models are increasingly viable substitutes for the majority of production use cases, with the closed-model premium justified mainly at the frontier of agentic and long-horizon reliability tasks.
