Open vs. Closed Source AI: The Monetizable Spread Argument

Markets · 1 source · Mar 26

industry-analysis economic-impact unit-economics ai-progress openai anthropic deepseek llama

Summary

• Open-source models now trail frontier closed models by only ~3 months, down from ~1 year in late 2024
• The author's new 'monetizable spread' concept implies markets are mispricing the erosion of the closed-source AI premium
• DeepSeek R1 matched OpenAI o1 at roughly 3% of the cost, illustrating the capability compression mechanism
• The 'good enough' line for open models moves up each quarter, eroding closed-model premium revenue

Details

#	Type	Key Point	Context
1	Insight	Author introduces 'monetizable spread' to reframe closed-source AI valuations	The author argues the raw capability spread (benchmark delta between best closed and best open model) is the wrong metric for investors. What matters is the 'monetizable spread': the subset of that delta customers will actually pay a premium for. The author contends it is declining faster than the headline capability spread, a divergence markets have not priced in.
2	Stat	MMLU benchmark gap closed from ~17.5 pp to near zero in roughly 2 years	In late 2023, the best closed model scored ~88% on MMLU while the best open model scored ~70.5%. By early 2026, that gap is effectively zero on knowledge benchmarks and single digits on most reasoning tasks.
3	Research	Epoch AI: open-weight models trail frontier by ~3 months, down from ~1 year in late 2024	The time lag between frontier closed models and the best open-weight equivalents compressed dramatically in roughly 18 months, implying continued rapid erosion of any capability-based moat for closed-source labs.
4	Stat	DeepSeek V3 trained on 2.6M GPU hours vs. 30.8M for Llama 3 405B, roughly 12x fewer	DeepSeek's R1 reasoning model, built on V3, matched OpenAI's o1 at roughly 3% of the cost, illustrating how open-source teams are reaching capability parity through efficiency breakthroughs rather than proportional compute spend.
5	Market Impact	The 'good enough' line, where open models become interchangeable with closed ones, moves up each quarter	Each time the 'good enough' threshold rises to cover another task tier, a slice of paying customers loses the economic justification for the closed-model premium. The monetizable spread is the span of task tiers between that threshold and the top of the task stack, weighted by the revenue density of each tier.
6	Stat	Anthropic's Economic Index: routine coding and math tasks account for 36% of API usage	A Menlo Ventures enterprise survey similarly found that code completion and productivity tools, not agentic systems, drove category revenue, supporting the thesis that most current AI revenue sits in task tiers already reachable by open models.
7	Context	Frontier closed models still lead on complex agentic coding and long-horizon reliability tasks	The author is not arguing open models have won overall. The capability spread at the top of the stack — complex multi-step tool chaining, long-horizon workflows — remains real. The argument is that the list of tasks where closed models meaningfully lead is shrinking each quarter.
8	Insight	Framework rests on unverified assumptions about revenue density by task complexity	No published dataset cleanly maps AI revenue to task difficulty. The revenue density claims at each tier are the author's assertion, not measurement. The author explicitly flags this caveat, noting proxy data is 'strongly directional' but not conclusive.

Insight = author's analytical argument; Stat = quantitative data point; Research = third-party finding; Market Impact = competitive landscape observation; Context = background or qualification
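
The 'monetizable spread' described in item 5 can be read as a weighted sum: the revenue density of every task tier that still sits above the 'good enough' line. The sketch below is one possible interpretation, not the author's own model. The tier names, difficulty ranks, and revenue shares are hypothetical placeholders chosen only to make the arithmetic visible; as item 8 notes, no published dataset provides real values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskTier:
    name: str
    difficulty: int        # ordinal rank in the task stack (higher = harder)
    revenue_share: float   # HYPOTHETICAL share of premium revenue in this tier


def monetizable_spread(tiers, good_enough_level):
    """Sum the revenue share of tiers strictly above the 'good enough' level.

    Tiers at or below the level are assumed to be served equally well by
    open models, so they contribute nothing to the closed-model premium.
    """
    return sum(t.revenue_share for t in tiers if t.difficulty > good_enough_level)


# Placeholder values for illustration only; these are NOT figures from the article.
TIERS = [
    TaskTier("low-complexity tier", 1, 0.5),
    TaskTier("mid-complexity tier", 2, 0.3),
    TaskTier("top-of-stack tier", 3, 0.2),
]

if __name__ == "__main__":
    # Raise the 'good enough' line one tier at a time and watch the spread shrink.
    for level in range(0, 4):
        print(level, round(monetizable_spread(TIERS, level), 2))
```

Under these placeholder shares, the spread falls faster than the count of remaining tiers whenever revenue is concentrated in the lower tiers, which is the divergence the author describes between the capability spread and the monetizable spread.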

What This Means

If the author's thesis holds, investors evaluating closed-source AI lab valuations should scrutinize not just benchmark leadership but revenue concentration in tasks where open models already suffice. The rapid compression of the capability lag — from one year to three months in roughly 18 months — suggests the window for premium pricing on mid-complexity tasks is narrowing faster than market consensus assumes. For AI practitioners, open-weight models are increasingly viable substitutes for the majority of production use cases, with the closed-model premium justified mainly at the frontier of agentic and long-horizon reliability tasks.
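
The headline figures in this summary can be re-derived with simple arithmetic. The short sketch below uses only numbers stated in the summary (MMLU scores, GPU hours, and the lag estimates); the variable names are illustrative, and the 18-month window is the summary's own rough figure, so the per-month rate is an average, not a forecast.

```python
# MMLU gap in late 2023, in percentage points (closed ~88%, open ~70.5%).
mmlu_closed = 88.0
mmlu_open = 70.5
gap_pp = mmlu_closed - mmlu_open

# Training compute: Llama 3 405B vs. DeepSeek V3, in GPU hours.
llama_gpu_hours = 30.8e6
deepseek_gpu_hours = 2.6e6
gpu_hour_ratio = llama_gpu_hours / deepseek_gpu_hours

# Open-weight lag behind the frontier: ~12 months down to ~3 months
# over roughly 18 months, expressed as months of lag closed per month.
lag_start_months = 12.0
lag_end_months = 3.0
elapsed_months = 18.0
lag_closed_per_month = (lag_start_months - lag_end_months) / elapsed_months

if __name__ == "__main__":
    print(f"MMLU gap: {gap_pp:.1f} pp")
    print(f"GPU-hour ratio: {gpu_hour_ratio:.1f}x")
    print(f"Lag closed per month: {lag_closed_per_month:.2f} months")
```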

Sources

Closed Source vs Open Source AI: A Cage Fight Few People Understand · Davefriedman
