Layer Duplication in Qwen2-72B Topped Open LLM Leaderboard on Consumer GPUs
Summary
- Duplicating 7 middle layers of Qwen2-72B with no weight changes achieved #1 on the Open LLM Leaderboard
- The technique requires zero retraining and was developed on two consumer RTX 4090 GPUs
- Only circuit-sized blocks of ~7 layers produce gains; single layers or wrong counts yield nothing
- The top 4 Open LLM Leaderboard models as of 2026 are still descendants of this method
Details
Duplicating ~7 middle layers of Qwen2-72B achieved #1 on the HuggingFace Open LLM Leaderboard
No weights were modified; a contiguous block of layers was simply copied and re-inserted into the stack. The improvement was consistent across all benchmarks on the leaderboard, not just a single metric, suggesting a genuine capability gain rather than benchmark overfitting.
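The copy-and-insert step can be sketched in a few lines. This is a minimal illustration, not the author's released code: the function name, block indices, and toy 12-layer stack are all hypothetical, and a plain Python list stands in for the model's decoder stack so the sketch runs without PyTorch or the 72B checkpoint.

```python
import copy

def duplicate_layer_block(layers, start, count):
    """Return a new stack with layers[start:start+count] deep-copied and
    re-inserted immediately after the original block.

    `layers` stands in for a transformer decoder stack (e.g. the decoder
    layers of Qwen2-72B); no weights are modified, only copied.
    """
    block = [copy.deepcopy(layer) for layer in layers[start:start + count]]
    return layers[:start + count] + block + layers[start + count:]

# Toy 12-layer stack; each dict stands in for one decoder layer's weights.
stack = [{"idx": i} for i in range(12)]
expanded = duplicate_layer_block(stack, start=4, count=3)

print(len(expanded))                        # 15: original 12 plus 3 copies
print([l["idx"] for l in expanded[4:10]])   # [4, 5, 6, 4, 5, 6]
```

Applied to a real checkpoint, the same operation would target the model's layer list (and the config's layer count would need to be updated to match before saving). Because no gradients or weight updates are involved, the method needs no training compute, only enough memory to load and re-save the model.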
Only circuit-sized blocks of ~7 layers produce gains; too few or too many degrades performance
Single-layer duplication does nothing. Small sub-7 blocks do nothing. Blocks larger than the optimal range degrade performance. This threshold behavior implies the model's middle layers encode discrete functional circuits — learned during pretraining — that only work when preserved as a complete unit. Splitting or over-replicating the circuit breaks it.
Layer duplication is a zero-cost, no-retraining technique achievable on consumer hardware
The entire method was developed on two consumer NVIDIA RTX 4090 GPUs (approximately $3,000). No fine-tuning, no new training data, and no weight updates are required — making this replicable by individual researchers outside well-funded labs.
Top 4 Open LLM Leaderboard models as of 2026 are descendants of this approach
The technique was adopted or replicated across multiple model development efforts and has had measurable influence on the competitive open-weight LLM landscape, not just as a one-off curiosity.
Researcher has since moved to a dual GH200 rig; code and new models forthcoming
The original work was done on basement consumer hardware. The researcher is now running newer models (GLM-4.7, Qwen3.5, MiniMax M2.5) on a dual GH200 setup, with code and new models to be released. The HN post received 358 upvotes and 94 comments.
What This Means
A solo researcher discovered that copying a specific block of roughly 7 middle layers in a large open-weight language model — with no retraining whatsoever — reliably improves benchmark performance, and used this to take the top spot on the HuggingFace Open LLM Leaderboard. The result challenges the assumption that frontier model improvement requires massive compute or proprietary training runs. More broadly, the finding points to a structural property of transformer models: pretraining appears to produce discrete functional circuits confined to specific layer ranges, and those circuits can be amplified by duplication. The downstream influence on the leaderboard's top models suggests this is now a known lever in open-weight model development. Note: claims are based on a single researcher's post and have not yet been independently peer-reviewed.
Sentiment
Limited but positive discussion among AI practitioners, impressed by the solo hack
“Excellent research and write up by @dnhkng whose RYS model dominated the @huggingface leaderboard for several months.”
“Someone secured a top spot on the HuggingFace LLM leaderboard by duplicating the right seven layers on two gaming GPUs at home. Sometimes, leverage beats scale.”
Split
Uniformly positive; no significant disagreement in available X reactions (~100% supportive).
