Layer Duplication in Qwen2-72B Topped Open LLM Leaderboard on Consumer GPUs
Summary
- Duplicating 7 middle layers of Qwen2-72B with no weight changes achieved #1 on the Open LLM Leaderboard
- The technique requires zero retraining and was developed on two consumer RTX 4090 GPUs
- Only circuit-sized blocks of ~7 layers produce gains; single layers or wrong counts yield nothing
- The top 4 Open LLM Leaderboard models as of 2026 are still descendants of this method
Details
Duplicating ~7 middle layers of Qwen2-72B achieved #1 on the HuggingFace Open LLM Leaderboard
No weights were modified; a contiguous block of layers was simply copied and re-inserted into the stack. The improvement was consistent across all benchmarks on the leaderboard, not just a single metric, suggesting a genuine capability gain rather than benchmark overfitting.
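The copy-and-insert step can be sketched in a few lines. This is a minimal illustration, not the author's released code: the function name, block indices, and toy 12-layer stack are all hypothetical, and a plain Python list stands in for the model's decoder stack so the sketch runs without PyTorch or the 72B checkpoint.

```python
import copy

def duplicate_layer_block(layers, start, count):
    """Return a new stack with layers[start:start+count] deep-copied and
    re-inserted immediately after the original block.

    `layers` stands in for a transformer decoder stack (e.g. the decoder
    layers of Qwen2-72B); no weights are modified, only copied.
    """
    block = [copy.deepcopy(layer) for layer in layers[start:start + count]]
    return layers[:start + count] + block + layers[start + count:]

# Toy 12-layer stack; each dict stands in for one decoder layer's weights.
stack = [{"idx": i} for i in range(12)]
expanded = duplicate_layer_block(stack, start=4, count=3)

print(len(expanded))                        # 15: original 12 plus 3 copies
print([l["idx"] for l in expanded[4:10]])   # [4, 5, 6, 4, 5, 6]
```

Applied to a real checkpoint, the same operation would target the model's layer list (and the config's layer count would need to be updated to match before saving). Because no gradients or weight updates are involved, the method needs no training compute, only enough memory to load and re-save the model.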
Only circuit-sized blocks of ~7 layers produce gains; too few or too many degrades performance
Single-layer duplication does nothing. Small sub-7 blocks do nothing. Blocks larger than the optimal range degrade performance. This threshold behavior implies the model's middle layers encode discrete functional circuits — learned during pretraining — that only work when preserved as a complete unit. Splitting or over-replicating the circuit breaks it.
Layer duplication is a zero-cost, no-retraining technique achievable on consumer hardware
The entire method was developed on two consumer NVIDIA RTX 4090 GPUs (approximately $3,000). No fine-tuning, no new training data, and no weight updates are required — making this replicable by individual researchers outside well-funded labs.
Top 4 Open LLM Leaderboard models as of 2026 are descendants of this approach
The technique was adopted or replicated across multiple model development efforts and has had measurable influence on the competitive open-weight LLM landscape, not just as a one-off curiosity.
Researcher has since moved to a dual GH200 rig; code and new models forthcoming
The original work was done on basement consumer hardware. The researcher is now running newer models (GLM-4.7, Qwen3.5, MiniMax M2.5) on a dual GH200 setup, with code and new models to be released. The HN post received 358 upvotes and 94 comments.
What This Means
A solo researcher discovered that copying a specific block of roughly 7 middle layers in a large open-weight language model — with no retraining whatsoever — reliably improves benchmark performance, and used this to take the top spot on the HuggingFace Open LLM Leaderboard. The result challenges the assumption that frontier model improvement requires massive compute or proprietary training runs. More broadly, the finding points to a structural property of transformer models: pretraining appears to produce discrete functional circuits confined to specific layer ranges, and those circuits can be amplified by duplication. The downstream influence on the leaderboard's top models suggests this is now a known lever in open-weight model development. Note: claims are based on a single researcher's post and have not yet been independently peer-reviewed.
Sentiment
Limited but positive discussion among AI practitioners, impressed by the solo hack
“Excellent research and write up by @dnhkng whose RYS model dominated the @huggingface leaderboard for several months.”
“Someone secured a top spot on the HuggingFace LLM leaderboard by duplicating the right seven layers on two gaming GPUs at home. Sometimes, leverage beats scale.”
Split
Uniformly positive; no significant disagreement in available X reactions (~100% supportive).
