
Layer Duplication in Qwen2-72B Topped Open LLM Leaderboard on Consumer GPUs

Research · Top News · 1 source · Mar 13

Summary

  • Duplicating ~7 middle layers of Qwen2-72B, with no weight changes, achieved #1 on the Open LLM Leaderboard
  • The technique requires zero retraining and was developed on two consumer RTX 4090 GPUs
  • Only circuit-sized blocks of ~7 layers produce gains — single layers yield nothing, and the wrong block size degrades performance
  • The top 4 Open LLM Leaderboard models as of 2026 are still descendants of this method

Details

1. Research

Duplicating ~7 middle layers of Qwen2-72B achieved #1 on HuggingFace Open LLM Leaderboard

No weights were modified — only the layer block was copied and inserted. The improvement was consistent across all benchmarks on the leaderboard, not just a single metric, suggesting a genuine capability gain rather than benchmark overfitting.

2. Insight

Only circuit-sized blocks of ~7 layers produce gains; too few or too many layers degrade performance

Single-layer duplication does nothing, and smaller sub-7 blocks do nothing either, while blocks larger than the optimal range degrade performance. This threshold behavior implies the model's middle layers encode discrete functional circuits — learned during pretraining — that only work when preserved as a complete unit. Splitting or over-replicating the circuit breaks it.

3. Tech Info

Layer duplication is a zero-cost, no-retraining technique achievable on consumer hardware

The entire method was developed on two consumer NVIDIA RTX 4090 GPUs (approximately $3,000). No fine-tuning, no new training data, and no weight updates are required — making this replicable by individual researchers outside well-funded labs.
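The mechanics described above are simple enough to sketch. Below is a minimal, framework-agnostic illustration of layer-block duplication: deep-copy a contiguous block of layers and re-insert the copies immediately after the original, touching no weights. The function name and the `start`/`count` values are placeholders for illustration; the original work reports ~7 middle layers, but the exact indices are not given here.

```python
import copy

def duplicate_layer_block(layers, start, count):
    """Return a new layer list with layers[start:start+count] deep-copied
    and re-inserted immediately after the original block.
    No weights are modified: each copy is identical to its original."""
    block = [copy.deepcopy(layer) for layer in layers[start:start + count]]
    return layers[:start + count] + block + layers[start + count:]

# Hypothetical usage with a HuggingFace transformers decoder model
# (attribute paths vary by architecture; indices chosen for illustration):
#   new = duplicate_layer_block(list(model.model.layers), start=35, count=7)
#   model.model.layers = torch.nn.ModuleList(new)
#   model.config.num_hidden_layers = len(new)
```

Because the copied block is inserted after the original, the duplicated circuit runs twice in sequence during the forward pass, which is consistent with the "amplification" framing in the write-up.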

4. Market Impact

Top 4 Open LLM Leaderboard models as of 2026 are descendants of this approach

The technique was adopted or replicated across multiple model development efforts and has had measurable influence on the competitive open-weight LLM landscape, not just as a one-off curiosity.

5. Context

Researcher has since moved to a dual GH200 rig; code and new models forthcoming

Original work was done on basement consumer hardware. The researcher is now running newer models — GLM-4.7, Qwen3.5, MiniMax M2.5 — on a dual GH200 setup, with code and new models to be released. HN post received 358 upvotes and 94 comments.

Research = novel finding; Insight = mechanistic interpretation; Tech Info = methodology detail; Market Impact = competitive effect; Context = background

What This Means

A solo researcher discovered that copying a specific block of roughly 7 middle layers in a large open-weight language model — with no retraining whatsoever — reliably improves benchmark performance, and used this to take the top spot on the HuggingFace Open LLM Leaderboard. The result challenges the assumption that frontier model improvement requires massive compute or proprietary training runs. More broadly, the finding points to a structural property of transformer models: pretraining appears to produce discrete functional circuits confined to specific layer ranges, and those circuits can be amplified by duplication. The downstream influence on the leaderboard's top models suggests this is now a known lever in open-weight model development. Note: claims are based on a single researcher's post and have not yet been independently peer-reviewed.

Sentiment

Limited but positive discussion among AI practitioners, impressed by the solo hack

@QuixiAI (Eric Hartford) · Independent AI model developer (Dolphin, Samantha)
Impressed

Excellent research and write up by @dnhkng whose RYS model dominated the @huggingface leaderboard for several months.

@stamigos (Vitalii-Alan B) · Software Engineer & Indie Maker
Supportive

Someone secured a top spot on the HuggingFace LLM leaderboard by duplicating the right seven layers on two gaming GPUs at home. Sometimes, leverage beats scale.

Split

Uniformly positive; no significant disagreement in available X reactions (~100% supportive).
