LLM Relayering Technique "RYS" Generalizes Across Models, Hints at Universal Thinking Space
Summary
- RYS layer-duplication boosts LLM benchmarks with no training or weight changes
- Technique generalizes from Qwen2-72B to modern Qwen3.5-27B and other open models
- Cross-lingual probes confirm LLM middle layers reason in a universal topic-focused space
- Scanning code and new RYS model variants released publicly for community use
Details
RYS: duplicate middle layers, no training required
Seven consecutive middle layers of Qwen2-72B were duplicated, with no weight changes or fine-tuning; this alone produced the #1-ranked open model on the HuggingFace Open LLM Leaderboard in mid-2024. The configuration was discovered using hard math probes and EQ-Bench on two RTX 4090s.
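A minimal sketch of the duplication operation, assuming the decoder layers live in an ordered list (as in typical HuggingFace model stacks); the indices below are illustrative, not the published RYS configuration:

```python
def rys_relayer(layers, start, count):
    """Duplicate `count` consecutive layers beginning at index `start`,
    with no weight changes: each duplicated layer appears twice in a row
    in the forward pass, sharing the same weights."""
    span = layers[start:start + count]
    return layers[:start + count] + span + layers[start + count:]

# Toy example: a 10-"layer" stack with layers 3 and 4 duplicated.
print(rys_relayer(list(range(10)), start=3, count=2))
# → [0, 1, 2, 3, 4, 3, 4, 5, 6, 7, 8, 9]
```

On a real model, the same operation would be applied to the model's decoder-layer list before inference; for an 80-layer model like Qwen2-72B, duplicating 7 layers yields an 87-layer stack.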
Generalizes to Qwen3.5-27B and modern models
Follow-up experiments on Qwen3.5-27B (released around Chinese New Year 2026) confirm that relayering remains effective even in more compact models with more entangled functional anatomy, ruling out the possibility that RYS was a Qwen2-72B fluke.
Three-phase LLM structure confirmed directly
Evan Maunder's cross-encoding experiment (English, Mandarin, Base64) showed the cosine similarity of hidden states rapidly converging in early layers (encoding), holding near-perfect through the middle (format-agnostic reasoning), then diverging in the final layers (decoding to surface form).
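The measurement itself is simple to sketch with numpy; the hidden states below are synthetic, so the values only illustrate the computation, while in a real probe `states_a` and `states_b` would be the per-layer hidden states of the same prompt under two encodings:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two hidden-state vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def layerwise_similarity(states_a, states_b):
    """Per-layer cosine similarity between two runs of the same model,
    e.g. the same question asked in English vs. Base64."""
    return [cosine(a, b) for a, b in zip(states_a, states_b)]

# Synthetic illustration: identical "middle-layer" states, then a
# divergent final layer, mimicking the converge-then-diverge pattern.
rng = np.random.default_rng(0)
shared = rng.normal(size=(3, 16))
states_a = np.vstack([shared, rng.normal(size=(1, 16))])
states_b = np.vstack([shared, rng.normal(size=(1, 16))])
sims = layerwise_similarity(states_a, states_b)
```

The experimental signature is then just the shape of `sims` across depth: near 1.0 through the shared middle, dropping at the ends.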
Universal thinking space across 8 languages
Extended to 8 languages × 8 topics, the probe shows that same-topic, different-language pairs are more similar in the middle layers than same-language, different-topic pairs: the strongest direct empirical evidence yet that LLMs reason about meaning rather than surface form.
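The pairing logic behind that comparison can be sketched as follows; the synthetic states are a stand-in (each vector is a topic direction plus a small language-specific perturbation, mimicking a topic-dominated middle layer), not the study's data:

```python
import itertools
import numpy as np

def cos(u, v):
    """Cosine similarity."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def pair_means(states):
    """states maps (language, topic) -> a middle-layer hidden-state vector.
    Returns (mean similarity of same-topic/different-language pairs,
             mean similarity of same-language/different-topic pairs)."""
    same_topic, same_lang = [], []
    for ((l1, t1), v1), ((l2, t2), v2) in itertools.combinations(states.items(), 2):
        if t1 == t2 and l1 != l2:
            same_topic.append(cos(v1, v2))
        elif l1 == l2 and t1 != t2:
            same_lang.append(cos(v1, v2))
    return np.mean(same_topic), np.mean(same_lang)

# Synthetic 2-language x 2-topic grid with topic-dominated geometry.
e = np.eye(4)
states = {(lang, topic): e[topic] + 0.1 * e[2 + lang]
          for lang in range(2) for topic in range(2)}
topic_sim, lang_sim = pair_means(states)
```

The reported finding corresponds to `topic_sim > lang_sim` holding in the real middle-layer states across all 8 × 8 cells.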
3,024 candidates; 2M surrogate-scored configs
Rigorous methodology: a beam search over 3,024 relayering candidates, a surrogate model trained on those results and used to score 2 million possible configurations, followed by a unified validation sweep.
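The post summarizes this pipeline only at a high level; the following is a schematic sketch of surrogate-scored configuration search, where `toy_surrogate` is a hypothetical stand-in for the author's trained surrogate model and the candidate enumeration is illustrative:

```python
def enumerate_configs(n_layers, max_span):
    """All (start, count) single-span relayering candidates for a model
    with n_layers decoder layers."""
    return [(s, c) for c in range(1, max_span + 1)
                   for s in range(n_layers - c + 1)]

def surrogate_search(configs, surrogate, top_k=10):
    """Score every candidate with a cheap surrogate and keep the best
    top_k for expensive benchmark validation."""
    return sorted(configs, key=surrogate, reverse=True)[:top_k]

# Hypothetical surrogate: prefers mid-depth spans of moderate length,
# loosely matching the "duplicate middle layers" finding.
def toy_surrogate(cfg, n_layers=80):
    start, count = cfg
    center = start + count / 2
    return -abs(center - n_layers / 2) - abs(count - 7)

best = surrogate_search(enumerate_configs(80, 12), toy_surrogate, top_k=5)
```

The appeal of the surrogate step is cost: scoring 2 million configurations with a cheap learned model is tractable where 2 million benchmark runs are not, and only the surviving shortlist needs real evaluation.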
Code and RYS variants released publicly
Author released scanning code and new RYS model variants openly, enabling researchers with consumer-grade GPUs to replicate and extend findings across other model families.
Findings from LLM Neuroanatomy II research post, March 2026
What This Means
For practitioners, RYS is a zero-cost technique (no gradient updates, no retraining) that can meaningfully boost benchmark scores and now appears to be a general property of transformer architectures. For researchers, the cross-lingual cosine-similarity evidence is the most direct empirical support yet for a language-agnostic internal reasoning space in LLMs, with implications for interpretability, multilingual transfer, and our fundamental understanding of how these models process meaning.
