LLM Relayering Technique "RYS" Generalizes Across Models, Hints at Universal Thinking Space
Summary
- RYS layer-duplication boosts LLM benchmarks with no training or weight changes
- Technique generalizes from Qwen2-72B to modern Qwen3.5-27B and other open models
- Cross-lingual probes confirm LLM middle layers reason in a universal topic-focused space
- Scanning code and new RYS model variants released publicly for community use
Details
RYS: duplicate middle layers, no training required
Seven consecutive middle layers of Qwen2-72B were duplicated, with no weight changes or fine-tuning; this alone produced the #1-ranked open model on the HuggingFace Open LLM Leaderboard in mid-2024. The configuration was discovered using hard math probes and EQ-Bench on two RTX 4090s.
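A minimal sketch of the duplication operation, assuming the decoder layers live in an ordered list (as in typical HuggingFace model stacks); the indices below are illustrative, not the published RYS configuration:

```python
def rys_relayer(layers, start, count):
    """Duplicate `count` consecutive layers beginning at index `start`,
    with no weight changes: each duplicated layer appears twice in a row
    in the forward pass, sharing the same weights."""
    span = layers[start:start + count]
    return layers[:start + count] + span + layers[start + count:]

# Toy example: a 10-"layer" stack with layers 3 and 4 duplicated.
print(rys_relayer(list(range(10)), start=3, count=2))
# → [0, 1, 2, 3, 4, 3, 4, 5, 6, 7, 8, 9]
```

On a real model, the same operation would be applied to the model's decoder-layer list before inference; for an 80-layer model like Qwen2-72B, duplicating 7 layers yields an 87-layer stack.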
Generalizes to Qwen3.5-27B and modern models
Follow-up experiments on Qwen3.5-27B (released around Chinese New Year 2026) confirm that relayering remains effective even in more compact models with more entangled functional anatomy, ruling out the possibility that RYS was a Qwen2-72B fluke.
Three-phase LLM structure confirmed directly
Evan Maunder's cross-encoding experiment (English, Mandarin, Base64) showed the cosine similarity of hidden states rapidly converging in early layers (encoding), holding near-perfect through the middle (format-agnostic reasoning), then diverging in the final layers (decoding to surface form).
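The measurement itself is simple to sketch with numpy; the hidden states below are synthetic, so the values only illustrate the computation, while in a real probe `states_a` and `states_b` would be the per-layer hidden states of the same prompt under two encodings:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two hidden-state vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def layerwise_similarity(states_a, states_b):
    """Per-layer cosine similarity between two runs of the same model,
    e.g. the same question asked in English vs. Base64."""
    return [cosine(a, b) for a, b in zip(states_a, states_b)]

# Synthetic illustration: identical "middle-layer" states, then a
# divergent final layer, mimicking the converge-then-diverge pattern.
rng = np.random.default_rng(0)
shared = rng.normal(size=(3, 16))
states_a = np.vstack([shared, rng.normal(size=(1, 16))])
states_b = np.vstack([shared, rng.normal(size=(1, 16))])
sims = layerwise_similarity(states_a, states_b)
```

The experimental signature is then just the shape of `sims` across depth: near 1.0 through the shared middle, dropping at the ends.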
Universal thinking space across 8 languages
Extended to 8 languages × 8 topics, the probe shows that same-topic, different-language pairs are more similar in the middle layers than same-language, different-topic pairs: the strongest direct empirical evidence yet that LLMs reason about meaning rather than surface form.
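The pairing logic behind that comparison can be sketched as follows; the synthetic states are a stand-in (each vector is a topic direction plus a small language-specific perturbation, mimicking a topic-dominated middle layer), not the study's data:

```python
import itertools
import numpy as np

def cos(u, v):
    """Cosine similarity."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def pair_means(states):
    """states maps (language, topic) -> a middle-layer hidden-state vector.
    Returns (mean similarity of same-topic/different-language pairs,
             mean similarity of same-language/different-topic pairs)."""
    same_topic, same_lang = [], []
    for ((l1, t1), v1), ((l2, t2), v2) in itertools.combinations(states.items(), 2):
        if t1 == t2 and l1 != l2:
            same_topic.append(cos(v1, v2))
        elif l1 == l2 and t1 != t2:
            same_lang.append(cos(v1, v2))
    return np.mean(same_topic), np.mean(same_lang)

# Synthetic 2-language x 2-topic grid with topic-dominated geometry.
e = np.eye(4)
states = {(lang, topic): e[topic] + 0.1 * e[2 + lang]
          for lang in range(2) for topic in range(2)}
topic_sim, lang_sim = pair_means(states)
```

The reported finding corresponds to `topic_sim > lang_sim` holding in the real middle-layer states across all 8 × 8 cells.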
3,024 candidates; 2M surrogate-scored configs
Rigorous methodology: a beam search over 3,024 relayering candidates, a surrogate model trained on those results and used to score 2 million possible configurations, followed by a unified validation sweep.
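The post summarizes this pipeline only at a high level; the following is a schematic sketch of surrogate-scored configuration search, where `toy_surrogate` is a hypothetical stand-in for the author's trained surrogate model and the candidate enumeration is illustrative:

```python
def enumerate_configs(n_layers, max_span):
    """All (start, count) single-span relayering candidates for a model
    with n_layers decoder layers."""
    return [(s, c) for c in range(1, max_span + 1)
                   for s in range(n_layers - c + 1)]

def surrogate_search(configs, surrogate, top_k=10):
    """Score every candidate with a cheap surrogate and keep the best
    top_k for expensive benchmark validation."""
    return sorted(configs, key=surrogate, reverse=True)[:top_k]

# Hypothetical surrogate: prefers mid-depth spans of moderate length,
# loosely matching the "duplicate middle layers" finding.
def toy_surrogate(cfg, n_layers=80):
    start, count = cfg
    center = start + count / 2
    return -abs(center - n_layers / 2) - abs(count - 7)

best = surrogate_search(enumerate_configs(80, 12), toy_surrogate, top_k=5)
```

The appeal of the surrogate step is cost: scoring 2 million configurations with a cheap learned model is tractable where 2 million benchmark runs are not, and only the surviving shortlist needs real evaluation.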
Code and RYS variants released publicly
Author released scanning code and new RYS model variants openly, enabling researchers with consumer-grade GPUs to replicate and extend findings across other model families.
Findings from LLM Neuroanatomy II research post, March 2026
What This Means
For practitioners, RYS is a zero-cost technique (no gradient updates, no retraining) that can meaningfully boost benchmark scores and now appears to be a general property of transformer architectures. For researchers, the cross-lingual cosine-similarity evidence is the most direct empirical support yet for a language-agnostic internal reasoning space in LLMs, with implications for interpretability, multilingual transfer, and our fundamental understanding of how these models process meaning.
