mHC: Manifold-Constrained Hyper-Connections (Paper Review)
Author: SheepML
Uploaded: 2026-01-07
Views: 456
Description:
In this video, I explain DeepSeek's latest paper: mHC: Manifold-Constrained Hyper-Connections (arXiv: 2512.24880).
Hyper-Connections (HC) extended the classic residual connection paradigm by widening the residual stream and introducing learnable mixing matrices. While this brought performance gains, it also broke the identity mapping property, causing training instability and gradient explosions at scale (amplification factors of up to 3000× in 27B-parameter models).
mHC solves this by projecting the residual connection matrices onto the Birkhoff polytope (doubly stochastic matrices) using the Sinkhorn-Knopp algorithm. This restores stable signal propagation while preserving the flexibility and performance benefits of Hyper-Connections.
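The Sinkhorn-Knopp projection mentioned above can be sketched in a few lines: alternately normalize rows and columns of a positive matrix until it is (approximately) doubly stochastic. This is my own minimal illustration, not the paper's implementation; the function name and iteration count are assumptions.

```python
import numpy as np

def sinkhorn_knopp(M, n_iters=50):
    """Project a matrix toward the Birkhoff polytope (doubly
    stochastic: every row and column sums to 1) by alternately
    normalizing rows and columns."""
    P = np.exp(M)  # exponentiate to guarantee strictly positive entries
    for _ in range(n_iters):
        P = P / P.sum(axis=1, keepdims=True)  # row-normalize
        P = P / P.sum(axis=0, keepdims=True)  # column-normalize
    return P

rng = np.random.default_rng(0)
P = sinkhorn_knopp(rng.normal(size=(4, 4)))
print(P.sum(axis=0))  # each column sums to ~1
print(P.sum(axis=1))  # each row sums to ~1
```

Because a doubly stochastic matrix has operator norm at most 1 and contains the identity as a special case, constraining the mixing matrices this way bounds signal growth without forbidding the plain residual connection.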
Key takeaways:
Why standard residual connections work (identity mapping)
How Hyper-Connections break this property
The Birkhoff polytope constraint and Sinkhorn-Knopp projection
Empirical results: stable training + better downstream performance
TIMESTAMPS:
0:00 - Intro
0:54 - Complex Hyper-Connections
2:18 - Flaw in Hyper-Connections
3:37 - Instability at Scale
5:01 - Hyper-Connections Memory Wall
6:10 - Constrained Hyper-Connections Solution
8:08 - Double Stochastic Constraints
10:29 - Bounded Propagation
11:29 - Weights Mappings
12:47 - SOTA Stability
14:27 - Scalability of Constrained Hyper-Connections
15:38 - Minimal Overhead Training
17:01 - Technical Details
18:23 - Conclusion
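The bounded-propagation point can be illustrated numerically: repeatedly applying an unconstrained mixing matrix with norm above 1 blows up the residual stream, while a doubly stochastic one keeps it bounded. The matrix values below are hypothetical, chosen for illustration, not taken from the paper.

```python
import numpy as np

x0 = np.ones(2)

# Hypothetical unconstrained mixing matrix: largest eigenvalue 1.3 > 1
A_free = np.array([[1.2, 0.1], [0.1, 1.2]])
# Doubly stochastic mixing matrix: rows and columns each sum to 1
A_ds = np.array([[0.9, 0.1], [0.1, 0.9]])

def propagate(A, x, depth=30):
    """Apply the mixing matrix across `depth` stacked layers
    (ignoring the block function f for clarity)."""
    for _ in range(depth):
        x = A @ x
    return np.linalg.norm(x)

print(propagate(A_free, x0))  # grows geometrically, ~3.7e3
print(propagate(A_ds, x0))    # stays at its initial norm, ~1.41
```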
📄 Paper: https://arxiv.org/abs/2512.24880