Mixture of Recursions: The Power of Recursive Transformers
Author: alphaXiv
Uploaded: 2025-08-04
Views: 1236
Description:
What if language models could learn to "think harder" only when they need to—allocating deep computation to challenging tokens while breezing through simple ones?
Reza Bayat presents Mixture-of-Recursions, a breakthrough architecture that unifies parameter sharing with adaptive computation. By dynamically assigning different recursion depths to individual tokens, MoR achieves large-model quality with significantly fewer parameters and computational resources.
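To make the idea concrete, below is a minimal, hypothetical Python (PyTorch) sketch of the mechanism described in the video: one Transformer block whose weights are shared across recursion steps, plus a small per-token router that assigns each token its own recursion depth. Class and parameter names (SharedBlock, MixtureOfRecursionsSketch, max_recursions) are illustrative assumptions, not the presented implementation.

import torch
import torch.nn as nn


class SharedBlock(nn.Module):
    """One Transformer block whose weights are reused at every recursion depth."""
    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        h = self.norm1(x)
        a, _ = self.attn(h, h, h)
        x = x + a
        return x + self.ff(self.norm2(x))


class MixtureOfRecursionsSketch(nn.Module):
    """Applies the shared block up to max_recursions times; a router assigns
    each token a depth, and a token stops updating once its depth is reached."""
    def __init__(self, d_model: int, max_recursions: int = 4):
        super().__init__()
        self.block = SharedBlock(d_model)
        self.router = nn.Linear(d_model, max_recursions)  # per-token depth logits
        self.max_recursions = max_recursions

    def forward(self, x):
        # Hard depth assignment for illustration only; the actual model would
        # use a trained, differentiable routing scheme.
        depths = self.router(x).argmax(dim=-1) + 1           # (batch, seq), values in [1, R]
        out = x
        for step in range(1, self.max_recursions + 1):
            updated = self.block(out)
            active = (depths >= step).unsqueeze(-1)           # tokens still recursing
            out = torch.where(active, updated, out)
        return out


if __name__ == "__main__":
    model = MixtureOfRecursionsSketch(d_model=64)
    tokens = torch.randn(2, 16, 64)                           # (batch, seq, d_model)
    print(model(tokens).shape)                                # torch.Size([2, 16, 64])

The design intent this sketch tries to capture is that "easy" tokens exit after one or two passes through the shared block, while "hard" tokens receive more recursion steps, so depth (and compute) is spent only where the router deems it useful.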