Microsoft - Sigma-MoE-Tiny Technical Report
Author: AI Papers Podcast Daily
Uploaded: 2026-01-19
Views: 4
Description:
Microsoft Research has developed a new artificial intelligence model called Sigma-MoE-Tiny that is designed to be both powerful and highly efficient. The model uses a Mixture-of-Experts (MoE) architecture, which splits the network into many small expert sub-networks, and it is unusual in that it activates only a single expert for each token it processes. Although the model contains 20 billion parameters in total, only about 0.5 billion are active at a time, making it much faster and cheaper to run than comparably sized dense models. A major challenge was keeping the workload balanced across the experts, so the researchers devised a training schedule that begins with several experts active per token and gradually reduces the count to one. The team trained the model on diverse data, including math and coding problems, and extended its context window in stages so it learned to handle very long documents. Despite using very little compute relative to its total size, Sigma-MoE-Tiny performs as well as or better than much larger open-source models. (A minimal sketch of the routing scheme and annealing schedule appears after the links below.)
https://arxiv.org/pdf/2512.16248
https://github.com/microsoft/ltp-mega...
https://qghuxmu.github.io/Sigma-MoE-Tiny
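As a rough illustration of the mechanism described above, the following is a minimal PyTorch sketch of top-k expert routing combined with a schedule that anneals the number of active experts down to one. All names, layer sizes, the number of experts, and the linear annealing schedule are illustrative assumptions, not details taken from the Sigma-MoE-Tiny paper or repository.

```python
# Illustrative sketch only: a toy MoE layer with top-k routing and a schedule
# that anneals the number of active experts per token down to one.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, n_experts=64, d_ff=1024):
        super().__init__()
        # Router scores every expert for each token.
        self.router = nn.Linear(d_model, n_experts)
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x, k=1):
        # x: (tokens, d_model). With k=1, each token passes through exactly one
        # expert, so only a small fraction of the total parameters is active.
        probs = F.softmax(self.router(x), dim=-1)
        weights, idx = probs.topk(k, dim=-1)                   # (tokens, k)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize
        out = torch.zeros_like(x)
        for slot in range(k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

def active_experts(step, total_steps, k_start=4, k_end=1):
    # Hypothetical linear schedule: start with k_start experts active per token
    # and anneal down to k_end (one), as the description above suggests.
    frac = min(step / max(total_steps, 1), 1.0)
    return max(k_end, round(k_start - frac * (k_start - k_end)))
```

During training one would call the layer as `moe(x, k=active_experts(step, total_steps))`, so early steps spread gradient signal across several experts (which helps balance the load) while the final model runs with a single active expert per token.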