Sparse Autoencoders: Progress & Limitations with Joshua Engels
Author: NDIF Team
Uploaded: 2025-08-28
Views: 589
Description:
In this talk, Joshua Engels discusses sparse autoencoders (SAEs), which are used to learn monosemantic features from the internals of large models. He reviews the motivation and background of SAEs for interpretability, then addresses their key limitations and future directions, including work on improving SAEs by low-rank adapting the underlying models.
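For context, a minimal sketch of the basic SAE setup discussed in the talk: an overcomplete dictionary trained to reconstruct cached model activations under a sparsity penalty. All dimensions, the L1 coefficient, and the random stand-in activations below are illustrative assumptions, not the exact configuration from the talk or the papers.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_sae: int):
        super().__init__()
        # Overcomplete dictionary: d_sae is much larger than d_model.
        self.encoder = nn.Linear(d_model, d_sae)
        self.decoder = nn.Linear(d_sae, d_model)

    def forward(self, x: torch.Tensor):
        # ReLU keeps feature activations non-negative; combined with the
        # L1 penalty below, most features are zero on any given input.
        f = torch.relu(self.encoder(x))
        x_hat = self.decoder(f)
        return x_hat, f

# Training objective: reconstruction error plus an L1 sparsity penalty.
sae = SparseAutoencoder(d_model=768, d_sae=768 * 16)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
acts = torch.randn(64, 768)  # stand-in for cached model activations
x_hat, f = sae(acts)
loss = ((x_hat - acts) ** 2).mean() + 1e-3 * f.abs().mean()
loss.backward()
opt.step()
```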
Joshua Engels is a research scientist at Google DeepMind working on applied interpretability. At the time of this talk, he was pursuing his PhD at MIT. He is interested in language model representations, AI control, and AI safety more broadly.
Discussed Papers:
https://arxiv.org/abs/2502.16681
https://arxiv.org/abs/2501.19406
Josh's Website: https://www.joshengels.com/
00:00 Intro
28:30 Decomposing the Dark Matter of Sparse Autoencoders
37:43 Are Sparse Autoencoders Useful? A Case Study in Sparse Probing
50:41 Low-Rank Adapting Models for Sparse Autoencoders