VISION SPARSE AUTOENCODERS: Overview + Walkthrough of Running an SAE
Author: Sonia Joseph
Uploaded: 2025-06-10
Views: 563
Description:
In this video, we explore how Vision Sparse Autoencoders (SAEs) work — from conceptual foundations to feeding a real image through the model.
⏱️ Timestamps:
History of Vision SAEs
0:00 Introduction to vision sparse autoencoders
2:40 Negative results in sparse autoencoders
3:13 History of SAEs is similar to the history of probes
3:55 SAEs as analytic probes
4:40 SAEs in vision
5:59 Prisma library
Demo - pass an image into a vision SAE
7:03 Setup environment
8:42 Load CLIP SAE from Prisma suite
14:20 Load hooked CLIP model
18:20 Load ImageNet dataset
23:31 Feed in parrot image
24:50 Feed parrot image into SAE and cache activations
32:16 Feed in ImageNet validation into SAE to get feature semantics
42:53 Visualize top images per feature
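The demo steps above boil down to a simple loop: cache a residual-stream activation from the hooked CLIP model, pass it through the SAE encoder to get sparse feature codes, then rank features by activation to find candidates for labeling with top ImageNet images. A minimal NumPy sketch of that SAE forward pass is below; the dimensions and weight names are illustrative placeholders, not the Prisma suite's actual API, and the random weights stand in for a trained SAE loaded from the suite.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a CLIP-like residual stream of width 768,
# expanded into a 16x wider dictionary of features.
d_model, d_sae = 768, 768 * 16

# Random stand-in weights; in the demo these come from a trained
# CLIP SAE in the Prisma suite (names here are illustrative).
W_enc = rng.normal(0, 0.02, (d_model, d_sae))
W_dec = rng.normal(0, 0.02, (d_sae, d_model))
b_enc = np.zeros(d_sae)
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode one activation vector into feature codes, then reconstruct it."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)  # ReLU codes; an L1 penalty during
                                            # training is what makes them sparse
    x_hat = f @ W_dec + b_dec               # linear decoder reconstructs the input
    return f, x_hat

# Stand-in for one patch activation cached from the hooked CLIP model
# (e.g. the parrot image in the demo).
x = rng.normal(size=d_model)
features, recon = sae_forward(x)

# Indices of the most active features for this patch: these are the
# features you would label by collecting their top-activating
# ImageNet validation images.
top_k = np.argsort(features)[::-1][:5]
```

In the actual notebook the activations come from forward hooks on the CLIP vision transformer rather than random vectors, and the top-images-per-feature step aggregates these codes over the whole ImageNet validation set.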
📓 Colab Notebook:
https://colab.research.google.com/dri...
💻 GitHub Repo:
https://github.com/Prisma-Multimodal/...
📄 Whitepaper:
https://arxiv.org/abs/2504.19475
🐦 Twitter/X:
https://x.com/soniajoseph_
----
Papers (in order mentioned)
SAE papers
Sparse Autoencoders Find Highly Interpretable Features in Language Models
https://arxiv.org/pdf/2309.08600
Steering CLIP’s vision transformer with sparse autoencoders
https://arxiv.org/abs/2504.08729
Negative Results
Sparse Autoencoders Trained on the Same Data Learn Different Features
https://arxiv.org/abs/2501.16615
Sparse Autoencoders Can Interpret Randomly Initialized Transformers
https://arxiv.org/abs/2501.17727
Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)
https://www.lesswrong.com/posts/4uXCA...
Are Sparse Autoencoders Useful? A Case Study in Sparse Probing
https://arxiv.org/pdf/2502.16681
Sparse Autoencoder Use Cases?
Auditing Language Models for Hidden Objectives
https://assets.anthropic.com/m/317564...
Linear Probes
Understanding intermediate layers using linear classifier probes
https://arxiv.org/pdf/1610.01644
Information-Theoretic Probing for Linguistic Structure
https://arxiv.org/pdf/2004.03061
A Non-Linear Structural Probe
https://arxiv.org/pdf/2105.10185
SAE improvements
Scaling and evaluating sparse autoencoders
https://cdn.openai.com/papers/sparse-...
SAEs as analytic probes
How Visual Representations Map to Language Feature Space in Multimodal LLMs
Vision SAEs
Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment
https://arxiv.org/abs/2502.03714
Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models
https://arxiv.org/pdf/2502.12892
Steering CLIP’s vision transformer with sparse autoencoders
https://arxiv.org/abs/2504.08729
Past autoencoder work
beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework
https://openreview.net/forum?id=Sy2fz...
Transcoders and crosscoders
Transcoders Find Interpretable LLM Feature Circuits
https://arxiv.org/abs/2406.11944
Sparse Crosscoders for Cross-Layer Features and Model Diffing
https://transformer-circuits.pub/2024...
The Prisma Library Whitepaper:
https://arxiv.org/abs/2504.19475