How AI Taught Itself to See [DINOv3]
Author: Jia-Bin Huang
Uploaded: 2025-09-08
Views: 52929
Description:
How can we train a general-purpose vision model to perceive our visual world?
This video dives into the fascinating idea of self-supervised learning. We will discuss the basic concepts of transfer learning, contrastive language-image pretraining (CLIP), and self-supervised learning methods, including masked autoencoders, contrastive methods like SimCLR, and self-distillation methods like DINOv1, v2, and v3. I hope you enjoy the video!
00:00 Introduction
00:33 Why do features matter?
01:11 Learning features using classification
02:14 Learning features using language (CLIP)
04:09 Learning features using pretext tasks (self-supervised learning)
05:20 Learning features using contrast (SimCLR)
06:36 Learning features using self-distillation (DINOv1)
12:18 DINOv2
13:54 DINOv3
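To make the contrastive idea behind SimCLR concrete, here is a minimal NumPy sketch of its NT-Xent loss: two augmented views of the same image form a positive pair, and every other embedding in the batch serves as a negative. The function name and implementation details are illustrative, not taken from the video or the paper's official code.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) loss,
    the contrastive objective used in SimCLR.

    z1, z2: (N, D) embeddings of two augmented views of the same N images.
    """
    z = np.concatenate([z1, z2], axis=0)              # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize rows
    sim = z @ z.T / temperature                       # scaled cosine similarities
    n = z1.shape[0]
    # The positive for row i is row i+n (and vice versa).
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Exclude self-similarity from the softmax denominator.
    np.fill_diagonal(sim, -np.inf)
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

Minimizing this loss pulls the two views of each image together while pushing apart all other pairs in the batch, which is why SimCLR benefits from large batch sizes (more negatives per positive).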
References:
Language-image pretraining
[CLIP] https://openai.com/index/clip/
Self-supervised learning (pretext tasks):
[Context encoder] https://arxiv.org/abs/1604.07379
[Colorization] https://arxiv.org/abs/1611.09842
[Rotation prediction] https://arxiv.org/abs/1803.07728
[Jigsaw puzzle] https://arxiv.org/abs/1603.09246
[Temporal order shuffling] https://arxiv.org/abs/1708.01246
Contrastive learning
[SimCLR] https://arxiv.org/abs/2002.05709
Inpainting
[MAE] https://arxiv.org/abs/2111.06377
[iBOT] https://arxiv.org/abs/2111.07832
Self-distillation
[DINOv1] https://arxiv.org/abs/2104.14294
[DINOv2] https://arxiv.org/abs/2304.07193
[DINOv3] https://arxiv.org/abs/2508.10104
Self-supervised learning
[Cookbook] https://arxiv.org/abs/2304.12210
Video made with Manim: https://www.manim.community/