Back to Basics: Is "Denoising" Actually About Predicting the Clean Image?

Author: Paper to Pod

Uploaded: 2025-12-08

Views: 6

Description: We've spent years building complex diffusion models that predict "noise." But what if the secret to better AI is simply predicting the image itself?

In this episode of Paper to Pod, we break down "Back to Basics: Let Denoising Generative Models Denoise," the latest paper co-authored by Kaiming He (creator of ResNet). This research challenges the dominant paradigm of Stable Diffusion and DALL-E by proposing a radically simpler approach: *Just Image Transformers (JiT)*.

What is the paper about and why does it matter?
Current diffusion models typically predict the "noise" added to an image. The paper argues that this is a poor fit for high-dimensional data: under the manifold assumption, clean images lie on a low-dimensional manifold while noise does not, so noise is the harder target for a network to predict. Instead, the authors show that a simple Vision Transformer (ViT) operating on raw image patches, with no tokenizer (such as a VAE), no latent space, and no complex architecture, can achieve competitive generation quality by directly predicting the "clean" image.
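To make the difference between the prediction targets concrete, here is a minimal NumPy sketch. It assumes a linear interpolation between noise and data (t = 0 is pure noise, t = 1 is the clean image); that convention and every function name below are illustrative assumptions, not taken from the paper's code. It shows that the clean image, the noise, and the velocity can each be recovered from one another once the noisy input and t are known, so the choice of target changes what the network must output rather than what information is available.

```python
import numpy as np

def make_noisy(x, eps, t):
    """Noisy sample z_t = t * x + (1 - t) * eps (assumed schedule, 0 < t < 1)."""
    return t * x + (1.0 - t) * eps

def x_from_eps(z_t, eps_pred, t):
    """Clean-image estimate implied by a noise prediction."""
    return (z_t - (1.0 - t) * eps_pred) / t

def eps_from_x(z_t, x_pred, t):
    """Noise estimate implied by a clean-image prediction."""
    return (z_t - t * x_pred) / (1.0 - t)

def v_from_x(z_t, x_pred, t):
    """Velocity estimate (v = x - eps) implied by a clean-image prediction."""
    return (x_pred - z_t) / (1.0 - t)

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(3, 8, 8))   # stand-in for a clean image
eps = rng.standard_normal(x.shape)           # Gaussian noise
t = 0.3
z_t = make_noisy(x, eps, t)

# With perfect predictions, every parameterization agrees with the others.
x_hat = x_from_eps(z_t, eps, t)
eps_hat = eps_from_x(z_t, x, t)
v_hat = v_from_x(z_t, x, t)
```

The conversions are exact algebra, so the argument in the description is not about information content but about which target is easier for a network to produce.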

🎧 In this Video Overview, we cover:
The Core Argument: Why predicting the clean data is fundamentally better than predicting noise due to the "Manifold Assumption."
"Just Image Transformers" (JiT): A minimalist architecture that removes the need for U-Nets and pre-trained tokenizers.
The "Patch" Revolution: How using large patch sizes (16px, 32px) allows Transformers to work efficiently on high-resolution pixel space.
Performance: How this simple approach matches complex models on ImageNet benchmarks.
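As a rough illustration of the patch-size point above, the sketch below splits a pixel image into non-overlapping patches and flattens each one into a token. The helper names `patchify` and `unpatchify` are hypothetical, and this is only the generic ViT-style reshaping, not the paper's implementation. Larger patches keep the token count small while making each token very high-dimensional.

```python
import numpy as np

def patchify(img, p):
    """Split an (H, W, C) image into (num_patches, p * p * C) flat tokens."""
    h, w, c = img.shape
    assert h % p == 0 and w % p == 0, "image size must be divisible by patch size"
    gh, gw = h // p, w // p
    x = img.reshape(gh, p, gw, p, c)      # split rows and columns into patch grids
    x = x.transpose(0, 2, 1, 3, 4)        # (gh, gw, p, p, c)
    return x.reshape(gh * gw, p * p * c)

def unpatchify(tokens, p, h, w, c):
    """Inverse of patchify: rebuild the (H, W, C) image from flat tokens."""
    gh, gw = h // p, w // p
    x = tokens.reshape(gh, gw, p, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(h, w, c)

rng = np.random.default_rng(0)
img_256 = rng.uniform(-1.0, 1.0, size=(256, 256, 3))
img_512 = rng.uniform(-1.0, 1.0, size=(512, 512, 3))

tokens_256 = patchify(img_256, 16)   # 16 x 16 grid of 16px patches
tokens_512 = patchify(img_512, 32)   # 16 x 16 grid of 32px patches
restored = unpatchify(tokens_256, 16, 256, 256, 3)
```

Both settings produce the same sequence length (256 tokens), so attention cost stays flat as resolution grows, while the per-token dimension rises from 768 to 3072.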

🧠 Curator's Note (PhD Perspective):
This is a classic "Kaiming He" move: stripping away complexity to reveal the underlying principle. Just as Masked Autoencoders (MAE) simplified self-supervised learning, JiT attempts to simplify generative AI. The insight that "noise does not lie on a low-dimensional manifold, but data does" is a notable theoretical shift that could make future image generation models much cheaper and easier to train.
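The manifold claim can be illustrated with a toy example. This is a hypothetical setup of my own, not an experiment from the paper: the "data" is confined to a 16-dimensional linear subspace of a 512-dimensional space, while the noise is full-dimensional Gaussian. Projecting onto the subspace stands in for a model with limited capacity.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d, n = 512, 16, 2000   # ambient dim, intrinsic dim, sample count (all assumed)

# Orthonormal basis of a random d-dimensional subspace of R^D.
basis, _ = np.linalg.qr(rng.standard_normal((D, d)))   # shape (D, d)

data = rng.standard_normal((n, d)) @ basis.T   # samples lying on the subspace
noise = rng.standard_normal((n, D))            # samples filling all of R^D

def project(v):
    """Orthogonal projection onto the d-dimensional subspace."""
    return v @ basis @ basis.T

def relative_error(v):
    """Fraction of the signal's norm lost by the projection."""
    return np.linalg.norm(v - project(v)) / np.linalg.norm(v)

data_err = relative_error(data)     # essentially zero: data fits in d dims
noise_err = relative_error(noise)   # close to sqrt(1 - d / D): most noise is lost
```

A low-dimensional bottleneck reproduces the data almost exactly but discards most of the noise, which is the intuition behind preferring the clean image as the prediction target.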

---

🔗 Original Article & Source:
[Back to Basics: Let Denoising Generative Models Denoise]
[Tianhong Li, Kaiming He]
https://arxiv.org/pdf/2511.13720

---

About Paper to Pod:
Curated by a PhD student, Paper to Pod bridges the gap between complex academic research and accessible knowledge. I hand-pick the most important papers in science and tech, then use AI tools like NotebookLM to generate clear, conversational audio summaries (Deep Dives) for your review.

Disclaimer:
This audio overview was generated using AI (NotebookLM) based on the cited article. The content is for educational purposes only.

#GenerativeAI #KaimingHe #DiffusionModels #ComputerVision #MachineLearning #PaperToPod #DeepLearning
