Back to Basics: Is "Denoising" Actually About Predicting the Clean Image?
Author: Paper to Pod
Uploaded: 2025-12-08
Views: 6
Description:
We've spent years building complex diffusion models that predict "noise." But what if the secret to better AI is simply predicting the image itself?
In this episode of Paper to Pod, we break down "Back to Basics: Let Denoising Generative Models Denoise," the latest paper co-authored by Kaiming He (creator of ResNet). This research challenges the dominant paradigm of Stable Diffusion and DALL-E by proposing a radically simpler approach: *Just Image Transformers (JiT)*.
What is the paper about and why does it matter?
Current diffusion models typically predict the "noise" added to an image. This paper argues that this is mathematically inefficient for high-dimensional data. Instead, they demonstrate that a simple Vision Transformer (ViT) operating on raw image patches—without tokenizers (like VAEs), without latent spaces, and without complex architectures—can achieve state-of-the-art generation by directly predicting the "clean" image.
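To make the two objectives concrete, here is a small NumPy sketch (my own illustration, not code from the paper) of a flow-matching-style noising path. It shows that the noise target and the clean-image target are algebraically interchangeable given the noisy input and the timestep, so the choice between them is about what the network must represent, not about information content. The toy "manifold" here is a 1-D line embedded in 4-D space, standing in for the low-dimensional structure of real images.

```python
import numpy as np

rng = np.random.default_rng(0)

def noising_path(x0, eps, t):
    """Linear interpolation z_t = (1 - t) * x0 + t * eps (flow-matching style)."""
    return (1.0 - t) * x0 + t * eps

# Toy data: clean samples lie on a 1-D manifold (a line) inside 4-D ambient space.
direction = np.array([1.0, 2.0, 3.0, 4.0])
direction /= np.linalg.norm(direction)

x0 = rng.standard_normal() * direction   # clean sample: on the manifold
eps = rng.standard_normal(4)             # Gaussian noise: fills all 4 dimensions
t = 0.7
z_t = noising_path(x0, eps, t)

# eps-prediction target: the full-dimensional noise vector.
# x-prediction target (the paper's "just denoise" view): the clean sample x0.
# Given z_t and t, either target determines the other:
#   z_t = (1 - t) * x0 + t * eps  =>  eps = (z_t - (1 - t) * x0) / t
eps_recovered = (z_t - (1.0 - t) * x0) / t
assert np.allclose(eps_recovered, eps)
```

The paper's argument is that even though the targets are interchangeable, the clean image lives on a low-dimensional manifold while the noise does not, so a capacity-limited network has an easier job predicting the former.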
🎧 In this Video Overview, we cover:
The Core Argument: Why predicting the clean data is fundamentally better than predicting noise due to the "Manifold Assumption."
"Just Image Transformers" (JiT): A minimalist architecture that removes the need for U-Nets and pre-trained tokenizers.
The "Patch" Revolution: How using large patch sizes (16px, 32px) allows Transformers to work efficiently on high-resolution pixel space.
Performance: How this simple approach matches complex models on ImageNet benchmarks.
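On the patch-size point above: the reason large patches matter is sequence length. A quick sketch (image and patch sizes chosen for illustration; check the paper for the exact configurations) shows why 16px or 32px patches keep a Transformer's token count manageable even in raw pixel space:

```python
def num_tokens(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches a ViT sees for a square image."""
    assert image_size % patch_size == 0, "patch size must divide image size"
    return (image_size // patch_size) ** 2

# Doubling both resolution and patch size keeps the sequence length constant,
# so high-resolution pixel-space generation stays affordable:
print(num_tokens(256, 16))  # 256 tokens
print(num_tokens(512, 32))  # 256 tokens
print(num_tokens(512, 16))  # 1024 tokens: 4x the attention cost per layer
```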
🧠 Curator's Note (PhD Perspective):
This is a classic "Kaiming He" move—stripping away complexity to reveal the underlying principle. Just like Masked Autoencoders (MAE) simplified self-supervised learning, JiT attempts to simplify generative AI. The insight that "noise does not lie on a low-dimensional manifold, but data does" is a profound theoretical shift that could make future image generation models much cheaper and easier to train.
---
🔗 Original Article & Source:
[Back to Basics: Let Denoising Generative Models Denoise]
[Tianhong Li, Kaiming He]
https://arxiv.org/pdf/2511.13720
---
About Paper to Pod:
Curated by a PhD student, Paper to Pod bridges the gap between complex academic research and accessible knowledge. I hand-pick the most important papers in science and tech, then use AI tools like NotebookLM to generate clear, conversational audio summaries (Deep Dives) for your review.
Disclaimer:
This audio overview was generated using AI (NotebookLM) based on the cited article. The content is for educational purposes only.
#GenerativeAI #KaimingHe #DiffusionModels #ComputerVision #MachineLearning #PaperToPod #DeepLearning