ConvNeXt: A ConvNet for the 2020s

Author: AI Bites

Uploaded: 2022-01-25

Views: 7905

Description: ConvNeXt: A ConvNet for the 2020s

The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model. A vanilla ViT, on the other hand, faces difficulties when applied to general computer vision tasks such as object detection and semantic segmentation. It is the hierarchical Transformers (e.g., Swin Transformers) that reintroduced several ConvNet priors, making Transformers practically viable as a generic vision backbone and demonstrating remarkable performance on a wide variety of vision tasks. However, the effectiveness of such hybrid approaches is still largely credited to the intrinsic superiority of Transformers, rather than the inherent inductive biases of convolutions. In this work, we reexamine the design spaces and test the limits of what a pure ConvNet can achieve. We gradually "modernize" a standard ResNet toward the design of a vision Transformer, and discover several key components that contribute to the performance difference along the way. The outcome of this exploration is a family of pure ConvNet models dubbed ConvNeXt. Constructed entirely from standard ConvNet modules, ConvNeXts compete favorably with Transformers in terms of accuracy and scalability, achieving 87.8% ImageNet top-1 accuracy and outperforming Swin Transformers on COCO detection and ADE20K segmentation, while maintaining the simplicity and efficiency of standard ConvNets.

Paper link: https://arxiv.org/abs/2201.03545
Official code: https://github.com/facebookresearch/C...

Video Outline:
0:00 - Introduction
1:00 - Training Techniques
2:59 - Macro Design
5:02 - ResNeXt-ify
5:51 - Inverted Bottleneck
6:44 - Micro Design
8:15 - Summary of ConvNeXt Architecture
9:01 - Empirical Evaluation
9:21 - Results
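
The "ResNeXt-ify" and "Inverted Bottleneck" steps in the outline are largely about where a block spends its parameters. As a rough sketch, not the official implementation, the Python snippet below counts the parameters of a simplified ConvNeXt-style block. The 7x7 depthwise kernel, the 4x expansion ratio, the LayerNorm and the width of 96 channels are assumptions recalled from the paper, not stated in this description. The function names are hypothetical, and details such as layer scale are left out.

```python
def conv_params(c_in, c_out, kernel, groups=1, bias=True):
    """Parameter count of a 2D convolution with a square kernel."""
    if c_in % groups or c_out % groups:
        raise ValueError("channels must be divisible by groups")
    weights = (c_in // groups) * c_out * kernel * kernel
    return weights + (c_out if bias else 0)


def convnext_block_params(dim, kernel=7, expansion=4):
    """Simplified block (assumed layout): depthwise conv -> LayerNorm
    -> 1x1 expand -> GELU -> 1x1 project. GELU has no parameters."""
    hidden = expansion * dim
    depthwise = conv_params(dim, dim, kernel, groups=dim)  # one filter per channel
    layer_norm = 2 * dim                                   # scale and shift
    expand = conv_params(dim, hidden, 1)                   # inverted bottleneck: widen
    project = conv_params(hidden, dim, 1)                  # then narrow back
    return depthwise + layer_norm + expand + project


if __name__ == "__main__":
    dim = 96  # assumed first-stage width
    print("dense 7x7 conv:    ", conv_params(dim, dim, 7))
    print("depthwise 7x7 conv:", conv_params(dim, dim, 7, groups=dim))
    print("whole block:       ", convnext_block_params(dim))
```

Under these assumptions, the depthwise convolution holds only a small share of the block's parameters, and most of the capacity sits in the two 1x1 layers. This is why a large kernel stays affordable in such a block.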

*AI Bites*
YouTube: / aibites
Twitter: / ai_bites
Patreon: / ai_bites
Github: https://github.com/ai-bites

Swin Transformer: Swin Transformer: Hierarchical Vision Tran...
Vision Transformers (ViT): Vision Transformer (ViT) - An Image is Wor...
Data Efficient Image Transformer (DeiT): DeiT - Data-efficient image transformers &...

📚 📚 📚 BOOKS I HAVE READ, REFER TO AND RECOMMEND 📚 📚 📚
📖 Deep Learning by Ian Goodfellow - https://amzn.to/3Wnyixv
📙 Pattern Recognition and Machine Learning by Christopher M. Bishop - https://amzn.to/3ZVnQQA
📗 Machine Learning: A Probabilistic Perspective by Kevin Murphy - https://amzn.to/3kAqThb
📘 Multiple View Geometry in Computer Vision by R Hartley and A Zisserman - https://amzn.to/3XKVOWi

Music: https://www.bensound.com

#machinelearning #aibites #deeplearning #convnext #visiontransformers #computervision

Related videos

ConvNeXt: A ConvNet for the 2020s – Paper Explained (with animations)

ConvNet Beats Vision Transformers (ConvNeXt) Paper Explained

ConvNeXt: A ConvNet for the 2020s | Paper Explained

Depthwise Separable Convolution: A Faster Convolution!

Large Language Models Explained Briefly

Why Are Transformers Replacing CNNs?

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows (paper illustrated)

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (with explana...

Tensor Processing Units (TPUs) Explained

Mask Region based Convolution Neural Networks - EXPLAINED!

CoAtNet: Marrying Convolution and Attention for All Data Sizes

Vision Transformer

DenseNet Deep Neural Network Architecture Explained

Vision Transformer Quick Guide: Theory and Code in (Almost) 15 Minutes

How Residual Connections Are Being Modernized [mHC]

Why Does Diffusion Work Better than Autoregression?

A ConvNet for the 2020s

DINO: Emerging Properties in Self-Supervised Vision Transformers (paper illustrated)

Vision Transformer for Image Classification

The Most Complex Model We Actually Understand