State Of The Art Object Detection with Isaac Robinson

Author: Robin Cole

Uploaded: 2026-01-30

Views: 234

Description: In this video I sat down with Isaac to discuss RF-DETR, a new state-of-the-art family of real-time object detection and segmentation models from Roboflow. We cover the motivation for building models that are not just accurate but also fast, cost-efficient, and deployable across diverse hardware and data regimes, and why moving beyond fixed architectures is key to achieving that. Isaac explains how RF-DETR combines strong foundation backbones like DINOv2 with efficient neural architecture search to unlock novel speed–accuracy trade-offs, including dropping decoder layers and queries after training. We also discuss the model’s strong transfer performance on domains far from COCO, the introduction of a memory-efficient instance segmentation head, and the team’s unusually rigorous benchmarking approach, before closing on the challenges of open-source research and upcoming improvements to inference and platform integration.

/ robinsonish
https://github.com/roboflow/rf-detr
https://arxiv.org/abs/2511.09554

🚀 TIMELINE
0:18 – Isaac’s Role: ML research engineer at Roboflow building new CV models for diverse user use cases.
0:40 – What Is RF-DETR: Presented as SOTA for real-time object detection (COCO + downstream transfer), beating much slower/larger models.
1:20 – ICLR Acceptance: RF-DETR paper accepted to ICLR; now formally peer-reviewed.
2:20 – How They Did It: Combine strong pretrained vision backbones (e.g., DINOv2) with architecture optimization rather than a single fixed design.
4:12 – Neural Architecture Search: Efficient weight-sharing NAS trains thousands of variants and selects the best architecture for a dataset; NAS engine planned as a platform feature.
6:19 – Benchmarking at Scale: Team trains huge numbers of models quickly (e.g., hundreds of YOLO variants overnight) to evaluate transfer thoroughly.
9:06 – Speed Tricks From NAS: Can drop queries post-training (dataset-dependent object-count prior) and even remove all decoder layers (ViT-forward style), improving latency.
10:49 – Segmentation + Memory Wins: Adds a real-time instance segmentation head and cuts segmentation training VRAM ~3–4× (≈40–50GB → ≈10GB) without performance loss.
13:37 – Domain Transfer: RF-DETR performs especially well on “non-COCO-like” domains (notably medical, aerial, and documents) on Roboflow’s RF100-VL benchmark dataset.
20:18 – High-End Result: DINOv2-base RF-DETR reaches greater than 60 mAP on COCO at ~17 ms, framed as the first published real-time detector past 60 mAP.
24:12 – Open-Source Ramp-Up: A new open-source engineer (ex–PyTorch Lightning) is increasing repo velocity; more dev time planned.
26:05 – Deployment Reality + Fix: “Single artefact benchmarking” + upcoming inference stack improvements to make advertised latency reproducible and easier across hardware backends.
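
The 2:20 and 4:12 entries describe choosing among many searched architecture variants by their speed–accuracy trade-off. As a rough illustration of that selection step only (not Roboflow's NAS engine or the RF-DETR API), the sketch below keeps the variants that are Pareto-optimal on latency and mAP, then picks the fastest one that meets an accuracy target. The `Variant` type, the function names, and all numbers are hypothetical and invented for this example.

```python
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    name: str          # hypothetical label for a searched configuration
    latency_ms: float  # measured inference latency (lower is better)
    map_score: float   # validation mAP (higher is better)


def pareto_front(variants: list[Variant]) -> list[Variant]:
    """Return the variants not dominated on (latency, mAP), fastest first."""
    front = []
    best_map = float("-inf")
    # Sort fastest first; on equal latency, the most accurate variant comes first.
    for v in sorted(variants, key=lambda v: (v.latency_ms, -v.map_score)):
        # Keep a variant only if it beats every variant at least as fast on mAP.
        if v.map_score > best_map:
            front.append(v)
            best_map = v.map_score
    return front


def fastest_meeting(variants: list[Variant], min_map: float) -> Optional[Variant]:
    """Pick the lowest-latency Pareto variant reaching min_map, or None."""
    for v in pareto_front(variants):
        if v.map_score >= min_map:
            return v
    return None


# Made-up numbers purely for illustration; they are not RF-DETR measurements.
candidates = [
    Variant("a", 2.0, 40.0),
    Variant("b", 3.0, 38.0),  # slower and less accurate than "a": dominated
    Variant("c", 4.0, 45.0),
    Variant("d", 4.0, 44.0),  # same latency as "c" but less accurate: dominated
    Variant("e", 8.0, 50.0),
]

print([v.name for v in pareto_front(candidates)])
print(fastest_meeting(candidates, min_map=44.0))
```

In a weight-sharing setup, the candidate list would come from evaluating sub-networks of one trained model rather than from separately trained models; here it is simply hard-coded.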

Bio: Isaac Robinson is a Machine Learning Research Engineer at Roboflow. He’s worked across the field of computer vision, from real-time stereo depth estimation on household robots to biomedical research at the NIH to founding a zero-shot computer vision infrastructure startup. Isaac focuses on the intersection of low latency and high performance, with the goal of helping people unlock new capabilities through vision.
