State Of The Art Object Detection with Isaac Robinson
Author: Robin Cole
Uploaded: 2026-01-30
Views: 234
Description:
In this video I sat down with Isaac to discuss RF-DETR, a new state-of-the-art family of real-time object detection and segmentation models from Roboflow. We cover the motivation for building models that are not just accurate but also fast, cost-efficient, and deployable across diverse hardware and data regimes, and why moving beyond fixed architectures is key to achieving that. Isaac explains how RF-DETR combines strong foundation backbones like DINOv2 with efficient neural architecture search to unlock novel speed–accuracy trade-offs, including dropping decoder layers and queries after training. We also discuss the model’s strong transfer performance on domains far from COCO, the introduction of a memory-efficient instance segmentation head, and the team’s unusually rigorous benchmarking approach, before closing on the challenges of open-source research and upcoming improvements to inference and platform integration.
/ robinsonish
https://github.com/roboflow/rf-detr
https://arxiv.org/abs/2511.09554
🚀 TIMELINE
0:18 – Isaac’s Role: ML research engineer at Roboflow building new CV models for diverse user use cases.
0:40 – What Is RF-DETR: Presented as SOTA for real-time object detection (COCO + downstream transfer), beating much slower/larger models.
1:20 – ICLR Acceptance: RF-DETR paper accepted to ICLR; now formally peer-reviewed.
2:20 – How They Did It: Combine strong pretrained vision backbones (e.g., DINOv2) with architecture optimization rather than a single fixed design.
4:12 – Neural Architecture Search: Efficient weight-sharing NAS trains thousands of variants and selects the best architecture for a dataset; NAS engine planned as a platform feature.
6:19 – Benchmarking at Scale: Team trains huge numbers of models quickly (e.g., hundreds of YOLO variants overnight) to evaluate transfer thoroughly.
9:06 – Speed Tricks From NAS: Can drop queries post-training (dataset-dependent object-count prior) and even remove all decoder layers (ViT-forward style), improving latency.
10:49 – Segmentation + Memory Wins: Adds a real-time instance segmentation head and cuts segmentation training VRAM ~3–4× (≈40–50GB → ≈10GB) without performance loss.
13:37 – Domain Transfer: RF-DETR performs especially well on “non-COCO-like” domains (notably medical, aerial, and documents) on Roboflow’s RF100-VL benchmark dataset.
20:18 – High-End Result: DINOv2-base RF-DETR exceeds 60 mAP on COCO at ~17 ms, framed as the first published real-time detector past 60 mAP.
24:12 – Open-Source Ramp-Up: A new open-source engineer (ex–PyTorch Lightning) is increasing repo velocity; more dev time planned.
26:05 – Deployment Reality + Fix: “Single artefact benchmarking” + upcoming inference stack improvements to make advertised latency reproducible and easier across hardware backends.
Bio: Isaac Robinson is a Machine Learning Research Engineer at Roboflow. He’s worked across the field of computer vision, from real-time stereo depth estimation on household robots to biomedical research at the NIH to founding a zero-shot computer vision infrastructure startup. Isaac focusses on the intersection of low latency and high performance, with the goal of helping people unlock new capabilities through vision.