CNNs vs. Transformers for Urban Semantic Segmentation
Author: Gabriel Macias
Uploaded: 2026-02-12
Description:
Abstract. Urban semantic segmentation is a key perception task for
autonomous driving and mobile robotics, where models must operate
with limited labels, imperfect visual streams, and strict latency–memory
budgets. Reported CNN and Transformer results are often hard to compare
because training recipes, preprocessing, and measurement practices
differ across studies. This paper presents a controlled Cityscapes evaluation
that fixes the official split, label mapping, resizing to 512×1024,
normalization, and metric code, then compares a DeepLabV3 CNN baseline
against SegFormer under identical evaluation rules. We analyze four
deployment-facing axes: clean validation accuracy, label efficiency, robustness
to common corruptions (severity-3 blur, noise, JPEG compression,
brightness/contrast), and measured computational cost (latency,
peak VRAM, parameters). Results show a regime-dependent trade-off:
the CNN achieves the best clean score under full supervision (0.7217
mIoU), while SegFormer is stronger in the low-label regime (0.6476 mIoU
at 10%). Robustness diverges sharply under Gaussian noise and JPEG
compression, where SegFormer degrades far less than the CNN, and it
runs with a much smaller footprint (3.72M parameters, 0.64 GB VRAM)
and slightly lower latency, motivating closer analysis of these operating
points.
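For concreteness, the following is a minimal sketch (not the paper's released code) of the kind of fixed evaluation protocol the abstract describes: resizing to 512×1024, ImageNet-style normalization, and mIoU over the 19 Cityscapes training classes accumulated in a single confusion matrix. The normalization constants and helper names are assumptions.

```python
# Minimal sketch of a fixed Cityscapes evaluation protocol (assumed details,
# not the authors' code): resize to 512x1024, normalize, accumulate a
# confusion matrix over the 19 train-ID classes, then compute mIoU.
import numpy as np
import torchvision.transforms.functional as TF

NUM_CLASSES = 19       # Cityscapes train IDs
IGNORE_INDEX = 255     # pixels excluded from the metric

def preprocess(image):
    """Resize an RGB image to 512x1024 and normalize (ImageNet stats assumed)."""
    x = TF.to_tensor(image)                          # float tensor in [0, 1], CHW
    x = TF.resize(x, [512, 1024], antialias=True)    # fixed evaluation resolution
    x = TF.normalize(x, mean=[0.485, 0.456, 0.406],
                        std=[0.229, 0.224, 0.225])
    return x

def update_confusion(conf, pred, target):
    """Accumulate a NUM_CLASSES x NUM_CLASSES confusion matrix (numpy arrays)."""
    mask = target != IGNORE_INDEX
    idx = NUM_CLASSES * target[mask].astype(np.int64) + pred[mask].astype(np.int64)
    conf += np.bincount(idx, minlength=NUM_CLASSES ** 2).reshape(NUM_CLASSES, NUM_CLASSES)
    return conf

def mean_iou(conf):
    """Per-class IoU = TP / (TP + FP + FN), averaged over classes that occur."""
    tp = np.diag(conf)
    denom = conf.sum(0) + conf.sum(1) - tp
    iou = tp / np.maximum(denom, 1)
    return iou[denom > 0].mean()
```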
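The corruption axis can be illustrated in the same spirit. The severity-3 parameter values below (noise sigma, JPEG quality) are assumptions rather than the paper's exact constants.

```python
# Sketch of two of the listed corruptions, ImageNet-C style; parameter values
# approximating severity 3 are assumptions.
import io
import numpy as np
from PIL import Image

def gaussian_noise(image, sigma=0.18):
    """Additive Gaussian noise on a PIL image; sigma ~ severity 3 (assumed)."""
    x = np.asarray(image).astype(np.float32) / 255.0
    x = x + np.random.normal(0.0, sigma, x.shape)
    return Image.fromarray((np.clip(x, 0.0, 1.0) * 255).astype(np.uint8))

def jpeg_compression(image, quality=15):
    """Re-encode a PIL image at low JPEG quality; quality=15 ~ severity 3 (assumed)."""
    buf = io.BytesIO()
    image.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")
```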
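The reported cost numbers (latency, peak VRAM, parameter count) could be gathered roughly as follows in PyTorch; the batch size, warm-up, and iteration counts are illustrative assumptions.

```python
# Sketch of measuring parameter count, GPU latency, and peak VRAM for a
# segmentation model at the 512x1024 evaluation resolution (assumed setup).
import time
import torch

@torch.no_grad()
def measure_cost(model, device="cuda", warmup=10, iters=50):
    model = model.eval().to(device)
    x = torch.randn(1, 3, 512, 1024, device=device)    # one 512x1024 RGB frame

    params_m = sum(p.numel() for p in model.parameters()) / 1e6

    torch.cuda.reset_peak_memory_stats(device)
    for _ in range(warmup):                            # warm up kernels and caches
        model(x)
    torch.cuda.synchronize(device)

    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize(device)                     # wait for queued GPU work
    latency_ms = (time.perf_counter() - start) / iters * 1e3

    peak_gb = torch.cuda.max_memory_allocated(device) / 1024 ** 3
    return {"params_M": params_m, "latency_ms": latency_ms, "peak_vram_GB": peak_gb}
```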