Can You Even Tell Left From Right? Presenting a New Challenge for VQA

Автор: ComputerVisionFoundation Videos

Загружено: 2024-01-29

Просмотров: 32

Описание: Authors: Sai Raam Venkataraman; Rishi Sridhar Rao; S. Balasubramanian; R. Raghunatha Sarma; Chandra Sekhar Vorugunti
Description: Visual Question Answering (VQA) needs a means of evaluating the strengths and weaknesses of models. One aspect of such an evaluation is the
measurement of compositional generalisation. This relates to the ability of a model to answer well on scenes whose compositions are different from those of scenes in the training dataset.
In this work, we present several quantitative measures of compositional separation and find that popular datasets for VQA are not good compositional evaluators. To solve this, we present Uncommon Objects in Unseen Configurations (UOUC), a synthetic dataset for VQA. UOUC is at once fairly complex while also being compositionally well-separated. The object-class of UOUC consists of 380 clasess taken from 528 characters from the Dungeons and Dragons game. The training dataset of UOUC consists of 200,000 scenes; whereas the test set consists of 30,000 scenes. In order to study compositional generalisation, simple reasoning and memorisation, each scene of UOUC is annotated with up to 10 novel questions. These deal with spatial relationships, hypothetical changes to scenes, counting, comparison, memorisation and memory-based reasoning. In total, UOUC presents over 2 million questions.
Our evaluation of recent high-performing models for VQA shows that they exhibit poor compositional generalisation, and comparatively lower ability towards simple reasoning. These results suggest that UOUC could lead to advances in research by being a strong benchmark for VQA, especially in the study of compositional generalisation.

Не удается загрузить Youtube-плеер. Проверьте блокировку Youtube в вашей сети.
Повторяем попытку...

Can You Even Tell Left From Right? Presenting a New Challenge for VQA

Доступные форматы для скачивания:

Скачать видео

Информация по загрузке:

Скачать аудио

Похожие видео

Modality-Aware Representation Learning for Zero-Shot Sketch-Based Image Retrieval

Modality-Aware Representation Learning for Zero-Shot Sketch-Based Image Retrieval

LLM и GPT - как работают большие языковые модели? Визуальное введение в трансформеры

LLM и GPT - как работают большие языковые модели? Визуальное введение в трансформеры

Что происходит с нейросетью во время обучения?

Что происходит с нейросетью во время обучения?

Внимание — это всё, что вам нужно (Transformer) — объяснение модели (включая математику), вывод и...

Внимание — это всё, что вам нужно (Transformer) — объяснение модели (включая математику), вывод и...

ПОСЛЕ СМЕРТИ ВАС ВСТРЕТЯТ НЕ РОДСТВЕННИКИ, А.. ЖУТКОЕ ПРИЗНАНИЕ БЕХТЕРЕВОЙ. ПРАВДА КОТОРУЮ СКРЫВАЛИ

ПОСЛЕ СМЕРТИ ВАС ВСТРЕТЯТ НЕ РОДСТВЕННИКИ, А.. ЖУТКОЕ ПРИЗНАНИЕ БЕХТЕРЕВОЙ. ПРАВДА КОТОРУЮ СКРЫВАЛИ

Но что такое нейронная сеть? | Глава 1. Глубокое обучение

Но что такое нейронная сеть? | Глава 1. Глубокое обучение

Как работает ChatGPT: объясняем нейросети просто

Как работает ChatGPT: объясняем нейросети просто

Deep Learning for Computer Vision with Python and TensorFlow – Complete Course

Deep Learning for Computer Vision with Python and TensorFlow – Complete Course

Understanding the Discrete Fourier Transform and the FFT

Understanding the Discrete Fourier Transform and the FFT

Что такое встраивание слов?

Что такое встраивание слов?

Endless White Particles Black Background 4K (No Loops)

Endless White Particles Black Background 4K (No Loops)

Panelformer: Sewing Pattern Reconstruction From 2D Garment Images

Panelformer: Sewing Pattern Reconstruction From 2D Garment Images

Что такое генеративный ИИ и как он работает? – Лекции Тьюринга с Миреллой Лапатой

Что такое генеративный ИИ и как он работает? – Лекции Тьюринга с Миреллой Лапатой

30 самых прекрасных классических произведений для души и сердца 🎵 Моцарт, Бах, Бетховен, Шопен

30 самых прекрасных классических произведений для души и сердца 🎵 Моцарт, Бах, Бетховен, Шопен

Если у тебя спросили «Как твои дела?» — НЕ ГОВОРИ! Ты теряешь свою силу | Еврейская мудрость

Если у тебя спросили «Как твои дела?» — НЕ ГОВОРИ! Ты теряешь свою силу | Еврейская мудрость

Ночные пробуждения в 3–4 часа: как найти причину и вернуть глубокий сон.

Ночные пробуждения в 3–4 часа: как найти причину и вернуть глубокий сон.

Экспресс-курс RAG для начинающих

Экспресс-курс RAG для начинающих

Управление поведением LLM без тонкой настройки

Управление поведением LLM без тонкой настройки

Понимание GD&T

Визуализация скрытого пространства: PCA, t-SNE, UMAP | Глубокое обучение с анимацией

Визуализация скрытого пространства: PCA, t-SNE, UMAP | Глубокое обучение с анимацией