VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers
Author: Cognitive AI
Uploaded: 2022-04-06
Views: 2,734
Description:
VL-InterpreT was accepted to CVPR 2022.
Paper: https://arxiv.org/abs/2203.17247
Demo: http://vlinterpretenv4env-env.eba-vmh...
VL-InterpreT provides novel interactive visualizations for interpreting the attentions and hidden representations in multimodal transformers. It is a task-agnostic, integrated tool that (1) tracks a variety of statistics in attention heads throughout all layers for both vision and language components, (2) visualizes cross-modal and intra-modal attentions through easily readable heatmaps, and (3) plots the hidden representations of vision and language tokens as they pass through the transformer layers. In this paper, we demonstrate the functionalities of VL-InterpreT through the analysis of KD-VLP, an end-to-end pretrained vision-language multimodal transformer-based model, on the tasks of Visual Commonsense Reasoning (VCR) and WebQA, two visual question answering benchmarks. We also present several interesting findings about multimodal transformer behaviors that were uncovered through our tool.
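To make the cross-modal heatmap idea concrete, here is a minimal sketch (not from the paper or the tool's code) of how such a heatmap could be derived from a single layer's attention weights. It assumes a hypothetical layout where text tokens precede image tokens in the sequence; the function name and shapes are illustrative assumptions, not VL-InterpreT's API.

```python
import numpy as np

def cross_modal_attention(attn, n_text):
    """Average text-to-image attention across heads.

    attn: array of shape (num_heads, seq_len, seq_len), the attention
          weights from one transformer layer (rows sum to 1).
    n_text: number of text tokens; image tokens are assumed to follow
            them in the sequence (an illustrative assumption).
    """
    # Slice the text-to-image block of the attention matrix and
    # average over heads to obtain one heatmap for the layer.
    return attn[:, :n_text, n_text:].mean(axis=0)

# Toy example: 2 heads, 3 text tokens followed by 2 image tokens.
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 5, 5))
# Row-wise softmax so each row is a valid attention distribution.
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
heatmap = cross_modal_attention(attn, n_text=3)
print(heatmap.shape)  # (3, 2): one row per text token, one column per image token
```

A per-head heatmap (dropping the `.mean(axis=0)`) would support the tool's head-level statistics as well; averaging is just one simple aggregation choice.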