mPLUG: Vision-Language Learning by Cross-modal Skip-connections
Author: Data Science Gems
Uploaded: 2023-11-26
Views: 538
Description:
mPLUG is an effective and efficient vision-language pre-training (VLP) framework for both cross-modal understanding and generation. It uses an asymmetric vision-language architecture with novel cross-modal skip-connections to address information asymmetry between the modalities and to improve computational efficiency. Pretrained on large-scale image-text pairs, it achieves strong results on image captioning, image-text retrieval, visual grounding, and visual question answering, and also demonstrates strong zero-shot transfer to multiple video-language tasks.
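To make the skip-connection idea concrete, here is a minimal, hypothetical NumPy sketch (not the paper's implementation; the layer names and the single-matrix "attention" stand-in are assumptions): the text stream passes through every layer of a fusion block, while the visual features bypass those layers and are re-injected only once per block, which is where the computational savings come from.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # standard layer normalization over the feature dimension
    mu = x.mean(-1, keepdims=True)
    sd = x.std(-1, keepdims=True)
    return (x - mu) / (sd + eps)

def sub_layer(x, W):
    # stand-in for a transformer sub-layer: residual + norm
    # (a single linear map replaces real self-attention here)
    return layer_norm(x + x @ W)

def skip_connected_fusion(vis, txt, n_text_layers=3, seed=0):
    """Hypothetical sketch of one cross-modal skip-connected block:
    text tokens go through several cheap text-only layers, and the
    untouched visual tokens are fused in just once at the block
    boundary instead of co-attending at every layer."""
    rng = np.random.default_rng(seed)
    d = txt.shape[-1]
    h = txt
    for _ in range(n_text_layers):
        W = rng.standard_normal((d, d)) * 0.02
        h = sub_layer(h, W)                      # text-only layers
    # cross-modal skip-connection: inject original visual features
    fused = np.concatenate([vis, h], axis=0)
    W = rng.standard_normal((d, d)) * 0.02
    return sub_layer(fused, W)                   # one fusion layer

rng = np.random.default_rng(1)
vis = rng.standard_normal((4, 8))   # 4 visual tokens, dim 8
txt = rng.standard_normal((5, 8))   # 5 text tokens, dim 8
out = skip_connected_fusion(vis, txt)
print(out.shape)  # (9, 8): all tokens fused after the skip
```

Because the visual tokens skip the per-layer fusion, the quadratic cross-attention cost is paid once per block rather than once per layer, which is the efficiency argument the description alludes to.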
In this video, I will talk about the following: What is the mPLUG model architecture? How does mPLUG perform?
For more details, please look at https://arxiv.org/pdf/2205.12005v2.pdf
Li, Chenliang, Haiyang Xu, Junfeng Tian, Wei Wang, Ming Yan, Bin Bi, Jiabo Ye et al. "mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections." arXiv preprint arXiv:2205.12005 (2022).