Multimodal deep learning: A Comparison between LSTM and Transformers for Image captioning

Image captioning

Feature extraction

NLP

VGG16

LSTM

Transformers

multimodal deep learning

deep learning

CNN

natural language processing

nlp

Flickr8k dataset

BLEU score

textual description of images

computer vision

Author: Prof. Sabri

Uploaded: 2023-01-15

Views: 1209

Description: Image captioning is the task of generating a textual description of an image, and it combines computer vision with natural language processing. Approaches based on encoder-decoder architectures have recently been proposed to solve image captioning problems. The main objective of this paper is to compare the two most widely used approaches for natural language processing tasks, namely LSTMs and Transformers. We used the Flickr8k dataset for the input images and the VGG16 model for image feature extraction. We used the BLEU score to evaluate the captions generated by both models. Both models were able to generate grammatically correct and expressive captions.
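The BLEU evaluation mentioned in the description can be illustrated with a minimal sketch. This is not the authors' evaluation code: it is a simplified, unsmoothed sentence-level BLEU written in plain Python, and the function names (`ngrams`, `modified_precision`, `bleu`) and the example captions in the usage note are assumptions made for illustration. A real evaluation on Flickr8k would more likely use an established library implementation and corpus-level scoring against all reference captions per image.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: a candidate n-gram is credited at most as
    many times as it occurs in the reference where it is most frequent."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights.
    Hypothetical helper for illustration; returns 0.0 if any precision is 0."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    # Reference length closest to the candidate length (ties go to the shorter one).
    ref_len = min((len(r) for r in references),
                  key=lambda length: (abs(length - len(candidate)), length))
    # Brevity penalty: penalize candidates shorter than the chosen reference.
    bp = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / len(candidate))
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    return bp * geo_mean
```

As a usage example, `bleu("a dog runs on the grass".split(), ["a dog runs on the grass".split()])` scores a generated caption against one reference caption; a caption identical to its reference receives the maximum score of 1.0, while a caption sharing no words with any reference receives 0.0. Because this sketch applies no smoothing, short captions with no matching 4-gram also score 0.0, which is one reason lower-order variants such as BLEU-1 and BLEU-2 are commonly reported alongside BLEU-4.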
