Inspect Rich Documents with Gemini Multimodality and Multimodal RAG: Challenge Lab | GSP520
Author: SheCodes
Uploaded: 2025-07-20
Views: 13915
Description:
🚀 Inspect Rich Documents with Gemini Multimodality & RAG | Google Cloud Challenge Lab Walkthrough 🎯
Welcome to this video walkthrough of the “Inspect Rich Documents with Gemini Multimodality and Multimodal RAG: Challenge Lab” (GSP520) on Google Cloud Skills Boost!
This lab is part of the Gemini Multimodality Skill Badge path and is designed to test your skills in real-world scenarios. Unlike guided labs, this challenge lab provides tasks without step-by-step instructions. You'll apply your knowledge of Google’s Gemini multimodal model and Retrieval Augmented Generation (RAG) to extract insights from rich content, including text, images, and video.
🧠 What You’ll Learn in This Lab:
In this hands-on session, you'll use Vertex AI Workbench and Gemini's generative capabilities to complete two major tasks:
🔍 Task 1: Generate Multimodal Insights
Analyze and compare multiple brand images
Generate a description from a Pixel 8 Pro promotional video
Extract object tags from the video
Ask follow-up questions about the video content
Retrieve additional information beyond what's visible in the video
Gemini’s multimodal input lets you pass text, images, and video in a single prompt, so one response can draw on all of them.
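A minimal sketch of what a Task 1 style request could look like with the Vertex AI Python SDK. This is not the lab's solution: the function names `guess_mime_type` and `describe_media`, the default model name, the region, and any bucket paths are assumptions for illustration. Use the model and file locations given in your own lab instructions.

```python
import os.path

# Hypothetical lookup table covering the media types mentioned in this lab.
MIME_BY_EXTENSION = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".mp4": "video/mp4",
}


def guess_mime_type(uri):
    """Map a file URI to a MIME type based on its extension."""
    extension = os.path.splitext(uri)[1].lower()
    if extension not in MIME_BY_EXTENSION:
        raise ValueError(f"Unsupported media type: {uri}")
    return MIME_BY_EXTENSION[extension]


def describe_media(prompt, media_uris, project_id,
                   location="us-central1",        # assumed region
                   model_name="gemini-2.0-flash"):  # assumed model name
    """Send one text prompt plus one or more media files to Gemini."""
    # Imported lazily so the helper above works without the SDK installed.
    import vertexai
    from vertexai.generative_models import GenerativeModel, Part

    vertexai.init(project=project_id, location=location)
    model = GenerativeModel(model_name)

    # Each image or video becomes one Part; the text prompt goes last.
    parts = [Part.from_uri(uri, mime_type=guess_mime_type(uri))
             for uri in media_uris]
    response = model.generate_content(parts + [prompt])
    return response.text
```

Under these assumptions, a call such as `describe_media("Describe this video.", ["gs://your-bucket/promo.mp4"], project_id="your-project")` would cover the video description step, and changing only the prompt text would cover the tagging and follow-up question steps.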
📄 Task 2: Retrieval-Augmented Generation (RAG)
Here, you'll work with two documents:
Google’s Terms of Service (text-only)
A shortened 10-K financial report (text + images)
You will:
Build and inspect metadata for text chunks and images
Use helper functions like get_similar_text_from_query() and get_similar_image_from_query() to perform semantic search
Pass relevant context into Gemini and generate intelligent answers
Print citations so each answer can be traced back to its source
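The retrieval steps above can be illustrated with a small, self-contained sketch. It does not call the lab's `get_similar_text_from_query()` or `get_similar_image_from_query()` helpers, whose signatures are not shown here. Instead it uses hypothetical functions (`top_k_chunks`, `build_prompt`, `format_citations`), made-up file names, placeholder text, and hand-written three-dimensional vectors in place of real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_embedding, chunks, k=2):
    """Score every chunk against the query and keep the k best."""
    scored = [
        {**chunk, "score": cosine_similarity(query_embedding, chunk["embedding"])}
        for chunk in chunks
    ]
    return sorted(scored, key=lambda c: c["score"], reverse=True)[:k]


def build_prompt(question, hits):
    """Put the retrieved text in front of the question as numbered context."""
    context = "\n".join(f"[{i}] {h['text']}" for i, h in enumerate(hits, start=1))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def format_citations(hits):
    """One citation line per retrieved chunk, numbered to match the prompt."""
    return [
        f"[{i}] {h['file_name']}, page {h['page_num']} (score {h['score']:.2f})"
        for i, h in enumerate(hits, start=1)
    ]


# Toy metadata: file names, text, and embeddings are placeholders.
CHUNKS = [
    {"file_name": "terms_of_service.pdf", "page_num": 1,
     "text": "Placeholder text about account rules.",
     "embedding": [0.9, 0.1, 0.0]},
    {"file_name": "10k_report.pdf", "page_num": 3,
     "text": "Placeholder text about revenue.",
     "embedding": [0.1, 0.9, 0.2]},
    {"file_name": "10k_report.pdf", "page_num": 5,
     "text": "Placeholder text about operating costs.",
     "embedding": [0.0, 0.3, 0.9]},
]

query_embedding = [0.2, 0.95, 0.1]  # stands in for an embedded user question
hits = top_k_chunks(query_embedding, CHUNKS, k=2)
prompt = build_prompt("How did revenue change?", hits)
citations = format_citations(hits)

print(prompt)
print("\n".join(citations))
```

In a real run, the embeddings would come from an embedding model and `prompt` would be sent to Gemini. The citation lines reuse the same numbering as the context, which is what lets a reader check an answer against its source page.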
🔧 Tools & Setup
You’ll run this lab in Vertex AI Workbench (JupyterLab) using Python 3. The lab environment provides access to the Gemini SDK, the sample documents, and the helper functions needed for multimodal search and response generation.
✅ Ideal For:
This challenge lab is ideal for learners aiming to:
Master Gemini’s multimodal AI capabilities
Understand multimodal RAG workflows
Build real-world document insight pipelines
Earn the Gemini Multimodality Skill Badge from Google Cloud
If you enjoy the video, don’t forget to:
👍 Like
💬 Comment
🔔 Subscribe for more Google Cloud labs and AI tutorials!
GitHub file: https://github.com/BhumikaJoshi13/Ins...
#GoogleCloud #GeminiAI #MultimodalAI #VertexAI #ChallengeLab #RAG #GoogleCloudLabs #GSP520 #GenAI #AIlabs #CloudSkillsBoost