RAG (Retrieval Augmented Generation) || Docling, Re-ranker, Query Rewriting
Author: Intelligent Machines
Uploaded: 2025-04-22
Views: 424
Description:
📌 GitHub Code: https://github.com/mohan696matlab/NLP
📌 1:1 AI Consulting: https://topmate.io/balyogi_mohan_dash...
📌 freelance profile: https://www.upwork.com/freelancers/~0...
=============================
LLM Playlist: • Small to Large Language Models || Full Cou...
GenAI Project Playlist: • Generative AI Projects
Whisper ASR Playlist: • Whisper finetuning on pytorch
=============================
This video walks through the development of an advanced RAG pipeline for retrieving relevant information from the Bhagavad Gita. The pipeline involves several stages: text extraction, chunking, vectorization, query rewriting, and re-ranking. The process begins with extracting text from a PDF of the Bhagavad Gita using the Docling library, which yields cleaner, more coherent text. The text is then chunked with a recursive text splitter, which divides it into meaningful chunks without breaking sentences. These chunks are embedded into vectors using the SentenceTransformer library's MiniLM model, and the embeddings are stored in a PyTorch tensor that serves as the vector database. The pipeline is further enhanced by query rewriting, which rephrases the user's question to better match the language of the Bhagavad Gita. The rewritten query is embedded and used to retrieve the most similar chunks from the database, the top 10 candidates are re-ranked with a cross-encoder model, and the resulting context is passed to a large language model (LLM) to generate the final answer. The entire pipeline runs in under a second, making it practical for real-time applications. The video also shows how to deploy the pipeline as a Python script and discusses the potential for fine-tuning the LLM.
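
A minimal sketch of the retrieval and re-ranking stages described above (assumptions, not from the video: the PDF has already been converted to a plain-text file "gita.txt", fixed-size chunking stands in for the recursive splitter, and the checkpoint names "all-MiniLM-L6-v2" and "cross-encoder/ms-marco-MiniLM-L-6-v2" are common defaults that may differ from the ones used in the video):

from sentence_transformers import SentenceTransformer, CrossEncoder, util

# 1. Chunk the extracted text (fixed-size chunking as a stand-in for the
#    recursive splitter used in the video).
with open("gita.txt", encoding="utf-8") as f:
    text = f.read()
chunk_size = 500
chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

# 2. Embed the chunks with a MiniLM bi-encoder and keep the embeddings as a
#    single tensor, which acts as the vector database.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
chunk_embeddings = embedder.encode(chunks, convert_to_tensor=True)

# 3. Embed the (rewritten) query and fetch the 10 most similar chunks by
#    cosine similarity.
query = "How should one act without attachment to results?"  # example query
query_embedding = embedder.encode(query, convert_to_tensor=True)
scores = util.cos_sim(query_embedding, chunk_embeddings)[0]
top_idx = scores.topk(min(10, len(chunks))).indices.tolist()
candidates = [chunks[i] for i in top_idx]

# 4. Re-rank the candidates with a cross-encoder and keep the best few as
#    the context that goes into the LLM prompt.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
rerank_scores = reranker.predict([(query, c) for c in candidates])
ranked = sorted(zip(rerank_scores, candidates), reverse=True)
context = "\n\n".join(chunk for _, chunk in ranked[:3])
print(context)

In the full pipeline, the raw query would first be rewritten by the LLM and the recursive splitter would replace the fixed-size chunking shown here; everything else follows the same retrieve-then-re-rank pattern.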
🔥 Don’t forget to LIKE, SUBSCRIBE
=============================
🔗Links🔗
LinkedIn: / balyogi-mohan-dash
Google Scholar: https://scholar.google.com/citations?...
=============================
Please reach out via email for any questions: [email protected]
=============================
LLM videos:
part 1: • Simple Guide to Training Small LLMs | Part 1
part 2: • How AI Models Learn to Follow Prompts | Pa...
part 3: • How to Use Meta’s LLaMA Model – Step-by-St...
Whisper videos:
part 1: • Master Fine-Tuning OpenAI Whisper with PyT...
part 2: • Word Error Rate in Automatic Speech Recogni...
part 3: • Fine tuning Whisper with Pytorch (Simples...
TIMESTAMPS
00:00 – Understanding the Basics of a Simple RAG Pipeline
00:27 – Using Bhagavad Gita as Reference for Answering Queries
02:16 – Using Vector Databases for Large-Scale Text Search
03:46 – Running LLaMA 3.2B Model on a Consumer GPU
05:32 – Optimizing Large Language Models for Better Performance
07:07 – Improved Chunking for Better Embeddings
09:22 – Improving Document Retrieval with Enhanced Query Matching
12:47 – Extracting Text from PDF of Bhagavad Gita
18:27 – How to Find Top Similar Query Embeddings in a Database
24:43 – How to Fine-Tune a LLaMA Model with a Simple PyTorch Loop