How to Build a Finance Domain Specific LLM from Scratch Using Python

Автор: Analytics in Practice

Загружено: 2025-12-30

Просмотров: 162

Описание: This notebook walks through an end-to-end workflow for building a finance domain–specific LLM in Python, starting with a clear goal: ingest real financial language like 10-K/10-Q filings and train a model to answer finance questions, follow finance instructions, and later support citation-style retrieval with RAG. It begins by installing the core tooling and setting environment flags to reduce multiprocessing and tokenizer threading issues, which helps stability on Windows. The pipeline downloads recent SEC filings for a small set of tickers and forms using sec-edgar-downloader, while emphasizing proper SEC identification via a user agent and email. Before heavy processing, it checks available RAM to avoid crashes when loading and tokenizing large documents. Next, it traverses the SEC filing directory tree, selects high-signal files like full-submission.txt, filters out tiny or noisy documents, and builds a Hugging Face Dataset with the raw text plus metadata like file path and ticker. The notebook then tokenizes the text with a pretrained tokenizer, removes the raw text to save memory, and “packs” tokens into fixed-size blocks suitable for language-model training by concatenating sequences and chunking into 1024-token windows. To avoid redoing expensive preprocessing, it saves the tokenized and packed datasets to disk and demonstrates reloading them later. For training, it switches to a CPU-friendly base model and uses LoRA with peft plus trl’s SFTTrainer to perform a small instruction-tuning run on a subset of the packed dataset, keeping steps limited for practicality on a laptop. Finally, it shows how to load the base model with the LoRA adapter and query it using a chat-style prompt template so the model responds as a finance tutor, producing explanatory answers rather than code or unrelated output.

Не удается загрузить Youtube-плеер. Проверьте блокировку Youtube в вашей сети.
Повторяем попытку...

How to Build a Finance Domain Specific LLM from Scratch Using Python

Доступные форматы для скачивания:

Скачать видео

Информация по загрузке:

Скачать аудио

Похожие видео

Typst: Современная замена Word и LaTeX, которую ждали 40 лет

Typst: Современная замена Word и LaTeX, которую ждали 40 лет

Как использовать квантовое машинное обучение для оптимизации прибыли инвестиционного портфеля

Как использовать квантовое машинное обучение для оптимизации прибыли инвестиционного портфеля

Teach LLM Something New 💡 LoRA Fine Tuning on Custom Data

Teach LLM Something New 💡 LoRA Fine Tuning on Custom Data

Как создать магистерскую программу с нуля на Python с использованием ИИ (для начинающих)

Как создать магистерскую программу с нуля на Python с использованием ИИ (для начинающих)

RAG vs Fine-Tuning vs Prompt Engineering: Optimizing AI Models

RAG vs Fine-Tuning vs Prompt Engineering: Optimizing AI Models

Удалил Notion: Как ИИ наводит порядок в делах (n8n + NotebookLM + Gemini)

Удалил Notion: Как ИИ наводит порядок в делах (n8n + NotebookLM + Gemini)

Я в опасности

Кто пишет код лучше всех? Сравнил GPT‑5.2, Opus 4.5, Sonnet 4.5, Gemini 3, Qwen 3 Max, Kimi, GLM

Кто пишет код лучше всех? Сравнил GPT‑5.2, Opus 4.5, Sonnet 4.5, Gemini 3, Qwen 3 Max, Kimi, GLM

Доработайте свою степень магистра права за 13 минут. Вот как

Доработайте свою степень магистра права за 13 минут. Вот как

Feed Your OWN Documents to a Local Large Language Model!

Feed Your OWN Documents to a Local Large Language Model!

Самая сложная модель из тех, что мы реально понимаем

Самая сложная модель из тех, что мы реально понимаем

Превратите ЛЮБОЙ файл в знания LLM за СЕКУНДЫ

Превратите ЛЮБОЙ файл в знания LLM за СЕКУНДЫ

Самая быстрая передача файлов МЕЖДУ ВСЕМИ ТИПАМИ УСТРОЙСТВ 🚀

Самая быстрая передача файлов МЕЖДУ ВСЕМИ ТИПАМИ УСТРОЙСТВ 🚀

LLM Fine-Tuning 14: Train LLMs on Your PDF/Text Data | Domain-Specific Fine-Tuning with Hugging Face

LLM Fine-Tuning 14: Train LLMs on Your PDF/Text Data | Domain-Specific Fine-Tuning with Hugging Face

Анатомия масштабируемого проекта Python (FastAPI)

Анатомия масштабируемого проекта Python (FastAPI)

Сисадмины больше не нужны? Gemini настраивает Linux сервер и устанавливает cтек N8N. ЭТО ЗАКОННО?

Сисадмины больше не нужны? Gemini настраивает Linux сервер и устанавливает cтек N8N. ЭТО ЗАКОННО?

Fine-Tuning Local Models with LoRA in Python (Theory & Code)

Fine-Tuning Local Models with LoRA in Python (Theory & Code)

Как внимание стало настолько эффективным [GQA/MLA/DSA]

Как внимание стало настолько эффективным [GQA/MLA/DSA]

13 ПРИЁМОВ ПО РАБОТЕ С CLAUDE CODE ОТ ЕГО СОЗДАТЕЛЯ!

13 ПРИЁМОВ ПО РАБОТЕ С CLAUDE CODE ОТ ЕГО СОЗДАТЕЛЯ!

XPENG IRON - China's MOST HUMAN Robot Ever Built!

XPENG IRON - China's MOST HUMAN Robot Ever Built!