Stop Sending Your Sensitive Data to OpenAI, Google & Anthropic

Author: The AI Automators

Uploaded: 2026-02-15

Views: 1408

Description: 👉 Get access to the full Agentic RAG codebase & join hundreds of AI builders in our community https://www.theaiautomators.com/?utm_...

🔗 Get Started:
GitHub Repo: https://github.com/theaiautomators/cl...
Microsoft Presidio: https://microsoft.github.io/presidio/
Presidio Demo: https://huggingface.co/spaces/presidi...
Episode 1: The Complete Agentic RAG Build: 8 Modules,...

What if you could use powerful cloud AI models with your private company documents — without any sensitive data ever leaving your network?

In this video, we build a full redaction and anonymization system using Microsoft Presidio, local LLMs, and the Faker library, ensuring that cloud models like Claude Haiku never see real names, financials, or personal information.
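
Presidio identifies PII using pattern matching, named entity recognition, and context enhancement. The sketch below is a toy, standard-library-only imitation of two of those ideas, followed by hard redaction. Each regex recognizer has a base confidence score, and that score is boosted when a context word such as "call" or "SSN" appears shortly before the match. It does not use Presidio's actual API. The recognizer list, scores, 30-character window, and 0.5 threshold are all illustrative assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    entity_type: str
    start: int
    end: int
    score: float


# Illustrative recognizers: (entity type, regex, base score, context words).
# Weak patterns start below the threshold and only pass with supporting context.
RECOGNIZERS = [
    ("EMAIL_ADDRESS", re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), 0.9, ()),
    ("US_SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), 0.5, ("ssn", "social security")),
    ("PHONE_NUMBER", re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"), 0.4, ("phone", "call", "tel")),
]
CONTEXT_WINDOW = 30   # characters before a match that are searched for context words
CONTEXT_BOOST = 0.35  # added to the base score when a context word is found


def analyze(text, threshold=0.5):
    """Return findings whose (possibly context-boosted) score reaches the threshold."""
    lowered = text.lower()
    findings = []
    for entity_type, pattern, base_score, context_words in RECOGNIZERS:
        for match in pattern.finditer(text):
            window = lowered[max(0, match.start() - CONTEXT_WINDOW):match.start()]
            score = base_score
            if any(word in window for word in context_words):
                score = min(1.0, score + CONTEXT_BOOST)
            if score >= threshold:
                findings.append(Finding(entity_type, match.start(), match.end(), score))
    return sorted(findings, key=lambda f: f.start)


def redact(text, findings):
    """Hard redaction: irreversibly replace each finding with a type placeholder."""
    # Work from the end of the string so earlier character offsets stay valid.
    for f in sorted(findings, key=lambda f: f.start, reverse=True):
        text = text[:f.start] + f"<{f.entity_type}>" + text[f.end:]
    return text


text = "Call 555-123-4567 or email jane@example.com. SSN: 123-45-6789. Ref 555-987-6543."
print(redact(text, analyze(text)))
# Call <PHONE_NUMBER> or email <EMAIL_ADDRESS>. SSN: <US_SSN>. Ref 555-987-6543.
```

The second phone number survives because no context word supports it, so it never reaches the threshold. That is exactly the kind of miss the key takeaway warns about. Overlapping findings would also need to be merged before `redact` runs.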

We cover the real-world challenges of entity resolution, surrogate data generation, and reversible anonymization — and show you honestly where things break down and how we fixed them.
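
Reversible anonymization can be illustrated with a minimal sketch, assuming detection has already produced a list of (value, entity type) pairs. Each real value gets exactly one surrogate. The cloud model only ever sees the surrogate text, and its reply is mapped back locally. A fixed list of fake values stands in for the Faker library so the example stays deterministic and dependency-free. The class and method names are hypothetical and not taken from the video's codebase.

```python
import re


class ReversibleAnonymizer:
    """Swap detected values for surrogates, then restore them in the model's reply."""

    def __init__(self, surrogate_pool):
        # Hypothetical pool: entity type -> fake values (stand-in for Faker output).
        self._pool = {etype: list(values) for etype, values in surrogate_pool.items()}
        self.real_to_fake = {}
        self.fake_to_real = {}

    def _surrogate_for(self, real, entity_type):
        # Reuse an existing surrogate so one real value always maps to one fake value.
        if real not in self.real_to_fake:
            fake = self._pool[entity_type].pop(0)  # IndexError if the pool runs dry
            self.real_to_fake[real] = fake
            self.fake_to_real[fake] = real
        return self.real_to_fake[real]

    def anonymize(self, text, entities):
        """entities: (real value, entity type) pairs produced by a detector."""
        for real, entity_type in entities:
            self._surrogate_for(real, entity_type)
        return self._swap(text, self.real_to_fake)

    def deanonymize(self, text):
        return self._swap(text, self.fake_to_real)

    @staticmethod
    def _swap(text, mapping):
        if not mapping:
            return text
        # One regex pass with the longest keys first, so replacements never cascade.
        keys = sorted(mapping, key=len, reverse=True)
        pattern = re.compile("|".join(re.escape(k) for k in keys))
        return pattern.sub(lambda m: mapping[m.group(0)], text)


anon = ReversibleAnonymizer({"PERSON": ["Jordan Blake"], "ORG": ["Northwind Ltd"]})
masked = anon.anonymize(
    "Margaret Thompson approved the Acme Corp budget.",
    [("Margaret Thompson", "PERSON"), ("Acme Corp", "ORG")],
)
print(masked)  # Jordan Blake approved the Northwind Ltd budget.
reply = "Jordan Blake should send Northwind Ltd the revised figures."  # pretend cloud reply
print(anon.deanonymize(reply))  # Margaret Thompson should send Acme Corp the revised figures.
```

One caveat the sketch ignores: if a surrogate happens to appear in the model's reply for unrelated reasons, de-anonymization will wrongly turn it into real data.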

📌 What's covered:

Why redaction and anonymization matter (GDPR, HIPAA, CCPA, PCI-DSS)
The difference between hard redaction (irreversible) and reversible anonymization with surrogate data
How Microsoft Presidio identifies PII using pattern matching, named entity recognition, and context enhancement
The entity resolution problem — why "Margaret Thompson," "Maggie Thompson," and "M. Thompson" all need the same surrogate
Using a local LLM (Qwen 3 8B) as a safety net for entity clustering and catching missed PII
Building the full architecture with Claude Code and Agent Teams (Opus 4.6)
End-to-end testing with Langfuse tracing to verify the cloud LLM never sees real data
Hard lessons learned: why our first architecture was over-engineered and how we simplified it
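
The entity resolution problem can be made concrete with a rule-based sketch. A mention joins a cluster only if it shares a surname with every member of that cluster. Its first name must also agree after nickname normalization, or one side must be just an initial or missing. Every mention in a cluster then maps to the same surrogate. The nickname table, matching rules, and function names are illustrative assumptions. The description says the actual build uses a local LLM (Qwen 3 8B) as a safety net for clustering, and this sketch does not attempt that.

```python
# Hypothetical nickname table; a real system would need far more coverage.
NICKNAMES = {"maggie": "margaret", "meg": "margaret", "bob": "robert", "bill": "william"}


def split_name(mention):
    """Return (normalized first name or initial, surname); first is '' for surname-only mentions."""
    parts = mention.replace(".", "").split()
    first = parts[0].lower() if len(parts) > 1 else ""
    return NICKNAMES.get(first, first), parts[-1].lower()


def compatible(a, b):
    (first_a, last_a), (first_b, last_b) = split_name(a), split_name(b)
    if last_a != last_b:
        return False
    if not first_a or not first_b:              # "Thompson" alone matches any Thompson
        return True
    if len(first_a) == 1 or len(first_b) == 1:  # "M" matches any first name starting with m
        return first_a[0] == first_b[0]
    return first_a == first_b


def cluster_mentions(mentions):
    """Greedy clustering: a mention joins the first cluster it is fully compatible with."""
    clusters = []
    for mention in mentions:
        for group in clusters:
            if all(compatible(mention, other) for other in group):
                group.append(mention)
                break
        else:
            clusters.append([mention])
    return clusters


def surrogate_map(clusters, fake_names):
    """Give every mention in a cluster the same surrogate (zip stops at the shorter list)."""
    return {mention: fake for group, fake in zip(clusters, fake_names) for mention in group}


clusters = cluster_mentions(
    ["Margaret Thompson", "Maggie Thompson", "M. Thompson", "David Thompson", "Robert Chen", "Bob Chen"]
)
print(clusters)
# [['Margaret Thompson', 'Maggie Thompson', 'M. Thompson'], ['David Thompson'], ['Robert Chen', 'Bob Chen']]
```

Greedy clustering is order-dependent. "M. Thompson" joins the first compatible cluster it finds, even if a later "Michael Thompson" would have been the better match. Ambiguous cases like that are where a model-based safety net is meant to help.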

🔍 Tech stack:

Microsoft Presidio (open-source PII detection)
Faker library (surrogate data generation)
Qwen 3 8B (local LLM for entity resolution)
Claude Haiku via OpenRouter (cloud LLM)
Supabase (local Postgres + auth + storage)
React frontend / Python backend
Langfuse (self-hosted tracing)
Claude Code with Agent Teams

Key takeaway: Entity recognition is not perfect — even the best systems miss 5%+ of sensitive entities. You need defense in depth: technical safeguards, legal safeguards, and organizational policies working together.
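
One small technical safeguard in that defense-in-depth spirit is a last-line check before any outbound call. If a known real value still appears in the payload (case-insensitively), the payload is not sent. Here is a minimal sketch, assuming the real values come from the anonymization mapping. `assert_no_leak` and `LeakDetected` are hypothetical names.

```python
class LeakDetected(Exception):
    """Raised when a known sensitive value is about to leave the network."""


def assert_no_leak(payload, real_values):
    """Return the payload unchanged, or raise if any known real value appears in it."""
    lowered = payload.lower()
    leaked = [value for value in real_values if value and value.lower() in lowered]
    if leaked:
        # Report a count only, so the guard itself never logs the sensitive values.
        raise LeakDetected(f"{len(leaked)} known sensitive value(s) found in outbound payload")
    return payload


known = ["Margaret Thompson", "Acme Corp"]
safe = assert_no_leak("Jordan Blake approved the Northwind Ltd budget.", known)
try:
    assert_no_leak("Please email margaret thompson today.", known)
except LeakDetected as err:
    print(err)  # 1 known sensitive value(s) found in outbound payload
```

The guard can only catch values that were already detected. PII that the detector missed entirely passes straight through, which is why the technical layer cannot be the only safeguard.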

🔗 PRD and requirements available in the GitHub repo linked above
🔗 Full codebase available to AI Automators community members

📌 This is part of our Agentic RAG series where we're building a full AI agent web app grounded in private company knowledge.

⏱️ Timestamps:
00:00:00 The Explainer
00:16:43 Phase 1 Planning
00:33:10 Phase 1 Build
00:46:30 The Rebuild!
01:00:49 New Features & Demo

#AI #RAG #Privacy #Redaction #Anonymization #MicrosoftPresidio #ClaudeCode #AgenticRAG #PII #GDPR

