Petabyte-Scale Data Quality: Leveraging AI to Build a Production Web Classifier
Author: Machine & Deep Learning Israel
Uploaded: 2026-01-26
Views: 41
Description:
This talk is part of a joint meetup between Cyera and the MDLI community.
You can watch the other talks and view the slides here: https://mdli.co.il/classifyai
In this talk, I'll share how I tackled the challenge of filtering dead webpages at petabyte scale by combining AI, machine learning, and strategic preprocessing techniques. I'll walk through my approach to classifying pages with meaningful content versus empty or dead pages, starting with data science techniques for exploratory analysis and leveraging AI to automate the labeling process.
You'll see how I found a production-grade solution that operates at massive scale, along with the key architectural decisions that made this solution work in a real-world, high-volume environment. Whether you're dealing with large-scale data pipelines or interested in practical applications of AI for data quality problems, you'll learn how to approach similar challenges in your own infrastructure.