Generative AI Model Data Pre-Training on Kubernetes: A Use Case Study - Anish Asthana & Mohammad Nassar
Author: The Linux Foundation
Uploaded: 2025-07-02
Views: 80
Description:
Don't miss out! Join us at the next Open Source Summit in Hyderabad, India (August 5); Amsterdam, Netherlands (August 25-29); Seoul, South Korea (November 4-5). Join us at the premier vendor-neutral open source conference, where developers and technologists come together to collaborate, share knowledge, and explore the latest innovations and advancements in open source technology. Learn more at https://events.linuxfoundation.org/
Generative AI Model Data Pre-Training on Kubernetes: A Use Case Study - Anish Asthana, Red Hat & Mohammad Nassar, IBM
Large language models (LLMs) require preprocessing vast amounts of data, often petabytes, a process that can span days due to its complexity and scale. This talk demonstrates how Kubeflow Pipelines (KFP) simplifies LLM data processing with flexibility, repeatability, and scalability. These pipelines are used daily at IBM Research to build indemnified LLMs tailored for enterprise applications.
Different data preparation toolkits are built on Kubernetes, Rust, Slurm, or Spark. How would you choose one for your own LLM experiments or enterprise use cases, and why should you consider Kubernetes and KFP?
This talk describes how the open source Data Prep Toolkit leverages KFP and KubeRay for scalable orchestration of pipeline steps such as deduplication, content classification, and tokenization.
We share challenges, lessons, and insights from our experience with KFP, highlighting its applicability to diverse LLM tasks such as data preprocessing, RAG retrieval, and model fine-tuning.
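As a rough illustration of the kind of KFP-orchestrated data-prep pipeline described above, the sketch below chains two toy steps (exact-line deduplication and whitespace tokenization) with the Kubeflow Pipelines v2 SDK. The component bodies, artifact URI, and pipeline name are illustrative assumptions, not the actual Data Prep Toolkit API; a real deployment would dispatch such steps to Ray workers via KubeRay rather than run them inline.

# Minimal sketch of a KFP v2 pipeline with two toy data-prep steps.
# NOTE: component logic, names, and the artifact URI are illustrative
# assumptions, not the Data Prep Toolkit's real components.
from kfp import compiler, dsl
from kfp.dsl import Dataset, Input, Output


@dsl.component(base_image="python:3.11")
def deduplicate(raw: Input[Dataset], deduped: Output[Dataset]):
    # Drop exact duplicate lines (stand-in for real exact/fuzzy dedup).
    seen = set()
    with open(raw.path) as src, open(deduped.path, "w") as dst:
        for line in src:
            if line not in seen:
                seen.add(line)
                dst.write(line)


@dsl.component(base_image="python:3.11")
def tokenize(deduped: Input[Dataset], tokens: Output[Dataset]):
    # Whitespace tokenization as a placeholder for a real tokenizer.
    with open(deduped.path) as src, open(tokens.path, "w") as dst:
        for line in src:
            dst.write(" ".join(line.split()) + "\n")


@dsl.pipeline(name="llm-data-prep-sketch")
def data_prep_pipeline(raw_uri: str = "s3://my-bucket/raw.txt"):
    # Import the raw corpus as a Dataset artifact, then chain the steps.
    raw = dsl.importer(artifact_uri=raw_uri, artifact_class=Dataset, reimport=False)
    dedup_task = deduplicate(raw=raw.output)
    tokenize(deduped=dedup_task.outputs["deduped"])


if __name__ == "__main__":
    # Compile to a YAML package that can be uploaded to a KFP instance.
    compiler.Compiler().compile(data_prep_pipeline, "data_prep_pipeline.yaml")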