Introduction to RLHF | PyImageSearch | Learn how ChatGPT works!

Author: PyImageSearch

Uploaded: 2023-08-16

Views: 616

Description: Souradip is currently a 2nd-year Ph.D. student in Computer Science at the University of Maryland, College Park, working on the foundations of reinforcement learning in sequential decision making. He aims to develop large-scale, robust algorithms for sequential decision-making tasks under practical and challenging limitations, making them safe, fair, robust, and aligned with human behavior and preferences, and bridging the gap between theory and practice. He recently received the Outstanding Paper Award (TSRML, NeurIPS 2022) and Outstanding Reviewer Awards at NeurIPS 2022 and AISTATS 2023. As part of the Ph.D. program, he has published in venues including ICML, NeurIPS, AAAI, CoRL, and ICRA. Previously, Souradip worked for 3 years as a Research AI Scientist at Walmart Labs, India, after completing his Master's at the Indian Statistical Institute in 2018, summa cum laude. He was also a Google Developers Expert in Machine Learning (2019). He has co-authored several US patents and top-tier publications on AI and ML applications in the NLP and computer vision domains as part of Walmart Labs and GDE-ML.

Much of ChatGPT's exceptional performance can be attributed to Reinforcement Learning from Human Feedback (RLHF), which has significantly improved the performance of language models. Aligning models with human feedback is critical today for safe, secure, and trustworthy AI. RLHF provides an efficient framework for alignment using only human preferences. In this session, Souradip will introduce the RLHF framework, discuss its challenges, and outline the next steps.

Related videos

Visualizing attention, the heart of a transformer | Chapter 6, Deep Learning

Image Classification with JAX and FLAX | PyImageSearch Bonus Lesson

Emerging AI Threats and Innovations in Cybersecurity

Reinforcement Learning from Human Feedback explained with math derivations and the PyTorch code.

Reinforcement Learning, RLHF, & DPO Explained

LLM fine-tuning or TRAINING a small model? We tested it!

Introduction to KerasCV with Google Software Engineer | PyImageSearch | LiveStream

Proximal Policy Optimization (PPO): how to train large language models

How LLMs might store facts | Chapter 7, Deep Learning

AI IS AN ILLUSION OF INTELLIGENCE. But what is it, and why did it cause a revolution?

How ChatGPT works - From Transformers to Reinforcement Learning with Human Feedback (RLHF)

Robots Nobody Expected to See at CES 2026

MLOps with Weights & Biases | PyImageSearch | Live learning

How ChatGPT works: neural networks explained simply

CS 285: Eric Mitchell: Reinforcement Learning from Human Feedback: Algorithms & Applications

Reinforcement Learning from Human Feedback: From Zero to chatGPT

Hybrid Systems TC Seminar - Youssef Ait Si

But what is a neural network? | Chapter 1, Deep Learning

Dario Amodei’s message to Congress on AI

Build your own AI agent in 20 minutes (anyone can do it)