Reinforcement Learning from Human Feedback: From Zero to chatGPT

Author: HuggingFace

Uploaded: 2022-12-13

Views: 187087

Description: In this talk, we will cover the basics of Reinforcement Learning from Human Feedback (RLHF) and how this technology is being used to enable state-of-the-art ML tools like ChatGPT. Most of the talk will be an overview of the interconnected ML models and cover the basics of Natural Language Processing and RL that one needs to understand how RLHF is used on large language models. It will conclude with open questions in RLHF.

RLHF Blogpost: https://huggingface.co/blog/rlhf
The Deep RL Course: https://hf.co/deep-rl-course
Slides from this talk: https://docs.google.com/presentation/...
Nathan Twitter: / natolambert
Thomas Twitter: / thomassimonini

Nathan Lambert is a Research Scientist at HuggingFace. He received his PhD from the University of California, Berkeley, working at the intersection of machine learning and robotics. He was advised by Professor Kristofer Pister in the Berkeley Autonomous Microsystems Lab and Roberto Calandra at Meta AI Research. He was lucky to intern at Facebook AI and DeepMind during his PhD. Nathan was awarded the UC Berkeley EECS Demetri Angelakos Memorial Achievement Award for Altruism for his efforts to better community norms.
