
The RL Irony in LLMs (And its insane new Meta)

Author: bycloud

Uploaded: 2026-01-21

Views: 9814

Description: Start learning cyber security with TryHackMe: https://tryhackme.com/bycloud Use my code "BYCLOUD25" to get 25% off an annual subscription!

This video breaks down what's wrong with scaling RL for LLMs, especially as a path toward AGI, and why RL still matters anyway. RL is noisy and can hurt generalization, yet it enables exploration and self-correction that pretraining can't, so this direction leaves us stuck between a rock and a hard place. We'll also look at why LoRA is becoming the practical way to do RL cheaply. LoRA adapters are swappable, can match full fine-tuning on reasoning, and make personalized agents easier to deploy, which could make LoRA a promising way to apply RL at massive scale.
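To make "swappable adapters" concrete, here is a minimal sketch of the general LoRA idea. It assumes a single linear layer with a frozen base weight W, plus a low-rank update B·A scaled by alpha / r. This is not the implementation from LoRA Without Regret, Tina or Tinker. The class and method names (`LoRALinear`, `add_adapter`, `set_adapter`) are hypothetical and chosen only for illustration. Plain Python lists stand in for tensors so the sketch has no dependencies.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def add(x: Matrix, y: Matrix, scale: float = 1.0) -> Matrix:
    """Element-wise x + scale * y."""
    return [[a + scale * b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


class LoRALinear:
    """Hypothetical linear layer: frozen weight W plus optional low-rank adapters."""

    def __init__(self, weight: Matrix, alpha: float = 1.0):
        self.weight = weight  # frozen base weight, shape (d_out x d_in); never modified
        self.alpha = alpha    # LoRA scaling numerator; the update is scaled by alpha / r
        self.adapters: Dict[str, Tuple[Matrix, Matrix]] = {}
        self.active: Optional[str] = None  # None means "base model only"

    def add_adapter(self, name: str, a: Matrix, b: Matrix) -> None:
        # a: (r x d_in), b: (d_out x r), where r is the adapter rank.
        # In standard LoRA, b starts at zero so a fresh adapter is a no-op.
        self.adapters[name] = (a, b)

    def set_adapter(self, name: Optional[str]) -> None:
        # Swapping adapters only changes which (a, b) pair is used; W is untouched.
        self.active = name

    def forward(self, x: Matrix) -> Matrix:
        """x has shape (d_in x batch); returns W x + (alpha / r) * B (A x)."""
        out = matmul(self.weight, x)
        if self.active is None:
            return out
        a, b = self.adapters[self.active]
        r = len(a)
        delta = matmul(b, matmul(a, x))
        return add(out, delta, self.alpha / r)
```

With an identity base weight and a rank-1 adapter, switching the adapter on and off changes the output while the base weight stays fixed:

```python
layer = LoRALinear([[1.0, 0.0], [0.0, 1.0]])
layer.add_adapter("math", a=[[1.0, 0.0]], b=[[0.0], [1.0]])
x = [[1.0], [2.0]]
layer.forward(x)          # base only: [[1.0], [2.0]]
layer.set_adapter("math")
layer.forward(x)          # with adapter: [[1.0], [3.0]]
```

Only the small (A, B) pairs differ between adapters, so many of them can share one copy of the base weights.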


my latest project: Intuitive AI Academy
https://intuitiveai.academy/
code "NYNM" for 50% off forever (limited to 50)


Dwarkesh Podcast w/ AK
[YouTube] Andrej Karpathy — “We’re summoning ghosts,...

Dwarkesh Podcast w/ Ilya
[YouTube] Ilya Sutskever – We're moving from the age...

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
[Paper] https://arxiv.org/abs/2506.01939

The Path Not Taken: RLVR Provably Learns Off the Principals
[Paper] https://arxiv.org/abs/2511.08567

LoRA Without Regret
[Blog] https://thinkingmachines.ai/blog/lora/

Tina: Tiny Reasoning Models via LoRA
[Paper] https://arxiv.org/abs/2504.15777

Tinker
[Website] https://thinkingmachines.ai/tinker/


My Newsletter
https://mail.bycloud.ai/

My Patreon
  / bycloud  


Try out my new fav place to learn how to code https://scrimba.com/?via=bycloudAI

This video is supported by the kind Patrons & YouTube Members:
🙏Spam Maj, Alex, Chris LeDoux, DX Research Group, Poof N' Inu, Deagan, Robert Zawiasa, Ryszard Warzocha, Tobe2d, Louis Muk, Akkusativ, Kevin Tai, Mark Buckler, NO U, Tony Jimenez, Ângelo Fonseca, jiye, Anushka, Asad Dhamani, Binnie Yiu, Calvin Yan, Clayton Ford, Diego Silva, Etrotta, Gonzalo Fidalgo, Handenon, Hector, Jake Disco very, Michael Brenner, Nilly K, OlegWock, Daddy Wen, Shuhong Chen, Sid_Cipher, Stefan Lorenz, Sup, tantan assawade, Thipok Tham, Thomas Di Martino, Thomas Lin, Richárd Nagyfi, Paperboy, mika, Leo, Berhane-Meskel, Kadhai Pesalam, mayssam, Bill Mangrum, nyaa, Toru Mon, Lame Plane, Matej Macak


[Discord]   / discord  
[Twitter]   / bycloudai  
[Patreon]   / bycloud  
[Business Inquiries] [email protected]
[Profile & Banner Art]   / pygm7  
[Video Editor] Abhay and @Booga04
[Ko-fi] https://ko-fi.com/bycloudai