Videos from YouTube: Airesearchroundup
How Harness Complexity Affects LLM Agents
TRB: Stabilizing On-Policy LLM Distillation
CHERRL: Detecting LLM Reward Hacking in RL
LongTraceRL: Teaching LLMs Long-Context Reasoning
WBench: New Benchmark for Video World Models
q0: Efficient Multi-Epoch LLM Pretraining
Predict LLM Self-Distillation Before Training
COLLEAGUE.SKILL: Portable Skills for LLM Agents
Crafter: Multi-Agent Editable Scientific Figures
New Stateful Monitor Stops LLM Agent Attacks
TELBench: Debugging LLM Agent Trajectories
RecFM: 20x Faster Generative Physics Modeling
ProRL: Smart RL for Proactive Recommendations
SZD-50 Puchacz Glider Flight Over Żar Airstrip