TunexRL - Reinforcement Learning for Reliable LLM Explanations using Tunix
Author: Om Shree
Uploaded: 2026-01-12
Views: 17
Description:
This video presents my submission for the Google Tunix Hackathon, where I fine-tune Gemma-3-1B to produce explicit, judge-visible reasoning traces using Tunix, Google’s JAX-native post-training library.
The goal of this project is not to maximize benchmark accuracy, but to train a model that reliably explains its reasoning in a strict, reproducible format.
The model is trained end-to-end using GRPO (Group Relative Policy Optimization) in a single Kaggle TPU session, with no inference-time post-processing or output repair. If the model produces invalid output, it is surfaced exactly as generated, matching judge evaluation behavior.
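A setup like this typically scores each sampled completion with a strict format check, since GRPO only needs relative rewards within a group of samples. The sketch below is an illustrative assumption, not the project's actual reward code: the `<reasoning>`/`<answer>` tag layout and the `format_reward` function are hypothetical stand-ins for whatever judge-visible template the model was trained to emit.

```python
import re

# Hypothetical template: the video does not specify the exact format, so this
# <reasoning>/<answer> scheme is an illustrative assumption.
FORMAT_RE = re.compile(
    r"\A<reasoning>.+?</reasoning>\s*<answer>.+?</answer>\s*\Z",
    re.DOTALL,
)

def format_reward(completion: str) -> float:
    """Return 1.0 if the completion matches the strict template, else 0.0.

    GRPO normalizes rewards across a group of sampled completions, so even a
    binary signal like this yields a usable relative ranking for the policy.
    """
    return 1.0 if FORMAT_RE.match(completion) else 0.0

# One conforming and one non-conforming completion.
good = "<reasoning>2 + 2 = 4 by counting.</reasoning>\n<answer>4</answer>"
bad = "The answer is 4."
print(format_reward(good), format_reward(bad))  # 1.0 0.0
```

Because invalid outputs receive zero reward rather than being repaired, the training signal mirrors how the judge sees raw generations.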