⚡️GPT5-Codex-Max: Training Agents with Personality, Tools & Trust — Brian Fioca + Bill Chen, OpenAI

Author: Latent Space

Uploaded: 2025-12-26

Views: 3982

Description: From the frontlines of OpenAI's Codex and GPT-5 training teams, Brian and Bill are building the future of AI-powered coding, where agents don't just autocomplete: they architect, refactor, and ship entire features while you sleep. We caught up with them at the AI Engineer Conference right after the launch of Codex Max, OpenAI's newest long-running coding agent, designed to work for 24+ hours straight, manage its own context, and spawn sub-agents to parallelize work across your entire codebase.
We sat down with Brian and Bill to dig into what it actually takes to train a model that developers _trust_: why personality, communication, and planning matter as much as raw capability; how Codex is trained with strong opinions about tools (it loves rg over grep, seriously); why the abstraction layer is moving from models to full-stack agents you can plug into VS Code or Zed; and how OpenAI partners co-develop tool integrations and discover unexpected model habits (like renaming tools to match Codex's internal training). We also cover the rise of applied evals that measure real-world impact instead of academic benchmarks, why multi-turn evals are the next frontier (and Brian's "job interview eval" idea), and how coding agents are breaking out of code into personal automation, terminal workflows, and computer use. Their 2026 vision: coding agents trusted enough to handle the hardest refactors at any company, not just top-tier firms, and general enough to build integrations, organize your desktop, and unlock capabilities you'd never get access to otherwise.
We discuss:

What Codex Max is: a long-running coding agent that can work 24+ hours, manage its own context window, and spawn sub-agents for parallel work
Why the name "Max": maximalist, maximization, speed and endurance—it's simply better and faster for the same problems
Training for personality: communication, planning, context gathering, and checking your work as behavioral characteristics, not just capabilities
How Codex develops habits like preferring rg over grep, and why renaming tools to match its training (e.g., terminal-style naming) dramatically improves tool-call performance
The split between Codex (opinionated, agent-focused, optimized for the Codex harness) and GPT-5 (general, more durable across different tools and modalities)
Why the abstraction layer is moving up: from prompting models to plugging in full agents (Codex, GitHub Copilot, Zed) that package the entire stack
The rise of sub-agents and agents-using-agents: Codex Max spawning its own instances, handing off context, and parallelizing work across a codebase
How OpenAI works with coding partners on the bleeding edge to co-develop tool integrations and discover what the model is actually good at
The shift to applied evals: capturing real-world use cases instead of academic benchmarks, and why ~50% of OpenAI employees now use Codex daily
Why multi-turn evals are the next frontier: LLM-as-a-judge for entire trajectories, Brian's "job interview eval" concept, and the need for a batch multi-turn eval API
How coding agents are breaking out of code: personal automation, organizing desktops, terminal workflows, and "Devin for non-coding" use cases
Why Slack is the ultimate UI for work, and how coding agents can become your personal automation layer for email, files, and everything in between
The 2026 vision: more computer use, more trust, and coding agents capable enough that any company can access top-tier developer capabilities, not just elite firms

—
Brian & Bill (OpenAI Codex Team)

http://x.com/bfioca
https://x.com/realchillben
OpenAI Codex: \

Where to find Latent Space

X: \
Substack: \

00:00:00 Introduction: Latent Space Listeners at AI Engineer Code
00:01:27 Codex Max Launch: Training for Long-Running Coding Agents
00:03:01 Model Personality and Trust: Communication, Planning, and Self-Checking
00:05:20 Codex vs GPT-5: Opinionated Agents vs General Models
00:07:47 Tool Use and Model Habits: The Ripgrep Discovery
00:09:16 Personality Design: Verbosity vs Efficiency in Coding Agents
00:11:56 The Agent Abstraction Layer: Building on Top of Codex
00:14:08 Sub-Agents and Multi-Agent Patterns: The Future of Composition
00:16:11 Trust and Adoption: OpenAI Developers Using Codex Daily
00:17:21 Applied Evals: Real-World Testing vs Academic Benchmarks
00:19:15 Multi-Turn Evals and the Job Interview Pattern
00:21:35 Feature Request: Batch Multi-Turn Eval API
00:22:28 Beyond Code: Personal Automation and Computer Use
00:24:51 Vision-Native Agents and the UI Integration Challenge
00:25:02 2026 Predictions: Trust, Computer Use, and Democratized Excellence

