AI Dubbing Demystified: Insights from TTS Expert | Voices of the Industry Ep8 w/ Álex Pérez

Author: AI Loc Think Tank

Uploaded: 2025-11-13

Views: 135

Description: AI dubbing is everywhere right now, but very few people actually understand how these systems work under the hood. Are we really close to replacing human voice actors? Why do some AI-dubbed clips sound great while others feel… off?

In this episode of “Voices of the Industry”, Belén sits down with Álex Pérez, a text-to-speech scientist, to demystify AI dubbing, synthetic voices, and modern TTS / STS workflows. Together, they unpack the science (not just the marketing) behind AI dubbing and what it really takes to get natural, believable performances in multiple languages.

🎧 In this conversation, we cover:

✅What “AI dubbing” actually means (TTS vs. voice conversion vs. speech-to-speech)

✅The underlying tech: autoregressive vs. non-autoregressive TTS and transformer-based architectures

✅How TTS and STS models are really trained – and why good data is so hard to get

✅Hallucinations in AI-generated speech and why accent, style, and intent are still so tricky to control

✅Why voice acting and lip-sync remain the hardest parts of AI dubbing

✅Limitations of current tools for premium content (film, series, games) vs. e-learning, podcasts, etc.

✅The future: promptable TTS, scene-level generation, and multimodal AI systems handling dubbing end-to-end

🎙️ Hosted by: Belén Agulló from the AI Localization Think Tank

💡 Guest: Álex Pérez, Lead Text-to-Speech Scientist at AppTek

---
🔍In This Episode

00:00 Introduction
01:20 Meet Álex: Background and Expertise
03:42 Understanding AI Dubbing and Synthetic Voices
05:17 Technical Insights: TTS and Voice Conversion Models
13:38 Training AI Models for Dubbing
20:33 Challenges in AI Dubbing
30:49 Future of Synthetic Voices
34:44 Conclusion and Final Thoughts

---
Álex Pérez LinkedIn Profile: / alexdemartos

📖Álex's selection of recent scientific papers on AI dubbing:

➡️Microsoft's VibeVoice (autoregressive): https://arxiv.org/pdf/2508.19205
➡️F5-TTS (non-autoregressive): https://arxiv.org/pdf/2410.06885v1
➡️Kyutai's DSM (autoregressive, streaming/low-latency TTS): https://arxiv.org/pdf/2509.08753v1

👉 Subscribe to the AI Localization Think Tank channel and newsletter for more conversations like this.

📢 Join the discussion on LinkedIn and tell us: What do you think about synthetic voices and their impact in the localization industry?
