AI Dubbing Demystified: Insights from TTS Expert | Voices of the Industry Ep8 w/ Álex Pérez
Author: AI Loc Think Tank
Uploaded: 2025-11-13
Views: 135
Description:
AI dubbing is everywhere right now—but very few people actually understand how these systems work under the hood. Are we really close to replacing human voice actors? Why do some AI-dubbed clips sound great while others feel… off?
In this episode of “Voices of the Industry”, Belén sits down with Álex Pérez, a text-to-speech scientist, to demystify AI dubbing, synthetic voices, and modern TTS / STS workflows. Together, they unpack the science (not just the marketing) behind AI dubbing and what it really takes to get natural, believable performances in multiple languages.
🎧 In this conversation, we cover:
✅What “AI dubbing” actually means (TTS vs. voice conversion vs. speech-to-speech)
✅The underlying tech: autoregressive vs. non-autoregressive TTS and transformer-based architectures
✅How TTS and STS models are really trained – and why good data is so hard to get
✅Hallucinations in AI-generated speech and why accent, style, and intent are still so tricky to control
✅Why voice acting and lip-sync remain the hardest parts of AI dubbing
✅Limitations of current tools for premium content (film, series, games) vs. e-learning, podcasts, etc.
✅The future: promptable TTS, scene-level generation, and multimodal AI systems handling dubbing end-to-end
🎙️ Hosted by: Belén Agulló from the AI Localization Think Tank
💡 Guest: Álex Pérez, Lead Text-to-Speech Scientist at AppTek
---
🔍In This Episode
00:00 Introduction
01:20 Meet Álex: Background and Expertise
03:42 Understanding AI Dubbing and Synthetic Voices
05:17 Technical Insights: TTS and Voice Conversion Models
13:38 Training AI Models for Dubbing
20:33 Challenges in AI Dubbing
30:49 Future of Synthetic Voices
34:44 Conclusion and Final Thoughts
---
Álex Pérez LinkedIn Profile: / alexdemartos
📖Álex's selection of recent scientific papers on AI Dubbing:
➡️Microsoft's VibeVoice (autoregressive): https://arxiv.org/pdf/2508.19205
➡️F5-TTS (non-autoregressive): https://arxiv.org/pdf/2410.06885v1
➡️Kyutai's DSM (autoregressive, streaming/low-latency TTS): https://arxiv.org/pdf/2509.08753v1
👉 Subscribe to the AI Localization Think Tank channel and newsletter for more conversations like this.
📢 Join the discussion on LinkedIn and tell us: What do you think about synthetic voices and their impact on the localization industry?