Pushing the frontier of neural text to speech

Author: Microsoft Research

Uploaded: 2021-05-27

Views: 10480

Description: In the popular field of text to speech, the goal is to transform the written or printed word into speech that is natural and intelligible. Today, the technology is being used in products and services to help people who are blind or have low vision consume digital content, power personal digital assistants that sound more realistic, and make it easier to do two things at once, such as listening to an article online while washing dishes, among other applications. Although the quality of synthesized speech has improved thanks to neural network-based end-to-end TTS, advancing neural TTS and allowing it to be more easily integrated into product development and deployment requires overcoming a variety of remaining challenges.

In this webinar, Senior Researcher Xu Tan will talk about these challenges, specifically the high computational cost and slow inference speed in online serving; word skipping and repeating issues, poor voice quality, and lack of voice controllability; the large amounts of training data needed for improved voice synthesis; and the practical challenges in TTS voice adaptation. He’ll introduce his team’s work addressing these challenges—including fast TTS, end-to-end TTS, low-resource TTS, and adaptive TTS—as well as discuss other critical questions and opportunities to pursue in the space.

Together, you'll explore:

■ An overview of text to speech, including its evolution
■ The important challenges in neural text to speech and how to address them with dedicated research
■ How to factor product development into your research

Resource list:

■ Text to Speech (project page): https://www.microsoft.com/en-us/resea...
■ Xu Tan (publications page): https://www.microsoft.com/en-us/resea...
■ Speech Research Repository Master List (GitHub): https://speechresearch.github.io/
■ FastSpeech: Fast, Robust and Controllable Text to Speech (GitHub): https://speechresearch.github.io/fast...
■ FastSpeech 2: Fast and High-Quality End-to-End Text to Speech (GitHub): https://speechresearch.github.io/fast...
■ AdaSpeech: Adaptive Text to Speech for Custom Voice (GitHub): https://speechresearch.github.io/adas...
■ AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data (GitHub): https://speechresearch.github.io/adas...
■ LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search (GitHub): https://speechresearch.github.io/ligh...
■ LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition (GitHub): https://speechresearch.github.io/lrsp...
■ Neural Text-to-Speech previews five new languages with innovative models in the low-resource setting (blog): https://techcommunity.microsoft.com/t...
■ Microsoft Azure Text to Speech: https://azure.microsoft.com/en-us/ser...
■ Microsoft Azure Custom Voice: https://speech.microsoft.com/customvoice
■ Xu Tan (Researcher profile): https://www.microsoft.com/en-us/resea...

*This on-demand webinar features a previously recorded Q&A session and open captioning.

This webinar originally aired on May 20, 2021.

Explore more Microsoft Research webinars: https://aka.ms/msrwebinars
