HifiGAN From Scratch: Building a Neural Vocoder for Speech Synthesis
Author: Priyam Mazumdar
Uploaded: 2026-02-14
Views: 562
Description:
Code: https://github.com/priyammaz/PyTorch-...
Neural vocoders are an important part of the TTS pipeline: they take an intermediate representation, such as a mel spectrogram, and convert it to a waveform. The Griffin-Lim algorithm can also do this, but it is limited and typically produces poor audio quality. The HIFIGAN architecture is a popular choice today for training a neural network to learn the conversion instead.
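To make the mel-to-waveform conversion concrete, here is a minimal sketch of the length bookkeeping a vocoder has to respect: each mel frame covers one hop of audio, so the generator's total upsampling factor must equal the hop size. The values below (hop size 256, upsample rates 8, 8, 2, 2) are assumptions taken from the published HiFi-GAN V1 configuration, and `waveform_length` is a hypothetical helper; the code in the video may use different settings.

```python
from math import prod
from typing import Sequence

# Assumed values from the published HiFi-GAN V1 configuration;
# the video's code may use different ones.
HOP_SIZE = 256                  # waveform samples covered by one mel frame
UPSAMPLE_RATES = [8, 8, 2, 2]   # stride of each upsampling stage in the generator


def waveform_length(num_mel_frames: int, upsample_rates: Sequence[int]) -> int:
    """Number of audio samples produced from `num_mel_frames` mel frames."""
    return num_mel_frames * prod(upsample_rates)


# The total upsampling factor has to equal the hop size, otherwise the
# generated audio would not line up with the spectrogram it came from.
assert prod(UPSAMPLE_RATES) == HOP_SIZE

print(waveform_length(32, UPSAMPLE_RATES))  # 32 frames * 256 = 8192 samples
```

With these assumed settings, a training segment of 32 mel frames corresponds to 8192 audio samples.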
PREREQS:
1) I hope you saw the Tacotron2 video (Build your own Voice Generator w/ Tacotron...), as we will be reusing some of the data processing code from there!
2) I also hope you are comfortable with the basics of audio processing, like Griffin-Lim and spectrograms (Intro to Audio Processing for Deep Learning).
Timestamps:
00:00:00 - What are Vocoders?
00:03:30 - Listening to some audio samples
00:05:30 - HIFIGAN Overview
00:06:15 - Audio to Mel Conversion / Data Processing
00:14:15 - Pretraining vs Finetuning
00:15:30 - Write the MelDataset
00:39:30 - Start the Model Implementation
00:42:30 - Weight Init
00:43:30 - Implement ResidualBlock
00:55:05 - Implement Generator
01:07:40 - What is Periodicity?
01:09:30 - Implement the Multi-Period Discriminator
01:26:00 - Implement the Multi-Scale Discriminator
01:31:30 - Wrap up the Model
01:32:05 - Feature Matching Loss
01:35:30 - Discriminator Loss from LS-GAN
01:39:00 - Generator Loss
01:40:40 - Training Script
01:58:45 - Finetuning HIFIGAN on Synthetic Mel Spectrograms
02:01:20 - Results
02:03:08 - Listen to Generations
02:08:10 - What new TTS systems are doing
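The loss chapters listed above (Feature Matching Loss, Discriminator Loss from LS-GAN, Generator Loss) can be sketched without any framework. This is a plain-Python illustration, not the code from the video: the function names are hypothetical, and flat lists of floats stand in for the discriminator's output tensors and feature maps. The full HiFi-GAN generator objective also adds a mel-spectrogram L1 term and per-term weights, which are left out here.

```python
from typing import Sequence


def discriminator_loss(real_scores: Sequence[float],
                       fake_scores: Sequence[float]) -> float:
    """LS-GAN discriminator loss: push real scores toward 1, fake scores toward 0."""
    real_term = sum((1.0 - r) ** 2 for r in real_scores) / len(real_scores)
    fake_term = sum(f ** 2 for f in fake_scores) / len(fake_scores)
    return real_term + fake_term


def generator_adversarial_loss(fake_scores: Sequence[float]) -> float:
    """LS-GAN generator loss: push the discriminator's fake scores toward 1."""
    return sum((1.0 - f) ** 2 for f in fake_scores) / len(fake_scores)


def feature_matching_loss(real_features: Sequence[Sequence[float]],
                          fake_features: Sequence[Sequence[float]]) -> float:
    """Mean absolute difference per discriminator layer, summed over layers."""
    total = 0.0
    for real_layer, fake_layer in zip(real_features, fake_features):
        layer_diff = sum(abs(r - f) for r, f in zip(real_layer, fake_layer))
        total += layer_diff / len(real_layer)
    return total


# A discriminator that is maximally unsure (0.5 everywhere) pays 0.25 + 0.25.
print(discriminator_loss([0.5], [0.5]))                     # 0.5
# One layer of features differing by 0 and 2: mean absolute difference is 1.
print(feature_matching_loss([[1.0, 2.0]], [[1.0, 4.0]]))    # 1.0
```

In a real training loop these terms would be summed over every sub-discriminator of the multi-period and multi-scale discriminators.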
Socials!
X / data_adventurer
Instagram / nixielights
Linkedin / priyammaz
Discord / discord
🚀 Github: https://github.com/priyammaz
🌐 Website: https://www.priyammazumdar.com/