[Olewave's Review] AudioLM: a Language Modeling Approach to Audio Generation

Author: Olewave

Uploaded: 2023-03-11

Views: 3016

Description: Eager to train your own #Whisper or #GPT-4o model but running out of data? We offer a large-scale conversational speech dataset covering different languages and topics for #ASR, #TTS, #NLP, and other conversational AI R&D. It includes speaker labels and high-quality transcriptions. The duration of the dataset depends on the customer's needs and can extend up to 1 million hours. See the description and samples in the following post:
/ olewave-large-scaled-convesational-speech-...
Send an email to [email protected] for more details.

AudioLM: a Language Modeling Approach to Audio Generation

https://arxiv.org/abs/2209.03143
Abstract:

We introduce AudioLM, a framework for high-quality audio generation with long-term consistency. AudioLM maps the input audio to a sequence of discrete tokens and casts audio generation as a language modeling task in this representation space. We show how existing audio tokenizers provide different trade-offs between reconstruction quality and long-term structure, and we propose a hybrid tokenization scheme to achieve both objectives. Namely, we leverage the discretized activations of a masked language model pre-trained on audio to capture long-term structure and the discrete codes produced by a neural audio codec to achieve high-quality synthesis. By training on large corpora of raw audio waveforms, AudioLM learns to generate natural and coherent continuations given short prompts. When trained on speech, and without any transcript or annotation, AudioLM generates syntactically and semantically plausible speech continuations while also maintaining speaker identity and prosody for unseen speakers. Furthermore, we demonstrate how our approach extends beyond speech by generating coherent piano music continuations, despite being trained without any symbolic representation of music.
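
To make the hybrid tokenization idea concrete, here is a minimal toy sketch in Python. It is not the paper's implementation. Random vectors stand in for audio frames, a single nearest-centroid lookup stands in for the discretized activations of the pre-trained masked language model, and a small residual vector quantizer stands in for the neural audio codec. All names (`semantic_tokens`, `residual_quantize`, the codebook sizes) are hypothetical and chosen only for illustration.

```python
import random


def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def semantic_tokens(frames, codebook):
    """One coarse token per frame (toy stand-in for discretized activations)."""
    return [nearest(codebook, frame) for frame in frames]


def residual_quantize(vec, codebooks):
    """Quantize vec level by level; each level encodes what is left over."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def reconstruct(codes, codebooks):
    """Sum the selected entry of every level to approximate the input."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


def make_codebook(rng, size, dim, scale):
    """Random codebook; a real system would learn these from data."""
    return [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(size)]


rng = random.Random(0)
DIM = 4  # assumed toy feature dimension
frames = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(6)]

semantic_cb = make_codebook(rng, 8, DIM, 1.0)
acoustic_cbs = [make_codebook(rng, 16, DIM, 1.0 / 2 ** level) for level in range(3)]

sem = semantic_tokens(frames, semantic_cb)                   # long-term structure
aco = [residual_quantize(f, acoustic_cbs) for f in frames]   # fine acoustic detail

# One flat sequence a language model could be trained on:
# all semantic tokens first, then the acoustic codes frame by frame.
sequence = [("sem", t) for t in sem] + [
    ("aco", level, t) for codes in aco for level, t in enumerate(codes)
]
print(sequence[:8])
```

In this sketch the semantic stream is short and coarse, while the acoustic stream carries several codes per frame. That mirrors, in a very simplified way, the trade-off between long-term structure and reconstruction quality that the abstract describes.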

#audiolm #google #openai #gpt3 #audiogeneration #textgeneration #soundstream

