Empowering Low-Resource Languages Through Technology | Voices of the Industry Ep10 w/Felipe Sánchez
Author: AI Loc Think Tank
Uploaded: 2026-03-05
Views: 5
Description:
In this episode of “Voices of the Industry” by the AI Localization Think Tank, Belén interviews Felipe Sánchez Martínez, associate professor at the University of Alicante, about building machine translation for low-resource languages and how the field has moved from rule-based to statistical, hybrid, and neural approaches. Felipe explains how neural MT enables transfer learning and multilingual systems, but highlights key data challenges: scarce parallel corpora, inconsistent orthography, and the difficulty of crawling usable web data. He describes work on predicting language and parallelism from URLs to guide crawling, and warns that much online text may be MT output, requiring detection and careful handling of synthetic data. He also discusses community-driven data creation for Mayan languages in Guatemala, including terminology agreement, guidelines, review workflows, and scanning/OCR hurdles. Finally, he outlines a new Spanish-government-funded project using LLMs for low-resource translation, including leveraging unstructured resources like grammar books and releasing outputs as open source.
00:00 Welcome and Guest Intro
01:05 Felipe Background in MT
03:30 Why Low Resource Matters
05:32 Crawling and Filtering Data
07:44 Mayan Languages Fieldwork
11:21 Finding Translators Partners
12:40 Detecting Machine Translations
16:24 LLMs and Creativity Gap
21:04 New Funded Research Project
24:54 Teaching LLMs with Grammars
27:42 Wrap Up and Thanks
—
➡️Felipe Sánchez LinkedIn Profile: / felipe-s%c3%a1nchez-mart%c3%adnez-5817037a
➡️Link to Felipe’s research: https://www.dlsi.ua.es/~fsanchez/
➡️Link to Transducens Project website: https://transducens.github.io/ai-tralow/
👉 Subscribe to the AI Localization Think Tank channel and newsletter for more conversations like this.
📢 Join the discussion on LinkedIn and tell us: What do you think about the data challenge for low-resource languages?