Multilingual LLM Evaluation in Practical Settings - Sebastian Ruder (Meta)

Author: HiTZ zentroa

Uploaded: 2025-02-10

Views: 333

Description: Large language models (LLMs) are increasingly used in a variety of applications across the globe but do not provide equal utility across languages. In this talk, I will discuss multilingual evaluation of LLMs in two practical settings: conversational instruction-following and usage of quantized models. For the first part, I will focus on a specific aspect of multilingual conversational ability where errors result in a jarring user experience: generating text in the user’s desired language. I will describe a new benchmark and evaluation of a range of LLMs. We find that even the strongest models exhibit language confusion, i.e., they fail to consistently respond in the correct language. I will discuss what affects language confusion, how to mitigate it, and potential extensions. In the second part, I will discuss the first evaluation study of quantized multilingual LLMs across languages. We find that automatic metrics severely underestimate the negative impact of quantization and that human evaluation, which has been neglected by prior studies, is key to revealing harmful effects. Overall, I highlight limitations of multilingual LLMs and challenges of real-world multilingual evaluation.
