AI Safety Beyond Benchmarks -- Dr. Swabha Swayamdipta on Evaluation, Personalization, and Control

Author: Women in AI Research WiAIR

Uploaded: 2026-01-21

Views: 157

Description: As language models become more capable, the hardest questions are no longer just about performance, but about safety, interpretation, and control.

In this episode of Women in AI Research, we speak with Swabha Swayamdipta, Assistant Professor of Computer Science at the University of Southern California and co-Associate Director of the USC Center for AI and Society. Swabha’s research examines how the design and deployment of language models intersect with real-world risks — from how models behave in unexpected ways to how seemingly technical choices can have broader societal consequences.

We talk about AI safety from multiple angles: what it means when hidden inputs to models can sometimes be inferred from their outputs, why personalization introduces new trade-offs around privacy and user agency, and how assumptions about model behavior can quietly shape downstream harms. Rather than focusing only on accuracy or benchmarks, the conversation asks what kinds of evidence we actually need to trust these systems in practice.

CHAPTERS
00:00 Swabha's Journey into NLP Research
04:25 Navigating Career Challenges and Building Networks
08:58 The Importance of AI Safety and Reliability
10:49 Addressing Security and Privacy Concerns in Language Models
13:41 Innovations in Language Model Inversion
20:33 Balancing Personalization and Privacy in AI
27:08 Incorporating Psychological Scaffolds in Language Models
30:19 The Duality of AI: Enhancing Human Decision-Making
32:17 AI in Social Issues: Addressing Homelessness
35:18 OATH-Frames: Analyzing Public Sentiment on Homelessness
46:40 Suicide Prevention: AI's Role in Critical Interventions
56:12 The Responsibility of AI Researchers: Balancing Capability and Safety

REFERENCES
13:52 Better Language Model Inversion by Compactly Representing Next-Token Distributions (https://arxiv.org/abs/2506.17090)
27:18 Improving Language Model Personas via Rationalization with Psychological Scaffolds (https://ui.adsabs.harvard.edu/abs/202...)
35:21 OATH-Frames: Characterizing Online Attitudes Towards Homelessness with LLM Assistants (https://arxiv.org/abs/2406.14883)
46:52 Uncovering Intervention Opportunities for Suicide Prevention with Language Model Assistants (https://arxiv.org/abs/2508.18541)

🎧 Subscribe to stay updated on new episodes spotlighting brilliant women shaping the future of AI.

WiAIR website:
♾️ https://women-in-ai-research.github.io

Follow us at:
♾️ LinkedIn: / women-in-ai-research
♾️ Bluesky: https://bsky.app/profile/wiair.bsky.s...
♾️ X (Twitter): https://x.com/WiAIR_podcast

#AISafety #LanguageModels #AIResearch #ResponsibleAI #WomenInAI #NLP #MachineLearning #AIAlignment #wiair #wiairpodcast

