Prof. Furong Huang: Towards AI Security – An Interplay of Stress-Testing and Alignment

Author: AI Agent Frontier

Uploaded: 2025-09-09

Views: 120

Description: Talk Abstract: As large language models (LLMs) become increasingly integrated into critical applications, ensuring their robustness and alignment with human values is paramount. This talk explores the interplay between stress-testing LLMs and alignment strategies to secure AI systems against emerging threats. We begin by motivating the need for rigorous stress-testing approaches that expose vulnerabilities, focusing on three key challenges: hallucinations, jailbreaking, and poisoning attacks. Hallucinations, where models generate incorrect or misleading content, compromise reliability. Jailbreaking methods that bypass safety filters can be exploited to elicit harmful outputs, while data poisoning undermines model integrity and security. After identifying these challenges, we propose alignment methods that embed ethical and security constraints directly into model behavior. By systematically combining stress-testing methodologies with alignment interventions, we aim to advance AI security and foster the development of resilient, trustworthy LLMs.

Bio: Furong Huang is an Associate Professor in the Department of Computer Science at the University of Maryland. Specializing in trustworthy machine learning, security in AI, AI for sequential decision-making, and generative AI, Dr. Huang focuses on applying principles to solve practical challenges in contemporary computing, with the aim of developing efficient, robust, scalable, sustainable, ethical, and responsible machine learning algorithms. Her contributions have been recognized with best paper awards, the MIT Technology Review Innovators Under 35 Asia Pacific, the MLconf Industry Impact Research Award, the NSF CRII Award, the Microsoft Accelerate Foundation Models Research award, the Adobe Faculty Research Award, and three JP Morgan Faculty Research Awards, and she was a finalist for AI in Research - AI Researcher of the Year at the Women in AI Awards North America.

Related videos

Prof. Natasha Jaques: Multi-agent Reinforcement Learning (MARL) for LLMs

Dr. Akshara Rai: Sim2Real Learning for Home Robots

MATH 2400 UPenn Session 1

The Man Behind Google's AI Machine | Demis Hassabis Interview

Prof. Eric Xin Wang: Building AI Agents that Reason and Act Like Humans

LLM fine-tuning or TRAINING a small model? We tested it!

π0: A Foundation Model for Robotics with Sergey Levine - 719

AI IS AN ILLUSION OF INTELLIGENCE. But what is it, and why did it cause a revolution?

Why is MAX DANGEROUS? A cybersecurity specialist breaks down the app

Prof. Peter Stone: Human-in-the-Loop Machine Learning for Robot Navigation and Manipulation

"I realized this was the end": how the creator of Alice left Sber, emigrated, and is building an AI startup

Traumatologist No. 1: Your joints at 40 will be like at 20! Just adopt these simple habits

How to Secure AI Business Models

Explained: The OWASP Top 10 for Large Language Model Applications

Don't Build Agents, Build Skills Instead – Barry Zhang and Mahesh Murag, Anthropic

But what is a neural network? | Chapter 1. Deep learning

Warren Buffett: If you want to get rich, stop buying these 5 things.

Prof. Huan Sun: Advancing the Capability and Safety of Computer-Use Agents Together

From zero to your first AI agent in 25 minutes (no coding)

Major discoveries of the 21st century: why did cancer win, and what is wrong with cloning? What are the Nobel Prizes hiding?