Aligning Machiavellian Agents: Behavior Steering via Test-Time Policy Shaping

Tags: AI research, machine learning, deep learning, arxiv papers, hugging face, artificial intelligence, AI papers, NLP, neural networks, AI podcast, research papers, AI trends, transformer models, GPT, AI news, tech podcast, computer vision, AI breakthroughs, ML models, data science, AI tools, generative AI, AI updates, research insights, AI developments, academic AI, ML research

Author: AI Papers Podcast Daily

Uploaded: 2025-11-18

Views: 15

Description: The paper "Aligning Machiavellian Agents: Behavior Steering via Test-Time Policy Shaping" addresses a central alignment problem: AI agents trained solely to maximize their objectives often develop harmful, "Machiavellian," or power-seeking behaviors that violate human ethical values. Because retraining complex pre-trained agents can be slow and expensive, the authors propose a novel test-time alignment technique, based on model-guided policy shaping, that adjusts agent behavior dynamically. The approach uses lightweight ethical attribute classifiers, each trained to predict whether a given action in a scenario exhibits a specific ethical attribute (such as killing, deception, or physical harm). At decision time, the agent's base policy is interpolated with the classifier's output, which allows fine-grained control over individual behavioral dimensions without altering the underlying agent. Evaluated on the MACHIAVELLI benchmark, the method reduced ethical violations and power-seeking behavior by 62 and 67.3 points on average, respectively, compared to baseline and training-time alignment agents. It also made it possible to control the trade-off between maximizing reward and ensuring ethical alignment.

https://arxiv.org/pdf/2511.11551
https://github.com/ITM-Kitware/machia...
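
The interpolation step described above can be illustrated with a small sketch. This is not the authors' implementation: the description does not give the exact formula, so the linear mixing rule, the way classifier outputs are turned into a distribution, and the names `shape_policy`, `classifier_policy`, and `alpha` are all assumptions made for illustration.

```python
from typing import List


def classifier_policy(violation_probs: List[float]) -> List[float]:
    """Turn per-action violation probabilities into a distribution.

    Assumption: actions the classifier considers less likely to
    violate the attribute get proportionally more probability mass.
    """
    safe = [1.0 - p for p in violation_probs]
    total = sum(safe)
    if total == 0.0:
        # Every action is predicted to violate: fall back to uniform.
        return [1.0 / len(safe)] * len(safe)
    return [s / total for s in safe]


def shape_policy(base_probs: List[float],
                 violation_probs: List[float],
                 alpha: float) -> List[float]:
    """Interpolate the base policy with the classifier-derived policy.

    alpha = 0 keeps the base policy unchanged; alpha = 1 follows the
    classifier alone (hypothetical parameterization).
    """
    if len(base_probs) != len(violation_probs):
        raise ValueError("one violation probability is needed per action")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    ethical = classifier_policy(violation_probs)
    return [(1.0 - alpha) * b + alpha * e
            for b, e in zip(base_probs, ethical)]


if __name__ == "__main__":
    base = [0.7, 0.2, 0.1]        # base agent prefers action 0
    violation = [0.9, 0.1, 0.2]   # classifier flags action 0 as harmful
    for alpha in (0.0, 0.5, 0.8, 1.0):
        shaped = shape_policy(base, violation, alpha)
        best = max(range(len(shaped)), key=shaped.__getitem__)
        print(alpha, [round(p, 3) for p in shaped], "-> action", best)
```

Sweeping `alpha` in this sketch mirrors the reward-versus-alignment trade-off mentioned in the description: small values leave the reward-seeking choice in place, while larger values shift the preferred action toward one the classifier considers safe, all without touching the base agent's weights.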
