Attackers Adapt: Why “We Tested Prompt Injection” Doesn’t Mean You’re Safe
Автор: David Campbell
Загружено: 2026-03-02
Просмотров: 97
Описание:
If your agent passes 500 jailbreak prompts, that’s nice.
It doesn’t mean you’re secure.
Security is adversarial. Attackers iterate. They probe your system, learn how it responds, and route around your controls.
Static prompt injection tests are useful for regression.
They are not a realistic model of an adaptive attacker.
In this episode:
Why static jailbreak benchmarks overestimate robustness
What “the attacker moves second” actually means
How adaptive attacks work against tool-using agents
What credible adversarial evaluation should include
Why replayability and regression matter
An attacker only needs one working path. Your benchmark needs 100%.
Research referenced:
Nasr et al., “The Attacker Moves Second” (2025)
https://arxiv.org/abs/2510.09023
AgentDojo (2024)
https://arxiv.org/abs/2406.13352
Static tests aren’t useless. They’re just not the top of the pyramid.
Повторяем попытку...
Доступные форматы для скачивания:
Скачать видео
-
Информация по загрузке: