Evaluating the worst-case performance of a reinforcement learning (RL) agent
under the strongest (optimal) adversarial perturbations on state observations
(within some constraints) is crucial for understanding the robustness of RL
agents. However, finding the optimal adversary is challenging, in terms of both
whether we can find the optimal attack and how efficiently we can find it.
Existing work on adversarial RL either uses heuristic methods that may
not find the strongest adversary, or directly trains an RL-based adversary by
treating the agent as part of the environment, which can find the optimal
adversary but may become intractable in a large state space. In this paper, we
propose a novel attack algorithm in which an RL-based “director” searches
for the optimal policy perturbation and an “actor” crafts state
perturbations that follow the director's directions (i.e., the actor
executes targeted attacks). Our proposed algorithm, PA-AD, is theoretically
optimal against an RL agent and significantly more efficient than
prior RL-based methods in environments with large or pixel state spaces.
Empirical results show that PA-AD universally outperforms
state-of-the-art attack methods in a wide range of environments. Our method
can be easily applied to any RL algorithm to evaluate and improve its robustness.

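The director/actor split described above can be illustrated with a minimal toy sketch. It is not the PA-AD implementation: the linear-softmax victim, the hand-written `director_direction` (a stand-in for the learned RL director), the random-search `actor_perturb`, and all parameter names are assumptions made purely for illustration. The sketch only shows the division of labor: the director chooses a direction in the victim's policy space, and the actor searches for a bounded state perturbation that moves the victim's policy along that direction.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def victim_policy(state, weights):
    """Hypothetical victim: a fixed linear-softmax policy.

    weights[a] is the weight row for action a.
    """
    logits = [sum(w * s for w, s in zip(row, state)) for row in weights]
    return softmax(logits)


def director_direction(action_values):
    """Stand-in for the learned director (assumption, not the paper's method).

    Returns a zero-sum direction in policy space that pushes probability
    toward the action with the lowest value for the victim.
    """
    n = len(action_values)
    target = min(range(n), key=lambda a: action_values[a])
    return [(1.0 if a == target else 0.0) - 1.0 / n for a in range(n)]


def actor_perturb(state, weights, direction, eps, trials=200, seed=0):
    """Illustrative actor: random search inside the l_inf ball of radius eps.

    Keeps the candidate state whose induced change in the victim's policy
    has the largest inner product with the director's direction.
    """
    rng = random.Random(seed)
    base = victim_policy(state, weights)
    best_state, best_score = list(state), 0.0
    for _ in range(trials):
        cand = [s + rng.uniform(-eps, eps) for s in state]
        shifted = victim_policy(cand, weights)
        score = sum(d * (p - b) for d, p, b in zip(direction, shifted, base))
        if score > best_score:
            best_state, best_score = cand, score
    return best_state


if __name__ == "__main__":
    state = [1.0, -0.5]
    weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    direction = director_direction([0.5, 0.2, -1.0])  # targets action 2
    attacked = actor_perturb(state, weights, direction, eps=0.1)
    print("clean policy:   ", victim_policy(state, weights))
    print("attacked policy:", victim_policy(attacked, weights))
```

In this toy setting the director's search space is the (small) policy space rather than the state space, which is the intuition behind the efficiency claim for large or pixel state spaces. A real actor would typically use a gradient-based targeted attack instead of random search.
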