TY - GEN
T1 - Deceptive Reinforcement Learning Under Adversarial Manipulations on Cost Signals
AU - Huang, Yunhan
AU - Zhu, Quanyan
N1 - Publisher Copyright:
© 2019, Springer Nature Switzerland AG.
PY - 2019
Y1 - 2019
N2 - This paper studies reinforcement learning (RL) under malicious falsification of cost signals and introduces a quantitative framework of attack models to understand the vulnerabilities of RL. Focusing on Q-learning, we show that Q-learning algorithms converge under stealthy attacks and bounded falsifications of cost signals. We characterize the relation between the falsified cost and the Q-factors, as well as the policy learned by the learning agent, which provides fundamental limits for feasible offensive and defensive moves. We propose a robust region in terms of the cost within which the adversary can never achieve the targeted policy. We provide conditions on the falsified cost which can mislead the agent into learning an adversary’s favored policy. A numerical case study of water reservoir control is provided to show the potential hazards of RL in learning-based control systems and to corroborate the results.
AB - This paper studies reinforcement learning (RL) under malicious falsification of cost signals and introduces a quantitative framework of attack models to understand the vulnerabilities of RL. Focusing on Q-learning, we show that Q-learning algorithms converge under stealthy attacks and bounded falsifications of cost signals. We characterize the relation between the falsified cost and the Q-factors, as well as the policy learned by the learning agent, which provides fundamental limits for feasible offensive and defensive moves. We propose a robust region in terms of the cost within which the adversary can never achieve the targeted policy. We provide conditions on the falsified cost which can mislead the agent into learning an adversary’s favored policy. A numerical case study of water reservoir control is provided to show the potential hazards of RL in learning-based control systems and to corroborate the results.
KW - Adversarial learning
KW - Cybersecurity
KW - Deception and counterdeception
KW - Q-learning
KW - Reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85076422155&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85076422155&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-32430-8_14
DO - 10.1007/978-3-030-32430-8_14
M3 - Conference contribution
AN - SCOPUS:85076422155
SN - 9783030324292
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 217
EP - 237
BT - Decision and Game Theory for Security - 10th International Conference, GameSec 2019, Proceedings
A2 - Alpcan, Tansu
A2 - Vorobeychik, Yevgeniy
A2 - Baras, John S.
A2 - Dán, György
PB - Springer
T2 - 10th International Conference on Decision and Game Theory for Security, GameSec 2019
Y2 - 30 October 2019 through 1 November 2019
ER -