Deceptive Reinforcement Learning Under Adversarial Manipulations on Cost Signals

Yunhan Huang, Quanyan Zhu

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

This paper studies reinforcement learning (RL) under malicious falsification on cost signals and introduces a quantitative framework of attack models to understand the vulnerabilities of RL. Focusing on Q-learning, we show that Q-learning algorithms converge under stealthy attacks and bounded falsifications on cost signals. We characterize the relation between the falsified cost and the Q-factors as well as the policy learned by the learning agent which provides fundamental limits for feasible offensive and defensive moves. We propose a robust region in terms of the cost within which the adversary can never achieve the targeted policy. We provide conditions on the falsified cost which can mislead the agent to learn an adversary’s favored policy. A numerical case study of water reservoir control is provided to show the potential hazards of RL in learning-based control systems and corroborate the results.

Original languageEnglish (US)
Title of host publicationDecision and Game Theory for Security - 10th International Conference, GameSec 2019, Proceedings
EditorsTansu Alpcan, Yevgeniy Vorobeychik, John S. Baras, György Dán
PublisherSpringer
Pages217-237
Number of pages21
ISBN (Print)9783030324292
DOIs
StatePublished - 2019
Event10th International Conference on Decision and Game Theory for Security, GameSec 2019 - Stockholm, Sweden
Duration: Oct 30 2019Nov 1 2019

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume11836 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference10th International Conference on Decision and Game Theory for Security, GameSec 2019
Country/TerritorySweden
CityStockholm
Period10/30/1911/1/19

Keywords

  • Adversarial learning
  • Cybersecurity
  • Deception and counterdeception
  • Q-learning
  • Reinforcement learning

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science

Fingerprint

Dive into the research topics of 'Deceptive Reinforcement Learning Under Adversarial Manipulations on Cost Signals'. Together they form a unique fingerprint.

Cite this