Robust Reinforcement Learning: A Case Study in Linear Quadratic Regulation

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper studies the robustness of reinforcement learning algorithms to errors in the learning process. Specifically, we revisit the benchmark problem of discrete-time linear quadratic regulation (LQR) and study the long-standing open question: Under what conditions is the policy iteration method robustly stable from a dynamical systems perspective? Using advanced stability results in control theory, it is shown that policy iteration for LQR is inherently robust to small errors in the learning process and enjoys small-disturbance input-to-state stability: whenever the error in each iteration is bounded and small, the solutions of the policy iteration algorithm are also bounded and, moreover, enter and stay in a small neighborhood of the optimal LQR solution. As an application, a novel off-policy optimistic least-squares policy iteration is proposed for the LQR problem when the system dynamics are subject to additive stochastic disturbances. The new results in robust reinforcement learning are validated by a numerical example.
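As a concrete illustration of the policy iteration scheme the abstract refers to, the following is a minimal Python sketch of exact (error-free) policy iteration for discrete-time LQR, often attributed to Hewer: policy evaluation solves a Lyapunov equation for the current gain, and policy improvement computes the greedy gain. The matrices A, B, Q, R, the initial stabilizing gain K0, and the function name are illustrative assumptions, not values or code from the paper.

```python
# Minimal sketch of exact policy iteration for discrete-time LQR.
# All system matrices below are hypothetical placeholders.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def lqr_policy_iteration(A, B, Q, R, K0, num_iters=20):
    """Policy iteration for x_{k+1} = A x_k + B u_k with stage cost
    x' Q x + u' R u, starting from a stabilizing gain K0 (u = -K x)."""
    K = K0
    for _ in range(num_iters):
        Acl = A - B @ K
        # Policy evaluation: solve P = Acl' P Acl + Q + K' R K
        P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
        # Policy improvement: K <- (R + B' P B)^{-1} B' P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

# Usage on a hypothetical double-integrator-like system:
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[1.0, 2.0]])  # assumed stabilizing for (A, B)
K_star, P_star = lqr_policy_iteration(A, B, Q, R, K0)
print(K_star)
```

The paper's robustness result concerns the perturbed version of this iteration, where the evaluation and improvement steps are corrupted by small errors (e.g., from least-squares estimation of the value matrix from data); this exact sketch is only the nominal baseline.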

Original language: English (US)
Title of host publication: 35th AAAI Conference on Artificial Intelligence, AAAI 2021
Publisher: Association for the Advancement of Artificial Intelligence
Pages: 9303-9311
Number of pages: 9
ISBN (Electronic): 9781713835974
State: Published - 2021
Event: 35th AAAI Conference on Artificial Intelligence, AAAI 2021 - Virtual, Online
Duration: Feb 2, 2021 - Feb 9, 2021

Publication series

Name: 35th AAAI Conference on Artificial Intelligence, AAAI 2021
Volume: 10B

Conference

Conference: 35th AAAI Conference on Artificial Intelligence, AAAI 2021
City: Virtual, Online
Period: 2/2/21 - 2/9/21

ASJC Scopus subject areas

  • Artificial Intelligence
