TY - CHAP
T1 - Robust Reinforcement Learning for Stochastic Linear Quadratic Control with Multiplicative Noise
AU - Pang, Bo
AU - Jiang, Zhong-Ping
N1 - Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
AB - This chapter studies the robustness of reinforcement learning for discrete-time linear stochastic systems with multiplicative noise evolving in continuous state and action spaces. Policy iteration is one of the most popular methods in reinforcement learning, yet its robustness has remained a longstanding open problem for the stochastic linear quadratic regulator (LQR) problem with multiplicative noise. A solution in the spirit of input-to-state stability is given, guaranteeing that the solutions generated by the policy iteration algorithm remain bounded and enter a small neighborhood of the optimal solution whenever the error in each iteration is bounded and small. In addition, a novel off-policy, multiple-trajectory, optimistic least-squares policy iteration algorithm is proposed to learn a near-optimal solution of the stochastic LQR problem directly from online input/state data, without explicitly identifying the system matrices. The efficacy of the proposed algorithm is supported by rigorous convergence analysis and numerical results on a second-order example.
UR - http://www.scopus.com/inward/record.url?scp=85115142702&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85115142702&partnerID=8YFLogxK
DO - 10.1007/978-3-030-74628-5_9
M3 - Chapter
AN - SCOPUS:85115142702
T3 - Lecture Notes in Control and Information Sciences
SP - 249
EP - 277
BT - Lecture Notes in Control and Information Sciences
PB - Springer Science and Business Media Deutschland GmbH
ER -