TY - CHAP
T1 - Robust Reinforcement Learning for Stochastic Linear Quadratic Control with Multiplicative Noise
AU - Pang, Bo
AU - Jiang, Zhong Ping
N1 - Funding Information:
Acknowledgements Confucius once said, "Virtue is not left to stand alone. He who practices it will have neighbors." Laurent Praly, the former PhD advisor of the second-named author, is such a beautiful mind. His vision of and seminal contributions to control theory, especially nonlinear and adaptive control, have influenced generations of students, including the authors of this chapter. ZPJ was privileged to have Laurent as his PhD advisor during 1989–1993 and is very grateful to Laurent for introducing him to the field of nonlinear control. It was under Laurent's close guidance that ZPJ started, in 1991, working on the stability and control of interconnected nonlinear systems, work that laid the foundation for nonlinear small-gain theory. The research findings presented here are just a reflection of Laurent's vision of the relationship between control and learning. We also thank the U.S. National Science Foundation for its continued financial support.
Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
AB - This chapter studies the robustness of reinforcement learning for discrete-time linear stochastic systems with multiplicative noise evolving in continuous state and action spaces. Policy iteration is one of the popular methods in reinforcement learning, yet its robustness for the stochastic linear quadratic regulator (LQR) problem with multiplicative noise has been a longstanding open problem. A solution in the spirit of input-to-state stability is given, guaranteeing that the solutions of the policy iteration algorithm are bounded and enter a small neighborhood of the optimal solution whenever the error in each iteration is bounded and small. In addition, a novel off-policy multiple-trajectory optimistic least-squares policy iteration algorithm is proposed to learn a near-optimal solution of the stochastic LQR problem directly from online input/state data, without explicitly identifying the system matrices. The efficacy of the proposed algorithm is supported by rigorous convergence analysis and numerical results on a second-order example.
UR - http://www.scopus.com/inward/record.url?scp=85115142702&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85115142702&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-74628-5_9
DO - 10.1007/978-3-030-74628-5_9
M3 - Chapter
AN - SCOPUS:85115142702
T3 - Lecture Notes in Control and Information Sciences
SP - 249
EP - 277
BT - Lecture Notes in Control and Information Sciences
PB - Springer Science and Business Media Deutschland GmbH
ER -