Abstract
In this paper, we propose a robust reinforcement learning method for a class of linear discrete-time systems to handle the model mismatch that may be induced by the sim-to-real gap. Under the formulation of risk-sensitive linear quadratic Gaussian (LQG) control, a dual-loop policy optimization algorithm is proposed to iteratively approximate the robust and optimal controller. The convergence and robustness of the dual-loop policy optimization algorithm are rigorously analyzed. It is shown that the algorithm uniformly converges to the optimal solution. In addition, by invoking the concept of small-disturbance input-to-state stability, it is guaranteed that the algorithm still converges to a neighborhood of the optimal solution when it is subject to a sufficiently small disturbance at each step. When the system matrices are unknown, a learning-based off-policy policy optimization algorithm is proposed for the same class of linear systems with additive Gaussian noise. Numerical simulations are conducted to demonstrate the efficacy of the proposed algorithm.
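The abstract gives no pseudocode, so the following is a rough, hypothetical NumPy sketch of a model-based dual-loop policy iteration for the zero-sum LQ game formulation that underlies risk-sensitive LQG control. The names (`dual_loop_po`, `lyap`), the risk parameter `gamma2`, and the specific update rules are assumptions for illustration and may differ from the algorithm in the paper; the inner loop iterates on a worst-case disturbance gain L for a fixed controller gain K, and the outer loop improves K against the converged worst case.

```python
import numpy as np

def lyap(Ac, Qc):
    # Solve the discrete Lyapunov equation P = Ac' P Ac + Qc (Qc symmetric).
    n = Ac.shape[0]
    vec_p = np.linalg.solve(np.eye(n * n) - np.kron(Ac.T, Ac.T), Qc.reshape(-1))
    P = vec_p.reshape(n, n)
    return 0.5 * (P + P.T)  # symmetrize against round-off

def dual_loop_po(A, B, D, Q, R, gamma2, K0, n_outer=100, n_inner=100, tol=1e-9):
    # Hypothetical sketch: dynamics x+ = A x + B u + D w, controller u = -K x,
    # worst-case disturbance w = L x, game cost sum x'Qx + u'Ru - gamma2 w'w.
    n, m_w = A.shape[0], D.shape[1]
    K = K0.copy()
    P = None
    for _ in range(n_outer):
        # Inner loop: for fixed K, iterate on the worst-case gain L.
        L = np.zeros((m_w, n))
        for _ in range(n_inner):
            P = lyap(A - B @ K + D @ L,
                     Q + K.T @ R @ K - gamma2 * L.T @ L)
            L_new = np.linalg.solve(gamma2 * np.eye(m_w) - D.T @ P @ D,
                                    D.T @ P @ (A - B @ K))
            if np.linalg.norm(L_new - L) < tol:
                L = L_new
                break
            L = L_new
        # Outer loop: policy improvement on K against the worst-case L.
        K_new = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ (A + D @ L))
        if np.linalg.norm(K_new - K) < tol:
            break
        K = K_new
    return K, P
```

Under standard assumptions (a stabilizing initial gain K0 and gamma2 large enough that gamma2*I - D'PD stays positive definite), each loop drives the iterates toward the solution of the associated game algebraic Riccati equation; the learning-based off-policy variant described in the abstract would presumably replace the model-based Lyapunov solves with estimates obtained from trajectory data.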
Original language | English (US) |
---|---|
Pages (from-to) | 534-546 |
Number of pages | 13 |
Journal | Proceedings of Machine Learning Research |
Volume | 211 |
State | Published - 2023 |
Event | 5th Annual Conference on Learning for Dynamics and Control, L4DC 2023 - Philadelphia, United States |
Duration | Jun 15 2023 → Jun 16 2023 |
Keywords
- Robust reinforcement learning
- input-to-state stability (ISS)
- policy optimization (PO)
ASJC Scopus subject areas
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability