Abstract
This article proposes a novel robust reinforcement learning framework for discrete-time linear systems with model mismatch that may arise from the sim-to-real gap. A key strategy is to invoke advanced techniques from control theory. Building on the formulation of classical risk-sensitive linear quadratic Gaussian (LQG) control, a dual-loop policy optimization algorithm is proposed to generate a robust optimal controller. The dual-loop policy optimization algorithm is shown to be globally and uniformly convergent, and robust against disturbances during the learning process. This robustness property, called small-disturbance input-to-state stability, guarantees that the proposed policy optimization algorithm converges to a small neighborhood of the optimal controller as long as the disturbance at each learning step is relatively small. In addition, when the system dynamics are unknown, a novel model-free off-policy policy optimization algorithm is proposed. Finally, numerical examples are provided to illustrate the proposed algorithms.
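For context, the classical risk-sensitive (exponential-cost) LQG criterion referenced in the abstract is commonly written, in discrete time, as the cost below; the symbols ($Q$, $R$, $\beta$) follow standard usage and are not taken from the paper itself:

```latex
% Standard discrete-time risk-sensitive LQG cost (exponential-of-sum form);
% beta > 0 is the risk-sensitivity parameter, Q >= 0 and R > 0 are weights.
J_\beta(u) \;=\; \limsup_{N\to\infty}\; \frac{2}{\beta N}\,
  \log \mathbb{E}\!\left[\exp\!\Big(\tfrac{\beta}{2}\textstyle\sum_{k=0}^{N-1}
  \big(x_k^\top Q x_k + u_k^\top R u_k\big)\Big)\right]
```

The abstract does not spell out the dual-loop algorithm itself. The sketch below is only a plausible, model-based reconstruction of a generic dual-loop policy iteration for the zero-sum linear-quadratic game associated with risk-sensitive control; the function name, the gain parametrizations u = -Kx and w = Lx, and the parameter gamma are assumptions, not the authors' notation:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def dual_loop_policy_iteration(A, B, D, Q, R, gamma, K0,
                               n_outer=20, n_inner=50):
    """Illustrative (hypothetical) dual-loop policy iteration for the zero-sum
    LQ game  min_u max_w  sum_k x'Qx + u'Ru - gamma^2 w'w,  with dynamics
    x_{k+1} = A x_k + B u_k + D w_k,  u = -K x,  w = L x.
    A sketch under assumed notation, not the paper's exact algorithm."""
    n, m_w = A.shape[0], D.shape[1]
    K = K0                                   # stabilizing initial gain (assumed given)
    for _ in range(n_outer):                 # outer loop: improve the controller K
        L = np.zeros((m_w, n))               # inner-loop variable: worst-case gain
        for _ in range(n_inner):             # inner loop: worst case for fixed K
            A_cl = A - B @ K + D @ L
            Q_cl = Q + K.T @ R @ K - gamma**2 * L.T @ L
            # policy evaluation: P = Q_cl + A_cl' P A_cl (discrete Lyapunov equation)
            P = solve_discrete_lyapunov(A_cl.T, Q_cl)
            # policy improvement for the maximizing (disturbance) player
            L = np.linalg.solve(gamma**2 * np.eye(m_w) - D.T @ P @ D,
                                D.T @ P @ (A - B @ K))
        # policy improvement for the minimizing (control) player against the worst case
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ (A + D @ L))
    return K, P
```

This sketch assumes knowledge of (A, B, D); per the abstract, the paper additionally proposes a model-free off-policy variant for the case of unknown dynamics, which the sketch above does not capture.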
| Original language | English (US) |
|---|---|
| Pages (from-to) | 7678-7693 |
| Number of pages | 16 |
| Journal | IEEE Transactions on Automatic Control |
| Volume | 69 |
| Issue number | 11 |
| DOIs | |
| State | Published - 2024 |
Keywords
- Policy optimization (PO)
- risk-sensitive linear quadratic Gaussian (LQG)
- robust reinforcement learning
ASJC Scopus subject areas
- Control and Systems Engineering
- Computer Science Applications
- Electrical and Electronic Engineering