Abstract
This paper studies the robustness of reinforcement learning for discrete-time linear stochastic systems with multiplicative noise evolving in continuous state and action spaces. The robustness of policy iteration, one of the most popular methods in reinforcement learning, is a longstanding open issue for the stochastic linear quadratic regulator (LQR) problem with multiplicative noise. A solution in the spirit of small-disturbance input-to-state stability is given, guaranteeing that the solutions generated by the policy iteration algorithm remain bounded and enter a small neighborhood of the optimal solution whenever the error at each iteration is bounded and small. In addition, a novel off-policy, multiple-trajectory, optimistic least-squares policy iteration algorithm is proposed to learn a near-optimal solution of the stochastic LQR problem directly from online input/state data, without explicitly identifying the system matrices. The efficacy of the proposed algorithm is supported by rigorous convergence analysis and numerical results on a second-order example.
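To make the setting concrete, the following is a minimal model-based sketch of exact policy iteration for a discrete-time stochastic LQR with multiplicative noise, the kind of baseline recursion whose robustness the abstract discusses. All matrices, noise variances, and the `policy_evaluation`/`policy_improvement` helper names are illustrative assumptions, not the paper's second-order example, and the paper's off-policy data-driven algorithm is not reproduced here.

```python
# Sketch of (model-based) policy iteration for a stochastic LQR with
# multiplicative noise:
#   x_{k+1} = (A + a_k A1) x_k + (B + b_k B1) u_k,  u_k = -K x_k,
# where a_k, b_k are zero-mean i.i.d. with variances sa2, sb2.
# All numerical data below are placeholders, not taken from the paper.
import numpy as np

def policy_evaluation(A, B, A1, B1, sa2, sb2, Q, R, K):
    """Solve the generalized Lyapunov equation for the cost matrix P of u = -Kx:
    P = Q + K'RK + (A-BK)'P(A-BK) + sa2*A1'P A1 + sb2*(B1 K)'P(B1 K)."""
    n = A.shape[0]
    M = A - B @ K
    G = B1 @ K
    # Vectorized form: (I - M'(x)M' - sa2*A1'(x)A1' - sb2*G'(x)G') vec(P) = vec(Q + K'RK)
    L = np.eye(n * n) - np.kron(M.T, M.T) - sa2 * np.kron(A1.T, A1.T) - sb2 * np.kron(G.T, G.T)
    rhs = (Q + K.T @ R @ K).reshape(-1, order="F")
    P = np.linalg.solve(L, rhs).reshape(n, n, order="F")
    return 0.5 * (P + P.T)  # symmetrize against round-off

def policy_improvement(A, B, B1, sb2, R, P):
    """Greedy gain: K+ = (R + B'PB + sb2*B1'P B1)^{-1} B'PA."""
    return np.linalg.solve(R + B.T @ P @ B + sb2 * B1.T @ P @ B1, B.T @ P @ A)

# Illustrative second-order system (placeholder data).
A  = np.array([[0.7, 0.2], [0.0, 0.5]])
B  = np.array([[0.0], [0.1]])
A1 = 0.05 * np.eye(2)           # state-multiplicative noise channel
B1 = np.array([[0.0], [0.05]])  # input-multiplicative noise channel
sa2, sb2 = 1.0, 1.0             # noise variances
Q, R = np.eye(2), np.eye(1)

K = np.zeros((1, 2))            # mean-square stabilizing for this Schur-stable A
for it in range(50):
    P = policy_evaluation(A, B, A1, B1, sa2, sb2, Q, R, K)
    K_new = policy_improvement(A, B, B1, sb2, R, P)
    if np.linalg.norm(K_new - K) < 1e-10:
        break
    K = K_new

print("near-optimal gain K:\n", K)
print("cost matrix P:\n", P)
```

In this idealized loop each evaluation step is exact; the paper's robustness question concerns what happens when these steps are corrupted by bounded errors, and its learning algorithm replaces the model-based evaluation with least-squares estimates from trajectory data.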
| Original language | English (US) |
|---|---|
| Pages (from-to) | 240-243 |
| Number of pages | 4 |
| Journal | IFAC-PapersOnLine |
| Volume | 54 |
| Issue number | 7 |
| DOIs | |
| State | Published - Jul 1 2021 |
| Event | 19th IFAC Symposium on System Identification, SYSID 2021 - Padova, Italy. Duration: Jul 13 2021 → Jul 16 2021 |
Keywords
- Data-based control
- Input-to-state stability
- Reinforcement learning control
- Robustness analysis
- Stochastic optimal control problems
ASJC Scopus subject areas
- Control and Systems Engineering