Robust Reinforcement Learning for Risk-Sensitive Linear Quadratic Gaussian Control

Leilei Cui, Tamer Başar, Zhong-Ping Jiang

Research output: Contribution to journal › Article › peer-review


This paper proposes a novel robust reinforcement learning framework for discrete-time linear systems with model mismatch that may arise from the sim-to-real gap. A key strategy is to invoke advanced techniques from control theory. Using the formulation of classical risk-sensitive linear quadratic Gaussian (LQG) control, a dual-loop policy optimization algorithm is proposed to generate a robust optimal controller. The dual-loop policy optimization algorithm is shown to be globally and uniformly convergent, and robust against disturbances during the learning process. This robustness property, called small-disturbance input-to-state stability, guarantees that the proposed policy optimization algorithm converges to a small neighborhood of the optimal controller as long as the disturbance at each learning step is relatively small. In addition, when the system dynamics are unknown, a novel model-free off-policy policy optimization algorithm is proposed. Finally, numerical examples are provided to illustrate the proposed algorithm.
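To make the policy-optimization idea concrete, the sketch below shows a standard inner-loop policy iteration for a plain discrete-time LQR problem. This is an illustration only, not the paper's dual-loop risk-sensitive algorithm (which adds an outer loop handling the risk-sensitivity parameter); the matrices A, B, Q, R and the initial stabilizing gain are assumed purely for demonstration.

```python
# Illustrative sketch: policy iteration for discrete-time LQR.
# NOT the paper's dual-loop risk-sensitive LQG method; A, B, Q, R
# and the initial gain K are assumed values for demonstration.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[1.0, 0.1],
              [0.0, 1.0]])   # assumed system dynamics
B = np.array([[0.0],
              [0.1]])        # assumed input matrix
Q = np.eye(2)                # state cost weight
R = np.array([[1.0]])        # input cost weight

K = np.array([[1.0, 2.0]])   # assumed initial stabilizing gain
for _ in range(50):
    Acl = A - B @ K
    # Policy evaluation: solve Acl' P Acl - P + (Q + K' R K) = 0
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
    # Policy improvement: K <- (R + B' P B)^{-1} B' P A
    K_new = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    if np.linalg.norm(K_new - K) < 1e-10:
        K = K_new
        break
    K = K_new

print("converged gain K =", K)
```

Each pass evaluates the current gain by solving a discrete Lyapunov equation, then improves the gain, which is the basic inner-loop structure the abstract refers to; the robustness analysis in the paper concerns how such iterations behave when each step is perturbed by learning errors.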

Original language: English (US)
Pages (from-to): 1-16
Number of pages: 16
Journal: IEEE Transactions on Automatic Control
State: Accepted/In press - 2024


Keywords

  • Approximation algorithms
  • Convergence
  • Estimation error
  • Heuristic algorithms
  • Optimization
  • Performance analysis
  • policy optimization
  • risk-sensitive LQG
  • Robust reinforcement learning
  • Robustness

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Computer Science Applications
  • Electrical and Electronic Engineering

