A Lyapunov characterization of robust policy optimization

Leilei Cui, Zhong-Ping Jiang

Research output: Contribution to journal › Article › peer-review

Abstract

In this paper, we study the robustness of policy optimization (in particular, the Gauss–Newton gradient descent algorithm, which is equivalent to policy iteration in reinforcement learning) subject to noise at each iteration. By invoking the concept of input-to-state stability and applying Lyapunov's direct method, it is shown that, if the noise at each iteration is sufficiently small, the policy iteration algorithm converges to a small neighborhood of the optimal solution. Explicit expressions are provided for the upper bound on the noise and for the size of the neighborhood to which the policies ultimately converge. Based on Willems' fundamental lemma, a learning-based policy iteration algorithm is proposed. The persistent excitation condition can be readily guaranteed by checking the rank of the Hankel matrix associated with an exploration signal. The robustness of the learning-based policy iteration to measurement noise and unknown system disturbances is established theoretically via the input-to-state stability of policy iteration. Several numerical simulations are conducted to demonstrate the efficacy of the proposed method.
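The sketch below illustrates, under assumptions not taken from the paper, the two ingredients named in the abstract: a Kleinman-type policy iteration (the Gauss–Newton update) for continuous-time LQR with additive noise injected at each iteration, and a block-Hankel rank test for persistent excitation in the spirit of Willems' fundamental lemma. The system matrices, initial gain, noise level, and exploration signal are hypothetical placeholders chosen only for illustration; this is not the authors' exact algorithm or code.

```python
# Minimal sketch, assuming a standard continuous-time LQR setting.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

def noisy_policy_iteration(A, B, Q, R, K0, noise_scale=1e-3, iters=20, seed=0):
    """Policy iteration (Gauss-Newton) with additive noise at each iteration."""
    rng = np.random.default_rng(seed)
    K = K0
    for _ in range(iters):
        Acl = A - B @ K
        # Policy evaluation: solve Acl^T P + P Acl + Q + K^T R K = 0.
        P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
        # Policy improvement K_{i+1} = R^{-1} B^T P_i, perturbed by noise Delta_i.
        Delta = noise_scale * rng.standard_normal(K.shape)
        K = np.linalg.solve(R, B.T @ P) + Delta
    return K, P

def persistently_exciting(u, L):
    """Check persistent excitation of order L by the rank of the block-Hankel
    matrix built from the input signal u of shape (T, m)."""
    T, m = u.shape
    H = np.vstack([u[i:T - L + 1 + i].T for i in range(L)])  # (m*L, T-L+1)
    return np.linalg.matrix_rank(H) == m * L

if __name__ == "__main__":
    # Hypothetical double-integrator example.
    A = np.array([[0.0, 1.0], [0.0, 0.0]])
    B = np.array([[0.0], [1.0]])
    Q, R = np.eye(2), np.eye(1)
    K0 = np.array([[1.0, 1.0]])  # initial stabilizing gain
    K, P = noisy_policy_iteration(A, B, Q, R, K0)
    # Compare with the optimal gain from the algebraic Riccati equation:
    # with small iteration noise, K ends up in a small neighborhood of K*.
    P_star = solve_continuous_are(A, B, Q, R)
    K_star = np.linalg.solve(R, B.T @ P_star)
    print("||K - K*|| =", np.linalg.norm(K - K_star))
    # Rank check on a random exploration signal.
    u = np.random.default_rng(1).standard_normal((200, 1))
    print("Persistently exciting of order 4:", persistently_exciting(u, 4))
```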

Original language: English (US)
Pages (from-to): 374-389
Number of pages: 16
Journal: Control Theory and Technology
Volume: 21
Issue number: 3
DOIs
State: Published - Aug 2023

Keywords

  • Input-to-state stability (ISS)
  • Lyapunov’s direct method
  • Policy iteration (PI)
  • Policy optimization

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Signal Processing
  • Information Systems
  • Modeling and Simulation
  • Aerospace Engineering
  • Control and Optimization
  • Electrical and Electronic Engineering
