Optimal Output-Feedback Control of Unknown Continuous-Time Linear Systems Using Off-policy Reinforcement Learning

Hamidreza Modares, Frank L. Lewis, Zhong Ping Jiang

Research output: Contribution to journalArticlepeer-review

Abstract

A model-free off-policy reinforcement learning algorithm is developed to learn the optimal output-feedback (OPFB) solution for linear continuous-time systems. The proposed algorithm has the important feature of being applicable to the design of optimal OPFB controllers for both regulation and tracking problems. To provide a unified framework for both optimal regulation and tracking, a discounted performance function is employed and a discounted algebraic Riccati equation (ARE) is derived which gives the solution to the problem. Conditions on the existence of a solution to the discounted ARE are provided and an upper bound for the discount factor is found to assure the stability of the optimal control solution. To develop an optimal OPFB controller, it is first shown that the system state can be constructed using some limited observations on the system output over a period of the history of the system. A Bellman equation is then developed to evaluate a control policy and find an improved policy simultaneously using only some limited observations on the system output. Then, using this Bellman equation, a model-free Off-policy RL-based OPFB controller is developed without requiring the knowledge of the system state or the system dynamics. It is shown that the proposed OPFB method is more powerful than the static OPFB as it is equivalent to a state-feedback control policy. The proposed method is successfully used to solve a regulation and a tracking problem.

Original languageEnglish (US)
Article number7574391
Pages (from-to)2401-2410
Number of pages10
JournalIEEE Transactions on Cybernetics
Volume46
Issue number11
DOIs
StatePublished - Nov 2016

Keywords

  • Off-policy reinforcement learning
  • measured output data
  • optimal control
  • output feedback (OPFB)

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Information Systems
  • Human-Computer Interaction
  • Computer Science Applications
  • Electrical and Electronic Engineering

Fingerprint

Dive into the research topics of 'Optimal Output-Feedback Control of Unknown Continuous-Time Linear Systems Using Off-policy Reinforcement Learning'. Together they form a unique fingerprint.

Cite this