Abstract
A model-free off-policy reinforcement learning (RL) algorithm is developed to learn the optimal output-feedback (OPFB) solution for linear continuous-time systems. The proposed algorithm has the important feature of being applicable to the design of optimal OPFB controllers for both regulation and tracking problems. To provide a unified framework for optimal regulation and tracking, a discounted performance function is employed and a discounted algebraic Riccati equation (ARE) is derived whose solution gives the optimal control. Conditions for the existence of a solution to the discounted ARE are provided, and an upper bound on the discount factor is found that assures the stability of the optimal control solution. To develop an optimal OPFB controller, it is first shown that the system state can be reconstructed from a limited set of observations of the system output over a period of the system's history. A Bellman equation is then developed that evaluates a control policy and finds an improved policy simultaneously, using only these limited output observations. Using this Bellman equation, a model-free off-policy RL-based OPFB controller is developed that requires knowledge of neither the system state nor the system dynamics. It is shown that the proposed OPFB method is more powerful than static OPFB, as it is equivalent to a state-feedback control policy. The proposed method is successfully applied to a regulation problem and a tracking problem.
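For reference, the discounted formulation the abstract refers to can be sketched in the standard continuous-time linear quadratic setting, assuming dynamics x_dot = Ax + Bu, state weighting Q, input weighting R, and discount factor gamma; the paper's exact notation and weighting choices may differ:

```latex
% Minimal sketch of the discounted LQR setting (notation assumed, not taken
% verbatim from the paper).
\begin{align}
  % Discounted performance function evaluated along the system trajectory:
  V(x(t)) &= \int_{t}^{\infty} e^{-\gamma (\tau - t)}
             \bigl( x^{\top} Q\, x + u^{\top} R\, u \bigr)\, d\tau, \\
  % Discounted algebraic Riccati equation (ARE) whose solution P gives V(x) = x^{\top} P x:
  0 &= A^{\top} P + P A - \gamma P + Q - P B R^{-1} B^{\top} P, \\
  % Optimal policy recovered from P (a state-feedback law, which the OPFB
  % construction reproduces from measured output data):
  u^{*} &= -K x, \qquad K = R^{-1} B^{\top} P.
\end{align}
```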
| Original language | English (US) |
|---|---|
| Article number | 7574391 |
| Pages (from-to) | 2401-2410 |
| Number of pages | 10 |
| Journal | IEEE Transactions on Cybernetics |
| Volume | 46 |
| Issue number | 11 |
| State | Published - Nov 2016 |
Keywords
- Off-policy reinforcement learning
- measured output data
- optimal control
- output feedback (OPFB)
ASJC Scopus subject areas
- Software
- Control and Systems Engineering
- Information Systems
- Human-Computer Interaction
- Computer Science Applications
- Electrical and Electronic Engineering