TY - GEN
T1 - Bridging the Gap Between Reinforcement Learning and Nonlinear Output-Feedback Control
AU - Gao, Weinan
AU - Jiang, Zhong Ping
AU - Chai, Tianyou
N1 - Publisher Copyright:
© 2024 Technical Committee on Control Theory, Chinese Association of Automation.
PY - 2024
Y1 - 2024
N2 - The primary objective of this paper is to bridge the gap between reinforcement learning (RL) and nonlinear output-feedback control by developing a novel solution to direct adaptive optimal control with guaranteed closed-loop stability. More specifically, for a broad class of nonlinear affine discrete-time systems with limited output measurements, we integrate RL and advanced nonlinear control methods to devise high-fidelity direct adaptive optimal controllers from data. Under the condition of uniform observability, our original learning-based control solution begins with the reconstruction of the system state from the retrospective input and output data, akin to a deadbeat observer. Subsequently, we propose value iteration algorithms to facilitate the learning of optimal output-feedback control policies and value functions, leveraging measured input and output data. To ensure feasibility and reliability in practice, we provide rigorous convergence proofs for the proposed learning algorithms along with the stability analysis for the closed-loop system. Simulation results are presented to showcase the effectiveness of the developed methodologies, demonstrating their capability to handle the output-feedback adaptive optimal control problems of general nonlinear affine systems.
AB - The primary objective of this paper is to bridge the gap between reinforcement learning (RL) and nonlinear output-feedback control by developing a novel solution to direct adaptive optimal control with guaranteed closed-loop stability. More specifically, for a broad class of nonlinear affine discrete-time systems with limited output measurements, we integrate RL and advanced nonlinear control methods to devise high-fidelity direct adaptive optimal controllers from data. Under the condition of uniform observability, our original learning-based control solution begins with the reconstruction of the system state from the retrospective input and output data, akin to a deadbeat observer. Subsequently, we propose value iteration algorithms to facilitate the learning of optimal output-feedback control policies and value functions, leveraging measured input and output data. To ensure feasibility and reliability in practice, we provide rigorous convergence proofs for the proposed learning algorithms along with the stability analysis for the closed-loop system. Simulation results are presented to showcase the effectiveness of the developed methodologies, demonstrating their capability to handle the output-feedback adaptive optimal control problems of general nonlinear affine systems.
KW - Adaptive optimal control
KW - nonlinear systems
KW - output-feedback
KW - reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85205451265&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85205451265&partnerID=8YFLogxK
U2 - 10.23919/CCC63176.2024.10661387
DO - 10.23919/CCC63176.2024.10661387
M3 - Conference contribution
AN - SCOPUS:85205451265
T3 - Chinese Control Conference, CCC
SP - 2425
EP - 2431
BT - Proceedings of the 43rd Chinese Control Conference, CCC 2024
A2 - Na, Jing
A2 - Sun, Jian
PB - IEEE Computer Society
T2 - 43rd Chinese Control Conference, CCC 2024
Y2 - 28 July 2024 through 31 July 2024
ER -