TY - JOUR
T1 - Learning-based adaptive optimal control of linear time-delay systems
T2 - A value iteration approach
AU - Cui, Leilei
AU - Pang, Bo
AU - Krstić, Miroslav
AU - Jiang, Zhong Ping
N1 - Publisher Copyright: © 2024 Elsevier Ltd
PY - 2025/1
Y1 - 2025/1
AB - This paper proposes a novel learning-based adaptive optimal controller design method for a class of continuous-time linear time-delay systems. A key strategy is to exploit state-of-the-art reinforcement learning (RL) and adaptive dynamic programming (ADP) techniques to develop a data-driven method that learns a near-optimal controller without precise knowledge of the system dynamics. Specifically, a value iteration (VI) algorithm is proposed to solve the infinite-dimensional Riccati equation for the linear quadratic optimal control problem of time-delay systems using finite samples of input-state trajectory data. It is rigorously proved that the proposed VI algorithm converges to a near-optimal solution. Compared with the previous literature, the proposed VI algorithm has two advantages: it is developed directly for continuous-time systems, without discretization, and it does not require an initial admissible controller. The efficacy of the proposed methodology is demonstrated by two practical examples, metal cutting and autonomous driving.
KW - Adaptive dynamic programming (ADP)
KW - Learning-based control
KW - Linear time-delay systems
KW - Value iteration (VI)
UR - http://www.scopus.com/inward/record.url?scp=85205296377&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85205296377&partnerID=8YFLogxK
U2 - 10.1016/j.automatica.2024.111944
DO - 10.1016/j.automatica.2024.111944
M3 - Article
AN - SCOPUS:85205296377
SN - 0005-1098
VL - 171
JO - Automatica
JF - Automatica
M1 - 111944
ER -