TY - GEN
T1 - Regularizing policy iteration for recursive feasibility and stability
AU - Granzotto, Mathieu
AU - De Silva, Olivier Lindamulage
AU - Postoyan, Romain
AU - Nesic, Dragan
AU - Jiang, Zhong Ping
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
AB - We present a new algorithm, called policy iteration plus (PI+), for the optimal control of nonlinear deterministic discrete-time plants with general cost functions. PI+ builds upon classical policy iteration and has the distinctive feature of enforcing recursive feasibility under mild conditions, in the sense that the minimization problems solved at each iteration are guaranteed to admit a solution. While recursive feasibility is a desirable property, it appears that existing results on the policy iteration algorithm fail to ensure it in general, in contrast to PI+. We also establish the recursive stability of PI+: the policies generated at each iteration ensure a stability property for the closed-loop system. We prove our results under more general conditions than those currently available for policy iteration, notably by covering set stability. Finally, we present characterizations of near-optimality bounds for PI+ and prove the uniform convergence of the value functions generated by PI+ to the optimal value function. We believe that these results will benefit the burgeoning literature on approximate dynamic programming and reinforcement learning, where recursive feasibility is typically assumed without a clear method for verifying it and where recursive stability is essential for the safe operation of the system.
UR - http://www.scopus.com/inward/record.url?scp=85146992869&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85146992869&partnerID=8YFLogxK
DO - 10.1109/CDC51059.2022.9993315
M3 - Conference contribution
AN - SCOPUS:85146992869
T3 - Proceedings of the IEEE Conference on Decision and Control
SP - 6818
EP - 6823
BT - 2022 IEEE 61st Conference on Decision and Control, CDC 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 61st IEEE Conference on Decision and Control, CDC 2022
Y2 - 6 December 2022 through 9 December 2022
ER -