TY - GEN
T1 - Self-Triggered Markov Decision Processes
AU - Huang, Yunhan
AU - Zhu, Quanyan
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - In this paper, we study Markov Decision Processes (MDPs) with self-triggered strategies, where the idea of self-triggered control is extended to more generic MDP models. This extension broadens the applicability of self-triggered policies to a wider range of systems. We study the co-design problem of the control policy and the triggering policy to optimize two pre-specified cost criteria. The first cost criterion is introduced by incorporating a pre-specified update penalty into the traditional MDP cost criteria to reduce the use of communication resources. A novel dynamic programming (DP) equation, called the DP equation with optimized lookahead, is proposed to solve for the optimal self-triggered policy under this criterion. The second self-triggered policy maximizes the triggering time while still guaranteeing a pre-specified level of sub-optimality. Theoretical underpinnings are established for the computation and implementation of both policies. Through a gridworld numerical example, we illustrate the two policies' effectiveness in reducing resource consumption and demonstrate the tradeoffs between resource consumption and system performance.
AB - In this paper, we study Markov Decision Processes (MDPs) with self-triggered strategies, where the idea of self-triggered control is extended to more generic MDP models. This extension broadens the applicability of self-triggered policies to a wider range of systems. We study the co-design problem of the control policy and the triggering policy to optimize two pre-specified cost criteria. The first cost criterion is introduced by incorporating a pre-specified update penalty into the traditional MDP cost criteria to reduce the use of communication resources. A novel dynamic programming (DP) equation, called the DP equation with optimized lookahead, is proposed to solve for the optimal self-triggered policy under this criterion. The second self-triggered policy maximizes the triggering time while still guaranteeing a pre-specified level of sub-optimality. Theoretical underpinnings are established for the computation and implementation of both policies. Through a gridworld numerical example, we illustrate the two policies' effectiveness in reducing resource consumption and demonstrate the tradeoffs between resource consumption and system performance.
UR - http://www.scopus.com/inward/record.url?scp=85126010392&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85126010392&partnerID=8YFLogxK
U2 - 10.1109/CDC45484.2021.9682918
DO - 10.1109/CDC45484.2021.9682918
M3 - Conference contribution
AN - SCOPUS:85126010392
T3 - Proceedings of the IEEE Conference on Decision and Control
SP - 4507
EP - 4514
BT - 60th IEEE Conference on Decision and Control, CDC 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 60th IEEE Conference on Decision and Control, CDC 2021
Y2 - 13 December 2021 through 17 December 2021
ER -