TY - GEN
T1 - Blackwell online learning for Markov decision processes
AU - Li, Tao
AU - Peng, Guanze
AU - Zhu, Quanyan
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021/3/24
Y1 - 2021/3/24
N2 - This work provides a novel interpretation of Markov Decision Processes (MDP) from the online optimization viewpoint. In such an online optimization context, the policy of the MDP is viewed as the decision variable while the corresponding value function is treated as payoff feedback from the environment. Based on this interpretation, we construct a Blackwell game induced by MDP, which bridges the gap among regret minimization, Blackwell approachability theory, and learning theory for MDP. Specifically, based on the approachability theory, we propose 1) Blackwell value iteration for offline planning and 2) Blackwell Q-learning for online learning in MDP, both of which are shown to converge to the optimal solution. Our theoretical guarantees are corroborated by numerical experiments.
AB - This work provides a novel interpretation of Markov Decision Processes (MDP) from the online optimization viewpoint. In such an online optimization context, the policy of the MDP is viewed as the decision variable while the corresponding value function is treated as payoff feedback from the environment. Based on this interpretation, we construct a Blackwell game induced by MDP, which bridges the gap among regret minimization, Blackwell approachability theory, and learning theory for MDP. Specifically, based on the approachability theory, we propose 1) Blackwell value iteration for offline planning and 2) Blackwell Q-learning for online learning in MDP, both of which are shown to converge to the optimal solution. Our theoretical guarantees are corroborated by numerical experiments.
KW - Blackwell approachability
KW - No-regret learning
KW - Online optimization
KW - Reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85104985288&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85104985288&partnerID=8YFLogxK
U2 - 10.1109/CISS50987.2021.9400319
DO - 10.1109/CISS50987.2021.9400319
M3 - Conference contribution
AN - SCOPUS:85104985288
T3 - 2021 55th Annual Conference on Information Sciences and Systems, CISS 2021
BT - 2021 55th Annual Conference on Information Sciences and Systems, CISS 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 55th Annual Conference on Information Sciences and Systems, CISS 2021
Y2 - 24 March 2021 through 26 March 2021
ER -