TY - GEN
T1 - Efficient Episodic Learning of Nonstationary and Unknown Zero-Sum Games Using Expert Game Ensembles
AU - Pan, Yunian
AU - Zhu, Quanyan
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - Game theory provides essential analysis in many applications of strategic interactions. However, the question of how to construct a game model and what is its fidelity is seldom addressed. In this work, we consider learning in a class of repeated zero-sum games with unknown, time-varying payoff matrix, and noisy feedbacks, by making use of an ensemble of benchmark game models. These models can be pre-trained and collected dynamically during sequential plays. They serve as prior side information and imperfectly underpin the unknown true game model. We propose OFULinMat, an episodic learning algorithm that integrates the adaptive estimation of game models and the learning of the strategies. The proposed algorithm is shown to achieve a sublinear bound on the saddle-point regret. We show that this algorithm is provably efficient through both theoretical analysis and numerical examples. We use a dynamic honeypot allocation game as a case study to illustrate and corroborate our results. We also discuss the relationship and highlight the difference between our framework and the classical adversarial multi-armed bandit framework.
AB - Game theory provides essential analysis in many applications of strategic interactions. However, the question of how to construct a game model and what is its fidelity is seldom addressed. In this work, we consider learning in a class of repeated zero-sum games with unknown, time-varying payoff matrix, and noisy feedbacks, by making use of an ensemble of benchmark game models. These models can be pre-trained and collected dynamically during sequential plays. They serve as prior side information and imperfectly underpin the unknown true game model. We propose OFULinMat, an episodic learning algorithm that integrates the adaptive estimation of game models and the learning of the strategies. The proposed algorithm is shown to achieve a sublinear bound on the saddle-point regret. We show that this algorithm is provably efficient through both theoretical analysis and numerical examples. We use a dynamic honeypot allocation game as a case study to illustrate and corroborate our results. We also discuss the relationship and highlight the difference between our framework and the classical adversarial multi-armed bandit framework.
UR - http://www.scopus.com/inward/record.url?scp=85126029423&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85126029423&partnerID=8YFLogxK
U2 - 10.1109/CDC45484.2021.9682998
DO - 10.1109/CDC45484.2021.9682998
M3 - Conference contribution
AN - SCOPUS:85126029423
T3 - Proceedings of the IEEE Conference on Decision and Control
SP - 1669
EP - 1676
BT - 60th IEEE Conference on Decision and Control, CDC 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 60th IEEE Conference on Decision and Control, CDC 2021
Y2 - 13 December 2021 through 17 December 2021
ER -