TY - JOUR

T1 - Strategizing against Learners in Bayesian Games

AU - Mansour, Yishay

AU - Mohri, Mehryar

AU - Schneider, Jon

AU - Sivan, Balasubramanian

N1 - Funding Information:
YM received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement No. 882396), the Israel Science Foundation(grant number 993/17), Tel Aviv University Center for AI and Data Science (TAD), and the Yandex Initiative for Machine Learning at Tel Aviv University.
Publisher Copyright:
© 2022 Y. Mansour, M. Mohri, J. Schneider & B. Sivan.

PY - 2022

Y1 - 2022

N2 - We study repeated two-player games where one of the players, the learner, employs a no-regret learning strategy, while the other, the optimizer, is a rational utility maximizer. We consider general Bayesian games, where the payoffs of both the optimizer and the learner could depend on the type, which is drawn from a publicly known distribution, but revealed privately to the learner. We address the following questions: (a) what is the bare minimum that the optimizer can guarantee to obtain regardless of the no-regret learning algorithm employed by the learner? (b) are there learning algorithms that cap the optimizer payoff at this minimum? (c) can these algorithms be implemented efficiently? While building this theory of optimizer-learner interactions, we define a new combinatorial notion of regret called polytope swap regret, that could be of independent interest in other settings.

AB - We study repeated two-player games where one of the players, the learner, employs a no-regret learning strategy, while the other, the optimizer, is a rational utility maximizer. We consider general Bayesian games, where the payoffs of both the optimizer and the learner could depend on the type, which is drawn from a publicly known distribution, but revealed privately to the learner. We address the following questions: (a) what is the bare minimum that the optimizer can guarantee to obtain regardless of the no-regret learning algorithm employed by the learner? (b) are there learning algorithms that cap the optimizer payoff at this minimum? (c) can these algorithms be implemented efficiently? While building this theory of optimizer-learner interactions, we define a new combinatorial notion of regret called polytope swap regret, that could be of independent interest in other settings.

KW - Bayesian games

KW - Stackelberg value

KW - swap regret

UR - http://www.scopus.com/inward/record.url?scp=85154051026&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85154051026&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85154051026

SN - 2640-3498

VL - 178

SP - 5221

EP - 5252

JO - Proceedings of Machine Learning Research

JF - Proceedings of Machine Learning Research

T2 - 35th Conference on Learning Theory, COLT 2022

Y2 - 2 July 2022 through 5 July 2022

ER -