TY - JOUR
T1 - Bandits with feedback graphs and switching costs
AU - Arora, Raman
AU - Marinov, Teodor V.
AU - Mohri, Mehryar
N1 - Funding Information:
This research was partly supported by NSF BIGDATA grants IIS-1546482 and NSF IIS-1838139, and by NSF CCF-1535987, NSF IIS-1618662, and a Google Research Award.
Publisher Copyright:
© 2019 Neural information processing systems foundation. All rights reserved.
PY - 2019
Y1 - 2019
N2 - We study the adversarial multi-armed bandit problem where the learner is supplied with partial observations modeled by a feedback graph and where shifting to a new action incurs a fixed switching cost. We give two new algorithms for this problem in the informed setting. Our best algorithm achieves a pseudo-regret of Õ(?(G) 1 3 T 2 3 ), where ?(G) is the domination number of the feedback graph. This significantly improves upon the previous best result for the same problem, which was based on the independence number of G. We also present matching lower bounds for our result that we describe in detail. Finally, we give a new algorithm with improved policy regret bounds when partial counterfactual feedback is available.
AB - We study the adversarial multi-armed bandit problem where the learner is supplied with partial observations modeled by a feedback graph and where shifting to a new action incurs a fixed switching cost. We give two new algorithms for this problem in the informed setting. Our best algorithm achieves a pseudo-regret of Õ(?(G) 1 3 T 2 3 ), where ?(G) is the domination number of the feedback graph. This significantly improves upon the previous best result for the same problem, which was based on the independence number of G. We also present matching lower bounds for our result that we describe in detail. Finally, we give a new algorithm with improved policy regret bounds when partial counterfactual feedback is available.
UR - http://www.scopus.com/inward/record.url?scp=85090175879&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85090175879&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85090175879
SN - 1049-5258
VL - 32
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019
Y2 - 8 December 2019 through 14 December 2019
ER -