Bandits with feedback graphs and switching costs

Raman Arora, Teodor V. Marinov, Mehryar Mohri

Research output: Contribution to journalConference article

Abstract

We study the adversarial multi-armed bandit problem where the learner is supplied with partial observations modeled by a feedback graph and where shifting to a new action incurs a fixed switching cost. We give two new algorithms for this problem in the informed setting. Our best algorithm achieves a pseudo-regret of Õ(?(G) 1 3 T 2 3 ), where ?(G) is the domination number of the feedback graph. This significantly improves upon the previous best result for the same problem, which was based on the independence number of G. We also present matching lower bounds for our result that we describe in detail. Finally, we give a new algorithm with improved policy regret bounds when partial counterfactual feedback is available.

Original languageEnglish (US)
JournalAdvances in Neural Information Processing Systems
Volume32
StatePublished - 2019
Event33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019 - Vancouver, Canada
Duration: Dec 8 2019Dec 14 2019

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Fingerprint Dive into the research topics of 'Bandits with feedback graphs and switching costs'. Together they form a unique fingerprint.

Cite this