Pure Exploration of Multi-armed Bandit under Matroid Constraints

Lijie Chen, Anupam Gupta, Jian Li

Research output: Contribution to journal › Conference article › peer-review

Abstract

We study the pure exploration problem subject to a matroid constraint (BEST-BASIS) in a stochastic multi-armed bandit game. In a BEST-BASIS instance, we are given n stochastic arms with unknown reward distributions, as well as a matroid M over the arms. Let the weight of an arm be the mean of its reward distribution. Our goal is to identify a basis of M with the maximum total weight, using as few samples as possible. The problem is a significant generalization of the best arm identification problem and the top-k arm identification problem, both of which have attracted considerable attention in recent years. We study both the exact and PAC versions of BEST-BASIS, and provide algorithms with nearly-optimal sample complexities for these versions. Our results generalize and/or improve on several previous results for the top-k arm identification problem and the combinatorial pure exploration problem when the combinatorial constraint is a matroid.
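To make the problem setting concrete, the following sketch (not the paper's algorithm; the names greedy_max_weight_basis, is_independent, and pull are hypothetical) contrasts the offline task, where a maximum-weight basis is found by the classical greedy matroid algorithm on the true means, with a naive uniform-sampling baseline that first estimates each arm's mean and then applies the same greedy step. The paper's algorithms are adaptive and far more sample-efficient than this baseline; the example only illustrates the objective.

import random

def greedy_max_weight_basis(weights, is_independent):
    """Classical greedy matroid algorithm: scan arms in decreasing weight
    order and keep an arm whenever adding it preserves independence."""
    basis = []
    for arm in sorted(range(len(weights)), key=lambda i: -weights[i]):
        if is_independent(basis + [arm]):
            basis.append(arm)
    return basis

def naive_uniform_sampling(pull, n, is_independent, samples_per_arm=1000):
    """Naive baseline: pull every arm the same number of times, then run
    greedy on the empirical means (the paper's adaptive algorithms spend
    far fewer samples on arms that are easy to rule in or out)."""
    empirical_means = [
        sum(pull(arm) for _ in range(samples_per_arm)) / samples_per_arm
        for arm in range(n)
    ]
    return greedy_max_weight_basis(empirical_means, is_independent)

if __name__ == "__main__":
    # Example: a uniform matroid of rank k over n arms, in which case
    # BEST-BASIS reduces to top-k arm identification.
    n, k = 6, 3
    true_means = [0.1, 0.9, 0.4, 0.8, 0.3, 0.7]
    is_independent = lambda s: len(s) <= k          # uniform matroid oracle
    pull = lambda arm: 1.0 if random.random() < true_means[arm] else 0.0  # Bernoulli reward

    print("true best basis:", sorted(greedy_max_weight_basis(true_means, is_independent)))
    print("estimated basis:", sorted(naive_uniform_sampling(pull, n, is_independent)))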

Original language: English (US)
Pages (from-to): 647-669
Number of pages: 23
Journal: Journal of Machine Learning Research
Volume: 49
Issue number: June
State: Published - Jun 6 2016
Event: 29th Conference on Learning Theory, COLT 2016 - New York, United States
Duration: Jun 23 2016 - Jun 26 2016

Keywords

  • Matroid
  • Multi-armed bandit
  • Pure exploration

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Software
  • Statistics and Probability
  • Artificial Intelligence
