Abstract
We study the pure exploration problem subject to a matroid constraint (BEST-BASIS) in a stochastic multi-armed bandit game. In a BEST-BASIS instance, we are given n stochastic arms with unknown reward distributions, as well as a matroid M over the arms. Let the weight of an arm be the mean of its reward distribution. Our goal is to identify a basis of M with the maximum total weight, using as few samples as possible. The problem is a significant generalization of the best arm identification problem and the top-k arm identification problem, which have attracted significant attentions in recent years. We study both the exact and PAC versions of BEST-BASIS, and provide algorithms with nearly-optimal sample complexities for these versions. Our results generalize and/or improve on several previous results for the top-k arm identification problem and the combinatorial pure exploration problem when the combinatorial constraint is a matroid.
Original language | English (US) |
---|---|
Pages (from-to) | 647-669 |
Number of pages | 23 |
Journal | Journal of Machine Learning Research |
Volume | 49 |
Issue number | June |
State | Published - Jun 6 2016 |
Event | 29th Conference on Learning Theory, COLT 2016 - New York, United States Duration: Jun 23 2016 → Jun 26 2016 |
Keywords
- Matroid
- Multi-armed bandit
- Pure exploration
ASJC Scopus subject areas
- Control and Systems Engineering
- Software
- Statistics and Probability
- Artificial Intelligence