Adversarial Combinatorial Bandits with General Non-linear Reward Functions

Xi Chen, Yanjun Han, Yining Wang

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

In this paper we study the adversarial combinatorial bandit with a known non-linear reward function, extending existing work on adversarial linear combinatorial bandit. The adversarial combinatorial bandit with general non-linear reward is an important open problem in bandit literature, and it is still unclear whether there is a significant gap from the case of linear reward, stochastic bandit, or semi-bandit feedback. We show that, with N arms and subsets of K arms being chosen at each of T time periods, the minimax optimal regret is Θ(equation presented) d(√NdT) if the reward function is a d-degree polynomial with d < K, and ΘK(NKT) if the reward function is not a low-degree polynomial. Both bounds are significantly different from the bound O(√poly(N, K)T) for the linear case, which suggests that there is a fundamental gap between the linear and non-linear reward structures. Our result also finds applications to adversarial assortment optimization problem in online recommendation. We show that in the worst-case of adversarial assortment problem, the optimal algorithm must treat each individual (NK) assortment as independent.

Original languageEnglish (US)
Title of host publicationProceedings of the 38th International Conference on Machine Learning, ICML 2021
PublisherML Research Press
Pages4030-4039
Number of pages10
ISBN (Electronic)9781713845065
StatePublished - 2021
Event38th International Conference on Machine Learning, ICML 2021 - Virtual, Online
Duration: Jul 18 2021Jul 24 2021

Publication series

NameProceedings of Machine Learning Research
Volume139
ISSN (Electronic)2640-3498

Conference

Conference38th International Conference on Machine Learning, ICML 2021
CityVirtual, Online
Period7/18/217/24/21

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

Fingerprint

Dive into the research topics of 'Adversarial Combinatorial Bandits with General Non-linear Reward Functions'. Together they form a unique fingerprint.

Cite this