STATISTICAL COMPLEXITY AND OPTIMAL ALGORITHMS FOR NONLINEAR RIDGE BANDITS

Nived Rajaraman, Yanjun Han, Jiantao Jiao, Kannan Ramchandran

Research output: Contribution to journal › Article › peer-review

Abstract

We consider the sequential decision-making problem where the mean outcome is a nonlinear function of the chosen action. Compared with the linear model, two curious phenomena arise in nonlinear models: first, in addition to the “learning phase” with a standard parametric rate for estimation or regret, there is a “burn-in period” with a fixed cost determined by the nonlinear function; second, achieving the smallest burn-in cost requires new exploration algorithms. For a special family of nonlinear functions, known as ridge functions in the literature, we derive upper and lower bounds on the optimal burn-in cost and, in addition, on the entire learning trajectory during the burn-in period via differential equations. In particular, a two-stage algorithm that first finds a good initial action and then treats the problem as locally linear is statistically optimal. In contrast, several classical algorithms, such as UCB and algorithms relying on regression oracles, are provably suboptimal.
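The two-stage idea in the abstract can be illustrated on a toy ridge model. Everything below is an illustrative assumption rather than the paper's actual algorithm: the cubic link `g`, the noise level, the random-search burn-in, and the local least-squares fit are all hypothetical choices made only to show the "find a good initial action, then treat the problem as locally linear" structure.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
theta = rng.standard_normal(d)
theta /= np.linalg.norm(theta)          # unknown unit-norm parameter

def g(t):
    # Hypothetical ridge nonlinearity; flat near t = 0, so actions nearly
    # orthogonal to theta are uninformative (the source of the burn-in cost).
    return t ** 3

def pull(a):
    # Noisy bandit feedback: mean reward g(theta . a) plus Gaussian noise.
    return g(theta @ a) + 0.1 * rng.standard_normal()

# Stage 1 ("burn-in"): search random unit actions for one whose mean reward
# is clearly nonzero, i.e. theta . a is bounded away from g's flat region.
best_a, best_est = None, 0.0
for _ in range(200):
    a = rng.standard_normal(d)
    a /= np.linalg.norm(a)
    est = np.mean([pull(a) for _ in range(20)])
    if abs(est) > abs(best_est):
        best_a, best_est = a, est

# Stage 2 ("learning"): near best_a the model is locally linear,
# y ~ const + g'(t0) * theta . (a - best_a), so a centered least-squares fit
# on small perturbations of best_a recovers the direction of theta.
A = best_a + 0.1 * rng.standard_normal((500, d))
y = np.array([pull(a) for a in A])
beta, *_ = np.linalg.lstsq(A - A.mean(axis=0), y - y.mean(), rcond=None)
theta_hat = beta / np.linalg.norm(beta)

alignment = abs(float(theta_hat @ theta))  # near 1 if direction recovered
```

In this sketch, stage 1 pays a fixed cost that depends only on how flat `g` is near zero, while stage 2 converges at the usual parametric rate once a good initial action is in hand, mirroring the burn-in/learning split described above.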

Original language: English (US)
Pages (from-to): 2557-2582
Number of pages: 26
Journal: Annals of Statistics
Volume: 52
Issue number: 6
DOIs
State: Published - Dec 2024

Keywords

  • adaptive sampling
  • Bandit problems
  • minimax rate
  • regret bounds
  • ridge functions
  • sequential estimation

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
