Multi-armed bandit algorithms and empirical evaluation

Joannès Vermorel, Mehryar Mohri

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The multi-armed bandit problem for a gambler is to decide which arm of a K-slot machine to pull to maximize the total reward over a series of trials. Many real-world learning and optimization problems can be modeled in this way. Several strategies or algorithms have been proposed as solutions to this problem in the last two decades, but, to our knowledge, there has been no common evaluation of these algorithms. This paper provides a preliminary empirical evaluation of several multi-armed bandit algorithms. It also describes and analyzes a new algorithm, POKER (Price Of Knowledge and Estimated Reward), whose performance compares favorably to that of other existing algorithms in several experiments. One remarkable outcome of our experiments is that the most naive approach, the ε-greedy strategy, often proves hard to beat.
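As an illustration of the ε-greedy strategy the abstract mentions, the sketch below plays a K-armed bandit for a fixed horizon: with probability ε it pulls a uniformly random arm (exploration), otherwise it pulls the arm with the highest empirical mean reward (exploitation). This is a minimal textbook version, not the paper's evaluation code; the arm definitions and parameter values are illustrative assumptions.

```python
import random

def epsilon_greedy(reward_fns, horizon, epsilon=0.1, rng=None):
    """Play a K-armed bandit for `horizon` rounds with the epsilon-greedy rule:
    with probability epsilon pull a uniformly random arm (explore),
    otherwise pull the arm with the highest empirical mean (exploit)."""
    rng = rng or random.Random()
    k = len(reward_fns)
    counts = [0] * k      # number of pulls per arm
    means = [0.0] * k     # empirical mean reward per arm
    total = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                        # explore
        else:
            arm = max(range(k), key=lambda a: means[a])   # exploit
        r = reward_fns[arm](rng)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]      # incremental mean update
        total += r
    return total, means, counts

# Hypothetical example: three Bernoulli arms with success probabilities 0.2, 0.5, 0.8.
probs = [0.2, 0.5, 0.8]
arms = [lambda rng, p=p: 1.0 if rng.random() < p else 0.0 for p in probs]
total, means, counts = epsilon_greedy(arms, horizon=10_000, epsilon=0.1,
                                      rng=random.Random(0))
```

With a long enough horizon, the empirical means concentrate around the true arm probabilities and the best arm receives the bulk of the pulls; the constant ε, however, forces a fixed fraction of exploratory pulls forever, which is one reason the paper treats it as a naive baseline.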

Original language: English (US)
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 437-448
Number of pages: 12
DOIs
State: Published - 2005
Event: 16th European Conference on Machine Learning, ECML 2005 - Porto, Portugal
Duration: Oct 3 2005 - Oct 7 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3720 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 16th European Conference on Machine Learning, ECML 2005
Country/Territory: Portugal
City: Porto
Period: 10/3/05 - 10/7/05

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
