TY - GEN
T1 - Multi-armed bandit algorithms and empirical evaluation
AU - Vermorel, Joannès
AU - Mohri, Mehryar
PY - 2005
Y1 - 2005
N2 - The multi-armed bandit problem for a gambler is to decide which arm of a K-slot machine to pull to maximize his total reward in a series of trials. Many real-world learning and optimization problems can be modeled in this way. Several strategies or algorithms have been proposed as a solution to this problem in the last two decades, but, to our knowledge, there has been no common evaluation of these algorithms. This paper provides a preliminary empirical evaluation of several multi-armed bandit algorithms. It also describes and analyzes a new algorithm, POKER (Price Of Knowledge and Estimated Reward) whose performance compares favorably to that of other existing algorithms in several experiments. One remarkable outcome of our experiments is that the most naive approach, the ε-greedy strategy, proves to be often hard to beat.
AB - The multi-armed bandit problem for a gambler is to decide which arm of a K-slot machine to pull to maximize his total reward in a series of trials. Many real-world learning and optimization problems can be modeled in this way. Several strategies or algorithms have been proposed as a solution to this problem in the last two decades, but, to our knowledge, there has been no common evaluation of these algorithms. This paper provides a preliminary empirical evaluation of several multi-armed bandit algorithms. It also describes and analyzes a new algorithm, POKER (Price Of Knowledge and Estimated Reward) whose performance compares favorably to that of other existing algorithms in several experiments. One remarkable outcome of our experiments is that the most naive approach, the ε-greedy strategy, proves to be often hard to beat.
UR - http://www.scopus.com/inward/record.url?scp=33646406807&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33646406807&partnerID=8YFLogxK
U2 - 10.1007/11564096_42
DO - 10.1007/11564096_42
M3 - Conference contribution
AN - SCOPUS:33646406807
SN - 3540292438
SN - 9783540292432
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 437
EP - 448
BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
T2 - 16th European Conference on Machine Learning, ECML 2005
Y2 - 3 October 2005 through 7 October 2005
ER -