TY - JOUR
T1 - Online learning in repeated auctions
AU - Weed, Jonathan
AU - Perchet, Vianney
AU - Rigollet, Philippe
N1 - Funding Information:
Jonathan Weed is supported in part by NSF Graduate Research Fellowship DGE-1122374 and NSF grants DMS-1317308 and CAREER-DMS-1053987. Vianney Perchet is supported in part by ANR grant ANR-13-JS01-0004-01, PGMO grant Nougat, and CNRS grant PEPS Parasol. Philippe Rigollet is supported in part by NSF grants DMS-1317308 and CAREER-DMS-1053987. The authors wish to thank the anonymous reviewers for their helpful suggestions.
Publisher Copyright:
© 2016 J. Weed, V. Perchet & P. Rigollet.
PY - 2016/6/6
Y1 - 2016/6/6
N2 - Motivated by online advertising auctions, we consider repeated Vickrey auctions where goods of unknown value are sold sequentially and bidders only learn (potentially noisy) information about a good's value once it is purchased. We adopt an online learning approach with bandit feedback to model this problem and derive bidding strategies for two models: stochastic and adversarial. In the stochastic model, the observed values of the goods are random variables centered around the true value of the good. In this case, logarithmic regret is achievable when competing against well-behaved adversaries. In the adversarial model, the goods need not be identical. Comparing our performance against that of the best fixed bid in hindsight, we show that sublinear regret is also achievable in this case. For both the stochastic and adversarial models, we prove matching minimax lower bounds showing our strategies to be optimal up to lower-order terms. To our knowledge, this is the first complete set of strategies for bidders participating in auctions of this type.
AB - Motivated by online advertising auctions, we consider repeated Vickrey auctions where goods of unknown value are sold sequentially and bidders only learn (potentially noisy) information about a good's value once it is purchased. We adopt an online learning approach with bandit feedback to model this problem and derive bidding strategies for two models: stochastic and adversarial. In the stochastic model, the observed values of the goods are random variables centered around the true value of the good. In this case, logarithmic regret is achievable when competing against well-behaved adversaries. In the adversarial model, the goods need not be identical. Comparing our performance against that of the best fixed bid in hindsight, we show that sublinear regret is also achievable in this case. For both the stochastic and adversarial models, we prove matching minimax lower bounds showing our strategies to be optimal up to lower-order terms. To our knowledge, this is the first complete set of strategies for bidders participating in auctions of this type.
KW - Bandit problems
KW - Repeated auctions
KW - Second price auctions
KW - Vickrey auctions
UR - http://www.scopus.com/inward/record.url?scp=85072249507&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85072249507&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85072249507
SN - 1532-4435
VL - 49
SP - 1562
EP - 1583
JO - Journal of Machine Learning Research
JF - Journal of Machine Learning Research
IS - June
T2 - 29th Conference on Learning Theory, COLT 2016
Y2 - 23 June 2016 through 26 June 2016
ER -