A time and space efficient algorithm for contextual linear bandits

José Bento, Stratis Ioannidis, S. Muthukrishnan, Jinyun Yan

    Research output: Chapter in Book/Report/Conference proceedingConference contribution

    Abstract

    We consider a multi-armed bandit problem where payoffs are a linear function of an observed stochastic contextual variable. In the scenario where there exists a gap between optimal and suboptimal rewards, several algorithms have been proposed that achieve O(logT) regret after T time steps. However, proposed methods either have a computation complexity per iteration that scales linearly with T or achieve regrets that grow linearly with the number of contexts |χ|. We propose an ε-greedy type of algorithm that solves both limitations. In particular, when contexts are variables in ℝd, we prove that our algorithm has a constant computation complexity per iteration of O(poly(d)) and can achieve a regret of O(poly(d) log T) even when |χ| = Ω(2d). In addition, unlike previous algorithms, its space complexity scales like O(Kd2) and does not grow with T.

    Original languageEnglish (US)
    Title of host publicationMachine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2013, Proceedings
    Pages257-272
    Number of pages16
    EditionPART 1
    DOIs
    StatePublished - 2013
    EventEuropean Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2013 - Prague, Czech Republic
    Duration: Sep 23 2013Sep 27 2013

    Publication series

    NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    NumberPART 1
    Volume8188 LNAI
    ISSN (Print)0302-9743
    ISSN (Electronic)1611-3349

    Conference

    ConferenceEuropean Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2013
    Country/TerritoryCzech Republic
    CityPrague
    Period9/23/139/27/13

    Keywords

    • Contextual Linear Bandits
    • Space and Time Efficiency

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • Computer Science(all)

    Fingerprint

    Dive into the research topics of 'A time and space efficient algorithm for contextual linear bandits'. Together they form a unique fingerprint.

    Cite this