RANDOMIZED ENSEMBLED DOUBLE Q-LEARNING: LEARNING FAST WITHOUT A MODEL

Xinyue Chen, Che Wang, Zijian Zhou, Keith Ross

    Research output: Contribution to conference › Paper › peer-review

    Abstract

    Using a high Update-To-Data (UTD) ratio, model-based methods have recently achieved much higher sample efficiency than previous model-free methods for continuous-action DRL benchmarks. In this paper, we introduce a simple model-free algorithm, Randomized Ensembled Double Q-Learning (REDQ), and show that its performance is just as good as, if not better than, a state-of-the-art model-based algorithm for the MuJoCo benchmark. Moreover, REDQ can achieve this performance using fewer parameters than the model-based method, and with less wall-clock run time. REDQ has three carefully integrated ingredients which allow it to achieve its high performance: (i) a UTD ratio ≫ 1; (ii) an ensemble of Q functions; (iii) in-target minimization across a random subset of Q functions from the ensemble. Through carefully designed experiments, we provide a detailed analysis of REDQ and related model-free algorithms. To our knowledge, REDQ is the first successful model-free DRL algorithm for continuous-action spaces using a UTD ratio ≫ 1.
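    The third ingredient, in-target minimization over a random subset of the ensemble, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes the next-state Q-values for each ensemble member are already computed, omits the entropy term used in the actual SAC-based algorithm, and uses illustrative names (`redq_target`, `num_in_target`).

    ```python
    import numpy as np

    def redq_target(reward, gamma, q_next_values, num_in_target=2, rng=None):
        """Sketch of a REDQ-style Bellman target.

        q_next_values: one Q(s', a') estimate per ensemble member.
        The target takes the minimum over a random subset of size
        `num_in_target` (M) drawn from the full ensemble (N), which
        controls overestimation bias without using all N members.
        """
        rng = rng or np.random.default_rng()
        ensemble_size = len(q_next_values)
        # Sample M distinct member indices from the N-member ensemble.
        idx = rng.choice(ensemble_size, size=num_in_target, replace=False)
        min_q = min(q_next_values[i] for i in idx)
        return reward + gamma * min_q
    ```

    With a high UTD ratio, this target would be recomputed for many gradient updates per environment step; here only the subset-minimization step is shown.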

    Original language: English (US)
    State: Published - 2021
    Event: 9th International Conference on Learning Representations, ICLR 2021 - Virtual, Online
    Duration: May 3, 2021 – May 7, 2021

    Conference

    Conference: 9th International Conference on Learning Representations, ICLR 2021
    City: Virtual, Online
    Period: 5/3/21 – 5/7/21

    ASJC Scopus subject areas

    • Language and Linguistics
    • Computer Science Applications
    • Education
    • Linguistics and Language
