Striving for simplicity and performance in off-policy DRL: Output normalization and non-uniform sampling

Che Wang, Yanqiu Wu, Quan Vuong, Keith Ross

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    We aim to develop off-policy DRL algorithms that not only exceed state-of-the-art performance but are also simple and minimalistic. For standard continuous control benchmarks, Soft Actor-Critic (SAC), which employs entropy maximization, currently provides state-of-the-art performance. We first demonstrate that the entropy term in SAC addresses action saturation due to the bounded nature of the action spaces. With this insight, we propose a streamlined algorithm with a simple normalization scheme or with inverted gradients. We show that both approaches can match SAC's sample efficiency without the need for entropy maximization. We then propose a simple non-uniform sampling method for selecting transitions from the replay buffer during training. Extensive experimental results demonstrate that our proposed sampling scheme leads to state-of-the-art sample efficiency on challenging continuous control tasks. We combine all of our findings into one simple algorithm, which we call Streamlined Off Policy with Emphasizing Recent Experience, for which we provide robust public-domain code.

    Original language: English (US)
    Title of host publication: 37th International Conference on Machine Learning, ICML 2020
    Editors: Hal Daume, Aarti Singh
    Publisher: International Machine Learning Society (IMLS)
    Pages: 10012-10022
    Number of pages: 11
    ISBN (Electronic): 9781713821120
    State: Published - 2020
    Event: 37th International Conference on Machine Learning, ICML 2020 - Virtual, Online
    Duration: Jul 13 2020 → Jul 18 2020

    Publication series

    Name: 37th International Conference on Machine Learning, ICML 2020
    Volume: PartF168147-13

    Conference

    Conference: 37th International Conference on Machine Learning, ICML 2020
    City: Virtual, Online
    Period: 7/13/20 → 7/18/20

    ASJC Scopus subject areas

    • Computational Theory and Mathematics
    • Human-Computer Interaction
    • Software
