Approximating gradients for differentiable quality diversity in reinforcement learning

Bryon Tjanaka, Matthew C. Fontaine, Julian Togelius, Stefanos Nikolaidis

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Consider the problem of training robustly capable agents. One approach is to generate a diverse collection of agent policies. Training can then be viewed as a quality diversity (QD) optimization problem, where we search for a collection of performant policies that are diverse with respect to quantified behavior. Recent work shows that differentiable quality diversity (DQD) algorithms greatly accelerate QD optimization when exact gradients are available. However, agent policies typically assume that the environment is not differentiable. To apply DQD algorithms to training agent policies, we must approximate gradients for performance and behavior. We propose two variants of the current state-of-the-art DQD algorithm that compute gradients via approximation methods common in reinforcement learning (RL). We evaluate our approach on four simulated locomotion tasks. One variant achieves results comparable to the current state-of-the-art in combining QD and RL, while the other performs comparably in two locomotion tasks. These results provide insight into the limitations of current DQD algorithms in domains where gradients must be approximated. Source code is available at https://github.com/icaros-usc/dqd-rl

    Original language: English (US)
    Title of host publication: GECCO 2022 - Proceedings of the 2022 Genetic and Evolutionary Computation Conference
    Publisher: Association for Computing Machinery, Inc
    Pages: 1102-1111
    Number of pages: 10
    ISBN (Electronic): 9781450392372
    State: Published - Jul 8 2022
    Event: 2022 Genetic and Evolutionary Computation Conference, GECCO 2022 - Virtual, Online, United States
    Duration: Jul 9 2022 to Jul 13 2022

    Publication series

    Name: GECCO 2022 - Proceedings of the 2022 Genetic and Evolutionary Computation Conference

    Conference

    Conference: 2022 Genetic and Evolutionary Computation Conference, GECCO 2022
    Country/Territory: United States
    City: Virtual, Online
    Period: 7/9/22 to 7/13/22

    Keywords

    • neuroevolution
    • quality diversity
    • reinforcement learning

    ASJC Scopus subject areas

    • Artificial Intelligence
    • Software
    • Theoretical Computer Science
