TY - GEN
T1 - Approximating gradients for differentiable quality diversity in reinforcement learning
AU - Tjanaka, Bryon
AU - Fontaine, Matthew C.
AU - Togelius, Julian
AU - Nikolaidis, Stefanos
N1 - Funding Information:
The authors thank the anonymous reviewers, Ya-Chuan Hsu, Heramb Nemlekar, and Gautam Salhotra for their invaluable feedback. This work was partially supported by the NSF NRI (#1053128) and NSF GRFP (#DGE-1842487).
Publisher Copyright:
© 2022 Owner/Author.
PY - 2022/7/8
Y1 - 2022/7/8
N2 - Consider the problem of training robustly capable agents. One approach is to generate a diverse collection of agent policies. Training can then be viewed as a quality diversity (QD) optimization problem, where we search for a collection of performant policies that are diverse with respect to quantified behavior. Recent work shows that differentiable quality diversity (DQD) algorithms greatly accelerate QD optimization when exact gradients are available. However, agent policies typically assume that the environment is not differentiable. To apply DQD algorithms to training agent policies, we must approximate gradients for performance and behavior. We propose two variants of the current state-of-the-art DQD algorithm that compute gradients via approximation methods common in reinforcement learning (RL). We evaluate our approach on four simulated locomotion tasks. One variant achieves results comparable to the current state-of-the-art in combining QD and RL, while the other performs comparably in two locomotion tasks. These results provide insight into the limitations of current DQD algorithms in domains where gradients must be approximated. Source code is available at https://github.com/icaros-usc/dqd-rl
AB - Consider the problem of training robustly capable agents. One approach is to generate a diverse collection of agent policies. Training can then be viewed as a quality diversity (QD) optimization problem, where we search for a collection of performant policies that are diverse with respect to quantified behavior. Recent work shows that differentiable quality diversity (DQD) algorithms greatly accelerate QD optimization when exact gradients are available. However, agent policies typically assume that the environment is not differentiable. To apply DQD algorithms to training agent policies, we must approximate gradients for performance and behavior. We propose two variants of the current state-of-the-art DQD algorithm that compute gradients via approximation methods common in reinforcement learning (RL). We evaluate our approach on four simulated locomotion tasks. One variant achieves results comparable to the current state-of-the-art in combining QD and RL, while the other performs comparably in two locomotion tasks. These results provide insight into the limitations of current DQD algorithms in domains where gradients must be approximated. Source code is available at https://github.com/icaros-usc/dqd-rl
KW - neuroevolution
KW - quality diversity
KW - reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85130854633&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85130854633&partnerID=8YFLogxK
U2 - 10.1145/3512290.3528705
DO - 10.1145/3512290.3528705
M3 - Conference contribution
AN - SCOPUS:85130854633
T3 - GECCO 2022 - Proceedings of the 2022 Genetic and Evolutionary Computation Conference
SP - 1102
EP - 1111
BT - GECCO 2022 - Proceedings of the 2022 Genetic and Evolutionary Computation Conference
PB - Association for Computing Machinery, Inc
T2 - 2022 Genetic and Evolutionary Computation Conference, GECCO 2022
Y2 - 9 July 2022 through 13 July 2022
ER -