Abstract
Existing model-free reinforcement learning (RL) approaches are effective when trained on states but struggle to learn directly from image observations. We propose an augmentation technique that can be applied to standard model-free RL algorithms, enabling robust learning directly from pixels without the need for auxiliary losses or pre-training. The approach leverages input perturbations commonly used in computer vision tasks to transform input observations, as well as to regularize the value function and policy. Our approach reaches state-of-the-art performance on the DeepMind Control Suite and the Atari 100k benchmark, surpassing previous model-free (Haarnoja et al., 2018; van Hasselt et al., 2019a), model-based (Hafner et al., 2019; Lee et al., 2019; Hafner et al., 2018; Kaiser et al., 2019) and contrastive learning (Srinivas et al., 2020) approaches. It also closes the gap between state-based and image-based RL training. Our method, which we dub DrQ: Data-regularized Q, can be combined with any model-free RL algorithm. To the best of our knowledge, our approach is the first effective data augmentation method for RL on these benchmarks.
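To make the idea in the abstract concrete, below is a minimal sketch, assuming a PyTorch, DQN-style setup for image observations of shape (batch, channels, height, width). It illustrates two ingredients the abstract mentions: a pixel-level perturbation (here a pad-and-random-crop shift) and regularization of the value target by averaging Q-estimates over several augmented copies of the next observation. The names `q_net`, `random_shift`, `augmented_q_target`, and the pad size are illustrative placeholders, not the paper's released implementation.

```python
import torch
import torch.nn.functional as F


def random_shift(obs: torch.Tensor, pad: int = 4) -> torch.Tensor:
    """Pad each image by `pad` pixels on every side (replicating the border),
    then take a random crop back to the original size."""
    n, _, h, w = obs.shape
    padded = F.pad(obs, (pad, pad, pad, pad), mode="replicate")
    tops = torch.randint(0, 2 * pad + 1, (n,))
    lefts = torch.randint(0, 2 * pad + 1, (n,))
    crops = [
        padded[i, :, int(tops[i]):int(tops[i]) + h, int(lefts[i]):int(lefts[i]) + w]
        for i in range(n)
    ]
    return torch.stack(crops)


def augmented_q_target(q_net, next_obs, reward, done, discount=0.99, num_aug=2):
    """Regularize the TD target by averaging max-Q estimates over `num_aug`
    independently augmented copies of the next observation (DQN-style)."""
    with torch.no_grad():
        q_values = torch.stack([
            q_net(random_shift(next_obs)).max(dim=1).values
            for _ in range(num_aug)
        ])
        return reward + discount * (1.0 - done) * q_values.mean(dim=0)
```

In a training step under these assumptions, the current observation would also be passed through `random_shift` before computing Q(s, a), and the usual TD loss would be taken against the averaged target, so the augmentation regularizes both sides of the Bellman update without any auxiliary loss.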
| Original language | English (US) |
| --- | --- |
| State | Published - 2021 |
| Event | 9th International Conference on Learning Representations, ICLR 2021 - Virtual, Online. Duration: May 3 2021 → May 7 2021 |
Conference

| Conference | 9th International Conference on Learning Representations, ICLR 2021 |
| --- | --- |
| City | Virtual, Online |
| Period | 5/3/21 → 5/7/21 |
ASJC Scopus subject areas
- Language and Linguistics
- Computer Science Applications
- Education
- Linguistics and Language