TY - GEN
T1 - Asymmetric Actor Critic for Image-Based Robot Learning
AU - Pinto, Lerrel
AU - Andrychowicz, Marcin
AU - Welinder, Peter
AU - Zaremba, Wojciech
AU - Abbeel, Pieter
N1 - Publisher Copyright:
© 2018, MIT Press Journals. All rights reserved.
PY - 2018
Y1 - 2018
N2 - Deep reinforcement learning (RL) has proven a powerful technique in many sequential decision-making domains. However, robotics poses many challenges for RL, most notably that training on a physical system can be expensive and dangerous, which has sparked significant interest in learning control policies using a physics simulator. While several recent works have shown promising results in transferring policies trained in simulation to the real world, they often do not take full advantage of working with a simulator. In this work, we propose the Asymmetric Actor Critic, which learns a vision-based control policy while exploiting access to the underlying state to significantly speed up training. Concretely, our method employs an actor-critic training algorithm in which the critic is trained on full states while the actor (or policy) is trained on images. We show that using these asymmetric inputs improves performance on a range of simulated tasks. Finally, we combine this method with domain randomization and show real-robot experiments for several tasks, such as picking, pushing, and moving a block. We achieve this simulation-to-real-world transfer without training on any real-world data. Videos of these experiments can be found at www.goo.gl/b57WTs.
AB - Deep reinforcement learning (RL) has proven a powerful technique in many sequential decision-making domains. However, robotics poses many challenges for RL, most notably that training on a physical system can be expensive and dangerous, which has sparked significant interest in learning control policies using a physics simulator. While several recent works have shown promising results in transferring policies trained in simulation to the real world, they often do not take full advantage of working with a simulator. In this work, we propose the Asymmetric Actor Critic, which learns a vision-based control policy while exploiting access to the underlying state to significantly speed up training. Concretely, our method employs an actor-critic training algorithm in which the critic is trained on full states while the actor (or policy) is trained on images. We show that using these asymmetric inputs improves performance on a range of simulated tasks. Finally, we combine this method with domain randomization and show real-robot experiments for several tasks, such as picking, pushing, and moving a block. We achieve this simulation-to-real-world transfer without training on any real-world data. Videos of these experiments can be found at www.goo.gl/b57WTs.
UR - http://www.scopus.com/inward/record.url?scp=85127903841&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85127903841&partnerID=8YFLogxK
U2 - 10.15607/RSS.2018.XIV.008
DO - 10.15607/RSS.2018.XIV.008
M3 - Conference contribution
AN - SCOPUS:85127903841
SN - 9780992374747
T3 - Robotics: Science and Systems
BT - Robotics: Science and Systems
A2 - Kress-Gazit, Hadas
A2 - Srinivasa, Siddhartha S.
A2 - Howard, Tom
A2 - Atanasov, Nikolay
PB - MIT Press Journals
T2 - 14th Robotics: Science and Systems, RSS 2018
Y2 - 26 June 2018 through 30 June 2018
ER -