Learning Visual Robotic Control Efficiently with Contrastive Pre-training and Data Augmentation

Albert Zhan, Ruihan Zhao, Lerrel Pinto, Pieter Abbeel, Michael Laskin

Research output: Contribution to journal, Conference article, peer-review

Abstract

Recent advances in unsupervised representation learning have significantly improved the sample efficiency of training Reinforcement Learning policies in simulated environments. However, similar gains have not yet been seen for real-robot reinforcement learning. In this work, we focus on enabling data-efficient real-robot learning from pixels. We present Contrastive Pre-training and Data Augmentation for Efficient Robotic Learning (CoDER), a method that utilizes data augmentation and unsupervised learning to achieve sample-efficient training of real-robot arm policies from sparse rewards. While contrastive pre-training, data augmentation, demonstrations, and reinforcement learning are each insufficient on their own for efficient learning, our main contribution is showing that the combination of these disparate techniques results in a simple yet data-efficient method. We show that, given only 10 demonstrations, a single robotic arm can learn sparse-reward manipulation policies from pixels, such as reaching, picking, moving, pulling a large object, flipping a switch, and opening a drawer, in just 30 minutes of mean real-world training time. We include videos and code on the project website: https://sites.google.com/view/efficientrobotic-manipulation/home.
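
The abstract names contrastive pre-training and data augmentation without giving details. As a rough illustration only, the following sketch pairs random-crop augmentation with an InfoNCE-style contrastive loss, a common way to combine the two. It is not the authors' implementation: the linear `encode` function, the 100x100 to 84x84 crop sizes, the cosine similarity, and the temperature value are all assumptions made here for the example.

```python
import numpy as np


def random_crop(images, out_size, rng):
    """Crop each image in an (N, H, W, C) batch to out_size x out_size at a random offset."""
    n, h, w, _ = images.shape
    tops = rng.integers(0, h - out_size + 1, size=n)
    lefts = rng.integers(0, w - out_size + 1, size=n)
    return np.stack([
        img[t:t + out_size, l:l + out_size]
        for img, t, l in zip(images, tops, lefts)
    ])


def info_nce_loss(queries, keys, temperature=0.1):
    """InfoNCE loss: row i of queries should match row i of keys; other rows act as negatives."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature                        # (N, N) cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def encode(images, weights):
    """Hypothetical stand-in encoder (assumption): flatten pixels, apply a linear projection."""
    return images.reshape(len(images), -1) @ weights


rng = np.random.default_rng(0)
frames = rng.random((8, 100, 100, 3))          # placeholder camera frames, not real data
view_a = random_crop(frames, 84, rng)          # two independently cropped views
view_b = random_crop(frames, 84, rng)          # of the same underlying frames
weights = rng.normal(size=(84 * 84 * 3, 32)) * 0.01
loss = info_nce_loss(encode(view_a, weights), encode(view_b, weights))
print(f"contrastive loss on random data: {loss:.3f}")
```

In a real pipeline the linear projection would be replaced by a trained convolutional encoder, and the loss would be minimized over the demonstration frames before reinforcement learning begins.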

Original language: English (US)
Pages (from-to): 4040-4047
Number of pages: 8
Journal: IEEE International Conference on Intelligent Robots and Systems
State: Published - 2022
Event: 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2022 - Kyoto, Japan
Duration: Oct 23 2022 to Oct 27 2022

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Software
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
