Watch and Match: Supercharging Imitation with Regularized Optimal Transport

Siddhant Haldar, Vaibhav Mathur, Denis Yarats, Lerrel Pinto

Research output: Contribution to journal › Conference article › peer-review


Imitation learning holds tremendous promise for learning policies efficiently for complex decision-making problems. Current state-of-the-art algorithms often use inverse reinforcement learning (IRL), where given a set of expert demonstrations, an agent alternately infers a reward function and the associated optimal policy. However, such IRL approaches often require substantial online interactions for complex control problems. In this work, we present Regularized Optimal Transport (ROT), a new imitation learning algorithm that builds on recent advances in optimal transport based trajectory-matching. Our key technical insight is that adaptively combining trajectory-matching rewards with behavior cloning can significantly accelerate imitation even with only a few demonstrations. Our experiments on 20 visual control tasks across the DeepMind Control Suite, the OpenAI Robotics Suite, and the Meta-World Benchmark demonstrate an average of 7.8× faster imitation to reach 90% of expert performance compared to prior state-of-the-art methods. On real-world robotic manipulation, with just one demonstration and an hour of online training, ROT achieves an average success rate of 90.1% across 14 tasks.
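The two ingredients named in the abstract can be illustrated with a minimal sketch: an entropic (Sinkhorn) optimal-transport cost between an agent trajectory and an expert trajectory, used as a trajectory-matching reward, plus an adaptively weighted behavior-cloning term in the policy loss. The function names, the Euclidean ground cost, and the scalar mixing weight `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def ot_reward(agent_traj, expert_traj, eps=0.1, n_iters=100):
    """Sinkhorn approximation of the optimal-transport cost between two
    trajectories (arrays of shape [T, d]). Returns a per-step pseudo-reward:
    the negative transport cost assigned to each agent step.
    Illustrative sketch, not ROT's exact reward."""
    # Pairwise Euclidean ground cost between agent and expert states.
    C = np.linalg.norm(agent_traj[:, None, :] - expert_traj[None, :, :], axis=-1)
    K = np.exp(-C / eps)                       # Gibbs kernel
    a = np.full(len(agent_traj), 1.0 / len(agent_traj))    # uniform marginals
    b = np.full(len(expert_traj), 1.0 / len(expert_traj))
    u = np.ones_like(a)
    for _ in range(n_iters):                   # Sinkhorn iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]            # transport plan
    return -(P * C).sum(axis=1)                # per-step matching reward

def combined_policy_loss(rl_loss, bc_loss, alpha):
    """Adaptive mix of the trajectory-matching (RL) objective and the
    behavior-cloning objective; in ROT the weight is adapted online,
    here it is just a scalar in [0, 1]."""
    return alpha * bc_loss + (1.0 - alpha) * rl_loss
```

When the agent trajectory matches the expert trajectory, the transport plan concentrates on the diagonal and the per-step rewards approach zero; as the trajectories diverge, the rewards become more negative, giving the online RL phase a dense matching signal.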

Original language: English (US)
Pages (from-to): 32-43
Number of pages: 12
Journal: Proceedings of Machine Learning Research
State: Published - 2023
Event: 6th Conference on Robot Learning, CoRL 2022 - Auckland, New Zealand
Duration: Dec 14 2022 - Dec 18 2022


Keywords

  • Imitation Learning
  • Manipulation
  • Robotics

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

