Improving Long-Horizon Imitation through Instruction Prediction

Joey Hejna, Pieter Abbeel, Lerrel Pinto

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Complex, long-horizon planning and its combinatorial nature pose steep challenges for learning-based agents. Difficulties in such settings are exacerbated in low data regimes where over-fitting stifles generalization and compounding errors hurt accuracy. In this work, we explore the use of an often unused source of auxiliary supervision: language. Inspired by recent advances in transformer-based models, we train agents with an instruction prediction loss that encourages learning temporally extended representations that operate at a high level of abstraction. Concretely, we demonstrate that instruction modeling significantly improves performance in planning environments when training with a limited number of demonstrations on the BabyAI and Crafter benchmarks. In further analysis we find that instruction modeling is most important for tasks that require complex reasoning, while understandably offering smaller gains in environments that require simple plans. More details and code can be found at https://github.com/jhejna/instruction-prediction.

Original languageEnglish (US)
Title of host publicationAAAI-23 Technical Tracks 7
EditorsBrian Williams, Yiling Chen, Jennifer Neville
PublisherAAAI press
Pages7857-7865
Number of pages9
ISBN (Electronic)9781577358800
StatePublished - Jun 27 2023
Event37th AAAI Conference on Artificial Intelligence, AAAI 2023 - Washington, United States
Duration: Feb 7 2023Feb 14 2023

Publication series

NameProceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023
Volume37

Conference

Conference37th AAAI Conference on Artificial Intelligence, AAAI 2023
Country/TerritoryUnited States
CityWashington
Period2/7/232/14/23

ASJC Scopus subject areas

  • Artificial Intelligence

Fingerprint

Dive into the research topics of 'Improving Long-Horizon Imitation through Instruction Prediction'. Together they form a unique fingerprint.

Cite this