Imitation learning by coaching

He He, Hal Daumé, Jason Eisner

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Imitation Learning has been shown to be successful in solving many challenging real-world problems. Some recent approaches give strong performance guarantees by training the policy iteratively. However, it is important to note that these guarantees depend on how well the policy we found can imitate the oracle on the training data. When there is a substantial difference between the oracle's ability and the learner's policy space, we may fail to find a policy that has low error on the training set. In such cases, we propose to use a coach that demonstrates easy-to-learn actions for the learner and gradually approaches the oracle. By a reduction of learning by demonstration to online learning, we prove that coaching can yield a lower regret bound than using the oracle. We apply our algorithm to cost-sensitive dynamic feature selection, a hard decision problem that considers a user-specified accuracy-cost trade-off. Experimental results on UCI datasets show that our method outperforms state-of-the-art imitation learning methods in dynamic feature selection and two static feature selection methods.

Original languageEnglish (US)
Title of host publicationAdvances in Neural Information Processing Systems 25
Subtitle of host publication26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012
Pages3149-3157
Number of pages9
StatePublished - 2012
Event26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012 - Lake Tahoe, NV, United States
Duration: Dec 3 2012Dec 6 2012

Publication series

NameAdvances in Neural Information Processing Systems
Volume4
ISSN (Print)1049-5258

Other

Other26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012
CountryUnited States
CityLake Tahoe, NV
Period12/3/1212/6/12

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Fingerprint Dive into the research topics of 'Imitation learning by coaching'. Together they form a unique fingerprint.

  • Cite this

    He, H., Daumé, H., & Eisner, J. (2012). Imitation learning by coaching. In Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012, NIPS 2012 (pp. 3149-3157). (Advances in Neural Information Processing Systems; Vol. 4).