The curious robot: Learning visual representations via physical interactions

Lerrel Pinto, Dhiraj Gandhi, Yuanfeng Han, Yong Lae Park, Abhinav Gupta

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

What is the right supervisory signal to train visual representations? Current approaches in computer vision use category labels from datasets such as ImageNet to train ConvNets. However, in case of biological agents, visual representation learning does not require millions of semantic labels. We argue that biological agents use physical interactions with the world to learn visual representations unlike current vision systems which just use passive observations (images and videos downloaded from web). For example, babies push objects, poke them, put them in their mouth and throw them to learn representations. Towards this goal, we build one of the first systems on a Baxter platform that pushes, pokes, grasps and observes objects in a tabletop environment. It uses four different types of physical interactions to collect more than 130K datapoints, with each datapoint providing supervision to a shared ConvNet architecture allowing us to learn visual representations. We show the quality of learned representations by observing neuron activations and performing nearest neighbor retrieval on this learned representation. Quantitatively, we evaluate our learned ConvNet on image classification tasks and show improvements compared to learning without external data. Finally, on the task of instance retrieval, our network outperforms the ImageNet network on recall@1 by 3%.

Original languageEnglish (US)
Title of host publicationComputer Vision - 14th European Conference, ECCV 2016, Proceedings
EditorsBastian Leibe, Nicu Sebe, Max Welling, Jiri Matas
PublisherSpringer Verlag
Pages3-18
Number of pages16
ISBN (Print)9783319464749
DOIs
StatePublished - 2016
Event14th European Conference on Computer Vision, ECCV 2016 - Amsterdam, Netherlands
Duration: Oct 8 2016Oct 16 2016

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume9906 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference14th European Conference on Computer Vision, ECCV 2016
CountryNetherlands
CityAmsterdam
Period10/8/1610/16/16

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Fingerprint Dive into the research topics of 'The curious robot: Learning visual representations via physical interactions'. Together they form a unique fingerprint.

Cite this