Swoosh! Rattle! Thump! - Actions that Sound

Dhiraj Gandhi, Abhinav Gupta, Lerrel Pinto

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Truly intelligent agents need to capture the interplay of all their senses to build a rich physical understanding of their world. In robotics, we have seen tremendous progress in using visual and tactile perception; however, we have often ignored a key sense: sound. This is primarily due to the lack of data that captures the interplay of action and sound. In this work, we perform the first large-scale study of the interactions between sound and robotic action. To do this, we create the largest available sound-action-vision dataset with 15,000 interactions on 60 objects using our robotic platform Tilt-Bot. By tilting objects and allowing them to crash into the walls of a robotic tray, we collect rich four-channel audio information. Using this data, we explore the synergies between sound and action and present three key insights. First, sound is indicative of fine-grained object class information, e.g., sound can differentiate a metal screwdriver from a metal wrench. Second, sound also contains information about the causal effects of an action, i.e., given the sound produced, we can predict what action was applied to the object. Finally, object representations derived from audio embeddings are indicative of implicit physical properties. We demonstrate that on previously unseen objects, audio embeddings generated through interactions can predict forward models 24% better than passive visual embeddings. Project videos and data are at this url.

Original language: English (US)
Title of host publication: Robotics
Subtitle of host publication: Science and Systems XVI
Editors: Marc Toussaint, Antonio Bicchi, Tucker Hermans
Publisher: MIT Press Journals
ISBN (Print): 9780992374761
State: Published - 2020
Event: 16th Robotics: Science and Systems, RSS 2020 - Virtual, Online
Duration: Jul 12 2020 to Jul 16 2020

Publication series

Name: Robotics: Science and Systems
ISSN (Electronic): 2330-765X

Conference

Conference: 16th Robotics: Science and Systems, RSS 2020
City: Virtual, Online
Period: 7/12/20 to 7/16/20

ASJC Scopus subject areas

  • Artificial Intelligence
  • Control and Systems Engineering
  • Electrical and Electronic Engineering
