Tricycle: Audio representation learning from sensor network data using self-supervision

Mark Cartwright, Jason Cramer, Justin Salamon, Juan Pablo Bello

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Self-supervised representation learning with deep neural networks is a powerful tool for machine learning tasks with limited labeled data but extensive unlabeled data. To learn representations, self-supervised models are typically trained on a pretext task to predict structure in the data (e.g., audio-visual correspondence, short-term temporal sequence, word sequence) that is indicative of higher-level concepts relevant to a target downstream task. Sensor networks are promising yet unexplored sources of data for self-supervised learning: they collect large amounts of unlabeled yet timestamped data over extended periods of time and typically exhibit long-term temporal structure (e.g., over hours, months, years) not observable at the short time scales previously explored in self-supervised learning (e.g., seconds). This structure can be present even in single-modal data and therefore could be exploited for self-supervision in many types of sensor networks. In this work, we present a model for learning audio representations by predicting the long-term, cyclic temporal structure in audio data collected from an urban acoustic sensor network. We then demonstrate the utility of the learned audio representation in an urban sound event detection task with limited labeled data.
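As an illustration of how timestamps alone can supply pretext labels, the following is a minimal sketch that maps a recording's timestamp to positions on three cycles and encodes each position as a (sin, cos) pair, so that the end of a cycle sits next to its start. The choice of daily, weekly, and yearly cycles, the sin/cos encoding, and the function names `cycle_fractions` and `cyclic_targets` are assumptions made for this example; the abstract does not state which cycles the model predicts or how the targets are represented.

```python
import calendar
import math
from datetime import datetime


def cycle_fractions(ts):
    """Return the position of `ts` within a day, week, and year, each in [0, 1).

    The three cycles are an assumption for illustration only.
    """
    day = (ts.hour + ts.minute / 60 + ts.second / 3600) / 24
    week = (ts.weekday() + day) / 7  # Monday is day 0
    days_in_year = 366 if calendar.isleap(ts.year) else 365
    year = (ts.timetuple().tm_yday - 1 + day) / days_in_year
    return {"day": day, "week": week, "year": year}


def cyclic_targets(ts):
    """Encode each cycle position as (sin, cos) so the cycle wraps smoothly."""
    targets = {}
    for name, frac in cycle_fractions(ts).items():
        angle = 2 * math.pi * frac
        targets[name] = (math.sin(angle), math.cos(angle))
    return targets


if __name__ == "__main__":
    # Hypothetical timestamp of one audio clip from a sensor.
    print(cyclic_targets(datetime(2019, 10, 20, 6, 0)))
```

A model trained on such targets would take an audio clip as input and regress (or classify a binned version of) these values; that part is omitted here because the record gives no architectural details.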

Original language: English (US)
Title of host publication: 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 278-282
Number of pages: 5
ISBN (Electronic): 9781728111230
DOIs: https://doi.org/10.1109/WASPAA.2019.8937265
State: Published - Oct 2019
Event: 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2019 - New Paltz, United States
Duration: Oct 20 2019 to Oct 23 2019

Publication series

Name: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
Volume: 2019-October
ISSN (Print): 1931-1168
ISSN (Electronic): 1947-1629

Conference

Conference: 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2019
Country: United States
City: New Paltz
Period: 10/20/19 to 10/23/19

Keywords

  • audio embedding
  • representation learning
  • self-supervised learning
  • sensor network

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Computer Science Applications


  • Cite this

    Cartwright, M., Cramer, J., Salamon, J., & Bello, J. P. (2019). Tricycle: Audio representation learning from sensor network data using self-supervision. In 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2019 (pp. 278-282). [8937265] (IEEE Workshop on Applications of Signal Processing to Audio and Acoustics; Vol. 2019-October). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/WASPAA.2019.8937265