Audio source separation with discriminative scattering networks

Pablo Sprechmann, Joan Bruna, Yann Lecun

Research output: Contribution to conferencePaperpeer-review

Abstract

In this report we describe an ongoing line of research for solving single-channel source separation problems. Many monaural signal decomposition techniques proposed in the literature operate on a feature space consisting of a time-frequency representation of the input data. A challenge faced by these approaches is to effectively exploit the temporal dependencies of the signals at scales larger than the duration of a time-frame. In this work we propose to tackle this problem by modeling the signals using a time-frequency representation with multiple temporal resolutions. The proposed representation consists of a pyramid of wavelet scattering operators, which generalizes Constant Q Transforms (CQT) with extra layers of convolution and complex modulus. We first show that learning standard models with this multi-resolution setting improves source separation results over fixed-resolution methods. As study case, we use Non-Negative Matrix Factorizations (NMF) that has been widely considered in many audio application. Then, we investigate the inclusion of the proposed multi-resolution setting into a discriminative training regime. We discuss several alternatives using different deep neural network architectures.

Original languageEnglish (US)
StatePublished - 2015
Event3rd International Conference on Learning Representations, ICLR 2015 - San Diego, United States
Duration: May 7 2015May 9 2015

Conference

Conference3rd International Conference on Learning Representations, ICLR 2015
Country/TerritoryUnited States
CitySan Diego
Period5/7/155/9/15

ASJC Scopus subject areas

  • Education
  • Linguistics and Language
  • Language and Linguistics
  • Computer Science Applications

Fingerprint

Dive into the research topics of 'Audio source separation with discriminative scattering networks'. Together they form a unique fingerprint.

Cite this