Audio source separation with discriminative scattering networks

Pablo Sprechmann, Joan Bruna, Yann LeCun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Many monaural signal decomposition techniques proposed in the literature operate on a feature space consisting of a time-frequency representation of the input data. A challenge faced by these approaches is to effectively exploit the temporal dependencies of the signals at scales larger than the duration of a time-frame. In this work we propose to tackle this problem by modeling the signals using a time-frequency representation with multiple temporal resolutions. To this end, we use a signal representation that consists of a pyramid of wavelet scattering operators, which generalizes Constant Q Transforms (CQT) with extra layers of convolution and complex modulus. We first show that learning standard models with this multi-resolution setting improves source separation results over fixed-resolution methods. As a case study, we use Non-Negative Matrix Factorization (NMF), which has been widely considered in many audio applications. Then, we investigate the inclusion of the proposed multi-resolution setting into a discriminative training regime. We discuss several alternatives using different deep neural network architectures, and our preliminary experiments suggest that in this task, finite impulse, multi-resolution Convolutional Networks are a competitive baseline compared to recurrent alternatives.

Original language: English (US)
Title of host publication: Latent Variable Analysis and Signal Separation - 12th International Conference, LVA/ICA 2015, Proceedings
Editors: Zbynĕk Koldovský, Emmanuel Vincent, Arie Yeredor, Petr Tichavský
Publisher: Springer Verlag
Pages: 259-267
Number of pages: 9
ISBN (Print): 9783319224817
DOIs: https://doi.org/10.1007/978-3-319-22482-4_30
State: Published - 2015
Event: 12th International Conference on Latent Variable Analysis and Signal Separation, LVA/ICA 2015 - Liberec, Czech Republic
Duration: Aug 25 2015 → Aug 28 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9237
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 12th International Conference on Latent Variable Analysis and Signal Separation, LVA/ICA 2015
Country: Czech Republic
City: Liberec
Period: 8/25/15 → 8/28/15

Keywords

  • Deep learning
  • Non-negative matrix factorization
  • Scattering
  • Source separation

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Sprechmann, P., Bruna, J., & LeCun, Y. (2015). Audio source separation with discriminative scattering networks. In Z. Koldovský, E. Vincent, A. Yeredor, & P. Tichavský (Eds.), Latent Variable Analysis and Signal Separation - 12th International Conference, LVA/ICA 2015, Proceedings (pp. 259-267). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9237). Springer Verlag. https://doi.org/10.1007/978-3-319-22482-4_30