Abstract
In this report we describe an ongoing line of research for solving single-channel source separation problems. Many monaural signal decomposition techniques proposed in the literature operate on a feature space consisting of a time-frequency representation of the input data. A challenge faced by these approaches is to effectively exploit the temporal dependencies of the signals at scales larger than the duration of a time-frame. In this work we propose to tackle this problem by modeling the signals using a time-frequency representation with multiple temporal resolutions. The proposed representation consists of a pyramid of wavelet scattering operators, which generalizes Constant Q Transforms (CQT) with extra layers of convolution and complex modulus. We first show that learning standard models with this multi-resolution setting improves source separation results over fixed-resolution methods. As a case study, we use Non-Negative Matrix Factorization (NMF), which has been widely used in audio applications. Then, we investigate the inclusion of the proposed multi-resolution setting into a discriminative training regime. We discuss several alternatives using different deep neural network architectures.
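As a rough illustration of the fixed-resolution NMF baseline mentioned in the abstract, the sketch below learns one nonnegative dictionary per source and then separates a mixture with a soft mask. It is not the authors' implementation: the function names `nmf` and `separate` are hypothetical, the Euclidean multiplicative updates are one common choice among several, and random nonnegative matrices stand in for magnitude spectrograms. The multi-resolution scattering features proposed in the paper are not modeled here.

```python
import numpy as np

def nmf(V, rank, W=None, n_iter=200, eps=1e-9, seed=0):
    """Euclidean NMF with multiplicative updates (V is approximated by W @ H).

    If W is supplied it is kept fixed, `rank` is ignored, and only the
    activations H are estimated.
    """
    rng = np.random.default_rng(seed)
    fixed_W = W is not None
    if not fixed_W:
        W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        if not fixed_W:
            W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def separate(V_mix, W1, W2, n_iter=200, eps=1e-9):
    """Split a mixture using two pre-trained dictionaries and a soft mask."""
    W = np.hstack([W1, W2])
    _, H = nmf(V_mix, W.shape[1], W=W, n_iter=n_iter)
    k1 = W1.shape[1]
    V1 = W1 @ H[:k1]           # model of source 1 inside the mixture
    V2 = W2 @ H[k1:]           # model of source 2 inside the mixture
    mask1 = V1 / (V1 + V2 + eps)
    return mask1 * V_mix, (1.0 - mask1) * V_mix

# Synthetic stand-ins for magnitude spectrograms (assumption: 16 bins, 40 frames).
rng = np.random.default_rng(1)
train1 = rng.random((16, 3)) @ rng.random((3, 40))
train2 = rng.random((16, 3)) @ rng.random((3, 40))
V_mix = train1[:, :20] + train2[:, :20]

W1, _ = nmf(train1, rank=3)    # dictionary for source 1
W2, _ = nmf(train2, rank=3)    # dictionary for source 2
est1, est2 = separate(V_mix, W1, W2)
```

Because the two masks sum to one, the two estimates always add back up to the mixture; how well each estimate matches its true source depends on how distinct the learned dictionaries are.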
Original language | English (US) |
---|---|
State | Published - 2015 |
Event | 3rd International Conference on Learning Representations, ICLR 2015 - San Diego, United States. Duration: May 7 2015 → May 9 2015 |
Conference
Conference | 3rd International Conference on Learning Representations, ICLR 2015 |
---|---|
Country/Territory | United States |
City | San Diego |
Period | 5/7/15 → 5/9/15 |
ASJC Scopus subject areas
- Education
- Linguistics and Language
- Language and Linguistics
- Computer Science Applications