Unsupervised learning of sparse features for scalable audio classification

Mikael Henaff, Kevin Jarrett, Koray Kavukcuoglu, Yann Lecun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this work we present a system to automatically learn features from audio in an unsupervised manner. Our method first learns an overcomplete dictionary which can be used to sparsely decompose log-scaled spectrograms. It then trains an efficient encoder which quickly maps new inputs to approximations of their sparse representations using the learned dictionary. This avoids expensive iterative procedures usually required to infer sparse codes. We then use these sparse codes as inputs for a linear Support Vector Machine (SVM). Our system achieves 83.4% accuracy in predicting genres on the GTZAN dataset, which is competitive with current state-of-the-art approaches. Furthermore, the use of a simple linear classifier combined with a fast feature extraction system allows our approach to scale well to large datasets.
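The contrast the abstract draws between iterative sparse inference and a fast feed-forward encoder can be illustrated with a small sketch. This is not the authors' implementation. It assumes an L1-penalized reconstruction objective solved with ISTA for the iterative case, and a soft-thresholded linear map for the encoder. The toy dictionary `D`, the encoder parameters `W`, `b`, `theta`, and all function names are hypothetical and hand-picked for illustration. In the described system the dictionary and encoder are learned from log-scaled spectrograms, and the resulting codes are passed to a linear SVM.

```python
import math

def soft_threshold(v, t):
    """Shrink each entry toward zero by t (proximal operator of the L1 norm)."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def ista(x, D, lam=0.1, step=0.4, n_iter=200):
    """Iterative inference: minimize 0.5*||x - D z||^2 + lam*||z||_1 over z.

    D has shape (input_dim, n_atoms). The step size is assumed to be below
    1 / (largest eigenvalue of D^T D), which holds for the toy D below.
    """
    Dt = transpose(D)
    z = [0.0] * len(Dt)
    for _ in range(n_iter):
        residual = [a - b for a, b in zip(matvec(D, z), x)]
        grad = matvec(Dt, residual)
        z = soft_threshold([zi - step * g for zi, g in zip(z, grad)], step * lam)
    return z

def encode(x, W, b, theta):
    """One-pass approximation of the sparse code: shrink(W x + b).

    No iteration is needed at inference time, which is the source of the speedup.
    """
    pre = [p + bi for p, bi in zip(matvec(W, x), b)]
    return soft_threshold(pre, theta)

# Hypothetical overcomplete dictionary: 3 unit-norm atoms for 2-D inputs.
S = 1.0 / math.sqrt(2.0)
D = [[1.0, 0.0, S],
     [0.0, 1.0, S]]

# Hand-picked (not trained) encoder parameters, for illustration only.
W = transpose(D)
b = [0.0, 0.0, 0.0]
theta = 0.75

x = [1.0, 0.0]
z_iterative = ista(x, D)       # many matrix-vector products
z_fast = encode(x, W, b, theta)  # a single matrix-vector product
```

Both codes select the same atom for this input but differ in magnitude, because the encoder here is untrained. Training the encoder to regress onto the iteratively inferred codes is what would close that gap.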

Original language: English (US)
Title of host publication: Proceedings of the 12th International Society for Music Information Retrieval Conference, ISMIR 2011
Pages: 681-686
Number of pages: 6
State: Published - 2011
Event: 12th International Society for Music Information Retrieval Conference, ISMIR 2011 - Miami, FL, United States
Duration: Oct 24 2011 to Oct 28 2011

Publication series

Name: Proceedings of the 12th International Society for Music Information Retrieval Conference, ISMIR 2011

Other

Other: 12th International Society for Music Information Retrieval Conference, ISMIR 2011
Country/Territory: United States
City: Miami, FL
Period: 10/24/11 to 10/28/11

ASJC Scopus subject areas

  • Music
  • Information Systems
