TY - GEN
T1 - Data driven and discriminative projections for large-scale cover song identification
AU - Humphrey, Eric J.
AU - Nieto, Oriol
AU - Bello, Juan P.
N1 - Publisher Copyright: © 2013 International Society for Music Information Retrieval.
PY - 2013
Y1 - 2013
N2 - The predominant approach to computing document similarity in web-scale applications is to encode task-specific invariance in a vectorized representation, so that the relationship between items can be computed efficiently by a simple scoring function, e.g., Euclidean distance. Here, we improve upon previous work in large-scale cover song identification by using data-driven projections at different time scales to capture local features and embed summary vectors into a semantically organized space. We achieve this by projecting 2D Fourier Magnitude Coefficients (2D-FMCs) of beat-chroma patches into a sparse, high-dimensional representation which, due to the shift-invariance properties of the Fourier transform, is similar in principle to convolutional sparse coding. After aggregating these local beat-chroma projections, we apply supervised dimensionality reduction to recover an embedding where distance is useful for cover song retrieval. Evaluating on the Million Song Dataset, we find that our method outperforms the current state of the art overall, and does so significantly on top-k metrics, which indicates improved usability.
UR - http://www.scopus.com/inward/record.url?scp=85006053298&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85006053298&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85006053298
T3 - Proceedings of the 14th International Society for Music Information Retrieval Conference, ISMIR 2013
SP - 149
EP - 154
BT - Proceedings of the 14th International Society for Music Information Retrieval Conference, ISMIR 2013
A2 - Britto, Alceu de Souza
A2 - Gouyon, Fabien
A2 - Dixon, Simon
PB - International Society for Music Information Retrieval
T2 - 14th International Society for Music Information Retrieval Conference, ISMIR 2013
Y2 - 4 November 2013 through 8 November 2013
ER -