TY - GEN
T1 - Learning the Helix Topology of Musical Pitch
AU - Lostanlen, Vincent
AU - Sridhar, Sripathi
AU - McFee, Brian
AU - Farnsworth, Andrew
AU - Bello, Juan Pablo
N1 - Funding Information:
This work is partially supported by NSF award 1633259 (BirdVox).
Publisher Copyright:
© 2020 IEEE.
PY - 2020/5
Y1 - 2020/5
N2 - To explain the consonance of octaves, music psychologists represent pitch as a helix where azimuth and axial coordinate correspond to pitch class and pitch height respectively. This article addresses the problem of discovering this helical structure from unlabeled audio data. We measure Pearson correlations in the constant-Q transform (CQT) domain to build a K-nearest neighbor graph between frequency subbands. Then, we run the Isomap manifold learning algorithm to represent this graph in a three-dimensional space in which straight lines approximate graph geodesics. Experiments on isolated musical notes demonstrate that the resulting manifold resembles a helix which makes a full turn at every octave. A circular shape is also found in English speech, but not in urban noise. We discuss the impact of various design choices on the visualization: instrumentarium, loudness mapping function, and number of neighbors K.
KW - Continuous wavelet transforms
KW - distance learning
KW - music
KW - pitch control (audio)
KW - shortest path problem
UR - http://www.scopus.com/inward/record.url?scp=85089209720&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85089209720&partnerID=8YFLogxK
U2 - 10.1109/ICASSP40776.2020.9053644
DO - 10.1109/ICASSP40776.2020.9053644
M3 - Conference contribution
AN - SCOPUS:85089209720
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 11
EP - 15
BT - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Y2 - 4 May 2020 through 8 May 2020
ER -