TY - GEN
T1 - Structured training for large-vocabulary chord recognition
AU - McFee, Brian
AU - Bello, Juan Pablo
N1 - Funding Information:
BM acknowledges support from the Moore-Sloan data science environment at NYU. We thank the NVIDIA Corporation for the donation of a Tesla K40 GPU.
Publisher Copyright:
© 2019 Brian McFee, Juan Pablo Bello.
PY - 2017
Y1 - 2017
AB - Automatic chord recognition systems operating in the large-vocabulary regime must overcome data scarcity: certain classes occur much less frequently than others, and this presents a significant challenge when estimating model parameters. While most systems model the chord recognition task as a (multi-class) classification problem, few attempts have been made to directly exploit the intrinsic structural similarities between chord classes. In this work, we develop a deep convolutional-recurrent model for automatic chord recognition over a vocabulary of 170 classes. To exploit structural relationships between chord classes, the model is trained to produce both the time-varying chord label sequence as well as binary encodings of chord roots and qualities. This binary encoding directly exposes similarities between related classes, allowing the model to learn a more coherent representation of simultaneous pitch content. Evaluations on a corpus of 1217 annotated recordings demonstrate substantial improvements compared to previous models.
UR - http://www.scopus.com/inward/record.url?scp=85053714526&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85053714526&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85053714526
T3 - Proceedings of the 18th International Society for Music Information Retrieval Conference, ISMIR 2017
SP - 188
EP - 194
BT - Proceedings of the 18th International Society for Music Information Retrieval Conference, ISMIR 2017
A2 - Cunningham, Sally Jo
A2 - Duan, Zhiyao
A2 - Hu, Xiao
A2 - Turnbull, Douglas
PB - International Society for Music Information Retrieval
T2 - 18th International Society for Music Information Retrieval Conference, ISMIR 2017
Y2 - 23 October 2017 through 27 October 2017
ER -