TY - GEN
T1 - A Music Structure Informed Downbeat Tracking System Using Skip-chain Conditional Random Fields and Deep Learning
AU - Fuentes, Magdalena
AU - McFee, Brian
AU - Crayencour, Helene C.
AU - Essid, Slim
AU - Bello, Juan Pablo
N1 - Funding Information:
This work is partially supported by the NYU Abu-Dhabi Research Enhancement Fund.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/5/1
Y1 - 2019/5/1
N2 - In recent years, the task of downbeat tracking has received increasing attention, and the state of the art has been improved by the introduction of deep learning methods. Among the proposed solutions, existing systems exploit short-term musical rules as part of their language modelling. In this work, we show in an oracle scenario how including longer-term musical rules, in particular music structure, can enhance downbeat estimation. We introduce a skip-chain conditional random field language model for downbeat tracking designed to include section information in a unified and flexible framework. We combine this model with a state-of-the-art convolutional-recurrent network and contrast the system's performance with that of the commonly used Bar Pointer model. Our experiments on the popular Beatles dataset show that incorporating structure information in the language model leads to more consistent and more robust downbeat estimations.
AB - In recent years, the task of downbeat tracking has received increasing attention, and the state of the art has been improved by the introduction of deep learning methods. Among the proposed solutions, existing systems exploit short-term musical rules as part of their language modelling. In this work, we show in an oracle scenario how including longer-term musical rules, in particular music structure, can enhance downbeat estimation. We introduce a skip-chain conditional random field language model for downbeat tracking designed to include section information in a unified and flexible framework. We combine this model with a state-of-the-art convolutional-recurrent network and contrast the system's performance with that of the commonly used Bar Pointer model. Our experiments on the popular Beatles dataset show that incorporating structure information in the language model leads to more consistent and more robust downbeat estimations.
KW - Convolutional-Recurrent Neural Networks
KW - Deep Learning
KW - Downbeat Tracking
KW - Music Structure
KW - Skip-Chain Conditional Random Fields
UR - http://www.scopus.com/inward/record.url?scp=85068990053&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85068990053&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2019.8682870
DO - 10.1109/ICASSP.2019.8682870
M3 - Conference contribution
AN - SCOPUS:85068990053
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 481
EP - 485
BT - 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019
Y2 - 12 May 2019 through 17 May 2019
ER -