TY - GEN
T1 - Analysis of common design choices in deep learning systems for downbeat tracking
AU - Fuentes, Magdalena
AU - McFee, Brian
AU - Crayencour, Hélène C.
AU - Essid, Slim
AU - Bello, Juan P.
N1 - Funding Information:
The authors would like to thank Simon Durand and Florian Krebs for sharing the code of their downbeat tracking architectures with us. B.M. is supported by the Moore-Sloan Data Science Environment at NYU.
Publisher Copyright:
© Magdalena Fuentes, Brian McFee, Hélène C. Crayencour, Slim Essid, Juan P. Bello.
PY - 2018
Y1 - 2018
N2 - Downbeat tracking consists of annotating a piece of musical audio with the estimated position of the first beat of each bar. In recent years, increasing attention has been paid to applying deep learning models to this task, and various architectures have been proposed, leading to a significant improvement in accuracy. However, there are few insights about the role of the various design choices and the delicate interactions between them. In this paper, we offer a systematic investigation of the impact of widely adopted variants. We study the effects of the temporal granularity of the input representation (i.e., beat-level vs. tatum-level) and the encoding of the network outputs. We also investigate the potential of convolutional-recurrent networks, which have not been explored in previous downbeat tracking systems. To this end, we exploit a state-of-the-art recurrent neural network where we introduce those variants, while keeping the training data, network learning parameters and post-processing stages fixed. We find that temporal granularity has a significant impact on performance, and we analyze its interaction with the encoding of the network outputs.
AB - Downbeat tracking consists of annotating a piece of musical audio with the estimated position of the first beat of each bar. In recent years, increasing attention has been paid to applying deep learning models to this task, and various architectures have been proposed, leading to a significant improvement in accuracy. However, there are few insights about the role of the various design choices and the delicate interactions between them. In this paper, we offer a systematic investigation of the impact of widely adopted variants. We study the effects of the temporal granularity of the input representation (i.e., beat-level vs. tatum-level) and the encoding of the network outputs. We also investigate the potential of convolutional-recurrent networks, which have not been explored in previous downbeat tracking systems. To this end, we exploit a state-of-the-art recurrent neural network where we introduce those variants, while keeping the training data, network learning parameters and post-processing stages fixed. We find that temporal granularity has a significant impact on performance, and we analyze its interaction with the encoding of the network outputs.
UR - http://www.scopus.com/inward/record.url?scp=85065981409&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85065981409&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85065981409
T3 - Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR 2018
SP - 106
EP - 112
BT - Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR 2018
A2 - Gomez, Emilia
A2 - Hu, Xiao
A2 - Humphrey, Eric
A2 - Benetos, Emmanouil
PB - International Society for Music Information Retrieval
T2 - 19th International Society for Music Information Retrieval Conference, ISMIR 2018
Y2 - 23 September 2018 through 27 September 2018
ER -