Abstract
Current datasets for automatic drum transcription (ADT) are small and limited because annotating onset events is tedious. While some of these datasets contain large vocabularies of percussive instrument classes (e.g. ~20 classes), many of these classes occur very infrequently in the data. This paucity of data makes it difficult to train models that support such large vocabularies. Therefore, data-driven drum transcription models often focus on a small number of percussive instrument classes (e.g. 3 classes). In this paper, we propose to support large-vocabulary drum transcription by generating a large synthetic dataset (210,000 eight-second audio examples) for which we have ground-truth transcriptions. Using this synthetic dataset along with existing drum transcription datasets, we train convolutional-recurrent neural networks (CRNNs) in a multi-task framework to support large-vocabulary ADT. We find that training on the synthetic and real-music drum transcription datasets together improves performance not only on large-vocabulary ADT but also on beat/downbeat detection and small-vocabulary ADT.
Original language | English (US) |
---|---|
Pages (from-to) | 72-79 |
Number of pages | 8 |
Journal | Proceedings of the International Conference on Digital Audio Effects, DAFx |
State | Published - 2018 |
Event | 21st International Conference on Digital Audio Effects, DAFx 2018 - Aveiro, Portugal. Duration: Sep 4 2018 → Sep 8 2018 |
ASJC Scopus subject areas
- Computer Science Applications
- Signal Processing
- Music