Adversarial learning for improved onsets and frames music transcription

Jong Wook Kim, Juan Pablo Bello

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Automatic music transcription is considered to be one of the hardest problems in music information retrieval, yet recent deep learning approaches have achieved substantial improvements in transcription performance. These approaches commonly employ supervised learning models that predict various time-frequency representations by minimizing element-wise losses such as the cross-entropy function. However, applying the loss in this manner assumes conditional independence of each label given the input, and thus cannot accurately express inter-label dependencies. To address this issue, we introduce an adversarial training scheme that operates directly on the time-frequency representations and makes the output distribution closer to the ground truth. Through adversarial learning, we achieve a consistent improvement in both frame-level and note-level metrics over Onsets and Frames, a state-of-the-art music transcription model. Our results show that adversarial learning can significantly reduce the error rate while increasing the confidence of the model estimations. Our approach is generic and applicable to any transcription model based on multi-label predictions, which are very common in music signal analysis.
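The conditional-independence point in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names and the three-key example values are assumptions made purely for illustration. It shows that summing an element-wise binary cross-entropy over labels gives the same number as the negative log-likelihood of a model in which every label is an independent Bernoulli variable, which is why such a loss has no term for dependencies between labels.

```python
import math

def elementwise_bce(probs, labels):
    """Sum of per-label binary cross-entropy terms."""
    return -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for p, y in zip(probs, labels)
    )

def factorized_likelihood(probs, labels):
    """Probability of the label vector if each label is an independent Bernoulli."""
    likelihood = 1.0
    for p, y in zip(probs, labels):
        likelihood *= p if y == 1 else 1 - p
    return likelihood

# Hypothetical predictions for three piano keys in a single frame
# (illustrative values, not taken from the paper).
probs = [0.9, 0.2, 0.7]
labels = [1, 0, 1]

loss = elementwise_bce(probs, labels)
nll = -math.log(factorized_likelihood(probs, labels))

# The two quantities agree up to floating-point rounding.
print(math.isclose(loss, nll))
```

Because the loss decomposes into one term per label, a joint pattern such as "these two keys tend to sound together" can only be captured through the shared input, never through the loss itself. The adversarial scheme described in the abstract is aimed at that gap.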

Original language: English (US)
Title of host publication: Proceedings of the 20th International Society for Music Information Retrieval Conference, ISMIR 2019
Editors: Arthur Flexer, Geoffroy Peeters, Julian Urbano, Anja Volk
Publisher: International Society for Music Information Retrieval
Pages: 670-677
Number of pages: 8
ISBN (Electronic): 9781732729919
State: Published - 2019
Event: 20th International Society for Music Information Retrieval Conference, ISMIR 2019 - Delft, Netherlands
Duration: Nov 4 2019 to Nov 8 2019

Publication series

Name: Proceedings of the 20th International Society for Music Information Retrieval Conference, ISMIR 2019

Conference

Conference: 20th International Society for Music Information Retrieval Conference, ISMIR 2019
Country: Netherlands
City: Delft
Period: 11/4/19 to 11/8/19

ASJC Scopus subject areas

  • Music
  • Information Systems

  • Cite this

    Kim, J. W., & Bello, J. P. (2019). Adversarial learning for improved onsets and frames music transcription. In A. Flexer, G. Peeters, J. Urbano, & A. Volk (Eds.), Proceedings of the 20th International Society for Music Information Retrieval Conference, ISMIR 2019 (pp. 670-677). International Society for Music Information Retrieval.