TY - GEN
T1 - Adversarial learning for improved onsets and frames music transcription
AU - Kim, Jong Wook
AU - Bello, Juan Pablo
N1 - Publisher Copyright:
© 2020 International Society for Music Information Retrieval. All rights reserved.
PY - 2019
Y1 - 2019
AB - Automatic music transcription is considered to be one of the hardest problems in music information retrieval, yet recent deep learning approaches have achieved substantial improvements on transcription performance. These approaches commonly employ supervised learning models that predict various time-frequency representations, by minimizing element-wise losses such as the cross entropy function. However, applying the loss in this manner assumes conditional independence of each label given the input, and thus cannot accurately express inter-label dependencies. To address this issue, we introduce an adversarial training scheme that operates directly on the time-frequency representations and makes the output distribution closer to the ground-truth. Through adversarial learning, we achieve a consistent improvement in both frame-level and note-level metrics over Onsets and Frames, a state-of-the-art music transcription model. Our results show that adversarial learning can significantly reduce the error rate while increasing the confidence of the model estimations. Our approach is generic and applicable to any transcription model based on multi-label predictions, which are very common in music signal analysis.
UR - http://www.scopus.com/inward/record.url?scp=85087095765&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85087095765&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85087095765
T3 - Proceedings of the 20th International Society for Music Information Retrieval Conference, ISMIR 2019
SP - 670
EP - 677
BT - Proceedings of the 20th International Society for Music Information Retrieval Conference, ISMIR 2019
A2 - Flexer, Arthur
A2 - Peeters, Geoffroy
A2 - Urbano, Julian
A2 - Volk, Anja
PB - International Society for Music Information Retrieval
T2 - 20th International Society for Music Information Retrieval Conference, ISMIR 2019
Y2 - 4 November 2019 through 8 November 2019
ER -