TY - GEN
T1 - Mirdata
T2 - 20th International Society for Music Information Retrieval Conference, ISMIR 2019
AU - Bittner, Rachel M.
AU - Fuentes, Magdalena
AU - Rubinstein, David
AU - Jansson, Andreas
AU - Choi, Keunwoo
AU - Kell, Thor
N1 - Publisher Copyright:
© 2020 International Society for Music Information Retrieval. All rights reserved.
PY - 2019
Y1 - 2019
AB - There are a number of efforts in the MIR community towards increased reproducibility, such as creating more open datasets, publishing code, and using common software libraries, e.g., for evaluation. However, when it comes to datasets, there is usually little guarantee that researchers are using the exact same data in the same way, which, among other issues, makes comparisons of different methods on the "same" datasets problematic. In this paper, we first show how (often unknown) differences in datasets can lead to significantly different experimental results. We propose a solution to these problems in the form of an open source library, mirdata, which handles datasets in their current distribution modes but controls for possible variability. In particular, it contains tools which: (1) validate whether the user's data (e.g., audio, annotations) is consistent with a canonical version of the dataset; (2) load annotations in a consistent manner; (3) download or give instructions for obtaining data; and (4) make it easy to perform track metadata-specific analysis.
UR - http://www.scopus.com/inward/record.url?scp=85087093741&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85087093741&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85087093741
T3 - Proceedings of the 20th International Society for Music Information Retrieval Conference, ISMIR 2019
SP - 99
EP - 106
BT - Proceedings of the 20th International Society for Music Information Retrieval Conference, ISMIR 2019
A2 - Flexer, Arthur
A2 - Peeters, Geoffroy
A2 - Urbano, Julian
A2 - Volk, Anja
PB - International Society for Music Information Retrieval
Y2 - 4 November 2019 through 8 November 2019
ER -