Mirdata: Software for reproducible usage of datasets

Rachel M. Bittner, Magdalena Fuentes, David Rubinstein, Andreas Jansson, Keunwoo Choi, Thor Kell

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution

Abstract

There are a number of efforts in the MIR community towards increased reproducibility, such as creating more open datasets, publishing code, and using common software libraries, e.g. for evaluation. However, when it comes to datasets, there is usually little guarantee that researchers are using the exact same data in the same way, which, among other issues, makes comparisons of different methods on the "same" datasets problematic. In this paper, we first show how (often unknown) differences in datasets can lead to significantly different experimental results. We propose a solution to these problems in the form of an open source library, mirdata, which handles datasets in their current distribution modes, but controls for possible variability. In particular, it contains tools which: (1) validate whether the user's data (e.g. audio, annotations) are consistent with a canonical version of the dataset; (2) load annotations in a consistent manner; (3) download or give instructions for obtaining data; and (4) make it easy to perform track metadata-specific analysis.
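
As an illustration of point (1), the following is a minimal sketch of how local files could be checked against a canonical version of a dataset. It is not mirdata's actual API: the function names, the index layout (relative path mapped to expected checksum), and the choice of SHA-256 are all assumptions made for this example.

```python
import hashlib
import os


def checksum_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_against_index(data_home, index):
    """Compare local files with a canonical index.

    `index` is a hypothetical mapping from relative file paths to the
    checksums expected for the canonical version of the dataset.
    Returns (missing, mismatched): two sorted lists of relative paths.
    """
    missing, mismatched = [], []
    for rel_path, expected in index.items():
        full_path = os.path.join(data_home, rel_path)
        if not os.path.isfile(full_path):
            # The file listed in the canonical index is absent locally.
            missing.append(rel_path)
        elif checksum_of_file(full_path) != expected:
            # The file exists but its content differs from the canonical copy.
            mismatched.append(rel_path)
    return sorted(missing), sorted(mismatched)
```

A user would call `validate_against_index` with the folder holding their copy of a dataset and the canonical index; two empty lists would indicate that every listed audio and annotation file is present and byte-identical to the reference version.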

Original language: English (US)
Title of host publication: Proceedings of the 20th International Society for Music Information Retrieval Conference, ISMIR 2019
Editors: Arthur Flexer, Geoffroy Peeters, Julian Urbano, Anja Volk
Publisher: International Society for Music Information Retrieval
Pages: 99-106
Number of pages: 8
ISBN (Electronic): 9781732729919
State: Published - 2019
Event: 20th International Society for Music Information Retrieval Conference, ISMIR 2019 - Delft, Netherlands
Duration: Nov 4 2019 to Nov 8 2019

ASJC Scopus subject areas

  • Music
  • Information Systems
