Debugging machine learning pipelines

Raoni Lourenço, Juliana Freire, Dennis Shasha

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Machine learning tasks entail the use of complex computational pipelines to reach quantitative and qualitative conclusions. If some of the activities in a pipeline produce erroneous or uninformative outputs, the pipeline may fail or produce incorrect results. Inferring the root cause of failures and unexpected behavior is challenging, usually requiring much human thought, and is both time consuming and error prone. We propose a new approach that makes use of iteration and provenance to automatically infer the root causes and derive succinct explanations of failures. Through a detailed experimental evaluation, we assess the cost, precision, and recall of our approach compared to the state of the art. Our source code and experimental data will be available for reproducibility and enhancement.

Original language: English (US)
Title of host publication: Proceedings of the 3rd Workshop on Data Management for End-To-End Machine Learning, DEEM 2019 - In conjunction with the 2019 ACM SIGMOD/PODS Conference
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450367974
DOIs: https://doi.org/10.1145/3329486.3329489
State: Published - Jun 30 2019
Event: 3rd Workshop on Data Management for End-To-End Machine Learning, DEEM 2019 - In conjunction with the 2019 ACM SIGMOD/PODS Conference - Amsterdam, Netherlands
Duration: Jun 30 2019 → …

Publication series

Name: Proceedings of the ACM SIGMOD International Conference on Management of Data
ISSN (Print): 0730-8078

Conference

Conference: 3rd Workshop on Data Management for End-To-End Machine Learning, DEEM 2019 - In conjunction with the 2019 ACM SIGMOD/PODS Conference
Country: Netherlands
City: Amsterdam
Period: 6/30/19 → …

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Lourenço, R., Freire, J., & Shasha, D. (2019). Debugging machine learning pipelines. In Proceedings of the 3rd Workshop on Data Management for End-To-End Machine Learning, DEEM 2019 - In conjunction with the 2019 ACM SIGMOD/PODS Conference [3329489] (Proceedings of the ACM SIGMOD International Conference on Management of Data). Association for Computing Machinery. https://doi.org/10.1145/3329486.3329489