Lightweight Inspection of Data Preprocessing in Native Machine Learning Pipelines

Stefan Grafberger, Julia Stoyanovich, Sebastian Schelter

    Research output: Contribution to conference › Paper › peer-review

    Abstract

    Machine Learning (ML) is increasingly used to automate impactful decisions, and the risks arising from this widespread use are garnering attention from policy makers, scientists, and the media. ML applications are often very brittle with respect to their input data, which leads to concerns about their reliability, accountability, and fairness. In this paper, we discuss such hard-to-identify data issues and describe mlinspect, a library that enables lightweight lineage-based inspection of ML preprocessing pipelines. The key idea is to extract a directed acyclic graph representation of the dataflow from ML preprocessing pipelines in Python, and to use this representation to automatically instrument the code with predefined inspections based on a lightweight annotation propagation approach. In contrast to existing work, mlinspect operates on declarative abstractions of popular data science libraries like estimator/transformer pipelines and does not require manual code instrumentation. We discuss the design and implementation of the mlinspect prototype, and give a complex end-to-end example that illustrates its functionality.
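
The annotation propagation idea in the abstract can be illustrated with a toy sketch. This is not mlinspect's API or implementation: the `Node` class and the `source`, `select`, `project`, and `execute` helpers are hypothetical names invented for this example, and the DAG is built by hand rather than extracted from pipeline code. Each row carries a set of source-row identifiers that every operator passes along, so an inspection can later ask which input rows survived preprocessing.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Tuple

# A row paired with its annotation: ids of the source rows it derives from.
AnnotatedRow = Tuple[dict, FrozenSet[str]]


@dataclass
class Node:
    """One operator in a toy dataflow DAG (hypothetical, not an mlinspect class)."""
    name: str
    op: Callable[[List[List[AnnotatedRow]]], List[AnnotatedRow]]
    parents: List["Node"] = field(default_factory=list)


def source(name: str, rows: List[dict]) -> Node:
    # Annotate every input row with its own lineage id, e.g. "people:0".
    annotated = [(row, frozenset({f"{name}:{i}"})) for i, row in enumerate(rows)]
    return Node(name, lambda _inputs: annotated)


def select(name: str, parent: Node, predicate: Callable[[dict], bool]) -> Node:
    # A filter keeps or drops rows; surviving rows keep their annotation.
    return Node(name, lambda inputs: [(r, a) for r, a in inputs[0] if predicate(r)], [parent])


def project(name: str, parent: Node, columns: List[str]) -> Node:
    # A projection drops columns but passes the annotation through unchanged.
    return Node(
        name,
        lambda inputs: [({c: r[c] for c in columns}, a) for r, a in inputs[0]],
        [parent],
    )


def execute(sink: Node) -> Dict[str, List[AnnotatedRow]]:
    """Evaluate the DAG depth-first and return every node's annotated output."""
    results: Dict[str, List[AnnotatedRow]] = {}

    def visit(node: Node) -> List[AnnotatedRow]:
        if node.name not in results:
            results[node.name] = node.op([visit(p) for p in node.parents])
        return results[node.name]

    visit(sink)
    return results


people = source("people", [
    {"age": 25, "group": "a"},
    {"age": 70, "group": "b"},
    {"age": 40, "group": "a"},
])
age_filter = select("age_filter", people, lambda r: r["age"] < 65)
features = project("features", age_filter, ["age"])

results = execute(features)

# Inspection: which source rows are still represented after preprocessing?
surviving = sorted(i for _, annotation in results["features"] for i in annotation)
print(surviving)
```

Although the `group` column is projected away before the final step, the propagated identifiers still make it possible to look up which groups the filter removed, which is the kind of question a lineage-based inspection is meant to answer.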

    Original language: English (US)
    State: Published - 2021
    Event: 11th Annual Conference on Innovative Data Systems Research, CIDR 2021 - Virtual, Online
    Duration: Jan 11 2021 → Jan 15 2021

    Conference

    Conference: 11th Annual Conference on Innovative Data Systems Research, CIDR 2021
    City: Virtual, Online
    Period: 1/11/21 → 1/15/21

    ASJC Scopus subject areas

    • Artificial Intelligence
    • Information Systems
    • Information Systems and Management
    • Hardware and Architecture
