Abstract
Machine Learning (ML) is increasingly used to automate impactful decisions, and the risks arising from this widespread use are garnering attention from policy makers, scientists, and the media. ML applications are often very brittle with respect to their input data, which leads to concerns about their reliability, accountability, and fairness. In this paper, we discuss such hard-to-identify data issues and describe mlinspect, a library that enables lightweight lineage-based inspection of ML preprocessing pipelines. The key idea is to extract a directed acyclic graph representation of the dataflow from ML preprocessing pipelines in Python, and to use this representation to automatically instrument the code with predefined inspections based on a lightweight annotation propagation approach. In contrast to existing work, mlinspect operates on declarative abstractions of popular data science libraries like estimator/transformer pipelines and does not require manual code instrumentation. We discuss the design and implementation of the mlinspect prototype, and give a complex end-to-end example that illustrates its functionality.
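The following is a minimal sketch of the kind of declarative estimator/transformer pipeline the abstract refers to, written with scikit-learn; the dataset, column names, and model choice are illustrative assumptions and are not taken from the paper. Because the preprocessing is expressed through declarative abstractions such as Pipeline and ColumnTransformer, each operator can be mapped to a node of the dataflow DAG that mlinspect extracts and instruments, without manual changes to the user's code.

```python
# Hedged sketch: an estimator/transformer pipeline of the style mlinspect targets.
# The data and columns below are made up for illustration only.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical input table; in realistic pipelines such data is typically
# read from files, filtered, and joined before featurisation.
data = pd.DataFrame({
    "age_group": ["20-40", "40-60", "20-40", "60+"],
    "county": ["kings", "queens", "kings", "bronx"],
    "num_children": [0, 2, 1, 3],
    "income": [35000.0, 61000.0, 48000.0, 27000.0],
    "label": [0, 1, 0, 1],
})

# Declarative preprocessing: each transformer corresponds to an operator that
# a lineage-based inspection framework can treat as a node in the dataflow DAG
# and annotate with per-record metadata as data flows through it.
featurisation = ColumnTransformer(transformers=[
    ("categorical", OneHotEncoder(handle_unknown="ignore"), ["age_group", "county"]),
    ("numeric", StandardScaler(), ["num_children", "income"]),
])

pipeline = Pipeline(steps=[
    ("features", featurisation),
    ("classifier", DecisionTreeClassifier()),
])

pipeline.fit(data[["age_group", "county", "num_children", "income"]], data["label"])
```

In this style of code, the structure of the preprocessing is explicit in the pipeline definition rather than hidden in imperative loops, which is what allows an inspection library to propagate lightweight annotations alongside records and surface data issues such as records dropped by filters or skew introduced by joins.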
| Original language | English (US) |
| --- | --- |
| State | Published - 2021 |
| Event | 11th Annual Conference on Innovative Data Systems Research, CIDR 2021 - Virtual, Online |
| Duration | Jan 11 2021 → Jan 15 2021 |
Conference
| Conference | 11th Annual Conference on Innovative Data Systems Research, CIDR 2021 |
| --- | --- |
| City | Virtual, Online |
| Period | 1/11/21 → 1/15/21 |
ASJC Scopus subject areas
- Artificial Intelligence
- Information Systems
- Information Systems and Management
- Hardware and Architecture