TY - JOUR
T1 - MLINSPECT: A Data Distribution Debugger for Machine Learning Pipelines
T2 - 2021 International Conference on Management of Data, SIGMOD 2021
AU - Grafberger, Stefan
AU - Guha, Shubha
AU - Stoyanovich, Julia
AU - Schelter, Sebastian
N1 - Funding Information:
Acknowledgements. This work was supported in part by Ahold Delhaize, and by NSF Grants No. 1926250, 1934464 and 1922658. All content represents the opinion of the authors, which is not necessarily shared or endorsed by their respective employers and/or sponsors.
Publisher Copyright:
© 2021 ACM.
PY - 2021
Y1 - 2021
N2 - Machine Learning (ML) is increasingly used to automate impactful decisions, and the risks arising from this widespread use are garnering attention from policymakers, scientists, and the media. ML applications are often very brittle with respect to their input data, which leads to concerns about their reliability, accountability, and fairness. While bias detection cannot be fully automated, computational tools can help pinpoint particular types of data issues. We recently proposed mlinspect, a library that enables lightweight lineage-based inspection of ML preprocessing pipelines. In this demonstration, we show how mlinspect can be used to detect data distribution bugs in a representative pipeline. In contrast to existing work, mlinspect operates on declarative abstractions of popular data science libraries like estimator/transformer pipelines, can handle both relational and matrix data, and does not require manual code instrumentation. The library is publicly available at https://github.com/stefan-grafberger/mlinspect.
AB - Machine Learning (ML) is increasingly used to automate impactful decisions, and the risks arising from this widespread use are garnering attention from policymakers, scientists, and the media. ML applications are often very brittle with respect to their input data, which leads to concerns about their reliability, accountability, and fairness. While bias detection cannot be fully automated, computational tools can help pinpoint particular types of data issues. We recently proposed mlinspect, a library that enables lightweight lineage-based inspection of ML preprocessing pipelines. In this demonstration, we show how mlinspect can be used to detect data distribution bugs in a representative pipeline. In contrast to existing work, mlinspect operates on declarative abstractions of popular data science libraries like estimator/transformer pipelines, can handle both relational and matrix data, and does not require manual code instrumentation. The library is publicly available at https://github.com/stefan-grafberger/mlinspect.
KW - data distribution debugging
KW - machine learning pipelines
KW - responsible data science
KW - technical bias
UR - http://www.scopus.com/inward/record.url?scp=85108956175&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85108956175&partnerID=8YFLogxK
U2 - 10.1145/3448016.3452759
DO - 10.1145/3448016.3452759
M3 - Conference article
AN - SCOPUS:85108956175
SN - 0730-8078
SP - 2736
EP - 2739
JO - Proceedings of the ACM SIGMOD International Conference on Management of Data
JF - Proceedings of the ACM SIGMOD International Conference on Management of Data
Y2 - 20 June 2021 through 25 June 2021
ER -