TY - JOUR
T1 - PipelineProfiler
T2 - A Visual Analytics Tool for the Exploration of AutoML Pipelines
AU - Ono, Jorge Piazentin
AU - Castelo, Sonia
AU - Lopez, Roque
AU - Bertini, Enrico
AU - Freire, Juliana
AU - Silva, Claudio
N1 - Funding Information:
This work was partially supported by the DARPA D3M program and NSF awards CNS-1229185, CCF-1533564, CNS-1544753, CNS-1730396, and CNS-1828576. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF and DARPA.
Publisher Copyright:
© 2020 IEEE.
PY - 2021/2
Y1 - 2021/2
N2 - In recent years, a wide variety of automated machine learning (AutoML) methods have been proposed to generate end-to-end ML pipelines. While these techniques facilitate the creation of models, given their black-box nature, the complexity of the underlying algorithms, and the large number of pipelines they derive, they are difficult for developers to debug. It is also challenging for machine learning experts to select an AutoML system that is well suited for a given problem. In this paper, we present PipelineProfiler, an interactive visualization tool that allows the exploration and comparison of the solution space of machine learning (ML) pipelines produced by AutoML systems. PipelineProfiler is integrated with Jupyter Notebook and can be combined with common data science tools to enable a rich set of analyses of the ML pipelines, providing users a better understanding of the algorithms that generated them as well as insights into how they can be improved. We demonstrate the utility of our tool through use cases where PipelineProfiler is used to better understand and improve a real-world AutoML system. Furthermore, we validate our approach by presenting a detailed analysis of a think-aloud experiment with six data scientists who develop and evaluate AutoML tools.
AB - In recent years, a wide variety of automated machine learning (AutoML) methods have been proposed to generate end-to-end ML pipelines. While these techniques facilitate the creation of models, given their black-box nature, the complexity of the underlying algorithms, and the large number of pipelines they derive, they are difficult for developers to debug. It is also challenging for machine learning experts to select an AutoML system that is well suited for a given problem. In this paper, we present PipelineProfiler, an interactive visualization tool that allows the exploration and comparison of the solution space of machine learning (ML) pipelines produced by AutoML systems. PipelineProfiler is integrated with Jupyter Notebook and can be combined with common data science tools to enable a rich set of analyses of the ML pipelines, providing users a better understanding of the algorithms that generated them as well as insights into how they can be improved. We demonstrate the utility of our tool through use cases where PipelineProfiler is used to better understand and improve a real-world AutoML system. Furthermore, we validate our approach by presenting a detailed analysis of a think-aloud experiment with six data scientists who develop and evaluate AutoML tools.
KW - Automatic Machine Learning
KW - Model Evaluation
KW - Pipeline Visualization
UR - http://www.scopus.com/inward/record.url?scp=85100444099&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85100444099&partnerID=8YFLogxK
U2 - 10.1109/TVCG.2020.3030361
DO - 10.1109/TVCG.2020.3030361
M3 - Article
C2 - 33048694
AN - SCOPUS:85100444099
SN - 1077-2626
VL - 27
SP - 390
EP - 400
JO - IEEE Transactions on Visualization and Computer Graphics
JF - IEEE Transactions on Visualization and Computer Graphics
IS - 2
M1 - 9222086
ER -