Using Pipeline Performance Prediction to Accelerate AutoML Systems

Haoxiang Zhang, Roque López, Aécio Santos, Jorge Piazentin Ono, Aline Bessa, Juliana Freire

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Automatic machine learning (AutoML) systems aim to automate the synthesis of machine learning (ML) pipelines. An important challenge these systems face is how to efficiently search a large space of candidate pipelines. Several strategies have been proposed to navigate and prune the search space, from the use of grammars to deep learning models. However, regardless of the strategy used, a major overhead lies in the evaluation step: for each synthesized pipeline p, these systems must both train and test p to guide the search and to identify the best pipelines. Given a time budget and computing resources, the evaluation cost limits how much of the search space can be explored. As a result, these systems may miss good pipelines. We propose ML4ML, an approach that aims to reduce the evaluation overhead for AutoML systems. ML4ML leverages the provenance of prior pipeline runs to predict performance without having to re-train and test the pipelines. We present results of an experimental evaluation which demonstrates that not only can ML4ML build a reliable predictive model with low mean absolute error, but the integration of this model with AutoML systems leads to substantial speedups, enabling the systems to explore a larger number of pipelines and primitive combinations and derive pipelines at a much lower cost.
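The idea of predicting performance from prior pipeline runs, rather than training and testing every candidate, can be illustrated with a toy sketch. This is not the ML4ML model, whose features and learner are not described in the abstract. It assumes a hypothetical provenance log of (primitives, score) records with made-up values, and it uses a deliberately simple surrogate (the mean historical score of each primitive). The names `PRIOR_RUNS`, `fit_surrogate`, `predict`, and `select_for_evaluation` are all illustrative.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical provenance records: (primitives in the pipeline, observed score).
# All values are invented for illustration only.
PRIOR_RUNS = [
    (("imputer", "scaler", "random_forest"), 0.90),
    (("imputer", "random_forest"), 0.86),
    (("imputer", "scaler", "logistic_regression"), 0.80),
    (("scaler", "logistic_regression"), 0.78),
    (("imputer", "knn"), 0.70),
]


def fit_surrogate(runs):
    """Average the observed score per primitive across prior runs."""
    scores = defaultdict(list)
    for primitives, score in runs:
        for primitive in primitives:
            scores[primitive].append(score)
    per_primitive = {name: mean(vals) for name, vals in scores.items()}
    global_mean = mean(score for _, score in runs)
    return per_primitive, global_mean


def predict(surrogate, pipeline):
    """Predict a score without training: mean of per-primitive averages.

    Primitives never seen in prior runs fall back to the global mean.
    """
    per_primitive, global_mean = surrogate
    return mean(per_primitive.get(p, global_mean) for p in pipeline)


def select_for_evaluation(surrogate, candidates, budget):
    """Keep only the `budget` candidates with the highest predicted score."""
    ranked = sorted(candidates, key=lambda p: predict(surrogate, p), reverse=True)
    return ranked[:budget]


surrogate = fit_surrogate(PRIOR_RUNS)
candidates = [
    ("scaler", "random_forest"),
    ("scaler", "knn"),
    ("imputer", "logistic_regression"),
]
# Only the selected candidates would go on to the expensive train-and-test step.
shortlist = select_for_evaluation(surrogate, candidates, budget=1)
```

In this sketch the search spends its evaluation budget only on the shortlisted candidates, which is the kind of saving the abstract attributes to performance prediction. A real system would need a far richer model and dataset-aware features.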

Original language: English (US)
Title of host publication: Proceedings of the 7th Workshop on Data Management for End-To-End Machine Learning, DEEM 2023
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9798400702044
State: Published - Jun 18 2023
Event: 7th Workshop on Data Management for End-To-End Machine Learning, DEEM 2023 - Seattle, United States
Duration: Jun 18 2023 → …

Publication series

Name: Proceedings of the 7th Workshop on Data Management for End-To-End Machine Learning, DEEM 2023

Conference

Conference: 7th Workshop on Data Management for End-To-End Machine Learning, DEEM 2023
Country/Territory: United States
City: Seattle
Period: 6/18/23 → …

ASJC Scopus subject areas

  • Hardware and Architecture
  • Human-Computer Interaction
  • Sociology and Political Science
