TY - CONF
T1 - Practical multi-fidelity Bayesian optimization for hyperparameter tuning
AU - Wu, Jian
AU - Toscano-Palmerin, Saul
AU - Frazier, Peter I.
AU - Wilson, Andrew Gordon
N1 - Funding Information:
PIF was supported by NSF CAREER CMMI-1254298, NSF CMMI-1536895, and AFOSR FA9550-15-1-0038. AGW was supported by NSF IIS-1563887, an Amazon Research Award and a Facebook Research Award.
PY - 2019
Y1 - 2019
N2 - Bayesian optimization is popular for optimizing time-consuming black-box objectives. Nonetheless, for hyperparameter tuning in deep neural networks, the time required to evaluate the validation error for even a few hyperparameter settings remains a bottleneck. Multi-fidelity optimization promises relief using cheaper proxies to such objectives — for example, validation error for a network trained using a subset of the training points or fewer iterations than required for convergence. We propose a highly flexible and practical approach to multi-fidelity Bayesian optimization, focused on efficiently optimizing hyperparameters for iteratively trained supervised learning models. We introduce a new acquisition function, the trace-aware knowledge-gradient, which efficiently leverages both multiple continuous fidelity controls and trace observations — values of the objective at a sequence of fidelities, available when varying fidelity using training iterations. We provide a provably convergent method for optimizing our acquisition function and show it outperforms state-of-the-art alternatives for hyperparameter tuning of deep neural networks and large-scale kernel learning.
AB - Bayesian optimization is popular for optimizing time-consuming black-box objectives. Nonetheless, for hyperparameter tuning in deep neural networks, the time required to evaluate the validation error for even a few hyperparameter settings remains a bottleneck. Multi-fidelity optimization promises relief using cheaper proxies to such objectives — for example, validation error for a network trained using a subset of the training points or fewer iterations than required for convergence. We propose a highly flexible and practical approach to multi-fidelity Bayesian optimization, focused on efficiently optimizing hyperparameters for iteratively trained supervised learning models. We introduce a new acquisition function, the trace-aware knowledge-gradient, which efficiently leverages both multiple continuous fidelity controls and trace observations — values of the objective at a sequence of fidelities, available when varying fidelity using training iterations. We provide a provably convergent method for optimizing our acquisition function and show it outperforms state-of-the-art alternatives for hyperparameter tuning of deep neural networks and large-scale kernel learning.
UR - http://www.scopus.com/inward/record.url?scp=85084011934&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85084011934&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85084011934
T2 - 35th Conference on Uncertainty in Artificial Intelligence, UAI 2019
Y2 - 22 July 2019 through 25 July 2019
ER -