TY - JOUR
T1 - Stochastic variational deep kernel learning
AU - Wilson, Andrew Gordon
AU - Hu, Zhiting
AU - Salakhutdinov, Ruslan
AU - Xing, Eric P.
N1 - Funding Information:
We thank the NSF IIS-1563887, ONR N000141410684, N000141310721, N000141512791, and ADeLAIDE FA8750-16C-0130-001 grants.
Publisher Copyright:
© 2016 NIPS Foundation - All Rights Reserved.
PY - 2016
Y1 - 2016
N2 - Deep kernel learning combines the non-parametric flexibility of kernel methods with the inductive biases of deep learning architectures. We propose a novel deep kernel learning model and stochastic variational inference procedure which generalizes deep kernel learning approaches to enable classification, multi-task learning, additive covariance structures, and stochastic gradient training. Specifically, we apply additive base kernels to subsets of output features from deep neural architectures, and jointly learn the parameters of the base kernels and deep network through a Gaussian process marginal likelihood objective. Within this framework, we derive an efficient form of stochastic variational inference which leverages local kernel interpolation, inducing points, and structure-exploiting algebra. We show improved performance over stand-alone deep networks, SVMs, and state-of-the-art scalable Gaussian processes on several classification benchmarks, including an airline delay dataset containing 6 million training points, CIFAR, and ImageNet.
AB - Deep kernel learning combines the non-parametric flexibility of kernel methods with the inductive biases of deep learning architectures. We propose a novel deep kernel learning model and stochastic variational inference procedure which generalizes deep kernel learning approaches to enable classification, multi-task learning, additive covariance structures, and stochastic gradient training. Specifically, we apply additive base kernels to subsets of output features from deep neural architectures, and jointly learn the parameters of the base kernels and deep network through a Gaussian process marginal likelihood objective. Within this framework, we derive an efficient form of stochastic variational inference which leverages local kernel interpolation, inducing points, and structure-exploiting algebra. We show improved performance over stand-alone deep networks, SVMs, and state-of-the-art scalable Gaussian processes on several classification benchmarks, including an airline delay dataset containing 6 million training points, CIFAR, and ImageNet.
UR - http://www.scopus.com/inward/record.url?scp=85018930816&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85018930816&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85018930816
SN - 1049-5258
SP - 2594
EP - 2602
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 30th Annual Conference on Neural Information Processing Systems, NIPS 2016
Y2 - 5 December 2016 through 10 December 2016
ER -