TY - JOUR
T1 - Learning scalable deep kernels with recurrent structure
AU - Al-Shedivat, Maruan
AU - Wilson, Andrew Gordon
AU - Saatchi, Yunus
AU - Hu, Zhiting
AU - Xing, Eric P.
N1 - Funding Information:
The authors thank Yifei Ma for helpful discussions and the anonymous reviewers for the valuable comments that helped to improve the paper. This work was supported in part by NIH R01GM114311, AFRL/DARPA FA87501220324, and NSF IIS-1563887.
Publisher Copyright:
© 2017 Maruan Al-Shedivat, Andrew Gordon Wilson, Yunus Saatchi, Zhiting Hu and Eric P. Xing.
PY - 2017/8/1
Y1 - 2017/8/1
AB - Many applications in speech, robotics, finance, and biology deal with sequential data, where ordering matters and recurrent structures are common. However, this structure cannot be easily captured by standard kernel functions. To model such structure, we propose expressive closed-form kernel functions for Gaussian processes. The resulting model, GP-LSTM, fully encapsulates the inductive biases of long short-term memory (LSTM) recurrent networks, while retaining the non-parametric probabilistic advantages of Gaussian processes. We learn the properties of the proposed kernels by optimizing the Gaussian process marginal likelihood using a new provably convergent semi-stochastic gradient procedure, and exploit the structure of these kernels for scalable training and prediction. This approach provides a practical representation for Bayesian LSTMs. We demonstrate state-of-the-art performance on several benchmarks, and thoroughly investigate a consequential autonomous driving application, where the predictive uncertainties provided by GP-LSTM are uniquely valuable.
UR - http://www.scopus.com/inward/record.url?scp=85030250501&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85030250501&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85030250501
SN - 1532-4435
VL - 18
SP - 1
EP - 37
JO - Journal of Machine Learning Research
JF - Journal of Machine Learning Research
ER -