TY - JOUR
T1 - Fast Direct Methods for Gaussian Processes
AU - Ambikasaran, Sivaram
AU - Foreman-Mackey, Daniel
AU - Greengard, Leslie
AU - Hogg, David W.
AU - O'Neil, Michael
N1 - Funding Information:
The authors would like to thank Iain Murray for several useful and detailed discussions. Research by S. Ambikasaran and M. O'Neil was supported in part by the Air Force Office of Scientific Research under NSSEFF Program Award FA9550-10-1-0180 and AIG-NYU Award #A15-0098-001. Research by L. Greengard was supported in part by the Simons Foundation and the Air Force Office of Scientific Research under NSSEFF Program Award FA9550-10-1-0180. Research by D. Foreman-Mackey and D. W. Hogg was supported in part by NASA grant NNX12AI50G, US National Science Foundation (NSF) grant IIS-1124794, the Gordon and Betty Moore Foundation, and the Alfred P. Sloan Foundation.
Publisher Copyright:
© 2016 IEEE.
PY - 2016/2/1
Y1 - 2016/2/1
N2 - A number of problems in probability and statistics can be addressed using the multivariate normal (Gaussian) distribution. In the one-dimensional case, computing the probability for a given mean and variance simply requires the evaluation of the corresponding Gaussian density. In the n-dimensional setting, however, it requires the inversion of an n × n covariance matrix, C, as well as the evaluation of its determinant, det(C). In many cases, such as regression using Gaussian processes, the covariance matrix is of the form C = σ²I + K, where K is computed using a specified covariance kernel which depends on the data and additional parameters (hyperparameters). The matrix C is typically dense, causing standard direct methods for inversion and determinant evaluation to require O(n³) work. This cost is prohibitive for large-scale modeling. Here, we show that for the most commonly used covariance functions, the matrix C can be hierarchically factored into a product of block low-rank updates of the identity matrix, yielding an O(n log² n) algorithm for inversion. More importantly, we show that this factorization enables the evaluation of the determinant, det(C), permitting the direct calculation of probabilities in high dimensions under fairly broad assumptions on the kernel defining K. Our fast algorithm brings many problems in marginalization and the adaptation of hyperparameters within practical reach using a single CPU core. The combination of nearly optimal scaling in terms of problem size with high-performance computing resources will permit the modeling of previously intractable problems. We illustrate the performance of the scheme on standard covariance kernels.
AB - A number of problems in probability and statistics can be addressed using the multivariate normal (Gaussian) distribution. In the one-dimensional case, computing the probability for a given mean and variance simply requires the evaluation of the corresponding Gaussian density. In the n-dimensional setting, however, it requires the inversion of an n × n covariance matrix, C, as well as the evaluation of its determinant, det(C). In many cases, such as regression using Gaussian processes, the covariance matrix is of the form C = σ²I + K, where K is computed using a specified covariance kernel which depends on the data and additional parameters (hyperparameters). The matrix C is typically dense, causing standard direct methods for inversion and determinant evaluation to require O(n³) work. This cost is prohibitive for large-scale modeling. Here, we show that for the most commonly used covariance functions, the matrix C can be hierarchically factored into a product of block low-rank updates of the identity matrix, yielding an O(n log² n) algorithm for inversion. More importantly, we show that this factorization enables the evaluation of the determinant, det(C), permitting the direct calculation of probabilities in high dimensions under fairly broad assumptions on the kernel defining K. Our fast algorithm brings many problems in marginalization and the adaptation of hyperparameters within practical reach using a single CPU core. The combination of nearly optimal scaling in terms of problem size with high-performance computing resources will permit the modeling of previously intractable problems. We illustrate the performance of the scheme on standard covariance kernels.
KW - Bayesian analysis
KW - Covariance matrix
KW - Determinant
KW - Direct solver
KW - Fast multipole method
KW - Hierarchical off-diagonal low-rank
KW - Likelihood
UR - http://www.scopus.com/inward/record.url?scp=84962053105&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84962053105&partnerID=8YFLogxK
U2 - 10.1109/TPAMI.2015.2448083
DO - 10.1109/TPAMI.2015.2448083
M3 - Article
AN - SCOPUS:84962053105
VL - 38
SP - 252
EP - 265
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
SN - 0162-8828
IS - 2
M1 - 7130620
ER -
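
For context, the computation the abstract centers on is the multivariate normal log-likelihood log N(y | 0, C) with C = σ²I + K: evaluating it requires a solve against C and its log-determinant. The sketch below is illustrative only and is not the paper's method: it uses the standard dense O(n³) Cholesky route that the paper's O(n log² n) hierarchical factorization replaces, and the squared-exponential kernel and the hyperparameter names are assumptions made for the example.

    # Illustrative sketch: Gaussian process marginal log-likelihood via a
    # dense Cholesky factorization. This is the O(n^3) baseline the cited
    # paper accelerates; the kernel choice here is an assumption.
    import numpy as np

    def log_likelihood(x, y, sigma, length_scale):
        """Return log N(y | 0, C) with C = sigma^2 I + K, where K is a
        squared-exponential kernel matrix built from the inputs x."""
        n = len(x)
        # Dense kernel matrix K_ij = exp(-(x_i - x_j)^2 / (2 l^2)).
        d = x[:, None] - x[None, :]
        K = np.exp(-0.5 * (d / length_scale) ** 2)
        C = sigma ** 2 * np.eye(n) + K
        # Cholesky C = L L^T gives both the solve C^{-1} y and
        # log det(C) = 2 * sum(log L_ii) in one factorization.
        L = np.linalg.cholesky(C)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
        logdet = 2.0 * np.sum(np.log(np.diag(L)))
        return -0.5 * (y @ alpha + logdet + n * np.log(2.0 * np.pi))

    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(0.0, 10.0, 200))
    y = np.sin(x) + 0.1 * rng.standard_normal(200)
    print(log_likelihood(x, y, sigma=0.1, length_scale=1.0))

Because both the solve and the determinant come from the same factorization, replacing the Cholesky step with the paper's hierarchical block low-rank factorization of C accelerates exactly the two quantities the likelihood needs.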