TY - JOUR
T1 - Kernel alignment risk estimator: Risk prediction from training data
T2 - 34th Conference on Neural Information Processing Systems, NeurIPS 2020
AU - Jacot, Arthur
AU - Simsek, Berfin
AU - Spadaro, Francesco
AU - Hongler, Clément
AU - Gabriel, Franck
N1 - Funding Information:
The authors wish to thank A. Montanari and M. Wyart for useful discussions. This work is partly supported by the ERC SG CONSTAMIS. C. Hongler acknowledges support from the Blavatnik Family Foundation, the Latsis Foundation, and the NCCR SwissMAP.
Publisher Copyright:
© 2020 Neural information processing systems foundation. All rights reserved.
PY - 2020
Y1 - 2020
N2 - We study the risk (i.e. generalization error) of Kernel Ridge Regression (KRR) for a kernel K with ridge λ > 0 and i.i.d. observations. For this, we introduce two objects: the Signal Capture Threshold (SCT) and the Kernel Alignment Risk Estimator (KARE). The SCT ϑK,λ is a function of the data distribution: it can be used to identify the components of the data that the KRR predictor captures, and to approximate the (expected) KRR risk. This then leads to a KRR risk approximation by the KARE ρK,λ, an explicit function of the training data, agnostic of the true data distribution. We phrase the regression problem in a functional setting. The key results then follow from a finite-size analysis of the Stieltjes transform of general Wishart random matrices. Under a natural universality assumption (that the KRR moments depend asymptotically on the first two moments of the observations), we capture the mean and variance of the KRR predictor. We numerically investigate our findings on the Higgs and MNIST datasets for various classical kernels: the KARE gives an excellent approximation of the risk, thus supporting our universality assumption. Using the KARE, one can compare choices of kernels and hyperparameters directly from the training set. The KARE thus provides a promising data-dependent procedure to select kernels that generalize well.
AB - We study the risk (i.e. generalization error) of Kernel Ridge Regression (KRR) for a kernel K with ridge λ > 0 and i.i.d. observations. For this, we introduce two objects: the Signal Capture Threshold (SCT) and the Kernel Alignment Risk Estimator (KARE). The SCT ϑK,λ is a function of the data distribution: it can be used to identify the components of the data that the KRR predictor captures, and to approximate the (expected) KRR risk. This then leads to a KRR risk approximation by the KARE ρK,λ, an explicit function of the training data, agnostic of the true data distribution. We phrase the regression problem in a functional setting. The key results then follow from a finite-size analysis of the Stieltjes transform of general Wishart random matrices. Under a natural universality assumption (that the KRR moments depend asymptotically on the first two moments of the observations), we capture the mean and variance of the KRR predictor. We numerically investigate our findings on the Higgs and MNIST datasets for various classical kernels: the KARE gives an excellent approximation of the risk, thus supporting our universality assumption. Using the KARE, one can compare choices of kernels and hyperparameters directly from the training set. The KARE thus provides a promising data-dependent procedure to select kernels that generalize well.
UR - http://www.scopus.com/inward/record.url?scp=85108391232&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85108391232&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85108391232
SN - 1049-5258
VL - 2020-December
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
Y2 - 6 December 2020 through 12 December 2020
ER -