TY - JOUR
T1 - Local linear regression on manifolds and its geometric interpretation
AU - Cheng, Ming-Yen
AU - Wu, Hau-Tieng
N1 - Funding Information:
Ming-Yen Cheng is Professor, Department of Mathematics, National Taiwan University, Taipei 106, Taiwan (E-mail: [email protected]). Hau-Tieng Wu is Postdoctoral Research Associate, Department of Statistics, University of California at Berkeley, California, USA (E-mail: [email protected]). Cheng’s research was supported in part by the National Science Council grants NSC97-2118-M-002-001-MY3 and NSC101-2118-M-002-001-MY3, and the Mathematics Division, National Center of Theoretical Sciences (Taipei Office). Wu’s research was supported by AFOSR grant FA9550-09-1-0551, NSF grant CCF-0939370, and FRG grant DMS-1160319. The authors thank Professor Peter Bickel and Professor Toshio Honda for instructive comments, and the associate editor and one reviewer for their constructive comments.
PY - 2013
Y1 - 2013
N2 - High-dimensional data analysis has been an active area, and the main focus areas have been variable selection and dimension reduction. In practice, it often happens that the variables are located on an unknown, lower-dimensional nonlinear manifold. Under this manifold assumption, one purpose of this article is regression and gradient estimation on the manifold, and another is developing a new tool for manifold learning. As regards the first aim, we suggest directly reducing the dimensionality to the intrinsic dimension d of the manifold, and performing the popular local linear regression (LLR) on a tangent plane estimate. An immediate consequence is a dramatic reduction in the computational time when the ambient space dimension p ≫ d. We provide rigorous theoretical justification of the convergence of the proposed regression and gradient estimators by carefully analyzing the curvature, boundary, and nonuniform sampling effects. We propose a bandwidth selector that can handle heteroscedastic errors. With reference to the second aim, we carefully analyze the asymptotic behavior of our regression estimator both in the interior and near the boundary of the manifold, and make explicit its relationship with manifold learning, in particular estimating the Laplace-Beltrami operator of the manifold. In this context, we also make clear that it is important to use a smaller bandwidth in the tangent plane estimation than in the LLR. A simulation study and applications to the Isomap face data and a clinically computed tomography scan dataset are used to illustrate the computational speed and estimation accuracy of our methods. Supplementary materials for this article are available online.
KW - Diffusion map
KW - Dimension reduction
KW - High-dimensional data
KW - Manifold learning
KW - Nonparametric regression
UR - http://www.scopus.com/inward/record.url?scp=84901773883&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84901773883&partnerID=8YFLogxK
U2 - 10.1080/01621459.2013.827984
DO - 10.1080/01621459.2013.827984
M3 - Article
AN - SCOPUS:84901773883
SN - 0162-1459
VL - 108
SP - 1421
EP - 1434
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
IS - 504
ER -