Abstract
A general methodology is proposed for the explanation of variability in a quantity of interest x in terms of covariates z = (z1, ..., zL). It provides the conditional mean x(z) as a sum of components, where each component is represented as a product of non-parametric one-dimensional functions of each covariate zl that are computed through an alternating projection procedure. Both x and the zl can be real or categorical variables; in addition, some or all values of each zl can be unknown, providing a general framework for multi-clustering, classification and covariate imputation in the presence of confounding factors. The procedure can be considered as a preconditioning step for the more general determination of the full conditional distribution ρ(x|z) through a data-driven optimal-transport barycenter problem. In particular, just iterating the procedure once yields the second order structure (i.e., the covariance) of ρ(x|z). The methodology is illustrated through examples that include the explanation of variability of ground temperature across the continental United States and the prediction of book preference among potential readers.
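As a rough illustration of the "sum of products of one-dimensional functions" structure described above, the following Python sketch fits such a model in a deliberately simplified setting. It assumes exactly two categorical covariates with all values known, so each one-dimensional function reduces to a vector with one value per category, and it uses plain alternating least squares with greedy fitting of successive components to the residual. The names `fit_rank_one` and `fit_components` are hypothetical, and this is not the paper's procedure, which also covers real-valued covariates and unknown covariate values.

```python
import numpy as np


def fit_rank_one(x, z1, z2, n1, n2, iters=50):
    """Fit x_i ~ f[z1_i] * g[z2_i] by alternating least squares.

    z1, z2 are integer category labels in [0, n1) and [0, n2).
    """
    f = np.ones(n1)
    g = np.ones(n2)
    eps = 1e-12  # guards against division by zero for empty categories
    for _ in range(iters):
        # With g fixed, the least-squares optimum for each f[a] is closed form.
        num = np.bincount(z1, weights=x * g[z2], minlength=n1)
        den = np.bincount(z1, weights=g[z2] ** 2, minlength=n1)
        f = num / np.maximum(den, eps)
        # With f fixed, update each g[b] in the same way.
        num = np.bincount(z2, weights=x * f[z1], minlength=n2)
        den = np.bincount(z2, weights=f[z1] ** 2, minlength=n2)
        g = num / np.maximum(den, eps)
    return f, g


def fit_components(x, z1, z2, n1, n2, n_components=2):
    """Greedily fit a sum of rank-one components to successive residuals."""
    residual = np.asarray(x, dtype=float).copy()
    components = []
    for _ in range(n_components):
        f, g = fit_rank_one(residual, z1, z2, n1, n2)
        components.append((f, g))
        residual = residual - f[z1] * g[z2]
    return components


def predict(components, z1, z2):
    """Evaluate the fitted conditional-mean model at the given labels."""
    return sum(f[z1] * g[z2] for f, g in components)


# Small synthetic usage: every (z1, z2) pair observed once.
z1, z2 = [a.ravel() for a in np.meshgrid(np.arange(3), np.arange(4), indexing="ij")]
f_true = np.array([1.0, 2.0, 3.0])
g_true = np.array([1.0, 1.0, 2.0, 1.0])
x = f_true[z1] * g_true[z2]
components = fit_components(x, z1, z2, n1=3, n2=4, n_components=1)
x_hat = predict(components, z1, z2)
```

Only the products `f[z1] * g[z2]` are identifiable in this sketch; the individual factors are determined up to a rescaling between `f` and `g`.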
| Field | Value |
|---|---|
| Original language | English (US) |
| Pages (from-to) | 727-754 |
| Number of pages | 28 |
| Journal | Information and Inference |
| Volume | 7 |
| Issue number | 4 |
| State | Published - Dec 11 2018 |
Keywords
- Conditional density estimation
- Optimal transport
- Principal component analysis
ASJC Scopus subject areas
- Computational Theory and Mathematics
- Analysis
- Applied Mathematics
- Statistics and Probability
- Numerical Analysis