TY - JOUR
T1 - Conditional expectation estimation through attributable components
AU - Tabak, Esteban G.
AU - Trigila, Giulio
N1 - Funding Information:
This work was partially supported by grants from the Office of Naval Research and from the NYU-AIG Partnership on Global Resilience.
Publisher Copyright:
© The Author(s) 2018.
PY - 2018/12/11
Y1 - 2018/12/11
N2 - A general methodology is proposed for the explanation of variability in a quantity of interest x in terms of covariates z = (z1, ..., zL). It provides the conditional mean x(z) as a sum of components, where each component is represented as a product of non-parametric one-dimensional functions of each covariate zl that are computed through an alternating projection procedure. Both x and the zl can be real or categorical variables; in addition, some or all values of each zl can be unknown, providing a general framework for multi-clustering, classification and covariate imputation in the presence of confounding factors. The procedure can be considered as a preconditioning step for the more general determination of the full conditional distribution ρ(x|z) through a data-driven optimal-transport barycenter problem. In particular, just iterating the procedure once yields the second order structure (i.e. the covariance) of ρ(x|z). The methodology is illustrated through examples that include the explanation of variability of ground temperature across the continental United States and the prediction of book preference among potential readers.
AB - A general methodology is proposed for the explanation of variability in a quantity of interest x in terms of covariates z = (z1, ..., zL). It provides the conditional mean x(z) as a sum of components, where each component is represented as a product of non-parametric one-dimensional functions of each covariate zl that are computed through an alternating projection procedure. Both x and the zl can be real or categorical variables; in addition, some or all values of each zl can be unknown, providing a general framework for multi-clustering, classification and covariate imputation in the presence of confounding factors. The procedure can be considered as a preconditioning step for the more general determination of the full conditional distribution ρ(x|z) through a data-driven optimal-transport barycenter problem. In particular, just iterating the procedure once yields the second order structure (i.e. the covariance) of ρ(x|z). The methodology is illustrated through examples that include the explanation of variability of ground temperature across the continental United States and the prediction of book preference among potential readers.
KW - Conditional density estimation
KW - Optimal transport
KW - Principal component analysis
UR - http://www.scopus.com/inward/record.url?scp=85071490499&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85071490499&partnerID=8YFLogxK
U2 - 10.1093/imaiai/iax023
DO - 10.1093/imaiai/iax023
M3 - Article
AN - SCOPUS:85071490499
VL - 7
SP - 727
EP - 754
JO - Information and Inference
JF - Information and Inference
SN - 2049-8772
IS - 4
ER -