TY - JOUR

T1 - Conditional expectation estimation through attributable components

AU - Tabak, Esteban G.

AU - Trigila, Giulio

N1 - Funding Information:
This work was partially supported by grants from the Office of Naval Research and from the NYU-AIG Partnership on Global Resilience.
Publisher Copyright:
© The Author(s) 2018.

PY - 2018/12/11

Y1 - 2018/12/11

N2 - A general methodology is proposed for the explanation of variability in a quantity of interest x in terms of covariates z = (z1, ..., zL). It provides the conditional mean x(z) as a sum of components, where each component is represented as a product of non-parametric one-dimensional functions of each covariate zl that are computed through an alternating projection procedure. Both x and the zl can be real or categorical variables; in addition, some or all values of each zl can be unknown, providing a general framework for multi-clustering, classification and covariate imputation in the presence of confounding factors. The procedure can be considered as a preconditioning step for the more general determination of the full conditional distribution ρ(x|z) through a data-driven optimal-transport barycenter problem. In particular, just iterating the procedure once yields the second order structure (i.e. the covariance) of ρ(x|z). The methodology is illustrated through examples that include the explanation of variability of ground temperature across the continental United States and the prediction of book preference among potential readers.

AB - A general methodology is proposed for the explanation of variability in a quantity of interest x in terms of covariates z = (z1, ..., zL). It provides the conditional mean x(z) as a sum of components, where each component is represented as a product of non-parametric one-dimensional functions of each covariate zl that are computed through an alternating projection procedure. Both x and the zl can be real or categorical variables; in addition, some or all values of each zl can be unknown, providing a general framework for multi-clustering, classification and covariate imputation in the presence of confounding factors. The procedure can be considered as a preconditioning step for the more general determination of the full conditional distribution ρ(x|z) through a data-driven optimal-transport barycenter problem. In particular, just iterating the procedure once yields the second order structure (i.e. the covariance) of ρ(x|z). The methodology is illustrated through examples that include the explanation of variability of ground temperature across the continental United States and the prediction of book preference among potential readers.

KW - Conditional density estimation

KW - Optimal transport

KW - Principal component analysis

UR - http://www.scopus.com/inward/record.url?scp=85071490499&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85071490499&partnerID=8YFLogxK

U2 - 10.1093/imaiai/iax023

DO - 10.1093/imaiai/iax023

M3 - Article

AN - SCOPUS:85071490499

VL - 7

SP - 727

EP - 754

JO - Information and Inference

JF - Information and Inference

SN - 2049-8772

IS - 4

ER -