TY - JOUR
T1 - Learning Invariant Representations with Missing Data
AU - Goldstein, Mark
AU - Puli, Aahlad
AU - Ranganath, Rajesh
AU - Jacobsen, Jörn-Henrik
AU - Chau, Olina
AU - Saporta, Adriel
AU - Miller, Andrew C.
N1 - Funding Information:
The authors thank Scotty Fleming, Joe Futoma, Leon Gatys, Sean Jewell, Taylor Killian, and Guillermo Sapiro for feedback and discussions. This work was supported in part by NIH/NHLBI Award R01HL148248, NSF Award 1922658 NRT-HDR: FUTURE Foundations, Translation, and Responsibility for Data Science, and NSF Award 1815633 SHF.
Publisher Copyright:
© 2022 M. Goldstein, A. Puli, R. Ranganath, J.-H. Jacobsen, O. Chau, A. Saporta & A.C. Miller.
PY - 2022
Y1 - 2022
N2 - Spurious correlations allow flexible models to predict well during training but poorly on related test populations. Recent work has shown that models that satisfy particular independencies involving correlation-inducing nuisance variables have guarantees on their test performance. Enforcing such independencies requires nuisances to be observed during training. However, nuisances, such as demographics or image background labels, are often missing. Enforcing independence on just the observed data does not imply independence on the entire population. Here we derive MMD estimators used for invariance objectives under missing nuisances. On simulations and clinical data, optimizing through these estimates achieves test performance similar to using estimators that make use of the full data.
AB - Spurious correlations allow flexible models to predict well during training but poorly on related test populations. Recent work has shown that models that satisfy particular independencies involving correlation-inducing nuisance variables have guarantees on their test performance. Enforcing such independencies requires nuisances to be observed during training. However, nuisances, such as demographics or image background labels, are often missing. Enforcing independence on just the observed data does not imply independence on the entire population. Here we derive MMD estimators used for invariance objectives under missing nuisances. On simulations and clinical data, optimizing through these estimates achieves test performance similar to using estimators that make use of the full data.
KW - doubly robust estimator
KW - invariant representations
KW - missing data
KW - MMD
UR - http://www.scopus.com/inward/record.url?scp=85142874244&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85142874244&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85142874244
SN - 2640-3498
VL - 177
SP - 290
EP - 301
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 1st Conference on Causal Learning and Reasoning, CLeaR 2022
Y2 - 11 April 2022 through 13 April 2022
ER -
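
Note (not part of the citation record): the abstract refers to MMD estimators used as invariance penalties when nuisance labels are partially missing. The sketch below is a minimal illustration only, not the estimator derived in the paper (whose keywords also mention a doubly robust variant). It shows a standard squared-MMD penalty between representation distributions of two nuisance groups, with a simple inverse-probability-weighting correction restricted to samples whose nuisance label is observed. All names here (weighted_mmd2, obs_prob, the 0/1 nuisance coding, the Gaussian bandwidth) are illustrative assumptions.

    import numpy as np

    def gaussian_kernel(a, b, sigma=1.0):
        # Pairwise Gaussian (RBF) kernel between rows of a and b.
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    def weighted_mmd2(reps, z, observed, obs_prob, sigma=1.0):
        # IPW-weighted (biased, V-statistic) estimate of squared MMD between
        # the representation distributions of nuisance groups z == 0 and
        # z == 1, using only samples whose nuisance label is observed.
        # reps: (n, d) representations; z: (n,) nuisance labels (ignored where
        # missing); observed: (n,) boolean mask; obs_prob: (n,) P(observed | x).
        m0 = observed & (z == 0)
        m1 = observed & (z == 1)
        w0 = np.where(m0, 1.0 / obs_prob, 0.0)
        w1 = np.where(m1, 1.0 / obs_prob, 0.0)

        K = gaussian_kernel(reps, reps, sigma)

        def wmean(wa, wb):
            # Weighted mean of kernel entries; assumes each group has at
            # least one observed sample (otherwise the penalty is undefined).
            return (wa[:, None] * K * wb[None, :]).sum() / (wa.sum() * wb.sum())

        return wmean(w0, w0) + wmean(w1, w1) - 2.0 * wmean(w0, w1)

    # Illustrative usage: penalize the training loss with the weighted MMD.
    rng = np.random.default_rng(0)
    reps = rng.normal(size=(200, 8))          # model representations r(x)
    z = rng.integers(0, 2, size=200)          # nuisance labels (0/1)
    observed = rng.random(200) < 0.7          # ~30% of nuisance labels missing
    obs_prob = np.full(200, 0.7)              # known or estimated P(observed | x)
    penalty = weighted_mmd2(reps, z, observed, obs_prob)
    # total_loss = prediction_loss + lam * penalty

Complete-case averaging alone (dropping unobserved-z samples without weights) enforces independence only on the observed subpopulation; the weighting above is one common correction, while the paper derives estimators tailored to this invariance setting.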