Learning Invariant Representations with Missing Data

Mark Goldstein, Aahlad Puli, Rajesh Ranganath, Jörn Henrik Jacobsen, Olina Chau, Adriel Saporta, Andrew C. Miller

Research output: Contribution to journal › Conference article › peer-review


Spurious correlations allow flexible models to predict well during training yet fail on related test populations. Recent work has shown that models satisfying particular independencies involving correlation-inducing nuisance variables carry guarantees on their test performance. Enforcing such independencies requires the nuisances to be observed during training. However, nuisances, such as demographics or image background labels, are often missing, and enforcing independence on just the observed data does not imply independence over the entire population. Here we derive MMD estimators used for invariance objectives under missing nuisances. On simulations and clinical data, optimizing through these estimates achieves test performance similar to using estimators that make use of the full data.
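For context, the invariance objectives referenced in the abstract are typically built on the maximum mean discrepancy (MMD) between representation distributions across nuisance groups. The sketch below shows a minimal NumPy estimate of squared MMD with an RBF kernel; the function names, the kernel bandwidth, and the biased V-statistic form are illustrative assumptions, not the paper's exact estimator (which handles missing nuisance labels).

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    # Gaussian RBF kernel matrix between rows of x and rows of y.
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2_biased(x, y, sigma=1.0):
    # Biased (V-statistic) estimate of squared MMD between samples x and y.
    # Invariance penalties compare representations grouped by nuisance value;
    # a small value suggests the two groups are similarly distributed.
    kxx = rbf_kernel(x, x, sigma).mean()
    kyy = rbf_kernel(y, y, sigma).mean()
    kxy = rbf_kernel(x, y, sigma).mean()
    return kxx + kyy - 2 * kxy
```

In an invariance objective, a term like `mmd2_biased(rep[nuisance == 0], rep[nuisance == 1])` would be added to the training loss; the paper's contribution is estimating such terms when the nuisance labels are partially missing.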

Original language: English (US)
Pages (from-to): 290-301
Number of pages: 12
Journal: Proceedings of Machine Learning Research
State: Published - 2022
Event: 1st Conference on Causal Learning and Reasoning, CLeaR 2022 - Eureka, United States
Duration: Apr 11, 2022 - Apr 13, 2022


Keywords

  • doubly robust estimator
  • invariant representations
  • missing data
  • MMD

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

