TY - JOUR
T1 - Detecting incidental correlation in multimodal learning via latent variable modeling
AU - Makino, Taro
AU - Wang, Yixin
AU - Geras, Krzysztof J.
AU - Cho, Kyunghyun
N1 - Publisher Copyright:
© 2023, Transactions on Machine Learning Research. All rights reserved.
PY - 2023
Y1 - 2023
N2 - Multimodal neural networks often fail to utilize all modalities. They subsequently gener-alize worse than their unimodal counterparts, or make predictions that only depend on a subset of modalities. We refer to this problem as modality underutilization. Existing work has addressed this issue by ensuring that there are no systematic biases in dataset creation, or that our neural network architectures and optimization algorithms are capable of learning modality interactions. We demonstrate that even when these favorable conditions are met, modality underutilization can still occur in the small data regime. To explain this phenomenon, we put forth a concept that we call incidental correlation. It is a spurious correlation that emerges in small datasets, despite not being a part of the underlying data generating process (DGP). We develop our argument using a DGP under which multimodal neural networks must utilize all modalities, since all paths between the inputs and target are causal. This represents an idealized scenario that often fails to materialize. Instead, due to incidental correlation, small datasets sampled from this DGP have higher likelihood under an alternative DGP with spurious paths between the inputs and target. Multimodal neural networks that use these spurious paths for prediction fail to utilize all modalities. Given its harmful effects, we propose to detect incidental correlation via latent variable modeling. We specify an identifiable variational autoencoder such that the latent posterior encodes the spurious correlations between the inputs and target. This allows us to interpret the Kullback-Leibler divergence between the latent posterior and prior as the severity of incidental correlation. We use an ablation study to show that identifiability is important in this context, since we derive our conclusions from the latent posterior. Using experiments with synthetic data, as well as with VQA v2.0 and NLVR2, we demonstrate that incidental correlation emerges in the small data regime, and leads to modality underutilization. Prac-titioners of multimodal learning can use our method to detect whether incidental correlation is present in their datasets, and determine whether they should collect additional data.
AB - Multimodal neural networks often fail to utilize all modalities. They subsequently gener-alize worse than their unimodal counterparts, or make predictions that only depend on a subset of modalities. We refer to this problem as modality underutilization. Existing work has addressed this issue by ensuring that there are no systematic biases in dataset creation, or that our neural network architectures and optimization algorithms are capable of learning modality interactions. We demonstrate that even when these favorable conditions are met, modality underutilization can still occur in the small data regime. To explain this phenomenon, we put forth a concept that we call incidental correlation. It is a spurious correlation that emerges in small datasets, despite not being a part of the underlying data generating process (DGP). We develop our argument using a DGP under which multimodal neural networks must utilize all modalities, since all paths between the inputs and target are causal. This represents an idealized scenario that often fails to materialize. Instead, due to incidental correlation, small datasets sampled from this DGP have higher likelihood under an alternative DGP with spurious paths between the inputs and target. Multimodal neural networks that use these spurious paths for prediction fail to utilize all modalities. Given its harmful effects, we propose to detect incidental correlation via latent variable modeling. We specify an identifiable variational autoencoder such that the latent posterior encodes the spurious correlations between the inputs and target. This allows us to interpret the Kullback-Leibler divergence between the latent posterior and prior as the severity of incidental correlation. We use an ablation study to show that identifiability is important in this context, since we derive our conclusions from the latent posterior. Using experiments with synthetic data, as well as with VQA v2.0 and NLVR2, we demonstrate that incidental correlation emerges in the small data regime, and leads to modality underutilization. Prac-titioners of multimodal learning can use our method to detect whether incidental correlation is present in their datasets, and determine whether they should collect additional data.
UR - http://www.scopus.com/inward/record.url?scp=86000073925&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=86000073925&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:86000073925
SN - 2835-8856
VL - 2023
JO - Transactions on Machine Learning Research
JF - Transactions on Machine Learning Research
ER -