TY - JOUR
T1 - Bayesian and likelihood inference for 2 × 2 ecological tables
T2 - An incomplete-data approach
AU - Imai, Kosuke
AU - Lu, Ying
AU - Strauss, Aaron
N1 - Funding Information:
National Science Foundation (SES–0550873); Princeton University Committee on Research in the Humanities and Social Sciences.
PY - 2008/12
Y1 - 2008/12
AB - Ecological inference is a statistical problem where aggregate-level data are used to make inferences about individual-level behavior. In this article, we conduct a theoretical and empirical study of Bayesian and likelihood inference for 2 × 2 ecological tables by applying the general statistical framework of incomplete data. We first show that the ecological inference problem can be decomposed into three factors: distributional effects, which address the possible misspecification of parametric modeling assumptions about the unknown distribution of missing data; contextual effects, which represent the possible correlation between missing data and observed variables; and aggregation effects, which are directly related to the loss of information caused by data aggregation. We then examine how these three factors affect inference and offer new statistical methods to address each of them. To deal with distributional effects, we propose a nonparametric Bayesian model based on a Dirichlet process prior, which relaxes common parametric assumptions. We also identify the statistical adjustments necessary to account for contextual effects. Finally, although little can be done to cope with aggregation effects, we offer a method to quantify the magnitude of such effects in order to formally assess their severity. We use simulated and real data sets to empirically investigate the consequences of these three factors and to evaluate the performance of our proposed methods. C code, along with an easy-to-use R interface, is publicly available for implementing our proposed methods (Imai, Lu, and Strauss, forthcoming).
UR - http://www.scopus.com/inward/record.url?scp=42549128258&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=42549128258&partnerID=8YFLogxK
U2 - 10.1093/pan/mpm017
DO - 10.1093/pan/mpm017
M3 - Article
AN - SCOPUS:42549128258
SN - 1047-1987
VL - 16
SP - 41
EP - 69
JO - Political Analysis
JF - Political Analysis
IS - 1
ER -