TY - JOUR
T1 - Weighting the United States All of Us Research Program data to known population estimates using raking
AU - Wang, Vivian Hsing Chun
AU - Lei, Jingwen
AU - Shi, Tingjia
AU - Pagán, José A.
N1 - Publisher Copyright: © 2024 The Author(s)
PY - 2024/7
Y1 - 2024/7
N2 - Background: The All of Us Research Program aims to collect longitudinal health-related data from a million individuals in the United States. An inherent challenge of a non-probability sampling strategy through voluntary participation in All of Us is that findings may not be nationally representative for addressing health and health care at the population level. We generated survey weights for the All of Us data that can be used to address the challenge. Research design: We developed raked weights using demographic, health, and socioeconomic variables available in both the 2020 National Health Interview Survey (NHIS) and All of Us. We then compared the unweighted and weighted prevalence of a set of health-related variables (health behaviors, health conditions, and health insurance coverage) estimated from All of Us data with the weighted prevalence estimates obtained from NHIS data. Subjects: The sample included 100,391 All of Us participants 18 years of age and older with complete data collected between May 2017 and January 2022 across the United States. Results: Final variables in the raking procedure included age, sex, race/ethnicity, region of residence, annual household income, and home ownership. The mean percentage difference between known proportions obtained from the NHIS and All of Us was reduced by 18.89% for health-related variables after applying the raked weights. Conclusions: Raking improved the comparability of prevalence estimates obtained from All of Us to known national prevalence estimates. Refining the process of variable selection for raking may further improve the comparability between All of Us and nationally representative data.
AB - Background: The All of Us Research Program aims to collect longitudinal health-related data from a million individuals in the United States. An inherent challenge of a non-probability sampling strategy through voluntary participation in All of Us is that findings may not be nationally representative for addressing health and health care at the population level. We generated survey weights for the All of Us data that can be used to address the challenge. Research design: We developed raked weights using demographic, health, and socioeconomic variables available in both the 2020 National Health Interview Survey (NHIS) and All of Us. We then compared the unweighted and weighted prevalence of a set of health-related variables (health behaviors, health conditions, and health insurance coverage) estimated from All of Us data with the weighted prevalence estimates obtained from NHIS data. Subjects: The sample included 100,391 All of Us participants 18 years of age and older with complete data collected between May 2017 and January 2022 across the United States. Results: Final variables in the raking procedure included age, sex, race/ethnicity, region of residence, annual household income, and home ownership. The mean percentage difference between known proportions obtained from the NHIS and All of Us was reduced by 18.89% for health-related variables after applying the raked weights. Conclusions: Raking improved the comparability of prevalence estimates obtained from All of Us to known national prevalence estimates. Refining the process of variable selection for raking may further improve the comparability between All of Us and nationally representative data.
KW - All of Us
KW - Health equity
KW - Health services research
KW - Population health
KW - Raking
UR - http://www.scopus.com/inward/record.url?scp=85195553730&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85195553730&partnerID=8YFLogxK
U2 - 10.1016/j.pmedr.2024.102795
DO - 10.1016/j.pmedr.2024.102795
M3 - Article
AN - SCOPUS:85195553730
SN - 2211-3355
VL - 43
JO - Preventive Medicine Reports
JF - Preventive Medicine Reports
M1 - 102795
ER -