TY - JOUR
T1 - Cardiovascular disease risk prediction for people with type 2 diabetes in a population-based cohort and in electronic health record data
AU - Szymonifka, Jackie
AU - Conderino, Sarah
AU - Cigolle, Christine
AU - Ha, Jinkyung
AU - Kabeto, Mohammed
AU - Yu, Jaehong
AU - Dodson, John A.
AU - Thorpe, Lorna
AU - Blaum, Caroline
AU - Zhong, Judy
N1 - Publisher Copyright: © 2020 The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association.
PY - 2020/12/1
Y1 - 2020/12/1
N2 - Objective: Electronic health records (EHRs) have become a common data source for clinical risk prediction, offering large sample sizes and frequently sampled metrics. There may be notable differences between hospital-based EHR and traditional cohort samples: EHR data are often not population-representative random samples, even for particular diseases, because EHR patients tend to be sicker and to have higher healthcare utilization, whereas cohort studies often sample healthier subjects, who are typically more likely to participate. We investigate heterogeneities between EHR- and cohort-based inferences, including incidence rates, risk factor identification and quantification, and absolute risks. Materials and methods: This is a retrospective cohort study of older patients with type 2 diabetes using EHR data from New York University Langone Health ambulatory care (NYULH-EHR, years 2009-2017) and data from the Health and Retirement Survey (HRS, 1995-2014) to study subsequent cardiovascular disease (CVD) risks. We used the same eligibility criteria, outcome definitions, and demographic covariates/biomarkers in both datasets. We compared subsequent CVD incidence rates, hazard ratios (HRs) of risk factors, and the discrimination and calibration performance of CVD risk scores. Results: The estimated subsequent total CVD incidence rates were 37.5 and 90.6 per 1000 person-years since type 2 diabetes mellitus (T2DM) onset in HRS and NYULH-EHR, respectively. HR estimates were comparable between the datasets for most demographic covariates/biomarkers. Common CVD risk scores underestimated observed total CVD risks in NYULH-EHR. Discussion and conclusion: EHR-estimated HRs of demographic and major clinical risk factors for CVD were mostly consistent with the estimates from a national cohort, despite the high incidence and absolute risks of total CVD outcomes in the EHR sample.
AB - Objective: Electronic health records (EHRs) have become a common data source for clinical risk prediction, offering large sample sizes and frequently sampled metrics. There may be notable differences between hospital-based EHR and traditional cohort samples: EHR data are often not population-representative random samples, even for particular diseases, because EHR patients tend to be sicker and to have higher healthcare utilization, whereas cohort studies often sample healthier subjects, who are typically more likely to participate. We investigate heterogeneities between EHR- and cohort-based inferences, including incidence rates, risk factor identification and quantification, and absolute risks. Materials and methods: This is a retrospective cohort study of older patients with type 2 diabetes using EHR data from New York University Langone Health ambulatory care (NYULH-EHR, years 2009-2017) and data from the Health and Retirement Survey (HRS, 1995-2014) to study subsequent cardiovascular disease (CVD) risks. We used the same eligibility criteria, outcome definitions, and demographic covariates/biomarkers in both datasets. We compared subsequent CVD incidence rates, hazard ratios (HRs) of risk factors, and the discrimination and calibration performance of CVD risk scores. Results: The estimated subsequent total CVD incidence rates were 37.5 and 90.6 per 1000 person-years since type 2 diabetes mellitus (T2DM) onset in HRS and NYULH-EHR, respectively. HR estimates were comparable between the datasets for most demographic covariates/biomarkers. Common CVD risk scores underestimated observed total CVD risks in NYULH-EHR. Discussion and conclusion: EHR-estimated HRs of demographic and major clinical risk factors for CVD were mostly consistent with the estimates from a national cohort, despite the high incidence and absolute risks of total CVD outcomes in the EHR sample.
KW - cardiovascular disease
KW - cohort analysis
KW - electronic health records
KW - risk factors
KW - type 2 diabetes mellitus
UR - http://www.scopus.com/inward/record.url?scp=85127249547&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85127249547&partnerID=8YFLogxK
U2 - 10.1093/jamiaopen/ooaa059
DO - 10.1093/jamiaopen/ooaa059
M3 - Article
AN - SCOPUS:85127249547
SN - 2574-2531
VL - 3
SP - 583
EP - 592
JO - JAMIA Open
JF - JAMIA Open
IS - 4
ER -