Using a data-driven approach to define post-COVID conditions in US electronic health record data

Kathleen M. Andersen, Farid L. Khan, Peter W. Park, Timothy L. Wiemken, Birol Emir, Deepa Malhotra, Tuka Alhanai, Mohammad M. Ghassemi, Leah J. McGrath

Research output: Contribution to journal › Article › peer-review


Objective: To create a data-driven definition of post-COVID conditions (PCC) by directly measuring changes in symptomatology before and after a first COVID episode.

Materials and methods: Retrospective cohort study of persons of any age in the Optum® de-identified Electronic Health Record (EHR) dataset from the United States, April 2020 to September 2021. For each person with COVID (ICD-10-CM U07.1 “COVID-19” or a positive test result), we selected up to 3 comparators. The final COVID symptom score was computed as the sum of new diagnoses, each weighted by that diagnosis’ ratio of incidence in the COVID group relative to the comparator group. For the subset of COVID cases diagnosed in September 2021, we compared the incidence of PCC using our data-driven definition with ICD-10-CM code U09.9 “Post-COVID Conditions”, first available in the US in October 2021.

Results: The final cohort contained 588,611 people with COVID, with a mean age of 48 years; 38% were male. By our definition, 20% of persons developed PCC during follow-up. PCC incidence increased with age (7.8% of persons aged 0–17, 17.3% aged 18–64, and 33.3% aged 65+) and did not change over time (20.0% among persons diagnosed with COVID in 2020 versus 20.3% in 2021). For cases diagnosed in September 2021, our definition identified 19.0% with PCC in follow-up, compared with 2.9% with a U09.9 code in follow-up.

Conclusion: Symptom-based and U09.9 code-based definitions alone captured different populations. Efforts to maximize capture may consider a combined approach, particularly for periods before specific ICD-10 codes were available and routinely used, and given the lack of consensus-based definitions of the syndrome.
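The symptom score described in the methods can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`incidence_ratios`, `symptom_score`), the toy diagnosis labels, and the choice to skip diagnoses never seen in the comparator group are all assumptions made for illustration. The abstract does not state the score threshold used to classify PCC, so the sketch stops at the score itself.

```python
from collections import Counter


def incidence_ratios(covid_new_dx, comparator_new_dx):
    """Per-diagnosis ratio of incidence in the COVID group vs. comparators.

    Each argument is a list holding one set of new diagnoses per person.
    """
    n_covid = len(covid_new_dx)
    n_comp = len(comparator_new_dx)
    covid_counts = Counter(dx for person in covid_new_dx for dx in person)
    comp_counts = Counter(dx for person in comparator_new_dx for dx in person)

    ratios = {}
    for dx, count in covid_counts.items():
        if comp_counts[dx] == 0:
            # Assumption of this sketch: the ratio is undefined when no
            # comparator has the diagnosis, so the diagnosis is skipped.
            continue
        ratios[dx] = (count / n_covid) / (comp_counts[dx] / n_comp)
    return ratios


def symptom_score(person_new_dx, ratios):
    """Sum of a person's new diagnoses, each weighted by its incidence ratio."""
    return sum(ratios.get(dx, 0.0) for dx in person_new_dx)


# Toy data (hypothetical): 4 people with COVID, 8 comparators.
covid = [{"fatigue", "dyspnea"}, {"fatigue"}, set(), {"cough"}]
comparators = [{"fatigue"}, set(), {"cough"}, {"cough"},
               set(), set(), set(), set()]

weights = incidence_ratios(covid, comparators)
score = symptom_score({"fatigue", "cough"}, weights)
print(weights, score)
```

In this toy data, fatigue is four times as frequent among COVID cases as among comparators, so it contributes four times as much to the score as cough, which is equally frequent in both groups.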

Original language: English (US)
Article number: e0300570
Journal: PLoS ONE
Issue number: 4 April
State: Published - Apr 2024

ASJC Scopus subject areas

  • General

