Quantifying depression-related language on social media during the COVID-19 pandemic

Brent D. Davis, Dawn Estes McKnight, Daniela Teodorescu, Anabel Quan-Haase, Rumi Chunara, Alona Fyshe, Daniel J. Lizotte

Research output: Contribution to journal › Article › peer-review


Introduction
The COVID-19 pandemic had clear impacts on mental health. Social media presents an opportunity for assessing mental health at the population level.

Objectives
1) Identify and describe language used on social media that is associated with discourse about depression.
2) Describe the associations between identified language and COVID-19 incidence over time across several geographies.

Methods
We create a word embedding based on the posts in Reddit's /r/Depression and use this word embedding to train representations of active authors. We contrast these authors against a control group and extract keywords that capture differences between the two groups. We filter these keywords for face validity and to match character limits of an information retrieval system, Elasticsearch. We retrieve all geo-tagged posts on Twitter from April 2019 to June 2021 from Seattle, Sydney, Mumbai, and Toronto. The tweets are scored with BM25 using the keywords. We call this score rDD. We compare changes in average score over time with case counts from the pandemic's beginning through June 2021.

Results
We observe a pattern in rDD across all cities analyzed: there is an increase in rDD near the start of the pandemic which levels off over time. However, in Mumbai we also see an increase aligned with a second wave of cases.

Conclusions
Our results are concordant with other studies which indicate that the impact of the pandemic on mental health was highest initially and was followed by recovery, largely unchanged by subsequent waves. However, in the Mumbai data we observed a substantial rise in rDD with a large second wave. Our results indicate possible un-captured heterogeneity across geographies, and point to a need for a better understanding of this differential impact on mental health.
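The scoring step in the Methods (BM25 over a keyword list, then an average score per time period) can be illustrated with a minimal sketch. This is not the authors' pipeline: the study used Elasticsearch, whereas the code below is a plain-Python version of the standard Okapi BM25 formula. The whitespace tokenizer, the parameter values k1 = 1.2 and b = 0.75, the example tweets, the two keywords, and the function names `bm25_scores` and `monthly_mean` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real analyzer does more.
    return text.lower().split()


def bm25_scores(docs, keywords, k1=1.2, b=0.75):
    """Score each document against a keyword list with Okapi BM25.

    k1 and b are assumed values, not parameters reported by the study.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avgdl = sum(len(t) for t in tokenized) / n_docs
    if avgdl == 0:
        return [0.0] * n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in keywords:
            f = tf[term]
            if f == 0:
                continue
            # Non-negative IDF variant: ln(1 + (N - n + 0.5) / (n + 0.5)).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = f + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * f * (k1 + 1) / norm
        scores.append(score)
    return scores


def monthly_mean(months, scores):
    """Average the per-document scores within each month label."""
    totals = defaultdict(lambda: [0.0, 0])
    for month, score in zip(months, scores):
        totals[month][0] += score
        totals[month][1] += 1
    return {m: total / count for m, (total, count) in totals.items()}


# Hypothetical data, for illustration only.
tweets = [
    "i feel hopeless and tired",
    "great coffee this morning",
    "so tired of everything",
]
months = ["2020-03", "2020-03", "2020-04"]
keywords = ["hopeless", "tired"]  # hypothetical, not the study's keyword list

scores = bm25_scores(tweets, keywords)
averages = monthly_mean(months, scores)
```

In this sketch a tweet containing none of the keywords scores zero, and a tweet matching a rarer keyword scores higher than one matching only a common keyword. The per-month averages are the kind of series that could then be compared against case counts.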

Original language: English (US)
Article number: 14
Journal: International Journal of Population Data Science
Issue number: 4
State: Published - 2020


Keywords

  • COVID-19
  • Twitter
  • depression
  • information retrieval
  • machine learning
  • public health surveillance
  • social media

ASJC Scopus subject areas

  • Demography
  • Information Systems
  • Health Informatics
  • Information Systems and Management

