Detecting depression with audio/text sequence modeling of interviews

Tuka Alhanai, Mohammad Ghassemi, James Glass

Research output: Contribution to journal › Conference article › peer-review


Medical professionals diagnose depression by interpreting the responses of individuals to a variety of questions, probing lifestyle changes and ongoing thoughts. Like professionals, an effective automated agent must understand that responses to queries have varying prognostic value. In this study we demonstrate an automated depression-detection algorithm that models interviews between an individual and an agent, learning from sequences of questions and answers without the need to perform explicit topic modeling of the content. We used data from 142 individuals undergoing depression screening, and modeled the interactions with audio and text features in a long short-term memory (LSTM) neural network model to detect depression. Our results were comparable to those of methods that explicitly modeled the topics of the questions and answers, which suggests that depression can be detected through sequential modeling of an interaction, with minimal information about the structure of the interview.
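The abstract describes feeding per-turn audio and text features of an interview into an LSTM whose final state drives a binary depression prediction. A minimal sketch of that architecture is below, in PyTorch; the feature dimensions, layer sizes, and fusion-by-concatenation are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class InterviewLSTM(nn.Module):
    """Sketch of a sequence model over interview turns.

    Assumption: each question/answer turn is represented by a fused
    audio+text feature vector (audio and text features concatenated).
    The LSTM reads the turn sequence; its final hidden state feeds a
    binary classifier (depressed / not depressed).
    """

    def __init__(self, audio_dim=40, text_dim=100, hidden_dim=64):
        super().__init__()
        # Hypothetical dims: 40-d audio features, 100-d text features per turn
        self.lstm = nn.LSTM(audio_dim + text_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, 1)

    def forward(self, turns):
        # turns: (batch, num_turns, audio_dim + text_dim)
        _, (h_n, _) = self.lstm(turns)
        # h_n[-1]: final hidden state of the last LSTM layer, (batch, hidden_dim)
        return torch.sigmoid(self.classifier(h_n[-1]))  # (batch, 1) probability

# Toy usage: a batch of 4 interviews, 10 turns each, 140-d fused features
model = InterviewLSTM()
probs = model(torch.randn(4, 10, 140))
print(probs.shape)
```

Because the model consumes the raw sequence of turn-level feature vectors, it needs no explicit topic labels for the questions, which mirrors the paper's claim that sequential modeling alone can capture the varying prognostic value of responses.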

Original language: English (US)
Pages (from-to): 1716-1720
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
State: Published - 2018
Event: 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018 - Hyderabad, India
Duration: Sep 2 2018 - Sep 6 2018


Keywords

  • Computational paralinguistics
  • Depression
  • Medical speech signal processing
  • Neural networks
  • Question answering

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

