TY - GEN
T1 - Extracting signals from news streams for disease outbreak prediction
AU - Chakraborty, Sunandan
AU - Subramanian, Lakshminarayanan
PY - 2017/4/19
Y1 - 2017/4/19
N2 - The emergence of digital news provides new opportunities in information extraction. Proper characterization of unstructured news can help identify signals that may drive variations in many observable phenomena, such as disease outbreaks. In this paper, we propose a method to extract such signals from a large corpus of news events and identify a subset of signals that are closely related to the observed phenomenon. We show how words appearing in a large news corpus can be represented and latent features can be extracted to build predictive models. We build and evaluate such a system specifically for characterizing and predicting disease outbreaks in India. We focused on 5 different diseases prevalent in India, and experiments showed that our model can predict disease outbreaks 2 to 4 weeks in advance, with an average precision of around 0.80 and recall of around 0.65. We also compared our model with an LDA-based baseline model, where our model demonstrated around 5-14% improvement across different diseases.
AB - The emergence of digital news provides new opportunities in information extraction. Proper characterization of unstructured news can help identify signals that may drive variations in many observable phenomena, such as disease outbreaks. In this paper, we propose a method to extract such signals from a large corpus of news events and identify a subset of signals that are closely related to the observed phenomenon. We show how words appearing in a large news corpus can be represented and latent features can be extracted to build predictive models. We build and evaluate such a system specifically for characterizing and predicting disease outbreaks in India. We focused on 5 different diseases prevalent in India, and experiments showed that our model can predict disease outbreaks 2 to 4 weeks in advance, with an average precision of around 0.80 and recall of around 0.65. We also compared our model with an LDA-based baseline model, where our model demonstrated around 5-14% improvement across different diseases.
UR - http://www.scopus.com/inward/record.url?scp=85019170631&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85019170631&partnerID=8YFLogxK
U2 - 10.1109/GlobalSIP.2016.7906051
DO - 10.1109/GlobalSIP.2016.7906051
M3 - Conference contribution
AN - SCOPUS:85019170631
T3 - 2016 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2016 - Proceedings
SP - 1300
EP - 1304
BT - 2016 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2016 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2016
Y2 - 7 December 2016 through 9 December 2016
ER -