Non-parametric scan statistics for event detection and forecasting in heterogeneous social media graphs

Feng Chen, Daniel B. Neill

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Event detection in social media is an important but challenging problem. Most existing approaches are based on burst detection, topic modeling, or clustering techniques, which cannot naturally model the implicit heterogeneous network structure in social media. As a result, only limited information, such as terms and geographic locations, can be used. This paper presents Non-Parametric Heterogeneous Graph Scan (NPHGS), a new approach that considers the entire heterogeneous network for event detection: we first model the network as a "sensor" network, in which each node senses its "neighborhood environment" and reports an empirical p-value measuring its current level of anomalousness for each time interval (e.g., hour or day). Then, we efficiently maximize a nonparametric scan statistic over connected subgraphs to identify the most anomalous network clusters. Finally, the event represented by each cluster is summarized with information such as type of event, geographical locations, time, and participants. As a case study, we consider two applications using Twitter data, civil unrest event detection and rare disease outbreak detection, and present empirical evaluations illustrating the effectiveness and efficiency of our proposed approach.
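The first two steps described in the abstract (per-node empirical p-values, then a nonparametric scan statistic maximized over connected subgraphs) can be illustrated with a small Python sketch. This is not the authors' implementation. It assumes a simple one-stage empirical p-value, uses the Berk-Jones statistic as one common nonparametric scan statistic, and replaces the paper's efficient maximization with brute-force enumeration that is only feasible on toy graphs. All function names, the `alpha_max` value, and the example graph are hypothetical.

```python
import math
from itertools import combinations


def empirical_p_value(current, history):
    """Share of historical values at least as large as the current one.

    The +1 terms keep the p-value strictly positive (assumed smoothing).
    """
    exceed = sum(1 for h in history if h >= current)
    return (exceed + 1) / (len(history) + 1)


def kl_bernoulli(a, b):
    """KL divergence between Bernoulli(a) and Bernoulli(b); 0*log(0) is 0."""
    total = 0.0
    if a > 0:
        total += a * math.log(a / b)
    if a < 1:
        total += (1 - a) * math.log((1 - a) / (1 - b))
    return total


def berk_jones(p_values, alpha_max=0.15):
    """Berk-Jones scan statistic, maximized over thresholds alpha <= alpha_max.

    Only the observed p-values need to be tried as thresholds.
    """
    n = len(p_values)
    best = 0.0
    for alpha in sorted({p for p in p_values if 0 < p <= alpha_max}):
        frac = sum(1 for p in p_values if p <= alpha) / n
        if frac > alpha:  # more significant p-values than expected by chance
            best = max(best, n * kl_bernoulli(frac, alpha))
    return best


def is_connected(nodes, adjacency):
    """Depth-first search restricted to the candidate node set."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adjacency[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def best_connected_subset(p_by_node, adjacency, alpha_max=0.15):
    """Brute-force search over connected subsets (exponential; toy graphs only)."""
    best_score, best_subset = 0.0, frozenset()
    nodes = sorted(p_by_node)
    for k in range(1, len(nodes) + 1):
        for subset in combinations(nodes, k):
            if is_connected(subset, adjacency):
                score = berk_jones([p_by_node[v] for v in subset], alpha_max)
                if score > best_score:
                    best_score, best_subset = score, frozenset(subset)
    return best_subset, best_score


# Hypothetical path graph a - b - c - d with made-up p-values.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
p_by_node = {"a": 0.01, "b": 0.02, "c": 0.9, "d": 0.5}
subset, score = best_connected_subset(p_by_node, adjacency)
print(sorted(subset), round(score, 3))
```

In this toy example the two adjacent low-p-value nodes form the highest-scoring cluster. The heterogeneous node types, the summarization step, and the efficient search described in the abstract are outside the scope of the sketch.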

Original language: English (US)
Title of host publication: KDD 2014 - Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 1166-1175
Number of pages: 10
ISBN (Print): 9781450329569
State: Published - 2014
Event: 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014 - New York, NY, United States
Duration: Aug 24 2014 → Aug 27 2014

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Other

Other: 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014
Country: United States
City: New York, NY
Period: 8/24/14 → 8/27/14

Keywords

  • event detection and forecasting
  • heterogeneous graphs
  • non-parametric scan statistics
  • social media

ASJC Scopus subject areas

  • Software
  • Information Systems
