Abstract
We propose a new 'fast subset scan' approach for accurate and computationally efficient event detection in massive data sets. We treat event detection as a search over subsets of data records, finding the subset that maximizes a given score function. We prove that many commonly used score functions (e.g., Kulldorff's spatial scan statistic and its extensions) satisfy the 'linear time subset scanning' property, enabling exact and efficient optimization over subsets. In the spatial setting, we demonstrate that proximity-constrained subset scans substantially improve the timeliness and accuracy of event detection, detecting emerging disease outbreaks 2 days faster than existing methods.
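The idea behind the 'linear time subset scanning' property can be sketched in a few lines of Python. The abstract does not specify a score function or a priority ordering, so the following is a minimal illustration under stated assumptions: it uses the expectation-based Poisson log-likelihood ratio as the score and the ratio of observed count to expected baseline as the record priority. The function names (`ebp_score`, `fast_subset_scan`, `brute_force_scan`) and the toy data are hypothetical and are not taken from the paper. Under these assumptions, only the N "top-k by priority" subsets need to be scored rather than all 2^N subsets, and a brute-force search is included solely as a check on small inputs.

```python
import math
from itertools import combinations


def ebp_score(c, b):
    """Expectation-based Poisson log-likelihood ratio for a subset with
    total observed count c and total expected baseline b (assumed score)."""
    if b <= 0 or c <= b:
        return 0.0
    return c * math.log(c / b) + b - c


def fast_subset_scan(counts, baselines):
    """Score only the subsets made of the k highest-priority records.

    Priority is assumed to be count / baseline. After one sort, the loop
    evaluates N nested subsets instead of all 2^N possible subsets.
    """
    order = sorted(range(len(counts)),
                   key=lambda i: counts[i] / baselines[i],
                   reverse=True)
    best_score, best_k = 0.0, 0
    c_sum = b_sum = 0.0
    for k, i in enumerate(order, start=1):
        c_sum += counts[i]
        b_sum += baselines[i]
        score = ebp_score(c_sum, b_sum)
        if score > best_score:
            best_score, best_k = score, k
    return best_score, sorted(order[:best_k])


def brute_force_scan(counts, baselines):
    """Exhaustive search over every non-empty subset (small inputs only)."""
    n = len(counts)
    best_score, best_subset = 0.0, []
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            c = sum(counts[i] for i in subset)
            b = sum(baselines[i] for i in subset)
            score = ebp_score(c, b)
            if score > best_score:
                best_score, best_subset = score, list(subset)
    return best_score, best_subset


# Toy data (hypothetical): observed counts and expected baselines per record.
counts = [12, 3, 9, 4, 20]
baselines = [5, 4, 8, 5, 6]

fast_score, fast_subset = fast_subset_scan(counts, baselines)
exact_score, exact_subset = brute_force_scan(counts, baselines)
```

On this toy input both searches select the same subset of records. The sort makes the overall cost O(N log N); the number of subsets actually scored is linear in N.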
Original language | English (US) |
---|---|
Pages (from-to) | 337-360 |
Number of pages | 24 |
Journal | Journal of the Royal Statistical Society. Series B: Statistical Methodology |
Volume | 74 |
Issue number | 2 |
State | Published - Mar 2012 |
Keywords
- Algorithms
- Disease surveillance
- Event detection
- Scan statistics
- Spatial scan
ASJC Scopus subject areas
- Statistics and Probability
- Statistics, Probability and Uncertainty