Abstract
Discovery of localized and irregularly shaped anomalous patterns in spatial data provides useful context for operational decisions across many policy domains. The support vector subset scan (SVSS) integrates the penalized fast subset scan with a kernel support vector machine classifier to accurately detect spatial clusters without imposing hard constraints on the shape or size of the pattern. The method iterates between (1) efficiently maximizing a penalized log-likelihood ratio over subsets of locations to obtain an anomalous pattern, and (2) learning a high-dimensional decision boundary between locations included in and excluded from the anomalous subset. On each iteration, location-specific penalties to the log-likelihood ratio are assigned according to distance to the decision boundary, encouraging patterns that are spatially compact but potentially highly irregular in shape. In simulated experiments, SVSS outperforms competing spatial cluster detection methods at detecting randomly generated patterns. Experiments using publicly available data sets show that SVSS enables discovery of practically useful anomalous patterns for disease surveillance in Chicago, IL, crime hotspot detection in Portland, OR, and pothole cluster detection in Pittsburgh, PA.
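Step (1) of the iteration can be illustrated with a minimal sketch. It assumes an expectation-based Poisson log-likelihood ratio, which the abstract does not specify, and it searches a small grid of relative risks rather than optimizing exactly. Under that assumption the score for a fixed relative risk `q` is a sum of per-location terms, so an additive location-specific penalty keeps the subset search cheap: include every location whose penalized term is positive. The function name `penalized_scan`, the grid `q_grid`, and all numbers are hypothetical. In SVSS the penalties would be derived from distance to the SVM decision boundary; here they are set by hand.

```python
import math


def penalized_scan(counts, baselines, penalties, q_grid):
    """Return (score, subset, q) maximizing the penalized score over the grid.

    Assumption: expectation-based Poisson statistic, where location i
    contributes c_i * log(q) + b_i * (1 - q) for relative risk q, plus an
    additive penalty (negative values discourage inclusion).
    """
    best = (0.0, [], 1.0)  # the empty subset scores zero
    for q in q_grid:
        subset, score = [], 0.0
        for i, (c, b, d) in enumerate(zip(counts, baselines, penalties)):
            contribution = c * math.log(q) + b * (1.0 - q) + d
            # The score is additive for fixed q, so keep only positive terms.
            if contribution > 0:
                subset.append(i)
                score += contribution
        if score > best[0]:
            best = (score, subset, q)
    return best


# Hypothetical data: locations 0, 1, and 4 have elevated counts, but
# location 4 carries a hand-picked penalty, as if it lay far on the
# "excluded" side of a decision boundary.
counts = [30, 28, 10, 11, 25]
baselines = [10.0, 10.0, 10.0, 10.0, 10.0]
penalties = [0.0, 0.0, 0.0, 0.0, -20.0]
q_grid = [1.5, 2.0, 2.5, 3.0]

score, subset, q = penalized_scan(counts, baselines, penalties, q_grid)
print(subset)  # [0, 1]
```

With all penalties set to zero, the same call returns the subset `[0, 1, 4]`, which shows how a location-specific penalty can remove an otherwise high-scoring location from the detected pattern.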
Original language | English (US) |
---|---|
Article number | 107149 |
Journal | Computational Statistics and Data Analysis |
Volume | 157 |
DOIs | |
State | Published - May 2021 |
Keywords
- Anomalous pattern detection
- Machine learning
- Spatial analysis
- Subset scanning
ASJC Scopus subject areas
- Statistics and Probability
- Computational Mathematics
- Computational Theory and Mathematics
- Applied Mathematics