TY - JOUR
T1 - Support vector subset scan for spatial pattern detection
AU - Fitzpatrick, Dylan
AU - Ni, Yun
AU - Neill, Daniel B.
N1 - Funding Information:
This work was partially funded by NSF, USA grant IIS-0953330. A preliminary version was presented at the International Society for Disease Surveillance Annual Conference with a one-page abstract published in the Online Journal of Public Health Informatics (Fitzpatrick et al., 2017).
Publisher Copyright:
© 2020 The Author(s)
PY - 2021/5
Y1 - 2021/5
N2 - Discovery of localized and irregularly shaped anomalous patterns in spatial data provides useful context for operational decisions across many policy domains. The support vector subset scan (SVSS) integrates the penalized fast subset scan with a kernel support vector machine classifier to accurately detect spatial clusters without imposing hard constraints on the shape or size of the pattern. The method iterates between (1) efficiently maximizing a penalized log-likelihood ratio over subsets of locations to obtain an anomalous pattern, and (2) learning a high-dimensional decision boundary between locations included in and excluded from the anomalous subset. On each iteration, location-specific penalties to the log-likelihood ratio are assigned according to distance to the decision boundary, encouraging patterns which are spatially compact but potentially highly irregular in shape. SVSS outperforms competing methods for spatial cluster detection at the task of detecting randomly generated patterns in simulated experiments. SVSS enables discovery of practically-useful anomalous patterns for disease surveillance in Chicago, IL, crime hotspot detection in Portland, OR, and pothole cluster detection in Pittsburgh, PA, as demonstrated by experiments using publicly available data sets from these domains.
AB - Discovery of localized and irregularly shaped anomalous patterns in spatial data provides useful context for operational decisions across many policy domains. The support vector subset scan (SVSS) integrates the penalized fast subset scan with a kernel support vector machine classifier to accurately detect spatial clusters without imposing hard constraints on the shape or size of the pattern. The method iterates between (1) efficiently maximizing a penalized log-likelihood ratio over subsets of locations to obtain an anomalous pattern, and (2) learning a high-dimensional decision boundary between locations included in and excluded from the anomalous subset. On each iteration, location-specific penalties to the log-likelihood ratio are assigned according to distance to the decision boundary, encouraging patterns which are spatially compact but potentially highly irregular in shape. SVSS outperforms competing methods for spatial cluster detection at the task of detecting randomly generated patterns in simulated experiments. SVSS enables discovery of practically-useful anomalous patterns for disease surveillance in Chicago, IL, crime hotspot detection in Portland, OR, and pothole cluster detection in Pittsburgh, PA, as demonstrated by experiments using publicly available data sets from these domains.
KW - Anomalous pattern detection
KW - Machine learning
KW - Spatial analysis
KW - Subset scanning
UR - http://www.scopus.com/inward/record.url?scp=85098702822&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85098702822&partnerID=8YFLogxK
U2 - 10.1016/j.csda.2020.107149
DO - 10.1016/j.csda.2020.107149
M3 - Article
AN - SCOPUS:85098702822
SN - 0167-9473
VL - 157
JO - Computational Statistics and Data Analysis
JF - Computational Statistics and Data Analysis
M1 - 107149
ER -