TY - GEN
T1 - A generalized fast Subset Sums framework for Bayesian event detection
AU - Shao, Kan
AU - Liu, Yandong
AU - Neill, Daniel B.
PY - 2011
Y1 - 2011
N2 - We present Generalized Fast Subset Sums (GFSS), a new Bayesian framework for scalable and accurate detection of irregularly shaped spatial clusters using multiple data streams. GFSS extends the previously proposed Multivariate Bayesian Scan Statistic (MBSS) and Fast Subset Sums (FSS) approaches for detection of emerging events. The detection power of MBSS is primarily limited by computational considerations, which limit it to searching over circular spatial regions. GFSS enables more accurate and timely detection by defining a hierarchical prior over all subsets of the N locations, first selecting a local neighborhood consisting of a center location and its neighbors, and introducing a sparsity parameter P to describe how likely each location in the neighborhood is to be affected. This approach allows us to consider all possible subsets of locations (including irregularly shaped regions) but also puts higher weight on more compact regions. We demonstrate that MBSS and FSS are both special cases of this general framework (assuming P = 1 and P = 0.5 respectively), but substantially higher detection power can be achieved by choosing an appropriate value of P. Thus we show that the distribution of the sparsity parameter P can be accurately learned from a small number of labeled events. Our evaluation results (on synthetic disease outbreaks injected into real-world hospital data) show that the GFSS method with learned sparsity parameter has higher detection power and spatial accuracy than MBSS and FSS, particularly when the affected region is irregular or elongated. We also show that the learned models can be used for event characterization, accurately distinguishing between two otherwise identical event types based on the sparsity of the affected spatial region.
AB - We present Generalized Fast Subset Sums (GFSS), a new Bayesian framework for scalable and accurate detection of irregularly shaped spatial clusters using multiple data streams. GFSS extends the previously proposed Multivariate Bayesian Scan Statistic (MBSS) and Fast Subset Sums (FSS) approaches for detection of emerging events. The detection power of MBSS is primarily limited by computational considerations, which limit it to searching over circular spatial regions. GFSS enables more accurate and timely detection by defining a hierarchical prior over all subsets of the N locations, first selecting a local neighborhood consisting of a center location and its neighbors, and introducing a sparsity parameter P to describe how likely each location in the neighborhood is to be affected. This approach allows us to consider all possible subsets of locations (including irregularly shaped regions) but also puts higher weight on more compact regions. We demonstrate that MBSS and FSS are both special cases of this general framework (assuming P = 1 and P = 0.5 respectively), but substantially higher detection power can be achieved by choosing an appropriate value of P. Thus we show that the distribution of the sparsity parameter P can be accurately learned from a small number of labeled events. Our evaluation results (on synthetic disease outbreaks injected into real-world hospital data) show that the GFSS method with learned sparsity parameter has higher detection power and spatial accuracy than MBSS and FSS, particularly when the affected region is irregular or elongated. We also show that the learned models can be used for event characterization, accurately distinguishing between two otherwise identical event types based on the sparsity of the affected spatial region.
KW - Biosurveillance
KW - Event detection
KW - Scan statistics
UR - http://www.scopus.com/inward/record.url?scp=84863129015&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84863129015&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2011.11
DO - 10.1109/ICDM.2011.11
M3 - Conference contribution
AN - SCOPUS:84863129015
SN - 9780769544083
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 617
EP - 625
BT - Proceedings - 11th IEEE International Conference on Data Mining, ICDM 2011
T2 - 11th IEEE International Conference on Data Mining, ICDM 2011
Y2 - 11 December 2011 through 14 December 2011
ER -