TY - GEN

T1 - Rapid detection of significant spatial clusters

AU - Neill, Daniel B.

AU - Moore, Andrew W.

PY - 2004

Y1 - 2004

N2 - Given an N × N grid of squares, where each square has a count C ij and an underlying population pij, our goal is to find the rectangular region with the highest density, and to calculate its significance by randomization. An arbitrary density function D, dependent on a region's total count C and total population P, can be used. For example, if each count represents the number of disease cases occurring in that square, we can use Kulldorff's spatial scan statistic DK to find the most significant spatial disease cluster. A naive approach to finding the maximum density region requires O(N4) time, and is generally computationally infeasible. We present a multiresolution algorithm which partitions the grid into overlapping regions using a novel overlap-kd tree data structure, bounds the maximum score of subrogions contained in each region, and prunes regions which cannot contain the maximum density region. For sufficiently dense regions, this method finds the maximum density region in O((N log N)2) time, in practice resulting in significant (20-2000x) speedups on both real and simulated datasets.

AB - Given an N × N grid of squares, where each square has a count C ij and an underlying population pij, our goal is to find the rectangular region with the highest density, and to calculate its significance by randomization. An arbitrary density function D, dependent on a region's total count C and total population P, can be used. For example, if each count represents the number of disease cases occurring in that square, we can use Kulldorff's spatial scan statistic DK to find the most significant spatial disease cluster. A naive approach to finding the maximum density region requires O(N4) time, and is generally computationally infeasible. We present a multiresolution algorithm which partitions the grid into overlapping regions using a novel overlap-kd tree data structure, bounds the maximum score of subrogions contained in each region, and prunes regions which cannot contain the maximum density region. For sufficiently dense regions, this method finds the maximum density region in O((N log N)2) time, in practice resulting in significant (20-2000x) speedups on both real and simulated datasets.

KW - Biosurveillance

KW - Cluster detection

KW - Spatial scan statistics

UR - http://www.scopus.com/inward/record.url?scp=12244282691&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=12244282691&partnerID=8YFLogxK

U2 - 10.1145/1014052.1014082

DO - 10.1145/1014052.1014082

M3 - Conference contribution

AN - SCOPUS:12244282691

SN - 1581138881

SN - 9781581138887

T3 - KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

SP - 256

EP - 265

BT - KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

PB - Association for Computing Machinery

T2 - KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Y2 - 22 August 2004 through 25 August 2004

ER -