Fast Generalized Subset Scan for anomalous pattern detection

Edward McFowland, Skyler Speakman, Daniel B. Neill

Research output: Contribution to journal › Article › peer-review

Abstract

We propose Fast Generalized Subset Scan (FGSS), a new method for detecting anomalous patterns in general categorical data sets. We frame the pattern detection problem as a search over subsets of data records and attributes, maximizing a nonparametric scan statistic over all such subsets. We prove that the nonparametric scan statistics possess a novel property that allows for efficient optimization over the exponentially many subsets of the data without an exhaustive search, enabling FGSS to scale to massive and high-dimensional data sets. We evaluate the performance of FGSS in three real-world application domains (customs monitoring, disease surveillance, and network intrusion detection), and demonstrate that FGSS can successfully detect and characterize relevant patterns in each domain. As compared to three other recently proposed detection algorithms, FGSS substantially decreased run time and improved detection power for massive multivariate data sets.
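The idea of maximizing a scan statistic over subsets without enumerating them can be illustrated with a small sketch. Everything concrete here is an assumption for illustration and is not taken from the abstract: the toy matrix of per-record, per-attribute p-values (`PVALS`), the Higher-Criticism-style score (`hc_score`), the candidate thresholds (`ALPHAS`), and the function names. The sketch searches only over subsets of records, with the attribute set held fixed. Because every record has the same number of p-values, sorting records by their count of significant p-values and scoring only the top-k prefixes is exact for this score. A brute-force search is included only to check that claim on the toy data.

```python
import math
from itertools import combinations


def hc_score(n_alpha, n, alpha):
    """Illustrative Higher-Criticism-style score (an assumed choice of statistic):
    observed count of p-values <= alpha versus the n * alpha expected under the null."""
    return (n_alpha - n * alpha) / math.sqrt(n * alpha * (1.0 - alpha))


def fast_best_rows(pvals, alphas):
    """Sort-and-scan search: for each alpha, rank rows by their number of
    significant p-values and score only the top-k prefixes (k = 1..N)."""
    best = (-math.inf, None, ())
    for alpha in alphas:
        counts = [sum(p <= alpha for p in row) for row in pvals]
        order = sorted(range(len(pvals)), key=lambda i: -counts[i])
        n_alpha = n = 0
        for k, i in enumerate(order, start=1):
            n_alpha += counts[i]
            n += len(pvals[i])
            score = hc_score(n_alpha, n, alpha)
            if score > best[0]:
                best = (score, alpha, tuple(sorted(order[:k])))
    return best


def brute_force_best_score(pvals, alphas):
    """Exhaustive search over all non-empty row subsets, used only as a check."""
    best_score = -math.inf
    for alpha in alphas:
        for k in range(1, len(pvals) + 1):
            for rows in combinations(range(len(pvals)), k):
                n_alpha = sum(p <= alpha for i in rows for p in pvals[i])
                n = sum(len(pvals[i]) for i in rows)
                best_score = max(best_score, hc_score(n_alpha, n, alpha))
    return best_score


# Hypothetical toy data: 6 records x 4 attributes of p-values.
# Records 0 and 1 are constructed to look anomalous (many small p-values).
PVALS = [
    [0.01, 0.03, 0.02, 0.60],
    [0.04, 0.02, 0.70, 0.01],
    [0.50, 0.80, 0.30, 0.90],
    [0.20, 0.65, 0.04, 0.75],
    [0.95, 0.40, 0.55, 0.15],
    [0.35, 0.85, 0.45, 0.25],
]
ALPHAS = [0.01, 0.05, 0.10]

score, alpha, rows = fast_best_rows(PVALS, ALPHAS)
print(f"best score={score:.3f} at alpha={alpha} for records {rows}")
```

For N records, the sort-and-scan search scores N prefixes per threshold, whereas the exhaustive search scores all 2^N - 1 non-empty subsets. This toy version does not search over attribute subsets, which the method described in the abstract also does.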

Original language: English (US)
Pages (from-to): 1533-1561
Number of pages: 29
Journal: Journal of Machine Learning Research
Volume: 14
State: Published - Jun 2013

Keywords

  • Anomaly detection
  • Bayesian networks
  • Knowledge discovery
  • Pattern detection
  • Scan statistics

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence
