Anomaly pattern detection in categorical datasets

Kaustav Das, Jeff Schneider, Daniel B. Neill

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

We propose a new method for detecting patterns of anomalies in categorical datasets. We assume that anomalies are generated by some underlying process which affects only a particular subset of the data. Our method consists of two steps: we first use a "local anomaly detector" to identify individual records with anomalous attribute values, and then detect patterns where the number of anomalous records is higher than expected. Given the set of anomalies flagged by the local anomaly detector, we search over all subsets of the data defined by any set of fixed values of a subset of the attributes, in order to detect self-similar patterns of anomalies. We wish to detect any such subset of the test data which displays a significant increase in anomalous activity as compared to the normal behavior of the system (as indicated by the training data). We perform significance testing to determine if the number of anomalies in any subset of the test data is significantly higher than expected, and propose an efficient algorithm to perform this test over all such subsets of the data. We show that this algorithm is able to accurately detect anomalous patterns in real-world hospital, container shipping and network intrusion data.

Original languageEnglish (US)
Title of host publicationKDD 2008 - Proceedings of the 14th ACMKDD International Conference on Knowledge Discovery and Data Mining
Pages169-176
Number of pages8
DOIs
StatePublished - 2008
Event14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008 - Las Vegas, NV, United States
Duration: Aug 24 2008Aug 27 2008

Publication series

NameProceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Other

Other14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008
Country/TerritoryUnited States
CityLas Vegas, NV
Period8/24/088/27/08

Keywords

  • Anomaly Detection
  • Machine Learning
  • Pattern Detection

ASJC Scopus subject areas

  • Software
  • Information Systems

Fingerprint

Dive into the research topics of 'Anomaly pattern detection in categorical datasets'. Together they form a unique fingerprint.

Cite this