Principled Sampling for Anomaly Detection

Brendan Juba, Christopher Musco, Fan Long, Stelios Sidiroglou-Douskos, Martin Rinard

    Research output: Contribution to conference › Paper › peer-review


    Anomaly detection plays an important role in protecting computer systems from unforeseen attacks by automatically recognizing and filtering atypical inputs. However, it can be difficult to balance the sensitivity of a detector - an aggressive system can filter too many benign inputs, while a conservative system can fail to catch anomalies. Accordingly, it is important to rigorously test anomaly detectors to evaluate potential error rates before deployment. However, principled systems for doing so have not been studied - testing is typically ad hoc, making it difficult to reproduce results or formally compare detectors. To address this issue, we present a technique and implemented system, Fortuna, for obtaining probabilistic bounds on false positive rates for anomaly detectors that process Internet data. Using a probability distribution based on PageRank and an efficient algorithm to draw samples from the distribution, Fortuna computes an estimated false positive rate and a probabilistic bound on the estimate's accuracy. By drawing test samples from a well-defined distribution that correlates well with data seen in practice, Fortuna improves on ad hoc methods for estimating false positive rates, giving bounds that are reproducible, comparable across different anomaly detectors, and theoretically sound. Experimental evaluations of three anomaly detectors (SIFT, SOAP, and JSAND) show that Fortuna is efficient enough to use in practice - it can sample enough inputs to obtain tight false positive rate bounds in less than 10 hours for all three detectors. These results indicate that Fortuna can, in practice, help place anomaly detection on a stronger theoretical foundation and help practitioners better understand the behavior and consequences of the anomaly detectors that they deploy.
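    The general idea of estimating a false positive rate from sampled inputs and attaching a probabilistic accuracy bound can be illustrated with a standard Hoeffding confidence interval. This is only a sketch of the generic technique, not Fortuna's actual procedure (whose guarantee is tied to its PageRank-based sampling distribution); the function names and the `delta` parameter here are illustrative assumptions.

    ```python
    import math

    def fpr_estimate(is_false_positive, samples, delta=0.05):
        """Estimate a detector's false positive rate from i.i.d. benign
        samples, and return a Hoeffding half-width eps such that the true
        rate lies in [p_hat - eps, p_hat + eps] with probability >= 1 - delta.
        (Illustrative sketch; not Fortuna's actual bound.)"""
        n = len(samples)
        # Fraction of benign samples the detector wrongly flags as anomalous.
        p_hat = sum(1 for s in samples if is_false_positive(s)) / n
        # Two-sided Hoeffding bound for a mean of n {0,1}-valued observations.
        eps = math.sqrt(math.log(2 / delta) / (2 * n))
        return p_hat, eps
    ```

    Tightening the bound (smaller `eps`) requires quadratically more samples, which is why the efficiency of the sampling procedure matters in practice.
    
    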
As part of our work, we obtain a theoretical result that may be of independent interest: We give a simple analysis of the convergence rate of the random surfer process defining PageRank that guarantees the same rate as the standard, second-eigenvalue analysis, but does not rely on any assumptions about the link structure of the web.
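    The random surfer process referenced above can be sketched as follows: at each step the surfer either follows a random outgoing link (with some probability `alpha`) or restarts at a uniformly random node, and after enough steps the surfer's position is approximately distributed according to PageRank. This is a minimal toy sketch over an in-memory graph, assuming a dictionary adjacency representation; the paper's system samples from real Internet data, and `alpha` and `steps` here are illustrative choices.

    ```python
    import random

    def pagerank_sample(links, alpha=0.85, steps=50, rng=random):
        """Draw one node approximately from the PageRank distribution of
        the graph `links` (dict: node -> list of out-neighbors) by
        simulating the random surfer process for `steps` steps."""
        nodes = list(links)
        node = rng.choice(nodes)  # start at a uniformly random node
        for _ in range(steps):
            out = links.get(node, [])
            if out and rng.random() < alpha:
                node = rng.choice(out)    # follow a random outgoing link
            else:
                node = rng.choice(nodes)  # teleport: restart uniformly
        return node
    ```

    Because each sample is an independent walk, many samples can be drawn in parallel, and the convergence rate of the walk determines how large `steps` must be for the sample to be close to the stationary distribution.
    
    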

    Original language: English (US)
    State: Published - 2015
    Event: 22nd Annual Network and Distributed System Security Symposium, NDSS 2015 - San Diego, United States
    Duration: Feb 8, 2015 - Feb 11, 2015


    Conference: 22nd Annual Network and Distributed System Security Symposium, NDSS 2015
    Country/Territory: United States
    City: San Diego

    ASJC Scopus subject areas

    • Computer Networks and Communications
    • Control and Systems Engineering
    • Safety, Risk, Reliability and Quality


