Summarizing two-dimensional data with skyline-based statistical descriptors

Graham Cormode, Flip Korn, S. Muthukrishnan, Divesh Srivastava

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    Much real data consists of more than one dimension, such as financial transactions (e.g., price × volume) and IP network flows (e.g., duration × numBytes), and captures relationships between the variables. For a single dimension, quantiles are intuitive and robust descriptors. Processing and analyzing such data, particularly in data warehouse or data streaming settings, requires similarly robust and informative statistical descriptors that go beyond one dimension. Applying quantile methods to summarize a multidimensional distribution along only singleton attributes ignores the rich dependence amongst the variables. In this paper, we present new skyline-based statistical descriptors for capturing the distributions over pairs of dimensions. They generalize the notion of quantiles in the individual dimensions, and also incorporate properties of the joint distribution. We introduce φ-quantours and α-radials, which are skyline points over subsets of the data, and propose (φ, α)-quantiles, found from the union of these skylines, as statistical descriptors of two-dimensional distributions. We present efficient online algorithms for tracking (φ, α)-quantiles on two-dimensional streams using guaranteed small space. We identify the principal properties of the proposed descriptors and perform extensive experiments with synthetic and real IP traffic data to study the efficiency of our proposed algorithms.
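The abstract builds on two standard notions: one-dimensional quantiles and skyline points. As a rough illustration of those building blocks only, the sketch below computes a nearest-rank quantile and a generic two-dimensional skyline. It does not implement the φ-quantours, α-radials, or (φ, α)-quantiles of the paper, whose precise definitions are not given in the abstract. The function names, the nearest-rank convention, and the "larger is better in both dimensions" dominance rule are all assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def quantile(values: Sequence[float], phi: float) -> float:
    """Nearest-rank phi-quantile of a non-empty sequence (assumed convention)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(phi * len(ordered)))  # 1-based rank
    return ordered[rank - 1]


def skyline(points: Sequence[Point]) -> List[Point]:
    """Points not dominated by any other point.

    Assumption: p dominates q when p is >= q in both coordinates
    and p differs from q (larger is better in both dimensions).
    """
    result: List[Point] = []
    best_y = float("-inf")
    # Sweep from largest x to smallest; a point survives only if its y
    # is strictly larger than every y seen so far.
    for x, y in sorted(set(points), key=lambda p: (-p[0], -p[1])):
        if y > best_y:
            result.append((x, y))
            best_y = y
    return result


if __name__ == "__main__":
    data = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1), (3, 1)]
    print(quantile([x for x, _ in data], 0.5))  # marginal median of x
    print(skyline(data))
```

The marginal quantile looks at one attribute at a time, while the skyline depends on how the two attributes vary together, which is the kind of joint information the abstract says per-attribute quantiles miss.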

    Original language: English (US)
    Title of host publication: Scientific and Statistical Database Management - 20th International Conference, SSDBM 2008, Proceedings
    Pages: 42-60
    Number of pages: 19
    State: Published - 2008
    Event: 20th International Conference on Scientific and Statistical Database Management, SSDBM 2008 - Hong Kong, China
    Duration: Jul 9 2008 to Jul 11 2008

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 5069 LNCS
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Other

    Other: 20th International Conference on Scientific and Statistical Database Management, SSDBM 2008
    Country/Territory: China
    City: Hong Kong
    Period: 7/9/08 to 7/11/08

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • General Computer Science
