TY - GEN
T1 - Summarizing two-dimensional data with skyline-based statistical descriptors
AU - Cormode, Graham
AU - Korn, Flip
AU - Muthukrishnan, S.
AU - Srivastava, Divesh
PY - 2008
Y1 - 2008
N2 - Much real data consists of more than one dimension, such as financial transactions (e.g., price × volume) and IP network flows (e.g., duration × numBytes), and captures relationships between the variables. For a single dimension, quantiles are intuitive and robust descriptors. Processing and analyzing such data, particularly in data warehouse or data streaming settings, requires similarly robust and informative statistical descriptors that go beyond one dimension. Applying quantile methods to summarize a multidimensional distribution along only singleton attributes ignores the rich dependence amongst the variables. In this paper, we present new skyline-based statistical descriptors for capturing the distributions over pairs of dimensions. They generalize the notion of quantiles in the individual dimensions, and also incorporate properties of the joint distribution. We introduce φ-quantours and α-radials, which are skyline points over subsets of the data, and propose (φ, α)-quantiles, found from the union of these skylines, as statistical descriptors of two-dimensional distributions. We present efficient online algorithms for tracking (φ, α)-quantiles on two-dimensional streams using guaranteed small space. We identify the principal properties of the proposed descriptors and perform extensive experiments with synthetic and real IP traffic data to study the efficiency of our proposed algorithms.
AB - Much real data consists of more than one dimension, such as financial transactions (e.g., price × volume) and IP network flows (e.g., duration × numBytes), and captures relationships between the variables. For a single dimension, quantiles are intuitive and robust descriptors. Processing and analyzing such data, particularly in data warehouse or data streaming settings, requires similarly robust and informative statistical descriptors that go beyond one dimension. Applying quantile methods to summarize a multidimensional distribution along only singleton attributes ignores the rich dependence amongst the variables. In this paper, we present new skyline-based statistical descriptors for capturing the distributions over pairs of dimensions. They generalize the notion of quantiles in the individual dimensions, and also incorporate properties of the joint distribution. We introduce φ-quantours and α-radials, which are skyline points over subsets of the data, and propose (φ, α)-quantiles, found from the union of these skylines, as statistical descriptors of two-dimensional distributions. We present efficient online algorithms for tracking (φ, α)-quantiles on two-dimensional streams using guaranteed small space. We identify the principal properties of the proposed descriptors and perform extensive experiments with synthetic and real IP traffic data to study the efficiency of our proposed algorithms.
UR - http://www.scopus.com/inward/record.url?scp=49049093043&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=49049093043&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-69497-7_6
DO - 10.1007/978-3-540-69497-7_6
M3 - Conference contribution
AN - SCOPUS:49049093043
SN - 3540694765
SN - 9783540694762
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 42
EP - 60
BT - Scientific and Statistical Database Management - 20th International Conference, SSDBM 2008, Proceedings
T2 - 20th International Conference on Scientific and Statistical Database Management, SSDBM 2008
Y2 - 9 July 2008 through 11 July 2008
ER -