TY - CHAP
T1 - Estimating dominance norms of multiple data streams
AU - Cormode, Graham
AU - Muthukrishnan, S.
PY - 2003
Y1 - 2003
N2 - There is much focus in the algorithms and database communities on designing tools to manage and mine data streams. Typically, data streams consist of multiple signals. Formally, a stream of multiple signals is (i, ai,j) where i's correspond to the domain, j's index the different signals and a i,j ≥ 0 give the value of the jth signal at point i. We study the problem of finding norms that are cumulative of the multiple signals in the data stream. For example, consider the max-dominance norm, defined as ∑i maxj{ai,j}. It may be thought as estimating the norm of the "upper envelope" of the multiple signals, or alternatively, as estimating the norm of the "marginal" distribution of tabular data streams. It is used in applications to estimate the "worst case influence" of multiple processes, for example in IP traffic analysis, electrical grid monitoring and financial domain. In addition, it is a natural measure, generalizing the union of data streams or counting distinct elements in data streams. We present the first known data stream algorithms for estimating max-dominance of multiple signals. In particular, we use workspace and time-per-item that are both sublinear (in fact, poly-logarithmic) in the input size. In contrast other notions of dominance on streams a, b - min-dominance (∑i minj{a i,j}), countdominance (|{i|ai > bi}|) or relative-dominance (∑i ai/ max{1, bi}) - are all impossible to estimate accurately with sublinear space.
AB - There is much focus in the algorithms and database communities on designing tools to manage and mine data streams. Typically, data streams consist of multiple signals. Formally, a stream of multiple signals is (i, ai,j) where i's correspond to the domain, j's index the different signals and a i,j ≥ 0 give the value of the jth signal at point i. We study the problem of finding norms that are cumulative of the multiple signals in the data stream. For example, consider the max-dominance norm, defined as ∑i maxj{ai,j}. It may be thought as estimating the norm of the "upper envelope" of the multiple signals, or alternatively, as estimating the norm of the "marginal" distribution of tabular data streams. It is used in applications to estimate the "worst case influence" of multiple processes, for example in IP traffic analysis, electrical grid monitoring and financial domain. In addition, it is a natural measure, generalizing the union of data streams or counting distinct elements in data streams. We present the first known data stream algorithms for estimating max-dominance of multiple signals. In particular, we use workspace and time-per-item that are both sublinear (in fact, poly-logarithmic) in the input size. In contrast other notions of dominance on streams a, b - min-dominance (∑i minj{a i,j}), countdominance (|{i|ai > bi}|) or relative-dominance (∑i ai/ max{1, bi}) - are all impossible to estimate accurately with sublinear space.
UR - http://www.scopus.com/inward/record.url?scp=0142245899&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0142245899&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-39658-1_16
DO - 10.1007/978-3-540-39658-1_16
M3 - Chapter
AN - SCOPUS:0142245899
SN - 3540200649
SN - 9783540200642
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 148
EP - 160
BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
A2 - di Battista, Giuseppe
A2 - Zwick, Uri
PB - Springer Verlag
ER -