TY - GEN
T1 - What's different
T2 - 22nd International Conference on Data Engineering, ICDE '06
AU - Cormode, Graham
AU - Muthukrishnan, S.
AU - Zhuang, Wei
PY - 2006
Y1 - 2006
N2 - Emerging applications in sensor systems and network-wide IP traffic analysis present many technical challenges. They need distributed monitoring and continuous tracking of events. They have severe resource constraints not only at each site in terms of per-update processing time and archival space for high-speed streams of observations, but also crucially, communication constraints for collaborating on the monitoring task. These elements have been addressed in a series of recent works. A fundamental issue that arises is that one cannot make the "uniqueness" assumption on observed events which is present in previous works, since widescale monitoring invariably encounters the same events at different points. For example, within the network of an Internet Service Provider packets of the same flow will be observed in different routers; similarly, the same individual will be observed by multiple mobile sensors in monitoring wild animals. Aggregates of interest on such distributed environments must be resilient to duplicate observations. We study such duplicate-resilient aggregates that measure the extent of the duplication - how many unique observations are there, how many observations are unique - as well as standard holistic aggregates such as quantiles and heavy hitters over the unique items. We present accuracy guaranteed, highly communication-efficient algorithms for these aggregates that work within the time and space constraints of high speed streams. We also present results of a detailed experimental study on both real-life and synthetic data.
AB - Emerging applications in sensor systems and network-wide IP traffic analysis present many technical challenges. They need distributed monitoring and continuous tracking of events. They have severe resource constraints not only at each site in terms of per-update processing time and archival space for high-speed streams of observations, but also crucially, communication constraints for collaborating on the monitoring task. These elements have been addressed in a series of recent works. A fundamental issue that arises is that one cannot make the "uniqueness" assumption on observed events which is present in previous works, since widescale monitoring invariably encounters the same events at different points. For example, within the network of an Internet Service Provider packets of the same flow will be observed in different routers; similarly, the same individual will be observed by multiple mobile sensors in monitoring wild animals. Aggregates of interest on such distributed environments must be resilient to duplicate observations. We study such duplicate-resilient aggregates that measure the extent of the duplication - how many unique observations are there, how many observations are unique - as well as standard holistic aggregates such as quantiles and heavy hitters over the unique items. We present accuracy guaranteed, highly communication-efficient algorithms for these aggregates that work within the time and space constraints of high speed streams. We also present results of a detailed experimental study on both real-life and synthetic data.
UR - http://www.scopus.com/inward/record.url?scp=33749591511&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33749591511&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2006.173
DO - 10.1109/ICDE.2006.173
M3 - Conference contribution
AN - SCOPUS:33749591511
SN - 0769525709
SN - 9780769525709
T3 - Proceedings - International Conference on Data Engineering
SP - 57
BT - Proceedings of the 22nd International Conference on Data Engineering, ICDE '06
Y2 - 3 April 2006 through 7 April 2006
ER -