TY - GEN
T1 - Approximate data stream joins in distributed systems
AU - Kriakov, Vassil
AU - Delis, Alex
AU - Kollios, George
N1 - Copyright:
Copyright 2008 Elsevier B.V., All rights reserved.
PY - 2007
Y1 - 2007
N2 - The emergence of applications producing continuous high-frequency data streams has brought forth a large body of research in the area of distributed stream processing. In presence of high volumes of data, efforts have primarily concentrated on providing approximate aggregate or top-k type results. Scalable solutions for providing answers to window join queries in distributed stream processing systems have received limited attention to date. We provide a solution for the window join in a distributed stream processing system which features reduced inter-node communications achieved through automatic throughput handling based on resource availability. Our approach is based on incrementally updated discrete Fourier transforms (DFTs). Furthermore, we provide formulae for computing DFT compression factors in order to achieve information reduction. We perform WAN-based prototype experiments to ascertain the viability and establish the effectiveness of our method. Our experimental results reveal that our method scales in terms of throughput and error rates, achieving sub-linear message complexity in domains that exhibit a geographic skew in the joining attributes.
AB - The emergence of applications producing continuous high-frequency data streams has brought forth a large body of research in the area of distributed stream processing. In presence of high volumes of data, efforts have primarily concentrated on providing approximate aggregate or top-k type results. Scalable solutions for providing answers to window join queries in distributed stream processing systems have received limited attention to date. We provide a solution for the window join in a distributed stream processing system which features reduced inter-node communications achieved through automatic throughput handling based on resource availability. Our approach is based on incrementally updated discrete Fourier transforms (DFTs). Furthermore, we provide formulae for computing DFT compression factors in order to achieve information reduction. We perform WAN-based prototype experiments to ascertain the viability and establish the effectiveness of our method. Our experimental results reveal that our method scales in terms of throughput and error rates, achieving sub-linear message complexity in domains that exhibit a geographic skew in the joining attributes.
UR - http://www.scopus.com/inward/record.url?scp=34848902023&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34848902023&partnerID=8YFLogxK
U2 - 10.1109/ICDCS.2007.104
DO - 10.1109/ICDCS.2007.104
M3 - Conference contribution
AN - SCOPUS:34848902023
SN - 0769528376
SN - 9780769528373
T3 - Proceedings - International Conference on Distributed Computing Systems
BT - 27th International Conference on Distributed Computing Systems, ICDCS'07
T2 - 27th International Conference on Distributed Computing Systems, ICDCS'07
Y2 - 25 June 2007 through 27 June 2007
ER -