Approximate data stream joins in distributed systems

Vassil Kriakov, Alex Delis, George Kollios

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

The emergence of applications producing continuous high-frequency data streams has brought forth a large body of research in the area of distributed stream processing. In the presence of high volumes of data, efforts have primarily concentrated on providing approximate aggregate or top-k results. Scalable solutions for answering window join queries in distributed stream processing systems have received limited attention to date. We provide a solution for the window join in a distributed stream processing system that reduces inter-node communication through automatic throughput handling based on resource availability. Our approach is based on incrementally updated discrete Fourier transforms (DFTs). Furthermore, we provide formulae for computing DFT compression factors in order to achieve information reduction. We perform WAN-based prototype experiments to ascertain the viability and establish the effectiveness of our method. Our experimental results reveal that our method scales in terms of throughput and error rates, achieving sub-linear message complexity in domains that exhibit a geographic skew in the joining attributes.
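The abstract does not spell out how the DFTs are updated incrementally or how the compression factors are computed. As an illustration only, the sketch below shows the standard sliding-window DFT recurrence, which refreshes each retained coefficient in constant time when one value leaves the window and another arrives. The names `direct_dft`, `SlidingDFT`, and `num_coeffs` are hypothetical, and keeping only the first `num_coeffs` low-frequency coefficients is an assumed form of compression, not the formulae from the paper.

```python
import cmath
from collections import deque


def direct_dft(window, num_coeffs):
    """Compute the first num_coeffs DFT coefficients of window from scratch."""
    n = len(window)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * j * k / n) for j, x in enumerate(window))
        for k in range(num_coeffs)
    ]


class SlidingDFT:
    """Maintains a truncated DFT over a fixed-size sliding window.

    Illustrative sketch; not the authors' implementation.
    """

    def __init__(self, initial_window, num_coeffs):
        self.window = deque(initial_window)
        self.n = len(initial_window)
        # Assumption: compression means retaining only the lowest frequencies.
        self.coeffs = direct_dft(list(initial_window), num_coeffs)
        # Per-coefficient rotation applied on every one-step slide.
        self.twiddles = [
            cmath.exp(2j * cmath.pi * k / self.n) for k in range(num_coeffs)
        ]

    def push(self, new_value):
        """Slide the window by one value and update each coefficient in O(1)."""
        old_value = self.window.popleft()
        self.window.append(new_value)
        delta = new_value - old_value
        # Recurrence: X'_k = (X_k - x_old + x_new) * exp(2*pi*i*k / N)
        self.coeffs = [(c + delta) * w for c, w in zip(self.coeffs, self.twiddles)]
        return self.coeffs


sdft = SlidingDFT([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], num_coeffs=3)
for value in (9.0, 10.0):
    sdft.push(value)
```

Under this scheme a node would ship `num_coeffs` complex numbers per window instead of the raw tuples, which is one plausible way to reduce inter-node traffic. How many coefficients to keep for a given error bound is the question the paper's compression-factor formulae address.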

Original language: English (US)
Title of host publication: 27th International Conference on Distributed Computing Systems, ICDCS'07
State: Published - 2007
Event: 27th International Conference on Distributed Computing Systems, ICDCS'07 - Toronto, ON, Canada
Duration: Jun 25 2007 to Jun 27 2007

Publication series

Name: Proceedings - International Conference on Distributed Computing Systems

Conference

Conference: 27th International Conference on Distributed Computing Systems, ICDCS'07
Country/Territory: Canada
City: Toronto, ON
Period: 6/25/07 to 6/27/07

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications
