Spark-Parsketch: A massively distributed indexing of time series datasets

Oleksandra Levchenko, Djamel Edine Yagoubi, Reza Akbarinia, Florent Masseglia, Boyan Kolev, Dennis Shasha

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

A growing number of domains (finance, seismology, internet-of-things, etc.) collect massive time series. When the number of series grow to the hundreds of millions or even billions, similarity queries become intractable on a single machine. Further, naive (quadratic) parallelization won't work well. So, we need both efficient indexing and parallelization. We propose a demonstration of Spark-parSketch, a complete solution based on sketches / random projections to efficiently perform both the parallel indexing of large sets of time series and a similarity search on them. Because our method is approximate, we explore the tradeoff between time and precision. A video showing the dynamics of the demonstration can be found by the link http://parsketch.gforge.inria.fr/video/parSketchdemo_720p.mov.

Original languageEnglish (US)
Title of host publicationCIKM 2018 - Proceedings of the 27th ACM International Conference on Information and Knowledge Management
EditorsNorman Paton, Selcuk Candan, Haixun Wang, James Allan, Rakesh Agrawal, Alexandros Labrinidis, Alfredo Cuzzocrea, Mohammed Zaki, Divesh Srivastava, Andrei Broder, Assaf Schuster
PublisherAssociation for Computing Machinery
Pages1951-1954
Number of pages4
ISBN (Electronic)9781450360142
DOIs
StatePublished - Oct 17 2018
Event27th ACM International Conference on Information and Knowledge Management, CIKM 2018 - Torino, Italy
Duration: Oct 22 2018Oct 26 2018

Publication series

NameInternational Conference on Information and Knowledge Management, Proceedings

Other

Other27th ACM International Conference on Information and Knowledge Management, CIKM 2018
Country/TerritoryItaly
CityTorino
Period10/22/1810/26/18

Keywords

  • Distributed data processing
  • Indexing
  • Similarity search
  • Spark
  • Time series

ASJC Scopus subject areas

  • General Decision Sciences
  • General Business, Management and Accounting

Fingerprint

Dive into the research topics of 'Spark-Parsketch: A massively distributed indexing of time series datasets'. Together they form a unique fingerprint.

Cite this