TY - GEN
T1 - Spark-Parsketch
T2 - 27th ACM International Conference on Information and Knowledge Management, CIKM 2018
AU - Levchenko, Oleksandra
AU - Yagoubi, Djamel Edine
AU - Akbarinia, Reza
AU - Masseglia, Florent
AU - Kolev, Boyan
AU - Shasha, Dennis
N1 - Publisher Copyright: © 2018 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2018/10/17
Y1 - 2018/10/17
AB - A growing number of domains (finance, seismology, internet-of-things, etc.) collect massive time series. When the number of series grows to the hundreds of millions or even billions, similarity queries become intractable on a single machine. Further, naive (quadratic) parallelization won't work well. So, we need both efficient indexing and parallelization. We propose a demonstration of Spark-parSketch, a complete solution based on sketches / random projections to efficiently perform both the parallel indexing of large sets of time series and a similarity search on them. Because our method is approximate, we explore the tradeoff between time and precision. A video showing the dynamics of the demonstration is available at http://parsketch.gforge.inria.fr/video/parSketchdemo_720p.mov.
KW - Distributed data processing
KW - Indexing
KW - Similarity search
KW - Spark
KW - Time series
UR - http://www.scopus.com/inward/record.url?scp=85058040706&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85058040706&partnerID=8YFLogxK
U2 - 10.1145/3269206.3269226
DO - 10.1145/3269206.3269226
M3 - Conference contribution
AN - SCOPUS:85058040706
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 1951
EP - 1954
BT - CIKM 2018 - Proceedings of the 27th ACM International Conference on Information and Knowledge Management
A2 - Paton, Norman
A2 - Candan, Selcuk
A2 - Wang, Haixun
A2 - Allan, James
A2 - Agrawal, Rakesh
A2 - Labrinidis, Alexandros
A2 - Cuzzocrea, Alfredo
A2 - Zaki, Mohammed
A2 - Srivastava, Divesh
A2 - Broder, Andrei
A2 - Schuster, Assaf
PB - Association for Computing Machinery
Y2 - 22 October 2018 through 26 October 2018
ER -