RadiusSketch: Massively distributed indexing of time series

Djamel Edine Yagoubi, Reza Akbarinia, Florent Masseglia, Dennis Shasha

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Performing similarity queries on hundreds of millions of time series is a challenge requiring both efficient indexing techniques and parallelization. We propose a sketch/random projection-based approach that scales nearly linearly in parallel environments and provides high-quality answers. We illustrate the performance of our approach, called RadiusSketch, on real and synthetic datasets of up to 1 terabyte and 500 million time series. The sketch method, as we have implemented it, is superior in both quality and response time to the state-of-the-art approach, iSAX2+. Even in the sequential case, it improves recall and precision by a factor of two while giving shorter response times. In a parallel environment with 32 processors, on both real and synthetic data, our parallel approach improves index construction time by a factor of up to 100 and query answering time by a factor of up to 15. Finally, our data structure makes use of idle computing time to improve recall and precision still further.
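To make the phrase "sketch/random projection-based approach" concrete, here is a minimal illustration of the general idea, not the authors' implementation. It assumes random ±1 projection vectors and a brute-force scan over the sketches; the function names (`random_projection_matrix`, `sketch`, `candidates`) are hypothetical, and the indexing and parallelization that the abstract describes are omitted.

```python
import numpy as np

def random_projection_matrix(length, sketch_size, seed=0):
    """Draw `sketch_size` random +/-1 vectors of dimension `length` (an assumed choice)."""
    rng = np.random.default_rng(seed)
    return rng.choice([-1.0, 1.0], size=(length, sketch_size))

def sketch(series, projections):
    """Project one series (1-D) or many (2-D, one per row) onto the random vectors.

    Dividing by sqrt(sketch_size) makes squared distances between sketches
    approximate squared Euclidean distances between the original series.
    """
    return series @ projections / np.sqrt(projections.shape[1])

def candidates(data_sketches, query_sketch, k):
    """Return indices of the k series whose sketches are closest to the query sketch."""
    dists = np.linalg.norm(data_sketches - query_sketch, axis=1)
    return np.argsort(dists)[:k]
```

The query must be sketched with the same projection matrix as the data. The returned indices are only candidates: in a setup like this, they would normally be re-checked against the original series to filter out false positives.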

Original language: English (US)
Title of host publication: Proceedings - 2017 International Conference on Data Science and Advanced Analytics, DSAA 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 262-271
Number of pages: 10
ISBN (Electronic): 9781509050048
State: Published - Jul 2 2017
Event: 4th International Conference on Data Science and Advanced Analytics, DSAA 2017 - Tokyo, Japan
Duration: Oct 19 2017 to Oct 21 2017

Publication series

Name: Proceedings - 2017 International Conference on Data Science and Advanced Analytics, DSAA 2017
Volume: 2018-January

Other

Other: 4th International Conference on Data Science and Advanced Analytics, DSAA 2017
Country/Territory: Japan
City: Tokyo
Period: 10/19/17 to 10/21/17

ASJC Scopus subject areas

  • Signal Processing
  • Information Systems and Management
  • Statistics, Probability and Uncertainty
  • Computer Networks and Communications
