TY - GEN
T1 - RadiusSketch
T2 - 4th International Conference on Data Science and Advanced Analytics, DSAA 2017
AU - Yagoubi, Djamel Edine
AU - Akbarinia, Reza
AU - Masseglia, Florent
AU - Shasha, Dennis
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/7/2
Y1 - 2017/7/2
N2 - Performing similarity queries on hundreds of millions of time series is a challenge requiring both efficient indexing techniques and parallelization. We propose a sketch/random projection-based approach that scales nearly linearly in parallel environments and provides high-quality answers. We illustrate the performance of our approach, called RadiusSketch, on real and synthetic datasets of up to 1 terabyte and 500 million time series. The sketch method, as we have implemented it, outperforms the state-of-the-art approach, iSAX2+, in both answer quality and response time. Even in the sequential case, it improves recall and precision by a factor of two while giving shorter response times. In a parallel environment with 32 processors, on both real and synthetic data, our parallel approach improves index construction time by a factor of up to 100 and query answering time by a factor of up to 15. Finally, our data structure exploits idle computing time to further improve recall and precision.
AB - Performing similarity queries on hundreds of millions of time series is a challenge requiring both efficient indexing techniques and parallelization. We propose a sketch/random projection-based approach that scales nearly linearly in parallel environments and provides high-quality answers. We illustrate the performance of our approach, called RadiusSketch, on real and synthetic datasets of up to 1 terabyte and 500 million time series. The sketch method, as we have implemented it, outperforms the state-of-the-art approach, iSAX2+, in both answer quality and response time. Even in the sequential case, it improves recall and precision by a factor of two while giving shorter response times. In a parallel environment with 32 processors, on both real and synthetic data, our parallel approach improves index construction time by a factor of up to 100 and query answering time by a factor of up to 15. Finally, our data structure exploits idle computing time to further improve recall and precision.
UR - http://www.scopus.com/inward/record.url?scp=85046285778&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85046285778&partnerID=8YFLogxK
U2 - 10.1109/DSAA.2017.49
DO - 10.1109/DSAA.2017.49
M3 - Conference contribution
AN - SCOPUS:85046285778
T3 - Proceedings - 2017 International Conference on Data Science and Advanced Analytics, DSAA 2017
SP - 262
EP - 271
BT - Proceedings - 2017 International Conference on Data Science and Advanced Analytics, DSAA 2017
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 19 October 2017 through 21 October 2017
ER -