TY - JOUR
T1 - ParCorr
T2 - efficient parallel methods to identify similar time series pairs across sliding windows
AU - Yagoubi, Djamel Edine
AU - Akbarinia, Reza
AU - Kolev, Boyan
AU - Levchenko, Oleksandra
AU - Masseglia, Florent
AU - Valduriez, Patrick
AU - Shasha, Dennis
N1 - Publisher Copyright:
© 2018, The Author(s).
PY - 2018/9/1
Y1 - 2018/9/1
N2 - Consider the problem of finding the highly correlated pairs of time series over a time window and then sliding that window to find the highly correlated pairs over successive co-temporous windows such that each successive window starts only a little time after the previous window. Doing this efficiently and in parallel could help in applications such as sensor fusion, financial trading, or communications network monitoring, to name a few. We have developed a parallel incremental random vector/sketching approach to this problem and compared it with the state-of-the-art nearest neighbor method iSAX. Whereas iSAX achieves 100% recall and precision for Euclidean distance, the sketching approach is, empirically, at least 10 times faster and achieves 95% recall and 100% precision on real and simulated data. For many applications this speedup is worth the minor reduction in recall. Our method scales up to 100 million time series and scales linearly in its expensive steps (but quadratic in the less expensive ones).
AB - Consider the problem of finding the highly correlated pairs of time series over a time window and then sliding that window to find the highly correlated pairs over successive co-temporous windows such that each successive window starts only a little time after the previous window. Doing this efficiently and in parallel could help in applications such as sensor fusion, financial trading, or communications network monitoring, to name a few. We have developed a parallel incremental random vector/sketching approach to this problem and compared it with the state-of-the-art nearest neighbor method iSAX. Whereas iSAX achieves 100% recall and precision for Euclidean distance, the sketching approach is, empirically, at least 10 times faster and achieves 95% recall and 100% precision on real and simulated data. For many applications this speedup is worth the minor reduction in recall. Our method scales up to 100 million time series and scales linearly in its expensive steps (but quadratic in the less expensive ones).
KW - Data mining
KW - Data stream processing
KW - Distributed computing
KW - Time series analysis
UR - http://www.scopus.com/inward/record.url?scp=85051745634&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85051745634&partnerID=8YFLogxK
U2 - 10.1007/s10618-018-0580-z
DO - 10.1007/s10618-018-0580-z
M3 - Article
AN - SCOPUS:85051745634
SN - 1384-5810
VL - 32
SP - 1481
EP - 1507
JO - Data Mining and Knowledge Discovery
JF - Data Mining and Knowledge Discovery
IS - 5
ER -