TY - GEN
T1 - Identifying representative trends in massive time series data sets using sketches
AU - Indyk, Piotr
AU - Koudas, Nick
AU - Muthukrishnan, S.
PY - 2000
Y1 - 2000
N2 - Many data stores, including scientific and financial databases, business warehouses and network repositories, contain time series data. Time series data depict trends for an observed value e.g., value of a stock, number of bytes sent on a router interface, etc., as a function of time. Analysis of the trends over different time windows is of great interest. In this paper, we formalize problems of identifying various 'representative' trends in time series data. Informally, an interval of observations in a time series is defined to be a representative trend if its distance from other intervals satisfy certain properties, for suitably defined distance functions between time series intervals. Natural trends of interest such as periodic or average trends are examples of representative trends. We present efficient algorithms for analyzing massive time series data sets for representative trends over arbitrary windows of interest. Our algorithms are highly processor and 10 efficient; they are approximate but provide probabilistic guarantees for the approximations achieved. Our approach for identifying representative trends relies on a dimensionality reduction technique that replaces each interval by a 'sketch' which is a low dimensional vector. We present efficient algorithms to construct such sketches using a pool of select sketches that we precompute using polynomial convolutions. Using such sketches, we can compute representative trends accurately. Finally, we present results of a detailed experimental study of our technique on very large real data sets. Our results show that, compared to approaches that determine representative trends exactly, our approach shows significant performance gains with only a small loss in accuracy.
AB - Many data stores, including scientific and financial databases, business warehouses and network repositories, contain time series data. Time series data depict trends for an observed value e.g., value of a stock, number of bytes sent on a router interface, etc., as a function of time. Analysis of the trends over different time windows is of great interest. In this paper, we formalize problems of identifying various 'representative' trends in time series data. Informally, an interval of observations in a time series is defined to be a representative trend if its distance from other intervals satisfy certain properties, for suitably defined distance functions between time series intervals. Natural trends of interest such as periodic or average trends are examples of representative trends. We present efficient algorithms for analyzing massive time series data sets for representative trends over arbitrary windows of interest. Our algorithms are highly processor and 10 efficient; they are approximate but provide probabilistic guarantees for the approximations achieved. Our approach for identifying representative trends relies on a dimensionality reduction technique that replaces each interval by a 'sketch' which is a low dimensional vector. We present efficient algorithms to construct such sketches using a pool of select sketches that we precompute using polynomial convolutions. Using such sketches, we can compute representative trends accurately. Finally, we present results of a detailed experimental study of our technique on very large real data sets. Our results show that, compared to approaches that determine representative trends exactly, our approach shows significant performance gains with only a small loss in accuracy.
UR - http://www.scopus.com/inward/record.url?scp=32344434787&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=32344434787&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:32344434787
SN - 1558607153
SN - 9781558607156
T3 - Proceedings of the 26th International Conference on Very Large Data Bases, VLDB'00
SP - 363
EP - 372
BT - Proceedings of the 26th International Conference on Very Large Data Bases, VLDB'00
T2 - 26th International Conference on Very Large Data Bases, VLDB 2000
Y2 - 10 September 2000 through 14 September 2000
ER -