TY - GEN

T1 - Identifying representative trends in massive time series data sets using sketches

AU - Indyk, Piotr

AU - Koudas, Nick

AU - Muthukrishnan, S.

PY - 2000

Y1 - 2000

N2 - Many data stores, including scientific and financial databases, business warehouses and network repositories, contain time series data. Time series data depict trends for an observed value e.g., value of a stock, number of bytes sent on a router interface, etc., as a function of time. Analysis of the trends over different time windows is of great interest. In this paper, we formalize problems of identifying various 'representative' trends in time series data. Informally, an interval of observations in a time series is defined to be a representative trend if its distance from other intervals satisfy certain properties, for suitably defined distance functions between time series intervals. Natural trends of interest such as periodic or average trends are examples of representative trends. We present efficient algorithms for analyzing massive time series data sets for representative trends over arbitrary windows of interest. Our algorithms are highly processor and 10 efficient; they are approximate but provide probabilistic guarantees for the approximations achieved. Our approach for identifying representative trends relies on a dimensionality reduction technique that replaces each interval by a 'sketch' which is a low dimensional vector. We present efficient algorithms to construct such sketches using a pool of select sketches that we precompute using polynomial convolutions. Using such sketches, we can compute representative trends accurately. Finally, we present results of a detailed experimental study of our technique on very large real data sets. Our results show that, compared to approaches that determine representative trends exactly, our approach shows significant performance gains with only a small loss in accuracy.

AB - Many data stores, including scientific and financial databases, business warehouses and network repositories, contain time series data. Time series data depict trends for an observed value e.g., value of a stock, number of bytes sent on a router interface, etc., as a function of time. Analysis of the trends over different time windows is of great interest. In this paper, we formalize problems of identifying various 'representative' trends in time series data. Informally, an interval of observations in a time series is defined to be a representative trend if its distance from other intervals satisfy certain properties, for suitably defined distance functions between time series intervals. Natural trends of interest such as periodic or average trends are examples of representative trends. We present efficient algorithms for analyzing massive time series data sets for representative trends over arbitrary windows of interest. Our algorithms are highly processor and 10 efficient; they are approximate but provide probabilistic guarantees for the approximations achieved. Our approach for identifying representative trends relies on a dimensionality reduction technique that replaces each interval by a 'sketch' which is a low dimensional vector. We present efficient algorithms to construct such sketches using a pool of select sketches that we precompute using polynomial convolutions. Using such sketches, we can compute representative trends accurately. Finally, we present results of a detailed experimental study of our technique on very large real data sets. Our results show that, compared to approaches that determine representative trends exactly, our approach shows significant performance gains with only a small loss in accuracy.

UR - http://www.scopus.com/inward/record.url?scp=32344434787&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=32344434787&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:32344434787

SN - 1558607153

SN - 9781558607156

T3 - Proceedings of the 26th International Conference on Very Large Data Bases, VLDB'00

SP - 363

EP - 372

BT - Proceedings of the 26th International Conference on Very Large Data Bases, VLDB'00

T2 - 26th International Conference on Very Large Data Bases, VLDB 2000

Y2 - 10 September 2000 through 14 September 2000

ER -