Abstract
Data arriving in time order (a data stream) arises in fields including physics, finance, medicine, and music. Often the data comes from sensors (in physics and medicine, for example) whose data rates continue to improve dramatically as sensor technology improves. Further, the number of sensors is increasing, so correlating data between sensors becomes ever more critical in order to distill knowledge from the data. In many applications such as finance, recent correlations are of far more interest than long-term correlations, so correlation over sliding windows (windowed correlation) is the desired operation. Fast response is desirable in many applications (e.g., to aim a telescope at an activity of interest or to perform a stock trade). These three factors (data size, windowed correlation, and fast response) motivate this work. Previous work [10, 14] showed how to compute Pearson correlation using Fast Fourier Transforms and Wavelet transforms, but such techniques do not work for time series whose energy is spread over many frequency components and which therefore resemble white noise. For such "uncooperative" time series, this paper shows how to combine several simple techniques (sketches, i.e., random projections; convolution; structured random vectors; grid structures; and combinatorial design) to achieve high-performance windowed Pearson correlation over a variety of data sets.
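To illustrate the sketch (random projection) idea named in the abstract, the following is a minimal sketch in Python, not the paper's implementation. It relies on the standard identity that, for two windows normalized to zero mean and unit norm, the Pearson correlation equals 1 - d²/2, where d is their Euclidean distance, and on the fact that a scaled Gaussian random projection approximately preserves that distance. The function names, the Gaussian projection matrix, and the parameter values are assumptions made for illustration; convolution, structured random vectors, grid structures, and combinatorial design are not shown.

```python
import numpy as np


def normalize(window):
    """Shift a window to zero mean and scale it to unit Euclidean norm."""
    w = np.asarray(window, dtype=float)
    centered = w - w.mean()
    norm = np.linalg.norm(centered)
    if norm == 0.0:
        raise ValueError("constant window has undefined correlation")
    return centered / norm


def make_projection(window_len, sketch_size, seed=0):
    """Gaussian random matrix scaled so squared distances are preserved in expectation.

    Assumption: a dense Gaussian matrix is used here purely for illustration.
    """
    rng = np.random.default_rng(seed)
    return rng.standard_normal((sketch_size, window_len)) / np.sqrt(sketch_size)


def sketch(window, projection):
    """Project the normalized window onto the random vectors."""
    return projection @ normalize(window)


def estimated_correlation(sketch_x, sketch_y):
    """Estimate Pearson correlation from sketches via corr = 1 - d^2 / 2."""
    d2 = float(np.sum((sketch_x - sketch_y) ** 2))
    return 1.0 - d2 / 2.0


def exact_correlation(x, y):
    """Exact Pearson correlation, computed as a dot product of normalized windows."""
    return float(normalize(x) @ normalize(y))
```

In use, the same projection matrix must be applied to every window being compared; the estimate tightens as `sketch_size` grows, at the cost of more work per window.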
Original language | English (US) |
---|---|
Title of host publication | KDD-2005 - Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |
Editors | R.L. Grossman, R. Bayardo, K. Bennett, J. Vaidya |
Pages | 743-749 |
Number of pages | 7 |
State | Published - 2005 |
Event | KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - Chicago, IL, United States; Duration: Aug 21 2005 → Aug 24 2005 |
Other
Other | KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |
---|---|
Country/Territory | United States |
City | Chicago, IL |
Period | 8/21/05 → 8/24/05 |
Keywords
- Correlation
- Randomized algorithms
- Time series
ASJC Scopus subject areas
- Information Systems