Identifying representative trends in massive time series data sets using sketches

Piotr Indyk, Nick Koudas, S. Muthukrishnan

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Many data stores, including scientific and financial databases, business warehouses, and network repositories, contain time series data. Time series data depict trends for an observed value (e.g., the value of a stock or the number of bytes sent on a router interface) as a function of time. Analysis of the trends over different time windows is of great interest. In this paper, we formalize problems of identifying various 'representative' trends in time series data. Informally, an interval of observations in a time series is defined to be a representative trend if its distance from other intervals satisfies certain properties, for suitably defined distance functions between time series intervals. Natural trends of interest, such as periodic or average trends, are examples of representative trends. We present efficient algorithms for analyzing massive time series data sets for representative trends over arbitrary windows of interest. Our algorithms are highly processor and I/O efficient; they are approximate but provide probabilistic guarantees for the approximations achieved. Our approach for identifying representative trends relies on a dimensionality reduction technique that replaces each interval by a 'sketch', which is a low-dimensional vector. We present efficient algorithms to construct such sketches using a pool of select sketches that we precompute using polynomial convolutions. Using such sketches, we can compute representative trends accurately. Finally, we present results of a detailed experimental study of our technique on very large real data sets. Our results show that, compared to approaches that determine representative trends exactly, our approach shows significant performance gains with only a small loss in accuracy.
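
The sketching idea in the abstract can be illustrated with a minimal example. It is not the authors' construction: the abstract does not specify the random distribution, the "pool of select sketches", or the guarantees, so the code below substitutes a generic Gaussian random projection and computes it for every sliding window at once with an FFT-based convolution. The function name `window_sketches` and the parameters `window`, `k`, and `seed` are hypothetical, chosen only for this illustration.

```python
import numpy as np

def window_sketches(series, window, k, seed=0):
    """Return a k-dimensional sketch for every length-`window` interval.

    Assumption: sketches are Gaussian random projections, a generic
    stand-in for the sketch construction used in the paper.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(series, dtype=float)
    n = x.size
    # k random vectors, scaled so squared sketch distances approximate
    # squared Euclidean distances between the original intervals.
    r = rng.standard_normal((k, window)) / np.sqrt(k)
    size = n + window - 1  # length of the full linear convolution
    fx = np.fft.rfft(x, size)
    # Convolving x with a reversed random vector yields the dot product
    # of that vector with every window of x in one pass.
    fr = np.fft.rfft(r[:, ::-1], size, axis=1)
    full = np.fft.irfft(fx * fr, size, axis=1)
    # Keep only positions where the window lies fully inside the series.
    return full[:, window - 1:n].T  # shape: (n - window + 1, k)

# Usage: compare two intervals through their sketches.
data = np.sin(np.arange(2000) / 25.0)
sk = window_sketches(data, window=128, k=400)
approx = np.linalg.norm(sk[0] - sk[300])
exact = np.linalg.norm(data[0:128] - data[300:428])
```

Under these assumptions, `approx` estimates `exact` while each interval is represented by `k` numbers rather than `window` raw values, and the convolution avoids recomputing a projection from scratch for each window position.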

    Original language: English (US)
    Title of host publication: Proceedings of the 26th International Conference on Very Large Data Bases, VLDB'00
    Pages: 363-372
    Number of pages: 10
    State: Published - 2000
    Event: 26th International Conference on Very Large Data Bases, VLDB 2000 - Cairo, Egypt
    Duration: Sep 10 2000 → Sep 14 2000


    ASJC Scopus subject areas

    • Hardware and Architecture
    • Information Systems
    • Software
    • Information Systems and Management
