Optimal distance estimation between compressed data series

Nikolaos M. Freris, Michail Vlachos, Suleyman S. Kozat

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Most real-world data contain repeated or periodic pat- Terns. This suggests that they can be effectively represented and compressed using only a few coefficients of an appro- priate complete orthogonal basis (e.g., Fourier, Wavelets, Karhunen-Lòeve expansion or Principal Components). In the face of ever increasing data repositories and given that most mining operations are distance-based, it is vital to perform accurate distance estimation directly on the compressed data. However, distance estimation when the data are represented using different sets of coefficients is still a largely unexplored area. This work studies the optimiza- Tion problems related to obtaining the tightest lower/upper bound on the distance based on the available information. In particular, we consider the problem where a distinct set of coefficients is maintained for each sequence, and the L2- norm of the compression error is recorded. We establish the properties of optimal solutions, and leverage the theoretical analysis to develop a fast algorithm to obtain an exact so- lution to the problem. The suggested solution provides the tightest provable estimation of the L2-norm or the correla- Tion, and executes at least two order of magnitudes faster than a numerical solution based on convex optimization. The contributions of this work extend beyond the purview of periodic data, as our methods are applicable to any sequen- Tial or high-dimensional data as well as to any orthogonal data transformation used for the underlying data compres- sion scheme.

Original languageEnglish (US)
Title of host publicationProceedings of the 12th SIAM International Conference on Data Mining, SDM 2012
PublisherSociety for Industrial and Applied Mathematics Publications
Pages343-354
Number of pages12
ISBN (Print)9781611972320
DOIs
StatePublished - 2012
Event12th SIAM International Conference on Data Mining, SDM 2012 - Anaheim, CA, United States
Duration: Apr 26 2012Apr 28 2012

Publication series

NameProceedings of the 12th SIAM International Conference on Data Mining, SDM 2012

Other

Other12th SIAM International Conference on Data Mining, SDM 2012
Country/TerritoryUnited States
CityAnaheim, CA
Period4/26/124/28/12

Keywords

  • Compression
  • Distance estimation
  • Fourier
  • KKT conditions
  • Orthogonal bases
  • Tine series
  • Water-filling algorithm
  • Wavelets

ASJC Scopus subject areas

  • Computer Science Applications

Fingerprint

Dive into the research topics of 'Optimal distance estimation between compressed data series'. Together they form a unique fingerprint.

Cite this