MetricMap: An embedding technique for processing distance-based queries in metric spaces

Jason T.L. Wang, Xiong Wang, Dennis Shasha, Kaizhong Zhang

Research output: Contribution to journal › Article › peer-review

Abstract

In this paper, we present an embedding technique, called MetricMap, which is capable of estimating distances in a pseudometric space. Given a database of objects and a distance function for the objects, which is a pseudometric, we map the objects to vectors in a pseudo-Euclidean space with a reasonably low dimension while preserving the distance between two objects approximately. Such an embedding technique can be used as an approximate oracle to process a broad class of distance-based queries. It is also adaptable to data mining applications such as data clustering and classification. We present the theory underlying MetricMap and conduct experiments to compare MetricMap with other methods including MVP-tree and M-tree in processing the distance-based queries. Experimental results on both protein and RNA data show the good performance and the superiority of MetricMap over the other methods.
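To make the idea of mapping objects into a pseudo-Euclidean space concrete, the following is a minimal sketch, not the MetricMap algorithm itself, whose details the abstract does not give. It assumes a generic eigendecomposition-based embedding of a full pairwise distance matrix, where some axes contribute negatively to the squared distance. The function names `pseudo_euclidean_embed` and `pseudo_distance` are hypothetical and chosen only for illustration.

```python
import numpy as np


def pseudo_euclidean_embed(D, k):
    """Embed n objects, given an n x n symmetric distance matrix D, into k axes.

    Illustrative assumption: a generic spectral embedding, not the paper's method.
    Returns (X, sign): X has shape (n, k); sign[i] is +1.0 or -1.0 for axis i.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # Gram-like matrix, possibly indefinite
    eigvals, eigvecs = np.linalg.eigh(B)       # B is symmetric
    order = np.argsort(-np.abs(eigvals))[:k]   # keep the k axes with largest |eigenvalue|
    lam = eigvals[order]
    X = eigvecs[:, order] * np.sqrt(np.abs(lam))
    sign = np.where(lam >= 0, 1.0, -1.0)       # negative eigenvalues give "negative" axes
    return X, sign


def pseudo_distance(x, y, sign):
    """Estimate the distance between two embedded vectors.

    The signed sum of squares can be negative, so it is clamped at zero.
    """
    sq = np.sum(sign * (x - y) ** 2)
    return float(np.sqrt(max(sq, 0.0)))


# Path distances on a star graph (one center, three leaves): a metric that
# cannot be embedded exactly in any ordinary Euclidean space.
D = np.array([
    [0, 1, 1, 1],
    [1, 0, 2, 2],
    [1, 2, 0, 2],
    [1, 2, 2, 0],
], dtype=float)

X, sign = pseudo_euclidean_embed(D, 3)
estimate = pseudo_distance(X[1], X[2], sign)   # estimated distance between two leaves
```

Once the vectors are computed, a distance-based query can be answered approximately by evaluating `pseudo_distance` on the vectors instead of calling the original, possibly expensive, distance function. Using a smaller `k` trades accuracy for speed.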

Original language: English (US)
Pages (from-to): 973-987
Number of pages: 15
Journal: IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Volume: 35
Issue number: 5
State: Published - Oct 2005

Keywords

  • Bioinformatics
  • Data mining
  • Embedding method
  • Metric space
  • Nearest neighbors
  • Similarity search

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Software
  • Information Systems
  • Human-Computer Interaction
  • Computer Science Applications
  • Electrical and Electronic Engineering
