TreeRank: A similarity measure for nearest neighbor searching in phylogenetic databases

J. T.L. Wang, Huiyuan Shan, D. Shasha, W. H. Piel

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Phylogenetic trees are unordered labeled trees in which each leaf node has a label and the order among siblings is unimportant. In this paper we propose a new similarity measure, called TreeRank, for phylogenetic trees and present an algorithm for computing TreeRank scores. Given a query or pattern tree P and a data tree D, the TreeRank score from P to D is a measure of the topological relationships in P that are found to be the same or similar in D. The proposed algorithm calculates the TreeRank score in O(M^2 + N) time where M is the number of nodes appearing in both P and D, and N is the number of nodes in D. We then develop a search engine that, given a query or pattern tree P and a database of trees D, finds and ranks the nearest neighbors of P in D where the "nearness" is measured by the proposed similarity function. This structure-based search engine is fully operational and is available on the World Wide Web.

Original language: English (US)
Title of host publication: 15th International Conference on Scientific and Statistical Database Management, SSDBM 2003
Editors: Silvia Nittel, Dimitrios Gunopulos
Publisher: IEEE Computer Society
Pages: 171-180
Number of pages: 10
ISBN (Electronic): 0769519644
State: Published - 2003
Event: 15th International Conference on Scientific and Statistical Database Management, SSDBM 2003 - Cambridge, United States
Duration: Jul 9 2003 to Jul 11 2003

Publication series

Name: Proceedings of the International Conference on Scientific and Statistical Database Management, SSDBM
Volume: 2003-January
ISSN (Print): 1099-3371

Other

Other: 15th International Conference on Scientific and Statistical Database Management, SSDBM 2003
Country/Territory: United States
City: Cambridge
Period: 7/9/03 to 7/11/03

Keywords

  • Biology
  • Computer science
  • Data analysis
  • Databases
  • Educational institutions
  • Information retrieval
  • Nearest neighbor searches
  • Phylogeny
  • Search engines
  • Web sites

ASJC Scopus subject areas

  • Software
  • Information Systems
