Unordered tree mining with applications to phylogeny

Dennis Shasha, Jason T.L. Wangt, Sen Zhang

Research output: Contribution to conferencePaperpeer-review

Abstract

Frequent structure mining (FSM) aims to discover and extract patterns frequently occurring in structural data, such as trees and graphs. FSM finds many applications in bioinformatics, XML processing, Web log analysis, and so on. In this paper we present a new FSM technique for finding patterns in rooted unordered labeled trees. The patterns of interest are cousin pairs in these trees. A cousin pair is a pair of nodes sharing the same parent, the same grand-parent, or the same great-grandparent, etc. Given a tree T, our algorithm finds all interesting cousin pairs of T in O(|T|2) time where |T| is the number of nodes in T. Experimental results on synthetic data and phytogenies show the scalability and effectiveness of the proposed technique. To demonstrate the usefulness of our approach, we discuss its applications to locating co-occurring patterns in multiple evolutionary trees, evaluating the consensus of equally parsimonious trees, and finding kernel trees of groups of phylogenies. We also describe extensions of our algorithms for undirected acyclic graphs (or free trees).

Original languageEnglish (US)
Pages708-719
Number of pages12
DOIs
StatePublished - 2004
EventProceedings - 20th International Conference on Data Engineering - ICDE 2004 - Boston, MA., United States
Duration: Mar 30 2004Apr 2 2004

Other

OtherProceedings - 20th International Conference on Data Engineering - ICDE 2004
Country/TerritoryUnited States
CityBoston, MA.
Period3/30/044/2/04

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems

Fingerprint

Dive into the research topics of 'Unordered tree mining with applications to phylogeny'. Together they form a unique fingerprint.

Cite this