New techniques for DNA sequence classification

Jason T.L. Wang, Steve Rozen, Bruce A. Shapiro, Dennis Shasha, Zhiyuan Wang, Maisheng Yin

Research output: Contribution to journal (Article)


DNA sequence classification is the activity of determining whether or not an unlabeled sequence S belongs to an existing class C. This paper proposes two new techniques for DNA sequence classification. The first technique works by comparing the unlabeled sequence S with a group of active motifs discovered from the elements of C and by distinction with elements outside of C. The second technique generates and matches gapped fingerprints of S with elements of C. Experimental results obtained by running these algorithms on long and well-conserved Alu sequences demonstrate the good performance of the presented methods compared with FASTA. When applied to less conserved and relatively short functional sites such as splice junctions, a variation of the second technique combining fingerprinting with consensus sequence analysis gives better results than the current classifiers employing text compression and machine learning algorithms.
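The abstract does not spell out how gapped fingerprints are built or matched, so the following is only a rough sketch of the general idea, not the authors' algorithm. It assumes a fingerprint is a pair of short segments separated by a small number of skipped bases, and that S is scored by the fraction of its fingerprints that also occur in members of C. The names `gapped_fingerprints`, `class_fingerprints`, `match_score`, and `classify`, and the parameters `seg_len`, `max_gap`, and `threshold`, are all hypothetical.

```python
from typing import Iterable, Set, Tuple

# Assumed fingerprint shape: (left segment, gap length, right segment).
Fingerprint = Tuple[str, int, str]


def gapped_fingerprints(seq: str, seg_len: int = 3, max_gap: int = 2) -> Set[Fingerprint]:
    """Collect pairs of seg_len-mers separated by 0..max_gap skipped bases."""
    seq = seq.upper()
    prints: Set[Fingerprint] = set()
    for i in range(len(seq) - 2 * seg_len + 1):
        left = seq[i:i + seg_len]
        for gap in range(max_gap + 1):
            j = i + seg_len + gap  # start of the right segment
            if j + seg_len > len(seq):
                break
            prints.add((left, gap, seq[j:j + seg_len]))
    return prints


def class_fingerprints(members: Iterable[str], seg_len: int = 3, max_gap: int = 2) -> Set[Fingerprint]:
    """Union of the fingerprints of every known member of class C."""
    pooled: Set[Fingerprint] = set()
    for member in members:
        pooled |= gapped_fingerprints(member, seg_len, max_gap)
    return pooled


def match_score(seq: str, class_prints: Set[Fingerprint], seg_len: int = 3, max_gap: int = 2) -> float:
    """Fraction of the fingerprints of seq that also occur in the class."""
    own = gapped_fingerprints(seq, seg_len, max_gap)
    if not own:
        return 0.0  # sequence too short to yield any fingerprint
    return len(own & class_prints) / len(own)


def classify(seq: str, class_prints: Set[Fingerprint], threshold: float = 0.5) -> bool:
    """Accept seq as a member of C when its score reaches an assumed threshold."""
    return match_score(seq, class_prints) >= threshold
```

In this sketch the threshold would have to be tuned on labeled data; the abstract gives no values, and the consensus-sequence variation it mentions for splice junctions is not modeled here.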

Original language: English (US)
Pages (from-to): 209-218
Number of pages: 10
Journal: Journal of Computational Biology
Issue number: 2
State: Published - 1999


  • Algorithms
  • Consensus sequence
  • DNA sequence recognition
  • Pattern matching
  • Tools for computational biology

ASJC Scopus subject areas

  • Modeling and Simulation
  • Molecular Biology
  • Genetics
  • Computational Mathematics
  • Computational Theory and Mathematics


  • Cite this

    Wang, J. T. L., Rozen, S., Shapiro, B. A., Shasha, D., Wang, Z., & Yin, M. (1999). New techniques for DNA sequence classification. Journal of Computational Biology, 6(2), 209-218.