Discovering active motifs in sets of related protein sequences and using them for classification

Jason T.L. Wang, Thomas G. Marr, Dennis Shasha, Bruce A. Shapiro, Gung-Wei Chirn

Research output: Contribution to journal › Article › peer-review


We describe a method for discovering active motifs in a set of related protein sequences. The method is an automatic two-step process: (1) find candidate motifs in a small sample of the sequences; (2) test whether these motifs are approximately present in all the sequences. To reduce the running time, we develop two optimization heuristics based on statistical estimation and pattern-matching techniques. Experimental results obtained by running these algorithms on generated data and on functionally related proteins demonstrate the good performance of the presented method compared with the visual method of O'Farrell and Leopold. By combining the discovered motifs with an existing fingerprint technique, we develop a protein classifier. When we apply the classifier to the 698 groups of related proteins in the PROSITE catalog, it gives information that is complementary to the BLOCKS protein classifier of Henikoff and Henikoff. Thus, by using our classifier in conjunction with theirs, one can obtain high-confidence classifications (if BLOCKS and our classifier agree) or suggest a new hypothesis (if the two disagree).
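The two-step process described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It assumes that motifs are fixed-length contiguous substrings and that "approximately present" means within a small edit distance (insertions, deletions, substitutions). It also omits the statistical-estimation and pattern-matching optimizations the paper describes. All function names, the parameters `length`, `max_mutations`, `sample_size` and `min_occurrence`, and the toy sequences are hypothetical. `combine_predictions` only mirrors the agree/disagree rule for pairing a motif-based classifier with BLOCKS.

```python
import random
from collections import Counter


def edit_distance_substring(pattern: str, text: str) -> int:
    """Fewest insertions, deletions, or substitutions needed to turn `pattern`
    into some substring of `text` (Sellers-style dynamic programming)."""
    prev = [0] * (len(text) + 1)  # empty pattern matches anywhere at cost 0
    for i in range(1, len(pattern) + 1):
        cur = [i] + [0] * len(text)  # matching pattern[:i] against nothing costs i
        for j in range(1, len(text) + 1):
            cost = 0 if pattern[i - 1] == text[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,  # match or substitute
                         prev[j] + 1,         # pattern char with no text partner
                         cur[j - 1] + 1)      # extra text char inside the match
        prev = cur
    return min(prev)


def candidate_motifs(sample, length, min_occurrence):
    """Step 1 (simplified): fixed-length substrings found exactly in at least
    `min_occurrence` of the sampled sequences."""
    counts = Counter()
    for seq in sample:
        # A set, so each motif is counted at most once per sequence.
        counts.update({seq[i:i + length] for i in range(len(seq) - length + 1)})
    return [motif for motif, c in counts.items() if c >= min_occurrence]


def discover_motifs(sequences, length, max_mutations, sample_size, min_occurrence, seed=0):
    """Step 2 (simplified): keep candidates that occur in every sequence
    with at most `max_mutations` edits."""
    rng = random.Random(seed)
    sample = rng.sample(sequences, min(sample_size, len(sequences)))
    candidates = candidate_motifs(sample, length, min_occurrence)
    return sorted(
        motif for motif in candidates
        if all(edit_distance_substring(motif, s) <= max_mutations for s in sequences)
    )


def combine_predictions(ours, blocks):
    """Agreement yields a high-confidence label; disagreement flags a hypothesis."""
    if ours == blocks:
        return ("high confidence", ours)
    return ("new hypothesis", (ours, blocks))


if __name__ == "__main__":
    # Toy family: "GKST" occurs exactly in three sequences and as "GKTT" in one.
    seqs = ["MAGKSTLLV", "PQGKSTAW", "RRGKTTAAM", "GKSTPPLE"]
    print(discover_motifs(seqs, length=4, max_mutations=1, sample_size=3, min_occurrence=2))
```

Sampling only `sample_size` sequences keeps candidate generation cheap, while the check over the whole set in the second step is what enforces approximate presence in all sequences. Raising `max_mutations` admits more divergent family members but also lets more spurious motifs through.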

Original language: English (US)
Pages (from-to): 2769-2775
Number of pages: 7
Journal: Nucleic Acids Research
Issue number: 14
State: Published (Jul 25 1994)

ASJC Scopus subject areas

  • Genetics

