TY - JOUR
T1 - OrthologID
T2 - Automation of genome-scale ortholog identification within a parsimony framework
AU - Chiu, Joanna C.
AU - Lee, Ernest K.
AU - Egan, Mary G.
AU - Sarkar, Indra Neil
AU - Coruzzi, Gloria M.
AU - DeSalle, Rob
N1 - Funding Information:
We thank all the members of the New York Plant Genomics Consortium for discussion and invaluable suggestions in the development of OrthologID including D. W. Stevenson and E. D. Brenner (New York Botanical Garden), M. S. Katari and E. de la Torre (New York University), R. A. Martienssen and R. W. McCombie (Cold Spring Harbor Laboratory), and P. J. Planet (American Museum of Natural History). This study was supported by NSF Plant Genome Grant DBI-0421604 and NSF SGER Grant DBI-0326436 to G.M.C. and R.D. of the New York Plant Genomics Consortium, and by the Lewis B. and Dorothy Cullman Program for Molecular Systematics at the American Museum of Natural History.
PY - 2006/3/15
Y1 - 2006/3/15
N2 - Motivation: The determination of gene orthology is a prerequisite for mining and utilizing the rapidly increasing amount of sequence data for genome-scale phylogenetics and comparative genomic studies. Until now, most researchers have used pairwise distance comparison algorithms, such as BLAST, COG, RBH, RSD and INPARANOID, to determine gene orthology. In contrast, orthology determination within a character-based phylogenetic framework has not been utilized on a genomic scale owing to the lack of efficiency and automation. Results: We have developed OrthologID, a Web application that automates the labor-intensive procedures of gene orthology determination within a character-based phylogenetic framework, thus making character-based orthology determination on a genomic scale possible. In addition to generating gene family trees and determining orthologous gene sets for complete genomes, OrthologID can also identify diagnostic characters that define each orthologous gene set, as well as diagnostic characters that are responsible for classifying query sequences from other genomes into specific orthology groups. The OrthologID database currently includes several complete plant genomes, including Arabidopsis thaliana, Oryza sativa and Populus trichocarpa, as well as a unicellular outgroup, Chlamydomonas reinhardtii. To improve the general utility of OrthologID beyond plant species, we plan to expand our sequence database to include the fully sequenced genomes of prokaryotes and other non-plant eukaryotes.
AB - Motivation: The determination of gene orthology is a prerequisite for mining and utilizing the rapidly increasing amount of sequence data for genome-scale phylogenetics and comparative genomic studies. Until now, most researchers have used pairwise distance comparison algorithms, such as BLAST, COG, RBH, RSD and INPARANOID, to determine gene orthology. In contrast, orthology determination within a character-based phylogenetic framework has not been utilized on a genomic scale owing to the lack of efficiency and automation. Results: We have developed OrthologID, a Web application that automates the labor-intensive procedures of gene orthology determination within a character-based phylogenetic framework, thus making character-based orthology determination on a genomic scale possible. In addition to generating gene family trees and determining orthologous gene sets for complete genomes, OrthologID can also identify diagnostic characters that define each orthologous gene set, as well as diagnostic characters that are responsible for classifying query sequences from other genomes into specific orthology groups. The OrthologID database currently includes several complete plant genomes, including Arabidopsis thaliana, Oryza sativa and Populus trichocarpa, as well as a unicellular outgroup, Chlamydomonas reinhardtii. To improve the general utility of OrthologID beyond plant species, we plan to expand our sequence database to include the fully sequenced genomes of prokaryotes and other non-plant eukaryotes.
UR - http://www.scopus.com/inward/record.url?scp=33645106540&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33645106540&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btk040
DO - 10.1093/bioinformatics/btk040
M3 - Article
C2 - 16410324
AN - SCOPUS:33645106540
SN - 1367-4803
VL - 22
SP - 699
EP - 707
JO - Bioinformatics
JF - Bioinformatics
IS - 6
ER -