TY - JOUR
T1 - Homology search for genes
AU - Cui, Xuefeng
AU - Vinař, Tomáš
AU - Brejová, Broňa
AU - Shasha, Dennis
AU - Li, Ming
N1 - Funding Information:
We would like to thank Daniel G. Brown and Brendan McConkey for insightful comments. Research of X.C. and M.L. is supported by NSERC, CITO, and Canada Research Chair program. Part of the work of T.V. and B.B. was done while at the University of Waterloo and was supported by NSERC, CITO and Canada Research Chair program. Research of T.V. at Cornell is supported by NSF grant DBI-0644111 and NSF/NIGMS grant DMS-0201037. Research of B.B. at Cornell is supported by NIH/NCI (subcontract 22XS013A). Research of D.S. is supported by NSF grants DBI-044566, N2010 IOB-0519985, N2010 DBI-0519984, DBI-0421604 and MCB-0209754. These supports are greatly appreciated.
PY - 2007/7/1
Y1 - 2007/7/1
N2 - Motivation: Life science researchers often require an exhaustive list of protein coding genes similar to a given query gene. To find such genes, homology search tools, such as BLAST or PatternHunter, return a set of high-scoring pairs (HSPs). These HSPs then need to be correlated with existing sequence annotations, or assembled manually into putative gene structures. This process is error-prone and labor-intensive, especially in genomes without reliable gene annotation. Results: We have developed a homology search solution that automates this process, and instead of HSPs returns complete gene structures. We achieve better sensitivity and specificity by adapting a hidden Markov model for gene finding to reflect features of the query gene. Compared to traditional homology search, our novel approach identifies splice sites much more reliably and can even locate exons that were lost in the query gene. On a testing set of 400 mouse query genes, we report 79% exon sensitivity and 80% exon specificity in the human genome based on orthologous genes annotated in NCBI HomoloGene. In the same set, we also found 50 (12%) gene structures with better protein alignment scores than the ones identified in HomoloGene.
UR - http://www.scopus.com/inward/record.url?scp=34547852257&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34547852257&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btm225
DO - 10.1093/bioinformatics/btm225
M3 - Article
C2 - 17646351
AN - SCOPUS:34547852257
SN - 1367-4803
VL - 23
SP - i97-i103
JO - Bioinformatics
JF - Bioinformatics
IS - 13
ER -