TY - GEN
T1 - Fast and cheap genome wide haplotype construction via optical mapping
AU - Anantharaman, T. S.
AU - Mysoref, V.
AU - Mishra, B.
PY - 2005
Y1 - 2005
N2 - We describe an efficient algorithm to construct genome wide haplotype restriction maps of an individual by aligning single molecule DNA fragments collected with Optical Mapping technology. Using this algorithm and small amount of genomic material, we can construct the parental haplotypes for each diploid chromosome for any individual. Since such haplotype maps reveal the polymorphisms due to single nucleotide differences (SNPs) and small insertions and deletions (RFLPs), they are useful in association studies, studies involving genomic instabilities in cancer, and genetics, and yet incur relatively low cost and provide high throughput. If the underlying problem is formulated as a combinatorial optimization problem, it can be shown to be NP-complete (a special case of /("-population problem). But by effectively exploiting the structure of the underlying error processes and using a novel analog of the Baum-Welch algorithm for HMM models, we devise a probabilistic algorithm with a time complexity that is linear in the number of markers for an e-approximate solution. The algorithms were tested by constructing the first genome wide haplotype restriction map of the microbe T. pseudoana, as well as constructing a haplotype restriction map of a 120 Mb region of Human chromosome 4. The frequency of false positives and false negatives was estimated using simulated data. The empirical results were found very promising.
AB - We describe an efficient algorithm to construct genome wide haplotype restriction maps of an individual by aligning single molecule DNA fragments collected with Optical Mapping technology. Using this algorithm and small amount of genomic material, we can construct the parental haplotypes for each diploid chromosome for any individual. Since such haplotype maps reveal the polymorphisms due to single nucleotide differences (SNPs) and small insertions and deletions (RFLPs), they are useful in association studies, studies involving genomic instabilities in cancer, and genetics, and yet incur relatively low cost and provide high throughput. If the underlying problem is formulated as a combinatorial optimization problem, it can be shown to be NP-complete (a special case of /("-population problem). But by effectively exploiting the structure of the underlying error processes and using a novel analog of the Baum-Welch algorithm for HMM models, we devise a probabilistic algorithm with a time complexity that is linear in the number of markers for an e-approximate solution. The algorithms were tested by constructing the first genome wide haplotype restriction map of the microbe T. pseudoana, as well as constructing a haplotype restriction map of a 120 Mb region of Human chromosome 4. The frequency of false positives and false negatives was estimated using simulated data. The empirical results were found very promising.
UR - http://www.scopus.com/inward/record.url?scp=15944371003&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=15944371003&partnerID=8YFLogxK
M3 - Conference contribution
C2 - 15759644
AN - SCOPUS:15944371003
SN - 9812560467
SN - 9789812560469
T3 - Proceedings of the Pacific Symposium on Biocomputing 2005, PSB 2005
SP - 385
EP - 396
BT - Proceedings of the Pacific Symposium on Biocomputing 2005, PSB 2005
T2 - 10th Pacific Symposium on Biocomputing, PSB 2005
Y2 - 4 January 2005 through 8 January 2005
ER -