TY - GEN
T1 - Efficient algorithms for inverting evolution
AU - Farach, Martin
AU - Kannan, Sampath
N1 - Publisher Copyright:
© 1996 ACM.
PY - 1996/7/1
Y1 - 1996/7/1
N2 - Evolution is a stochastic process which operates on the DNA of species. The evolutionary process leaves tell-tale signs in the DNA which can be used to construct phylogenies, or evolutionary trees, for a set of species. Maximum Likelihood Estimations (MLE) methods seek the evolutionary tree which is most likely to have produced the DNA under consideration. While these methods are widely accepted and intellectually satisfying, they have been computationally intractable. In this paper, we address the intractability of MLE methods as follows. We introduce a metric on stochastic process models of evolution. We show that this metric is meaningful by proving that in order for any algorithm to distinguish between two stochatic models that are close according to this metric, it needs to be given a lot of observations. We complement this result with a simple and efficient algorithm for inverting the stochastic process of evolution, that is, for building the tree from observations on the DNA of the species. Our result can be viewed as a result on the PAC-learnability of the class of distributions produced by tree-like processes. Though there have been many heuristics suggested for this problem, our algorithm is the first one with a guaranteed convergence rate, and further, this rate is within a polynomial of the lower-bound rate we establish. Ours is also the the first polynomial-time algorithm which is guaranteed to converge at all to the correct tree.
AB - Evolution is a stochastic process which operates on the DNA of species. The evolutionary process leaves tell-tale signs in the DNA which can be used to construct phylogenies, or evolutionary trees, for a set of species. Maximum Likelihood Estimations (MLE) methods seek the evolutionary tree which is most likely to have produced the DNA under consideration. While these methods are widely accepted and intellectually satisfying, they have been computationally intractable. In this paper, we address the intractability of MLE methods as follows. We introduce a metric on stochastic process models of evolution. We show that this metric is meaningful by proving that in order for any algorithm to distinguish between two stochatic models that are close according to this metric, it needs to be given a lot of observations. We complement this result with a simple and efficient algorithm for inverting the stochastic process of evolution, that is, for building the tree from observations on the DNA of the species. Our result can be viewed as a result on the PAC-learnability of the class of distributions produced by tree-like processes. Though there have been many heuristics suggested for this problem, our algorithm is the first one with a guaranteed convergence rate, and further, this rate is within a polynomial of the lower-bound rate we establish. Ours is also the the first polynomial-time algorithm which is guaranteed to converge at all to the correct tree.
UR - http://www.scopus.com/inward/record.url?scp=0029714314&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0029714314&partnerID=8YFLogxK
U2 - 10.1145/237814.237868
DO - 10.1145/237814.237868
M3 - Conference contribution
AN - SCOPUS:0029714314
T3 - Proceedings of the Annual ACM Symposium on Theory of Computing
SP - 230
EP - 236
BT - Proceedings of the 28th Annual ACM Symposium on Theory of Computing, STOC 1996
PB - Association for Computing Machinery
T2 - 28th Annual ACM Symposium on Theory of Computing, STOC 1996
Y2 - 22 May 1996 through 24 May 1996
ER -