TY - GEN

T1 - Efficient algorithms for inverting evolution

AU - Farach, Martin

AU - Kannan, Sampath

N1 - Publisher Copyright:
© 1996 ACM.

PY - 1996/7/1

Y1 - 1996/7/1

N2 - Evolution is a stochastic process which operates on the DNA of species. The evolutionary process leaves tell-tale signs in the DNA which can be used to construct phylogenies, or evolutionary trees, for a set of species. Maximum Likelihood Estimations (MLE) methods seek the evolutionary tree which is most likely to have produced the DNA under consideration. While these methods are widely accepted and intellectually satisfying, they have been computationally intractable. In this paper, we address the intractability of MLE methods as follows. We introduce a metric on stochastic process models of evolution. We show that this metric is meaningful by proving that in order for any algorithm to distinguish between two stochatic models that are close according to this metric, it needs to be given a lot of observations. We complement this result with a simple and efficient algorithm for inverting the stochastic process of evolution, that is, for building the tree from observations on the DNA of the species. Our result can be viewed as a result on the PAC-learnability of the class of distributions produced by tree-like processes. Though there have been many heuristics suggested for this problem, our algorithm is the first one with a guaranteed convergence rate, and further, this rate is within a polynomial of the lower-bound rate we establish. Ours is also the the first polynomial-time algorithm which is guaranteed to converge at all to the correct tree.

AB - Evolution is a stochastic process which operates on the DNA of species. The evolutionary process leaves tell-tale signs in the DNA which can be used to construct phylogenies, or evolutionary trees, for a set of species. Maximum Likelihood Estimations (MLE) methods seek the evolutionary tree which is most likely to have produced the DNA under consideration. While these methods are widely accepted and intellectually satisfying, they have been computationally intractable. In this paper, we address the intractability of MLE methods as follows. We introduce a metric on stochastic process models of evolution. We show that this metric is meaningful by proving that in order for any algorithm to distinguish between two stochatic models that are close according to this metric, it needs to be given a lot of observations. We complement this result with a simple and efficient algorithm for inverting the stochastic process of evolution, that is, for building the tree from observations on the DNA of the species. Our result can be viewed as a result on the PAC-learnability of the class of distributions produced by tree-like processes. Though there have been many heuristics suggested for this problem, our algorithm is the first one with a guaranteed convergence rate, and further, this rate is within a polynomial of the lower-bound rate we establish. Ours is also the the first polynomial-time algorithm which is guaranteed to converge at all to the correct tree.

UR - http://www.scopus.com/inward/record.url?scp=0029714314&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0029714314&partnerID=8YFLogxK

U2 - 10.1145/237814.237868

DO - 10.1145/237814.237868

M3 - Conference contribution

AN - SCOPUS:0029714314

T3 - Proceedings of the Annual ACM Symposium on Theory of Computing

SP - 230

EP - 236

BT - Proceedings of the 28th Annual ACM Symposium on Theory of Computing, STOC 1996

PB - Association for Computing Machinery

T2 - 28th Annual ACM Symposium on Theory of Computing, STOC 1996

Y2 - 22 May 1996 through 24 May 1996

ER -