TY - GEN
T1 - On the entropy of DNA
T2 - 6th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 1995
AU - Farach, Martin
AU - Noordewier, Michiel
AU - Savari, Serap
AU - Shepp, Larry
AU - Wyner, Abraham
AU - Ziv, Jacob
PY - 1995/1/22
Y1 - 1995/1/22
N2 - We have applied the information theoretic notion of entropy to characterize DNA sequences. We consider a genetic sequence signal that is too small for asymptotic entropy estimates to be accurate, and for which similar approaches have previously failed. We prove that the match length entropy estimator has a relatively fast convergence rate and demonstrate experimentally that by using this entropy estimator, we can indeed extract a meaningful signal from segments of DNA. Further, we derive a method for detecting certain signals within DNA - known as splice junctions - with significantly better performance than previously known methods. The main result of this paper is that we find that the entropy of genetic material which is ultimately expressed in protein sequences is higher than that which is discarded. This is an unexpected result, since current biological theory holds that the discarded sequences ("introns") are capable of tolerating random changes to a greater degree than the retained sequences ("exons").
AB - We have applied the information theoretic notion of entropy to characterize DNA sequences. We consider a genetic sequence signal that is too small for asymptotic entropy estimates to be accurate, and for which similar approaches have previously failed. We prove that the match length entropy estimator has a relatively fast convergence rate and demonstrate experimentally that by using this entropy estimator, we can indeed extract a meaningful signal from segments of DNA. Further, we derive a method for detecting certain signals within DNA - known as splice junctions - with significantly better performance than previously known methods. The main result of this paper is that we find that the entropy of genetic material which is ultimately expressed in protein sequences is higher than that which is discarded. This is an unexpected result, since current biological theory holds that the discarded sequences ("introns") are capable of tolerating random changes to a greater degree than the retained sequences ("exons").
UR - http://www.scopus.com/inward/record.url?scp=84994364597&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84994364597&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84994364597
T3 - Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms
SP - 48
EP - 57
BT - Proceedings of the 6th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 1995
PB - Association for Computing Machinery
Y2 - 22 January 1995 through 24 January 1995
ER -