Abstract
Large English texts on ten different subject matters were compiled. Estimates were obtained of the n-gram probability distributions and the word length for each of the texts, as well as for English as a whole. Experiments were conducted to test for pairwise differences among the ten texts. Principal component analysis and hierarchical clustering analysis were applied to the data to discover similarities and dissimilarities among the texts. Estimates were obtained of the first-, second-, and third-order entropies for each text, and the texts were tested for pairwise differences according to their first-order entropy estimates. The results are of interest to researchers in psychology, biology, anthropology, computational linguistics, and pattern recognition.
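As a rough illustration of the kind of estimate the abstract describes, the following minimal sketch computes relative-frequency n-gram probabilities over characters and derives entropy estimates from them. The abstract does not state the estimator or the exact definition of "n-th order entropy" used in the paper; this sketch assumes Shannon's convention, in which the n-th order entropy is the entropy of a character conditioned on the preceding n - 1 characters. All function names are hypothetical.

```python
from collections import Counter
from math import log2


def ngram_probabilities(text, n):
    """Relative-frequency estimate of character n-gram probabilities."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def block_entropy(text, n):
    """Shannon entropy, in bits, of the estimated n-gram distribution."""
    if n == 0:
        return 0.0  # the empty context carries no information
    probs = ngram_probabilities(text, n)
    return -sum(p * log2(p) for p in probs.values())


def nth_order_entropy(text, n):
    """Assumed definition: entropy of a character given the previous n - 1
    characters, computed as H(n-grams) - H((n-1)-grams)."""
    return block_entropy(text, n) - block_entropy(text, n - 1)


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog"
    for order in (1, 2, 3):
        print(order, nth_order_entropy(sample, order))
```

Estimates of this kind are biased on short samples, which is one reason large texts are needed for the higher orders.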
Original language | English (US) |
---|---|
Pages | 164-117 |
Number of pages | 48 |
State | Published - 1978 |
Event | Proc IEEE Comput Soc Conf Pattern Recognition Image Process - Chicago, IL, USA; Duration: May 31 1978 → Jun 2 1978 |
Other
Other | Proc IEEE Comput Soc Conf Pattern Recognition Image Process |
---|---|
City | Chicago, IL, USA |
Period | 5/31/78 → 6/2/78 |
ASJC Scopus subject areas
- General Engineering