## Abstract

We consider the classic problem of estimating T, the total number of species in a population, from repeated counts in a simple random sample. We first show that the frequently used Chao-Lee estimator can in fact be obtained by Bayesian methods with a Dirichlet prior, and then use this insight to develop a new estimator. Numerical tests and some real experiments show that the new estimator is more flexible than existing ones, in the sense that it adapts to changes in the normalized interspecies variance γ^{2}. Our method involves simultaneous estimation of T, of γ^{2}, and of the parameter λ in the Dirichlet prior. The only limitation seems to come from the required convergence of the prior, which imposes the restriction γ^{2} ≤ 1. We also obtain confidence intervals for T and an estimate of the species distribution. Some numerical examples are given, together with applications to sampling from a Census database that closely follows Benford's law; these show good performance of the new estimator, even beyond γ^{2} = 1. Tests on confidence intervals show that the coverage frequency appears to be in good agreement with the desired confidence level.

AMS (2000) subject classification. Primary 62G05; Secondary 62P10, 62P30, 62P35.
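For orientation, the baseline that the abstract refers to can be sketched in a few lines. The code below is a minimal illustration of the estimator in the form commonly attributed to Chao and Lee (1992): a Good-Turing coverage estimate, a homogeneous estimate of T, and a correction driven by an estimate of γ^{2}. It is not the new estimator proposed in the paper, whose details are not given in the abstract; the function name `chao_lee_estimate` and the input format (a list of species labels) are assumptions made for illustration.

```python
from collections import Counter


def chao_lee_estimate(sample):
    """Sketch of a Chao-Lee style estimate of the number of species T.

    `sample` is a list of species labels, one per sampled individual
    (a hypothetical input format). Returns (t_hat, gamma2_hat, coverage_hat).
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")

    counts = Counter(sample)          # species -> times observed
    d = len(counts)                   # number of distinct species observed
    freq = Counter(counts.values())   # i -> f_i, species observed exactly i times
    f1 = freq.get(1, 0)               # number of singletons

    # Good-Turing estimate of the sample coverage.
    coverage = 1.0 - f1 / n
    if coverage <= 0.0:
        raise ValueError("every observation is a singleton; coverage estimate is zero")

    # Estimate of T under equal species abundances (gamma^2 = 0).
    t_homog = d / coverage

    # Estimate of the normalized interspecies variance, truncated at zero.
    s = sum(i * (i - 1) * f for i, f in freq.items())
    gamma2 = max(t_homog * s / (n * (n - 1)) - 1.0, 0.0)

    # Heterogeneity-corrected estimate of T.
    t_hat = t_homog + n * (1.0 - coverage) / coverage * gamma2
    return t_hat, gamma2, coverage
```

When the estimated γ^{2} is zero, the result reduces to the homogeneous estimate, the number of observed species divided by the estimated coverage; larger estimated heterogeneity pushes the estimate of T upward.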

Original language | English (US) |
---|---|
Pages (from-to) | 80-100 |
Number of pages | 21 |
Journal | Sankhya: The Indian Journal of Statistics |
Volume | 74 |
Issue number | 1 A |
State | Published - Dec 1 2012 |

## Keywords

- Bayesian posterior
- Confidence interval
- Dirichlet prior
- Point estimator
- Simple random sample
- Unobserved probability
- Unobserved species

## ASJC Scopus subject areas

- Statistics, Probability and Uncertainty
- Statistics and Probability