We refine the general methodology in  for the construction and analysis of essentially minimax estimators for a wide class of functionals of finite dimensional parameters, and elaborate on the case of discrete distributions with support size S comparable with the number of observations n. Specifically, we determine the 'smooth' and 'non-smooth' regimes based on the confidence set and the smoothness of the functional. In the 'non-smooth' regime, we apply an unbiased estimator for a 'suitable' polynomial approximation of the functional. In the 'smooth' regime, we construct a bias corrected version of the Maximum Likelihood Estimator (MLE) based on Taylor expansion. We apply the general methodology to the problem of estimating the KL divergence between two discrete distributions from empirical data. We construct a minimax rate-optimal estimator which is adaptive in the sense that it does not require the knowledge of the support size nor the upper bound on the likelihood ratio. Moreover, the performance of the optimal estimator with n samples is essentially that of the MLE with n ln n samples, i.e., the effective sample size enlargement phenomenon holds.