We propose a general methodology for the construction and analysis of minimax estimators for functionals of discrete distributions, where the support size S is unknown and may be comparable to the number of observations n. We illustrate the merit of our approach with a thorough non-asymptotic analysis of the performance of the resulting schemes for estimating two important information measures: the entropy H(P) = −∑_{i=1}^{S} p_i ln p_i and F_α(P) = ∑_{i=1}^{S} p_i^α, α > 0. We obtain the minimax L2 risks for estimating these functionals up to a universal constant. In particular, we demonstrate that our estimator achieves the optimal sample complexity n ≫ S/ln S for entropy estimation. We also demonstrate that the sample complexity for estimating F_α(P), 0 < α < 1, is n ≫ S^{1/α}/ln S, which is achieved by our estimator but not by the popular plug-in Maximum Likelihood Estimator (MLE). For 1 < α < 3/2, we show that the minimax L2 rate for estimating F_α(P) is (n ln n)^{−2(α−1)} regardless of the support size, while the exact L2 rate for the MLE is n^{−2(α−1)}. In all of the above cases, the behavior of the minimax rate-optimal estimators with n samples is essentially that of the MLE with n ln n samples. Finally, we highlight the practical advantages of our schemes for the estimation of entropy and mutual information.
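As a point of reference for the comparisons above, the plug-in MLE simply evaluates each functional at the empirical distribution. A minimal sketch (function names are ours, for illustration only; this is the baseline the abstract compares against, not the proposed minimax estimator):

```python
import math
from collections import Counter

def plugin_entropy(samples):
    """Plug-in MLE for entropy: H(P_hat) = -sum_i p_hat_i * ln(p_hat_i),
    where p_hat is the empirical distribution of the samples."""
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in Counter(samples).values())

def plugin_power_sum(samples, alpha):
    """Plug-in MLE for F_alpha(P) = sum_i p_hat_i ** alpha."""
    n = len(samples)
    return sum((c / n) ** alpha for c in Counter(samples).values())
```

For example, on samples that are exactly uniform over S = 4 symbols, plugin_entropy returns ln 4, and plugin_power_sum with α = 2 returns 1/4. The abstract's point is that when n is not much larger than S, such plug-in estimates are biased, and the minimax-optimal schemes behave as if they had n ln n samples instead.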