Abstract
Boosting combines weak learners into a predictor with low empirical risk. Its dual constructs a high-entropy distribution upon which weak learners and training labels are uncorrelated. This manuscript studies this primal-dual relationship under a broad family of losses, including the exponential loss of AdaBoost and the logistic loss, revealing:

- Weak learnability aids the whole loss family: for any ε > 0, O(ln(1/ε)) iterations suffice to produce a predictor with empirical risk ε-close to the infimum;
- The circumstances granting the existence of an empirical risk minimizer may be characterized in terms of the primal and dual problems, yielding a new proof of the known rate O(ln(1/ε));
- Arbitrary instances may be decomposed into the above two, granting rate O(1/ε), with a matching lower bound provided for the logistic loss.
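To make the primal side of this relationship concrete, the sketch below casts boosting as greedy coordinate descent on the exponential loss over weights on a finite pool of weak hypotheses. This is an illustrative reading, not code from the paper: the matrix `H` of ±1-valued weak-hypothesis outputs, the round count `T`, and the closed-form line search are all assumptions made for the example.

```python
# A minimal sketch (assumptions ours, not the paper's) of boosting as greedy
# coordinate descent on the exponential loss L(w) = mean(exp(-y * (H @ w))).
import numpy as np

def boost_exp_loss(H, y, T=100):
    """H : (n, d) array; column j is weak hypothesis j, valued in {-1, +1}.
    y : (n,) array of labels in {-1, +1}.
    Returns a weight vector w over the d weak hypotheses after T rounds."""
    n, d = H.shape
    w = np.zeros(d)
    for _ in range(T):
        q = np.exp(-y * (H @ w))        # unnormalized weights from the margins
        p = q / q.sum()                 # dual distribution over the n examples
        corr = p @ (y[:, None] * H)     # edge of each weak hypothesis under p
        j = np.argmax(np.abs(corr))     # coordinate with the largest edge
        # Exact line search along coordinate j (valid for +/-1-valued columns):
        eps_plus = p[(y * H[:, j]) > 0].sum()   # weight of correctly classified
        eps_minus = 1.0 - eps_plus
        if eps_plus == 0 or eps_minus == 0:     # perfect (anti)correlation:
            break                               # the optimal step is unbounded
        w[j] += 0.5 * np.log(eps_plus / eps_minus)
    return w
```

The vector `p` recomputed each round is the dual variable of the abstract: as descent proceeds it approaches a distribution under which every edge `corr[j]` vanishes, i.e. weak learners and labels become uncorrelated. Under weak learnability the edges instead stay bounded away from zero, which is the mechanism behind the O(ln(1/ε)) rate stated above.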
| Original language | English (US) |
| --- | --- |
| Pages (from-to) | 561-606 |
| Number of pages | 46 |
| Journal | Journal of Machine Learning Research |
| Volume | 13 |
| State | Published - Mar 2012 |
Keywords
- Boosting
- Convex analysis
- Coordinate descent
- Maximum entropy
- Weak learnability
ASJC Scopus subject areas
- Software
- Control and Systems Engineering
- Statistics and Probability
- Artificial Intelligence