TY - JOUR

T1 - Convex risk minimization and conditional probability estimation

AU - Telgarsky, Matus

AU - Dudík, Miroslav

AU - Schapire, Robert

N1 - Publisher Copyright:
© 2015 M. Telgarsky, M. Dudík & R. Schapire.

PY - 2015

Y1 - 2015

AB - This paper proves, in very general settings, that convex risk minimization is a procedure to select a unique conditional probability model determined by the classification problem. Unlike most previous work, we give results that are general enough to include cases in which no minimum exists, as occurs typically, for instance, with standard boosting algorithms. Concretely, we first show that any sequence of predictors minimizing convex risk over the source distribution will converge to this unique model when the class of predictors is linear (but potentially of infinite dimension). Secondly, we show the same result holds for empirical risk minimization whenever this class of predictors is finite dimensional, where the essential technical contribution is a norm-free generalization bound.

KW - Classification

KW - Conditional Probability Estimation

KW - Consistency

KW - Convex Duality

KW - Maximum Entropy

KW - Orlicz Spaces

UR - http://www.scopus.com/inward/record.url?scp=84984671856&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84984671856&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:84984671856

SN - 1532-4435

VL - 40

JO - Journal of Machine Learning Research

JF - Journal of Machine Learning Research

IS - 2015

T2 - 28th Conference on Learning Theory, COLT 2015

Y2 - 2 July 2015 through 6 July 2015

ER -