TY - GEN
T1 - Learning efficiently with approximate inference via dual losses
AU - Meshi, Ofer
AU - Sontag, David
AU - Jaakkola, Tommi
AU - Globerson, Amir
PY - 2010
Y1 - 2010
AB - Many structured prediction tasks involve complex models where inference is computationally intractable, but where it can be well approximated using a linear programming relaxation. Previous approaches to learning for structured prediction (e.g., cutting-plane, subgradient methods, perceptron) repeatedly make predictions for some of the data points. These approaches are computationally demanding because each prediction involves solving a linear program to optimality. We present a scalable algorithm for learning for structured prediction. The main idea is to instead solve the dual of the structured prediction loss. We formulate the learning task as a convex minimization over both the weights and the dual variables corresponding to each data point. As a result, we can begin to optimize the weights even before completely solving any of the individual prediction problems. We show how the dual variables can be efficiently optimized using coordinate descent. Our algorithm is competitive with state-of-the-art methods such as stochastic subgradient and cutting-plane.
UR - http://www.scopus.com/inward/record.url?scp=77956556288&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77956556288&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:77956556288
SN - 9781605589077
T3 - ICML 2010 - Proceedings, 27th International Conference on Machine Learning
SP - 783
EP - 790
BT - ICML 2010 - Proceedings, 27th International Conference on Machine Learning
T2 - 27th International Conference on Machine Learning, ICML 2010
Y2 - 21 June 2010 through 25 June 2010
ER -