TY - GEN
T1 - Semantic road segmentation via multi-scale ensembles of learned features
AU - Alvarez, Jose M.
AU - LeCun, Yann
AU - Gevers, Theo
AU - Lopez, Antonio M.
PY - 2012
Y1 - 2012
N2 - Semantic segmentation refers to the process of assigning an object label (e.g., building, road, sidewalk, car, pedestrian) to every pixel in an image. Common approaches formulate the task as a random field labeling problem modeling the interactions between labels by combining local and contextual features such as color, depth, edges, SIFT or HoG. These models are trained to maximize the likelihood of the correct classification given a training set. However, these approaches rely on hand-designed features (e.g., texture, SIFT or HoG) and a higher computational time required in the inference process. Therefore, in this paper, we focus on estimating the unary potentials of a conditional random field via ensembles of learned features. We propose an algorithm based on convolutional neural networks to learn local features from training data at different scales and resolutions. Then, diversification between these features is exploited using a weighted linear combination. Experiments on a publicly available database show the effectiveness of the proposed method to perform semantic road scene segmentation in still images. The algorithm outperforms appearance based methods and its performance is similar compared to state-of-the-art methods using other sources of information such as depth, motion or stereo.
AB - Semantic segmentation refers to the process of assigning an object label (e.g., building, road, sidewalk, car, pedestrian) to every pixel in an image. Common approaches formulate the task as a random field labeling problem modeling the interactions between labels by combining local and contextual features such as color, depth, edges, SIFT or HoG. These models are trained to maximize the likelihood of the correct classification given a training set. However, these approaches rely on hand-designed features (e.g., texture, SIFT or HoG) and a higher computational time required in the inference process. Therefore, in this paper, we focus on estimating the unary potentials of a conditional random field via ensembles of learned features. We propose an algorithm based on convolutional neural networks to learn local features from training data at different scales and resolutions. Then, diversification between these features is exploited using a weighted linear combination. Experiments on a publicly available database show the effectiveness of the proposed method to perform semantic road scene segmentation in still images. The algorithm outperforms appearance based methods and its performance is similar compared to state-of-the-art methods using other sources of information such as depth, motion or stereo.
UR - http://www.scopus.com/inward/record.url?scp=84867692186&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84867692186&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-33868-7_58
DO - 10.1007/978-3-642-33868-7_58
M3 - Conference contribution
AN - SCOPUS:84867692186
SN - 9783642338670
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 586
EP - 595
BT - Computer Vision, ECCV 2012 - Workshops and Demonstrations, Proceedings
PB - Springer Verlag
T2 - Computer Vision, ECCV 2012 - Workshops and Demonstrations, Proceedings
Y2 - 7 October 2012 through 13 October 2012
ER -