TY - GEN
T1 - Improving image classification with location context
AU - Tang, Kevin
AU - Paluri, Manohar
AU - Fei-Fei, Li
AU - Fergus, Rob
AU - Bourdev, Lubomir
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/2/17
Y1 - 2015/2/17
N2 - With the widespread availability of cellphones and cameras that have GPS capabilities, it is common for images being uploaded to the Internet today to have GPS coordinates associated with them. In addition to research that tries to predict GPS coordinates from visual features, this also opens up the door to problems that are conditioned on the availability of GPS coordinates. In this work, we tackle the problem of performing image classification with location context, in which we are given the GPS coordinates for images in both the train and test phases. We explore different ways of encoding and extracting features from the GPS coordinates, and show how to naturally incorporate these features into a Convolutional Neural Network (CNN), the current state-of-the-art for most image classification and recognition problems. We also show how it is possible to simultaneously learn the optimal pooling radii for a subset of our features within the CNN framework. To evaluate our model and to help promote research in this area, we identify a set of location-sensitive concepts and annotate a subset of the Yahoo Flickr Creative Commons 100M dataset that has GPS coordinates with these concepts, which we make publicly available. By leveraging location context, we are able to achieve almost a 7% gain in mean average precision.
AB - With the widespread availability of cellphones and cameras that have GPS capabilities, it is common for images being uploaded to the Internet today to have GPS coordinates associated with them. In addition to research that tries to predict GPS coordinates from visual features, this also opens up the door to problems that are conditioned on the availability of GPS coordinates. In this work, we tackle the problem of performing image classification with location context, in which we are given the GPS coordinates for images in both the train and test phases. We explore different ways of encoding and extracting features from the GPS coordinates, and show how to naturally incorporate these features into a Convolutional Neural Network (CNN), the current state-of-the-art for most image classification and recognition problems. We also show how it is possible to simultaneously learn the optimal pooling radii for a subset of our features within the CNN framework. To evaluate our model and to help promote research in this area, we identify a set of location-sensitive concepts and annotate a subset of the Yahoo Flickr Creative Commons 100M dataset that has GPS coordinates with these concepts, which we make publicly available. By leveraging location context, we are able to achieve almost a 7% gain in mean average precision.
UR - http://www.scopus.com/inward/record.url?scp=84973861940&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84973861940&partnerID=8YFLogxK
U2 - 10.1109/ICCV.2015.121
DO - 10.1109/ICCV.2015.121
M3 - Conference contribution
AN - SCOPUS:84973861940
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 1008
EP - 1016
BT - 2015 International Conference on Computer Vision, ICCV 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 15th IEEE International Conference on Computer Vision, ICCV 2015
Y2 - 11 December 2015 through 18 December 2015
ER -