TY - GEN
T1 - Cutting through the clutter
T2 - IEEE Winter Conference on Applications of Computer Vision, WACV 2016
AU - Girdhar, Rohit
AU - Fouhey, David F.
AU - Kitani, Kris M.
AU - Gupta, Abhinav
AU - Hebert, Martial
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/5/23
Y1 - 2016/5/23
N2 - Where do we focus our attention in an image? Humans have an amazing ability to cut through the clutter to the parts of an image most relevant to the task at hand. Consider the task of geo-localizing tourist photos by retrieving other images taken at that location. Such photos naturally contain friends and family, and might even be nearly filled by a person's face if it is a selfie. Humans have no trouble ignoring these 'distractions' and recognizing the parts that are indicative of location (e.g., the towers of Neuschwanstein Castle instead of their friend's face, a tree, or a car). In this paper, we investigate learning this ability automatically. At training time, we learn how informative a region is for localization. At test time, we use this learned model to determine what parts of a query image to use for retrieval. We introduce a new dataset, People at Landmarks, that contains large amounts of clutter in query images. Our system outperforms the existing state-of-the-art retrieval approach by more than 10% mAP, and also improves results on a standard dataset without heavy occluders (Oxford5K).
AB - Where do we focus our attention in an image? Humans have an amazing ability to cut through the clutter to the parts of an image most relevant to the task at hand. Consider the task of geo-localizing tourist photos by retrieving other images taken at that location. Such photos naturally contain friends and family, and might even be nearly filled by a person's face if it is a selfie. Humans have no trouble ignoring these 'distractions' and recognizing the parts that are indicative of location (e.g., the towers of Neuschwanstein Castle instead of their friend's face, a tree, or a car). In this paper, we investigate learning this ability automatically. At training time, we learn how informative a region is for localization. At test time, we use this learned model to determine what parts of a query image to use for retrieval. We introduce a new dataset, People at Landmarks, that contains large amounts of clutter in query images. Our system outperforms the existing state-of-the-art retrieval approach by more than 10% mAP, and also improves results on a standard dataset without heavy occluders (Oxford5K).
UR - http://www.scopus.com/inward/record.url?scp=84977666435&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84977666435&partnerID=8YFLogxK
U2 - 10.1109/WACV.2016.7477576
DO - 10.1109/WACV.2016.7477576
M3 - Conference contribution
AN - SCOPUS:84977666435
T3 - 2016 IEEE Winter Conference on Applications of Computer Vision, WACV 2016
BT - 2016 IEEE Winter Conference on Applications of Computer Vision, WACV 2016
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 7 March 2016 through 10 March 2016
ER -