TY - GEN
T1 - Web scale photo hash clustering on a single machine
AU - Gong, Yunchao
AU - Pawlowski, Marcin
AU - Yang, Fei
AU - Brandy, Louis
AU - Boundev, Lubomir
AU - Fergus, Rob
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/10/14
Y1 - 2015/10/14
N2 - This paper addresses the problem of clustering a very large number of photos (i.e. hundreds of millions a day) in a stream into millions of clusters. This is particularly important as the popularity of photo sharing websites, such as Facebook, Google, and Instagram. Given large number of photos available online, how to efficiently organize them is an open problem. To address this problem, we propose to cluster the binary hash codes of a large number of photos into binary cluster centers. We present a fast binary k-means algorithm that works directly on the similarity-preserving hashes of images and clusters them into binary centers on which we can build hash indexes to speedup computation. The proposed method is capable of clustering millions of photos on a single machine in a few minutes. We show that this approach is usually several magnitude faster than standard k-means and produces comparable clustering accuracy. In addition, we propose an online clustering method based on binary k-means that is capable of clustering large photo stream on a single machine, and show applications to spam detection and trending photo discovery.
AB - This paper addresses the problem of clustering a very large number of photos (i.e. hundreds of millions a day) in a stream into millions of clusters. This is particularly important as the popularity of photo sharing websites, such as Facebook, Google, and Instagram. Given large number of photos available online, how to efficiently organize them is an open problem. To address this problem, we propose to cluster the binary hash codes of a large number of photos into binary cluster centers. We present a fast binary k-means algorithm that works directly on the similarity-preserving hashes of images and clusters them into binary centers on which we can build hash indexes to speedup computation. The proposed method is capable of clustering millions of photos on a single machine in a few minutes. We show that this approach is usually several magnitude faster than standard k-means and produces comparable clustering accuracy. In addition, we propose an online clustering method based on binary k-means that is capable of clustering large photo stream on a single machine, and show applications to spam detection and trending photo discovery.
UR - http://www.scopus.com/inward/record.url?scp=84959231611&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84959231611&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2015.7298596
DO - 10.1109/CVPR.2015.7298596
M3 - Conference contribution
AN - SCOPUS:84959231611
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 19
EP - 27
BT - IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015
PB - IEEE Computer Society
T2 - IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015
Y2 - 7 June 2015 through 12 June 2015
ER -