TY - GEN
T1 - Approximate clustering without the approximation
AU - Balcan, Maria Florina
AU - Blum, Avrim
AU - Gupta, Anupam
PY - 2009
Y1 - 2009
N2 - Approximation algorithms for clustering points in metric spaces is a flourishing area of research, with much research effort spent on getting a better understanding of the approximation guarantees possible for many objective functions such as k-median, k-means, and min-sum clustering. This quest for better approximation algorithms is further fueled by the implicit hope that these better approximations also yield more accurate clusterings. E.g., for many problems such as clustering proteins by function, or clustering images by subject, there is some unknown correct "target" clustering and the implicit hope is that approximately optimizing these objective functions will in fact produce a clustering that is close pointwise to the truth. In this paper, we show that if we make this implicit assumption explicit - that is, if we assume that any c-approximation to the given clustering objective Φ is ∈-close to the target - then we can produce clusterings that are O(∈)-close to the target, even for values c for which obtaining a c-approximation is NP-hard. In particular, for k-median and k-means objectives, we show that we can achieve this guarantee for any constant c > 1, and for the min-sum objective we can do this for any constant c > 2. Our results also highlight a surprising conceptual difference between assuming that the optimal solution to, say, the k-median objective is ∈-close to the target, and assuming that any approximately optimal solution is ∈-close to the target, even for approximation factor say c = 1.01. In the former case, the problem of finding a solution that is O(∈)-close to the target remains computationally hard, and yet for the latter we have an efficient algorithm.
AB - Approximation algorithms for clustering points in metric spaces is a flourishing area of research, with much research effort spent on getting a better understanding of the approximation guarantees possible for many objective functions such as k-median, k-means, and min-sum clustering. This quest for better approximation algorithms is further fueled by the implicit hope that these better approximations also yield more accurate clusterings. E.g., for many problems such as clustering proteins by function, or clustering images by subject, there is some unknown correct "target" clustering and the implicit hope is that approximately optimizing these objective functions will in fact produce a clustering that is close pointwise to the truth. In this paper, we show that if we make this implicit assumption explicit - that is, if we assume that any c-approximation to the given clustering objective Φ is ∈-close to the target - then we can produce clusterings that are O(∈)-close to the target, even for values c for which obtaining a c-approximation is NP-hard. In particular, for k-median and k-means objectives, we show that we can achieve this guarantee for any constant c > 1, and for the min-sum objective we can do this for any constant c > 2. Our results also highlight a surprising conceptual difference between assuming that the optimal solution to, say, the k-median objective is ∈-close to the target, and assuming that any approximately optimal solution is ∈-close to the target, even for approximation factor say c = 1.01. In the former case, the problem of finding a solution that is O(∈)-close to the target remains computationally hard, and yet for the latter we have an efficient algorithm.
UR - http://www.scopus.com/inward/record.url?scp=70349129917&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70349129917&partnerID=8YFLogxK
U2 - 10.1137/1.9781611973068.116
DO - 10.1137/1.9781611973068.116
M3 - Conference contribution
AN - SCOPUS:70349129917
SN - 9780898716801
T3 - Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms
SP - 1068
EP - 1077
BT - Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms
PB - Association for Computing Machinery
T2 - 20th Annual ACM-SIAM Symposium on Discrete Algorithms
Y2 - 4 January 2009 through 6 January 2009
ER -