TY - GEN
T1 - The Price of Explainability for Clustering
AU - Gupta, Anupam
AU - Pittu, Madhusudhan Reddy
AU - Svensson, Ola
AU - Yuan, Rachel
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Given a set of points in d-dimensional space, an explainable clustering is one where the clusters are specified by a tree of axis-aligned threshold cuts. Dasgupta et al. (ICML 2020) posed the question of the price of explainability: the worst-case ratio of the cost of the best explainable clustering to that of the best clustering. We show that the price of explainability for k-medians is at most 1 + H_{k-1}; in fact, we show that the popular Random Thresholds algorithm has exactly this price of explainability, matching the known lower bound constructions. We complement our tight analysis of this particular algorithm by constructing instances where the price of explainability (using any algorithm) is at least (1 - o(1)) ln k, showing that our result is best possible, up to lower-order terms. We also improve the price of explainability for the k-means problem to O(k ln ln k) from the previous O(k ln k), considerably closing the gap to the lower bounds of Ω(k). Finally, we study the algorithmic question of finding the best explainable clustering: we show that explainable k-medians and k-means cannot be approximated better than O(ln k), under standard complexity-theoretic conjectures. This essentially settles the approximability of explainable k-medians and leaves open the intriguing possibility to get significantly better approximation algorithms for k-means than its price of explainability.
AB - Given a set of points in d-dimensional space, an explainable clustering is one where the clusters are specified by a tree of axis-aligned threshold cuts. Dasgupta et al. (ICML 2020) posed the question of the price of explainability: the worst-case ratio of the cost of the best explainable clustering to that of the best clustering. We show that the price of explainability for k-medians is at most 1 + H_{k-1}; in fact, we show that the popular Random Thresholds algorithm has exactly this price of explainability, matching the known lower bound constructions. We complement our tight analysis of this particular algorithm by constructing instances where the price of explainability (using any algorithm) is at least (1 - o(1)) ln k, showing that our result is best possible, up to lower-order terms. We also improve the price of explainability for the k-means problem to O(k ln ln k) from the previous O(k ln k), considerably closing the gap to the lower bounds of Ω(k). Finally, we study the algorithmic question of finding the best explainable clustering: we show that explainable k-medians and k-means cannot be approximated better than O(ln k), under standard complexity-theoretic conjectures. This essentially settles the approximability of explainable k-medians and leaves open the intriguing possibility to get significantly better approximation algorithms for k-means than its price of explainability.
KW - approximation algorithms
KW - explainable clustering
KW - k-means
KW - k-medians
KW - randomized algorithms
UR - http://www.scopus.com/inward/record.url?scp=85182391277&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85182391277&partnerID=8YFLogxK
U2 - 10.1109/FOCS57990.2023.00067
DO - 10.1109/FOCS57990.2023.00067
M3 - Conference contribution
AN - SCOPUS:85182391277
T3 - Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS
SP - 1131
EP - 1148
BT - Proceedings - 2023 IEEE 64th Annual Symposium on Foundations of Computer Science, FOCS 2023
PB - IEEE Computer Society
T2 - 64th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2023
Y2 - 6 November 2023 through 9 November 2023
ER -