Abstract
Approximating the k-means clustering objective with an online learning algorithm is an open problem. We introduce a family of online clustering algorithms by extending algorithms for online supervised learning, with access to expert predictors, to the unsupervised learning setting. Instead of computing prediction errors in order to re-weight the experts, the algorithms compute an approximation to the current value of the k-means objective obtained by each expert. When the experts are batch clustering algorithms with b-approximation guarantees with respect to the k-means objective (for example, the k-means++ or k-means# algorithms), applied to a sliding window of the data stream, our algorithms obtain approximation guarantees with respect to the k-means objective. The form of these online clustering approximation guarantees is novel, and extends an evaluation framework proposed by Dasgupta as an analog to regret. Notably, our approximation bounds are with respect to the optimal k-means cost on the entire data stream seen so far, even though the algorithm is online. Our algorithms' empirical performance tracks that of the best clustering algorithm in the expert set.
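To make the re-weighting idea concrete, here is a minimal sketch, not the authors' implementation: an exponential-weights update over clustering "experts", where each expert is a batch k-means++ run on a sliding window and its loss is the k-means cost of its proposed centers on that window. The window size `W`, learning rate `eta`, per-expert `seeds`, and the use of scikit-learn's `KMeans` as a stand-in for a b-approximation expert are all illustrative assumptions, not details from the paper.

```python
# Illustrative sketch only: exponential-weights re-weighting of clustering
# experts by k-means cost, in analogy with prediction-error re-weighting
# in the supervised experts setting. Parameters W, eta, seeds are assumed.
import numpy as np
from sklearn.cluster import KMeans

def kmeans_cost(X, centers):
    """Sum of squared distances from each point to its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def online_clustering_with_experts(stream, k=3, W=50, eta=0.1, seeds=(0, 1, 2)):
    """Each seed defines one expert: k-means++ rerun on the sliding window.
    Experts are re-weighted by the k-means cost of their centers."""
    weights = np.ones(len(seeds)) / len(seeds)
    window, outputs = [], []
    for x in stream:
        window.append(x)
        if len(window) > W:
            window.pop(0)
        X = np.asarray(window)
        if len(X) < k:
            continue
        # Each expert proposes k centers from the current window.
        proposals = [KMeans(n_clusters=k, init="k-means++", n_init=1,
                            random_state=s).fit(X).cluster_centers_
                     for s in seeds]
        # Loss of an expert = k-means cost of its centers on the window.
        losses = np.array([kmeans_cost(X, C) for C in proposals])
        weights *= np.exp(-eta * losses / len(X))  # exponential-weights update
        weights /= weights.sum()
        # Output the centers of the currently best-weighted expert.
        outputs.append(proposals[int(np.argmax(weights))])
    return outputs

# Usage: a synthetic stream of 2-D points drawn around three offsets.
rng = np.random.default_rng(7)
stream = rng.normal(size=(200, 2)) + rng.integers(0, 3, size=200)[:, None] * 4.0
print(online_clustering_with_experts(stream)[-1])
```

The sketch outputs the best expert's centers for readability; a weighted combination of experts, as in classical experts algorithms, is an equally natural choice.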
Original language | English (US) |
---|---|
Pages (from-to) | 227-235 |
Number of pages | 9 |
Journal | Journal of Machine Learning Research |
Volume | 22 |
State | Published - 2012 |
Event | 15th International Conference on Artificial Intelligence and Statistics, AISTATS 2012 - La Palma, Spain |
Duration | Apr 21, 2012 → Apr 23, 2012 |
ASJC Scopus subject areas
- Software
- Control and Systems Engineering
- Statistics and Probability
- Artificial Intelligence