Dimensionality reduction for k-means clustering and low rank approximation

Michael B. Cohen, Sam Elder, Cameron Musco, Christopher Musco, Madalina Persu

    Research output: Chapter in Book/Report/Conference proceedingConference contribution

    Abstract

    We show how to approximate a data matrix A with a much smaller sketch A that can be used to solve a general class of constrained k-rank approximation problems to within (1 + ∈) error. Importantly, this class includes k-means clustering and unconstrained low rank approximation (i.e. principal component analysis). By reducing data points to just O(k) dimensions, we generically accelerate any exact, approximate, or heuristic algorithm for these ubiquitous problems. For k-means dimensionality reduction, we provide (1 + ∈) relative error results for many common sketching techniques, including random row projection, column selection, and approximate SVD. For approximate principal component analysis, we give a simple alternative to known algorithms that has applications in the streaming setting. Additionally, we extend recent work on column-based matrix reconstruction, giving column subsets that not only 'cover' a good subspace for A, but can be used directly to compute this subspace. Finally, for k-means clustering, we show how to achieve a (9 + ∈) approximation by Johnson-Lindenstrauss projecting data to just O(logk/∈2 ) dimensions. This is the first result that leverages the specific structure of k-means to achieve dimension independent of input size and sublinear in k.

    Original languageEnglish (US)
    Title of host publicationSTOC 2015 - Proceedings of the 2015 ACM Symposium on Theory of Computing
    PublisherAssociation for Computing Machinery
    Pages163-172
    Number of pages10
    ISBN (Electronic)9781450335362
    DOIs
    StatePublished - Jun 14 2015
    Event47th Annual ACM Symposium on Theory of Computing, STOC 2015 - Portland, United States
    Duration: Jun 14 2015Jun 17 2015

    Publication series

    NameProceedings of the Annual ACM Symposium on Theory of Computing
    Volume14-17-June-2015
    ISSN (Print)0737-8017

    Other

    Other47th Annual ACM Symposium on Theory of Computing, STOC 2015
    CountryUnited States
    CityPortland
    Period6/14/156/17/15

    ASJC Scopus subject areas

    • Software

    Fingerprint Dive into the research topics of 'Dimensionality reduction for k-means clustering and low rank approximation'. Together they form a unique fingerprint.

  • Cite this

    Cohen, M. B., Elder, S., Musco, C., Musco, C., & Persu, M. (2015). Dimensionality reduction for k-means clustering and low rank approximation. In STOC 2015 - Proceedings of the 2015 ACM Symposium on Theory of Computing (pp. 163-172). (Proceedings of the Annual ACM Symposium on Theory of Computing; Vol. 14-17-June-2015). Association for Computing Machinery. https://doi.org/10.1145/2746539.2746569