## Abstract

In the kernel clustering problem we are given a large n × n positive-semidefinite matrix A = (a_{ij}) with Σ _{i, j=1}^{n} a_{ij} = 0 and a small k × k positive-semidefinite matrix B = (b_{ij}). The goal is to find a partition S_{1}, ⋯, S_{k} of {1, ⋯ n} which maximizes the quantity Σ_{i, j=1}^{k}(Σ _{(p, q)}εS_{i}×S_{j} a_{pq})b _{ij}. We study the computational complexity of this generic clustering problem which originates in the theory of machine learning. We design a constant factor polynomial time approximation algorithm for this problem, answering a question posed by Song et al. In some cases we manage to compute the sharp approximation threshold for this problem assuming the unique games conjecture (UGC). In particular, when B is the 3 × 3 identity matrix the UGC hardness threshold of this problem is exactly 16π/27. We present and study a geometric conjecture of independent interest which we show would imply that the UGC threshold when B is the k × k identity matrix is (8π/9)(1 - 1/k) for every k ≥ 3.

Original language | English (US) |
---|---|

Pages (from-to) | 129-165 |

Number of pages | 37 |

Journal | Mathematika |

Volume | 55 |

Issue number | 1-2 |

DOIs | |

State | Published - Jan 2009 |

## ASJC Scopus subject areas

- Mathematics(all)