Clustering, factor discovery and optimal transport

Hongkang Yang, Esteban G. Tabak

Research output: Contribution to journal, Article, peer-review

Abstract

The clustering problem, and more generally latent factor discovery or latent space inference, is formulated in terms of the Wasserstein barycenter problem from optimal transport. The objective proposed is the maximization of the variability attributable to class, further characterized as the minimization of the variance of the Wasserstein barycenter. Existing theory, which constrains the transport maps to rigid translations, is extended to affine transformations. The resulting non-parametric clustering algorithms include k-means as a special case and exhibit more robust performance. A continuous version of these algorithms discovers continuous latent variables and generalizes principal curves. The strength of these algorithms is demonstrated by tests on both artificial and real-world data sets.
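The link between barycenter variance and k-means in the translation-only case can be illustrated with a minimal sketch. It assumes that, when each class is moved by a rigid translation, the barycenter is the pooled data after every class mean has been shifted to the global mean. Under that assumption the barycenter variance equals the within-class variance, which is the k-means objective divided by the sample size. The function names below are hypothetical and are not taken from the paper.

```python
import numpy as np


def total_variance(x):
    """Mean squared distance of the points to the global mean."""
    x = np.asarray(x, dtype=float)
    return float(((x - x.mean(axis=0)) ** 2).sum(axis=1).mean())


def barycenter_variance_translations(x, labels):
    """Variance of the pooled data after each class is translated to a common mean.

    Assumption: with translation-only maps, the barycenter keeps the
    within-class residuals unchanged, so its variance is the mean squared
    distance of each point to its own class mean.
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    residuals = np.empty_like(x)
    for k in np.unique(labels):
        mask = labels == k
        residuals[mask] = x[mask] - x[mask].mean(axis=0)
    return float((residuals ** 2).sum(axis=1).mean())


def between_class_variance(x, labels):
    """Variability attributable to class: weighted spread of the class means."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    global_mean = x.mean(axis=0)
    value = 0.0
    for k in np.unique(labels):
        mask = labels == k
        weight = mask.mean()  # fraction of points in class k
        value += weight * ((x[mask].mean(axis=0) - global_mean) ** 2).sum()
    return float(value)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two synthetic, well-separated groups in the plane (illustrative data only).
    x = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
                   rng.normal(5.0, 1.0, size=(100, 2))])
    labels = np.repeat([0, 1], 100)
    print("total variance     :", total_variance(x))
    print("attributable to class:", between_class_variance(x, labels))
    print("barycenter variance:", barycenter_variance_translations(x, labels))
```

Because the total variance is fixed by the data and splits into the between-class and within-class parts, maximizing the variability attributable to class and minimizing the barycenter variance are the same problem in this restricted setting. The affine extension described in the abstract is not covered by this sketch.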

Original language: English (US)
Pages (from-to): 1353-1387
Number of pages: 35
Journal: Information and Inference
Volume: 10
Issue number: 4
State: Published - Dec 1 2021

Keywords

  • Wasserstein barycenter
  • clustering
  • explanation of variability
  • factor discovery
  • optimal transport
  • principal curve

ASJC Scopus subject areas

  • Analysis
  • Statistics and Probability
  • Numerical Analysis
  • Computational Theory and Mathematics
  • Applied Mathematics
