TY - CONF
T1 - Statistical optimal transport via factored couplings
AU - Forrow, Aden
AU - Hütter, Jan Christian
AU - Nitzan, Mor
AU - Rigollet, Philippe
AU - Schiebinger, Geoffrey
AU - Weed, Jonathan
N1 - Funding Information:
M.N. is supported by the James S. McDonnell Foundation, Schmidt Futures, Israel Council for Higher Education, and the John Harvard Distinguished Science Fellows Program; P.R. by NSF grants DMS-1712596, TRIPODS-1740751, and IIS-1838071, ONR grant N00014-17-1-2147, the Chan Zuckerberg Initiative DAF 2018-182642, and the MIT Skoltech Seed Fund; G.S. by a Burroughs Wellcome Fund Career Award at the Scientific Interface and the Klarman Cell Observatory; and J.W. by the Josephine de Karman fellowship.
Publisher Copyright:
© 2019 by the author(s).
PY - 2020
Y1 - 2020
N2 - We propose a new method to estimate Wasserstein distances and optimal transport plans between two probability distributions from samples in high dimension. Unlike plug-in rules that simply replace the true distributions by their empirical counterparts, our method promotes couplings with low transport rank, a new structural assumption that is similar to the nonnegative rank of a matrix. Regularizing based on this assumption leads to drastic improvements on high-dimensional data for various tasks, including domain adaptation in single-cell RNA sequencing data. These findings are supported by a theoretical analysis that indicates that the transport rank is key in overcoming the curse of dimensionality inherent to data-driven optimal transport.
UR - http://www.scopus.com/inward/record.url?scp=85084928186&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85084928186&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85084928186
T2 - 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019
Y2 - 16 April 2019 through 18 April 2019
ER -