TY - GEN
T1 - Supervised quantile normalization for low-rank matrix approximation
AU - Cuturi, Marco
AU - Teboul, Olivier
AU - Niles-Weed, Jonathan
AU - Vert, Jean-Philippe
N1 - Publisher Copyright:
© 37th International Conference on Machine Learning, ICML 2020.
PY - 2020
Y1 - 2020
AB - Low-rank matrix factorization is a fundamental building block in machine learning, used for instance to summarize gene expression profile data or word-document counts. To be robust to outliers and to differences in scale across features, a matrix factorization step is usually preceded by ad hoc feature normalization steps, such as tf-idf scaling or data whitening. In this work we propose to learn these normalization operators jointly with the factorization itself. More precisely, given a d × n matrix X of d features measured on n individuals, we propose to learn the parameters of quantile normalization operators that can operate row-wise on the values of X and/or of its factorization UV to improve the quality of the low-rank representation of X itself. This optimization is facilitated by the introduction of a new differentiable quantile normalization operator built using optimal transport, providing new results on top of the existing work of Cuturi et al. (2019). We demonstrate the applicability of these techniques on synthetic and genomics datasets.
UR - http://www.scopus.com/inward/record.url?scp=85105229535&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85105229535&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85105229535
T3 - 37th International Conference on Machine Learning, ICML 2020
SP - 2247
EP - 2257
BT - 37th International Conference on Machine Learning, ICML 2020
A2 - Daumé III, Hal
A2 - Singh, Aarti
PB - International Machine Learning Society (IMLS)
T2 - 37th International Conference on Machine Learning, ICML 2020
Y2 - 13 July 2020 through 18 July 2020
ER -