TY - JOUR
T1 - Dictionary learning for integrative, multimodal and scalable single-cell analysis
AU - Hao, Yuhan
AU - Stuart, Tim
AU - Kowalski, Madeline H.
AU - Choudhary, Saket
AU - Hoffman, Paul
AU - Hartman, Austin
AU - Srivastava, Avi
AU - Molla, Gesmira
AU - Madad, Shaista
AU - Fernandez-Granda, Carlos
AU - Satija, Rahul
N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer Nature America, Inc. 2023.
PY - 2024/2
Y1 - 2024/2
AB - Mapping single-cell sequencing profiles to comprehensive reference datasets provides a powerful alternative to unsupervised analysis. However, most reference datasets are constructed from single-cell RNA-sequencing data and cannot be used to annotate datasets that do not measure gene expression. Here we introduce ‘bridge integration’, a method to integrate single-cell datasets across modalities using a multiomic dataset as a molecular bridge. Each cell in the multiomic dataset constitutes an element in a ‘dictionary’, which is used to reconstruct unimodal datasets and transform them into a shared space. Our procedure accurately integrates transcriptomic data with independent single-cell measurements of chromatin accessibility, histone modifications, DNA methylation and protein levels. Moreover, we demonstrate how dictionary learning can be combined with sketching techniques to improve computational scalability and harmonize 8.6 million human immune cell profiles from sequencing and mass cytometry experiments. Our approach, implemented in version 5 of our Seurat toolkit (http://www.satijalab.org/seurat), broadens the utility of single-cell reference datasets and facilitates comparisons across diverse molecular modalities.
UR - http://www.scopus.com/inward/record.url?scp=85160332958&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85160332958&partnerID=8YFLogxK
U2 - 10.1038/s41587-023-01767-y
DO - 10.1038/s41587-023-01767-y
M3 - Article
C2 - 37231261
AN - SCOPUS:85160332958
SN - 1087-0156
VL - 42
SP - 293
EP - 304
JO - Nature Biotechnology
JF - Nature Biotechnology
IS - 2
ER -