TY - JOUR
T1 - Sequence-structure-function relationships in the microbial protein universe
AU - Koehler Leman, Julia
AU - Szczerbiak, Pawel
AU - Renfrew, P. Douglas
AU - Gligorijevic, Vladimir
AU - Berenberg, Daniel
AU - Vatanen, Tommi
AU - Taylor, Bryn C.
AU - Chandler, Chris
AU - Janssen, Stefan
AU - Pataki, Andras
AU - Carriero, Nick
AU - Fisk, Ian
AU - Xavier, Ramnik J.
AU - Knight, Rob
AU - Bonneau, Richard
AU - Kosciolek, Tomasz
N1 - Publisher Copyright: © 2023, The Author(s).
PY - 2023/12
Y1 - 2023/12
N2 - For the past half-century, structural biologists relied on the notion that similar protein sequences give rise to similar structures and functions. While this assumption has driven research to explore certain parts of the protein universe, it disregards spaces that don’t rely on this assumption. Here we explore areas of the protein universe where similar protein functions can be achieved by different sequences and different structures. We predict ~200,000 structures for diverse protein sequences from 1,003 representative genomes across the microbial tree of life and annotate them functionally on a per-residue basis. Structure prediction is accomplished using the World Community Grid, a large-scale citizen science initiative. The resulting database of structural models is complementary to the AlphaFold database, with regard to domains of life as well as sequence diversity and sequence length. We identify 148 novel folds and describe examples where we map specific functions to structural motifs. We also show that the structural space is continuous and largely saturated, highlighting the need for a shift in focus across all branches of biology, from obtaining structures to putting them into context and from sequence-based to sequence-structure-function-based meta-omics analyses.
UR - http://www.scopus.com/inward/record.url?scp=85153915129&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85153915129&partnerID=8YFLogxK
U2 - 10.1038/s41467-023-37896-w
DO - 10.1038/s41467-023-37896-w
M3 - Article
C2 - 37100781
AN - SCOPUS:85153915129
SN - 2041-1723
VL - 14
JO - Nature Communications
JF - Nature Communications
IS - 1
M1 - 2351
ER -