TY - JOUR
T1 - Protein remote homology detection and structural alignment using deep learning
AU - Hamamsy, Tymor
AU - Morton, James T.
AU - Blackwell, Robert
AU - Berenberg, Daniel
AU - Carriero, Nicholas
AU - Gligorijevic, Vladimir
AU - Strauss, Charlie E.M.
AU - Leman, Julia Koehler
AU - Cho, Kyunghyun
AU - Bonneau, Richard
N1 - Publisher Copyright: © 2023, The Author(s).
PY - 2023
Y1 - 2023
AB - Exploiting sequence–structure–function relationships in biotechnology requires improved methods for aligning proteins that have low sequence similarity to previously annotated proteins. We develop two deep learning methods to address this gap, TM-Vec and DeepBLAST. TM-Vec allows searching for structure–structure similarities in large sequence databases. It is trained to accurately predict TM-scores as a metric of structural similarity directly from sequence pairs without the need for intermediate computation or solution of structures. Once structurally similar proteins have been identified, DeepBLAST can structurally align proteins using only sequence information by identifying structurally homologous regions between proteins. It outperforms traditional sequence alignment methods and performs similarly to structure-based alignment methods. We show the merits of TM-Vec and DeepBLAST on a variety of datasets, including better identification of remotely homologous proteins compared with state-of-the-art sequence alignment and structure prediction methods.
UR - http://www.scopus.com/inward/record.url?scp=85169902813&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85169902813&partnerID=8YFLogxK
U2 - 10.1038/s41587-023-01917-2
DO - 10.1038/s41587-023-01917-2
M3 - Article
AN - SCOPUS:85169902813
SN - 1087-0156
JO - Nature Biotechnology
JF - Nature Biotechnology
ER -