DenseCPD: Improving the Accuracy of Neural-Network-Based Computational Protein Sequence Design with DenseNet

Yifei Qi, John Z.H. Zhang

Research output: Contribution to journal › Article › peer-review

Abstract

Computational protein design remains a challenging task despite its remarkable success in the past few decades. With the rapid progress of deep-learning techniques and the accumulation of three-dimensional protein structures, the use of deep neural networks to learn the relationship between protein sequences and structures and then automatically design a protein sequence for a given protein backbone structure is becoming increasingly feasible. In this study, we developed a deep neural network named DenseCPD that considers the three-dimensional density distribution of protein backbone atoms and predicts the probability of 20 natural amino acids for each residue in a protein. The accuracy of DenseCPD was 53.24 ± 0.17% in a 5-fold cross-validation on the training set and 55.53% and 50.71% on two independent test sets, which is more than 10% higher than those of previous state-of-the-art methods. Two approaches for using DenseCPD predictions in computational protein design were analyzed. The approach using the cutoff of accumulative probability had a smaller sequence search space compared with the approach that simply uses the top-k predictions and therefore enabled higher sequence identity in redesigning three proteins with Rosetta. The network and the datasets are available on a web server at http://protein.org.cn/densecpd.html. The results of this study may benefit the further development of computational protein design methods.
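The two ways of turning per-residue predictions into design candidates, as described in the abstract, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the example probabilities, and the values k = 3 and cutoff = 0.8 are assumptions chosen for illustration. The sketch shows why a cumulative-probability cutoff tends to give a smaller search space: at a position where the network is confident, it keeps fewer residue types than a fixed top-k rule.

```python
from typing import Dict, List


def top_k_candidates(probs: Dict[str, float], k: int) -> List[str]:
    """Keep the k amino acids with the highest predicted probability."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    return ranked[:k]


def cumulative_cutoff_candidates(probs: Dict[str, float], cutoff: float) -> List[str]:
    """Keep the most probable amino acids until their summed probability reaches cutoff."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    chosen: List[str] = []
    total = 0.0
    for aa in ranked:
        chosen.append(aa)
        total += probs[aa]
        if total >= cutoff:
            break
    return chosen


# Hypothetical predictions for two positions (only a few residue types shown).
confident = {"L": 0.90, "I": 0.05, "V": 0.03, "M": 0.02}
uncertain = {"A": 0.30, "S": 0.25, "G": 0.20, "T": 0.15, "P": 0.10}

for name, probs in (("confident", confident), ("uncertain", uncertain)):
    print(name, "top-3:", top_k_candidates(probs, 3))
    print(name, "cutoff 0.8:", cumulative_cutoff_candidates(probs, 0.8))
```

In this toy case the top-3 rule always allows three residue types, while the cutoff rule allows one at the confident position and four at the uncertain one. The per-position candidate lists would then restrict the residue types a design program such as Rosetta is allowed to sample.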

Original language: English (US)
Pages (from-to): 1245-1252
Number of pages: 8
Journal: Journal of Chemical Information and Modeling
Volume: 60
Issue number: 3
State: Published - Mar 23 2020

Keywords

  • Amino Acid Sequence
  • Amino Acids
  • Computational Biology
  • Neural Networks, Computer
  • Proteins

ASJC Scopus subject areas

  • General Materials Science
