DenseCPD: Improving the Accuracy of Neural-Network-Based Computational Protein Sequence Design with DenseNet

Yifei Qi, John Z.H. Zhang

Research output: Contribution to journal › Article › peer-review

Abstract

Computational protein design remains a challenging task despite its remarkable success in the past few decades. With the rapid progress of deep-learning techniques and the accumulation of three-dimensional protein structures, the use of deep neural networks to learn the relationship between protein sequences and structures and then automatically design a protein sequence for a given protein backbone structure is becoming increasingly feasible. In this study, we developed a deep neural network named DenseCPD that considers the three-dimensional density distribution of protein backbone atoms and predicts the probability of 20 natural amino acids for each residue in a protein. The accuracy of DenseCPD was 53.24 ± 0.17% in a 5-fold cross-validation on the training set and 55.53% and 50.71% on two independent test sets, which is more than 10% higher than those of previous state-of-the-art methods. Two approaches for using DenseCPD predictions in computational protein design were analyzed. The approach using the cutoff of accumulative probability had a smaller sequence search space compared with the approach that simply uses the top-k predictions and therefore enabled higher sequence identity in redesigning three proteins with Rosetta. The network and the datasets are available on a web server at http://protein.org.cn/densecpd.html. The results of this study may benefit the further development of computational protein design methods.
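The two ways of turning per-residue predictions into a design search space can be illustrated with a minimal sketch. The abstract does not give the exact values of k or of the cumulative-probability cutoff, nor how the candidates are passed to Rosetta, so the function names, the probability vectors, and the parameter values below are all hypothetical and chosen only for illustration.

```python
import math

# One-letter codes of the 20 natural amino acids (alphabetical order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def rank_indices(probs):
    """Return amino-acid indices sorted from most to least probable."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)


def top_k_candidates(probs, k):
    """Keep the k most probable amino acids, regardless of confidence."""
    return [AMINO_ACIDS[i] for i in rank_indices(probs)[:k]]


def cumulative_cutoff_candidates(probs, cutoff):
    """Keep the most probable amino acids until their summed probability
    reaches the cutoff, so confident positions keep fewer candidates."""
    chosen, total = [], 0.0
    for i in rank_indices(probs):
        chosen.append(AMINO_ACIDS[i])
        total += probs[i]
        if total >= cutoff:
            break
    return chosen


def search_space_size(candidates_per_position):
    """Number of distinct sequences: product of per-position candidate counts."""
    return math.prod(len(c) for c in candidates_per_position)


# Hypothetical predictions for two residues (not taken from the paper):
# one position where the network is confident, one where it is not.
confident = [0.95] + [0.05 / 19] * 19
uncertain = [0.30, 0.25, 0.20, 0.10] + [0.15 / 16] * 16
positions = [confident, uncertain]

top_k_space = [top_k_candidates(p, k=5) for p in positions]
cutoff_space = [cumulative_cutoff_candidates(p, cutoff=0.8) for p in positions]

print(search_space_size(top_k_space), search_space_size(cutoff_space))
```

With these illustrative numbers the cutoff approach keeps a single candidate at the confident position and four at the uncertain one, whereas top-k keeps five at both. Which approach yields the smaller space in general depends on the chosen k and cutoff.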

Original language: English (US)
Number of pages: 8
Journal: ACS Applied Materials and Interfaces
Volume: 60
Issue number: 3
State: Accepted/In press - 2020

Keywords

  • Amino Acid Sequence
  • Amino Acids
  • Computational Biology
  • Neural Networks, Computer
  • Proteins

ASJC Scopus subject areas

  • General Materials Science
