Exploring fragment-based target-specific ranking protocol with machine learning on cathepsin S

Yuwei Yang, Jianing Lu, Chao Yang, Yingkai Zhang

Research output: Contribution to journal › Article › peer-review


Cathepsin S (CatS), a member of the cysteine cathepsin protease family, has been well studied because of its significant role in many pathological processes, including arthritis, cancer and cardiovascular diseases. CatS inhibitors were included in D3R-GC3 for both docking pose prediction and affinity ranking, and in D3R-GC4 for binding affinity ranking. The difficulties posed by CatS inhibitors in D3R come mainly from three aspects: large size, high flexibility and similar chemical structures. We participated in GC4; our best submitted model, which employs a similarity-based alignment docking and Vina scoring protocol, yielded a Kendall’s τ of 0.23 for 459 binders in GC4. In our further explorations with machine learning, we curated a CatS-specific training set and adopted a similarity-based constrained docking method together with an arm-based fragmentation strategy that describes large inhibitors in a locality-sensitive fashion. With these changes, our best structure-based ranking protocol achieves a Kendall’s τ of 0.52 for all binders in GC4. Through this exploration, we demonstrate the importance of training data, docking approaches and fragmentation strategies in developing inhibitor-ranking protocols with machine learning.
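The abstract reports ranking quality as Kendall’s τ, which compares the order of predicted scores with the order of measured affinities over all pairs of compounds. The following is a minimal sketch of that statistic, not the authors' evaluation code. It assumes the tie-corrected τ-b variant (the abstract does not say which variant was used), the function name `kendall_tau` is illustrative, and the numbers are made up for demonstration rather than taken from GC4.

```python
from itertools import combinations
from math import sqrt


def kendall_tau(predicted, measured):
    """Kendall's tau-b between two equally long sequences of numbers.

    Illustrative helper (assumption): both sequences use the same
    orientation, i.e. a larger value means a stronger binder in both.
    """
    if len(predicted) != len(measured):
        raise ValueError("sequences must have the same length")

    concordant = discordant = tied_pred = tied_meas = 0
    for (p1, m1), (p2, m2) in combinations(zip(predicted, measured), 2):
        dp, dm = p1 - p2, m1 - m2
        if dp == 0:
            tied_pred += 1  # pair tied in the predicted scores
        if dm == 0:
            tied_meas += 1  # pair tied in the measured values
        if dp * dm > 0:
            concordant += 1  # both rankings order the pair the same way
        elif dp * dm < 0:
            discordant += 1  # the rankings disagree on this pair

    n = len(predicted)
    n_pairs = n * (n - 1) // 2
    denom = sqrt((n_pairs - tied_pred) * (n_pairs - tied_meas))
    if denom == 0:
        raise ValueError("tau is undefined when one sequence is constant")
    return (concordant - discordant) / denom


# Toy data (hypothetical): five compounds, two adjacent pairs swapped.
toy_predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_measured = [1.0, 3.0, 2.0, 5.0, 4.0]
print(kendall_tau(toy_predicted, toy_measured))  # 8 concordant, 2 discordant -> 0.6
```

A value of 1 means the predicted and measured orders agree on every pair, 0 means no net agreement, and -1 means the orders are fully reversed. If docking scores are used directly, where a more negative score usually indicates stronger predicted binding, one of the two sequences would need its sign flipped first.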

Original language: English (US)
Pages (from-to): 1095-1105
Number of pages: 11
Journal: Journal of Computer-Aided Molecular Design
Issue number: 12
State: Published - Dec 1 2019


Keywords

  • Docking
  • Fragmentation
  • Machine learning
  • Scoring function
  • Virtual screening

ASJC Scopus subject areas

  • Drug Discovery
  • Computer Science Applications
  • Physical and Theoretical Chemistry

