TY - JOUR
T1 - Exploring fragment-based target-specific ranking protocol with machine learning on cathepsin S
AU - Yang, Yuwei
AU - Lu, Jianing
AU - Yang, Chao
AU - Zhang, Yingkai
N1 - Funding Information:
We would like to acknowledge the support by NIH (Grant Nos. R35-GM127040, R01GM073943 and R01GM120736) and computing resources provided by NYU-ITS.
Publisher Copyright:
© 2019, Springer Nature Switzerland AG.
PY - 2019/12/1
Y1 - 2019/12/1
N2 - Cathepsin S (CatS), a member of the cysteine cathepsin protease family, has been well studied due to its significant role in many pathological processes, including arthritis, cancer and cardiovascular diseases. CatS inhibitors were included in D3R-GC3 for both docking pose prediction and affinity ranking, and in D3R-GC4 for binding affinity ranking. The difficulties posed by CatS inhibitors in D3R mainly come from three aspects: large size, high flexibility and similar chemical structures. We participated in GC4; our best submitted model, which employs a similarity-based alignment docking and Vina scoring protocol, yielded a Kendall’s τ of 0.23 for 459 binders in GC4. In our further explorations with machine learning, by curating a CatS-specific training set and adopting a similarity-based constrained docking method together with an arm-based fragmentation strategy that describes large inhibitors in a locality-sensitive fashion, our best structure-based ranking protocol achieves a Kendall’s τ of 0.52 for all binders in GC4. This exploration demonstrates the importance of training data, docking approaches and fragmentation strategies in developing inhibitor-ranking protocols with machine learning.
KW - Docking
KW - Fragmentation
KW - Machine learning
KW - Scoring function
KW - Virtual screening
UR - http://www.scopus.com/inward/record.url?scp=85075448972&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85075448972&partnerID=8YFLogxK
U2 - 10.1007/s10822-019-00247-3
DO - 10.1007/s10822-019-00247-3
M3 - Article
C2 - 31729618
AN - SCOPUS:85075448972
SN - 0920-654X
VL - 33
SP - 1095
EP - 1105
JO - Journal of Computer-Aided Molecular Design
JF - Journal of Computer-Aided Molecular Design
IS - 12
ER -