Abstract
The development of new protein–ligand scoring functions using machine learning algorithms, such as random forest, has been of significant interest. By efficiently utilizing expanded feature sets and a large set of experimental data, random forest-based scoring functions (RFbScore) can achieve better correlations with experimental protein–ligand binding data for complexes with known crystal structures; however, more extensive tests indicate that this gain in scoring power comes with significant underperformance in docking and screening power tests compared to traditional scoring functions. In this work, to improve the scoring, docking, and screening powers of protein–ligand docking functions simultaneously, we have introduced a ΔvinaRF parameterization and feature selection framework based on random forest. Our scoring function ΔvinaRF20, which employs 20 descriptors in addition to the AutoDock Vina score, achieves superior performance in all power tests of both the CASF-2013 and CASF-2007 benchmarks compared to classical scoring functions. The ΔvinaRF20 scoring function and its code are freely available on the web at: https://www.nyu.edu/projects/yzhang/DeltaVina.
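The following minimal Python sketch illustrates one reading of the Δ idea named in the abstract: a model is trained on the residual between experimental affinity and a baseline score, and its prediction is added back to the baseline. This reading is an assumption drawn from the name ΔvinaRF and the phrase "in addition to the AutoDock Vina score"; it is not the authors' code. The toy numbers, the helper names (`residual_targets`, `delta_score`, `pearson`), and the stand-in corrections (half of each residual, used in place of a trained random forest) are all hypothetical. Scoring power is measured here as a Pearson correlation, matching the abstract's "correlations to experimental ... binding data".

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def residual_targets(experimental, base_scores):
    """Targets for a correction model: experimental value minus baseline score."""
    return [e - b for e, b in zip(experimental, base_scores)]


def delta_score(base_scores, corrections):
    """Final score = baseline score + learned correction, element by element."""
    return [b + c for b, c in zip(base_scores, corrections)]


# Hypothetical toy data, all on the same affinity scale (e.g. pKd units).
experimental = [4.0, 5.5, 6.0, 7.5, 9.0]
base = [5.0, 5.0, 6.5, 6.5, 7.0]  # stand-in for baseline (Vina-like) scores

targets = residual_targets(experimental, base)

# Stand-in for a trained model's output: an imperfect estimate that recovers
# only half of each residual. A real workflow would predict these from
# descriptors with a random forest regressor.
corrections = [0.5 * t for t in targets]

corrected = delta_score(base, corrections)

r_base = pearson(base, experimental)
r_corrected = pearson(corrected, experimental)
print(f"baseline r = {r_base:.3f}, corrected r = {r_corrected:.3f}")
```

Because the correction is added to the baseline rather than replacing it, the baseline score still contributes to the final ranking, which is one plausible reason such a scheme could keep docking and screening behavior while improving scoring power.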
Field | Value |
---|---|
Original language | English (US) |
Pages (from-to) | 169-177 |
Number of pages | 9 |
Journal | Journal of Computational Chemistry |
Volume | 38 |
Issue number | 3 |
State | Published - Jan 30 2017 |
Keywords
- docking
- machine learning
- protein–ligand binding affinity
- random forest
- scoring function
ASJC Scopus subject areas
- General Chemistry
- Computational Mathematics