Improving scoring-docking-screening powers of protein–ligand scoring functions using random forest

Cheng Wang, Yingkai Zhang

Research output: Contribution to journal › Article › peer-review


The development of new protein–ligand scoring functions using machine learning algorithms such as random forest has attracted significant interest. By efficiently using expanded feature sets and large sets of experimental data, random-forest-based scoring functions (RFbScore) can achieve better correlations with experimental protein–ligand binding data for complexes with known crystal structures. However, more extensive tests indicate that this gain in scoring power comes with significant underperformance in docking and screening power tests compared with traditional scoring functions. In this work, to improve the scoring, docking, and screening powers of protein–ligand scoring functions simultaneously, we have introduced ΔvinaRF, a parameterization and feature selection framework based on random forest. Our scoring function, ΔvinaRF20, which employs 20 descriptors in addition to the AutoDock Vina score, achieves better performance than classical scoring functions in all power tests of both the CASF-2013 and CASF-2007 benchmarks. The ΔvinaRF20 scoring function and its code are freely available on the web at:
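
The Δ prefix suggests a Δ-learning setup, in which the random forest does not predict binding affinity from scratch but learns a correction on top of the AutoDock Vina score. The abstract does not spell this out, so the sketch below only illustrates that reading under stated assumptions: the correction is additive, the Vina score is taken to be already expressed in the same units as the measured affinity (e.g. pKd), the descriptors are placeholder numbers rather than the paper's 20 descriptors, and the forest is a small from-scratch regressor (bootstrap sampling plus a random feature subset at each split), not the authors' code. The helper names `fit_forest`, `fit_delta_vina`, and `delta_vina_score` are hypothetical, and the random-forest feature selection step is not shown.

```python
import random
import statistics

# Tree nodes: a leaf is a float (mean target of the training rows that reach it);
# an internal node is a tuple (feature_index, threshold, left_subtree, right_subtree).


def _grow_tree(rows, targets, max_features, max_depth, min_samples_split, rng):
    """Grow one regression tree by greedy sum-of-squared-error reduction."""
    if max_depth == 0 or len(targets) < min_samples_split or len(set(targets)) == 1:
        return statistics.fmean(targets)
    best = None  # (sse, feature_index, threshold)
    # Random-forest step: only a random subset of features is tried at each split.
    for f in rng.sample(range(len(rows[0])), max_features):
        # Skip the largest value so that both children are non-empty.
        for thr in sorted({r[f] for r in rows})[:-1]:
            left = [t for r, t in zip(rows, targets) if r[f] <= thr]
            right = [t for r, t in zip(rows, targets) if r[f] > thr]
            mean_l, mean_r = statistics.fmean(left), statistics.fmean(right)
            sse = sum((t - mean_l) ** 2 for t in left) + sum((t - mean_r) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr)
    if best is None:  # every sampled feature is constant in this node
        return statistics.fmean(targets)
    _, f, thr = best
    left_idx = [i for i, r in enumerate(rows) if r[f] <= thr]
    right_idx = [i for i, r in enumerate(rows) if r[f] > thr]

    def child(idx):
        return _grow_tree([rows[i] for i in idx], [targets[i] for i in idx],
                          max_features, max_depth - 1, min_samples_split, rng)

    return (f, thr, child(left_idx), child(right_idx))


def _predict_tree(node, x):
    """Follow threshold tests from the root down to a leaf value."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


def fit_forest(rows, targets, n_trees=50, max_features=None,
               max_depth=6, min_samples_split=2, seed=0):
    """Random forest regressor: bootstrap-sampled trees with feature subsampling."""
    rng = random.Random(seed)
    if max_features is None:
        max_features = max(1, len(rows[0]) // 3)  # common default for regression
    forest = []
    for _ in range(n_trees):
        idx = rng.choices(range(len(rows)), k=len(rows))  # sample rows with replacement
        forest.append(_grow_tree([rows[i] for i in idx], [targets[i] for i in idx],
                                 max_features, max_depth, min_samples_split, rng))
    return forest


def predict_forest(forest, x):
    """Average the predictions of all trees."""
    return statistics.fmean([_predict_tree(tree, x) for tree in forest])


def fit_delta_vina(vina_scores, descriptors, measured, **forest_kwargs):
    """Hypothetical helper: fit the forest to (measured affinity - Vina score)."""
    residuals = [y - v for y, v in zip(measured, vina_scores)]
    return fit_forest(descriptors, residuals, **forest_kwargs)


def delta_vina_score(forest, vina_score, x):
    """Hypothetical helper: final score = Vina term + random-forest correction."""
    return vina_score + predict_forest(forest, x)


if __name__ == "__main__":
    # Synthetic toy data (not from the paper): two placeholder descriptors per complex.
    descriptors = [[i / 20, ((7 * i) % 20) / 20] for i in range(20)]
    vina = [5.0 + 0.1 * i for i in range(20)]  # assumed already in pKd-like units
    measured = [v + (2.0 if d[0] > 0.5 else 0.0) for v, d in zip(vina, descriptors)]
    model = fit_delta_vina(vina, descriptors, measured, n_trees=25, max_features=2)
    print(delta_vina_score(model, 6.0, [0.9, 0.3]))
```

Because each leaf stores an average of training residuals, the correction the forest adds can never fall outside the range of residuals seen in training; larger differences between complexes have to come from the Vina term. That is a property of this sketch, not a claim about how ΔvinaRF20 behaves on the CASF benchmarks.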

Original language: English (US)
Pages (from-to): 169-177
Number of pages: 9
Journal: Journal of Computational Chemistry
Issue number: 3
State: Published - Jan 30 2017

Keywords
  • docking
  • machine learning
  • protein–ligand binding affinity
  • random forest
  • scoring function

ASJC Scopus subject areas

  • General Chemistry
  • Computational Mathematics
