TY - JOUR
T1 - Accurate Prediction of Aqueous Free Solvation Energies Using 3D Atomic Feature-Based Graph Neural Network with Transfer Learning
AU - Zhang, Dongdong
AU - Xia, Song
AU - Zhang, Yingkai
N1 - Funding Information:
This work was supported by the U.S. National Institutes of Health (R35-GM127040). We thank NYU-ITS for providing computational resources.
Publisher Copyright:
© 2022 American Chemical Society. All rights reserved.
PY - 2022/4/25
Y1 - 2022/4/25
N2 - Graph neural network (GNN)-based deep learning (DL) models have been widely used to predict experimental aqueous solvation free energies, but their prediction accuracy has reached a plateau, partly due to the scarcity of available experimental data. To tackle this challenge, we first build a large and diverse calculated data set of aqueous solvation free energies, Frag20-Aqsol-100K, with reasonable computational cost and accuracy via electronic structure calculations with continuum solvent models. We then develop a novel 3D atomic feature-based GNN model with principal neighborhood aggregation (PNAConv) and demonstrate that 3D atomic features obtained from molecular mechanics-optimized geometries can significantly improve the learning power of GNN models in predicting calculated solvation free energies. Finally, we employ a transfer learning strategy, pre-training our DL model on Frag20-Aqsol-100K and fine-tuning it on the small experimental data set. The fine-tuned model, A3D-PNAConv-FT, achieves state-of-the-art prediction on the FreeSolv data set, with a root-mean-squared error of 0.719 kcal/mol and a mean-absolute error of 0.417 kcal/mol using random data splits. These results indicate that integrating molecular modeling and DL is a promising strategy for developing robust prediction models in molecular science. The source code and data are accessible at: https://yzhang.hpc.nyu.edu/IMA.
UR - http://www.scopus.com/inward/record.url?scp=85128692808&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85128692808&partnerID=8YFLogxK
U2 - 10.1021/acs.jcim.2c00260
DO - 10.1021/acs.jcim.2c00260
M3 - Review article
C2 - 35422122
AN - SCOPUS:85128692808
SN - 1549-9596
VL - 62
SP - 1840
EP - 1848
JO - Journal of Chemical Information and Modeling
JF - Journal of Chemical Information and Modeling
IS - 8
ER -