The past few years have witnessed significant advances in machine learning methods for predicting molecular energetics, including electronic energies calculated with high-level quantum mechanical methods and experimental properties such as solvation free energy and logP. Typically, task-specific machine learning models are developed for distinct prediction tasks. In this work, we present a multitask deep ensemble model, sPhysNet-MT-ens5, which can simultaneously and accurately predict electronic energies of molecules in gas, water, and octanol phases, as well as transfer free energies at both calculated and experimental levels. On the calculated data set Frag20-solv-678k, which is developed in this work and contains 678,916 conformations of molecules with up to 20 heavy atoms, together with their properties calculated at the B3LYP/6-31G* level of theory with continuum solvent models, sPhysNet-MT-ens5 predicts density functional theory (DFT)-level electronic energies directly from force-field-optimized geometries within chemical accuracy. On the experimental data sets, sPhysNet-MT-ens5 achieves state-of-the-art performance, predicting experimental hydration free energy with an RMSE of 0.620 kcal/mol on the FreeSolv data set and experimental logP with an RMSE of 0.393 on the PHYSPROP data set. Furthermore, sPhysNet-MT-ens5 provides a reasonable estimate of model uncertainty that correlates with prediction error. Finally, by analyzing the atomic contributions to its predictions, we find that the developed deep learning model is aware of the chemical environment of each atom, assigning atomic contributions that are consistent with chemical knowledge.
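As a rough illustration of how the predicted quantities relate to one another, the sketch below derives transfer free energies from per-phase energies, converts a water-to-octanol transfer free energy to logP with the standard relation logP = -ΔG/(RT ln 10), and summarizes an ensemble by its mean and standard deviation. It is a minimal sketch under stated assumptions, not the model described above: treating transfer free energies as simple differences of per-phase energies, using the ensemble standard deviation as the uncertainty estimate, and all function names are hypothetical choices made for this example.

```python
import math
import statistics

# Gas constant in kcal/(mol*K); a standard physical constant.
R_KCAL = 0.0019872041


def transfer_free_energies(e_gas, e_water, e_octanol):
    """Return (gas->water, gas->octanol, water->octanol) transfer energies.

    Assumption for this sketch: each transfer free energy is the difference
    between the energies of the same molecule in two phases (all in kcal/mol).
    """
    dg_hydration = e_water - e_gas
    dg_octanol = e_octanol - e_gas
    dg_water_to_octanol = e_octanol - e_water
    return dg_hydration, dg_octanol, dg_water_to_octanol


def logp_from_transfer(dg_water_to_octanol, temperature=298.15):
    """Convert a water->octanol transfer free energy (kcal/mol) to logP."""
    return -dg_water_to_octanol / (R_KCAL * temperature * math.log(10))


def ensemble_summary(predictions):
    """Mean and population standard deviation over ensemble members.

    Assumption: the spread across members serves as the uncertainty estimate.
    """
    return statistics.fmean(predictions), statistics.pstdev(predictions)
```

With these definitions, a more negative water-to-octanol transfer free energy maps to a larger logP, and a wider spread among ensemble members signals a less certain prediction.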
ASJC Scopus subject areas
- Computer Science Applications
- Physical and Theoretical Chemistry