Fast and Accurate Prediction of Tautomer Ratios in Aqueous Solution via a Siamese Neural Network

Xiaolin Pan, Xudong Zhang, Song Xia, Yingkai Zhang

Research output: Contribution to journal › Article › peer-review

Abstract

Tautomerization plays a critical role in chemical and biological processes, influencing molecular stability, reactivity, biological activity, and ADME-Tox properties. Many drug-like molecules exist in multiple tautomeric states in aqueous solution, complicating the study of protein-ligand interactions. Rapid and accurate prediction of tautomer ratios and identification of predominant species are therefore crucial in computational drug discovery. In this study, we introduce sPhysNet-Taut, a deep learning model fine-tuned on experimental data using a Siamese neural network architecture. This model directly predicts tautomer ratios in aqueous solution based on MMFF94-optimized molecular geometries. On experimental test sets, sPhysNet-Taut achieves state-of-the-art performance, with a root-mean-square error (RMSE) of 1.9 kcal/mol on the 100-tautomers set and 1.0 kcal/mol on the SAMPL2 challenge, outperforming all other methods. It also provides superior ranking power for tautomer pairs on multiple test sets. Our results demonstrate that fine-tuning on experimental data significantly enhances model performance compared to training from scratch. This work not only offers a valuable deep learning model for predicting tautomer ratios but also presents a protocol for modeling pairwise data. To promote usability, we have developed an accessible tool that predicts stable tautomeric states in aqueous solution by enumerating all possible tautomeric states and ranking them using our model. The source code and web server are freely accessible at https://github.com/xiaolinpan/sPhysNet-Taut and https://yzhang.hpc.nyu.edu/tautomer.
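The abstract reports errors in kcal/mol while describing the target as a tautomer ratio. The two are linked by the standard relation K = exp(-ΔG/RT). The sketch below illustrates that conversion, a Boltzmann ranking of enumerated tautomers, and the RMSE metric. It is not the sPhysNet-Taut code: every function name is hypothetical, the temperature of 298.15 K is an assumption, and the idea that a pairwise free energy is obtained as the difference of two outputs from one shared scorer is an assumed reading of "Siamese", not something the abstract states.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def tautomer_ratio(delta_g, temperature=298.15):
    """Ratio [B]/[A] for A -> B, given delta_g = G(B) - G(A) in kcal/mol."""
    return math.exp(-delta_g / (R_KCAL * temperature))


def pairwise_delta_g(score, tautomer_a, tautomer_b):
    """Assumed Siamese-style pairing: one shared scorer applied to both tautomers.

    `score` is a hypothetical callable returning a free-energy-like value
    in kcal/mol; the difference is antisymmetric by construction.
    """
    return score(tautomer_b) - score(tautomer_a)


def populations(rel_free_energies, temperature=298.15):
    """Boltzmann populations from relative free energies in kcal/mol."""
    rt = R_KCAL * temperature
    g_min = min(rel_free_energies.values())  # shift for numerical stability
    weights = {
        name: math.exp(-(g - g_min) / rt)
        for name, g in rel_free_energies.items()
    }
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Made-up energies for illustration only; not taken from the paper.
    toy_scores = {"keto": 0.0, "enol": 2.5, "zwitterion": 4.0}
    ranked = sorted(populations(toy_scores).items(), key=lambda kv: -kv[1])
    for name, fraction in ranked:
        print(f"{name}: {fraction:.4f}")
    print("enol/keto ratio:", tautomer_ratio(
        pairwise_delta_g(toy_scores.get, "keto", "enol")))
```

Under this relation, an error of 1.0 kcal/mol in ΔG at 298.15 K corresponds to roughly a factor of 5 in the predicted ratio, which gives a sense of scale for the reported RMSE values.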

Original language: English (US)
Pages (from-to): 3132-3141
Number of pages: 10
Journal: Journal of Chemical Theory and Computation
Volume: 21
Issue number: 6
State: Published - Mar 25 2025

ASJC Scopus subject areas

  • Computer Science Applications
  • Physical and Theoretical Chemistry
