TY - JOUR
T1 - Improved Data-Driven Collective Variables for Biased Sampling through Iteration on Biased Data
AU - Sasmal, Subarna
AU - McCullagh, Martin
AU - Hocky, Glen M.
N1 - Publisher Copyright: © 2025 The Authors. Published by American Chemical Society.
PY - 2025/6/26
Y1 - 2025/6/26
N2 - Our ability to efficiently sample conformational transitions between two known states of a biomolecule using collective variable (CV)-based sampling depends strongly on the choice of the CV. We previously reported a data-driven approach to clustering biomolecular configurations with a probabilistic clustering model termed shapeGMM. ShapeGMM is a Gaussian mixture model in Cartesian coordinates, with means and covariances in each cluster representing the harmonic approximation to the conformational ensemble around a metastable state. We subsequently showed that linear discriminant analysis on positions (posLDA) produces good reaction coordinates to characterize the transition between two of these states, and moreover, they can be biased to produce transitions between the states using metadynamics-like approaches. However, the quality of these posLDA coordinates depends on the amount of data used to characterize the states, and here, we demonstrate the ability to systematically improve them using enhanced sampling data. Specifically, we demonstrate that improved CVs for sampling can be generated by iteratively performing biased sampling along a posLDA coordinate and then generating a new shapeGMM model from biased data from the previous iteration. The new coordinates derived from our iterative approach show a substantial improvement in being able to induce transitions between metastable states and to converge a free energy surface.
UR - http://www.scopus.com/inward/record.url?scp=105008019664&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=105008019664&partnerID=8YFLogxK
U2 - 10.1021/acs.jpcb.5c02164
DO - 10.1021/acs.jpcb.5c02164
M3 - Article
AN - SCOPUS:105008019664
SN - 1520-6106
VL - 129
SP - 6163
EP - 6171
JO - Journal of Physical Chemistry B
JF - Journal of Physical Chemistry B
IS - 25
ER -