How to Train your Force Field

Molecular dynamics (MD) simulations face two major methodological challenges: sampling the relevant structures and pathways, and the accuracy of the underlying force field. An accurate force field is essential for MD simulations to connect with experiments, and researchers are constantly examining the accuracy of force fields and improving them for the scientific problems that MD is used to model, such as protein folding, ligand binding and conformational change.

The force field is a potential energy function of the atomic positions; it provides the forces for accelerating the atoms in molecular dynamics and the potential energies for sampling from a statistical mechanical ensemble. In comparison to quantum mechanical methods, force fields do not require an explicit treatment of the electrons, so the calculations are millions to billions of times faster.
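As a minimal sketch of how a potential energy function drives the dynamics, the example below takes a one-dimensional harmonic potential, obtains the force as the negative derivative of the energy, and advances a single particle with the velocity Verlet integrator. The potential, the parameter values (unit mass, unit force constant) and the function names are illustrative assumptions and do not come from any particular MD package.

```python
def potential(x, k=1.0, x0=0.0):
    """Toy potential energy: a 1D harmonic well (assumed for illustration)."""
    return 0.5 * k * (x - x0) ** 2


def force(x, h=1e-5):
    """Force is the negative derivative of the potential (central difference)."""
    return -(potential(x + h) - potential(x - h)) / (2.0 * h)


def velocity_verlet(x, v, mass, dt, n_steps):
    """Propagate position and velocity using forces from the potential."""
    f = force(x)
    for _ in range(n_steps):
        v += 0.5 * dt * f / mass   # first half-step velocity update
        x += dt * v                # full-step position update
        f = force(x)               # recompute force at the new position
        v += 0.5 * dt * f / mass   # second half-step velocity update
    return x, v


# Start at rest, displaced from the minimum; the initial total energy is 0.5.
x, v = velocity_verlet(x=1.0, v=0.0, mass=1.0, dt=0.01, n_steps=1000)
total_energy = potential(x) + 0.5 * 1.0 * v ** 2
print(total_energy)
```

With a small time step the total energy should stay very close to its initial value, which is a quick sanity check that the forces and the energy are consistent with each other.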

Some of the most popular force fields today, such as AMBER, CHARMM, OPLS and GROMOS, use a simple functional form first defined by Lifson and Warshel (Reference 1), which is a sum of: (1) simple harmonic potentials for the vibrations of predefined bonds and angles, (2) periodic potentials for torsional (dihedral) rotations, and (3) pairwise interatomic Coulomb and Lennard-Jones potentials for describing nonbonded interactions (Figure 1). In addition, there exists a diverse assortment of force fields with varying levels of detail and domains of applicability; these include detailed polarizable models that describe some aspects of the electrons, and coarse-grained models that use a single particle to represent a group of multiple atoms.
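The three kinds of terms listed above can be sketched as small Python functions. This is only an illustration under stated assumptions: the prefactor conventions (for example a factor of 1/2 in the harmonic terms), the units (kcal/mol, Angstrom, elementary charge) and the approximate Coulomb conversion constant differ between force fields and packages, and the function names are hypothetical.

```python
import math

# Approximate Coulomb constant in kcal*Angstrom/(mol*e^2); assumed unit system.
COULOMB_CONSTANT = 332.0637


def bond_energy(r, k, r0):
    """Harmonic bond stretch about the equilibrium length r0."""
    return 0.5 * k * (r - r0) ** 2


def angle_energy(theta, k, theta0):
    """Harmonic angle bend about the equilibrium angle theta0 (radians)."""
    return 0.5 * k * (theta - theta0) ** 2


def torsion_energy(phi, barrier, n, phase):
    """Periodic torsion term with multiplicity n and phase offset (radians)."""
    return 0.5 * barrier * (1.0 + math.cos(n * phi - phase))


def nonbonded_energy(r, qi, qj, epsilon, sigma):
    """Pairwise Coulomb plus Lennard-Jones energy at separation r."""
    coulomb = COULOMB_CONSTANT * qi * qj / r
    sr6 = (sigma / r) ** 6
    lennard_jones = 4.0 * epsilon * (sr6 ** 2 - sr6)
    return coulomb + lennard_jones


# Example: two neutral particles at the Lennard-Jones minimum, r = 2^(1/6) * sigma,
# where the energy equals -epsilon.
r_min = 2.0 ** (1.0 / 6.0) * 3.4
print(nonbonded_energy(r_min, 0.0, 0.0, epsilon=0.2, sigma=3.4))
```

The total potential energy of a molecule is the sum of such terms over all bonds, angles, torsions and nonbonded atom pairs; the quantities k, r0, theta0, barrier, the charges, epsilon and sigma are the empirical parameters that a force field developer has to choose.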

Figure 1: Summary of interaction terms in a typical biomolecular force field. Reproduced from Reference 2.

The empirical parameters in these force fields, such as bond lengths, force constants, torsional barriers and atomic charges, are carefully fine-tuned by the force field developer to reproduce known experiments and ab initio quantum calculations, with the goal of increasing accuracy and predictive power. This is a highly challenging problem for many reasons, including the following:

1) The force field is a highly approximate description, so any errors resulting from incompleteness of the model are effectively incorporated (or “rolled up”) into the parameters.

2) It is challenging to determine the dependence of simulated observables on the force field parameters, because simulations are expensive and simulated observables are statistically noisy.

3) The parameterization calculations are highly complex and there is no standard workflow for carrying out a force field parameterization project, so results are difficult to reproduce.

These difficult challenges gave rise to the colloquialism, “Nobody wants to know how force fields and sausages are made.” That is, until recently.

Figure 2: The ForceBalance calculation procedure. The calculation begins with an initial set of force field parameters (bottom left), which is used to generate a force field and run simulations using external MD software (upper left). ForceBalance evaluates force field predictions of observables from the simulation data and compares them to saved experimental measurements or quantum calculations to evaluate the objective function and its parameter dependence (upper right). The optimization algorithm determines the next iteration of force field parameters (bottom) and the cycle is repeated until convergence. Reproduced from Reference 3.

ForceBalance (Reference 3) is a method and software package designed from the ground up to address the difficult challenges of force field development; it applies Lifson and Warshel’s fundamental idea of least-squares fitting of parameters to a training data set, but the calculation procedure is made automatic, efficient, systematic and reproducible. ForceBalance creates a framework where any force field parameterization project is carried out by setting up and running the program (Figure 2), much like how MD simulation projects are individual calculations (or sequences of calculations) in OpenMM.
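To make the idea of least-squares fitting concrete, here is a deliberately simple sketch that does not use ForceBalance or its API. It fits the two parameters of a Lennard-Jones potential to a synthetic training set of reference energies by minimizing the sum of squared residuals. Writing the potential as a/r^12 - b/r^6 makes the problem linear in (a, b), so the minimum can be found in one step from the 2x2 normal equations. The reference data, parameter values and function names are all assumptions made for illustration; real parameterization problems are nonlinear, use noisy simulated observables, and require the iterative cycle shown in Figure 2.

```python
def lj_energy(r, a, b):
    """Lennard-Jones energy in the linear form a/r^12 - b/r^6."""
    return a / r ** 12 - b / r ** 6


def objective(params, distances, ref_energies):
    """Sum of squared differences between model and reference energies."""
    a, b = params
    return sum((lj_energy(r, a, b) - e) ** 2
               for r, e in zip(distances, ref_energies))


def fit_lj(distances, ref_energies):
    """Minimize the objective by solving the 2x2 normal equations."""
    xs = [r ** -12 for r in distances]      # basis function multiplying a
    ys = [-(r ** -6) for r in distances]    # basis function multiplying b
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    syy = sum(y * y for y in ys)
    sxe = sum(x * e for x, e in zip(xs, ref_energies))
    sye = sum(y * e for y, e in zip(ys, ref_energies))
    det = sxx * syy - sxy * sxy
    a = (sxe * syy - sxy * sye) / det
    b = (sxx * sye - sxy * sxe) / det
    return a, b


# Synthetic "training data" generated from known parameters (assumed values).
true_epsilon, true_sigma = 0.2, 3.4
true_a = 4.0 * true_epsilon * true_sigma ** 12
true_b = 4.0 * true_epsilon * true_sigma ** 6
distances = [3.2 + 0.2 * i for i in range(15)]
ref_energies = [lj_energy(r, true_a, true_b) for r in distances]

a_fit, b_fit = fit_lj(distances, ref_energies)
sigma_fit = (a_fit / b_fit) ** (1.0 / 6.0)
epsilon_fit = b_fit ** 2 / (4.0 * a_fit)
print(sigma_fit, epsilon_fit, objective((a_fit, b_fit), distances, ref_energies))
```

Because the training data here are noise-free and generated by the same functional form, the fit recovers the original parameters. In practice the reference data come from experiments or quantum calculations that the model cannot reproduce exactly, which is where the choice of objective function and optimization algorithm starts to matter.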

If you’re working on a problem that requires a more accurate force field than is available in the literature, you are encouraged to download ForceBalance and try it for your project. New users are encouraged to visit https://simtk.org/home/forcebalance, follow the link to the GitHub source code repository, and download the development version of the code. If you have questions about how to use the program, please write me a message on simtk.org or post a question to the discussion forum.

References for further reading:

(1) Warshel, A.; Lifson, S. Consistent Force Field Calculations. II. Crystal Structures, Sublimation Energies, Molecular and Lattice Vibrations, Molecular Conformations, and Enthalpies of Alkanes. J. Chem. Phys. 1970, 53, 582.

(2) Levitt, M. The Birth of Computational Structural Biology. Nature Structural Biology 2001, 8, 392-393.

(3) Wang, L.-P.; Martinez, T. J.; Pande, V. S. Building Force Fields: An Automatic, Systematic, and Reproducible Approach. J. Phys. Chem. Lett. 2014, 5, 1885-1891.