Accelerating advanced force fields

The last decade has been incredibly exciting for classical molecular dynamics, as new hardware, such as GPUs (and the Anton machines from D. E. Shaw Research), has extended the reach of MD simulations into the microsecond and even millisecond time scales.

The new hardware needed to reach these longer time scales brought with it the need to rewrite MD software to take advantage of the accelerators.  These rewrites paid dividends, allowing MD simulations to run up to 100x faster than was previously possible, on hardware that is far less expensive and more energy efficient than dedicated supercomputers.

However, these software advances aimed at new accelerator hardware have focused almost exclusively on classical fixed-charge force fields, and advanced force fields like AMOEBA have largely been left out.  While fixed-charge force fields have enjoyed jumps in computational performance of several orders of magnitude since I started working in the field (ca. 2008), the performance of the AMOEBA force field has grown far more slowly.

As is often the case, what makes AMOEBA so much more accurate than fixed-charge force fields in its underlying physics--namely, treating atoms as point multipoles with a polarizable induced dipole rather than a simple fixed charge--also makes it much slower.  For instance, comparing the AMOEBA implementation in the Amber program suite with the fixed-charge Amber force field shows a performance gap of 40-50x, and comparing against the GPU implementation of the fixed-charge code widens that gap by roughly another order of magnitude.  Such performance gaps make it nearly impossible to do satisfactory sampling with the AMOEBA force field, which limits the promise of its improved underlying physics and handicaps efforts aimed at improving its parametrization.
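To see why, it helps to look (schematically) at how the polarization works.  Each atom i carries an induced dipole proportional to the total electric field it feels, and that field includes contributions from every other atom's induced dipole:

$$\boldsymbol{\mu}_i^{\mathrm{ind}} = \alpha_i \left( \mathbf{E}_i^{\mathrm{perm}} + \sum_{j \neq i} \mathbf{T}_{ij}\,\boldsymbol{\mu}_j^{\mathrm{ind}} \right)$$

where $\alpha_i$ is the atomic polarizability, $\mathbf{E}_i^{\mathrm{perm}}$ is the field from the permanent multipoles, and $\mathbf{T}_{ij}$ is the dipole-dipole interaction tensor.  Because each induced dipole depends on all the others, this coupled set of equations has to be solved self-consistently (typically by iterating to convergence) at every time step, on top of evaluating the more expensive multipole-multipole interactions.  A fixed-charge force field skips all of this.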

But with OpenMM, that is changing.  OpenMM boasts the first, and to date only, GPU implementation of the AMOEBA force field.  Its results are indistinguishable from those generated by the reference AMOEBA implementation in Tinker (and OpenMM is used as a backend by Tinker to provide GPU acceleration).  To demonstrate the promise of the AMOEBA implementation in OpenMM's CUDA platform, I've listed some comparative benchmarks below for running the AMOEBA force field with Tinker, pmemd.amoeba, and OpenMM on the Joint Amber-CHARMM (JAC) benchmark.  This system has 23,558 atoms, comprising the dihydrofolate reductase protein (DHFR) in explicit water.

Number of Cores    Tinker (OpenMP)    pmemd.amoeba (MPI)
      2            0.03 ns/day        0.06 ns/day
      4            0.04 ns/day        0.09 ns/day
      8            0.08 ns/day        0.11 ns/day
     16            0.09 ns/day        0.07 ns/day

These implementations have limited scalability, and cannot utilize more than 16 to 32 processors efficiently.  As a result, it would take a standard server almost 25 years to simulate 1 microsecond of MD!  By comparison, many fixed-charge force fields can achieve performance nearing 200 ns/day!
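To put a number on that, take the best rate in the table above (about 0.11 ns/day):

$$\frac{1000\ \text{ns}}{0.11\ \text{ns/day}} \approx 9{,}000\ \text{days} \approx 25\ \text{years}.$$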

Now let's look at OpenMM performance using the CUDA implementation of AMOEBA:

GPU Model    Performance (ns/day)
C2050        0.3886
K5000        0.5233
GTX 680      0.7468
GTX 780      1.1380
GTX 980      2.1467

Using the latest GPU model available (the GTX 980, which costs around $500-$600 USD in the USA), we achieve about a 20x speedup over the maximum performance attainable on a standard server!  The trend across GPU generations is also telling: each incremental step up in GPU model yields a substantial performance gain.
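If you want to try this yourself, the setup through OpenMM's Python application layer is short.  The sketch below is not the exact benchmark script; it is a minimal example assuming a solvated DHFR structure in a hypothetical file named dhfr_solvated.pdb, and it uses the AMOEBA 2013 parameter set shipped with OpenMM.

```python
# Minimal sketch (not the exact benchmark script): AMOEBA on the CUDA platform
# through OpenMM's Python API.  'dhfr_solvated.pdb' is a hypothetical file name
# standing in for your own solvated DHFR structure.
from simtk.openmm import app
from simtk import openmm, unit

pdb = app.PDBFile('dhfr_solvated.pdb')
forcefield = app.ForceField('amoeba2013.xml')   # AMOEBA parameters shipped with OpenMM

# Mutually induced dipoles converged to a tight tolerance; PME handles the
# long-range electrostatics of the periodic, explicitly solvated system.
system = forcefield.createSystem(pdb.topology,
                                 nonbondedMethod=app.PME,
                                 nonbondedCutoff=0.7*unit.nanometers,
                                 polarization='mutual',
                                 mutualInducedTargetEpsilon=1e-5)

integrator = openmm.LangevinIntegrator(300*unit.kelvin,       # temperature
                                       1.0/unit.picosecond,   # friction coefficient
                                       1.0*unit.femtoseconds) # time step
platform = openmm.Platform.getPlatformByName('CUDA')

simulation = app.Simulation(pdb.topology, system, integrator, platform)
simulation.context.setPositions(pdb.positions)
simulation.step(1000)   # a short run; the benchmarks above use much longer trajectories
```

The polarization and mutualInducedTargetEpsilon settings control how tightly the induced dipoles are converged at each step, which is one of the main knobs trading accuracy against speed when running AMOEBA.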

Even better, the AMOEBA implementation has seen numerous optimizations since the release of OpenMM 6.3, and it now runs 10-15% faster than these benchmarks show.  Further gains seem well within reach, but those will be saved for a future blog post.