When to Use Molecular Dynamics - Part 2 of 2

In my previous post, I addressed whether a molecular dynamics (MD) simulation is appropriate for the question being posed. Another question to consider when deciding whether to use MD is:

“Is an MD simulation possible given my computing resources and the system I am interested in?”

This is difficult to answer in general because everyone has different resources at their disposal, but I will describe what has been done successfully in the past to give you a better sense of whether your simulation is feasible.

Not everyone has a supercomputer available to them, but at Stanford University, we use a distributed computing network called Folding@Home [1]. This project allows people around the world to donate computing time on their personal computers to a protein folding project. Armed with this massive resource, we have been able to simulate several small proteins that fold in the microsecond to millisecond regime.

There are two important considerations when gauging how computationally difficult a simulation will be: the number of atoms in the simulation and the timescale of the process you’re interested in simulating. On a state-of-the-art GPU, OpenMM can generate 100 ns / day for a system of 23,000 atoms [2]. This means that if you have a single GPU and want 100 µs of data, you will need to run continuously for about 1,000 days, or nearly three years. That’s an awfully long time… But if you have the resources, it’s definitely possible!
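The arithmetic behind that estimate is easy to repeat for your own system. Here is a minimal sketch; the function name `wall_clock_days` and the `n_gpus` parameter are my own illustrative choices, and the calculation assumes throughput stays constant for the whole run. Note that adding GPUs only helps if you are content with many independent trajectories whose lengths sum to the target, as on Folding@Home; it does not make any single trajectory longer.

```python
def wall_clock_days(target_ns: float, ns_per_day: float, n_gpus: int = 1) -> float:
    """Days of continuous running needed to accumulate target_ns of simulation.

    Assumes constant throughput and, for n_gpus > 1, independent
    trajectories whose lengths are simply added together.
    """
    if ns_per_day <= 0 or n_gpus < 1:
        raise ValueError("ns_per_day must be positive and n_gpus at least 1")
    return target_ns / (ns_per_day * n_gpus)


# The example from the text: 100 µs (= 100,000 ns) at 100 ns/day on one GPU.
days = wall_clock_days(target_ns=100_000, ns_per_day=100)
print(f"{days:.0f} days, about {days / 365:.1f} years")
# prints "1000 days, about 2.7 years"
```

Swap in the ns/day figure you actually measure on your own hardware and system; the 100 ns/day benchmark only applies to a system of roughly 23,000 atoms.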

To put all of this in perspective, we can look at a few of the most recent Folding@Home simulations. Obviously, you may not have access to a resource like Folding@Home, but if your system is larger than anything we’ve studied and its dynamics are slower, then you might want to reconsider using all-atom MD.

Protein       Number of atoms    Total simulation time    Wall-clock time
Protein G     900                50 ms                    ~9 months
NTL9          24,000             3.2 ms                   ~4 months

*Note: Protein G was simulated on GPUs with GROMACS 4.0.3 powered by OpenMM. NTL9 was simulated on GPUs with OpenMM 5.1.
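To get a feel for how much hardware those numbers imply, you can convert each row into an aggregate throughput. The sketch below is a back-of-the-envelope calculation, not a description of how the projects were actually run: the helper names are hypothetical, a month is taken as 30.4 days, and the GPU-equivalent figure for NTL9 assumes every GPU sustained the 100 ns/day benchmark quoted above for a similarly sized system. No comparable benchmark is given here for Protein G, so only its aggregate throughput is computed.

```python
DAYS_PER_MONTH = 30.4  # assumption: average month length


def aggregate_ns_per_day(total_ms: float, wall_clock_months: float) -> float:
    """Combined throughput of all machines, in ns of simulation per day."""
    total_ns = total_ms * 1e6  # 1 ms = 1,000,000 ns
    return total_ns / (wall_clock_months * DAYS_PER_MONTH)


def equivalent_gpus(total_ms: float, wall_clock_months: float,
                    ns_per_day_per_gpu: float) -> float:
    """How many GPUs, each running flat out, would match the aggregate rate."""
    return aggregate_ns_per_day(total_ms, wall_clock_months) / ns_per_day_per_gpu


# Values taken from the table above.
protein_g_rate = aggregate_ns_per_day(total_ms=50, wall_clock_months=9)
ntl9_rate = aggregate_ns_per_day(total_ms=3.2, wall_clock_months=4)
ntl9_gpus = equivalent_gpus(total_ms=3.2, wall_clock_months=4,
                            ns_per_day_per_gpu=100.0)

print(f"Protein G: {protein_g_rate / 1000:.0f} µs/day in aggregate")
print(f"NTL9:      {ntl9_rate / 1000:.0f} µs/day in aggregate")
print(f"NTL9 is equivalent to about {ntl9_gpus:.0f} GPUs running continuously")
```

Under these assumptions, the NTL9 data set corresponds to a few hundred GPUs running around the clock for four months, which is a useful yardstick when comparing against your own resources.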

Final Thoughts

The point of all of this is to encourage you to think critically about running an MD simulation. It’s all too easy to use MD to generate terabytes of essentially useless data. When you’re thinking about setting up a simulation, remember to ask yourself:

1. Will an MD simulation be able to provide insight that an experiment or a simpler model couldn’t?

2. How many atoms are in my simulation? How much total simulation time will I need? Given my resources, how much time will I have to wait before getting the results?

3. MD is a physical model, so can I trust the results of my simulation? Are there any experimental observables in the literature that I can calculate and use to validate my data?



[1] For more information, see: folding.stanford.edu

[2] wiki.simtk.org/openmm/BenchmarkOpenMMDHFR