Building and analyzing Markov models with PyEMMA 2

Markov (state) models (MSMs) and related approaches are coarse-grained models of molecular dynamics, consisting of distinct molecular structures (conformations) and the transition rates between them. Markov models can systematically reconcile simulation data from either a few long or many short simulations and allow us to analyze the essential metastable structures, thermodynamics, and kinetics of the molecular system under investigation. However, the estimation, validation, and analysis of such models are far from trivial and involve sophisticated and often numerically sensitive methods.

PyEMMA 2 provides accurate and efficient algorithms for kinetic model construction and analysis. PyEMMA is a Python package that runs on all common operating systems and can be used through Python scripts or interactively in IPython notebooks. Its functionality includes:

  • Read all commonly used MD input formats (powered by mdtraj).
  • Featurize your trajectories, i.e. transform Cartesian coordinates into dihedrals, distances, contact maps or custom features.
  • Find the slowest collective coordinates (reaction coordinates) using time-lagged independent component analysis (TICA).
  • Cluster and discretize the state space.
  • Process data either in memory or in streaming mode, so you can work even with very large or numerous trajectories and big molecular systems.
  • Estimate and validate Markov state models (MSMs). Compute their statistical distribution and uncertainty using Bayesian MSMs.
  • Compute long-lived (metastable) states and structures with Perron-cluster cluster analysis (PCCA).
  • Perform systematic coarse-graining of MSMs to transition models with few states.
  • Estimate hidden Markov Models (HMM) and Bayesian HMMs (powered by bhmm).
  • Take advantage of extensive analysis options for MSMs and HMMs, e.g. calculate committor probabilities, mean first passage times, transition rates, experimental expectation values and time-correlation functions (powered by msmtools).
  • Explore mechanisms of protein folding, protein-ligand association or conformational changes using Transition Path Theory (TPT).
  • Plot and visualize your results in paper-ready form.
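
To make the "estimate a Markov state model" step above more concrete, here is a minimal sketch in plain Python. It is not PyEMMA's API or implementation. It only illustrates the basic idea on a toy discrete trajectory: count transitions at a lag time, row-normalize the counts into a transition matrix, and iterate to the stationary distribution. The function names and the toy trajectory are assumptions made for illustration.

```python
def count_matrix(dtraj, n_states, lag=1):
    """Count transitions i -> j observed `lag` steps apart (sliding window)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i][j] += 1
    return counts


def transition_matrix(counts):
    """Row-normalize counts into transition probabilities.

    This is the simple non-reversible maximum-likelihood estimate;
    production tools typically offer more careful estimators.
    """
    T = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("state with no outgoing transitions")
        T.append([c / total for c in row])
    return T


def stationary_distribution(T, n_iter=1000):
    """Approximate the stationary distribution by repeated propagation."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
    return pi


# Toy discrete trajectory over two states (hypothetical data).
dtraj = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0]
C = count_matrix(dtraj, n_states=2, lag=1)
T = transition_matrix(C)
pi = stationary_distribution(T)
print("counts:", C)
print("transition matrix:", T)
print("stationary distribution:", pi)
```

In a real workflow the discrete trajectory would come from the featurization, TICA, and clustering steps listed above, and the lag time would be chosen by validating the model rather than fixed at 1.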

Installation, Documentation and Tutorials: http://pyemma.org

Post issues / participate in development: https://github.com/markovmodel/pyemma

PyEMMA Paper: M. K. Scherer, B. Trendelkamp-Schroer, F. Paul, G. Pérez-Hernandez, M. Hoffmann, N. Plattner, C. Wehmeyer, J.-H. Prinz and F. Noé: PyEMMA 2: A software package for estimation, validation and analysis of Markov models. Journal of Chemical Theory and Computation 11, 5525-5542 (2015)

When to Use Molecular Dynamics - Part 1 of 2

It can be tempting to think of molecular dynamics (MD) as an atomic-level microscope, able to describe arbitrary molecular interactions that are unobservable in the lab. But it’s important to remember that MD is a model that attempts to describe the molecular interactions in a very specific way. There is a particular functional form of the forces between any two atoms, and on top of this, there are many forcefields that parameterize these forces differently.

With this in mind, we need to always ask ourselves: “Can I trust the results of my simulation?” This is a difficult question to answer, but at least for the protein folding simulations that we do in the Pande group, we typically ask, “Are our simulations consistent with some experimental measurements of the same protein?” This can be a useful way to validate our simulations.

In addition to these concerns about the accuracy of the MD model, there are several more practical questions we should ask before beginning any simulation, specifically:

  • Is an MD simulation appropriate for the question that I'm asking?
  • Is an MD simulation possible given my computing resources and the system I am interested in?

I will address the first question here and tackle the second in my next post.

“Is an MD simulation appropriate for the question that I’m asking?”

The power of a simulation is to provide insight that could not be gained in an easier way. Take for instance the field of protein folding. Here, there are numerous experiments that can measure low-dimensional projections of the very complex folding process. Drawing conclusions from these experiments, however, can be difficult since there may be many underlying mechanisms that are consistent with the data. This is where MD can be quite useful: it provides a physical model that can be used to interpret an experiment.  

It may be tempting to turn to simulation as another experimental technique for calculating measurable quantities. For instance, imagine that you’re working with a drug that binds to an enzyme and you’re interested in determining its binding constant, but don’t really care about the mechanism of binding. An experiment should be fully capable of measuring this binding constant, and a simulation would provide no further insight that you’re interested in! More importantly, remember that MD is ultimately just a model for the intrinsic dynamics, so it may turn out that the calculated binding constant is simply inconsistent with the experimental measurement.

The same can be said for other questions where atomic-level detail is not necessary. For instance, consider a researcher who is interested in the relative diffusion constants of the dimeric and trimeric forms of a protein. A simpler model that does not attempt to describe the atomic-level interactions would likely be just as accurate as an MD simulation.
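
As an illustration of what such a simpler model could look like, the sketch below uses the Stokes-Einstein relation, D = k_B T / (6 π η r), and treats each oligomer as a compact sphere whose radius grows as the cube root of the number of subunits. Both the spherical approximation and the 2 nm monomer radius are assumptions made for this example; they are not taken from any particular protein.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def stokes_einstein(radius_m, temperature_k=298.0, viscosity_pa_s=8.9e-4):
    """Diffusion constant (m^2/s) of a sphere; default viscosity is roughly water at 25 C."""
    return K_B * temperature_k / (6.0 * math.pi * viscosity_pa_s * radius_m)


def oligomer_radius(monomer_radius_m, n_subunits):
    """Radius of a compact sphere with n times the monomer volume (assumption)."""
    return monomer_radius_m * n_subunits ** (1.0 / 3.0)


monomer_radius = 2.0e-9  # hypothetical monomer radius of 2 nm
d_dimer = stokes_einstein(oligomer_radius(monomer_radius, 2))
d_trimer = stokes_einstein(oligomer_radius(monomer_radius, 3))
ratio = d_dimer / d_trimer

print(f"D(dimer)  = {d_dimer:.3e} m^2/s")
print(f"D(trimer) = {d_trimer:.3e} m^2/s")
print(f"D(dimer) / D(trimer) = {ratio:.3f}")
```

Under these assumptions the ratio depends only on the subunit counts, (3/2)^(1/3), so the monomer radius, temperature, and viscosity cancel out. A non-spherical oligomer would need a shape correction, but it still would not require atomic-level simulation.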

Next, we need to consider the practicality of running an MD simulation.