Analysis of MD: Equilibrium and Dynamics

Imagine that you had access to one very long (or very many somewhat shorter) unbiased MD simulation(s). What would (or should) you do with the dataset; what quantities can be estimated?

First, we can estimate equilibrium expectation values. Each trajectory is, by construction, a realization of a Markov chain whose stationary distribution is the equilibrium distribution of the system. Assuming we're in the canonical (NVT) ensemble, then the equilibrium (Boltzmann) distribution of the system is

$$\pi(x) = \frac{1}{Z} e^{-\beta U(x)}$$

If we index the timesteps of the simulation, $t \in \{1, 2, \ldots\}$, and let $X_t$ be the position at time $t$, then $X_t$ converges in distribution to $\pi$.

$$X_t \xrightarrow{d} \pi$$

Why is this useful? Because it provides the basis by which we can calculate expectation values for the system. If we want to calculate things like the ensemble average distance between various residues, FRET absorbance, X-ray scattering intensities, or other properties that can be well-modeled as an ensemble average over $\pi$ of some (instantaneous) observable $g(X)$, this property guarantees that averages of $g$ over a trajectory will converge to the right answer in the limit that the length of the trajectory goes to infinity.

$$\lim_{t\rightarrow \infty}\frac{1}{T} \sum_t^T g(X_t) = \mathbb{E}_\pi \left[ g(X) \right]$$

But there's potentially much more information in an MD trajectory than just estimates of equilibrium expectation values. Consider questions about the system's dynamics like the following:

1. How long does it take for the system to transition between two particular regions of conformation space?
2. What are the slow dynamical modes in the system? Which degrees of freedom or collective variables in the system take the most time to equilibrate?
3. What predictions can be made about relaxation experiments like temperature-jump IR spectroscopy?

These types of questions concern the dynamics of the system, and cannot truly be answered by the calculation of equilibrium expectation values. Why? If we look at the structure of the mean calculation above, we can see that it treats the individual $X_t$ as if they're exchangeable. This means roughly that given an MD trajectory, we get the same estimate for the equilibrium average of $g$ if we scramble the order of the frames in the trajectory (vs. using them in the proper'' order). All of the temporal structure of the trajectories is thus disregarded.

In an upcoming post, I'll talk about how these types of questions about the dynamics can be posed, and how Markov state models can be efficient estimators for these quantities.

Comment