Introducing Ensembler

We are happy to announce the release of Ensembler as part of the Omnia suite!

Ensembler is an automated pipeline for generating diverse arrays of protein configurations from omics-scale genomic and structural data. The user selects a set of protein sequences (targets) and a set of protein structures (templates), and each target-template pair is then subjected to comparative modeling and a series of refinement steps. Briefly, these are the stages of the Ensembler pipeline:

1. Target sequence selection - via a search of UniProt, or defined manually
2. Template structure selection - via a search of UniProt, or by specifying PDB IDs, or defined manually
3. (Optional) Template loop reconstruction (using Rosetta)
4. Alignment and comparative modeling (using Modeller)
5. RMSD-based clustering to filter out non-unique models (using MDTraj)
6. Energy minimization and implicit solvent molecular dynamics simulation (using OpenMM; default: 100 ps)
7. Solvation with explicit water (using OpenMM)
8. Explicit solvent molecular dynamics simulation (using OpenMM; default: 100 ps)
9. (Optional) Package and/or compress the models, ready for transfer or set-up on other platforms such as Folding@home
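
To make the idea behind stage 5 concrete, here is a minimal sketch of an RMSD-based uniqueness filter. It is an illustration only and not Ensembler's implementation, which uses MDTraj. The function names, the greedy keep-or-drop strategy, the toy coordinates, and the cutoff value are all assumptions made for this example, and the structures are assumed to be already superposed (no alignment is performed).

```python
import math


def rmsd(a, b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points.

    Assumes the two structures are already superposed; no alignment is done here.
    """
    if len(a) != len(b):
        raise ValueError("structures must have the same number of atoms")
    total = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(total / len(a))


def filter_unique(models, cutoff):
    """Greedy filter: keep a model only if its RMSD to every kept model exceeds `cutoff`."""
    kept = []
    for name, coords in models:
        if all(rmsd(coords, kept_coords) > cutoff for _, kept_coords in kept):
            kept.append((name, coords))
    return [name for name, _ in kept]


# Hypothetical two-atom "models" purely for illustration.
models = [
    ("model_A", [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),
    ("model_B", [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)]),  # near-duplicate of model_A
    ("model_C", [(0.0, 0.0, 0.0), (1.0, 3.0, 0.0)]),  # clearly different
]

# The cutoff of 0.6 is an arbitrary value chosen for this example.
unique = filter_unique(models, cutoff=0.6)
print(unique)
```

With these toy inputs, model_B lies within the cutoff of model_A and is dropped, while model_C is retained as a distinct model.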

Ensembler thus maximizes the use of publicly available sequence and structure data to produce configurationally diverse protein ensembles. These models can then be subjected to further structural analysis or used to seed highly parallel molecular dynamics simulations (e.g., using distributed computing frameworks such as Folding@home). In the latter case, the increased sampling efficiency would be particularly useful for methods such as Markov state models, which can aggregate trajectory data to construct kinetic models of conformational dynamics.

Documentation and installation information can be found at

All source code is available on GitHub at

Our recently submitted manuscript describing Ensembler and its application to modeling all human tyrosine kinases can be found on bioRxiv.