ProtoMD: A Prototyping Toolkit for Multiscale
Molecular Dynamics
arXiv:1309.6397v6 [physics.comp-ph] 8 Jan 2016
Endre Somogyib,c,∗∗, Andrew Abi Mansoura,c,∗∗, Peter J. Ortolevaa,c,∗
a Department
of Chemistry, Indiana University, Bloomington
of Physics, Indiana University, Bloomington
c Center for Theoretical and Computational Nanoscience, Indiana University, Bloomington
b Department
Abstract
ProtoMD is a toolkit that facilitates the development of algorithms for multiscale molecular dynamics (MD) simulations. It is designed for multiscale
methods which capture the dynamic transfer of information across multiple
spatial scales, such as the atomic to the mesoscopic scale, via coevolving microscopic and coarse-grained (CG) variables. ProtoMD can be also be used to
calibrate parameters needed in traditional CG-MD methods. The toolkit integrates ‘GROMACS wrapper’ to initiate MD simulations, and ‘MDAnalysis’ to
analyze and manipulate trajectory files. It facilitates experimentation with a
spectrum of coarse-grained variables, prototyping rare events (such as chemical reactions), or simulating nanocharacterization experiments such as terahertz
spectroscopy, AFM, nanopore, and time-of-flight mass spectroscopy. ProtoMD
is written in python and is freely available under the GNU General Public License from github.com/CTCNano/proto md.
Keywords: python, molecular dynamics, multiscale, coarse-graining
PROGRAM SUMMARY/NEW VERSION PROGRAM SUMMARY
Manuscript Title: ProtoMD: A Prototyping Toolkit for Multiscale Molecular Dynamics
Authors: Endre Somogyi, Andrew Abi Mansour, and Peter J. Ortoleva
Program Title: ProtoMD
Journal Reference:
Catalogue identifier:
∗ Corresponding
∗∗ Both
author: ortoleva@indiana.edu
authors contributed equally to this work
Preprint submitted to Elsevier
January 11, 2016
Licensing provisions: GPL v3 (or above)
Programming language: python 2.7.3
Computer: x86 / x86 64
Operating system: Linux
RAM: Depends on the size of the system being simulated and duration of the simulation (few MBs to TBs)
Number of processors used: 12 - 128
Supplementary material:
Keywords: python, molecular dynamics, multiscale, coarse-graining
Classification:
Computational Methods
External routines/libraries:
GROMACS, MDAnalysis, GromacsWrapper, numpy, scipy
Subprograms used:
Nature of problem:
Prototyping multiscale coarse-grained algorithms for molecular dynamics
Solution method:
Combining the open-source GROMACS molecular dynamics package and the pythonbased MDAnalysis library for running, debugging, and analyzing multiscale simulations
Restrictions:
The system under study must be characterized by a clear separation of timescales;
otherwise, the multiscale algorithm fails to capture the slowly-varying modes.
Unusual features:
Additional comments:
Running time:
Depending on the problem size, simulations can take few hours to months.
1. Introduction
Supramolecular assemblies of contemporary interest include viruses, vaccine nanoparticles, nanocapsules for drug delivery, nanomaterials, and lightharvesting fibers. These systems have been studied via all-atom [1, 2, 3, 4, 5]
2
and coarse-grained (CG) [6, 7, 8, 9] molecular dynamics (MD). MD software
uses physical concepts and software engineering to make simulations accurate
and efficient via modern hardware (e.g., GPUs and multicore platforms). However, simulating supramillion-atom systems over long time periods still presents
a challenge for MD. As a result, many multiscale coarse-grained MD methods
have been proposed in the past. These methods can be generally divided into
two classes. The first is based on coevolving both the atomistic and CG states
simultaneously [10, 11, 12, 13, 14]; in these methods, the atomic resolution is
never lost. The second class coarse-grains the atomistic structure and evolves
the CG variables based on dynamical equations that depend on parameters to
be determined [15, 16, 17, 18, 19, 20]. These parameters are usually calibrated
from experimental or MD simulation data.
In this work, we present ProtoMD, a multiscale toolkit for prototyping both
classes of multiscale MD algorithms. ProtoMD can be used as a program to
run multiscale simulations or as a library to debug a multiscale algorithm or
perform coarse-grained analysis. ProtoMD uses GROMACS [21] to perform
energy minimization and run MD simulations. This is needed for completing
the atomistic phase in coevolution methods and for calibrating system-specific
parameters for CG-MD methods.
An overview of the theoretical methods on which ProtoMD is based is presented in sec. (2), a discussion on the implementation of these methods and the
design of ProtoMD are provided in sec. (3), and a general user guide is covered
in sec. (4).
2. Theoretical framework
While ProtoMD is a general-purpose toolkit that can be used to prototype
CG-MD methods (if calibration is based on data collected from MD simula-
3
tions), here we emphasize the implementation of multiscale coevolution algorithms using the space-warping method [10, 22] as a coarse-graining technique.
Conventional MD is designed to solve Newton’s equations for a set of N
atoms interacting via conventional or custom force fields. Coevolution methods take advantage of exntensive investment made in the computational and
software engineering of conventional MD codes. This is achieved by integrating
MD into one complete multiscale algorithm that evolves an atomistic system via
MD, followed by coarse-grained dynamics. Coevolution methods have many of
the same goals as conventional MD, primarily to simulate the evolving state of
the N atoms with atomic precision. However, these methods are able to achieve
this with greater efficiency in two distinct ways. The first follows the evolution
of an N atom system along a time course that starts with user specified initial
data. The second is the analogue of ensemble MD where hundreds of MD simulations are followed to provide an assessment of the statistical significance of the
simulation. These two types of simulations (single and ensemble) are achieved
via two distinct algorithms implemented as two branches of the code.
Many-atom systems display cooperative behaviors ranging from mass and
charge density oscillations to structural transitions between crystal phases in
a polymorphic system. One challenge in many-particle physics is to develop a
quantitative understanding about how such behaviors emerge from the Newtonian dynamics of many particles influencing each other via an interatomic force
field such as CHARMM [23] or AMBER [24]. The standard statistical mechanical framework used to address such a challenge is the Liouville equation (LE)
for the dynamics of the N -particle density, ρ(Γ, t), for the 6N particle positions
and momenta, Γ. The pathway to achieving this quantitative understanding
considered here is to unfold the dependencies of ρ on the microstate, i.e., on
both Γ and a set of coarse-grained variables, Φ. Since Φ is constructed from Γ,
4
then Φ obeys the equation of motion Φ̇ = Π, where Π can also be expressed in
terms of Γ through Newton’s equations. With this, and the usual arguments of
probability conservation used to derive the LE, one finds that ρ(Γ, Φ, t) satisfies
the LE. Thus, an extended description of the system state is introduced such
that it includes the microstate (described by Γ), and the mesostate (described
by Φ). At first, this seems like a backward step in the quest for dimensionality reduction. However, this expanded representation facilitates theoretical
developments and an associated conceptual picture, which ultimately imply efficient and accurate computational algorithms. This extended description introduces approximations that enter only via well-founded multiscale perturbation
[25, 26, 11, 13] and Trotter factorization [14, 27] methods. Besides the computational reduction, there are several advantages to our approach. Key features of
the 6N dimensional representation are provided, and codevelopment of a multiscale perturbation technique with the Trotter scheme yields insights into the
physical meaning of the latter. Both methods enable the use of well-established
interatomic force fields directly, avoiding the need to conjecture the form of CG
governing equations.
2.0.1. Multiscale factorization
This technique is employed for the generation of a single N −atom trajectory while an N −atom system simultaneously evolves on multiple scales. This
introduces a computational challenge for a time stepping algorithm, i.e. capturing the slow modes as well as rapidly fluctuating ones restricts the timestep.
The latter determine the time stepping of conventional MD integrators. However, the characteristic time of the collective modes can be orders of magnitude
larger than that of atomic fluctuations. Since the collective behaviors are a
result of many individual atomic motions, the collective and fast-atom modes
are coupled, leading to the slowness of conventional MD. This is overcome in
5
multiscale factorization. Rather than attempting to untangle the collective and
fast particle modes directly, a Trotter factorization strategy is used [28]. In this
approach, the untangling of the collective and fast-atom modes is achieved via
an alternating stepping evolution procedure [14, 27]. In each step, the collective
and fast-particle modes are updated: the former via the collective integration
of the momenta constructed from the MD phase, and the latter by conventional MD. This is justified via the stationarity hypothesis, which states that
the time evolution of the momenta conjugate to the collective variables can be
represented by a stationary process which expresses a representative ensemble
of values over a period of time short relative to the characteristic CG timescale
[14]. Then if the characteristic timescale of the collective behavior is ∆, and
the MD duration needed to generate the representative ensemble of conjugate
momenta states is δ, the theoretical efficiency over MD is expected to be ∆/δ.
2.0.2. Multiscale perturbation
This method generates an ensemble of N -atom trajectories and is therefore
useful as an accelerated trajectory sampling technique, i.e. ensemble MD. Starting from the Liouville equation, Langevin type equations for a variety of CG
variables are derived [25, 29, 26]. The multiscale approach also provides prescriptions for constructing all factors (such as diffusion coefficients and thermalaverage forces) in these CG governing equations, criteria for completeness of the
set of CG variables, and algorithms for efficient simulation of N -atom systems
based on coevolution of CG variables and an ensemble of all-atom configurations
evolving with them [11, 13, 30, 31]. The theoretical speedup over ensemble MD
in this case is also ∆/δ, with δ being the length of the short MD runs constituting the microscopic ensemble needed to advance the mesoscopic state in time
over a period of ∆.
6
2.1. Coarse graining and subsystems
Interatomic forces in many-atom systems induce collective (i.e., coherent)
behaviors which characterize the long space-time dynamics. Identifying these
collective variables is a first step in a multiscale simulation. Mesoscopic systems
also manifest single atomic modes (such as collisions or vibrations) that are
needed for a complete understanding of the system behavior. While a variety
of CG variables have been introduced in the context of many-atom simulations
[32, 33], here the focus is on slowly varying ones [10, 29, 22] since they provide an appropriate starting point for multiscale analysis. The general notion
is to consider a set of CG variables φ, related to the N atom positions r, via
a dimensionality reduction map. Complex systems such as viruses and other
macromolecular assemblies can often be separated into a collection of subsystems. Therefore, it is convenient to apply this dimensionality reduction locally
to each subsystem of the entire structure. In the subsystem decomposition approach, subsystem l has its own set of CG variables, denoted φl . For example,
a virus capsid can often be decomposed into energy-defined capsomers, notably
pentamers or hexamers of a protein (Fig. (1)).
Specifically, let the center of mass of subsystem l be denoted Rl and the
subsystem in which atom i resides be l(i). Let ri be the position of atom i and
si its position relative to Rl(i) , i.e.,
ri = Rl(i) + si .
(1)
Using this notation, one can introduce CG variables describing the position,
orientation, and deformation of each subsystem. For example, using the space
warping method [10, 22], basis functions Ukli are introduced for atom i in subsystem l, where k is the order of the polynomial spatial function used in defining
the CG variables. For example, if Legendre polynomials are used in constructing
7
Figure 1: The capsid-like structure of L1 Human Papilimano virus (HPV) proteins (left) is divided into twelve pentameric subsystems (right), each of which
has its own set of CG variables when subsystem decomposition is used in ProtoMD.
Ukli , then each subsystem is embedded in a cube that contains the subsystem
and within which Legendre polynomials are used to define the CG variables in
terms of a reference atomic-configuration [30]. Let Φkl be the k th CG variable
for subsystem l, then the CG to all-atom map is taken to be a Fourier-like
expansion
si =
X
Φkl Ukli + σi .
(2)
k
The k−sum on the RHS describes the more coherent (i.e. collective) dynamics of
the system while the residual σi accounts for the small scale, stochastic motion
of atom i over-and-above the coherent motion. The basis functions Ukli are
taken to be orthogonal polynomials. The specific relation between Φkl and the
atomic positions is determined by maximizing the information in Φkl . This is
accomplished by minimizing the mass-weighted sum of square residuals [29],
restricted to atoms in the given subsystem l.
As system complexity (e.g., the number of subsystems and their internal
structure) increases, one may increase the number of CG variables, e.g., the
range of the sum in Eq. (2). As the number of CG variables increases, smaller
8
scale features are captured, but the characteristic timescale decreases. This
restricts the timestep used in advancing the CG state. Too coarse a description
might miss a given pathway of, e.g., self-assembly or disassembly. Thus, there
is a trade-off between completeness and numerical efficiency. However, this
strategy has been shown to be successful in arriving at an accurate and efficient
algorithm for simulating bionanosystems [30, 22, 34] since the CG variables guide
the overall dynamics, while much of the atomic-scale structure is captured by
the fine-scale aspect of the coevolution.
The implementation of both multiscale algorithms and the subsystem decomposition outlined here is discussed in the next section that covers the technical
aspects of the design of ProtoMD.
3. Design and implementation
3.1. Strategy
One of the key goals in the design of ProtoMD is to allow users to readily experiment and develop their own CG variable definitions and dynamical
equations. The overall structure of ProtoMD is based on an object-oriented
approach, with strict enforcement that all functions are guaranteed to be side
effect free. Side effect producing functions can modify global variables or input
parameters without the user being aware of it. Such side effects are eliminated
with the approach adopted here, which proved to be practical when dealing with
large sets of numerical data (atomic coordinates, velocities, etc., for supramillion atom systems). All state management is isolated in the top level System
class. The System object contains a number of other objects. However, once
these objects are initialized, they are immutable. For example, the integrators
do not have any internal mutable state, i.e., they simply take an existing atomic
state (from the System.current timestep context) and calculate the next state.
9
3.2. Functions and classes
A system is a set of atoms that can be delineated in a number of ways. We
specify subsystem delineation with an MDAnalysis selection string. MDAnalysis
provides a full atomic selection language comparable to VMD [35] or CHARMM
[1]. Subsystem CG variables are handled with the proto md.subsystems.SubSystem
class. A SubSystem references a specific set of atoms, calculates its associated
CG variables, or conversely it can take new CG variables and construct the
micro state, or an ensemble thereof, consistent with the new CG state. SubSystems are notified by the System of various events such as when the energy of
the atomic state is minimized, the system is equilibrated, or connectivity has
changed. The atoms contained in a SubSystem are stored in an MDAnalysis
AtomGroup object, i.e. each SubSystem contains an AtomGroup.
SubSystems are created by a factory function. The factory creates objects
without specifying the exact class of object that will be created. The essence of
this pattern is to define an interface for creating an object, and let the classes
that implement the interface decide which object to instantiate. Therefore, the
factory method lets a class defer instantiation to subclasses.
A SubSystem factory function must have a signature of def SubsystemFactory(system, selects, *args), where system is a reference to the System object, selects is a list of MDAnalysis Atomselect statements, and args is a list of arbitrary
user specified data. The full textual name of the subsystem factory function is
stored in the database, e.g. proto md.subsystems.RigidSubsystemFactory. The
System object’s initializer dynamically instantiates an instance of the named
factory function and calls it to create a list of SubSystems. This approach to
object creation maximizes the flexibility and the ease with which one can extend
ProtoMD. All one needs to do is provide the textual name of a factory function.
This named function does not need to be part of ProtoMD. The named factory
10
function simply needs to be reference-able in the python path. So this approach
allows one to extend ProtoMD without having to change any ProtoMD source
code. This could prove to be very useful in a situation where one is developing
some new type of CG variable and one does not have administrator privileges,
and ProtoMD is installed in a system path. Users can develop a new SubSystem
object, store it in their home directory, and simply specify the name of their
subsystem factory in the ProtoMD database; ProtoMD will automatically use
their new SubSystem object.
A SubSystem derived object must implement the SubSystem interface. Because the implementation is in python, it is technically not a requirement for
subsystems to be derived from the SubSystem class; they simply must define and implement all the methods defined by the SubSystem class.
The
proto md.subsystems.SubSystem class only defines the interface that subsystem
classes must implement.
• def universe changed(self, universe): changes the universe. This occurs
when the number of atoms has changed (on startup, in the event of solvation or chemical reactions). So,
init
will be called when a new Sub-
System derived object is created, then after the universe is created, it will
call universe changed.
• def frame(self ): notifies the subsystem that a new frame is ready. It should
return a tuple (position, velocity, force) of CG variables for the current
frame. If the CG positions are stored in a multi-dimensional tensor (e.g.,
3 × 3 × 3), then they need to be unfolded and returned as a 1 × 27 row
vector, and similarly for the CG velocity and force variables.
• def translate(self, values): translates all atomic positions given a coarse
grained variable. For example, if the CG variable is the center of mass
of a macromolecule, then translate would be given a 1 × 3 vector which
11
represents how much to shift the center of mass. Next, the subsystem
would simply add this value to all atomic positions. The values argument
is the difference in CG variable space between the current configuration
and a deformed configuration. This is given as an unfolded 1 × n column
vector, and if the CG variables are tensorial, then values needs to be
folded.
• def minimized(self ): is called after the atomic system is minimized; no
return value is required. This method is provided if the CG variables
must be modified in the event of minimization.
• def equilibrated(self ): is called after the atomic system is relaxed (thermalized in isothermal simulations); no return value is required.
• def md(self ): is called after the MD trajectories are processed; no return
value is required.
ProtoMD currently provides two types of SubSystems. The first uses the center
of mass as the CG variable and is denoted proto md.subsystems.RigidSubsystem.
This subsystem serves as a demonstration, because it is probably not useful beyond spherically symmetric systems. A more sophisticated and useful sybsystem
is proto md.subsystems.SpaceWarpingSubSystem, which constructs a set of CG
variables based on an orthogonal polynomial basis set [10, 22]. In principle,
any set of orthogonal polynomials may be used, but currently only Legendre
polynomials are implemented.
3.3. Coevolution integration
The Integrator class performs the time evolution of the system with the
application of the classical propagator Γ(tn+1 ) = eiLt Γ(tn ). The Integrator
currently has two concrete implementations actualizing the physical ideas presented in sec. (2): the LangevinIntegrator implements the concepts presented
12
in sec. (2.0.1) while the FactorizationIntegrator implements those presented in
sec. (2.0.2).
3.4. Key objects and packages
ProtoMD consists of a number objects, the key constituents are discussed
here, and are displayed in Fig. (2).
The principle object is conveniently called the System. This object manages
the state of the simulation and is roughly analogous to OpenGL context (Fig.
(3)). All state variables are contained entirely in the System. Additionally, all
other objects are contained within the System.
system.System
system.System is the top level class, it is a container class
for all the other classes. It is also the single place where ProtoMD handles data
persistence. The System class can be used either for loading and simulation, or
loading and analysis. The System constructor takes the path to a database file,
and a flag indicating the mode to open the file, defaults to “r” - read only. For
running a simulation, it should be “a”. This flag is relevant in that a simulation
can be running in one process (in read / write mode), and another process can
safely open the database in read only mode and not interfere with a running
simulation. This is useful for checking the state of a running simulation.
All simulation data is stored in a single database file, and the System class
provides a programatic interface. All completed timesteps are accessible via the
timesteps property.
3.4.1. State management and data persistence
The present state of a running simulation is contained wholly within the
System.current timestep and System.universe objects.
Molecular simulation programs have typically stored their data in a variety
of proprietary, non-standardized, and incompatible file formats. For example,
13
System
universe
subsystems
timesteps
cg_positions
cg_velocities
cg_forces
current_timestep
begin_timestep()
end_timestep()
translate(cg)
SubSystem
universe_changed(universe)
frame()
translate(cg)
minimized()
equilibriated()
TimeStep
atomic_minimized_positions
atomic_equilibriated_positions
atomic_starting_positions
atomic_final_positions
timestep_begin
timestep_end
cg_positions
cg_velocities
cg_forces
cg_translate
Integrator
system
run()
atomistic_step()
cg_step()
Figure 2: A UML diagram of the key objects and their key attributes in ProtoMD.
GROMACS [2] uses 31 different file formats, some of which are deprecated or
used only as input. Even though most MD file formats are documented, accessing them entails developing a parser for each format. Rather than invent
yet another format, ProtoMD used the standardized and widely supported hierarchical data format version 5 (HDF). HDF is a self-describing file format
designed for the general exchange of numerical data.
All information relevant to a ProtoMD simulation is stored in a single HDF5
database. This approach allows preparation of a database on a completely different computer than the one on which the simulation is intended to be run. All
molecular dynamics topologies, force field parameters, simulation parameters,
coordinates, etc., are stored in the HDF5 database. One could even start a
simulation on one compute cluster, run for a while, and restart the simulation
on a different cluster by simply copying a single file.
14
Considering the environment in which MD simulations are run, a compute
cluster is sometimes unreliable. Compute nodes and data storage frequently
crash or are rebooted. Programs are often terminated by the job scheduler.
Flaws or faults in external programs become evident, and countless other environmental conditions can cause termination of executing programs. Molecular
dynamics simulations produce a considerable amount of data. ProtoMD also
produces and uses a good deal of disparate data, ranging from simulation parameters, archived molecular dynamics input files, and atomic trajectory time
series. Therefore, an efficient, fault-tolerant data storage system which readily
allows restarts is a key, if not the core, requirement.
For on disk file storage, the HDF5 file format was chosen. HDF stands for
“hierarchical data format.” HDF5 is an open standard file format for the high
performance storage and retrieval of an unlimited variety of datatypes. HDF5
provides a library that allows access in a variety of languages. In this sense,
it can be thought of as similar to XML. However, XML is plain text (though
it can be compressed), it is nonetheless inefficient for storage of large amounts
of numerical data. HDF5 can can be accessed natively in a variety of different
mathematical and analysis packages such as Mathematica, MATLAB, and R.
A data-persistence model was chosen that is similar to that of most graphics
libraries such as Cairo, OpenGl, Cocoa, or Windows. All of these libraries
function essentially the same way. One creates or obtains a graphics context for
an appropriate output device, be it the screen, a printer, or an image buffer.
The graphics context is a data structure which maintains information about the
current drawing state. Then there are drawing functions, such as drawRect,
drawPoly, and so forth, which add graphics primitives to the context. In the
case of OpenGl, one adds a sequence of 3D information to the context. The key
point here is that all of these drawing operations operate on the data structure,
15
void d i s p l a y ( ) {
g l B e g i n ( GL LINES ) ;
glVertex3f (1.0 , 1.0 , 0 . 0 ) ;
g l V e r t e x 3 f ( −1.0 , −1.0 , 0 . 0 ) ;
glEnd ( ) ;
glFlush ( ) ;
Figure 3: A sample OpenGL code that displays a line on the screen. Objects in
ProtoMD manage the state of the simulation in a similar way to that employed
in OpenGL.
and not the output device directly. Once all of the drawing operations are
complete, then the graphics context is passed to the output device in a single
atomic operation. In contrast, if one were to output each drawing operation
immediately to the output device, it would be very cost prohibitive as that
output device must be locked, data written to it, and then the device must be
unlocked again. It would be a similar case if every function that produced some
relevant output information wrote that information directly to disk and flushed
the IO buffers. Therefore, all operations that affect the state of a TimeStep
write to the TimeStep data structure, which is a memory resident object. The
TimeStep object is created by the System.begin timestep(), accessed with
the System.current timestep, and atomically persisted to disk and deleted
from memory via System.end timestep().
The TimeStep object behaves as a data structure, however, it is actually
implemented as an HDF5 data set. The reason for this is that the data stored
in a TimeStep object can potentially be huge. Therefore, the underlying HDF5
implementation determines how to best store this data, e.g., all data directly in
memory, or some portions stored on disk.
3.5. Simulation
The command-line interface of ProtoMD can be accessed via python -m
proto md. This front end program is intended for initial database generation
16
and running the simulation. There is an extensive built in help system accessible
via: python -m proto md –help. Available options will likely change in the
future, so it is best to consult the help system for the latest complete up-to-date
information.
This subsection will describe how a simulation is performed from both implementation and user perspectives.
3.5.1. Initial database creation
The first step in setting up a simulation is the creation of a database. We
have provided a command line program that front ends the config.create sim(...)
function. This function (program) reads a set of parameters such as simulation
parameters, subsystem definition and atomic structure files. It can optionally
generate a topology (using pdb2gmx), perform a series of validations, and finally
package everything up into a ProtoMD database file, from which a ProtoMD
simulation can be run.
3.5.2. Stand-alone programs and libraries
ProtoMD can be used either as a stand alone program, or as a library.
There are many instances where it is more convenient to use ProtoMD as a
program, e.g. submission of simulations to a queuing system, or quick and
simple analysis tasks. The stand alone program interface to ProtoMD is made
possible by python’s module mechanism, i.e. when a package is installed with
a
main .py file.
3.5.3. Interfacing with molecular dynamics programs
ProtoMD requires external MD programs to perform a number of calculations, and it interfaces with external MD programs using a very transparent approach; ProtoMD handles all interaction with external programs through context
17
managers. All of the MD related functionality is contained in the proto md.md
package.
Programmatically, interfacing with external MD programs is frequently fraught
with error primarily due to the large number of input files that these MD programs require to be present on the file system. Most MD packages do not
provide an application program interface (API), and are intended to be userinvoked programs. This presents difficulty to calling programs, as they must
determine the current state of the file system, copy and convert large portions
of internal state into MD input files, call the MD program, parse the output,
and finally clean up after the MD program is run. Significant sources of error
arise when the MD program reads state from locations that the calling program
is unaware of, for example, external configuration files, and force field parameter
files. Furthermore, a more common source of error is the MD program reading
previous input files that perhaps were not deleted properly. In effect, most of
these errors arise from the fact that input state required by MD can be scattered
throughout the file system. This can be analogous to a programming language
with no variable scoping, i.e. all variables are global: in such situations, one
loses the concept of functions, and all subroutine calls consist purely of sideeffects. Yet another error occurs when concurrent copies of the calling program
are run: if these programs have to use the file system to pass state to an MD
program, they will overwrite each others’ file system state.
To circumvent or sequester these errors, ProtoMD generates temporary, randomly named directories into which the requisite MD input state variables are
serialized, the necessary files are copied, and various GROMACS programs are
run via GROMACS wrapper [36]. The use of temporary randomly named directories to store MD state minimizes the possibility of using other files or input
that the calling program is unaware of. These temporary directories only ex-
18
ist for the lifetime of the MD program call. In effect, we have created stack
frame, the set of information required for, and existing only for the lifetime of
procedure call.
The lifetime management of these temporary file system based variables is
implemented using a Resource Acquisition Is Initialization (RAII) idiom [37].
This is a programming idiom, widely attributed to Bjarne Stroustrup, which
states that a class with a constructor that acquires the resource (either by
creating it, or obtaining it via a parameter) has a destructor that always releases
the allocated resource. During the object lifetime, the resource may be accessed
via instance variables, and is guaranteed to be freed when the object is either
deleted or goes out of scope. RAII is a widely used pattern in C++: it is a
safe way to deal with resources and also makes the code much cleaner as it
eliminates the need to mix error handling code with functionality. This idiom
was traditionally not used in non-deterministic, garbage collected languages like
python. In C++, the object is destroyed when it either goes out of scope (if
stack allocated), or deleted (if heap allocated). In python, however, there is no
explicit control over when the object is destroyed. Therefore, if it contained a
resource, that resource may potentially exist for the lifetime of the program.
This limitation was overcome with PEP343, where context managers, provide
enter () and
exit () methods that are explicitly invoked on entry to and
exit from the body of the with statement. For instance, the open statement
returns a context manager containing an open file handle, it is kept open as
long as the execution is in the context of the with statement where it is used,
and closed when execution leaves the block, irrespective of whether the exit was
because of an exception or during regular control flow. The with statement can
thus be used in ways similar to the RAII pattern in C++: some resource is
acquired by the with statement and released when the with block is left.
19
ProtoMD wraps all external MD calculations into a set of functions which
return a proto md.md.MDManager object. This is a context manager which
holds all of the file system objects required for, and returned by, MD programs
in the file system directory. Calling functions can access any of the MD results
through a dictionary interface. For example, the function md.solvate(...)
accepts an MDAnalysis.Universe object, along with other parameters, creates a
random temporary directory, write the contents of the universe object, performs
a solvation using external GROMACS programs, and finally returns a context
manager containing this directory with the solvated structure and topology files.
Calling functions can access these files with the “struct” and “top” keys, i.e.
with md. s o l v a t e ( . . . ) a s s o l :
d o s o m e t h i n g ( s o l [ ” s t r u c t ” ] , s o l [ ” top ” ] )
As soon as the with block is exited, the context manager deletes the temporary
directory, and all its contents. Therefore, we have eliminated the chance that
subsequent calls to external MD programs could pick up input files that were
left in this directory. Furthermore, as each call to an MD function creates a
unique and random directory, multiple copies of ProtoMD may be concurrently
executed.
3.5.4. Solvation
The CG variables used in ProtoMD are functions of atomic coordinates of
the macromolecule of interest (although in principle, one could define the solvent as a subsystem). In MD simulations, the solvent atoms typically account
for at least 50% to 75% by particle count of the total system. After the end
of each CG timestep, ProtoMD extracts the macromolecule, which is then resolvated using Gromacs, and counter-ions (by default NaCl) are added to the
system. The reason resolvation is done is to avoid bad atom contacts between
20
the macromolecule and the solvent. In the case of simulations done in vacuum
or using implicit solvent models, solvation is omitted.
4. User guide
The purpose of this section is to describe how users can launch and analyze
multiscale simulations via the ProtoMD platform. ProtoMD can be used as an
executable module, i.e. a program with command-line arguments. In this mode,
a GROMACS compatible structure input file can be used to initiate a multiscale simulation. The user must also supply GROMACS-specific parameters
for running MD simulations (used for the microscopic phase), and ProtoMDspecific parameters needed for the mesoscopic phase wherein the CG variables
are advanced in time and new microstates are generated. The output is an
hdf file that contains information including the trajectory at user-specified CG
time steps. In the second mode, ProtoMD can be used as a library containing
modules that can be assembled into programs; this mode facilitates debugging
runs and analyzing results.
4.1. Genaral concepts
ProtoMD is built on the concept that everything related to a simulation
(including configuration parameters and output) is stored in a single database
file. Most traditional MD programs rely on a large number of configuration
and parameter files. Even though having a number of files in different locations
makes it relatively easy for users to modify these files, it is more convenient to
reproduce a single input file when rerunning a simulation at either a later date
or on another computer.
Since ProtoMD is a wrapper around GROMACS, it requires GROMACSspecific files. When a simulation is first prepared, all these external GROMACS
configuration files are collected and stored in the ProtoMD database file. These
21
files are automatically extracted and passed to GROMACS when the simulation
is run. This ensures that results are always reproducible because there are
no external dependencies. Furthermore, this approach facilitates moving or
restarting simulations on different computers. Subsection 4.3 shows how to use
the ProtoMD command line interface to prepare an initial database.
One can iterate through the output trajectory frames in a very similar manner to that used in MDAnalysis [38]. Details of how one can programmatically
access the ProtoMD database including trajectory analysis is covered in section
4.4.
Nearly all ProtoMD objects and methods have descriptive docstrings. These
strings form the contents of the interactive python help system, and it can be
accessed by typing the name of the class or method followed by a “?” in the
interactive python interpreter.
4.2. Selections
A key concept in ProtoMD is atomic selections. Atomic selections are a
group or subset of atoms selected from the whole system based on some criteria.
They are used throughout ProtoMD, particularly in defining subsystems (sec.
2.1). MDAnalysis [38] provides a selection language with a syntax similar to
that of CHARMM [1] or VMD [35]. One can think of this atomic selection
language as “SQL for atoms”. The atomic selection language provides a way
to select atoms based on a variety of criteria such as boolean combinations of
simple keywords, geometric or connectivity measures, index, name, residue ID,
or segment names. Examples on selection statements using MDAnalysis are as
follows:
# s e l e c t a l l DMPC l i p i d atoms
u n i v e r s e . s e l e c t A t o m s ( ” s e g i d DMPC” )
# s e l e c t a l l atoms w i t h i n 5 . 0 Angstroms from ( 0 , 0 , 0 )
22
universe . selectAtoms ( ” point 0.0 0.0 0.0 5.0 ” )
# s e l e c t a l l atoms b e l o n g i n g t o r e s i d u e s 1 t o 10
universe . selectAtoms ( ” r e s i d 1:10 ” )
In ProtoMD, the user creates a list of selections to define the subsystems.
These selections must not overlap, however, because each subsystem must constitute one or more segment(s). For example, it is natural to define the subsystems at the pentamer or hexamer level for a virus-like particle (Fig. (1)).
4.3. Preparing a system
ProtoMD can be run from the command line as follows: python -m proto md
command [args], where command can be one of the following:
• config: creates a database from which a multiscale simulation can be run
• top: auto-generates a topology file
• solvate: attempts auto-solvation
• run: runs or resumes a simulation
• runsol: like run, but performs only the auto-solvation step
• mn: performs energy minimization
• mneq: performs energy minimization followed by equilibriation
• mneqmd: performs energy minimization, equilibriation, and molecular
dynamics
• eq: performs equilibriation
• atomistic step: performs a full atomistic step (microphase)
• step: performs a single complete coarse-grained step (mesophase)
23
• md: performs only an MD step with the starting structure
• cg step: performs only the coarse grained portion of the time step
4.3.1. Configuring the system
The first step in running a multiscale simulation is to create the initial
database. This can be accomplished by using the following commands: python
-m proto md config [args] where args can be one of the following:
• -o fid FID name of the simulation file to be created.
• -struct STRUCT starting structure name.
• -box BOX BOX BOX x,y,z values of the system box in Angstroms. If box
is not given, the system size is read from the CRYST line in the structure
pdb.
• -top TOP name of the topology file. If this is not given, ProtoMD attempts
to generate a topology file using pdb2gmx.
• -posres POSRES name of a position restraints file, optional.
• -I INCLUDE DIRS name of the directories to search for topology file
includes. There can be many additional include directories, just like gcc,
but UNLIKE GCC, there must be a space between the -I and the dir, for
example -I /home/foo -I /home/foo/bar.
• -temperature TEMPERATURE is the temperature (in Kelvin) at which
to run the simulation. The default temperature is 300 K.
• -subsystem factory SUBSYSTEM FACTORY is a fully qualified function
name which can create a list of subsystems. The function defaults to
proto md.subsystems.SpaceWarpingSubsystemFactory which uses the space
24
warping method for constructing the CG variables for each subsystem.
Currently, only Legendre polynomials are supported.
• -subsystem selects SUBSYSTEM SELECTS [SUBSYSTEM SELECTS ...]
is a list of MDAnalysis select statements, one for each subsystem. For example, if the system is divided into 10 subsystems, subsystem selects takes
a list of length 10, with each entry in the list defining a given subsystem.
• -anion anion name to use for ionization (if system is not neutral). The
default anion is Cl− .
• -cation cation name to use for ionization (if system is not neutral). The
default cation is Na+ .
• -concentration concentration (in mol/L) of ions to use for ionization. The
default is 0.15 M.
• -subsystem args SUBSYSTEM ARGS [SUBSYSTEM ARGS ...] is a list
of additional arguments passed to the subsystem factory, the second item
of the list may be the string resid unique, which creates a separate subsystem for each residue. The most commonly used subsystem is SpaceWarpingSubsystemFactory. The args for this subsystem are [kmax, OPTIONAL(“resid unique”)]. kmax provides the upper bound on the sum
of the three Legendre indices, and the last argument is the optional string
“resid unique” to make a unique subsystem for each residue.
• -integrator INTEGRATOR is a fully qualified name of the integrator function, defaults to proto md.integrators.LangevinIntegrator, and the other
integrator provided is proto md.integrators.FactorizationIntegrator.
• -integrator args INTEGRATOR ARGS are additional arguments passed
to the integrator function.
25
• -cg steps CG STEPS is the number of coarse grained time steps taken
over the duration of a multiscale simulation.
• -dt DT is the size of the coarse grained time step in picoseconds.
• -mn steps MN STEPS is the number of MD steps used in performing
energy minimization.
• -md steps MD STEPS is the number of MD steps used for performing the
MD run of the microphase
• -multi MULTI is the number of parallel MD runs used to construct microstate ensembles for the Langevin integrator.
• -mn args MN ARGS is a dictionary of GROMACS-specific (mdp) parameters used in minimization.
• -md args MD ARGS is a dictionary of GROMACS-specific (mdp) parameters used in MD.
• -ndx NDX is the name of GROMACS-specific index file.
• -solvate is a boolean flag. If it is set and the system is to be auto-solvated,
the initial structure must not contain any solvent. This flag default to
False. To enable solvation, add ‘-solvate’ with no args.
• -debug enables debug mode (all simulation directories are not deleted).
• -mainselection MAINSELECTION is the name of make ndx group which
is used for the main selection. This should be a group that consists of the
non solvent molecules. “Protein” usually works when simulating protein.
However, another selection command is required when simulating lipids.
To find which selection to make, either run ProtoMD config, and an error
26
will pop up informing the user of the available groups, or run make ndx
to get a list of possible selections.
4.3.2. Examples
Algorithms on which ProtoMD is based has been demonstrated for a number
of systems [11, 14, 34, 27, 39]. The following example illustrates the use of the
ProtoMD command line interface to generate a simulation database. In the
following example, a database file named “1LFG-open.hdf” is created using
the starting structure of Lactoferrin, “1LFG-open.gro”. ProtoMD generates a
single-trajectory simulation (multi 1) for a series of 30 CG time steps, each of
length 1 ps (dt 1). The order of Legendre polynomials to be used is less than or
equal to 1 (subsystem args 1), which implies a total of 4 × 3 CG variables are
used to coarse-grain the protein. Furthermore, this example generates a single
subsystem which is specified by the “not segid SOL” selection statement. This
statement specifies that the subsystem should consist of all the atoms present
in the system which are not solvent atoms (i.e., protein atoms). Note that here
no topology file is specified; in this case, ProtoMD uses the ‘pdb2gmx” program
to generate a topology file, which is stored in the hdf database.
4.4. Accessing the database and analyzing simulations
proto md.System object is the main class in ProtoMD responsible for loading
and accessing the underlying hdf5 database, and for orchestrating the various
MD programs used to advance the simulation. The System object allows one
to open the underlying hdf5 database and access all pertinent data as native
python objects. One can create a System object in either read/write or readonly mode. One typically creates a System object in read-only mode to analyze
and access simulation results, and only the actual running simulation creates
the System object in read/write mode. There can be multiple instances of the
27
#! / b i n / sh
python −m proto md c o n f i g \
−o 1LFG−open . hdf \
−s t r u c t 1LFG−open . g r o \
−t e m p e r a t u r e 3 0 0 . 0 \
−s u b s y s t e m s e l e c t s ” not s e g i d SOL” \
−c g s t e p s 30 \
−dt 1 \
−mn steps 5000 \
−md steps 500 \
−m u l t i 1 \
−i n t e g r a t o r \
proto md . i n t e g r a t o r s . F a c t o r i z a t i o n I n t e g r a t o r \
−s u b s y s t e m f a c t o r y \
proto md . s u b s y s t e m s . SpaceWarpingSubsystemFactory \
−s u b s y s t e m a r g s 1
System object being open in the same database. For example, a simulation can
be currently running (on another process or even another node in a compute
cluster), and a user can create a System object to access the simulation. Provided the user creates the System object in read-only mode, it will not interfere
with the currently running simulation.
The System object has a number of important properties and methods. However, from an analysis perspective the most important is the System.timesteps
property. This is stored in the database and a series of hdf5 groups, but is presented to the user as a list of Timestep objects. The Timestep object provides
access to all of the calculated variables in a CG step as Numpy arrays, and the
Timestep object can even create an MDAnalysis Universe object from the CG
timestep data.
The Timestep object provides the following properties:
• atomic minimized positions
• atomic equilibriated positions
28
• atomic starting positions
• atomic final positions
• timestep begin
• timestep end
• cg coords
• cg velocities
• cg forces
• cg translate
The Timestep object is also capable of writing itself to the hdf5 database. This
is accomplished with the Timestep.flush() method. Writing to the database is
normally only performed by a simulation, and will fail if the System was opened
in read-only mode.
For analysis, the Timestep object provides a useful “create universe()” method,
which creates an MDAnalysis Universe from the timestep object. Once the
Universe is created, it can be loaded with the atomic positions listed above.
Syntactically, it would have been simpler if each timestep possessed a universe
property; however, the implementation of such a system would have entailed
either creating a new universe each time the universe property is called (computationally expensive), or create a caching system. It is relatively simple to
first obtain a universe object then set the atomic properties during the course
of iterating over the timesteps.
4.4.1. Examples
This section provides an example on analyzing a ProtoMD trajectory using
the universe object and the timesteps.
29
• Radius Of Gyration: This example first obtains a universe object, then
sets the atomic positions with the values stored in each timestep. Finally,
it performs the calculation and appends the value to a list.
# i m p o r t t h e proto md p a c k a g e
import proto md
# c r e a t e a system from t h e f i l e ‘ m y s i m u l a t i o n . h d f ’
s y s = proto md . System ( ‘ m y s i m u l a t i o n . hdf ’ )
# c r e a t e a new u n i v e r s e o b j e c t , t h i s i s now
# s e t to the i n i t i a l c o o r d i n a t e s
u = s . universe
# c r e a t e a new empty l i s t
l = l i s t ()
# i t e r a t e over the timesteps
for ts in sys . timesteps :
# s e t t h e u n i v e r s e atomic p o s i t i o n s
# to the values stored in the database timestep .
u . atoms . p o s i t i o n s = t s . a t o m i c f i n a l p o s i t i o n s
# c a l c u l a t e the r adius of gyration
r o g = u . atoms . r a d i u s O f G y r a t i o n s ( )
# append r o g t o t h e l i s t
l . append ( r o g )
• Radius Of Gyration (alternative version): perhaps a more ‘pythonic’ method
of calculating the radius of gyration (or any other metric) would be to create a higher order function once the universe is obtained and populate it
with the atomic positions and return it. In this approach, list comprehension can be used to create a list of values using a single list comprehension
statement:
# i m p o r t t h e proto md p a c k a g e
import proto md
30
# c r e a t e a system from t h e f i l e ‘ m y s i m u l a t i o n . h d f ’
s y s = proto md . System ( ‘ m y s i m u l a t i o n . hdf ’ )
# c r e a t e a new u n i v e r s e o b j e c t , t h i s i s now
# s e t to the i n i t i a l c o o r d i n a t e s
u = sys . universe
# f u n c t i o n which t a k e s an a r r a y o f atomic p o s i t i o n s ,
# s e t s t h e u n i v e r s e atomic s t a t e and r e t u r n s t h e
# universe object .
def u f ( pos ) :
# s e t t h e u n i v e r s e ’ s atomic p o s i t i o n s
u . atoms . p o s i t i o n s = pos
return u
# c a l c u l a t e the value in a s i n g l e l i s t statement .
r o g = [ u f ( t s . a t o m i c f i n a l p o s i t i o n s ) . atoms . radiusOfG−
y r a t i o n ( ) f o r t s in s y s . t i m e s t e p s ]
• Distance Between Proteins: This example is taken directly from the MDAnalysis website, but here MDAnalysis is used to analyze a coarse grained
time series. In this example, the end to end distance of a protein and
the radius of gyration are calculated. This is assuming that the protein in question has the segment identifier “S4AKE”. This example uses
MDAnalysis atom groups. Note, any analysis of ProtoMD timesteps performed using MDAnalisys virtually identical to conventional MD analysis,
the only difference being that one must to set the atomic positions from
the ProtoMD timestep values.
# i m p o r t t h e proto md p a c k a g e
import proto md
# c r e a t e a system from t h e f i l e ‘mySim . h d f ’
s y s = proto md . System ( ‘ mySim . hdf ’ )
# c r e a t e a new u n i v e r s e o b j e c t , t h i s i s now
# s e t to the i n i t i a l c o o r d i n a t e s
u = s . universe
31
# t h e f i r s t n i t r o g e n atom i n t h e 4AKE segment .
nterm = u . s4AKE .N [ 0 ]
# l a s t carbon atom i n t h i s segment
cterm = u . s4AKE . C[ −1]
# a s e l e c t i o n ( a AtomGroup )
bb = u . s e l e c t A t o m s ( ‘ p r o t e i n and backbone ’ )
# iterate
f o r t s in
#
#
#
t h r o u g h a l l frames
sys . timesteps :
s e t t h e u n i v e r s e atomic p o s i t i o n s
t h i s i s t h e o n l y d i f f e r e n t a s p e c t from
c o n v e n t i o n a l MD time s e r i e s a n a l y s i s .
u . atoms . p o s i t i o n s = t s . a t o m i c f i n a l p o s i t i o n s
# end−to−end v e c t o r from atom p o s i t i o n s
r = cterm . pos − nterm . pos
# end−to−end d i s t a n c e
d = numpy . l i n a l g . norm ( r )
# AtomGroup i s a ‘ l i v e ’ o b j e c t , i t i s u p d a t e d
# each time new c o o r d i n a t e s a r e l o a d e d .
r g y r = bb . r a d i u s O f G y r a t i o n ( )
print ” frame = %d : d = %f Angstrom , Rgyr = ” \
”%f Angstrom ” % ( t s . frame , d , r g y r )
4.5. Running simulations
Once an hdf file is created, a multiscale simulation is initiated by invoking:
python -m proto md run file.hdf [-debug]. If the user supplies the optional parameter -debug, ProtoMD will not delete any of the directories created to run
MD. This is useful for debugging and tracking problems with unstable simulations.
As an example on running a multiscale simulation using the Trotter integrator, pertussis toxin (PDB ID: 1PRT), a protein-based AB5-type exotoxin [40]
32
was simulated under NVT conditions at 300K. The CHARMM27 force field [23]
was used in explicit solvent using the TIP3P model [41]. NaCl counter-ions
of concentration 0.15 M were added for charge neutrality. The system consisted of 603, 775 atoms in a box of dimensions 16nm × 16nm × 24nm. The CG
timestep was set to 0.5 ps with the micro (MD) phase taken to be 100 fs and
the equilibriation time 50 fs. To prepare the system, the following script was
run:
#! / b i n / sh
python −m proto md c o n f i g \
−o 1PRT. hdf \
−s t r u c t 1PRT. g r o \
−t e m p e r a t u r e 3 0 0 . 0 \
−s u b s y s t e m s e l e c t s ” p r o t e i n ” \
−c g s t e p s 7400 \
−dt 0 . 5 \
−mn steps 0 \
−e q s t e p s 50 \
−md steps 100 \
−t o p a r g s { ’ f f ’ :
’ charmm27 ’ } \
−m u l t i 1 \
−i n t e g r a t o r \
proto md . i n t e g r a t o r s . F a c t o r i z a t i o n I n t e g r a t o r \
−s o l v a t e \
−s u b s y s t e m f a c t o r y \
proto md . s u b s y s t e m s . SpaceWarpingSubsystemFactory \
−s u b s y s t e m a r g s 1
The last argument ‘subsystem args 1’ signifies that only first order Legendre
33
polynomials are used in constructing a total of 4×3 space-warping CG variables
(sec. 2.1). Once the hdf file 1PRT.hdf was created, a multiscale simulation was
initiated with the following command:
python −m proto md run 1PRT. hdf
The simulation performs a series of 7, 400 CG steps, writing the minimized,
equilibriated, and extrapolated coordinates of all protein atoms at the end of
each CG timestep. The protein undergoes a conformational change over this
time period, which is visualized by plotting the root-mean-square deviation
(RMSD) of the protein atomic positions with the reference configuration set
to the protein positions at t = 0 ns. For comparison, two MD runs were also
done (starting from the same initial conditions) at the same thermodynamic
conditions.
import proto md
import numpy a s np
from MDAnalysis . a n a l y s i s . a l i g n import ∗
s y s = proto md . System ( ‘ 1PRT. hdf ’ )
# c r e a t e a new u n i v e r s e o b j e c t , t h i s i s now
# s e t to the i n i t i a l c o o r d i n a t e s
mob = s . u n i v e r s e
# create a reference
r e f = proto md . System ( ‘ 1PRT. hdf ’ )
f o r t s in s y s . t i m e s t e p s :
# a l i g n the mobile to the reference s t r u c t u r e
34
a l i g n t o (mob , r e f , s e l e c t=” p r o t e i n ” , m a s s w e i g h t e d=True )
# s a v e rmsd t o f i l e
r m s d l i s t . append ( rmsd (mob , r e f ) )
# w r i t e rmsd t o f i l e
r m s d l i s t = np . a r r a y ( r m s d l i s t )
np . s a v e t x t ( ’ rmsd . dat ’ , r m s d l i s t )
15
RMSD (Å)
10
5
MF
MD1
MD2
0
0
1
2
Time (ns)
3
4
Figure 4: Variation in the RMSD of pertussis toxin protein as a function of
time, using multiscale factorization (MF) and molecular dynamics (MD1 and
MD2).
5. Acknowledgments
This research was supported in part by the following: the NSF INSPIRE
program (grant 1344263), the NSF division of material science research (grant
35
1533988), and the Indiana University information technology services (UITS)
for high performance computing resources.
References
[1] Bernard R. Brooks, Robert E. Bruccoleri, Barry D. Olafson, David J.
States, S. Swaminathan, and Martin Karplus. Charmm: A program for
macromolecular energy, minimization, and dynamics calculation. J. Comput. Chem., 4:187–217, 1983.
[2] David van Der Spoel, Erik Lindahl, Berk Hess, Gerrit Groenhof, Alan E.
Mark, and Herman J. C. Berendsen. Gromacs: Fast, flexible, and free. J.
Comput. Chem., 26:1701–1718, 2005.
[3] James C. Phillips, Rosemary Braun, Wei Wang, James Gumbart,
Emad Tajkhorshid, Elizabeth Villa, Christophe Chipot, Robert D. Skeel,
Laxmikant Kale, and Klaus Schulten. Scalable molecular dynamics with
namd. J. Comput. Chem., 26:1781–1802, 2005.
[4] S. Plimpton. Fast parallel algorithms for short-range molecular dynamics
rigid bodies. J. Comp. Phys., 117:1–19, 1995.
[5] David A. Pearlman, David A. Case, James W. Caldwell, Wilson S. Ross,
Thomas E. Cheatham III, Steve DeBolt, David Ferguson, George Seibel,
and Peter Kollman. Amber, a package of computer programs for applying
molecular mechanics, normal mode analysis, molecular dynamics and free
energy calculations to simulate the structural and energetic properties of
molecules. Comput. Phys. Commun., 91:1–41, 1995.
[6] Robert E. Rudd and Jeremy Q. Broughton. Coarse-grained molecular dynamics and the atomic limit of finite elements. Phys. Rev. B, 58:5893–5896,
1998.
36
[7] Amy Y. Shiha, Anton Arkhipov, Peter L. Freddolino, and Klaus Schulten.
Coarse grained protein-lipid model with application to lipoprotein particles.
J. Phys. Chem. B, 110:3674–3684, 2006.
[8] Amy Y. Shiha, Peter L. Freddolinoa, Anton Arkhipova, and Klaus Schulten. Assembly of lipoprotein particles revealed by coarse-grained molecular
dynamics simulations. J. Struct. Biol., 157:579–592, 2007.
[9] Peter J. Bond, John Holyoake, Anthony Ivetac, Syma Khalid, and
Mark S.P. Sansom.
Coarse-grained molecular dynamics simulations of
membrane proteins and peptides. J. Struct. Biol., 157:593–605, 2007.
[10] Khuloud Jaqaman and Peter J. Ortoleva. New space warping method for
the simulation of large-scale macromolecular conformational changes. J.
Comput. Chem., 23:484–491, 2002.
[11] Srinath Cheluvaraja and Peter J. Ortoleva. Thermal nanostructure: An
order parameter multiscale ensemble approach. J. Chem. Phys., 132:75102–
75110, 2010.
[12] Yinglong Miao, J. E. Johnson, and Peter J. Ortoleva. All-atom multiscale
simulation of ccmv capsid swelling. J. Phys. Chem. B, 114:11181–11195,
2010.
[13] Harshad Joshi, Abhishek Singharoy, Yuriy V. Sereda, Srinath C. Cheluvaraja, and Peter J. Ortoleva. Multiscale simulation of microbe structure
and dynamics. Prog. Biophys. Mol. Biol., 107:200–217, 2011.
[14] Andrew Abi Mansour and Peter Ortoleva. Multiscale factorization method
for simulating mesoscopic systems with atomic precision. J. Chem. Theory
Comput., 10:518–523, 2014.
37
[15] I. Bahar, R. A. Atilgan, and B. Erman. Direct evaluation of thermal fluctuations in protein using a single parameter harmonic potential. Folding
and Design, 2:173–181, 1997.
[16] T. Haliloglu, I. Bahar, and B. Erman. Gaussian dynamics of folded proteins.
Phys. Rev. Letters, 79:3090–3093, 1997.
[17] M. K. Kim, R. L. Jernigan, and G. S. Chirikjian. Rigid-cluster models of
conformational transitions in macromolecular machines and assemblies. J.
Biophys., 89:43–55, 2005.
[18] Siewert J. Marrink, Alex H. de Vries, and Alan E. Mark. Coarse grained
model for semiquantitative lipid simulations. J. Phys. Chem. B, 108:750–
760, 2004.
[19] Teemu Murtola, Emma Falck, Mikko Karttunen, and Ilpo Vattulainen.
Coarse-grained model for phospholipid/cholesterol bilayer employing inverse monte carlo with thermodynamic constraints.
J. Chem. Phys.,
126:75101–75114, 2007.
[20] W. G. Noid, Jhih-Wei Chu, Gary S. Ayton, Vinod Krishna, Sergei Izvekov,
Gregory A. Voth, Avisek Das, and Hans C. Andersen. The multiscale
coarse-graining method. i. a rigorous bridge between atomistic and coarsegrained models. J. Chem. Phys., 128:244114–244124, 2008.
[21] J. C. Berendsen, David van der Spoel, and R. van Drunen. Gromacs:
A message-passing parallel molecular dynamics implementation. Comput.
Phys. Commun., 91:43–56, 2013.
[22] Abhishek Singharoy, Harshad Joshi, Yinglong Miao, and Peter J. Ortoleva.
Space warping order parameters and symmetry: Application to multiscale
38
simulation of macromolecular assemblies. J. Phys. Chem. B, 116:8423–
8434, 2012.
[23] Jr. AD MacKerell, N. Banavali N, and N. Foloppe. Development and current status of the charmm force field for nucleic acids. Biopolymers, 56:257–
265, 2001.
[24] Wendy D. Cornell, Piotr Cieplak P, Christopher I. Bayly, Ian R. Gould,
Kenneth M. Merz Jr., David M. Ferguson, David C. Spellmeyer, Thomas
Fox, James W. Caldwell, and Peter A. Kollman. A second generation force
field for the simulation of proteins, nucleic acids, and organic molecules. J.
Am. Chem. Soc, 117:5179–5197, 1995.
[25] Peter J. Ortoleva. Nanoparticle dynamics: A multiscale analysis of the
liouville equation. J. Chem. Phys., 109:2770–2771, 2005.
[26] Stephen Pankavich, Zeina Shreif, Yinglong Miao, and Peter J. Ortoleva.
Self-assembly of nanocomponents into composite structures: Derivation
and simulation of langevin equations. J. Chem. Phys., 130:194115–194124,
2009.
[27] Yuriy V. Sereda, John M. Espinosa-Duran, , and Peter J. Ortoleva. Energy
transfer between a nanosystem and its host fluid: A multiscale factorization
approach. J. Chem. Phys., 140:074102, 2014.
[28] H. F. Trotter. On the product of semi-groups of operators. Proceedings of
the American Mathematical Society, 10:545–551, 1959.
[29] Stephen Pankavich, YingLong Miao, Jamil Ortoleva, Zeina Shreif, and Peter J. Ortoleva. Stochastic dynamics of bionanosystems: Multiscale analysis
and specialized ensembles. J. Chem. Phys., 128:234908–234921, 2008.
39
[30] Abhishek Singharoy, Srinad Cheluvaraja, and Peter J. Ortoleva. Order
parameters for macromolecules: Application to multiscale simulation. J.
Chem. Phys., 134:44104–44120, 2011.
[31] A. Singharoy, H. Joshi, and Peter J. Ortoleva. Multiscale macromolecular
simulation: role of evolving ensembles. J. Chem. Inf. Model., 52:2638–2686,
2012.
[32] Mary A. Rohrdanz, Wenwei Zheng, Mauro Maggioni, and Cecilia Clementi.
Determination of reaction coordinates via locally scaled diffusion map. J.
Chem. Phys., 134:124116–244124, 2011.
[33] F Muller-Plathe. Coarse-graining in polymer simulation: From the atomistic to the mesoscopic scale and back. ChemPhysChem., 3:754–769, 2002.
[34] Jing Yang, Abhishek Singharoy, Yuriy V. Sereda, and Peter J. Ortoleva.
Quasiequivalence of multiscale coevolution and ensemble md simulations:
A deomnstration with lactoferrin. Chem. Phys. Lett., 616-617:154–160,
2014.
[35] William Humphrey, Andrew Dalkea, and Klaus Schulten. Vmd: Visual
molecular dynamics. J. Mol. Graph. Model., 14:33–38, 1996.
[36] Oliver Beckstein. Gromacswrapper. @ONLINE.
[37] Bjarne Stroustrup. The design and evolution of c++. In The Design and
Evolution of C++. Addison-Wesley, 1994.
[38] N. Michaud-Agrawal, E. J. Denning, T. B. Woolf, and O. Beckstein. Mdanalysis: A toolkit for the analysis of molecular dynamics simulations. J.
Comput. Chem., 32:2319–2327, 2011.
40
[39] Andrew Abi Mansour, Yuriy V. Sereda, Jing Yang, and Peter J. Ortoleva.
Prospective on multiscale simulation of virus-like particles: Application to
computer-aided vaccine design. Vaccine, 33:5890–5896, 2015.
[40] Penelope E. Stein, Amechand Boodhoo, Glen D. Armstrong, Stephen A.
Cockle, Michel H. Klein, and Randy J. Read. The crystal structure of
pertussis toxin. Structure, 2:45–57, 1994.
[41] W. L. Jorgensen, J. Chandrasekhar, J. D. Madura, R. W. Impey, and M. L.
Klein. Comparison of simple potential functions for simulating liquid water.
J. Chem. Phys, 79:926–935, 1983.
41