Chun 2000
includes the use of a multiple time scale (MTS) integration scheme. Speed
increases of 5- to 30-fold over atomistic simulations have been realized in various
applications of the method. © 2000 John Wiley & Sons, Inc. J Comput
Chem 21: 159–184, 2000
A self-starting version of SHAKE, termed RATTLE, was later developed.11 In
general, such approaches allow a modest but significant increase in speed by
a factor of 2, and are standard techniques applied in most MD studies. Van
Gunsteren and Karplus12 have shown that constraining the bond angles with
SHAKE significantly modifies the dynamic properties of the system while
constraining bond lengths does not significantly alter the dynamics. Note that
computational advantage has to be balanced against the additional work
required to solve the constraint equations.

In a different application of constrained dynamics, torsion angle dynamics,13
only dihedral angles are allowed to move, maintaining all bonds and bond
angles fixed. The dynamic simulations are then performed by integrating the
equations of motion in generalized coordinates.14–17 Applications to alanine
nanopeptide17 succeeded in reproducing the atomistic simulation results (at
Δt = 1 fs) of all atom trajectories using Δt = 9 fs for an all-atom
representation of the molecule. Time steps as high as Δt = 13 fs were
successfully used for simulations of an extended atom representation (i.e.,
after removing methyl group rotations). However, despite the significant
time-step increase, application of these methods to large biomolecules is
limited because a matrix inversion is required at every time step.

Another constraint approach based on internal coordinates was developed by
Durup,18 and used a set of generalized internal coordinates, arranged in a
“tree-like” topology. A set of holonomic constraints is used to freeze out the
high-frequency motions. Time-step increases from 1 to 9 fs were reported for
an application to citrate synthase. Though the method was applied to a search
for reaction pathways in citrate synthase,19 it has not been adopted in
general, because the choice of coordinates is not straightforward.

An extreme version of constrained dynamics is rigid body dynamics. In this
approach, small groups of atoms are defined as rigid bodies effectively
constraining the relative motion of all atoms contained within a body. The
resulting dynamics only accounts for the relative translation and rotation
between the bodies.3 This approach is often used as a sampling procedure and
in the course of refining experimentally determined structures.4

Multiple Time-Scale Methods

Another approach to the integration time-step problem takes advantage of the
multiscale character of protein motions. In what is known as multiple
time-scale (MTS) schemes,20 a hierarchy of time steps is introduced instead of
the traditional single time step. The idea is to take advantage of the terms
in the biomolecular Hamiltonian that vary more slowly than other terms.
Appropriate time steps are assigned for the different terms. A significant
step forward in terms of biomolecular simulations was the introduction of MTS
variants that exhibit time reversal symmetry.21, 22

By applying MTS methods to a variety of systems, including the protein BPTI,
using a splitting scheme based on bonding topology, Watanabe and Karplus23, 24
were able to obtain speed-up factors of about 2–4. The essential properties of
the dynamics (with Δt = 0.5 fs as the shortest time step) were successfully
replicated. Similar results were reported by Berne and collaborators,25 who
obtained a speed-up factor of 4 by splitting the nonbonded terms into fast and
slow parts based on distance using a continuous switching function. Forester
and Smith26 reported MTS speed-up factors of 2–3 for liquid water and a
solvated protein system. The MTS approach shows significantly better
performance enhancements in systems where the separation of fast and slow
motions is more pronounced than in biomolecules. The C60 molecule is a good
example of such a system, realizing MTS speed-up factors of 20 or more,
enabled by time steps of up to 25 fs for the slow forces.27

In the context of multiscale approaches to biomolecule dynamics, the now
common multipole expansion of the electrostatic field should also be
mentioned.28 There are also preliminary attempts at applying multigrid
techniques to molecular problems.29 Multigrid techniques have proven useful
for solving partial differential equations in other contexts, such as the
Navier–Stokes equations of fluid dynamics.30

Eigenvector-Based Schemes

Normal modes, which are obtained from diagonalization of the Hessian matrix
(second derivative of the potential), describe global deformations around a
local minimum on the molecular potential energy surface. However, the harmonic
approximation involved in the derivation of the modes limits their usefulness
to small amplitudes, making these eigenvectors inappropriate for overall
dynamics. To overcome this problem, principal component analysis of MD
trajectories, also known as “quasiharmonic analysis” or “covariance modes,”
was introduced to protein dynamics.31–33 In this method, a covariance matrix
is constructed from a long
MD simulation, and diagonalized to obtain global modes that describe that
motion. The method was later employed with the hope of describing molecular
dynamics trajectories in terms of a small “essential subspace.”34, 35
Berendsen and coworkers later developed a technique that performs an adapted
form of MD with constraint forces in the approximate “essential
subspace.”36, 37 Because time information is lost in this protocol, the method
is essentially a sampling technique that cannot directly be interpreted as
dynamics. However, it does provide an enhanced sampling of the configuration
space compared to usual MD.

Space et al.38 used such subspaces in a dynamics method in which the low
frequency motion is propagated via a projective Newton’s equation. Applying
this method to a Lennard–Jones crystal and a glass, they found convergence to
the reference MD trajectory, although notable deviations also emerged. Speed
ups of 2–5 relative to atomistic MD were obtained with time steps up to 25 fs,
but in simple molecular models, significant energy damping was observed.

A recent study by Schulten and coworkers39 suggests that the
principal-component analysis method may not be suitable for describing
long-time protein dynamics. The extremely long trajectories needed to extract
stable principal component modes are currently impractical.

Implicit Schemes

The Langevin-implicit-Euler scheme was introduced into MD by Peskin and
Schlick40 to maintain numerical stability for large time steps. In implicit
integration schemes, the incorporation of future information helps avoid
stability problems associated with extrapolation techniques.41 Although the
Langevin framework was introduced to replenish some of the energy that is
damped by the implicit integrator, severe damping of high frequency modes
still occurs, which in turn, has a significant effect on global motion.42
Therefore, this scheme is not appropriate for atomistic simulations. However,
for macroscopic models, where high-frequency motions are irrelevant to the
overall behavior, the Langevin-implicit-Euler allows for dynamical simulations
of large-scale supercoiled DNA models as an elastic material.43 A variant of
this method, in which energy was put back into the system in an ad hoc
fashion, was used to study possible folding pathways in BPTI.44 However, this
variant should be regarded as a sampling procedure rather than a dynamic
integration scheme. Implicit schemes, therefore, are not likely to be
effective dynamical integrators at large time steps due to the damping
effects. Implicit methods are also costly because of nonlinear minimization at
each time step, and not likely to be competitive with other approaches in
terms of simulation time requirements.9

Reaction Path Optimization

Recently, a different approach was suggested by Olender and Elber45 to compute
long-time molecular dynamics trajectories of fixed length in cases where both
the initial and final states are known. The technique is an extension of a
reaction path method,46 and is based on the stochastic path integral of
Onsager and Machlup.47 Trajectories are computed by path optimization between
the two end points, and modes of motion with periods shorter than the discrete
time steps are filtered out, making the trajectory stable for very long time
steps. Although this approach does not solve the time-step problem for the
typical molecular dynamics simulation, it is likely to be useful for specific
processes where both initial and final states are known.

As seen from the above brief overview, extending the length of molecular
dynamics simulations is an essential issue, leading to a broader application
of this very useful computational technique. The available approaches offer
only limited speedups of atomistic MD simulations by factors of 2 to 4 (SHAKE,
MTS, Langevin–Normal mode), falling significantly short of what is desired.
Other approaches have either developed into useful sampling procedures
(torsion dynamics, rigid body dynamics, covariance modes) or deal efficiently
with a restricted class of problems (reaction path optimization), but do not
extend the scope of general purpose MD.

Despite the recent introduction of new approaches to bridge the gap between
the time scales accessed by computer and those of more physical processes, the
problem is far from being solved. The MBO(N)D flexible-body method presented
herein represents a new approach to enabling long time-step MD simulations.

MBO(N)D-Substructured Multibody Molecular Dynamics

MBO(N)D’s roots lie in applied mathematics techniques and algorithms developed
in the 1980s for solving similar computational efficiency problems associated
with dynamics simulations of large
complex mechanical structures (e.g., spacecraft) in the aerospace industry.48
Proof-of-concept studies were presented in Turner et al.,49 where various
aspects of the MBO(N)D approach were discussed, and results for simple systems
were reported.

MBO(N)D is a reduced-variables MD approach that seeks to dramatically improve
computational efficiency over atomistic simulation methods, while maintaining
comparable accuracy in the trajectory and ensemble properties at the
structure–function level. The most important feature of MBO(N)D is that of
substructuring, whereby groups of atoms in a molecular model are grouped into
interconnected flexible and/or rigid bodies. These bodies are allowed to
undergo large motions relative to each other. Within flexible bodies, the
relative motions among the aggregated atoms are assumed small and at low
frequency. For rigid bodies, it is assumed there is no motion between the
aggregated atoms. This approach acknowledges that the essential dynamics of
molecules are captured by the low-frequency modes.32, 34, 38, 50–52

MBO(N)D’s substructuring approach contains two essential elements for
substantially improving the computational efficiency of molecular dynamics
simulations. First, the number of degrees of freedom in the dynamics equations
is drastically reduced, potentially from tens or hundreds of thousands of
atoms to tens or hundreds of bodies. Second, both the inter- and intrabody
dynamics occur at low frequencies, which allows the simulation time step to be
increased from 1 or 2 fs to 10 fs or more. These two key attributes allow
MBO(N)D dynamics simulations to execute at computational speeds that are
significantly faster than traditional all-atom methods, depending on the
nature of the substructuring scheme. For systems studied to date, speed ups of
up to 30 have been obtained. As would be expected, higher levels of
aggregation (e.g., moderate- to large-sized bodies) typically execute faster
than lower levels of aggregation (e.g., small bodies and atomistic regions).

Substructuring schemes for MBO(N)D simulations are determined by the amount of
relative motion expected throughout the molecule. For regions where motions
are expected to be very small or small enough to be unimportant to the
particular event of interest in the simulation, the atoms can be grouped
together into rigid bodies. Regions where there are moderate amounts of motion
can be modeled as flexible bodies. Regions where large conformational changes
are expected are modeled with many small bodies or individual atoms. The
development of appropriate substructuring schemes is an important aspect of
MBO(N)D, and is addressed in detail in the Substructuring Strategies section.

The internal motions of MBO(N)D’s flexible bodies are modeled by a reduced set
of body-based modes, which are added to the motion associated with the three
translational and three rotational rigid body degrees of freedom. The
eigensolution process needed to generate these flexible body modes is far more
computationally efficient than in traditional normal mode analyses. This is
because MBO(N)D’s modal displacement vectors and frequencies are calculated
separately for each individual body, rather than for the entire system. The
resulting mode set for each flexible body is then reduced by selecting only
the lowest frequency modes that correspond to the overall motions of the body.
The high-frequency modes, which correspond to localized vibrations, are
typically not important to the event of interest, and are ignored during
subsequent simulation.

MBO(N)D’s rigid bodies are considered as special cases of flexible bodies
where none of the body-based modes are retained. Individual atoms (particles)
are modeled classically, with three translational degrees of freedom, and are
used in regions of the molecular model where no aggregation into larger bodies
is appropriate or desirable.

Compared to individual atoms, rigid bodies have a larger mass, and hence,
lower frequency content when subjected to the same level of forcefield
interactions. When bodies are made flexible, the truncation of high frequency
modes corresponding to localized motions maintains the frequency content of
the body at a low level. Thus, substructured modeling allows the use of long
time steps in MBO(N)D. Because arbitrarily large motions subject to
interaction forces are permitted between bodies, MBO(N)D’s substructured
modeling approach captures anharmonic motions that other reduced order
methods, such as normal modes, cannot properly reproduce.

Substructuring allows multigranular simulations in MBO(N)D where different
regions of the molecule are modeled at varying levels of aggregation
(flexible/rigid bodies and/or atoms), depending on their importance to the
particular event being studied. Parts of the system, such as the region near
an active site of a protein, are modeled at a fine level of granularity by
using individual atoms and small bodies. Other parts far away from an active
site are modeled at a coarse level of granularity by using large bodies.

MBO(N)D’s multigranularity can be coupled with integrator implementations that
make use of
the multiple time scales present in the physical system. Even for conventional
all-atom models, one can take advantage of the separation of time scales that
exists between fast interactions such as bond stretching and slow interactions
such as long-range electrostatics. For MBO(N)D, a natural separation of time
scales arises from differences in body size. The motions and interactions
involving individually modeled atoms are expected to have high-frequency
content, while bodies are expected to have lower frequency content.
Consequently, the motion in regions that are modeled using particles can be
integrated using shorter time steps, while the motion in regions that are
modeled as bodies (rigid or flexible) can be integrated using longer time
steps. As will become apparent, multiple time-scale (MTS) integration is the
key to allowing MBO(N)D to handle the higher frequency content of atomistic
regions, while still achieving maximum computational efficiency from the
body-dominated portion of the model. A similar multigranular approach was
developed by Head-Gordon and Brooks,3 where small virtual rigid bodies were
used in regions surrounding a central atomistic region to account for global
motions that are not captured in stochastic boundary molecular dynamics
methods.

Force field interactions for MBO(N)D are generally obtained by conventional
all-atom calculations, such as the CHARMM all-atom force field.53 One
alternative to the conventional scheme, afforded by MBO(N)D’s body-based
formalism, is to replace interactions that are internal to a body with modal
stiffness terms. This allows the bond and nonbond pairlists to be reduced in
size by eliminating interaction calculations between atoms that reside in the
same body, resulting in a more efficient computational process. In addition,
fast multipole algorithms can be applied to speed up nonbond calculations for
large molecules.28, 54

MBO(N)D is currently integrated with CHARMM,53 which provides the molecular
definition, the force field interactions, and postsimulation analyses. The
output from an MBO(N)D simulation is the same as that from all-atom
simulations, with the addition of body-based information. The coordinates and
velocities of every atom at every step in MBO(N)D trajectories can be
determined from the translations, rotations, and modal amplitudes of the
bodies. Thus, conventional postsimulation analysis algorithms, such as those
in CHARMM, can be applied directly to the trajectories resulting from MBO(N)D
simulations.

In the following subsections the elements of the MBO(N)D modeling approach are
described, including the substructuring and body-based mode methodology, use
of constraints, handling of force field interactions, and a new integrator
developed for MBO(N)D.

MULTIBODY DYNAMICS

In the MBO(N)D approach, a molecule is generally described as a cluster of
multiple, flexible substructures (bodies) that comprise a dynamic system.
Member bodies of the molecule are capable of undergoing large relative
excursions, and are interconnected by fixed or free bond lengths. This type of
multibody system is acted on by inertial forces such as those due to
centripetal and Coriolis acceleration as well as by the usual forces derived
from empirical potentials.

Figure 1 shows a topological configuration example of a typical multibody
system. The five hinges and four bodies shown result in one closed path. For
each body of the system, there is a body-fixed, right-handed reference frame
located at a user-selected location within the body. A body’s elastic
deformation is measured relative to this reference frame.

A hinge is defined by locating one point on each of two bodies. Each pair of
points, $p_i$ and $q_i$, in Figure 1, constitutes a hinge. Clearly, a typical
body may contain one or more hinge points, but a single hinge defines the
relative motion of only two bodies.

FIGURE 1. A multibody dynamics system substructured into several bodies
connected by hinges. Shown are the body-fixed reference frames (dotted lines)
and the inertial frame (solid lines). Roman numerals indicate body numbers.
$p_i$ and $q_i$ indicate the two points associated with hinge i.

An orthogonal reference frame is attached to point p, and another is attached
to point q. (The
subscripts on p and q are implied.) Now six relative position/orientation
coordinates are associated with the hinge defined by points p and q. These
coordinates are used to define the relative position/orientation of the bodies
“joined” by points p and q. Point q is located relative to point p by a
p-frame referenced position vector. This vector may be expressed in Cartesian
coordinates or spherical coordinates, depending on the constraints across the
hinge. The orientation of the q-frame with respect to the p-frame is
represented by three Euler rotations. If $N_H$ is the number of system hinges,
$6 \times N_H$ position/orientation coordinates are used in conjunction with
modal displacement coordinates for defining the system’s position state. Only
the time-variable position/orientation coordinates of the $6 \times N_H$ set
of coordinates are included as state-vector elements. The position/orientation
coordinates whose rates are constrained to be zero are not included; however,
the position/orientation coordinates themselves need not be zero.

The base-body, indicated by Body I, and the base hinge, indicated by points
$p_1$–$q_1$ of Figure 1, are given special consideration. Point $p_1$ of the
pair is coincident with the inertial origin. Although motions across other
hinges represent relative motion between the associated bodies, motion across
the base hinge defines the motion of the base body relative to the inertial
frame. By traversing consecutive sequences of bodies and hinges, starting from
the base body, and calculating positions and orientations along the way, one
can obtain the inertial positions and orientations of all of the bodies and
particles within the multibody system. Note that the selection of any one body
as the base body is purely arbitrary. Atom coordinates and velocities can be
calculated from the multibody variables (which are the displacement vectors,
Euler angles, modal coordinates, and their time derivatives).

Interaction forces are applied to each of the atoms in each body. The effect
of these interaction forces on the body coordinates is determined by: summing
up the atomistic forces to obtain the body force vector; summing up the
moments about the body reference to get the body torque vector; and
multiplying the atomistic forces by modal displacement vectors to obtain the
body’s modal force vector [see also the Generalized Force Vector section and
eq. (10)].

Multibody Equations of Motion

In MBO(N)D, the rigid or flexible bodies are oriented with respect to one
another through a series of vectors that span the hinge points between bodies,
and through a corresponding series of angular coordinates that define the
relative orientation of the bodies. These relative hinge coordinate vectors,
β, are of dimension 6 × 1. For the case of unconstrained relative motion

$$\beta = [\theta_1 \;\; \theta_2 \;\; \theta_3 \;\; x \;\; y \;\; z]^{T}, \tag{1}$$

where $\theta_1$, $\theta_2$, and $\theta_3$ are Euler angles for
unconstrained hinges. For the case of bond-length constraints,

$$\beta = [\phi \;\; \theta \;\; r \;\; x \;\; y \;\; z]^{T}, \tag{2}$$

where φ, θ, and r are spherical coordinates for bond-length constrained
hinges. In both cases, x, y, and z are the components of the vector defined
from the hinge’s p frame origin to the hinge’s q frame origin.

For rigid bodies, the positional state of the system is completely described
by the above set of relative coordinates. For flexible bodies, an additional
set of modal coordinates is used to describe the deformational state of each
of the flexible bodies. The deformed position of the ith atom within a
flexible body is given by

$$\{x_i\} = \sum_j \phi_{ij}\,\xi_j + O(\xi_j^{2}) + O(\xi_j^{3}) + \cdots, \tag{3}$$

where $\phi_{ij}$ is the partition of the jth mode shape vector corresponding
to the x, y, and z coordinates of atom i, and $\xi_j$ the jth modal coordinate
for the flexible body.

In MBO(N)D and many structural dynamics formulations, the second and higher
order terms in eq. (3) are ignored, based on the assumption that deformations
are small. Equations of motion involving nonlinear terms in ξ are
computationally more intensive, and are, therefore, avoided whenever possible.
In MBO(N)D, large motions are allowed between bodies, thus providing a
computationally efficient solution to the problem of limited motions inherent
to the linear modal representation.

The hinge kinematics of the multibody system can be written as

$$\dot{\beta} = B U, \tag{4}$$

where $\dot{\beta}$ is the vector of relative velocities across hinges, B is a
matrix of kinematic coefficients that relate hinge relative velocities to body
velocities, and U is a vector of nonholonomic body velocities consisting of
angular velocities, translational velocities, and modal velocities,55

$$U = [\omega_x \;\; \omega_y \;\; \omega_z \;\; v_x \;\; v_y \;\; v_z \;\; \dot{\xi}_1 \;\; \dot{\xi}_2 \;\; \ldots \;\; \dot{\xi}_m]^{T}. \tag{5}$$
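
As a purely illustrative sketch (not the MBO(N)D implementation), the
first-order form of eq. (3) and the three force sums described earlier in this
section (body force, body torque about the body reference origin, and modal
force) can be written in a few lines of Python. The function names, the nested
list layout, and the nominal geometry `x_ref` are assumptions introduced here
for the example; in particular, the sketch assumes that the modal displacement
of eq. (3) is added to a nominal body-frame position. Atoms are indexed by i
and modes by k.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as lists."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def deformed_positions(x_ref, phi, xi):
    """First-order eq. (3): x_ref[i] + sum_k phi[i][k] * xi[k].

    x_ref[i] is an assumed nominal body-frame position of atom i,
    phi[i][k] is the 3-vector partition of mode k for atom i,
    xi[k] is the k-th modal coordinate.
    """
    return [[x_i[a] + sum(phi_i[k][a] * xi[k] for k in range(len(xi)))
             for a in range(3)]
            for x_i, phi_i in zip(x_ref, phi)]


def aggregate_forces(r, f, phi):
    """Sum atomistic forces into body force, body torque, and modal force.

    r[i] is the vector from the body reference origin to atom i,
    f[i] is the total force on atom i,
    phi[i][k] is the 3-vector partition of mode k for atom i.
    """
    n_modes = len(phi[0]) if phi else 0
    body_force = [0.0, 0.0, 0.0]
    body_torque = [0.0, 0.0, 0.0]
    modal_force = [0.0] * n_modes
    for r_i, f_i, phi_i in zip(r, f, phi):
        moment = cross(r_i, f_i)
        for a in range(3):
            body_force[a] += f_i[a]
            body_torque[a] += moment[a]
        for k in range(n_modes):
            # Projection of the atomistic force onto mode k
            modal_force[k] += sum(phi_i[k][a] * f_i[a] for a in range(3))
    return body_force, body_torque, modal_force


# Toy data (hypothetical): three atoms on the coordinate axes, one mode.
r = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
f = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
phi = [[f_i] for f_i in f]  # single mode whose shape equals the force pattern
force, torque, modal = aggregate_forces(r, f, phi)
positions = deformed_positions(r, phi, [0.1])
```

For this toy input the body force and body torque both come out as (1, 1, 1)
and the single modal force is 3. A rigid body corresponds to an empty mode
list, in which case only the force and torque sums remain.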
The matrix B in eq. (4) is defined in such a way as to simplify the
specification of free and constrained degrees of freedom. For bond-length
constraints, the appropriate partition of the B matrix is used to transform
velocities from Cartesian frames into spherical frames. The radial degree of
freedom, representing the bond direction, can be specified as the constrained
degree of freedom.

An orthogonal set of selection matrices, $\Phi$ and $\bar{\Phi}$ (consisting
of ones and zeroes), are created to extract the free and constrained relative
degrees of freedom, respectively:

$$\dot{\beta} = \Phi\,\dot{\beta}^{f} + \bar{\Phi}\,\dot{\alpha}. \tag{6}$$

The variable $\beta^{f}$ represents the unconstrained hinge degrees of
freedom, and the variable α represents the constrained hinge degrees of
freedom. For a frozen degree of freedom, which is the most common type of
constraint, α is zero. Rheonomic constraints, where α is an explicit function
of time, can also be specified within this formulation. Such constraints can
be useful for forcing parts of the molecule to move in a predefined manner.

The multibody equations of motion that are used in MBO(N)D are derived from
Lagrange’s equations of motion,48 with the generalized velocities transformed
into the set of nonholonomic velocities defined by the U vector. The dynamics
of the multibody system are governed by the equation

$$M \dot{U} = G + B^{T} \bar{\Phi}\,\lambda, \tag{7}$$

where M is a generalized inertia matrix, G is a vector of generalized forces,
$\bar{\Phi}$ is the constraint selection matrix of eq. (6), and λ is a vector
of generalized constraint forces. For rigid bodies, the corresponding
partitions within U, M, G, and B contain only translational and rotational
degrees of freedom; for particles, the corresponding partitions contain only
translational degrees of freedom. The terms comprising the generalized force
vector are described in the next section. The equations of motion are subject
to the following constraint:

$$\dot{\alpha} = \bar{\Phi}^{T} \dot{\beta}. \tag{8}$$

The O(N) algorithm takes advantage of the sparsity of the matrices M and B,56
and is used to solve eqs. (7) and (8) in MBO(N)D. Body accelerations,
$\dot{U}$, are transformed into relative accelerations, $\ddot{\beta}$, in a
recursive procedure that traverses the topology of the multibody system. Note
that the computational requirements of the O(N) algorithm scale linearly with
the number of bodies (N) in the system. Appendix A (supplementary material)
contains a concise description of the algorithm.

Generalized Force Vector

The generalized force vector is composed of the following terms for each body:

$$G = G_{ff} + M U + \tfrac{1}{2}\, U^{T} M_j\, U - \dot{M} U. \tag{9}$$

The first term, $G_{ff}$, accounts for chemical interactions embedded in
forcefield form. The matrix in the second term consists of linear and angular
velocities, and accounts for gyroscopic and Coriolis effects. The last two
terms account for changes in the body’s inertia matrix, resulting from the
deformation of the body. The subscript j represents the derivative with
respect to the jth modal coordinate.

The dynamics of the MBO(N)D flexible multibody model are driven by forcefield
interactions between bodies as well as interactions within flexible bodies.
The equations of motion deal with body forces, body torques, and modal forces.
Currently two alternative methods for calculating these aggregate forces are
available within MBO(N)D. Note that interactions between atoms within rigid
bodies are not computed, because they do not affect the dynamics. The two
methods are described as:

(a) Conventional atomistic forcefields: all forcefield evaluations are
performed as in all-atom models, and the resulting force vector is processed
to obtain the generalized forces, as follows:

$$G_{ff} = \begin{bmatrix} \sum r_j \times f_j \\ \sum f_j \\ \sum \phi_j^{T} f_j \end{bmatrix}. \tag{10}$$

Here, the implied summation over j includes all atoms within the body being
considered: $r_j$ is the vector from the body reference origin to atom j,
$f_j$ is the total force applied to atom j, and $\phi_j$ represents the jth
partition of the body-based mode vectors for the flexible body.

Note that any number of fast multipole approximations described in the
literature28, 54 can be applied to speed up the nonbond force field
calculations. In this case, certain nonbond components of the forcefield
interactions are evaluated by a conventional fast multipole method. The
resulting atomistic force vector is processed as in eq. (10) to yield
$G_{ff}$.

(b) Modal stiffness calculations: for flexible bodies, interactions between
atoms on the same body can be approximated by a linearization about the
nominal body structure used for body-based mode generation. This method is
equivalent to using the Hessian matrix projected into modal space, and
multiplied by the modal coordinates to obtain the modal force. There are no
effects on body translation or body rotation that result from
body internal interactions. To use the modal stiffness approach, the force
vector is separated into two terms, the interbody interaction term and the
intrabody interaction term,

$$G_{ff} = G_{\mathrm{inter}} + G_{\mathrm{intra}}. \tag{11}$$

The interbody interaction term is computed as in eq. (10), but using a
modified bond list and nonbond pair list that is restricted to the
interactions involving at least one atom outside body j. For dihedral
interactions, this term includes dihedrals where one or more atoms are within
body j and at least one atom is outside body j. The same applies for bond
angle terms. The intrabody interactions are approximated by a modal stiffness
term

$$G_{\mathrm{intra}} = K \xi = \phi^{T} H \phi\, \xi. \tag{12}$$

The matrix K is formed by transforming the appropriate Hessian matrix, H, into
modal space via a matrix of modal displacement vectors, φ. The modal
coordinate is represented by ξ. The Hessian matrix includes only those
interactions where all atoms involved are within body j.

Bond-Length Constraints

Constraints for bond lengths and bond angles are important elements in MD
simulations.10 Typically, these constraints are used in an attempt to
eliminate the time constants that are attributable to bond stretching and
angle-bending terms. In MBO(N)D, such constraints are handled by the O(N)
recursion algorithm that eliminates the constrained degrees of freedom from
the system model.

MBO(N)D’s constrained hinge bond lengths are constraints that are applied
across covalent bonds that exist between bodies and/or particles. For models
involving flexible bodies, the body-based modal displacement vectors do not
currently account for fixed bond lengths within the bodies. Thus, the
specification of bond length constraints affects only those covalent bonds
that are between bodies.

Whenever hinge bond lengths are constrained in an MBO(N)D substructured
simulation, there exists the possibility of closed topological loops because
MBO(N)D exactly enforces the constraints by removing the corresponding degrees
of freedom. Examples involving closed loops include: aromatic side chains if
modeled as atomistic regions; disulfide bonds if not completely enclosed
within one body; and, in some instances, bodies defined by nonconsecutive
groups of atoms, such as in across-strand beta-sheet bodies. MBO(N)D has no
inherent difficulty in handling closed topological loops. However, as the
number of closed loops increases, the computational time and required memory
increase. In such cases, it is prudent to turn off the hinge bond length
constraints.

Degrees of Freedom and Temperature

The MBO(N)D substructured model typically involves fewer degrees of freedom
than the corresponding atomistic model. The number of degrees of freedom of
the system is given by

$$n_{\mathrm{DOF}} = \sum_{i=1}^{N_B} (6 + m_i) + 3 \times n_{\mathrm{atom}} - n_{\mathrm{constraints}}. \tag{13}$$

Each multiatom body (i) has six rigid degrees of freedom and $m_i$ modal
degrees of freedom. Each individually modeled atom has three degrees of
freedom. The total number of degrees of freedom is reduced by the presence of
bond-length constraints (described in the previous section).

The calculation of system temperature in MBO(N)D takes into account the
reduced number of degrees of freedom in the system. The formula used for the
temperature calculation is

$$T = \frac{2K}{k_B\, n_{\mathrm{DOF}}}, \tag{14}$$

where T is temperature, K is the total kinetic energy of the system, and $k_B$
is the Boltzmann constant.

For comparison between MBO(N)D and atomistic models, both sets of simulations
are run at the same temperature. The MBO(N)D model exhibits lower kinetic
energy than the atomistic model, but this energy is distributed among
correspondingly fewer (but important) degrees of freedom.

Initial Coordinates and Velocities

The initial conditions for MBO(N)D position and velocity variables are
obtained by a least-squares fit to the atomistic coordinates and velocities.
Consequently, MBO(N)D can be started from the same set of coordinates and
velocities from which an atomistic MD run starts. The final coordinates and
velocities of an MBO(N)D run can also be read in by an all-atom code to start
an atomistic simulation. The least-squares fitting procedure is used only once
during initialization.

For the positional variables, the least-squares fit is performed to solve for
the following quantities: the displacement vectors from the inertial frame
origin to the body frame origin; the rotational transformation matrices that
orient the body frames with respect to the inertial frame; and the modal
amplitudes of each body. Hinge bond-length constraints
are also imposed for covalent bonds between bodies, if specified for the simulation. An iterative Newton–Raphson53 procedure is followed, where the second-derivative matrix is a function of the positional variables to be solved.

For the velocity least-squares fitting, the velocity vectors, angular velocity vectors, and modal velocities are solved for in a one-time calculation. The fitting problem is linear and, therefore, requires no iteration. The fitting also includes derivatives of the bond-length constraints, if these were used for the position fitting. Furthermore, six additional constraints are applied to the velocities, such that the MBO(N)D model’s linear and angular momenta match those of the input atomistic conditions. These can be set to zero.

After the positional and velocity variables have been computed, these are easily converted into relative and modal coordinates, and relative and modal velocities to initialize the MBO(N)D dynamics integration.

MBO(N)D INTEGRATION ALGORITHMS

Numerical integration of the MBO(N)D equations of motion is performed by a specially formulated algorithm. The Verlet-type integrators that are commonly used for molecular dynamics are not directly applicable to the MBO(N)D equations, because such integrators assume that the acceleration variables are not functions of velocity. The acceleration expression in the MBO(N)D equations of motion depends nonlinearly on the velocity variables as shown in eq. (9). These velocity-dependent terms arise from gyroscopic and Coriolis effects, kinematic constraints, and deformation-dependent inertia terms.

Integrators commonly used for multibody dynamics, such as Runge–Kutta and predictor-corrector methods, are not computationally efficient for use in MBO(N)D because of the large number of force field evaluations involved. Additionally, the energy conservation characteristics of these integrators over a large number of integration steps are poor. An iterated velocity Verlet, which is sometimes used in atomistic MD for treating velocity dependency,22 also results in poor conservation of energy when utilized in MBO(N)D.

The integration algorithm developed for MBO(N)D to handle the velocity-dependent terms and afford high computational efficiency is similar to the velocity Verlet algorithm. This new integrator, hereinafter called the Lobatto integrator for brevity, is based on the Lobatto III a-b partitioned Runge–Kutta integrator.57 The essential difference in this new integrator lies in the inclusion of velocity-dependent terms that are always evaluated at the half step. The positional dependencies are evaluated at the beginning and end points of the integration interval. The algorithm proceeds as follows. The velocity variables are propagated to the half step,

$$v\!\left(t + \frac{\Delta t}{2}\right) = v(t) + \frac{\Delta t}{2}\, a\!\left[x(t),\, v\!\left(t + \frac{\Delta t}{2}\right)\right], \qquad (15)$$

where v(t) is the velocity state, x(t) is the position state, a[·] is the acceleration, and $\Delta t$ is the integration step size. Note that the acceleration term is evaluated with the position variables at the beginning of the interval, while the velocity variables are evaluated at the half point of the interval. Because the velocity at the half step is present on both the left-hand and right-hand sides of eq. (15), the equation must be solved by iteration. The initial iterate is calculated as follows:

$$v^{(0)}\!\left(t + \frac{\Delta t}{2}\right) = v(t) + \frac{\Delta t}{2}\, a\left[x(t),\, v(t)\right]. \qquad (16)$$

In MBO(N)D simulations, it has been found that one iteration is sufficient for convergence.

The position variables are propagated to the full step:

$$x(t + \Delta t) = x(t) + v\!\left(t + \frac{\Delta t}{2}\right) \Delta t. \qquad (17)$$

The full-step positions are then used to evaluate the full-step accelerations. The full-step velocities are then obtained using the following equation:

$$v(t + \Delta t) = v\!\left(t + \frac{\Delta t}{2}\right) + \frac{\Delta t}{2}\, a\!\left[x(t + \Delta t),\, v\!\left(t + \frac{\Delta t}{2}\right)\right]. \qquad (18)$$

Note that, as in eq. (15), the position and velocity variables are evaluated at different time points for calculation of the acceleration term. In this case, the position variables are at the full step, while the velocity variables are at the half step. Thus, the same set of velocities is used to compute the acceleration terms in both eqs. (15) and (18). Because of this, the acceleration evaluation at the end of the integration interval is different from that at the beginning of the next integration interval. This is due to the velocity dependencies of the acceleration terms being at the middle of the respective intervals. Nevertheless, the number of force field evaluations is the same as that required for Verlet integrators, which is once per integration step. This is because contribution
to the generalized force vector due to force field evaluations, $G_{ff}$, which are dependent only upon positions, can be evaluated at the end of the current step and saved for reuse at the beginning of the next step. Similarly, the iteration required for solution of eq. (15) is not very time consuming because the iteration is only over the velocity-dependent terms, which are less computationally intensive to calculate than the position-dependent terms.

One can show by Taylor expansion of the acceleration term that the above integrator is accurate to second order. Even though the Lobatto integrator performs exceedingly well compared to many other integrators, there is still a small amount of drift in linear and angular momentum, which needs to be removed from time to time. This is not surprising, given the complexities arising from quadratic dependency on velocity, with position-dependent coefficients, for the MBO(N)D accelerations.

Figure 2 is a comparison of three integrators: fourth-order Runge–Kutta, Verlet central difference,58 and the Lobatto integrator for an MBO(N)D simulation of ubiquitin using a 1-fs time step. The 1231-atom ubiquitin structure (1UBQ from the Protein Data Bank) consists of two alpha helices and five beta strands. To prepare for the MBO(N)D simulations, the structure was minimized, heated, and equilibrated atomistically. This procedure was then followed by MBO(N)D equilibration and production simulations. In the MBO(N)D model, all unstructured regions were treated atomistically (i.e., atoms as particles), each alpha helix was treated as a flexible body, and each beta strand was treated as a flexible body. Vacuum modes with frequencies less than 80 cm−1 were selected for each flexible body. MBO(N)D simulations were performed using the three integration methods, under constant energy conditions. Conservation of energy was used as the criterion for judging the accuracy of the integrators. Note that in the example presented, the Runge–Kutta integrator used four times as many force field evaluations as the Verlet central difference and Lobatto integrators. The figure shows that the total energy is conserved extremely well for the simulation performed by the Lobatto algorithm compared to the other two. The CPU times reported in Figure 2 are for runs performed on an SGI/R4000 computer. The results show that the Lobatto integrator is the most efficient and accurate of the three tested.

A multiple time-scale version of the Lobatto integrator has been developed to handle multigranular models. Details are shown in supplementary material.

MODELING BODY FLEXIBILITY

Whole-Molecule Modes vs. Body-Based Modes

Body-based or component modes refer to a set of modes used to describe a portion, in this application a flexible body, of the modeled system. For MD applications, the component modes form a basis set for the group of atoms that comprise a body. Component modes are used in the engineering community59–61 as an intermediate step in the solution for system eigenvectors. Recently, such methods have been adapted to the problem of solving for the whole-molecule normal modes of large macromolecules.62,63 Similar methods for efficient calculation of normal modes have been developed by Durand,64 and Mouawad and Perahia.65 The methods are similar, with each solving a number of small diagonalization problems, and coupling these solutions to obtain the low-frequency modes of the entire system.

Component modes have been used directly to describe deformational motions of flexible components of articulated mechanical systems, such as mechanical robots with flexible arms, where there are large motions between the component parts.66 Although the body-based modes account for small motions within the body, translational and rotational degrees of freedom allow for large motions between bodies.
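As an illustration of the Lobatto update in eqs. (15)–(18), whose energy conservation is reported in Figure 2, the following Python sketch advances a small test system by one step. It is a minimal sketch, not the MBO(N)D implementation: the split of the acceleration into a position-dependent part (`accel_pos`, standing in for the force field term) and a velocity-dependent part (`accel_vel`), all function names, and the two-dimensional test model with a gyroscopic term are assumptions made for illustration. The position-dependent part is evaluated once per step and cached for the next step, and the iteration of eq. (15) touches only the velocity-dependent part.

```python
def lobatto_step(x, v, a_pos_x, dt, accel_pos, accel_vel, n_iter=1):
    """Advance (x, v) by one step following eqs. (15)-(18).

    a_pos_x is the cached position-dependent acceleration at x.
    Returns the new positions, new velocities, and the position-dependent
    acceleration at the new positions (to be reused by the next step).
    """
    half = 0.5 * dt

    # eq. (16): initial iterate, velocity-dependent term evaluated at v(t)
    a0 = [p + q for p, q in zip(a_pos_x, accel_vel(x, v))]
    v_half = [vi + half * ai for vi, ai in zip(v, a0)]

    # eq. (15): fixed-point iteration over the velocity-dependent term only
    for _ in range(n_iter):
        a = [p + q for p, q in zip(a_pos_x, accel_vel(x, v_half))]
        v_half = [vi + half * ai for vi, ai in zip(v, a)]

    # eq. (17): full-step positions from the half-step velocities
    x_new = [xi + dt * vh for xi, vh in zip(x, v_half)]

    # eq. (18): new positions, same half-step velocities
    a_pos_new = accel_pos(x_new)  # the only force-field-like evaluation per step
    a_end = [p + q for p, q in zip(a_pos_new, accel_vel(x_new, v_half))]
    v_new = [vh + half * ai for vh, ai in zip(v_half, a_end)]
    return x_new, v_new, a_pos_new


# Hypothetical test model: unit harmonic well plus a gyroscopic term that is
# always perpendicular to the velocity, so the exact dynamics conserve energy.
OMEGA = 0.5

def demo_accel_pos(x):
    return [-x[0], -x[1]]

def demo_accel_vel(x, v):
    return [OMEGA * v[1], -OMEGA * v[0]]

def demo_energy(x, v):
    return 0.5 * (v[0] ** 2 + v[1] ** 2) + 0.5 * (x[0] ** 2 + x[1] ** 2)
```

With `n_iter=1` the sketch follows the observation in the text that one iteration is sufficient. When `accel_vel` returns zeros, the update reduces to the velocity Verlet algorithm.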
FIGURE 2. Total energy for 10 ps MBO(N)D simulations of ubiquitin using 1-fs time steps for three different integrators: fourth-order Runge–Kutta (RK4), Verlet Central Difference (VCD), and Lobatto. The CPU times are reported for runs done on an SGI/R4000 computer.

Mode generation involves several issues: the method of mode generation, the coordinates from which to start the generation process, and the selection of modes to retain. In MBO(N)D, two types of body-based modes have been implemented: vacuum modes and fixed environment modes. These
methods reflect the use of different boundary conditions for the molecular component.

Vacuum and Fixed Environment Modes in MBO(N)D

Calculation of the Hessian matrix for each body in the absence of the rest of the system results in a set of modes referred to as vacuum modes. This set of modes is simple to calculate, as knowledge of other parts of the molecule is not required. However, these modes may be less useful when the body’s deformational motions are strongly affected by its surroundings, such as in buried regions.

The second method of generation uses a Hessian matrix that is calculated assuming the rest of the system is fixed in inertial space. This set of modes is referred to as fixed environment modes. This approach accounts for some of the effects of the interaction with the rest of the system; however, the fixed environment modes may be overly restrictive on the low-frequency motions of the body.

The calculation of modes is straightforward for both mode generation methods. The usual Hessian matrix for the entire system can be divided into partitions that represent the bodies of the system:

$$H = \begin{bmatrix} H_{AA} & H_{AB} & H_{AC} & \cdots \\ & H_{BB} & H_{BC} & \cdots \\ & & H_{CC} & \cdots \\ & & & \ddots \end{bmatrix}. \qquad (19)$$

To calculate the fixed environment modes for body A, diagonalize the partition $H_{AA}$ after mass weighting it. To obtain the vacuum modes for body A, modify the assembly process to exclude the components that arise from interactions between atoms in body A and atoms outside of it. Diagonalization of the resulting mass-weighted Hessian partition then yields the vacuum modes.

Within any MBO(N)D model, it is possible to use a set of vacuum modes for one body and a set of fixed environment modes for another body, as long as the same type of modes is used within any given body. MBO(N)D simulations using vacuum and fixed environment modes on a variety of molecular systems have not resulted in a clear preference. In general, we have favored the use of vacuum modes because they are more efficient to compute.

The proper choice of reference coordinates for body-based mode generation is very important for obtaining good simulation results. Normal modes are computed for the molecule using its minimum-energy state as the reference coordinates for calculation of the system Hessian matrix. The resulting modes are, thus, valid in the narrow harmonic region around this minimum-energy state. Often, the minimum-energy structure is in a different conformation than those that are populated at room temperature. To generate a set of body-based mode vectors that are valid at the temperature of interest, the following procedure was adopted for generating the reference structure. A structure that has been equilibrated at room temperature or other desired simulation condition is subjected to a small number of minimization steps using steepest descents to yield the reference structure. Note that this is only a partial minimization, and serves to relieve instantaneous bad contacts while keeping the conformation close to the desired initial state. Alternative methods of obtaining an initial coordinate set exist, such as minimization using a gradient criterion. However, it is unclear which method is most general.

Any modal solution using nonminimum-energy coordinates results in modal displacement vectors that correspond to imaginary or unstable frequencies. These unstable modes are important for describing barrier crossings and transitional motions within the molecule or flexible body.67 For MBO(N)D, the purpose of the body-based modes is to provide a set of basis vectors for describing the elastic motions of the body. The imaginary frequencies do not present a problem for MBO(N)D dynamics calculations, because the frequency values themselves are not used in propagating the dynamics of the system. However, these frequency values may influence the selection of the modes to be retained for the simulation (see the next section). For example, identifying the transition states associated with unstable modes may influence mode selection. Selection of unstable modes associated with desired barrier crossings68 would allow for this type of motion in the subsequent simulation.

The procedure of applying a short amount of minimization to obtain the reference structure has the effect of reducing the number of unstable modes that would have resulted if no minimization had been applied at all. Including the remaining unstable modes allows more anharmonic motion during the simulation.

Currently, there are two alternatives for the selection of body-based modes for use in MBO(N)D dynamics. The first approach is to use the frequency
associated with each mode as a means for selecting or discarding modes. For stable modes, one can select a cutoff frequency such that only modes lower than this frequency are retained. For unstable modes, animation of the modal displacement vectors has shown that unstable modes with large imaginary values correspond to highly localized motions, such as bond stretching and angle bending, and involve only two or three atoms of the body. As the magnitude of the imaginary frequency decreases, there is less localized behavior and more global behavior in the unstable modal displacement vectors. Thus, a selection criterion for unstable modes is to select a cutoff frequency and animate the modes in the vicinity of this cutoff frequency to observe what kinds of motions are described by those modes.

The second mode-selection approach utilizes some form of analysis on the modal displacement vectors to evaluate and rank the modes. A delocalization factor criterion70 has been implemented to help determine whether a particular mode should be retained. This factor is used to distinguish modes that have localized behavior, such as local bond stretching and angle bending, from those that have a global behavior, such as helical torsion and bending. The delocalization factor is defined as

$$\text{delocalization factor} = \frac{\sum_i \Phi_i^4}{\left(\sum_i \Phi_i^2\right)^2}, \qquad (20)$$

where the $\Phi_i$ are the elements of the modal displacement vector of interest.

In general, the smaller the value of the delocalization factor, the more global the character of the mode. A minor problem with this criterion is that some types of concerted local behavior, such as localized C–H stretching all over the molecule, may yield a small delocalization factor. Because these modes tend to be of high frequency, it seems that the best approach is first to apply a frequency cutoff, and then to sort the low- to medium-frequency modes by using the delocalization factor.

SUBSTRUCTURING STRATEGIES

The essential idea behind MBO(N)D’s substructured modeling methodology is that large relative motions are allowed between bodies, while relative motions within each body are assumed to be small for flexible bodies, or negligible for rigid bodies. The goal of the substructuring procedure, therefore, is to identify groupings of atoms that can be treated as bodies, and to identify the bonds, hinges, or general areas where large motions take place.

For proteins, information on the system’s motion can be obtained from several sources: (1) If the Protein Data Bank (PDB) has 3D coordinates for several conformations of a given molecule, analysis of the multiple conformations provides information for grouping atoms into bodies or identifying hinges. (2) Short atomistic MD or Monte Carlo trajectories can be analyzed to suggest the dynamical characteristics of the local conformational space. (3) Dynamical information from NMR spectroscopy and crystallographic B-factors (temperature factors) can give a rough indication of which parts of the system have larger motions. (4) Knowledge of particular types of protein dynamics can be applied. For example, alpha helices tend to have small amounts of motion, followed by beta-sheets, and unstructured regions. Prolines tend to introduce kinks into alpha helices, glycines introduce disordered floppy points, and peptide groups tend to stay planar. This type of information can be used to specify the substructuring scheme for molecular systems where other information on structural flexibility is not available.

A number of MBO(N)D substructuring strategies have been explored for proteins, and the following sections describe how these strategies work. The basic MBO(N)D modeling framework allows for the development of many different types of substructuring strategies. The following descriptions can be regarded as examples from which a user can develop a specific strategy for the particular modeling problem at hand.

Molecular Domains

Some protein motions can be characterized as motions between domains. To substructure such a molecule, first identify the two or more domains and the linker regions between domains. The linker regions will need to be more finely substructured than the domain regions. The typically correlated nature of the side chain motions of residues near the interdomain/linker region requires finer substructuring in this region. (See the Side Chain section.) The portions of the domains away from the linker region can be substructured into much larger bodies. The disparity in body sizes and the associated disparity in time scales for the dynamic simulation can be handled best by use of the MTS integrator.

Secondary Structure Elements

It is natural to expect that structurally well-defined regions such as alpha-helices and beta-sheets, whose atoms exhibit concerted motions,
Side Chains. Side chains can be substructured independently of the protein backbone, as several rigid or flexible bodies, or they can be treated atomistically. A χ-angle analysis over a short atomistic run is often effective in helping to decide on a substructuring scheme for side chains. However, for longer side chains, crankshaft motion with little change in excluded volume is possible. Side-chain motion can be more effectively determined by considering the side-chain angle formed by the vector from the N to C atoms of the residue, and the vector from the Cα to an atom at the side-chain terminus. Figure 4 is a schematic diagram showing the vectors that are used to define the side-chain angle. If the analysis of multiple structures, or of frames from an atomistic trajectory, shows that the overall relative motion between the side chain and main chain (as measured by our side-chain angle) of a given residue is large, a candidate substructuring is to cut at the φ and ψ angles, and even the χ1 angle. Additional analyses, such as calculating the dihedral fluctuations for φ, ψ, and χ1 angles, can be performed on the candidate residues. A large dihedral fluctuation will indicate that a particular torsion angle should not be constrained. See supplementary material for an example plot of side chain angle.

For globular proteins, experience has shown that it is important to allow χ1 and other torsional side chain motions in buried regions of the protein for certain essential dynamic characteristics to be reproduced. Note that allowing side-chain motion within buried regions does not necessarily require the use of smaller time steps. The atomic fluctuations depend on details of the local packing environment, which may serve to overdamp local oscillations.72,73

This section presents the results from a number of simulation analyses on various molecular systems that have been conducted by us to test and evaluate MBO(N)D. We selected these systems to characterize the dynamics of MBO(N)D and validate the use of substructured modeling in a variety of systems that differ substantially in size and range of motion, and to compare the results to corresponding atomistic methods. Due to the larger number of DOF, atomistic systems require significantly more time to stabilize energetically than body-based systems; our test systems are, therefore, relatively small to permit an easier task for meaningful comparisons. Consequently, the systems studied herein do not afford optimum conditions for demonstrating MBO(N)D’s computational speed capability, which is favored by much larger molecules with larger bodies than those we will discuss. Our analyses and results are, nevertheless, an important first step in assessing MBO(N)D’s capability to yield acceptable agreement with atomistic simulations in terms of the essential dynamics. Once MBO(N)D’s capabilities and behavior are well understood at this level, our methodology can be extended to much larger systems where it will have the opportunity to facilitate significantly higher computational speedups compared to atomistic simulations.

All MBO(N)D simulations were performed using the MBO(N)D code. The standard atomistic simulations, to which we compare our results, were performed with the CHARMM molecular modeling program.53 We explored a variety of different substructuring strategies to determine the effects on the dynamics of each of the following five systems. (1) Alanine dipeptide: a relatively small and simple system that has a well-defined and frequent transition between two distinct conformers on the subnanosecond time scale. (2) The terminal fragment of the L7/L12 ribosomal protein from E. coli: a globular protein system containing loops, helices, and beta strands that exhibits key motion between two of the three helices. A wide range of substructuring strategies was applied to 1CTF. (3) Dickerson dodecamer: this system is a DNA duplex initially in the B form. The goal is to characterize the ability of MBO(N)D to handle large conformational changes induced by an external shear force in nucleic acids. (4) HIV-1 protease–ligand complex: a protein–ligand complex that represents a model system encountered in rational drug design. We applied a unique pulling protocol to examine ligand–

FIGURE 4. Schematic diagram showing the vectors used for defining overall angular motion between side-chain and main-chain groups.
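The side-chain angle of Figure 4, defined in the Side Chains paragraph as the angle between the N-to-C vector of a residue and the vector from Cα to an atom at the side-chain terminus, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: coordinates are plain 3-tuples, the function names are hypothetical, and the spread (maximum minus minimum over frames) is only one possible way to quantify whether the relative motion is "large"; the text does not specify a measure or a threshold.

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3-vectors."""
    return tuple(ai - bi for ai, bi in zip(a, b))


def _angle_deg(u, v):
    """Angle between vectors u and v in degrees."""
    norm_u = math.sqrt(sum(c * c for c in u))
    norm_v = math.sqrt(sum(c * c for c in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("zero-length vector")
    cos_angle = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against round-off
    return math.degrees(math.acos(cos_angle))


def side_chain_angle(n, c, ca, tip):
    """Angle between the N->C vector and the CA->side-chain-terminus vector."""
    return _angle_deg(_sub(c, n), _sub(tip, ca))


def angle_spread(frames):
    """Spread of the side-chain angle over frames of (n, c, ca, tip) coordinates.

    The max-minus-min spread is an assumed measure, chosen for illustration.
    """
    angles = [side_chain_angle(*frame) for frame in frames]
    return max(angles) - min(angles)
```

Applied to frames from a short atomistic trajectory, residues with a large spread would be candidates for cutting at the φ, ψ, and possibly χ1 angles.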
FIGURE 5. φ–ψ scatter plots for alanine dipeptide at 300 K for (a) atomistic, (b) MBO(N)D simulations with rigid
peptide plane bodies, and (c) single flexible body with all modal DOF. The values from initial coordinates are
represented by the white circle.
is a medium-size protein that contains 65 residues, and is essential for efficient polypeptide synthesis in bacteria. 1CTF contains three α-helices and three β-sheets that account for approximately 76% of its structure.

Simulations by Åqvist et al.,80,81 using the GROMOS force field, found that an important feature of the dynamics of 1CTF is the low frequency motion, around 5 cm−1, of the B helix relative to the rest of the structure. This motion was clearly displayed when helix C is used as a reference for the relative movement. Helix C is tightly bound to the beta sheet and, therefore, has restrained motion. Thus, the motion between helices B and C is considered the essential dynamic behavior of 1CTF. The purpose of this test case was to assess the speedups attainable from various MBO(N)D substructuring strategies, as well as the level of agreement in the essential dynamics with atomistic simulations.

1CTF Results

at selected ψ dihedrals. Each helix was substructured as a single rigid body. Total of 20 bodies. This scheme is shown in Figure 6. Total number of DOF = 95. Maximum time step = 20 fs.

Case 2. Same substructuring as in case 1, with body-based modes added to each of the 20 bodies. The modes in each body were sorted by delocalization factor [eq. (20)] and the lowest 10 modes were added to each helix body. Similarly, the lowest three modes were added to each of the small bodies. Fifty-eight modes were added. Total number of DOF = 153. Maximum time step = 10 fs.

Case 3. The entire system was substructured into 31 small rigid bodies with hinges at φ or ψ angles. Total number of DOF = 150. Maximum time step = 15 fs.

Case 4. Based on case 1 substructuring, with each helix further separated into a flexible main-chain body and several side-chain bodies (except for Ala and Gly residues), resulting in 66 bodies. Body-based modes with a natural frequency less than 100 cm−1 were added to bodies with more than 10 atoms. One hundred seven modes were added. Total number of DOF = 432. Maximum time step = 5 fs.

Case 5. Peptide planes and side chains modeled as bodies, and Cα atoms as particles. The five lowest frequency modes were added to each side-chain body. Total of 150 bodies and 180 modes. Total number of DOF = 966. Maximum time step = 1 fs.

Case 6. Domain-based substructuring. Two small domains in 1CTF were separated into subdomains including: αα domain (which includes helices A and B), and the β-sheet domain (which includes the β-sheet and helix C). Those domains were substructured into bodies, which had several residues each. Then, linker residues, connecting the two domains, were substructured with one-residue bodies (a total of 20 bodies). Total number of DOF = 95. Maximum time step = 20 fs.

We performed additional simulations using the MTS integrator for cases 1, 2, 3, and 6 to evaluate the performance of this methodology with different substructuring. MTS was not utilized with cases 4 and 5, as the potential for simulation speedup was in the range of that attainable by atomistic techniques. Integration time steps used in these MTS runs are listed in Table II. In these substructuring strategies, case 1 is the coarsest substructure scheme, and case 5 is the finest.

A summary of results from all MBO(N)D and atomistic simulations performed for 1CTF is listed in Table II. Each MBO(N)D simulation shows good total energy conservation during the 140 ps production run; the RMS fluctuation of total energy for each MBO(N)D simulation is well below our criterion for a stable run (<1 kcal/mol). The results show the following three general trends. First, smaller time steps result in increased stability. Second, increasing the number of DOF lowers the time step needed for stable simulation. It is possible that two different substructuring strategies can result in different stabilities and speedups, even though they have the same number of DOF. In other words, the number of DOF used is not necessarily a good predictor of MBO(N)D performance; proper substructuring representing the motions of interest is important also. Third, MTS results in energy stabilities comparable to the single time step results, but with increased speed (as we will show).

Figure 7 shows the speedup vs. the ratio of the MBO(N)D to atomistic B-C angle RMS fluctuation. Representative probability density functions of the B-C helical angle are presented in Figure 8. As expected, both figures show that finer MBO(N)D

TABLE II. Comparison of Atomistic and MBO(N)D Simulations of 1CTF. Column headings: Number of bodies; Modes; DOF(a); Δt(b) (fs); Erms(c) (kcal/mol).

FIGURE 7. Speed up vs. ratio of MBO(N)D to atomistic normalized standard deviation of the B-C interhelical angle.

FIGURE 8. Representative probability density functions of the deviation of the B-C interhelical angle about its respective average for several MBO(N)D runs compared with the atomistic result.
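The mode selection used in case 2, which sorts the modes of each body by the delocalization factor of eq. (20) and keeps the lowest ones, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: each mode is a flat list of displacement components, `select_modes` and its parameters are hypothetical names, and the optional frequency cutoff applied before sorting follows the recommendation given with eq. (20) rather than any detail reported for case 2.

```python
def delocalization_factor(mode):
    """Eq. (20): sum(phi_i^4) / (sum(phi_i^2))^2; smaller means more global."""
    sum_sq = sum(p * p for p in mode)
    if sum_sq == 0.0:
        raise ValueError("modal displacement vector has zero norm")
    sum_quartic = sum(p ** 4 for p in mode)
    return sum_quartic / (sum_sq * sum_sq)


def select_modes(modes, frequencies, n_keep, cutoff=None):
    """Return indices of the n_keep most delocalized modes of one body.

    If cutoff is given, modes with frequency >= cutoff are discarded first,
    then the survivors are sorted by increasing delocalization factor.
    """
    candidates = [
        i for i, freq in enumerate(frequencies)
        if cutoff is None or freq < cutoff
    ]
    candidates.sort(key=lambda i: delocalization_factor(modes[i]))
    return candidates[:n_keep]
```

For a vector with a single nonzero component the factor is 1, while for N equal components it is 1/N, which matches the statement that smaller values indicate more global modes.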
FIGURE 14. Plot of extraction force vs. applied force increment at every 10 ps for atomistic simulations and MBO(N)D simulations using the H1 and H2 substructures.

Side Chains section) of a short atomistic pull simulation were substructured more finely as side-chain bodies (cutting at the φ and ψ angles). The H1 and H2 substructuring strategies, along with that for A-74707, resulted in 131 and 145 bodies, respectively, for the protease–ligand system. Figure S7 in supplementary material shows details of the substructured model. In this test case, the amount of motion needed was greater than that allowed by modes; therefore, these simulations were done with small rigid bodies in the hinge-bending regions of the protein.

Figure 14 shows the extraction forces required to pull A-74707 from HIV protease for a variety of applied forces from the atomistic and MBO(N)D simulations. This figure suggests that both MBO(N)D simulations are converging on the atomistic results in the slow pulling region. The results for H1 converged faster than H2 because fewer degrees of freedom are included. The range of values for the extraction forces in Figure 14 underscores the need to perform multiple simulations with different rates of pulling. Only when the values of the extraction forces have converged within a reasonable range of one another can the contribution to the dynamics from inertial effects be understood. The maximum extraction force was computed as the maximum value of a curve that resulted from averaging every 20 ps over the raw data. We estimate that the error associated with each point in Figure 14 is 50 pN, which is based on the shift in maximum value when the window of averaging is increased to 40 ps (as was done similarly by Grubmueller94).

Figure S8 in supplementary materials shows the ligand escaping the protease, and is representa-

Discussion

We have developed a modeling methodology based on multibody dynamics, MBO(N)D, for performing long time scale simulations of macromolecules. MBO(N)D is designed to take advantage of the fact that low-frequency motion dominates the overall global motions of macromolecular systems. Our studies on various systems that differ substantially in size and dynamics suggest that the global motions from MBO(N)D are quite comparable to atomistic results. The studies also suggest that our substructuring strategies are applicable to a wide range of molecular systems. Interhelical angle motions, separation forces, order parameters, end-to-end distance, and transitional structures are all properties that seem to be well within the capabilities of MBO(N)D. The maximum speedups associated with these results vary from a factor of 8 to 30, depending on the system, type of property, and corresponding substructuring. Detailed or high frequency motions such as RMS atomistic fluctuations are sacrificed for the gains in speed. The size and frequency of the motion of a body affect the maximum time step. That is, some MBO(N)D simulations can tolerate small bodies as long as their frequency of motion is relatively low: for example, buried side-chain bodies that were relatively immobile in 1CTF resulted in a 10 fs time step, but all side-chain bodies (some of which were relatively mobile) resulted in a much lower time step.

The key elements of MBO(N)D are the grouping of atoms into rigid or flexible bodies (substructuring), and the modeling of body flexibility using a truncated set of body-based mode shape vectors. Both elements serve to eliminate high-frequency motions from the simulation. The remaining low-frequency motions permit the use of long integration time steps that give MBO(N)D a computational speed advantage over atomistic simulations.

Macromolecular systems have diverse local and global dynamical properties, and therefore, no single substructuring strategy works best for all possible cases. We have not identified a single analysis
method that will completely automate the process of substructuring, but have instead relied upon semiautomated approaches. The types of analyses described in this article help in determining the most plausible substructuring for the system under question, with the intent being that the important properties (the global properties) will be captured adequately. We have, nevertheless, found that the simple protocol of first defining a “base” substructuring that involves three residues per body for the entire protein, and then refining the substructuring in areas of interest, is a reasonable approach to the problem.

We have shown that a multibody protocol coupled with the Lobatto integrator produces very stable dynamics with elevated time steps for constant energy and constant temperature ensembles. As far as we know, the time steps used for the constant energy runs are the largest reported in the literature to date that use atomistic force fields.

The multibody equations of motion in MBO(N)D are solved by an efficient O(N) algorithm that scales linearly with the size of the system. The Lobatto integrator, which is used to propagate the dynamics, is very efficient, and has two main advantages over other integrators for multibody dynamics applications. First, it requires only one force field evaluation per step while being accurate to the second order. This attribute is on par with the Verlet class of integrators for atomistic MD. Second, it accounts for the nonlinear velocity dependency of the acceleration expression. Proper handling of this nonlinear velocity dependency is critical to the efficient and accurate propagation of MBO(N)D dynamics.

The position fluctuations from MBO(N)D simulations at the detailed atomistic level, however, tend to be lower in magnitude than for atomistic simulations. This reduction in atomistic mobility is not surprising because most or all of the internal high-frequency motions have been eliminated. Said differently, the small localized atomistic motions are neglected while the more global motions are retained (e.g., domain–domain bending). Nevertheless, the fluctuation profile (the relative fluctuations among the atoms and residues) is reproduced by the MBO(N)D models. Reducing the degrees of freedom, as noted by others, can also have the undesirable effect of eliminating the coupling between degrees of freedom (e.g., the coupling between bond angles and torsions). This problem can be alleviated, however, by adjusting the force field for the remaining degrees of freedom,103 as was done, for example, by Head-Gordon and Brooks for virtual bodies.3

With the macromolecule substructured into bodies, and body flexibility captured with body-based modes, the model allows large relative motions to take place between the bodies, while small deformational motions are assumed within the flexible bodies. Force field interactions are computed in the normal atomistic fashion, with the resulting atom forces being projected into the reduced degrees of freedom, yielding body forces, body torques, and modal forces, for dynamic propagation of the MBO(N)D model. The ability of the bodies to undergo large motions, and the use of the fully nonlinear force field, are the features that allow the MBO(N)D modeling approach to capture the important anharmonic effects of molecular dynamics.

The substructuring capability in the MBO(N)D methodology is very versatile. For example, as bodies become smaller (i.e., smaller number of atoms per body, with a corresponding increase in the number of bodies), and more body-based modes are included, the system approaches an atomistic representation. The multiple time scale Lobatto integrator exploits systems with dissimilar sizes and motions, and produces up to a twofold speed improvement over the multibody simulations with single time steps, depending on the system and substructuring used.

There are several potentially useful applications of the MBO(N)D modeling approach, and some examples are briefly discussed in turn.

1. Folding of structural elements in proteins: MBO(N)D substructuring can be used to preserve certain structural features (e.g., an alpha helix) during dynamics. The large-scale motions between these structural elements can then be explored.
2. Protein–ligand, protein–protein, and protein–nucleic acid interactions: the long time-scale properties of these complexes continue to be one of the main problems that cannot be treated meaningfully using atomistic methods. The main advantage of MBO(N)D in these cases is speed, and therefore, the ability to explore longer time-scale properties.
3. Supramolecular assemblies and rigid rod polymeric systems: systems that approach mesoscale dimensions are known to have properties that cannot be elucidated from studying the individual component molecules. MBO(N)D provides the framework that can be used to build very large assemblies of molecules and study their dynamical properties.
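The projection of atomistic forces onto the reduced degrees of freedom, described earlier in this section, can be illustrated with a minimal sketch for a single body. This is not the MBO(N)D implementation: the function name `project_atom_forces`, the choice of the center of mass as the torque reference point, and the representation of each mode shape as one displacement vector per atom are all assumptions made for illustration. The body force is taken as the sum of the atomic forces, the body torque as the sum of the moments about the center of mass, and each modal force as the dot product of a mode shape with the atomic forces.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def project_atom_forces(positions, forces, masses, modes=()):
    """Project per-atom forces onto one body's reduced degrees of freedom.

    positions, forces: one (x, y, z) tuple per atom.
    masses: one mass per atom.
    modes: hypothetical mode shapes, each a list of per-atom (x, y, z) tuples.
    Returns (body_force, body_torque, modal_forces).
    """
    total_mass = sum(masses)
    # Center of mass, used here as the reference point for the torque.
    com = tuple(sum(m * p[k] for m, p in zip(masses, positions)) / total_mass
                for k in range(3))

    # Net force on the body: sum of the atomic forces.
    body_force = tuple(sum(f[k] for f in forces) for k in range(3))

    # Net torque: sum of (r_i - r_com) x f_i over the atoms of the body.
    torque = [0.0, 0.0, 0.0]
    for p, f in zip(positions, forces):
        arm = tuple(p[k] - com[k] for k in range(3))
        t = cross(arm, f)
        for k in range(3):
            torque[k] += t[k]

    # Modal force: dot product of each mode shape with the atomic forces.
    modal_forces = [sum(s[k] * f[k] for s, f in zip(mode, forces)
                        for k in range(3))
                    for mode in modes]
    return body_force, tuple(torque), modal_forces


# Two atoms on the x axis with equal and opposite forces along y:
# a pure couple, so the net force vanishes and only a torque remains.
pos = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frc = [(0.0, -1.0, 0.0), (0.0, 1.0, 0.0)]
mass = [1.0, 1.0]
stretch = [[(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]  # one illustrative mode
F, T, Q = project_atom_forces(pos, frc, mass, stretch)
```

In this toy case the couple produces no net force and does not excite the stretching mode, so all of the atomistic force information ends up in the body torque. The force field itself is still evaluated atom by atom, and only the propagated variables are reduced.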
There are a number of ongoing MBO(N)D development efforts that are being pursued by us and others to broaden its applicability to other types of simulations. The most important of these is the modeling of solvent. Solvent mediates many molecular motions and properties, and correctly accounting for the environment of a molecule is crucial for meaningful results in many cases. Solvent can mediate, for example, bulk properties such as dielectric screening, as well as localized properties such as the geometry and dynamics of side chains at the protein/solvent interface. In special but important cases of localized properties, a molecule of solvent can be considered as part of the ligand binding site in a protein. A good model of solvent will, therefore, address either one or all of these properties adequately, depending on the problem to be studied. The example simulations in this article were performed with a crude implicit solvation protocol (distance-dependent dielectric).

We are considering several other implicit solvation approaches for incorporation within the MBO(N)D paradigm that have shown promising simulation results. These methods are the following. First, including only a reduced number of explicit waters that are proximate to the surface of the macromolecule, and that fill the depressions and holes if appropriate. This “thin shell approach” to solvation is the most straightforward approach as each explicit water molecule is treated as a body. Indeed, recent work by Steinbach104 and Mazur105 shows that a relatively small number of explicit water molecules is sufficient to solvate myoglobin and DNA with good reproduction of key dynamical properties. A fully solvated system would not be appropriate for MBO(N)D, as the existence of a large number of very small bodies can severely limit the speedups that can be obtained. Second, methods such as electrostatic continuum methods106, 107 or empirical solvation potentials112, 113 are consistent with the reduced variable approach of MBO(N)D. These methods have levels of fidelity and time constants that are commensurate with typical MBO(N)D substructured models. Recent results by McCammon, for example, have demonstrated the advantage of the Poisson–Boltzmann Stochastic Dynamics method for reproducing conformational statistics of alanine dipeptide from fully solvated simulations.108 Studies on macromolecules using these methods, however, are still ongoing. Third, hybrid methods that combine the aspects of explicit molecules of water at the surface of the macromolecule, with the implicit description of water outside the solvent shell. The approaches in this area are broad, and have been successful in pKa calculations for BPTI,109 binding free energy calculations for HIV protease inhibitors,110 and the dynamics of enzyme catalysis.111

The MBO(N)D modeling methodology has shown promise in providing significant computational speedups while reproducing important dynamical global properties of protein, nucleic acid, and polymeric systems. A number of modifications to the methodology have been identified that will further enhance MBO(N)D’s ability to reproduce other important dynamical behaviors. MBO(N)D represents a new approach to removing the computational bottleneck associated with atomistic methods. The goal of MBO(N)D is to permit the efficient study of very long time-scale properties and of very large molecular systems.

Acknowledgments

The authors wish to express their greatest appreciation to Professor Martin Karplus for his interest in and support of the development of MBO(N)D. He provided many substantial suggestions on methodologies, test cases, and analysis methods. His advice and insights in molecular modeling were invaluable to this effort. In addition, the authors would like to thank the following individuals for their contribution to this work: Dr. Harold P. Frisch for initial discussions and strong encouragement; Dr. Herman van Vlijmen for help on implementing the body-based mode generation capability; Venkatraramen Mohan for preprints of his work on nucleic acids; and Dr. Ryszard Czerminski for numerous technical discussions. The National Aeronautics and Space Administration funded initial work for the development of the O(N) algorithm.

Supplementary Materials

A substantial amount of supporting text and figures is contained in the supplementary materials.

References

1. Brooks, III, C. L.; Karplus, M.; Pettitt, B. M. Advances in Chemical Physics; John Wiley & Sons: New York, 1988, vol. 71.
2. McCammon, J. A.; Harvey, S. C. Dynamics of Proteins and Nucleic Acids; Cambridge University Press: Cambridge, MA, 1987.
3. Head-Gordon, T.; Brooks, III, C. L. Biopolymers 1991, 31, 77.
4. Brünger, A. T. X-PLOR Manual (version 3); Yale University Press: New Haven, CT, 1992.
5. Jain, A.; Vaidehi, N.; Rodrigues, G. J Comp Phys 1993, 106, 258.
6. McCammon, J. A.; Pettitt, B. M.; Scott, L. R. Comput Math Applic 1994, 28, 319.
7. Elber, R. Curr Opin Struct Biol 1996, 6, 232.
8. Leimkuhler, B.; Reich, S.; Skeel, R. D. Mathematical Approaches to Biomolecular Structure and Dynamics; Mesirov, J. P.; Schulten, K.; Sumner, D. W., Eds.; Springer Verlag: New York, 1996, p. 161, vol. 82.
9. Schlick, T.; Barth, E.; Mandziuk, M. Annu Rev Biophys Biomol Struct 1997, 26.
10. Ryckaert, J. P.; Ciccotti, G.; Berendsen, H. J. C. J Comp Phys 1977, 23, 327.
11. Andersen, H. J Comp Phys 1983, 52.
12. van Gunsteren, W. F.; Karplus, M. Macromolecules 1982, 15, 1528.
13. Rice, L. M.; Brünger, A. T. Proteins Struct Funct Genet 1994, 19, 277.
14. Ryckaert, J. P.; Bellemans, A. Chem Phys Lett 1975, 30, 123.
15. Mazur, K.; Abagyan, R. A. Biomol Struct Dynam 1989, 6, 815.
16. Gibson, K. D.; Scheraga, H. J Comp Chem 1990, 1, 468.
17. Mazur, K.; Dorofeev, K. V.; Abagyan, R. A. J Comp Phys 1991, 92, 261.
18. Durup, J. J Phys Chem 1991, 95, 1817.
19. Durup, J. Biopolymers 1992, 32, 561.
20. Streett, W. B.; Tildesley, D. J.; Saville, G. Mol Phys 1978, 35, 639.
21. Grubmüller, H.; Heller, H.; Windemuth, A.; Schulten, K. Mol Sim 1991, 6, 121.
22. Tuckerman, M. E.; Berne, B. J.; Martyna, G. J. J Chem Phys 1992, 97.
23. Watanabe, M.; Karplus, M. J Chem Phys 1993, 99, 8063.
24. Watanabe, M.; Karplus, M. J Phys Chem 1995, 99, 5680.
25. Humphreys, D. E.; Friesner, R. A.; Berne, B. J. J Phys Chem 1994, 98, 6885.
26. Forester, T.; Smith, W. Mol Sim 1994, 13, 195.
27. Procacci, P.; Berne, B. J. J Chem Phys 1994, 101, 2421.
28. Greengard, L.; Rokhlin, V. J Comp Phys 1987, 73, 325.
29. Schlick, T.; Brandt, A. Sci Eng 1996, 3, 78.
30. Briggs, W. L. A Multi-Grid Tutorial; SIAM: Lancaster, PA, 1987.
31. Perahia, D.; Levy, R. M.; Karplus, M. Biopolymers 1990, 29, 645.
32. Ichiye, T.; Karplus, M. Proteins 1991, 11, 205.
33. Hayward, S.; Kitao, A.; Go, N. Protein Sci 1994, 3, 936.
34. Amadei, A.; Linssen, A. B. M.; Berendsen, H. J. C. Proteins Struct Funct Genet 1993, 17, 412.
35. vanAalten, D. M. F.; et al. Proteins Struct Funct Genet 1995, 22, 45.
36. Amadei, A.; Linssen, A. B. M.; deGroot, B. L.; vanAlten, D. M. F.; Berendsen, H. J. C. J Biomol Struct Dynam 1996, 13, 615.
37. deGroot, B. L.; Amadei, A.; vanAlten, D. M. F.; Berendsen, H. J. C. J Biomol Struct Dynam 1996, 13, 741.
38. Space, B.; Rabitz, H.; Askar, A. J Chem Phys 1993, 99, 9070.
39. Balsera, M. A.; Wriggers, W.; Oono, Y.; Schulten, K. J Phys Chem 1996, 100, 2567.
40. Peskin, C. S.; Schlick, T. Commun Pure Appl Math 1989, 42, 1001.
41. Gear, C. W. Numerical Initial Value Problems in Ordinary Differential Equations; Prentice Hall: Englewood Cliffs, NJ, 1971.
42. Zhang, G.; Schlick, T. J Comp Chem 1993, 14, 1212.
43. Schlick, T.; Olson, W. K. J Mol Biol 1992, 223.
44. Hao, M. H.; Pincus, M. R.; Rackovsky, S.; Scheraga, H. Biochemistry 1993, 32.
45. Olender, R.; Elber, R. J Chem Phys 1996, 105, 9299.
46. Czerminski, R.; Elber, R. J Quantum Chem 1990, 24, 167.
47. Onsager, L.; Machlup, S. Phys Rev 1953, 91, 1505.
48. Bodley, C. S.; Devers, A. D.; Park, A. C.; Frisch, H. P. NASA Technical Paper 1219, 1 (1978).
49. Turner, J. D.; Weiner, P. K.; Chun, H. M.; Lupi, V.; Galion, S.; Singh, U. C. Computer Simulation of Biomolecular Systems: Theoretical and Experimental Applications; Van Gunsteren, W. F.; Weiner, P. K.; Wilkinson, A. J., Eds.; ESCOM: Leiden, 1993.
50. Levy, R. M.; Karplus, M.; Kushik, J.; Perahia, D. Macromolecules 1984, 17, 1370.
51. Horiuchi, T.; Go, N. Proteins 1991, 10, 106.
52. Mizuguchi, K.; Kidera, A.; Go, N. Proteins 1994, 18, 34.
53. Brooks, B. R.; Bruccoleri, R. E.; Olafson, B. D.; States, D. J.; Swaminathan, S.; Karplus, M. J Comp Chem 1983, 187.
54. Ding, H. Q.; Karasawa, N.; Goddard, III, W. A. J Chem Phys 1992, 97, 4309.
55. de Jalón, J. G.; Bayo, E. Kinematic and Dynamic Simulation of Multibody Systems; Springer Verlag: New York, 1994.
56. Chun, H. M.; Turner, J. D.; Frisch, H. P. Paper AAS 89-457, AAS/AIAA Conf., Stowe, VT (1987).
57. Lapidus, L.; Seinfeld, J. H. Numerical Solution of Ordinary Differential Equations; Academic Press: New York, 1971.
58. Ferrario, M.; Ryckaert, J. P. Mol Phys 1985, 54, 587.
59. Benfield, W. A.; Hruda, R. F. AIAA J 1971, 9, 1255.
60. Craig, R. R., Jr. Shock Vib Digest 1977, 9.
61. Lee, A. Y.; Tsuha, W. S. J Guide Control Dynam 1994, 17, 69.
62. Hao, M. H.; Harvey, S. C. Biopolymers 1992, 32, 1393.
63. Hao, M. H.; Scheraga, H. Biopolymers 1994, 34, 321.
64. Durand, P.; Trinquier, G.; Sanejouand, Y. H. Biopolymers 1994, 34, 759.
65. Mouawad, L.; Perahia, D. Biopolymers 1993, 33, 599.
66. Gerber, P. Biopolymers 1992, 32, 1003.
67. Straub, J. E.; Rashkin, A. B.; Thirumalai, D. JACS 1994, 116, 2049.
68. Straub, J. E.; Choi, J. K. J Phys Chem 1994, 98, 10978.
69. Tirion, M. M. Phys Rev Lett 1996, 77, 1905.
70. Ichiye, T.; Karplus, M. Proteins Struct Funct Genet 1987, 2, 236.
71. Kabsch, W.; Sander, C. Biopolymers 1983, 22, 2577.
72. Swaminathan, S.; Ichiye, T.; van Gunsteren, W.; Karplus, M. Biochemistry 1982, 21, 5230.
73. McCammon, J. A.; Karplus, M. Biopolymers 1980, 19, 1375.
74. Berendsen, H. J. C.; Postma, J. P. M.; van Gunsteren, W. F.; DiNola, A.; Haak, J. R. J Chem Phys 1984, 81, 3684.
75. Rossky, P. J.; Karplus, M. J Am Chem Soc 1979, 101, 1913.
76. Czerminski, R.; Elber, R. J Chem Phys 1990, 92, 5580.
77. Allen, M. P.; Tildesley, D. J. Computer Simulation of Liquids; Oxford Science Publications: New York, 1987.
78. Derreumaux, P.; Schlick, T. Proteins Struct Funct Genet 1995, 21, 282.
79. Leijonmarck, M.; Liljas, A. J Mol Biol 1987, 195, 555.
80. Aqvist, J.; vanGunsteren, W.; Leijonmarck, M.; Tapia, O. J Mol Biol 1985, 183, 461.
81. Aqvist, J.; Leijonmarck, M.; Tapia, O. Eur Biophys J 1989, 16, 327.
82. Sanejouand, Y.; Tapia, O. J Phys Chem 1995, 99, 5698.
83. Noy, A.; Vezenov, D.; Kayyem, J.; Meade, T.; Lieber, C. Chem Biol 1996, 527.
84. Isralewitz, B.; Izrailev, S.; Schulten, K. Biophys J 1997, 73, 2972.
85. Dickerson, R. E. J Biomol Struct Dynam 1989, 6, 627.
86. Weiner, S. J.; Kollman, P. A.; Case, D.; Singh, U. C.; Ghio, C.; Alagona, G.; Weiner, P. K. J Am Chem Soc 1984, 106, 765.
87. Mackerell, unpublished.
88. Konrad, M. W.; Bolonick, J. I. J Am Chem Soc 1996, 118, 10989.
89. Chen, Y. Z.; Mohan, V.; Griffey, R. H. J Biomolec Struct Dynam 1998, 15, 756.
90. Chen, Y. Z.; Mohan, V.; Griffey, R. H. Chem Phys Lett 1998.
91. Bertrand, H. O.; et al. Nucl Acid Res 1998, 26, 1261.
92. Collins, J. R.; Burt, S. K.; Erickson, J. W. Nat Struct Biol 1995, 2, 334.
93. Florin, E. L.; May, V. T.; Gaub, H. E. Science 1994, 264, 415.
94. Grubmüller, H.; Heymann, B.; Tavan, P. Science 1996, 271, 997.
95. Erickson, J.; Neidhart, D. J.; VanDrie, J.; Kempf, D. J.; Wang, X. C.; Norbeck, D. W.; Plattner, J. J.; Rittenhouse, J. W.; Turon, M.; Wideburg, N.; et al. Science 1990, 249, 527.
96. Kempf, D. J. J Med Chem 1990, 33, 2687.
97. Sohn, S. E.; Singer, R. D.; Lamala, S. J.; Kuzyk, M. K. Polym Mater Sci Eng 1986, 55, 532.
98. Tsai, M. L.; Chen, S. H.; Jacobs, S. D. Appl Phys Lett 1989, 54, 2395.
99. Pojodil, G. M.; Farmer, B. L.; Adams, W. W. Polymer 1996, 37, 1825.
100. Grigoras, S.; Lane, T. H. J Comp Chem 1988, 9, 25.
101. Caves, L. S. D.; Evanseck, J. D.; Karplus, M. Protein Sci 1988, 7, 649.
102. Auffinger, P.; Louise-May, S.; Westhof, E. J Am Chem Soc 1995, 117, 6720.
103. Bornemann, F. A.; Schuette, C. Phys D 1997, 102, 57.
104. Steinbach, P. J.; Brooks, B. R. Proc Natl Acad Sci USA 1993, 90, 9135.
105. Mazur, K. M. J Am Chem Soc 1998, 120, 10928.
106. Still, W. C.; Tempczyk, A.; Hawley, R. C.; Hendrickson, T. J Am Chem Soc 1990, 112, 6127.
107. Schaefer, M.; Karplus, M. J Phys Chem 1996, 100, 1578.
108. Gilson, M. K.; McCammon, J. A.; Madura, J. D. J Comp Chem 1995, 16, 1081.
109. Russel, A. T.; Warshel, A. J Mol Biol 1985, 185, 389.
110. Hansson, T.; Aqvist, J. Protein Eng 1995, 8, 1137.
111. Warshel, A.; Papazyan, A.; Kollman, P. A. Science 1995, 269, 102.
112. Fraternali, F.; van Gunsteren, W. F. J Mol Biol 1996, 256, 939.
113. Wesson, L.; Eisenberg, D. Protein Sci 1992, 1, 227.