Simulations of Protein Folding

Michael Cahill,^a Mark Fleharty,^b and Kevin Cahill^c

^a Department of Chemistry, United States Military Academy, West Point, NY 10997, USA
^b Department of Computer Science, University of New Mexico, Albuquerque, NM 87131, USA
^c Department of Physics and Astronomy, University of New Mexico, Albuquerque, NM 87131, USA

We have developed a simple, phenomenological Monte Carlo code that predicts the three-dimensional structure of globular proteins from the DNA sequences that define them. We have applied this code to two small proteins, the villin headpiece (1VII) and ColE1 rop (1ROP). Our code folds both proteins to within 5 Å rms of their native structures.

1. PROTEINS
A protein is a linear chain of amino acids. The proteins of natural living organisms are composed of 20 different types of amino acids. A typical protein is a polymer of 300 amino acids, of which there are $20^{300} \approx 2 \times 10^{390}$ different possibilities. The human body uses about 80,000 different proteins for most of its functionality, including structure, communication, transport, and catalysis.

The order of the amino acids in the proteins of an organism is specified by the order of the base pairs in the deoxyribonucleic acid, DNA, of its genome. Human DNA consists of $10^9$ base pairs with a total length of 3 m. Since three base pairs specify an amino acid, the code for the 80,000 human proteins requires only $3 \times 300 \times 80{,}000 \approx 7 \times 10^7$ base pairs or 7% of the genome.
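
The counting in this section can be checked with a few lines of Python. This is only an illustrative check: the variable names are introduced here, and the inputs are the round numbers quoted above.

```python
import math

amino_acid_types = 20        # kinds of amino acids
residues_per_protein = 300   # typical protein length quoted above
human_proteins = 80_000      # number of human proteins quoted above
genome_base_pairs = 10**9    # genome size quoted above

# 20**300 is awkward to read directly, so work with its base-10 logarithm.
log10_sequences = residues_per_protein * math.log10(amino_acid_types)
exponent = math.floor(log10_sequences)
mantissa = 10 ** (log10_sequences - exponent)

# Three base pairs encode one amino acid.
coding_base_pairs = 3 * residues_per_protein * human_proteins
coding_fraction = coding_base_pairs / genome_base_pairs

print(f"20^300 is about {mantissa:.1f}e{exponent}")
print(f"coding base pairs: {coding_base_pairs:,}")
print(f"fraction of genome: {coding_fraction:.1%}")
```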
1.1. Amino Acids
The twenty amino acids differ only in their side chains. The key atom in an amino acid is a carbon atom called the α-carbon, $C_\alpha$. Four atoms are attached to the $C_\alpha$ by single covalent bonds: a hydrogen atom H, a carbonyl-carbon atom $C'$, a nitrogen atom N, and the first atom of the side chain R of the amino acid. The carbonyl carbon $C'$ is connected to an oxygen atom by two covalent bonds and to a hydroxyl group OH by another covalent bond; the nitrogen atom N is attached to two hydrogen atoms, forming an amine group NH$_2$. The backbone of an amino acid is the triplet N, $C_\alpha$, $C'$.

Of the 20 amino acids found in biological systems, 19 are left handed. If one looks at the $C_\alpha$ from the H, then the order of the structures $C'$O, R, and N is clockwise (CORN). The one exception is glycine, in which the entire side chain is a single hydrogen atom; glycine is not chiral.
1.2. Globular Proteins
There are three classes of proteins: fibrous, membrane, and globular. Fibrous proteins are the building materials of bodies; collagen is used in tendon and bone, α-keratin in hair and skin. Membrane proteins sit in the membranes of cells through which they pass molecules and messages. Globular proteins catalyze chemical reactions; enzymes are globular proteins.

Under normal physiological conditions, saline water near pH 7 at 20–40 °C, proteins assume their native forms. Globular proteins fold into compact structures. The biological activity of a globular protein is largely determined by its unique shape, which in turn is determined by its primary structure, that is, by its sequence of amino acids.
1.3. Kinds of Amino Acids
The amino acids that occur in natural living organisms are of four kinds. Seven are nonpolar: alanine (ala), valine (val), phenylalanine (phe), proline (pro), methionine (met), isoleucine (ile), and leucine (leu). They avoid water and are said to be hydrophobic. Four are charged: aspartic acid (asp) and glutamic acid (glu) are negative, lysine (lys) and arginine (arg) are positive. Eight are polar: serine (ser), threonine (thr), tyrosine (tyr), histidine (his), cysteine (cys), asparagine (asn), glutamine (gln), and tryptophan (trp). The four charged amino acids and the eight polar amino acids seek water and are said to be hydrophilic. Glycine falls into a class of its own.
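
The classification above can be written as a small lookup table. The Python sketch below is illustrative only; the names `AMINO_ACID_CLASSES`, `CLASS_OF`, `is_hydrophobic`, and `is_hydrophilic` are hypothetical and are introduced here for convenience.

```python
# The four kinds of amino acids, keyed by the three-letter codes used above.
AMINO_ACID_CLASSES = {
    "nonpolar": ["ala", "val", "phe", "pro", "met", "ile", "leu"],
    "charged": ["asp", "glu", "lys", "arg"],
    "polar": ["ser", "thr", "tyr", "his", "cys", "asn", "gln", "trp"],
    "glycine": ["gly"],
}

# Reverse lookup: three-letter code -> class name.
CLASS_OF = {
    code: kind
    for kind, codes in AMINO_ACID_CLASSES.items()
    for code in codes
}


def is_hydrophobic(code):
    """True for the seven nonpolar amino acids."""
    return CLASS_OF[code.lower()] == "nonpolar"


def is_hydrophilic(code):
    """True for the four charged and eight polar amino acids."""
    return CLASS_OF[code.lower()] in ("charged", "polar")


print(len(CLASS_OF), "amino acids classified")
```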
1.4. Protein Geometry
When two amino acids are joined to make a dipeptide, first the hydroxyl group OH attached to the carbonyl carbon $C'$ of the first amino acid combines with one of the two hydrogen atoms attached to the nitrogen N of the second amino acid to form a molecule of water H$_2$O, and then a peptide bond forms between the carbonyl carbon $C'$ of the first amino acid and the nitrogen N of the second amino acid. A peptide bond is short, 1.33 Å, and resists rotations because it is partly a double bond.

To a good approximation, the six atoms $C_{\alpha 1}$, $C'_1$, $O_1$, $N_2$, $H_2$, and $C_{\alpha 2}$ lie in a plane, called the peptide plane. If a third amino acid is added to the carbonyl carbon $C'_2$ of the second amino acid, then the six atoms $C_{\alpha 2} \ldots C_{\alpha 3}$ also will lie in a (typically different) plane. Exceptionally, the peptide plane of proline is not quite flat because the side chain loops around, and its third carbon atom forms a bond with the nitrogen atom of the proline backbone.
1.5. The Protein Backbone
The protein backbone consists of the chain of triplets $(N\,C_\alpha\,C')_1$, $(N\,C_\alpha\,C')_2$, $(N\,C_\alpha\,C')_3$, ..., $(N\,C_\alpha\,C')_n$. Apart from the first nitrogen $N_1$ and the last carbonyl carbon $C'_n$, this backbone (and its oxygen and amide hydrogen atoms) consists of a chain of peptide planes, $C_{\alpha 1} \ldots C_{\alpha 2} \ldots C_{\alpha\,n-1} \ldots C_{\alpha n}$. Since the angles among the four bonds of the $C_\alpha$'s are fixed, the shape of the backbone of peptide planes is determined by the angles of rotation about the single bonds that link each $C_\alpha$ to the N that precedes it and the $C'$ that follows it. The angle about the $N_i$-$C_{\alpha i}$ bond is called $\phi_i$, that about the $C_{\alpha i}$-$C'_i$ bond is $\psi_i$. The $2n$ angles $(\phi_1, \psi_1) \ldots (\phi_n, \psi_n)$ determine the shape of the backbone of the protein. These angles are the main kinematic variables of a protein. The principal properties of proteins are discussed in the classic article by Jane Richardson [1].
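
As an illustration of these kinematic variables, the following Python sketch computes a dihedral angle from four atomic positions. It assumes the usual convention, not spelled out above, that $\phi_i$ is the dihedral angle of the atoms $C'_{i-1}$, $N_i$, $C_{\alpha i}$, $C'_i$ and that $\psi_i$ is that of $N_i$, $C_{\alpha i}$, $C'_i$, $N_{i+1}$. The function name `dihedral`, the helper names, and the sample coordinates are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(a[k] - b[k] for k in range(3))


def _dot(a, b):
    return sum(a[k] * b[k] for k in range(3))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in radians about the bond p1-p2.

    The result lies in (-pi, pi]; it is 0 when p0 and p3 are eclipsed (cis)
    and pi in magnitude when they are on opposite sides (trans).
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)   # normal to the plane of p0, p1, p2
    n2 = _cross(b2, b3)   # normal to the plane of p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.atan2(y, x)


# Four points with a quarter turn about the z axis between the outer bonds.
quarter_turn = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(math.degrees(quarter_turn))
```

For $\phi_i$ one would pass the positions of $C'_{i-1}$, $N_i$, $C_{\alpha i}$, $C'_i$ in that order, and for $\psi_i$ those of $N_i$, $C_{\alpha i}$, $C'_i$, $N_{i+1}$.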
2. PROTEIN FOLDING
The problem of protein folding is to predict the natural folded shape of a protein under physiological conditions from the DNA that defines its sequence of amino acids, which is its primary structure. This difficult problem has been approached by several techniques. Some scientists have applied all-atom molecular dynamics [2]. We have used the Monte Carlo method in a manner inspired by the work of Ken Dill et al. [3].

Our Monte Carlo simulations are guided by a simple potential with three terms. The first term embodies the Pauli exclusion principle. Because the outer parts of atoms are electrons, which are fermions, the Pauli exclusion principle requires that the side chains of a protein not overlap by more than a fraction of an angstrom. In our present simulations, we have represented each side chain as a sphere centered at the first carbon atom, the $C_\beta$, of the side chain or at the hydrogen atom that is the side chain in the case of glycine.

The second term represents the mutual attraction of nonpolar or hydrophobic amino acids. In effect the water electric dipoles, the free protons, the free hydroxyl radicals, and the other ions of the cellular fluid attract the charged and polar amino acids of a protein but leave unaffected the nonpolar amino acids. The resulting net inward force on the nonpolar amino acids drives them into a core which can be as densely packed as an ionic crystal.

The third term is a very phenomenological representation of the effects of steric repulsion and hydrogen bonding. For a given amino acid, this term is more negative when its pair of angles $\phi_i$ and $\psi_i$ are in a zone that avoids steric clashes between the backbone and the side chain and that encourages the formation of hydrogen bonds between NH$^+$ and O$^-$ groups. One of these Ramachandran zones favors the formation of α helices, others favor β structures.
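
To make the structure of the three terms concrete, here is a deliberately crude Python sketch of a potential of this general form. It should not be read as the potential used in our simulations: the functional forms, the function name `toy_energy`, the zone labels, and every numerical parameter (sphere radius, overlap penalty, contact cutoff, and weights) are assumptions chosen only for illustration.

```python
import math

NONPOLAR = {"ala", "val", "phe", "pro", "met", "ile", "leu"}
FAVORED_ZONES = ("alpha", "beta")   # hypothetical labels for favorable zones


def toy_energy(residues, centers, zones,
               radius=2.0,            # assumed side-chain sphere radius (angstroms)
               overlap_penalty=10.0,  # assumed cost per angstrom of overlap
               contact_cutoff=7.0,    # assumed range of the hydrophobic attraction
               hydrophobic_gain=1.0,  # assumed reward per nonpolar contact
               zone_gain=0.5):        # assumed reward per residue in a favored zone
    """Toy three-term energy for side-chain spheres at the given centers."""
    energy = 0.0
    n = len(residues)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(centers[i], centers[j])
            # Term 1: excluded volume, penalize overlapping spheres.
            if d < 2.0 * radius:
                energy += overlap_penalty * (2.0 * radius - d)
            # Term 2: mutual attraction of nonpolar residues in contact.
            if (residues[i] in NONPOLAR and residues[j] in NONPOLAR
                    and d < contact_cutoff):
                energy -= hydrophobic_gain
    # Term 3: reward residues whose (phi, psi) lie in a favored zone.
    energy -= zone_gain * sum(z in FAVORED_ZONES for z in zones)
    return energy


# Two leucines 5 angstroms apart, the first in the alpha zone.
example = toy_energy(["leu", "leu"], [(0, 0, 0), (5, 0, 0)], ["alpha", "misc"])
print(example)
```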
We incorporate these zones in a Metropolis step with two scales, which we call zoning with memory. Each Monte Carlo trial move begins with a random number that determines whether the angles $\phi_i$ and $\psi_i$ of residue $i$ will change zone, e.g., from its present zone to the α zone, the β zone, or to the miscellaneous zone. If the zone is changed, then the angles $\phi_i$ and $\psi_i$ revert to the values they possessed when residue $i$ was last in that zone. The trial move is then modified slightly and randomly.
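
The logic of zoning with memory can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than our production code: the zone centers in `ZONE_CENTERS`, the zone-change probability `p_zone`, the perturbation size `jitter`, the class `Residue`, and the function names are all hypothetical, and the energy is supplied by the caller.

```python
import math
import random

# Hypothetical representative (phi, psi) values in degrees for each zone.
ZONE_CENTERS = {"alpha": (-60.0, -45.0),
                "beta": (-120.0, 130.0),
                "misc": (60.0, 45.0)}


def wrap(angle):
    """Map an angle in degrees into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


class Residue:
    def __init__(self, zone="beta"):
        self.zone = zone
        self.angles = ZONE_CENTERS[zone]
        # Memory: the angles this residue last held in each zone.
        self.memory = dict(ZONE_CENTERS)


def trial_move(res, rng, p_zone=0.1, jitter=5.0):
    """Propose a new (zone, angles) pair for one residue."""
    zone, (phi, psi) = res.zone, res.angles
    if rng.random() < p_zone:
        # Coarse scale: jump to another zone and recall its remembered angles.
        zone = rng.choice([z for z in ZONE_CENTERS if z != res.zone])
        phi, psi = res.memory[zone]
    # Fine scale: modify the trial angles slightly and randomly.
    phi = wrap(phi + rng.uniform(-jitter, jitter))
    psi = wrap(psi + rng.uniform(-jitter, jitter))
    return zone, (phi, psi)


def metropolis_step(chain, energy, rng, temperature=1.0):
    """One Metropolis trial on a randomly chosen residue; True if accepted."""
    res = chain[rng.randrange(len(chain))]
    old_zone, old_angles = res.zone, res.angles
    e_old = energy(chain)
    res.zone, res.angles = trial_move(res, rng)
    delta = energy(chain) - e_old
    if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
        res.memory[res.zone] = res.angles    # accepted: update the memory
        return True
    res.zone, res.angles = old_zone, old_angles   # rejected: restore
    return False
```

Because the memory of a zone is updated on every accepted move, the entry for a zone that a residue has left always holds the angles it last had there, which is what a later return to that zone restores.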
2.1. Rotations
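We have derived a simple formula for the $3 \times 3$ real orthogonal matrix that represents a right-handed rotation by $\theta = |\vec\theta|$ radians about the axis $\hat\theta = \vec\theta/\theta$:
\[
e^{-i\vec\theta\cdot\vec J} = \cos\theta\, I - i\,\hat\theta\cdot\vec J\,\sin\theta + (1 - \cos\theta)\,\hat\theta\,(\hat\theta)^T
\]
in which the generators $(J_k)_{ij} = i\epsilon_{ikj}$ satisfy $[J_i, J_j] = i\epsilon_{ijk}J_k$ and $T$ means transpose. In terms of indices, this formula for $R(\vec\theta) = e^{-i\vec\theta\cdot\vec J}$ is
\[
R(\vec\theta)_{ij} = \delta_{ij}\cos\theta - \sin\theta\,\epsilon_{ijk}\,\hat\theta_k + (1 - \cos\theta)\,\hat\theta_i\,\hat\theta_j .
\]
In these formulae $\epsilon_{ijk}$ is totally antisymmetric with $\epsilon_{123} = 1$, and sums over $k$ from 1 to 3 are understood.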
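
The index form of the rotation matrix, $R_{ij} = \delta_{ij}\cos\theta - \sin\theta\,\epsilon_{ijk}\hat\theta_k + (1-\cos\theta)\hat\theta_i\hat\theta_j$, translates almost line for line into code. The Python sketch below is an illustration only; the function names `levi_civita` and `rotation_matrix` are hypothetical, and indices run from 0 to 2 rather than 1 to 3.

```python
import math


def levi_civita(i, j, k):
    """Totally antisymmetric symbol for indices 0, 1, 2 (equals 1 for 0, 1, 2)."""
    return (i - j) * (j - k) * (k - i) / 2


def rotation_matrix(theta_vec):
    """Right-handed rotation by |theta_vec| radians about theta_vec."""
    theta = math.sqrt(sum(t * t for t in theta_vec))
    if theta == 0.0:
        return [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    axis = [t / theta for t in theta_vec]      # unit vector along theta_vec
    c, s = math.cos(theta), math.sin(theta)
    return [[(c if i == j else 0.0)
             - s * sum(levi_civita(i, j, k) * axis[k] for k in range(3))
             + (1.0 - c) * axis[i] * axis[j]
             for j in range(3)]
            for i in range(3)]


def apply(matrix, vector):
    return [sum(matrix[i][j] * vector[j] for j in range(3)) for i in range(3)]


# A quarter turn about the z axis should carry the x axis into the y axis.
quarter = rotation_matrix((0.0, 0.0, math.pi / 2))
print(apply(quarter, [1.0, 0.0, 0.0]))
```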
2.2. Distance
A conventional measure of the quality of a theoretical fold is the mean root-mean-square distance $d$ between positions $r(i)$ of the α carbons of the folded protein and those $x(i)$ of the native structure of the protein,
\[
d = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(r(i) - x(i)\bigr)^2}\,.
\]
The native states of many proteins are available from http://www.rcsb.org/pdb/. We have derived a formula for this distance in terms of the centers of mass $\bar r = (1/n)\sum_{j=1}^{n} r(j)$ and $\bar x = (1/n)\sum_{j=1}^{n} x(j)$, the relative coordinates $q(i) = r(i) - \bar r$ and $y(i) = x(i) - \bar x$, their inner products $Q^2 = \sum_{i=1}^{n} q(i)^2$ and $Y^2 = \sum_{i=1}^{n} y(i)^2$, and the matrix that is the sum of their outer products $B = \sum_{i=1}^{n} q(i)\,y(i)^T$. If $(B^T)_{kl} = B_{lk} = \sum_{i=1}^{n} q(i)_l\,y(i)_k$ denotes the transpose of this $3 \times 3$ matrix $B$ and tr denotes the trace, then the rms distance $d$ is
\[
d = \left[\frac{1}{n}\left(Q^2 + Y^2 - 2\,\mathrm{tr}\left(B\,B^T\right)^{1/2}\right)\right]^{1/2}.
\]
2.3. Two Proteins
We have performed simulations on a protein fragment of 36 amino acids called the villin headpiece (1VII). We begin by rotating the $2n$ dihedral angles $\phi$ and $\psi$ of the protein to $\pi$, except for the $\phi$ angle of proline. In this denatured starting configuration, the average rms distance $d$ is 29 Å. Our best simulations so far fold the villin headpiece to a mean rms distance $d$ that is slightly less than 5 Å from its native state.

Our second protein is a 56-residue fragment of the 63-residue protein ColE1 rop (1ROP). From a denatured configuration with $d = 55$ Å, our code folds this protein to a mean rms distance $d$ of slightly less than 3.2 Å from its native state.
ACKNOWLEDGEMENTS
We wish to thank Ken Dill for many key suggestions; Charles Beckel, John McIver, Susan Atlas, and Sorin Istrail for helpful conversations; Sean Cahill and Gary Herling for critical readings of the manuscript; and Sau Lan Wu and John Ellis for their hospitality at CERN. We have performed our computations on two (dual Pentium II) personal computers running Linux; we are grateful to Intel and Red Hat for reducing the cost of scientific computing.
REFERENCES
1. J. S. Richardson, Adv. Prot. Chem. 34 (1981) 167.
2. Y. Duan and P. A. Kollman, Science 282 (1998) 740.
3. K. A. Dill, Biochemistry 29 (1990) 7133; H. S. Chan and K. A. Dill, Annu. Rev. Biophys. Biophys. Chem. 20 (1991) 447; K. A. Dill, S. Bromberg, K. Yue, K. M. Fiebig, D. P. Yee, P. D. Thomas, and H. S. Chan, Protein Science 4 (1995) 561; K. Yue and K. A. Dill, ibid. 5 (1996) 254; K. A. Dill and H. S. Chan, Nature Struct. Biol. 4 (1997) 10.
