Folding
Four atoms are attached to the $C_\alpha$: a
hydrogen atom H, the carbonyl carbon $C'$,
a nitrogen atom N, and the first atom of the side
chain R of the amino acid. The carbonyl carbon
$C'$ [...]
Of the 20 amino acids found in biological systems,
19 are left-handed. If one looks at the $C_\alpha$
with its hydrogen atom H toward the viewer, the
order of CO, R, and N is clockwise (CORN). The one
exception is glycine, in which the entire side chain
is a single hydrogen atom; glycine is not chiral.
1.2. Globular Proteins
There are three classes of proteins: fibrous,
membrane, and globular. Fibrous proteins are
the building materials of bodies; collagen is used
in tendon and bone, α-keratin in hair and skin.
Membrane proteins sit in the membranes of cells
through which they pass molecules and messages.
Globular proteins catalyze chemical reactions; en-
zymes are globular proteins.
Under normal physiological conditions, saline
water near pH 7 at 20–40 °C, proteins assume
their native forms. Globular proteins fold into
compact structures. The biological activity of
a globular protein is largely determined by its
unique shape, which in turn is determined by
its primary structure, that is, by its sequence of
amino acids.
1.3. Kinds of Amino Acids
The amino acids that occur in natural living
organisms are of four kinds. Seven are nonpolar:
alanine (ala), valine (val), phenylalanine (phe),
proline (pro), methionine (met), isoleucine (ile),
and leucine (leu). They avoid water and are said to
be hydrophobic. Four are charged: aspartic acid
(asp) and glutamic acid (glu) are negative, lysine
(lys) and arginine (arg) are positive. Eight are
polar: serine (ser), threonine (thr), tyrosine (tyr),
histidine (his), cysteine (cys), asparagine (asn),
glutamine (gln), and tryptophan (trp). The four
charged amino acids and the eight polar amino
acids seek water and are said to be hydrophilic.
Glycine falls into a class of its own.
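The four classes listed above can be written down as a small lookup table. The following Python sketch is only an illustration of that bookkeeping: the names `AMINO_ACID_CLASSES`, `kind_of`, and `is_hydrophilic` are hypothetical, and the code "gly" for glycine is the standard three-letter abbreviation, which the text itself does not give.

```python
# Classification as listed in the text (three-letter codes).
# "gly" is an assumption: the text names glycine but gives no code.
AMINO_ACID_CLASSES = {
    "nonpolar": ["ala", "val", "phe", "pro", "met", "ile", "leu"],
    "charged": ["asp", "glu", "lys", "arg"],
    "polar": ["ser", "thr", "tyr", "his", "cys", "asn", "gln", "trp"],
    "glycine": ["gly"],
}


def kind_of(residue):
    """Return the class name for a three-letter residue code."""
    code = residue.lower()
    for kind, members in AMINO_ACID_CLASSES.items():
        if code in members:
            return kind
    raise KeyError(f"unknown residue code: {residue!r}")


def is_hydrophilic(residue):
    """Charged and polar residues seek water; nonpolar ones avoid it.

    Glycine is in a class of its own and is reported as not hydrophilic
    here only because the text does not place it in either group.
    """
    return kind_of(residue) in ("charged", "polar")
```

For example, `kind_of("LEU")` gives `"nonpolar"` and `is_hydrophilic("lys")` gives `True`.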
1.4. Protein Geometry
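When two amino acids are joined to make a
dipeptide, first the hydroxyl group OH attached
to the carbonyl carbon $C'_1$ of the first amino
acid and a hydrogen atom attached to the nitrogen
$N_2$ of the second are released as a water
molecule; then $C'_1$ bonds to $N_2$. The six atoms
$C_{\alpha 1}$, $C'_1$, $O_1$, $N_2$, $H_2$, and
$C_{\alpha 2}$ lie in a plane, called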
the peptide plane. If a third amino acid is added
to the carbonyl carbon $C'_2$ of the second amino
acid, then the six atoms $C_{\alpha 2}$ ... $C_{\alpha 3}$
also will lie in a (typically different) plane.
Exceptionally, the peptide plane of proline is not
quite flat because
the side chain loops around, and its third carbon
atom forms a bond with the nitrogen atom of the
proline backbone.
1.5. The Protein Backbone
The protein backbone consists of the chain of
triplets $(N\,C_\alpha\,C')_1$, $(N\,C_\alpha\,C')_2$,
$(N\,C_\alpha\,C')_3$, ..., $(N\,C_\alpha\,C')_n$. Apart
from the first nitrogen $N_1$ and the last carbonyl
carbon $C'_n$, this backbone (and its oxygen and
amide hydrogen atoms) consists of a chain of peptide
planes, $C_{\alpha 1} \dots C_{\alpha 2}$, ...,
$C_{\alpha, n-1} \dots C_{\alpha n}$. Since the angles
among the four bonds of the $C_\alpha$ [...] that
follows it. The angle about the $N_i$-$C_{\alpha i}$
bond is called $\phi_i$, that about the
$C_{\alpha i}$-$C'_i$ bond is $\psi_i$. The $2n$ angles
$(\phi_1, \psi_1) \dots (\phi_n, \psi_n)$ determine the shape of
the backbone of the protein. These angles are the
main kinematic variables of a protein. The prin-
cipal properties of proteins are discussed in the
classic article by Jane Richardson [1].
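The two backbone angles per residue are dihedral angles, and a dihedral angle can be computed from the positions of four consecutive atoms. The Python sketch below assumes the usual convention, which the text does not spell out: the angle about the N-$C_\alpha$ bond is measured with the atoms $C'_{i-1}$, $N_i$, $C_{\alpha i}$, $C'_i$, and the angle about the $C_\alpha$-$C'$ bond with $N_i$, $C_{\alpha i}$, $C'_i$, $N_{i+1}$. The function names are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the bond p1-p2.

    0 means p0 and p3 eclipse each other (cis); +/-180 means trans.
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal to the plane (p0, p1, p2)
    n2 = _cross(b2, b3)          # normal to the plane (p1, p2, p3)
    y = _dot(b2, _cross(n1, n2))
    x = math.sqrt(_dot(b2, b2)) * _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi(c_prev, n, ca, c):
    """Angle about the N-C_alpha bond (assumed four-atom convention)."""
    return dihedral(c_prev, n, ca, c)


def psi(n, ca, c, n_next):
    """Angle about the C_alpha-C' bond (assumed four-atom convention)."""
    return dihedral(n, ca, c, n_next)
```

With atom coordinates read from a structure file, calling `phi` and `psi` residue by residue yields the list of angle pairs that fixes the shape of the backbone.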
2. PROTEIN FOLDING
The problem of protein folding is to predict the
natural folded shape of a protein under physiological
conditions from the DNA that defines its sequence
of amino acids, which is its primary structure.
This difficult problem has been approached
by several techniques. Some scientists have ap-
plied all-atom molecular dynamics [2]. We have
used the Monte Carlo method in a manner in-
spired by the work of Ken Dill et al. [3].
Our Monte Carlo simulations are guided by
a simple potential with three terms. The first
term embodies the Pauli exclusion principle. Be-
cause the outer parts of atoms are electrons which
are fermions, the Pauli exclusion principle re-
quires that the side chains of a protein not over-
lap by more than a fraction of an angstrom. In
our present simulations, we have represented each
side chain as a sphere centered at the first carbon
atom, the $C_\beta$
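[...] $\hat{\theta} = \vec{\theta}/\theta$:
\[
e^{-i \vec{\theta} \cdot \vec{J}} = \cos\theta \, I - i \, \hat{\theta} \cdot \vec{J} \, \sin\theta + (1 - \cos\theta) \, \hat{\theta} \, \hat{\theta}^{T}
\]
in which the generators $(J_k)_{ij} = i \epsilon_{ikj}$
satisfy $[J_i, J_j] = i \epsilon_{ijk} J_k$ and $T$ means
transpose. In terms of indices, this formula for
$R(\vec{\theta}) = e^{-i \vec{\theta} \cdot \vec{J}}$ is
\[
R(\vec{\theta})_{ij} = \delta_{ij} \cos\theta - \sin\theta \, \epsilon_{ijk} \hat{\theta}_k + (1 - \cos\theta) \, \hat{\theta}_i \hat{\theta}_j .
\]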
In these formulae $\epsilon_{ijk}$ is totally antisymmetric
with $\epsilon_{123} = 1$, and sums over $k$ from 1 to 3 are
understood.
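The index form of the rotation matrix translates directly into code. The Python sketch below assumes the reading $R_{ij} = \delta_{ij} \cos\theta - \sin\theta \, \epsilon_{ijk} \hat{\theta}_k + (1 - \cos\theta) \hat{\theta}_i \hat{\theta}_j$, with $\theta$ the length of the rotation vector and $\hat{\theta}$ its unit vector. The function names are hypothetical and the code is not taken from the authors' program.

```python
import math


def _epsilon(i, j, k):
    """Totally antisymmetric symbol for indices 0, 1, 2."""
    return (i - j) * (j - k) * (k - i) / 2


def rotation_matrix(theta_vec):
    """3x3 rotation by angle |theta_vec| about the axis theta_vec."""
    theta = math.sqrt(sum(t * t for t in theta_vec))
    if theta == 0.0:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    u = [t / theta for t in theta_vec]      # unit vector along the axis
    c, s = math.cos(theta), math.sin(theta)
    R = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            delta = 1.0 if i == j else 0.0
            eps_u = sum(_epsilon(i, j, k) * u[k] for k in range(3))
            R[i][j] = delta * c - s * eps_u + (1.0 - c) * u[i] * u[j]
    return R


def apply(R, v):
    """Matrix-vector product R v."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]
```

With this sign convention, a rotation by $\pi/2$ about the third axis carries the first unit vector into the second, up to rounding error.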
2.2. Distance
A conventional measure of the quality of a theoretical
fold is the mean root-mean-square distance $d$
between positions $r(i)$ of the $\alpha$ carbons
of the folded protein and those $x(i)$ of the native
structure of the protein,
\[
d = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( r(i) - x(i) \right)^2 } .
\]
The native states of many proteins are avail-
able from http://www.rcsb.org/pdb/. We have
derived a formula for this distance in terms of
the centers of mass $\bar{r} = (1/n) \sum_{j=1}^{n} r(j)$ and
$\bar{x} = (1/n) \sum_{j=1}^{n} x(j)$, the relative coordinates
$q(i) = r(i) - \bar{r}$ and $y(i) = x(i) - \bar{x}$, their inner
products $Q^2 = \sum_{i=1}^{n} q(i)^2$ and $Y^2 = \sum_{i=1}^{n} y(i)^2$,
and the matrix that is the sum of their outer products
$B = \sum_{i=1}^{n} q(i) \, y(i)^{T}$. If
$(B^{T})_{kl} = B_{lk} = \sum_{i=1}^{n} q(i)_l \, y(i)_k$
denotes the transpose of this $3 \times 3$
matrix $B$ and tr denotes the trace, then the rms
distance $d$ is
\[
d = \left[ \frac{1}{n} \left( Q^2 + Y^2 - 2 \, \mathrm{tr} \left( B B^{T} \right)^{1/2} \right) \right]^{1/2} .
\]
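The closed-form distance can be evaluated in a few lines with NumPy. The sketch below rests on one fact and one interpretation, neither stated in the text: the trace of $(B B^{T})^{1/2}$ equals the sum of the singular values of $B$, and the formula is read here as the rms distance after the best rigid superposition of the two structures. The function name `rms_distance` is hypothetical.

```python
import numpy as np


def rms_distance(r, x):
    """rms distance d between two (n, 3) coordinate arrays.

    Uses d^2 = (Q^2 + Y^2 - 2 tr sqrt(B B^T)) / n, with the trace of
    the matrix square root taken as the sum of singular values of B.
    """
    r = np.asarray(r, dtype=float)
    x = np.asarray(x, dtype=float)
    q = r - r.mean(axis=0)            # coordinates relative to center of mass
    y = x - x.mean(axis=0)
    Q2 = np.sum(q * q)
    Y2 = np.sum(y * y)
    B = q.T @ y                       # 3x3 sum of outer products q(i) y(i)^T
    trace_sqrt = np.linalg.svd(B, compute_uv=False).sum()
    d2 = (Q2 + Y2 - 2.0 * trace_sqrt) / len(r)
    return float(np.sqrt(max(d2, 0.0)))   # clamp tiny negative round-off
```

Because the sum of singular values is the maximum of $\mathrm{tr}(O B)$ over all orthogonal matrices $O$, including reflections, this expression does not distinguish a structure from its mirror image.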
2.3. Two Proteins
We have performed simulations on a protein
fragment of 36 amino acids called the villin
headpiece (1VII). We begin by rotating the 2n
dihedral angles $\phi$ and $\psi$ of the protein to $\pi$,
except for the angle $\phi$ of proline. In this denatured
starting configuration, the average rms distance d
is 29 Å.
Our best simulations so far fold the villin head-
piece to a mean rms distance d that is slightly less
than 5 Å from its native state.
Our second protein is a 56-residue fragment of
the 63-residue protein cole1 rop (1ROP). From a
denatured configuration with d = 55 Å, our code
folds this protein to a mean rms distance d of
slightly less than 3.2 Å from its native state.
ACKNOWLEDGEMENTS
We wish to thank Ken Dill for many key sugges-
tions; Charles Beckel, John McIver, Susan Atlas,
and Sorin Istrail for helpful conversations; Sean
Cahill and Gary Herling for critical readings of
the manuscript; and Sau Lan Wu and John Ellis
for their hospitality at CERN. We have performed
our computations on two (dual Pentium II) per-
sonal computers running Linux; we are grateful
to Intel and Red Hat for reducing the cost of
scientific computing.
REFERENCES
1. J. S. Richardson, Adv. Prot. Chem. 34 (1981)
167.
2. Y. Duan and P. A. Kollman, Science 282
(1998) 740.
3. K. A. Dill, Biochemistry 29 (1990) 7133; H. S.
Chan and K. A. Dill, Annu. Rev. Biophys.
Biophys. Chem. 20 (1991) 447; K. A. Dill, S.
Bromberg, K. Yue, K. M. Fiebig, D. P. Yee,
P. D. Thomas, and H. S. Chan, Protein Sci-
ence 4 (1995) 561; K. Yue and K. A. Dill,
ibid. 5 (1996) 254; K. A. Dill and H. S. Chan,
Nature Struct. Biol. 4 (1997) 10.