2014 Glick Medical Biotechnology CH 6 PDF

6
Modulation of Gene Expression

Manipulating Gene Expression in Prokaryotes
Promoters
Translational Regulation
Codon Usage
Protein Stability
Fusion Proteins
Metabolic Load
Chromosomal Integration
Increasing Secretion
Overcoming Oxygen Limitation
Reducing Acetate
Protein Folding
Heterologous Protein Production in Eukaryotic Cells
Eukaryotic Expression Systems
Saccharomyces cerevisiae Expression Systems
Other Yeast Expression Systems
Baculovirus–Insect Cell Expression Systems
Mammalian Cell Expression Systems
Directed Mutagenesis
Oligonucleotide-Directed Mutagenesis with M13 DNA
Oligonucleotide-Directed Mutagenesis with Plasmid DNA
PCR-Amplified Oligonucleotide-Directed Mutagenesis
Error-Prone PCR
Random Mutagenesis
DNA Shuffling
Examples of Modified Proteins
SUMMARY
REVIEW QUESTIONS
REFERENCES

The major objective of gene cloning for biotechnological applications is the expression of a cloned gene in a selected host organism. In addition, for many purposes, a high rate of production of the protein encoded by the cloned gene is needed. To this end, a wide range of expression vectors that provide genetic elements for controlling transcription and translation of the cloned gene as well as enhanced stability, facilitated purification, and facilitated secretion of the protein product of the cloned gene have been constructed. There is not one single strategy for obtaining maximal protein expression from every cloned gene. Rather, there are a number of different biological parameters that can be manipulated to yield an optimal level of expression.
The level of foreign gene expression also depends on the host organism. Although a wide range of both prokaryotic and eukaryotic organisms have been used to express foreign genes, initially, many of the commercially important proteins produced by recombinant DNA technology were synthesized in Escherichia coli. The early dependence on E. coli as a host organism occurred because of the extensive knowledge of its genetics, molecular biology, biochemistry, and physiology (Milestone 6.1). To date, recombinant proteins have been produced using different strains of bacteria (including E. coli), yeasts, and mammalian cells grown in culture and transgenic plants (Table 6.1). Each of these systems has its particular advantages and disadvantages, so, again, there is no universal optimal host for the expression of recombinant proteins, even those that will eventually be used as therapeutic agents or vaccines. Thus, for example, E. coli cells may be engineered to produce high levels of foreign proteins; however, these proteins are not glycosylated and are sometimes misfolded. On the other hand, with mammalian cells in culture, recombinant proteins are correctly glycosylated and folded, although the yield of proteins is much lower than in E. coli. Notwithstanding the very large differences between organisms, the strategies that have been elaborated for E. coli, in principle, are applicable to all systems.
doi:10.1128/9781555818890.ch6
Milestone 6.1
Construction of Biologically Functional Bacterial Plasmids In Vitro
S. N. Cohen, A. C. Y. Chang, H. W. Boyer, and R. B. Helling
Proc. Natl. Acad. Sci. USA 70:3240–3244, 1973

In 1972, Paul Berg and his coworkers (Jackson et al., 1972) demonstrated that fragments of bacteriophage λ DNA could be spliced into SV40. They reported, for the first time, that fragments of DNA could be covalently joined with other DNA molecules. This joining of “unrelated DNA molecules to one another” by Jackson et al. is arguably the first demonstration of the possibility of recombinant DNA technology. However, while SV40 was at the time thought to be safe in humans, the prospect of an altered form of the virus spreading unchecked, through the common bacterium E. coli, caused Berg to delay a portion of his research program. Thus, contrary to his original plan, Berg did not insert the recombinant virus into bacterial cells.
Soon after Berg published the results of his experiments, Stanley Cohen and Herbert Boyer and their colleagues (at Stanford University and the University of California at San Francisco) showed that a recombinant DNA molecule could be created without the use of viruses. They demonstrated that foreign DNA could be inserted into plasmid DNA and subsequently perpetuated in E. coli. As they state in the abstract to their research article, “The construction of new plasmid DNA species by in vitro joining of restriction endonuclease-generated fragments of separate plasmids is described. Newly constructed plasmids that are inserted into Escherichia coli by transformation are shown to be biologically functional replicons that possess genetic properties and nucleotide base sequences from both of the parent DNA molecules. Functional plasmids can be obtained by reassociation of endonuclease-generated fragments of larger replicons, as well as by joining of plasmid DNA molecules of entirely different origins.”
With the publishing of this article, recombinant DNA technology had truly arrived. The technology spread, first slowly to a few labs and then to dozens and, eventually, to tens of thousands of labs worldwide. In the 40 years since the groundbreaking experiments of Cohen and Boyer and their colleagues, more than 200 new drugs produced by recombinant DNA technology have been used to treat over 300 million people for a wide range of human diseases. In addition, today more than 400 additional drugs produced using this technology are in various stages of clinical trials, with many expected to be on the market within the next 5 to 10 years. Today, Cohen and Boyer are widely regarded as the founders of the scientific revolution that has become modern biotechnology.

Manipulating Gene Expression in Prokaryotes

A large number of factors affect the level of expression of a foreign gene in a prokaryotic host. These include the transcriptional promoter regulating the expression of the target gene, the translational regulatory region that is upstream of the coding region of every gene, the similarity of the codons used by the host organism and the target gene, the stability of both the recombinant protein and its mRNA, the metabolic functioning of the host cell, and the localization of the introduced foreign gene as well as the protein that it encodes.

Table 6.1 Production of recombinant human proteins in various biological hosts

Parameter                      Bacteria     Yeast        Mammalian cell culture  Transgenic plants
Glycosylation                  None         Incorrect    Correct                 Generally correct; small differences
Multimeric proteins assembled  Limited      Limited      Limited                 Yes
Production costs               Low-medium   Medium       High                    Very low
Protein folding accuracy       Low          Medium       High                    High
Protein yield                  High         Medium-high  Low-medium              Medium
Scale-up capacity              High         High         Low                     Very high
Scale-up costs                 High         High         High                    Low
Time required                  Low          Low-medium   Medium-high             High
Skilled workers required       Medium       Medium       High                    Low
Acceptable to regulators       Yes          Yes          Yes                     Not yet

Promoters
The minimum requirement for an effective gene expression system is
the presence of a strong and regulatable promoter sequence upstream
from a cloned gene. A strong promoter is one that has a high affinity
for the enzyme RNA polymerase, with the consequence that the adjacent
downstream region is frequently transcribed. The ability to regulate the
functioning of a promoter allows the researcher to control the extent of
transcription.
The rationale behind the use of strong and regulatable promoters is
that the expression of a cloned gene under the control of a continuously
activated (i.e., constitutive) strong promoter would likely yield a high level
of continual expression of a cloned gene, which is often detrimental to
the host cell because it creates an energy drain, thereby impairing essential
host cell functions. In addition, all or a portion of the plasmid carrying a
constitutively expressed cloned gene may be lost after several cell division
cycles, since cells without a plasmid grow faster and eventually take over
the culture. To overcome this potential problem, it is desirable to control
transcription so that a cloned gene is expressed only at a specific stage in
the host cell growth cycle and only for a specified duration. This may be
achieved by using a strong regulatable promoter. The plasmids constructed
to accomplish this task are called expression vectors.
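The takeover of a culture by plasmid-free cells can be illustrated with a toy calculation. The sketch below is only a minimal model under stated assumptions and is not taken from the text: the function name plasmid_free_fraction, the per-division plasmid loss probability, and the growth advantage of plasmid-free cells are all hypothetical values chosen for illustration.

```python
def plasmid_free_fraction(generations, loss_prob=0.01, growth_advantage=1.2):
    """Fraction of plasmid-free cells after a number of generations.

    Toy model with hypothetical parameters:
    - loss_prob: chance that a dividing plasmid-bearing cell produces
      one plasmid-free daughter
    - growth_advantage: number of divisions a plasmid-free cell completes
      per generation of the plasmid-bearing cells (above 1 means faster)
    """
    bearing, free = 1.0, 0.0
    for _ in range(generations):
        # Each plasmid-bearing cell gives two daughters; a small share lose the plasmid.
        new_bearing = bearing * (2.0 - loss_prob)
        # Plasmid-free cells come from plasmid loss and from their own faster growth.
        new_free = bearing * loss_prob + free * (2.0 ** growth_advantage)
        bearing, free = new_bearing, new_free
    return free / (bearing + free)


if __name__ == "__main__":
    for n in (0, 10, 25, 50):
        print(n, round(plasmid_free_fraction(n), 4))
```

Even with a small loss probability, the plasmid-free fraction rises steadily in this model, which is the qualitative behavior that motivates keeping the cloned gene switched off until the induction phase.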
For the production of foreign proteins in E. coli cells, a few strong
and regulatable promoters are commonly used, including those from the
E. coli lac (lactose) and trp (tryptophan) operons; the tac promoter, which
is constructed from the −10 region (i.e., 10 nucleotide pairs upstream
from the site of initiation of transcription) of the lac promoter and the
−35 region of the trp promoter; the leftward and rightward, or pL and pR,
promoters from bacteriophage λ; and the gene 10 promoter from
bacteriophage T7. Each of these promoters interacts with regulatory proteins
(i.e., repressors or inducers), which provide a controllable switch for
either turning on or turning off the transcription of the adjacent cloned
genes. With the exception of the T7 promoter, each of these promoters is
recognized by the major form of the E. coli RNA polymerase holoenzyme.
This holoenzyme is formed when a protein, called sigma factor, combines
with the core proteins (i.e., two α, one β, and one β′ subunit) of RNA
polymerase. The sigma factor directs the binding of the holoenzyme to
promoter regions on the DNA.
One commonly used expression system utilizes the gene 10 promoter
from bacteriophage T7 (Fig. 6.1). This promoter is not recognized by E.
coli RNA polymerase but, rather, requires T7 RNA polymerase for tran-
scription to occur. For this system to work in E. coli, the gene encoding
T7 polymerase is often inserted into the E. coli chromosome under the
transcriptional control of the E. coli lac promoter and operator. The E.
coli host cells must also contain the E. coli lacI gene, which encodes the
lac repressor. The lac repressor forms a tetramer that binds to the lac
operator and prevents T7 RNA polymerase from being made before it
is needed. Following transformation of the host E. coli cells with a
plasmid containing the cloned target gene under the transcriptional control
of the T7 promoter, the compound isopropyl-β-D-thiogalactopyranoside
(IPTG) is added to the growth medium. Under these conditions, IPTG
binds to the lac repressor, thereby removing it from the lac operator and
permitting transcription from the lac promoter so that T7 RNA polymerase
can be synthesized. The T7 RNA polymerase binds to the T7 promoter
and transcribes the gene of interest. To ensure that synthesis of the target
protein does not interfere with cellular metabolism, potentially depleting
cellular resources, in the absence of IPTG, the lac operator is often
inserted between the T7 promoter and the target gene so that expression of
the target gene is also negatively controlled by the lac repressor. With this
arrangement of regulatory regions, there is no synthesis of the target
protein in the absence of IPTG. However, following the addition of IPTG,
and after a delay of about 1 h, a large burst of synthesis of the target
protein occurs. While the features of the DNA constructs may vary when
other strong and regulatable promoters are used, the experimental design
is similar and includes the entry of the host cells into a growth phase,
when the target protein is not expressed, which is followed by an induction
phase when the target protein is expressed at a high level.

Figure 6.1 Regulation of gene expression controlled by the promoter for gene 10 from bacteriophage T7 (pT7). In the absence of the inducer IPTG, the constitutively produced lac repressor, the product of the lacI gene, which is under the control of the lacI promoter, placI, represses the transcription of the T7 RNA polymerase gene and the target gene, which are both negatively regulated by the binding of four molecules of the lac repressor to the lac operator (olac). In the absence of T7 RNA polymerase, the target gene, which is under the transcriptional control of pT7, is not transcribed. When lactose or IPTG is added to the medium, it binds to the lac repressor, thereby preventing it from repressing the transcription of the T7 RNA polymerase gene directed by the lac promoter (plac) and the target gene directed by pT7. In the presence of T7 RNA polymerase, the target gene is transcribed. TT, transcription termination sequence. doi:10.1128/9781555818890.ch6.f6.1
[Diagram elements: plac, olac, T7 RNA polymerase gene, TT; pT7, olac, target gene, TT; placI, lacI gene, TT; products shown are T7 RNA polymerase, lac repressor, and target protein, each translated from its mRNA.]
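The on/off logic of the T7 system described above can be written out as a small truth table. The following is a qualitative sketch only: the function name t7_system_state and its dictionary keys are hypothetical, and the model assumes perfect repression (no leaky expression) and ignores the roughly 1-h delay after induction.

```python
def t7_system_state(iptg_present, operator_before_target=True):
    """Qualitative on/off model of the T7/lac expression system.

    Assumes an idealized switch: repression is complete and induction is full.
    """
    # The lac repressor is made constitutively; IPTG releases it from olac.
    repressor_on_operator = not iptg_present
    # The chromosomal T7 RNA polymerase gene is controlled by plac/olac.
    t7_polymerase_made = not repressor_on_operator
    # An optional olac site between pT7 and the target gene adds a second block.
    target_blocked = operator_before_target and repressor_on_operator
    # The target gene needs T7 RNA polymerase and a free operator.
    target_transcribed = t7_polymerase_made and not target_blocked
    return {
        "repressor_on_operator": repressor_on_operator,
        "t7_polymerase_made": t7_polymerase_made,
        "target_transcribed": target_transcribed,
    }


if __name__ == "__main__":
    for iptg in (False, True):
        print("IPTG" if iptg else "no IPTG", t7_system_state(iptg))
```

In this idealized model the second operator looks redundant; in practice it matters because repression of the T7 RNA polymerase gene is not perfectly tight, which the sketch deliberately leaves out.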
A problem with the T7 expression system is that, because T7 RNA polymerase transcribes DNA into mRNA eight times faster than the native E. coli enzyme, the recombinant protein may be synthesized too rapidly to fold properly and hence may form insoluble inclusion bodies. In addition, when the target protein is a membrane protein, very high levels may be toxic to the cell. To avoid these problems, it is possible to downregulate (attenuate) the activity of the T7 RNA polymerase by producing the inhibitor T7 lysozyme. Moreover, the amount of T7 lysozyme in the cell can be finely regulated by placing the gene encoding this enzyme under the transcriptional control of the prhaBAD promoter (Fig. 6.2). The activity of this promoter can be tuned over a wide range of concentrations of the deoxy sugar L-rhamnose (i.e., 10 to 2,000 μM). Thus, by decreasing the activity of T7 RNA polymerase, the maximum amount of a target protein may be synthesized without the production of inclusion bodies for soluble proteins or host toxic levels for membrane proteins.
E. coli is not necessarily the microorganism of choice for the expres-
sion of all foreign proteins. However, an understanding of the genetics
and molecular biology of most other microorganisms is not nearly as well
developed. Moreover, there is no one vector or promoter-repressor system
that gives optimal levels of gene expression in all bacteria, or even in all
gram-negative bacteria. Fortunately, most of the strategies that have been
developed for E. coli are also useful with a wide range of microorganisms
as well as other host cells.
Figure 6.2 Attenuation of the activity of T7 RNA polymerase by the inhibitor T7 lysozyme. The amount of T7 lysozyme is controlled by the level of L-rhamnose added to the system (which activates the prhaBAD promoter). TT, transcription termination sequence. doi:10.1128/9781555818890.ch6.f6.2
[Diagram elements: prhaBAD, T7 lysozyme gene, TT; plac, olac, T7 RNA polymerase gene, TT; pT7, olac, target gene, TT; products shown are T7 lysozyme, T7 RNA polymerase, and target protein.]
Translational Regulation
Placing a cloned gene under the control of a regulatable, strong promoter,
although essential, may not be sufficient to maximize the yield of the
cloned gene product. Other factors, such as the efficiency of translation
and the stability of the newly synthesized cloned gene protein, may also
affect the amount of product.
In prokaryotic cells, various proteins are not necessarily synthesized
with the same efficiency. In fact, they may be produced at very different
levels (up to several hundredfold) even if they are encoded within the
same polycistronic mRNA. Differences in translational efficiency and in
transcriptional regulation enable the cell to have hundreds or even thou-
sands of copies of some proteins and only a few copies of others.
The molecular basis for differential translation of bacterial mRNAs is
the presence of a translational initiation signal called a ribosome-binding
site, which precedes the protein-coding portion of the mRNA. A
ribosome-binding site is a sequence of six to eight nucleotides in mRNA
that can base-pair with a complementary nucleotide sequence on the 16S
rRNA component of the small ribosomal subunit. Generally, the stronger
the binding of the mRNA to the rRNA, the greater the efficiency of
translational initiation.
Thus, many E. coli expression vectors have been designed to ensure that the mRNA of a cloned gene contains a strong ribosome-binding site. Inclusion of an E. coli ribosome-binding site just upstream from the protein-coding open reading frame ensures that heterologous prokaryotic and eukaryotic genes can be translated readily in E. coli. However, certain conditions must be satisfied for this approach to function properly. First, the ribosome-binding sequence must be located within a short distance (generally 2 to 20 nucleotides) from the translational start codon of the cloned gene. At the RNA level, the translational start codon is usually AUG (adenosine, uridine, and guanosine). In DNA, the coding strand contains the ATG sequence (where T is thymidine) that functions as a start codon, and the complementary noncoding strand is a template for transcription. Second, the DNA sequence that includes the ribosome-binding site through the first few codons of the gene of interest should not contain nucleotide sequences that have regions of complementarity and can fold back (form intrastrand loops) after transcription (Fig. 6.3), thereby blocking the interaction of the mRNA with the ribosome. The local secondary structure of the mRNA, which can either shield or expose the ribosome-binding site, determines the extent to which the mRNA can bind to the appropriate sequence on the ribosome and initiate translation. Thus, for each cloned gene, it is important to ensure that the mRNA contains a strong ribosome-binding site and that the secondary structure of the mRNA does not prevent its access to the ribosome. However, since the nucleotide sequences that encode the amino acids at the N-terminal region of the target protein vary from one gene to another, it is not possible to design a vector that will eliminate the possibility of mRNA fold-back in all instances. Therefore, no single optimized translational initiation region can guarantee a high rate of translation initiation for all cloned genes. Consequently, the optimization of translation initiation needs to be on a gene-by-gene basis.

Figure 6.3 Example of secondary structure of the 5′ end of an mRNA that would prevent efficient translation. In this example, the ribosome-binding site is GGGGG, the initiator codon is AUG (shown in red), and the first few codons are CAG-CAU-GAU-UUA-UUU. The mRNA is oriented with its 5′ end to the left and its 3′ end to the right. Note that in addition to the traditional A⋅U and G⋅C base pairs in mRNA, G can also base-pair to some extent with U. doi:10.1128/9781555818890.ch6.f6.3
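The first condition above, a ribosome-binding site located 2 to 20 nucleotides upstream of the start codon, is easy to check on a candidate sequence. The sketch below is illustrative only: the function name find_rbs_spacing is hypothetical, and the default motif AGGAGG is an assumed Shine-Dalgarno-like consensus that is not given in the text (Fig. 6.3, for instance, uses GGGGG). It does not assess mRNA secondary structure, which is the second condition.

```python
def find_rbs_spacing(mrna, rbs="AGGAGG", start_codon="AUG", min_gap=2, max_gap=20):
    """Find a ribosome-binding site followed by a start codon at an allowed distance.

    Returns (rbs_index, start_index, gap) for the first match, or None.
    The gap is the number of nucleotides between the end of the RBS and the
    first base of the start codon. The default motif is an assumption.
    """
    mrna = mrna.upper().replace("T", "U")  # accept DNA or RNA input
    rbs_pos = mrna.find(rbs)
    while rbs_pos != -1:
        rbs_end = rbs_pos + len(rbs)
        # Try every allowed spacing between the RBS and the start codon.
        for gap in range(min_gap, max_gap + 1):
            start = rbs_end + gap
            if mrna[start:start + 3] == start_codon:
                return rbs_pos, start, gap
        rbs_pos = mrna.find(rbs, rbs_pos + 1)
    return None


if __name__ == "__main__":
    print(find_rbs_spacing("GGCAGGAGGUAACAUAUGAAACGC"))
    print(find_rbs_spacing("AGGAGGAUG"))
```

A sequence that passes this spacing check can still be translated poorly if the region folds back on itself, so a structure prediction step would normally follow on a gene-by-gene basis.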
Codon Usage
Although each amino acid is specified, on average, by about three different codons (any particular amino acid may have from one to six codons), these codons are used to different extents in various living organisms. Any organism, e.g., E. coli, produces cognate tRNAs for each codon in approximately the same relative amount as that particular codon is used in the production of its proteins. Various organisms preferentially use different subsets of codons (Table 6.2) and contain various amounts of the cognate aminoacyl-tRNAs for the synthesis of proteins encoded by their mRNAs. Thus, expressing a foreign gene in a particular host organism may result in a cellular incompatibility that can interfere with efficient translation when a cloned gene has codons that are rarely used by the host cell. For example, AGG, AGA, AUA, CUA, and CGA are the least-used codons in E. coli. When a foreign protein is expressed at high levels in E. coli, the host cell may not produce enough of the aminoacyl-tRNAs that recognize these rarely used codons, and either the yield of the cloned gene protein is much lower than expected or incorrect amino acids may be inserted into the protein. Any codon that is used less than 5 to 10% of the time by the host organism may cause problems. Particularly detrimental to high levels of expression are regions of mRNA where two or more rarely used codons are close or adjacent to one another, or appear in the sequence encoding the N-terminal portion of the protein.
Fortunately, there are several experimental approaches that can be used to alleviate this problem. First, if the target gene is eukaryotic, it may be cloned and expressed in a eukaryotic host cell. Second, a new version of the target gene containing codons more commonly used by the host cell may be chemically synthesized (i.e., codon optimization). Third, a host E. coli cell engineered to overexpress several rare tRNAs may be employed. In fact, some E. coli strains have been transformed with plasmids carrying genes that lead to the overproduction of tRNAs specific for certain rare E. coli codons. These transformed E. coli strains are available commercially and can often facilitate a high level of expression of foreign proteins that use these rare E. coli codons (Fig. 6.4). For example, with one of the commercially available E. coli strains, it was possible to overexpress the Ara h2 protein, a peanut allergen, approximately 100-fold over the amount that was synthesized in conventional E. coli cells. With this approach, it should be possible to produce large quantities of a variety of heterologous proteins that are otherwise difficult to express in different hosts.

Table 6.2 Genetic code and codon usage in E. coli and humans
(The E. coli and Humans columns give the frequency of use of each codon.)

Codon  Amino acid     E. coli  Humans    Codon  Amino acid     E. coli  Humans
GGG    Glycine        0.13     0.23      UAG    Stop           0.09     0.17
GGA    Glycine        0.09     0.26      UAA    Stop           0.62     0.22
GGU    Glycine        0.38     0.18      UAU    Tyrosine       0.53     0.42
GGC    Glycine        0.40     0.33      UAC    Tyrosine       0.47     0.58
GAG    Glutamic acid  0.30     0.59      UUU    Phenylalanine  0.51     0.43
GAA    Glutamic acid  0.70     0.41      UUC    Phenylalanine  0.49     0.57
GAU    Aspartic acid  0.59     0.44      UCG    Serine         0.13     0.06
GAC    Aspartic acid  0.41     0.56      UCA    Serine         0.12     0.15
GUG    Valine         0.34     0.48      UCU    Serine         0.19     0.17
GUA    Valine         0.17     0.10      UCC    Serine         0.17     0.23
GUU    Valine         0.29     0.17      AGU    Serine         0.13     0.14
GUC    Valine         0.20     0.25      AGC    Serine         0.27     0.25
GCG    Alanine        0.34     0.10      CGG    Arginine       0.08     0.19
GCA    Alanine        0.22     0.22      CGA    Arginine       0.05     0.10
GCU    Alanine        0.19     0.28      CGU    Arginine       0.42     0.09
GCC    Alanine        0.25     0.40      CGC    Arginine       0.37     0.19
AAG    Lysine         0.24     0.60      AGG    Arginine       0.03     0.22
AAA    Lysine         0.76     0.40      AGA    Arginine       0.04     0.21
AAU    Asparagine     0.39     0.44      CAG    Glutamine      0.69     0.73
AAC    Asparagine     0.61     0.56      CAA    Glutamine      0.31     0.27
AUG    Methionine     1.00     1.00      CAU    Histidine      0.52     0.41
AUA    Isoleucine     0.07     0.14      CAC    Histidine      0.48     0.59
AUU    Isoleucine     0.47     0.35      CUG    Leucine        0.55     0.43
AUC    Isoleucine     0.46     0.51      CUA    Leucine        0.03     0.07
ACG    Threonine      0.23     0.12      CUU    Leucine        0.10     0.12
ACA    Threonine      0.12     0.27      CUC    Leucine        0.10     0.20
ACU    Threonine      0.21     0.23      UUG    Leucine        0.11     0.12
ACC    Threonine      0.43     0.38      UUA    Leucine        0.11     0.06
UGG    Tryptophan     1.00     1.00      CCG    Proline        0.55     0.11
UGU    Cysteine       0.43     0.42      CCA    Proline        0.20     0.27
UGC    Cysteine       0.57     0.58      CCU    Proline        0.16     0.29
UGA    Stop           0.30     0.61      CCC    Proline        0.10     0.33
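A coding sequence can be screened for the rare codons discussed above before deciding between codon optimization and a tRNA-supplemented host. The sketch below is a minimal illustration: the E. coli frequencies are the low-use sense-codon values from Table 6.2, while the function names rare_codons and adjacent_rare_pairs, the default threshold of 0.10, and the example sequence are hypothetical choices made for this example.

```python
# Low-use sense codons in E. coli, with frequencies taken from Table 6.2.
E_COLI_LOW_USE = {
    "AGG": 0.03, "AGA": 0.04, "AUA": 0.07, "CUA": 0.03,
    "CGA": 0.05, "CGG": 0.08, "GGA": 0.09,
}


def rare_codons(cds, threshold=0.10):
    """List (codon_index, codon, frequency) for codons used less often than threshold.

    The sequence is read in frame from its first nucleotide; trailing
    nucleotides that do not fill a complete codon are ignored.
    """
    rna = cds.upper().replace("T", "U")  # accept DNA or RNA input
    hits = []
    for i in range(0, len(rna) - len(rna) % 3, 3):
        codon = rna[i:i + 3]
        freq = E_COLI_LOW_USE.get(codon)
        if freq is not None and freq < threshold:
            hits.append((i // 3, codon, freq))
    return hits


def adjacent_rare_pairs(hits):
    """Return index pairs of rare codons that sit directly next to each other."""
    return [(a[0], b[0]) for a, b in zip(hits, hits[1:]) if b[0] - a[0] == 1]


if __name__ == "__main__":
    example = "ATGAGGAGACTGCGATAA"  # hypothetical sequence for illustration
    found = rare_codons(example)
    print(found)
    print(adjacent_rare_pairs(found))
```

Adjacent hits near the start of the coding sequence are the ones the text singles out as most detrimental, so they would be the first candidates for synonymous replacement.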
Protein Stability
The expression of some foreign proteins in E. coli host strains, which
are typically grown at 37°C, often results in the formation of inclusion
bodies of inactive protein. This occurs because the foreign protein mis-
folds when it cannot attain its native active conformation. A variety of
strategies have been developed, albeit with limited success, to circumvent
this problem. Cultivation of recombinant strains at lower temperatures
sometimes facilitates slower, and hence proper, protein folding, often sig-
nificantly increasing the amount of recoverable active protein. However,
mesophilic bacteria like E. coli grow extremely slowly at low tempera-
tures. In one study, the chaperonin 60 gene (cpn60) and the cochaperonin
10 gene (cpn10) from the psychrophilic bacterium Oleispira antarctica
were introduced into a host strain of E. coli, with the result that the E.
coli strain gained the ability to grow and to express foreign proteins at
a high rate at temperatures of 4 to 10°C (Fig. 6.5). It has been suggested
that at temperatures below around 20°C, E. coli cells are unable to grow
to any appreciable extent as a consequence of the cold-induced inactiva-
tion of several E. coli chaperonins that normally facilitate protein folding
in this bacterium. Thus, transforming E. coli with chaperonins from a
cold-tolerant bacterium allowed the introduced proteins to perform the
functions at low temperature that E. coli proteins perform at higher
temperatures. Although very high levels of expression of the cloned gene were
not attained, this work is an important first step in the development of
expression systems for proteins that are sensitive to high temperatures
and might otherwise be difficult to produce. The next logical step in the
development of this system would likely be the construction of an E. coli
host cell that contains stably integrated copies of these chaperonin genes
in the chromosome.

Figure 6.4 Schematic representation of two commercially available plasmids that may be used to increase the pool of certain rare tRNAs in E. coli. Plasmid pSJS1244 carries 3 and plasmid pRARE carries 10 rare E. coli tRNA genes. p15A represents the replication origins of these plasmids. Spectinomycin and chloramphenicol are the antibiotics for which resistance genes are carried within these plasmids (A). The expression of foreign proteins in a typical E. coli host cell is also shown; the concentration of rare tRNAs is shown schematically (B) and in an E. coli host cell that has been engineered (by introduction of one of the plasmids shown in panel A) to overexpress several rare tRNAs (C). doi:10.1128/9781555818890.ch6.f6.4
[Panel A labels: pSJS1244, pRARE; tRNA genes metT, leuW, lys, proL, argW, argU, thrT, ileX, glyT, tryU, thrU; p15A; spectinomycin; chloramphenicol. Panel B: E. coli host cell, plasmid with foreign gene, low level of foreign gene expression. Panel C: E. coli host cell, plasmid with foreign gene, overproduced tRNA, high level of foreign gene expression.]

Figure 6.5 Growth at low temperatures of nontransformed E. coli and of E. coli transformed with plasmid-borne chaperonin genes cpn10 and cpn60, which were isolated from a psychrophilic bacterium. Nontransformed E. coli shows no growth below 20°C, whereas transformed E. coli shows significant growth below 20°C. doi:10.1128/9781555818890.ch6.f6.5
Fusion Proteins
Occasionally, foreign proteins are found in smaller-than-expected amounts
when they are produced in heterologous host cells. This apparent low level
of expression may be due to degradation of the foreign protein within the
host cell. One solution is to engineer a DNA construct encoding a target
protein in frame with DNA encoding a stable host protein (Fig. 6.6). The
combined, single protein that is produced is called a fusion protein, and
it protects the cloned foreign gene product from attack by host cell pro-
teases. In general, fusion proteins are stable because the target proteins
are fused with proteins that are not especially susceptible to proteolysis.
Fusion proteins are constructed by ligating a portion of the DNA
coding regions of two or more genes. In its simplest form, a fusion vector
system entails the insertion of a target gene into the coding region of a
cloned host gene, or fusion partner (Fig. 6.6). The fusion partner may be
positioned at either the N- or C-terminal end of the target protein.
Knowledge of the nucleotide sequences of the various coding segments joined at
the DNA level is essential to ensure that the ligation product maintains
the correct reading frame. If the combined DNA has an altered reading
frame, i.e., a sequence of successive codons that yields either an
incomplete or an incorrect translation product, then a functional version of the
protein encoded by the cloned target gene will not be produced.

Figure 6.6 Schematic representation of a DNA construct encoding a fusion protein. A plasmid carrying such a construct would also contain a selectable marker gene. P, transcriptional promoter; RBS, ribosome-binding site; TT, transcriptional terminator. The arrow indicates the direction of transcription. doi:10.1128/9781555818890.ch6.f6.6
[Diagram elements, in order: P, RBS, fusion partner, target protein gene, TT.]
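The reading-frame requirement can be checked mechanically once the sequences to be joined are known. The sketch below is a simplified illustration, assuming an N-terminal fusion partner supplied without its own stop codon and a target coding sequence that ends with a stop codon; the function names and the short example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_index(dna):
    """Index (counted in codons) of the first in-frame stop codon, or None."""
    dna = dna.upper()
    for i in range(0, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def fusion_is_in_frame(partner_cds, linker, target_cds):
    """Check that partner + linker + target form one uninterrupted open reading frame.

    The joined sequence must keep the target in the partner's reading frame,
    and the only in-frame stop codon must be the final codon of the target.
    """
    upstream = partner_cds + linker
    joined = upstream + target_cds
    if len(upstream) % 3 != 0 or len(joined) % 3 != 0:
        return False  # the target would be read in a shifted frame
    return first_stop_index(joined) == len(joined) // 3 - 1


if __name__ == "__main__":
    partner = "ATGAAACCC"       # hypothetical fusion partner, no stop codon
    target = "GTGCATCTGTAA"     # hypothetical target ending in a stop codon
    print(fusion_is_in_frame(partner, "GGATCC", target))  # 6-nt linker
    print(fusion_is_in_frame(partner, "GGATC", target))   # 5-nt linker shifts the frame
```

A single missing or extra nucleotide at the junction is enough to fail the check, which is why the junction sequence has to be known exactly before ligation.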
When the protein encoded by the cloned gene is intended for human
use, it is generally necessary to remove the fusion partner from the final
product. This is because fusion proteins require more extensive testing
before being approved by regulatory agencies, such as the U.S. Food and
Drug Administration (FDA). Therefore, strategies have been developed to
remove the unwanted amino acid sequence from the target protein. One
way to do this is to join the gene for the target protein to the gene for
the stabilizing fusion partner with specific oligonucleotides that encode
short stretches of amino acids that are recognized by a particular non-
bacterial protease. For example, an oligonucleotide linker encoding the
amino acid sequence Ile-Glu-Gly-Arg may be joined to the cloned gene.
Following synthesis and purification of the fusion protein, a blood coag-
ulation factor called Xa can be used to release the target protein from the
fusion partner, because factor Xa is a specific protease that cleaves peptide
bonds uniquely on the C-terminal side of the Ile-Glu-Gly-Arg sequence
(Fig. 6.7). Moreover, because this peptide sequence occurs rather infre-
quently in native proteins, this approach can be used to recover many
different cloned gene products.
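The effect of factor Xa on a fusion protein can be mimicked by splitting a residue list immediately after the Ile-Glu-Gly-Arg recognition sequence. This is a minimal sketch rather than a model of the enzyme: the function name cleave_after_site is hypothetical, only the first occurrence of the site is cut, and the example uses the residues shown in Fig. 6.7.

```python
def cleave_after_site(residues, site=("Ile", "Glu", "Gly", "Arg")):
    """Split a list of residues on the C-terminal side of the first recognition site.

    Returns (upstream_fragment, downstream_fragment). If the site is absent,
    the whole chain is returned as the upstream fragment.
    """
    n = len(site)
    for i in range(len(residues) - n + 1):
        if tuple(residues[i:i + n]) == tuple(site):
            cut = i + n  # cut after the final residue of the site (Arg)
            return residues[:cut], residues[cut:]
    return residues, []


if __name__ == "__main__":
    fusion = "Thr-Ala-Glu-Gly-Gly-Ser-Ile-Glu-Gly-Arg-Val-His-Leu".split("-")
    partner_and_linker, target = cleave_after_site(fusion)
    print("-".join(partner_and_linker))
    print("-".join(target))
```

Running the same search on the target protein sequence alone is a quick way to check whether it contains an internal copy of the recognition sequence, in which case a different protease and linker would be needed.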
The proteases most commonly used to cleave a fusion partner from
a target protein interest are enterokinase, tobacco etch virus protease,
thrombin, and factor Xa. However, following this cleavage, it is neces-
sary to perform additional purification steps in order to separate both
Figure 6.7 (A) Proteolytic cleavage of a fusion protein by blood coagulation factor
Xa. The factor Xa recognition sequence (Xa linker sequence) lies between the amino
acid sequences of two different proteins. A functional cloned gene protein (with Val at
its N terminus) is released after cleavage. (B) Schematic representation of a tripartite
fusion protein including a stable fusion partner, a linker peptide, and the cloned target
protein. doi:10.1128/9781555818890.ch6.f6.7
A B
Site of
cleavage of
factor Xa
Xa linker sequence
. . . Thr-Ala-Glu-Gly-Gly-Ser-Ile-Glu-Gly-Arg-Val-His-Leu . . .
Peptide
Fusion partner linker Target protein
340 CHAPTER 6
Table 6.3 Some protein fusion systems used to facilitate the purification of foreign
proteins in E. coli and other host organisms
Fusion partner Size Ligand Elution conditions
ZZ 14 kDa Immunoglobulin G Low pH
Histidine tail 6–10 amino acids Ni2+ Imidazole
Strep tag 10 amino acids Streptavidin Iminobiotin
Pinpoint 13 kDa Streptavidin Biotin
Maltose-binding 40 kDa Amylose Maltose
protein
GST 26 kDa Glutathione Reduced glutathione
Flag 8 amino acids Specific MAb EDTA or low pH
Polyarginine 5–6 amino acids SP-Sephadex High salt at pH
>8.0
c-myc 11 amino acids Specific MAb Low pH
S tag 15 amino acids S fragment of RNase A Low pH
Calmodulin-binding 26 amino acids Calmodulin EGTA and high salt
peptide
Cellulose-binding 4–20 kDa Cellulose Urea or guanidine
domain hydrochloride
Chitin-binding domain 51 amino acids Chitin SDS or guanidine
hydrochloride
SBP tag 38 amino acids Streptavidin Biotin
ZZ, a fragment of Staphylococcus aureus protein A; Strep tag, a peptide with affinity for streptavidin;
Pinpoint, a protein fragment that is biotinylated and binds streptavidin; GST, glutathione S-transferase; Flag,
a peptide recognized by enterokinase; EDTA, ethylenediaminetetraacetic acid; c-myc, a peptide from a pro-
tein that is overexpressed in many cancers; S tag, a peptide fragment of ribonuclease (RNase) A; EGTA,
ethylene glycol-bis(β-aminoethyl ether)-N,N,N′,N′-tetraacetic acid; SBP (streptavidin-binding protein), a
peptide with affinity for streptavidin; SP-Sephadex, a cation-exchange resin composed of sulfopropyl groups
covalently attached to Sephadex beads; SDS, sodium dodecyl sulfate.
the protease and the fusion protein from the protein of interest. Unfortu-
nately, sometimes proteases also cleave the protein of interest. When this
occurs to any significant extent, it is necessary to change either the linker
peptide or the digestion conditions.
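As a rough illustration of the cleavage step in Fig. 6.7A, the following Python sketch treats a protein as a string of one-letter amino acid codes and cuts it after each Ile-Glu-Gly-Arg (IEGR) motif. The function names are hypothetical, the example is the sequence fragment from Fig. 6.7A rewritten in one-letter code, and the model ignores everything about a real digestion (accessibility of the site, reaction conditions, nonspecific cleavage). It also includes one simple check, scanning a target protein for an internal IEGR motif, that would flag a case in which the protease could cut the protein of interest.

```python
# Assumption: factor Xa is modeled as cutting immediately after Ile-Glu-Gly-Arg.
FACTOR_XA_SITE = "IEGR"


def cleave_with_factor_xa(protein):
    """Split a one-letter protein sequence after every IEGR motif."""
    fragments = []
    start = 0
    while True:
        hit = protein.find(FACTOR_XA_SITE, start)
        if hit == -1:
            break
        cut = hit + len(FACTOR_XA_SITE)  # cut on the C-terminal side of Arg
        fragments.append(protein[start:cut])
        start = cut
    if start < len(protein):
        fragments.append(protein[start:])
    return fragments


def internal_sites(target):
    """Return the start positions of IEGR motifs inside a target protein."""
    positions = []
    hit = target.find(FACTOR_XA_SITE)
    while hit != -1:
        positions.append(hit)
        hit = target.find(FACTOR_XA_SITE, hit + 1)
    return positions


# Fragment from Fig. 6.7A: Thr-Ala-Glu-Gly-Gly-Ser-Ile-Glu-Gly-Arg-Val-His-Leu
fusion_fragment = "TAEGGSIEGRVHL"
print(cleave_with_factor_xa(fusion_fragment))  # ['TAEGGSIEGR', 'VHL']
```

The second fragment begins with Val, as in the figure. A target protein for which internal_sites returns a non-empty list would be a candidate for a different linker and protease.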
In addition to reducing the degradation of cloned foreign proteins,
a number of fusion proteins have been developed to simplify the purifi-
cation of recombinant proteins (Table 6.3). This approach is useful for
purification of proteins expressed in either prokaryotic or eukaryotic host
organisms. For example, a vector that contains the human interleukin-2
(IL-2) cytokine gene joined to DNA encoding the fusion partner (marker
peptide) sequence Asp-Tyr-Lys-Asp-Asp-Asp-Asp-Lys serves the dual
function of reducing the degradation of the expressed IL-2 gene prod-
uct and facilitating the purification of the product. Following expression
of this construct, the secreted fusion protein can be purified in a single
step by immunoaffinity chromatography, in which monoclonal antibodies
directed against the marker peptide have been immobilized on a solid
support and act as ligands to bind the fusion protein (Fig. 6.8). Because
this particular marker peptide is relatively small, it does not significantly
decrease the amount of host cell resources that are available for the pro-
duction of IL-2; thus, the yield of IL-2 is not compromised by the con-
comitant synthesis of the marker peptide. In addition, while the fusion
protein has the same biological activity as native IL-2, as mentioned
above, to more readily satisfy the government agencies that regulate the
[Figure 6.8 panel labels. (A) Marker peptide, interleukin-2, other proteins.
(B) Marker peptide/interleukin-2 fusion protein bound to specific antibodies
on a column.]
Figure 6.8 Immunoaffinity chromatographic purification of a fusion protein that
includes a marker peptide and IL-2. A monoclonal antibody that binds to the marker
peptide of the fusion protein (anti-marker peptide antibody) is attached to a solid
matrix support. The secreted proteins (A) are passed through the column containing
the bound antibody. The marker peptide portion of the fusion protein is bound to
the antibody (B), and the other proteins pass through. The immunopurified fusion
protein can then be eluted from the column by the addition of pure marker peptide.
doi:10.1128/9781555818890.ch6.f6.8
use of pharmaceuticals, it is necessary to remove the marker peptide if the
product is intended for human use.
Alternatively, it has become popular to generate a fusion protein
containing six or eight histidine residues attached to either the N- or
C-terminal end of the target protein. The histidine-tagged protein, along
with other cellular proteins, is then passed over an affinity column of
nickel–nitrilotriacetic acid. The histidine-tagged protein, but not the other
cellular proteins, binds tightly to the column. The bound protein may be
eluted from the column by the addition of imidazole (the side chain of
histidine). With this protocol, some cloned and overexpressed proteins
have been purified up to 100-fold with greater than 90% recovery in a
single step.
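Fold purification and recovery are conventionally calculated from the total activity and total protein measured before and after the column. The sketch below is a minimal Python version of that arithmetic. The function name and all of the numbers are hypothetical; they were chosen only so that the result resembles the 100-fold, greater than 90% recovery case mentioned above.

```python
def purification_summary(crude_activity, crude_protein_mg,
                         eluate_activity, eluate_protein_mg):
    """Return fold purification and percent recovery for one purification step.

    Activities are in arbitrary units; protein amounts are in milligrams.
    """
    crude_specific = crude_activity / crude_protein_mg     # units per mg before
    eluate_specific = eluate_activity / eluate_protein_mg  # units per mg after
    return {
        "fold_purification": eluate_specific / crude_specific,
        "recovery_percent": 100.0 * eluate_activity / crude_activity,
    }


# Hypothetical example: 10,000 units in 1,000 mg of crude extract,
# 9,500 units in 9.5 mg of protein eluted with imidazole.
summary = purification_summary(10_000, 1_000, 9_500, 9.5)
print(summary)
```

With these made-up inputs, the specific activity rises from 10 to 1,000 units per mg (about 100-fold) while about 95% of the activity is recovered.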
Metabolic Load
The introduction and expression of foreign DNA in a host organism often
change the metabolism of the organism in ways that may impair normal
cellular functioning (Fig. 6.9). This biological response is due to a meta-
bolic load (metabolic burden, metabolic drain) imposed upon the host by
the presence and expression of foreign DNA. A metabolic load can occur
as the result of a variety of conditions, including the following.
• Increasing plasmid copy number and/or size requires increasing
amounts of cellular energy for plasmid replication and maintenance.
• The limited amount of dissolved oxygen in the growth medium is
often insufficient for both host cell metabolism and plasmid main-
tenance and expression (see the section on overcoming oxygen lim-
itation below).
• Overproduction of both target and marker proteins may deplete
the pools of certain aminoacyl-tRNAs (see the section on codon
usage above) and/or drain the host cell of its energy (in the form of
ATP or guanosine 5′-triphosphate [GTP]).
Figure 6.9 Schematic representation of nontransformed E. coli cells
containing a high level of cellular building blocks and energy and E. coli
cells transformed with plasmid DNA (circle) that encodes the synthesis of a
foreign protein (shown in blue), thereby depleting the host cell of much of
its cellular building blocks and energy. The overexpression of a foreign
protein prevents the cell from obtaining sufficient energy and resources for
its growth and metabolism, so it is less able to grow rapidly and attain a
high cell density. doi:10.1128/9781555818890.ch6.f6.9
[Figure 6.9 labels: nontransformed E. coli, transformed E. coli, cellular
building blocks and energy, introduced plasmid DNA, recombinant protein.]
• When a foreign protein is overexpressed and then exported from
the cytoplasm to either the cell membrane, the periplasm, or the
external medium, it may “jam” membrane export sites and thereby
prevent the proper localization of other, essential, host cell proteins.
• A foreign protein may sometimes interfere with the functioning of
the host cell, for example, by converting an important and needed
metabolic intermediate into a compound that is irrelevant, or even
toxic, to the cell.
One of the most commonly observed consequences of a metabolic
load is a decrease in the rate of cell growth after introduction of foreign
DNA. Sometimes, a metabolic load may result in plasmid-containing cells
losing all or a portion of the plasmid DNA. This is especially problematic
during the large-scale growth of a transformed host because this step is
usually carried out in the absence of selective pressure. Since cells growing
in the presence of a metabolic load generally have a decreased level of
energy for cellular functions, energy-intensive metabolic processes such
as protein synthesis are invariably adversely affected by a metabolic load.
A metabolic load may also lead to changes in the host cell size and shape
and to increases in the amount of extracellular polysaccharide produced
by the bacterial host cell. This additional extracellular carbohydrate may
cause the cells to stick together, making harvesting, e.g., by cross-flow mi-
crofiltration procedures, and protein purification more difficult.
When a particular aminoacyl-tRNA becomes limiting, as is often the
case when a foreign protein is overexpressed in E. coli, there is an in-
creased probability that an incorrect amino acid will be inserted in place
of the limiting amino acid. In addition, translational accuracy, which
depends upon the availability of GTP as part of a proofreading mechanism,
is likely to be further decreased as a consequence of a metabolic load from
foreign protein overexpression. In one instance, a high level of expres-
sion of mouse epidermal growth factor in E. coli caused about 10 times
the normal amount of incorrect amino acids to be incorporated into the
recombinant protein. This increase in error frequency can dramatically
diminish the usefulness of the target protein as a therapeutic agent.
The extent of the metabolic load can be reduced by using a
low-copy-number rather than a high-copy-number plasmid vector. An
even better strategy might be to avoid the use of plasmid vectors and in-
tegrate the introduced foreign DNA directly into the chromosomal DNA
of the host organism (see the section below on integrating foreign genes
into the host chromosomal DNA). In this case, plasmid instability will not
be a problem. With an integrated cloned gene, without a plasmid vector,
the transformed host cell will not waste its resources synthesizing un-
wanted and unneeded antibiotic resistance marker gene products. The use
of strong but regulatable promoters is also an effective means of reduc-
ing metabolic load (see the section on promoters above). In this case, the
production-scale fermentation process may be performed in two stages.
During the first, or growth, stage, the promoter controlling the transcrip-
tion of the target gene is turned off, while during the second, or induction,
stage, this promoter is turned on.
When the codon usage of the foreign gene is different from the codon
usage of the host organism, depletion of specific aminoacyl-tRNA pools
may be avoided by either completely or partially synthesizing the target
gene to better reflect the codon usage of the host organism (see the sec-
tion on codon usage above). In one study, it was found that levels of the
protein streptavidin were 10-fold higher in E. coli when expression was
directed by a synthetic gene with a G+C content of 54% than when it
was directed by the natural gene with a G+C content of 69%.
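The G+C content quoted for the two streptavidin genes is straightforward to compute. The Python sketch below is illustrative only: the two nine-nucleotide sequences are toy examples (not streptavidin sequences) that encode the same tripeptide, Ala-Gly-Pro, with different synonymous codons, and G+C content is only a coarse proxy for how well a gene matches the codon usage of a host.

```python
def gc_content(dna):
    """Return the G+C content of a DNA sequence as a percentage."""
    bases = [base for base in dna.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G, or T")
    gc = sum(1 for base in bases if base in "GC")
    return 100.0 * gc / len(bases)


# Toy sequences (assumptions for illustration): both encode Ala-Gly-Pro.
gc_rich_version = "GCCGGCCCG"   # codons GCC GGC CCG
recoded_version = "GCAGGTCCA"   # codons GCA GGT CCA

print(round(gc_content(gc_rich_version), 1))
print(round(gc_content(recoded_version), 1))
```

Recoding only the third codon positions lowers the G+C content here from 100% to about 67% without changing the encoded peptide, which is the same kind of change made when a gene is resynthesized for a host.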
Although it may at first seem counterintuitive, one way to increase the
amount of foreign protein produced during the fermentation process is to
accept a modest level of foreign gene expression—perhaps 5% of the total
cell protein—and instead focus on attaining a high host cell density. Thus,
for example, an organism with a 5% foreign protein expression level and
a low level of metabolic load that can be grown to a density of 40 g (dry
weight) per liter produces more of the target protein than one with a 15%
expression level for the same protein and a cell density of only 5 to 10 g
(dry weight) per liter.
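The comparison above can be checked with a few lines of arithmetic. The Python sketch below assumes, purely for illustration, that protein makes up half of the cell dry weight (the parameter protein_fraction_of_dry_weight is not from the text). Because the same fraction is applied to both cultures, it does not affect which culture yields more.

```python
def target_protein_g_per_l(cell_density_g_per_l, expression_fraction,
                           protein_fraction_of_dry_weight=0.5):
    """Estimate grams of target protein per liter of culture.

    cell_density_g_per_l: cell dry weight per liter
    expression_fraction: target protein as a fraction of total cell protein
    protein_fraction_of_dry_weight: assumed share of dry weight that is protein
    """
    total_protein = cell_density_g_per_l * protein_fraction_of_dry_weight
    return total_protein * expression_fraction


dense_modest = target_protein_g_per_l(40, 0.05)      # 5% expression, 40 g/L
sparse_high_low = target_protein_g_per_l(5, 0.15)    # 15% expression, 5 g/L
sparse_high_high = target_protein_g_per_l(10, 0.15)  # 15% expression, 10 g/L

print(dense_modest, sparse_high_low, sparse_high_high)
```

Under this assumption, the dense culture yields about 1 g of target protein per liter, compared with roughly 0.4 to 0.75 g per liter for the culture with the higher expression level.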
Chromosomal Integration
As a consequence of metabolic load, a fraction of the cell population often
loses its plasmids during cell growth. In addition, cells that lack plasmids
grow faster than those that retain them, so plasmidless cells eventually
dominate the culture. After a number of generations of cell growth, the
loss of plasmid-containing cells diminishes the yield of the cloned gene
product. Plasmid-containing cells may be maintained by growing the cells
in the presence of either an antibiotic or an essential metabolite that
enables only plasmid-bearing cells to thrive. But the addition of either
antibiotics or metabolites to industrial-scale fermentations can be extremely
costly, and it is imperative that anything that is added to the fermentation,
such as an antibiotic or a metabolite, be completely removed prior to cer-
tifying the product fit for human use. For this reason, cloned DNA is often
introduced directly into the chromosomal DNA of the host organism.
When DNA is part of the host chromosomal DNA, it is quite stable and
can be maintained indefinitely in the absence of selective agents.
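The takeover of an unselected culture by plasmid-free cells can be illustrated with a toy model. In the Python sketch below, every parameter value is hypothetical: loss_per_division is the chance that a daughter of a plasmid-bearing cell receives no plasmid, and growth_advantage is the number of offspring a plasmid-free cell leaves relative to a plasmid-bearing cell in each generation. It is a sketch of the qualitative behavior, not a fitted model of any real fermentation.

```python
def plasmid_bearing_fraction(generations, loss_per_division=0.01,
                             growth_advantage=1.2, start_fraction=1.0):
    """Track the fraction of plasmid-bearing cells over successive generations."""
    plus = start_fraction          # plasmid-bearing cells (as a fraction)
    minus = 1.0 - start_fraction   # plasmid-free cells (as a fraction)
    history = [plus]
    for _ in range(generations):
        # Plasmid-bearing cells divide; a small share of daughters lose the plasmid.
        new_plus = 2.0 * plus * (1.0 - loss_per_division)
        # Plasmid-free cells gain those daughters and also divide faster.
        new_minus = 2.0 * plus * loss_per_division + 2.0 * growth_advantage * minus
        total = new_plus + new_minus
        plus, minus = new_plus / total, new_minus / total
        history.append(plus)
    return history


fractions = plasmid_bearing_fraction(50)
for generation in (0, 10, 25, 50):
    print(generation, round(fractions[generation], 3))
```

With these made-up values the plasmid-bearing fraction falls in every generation, and the decline accelerates once plasmid-free cells are common enough for their growth advantage to matter.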
For integration of DNA into a chromosomal site, the input DNA
must share some sequence similarity, usually at least 50 nucleotides, with
the chromosomal DNA, and there must be a physical exchange (recom-
bination) between the two DNA molecules. To integrate DNA into the
host chromosome, it is first necessary to identify a desired chromosomal
integration site, i.e., a segment of DNA on the host chromosome that can
be disrupted without affecting the normal functions of the cell. Once the
chromosomal integration site has been isolated and spliced onto a plas-
mid, a marker gene is inserted in the middle of the cloned chromosomal
integration site (Fig. 6.10). The DNA on the plasmid can base-pair with
identical sequences on the host chromosome and subsequently integrate
into the host chromosome as a result of a host enzyme-catalyzed double
Figure 6.10 Insertion of a foreign gene into a unique predetermined site on
a bacterial chromosome. In step 1, a marker gene is integrated into the host
cell chromosomal DNA by homologous recombination. In step 2, the selectable
marker gene is replaced by the target gene.
doi:10.1128/9781555818890.ch6.f6.10
[Figure 6.10 labels: step 1, chromosomal DNA, marker gene, plasmid DNA;
step 2, marker gene, chromosomal DNA, target gene, plasmid DNA; final
product, target gene, chromosomal DNA.]
crossover. In this case, transformants are selected for the acquisition of the
marker gene (often an antibiotic resistance gene). Then, the target gene,
under the control of a regulatable promoter, is inserted in the middle of
the cloned chromosomal integration site on a different plasmid. This plasmid
construct is used to transform host cells that contain the marker gene
integrated into their chromosomes, and following a host enzyme-catalyzed
double crossover, the target gene and its transcriptional regulatory region
are inserted into the chromosome in place of the marker gene. The final
construct is selected for the loss of the marker gene.
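The two-step replacement described above can be mimicked with plain string operations. The Python sketch below is a deliberately simplified model: the function name, the homology arm sequences, and the short lowercase "marker" and "target" sequences are all hypothetical, and the only rule taken from the text is the requirement for at least 50 nucleotides of shared sequence on each side of the insert.

```python
MIN_HOMOLOGY = 50  # "usually at least 50 nucleotides" of sequence similarity


def double_crossover(chromosome, left_arm, cargo, right_arm):
    """Replace whatever lies between the two homology arms with `cargo`."""
    if min(len(left_arm), len(right_arm)) < MIN_HOMOLOGY:
        raise ValueError("homology arms are too short")
    left = chromosome.find(left_arm)
    if left == -1:
        raise ValueError("left arm not found on the chromosome")
    left_end = left + len(left_arm)
    right = chromosome.find(right_arm, left_end)
    if right == -1:
        raise ValueError("right arm not found downstream of the left arm")
    return chromosome[:left_end] + cargo + chromosome[right:]


# Hypothetical sequences for illustration only.
LEFT = ("ACGTTGCA" * 7)[:50]    # left half of the chromosomal integration site
RIGHT = ("GGATCCTA" * 7)[:50]   # right half of the chromosomal integration site
MARKER = "atgaaacgctaa"         # stands in for an antibiotic resistance gene
TARGET = "atgggtctgtaa"         # stands in for promoter plus target gene

chromosome = "TTTT" + LEFT + RIGHT + "AAAA"
step1 = double_crossover(chromosome, LEFT, MARKER, RIGHT)  # select for the marker
step2 = double_crossover(step1, LEFT, TARGET, RIGHT)       # select for marker loss

print(MARKER in step1, MARKER in step2, TARGET in step2)  # True False True
```

The same function serves both steps because each crossover pair only depends on the arms, which is why the marker integrated in step 1 can later be swapped for the target gene.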
Several other methods can also be used to integrate foreign genes into
host chromosomal DNA. For example, when a marker gene is flanked
by certain short specific DNA sequences and then inserted into either a
plasmid or chromosomal DNA, the gene may be excised by treatment of
the construct with an enzyme that recognizes the flanking DNA sequences
and excises the DNA between them (Fig. 6.11). One combination of an enzyme and DNA
sequence that is useful for this sort of manipulation is the Cre–loxP re-
combination system, which consists of the Cre recombinase enzyme and
two 34-bp loxP recombination sites. The marker gene to be removed is
flanked by loxP sites, and after integration of the plasmid into the chro-
mosomal DNA, the marker gene is removed by the Cre enzyme. A gene
Figure 6.11 Removal of a selectable marker gene following integration of plasmid
DNA into a bacterial chromosome. A single crossover event (×) occurs between chro-
mosomal DNA and a homologous DNA fragment (hatched) on a plasmid, resulting
in the integration of the entire plasmid into the chromosomal DNA. The selectable
marker gene, which is flanked by loxP sites, is excised by the action of the Cre en-
zyme, leaving one loxP site on the integrated plasmid. The Cre enzyme is on a sep-
arate plasmid within the same cell under the transcriptional control of the E. coli
lac promoter, so excision is induced when IPTG is added to the growth medium.
doi:10.1128/9781555818890.ch6.f6.11
[Figure 6.11 labels: loxP, marker gene, loxP, plasmid, cloned gene, site of
recombination, homologous chromosomal DNA, homologous recombination,
+ Cre protein.]
encoding the Cre enzyme is located on its own plasmid, which can be
introduced into the chromosomally transformed host cells. Marker gene
excision is triggered by the addition of IPTG to the growth medium; IPTG
inactivates the lac repressor (encoded by the lacI gene), which derepresses
the E. coli lac promoter–operator located upstream of the Cre gene and
causes the Cre enzyme to be synthesized. Once there is no
longer any need for the Cre enzyme, the plasmid that contains the gene
for this enzyme under the control of the lac promoter may be removed
from the host cells merely by raising the temperature. This plasmid has a
temperature-sensitive replicon that allows it to be maintained in the cell
at 30°C but not above 37°C.
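Marker excision by Cre can be sketched as a string operation: the DNA between two loxP sites in the same orientation is removed, and a single loxP site is left behind. The Python sketch below assumes the commonly cited 34-bp wild-type loxP sequence (two 13-bp arms around an 8-bp spacer), which should be checked against a sequence database before any real use. The function name and the flanking and marker sequences are hypothetical, and the model does not cover loxP sites in opposite orientations.

```python
# Assumed wild-type loxP site: 13-bp arm + 8-bp spacer + 13-bp inverted arm.
LOXP_LEFT_ARM = "ATAACTTCGTATA"
LOXP_SPACER = "GCATACAT"
LOXP_RIGHT_ARM = "TATACGAAGTTAT"
LOXP = LOXP_LEFT_ARM + LOXP_SPACER + LOXP_RIGHT_ARM  # 34 bp in total


def cre_excise(dna):
    """Remove the segment between the first two loxP sites, leaving one site."""
    first = dna.find(LOXP)
    if first == -1:
        return dna  # no loxP site, nothing to excise
    first_end = first + len(LOXP)
    second = dna.find(LOXP, first_end)
    if second == -1:
        return dna  # a single loxP site cannot be recombined
    return dna[:first_end] + dna[second + len(LOXP):]


# Hypothetical integrated construct: cloned gene, then a floxed marker gene.
marker = "atgaaacgctaa"
integrated = "GGGGCCCC" + LOXP + marker + LOXP + "TTTTAAAA"
after_cre = cre_excise(integrated)

print(len(LOXP), marker in after_cre, after_cre.count(LOXP))  # 34 False 1
```

Running cre_excise a second time leaves the sequence unchanged, since only one loxP site remains after the first excision.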
Increasing Secretion
For most E. coli proteins, secretion entails transit through the inner (cy-
toplasmic) cell membrane to the periplasm. Directing a foreign protein
to the periplasm, rather than the cytoplasm, makes its purification easier
and less costly, as many fewer proteins are present there than in the
cytoplasm. Moreover, the stability of a cloned protein depends on its cellular
location in E. coli. For example, recombinant proinsulin is approximately
10 times more stable if it is secreted (exported) into the periplasm than
if it is localized in the cytoplasm. In addition, secretion of proteins to the
periplasm often facilitates the correct formation of disulfide bonds be-
cause the periplasm provides an oxidative environment, as opposed to the
more reducing environment of the cytoplasm. Table 6.4 provides some ex-
amples of the amounts of secreted recombinant pharmaceutical proteins
attainable with various bacterial host cells.
Normally, an amino acid sequence called a signal peptide (also called
a signal sequence or leader peptide), located at the N-terminal end of a
newly synthesized protein, facilitates its export by enabling the protein
to pass through the cell membrane (Fig. 6.12). It is sometimes possible
to engineer a protein for secretion to the periplasm by adding the DNA
Table 6.4 Yields of several secreted recombinant proteins produced in different
bacteria
Protein                            Yield     Host bacterium
Hirudin                            >3 g/L    Escherichia coli
Human antibody fragment            1–2 g/L   Escherichia coli
Human insulin-like growth factor   8.5 g/L   Escherichia coli
Monoclonal antibody 5T4            700 mg/L  Escherichia coli
Humanized anti-CD18 F(ab′)2        2.5 g/L   Escherichia coli
Human epidermal growth factor      325 mg/L  Escherichia coli
Alkaline phosphatase               5.2 g/L   Escherichia coli
Staphylokinase                     340 mg/L  Bacillus subtilis
Human proinsulin                   1 g/L     Bacillus subtilis
Human calcitonin precursor         2 g/L     Staphylococcus carnosus
Organophosphohydrolase             1.2 g/L   Ralstonia eutropha
Human CD4 receptor                 200 mg/L  Streptomyces lividans
Human insulin                      100 mg/L  Streptomyces lividans
sequence encoding a signal peptide to the cloned gene. When the recombinant
protein is secreted to the periplasm, the signal peptide is precisely
removed by the cell’s secretion apparatus, so the N-terminal end of the
target protein is identical to the natural protein.
Figure 6.12 Schematic representation of protein secretion. The ribosome is
attached to a cellular membrane, and the signal peptide at the N terminus is
transported, by the secretion apparatus, across the cytoplasmic membrane,
followed by the rest of the amino acids that constitute the mature protein.
Once the signal peptide has crossed the membrane, it is cleaved from the
remainder of the protein by an enzyme associated with the membrane called a
signal peptidase. Membrane proteins as well as secreted proteins generally
contain a signal peptide (prior to removal by processing).
doi:10.1128/9781555818890.ch6.f6.12
[Figure 6.12 labels: ribosome, mRNA, growing peptide chain, aminoacyl-tRNA,
cytoplasm, cell membrane, periplasm, signal peptide.]
Unfortunately, the fusion of a target gene to a DNA fragment encoding a
signal peptide sequence does not necessarily guarantee a high rate of
secretion. When this simple strategy is found to be ineffective in
producing a secreted protein product, alternative strategies need to be
employed. One approach that was found to be successful for the secretion
of the IL-2 cytokine was the fusion of the IL-2 gene downstream from
the gene for the entire propeptide maltose-binding protein, rather than
just the maltose-binding protein signal sequence, with DNA encoding the
factor Xa recognition site as a linker peptide separating these two genes
(Fig. 6.13). When this genetic fusion, on a plasmid vector, was used to
transform E. coli cells, as expected, a large fraction of the fusion protein
was found localized in the host cell periplasm. Functional IL-2 could then
be released from the fusion protein by digestion with factor Xa.
Sometimes too high a level of translation of a foreign protein can
overload the cell’s secretion machinery and inhibit the secretion of that
protein. Thus, to ensure that secretion of a target protein occurs most
efficiently, it is necessary to lower the level of expression of that protein.
E. coli and other gram-negative microorganisms generally cannot secrete
proteins into the surrounding medium because of the presence of
an outer membrane (in addition to the inner or cytoplasmic membrane)
that restricts this process. Of course, it is possible to use as host organ-
isms gram-positive prokaryotes or eukaryotic cells, both of which lack
an outer membrane and therefore can secrete proteins directly into the
medium. Alternatively, it is possible to take advantage of the fact that
some gram-negative bacteria can secrete a bactericidal protein called a
bacteriocin into the medium. A cascade mechanism is responsible for this
specific secretion. A bacteriocin release protein activates phospholipase A,
which is present in the bacterial inner membrane and cleaves membrane
phospholipids so that both the inner and outer membranes are permeabilized
(Fig. 6.14A). This results in some cytoplasmic and periplasmic
Figure 6.13 Engineering the secretion of IL-2. When IL-2 is fused to the
E. coli maltose-binding protein and its signal peptide, with the two proteins
joined by a linker peptide, secretion to the periplasm occurs. Following the
purification of the secreted fusion protein, the maltose-binding protein and
the linker peptide are removed by digestion with factor Xa.
doi:10.1128/9781555818890.ch6.f6.13
[Figure 6.13 labels: MBP signal peptide, maltose-binding protein, linker
peptide, interleukin-2, signal peptidase, secretion, processing, cleavage
with Xa.]
Figure 6.14 Schematic representation of a recombinant bacteriocin release protein
activating phospholipase A (present within the E. coli inner membrane) to permeabi-
lize the cell membranes (A). Also shown are schematics of E. coli cells engineered to
secrete a foreign protein to the periplasm by fusing the gene of interest (green) to a
secretion signal (B) and to the growth medium by permeabilizing cell membranes with
a bacteriocin release protein encoded on another plasmid (red) (C).
doi:10.1128/9781555818890.ch6.f6.14
[Figure 6.14 labels: bacteriocin release factor, phospholipase A, inner
membrane, outer membrane, target gene, target protein, bacteriocin release
factor gene.]
proteins being released into the culture medium. Thus, by putting the
bacteriocin release protein gene onto a plasmid under the control of a
regulatable promoter, E. coli cells may be permeabilized at will. E. coli
cells that carry the bacteriocin release protein gene on a plasmid are trans-
formed with another plasmid carrying a cloned gene that has been fused
to a secretion signal peptide sequence that causes the target protein to
be secreted into the periplasm. The cloned gene is placed under the same
transcriptional regulatory control as the bacteriocin release protein gene
so that the two genes can be induced simultaneously, with the cloned gene
protein being secreted into the medium (Fig. 6.14B and C).
Overcoming Oxygen Limitation

E. coli and most other microorganisms that are used to express foreign
proteins generally require oxygen for optimal growth. Unfortunately, ox-
ygen has only a limited solubility in aqueous media. Thus, as the cell
density of a growing culture increases, the cells rapidly deplete the growth
medium of dissolved oxygen. When cells become oxygen limited, expo-
nential growth slows and the culture rapidly enters a stationary phase
during which cellular metabolism changes. One consequence of the sta-
tionary phase is the production by the host cells of proteases that can
degrade foreign proteins. Oxygen dissolves into the growth medium very
slowly, so the amount of dissolved oxygen available to growing cells is
often not increased fast enough when large amounts of air or oxygen are
added to the growth medium, even with high stirring rates. Modification
of the fermenter (bioreactor) configuration to optimize the aeration and
agitation of cells and addition of chemicals to the growth medium to
increase the solubility of oxygen have been tried in an effort to deal with
the limited amount of dissolved oxygen. However, these efforts have met
with only limited success.
Some strains of the bacterium Vitreoscilla, a gram-negative obligate
aerobe, normally live in oxygen-poor environments such as stagnant
ponds. To obtain a sufficient amount of oxygen for their growth and
metabolism, these organisms synthesize a hemoglobin-like molecule that
tightly binds oxygen from the environment and subsequently increases
the level of available oxygen inside cells (Fig. 6.15). When the gene for
this protein is expressed in E. coli, the transformants display higher lev-
els of protein synthesis of both cellular and recombinant proteins, higher
levels of cellular respiration, a higher ATP production rate, and higher
ATP contents, especially at low levels of dissolved oxygen (0.25 to 1.0%)
in the growth medium, than do nontransformed cells. In these transfor-
mants, the Vitreoscilla hemoglobin increases the intracellular oxygen con-
centration, which raises the activities of both cytochromes d and o. This
causes an increase in proton pumping, with the subsequent generation of
ATP, thereby providing additional energy for cellular metabolic processes
(Fig. 6.15). For this strategy to be effective in different host cells, not
only must the Vitreoscilla sp. hemoglobin gene be efficiently expressed
but also the host cells must be able to synthesize the heme portion of the
hemoglobin molecule. Once these conditions have been met, this strategy
Figure 6.15 (A) Schematic representation of the binding of O2 by Vitreoscilla
hemoglobin, the utilization of this O2 in pumping (by proteins such as
cytochromes) H+ from the cytoplasm to the periplasm, and the subsequent
coupling of H+ uptake (by ATPase) to ATP generation. (B) Host cells
engineered to express Vitreoscilla hemoglobin are more efficient at taking up
oxygen from the growth medium. doi:10.1128/9781555818890.ch6.f6.15
[Figure 6.15 labels: cytochrome complex, ATPase complex, H+, periplasm,
cytoplasm, O2, H2O, ADP, ATP, Vitreoscilla hemoglobin, bacterial cells.]
can be used to improve growth as well as foreign gene expression in a
range of different industrially important bacteria, including E. coli. Thus,
host cells expressing the Vitreoscilla sp. hemoglobin gene often undergo
several additional doublings before entering stationary phase, producing
a much higher cell density and a much greater yield of the target recom-
binant protein.
Reducing Acetate
It is often difficult to achieve high levels of foreign-gene expression and
a high host cell density at the same time because of the accumulation
of harmful waste products, especially acetate, which inhibits both cell
growth and protein production and also wastes available carbon and en-
ergy resources. Since acetate is often associated with the use of glucose
as a carbon source, lower levels of acetate, and hence higher yields of
protein, are generally obtained when fructose or mannose is used as a
carbon source. In addition, several different types of genetically manip-
ulated E. coli host cells that produce lower levels of acetate have been
developed. One of these modified strains was produced by introducing a
gene (from B. subtilis) encoding the enzyme acetolactate synthase into E.
coli host cells. This enzyme catalyzes the formation of acetolactate from
pyruvate, thereby decreasing the flux through acetyl coenzyme A to ace-
tate (Fig. 6.16). In practice, the acetolactate synthase genes are introduced
into the cell on one plasmid, while the target gene (encoding the protein
Figure 6.16 Schematic representation of the pathways for glucose metabolism in an
E. coli strain that has been transformed with a plasmid carrying the genes for the pro-
tein subunits of acetolactate synthase (ALS). Introduction of this pathway results in
the synthesis of acetoin. Note that the conversion of glucose to biomass is a multistep
process. Acetyl-CoA, acetyl coenzyme A. doi:10.1128/9781555818890.ch6.f6.16
[Figure 6.16 labels: glucose, biomass, glucose-6-phosphate,
phosphoenolpyruvate, succinate, pyruvate, formate, acetyl-CoA, acetaldehyde,
lactate, acetate, ethanol; ALS system: acetolactate, acetoin.]
that is to be overexpressed in E. coli) is introduced on a second plasmid
from a different incompatibility group. The cells that were transformed
with the acetolactate synthase genes produced 75% less acetate than the
nontransformed cells and instead synthesized acetoin, which is approxi-
mately 50-fold less toxic to cells than acetate. The recombinant protein
yield was also doubled.
In another approach to lowering the level of acetate, researchers
transformed E. coli host cells with a bacterial gene for the enzyme pyru-
vate carboxylase, which converts pyruvate directly to oxaloacetate and is
not present in E. coli (Fig. 6.17). With the introduction of pyruvate car-
boxylase, acetate levels were decreased to less than 50% of their normal
level, the cell yield was increased by more than 40%, and the amount
of foreign protein synthesized was increased by nearly 70%. This result
reflects the fact that the addition of pyruvate carboxylase allows E. coli
cells to use the available carbon more efficiently, directing it away from
the production of acetate toward biomass and protein formation.
Similar to the strategy discussed above, the tricarboxylic acid (TCA)
cycle may also be replenished by converting aspartate to fumarate
(Fig. 6.17). To do this, E. coli host cells were transformed with the gene
for L-aspartate ammonia lyase (aspartase) under the control of the strong
tac promoter on a stable low-copy-number plasmid. The target recom-
binant protein was introduced on a separate plasmid. Using this system,
aspartase activity was induced by the addition of IPTG at the middle to
Figure 6.17 Replenishment of the TCA cycle in E. coli by the introduction of a gene
encoding pyruvate carboxylase. This avoids the conversion of pyruvate to acetate.
The TCA cycle may also be replenished by the introduction of a gene encoding
aspartase, converting aspartate in the medium to fumarate. Note that the conver-
sion of glucose to biomass is a multistep process. Acetyl-CoA, acetyl coenzyme A.
doi:10.1128/9781555818890.ch6.f6.17
[Figure 6.17 labels: glucose, biomass, glucose-6-phosphate,
phosphoenolpyruvate, pyruvate, acetyl-CoA, acetate, pyruvate carboxylase,
oxaloacetate, citrate, TCA cycle, aspartate, fumarate, aspartase.]
late log phase of growth. When the recombinant E. coli cells were grown
in minimal medium containing aspartate, the production of different re-
combinant proteins could be increased up to fivefold, with 30 to 40%
more biomass production.
Protein Folding
The use of conditions that result in very high rates of foreign gene
expression in E. coli often also leads to the production of misfolded proteins
that can aggregate and form insoluble inclusion bodies within the host
cell. While it is possible to solubilize inclusion bodies and subsequently
establish conditions that allow at least a portion of the recombinant pro-
tein to fold correctly, this is typically a tedious, inefficient, expensive, and
time-consuming process, one that is best avoided if possible. A simple
strategy to avoid the formation of misfolded proteins and hence inclusion
bodies involves reducing the rate of synthesis of the target gene product
so that it has more time to fold properly. This may be achieved by various
means, including using weaker promoters, decreasing the concentration
of inducers (such as IPTG), or lowering the growth temperature to 20 to
30°C. These strategies are sometimes, but not always, effective in prevent-
ing the formation of inclusion bodies.
An alternative strategy to improve the yield of properly folded (and
therefore active) recombinant proteins in E. coli involves the coexpression
of one or more molecular chaperones (proteins that facilitate the correct
folding of other proteins) by the host E. coli strain (Table 6.5). The “fold-
ing chaperones” utilize ATP cleavage to promote conformational changes
to mediate the refolding of their substrates. The “holding chaperones”
bind to partially folded proteins until the folding chaperones have done
their job. The “disaggregating chaperone” promotes the solubilization of
proteins that have become aggregated. Protein folding also involves the
“trigger factor,” which binds to nascent polypeptide chains, acting as a
holding chaperone. Although there are a large number of chaperone
molecules that are involved in the proper folding and secretion of proteins
in E. coli, a detailed understanding of the roles of many of these proteins
has begun to emerge, and the proper folding of a number of recombinant
proteins has been facilitated by coexpressing some of these chaperone
proteins. Thus, for example, significantly enhanced correct folding
of periplasmic proteins, as well as reduced recombinant protein
degradation and inclusion body formation, is observed when the chaperone Skp
and the peptidyl-prolyl cis/trans isomerase (PPIase) FkpA are coexpressed
along with the recombinant (secreted) protein. Finally, it is important to
note that the periplasm provides an oxidizing environment (compared
to the reducing environment of the cytoplasm), so in nontransformed E.
coli, disulfide bond-containing proteins are found only in the cell
envelope, where disulfide formation and isomerization are catalyzed by a set
of thiol-disulfide oxidoreductases known as the Dsb proteins. In practice,
this means that disulfide bond-containing recombinant proteins are
unlikely to be effectively synthesized in the cell cytoplasm.

Table 6.5 E. coli proteins that facilitate the correct folding of
recombinant proteins

Localization  Function                  Name
Cytoplasm     Holding chaperone         Hsp31, Hsp33, IbpA, IbpB, Trigger factor
              Folding chaperone         GroEL (Hsp60), DnaK (Hsp70), HscA, HscC,
                                        HtpG (Hsp90)
              Disaggregase              ClpB
              Secretory chaperone       SecB
Periplasm     Generic chaperones        Skp (OmpH), FkpA
              Specialized chaperones    SurA, LolA, PapD, FimC
              PPIases                   SurA, PpiD, FkpA, PpiA (RotA)
              Disulfide bond formation  DsbA, DsbB, DsbC, DsbD, DsbE, DsbG, CcmH

Adapted from Baneyx and Mujacic, Nat. Biotechnol. 22:1399–1408, 2004.

In a study in which the chaperones DnaK and GroEL (and their co-
chaperonin protein molecules) were overexpressed, the yields of 7 of 10
eukaryotic kinases expressed at the same time as target proteins were
increased up to fivefold. Moreover, the overexpression of additional chap-
erones did not affect the detected levels of these kinases. This result was
interpreted as indicating that the ability of overexpressed chaperones to
facilitate the proper folding of recombinant proteins is a protein-specific
phenomenon effective in some, but not all, instances.
While some researchers now routinely coexpress molecular chaper-
ones along with the recombinant protein of interest, others have sought
technically simpler solutions to the problem of protein folding. Thus, it
has been observed that high levels of certain osmolytes (substances that
contribute to the osmotic pressure in cells), such as sorbitol and betaine,
along with a high level of salt in the growth medium, can enhance the
correct folding and solubility of several recombinant proteins produced
in E. coli.
Heterologous Protein Production in Eukaryotic Cells

Eukaryotic Expression Systems
The eukaryotic proteins produced in bacteria do not always have the
desired biological activity or stability. In addition, despite careful puri-
fication procedures, bacterial compounds that are toxic or that cause a
rise in body temperature in humans and animals may contaminate the
final product. Moreover, any human protein intended for medical use
must be identical to the natural protein in all its properties. The inability
of prokaryotic organisms to produce authentic versions of eukaryotic
proteins is, for the most part, due to improper posttranslational protein
processing and to the absence of appropriate mechanisms that add chem-
ical groups to specific amino acid acceptor sites. To avoid these problems,
investigators have developed eukaryotic expression systems in fungal/
yeast, insect, and mammalian cells for the production of therapeutic
agents (Milestone 6.2).
Milestone 6.2
Synthesis of Rabbit β-Globin in Cultured Monkey Kidney Cells
Following Infection with a SV40 β-Globin Recombinant Genome
R. C. Mulligan, B. H. Howard, and P. Berg
Nature 277:108–114, 1979

Conceptually, the development of a eukaryotic expression system appears
to be a relatively simple matter of assembling the appropriate regulatory
sequences, cloning them in the correct order into a vector, and then
putting the gene of interest into the precise location that enables it to
be expressed. In reality, the development of the first generation of
eukaryotic expression vectors was a painstaking process following a
trial-and-error approach. Before the study of Mulligan et al., a number
of genes had been cloned into mammalian SV40 vectors, but mature,
functional mRNAs were never detected after infection of host cells. This
problem was overcome by inserting the rabbit cDNA for β-globin into an
SV40 gene that had nearly all of its coding region deleted but retained
“all the regions implicated in transcriptional initiation and termination,
splicing and polyadenylation. . . .” Both rabbit β-globin mRNA and protein
were synthesized in cells that were transfected with this β-globin
cDNA–SV40 construct. Mulligan et al. concluded, “The principal conceptual
innovation is the decision to leave intact the regions of the vector
implicated in . . . mRNA processing.” This study established that an
effective eukaryotic expression system could be created by placing the
cloned gene under the control of transcription and translation regulatory
sequences. It also stimulated additional research that pinpointed in
detail the structural prerequisites for the next generation of eukaryotic
expression vectors.

The basic requirements for expression of a target protein in a eukaryotic
host are similar to those in prokaryotes. Vectors into which the target
gene is cloned for delivery into the host cell can be specialized plasmids
designed to be maintained in the eukaryotic host, such as the yeast 2μm
plasmid; host-specific viruses, such as the insect baculovirus; or artificial
chromosomes, such as the yeast artificial chromosome (YAC). The vector
must have a eukaryotic promoter that drives the transcription of the
target gene, eukaryotic transcriptional and translational stop signals, a
sequence that enables polyadenylation of the mRNA, and a selectable
eukaryotic marker gene (Fig. 6.18). Because recombinant DNA procedures
are technically difficult to carry out with eukaryotic cells, most
eukaryotic vectors are shuttle vectors with two origins of replication and two
selectable marker genes. One set functions in E. coli, and the other set
functions in the eukaryotic host cell.

Figure 6.18 Generalized eukaryotic expression vector. The major features of a
eukaryotic expression vector are a eukaryotic transcription unit with a
promoter (p), a multiple-cloning site (MCS) for a gene of interest, and a DNA
segment with termination and polyadenylation signals (t); a eukaryotic
selectable marker (ESM) gene system; an origin of replication that functions
in the eukaryotic cell (ori euk); an origin of replication that functions in
E. coli (ori E); and an E. coli selectable marker (Ampr) gene.
doi:10.1128/9781555818890.ch6.f6.18

Many eukaryotic proteins undergo posttranslational processing that
is required for protein activity and stability. Some proteins are produced
as inactive precursor polypeptides that must be cleaved by proteases at
specific sites to produce the active form of the protein. In addition, ∼50%
of all human proteins are glycosylated (i.e., certain amino acids are
modified by adding specific sugars). Glycosylation often provides stability
and distinctive binding properties to a protein and can contribute to
protein folding, target a protein to a particular location, or protect it
from proteases. In the cell, sugars are attached to newly synthesized
proteins in the endoplasmic reticulum and in the Golgi apparatus by enzymes
known as glycosylases and glycosyltransferases.
The most common glycosylations entail the attachment of specific
sugars to the hydroxyl group of either serine or threonine (O-linked
glycosylation) and to the amide group of asparagine (N-linked glycosylation).
Other amino acid modifications include phosphorylation, acetylation,
sulfation, acylation, γ-carboxylation, and the addition of C14 and C16
fatty acids, i.e., myristoylation (or myristylation) and palmitoylation (or
palmitylation), respectively. Unfortunately, there is no universally effec-
tive eukaryotic host cell that performs the correct modifications on every
protein.
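
As a rough illustration of the acceptor sites described above, the following
sketch scans an amino acid sequence for candidate N-linked sites and lists
serine and threonine residues as possible O-linked sites. It assumes the
commonly cited N-X-S/T sequon (X is any residue except proline), which is
not stated in the text; the function names and the example sequence are
hypothetical. A matching motif marks only a possible site, not one that a
given host cell will actually glycosylate.

```python
import re

# Assumption: N-linked sites follow the widely used N-X-S/T sequon, X != P.
# A lookahead is used so that overlapping sequons are all reported.
N_SEQUON = re.compile(r"N(?=[^P][ST])")


def n_linked_candidates(seq: str) -> list[int]:
    """Return 1-based positions of asparagines that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in N_SEQUON.finditer(seq.upper())]


def o_linked_candidates(seq: str) -> list[int]:
    """Return 1-based positions of serine/threonine (hydroxyl-bearing) residues."""
    return [i + 1 for i, aa in enumerate(seq.upper()) if aa in "ST"]


if __name__ == "__main__":
    toy = "MKNGTAPNPSLNNTS"  # hypothetical sequence, for illustration only
    print(n_linked_candidates(toy))  # -> [3, 12, 13]
    print(o_linked_candidates(toy))  # -> [5, 10, 14, 15]
```

The asparagine at position 8 of the toy sequence is skipped because it is
followed by proline.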
Saccharomyces cerevisiae Expression Systems

Yeasts, like prokaryotes, grow rapidly in low-cost medium, generally do
not require the addition of growth factors to the medium, can correctly
process eukaryotic proteins, and can secrete large amounts of heterolo-
gous proteins. The yeast S. cerevisiae, traditionally employed in baking
and brewing, has been used extensively as a host cell for the expression of
cloned eukaryotic genes.
High levels of recombinant protein production can be achieved using
S. cerevisiae. This is because of the following. (i) The detailed biochemis-
try, genetics, and cell biology of this single-celled yeast are well known; its
genome sequence was completely determined in 1996. (ii) It can be grown
rapidly to high cell densities on relatively simple media. (iii) Several strong
promoters have been isolated from S. cerevisiae, and a naturally occurring
plasmid, called the 2μm plasmid, can be used as part of an endogenous
yeast expression vector system. (iv) S. cerevisiae is capable of carrying out
many posttranslational modifications. (v) S. cerevisiae normally secretes
so few proteins that, when it is engineered for extracellular release of a
recombinant protein, the product can be easily purified. (vi) Because of
its use in the baking and brewing industries, S. cerevisiae has been listed
by the FDA as a “generally recognized as safe” organism. Therefore, the
use of the organism for the production of human therapeutic agents does
not require the extensive experimentation demanded for unapproved host
cells.
S. cerevisiae Vectors
There are three main classes of S. cerevisiae expression vectors: episomal,
or plasmid, vectors (yeast episomal plasmids [YEps]), integrating vectors
(yeast integrating plasmids [YIps]), and YACs. Of these, episomal vectors
have been used extensively for the production of either intra- or
extracellular heterologous proteins. Typically, the vectors contain features
that allow them to function in both bacteria and S. cerevisiae. An E. coli
origin of replication and bacterial antibiotic resistance genes are usually
included on the vector, enabling all manipulations to first be performed
in E. coli before the vector is transferred to S. cerevisiae for expression.
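
The shuttle-vector idea can be sketched as a simple check that a plasmid
carries an origin of replication and a selectable marker for each host. This
is a minimal sketch for an episomal (YEp-type) vector; the class, field
names, and example plasmids are hypothetical and do not describe any
particular published vector.

```python
from dataclasses import dataclass, field


@dataclass
class Vector:
    """Minimal description of a plasmid; all names here are illustrative."""
    name: str
    origins: set[str] = field(default_factory=set)          # hosts it replicates in
    markers: dict[str, str] = field(default_factory=dict)   # host -> selectable marker


def is_shuttle_vector(v: Vector, hosts=("E. coli", "S. cerevisiae")) -> bool:
    """True if the vector has an origin and a selectable marker for every host."""
    return all(h in v.origins and h in v.markers for h in hosts)


if __name__ == "__main__":
    toy_yep = Vector(
        "toy YEp",
        origins={"E. coli", "S. cerevisiae"},  # ori E plus a 2μm-derived origin
        markers={"E. coli": "Ampr", "S. cerevisiae": "LEU2"},
    )
    bacterial_only = Vector("toy cloning plasmid", {"E. coli"}, {"E. coli": "Ampr"})
    print(is_shuttle_vector(toy_yep))         # -> True
    print(is_shuttle_vector(bacterial_only))  # -> False
```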
The YEp vectors are based on the 2μm plasmid, a small, indepen-
dently replicating circular plasmid found in about 30 copies per cell in
most natural strains of S. cerevisiae. Many S. cerevisiae selection schemes
rely on mutant host strains that require a particular amino acid (e.g., his-
tidine, tryptophan, or leucine) or nucleotide (e.g., uracil) for growth. Such
auxotrophic strains cannot grow on minimal growth medium unless it is
supplemented with a specific nutrient. In practice, the vector is equipped
with a functional version of a gene that complements the mutated gene
in the host strain. For example, when a YEp with a wild-type LEU2 gene
is transformed into a mutant leu2 host cell and plated onto medium that
lacks leucine, only cells that carry the plasmid will grow.
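
The logic of auxotrophic selection can be written as a small rule: a cell
fails to grow only when it lacks a functional copy of a marker gene and the
medium also lacks the corresponding nutrient. The sketch below assumes a
one-gene, one-nutrient mapping; LEU2, TRP1, and URA3 are named in this
chapter, HIS3 is added here as an assumption, and the function name is
hypothetical.

```python
# Marker gene -> nutrient the cell can make when the gene is functional.
# LEU2, TRP1, and URA3 appear in the chapter; HIS3 is an added assumption.
MARKER_NUTRIENT = {
    "LEU2": "leucine",
    "TRP1": "tryptophan",
    "URA3": "uracil",
    "HIS3": "histidine",
}


def can_grow(host_defects: set[str], plasmid_markers: set[str],
             medium_lacks: set[str]) -> bool:
    """Return True if the cell can grow on the given dropout medium.

    host_defects:    marker genes that are mutated in the host, e.g. {"LEU2"}
    plasmid_markers: wild-type marker genes carried on the plasmid
    medium_lacks:    nutrients omitted from the medium
    """
    for gene in host_defects:
        nutrient = MARKER_NUTRIENT[gene]
        complemented = gene in plasmid_markers
        if nutrient in medium_lacks and not complemented:
            return False  # auxotroph with neither plasmid copy nor supplement
    return True


if __name__ == "__main__":
    # leu2 host on medium lacking leucine, with and without a LEU2 plasmid
    print(can_grow({"LEU2"}, {"LEU2"}, {"leucine"}))  # -> True
    print(can_grow({"LEU2"}, set(), {"leucine"}))     # -> False
```

Only transformed cells pass the check on medium lacking leucine, which is
the basis of the selection.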
Generally, regulatable, inducible promoters are preferred for produc-
ing large amounts of recombinant protein during large-scale growth. In
this context, the galactose-regulated promoters respond rapidly to the
addition of galactose with a 1,000-fold increase in transcription. Repres-
sible, constitutive, and hybrid promoters that combine the features of dif-
ferent promoters are also available. Maximal expression also depends on
efficient termination of transcription.
Plasmid-based yeast expression systems are often unstable under
large-scale (≥10 L) growth conditions, even in the presence of selection
pressure. To remedy this problem, a heterologous gene is integrated into
the host genome to provide a more reliable production system. The major
drawback of this strategy is the low yield of recombinant protein from a
single gene copy.
To increase the number of copies of an integrated heterologous gene
and thereby increase the overall yield of the recombinant protein, the het-
erologous gene can be integrated into nonessential portions of the S. cere-
visiae genome. In one study, 10 copies of a target gene were inserted into
the yeast genome and produced a significant amount of the recombinant
protein.
A YAC is designed to clone a large segment of DNA (100 kilobase
pairs [kb]), which is then maintained as a separate chromosome in the
host yeast cell. A YAC vector mimics a chromosome because it has a se-
quence that acts as an origin of DNA replication, a yeast centromere
sequence to ensure that after cell division each daughter cell receives a
copy of the YAC, and telomere sequences that are present at both ends
after linearization of the YAC DNA for stability (Fig. 6.19). However, to
date, YACs have not been used as expression systems for the commercial
production of heterologous proteins.
Secretion of Heterologous Proteins by S. cerevisiae

All glycosylated proteins of S. cerevisiae are secreted, and each must have
a leader sequence to pass through the secretory system. Consequently, the
coding sequences of recombinant proteins that require either O-linked or
N-linked sugars for biological activity must be equipped with a leader se-
quence. Under these conditions, correct disulfide bond formation, prote-
olytic removal of the leader sequence, and appropriate posttranslational
modifications can occur, and an active recombinant protein is secreted.
In recent years, the amount of heterologous protein that can be pro-
duced per liter of yeast culture has increased 100-fold (from about 0.02 to
2 g/L). This increase is mainly due to improvements in growing cultured
cells to high cell densities; the level of protein produced per cell has re-
mained largely unchanged.
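
The arithmetic behind this statement can be made explicit. In the sketch
below, volumetric yield is treated as the product of cell density and the
amount of protein made per unit of biomass. The 0.02 and 2 g/L figures come
from the text; the biomass concentrations and the per-cell productivity are
hypothetical values chosen only to reproduce those figures.

```python
def fold_increase(before: float, after: float) -> float:
    """Ratio of the new value to the old value."""
    return after / before


def volumetric_yield(biomass_g_per_l: float, protein_g_per_g_biomass: float) -> float:
    """Protein per liter of culture = biomass per liter x protein per unit biomass."""
    return biomass_g_per_l * protein_g_per_g_biomass


if __name__ == "__main__":
    # Figures from the text: about 0.02 g/L rising to 2 g/L.
    print(round(fold_increase(0.02, 2.0)))  # -> 100

    # Hypothetical numbers: constant productivity per unit of biomass,
    # with only the cell density changing.
    per_cell = 0.02  # g protein per g biomass (assumed)
    print(round(volumetric_yield(1.0, per_cell), 2))    # -> 0.02
    print(round(volumetric_yield(100.0, per_cell), 2))  # -> 2.0
```

With per-cell productivity held constant, a 100-fold gain in yield per liter
requires a 100-fold gain in cell density.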
One of the major reasons for producing a recombinant protein for
use in human therapeutics in yeasts rather than in bacteria is to ensure
that the protein is processed correctly following synthesis. Correct
protein folding occurs in the endoplasmic reticulum in eukaryotes and is
facilitated by a number of different proteins, including molecular
chaperones, enzymes for disulfide bond formation, signal transduction proteins
that monitor the demand and capacity of the protein-folding machinery,
and proteases that clear away improperly folded or aggregated proteins
(Fig. 6.20). The enzyme protein disulfide isomerase is instrumental
in forming the correct disulfide bonds within a protein. Poor yields of
overexpressed proteins often occur because the capacity of the cell to
properly fold and secrete proteins has been exceeded. One possible way
around this problem is the overproduction of molecular chaperones
and protein disulfide isomerases. Thus, when the yeast protein
disulfide isomerase gene was cloned between the constitutive
glyceraldehyde phosphate dehydrogenase promoter and a transcription terminator
sequence in a YIp vector, and the entire construct was integrated into a
chromosomal site, the modified strain showed a 16-fold increase in
protein disulfide isomerase production compared with that of the wild-type
strain. When protein disulfide isomerase-overproducing cells were
transformed with a YEp vector carrying the gene for human platelet-derived
growth factor B, there was a 10-fold increase in the secretion of
recombinant protein over that of transformed cells with normal levels of protein
disulfide isomerase. Higher levels of secreted products are also obtained
for some recombinant proteins in S. cerevisiae cells that overexpress the
chaperone BiP.

Figure 6.19 YAC cloning system. The YAC plasmid (pYAC) has an E. coli selectable
marker (Ampr) gene; an origin of replication that functions in E. coli (ori E); and yeast
DNA sequences, including URA3, CEN, TRP1, and ARS. CEN provides centromere
function, ARS is a yeast autonomous replicating sequence that is equivalent
to a yeast origin of replication, URA3 is a functional gene of the uracil biosynthesis
pathway, and TRP1 is a functional gene of the tryptophan biosynthesis pathway. The
T regions are yeast chromosome telomeric sequences. The SmaI site is the cloning
insertion site. pYAC is first treated with SmaI, BamHI, and alkaline phosphatase and
then ligated with size-fractionated (100-kb) input DNA. The final construct carries
cloned DNA and can be stably maintained in double-mutant ura3 and trp1 cells.
doi:10.1128/9781555818890.ch6.f6.19

Figure 6.20 Summary of protein folding in the endoplasmic reticulum (ER) of yeast
cells. During synthesis on ribosomes associated with the ER, nascent proteins are
bound by the chaperones BiP and calnexin, which aid in the correct folding of the
protein. Protein disulfide isomerases (PDI) catalyze the formation of disulfide bonds
between cysteine amino acids that are nearby in the folded protein. Quality control
systems ensure that only correctly folded proteins are released from the ER. Proteins
released from the ER are transported to the Golgi apparatus for further processing.
Prolonged binding of BiP to misfolded proteins leads to activation of the S. cerevisiae
transcription factor Hac1, which controls the expression of several proteins that
mediate the unfolded-protein response (UPR). Adapted from Gasser et al., Microb. Cell
Fact. 7:11–29, 2008. doi:10.1128/9781555818890.ch6.f6.20

Overexpression of the molecular chaperone BiP or protein disulfide
isomerase increases the secretion of some heterologous proteins; however,
overexpression of a single chaperone does not always have the desired
outcome. This is because proper protein folding requires the coordinated
efforts of many interacting factors (Fig. 6.20). Even when levels of one
chaperone are adequate, the levels of cochaperones or cofactors may be
limiting.
Other Yeast Expression Systems

One of the major drawbacks of using S. cerevisiae is the tendency for the
yeast to hyperglycosylate heterologous proteins by adding 50 to 150 man-
nose residues in N-linked oligosaccharide side chains that can alter pro-
tein function. Also, proteins that are designed for secretion frequently are
retained in the periplasmic space, increasing the time and cost of purifica-
tion. Moreover, S. cerevisiae produces ethanol at high cell densities, which
is toxic to the cells and, as a consequence, lowers the quantity of secreted
protein. For these reasons, researchers have examined other yeast species
that could act as effective host cells for recombinant protein production.
Pichia pastoris is a yeast that is able to utilize methanol as a source of
energy and carbon. It is an attractive host for recombinant protein pro-
duction because glycosylation occurs to a lesser extent than in S. cerevisiae
and the linkages between sugar residues are of the α-1,2 type, which are
not allergenic to humans. With these characteristics as a starting point, a
P. pastoris strain was extensively engineered so that it glycosylates pro-
teins in a manner identical to that of human cells. Both human and yeast
cells add the same small (10-residue) branched oligosaccharide to nascent
proteins in the endoplasmic reticulum (Fig. 6.21). However, this is the
last common precursor between the two cell types, because once the pro-
tein is transported to the Golgi apparatus, further processing is differ-
ent. To create a “humanized” strain, the enzyme responsible for addition
of the α-1,6-mannose was eliminated from P. pastoris to prevent hyper-
mannosylation. Next, the gene encoding a mannose-trimming enzyme (a
mannosidase) from the filamentous fungus Trichoderma reesei was in-
serted into the yeast genome and was found to trim the oligosaccharide
to a human-like precursor. Genes encoding enzymes for the sequential
addition of sugar residues that terminate the oligosaccharide chains in
galactose were also added. It should be noted that the coding sequences
for all engineered genes contained a secretion signal for localization of the
encoded protein to the Golgi apparatus. Finally, several genes for proteins
that catalyze the synthesis, transport to the Golgi apparatus, and addi-
tion of sialic acid to the terminal galactose on the protein precursor were
inserted into the P. pastoris genome. Several properly sialylated recom-
binant proteins that can be used as human therapeutic agents have been
produced by the humanized P. pastoris.
During growth on methanol, enzymes required for catabolism of this
substrate are expressed at very high levels; alcohol oxidase, the first
enzyme in the methanol utilization pathway, encoded by the gene AOX1,
can represent as much as 30% of the cellular protein. Transcription of
AOX1 is tightly regulated; in the absence of methanol, the AOX1 gene is
completely turned off, but it responds rapidly to the addition of methanol
to the medium. Therefore, the AOX1 promoter is an excellent candidate
for producing large amounts of recombinant protein under controlled
conditions. Moreover, the expression of the cloned target gene can be
timed to maximize recombinant protein production during large-scale
fermentations. In contrast to S. cerevisiae, P. pastoris does not synthesize
ethanol, which can limit cell yields; therefore, very high cell densities of P.
pastoris are attained, with the concomitant secretion of large quantities of
recombinant protein. P. pastoris normally secretes very few proteins, thus
simplifying the purification of secreted recombinant proteins.

Figure 6.21 Differential processing of glycoproteins in P. pastoris, humans, and
humanized P. pastoris. Initial additions of sugar residues to glycoproteins in the
endoplasmic reticulum are similar in humans and P. pastoris cells (A). However, further
N-glycosylation in the Golgi apparatus differs significantly between the two cell types.
N-glycans are hypermannosylated in P. pastoris (B), while in humans, mannose
residues are trimmed and specific sugars are added, leading to termination of the
oligosaccharide in sialic acid (C). P. pastoris cells have been engineered to produce enzymes
that process glycoproteins in a manner similar to that of human cells. In humanized
P. pastoris, a recombinant glycoprotein produced in the endoplasmic reticulum (D) is
transported to the Golgi apparatus, where it is further processed to yield a properly
sialylated glycoprotein (E). Blue squares, N-acetylglucosamine; red circles, mannose;
green squares, galactose; orange squares, sialic acid. Adapted from Hamilton and
Gerngross, Curr. Opin. Biotechnol. 18:387–392, 2007.
doi:10.1128/9781555818890.ch6.f6.21

To avoid the problems of plasmid instability during long-term growth,
most P. pastoris vectors are designed to be integrated into the host genome,
usually within the AOX1 gene, the HIS4 gene for histidine biosynthesis,
or ribosomal DNA (rDNA). The P. pastoris expression system has been
used to produce more than 100 different biologically active proteins from
bacteria, fungi, invertebrates, plants, and mammals, including humans.
Authentic heterologous proteins for industrial and pharmaceutical uses
have also been generated in other yeasts, including the methanol-utilizing
(methylotrophic) yeast Hansenula polymorpha and the thermotolerant
dimorphic yeasts Arxula adeninivorans and Yarrowia lipolytica. It is often
necessary to try several host types in order to find the one that produces
the highest levels of a biologically active recombinant protein. Differences
in the processing and productivity of a particular protein can occur among
different yeast strains. The construction of a wide-range yeast vector for
expression in several fungal species has facilitated this trial-and-error
process (Fig. 6.22). The basic vector contains features for propagation
and selection in E. coli and a multiple-cloning site for insertion of
interchangeable modules that are chosen for a particular yeast host, including
a sequence for vector integration into the fungal genome, a suitable origin
of replication, a promoter to drive expression of the heterologous gene,
and selectable markers. By selecting from a range of available modules,
customized vectors can be rapidly and easily constructed for expression
of the same gene in several different yeast cells to determine which host is
optimal for heterologous-protein production.

Figure 6.22 A wide-range yeast vector system for expression of heterologous genes in
several different yeast hosts. The basic vector contains a multiple-cloning site (MCS)
for insertion of selected modules containing appropriate sequences for chromosomal
integration (rDNA module), replication (ARS module), selection (selection module),
and expression (expression module) of a target gene in a variety of yeast host cells.
Sequences for maintenance (ori E) and selection (Ampr) of the vector in E. coli are also
included. doi:10.1128/9781555818890.ch6.f6.22

Baculovirus–Insect Cell Expression Systems

Baculoviruses are a large, diverse group of viruses that specifically in-
fect arthropods, including many insect species, and are not infectious to
other animals. During the infection cycle, two forms of baculovirus are
produced: budded and occluded (Fig. 6.23). The infection is initiated by
the occluded form of the virus. In this form, the viral nucleocapsids (viri-
ons) are clustered in a matrix that is made up of the protein polyhedrin.
The occluded virions packaged in this protein matrix are referred to as a
polyhedron and are protected from inactivation by environmental agents.
Once the virus is taken up into the midgut of the insect, usually through
ingestion of contaminated plant material, the polyhedrin matrix dissolves
due to the alkaline gut environment, and the virions enter midgut cells to
begin the infection cycle in the nucleus. Within the insect midgut, the
infection can spread from cell to cell as viral particles (single nucleocapsids)
bud off from an infected cell and infect other midgut cells. The budding
form is not embedded in a polyhedrin matrix and is not infectious to
other individual insect hosts, although it can infect cultured insect cells.
Plaques produced in insect cell cultures by the budding form of
baculovirus have a morphology different from that of the occluded form. During
the late stages of the infection cycle in the insect host, about 36 to 48 h
after infection, the polyhedrin protein is produced in massive quantities
and production continues for 4 to 5 days, until the infected cells rupture
and the host organism dies. Occluded virions are released and can infect
new hosts.
Transcription from the promoter for the polyhedrin (polyh) gene can
account for as much as 25% of the mRNA produced in cells infected with
the virus. However, the polyhedrin protein is not required for virus
production, so replacement of the polyhedrin gene with a coding sequence for
a heterologous protein, followed by infection of cultured insect cells,
results in the production of large amounts of the heterologous protein.
Furthermore, because of the similarity of posttranslational modification
systems between insects and mammals, it was thought that the recombinant
protein would mimic closely the authentic form of the original protein.
Baculoviruses have been highly successful as delivery systems for introducing
target genes for production of heterologous proteins in insect cells. More
than a thousand different proteins have been produced using this system,
including enzymes, transport proteins, receptors, and secreted proteins.

Figure 6.23 Budded (A) and occluded (B) forms of AcMNPV. During budding, a
nucleocapsid becomes enveloped by the membrane of an infected cell. A
polyhedron consists of clusters of nucleocapsids (occluded virions) embedded
in various orientations in a polyhedrin matrix.
doi:10.1128/9781555818890.ch6.f6.23

The specific baculovirus that has been used extensively as an
expression vector is Autographa californica multiple nucleopolyhedrovirus
(AcMNPV). A. californica (the alfalfa looper) and over 30 other insect
species are infected by AcMNPV. This virus also grows well on many
insect cell lines. The most commonly used cell line for genetically engineered
AcMNPV is derived from the fall armyworm, Spodoptera frugiperda.
In these cells, the polyhedrin promoter is exceptionally active, and
during infections with wild-type baculovirus, high levels of polyhedrin are
synthesized.
Baculovirus Expression Vectors

The first step in the production of a recombinant AcMNPV to deliver
the gene of interest into the insect host cell is to create a transfer vector.
The transfer vector is an E. coli-based plasmid that carries a segment
of DNA from AcMNPV (Fig. 6.24A) consisting of the polyhedrin pro-
moter region (without the polyhedrin gene) and an adjacent portion of
upstream AcMNPV DNA, a multiple-cloning site, the polyhedrin termi-
nation and polyadenylation signal regions, and an adjacent portion of
downstream AcMNPV DNA. The upstream and downstream AcMNPV
DNA segments included on the transfer vector provide regions for ho-
mologous recombination with AcMNPV. A target gene is inserted into the
multiple-cloning site between the polyhedrin promoter and termination
sequences, and the transfer vector is propagated in E. coli.

Figure 6.24 Organization of the expression unit of a baculovirus (AcMNPV) transfer
vector. (A) The target gene of interest is inserted into the multiple-cloning site (MCS),
which lies between the polyhedrin gene promoter and polyhedrin gene transcription
termination sequences (TT). The baculovirus DNA upstream from the polyhedrin
promoter and downstream from the polyhedrin TT provides sequences for integration of
the expression unit by homologous recombination into the baculovirus genome. (B)
Production of recombinant baculovirus. Single Bsu36I sites are engineered into gene
603 and a gene (1629) that is essential for AcMNPV replication. These genes flank
the polyhedrin gene in the AcMNPV genome. After a baculovirus with two engineered
Bsu36I sites is treated with Bsu36I, the segment between the Bsu36I sites is deleted.
Insect cells are cotransfected with a Bsu36I-treated baculovirus DNA and a transfer
vector with a gene of interest under the control of the promoter (p) and terminator (t)
elements of the polyhedrin gene and the complete sequences of both genes 603 and
1629. A double-crossover event (dashed lines) generates a recombinant baculovirus
with a functional gene 1629. With this system, almost all of the progeny baculoviruses
are recombinant. doi:10.1128/9781555818890.ch6.f6.24

Next, insect cells in culture are cotransfected with AcMNPV DNA and the
transfer vector carrying the cloned gene. Within some of the doubly
transfected cells, a double-crossover recombination event occurs at
homologous polyhedrin gene sequences on the transfer vector and in the
AcMNPV genome, and the cloned gene with polyhedrin promoter and
termination regions becomes integrated into the AcMNPV DNA, with the
concomitant loss of the polyhedrin gene.
Recombinant viruses, which lack the polyhedrin gene, are recognized as
occlusion-negative plaques. Unfortunately, the identification of these
plaques is subjective, and purification of recombinant baculovirus is
tedious due to the
low frequency of recombination (∼0.1%) between the AcMNPV DNA
and the transfer plasmid. However, linearization of the AcMNPV ge-
nome before transfection into insect cells can dramatically increase the
frequency of recombinant plaques. The AcMNPV genome was engineered
with two Bsu36I sites that were placed on either side of the polyhedrin
gene (Fig. 6.24B). One is in gene 603, and the other is in gene 1629,
which is essential for viral replication. When DNA from this modified
baculovirus is treated with Bsu36I and transfected into insect cells, no vi-
ral replication occurs, because a segment of gene 1629 is missing. As part
of this system, a transfer vector is constructed with the gene of interest
between intact versions of gene 603 and gene 1629. This transfer vector
is introduced into insect cells that were previously transfected with lin-
earized, replication-defective AcMNPV DNA that is missing the segment
between the two Bsu36I sites. A double-crossover event both reestablishes
a functional version of gene 1629 and incorporates the cloned gene into
the AcMNPV genome (Fig. 6.24B). With this system, over 90% of the
baculovirus plaques are recombinant.
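The practical effect of these two recombination frequencies on plaque
screening can be illustrated with a short calculation. The sketch below is
not part of the published procedure. It assumes that each plaque is
independently recombinant with a fixed probability, and the function name
and the 95% confidence target are illustrative choices.

```python
import math


def plaques_to_screen(recombinant_fraction, confidence=0.95):
    """Smallest number of plaques to pick so that at least one is
    recombinant with the requested probability.

    Assumes every plaque is independently recombinant with probability
    `recombinant_fraction` (a simplifying assumption for illustration).
    """
    if not 0.0 < recombinant_fraction <= 1.0:
        raise ValueError("recombinant_fraction must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    if recombinant_fraction == 1.0:
        return 1
    # P(no recombinant among n plaques) = (1 - f)**n must be <= 1 - confidence
    n = math.log(1.0 - confidence) / math.log(1.0 - recombinant_fraction)
    return math.ceil(n)


# Frequencies quoted in the text: about 0.1% with intact AcMNPV DNA and
# over 90% with Bsu36I-linearized DNA (0.9 is used here as a lower bound).
for label, fraction in [("intact AcMNPV DNA", 0.001),
                        ("Bsu36I-linearized DNA", 0.9)]:
    print(f"{label}: screen about {plaques_to_screen(fraction)} plaques")
```

Under these assumptions, roughly 3,000 plaques would have to be examined at
a recombination frequency of 0.1%, whereas two plaques are enough when 90%
of them are recombinant.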
Integration of Target Genes into Baculovirus by Site-Specific Recombination
To eliminate the need to use plaque assays to identify and purify recom-
binant viruses, several methods have been developed that introduce the
target gene into the baculovirus genome at a specific nucleotide sequence
by recombination. Transfection of insect cells is required only for the pro-
duction of the heterologous protein. AcMNPV DNA can be maintained in
E. coli as a plasmid known as a bacmid, which is a baculovirus–plasmid
hybrid molecule. In addition to AcMNPV genes, the bacmid contains an E.
coli origin of replication, a kanamycin resistance gene, and an integration
site (attachment site) that is inserted into the lacZ′ gene without impairing
its function (Fig. 6.25A). Another component of this system is the trans-
fer vector that carries the gene of interest cloned between the polyhedrin
promoter and a terminator sequence. In the transfer vector, the target gene
expression unit (expression cassette) and a gentamicin resistance gene are
flanked by DNA attachment sequences that can bind to the attachment
site in the bacmid (Fig. 6.25B). An ampicillin resistance (Ampr) gene lies
outside the expression cassette for selection of the transfer vector.
Figure 6.25 Construction of a recombinant bacmid. (A) An E. coli plasmid is
incorporated into the AcMNPV genome by a double-crossover event (dashed lines)
between DNA segments (5′ and 3′) that flank the polyhedrin gene to create a
shuttle vector (bacmid) that replicates in both E. coli and insect cells. The
gene for resistance to kanamycin (Kanr), an attachment site (att) that is
inserted in frame in the lacZ′ sequence, and an E. coli origin of replication
(ori E) are introduced as part of the plasmid DNA. (B) The transposition
proteins encoded by genes of the helper plasmid facilitate the integration
(transposition) of the DNA segment of the transfer vector that is bounded by
two attachment sequences (attR and attL). The gene for resistance to
gentamicin (Genr) and a gene of interest (GOI) that is under the control of
the promoter (p) and transcription terminator (t) elements of the polyhedrin
gene are inserted into the attachment site (att) of the bacmid. The helper
plasmid and transfer vector carry the genes for resistance to tetracycline
(Tetr) and ampicillin (Ampr), respectively. (C) The recombinant bacmid has a
disrupted lacZ′ gene (*). The right-angled arrow denotes the site of
initiation of transcription of the cloned gene after transfection of the
recombinant bacmid into an insect cell. Cells that are transfected with a
recombinant bacmid are not able to produce functional β-galactosidase.
doi:10.1128/9781555818890.ch6.f6.25
Bacterial cells carrying a bacmid are cotransformed with the transfer
vector and a helper plasmid that encodes the specific proteins
(transposition proteins) that mediate recombination between the attachment
sites on the transfer vector and on the bacmid and that carries a
tetracycline resistance gene. After recombination, the DNA segment that is
bounded by the two attachment sites on the transfer vector (the expression
cassette carrying the target gene) is transposed into the attachment site on
the bacmid, destroying the reading frame of the lacZ′ gene (Fig. 6.25C).
Consequently, bacteria with recombinant bacmids produce white colonies in
the presence of IPTG and the chromogenic substrate
5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-Gal). Moreover, white
colonies that are resistant to kanamycin and gentamicin and sensitive to
both ampicillin and tetracycline carry only a recombinant bacmid and no
transfer or helper plasmids. After all of these manipulations, the integrity
of the cloned gene can be confirmed by PCR. Finally, recombinant bacmid DNA
can be transfected into insect cells, where the cloned gene is transcribed
and the heterologous protein is produced.
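The colony-screening logic described above can be written out as a small
decision function. This is a sketch for illustration only: the function
name, the argument names, and the wording of the returned labels are
assumptions, and it treats each marker as a simple yes-or-no phenotype.

```python
def classify_colony(color, kan, gen, amp, tet):
    """Interpret the phenotype of an E. coli colony from the bacmid procedure.

    color: "white" or "blue" on plates containing IPTG and X-Gal.
    kan, gen, amp, tet: True if the colony is resistant to that antibiotic.
    """
    if not kan:
        # The kanamycin resistance gene is carried on the bacmid itself.
        return "no bacmid"
    if color == "blue":
        # An intact lacZ' reading frame means no cassette was transposed.
        return "nonrecombinant bacmid"
    if not gen:
        # White colonies should also carry the transposed Genr gene.
        return "inconsistent phenotype, retest"
    # Ampr marks the transfer vector and Tetr marks the helper plasmid.
    carried = [name for name, present in
               (("transfer vector", amp), ("helper plasmid", tet)) if present]
    if carried:
        return "recombinant bacmid plus " + " and ".join(carried)
    return "recombinant bacmid only"


print(classify_colony("white", kan=True, gen=True, amp=False, tet=False))
```

Only colonies that return "recombinant bacmid only" correspond to the
phenotype that the text identifies as suitable for PCR confirmation and
transfection into insect cells.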
Mammalian Glycosylation and Processing of Precursor Proteins in Insect Cells
Although insect cells can process proteins in a manner similar to that of
higher eukaryotes, some mammalian proteins produced in S. frugiperda
cell lines are not authentically glycosylated. For example, insect cells do
not normally add galactose and terminal sialic acid residues to N-linked
glycoproteins. Where these residues are normally added to mannose res-
idues during the processing of some proteins in mammalian cells, insect
cells will trim the oligosaccharide to produce paucimannose (Fig. 6.26).
Consequently, the baculovirus system cannot be used for the production
of several important mammalian glycoproteins. To ensure the produc-
tion of humanized glycoproteins with accurate glycosylation patterns,
an insect cell line was constructed to express five different mammalian
glycosyltransferases.
Figure 6.26 N glycosylation of proteins in the Golgi apparatus of insect, human, and
humanized insect cells. While the sugar residues added to N-glycoproteins in the en-
doplasmic reticulum are similar in insect and human cells, further processing in the
Golgi apparatus yields a trimmed oligosaccharide (paucimannose) in insect cells and
an oligosaccharide that terminates in sialic acid in human cells. To produce recombi-
nant proteins for use as human therapeutic agents, humanized insect cells have been
engineered to express several enzymes that process human glycoproteins accurately.
Blue squares, N-acetylglucosamine; red circles, mannose; green squares, galactose; or-
ange squares, sialic acid. doi:10.1128/9781555818890.ch6.f6.26
Mammalian Cell Expression Systems

Currently, about half of the commercially available therapeutic proteins
are produced in mammalian cells. However, these cells are slow growing,
have more fastidious growth requirements than bacteria or yeast cells,
and can become contaminated with animal viruses. Chinese hamster
ovary (CHO) cells and mouse myeloma cells are most commonly used for
long-term (stable) gene expression and when high yields of heterologous
proteins are required. About 140 recombinant proteins are currently ap-
proved for human therapeutic use, most produced in CHO cells that have
been adapted for growth in high-density suspension cultures, and many
more are in clinical trials. Although mammalian cells have been used for
some time to produce therapeutic proteins, current efforts are aimed at
improving productivity through the development of high-production cell
lines, increasing the stability of production over time, and increasing ex-
pression by manipulating the chromosomal environment in which the re-
combinant genes are integrated.
Vector Design
Many cloning vectors for the expression of heterologous genes in
mammalian cells are based on simian virus 40 (SV40) DNA (Table 6.6), which
can replicate in several mammalian species. However, the use of SV40
vectors is restricted
to small inserts because only a limited amount of DNA can be packaged
into the viral capsid. The genome of this virus is a double-stranded DNA
molecule of 5.2 kb that carries genes expressed early in the infection
cycle that function in the replication of viral DNA (early genes) and
genes expressed later in the infection cycle that function in the produc-
tion of viral capsid proteins (late genes). Other vectors are derived from
adenovirus, which can accommodate relatively large inserts; bovine pa-
pillomavirus, which can be maintained as a multicopy plasmid in some
mammalian cells; and adeno-associated virus, which can integrate into
specific sites in the host chromosome.
Table 6.6 Genomes of some animal viruses that are used as cloning vectors in
mammalian cells in culture
Virus                     Genome                Genome size (kb)
SV40                      Double-stranded DNA   5.2
Adenovirus                Double-stranded DNA   26–45
Bovine papillomavirus     Double-stranded DNA   7.3–8.0
Adeno-associated virus    Single-stranded DNA   4.8
Epstein–Barr virus        Double-stranded DNA   170
Figure 6.27 Generalized mammalian expression vector. The multiple-cloning site
(MCS) and selectable marker gene (SMG) are under the control of eukaryotic
promoter (p), polyadenylation (pa), and termination-of-transcription (TT)
sequences. An intron (I) enhances the production of heterologous protein.
Propagation of the vector in E. coli and mammalian cells depends on the
origins of replication ori E and ori euk, respectively. The Ampr gene is used
for selecting transformed E. coli. Elements shown: p, I, MCS, pa, TT; p, SMG,
pa, TT; ori euk; ori E; Ampr gene. doi:10.1128/9781555818890.ch6.f6.27
All mammalian expression vectors tend to have similar features and
are not very different in design from other eukaryotic expression vectors.
A typical mammalian expression vector (Fig. 6.27) contains a eukaryotic
origin of replication, usually from an animal virus. The promoter
sequences that drive expression of the cloned gene(s) and the selectable
marker gene(s), and the transcription termination sequences
(polyadenylation signals), must be eukaryotic and are frequently taken from
either human viruses or mammalian genes. Strong constitutive promoters and
efficient polyadenylation signals are preferred. Inducible promoters are
often used when continuous synthesis of the heterologous protein may
be toxic to the host cell. Expression of a target gene is often increased
by placing the sequence for an intron between the promoter and the
multiple-cloning site within the transcribed region. Sequences required
for selection and propagation of a mammalian expression vector in E. coli
are derived from a standard E. coli cloning vector.
Figure 6.28 Translation control elements. A target gene can be fitted with
various sequences that enhance translation and facilitate both secretion and
purification, such as a Kozak sequence (K), signal sequence (S), protein
affinity tag (T), proteolytic cleavage site (P), and stop codon (SC). The 5′
and 3′ UTRs increase the efficiency of translation and contribute to mRNA
stability. Order of elements (5′ to 3′): 5′ UTR, K, S, T, P, target gene, SC,
3′ UTR. doi:10.1128/9781555818890.ch6.f6.28
For the best results, a gene of interest must be equipped with
translation control sequences (Fig. 6.28). Initiation of translation in
higher eukaryotic organisms depends on a specific sequence of nucleotides
surrounding the start (AUG) codon in the mRNA called the Kozak sequence,
i.e., GCCGCC(A or G)CCAUGG in vertebrates. The corresponding
DNA sequence for the Kozak sequence, which is often followed by a
signal sequence to facilitate secretion, a protein sequence (tag) to enhance
the purification of the heterologous protein, and a proteolytic cleavage
sequence that enables the tag to be removed from the recombinant
protein, is placed at the 5′ end of the gene of interest. A stop codon is
added to ensure that translation ceases at the correct location. Finally,
the sequence content of the 5′ and 3′ untranslated regions (UTRs) is
important for efficient translation and mRNA stability. Either synthetic
5′ and 3′ UTRs or those from the human β-globin gene are used in most
mammalian expression vectors. The codon content of the target gene
may also require modification to suit the codon usage of the host cell.
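A construct can be checked for the vertebrate Kozak consensus quoted above
with a simple pattern search. The following is a minimal sketch, assuming
that an exact match to GCCGCC(A or G)CCAUGG is required. The function name
and the example sequence are hypothetical and do not come from a real gene.

```python
import re

# Vertebrate Kozak consensus from the text, written as mRNA.
KOZAK = re.compile(r"GCCGCC[AG]CCAUGG")


def find_kozak_start(sequence):
    """Return the 0-based index of the A of the AUG start codon that lies
    within a full match to the consensus, or None if there is no match.

    Accepts DNA or RNA; T is converted to U before searching.
    """
    rna = sequence.upper().replace("T", "U")
    match = KOZAK.search(rna)
    if match is None:
        return None
    # Nine consensus nucleotides (GCCGCC, A or G, CC) precede the AUG.
    return match.start() + 9


# Hypothetical 5' end of a cloned cDNA, written as DNA.
example = "GGGCCGCCACCATGGCT"
print(find_kozak_start(example))
```

Natural start sites often match the consensus only in part, so an
exact-match search such as this one is stricter than what is needed for
translation initiation in cells. It is best read as a check that a designed
construct carries the intended sequence.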
The majority of mammalian cell expression vectors carry a single
gene of interest. However, the active form of some important proteins
consists of two different protein chains. The in vivo assembly of dimeric
and tetrameric proteins is quite efficient. Consequently, various strategies
have been devised for the production of two different recombinant pro-
teins within the same cell.
Single vectors that carry two cloned genes provide the most efficient
means of producing heterodimeric or tetrameric protein. The two genes
may be placed under the control of independent promoters and polyade-
nylation signals (double-cassette vectors) (Fig. 6.29A), or, to ensure that
equal amounts of the proteins are synthesized, vectors (bicistronic vectors)
can be constructed with the two cloned genes separated from each other
by a DNA sequence that contains an internal ribosomal entry site (IRES)
(Fig. 6.29B). IRESs are found in mammalian virus genomes, and after
transcription, they allow simultaneous translation of different proteins
from a polycistronic mRNA molecule. Transcription of a “gene α–IRES–
gene β” construct is controlled by one promoter and polyadenylation sig-
nal. Under these conditions, a single “two-gene” (bicistronic) transcript
is synthesized, and translation proceeds from the 5′ end of the mRNA to
produce chain α and internally from the IRES element to produce chain
β (Fig. 6.29B).
Selectable Markers for Mammalian Expression Vectors

For the most part, the systems that are used to select transfected mam-
malian cells are the same as those for other eukaryotic host cells. In addi-
tion, a number of bacterial marker genes have been adapted for eukaryotic
cells. For example, the bacterial neo gene, which encodes neomycin phos-
photransferase, is often used to select transfected mammalian cells. How-
ever, in eukaryotic cells, G-418 (Geneticin), which is phosphorylated by
neomycin phosphotransferase, replaces neomycin as the selective agent be-
cause neomycin is not an effective inhibitor of eukaryotic protein synthesis.
Some selection schemes are designed not only to identify transfected
cells but also to increase heterologous-protein production by amplifying
the copy number of the expression vector. For example, dihydrofolate
reductase catalyzes the reduction of dihydrofolate to tetrahydrofolate,
which is required for the production of purines. Sensitivity to methotrex-
ate, a competitive inhibitor of dihydrofolate reductase, can be overcome
if the cell produces excess dihydrofolate reductase. As the methotrexate
concentration is increased over time, the dihydrofolate reductase gene
in cultured cells is amplified. In fact, methotrexate-resistant cells can
have hundreds of dihydrofolate reductase genes. The standard dihydro-
folate reductase–methotrexate protocol entails transfecting dihydrofolate
reductase-deficient cells with a vector carrying a dihydrofolate reductase
gene as the selectable marker gene and treating the cells with methotrex-
ate. After the initial selection of transfected cells, the concentration of
methotrexate is gradually increased, and eventually cells with very high
copy numbers of the expression vector are selected.
Figure 6.29 (A) A two-gene expression vector. The cloned genes (gene α
and gene β) encode subunits of a protein dimer (αβ). The cloned genes are
inserted into a vector and are under the control of different eukaryotic
promoter (p), polyadenylation (pa), and termination-of-transcription (TT)
sequences. Each subunit is translated from a separate mRNA, and a functional
protein dimer is assembled. The vector has origins of replication for E. coli
(ori E) and mammalian cells (ori euk), a marker gene (Ampr) for selecting
transformed E. coli, and a selectable marker gene (SMG) that is under the
control of eukaryotic promoter (p), polyadenylation (pa), and TT sequences.
(B) A bicistronic expression vector. Each cloned gene is inserted into a
vector on either side of an IRES sequence; together they form a
transcription unit under the control of a single eukaryotic promoter (p),
polyadenylation (pa), and TT sequences. Translation of the mRNA occurs from
the 5′ end and internally (right-angled arrows). Both subunits are
synthesized and assembled into a functional protein dimer. The vector carries
origins of replication for E. coli and mammalian cells, a selectable marker
for selecting transformed E. coli, and a selectable marker gene that is under
the control of eukaryotic promoter, polyadenylation, and TT sequences.
doi:10.1128/9781555818890.ch6.f6.29
Figure 6.30 Strategy to increase yields of recombinant mammalian cells.
Cell death (apoptosis), stimulated by the transcription factor p53, can lead
to decreased yields of recombinant mammalian cells grown under stressful
conditions in large bioreactors. To prevent cell death, the gene encoding
MDM2 (the mouse double minute 2 protein) is introduced into mammalian
cells. The MDM2 protein binds to p53 and prevents it from inducing
expression of proteins required for apoptosis. Engineered cells not only
showed delayed cell death but also achieved higher cell densities in
bioreactors. doi:10.1128/9781555818890.ch6.f6.30
Engineering Mammalian Cell Hosts for Enhanced Productivity

In large-scale bioreactors, depleted nutrients and accumulation of toxic cell
waste can limit the viability and density of cells as they respond to stress by
inducing cell death, also known as apoptosis. One method to improve cell
growth and viability under culture conditions in bioreactors is to prevent
the tumor suppressor protein p53, which is a transcription factor, from
activating the cell death response pathway. The mouse double minute 2
protein (MDM2) binds to protein p53 and prevents it from acting as a
transcription factor (Fig. 6.30). MDM2 also marks p53 for degradation.
When mammalian cells were transfected with plasmids containing a regu-
latable MDM2 gene and cultured under conditions that mimicked the late
stages of cell culture and in nutrient-limited medium, cultures expressing
MDM2 had higher cell densities and delayed cell death compared to those
of nontransfected cells, especially in nutrient-deprived medium.
Figure 6.31 When oxygen is present, pyruvate, which is formed from glucose
during glycolysis, is converted by the enzyme pyruvate carboxylase to an
intermediate compound in the TCA cycle. This metabolic pathway is important
for the generation of cellular energy and for the synthesis of biomolecules
required for cell proliferation. However, under low-oxygen conditions, such
as those found in large bioreactors, pyruvate carboxylase has a low level of
activity. Under these conditions, lactate dehydrogenase converts pyruvate
into lactate, which yields a lower level of energy. Cultured cells secrete
lactate, thereby acidifying the medium. doi:10.1128/9781555818890.ch6.f6.31
Many cultured mammalian cells are unable to achieve high cell densities
in cultures because toxic metabolic products accumulate in the culture
medium and inhibit cell growth. Many cells secrete the acidic waste
product lactate as they struggle to obtain energy from glucose. Under
these conditions, pyruvate is converted to lactate by lactate dehydrogenase
rather than being converted by pyruvate carboxylase to an intermediate of
the TCA cycle, where it is further oxidized (Fig. 6.31). To counteract the
acidification of the medium from lactate secretion, the human pyruvate
carboxylase gene was cloned into an expression vector under the control
of the cytomegalovirus promoter and the SV40 polyadenylation signals
and transfected into CHO cells. When the pyruvate carboxylase gene was
stably integrated into the CHO genome and expressed, the enzyme was
detected in the mitochondria, where pyruvate is further oxidized. After
7 days in culture, the rate of lactate production decreased by up to 40% in
the engineered cells.
Many eukaryotic DNA viruses from which the vectors used in mam-
malian cells are derived maintain their genomes as multicopy episomal
DNA (plasmids) in the host cell nucleus. These viruses produce proteins,
such as the large-T antigen in SV40 and the nuclear antigen 1 protein in
Epstein–Barr virus, that help to maintain the plasmids in the host nucleus
and to ensure that each host cell produced after cell division receives a
copy of the plasmid. To increase the copy number of the target gene by in-
creasing the plasmid copy number, HEK 293 cells have been engineered to
express the SV40 large-T antigen or Epstein–Barr virus nuclear antigen 1.
Figure 6.32 Strategy to increase yields of secreted recombinant proteins from
mammalian cells by simultaneously upregulating the expression of several
proteins in the secretion apparatus. The expression of chaperones and other
proteins of the secretion apparatus is controlled by the transcription factor
Xbp-1. In nonstressed cells, the intron is not cleaved from the xbp-1
transcript, and therefore, functional Xbp-1 transcription factor is not
produced. In stressed cells with accumulated misfolded proteins, an
endoribonuclease cleaves the transcript to remove the intron and yield
mature xbp-1 mRNA that is translated into transcription factor Xbp-1.
Recombinant CHO cells transfected with a gene including only the xbp-1 exons
overproduced a functional Xbp-1 transcription factor that directed the
production of high levels of proteins required for protein secretion.
doi:10.1128/9781555818890.ch6.f6.32
Many proteins of therapeutic value are secreted. However, the high
levels of these proteins that are desirable from a commercial standpoint
can overwhelm the capacity of the cell secretory system. Thus, protein
processing is a major limiting step in the achievement of high
recombinant protein yields. Researchers have therefore engineered cell lines
with enhanced production of components of the secretion apparatus. In this
regard, an effective strategy may be to simultaneously overexpress several,
if not all, of the proteins that make up the secretory mechanism. This can
be achieved through the enhanced production of the transcription factor
X box protein 1 (Xbp-1), a key regulator of the secretory pathway.
Normally, full-length, unspliced xbp-1 mRNA is found in nonstressed cells
and is not translated into a stable, functional protein (Fig. 6.32). However,
when unfolded or misfolded proteins accumulate in the endoplasmic
reticulum, a ribonuclease is activated that specifically cleaves xbp-1 mRNA
to remove the intron (Fig. 6.32). This results in the production of a
functional transcription factor that activates the expression of a number of
proteins of the secretion apparatus. An xbp-1 gene without its intron
encodes an active form of xbp-1 mRNA (Fig. 6.32), so the Xbp-1 protein is
overproduced. The overexpression of Xbp-1 facilitates the secretion of human
erythropoietin, human γ-interferon, and human monoclonal antibodies,
especially in cell lines engineered to express the target proteins
transiently.
Directed Mutagenesis
It is possible with recombinant DNA technology to isolate the gene (or
cDNA) for any protein that exists in nature, to express it in a specific
host organism, and to produce a purified product. Unfortunately, the
properties of some of these “naturally occurring” proteins are
not well suited for a particular end use. On the other hand, it is some-
times possible, using traditional mutagenesis (often ionizing radiation or
DNA-altering chemicals) and selection schemes, to create a mutant form
of a gene that encodes a protein with the desired properties. However, in
practice, the mutagenesis–selection strategy only very rarely results in any
significant beneficial changes to the targeted protein, because most amino
acid changes decrease the activity of a target protein.
By using a variety of different directed mutagenesis techniques that
change the amino acids encoded by a cloned gene, proteins with prop-
erties that are better suited than naturally occurring counterparts can be
created. For example, using directed mutagenesis techniques, it is possible
to change the specificity, stability, or regulation of target proteins.
Determining which amino acids of a protein should be changed to
attain a specific property is much easier if the three-dimensional structure
of the protein, or a similar protein, has been characterized by X-ray crys-
tallographic analysis. But for many proteins, such detailed information is
often lacking, so directed mutagenesis becomes a trial-and-error strategy
in which changes are made to those nucleotides that are most likely to
yield a particular change in a protein property. Moreover, it is not always
possible to know in advance which individual amino acid(s) contributes
to a particular physical, biological, or chemical property. Regardless of
what types of alterations are made to a target gene, the protein encoded
by each mutated gene has to be tested to ascertain whether the mutagene-
sis process has indeed generated the desired activity change.
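The relationship between a defined nucleotide change and the resulting
amino acid change can be sketched in a few lines of Python. This is an
illustration only: the 15-nucleotide coding sequence is hypothetical, the
helper names `translate` and `replace_codon` are assumptions, and the codon
table is the standard genetic code.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code: codons enumerated in TCAG order; '*' marks a stop.
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a coding DNA sequence whose length is a multiple of three."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def replace_codon(dna, codon_number, new_codon):
    """Return dna with the codon at 1-based position codon_number replaced."""
    if len(new_codon) != 3:
        raise ValueError("new_codon must be three nucleotides")
    start = 3 * (codon_number - 1)
    if start < 0 or start + 3 > len(dna):
        raise ValueError("codon_number is outside the sequence")
    return dna[:start] + new_codon.upper() + dna[start + 3:]


# Hypothetical coding sequence (not a real gene).
wild_type = "ATGGCTAAAGGTTGA"
mutant = replace_codon(wild_type, 3, "GAA")  # lysine codon to glutamate codon
print(translate(wild_type), "->", translate(mutant))
```

A sketch of this kind shows only which amino acid is encoded after the
change. As the text notes, the protein made from each mutated gene still
has to be tested to find out whether its properties changed as intended.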
Oligonucleotide-Directed Mutagenesis with M13 DNA

Oligonucleotide-directed mutagenesis (site-specific mutagenesis) is
a straightforward method for producing defined point mutations in a
cloned gene (Fig. 6.33). For this procedure, the investigator must know
the precise nucleotide sequence in the region of DNA that is to be
changed and the amino acid changes that are being introduced. In the
original version of this method, the cloned gene was inserted into the
