Lab 2 Degenerate PCR - Background

Cloning and Sequencing Explorer Series
CHAPTER 2: GAPDH PCR
BACKGROUND
CHAPTER 2
Background
PCR
Polymerase chain reaction (PCR) is a technique for rapidly generating multiple copies of a segment
of DNA utilizing repeated cycles of DNA synthesis. PCR has revolutionized molecular biology
and forensics, allowing amplification of small quantities of DNA into amounts that can be used
for experimentation or for forensic testing. Kary Mullis, who later won a Nobel Prize for his work,
developed PCR in 1983. The subsequent discovery of a DNA polymerase that is stable at high
temperatures and the introduction of thermal cyclers, instruments that automate the PCR process,
brought the procedure into widespread use in the late 1980s.
From trace amounts of the DNA used as starting material (template), PCR produces exponentially
larger amounts of a specific piece of DNA. The template can be any form of DNA, and only a single
molecule of DNA is needed to generate millions of copies. PCR makes use of two normal cellular
activities: 1) binding of complementary strands of DNA, and 2) replication of DNA molecules by DNA
polymerases.
DNA Structure
DNA strands are polymers of nucleotides, molecules comprising a sugar, a phosphate group, and one
of four bases: adenine, thymine, guanine, or cytosine (A, T, G, or C). The sugars and phosphates form
the backbone of the DNA polymers. Each sugar has five carbons, making it a pentose. Each sugar
is actually a deoxyribose because it has a hydrogen instead of a hydroxyl group at carbon number 2
(in RNA, the sugar is ribose, as it has the hydroxyl group). Each carbon in the sugar is numbered (see
figure), and the numbering is the source of the 3' and 5' nomenclature used for DNA. For example,
the 5'-phosphate is the phosphate through which a nucleotide is joined to the 3' hydroxyl of the
preceding nucleotide in the DNA molecule, and it is called 5' because the phosphate group is
attached to carbon number 5 of the sugar.
Top, chemical structure of deoxyribonucleic acid (DNA). Bottom, chemical structure of deoxyribose
sugar. Carbon numbering is labeled by the orange circles.
Each base forms hydrogen bonds with its complementary base, A with T (two hydrogen bonds)
and G with C (three hydrogen bonds). These pairings of A-T and G-C are called base pairs.
Double-stranded DNA consists of two complementary strands of DNA held together by hydrogen
bonding between the base pairs. The two strands are antiparallel, meaning that the strands are
oriented in opposite directions. One strand, the sense or coding strand, has bases running 5' to 3',
and the second strand, the antisense strand, has the complementary bases running 3' to 5'. When
DNA is transcribed, the antisense strand serves as the template for synthesis of messenger RNA
(mRNA). The mRNA will have the same sequence as the sense or coding strand of DNA (with uracil
instead of thymine).
Base pairs of DNA. Cytosine base pairs with guanine by three hydrogen bonds. Likewise thymine base pairs with adenine by two hydrogen bonds.
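The pairing rules above can be sketched in a few lines of Python. This is only an illustration: the function names and the example sequence are hypothetical and are not part of the original protocol.

```python
# Watson-Crick pairing rules: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hydrogen bonds per base pair: two for A-T, three for G-C.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def antisense_strand(sense_5_to_3: str) -> str:
    """Return the antisense strand written 5' to 3' (the reverse complement)."""
    return "".join(COMPLEMENT[base] for base in reversed(sense_5_to_3))


def hydrogen_bonds(sense_5_to_3: str) -> int:
    """Count the hydrogen bonds holding the duplex together."""
    return sum(H_BONDS[base] for base in sense_5_to_3)


def transcribe(sense_5_to_3: str) -> str:
    """mRNA has the sense-strand sequence with uracil (U) in place of thymine."""
    return sense_5_to_3.replace("T", "U")


sense = "GATTACA"  # hypothetical example sequence, written 5' to 3'
print(antisense_strand(sense))  # TGTAATC
print(hydrogen_bonds(sense))    # 16
print(transcribe(sense))        # GAUUACA
```

Because the strands are antiparallel, the antisense strand is obtained by reversing the sense strand as well as complementing it.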
DNA Replication
DNA replication is an essential part of life. As cells divide, DNA must be duplicated, and the new
DNA molecules must be exact copies of the original DNA. DNA polymerases are enzymes that
synthesize the new DNA strands, and they are found in all cells. DNA polymerases link together
free nucleotides in the order determined by the template DNA that the polymerase follows. The new
strand will be complementary to the template strand. In other words, each base of the new strand is
the complement of the base in the template strand. For each A in the template, the new strand will
have a T. For each G in the template, the new strand will have a C, etc. Since a DNA polymerase
can use only single-stranded DNA as a template (and since it can synthesize DNA in only one
direction, 5' to 3'), double-stranded DNA must be uncoiled and the strands separated before the
DNA can be replicated.
DNA polymerase also needs a signal to determine where to start synthesis. This primer is a short
strand of nucleotides that binds to the template DNA at the starting point and becomes the 5' end
of the new DNA strand. In DNA replication in cells, the primers are small RNA molecules, but for
PCR in the lab, the primers are DNA molecules.
In its essentials, DNA replication sounds simple: unwind the double-stranded template, bind a
primer to each strand to give the DNA polymerase a starting point, and the enzyme will produce
replicated DNA strands. In reality, the process is much more complicated. There are as many as 40
proteins involved in DNA replication in eukaryotes. Without detailing all of the proteins involved, the
basic steps of DNA replication are:
DNA replication fork.
1. Template DNA strands begin to separate at the origin of replication. An enzyme called DNA
helicase breaks the hydrogen bonds between the base pairs to separate the strands. The point
where the two strands separate is called the replication fork.
2. As the strands unwind and separate, the DNA ahead of the replication fork starts to form
supercoils. An enzyme named topoisomerase moves ahead of the replication fork, nicking single
strands of the double-stranded DNA and relaxing the supercoiled structure.
3. To keep the two strands from reannealing (binding to each other again), single-stranded DNA-
binding proteins bind to each of the separated strands.
4. Since DNA polymerase can add nucleotides only to the 3' end of an existing nucleotide, an
enzyme named RNA primase binds to each of the template DNA strands and assembles a
short primer of RNA. (The RNA primer will later be removed and replaced by DNA in the new
strands.)
5. DNA polymerase begins to synthesize DNA by adding new nucleotides to the RNA primers.
6. Since DNA polymerase can synthesize DNA in only one direction, from 5' to 3', synthesis
actually proceeds differently on the two template strands. On the 3' to 5' template strand, called
the leading strand, DNA synthesis proceeds continuously, moving toward the replication fork.
On the second strand, called the lagging strand, synthesis moves away from the replication fork
and is discontinuous. DNA on the lagging strand is synthesized in short pieces (100 to 2,000
bases) called Okazaki fragments. After the Okazaki fragments are synthesized, they are joined
together by DNA ligase.
7. Although DNA polymerase is a high-fidelity enzyme, meaning that it makes few mistakes in
replicating the bases, it does make some mistakes. In eukaryotic replication, the error rate is one
mistake in every 10,000 to 100,000 base pairs. Many DNA polymerases also have proofreading
activity, which means that they can find mistakes and correct them as the enzyme moves along
the template.
8. DNA replication in eukaryotes does not begin at a single origin of replication, but at numerous
locations along a DNA molecule. Origins of replication are found about every 100 kilobases
in eukaryotic cells. Mammalian cells are estimated to have ~30,000 origins of replication. In
addition, two replication forks form at each origin of replication and head in opposite
directions, and, although the description above refers to replication at only one fork, replication
occurs simultaneously (and in the opposite direction) at the other fork.
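The figures in steps 7 and 8 can be cross-checked with simple arithmetic. The sketch below assumes a genome of about 3.3 x 10^9 base pairs (the human genome size quoted later in this chapter) and uses the round numbers given above; it is an order-of-magnitude check, not a measurement.

```python
GENOME_BP = 3.3e9            # approximate human genome size (assumed, from this chapter)
ORIGIN_SPACING_BP = 100_000  # one origin about every 100 kilobases

origins = GENOME_BP / ORIGIN_SPACING_BP
print(f"Estimated origins of replication: {origins:,.0f}")  # 33,000

# Expected polymerase errors per genome copy at the error rates quoted above
# (one mistake per 10,000 to 100,000 bases).
for bases_per_error in (10_000, 100_000):
    errors = GENOME_BP / bases_per_error
    print(f"1 error per {bases_per_error:,} bases -> about {errors:,.0f} errors")
```

The estimate of about 33,000 origins agrees with the ~30,000 quoted for mammalian cells.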
DNA replication uses multiple origins of replication.
Replication moves along the template DNA molecule until the replication fork meets a fork coming
from the opposite direction.
Each round of DNA replication produces two double-stranded DNA molecules, each identical to the
original molecule. DNA replication is called semiconservative because each double-stranded product
consists of one original strand and one newly synthesized strand.
PCR Step by Step
The strength of PCR lies in its ability to make many copies of (amplify) a single region (target) of a
longer DNA molecule. For example, a researcher wanting to study a single human gene needs to
amplify only that portion from the enormous human genome of approximately 3.3 x 10^9 base pairs!
The first step is to identify and sequence areas upstream and downstream from the DNA of interest.
Once this is done, short strands of DNA that are complementary to the upstream and downstream
DNA are synthesized. As in cellular DNA replication, these oligonucleotide primers are used as the
starting point for copying the DNA of interest, but the primers used in PCR are DNA oligonucleotides,
not RNA.
Taq DNA polymerase. Originally, the DNA synthesis step of PCR was performed at 37°C using DNA
polymerase from the bacterium E. coli, but the enzyme was inactivated during the high-temperature
denaturation step in each cycle. So, the enzyme had to be added anew during each cycle. The 1988
introduction of a thermally stable DNA polymerase into PCR brought the technique into the
mainstream. Taq DNA polymerase was isolated from Thermus aquaticus, a thermophilic bacterium that
lives in hot springs in Yellowstone National Park. Since the hot springs frequently approach
boiling temperatures, T. aquaticus and other bacterial species that live in these waters must have
enzymes that are functional at high temperatures, so the DNA polymerase from T. aquaticus is not
inactivated by the denaturation step in PCR.
Since the discovery of Taq, several other heat-stable DNA polymerases have been isolated.
Taq has a drawback for DNA synthesis in PCR, which is that it lacks a proofreading
mechanism to catch and correct errors in the new DNA strand. Therefore, Taq is said to have
low replication fidelity. In 1991, scientists discovered and characterized Pfu DNA polymerase
from Pyrococcus furiosus, a hyperthermophilic archaeon. Pfu DNA polymerase has the
proofreading capacity that Taq lacks, so Pfu generates fewer errors in the new DNA strands.
Subsequently, a number of companies have developed modified versions of DNA
polymerases. For example, Bio-Rad’s iProof™ polymerase, which is a DNA polymerase with
proofreading capability similar to Pfu, is fused to a protein that binds double-stranded DNA.
PCR involves a repetitive series of cycles, each of which consists of template denaturation,
primer annealing (binding to the template DNA strand), and extension of the annealed primer by a
heat-stable DNA polymerase.
All of the components needed for PCR are mixed in a microcentrifuge tube. They are:
• Template DNA
• Taq DNA polymerase (or another thermally stable DNA polymerase)
• Primers — synthesized to complement a specific region on the template DNA. The two primers
in a pair are designed to anneal to opposite ends of the region of interest. The primers are added
in excess (that is, there are many more primer molecules than template molecules in the reaction
tube)
• Nucleotides — the four individual bases in the form of deoxynucleoside triphosphates (dNTPs),
which allows them to be added to a DNA polymer. The dNTP mixture includes the same amounts
of dATP, dTTP, dGTP, and dCTP
• Reaction buffer — prepared with the correct ionic strength of monovalent and divalent cations
needed for the reaction and buffered to maintain the pH needed for enzyme activity
The microcentrifuge tubes are specialized tubes used only for PCR. PCR tubes are plastic with very
thin walls, allowing rapid transfer of heat through the plastic, and the tubes usually hold only 0.2 or
0.5 ml. The PCR reaction tubes are placed in a thermal cycler, an instrument developed in 1987
that automates the heating and cooling cycles needed during PCR. Thermal cyclers contain a metal
block with holes for the PCR tubes. The metal block can be heated or cooled very rapidly. Thermal
cyclers are programmable, so they can store the PCR reaction parameters (temperatures, time at
each temperature, and number of cycles). This means that the user can just load the samples and
push a button to run the reactions. Contrast this to early researchers who had to sit by a series of
water baths with a timer, manually switching the tubes from one temperature to another for hours!
The first step of the PCR reaction is denaturation. Since DNA polymerase can use only
single-stranded DNA as a template, the two strands of the template DNA must first be uncoiled and
separated. In cells, enzymes such as helicase and topoisomerase do this work,
but in PCR, heat is used to separate the strands. When double-stranded DNA is heated to 95°C,
the strands separate, or denature. Since complete denaturation of the template DNA is essential
for successful PCR, the first step is frequently an extended denaturation period of 2–5 minutes.
The initial denaturation is longer than subsequent denaturation steps because the template DNA
molecules are longer than the PCR product molecules that must be denatured in subsequent
cycles. Denaturation steps in subsequent PCR cycles are normally 30–60 seconds.
The thermal cycler then rapidly cools the reactions to 40–60°C to allow the primers to anneal to
the separated template strands. The temperature at which the primers anneal to the template
DNA depends on several factors, including primer length, the G–C content of the primer, and
the specificity of the primer for the template DNA. If the primer sequences match the template
sequences exactly, the primers will anneal to the template DNA at a higher temperature. As
the annealing temperature is lowered, primers will bind to the template DNA at sites where the
two strands are not exactly complementary. In many cases, these mismatches will cause the
strands to dissociate as the temperature rises after the annealing step, but they can also result
in amplification of DNA other than the target.
In the annealing step, the two original strands may reanneal to each other, but the primers are in
such excess that they outcompete the original DNA strands for the binding sites.
Mismatched base pairs affect DNA annealing. The figure compares an exact match of primer and template with a mismatch of primer and template.
The final step is extension, in which the reaction is heated to 72°C, the optimal temperature for Taq
DNA polymerase to extend the primers and make complete copies of each template DNA strand.
Example of thermal cycling profile. In this profile an initial denaturation step of 95°C for 5 min is followed by 40 cycles of 1-min denaturation,
1-min annealing, and 2-min extension. A final 6-min extension time is added to ensure completion of DNA synthesis. The final hold ensures
samples are kept stable until retrieved.
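The example profile can be written down as data, which is roughly what a programmable thermal cycler stores. In the sketch below the hold times come from the profile described above; the class name, the field names, and the 52°C annealing temperature are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    temp_c: float   # block temperature in degrees Celsius
    minutes: float  # hold time at that temperature


# Hold times follow the example profile; the annealing temperature is an
# assumed value within the 40-60 degree range given in the text.
INITIAL = Step("initial denaturation", 95, 5)
CYCLE = [
    Step("denaturation", 95, 1),
    Step("annealing", 52, 1),
    Step("extension", 72, 2),
]
FINAL = Step("final extension", 72, 6)
N_CYCLES = 40


def total_hold_minutes() -> float:
    """Sum the programmed hold times, ignoring heating and cooling ramps."""
    per_cycle = sum(step.minutes for step in CYCLE)
    return INITIAL.minutes + N_CYCLES * per_cycle + FINAL.minutes


print(total_hold_minutes())  # 171
```

The programmed holds alone come to almost three hours, before any time spent ramping between temperatures.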
At the end of the first PCR cycle (one round of denaturation, annealing, and extension defines
one cycle), there are two new strands for each original double-stranded template, which means there
is twice as much template DNA for the second cycle of PCR. As the cycle is repeated, the number of
strands doubles with each cycle. After 35 cycles, there will be over 30 billion times more copies of
the target sequence than at the beginning. The number of cycles needed for amplification depends
on the amount of template DNA and the efficiency of the reaction, but reactions are frequently run for
30–40 cycles.
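The doubling arithmetic can be checked directly. The sketch below is a simplified model: the efficiency parameter is an assumption added here to show how a less-than-perfect reaction lowers the yield, and real reactions also plateau as reagents run out.

```python
def fold_amplification(cycles: int, efficiency: float = 1.0) -> float:
    """Fold increase in target copies after a number of cycles.

    efficiency = 1.0 means every strand is copied in every cycle (perfect
    doubling); lower values model a less efficient reaction.
    """
    return (1 + efficiency) ** cycles


print(f"{fold_amplification(35):,.0f}")       # 34,359,738,368
print(f"{fold_amplification(35, 0.9):,.0f}")  # lower yield at 90% efficiency
```

With perfect doubling, 35 cycles give 2^35, or about 34 billion, copies per starting molecule.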
PCR generates DNA of a precise length and sequence. During the first cycle, primers anneal to the
original template DNA strands at opposite ends and on opposite strands. After the first cycle, two
new strands are generated that are shorter than the original template strands but still longer than
the target DNA, because the original template sequence continues past the location where the
other primer binds. Single strands of the precise target length first appear in the second cycle,
and it isn't until the third PCR cycle that double-stranded fragments of the precise target length
are generated.
First three cycles of PCR. PCR takes three cycles before a product of the correct length is generated.
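The bookkeeping behind the first few cycles can be simulated by counting strand types. This is an idealized sketch that assumes one starting duplex and that every strand is copied in every cycle; the function and variable names are hypothetical.

```python
def simulate(cycles: int):
    """Track strand classes through PCR cycles, starting from one duplex.

    original: the two strands of the starting template
    long:     copies of an original strand (only one primer-defined end)
    target:   strands bounded by both primers (precise target length)
    """
    original, long_, target = 2, 0, 0
    rows = []
    for cycle in range(1, cycles + 1):
        # Every strand serves as a template once per cycle (ideal efficiency).
        new_long = original          # copying an original gives a long strand
        new_target = long_ + target  # copying a long or target strand gives a target strand
        target_duplexes = target     # each target template pairs with its new target copy
        long_ += new_long
        target += new_target
        rows.append((cycle, long_, target, target_duplexes))
    return rows


# Columns: cycle, long strands, target-length strands, target-length duplexes
for row in simulate(5):
    print(*row)
# 1 2 0 0
# 2 4 2 0
# 3 6 8 2
# 4 8 22 8
# 5 10 52 22
```

Long strands grow by only two per cycle, while target-length strands grow nearly exponentially, so the precise-length product quickly dominates the reaction.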
Primer Design
Probably the most important variable in PCR is the design of the primers, which will determine
whether or not the correct piece of DNA is amplified. Primers are short, single-stranded
oligonucleotides, synthesized in the laboratory and designed to bind to the DNA template strands at
the ends of the sequence of interest. (Actually, very few laboratories prepare their own primers, as
there are many companies that make primers to order — quickly, cheaply, and accurately.) Primer
design is usually the responsibility of the researcher, although there are computer programs and
websites that assist in this process. Normally, two different primers are needed, one for each of the
complementary strands of template DNA.
Factors to be considered in designing primers include:
• Length — primers of 18–30 nucleotides are likely to be specific for their target sequence. In other
words, primers of that length are less likely to bind to sites on the template DNA other than the
sites for which they were designed
• Melting temperature of primers (Tm) — the Tm is the temperature at which half the primers
dissociate from the target DNA. It is important that the two primers used in each PCR reaction
have similar melting temperatures (within 5°C). If the Tm values are very different, the primers will
not bind equally during the annealing stage. Tm is a function of length and GC content, because
more energy is required to dissociate the three hydrogen bonds between G and C compared to
the energy to dissociate the two hydrogen bonds between A and T. The Tm for primers around
18–24 bases in length can be estimated from their nucleotide content using the formula:
Tm = 2°C (A+T) + 4°C (G+C)
Another formula for Tm determination is:
Tm = 81.5 + 16.6 (log10[I]) + 0.41 (%G+C) – (600/n)
where I is the molar concentration of monovalent cations and n is the number of bases in the
primer. This formula gives accurate Tm in °C for primers from 20 to 100 bases long.
There are many web-based tools that will calculate the Tm for primers*.
• Annealing temperature — the temperature for the annealing step of PCR should be about 5°C
below the Tm of the primers. Primers should generally have an annealing temperature of ~50–
60°C
• GC content — the primer should be composed of 40–60% Gs and Cs. Primers with higher GC
content require more energy to dissociate from the template and thus have a higher Tm. (If the Tm is too high, the annealing
temperature can exceed the optimal temperature for Taq polymerase extension of the DNA
strand.) In addition, long stretches of any single base may lead to gaps, hairpin structures, or
mismatches, and should be avoided. An ideal primer would have a random mix of bases with
~50% GC content
* Free calculators include OligoCalc (hosted at Northwestern University, University of Pittsburgh, JustBio.com and others)
and PrimerFox (primerfox.com). There are also calculators available on many biotech company websites and some fee-
based sites.
• Intra- or inter-primer complementarity — primers should not have any regions of complementarity
longer than three bases. Otherwise, they can form hairpins by internal annealing or generate
double-stranded structures that will interfere with PCR. Also, it is very important that there not be
complementarity at the 3' ends of the two primers. If primers hybridize at their 3' ends, the hybrid
molecule can act as a template for DNA polymerase, resulting in an unwanted PCR product
called a primer-dimer. Primer-dimers are more likely to be produced when the primers do not
bind efficiently to the template DNA
• GC-clamp — the sequence of the primers at the 3' end is important to ensure correct and strong
binding of the primer to the template. If the primers contain GC clamps, which are 1–3 G or C
bases at the 3' end of the primer, they will form a more stable complex with the template DNA
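Several of the design factors listed above can be checked mechanically. The sketch below implements the two Tm formulas as written in this section along with the length, GC content, annealing temperature, and GC-clamp rules. The function names, the default monovalent cation concentration of 0.05 M, and the example primer are assumptions for illustration; complementarity checks are not included.

```python
import math


def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    return 100 * sum(primer.count(b) for b in "GC") / len(primer)


def tm_basic(primer: str) -> float:
    """Tm = 2(A+T) + 4(G+C), the estimate for primers of about 18-24 bases."""
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def tm_salt_adjusted(primer: str, monovalent_molar: float) -> float:
    """Tm = 81.5 + 16.6 log10[I] + 0.41 (%G+C) - 600/n."""
    n = len(primer)
    return (81.5 + 16.6 * math.log10(monovalent_molar)
            + 0.41 * gc_percent(primer) - 600 / n)


def has_gc_clamp(primer: str) -> bool:
    """True if the base at the 3' end is G or C."""
    return primer[-1] in "GC"


def check_primer(primer: str, monovalent_molar: float = 0.05) -> dict:
    """Collect the design checks into one report (0.05 M salt is assumed)."""
    tm = tm_basic(primer)
    return {
        "length_ok": 18 <= len(primer) <= 30,
        "gc_percent": round(gc_percent(primer), 1),
        "gc_ok": 40 <= gc_percent(primer) <= 60,
        "tm_basic": tm,
        "tm_salt_adjusted": round(tm_salt_adjusted(primer, monovalent_molar), 1),
        "annealing_temp": tm - 5,  # about 5 degrees C below the Tm
        "gc_clamp": has_gc_clamp(primer),
    }


print(check_primer("ATGCATGCATGCATGCATGC"))  # hypothetical 20-base primer
```

The two formulas rest on different assumptions and can give noticeably different values for the same primer, which is one reason primers still need to be tested in an actual reaction.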
Testing Primers in PCR

Although one should spend time and energy designing PCR primers, there are no guarantees that
a well-designed primer will work. The only way to find out if a primer will actually amplify template
DNA is to test it in a PCR reaction. Frequently primers that appear less efficient on paper work
better than seemingly perfect primers. Researchers will frequently design multiple primers and
spend a lot of time optimizing the PCR to find the best primer pair.
Designing Degenerate Primers from Consensus DNA Sequences

Normally PCR primers are designed based on the known sequence of the target DNA, and
therefore consist of a single, unique sequence. When the sequence of the template DNA is not
known, there are several alternative approaches for primer design. One approach is to take
advantage of genetic homology among closely related organisms. For example, the target DNA
may not have been sequenced in the species of interest, but the gene may have been sequenced
in several other species. Genes that code for the same protein in different organisms are likely to
have sequences that are conserved, and therefore very similar or even identical, in the different species.
These conserved sequences usually code for parts of the protein that are essential for function;
in other words, mutations in these areas are likely to be detrimental to the organism, so evolution
discourages any changes.
If genomic DNA (gDNA) or messenger RNA (mRNA) sequences from similar species are aligned,
a consensus sequence can be derived. The consensus sequence may be exactly the same in all
species, or it may have one or more bases that vary among the species. For example, a consensus
sequence could be represented by A-C-T-G-G-N-T-T-A-C-C-G, where A, C, G, and T represent
the bases that are the same in all of the species compared, and N represents a base that varies in
different species. In other words, the base at the N position might be G, C, A or T.
Since the goal of PCR is to amplify the DNA region of interest, primers are designed to bracket that
region. Once the two primers have been designed based on the consensus sequences derived
from other organisms, it is possible that they will have enough complementarity with the target
DNA to bind during the annealing step. However, to increase the probability that the primers will
bind to the target DNA, one or more bases within the primers are substituted with the other three
bases, introducing degeneracy, or wobble, to the primer sequences. In a simplified example, if the
consensus sequence is NATC, the set of degenerate primers would be AATC, TATC, GATC, and
CATC.
However, in many cases, not all of the bases are used to substitute for the variable base. To
increase the probability that the primer will anneal to the target DNA, the variable base is substituted
with a similar base. For example, if the variable base is T, it might be replaced only with C (the
other pyrimidine). A code from the International Union of Biochemistry (IUB) is used to tell the
company synthesizing the primers which bases to substitute at each variable position. The code is
shown in the table below:
IUB codes.
IUB Code Bases Derivation of IUB Code
N A/G/C/T Any
K G/T Keto
S G/C Strong
Y T/C Pyrimidine
M A/C Amino
W A/T Weak
R G/A Purine
B G/T/C —
D G/A/T —
H A/C/T —
V G/C/A —
The following table shows how alignment of GAPC genes from different plant species can be
used to derive a consensus sequence that can then be used for primer design. The plant species
are listed on the left and the GAPC genes are aligned on the right. The vertical highlighting in the
sequences shows bases conserved across all the species. Deriving the consensus sequence for
the gene begins with the conserved bases. For example, all of the sequences begin with GA, so the
consensus sequence will also begin with GA. Twelve out of the 23 bases do not vary between plant
species and are highlighted.
Although the other bases are not conserved, the differences between the species are not random.
For example, the bases in positions 4 and 5 are always either A or T. (A and T are considered weak
bases, as the base pairs they form are not as strong as those formed by G and C). The base that
is more commonly found in that position will be the one used in the consensus sequence; that is,
position 5 will be A, as A is found in 15 of the 19 sequences.
After most of the consensus sequence has been determined, degeneracy can be introduced at
one or more positions, normally at the positions that show the most variability among species, but
this can also depend on experimental optimization. Degeneracy cannot be introduced at each base
where there is variation; this would introduce too much nonspecific binding and also decrease the
concentration of a matching sequence such that amplification would be too inefficient.
Degeneracy is achieved by having multiple bases introduced at specific base positions during the
manufacture of the oligonucleotides (oligos). Oligos are short individual DNA or RNA sequences that
are usually synthetically manufactured and have a wide range of applications in molecular biology. A
primer is a type of oligo that is used as a starting point for DNA synthesis. A primer typically refers to
a single oligo sequence; however, degenerate primers are composed of multiple oligo sequences.
In the case of the initial forward PCR primer, however, position 3 is an A in two genes, G in eight
genes, C in four genes and T in five genes (see table: Design of initial forward primer). Since A is
the least frequent, only G, C, or T were chosen to be represented in the degenerate primer. During
manufacture of the primer, G, C, and T will be randomly incorporated into each individual oligo at
position 3, resulting in a pool of oligos with three different sequences, each differing at position 3.
Design of initial forward primer.

Plant Gene GenBank Accession Number Sequence
Arabidopsis GAPC1 AT3G04120 GACTACGTTGTTGAGTCTACTGG
Arabidopsis GAPC2 AT1G13440 GACTTTGTTGTTGAGTCTACTGG
Arabidopsis GAPCP1 AT1G79530 GATTATGTTGTTGAGTCTTCCGG
Arabidopsis GAPCP2 AT1G16300 GAGTATGTTGTTGAGTCTTCAGG
Pepper GAPCP CAN272042 GATTATGTTGTTGAATCTTCTGG
Liverwort GAPC AJ246023 GAGTACGTCGTCGAGTCTACCGG
Corn GAPC1 ZMGPC1 GAGTACGTCGTGGAGTCCACCGG
Corn GAPC2 U45855 GAGTATGTCGTGGAGTCCACCGG
Corn GAPC3 U45856 GAATATGTTGTTGAGTCTACTGG
Corn GAPC4 X73152 GAATATGTTGTTGAGTCTACTGG
Pea GAPC1 L07500 GATATCATTGTTGAGTCTACTGG
Wheat GAPC EF592180 GAGTACGTTGTTGAGTCCACCGG
Rye grass GAPC3 EF463063 GACTACGTTGTTGAGTCCACTGG
Tobacco GAPC AJ133422 GATTACATTGTGGAGTCGACTGG
Tobacco GAPDH* DQ682459 GATTTCGTTGTGGAATCCACTGG
Carrot GAPDH* AY491512 GAGTACATTGTGGAGTCCACTGG
Blue gem GAPDH* X78307 GAGTACGTCGTTGAGTCGACTGG
Tomato GAPDH* AB110609 GACTTCGTTGTTGAATCAACCGG
Snapdragon GAPDH* X59517 GAGTATATTGTGGAGTCCACTGG
Initial forward primer GABTATGTTGTTGARTCTTCWGG
Positions of base 123456789
* Specific GAPDH gene not listed in GenBank
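The per-position counts discussed in the text can be reproduced from the alignment in the table. The sketch below copies the 19 sequences in table order and tallies the bases in each column; the function and variable names are hypothetical.

```python
from collections import Counter

# The 19 aligned sequences from the table above, in the same order.
ALIGNED = [
    "GACTACGTTGTTGAGTCTACTGG", "GACTTTGTTGTTGAGTCTACTGG",
    "GATTATGTTGTTGAGTCTTCCGG", "GAGTATGTTGTTGAGTCTTCAGG",
    "GATTATGTTGTTGAATCTTCTGG", "GAGTACGTCGTCGAGTCTACCGG",
    "GAGTACGTCGTGGAGTCCACCGG", "GAGTATGTCGTGGAGTCCACCGG",
    "GAATATGTTGTTGAGTCTACTGG", "GAATATGTTGTTGAGTCTACTGG",
    "GATATCATTGTTGAGTCTACTGG", "GAGTACGTTGTTGAGTCCACCGG",
    "GACTACGTTGTTGAGTCCACTGG", "GATTACATTGTGGAGTCGACTGG",
    "GATTTCGTTGTGGAATCCACTGG", "GAGTACATTGTGGAGTCCACTGG",
    "GAGTACGTCGTTGAGTCGACTGG", "GACTTCGTTGTTGAATCAACCGG",
    "GAGTATATTGTGGAGTCCACTGG",
]


def column_counts(sequences):
    """Count the bases found at each alignment position (position 1 first)."""
    return [Counter(column) for column in zip(*sequences)]


counts = column_counts(ALIGNED)

# A position is conserved when only one base ever appears in its column.
conserved = [i + 1 for i, c in enumerate(counts) if len(c) == 1]

print(len(conserved), conserved)
# 12 [1, 2, 8, 10, 11, 13, 14, 16, 17, 20, 22, 23]
print(counts[2].most_common())  # position 3: [('G', 8), ('T', 5), ('C', 4), ('A', 2)]
print(counts[4].most_common())  # position 5: [('A', 15), ('T', 4)]
```

The counts match the figures in the text: twelve conserved positions, A in 15 of 19 sequences at position 5, and A as the rarest base at position 3.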
The IUB code designates what degenerate bases are incorporated (see table: IUB codes) and the
code for incorporation of G or C or T is represented by the letter B. In the initial forward primer there
are two more degenerate bases: position 15 is R, as all the bases at that position are purines (G
or A) and position 21 is W (A or T). By adding additional degenerate bases, the number of DNA
sequences in the pool of oligos also increases, thus the initial forward primer is actually composed
of 12 different oligo sequences (3 x 2 x 2). The concentration of the one oligo that is the best match
is therefore reduced to one-twelfth of the total primer concentration, which can reduce PCR efficiency.
3 bases for position 3:
GAGTATGTTGTTGA(GA)TCTTC(AT)GG
GATTATGTTGTTGA(GA)TCTTC(AT)GG
GACTATGTTGTTGA(GA)TCTTC(AT)GG
2 bases for position 15:
GA(GTC)TATGTTGTTGAGTCTTC(AT)GG
GA(GTC)TATGTTGTTGAATCTTC(AT)GG
2 bases for position 21:
GA(GTC)TATGTTGTTGA(GA)TCTTCAGG
GA(GTC)TATGTTGTTGA(GA)TCTTCTGG
3 x 2 x 2 = 12 oligos
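The pool of oligos encoded by a degenerate primer can be listed by expanding each IUB code into its bases. The sketch below uses the codes from the IUB table; the function name and dictionary layout are hypothetical.

```python
from itertools import product

# IUB codes from the table above, mapped to the bases each one stands for.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "AGCT", "K": "GT", "S": "GC", "Y": "TC", "M": "AC", "W": "AT",
    "R": "GA", "B": "GTC", "D": "GAT", "H": "ACT", "V": "GCA",
}


def expand(degenerate: str) -> list[str]:
    """List every oligo sequence in the pool encoded by a degenerate primer."""
    choices = [IUB[code] for code in degenerate]
    return ["".join(bases) for bases in product(*choices)]


pool = expand("GABTATGTTGTTGARTCTTCWGG")
print(len(pool))       # 12
print(1 / len(pool))   # fraction of the pool made up by any one oligo
print(expand("NATC"))  # ['AATC', 'GATC', 'CATC', 'TATC']
```

Because the pool size is the product of the choices at each degenerate position, every added degenerate base multiplies the number of oligos and divides the concentration of each one.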
Degenerate primers are beneficial because they increase binding to the target region during
PCR, but they have drawbacks. First, they decrease the concentration of specific oligos available
for amplification, which reduces the efficiency of the PCR. Second, they increase the chance of
amplification of nonspecific/unwanted PCR products. This nonspecific binding can be partially
ameliorated by increasing the annealing temperature for the primers, which will discourage
nonspecific annealing. However, the annealing temperature cannot be too high because it is unlikely
that any of the oligos match the target sequence 100%. (Consider that although degenerate base
pairs have been introduced in three highly variable locations, there are still nine bases where the
consensus sequence may not match the target sequence of the specific plant gene being studied.)
In the example below, the snapdragon sequence matches a degenerate oligo at all three degenerate
positions (3, 15, and 21), but still has four mismatched bases (positions 7, 12, 18, and 19).
Initial forward primer: GAGTATGTTGTTGAGTCTTCTGG
Snapdragon GAPDH sequence: GAGTATATTGTGGAGTCCACTGG
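The comparison above can be reproduced by walking along the two sequences together. This is a minimal sketch with a hypothetical function name; it assumes the sequences are already aligned and of equal length.

```python
def mismatch_positions(primer: str, target: str) -> list[int]:
    """Return the 1-based positions where two equal-length sequences differ."""
    if len(primer) != len(target):
        raise ValueError("sequences must be the same length")
    return [i for i, (p, t) in enumerate(zip(primer, target), start=1) if p != t]


oligo = "GAGTATGTTGTTGAGTCTTCTGG"       # initial forward primer oligo shown above
snapdragon = "GAGTATATTGTGGAGTCCACTGG"  # snapdragon GAPDH sequence
print(mismatch_positions(oligo, snapdragon))  # [7, 12, 18, 19]
```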
Designing Degenerate Primers from a Protein Sequence

In some cases, a researcher may purify a protein of interest and obtain some amino acid sequence
data from the protein but not have any of the DNA sequence for the protein. When that happens,
there is another approach for designing primers. Since organisms use more than one codon of
three nucleotides to specify some amino acids (see table: Amino acids and the DNA codons for
each), primer mixtures can be synthesized that include all possible codons for each amino acid.
Although it seems as though there would be huge numbers of oligonucleotides needed, the task
can be simplified in several ways, such as choosing an area of protein sequence that is heavy in
amino acids that are encoded by only one or two codons.
Amino acids and the DNA codons for each.
Amino acid   Codons
Ala (A)      GCA, GCC, GCG, GCT
Arg (R)      CGA, CGC, CGG, CGT, AGA, AGG
Asp (D)      GAC, GAT
Asn (N)      AAC, AAT
Cys (C)      TGC, TGT
Gln (Q)      CAA, CAG
Glu (E)      GAA, GAG
Gly (G)      GGA, GGC, GGG, GGT
His (H)      CAC, CAT
Ile (I)      ATA, ATC, ATT
Leu (L)      CTA, CTC, CTG, CTT, TTA, TTG
Lys (K)      AAA, AAG
Met (M)      ATG
Phe (F)      TTC, TTT
Pro (P)      CCA, CCC, CCG, CCT
Ser (S)      TCA, TCC, TCG, TCT, AGC, AGT
Thr (T)      ACA, ACC, ACG, ACT
Trp (W)      TGG
Tyr (Y)      TAC, TAT
Val (V)      GTA, GTC, GTG, GTT
Note: When there are multiple codons for an amino acid, the codons are very similar. For all amino acids with up to four
codons, only the third base differs between codons (for example, the four codons for valine, which all begin with GT). There
are three codons that code for a stop signal: TAG, TGA, TAA.
By choosing amino acids with fewer codons, the number of degenerate primers can be minimized.
For example, to make degenerate primers to the DNA that codes for the amino acid sequence
Gly-Leu-Ser-Val, the mixture would include 576 different oligonucleotides:
Amino Acid      Number of Codons
Glycine (Gly)   4
Leucine (Leu)   6
Serine (Ser)    6
Valine (Val)    4
Number of Degenerate Primers: 4 x 6 x 6 x 4 = 576
In comparison, degenerate primers to the amino acid sequence Asp-Trp-Cys-Glu would include
only eight different oligonucleotides:
Amino Acid            Number of Codons
Aspartic acid (Asp)   2
Tryptophan (Trp)      1
Cysteine (Cys)        2
Glutamic acid (Glu)   2
Number of Degenerate Primers: 2 x 1 x 2 x 2 = 8
These sequence examples are only four amino acids in length, making the primers only twelve
nucleotides long. Degenerate primers are usually longer, meaning more oligonucleotide
combinations will be needed. To keep the number needed to a minimum, choose a target amino
acid sequence containing amino acids coded by only one or two codons and try to avoid amino
acids that have six codons.
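The two calculations above, and the advice to pick a stretch of protein with few codons per amino acid, can be sketched as follows. The codon counts come from the codon table in this section; the function names and the eight-residue example peptide are hypothetical.

```python
from math import prod

# Number of DNA codons for each amino acid, from the codon table above.
CODON_COUNT = {
    "Ala": 4, "Arg": 6, "Asp": 2, "Asn": 2, "Cys": 2, "Gln": 2, "Glu": 2,
    "Gly": 4, "His": 2, "Ile": 3, "Leu": 6, "Lys": 2, "Met": 1, "Phe": 2,
    "Pro": 4, "Ser": 6, "Thr": 4, "Trp": 1, "Tyr": 2, "Val": 4,
}


def degeneracy(peptide: str) -> int:
    """Number of distinct oligos needed to cover every codon combination."""
    return prod(CODON_COUNT[residue] for residue in peptide.split("-"))


def least_degenerate_window(peptide: str, size: int) -> tuple[str, int]:
    """Find the stretch of `size` residues that needs the fewest oligos."""
    residues = peptide.split("-")
    windows = ["-".join(residues[i:i + size])
               for i in range(len(residues) - size + 1)]
    best = min(windows, key=degeneracy)
    return best, degeneracy(best)


print(degeneracy("Gly-Leu-Ser-Val"))  # 576
print(degeneracy("Asp-Trp-Cys-Glu"))  # 8

# Hypothetical longer peptide: scan for the best four-residue target.
print(least_degenerate_window("Gly-Leu-Ser-Val-Asp-Trp-Cys-Glu", 4))
# ('Asp-Trp-Cys-Glu', 8)
```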
Nested PCR
A number of variations of PCR have been developed in the last 20 years to address specific
research questions. Some of these variations include inverse PCR, in situ PCR, long PCR,
real-time PCR, and nested PCR. When there is the potential for primers to bind to sequences of the
template DNA other than at the target area (for example, when using degenerate primers), nested
PCR can increase the yield and specificity of amplification of the target DNA. Nested PCR uses
two sequential sets of primers. The first primer set binds to sequences outside the target DNA, as
expected in standard PCR, but it may also bind to other areas of the template. The second primer
set binds to sequences in the target DNA that are within the portion amplified by the first set (that is,
the primers are nested). Thus, the second set of primers will bind and amplify target DNA within the
products of the first reaction. One advantage of nested PCR is that if the first primers bind to and
amplify an unwanted DNA sequence, it is very unlikely that the second set of primers will also bind
within the unwanted region.
A second advantage of nested PCR is that the initial PCR step enriches the pool of potential targets
for the second set of nested primers. The initial PCR using degenerate primers, which is less
efficient than a PCR using homologous primers, results in a lower concentration of PCR product
than is desirable for ligation and also amplifies undesirable nonspecific PCR products. The low
efficiency is due to the following reasons (see Designing Degenerate Primers from Consensus DNA
Sequences, above, for more explanation):
• The concentration of primers that bind efficiently in the reaction is lower than normal due to the
presence of multiple oligo sequences for each primer
• The annealing temperature used for the PCR is quite high (52°C) to discourage primers from
binding nonspecifically
• Even though degenerate primers are used, the oligo sequences are still not 100% homologous
to the target sequence, thus reducing annealing efficiency
However, by performing the initial PCR reaction, the pool of targets for a second, nested round of
PCR is greatly enriched. During the initial PCR, there are only a few target regions within the millions
of base pairs of DNA in a genomic DNA sample, while after the initial PCR has completed, the
number of target regions has increased by a millionfold or more. This enrichment greatly increases
the efficiency of the nested PCR reaction.
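To see where a millionfold enrichment can come from, the sketch below models each PCR cycle as multiplying the number of target copies by (1 + efficiency). This is an idealized model, not a description of the actual reaction: the function names are made up for this example, and the 0.7 efficiency value is a hypothetical figure chosen only to illustrate a less efficient reaction such as one using degenerate primers.

```python
def fold_enrichment(cycles, efficiency=1.0):
    """Fold increase in target copies after a number of PCR cycles.

    efficiency=1.0 is the ideal case in which every target is copied
    in every cycle, so the number of copies doubles each cycle.
    """
    return (1.0 + efficiency) ** cycles


def cycles_for_fold(target_fold, efficiency=1.0):
    """Smallest number of cycles giving at least `target_fold` enrichment."""
    cycles = 0
    while fold_enrichment(cycles, efficiency) < target_fold:
        cycles += 1
    return cycles


# Ideal doubling reaches a millionfold after 20 cycles (2**20 = 1,048,576).
print(cycles_for_fold(1_000_000))

# A hypothetical, less efficient reaction needs more cycles for the same enrichment.
print(cycles_for_fold(1_000_000, efficiency=0.7))
```

Under this model, even a reaction that falls well short of perfect doubling still enriches the target far above its starting abundance in genomic DNA, which is why the second round has many more targets to work with.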
The second round of nested PCR does not use degenerate primers, which increases its specificity
for GAPC and GAPC-2 genes over other GAPDH family members. However, to gain this specificity,
PCR efficiency is sacrificed, since the consensus sequences of the invariable nested primers are
even less complementary to the plant target sequence than the initial degenerate primers. If the
nested primers are used directly on genomic DNA, they usually amplify poorly because they do not
bind well to the template DNA. However, because of the enrichment of target DNA by the initial
round of PCR, there are many more targets for the primers to anneal, increasing the chances of
binding, and boosting the PCR efficiency. In addition, to encourage the noncomplementary primers
to anneal, the annealing temperature for the nested PCR is also reduced to 46°C. In contrast to the
initial round of PCR where nonspecific binding was discouraged through use of a higher annealing
temperature, here as much binding as possible is encouraged, since most nonspecific binding sites
were screened out during the initial PCR.
Nested PCR of Plant GAPC Genes

Nested PCR. Nested PCR involves two rounds of PCR with the product of the first round acting as template for the second round.
Nested PCR will be used to amplify a portion of GAPC, the plant gene for NAD+-dependent cytosolic
GAPDH. The section of the gene to be amplified encodes around two-thirds of the protein, including
the active site of the enzyme. The primer annealing sites within the Arabidopsis GAPC gene are
shown below.
GACTACGTTGTTGAGTCTACTGGTGTCTTCACTGACAAAGACAAGGCTGCAGCTCACTTGAAGGTTTGTCT
TATTTGAATTGGTTATTTTTGTCTTGTAATGATATAAATAGTTTATGTGCTAGAATTTGCTTAGTATCATT
CAACTAAATTTGTGACTTGTTGTATTTTCAGGGTGGTGCCAAGAAGGTTGTTATCTCTGCCCCCAGCAAAG
ACGCTCCAATGTTTGTTGTTGGTGTCAACGAGCACGAATACAAGTCCGACCTTGACATTGTCTCCAACGCT
AGCTGCACCACTAACTGCCTTGCTCCCCTTGCCAAGGTAAAATATCTGATATTCTATATGATCAAATTTGA
CTTTGTATTTCAAGTTGAAGTGACTAATTTCATTTAACGTTCTTTGATTTCATTGTGTAGGTTATCAATGA
CAGATTTGGAATTGTTGAGGGTCTTATGACTACAGTCCACTCAATCACTGGTAAATTTATCAATCAGTTAG
AAGTTTATTACAAACTTGCTTGCCTATAGGTGGAAAATTTGTGATTTAATGGGGTTTGCTTTATGATTTCA
GCTACTCAGAAGACTGTTGATGGGCCTTCAATGAAGGACTGGAGAGGTGGAAGAGCTGCTTCATTCAACAT
TATTCCCAGCAGCACTGGAGCTGCCAAGGCTGTCGGAAAGGTGCTTCCAGCTCTTAACGGAAAGTTGACTG
GAATGTCTTTCCGTGTCCCAACCGTTGATGTCTCAGTTGTTGACCTTACTGTCAGACTCGAGAAAGCTGCT
ACCTACGATGAAATCAAAAAGGCTATCAAGTAAGCTTTTGAGCAATGACAGATTAAGTTTACTTATATTCC
AGTAGTGATCAAATTACTCACCAAGTGTTTTTACCACCAATACATAGGGAGGAATCCGAAGGCAAACTCAA
GGGAATCCTTGGATACACCGAGGATGATGTTGTCTCAACTGACTTCGTTGGCGACAACAGGTCGAGCATTT
TTGACGCCAAGGCTGGAATTGCATTGAGCGACAAGTTTGTGAAATTGGTGTCATGGTACGACAACGAATGG
GAPDH PCR primers. Positions of first-round initial GAPDH PCR primers (blue, bold) and second-round nested PCR primers (yellow,
underlined italics) on Arabidopsis GAPC gDNA are indicated. Note: Reverse primers are complementary to the antisense DNA strand, the one
that does not encode the gene.
While the GAPDH protein sequence is highly homologous among family members and species,
the gene structure (the sequence that does not actually code for amino acids in the final protein),
including the number, locations, sequence, and length of introns, is more variable. This can be
observed in the differences in gene structure within the Arabidopsis GAPC gene family, where
GAPC is missing two introns present in the other family members, resulting in a shorter PCR
product. This variability in gene structure results in PCR products of different lengths that can be
identified by agarose gel electrophoresis. Studies of other plant species during the development of
this lab series identified numerous other instances of absent introns.
[Figure: gene structure diagrams of GAPC, GAPC-2, GAPCP-1, and GAPCP-2, with the nested GAPDH PCR product marked]
Gene structure of the Arabidopsis GAPC family of genes. Blue bars indicate coding sequence (exons). GAPC differs from the rest of
the family by the absence of two introns (noncoding sequence, indicated by lines), which shortens the gene. The GAPCP subfamily of genes
has a signal peptide at the N-terminus that directs the protein to plastids. Arrows indicate annealing positions of first-round (outer arrows) and
second-round nested (inner arrows) GAPDH PCR primers on the structure of GAPC family genes. Green bar indicates location encoding the
enzyme active site. Note: Figure is not to scale.
Since the first-round primers used in this lab are degenerate and were designed based on a
consensus sequence derived from a number of GAPC genes (including those encoding isozymes
such as GAPC and GAPCP), they may anneal to the target DNA at several locations. These
locations may be sequences of GAPDH genes other than GAPC, or they may be unrelated
sequences that have a high degree of complementarity to one or more of the degenerate primers.
It is therefore likely that multiple bands of amplified DNA will be seen on an agarose gel after the initial
round of PCR. The nested primers were designed to be more specific to GAPC (rather than the
GAPCP subfamily) and are not degenerate, so in theory only the GAPC genes from the pool of
GAPDH genes amplified during the initial PCR will be amplified in the second round of nested PCR.
For example, if the plant gDNA used in this lab is from Arabidopsis, then the nested PCR should
amplify only GAPC and GAPC-2. The nested primers should not bind to DNA coding for GAPDH
isozymes or to unrelated DNA sequences, so that DNA should not be amplified.
Analyzing Results
PCR products can be visualized by agarose gel electrophoresis, with the agarose concentration
determined by the expected size of the products. In addition to the experimental samples, the
negative control reaction and a size marker should be included on the gel. The size markers help
determine if the size of the PCR product is as expected. The negative control reaction should
not yield any amplified DNA. If it does, then the reactions may have been contaminated and the
experimental results are suspect. If the PCR products are around 50–100 base pairs, the reactions
may have formed primer-dimers.
Although the nested primers are designed to be specific to GAPC, GAPCP genes are occasionally
cloned using these primers. The sizes of PCR products
expected from Arabidopsis GAPC family genes using the primers in this lab are shown in the table
below:
Expected length of Arabidopsis GAPC gene family PCR products.

Enzyme Function                    GAPDH Protein  Arabidopsis  Chromosome  Initial Primers  Nested Primers
                                   Subunit        Gene         Location    Product (bp)     Product (bp)
NAD+-dependent GAPDH in cytosol    GAPC           GAPC         3           1,065            993
NAD+-dependent GAPDH in cytosol    GAPC-2         GAPC-2       1           1,216            1,145
NAD+-dependent GAPDH in plastids   GAPCP          GAPCP        1           1,303            1,231
NAD+-dependent GAPDH in plastids   GAPCP-2        GAPCP-2      1           1,205            1,133
Note: The pGAP plasmid, which is used as a PCR control, contains the sequence for the first-round PCR product of the
Arabidopsis GAPC gene.
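As a rough aid for reading a gel, the expected sizes in the table can be compared against an estimated band size. The sketch below is illustrative only: the function name closest_gene and the 50 bp tolerance are assumptions made for this example, and the sizes apply to Arabidopsis; other plant species may give different product lengths.

```python
# Expected Arabidopsis product lengths (bp), taken from the table above.
EXPECTED_BP = {
    "GAPC":    {"initial": 1065, "nested": 993},
    "GAPC-2":  {"initial": 1216, "nested": 1145},
    "GAPCP":   {"initial": 1303, "nested": 1231},
    "GAPCP-2": {"initial": 1205, "nested": 1133},
}


def closest_gene(band_bp, pcr_round="nested", tolerance_bp=50):
    """Return the gene whose expected product is closest in size to a band.

    Returns None when no expected product lies within `tolerance_bp`
    (a hypothetical cutoff) of the estimated band size.
    """
    gene, sizes = min(
        EXPECTED_BP.items(),
        key=lambda item: abs(item[1][pcr_round] - band_bp),
    )
    if abs(sizes[pcr_round] - band_bp) > tolerance_bp:
        return None
    return gene


print(closest_gene(1000))  # closest nested product is GAPC (993 bp)
print(closest_gene(80))    # None: far too small, consistent with primer-dimers
```

Note that the nested products of GAPC-2 (1,145 bp) and GAPCP-2 (1,133 bp) differ by only 12 bp, so band size on an agarose gel is unlikely to tell them apart; sequencing would be needed to identify the gene.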
What does it mean if no DNA is visible on an agarose gel after the nested PCR? If the experimental
controls worked (meaning that the problem was not with the reagents or the thermal cycler), then it
is likely that no GAPC was amplified from the gDNA sample. The most probable reason is that the
initial primers did not bind to any target DNA because there was too little complementarity between
the primers and the target. Alternatively, there could have been too little gDNA, or PCR inhibitors
may have been present in the gDNA preparation.