CDU BIOINFORMATICS The Central Dogma

Bioinformatics Worksheet
The Central Dogma
1. Define the following terms as they are used in regular English language AND
give the special meaning each term has when applied to the living cell.
a. Replication - In everyday English, making an exact copy of something. In
the cell, it is the biological process of producing two identical replicas
of DNA from one original DNA molecule. Before a cell divides, it duplicates
its DNA so that each resulting cell receives the same complete set of DNA
molecules. This is essential for cell division during growth or repair of
damaged tissues.
b. Transcription - In everyday English, writing out a copy of something, such
as putting spoken words into written form. In the cell, it is the process by
which the genetic information contained within DNA is rewritten into
messenger RNA (mRNA) by RNA polymerase. In eukaryotes, this mRNA then exits
the nucleus and serves as the template for translation into protein. By
controlling the production of mRNA within the nucleus, the cell regulates
the rate of gene expression.
c. Translation - In everyday English, converting words from one language into
another. In the cell, it is the process of converting the sequence of a
messenger RNA (mRNA) molecule into a sequence of amino acids during protein
synthesis. The genetic code describes the relationship between the sequence
of bases in a gene and the corresponding amino acid sequence that it encodes.
In the cell cytoplasm, the ribosome reads the sequence of the mRNA in
groups of three bases (codons) to assemble the protein.
d. Splicing - In everyday English, joining two pieces (of rope or film, for
example) end to end. In the cell, splicing is an intermediate step in the
process by which our genes are decoded into proteins, the workhorses of the
cell. The DNA of a gene is first transcribed into “messenger” RNA, a molecule
similar to DNA that serves as the blueprint for constructing proteins.
However, before messenger RNA can be used to build proteins, some segments
of the message, called introns, must be removed and the remaining segments
(exons) joined together. Although introns have been described as junk, they
can be removed from a single RNA strand in different ways (alternative
splicing) to modify the blueprint and the resulting protein, with
consequences for its function in the cell.
2. Now for some numbers…
a. How many different types of nucleotides are found in RNA?

RNA is composed of 4 different nucleotides, which carry the bases adenine (A),
uracil (U), guanine (G), and cytosine (C)
b. How many different types of amino acids are found in proteins? 20 different
kinds
c. How many nucleotides are needed to code for a single amino acid? 3
nucleotides (one codon) specify a single amino acid
d. How many different codons are there in the genetic code?

64 different codons
e. How many codons code for amino acids?
61 codons
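The counts in 2d and 2e can be checked by enumerating the standard genetic code. The following is a minimal Python sketch, not part of the original worksheet; the names `BASES`, `AMINO_ACIDS`, and `CODON_TABLE` are chosen for this illustration, and the table assumes the standard code with one-letter amino acid symbols and `*` for a stop codon.

```python
from itertools import product

# Standard genetic code, codons ordered by base U, C, A, G at each position.
# One-letter amino acid symbols; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

all_codons = len(CODON_TABLE)  # 4 bases at 3 positions
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
sense_codons = all_codons - len(stop_codons)
amino_acid_kinds = len(set(CODON_TABLE.values()) - {"*"})

print(all_codons, sense_codons, stop_codons, amino_acid_kinds)
```

The three codons that do not code for an amino acid are the stop codons, which is why 61 of the 64 codons remain.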
3. Myoglobin is 153 amino acids long.

a. What would be the minimum number of nucleotides required to encode
myoglobin?
(153 x 3) + 3 = 462 (three nucleotides per amino acid, plus three for the stop
codon)
b. Why might the size of the myoglobin primary RNA transcript be larger than
your answer above?
The primary transcript contains introns, which can be large and are removed
(spliced out) before the mature mRNA is translated into myoglobin. The
transcript also includes untranslated regions at its 5’ and 3’ ends.
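The arithmetic in 3a can be written out as a short Python sketch. The variable names are hypothetical, and the calculation assumes that the 153 residues are all encoded and that exactly one stop codon is added.

```python
AMINO_ACIDS_IN_MYOGLOBIN = 153  # length given in question 3
NUCLEOTIDES_PER_CODON = 3

# Nucleotides needed for the amino acids alone
coding_nt = AMINO_ACIDS_IN_MYOGLOBIN * NUCLEOTIDES_PER_CODON

# Add one stop codon to terminate translation
minimum_nt = coding_nt + NUCLEOTIDES_PER_CODON

print(coding_nt, minimum_nt)  # 459 462
```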
4. Use the genetic code to identify which of the following nucleotide sequences
would code for the polypeptide sequence arginine‐glycine‐aspartate:
a. 5’‐AGAGGAGAU‐3’
b. 5’‐ACACCCACU‐3’
c. 5’‐GGGAAAUUU‐3’
d. 5’‐CGGGGUGAC‐3’
The sequences a (AGA-GGA-GAU) and d (CGG-GGU-GAC) both code for the peptide
Arg-Gly-Asp. This is possible because the genetic code is redundant, which
means that different nucleotide sequences can encode the same amino acid
sequence.
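One way to check all four candidates is to translate each of them with a codon table. The sketch below is illustrative only: `translate` and `CODON_TABLE` are names made up for this example, the table assumes the standard genetic code, and the output uses one-letter symbols (R = Arg, G = Gly, D = Asp).

```python
from itertools import product

# Standard genetic code; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna):
    """Translate an RNA string codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


candidates = {
    "a": "AGAGGAGAU",
    "b": "ACACCCACU",
    "c": "GGGAAAUUU",
    "d": "CGGGGUGAC",
}

# Keep the labels whose translation is Arg-Gly-Asp ("RGD")
matches = [label for label, rna in candidates.items() if translate(rna) == "RGD"]
print(matches)  # ['a', 'd']
```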
5. Explain why if you are given a protein sequence, you cannot predict the exact
RNA sequence that was used by the cell to generate that sequence.
Since more than one codon can specify most amino acids, you can’t tell which
codon the cell actually used at each position of a particular protein.
6. Consider the hypothetical protein represented by single letter amino acid
designations: A‐M‐H‐E‐R‐S‐T.
a. Propose one RNA sequence that could encode this sequence (don’t worry
about starting codons for this exercise).
There are many possibilities, because most amino acids are specified by more
than one codon. One example is 5’-GCU AUG CAU GAA CGU UCU ACU-3’. The
number of codons available for each amino acid is shown in parentheses:
A(4) - M(1) - H(2)- E(2)- R(6) - S(6) - T(4)
A- Alanine
M- Methionine
H- Histidine
E- Glutamic Acid
R- Arginine
S- Serine
T- Threonine
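A single candidate RNA can be produced by picking one codon for each residue. The sketch below is only one of many valid choices: it takes the first codon listed for each amino acid in a table ordered U, C, A, G, which is an arbitrary rule chosen for this illustration. The names `CODONS_FOR` and `back_translate` are hypothetical.

```python
from itertools import product

# Standard genetic code; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons by the amino acid they encode (stop codons excluded)
CODONS_FOR = {}
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        CODONS_FOR.setdefault(aa, []).append(codon)


def back_translate(protein):
    """Return one possible RNA by taking the first listed codon per residue."""
    return "".join(CODONS_FOR[aa][0] for aa in protein)


rna = back_translate("AMHERST")

# Translate the result again to confirm that it encodes the same peptide
round_trip = "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna), 3))
print(rna, round_trip)
```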
b. Can you figure out mathematically how many different RNA sequences
could code for this mini protein?
A(4) x M(1) x H(2) x E(2) x R(6) x S(6) x T(4)= 2304
Each amino acid can be encoded by the number of codons shown in parentheses,
and the choice at each position is independent of the others, so multiplying
these numbers gives 2304 different RNA sequences that would code for
“AMHERST”. For comparison, the coding sequence is 7 x 3 = 21 nucleotides
long. With four possible nucleotides at each of 21 positions, there are 4^21
(about 4.4 x 10^12) possible RNA sequences of that length, so a randomly
chosen 21-nucleotide sequence has less than a one in a billion chance of
encoding this peptide.
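The count in 6b can be reproduced by tallying how many codons encode each amino acid and multiplying the tallies for the seven residues. This is a sketch under the assumption of the standard genetic code; `codon_counts`, `encodings`, and `all_sequences` are names invented for this example. It also compares the result with the number of possible 21-nucleotide sequences (four choices at each of 21 positions).

```python
from itertools import product
from math import prod

# Standard genetic code; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Count the codons that encode each amino acid
codon_counts = {}
for aa in CODON_TABLE.values():
    codon_counts[aa] = codon_counts.get(aa, 0) + 1

protein = "AMHERST"
per_residue = [codon_counts[aa] for aa in protein]

# Independent choices multiply
encodings = prod(per_residue)

# All RNA sequences of the same length (3 nucleotides per residue)
all_sequences = 4 ** (3 * len(protein))

print(per_residue, encodings, all_sequences)
print(encodings / all_sequences)  # fraction of sequences that encode AMHERST
```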
7. Here is the beginning of a protein‐encoding gene sequence.

5’-ATGAAGTTTGGCACTTAA-3’
3’-TACTTCAAACCGTGAATT-5’
a. Give the RNA transcript that would be transcribed off of the bottom strand:
5’-AUGAAGUUUGGCACUUAA-3’
b. Translate this RNA sequence into a protein sequence.

MET-LYS-PHE-GLY-THR
c. Give the RNA transcript that would be transcribed off of the top strand:
5’-UUAAGUGCCAAACUUCAU-3’
d. Translate this RNA sequence into a protein sequence.

This transcript does not begin with a start codon (AUG, MET), so technically
it would not be translated; if it were, the sequence would be:
LEU-SER-ALA-LYS-LEU-HIS
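Parts a to d can be traced in code. The sketch below assumes that RNA polymerase reads the template strand 3’ to 5’ and pairs A with U, T with A, G with C, and C with G. The function names are hypothetical, and `translate` simply reads from the first base without checking for a start codon, which a ribosome would normally require. Peptides are printed in one-letter symbols (M = Met, K = Lys, F = Phe, G = Gly, T = Thr, L = Leu, S = Ser, A = Ala, H = His).

```python
from itertools import product

# Standard genetic code; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# DNA template base -> RNA base paired with it
PAIRING = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Return the RNA (5'->3') made from a template strand written 3'->5'."""
    return "".join(PAIRING[base] for base in template_3_to_5)


def translate(rna):
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


top_5_to_3 = "ATGAAGTTTGGCACTTAA"
bottom_3_to_5 = "TACTTCAAACCGTGAATT"

rna_from_bottom = transcribe(bottom_3_to_5)
# The top strand is written 5'->3', so reverse it to read it 3'->5'
rna_from_top = transcribe(top_5_to_3[::-1])

print(rna_from_bottom, translate(rna_from_bottom))
print(rna_from_top, translate(rna_from_top))
```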
e. Let’s assume that the bottom strand is the strand that is used as a
template strand when this gene gets transcribed. What would be the
effect on the final protein product if a mutation caused the following
single base‐pair insertion:
5’-ATGAAGATTTGGCACTTAA-3’
3’-TACTTCTAAACCGTGAATT-5’
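The insertion shifts the reading frame, so all codons after the lysine are
different and the original stop codon is no longer read in frame. The
protein now begins MET-LYS-ILE-TRP-HIS-LEU and continues until a stop codon
is encountered in the new reading frame.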
f. What would be the effect on the final protein product if a mutation
caused the following single base‐pair substitution:
5’-ATGAAGTTCGGCACTTAA-3’
3’-TACTTCAAGCCGTGAATT-5’
No change: MET-LYS-PHE-GLY-THR (UUU and UUC both code for phenylalanine, so
this is a silent mutation)
g. What would be the effect on the final protein product if a mutation
caused the following single base‐pair substitution:
5’-ATGAAGTTTCGCACTTAA-3’
3’-TACTTCAAAGCGTGAATT-5’
A protein with a single amino acid change: an arginine replaces a glycine
(MET-LYS-PHE-ARG-THR). Because a tiny amino acid is replaced by a large,
charged one, the overall 3-D structure, and therefore the stability of the
protein, may be affected.
h. What would be the effect on the final protein product if a mutation
caused the following single base‐pair substitution:
5’-ATGTAGTTTGGCACTTAA-3’
3’-TACATCAAACCGTGAATT-5’
The mutation changes the second codon into a stop codon (UAG). This
premature stop codon results in the production of a shortened, and likely
nonfunctional, protein. (Unless there happens to be another AUG codon
available farther along, in which case the outcome depends on how far away
it is and whether or not it is in the same reading frame as the original
start codon.)
i. Which of the above mutation(s) would you expect to be the most severe in
terms of the overall effect on the person carrying such a mutation? Explain.
(e) and (h) would be the most severe. There is a slight chance that another
start codon may exist downstream in (h), in which case the original protein
might still be made with amino acids missing from the beginning, but that is
not the case here. (g) differs from the original by only one amino acid;
however, if that residue happened to be critical for protein folding, this
mutation could be just as bad as (e) or (h). (f) leaves the protein
unchanged and would be expected to have no effect on the person carrying
this mutation.
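The effect of each mutation in parts e to h can be compared by translating the top (coding) strand of every variant. This is a minimal sketch, assuming the bottom strand is the template, so that the mRNA matches the top strand with U in place of T. The names are hypothetical, and `translate` reports whether an in-frame stop codon was reached within the sequence shown. Peptides use one-letter symbols (for example M = Met, K = Lys, I = Ile, W = Trp, R = Arg).

```python
from itertools import product

# Standard genetic code; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna):
    """Return (peptide, reached_stop) for the reading frame starting at base 1."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            return "".join(protein), True
        protein.append(aa)
    # Ran off the end of the sequence without meeting a stop codon
    return "".join(protein), False


# Top (coding) strands, 5'->3', copied from question 7
coding_strands = {
    "original": "ATGAAGTTTGGCACTTAA",
    "e": "ATGAAGATTTGGCACTTAA",
    "f": "ATGAAGTTCGGCACTTAA",
    "g": "ATGAAGTTTCGCACTTAA",
    "h": "ATGTAGTTTGGCACTTAA",
}

results = {}
for label, dna in coding_strands.items():
    rna = dna.replace("T", "U")  # mRNA has the coding-strand sequence with U for T
    results[label] = translate(rna)
    print(label, *results[label])
```

A variant whose peptide matches the original is silent, a one-letter difference is a missense change, a shorter peptide indicates a premature stop, and a run that ends without a stop codon indicates a shifted reading frame.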
