CDU BIOINFORMATICS The Central Dogma
1. Define the following terms as they are used in regular English language AND
give the special meaning each term has when applied to the living cell.
d. splicing - Splicing is an intermediate step in the process by which our genes are
decoded into proteins, the workhorses of the cell. In this process, the DNA
of a gene is transcribed into “messenger” RNA, a molecule similar to
DNA that serves as the blueprint for constructing proteins. However, before
messenger RNA can be used to build proteins, some segments of the
message, called introns, must be removed. Although introns have been
described as junk, they can be removed from a single RNA strand in
different ways, modifying the blueprint and the resulting protein so that
it functions differently in the cell.
b. How many different types of amino acids are found in proteins? 20 different
kinds
c. How many nucleotides are needed to code for a single amino acid? 3
nucleotides (one codon) determine a single amino acid
(153 x 3) + 3 = 462 nucleotides
b. Why might the size of the myoglobin primary RNA transcript be larger than
your answer above?
The primary transcript still contains large introns, which are spliced out
to produce the mature mRNA that is translated into myoglobin.
4. Use the genetic code to identify which of the following nucleotide sequences
would code for the polypeptide sequence arginine‐glycine‐aspartate:
a. 5’‐AGAGGAGAU‐3’
b. 5’‐ACACCCACU‐3’
c. 5’‐GGGAAAUUU‐3’
d. 5’‐CGGGGUGAC‐3’
Sequences a and d both code for the peptide Arg-Gly-Asp. This is possible because the
genetic code is redundant, which means that different nucleotide sequences can
encode the same amino acid sequence.
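The answer to question 4 can be checked by translating each candidate. The following is a minimal sketch, assuming the standard genetic code; the names CODON_TABLE, translate and candidates are illustrative and not part of the assignment.

```python
# Standard genetic code, laid out in the conventional U, C, A, G order.
# '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(rna):
    """Translate an RNA string codon by codon into one-letter amino acids."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


# The four options from question 4, written 5' to 3'.
candidates = {
    "a": "AGAGGAGAU",
    "b": "ACACCCACU",
    "c": "GGGAAAUUU",
    "d": "CGGGGUGAC",
}

target = "RGD"  # arginine-glycine-aspartate in one-letter code
matches = [label for label, seq in candidates.items() if translate(seq) == target]
print(matches)  # ['a', 'd']
```

Options b and c translate to Thr-Pro-Thr and Gly-Lys-Phe, so only a and d match.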
5. Explain why if you are given a protein sequence, you cannot predict the exact
RNA sequence that was used by the cell to generate that sequence.
Because most amino acids can be specified by more than one codon, you cannot tell
which codon the cell used at each position of a particular protein.
There are many possibilities. We can label each letter (amino acid) with
the number of codons that encode it to see how many RNA sequences
are possible:
A(4) - M(1) - H(2)- E(2)- R(6) - S(6) - T(4)
A- Alanine
M- Methionine
H- Histidine
E- Glutamic Acid
R- Arginine
S- Serine
T- Threonine
b. Can you figure out mathematically how many different RNA sequences
could code for this mini protein?
A(4) x M(1) x H(2) x E(2) x R(6) x S(6) x T(4)= 2304
Each amino acid is encoded by a fixed number of codons, and the choice at
each position is independent of the others. Multiplying the numbers for all
seven letters therefore gives 2304 different RNA sequences that would code
for “AMHERST”. For comparison, seven amino acids correspond to a sequence
7 x 3 = 21 nucleotides long. With four possible nucleotides at each of the
21 positions, there are 4^21 (about 4.4 trillion) possible sequences, so a
random guess has far less than a one-in-a-billion chance of getting the
RNA sequence correct.
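The counting argument can be reproduced in a few lines. This is a sketch under the assumption of the standard genetic code; the variable names are illustrative only.

```python
from collections import Counter
from math import prod

# Standard genetic code in U, C, A, G order; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Number of codons that encode each amino acid, e.g. R -> 6, M -> 1.
codons_per_aa = Counter(CODON_TABLE.values())

peptide = "AMHERST"

# Independent codon choice at each position: multiply the counts.
coding_sequences = prod(codons_per_aa[aa] for aa in peptide)

# Every possible RNA of the same length (4 bases at each of 21 positions).
all_sequences = 4 ** (3 * len(peptide))

print(coding_sequences)  # 2304
print(all_sequences)     # 4398046511104
```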
a. Give the RNA transcript that would be transcribed off of the bottom strand:
5’-AUGAAGUUUGGCACUUAA-3’
c. Give the RNA transcript that would be transcribed off of the top strand:
5’-UUAAGUGCCAAACUUCAU-3’
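Both transcripts follow from the rule that RNA is complementary and antiparallel to its template strand. The sketch below illustrates this. The double-stranded DNA from the question is not reproduced in this answer sheet, so the top strand used here is inferred from the two transcripts above and should be treated as an assumption; the function names are also illustrative.

```python
def reverse_complement(dna):
    """Return the complementary DNA strand, written 5' to 3'."""
    complement = str.maketrans("ACGT", "TGCA")
    return dna.translate(complement)[::-1]


def transcribe(template_5to3):
    """Return the RNA (5' to 3') transcribed off the given template strand."""
    return reverse_complement(template_5to3).replace("T", "U")


# Assumed top strand (inferred from the answers, not given in this sheet).
top_strand = "ATGAAGTTTGGCACTTAA"
# The bottom strand is its complement; here it is written 5' to 3'.
bottom_strand = reverse_complement(top_strand)

print(transcribe(bottom_strand))  # AUGAAGUUUGGCACUUAA
print(transcribe(top_strand))     # UUAAGUGCCAAACUUCAU
```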
5’-ATGAAGATTTGGCACTTAA-3’
3’-TACTTCTAAACCGTGAATT-5’
All codons after the lysine are different because the inserted nucleotide
shifts the reading frame, and the original stop codon is no longer in
frame. Therefore the protein now is
MET-LYS-ILE-TRP-HIS-LEU
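The effect of the insertion on the reading frame can be seen by splitting the mutant coding strand into codons. This is a minimal sketch assuming the standard genetic code; the helper name codons_in_frame is hypothetical.

```python
# Standard genetic code in U, C, A, G order; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
    "*": "STOP",
}


def codons_in_frame(coding_dna):
    """Split the coding strand into complete RNA codons, starting at base 1."""
    rna = coding_dna.replace("T", "U")
    return [rna[i:i + 3] for i in range(0, len(rna) - 2, 3)]


# Top (coding) strand of the insertion mutant, 5' to 3'.
mutant = "ATGAAGATTTGGCACTTAA"

codons = codons_in_frame(mutant)
peptide = "-".join(THREE_LETTER[CODON_TABLE[c]] for c in codons)

print(codons)   # ['AUG', 'AAG', 'AUU', 'UGG', 'CAC', 'UUA']
print(peptide)  # MET-LYS-ILE-TRP-HIS-LEU
```

The final A is left over as an incomplete codon, and no stop codon appears in the shifted frame.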
5’-ATGAAGTTCGGCACTTAA-3’
3’-TACTTCAAGCCGTGAATT-5’
No change: MET-LYS-PHE-GLY-THR
A protein with a single amino acid change: an arginine replaces a glycine.
Because a tiny amino acid is replaced by a large, charged one, the overall
3-D structure, and therefore the stability of the protein, may be
affected: MET-LYS-PHE-ARG-THR
5’-ATGTAGTTTGGCACTTAA-3’
3’-TACATCAAACCGTGAATT-5’
This mutation changes the second codon into a stop codon.
The premature stop codon results in the production of a
shortened, and likely nonfunctional, protein (unless there happens to be
another AUG codon available farther along, in which case the outcome
depends on how far away it is and whether or not it is in the same reading
frame as the original start codon).
(e) and (h) would be the most severe. In principle, another AUG codon
could exist downstream in (h), so the original protein might still be
made with amino acids missing at the beginning, but that is not the case
here. (g) differs from the original by only one amino acid; however, if that
residue happened to be critical for protein folding, this mutation
could be just as bad as (e) or (h). (f) would have no effect on the
person carrying this mutation, because the protein is unchanged.
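The comparison of mutants can be sketched as a small classifier. Several things here are assumptions: the wild-type strand is inferred from the transcripts given earlier, the labels (e), (f) and (h) are matched to the sequences from the discussion above, and (g) is left out because its DNA sequence does not appear in this sheet. The function names are hypothetical.

```python
# Standard genetic code in U, C, A, G order; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate_until_stop(coding_dna):
    """Translate the coding strand from base 1, stopping at the first stop codon."""
    rna = coding_dna.replace("T", "U")
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def classify(wild_type, mutant):
    """Label a mutant relative to the wild type (simplified categories)."""
    if (len(mutant) - len(wild_type)) % 3 != 0:
        return "frameshift"
    wt_protein = translate_until_stop(wild_type)
    mut_protein = translate_until_stop(mutant)
    if mut_protein == wt_protein:
        return "silent"
    if len(mut_protein) < len(wt_protein):
        return "nonsense"
    return "missense"


wild_type = "ATGAAGTTTGGCACTTAA"  # assumed; inferred from the transcripts
mutants = {
    "e": "ATGAAGATTTGGCACTTAA",
    "f": "ATGAAGTTCGGCACTTAA",
    "h": "ATGTAGTTTGGCACTTAA",
}

results = {label: classify(wild_type, seq) for label, seq in mutants.items()}
print(results)  # {'e': 'frameshift', 'f': 'silent', 'h': 'nonsense'}
```

The categories are deliberately coarse: a missense change such as (g) can range from harmless to as damaging as a frameshift, which a sequence comparison alone cannot decide.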