Module 1_Session 3_Part 2
(Session 3)
Part II
Genbank flat file
3- Graphics tape
/codon_start has a valid value of 1, 2, or 3. It gives the position, counted from the first nucleotide of the CDS interval, of the first base of the first complete codon.
BLAST places the single-letter amino acid (AA) codes in the middle of the complete codons.
There are 2 situations:
1- Complete CDSs. There is no need to indicate the codon_start on complete CDSs, as the translation always
begins at the first nucleotide of the interval, which is the first base of a complete codon (coding triplet).
The default is codon_start=1. This value is counted from the start of the CDS interval, so it does not necessarily correspond to ORF 1 (reading frame 1) of the whole sequence.
2- Partial CDSs, that is, CDSs that are partial at the 5' or 3' end. When the interval begins with an incomplete codon (lacking the first nucleotide,
or the first and the second nucleotides, of the codon), codon_start is needed to translate it correctly.
In this case, codon completion determines the reading frame for translating a 5' or 3' partial CDS into
protein. GenBank uses the term "codon_start" as a synonym for the reading frame in this case.
• For example, if nucleotide 2 begins the first complete codon of protein x in the CDS, the codon_start is 2.
Nucleotide sequence: ttcggctgcagaagataaataaataa
• 1- AA code placed on the 2nd nucleotide: reading frame (codon_start) is 1
• Case 1
CDS <1..18
/codon_start=1
/transl_table=1
/translation="FGCRR"
• Explanation: BLAST places the single-letter AA codes in the middle of the complete codons. In this case, nucleotides 1, 2, and 3 represent a complete codon. The translation therefore starts with nucleotide 1.
Translated amino acid sequence, case 1: F G C R R *
• 2- AA code placed on the 3rd nucleotide: reading frame (codon_start) is 2
• Case 2
CDS <1..22
/codon_start=2
/transl_table=1
/translation="SAAEDK"
• Explanation: The translation skips the first base of the sequence to start at the first complete codon (nucleotides 2, 3, and 4).
Translated amino acid sequence, case 2: S A A E D K *
• 3- AA code placed on the 4th nucleotide: reading frame (codon_start) is 3
• Case 3
CDS <1..26
/codon_start=3
/transl_table=1
/translation="RLQKINK"
• Explanation: The translation skips the first two nucleotides of the sequence to start at the first complete codon (nucleotides 3, 4, and 5).
Translated amino acid sequence, case 3: R L Q K I N K *
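The three cases above can be checked with a short script. This is only a minimal sketch, assuming the standard genetic code (transl_table=1); the names SEQ, translate_cds, and aa_track are hypothetical helpers written for this illustration and are not part of GenBank or BLAST. translate_cds skips codon_start - 1 bases before reading triplets, and aa_track writes each AA code under the middle nucleotide of its codon, in the way the explanations describe.

```python
from itertools import product

# Standard genetic code (transl_table=1), codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Nucleotide sequence used in the three cases.
SEQ = "ttcggctgcagaagataaataaataa"


def translate_cds(sequence, start, end, codon_start=1):
    """Translate the interval start..end (1-based, inclusive).

    The first codon_start - 1 bases of the interval are skipped.
    Translation stops at the first stop codon, which is not reported,
    as in a /translation qualifier.
    """
    if codon_start not in (1, 2, 3):
        raise ValueError("codon_start must be 1, 2, or 3")
    region = sequence[start - 1:end].lower()[codon_start - 1:]
    protein = []
    for i in range(0, len(region) - 2, 3):
        aa = CODON_TABLE[region[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def aa_track(sequence, codon_start=1):
    """Return a line of text with each AA code under the middle base of its codon."""
    sequence = sequence.lower()
    track = [" "] * len(sequence)
    for i in range(codon_start - 1, len(sequence) - 2, 3):
        track[i + 1] = CODON_TABLE[sequence[i:i + 3]]
    return "".join(track)


if __name__ == "__main__":
    # (end of interval, codon_start) for cases 1, 2, and 3.
    for end, codon_start in [(18, 1), (22, 2), (26, 3)]:
        print(f"CDS <1..{end}  /codon_start={codon_start}")
        print(SEQ[:end])
        print(aa_track(SEQ[:end], codon_start))
        print(translate_cds(SEQ, 1, end, codon_start))
```

Printing the sequence and the track on consecutive lines shows the AA code moving from the 2nd to the 3rd to the 4th nucleotide as codon_start goes from 1 to 3.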
Look at the source annotation to see if you find Homo sapiens.
So, searching the Organism field will search the Organism field in the general annotation but also the
/organism fields in the feature annotation.
Activate the Homo sapiens filter in the Results by taxon section.
How should you do the search to return human sequences without synthetic constructs?