CAB Unit 1 Notes

Central Dogma- Replication, Transcription,
Translation
 DNA contains the complete genetic information that defines the
structure and function of an organism.
 Proteins are formed using the genetic code of the DNA.
 Conversion of DNA encoded information to RNA is essential to
form proteins.
 Thus, within most cells, the genetic information flows from DNA to
RNA to protein.
 The flow of information is followed through three different processes
which are responsible for the inheritance of genetic information and
for its conversion from one form to another:
1. Replication: a double stranded nucleic acid is duplicated to give
identical copies. This process perpetuates the genetic information.
2. Transcription: a DNA segment that constitutes a gene is read and
transcribed into a single stranded sequence of RNA. The RNA moves
from the nucleus into the cytoplasm.
3. Translation: the RNA sequence is translated into a sequence of amino
acids as the protein is formed. During translation, the ribosome reads
three bases (a codon) at a time from the RNA and translates them into
one amino acid.
 This flow of information is unidirectional and irreversible.
This explanation is the simplest way in which the Central Dogma of
Molecular Biology is interpreted.
 In the bigger picture, the central dogma of molecular biology is an
explanation of the flow of genetic information within a biological
system.
 It was first stated by Francis Crick in 1958, as
 “Once ‘information’ has passed into protein it cannot get out
again. In more detail, the transfer of information from nucleic
acid to nucleic acid or from nucleic acid to protein may be
possible, but transfer from protein to protein, or from protein to
nucleic acid is impossible.”
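As a rough illustration of the three processes listed above, the following Python sketch copies a strand, transcribes it, and translates it. It assumes the standard genetic code, treats the input as the coding (sense) strand so that transcription amounts to replacing T with U, and ignores start-codon scanning. The function names and the example sequence are made up for this illustration.

```python
from itertools import product

# Standard genetic code written compactly: codons are enumerated with the
# bases in the order U, C, A, G, and '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def replicate(strand):
    """Return the complementary DNA strand, written 5' to 3'."""
    return "".join(DNA_PAIR[base] for base in reversed(strand))


def transcribe(coding_strand):
    """The mRNA carries the coding strand's sequence with U in place of T."""
    return coding_strand.replace("T", "U")


def translate(mrna):
    """Read one codon (three bases) at a time until a stop codon is met."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "ATGGCCTAA"  # hypothetical toy gene
mrna = transcribe(gene)
print(replicate(gene), mrna, translate(mrna))
```

The protein is written in one-letter amino acid codes, so each letter of the output corresponds to one codon of the mRNA.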
The Dogmas
 The dogma is a framework for understanding the transfer
of sequence information between information-carrying biopolymers,
DNA and RNA (both nucleic acids), and protein.
 There are 3×3=9 conceivable direct transfers of information that can
occur between these.
 The dogma classes these into 3 groups of 3:
A. Three general transfers

 It describes the normal flow of biological information: DNA can be
copied to DNA (DNA replication), DNA information can be copied
into mRNA (transcription), and proteins can be synthesized using the
information in mRNA as a template (translation).
 It is believed to occur normally in most cells.
B. Three special transfers

 The special transfers describe: RNA being copied from RNA (RNA
replication), DNA being synthesised using an RNA template (reverse
transcription), and proteins being synthesised directly from a DNA
template without the use of mRNA.
 Temin (1970) reported the existence of an enzyme, “RNA-dependent
DNA polymerase” (reverse transcriptase), which could synthesize DNA
from a single-stranded RNA template.
 Baltimore (1970) also reported the activity of this enzyme in certain RNA
tumour viruses.
 This exciting finding in molecular biology gave rise to the concept
of “central dogma reverse”, or teminism, suggesting that the sequence
of information flow is not necessarily from DNA to RNA to protein but
can also take place from RNA to DNA.
 The special transfers are known to occur, but only under specific
conditions, as in some viruses or in a laboratory.
C. Three unknown transfers
 The unknown transfers describe: a protein being copied from a protein,
synthesis of RNA using the primary structure of a protein as a
template, and DNA synthesis using the primary structure of a protein
as a template
 These are not thought to naturally occur.
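The 3×3 grouping described above can be written out as a small lookup. This is only a sketch of the classification as given in these notes; the names `POLYMERS`, `GENERAL`, `SPECIAL`, and `classify` are made up for the example.

```python
from itertools import product

POLYMERS = ("DNA", "RNA", "protein")

# (source, target) pairs as grouped in the notes above.
GENERAL = {("DNA", "DNA"), ("DNA", "RNA"), ("RNA", "protein")}
SPECIAL = {("RNA", "RNA"), ("RNA", "DNA"), ("DNA", "protein")}


def classify(source, target):
    """Return the group a direct information transfer belongs to."""
    if (source, target) in GENERAL:
        return "general"
    if (source, target) in SPECIAL:
        return "special"
    return "unknown"  # every transfer that starts from protein


for source, target in product(POLYMERS, repeat=2):
    print(f"{source:>7} -> {target:<7} {classify(source, target)}")
```

Every pair whose source is protein falls into the "unknown" group, which matches Crick's statement that information cannot get out of protein again.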
Significance of the Central Dogma of Molecular Biology
Thus, the central dogma provides the basic framework for how genetic
information flows from a DNA sequence to a protein product inside cells and
thus gives insight into the important processes going on inside the cells.
Topic 2 : DNA- Structure, Properties, Types, Forms,
Functions
DNA stands for Deoxyribonucleic Acid, which is a molecule that contains the
instructions an organism needs to develop, live and reproduce. These
instructions are found inside every cell and are passed down from parents to
their children.
It is a nucleic acid and is one of the four major types of macromolecules that
are known to be essential for all forms of life. DNA is found in the nucleus,
with a small amount of DNA also present in mitochondria in the eukaryotes.
DNA Structure

 In 1953, James Watson and Francis Crick discovered the structure of
DNA.
 The work of Rosalind Franklin led to Watson and Crick’s discovery.
Franklin had first pointed out that DNA is made up of two spirals.
 The structure of DNA is a double helix structure because it looks like a
twisted ladder.
 The sides of the ladder are made of alternating sugar (deoxyribose) and
phosphate molecules while the steps of the ladder are made up of a
pair of nitrogen bases.
 There are 4 types of nitrogen bases: Adenine (A), Thymine (T), Guanine (G)
and Cytosine (C). The nitrogen bases have a specific pairing
pattern: A pairs with T, and G pairs with C.
 Because of this pairing pattern, the amount of adenine equals the
amount of thymine, and the amount of guanine equals the amount of
cytosine. The pairs are held together by hydrogen bonds.
Detailed Structure and Composition of DNA

 DNA is a double-stranded helix. That is, each DNA molecule is composed
of two biopolymer strands coiling around each other to form a double
helix structure. These two DNA strands are called polynucleotides, as
they are made of simpler monomer units called nucleotides.
 Each strand has a 5′ end (with a phosphate group) and a 3′ end (with a
hydroxyl group).
 The strands are antiparallel, meaning that one strand runs in a 5′ to
3′ direction, while the other strand runs in a 3′ to 5′ direction.
 The two strands are held together by hydrogen bonds and are
complementary to each other.
 Basically, the DNA is composed of deoxyribonucleotides.
 The deoxyribonucleotides are linked together by 3′–5′ phosphodiester
bonds.
 The nitrogenous bases that compose the deoxyribonucleotides include
adenine, cytosine, thymine, and guanine.
 The complementarity of the strands is due to the nature of the
nitrogenous bases. The base adenine always interacts with a thymine
(A-T) on the opposite strand via two hydrogen bonds and cytosine
always interacts with guanine (C-G) via three hydrogen bonds on the
opposite strand.
 The shape of the helix is stabilized by hydrogen bonding and
hydrophobic interactions between bases.
 The diameter of double helix is 2nm and the double helical structure
repeats at an interval of 3.4nm which corresponds to ten base pairs.
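The pairing rules and helix dimensions above lend themselves to a short worked example. The sketch below derives the antiparallel partner strand, counts hydrogen bonds (2 per A-T pair, 3 per C-G pair), and estimates helix length from the 3.4 nm repeat per ten base pairs. The function names are made up for illustration, and the figures are the idealized ones quoted in these notes.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}  # bonds formed with the partner base
BP_PER_TURN = 10   # base pairs per helical repeat, as quoted above
PITCH_NM = 3.4     # length of one helical repeat in nanometres


def partner_strand(strand):
    """Complementary strand written 5' to 3' (it runs antiparallel)."""
    return "".join(PAIR[base] for base in reversed(strand))


def hydrogen_bonds(strand):
    """Total hydrogen bonds holding the strand to its partner."""
    return sum(H_BONDS[base] for base in strand)


def helix_length_nm(n_base_pairs):
    """Approximate length of a helix with the given number of base pairs."""
    return n_base_pairs * PITCH_NM / BP_PER_TURN


seq = "ATGC"
print(partner_strand(seq), hydrogen_bonds(seq), helix_length_nm(len(seq)))
```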
Major and Minor Grooves of the DNA
 As a result of the double helical nature of DNA, the molecule has two
asymmetric grooves. One groove is smaller than the other.
 This asymmetry is a result of the geometrical configuration of the bonds
between the phosphate, sugar, and base groups that forces the base
groups to attach at 120 degree angles instead of 180 degree.
 The larger groove, called the major groove, occurs where the
backbones are far apart, while the smaller one, called the minor
groove, occurs where they are close together.
 Since the major and minor grooves expose the edges of the bases, the
grooves can be used to tell the base sequence of a
specific DNA molecule.
 The possibility for such recognition is critical, since proteins must be
able to recognize specific DNA sequences on which to bind in order
for the proper functions of the body and cell to be carried out.
Properties of DNA
 DNA helices can be right handed or left handed, but the
B-conformation of DNA, which has right handed helices, is the most
stable.
 On heating, the two strands of DNA separate from each other, and on
cooling they hybridize again.
 The temperature at which half of the DNA has separated into single
strands is known as the melting temperature (Tm). Melting temperature is
specific for each specific sequence.
 A DNA sample with a higher melting temperature must have a higher C-G
content, because each C-G pair has 3 hydrogen bonds.
 The sequence of bases along the DNA molecule encodes for the
sequence of amino acids in every protein in all organisms.
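The link between C-G content and melting temperature can be made concrete with a small calculation. The sketch below computes the G+C fraction of a sequence and applies the Wallace rule of thumb (2 °C per A or T, 4 °C per G or C). That rule is not part of these notes; it is a rough estimate meant only for short oligonucleotides and is used here purely to show the trend. The function names are made up.

```python
def gc_content(seq):
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough Tm estimate in degrees C for a short oligonucleotide.

    Wallace rule of thumb: 2 per A/T plus 4 per G/C. This is an
    approximation brought in for illustration, not a value from the notes.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for oligo in ("ATATATATAT", "ATGCATGCAT", "GCGCGCGCGC"):
    print(oligo, gc_content(oligo), wallace_tm(oligo))
```

For sequences of equal length, the estimate rises with the G+C fraction, which is the qualitative point made in the list above.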
Types of DNA
Eukaryotic organisms such as animals, plants and fungi, store the majority of
their DNA inside the cell nucleus and some of their DNA in organelles such as
mitochondria.
Based on the location DNA may be:
Nuclear DNA
 Located within the nucleus of eukaryote cells.
 Usually has two copies per cell.
 The structure of nuclear DNA chromosomes is linear with open ends; in
humans it includes 46 chromosomes containing 3 billion nucleotides.
 Nuclear DNA is diploid, ordinarily inheriting the DNA from two parents.
The mutation rate for nuclear DNA is less than 0.3%.
Mitochondrial DNA
 Mitochondrial DNA is located in the mitochondria.
 Contains 100-1,000 copies per cell.
 Mitochondrial DNA chromosomes usually have closed, circular
structures, and contain, for example, 16,569 nucleotides in humans.
 Mitochondrial DNA is haploid, coming only from the mother.
 The mutation rate for mitochondrial DNA is generally higher than
nuclear DNA.
Forms of DNA
 Most DNA is in the classic Watson-Crick form, simply called B-DNA
or B-form DNA.
 Under certain conditions, other forms of DNA are found,
such as A-DNA, Z-DNA, C-DNA, D-DNA and E-DNA.
 These forms differ from one another in their structure.
1. B-DNA
Most common, originally deduced from X-ray diffraction of sodium salt of
DNA fibres at 92% relative humidity.
2. A-DNA
Originally identified by X-ray diffraction analysis of DNA fibres at 75%
relative humidity.
3. Z-DNA
Left handed double helical structure that winds to the left in a zig-zag pattern.
4. C-DNA
Formed at 66% relative humidity and in the presence of Li+ and Mg2+ ions.
5. D-DNA
Rare variant with 8 base pairs per helical turn; forms in structures devoid of
guanine.
6. E-DNA
Extended or eccentric DNA.
Functions of DNA
DNA has a crucial role as genetic material in most living organisms. It carries
genetic information from cell to cell and from generation to generation.
Thus its major functions include:
 Storing genetic information
 Directing protein synthesis
 Determining genetic coding
 Directly responsible for metabolic activities, evolution, heredity, and
differentiation.
It is a stable molecule and holds more complex information for longer periods
of time.
Topic 3 : RNA- Properties, Structure, Types and
Functions
 RNA or ribonucleic acid is a polymer of nucleotides which is made up of
a ribose sugar, a phosphate, and bases such as adenine, guanine,
cytosine, and uracil.
 It is a polymeric molecule essential in various biological roles
in coding, decoding, regulation, and expression of genes.
Figure: (a) Ribonucleotides contain the pentose sugar ribose instead of
the deoxyribose found in deoxyribonucleotides. (b) RNA contains the
pyrimidine uracil in place of thymine found in DNA.
RNA STRUCTURE
Like DNA, RNA is a long polymer consisting of nucleotides.
 RNA is a single-stranded helix.
 The strand has a 5′end (with a phosphate group) and a 3′end (with a
hydroxyl group).
 It is composed of ribonucleotides.
 The ribonucleotides are linked together by 3′ –> 5′ phosphodiester
bonds.
 The nitrogenous bases that compose the ribonucleotides include
adenine, cytosine, uracil, and guanine.
Thus, the difference in the structure of RNA from that of DNA include:
 The bases in RNA are adenine (abbreviated A), guanine (G), uracil (U)
and cytosine (C).
Thus thymine in DNA is replaced by uracil in RNA, a different pyrimidine.
However, like thymine, uracil can form base pairs with adenine.
 The sugar in RNA is ribose rather than deoxyribose as in DNA.
 The corresponding ribonucleosides are adenosine, guanosine, cytidine
and uridine. The corresponding ribonucleotides are adenosine 5’-
triphosphate (ATP), guanosine 5’-triphosphate (GTP), cytidine 5’-
triphosphate (CTP) and uridine 5’-triphosphate (UTP).
RNA Secondary Structure
 Most RNA molecules are single-stranded but an RNA molecule may
contain regions which can form complementary base pairing where
the RNA strand loops back on itself.
 If so, the RNA will have some double-stranded regions.
 Ribosomal RNAs (rRNAs) and transfer RNAs (tRNAs) exhibit substantial
secondary structure, as do some messenger RNAs (mRNAs).
Types of RNA
In both prokaryotes and eukaryotes, there are three main types of RNA –
 rRNA (ribosomal)
 tRNA (transfer)
 mRNA (messenger)
Messenger RNA (mRNA)

 Accounts for about 5% of the total RNA in the cell.
 Most heterogeneous of the 3 types of RNA in terms of both base
sequence and size.
 It carries the genetic code copied from the DNA during transcription in
the form of triplets of nucleotides called codons.
 As part of post-transcriptional processing in eukaryotes, the 5’ end of
mRNA is capped with a guanosine triphosphate nucleotide, which
helps in mRNA recognition during translation or protein synthesis.
 Similarly, the 3’ end of an mRNA has a poly A tail or multiple adenylate
residues added to it, which prevent enzymatic degradation of mRNA.
Both the 5’ and 3’ end modifications impart stability to the mRNA.
Function
mRNA carries the genetic code transcribed from DNA in a form that can be read
and used to make proteins. It carries genetic information from the
nucleus to the cytoplasm of a cell.
Ribosomal RNA (rRNA)
 Found in the ribosomes and accounts for 80% of the total RNA present in
the cell.
 Ribosomes consist of two major components: the small
ribosomal subunits, which read the RNA, and the large subunits, which
join amino acids to form a polypeptide chain. Each subunit comprises
one or more ribosomal RNA (rRNA) molecules and a variety of
ribosomal proteins (r-protein or rProtein).
 Different rRNAs present in the ribosomes include small rRNAs and large
rRNAs, which denote their presence in the small and large subunits of
the ribosome.
 rRNAs combine with proteins in the cytoplasm to form ribosomes, which
act as the site of protein synthesis and have the enzymes needed for the
process.
 These complex structures travel along the mRNA molecule during
translation and facilitate the assembly of amino acids to form a
polypeptide chain. They bind to tRNAs and other molecules that are
crucial for protein synthesis.
Function
rRNA directs the translation of mRNA into proteins.
Transfer RNA (tRNA)
 tRNA is the smallest of the 3 types of RNA having about 75-95
nucleotides.
 tRNAs are an essential component of translation, where their main
function is the transfer of amino acids during protein synthesis.
Therefore they are called transfer RNAs.
 Each of the 20 amino acids has a specific tRNA that binds with it and
transfers it to the growing polypeptide chain. tRNAs also act as
adapters in the translation of the genetic sequence of mRNA into
proteins. Therefore they are also called adapter molecules.
Structure of tRNA
tRNAs have a clover leaf structure which is stabilized by strong hydrogen
bonds between the nucleotides. Apart from the usual 4 bases, they normally
contain some unusual bases mostly formed by methylation of the usual bases,
for example, methyl guanine and methylcytosine.
 Three structural loops are formed via hydrogen bonding.
 The 3′ end serves as the amino acid attachment site.
 The center loop encompasses the anticodon.
 The anticodon is a three-base nucleotide sequence that binds to the
mRNA codon.
 This interaction between codon and anticodon specifies the next amino
acid to be added during protein synthesis.
Function
Transfer RNA brings or transfers amino acids to the ribosome that correspond
to each three-nucleotide codon of mRNA. The amino acids can then be joined
together and processed to make polypeptides and proteins.
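The codon-anticodon interaction described above follows the same antiparallel pairing as any two nucleic acid strands. As a minimal sketch, assuming strict Watson-Crick pairing (real tRNAs also allow "wobble" at the third codon position, which is ignored here), the anticodon can be derived from a codon as follows. The function name is made up for illustration.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon):
    """Anticodon written 5' to 3' for an mRNA codon written 5' to 3'.

    The two pair antiparallel, so the anticodon is the reverse complement.
    """
    return "".join(RNA_PAIR[base] for base in reversed(codon))


print(anticodon_for("AUG"))  # start codon, which codes for methionine
```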
Other Properties of RNA
 RNA forms in the nucleus (rRNA in the nucleolus), and then moves to
specialized regions of the cytoplasm depending on the type of RNA formed.
 RNA, containing a ribose sugar, is more reactive than DNA and is not
stable in alkaline conditions. RNA’s larger helical grooves mean it is
more easily subject to attack by enzymes.
 RNA strands are continually made, broken down and reused.
 RNA is more resistant to damage from UV light than DNA.
 RNA’s mutation rate is relatively higher.
 Unusual bases may be present.
 The amount of RNA may differ from cell to cell.
 Rate of renaturation after melting is quick.
 RNA is more versatile than DNA, capable of performing numerous,
diverse tasks in an organism.
FUNCTIONS OF RNA
 RNA is a nucleic acid messenger between DNA and ribosomes.
 It serves as the genetic material in some organisms (viruses).
 Some RNA molecules play an active role within cells by catalyzing
biological reactions, controlling gene expression, or sensing and
communicating responses to cellular signals.
 Messenger RNA (mRNA) copies DNA in the nucleus and carries the info
to the ribosomes (in cytoplasm).
 Ribosomal RNA (rRNA) makes up a large part of the ribosome; reads
and decodes mRNA.
 Transfer RNA (tRNA) carries amino acids to the ribosome where they are
joined to form proteins.
 Certain RNAs are able to catalyse chemical reactions such as cutting
and ligating other RNA molecules, and the catalysis of peptide
bond formation in the ribosome; these are known as ribozymes.
Topic 4 :What are Proteins?

Proteins are the most abundant biological macromolecules, occurring in all
cells. They are also the most versatile organic molecules of living systems and
occur in great variety: thousands of different kinds, ranging in size from
relatively small peptides to large polymers. Proteins are polymers of amino
acids covalently linked by peptide bonds. The building blocks of proteins
are the twenty naturally occurring amino acids.
Protein Structure
 The linear sequence of amino acid residues in a polypeptide chain
determines the three-dimensional configuration of a protein, and the
structure of a protein determines its function.
 All proteins contain the elements carbon, hydrogen, oxygen, and nitrogen,
and usually sulfur; some may also contain phosphorus, iodine, and
traces of metals like iron, copper, zinc, and manganese.
 A protein may contain 20 different kinds of amino acids. Each amino
acid has an amine group at one end and an acid group at the other
and a distinctive side chain.
 The backbone is the same for all amino acids while the side chain differs
from one amino acid to the next.
The structure of proteins can be divided into four levels of organization:
1. Primary Structure
 The primary structure of a protein consists of the amino acid sequence
along the polypeptide chain.
 Amino acids are joined by peptide bonds.
 Because there are no dissociable protons in peptide bonds, the charges
on a polypeptide chain are due only to the N-terminal amino group,
the C-terminal carboxyl group, and the side chains on amino acid
residues.
 The primary structure determines the further levels of organization of
protein molecules.
2. Secondary Structure
 The secondary structure includes various types of local conformations in
which the atoms of the side chains are not involved.
 Secondary structures are formed by a regularly repeating pattern of
hydrogen bond formation between backbone atoms.
 The secondary structure involves α-helices, β-sheets, and other types of
folding patterns that occur due to a regularly repeating pattern of
hydrogen bond formation.
 The secondary structure of a protein could be:
1. Alpha-helix
2. Beta-sheet
 The α-helix is a right-handed coiled strand.
 The side-chain substituents of the amino acid groups in an α-helix
extend to the outside.
 Hydrogen bonds form between the oxygen of the C=O of each peptide
bond in the strand and the hydrogen of the N-H group of the peptide
bond four amino acids below it in the helix.
 The side-chain substituents of the amino acids fit in beside the N-H
groups.
 The hydrogen bonding in a β-sheet is between strands (inter-strand)
rather than within strands (intra-strand).
 The sheet conformation consists of pairs of strands lying side-by-side.
 The carbonyl oxygens in one strand hydrogen bond with the amino
hydrogens of the adjacent strand.
 The two strands can be either parallel or anti-parallel depending on
whether the strand directions (N-terminus to C-terminus) are the same
or opposite.
 The anti-parallel β-sheet is more stable due to the more well-aligned
hydrogen bonds.
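The regular hydrogen-bond pattern of the α-helix described above (the C=O of one residue bonded to the N-H of the residue four positions further along) can be listed programmatically. This is a sketch for an idealized helix with residues numbered from 1; the function names are made up for illustration.

```python
def peptide_bond_count(n_residues):
    """A linear chain of n residues is joined by n - 1 peptide bonds."""
    return max(n_residues - 1, 0)


def alpha_helix_hbonds(n_residues):
    """Pairs (i, i + 4): C=O of residue i bonded to N-H of residue i + 4."""
    return [(i, i + 4) for i in range(1, n_residues - 3)]


print(peptide_bond_count(6), alpha_helix_hbonds(6))
```

Chains shorter than five residues yield an empty list, since no residue has a partner four positions away.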
3. Tertiary Structure
 The tertiary structure of a protein refers to its overall three-dimensional
conformation.
 The types of interactions between amino acid residues that produce the
three-dimensional shape of a protein include hydrophobic
interactions, electrostatic interactions, and hydrogen bonds, all of
which are non-covalent.
 Covalent disulfide bonds also occur.
 It is produced by interactions between amino acid residues that may be
located at a considerable distance from each other in the primary
sequence of the polypeptide chain.
 Hydrophobic amino acid residues tend to collect in the interior of
globular proteins, where they exclude water, whereas hydrophilic
residues are usually found on the surface, where they interact with
water.
4. Quaternary Structure
 Quaternary structure refers to the interaction of two or more subunits to
form a functional protein, using the same forces that stabilize the
tertiary structure.
 It is the spatial arrangement of subunits in a protein that consists of
more than one polypeptide chain.
Classification of Proteins
Based on the chemical nature, structure, shape, and solubility, proteins are
classified as:
1. Simple proteins: They are composed of only amino acid residues. On
hydrolysis, these proteins yield only constituent amino acids. They are
further divided into:
 Fibrous protein: Keratin, Elastin, Collagen
 Globular protein: Albumin, Globulin, Glutelin, Histones
2. Conjugated proteins: They are combined with non-protein moiety.
Eg. Nucleoprotein, Phosphoprotein, Lipoprotein, Metalloprotein, etc.
3. Derived proteins: They are derivatives or degraded products of simple
and conjugated proteins. They may be :
 Primary derived protein: Proteans, Metaproteins, Coagulated
proteins
 Secondary derived proteins: Proteoses or albumoses, peptones,
peptides.
Functions of Proteins
Proteins are vital for growth and repair, and their functions are endless. They
also have an enormous diversity of biological functions and are the most
important final products of the information pathways.
 Proteins, which are composed of amino acids, serve in many roles in the
body (e.g., as enzymes, structural components, hormones, and
antibodies).
 They act as structural components such as keratin of hair and nail,
collagen of bone, etc.
 Proteins are the molecular instruments through which genetic
information is expressed.
 They execute their activities in the transport of oxygen and carbon
dioxide by hemoglobin and special enzymes in the red cells.
 They function in the homeostatic control of the volume of the
circulating blood and that of the interstitial fluids through the plasma
proteins.
 They are involved in blood clotting through thrombin, fibrinogen, and
other protein factors.
 They act as the defense against infections by means of protein
antibodies.
 They perform hereditary transmission by nucleoproteins of the cell
nucleus.
 Ovalbumin, glutelin, etc. are storage proteins.
 Actin and myosin act as contractile proteins important for muscle
contraction.
Topic 5 :OMICS technology
The term “ome” is derived from a Greek word, and “omics” is a derivation of the
suffix -ome, which means “whole,” “all,” or “complete.” When -ome is added to
cellular molecules such as gene, transcript, protein, and metabolite, the results are
genome, transcriptome, proteome, and metabolome, respectively [3, 4].
Omics technologies and systems biology are emerging concepts in molecular
medicine (Figure 1). Omics refers to collective and high-throughput analyses
including genomics, transcriptomics, proteomics, and metabolomics/lipidomics that
are integrated through robust systems biology, bioinformatics, and computational tools to
study the mechanism, interaction, and function of cell populations, tissues, organs,
and the whole organism at the molecular level in a non-targeted and non-biased
manner [5].
Genomics is the systematic study of an organism’s entire genome [6]. The human
genome is made up of DNA (deoxyribonucleic acid) comprising approximately 3
billion base pairs built from four nucleotides, each carrying one of four bases
(adenine, guanine, cytosine, and thymine). DNA contains genetic information required
to build and maintain cells. A gene denotes a specific unit of DNA that holds
information to make a specific functional unit, a protein. It is estimated that the entire
human genome contains approximately 21,500 genes. The order of the nucleotides reveals the
meaning of the information encoded in DNA. Emergence of high-throughput
sequencing technologies, such as next-generation sequencing, enables analysis of
variations between individuals at the genomics level.
Transcriptomics is the study of transcriptome that comprises the entire collection of
RNA (ribonucleic acid) sequences, called transcripts, in a cell. It is estimated that a
human cell contains about 25,000 transcripts. RNAs are classified into two groups: (1)
mRNA is the coding RNA that is translated into protein sequences. (2) Non-coding
RNAs are also classified into two subgroups; short non-coding RNAs such as
microRNA (miRNA) and long non-coding RNAs (lncRNA). Non-coding RNAs are
involved in gene regulation. Next-generation RNA sequencing technologies allow
a deep understanding of variations and gene expression across various types of RNA
molecules including miRNA, mRNA, and lncRNA [2].
Proteomics is the study of proteome, which is defined as the set of all expressed
proteins and interacting protein family networks, and biochemical pathways in a cell,
tissue, or organism. Although the exact number of proteins/peptides is still unclear, it
is estimated to be around a few hundred thousand.
Metabolomics is the study of the metabolome within cells, biofluids, tissues, or
organisms. Metabolome can be defined as the small molecules and their interactions
within a biological system under a given genetic, nutritional, and environmental
condition. Since the metabolome is the final downstream product, changes and
interactions between gene expression, protein expression, and the environment are
directly reflected in metabolome making it more physically and chemically complex
than the other “omes.” The metabolome is the closest to the phenotype among other
omics approaches. Metabolomics best modulates and represents the molecular
phenotype of health and disease [7]. In this regard, metabolomics is a brilliant source
for disease-associated biomarkers. Mass spectrometry-based metabolomics/lipidomics
provides a useful approach for both identification of disease-related metabolites in
biofluids or tissue, and also encompasses classification and/or characterization of
disease- or treatment-associated molecular patterns generated from metabolites [8, 9].
Metabolomics analysis identifies different metabotypes of disease severity and enables
successful clinical and molecular phenotyping and patient stratification.
Topic 6 : Biological Databases :
 These are databases consisting of biological data such as protein
sequences, molecular structures, DNA sequences, etc. in an organized
form.
 Several computer tools exist to manipulate the biological data, with
operations such as update, delete, and insert. Scientists and researchers
from all over the world enter their experimental data and results in a
biological database so that they are available to a wider audience.
 Biological databases are free to use and contain a huge collection of a
variety of biological data.
Uses of biological Databases :
 It helps researchers study the available data and develop new
hypotheses, antivirals, helpful bacteria, medicines, etc.
 It helps scientists to understand the concepts of biological
phenomena.
 The database acts as a storage of information.
 It helps remove the redundancy of data.
Types of Biological Databases :
There are basically 3 types of biological databases, as follows.
1. Primary databases :
 It can also be called an archival database since it archives the
experimental results submitted by scientists. The primary database
is populated with experimentally derived data like genome sequences,
macromolecular structures, etc. The data entered here remain
uncurated (no modifications are performed on the data).
 It holds unique data obtained from the laboratory, and these data are
made accessible to normal users without any change.
 The data are given accession numbers when they are entered into the
database. The same data can later be retrieved using the accession
number. Accession number identifies each data uniquely and it never
changes.
Examples –
 Nucleic acid databases: GenBank and DDBJ
 Protein databases: PDB, SwissProt, PIR, TrEMBL, MetaCyc, etc.
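The accession-number behaviour described above (assigned on entry, unique, never changing, data returned unmodified) can be mimicked with a toy in-memory archive. This is only a sketch of the idea: the class `ToyArchive`, its methods, and the `TOY000001`-style accession format are hypothetical and do not correspond to the interface or numbering of GenBank, DDBJ, or any other real database.

```python
class ToyArchive:
    """A toy stand-in for a primary (archival) database."""

    def __init__(self, prefix="TOY"):
        self._prefix = prefix
        self._records = {}

    def submit(self, sequence):
        """Store a submission as-is and return its new accession number."""
        accession = f"{self._prefix}{len(self._records) + 1:06d}"
        self._records[accession] = sequence
        return accession

    def fetch(self, accession):
        """Retrieve the record exactly as it was submitted."""
        return self._records[accession]


archive = ToyArchive()
first = archive.submit("ATGGCCTAA")
second = archive.submit("ATGGCCTAA")  # duplicates are archived too
print(first, second, archive.fetch(first))
```

Because nothing is curated or merged, the same sequence submitted twice receives two accession numbers, which is why the composite databases described later remove redundancy.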
2. Secondary Database :
 The data stored in these types of databases are the analyzed result of
the primary database. Computational algorithms are applied to the
primary database and meaningful and informative data is stored inside
the secondary database.
 The data here are highly curated (the data are processed before being
presented in the database). A secondary database therefore
contains more valuable knowledge compared to the primary database.
Examples –
Examples of Secondary databases are as follows.
 InterPro (protein families, motifs, and domains)
 UniProt Knowledgebase (sequence and functional information on
proteins)
3. Composite Databases :
 The data entered in these types of databases are first compared and
then filtered based on desired criteria.
 The initial data are taken from the primary database, and then they are
merged together based on certain conditions.
 It helps in searching sequences rapidly. Composite Databases contain
non-redundant data.
Examples –
Examples of Composite Databases are as follows.
 Composite Databases - OWL, NRDB and SWISS-PROT + TrEMBL
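The compare, filter, and merge steps described above can be sketched as a non-redundant merge of several primary sources. The example below assumes that "redundant" simply means "identical sequence", which is a simplification. The function name, the source dictionaries, and the accession numbers are all hypothetical.

```python
def build_composite(*sources):
    """Merge records from several sources, one entry per distinct sequence.

    Each source maps accession number -> sequence. The result maps each
    distinct sequence to the list of accessions under which it was found.
    """
    composite = {}
    for source in sources:
        for accession, sequence in source.items():
            composite.setdefault(sequence, []).append(accession)
    return composite


# Hypothetical records from two primary databases.
source_a = {"A0001": "MKTAYIAK", "A0002": "MSLLTEVE"}
source_b = {"B0100": "MKTAYIAK"}

for sequence, accessions in build_composite(source_a, source_b).items():
    print(sequence, accessions)
```

Searching the merged set touches each distinct sequence once, which is the sense in which a composite database makes sequence searches faster.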
Topic 7 : DNA sequencing

Key points:
 DNA sequencing is the process of determining the sequence of
nucleotides (As, Ts, Cs, and Gs) in a piece of DNA.
 In Sanger sequencing, the target DNA is copied many times,
making fragments of different lengths. Fluorescent “chain
terminator” nucleotides mark the ends of the fragments and allow
the sequence to be determined.
 Next-generation sequencing techniques are new, large-scale
approaches that increase the speed and reduce the cost of DNA
sequencing.
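The chain-terminator idea in the second key point can be illustrated with a toy simulation. It assumes an idealized reaction in which exactly one fragment is produced for every possible stopping position, and in which the last base of each fragment stands in for the dye colour of its terminator nucleotide. The function names and the example strand are made up for this sketch.

```python
import random


def chain_terminated_fragments(new_strand, seed=0):
    """One copy of the new strand stopped at each position, in mixed order."""
    fragments = [new_strand[:i] for i in range(1, len(new_strand) + 1)]
    random.Random(seed).shuffle(fragments)  # the reaction tube is a mixture
    return fragments


def read_sequence(fragments):
    """Sort fragments by length, then read the label on each fragment's end."""
    ordered = sorted(fragments, key=len)
    return "".join(fragment[-1] for fragment in ordered)


mixture = chain_terminated_fragments("ATGGCCTAAG")
print(read_sequence(mixture))
```

Sorting by length plays the role of size separation, and reading the final base of each fragment plays the role of detecting the dye colour.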
What is sequencing?
You may have heard of genomes being sequenced. For instance, the
human genome was completed in 2003, after a many-year,
international effort. But what does it mean to sequence a genome, or
even a small fragment of DNA?
DNA sequencing is the process of determining the sequence of
nucleotide bases (As, Ts, Cs, and Gs) in a piece of DNA. Today, with
the right equipment and materials, sequencing a short piece of DNA is
relatively straightforward.
Sequencing an entire genome (all of an organism’s DNA) remains a
complex task. It requires breaking the DNA of the genome into many
smaller pieces, sequencing the pieces, and assembling the
sequences into a single long "consensus." However, thanks to new
methods that have been developed over the past two decades,
genome sequencing is now much faster and less expensive than it
was during the Human Genome Project.
In this article, we’ll take a look at methods used for DNA sequencing.
We'll focus on one well-established method, Sanger sequencing, but
we'll also discuss new ("next-generation") methods that have reduced
the cost and accelerated the speed of large-scale sequencing.
Topic 8: Sanger sequencing: The chain termination method
Regions of DNA up to about 900 base pairs in length are
routinely sequenced using a method called Sanger sequencing or
the chain termination method. Sanger sequencing was developed by
the British biochemist Fred Sanger and his colleagues in 1977.
In the Human Genome Project, Sanger sequencing was used to
determine the sequences of many relatively small fragments of human
DNA. (These fragments weren't necessarily 900 bp or less, but
researchers were able to "walk" along each fragment using multiple
rounds of Sanger sequencing.) The fragments were aligned based on
overlapping portions to assemble the sequences of larger regions of
DNA and, eventually, entire chromosomes.
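Aligning fragments by their overlapping portions can be illustrated with a small greedy merge. This is a minimal sketch, not the procedure used in the Human Genome Project: the function names `overlap` and `greedy_assemble`, the `min_len` threshold, and the three toy reads are assumptions made for illustration, and real assemblers also handle sequencing errors, repeats, and reverse-complement reads.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the longest overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best = (0, None, None)  # (overlap length, index of left fragment, index of right fragment)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no overlaps left; the remaining pieces stay separate
        merged = frags[i] + frags[j][n:]
        frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
    return frags


reads = ["ATGGCGTAC", "GTACCTTGA", "TTGAGGCAT"]  # toy reads, each overlapping the next by 4 bases
contigs = greedy_assemble(reads)
print(contigs)
```

With these toy reads the three fragments merge into one 19-base sequence.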
Although genomes are now typically sequenced using other methods
that are faster and less expensive, Sanger sequencing is still in wide
use for the sequencing of individual pieces of DNA, such as fragments
used in DNA cloning or generated through polymerase chain
reaction (PCR).
Ingredients for Sanger sequencing
Sanger sequencing involves making many copies of a target DNA
region. Its ingredients are similar to those needed for DNA
replication in an organism, or for polymerase chain reaction (PCR),
which copies DNA in vitro. They include:
 A DNA polymerase enzyme
 A primer, which is a short piece of single-stranded DNA that
binds to the template DNA and acts as a "starter" for the
polymerase
 The four DNA nucleotides (dATP, dTTP, dCTP, dGTP)
 The template DNA to be sequenced
However, a Sanger sequencing reaction also contains a unique
ingredient:
 Dideoxy, or chain-terminating, versions of all four nucleotides
(ddATP, ddTTP, ddCTP, ddGTP), each labeled with a different color
of dye
Dideoxy nucleotides are similar to regular, or deoxy, nucleotides, but
with one key difference: they lack a hydroxyl group on the 3’ carbon of
the sugar ring. In a regular nucleotide, the 3’ hydroxyl group acts as a
“hook," allowing a new nucleotide to be added to an existing chain.
Once a dideoxy nucleotide has been added to the chain, there is no
hydroxyl available and no further nucleotides can be added. The chain
ends with the dideoxy nucleotide, which is marked with a particular
color of dye depending on the base (A, T, C or G) that it carries.
Method of Sanger sequencing

The DNA sample to be sequenced is combined in a tube with primer,
DNA polymerase, and DNA nucleotides (dATP, dTTP, dGTP, and
dCTP). The four dye-labeled, chain-terminating dideoxy nucleotides
are added as well, but in much smaller amounts than the ordinary
nucleotides.
The mixture is first heated to denature the template DNA (separate the
strands), then cooled so that the primer can bind to the single-
stranded template. Once the primer has bound, the temperature is
raised again, allowing DNA polymerase to synthesize new DNA
starting from the primer. DNA polymerase will continue adding
nucleotides to the chain until it happens to add a dideoxy nucleotide
instead of a normal one. At that point, no further nucleotides can be
added, so the strand will end with the dideoxy nucleotide.
This process is repeated in a number of cycles. By the time the cycling
is complete, it’s virtually guaranteed that a dideoxy nucleotide will
have been incorporated at every single position of the target DNA in at
least one reaction. That is, the tube will contain fragments of different
lengths, ending at each of the nucleotide positions in the original DNA
(see figure below). The ends of the fragments will be labeled with dyes
that indicate their final nucleotide.
After the reaction is done, the
fragments are run through a long, thin tube containing a gel matrix in a
process called capillary gel electrophoresis. Short fragments move
quickly through the pores of the gel, while long fragments move more
slowly. As each fragment crosses the “finish line” at the end of the
tube, it’s illuminated by a laser, allowing the attached dye to be
detected.
The smallest fragment (ending just one nucleotide after the primer)
crosses the finish line first, followed by the next-smallest fragment
(ending two nucleotides after the primer), and so forth. Thus, from the
colors of dyes registered one after another on the detector, the
sequence of the original piece of DNA can be built up one nucleotide
at a time. The data recorded by the detector consist of a series of
peaks in fluorescence intensity, as shown in the chromatogram above.
The DNA sequence is read from the peaks in the chromatogram.
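The logic of chain termination followed by size separation can be mimicked with a short simulation. This is a toy model under stated assumptions, not a description of a real instrument: the function names `sanger_fragments` and `read_sequence`, the 10% chance of incorporating a dideoxy nucleotide at each step, the number of molecules, and the example sequence are all made up for illustration. The input is the newly synthesized strand after the primer, and strands that never pick up a dideoxy nucleotide are ignored because they carry no dye.

```python
import random


def sanger_fragments(new_strand, dd_fraction=0.1, molecules=5000, seed=0):
    """Simulate many extension reactions; return (length, terminal base) for each terminated strand."""
    rng = random.Random(seed)
    fragments = []
    for _ in range(molecules):
        for position, base in enumerate(new_strand, start=1):
            # With a small probability the polymerase adds a dideoxy
            # nucleotide here, which ends the chain at this position.
            if rng.random() < dd_fraction:
                fragments.append((position, base))
                break
    return fragments


def read_sequence(fragments):
    """Order fragments from shortest to longest and read the dye on each terminal base."""
    terminal_base = {}
    for length, base in fragments:
        terminal_base[length] = base
    return "".join(terminal_base[length] for length in sorted(terminal_base))


fragments = sanger_fragments("GATTACAGGCTA")
print(read_sequence(fragments))
```

Because the dideoxy nucleotides are rare but the number of molecules is large, every position ends up represented by at least one fragment, which is why reading the terminal bases in order of fragment length recovers the sequence.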
Uses and limitations
Sanger sequencing gives high-quality sequence for relatively long
stretches of DNA (up to about 900 base pairs). It's typically
used to sequence individual pieces of DNA, such as bacterial
plasmids or DNA copied in PCR.
However, Sanger sequencing is expensive and inefficient for larger-scale
projects, such as the sequencing of an entire genome or
metagenome (the “collective genome” of a microbial community). For
tasks such as these, new, large-scale sequencing techniques are
faster and less expensive.
Next-generation sequencing
The name may sound like Star Trek, but that’s really what it’s called!
The most recent set of DNA sequencing technologies are collectively
referred to as next-generation sequencing.
There are a variety of next-generation sequencing techniques that use
different technologies. However, most share a common set of features
that distinguish them from Sanger sequencing:
 Highly parallel: many sequencing reactions take place at the
same time
 Micro scale: reactions are tiny and many can be done at once on
a chip
 Fast: because reactions are done in parallel, results are ready
much faster
 Low-cost: sequencing a genome is cheaper than with Sanger
sequencing
 Shorter length: reads typically range from 50 to
700 nucleotides in length
Conceptually, next-generation sequencing is kind of like running a
very large number of tiny Sanger sequencing reactions in parallel.
Thanks to this parallelization and small scale, large quantities of DNA
can be sequenced much more quickly and cheaply with next-generation
methods than with Sanger sequencing. For example, in
2001, the cost of sequencing a human genome was
almost $100 million. In 2015, it was just $1245!
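The size of that drop is easier to appreciate as a ratio. The short calculation below uses only the two figures quoted above, treating "almost $100 million" as exactly $100 million, which is an approximation; the variable names are chosen here for illustration.

```python
cost_2001 = 100_000_000  # dollars; "almost $100 million" rounded up for simplicity
cost_2015 = 1_245        # dollars

fold_reduction = cost_2001 / cost_2015
print(f"Cost fell roughly {fold_reduction:,.0f}-fold between 2001 and 2015")
```

Under this approximation, the cost of sequencing a human genome fell by a factor of about 80,000 in 14 years.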
Why does fast and inexpensive sequencing matter? The ability to
routinely sequence genomes opens new possibilities for biology
research and biomedical applications. For example, low-cost
sequencing is a step towards personalized medicine – that is, medical
treatment tailored to an individual's needs, based on the gene variants
in his or her genome.
Two main methods are widely known for sequencing DNA:
1. The Chemical Method (also called the Maxam–Gilbert method after
its inventors).
2. The Chain Termination Method (also known as the Sanger dideoxy
method after its inventor).
 The Maxam–Gilbert technique depends on the relative chemical lability of
different nucleotide bonds, whereas the Sanger method interrupts
elongation of DNA sequences by incorporating dideoxynucleotides
into the sequences.
 The chain termination method is the more commonly used of the two because
of its speed and simplicity.
Topic 9: Chemical Cleavage Method (Maxam–Gilbert Method)
 In 1976-1977, Allan Maxam and Walter Gilbert developed a DNA
sequencing method based on chemical modification of DNA and
subsequent cleavage at specific bases.
 The method requires radioactive labelling at one end and purification of
the DNA fragment to be sequenced.
 Chemical treatment generates breaks at a small proportion of one or
two of the four nucleotide bases in each of four reactions (G, A+G, C,
C+T).
 Thus a series of labelled fragments is generated, from the radiolabelled
end to the first ‘cut’ site in each molecule.
 The fragments in the four reactions are arranged side by side in gel
electrophoresis for size separation.
 To visualize the fragments, the gel is exposed to X-ray film for
autoradiography, yielding a series of dark bands each corresponding
to a radiolabelled DNA fragment, from which the sequence may be
inferred.
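How a sequence is inferred from the four lanes (G, A+G, C, C+T) can be sketched as a simple lookup. This is a simplified illustration: the names `LANES`, `cleave`, and `read_gel` are hypothetical, the example sequence is made up, and each band is indexed by the position of the cleaved base rather than by the true fragment length on the gel. The rule is that a band in both the G and A+G lanes means G, a band in A+G only means A, a band in both C and C+T means C, and a band in C+T only means T.

```python
# Bases cleaved in each of the four chemical reactions.
LANES = {"G": "G", "A+G": "AG", "C": "C", "C+T": "CT"}


def cleave(sequence):
    """For each lane, return the set of positions (1-based) that would give a band."""
    return {
        lane: {i for i, base in enumerate(sequence, start=1) if base in targets}
        for lane, targets in LANES.items()
    }


def read_gel(bands, length):
    """Infer the base at each position from which lanes show a band there."""
    calls = []
    for pos in range(1, length + 1):
        if pos in bands["G"]:
            calls.append("G")   # band in G (and therefore also in A+G)
        elif pos in bands["A+G"]:
            calls.append("A")   # band in A+G but not in G
        elif pos in bands["C"]:
            calls.append("C")   # band in C (and therefore also in C+T)
        elif pos in bands["C+T"]:
            calls.append("T")   # band in C+T but not in C
        else:
            calls.append("N")   # no band in any lane: base unknown
    return "".join(calls)


bands = cleave("GATTCAGC")
print(read_gel(bands, 8))
```

The two combined lanes are what make four reactions sufficient to distinguish four bases even though no reaction is specific to A or T alone.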
Key Features
 Base-specific cleavage of DNA by certain chemicals
 Four different chemical reactions (G, A+G, C, C+T)
 A set of DNA fragments of different sizes. DNA fragments contain up to
500 nucleotides
Advantages
 Purified DNA can be read directly
 Homopolymeric DNA runs are sequenced as efficiently as
heterogeneous DNA sequences
 Can be used to analyze DNA-protein interactions (e.g. footprinting)
 Can be used to analyze nucleic acid structure and epigenetic
modifications to DNA
Disadvantages
 It requires extensive use of hazardous chemicals.
 It has a relatively complex setup and is technically demanding.
 It is difficult to “scale up” and cannot be used to analyze more than 500
base pairs.
 Incomplete cleavage reactions reduce the read length.
 It is difficult to make Maxam–Gilbert sequencing-based DNA kits.
Significance of DNA Sequencing

 Information obtained by DNA sequencing makes it possible to
understand or alter the function of genes.
 DNA sequence analysis demonstrates regulatory regions that control
gene expression and genetic “hot spots” particularly susceptible to
mutation.
 Comparison of DNA sequences shows evolutionary relationships that
provide a framework for definite classification of microorganisms
including viruses.
 Comparison of DNA sequences facilitates identification of conserved
regions, which are useful for development of specific hybridization
probes to detect microorganisms including viruses in clinical samples.
 DNA sequencing has become sufficiently fast and inexpensive to allow
laboratory determination of microbial sequences for identification of
microbes. Sequencing of the 16S ribosomal RNA (rRNA) gene can be used to
identify specific bacteria. Sequencing of viruses can be used to identify
the virus and distinguish different strains.
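Identification by sequence comparison can be illustrated with a percent-identity calculation. This is a minimal sketch under strong assumptions: the function names `percent_identity` and `best_match`, the labels `species_A` and `species_B`, and the 10-base sequences are invented for illustration, and the sequences are assumed to be already aligned and of equal length. Real identification compares much longer sequences against curated reference databases using alignment tools.

```python
def percent_identity(a, b):
    """Percentage of positions at which two pre-aligned, equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def best_match(query, references):
    """Return the name of the reference sequence most similar to the query."""
    return max(references, key=lambda name: percent_identity(query, references[name]))


# Toy reference sequences (made up, not real 16S data).
references = {
    "species_A": "ACGTACGTAC",
    "species_B": "ACGTTCGTAA",
}
query = "ACGAACGTAC"

for name, ref in references.items():
    print(name, percent_identity(query, ref))
print("closest:", best_match(query, references))
```

The query differs from `species_A` at one position and from `species_B` at three, so `species_A` is reported as the closest match.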
Topic 10: Big data
Big data is a collection of data from many different sources and is often
described by five characteristics: volume, value, variety, velocity, and veracity.
 Volume: the size and amounts of big data that companies manage and
analyze
 Value: the most important “V” from the perspective of the business, the value
of big data usually comes from insight discovery and pattern recognition that
lead to more effective operations, stronger customer relationships and other
clear and quantifiable business benefits
 Variety: the diversity and range of different data types, including unstructured
data, semi-structured data and raw data
 Velocity: the speed at which companies receive, store and manage data –
e.g., the specific number of social media posts or search queries received
within a day, hour or other unit of time
 Veracity: the “truth” or accuracy of data and information assets, which often
determines executive-level confidence
The additional characteristic of variability can also be considered:
 Variability: the changing nature of the data companies seek to capture,
manage and analyze – e.g., in sentiment or text analytics, changes in the
meaning of key words or phrases
List of Applications of Big Data in Biotechnology
Humans have long used plants, animals, and microbes for nutrition and to
develop products for consumption. Biotechnology is the application of living
processes, biological systems, and their derivatives to make products that
improve the quality of human life. The ultimate goal of this field is to improve
the product yield from living organisms, either by employing principles of
bio-engineering/bioprocess technology or by genetically modifying the
organisms. One example is the production of bread or other bakery items from
wheat flour after adding yeast as the fermenting organism.
Like any field of science, biotechnology needs results, information, and
statistics in order to grow and discover something new, so its research relies on
large amounts of information. Data analytics and business intelligence have
become widely used tools in research and development across many fields, and
the use of Big Data database systems to analyze large collections of information
in medicine has been increasing in recent years. Other applications of
biotechnology, such as agriculture and genetic engineering, can likewise be
made better and more efficient by applying Big Data to their research. Big Data
is like a virtual library in which an enormous amount of data and information
is stored and analyzed. Targeted research on that data can help biotechnology
improve and succeed in less time and with less effort.
List of Applications of Big Data in Biotechnology
Biotechnology could benefit greatly from the use of Big Data. Here are various
applications of Big Data in the field of biotechnology.
1. Genomics
The Human Genome Project took many years of worldwide research and support to
identify the more than 20,000 human genes and to sequence all 3 billion bases
of the genome. The project cost billions of dollars globally, but today's
biotechnology companies use Big Data technology that can decode entire genomes
for just thousands of dollars. The genomics market supports data companies
that use frameworks and tools to carry out large, complicated computing tasks
on genetic, medical, and biological data. These companies often work with
computer hardware giants to improve their application performance and their
Big Data analysis results.
2. Agriculture
Big Data also has applications in agriculture. Data gathered from GPS
technology are stored in Big Data frameworks, and GPS-enabled tractors can
help farmers cope with changing environmental conditions through precision
farming. Data analytics is also changing the landscape of the biotech industry
through its contribution to genetic research on genetically modified
organisms. Engineered crops can be modified using inputs from Big Data
collection to improve crop yield, survive changing conditions, and produce
disease-free plants.
3. Pharma Automation
Almost every pharma company screens millions of compounds before selecting
the appropriate ones for pre-clinical trials, so the journey to successful
drug discovery consumes an enormous amount of time and money. Many software
tools therefore aim to make drug discovery more efficient and less
time-consuming. Big Data based modeling draws on large stores, on the order of
terabytes, of data about different compounds and their characteristics. It acts
as a virtual library holding information on millions of compounds, used to
identify the compounds most likely to succeed. These predictive modeling
programs compare the trial criteria and desired outcomes against the target
disease and chemical structures. Pharma automation reduces risks, saves money,
and offers faster research-to-market cycles.
4. Healthcare
The healthcare sector of biotechnology has lagged behind others in the use of
Big Data. Healthcare stakeholders now have access to promising new threads of
knowledge. This information, in the form of Big Data, is characterized by its
complexity, diversity, and timeliness. Pharmaceutical industry experts analyze
big data to obtain insights. With these technological advances, the biotech
industry has improved its ability to work with such data, even though the files
are enormous and come from differently organized databases, which improves the
quality and the rate of development of pharmaceutical healthcare.
5. Crowdsourcing
According to Wikipedia, crowdsourcing is a sourcing model in which
individuals or organizations obtain goods and services, including ideas and
finances, from a large, relatively open, and often rapidly evolving group of
internet users. It divides work between participants to achieve a cumulative
result, and it is commonly used to outsource labor and entrepreneurial
projects. Some pharma companies have created online gaming platforms that
involve disease profiles, research challenges, and solving medical puzzles.
With crowdsourcing, patients drive research through online surveys that
empower consumers to conduct their own studies and research, upload their own
medical data, and contribute knowledge about their conditions and symptoms to
benefit the whole medical community.
6. Business Development
Every day, the body of information on scientific discoveries and pharma
progress grows from many different sources, presenting an enormous flood of
data for biopharma industries to sift through to find potential licensing
opportunities. Some Big Pharma and biotech companies have turned to analytics
and data-mining technologies to scour disparate Big Data sources and deliver
the exact information they seek. This Big Data grows and develops along with
the business. Used this way, Big Data can increase the total revenue and profit
of the business and thus help develop the biotech business.
7. Sentiment Analysis
Among the tools of Big Data, sentiment analysis is one that helps to analyze
social networking posts and comments. Organizations primarily use it for
marketing, advertising, and public relations research. For example, many
companies use it to gauge consumer reactions and gather feedback. Social media
platforms also contain millions of health-related comments, because health
care consumers share personal and public information about diseases and
medical conditions. Some companies are creating online groups and communities
to centralize this information and uncover new discoveries and technologies.
When used together with crowdsourcing, these tools provide sources of free
labor and large amounts of information.
8. Prevention of Drug Fraud
Every day in developing countries, fake drugs kill many people and harm the
health of others. As a result, patients and their families lose trust in
pharmaceutical companies, and the companies lose sales. The World Health
Organization estimates that 700,000 patients in Africa perish as a result of
fake versions of anti-malaria and tuberculosis medicines, and the problem costs
drug makers $75 billion annually. The problem has pushed the startup
Sproxil to work with tech giant IBM to enable drug companies to
analyze Big Data sources to spot patterns of counterfeit drug activity.
Sproxil aims to amass large amounts of transactional data with a system that
enables patients to text-message codes from medicine bottles to learn whether
the medicines are authentic. With IBM’s visualization technology and other
analytics, drugmakers can tap a large amount of data on drug transactions in
real time, according to IBM. Presumably, prescription drug fraud can then be
spotted.
9. Discovering Genetic Biomarkers
There are different genomic analysis tools that identify DNA code variants and
the genetic biomarkers of disease risk factors. Some Big Data analytics and
informatics systems are capable of integrating multiple data types for
enhanced results. The ability to correlate large data warehouses of phenotypic
and genomic data provides a clearer understanding of disease factors, symptoms,
and development. There are software solutions that allow researchers to view
sequence alignments and disease data alongside objective findings. In this way
Big Data helps in discovering biomarkers and in identifying and treating
diseases.
In Conclusion
The main regions hosting the global biotech industry are the US and Europe,
where there are over 700 companies and over 200 thousand employees that
generate around 140 billion U.S. dollars of revenue. In the 21st century, the
health care sector is growing as another promising sector of the economy in
which technology for collecting and processing large amounts of data and
information in Big Data database systems is applied. With the help of Big Data
and its applications in the field of biotechnology, growth and innovation in
biotech could rise even more rapidly while delivering on its true potential.
