Genome
Genomics: DNA Sequencing on a Genomic Scale
Presented by: Faith L. Mglangit
Content
Introduction to Genomics: DNA Sequencing on a Genomic Scale
Positional Cloning: An Introduction to Genomics
Techniques in Genomic Sequencing
Studying and Comparing Genomic Sequences
Genomics II: Functional Genomics, Proteomics, and Bioinformatics
Functional Genomics: Gene Expression on a Genomic Scale
Proteomics
Bioinformatics
Genome - the complete set of genes or genetic
material present in a cell or organism.
DNA Sequencing
- refers to the general laboratory technique for
determining the exact sequence of nucleotides, or
bases, in a DNA molecule.
Positional Cloning: An Introduction to Genomics
Positional cloning - one method for the discovery of the genes involved in genetic traits.
Involves the identification of genes that govern genetic diseases.
2. Locating the CpG islands that tend to be associated with genes - use methylation-sensitive restriction
enzymes to search for CpG islands (DNA regions containing unmethylated CpG sequences).
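The slide describes a wet-lab search with methylation-sensitive restriction enzymes. As a rough computational analogue (not the method on the slide), a sequence can be screened for CpG-island character by its GC content and its observed/expected CpG ratio. The sketch below assumes the commonly cited thresholds of at least 200 bp, GC content above 50%, and a ratio above 0.6; the function names and the toy sequences are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    expected = c * g / n if n else 0.0  # CpGs expected if C and G were independent
    ratio = cpg / expected if expected else 0.0
    return gc_fraction, ratio


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the assumed length, GC-content, and CpG-ratio thresholds."""
    gc_fraction, ratio = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and ratio > min_ratio


# Toy sequences, made up for illustration only.
cpg_rich = "CG" * 100        # 200 bp, every dinucleotide step is CpG-rich
cpg_poor = "CCAGGTTA" * 25   # 200 bp, no CpG dinucleotides at all

print(looks_like_cpg_island(cpg_rich))  # True
print(looks_like_cpg_island(cpg_poor))  # False
```

A real scan would slide this test along a chromosome in overlapping windows; that loop is omitted here to keep the sketch short.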
DETECTING A RFLP
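A restriction fragment length polymorphism arises when a sequence difference creates or destroys a restriction site, so the same enzyme yields fragments of different lengths in different individuals. The sketch below illustrates the idea with EcoRI, which cuts between G and A in GAATTC. The two alleles are invented toy sequences, and the function name is hypothetical; a real experiment would detect the fragments with a labeled probe rather than by reading the sequence.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Return fragment lengths after cutting a linear DNA at every site.

    cut_offset is the cut position inside the site (1 for EcoRI: G^AATTC).
    """
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Toy alleles: allele_b carries a point mutation (C to T) in the second site.
allele_a = "A" * 10 + "GAATTC" + "C" * 20 + "GAATTC" + "T" * 8
allele_b = "A" * 10 + "GAATTC" + "C" * 20 + "GAATTT" + "T" * 8

print(digest(allele_a))  # [11, 26, 13]
print(digest(allele_b))  # [11, 39]
```

The lost site in allele_b merges two fragments into one longer fragment, which is the length difference a RFLP analysis picks up.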
Identifying the Gene Mutated in a Human Disease
Classic example of positional cloning: pinpointing the gene for Huntington disease:
Huntington disease (HD) - a progressive nerve disorder that begins almost imperceptibly with small tics and
clumsiness.
Michael Conneally and his colleagues spent more than a decade trying to find a gene linked to HD, but with no
success. In their attempt to find a genetic marker linked to HD, Wexler, Conneally, and James Gusella turned next to
RFLPs.
- They found a probe (called G8) that detects a RFLP very tightly linked to HD in a large Venezuelan family.
Geneticists mapped the Huntington disease gene (HD) to a region near the end of chromosome 4. Then they used an exon trap
to identify the gene itself.
Gusella’s team compared the gene in affected and unaffected individuals in 75 HD families and demonstrated that it is the HD gene.
Unaffected individuals - the number of CAG repeats ranged from 11 to 34, and 98% of these unaffected people had 24 or
fewer CAG repeats.
Affected individuals - in all of them, the number of CAG repeats had expanded to at least 42, up to a high of about 100. Thus, we
can predict whether an individual will be affected by the disease by looking at the number of CAG repeats in this gene.
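The prediction described above can be sketched as a repeat count followed by a comparison with the ranges reported on this slide (11 to 34 in unaffected individuals, 42 or more in affected individuals). This is an illustration only: the function names and input sequences are hypothetical, and counts between 35 and 41 are simply reported as falling between the two ranges, because the slide gives no data for them.

```python
import re


def longest_cag_run(seq):
    """Return the number of CAG units in the longest uninterrupted CAG run."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify_repeat_count(n):
    """Compare a repeat count with the ranges quoted on the slide."""
    if n >= 42:
        return "affected range"
    if n <= 34:
        return "unaffected range"
    return "between the reported ranges"


# Toy gene fragments: a start codon, a CAG tract, and a short tail.
expanded = "ATG" + "CAG" * 45 + "CCG"
typical = "ATG" + "CAG" * 20 + "CCG"

print(longest_cag_run(expanded), classify_repeat_count(longest_cag_run(expanded)))
print(longest_cag_run(typical), classify_repeat_count(longest_cag_run(typical)))
```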
Techniques in Genomic Sequencing
1. Yeast artificial chromosomes (YACs) - vectors that contain a yeast origin of replication, a
centromere, and two telomeres. Foreign DNA up to 1 million bp long can be inserted between
the centromere and one of the telomeres. It will then replicate along with the YAC.
2. Bacterial artificial chromosomes (BACs) - vectors based on the F plasmid of E. coli.
They can accept inserts up to about 300 kb, but their inserts average about 150 kb.
THE CLONE-BY-CLONE STRATEGY
First, the whole genome is mapped by finding markers regularly spaced along each
chromosome. A by-product of the mapping is a collection of clones corresponding to the
markers.
Some of these markers are genes, but many more are nameless stretches of DNA, such as:
RFLPs
Variable Number of Tandem Repeats (VNTR)
Sequence-Tagged Sites: ESTs and Microsatellites
SHOTGUN SEQUENCING
First proposed by Craig Venter, Hamilton Smith, and Leroy Hood in 1996.
- assembles libraries of clones with different size inserts, then sequences the inserts at random. This
method relies on a computer program to find areas of overlap among the sequences and piece them
together. In practice, a combination of the clone-by-clone and shotgun methods was used to sequence the human genome.
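The overlap-finding step can be illustrated with a toy greedy assembler: repeatedly merge the two reads that share the longest suffix-to-prefix overlap. This is a minimal sketch, not the program used for any real genome; it assumes error-free reads from one strand and ignores repeats, and the reads and function names are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads by best overlap until no overlap of min_len remains."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads fully contained in another read.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # remaining reads form separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three overlapping toy reads taken from a made-up 24-bp "genome".
reads = ["AGCCGATAGGCTA", "ATGCGTACGT", "TACGTTAGCCG"]
print(greedy_assemble(reads))  # ['ATGCGTACGTTAGCCGATAGGCTA']
```

Real assemblers also have to handle sequencing errors, both strands, and repeated sequences, which is one reason mapped clones remained useful alongside random reads.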
Studying and Comparing Genomic Sequences
THE HUMAN GENOME
Chromosome 22 - the first human chromosome to be sequenced. Only the long arm (22q) of the
chromosome was sequenced.
The working draft of the human genome - reported by two separate groups, it allowed estimates that the
genome probably contains fewer genes than anticipated. About half of the genome is derived from the
action of transposons, and transposons themselves have contributed dozens of genes to the genome.
The finished draft of the human genome - is much more accurate and complete than the working drafts,
but it still contains some gaps. On the basis of the finished draft, geneticists estimate that the genome
contains about 20,000–25,000 genes.
Personal Genomics - two different groups used high-throughput sequencing to sequence the genomes of
two non-Caucasian individuals, one of Nigerian descent and one of Han Chinese descent.
Other Vertebrate Genomes - the complete sequences of the mouse and a pufferfish (the tiger pufferfish, Fugu
rubripes) have been published.
THE MINIMAL GENOME
It is also possible to define the minimal genome: the set of genes that is the minimum
required for life. It is likely that this minimal genome is larger than the essential gene
set.
A movement has begun to create a barcode to identify any species of life on earth.
The first “barcode of life” will consist of the sequence of a 648-bp piece of the
mitochondrial COI gene from each organism. This sequence is sufficient to uniquely
identify almost any animal. Other sequences, or barcodes, are being worked out for
plants.
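Identification by barcode amounts to comparing a query sequence with a library of reference barcodes and reporting the closest match. The sketch below uses simple percent identity between equal-length, pre-aligned sequences. The species names and the 10-bp sequences are invented placeholders (real COI barcodes are 648 bp), and the function names are hypothetical.

```python
def identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def identify(query, references):
    """Return (name, identity) of the reference barcode closest to the query."""
    name = max(references, key=lambda n: identity(query, references[n]))
    return name, identity(query, references[name])


# Placeholder reference library; not real COI sequences.
references = {
    "species_A": "ACGTACGTAC",
    "species_B": "ACGTTCGTAA",
    "species_C": "TTGTACGAAC",
}

print(identify("ACGTTCGTTA", references))  # ('species_B', 0.9)
```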
Genomics II: Functional Genomics, Proteomics, and Bioinformatics
Deletion Analysis - mutants are created by replacing genes one at a time with an antibiotic resistance gene
flanked by oligomers that serve as a barcode to identify each mutant.
RNAi Analysis - “knocking out” genes by mutagenesis is laborious, and has so far been accomplished on a
genomewide scale only in yeast. RNA interference (RNAi) offers another way to inactivate genes in other organisms.
Tissue-Specific Functional Profiling - observe the tissue specificity of the genes that are inactivated by
mutation or other means.
Locating Target Sites for Transcription Factors
Locating Promoters
Single-nucleotide polymorphisms can probably account for many genetic conditions caused by single
genes, and even multiple genes. They might also be able to predict a person’s response to drugs. A
haplotype map with over 1 million SNPs will make it easier to sort out the important SNPs from those
with no effect. Structural variation (insertions, deletions, inversions, and other rearrangements of
chunks of DNA) is also a surprisingly prominent source of variation in human genomes. Some
structural variation can in principle predispose certain people to contract diseases, but some is
presumably benign, and some is demonstrably beneficial.
Proteomics
PROTEIN SEPARATIONS: 2-D gel electrophoresis
- after separation, proteins are identified by digesting them one by one with proteases and
identifying the resulting peptides by mass spectrometry.
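The digestion step can be mimicked in silico. The sketch below assumes trypsin as the protease (the slide does not name one) and applies its usual rule: cleave after lysine (K) or arginine (R) unless the next residue is proline (P). The peptide list is what a mass spectrometry search would then compare against measured masses. The protein string and function name are hypothetical.

```python
def tryptic_digest(protein):
    """Split a protein (one-letter codes) after K or R, except before P."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide with no K/R at its end
    return peptides


# Made-up sequence; note that the R followed by P is not cleaved.
print(tryptic_digest("MKWVTFISLLRPGSAKDEHR"))
# ['MK', 'WVTFISLLRPGSAK', 'DEHR']
```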
PROTEIN ANALYSIS - the study of the properties and activities of all the proteins that an organism makes in its lifetime
PROTEIN INTERACTIONS
Bioinformatics
- involves the building and use of biological databases, some of which contain the DNA
sequences of genomes. Bioinformatics is essential for mining the massive amount of
biological data for meaningful knowledge about gene structure and expression.
Using computational biology techniques, Lander and Kellis have discovered highly conserved sequence
motifs in the promoter regions and 3'-UTRs of four mammalian species, including humans. The motifs
in the promoter regions probably represent binding sites for transcription factors. Most of the motifs in
the 3'-UTRs probably represent binding sites for miRNAs.
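A much simplified way to picture a search for conserved motifs is to look for short words (k-mers) that occur in the corresponding region of every species. The sketch below is not the method used in the study cited above; it assumes exact matches and uses invented 10- to 11-bp promoter fragments, and the function name is hypothetical.

```python
def conserved_kmers(sequences, k):
    """Return the set of k-mers present in every sequence."""
    kmer_sets = [
        {seq[i:i + k] for i in range(len(seq) - k + 1)}
        for seq in sequences
    ]
    return set.intersection(*kmer_sets) if kmer_sets else set()


# Invented promoter fragments for four species.
promoters = {
    "human": "GGCTATAAAGC",
    "mouse": "ACTATAAATT",
    "rat": "TTTATAAAGG",
    "dog": "CATATAAACC",
}

print(conserved_kmers(promoters.values(), 6))  # {'TATAAA'}
```

Genome-scale motif discovery works on aligned sequences and tolerates mismatches, so this exact-match version only conveys the idea of conservation across species.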
The NCBI website contains a vast store of biological information, including genomic and proteomic
data. You can start with a sequence and discover the gene it belongs to, and compare that sequence
with that of similar genes. You can also start with a topic you want to study and query the database
for information on that topic. Or you can look up a protein of interest and view the structure of that
protein in three dimensions by rotating the structure on your computer screen.
Thank You for
listening!!