Genome

Introduction to Genomics: DNA Sequencing on a Genomic Scale
Presented by: Faith L. Mglangit
Content
Introduction to Genomics: DNA Sequencing on a Genomic Scale
Positional Cloning: An Introduction to Genomics
Techniques in Genomic Sequencing
Studying and Comparing Genomic Sequences
Genomics II: Functional Genomics, Proteomics, and Bioinformatics
Functional Genomics: Gene Expression on a Genomic Scale
Proteomics
Bioinformatics
Genome - the complete set of genes or genetic material present in a cell or organism.
Genomics - the study of the structure and function of whole genomes.
DNA Sequencing
- refers to the general laboratory technique for
determining the exact sequence of nucleotides, or
bases, in a DNA molecule.
Positional Cloning: An Introduction to Genomics
Positional cloning - a method for discovering the genes involved in genetic traits.
It involves identifying the genes that govern genetic diseases by first mapping where they lie on a chromosome.
CLASSICAL TOOLS OF POSITIONAL CLONING

Restriction Fragment Length Polymorphisms (RFLPs) - differences (or variations) among people in their DNA sequences at sites recognized by restriction enzymes. Such differences change the lengths of the fragments the enzyme produces.
Polymorphism means that a genetic locus has different forms, or alleles.

- With a Southern blot, one can highlight small portions of the total genome using various probes.
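The idea behind an RFLP can be illustrated with an in-silico digest. This is a minimal sketch, not a laboratory protocol: the two "alleles" are invented sequences, and HindIII (recognition site AAGCTT, cutting after the first A) is used only as an example enzyme. A real RFLP is detected on a Southern blot with a probe, not by reading the sequence.

```python
def digest(seq: str, site: str = "AAGCTT", cut_offset: int = 1) -> list[int]:
    """Return fragment lengths after cutting seq at every occurrence of site.

    cut_offset is where the enzyme cuts inside its recognition site
    (HindIII cuts A^AGCTT, so the offset is 1).
    """
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

# Hypothetical alleles: allele_b has lost the second HindIII site (AAGCTT -> AAGCTA).
allele_a = "GGGG" + "AAGCTT" + "CCCCCCCCCC" + "AAGCTT" + "GGGG"
allele_b = "GGGG" + "AAGCTT" + "CCCCCCCCCC" + "AAGCTA" + "GGGG"
print(digest(allele_a), digest(allele_b))
```

The single base change removes one cut site, so the two alleles yield different fragment sizes; that length difference is what a probe reveals on a blot.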
1. Finding the exons with exon traps - uses a special vector that helps clone only exons.
2. Locating the CpG islands that tend to be associated with genes - uses methylation-sensitive restriction enzymes to search for CpG islands, DNA regions containing unmethylated CpG sequences.
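The slides describe finding CpG islands experimentally. As a computational counterpart (a different technique, swapped in here for illustration), a widely used rule of thumb from Gardiner-Garden and Frommer flags windows of at least 200 bp with GC content of at least 50% and an observed/expected CpG ratio of at least 0.6. The sketch below applies those thresholds with a sliding window; the thresholds are parameters, and the test sequence is invented.

```python
def cpg_stats(window: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA window."""
    window = window.upper()
    c, g = window.count("C"), window.count("G")
    cpg = window.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / len(window)
    obs_exp = (cpg * len(window)) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def find_cpg_windows(seq: str, size: int = 200, step: int = 1,
                     min_gc: float = 0.5, min_obs_exp: float = 0.6) -> list[int]:
    """Return start positions of windows that pass both CpG-island thresholds."""
    hits = []
    for start in range(0, len(seq) - size + 1, step):
        gc, oe = cpg_stats(seq[start:start + size])
        if gc >= min_gc and oe >= min_obs_exp:
            hits.append(start)
    return hits

# Invented example: a 200-bp CpG-rich stretch between AT-rich flanks.
seq = "AT" * 150 + "CG" * 100 + "AT" * 150
hits = find_cpg_windows(seq)
```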
DETECTING AN RFLP
Identifying the Gene Mutated in a Human Disease
Classic example of positional cloning: pinpointing the gene for Huntington disease:
Huntington disease (HD) - a progressive nerve disorder. It begins almost imperceptibly with small tics and clumsiness.
Michael Conneally and his colleagues spent more than a decade trying to find a gene linked to HD, but with no success. In their attempt to find a genetic marker linked to HD, Wexler, Conneally, and James Gusella turned next to RFLPs.
- They found one probe (called G8) that detected an RFLP very tightly linked to HD in the Venezuelan family.
Four possible haplotypes (clusters of alleles on a single chromosome)
Which haplotype is associated with the disease in the Venezuelan family?
Figure 24.3 The RFLP associated with the Huntington disease gene - demonstrates that the disease haplotype is C: individuals with this haplotype have the disease.
Pedigree of the large Venezuelan family with Huntington disease; Southern blots of HindIII fragments from members of two families, hybridized to the G8 probe
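The question on the slide (which haplotype travels with the disease?) can be phrased as a co-segregation check: find the haplotype carried by every affected family member. The family below is entirely invented for illustration and is not the Venezuelan pedigree. Real linkage studies use statistical LOD scores, partly because a late-onset disease like HD means an unaffected young carrier is possible.

```python
# Hypothetical family members: (pair of haplotypes, affected?).
# Invented data; NOT the actual Venezuelan pedigree.
family = [
    (("A", "C"), True),
    (("C", "B"), True),
    (("A", "B"), False),
    (("C", "D"), True),
    (("B", "D"), False),
    (("A", "D"), False),
]

def haplotypes_in_all_affected(members):
    """Return the haplotypes carried by every affected individual."""
    affected = [set(haps) for haps, is_affected in members if is_affected]
    return set.intersection(*affected) if affected else set()

def carriers_among_unaffected(members, haplotype):
    """Count unaffected individuals who nonetheless carry the haplotype."""
    return sum(haplotype in haps for haps, is_affected in members if not is_affected)

print(haplotypes_in_all_affected(family))  # {'C'}
```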
Geneticists mapped the Huntington disease gene (HD) to a region near the end of chromosome 4. Then they used an exon trap
to identify the gene itself.
A comparison of the gene in affected and unaffected individuals in 75 HD families by Gusella’s team demonstrated that it is indeed the HD gene.
Unaffected individuals - the number of CAG repeats ranged from 11 to 34, and 98% of these unaffected people had 24 or fewer CAG repeats.
Affected individuals - in all of them, the number of CAG repeats had expanded to at least 42, up to a high of about 100. Thus, we can predict whether an individual will be affected by the disease by looking at the number of CAG repeats in this gene.
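The prediction described above comes down to counting an uninterrupted CAG run and comparing it with the reported ranges. This is a minimal sketch using only the numbers given here (11 to 34 unaffected, at least 42 affected); counts of 35 to 41 are labeled "intermediate" simply because the data above do not cover them. The test sequences are invented.

```python
import re

def longest_cag_run(seq: str) -> int:
    """Return the length, in repeat units, of the longest uninterrupted CAG run."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)

def classify(repeats: int) -> str:
    """Classify using the ranges reported above (11-34 unaffected, >=42 affected)."""
    if repeats >= 42:
        return "expanded (affected range)"
    if repeats <= 34:
        return "normal (unaffected range)"
    return "intermediate (not covered by the data above)"

# Invented example sequence with 45 CAG repeats between unrelated flanks.
n = longest_cag_run("ATG" + "CAG" * 45 + "CCGCCA")
print(n, classify(n))
```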
The base sequences of viruses and organisms ranging from phages to bacteria to animals and plants have been obtained.
A rough draft and a finished version of the human genome have also been obtained.
Comparison of the genomes of closely related and more distantly related organisms can shed light on the evolution of these species.
MILESTONES IN GENOMIC SEQUENCING
THE HUMAN GENOME PROJECT
- In 1990, American geneticists embarked on an ambitious quest: to map and ultimately sequence the entire human genome.
VECTORS FOR LARGE-SCALE GENOME PROJECTS
1. Yeast artificial chromosomes (YACs) - vectors that contain a yeast origin of replication, a centromere, and two telomeres. Foreign DNA up to 1 million bp long can be inserted between the centromere and one of the telomeres; it will then replicate along with the YAC.
2. Bacterial artificial chromosomes (BACs) - vectors based on the F plasmid of E. coli. They can accept inserts up to about 300 kb, but their inserts average about 150 kb.
THE CLONE-BY-CLONE STRATEGY
First, the whole genome is mapped by finding markers regularly spaced along each chromosome. A by-product of the mapping is a collection of clones corresponding to the markers.
Some of these markers are genes, but many more are nameless stretches of DNA, such as:
RFLPs
Variable Number of Tandem Repeats (VNTR)
Sequence-Tagged Sites: ESTs and Microsatellites
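One way to see why markers help order clones: two clones that contain the same marker (for example, the same STS) must overlap, so clones can be grouped into overlapping sets (contigs). The sketch below groups hypothetical BAC clones by shared markers with a union-find structure; all clone and marker names are invented, and real map building also uses marker order and clone sizes.

```python
def build_contigs(clone_markers: dict[str, set[str]]) -> list[set[str]]:
    """Group clones into contigs: clones sharing any marker end up together."""
    parent = {clone: clone for clone in clone_markers}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    first_clone_with: dict[str, str] = {}
    for clone, markers in clone_markers.items():
        for marker in markers:
            if marker in first_clone_with:
                union(clone, first_clone_with[marker])  # shared marker => overlap
            else:
                first_clone_with[marker] = clone

    groups: dict[str, set[str]] = {}
    for clone in clone_markers:
        groups.setdefault(find(clone), set()).add(clone)
    return list(groups.values())

# Hypothetical clones and the STS markers each one contains.
clones = {
    "BAC1": {"STS1", "STS2"},
    "BAC2": {"STS2", "STS3"},
    "BAC3": {"STS3", "STS4"},
    "BAC4": {"STS7", "STS8"},
}
contigs = build_contigs(clones)
```

BAC1 to BAC3 chain together through STS2 and STS3, while BAC4 shares no marker and stays a separate contig until more markers are mapped.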
SHOTGUN SEQUENCING
First proposed by Craig Venter, Hamilton Smith, and Leroy Hood in 1996.
- This method assembles libraries of clones with inserts of different sizes, then sequences the inserts at random. It relies on a computer program to find areas of overlap among the sequences and piece them together. In practice, a combination of these methods (clone-by-clone and shotgun) was used to sequence the human genome.
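The overlap-finding step can be illustrated with a toy greedy assembler that repeatedly merges the pair of reads with the longest suffix-prefix overlap. This is a simplification, not how production assemblers work: real assemblers must handle sequencing errors, repeats, and both DNA strands. The reads are invented and cut from a short made-up sequence.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a equal to a prefix of b (>= min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads by largest overlap until no overlap >= min_len remains."""
    contigs = list(reads)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)

reads = ["ATGGCGTA", "GCGTACGT", "ACGTTAGC"]  # invented overlapping reads
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGC']
```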
Studying and Comparing Genomic Sequences
THE HUMAN GENOME
Chromosome 22 - only the long arm (22q) of the chromosome was sequenced.
Sequencing of human chromosome 22q (the first human chromosome to be sequenced) revealed:
(1) gaps that cannot be filled with available methods;
(2) 679 annotated genes;
(3) the great bulk (about 97%) of the chromosome is made up of noncoding DNA;
(4) over 40% of the chromosome is in interspersed repeats such as Alu sequences and
LINEs;
(5) the rate of recombination varies across the chromosome, with long regions of low
rates of recombination punctuated by short regions with relatively high rates;
(6) several examples of local and long-range duplications;
(7) large regions where linkage among genes has been conserved with that in seven
different mouse chromosomes.
WORKING DRAFT AND FINISHED VERSION OF THE HUMAN GENOME
The working draft of the human genome - reported by two separate groups, it allowed estimates that the genome probably contains fewer genes than anticipated. About half of the genome has derived from the action of transposons, and transposons themselves have contributed dozens of genes to the genome.
The finished version of the human genome - much more accurate and complete than the working drafts, but it still contains some gaps. On the basis of the finished version, geneticists estimate that the genome contains about 20,000–25,000 genes.
Personal Genomics - two different groups used high-throughput sequencing to sequence the genomes of two non-Caucasian individuals, one of Nigerian descent and one of Han Chinese descent.
Other Vertebrate Genomes - The complete sequences of the mouse and a pufferfish (the tiger pufferfish, Fugu rubripes) have been published.
THE MINIMAL GENOME
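It is also possible to define the minimal genome: the set of genes that is the minimum required for life. This minimal genome is likely larger than the essential gene set, because some functions can be performed by either of two redundant genes. Neither gene is essential on its own, yet the cell needs one of them.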
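The redundancy argument can be made concrete with a toy model in which each function needed for life is provided by one or more hypothetical genes. A gene counts as essential only if it is the sole provider of some function, but a viable minimal genome still has to keep one provider for every function. All names below are invented; real minimal genomes are defined experimentally.

```python
# Hypothetical functions a cell needs, each mapped to the gene(s) able to perform it.
providers = {
    "function_1": {"geneA"},
    "function_2": {"geneB"},
    "function_3": {"geneC1", "geneC2"},  # two redundant genes
}

def essential_genes(providers: dict[str, set[str]]) -> set[str]:
    """Genes that are the sole provider of some function (their knockout is lethal here)."""
    return {next(iter(genes)) for genes in providers.values() if len(genes) == 1}

def one_minimal_genome(providers: dict[str, set[str]]) -> set[str]:
    """Keep one provider per function (the alphabetically first, chosen arbitrarily).

    This is minimal here only because no gene serves two functions; in general,
    finding the smallest such set is a set-cover problem.
    """
    return {min(genes) for genes in providers.values()}

essential = essential_genes(providers)   # geneA and geneB only
minimal = one_minimal_genome(providers)  # geneA, geneB, and one of geneC1/geneC2
```

Neither geneC1 nor geneC2 is essential, yet the minimal genome must include one of them, so it is larger than the essential set.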
THE BARCODE OF LIFE
A movement has begun to create a barcode to identify any species of life on earth.
The first “barcode of life” will consist of the sequence of a 648-bp piece of the mitochondrial COI gene from each organism. This sequence is sufficient to uniquely identify almost any animal. Other sequences, or barcodes, are being worked out for plants.
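Identifying a specimen from its barcode amounts to finding the closest reference sequence. This is a minimal sketch assuming the query and references are already aligned to equal length. It scores them by percent identity and applies an arbitrary 98% cutoff. The short sequences and species labels are invented stand-ins for real 648-bp COI barcodes, and real identification systems use more sophisticated methods.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions at which two aligned, equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

def identify(query: str, references: dict[str, str], cutoff: float = 98.0):
    """Return (best species, identity), or (None, identity) if the best hit is below cutoff."""
    species, ref = max(references.items(),
                       key=lambda item: percent_identity(query, item[1]))
    score = percent_identity(query, ref)
    return (species if score >= cutoff else None), score

# Invented reference "barcodes".
refs = {"species_1": "ACGT" * 10, "species_2": "TTGA" * 10}
print(identify("ACGT" * 10, refs))             # ('species_1', 100.0)
print(identify("ACGT" * 9 + "ACGA", refs))     # (None, 97.5)
```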
Genomics II: Functional Genomics, Proteomics, and Bioinformatics
Presented by: Mohammad Adzlan Usman

Functional Genomics: Gene Expression on a Genomic Scale
Functional genomics is the study of the expression of large numbers of genes.
Transcriptomics - the study of the transcriptome, the complete set of RNA transcripts produced by the genome under specific circumstances or in a specific cell.
DNA Microarrays and Microchips
- A microarray or microchip holds thousands of cDNAs or oligonucleotides; labeled RNAs (or corresponding DNAs) from cells are hybridized to these arrays or chips.
- The intensity of hybridization to each spot reveals the extent of expression of the corresponding gene.
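A common way to summarize spot intensities, assumed here for a two-sample comparison, is the log2 ratio of sample to reference signal: +1 means twice the expression, -1 means half. The gene names and intensities are invented. Real analyses also subtract background and normalize across the array before comparing spots.

```python
import math

def log2_ratios(sample: dict[str, float], reference: dict[str, float]) -> dict[str, float]:
    """log2(sample / reference) per spot; positive means higher expression in the sample."""
    return {gene: math.log2(sample[gene] / reference[gene]) for gene in sample}

def changed(ratios: dict[str, float], min_abs_log2: float = 1.0) -> list[str]:
    """Genes whose expression changed at least 2-fold (|log2 ratio| >= 1) by default."""
    return sorted(gene for gene, r in ratios.items() if abs(r) >= min_abs_log2)

# Invented spot intensities.
sample = {"geneX": 800.0, "geneY": 200.0, "geneZ": 410.0}
reference = {"geneX": 200.0, "geneY": 800.0, "geneZ": 400.0}
ratios = log2_ratios(sample, reference)
print(changed(ratios))  # ['geneX', 'geneY']
```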
Figure 25.4 Patterns of expression of Drosophila genes during development
SERIAL ANALYSIS OF GENE EXPRESSION (SAGE)
- In 1995, Victor Velculescu, working with Kenneth Kinzler and colleagues, developed a novel method of analyzing the range of genes expressed in a given cell.
- SAGE allows us to determine which genes are expressed in a given tissue and the extent of that expression.
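SAGE reduces each transcript to a short sequence tag and counts how often each tag appears; a tag's frequency reflects how abundant its mRNA is. In the original method, as assumed here, the tag is about 10 bp taken just downstream of the 3′-most CATG site (the NlaIII recognition site) in each cDNA. This sketch skips the laboratory steps of joining tags into concatemers and sequencing them, and the transcripts are invented.

```python
from collections import Counter
from typing import Optional

def sage_tag(cdna: str, anchor: str = "CATG", tag_len: int = 10) -> Optional[str]:
    """Return the tag_len bases just after the 3'-most anchor site, or None."""
    seq = cdna.upper()
    pos = seq.rfind(anchor)
    if pos == -1:
        return None  # no anchoring-enzyme site: transcript yields no tag
    tag = seq[pos + len(anchor): pos + len(anchor) + tag_len]
    return tag if len(tag) == tag_len else None

def count_tags(transcripts: list[str]) -> Counter:
    """Tally tags across transcripts; each count approximates expression level."""
    tags = (sage_tag(t) for t in transcripts)
    return Counter(tag for tag in tags if tag is not None)

# Invented transcripts: t1 appears twice (a more abundant mRNA), t4 has no CATG.
t1 = "GG" + "CATG" + "AAAAAAAAAA" + "CC" + "CATG" + "GATTACAGAT" + "TTTT"
t3 = "CATG" + "CCCCCGGGGG" + "AAAA"
t4 = "ACGTACGT"
counts = count_tags([t1, t1, t3, t4])
```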
CAP ANALYSIS OF GENE EXPRESSION (CAGE)
- gives the same information as SAGE about which genes are expressed, and how abundantly, in a given tissue. Because it focuses on the 5′-ends of mRNAs, it also allows the identification of transcription start sites and, therefore, helps locate promoters.
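Because CAGE tags begin at the capped 5′ end of each mRNA, the places where mapped tag starts pile up point to transcription start sites. This is a minimal sketch assuming tag start coordinates on one strand have already been obtained by aligning tags to the genome. The positions, the 20-bp clustering gap, and the 3-tag minimum are all invented, illustrative choices. It groups nearby starts and reports the most-used position in each cluster.

```python
from collections import Counter

def tss_clusters(starts: list[int], max_gap: int = 20,
                 min_tags: int = 3) -> list[tuple[int, int]]:
    """Cluster 5'-end positions; return (peak position, tag count) per cluster."""
    counts = Counter(starts)
    clusters, current = [], []
    for pos in sorted(counts):
        if current and pos - current[-1] > max_gap:
            clusters.append(current)  # gap too large: close the current cluster
            current = []
        current.append(pos)
    if current:
        clusters.append(current)

    results = []
    for cluster in clusters:
        total = sum(counts[p] for p in cluster)
        if total >= min_tags:  # ignore sparse, probably noisy clusters
            peak = max(cluster, key=lambda p: counts[p])
            results.append((peak, total))
    return results

# Invented tag start positions.
starts = [1000] * 5 + [1002] * 2 + [1005] + [5000] + [9000] * 4 + [9003]
print(tss_clusters(starts))  # [(1000, 8), (9000, 5)]
```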
Whole Chromosome Transcriptional Mapping
High-density whole chromosome transcriptional mapping studies have shown that the majority of sequences in cytoplasmic polyadenylated RNAs derive from non-exon regions of 10 human chromosomes. Furthermore, almost half of the transcription from these same 10 chromosomes is nonpolyadenylated. Taken together, these results indicate that the great majority of stable nuclear and cytoplasmic transcripts of these chromosomes come from regions outside the exons. This may help to explain the great differences between species, such as humans and chimpanzees, whose exons are almost identical.
Genomic Functional Profiling
- aims to determine the pattern of expression of all the genes in an organism at all stages of the organism’s life.
Deletion Analysis - mutants are created by replacing genes one at a time with an antibiotic resistance gene flanked by oligomers that serve as a barcode to identify each mutant.
RNAi Analysis - “Knocking out” genes by mutagenesis is laborious, and has so far been accomplished on a genomewide scale only in yeast. RNAi offers a faster way to inactivate genes in other organisms by triggering degradation of their mRNAs.
Tissue-Specific Functional Profiling - observe the tissue specificity of the genes that are inactivated by mutation or other means.
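Because every deletion mutant carries its own barcode, a pool of mutants can be grown together and the barcodes counted before and after growth. A mutant whose barcode becomes rarer probably lacks a gene needed under those conditions. This is a minimal sketch with invented counts. Normalizing by total reads and adding a 0.5 pseudocount are assumptions made here to avoid division by zero, not part of the slides.

```python
import math

def barcode_fitness(before: dict[str, int], after: dict[str, int],
                    pseudocount: float = 0.5) -> dict[str, float]:
    """log2 change in each barcode's share of the pool between two time points."""
    total_before = sum(before.values())
    total_after = sum(after.values())
    scores = {}
    for barcode, n0 in before.items():
        n1 = after.get(barcode, 0)
        frac0 = (n0 + pseudocount) / total_before
        frac1 = (n1 + pseudocount) / total_after
        scores[barcode] = math.log2(frac1 / frac0)  # negative = depleted mutant
    return scores

# Invented barcode counts for two hypothetical deletion mutants.
before = {"del_geneA": 1000, "del_geneB": 1000}
after = {"del_geneA": 1990, "del_geneB": 10}
scores = barcode_fitness(before, after)
most_depleted = min(scores, key=scores.get)  # 'del_geneB'
```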
Locating Target Sites for Transcription Factors
ChIP-chip or, sometimes, ChIP-on-chip analysis can be used to identify DNA-binding sites for activators and other proteins. In organisms with small genomes, such as yeast, all of the intergenic regions can be included in the microarray. But with large genomes, such as the human genome, that is now impractical. To narrow the field, CpG islands can be used, since they are associated with gene control regions. Also, if the timing or conditions of an activator’s activity are known, the control regions of genes known to be activated at those times, or under those conditions, can be used.
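The readout of a ChIP-chip experiment is, for each probe on the array, how much more signal the immunoprecipitated DNA gives than a control sample (input DNA is assumed here). This is a minimal sketch with invented probe names and intensities. It flags probes above an arbitrary 2-fold enrichment; real analyses normalize the arrays and look for runs of adjacent enriched probes.

```python
def enriched_probes(chip: dict[str, float], control: dict[str, float],
                    min_fold: float = 2.0) -> list[tuple[str, float]]:
    """Return (probe, fold enrichment) for probes at or above min_fold, highest first."""
    folds = {probe: chip[probe] / control[probe] for probe in chip}
    hits = [(probe, fold) for probe, fold in folds.items() if fold >= min_fold]
    return sorted(hits, key=lambda item: item[1], reverse=True)

# Invented probe intensities: ChIP sample vs. control.
chip = {"promoter_1": 900.0, "promoter_2": 210.0, "cpg_island_7": 1600.0}
control = {"promoter_1": 300.0, "promoter_2": 200.0, "cpg_island_7": 400.0}
print(enriched_probes(chip, control))
# [('cpg_island_7', 4.0), ('promoter_1', 3.0)]
```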
Locating Enhancers that Bind Unknown Proteins
Locating Promoters
In Situ Expression Analysis

SINGLE-NUCLEOTIDE POLYMORPHISMS: PHARMACOGENOMICS
Single-nucleotide polymorphisms (SNPs) can probably account for many genetic conditions caused by single genes, and even by multiple genes. They might also be able to predict a person’s response to drugs. A haplotype map with over 1 million SNPs will make it easier to sort out the important SNPs from those with no effect. Structural variation (insertions, deletions, inversions, and other rearrangements of chunks of DNA) is also a surprisingly prominent source of variation in human genomes. Some structural variation can in principle predispose certain people to contract diseases, but some is presumably benign, and some is demonstrably beneficial.
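At its simplest, finding SNPs means lining up the same region from different people and listing the positions where the bases differ. This is a minimal sketch assuming the sequences are already aligned and equal in length; real SNP discovery must also handle sequencing errors, gaps, and the two copies of each chromosome. All sequences are invented.

```python
def find_snps(reference: str, others: list[str]) -> dict[int, set[str]]:
    """Map 0-based position -> set of bases observed there, for positions that vary."""
    if any(len(seq) != len(reference) for seq in others):
        raise ValueError("all sequences must be aligned to the reference length")
    snps = {}
    for i, ref_base in enumerate(reference):
        bases = {ref_base} | {seq[i] for seq in others}
        if len(bases) > 1:
            snps[i] = bases
    return snps

# Invented aligned sequences from four individuals.
ref = "ACGTACGTAC"
people = ["ACGTACGTAC", "ACGTTCGTAC", "ACGTACGAAC"]
snps = find_snps(ref, people)  # variable sites at positions 4 and 7
```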
Proteomics
PROTEIN SEPARATIONS: 2-D gel electrophoresis
- followed by digestion of the separated proteins, one by one, with proteases and identification of the resulting peptides by mass spectrometry
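Identifying a protein from its peptides relies on comparing measured peptide masses with masses predicted from candidate sequences. This is a minimal sketch that assumes trypsin as the protease (the slide says only "proteases"). It uses the common simplified rule that trypsin cuts after K or R except before P. Masses are monoisotopic neutral masses built from standard residue masses, ignoring modifications and charge states.

```python
RESIDUE_MASS = {  # monoisotopic residue masses, in daltons
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for its free termini

def trypsin_digest(protein: str) -> list[str]:
    """Cut after K or R unless the next residue is P (simplified trypsin rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides

def peptide_mass(peptide: str) -> float:
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

for pep in trypsin_digest("MKWVTFISLLR"):
    print(pep, round(peptide_mass(pep), 4))
```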
PROTEIN ANALYSIS - the properties and activities of all the proteins an organism makes in its lifetime
QUANTITATIVE PROTEOMICS
PROTEIN INTERACTIONS
Traditional method - yeast two-hybrid analysis
Other methods: protein microarrays
immunoaffinity chromatography
mass spectrometry
combinations of experimental methods such as phage display with computation
Bioinformatics
involves the building and use of biological databases, some of which contain the DNA
sequences of genomes. Bioinformatics is essential for mining the massive amount of
biological data for meaningful knowledge about gene structure and expression
Using computational biology techniques, Lander and Kellis have discovered highly conserved sequence motifs in the promoter regions and 3′-UTRs of four mammalian species, including humans. The motifs in the promoter regions probably represent binding sites for transcription factors. Most of the motifs in the 3′-UTRs probably represent binding sites for miRNAs.
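A very simple way to look for candidate conserved motifs is to list the short words (k-mers) that occur in the corresponding region of every species compared. This is only loosely in the spirit of the comparative approach described above; it is not the published method, which scored motif conservation statistically across aligned genomes. The sequences and species labels are invented.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shared_kmers(region_by_species: dict[str, str], k: int = 6) -> set[str]:
    """k-mers present in the region from every species (candidate conserved motifs)."""
    sets = [kmers(seq.upper(), k) for seq in region_by_species.values()]
    return set.intersection(*sets)

# Invented "promoter" regions from four hypothetical species sharing GCACTTTA.
regions = {
    "species_A": "AAAAAAGCACTTTACCCC",
    "species_B": "GGGGGCACTTTAGGGG",
    "species_C": "TTTTTTTGCACTTTATTT",
    "species_D": "CCCCCGCACTTTAAAAA",
}
print(sorted(shared_kmers(regions)))  # ['ACTTTA', 'CACTTT', 'GCACTT']
```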
The NCBI website contains a vast store of biological information, including genomic and proteomic data. You can start with a sequence and discover the gene it belongs to, and compare that sequence with that of similar genes. You can also start with a topic you want to study and query the database for information on that topic. Or you can look up a protein of interest and view the structure of that protein in three dimensions by rotating the structure on your computer screen.
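Topic queries like these can also be sent programmatically through NCBI's E-utilities. This is a minimal sketch that builds an ESearch request URL with the Python standard library; the database and search term are only examples. The network call is left commented out; check NCBI's usage guidelines (rate limits, identifying your tool) before sending real requests.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL asking for matching record IDs in JSON format."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"

url = esearch_url("gene", "HTT[Gene Name] AND human[Organism]")
print(url)

# To actually run the query (requires network access):
# import json
# from urllib.request import urlopen
# ids = json.load(urlopen(url))["esearchresult"]["idlist"]
```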
Thank you for listening!
