Human Molecular Genetics: Fourth Edition
Chapter 9
Organization of the Human Genome
Base composition:
Average GC = 41% for the euchromatic component, but there is considerable variation
between chromosomes (38% G+C for chromosomes 4 and 13; 49% for chromosome 19).
Giemsa bands: dark bands have low GC (37%); light bands have high GC (45%).
Why are CpG dinucleotides depleted from vertebrate DNA?
Instability of vertebrate CpG dinucleotides
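The base-composition figures above can be computed directly from sequence. The sketch below is illustrative only: it assumes the commonly used observed/expected CpG ratio, N(CpG) x L / (N(C) x N(G)), which is not given in these notes, and the function names and toy sequence are hypothetical. A ratio well below 1 would indicate CpG depletion, which fits the usual explanation that methylated cytosines in CpG tend to deaminate to thymine.

```python
def gc_fraction(seq):
    """Fraction of G+C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio (assumed convention, see lead-in):
    N(CpG) * L / (N(C) * N(G))."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        raise ValueError("sequence needs at least one C and one G")
    return seq.count("CG") * len(seq) / (c * g)


toy = "ATGCGTACCTGAAGCT"          # hypothetical 16-nt sequence
print(gc_fraction(toy))           # 0.5
print(cpg_obs_exp(toy))           # 1.0
print(cpg_obs_exp("ATGCATGCATGC"))  # 0.0 (no CpG at all)
```

Real analyses would scan sliding windows along a chromosome rather than score a single short string.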
The horizontal bar in the centre is a linear map of the DNA of human chromosome 16 (the central
green segment represents heterochromatin). The black horizontal bars
at the top and bottom represent linear maps of 16 other chromosomes containing large segments that
are shared with chromosome 16, with red connecting lines marking the positions of homologous
sequences. Intrachromosomal duplications are shown by blue chevrons (^) linking the positions of
large duplicated sequences on chromosome 16.
• Organization, distribution & function of human protein-coding genes:
(A) Genes in the class III region of the HLA complex are tightly packed and overlapping in some cases.
Arrows show the direction of transcription. (B) Intron 27b of the NF1 (neurofibromatosis type I) gene is
60.5 kb long and contains three small internal genes, each with two exons, which are transcribed from the
opposing strand. The internal genes (not drawn to scale) are OGMP (oligodendrocyte myelin
glycoprotein) and EVI2A and EVI2B (human homologs of murine genes thought to be involved in
leukemogenesis and located at ecotropic viral integration sites).
- Protein-coding genes often belong to families that are clustered or
dispersed on multiple chromosomes:
Genes in a cluster are often closely related in sequence and are typically transcribed
from the same strand. Gene clusters often contain a mixture of expressed genes and
nonfunctional pseudogenes. The functional status of the θ-globin and CS-L genes is
uncertain. The scales at the top
(globin and growth hormone clusters) and the bottom (albumin cluster) are in
kilobases.
Gene Family
Three different classes of gene family according to the extent of
sequence identity and structural similarity of the protein products:
(A) DEAD box family motifs. This gene family encodes products implicated in cellular
processes involving the alteration of RNA secondary structure, such as translation
initiation and splicing. Eight very highly conserved amino acid motifs are evident,
including the DEAD box (Asp-Glu-Ala-Asp). Numbers refer to frequently found size
ranges for intervening amino acid sequences; X represents any amino acid.
(B) WD repeat family motifs. This gene family encodes products that are involved in a
variety of regulatory functions, such as regulation of cell division, transcription,
transmembrane signaling and mRNA modification. The gene products are
characterized by 4–16 tandem WD repeats that each contain a core sequence of fixed
length beginning with a GH (Gly-His) dipeptide and terminating in the dipeptide WD
(Trp-Asp), preceded by a sequence of variable length.
Pseudogenes
- Gene duplication events that give rise to multigene families also create
pseudogenes and gene fragments.
- Pseudogenes are defective gene copies that contain multiple exons while
gene fragments have only limited parts of the gene sequence (sometimes a
single exon).
- Pseudogenes can be
(a) nonprocessed (e.g. Fig 9.8 and the HLA gene family in Fig 9.10). These may
arise at unstable chromosomal locations such as pericentromeric
and subtelomeric regions. These regions are prone to recombination events
that can result in duplicated gene segments being distributed to other
chromosomal locations. An example of pericentromeric rearrangement
is the NF1 gene (Fig 9.11A); an example of subtelomeric rearrangement is the
polycystic kidney disease gene PKD1 (Fig 9.11B).
(b) processed, arising via retrotransposition by cellular reverse transcriptase
(Fig. 9.12, Table 9.8).
Origins of nonprocessed and processed pseudogenes. (A) Copying of genomic DNA sequence
containing gene A can produce duplicate copies of gene A. Strong selection pressure needs to be
applied to one of the copies to maintain gene function (bold arrow), but the other copy can be
allowed to mutate (dashed arrow). If it picks up inactivating mutations (red circles), a
nonprocessed pseudogene (ΨA) can arise. (B) A processed pseudogene arises after cellular
reverse transcriptases convert a transcript of a gene into a cDNA that then is able to integrate back
into the genome (see Figure 9.12 for details). The lack of important sequences such as a promoter
usually results in an inactive gene copy.
All the pseudogenes are located in the nuclear genome, but they do include defective copies of
genes that reside in the mitochondrial genome (mitochondrial pseudogenes).
The class I HLA gene family: a clustered gene family with nonprocessed pseudogenes and gene
fragments. (A) Structure of a class I HLA heavy-chain mRNA. The full-length mRNA contains a
polypeptide-encoding sequence with a leader sequence (L), three extracellular domains (α1, α2, and α3), a
transmembrane sequence (TM), a cytoplasmic tail (CY), and a 3’ untranslated region (3’ UTR). The three
extracellular domains are each encoded essentially by a single exon. The very small 5’ UTR is not shown.
(B) The class I HLA heavy chain gene cluster is located at 6p21.3 and comprises about 20 genes. They
include six expressed genes (filled blue boxes), four full-length nonprocessed pseudogenes (long red open
boxes labeled Ψ), and a variety of partial gene copies (short open red boxes labeled 1–7). Some of the latter
are truncated at the 5’ end (e.g. 1, 3, 5, and 6), some are truncated at the 3’ end (e.g. 7), and some contain
single exons (e.g. 2 and 4).
Figure 9.11 Dispersal of nonprocessed NF1 and PKD1 pseudogenes as a result of
pericentromeric or subtelomeric instability. (A) The NF1 (neurofibromatosis type I) gene is
located close to the centromere of human chromosome 17. It spans 283 kb and has 58 exons.
Exons are represented by thin vertical boxes; introns are shown by connecting chevrons (^).
Defective copies are found in other locations. (B) As a result of segmental duplication events
during primate evolution large components of the 46 kb PKD1 gene have been duplicated and six
PKD1 pseudogenes are located at 16p13.11.
Retrogene
Processed pseudogenes lack a promoter sequence and so are typically not expressed.
Sometimes, however, the cDNA copy integrates into a chromosomal DNA site that happens,
by chance, to be adjacent to a promoter that can drive expression of the processed gene copy.
Selection pressure may ensure that the processed gene copy continues to make a functional
gene product, in which case it is described as a retrogene. A variety of intronless retrogenes
are known to have testis-specific expression patterns and are typically autosomal homologs of
an intron-containing X-linked gene.
During male meiosis, the paired X and Y chromosomes are converted to heterochromatin,
forming the highly condensed and transcriptionally inactive XY body. Autosomal retrogenes
can provide the continued synthesis in testis cells of certain crucially important products that
are no longer synthesized by genes in the highly condensed XY body.
Figure 9.12 Processed pseudogenes and retrogenes originate by reverse transcription from RNA
transcripts. (A) The mRNA can then be converted naturally into an antisense single-stranded
cDNA by using cellular reverse transcriptase function (provided by LINE-1 repeats). (B)
Integration of the cDNA is envisaged at staggered breaks (indicated by curly arrows) in A-rich
sequences, but could be assisted by the LINE-1 endonuclease. If the A-rich sequence is included
in a 5’ overhang, it could form a hybrid with the distal end of the poly(T) of the cDNA,
facilitating second-strand synthesis. Because of the staggered breaks during integration, the
inserted sequence will be flanked by short direct repeats (boxed sequences).
RNA Genes
- Various families of small RNA molecules (60-360 nucleotides long) play a role
in assisting general gene expression, mostly at the level of post-
transcriptional processing.
• Non-glycine codons in four-codon boxes. The U/C wobble position is decoded by inosine
(chemically modified adenosine), at the 5’ position in the anticodon. Inosine can base pair with
A, C, or U. For example, the GUU and GUC codons of the four-codon valine box are decoded by
a tRNA with an anticodon of AAC, which is no doubt modified to IAC. The IAC anticodon can
recognize each of GUU, GUC, and GUA. To avoid possible translational misreading, tRNAs
with inosine at the 5’ base of the anticodon cannot be used in two-codon boxes.
• Glycine codons. The four-codon glycine box provides the one exception to the above rule.
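The inosine wobble rule above can be sketched as a small lookup. This is a simplified illustration, not a full decoding model: the table and function name are hypothetical, only the inosine pairings stated in the text (A, C, or U) are encoded, and other wobble pairs such as G–U are deliberately left out.

```python
# Watson-Crick partners for RNA bases.
WC = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Codon third-position bases accepted by the 5' base of the anticodon.
# Only inosine is given its relaxed pairing (A, C, U, as stated in the text);
# the other bases are restricted to Watson-Crick pairs (a simplification).
WOBBLE = {"I": "ACU", "A": "U", "C": "G", "G": "C", "U": "A"}


def codons_read(anticodon):
    """Return the codons (5'->3') recognized by an anticodon written 5'->3'."""
    wobble, middle, last = anticodon
    # Pairing is antiparallel: anticodon base 3 pairs with codon base 1,
    # anticodon base 2 with codon base 2, anticodon base 1 with codon base 3.
    first_two = WC[last] + WC[middle]
    return sorted(first_two + base for base in WOBBLE[wobble])


print(codons_read("AAC"))  # ['GUU']
print(codons_read("IAC"))  # ['GUA', 'GUC', 'GUU']
```

Under these assumptions the unmodified AAC anticodon reads only GUU, whereas IAC reads GUU, GUC, and GUA, matching the valine example in the text.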
Not all snRNAs within the nucleoplasm function as part of
spliceosomes. Both U1 and U2 snRNAs also have non-
spliceosomal functions. U1 snRNA is required to stimulate
transcription by RNA polymerase II. U2 snRNA is known to
stimulate transcriptional elongation by RNA polymerase II.
Figure 9.14 (A) Sm-type snRNAs contain three important recognition elements: a
5’-trimethylguanosine (TMG) cap, an Sm-protein-binding site (Sm site), and a 3’
stem–loop structure. The Sm site and the 3’ stem elements are required for
recognition by the survival motor neuron (SMN) complex for assembly into stable
core ribonucleoproteins (RNPs). The consensus Sm site directs the assembly of a
ring of the seven Sm core proteins. The TMG cap and the assembled Sm core
proteins are required for recognition by the nuclear import machinery.
Figure 9.14 (B) Lsm-type snRNAs contain a 5’-
monomethylphosphate guanosine (MPG) cap and a 3’ stem, and
terminate in a stretch of uridine residues (the Lsm site) that is
bound by the seven Lsm core proteins.
Structure and function of C/D box snoRNAs
C/D box snoRNAs guide 2’-O-methylation modifications. The box C and D motifs and a short 5’,
3’-terminal stem formed by intrastrand base pairing (shown as a series of short horizontal red
lines) constitute a kink-turn structural motif that is specifically recognized by the 15.5 kDa
snoRNP protein. The C’ and D’ boxes represent internal, frequently imperfect copies of the C and
D boxes. C/D box snoRNAs and their substrate RNAs form a 10–21 bp double helix in which the
target residue to be methylated (shown here by the letter m in a circle) is positioned exactly
five nucleotides upstream of the D or D’ box. R represents purine.
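The "five nucleotides upstream of the D or D’ box" rule can be sketched as follows. This is a minimal illustration under stated assumptions: the guide is taken to be the snoRNA antisense element ending immediately 5’ of the box, pairing is strict Watson-Crick with no mismatches or G–U pairs, and the function names and sequences are hypothetical.

```python
COMP = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp(rna):
    """Reverse complement of an RNA sequence."""
    return "".join(COMP[b] for b in reversed(rna))


def methylation_site(guide, target):
    """Predict the 2'-O-methylated residue in `target`.

    guide  -- snoRNA antisense element, 5'->3', ending immediately
              before the D or D' box (assumption of this sketch)
    target -- substrate RNA, 5'->3'
    Returns the 0-based index in `target` of the residue paired with the
    fifth snoRNA nucleotide upstream of the box, or None if no perfect
    duplex is found.
    """
    if len(guide) < 5:
        raise ValueError("guide must be at least 5 nt long")
    start = target.find(revcomp(guide))
    if start == -1:
        return None
    # The target base at start + j pairs with guide[-1 - j], so the fifth
    # nucleotide upstream of the box (guide[-5]) pairs at start + 4.
    return start + 4


guide = "AUGGCCUAGCAU"            # hypothetical 12-nt antisense element
target = "CCCAUGCUAGGCCAUGGG"     # hypothetical substrate RNA
site = methylation_site(guide, target)
print(site, target[site])         # 7 U
```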
Structure and function of H/ACA box snoRNAs
H/ACA box snoRNAs guide the conversion of uridines to pseudouridine. These RNAs fold into a
hairpin–hinge–hairpin–tail structure. One or both of the hairpins contains an internal loop,
called the pseudouridylation pocket, that forms two short (3–10 bp) duplexes with nucleotides
flanking the unpaired substrate uridine (Ψ) located about 15 nucleotides from the H or ACA box
of the snoRNA. Although each box C/D and H/ACA snoRNA could potentially direct two modification
reactions, apart from a few exceptions, most snoRNAs possess only one functional
2’-O-methylation or pseudouridylation domain.
RNA interference
RNA interference. Long double-stranded (ds) RNA is cleaved by cytoplasmic dicer to give siRNA.
siRNA duplexes are bound by argonaute complexes that unwind the duplex and degrade one strand
to give an activated complex with a single RNA strand. By base pairing with complementary RNA
sequences, the siRNA guides argonaute complexes to recognize target sequences. Activated RISC
complexes cleave any RNA strand that is complementary to their bound siRNA.
The cleaved RNA is rapidly degraded. Activated RITS complexes use their siRNA to bind
to any newly synthesized complementary RNA and then attract proteins, such as histone
methyltransferases (HMT) and sometimes DNA methyltransferases (DNMT), that can
modify the chromatin to repress transcription.
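Target recognition by an activated RISC complex amounts to a search for RNA that is complementary to the bound guide strand. The sketch below illustrates only that base-pairing step, assuming perfect complementarity over the whole guide; the function names, transcript names, and sequences are hypothetical, and real siRNA guides are about 21 nucleotides long.

```python
COMP = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp(rna):
    """Reverse complement of an RNA sequence."""
    return "".join(COMP[b] for b in reversed(rna))


def risc_targets(guide, transcripts):
    """Map transcript name -> start positions of sites that are perfectly
    complementary to the siRNA guide strand (both written 5'->3')."""
    site = revcomp(guide)
    hits = {}
    for name, seq in transcripts.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            start = seq.find(site, start + 1)  # allow overlapping sites
        if positions:
            hits[name] = positions
    return hits


guide = "UAGCUUAUCAG"  # shortened hypothetical guide strand
transcripts = {
    "mRNA_1": "GGGCUGAUAAGCUACC",  # contains the complementary site
    "mRNA_2": "GGGAAACCCUUU",      # no complementary site
}
print(risc_targets(guide, transcripts))  # {'mRNA_1': [3]}
```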
Human miRNA synthesis
Human miRNA synthesis. (B) A specific example: the synthesis of human miR-26a1.
Inverted repeats (shown as highlighted sequences overlined by long arrows) in the pri-
miRNA undergo base pairing to form a hairpin, usually with a few mismatches. The
sequences that will form the mature guide strand are shown in red; those of the passenger
strand are shown in blue. Cleavage by both the human Drosha and dicer (green arrows) is
typically asymmetric, leaving an RNA duplex with overhanging 3’ dinucleotides.
Human pri-miRNAs
The structure of human pri-miRNAs. (A) Examples of transcripts that are used exclusively to make
miRNAs: miR-21 is produced from a single hairpin within a dedicated primary transcript RNA,
whereas a single multigenic transcript with six hairpins is eventually cleaved to give six
miRNAs, namely miR-17, miR-18, miR-19a, and so on. (B, C) Examples of miRNAs that are
co-transcribed with a gene encoding either (B) a long noncoding RNA (ncRNA) or (C) a
polypeptide. In each part, the upper example shows single miRNAs located within (B) an exon of
an ncRNA (miR-155) and (C) in the 3’ untranslated region (UTR) within a terminal exon of an
mRNA (miR-198). The lower examples show multiple miRNAs located within intronic sequences of
(B) an ncRNA (miR-15a and miR-16-1) and (C) a pre-mRNA (miR-106b, miR-93, and miR-25).
Cap, m7G(5’)ppp(5’)G.
piRNA
piRNA-based transposon silencing in animal cells. (A) Primary piRNAs (piwi-protein-interacting
RNAs) are 24–31 nucleotides long and are processed from long RNA precursors transcribed from
defined loci called piRNA clusters. Any transposon inserted in the reverse orientation in the
piRNA cluster can give rise to antisense piRNAs (shown in red). (B) Antisense piRNAs are
incorporated into a piwi protein and direct its slicer activity on sense transposon transcripts.
The 3’ cleavage product is bound by another piwi protein and trimmed to piRNA size. This sense
piRNA is, in turn, used to cleave piRNA cluster transcripts and to generate more antisense
piRNAs. (C) Antisense piRNAs target the piwi complexes to cDNA for DNA methylation (left)
and/or histone modification (right). DNMT, DNA methyltransferase; HMT, histone
methyltransferase; HP1, heterochromatin protein 1.
Pseudogenes can regulate the expression of their parent gene by endogenous siRNA pathways.
Pseudogenes arise through the copying of a parent gene. Some pseudogenes are transcribed and,
depending on the genomic context, can produce an RNA that is the antisense equivalent of the
mRNA produced by the parent gene. An mRNA transcript of the parent gene (A) and an antisense
transcript of a corresponding pseudogene (ΨA) can then form a double-stranded RNA that is
cleaved by dicer to give siRNA. Endogenous siRNAs can also be produced from duplicated inverted
sequences such as the example shown here of an inverted duplication of the pseudogene (ΨA ΨA)
at the right.
Transcription through both copies of the pseudogene results in a long RNA with inverted
repeats (blue, overlined arrows) causing the RNA to fold into a hairpin that is cleaved by
dicer to give siRNA. In either case, the endogenous siRNAs are guided by RISC to interact
with, and degrade, the parent gene’s remaining mRNA transcripts. Green arrows indicate
DNA rearrangements.
Mammalian transposon families. Only a small proportion of members of any of the
illustrated transposon families may be capable of transposing; many have lost such a capacity
after acquiring inactivating mutations, and many are short truncated copies. Subclasses of the
four main families are listed, along with sizes in base pairs. ORF, open reading frame.
The human LINE-1 element. The 6.1 kb LINE-1 element has two open reading frames:
ORF1, a 1 kb open reading frame, encodes p40, an RNA-binding protein that has a nucleic
acid chaperone activity; the 4 kb ORF2 specifies a protein with both endonuclease and
reverse transcriptase activities. A bidirectional internal promoter lies within the 5’
untranslated region (UTR). At the other end, there is an An/Tn sequence, often described as
the 3’ poly(A) tail (pA). The LINE-1 endonuclease cuts one strand of a DNA duplex,
preferably within the sequence TTTT↓A, and the reverse transcriptase uses the released 3’-
OH end to prime cDNA synthesis. New insertion sites are flanked by a small target site
duplication of 2–20 bp (flanking black arrowheads).
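The preferred LINE-1 endonuclease site described above (TTTT↓A) is simple to locate in a sequence. The sketch below is illustrative only: it scans a single strand for exact matches, whereas the text says the enzyme merely prefers this sequence, and the function name and test sequence are hypothetical.

```python
import re


def l1_nick_sites(strand):
    """Return 0-based indices of the A in each TTTTA match on one DNA strand.

    The nick (TTTT|A) falls immediately before the returned index. A
    lookahead is used so that overlapping matches are not missed.
    """
    return [m.start() + 4 for m in re.finditer(r"(?=TTTTA)", strand.upper())]


seq = "GGTTTTACCTTTTTAGG"      # hypothetical target strand
print(l1_nick_sites(seq))      # [6, 14]
```

To cover both strands of a duplex, the same function could be run on the reverse complement as well.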
The human Alu repeat element. An Alu dimer. The two
monomers have similar sequences that terminate in an An/Tn
sequence but differ in size because of the insertion of a 32 bp
element within the larger repeat. Alu monomers also exist in the
human genome, as do various truncated copies of both monomers
and dimers.
Blurring of gene boundaries at the transcript level
In the past, the four genes at the top would be expected to behave as discrete non-overlapping
transcription units. As shown by recent analyses, the reality is more complicated. A variety of
transcripts often links exons in neighboring genes. The transcripts frequently include
sequences from previously unsuspected transcriptionally active regions (TARs).
https://www.dovepress.com/dna-fingerprinting-for-sample-authentication-in-biobanking-recent-pers-peer-reviewed-fulltext-article-BSAM