
Human Molecular Genetics: Fourth Edition


Tom Strachan • Andrew Read

Human Molecular Genetics


Fourth Edition

Chapter 9
Organization of the Human Genome

Copyright © Garland Science 2011


• Mitochondrial genome: 16,569 bp, 37 genes, 44% (G+C). It has a
heavy (H) strand rich in G and a light (L) strand rich in C. A small
section of the genome is triple stranded because a short segment of
the H strand (7S DNA) is synthesized repeatedly. This region contains
many of the control sequences and so is called the CR/D-loop region.
• Human cells vary in the number of mtDNA molecules
(typically thousands of copies per cell).
• Sperm do not contribute mtDNA to the zygote, so inheritance is
strictly maternal. During mitosis, mitochondria are passed on to
daughter cells by random assortment.
• mtDNA contains 37 genes: 28 use the H strand (rich in G) as
their sense strand and 9 use the L strand (rich in C).
• Of the 37 mt genes, 22 are tRNA genes, 2 are rRNA genes (16S
rRNA and 12S rRNA), and 13 are polypeptide coding (oxidative
phosphorylation).
• Because mtDNA encodes only 13 proteins, its genetic code
has been able to drift from the universal genetic code.
• 93% of mtDNA is coding and all genes lack introns. Some
coding sequences overlap, and some lack stop codons
(these are added post-transcriptionally). Replication of the H strand
starts in the D loop and proceeds unidirectionally; about two-thirds
of the way around the mtDNA, synthesis of the L strand begins from a
second origin of replication and proceeds in the opposite direction.
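The mitochondrial figures quoted in the bullets above can be cross-checked with simple arithmetic. The following is a minimal sketch that only restates the numbers from these notes; the constant and function names are illustrative, not part of any library.

```python
# Figures as quoted in the notes above (not independently re-derived).
MT_GENOME_BP = 16_569
GENES_BY_PRODUCT = {"tRNA": 22, "rRNA": 2, "polypeptide": 13}
GENES_BY_SENSE_STRAND = {"H": 28, "L": 9}
CODING_FRACTION = 0.93  # "93% of mtDNA is coding"


def total_genes(counts: dict) -> int:
    """Sum the gene counts in one of the breakdowns above."""
    return sum(counts.values())


def coding_bp(genome_bp: int, coding_fraction: float) -> int:
    """Approximate number of coding base pairs, rounded to the nearest bp."""
    return round(genome_bp * coding_fraction)


if __name__ == "__main__":
    print("genes by product:", total_genes(GENES_BY_PRODUCT))
    print("genes by sense strand:", total_genes(GENES_BY_SENSE_STRAND))
    print("approx. coding bp:", coding_bp(MT_GENOME_BP, CODING_FRACTION))
```

Both breakdowns should sum to the same 37 genes, which is a quick consistency check on the per-strand and per-product counts.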
• Nuclear genome:
- 3100 Mb (3.1 Gb) with more than 26,000 genes (about 6000 of
which are RNA genes). About 5% is highly conserved, including
1.1% protein-coding DNA and 4% conserved
untranslated and regulatory sequences.
- Coding sequences often belong to families of related
sequences generated by gene duplication, which has also produced
pseudogenes and gene fragments.
- The remaining ~95% of the human genome, which is not highly
conserved, consists largely of repetitive DNA: tandem repeats
(head to tail) or dispersed repeats, most of which result from
retrotransposition of RNA transcripts.
• The human nuclear genome consists of 24 different DNA molecules, making 24 different chromosomes.
- Chromosome content: DNA, RNA, histones, and non-histone proteins.
- The genome is divided into gene-rich, transcriptionally active euchromatic regions (2.9 Gb),
which were the target of the Human Genome Project, and constitutive heterochromatin
(200 Mb), which is transcriptionally inactive and composed of long arrays of highly
repetitive DNA that are difficult to sequence. The long arrays of tandemly repeated
transcription units encoding 28S, 18S, and 5.8S rRNA were also not sequenced.
- Each chromosome has some constitutive heterochromatin at the centromere, but
chromosomes 1, 9, 16, and 19 have significant heterochromatin in the euchromatic
region close to the centromere. Significant heterochromatin is also found on the Y
chromosome and on the acrocentric chromosomes 13, 14, 15, 21, and 22.

Base composition:
Average G+C = 41% for the euchromatic component, but there is considerable variation
between chromosomes (38% G+C for chromosomes 4 and 13; 49% for chromosome 19).
Giemsa bands: dark bands have low G+C (37%); light bands have high G+C (45%).
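The G+C percentages above are simple base-composition fractions. As a minimal sketch, the function below computes the G+C fraction of a DNA string; the function name is illustrative, and ignoring ambiguous bases such as N is an assumption made here, not something stated in the notes.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / acgt


if __name__ == "__main__":
    # Toy sequence for illustration only (not a real genomic region).
    print(f"{gc_fraction('GGCCAATT'):.0%}")
```

Applied in windows along a chromosome, the same calculation would give the kind of regional variation described for Giemsa dark and light bands.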
Why are CpG dinucleotides depleted from vertebrate DNA?
Instability of vertebrate CpG dinucleotides

In most animals (but not Drosophila melanogaster), the dinucleotide CpG is a common
target for cytosine methylation by specific cytosine methyltransferases, forming mCpG.
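CpG depletion is commonly quantified as an observed/expected ratio: the number of CpG dinucleotides found, divided by the number expected from the C and G frequencies alone. The sketch below uses that widely used ratio, which is not given in these notes, so treat the formula and the function name as assumptions for illustration. A value well below 1 indicates depletion; CpG islands have values closer to 1.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0  # no CpG is possible without both C and G
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return cpg_count * len(seq) / (c_count * g_count)


if __name__ == "__main__":
    # Toy sequences for illustration only.
    for s in ("CGCGCGCG", "CCGG", "CATG"):
        print(s, cpg_obs_exp(s))
```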
• Human gene number:
- At least 26,000 (about 6000 of which are RNA genes).
- C. elegans (a 1 mm long worm) has 959 somatic cells and a genome 1/30
the size of the human genome, yet it contains 19,099 protein-coding genes
and >1000 RNA-coding genes.
Therefore, genome complexity (gene number) does not parallel biological complexity.

• Human gene distribution:

- CpG islands are known to be strongly associated with genes, so gene
distribution was first assessed by hybridizing CpG islands to metaphase
chromosomes. The results showed that gene density is high in subtelomeric
regions and that some chromosomes (19 and 22) are gene rich while others
(X and 18) are gene poor.
- Gene density correlates with Giemsa banding. Dark bands are low in
G+C content and gene poor; the reverse is true for light bands.
• Duplication of DNA segments resulted in copy-number variation
and gene families:
- Tandem gene duplication: arises by crossover between misaligned
chromatids, either on homologous chromosomes (unequal crossover)
or on the same chromosome (unequal sister chromatid exchange)
(Fig. 9.5).
- Duplicative transposition: this involves retrotransposition.
- Gene duplication by ancestral cell fusion: organelles were established
when a prokaryotic cell invaded a eukaryotic precursor cell. Over time,
pieces of the organelle genomes were excised and transferred to the
nuclear genome, leaving the nuclear genome with duplicated copies of
genes that originated in organelle genomes.
- Large-scale subgenomic duplications: arise by chromosome
translocations (segmental duplication) (Fig. 9.6).
- Whole genome duplication: comparative genomics studies have
confirmed that whole genome duplication occurred during eukaryote
evolution (e.g. in chordates).
Segmental duplication

The horizontal bar in the centre is a linear map of the DNA of human chromosome 16 (the central
green segment represents heterochromatin). The black horizontal bars
at the top and bottom represent linear maps of 16 other chromosomes containing large segments that
are shared with chromosome 16, with red connecting lines marking the positions of homologous
sequences. Intrachromosomal duplications are shown by blue chevrons (^) linking the positions of
large duplicated sequences on chromosome 16.
• Organization, distribution, and function of human protein-coding genes:

- Human genes show enormous variation in size and internal
organization. E.g. the 2.4 Mb dystrophin gene takes about 16 hours to
transcribe.
- Diversity in exon-intron organization: a very small number of
genes lack introns. For intron-containing genes, there is an inverse
correlation between gene size and the fraction of coding DNA
(Table 9.4). This is not because exons in large genes are smaller
than those in small genes but because large genes have huge
introns.
- Diversity in repetitive DNA content: genes have repetitive DNA
within introns and flanking sequences and, to different extents, in
coding sequences.
- Different proteins can be specified by overlapping transcription units:
- Overlapping genes and genes-within-genes: gene density varies
between chromosomes and between regions of the same chromosome.
- In regions with high gene density, overlapping genes may be found;
these are typically transcribed from opposing DNA strands, e.g. in the
HLA complex (Fig. 9.7A).
- 9% of human protein-coding genes overlap, and more than 90% of
such overlaps involve transcription from opposing strands. Sometimes,
however, small protein-coding genes are located within the introns
of larger genes, e.g. neurofibromatosis type I (NF1) (Fig. 9.7B).
- Some protein-coding genes share a common promoter and are
transcribed in opposite directions.
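The dystrophin figures in the list above (a 2.4 Mb gene transcribed in about 16 hours) imply an average elongation rate that is easy to work out. The sketch below does only that arithmetic; the function name is illustrative, and the result is an average over the whole gene rather than a measured polymerase speed.

```python
def mean_transcription_rate(gene_length_nt: float, hours: float) -> float:
    """Average transcription rate in nucleotides per second."""
    return gene_length_nt / (hours * 3600)


if __name__ == "__main__":
    # Figures quoted in the notes: 2.4 Mb gene, 16 hours to transcribe.
    rate = mean_transcription_rate(2.4e6, 16)
    print(f"{rate:.1f} nt/s, or about {rate * 60:.0f} nt/min")
```

This works out to roughly 40 nucleotides per second, which shows why a gene dominated by huge introns takes so long to transcribe.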
Overlapping Genes

(A) Genes in the class III region of the HLA complex are tightly packed and overlapping in some cases.
Arrows show the direction of transcription. (B) Intron 27b of the NF1 (neurofibromatosis type I) gene is
60.5 kb long and contains three small internal genes, each with two exons, which are transcribed from the
opposing strand. The internal genes (not drawn to scale) are OGMP (oligodendrocyte myelin
glycoprotein) and EVI2A and EVI2B (human homologs of murine genes thought to be involved in
leukemogenesis and located at ecotropic viral integration sites).
- Protein-coding genes often belong to families that are clustered or
dispersed on multiple chromosomes:

- Examples of clustered gene families are shown in Fig. 9.8, while some
gene families have copies at two or more chromosomal locations
without gene clustering (Table 9.6).
Examples of human clustered gene families

Genes in a cluster are often closely related in sequence and are typically transcribed
from the same strand. Gene clusters often contain a mixture of expressed genes and
nonfunctional pseudogenes. The functional status of the θ-globin and CS-L genes is
uncertain. The scales at the top
(globin and growth hormone clusters) and the bottom (albumin cluster) are in
kilobases.
Gene Family
Gene families fall into three classes according to the extent of
sequence identity and structural similarity of the protein products:

1- Members show a high degree of homology over most of the length of the
gene or coding sequence, e.g. the histone and the α- and β-globin gene families.

2- Members may have very low sequence homology but
possess one or more common protein domains, e.g. the PAX and
SOX gene families (Table 9.7).

3- Members are defined by functionally similar short
protein motifs; these encode functionally related proteins with, e.g., a DEAD
box (Asp-Glu-Ala-Asp) or the WD repeat (Fig. 9.9).
Gene families with short conserved amino acid motifs

(A) DEAD box family motifs. This gene family encodes products implicated in cellular
processes involving the alteration of RNA secondary structure, such as translation
initiation and splicing. Eight very highly conserved amino acid motifs are evident,
including the DEAD box (Asp-Glu-Ala-Asp). Numbers refer to frequently found size
ranges for intervening amino acid sequences; X represents any amino acid.
(B) WD repeat family motifs. This gene family encodes products that are involved in a
variety of regulatory functions, such as regulation of cell division, transcription,
transmembrane signaling and mRNA modification. The gene products are
characterized by 4–16 tandem WD repeats that each contain a core sequence of fixed
length beginning with a GH (Gly-His) dipeptide and terminating in the dipeptide WD
(Trp-Asp), preceded by a sequence of variable length.
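Short motifs like these are usually located by scanning a protein sequence in one-letter amino acid code. The following is a minimal sketch: the function names are illustrative, the example sequences are made up, and the WD core length is left as a parameter because the caption says the core has a fixed length without stating it here.

```python
import re


def find_dead_boxes(protein: str) -> list[int]:
    """Return 1-based start positions of the DEAD (Asp-Glu-Ala-Asp) tetrapeptide."""
    return [m.start() + 1 for m in re.finditer("DEAD", protein.upper())]


def find_gh_wd_cores(protein: str, core_len: int) -> list[str]:
    """Return windows of exactly core_len residues that begin with GH and end with WD.

    core_len is a caller-supplied assumption; the notes do not give the value.
    """
    protein = protein.upper()
    hits = []
    for i in range(len(protein) - core_len + 1):
        window = protein[i:i + core_len]
        if window.startswith("GH") and window.endswith("WD"):
            hits.append(window)
    return hits


if __name__ == "__main__":
    # Made-up peptide strings, for illustration only.
    print(find_dead_boxes("MKDEADLLDEADK"))
    print(find_gh_wd_cores("AAGHKLMNWDAA", core_len=8))
```

A real family search would also check the other conserved motifs and spacings shown in the figure; matching a single tetrapeptide is far too permissive on its own.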
Pseudogenes
- Gene duplication events that give rise to multigene families also create
pseudogenes and gene fragments.
- Pseudogenes are defective gene copies that contain multiple exons, while
gene fragments have only limited parts of the gene sequence (sometimes a
single exon).
- Pseudogenes can be
(a) nonprocessed (e.g. Fig. 9.8 and the HLA gene family in Fig. 9.10). These may
arise at unstable chromosomal locations such as pericentromeric
and subtelomeric regions. These regions are prone to recombination events
that can result in duplicated gene segments being distributed to other
chromosomal locations. An example of pericentromeric rearrangement
is the NF1 gene (Fig. 9.11A), and an example of subtelomeric rearrangement is the
polycystic kidney disease gene PKD1 (Fig. 9.11B).
(b) processed, arising via retrotransposition by cellular reverse transcriptase
(Fig. 9.12, Table 9.8).
Origins of nonprocessed and processed pseudogenes. (A) Copying of genomic DNA sequence
containing gene A can produce duplicate copies of gene A. Strong selection pressure needs to be
applied to one of the copies to maintain gene function (bold arrow), but the other copy can be
allowed to mutate (dashed arrow). If it picks up inactivating mutations (red circles), a
nonprocessed pseudogene (ΨA) can arise. (B) A processed pseudogene arises after cellular
reverse transcriptases convert a transcript of a gene into a cDNA that then is able to integrate back
into the genome (see Figure 9.12 for details). The lack of important sequences such as a promoter
usually results in an inactive gene copy.
All the pseudogenes are located in the nuclear genome, but they do include defective copies of
genes that reside in the mitochondrial genome (mitochondrial pseudogenes).
The class I HLA gene family: a clustered gene family with nonprocessed pseudogenes and gene
fragments. (A) Structure of a class I HLA heavy-chain mRNA. The full-length mRNA contains a
polypeptide-encoding sequence with a leader sequence (L), three extracellular domains (α1, α2, and α3), a
transmembrane sequence (TM), a cytoplasmic tail (CY), and a 3’ untranslated region (3’ UTR). The three
extracellular domains are each encoded essentially by a single exon. The very small 5’ UTR is not shown.
(B) The class I HLA heavy chain gene cluster is located at 6p21.3 and comprises about 20 genes. They
include six expressed genes (filled blue boxes), four full-length nonprocessed pseudogenes (long red open
boxes labeled Ψ), and a variety of partial gene copies (short open red boxes labeled 1–7). Some of the latter
are truncated at the 5’ end (e.g. 1, 3, 5, and 6), some are truncated at the 3’ end (e.g. 7), and some contain
single exons (e.g. 2 and 4).
Figure 9.11 Dispersal of nonprocessed NF1 and PKD1 pseudogenes as a result of
pericentromeric or subtelomeric instability. (A) The NF1 neurofibromatosis type I gene is
located close to the centromere of human chromosome 17. It spans 283 kb and has 58 exons.
Exons are represented by thin vertical boxes; introns are shown by connecting chevrons (^).
Defective copies are found in other locations. (B) As a result of segmental duplication events
during primate evolution large components of the 46 kb PKD1 gene have been duplicated and six
PKD1 pseudogenes are located at 16p13.11.
Retrogene

Processed pseudogenes lack a promoter sequence and so are typically not expressed.
Sometimes, however, the cDNA copy integrates into a chromosomal DNA site that happens,
by chance, to be adjacent to a promoter that can drive expression of the processed gene copy.
Selection pressure may ensure that the processed gene copy continues to make a functional
gene product, in which case it is described as a retrogene. A variety of intronless retrogenes
are known to have testis-specific expression patterns and are typically autosomal homologs of
an intron-containing X-linked gene.
During male meiosis, the paired X and Y chromosomes are converted to heterochromatin,
forming the highly condensed and transcriptionally inactive XY body. Autosomal retrogenes
can provide the continued synthesis in testis cells of certain crucially important products that
are no longer synthesized by genes in the highly condensed XY body.
Figure 9.12 Processed pseudogenes and retrogenes originate by reverse transcription from RNA
transcripts. (A) The mRNA can then be converted naturally into an antisense single-stranded
cDNA by using cellular reverse transcriptase function (provided by LINE-1 repeats). (B)
Integration of the cDNA is envisaged at staggered breaks (indicated by curly arrows) in A-rich
sequences, but could be assisted by the LINE-1 endonuclease. If the A-rich sequence is included
in a 5’ overhang, it could form a hybrid with the distal end of the poly(T) of the cDNA,
facilitating second-strand synthesis. Because of the staggered breaks during integration, the
inserted sequence will be flanked by short direct repeats (boxed sequences).
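The reverse transcription step in panel (A) can be sketched as a base-by-base complement of the mRNA, read in reverse. This is a minimal illustration only: the function name and the toy transcript are made up, and the code models base pairing, not the enzymology.

```python
# RNA template base -> complementary DNA base.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str) -> str:
    """Return the antisense first-strand cDNA, written 5'->3'."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))


if __name__ == "__main__":
    # Toy transcript ending in a short poly(A) tail, for illustration only.
    print(first_strand_cdna("AUGGCCAAAA"))
```

Because the mRNA ends in a poly(A) tail, the antisense cDNA begins with a poly(T) run, which is the poly(T) referred to in the caption.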
RNA Genes

• Fig. 9.13 shows the functional diversity of human ncRNA (noncoding RNA).

• Table 9.9 is a compilation of all the major classes of ncRNA.

- More than 1000 human genes, mostly within large gene clusters,
encode rRNA or tRNA.

- Ribosomal RNA genes:
- Two mitochondrial rRNA molecules (12S and 16S).
- Four types of cytoplasmic rRNA: three associated with the large ribosomal
subunit (28S, 5.8S, and 5S) and one with the small ribosomal subunit
(18S).
- The 5S rRNA genes occur in small gene clusters; the largest cluster has
16 genes on 1q42, close to the telomere.
- The 28S, 5.8S, and 18S rRNAs are encoded by a single multigenic
transcription unit that is tandemly repeated to form megabase-sized
ribosomal DNA arrays (~30-40 tandem repeats, or ~100 rRNA genes, per array) on
the short arms of each of the acrocentric chromosomes 13, 14, 15, 21, and
22.
- Transfer RNA genes:
- The 22 mitochondrial tRNA genes make 22 different tRNA molecules.
- The nuclear genome has 516 tRNA genes, classified into 49 families
based on codon specificity, that make cytoplasmic tRNAs.
- Amino acid frequency does not correlate with the number of tRNA genes.
E.g. 30 tRNA genes specify the rare cysteine (2.25% of all amino acids in human
proteins) but only 21 tRNA genes specify the more abundant proline (6.10%
of the total).
- More than half the tRNA genes (273 out of 516) reside on
chromosome 6 (many clustered in a 4 Mb region) or chromosome 1. In addition,
18 of the 30 Cys tRNA genes are found in a 0.5 Mb stretch of chromosome 7.
• Dispersed gene families make various small nuclear RNAs that
facilitate general gene expression:

- Various families of small RNA molecules (60-360 nucleotides long) assist
general gene expression, mostly at the level of post-transcriptional
processing.

There are 3 types:

(i) Small nuclear RNAs (snRNAs) are U-rich and bind to various proteins to
function as ribonucleoproteins (snRNPs).
(ii) snRNAs that are involved in post-transcriptional processing of rRNA
precursors in the nucleolus were reclassified as small nucleolar RNAs
(snoRNAs).
(iii) Small Cajal body RNAs (scaRNAs) resemble snoRNAs but are confined to
Cajal bodies, also called coiled bodies (discrete structures in the
nucleus that are involved in the maturation of snRNPs).
Extensive transcriptional complexity of human genes. (A) Human genes are frequently
transcribed on both strands, as shown in this hypothetical gene cluster. (B) A single gene can
have multiple transcriptional start sites (right-angled arrows) as well as many interleaved coding
and noncoding transcripts. Exons are shown as blue boxes. Known short RNAs such as small
nucleolar RNAs (snoRNAs) and microRNAs (miRNAs) can be processed from intronic
sequences, and novel species of short RNAs that cluster around the beginning and end of genes
have recently been discovered.
• Codons in two-codon boxes. The U/C wobble position is typically decoded by a G at the 5’
base position in the tRNA anticodon. For example, for Phe, there is no tRNA with an AAA
anticodon to match the UUU codon, but the GAA anticodon can recognize both UUU and
UUC codons in the mRNA.

• Non-glycine codons in four-codon boxes. The U/C wobble position is decoded by inosine
(chemically modified adenosine), at the 5’ position in the anticodon. Inosine can base pair with
A, C, or U. For example, the GUU and GUC codons of the four-codon valine box are decoded by
a tRNA with an anticodon of AAC, which is no doubt modified to IAC. The IAC anticodon can
recognize each of GUU, GUC, and GUA. To avoid possible translational misreading, tRNAs
with inosine at the 5’ base of the anticodon cannot be used in two-codon boxes.
• Glycine codons. The four-codon glycine box provides the one exception to the above rule.
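The two wobble rules described above can be sketched as a small lookup: the 5’ base of the anticodon pairs with the third codon base, and G or inosine (I) at that position accepts more than one partner. The function and table names below are illustrative. As a simplifying assumption, any other 5’ anticodon base is treated as strictly Watson-Crick, and the glycine exception is not modeled.

```python
# Standard RNA base pairing for codon positions 1 and 2.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Wobble rules taken from the notes: 5' G reads U or C; 5' inosine reads U, C, or A.
WOBBLE_5PRIME = {"G": "UC", "I": "UCA"}


def codons_read_by(anticodon: str) -> list[str]:
    """List the codons (5'->3') recognized by an anticodon written 5'->3'."""
    a = anticodon.upper()
    if len(a) != 3:
        raise ValueError("anticodon must have exactly three bases")
    first = WATSON_CRICK[a[2]]   # codon base 1 pairs with anticodon base 3
    second = WATSON_CRICK[a[1]]  # codon base 2 pairs with anticodon base 2
    # Codon base 3 pairs with the 5' anticodon base (wobble position).
    thirds = WOBBLE_5PRIME.get(a[0]) or WATSON_CRICK[a[0]]
    return [first + second + third for third in thirds]


if __name__ == "__main__":
    print("GAA reads", codons_read_by("GAA"))  # Phe example from the notes
    print("IAC reads", codons_read_by("IAC"))  # Val example from the notes
```

The sketch reproduces both worked examples in the text: GAA covers the two Phe codons, and IAC covers three of the four Val codons.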
Not all snRNAs within the nucleoplasm function as part of
spliceosomes. Both U1 and U2 snRNAs also have non-
spliceosomal functions. U1 snRNA is required to stimulate
transcription by RNA polymerase II. U2 snRNA is known to
stimulate transcriptional elongation by RNA polymerase II.
Figure 9.14 (A) Sm-type snRNAs contain three important recognition elements: a 5’-trimethylguanosine
(TMG) cap, an Sm-protein-binding site (Sm site), and a 3’
stem–loop structure. The Sm site and the 3’ stem elements are required for
recognition by the survival motor neuron (SMN) complex for assembly into stable
core ribonucleoproteins (RNPs). The consensus Sm site directs the assembly of a
ring of the seven Sm core proteins. The TMG cap and the assembled Sm core
proteins are required for recognition by the nuclear import machinery.
Figure 9.14 (B) Lsm-type snRNAs contain a 5’-
monomethylphosphate guanosine (MPG) cap and a 3’ stem, and
terminate in a stretch of uridine residues (the Lsm site) that is
bound by the seven Lsm core proteins.
Structure and function of C/D box snoRNAs
C/D box snoRNAs guide 2’-O-methylation modifications. The box C and D motifs and a short
5’, 3’-terminal stem formed by intrastrand base pairing (shown as a series of short horizontal
red lines) constitute a kink-turn structural motif that is specifically recognized by the
15.5 kD snoRNP protein. The C’ and D’ boxes represent internal, frequently imperfect copies
of the C and D boxes. C/D box snoRNAs and their substrate RNAs form a 10–21 bp double helix
in which the target residue to be methylated (shown here by the letter m in a circle) is
positioned exactly five nucleotides upstream of the D or D’ box. R represents purine.
Structure and function of H/ACA box snoRNAs
H/ACA box snoRNAs guide the conversion of uridines to pseudouridine. These RNAs fold into
a hairpin–hinge–hairpin–tail structure. One or both of the hairpins contains an internal
loop, called the pseudouridylation pocket, that forms two short (3–10 bp) duplexes with
nucleotides flanking the unpaired substrate uridine (Ψ) located about 15 nucleotides from
the H or ACA box of the snoRNA. Although each box C/D and H/ACA snoRNA could potentially
direct two modification reactions, apart from a few exceptions, most snoRNAs possess only
one functional 2’-O-methylation or pseudouridylation domain.
RNA interference

Long double-stranded (ds) RNA is cleaved by cytoplasmic dicer to give siRNA. siRNA duplexes
are bound by argonaute complexes that unwind the duplex and degrade one strand to give an
activated complex with a single RNA strand. By base pairing with complementary RNA
sequences, the siRNA guides argonaute complexes to recognize target sequences. Activated
RISC complexes cleave any RNA strand that is complementary to their bound siRNA.

The cleaved RNA is rapidly degraded. Activated RITS complexes use their siRNA to bind
to any newly synthesized complementary RNA and then attract proteins, such as histone
methyltransferases (HMT) and sometimes DNA methyltransferases (DNMT), that can
modify the chromatin to repress transcription.
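Target recognition by a guide strand comes down to finding stretches of a transcript that are the reverse complement of the guide. The following is a minimal sketch under two simplifying assumptions that are not in the caption: pairing must be perfect along the whole guide, and the toy sequences are far shorter than real siRNAs. The function names are illustrative.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence, written 5'->3'."""
    return seq.upper().translate(RNA_COMPLEMENT)[::-1]


def find_target_sites(guide: str, transcript: str) -> list[int]:
    """Return 0-based start positions in transcript that pair perfectly with guide."""
    site = reverse_complement_rna(guide)
    transcript = transcript.upper()
    hits = []
    start = transcript.find(site)
    while start != -1:
        hits.append(start)
        start = transcript.find(site, start + 1)  # allow overlapping sites
    return hits


if __name__ == "__main__":
    # Made-up guide and transcript, for illustration only.
    print(find_target_sites("UUCGA", "GGUCGAACC"))
```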
Human miRNA synthesis

Human miRNA synthesis. (A) General scheme. The primary transcript, pri-miRNA, has a 5’ cap
(m7GpppG) and a 3’ poly(A) tail. miRNA precursors have a prominent double-stranded RNA
structure (RNA hairpin), and processing occurs through the actions of a series of
ribonuclease complexes. In the nucleus, RNASEN, the human homolog of Drosha, cleaves the
pri-miRNA to release the hairpin RNA (pre-miRNA); this is then exported to the cytoplasm,
where it is cleaved by the enzyme dicer to produce a miRNA duplex. The duplex RNA is bound
by an argonaute complex and the helix is unwound, whereupon one strand (the passenger) is
degraded by the argonaute ribonuclease, leaving the mature miRNA (the guide strand) bound
to argonaute. miR, miRNA gene.
Human miRNA synthesis

Human miRNA synthesis. (B) A specific example: the synthesis of human miR-26a1.
Inverted repeats (shown as highlighted sequences overlined by long arrows) in the pri-
miRNA undergo base pairing to form a hairpin, usually with a few mismatches. The
sequences that will form the mature guide strand are shown in red; those of the passenger
strand are shown in blue. Cleavage by both the human Drosha and dicer (green arrows) is
typically asymmetric, leaving an RNA duplex with overhanging 3’ dinucleotides.
Human pri-miRNAs

The structure of human pri-miRNAs. (A) Examples of transcripts that are used exclusively to
make miRNAs: miR-21 is produced from a single hairpin within a dedicated primary transcript
RNA; another example is a single multigenic transcript with six hairpins that will
eventually be cleaved to give six miRNAs, namely miR-17, miR-18, miR-19a, and so on.
(B, C) Examples of miRNAs that are co-transcribed with a gene encoding either (B) a long
noncoding RNA (ncRNA) or (C) a polypeptide. In each part, the upper example shows single
miRNAs located within (B) an exon of an ncRNA (miR-155) and (C) the 3’ untranslated region
(UTR) within a terminal exon of an mRNA (miR-198). The lower examples show multiple miRNAs
located within intronic sequences of (B) an ncRNA (miR-15a and miR-16-1) and (C) a pre-mRNA
(miR-106b, miR-93, and miR-25). Cap, m7G(5’)ppp(5’)G.
piRNA

piRNA-based transposon silencing in animal cells. (A) Primary piRNAs (piwi-protein-
interacting RNAs) are 24–31 nucleotides long and are processed from long RNA precursors
transcribed from defined loci called piRNA clusters. Any transposon inserted in the reverse
orientation in the piRNA cluster can give rise to antisense piRNAs (shown in red).
(B) Antisense piRNAs are incorporated into a piwi protein and direct its slicer activity on
sense transposon transcripts. The 3’ cleavage product is bound by another piwi protein and
trimmed to piRNA size. This sense piRNA is, in turn, used to cleave piRNA cluster
transcripts and to generate more antisense piRNAs. (C) Antisense piRNAs target the piwi
complexes to cDNA for DNA methylation (left) and/or histone modification (right). DNMT, DNA
methyltransferase; HMT, histone methyltransferase; HP1, heterochromatin protein 1.
Pseudogenes can regulate the expression of their parent gene by endogenous siRNA pathways.
Pseudogenes arise through the copying of a parent gene. Some pseudogenes are transcribed
and, depending on the genomic context, can produce an RNA that is the antisense equivalent
of the mRNA produced by the parent gene. An mRNA transcript of the parent gene (A) and an
antisense transcript of a corresponding pseudogene (ΨA) can then form a double-stranded RNA
that is cleaved by dicer to give siRNA. Endogenous siRNAs can also be produced from
duplicated inverted sequences such as the example shown here of an inverted duplication of
the pseudogene (ΨA ΨA) at the right.

Transcription through both copies of the pseudogene results in a long RNA with inverted
repeats (blue, overlined arrows) causing the RNA to fold into a hairpin that is cleaved by
dicer to give siRNA. In either case, the endogenous siRNAs are guided by RISC to interact
with, and degrade, the parent gene’s remaining mRNA transcripts. Green arrows indicate
DNA rearrangements.
Mammalian transposon families. Only a small proportion of members of any of the
illustrated transposon families may be capable of transposing; many have lost such a capacity
after acquiring inactivating mutations, and many are short truncated copies. Subclasses of the
four main families are listed, along with sizes in base pairs. ORF, open reading frame.
The human LINE-1 element. The 6.1 kb LINE-1 element has two open reading frames:
ORF1, a 1 kb open reading frame, encodes p40, an RNA-binding protein that has a nucleic
acid chaperone activity; the 4 kb ORF2 specifies a protein with both endonuclease and
reverse transcriptase activities. A bidirectional internal promoter lies within the 5’
untranslated region (UTR). At the other end, there is an An/Tn sequence, often described as
the 3’ poly(A) tail (pA). The LINE-1 endonuclease cuts one strand of a DNA duplex,
preferably within the sequence TTTT↓A, and the reverse transcriptase uses the released 3’-
OH end to prime cDNA synthesis. New insertion sites are flanked by a small target site
duplication of 2–20 bp (flanking black arrowheads).
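The preferred cleavage sequence TTTT↓A can be located on a DNA strand with a simple motif scan. This is a minimal sketch: the function name and toy sequence are illustrative, only the strand as given is searched, and the caption describes TTTT↓A as a preference rather than a strict requirement.

```python
import re


def l1_preferred_nick_sites(strand: str) -> list[int]:
    """Return 0-based indexes i such that the nick falls between strand[i-1] and strand[i].

    Each hit is a TTTTA occurrence; the nick lies between the last T and the A.
    """
    return [m.start() + 4 for m in re.finditer("TTTTA", strand.upper())]


if __name__ == "__main__":
    # Made-up sequence, for illustration only.
    seq = "GGTTTTACC"
    for nick in l1_preferred_nick_sites(seq):
        print(seq[:nick], "|", seq[nick:])
```

The base immediately upstream of the nick carries the released 3’-OH that the caption describes as the primer for cDNA synthesis.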
The human Alu repeat element. An Alu dimer. The two
monomers have similar sequences that terminate in an An/Tn
sequence but differ in size because of the insertion of a 32 bp
element within the larger repeat. Alu monomers also exist in the
human genome, as do various truncated copies of both monomers
and dimers.
Blurring of gene boundaries at the transcript level

In the past, the four genes at the top would be expected to behave as discrete non-overlapping
transcription units. As shown by recent analyses, the reality is more complicated. A variety of
transcripts often links exons in neighboring genes. The transcripts frequently include
sequences from previously unsuspected transcriptionally active regions (TARs).