The Human Genome Project
The human genome underlies the fundamental unity of all members of the human family,
as well as the recognition of their inherent dignity and diversity. In a symbolic
sense, it is the heritage of humanity.
- Universal Declaration on the Human Genome and Human Rights
Aashiana Thiyam
XII D
The Human Genome Project (HGP) is an international scientific research project with
the goal of determining the sequence of chemical base pairs which make up human
DNA, and of identifying and mapping all of the genes of the human genome from
both a physical and functional standpoint.
The project was coordinated by the National Institutes of Health and the U.S.
Department of Energy. Additional contributors included universities across the United
States and international partners in the United Kingdom, France, Germany, Japan, and
China. The Human Genome Project formally began in 1990 and was completed in
2003, 2 years ahead of its original schedule.
What is a genome?
A genome is an organism's complete set of DNA, including all of its genes. Each
genome contains all of the information needed to build and maintain that organism. In
humans, the entire genome (more than 3 billion DNA base pairs) exists in two
copies in the chromosomes of all cells except gametes and mature red blood cells.
The "genome" of any given individual is unique; sequencing "the human genome"
involves sequencing multiple variations of each gene. To accurately determine the
sequence of every base in the genome, scientists needed to read the three billion bases
not just once, but at least six to ten times. The project did not sequence all of the
DNA found in human cells; some heterochromatic areas (about 8% of the total genome)
remained unsequenced.
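The reading requirement described above is simple arithmetic. The short Python sketch below is only an illustration: the function name is made up, and the genome size is the rounded figure of 3 billion bases used in the text.

```python
GENOME_SIZE = 3_000_000_000  # rounded number of bases in the human genome (from the text)

def total_bases_to_read(genome_size, coverage):
    """Return how many bases are read in total if every base is read `coverage` times."""
    return genome_size * coverage

# The text says each base had to be read at least six to ten times.
for coverage in (6, 10):
    total = total_bases_to_read(GENOME_SIZE, coverage)
    print(f"{coverage} reads per base: {total:,} bases read in total")
```

In other words, sequencing a 3-billion-base genome six to ten times over means reading roughly 18 to 30 billion bases.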
FUN FACT!
Telomeres are repetitive stretches
of DNA located at the ends of
linear chromosomes. They protect
the ends of chromosomes in a
manner similar to the way the tips
of shoelaces keep them from
unraveling.
In addition to the human genome, the Human Genome Project sequenced the genomes
of several other organisms, including brewer's yeast, the roundworm, and the fruit fly.
In 2002, researchers announced that they had also completed a working draft of the
mouse genome. By studying the similarities and differences between human genes
and those of other organisms, researchers can discover the functions of particular
genes and identify which genes are critical for life.
Whose Genomes?
All humans share the same basic set of genes and genomic regulatory regions that
control the development and maintenance of biological structures and processes.
Therefore, the human reference sequence will not, and does not need to, represent an
exact match for any one person's genome.
Investigators use DNA from donors representing widely diverse populations. For
example, HGP researchers collected samples of blood (female) or sperm (male) from
a large number of people; only a few samples were processed, with source names
protected so neither donors nor scientists would know whose genomes were being
sequenced.
In addition to generating the reference sequence, another important HGP goal is to
identify many of the small DNA regions that vary among individuals and could
underlie disease susceptibility and drug responsiveness. The most common variations
are called SNPs (single nucleotide polymorphisms).
The DNA resources used for these studies came from 24 anonymous donors of
European, African, American (north, central, south), and Asian ancestry.
SNPs are sites in a genome where individuals differ in their DNA sequence, often by a
single base. For example, one person might have the DNA base A where another
might have C, and so on. Scientists believe the human genome has at least 10 million
SNPs, and they are generating different types of maps of these sites, which can occur
in both genes and non-coding regions.
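The idea of a single-base difference can be shown with a small Python sketch. The two sequences and the function name are invented for illustration. Real SNP discovery compares many individuals and has to align the sequences first, so this is only a toy version of the comparison step.

```python
def find_single_base_differences(seq_a, seq_b):
    """Return (position, base_in_a, base_in_b) for every site where two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must already be aligned to the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]

# Hypothetical stretch of DNA from two people (made-up data).
person_1 = "GATTACAGGCT"
person_2 = "GATTCCAGGCT"

for position, base_1, base_2 in find_single_base_differences(person_1, person_2):
    print(f"position {position}: person 1 has {base_1}, person 2 has {base_2}")
```

Positions are counted from zero, so the one difference in this example is reported at position 4, where one person has A and the other has C.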
A few highlights of the first publications:
- The human genome contains 3.2 billion chemical nucleotide base pairs (A, C, T, and G).
- The human genome contains about 20,000 to 25,000 genes, which make up barely 2% of
  the entire genome.
- The average gene consists of 3,000 base pairs, but sizes vary greatly, with the
  largest known human gene being dystrophin at 2.4 million base pairs.
- Functions are unknown for more than 50% of discovered genes.
- The human genome sequence is almost exactly the same (99.9%) in all people.
- Repeat sequences that do not code for proteins make up at least 50% of the
  human genome and are thought to have no direct functions, but they shed light
  on chromosome structure and dynamics.
- Over 40% of predicted human proteins share similarity with fruit-fly or worm
  proteins.
- Genes appear to be concentrated in random areas along the genome, with vast
  expanses of non-coding DNA between them.
- Chromosome 1 (the largest human chromosome) has the most genes (3,168),
  and the Y chromosome has the fewest (344).
- Particular gene sequences have been associated with numerous diseases and
  disorders, including breast cancer, muscle disease, deafness, and blindness.
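Some of the figures in this list can be checked against each other with a few lines of Python. This is a rough back-of-the-envelope sketch using only the numbers quoted above; the variable and function names are invented for illustration.

```python
GENOME_BP = 3_200_000_000   # base pairs in the human genome (from the list above)
AVERAGE_GENE_BP = 3_000     # average gene size (from the list above)
DYSTROPHIN_BP = 2_400_000   # largest known human gene (from the list above)

def gene_fraction_percent(gene_count):
    """Percentage of the genome covered by `gene_count` genes of average size."""
    return 100 * gene_count * AVERAGE_GENE_BP / GENOME_BP

# 20,000 to 25,000 average-sized genes should come out near the quoted 2%.
for gene_count in (20_000, 25_000):
    print(f"{gene_count:,} genes cover about {gene_fraction_percent(gene_count):.1f}% of the genome")

# If sequences are 99.9% identical, about 0.1% of positions differ between two people.
differing_bases = GENOME_BP // 1000
print(f"About {differing_bases:,} bases differ between two people")

# How much larger dystrophin is than an average gene.
print(f"Dystrophin is {DYSTROPHIN_BP // AVERAGE_GENE_BP} times the average gene size")
```

The estimate of roughly 2% agrees with the figure in the list, which suggests the quoted gene count and average gene size are consistent with each other.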
In February 2001, the Human Genome Project (HGP) published its results to that
date: a 90 percent complete sequence of all three billion base pairs in the human
genome. (The HGP consortium published its data in the February 15, 2001, issue of
the journal Nature).
The project had its ideological origins in the mid-1980s, but its intellectual roots
stretch back further. Alfred Sturtevant created the first Drosophila gene map in 1911.
The crucial first step in molecular genome analysis, and in much of the molecular
biological research of the last half-century, was the discovery of the double helical
structure of the DNA molecule in 1953 by Francis Crick and James Watson. The two
researchers shared the 1962 Nobel Prize (along with Maurice Wilkins) in the category
of "physiology or medicine."
In the mid-1970s, Frederick Sanger developed techniques to sequence DNA, for
which he received his second Nobel Prize in chemistry in 1980. (His first Nobel Prize,
in 1958, was for studies of protein structure.) With the automation of DNA
sequencing in the 1980s, the idea of analyzing the entire human genome was first
proposed by a few academic biologists.
The United States Department of Energy (DOE), seeking data on protecting the
genome from the mutagenic (gene-mutating) effects of radiation, became involved in
1986, and established an early genome project in 1987.
In 1988, Congress funded both the NIH (National Institutes of Health) and the DOE
to embark on further exploration of this concept, and the two government agencies
formalized an agreement by signing a Memorandum of Understanding to "coordinate
research and technical activities related to the human genome."
James Watson was appointed to lead the NIH component, which was dubbed the
Office of Human Genome Research. The following year, the Office of Human
Genome Research evolved into the National Center for Human Genome Research
(NCHGR).
In 1990, the initial planning stage was completed with the publication of a joint
research plan, "Understanding Our Genetic Inheritance: The Human Genome Project,
The First Five Years, 1991-1995." This initial research plan set out specific goals for
the first five years of what was then projected to be a 15-year research effort.
In 1992, Watson resigned, and the following year, Francis S. Collins was named
director.
The advent and employment of improved research techniques, including the use of
restriction fragment-length polymorphisms (RFLP), the polymerase chain reaction
(PCR), bacterial and yeast artificial chromosomes and pulsed-field gel
electrophoresis, enabled rapid early progress. Therefore, the 1990 plan was updated
with a new five-year plan announced in 1993 in the journal Science.
By 1996, eight NIH institutes and centres had collaborated to create the Center for
Inherited Disease Research (CIDR), for study of the genetics of complex diseases.
In 1997, the NCHGR received full institute status at NIH, becoming the National
Human Genome Research Institute (NHGRI), with Collins remaining as the
director of the new institute. A third five-year plan was announced in 1998, again
in Science.
In June 2000 came the announcement that the majority of the human genome had in
fact been sequenced, which was followed by the publication of 90 percent of the
sequence of the genome's three billion base-pairs in the journal Nature, in February
2001.
Surprises accompanying the sequence publication included: the relatively small
number of human genes, perhaps as few as 30,000; the complex architecture of
human proteins compared to their homologues - similar genes with the same functions
- in, for example, roundworms and fruit flies; and the lessons to be taught by repeat
sequences of DNA.
Deoxyribonucleic acid (DNA) is the chemical compound that contains the instructions
needed to develop and direct the activities of nearly all living organisms. DNA
molecules are made of two twisting, paired strands, often referred to as a double helix.
Each DNA strand is made of four chemical units, called nucleotide bases, which
comprise the genetic "alphabet." The bases are adenine (A), thymine (T), guanine (G),
and cytosine (C). Bases on opposite strands pair specifically: an A always pairs with a
T; a C always pairs with a G. The order of the As, Ts, Cs and Gs determines the
meaning of the information encoded in that part of the DNA molecule just as the order
of letters determines the meaning of a word.
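The pairing rule can be written as a small lookup table. The Python sketch below is illustrative only: the function names and the example strand are invented. It builds the partner strand of a given strand, and also the reverse complement, which is the partner strand read in its own 5' to 3' direction.

```python
# Base-pairing rule from the text: A pairs with T, and C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Return the base that pairs with each base of `strand`, position by position."""
    return "".join(COMPLEMENT[base] for base in strand)

def reverse_complement(strand):
    """Return the opposite strand written in its own 5' to 3' direction."""
    return complement(strand)[::-1]

strand = "ATGCGT"  # made-up example sequence
print(complement(strand))          # TACGCA
print(reverse_complement(strand))  # ACGCAT
```

Because each base has exactly one partner, knowing one strand is enough to write down the other.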
The 3 billion base pairs in the human genome are organized into 23 distinct,
physically separate microscopic units called chromosomes. All genes are arranged
linearly along the chromosomes. The nucleus of most human cells contains 2 sets of
chromosomes, 1 set given by each parent. Each set has 23 single chromosomes: 22
autosomes and an X or Y sex chromosome.
Chromosomes can be seen under a light microscope and, when stained with certain
dyes, reveal a pattern of light and dark bands reflecting regional variations in the
amounts of A and T vs. G and C. Differences in size and banding pattern allow the 23
chromosomes to be distinguished from each other, an analysis called a karyotype.
With its four-letter language, DNA contains the information needed to build the entire
human body. A gene refers to the unit of DNA that carries the instructions for making
a specific protein or set of proteins. Genes comprise only about 2% of the human
genome; the remainder consists of non-coding regions, whose functions may include
providing chromosomal structural integrity and regulating where, when, and in what
quantity proteins are made.
Each of the estimated 20,000 to 25,000 genes in the human genome codes for an
average of three proteins. These genes are located on 23 pairs of chromosomes packed
into the nucleus of a human cell and direct the production of proteins.
sequencing vectors that carry shorter pieces of the original cosmid fragments. The
next step is to make the sub-cloned fragments into sets of nested fragments differing
in length by one nucleotide, so that the specific base at the end of each successive
fragment is detectable after the fragments have been separated by gel electrophoresis.
Constructing Clones for Sequencing: Cloned DNA molecules must be made progressively
smaller and the fragments sub-cloned into new vectors to obtain fragments small enough
for use with current sequencing technology. Sequencing results are compiled to provide
longer stretches of sequence across a chromosome.
Mapping
To begin the project, researchers built maps of the human genome. They identified
thousands of DNA sequence landmarks that helped them navigate across the
chromosomes.
Developing genome maps was a necessary preparation for DNA sequencing. These
same maps also served to orient geneticists who were hunting for disease genes.
With enough landmarks in place, project scientists created "libraries" of clones that
spanned the genome. Each clone contained a manageably small fragment of human
DNA that was stored in bacteria. Scientists used the landmarks to tell them what part
of the human genome each fragment came from. This clone-by-clone approach made
it possible to double check the location of each DNA sequence.
Building Libraries
Clone libraries offer the same advantage as real libraries: orderly access to
information. In most clone libraries, the DNA fragments are stored in E. coli, bacteria
that normally live in our intestines.
Each E. coli cell stored a single segment of human DNA and represented a single
book of the library. Clone libraries allowed each human fragment to be tracked and
easily copied.
Subclones
The clone libraries were prepared using bacterial artificial chromosomes, or BACs.
Each BAC clone contained 100,000 to 200,000 bases of DNA sequence. The large
BAC clones were used to establish the order of the DNA sequences. To sequence the
DNA, smaller-sized clones were needed. Project scientists cut the large BAC clones
into smaller fragments of about 2,000 bases. These smaller fragments were typically
stored in viruses called phage that can infect E. coli cells.
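The sizes quoted above give a feel for the scale of the clone libraries. The Python sketch below is a rough estimate only. It assumes a rounded genome size of 3 billion bases and pieces that do not overlap, so the results are lower bounds; real libraries contain overlapping clones and therefore many more of them. The function name is invented for illustration.

```python
GENOME_BP = 3_000_000_000          # rounded genome size used in the text
BAC_SIZES_BP = (100_000, 200_000)  # range of BAC clone sizes given in the text
SUBCLONE_BP = 2_000                # approximate subclone size given in the text

def pieces_needed(total_bp, piece_bp):
    """Smallest number of non-overlapping pieces of size `piece_bp` that cover `total_bp`."""
    return (total_bp + piece_bp - 1) // piece_bp  # integer ceiling division

for bac_bp in BAC_SIZES_BP:
    bacs = pieces_needed(GENOME_BP, bac_bp)
    subclones = pieces_needed(bac_bp, SUBCLONE_BP)
    print(f"{bac_bp:,}-base BACs: at least {bacs:,} BACs, each cut into at least {subclones} subclones")
```

Even under these simplified assumptions, the genome needs tens of thousands of BAC clones, and each BAC needs dozens of subclones.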
Sequencing Reactions
A DNA sequencing reaction includes four main ingredients: "template" DNA copied
by the E. coli; free bases, the building blocks of DNA that come in four types; short
pieces of DNA called "primers"; and DNA polymerase, the enzyme that copies DNA.
The chemical reaction that makes DNA in a test tube is similar to what happens in a
living cell: both rely on DNA polymerase and, in both cases, DNA strands have a
head end, which is called the 5' end, and a tail end, which is called the 3' end. A DNA
strand can grow only from its 3' end, i.e. in the 5' to 3' direction.
Making DNA in cells and sequencing DNA in test tubes both depend on
complementary base pairing. The building blocks on opposite strands of DNA pair
specifically - a C always pairs with a G, and an A always pairs with a T.
The primer sequence binds to its complementary sequence on the template DNA. Free
bases that match the template sequence can attach to the new strand's growing 3' end.
Among the free bases in the solution are a few that have a fluorescent dye attached to
them. When a dye-bearing base attaches to the growing strand, it stops the new DNA
strand from growing any further. A different coloured dye is attached to each of the
four kinds of bases.
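The way dye-labelled fragments reveal a sequence can be imitated in a short Python sketch. This is a simplified model, not the actual laboratory procedure: it ignores the primer, assumes a fragment of every possible length is produced, and uses arbitrary dye colours. All names and the example template are invented for illustration.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
# Arbitrary colour assignment; real instruments use their own dye sets.
DYE_COLOUR = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
COLOUR_TO_BASE = {colour: base for base, colour in DYE_COLOUR.items()}

def synthesize_strand(template):
    """Return the new strand built on `template`, written in its own 5' to 3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(template))

def terminated_fragments(new_strand):
    """Model every stopped fragment as (length, colour of the dye-bearing last base)."""
    return [(n, DYE_COLOUR[new_strand[n - 1]]) for n in range(1, len(new_strand) + 1)]

def read_sequence(fragments):
    """Sort fragments by length, as gel electrophoresis does, and read off the dye colours."""
    return "".join(COLOUR_TO_BASE[colour] for _, colour in sorted(fragments))

template = "TACGGATC"                 # made-up template, written 5' to 3'
new_strand = synthesize_strand(template)
fragments = terminated_fragments(new_strand)
random.shuffle(fragments)             # in the tube, the fragments are all mixed together
print(read_sequence(fragments))       # GATCCGTA
```

Because each fragment is one base longer than the one before it, reading the colours from shortest to longest spells out the new strand one base at a time.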
Result
The Human Genome Project also produced other advances, not expected to be
accomplished until much later. These included an advanced draft of the mouse
genome and an initial draft of the rat genome.
Medical researchers did not wait to use data from the Human Genome Project. When
the project began in 1990, fewer than 100 human disease genes had been identified.
At the project's conclusion in 2003, the number of identified disease genes had risen
to more than 1,400.
The Human Genome Project focused on the DNA sequence of an individual. The next
step was to analyze DNA sequences from different populations. This catalogue of
human genetic variation was called the HapMap. Completed in 2005, the HapMap
used single nucleotide polymorphisms called SNPs to identify large blocks of DNA
sequence called haplotypes that tend to be inherited together. To use the data,
researchers compare haplotypes between people with and without a disease.
Haplotypes shared by people with the disease are then examined in detail to look for
associated genes. Already, scientists have used its data to identify a gene associated
with age-related macular degeneration, a disease responsible for blindness among the
elderly. It is expected that the HapMap will play an important role in identifying many
more disease genes in the future.
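The comparison of haplotypes between people with and without a disease can be pictured with a toy Python example. Everything here is an assumption made for illustration: the haplotype labels, the sample data, the function names, and the cut-off value. Real studies use far larger samples and formal statistical tests rather than a simple difference in frequency.

```python
from collections import Counter

def haplotype_frequencies(haplotypes):
    """Return the fraction of the sample that carries each haplotype."""
    counts = Counter(haplotypes)
    total = len(haplotypes)
    return {haplotype: count / total for haplotype, count in counts.items()}

def enriched_in_cases(cases, controls, min_difference=0.2):
    """Return haplotypes whose frequency in cases exceeds that in controls by `min_difference`."""
    case_freq = haplotype_frequencies(cases)
    control_freq = haplotype_frequencies(controls)
    return sorted(
        haplotype
        for haplotype, freq in case_freq.items()
        if freq - control_freq.get(haplotype, 0.0) >= min_difference
    )

# Made-up haplotypes observed in people with and without a disease.
cases = ["ACG", "ACG", "ACG", "TCG", "ACG"]
controls = ["TCG", "TCG", "ACG", "TTA", "TCG"]
print(enriched_in_cases(cases, controls))  # ['ACG']
```

A haplotype that turns up much more often in people with the disease marks a region of DNA worth examining in detail for associated genes.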
FUN FACT!
The human genome is a massive text. If the
three billion letters (or bases) of the genome
were printed in telephone books, they would
require a stack of books nearly as tall as the
Washington Monument.
Technology and resources promoted by the Human Genome Project are starting to
have profound impacts on biomedical research and promise to revolutionize the wider
spectrum of biological research and clinical medicine. Increasingly detailed genome
maps have aided researchers seeking genes associated with dozens of genetic
conditions, including myotonic dystrophy, fragile X syndrome, neurofibromatosis
types 1 and 2, inherited colon cancer, Alzheimer's disease, and familial breast cancer.
On the horizon is a new era of molecular medicine characterized less by treating
symptoms and more by looking to the most fundamental causes of disease. Rapid and
more specific diagnostic tests will make possible earlier treatment of countless
maladies. Medical researchers also will be able to devise novel therapeutic regimens
based on new classes of drugs, immunotherapy techniques, avoidance of
environmental conditions that may trigger disease, and possible augmentation or even
replacement of defective genes through gene therapy.
In 1994, taking advantage of new capabilities developed by the genome project, DOE
initiated the Microbial Genome Program to sequence the genomes of bacteria useful
in energy production, environmental remediation, toxic waste reduction, and
industrial processing.
Despite our reliance on the inhabitants of the microbial world, we know little of their
number or their nature: estimates are that less than 0.01% of all microbes have been
cultivated and characterized. Microbial genome sequencing will help lay a foundation
for knowledge that will ultimately benefit human health and the environment. The
economy will benefit from further industrial applications of microbial capabilities.
Information gleaned from the characterization of complete microbial genomes will
lead to insights into the development of such new energy-related biotechnologies as
photosynthetic systems, microbial systems that function in extreme environments and
organisms that can metabolize readily available renewable resources and waste
material with equal facility.
Expected benefits also include development of diverse new products, processes, and
test methods that will open the door to a cleaner environment. Biomanufacturing will
use nontoxic chemicals and enzymes to reduce the cost and improve the efficiency of
industrial processes. Microbial enzymes have been used to bleach paper pulp, stone
wash denim, remove lipstick from glassware, break down starch in brewing, and
coagulate milk protein for cheese production. In the health arena, microbial sequences
may help researchers find new human genes and shed light on the disease-producing
properties of pathogens.
Microbial genomics will also help pharmaceutical researchers gain a better
understanding of how pathogenic microbes cause disease. Sequencing these microbes
will help reveal vulnerabilities and identify new drug targets.
Additionally, the new genetic techniques now allow us to establish more precisely the
diversity of microorganisms and identify those critical to maintaining or restoring the
function and integrity of large and small ecosystems; this knowledge also can be
useful in monitoring and predicting environmental change. Finally, studies on
microbial communities provide models for understanding biological interactions and
evolutionary history.
Risk Assessment
- Assess health damage and risks caused by radiation exposure, including low-dose
  exposures
- Assess health damage and risks caused by exposure to mutagenic chemicals
  and cancer-causing toxins
- Reduce the likelihood of heritable mutations
Understanding the human genome will have an enormous impact on the ability to
assess risks posed to individuals by exposure to toxic agents. Scientists know that
genetic differences make some people more susceptible and others more resistant to
such agents.
Understanding genomics will help us understand human evolution and the common
biology we share with all of life. Comparative genomics between humans and other
organisms such as mice already has led to similar genes associated with diseases and
traits. Further comparative studies will help determine the yet-unknown function of
- Identify potential suspects whose DNA may match evidence left at crime
  scenes
- Exonerate persons wrongly accused of crimes
- Identify crime and catastrophe victims
- Establish paternity and other family relationships
- Identify endangered and protected species as an aid to wildlife officials (could
  be used for prosecuting poachers)
- Detect bacteria and other organisms that may pollute air, water, soil, and food
- Match organ donors with recipients in transplant programs
- Determine pedigree for seed or livestock breeds
- Authenticate consumables such as caviar and wine
Understanding plant and animal genomes will allow us to create stronger, more
disease-resistant plants and animals, reducing the costs of agriculture and providing
consumers with more nutritious, pesticide-free foods. Already growers are using
bioengineered seeds to grow insect- and drought-resistant crops that require little or
no pesticide. Farmers have been able to increase outputs and reduce waste because
their crops and herds are healthier.
Alternate uses for crops such as tobacco have been found. One researcher has
genetically engineered tobacco plants in his laboratory to produce a bacterial enzyme
that breaks down explosives such as TNT and dinitroglycerin. Waste that would take
centuries to break down in the soil can be cleaned up by simply growing these special
plants in the polluted area.
Clinical issues including the education of doctors and other health service providers,
patients, and the general public in genetic capabilities, scientific limitations, and
social risks; and implementation of standards and quality-control measures in testing
procedures.
How will genetic tests be evaluated and regulated for accuracy, reliability, and
utility?
How do we prepare healthcare professionals for the new genetics?
How do we prepare the public to make informed choices?
How do we as a society balance current scientific limitations and social risk with
long-term benefits?
Uncertainties associated with gene tests for susceptibilities and complex conditions
(e.g., heart disease) linked to multiple genes and gene-environment interactions.
Should testing be performed when no treatment is available?
Should parents have the right to have their minor children tested for adult-onset
diseases?
Are genetic tests reliable and interpretable by the medical community?
Conceptual and philosophical implications regarding human responsibility, free
will vs. genetic determinism, and concepts of health and disease.
Do people's genes make them behave in a particular way?
Can people always control their behaviour?
What is considered acceptable diversity?
Where is the line between medical treatment and enhancement?
Health and environmental issues concerning genetically modified foods (GM) and
microbes
Are GM foods and other products safe to humans and the environment?
How will these technologies affect developing nations' dependence on industrialized
nations?
Commercialization of products including property rights (patents, copyrights, and
trade secrets) and accessibility of data and materials
Bibliography