Bioinformatics Tools: Stuart M. Brown, Ph.D., Dept. of Cell Biology, NYU School of Medicine
http://ncbi.nlm.nih.gov
GenBank Sections
In addition to DNA sequences of genes,
GenBank has a number of other sections,
including:
• Protein sequences (translated from DNA)
• Expressed Sequence Tags (ESTs): short sequences
read from cDNA copies of mRNA
• Cancer Genome Anatomy Project (CGAP) gene
expression profiles of normal, pre-cancer, and cancer
cells from a wide variety of tissue types
• Single Nucleotide Polymorphisms (SNPs), which
represent genetic variations in the human population
• Online Mendelian Inheritance in Man (OMIM), a
database of human genetic disorders
Finding Genes
• GenBank contains approximately 13 billion
bases in 12 million sequence records (as of
August 2001).
• These billions of G, A, T, and C letters would
be almost useless without descriptions of
what genes they contain, the organisms they
come from, etc.
• All of this information is contained in the
"annotation" part of each sequence record.
Entrez is a Tool for Finding
Sequences
• NCBI has created a Web-based tool called
Entrez for finding sequences in GenBank.
• Each sequence in GenBank has a unique
“accession number”.
• Entrez can also search for keywords such as
gene names, protein names, and the names of
organisms or biological functions.
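Keyword searches work because every record carries annotation fields (accession number, definition, organism) ahead of the raw sequence. The sketch below shows one way to split a GenBank-style flat-file record into those fields. The record is a made-up toy (the accession TOY000001 does not exist), the function name is hypothetical, and only a small part of the flat-file layout is handled.

```python
# A made-up record for illustration only; it is not a real GenBank entry.
TOY_RECORD = """\
LOCUS       TOY000001                 24 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Made-up example record for illustration only.
ACCESSION   TOY000001
SOURCE      synthetic construct
  ORGANISM  synthetic construct
ORIGIN
        1 atggccattg taatgggccg ctga
//
"""


def parse_genbank_record(text):
    """Return (annotation_fields, sequence) for one flat-file record."""
    fields = {}
    seq_parts = []
    current = None
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):          # end-of-record marker
            break
        if not line.strip():
            continue
        if in_sequence:
            # Sequence lines hold a position number and blocks of bases;
            # keep only the letters.
            seq_parts.append("".join(ch for ch in line if ch.isalpha()))
            continue
        if line.startswith("ORIGIN"):      # sequence starts on the next line
            in_sequence = True
            continue
        if not line[:12].strip():
            # A blank keyword column means this line continues the
            # previous field.
            if current:
                fields[current] += " " + line.strip()
            continue
        parts = line.split(None, 1)        # keyword, then the rest
        current = parts[0]
        fields[current] = parts[1].strip() if len(parts) > 1 else ""
    return fields, "".join(seq_parts).upper()


fields, sequence = parse_genbank_record(TOY_RECORD)
print(fields["ACCESSION"], "|", fields["ORGANISM"], "|", len(sequence), "bases")
```

Real records also contain a FEATURES table and multi-line references, which this sketch stores only as plain text under their keywords.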
Entrez has links to Medline
• Entrez is much more than just a tool for
finding sequences by keywords.
• It contains links to PubMed/Medline.
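The slides describe the Web interface to Entrez. As a rough illustration of the same kinds of query made from a script, the sketch below builds request URLs for NCBI's E-utilities service, which is an assumption here and is not mentioned in the slides. The helper names, the search term, and the accession string are examples only, and no network request is made.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities (assumed; the slides only cover the Web pages).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, db="nucleotide", retmax=20):
    """Build a keyword-search URL for one Entrez database (e.g. nucleotide, pubmed)."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_url(accession, db="nucleotide", rettype="gb"):
    """Build a URL that retrieves one record by its accession number."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# Keyword search of sequence records, restricted by field tags.
print(esearch_url("insulin[Gene Name] AND Homo sapiens[Organism]"))
# The same kind of search against the literature database.
print(esearch_url("insulin receptor", db="pubmed"))
# Retrieval of a single record by an example accession number.
print(efetch_url("U00096"))
```

Opening any of the printed URLs in a browser returns the raw result, so the helpers can be checked by hand before being wired into a larger script.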
ACCESSIBILITY
3st:    P_3 acc |bbebbeeeeeebbeebbebbeebeeebeeeeeee eebebbebebbbbbb bbbbeb bb|
10st:   PHD acc |007006778670077007007706760777777737707007060000005000060500|
        Rel acc |103021343252044604644672424555547615444425212186671016926120|
subset: SUB acc |.......e..e..eeb.ebbeeb.e.beeeeeee.eebeb.e....bbbb...bb.b...|
Threading
• Rather than computing a 3-D structure from
scratch, it may be possible to find a similar
structure.
• The new protein must share roughly 25% or more
amino acid (aa) sequence identity with the known one.
• Uses a process called threading to create a
new structure based on a known structure.
• This still requires HUGE amounts of
computer power.
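The ~25% identity requirement can be checked before any expensive modelling is attempted. The sketch below computes percent identity from two sequences that have already been aligned; the function names are hypothetical, and counting only ungapped columns in the denominator is one common convention among several, assumed here.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of ungapped alignment columns where both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue                 # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def is_threading_candidate(aligned_a, aligned_b, threshold=25.0):
    """Apply the rough ~25% identity rule of thumb from the slide."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Two short made-up peptide fragments, already aligned.
print(percent_identity("MKTAYIAK", "MKSAYLAK"))
print(is_threading_candidate("MKTAYIAK", "MKSAYLAK"))
```

The alignment itself has to come from a separate alignment step; this helper only scores it.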
Protein Data Bank
• There is a database of all known protein
structures called the PDB.
• These have been determined by X-ray
crystallography and/or NMR.
• Anyone can download and view these structures
with a PDB viewer program.
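A PDB file is plain text, so its atom coordinates can be read without a viewer. The sketch below reads the fixed-column ATOM/HETATM lines of the PDB text format. The two sample lines are made-up values laid out in that format, not taken from a real structure, and the function name is hypothetical.

```python
# Made-up coordinates in PDB ATOM layout, for illustration only.
SAMPLE_PDB = """\
ATOM      1  N   MET A   1      27.340  24.430   2.614
ATOM      2  CA  MET A   1      26.266  25.413   2.842
END
"""


def parse_atoms(pdb_text):
    """Extract atom records from PDB-format text using fixed column positions."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue                          # skip headers, remarks, END, etc.
        atoms.append({
            "serial": int(line[6:11]),        # columns 7-11: atom serial number
            "name": line[12:16].strip(),      # columns 13-16: atom name
            "res_name": line[17:20].strip(),  # columns 18-20: residue name
            "chain": line[21],                # column 22: chain identifier
            "res_seq": int(line[22:26]),      # columns 23-26: residue number
            "x": float(line[30:38]),          # columns 31-38: x coordinate
            "y": float(line[38:46]),          # columns 39-46: y coordinate
            "z": float(line[46:54]),          # columns 47-54: z coordinate
        })
    return atoms


for atom in parse_atoms(SAMPLE_PDB):
    print(atom["name"], atom["res_name"], atom["chain"], atom["x"], atom["y"], atom["z"])
```

Columns are sliced by position rather than split on whitespace because adjacent fields in this format can touch when values are wide.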
RasMol
RasMol is one of the simplest PDB viewers.
http://www.umass.edu/microbio/rasmol/
It can work together with a web browser to let you
view the structure of any sequence found with
Entrez that has a known 3-D structure.
Gene Finding & Translation
• How can we find genes on
chromosomes?