An Overview of Genome Databases
- or: Where can I find up-to-date information about genomes?
Dave Ussery
DTU course #27104
Communicating Science: Comparative Genomics
Wednesday, first talk
9 September, 2009
Comparative Microbial Genomics group
Center for Biological Sequence analysis
Department of Systems Biology, Technical University of Denmark
Course timeline: September, October, November
Learn tools to compare genomes
Use tools to compare genomes
Posters due
Journal Clubs / podcast (recurring)
Lectures on how to write papers!
Write papers (!)
Referee / publish
Fasta format
>gi|169754007|gb|ACA76706.1| histone family protein nucleoid-structuring protein H-NS
MSVMLQSLNNIRTLRAMAREFSIDVLEEMLEKFRVVTKERREEEEQQQRELAERQEKISTWLELMKADGI
NPEELLGNSSAAAPRAGKKRQPRPAKYKFTDVNGETKTWTGQGRTPKPIAQALAEGKSLDDFLI
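The header line packs several identifiers into one pipe-delimited string. As a rough illustration, the Python sketch below splits the record shown above into its header fields and sequence. The function name `parse_fasta_record` and the dictionary keys are hypothetical, and the sketch assumes the `gi|number|database|accession|` header layout seen on this slide; other FASTA headers will not match it.

```python
def parse_fasta_record(text):
    """Split one FASTA record into header fields and a single sequence string."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("a FASTA record must start with '>'")
    header = lines[0][1:]
    # Assumed layout: gi|<gi number>|<database>|<accession.version>| <description>
    parts = header.split("|", 4)
    if len(parts) != 5 or parts[0] != "gi":
        raise ValueError("unexpected header layout")
    return {
        "gi": parts[1],
        "database": parts[2],
        "accession": parts[3],
        "description": parts[4].strip(),
        # Sequence lines are simply concatenated; line breaks carry no meaning.
        "sequence": "".join(lines[1:]),
    }


RECORD = """>gi|169754007|gb|ACA76706.1| histone family protein nucleoid-structuring protein H-NS
MSVMLQSLNNIRTLRAMAREFSIDVLEEMLEKFRVVTKERREEEEQQQRELAERQEKISTWLELMKADGI
NPEELLGNSSAAAPRAGKKRQPRPAKYKFTDVNGETKTWTGQGRTPKPIAQALAEGKSLDDFLI
"""

record = parse_fasta_record(RECORD)
print(record["accession"], record["database"], len(record["sequence"]))
```

The sequence length it reports can be compared with the "134 aa" stated in the GenBank record for the same protein.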
GenBank format
LOCUS       ACA76706                 134 aa            linear   BCT 09-MAY-2008
DEFINITION  histone family protein nucleoid-structuring protein H-NS
            [Escherichia coli ATCC 8739].
ACCESSION   ACA76706
SOURCE      Escherichia coli ATCC 8739
  ORGANISM  Escherichia coli ATCC 8739
            Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;
            Enterobacteriaceae; Escherichia.
FEATURES             Location/Qualifiers
     source          1..134
                     /organism="Escherichia coli ATCC 8739"
                     /strain="ATCC 8739"
                     /db_xref="ATCC:8739"
                     /db_xref="taxon:481805"
     Protein         1..134
                     /product="histone family protein nucleoid-structuring
                     protein H-NS"
     CDS             1..134
                     /locus_tag="EcolC_1037"
                     /coded_by="CP000946.1:1125675..1126079"
                     /note="PFAM: histone family protein nucleoid-structuring
                     protein H-NS
                     KEGG: sdy:SDY_2859 DNA-binding protein"
                     /transl_table=11
                     /db_xref="InterPro:IPR001801"
ORIGIN
        1 msvmlqslnn irtlramare fsidvleeml ekfrvvtker reeeeqqqre laerqekist
       61 wlelmkadgi npeellgnss aaapragkkr qprpakykft dvngetktwt gqgrtpkpia
      121 qalaegksld dfli
//
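The GenBank record carries the same sequence as the Fasta record, plus cross-references that can be checked against each other. The Python sketch below is one way to do that check: it rebuilds the protein from the ORIGIN block and compares its length with the nucleotide span in the /coded_by qualifier. The helper names are hypothetical, and the sketch assumes a simple forward-strand `ACCESSION:start..end` location (no `complement()` or `join()`) whose span includes the stop codon.

```python
import re


def origin_to_sequence(origin_lines):
    """Join ORIGIN lines into one uppercase sequence, dropping numbers and spaces."""
    return "".join(re.sub(r"[^a-z]", "", ln.lower()) for ln in origin_lines).upper()


def coded_by_length(location):
    """Nucleotide length of a simple 'ACCESSION:start..end' location string."""
    match = re.fullmatch(r"([\w.]+):(\d+)\.\.(\d+)", location)
    if match is None:
        raise ValueError("only simple forward-strand locations are handled here")
    start, end = int(match.group(2)), int(match.group(3))
    return end - start + 1  # GenBank coordinates are 1-based and inclusive


ORIGIN_LINES = [
    "        1 msvmlqslnn irtlramare fsidvleeml ekfrvvtker reeeeqqqre laerqekist",
    "       61 wlelmkadgi npeellgnss aaapragkkr qprpakykft dvngetktwt gqgrtpkpia",
    "      121 qalaegksld dfli",
]

protein = origin_to_sequence(ORIGIN_LINES)
nucleotides = coded_by_length("CP000946.1:1125675..1126079")
codons = nucleotides // 3

# Assumption: the coding span ends with a stop codon that is not translated.
print(len(protein), nucleotides, codons - 1 == len(protein))
```

For this record the coding region is 405 nucleotides, i.e. 135 codons, which matches 134 amino acids plus one stop codon.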
GenBank (example: CP000828)
Not curated
Author submits
Only author can revise
Multiple records for same loci common
Records can contradict each other
No limit to species included
Data exchanged among INSDC members
Akin to primary literature
Proteins identified and linked
Access via NCBI Nucleotide databases

RefSeq (example: NC_009925)
Curated
NCBI creates from existing data
NCBI revises as new data emerge
Single records for each molecule of major organisms
Limited to model organisms
Exclusive NCBI database
Akin to review articles
Proteins and transcripts identified and linked
Access via Nucleotide & Protein databases
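The two example accessions, CP000828 and NC_009925, show a practical way to tell the databases apart: RefSeq accessions begin with a two-letter prefix followed by an underscore, while GenBank/INSDC accessions contain no underscore. The Python sketch below applies that rule. The function name and the returned labels are hypothetical, and the pattern is a simplification that only checks the overall shape of the accession, not whether the record exists.

```python
import re

# Two uppercase letters, an underscore, then letters/digits and an optional version.
REFSEQ_PATTERN = re.compile(r"^[A-Z]{2}_[A-Z0-9]+(\.\d+)?$")


def accession_source(accession):
    """Guess from its shape whether an accession is a RefSeq or a GenBank/INSDC one."""
    cleaned = accession.strip().upper()
    if REFSEQ_PATTERN.match(cleaned):
        return "RefSeq"
    return "GenBank/INSDC"


for acc in ["CP000828", "NC_009925", "ACA76706.1"]:
    print(acc, "->", accession_source(acc))
```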
From: tolstoy@ncbi.nlm.nih.gov
Subject: [Genomes-announce] [NCBI Genomes] News on completion of the public
genome projects
Date: 8 September 2009 20:55:04 GMT+02:00
To: genomes-announce@ncbi.nlm.nih.gov
Dear colleagues,
Please see below the most recent news on completion of the public genome projects
Archaea; Euryarchaeota; Halobacteria; Halobacteriales; Halobacteriaceae;
Halomicrobium; Halomicrobium mukohataei DSM 12286
Sequence data files submitted to GenBank/EMBL/DDBJ can be found at:
ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Halomicrobium_mukohataei_DSM_12286
RefSeq provisional version of the genome can be found at:
ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Halomicrobium_mukohataei_DSM_12286
_______________________________________________
Genomes-announce mailing list
Genomes-announce@ncbi.nlm.nih.gov
http://www.ncbi.nlm.nih.gov/mailman/listinfo/genomes-announce
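The two FTP locations in the announcement differ only in their parent directory, and the final directory is the organism name with spaces replaced by underscores. The Python sketch below rebuilds both URLs from an organism name. It is only an illustration of the layout visible in this 2009 announcement: the function name is hypothetical, no network access is attempted, and the directory structure on the NCBI server may have changed since. Note that, as in the email, this archaeal genome sits under the "Bacteria" directory.

```python
def genome_ftp_urls(organism):
    """Build the GenBank and RefSeq FTP URLs using the layout shown in the email."""
    dirname = "_".join(organism.split())  # spaces become underscores
    base = "ftp://ftp.ncbi.nih.gov"
    return {
        # Author-submitted GenBank/EMBL/DDBJ files
        "genbank": f"{base}/genbank/genomes/Bacteria/{dirname}",
        # NCBI's RefSeq version of the same genome
        "refseq": f"{base}/genomes/Bacteria/{dirname}",
    }


urls = genome_ftp_urls("Halomicrobium mukohataei DSM 12286")
print(urls["genbank"])
print(urls["refseq"])
```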
What about other [genome-related] databases?
Nucleic Acids Research, 2008, Vol. 36, Database issue D1
doi:10.1093/nar/gkm1139
EDITORIAL
The 2008 Database Issue of Nucleic Acids Research is the fifteenth in a series dedicated to databases in the field of molecular biology. These
databases are essential resources for experimental and computational biologists alike and this compilation provides descriptions and updates of
the most important of these databases, and serves to introduce newly compiled resources that provide specialist information in the biological
area. The current issue presents 98 new databases (30 more than last year) and updates for 84 existing databases. The 2008 Database Issue is not
included in the print subscription to NAR. Instead, the Database Issue is freely available online to all under NAR’s open access model. However,
print copies are available for separate purchase by institutions and individuals.
After 5 years as the Database Issue Editor I am stepping down. It has been my great pleasure to watch the growth of so many wonderful database
resources and to help provide a forum for describing this important work. I am very pleased to announce that Michael Galperin will take over
editing the next Database Issue.
ALL authors wishing to submit articles for the 2008 Database Issue MUST contact Dr M. Galperin (nardatabase@gmail.com) with a pre-submission enquiry, no later than July 1, 2008, to check whether a submission will be suitable for the issue. The pre-submission enquiry must
present a working web accessible database for review by the Editor. Articles describing new databases will need to be received by August 15,
2008 at the latest, and should be prepared according to the instructions on the Nucleic Acids Research website (http://nar.oupjournals.org/).
Authors who are submitting articles providing update information on databases that have previously been featured in Nucleic Acids Research
should note that the deadline for submission of those articles is September 15, 2008.
The database issue would not be possible without timely reports from hundreds of reviewers. Thanks to you all! I would also like to thank
Deborah Wardle for excellent editorial assistance. Finally, I would like to thank Claire Bird, and the rest of the team at Oxford University Press
for producing this important issue.
Alex Bateman
Michael Galperin has continued to produce and enlarge the Molecular Biology Database Collection, a compendium of databases that includes all
those databases described in Nucleic Acids Research, as well as selected other databases relevant to biologists. NAR Online contains links to all
of the databases in the compilation as well as brief summaries of their content. Individuals who wish to have their database listed in the
Molecular Biology Database Collection or update a previous submission to the collection should contact Dr Michael Galperin directly
(nardatabase@gmail.com).