Lecture_3
1 author:
Arshia Nazir
University of the Punjab
All content following this page was uploaded by Arshia Nazir on 06 April 2025.
Entrez searches linked databases using a single word or a combination of words entered as the
search term.
Thus, Entrez provides a global query system and forms a web of connections with the
databases.
Depending on the database selected for search and retrieval, the primary source of some of
the retrieved entries may be other related but specialized databases, for example the
Nucleotide, RefSeq, EST, GSS, and Gene databases.
Without the data retrieval system, simultaneous searching across multiple
databases by entering the search term only once is not possible, and individual
databases have to be searched separately.
GenomeNet is the Japanese network of database and computational services for genome
research and related biomedical research.
DBGET searches and extracts entries from a wide range of molecular biology databases, and
LinkDB searches and computes links between entries in divergent databases.
DBGET/LinkDB is currently under a new development phase for integration of both
GenomeNet databases and outside databases.
Accessed on 5th April, 2025
3) SRS and EMBL-EBI
The Sequence Retrieval System (SRS) is a tool for accessing and querying biological databases,
particularly those with flat file or text formats.
SRS was originally developed by Etzold and Argos at the European Molecular Biology
Laboratory (EMBL) in the early 1990s.
It is designed to work with flat file or text format databases, such as the EMBL nucleotide
sequence databank, the Swiss-Prot protein sequence databank, and PROSITE.
SRS provides a homogeneous interface to over 80 biological databases and has been developed at
the European Bioinformatics Institute (EBI).
While the alphabet of sequences has been standardized, the actual formatting of
the sequence in text files differs from database to database.
Sequence formats differ mostly in the layout and formatting of lines of sequence
codes.
FASTA is one of the simplest and the most popular sequence formats because it
contains plain sequence information that is readable by many bioinformatics
analysis programs.
It has a single definition line that begins with a right angle bracket (>) followed
by a sequence name.
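The FASTA layout described above can be read with a few lines of Python. This is a minimal sketch, not a full parser: the function name `parse_fasta` and the two toy records are made up for illustration, and it assumes each record is one definition line starting with `>` followed by one or more sequence lines.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each definition line (without the leading '>') to its sequence."""
    chunks: Dict[str, list] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            # A new record starts at every definition line.
            current = line[1:].strip()
            chunks[current] = []
        elif current is None:
            raise ValueError("sequence data found before any '>' definition line")
        else:
            # Sequence lines may be wrapped; collect them and join later.
            chunks[current].append(line)
    return {name: "".join(parts) for name, parts in chunks.items()}


# Hypothetical two-record example; names and sequences are made up.
example = """>seq1 toy nucleotide record
ATGGCC
CTGTGG
>seq2 toy protein record
MKTAYIAK
"""
records = parse_fasta(example.splitlines())
for name, sequence in records.items():
    print(name, len(sequence))
```

Wrapped sequence lines are joined into one string per record, which is usually the form analysis programs expect.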
A search using the mRNA or gene name in the Nucleotide databases retrieves
many records.
The Nucleotide database can be searched in different ways to focus the search
more narrowly, such as by utilizing the accession or GI number or even using
the names of the authors of a submission.
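The same narrowing can be expressed programmatically through the NCBI E-utilities `esearch` endpoint. The sketch below only builds the query URL and does not contact NCBI. The helper name `nucleotide_search_url`, the example accession and author strings, and the choice of the `[Accession]` and `[Author]` field tags are assumptions made for illustration.

```python
from typing import Optional
from urllib.parse import urlencode

# Base address of the E-utilities search service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def nucleotide_search_url(term: str, field: Optional[str] = None, retmax: int = 20) -> str:
    """Build an esearch URL for the Nucleotide database.

    `field` is an Entrez field tag such as "Accession" or "Author";
    when omitted, the term is searched in all fields.
    """
    query = f"{term}[{field}]" if field else term
    params = {"db": "nucleotide", "term": query, "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


# Broad search by name (many records expected).
print(nucleotide_search_url("insulin"))
# Narrow search by an example accession string.
print(nucleotide_search_url("NM_000207", field="Accession"))
# Narrow search by a hypothetical submitter name.
print(nucleotide_search_url("Smith J", field="Author"))
```

Restricting the term to one field is what keeps the result list short; the unqualified name search is the broad case described above.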
about that track appears in the drop-down box (Figure on the Next Slide).
sequence while the green lines with arrows show the introns.
UniProt is updated and distributed every three weeks, and can be accessed
online for searches or download at http://www.uniprot.org.
It has four components, each optimized for a different use:
The UniProt Knowledgebase (UniProtKB) is an expertly curated database, a
central access point for integrated protein information with cross-references to
multiple sources. UniProtKB comprises two sections:
UniProtKB/Swiss-Prot, which is manually annotated and reviewed, and
UniProtKB/TrEMBL, which is automatically annotated and not reviewed.
The UniProt Archive (UniParc) is a comprehensive sequence repository,
reflecting the history of all protein sequences.
UniProt Reference Clusters (UniRef) merge closely related sequences based
on sequence identity to speed up searches.
The UniProt Metagenomic and Environmental Sequences (UniMES)
database is a repository specifically developed for the newly expanding area of
metagenomic and environmental data.
1. Accessing the UniProt Search
• Go to the UniProt website (https://www.uniprot.org/)
• Locate the search bar at the top of the page.
• Select "UniProtKB" from the dropdown menu to the left of the search box.
2. Performing a Search
By Accession ID:
Enter the UniProt accession ID (e.g., P05067) directly into the search box.
By Sequence:
You can search for a protein sequence using a peptide sequence or a portion of the sequence.
By Keywords:
Use keywords related to the protein, organism, or function to find relevant entries.
Advanced Search:
Use the advanced search options to filter your results based on various criteria, such as sequence length,
organism, or features.
3. Retrieving Sequences
• Once you have performed your search and have a list of entries, click the "Download" button on the
query result page.
• Select the desired download format (e.g., Flat Text, XML, RDF/XML, tab-delimited, Excel, or
FASTA).
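The search and download steps above can also be scripted against the UniProt REST service at `rest.uniprot.org`. This is a minimal sketch under stated assumptions: the helper names `entry_url`, `search_url`, and `fetch_text` are hypothetical, the keyword query is only an example, and nothing is downloaded unless `fetch_text` is called explicitly.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

UNIPROTKB = "https://rest.uniprot.org/uniprotkb"


def entry_url(accession: str, fmt: str = "fasta") -> str:
    """URL of a single UniProtKB entry in the requested format."""
    return f"{UNIPROTKB}/{accession.strip()}.{fmt}"


def search_url(query: str, fmt: str = "fasta", size: int = 10) -> str:
    """URL of a keyword search returning at most `size` entries."""
    return f"{UNIPROTKB}/search?" + urlencode(
        {"query": query, "format": fmt, "size": size}
    )


def fetch_text(url: str) -> str:
    """Download a URL and return the body as text (requires network access)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


# By accession ID, using the example accession from the steps above.
print(entry_url("P05067"))
# By keywords; the query string is an illustrative assumption.
print(search_url("insulin AND organism_id:9606"))
```

Calling `fetch_text(entry_url("P05067"))` would retrieve the FASTA record, which corresponds to choosing FASTA in the download dialog.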
Possible Download Formats for Data
Gene Annotation
Tools accessed through UniProt
Gene and Protein Sequences for Human Insulin Pre-protein
Retrieval of Protein Structure
Protein Data Bank
The Protein Data Bank (https://www.rcsb.org/) is the single worldwide archive
of structural data of biological macromolecules.
The Protein Data Bank (PDB) was established at Brookhaven National
Laboratory (BNL) in 1971 as an archive for biological macromolecular
crystal structures.
Today depositors to the PDB have varying expertise in the techniques of X-ray
crystal structure determination, NMR, cryoelectron microscopy and theoretical
modeling.
In October 1998, the management of the PDB became the responsibility of the
Research Collaboratory for Structural Bioinformatics (RCSB).
1. Access the RCSB PDB Website
Go to the RCSB PDB website (https://www.rcsb.org/).
2. Search for the Protein Structure
By PDB ID: If you know the PDB ID (e.g., 1HE8), type it into the search bar and click
"Search".
By Protein Name: You can also search by protein name or other keywords.
Advanced Search: If you have specific criteria (e.g., ligand, sequence, author), use the
advanced search options.
3. Locate the Structure Summary Page
Once you find the protein, click on the structure entry to go to its summary page.
4. Download the PDB File
On the structure summary page, find the "Download Files" option.
Select "PDB format" to download the structure file.
Save the file to your computer.
5. Further Exploration
The RCSB PDB website offers various tools for visualizing, analyzing, and exploring protein
structures.
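The download step can be sketched in Python as well. The example below builds the file URL for a PDB ID on the RCSB file server and lists the chain identifiers found in the ATOM records of a PDB-format file. The helper names `pdb_file_url` and `chain_ids` are hypothetical, and the ID check assumes the classic four-character PDB ID.

```python
from typing import Iterable, List

RCSB_DOWNLOAD = "https://files.rcsb.org/download"


def pdb_file_url(pdb_id: str) -> str:
    """URL of the PDB-format file for a four-character PDB ID."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"{RCSB_DOWNLOAD}/{pdb_id}.pdb"


def chain_ids(pdb_lines: Iterable[str]) -> List[str]:
    """Chain identifiers in order of first appearance.

    In the fixed-column PDB format the chain identifier is column 22
    (index 21) of every ATOM and HETATM record.
    """
    seen: List[str] = []
    for line in pdb_lines:
        if line.startswith(("ATOM  ", "HETATM")) and len(line) > 21:
            chain = line[21]
            if chain not in seen:
                seen.append(chain)
    return seen


# The PDB ID used as an example in the steps above.
print(pdb_file_url("1HE8"))
```

After saving the file, `chain_ids(open("1HE8.pdb"))` would report which chains the entry contains, a quick check before opening the structure in a viewer.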
PDB Home Page
PDB Results for Human Insulin
Structure Summary
Structure of Human Insulin Chain A + B
Download Options
Human Insulin Chain A (red) and Chain B (green)
Protein Annotation
Experimental Details About Insulin
Sequences of Human Insulin Chain A + B
Details of Genome Mapping
Thanks