Sequence and Structure Retrieval
Biological databases contain several types of data, such as DNA, RNA, and
protein sequences, structural information, gene expression data, molecular
interaction data, mutation data, phenotypic data, information about metabolic
pathways, and taxonomic information about organisms.
NCBI
The National Center for Biotechnology Information (NCBI) is part of the United
States National Library of Medicine (NLM), a branch of the National Institutes
of Health (NIH). NCBI houses a series of databases relevant to the basic and
applied life sciences and is an important resource for bioinformatics tools and
services. Major databases include GenBank for DNA sequences and PubMed,
a bibliographic database for biomedical literature. The GenBank sequence
database is an open-access collection of publicly available nucleotide sequences
and their protein translations. GenBank can be searched in several ways, for
example by accession number or by using gene or protein names as keywords.
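A search by accession number can also be scripted. The following is a minimal sketch, assuming the NCBI E-utilities EFetch service is used for retrieval; the function name build_efetch_url is hypothetical and introduced here only for illustration. The sketch only builds the request URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities EFetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="nuccore", rettype="gb"):
    """Return an EFetch URL for one accession number.

    db      -- Entrez database name, e.g. "nuccore" (nucleotide) or "protein"
    rettype -- record format, e.g. "gb" (GenBank flat file) or "fasta"
    """
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return EFETCH_URL + "?" + urlencode(params)


# Nucleotide record in GenBank format.
print(build_efetch_url("NM_000518.5"))

# Protein record in FASTA format.
print(build_efetch_url("NP_000509.1", db="protein", rettype="fasta"))
```

Opening either printed URL in a browser, or passing it to urllib.request.urlopen, should return the record as plain text, provided the accession exists in the chosen database.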
EMBL-EBI
The EMBL (European Molecular Biology Laboratory) Nucleotide Sequence
Database is a comprehensive database of DNA and RNA sequences submitted
directly by researchers and genome sequencing groups and collected from
the scientific literature and patent applications. The database is produced in
collaboration with DDBJ and GenBank, is maintained and distributed at the
European Bioinformatics Institute (EBI), and constitutes Europe’s primary
nucleotide sequence resource.
PDB
PDB (Protein Data Bank) is a repository for 3D structural data of proteins and
nucleic acids obtained by X-ray crystallography or NMR spectroscopy. The Research
Collaboratory for Structural Bioinformatics (RCSB) PDB provides a variety of
tools and resources for studying the structures of biological macromolecules
and their relationships to sequence, function, and any associated diseases.
7. Select “Structure” in the dropdown menu and type the name of the protein
whose structure is to be searched for in the search bar. Click on the search
button. Clicking on the desired search result will show the structure summary
of the protein.
8. Download the structure in PDB format.
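The download step can also be done from a script once the PDB ID is known. The sketch below assumes the RCSB file server address https://files.rcsb.org/download/ and the classic four-character PDB ID format (a digit followed by three letters or digits); the function name pdb_download_url is hypothetical. It only constructs the URL and does not download anything.

```python
import re

# Assumed base address of the RCSB PDB file download service.
RCSB_DOWNLOAD = "https://files.rcsb.org/download/"


def pdb_download_url(pdb_id):
    """Return the download URL of a structure in PDB format.

    Raises ValueError if pdb_id does not look like a classic
    four-character PDB ID (assumption: digit 1-9, then three
    letters or digits).
    """
    pdb_id = pdb_id.strip().upper()
    if not re.fullmatch(r"[1-9][A-Z0-9]{3}", pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"{RCSB_DOWNLOAD}{pdb_id}.pdb"


print(pdb_download_url("6kxw"))
```

The resulting URL can be opened in a browser or fetched with urllib.request.urlretrieve to save the file locally.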
EMBL-EBI
4. Scroll down to search for the desired data (DNA, RNA, or Protein).
5. To retrieve the FASTA sequence, click on “in FASTA format”.
6. Copy the FASTA sequence, paste it into Notepad, and save the file.
7. To retrieve a protein structure, go to https://www.ebi.ac.uk and click on the “About EBI
search” hyperlink that appears under the search tab.
8. Scroll down to collaborations and click on PDBe.
9. Protein structure can be searched by the name or PDB ID.
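A FASTA file saved as in step 6 is plain text: each record starts with a header line beginning with ">", followed by one or more lines of sequence. The following is a minimal sketch of reading such text into header/sequence pairs; the function name parse_fasta and the two example records are hypothetical and are not real database entries.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading ">"
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    # Join the wrapped sequence lines of each record into one string.
    return {h: "".join(parts) for h, parts in records.items()}


# Hypothetical example records, for illustration only.
example = """>seq1 hypothetical nucleotide record
ATGGTGCAT
CTGACTCCT
>seq2 hypothetical protein record
MVHLTPEEK
"""

records = parse_fasta(example)
for header, sequence in records.items():
    print(header, len(sequence))
```

To read a saved file instead, its contents can be passed in with parse_fasta(open(path).read()), where path is the location chosen when saving.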
Exercise 2: Retrieve DNA and protein sequences for the HIV-1 env gene using NCBI in
GenBank format.
a. NG_059281.1
b. NM_000518.5
c. NP_000509.1
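Accession numbers such as those listed above follow the RefSeq convention of a two-letter prefix, an underscore, a number, and a version suffix after the dot. The sketch below splits an accession into these parts and looks up the molecule type from a small prefix table; the table covers only a few common prefixes and the function name describe_accession is hypothetical.

```python
# A few common RefSeq accession prefixes (not an exhaustive list).
REFSEQ_PREFIXES = {
    "NC_": "complete genomic molecule",
    "NG_": "genomic region",
    "NM_": "mRNA transcript",
    "NR_": "non-coding RNA",
    "NP_": "protein",
}


def describe_accession(accession):
    """Return (base accession, version, molecule type) for a RefSeq accession."""
    base, _, version = accession.partition(".")  # version follows the dot
    molecule = REFSEQ_PREFIXES.get(base[:3], "unknown prefix")
    return base, version, molecule


for acc in ["NG_059281.1", "NM_000518.5", "NP_000509.1"]:
    print(acc, describe_accession(acc))
```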
Exercise 4: What use can researchers make of the obtained sequences? Explain one
such application in detail.
Exercise 6: Retrieve the protein structure with the PDB ID 6KXW from PDB.
Exercise 7: Create datasets containing a minimum of 5 sequences for each of the following: