Sequence and Structure Retrieval

Uploaded by

juhiyaadav

Aim: To retrieve DNA, RNA, protein sequences and structures
from biological databases and to create various datasets.

Biological data on DNA, RNA, and proteins are stored in digitized form in
databases. Bioinformaticians have developed these databases for the global
submission, maintenance, access, and sharing of data on biomolecules, and
the databases keep the data in a structured manner.

The databases contain several types of biological data, such as DNA, RNA, and
protein sequences, structural information, gene expression data, molecular
interaction data, mutation data, phenotypic data, information about metabolic
pathways, and taxonomic information about biological organisms.

Databases can be classified into primary (archival), secondary (curated), and
composite databases.

• Primary Databases: These databases are constructed from data collected in
laboratory experiments. The data are validated and analyzed after the
experiments and before they are uploaded to the database. Primary databases
are classified by the type of biological molecule:
o Nucleic acid databases (GenBank, EMBL, DDBJ, NDB)
o Protein databases (PIR, Swiss-Prot, TrEMBL, PDB)
o Metabolic pathway databases (KEGG, EcoCyc, and MetaCyc)
o Small molecule databases (PubChem, DrugBank, ZINC, CSD)

• Secondary Databases: These databases are built on primary biological
databases and add further information. They comprise data derived from
analyzing the primary data available in the primary databases, and they are
often referred to as curated databases. Secondary databases often draw upon
information from numerous sources, including other databases (primary and
secondary), controlled vocabulary, and scientific literature.

• Composite Databases: These databases store data from several primary
databases, which obviates the need to search multiple primary databases for
nucleotide sequences, protein sequences, protein structures, etc. Examples of
composite databases are:
1. nrdb (nonredundant database) combines and stores sequences from
GenBank (CDS translations), PDB, Swiss-Prot, PIR, and PRF.
2. INSD (International Nucleotide Sequence Database) stores nucleotide
sequences of EMBL, GenBank, and DDBJ.
3. UniProt (universal protein sequence database) is a collection of protein
sequences from PIR-PSD, Swiss-Prot, and TrEMBL.

NCBI
The National Center for Biotechnology Information (NCBI) is part of the United
States National Library of Medicine (NLM), a branch of the National Institutes
of Health (NIH). NCBI houses a series of databases relevant to the basic and
applied life sciences and is an important resource for bioinformatics tools and
services. Major databases include GenBank for DNA sequences and PubMed,
a bibliographic database for biomedical literature. The GenBank sequence
database is an open access collection of publicly available nucleotide
sequences and their protein translations. GenBank can be searched in several
ways, for example by accession number or by using gene/protein names as
keywords.

EMBL-EBI
The EMBL (European Molecular Biology Laboratory) Nucleotide Sequence
Database is a comprehensive database of DNA and RNA sequences submitted
directly by researchers and genome sequencing groups and collected from
the scientific literature and patent applications. The database is produced
in collaboration with DDBJ and GenBank, is maintained and distributed at the
European Bioinformatics Institute (EBI), and constitutes Europe’s primary
nucleotide sequence resource.

PDB
PDB (Protein Data Bank) is a repository for 3D structural data of proteins and
nucleic acids obtained by methods such as X-ray crystallography and NMR
spectroscopy. The Research Collaboratory for Structural Bioinformatics (RCSB)
PDB provides a variety of tools and resources for studying the structures of
biological macromolecules and their relationship to sequence, function, and
any associated diseases.

Retrieval of DNA, RNA, protein sequences from NCBI

1. Open NCBI (www.ncbi.nlm.nih.gov).
2. In the drop-down menu, select “All Databases”.
3. Type the name of the gene whose sequence/structure is to be retrieved in
the search box and press the search button. [AQP3]
4. Click on the Gene result AQP3 - aquaporin 3 (Gill blood group) – Homo
sapiens (human).
5. Expandable sections for the gene of interest are displayed.
6. Click on the section labelled “NCBI Reference Sequences (RefSeq)”.
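
The gene search in the steps above can also be expressed as a query against NCBI's E-utilities ESearch service. The following is a minimal sketch that only builds the query URL; the function name `build_esearch_url` and the choice of the `[Gene Name]` and `[Organism]` field tags are assumptions made for illustration, not part of the procedure above.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities web service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(gene_symbol, organism="Homo sapiens", db="gene"):
    """Return an ESearch URL that looks up a gene symbol in one NCBI database.

    The helper name and default arguments are illustrative assumptions.
    """
    term = f"{gene_symbol}[Gene Name] AND {organism}[Organism]"
    query = urlencode({"db": db, "term": term, "retmode": "json"})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


if __name__ == "__main__":
    # Print the URL; opening it in a browser returns the matching record IDs.
    print(build_esearch_url("AQP3"))
```

Opening the printed URL should return a list of matching Gene record IDs, which corresponds roughly to the result list shown by the website search.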

For Retrieval of Nucleotide Sequence (FASTA & GenBank formats)

o Under the Genomic subhead, next to NG_007476.1 RefSeqGene, click on the
hyperlink FASTA or GenBank to get the sequence in the required format.
o For Nucleotide FASTA Sequence: To download the sequence, click on
the hyperlink “Send to:”, select “Complete Record”, choose the
destination “File”, and download in “FASTA” format.
o For Nucleotide GenBank Sequence: To download the sequence, click
on the hyperlink “Send to:”, select “Complete Record”, choose the
destination “File”, and download in “GenBank” format.
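
The same download can be scripted with the E-utilities EFetch service, which may be convenient when many records are needed. This is a sketch under stated assumptions: the helper names `build_efetch_url` and `save_record` are hypothetical, the `nuccore` database is assumed for nucleotide records, and `rettype` is set to `fasta` for FASTA or `gb` for the GenBank flat file.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, db="nuccore", rettype="fasta"):
    """Build an EFetch URL; rettype 'fasta' gives FASTA, 'gb' gives GenBank."""
    query = urlencode(
        {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    )
    return f"{EFETCH}?{query}"


def save_record(accession, path, db="nuccore", rettype="fasta"):
    """Download one record and write it to a local file (needs network access)."""
    with urlopen(build_efetch_url(accession, db, rettype)) as response:
        text = response.read().decode("utf-8")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path


if __name__ == "__main__":
    # Only prints the URLs; call save_record(...) to actually download.
    print(build_efetch_url("NG_007476.1", rettype="fasta"))
    print(build_efetch_url("NG_007476.1", rettype="gb"))
```

For example, `save_record("NG_007476.1", "AQP3_genomic.fasta")` would be expected to write the same FASTA text that the "Send to:" menu produces.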

For Retrieval of RNA and Protein Sequence (FASTA format)

Under the mRNA and Protein subhead,

o Click on the hyperlink NM_001318144.2 in order to get the mRNA
sequence.
o Click on the hyperlink NP_001305073.1 in order to get the protein
sequence.
o Download the FASTA sequence for the mRNA in the same way as
described for downloading the FASTA file for nucleotide.
o Download the FASTA sequence for the protein in the same
way as described for downloading the FASTA file for nucleotide.
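
The accessions used above follow the RefSeq naming convention, in which the prefix indicates the molecule type (NG_ for a genomic region, NM_ for mRNA, NP_ for protein). A small lookup can make this explicit. The sketch below covers only a handful of common prefixes, and the function name `molecule_type` is an illustrative assumption.

```python
# A few common RefSeq accession prefixes and the molecule type they denote.
# This table is deliberately incomplete.
REFSEQ_PREFIXES = {
    "NC_": "genomic (complete chromosome or genome)",
    "NG_": "genomic region (RefSeqGene)",
    "NM_": "mRNA",
    "NR_": "non-coding RNA",
    "NP_": "protein",
    "XM_": "predicted mRNA",
    "XR_": "predicted non-coding RNA",
    "XP_": "predicted protein",
}


def molecule_type(accession):
    """Return the molecule type implied by a RefSeq prefix, or 'unknown'."""
    return REFSEQ_PREFIXES.get(accession[:3].upper(), "unknown")


if __name__ == "__main__":
    for acc in ("NG_007476.1", "NM_001318144.2", "NP_001305073.1"):
        print(acc, "->", molecule_type(acc))
```

Accessions that do not use a RefSeq prefix, such as GenBank or UniProt identifiers, fall through to "unknown" in this sketch.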

Retrieval of Protein Structure from NCBI

7. Select “Structure” in the drop-down menu, type the name of the protein
whose structure is to be retrieved in the search bar, and click the search
button. Clicking on the desired search result shows the structure summary of
the protein.
8. Download the structure in PDB format.

EMBL-EBI

Retrieval of DNA, RNA, protein sequences and structure from EMBL-EBI
1. Open the EMBL-EBI website (https://www.ebi.ac.uk).
2. Type the name of the gene of interest in the search box with “All”
selected in the drop-down menu, and click the search button.
3. The search returns results grouped under Genomes and metagenomes, Nucleotide
sequences, Protein sequences, etc. The entire list of categories appears on the left of the webpage.
4. Scroll down to find the desired data (DNA, RNA, or protein).
5. To retrieve the FASTA sequence, click on “in FASTA format”.
6. Copy the FASTA sequence, paste it into Notepad, and save the file.
7. To retrieve a protein structure, go to https://www.ebi.ac.uk and click on the “About EBI
search” hyperlink that appears under the search box.
8. Scroll down to Collaborations and click on PDBe.
9. The protein structure can be searched by name or PDB ID.
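
As an alternative to copying the sequence by hand, nucleotide records held at EMBL-EBI can be requested in FASTA format through the European Nucleotide Archive (ENA) Browser API. The sketch below only builds the request URL. It assumes the `/ena/browser/api/fasta/` endpoint, the helper name `build_ena_fasta_url` is hypothetical, and the accession in the usage example is a placeholder rather than a real actin record.

```python
from urllib.parse import quote

# Assumed FASTA endpoint of the ENA Browser API.
ENA_FASTA_BASE = "https://www.ebi.ac.uk/ena/browser/api/fasta"


def build_ena_fasta_url(accession):
    """Return the URL that requests one ENA record in FASTA format."""
    accession = accession.strip()
    if not accession:
        raise ValueError("accession must not be empty")
    return f"{ENA_FASTA_BASE}/{quote(accession)}"


if __name__ == "__main__":
    # "X12345.1" is a placeholder accession used only for illustration.
    print(build_ena_fasta_url("X12345.1"))
```

Substitute the accession shown on the search result page for the placeholder before opening the URL.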

Retrieval of protein structure from PDB

1. Open the RCSB PDB website (https://www.rcsb.org).
2. Type the protein name or PDB ID in the search box and click search.
3. The search results appear. The structure can be downloaded in various
formats, including PDB format.
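
When the PDB ID is already known, the structure file can also be fetched directly from the RCSB file server. The following is a minimal sketch, assuming a classic four-character PDB ID and the `files.rcsb.org/download` address; the helper names `build_rcsb_url` and `download_structure` are illustrative assumptions.

```python
from urllib.request import urlretrieve


def build_rcsb_url(pdb_id, file_format="pdb"):
    """Return the RCSB download URL for a four-character PDB ID."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError("expected a four-character PDB ID such as 6KXW")
    if file_format not in ("pdb", "cif"):
        raise ValueError("file_format must be 'pdb' or 'cif'")
    return f"https://files.rcsb.org/download/{pdb_id}.{file_format}"


def download_structure(pdb_id, file_format="pdb"):
    """Download the structure file next to the script (needs network access)."""
    filename = f"{pdb_id.strip().upper()}.{file_format}"
    urlretrieve(build_rcsb_url(pdb_id, file_format), filename)
    return filename


if __name__ == "__main__":
    # Only prints the URL; call download_structure("6KXW") to fetch the file.
    print(build_rcsb_url("6kxw"))
```

The `cif` option requests the mmCIF version of the same entry, which is useful for large structures that are not distributed in the legacy PDB format.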
END EXERCISES
Exercise 1: Retrieve DNA, RNA, Protein Sequence and Protein Structure for provided
protein using NCBI

a. DNA FASTA for the gene AQP3

b. RNA FASTA for the gene AQP3
c. Protein FASTA for the gene AQP3
d. Protein Structure and PDB ID for AQP7

Exercise 2 : Retrieve DNA and Protein Sequences for the HIV-1 env gene in
GenBank format using NCBI

a. DNA GenBank for env gene of HIV-1
b. Protein GenBank for env gene of HIV-1

Exercise 3 : Retrieve sequence in FASTA format for provided accession numbers and
determine the sequence type (DNA/RNA/mRNA/Protein) and name of the gene/protein.

a. NG_059281.1
b. NM_000518.5
c. NP_000509.1

Exercise 4 : Write two differences between FASTA and GenBank format.

Exercise 5 : What use can researchers make of the obtained sequences? Explain one
such application in detail.

Exercise 6 : Retrieve DNA, RNA and protein sequences from EMBL-EBI

a. DNA FASTA for the gene actin
b. RNA FASTA for the gene actin
c. Protein FASTA for the gene actin
d. Protein Structure and PDB ID for actin

Exercise 7 : Retrieve protein structure using the PDB ID 6KXW from PDB.

Exercise 8 : Create datasets containing a minimum of 5 sequences for each of the following:

a. DNA sequence dataset in FASTA format
b. RNA sequence dataset in FASTA format
c. Protein sequence dataset in FASTA format
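
A dataset in FASTA format is simply several records placed one after another in a single file, each starting with a ">" header line. The sketch below shows one way to merge individually downloaded FASTA texts into such a multi-FASTA dataset. The function names `parse_fasta` and `build_dataset` and the choice to skip records with a repeated header are assumptions made for illustration.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def build_dataset(fasta_texts, width=60):
    """Merge FASTA texts into one multi-FASTA string, skipping repeated headers."""
    seen, out = set(), []
    for text in fasta_texts:
        for header, seq in parse_fasta(text):
            if header in seen:
                continue
            seen.add(header)
            out.append(f">{header}")
            # Re-wrap the sequence to a fixed line width.
            out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n" if out else ""


if __name__ == "__main__":
    # Toy records for demonstration only, not real database entries.
    demo = [">seq1 demo\nACGTACGT\n", ">seq2 demo\nTTTTGGGG\n"]
    print(build_dataset(demo), end="")
```

Counting the records with `len(list(parse_fasta(dataset)))` gives a quick check that the dataset holds the required minimum number of sequences.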
