Data Mining & Sequence Retrieval Practical
Practical Session 1
Mr. C. Mawere
MTech Bioinformatics (JNTUH, India)
BTech Biotechnology (CUT, Zimbabwe)
Databases
Primary Nucleotide Repository
NCBI (http://www.ncbi.nlm.nih.gov)
EMBL (http://www.ebi.ac.uk/embl)
DDBJ (http://www.ddbj.nig.ac.jp/)
An Integrated Database
An integrated database is a database that acts as
the data store for multiple applications, and thus
integrates data across those applications,
e.g. UniProt, NCBI
Database Records
A typical database record contains:
• The header, which includes the sequence description, source
organism, literature references, locus field, accession number and
taxonomic classification.
• The sequence itself, which is often more easily analyzed by
computer.
Example of a database record
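The header/sequence split described above can be sketched as a small Python data structure. This is only an illustration: the class name `DatabaseRecord`, its field names and the example values are assumptions made for this sketch and do not correspond to any particular database's file format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatabaseRecord:
    """Hypothetical model of a sequence database record (header + sequence)."""

    # Header fields (annotation about the sequence)
    accession: str
    locus: str
    description: str
    organism: str
    taxonomy: List[str] = field(default_factory=list)
    references: List[str] = field(default_factory=list)
    # The sequence itself
    sequence: str = ""

    def to_fasta(self, width: int = 60) -> str:
        """Render the record as FASTA text: one '>' header line, then the
        sequence wrapped at `width` characters per line."""
        header = f">{self.accession} {self.description} [{self.organism}]"
        lines = [
            self.sequence[i:i + width]
            for i in range(0, len(self.sequence), width)
        ]
        return "\n".join([header] + lines)


# All values below are invented for illustration only.
record = DatabaseRecord(
    accession="X00000",
    locus="EXAMPLE",
    description="made-up gene",
    organism="Homo sapiens",
    taxonomy=["Eukaryota", "Metazoa"],
    sequence="ATGGCCATTG",
)
print(record.to_fasta(width=4))
```

Keeping the annotation in named fields and the residues in a single string mirrors why the sequence part is the easier one to process automatically.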
Experiment- 1
AIM: To retrieve a protein or DNA sequence in FASTA format
from the NCBI database and analyze the obtained data.
PROCEDURE:
STEP 1: Open a web browser and type the web address of the
required NCBI database.
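The same retrieval can also be scripted rather than done in the browser. The sketch below assumes the NCBI E-utilities `efetch` endpoint with `rettype=fasta`; the helper names (`build_efetch_url`, `fetch_fasta`, `parse_fasta`, `base_composition`) and the sample record are invented for illustration, and the residue count stands in for whatever analysis the practical actually asks for.

```python
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities EFetch endpoint (assumed here; check current NCBI docs).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(db, accession):
    """Build an EFetch URL asking for a record as plain-text FASTA."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def fetch_fasta(db, accession):
    """Download the FASTA text (needs network access; not called below)."""
    with urlopen(build_efetch_url(db, accession), timeout=30) as response:
        return response.read().decode("utf-8")


def parse_fasta(text):
    """Split FASTA text into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def base_composition(sequence):
    """Count how often each residue letter occurs in the sequence."""
    return dict(Counter(sequence.upper()))


# Made-up FASTA text, so the example runs without contacting NCBI.
sample = ">example_1 made-up record\nATGC\nGGCC\n"
for header, sequence in parse_fasta(sample):
    print(header, len(sequence), base_composition(sequence))
```

For a real record, `fetch_fasta("protein", accession)` or `fetch_fasta("nuccore", accession)` would supply the text passed to `parse_fasta`.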
PROCEDURE:
STEP 1: Open the UniProt database (www.uniprot.org).
NB: Topology refers to the way in which constituent parts are interrelated or arranged.
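A UniProt entry can also be addressed directly by accession. The sketch below assumes the UniProt REST URL pattern `https://rest.uniprot.org/uniprotkb/<accession>.fasta` and the usual UniProtKB FASTA header layout (`>db|accession|entry_name description OS=... OX=... GN=... PE=... SV=...`). The function names and the sample header, including its accession, are invented for illustration.

```python
import re


def uniprot_fasta_url(accession):
    """URL of the FASTA view of a UniProtKB entry (assumed REST pattern)."""
    return f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"


# >db|accession|entry_name followed by free text and KEY=value fields
HEADER_RE = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry_name>\S+)\s+(?P<rest>.*)$"
)


def parse_uniprot_header(header):
    """Split a UniProtKB FASTA header line into a dict of named fields."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError("not a UniProtKB-style FASTA header")
    fields = match.groupdict()
    rest = fields.pop("rest")
    # Split before each KEY= marker; the first part is the description.
    parts = re.split(r"\s+(?=(?:OS|OX|GN|PE|SV)=)", rest)
    fields["description"] = parts[0]
    for part in parts[1:]:
        key, _, value = part.partition("=")
        fields[key] = value
    return fields


# Invented header, for illustration only.
sample_header = (
    ">sp|Q00000|EXMP_HUMAN Example protein "
    "OS=Homo sapiens OX=9606 GN=EXMP PE=1 SV=1"
)
print(uniprot_fasta_url("Q00000"))
print(parse_uniprot_header(sample_header))
```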
Experiment- 3
AIM: For a given protein, find the PDB code, release date,
resolution, classification and PubMed citation from the PDB structure
database.
PROCEDURE:
STEP 1: Open the PDB database (www.pdb.org).
STEP 3: Click on the relevant PDB code displayed on the result page.
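The items listed in the aim can also be read from a PDB entry programmatically. The sketch below assumes the RCSB Data API entry endpoint (`https://data.rcsb.org/rest/v1/core/entry/<PDB code>`) and the JSON field names shown in the code. Those field names reflect one reading of that API and should be checked against its current schema. The helper names and the mock entry, including its PDB code, are invented for illustration.

```python
import json
from urllib.request import urlopen


def rcsb_entry_url(pdb_id):
    """URL of the entry record in the RCSB Data API (assumed endpoint)."""
    return f"https://data.rcsb.org/rest/v1/core/entry/{pdb_id.upper()}"


def fetch_entry(pdb_id):
    """Download and decode the entry JSON (needs network; not called below)."""
    with urlopen(rcsb_entry_url(pdb_id), timeout=30) as response:
        return json.load(response)


def summarise_entry(entry):
    """Pick out the items named in the aim from an entry dictionary.

    The key names are assumptions about the RCSB JSON layout; any key that
    is absent simply yields None.
    """
    accession = entry.get("rcsb_accession_info", {})
    info = entry.get("rcsb_entry_info", {})
    keywords = entry.get("struct_keywords", {})
    citation = entry.get("rcsb_primary_citation", {})
    resolution = info.get("resolution_combined") or [None]
    return {
        "pdb_code": entry.get("rcsb_id"),
        "release_date": accession.get("initial_release_date"),
        "resolution_angstrom": resolution[0],
        "classification": keywords.get("pdbx_keywords"),
        "pubmed_id": citation.get("pdbx_database_id_PubMed"),
        "citation_title": citation.get("title"),
    }


# Mock entry with invented values, so the example runs offline.
mock_entry = {
    "rcsb_id": "0XXX",
    "rcsb_accession_info": {"initial_release_date": "2000-01-01T00:00:00Z"},
    "rcsb_entry_info": {"resolution_combined": [2.5]},
    "struct_keywords": {"pdbx_keywords": "EXAMPLE CLASSIFICATION"},
    "rcsb_primary_citation": {
        "pdbx_database_id_PubMed": 0,
        "title": "Made-up citation title",
    },
}
print(summarise_entry(mock_entry))
```

With network access, `summarise_entry(fetch_entry(code))` would give the same summary for the PDB code chosen in STEP 3.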
PDB Database Search
PDB Database Search (Cont.)
PDB Output of SARS-CoV-2 ID
SARS-CoV-2 complex & ACE2
Literature Citation
Release Date of ACE2-SARS-CoV-2
Assignment on Data Mining
Given the protein ID P0DTC2, and using the UniProt database:
1. Find the function of the protein