Fasta Sequence Database

FASTA is a program for rapid alignment of protein and DNA sequences. It looks for matching sequence patterns called k-tuples and attempts to build local alignments from these matches. Due to its speed and sensitivity, FASTA is useful for sequence database searches. The FASTA algorithm finds similarities in four steps: identifying k-tuples, evaluating matches, extending alignments, and performing local alignment. An example showed FASTA identifying excellent matches between test and database sequences.

Uploaded by

Klevis Xhyra

FASTA SEQUENCE DATABASE

ALDO LISI
NAIL SPAHIJA
KLEVIS XHYRA
INTRODUCTION
Bioinformatics is a growing field that applies computer science to the study of biology.
These slides discuss the search techniques used in FASTA and give an example of a FASTA database search.
FASTA uses a heuristic approach to sequence alignment.
FASTA
FASTA is a program for rapid alignment of pairs of
protein and DNA sequences.

Rather than comparing individual residues in the two sequences, FASTA looks for matching sequence patterns, or words, called k-tuples, and then attempts to build a local alignment based on these word matches.

Because the algorithm is relatively fast and sensitive, FASTA is well suited to sequence database searches.
FASTA vs BLAST
FASTA is comparable to BLAST in approach and reliability, but it is considerably slower.

FASTA was found to be more sensitive than BLAST at finding related protein sequences when the BLOSUM62 scoring matrix was used.

Two studies have shown that neither FASTA nor BLAST is as good at finding protein families in sequence databases as exhaustive local homology searches by dynamic programming.

FASTA, BLAST and other searches may be performed on.

The Blocks database should be searched with test sequences to find proteins that share amino acid motifs with a test protein sequence.
FASTA Algorithm
FASTA compares an input DNA or protein sequence to all of the sequences in a target sequence database. It then reports the best-matching sequences and the local alignments of these sequences with the input sequence.
The input sequence and database are usually in FASTA format.

FASTA finds sequence similarities between the test sequence and each database sequence in four steps:
1. first, the ten regions with the highest density of k-tuple matches (identities) in each sequence pair are located;
2. second, these regions are rescored using a symbol comparison (scoring) table, and the highest-scoring regions are identified and used to rank the database sequences;
3. third, longer regions of similarity are generated by joining initial regions whose scores exceed a certain threshold, and the joined regions are rescored using a gap penalty; and
4. fourth, an optimal local alignment is computed between the input test sequence and the best-scoring database sequences.
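Step 3 can be pictured as a chaining problem. The sketch below is a simplified illustration and not Pearson's implementation. Each initial region is a hypothetical `Region` record holding its start and end in both sequences plus its rescored score. A region may follow another only if it starts after the other ends in both sequences, and every join costs a fixed `join_penalty`; the value of 20 is an assumption of this sketch. The best chained total plays roughly the role of FASTA's initn score.

```python
from typing import List, NamedTuple

class Region(NamedTuple):
    q_start: int  # start in the query (test) sequence
    q_end: int    # end in the query, inclusive
    s_start: int  # start in the library sequence
    s_end: int    # end in the library sequence, inclusive
    score: int    # rescored region score (init1-style)

def chain_regions(regions: List[Region], join_penalty: int = 20) -> int:
    """Best total score of a chain of non-overlapping regions, subtracting
    join_penalty for every join (a simplified, initn-like score)."""
    if not regions:
        return 0
    ordered = sorted(regions, key=lambda r: (r.q_start, r.s_start))
    best = [r.score for r in ordered]  # best chain that ends at region j
    for j, rj in enumerate(ordered):
        for i in range(j):
            ri = ordered[i]
            # ri can precede rj only if it ends before rj starts in both sequences
            if ri.q_end < rj.q_start and ri.s_end < rj.s_start:
                best[j] = max(best[j], best[i] - join_penalty + rj.score)
    return max(best)

regions = [Region(0, 9, 5, 14, 40), Region(15, 24, 22, 31, 35), Region(3, 8, 40, 45, 30)]
print(chain_regions(regions))  # 55 = 40 + 35 - 20
```

The third region lies on an incompatible diagonal, so only the first two are joined.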
In the initial search for regions of similarity, FASTA uses a computer method known as hash coding.
In this method, a lookup table showing the positions of each sequence word of length k, called a k-tuple, is constructed for each sequence.
The relative positions of each word in the two sequences are then calculated by subtracting the position in the first sequence from that in the second.
Words that have the same offset reveal a region of alignment between the two sequences.
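The hash-coding idea can be sketched as follows. This is an illustrative simplification, and the function names are our own. It builds a k-tuple lookup table for the first sequence and scans the second. It then counts matches per offset (position in the second sequence minus position in the first). Offsets with many matches correspond to the diagonals that FASTA examines further.

```python
from collections import defaultdict
from typing import Dict, List

def ktuple_table(seq: str, k: int) -> Dict[str, List[int]]:
    """Lookup table mapping each k-tuple to the 0-based positions where it starts."""
    table: Dict[str, List[int]] = defaultdict(list)
    for pos in range(len(seq) - k + 1):
        table[seq[pos:pos + k]].append(pos)
    return table

def diagonal_hits(seq1: str, seq2: str, k: int) -> Dict[int, int]:
    """Count k-tuple matches per offset (position in seq2 minus position in seq1).
    Matches sharing an offset lie on the same diagonal of a dot matrix."""
    table = ktuple_table(seq1, k)
    hits: Dict[int, int] = defaultdict(int)
    for pos2 in range(len(seq2) - k + 1):
        for pos1 in table.get(seq2[pos2:pos2 + k], []):
            hits[pos2 - pos1] += 1
    return dict(hits)

print(diagonal_hits("GAHRFYW", "TTGAHRFYW", 2))  # {2: 6}
```

The work grows with the number of words scanned plus the number of matches found, rather than with every pair of residues.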
With hash coding, the number of comparisons increases linearly with the average sequence length.
In contrast, the time taken by dot matrix and dynamic programming methods increases as the square of the average sequence length.
The k-tuple length is user-defined and is usually 1 or 2 for protein sequences.
For nucleic acid sequences, the k-tuple is much longer (5-20), because short k-tuples occur far more often by chance in the 4-letter nucleotide alphabet.
The larger the k-tuple, the faster but less thorough the database search.
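A rough calculation shows why nucleotide searches need longer k-tuples. It assumes, unrealistically, that residues are independent and uniformly distributed. Under that assumption, two k-tuples match by chance with probability (1/A)^k for an alphabet of size A. The expected number of chance word matches between sequences of lengths m and n is therefore about (m - k + 1)(n - k + 1) / A^k.

```python
def expected_random_hits(m: int, n: int, alphabet_size: int, k: int) -> float:
    """Expected chance k-tuple matches between two random sequences of lengths
    m and n, assuming independent, uniformly distributed residues."""
    if k > m or k > n:
        return 0.0
    return (m - k + 1) * (n - k + 1) / alphabet_size ** k

# Two 300-residue sequences, protein (20 letters) versus DNA (4 letters)
for label, size, k in [("protein", 20, 1), ("protein", 20, 2), ("DNA", 4, 2), ("DNA", 4, 6)]:
    print(f"{label:7s} k={k}: {expected_random_hits(300, 300, size, k):8.1f}")
```

Consider two 300-residue sequences. With k = 2, this gives about 224 chance matches for protein but about 5,588 for DNA. For DNA, the count only falls to about 21 at k = 6.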
Significance of FASTA matches
A major focus of the package is the calculation of accurate similarity statistics. These let biologists judge whether an alignment is likely to have occurred by chance or whether it can be used to infer homology. The FASTA package is available from the University of Virginia and the European Bioinformatics Institute.
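The sketch below shows the kind of question such statistics answer. It computes a simple z-score and an empirical p-value for an alignment score against scores obtained with unrelated (for example, shuffled) sequences. This is a generic illustration that assumes such background scores are available. It is not the exact statistics the FASTA package reports, which model the score distribution more carefully.

```python
import statistics
from typing import List, Tuple

def score_significance(score: float, background: List[float]) -> Tuple[float, float]:
    """Return (z_score, empirical_p) of `score` relative to background scores.

    background: scores of the test sequence against unrelated or shuffled
    sequences (an assumption of this sketch).
    empirical_p: fraction of background scores >= score, with +1 smoothing.
    """
    if len(background) < 2:
        raise ValueError("need at least two background scores")
    mean = statistics.mean(background)
    sd = statistics.stdev(background)
    if sd == 0:
        raise ValueError("background scores have zero spread")
    z = (score - mean) / sd
    at_least = sum(1 for s in background if s >= score)
    p = (at_least + 1) / (len(background) + 1)
    return z, p

background = [22, 25, 19, 30, 27, 24, 21, 26, 23, 28]
z, p = score_significance(80, background)
print(f"z = {z:.1f}, empirical p = {p:.3f}")  # z = 16.4, empirical p = 0.091
```

With only ten background scores, the smallest reachable p-value is 1/11. Real searches therefore rely on the many scores from the whole database.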

The FASTA file format used as input for this software is now widely used by other sequence database search tools (such as BLAST) and sequence alignment programs (Clustal, T-Coffee, etc.).
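A FASTA-format record is a header line beginning with '>' followed by one or more lines of sequence. Below is a minimal parser for that basic layout. The function name `parse_fasta` is our own, and the sketch ignores less common variants of the format.

```python
from typing import Dict, Iterable, List, Optional

def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {identifier: sequence}.

    The identifier is the first whitespace-delimited word after '>'.
    Lines before the first header and blank lines are ignored.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:  # store the previous record
                records[current_id] = "".join(chunks)
            words = line[1:].split()
            current_id = words[0] if words else ""
            chunks = []
        elif current_id is not None:
            chunks.append(line.upper())  # sequence lines may wrap
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records

example = """>seq1 test protein
MKTAYIAKQR
QISFVKSHFS
>seq2
ACGTACGT
"""
print(parse_fasta(example.splitlines()))
# {'seq1': 'MKTAYIAKQRQISFVKSHFS', 'seq2': 'ACGTACGT'}
```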
Implementations of FASTA
There are several implementations of the FASTA algorithm:

FASTA - compares a protein sequence to another protein sequence or to a protein library, or a DNA sequence to another DNA sequence or to a DNA sequence library.

TFASTA - compares a protein sequence to a DNA sequence or DNA sequence library. It translates each DNA sequence in all six possible reading frames and then compares each frame to the protein sequence.

LFASTA - identifies one or more regions of similarity between two sequences.

PLFASTA - presents a dot matrix plot of regions of sequence similarity between two sequences.
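To make the TFASTA idea concrete, here is a sketch of translating a DNA sequence in all six reading frames: three on the given strand and three on its reverse complement. It uses the standard genetic code. The function names are our own, and the comparison of each frame with the protein sequence is not shown.

```python
from typing import Dict, List

BASES = "TCAG"
# Standard genetic code; codons listed in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(dna: str) -> str:
    """Translate codon by codon; an incomplete trailing codon is dropped and
    codons containing characters other than A, C, G, T become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))

def six_frames(dna: str) -> List[str]:
    """Protein translations of frames +1, +2, +3, then -1, -2, -3."""
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:])
            for strand in (dna, reverse_complement)
            for offset in range(3)]

print(six_frames("ATGGCCTAA"))
# ['MA*', 'WP', 'GL', 'LGH', '*A', 'RP']
```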
Example of FASTA database search
An example of a database search for matches between the amino acid sequence of the E. coli DinH product and the SWISS-PROT sequence database, using the GCG implementation of FASTA, is shown below. Shown first are portions of a histogram listing the range of init1 and initn scores obtained with every sequence in the database.

Higher numbers have been cut off on the right of the histogram. The number of sequences within a particular range of init1 scores is shown by '-' (none shown here), and the number of initn scores in each range by '+'. When the numbers of init1 and initn scores are equal, they are shown by '='. The sequences giving the highest-scoring matches, with scores over 80, are immediately apparent.
In the next stage of analysis, the high scoring sequences are aligned with the test sequence
using a dynamic programming method to find an optimal local alignment. The similarity
scores of these alignments are calculated and are listed as the 'opt' score. The sequences
giving the highest scores with the test sequence are then listed in descending order of
scores, and the local homology alignments of these sequences with the test sequence are
then shown.
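The optimal local alignment step uses dynamic programming of the Smith-Waterman type. Below is a minimal, score-only sketch. It assumes a simple match/mismatch scheme and a linear gap penalty, and all the parameter values are illustrative. Production FASTA runs use a substitution matrix and more elaborate gap penalties.

```python
from typing import List

def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (score only, no traceback),
    with a linear gap penalty."""
    prev: List[int] = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap          # gap in b
            left = curr[j - 1] + gap    # gap in a
            curr[j] = max(0, diag, up, left)  # 0 lets a local alignment restart
            best = max(best, curr[j])
        prev = curr
    return best

print(smith_waterman("ACGTTGCA", "ACGTGCA"))  # 12: seven matches and one gap
```

Keeping only two rows keeps memory proportional to the length of one sequence, but the running time still grows with the product of the two lengths. That is why FASTA applies this step only to the best-scoring database sequences.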
In the alignment, a '|' character between an amino acid pair indicates an identity. A ':' indicates a pair often found in alignments of related proteins (a conservative substitution). As may be seen, a previously unidentified but excellent alignment was obtained between the E. coli DinH protein and the B. subtilis spoIIIA protein.
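The sketch below shows how such a middle line can be produced from two aligned strings. Which pairs count as "often found in related proteins" is an assumption here: the sketch uses a few hand-picked residue groups (`SIMILAR_GROUPS` is hypothetical). Real programs would more likely base this on substitution scores.

```python
from typing import List, Set

# Hypothetical residue groups treated as conservative substitutions in this sketch.
SIMILAR_GROUPS: List[Set[str]] = [set("ILMV"), set("FWY"), set("KRH"), set("DE"),
                                  set("ST"), set("NQ"), set("AG")]

def match_line(top: str, bottom: str) -> str:
    """Middle line for two aligned sequences: '|' identity, ':' similar pair,
    ' ' otherwise (including any position with a gap '-')."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    symbols = []
    for x, y in zip(top.upper(), bottom.upper()):
        if "-" in (x, y):
            symbols.append(" ")
        elif x == y:
            symbols.append("|")
        elif any(x in group and y in group for group in SIMILAR_GROUPS):
            symbols.append(":")
        else:
            symbols.append(" ")
    return "".join(symbols)

top, bottom = "KLVEAG-W", "RIVDSGNW"
print(top, match_line(top, bottom), bottom, sep="\n")
```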
LINKS

https://www.ctu.edu.vn/~dvxe/Bioinformatic%20course/manuals/blast/blastmanual/fasta.htm
https://www.ebi.ac.uk/
https://www.ebi.ac.uk/Tools/services/web/toolresult.ebi?jobId=fasta-I20180608-113446-0745-87498455-p2m
The interpretation of the significance of results depends on several factors. These include the word size, the lengths of the sequences being aligned, the gap penalties, and the alignment scoring system used.

FASTA is better for translated DNA-protein comparisons and DNA database searches because it calculates a single alignment that allows frame shifts.

By treating the forward reading frames as a single sequence, FASTA makes it much easier to produce high-quality alignments that extend along the length of the protein sequence. This improves sensitivity.
