
Multiple Sequence Alignment 3


SEQUENCE ALIGNMENT

A sequence alignment is a way of arranging the
sequences of DNA, RNA, or protein to identify
regions of similarity that may be a consequence
of functional, structural, or evolutionary
relationships between the sequences.
TYPES OF SEQUENCE ALIGNMENTS
Pairwise alignment
• Dot matrix method
• Dynamic programming
• Word methods
Multiple sequence alignment
• Progressive methods
• Iterative methods
• Block-based method
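The "Dynamic programming" entry for pairwise alignment can be illustrated with a minimal Needleman-Wunsch global alignment. This is a sketch, not any particular program's implementation: the match/mismatch/gap scores (+1, -1, -2) and the linear gap penalty are assumptions chosen for illustration in place of a real substitution matrix.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global pairwise alignment by dynamic programming (linear gap penalty).

    The scores are illustrative assumptions, not a real substitution matrix.
    Returns (score, aligned_a, aligned_b).
    """
    def s(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # F[i][j] holds the best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(a[i - 1], b[j - 1]),  # (mis)match
                          F[i - 1][j] + gap,                         # gap in b
                          F[i][j - 1] + gap)                         # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

For example, `needleman_wunsch("GATTACA", "GATACA")` returns a score of 4 (six matches and one gap) together with two gapped strings of equal length.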
Multiple Alignment
• Proteins can be classified into families:
– Common structure.
– Common function.
– Common evolutionary origin.
• For a set of sequences belonging to some family
– Each pair has some differences
– But there are common motifs shared by almost all
sequences of the family
• A multiple alignment carries more information than
a pairwise alignment
MULTIPLE SEQUENCE ALIGNMENT
A multiple sequence alignment is an alignment
of N > 2 sequences obtained by inserting gaps
("-") into the sequences such that the resulting
sequences all have the same length L and can be
arranged in a matrix of N rows and L columns, where
each column represents a homologous position.
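The definition above can be checked mechanically: every gapped row must have the same length L, and deleting its gaps must give back the original sequence. The sketch below assumes gaps are written as "-" and uses hypothetical toy sequences; it only validates the N x L layout and says nothing about whether the columns are biologically homologous.

```python
def msa_matrix(rows: list[str], originals: list[str]) -> list[list[str]]:
    """Validate gapped rows as an MSA of `originals` and return the N x L matrix."""
    if len(rows) != len(originals) or len(rows) <= 2:
        raise ValueError("need one gapped row per sequence, and N > 2 sequences")
    L = len(rows[0])
    for row, seq in zip(rows, originals):
        if len(row) != L:
            raise ValueError("every aligned row must have the same length L")
        if row.replace("-", "") != seq:
            raise ValueError("deleting the gaps must give back the original sequence")
    return [list(row) for row in rows]


# Hypothetical toy sequences
originals = ["ACGT", "ACTGT", "AGT"]
matrix = msa_matrix(["AC-GT", "ACTGT", "A--GT"], originals)
columns = list(zip(*matrix))  # columns[i] is the i-th (putatively homologous) position
```

Here `matrix` has N = 3 rows and L = 5 columns, and `columns[0]` is `("A", "A", "A")`.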
Applications of multiple sequence alignment

• Identification of conserved sequence patterns
and motifs in the whole sequence family.
• An essential prerequisite for:
– Phylogenetic analysis of sequence families.
– Prediction of protein secondary and tertiary
structure.
• Designing degenerate polymerase chain
reaction (PCR) primers based on multiple
related sequences.
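For degenerate PCR primer design, each column of a gap-free region of a DNA alignment can be collapsed into one IUPAC ambiguity code. This is a minimal sketch assuming uppercase A/C/G/T sequences that have already been aligned; real primer design also weighs melting temperature, 3' end conservation and degeneracy limits, none of which are modelled here.

```python
# IUPAC nucleotide ambiguity codes, keyed by the set of bases each code stands for
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def degenerate_primer(aligned: list[str]) -> str:
    """Collapse each column of a gap-free DNA alignment into one IUPAC code."""
    primer = []
    for column in zip(*aligned):
        bases = frozenset(column)
        if "-" in bases:
            raise ValueError("choose a gap-free region of the alignment for primer design")
        primer.append(IUPAC[bases])
    return "".join(primer)


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    size = {code: len(bases) for bases, code in IUPAC.items()}
    total = 1
    for code in primer:
        total *= size[code]
    return total
```

For the toy region `["ATGCA", "ATGTA", "GTGCA"]`, `degenerate_primer` gives `"RTGYA"`, a primer mix of 2 x 2 = 4 distinct oligonucleotides.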
TYPES OF MSA
Progressive method
This approach repeatedly aligns two sequences,
two alignments, or a sequence with an alignment.
Iterative method
Works similarly to progressive methods but
repeatedly realigns the initial sequences as well as
adding new sequences to the growing MSA.
Block-based method
The method identifies a block of ungapped
alignment shared by all the sequences.
PROGRESSIVE ALIGNMENT
• The most widely used approach
• Builds up a final MSA by combining pairwise alignments
beginning with the most similar pair and progressing to the
most distantly related
• Progressive alignment methods work in two stages:
– A first stage, in which the relationships between the sequences
are represented as a tree, called a guide tree.
– A second stage, in which the MSA is built by adding the
sequences one by one to the growing MSA in the order given by the
guide tree.
• Clustal W is a progressive multiple alignment program.
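The second stage can be pictured as a post-order walk over the guide tree: each internal node is a point where two sequences, two alignments, or a sequence and an alignment are merged. The sketch below assumes a rooted binary guide tree written as nested tuples of hypothetical sequence names, and only records the merge order; the actual profile alignment at each merge is left as a comment.

```python
from typing import Union

Tree = Union[str, tuple]  # a leaf is a sequence name; an internal node is a (left, right) pair


def progressive_order(tree: Tree) -> list[tuple[str, str, str]]:
    """List the merges a progressive aligner would perform, in guide-tree post-order.

    Each step is (left_group, right_group, merged_group); groups are '+'-joined
    sequence names standing in for partial alignments.
    """
    steps = []

    def visit(node: Tree) -> str:
        if isinstance(node, str):
            return node  # a single, not yet aligned sequence
        left, right = node
        a, b = visit(left), visit(right)
        merged = f"{a}+{b}"
        # A real aligner would align the two sequences/alignments a and b here
        steps.append((a, b, merged))
        return merged

    visit(tree)
    return steps
```

With the hypothetical tree `(("human", "chimp"), ("mouse", "rat"))`, the two closest pairs are aligned first and their alignments are merged last, which is exactly where an early error would be carried through to the final MSA.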
WORKING OF CLUSTAL W
• First, perform pairwise alignments between all
possible pairs of sequences.
• Calculate the ‘distance’ between each pair of
sequences based on these isolated pairwise
alignments.
• Generate a distance matrix.
• Generate a Neighbor‐Joining ‘guide tree’ from these
pairwise distances.
• This guide tree gives the order in which the progressive
alignment will be carried out.
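The distance and guide-tree steps above can be sketched as follows. The distance used here, one minus the fraction of identical residues over columns where neither sequence has a gap, is an assumed simple choice; the Neighbor-Joining loop repeatedly joins the pair minimising Q(i, j) = (n - 2) D(i, j) - r_i - r_j. Branch lengths are omitted, and the resulting topology is unrooted (a progressive aligner still has to root it before use).

```python
from itertools import combinations


def p_distance(x: str, y: str) -> float:
    """1 - fraction of identical residues, over columns where neither aligned row has a gap."""
    pairs = [(a, b) for a, b in zip(x, y) if a != "-" and b != "-"]
    if not pairs:
        return 1.0
    return 1.0 - sum(a == b for a, b in pairs) / len(pairs)


def neighbor_joining(names, dist):
    """Neighbor-Joining topology as nested tuples (branch lengths omitted).

    `dist` maps (name_a, name_b) pairs to distances; either order is accepted.
    """
    D = {}
    for a, b in combinations(names, 2):
        d = dist[(a, b)] if (a, b) in dist else dist[(b, a)]
        D[(a, b)] = D[(b, a)] = d
    nodes = list(names)  # current nodes: leaf names or nested tuples (joined clusters)
    while len(nodes) > 2:
        n = len(nodes)
        r = {x: sum(D[(x, k)] for k in nodes if k != x) for x in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) D(i, j) - r_i - r_j
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D[p] - r[p[0]] - r[p[1]])
        u = (i, j)
        # Distance from the new node u to every remaining node k
        for k in nodes:
            if k != i and k != j:
                D[(u, k)] = D[(k, u)] = (D[(i, k)] + D[(j, k)] - D[(i, j)]) / 2
        nodes = [k for k in nodes if k != i and k != j] + [u]
    return tuple(nodes)
```

In practice the distances would come from applying `p_distance` to each pairwise alignment; the nested tuples can then be read as the order in which sequences are merged.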
Important features
Flexibility of using substitution matrix
It applies different scoring matrices when aligning
sequences, depending on degrees of similarity. The
choice of a matrix depends on the evolutionary
distances measured from the guide tree.
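The idea of switching substitution matrices with evolutionary distance can be sketched as a simple lookup: closely related sequences are scored with a high-numbered ("hard") BLOSUM matrix and divergent ones with a low-numbered ("soft") one. The distance cut-offs below are illustrative assumptions, not ClustalW's exact rules, and the function returns only a matrix name.

```python
def choose_blosum(distance: float) -> str:
    """Pick a BLOSUM matrix name from a distance in [0, 1] (1 - fractional identity).

    The cut-offs are illustrative assumptions: close sequences get a
    high-numbered matrix, divergent ones a low-numbered matrix.
    """
    if not 0.0 <= distance <= 1.0:
        raise ValueError("distance must lie between 0 and 1")
    if distance <= 0.2:   # at least 80% identity
        return "BLOSUM80"
    if distance <= 0.4:   # roughly 60-80% identity
        return "BLOSUM62"
    if distance <= 0.7:   # roughly 30-60% identity
        return "BLOSUM45"
    return "BLOSUM30"     # below 30% identity
```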
Adjustable gap penalties
A gap near a series of hydrophobic residues carries
a higher penalty than one next to a series of
hydrophilic residues. In addition, gaps that are too
close to one another can be penalized more than
isolated gaps.
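The two gap rules above can be sketched as a per-column gap-opening penalty for one aligned row. Everything numeric here is an assumption for illustration: the hydrophilic residue set, the minimum stretch length of 5, the 2/3 reduction, the doubling factor and the separation distance of 4 are not claimed to be ClustalW's exact defaults.

```python
HYDROPHILIC = set("DEGKNPQRS")  # assumed set of hydrophilic residues


def gap_open_penalties(row: str, base: float = 10.0, run: int = 5, sep: int = 4) -> list[float]:
    """Per-column gap-opening penalty for one aligned row (illustrative rules only).

    - Inside a stretch of >= `run` hydrophilic residues the penalty is cut to 2/3.
    - Within `sep` columns of an existing gap the penalty is doubled.
    """
    n = len(row)
    penalties = [base] * n
    # Rule 1: find maximal hydrophilic stretches (a gap or hydrophobic residue ends one)
    i = 0
    while i < n:
        j = i
        while j < n and row[j] in HYDROPHILIC:
            j += 1
        if j - i >= run:
            for k in range(i, j):
                penalties[k] *= 2 / 3
        i = max(j, i + 1)
    # Rule 2: discourage opening a new gap close to one that already exists
    gap_cols = [k for k, c in enumerate(row) if c == "-"]
    for k in range(n):
        if any(0 < abs(k - g) <= sep for g in gap_cols):
            penalties[k] *= 2
    return penalties
```

For the row `"LLLLDEKNSLLLL"`, the five hydrophilic columns get a penalty of about 6.67 while the leucine columns keep 10, so a gap is more likely to be placed in the hydrophilic (typically loop) region.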
PROS AND CONS OF THE PROGRESSIVE METHOD OF ALIGNMENT
PROS
• Efficient enough to implement on a large scale for
many (100s to 1000s) sequences.
• Progressive alignment services are commonly
available on publicly accessible web servers, so
users need not locally install the applications of
interest.
• The most widely used method of multiple sequence
alignment because of its speed and accuracy.
CONS
• Progressive alignments are not guaranteed to be
globally optimal.
• The primary problem is that errors made at any
stage in growing the MSA are propagated through
to the final result.
• Accuracy is also particularly poor when all of
the sequences in the set are distantly related.
Iterative method
Works similarly to progressive methods but repeatedly
realigns the initial sequences as well as adding new sequences
to the growing MSA.

• Iterative methods were introduced to reduce the errors
made by progressive alignment.
• Iterative methods are also heuristic.
• Basic idea:
– Generate an initial multiple alignment, for example by
progressive alignment.
– Iteratively improve the multiple alignment.
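The "iteratively improve" step can be illustrated with a deliberately simple hill-climbing refinement: repeatedly swap a gap with a neighbouring residue in one row and keep the change only if the sum-of-pairs score increases. This is a swapped-in toy technique, not the refinement strategy of any particular program; the scores (+1 match, -1 mismatch, -2 residue against gap, 0 for gap against gap) are assumptions.

```python
from itertools import combinations


def sp_score(rows, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score with toy scores; gap-gap pairs score 0."""
    total = 0
    for column in zip(*rows):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue
            if a == "-" or b == "-":
                total += gap
            else:
                total += match if a == b else mismatch
    return total


def refine(rows):
    """Greedy refinement: shift single gaps while the SP score strictly improves."""
    rows = list(rows)
    best = sp_score(rows)
    improved = True
    while improved:
        improved = False
        for r, row in enumerate(rows):
            for k in range(len(row) - 1):
                if (row[k] == "-") != (row[k + 1] == "-"):  # exactly one of the pair is a gap
                    # Swapping a gap with its neighbour never changes the ungapped sequence
                    moved = row[:k] + row[k + 1] + row[k] + row[k + 2:]
                    candidate = rows[:r] + [moved] + rows[r + 1:]
                    score = sp_score(candidate)
                    if score > best:
                        rows, best, improved = candidate, score, True
                        break
            if improved:
                break
    return rows, best
```

Starting from the poor alignment `["AC-GT", "ACG-T"]` (SP score -1), the loop moves one gap and reaches `["ACG-T", "ACG-T"]` with score 4. Because it only accepts strict improvements, it can stop at a local optimum, which is why iterative methods remain heuristics.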
Block-Based Alignment
• Suited to divergent sequences that share only
regional similarities.
• A local-alignment-based approach is used.
• The method identifies a block of ungapped
alignment shared by all the sequences.
• An example of a block-based alignment program is
DIALIGN2.
DIALIGN2
• The method breaks each of the sequences
down to smaller segments and performs all
possible pairwise alignments between the
segments.
• High-scoring segments, called blocks, are then combined to assemble the full multiple alignment.
• It gives block-to-block comparison rather
than residue-to-residue comparison.
• The program is especially suitable for aligning
divergent sequences with only local
similarity.
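The segment-comparison idea can be sketched by enumerating every ungapped segment pair of a fixed length between two sequences and keeping the high-scoring ones. This is a simplification for illustration only: DIALIGN itself considers fragments of varying length, weights them statistically and keeps a consistent set, whereas here the segment length `w`, the identity-count score and the `min_matches` threshold are all assumptions.

```python
def segment_pairs(x: str, y: str, w: int = 4, min_matches: int = 3):
    """Return (i, j, matches) for each ungapped segment pair x[i:i+w] / y[j:j+w]
    with at least `min_matches` identical residues, best first."""
    hits = []
    for i in range(len(x) - w + 1):
        for j in range(len(y) - w + 1):
            # Ungapped comparison: residues are paired position by position
            matches = sum(a == b for a, b in zip(x[i:i + w], y[j:j + w]))
            if matches >= min_matches:
                hits.append((i, j, matches))
    return sorted(hits, key=lambda h: -h[2])
```

For two hypothetical peptides `"GGWDKLAA"` and `"PPWDKLQQ"` that share only the local motif `WDKL`, `segment_pairs(..., w=4, min_matches=4)` reports the single block `(2, 2, 4)` and ignores the unrelated flanks.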
Sum of Pairs
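• The sum-of-pairs (SP) score of a column adds up the scores,
under some scoring matrix s, of all pairs of sequences in that column:

$$S(m_i) = \sum_{k<l} s(m_i^k, m_i^l)$$

where m_i is column i of the alignment and m_i^k is the residue (or gap)
of sequence k in that column. The score of the whole alignment is the
sum of S(m_i) over all columns.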
• It assumes not only that each column is aligned independently,
but also that each pair of sequences is independent.
– Each sequence is scored as if it descended from the other N - 1
sequences instead of from a single common ancestor.
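The formula S(m_i) = sum over k < l of s(m_i^k, m_i^l) translates directly into code: score every pair of entries within a column, then add the column scores. The pair score `s` below is a toy stand-in (assumed +1 match, -1 mismatch, -2 residue against gap, 0 for gap against gap) for a real substitution matrix with gap costs.

```python
from itertools import combinations


def pair_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """s(a, b): a toy scoring scheme standing in for a substitution matrix plus gap cost."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def column_sp(column: tuple) -> int:
    """S(m_i): sum of s over all pairs k < l of entries in one column."""
    return sum(pair_score(a, b) for a, b in combinations(column, 2))


def alignment_sp(rows: list[str]) -> int:
    """SP score of a whole alignment: columns are scored independently and added."""
    return sum(column_sp(col) for col in zip(*rows))
```

For `["ACG", "ACG", "ATG"]` the columns score 3, -1 and 3, giving 5. The independence assumptions are visible in the code: each column and each pair within it is scored on its own, with no notion of a shared ancestor.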
PSI-BLAST (Position-Specific Iterated BLAST)
• PSI-BLAST provides an automatic, "profile-like" search.
• Iterative procedure:
– Run BLAST against the database.
– Use the significant alignments to construct a position-specific
scoring matrix (PSSM).
– This matrix replaces the query sequence in the next round of database
searching.
• The search can be iterated until no new significant
alignments are found.
• It is one of the most commonly used profile-based search methods.
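With the NCBI BLAST+ suite, the iterative procedure above is run by the `psiblast` program. The sketch below only builds the command line; the query file, database name, E-value and output file are placeholders, and actually running it requires a local BLAST+ installation and a formatted database. `-num_iterations 0` asks psiblast to iterate until convergence, matching "until no new significant alignments are found".

```python
import shlex
import subprocess


def psiblast_command(query_fasta: str, db: str, iterations: int = 0,
                     inclusion_evalue: float = 0.001,
                     pssm_out: str = "query.pssm") -> list[str]:
    """Build a BLAST+ psiblast command line (file and database names are placeholders)."""
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", db,
        "-num_iterations", str(iterations),           # 0 = iterate until convergence
        "-inclusion_ethresh", str(inclusion_evalue),  # E-value cut-off for hits used in the PSSM
        "-out_ascii_pssm", pssm_out,                  # save the final position-specific matrix
    ]


def run_psiblast(cmd: list[str]) -> None:
    """Run the command; needs BLAST+ on PATH and a local database."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    print(shlex.join(psiblast_command("query.fasta", "swissprot")))
```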
Creating a PSSM
• After aligning the sequences, some conserved regions
become apparent.
• The multiple alignment of the BLAST hits is used to
create a position-specific scoring matrix (PSSM).
• Because this matrix captures information from a whole
family, it is stricter in highly conserved regions and
more permissive in variable ones.
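A basic log-odds PSSM can be built from gap-free aligned segments by counting residues per column, adding pseudocounts, and comparing with a background frequency. This sketch assumes a uniform background of 1/20 and a flat pseudocount of 1; PSI-BLAST itself uses sequence weighting and more elaborate pseudocounts, so the numbers will differ from real PSI-BLAST output.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log2-odds PSSM from gap-free aligned protein segments, one dict per column.

    Assumes a uniform background (1/20) and a flat pseudocount.
    """
    background = 1 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*aligned):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm: list[dict[str, float]], segment: str) -> float:
    """Score a segment by summing its residues' column-specific log-odds values."""
    return sum(col[aa] for col, aa in zip(pssm, segment))
```

With the toy segments `["WK", "WR", "WK", "WE"]`, the fully conserved first column penalises a substitution (W to A, about -2.3 bits) more heavily than the variable second column does (K to A, about -1.6 bits), which is the "stricter in conserved regions" behaviour described above.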
PSI-BLAST
PSI-BLAST is designed for more sensitive protein-protein similarity
searches.

Position-Specific Iterated (PSI)-BLAST is the most sensitive BLAST
program, making it useful for finding very distantly related proteins or
new members of a protein family. Use PSI-BLAST when your
standard protein-protein BLAST search either failed to find
significant hits, or returned hits with descriptions such as
"hypothetical protein" or "similar to...".
