Alignment Methods

Sequence alignment is a method for arranging DNA, RNA, or protein sequences to identify similarities and infer relationships. It includes global and local alignment techniques, with algorithms like Needleman-Wunsch and Smith-Waterman used for scoring. Applications of sequence alignment include function prediction, gene finding, and database searching.

Uploaded by

anis442643
Copyright
© © All Rights Reserved

Sequence Alignment

Dr. Shazia Rehman


Sequence Alignment
A way of arranging DNA, RNA, or protein sequences to identify regions of similarity.
Helps in inferring functional, structural, or evolutionary relationships between the sequences.
Sequence alignment methods are used to find the best-matching sequences.
An alignment is made between a known sequence and an unknown sequence, or between two unknown sequences.
The known sequence is called the reference sequence; the unknown sequence is called the query sequence.
Conti…
Sequence alignment is important for:

prediction of function
database searching
gene finding
sequence divergence
sequence assembly
Scoring system

• Simple alignment scores


• A simple way (but not the best) to score an alignment is to count 1
for each match and 0 for each mismatch.
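
The simple scoring rule above can be sketched in a few lines of Python. The function name simple_score and the two example strings are illustrative assumptions, not part of the lecture; the sketch assumes both sequences already have the same length and contain no gaps.

```python
def simple_score(seq_a: str, seq_b: str) -> int:
    """Count 1 for each matching position and 0 for each mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must already have equal length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a == b)


# Hypothetical example: positions 3 and 6 differ, the other 5 match.
print(simple_score("GATTACA", "GACTATA"))  # 5
```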
Types of Sequence Alignment
Based on sequence length: according to the length of the sequences being compared, alignment is of two types:
1) Global sequence alignment
2) Local sequence alignment
Global alignment programs are based on the Needleman-Wunsch algorithm and local alignment programs on the Smith-Waterman algorithm. Both algorithms are derived from the basic dynamic programming algorithm.
Conti…
1) Global sequence alignment – In this method, the entire length of the two sequences is considered and the sequences are matched end to end to obtain the best alignment. Such alignments are commonly computed with the Needleman-Wunsch algorithm.
It can be of two types:
PSA (pairwise sequence alignment)
MSA (multiple sequence alignment)
A global alignment of two sequences X and Y is obtained by inserting gaps (spaces) into X and Y until the two sequences have the same length, so that every position in one sequence is paired with a position or a gap in the other.
Gaps
• A gap is a run of consecutive indels (insertions or deletions) in an alignment
• C T - - - A A
• C T C G C A A
Gaps represent
a) deletion or insertion events
b) sites with missing information
Scoring system
For example, consider the sequences
X = ACGCTGAT and Y = CAGCTAT. One possible global alignment is
[Figure: a global alignment of X and Y]
If we set a scoring scheme of match score = 1, mismatch score = 0 and gap penalty = 0, then the overall score for the above alignment is simply the number of matched columns.
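
As a rough sketch of how such a scoring scheme is applied, the function below scores an alignment that already contains gaps, column by column. The function name score_alignment and the gapped rows are assumptions made for illustration: the rows are one valid way of aligning X and Y, not necessarily the alignment shown in the lecture. The gap parameter is the value added per gap column (0 here, negative for a real penalty).

```python
def score_alignment(row_x, row_y, match=1, mismatch=0, gap=0):
    """Score a gapped alignment column by column ('-' marks a gap)."""
    if len(row_x) != len(row_y):
        raise ValueError("aligned rows must have equal length")
    score = 0
    for a, b in zip(row_x, row_y):
        if a == "-" or b == "-":
            score += gap        # one sequence has a gap in this column
        elif a == b:
            score += match      # identical characters
        else:
            score += mismatch   # different characters
    return score


# Illustrative alignment of X = ACGCTGAT and Y = CAGCTAT (an assumption).
row_x = "-ACGCTGAT"
row_y = "CA-GCT-AT"
print(score_alignment(row_x, row_y))          # 6 matched columns
print(score_alignment(row_x, row_y, gap=-1))  # 6 matches, 3 gap columns -> 3
```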
Needleman-Wunsch Algorithm

• The Needleman-Wunsch algorithm is one of the algorithms that use dynamic programming to obtain a global alignment.
• It was published by Needleman and Wunsch in 1970.
• It finds the best-scoring global alignment between two sequences.
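
A minimal sketch of the Needleman-Wunsch idea is given below, assuming a linear gap score and the match = 1, mismatch = 0, gap = 0 scheme used in these slides as defaults. The function name and the tie-breaking order in the traceback are illustrative choices; when several alignments share the best score, this sketch returns only one of them.

```python
def needleman_wunsch(x, y, match=1, mismatch=0, gap=0):
    """Return (best score, aligned x, aligned y) for a global alignment."""
    n, m = len(x), len(y)
    # score[i][j] = best score for aligning x[:i] with y[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,   # pair x[i-1] with y[j-1]
                              score[i - 1][j] + gap,     # gap in y
                              score[i][j - 1] + gap)     # gap in x
    # Trace back from the bottom-right corner to recover one best alignment.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if x[i - 1] == y[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                ax.append(x[i - 1]); ay.append(y[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-")
            i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(ax)), "".join(reversed(ay))


best, ax, ay = needleman_wunsch("ACGCTGAT", "CAGCTAT")
print(best)  # 6
print(ax)
print(ay)
```

With match = 1 and no penalties, the best score equals the largest number of columns that can be matched, which is 6 for these two sequences.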
Conti…
2) Local sequence alignment – In this alignment, the sequences are aligned to find the region with the highest density of matches, that is, the region of strongest similarity, rather than matching them end to end.

• For example, consider two sequences, X = GGTCTGATG and Y = AAACGATC.

Characters in bold are the subsequences to be considered. The best local alignment is
[Figure: best local alignment of X and Y]
Scoring system
• If we set a scoring scheme of match score = 1, mismatch score = 0 and gap penalty = 0, then the overall score for the above alignment is simply the number of matched columns.
Smith–Waterman algorithm
The Smith–Waterman algorithm is a well-known algorithm for performing local sequence alignment, that is, for determining similar regions between two nucleotide or protein sequences.
Instead of looking at the entire sequences, the Smith–Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure.
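
The following is a hedged sketch of Smith–Waterman with a linear gap score. The scoring values match = 2, mismatch = -1, gap = -1 are an assumption chosen for illustration (they differ from the 1/0/0 scheme on the slides, under which local alignment is not very informative). Scores are never allowed to drop below zero, and the traceback starts at the highest-scoring cell and stops at the first zero.

```python
def smith_waterman(x, y, match=2, mismatch=-1, gap=-1):
    """Return (best score, aligned segment of x, aligned segment of y)."""
    n, m = len(x), len(y)
    h = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            h[i][j] = max(0,                      # start a fresh local alignment
                          h[i - 1][j - 1] + s,    # pair x[i-1] with y[j-1]
                          h[i - 1][j] + gap,      # gap in y
                          h[i][j - 1] + gap)      # gap in x
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)
    # Trace back from the best cell until the score falls to zero.
    ax, ay = [], []
    i, j = best_pos
    while i > 0 and j > 0 and h[i][j] > 0:
        s = match if x[i - 1] == y[j - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + s:
            ax.append(x[i - 1]); ay.append(y[j - 1])
            i -= 1; j -= 1
        elif h[i][j] == h[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-")
            i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


print(smith_waterman("GGTCTGATG", "AAACGATC"))  # (7, 'CTGAT', 'C-GAT')
```

Under these assumed scores, four matches (8) minus one gap (1) gives the local score of 7 for the two example sequences.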
Conti…
Based on number of sequences: according to the number of sequences being compared, alignment is of two types:
1) Pairwise sequence alignment – This involves aligning two sequences to find the best region of similarity.
Seq 1 - 1 KTSSGNGAEDS 11
          |||||||||||
Seq 2 - 1 KTSSGNGAEDS 11
Conti….
Pair-wise Alignment
1. Collect the two sequences
2. Align the sequences
3. Count the mutations in the alignment
4. Score the alignments
Conti…
A pairwise alignment consists of a series of paired bases, one base from each
sequence.
There are three types of pairs:
(1) matches = the same nucleotide appears in both sequences.
(2) mismatches = different nucleotides are found in the two sequences.
(3) gaps = a base in one sequence and a null base in the other
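
The three kinds of pairs can be counted directly from an alignment. The sketch below is illustrative only: the function name classify_columns and the choice of '|' for a match, '.' for a mismatch and a space for a gap in the middle line are assumptions, not a fixed standard.

```python
def classify_columns(row_a, row_b):
    """Count matches, mismatches and gaps, and build a match line."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    midline = []
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            counts["gap"] += 1
            midline.append(" ")
        elif a == b:
            counts["match"] += 1
            midline.append("|")
        else:
            counts["mismatch"] += 1
            midline.append(".")
    return counts, "".join(midline)


counts, midline = classify_columns("CT---AA", "CTCGCAA")
print("CT---AA")
print(midline)
print("CTCGCAA")
print(counts)  # {'match': 4, 'mismatch': 0, 'gap': 3}
```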
Methods for pairwise alignment
Various methods used for pairwise alignment of nucleotide and protein sequences are:
1) Dot plot – A graphical method for comparing two sequences to identify regions of similarity and dissimilarity, depicted by the presence and absence of dots.
A dot matrix is a grid in which identical nucleotides of two DNA sequences are represented as dots.
It is a pairwise sequence comparison made on the computer.
Dot plot illustration
Conti…
In a dot matrix, the nucleotides of one sequence are written from left to right along the top row, and those of the other sequence are written from top to bottom down the left-hand column of the matrix.
At every position where the two nucleotides are the same, a dot is placed at the intersection of the row and column.
When these dots are connected, the resulting graph is called a dot plot; runs of dots along a diagonal indicate regions of similarity.
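
The construction of a dot matrix can be sketched as follows. The function names and the example sequences are hypothetical; in the text rendering, '*' stands for a dot and '.' for an empty cell.

```python
def dot_matrix(top, side):
    """matrix[r][c] is True where side[r] and top[c] are the same character."""
    return [[a == b for a in top] for b in side]


def render(top, side, dot="*", empty="."):
    """Draw the matrix with `top` across the first row and `side` down the left."""
    lines = ["  " + " ".join(top)]
    for b, row in zip(side, dot_matrix(top, side)):
        lines.append(b + " " + " ".join(dot if hit else empty for hit in row))
    return "\n".join(lines)


# Hypothetical sequences differing at one position; the main diagonal
# is interrupted where they disagree.
print(render("GATTACA", "GATCACA"))
```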
Dynamic Programming Method

Dynamic programming is a method that determines the optimal alignment by comparing all possible pairs of characters between the two sequences.
It is fundamentally similar to the dot matrix method in that it also creates a two-dimensional alignment grid.
However, it finds the alignment in a more quantitative way, by assigning scores to matches and mismatches between the sequences.
Conti…
Heuristic Method – When a single sequence is to be compared against a whole database, heuristic methods such as BLAST and FASTA are used.
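
One idea behind such heuristics is to look first for short exact word matches ("seeds") instead of aligning the query fully against every database entry. The sketch below shows only that seeding step, with a hypothetical two-entry database and an assumed word length k = 3; real BLAST and FASTA do considerably more (seed extension, scoring matrices, statistics), none of which is modelled here.

```python
from collections import defaultdict


def build_index(database, k=3):
    """Map every k-letter word to the (sequence name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in database.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))
    return index


def find_seeds(query, index, k=3):
    """Return {name: [(query position, database position), ...]} for shared words."""
    hits = defaultdict(list)
    for qpos in range(len(query) - k + 1):
        for name, dpos in index.get(query[qpos:qpos + k], []):
            hits[name].append((qpos, dpos))
    return dict(hits)


# Hypothetical database; only seq1 shares a 3-letter word (GAT) with the query.
database = {"seq1": "GGTCTGATG", "seq2": "TTTTTTTT"}
index = build_index(database)
print(find_seeds("AAACGATC", index))  # {'seq1': [(4, 5)]}
```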
Multiple sequence Alignment
Multiple sequence alignment – This involves aligning more than two (protein or DNA) sequences in order to assess the sequence conservation of protein domains and protein structures.
It is an extension of pairwise sequence alignment that aligns a set of similar sequences together and provides a better alignment score.
Example –
Seq 1 - PQGGGGWGQ
Seq 2 - PHGGGWGQ
Seq 3 - PHGGGWGQ
Seq 4 - PHGGGWGQ
Seq 5 - PHGGGWGQ
Conti…
Tools and software for MSA
• Many tools:
• Clustal (ClustalW, ClustalX, Clustal Omega, etc.)
• T-Coffee
• MAFFT
• MUSCLE
Software
• MEGA
• BioEdit
CLUSTAL program
WORKING OF CLUSTAL
There are two Clustal (ver. 2) programs:
(1) ClustalW (has a command-line user interface)
and (2) ClustalX (has a GUI)
Clustal Omega is the latest addition to the Clustal family.
This high-capacity program aligns hundreds of thousands of sequences in only a few hours.
It is preferable to work with protein sequences rather than nucleotide sequences.
Clustal W
ClustalW uses a progressive method of alignment:
All pairs of sequences are aligned separately in order to calculate a distance matrix giving the distance between each pair of sequences.
A guide tree is calculated from the distance matrix.
The sequences are progressively aligned according to the branching order in the guide tree.
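
The first two steps (distance matrix, then a joining order) can be illustrated with a toy sketch. Everything here is a simplifying assumption and not ClustalW's actual procedure: the distance is 1 minus a longest-common-subsequence identity instead of a scored pairwise alignment, and the joining order comes from plain average-linkage clustering as a stand-in for the guide tree. The sequence names and sequences are hypothetical.

```python
from itertools import combinations


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, bj in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ch == bj else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def distance(a, b):
    """Stand-in pairwise distance: 1 - (shared characters / longer length)."""
    return 1.0 - lcs_length(a, b) / max(len(a), len(b))


def guide_order(seqs):
    """Return the order in which clusters of sequence names are joined."""
    dist = {frozenset((p, q)): distance(seqs[p], seqs[q])
            for p, q in combinations(seqs, 2)}

    def average(c1, c2):
        return sum(dist[frozenset((p, q))] for p in c1 for q in c2) / (len(c1) * len(c2))

    clusters = [(name,) for name in seqs]
    merges = []
    while len(clusters) > 1:
        # Join the two closest clusters first, as a progressive aligner would.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: average(*pair))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2))
    return merges


seqs = {"s1": "PQGGGGWGQ", "s2": "PHGGGWGQ", "s3": "PHGGGWGQ", "s4": "AAAAWGQ"}
for step in guide_order(seqs):
    print(step)
```

In this toy example the identical sequences s2 and s3 are joined first, then s1, and the most divergent sequence s4 last, which mirrors the idea of aligning the most similar sequences before the distant ones.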
Clustal W steps
Step 2
Clustal output
Conti…
The bottom row of the ClustalW multiple sequence alignment output contains stars (*), colons (:), and dots (.).
A star (*) below a column indicates a fully conserved (invariant) amino acid residue;
a colon (:) denotes that all the residues in the column have roughly the same size and hydrophobicity;
a dot (.) signifies that the residues in the column are similar in either size or hydrophobicity; and the lack of a symbol indicates that the residues in the column differ in both size and hydrophobicity.
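
The star part of this bottom row is easy to reproduce. The sketch below marks only fully conserved columns; colons and dots are deliberately left out because they depend on Clustal's own residue similarity groups, which are not given in these slides. The function name and the gapped example rows are assumptions made for illustration.

```python
def star_line(aligned_rows):
    """Return a line with '*' under every fully conserved, gap-free column."""
    if len({len(row) for row in aligned_rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    marks = []
    for column in zip(*aligned_rows):
        conserved = "-" not in column and len(set(column)) == 1
        marks.append("*" if conserved else " ")
    return "".join(marks)


# Hypothetical aligned rows (the gap placement is an assumption).
rows = ["PQGGGGWGQ",
        "PHGGG-WGQ",
        "PHGGG-WGQ"]
for row in rows:
    print(row)
print(star_line(rows))  # "* *** ***"
```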
Application of Sequence alignment
