The document discusses global sequence alignment and the Needleman-Wunsch algorithm. Global alignment finds the optimal alignment of two sequences over their entire lengths, inserting gaps where needed. The algorithm uses dynamic programming to fill a scoring matrix and find the highest-scoring alignment under a given match score, mismatch score and gap penalty. The resulting alignment score is the sum of the match scores, mismatch scores and gap penalties over all aligned positions.
7. •Needleman and Wunsch proposed an algorithm that obtains the optimal
global alignment under a linear gap cost by assigning a score to
each position of the aligned sequences.
•Based on the dynamic programming technique.
•For two sequences of lengths m and n we define a matrix of
dimensions (m+1) x (n+1).
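The matrix fill described above can be written as a recurrence. The notation here is assumed for illustration and is not taken from the slides: F(i,j) is the best score for aligning the first i characters of S with the first j characters of T, s(a,b) is the match or mismatch score, and d is the linear gap penalty given as a negative number.

```latex
\begin{aligned}
F(0,0) &= 0, \qquad F(i,0) = i\,d, \qquad F(0,j) = j\,d, \\
F(i,j) &= \max
\begin{cases}
F(i-1,\,j-1) + s(S_i, T_j) & \text{(align } S_i \text{ with } T_j\text{)} \\
F(i-1,\,j) + d & \text{(gap in } T\text{)} \\
F(i,\,j-1) + d & \text{(gap in } S\text{)}
\end{cases}
\qquad 1 \le i \le m,\ 1 \le j \le n.
\end{aligned}
```

Under this notation the optimal global alignment score is F(m,n), the bottom-right cell of the matrix.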
9. Sequences:
S: ATTATCT
T: TTTCTA
Match Score = +2
Mismatch Score = 0
Gap Penalty = -1
Scoring matrix (rows: S, columns: T):
       _    T    T    T    C    T    A
  _    0   -1   -2   -3   -4   -5   -6
  A   -1    0   -1   -2   -3   -4   -3
  T   -2    1    2    1    0   -1   -2
  T   -3    0    3    4    3    2    1
  A   -4   -1    2    3    4    3    4
  T   -5   -2    1    4    3    6    5
  C   -6   -3    0    3    6    5    6
  T   -7   -4   -1    2    5    8    7
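The matrix fill and the traceback for this example can be sketched in Python. This is a minimal illustration under the slide's scoring scheme (match +2, mismatch 0, gap -1), not the implementation used by any particular tool. The function name `needleman_wunsch` and the tie-breaking order in the traceback (diagonal, then up, then left) are assumptions made here; other tie-breaking orders may return a different alignment with the same score.

```python
def needleman_wunsch(s, t, match=2, mismatch=0, gap=-1):
    """Fill the (len(s)+1) x (len(t)+1) matrix and trace back one optimal alignment."""
    m, n = len(s), len(t)
    F = [[0] * (n + 1) for _ in range(m + 1)]

    # First column and first row: a prefix aligned entirely against gaps.
    for i in range(1, m + 1):
        F[i][0] = i * gap
    for j in range(1, n + 1):
        F[0][j] = j * gap

    # Each cell takes the best of: diagonal (match/mismatch), up (gap in t), left (gap in s).
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub,
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)

    # Traceback from the bottom-right cell to the top-left cell.
    # Assumed tie-breaking order: diagonal, then up, then left.
    a, b = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if s[i - 1] == t[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + sub:
                a.append(s[i - 1])
                b.append(t[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            a.append(s[i - 1])   # character of s aligned with a gap
            b.append("-")
            i -= 1
        else:
            a.append("-")        # character of t aligned with a gap
            b.append(t[j - 1])
            j -= 1

    return F, "".join(reversed(a)), "".join(reversed(b))


F, aligned_s, aligned_t = needleman_wunsch("ATTATCT", "TTTCTA")
for row in F:
    print(" ".join(f"{v:3d}" for v in row))
print(aligned_s)
print(aligned_t)
print("score:", F[-1][-1])
```

The bottom-right cell of the printed matrix holds the optimal score, and the two printed strings are one alignment that achieves it.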
11. Optimal Alignment:
S: ATTATCT-
T: -TT-TCTA
No. of matches = 5
No. of mismatches = 0
No. of gaps = 3
(5 x 2) + (0 x 0) + (3 x -1) = 7
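The score arithmetic can be checked column by column. The sketch below assumes the alignment reads S: ATTATCT- and T: -TT-TCTA, with "-" marking a gap, and uses the slide's scoring scheme (match +2, mismatch 0, gap -1). The function name `score_alignment` is hypothetical.

```python
def score_alignment(a, b, match=2, mismatch=0, gap=-1):
    """Count matches, mismatches and gaps in a gapped alignment and return the total score."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    matches = mismatches = gaps = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            gaps += 1          # one sequence has a gap in this column
        elif x == y:
            matches += 1
        else:
            mismatches += 1
    total = matches * match + mismatches * mismatch + gaps * gap
    return matches, mismatches, gaps, total


print(score_alignment("ATTATCT-", "-TT-TCTA"))  # (5, 0, 3, 7)
```

Five matching columns contribute 10 and three gap columns contribute -3, which gives the total of 7.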
12. Tools that utilize the Global Alignment Algorithm:
EMBOSS Needle
EMBOSS Stretcher
Applications:
Identifying conserved interaction pathways and complexes [Brian P.
Kelley et al. 2003]
Functional orthology detection [Rohit Singh et al. 2008]
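Advantages:
Well suited to sequences whose similar regions occur in the same order and orientation.
Disadvantages:
Slow and memory intensive: the matrix has (m+1) x (n+1) cells, so time and memory grow with the product of the sequence lengths.
Impractical for genome-sized sequences.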
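A rough size estimate shows why the full matrix becomes a problem for long sequences. The sketch below only counts matrix cells. The 4 bytes per cell is an assumption chosen for illustration, and the function names are hypothetical.

```python
def matrix_cells(m, n):
    """Number of cells in the (m+1) x (n+1) dynamic programming matrix."""
    return (m + 1) * (n + 1)


def matrix_bytes(m, n, bytes_per_cell=4):
    """Approximate memory for the full matrix, assuming a fixed cell size."""
    return matrix_cells(m, n) * bytes_per_cell


# The slide example: sequences of length 7 and 6.
print(matrix_cells(7, 6))  # 56

# Two sequences of one million bases each, in terabytes (1e12 bytes).
print(matrix_bytes(1_000_000, 1_000_000) / 1e12)
```

For the two million-base sequences the full matrix would need about 4 terabytes under this assumption, while the slide example needs only 56 cells.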