GLOBAL ALIGNMENT

      Pinky Sheetal V
      M.tech Bioinformatics
CONTENTS

   Sequence Alignment

   Dynamic Programming Algorithm

   Global Alignment
Sequence alignment: the result of inserting gaps into two strings so that
as many positions as possible coincide.



X: AGGCTATCA
Y: TAGCTATCA
Scoring weights:

For a match : +m
For a mismatch : -s
For a gap : -d

Alignment Score:

F = (# matches) x m - (# mismatches) x s - (# gaps) x d
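
As a minimal sketch of the score formula above, the function below counts matches, mismatches and gap columns in two already-aligned rows. The function name `alignment_score`, the `-` gap character and the default weights (m = 2, s = 0, d = 1, taken from the worked example later in these slides) are assumptions made for illustration.

```python
def alignment_score(row_x, row_y, m=2, s=0, d=1, gap="-"):
    """Score two aligned rows: F = matches*m - mismatches*s - gaps*d."""
    if len(row_x) != len(row_y):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = gaps = 0
    for a, b in zip(row_x, row_y):
        if a == gap or b == gap:
            gaps += 1        # column with a gap in either row
        elif a == b:
            matches += 1     # identical residues
        else:
            mismatches += 1  # different residues
    return matches * m - mismatches * s - gaps * d


# X and Y compared position by position, without any gaps inserted.
print(alignment_score("AGGCTATCA", "TAGCTATCA"))
# The same strings with one gap inserted in each row.
print(alignment_score("-AGGCTATCA", "TAG-CTATCA"))
```

With these particular weights both calls give the same score (7 matches and 2 free mismatches versus 8 matches and 2 gaps), which shows that the best alignment depends on the weights chosen.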
[Figure: Dynamic programming. A complex problem is split into Sub prob 1, Sub prob 2 and Sub prob 3, with solutions Soln 1, Soln 2 and Soln 3.]
GLOBAL ALIGNMENT
•Needleman and Wunsch proposed an algorithm that finds the optimal
alignment under a linear gap cost by computing a score for each pair of
positions in the two sequences.


•Based on the dynamic programming technique.


•For two sequences of lengths m and n (m here is a length, not the match
weight), we define a matrix of dimensions (m+1) x (n+1).
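
The way the matrix is filled can be written as a recurrence. What follows is the usual Needleman-Wunsch recurrence for a linear gap cost; the symbol \sigma (returning +m for a match and -s for a mismatch) and the names S_i, T_j for the i-th and j-th characters are notation assumed here, not taken from the slides.

```latex
\begin{aligned}
F(0,0) &= 0, \qquad F(i,0) = -i\,d, \qquad F(0,j) = -j\,d, \\
F(i,j) &= \max
\begin{cases}
F(i-1,\,j-1) + \sigma(S_i, T_j) & \text{(align } S_i \text{ with } T_j\text{)} \\
F(i-1,\,j) - d & \text{(gap in } T\text{)} \\
F(i,\,j-1) - d & \text{(gap in } S\text{)}
\end{cases}
\end{aligned}
```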
Termination Condition:
The optimal score between the two sequences is
found in the bottom-right cell of the matrix
(last row, last column).
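
A minimal sketch of the matrix fill and the termination condition, assuming a match score of +2, a mismatch score of 0 and a gap penalty of -1 (the values used in the worked example in these slides). The function name `nw_matrix` is hypothetical.

```python
def nw_matrix(s, t, match=2, mismatch=0, gap=-1):
    """Fill the (len(s)+1) x (len(t)+1) global alignment score matrix."""
    rows, cols = len(s) + 1, len(t) + 1
    F = [[0] * cols for _ in range(rows)]
    # First column and first row: a prefix aligned entirely against gaps.
    for i in range(1, rows):
        F[i][0] = i * gap
    for j in range(1, cols):
        F[0][j] = j * gap
    # Every other cell takes the best of diagonal, up and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub,  # align s[i-1] with t[j-1]
                          F[i - 1][j] + gap,      # gap in t
                          F[i][j - 1] + gap)      # gap in s
    return F


F = nw_matrix("ATTATCT", "TTTCTA")
print(F[-1][-1])  # optimal score, read from the bottom-right cell: 7
```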
Sequences:
S: ATTATCT
T: TTTCTA

Match Score = +2
Mismatch Score = 0
Gap Penalty = -1

Score matrix (rows: S, columns: T):

         _    T    T    T    C    T    A
    _    0   -1   -2   -3   -4   -5   -6
    A   -1    0   -1   -2   -3   -4   -3
    T   -2    1    2    1    0   -1   -2
    T   -3    0    3    4    3    2    1
    A   -4   -1    2    3    4    3    4
    T   -5   -2    1    4    3    6    5
    C   -6   -3    0    3    6    5    6
    T   -7   -4   -1    2    5    8    7
Optimal Alignment:

S: ATTATCT-
T: -TT-TCTA

No. of matches = 5
No. of mismatches = 0
No. of gaps = 3

(5 x 2) + (3 x -1) = 7
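
The alignment above can be recovered by tracing back from the bottom-right cell to the top-left cell. The sketch below is one way to do this, assuming the same scores as the example (match +2, mismatch 0, gap -1); the function name `needleman_wunsch` and the tie-breaking order (diagonal, then up, then left) are choices made for this illustration, and other optimal alignments with the same score may exist.

```python
def needleman_wunsch(s, t, match=2, mismatch=0, gap=-1):
    """Return (score, aligned_s, aligned_t) for one optimal global alignment."""
    rows, cols = len(s) + 1, len(t) + 1
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        F[i][0] = i * gap
    for j in range(1, cols):
        F[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub,
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)

    # Traceback: at each cell, follow a move that explains its score.
    out_s, out_t = [], []
    i, j = len(s), len(t)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if s[i - 1] == t[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + sub:   # diagonal move
                out_s.append(s[i - 1])
                out_t.append(t[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:  # up: gap in t
            out_s.append(s[i - 1])
            out_t.append("-")
            i -= 1
        else:                                       # left: gap in s
            out_s.append("-")
            out_t.append(t[j - 1])
            j -= 1
    return F[-1][-1], "".join(reversed(out_s)), "".join(reversed(out_t))


score, row_s, row_t = needleman_wunsch("ATTATCT", "TTTCTA")
print(score)
print(row_s)
print(row_t)
```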
Tools that utilize Global Alignment Algorithm
• EMBOSS Needle
• EMBOSS Stretcher


Applications:
• Identify Conserved Interaction Pathways and Complexes [Brian P.
  Kelley et al., 2003]

• Functional Orthology Detection [Rohit Singh et al., 2008]

Advantages:
Similar regions are aligned in the same order and orientation in both sequences.

Disadvantage:
Slow and memory intensive: the full matrix has (m+1) x (n+1) cells
Impractical for genome-sized sequences
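
A rough arithmetic sketch of why the full matrix becomes impractical for long sequences. The figure of 4 bytes per cell and the helper names `matrix_cells` and `matrix_bytes` are assumptions for illustration; real tools may store cells differently.

```python
def matrix_cells(m, n):
    """Number of cells in the (m+1) x (n+1) score matrix."""
    return (m + 1) * (n + 1)


def matrix_bytes(m, n, bytes_per_cell=4):
    """Memory for the full matrix, assuming a fixed cell size."""
    return matrix_cells(m, n) * bytes_per_cell


# The worked example: sequences of length 7 and 6.
print(matrix_cells(7, 6))
# Two sequences of one million characters each, in terabytes.
print(matrix_bytes(1_000_000, 1_000_000) / 1e12)
```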