A sequence alignment arranges DNA, RNA, or protein sequences to identify similar regions that may indicate functional or evolutionary relationships. Multiple sequence alignment aligns more than two sequences by inserting gaps to arrange homologous positions in a matrix. Progressive alignment methods build a multiple alignment starting with the most similar pairs and adding sequences according to a guide tree.
SEQUENCE ALIGNMENT
A sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences.

TYPES OF SEQUENCE ALIGNMENTS
Pairwise alignment
• Dot matrix method
• Dynamic programming
• Word methods
Multiple sequence alignment
• Progressive methods
• Iterative methods
• Block-based methods

MULTIPLE ALIGNMENT
• Proteins can be classified into families:
– Common structure.
– Common function.
– Common evolutionary origin.
• For a set of sequences belonging to some family:
– Each pair has some differences.
– But there are some common motifs in almost all sequences of the family.
• A multiple alignment carries more information than a pairwise alignment.

MULTIPLE SEQUENCE ALIGNMENT
A multiple sequence alignment is an alignment of N > 2 sequences obtained by inserting gaps ("-") into the sequences such that the resulting sequences all have length L and can be arranged in a matrix of N rows and L columns, where each column represents a homologous position.

APPLICATIONS OF MULTIPLE SEQUENCE ALIGNMENT
• Identification of conserved sequence patterns and motifs in the whole sequence family.
• An essential prerequisite to carrying out:
– Phylogenetic analysis of sequence families.
– Prediction of protein secondary and tertiary structure.
• Designing degenerate polymerase chain reaction (PCR) primers based on multiple related sequences.

TYPES OF MSA
Progressive method
This approach repeatedly aligns two sequences, two alignments, or a sequence with an alignment.
Iterative method
Works similarly to progressive methods but repeatedly realigns the initial sequences as well as adding new sequences to the growing MSA.
Block-based method
The method identifies a block of ungapped alignment shared by all the sequences.

PROGRESSIVE ALIGNMENT
• The most widely used approach.
• Builds up a final MSA by combining pairwise alignments, beginning with the most similar pair and progressing to the most distantly related.
• Progressive alignment methods require two stages:
– A first stage in which the relationships between the sequences are represented as a tree, called a guide tree.
– A second stage in which the MSA is built by adding the sequences sequentially to the growing MSA according to the guide tree.
• Clustal W is a progressive multiple alignment program.

WORKING OF CLUSTAL W
• First, perform all possible pairwise alignments between each pair of sequences.
• Calculate the 'distance' between each pair of sequences based on these isolated pairwise alignments.
• Generate a distance matrix.
• Generate a Neighbor-Joining 'guide tree' from these pairwise distances.
• This guide tree gives the order in which the progressive alignment will be carried out.

IMPORTANT FEATURES
Flexibility of using substitution matrices
Clustal W applies different scoring matrices when aligning sequences, depending on their degree of similarity. The choice of matrix depends on the evolutionary distances measured from the guide tree.
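As a rough illustration of the first three Clustal W steps listed above (pairwise alignment, pairwise distance, distance matrix), the following Python sketch may help. It is not Clustal W's actual implementation: the function names, the match/mismatch/gap values, and the distance definition (one minus the fraction of identical residues over aligned, ungapped columns) are all assumptions made for illustration. Building the Neighbor-Joining guide tree is omitted; the sketch only reports the most similar pair, which is where a progressive alignment would start.

```python
from itertools import combinations

def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.
    The scoring values are illustrative assumptions."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append('-')
            i -= 1
        else:
            out_a.append('-'); out_b.append(b[j - 1])
            j -= 1
    return ''.join(reversed(out_a)), ''.join(reversed(out_b))

def identity_distance(a, b):
    """Distance = 1 - fraction of identical residues over ungapped columns
    (a simplified, assumed definition)."""
    aln_a, aln_b = global_align(a, b)
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != '-' and y != '-']
    if not pairs:
        return 1.0
    identical = sum(x == y for x, y in pairs)
    return 1.0 - identical / len(pairs)

def distance_matrix(seqs):
    """Symmetric matrix of pairwise distances from isolated pairwise alignments."""
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = identity_distance(seqs[i], seqs[j])
    return dist

def closest_pair(dist):
    """Indices of the most similar pair, where progressive alignment would begin."""
    return min(combinations(range(len(dist)), 2), key=lambda p: dist[p[0]][p[1]])

if __name__ == "__main__":
    sequences = ["ACGT", "ACGT", "AGGT"]  # toy input
    d = distance_matrix(sequences)
    print(d)
    print(closest_pair(d))
```

A guide tree built from such a matrix would then fix the order in which sequences and sub-alignments are merged.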
Adjustable gap penalties
A gap near a series of hydrophobic residues carries a higher penalty than one next to a series of hydrophilic residues. In addition, gaps that are too close to one another can be penalized more than isolated gaps.

PROS AND CONS OF PROGRESSIVE METHOD OF ALIGNMENT
PROS
• Efficient enough to implement on a large scale for many (100s to 1000s) sequences.
• Progressive alignment services are commonly available on publicly accessible web servers, so users need not locally install the applications of interest.
• Most widely used method of multiple sequence alignment because of its speed and accuracy.
CONS
• Progressive alignments are not guaranteed to be globally optimal.
• The primary problem is that when errors are made at any stage in growing the MSA, these errors are then propagated through to the final result.
• Performance is also particularly bad when all of the sequences in the set are rather distantly related.

ITERATIVE METHOD
Works similarly to progressive methods but repeatedly realigns the initial sequences as well as adding new sequences to the growing MSA.
• Iterative methods were introduced to reduce the errors of progressive alignment.
• Iterative methods are also heuristics.
• Basic idea:
– Generate an initial multiple alignment using a method such as progressive alignment.
– Iteratively improve the multiple alignment.

BLOCK-BASED ALIGNMENT
• For divergent sequences that share only regional similarities.
• A local-alignment-based approach is used.
• The method identifies a block of ungapped alignment shared by all the sequences.
• An example of a block-based alignment program is DIALIGN2.

DIALIGN2
• The method breaks each of the sequences down into smaller segments and performs all possible pairwise alignments between the segments.
• The high-scoring segments are called blocks.
• It gives a block-to-block comparison rather than a residue-to-residue comparison.
• The program is especially suitable for aligning divergent sequences with only local similarity.

SUM OF PAIRS
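• The sum of pairwise distances between all pairs of sequences for some scoring matrix s, computed for each column m_i of the alignment:

$$S(m_i) = \sum_{k<l} s\left(m_i^k, m_i^l\right)$$

where m_i^k is the symbol of sequence k in column i.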
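The sum-of-pairs score of a column can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names and the simple match/mismatch/gap scoring function standing in for the scoring matrix s are assumptions, and a gap paired with a gap is assumed to score zero.

```python
from itertools import combinations

def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    """Stand-in for the scoring matrix s (values are illustrative assumptions)."""
    if x == '-' and y == '-':
        return 0          # assumed: gap against gap costs nothing
    if x == '-' or y == '-':
        return gap
    return match if x == y else mismatch

def column_sp(column, score=pair_score):
    """S(m_i): sum of s over all unordered pairs k < l in one column."""
    return sum(score(x, y) for x, y in combinations(column, 2))

def sp_score(alignment, score=pair_score):
    """Total sum-of-pairs score, treating columns as independent."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return sum(column_sp(col, score) for col in zip(*alignment))

if __name__ == "__main__":
    msa = ["AC-T",
           "AG-T",
           "ACGT"]
    print([column_sp(col) for col in zip(*msa)])
    print(sp_score(msa))
```

Summing over columns reflects the column-independence assumption, and summing over pairs within a column reflects the pairwise-independence assumption.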
• Assumes not only that the alignment of each column is independent, but also that each pair of sequences is independent.
– Each sequence is scored as if it descended from the N-1 other sequences instead of from one common ancestor.

PSI-BLAST (POSITION SPECIFIC ITERATED)
• BLAST provides a new automatic "profile-like" search.
• Iterative procedure:
– Perform BLAST on the database.
– Use significant alignments to construct a "position-specific" score matrix.
– This matrix replaces the query sequence in the next round of database searching.
• The program may be iterated until no new significant alignments are found.
• Most commonly used search method today.

CREATING A PSSM
• After aligning the sequences, we see that there are some conserved regions.
• We use the multiple alignment of BLAST results to create a Position Specific Scoring Matrix (PSSM).
• This matrix represents information from a whole family; it is stricter in highly conserved regions.

PSI-BLAST
PSI-BLAST is designed for more sensitive protein-protein similarity searches.
Position-Specific Iterated (PSI)-BLAST is the most sensitive BLAST program, making it useful for finding very distantly related proteins or new members of a protein family. Use PSI-BLAST when your standard protein-protein BLAST search either failed to find significant hits, or returned hits with descriptions such as "hypothetical protein" or "similar to...".
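To make the idea of a position-specific scoring matrix more concrete, here is a minimal Python sketch that builds log-odds scores from an ungapped multiple alignment. It is a simplification and not what PSI-BLAST actually does internally: the function names, the uniform background frequencies, and the fixed pseudocount are assumptions, and real PSI-BLAST additionally applies sequence weighting and more elaborate pseudocount estimates.

```python
import math
from collections import Counter

def build_pssm(alignment, alphabet="ACDEFGHIKLMNPQRSTVWY", pseudocount=1.0):
    """Return one dict of log2-odds scores per alignment column.
    Assumes a uniform background and a fixed pseudocount (both simplifications)."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(alphabet)
    pssm = []
    for column in zip(*alignment):
        # Count only symbols from the alphabet, so gaps are ignored.
        counts = Counter(c for c in column if c in alphabet)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        pssm.append({
            a: math.log2(((counts[a] + pseudocount) / total) / background)
            for a in alphabet
        })
    return pssm

def score_segment(pssm, segment):
    """Score a segment of the same length as the PSSM, one column per residue."""
    if len(segment) != len(pssm):
        raise ValueError("segment length must equal the number of PSSM columns")
    # Residues outside the alphabet contribute 0 (an assumed convention).
    return sum(col.get(res, 0.0) for col, res in zip(pssm, segment))

if __name__ == "__main__":
    # Toy nucleotide alignment to keep the matrix small.
    hits = ["ACG", "ACG", "ACT"]
    pssm = build_pssm(hits, alphabet="ACGT")
    for candidate in ("ACG", "ACT", "TTT"):
        print(candidate, round(score_segment(pssm, candidate), 3))
```

Columns where one residue dominates give that residue a high score and all others a low one, which is the sense in which the matrix is stricter in highly conserved regions.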