Lecture - 02 - Comparative Sequence Analysis

Uploaded by

abubakrashfaq1607

Comparative Sequence Analysis

Department of Life Sciences, SBASSE, LUMS

Genome to Gene

Heredity Unit

Latest on Genome Sequencing
• Human Genome Project (1990 – 2003)

Now!

Our Genome and Need for Comparative Genomics
• Number of bases: 3.2 billion bases

• Number of chromosomes: 23 pairs

• Protein-coding fraction: Less than 2% of the genome codes for proteins

• Protein-coding Gene Number: 20,000 - 25,000

• Average gene size: ~ 3000 bases & huge variation

• Largest known human gene consists of 2.4 million bases (dystrophin)

• Repetition: Almost 45-50% of the DNA is repetitive

• Similarity between individuals: Almost all (99.9%) nucleotide bases are exactly the same in all people
Proteome to Protein
Genes: 30,000

Alternative Splicing: 2 - 3 per gene

3 x 30,000 = 90,000 proteins

Post translational modifications

10 x 90,000 = 900,000 proteins

Peng and Gygi, JMS 2001

Asa Wheelock

Need for Comparative Proteomics
• Number of reported proteins: 150 million and counting

Benefits of Comparative Genomics
• Comparison of whole genome sequences provides a highly detailed view of how organisms are related to each other at the genetic level

• Comparative genomics also provides a powerful tool for studying evolutionary changes among organisms

• Helps to identify genes that are conserved or common among species, as well as genes that give each organism its unique characteristics

Fly vs. Humans
Comparison of the fruit fly genome with the human genome:

• about 60% of genes are conserved between the two species

• the two organisms appear to share a core set of genes

• two-thirds of human genes known to be involved in cancer have counterparts in the fruit fly

Evolutionary Relationship

COV2

http://bacterialphylogeny.info/overview.html

What have we done and what’s next?
DONE: Gene and Protein Sequences
• GenBank (DNA Sequences)
• UniProt (Protein Sequences)
• GeneMark (Gene Prediction)

NEXT: Sequence & Structure Analysis

• BLAST (nucleotide, protein)
• PDB
• I-TASSER

From Sequences to Comparisons
• Problem: If we sequence a new gene or protein, can we compare it with the existing information in GenBank or UniProt?

• Idea: Compare NOVEL sequences with KNOWN (previously characterized) genes or proteins.

• Benefit: STRUCTURAL, FUNCTIONAL and EVOLUTIONARY information can be inferred from WELL DESIGNED comparisons.

• The most common tool used is called BLAST.

BLAST?
• Basic Local Alignment Search Tool

• A method for rapid searching of sequence databases, for both nucleotides and proteins.

• BLAST reports local alignments. A local alignment can span two sequences end to end when they are similar throughout, so BLAST finds such global matches as well as regions of similarity embedded in otherwise unrelated proteins.

• Uses statistical theory to determine if a match might have occurred by chance.
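
The statistical step can be illustrated with the Karlin-Altschul relation, E = K * m * n * exp(-lambda * S), which estimates how many alignments with raw score S or better would be expected by chance for a query of length m and a database of total length n. The sketch below is only an illustration: the function names are hypothetical, and the LAMBDA and K values are placeholder assumptions (real values depend on the scoring matrix and gap penalties).

```python
import math

# Placeholder statistical parameters, assumed for illustration only.
LAMBDA = 0.318
K = 0.13

def expect_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance alignments scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)

def bit_score(raw_score, lam, k):
    """Normalized score that no longer depends on the scoring system."""
    return (lam * raw_score - math.log(k)) / math.log(2)

# Higher raw scores are less likely to arise by chance.
for s in (30, 40, 50):
    e = expect_value(s, query_len=100, db_len=1_000_000, lam=LAMBDA, k=K)
    print(s, round(bit_score(s, LAMBDA, K), 1), f"{e:.3g}")
```

A small E-value means the match is unlikely to be a chance hit; the same E-value can be recovered from the bit score as m * n * 2^(-bit score).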
https://blast.ncbi.nlm.nih.gov/Blast.cgi

BLAST - Workflow
1. BLAST runs “Dynamic Programming” alignment only on “promising” database sequences rather than on the whole database.

2. Promising sequences are found through an index that makes searching for perfectly matching sub-strings very fast. A suffix tree is one such index and finds the longest matching sub-string between two strings in time linear in their lengths; NCBI BLAST uses a hash lookup table of words for this step (see step 4).

3. BLAST creates a list of all “words” (short subsequences) that reach a certain “threshold” score when compared with the query sequence. A word is a run of consecutive residues: typically 3 amino acids for proteins, and longer for nucleotides (blastn defaults to 11; megablast offers word sizes from 16 to 256).

4. A lookup hash table is made of all such words and the “neighboring” words of the query sequence (rather than just random words).

5. When a BLAST search is run, candidate sequences from the database are picked based on perfect matches to small sub-sequences in the query sequence.
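
The lookup-table idea in steps 4 and 5 can be sketched as follows. This is a simplified illustration, not NCBI's implementation: it stores only the exact words of the query (neighboring words are left out), the function names are hypothetical, and the two short sequences are toy inputs.

```python
from collections import defaultdict

def build_lookup(query, k=3):
    """Map every k-letter word of the query to the positions where it starts."""
    table = defaultdict(list)
    for i in range(len(query) - k + 1):
        table[query[i:i + k]].append(i)
    return table

def find_hits(table, subject, k=3):
    """Scan a database sequence and report (query_pos, subject_pos) word hits."""
    hits = []
    for j in range(len(subject) - k + 1):
        for i in table.get(subject[j:j + k], []):
            hits.append((i, j))
    return hits

query = "METPQGIAV"    # toy query
subject = "PQGELV"     # toy database sequence
print(find_hits(build_lookup(query), subject))  # [(3, 0)]
```

Each hit is a seed: a position in the query and a position in the database sequence where a word matches exactly, from which an alignment can later be extended.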
BLOSUM62 Match/Mismatch Matrix

• Here the word is PQG, and its neighboring words are all words that score at least T = 13 against it (summed over the three letters) under the given scoring system (e.g., BLOSUM62).
• T is a user-provided threshold!
• PSG (score 13) is a neighboring word, PQA (score 12) is not.
Example BLAST search method
Query sequence: PQGELV

• Make a list of all possible k-mer words (length 3 for proteins)

• Assign scores from BLOSUM62 (each word scored against itself):

PQG (score 18)
QGE (score 16)
GEL (score 15)
ELV (score 13)

• Keep the words with score >= 13

• In total we get: PQG, QGE, GEL & ELV
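
The word list and its scores can be reproduced with a short sketch. It assumes that each word is scored against itself, so only the BLOSUM62 diagonal entries for the six residues of the query are needed; those values are copied by hand and the function names are hypothetical.

```python
# BLOSUM62 self-match (diagonal) scores for the residues in PQGELV.
BLOSUM62_DIAGONAL = {"P": 7, "Q": 5, "G": 6, "E": 5, "L": 4, "V": 4}

def kmers(seq, k=3):
    """All overlapping words of length k, in order of appearance."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def self_score(word):
    """Score of a word aligned against itself."""
    return sum(BLOSUM62_DIAGONAL[residue] for residue in word)

words = [(w, self_score(w)) for w in kmers("PQGELV")]
kept = [w for w, score in words if score >= 13]

print(words)  # [('PQG', 18), ('QGE', 16), ('GEL', 15), ('ELV', 13)]
print(kept)   # ['PQG', 'QGE', 'GEL', 'ELV']
```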

Example BLAST search method (continued)
• Make k-mers (word size 3) of all sequences in the database
• Store them in a suffix tree (a fast tree structure to search for identical matches)

• Find all database sequences that have at least 2 matches among our 3 words
• PQG, GEL & PEG (PEG is a neighboring word of PQG)

• Find a database hit and extend the alignment into a High-scoring Segment Pair (HSP):

Query:    M E T P Q G I A V
Database: - - - P Q G E L V
Score:          8 5 5 2 0 8

• HSP: PQGI (score 8+5+5+2 = 20)

• If 2 HSPs in the query sequence are < 40 positions apart, run a full alignment on the query and hit sequences
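
The candidate-selection step (keep database sequences with at least 2 matching words) can be sketched with a plain dictionary index. This is a simplification under stated assumptions: a hash-based index stands in for the suffix tree, the database sequences and their names are invented for illustration, and the function names are hypothetical.

```python
from collections import defaultdict

def index_database(database, k=3):
    """Map each k-letter word to the set of sequence names that contain it."""
    index = defaultdict(set)
    for name, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index

def candidates(index, words, min_hits=2):
    """Names of sequences containing at least min_hits distinct query words."""
    counts = defaultdict(int)
    for word in set(words):
        for name in index.get(word, ()):
            counts[name] += 1
    return sorted(name for name, n in counts.items() if n >= min_hits)

# Invented toy database, for illustration only.
database = {
    "seq1": "AAPQGELVKK",   # contains PQG and GEL
    "seq2": "MTPEGAAGEL",   # contains PEG and GEL
    "seq3": "PQGAAAAAAA",   # contains PQG only
    "seq4": "WWWWWW",       # contains none of the words
}
print(candidates(index_database(database), ["PQG", "GEL", "PEG"]))
# ['seq1', 'seq2']
```

Only the sequences that pass this filter go on to hit extension and full alignment, which is where most of the speed-up comes from.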
Advantages of BLAST
• The BLAST algorithm was designed to balance speed with increased sensitivity for finding distant sequence relationships.
• Speed is achieved by:
1. Pre-indexing the database before the search
2. Parallel processing
3. A hash table that contains neighborhood words rather than just random words.

• BLAST emphasizes regions of local alignment to detect relationships among sequences having isolated regions of similarity between them.

BLAST for Nucleotides and Proteins
• Nucleotides
• blastn
• Compares a nucleotide query sequence against a nucleotide sequence
database.

• Proteins
• blastp
• Compares an amino acid query sequence against a protein sequence
database.

Comparing an unknown nucleotide sequence with possible “protein” sequences!!
• blastx
> but what about the 6 possible reading frames?

• Compares a nucleotide query sequence translated in all six reading frames against a protein sequence database.

• This option may be used to find potential translation products of an unknown nucleotide sequence.
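
The six reading frames are three offsets on the given strand plus three on its reverse complement. The sketch below shows one way to produce them with the standard genetic code; it is an illustration rather than how blastx is implemented, the function names are hypothetical, and the short DNA string is a made-up example.

```python
# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]

def translate(dna, frame=0):
    """Translate complete codons starting at offset frame; '*' marks a stop."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def six_frame(dna):
    """Translations for frames +1..+3 (given strand) and -1..-3 (reverse strand)."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna, offset)
        frames[f"-{offset + 1}"] = translate(rc, offset)
    return frames

print(six_frame("ATGGCCTAA"))
# {'+1': 'MA*', '-1': 'LGH', '+2': 'WP', '-2': '*A', '+3': 'GL', '-3': 'RP'}
```

Each of the six translations would then be compared against the protein database, which is why the reading frame of the query does not need to be known in advance.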

How about the reverse of blastx?
• tblastn

• Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.

Comparing all translated ORFs of a nucleotide sequence with all ORFs of a nucleotide DB
• tblastx

• Compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.

Getting started with BLAST
Getting started:
http://www.ncbi.nlm.nih.gov/
http://www.ncbi.nlm.nih.gov/BLAST/
and
http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html

So what if we find the Alien Gene in GenBank?
• Homologs
• Features (including DNA and protein sequences) in species being compared that are similar because they are ancestrally related

• Homologs can be either orthologs or paralogs

• Orthologs
• Homologous genes (or any DNA sequences) that separated because of a speciation event
• Derived from the same gene in the last common ancestor

• Paralogs
• Homologous genes that separated because of gene duplication events within the same species
