Bioinformatics

Bioinformatics is a scientific discipline that integrates biology, computer science, and information technology to manage complex biological data. The field has evolved significantly since the 1990s with advancements in high-throughput DNA sequencing and the growth of various 'omics' projects, necessitating sophisticated computational tools for data analysis. Key tasks in bioinformatics include sequence alignment, protein folding, and evolutionary analysis, with various databases and algorithms available for researchers to utilize.

Uploaded by

georginaroudri

Bioinformatics:

Copyright© Kerstin Wagner

Introduction: What is bioinformatics?
Bioinformatics can be defined as the body of tools and algorithms needed to handle large and complex biological information.

Bioinformatics is a scientific discipline created from the interaction of biology and computer science.

The NCBI defines bioinformatics as:

"Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline."
Genomics era: High-throughput DNA sequencing

The first high-throughput genomics technology was automated DNA sequencing in the early 1990s.

In 1995, Venter and Hamilton Smith used a whole-genome shotgun sequencing strategy to sequence the genomes of Haemophilus influenzae and Mycoplasma genitalium.

In September 1999, Celera Genomics completed the sequencing of the Drosophila genome.

The 3-billion-bp human genome sequence was generated in a competition between the publicly funded Human Genome Project and Celera.
High-throughput DNA sequencing

Top image: confocal detection of fluorescently labeled DNA by the MegaBACE sequencer

That was then. How about now?
The trend of data growth
The 21st century is a century of biotechnology and OMICS:

-Genomics: New sequence information is being produced at increasing rates. (The contents of GenBank double every year.)

-Transcriptomics: Microarrays allow global expression analysis: the RNA levels of every gene in the genome are analyzed in parallel. Microarrays are being progressively replaced by RNA-seq.

-Proteomics: Global protein analysis generates large libraries of mass spectra.

-Metabolomics: Global metabolite analysis: 25,000 secondary metabolites characterized.

[Figure: nucleotides (billions) by year, 1980 to 2000]
How to handle the large amount of information?

Drew Sheneman, New Jersey--The Newark Star Ledger

Answer: bioinformatics and the Internet

Bioinformatics history
In the 1960s: the birth of bioinformatics

IBM 7090 computer

Margaret Oakley Dayhoff created:

The first protein database
The first program for sequence assembly

There is a need for computers and algorithms that allow:

Accessing, processing, storing, sharing, retrieving, visualizing, annotating…
Why do we need the Internet?
"Omics" projects and the information associated with them involve a huge amount of data that is stored on computers all over the world.
Because it is impossible to maintain up-to-date copies of all relevant databases within the lab, access to the data is via the Internet.

[Figure labels: "Database storage"; "You are here"]
Scope of this lab
The lab will touch on the following computational tasks:
-Similarity search
-Sequence comparison: alignment, multiple alignment, retrieval
-Sequence analysis: signal peptide, transmembrane domain,…
-Protein folding: secondary structure from sequence
-Sequence evolution: phylogenetic trees

The lab will also make you familiar with the bioinformatics resources available on the web for these tasks.
Applying algorithms to analyze genomics data
You have just cloned a gene. Questions to ask:

Is it already in databases?
-Accession #?
-Annotation?

Protein characteristics?
-Sub-localization
-Soluble?
-3D fold

Other information?
-Expression profile?
-Mutants?

Are there conserved regions?
-Alignments?
-Domains?

Are there similar sequences?
-% identity?
-Family member?

Evolutionary relationship?
-Phylogenetic tree

A critical failure of current bioinformatics is the lack of a single software package that can perform all of these functions.
DNA (nucleotide sequences) databases
These are large databases, and searching any one of them should produce similar results because they exchange information routinely.

-GenBank (NCBI): http://www.ncbi.nlm.nih.gov

-Ensembl: http://useast.ensembl.org/index.html

-DDBJ (DNA DataBase of Japan): http://www.ddbj.nig.ac.jp

-TIGR: http://tigr.org/tdb/tgi

-Yeast: http://yeastgenome.org

-Microbes: http://img.jgi.doe.gov/cgi-bin/pub/main.cgi
Protein (amino acid) databases
Known proteins:
-Swiss-Prot (very high level of annotation)
http://au.expasy.org/

-PIR (Protein Information Resource): the world's most comprehensive catalog of information on proteins
http://www.pir.uniprot.org/

Translated databases:
-TrEMBL (translated EMBL): includes entries that have not yet been annotated and integrated into Swiss-Prot.
http://www.ebi.ac.uk/trembl/access.html

-GenPept (translation of coding regions in GenBank)

-PDB (sequences derived from the 3D structures in the Brookhaven PDB) http://www.rcsb.org/pdb/
Database homology searching
Algorithms give searches an efficient mathematical basis that can be translated into statistical significance.

Homology searching assumes that sequence, structure, and function are inter-related.

All similarity searching methods rely on the concepts of alignment and distance between sequences.

A similarity score is calculated from a distance: the number of DNA bases or amino acids that differ between two sequences.
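
The distance idea above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the two sequences are already aligned and of equal length, every mismatch counts the same, and the function names and example sequences are made up for this sketch. Real search tools use substitution matrices and gap penalties rather than a plain mismatch count.

```python
def hamming_distance(seq_a: str, seq_b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Turn the distance into a similarity: the percentage of identical positions."""
    distance = hamming_distance(seq_a, seq_b)
    return 100.0 * (len(seq_a) - distance) / len(seq_a)


# Hypothetical example sequences (not from the lab material).
print(hamming_distance("GATTACA", "GACTATA"))            # 2
print(round(percent_identity("GATTACA", "GACTATA"), 1))  # 71.4
```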
Database search methods: Sequence Alignment
Two broad classes of sequence alignments exist:

-Global alignment (the sequences are aligned over their entire length): not sensitive
QKESGPSSSYC
VQQESGLVRTTC

-Local alignment (only the best-matching region is aligned): faster
ESG
ESG

The most widely used local similarity algorithms are:

-Smith-Waterman (http://www.ebi.ac.uk/MPsrch/)
-Basic Local Alignment Search Tool (BLAST, http://www.ncbi.nih.gov)
-FAST-All (FASTA, http://fasta.genome.jp; http://www.ebi.ac.uk/fasta33/; http://www.arabidopsis.org/cgi-bin/fasta/nph-TAIRfasta.pl)
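
To make local alignment concrete, here is a minimal Smith-Waterman sketch in Python, applied to the two peptides from the global alignment example. The scoring values (match = 2, mismatch = -1, linear gap = -2) are assumptions chosen for readability; production tools score amino acids with substitution matrices such as BLOSUM or PAM and usually use affine gap penalties.

```python
def smith_waterman(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; a cell never drops below zero, which is what makes
    # the alignment local rather than global.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + pair,  # align the two residues
                score[i - 1][j] + gap,       # gap in seq_b
                score[i][j - 1] + gap,       # gap in seq_a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("QKESGPSSSYC", "VQQESGLVRTTC"))  # (7, 'QKESG', 'QQESG')
```

With these assumed scores the best local alignment extends the shared ESG core by two residues, because the Q/Q match outweighs the K/Q mismatch. The matrix has one cell per residue pair, which is why the exhaustive method is slow on whole databases.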
Which algorithm to use for database similarity search?

Speed:
BLAST > FASTA > Smith-Waterman (Smith-Waterman is VERY SLOW and uses a LOT OF COMPUTER POWER)

Sensitivity/statistics:
FASTA is more sensitive than BLAST and misses fewer homologues.
Smith-Waterman is even more sensitive.

BLAST calculates probabilities.

FASTA is more accurate than BLAST for DNA-DNA searches.

Tools to search databases
The dilemma: DNA or protein?

Search by similarity: using the nucleotide sequence, or using the amino acid sequence?

-Is the comparison of two nucleotide sequences accurate?

-By translating into amino acid sequence, are we losing information?
The genetic code is degenerate (two or more codons can represent the same amino acid).

-Very different DNA sequences may code for similar protein sequences.
We certainly do not want to miss those cases!
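
A small Python sketch shows how far two DNA sequences can drift while still encoding the same peptide. The codon table is the standard genetic code written in the usual TCAG order; the two 9-base sequences and the helper names are hypothetical examples invented for this illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def identity(seq_a: str, seq_b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    return sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)


# Hypothetical sequences using different codons for Leu, Ser and Arg.
dna_1 = "CTGAGCCGT"
dna_2 = "TTATCAAGA"

print(translate(dna_1), translate(dna_2))          # LSR LSR
print(round(identity(dna_1, dna_2), 2))            # 0.22
print(identity(translate(dna_1), translate(dna_2)))  # 1.0
```

Only 2 of 9 bases agree, yet the translated peptides are identical, which is exactly the case a DNA-level search could miss.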
Reasons for translating
Comparing DNA sequences gives more random matches:
[Figure panels: "A good alignment with end-gaps"; "A very poor alignment". Annotation: "Almost 50% identity!"]

Proteins are conserved in evolution (DNA similarity decays faster!)
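
Why DNA gives more random matches can be estimated with a back-of-the-envelope calculation. The sketch below assumes independent positions, no gaps, and uniform residue frequencies, none of which holds exactly for real sequences; the function name is made up for this illustration.

```python
def chance_identity(frequencies):
    """Probability that two independent random residues are identical."""
    return sum(p * p for p in frequencies)


dna_background = [0.25] * 4       # assumed uniform A, C, G, T
protein_background = [0.05] * 20  # assumed uniform amino acid usage

print(round(chance_identity(dna_background), 3))      # 0.25
print(round(chance_identity(protein_background), 3))  # 0.05
```

Under these assumptions two unrelated DNA sequences already agree at about one position in four, against about one in twenty for proteins, and allowing gaps pushes the apparent identity of unrelated sequences higher still.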

Conclusion:
It is almost always better to compare coding sequences in their amino acid form, especially if they are very divergent.
-Exception: very highly similar nucleotide sequences may give better results when compared as DNA.
BLAST and FASTA variants

-FASTA: Compares a DNA query to a DNA database, or a protein query to a protein database.
-FASTX: Compares a translated DNA query to a protein database.
-TFASTA: Compares a protein query to a translated DNA database.

-BLASTN: Compares a DNA query to a DNA database.
-BLASTP: Compares a protein query to a protein database.
-BLASTX: Compares the 6-frame translations of a DNA query to a protein database.
-TBLASTN: Compares a protein query to the 6-frame translations of a DNA database. You can, however, define your frame of interest.
-TBLASTX: Compares the 6-frame translations of a DNA query to the 6-frame translations of a DNA database (each sequence is comparable to BLASTP searches!)

-PSI-BLAST: Performs iterative database searches. The results from each round are incorporated into a 'position specific' score matrix, which is used for further searching.
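
The "6-frame translation" used by the translated variants can be sketched as follows: three reading frames on the given strand and three on its reverse complement. This is a minimal illustration using the standard genetic code; the function names, the frame labels (+1 to -3) and the short example sequence are assumptions of this sketch, not the internals of BLAST or FASTA.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate from the first base, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def six_frame_translations(dna: str) -> dict:
    """Translate all three frames of both strands."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


print(six_frame_translations("ATGGCC"))
# {'+1': 'MA', '+2': 'W', '+3': 'G', '-1': 'GH', '-2': 'A', '-3': 'P'}
```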
A practical example of sequence alignment
http://www.ncbi.nlm.nih.gov

[Screenshots: BLAST results; detailed BLAST results]

E value: the expectation value, that is, the number of hits scoring at least as well as this one that you would expect to find by chance in a database of this size. The lower the E value, the more significant the score.
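
The relationship between a score and its E value can be sketched with the Karlin-Altschul formula for normalized bit scores, E = m × n × 2^(-S'), where m is the query length and n the database length. The sketch below is a simplification: it uses raw lengths instead of the edge-corrected "effective" lengths, and the function names and example numbers are hypothetical.

```python
import math


def expected_hits(bit_score: float, query_length: int, database_length: int) -> float:
    """E value for a bit score: chance hits expected in a search space of m * n."""
    return query_length * database_length * 2.0 ** (-bit_score)


def p_value(e_value: float) -> float:
    """Probability of at least one chance hit, assuming Poisson-distributed hits."""
    return -math.expm1(-e_value)  # 1 - exp(-E), computed stably for small E


# Hypothetical search: a 300-residue query against a database of 1e9 residues.
e = expected_hits(50.0, 300, 1_000_000_000)
print(e, p_value(e))

# Each additional bit of score halves the E value.
print(expected_hits(51.0, 300, 1_000_000_000) / e)  # 0.5
```

For small values E and the probability are nearly equal, which is why the two are often used interchangeably; for large E they diverge, since a probability cannot exceed 1.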
Database searching tips
-Use the latest database version.
-Use BLAST first, then a finer tool (FASTA,…)
-Search both strands when using FASTA.
-Translate sequences where relevant.
-Search the 6-frame translation of a DNA database.
-E < 0.05 is statistically significant, usually biologically interesting.
-If the query has repeated segments, delete them and repeat the search.
