Bioinformatics Tutorial

Bioinformatics is the use of computer tools to analyze and manage large biological data sets. It involves the in silico study of biological questions that were traditionally studied in vitro and in vivo. Bioinformatics aims to expand our use of biological data through computational analysis and is considered a "dry lab" approach. Margaret Dayhoff has been called the "mother and father" of bioinformatics for her pioneering work in protein sequence analysis. Although the two fields are related, bioinformatics focuses on developing computational tools for biological data, whereas computational biology applies computational and mathematical modeling to biological systems.

Uploaded by

Gia Boo


Tutorial 1 (15 marks)

1. Define bioinformatics. (2m)

The use of computer tools to manage, analyze, and manipulate large sets of biological data,
including text data, phylogenetic trees, gene expression profiles, and biochemical pathways.

2. Before the era of bioinformatics, biological experiments were carried out in two ways.
What are the two ways? Briefly explain each. (4m)
In vitro studies – experiments carried out outside the living organism (for example, in a test tube)
In vivo studies – experiments carried out inside the living organism

3. A bioinformatics study is also called an ....... (1m)

in silico study

4. The term ‘Bioinformatics’ was coined by _____________ (1m)

Paulien Hogeweg

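5. Who has been hailed by the director of the National Center for Biotechnology Information (NCBI) as
the “mother and father of bioinformatics”, and why? (2m)
Margaret Oakley Dayhoff, for her pioneering work in protein sequence analysis (she compiled the Atlas of Protein Sequence and Structure, one of the first protein sequence databases).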

6. Why is bioinformatics considered a dry lab? (1m)

Because the experiments are carried out with computer software rather than with chemicals or
laboratory instruments.

7. What are the differences between bioinformatics and computational biology? (4m)

Bioinformatics: Research, development, or application of computational tools and approaches for expanding
the use of biological, medical, or health data, including those to acquire, store, organize, archive,
analyze, or visualize such data. Practitioners are biologists, called bioinformaticians or bioinformaticists.

Computational biology: Development and application of data-analytical and theoretical methods, mathematical
modeling, and computational simulation techniques to the study of biological, behavioral, and social systems.
Practitioners are computational biologists (who are computer scientists, mathematicians, statisticians, and
engineers).

Tutorial 2 (database) Total: 43 marks

1. What is a database? (2m)
A collection of related data stored in a computer in such a way
that a computer user can easily find it.

2. There are two common types of biological data. List them, with one example of
each type. (4m)
Sequences, e.g. DNA
Annotations, e.g. gene function

3. List 4 different types of data format, with an example of each. (8m)

• Sequence
– e.g. text
• Sequence annotation
– e.g. GenBank
• Aligned sequences
– e.g. MSF (Multiple Sequence Format)
• Protein structural data
– e.g. PDB

4. What is a flat file? (2m)

The elementary format underlying the information held in DDBJ/EMBL/GenBank: a plain-text record for each entry.

5. List the three major parts of a flat file. (3m)
The header, which contains the information (descriptors) that applies to the entire record
The features, which are the annotations on the record
The nucleotide sequence itself

6. What are the two differences between text format and FASTA format? (4m)
Text format: contains only the raw sequence, so no annotation can be added. Common extensions: .txt, .seq
FASTA format: each sequence begins with a header line that starts with ">" and can carry annotation
(such as an identifier and a description). Common extension: .fasta
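
The role of the ">" header line can be illustrated with a minimal Python sketch. The function name parse_fasta and the two example records are made up for illustration; they are not part of the tutorial.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines belonging to the current record are concatenated.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical example data (not real database records).
example = """>seq1 hypothetical example record
ATGGCGTACG
TTAGC
>seq2 another made-up record
GGCATT
"""

for header, sequence in parse_fasta(example):
    print(header, len(sequence))
```

Plain text without a header line yields no records here, which mirrors the difference described above: a bare text file carries sequence only, with nowhere to put annotation.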

7. Expand the following (4m):

a. NCBI: The National Center for Biotechnology Information
b. EST: Expressed Sequence Tags
c. RefSeq: Reference Sequences
d. CDD: Conserved Domain Database

8. Which website is best for finding literature information? (1m)

PubMed

9. What are the two differences between primary and derivative sequence databases? Give
one example of each. (6m)
Primary databases: original submissions by experimentalists; content controlled by the submitter. Example: GenBank
Derivative databases: derived from primary data; content controlled by a third party (e.g. NCBI). Example: RefSeq

10. For each of the following accession numbers, state the type of molecular
sequence that it identifies. (4m)
a. NM_15392: RNA (mRNA)
b. NT_030059: DNA (genomic contig)
c. X02775: DNA
d. NP_52280: Protein
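
The reasoning behind these answers is a prefix lookup, which can be sketched in Python. The table below covers only common RefSeq prefixes, and the rule for accessions without an underscore is a simplifying assumption (one letter plus five digits, or two letters plus six digits, is treated as a GenBank/EMBL/DDBJ nucleotide record). The function name molecule_type is hypothetical.

```python
import re

# Common RefSeq prefixes (the part before the underscore) and what they denote.
REFSEQ_PREFIXES = {
    "NM": "RNA (mRNA)",
    "NR": "RNA (non-coding)",
    "XM": "RNA (predicted mRNA)",
    "XR": "RNA (predicted non-coding)",
    "NP": "Protein",
    "XP": "Protein (predicted)",
    "NC": "DNA (complete genomic molecule)",
    "NG": "DNA (genomic region)",
    "NT": "DNA (genomic contig)",
    "NW": "DNA (genomic contig or scaffold)",
}

# Assumption for illustration: classic nucleotide accessions look like
# X02775 (1 letter + 5 digits) or AB123456 (2 letters + 6 digits).
CLASSIC_NUCLEOTIDE = re.compile(r"^(?:[A-Z]\d{5}|[A-Z]{2}\d{6})$")


def molecule_type(accession):
    """Guess the molecule type from the shape of an accession number."""
    accession = accession.strip().upper().split(".")[0]  # drop any version suffix
    if "_" in accession:
        prefix = accession.split("_", 1)[0]
        return REFSEQ_PREFIXES.get(prefix, "unknown")
    if CLASSIC_NUCLEOTIDE.match(accession):
        return "DNA (nucleotide record)"
    return "unknown"


for acc in ["NM_15392", "NT_030059", "X02775", "NP_52280"]:
    print(acc, "->", molecule_type(acc))
```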

11. What information will you get when you look up a gene in UniGene? (2m)
UniGene displays information about the abundance of a transcript (expressed gene), as
well as its regional distribution of expression.

12. List three search strategies that can help you find the information
you need in PubMed. (3m)
Try the PubMed tutorial.
Identify the key concepts.
Determine alternative terms for these concepts, if needed.

Tutorial 3 (Total: 18 marks)

1. What is a sequence alignment? (2m)

The process of locating equivalent regions of sequences to maximize their similarity. In a pairwise
alignment, two sequences are directly compared, position by position.

2. Give two advantages and two disadvantages of using a dot plot in sequence alignment. (4m)

Advantages: It is useful for identifying long regions of strong similarity. It produces a plot that is
easy to make and interpret.

Disadvantages: It provides no statistical analysis. It does not provide a precise alignment.

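3. For the following two sequences, construct a simple dot plot using a grid (or squared paper).
Place each sequence along one axis, and place a dot in the plot for each identical pair of
characters. (3m)
ABRACADABRACADABRA
ABRACADABRACADABRA

(• = identical pair, . = no match)

  A B R A C A D A B R A C A D A B R A
A • . . • . • . • . . • . • . • . . •
B . • . . . . . . • . . . . . . • . .
R . . • . . . . . . • . . . . . . • .
A • . . • . • . • . . • . • . • . . •
C . . . . • . . . . . . • . . . . . .
A • . . • . • . • . . • . • . • . . •
D . . . . . . • . . . . . . • . . . .
A • . . • . • . • . . • . • . • . . •
B . • . . . . . . • . . . . . . • . .
R . . • . . . . . . • . . . . . . • .
A • . . • . • . • . . • . • . • . . •
C . . . . • . . . . . . • . . . . . .
A • . . • . • . • . . • . • . • . . •
D . . . . . . • . . . . . . • . . . .
A • . . • . • . • . . • . • . • . . •
B . • . . . . . . • . . . . . . • . .
R . . • . . . . . . • . . . . . . • .
A • . . • . • . • . . • . • . • . . •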

Which stretch of these sequences can be aligned best to each other (write down the sequence that
shows similarity)? (1m)
ABRACADABRACADABRA
What can you conclude from the plot about these two sequences? (1m)
The two sequences are identical (unbroken main diagonal), and the sequence contains repeats, because
more than one diagonal appears in the same region of the sequence.
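
A dot plot like the one in question 3 can be generated with a few lines of Python. This is a minimal sketch; the function name dot_plot and the choice of "." for empty cells are assumptions made for illustration.

```python
def dot_plot(seq_x, seq_y, match="•", empty="."):
    """Return the text rows of a dot plot: seq_x across the top, seq_y down the side."""
    rows = ["  " + " ".join(seq_x)]  # header row, indented past the row labels
    for residue_y in seq_y:
        # One cell per column: a dot where the two characters are identical.
        cells = [match if residue_x == residue_y else empty for residue_x in seq_x]
        rows.append(residue_y + " " + " ".join(cells))
    return rows


sequence = "ABRACADABRACADABRA"
for row in dot_plot(sequence, sequence):
    print(row)
```

Reading the printed grid, the main diagonal comes from comparing the sequence with itself, and the shorter parallel diagonals come from the repeated ABRACAD unit.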

4. Define the following (3m):
a. Identity: an exact match between two nucleotides or amino acids at the same aligned position
b. Similarity: the degree of resemblance between two sequences when they are compared
c. Homolog: a sequence that resembles another sequence because the two share a
common ancestor

5. Give two differences between orthologs and paralogs (4m).

Orthologs: similar sequences in two different organisms. The similarity arose through speciation
(the formation of new and distinct species in the course of evolution).
Paralogs: similar sequences that arose within the same organism. The similarity arose through a gene
duplication event.

Tutorial 4 Total: 43

1. Expand BLAST (1m)

Basic Local Alignment Search Tool

2. What is BLAST searching fundamental to? (2m)
BLAST searching is fundamental to understanding how a query sequence of interest is related
to other known protein or DNA sequences.

3. BLAST is a collection of five programs for different combinations of query and database
sequences. Describe briefly all five BLAST programs. (10m)
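
One way to keep the five programs apart is as a lookup table of query type, database type, and the level at which the comparison is made. The Python sketch below lists the five standard programs (blastn, blastp, blastx, tblastn, tblastx); the class and function names are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class BlastProgram(NamedTuple):
    name: str
    query: str        # type of the query sequence
    database: str     # type of the database sequences
    compared_as: str  # level at which query and database are compared


PROGRAMS = [
    BlastProgram("blastn", "nucleotide", "nucleotide", "nucleotide"),
    BlastProgram("blastp", "protein", "protein", "protein"),
    # blastx: nucleotide query translated in six frames, protein database
    BlastProgram("blastx", "nucleotide", "protein", "protein"),
    # tblastn: protein query, nucleotide database translated in six frames
    BlastProgram("tblastn", "protein", "nucleotide", "protein"),
    # tblastx: both query and database translated in six frames
    BlastProgram("tblastx", "nucleotide", "nucleotide", "protein"),
]


def choose_program(query, database, compared_as):
    """Return the name of the program matching the given combination."""
    for program in PROGRAMS:
        if (program.query, program.database, program.compared_as) == (
            query, database, compared_as
        ):
            return program.name
    raise ValueError("no BLAST program for this combination")


# A DNA query searched against a protein database:
print(choose_program("nucleotide", "protein", "protein"))
```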

4. Briefly explain the steps needed to perform a BLAST search. (4m)

1. Specify the sequence of interest (query)
2. Select the BLAST program
3. Choose the database to search
4. Choose optional parameters

Then click “BLAST” to run the search.

5. Above is an alignment from a BLAST result. Define positives and identities. (2m)
Positives: % similarity (aligned positions that are either identical or similar)
Identities: % identical matches

6. Write the formulas to calculate the percentages of positives and identities. (2m)

Positives (%) = (identical matches + similar matches) / (total length of the aligned region) x 100

Identities (%) = (identical matches) / (total length of the aligned region) x 100
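
The two formulas can be checked with a short Python sketch. It assumes the middle line of a BLAST protein alignment, where a letter marks an identical pair, "+" marks a similar pair, and a space marks anything else. The midline string below is made up for illustration.

```python
def identities_and_positives(midline):
    """Return (identities %, positives %) computed from an alignment midline."""
    length = len(midline)  # total length of the aligned region
    if length == 0:
        raise ValueError("empty alignment")
    identical = sum(1 for symbol in midline if symbol.isalpha())
    similar = midline.count("+")
    identities = 100 * identical / length
    positives = 100 * (identical + similar) / length
    return identities, positives


# Hypothetical midline: 3 identical pairs, 1 similar pair, 1 mismatch.
print(identities_and_positives("MK+ L"))
```

Positives can never be lower than identities, because every identical match is also counted as a positive.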

7. Give two differences between raw score and bit score. (4m)

Raw score: calculated from the substitution matrix and the gap penalty parameters that are chosen.
Raw scores are not comparable between different searches.
Bit score: calculated from the raw score by normalizing with the statistical variables of the scoring
system. Bit scores are comparable between different searches.
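
The normalization can be written out using the standard Karlin-Altschul relations: bit score S' = (lambda x S - ln K) / ln 2, and expected number of chance hits E = m x n x 2^(-S'), where m and n are the query and database lengths. The Python sketch below is illustrative only; the lambda and K values are assumed example parameters, not values taken from this tutorial.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw score to a bit score using the statistical parameters lam and k."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits, query_length, database_length):
    """Expected number of chance alignments scoring at least `bits`."""
    return query_length * database_length * 2.0 ** (-bits)


# Assumed example parameters for one particular scoring system.
LAMBDA, K = 0.267, 0.041

bits = bit_score(100, LAMBDA, K)
print(round(bits, 1), e_value(bits, 250, 5_000_000))
```

Because lambda and K depend on the substitution matrix and gap penalties, the same raw score maps to different bit scores under different scoring systems, which is why only bit scores are comparable between searches.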

8. What is a low-complexity region? (2m)

A region of sequence with an unusually biased amino acid or nucleotide
composition, which can complicate sequence similarity searching.

9. Above is a short protein sequence from chicken. Is a low-complexity region present in this
sequence? If yes, write down the sequence of the low-complexity region. (2m)
Yes, SSSSSSSSSSSSSSSSSS

10. Why do low-complexity regions need to be filtered out? (2m)
Because a low-complexity region behaves as if it were "sticky": it pulls out many sequences that
are not truly related.

11. Name a program that masks low-complexity regions in nucleotide sequences. (1m)

DustMasker

12. Name a program that masks low-complexity regions in protein sequences. (1m)
SEG
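
The idea of biased composition can be made concrete with a sliding-window Shannon entropy, sketched below in Python. This is not the SEG or DUST algorithm, only a simplified illustration; the window size, the threshold, and the example sequence are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    total = len(window)
    # The trailing + 0.0 turns a possible -0.0 into 0.0.
    return -sum((n / total) * math.log2(n / total) for n in counts.values()) + 0.0


def low_complexity_windows(sequence, window=12, threshold=1.5):
    """Return start positions of windows whose entropy falls below the threshold."""
    hits = []
    for start in range(len(sequence) - window + 1):
        if shannon_entropy(sequence[start:start + window]) < threshold:
            hits.append(start)
    return hits


# Made-up sequence containing a serine run like the one in question 9.
protein = "MKT" + "S" * 18 + "LAV"
print(low_complexity_windows(protein))
```

A window made of a single repeated residue has entropy 0, while a window with many different residues in equal amounts has high entropy, so low values flag candidate regions for masking.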

13. Given a word size (k) of 3, how does BLAST read this sequence? (3m)
MKKKSLALVLATGMA
As overlapping three-letter words: MKK, KKK, KKS, KSL, SLA, LAL, ALV, LVL, VLA, LAT, ATG, TGM, GMA.
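
The word list above can be reproduced with a short Python sketch; the function name words is chosen for illustration. A sequence of length L yields L - k + 1 overlapping words.

```python
def words(sequence, k=3):
    """Return all overlapping words of length k, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


query = "MKKKSLALVLATGMA"
print(", ".join(words(query)))
print(len(words(query)))  # 15 - 3 + 1 words
```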

14. A BLAST search has been done to predict the function of a human query protein. The
alignments of the best hits are given above.
a. Which hit is statistically more significant? Explain. (3m)
Hit 2, because it has a lower E-value (5e-11), a higher score (167), a higher
percentage of identities (76%), and a higher percentage of positives (84%).
b. Which of the two hits do you think is more likely to be a true homolog? (3m)
Hit 1, because its alignment lies outside the low-complexity region, whereas the alignment of Hit 2
lies within the low-complexity region. The E-value of Hit 1 is below 0.01, which is low enough to be
considered significant, so homology can be inferred.

15. We have determined the genome sequence of a bacterium. How can we use BLAST to
identify protein-coding genes in this genome if we only have access to protein sequence
databases? (1m)
Use BLASTX, which translates the nucleotide query in all six reading frames and compares the
translations with a protein database.

Tutorial 5. Total marks: 25

1. A BLAST search is done to predict the function of a short protein segment from chicken. The top 10 hits
are given above.
a) Can you predict the function of this protein based on this output? Justify your answer.
(2m)
No, because all the hits have high E-values, so none of them is statistically significant.
b) What can you do to improve this search? (2m)
Turn on the low-complexity filter, use a longer query sequence, and use PSI-BLAST.

2. A BLASTP search has not returned any hits at all. Would it be useful to do a PSI-BLAST using
the same settings as the original BLASTP? (2m)
No. PSI-BLAST builds its position-specific scoring matrix (PSSM) from the hits of the first round;
with no hits, there is nothing to build the PSSM from.

3. What is a multiple sequence alignment? (2m)

A multiple sequence alignment is an alignment of more than two sequences.

4. List all the programs that can be used to conduct multiple sequence alignment (4m)
Clustal Omega
T-Coffee
DIALIGN
MUSCLE

5. If you have 4 different sequences and want to align them using a pairwise alignment
approach, how many pairwise alignments are needed to find the similarity between the 4
sequences? Show your calculation. (2m)
Number of pairs = n(n-1)/2 = (4)(4-1)/2
= 6
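
The count can be confirmed by listing the pairs explicitly. The Python sketch below uses the standard library's itertools.combinations; the sequence names are placeholders.

```python
from itertools import combinations


def pairwise_comparisons(names):
    """Return every unordered pair of sequences that must be aligned."""
    return list(combinations(names, 2))


sequences = ["seq1", "seq2", "seq3", "seq4"]  # placeholder names
pairs = pairwise_comparisons(sequences)
for first, second in pairs:
    print(first, "vs", second)
print(len(pairs))  # n(n-1)/2 with n = 4
```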

6. Why does the Feng-Doolittle method follow the rule "once a gap, always a gap" when making a multiple
sequence alignment? (2m)
Keeping the initial gap choices means trusting that those gaps are the most believable. The rule assures that
gaps occurring between the most closely related sequences are preserved in the multiple sequence
alignment.

7. The alignment above is part of a multiple alignment of six protein sequences (human, chimpanzee,
mouse, rat, shark and chicken). Amino acids are shaded according to their conservation and
physicochemical properties.
a. List all conserved positions. (3m)
2, 11, 16, 17, 18, 23, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 44, 50
b. If the first sequence is from human and the third is from rat, which one is chimpanzee and which
one is mouse? (2m)
The second is chimpanzee and the fourth is mouse.
c. Is the shark sequence (5th sequence) closer to human than the chicken sequence is to human? Use
identity (mismatch = 0, match = 1) to calculate the distance. (4m)
Yes, the shark sequence has the higher identity score.
Chicken: Mismatch: 19 x 0 = 0
Match: 23 x 1 = 23
Score = 23
Shark: Mismatch: 26 x 0 = 0
Match: 24 x 1 = 24
Score = 24

Tutorial 6

Phylogenetics (Total: 18)

1. What is a phylogenetic tree? (2)

A diagram that illustrates the evolutionary relationships among species, genes, or proteins.

2. What is a phylogenetic tree made of? Draw a simple phylogenetic tree showing its parts. (4)
A phylogenetic tree is made of branches, nodes, terminals (leaves), and a root.

3. Define outgroup. (2)

A lineage that is known to be more distantly related to the species (or DNA/proteins) being
studied than they are to each other.

4. Use the tree above to answer the following questions.

a. A common ancestor of both species C and E could be at position number 4. (1)
b. The two currently living species that are most closely related to each other are C and D. (2)
c. Which of these living species is considered an outgroup? (3m)
E and A

5. Below are examples of two types of phylogenetic tree. Label A and B, and give a reason for your
answer. (4)

[Figure: two trees, labelled A and B]

A is a cladogram and B is a phylogram. A cladogram shows only the branching order; its branch lengths are
meaningless. A phylogram also shows branch lengths, which may indicate genetic distance.
