Beginning Perl For Bioinformatics
Stuart Brown
NYU School of Medicine
Sources
• Beginning Perl for Bioinformatics
– James Tisdall, O’Reilly Press, 2000
• Using Perl to Facilitate Biological Analysis
in Bioinformatics: A Practical Guide (2nd Ed.)
– Lincoln Stein, Wiley-Interscience, 2001
• Introduction to Programming and Perl
– Alan M. Durham, Computer Science Dept., Univ. of São Paulo, Brazil
Why Write Programs?
• Automate computer work that you do by hand
save time & reduce errors
• Run the same analysis on lots of similar data files
= scale-up
• Analyze data, make decisions
– sort BLAST results by e-value and/or species of best match
• Build a pipeline
• Create new analysis methods
Why Perl?
• Fairly easy to learn the basics
• Many powerful functions for working with
text: search & extract, modify, combine
• Can control other programs
• Free and available for all operating systems
• Most popular language in bioinformatics
• Many prebuilt “modules” are available that
do useful things
Get Perl
• You can install Perl on any type of
computer
• Your account on mcrcr0 already has Perl
• Just log in: you don’t even need to type
any command to make Perl active.
• Download and install Perl on your own
computer:
www.perl.org
Programming Concepts
• Program = a text file that contains
instructions for the computer to follow
• Programming Language = a set of
commands that the computer understands
(via a “command interpreter”)
• Input = data that is given to the program
• Output = something that is produced by the
program
Programming
• Write the program (with a text editor)
• Run the program
• Look at the output
• Correct the errors (debugging)
• Repeat
(Computers are VERY dumb: they do exactly
what you tell them to do, so be careful what
you ask for…)
Strings
• Text is handled in Perl as a string
• This basically means that you have to put
quotes around any piece of text that is not
an actual Perl instruction.
• Perl has two kinds of quotes: single ' '
and double " "
(they behave differently; more about this later)
Print
• Perl uses the term “print” to create output
• Without a print statement, you won’t
know what your program has done
• You need to tell Perl to put a line break
at the end of a printed line
– Use the "\n" (newline) escape sequence
• Include the quotes
– The "\" character is called an escape character; Perl
uses it a lot
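A minimal sketch of how the newline escape behaves inside the two kinds of quotes. The sample text is made up for illustration, and `use strict` and `use warnings` are optional safety switches that these slides do not cover.

```perl
use strict;
use warnings;

# Double quotes: \n is turned into a real line break
print "Hello in double quotes\n";

# Single quotes: \n is NOT translated, so a backslash and an n are printed
print 'Hello in single quotes\n';

# Add a real line break so the shell prompt starts on a fresh line
print "\n";
```

Running this prints the first line followed by a line break, then the second line with the two characters `\n` visible at its end.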
Your First Perl Program
• Log in to mcrcr0
• Open a new text file
>emacs my_perl1.pl
• Type:
#!/usr/bin/perl
# my first Perl program
print "Hello world \n";
Awesome, isn’t it!
Program details
• Perl programs normally start with the line:
#!/usr/bin/perl
– this tells the computer that this is a Perl program and
where to get the Perl interpreter
• All other lines that start with # are considered
comments, and are ignored by Perl
• Lines that are Perl commands end with a ;
Run your Perl program
• >chmod u+x *.pl
[#make the file executable]
• >perl my_perl1.pl
[#use the perl interpreter to run your script]
Numbers and Functions
• Perl handles numbers in most common formats:
456
5.6743
6.3E26
• Mathematical functions work pretty much as you
would expect:
4+7
6*4
43-27
256/12
2/(3-5)
Do the Math
(your 2nd Perl program)
#!/usr/bin/perl
print "4+5\n";
print 4+5 , "\n";
print "4+5=" , 4+5 , "\n";
[Note: use commas to separate multiple items
in a print statement, whitespace is ignored]
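A short sketch, with made-up numbers, of the two points behind the program above: anything inside quotes is printed as text rather than calculated, and parentheses change the order in which a calculation is done. `use strict` and `use warnings` are optional additions not covered in these slides.

```perl
use strict;
use warnings;

# Inside quotes the expression is just text; nothing is calculated
print "10/(2+3)\n";

# Outside quotes Perl calculates; parentheses are evaluated first
print 10 / (2 + 3), "\n";    # prints 2

# Without parentheses, division happens before addition
print 10 / 2 + 3, "\n";      # prints 8

# Text and a calculated value combined, separated by commas
print "6*4=", 6 * 4, "\n";   # prints 6*4=24
```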
Variables
• To be useful at all, a program needs to be able to
store information from one line to the next
• Perl stores information in variables
• A variable name starts with the “$” symbol, and it
can store strings or numbers
– Variables are case sensitive
– Give them sensible names
• Use the "=" sign to assign values to variables
$one_hundred = 100;
$my_sequence = "ttattagcc";
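A small sketch of the points above: storing a number and a string, and case sensitivity. The names `$count` and `$Count` are invented for illustration. The keyword `my` and the `use strict` / `use warnings` lines are additions not covered in these slides; `my` declares a variable the first time it is used.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A variable can hold a number or a string
my $one_hundred = 100;
my $my_sequence = "ttattagcc";

# Names are case sensitive: these are two separate variables
my $count = 1;
my $Count = 2;

print "Number: ", $one_hundred, "\n";
print "Sequence: ", $my_sequence, "\n";
print "count = ", $count, ", Count = ", $Count, "\n";
```

The last line prints `count = 1, Count = 2`, showing that changing the case of one letter gives a different variable.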
You can do Math with Variables
#!/usr/bin/perl
#put some values in variables
$sequences_analyzed = 200 ;
$new_sequences = 21 ;
#now we will do the work
$percent_new_sequences = ( $new_sequences /
$sequences_analyzed ) * 100 ;
print "% of new sequences = " , $percent_new_sequences;
% of new sequences = 10.5
String Operations
• Strings (text) in variables can be used for some
math-like operations
• Concatenate (join): use the dot . operator
$seq1 = "ACTG";
$seq2 = "GGCTA";
$seq3 = $seq1 . $seq2;
print $seq3;
ACTGGGCTA
• String comparison (are they the same, > or <)
• eq (equal)
• ne (not equal)
• ge (greater or equal)
• gt (greater than)
• lt (less than)
• le (less or equal)
Uses some non-intuitive ways of comparing letters (ASCII values)
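A hedged sketch of what comparing by ASCII values means in practice. The example strings are made up, and the `? :` operator (used here only to turn a true/false result into the text "yes" or "no"), along with `my`, `use strict` and `use warnings`, is not covered in these slides.

```perl
use strict;
use warnings;

# Equality and inequality work as you would expect
my $same      = ("ACTG" eq "ACTG")  ? "yes" : "no";   # yes
my $different = ("ACTG" ne "GGCTA") ? "yes" : "no";   # yes

# Strings are compared one character at a time, left to right
my $a_after_g = ("ACTG" gt "GGCTA") ? "yes" : "no";   # no: "A" comes before "G"

# Non-intuitive: every uppercase letter has a lower ASCII value than any lowercase letter
my $upper_first = ("Zebra" lt "apple") ? "yes" : "no";   # yes

# Non-intuitive: digits in strings are compared as characters, not as numbers
my $ten_before_nine = ("10" lt "9") ? "yes" : "no";      # yes: "1" comes before "9"

print "ACTG eq ACTG?   ", $same, "\n";
print "ACTG ne GGCTA?  ", $different, "\n";
print "ACTG gt GGCTA?  ", $a_after_g, "\n";
print "Zebra lt apple? ", $upper_first, "\n";
print "10 lt 9?        ", $ten_before_nine, "\n";
```

To compare numbers as numbers, Perl has a separate set of operators (`==`, `!=`, `<`, `>`, `<=`, `>=`).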