Perl
What is Perl?
• Practical Extraction and Report Language
• Scripting language created by Larry Wall, first released in 1987
• Functionality and speed somewhere between low-level
languages (like C) and high-level ones (like “shell”)
• Influenced by awk, sed, and the C shell
• Easy to write (after you learn it), but sometimes hard to
read
• Widely used in CGI scripting
A Simple Perl Script
hello (the -w flag turns on warnings):
#!/usr/bin/perl -w
print "Hello, world!\n";
sub dec_by_one {
my @ret = @_; # make a copy
for my $n (@ret) { $n-- }
return @ret;
}
sub dec_by_1 {
for (@_) { $_-- } # @_ aliases the arguments, so the caller's variables change
}
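The difference between the two subroutines above is easiest to see side by side. A minimal sketch; the variables @nums and @copy are illustrative and not part of the original slides:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns decremented copies; the caller's values are untouched.
sub dec_by_one {
    my @ret = @_;              # make a copy
    for my $n (@ret) { $n-- }
    return @ret;
}

# Decrements the caller's variables directly, because the
# elements of @_ are aliases for the actual arguments.
sub dec_by_1 {
    for (@_) { $_-- }
    return;
}

my @nums = (3, 5, 7);
my @copy = dec_by_one(@nums);  # @nums is still (3, 5, 7)
print "@copy | @nums\n";       # prints "2 4 6 | 3 5 7"

dec_by_1(@nums);               # @nums itself is modified
print "@nums\n";               # prints "2 4 6"
```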
Reading from STDIN
• STDIN is the builtin filehandle to the standard input
• Use the line input operator around a file handle to read
from it
$line = <STDIN>; # read next line
chomp($line);
• chomp removes the trailing string that matches the
value of $/, usually the newline character
Reading from STDIN example
while (<STDIN>) {
chomp;
print "Line $. ==> $_\n";
}
# sum of squares of 1 to 5
for ($i = 1; $i <= 5; $i++) {
$sum += $i*$i;
}
next
• next skips the rest of the current
iteration (like continue in C)
# only print non-blank lines
while (<>) {
if ( $_ eq "\n") { next; }
else { print; }
}
last
• last exits the loop immediately (like break
in C)
# print up to first blank line
while (<>) {
if ( $_ eq "\n") { last; }
else { print; }
}
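next and last are often combined in one loop, usually written with statement modifiers. A sketch under assumptions: the input text and the END marker are invented for illustration, and an in-memory filehandle stands in for STDIN so the example is self-contained:

```perl
use strict;
use warnings;

my $text = "# header\nalpha\n\nbeta\nEND\ngamma\n";
open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";

my @kept;
while (my $line = <$fh>) {
    chomp $line;
    next if $line eq '';        # skip blank lines
    next if $line =~ /^#/;      # skip comment lines
    last if $line eq 'END';     # stop reading at the marker
    push @kept, $line;
}
close $fh;

print join(',', @kept), "\n";   # prints "alpha,beta"
```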
Logical AND/OR
• Logical AND : &&
if (($x > 0) && ($x < 10)) { … }
• Logical OR : ||
if (($x < 0) || ($x > 0)) { … }
• Both are short-circuit operators - the
second expression is only evaluated if
necessary
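Short-circuit evaluation can be observed by counting how often the right-hand operand runs. A small sketch; expensive() and %opt are hypothetical names used only for this illustration:

```perl
use strict;
use warnings;

my $calls = 0;
sub expensive { $calls++; return 1 }   # hypothetical helper that counts its calls

my $x = 0;
if ($x > 0 && expensive()) { print "not reached\n" }   # left side false: expensive() skipped
if ($x == 0 || expensive()) { print "reached\n" }      # left side true: expensive() skipped

print "calls = $calls\n";    # prints "calls = 0"

# || returns the first true operand, which gives a common default idiom.
my %opt  = ();
my $name = $opt{name} || 'anonymous';
print "$name\n";             # prints "anonymous"
```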
Regular Expressions
• Use EREs (egrep style)
• Plus the following character classes
– \w “word” character: [A-Za-z0-9_]
– \d digits: [0-9]
– \s whitespace: [\f\t\n\r ]
– \b word boundary
– \W, \D, \S, \B are complements of the corresponding
classes above
• Can use \t to denote a tab
Backreferences
• Support backreferences
• Subexpressions are referred to using \1,
\2, etc. in the RE and $1, $2, etc. outside
the RE
if (/^this (red|blue|green) (bat|ball) is \1/)
{
($color, $object) = ($1, $2);
}
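A runnable version of the backreference example above, matching against an explicit string instead of $_. The test strings are made up for illustration:

```perl
use strict;
use warnings;

my $s = "this red ball is red";
my ($color, $object);
if ($s =~ /^this (red|blue|green) (bat|ball) is \1/) {
    ($color, $object) = ($1, $2);
}
print "$color $object\n";     # prints "red ball"

# \1 must repeat the text captured by the first group,
# so a different color at the end fails to match.
my $other = "this red ball is blue" =~ /^this (red|blue|green) (bat|ball) is \1/;
print $other ? "match\n" : "no match\n";   # prints "no match"
```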
Matching
• Pattern match operator: /RE/ is a shortcut of m/RE/
– Returns true if there is a match
– Match against $_ by default
– Can also use m(RE), m<RE>, m!RE!, etc.
if (/^\/usr\/local\//) { … }
if (m%^/usr/local/%) { … }
• Case-insensitive match
if (/new york/i) { … };
Matching cont.
• To match an RE against something other than $_,
use the binding operator =~
if ($s =~ /\bblah/i) {
print "Found blah!\n";
}
• !~ negates the match
while (<STDIN> !~ /^#/) { … }
• Variables are interpolated inside REs
if (/^$word/) { … }
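Because variables are interpolated into the pattern, any metacharacters they contain are treated as regex syntax. Wrapping the variable in \Q...\E quotes them. A short sketch with made-up sample strings:

```perl
use strict;
use warnings;

my $word = 'a.c';
my @hits_raw    = grep { /^$word/ }     ('a.c', 'abc', 'xyz');
my @hits_quoted = grep { /^\Q$word\E/ } ('a.c', 'abc', 'xyz');

print "@hits_raw\n";      # prints "a.c abc"  ('.' matches any character)
print "@hits_quoted\n";   # prints "a.c"      ('.' is matched literally)
```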
Match Variables
• Special match variables
– $& : the section matched
– $` : the part before the matched section
– $' : the part after the matched section
$string = "What the heck!";
$string =~ /\bt.*e/;
print "($`) ($&) ($')\n";
# prints: (What ) (the he) (ck!)
Substitutions
• Sed-like search and replace with s///
s/red/blue/;
$x =~ s/\w+$/$`/;
– Unlike m///, s/// modifies the variable
• Global replacement with /g
s/(.)\1/$1/g;
• Transliteration operator: tr/// or y///
tr/A-Z/a-z/;
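The operators above modify their target in place; copying first keeps the original intact. A minimal sketch with sample strings chosen for illustration:

```perl
use strict;
use warnings;

my $s = "bookkeeper";
(my $squeezed = $s) =~ s/(.)\1/$1/g;    # copy $s, then collapse each doubled character
print "$squeezed\n";                     # prints "bokeper"

my $count = ($s =~ s/e/E/g);             # s/// returns the number of replacements
print "$count $s\n";                     # prints "3 bookkEEpEr"

(my $lower = "Hello World") =~ tr/A-Z/a-z/;
print "$lower\n";                        # prints "hello world"
```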
RE Functions
• split string using RE (whitespace by default)
@fields = split /:/, "::ab:cde:f";
# gets ("","","ab","cde","f")
• join strings into one
$str = join "-", @fields; # gets "--ab-cde-f"
• grep something from a list
– Similar to UNIX grep, but not limited to using regular expressions
@selected = grep(!/^#/, @code);
– grep returns aliases: modifying elements of the returned list directly
(before it is copied into an array) modifies the elements in the original list
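The aliasing behavior of grep only shows up when its result is used directly; assigning it to an array makes copies. A sketch with an invented @code array:

```perl
use strict;
use warnings;

my @code = ("# comment", "x = 1", "y = 2");

# Assigning grep's result to an array copies the selected elements.
my @selected = grep { !/^#/ } @code;
$_ .= ";" for @selected;
print "$code[1]\n";        # prints "x = 1" (original unchanged)

# Looping over grep's result directly works on aliases,
# so the original array is modified.
$_ .= ";" for grep { !/^#/ } @code;
print "$code[1]\n";        # prints "x = 1;"
```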
Running Another Program
• Use the system function to run an external program
• With one argument, the shell is used to run the command
– Convenient when redirection is needed
$status = system("cmd1 args > file");
• To avoid the shell, pass system a list
$status = system($prog, @args);
die "$prog exited abnormally: $?" unless
$status == 0;
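The value in $? packs the exit code and any terminating signal together, so it usually needs decoding. A sketch that uses the running perl itself ($^X) as a stand-in external program; that choice is an assumption made to keep the example portable:

```perl
use strict;
use warnings;

# $^X is the path of the running perl, used here as a stand-in program.
my @cmd = ($^X, '-e', 'exit 3');

my $status = system(@cmd);        # list form: no shell involved
if ($status == -1) {
    die "could not run $cmd[0]: $!";
}
my $exit_code = $? >> 8;          # high byte holds the exit code
my $signal    = $? & 127;         # low 7 bits hold a terminating signal, if any
print "exit code $exit_code, signal $signal\n";   # prints "exit code 3, signal 0"
```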
Capturing Output
• If output from another program needs to be
collected, use backticks
my $files = `ls *.c`;
• Collect all output lines into a single string
my @files = `ls *.c`;
• Each element is an output line
• The shell is invoked to run the command
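The scalar and list forms differ only in context. A sketch that assumes a Unix-like shell and uses echo in place of ls so the output is predictable:

```perl
use strict;
use warnings;

# Assumes a Unix-like shell; echo stands in for ls.
my $all   = `echo a.c; echo b.c`;   # scalar context: all output in one string
my @lines = `echo a.c; echo b.c`;   # list context: one element per output line

chomp @lines;                       # each element ends in "\n" until chomped
print scalar(@lines), "\n";         # prints "2"
print "$lines[1]\n";                # prints "b.c"
```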
Environment Variables
• Environment variables are stored in the
special hash %ENV
$ENV{'PATH'} =
"/usr/local/bin:$ENV{'PATH'}";
Example: Union and Intersection I
@a = (1, 3, 5, 6, 7);
@b = (2, 4, 5, 9);
@union = @isect = ();
%union = %isect = ();
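The slide stops after the initialization. One common way to finish the computation uses the hashes as sets; this is a sketch rather than necessarily the approach the slides go on to show, and it assumes neither list contains duplicates:

```perl
use strict;
use warnings;

my @a = (1, 3, 5, 6, 7);
my @b = (2, 4, 5, 9);
my (%union, %isect);

$union{$_} = 1 for @a;                 # everything in @a is in the union
for my $e (@b) {
    $isect{$e} = 1 if $union{$e};      # already seen in @a, so it is in both lists
    $union{$e} = 1;
}

my @union = sort { $a <=> $b } keys %union;
my @isect = sort { $a <=> $b } keys %isect;
print "@union\n";    # prints "1 2 3 4 5 6 7 9"
print "@isect\n";    # prints "5"
```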
$size_of_form_info = $ENV{'CONTENT_LENGTH'};
read(STDIN, $form_info, $size_of_form_info);
use strict;
use CGI qw(:standard);
my $bday = param("birthday");