Tutorial
Hello World!
Here is the basic perl program that we'll use to get started.
#! /usr/local/bin/perl
#
# prints a greeting.
#
print 'Hello world.'; # Print a message
Comments
A common Perl pitfall is to write cryptic code, so comments are worth using. Perl does
provide for comments, though not very flexibly: it treats anything from a hash # to the
end of the line as a comment. There is no separate block comment syntax, so if you want
a block of comments, you must ensure that each line starts with #.
Statements
Everything other than a comment is a Perl statement, which must end with a semicolon,
like the last line above. A long statement can simply continue over several lines; no
continuation character such as \ is needed, because a statement only ends at its semicolon.
After you've entered and saved the program, make sure the file is executable, for
example by using the command
chmod u+x progname
where progname is the name of the file.
2.3 Scalars
Perl supports three basic types of variables: scalars, lists and hashes. We will explore
each of these a little more.
The most basic kind of variable in Perl is the scalar variable. Scalar variables hold both
strings and numbers, and are remarkable in that strings and numbers are completely
interchangeable. For example, the statement
$age = 27;
sets the scalar variable $age to 27, but you can also assign a string to exactly the same
variable:
$age = 'Twenty Seven';
Perl also accepts numbers as strings, like this:
$priority = '9';
$default = '0009';
and can still cope with arithmetic and other operations quite happily. However, please
note that the following code is a bit too much to ask for!
$age = 'Twenty Seven';
$age = $age + 10;
For the curious, the above code will set $age to 10. Think why.
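A minimal sketch of why the result is 10: in a numeric context Perl uses whatever leading number a string has, and a string with no leading digits counts as 0. The variable names are only illustrative, and warnings are deliberately left off because adding a non-numeric string would otherwise produce a warning.

```perl
use strict;

my $default = '0009';          # a string that looks like a number
my $sum     = $default + 1;    # numeric context: '0009' is treated as 9

my $age = 'Twenty Seven';      # no leading digits at all
$age = $age + 10;              # the string counts as 0, so this is 0 + 10

print "$sum $age\n";
```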
In general, variable names consist of numbers, letters and underscores, but they should
not start with a number, and the variable $_ is special, as we'll see later. Also, Perl is
case sensitive, so $a and $A are different.
Other operators can be found on the perlop manual page. Type man perlop at the
prompt.
Interpolation
The following code prints apples and pears using concatenation:
$a = 'apples';
$b = 'pears';
print $a.' and '.$b;
It would be nicer to include only one string in the final print statement, but the line
print '$a and $b';
prints literally $a and $b which isn't very helpful. Instead we can use the double quotes in
place of the single quotes:
print "$a and $b";
The double quotes force interpolation of any codes, including interpreting variables. This
is much nicer than our original statement. Other codes that are interpolated include
special characters such as newline and tab. The code \n is a newline and \t is a tab.
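As a small illustration of the difference between the two kinds of quotes, the sketch below builds the same text both ways. The variable names are assumptions chosen for the example.

```perl
use strict;
use warnings;

my $fruit = 'apples';
my $other = 'pears';

my $single = '$fruit and $other\n';   # single quotes: nothing is interpolated
my $double = "$fruit and $other\n";   # double quotes: variables and \n are expanded

print $single, "\n";
print $double;
```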
Exercise
This exercise is to rewrite the Hello world program so that (a) the string is assigned to a
variable and (b) this variable is then printed with a newline character. Use the double
quotes and don't use the concatenation operator.
The array is accessed by using indices starting from 0, and square brackets are used to
specify the index. The expression
$food[2]
returns eels. Notice that the @ has changed to a $ because eels is a scalar.
Array assignments
As in all of Perl, the same expression in a different context can produce a different result.
The first assignment below explodes the @music variable so that it is equivalent to the
second assignment.
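A sketch of the kind of assignments being described. Only eels at index 2 is confirmed by the text; the other contents of @food, and everything in @music, are assumed for illustration.

```perl
use strict;
use warnings;

# Assumed contents: only "eels" at index 2 comes from the text.
my @food  = ("apples", "pears", "eels");
my @music = ("whistle", "flute");

print "$food[2]\n";    # a single element is a scalar, hence the $

# An array inside a list is flattened ("exploded") into its elements,
# so these two assignments build the same four element array.
my @moremusic = ("organ", @music, "harp");
my @samemusic = ("organ", "whistle", "flute", "harp");

print "@moremusic\n";
```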
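To remove the last item from a list and return it, use the pop function. From our original
list the pop function returns eels and @food now has two elements.
It is also possible to assign an array to a scalar variable, and as usual context is
important. The line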
$f = @food;
assigns the length of @food, but
$f = "@food";
turns the list into a string with a space between each element. This space can be replaced
by any other string by changing the value of the special $" variable. This variable is just
one of Perl's many special variables, most of which have odd names.
When you get overloaded with oddity, use the English module, which lets you name these
variables in a more user-friendly way (at least for English-speaking people).
Finally, you may want to find the index of the last element of a list. To do this for the
@food array use the expression
$#food
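The operations from this section can be tried together as follows. This is only a sketch with assumed array contents; the separator is changed inside a block with local so that the change does not leak into the rest of the program.

```perl
use strict;
use warnings;

my @food = ("apples", "pears", "eels");   # assumed example contents

my $grub  = pop(@food);       # removes and returns the last element
my $count = @food;            # scalar context: number of elements left
my $last  = $#food;           # index of the last element

my $spaced = "@food";         # elements joined with a space
my $dashed;
{
    local $" = "-";           # change the list separator only inside this block
    $dashed = "@food";
}

print "$grub $count $last $spaced $dashed\n";
```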
Displaying arrays
Since context is important, it shouldn't be too surprising that the following all produce
different results:
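The statements in question are not shown here, so the sketch below assumes they are the three usual ways of printing an array: as a list, inside double quotes, and in a scalar context. Each result is stored in a variable first so the differences are easy to compare.

```perl
use strict;
use warnings;

my @food = ("apples", "pears", "eels");   # assumed example contents

my $as_list   = join("", @food);   # what  print @food;     shows: elements run together
my $as_string = "@food";           # what  print "@food";   shows: space separated
my $as_scalar = @food . "";        # what  print @food."";  shows: the element count

print "$as_list\n$as_string\n$as_scalar\n";
```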
To define an associative array we use the usual parenthesis notation, but the array itself is
prefixed by a % sign. Suppose we want to create an array of people and their ages. It
would look like this:
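A minimal sketch of such a definition, using made-up names and ages. The list is read as key, value, key, value and so on.

```perl
use strict;
use warnings;

# Hypothetical names and ages, listed as key, value, key, value, ...
my %ages = ("Alice", 39, "Bob", 34, "Carol", 2);

print "$ages{'Bob'}\n";     # one value is a scalar, so use $ and curly braces
$ages{"Dave"} = 51;         # assigning to a new key adds a pair

my $people = keys %ages;    # number of pairs now stored
print "$people\n";
```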
An associative array can be converted back into a list array just by assigning it to a list
array variable. A list array can be converted into an associative array by assigning it to an
associative array variable. Ideally the list array will have an even number of elements:
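The round trip between the two kinds of array might look like this. The data is hypothetical, and the order of pairs in the flattened list is not guaranteed.

```perl
use strict;
use warnings;

my %ages = ("Alice", 39, "Bob", 34);   # hypothetical data

my @info = %ages;    # flat list of keys and values; pair order is not guaranteed
my %copy = @info;    # an even-length list turns back into an associative array

print scalar(@info), " items, Bob is $copy{'Bob'}\n";
```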
Operators
Associative arrays do not have any order to their elements (they are just like hash tables)
but it is possible to access all the elements in turn using the keys function and the values
function:
When keys and values are called in a scalar context they return the number of key/value
pairs in the associative array.
There is also a function each which returns a two element list of a key and its value.
Every time each is called it returns another key/value pair:
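The three functions can be sketched together as below, again with hypothetical data. Sorting the keys is an addition made here so that the output order is repeatable.

```perl
use strict;
use warnings;

my %ages = ("Alice", 39, "Bob", 34, "Carol", 2);   # hypothetical data

my @report;
foreach my $person (sort keys %ages) {     # sort gives a repeatable order
    push @report, "$person is $ages{$person}";
}

my $total = 0;
foreach my $age (values %ages) {           # values come back in hash order
    $total += $age;
}

my $pairs = keys %ages;                    # scalar context: number of pairs

while (my ($person, $age) = each %ages) {  # one key/value pair per call
    print "$person is $age\n";
}
print "$pairs people, combined age $total\n";
```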
Environment variables
When you run a perl program, or any script in UNIX, there will be certain environment
variables set. These will be things like USER which contains your username and
DISPLAY which specifies which screen your graphics will go to. When you run a perl
CGI script on the World Wide Web there are environment variables which hold other
useful information. All these variables and their values are stored in the associative
%ENV array in which the keys are the variable names. Try the following in a perl
program:
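One way to try this out is sketched below. USER is not set on every system, so the example falls back to a placeholder string, which is an assumption added here.

```perl
use strict;
use warnings;

# USER may be unset on some systems, so fall back to a placeholder.
my $user = defined $ENV{'USER'} ? $ENV{'USER'} : '(not set)';
print "You are called $user\n";

# List every environment variable and its value.
foreach my $name (sort keys %ENV) {
    print "$name=$ENV{$name}\n";
}
```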
foreach
To go through each line of an array or other list-like structure (such as lines in a file) Perl
uses the foreach structure. This has the form
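A minimal sketch of a foreach loop, with an assumed @food array and a hypothetical loop variable $morsel. The lines are collected first and printed at the end.

```perl
use strict;
use warnings;

my @food = ("apples", "pears", "eels");   # assumed example contents

my @lines;
foreach my $morsel (@food) {              # visit each element in turn
    push @lines, "$morsel\nYum yum\n";
}
print @lines;
```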
Testing
The next few structures rely on a test being true or false. In Perl any non-zero number and
non-empty string is counted as true. The number zero, zero by itself in a string, and the
empty string are counted as false. Here are some tests on numbers and strings.
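The comparison operators differ for numbers (== and !=) and strings (eq and ne). The sketch below, with illustrative values, also checks which of a few values count as false under the rule just given.

```perl
use strict;
use warnings;

my ($x, $y) = (3, 3.0);
my ($s, $t) = ("abc", "ABC");

my $num_equal  = ($x == $y) ? 1 : 0;   # numeric equality
my $str_equal  = ($s eq $t) ? 1 : 0;   # string equality is case sensitive
my $str_differ = ($s ne $t) ? 1 : 0;   # string inequality

# Only 0, "0" and "" are false here; "00" and "0.0" are non-empty strings
# other than "0", so they count as true.
my @false_values = grep { !$_ } (0, "0", "", "00", "0.0", 1);

print "$num_equal $str_equal $str_differ ", scalar(@false_values), "\n";
```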
for
Perl has a for structure that mimics that of C. It has the form
for (initialise; test; inc)
{
    actions;
}
First of all the statement initialise is executed. Then while test is true the block of actions
is executed. After each time the block is executed inc takes place. Here is an example for
loop to print out the numbers 0 to 9.
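Such a loop can be sketched as follows. The numbers are collected in an array, a choice made here so that they can be printed on one line.

```perl
use strict;
use warnings;

my @seen;
for (my $i = 0; $i < 10; ++$i) {   # initialise; test; inc
    push @seen, $i;
}
print "@seen\n";
```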
while and until
Here is a program that reads some input from the keyboard and won't continue until it is
given the correct password:
#!/usr/local/bin/perl
print "Password? "; # Ask for input
$a = <STDIN>; # Get input
chop $a; # Remove the newline at end
while ($a ne "fred") # While input is wrong...
{
    print "sorry. Again? "; # Ask again
    $a = <STDIN>; # Get input again
    chop $a; # Chop off newline again
}
The curly-braced block of code is executed while the input does not equal the password.
The while structure should be fairly clear, but this is the opportunity to notice several
things. First, we can read from the standard input (the keyboard) without opening the
file first. Second, when the password is entered $a is given that value including the
newline character at the end. The chop function removes the last character of a string,
which in this case is the newline.
To test the opposite thing we can use the until statement in just the same way. This
executes the block repeatedly until the expression is true, not while it is true.
Another useful technique is putting the while or until check at the end of the statement
block rather than at the beginning. This will require the presence of the do operator to
mark the beginning of the block and the test at the end. If we forgo the sorry. Again
message in the above password program then it could be written like this.
#!/usr/local/bin/perl
do
{
    print "Password? "; # Ask for input
    $a = <STDIN>; # Get input
    chop $a; # Chop off newline
}
while ($a ne "fred"); # Redo while wrong input
Exercise
Modify the program from the previous exercise so that each line of the file is read in one
by one and is output with a line number at the beginning. You should get something like:
1 root:oYpYXm/qRO6N2:0:0:Super-User:/:/bin/csh
2 sysadm:*:0:0:System V Administration:/usr/admin:/bin/sh
3 diag:*:0:996:Hardware Diagnostics:/usr/diags:/bin/csh
etc
You may find it useful to use the structure
while ($line = <INFO>)
{
...
}
When you have done this see if you can alter it so that line numbers are printed as 001,
002, ..., 009, 010, 011, 012, etc. To do this you should only need to change one line by
inserting an extra four characters. Perl's clever like that.
if-else
Of course Perl also allows if/then/else statements. These are of the following form:
if ($a)
{
print "The string is not empty\n";
}
else
{
print "The string is empty\n";
}
For this, remember that an empty string is considered to be false. It will also give an
"empty" result if $a is the string 0.
Exercise
From the previous exercise you should have a program which prints out the password file
with line numbers. Change it so that it works with a text file that contains some blank
lines. Now alter the program so that line numbers aren't printed or counted with blank
lines, but every line is still printed, including the blank ones. Remember that when a line
of the file is read in it will still include its newline character at the end.
File handling
Here is a basic perl program that opens a file, reads it in, prints it and closes it again,
much like the UNIX cat command.
#!/usr/local/bin/perl
#
# Program to open the password file, read it in,
# print it, and close it again.
The open function opens a file for input (i.e. for reading). The first parameter is the
filehandle which allows Perl to refer to the file in future. The second parameter is an
expression denoting the filename. If the filename was given in quotes then it is taken
literally without shell expansion. So the expression '~/notes/todolist' will not be
interpreted successfully. If you want to force shell expansion then use angled brackets:
that is, use <~/notes/todolist> instead.
There are a few useful points to add to this discussion on file-handling. First, the open
statement can also specify a file for output and for appending as well as for input. To do
this, prefix the filename with a > for output and a >> for appending:
Second, if you want to print something to a file you've already opened for output then
you can use the print statement with an extra parameter. To print a string to the file with
the INFO filehandle use
Third, you can use the following to open the standard input (usually the keyboard) and
standard output (usually the screen) respectively:
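The prefixes can be tried safely on a scratch file, as in the sketch below. The temporary file (created with the core File::Temp module) and the text written to it are assumptions made for the example. STDIN and STDOUT are filehandles that are already open when a program starts.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $file) = tempfile(UNLINK => 1);   # scratch file, so nothing real is overwritten
close($tmp);

open(INFO, ">$file") or die "Cannot write $file: $!";       # > opens for output
print INFO "first line\n";                                  # the filehandle comes before the text
close(INFO);

open(INFO, ">>$file") or die "Cannot append to $file: $!";  # >> opens for appending
print INFO "second line\n";
close(INFO);

open(INFO, $file) or die "Cannot read $file: $!";           # no prefix: open for input
my @lines = <INFO>;
close(INFO);

print STDOUT @lines;   # STDOUT is already open, so it needs no open call
```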
In the above program the information is read from a file. The file is the INFO file and to
read from it Perl uses angled brackets. So the statement
@lines = <INFO>;
reads the file denoted by the filehandle into the array @lines. Note that the <INFO>
expression reads in the file entirely in one go. This is because the reading takes place in
the context of an array variable. If @lines is replaced by the scalar $lines then only the
next one line would be read in. In either case each line is stored complete with its newline
character at the end.
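The difference between the two contexts can be seen in a short sketch. A scratch file stands in for the password file here, and its two lines are invented for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A scratch file stands in for the password file (hypothetical contents).
my ($out, $file) = tempfile(UNLINK => 1);
print $out "root:x:0:0\n", "guest:x:1:1\n";
close($out);

open(INFO, $file) or die "Cannot open $file: $!";
my $first = <INFO>;    # scalar context: just the next line
my @rest  = <INFO>;    # list context: all the remaining lines
close(INFO);

print $first, @rest;   # each line still ends with its newline
```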
Exercise
Modify the above program so that the entire file is printed with a # symbol at the
beginning of each line. You should only have to add one line and modify another. Use the
$" variable. Unexpected things can happen with files, so you may find it helpful to use
the -w option.
Extending pipes
You can very easily substitute reading from a pipe for reading from a file. The following
example shows reading the output of the ps command.
open(PS,"ps -aef|") or die "Cannot open ps \n";
while(<PS>){
    print;
}
close(PS);
Regular expressions
A regular expression is contained in slashes, and matching occurs with the =~ operator.
The following expression is true if the string the appears in variable $sentence.
$sentence =~ /the/
The RE is case sensitive, so if
$sentence = "The quick brown fox";
then the above match will be false. The operator !~ is used for spotting a non-match. In
the above example
$sentence !~ /the/
is true because the string the does not appear in $sentence.
A match like this is typically used in a conditional, such as
if ($sentence =~ /under/)
{
print "We're talking about rugby\n";
}
which would print out a message if we had either of the following
$sentence = "Up and under";
$sentence = "Best winkles in Sunderland";
But it's often much easier if we assign the sentence to the special variable $_ which is of
course a scalar. If we do this then we can avoid using the match and non-match operators
and the above can be written simply as
if (/under/)
{
print "We're talking about rugby\n";
}
The $_ variable is the default for many Perl operations and tends to be used very heavily.
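The match operators and the implicit use of $_ can be sketched together. The third test sentence is made up for the example; foreach sets $_ to each string in turn.

```perl
use strict;
use warnings;

my $sentence = "The quick brown fox";
my $has_the   = ($sentence =~ /the/) ? 1 : 0;   # the match is case sensitive
my $lacks_the = ($sentence !~ /the/) ? 1 : 0;   # !~ is true when there is no match

my @hits;
foreach ("Up and under", "Best winkles in Sunderland", "Over the top") {
    push @hits, $_ if /under/;                  # /under/ is tested against $_
}

print "$has_the $lacks_the ", scalar(@hits), "\n";
```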
More on REs
In an RE there are plenty of special characters, and it is these that both give them their
power and make them appear very complicated. It's best to build up your use of REs
slowly; their creation can be something of an art form.
There are even more options. Square brackets are used to match any one of the characters
inside them. Inside square brackets a - indicates "between" and a ^ at the beginning
means "not":
[qjk] # Either q or j or k
[^qjk] # Neither q nor j nor k
[a-z] # Anything from a to z inclusive
[^a-z] # No lower case letters
[a-zA-Z] # Any letter
[a-z]+ # Any non-zero sequence of lower case letters
At this point you can probably skip to the end and do at least most of the exercise. The
rest is mostly just for reference.
A vertical bar | represents an "or" and parentheses (...) can be used to group things
together:
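A few illustrative patterns using | and parentheses are sketched below, each paired with a test string. The strings and the anchors ^ and $ in the third pattern are choices made for this example.

```perl
use strict;
use warnings;

my @tests = (
    ["jelly",   qr/jelly|cream/],   # either jelly or cream
    ["legs",    qr/(eg|le)gs/],     # either eggs or legs
    ["dadada",  qr/^(da)+$/],       # da, dada, dadada and so on
    ["custard", qr/jelly|cream/],   # matches neither alternative
);

my @results = map { $_->[0] =~ $_->[1] ? 1 : 0 } @tests;
print "@results\n";
```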
Here are some more special characters and what they mean:
\n # A newline
\t # A tab
\w # Any alphanumeric (word) character.
# The same as [a-zA-Z0-9_]
\W # Any non-word character.
# The same as [^a-zA-Z0-9_]
\d # Any digit. The same as [0-9]
\D # Any non-digit. The same as [^0-9]
\s # Any whitespace character: space,
# tab, newline, etc
\S # Any non-whitespace character
\b # A word boundary, outside [] only
\B # No word boundary
\| # Vertical bar
\[ # An open square bracket
\) # A closing parenthesis
\* # An asterisk
\^ # A caret symbol
\/ # A slash
\\ # A backslash
and so on.
As was mentioned earlier, it's probably best to build up your use of regular expressions
slowly. Here are a few examples. Remember that to use them for matching they should be
put in /.../ slashes
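The following sketch tries a handful of small patterns against sample strings. Both the patterns and the strings are illustrative choices, not a list taken from this tutorial.

```perl
use strict;
use warnings;

my $div_by_zero = ("x = 1/0;" =~ /\/0/)     ? 1 : 0;   # an escaped slash followed by 0
my $has_number  = ("room 101" =~ /\d+/)     ? 1 : 0;   # one or more digits
my $blank_line  = ("   \t\n"  =~ /^\s*$/)   ? 1 : 0;   # nothing but whitespace
my $whole_word  = ("weather"  =~ /\bthe\b/) ? 1 : 0;   # "the" is not a separate word here

print "$div_by_zero $has_number $blank_line $whole_word\n";
```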
Exercise
Previously your program counted non-empty lines. Alter it so that instead of counting
non-empty lines it counts only lines with
* the letter x
* the string the
* the string the which may or may not have a capital t
* the word the with or without a capital. Use \b to detect word boundaries.
In each case the program should print out every line, but it should only number those
specified. Try to use the $_ variable to avoid using the =~ match operator explicitly.
Just like the sed and tr utilities in Unix, Perl has s/// and tr///. The former is for
substitution and the latter is for translation.
$program =~ s {
/\* # Match the opening delimiter.
.*? # Match a minimal number of characters.
\*/ # Match the closing delimiter.
} []gsx; # Delete (most) C comments.
$myname = "BABU";
$myname =~ tr/A-Z/a-z/; # yields babu (no square brackets are needed in tr)
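Two common uses of tr can be sketched as follows: translating a copy of a string, and counting characters, since tr returns the number of characters it translated. The second string is invented for the example.

```perl
use strict;
use warnings;

my $myname = "BABU";
(my $lower = $myname) =~ tr/A-Z/a-z/;   # copy first, then translate the copy

my $sentence = "a big bad wolf";
my $count = ($sentence =~ tr/b/b/);     # replaces b with b, so it only counts

print "$myname $lower $count\n";
```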
Splitting
Perl provides a split function to split strings, based on REs. The syntax is
split /PATTERN/,EXPR,LIMIT
split /PATTERN/,EXPR
split /PATTERN/
split
If EXPR is omitted, $_ is used. If PATTERN is also omitted, split splits on whitespace, after
skipping any leading whitespace. LIMIT sets the maximum number of fields returned, so it
can be used to split partially. Some examples are given below:
# process the password file
open(PASSWD, '/etc/passwd');
while (<PASSWD>) {
    ($login, $passwd, $uid, $gid,
     $gcos, $home, $shell) = split(/:/);
    # note that $shell still has a new line.
    # use chop or chomp to remove the newline
    #...
    ($login, $passwd, $remainder) = split(/:/, $_, 3);
    # here we use LIMIT to set the number of fields
}
We also have join which is the opposite of split. For fixed length strings, we have
unpack and pack functions.
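A short sketch showing that join reverses split, using one password-style record as sample data:

```perl
use strict;
use warnings;

my $record = "root:x:0:0:Super-User:/:/bin/csh\n";   # sample record
chomp($record);                                      # remove the newline first

my @fields  = split(/:/, $record);      # seven separate fields
my $rebuilt = join(":", @fields);       # glue them back together with colons
my @two     = split(/:/, $record, 2);   # LIMIT 2: first field, then all the rest

print scalar(@fields), " fields\n$rebuilt\n$two[1]\n";
```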
2.9 Subroutines
Like any good programming language Perl allows the user to define their own functions,
called subroutines. They may be placed anywhere in your program but it's probably best
to put them all at the beginning or all at the end. A subroutine has the form
sub mysubroutine
{
print "Not a very interesting routine\n";
print "This does the same thing every time\n";
}
regardless of any parameters that we may want to pass to it. All of the following will
work to call this subroutine. Notice that a subroutine is called with an & character in
front of the name:
&mysubroutine; # Call the subroutine
&mysubroutine($_); # Call it with a parameter
&mysubroutine(1+2, $_); # Call it with two parameters
Parameters
In the above case the parameters are acceptable but ignored. When the subroutine is
called any parameters are passed as a list in the special @_ list array variable. This
variable has absolutely nothing to do with the $_ scalar variable. The following
subroutine merely prints out the list that it was called with. It is followed by a couple of
examples of its use.
sub printargs
{
print "@_\n";
}
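Calls to this subroutine might look like the sketch below. The argument strings are arbitrary, and the subroutine is repeated so that the example runs on its own.

```perl
use strict;
use warnings;

sub printargs
{
    print "@_\n";                       # @_ holds every parameter passed in
}

&printargs("perly", "king");            # prints "perly king"
&printargs("frog", "and", "toad");      # prints "frog and toad"
```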
Returning values
The result of a subroutine is always the last thing evaluated. This subroutine returns the
maximum of two input parameters. An example of its use follows.
sub maximum
{
if ($_[0] > $_[1])
{
$_[0];
}
else
{
$_[1];
}
}
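A call might look like this. The variable name $biggest and the numbers are illustrative, and the subroutine is repeated in compact form so that the sketch runs on its own.

```perl
use strict;
use warnings;

sub maximum
{
    if ($_[0] > $_[1]) { $_[0]; }   # the last value evaluated is returned
    else               { $_[1]; }
}

my $biggest = &maximum(37, 24);     # $biggest is now 37
print "$biggest\n";
```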
Local variables
The @_ variable is local to the current subroutine, and so of course are $_[0], $_[1],
$_[2], and so on. Other variables can be made local too, and this is useful if we want to
start altering the input parameters. The following subroutine tests to see if one string is
inside another, spaces notwithstanding. An example follows.
sub inside
{
local($a, $b); # Make local variables
($a, $b) = ($_[0], $_[1]); # Assign values
$a =~ s/ //g; # Strip spaces from
$b =~ s/ //g; # local variables
($a =~ /$b/ || $b =~ /$a/); # Is $b inside $a
# or $a inside $b?
}
&inside("lemon", "dole money"); # true
In fact, it can even be tidied up by replacing the first two lines with
local($a, $b) = ($_[0], $_[1]);
This file is compiled automatically from the URLs listed below. Between
each page is the line containing only '----- boundary ' followed by the
URL of the next page or 'begin' or 'end' followed by ' -----'. The URLs
are:
http://agora.leeds.ac.uk/nik/Perl/start.html
http://agora.leeds.ac.uk/nik/Perl/basic.html
http://agora.leeds.ac.uk/nik/Perl/running.html
http://agora.leeds.ac.uk/nik/Perl/scalars.html
http://agora.leeds.ac.uk/nik/Perl/arrays.html
http://agora.leeds.ac.uk/nik/Perl/filehandling.html
http://agora.leeds.ac.uk/nik/Perl/control.html
http://agora.leeds.ac.uk/nik/Perl/conditionals.html
http://agora.leeds.ac.uk/nik/Perl/matching.html
http://agora.leeds.ac.uk/nik/Perl/sandtr.html
http://agora.leeds.ac.uk/nik/Perl/split.html
http://agora.leeds.ac.uk/nik/Perl/associative.html
http://agora.leeds.ac.uk/nik/Perl/subroutines.html
_________________________________________________________________
There are plenty of other Perl tutorials around, and most (if not all)
of them can be found at the UF/NA Perl Archive. However I wanted
something that included exercises developing a consistent theme; none
of the others seemed to do this.
Thanks to Neil Bowers whose Perl page is where I ripped off the camel
icon (though he ripped it off someone before me, of course) and to our
Support team for their technical wizardry.
_________________________________________________________________
----- boundary http://agora.leeds.ac.uk/nik/Perl/basic.html -----
_________________________________________________________________
_________________________________________________________________
Every perl program starts off with this as its very first line:
#!/usr/local/bin/perl
although this may vary from system to system. This line tells the
machine what to do with the file when it is executed (ie it tells it
to run the file through Perl).
_________________________________________________________________
_________________________________________________________________
_________________________________________________________________
----- boundary http://agora.leeds.ac.uk/nik/Perl/running.html -----
_________________________________________________________________
_________________________________________________________________
Type in the example program using a text editor, and save it. Emacs is
a good editor to use for this because it has its own Perl mode which
formats lines nicely when you hit tab (use `M-x perl-mode'). But as
ever, use whichever you're most comfortable with.
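After you've entered and saved the program make sure the file is
executable by using the command
chmod u+x progname
at the UNIX prompt, where progname is the filename of the program.
To run the program, type any of the following at the prompt:
perl progname
./progname
progname
If something goes wrong then you may get error messages, or you may
get nothing. You can always run the program with warnings using the
command
perl -w progname
To run the program under the debugger use the command
perl -d progname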
When the file is executed Perl first compiles it and then executes
that compiled version. So after a short pause for compilation the
program should run quite quickly. This also explains why you can get
compilation errors when you execute a Perl file which consists only
of text.
Make sure your program works before proceeding. The program's output
may be slightly unexpected - at least it isn't very pretty. We'll look
next at variables and then tie this in with prettier printing.
_________________________________________________________________
_________________________________________________________________
SCALAR VARIABLES
_________________________________________________________________
Perl's assignment operators include the following:
$a = $b; # Assign $b to $a
$a += $b; # Add $b to $a
$a -= $b; # Subtract $b from $a
$a .= $b; # Append $b onto $a
ARRAY VARIABLES
_________________________________________________________________
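Array variables are prefixed by an @ symbol. In this tutorial's running
example, a three element list is assigned to the array variable @food
and a two element list to the array variable @music. A single element
is accessed by its index, as in
$food[2]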
_________________________________________________________________
Array assignments
To add an element to an array, use a statement such as
push(@food, "eggs");
which pushes eggs onto the end of the array @food. To push two or more
items onto the array use one of the following forms:
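The forms themselves are not shown here, so the sketch below uses the standard ways of calling push with several items. The array contents are assumed for the example. The push function also returns the new length of the array.

```perl
use strict;
use warnings;

my @food     = ("apples", "pears", "eels");   # assumed example contents
my @morefood = ("lard", "kippers");           # hypothetical second array

push(@food, "eggs");                # one item
push(@food, "bacon", "beans");      # several items as a list
push(@food, @morefood);             # every element of another array

my $length = push(@food, "toast");  # push returns the new length
print "$length: @food\n";
```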
The last assignment occurs because arrays are greedy, and @somefood
will swallow up as much of @food as it can. Therefore that form is
best avoided.
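The greedy behaviour can be sketched as follows, with assumed array contents and hypothetical variable names. An array on the left of a list assignment takes everything that is left, so anything listed after it gets nothing.

```perl
use strict;
use warnings;

my @food = ("apples", "pears", "eels");   # assumed example contents

my ($first, @somefood) = @food;     # $first is apples, @somefood gets the rest
my (@greedy, $left_over) = @food;   # @greedy takes all three, $left_over stays undefined

print "$first / @somefood / @greedy\n";
```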
Displaying arrays
_________________________________________________________________
Exercise
Try out each of the above three print statements to see what they do.
_________________________________________________________________
_________________________________________________________________
FILE HANDLING
_________________________________________________________________
Here is the basic perl program which does the same as the UNIX cat
command on a certain file.
#!/usr/local/bin/perl
#
# Program to open the password file, read it in,
# print it, and close it again.
_________________________________________________________________
_________________________________________________________________
CONDITIONALS
_________________________________________________________________
if ($a)
{
print "The string is not empty\n";
}
else
{
print "The string is empty\n";
}
Exercise
Find a fairly large file that contains some text and some blank lines.
The file ~nik/WWW/Misc/electricity.txt is pretty good because it's
funny apart from anything else. It was originally posted to our local
news system by David O'Brien.
From the previous exercise you should have a program which prints out
the password file with line numbers. Change it so that it works with
the text file. Now alter the program so that line numbers aren't printed
or counted with blank lines, but every line is still printed,
including the blank ones. Remember that when a line of the file is
read in it will still include its newline character at the end.
_________________________________________________________________
_________________________________________________________________
STRING MATCHING
_________________________________________________________________
One of the most useful features of Perl (if not the most useful
feature) is its powerful string manipulation facilities. At the heart
of this is the regular expression (RE) which is shared by many other
UNIX utilities.
_________________________________________________________________
Exercise
Previously your program counted non-empty lines. Alter it so that
instead of counting non-empty lines it counts only lines with
* the letter x
* the string the
* the string the which may or may not have a capital t
* the word the with or without a capital. Use \b to detect word
boundaries.
In each case the program should print out every line, but it should
only number those specified. Try to use the $_ variable to avoid using
the =~ match operator explicitly.
_________________________________________________________________
_________________________________________________________________
_________________________________________________________________
$sentence =~ s/london/London/ # Replace the first london in $sentence
s/london/London/ # The same, but acting on $_
Notice that the two regular expressions (london and London) are
surrounded by a total of three slashes. The result of this expression
is the number of substitutions made, so it is either 0 (false) or 1
(true) in this case.
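A small sketch of using that result as a truth value; the sentence and variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical sentence with two lower-case occurrences.
my $sentence = "I flew from london to london";

# Without the g option only the first occurrence is replaced.
my $changed = ($sentence =~ s/london/London/);

# A pattern that is not present leaves the string alone and returns false.
my $unchanged = ($sentence =~ s/paris/Paris/);

print "$sentence\n";
print "first substitution made a change: ", ($changed   ? "yes" : "no"), "\n";
print "second substitution made a change: ", ($unchanged ? "yes" : "no"), "\n";
```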
_________________________________________________________________
Options
This example only replaces the first occurrence of the string, and it
may be that there will be more than one such string we want to
replace. To make a global substitution the last slash is followed by a
g as follows:
s/london/London/g
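To catch london in any mixture of upper and lower case one could write
s/[Ll][Oo][Nn][Dd][Oo][Nn]/London/g
but an easier way is to use the i option (for "ignore case"). The
expression
s/london/London/gi
makes a global substitution that ignores case.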
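To compare the options, the sketch below applies each form to a copy of the same made-up string. The (my $copy = $text) =~ s/.../.../ idiom copies the string first so the original is left alone; all names are illustrative.

```perl
use strict;
use warnings;

# Hypothetical input with three spellings of the same word.
my $text = "london london LONDON";

(my $first    = $text) =~ s/london/London/;    # first occurrence only
(my $global   = $text) =~ s/london/London/g;   # every exact lower-case match
(my $any_case = $text) =~ s/london/London/gi;  # every match, any case

# With g the result is the number of substitutions made.
my $count = ($text =~ s/london/London/gi);

print "$first\n$global\n$any_case\n";
print "substitutions with gi: $count\n";
```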
_________________________________________________________________
Remembering patterns
It's often useful to remember patterns that have been matched so that
they can be used again. It just so happens that anything matched in
parentheses gets remembered in the variables $1,...,$9. These strings
can also be used in the same regular expression (or substitution) by
using the special RE codes \1,...,\9. For example
if (/(\b.+\b) \1/)
{
    print "Found $1 repeated\n";
}
will identify any words repeated. Each \b represents a word boundary
and the .+ matches any non-empty string, so \b.+\b matches anything
between two word boundaries. This is then remembered by the
parentheses and stored as \1 for regular expressions and as $1 for the
rest of the program.
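As a sketch of the same idea, the variant below uses \w+ in place of .+ so that the remembered text is a single word. That change, the sample lines and the @found array are assumptions for illustration, not the tutorial's own pattern.

```perl
use strict;
use warnings;

# Hypothetical input lines.
my @lines = ("Paris in the the spring", "No repeats here");

my @found;
for (@lines) {
    # \1 must match exactly what the first pair of parentheses matched.
    if (/\b(\w+) \1\b/) {
        push @found, $1;
        print "Found $1 repeated\n";
    }
}
```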
The following swaps the first and last characters of a line in the $_
variable:
s/^(.)(.*)(.)$/\3\2\1/
The ^ and $ match the beginning and end of the line. The \1 code
stores the first character; the \2 code stores everything else up to
the last character, which is stored in the \3 code. Then that whole
line is replaced with \1 and \3 swapped round.
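A quick check of the swap, as a sketch. In the replacement half this version writes $3$2$1 rather than \3\2\1; both work, but Perl's warnings recommend the $ form there. The sample strings are made up.

```perl
use strict;
use warnings;

# Hypothetical line to work on.
$_ = "hello";
s/^(.)(.*)(.)$/$3$2$1/;   # first and last characters change places
my $swapped = $_;

# The pattern needs at least two characters, so a single one is untouched.
$_ = "a";
my $matched = s/^(.)(.*)(.)$/$3$2$1/;
my $single = $_;

print "$swapped $single\n";
```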
After a match, you can use the special read-only variables $` and $&
and $' to find what was matched before, during and after the search.
So after matching /pp/ against the string Lord Wopper of Fibbing, all
of the following are true:
$` eq "Lord Wo";
$& eq "pp";
$' eq "er of Fibbing";
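The three tests can be checked with a short sketch. The input string here is simply the three pieces above joined together and /pp/ is the assumed pattern; the variable names are illustrative.

```perl
use strict;
use warnings;

# Input assumed to be the three pieces joined together.
$_ = "Lord Wopper of Fibbing";

my ($before, $matched, $after) = ("", "", "");
if (/pp/) {
    # Copy the special variables straight away: the next match overwrites them.
    ($before, $matched, $after) = ($`, $&, $');
}

print "before: [$before]\nmatched: [$matched]\nafter: [$after]\n";
```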
Variables are interpolated inside the slashes of a match or a
substitution, so
$search = "the";
s/$search/xxx/g;
will replace every occurrence of the with xxx. If you want to replace
every occurrence of there then you cannot do s/$searchre/xxx/ because
this will be interpolated as the variable $searchre. Instead you
should put the variable name in curly braces so that the code becomes
$search = "the";
s/${search}re/xxx/;
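A runnable sketch of the curly-brace form follows; the sample sentence is invented. Note that under use strict the unbraced version would not even compile, because $searchre is not a declared variable.

```perl
use strict;
use warnings;

my $search = "the";

# Hypothetical text containing "the", "theatre" and "there".
$_ = "the theatre is there";

# ${search}re is the value of $search followed by the letters re.
my $count = s/${search}re/xxx/g;

print "$_ ($count replaced)\n";
```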
_________________________________________________________________
Translation
$sentence =~ tr/abc/edf/ # Replace each a with e, each b with d, each c with f
tr/a-z/A-Z/; # Convert $_ to upper case
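Both translations can be tried on made-up strings, as in this sketch. tr works character by character and returns the number of characters it translated; the variable names are illustrative.

```perl
use strict;
use warnings;

# Hypothetical sentence; a copy is translated so the original survives.
my $sentence = "a cab back";
(my $mapped = $sentence) =~ tr/abc/edf/;   # a->e, b->d, c->f

# Upper-case conversion on $_, keeping the count that tr returns.
$_ = "hello world";
my $count = tr/a-z/A-Z/;

print "$mapped\n$_ ($count characters changed)\n";
```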
_________________________________________________________________
Exercise
023 Amp, James Wa(tt), Bob Transformer, etc. These pion(ee)rs conducted
many
For a slightly more interesting program you might like to try the
following. Suppose your program is called countlines. Then you would
call it with
./countlines
However, if you call it with several arguments, as in
./countlines first second etc
then those arguments are stored in the array @ARGV. In the above
example $ARGV[0] is first, $ARGV[1] is second and $ARGV[2] is etc.
Modify your program so that it accepts one argument and counts
only those lines with that string. It should also put occurrences of
this string in parentheses. So
./countlines the
will output something like this line among others:
019 But (the) greatest Electrical Pioneer of (the)m all was Thomas
Edison, who
_________________________________________________________________
_________________________________________________________________
SPLIT
_________________________________________________________________
@personal = split(/:/); # Split $_ at each colon
If the fields are divided by any number of colons then we can use the
RE codes to get round this. The code
is the same as
@personal = ("Capes", "Geoff",
"Shot putter", "Big Avenue");
But this:
would be like
A word can be split into characters, a sentence split into words and a
paragraph split into sentences:
In the first case the null string is matched between each character,
and that is why the @chars array is an array of characters, i.e. an
array of strings of length 1.
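The cases described in this section can be sketched as follows. The input strings are assumptions chosen to fit the results quoted above (in particular the number of colons in $record is a guess), and the array names other than @chars are illustrative.

```perl
use strict;
use warnings;

# Assumed input: fields separated by one or more colons.
my $record = "Capes:Geoff::Shot putter:::Big Avenue";

my @collapsed = split(/:+/, $record);  # runs of colons count as one separator
my @single    = split(/:/,  $record);  # each colon separates, giving empty fields

# Hypothetical word and sentence.
my @chars = split(//,  "word");              # null pattern: one character each
my @words = split(/ /, "a short sentence");  # split at each space

print scalar(@collapsed), " fields with /:+/, ", scalar(@single), " with /:/\n";
print "chars: @chars\nwords: ", scalar(@words), "\n";
```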
_________________________________________________________________
Exercise
If you use a negative index that extends beyond the beginning of the
string then Perl will return nothing or give a warning. To avoid this
happening you can pad out the string by using the x operator mentioned
earlier. The expression (" "x30) produces 30 spaces, for example.
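A minimal sketch of the padding trick, assuming the negative index in question is the offset passed to substr. The sample text and the widths are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical short string that we want the last 10 characters of.
my $text = "short";

# Put 30 spaces in front so a negative offset always lands inside the string.
my $padded = (" " x 30) . $text;

# The last 10 characters: padding first, then the original text.
my $tail = substr($padded, -10);

print "[$tail]\n";
```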
_________________________________________________________________
_________________________________________________________________
ASSOCIATIVE ARRAYS
_________________________________________________________________
Now we can find the age of people with the following expressions
Notice that like list arrays each % sign has changed to a $ to access
an individual element because that element is a scalar. Unlike list
arrays the index (in this case the person's name) is enclosed in curly
braces, the idea being that associative arrays are fancier than list
arrays.
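As an illustration of that notation, here is a sketch with an invented %ages array; the names and ages are hypothetical and are not the tutorial's own data.

```perl
use strict;
use warnings;

# Hypothetical associative array: name followed by age, in pairs.
my %ages = ("Alice", 39, "Bob", 27);

# One element is a scalar, so it is written with $ and curly braces.
my $alice = $ages{"Alice"};

# Assigning to a new key adds an element.
$ages{"Carol"} = 52;

print "Alice is $alice, Carol is $ages{'Carol'}\n";
```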
_________________________________________________________________
Operators
Associative arrays do not have any order to their elements (they are
just like hash tables) but it is possible to access all the elements
in turn using the keys function and the values function:
When keys and values are called in a scalar context they return the
number of key/value pairs in the associative array.
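A short sketch of both functions, again with a hypothetical %ages array. The keys are sorted here only to make the printed order predictable.

```perl
use strict;
use warnings;

# Hypothetical data for illustration.
my %ages = ("Alice", 39, "Bob", 27, "Carol", 52);

# Visit each key in turn.
for my $person (sort keys %ages) {
    print "$person is $ages{$person}\n";
}

# Visit each value in turn.
my $total = 0;
$total += $_ for values %ages;

# In a scalar context keys gives the number of pairs.
my $count = keys %ages;

print "$count people, combined age $total\n";
```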
_________________________________________________________________
Environment variables
When you run a perl program, or any script in UNIX, there will be
certain environment variables set. These will be things like USER
which contains your username and DISPLAY which specifies which screen
your graphics will go to. When you run a perl CGI script on the World
Wide Web there are environment variables which hold other useful
information. All these variables and their values are stored in the
associative %ENV array in which the keys are the variable names. Try
the following in a perl program:
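For instance, a sketch along these lines prints two of the variables mentioned above. Either may be missing on a given system, so the code substitutes a placeholder instead of assuming they are set.

```perl
use strict;
use warnings;

# USER and DISPLAY are not guaranteed to exist; fall back to a placeholder.
my $user    = defined $ENV{'USER'}    ? $ENV{'USER'}    : '(not set)';
my $display = defined $ENV{'DISPLAY'} ? $ENV{'DISPLAY'} : '(not set)';

print "You are called $user and you are using display $display\n";
```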
_________________________________________________________________
_________________________________________________________________
SUBROUTINES
_________________________________________________________________
Like any good programming language Perl allows the user to define
their own functions, called subroutines. They may be placed anywhere
in your program but it's probably best to put them all at the
beginning or all at the end. A subroutine has the form
sub mysubroutine
{
    print "Not a very interesting routine\n";
    print "This does the same thing every time\n";
}
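Calling it might look like the sketch below. The arguments are arbitrary; the form with a leading & is the older calling style and is shown only for comparison.

```perl
use strict;
use warnings;

sub mysubroutine
{
    print "Not a very interesting routine\n";
    print "This does the same thing every time\n";
}

mysubroutine();            # no parameters
mysubroutine("ignored");   # one parameter, which the routine never looks at
&mysubroutine(1 + 2, $_);  # older style with a leading &
```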
_________________________________________________________________
Parameters
In the above case any parameters are accepted but ignored. When the
subroutine is called any parameters are passed as a list in the
special @_ list array variable. This variable has absolutely nothing
to do with the $_ scalar variable. The following subroutine merely
prints out the list that it was called with. It is followed by a
couple of examples of its use.
sub printargs
{
    print "@_\n";
}
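Two possible calls, as a sketch; the argument values are made up. Inside double quotes the elements of @_ are joined with single spaces.

```perl
use strict;
use warnings;

sub printargs
{
    print "@_\n";
}

printargs("perly", "king");         # prints "perly king"
printargs("frog", "and", "toad");   # prints "frog and toad"
```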
Just like any other list array the individual elements of @_ can be
accessed with the square bracket notation:
sub printfirsttwo
{
    print "Your first argument was $_[0]\n";
    print "and $_[1] was your second\n";
}
Again it should be stressed that the indexed scalars $_[0] and $_[1]
and so on have nothing to do with the scalar $_ which can also be used
without fear of a clash.
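The sketch below illustrates that independence: $_ is given a value, the subroutine is called with unrelated arguments, and $_ is unchanged afterwards. The values are hypothetical.

```perl
use strict;
use warnings;

sub printfirsttwo
{
    print "Your first argument was $_[0]\n";
    print "and $_[1] was your second\n";
}

$_ = "untouched";                 # the scalar $_
printfirsttwo("alpha", "beta");   # fills @_, so $_[0] and $_[1] are set
my $after = $_;                   # $_ still holds its old value

print "\$_ is still $after\n";
```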
_________________________________________________________________
Returning values
sub maximum
{
    if ($_[0] > $_[1])
    {
        $_[0];
    }
    else
    {
        $_[1];
    }
}
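The value of a subroutine is the last expression it evaluates, so the routine above can be used as in this sketch. The numbers and the variable names are illustrative.

```perl
use strict;
use warnings;

sub maximum
{
    if ($_[0] > $_[1])
    {
        $_[0];    # last expression evaluated when the first is larger
    }
    else
    {
        $_[1];    # last expression evaluated otherwise
    }
}

my $biggest = maximum(37, 24);
my $tie     = maximum(5, 5);

print "biggest: $biggest, tie: $tie\n";
```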
_________________________________________________________________
Local variables
sub inside
{
    local($a, $b);                  # Make local variables
    ($a, $b) = ($_[0], $_[1]);      # Assign values
    $a =~ s/ //g;                   # Strip spaces from
    $b =~ s/ //g;                   # local variables
    ($a =~ /$b/ || $b =~ /$a/);     # Is $b inside $a
                                    # or $a inside $b?
}
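For comparison, here is a sketch of the same test written with my, which declares lexical variables private to the subroutine and is the usual choice in current Perl. This version differs from the one above in two assumed ways: it uses its own variable names, and it wraps the interpolated strings in \Q...\E so that they are matched literally instead of as patterns.

```perl
use strict;
use warnings;

sub inside
{
    my ($first, $second) = @_;   # private copies of the two arguments
    $first  =~ s/ //g;           # strip spaces from the copies only
    $second =~ s/ //g;
    return ($first =~ /\Q$second\E/ || $second =~ /\Q$first\E/) ? 1 : 0;
}

my $phrase = "dole money";
my $yes = inside("lemon", $phrase);   # "lemon" is inside "dolemoney"
my $no  = inside("lemon", "lime");

print "yes: $yes, no: $no, phrase still: $phrase\n";
```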
_________________________________________________________________