Introduction To Perl Scripting
Network Equipment Technologies
PanaVue Management Platform
Release 2.0
Issued September 1998
NETWORK EQUIPMENT TECHNOLOGIES, INC., (N.E.T.) PROVIDES THIS DOCUMENT AS IS, WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. This document constitutes the sole Specifications referred to in N.E.T.'s Product Warranty for the products or services described herein. N.E.T.'s Product Warranty is subject to all the conditions, restrictions, and limitations contained herein and in the applicable contract. N.E.T. has made reasonable efforts to verify that the information in this document is accurate, but N.E.T. reserves the right to correct typographical errors or technical inaccuracies. N.E.T. assumes no responsibility for any use of the information contained in this document or for any infringement of patents or other rights of third parties that may result. Networking products cannot be tested in all possible uses, configurations or implementations, and interoperability with other products cannot be guaranteed. The customer is solely responsible for verifying the suitability of N.E.T.'s products for use in its network. Local market variations may apply. This document is subject to change by N.E.T. without notice as additional information is incorporated by N.E.T. or as changes are made by N.E.T. to hardware or software. Copyright 1998 Network Equipment Technologies, Inc. All rights reserved. No part of this publication may be stored in a retrieval system, transmitted or reproduced in any way, including, but not limited to, photocopy, photograph, magnetic, or other record, without the prior written permission of N.E.T.
The software accompanying this documentation is furnished under a license and may only be used in accordance with the terms of such license. This documentation is commercial computer software documentation as that term is used in 48 CFR 12.212. Unless otherwise agreed, use, duplication, or disclosure of this documentation and any related software by U.S. Government civilian agencies is subject to restrictions as set forth in 48 CFR 52.227-14 (ALT III) and 48 CFR 52.227-19, and use, duplication, or disclosure by DoD is subject to restrictions as set forth in 48 CFR 227.7202-1(a) and 48 CFR 227.7202-3(a) or, if applicable, 48 CFR 252.227-7013(c)(1)(ii) (OCT 1988). Unpublished-rights reserved under the copyright laws of the United States. Network Equipment Technologies, Inc./N.E.T. Federal, Inc. 6500 Paseo Padre Parkway Fremont, CA 94555
Trademarks
IDNX, ADNX, and the N.E.T. logo are registered trademarks, and CellXpress, FrameXpress, Frame Relay Exchange, ISDNX, LAN/WAN Exchange, Network Equipment Technologies, NetOpen, N.E.T., PanaVue, PortExtender, PrimeSwitch, PrimeVideo, PrimeVoice, Promina, SONET Transmission Manager, STM, and SPX are trademarks of Network Equipment Technologies, Inc. Apache Server source, binaries, and documentation copyright 1995, 1996, 1997, 1998 The Apache Group. All rights reserved. This product includes software developed by the Apache Group for use in the Apache HTTP server project (http://www.apache.org/). SunOS and Solaris software copyright held by Sun Microsystems, Inc. Sun Microsystems is a registered trademark and Sun, SunOS, OpenWindows, Solaris, and Ultra are trademarks of Sun Microsystems, Inc. SPARC is a registered trademark of SPARC International, Inc. SPARCstation is a registered trademark of SPARC International, Inc. licensed exclusively to Sun Microsystems, Inc. ORACLE and SQL*Plus are trademarks of Oracle Corporation. X-Window System software copyright held by Massachusetts Institute of Technology. OpenView, HP, and the HP logo are trademarks of Hewlett-Packard Company. Open Software Foundation, OSF, the OSF logo, OSF/MOTIF, and MOTIF are trademarks of the Open Software Foundation. All other trademarks are the sole property of their respective companies.
Note: In this manual, any reference to PanaVue refers to the PanaVue Management Platform product line, unless specified differently.
Document Organization
The document contains the following sections:
Introduction to Programming in Perl: Provides an overview of the Perl language, what this guide covers, and other resources.
Basics of the Perl Programming Language: Provides an introduction to the syntax of Perl scripts and advice for programmers who are new to Perl.
Using Variables in Perl: Describes how Perl implements scalar variables, arrays, and associative arrays.
Perl Output Commands: Describes the commands used to output information to STDOUT, STDERR, and files.
Operators in the Perl Language: Describes Perl's built-in arithmetic, logical, and relational operators.
Control Structures and Loops: Describes the Perl control structures: if, unless, while, for, foreach.
Perl's Built-In Functions: Describes Perl's built-in arithmetic, timekeeping, and string functions.
File Access in Perl: Describes Perl's commands for accessing files and directories.
Using Regular Expressions in Perl: Describes how to perform matchings and substitutions on strings using Perl's regular expressions.
Using Subroutines in Perl: Describes how to write and use procedures (subroutines) in Perl.
The switch generates SNMPv2 traps. Press the Delete key. Press Shift+F1.
Convention bold Enter Example Install Card. Enter Install Card. The Alarms Pending message displays on the screen. A gateway node is any node that connects to a domain. For more information, see the Hardware Description manual. Description Indicates a command to be typed. Also used for emphasis. Indicates that after typing the information, press the Return or Enter key. Refers to parameter options and other information displayed by the software. Refers to a new term that is defined.
italic
The following icons are used in this document to provide important information:
Warning: Provides information on how to avoid a potentially hazardous situation that, if not avoided, could result in death or serious injury.
Caution: Provides information on how to avoid possible disruption of traffic or damage to files or equipment.
Note: Provides information that helps the user and should be read before proceeding.
PanaVue Scripting Guide
Perl Man Pages
Expect Tutorial and Introduction
Scotty Man Pages
Contents
Introduction to Programming in Perl
    Topics Not Covered in This Manual
    What is Perl?
    Basic Features of Perl
    For More Information
Basics of the Perl Programming Language
    Using the Perl Interpreter
        Specifying a #! Line
        Command Line Options
            -c (check)
            -d (debugger)
            -D (Debugger)
            -e (execute)
            -n (next/loop)
            -p (print and loop)
            -s (switches)
            -S (Search)
            -T (Taint checking)
            -v (version)
            -w (warnings)
            -x (extract)
    Perl Basics
        Perl Syntax
        Naming Conventions
Using Variables in Perl
    Scalar Variables
    Arrays
    Associative Arrays
    Avoiding Confusion with Perl's Variables
    Special and Predefined Variables
Perl Output Commands
    Using the print Command
    Using the printf Command
    Using the write Command
        Writing to Multiple Lines
        Outputting Variables Containing Multiple Lines
        Adding Page Headers to a Report
    Problems with Buffering of Output
Operators in the Perl Language
    Assignment
        Common Shortcuts
        Autoincrement and Autodecrement
    Relational
    Logical
Control Structures and Loops
    if/else Command
    unless/else Command
    while Command
    for Command
    foreach Command
Perl's Built-In Functions
    Arithmetic Functions
        abs
        atan
        cos
        exp
        hex
        int
        log
        oct
        rand
        sin
        sqrt
        srand
    Timekeeping Functions
        time
        localtime
        gmtime
    String Functions
        chomp
        chop
        chr
        index
        length
        lc
        lcfirst
        ord
        rindex
        substr
        uc
        ucfirst
        undef
File Access in Perl
    Using Filehandles
    Using STDIN, STDOUT, and STDERR
        Using STDIN to Read One Line
        Using STDIN to Read An Entire File
        Using STDOUT and STDERR
    Reading Input From the Diamond Operator (<>)
        Changing the File List
        Finding the Current Filename
    Accessing Files on the Command Line
    Using Filehandles
        Opening a File for Read-Only Access
        Opening a File for Write Access
        Closing a Filehandle
    Directory Operations
        Changing the Working Directory
        Listing the Files in a Directory
        Reading a Directory Entry Directly
        Deleting Files
        Renaming Files
        Creating and Removing Directories
    File Test Operators
Using Regular Expressions in Perl
    Defining Regular Expressions
    Rules in Using Regular Expressions
    Simple Matching and Replacing
    Using Wildcards in Regular Expressions
        Wildcards are Greedy
        Using Wildcards for Matching
        Using Wildcards for Substitutions
Using Subroutines in Perl
    Defining Subroutines
    Using a Subroutine's Return Value
    Defining Local Variables
    Passing Arguments
"Control Structures and Loops" describes the control structures and loops most commonly used in Perl programs.
"Perl's Built-In Functions" briefly lists the basic built-in functions of the Perl language.
"File Access in Perl" describes the use of filenames and filehandles in reading, writing, creating, and deleting files.
"Using Regular Expressions in Perl" describes the use of regular expressions, one of the major features of the Perl language.
"Using Subroutines in Perl" describes how to declare and use subroutines.
See the following sections for a list of topics that are not covered in this document, as well as a description of Perl's basic features and a list of other references about Perl.
Note: This chapter provides only a basic introduction to the Perl language and how it can be used on the PanaVue workstation. It assumes you have some working knowledge of basic programming concepts such as variables, control loops, and subroutines. For a complete reference to these topics and to the Perl language, see the books and online references listed in "For More Information."
For more information about these topics, see the online Perl documentation or any of the Perl reference works listed in "For More Information."
What is Perl?
In addition to scalar data types and standard arrays, Perl offers an associative array that can be used for quick database access and management. Since Perl was originally written to generate reports, it contains many features to help in creating and formatting reports. Perl is easily extensible, so options such as SNMP communication can be incorporated into your programs as if they were part of the Perl language.
The following books are recommended as reference guides and tutorials for the Perl language:
Learning Perl, second edition, by Randal L. Schwartz and Tom Christiansen, O'Reilly & Associates, Inc., Sebastopol, CA, July 1997 (http://www.oreilly.com). A short introduction to Perl that introduces the major concepts and commands.
Perl 5 Desktop Reference, by Johan Vromans, O'Reilly & Associates, Inc., 1996. A pocket-sized quick reference to the Perl programming language.
Programming Perl, second edition, by Larry Wall, Tom Christiansen, and Randal L. Schwartz, O'Reilly & Associates, Inc., 1996. The definitive reference to Perl version 5, co-written by the language's author. It includes many complete programs that can be used either as is or as templates for your own programs.
The following books describe Perl's use for writing web-based CGI scripts:
CGI Programming on the World Wide Web, by Shishir Gundavaram, O'Reilly & Associates, Inc., Sebastopol, CA, 1996. ISBN: 1-56592-168-2.
Web Client Programming with Perl, by Clinton Wong, O'Reilly & Associates, Inc., Sebastopol, CA, 1996. ISBN: 1-56592-214-X.
where options are the options for the Perl interpreter, script.pl is the Perl script being run, and command-line-args are the arguments that should be passed to the script. As a general habit, you should also include the -w option for all your scripts, since it warns you of many potential problems, such as a misspelled variable name:
/usr/thirdParty/perl/bin/perl -w script.pl command-line-args
See "Command Line Options" for more information about this and other options.
Specifying a #! Line
If the first line of a script begins with the #! characters, the Perl interpreter first verifies that it should be running the script by looking at this line. If the interpreter does not find the word perl anywhere on this line, it does not execute the script; instead, it calls whatever program is specified and passes the script to it. For example, if script.sh is a shell script that begins with the line #!/bin/sh, and you give the command perl script.sh, the Perl interpreter executes the command /bin/sh script.sh so that the shell program runs the script itself. This is a quick way of running other scripts if you are not sure which shell or program they belong to, but it is not recommended since it takes several seconds for the Perl interpreter to load, to examine the script file, and to execute the proper command.
Command Line Options
Perl's command line options can be specified either on the command line or as part of the first line of a standalone script. For example, the -w option instructs the interpreter to print a warning about possible typographical errors and anything else it considers to be a bad programming practice. If you invoke the Perl interpreter directly, specify this option on the command line as follows:
/usr/thirdParty/perl/bin/perl -w script.pl command-line-args
If, on the other hand, your scripts are run as standalone commands, add the -w option at the end of the first line of each script:
#!/usr/thirdParty/perl/bin/perl -w
Note: It is strongly recommended you use the -w option for all your scripts because it is a way of highlighting potential problems and typographical errors.
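As an illustration of the kind of problem that -w highlights, the following sketch turns on the same warning switch from inside a script and collects the messages that result. The helper name warnings_from and the hash contents are assumptions made for this example; they are not part of the PanaVue software. In normal use you would simply add -w as shown above.

```perl
use strict;

# Hypothetical helper: runs a block of code with warnings enabled
# (the run-time equivalent of -w) and returns the warning messages.
sub warnings_from {
    my ($code) = @_;
    my @caught;
    local $SIG{__WARN__} = sub { push @caught, $_[0] };  # collect instead of printing
    local $^W = 1;        # the global switch that -w turns on
    $code->();
    return @caught;
}

my @msgs = warnings_from(sub {
    my %count = (alarms => 3);
    my $total = 0;
    $total = $total + $count{alarm};   # typo: the key should be 'alarms'
});

print "perl -w would report: $_" for @msgs;
```

Without warnings enabled, the misspelled hash key is silently treated as 0 and the script reports the wrong total; with warnings enabled, Perl flags the use of an uninitialized value.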
The following are the most useful command line options for the Perl interpreter. See the online documentation for a complete listing:
-c (check)
The -c option checks the syntax of a Perl script without actually executing it. This option is especially useful in verifying that the script's opening and closing braces match, and that all of the library files referenced by use statements actually exist. For example:
perl -c myscript.pl
-d (debugger)
The -d option turns on the interactive Perl debugger. It should be used only when running scripts at the command line. See the online documentation for the Perl debugger for information on using it. For example:
perl -d myscript.pl
-D (Debugger)
The -D option customizes the debugger so as to focus on specific operations within your scripts. This option is not enabled in the Perl interpreter that is shipped with PanaVue because including it can significantly slow down regular operations. To use this feature you must recompile the Perl source code with the -DDEBUGGING flag turned on. If you recompile Perl, be certain that you do not overwrite the Perl interpreter that is shipped with the PanaVue system. Instead, put the debugger version of Perl in a separate directory.
-e (execute)
The -e option executes Perl statements that are given on the command line instead of executing a Perl script. This option is most commonly used with the -n and -p options (see below) to create shell aliases. For example, to create a command named print-old that lists the filenames of all files in the /opt/Panavue/reports directory that are more than 14 days old, define the following alias in your .cshrc file:
alias print-old "find /opt/Panavue/reports -mtime +14 -print | perl -ne 'print;' | more"
This is a trivial example since the find command can also print out the filenames, but Perl could be used to delete the old files by modifying the alias as follows:
alias remove-old "find /opt/Panavue/reports -mtime +14 -print | perl -ne 'chop; unlink;'"
Be careful when using Perl to delete or modify files in this manner. Before using a script that can modify or delete files, you should first test your Perl script by having it print out the filenames as shown in the first example. When you are satisfied that only the proper files are being specified, then modify the alias so that it actually deletes or changes the files.
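The test-first advice above can be sketched as a small subroutine that either lists or deletes the files named on its input. This is only an illustration: the name process_old_files and its $really_delete argument are assumptions for this example, and chomp is used in place of chop because it removes only a trailing newline.

```perl
use strict;

# Hypothetical helper mirroring the loop that -n wraps around "chop; unlink;".
# $fh            - filehandle supplying one filename per line (for example STDIN)
# $really_delete - assumed flag: false means "dry run", true means delete
sub process_old_files {
    my ($fh, $really_delete) = @_;
    my @seen;
    while (<$fh>) {                  # -n supplies this loop for you
        chomp;                       # strip the newline from the filename
        push @seen, $_;
        if ($really_delete) {
            unlink $_ or warn "could not delete $_: $!\n";
        } else {
            print "would delete: $_\n";
        }
    }
    return @seen;                    # the filenames that were processed
}
```

Calling process_old_files(\*STDIN, 0) at the end of a find pipeline only prints the candidate filenames. Changing the second argument to 1 is the step to take once the list has been checked.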
-n (next/loop)
The -n option instructs the Perl interpreter to continuously loop your script until the end of input (from STDIN or a specified file). This option is commonly used with the -e option (see above).
-p (print and loop)
The -p option, like the -n option (see above), instructs the Perl interpreter to continuously loop your script until the end of input (from STDIN or a specified file). However, unlike -n, this option prints each line of input after it has been processed. If this option is used with the print-old alias shown above, you do not need Perl's print statement:
alias print-old "nd /opt/Panavue/reports -mtime +14 -print | perl -pe | more"
-s (switches) The -s option instructs the Perl interpreter to convert into variables any switches that appear on the command line after the scripts lename. Switches must start with a hyphen (-) and contain only letters, numbers, or underscores (such as -switch or -var2). An optional value can be appended using an equal sign (such as -switch=value); if no value is given, a value of 1 is automatically assigned. For example, the following command line has three switches, two which are assigned specic values (friday and alarms) and one which is given the value of 1:
perl -s myscript.pl -day=friday -ag -report_type=alarms
When this command is executed, the three switches are converted into variables that the script myscript.pl can access. For example, you could print out these variables using the following lines:
print "The day variable is $day\n";                  # prints out friday
print "The flag variable is $flag\n";                # prints out 1
print "The report_type variable is $report_type\n";  # prints out alarms
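A script that relies on -s usually has to cope with switches that were not given. The sketch below shows one possible approach; the default values are assumptions chosen for illustration, and use strict, use warnings, and use vars are additions that this section does not cover.

```perl
# myscript.pl (hypothetical); run as:
#   perl -s myscript.pl -day=friday -flag -report_type=alarms
use strict;
use warnings;
use vars qw($day $flag $report_type);  # -s sets package (global) variables

# Supply assumed defaults for any switch that was not given
$day         = "monday" unless defined($day);
$flag        = 0        unless defined($flag);
$report_type = "all"    unless defined($report_type);

my $summary = "day=$day flag=$flag report_type=$report_type";
print "$summary\n";
```

Run without any switches, the script falls back to the three defaults; run with the command line shown in the comment, the variables take the values given on the command line.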
-S (Search) The -S option instructs the Perl interpreter to search for the specified script using the PATH environment variable. This is useful only when the PATH variable exists and when the desired script is in one of the PATH directories. For example, if myscript.pl is in your search PATH, you could execute it from any directory by giving the following command:
perl -S myscript.pl
-T (Taint checking) The -T option turns on taint checking, which prevents any user input (command line arguments, environment variables, or input from STDIN) from being used in a command that, when run by the root user, could damage the system or evade the system's built-in security features. You should not use this feature unless you are a system administrator who understands the setuid and setgid functions of the Solaris operating system.
-v (version) The -v option displays the version and patchlevel of the Perl interpreter and then exits. To get the version of Perl from within a script, use the $] special variable:
print "The Perl version number is $]\n";
-w (warnings) The -w option prints a warning when the following situations occur in your script:
* A variable or other identifier is used only once, which could indicate a typographical error
* A variable is used before a value has been assigned to it
* A subroutine is defined more than once
* A filehandle is used before being defined
* The script attempts to write to a filehandle that was opened read-only
* A subroutine recursively calls itself until it is nested 100 or more levels deep
* The numeric equality (==) or numeric inequality (!=) operator is used with variables that appear to contain strings (which require the string operators eq and ne)
These warnings do not halt the execution of the script, but they do indicate it is likely the script is not operating as originally intended.
Note: It is strongly recommended that you use the -w command line option for all your scripts until you have tested them thoroughly and are confident they perform exactly as intended.
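The last situation in the list of warnings, using == on strings, is worth seeing in action. The sketch below is illustrative only. It assumes the use warnings pragma of later Perl 5 releases, which enables checks similar to -w, and it installs a hypothetical __WARN__ handler so the warnings can be counted instead of printed.

```perl
use strict;
use warnings;   # enables checks similar to the -w switch

my $status = "up";
my @seen;       # collects warning text instead of printing it

# Hypothetical handler, used here only so the warnings can be counted
local $SIG{__WARN__} = sub { push(@seen, $_[0]) };

# == converts both strings to numbers; neither looks like a number,
# so both become 0 and the test succeeds by accident
my $numeric_match = ($status == "down") ? 1 : 0;

# eq compares the text itself
my $string_match = ($status eq "down") ? 1 : 0;

print "== says $numeric_match, eq says $string_match\n";
# prints: == says 1, eq says 0
print "warnings raised: ", scalar(@seen), "\n";
```

The script keeps running after the warnings, which matches the behavior described above: the warning points at a likely logic error but does not stop execution.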
-x (extract) The -x option instructs the Perl interpreter to extract the Perl script from the specified input file. The interpreter reads the input file and discards all lines until it finds a line that starts with #! and that contains the word perl. The interpreter then treats all of the following lines as a Perl script until it finds one of the following:
* the End of File (EOF)
* a CTRL-D (ASCII 4)
* a CTRL-Z (ASCII 26)
* a line with the __END__ keyword
This option must be specified on the command line (and not as part of the #! line), so it is especially useful if you want to include uncommented explanatory documentation at either the beginning or end of your scripts. If so, you can run the scripts without Perl complaining about the non-commented text by giving the following command:
perl -x myscript.pl
Perl Basics
This is not required when you run your scripts by directly calling the Perl interpreter but it is still recommended. All Perl statements, except the opening and closing brackets of a control structure or loop, must end with a semicolon (;). For example:
if ($node_num > 250) {
    print ("Illegal node number.\n");
    print ("Resetting node number to 0.\n");
    $node_num = 0;
}
For the most part the Perl interpreter ignores whitespace (spaces, tabs, newlines, carriage returns, and formfeeds), so you can format your Perl scripts however is most convenient. For example, all of the following versions of code are identical as far as the Perl interpreter is concerned:
if ($node_num > 250) {print ("Illegal node number.\n");}
or
if ($node_num > 250) {
    print ("Illegal node number.\n");
}
or
if ($node_num > 250)
{
    print ("Illegal node number.\n");
}
However, although Perl ignores whitespace and formatting, these things make your programs more readable to humans, so using them is recommended. Choose a style of indentation and formatting that you find convenient and that enhances readability.
Comments are indicated by the number sign (#). The Perl interpreter ignores anything on a line that follows the comment sign, so you can put comments on their own lines or on the same line as Perl code:
# Check for a legal node number
if ($node_num > 250) {
    print ("Illegal node number.\n");
}
or
if ($node_num > 250)                     # check for a legal node number
{
    print ("Illegal node number.\n");    # show error message
}
To spread a comment across multiple lines, use a comment character for each line:
# Check to see if the node number is greater than the
# maximum allowable number (250) and if it is, print an
# error message informing the user
if ($node_num > 250) {
    print ("Illegal node number.\n");
}
It is strongly recommended that you comment your scripts thoroughly, explaining what the script is attempting to do, the logic it is using, and how it is implemented. Doing so makes it easier for others to understand your scripts and helps you when you want to update a script later on.
Perl defines true and false slightly differently than other programming languages such as C. When control structures such as if and while and logical operators such as && and || evaluate an expression, true is any nonzero or non-null value. False is the undefined value, which in Perl is a null string ("") when used in a string context or the number zero (0) when used in a numerical context; a string that contains only the character "0" is also false. In practice, the dual definition of the undefined value is very convenient, but it can cause some problems in isolated cases when you are converting programs originally written in other languages to Perl. In these cases you should evaluate all test expressions to ensure that they interpret the undefined value properly.
If Perl has any single distinguishing characteristic, it is that you can accomplish the same task more than one way. For example, there are three obvious ways to read a list of files specified on the command line and many more not so obvious methods. If you look at the various Perl scripts available on the internet, you will see that different programmers routinely use different techniques to accomplish
the same tasks, and for the most part which one you choose is a matter of personal preference. However, sometimes these different methods have slightly different requirements and side effects, so if you find a method of doing something that works, be cautious about changing it until you have thoroughly tested the alternatives.
If Perl has any secondary distinguishing characteristic, it is that a default exists for most operations. Using these defaults where applicable can simplify your scripts but also make them more difficult for others to read. This, though, is also a matter of personal preference.
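Perl's rules for true and false, described earlier in this section, can be checked with a few sample values. The sketch below is illustrative only; the truth subroutine and the sample values are hypothetical, and use strict and my are additions that this section does not cover.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how Perl treats a value in a test
sub truth {
    my ($value) = @_;
    return $value ? "true" : "false";
}

my %result = (
    'undefined value' => truth(undef),
    'null string'     => truth(""),
    'number 0'        => truth(0),
    'string "0"'      => truth("0"),
    'string "0.0"'    => truth("0.0"),   # not empty and not exactly "0"
    'string "00"'     => truth("00"),    # not empty and not exactly "0"
    'number 250'      => truth(250),
    'string "no"'     => truth("no"),
);

foreach my $label (sort(keys(%result))) {
    print "$label is $result{$label}\n";
}
```

The strings "0.0" and "00" are the cases most likely to surprise programmers converting code from other languages: they are numerically zero but count as true when tested as strings.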
Naming Conventions
Perl uses the same naming conventions for variables, subroutines, and filehandles:
* A name can use only word characters, which are defined in Perl as being letters (both uppercase and lowercase), numbers (0 through 9), and the underscore (_) character.
* If the name does not start with a letter, it can be only one character long. Typically this is not significant because most variables of this type have predefined meanings (see "Special and Predefined Variables").
* Names are case-sensitive, so variable refers to a different object than Variable or VARIABLE.
As a general rule, lowercase names refer to variables and subroutines, while uppercase names refer to filehandles, but this is only a matter of custom and convention, not a requirement of Perl.
With the exception of filehandles, an object's name is preceded by a single character that defines what it refers to. See Table 1:
Table 1 Naming Conventions
Symbol  Identifies                        Examples
$       scalar variable or array element  $string, $input_line, $number, $answer
                                          $array[1], $namelist[22], $months[3]
                                          $cards{"prc"}, $days_in_month{"oct"}
@       array                             @array, @namelist, @months
%       associative array                 %cards, %phone_numbers, %days_in_month
&       subroutine                        &toupper, &get_input, &output_line
Programmers familiar with other languages often get confused by Perl's use of symbols to differentiate between different variable types, especially when accessing elements of an array. This confusion is increased by the fact that Perl allows you to use the same name for scalar variables, array variables, and subroutines. For example, $name, @name, and %name all refer to different variables with different values and structures. Furthermore, $name[0] (a single element of the @name array) and $name{0} (a single element of the %name associative array) are different from each other and from $name. Finally, &name refers to a subroutine, not a variable of any type.
Until you become comfortable with Perl's naming conventions and use of variables, it is highly recommended that you use unique names for each variable and subroutine in your scripts. See "Special and Predefined Variables" on page 48 for more information on the different types of variables and how to use them. See "Using Subroutines in Perl" on page 166 for information on defining and using subroutines.
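The independence of $name, @name, %name, and &name can be confirmed with a short script. This is a sketch for illustration only; the values are made up, and use strict and my are additions that this section does not cover.

```perl
use strict;
use warnings;

# Four different objects that share the name "name"
my $name = "scalar value";
my @name = ("first element", "second element");
my %name = ("key", "associative array value");
sub name { return "subroutine result"; }

my @report = (
    $name,          # the scalar variable
    $name[0],       # element 0 of the @name array
    $name{"key"},   # element "key" of the %name associative array
    &name(),        # a call to the name subroutine
);
print join("\n", @report), "\n";
```

Each line of output comes from a different object, which is why unique names are easier to follow while you are learning the language.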
Unlike other programming languages, Perl does not require you to declare your variables in advance. Instead, the Perl interpreter scans your program to determine
what variables are used so it can allocate and deallocate variable space as needed. Because of this approach, Perl has no way of knowing when you have mistyped a variable's name; if you mistype $datf instead of $date, Perl does not complain but instead assumes you want to use a new variable. (You can catch many of these errors, though, by using the -w command line option; see "Command Line Options" on page 10 for more information.) Perl also features a number of predefined and special variables that your programs can access to get information such as the script's name, the version of Perl that is being run, and so forth. See "Special and Predefined Variables" on page 48.
Note: Filehandles are a specialized data type; see "File Access in Perl" for their use. Perl 5 also supports object-oriented programming and data types, but their use is beyond the scope of this manual. See the Perl reference manual (online on the PanaVue workstation at http:/idDocs/scripts/perl) for more information.
Scalar Variables
This assignment is not enough to tell Perl whether you intended to assign the number twelve or a two-character string to $var. This becomes clear only when $var is used in an expression:
$var = $var + 1;    # $var is being used as a number
$var = $var . "1";  # $var is being used as a string
In the first example above, the number one is added to $var, so Perl interprets it as numeric. In the second, the string concatenation operator (.) is used to append the character 1 to $var, so Perl treats its data as a string. In fact, both of these statements can be used in the same program. You can switch between treating a variable as a string and as a number whenever needed, and Perl interprets the variable's data accordingly.
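The switch between numeric and string treatment can be seen by applying both operators to the same starting value. This is a minimal sketch; the variable names are chosen for illustration, and use strict and my are additions that this section does not cover.

```perl
use strict;
use warnings;

my $var = "12";

my $sum    = $var + 1;           # numeric use: 12 + 1
my $joined = $var . "1";         # string use: "12" followed by "1"
my $both   = ($var . "1") + 1;   # string first, then numeric: 121 + 1

print "$sum $joined $both\n";    # prints: 13 121 122
```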
Perl supports the use of both string and numeric constants (called literals) when using scalar variables. Numeric literals do not need to be quoted when assigned to a variable (but the quotes can be used in most cases):
$number1 = 10;     # $number1 contains numeric value of 10
$number2 = 235;    # $number2 contains numeric value of 235
$number3 = -20;    # $number3 contains numeric value of -20
$number3 = "-20";  # $number3 still contains numeric -20
String literals do not need to be quoted as long as they do not contain any whitespace characters and as long as they do not conflict with any previously defined variable or keyword. Because such conflicts can easily occur, it is recommended you always quote string literals:
$string1 = "John";        # $string1 contains the name "John"
$string2 = "Bob";         # $string2 contains the name "Bob"
$string3 = "Mary and I";  # $string3 contains the text "Mary and I" (including spaces)
String literals can be quoted either by single quotes (') or double quotes ("). The only difference between the two types of quotes is how special characters are interpreted. When a string literal is enclosed within single quotes ('), its characters are interpreted exactly as they appear, with only two exceptions:
* the combination of a backslash and a single quote (\') is translated to a single quote
* the combination of two backslashes (\\) is translated to one backslash
See Example 1:
Example 1 Using Literals Within Single Quotes
$var = 'hello';     # $var contains 5 characters
$var = 'hello\'';   # $var contains 6 characters, including
                    # one single quote at end
$var = 'hello\n';   # $var contains 7 characters including
                    # one backslash and one n
$var = 'hello\\n';  # $var contains 7 characters
                    # (same as above because \\ = \)
Double quotes are used whenever you want to specify special characters such as the newline character ("\n"). See Example 2:
Example 2 Using Literals Within Double Quotes
$var = "hello\n";        # $var contains 6 characters, including
                         # a final newline character
$var = "hello\t\n";      # $var contains 7 characters,
                         # including a final tab and newline
$var = "hello, \"Jo\"";  # $var contains 11 characters,
                         # including the name Jo in double quotes
Table 2 lists the most common special characters that can be used within double quotes. As a general rule, use double quotes for all string literals unless you do not need any of these special characters.
Table 2 Special Characters in Perl (must be double-quoted)
Character  Description
\a         Bell (ASCII 7)
\b         Backspace (ASCII 8)
\cX        Control Character (where X is any letter from A-Z)
\f         Formfeed (ASCII 12)
\n         Newline (ASCII 10)
\r         Carriage Return (ASCII 13)
\t         Tab (ASCII 9)
\0xx       Any octal value between \000 and \377
\xff       Any hexadecimal value between 0x00 and 0xff
\\         Backslash
\"         Double Quote
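The difference between the two quoting styles can be checked with Perl's length function, which returns the number of characters in a string. This is a minimal sketch; the variable names are chosen for illustration, and use strict and my are additions that this section does not cover.

```perl
use strict;
use warnings;

my $single = 'hello\n';          # backslash and n are kept as two characters
my $double = "hello\n";          # \n becomes one newline character
my $quoted = "hello, \"Jo\"";    # \" becomes one double quote

# The same text written with each style of quote
my $same = ('it\'s' eq "it's") ? "same" : "different";

printf("%d %d %d %s\n",
       length($single), length($double), length($quoted), $same);
# prints: 7 6 11 same
```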
Perl stores all numeric data in a double-precision floating-point format, but you can use whatever numeric format is most convenient when assigning numbers to scalar variables. See Table 3:
Table 3 Allowable Numeric Data Formats
Format      Description           Examples of Use
x           Integer notation      $word_count = "2";
-x                                $days_of_year = 365;
x.xxx (1)   Decimal notation      $price = 1.25;
-x.xxx                            $ratio = "0.114";
                                  $overdraft = "-1.92";
xExx (2)    Exponential notation  $byte = 2E8;             # 2x10**8
xE-xx                             $avogadro = "6.023E23";  # Avogadro's number
-xExx                             $num = "-3.1E-23";       # -3.1x10**-23
-xE-xx                            $num = "-31E-24";        # same number as above
0xxx        Octal notation        $EOL = 0015;    # decimal value is 13 (cannot use
-0xxx                                             # quotes when specifying octal)
                                  $byte = 0377;   # decimal value is 255
                                  $byte = -0377;  # decimal value is -255
0xFF        Hexadecimal notation  $byte = 0xFF;     # decimal value is 255 (cannot use
-0xFF                                               # quotes when specifying hexadecimal)
                                  $word = 0xFFFF;   # decimal value is 65535
                                  $word = -0xC000;  # decimal value is -49152
1. Since Perl stores all numeric values in the same double-precision floating-point format, assigning a value of "1.00" does not provide a greater degree of precision than assigning a value of "1".
2. The use of quotes is optional when specifying numbers in exponential notation, but quotes cannot be used when specifying numbers in hexadecimal or octal notation.
Perl itself does not have any limitations as to number size or string size, except whatever limitations are imposed by the computer hardware and operating system. For all practical uses on the PanaVue workstation, strings and arrays have no limitations, but numbers are limited to a maximum of 14 significant digits to the right of the decimal sign in exponential notation. For example, you might try to set the value of pi to 30 significant digits using the following script, but Perl still prints out 14 digits to the right of the decimal point:
$pi = 3.1415926535897932384626433832795;
print "The value of pi is: $pi\n";
# prints out The value of pi is: 3.14159265358979
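When a specific number of digits is wanted in the output, the sprintf function can format the value explicitly. The sketch below is illustrative only and assumes a Perl built with ordinary double-precision numbers; the digits shown by the %.17g format can differ on other builds, so no output is quoted for it. The use strict pragma and my declarations are additions that this section does not cover.

```perl
use strict;
use warnings;

my $pi = 3.1415926535897932384626433832795;

my $default = "$pi";                   # Perl's default number formatting
my $longer  = sprintf("%.17g", $pi);   # request 17 significant digits
my $rounded = sprintf("%.4f", $pi);    # round to 4 decimal places: 3.1416

print "default: $default\n";
print "longer:  $longer\n";
print "rounded: $rounded\n";
```

Digits beyond what the double-precision format can hold are not recovered by asking for more of them; the extra digits in the literal are simply lost when the value is stored.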
Arrays
Perl supports arrays in much the same manner as other programming languages, except that you do not have to declare the array and its size in advance. Perl automatically grows and shrinks the array as elements are added and taken from it. Perl's arrays have the following additional characteristics:
* An array variable starts with an at-sign (@) but other than that, it can use any combination of word characters (letters, numbers, and the underscore) as part of its name.
* Variable names are case-sensitive, so @var, @VAR, and @Var are all different arrays.
* Elements of the array are referenced by putting a dollar sign ($) to the front and a subscript within square brackets ([]) to the back of the variable name, creating a new form of scalar variable.
* The first element of any array is always numbered 0 (zero), so the first elements of the above arrays are $var[0], $VAR[0], and $Var[0].
Note: Do not confuse the scalar variables used to access arrays with other scalar variables that have the same name. The scalar $var is totally independent of $var[0] or $var[1], which are used to access elements in the @var array. To avoid such confusion, it is recommended that you use different names for scalar variables and arrays.
Like scalar variables, elements of an array can contain either numeric or string data. The elements in an array do not have to have the same type of data; some elements can contain numeric data, other elements can contain strings.
The easiest way to assign values to an array is with a list, which can be either another array, a set of scalars (literals or variables) within parentheses, or a function or command that returns a list of values. See Example 3 for examples of each method:
Example 3 Using a List to Add Elements to an Array
# Adding elements to an array using another array
@array1 = @array2;             # @array1 becomes an exact copy of @array2
# Adding elements to an array using a list of scalar values (the scalars can
# be either literals or variables)
@array1 = (1,2,3,4);           # array1 contains four numbers
@array2 = (1,"two",3,"four");  # array2 contains numbers and strings
@array3 = ($a, $b, $c, $d);    # array3 contains the values contained
                               # in the four scalar variables
# Adding elements to the front or end of an existing array by including the
# array within the list
@array1 = (0,@array1,99);      # 0 is added to beginning, 99 to end
# Adding elements to an array using the output of a function (in this case,
# the split command, which takes a line of input and breaks it into individual
# words that are returned in a list)
$input = "this is a line of input";  # typical input line
@array1 = split(" ", $input);        # @array1 now contains six elements (words)
# the above two lines are equivalent to doing the following:
@array1 = ("this","is","a","line","of","input");
You can also assign scalar values to the individual elements of an array. See Example 4, where the first four elements of @array are assigned strings:
Example 4 Adding Individual Elements to an Array
$array[0] = "first element";
$array[1] = "second element";
$array[2] = "third element";
$array[3] = "fourth element";
The end of the array is indicated by the first element containing the undefined value (a null string or the number 0). This makes it easy to use loops to access all elements in an array, by testing each element until an undefined value is found. Because this is a truth test, such a loop also stops at any element that legitimately contains 0 or a null string. Example 5 shows one way all of the elements in an array could be printed, using a while loop that stops only when it reaches an array element that does not have any data in it:
Example 5 Printing the Contents of an Array
$index = 0;                    # set index for first element
while ($array[$index]) {       # as long as array has data
    print "$array[$index]\n";  # print the array element
    $index++;                  # point to next element
}
Note: See Control Structures and Loops for an explanation of loops such as the while loop shown above.
Another way to find the number of elements in an array is to assign the array to a scalar variable, which then contains the length of the array:
$length = @array; # put # of array elements in $length
Since the first element of an array is indexed by zero, you must subtract 1 from the length to get the index of the last element of the array. For example, the code in Example 6 assigns the contents of the last element of @array to the $last_element scalar variable:
Example 6 Getting the Last Element of an Array
$length = @array;                     # $length = number of elements
$last_element = $array[$length - 1];  # get last element
The most efficient way of accessing the last element of an array is by using the special variable $#array, which is the index number of the last element in @array. Perl automatically changes $#array whenever the array size changes, so it can always be used to access the last element of the array:
Example 7 Getting the Last Element of an Array with $#array
$last_element = $array[$#array];
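The $#array variable also gives a loop a reliable stopping point when some elements contain 0 or a null string, which a loop that tests each element's value would mistake for the end of the array. The sketch below compares the two approaches. The @nodes data is made up for illustration, and use strict and my are additions that this section does not cover.

```perl
use strict;
use warnings;

# Hypothetical data: two elements are legitimately 0 or empty
my @nodes = (12, 0, 7, "", 250);

# Loop that tests each element's value: stops at the first 0
my $seen_by_value = 0;
my $index = 0;
while ($nodes[$index]) {
    $seen_by_value++;
    $index++;
}

# Loop that uses $#nodes as its limit: visits every element
my $seen_by_index = 0;
for ($index = 0; $index <= $#nodes; $index++) {
    $seen_by_index++;
}

print "$seen_by_value $seen_by_index\n";   # prints: 1 5
```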
To add additional elements, simply assign a value to the element at the end of the array; Perl grows the array automatically. As shown in Example 8, the length of an array makes a convenient subscript for adding on new elements to the array:
Example 8 Adding a New Element to An Array
$length = @array;                 # $length = number of elements
$array[$length] = "new data";     # add a new element
$array[$length+1] = "more data";  # add another new element
When new elements are added to an array, the length of the array automatically increases, so this technique can be used repeatedly in loops. See Example 9:
Example 9 Adding Multiple New Elements to An Array
while ($input = <STDIN>) {     # get new line from STDIN
    $length = @array;          # get current length of array
    $array[$length] = $input;  # put input at end of array
}                              # do this until input ends
Note: See Control Structures and Loops for an explanation of loops such as the while loop shown above. See File Access in Perl for an explanation about using STDIN.
The routine shown in Example 9 reads a line of input from the standard input device (STDIN, usually the user's keyboard) and then adds the line to the end of an array. Each time an element is added to the array, its length increases, so each time the while loop executes, the value of the $length variable increases by one.
Note: Elements can also be added to (or removed from) an array using the array operators listed in Table 4, below.
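As an illustration of that note, the sketch below builds the same array twice: once with the length-as-subscript technique of Example 9 and once with the push operator. To keep the sketch self-contained, a fixed list of made-up lines stands in for input read from STDIN; use strict and my are additions that this section does not cover.

```perl
use strict;
use warnings;

# Hypothetical stand-in for lines that would be read from STDIN
my @lines = ("alarm 1\n", "alarm 2\n", "alarm 3\n");

my (@by_subscript, @by_push);
foreach my $line (@lines) {
    my $length = @by_subscript;       # current length of the array
    $by_subscript[$length] = $line;   # technique from Example 9
    push(@by_push, $line);            # same effect using push
}

print scalar(@by_subscript), " ", scalar(@by_push), "\n";   # prints: 3 3
```

Both arrays end up with the same contents; push simply removes the need to track the length yourself.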
A number of operators can be used on arrays. The most common ones are shown in Table 4:
Table 4 Array Operators
Operator  Description
chop      Removes the last character from each element in the array:
          @array = ("one","two","three");
          chop(@array);  # @array now = ("on","tw","thre")
push      Adds one or more new entries to the end of an array:
          @array = (1,2,3);
          push(@array,4,5,6);  # @array now = (1,2,3,4,5,6)
pop       Removes and returns the last entry at the end of an array:
          @array = (1,2,3,4,5,6);
          $last_element = pop(@array);  # $last_element = 6
                                        # @array = (1,2,3,4,5)
reverse   Returns an array in reverse order, leaving the original array unchanged:
          @array1 = (1,2,3,4,5,6);
          @array2 = reverse(@array1);  # @array1 is unchanged
                                       # @array2 = (6,5,4,3,2,1)
sort      Returns an array sorted in ascending ASCII order, leaving the original array unchanged:
          @array1 = (1,"one",2,3,"four",10,20);
          @array2 = sort(@array1);  # @array1 is unchanged
                                    # @array2 = (1,10,2,20,3,"four","one")
          Note: You can change the sort order by specifying your own sort routine to be used with the sort operator. See the Perl documentation (http:/idDocs/scripts/perl) for details.
shift     Removes and returns the first element of an array:
          @array = (1,2,3,4,5,6);
          $first_element = shift(@array);  # $first_element = 1
                                           # @array = (2,3,4,5,6)
unshift   Adds one or more elements to the beginning of an array:
          @array = (1,2,3,4,5,6);
          unshift(@array,"a","b","c");  # @array = ("a","b","c",1,2,3,4,5,6)
Note: The push and pop operators add and remove elements from the end of an array. The unshift and shift operators add and remove elements from the beginning of an array.
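Because sort uses ASCII order by default, lists of numbers usually need a custom sort routine of the kind mentioned in Table 4. The following is a minimal sketch of one such routine, using the numeric comparison operator <=> in a sort block. The data is made up, and use strict and my are additions that this section does not cover.

```perl
use strict;
use warnings;

my @mixed = (10, 1, 20, 3, 2);

my @ascii      = sort(@mixed);               # default ASCII order
my @numeric    = sort { $a <=> $b } @mixed;  # numeric order, smallest first
my @descending = reverse(@numeric);          # numeric order, largest first

print "@ascii\n";        # prints: 1 10 2 20 3
print "@numeric\n";      # prints: 1 2 3 10 20
print "@descending\n";   # prints: 20 10 3 2 1
```

Inside the sort block, $a and $b are set by Perl to the two elements being compared, so they do not need to be declared.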
Associative Arrays
Associative arrays are useful whenever you want to associate two arbitrary types of data with one another, such as a phone number with a person's name or a node number with its physical location. Perl optimizes its storage and handling of associative arrays, so they are the fastest way possible in Perl to store and retrieve data like this. Associative arrays have the following characteristics:
* Associative arrays are prefixed by a percent sign (%) instead of the at-sign (@) used by regular arrays. Other than that, an associative array can use any combination of word characters (letters, numbers, and the underscore) as part of its name.
* Variable names are case-sensitive, so %var, %VAR, and %Var are all different associative arrays.
* Elements of an associative array are referenced by putting a dollar sign ($) to the front and a key word within curly brackets ({ }) to the back of the variable name, creating a new form of scalar variable. Typical elements of the above arrays could be $var{"robert"}, $VAR{"1-212-555-1212"}, or $Var{23}.
* Like other scalar variables, the elements of an associative array can contain any scalar value, numeric or string. The elements in an associative array do not have to have the same type of data; some elements can contain numeric data, other elements can contain strings.
* The keys used to define elements of an associative array can have any scalar value, numeric or string. The keys are case-sensitive, so $var{"key"} refers to a different element than $var{"KEY"}.
The easiest way to assign values to an associative array is with a list, which can be either another array, a set of scalars (literals or variables) within parentheses, or a function or command that returns a list of values. Unlike regular arrays, though, the input list for an associative array must be properly ordered into pairs, where the first scalar is the key that is used to access the second scalar. Example 10 gives examples of each way that elements can be added to an associative array:
Example 10 Using a List to Add Elements to an Associative Array
# Adding elements to an associative array using another associative array
%array1 = %array2;             # %array1 becomes an exact copy of %array2
# Adding elements to an associative array using a list of ordered scalar values
# (the list must be composed of key/value pairs)
%array1 = (1,2,3,4);           # %array1 contains 2 elements
                               # $array1{1} = 2, $array1{3} = 4
%array2 = (1,"two",3,"four");  # %array2 contains 2 elements
                               # $array2{1} = "two", $array2{3} = "four"
%array3 = ($a, $b, $c, $d);    # %array3 contains 2 elements
                               # $array3{$a} = $b, $array3{$c} = $d
# Adding elements to an associative array using the output of a function (in
# this case, the split command, which takes a line of input and breaks it into
# individual words that are returned in a list)
$input = "john x7990 jill x5917 joan x6134";  # typical input line
%extens = split(" ", $input);  # %extens now contains three elements:
                               # $extens{"john"} = "x7990"
                               # $extens{"jill"} = "x5917"
                               # $extens{"joan"} = "x6134"
To assign scalar values to individual elements of an associative array, use the appropriate key. If you specify a new key, a new element is added to the array; if you specify a previously used key, that element's previous value is replaced. See Example 11:
Example 11 Adding and Modifying Elements of an Associative Array
#!/usr/thirdParty/perl/bin/perl -w
# Initialize array with two elements
%computers = ("robert","pc","judy","mac");
# Add new elements and modify existing ones
$computers{"jerry"} = "sparc5";     # add new element
$computers{"linda"} = "sparc20";    # add new element
$computers{"robert"} = "sparc20";   # modify element
$computers{"judy"} = "powerbook";   # modify element
print %computers;
Note: Perl orders associative arrays in the most efficient internal format for the given keys and computer system. If you print out an associative array as shown in Example 11, you cannot easily predict which elements will be printed first.
To delete an element from an associative array, use the delete operator on the element to be deleted:
delete $computers{"robert"}; # this entry no longer exists
Using a previously unknown key with an associative array returns the undefined value. If you were to take the array defined in Example 11 and access $computers{"william"}, you would get an undefined value. No indication is given that you have used a previously unknown key, which is inconvenient if you want to access only current elements of the array.
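For checking a single key, one option that this section does not cover is Perl's exists operator, which reports whether a key is present regardless of the value stored under it. The sketch below is illustrative only; the "linda" entry with a null value is made up to show the difference, and use strict and my are additions.

```perl
use strict;
use warnings;

my %computers = ("robert","pc","judy","mac");
$computers{"linda"} = "";    # known key whose value happens to be null

my @report;
foreach my $user ("judy", "linda", "william") {
    if (exists($computers{$user})) {
        push(@report, "$user: known key, value '$computers{$user}'");
    } else {
        push(@report, "$user: unknown key");
    }
}
print join("\n", @report), "\n";
```

A plain truth test on $computers{"linda"} would fail even though the key is present, which is why exists is the safer check when values may be 0 or null.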
Perl offers a way around this with its keys operator, which returns all of the keys used for a particular associative array, which you can then use to access all of the elements in that associative array. The keys operator returns the keys in the same order that they are used in the associative array at that time, so there is no guarantee as to how the keys are ordered. However, since the keys operator returns the keys as a regular array, you can access all of the elements in the associative array by manipulating the regular array created to hold all of the keys. See Example 12 for one way to do this:
Example 12 Printing the Contents of an Associative Array
%cars = ("alan","ford","jill","toyota","jack","chrysler");
@keys = keys(%cars);    # @keys now contains "alan", "jill", and "jack"
                        # (in no guaranteed order)
foreach $key (@keys) {
    print "$key drives a $cars{$key}\n";
} # end foreach
Note: See Control Structures and Loops for an explanation of loops such as the foreach loop shown above.
The code shown in Example 12 can be shortened by eliminating the use of the @keys array, as shown in Example 13:
42
..........................................................
Example 13 Printing the Contents of an Associative Array (modified)
%cars = ("alan","ford","jill","toyota","jack","chrysler");
foreach $key ( keys(%cars) ) {
    print "$key drives a $cars{$key}\n";
} # end foreach
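Because the order returned by keys cannot be predicted, a listing that must appear in a fixed order needs an explicit sort. The following is a minimal sketch, assuming the %cars array from Example 13 and Perl's built-in sort operator; the @lines array is introduced here only for illustration:

```perl
%cars = ("alan","ford","jill","toyota","jack","chrysler");

# sort() arranges the keys in ascending string order, so the
# output no longer depends on Perl's internal storage order
@lines = ();
foreach $key ( sort(keys(%cars)) ) {
    push(@lines, "$key drives a $cars{$key}\n");
} # end foreach
print @lines;
```

With this change the drivers are always listed alphabetically (alan, jack, jill), whatever order Perl uses internally.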
When the keys operator is used in a scalar context, it returns the number of keys found, which gives you the number of elements in an associative array:
Example 14 Finding the Number of Elements in an Associative Array
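A minimal sketch of this scalar-context use of keys, assuming the same %cars array used in the earlier examples (the $count variable is illustrative only):

```perl
%cars = ("alan","ford","jill","toyota","jack","chrysler");

# Assigning the result to a scalar variable puts keys() in a
# scalar context, so it returns a count instead of a list
$count = keys(%cars);
print "There are $count drivers in the list.\n";
```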
The counterpart to the keys operator is the values operator, which returns a list of the element values found in an associative array. The list provided by the values operator is in the same order as the one provided by the keys operator, so you could use both to print the associative array. See Example 15:
Example 15 Printing the Values of an Associative Array
%cars = ("alan","ford","jill","toyota","jack","chrysler");
@car_type = values(%cars);   # create array of values
@driver = keys(%cars);       # create array of keys
$index = 0;
while ($car_type[$index]) {
    print "$driver[$index] drives a $car_type[$index++]\n";
} # end while
The most efficient method to loop through an associative array is to use the each operator, which returns a key/value pair each time it is used. When each reaches the end of the associative array, it returns an empty list, which is treated as false and makes it a perfect match for use with the while command.
Example 16 Printing the Values of an Associative Array Using each
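A minimal sketch of such a loop, assuming the %cars array from the earlier examples; the $count variable is added here only to show how many pairs were visited:

```perl
%cars = ("alan","ford","jill","toyota","jack","chrysler");

# each() returns the next (key, value) pair; once every pair has
# been returned the list assignment is false and the loop ends
$count = 0;
while ( ($driver, $car_type) = each(%cars) ) {
    print "$driver drives a $car_type\n";
    $count++;
} # end while
```

Unlike Example 15, this form does not build separate arrays of keys and values before printing.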
You can convert between an associative array and a regular array by assigning one to the other.
Example 17 Converting Between Associative and Regular Arrays
# Converting Associative Array to a Regular Array
%cars = ("alan","ford","jill","toyota","jack","chrysler");
@cars = %cars;   # @cars has six separate elements

# Converting Regular Array to an Associative Array
@books = ("camel","Programming Perl",
          "llama","Learning Perl",
          "rhino","Javascript",
          "koala","HTML, the Definitive Guide");
%books = @books; # %books now has four elements, indexed
                 # by the type of animal on the cover
Note: When converting a regular array to an associative array, the associative array interprets the even elements of the array (those with index numbers 0, 2, 4, and so forth) as the keys and the odd elements (those with indexes 1, 3, 5, and such) as the element values. Be certain this is what you want before using this technique.
Table 5 summarizes the major characteristics of Perl's major variable classes (as used in this table, string can be any combination of word characters (letters, numbers, and the underscore), and number must contain only the digits 0 through 9):
Table 5 Comparing Perl's Variable Types

Description                  Examples
Scalar variable              $var, $VAR, $Var
Regular array                @array, @ARRAY, @Array
Regular array element        $array[0], $ARRAY[0], $Array[0]
Associative array            %array, %ARRAY, %Array
Associative array element    $array{2.54}, $ARRAY{Bill}, $Array{/opt/Panavue/reports}
$] $! $|
Table 6 Commonly Used Special and Predefined Variables

Variable     Description

Variables used for operating system access
$0           The name of the currently executing Perl script.
$$           The Unix process ID of the currently executing Perl program.
$<           The real user ID (uid) of the currently executing process.
$>           The effective user ID of the currently executing process.
$(           The real group ID (gid) of the currently executing process.
$)           The effective group ID of the currently executing process.

Variables used for file access (see File Access in Perl)
$ARGV        The name of the current file when using the diamond operator (<>) for input.
@ARGV        An array containing the script's command line arguments.

Variables used for regular expressions (see Using Regular Expressions in Perl)
$&           The string matched by the last successful match.
$`           The string preceding the string that was last matched.
$'           The string following the string that was last matched.
$1 ... $9    Represents the appropriate subpatterns when matching and substituting regular expressions.
Note: Table 6 shows the shortcut form of these special variables. Most of these variables also have an English name that is more descriptive. For example, $_ can also be referred to as $ARG. See the Perl documentation (http:/idDocs/scripts/perl) for these alternative names.
Note: Perl offers a number of ways to read and write binary information to files. See the descriptions of the read, seek, sysread, and syswrite commands in the online Perl documentation (http:/idDocs/scripts/perl).
If you are writing CGI scripts, also see Problems with Buffering of Output on page 69 for information on a potential problem with how Perl buffers its output for such scripts.
The FILEHANDLE must have been previously opened for writing; if not, print returns "0" to indicate it failed to output the list. If FILEHANDLE is not given, print uses the default output device (which is STDOUT, unless changed by the select command).
Note: See File Access in Perl for a discussion of both filehandles and STDOUT.
The list can contain any or all of the following: literals, scalar variables, arrays, associative arrays, and the output from functions, commands, and expressions. Strictly speaking, all items in the list should be separated by Perl's list operator (the comma), as shown in Example 18:
Example 18 Using the print Statement (strict style)
$name = "Roger";
$car = "Ford";
$color = "blue";
print "\n",$name," owns a ",$color," ",$car,"\n";
# prints Roger owns a blue Ford
This example prints a new line and then the text Roger owns a blue Ford, followed by another new line. Fortunately, Perl allows variables to be placed inside the quotes, so this print statement could be more easily written as follows:
Example 19 Using the print Statement
$name = "Roger";
$car = "Ford";
$color = "blue";
print "\n$name owns a $color $car\n";
# prints Roger owns a blue Ford
The only time you must use commas is if you put a function or expression in the print statement that must be evaluated before being printed. For example, if you wanted to print the value of pi from within a Perl program, you might think you can use the statement shown in Example 20:
Example 20 Printing the Value of pi (incorrect)
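A statement of the following form illustrates the mistake. It is a minimal sketch reconstructed from the output described in the text, and the $text variable is introduced only for illustration:

```perl
# The expression sits inside the double quotes, so Perl treats it
# as ordinary text and never evaluates it
$text = "The value of pi is atan2(1,1)*4\n";
print $text;
```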
However, this statement just prints The value of pi is atan2(1,1)*4, without evaluating the expression but just treating it as a string of characters. To force Perl to evaluate the expression, move it out of the double quotes and delimit it with commas. See Example 21:
Example 21 Printing the Value of pi (correct)
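A minimal sketch of the corrected form, assuming the same atan2 expression with the expression moved outside the quotes:

```perl
# The expression is outside the double quotes and separated from
# the strings by commas, so Perl evaluates it before printing
print "The value of pi is ", atan2(1,1)*4, "\n";
```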
This line correctly prints The value of pi is 3.14159265358979. With few exceptions, if you want the print statement to display the output of a command or expression, you must move the command or expression outside double quotes and separate it from the rest of the list using commas. Since arrays are just another type of list, you can specify them in the print command. Each element of the array is then printed in sequence, without any separation. See Example 22:
Example 22 Printing an Array
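A minimal sketch of printing an array, using a hypothetical @names array. The second and third statements go slightly beyond the text: an array placed inside double quotes is printed with a space between elements, and the built-in join function inserts a separator of your choice:

```perl
@names = ("alan","jill","jack");

print @names, "\n";               # prints alanjilljack
print "@names\n";                 # prints alan jill jack
print join(", ", @names), "\n";   # prints alan, jill, jack
```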
The FILEHANDLE must have been previously opened for writing; if not, printf returns "0" to indicate it failed to output the list. If FILEHANDLE is not given, printf uses the default output device (which is STDOUT, unless changed by the select command).
Note: See File Access in Perl for an explanation of filehandles and the STDOUT device.
The list is the same as that used in the print command (see Using the print Command on page 52), but it is formatted according to the format specification, which contains one or more of the format types shown in Table 7:
Table 7 printf Format Types

Format    Description
%c        print the list item as a single character
%d        print the list item as a decimal number
%e        print the list item as an exponential floating-point decimal number
%f        print the list item as a fixed point floating-point decimal number
%g        print the list item as a compact floating-point decimal number
%ld       print the list item as a long decimal number
%lo       print the list item as a long octal number
%lu       print the list item as a long unsigned decimal number
%lx       print the list item as a long hexadecimal number
%o        print the list item as an octal number
%s        print the list item as a string
%u        print the list item as an unsigned decimal number
%x        print the list item as a hexadecimal number
As shown, most of the format types are for numeric data, but if the output list also contains string or character data, you must use a corresponding number of %s and %c format specifiers. You can also specify the minimum and maximum sizes of each item with a format like %m.nx, where m specifies the minimum length of the field and n specifies the maximum length (except with exponential formats, which use n to specify precision). If the output is too long for the given format, it is truncated (or rounded if numeric). If the output is too short to fill the minimum length, it is right justified, using either spaces or zeroes, as appropriate. Example 23 shows how a number of different formats would print the value of pi:
Example 23 Examples of printf Formatting
printf "%s%d%s","The value of pi is ",atan2(1,1)*4,"\n";
# prints "The value of pi is 3"
printf "%s%3.5d%s","The value of pi is ",atan2(1,1)*4,"\n";
# prints "The value of pi is 00003"
printf "%s%e%s","The value of pi is ",atan2(1,1)*4,"\n";
# prints "The value of pi is 3.141593e+00"
printf "%s%6.10e%s","The value of pi is ",atan2(1,1)*4,"\n";
# prints "The value of pi is 3.1415926536e+00"
printf "%s%f%s","The value of pi is ",atan2(1,1)*4,"\n";
# prints "The value of pi is 3.141593"
printf "%s%8.9f%s","The value of pi is ",atan2(1,1)*4,"\n";
# prints "The value of pi is 3.141592654"
printf "%s%g%s","The value of pi is ",atan2(1,1)*4,"\n";
# prints "The value of pi is 3.14159"
When using printf, you must supply the proper number of formatting specifiers because Perl ignores any list items that do not have a corresponding format. If an incorrect format is given (such as an exponential format for string data), Perl tries its best to interpret the list item in that format, with unpredictable results.
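The field-width rules described for printf can be checked with sprintf, a built-in function that applies the same format types but returns the formatted string instead of printing it. The following is a minimal sketch; the variable names are illustrative only, and the minus-sign flag for left justification is an addition not covered in the text above:

```perl
$padded  = sprintf("%5d", 42);         # right justified, padded with spaces
$zeroed  = sprintf("%05d", 42);        # right justified, padded with zeroes
$left    = sprintf("%-5d|", 42);       # a minus sign left justifies instead
$clipped = sprintf("%.3s", "abcdef");  # maximum length of 3 truncates the string
$rounded = sprintf("%.2f", 3.14159);   # precision of 2 rounds the number

print "[$padded] [$zeroed] [$left] [$clipped] [$rounded]\n";
# prints [   42] [00042] [42   |] [abc] [3.14]
```

Capturing the result with sprintf makes it easy to inspect the padding before committing a format to a report.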
where FILEHANDLE is a filehandle for a file that has been previously opened for writing; if no FILEHANDLE is listed, write uses the default output device (which is STDOUT, unless changed by the select command).
Note: See File Access in Perl for an explanation of filehandles and the STDOUT device.
The write command does not take a list of items to be printed, as the print and printf commands do. Instead, the write command outputs one or more lines in a predefined format. This format must have the same name as the FILEHANDLE being used and includes the list of variables and expressions to be printed. A format is defined with the format command, as shown in Example 24:
Example 24 Defining a Format for the write Command
format formatname =    # formatname must = filehandle name
fieldline              # contains formatting information
variable-list          # contains list to use in above line
fieldline              # contains formatting information
variable-list          # contains list to use in above line
.                      # last line must begin with a period and have
                       # nothing else on the line
Each fieldline describes one line of output, and there can be as many fieldlines as desired. A fieldline can contain literals, variables, or both. A variable-list must follow each fieldline that prints variables. The last line of the format must contain only a single period as its first and only character.
Each fieldline can contain spaceholders, fieldholders, or both. A spaceholder is text that is reproduced exactly as it appears and can include anything except the special characters used in fieldholders. A fieldholder represents the space taken up by a variable and determines the output format of that variable. The number of fieldholders in a fieldline must match the number of variables in the variable-list on the next line. The fieldholders can be surrounded by any number of spaceholders, but there must be a one-to-one correspondence between the fieldholders on one line and the variables on the next. The fieldholder determines the output format of its corresponding variable, using the basic set of format types given in Table 8:
Table 8 Fieldholder Formats

Format      Description

@<<<<<      Left-justified fixed-length field. The length of output is determined by the number of less-than signs (<) plus 1 for the at-sign (@). If the variable requires less space, the output is padded on the right with spaces; if the variable requires more space, its output is truncated.

@>>>>>      Right-justified fixed-length field. The length of output is determined by the number of greater-than signs (>) plus 1 for the at-sign (@). If the variable requires less space, the output is padded on the left with spaces; if the variable requires more space, its output is truncated.

@|||||      Center-justified fixed-length field. The length of output is determined by the number of pipe signs (|) plus 1 for the at-sign (@). If the variable requires less space, the output is padded on both sides to fill out the field and keep it centered; if the variable requires more space, its output is truncated.

@####.##    Fixed-precision numeric field. The minimum number of digits before the decimal point is determined by the at-sign (@) and the number of number signs (#) before the decimal point. If the number has more digits, they are still printed; if the number has fewer digits, the number is padded on the left with spaces. The number signs (#) after the decimal point determine the precision of output. If the number has more digits than this after the decimal point, the number is rounded up or down. If the number has fewer digits, it is padded with zeroes on the right.
For example, Example 25 shows a script that writes a very simple report to a file named phone.lst (see File Access in Perl for information about opening and closing files). A format named PHONES is defined for this report, and it specifies that each line in the report contains a name and the corresponding phone number; the name of each person is allocated 11 spaces and the phone number is allocated eight spaces. Spaceholders include the text Name: and Extension: , which precede the two fieldholders.
Example 25 Sample Report
%phones = ("Alan","5990","Betty","4301",
           "Bob","6100","Cathy","7132",
           "Doris","2395","Edith","6175",
           "Frank","5986","Geena","5810",
           "Harry","6323","Jane","7788");
open(PHONES,">phone.lst") ||
    die("Could not open phone.lst for writing.\n");
foreach $name (sort(keys %phones)) {
    write PHONES;
}
close(PHONES);

format PHONES =
Name: @<<<<<<<<<<  Extension: @<<<<<<<
$name,             $phones{$name}
.
Note: As shown in Example 25, the last line of a format definition must contain only a single period. No other characters can appear on this line, including comments; otherwise, Perl generates an error message.
The simple script shown in Example 25 used an array defined within the script as its data source. Typically, though, reports involve reading a data file of some type and extracting the desired information from it, putting that information into the proper variables, and using the write command to generate a formatted report. Although sometimes the data from a file can fit on just one line, it often must be split across multiple lines. You can accommodate this data by using a caret (^) instead of an at-sign (@) when specifying your formats; repeat the format specification on as many lines as needed.
For example, to update the PHONES report shown in Example 25 so it includes a person's home address, you could rewrite the PHONES format as follows:
Example 27 Format Specification that Accommodates Multiline Information
format PHONES =
Name: @<<<<<<<<<<  Extension: @<<<<<<<
$name,             $phones{$name}
Address: ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$address
         ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$address
         ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$address
.
When Perl uses this format, it tries to fit the entire $address variable in the first fieldline set aside for it. If the address cannot fit, Perl fits as much of it as possible and then splits the address at the closest word break and tries to fit it in the second line. If the address is still too long, this process is repeated with the third line. If the address is still too long for all three lines, it is truncated. If the address does not take up all of the lines set aside for it, Perl prints out the remaining lines as blank lines.
Note: In addition to the left-justified (<) operator shown above, you can also use the centering (|) and right-justified (>) operators when using the caret (^) operator.
If the above formatting were used for the PHONES report, typical output could be the following:
Example 28 Sample Multiline Report Output
Name: Jane         Extension: 7788
Address: 400 Apple Tree Lane #220, Fremont, CA 94555
(this line is actually a blank line in the report)
To suppress any extra blank lines, put the tilde (~) anywhere in the fieldline. For example, the following format prints out a second and third line for the address only if the address actually requires them:
Example 29 Format Specification that Suppresses Blank Lines
format PHONES =
Name: @<<<<<<<<<<  Extension: @<<<<<<<
$name,             $phones{$name}
Address: ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$address
        ~^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$address
        ~^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$address
.
Outputting Variables Containing Multiple Lines
The caret (^) operator uses whitespace characters (the space, tab, newline, carriage return, and formfeed characters) to determine where to break a variable's text into words so it can be split across lines. In doing so, all newline characters, except those that actually end lines of the formatted output, are converted into spaces. To preserve the newline characters in your input text, use the multiline operator (@*) in your formats. This operator should be the last fieldholder in its line, and the next line must contain the scalar variable or array that contains the multiline data. For example, to preserve the formatting of the $address variable used above, you could rewrite the PHONES format as follows:
Example 30 Multiline Format that Preserves Newline Characters
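A minimal sketch of such a format, assuming the PHONES fieldlines from the earlier examples. The sample data is hypothetical, and the special variable $~ (which names the format used for the currently selected filehandle) is used here only so that the sketch can write to STDOUT without opening a file:

```perl
$name    = "Jane";
%phones  = ("Jane","7788");
$address = "400 Apple Tree Lane #220,\nRedwood City, CA, 94063";

format PHONES =
Name: @<<<<<<<<<<  Extension: @<<<<<<<
$name,             $phones{$name}
Address: @*
$address
.

# Use the PHONES format for standard output (demonstration only)
$~ = "PHONES";
write;
```

Because @* is the last fieldholder on its line, the newline stored inside $address is kept rather than converted to a space.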
Note: Any spaceholders appearing on the same line as "@*" are printed only on the first line of output; succeeding lines contain only the actual data lines being output.
This format produces output similar to the following (depending on the contents of the $address variable):
Example 31 Typical Multiline Output
Name: Jane         Extension: 7788
Address: 400 Apple Tree Lane #220,
Redwood City, CA, 94063
For each report format you define, you can also define a top-of-page format that is automatically printed whenever a new page is generated. This is possible because whenever you use the write command, Perl counts the number of lines that are actually output; when that number reaches the number of lines specified in the special page length variable ($=), Perl prints the top-of-page format. The top-of-page format is defined the same way as any other format, except that its name must have the form FORMAT_TOP, where FORMAT is the name of a previously defined report. For example, a top-of-page format for the PHONES report defined earlier could be the following:
Example 32 Defining a Top-of-Page Format
format PHONES_TOP =
Phone Number List for @<<<<<<<<<<<<<<<<<<<<<<<  Page @<
`date`,                                         $%
------------------------------------------------------
.
Note: The backticks (`) are used in Example 32 to get the output from the Solaris date command. See the online Perl reference guide (http:/idDocs/scripts/perl) for more information about executing system commands.
This produces a page header similar to the following for each page of the report:
Example 33 Typical Page Header Output
Phone Number List for Thu Oct 17 15:37:31 PDT   Page 1
------------------------------------------------------
As shown in Example 32, you can print the page number by using the special page numbering variable ($%). Perl automatically increments this variable each time a new page is printed.
This difference usually does not matter except when CGI scripts use the system command in addition to the print, printf, and write commands. A common mistake is for a script to use the print command to generate the required HTML headers and then use system to run another command that displays its own output. Because the print command's output is buffered, the system command will end up sending its output out first, before the HTML headers, resulting in an error message from the web server. For example, Example 34 shows a script that attempts to use the Solaris date command to display the current date and time. However, although this script works at the command line, it fails as a CGI script because the print output is buffered and the HTML headers are sent out after the output from the date command.
Example 34 Script with Buffering Problems
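#!/usr/bin/perl
print "Content-type: text/html\n\n";
print "<HTML><HEAD><TITLE>Current Date and Time</TITLE></HEAD>\n";
print "<BODY>\n";
print "<PRE>\n";
system("date");              # output of date is not buffered by Perl
print "</PRE>\n";
print "</BODY></HTML>\n";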
You can simulate what happens with this CGI script by running it as part of a pipe from the command line. For example, if this script is named show_date.pl, give the following command at the command line:
show_date.pl | cat
This produces output similar to that shown in Example 35 (note that the output from the date command is shown before that of the print commands, even though the print commands are run first in the script).
Example 35 Simulating a CGI Script's Pipe Behavior
% ./test.flush.pl | cat
Mon Jun 15 10:43:13 PDT 1998
Content-type: text/html

<HTML><HEAD><TITLE>The Current Date and Time</TITLE></HEAD>
<BODY>
<PRE>
</PRE>
</BODY></HTML>
This problem can be solved by setting a special variable ($| or $OUTPUT_AUTOFLUSH) to any non-zero value to force line-buffering of the print, printf, and write commands. Example 36 shows the same script, but now because $| is set to a nonzero value it will work as a CGI script.
Example 36 Script without Buffering Problems
#!/usr/bin/perl
$| = 1;   # force line-buffering of print commands
          # You could also use $OUTPUT_AUTOFLUSH = 1
print "Content-type: text/html\n\n";
print "<HTML><HEAD><TITLE>Current Date and Time</TITLE></HEAD>\n";
print "<BODY>\n";
print "<PRE>\n";
system("date");              # now appears after the headers
print "</PRE>\n";
print "</BODY></HTML>\n";
Assignment
The basic assignment operator is the equal sign (=). In a simple assignment, the variable that is on the left side is assigned the value of whatever is on the right (which can be a numerical literal, a string literal, a set of values, another variable, or the output of a subroutine):
Example 37 Simple Assignments
$number = 4;
$string = "Hello, world";
@array = (1,2,3,4);
$var1 = $var2;
$answer = &get_answer;
Typically, the equal sign is combined with one of the operators shown in Table 9. The examples given in the table use numeric and string literals (such as 1 and "dog"), but variables can be used as well.
Table 9 Assignment Operators

Operator   Description                               Examples

/          division                                  $a = 12/4;   # $a is now 3
**         exponentiation                            $a = 3**4;   # $a is 81 (3 to the fourth power)
%          modulo division                           $a = 10%3;   # $a is 1 (the remainder of 10/3)

Binary (Bitwise) Operators (you should be familiar with binary numbers and binary logic to use these operators)
&          bitwise AND (binary AND)                  $a = 8 & 2;  # $a is 0
                                                     $a = 20 & 4; # $a is 4
|          bitwise OR (binary OR)                    $a = 8 | 2;  # $a is 10
                                                     $a = 20 | 4; # $a is 20
^          bitwise XOR (binary XOR)                  $a = 8 ^ 2;  # $a is 10
                                                     $a = 20 ^ 4; # $a is 16
~          bitwise NOT (binary negation)             $a = ~8;     # $a is 4294967287 (the 32-bit one's complement of 8)
>>         bitwise shift right (binary shift right)  $a = 15>>2;  # $a is 3
                                                     $a = 4>>3;   # $a is 0
<<         bitwise shift left (binary shift left)    $a = 15<<2;  # $a is 60
                                                     $a = 4<<3;   # $a is 32

String Operators
.          string concatenation                      $a = "dog" . "cow";     # $a is now "dogcow"
                                                     $a = "a" . "b" . "c";   # $a is now "abc"
x          string multiplication                     $a = "abc" x 4;         # $a is now "abcabcabcabc"
                                                     $a = "moof" x 2;        # $a is now "moofmoof"
Note: A common mistake in Perl programming is to use arithmetic operators on string data. Although this is allowed, it usually produces unpredictable results; at best, the string is treated as the number 0 (zero), but this is not guaranteed in all cases.
Common Shortcuts
The operators shown in Table 9 are most commonly used to modify the existing value of a variable, as is shown in Example 38:
Example 38 Standard Assignment Notation
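A minimal sketch of this standard notation, in which the variable being changed appears on both sides of the equal sign. The starting values are hypothetical and chosen only for illustration:

```perl
$number = 4;
$string = "Hello";
$var1   = 10;
$bits   = 8;

$number = $number + 2;          # $number is now 6
$string = $string . ", world";  # $string is now "Hello, world"
$var1   = $var1 * 3;            # $var1 is now 30
$bits   = $bits | 2;            # $bits is now 10
```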
Because this is such a common use of these operators, Perl supports a shortcut notation for it, by appending the equal sign (=) to the operator and eliminating the second instance of the variable being changed. See Example 39:
Example 39 Shortcut Assignment Notation
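A minimal sketch of the shortcut notation, with the operator written immediately before the equal sign and the variable named only once. The starting values are hypothetical and chosen only for illustration:

```perl
$number = 4;
$string = "Hello";
$var1   = 10;
$bits   = 8;

$number += 2;           # same as $number = $number + 2; now 6
$string .= ", world";   # same as $string = $string . ", world"
$var1   *= 3;           # same as $var1 = $var1 * 3; now 30
$bits   |= 2;           # same as $bits = $bits | 2; now 10
```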
One further shortcut is with the autoincrement and autodecrement operators (++ and --), which are shorthand for the following common operations:
$a = $a + 1;
$a = $a - 1;
The autoincrement and autodecrement operators can be placed either before or after the variable to which they refer. Although this is not important in the examples shown above, it becomes an important consideration in more complicated expressions because the placement determines when the increment or decrement happens. Example 40 shows four examples of using the autoincrement and autodecrement operators. In the first two cases, the $a variable is modified before its value is assigned to $b, but the reverse happens in the final two cases:
Example 40 Using the Autoincrement and Autodecrement Operators
$a = 10; $b = ++$a;   # $a=11, $b=11 because increment happens before assignment
$a = 10; $b = --$a;   # $a=9, $b=9 because decrement happens before assignment
$a = 10; $b = $a++;   # $a=11, $b=10 because increment happens after assignment
$a = 10; $b = $a--;   # $a=9, $b=10 because decrement happens after assignment
Unlike the other arithmetic operators, the autoincrement operator also works with string variables in the following circumstances:
• The variable has been used only in string contexts (it has not had any arithmetic operations performed on it nor has it been assigned numeric data).
• The string is alphanumeric only, containing only letters and digits, but not spaces, punctuation, or any special characters.
Under these circumstances, strings are autoincremented by advancing the rightmost character by one letter or digit. When a digit is incremented past 9, it rolls over back to 0; when a letter is incremented past z, it wraps around to a. In both cases, the rollover increments the character in the next column to the left. If there is no next column, a new one is created.
The script in Example 41 demonstrates this process:
Example 41 Autoincrementing a String Variable
#!/usr/thirdParty/perl/bin/perl -w
$string = "Z89";
for ($i = 1; $i <= 100; $i++) {
    print $string++,"\n";
}
When this script runs, the first dozen lines of its output are the following:
Example 42 Output from Autoincrementing a String
Z89
Z90
Z91
Z92
Z93
Z94
Z95
Z96
Z97
Z98
Z99
AA00
Relational
As with most languages, Perl offers a wide array of relational operators that are most commonly used as part of tests for control structures such as if and while statements (see Control Structures and Loops). Perl is relatively unusual, though, in that it uses different operators for numeric and string comparisons; this is necessary because variables are not predefined in advance and can be used for either data type at will. Table 10 lists the common Perl relational operators, and unless otherwise noted the operators return 1 if the expression is true and a false value (interpreted as a zero in numerical calculations and the null string ("") in string operations) if not.
Note: A large number of operators are available for testing for the presence and type of files on the PanaVue workstation. See File Access in Perl for information on those operators.
Table 10 Relational Operators

Operator   Description                        Examples

Numerical Relational Operators
==         numeric equality                   if ($a==$b) { print "$a is equal to $b"; }
!=         numeric inequality                 if ($a!=$b) { print "$a is not equal to $b"; }
>          numeric greater than               if ($a>$b) { print "$a is greater than $b"; }
<          numeric less than                  if ($a<$b) { print "$a is less than $b"; }
<=         numeric less than or equal         if ($a<=$b) { print "$a is less than or equal to $b"; }
>=         numeric greater than or equal      if ($a>=$b) { print "$a is greater than or equal to $b"; }
<=>        numeric compare                    $a <=> $b returns 0 if $a == $b, 1 if $a > $b, -1 if $a < $b

String Relational Operators
eq         string equality                    if ($name1 eq $name2) { print "$name1 is equal to $name2"; }
ne         string inequality                  if ($name1 ne $name2) { print "$name1 is not equal to $name2"; }
gt         string greater than                if ($name1 gt $name2) { print "$name1 is greater than $name2"; }
lt         string less than                   if ($name1 lt $name2) { print "$name1 is less than $name2"; }
ge         string greater than or equal       if ($name1 ge $name2) { print "$name1 is greater than or equal to $name2"; }
le         string less than or equal          if ($name1 le $name2) { print "$name1 is less than or equal to $name2"; }
cmp        string compare                     $name1 cmp $name2 returns 0 if $name1 eq $name2, 1 if $name1 gt $name2, -1 if $name1 lt $name2

String Transformation Operators (see Using Regular Expressions in Perl for more information on these operators)
=~         match or substitution found        if ($name1 =~ /John/) { print "$name1 matches the name John.\n"; }
                                              if ($name1 =~ s/John/Bob/) { print "changed Bob for John\n"; }
!~         match or substitution not found    if ($name1 !~ /John/) { print "$name1 does not match the name John.\n"; }
                                              if ($name1 !~ s/John/Bob/) { print "Did not change Bob for John\n"; }
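The difference between the numeric and string operators is easiest to see when the same values are ordered both ways. The following is a minimal sketch that passes the cmp and <=> operators to Perl's built-in sort operator as a comparison block (a use of sort that goes beyond what this section describes); the array names are illustrative only:

```perl
@values = (10, 9, 100, 1);

# String comparison orders the values character by character
@by_string = sort { $a cmp $b } @values;

# Numeric comparison orders the values by magnitude
@by_number = sort { $a <=> $b } @values;

print "cmp: @by_string\n";   # prints cmp: 1 10 100 9
print "<=>: @by_number\n";   # prints <=>: 1 9 10 100
```

Using a string operator on numeric data (or the reverse) is not reported as an error, so the wrong choice typically shows up only as unexpected ordering or a test that never succeeds.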
Logical
Like the relational operators, Perl's logical operators are used primarily as part of the tests used in control structures such as if and while. These operators are commonly used when attempting to match a line of input, as shown in Table 11, but they can be used wherever an expression evaluates to either FALSE (the number 0 (zero), the string "0" (zero), or a null string ("")) or to TRUE (anything else).
Table 11 Logical Operators

Operator  Description  Example
||        logical OR   if (/Bob/ || /John/) { print "Found either Bob or John.\n"; }
&&        logical AND  if (/Bob/ && /John/) { print "Found both Bob and John.\n"; }
!         logical NOT  if (!(/Bob/ || /John/)) { print "Did not find either Bob or John.\n"; }
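The rule for TRUE and FALSE values can be checked directly. The sketch below is illustrative only; the truth function is a hypothetical helper, not something defined by Perl or by this manual:

```perl
use strict;

# Hypothetical helper: reports how Perl treats a value in a logical test.
sub truth {
    my ($value) = @_;
    return $value ? "TRUE" : "FALSE";
}

print truth(0), "\n";       # FALSE (the number zero)
print truth("0"), "\n";     # FALSE (the string "0")
print truth(""), "\n";      # FALSE (the null string)
print truth("0.0"), "\n";   # TRUE  (a string other than "" or "0")
print truth(" "), "\n";     # TRUE  (a space is not the null string)
```

Note that strings such as "0.0" and "00" are TRUE even though they are numerically zero, which matters when testing input that was read as text.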
Each type of structure uses curly brackets ({}) to delimit its blocks of statements. Any type of statement is allowed within these blocks, including other control structures.
if/else Command
The if statement evaluates the expression within the parentheses, and if it evaluates to a true value (non-zero and non-null), the first block of statements is executed. If the expression evaluates to a false value (the number 0, the string "0", the null string "", or the undefined value), the second block of statements (the else statement) is executed. For example, the if/else clause in Example 44 determines which line to print on the basis of whether two variables are equal or not:
Example 44 Sample if/else Statement
    if ($a == $b) {
        print "The two variables are equal.\n";
    } else {
        print "The two variables are NOT equal.\n";
    }
The else statement is optional; if it does not exist, the if statement simply falls through to the rest of the script if the expression inside its parentheses is not true. Because you sometimes have to test for more than one possible value, the else statement can also have the form elsif, which is shorthand for else if. Example 45 shows an if statement that has two elsif clauses and one else clause:
Example 45 Sample if/else Statement
    if ($a > $b) {
        print "$a is greater than $b.\n";
    } # end if
    elsif ($a < $b) {
        print "$a is less than $b.\n";
    } # end elsif
    elsif ($a == $b) {
        print "The two variables are equal.\n";
    } # end elsif
    else {
        print "This line should never be printed because all ";
        print "possibilities have already been accounted for.\n";
    } # end else
Note: Example 45 tests for a possibility that should never occur, in this case $a being neither equal to, less than, nor greater than $b. It is recommended that you get into the habit of testing for all situations unless you are absolutely certain of your program's logic.
unless/else Command
    unless ($a == $b) {
        print "The two variables are NOT equal.\n";
    } else {
        print "The two variables are equal.\n";
    }
The choice of which type of control structure to use is a matter of personal preference, although unless/else is typically used when you are looking for a special case and want to exclude the vast majority of alternative possibilities. For example, if you wanted to compare two arrays to see if they are identical, you could use unless/else to first exclude all arrays that do not have the same length. This avoids having to do an element-by-element comparison of what could be very lengthy arrays unless there is a chance that the two arrays would match. Example 47 gives one possible way that this could be done using unless/else statements:
Example 47 Comparing Two Arrays
    unless (@array1 == @array2) {
        print "The two arrays are not the same length.\n";
    } # end unless
    else {
        # convert arrays to strings; the "\0" separator keeps
        # ("ab","c") from looking the same as ("a","bc")
        $array1 = join("\0",@array1);
        $array2 = join("\0",@array2);
        unless ($array1 cmp $array2) {    # cmp returns 0 when equal
            print "The two arrays are equal.\n";
        } else {
            print "The two arrays are not equal.\n";
        } # end else
    } # end else
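Joining the arrays into strings is compact, but an element-by-element comparison avoids any dependence on the separator. The following is a sketch under stated assumptions: arrays_equal is a hypothetical helper, it compares elements as strings, and it receives the arrays as references (\@array), which requires Perl 5:

```perl
use strict;

# Hypothetical helper: takes two array references and returns 1 if the
# arrays have the same length and the same elements (compared as strings).
sub arrays_equal {
    my ($first, $second) = @_;
    return 0 unless @$first == @$second;        # lengths differ
    for (my $i = 0; $i < @$first; $i++) {
        return 0 if $first->[$i] ne $second->[$i];
    }
    return 1;
}

my @array1 = ("ab", "c");
my @array2 = ("a", "bc");
print arrays_equal(\@array1, \@array2) ? "equal\n" : "not equal\n";   # prints "not equal"
```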
while Command
    #!/usr/thirdParty/perl/bin/perl -w
    while ($_ = <STDIN>) {    # get input from the user
        print $_;             # print it back to the user
    } # end while
Note: See File Access in Perl for information about using STDIN to get input from the user.
Be cautious in devising the expression that controls the while loop. If the expression immediately evaluates to false, the while loop is never executed; the program just ignores the statements in the while loop and continues on with the rest of the program. Conversely, if the expression never evaluates to false, the while loop becomes an infinite loop. Example 49 shows an example of each type of loop:
Example 49 Poorly Written while Loops
    # Example of a while loop that is never executed
    $a = 0;               # set $a to zero
    while ($a > 0) {
        print "$a ";      # these lines are never executed
        $a = $a + 1;
    }

    # Example of an endless while loop
    $a = 1;               # set $a to 1
    while ($a > 0) {      # this loop continues forever
        print "$a ";
        $a = $a + 1;
    }
The first while statement executes its block of statements only when $a is greater than zero. Since $a is initialized to zero, this test fails and the while loop's statements are never run. Similarly, the second while statement shows a loop that never ends because the statement being executed within the block makes sure that $a is always greater than zero. When devising your own while loops, be sure that they do not get caught in endless loops such as this.
for Command
The initial-expression can be any valid Perl expression, but typically it is an expression that sets a variable of some type to a known value. The test-expression must evaluate to true (i.e., non-zero and non-null) before the block of statements can be executed; typically it tests the variable set by the initial-expression. The increment is an expression that is executed each time through the for loop, and typically it changes that same variable. For example, the script shown in Example 51 reads any number of lines from STDIN and puts each line into an array. A for loop is then used to print out each line.
Example 51 Example Use of the for Statement
    #!/usr/thirdParty/perl/bin/perl -w
    @array = <STDIN>;    # read all lines at once
    for ( $i = 0; $array[$i]; $i++ ) {
        print "Line $i is: $array[$i]\n";
    } # end for
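Testing $array[$i] itself works for lines read from STDIN, but it would stop early at an element holding "" or "0". A variant that tests the index against the array's length avoids this. This is a sketch only; number_lines is a hypothetical helper name:

```perl
use strict;

# Hypothetical helper: returns one formatted string per array element,
# including elements that are "" or "0".
sub number_lines {
    my @array = @_;
    my @out = ();
    for (my $i = 0; $i < @array; $i++) {    # stop at the array's length
        push(@out, "Line $i is: $array[$i]");
    }
    return @out;
}

print join("\n", number_lines("first", "0", "", "last")), "\n";
```

In the test-expression, @array is evaluated in a numeric context and therefore yields the number of elements in the array.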
Note: See File Access in Perl for information about using STDIN to get input.
In Example 51 the test-expression was simply $array[$i], which evaluates to true when that array element contains any data. However, when the index moves past the end of the array, $array[$i] is the undefined value, so this expression evaluates to false and the for loop ends. Like the while loop, the test-expression in the for statement is evaluated at the beginning of the loop; if the test-expression evaluates to false immediately, none of the lines in the loop are ever executed. Similarly, if the test-expression never evaluates to false, the for loop continues forever. Example 52 shows an example of both types of for loops; these examples are the same as those shown in Example 49 (page 89), except that the while loops in the previous example have been rewritten as for loops.
Example 52 Poorly Written for Loops
    # Example of a for loop that is never executed
    for ($a = 0; $a > 0; $a++) {
        print "$a ";    # this statement is never executed
    }

    # Example of a for loop that never ends
    for ($a = 1; $a > 0; $a++) {
        print "$a ";    # this loop never ends
    }
Note: As shown in Example 52, a while loop can usually be easily rewritten as a for loop and vice versa. Which type of loop you use is usually a matter of personal preference and programming style. As a general rule, though, if the loop's exit depends on an incremental type of expression (such as $a++), use for; otherwise, use while.
foreach Command
The $element variable is any scalar variable, and it becomes a placeholder for each element in the given @list, which can be an array, a literal list, or anything that evaluates to a list. The foreach statement executes the block of statements within the loop for each element of the list, substituting the actual element of the array for the $element variable. For example, Example 54 shows a script that reads in a series of lines from STDIN and uses two foreach loops; the first foreach loop uses the tr (translate) function to convert each line to uppercase, and the second loop prints out the entire array.
Example 54 Typical foreach Loop
    #!/usr/thirdParty/perl/bin/perl -w
    @array = <STDIN>;                # read all of the lines at once
    foreach $string (@array) {       # converts array to uppercase
        $string =~ tr/a-z/A-Z/;      # one line at a time
    } # end foreach
    foreach $string (@array) {       # array is now in uppercase
        print "$string";
    } # end foreach
    #!/usr/thirdParty/perl/bin/perl -w
    @array = <STDIN>;
    for ($i=0; $array[$i]; $i++) {
        $array[$i] =~ tr/a-z/A-Z/;    # convert to uppercase
    } # end for
    for ($i=0; $array[$i]; $i++) {
        print "$array[$i]";
    } # end for
It might appear that the foreach loop is redundant and unnecessary, but it offers two major advantages over for. The first is that the foreach loop can access any list, including lists that are not in predefined arrays. Example 56 shows foreach being used to convert a line to uppercase on a word-by-word basis, without having to first put the words in an array:
Example 56 Using foreach on a Non-Array List
    #!/usr/thirdParty/perl/bin/perl -w
    while (<>) {
        foreach $word (split()) {      # split line into words
            $word =~ tr/a-z/A-Z/;      # convert to uppercase
            print "$word\n";
        } # end foreach
    } # end while
The technique shown in Example 56 is often used when creating filters, which are programs that accept lines of input, usually from STDIN, process each line in some way, and output the lines, usually to STDOUT. The second advantage of foreach loops is that the @list array in a foreach loop can be processed by an array function before being used. The most common functions used in this way are the reverse and sort commands. For example, the script in Example 56 could be easily modified so that the words in each line are sorted before they are converted to uppercase and printed. See Example 57:
Example 57
    #!/usr/thirdParty/perl/bin/perl -w
    while (<>) {
        foreach $word (sort split()) {    # split and sort words
            $word =~ tr/a-z/A-Z/;         # convert to uppercase
            print "$word\n";
        } # end foreach
    } # end while
To reverse the order of the words, use the reverse command instead of sort in the third line of the above script.
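The reverse variant can be sketched as a small function so that its result is easy to inspect. The reverse_words name is a hypothetical helper, and uc is used in place of tr for the uppercase conversion:

```perl
use strict;

# Hypothetical helper: returns the words of a line in reverse order,
# each converted to uppercase.
sub reverse_words {
    my ($line) = @_;
    my @out = ();
    foreach my $word (reverse split(' ', $line)) {   # split, then reverse
        push(@out, uc($word));
    }
    return @out;
}

print join(" ", reverse_words("one two three\n")), "\n";   # prints "THREE TWO ONE"
```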
See the Perl reference guide that is online on the PanaVue workstation at http:/idDocs/scripts/perl/htmldocs for more information on using these functions and for a list of the other, less commonly used functions.
Arithmetic Functions
atan
cos
exp
hex
hex(expression) Returns the decimal value of expression, interpreted as hexadecimal; expression should be a string that contains only digits and the letters A through F. If the string starts with 0x, use the oct function (see below) to convert it.

    $a = hex("FF30");    # $a is 65328 decimal
int
int(expression) Returns the integer portion of expression (the fractional part is discarded, so a positive number is rounded down to the nearest integer):
$a = int(exp(1)); # $a is 2
If you want to round a positive number up or down to the nearest integer, add 0.5 to the expression:
$a = int( exp(1) + 0.5 ); # $a is 3
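Because int discards the fractional part, adding 0.5 rounds correctly only for positive numbers. A sketch of a rounding function that also handles negative values follows; round_nearest is a hypothetical helper, not a Perl built-in:

```perl
use strict;

# Hypothetical helper: rounds to the nearest integer, with halves
# rounded away from zero.
sub round_nearest {
    my ($number) = @_;
    return $number >= 0 ? int($number + 0.5) : int($number - 0.5);
}

print int(-2.7), "\n";             # prints -2 (fractional part discarded)
print round_nearest(2.7), "\n";    # prints 3
print round_nearest(-2.7), "\n";   # prints -3
```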
log
oct
oct(expression) Returns the decimal value of expression, interpreted as either an octal or hexadecimal string. To be interpreted in octal, expression must be a string that contains only the digits 0 through 7 (a leading 0 is allowed but not required):

    $a = oct("100");    # $a = 64 decimal
    $a = oct("377");    # $a = 255 decimal

To be interpreted in hexadecimal, expression must be a string that starts with 0x and contains only the digits 0 through 9 and the letters A through F:
    $a = oct("0x100");    # $a = 256 decimal
    $a = oct("0xF77");    # $a = 3959 decimal
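The conversions can be checked in both directions by pairing hex and oct with sprintf, which formats a number as a hexadecimal (%X) or octal (%o) string. This is an illustrative sketch; the variable names are arbitrary:

```perl
use strict;

# String to number
my $from_hex = hex("FF30");       # 65328
my $from_oct = oct("0377");       # 255
my $from_0x  = oct("0xF77");      # 3959

# Number back to string
my $as_hex = sprintf("%X", $from_hex);   # "FF30"
my $as_oct = sprintf("%o", $from_oct);   # "377"

print "$from_hex $from_oct $from_0x $as_hex $as_oct\n";
```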
rand
rand(expression) Returns a random number (with up to 14 significant digits) that is greater than or equal to 0 and less than expression. If expression is not given, it is assumed to be 1:

    $a = rand;         # $a can be anything from 0 up to (but not including) 1
    $a = rand(100);    # $a can be anything from 0 up to (but not including) 100

Use the srand function (below) to increase the randomness of this function.

sin

sin(expression) Returns the sine of expression, where expression is an angle in radians.
$a = sin(3.14159); # $a is (approximately) zero
sqrt
srand
srand(expression) Seeds the random number generator and should be used before using rand (see above). If no expression is given, srand uses whatever is currently returned by the time function (see below). A good expression to use is srand( time | $$ ), which uses the result of a bitwise OR between the current time and the script's unique process ID (pid):

    srand( time | $$ );    # randomize
    $a = rand(1000);       # $a can be anything from 0 up to (but not including) 1000
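Since rand returns a fractional number, a whole number in a given range is usually obtained by combining rand with int. The following sketch assumes a hypothetical helper named random_int:

```perl
use strict;

# Hypothetical helper: returns a random integer from $low through $high,
# inclusive of both ends.
sub random_int {
    my ($low, $high) = @_;
    return $low + int(rand($high - $low + 1));
}

srand( time | $$ );                 # seed once, before the first rand
my $roll = random_int(1, 6);        # simulates one roll of a die
print "Rolled a $roll\n";
```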
Timekeeping Functions
localtime
localtime(expression) Converts the expression (number of seconds since January 1, 1970) into a 9-element array showing the equivalent local time (as determined by the workstation's system clock). The month ($mon) is numbered from 0 (January) through 11 (December), and the year ($year) is the number of years since 1900. This function is typically used with the output of the time function:

    ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
    $mon = $mon + 1;         # months are numbered 0 through 11
    $year = $year + 1900;    # years are counted from 1900
    print "Today's date is $mon/$mday/$year\n";
    # prints a line like "Today's date is 11/15/1996"
    print "The time is $hour:$min:$sec\n";
    # prints a line like "The time is 12:13:43"
gmtime
gmtime(expression) Converts the expression (number of seconds since January 1, 1970) into a 9-element array showing the equivalent Greenwich Mean Time (GMT). The typical use is with the output of the time function:
    ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = gmtime(time);
    $mon = $mon + 1;         # months are numbered 0 through 11
    $year = $year + 1900;    # years are counted from 1900
    print "Today's date is $mon/$mday/$year GMT\n";
    # prints a line like "Today's date is 11/15/1996 GMT"
    print "The time is $hour:$min:$sec GMT\n";
    # prints a line like "The time is 12:13:43 GMT"
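Printing the raw elements gives single-digit values such as 9:5:3. A sketch that pads each field with sprintf is shown below; format_gmt is a hypothetical helper, and the output layout is only one possible choice:

```perl
use strict;

# Hypothetical helper: formats an epoch time (seconds since January 1,
# 1970) as "MM/DD/YYYY HH:MM:SS GMT".
sub format_gmt {
    my ($seconds) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year) = gmtime($seconds);
    return sprintf("%02d/%02d/%04d %02d:%02d:%02d GMT",
                   $mon + 1, $mday, $year + 1900, $hour, $min, $sec);
}

print format_gmt(0), "\n";       # prints "01/01/1970 00:00:00 GMT"
print format_gmt(time), "\n";    # the current time
```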
String Functions
chop
chop(expression) Removes the last character of the specified string, or if an array is specified, removes the last character of all elements in the array. Like chomp, this function modifies the string or array given, but unlike chomp, it removes any character that is at the end of the string; it also returns the character that was chopped.

    $string = "this is a line\n";
    $a = chop($string);    # $a="\n" and $string="this is a line"
    $a = chop($string);    # $a="e" and $string="this is a lin"
chr
chr(expression) Returns the ASCII character (as a one-character string) specified by expression (which should be between 0 and 255 for most uses).
$a = chr(65); # $a = "A"
index

index($str,$substr,off) Returns the index position (zero-based) of the first occurrence of the $substr string in the $str string at or after the given offset (off, which should be numeric). If off is omitted, it defaults to 0, which means the string is searched from its first character. Returns -1 if the substring is not found.

    $string = "This is a line of data\n";
    $a = index($string,"data",0);      # $a = 18
    $a = index($string,"not here");    # $a = -1
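The offset argument makes it possible to find every occurrence of a substring by restarting the search just past the previous match. The following is a sketch; find_all is a hypothetical helper:

```perl
use strict;

# Hypothetical helper: returns every zero-based position at which
# $substr occurs in $str.
sub find_all {
    my ($str, $substr) = @_;
    my @positions = ();
    return @positions if $substr eq "";      # avoid searching for nothing
    my $pos = index($str, $substr, 0);
    while ($pos != -1) {
        push(@positions, $pos);
        $pos = index($str, $substr, $pos + 1);   # resume after this match
    }
    return @positions;
}

print join(",", find_all("This is a line of data\n", "is")), "\n";   # prints "2,5"
```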
length
length($string) Returns the length, in characters, of the given string. This count includes any special characters such as newlines and carriage returns:
    $string = "This is a line of data\n";
    $a = length($string);    # $a = 23 (the newline is included)
lc
lc($string) Returns the specified string in lowercase (does not change the original string).

    $string = "THIS IS A LINE OF DATA\n";
    $a = lc($string);    # $a = "this is a line of data\n"

lcfirst

lcfirst($string) Returns the specified string with its first character converted to lowercase (does not change the original string).

    $string = "THIS IS A LINE OF DATA\n";
    $a = lcfirst($string);    # $a = "tHIS IS A LINE OF DATA\n"
ord
ord($string) Returns the ASCII numeric value of the first character in the specified string.

    $string = "THIS IS A LINE OF DATA\n";
    $a = ord($string);    # $a = 84 (the ASCII value of "T")
rindex
rindex($str,$substr,off) This is reverse index (see index above) and returns the index position (zero-based) of the last occurrence of the $substr string in the $str string from the given offset (off, which should be numeric). If off is omitted, it defaults to the length of the string, which means the string is searched backwards from its last character. Returns -1 if the substring is not found.
    $string = "This is a line of data\n";
    $a = rindex($string,"is");          # $a = 5
    $a = rindex($string,"not here");    # $a = -1
substr
substr($str,off,length) Returns a string that is a substring of the specified $str string, starting at the given offset (off, which should be numeric) and of the given length (which should be numeric). If length is omitted, the substring is from the given offset to the end of the $str string. The original $str string is left unchanged.

    $string = "This is a line of data\n";
    $a = substr($string,5,4);    # $a = "is a"
uc
uc($string) Returns the specified string in uppercase (does not change the original string).

    $string = "This Is A Line Of Data\n";
    $a = uc($string);    # $a = "THIS IS A LINE OF DATA\n"
ucfirst
ucfirst($string) Returns the specified string with its first character converted to uppercase (does not change the original string).

    $string = "this is a line of data\n";
    $a = ucfirst($string);    # $a = "This is a line of data\n"
undef
undef(expression) Returns the undefined value and sets the specified expression to the undefined value. The undefined value is distinct from both the null string ("") and zero, although it is treated as "" in a string context and as 0 in a numeric context. The expression can be either a scalar variable (either numeric or a string) or an array (in which case the array is emptied). If expression is omitted, returns the undefined value, which can be useful for certain tests.

    $a = undef($b);    # both $a and $b are now undefined
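To tell the undefined value apart from "" or 0, use the defined function, which returns true only when its argument holds a defined value. The sketch below uses a hypothetical helper named describe:

```perl
use strict;

# Hypothetical helper: describes a value, separating the undefined
# value from the null string and from zero.
sub describe {
    my ($value) = @_;
    return "undefined"   unless defined($value);
    return "null string" if $value eq "";
    return "value $value";
}

my $item = "data";
print describe($item), "\n";    # prints "value data"
undef($item);
print describe($item), "\n";    # prints "undefined"
print describe(""), "\n";       # prints "null string"
print describe(0), "\n";        # prints "value 0"
```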
In addition to reading and writing files, Perl's directory operators allow you to read directories to find the filenames in them. Perl's file test operators also allow you to test files for various characteristics.
Using Filehandles
Note: Your Perl scripts can change the assignments of the standard filehandles by closing them and reopening them so they refer to files or other devices.
Normally when STDIN is used, it reads a line of input from the user's keyboard. Similarly, when STDOUT and STDERR are used, they output to the user's screen. These default assignments, though, can be changed on the Solaris command line, using the standard I/O redirection operators (< and >). For example, to set STDIN to be a file, enter a command similar to the following:
    /usr/thirdParty/perl/bin/perl myscript.pl < input-file
If myscript.pl is run like this, whenever it reads a line from STDIN, it actually reads a line from input-file. Similarly, to redirect STDOUT to a file, enter a command similar to the following:
    /usr/thirdParty/perl/bin/perl myscript.pl > output-file
When this command is run, output sent to STDOUT is saved in output-file. The two redirections can also be combined on one command line (< input-file > output-file), so that input from STDIN comes from input-file and output sent to STDOUT is saved in output-file.
Note: STDERR can be redirected as well, but the exact syntax depends on the type of shell you are using. This normally is not done, though, since messages to STDERR are usually intended to inform the user of an immediate problem.
Perl scripts can get the next line of input from the STDIN device by using the STDIN filehandle. For example, the following command reads one line from STDIN, including the newline (\n) character that ends the line:
Example 58 Reading One Line from STDIN
$line = <STDIN>;
After all input has been read from STDIN, any further attempts to read from it return the undefined value, which evaluates to false in a test. For this reason, STDIN is usually read as shown in Example 59:
Example 59 Reading All Input from STDIN
The while loop continues as long as input is available. Input ends when the user enters a CTRL-D character (if STDIN is reading from the console) or when all lines have been read (if STDIN is redirected to a file). After the last line has been read from STDIN, the while loop exits and the rest of the program is executed. Because the method shown in Example 59 is so commonly used in Perl, it can be abbreviated to use the default operator ($_), as shown in Example 60:
Example 60 Using the Default Operator with STDIN
The default operator ($_) is automatically used when you use a while loop to read from STDIN (or any filehandle). However, when accessing STDIN outside of such loop tests, you cannot use this shortcut but must explicitly name the variable to receive the input, as shown in Example 58. The code in Example 60 can be shortened even further by using the diamond operator, which in some circumstances is assigned to STDIN. See Reading Input From the Diamond Operator (<>) on page 112 for details.
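The read loop described above can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of the manual's examples: count_input is a hypothetical helper, and so that the sketch runs without keyboard input it reads from an in-memory filehandle (which requires Perl 5.8 or later). Passing \*STDIN instead would read from standard input.

```perl
use strict;

# Hypothetical helper: reads every line from the given filehandle and
# returns the number of lines and the total number of characters read.
sub count_input {
    my ($handle) = @_;
    my ($lines, $chars) = (0, 0);
    while (my $line = <$handle>) {     # loop ends when no input remains
        $lines++;
        $chars += length($line);       # the newline is counted
    }
    # while (<$handle>) { ... } would place each line in $_ instead
    return ($lines, $chars);
}

my $text = "first line\nsecond line\n";
open(my $in, '<', \$text) || die("Could not open in-memory file.\n");
my ($lines, $chars) = count_input($in);    # or: count_input(\*STDIN)
close($in);
print "Read $lines lines ($chars characters).\n";   # Read 2 lines (23 characters).
```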
Using STDIN to Read An Entire File

In the previous section a scalar variable (either $line or $_) was used to read from STDIN one line at a time. If, however, you want to read all of the available lines at once, you can use an array variable, as shown in Example 61:
Example 61 Reading All of STDIN at Once
@file = <STDIN>;
This one line reads all of the lines available from STDIN and puts each line in one element of the @file array. Thus, $file[0] contains the first line of input, $file[1] contains the second line, and so on. As with any array, you can use the special variable $#file to get the index number of the last element of the array; the number of lines read is $#file + 1. You could also loop through the array until you reach an element that is undefined.

Using STDOUT and STDERR

The print command can be used to send output to the STDOUT and STDERR devices as shown in Example 62:
Example 62 Printing to STDOUT and STDERR
    print STDOUT "This line goes to STDOUT.\n";
    print STDERR "This line goes to STDERR.\n";
Note: STDOUT and STDERR can also be used with the printf and write commands.
Since STDOUT is the default filehandle for the print command, you can omit it, as shown in Example 63:
Example 63 Printing to STDOUT
Note: You can change the default filehandle for the print, printf, and write commands using the select command. See the online Perl reference guide (http:/idDocs/scripts/perl) for details.
The diamond operator processes the command line arguments found in the @ARGV array (see Accessing Files on the Command Line on page 117), treating each argument as the name of a file to read. Using the diamond operator, therefore, is a convenient way to avoid having to open each file named on the command line yourself.
However, there are times you might want to change the list of files that will be read by the diamond operator. You can do this by changing the contents of the @ARGV array before the diamond operator is first invoked. For example, you might have a number of scheduled scripts that periodically gather data from nodes on the network, and each night you want to generate a report on that day's findings. If your scheduled scripts store their data in the /opt/Panavue/logs directory, you could specify this directory on the command line when running the reporting script:
/opt/Panavue/scripts/perl/doreports.pl /opt/Panavue/logs/*
The doreports.pl script could then use the diamond operator to read each file in this directory and process it. If, though, you wanted only today's reports to be processed, you could modify the @ARGV array so that it includes only files that have been modified in the past 24 hours. Example 65 shows some sample code that does this:
Example 65 Modifying the Files Accessed by the Diamond Operator
    @recent = ();                           # files to keep
    foreach $file (@ARGV) {                 # examine each argument
        if (-e $file && -M $file <= 1) {    # does the file exist, and has it
                                            # been modified in the past 24 hours?
            push(@recent, $file);           # yes, save it
        }                                   # no, skip it
    } # end foreach
    @ARGV = @recent;
    # @ARGV now contains only files modified within the past 24 hours
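The same filtering can be written more compactly with grep, which returns the elements of a list for which a test is true. The sketch below assumes a hypothetical helper named modified_within:

```perl
use strict;

# Hypothetical helper: returns the names that exist and were modified
# within the past $days days (-M gives a file's age in days).
sub modified_within {
    my ($days, @names) = @_;
    return grep { -e $_ && -M $_ <= $days } @names;
}

@ARGV = modified_within(1, @ARGV);    # do this before the first use of <>
print "Files to read: @ARGV\n";
```

Names that do not exist are dropped rather than passed on to the diamond operator.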
Note: See File Access in Perl on page 105 for information about the -M operator and other file test operators.
The code shown in Example 65 could easily be inserted into a subroutine (see Using Subroutines in Perl) that is called before the first use of the diamond operator. It assumes, however, that no command line switches were specified (such as -x or -A); if your scripts also accept switches on the command line, you either have to write code to strip them from the @ARGV array or invoke your scripts using the -s Perl command line option (see Command Line Options on page 10). If you want to change the @ARGV array, do it before the first use of the diamond operator; otherwise, unpredictable behavior can result since the diamond operator has its own internal variables it uses to keep track of which element of the array it is accessing. Depending on the program's current state, the diamond operator might not use your changes.

Finding the Current Filename

Perl keeps the name of the file currently being read in the special variable $ARGV. When one file is closed and the next opened and a line is read, Perl automatically updates $ARGV with the new filename, including any path information that was specified on the command line. If no files were specified on the command line and the diamond operator is set to STDIN, $ARGV is set to a single hyphen ("-") character.
Note: Do not confuse $ARGV with the @ARGV array. Also, $ARGV is an uninitialized string that matches the null string ("") until the first time the diamond operator is actually invoked and a line of input is read. $ARGV then remains at its current setting until the diamond operator opens up a new file and reads one line from it. Thus, $ARGV cannot tell you the name of the next file that is about to be opened, only the name of the file from which the last line of input was read. Use the @ARGV array to find the files that remain to be read.
If you do not care which files are being read and want to treat them as just one giant source of input, it is very convenient to have the diamond operator automatically and transparently move from one file to another. However, if you do want to know what files are being read (typically so you can modify them), you must test for a new filename every time you read a line of input. Example 66 shows one possible way this can be done:
Example 66 Monitoring $ARGV for a New Filename
    $lastargv = "";                   # no file has been read yet
    while (<>) {                      # a line of input is read here
                                      # so must immediately test for new file
        if ($ARGV ne $lastargv) {
            $lastargv = $ARGV;        # reset with new filename
            # do other new-file processing here
        }
        # continue with rest of while loop
    } # end of while
Like the @ARGV array, the $ARGV variable can be modified by your program to have a new value. Doing so, however, does not change the behavior of the diamond operator, which continues reading the files listed in the @ARGV array. The diamond operator also continues to update the global version of $ARGV whenever it opens and reads from a new file, thereby overwriting any change you have made to it. It is recommended, therefore, that you modify $ARGV only when it is a local variable within a subroutine (see Defining Local Variables on page 174).
The script shown in Example 67 displays the script's name and all of the script's command line arguments:
Example 67 Displaying all Command Line Arguments
    #!/usr/thirdParty/perl/bin/perl -w
    $i = 0;
    print "This script is named $0.\n\n";
    while ($i <= $#ARGV) {    # $#ARGV is the index of the last argument
        print "This is command line arg # $i: $ARGV[$i].\n";
        $i++;
    }
The arguments on the command line can be anything that the user typed, including switches (such as -x and -y), directory names, commands, and filenames. Your script should therefore parse this information and verify it before use. See File Test Operators on page 132 for ways you can test a string to see if it is a valid file.
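A first step in parsing is often to separate switches from everything else. The sketch below assumes the simple convention that a switch is any argument beginning with a hyphen followed by at least one more character; split_args is a hypothetical helper, and it returns two array references:

```perl
use strict;

# Hypothetical helper: separates arguments that look like switches
# (a hyphen followed by at least one character) from all others.
sub split_args {
    my (@args) = @_;
    my (@switches, @others);
    foreach my $arg (@args) {
        if ($arg =~ /^-./) {
            push(@switches, $arg);
        } else {
            push(@others, $arg);     # includes a lone "-" (STDIN)
        }
    }
    return (\@switches, \@others);
}

my ($switches, $others) = split_args(@ARGV);
print "Switches: @$switches\n";
print "Other arguments: @$others\n";
```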
Using Filehandles
where FILEHANDLE is the name of your filehandle and filename is the name of the file you want to open (you can either use a literal such as "myfile.txt" or a scalar variable such as $filename). The open command returns true if it was successful in opening the file and the undefined value if not. Perl does not stop you from reading a filehandle that failed to open (the read simply returns no data), so you should test the results of the open command before attempting to use a filehandle:
Example 68 Testing for a Valid Open Operation
    if (open(FILEHANDLE,"filename")) {
        print "The open successfully created the filehandle.\n";
    } else {
        print "Could not open the file.\n";
    }

or

    unless (open(FILEHANDLE,$filename)) {
        print "Could not open $filename.\n";
    } else {
        print "The open successfully created the filehandle.\n";
    }
Being unable to open a file usually means your script has encountered a fatal error and should exit. Perl offers a quick way of doing this by using the die command, which prints a user-defined message to STDERR and then exits. See Example 69:
Example 69 Using the die Operator
If being unable to open a file is not a fatal error but is something the user should be notified about, you can use the warn operator. See Example 70:
Example 70 Using the warn Operator
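The difference between die and warn can be sketched as follows. This is an illustration under stated assumptions: count_lines is a hypothetical helper, the file name is made up, and $! (Perl's system error message variable) is added to the message for clarity:

```perl
use strict;

# Hypothetical helper: returns the number of lines in $filename, or -1
# (after a warning on STDERR) if the file cannot be opened.
sub count_lines {
    my ($filename) = @_;
    unless (open(INPUT, $filename)) {
        warn("$0: Could not open $filename: $!\n");   # report and continue
        return -1;
    }
    my @lines = <INPUT>;
    close(INPUT);
    return scalar(@lines);
}

print count_lines("/no/such/file.txt"), "\n";   # warns, then prints -1

# When the failure is fatal, die prints the message and exits instead:
#     open(INPUT, $filename) || die("$0: Could not open $filename: $!\n");
```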
To open a new file for writing, add a greater-than sign (>) to the front of the file's name:
Example 71 Opening a New File for Writing
    open(FILEHANDLE,">filename");

or

    open(FILEHANDLE,">$filename");
Note: You must use double quotation marks when opening a file for writing, both when specifying a literal for the filename (such as ">filename") and when you specify a scalar variable (such as ">$filename"). You can also use the die or warn operator, as shown in Example 69, when opening files for writing.
The commands shown in Example 71 automatically delete, without any warning, any existing file that has the same name as the file you are attempting to open. If this is not what you want, you should open the file using the append operator (>>). This specifies that the file should be opened for writing, but that all output should be appended to the end of the currently existing file. See Example 72:
Example 72 Opening a New File for Appending
    open(FILEHANDLE,">>filename");

or

    open(FILEHANDLE,">>$filename");
Note: You can also test for a file's presence before opening it. See File Test Operators on page 132.
To write to an opened filehandle, specify the filehandle when using the print command:
Example 73 Writing to a Filehandle
    $logfile = "/opt/Panavue/logs/perl.log";
    open(LOGFILE,">>$logfile") || die("$0:Could not open $logfile.\n\n");
    print LOGFILE "This line is being written to $logfile\n";
Since writing to a file after it has been opened can fail for a number of reasons, such as running out of disk space, consider testing the result of any print commands that write to files. For example, Example 74 shows the OR operator (||) being used with the print command; if the print to the LOGFILE filehandle is unsuccessful, the second print statement sends an error message to STDERR. The parentheses around the arguments of the first print are required so that || tests the result of the print rather than the string being printed.
Example 74 Writing to a Filehandle
    $logfile = "/opt/Panavue/logs/perl.log";
    open(LOGFILE,">>$logfile") || die("$0:Could not open $logfile.\n\n");
    print(LOGFILE "This line is being written to $logfile.\n")
        || print STDERR "$0:Could not write to $logfile\n";
Closing a Filehandle
close(FILEHANDLE);
Perl automatically closes any open files when your script terminates, but closing files when they are no longer needed is still a good habit to get into since it ensures that any data that was cached by the operating system is written to the file, which might not happen if a user abnormally terminates the program with a CTRL-C. Closing the file also frees up any memory structures that were used for the filehandle and its buffers.
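Opening, writing, and closing can be combined into one routine that reports failure at any step. The following is a sketch only: append_line is a hypothetical helper, and the scratch file under /tmp is an assumed location chosen for illustration:

```perl
use strict;

# Hypothetical helper: appends $text to $logfile. Returns 1 on success
# and 0 if the open, the print, or the close fails.
sub append_line {
    my ($logfile, $text) = @_;
    open(LOGFILE, ">>$logfile") || return 0;
    my $ok = print LOGFILE $text;
    close(LOGFILE) || return 0;      # close flushes any buffered output
    return $ok ? 1 : 0;
}

my $logfile = "/tmp/perl_sketch_$$.log";     # assumed scratch location
append_line($logfile, "first entry\n")
    || warn("$0: Could not write to $logfile\n");
append_line($logfile, "second entry\n")
    || warn("$0: Could not write to $logfile\n");
unlink($logfile);                            # remove the scratch file
```

Because the file is opened with >>, the second call adds to the file instead of replacing the first entry.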
Directory Operations
chdir("dirname"); chdir($dirname);
If you are getting the directory name from a user, you must remove the final newline character from the input; otherwise, the chdir fails. See Example 76:
Example 76 Changing the Working Directory
print "Enter the working directory?\n";
$dirname = <STDIN>;     # get the directory from the user
chop($dirname);         # eliminate the last char (newline)
chdir($dirname) || print STDERR "$0:Could not chdir to $dirname\n\n";
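The effect of the trailing newline can be seen in a short sketch. The temporary directory and the variable names are assumptions made for illustration, and the input line is simulated instead of being read from STDIN. The sketch uses chomp, a Perl 5 function that removes only a trailing newline, where Example 76 uses chop, which removes the last character whatever it is.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

my $start   = getcwd();
my $dirname = tempdir(CLEANUP => 1) . "\n";   # simulate a line typed at STDIN

# The newline is treated as part of the name, so this attempt fails.
my $raw_ok = chdir($dirname) ? 1 : 0;

chomp($dirname);                              # remove only the trailing newline
my $clean_ok = chdir($dirname) ? 1 : 0;

chdir($start) or die "$0: Could not return to $start: $!\n";
print "with newline: $raw_ok, after chomp: $clean_ok\n";
```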
Listing the Files in a Directory
An easy way to list the files in the current directory is to use shell-type wildcards within angle brackets (<>). For example, to find all of the normal files (files that do not start with a period) in the current directory and put their filenames into the @files array, give the following command:
@files = <*>;
This command finds all files that match the wildcard pattern and puts each matching filename into one element of the specified array. The files are listed in the same order they are stored in the directory, so $files[0] is the first file listed, $files[1] is the second file listed, and so forth. Perl actually spawns an instance of the C-Shell (/bin/csh) to expand the list of filenames, so any wildcards that work at the csh command line (known as glob-style wildcards) also work here. For example, to list all files in the /etc directory, use the same wildcard that you would use with the ls command (/etc/*):
Example 77 Listing All Files in the /etc Directory
@etcfiles = </etc/*>;
Note: The wildcards used in globbing are similar to, but not the same as, the wildcards used in the Bourne, C, and Korn shells. See the sh, csh, and ksh manpages for details.
You can specify any wildcard pattern as long as it cannot be interpreted as a variable or filehandle. For example, the following matches would fail because Perl interprets the string inside the brackets as referring to a Perl variable of one type or another:
Example 78 Invalid Globbing Patterns for Perl
@a = <$files>;      # Perl interprets $files as scalar variable
@a = <@files>;      # Perl interprets @files as array
@a = <files>;       # Perl interprets files as filehandle
@a = <FILES>;       # Perl interprets FILES as filehandle
Because this approach requires spawning another process (the /bin/csh shell), it is not the most efficient approach, especially if you expect to deal with a large number of files. It also returns directory names in addition to regular files, so you also have to test each name before using it to see whether it refers to a file or a directory entry. Because of these limitations, reading the directory directly, as described in the next section, is usually the preferred method of getting a file list from a directory.
Reading a Directory Entry Directly
To read a directory entry, use the opendir command, which works like the open command but on directories:
Example 79 Opening a Directory Entry
As shown in Example 79, the opendir command uses a filehandle, just as the open command does, but opendir creates a directory filehandle, which can be meaningfully read only by the readdir command, which has the following format:
Example 80 Reading a Filename Using readdir
$filename = readdir(DIRHANDLE);
Only the actual filename is returned by readdir; the path information is not included as part of the string (for example, $filename is set to "passwd" and not "/etc/passwd" when parsing the /etc/ directory). As with other forms of input, readdir returns the undefined value (which evaluates to "" in a string context and to 0 in a numeric context) when it has returned all of the names of the files in the directory. Typically, therefore, a while loop is used to read the directory:
Example 81 Reading All Files in a Directory
#!/usr/thirdParty/perl/bin/perl -w
$etcdir = "/etc/";
opendir(ETCDIR,$etcdir) || die("$0:Could not open the $etcdir directory.\n");
# defined() keeps a file that is named "0" from ending the loop early
while(defined($filename = readdir(ETCDIR))) {
    print "The $etcdir directory contains: $filename\n";
} # end while
The readdir command returns all of the filenames in a directory, including those starting with a period. The filenames are returned in the same order that they exist in the directory entry, which is not necessarily a sorted list. To produce a sorted list, use the sort command while reading the list of filenames into an array:
Example 82 Reading and Sorting All Files in a Directory
#!/usr/thirdParty/perl/bin/perl -w
$etcdir = "/etc/";
opendir(ETCDIR,$etcdir) || die("$0:Could not open the $etcdir directory.\n");
@files = sort(readdir(ETCDIR));
foreach $filename (@files) {
    print "The $etcdir directory contains: $filename\n";
}
Note: Use the closedir command to close a directory filehandle after use.
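The pieces described in this section (opendir, readdir, sort, and closedir) can be combined with a file test to list only regular files. The following is a sketch under stated assumptions: the list_regular_files helper, the scratch directory, and the file names are hypothetical and are not taken from the original examples.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return the sorted names of the regular files in $dir.
sub list_regular_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "$0: Could not open the $dir directory: $!\n";
    my @all = readdir($dh);            # includes ".", "..", and dot files
    closedir($dh);
    my @names = sort @all;
    # readdir returns bare names, so prepend the directory before testing
    return grep { -f "$dir/$_" } @names;
}

# Build a scratch directory holding two files and one subdirectory.
my $dir = tempdir(CLEANUP => 1);
for my $name ('b.txt', 'a.txt') {
    open(my $fh, '>', "$dir/$name") or die "$0: Could not create $name: $!\n";
    close($fh);
}
mkdir("$dir/subdir", 0755) or die "$0: Could not create subdir: $!\n";

my @files = list_regular_files($dir);
print "Regular files: @files\n";
```

The -f test is applied to "$dir/$_" and not to the bare name because readdir does not include the path in the names it returns.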
Deleting Files
The unlink command deletes a file and it operates on filenames, not filehandles, so you do not need to open a file before using this function. The unlink command can delete either a single file or a list of files, and it has the following formats:
Example 83 unlink Command Formats
# file is specified as a literal
# file is specified as a scalar var
# list of files specified as an array
# list of files specified as glob
The unlink command attempts to delete all of the specified files and then returns the number of files that have been successfully deleted. If you want to verify that all files have been deleted, you must either count the number of files in advance or test for the supposedly deleted files' presence (see File Test Operators on page 132).
Note: The unlink command deletes files without warning, so use it very carefully, particularly when specifying a list of files or when using wildcards.
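Both verification approaches mentioned above (comparing the returned count and testing for leftover files) can be sketched as follows. The scratch directory and the file names are assumptions made for illustration only.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir     = tempdir(CLEANUP => 1);
my @targets = map { "$dir/$_" } qw(one.tmp two.tmp);
for my $path (@targets) {
    open(my $fh, '>', $path) or die "$0: Could not create $path: $!\n";
    close($fh);
}
push(@targets, "$dir/missing.tmp");      # this file was never created

my $deleted = unlink(@targets);          # number of files actually removed
my @left    = grep { -e $_ } @targets;   # anything that still exists

print "Deleted $deleted of ", scalar(@targets), " files\n";
print "Still present: @left\n" if @left;
```

Here the count is lower than the number of names passed in, yet nothing is left behind, which is why the two checks answer slightly different questions.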
Renaming Files
The rename command renames a file and it operates on filenames, not filehandles, so you do not need to open a file before using this function. Example 84 shows the format of the rename command:
Example 84 Renaming a File
If the specified filenames do not have complete pathnames, they are assumed to be in the current working directory. If rename does successfully rename the given file, it returns true; otherwise, it returns the undefined value. See Example 85:
Example 85 Example of Renaming a File
if (rename($oldfile,$newfile)) {
    print "Successfully renamed $oldfile to $newfile\n";
}
else {
    print "Could not rename $oldfile to $newfile\n";
} # end if
Note: Perl scripts can rename or delete files only if the scripts are run by a user who has the permissions needed to change those files at the command line.
Perl has its own versions of the Unix mkdir and rmdir commands to create and remove directories. The mkdir function takes two arguments, the name of the directory and its set of permissions:
mkdir("dirname",chmod-value); or mkdir($dirname,$chmod_value);
The directory name can be either a literal string or a scalar variable that specifies either a full or partial pathname. The permissions must be in the octal format used by the Unix chmod program (see the chmod manpage for details). The mkdir function returns true if successful and the undefined value if not. If the directory could not be created, the special variable $! is also set with the appropriate error code. Example 86 attempts to create a directory named tmp in the current working directory, giving all users complete read, write, and execute permissions to the directory:
Example 86 Creating a Directory
if ( mkdir("tmp",0777) ) {
    print "Successfully created the tmp directory\n";
}
else {
    print "$0:Unable to create directory because of:\n";
    print "\t$!, error number ",$!+0,"\n";
}
The rmdir command removes an existing directory, assuming the directory has no files in it and the script's user ID has the proper permissions. The rmdir command removes one directory at a time, and like mkdir, it sets the $! variable if it fails. See Example 87:
Example 87 Removing a Directory
if ( rmdir("tmp") ) {
    print "Successfully removed the tmp directory\n";
}
else {
    print "$0:Unable to remove directory because of:\n";
    print "\t$!, error number ",$!+0,"\n";
}
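The requirement that a directory be empty before rmdir can remove it is easy to demonstrate. In this sketch the scratch directory, the note.txt file, and the variable names are assumptions made for illustration. Keep in mind that the permissions passed to mkdir are also filtered by the current umask, so the resulting directory may be more restrictive than the value requested.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $base = tempdir(CLEANUP => 1);
my $dir  = "$base/tmp";

mkdir($dir, 0755) or die "$0: Unable to create $dir: $!\n";
open(my $fh, '>', "$dir/note.txt") or die "$0: Could not create note.txt: $!\n";
close($fh);

# First attempt fails because the directory still contains a file.
my $first_try = rmdir($dir) ? 1 : 0;
my $reason    = "$!";                  # copy the message before it changes

unlink("$dir/note.txt") or die "$0: Could not delete note.txt: $!\n";
my $second_try = rmdir($dir) ? 1 : 0;  # succeeds now that it is empty

print "first attempt: $first_try ($reason)\n";
print "second attempt: $second_try\n";
```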
Table 12 File Test Operators

Operator   Description
-S         File is a socket
-s         Returns the size of the file (a non-zero size evaluates to true, a zero size evaluates to false). Also see -z below.
-T         File is a text file
-t         Filehandle is opened to a tty
-z         File has zero size

Information about the File or Directory's Timestamp
-A         Returns the time since the file or directory was last accessed, in days (including a fraction for the part that is less than 24 hours).
-C         Returns the time since the inode of the file or directory was last changed, in days (including a fraction for the part that is less than 24 hours). Unix does not record a separate creation time.
-M         Returns the time since the file or directory was last modified, in days (including a fraction for the part that is less than 24 hours).

Information about the File or Directory's Permissions
-O         File is owned by real user ID
-R         File is readable by real user ID/group ID
-W         File is writable by real user ID/group ID
-X         File is executable by real user ID/group ID
-g         File has setgroup ID bit set
-k         File has sticky bit set
-o         File is owned by effective user ID
-r         File is readable by effective user ID/group ID
-u         File has setuser ID bit set
-w         File is writable by effective user ID/group ID
-x         File is executable by effective user ID/group ID
The most common tests that are done are to see if a file or directory exists (-e), if it is readable or writable by the script (-r and -w), its size (-s and -z), and its modification age (-M). The tests can be used within any control structure or loop, but typically they are used within an if clause; the major exceptions to this are the tests that return specific information, such as the size of the file or its age, which can be used anywhere any other expression can be used. Example 88 demonstrates the use of these file tests by getting a list of files and then testing each one in turn. Note, however, that the script's first test is to see if the file exists, which is recommended because some user or process could always delete a file between the time it is first found and the time it is used.
Example 88 Testing Files
#!/usr/thirdParty/perl/bin/perl -w
@files = <*>;       # get list of files in current working dir
foreach $filename (@files) {
    if (-e $filename) {
        print "$filename does exist\n";
        if (-z $filename) {
            print "\tIt is zero bytes in size\n";
        }
        else {
            print "\tIts size is ",(-s $filename)," bytes\n";
        } # end else
        if (-r $filename) {
            print "\tIt is readable by this script\n";
        } # end if
        if (-w $filename) {
            print "\tIt is writable by this script\n";
        } # end if
        print "\tIt was modified ",(-M $filename)," days ago\n";
    } # end if
    # should fall through here only if someone deleted the
    # file between the time the script did the glob command and
    # the time it tested for the file's presence
    else {
        print "The file $filename no longer exists in the ";
        print "current working directory.\n";
    } # end else
} # end foreach
Unlike most word processors, though, regular expressions can use very complex patterns that check for many different words, phrases, or sets of characters at the same time. For example, if you have a report that lists all of the alarms in a Promina 800 Series network, you might want to print out only the trunk and link alarms. This would be difficult to do with a typical word processor because you would have to first search for link alarms and then do another search for trunk alarms. Then you would have to separate those lines out and print them. In Perl, though, you can use one statement to search for both patterns at once. You could also narrow the search to a specific set of nodes and trunk cards, or even do a search that excludes all trunk and link alarms, printing out the rest. Many other options exist when using Perl's regular expressions.
Similarly, a substitution pattern is defined as it is in sed or vi: the letter s is followed by three forward slashes that separate the pattern to be matched from the pattern that is to replace it. See Example 90:
Example 90 Typical Substitution Patterns
In the examples given in Example 90, the text "trunk alarms" is replaced by "link alarms", the name "Robert" is replaced by "Bob", and "Monday" is replaced by "Tuesday". The final example deletes the two words "not needed".
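The four substitutions described above could be written as in the following sketch. The patterns are inferred from the description, and the sample lines in @lines are assumptions made for illustration; they are not the text of the original example.

```perl
use strict;
use warnings;

# Hypothetical sample lines; each one exercises one of the substitutions.
my @lines = (
    "Report of trunk alarms",
    "Robert called on Monday",
    "Extra words not needed",
);

for (@lines) {               # $_ is an alias, so each element is changed in place
    s/trunk alarms/link alarms/;
    s/Robert/Bob/;
    s/Monday/Tuesday/;
    s/not needed//;          # an empty replacement deletes the matched text
}

print "$_\n" for @lines;
```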
Note: The forward slashes are only the default delimiters for regular expressions and substitution patterns. You can specify another nonalphanumeric character for this purpose by preceding the first delimiter with m for a match, as in m#Monday#, or with s for a substitution, as in s#Monday#Tuesday#. See the online Perl reference guide (http:/idDocs/scripts/perl) for further details.
#!/usr/thirdParty/perl/bin/perl -w
while (<>) {                        # read each line from STDIN/files
    if (/pattern-to-match/) {
        # stuff to do when a match is found
    } # end if
    else {
        # do this other stuff
    } # end else
} # end while
To rewrite Example 91 so that the match is made against the variable $var, rewrite the if clause as follows:
Example 92 Performing a Match Against a Variable
Alternatively, you could use the negation regexp operator (!~) to reverse the logic of the if clause:
Example 93 Performing a Negative Match Against a Variable
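The two binding operators can be compared side by side in a short sketch. The contents of $var and the patterns are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my $var = "Trunk alarm on N20C6";     # hypothetical input text

# =~ is true when the pattern is found in $var
my $has_alarm = ($var =~ /alarm/) ? 1 : 0;

# !~ is true when the pattern is NOT found in $var
my $lacks_link = ($var !~ /link/) ? 1 : 0;

if ($var =~ /alarm/) {
    print "The variable mentions an alarm\n";
}
if ($var !~ /link/) {
    print "The variable does not mention a link\n";
}
```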
Substitutions are handled the same way as simple pattern matching: the substitution is done against the default variable for standard input ($_) unless another variable is specified by using the =~ operator.
Example 94 Performing Substitutions
$_ = "This line is from the default variable";
$var = "This line is from the variable";
s/ is / was /;      # $_ = "This line was from the default variable"
$var =~ s/
You can match against more than one pattern by using the OR operator (|) to separate the different patterns. For example, the following regular expression matches either up or down:
/up|down/
Whenever a match is successfully done (but not a substitution), Perl automatically fills in the following read-only variables:
$&    contains the text that was first matched
$`    contains the text that appeared on the line before the first match
$'    contains the text that appeared on the line after the first match
These variables are not set until a successful match is made, but when set they remain so until the next successful match is done. These variables are also read-only, so your program cannot change their values. Therefore, if you want to save the information in these variables, you should copy their values into your own variables.
By default, a match or substitution is case-sensitive; /abc/ matches only the lowercase letters "abc", not "ABC". To change this, append the ignore case operator (i) at the end of the regular expression: /abc/i or s/abc/def/i.
Similarly, a match or substitution is done only on the first successful match on a line. To have a match or substitution occur everywhere possible on a line, append the global operator (g) to the regular expression: /abc/g or s/abc/def/g.
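The match variables and the i and g operators can be tried out together in a short sketch. The sample strings and variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

# Match variables: copy them right after a successful match.
my $line = "Link is down again";
my ($before, $matched, $after) = ('', '', '');
if ($line =~ /up|down/) {
    ($before, $matched, $after) = ($`, $&, $');
}
print "before=[$before] matched=[$matched] after=[$after]\n";

# Operators: work on copies so the original text stays intact.
my $text = "abc ABC abc";
(my $first_only = $text) =~ s/abc/def/;     # first lowercase match only
(my $everywhere = $text) =~ s/abc/def/g;    # every lowercase match
(my $any_case   = $text) =~ s/abc/def/gi;   # every match, either case
print "$first_only | $everywhere | $any_case\n";
```

Copying the three match variables into ordinary variables immediately, as the text recommends, protects the values from being replaced by the next successful match.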
#!/usr/thirdParty/perl/bin/perl -w
while (<>) {                        # read from file/STDIN
    chop;                           # chop off the newline char
    $name = $_;                     # assign the line of input to $name
    if ($name ne "Robert") {        # is the name Robert?
        print "$name\n";            # if not, then print it
    } # end of if
} # end of while
Text can also be replaced in this manner, as shown in the script in Example 96, which prints all lines unchanged except for "Robert", which is printed as "Bob":
Example 96 A Simple Substitution
#!/usr/thirdParty/perl/bin/perl -w
while (<>) {                        # read from file/STDIN
    chop;                           # chop off the newline char
    $name = $_;                     # assign the line of input to $name
    if ($name ne "Robert") {        # is the name Robert?
        print "$name\n";            # if not, then print it
    } # end if
    else {                          # otherwise, if the name is "Robert"
        $name = "Bob";              # replace "Robert" with "Bob"
        print "$name\n";            # and print the new name
    } # end of if/else
} # end of while
The matching and substitution in these examples is extremely simple. The only time a match can be made is when the input line contains nothing but the name Robert; lines containing text such as robert or Robert Johnson are not matched. Also, this sort of matching requires an if or else (or elsif) construction for each possible match. This is not too much of an inconvenience in Example 95 and Example 96 since they have only two possible conditions: either the text matches Robert or it does not. However, this approach rapidly becomes unwieldy if you want to match more than one pattern. For these reasons, regular expressions are typically used as shown in Example 97:
Example 97 A Simple Use of Regular Expressions
#!/usr/thirdParty/perl/bin/perl -w
while (<>) {                        # read from file/STDIN
    chop;                           # chop off the newline char
    $name = $_;                     # assign the line of input to $name
    unless ($name =~ /Robert/) {    # is the name Robert?
        print "$name\n";            # if not, then print it
    } # end of unless
} # end of while
The program shown in Example 97 is almost identical to that shown in Example 95, except that the if clause has been replaced by an unless clause, and its test has become a regular expression that matches Robert wherever it appears on the input line. For example, this program would not print any of the following lines since they all contain the text Robert at some point:
Robert
Roberta
Robert Johnson
My name is Robert
abcdefgRoberthijklmnopqrstuvwxyz
This program can be enhanced by the addition of the letter i after the regular expression, which instructs Perl to ignore the case of the letters. Because of this, the program in Example 98 ignores all lines containing Robert, robert, or any other permutation of upper and lowercase letters in that name. Example 98 also has been simplified to eliminate the redundant use of the $name variable by using the default variable $_.
Example 98 Specifying Case-Insensitive Regular Expressions
#!/usr/thirdParty/perl/bin/perl -w
while (<>) {                # read from file/STDIN
    unless (/Robert/i) {    # does the line contain Robert, robert,
                            # or any variation?
        print;              # if not, then print it
    } # end of unless
} # end of while
Regular expressions also greatly simplify substitutions. For example, Example 99 is a rewrite of the script shown in Example 96 (page 144). However, whereas the previous example needed if and else clauses to determine whether to change the line before printing, the rewritten script does not because the substitution process is automatic: a substitution is made only if Robert (or a variation such as robert) appears on the line:
Example 99 A Simple Substitution Using Regular Expressions
#!/usr/thirdParty/perl/bin/perl -w
while (<>) {            # read from file/STDIN
    s/Robert/Bob/i;     # if Robert exists in either upper or
                        # lowercase letters, change to Bob
    print;              # print the line
}                       # end of while
The script in Example 99 prints out all lines from the input file (or STDIN), substituting Bob for Robert whenever it appears. However, this substitution is pretty simplistic and has two potential problems.
The substitution is done for any form of the name Robert. Thus, names such as Roberta, Roberto, and Robertson become Boba, Bobo, and Bobson respectively. This is probably not what is intended.
The substitution is done for only the first occurrence of Robert on the input line. A line reading "Robert, my name is Robert" becomes "Bob, my name is Robert". The second appearance of Robert is ignored by the substitution expression because, unless otherwise specified, regular expressions find only the first match on a line.
The first problem can be solved by limiting a match only to Robert when it appears as a complete word, not when it is part of a larger word. This is done by bracketing the text with the word boundary marker \b. The regular expression \bRobert\b specifies that a match can be made only when the text Robert is both preceded and followed by whitespace or other word terminators such as punctuation. As a result, words such as Roberta and Robertson are not matched.
The second problem can be solved by adding the global operator (g) at the end of the regular expression. This indicates that the substitution should be done at all places on the line where a match is made. Example 100 shows the modified script:
Example 100 Enhancing a Simple Substitution
#!/usr/thirdParty/perl/bin/perl -w
while (<>) {                    # read from file/STDIN
    s/\bRobert\b/Bob/ig;        # if Robert exists in either upper
                                # or lowercase letters, but only as
                                # a complete word, change to Bob
                                # everywhere on the line
    print;                      # print the line
} # end of while
The script shown in Example 100 thus can make more intelligent substitutions, such as the following:
Example 101 Typical Substitutions
becomes    My name is Bob.
becomes    Bob robertson
becomes    Bob, Bob
remains    My name is Roberta.
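The behavior of the word-boundary substitution can be checked with a short sketch. The input lines in @input are assumptions, chosen so that the substitution from Example 100 produces the results listed in Example 101; they are not taken from the original text.

```perl
use strict;
use warnings;

# Hypothetical input lines.
my @input = (
    "My name is Robert.",
    "robert robertson",
    "Robert, Robert",
    "My name is Roberta.",
);

my @output = @input;                    # keep the originals for comparison
s/\bRobert\b/Bob/ig for @output;        # same substitution as Example 100

for my $i (0 .. $#input) {
    my $verb = $input[$i] eq $output[$i] ? "remains" : "becomes";
    print "$input[$i]  $verb  $output[$i]\n";
}
```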
The word boundary marker (\b) is one of four anchoring patterns that can be used to limit a match to a specific situation. Table 13 lists these patterns and illustrates their use:

Table 13 Anchoring Patterns for Regular Expressions

Anchor Pattern   Description
\b               Matches a word boundary.
                 Example: /\bmail\b/ matches "mail" but not "email" nor "mailman".
\B               Matches anything but a word boundary.
                 Example: /\Bmail/ matches "emailer" and "remail" but not "mail" nor "mailman".
                 Example: /mail\B/ matches "mailman" and "emailer" but not "mail" nor "email".
^                Matches the beginning of a line.
                 Example: /^mail/ matches "mail" when it appears in the line "mail is easy to use" but not when it appears in the line "I find mail easy to use."
$                Matches the end of a line.
                 Example: /mail$/ matches "mail" when it appears in the line "I like mail" but not in the lines "I like mail." (because the line ends with a period) nor the line "I like mail sometimes."
The anchoring patterns shown in Table 13 are a subset of a large number of special characters that can be used to limit a match using regular expressions. Most of these special characters are used to match nonprintable characters (such as the newline character), but some of them refer to a group of characters (such as whitespace characters or numeric digits). See Table 14:
Table 14 Special Characters for Regular Expressions (1 of 2)

Special Character   Description
\d                  Matches any digit (0-9).
                    Example: /N\d\d/ matches N20 or N91 but not NOPr.
\D                  Matches any non-digit (any character that is not 0-9).
                    Example: /N\D\D/ matches Nor or Not but not N00 through N99.
\f                  Matches a formfeed character (ASCII 12 or CTRL-L).
                    Example: /end of page\f/ matches the words "end of page" only if they are immediately followed by a formfeed character.
\n                  Matches a newline character (ASCII 10 or CTRL-J).
                    Example: /end of line\n/ matches the words "end of line" only if they are immediately followed by a linefeed character.
\r                  Matches a carriage return character (ASCII 13 or CTRL-M).
                    Example: /end of line\r/ matches the words "end of line" only if they are immediately followed by a carriage return character (this match is not usually used in Unix systems, where the newline character is defined as only a linefeed, but it could be useful for files from DOS or Macintosh computers that use the carriage return).
Table 14 Special Characters for Regular Expressions (2 of 2)

Special Character   Description
\s                  Matches any whitespace character (space, tab, newline, carriage return, or formfeed).
                    Example: /Node\sCard/ matches the words Node and Card only if they are separated by one space, tab, newline, carriage return, or formfeed character.
\S                  Matches any nonwhitespace character.
                    Example: /Node\SCard/ matches the words Node and Card only if they are not separated by a whitespace character (for example: Node-Card or NodesCard).
\t                  Matches a tab character (ASCII 9 or CTRL-I).
                    Example: /Node\tCard/ matches the words Node and Card only if they are separated by one tab character.
\w                  Matches any word character (0-9, a-z, A-Z, or the underscore).
                    Example: /N\w\w\w/ matches N120, N_21, Node, and Nick, but not N.12 or N1-2 (since the period and hyphen characters are not word characters).
\W                  Matches any non-word character.
                    Example: /N\W/ matches N., N-, or the letter N followed by whitespace, but not No or N1.
Finally, to match a character with a specific ASCII value, specify that value in one of the three ways shown in Table 15:
Table 15 Specifying an ASCII Value

Pattern   Description
\OOO      OOO specifies an octal value.
          Example: /\15/ matches ASCII 13 (CTRL-M); /\141/ matches ASCII 97 (the letter a).
\xHH      HH specifies a hexadecimal value.
          Example: /\x0D/ matches ASCII 13 (CTRL-M); /\x61/ matches ASCII 97 (the letter a).
\cX       X specifies a control character (A-Z).
          Example: /\cM/ matches ASCII 13 (CTRL-M); /\cZ/ matches ASCII 26 (CTRL-Z).
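The three notations can be confirmed to be equivalent with a short sketch. The variable names are assumptions, and the octal value for the carriage return is written with three digits (\015) so that it cannot be mistaken for a backreference.

```perl
use strict;
use warnings;

my $cr       = chr(13);     # carriage return (CTRL-M)
my $letter_a = chr(97);     # the letter a

# Octal, hexadecimal, and control-character forms of ASCII 13
my @cr_hits = map { $cr =~ $_ ? 1 : 0 } (qr/\015/, qr/\x0D/, qr/\cM/);

# Octal and hexadecimal forms of ASCII 97
my @a_hits  = map { $letter_a =~ $_ ? 1 : 0 } (qr/\141/, qr/\x61/);

print "CR patterns matched: @cr_hits\n";
print "Letter patterns matched: @a_hits\n";
```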
The special characters given in Table 13, Table 14, and Table 15 can be combined as desired to find very specific patterns. For example, if you log in to a node using the Operator Interface and query the event log, the events are displayed in the following format:
Example 102 Typical Event Log
*** Event Record from Event Log on Node 20(NODE20) ***
    Event Type = TRUNK (2), Subtype = 5
    Orig Node = 20(NODE20), Orig TaskId = TRUNK (16.6)
    Occurred at 16:55:52 TODAY, Sequence Nbr = 0
    Alarm Level = MINOR, Network Event Log = NO
>>> Card N20C6 experienced a SUPER FRAME LOSS.
    TxStatus was 10, RxStatus was 70.
To print out only the second line that describes the Event Type and the sixth line that contains a description of the event, you could use the following script:
Example 103 Searching for Multiple Matches
#!/usr/thirdParty/perl/bin/perl -w
while (<>) {                            # read from file/STDIN
    if (   /^\s\s\s\sEvent Type/i       # if found Event Type
        || /^>>>\s\w/                   # or if found ">>> "
        || /\.$/)                       # or line ending with a period
    { print; }                          # print the line
} # end of while
The script in Example 103 uses the logical OR operator (||) to specify three regular expressions to be matched; the input line is printed if at least one match is made. The first regular expression (/^\s\s\s\sEvent Type/i) searches for a line that begins with four whitespace characters followed by the text "Event Type" in either upper or lowercase. The second regular expression (/^>>>\s\w/) searches for a line that begins with three right angle brackets followed by a whitespace character and a word character (a letter, a digit, or an underscore). The ignore operator (i) is not needed in this case because the word special character (\w) matches both upper and lowercase letters automatically. The third regular expression (/\.$/) searches for a line that ends with a period, assuming that such lines are a continuation of the description that began in the sixth line, as shown in Example 102. This, though, might not be a safe assumption to make, and you might end up printing unwanted lines. In fact, although the script in Example 103 does perform as intended in the great majority of cases, it makes a number of assumptions (such as that there will always be four whitespace characters before an Event Type line) that might not hold true in all cases. This script could be significantly improved by using wildcard specifiers, which are described in the next section.
Table 16 Perl Wildcards (2 of 2)

Wildcard Character (see note 1)   Description
{x,y}          The curly braces match a specific number of consecutive instances of the immediately previous character, where x and y define the range of allowable matches. For example: /1{5,10}/ matches from five through ten consecutive instances of the number 1. Both the comma and second number of the range are optional. If the comma is present but not the second number, the second number is assumed to be infinity. Thus, /\s{5,}/ matches five or more whitespace characters. If both the comma and second number are missing, the first number specifies the exact number of characters that must be found. Thus, /\s{5}/ matches exactly five whitespace characters. If six whitespace characters are present on a line, only the first five are matched.
[char list]    The square brackets specify a match with any of the enclosed characters. The characters can be listed singly ([abcdefg]) or as a range ([a-g]). For example: [0-9] is equivalent to the \d special character and matches any single digit; [a-zA-Z0-9_] is equivalent to the \w special character and matches any word character.
[^char list]   When the opening bracket is immediately followed by a caret (^), the square brackets specify a match with anything except the enclosed characters. The characters can be listed singly ([^abcdefg]) or as a range ([^a-g]). For example: [^0-9] is equivalent to the \D special character and matches any single character except a digit; [^a-zA-Z0-9_] is equivalent to the \W special character and matches any single character except a word character.

1. To match any of the wildcard characters themselves, put a backslash before the character. For example: /\./ matches the period (.) character, /\*/ matches one asterisk character, /\?/ matches a single question mark, and so forth.
2. Be careful when using the asterisk (*) and question mark (?) since they are optional matches (they can match zero instances of a character or pattern). You might think the expression /\d*\.?\d*/ matches any set of decimal numbers (such as 1.0 or 0.899) but in reality it always matches any string because all of the search patterns are optional. When using * and ? in regular expressions, be sure you include at least one non-optional pattern to ensure a valid match is made.
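The warning about optional matches, and the exact-count form of the curly braces, can be illustrated with a short sketch. The sample strings, the variable names, and the stricter pattern /^\d+\.\d+$/ are assumptions made for illustration; the stricter pattern is only one possible way of requiring at least one non-optional element.

```perl
use strict;
use warnings;

# Every element is optional, so this "matches" text with no digits at all.
my $loose_hit  = ('no digits here' =~ /\d*\.?\d*/) ? 1 : 0;
my $loose_text = $&;    # the empty string matched at the start of the line

# Requiring at least one digit on each side of the period is stricter.
my @candidates = ('1.0', '0.899', 'abc', '.', '12');
my @decimals   = grep { /^\d+\.\d+$/ } @candidates;

# {5} matches exactly five characters even when six are available.
my $gap     = 'a' . (' ' x 6) . 'b';
my $gap_len = ($gap =~ /\s{5}/) ? length($&) : 0;

print "loose pattern matched: $loose_hit, text length ", length($loose_text), "\n";
print "decimals: @decimals\n";
print "whitespace matched: $gap_len\n";
```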
Note: Do not confuse Perl's wildcards with those used on the Unix command line. When used on the command line, the asterisk (*) refers to zero or more of any characters; also, the question mark (?), not the period (.), is used on the command line to represent any single character.
Perl's wildcards are governed by two overriding rules:
1. Given a choice, earlier matches take precedence over matches that start later in a line.
2. If more than one match is possible with a given starting point, Perl returns the longest possible match.
This behavior is called greedy and is why /.*/ matches everything on a line except the newline character. This greedy behavior of wildcards, though, can be both useful and troublesome. If, for example, you wanted to find the last word in a sentence, you might consider using /\b.*\./ because you assume it finds only the last word boundary before a period. In reality, this regular expression returns everything from the first word boundary on the line through the last period. Example 104 shows a Perl program that demonstrates this aspect of wildcards:
Example 104 Demonstrating a Greedy Match
#!/usr/thirdParty/perl/bin/perl -w
while (<>) {            # read from file/STDIN
    /\b.*\./;           # match last word in sentence
    print "$&\n";       # print what was matched (using the
                        # read-only variable Perl defines
                        # for this purpose)
}                       # end while
To find only the last word in this sentence (the word "behavior"), you must modify the script so it uses a more specific regular expression, in this case the \w+ expression, which matches only word characters, not spaces nor punctuation. See Example 105:
Example 105 Demonstrating a Less Greedy Match
#!/usr/thirdParty/perl/bin/perl -w
while (<>) {              # read from file/STDIN
    /\b\w+\./;            # match last word in sentence
    print "$&\n";         # print what was matched (using the
                          # read-only variable Perl defines
                          # for this purpose)
}                         # end while
When using wildcards, therefore, choose the expression that is as specific as possible (such as using \w+ instead of .*). This makes it less likely that the greedy behavior of wildcards will end up matching more than you intended.
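The difference between the greedy and the more specific pattern can be seen without an input file. The sketch below is illustrative only: the sample sentence and the matched_text helper are assumptions, and it assumes a Perl 5 interpreter recent enough to support qr// and my.

```perl
use strict;
use warnings;

# Returns the text matched by the given pattern, or undef if
# the pattern does not match the line at all.
sub matched_text {
    my ($line, $pattern) = @_;
    return ($line =~ $pattern) ? $& : undef;
}

# Made-up input line standing in for a line read from a file.
my $line = "Wildcards show greedy behavior.";

# The pattern from Example 104: .* grabs as much as it can.
my $greedy = &matched_text($line, qr/\b.*\./);

# The pattern from Example 105: \w+ cannot cross a space.
my $specific = &matched_text($line, qr/\b\w+\./);

print "greedy:   $greedy\n";
print "specific: $specific\n";
```

With this input, the greedy pattern returns the whole sentence, while the \w+ version returns only the final word and its period.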
Note: The question mark (?), asterisk (*), and plus sign (+) operators can be made ungreedy by appending another question mark after them (for example: ??, *?, +?). This greatly changes their behavior and should be used only by Perl experts.
The variety of wildcards available in Perl allows you to match a wide range of very specific patterns that would be very difficult to match otherwise. For example, to search an event log to find all lines that contain a node or card number, without using wildcards, you would have to use all of the following regular expressions:
Example 106 Matching Node and Card Numbers Without Wildcards
/N\d/                 # to find N0 through N9
/N\d\d/               # to find N10 through N99
/N\d\d\d/             # to find N100 through N250
/N\dC\d/              # to find N0C0 through N9C9
/N\d\dC\d/            # to find N10C0 through N99C9
/N\d\d\dC\d/          # to find N100C0 through N250C9
/N\dC\d\d/            # to find N0C10 through N9C99
/N\d\dC\d\d/          # to find N10C10 through N99C99
/N\d\d\dC\d\d/        # to find N100C10 through N250C99
/N\dC\d\d\d/          # to find N0C100 through N9C127
/N\d\dC\d\d\d/        # to find N10C100 through N99C127
/N\d\d\dC\d\d\d/      # to find N100C100 through N250C127
If you also wanted to find port and bundle numbers, you would have to add another 48 possible patterns to ensure you found all possibilities. Wildcards, however, simplify all of this to one single regular expression, as shown in Example 107:
Example 107 Matching Node and Card Numbers With Wildcards
#!/usr/thirdParty/perl/bin/perl -w
while (<>) {                  # read from file/STDIN
    if (/N\d+C?\d*/)          # if node/card #
        { print; }            # print the line
}                             # end while
The single line shown in Example 107 matches anything that starts with one N and is followed by one or more digits (such as N10 or N204). It also matches an N followed by one or more digits and a C followed by zero or more digits (such as N10C23 or N204C1). Modifying this script to also search for port or bundle numbers is trivial, as shown in Example 108:
Example 108 Matching Node, Card, Port, and Bundle Numbers With Wildcards
#!/usr/thirdParty/perl/bin/perl -w
while (<>) {                      # read from file/STDIN
    if (/N\d+C?\d*[BP]?\d*/)      # if node/card/port/bundle #
        { print; }                # print the line
}                                 # end while
Actually, though, the regular expression in Example 108 is a bit redundant since testing for the card, port, and bundle numbers is unnecessary: testing for the node number also finds any card, port, and bundle numbers since they always include a node number as well. However, this script can be easily modified to print only those lines that contain a port number or a bundle number. Example 109 shows the same script, except that now a line must include a card number and either a port or bundle number to be printed:
Example 109 Matching Port and Bundle Numbers With Wildcards
#!/usr/thirdParty/perl/bin/perl
while (<>) {                      # read from file/STDIN
    if (/N\d+C\d+[BP]\d+/)        # if port or bundle #
        { print; }                # print the line
}                                 # end while
Note: When devising complex regular expressions, it is recommended you use the approach shown in Example 107 through Example 109: start off by writing and testing a regular expression that matches the common features of a desired pattern (in this case, the node and card numbers). Then make the regular expression more specific in a step-by-step manner, testing each step to make sure the proper patterns are being matched.
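One way to follow this step-by-step approach is to run each candidate pattern against a small set of known identifiers before using it on a real log. The sketch below is only an illustration: the sample identifiers, the labels, and the matching_samples helper are assumptions, while the three patterns are the ones from Example 107 through Example 109. It assumes a Perl 5 interpreter that supports qr// and my.

```perl
use strict;
use warnings;

# Returns the candidate strings that the given pattern matches.
sub matching_samples {
    my ($pattern, @candidates) = @_;
    return grep { $_ =~ $pattern } @candidates;
}

# Patterns from Examples 107 through 109, least to most specific.
my @labels   = ("node/card", "node/card/port/bundle", "port or bundle");
my @patterns = (qr/N\d+C?\d*/, qr/N\d+C?\d*[BP]?\d*/, qr/N\d+C\d+[BP]\d+/);

# Made-up identifiers used to test each step.
my @samples = ("N75", "N10C12", "N10C12P13", "N23C7B1", "no ids");

foreach my $i (0 .. $#patterns) {
    my @hits = &matching_samples($patterns[$i], @samples);
    print "$labels[$i]: @hits\n";
}
```

With these samples, the first two patterns accept every identifier that contains a node number, and only the third narrows the result to the identifiers that include a port or bundle number.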
Wildcards can also be used for substitutions, as shown in Example 110, which substitutes the words "PORT/BUNDLE" for every port or bundle number found on a line:
Example 110 Substituting Port and Bundle Numbers
#!/usr/thirdParty/perl/bin/perl
while (<>) {                                  # read from file/STDIN
    s/N\d+C\d+[BP]\d+/PORT\/BUNDLE/g;         # do substitution on
                                              # all port/bundle #s
    print;                                    # print the line
}                                             # end while
Although this sort of blanket pattern substitution might be useful on occasion, wildcard substitutions are more often done using Perl's ability to separate a regular expression into subpatterns using parentheses. When a set of parentheses surrounds a portion of a regular expression, that part of the expression can be referenced later by using a special set of read-only substitution variables: $1 refers to the first parenthetical expression (as read from left to right), $2 refers to the second such expression, and so forth.
Note: Perl also allows you to refer to the read-only substitution variables as \1 through \9, but this usage is discouraged since it does not work in all situations.
Example 112 shows the port/bundle substitution script rewritten to include parenthetical expressions so that more intelligent substitutions can be done:
Example 112 Substitution Using Read-Only Variables
#!/usr/thirdParty/perl/bin/perl -w
%pbwords = ("B", "Bundle", "P", "Port", "b", "Bundle", "p", "Port");
                      # define an associative array to translate
                      # "P" and "B" to the corresponding words
while (<>) {          # read from file/STDIN
    s/N(\d+)C(\d+)([BP])(\d+)/Node $1, Card $2, $pbwords{$3} $4/ig;
                      # do substitution using
                      # memorized values
    print;            # print the line
}                     # end while
The script in Example 112 still searches for port and bundle numbers, but now when one is found, it is converted from the NCP or NCB format into one that spells out the words node, card, port, and bundle. To achieve this, the new script has undergone three major changes from the previous one:
The script now starts out by defining an associative array (%pbwords) which maps the letters b and B to the word Bundle and the letters p and P to the word Port. Since the regular expression matches either port or bundle numbers, this array uses the third letter (B or P) as the key to determine which word (Bundle or Port) should be printed. Since the regular expression is case-insensitive, the array needs to include both upper and lowercase letters.
The matching portion of the regular expression has been rewritten to include four sets of parentheses:
N(\d+)C(\d+)([BP])(\d+)
If a match is made, the first four read-only substitution variables ($1, $2, $3, and $4) are filled in with whatever was matched. Specifically, the $1, $2, and $4 variables are set to the actual node, card, and port or bundle numbers that were matched. The $3 variable is filled with either a B, b, P, or p, depending on what was found in the input line.
Note: The read-only substitution variables are set to new values only when a regular expression contains the appropriate number of parenthetical expressions AND a match is made. If no matches are ever made, these variables remain empty, and once a match is made, these variables retain their values until another match is made. As a general rule, therefore, assume these variables do not contain valid data unless the last regular expression had a successful match.
The substitution portion of the regular expression has been rewritten so that it can substitute the words Node, Card, Port, and Bundle in front of the appropriate node, card, port, and bundle numbers:
Node $1, Card $2, $pbwords{$3} $4
Because the third variable ($3) could contain either a B or P (in either upper or lowercase), the substitution uses the %pbwords associative array to determine what word should be used. The word Bundle is substituted for either a B or b and the word Port is substituted for either a P or p in the original line.
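The caution in the note above, that the read-only substitution variables keep their old values after a failed match, can be sketched as follows. This is an illustration under stated assumptions: the two sample strings and the node_number helper are made up, and the script assumes a Perl 5 interpreter that supports my, use strict, and use warnings.

```perl
use strict;
use warnings;

# Returns the node number found in $text, or undef when the pattern
# does not match. Testing the match result first keeps a stale $1
# from an earlier match from being returned by mistake.
sub node_number {
    my ($text) = @_;
    if ($text =~ /N(\d+)C(\d+)/) {
        return $1;
    }
    return undef;
}

my $good_line = "N10C12";
my $bad_line  = "no identifiers";

$good_line =~ /N(\d+)C(\d+)/;   # succeeds: $1 is now "10"
$bad_line  =~ /N(\d+)C(\d+)/;   # fails: $1 is left unchanged
my $stale = $1;                 # still holds the earlier value

print "\$1 after the failed match: $stale\n";

my $checked = &node_number($bad_line);
print "checked lookup: ", (defined($checked) ? $checked : "no match"), "\n";
```

The unchecked code still sees the node number from the first line after the second match fails, whereas the helper reports that nothing was found.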
These changes to Example 112 allow more intelligent substitutions to be performed, where the original text determines the type of substitution that should be done:
Example 113 Examples of Substitutions Using Read-Only Variables
N75          remains    N75
N10C12       remains    N10C12
N10C12P13    becomes    Node 10, Card 12, Port 13
N23C7B1      becomes    Node 23, Card 7, Bundle 1
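The substitutions listed above can be reproduced without an input file by wrapping the substitution from Example 112 in a subroutine. This is a sketch, not the manual's own script: the expand_ids name is an assumption, the subroutine works on a copy of its argument instead of on $_, and it assumes a Perl 5 interpreter that supports my, use strict, and use warnings.

```perl
use strict;
use warnings;

# Translation table from Example 112.
my %pbwords = ("B", "Bundle", "P", "Port", "b", "Bundle", "p", "Port");

# Applies the Example 112 substitution to a copy of $text and
# returns the result; text without a port or bundle number is
# returned unchanged.
sub expand_ids {
    my ($text) = @_;
    $text =~ s/N(\d+)C(\d+)([BP])(\d+)/Node $1, Card $2, $pbwords{$3} $4/ig;
    return $text;
}

foreach my $id ("N75", "N10C12", "N10C12P13", "N23C7B1") {
    print "$id -> ", &expand_ids($id), "\n";
}
```

Because the pattern requires a card number and a port or bundle number, the first two identifiers pass through untouched and only the last two are spelled out.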
Defining Subroutines
sub subroutine-name {
    insert one or more Perl statements here
}                     # end of subroutine definition
The rest of your Perl script then accesses the subroutine by attaching an ampersand (&) to the front of its name:
&subroutine-name; # call this subroutine
By default a subroutine can access any variables in your program; the only variables it cannot access are those defined as local within other subroutines (see "Defining Local Variables"). Many subroutines can therefore perform their needed functions without any arguments being passed to or from them. For example, the script in Example 115 uses a subroutine named convert_to_upper to translate whatever is in the default operator ($_) to uppercase. The script reads one line, calls the subroutine to convert it, and then prints the converted line.
Example 115 Using a Typical Subroutine
#!/usr/thirdParty/perl/bin/perl -w
while (<>) {                  # while reading files/STDIN
    &convert_to_upper;        # convert line to uppercase
    print;                    # print the uppercase line
}                             # end while
######
### SUBROUTINE DEFINITIONS
######
sub convert_to_upper {
    tr/a-z/A-Z/;              # translate whatever is in $_
                              # to uppercase
}                             # end of subroutine definition
While it is often convenient to have subroutines directly affect the data in the main program, this can lead to subroutines having unexpected side-effects on the rest of the program. To avoid this, you can instead pass data to and from subroutines, as described in the following sections.
For example, Example 116 is the same script shown in Example 115 except that now the variable $return is assigned the return value of the convert_to_upper subroutine. Since the last function or expression evaluated in this subroutine is the translate (tr) function, the return value is whatever tr returns (the number of characters translated by the function):
Example 116 Using a Typical Subroutine
#!/usr/thirdParty/perl/bin/perl -w
while (<>) {                          # while reading files/STDIN
    $return = &convert_to_upper;      # convert line to uppercase
    print;                            # print the uppercase line
    print "$return characters were converted.\n\n";
                                      # show # of chars translated
}                                     # end while
######
### SUBROUTINE DEFINITIONS
######
sub convert_to_upper {
    tr/a-z/A-Z/;                      # translate whatever is in $_
                                      # to uppercase
}                                     # end of subroutine definition
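The return value of this subroutine can be checked without an input file by loading $_ directly. The sketch below is illustrative only: the fixed input string and the $count variable are assumptions, and it assumes a Perl 5 interpreter that supports my, use strict, and use warnings.

```perl
use strict;
use warnings;

# Same subroutine as in the example: tr acts on $_, and because it
# is the last expression evaluated, its result (the number of
# characters translated) becomes the subroutine's return value.
sub convert_to_upper {
    tr/a-z/A-Z/;
}

# Made-up input line standing in for a line read from STDIN.
$_ = "Node n10 is up.\n";

my $count = &convert_to_upper;   # $_ is changed as a side effect
print;                           # prints the uppercase line
print "$count characters were converted.\n";
```

Only the lowercase letters are counted, so the capital N, the digits, the spaces, and the period do not contribute to the value returned.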
The return value can be anything, depending on whether the last function or expression evaluated was a number, a string, a simple array, or an associative array. This final expression can be as simple as a variable assignment or as complex as any of Perl's functions. For example, the script in Example 117 reads a line of input, calls a subroutine to divide the line into individual words, and returns those words in an array. This array is then sorted and printed as a list.
Example 117 A Subroutine that Returns an Array as a Value
#!/usr/thirdParty/perl/bin/perl -w
while (<>) {                     # while reading files/STDIN
    @return = &tokenize;         # put line in tokens (words)
    # Combine the tokens in @return into a sorted comma-separated
    # string and print them
    print("The words found in the line are:\n");
    $token_list = join(', ',sort(@return));
    print("\t$token_list\n\n");
}                                # end while
######
### SUBROUTINE DEFINITIONS
######
sub tokenize {
    s/[^\w\s]//g;                # eliminate all non-word, non-
                                 # whitespace characters
    split(/\s+/);                # split line into words and
                                 # return as an array
}                                # end of subroutine definition
Example 117 works because the split function automatically returns its tokens in a simple array, and since the split function was the last expression evaluated in the subroutine, what it returns becomes the subroutine's return value as well. This script could be simplified, though, by removing the references to the @return array, and having the join function operate directly on the output of the tokenize subroutine. See Example 118:
Example 118 Modified Subroutine that Returns an Array as a Value
#!/usr/thirdParty/perl/bin/perl -w
while (<>) {                     # while reading files/STDIN
    # Tokenize the line and combine the returned tokens into a
    # comma-separated string and print them
    print("The words found in the line are:\n");
    $token_list = join(', ',sort(&tokenize));
    print("\t$token_list\n\n");
}                                # end while
######
### SUBROUTINE DEFINITIONS
######
sub tokenize {
    s/[^\w\s]//g;                # eliminate all non-word, non-
                                 # whitespace characters
    split(/\s+/);                # split line into words and
                                 # return as an array
}                                # end of subroutine definition
You might also think that you could simplify the tokenize subroutine into a single line, as shown in Example 119:
Example 119 Poorly Modified Subroutine that Returns an Array as a Value
#!/usr/thirdParty/perl/bin/perl -w
while (<>) {                     # while reading files/STDIN
    # Tokenize the line and combine the returned tokens into a
    # comma-separated string and print them
    print("The words found in the line are:\n");
    $token_list = join(', ',sort(&tokenize));
    print("\t$token_list\n\n");
}                                # end while
######
### SUBROUTINE DEFINITIONS
######
sub tokenize {
    split(/\s+/,($_ =~ s/[^\w\s]//g));
                                 # split line into words and
                                 # return as an array
}                                # end of subroutine definition
However, if you make this modification the script no longer prints out a list of words; at most it prints out only one number. This is because the substitution function, like the tr function, returns a count of the changes it made rather than the changed string. The split function ends up operating on this number, doing operations such as split(/\s+/, 1) instead of split(/\s+/,"This is an input line"). This example points out the need to make sure you know what values are being returned by both functions and subroutines. For this purpose, it is highly recommended that you write your scripts as simply and straightforwardly as possible, particularly near the end of subroutines, so that there is no doubt about what value is being returned.
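The two versions of the subroutine can be compared side by side on a fixed line. This is a sketch under stated assumptions: the names tokenize_ok and tokenize_bad and the sample line are made up, both subroutines take the line as an argument and work on a copy instead of on $_, and the script assumes a Perl 5 interpreter that supports my, use strict, and use warnings.

```perl
use strict;
use warnings;

# Two steps: strip the punctuation first, then split the cleaned line.
sub tokenize_ok {
    my ($line) = @_;
    $line =~ s/[^\w\s]//g;
    return split(/\s+/, $line);
}

# One step, as in Example 119: split receives the value returned by
# the substitution (a count), not the cleaned line.
sub tokenize_bad {
    my ($line) = @_;
    return split(/\s+/, ($line =~ s/[^\w\s]//g));
}

# Made-up input line containing two punctuation characters.
my $input = "This is, an input line!";

my @good = &tokenize_ok($input);
my @bad  = &tokenize_bad($input);

print "two steps: @good\n";
print "one step:  @bad\n";
```

The two-step version returns the five words, while the one-step version returns a single element holding the number of substitutions that were made.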
sub subroutine-name {
    local($local-var1, $local-var2, @local-array);
    insert one or more Perl statements here
}                     # end of subroutine definition
Example 120 shows three local variables being created: two scalar variables and one array. There is no limit to the type of local variables you can create or to the number of local statements you have within a subroutine.
Note: The local operator can be used anywhere within a subroutine, but it is good programming practice to put all such statements at the beginning of a subroutine's declaration, as shown in Example 120, so it is immediately apparent what variables are local.
Local variables are created when a subroutine is called and exist only as long as the subroutine runs. Changing them does not affect any other variables in your script, even if those other variables have the same name as your local variables. For example, if a local variable has the same name as a global variable, the global variable's value is saved before the subroutine is called. While the subroutine runs, only the local variable is used, and when the subroutine ends, the local variable disappears and the global variable is restored with its previous value.
If a subroutine calls other subroutines, all of the variables in the top-level subroutine become global to the lower-level subroutines. Those lower-level subroutines must declare their variables local as well to prevent changing the top-level subroutine's variables. Example 121 is a simple demonstration of this: the main part of the script calls the subroutine sub1, which in turn calls the subroutines sub2 and sub3. Because sub2 has declared a local version of the default operator ($_), its print statement outputs the same line over and over again, instead of the input line that is output by the print statements in the main routine and the sub1 and sub3 subroutines.
Example 121 Using Local Variables in More than One Subroutine
#!/usr/thirdParty/perl/bin/perl -w
while (<>) {                      # while reading files/STDIN
    print "The main routine thinks the input line is:\n";
    print "\t$_\n";               # original input line
    &sub1;
}                                 # end while
######
### SUBROUTINE DEFINITIONS
######
sub sub1 {
    print "The sub1 subroutine thinks the input line is:\n";
    print "\t$_\n";               # original input line
    &sub2;
    &sub3;
}                                 # end of subroutine definition
sub sub2 {                        # sub2 defines a local $_
    local($_) = "sub2's input string\n";
    print "The sub2 subroutine thinks the input line is:\n";
    print "\t$_\n";               # always the sub2 version
}                                 # end of subroutine definition
sub sub3 {
    print "The sub3 subroutine thinks the input line is:\n";
    print "\t$_\n";               # original input line
}                                 # end of subroutine definition
Two things should be noted about the script in Example 121. First, as shown in the sub2 declaration, you can assign a value to a local variable when you declare it, using Perl's standard assignment rules concerning scalar variables and arrays. For example, all of the following are valid ways of assigning values to local variables:
Example 122 Assigning Values to Local Variables
local($str1,$str2);   # both $str1 and $str2 are undefined (equal to null)
                      # puts copy of default operator into both $str1 and $str2
                      # puts first two elements of array into $str1 and $str2
                      # copies global @array into local version
                      # tokenize global $_ and puts the words into local array
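The five cases described in Example 122 can be sketched as runnable code. Apart from local($str1,$str2); none of the statements below are taken from the manual: they are one plausible reading of each description, and the subroutine names, the @words array, and the sample data are assumptions. The sketch also assumes Perl 5.6 or later, where our is needed only to satisfy use strict.

```perl
use strict;
use warnings;

# Global (package) variables that the subroutines localize.
our ($str1, $str2) = ("global one", "global two");
our @array = ("john", "boy", "jill", "girl");
our @words;

sub undefined_locals {
    local($str1, $str2);                # both start out undefined
    return (defined($str1) || defined($str2)) ? "defined" : "undefined";
}

sub copy_default {
    local($str1, $str2) = ($_, $_);     # copy of $_ in both variables
    return "$str1|$str2";
}

sub first_two {
    local($str1, $str2) = @array;       # first two elements of @array
    return "$str1|$str2";
}

sub copy_array {
    local(@array) = @array;             # local copy of the global array
    $array[0] = "changed";              # alters the local copy only
    return "@array";
}

sub tokenize_default {
    local(@words) = split(/\s+/, $_);   # words of $_ in a local array
    return scalar(@words);
}

$_ = "a line of text";                  # made-up default input
print &undefined_locals(), "\n";
print &copy_default(), "\n";
print &first_two(), "\n";
print &copy_array(), "\n";
print &tokenize_default(), " words\n";
print "globals afterwards: $str1, $str2, @array\n";
```

The last line shows that the global variables hold their original values again once each subroutine has returned.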
Local variables can be given the same names as global variables without Perl getting confused, but extensive use of this feature can make debugging and maintaining your scripts a difficult matter. It is not immediately apparent, for example, when looking at the script in Example 121, when $_ refers to a global variable and when it refers to a local variable. Consider, therefore, adopting a naming convention for local variables that makes it easy to tell when a subroutine is using a local variable. For example, you could give all of your local variables a prefix such as sub_ ($sub_name, $sub_length) or a suffix such as _loc ($name_loc, $length_loc). Using a consistent naming convention for local variables not only simplifies the process of debugging your scripts but can also aid you when you revise a script later on.
Passing Arguments
You can pass any type of variable to a subroutine: scalar, array, associative array, or a mixture of these. Neither the number of variables nor their types need to be declared in advance because Perl automatically puts a subroutine's arguments into a local array named @_. When creating this array, Perl inserts the passed arguments into the array in the order they were passed to the subroutine. Scalar variables are assigned one array slot each, while arrays are copied in element by element.
A subroutine can access the @_ array as with any other array; it can act upon the array as a whole with operators such as sort, or it can act upon individual elements of the array ($_[0], $_[1], and so forth). Thus, if you pass a subroutine two scalar variables and an array with 12 elements, that subroutine's @_ array contains 14 separate elements. It is up to the subroutine to determine which elements came from which source and which ones it wants to use.
You can determine the number of elements either by setting a scalar variable equal to the array (such as $length=@_;) or by looping through the elements of the array until you reach the first one that is undefined. Example 123 demonstrates both methods:
Example 123 Subroutine Using an Array as Passed Argument
#!/usr/thirdParty/perl/bin/perl -w
while (<>) {                          # read files (or STDIN)
    &to_upper(split());               # call subroutine to convert
                                      # to uppercase, passing an
                                      # array as the argument list
}                                     # end while
sub to_upper {
    local($i,$length) = (0,0);        # initialize local variables
    $length = @_;                     # get number of array elements
    print("$length words were passed.\n\n");
    while ($_[$i]) {                  # as long as element is defined
        $_[$i] =~ tr/a-z/A-Z/;        # convert to uppercase
        print "Word #$i is: $_[$i]\n";   # and print it
        $i = $i + 1;                  # increment $i
    }                                 # end while
}                                     # end to_upper
The main routine in Example 123 simply reads a line from the input files (or STDIN), uses the split function to convert it into an array of words, and passes that array to the to_upper subroutine. The subroutine gets the length of the array and prints it, and then loops through the array, converting each individual word to uppercase and printing it.
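The way scalars and arrays are flattened into @_ can be seen with fixed arguments instead of input lines. The sketch below is a variation on the subroutine described above, not a copy of it: it returns its results instead of printing them, works on copies of the arguments, and tests each element with defined() so that a word such as 0 does not end the loop early. The argument values and variable names are assumptions, and the script assumes a Perl 5 interpreter that supports my, use strict, and use warnings.

```perl
use strict;
use warnings;

# Returns the number of elements passed, followed by an uppercase
# copy of each element.
sub to_upper {
    my $length = @_;              # array in scalar context gives the count
    my @upper;
    my $i = 0;
    while (defined($_[$i])) {     # stop at the first undefined element
        my $word = $_[$i];        # copy, so the caller's data is untouched
        $word =~ tr/a-z/A-Z/;
        push(@upper, $word);
        $i = $i + 1;
    }
    return ($length, @upper);
}

# Two scalars and a three-element array arrive as five elements.
my ($first, $second) = ("node", "card");
my @rest = split(/\s+/, "port 0 bundle");

my ($count, @result) = &to_upper($first, $second, @rest);
print "$count elements were passed: @result\n";
```

Inside the subroutine there is no trace of which elements came from the scalars and which came from the array; the caller and the subroutine have to agree on the order.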
Note: Do not confuse the @_ array and its elements ($_[0], $_[1], and so forth) with the default operator $_. In particular, $_[0] is not the same as $_.
Associative arrays can be passed as arguments as well, but be aware that because Perl automatically optimizes its storage of associative arrays, the order in which they are passed to a subroutine might not be the order in which they were created. For example, Example 124 shows a script that creates one regular array and one associative array with the same elements; both arrays are then passed to a subroutine that prints out all of their elements:
Example 124 Passing an Associative Array to a Subroutine
#!/usr/thirdParty/perl/bin/perl -w
%array = ("john","boy","jill","girl");
@array = ("john","boy","jill","girl");
&sub1(%array,@array);             # pass both arrays to sub1
sub sub1 {
    $i = 0;
    while ($_[$i]) {
        print "Element $i is: $_[$i]\n";
        $i = $i + 1;
    }                             # end while
}                                 # end subroutine definition
The two arrays shown in Example 124 are created in the exact same order, but the script's output (see Example 125) shows that the associative array's first element (identified as element 0 of the @_ array) has been changed to "jill" while the regular array's first element (identified as element 4 of the @_ array) is still "john":
Example 125 Output Comparing an Array to an Associative Array
Element 0 is: jill
Element 1 is: girl
Element 2 is: john
Element 3 is: boy
Element 4 is: john
Element 5 is: boy
Element 6 is: jill
Element 7 is: girl
Since Perl does not guarantee any sort of ordering of its associative arrays (except that keys are always paired with their associated values), your subroutines should not count on any particular ordering of the array's elements. Another point is that when associative arrays are passed to a subroutine, they lose their associativity and become regular arrays. If your subroutines need to access an associative array through its key values, the array should be accessed as a global variable, not as arguments passed to the subroutine. It is possible, though, to recreate an associative array within a subroutine by loading a local associative array with the elements in the @_ array, using a technique similar to that shown in Example 126. However, this technique has limited usefulness and is not recommended for large associative arrays.
Example 126 Recreating an Associative Array Within a Subroutine
&recreate_array(%array);
sub recreate_array {
    local($i) = 0;
    local(%_);
    while ($_[$i]) {
        $_{$_[$i]} = $_[$i+1];    # recreate associative array
        $i = $i + 2;              # point to next key value
    }                             # end while
    print "John is a ",$_{"john"};
    print " and Jill is a ",$_{"jill"},".\n";
}                                 # end subroutine definition
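A shorter way to get the same result is to assign the flattened @_ list straight to a new associative array, since list assignment pairs up keys and values again. This swaps in a different technique from the loop shown above and is offered only as a sketch: the describe subroutine and the %people name are assumptions, the data is the sample data used earlier in this section, and the script assumes a Perl 5 interpreter that supports my, use strict, and use warnings.

```perl
use strict;
use warnings;

# Rebuilds an associative array from the key/value list in @_ and
# returns a sentence built from two of its entries.
sub describe {
    my %people = @_;    # keys and values are paired up in order
    return "John is a $people{'john'} and Jill is a $people{'jill'}.";
}

my %array = ("john", "boy", "jill", "girl");
print &describe(%array), "\n";
```

The order in which the pairs arrive does not matter here, because each key still travels next to its own value.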