University of Victoria
Department of Computer Science
SENG 265: Software Development Methods
Perl Regular Expression: Slide 1
Perl: Regular expressions
A powerful tool for searching and transforming text.
Motivation
We have seen many operations involving string comparisons.
Several Perl built-in functions also help with operations on strings:
split & join
substr
length
There is a lot we can do with such functions.
Example: given a string holding some timestamp, extract the different parts of the date and time.

while (my $line = <STDIN>) {
    chomp $line;
    if ($line eq "BEGIN:VSTART") {
        # ...
    }
}
# ...

my ($property, $value) = split /:/, $foo;
if ($property eq "DSTART") {
    # ... etc. etc. etc.
}

@csv_fields = split /,/, $input_line;
$output = join ":", @data;
$first_char = substr $input, 0, 1;
$width = length $heading;
print $heading, "\n";
print "-" x $width;
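A minimal runnable sketch of the built-ins listed above, assuming an iCalendar-style input line such as DTSTART:20051225T053000 (the sample line and the helper name split_property are illustrative assumptions, not part of the original slides). It separates the property from its value, slices the value with substr, reassembles a heading with join, and underlines it using length:

```perl
use strict;
use warnings;

# Split "NAME:VALUE" into its two parts. The limit of 2 stops split
# from breaking the value apart if it also contains colons.
sub split_property {
    my ($line) = @_;
    return split /:/, $line, 2;
}

my ($property, $value) = split_property("DTSTART:20051225T053000");
my $date    = substr $value, 0, 8;    # first 8 chars: YYYYMMDD
my $time    = substr $value, 9;       # everything after the 'T'
my $heading = join ":", $property, $date, $time;

print $heading, "\n";                 # DTSTART:20051225:053000
print "-" x length($heading), "\n";   # underline of the same width
```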
Motivation
Recall: iCalendar dates are used by iCal-like programs.
The year, month, etc. portions of the date string are fixed in position.
How could we use substr to help us?
This code certainly obtains what we need.
But it can be a bit tricky to get right.
Adapting the code to another date/time format is not trivial,
and is bugbait!

my $datetime = "20051225T053000";
$year  = substr $datetime, 0, 4;
$month = substr $datetime, 4, 2;
$day   = substr $datetime, 6, 2;
$hour  = substr $datetime, 9, 2;
$min   = substr $datetime, 11, 2;
$sec   = substr $datetime, 13, 2;

# ISO 8601 time format
my $datetime = "i2003-10-31T13:37:14-05:00";
$year  = substr $datetime, 1, 5;    # bug: should be 1, 4
$month = substr $datetime, 7, 8;    # bug: should be 6, 2
# coffee break
# ...
$day   = substr $datetime, 9, 2;
$hour  = substr $datetime, 12, 2;
$min   = substr $datetime, 14, 2;   # bug: should be 15, 2
$sec   = substr $datetime, 16, 2;   # bug: should be 18, 2

[Stamped across the substr code: "Hazardous to your health"]

Motivation
A better method is to indicate the string's pattern in a way that reflects the actual order of the pattern components:
The date begins at the start of the string.
The year is four digits.
The month follows (two digits),
and then the day.
The T character separates the date and time.
Hour, minute and second follow, each two digits long.

my $datetime = "20051225T053000";
my ($year, $month, $day,
    $hour, $minute, $second)
    = $datetime
      =~ m{ \A        # start of string
            (\d{4})   # year
            (\d{2})   # month
            (\d{2})   # day
            T         # literal T
            (\d{2})   # hour
            (\d{2})   # minute
            (\d{2})   # second
            \z        # end of string
          }xms;

For the elder Perlmongers:
if ($datetime =~
    /^(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2})$/)
{
    ($year, $month, $day, $hour, $min, $sec)
        = ($1, $2, $3, $4, $5, $6);
}
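A runnable sketch of the /xms version above, wrapped in a helper whose name, parse_compact_datetime, is our own invention. Because the match is returned from the sub in list context, the caller gets the six captured fields on success and an empty list on failure; the \A ... \z anchors make a truncated timestamp fail outright instead of matching part of it:

```perl
use strict;
use warnings;

# Returns (year, month, day, hour, minute, second), or an empty list
# when the string is not exactly YYYYMMDDTHHMMSS.
sub parse_compact_datetime {
    my ($datetime) = @_;
    return $datetime =~ m{ \A
                           (\d{4}) (\d{2}) (\d{2})   # date
                           T                         # literal T
                           (\d{2}) (\d{2}) (\d{2})   # time
                           \z
                         }xms;
}

my @fields = parse_compact_datetime("20051225T053000");
print join(" ", @fields), "\n";     # 2005 12 25 05 30 00

my @bad = parse_compact_datetime("20051225T0530");   # seconds missing
print scalar(@bad), "\n";           # 0
```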
Motivation
Back to our code modification example:
now we have a different date format.
Using a regular expression, we can greatly reduce the possibility of bugs.
The string begins with an "i",
followed by the year,
followed by a dash,
followed by the month,
etc.

# ISO 8601 time format
my $ical_date = "i2003-10-31T13:37:14-05:00";

my ($year, $month, $day,
    $hour, $minute, $second)
    = $ical_date
      =~ m{ \A        # start of string
            i         # literal i
            (\d{4})   # year
            -         # literal dash
            (\d{2})   # month
            -         # literal dash
            (\d{2})   # day
            T         # literal T
            (\d{2})   # hour
            :         # literal colon
            (\d{2})   # minute
            :         # literal colon
            (\d{2})   # second
            .+        # ignore remainder
            \z        # end of string
          }xms;
Topics
Simple matching
Metacharacters
Anchored search
Character classes
Range operators in character classes
Matching any character
Grouping
Extracting matches
Search and replace
Our coverage of regex syntax will be much more slowly paced than the motivation just shown!
The previous slides were shown to give you a flavour of what regular expressions can achieve.
We will learn how to construct such expressions over the next few lectures.
We have a range of topics.
Regular expressions can seem complex and cryptic.
However, slow and patient work with such expressions will improve your productivity.
Perl Regular Expressions
Perl is renowned for its excellence at text processing.
Its handling of regular expressions is a big factor in its fame.
Mastering even the basics will allow you to manipulate text with ease.
Regular expressions have a strong formalism (finite-state automata, FSA).
You have already used some and seen others:
% ls *.c
% ps aux | grep "s265s*" | less
Other languages have some support for regexes, usually via some library:
Java
    import java.util.regex.*;
Python
    import re
C#
    using System.Text.RegularExpressions;
Simple String Matching
Regular expressions are usually used in conjunction with an if:
if <string matches this pattern>
... then <do something with that match>.
The simplest such match refers to a string.
But note: this is very different from using eq.

my $line = <STDIN>;
chomp $line;
# Unbeknownst to the programmer, the first line
# of the input is the line "Hello, World"
if ($line =~ m/World/xms) {
    print "Regexp matches!\n";
}
else {
    print "Oh, poop.\n";
}
if ($line eq "World") {
    print "line is equal to 'World'\n";
}
else {
    print "line sure ain't equal to 'World'\n";
}
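The same contrast as a self-contained sketch that needs no input: the string is hard-coded (an assumption standing in for the first line of STDIN), and each test is reduced to a 1/0 flag so the difference is easy to see:

```perl
use strict;
use warnings;

# Stand-in for the first line read from STDIN on the slide.
my $line = "Hello, World";

# =~ succeeds if the pattern occurs anywhere in the string...
my $matches = ($line =~ m/World/xms) ? 1 : 0;
# ...whereas eq requires the entire string to be identical.
my $equal   = ($line eq "World") ? 1 : 0;

print "match: $matches, eq: $equal\n";   # match: 1, eq: 0
```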
A word about m/yadayada/xms
The text between the two slashes is the regular expression ("regex").
The leading m indicates the regex is used for a match.
The trailing xms are three regex options (modifiers):
x : extended formatting (whitespace in the regex is ignored, and # starts a comment)
m : ^ and $ match at the start and end of every line, not just of the whole string (this eliminates a cause of some subtle bugs)
s : the . symbol matches every character, including "\n"
Why all of this verbiage instead of plain old /yadayada/ as of old? Compare the two versions of the same regex below.
Also note: the delimiters can be m{ } or m/ /.

/'[^\\']*(?:\\.[^\\']*)*'/

m{ '          # an opening single quote
   [^\\']*    # any non-special chars
   (?:        # then all of...
     \\ .     # any explicitly backslashed char
     [^\\']*  # followed by any non-special chars
   )*         # repeated zero or more times
   '          # a closing single quote
}xms
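To see that the /x layout changes only readability, this sketch stores the commented form in a qr// object and applies it to a made-up line of Perl source containing an escaped quote (the sample text is an assumption for illustration):

```perl
use strict;
use warnings;

# The single-quoted-string pattern, commented with /x.
my $quoted = qr{ '          # an opening single quote
                 [^\\']*    # any non-special chars
                 (?:        # then all of...
                   \\ .     #   any explicitly backslashed char
                   [^\\']*  #   followed by any non-special chars
                 )*         # repeated zero or more times
                 '          # a closing single quote
               }xms;

# q{} keeps the backslash before the inner quote.
my $source = q{print 'it\'s here', "\n";};
my ($literal) = $source =~ m/($quoted)/;
print "$literal\n";    # 'it\'s here'
```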
Another example
The code below searches for a pattern in a dictionary file.
Note that a command-line argument is being used as a regex!
Also note the <> syntax: it reads, line by line, each file named in the remaining (unused) command-line arguments, opening them in turn (or reads STDIN if there are none).

#!/usr/bin/perl
use strict;
use warnings;
my $regexp = shift @ARGV;
while (my $word = <>) {
    if ($word =~ m/$regexp/xms) {
        print $word;
    }
}

% ./search.pl pter /usr/share/dict/linux.words
abrupter
Acalypterae
acanthopteran
Acanthopteri
... <snip> ...
unchapter
unchaptered
underprompter
... <snip> ...
Zygopteris
zygopteron
zygopterous
%
Metacharacters
Regexes obtain their power by describing sets of strings.
Such descriptions involve the use of metacharacters:
{ } [ ] ( ) ^ $ . | * + ? \ /
(/ is special only because it is the usual delimiter.)
Of course, some strings that we want to match will contain these characters.
Therefore we must escape them (precede them with a backslash).

"2+2=4" =~ m/2+2/xms;       # doesn't match
"2+2=4" =~ m/2\+2/xms;      # does match
"The interval is [0,1)." =~
    m/[0,1)./xms;            # syntax error
"The interval is [0,1)." =~
    m/\[0,1\)\./xms;         # does match
"/usr/bin/perl"
    =~ m/\/usr\/bin\/perl/xms;   # matches
"/usr/bin/perl"
    =~ m{/usr/bin/perl}xms;      # better
'C:\WINDOWS' =~ m/C:\\WINDOWS/;  # matches
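When the text to match comes from a variable (for example a user-supplied search term), escaping every metacharacter by hand is error-prone. Perl's built-in quotemeta function, or the equivalent \Q ... \E escape inside a pattern, does it for you. A small sketch reusing the strings from the examples above:

```perl
use strict;
use warnings;

my $interval = "[0,1).";
my $path     = "/usr/bin/perl";

# \Q...\E applies quotemeta to the interpolated text, so every
# non-word character in it is matched literally.
my $found_interval = ("The interval is [0,1)." =~ m/\Q$interval\E/xms) ? 1 : 0;
my $found_path     = ("#!/usr/bin/perl -w" =~ m/\Q$path\E/xms) ? 1 : 0;

print quotemeta($interval), "\n";        # \[0\,1\)\.
print "$found_interval $found_path\n";   # 1 1
```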
Anchoring
We may wish to anchor a match to certain locations:
^ matches the beginning of a line (with /m; without it, only the beginning of the string).
$ matches the end of a line (with /m; without it, only the end of the string or just before a final newline).
\A matches the beginning of a string.
\z matches the end of a string.

"housekeeper"   =~ m/keeper/xms;     # matches
"housekeeper"   =~ m/^keeper/xms;    # does not match
"housekeeper"   =~ m/keeper$/xms;    # matches
"housekeeper\n" =~ m/keeper$/xms;    # also matches
"keeper" =~ m/^keep$/xms;            # does not match
"keeper" =~ m/^keeper$/xms;          # matches
"keeper" =~ m{\A keeper \z}xms;      # matches

my $text = "Here is one line.\nIt is followed by\nAnother line!\n";
if ($text =~ m{line\. $}x)  { print "Gotcha\n"; } else { print "Oh dear\n"; }   # Oh dear
if ($text =~ m{line\. $}xm) { print "Gotcha\n"; } else { print "Oh dear\n"; }   # Gotcha
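The Gotcha/Oh dear pair can be checked mechanically. This sketch (collecting the matches with /g is our addition to the slide's example) gathers every word that ends a line with a period, with and without /m:

```perl
use strict;
use warnings;

my $text = "Here is one line.\nIt is followed by\nAnother line!\n";

# Without /m, $ matches only at the very end (or before a final newline).
my @without_m = $text =~ m{ (\w+) \. $ }xg;
# With /m, $ also matches just before each embedded newline.
my @with_m    = $text =~ m{ (\w+) \. $ }xmg;

print scalar(@without_m), " ", scalar(@with_m), "\n";   # 0 1
print "@with_m\n";                                      # line
```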
Character classes
These allow sets of possible characters to be matched at desired points within a regex.

m/cat/xms                # matches 'cat'
m/[bcr]at/xms            # matches 'bat', 'cat', or 'rat'
m/item[0123456789]/xms   # matches 'item0' ... 'item9'
"abc" =~ m/[cab]/xms     # matches 'a'
m/[yY][eE][sS]/xms       # matches case-insensitive 'yes'
m/yes/xmsi               # simpler way, using i
m/(?i)yes/xms            # same
m/[\]c]def/xms           # matches ']def' or 'cdef'
$x = 'bcr';
m/[$x]at/xms             # matches 'bat', 'cat', 'rat'
m/[\$x]at/xms            # matches '$at' or 'xat'
m/[\\$x]at/xms           # matches '\at', 'bat', 'cat',
                         # or 'rat'
Range operators
Ranges can eliminate some ugly code:
[0123456789] becomes [0-9]
[abcdefghijklmnopqrstuvwxyz] becomes [a-z]
If - is the first or last character in a character class, it is treated as an ordinary character.

m/item[0-9]/xms        # 'item0', 'item1', ... 'item9'
m/[0-9bx-z]aa/xms      # '0aa', ..., '9aa',
                       # 'baa', 'xaa', 'yaa',
                       # or 'zaa'
m/[0-9a-fA-F]/xms      # matches a hex digit
m/[a-z]/i              # matches any letter, upper or lower case
m/[-ab]/xms            # all three are equivalent:
m/[ab-]/xms            # each matches 'a', 'b', or '-'
m/[a\-b]/xms
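A sketch that puts ranges to work: is_hex (a hypothetical helper name) validates a whole string as hex digits, and the second part shows a leading '-' being taken literally inside a class. The sample inputs are invented:

```perl
use strict;
use warnings;

# True (1) when the whole string is one or more hex digits.
sub is_hex {
    my ($s) = @_;
    return $s =~ m/\A [0-9a-fA-F]+ \z/xms ? 1 : 0;
}

# A '-' at the start of a class is an ordinary character.
my @signs    = ("-5", "a5", "+5");
my @dash_led = grep { m/\A [-a] /xms } @signs;

print join(",", map { is_hex($_) } qw(ff 1A3 0x1F)), "\n";   # 1,1,0
print "@dash_led\n";                                          # -5 a5
```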
Negated character classes
The special character ^ in the first position of a character class denotes a negated character class.
It matches any character but those in the brackets.

m/[^a]at/xms
# doesn't match 'aat' or 'at', but
# matches all other 'bat', 'cat',
# '0at', '%at', etc.
m/[^0-9]/xms
# matches a non-numeric character
m/[a^]at/xms
# matches 'aat' or '^at'; here
# '^' is ordinary
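A quick way to confirm the claims above is to filter a few sample words (chosen here for illustration) through two of the patterns, anchored with \A and \z so the whole word must match:

```perl
use strict;
use warnings;

my @words = qw(aat bat cat at 0at);

# [^a] must consume exactly one character that is not 'a'.
my @negated = grep { m/\A [^a] at \z/xms } @words;
# Outside the first position, ^ is an ordinary character.
my @literal = grep { m/\A [a^] at \z/xms } (@words, '^at');

print "@negated\n";   # bat cat 0at
print "@literal\n";   # aat ^at
```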
Matching any character
The period '.' matches any character but "\n" (with the /s modifier it matches "\n" too).
A period is a metacharacter; it needs to be escaped to match as an ordinary period.

m/..rt/xms      # matches any 2 chars, followed by 'rt'
m/end\./xms     # matches 'end.'
m/end[.]/xms    # same thing, matches only 'end.'
''    =~ m/./xm;     # doesn't match; needs a character
'a'   =~ m/^.$/xm;   # matches
''    =~ m/^.$/xm;   # doesn't match; needs a character
"\n"  =~ m/^.$/xm;   # doesn't match; needs a character
                     # other than "\n" (would match with /s)
"a\n" =~ m/^.$/xm;   # matches, ignores the "\n"
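The sketch below runs a one-character test over the same sample strings with and without /s. It deliberately uses \A and \z instead of ^ and $, so that the trailing-newline allowance of $ does not blur the comparison:

```perl
use strict;
use warnings;

my @samples = ("", "a", "\n", "a\n");

# 1 if the string is exactly one character that '.' accepts.
my @default = map { m/\A.\z/  ? 1 : 0 } @samples;
# With /s, '.' also accepts "\n".
my @with_s  = map { m/\A.\z/s ? 1 : 0 } @samples;

print "@default\n";   # 0 1 0 0
print "@with_s\n";    # 0 1 1 0
```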
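Matching this or that
We would like to match different possible words or character strings.
We use the alternation character | (pipe).
The match that starts earliest in the string wins; the order of the alternatives only matters when several could match at that same position:

"cats and dogs" =~ /cat|dog|bird/;   # matches "cat"
"cats and dogs" =~ /dog|cat|bird/;   # matches "cat"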
Grouping Things Together
Sometimes we want alternatives for part of a regular expression.

/(a|b)b/            # matches 'ab' or 'bb'
/(ac|b)b/           # matches 'acb' or 'bb'
/(^a|b)c/           # matches 'ac' at start of string or
                    # 'bc' anywhere
/(a|[bc])d/         # matches 'ad', 'bd', or 'cd'
/house(cat|)/       # matches either 'housecat'
                    # or 'house'
/house(cat(s|)|)/   # matches either 'housecats' or
                    # 'housecat' or 'house'.
                    # Note groups can be nested.
Extracting Matches
The grouping metacharacters () also serve another, completely different function: they allow the extraction of the parts of a string that matched.
For each grouping, the part that matched inside goes into the special variables $1, $2, etc.
Only use $1, $2, ... after checking that the match succeeded; after a failed match they keep their values from the previous successful one.

# extract hours, minutes, seconds
if ($time =~ /(\d\d):(\d\d):(\d\d)/) {   # match hh:mm:ss format
    $hours   = $1;                       # \d matches a digit, like [0-9]
    $minutes = $2;
    $seconds = $3;
}
# More compact code extracting the same values
($hours, $minutes, $seconds) = ($time =~ /(\d\d):(\d\d):(\d\d)/);
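A self-contained version of the compact form, wrapped in a hypothetical hms helper. Returning the match result directly from the sub keeps it in list context, so the caller gets the three captures (or an empty list when nothing matches):

```perl
use strict;
use warnings;

# Returns (hours, minutes, seconds), or an empty list if no hh:mm:ss.
sub hms {
    my ($time) = @_;
    return $time =~ m/(\d\d):(\d\d):(\d\d)/xms;
}

my ($hours, $minutes, $seconds) = hms("Meeting starts at 13:37:14 sharp");
print "$hours h $minutes m $seconds s\n";   # 13 h 37 m 14 s
```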
Matching Repetitions
We would like to be able to match multiple times:
a?     = match 'a' 0 or 1 times (~ optional)
a*     = match 'a' 0 or more times, i.e., any number of times
a+     = match 'a' 1 or more times, i.e., at least once
a{n,m} = match at least n times, but not more than m times
a{n,}  = match at least n times
a{n}   = match exactly n times

$year =~ /^\d{2,4}$/   # make sure year is at least 2 but
                       # not more than 4 digits
/[a-z]+\d*/i           # match a word and any number of digits
/y(es)?/i              # matches 'y', 'Y',
                       # or a case-insensitive 'yes'
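A sketch exercising the last examples on some sample inputs (chosen for illustration; plausible_year is a hypothetical name). Note the \A and \z anchors: without them, /\d{2,4}/ would happily find four digits inside a five-digit string:

```perl
use strict;
use warnings;

# 1 if the whole string is 2 to 4 digits long, else 0.
sub plausible_year { return $_[0] =~ /\A\d{2,4}\z/ ? 1 : 0 }

my @checked = map { plausible_year($_) } qw(7 99 2005 20051);
my @answers = grep { m/\Ay(es)?\z/i } qw(y Yes YES yeah n);

print "@checked\n";   # 0 1 1 0
print "@answers\n";   # y Yes YES
```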
Search and Replace
Regular expressions also play a role in search and replace operations in Perl.
Search and replace is accomplished with the s/// operator.
General form:
s/regexp/replacement/modifiers

$x = "Time to feed the cat!";
if ($x =~ s/cat/hamster/) {
    print $x;   # Time to feed the hamster!
}
More Search and Replace Commands
$y = "'quoted words'";
$y =~ s/^'(.*)'$/<<$1>>/;   # replace the single quotes; $y
                            # contains "<<quoted words>>"
$x = "I batted 4 for 4";
$x =~ s/4/four/;     # doesn't do it all:
                     # $x contains
                     # "I batted four for 4"
$x = "I batted 4 for 4";
$x =~ s/4/four/g;    # /g modifier does it all:
                     # $x contains
                     # "I batted four for four"
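A detail left implicit above: s/// returns the number of substitutions it made (false if none), which is why it can sit inside an if. This sketch copies the string first so both variants start from the same text:

```perl
use strict;
use warnings;

my $x    = "I batted 4 for 4";
my $once = $x;
my $all  = $x;

my $n_once = ($once =~ s/4/four/);    # replaces the first 4 only
my $n_all  = ($all  =~ s/4/four/g);   # replaces every 4

print "$once ($n_once)\n";   # I batted four for 4 (1)
print "$all ($n_all)\n";     # I batted four for four (2)
```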
A few more regexp topics
Advanced uses of matches
Escape sequences
List and scalar context, e.g., phone numbers
Finding all instances of a match
Parentheses
Substituting with s///
tr, the translate function
Advanced uses of matches
You can assign pattern memory directly to your own variable names (capturing):
($phone) = $value =~ /^phone\:(.+)$/;
Read from right to left: apply this pattern to the value in $value, and assign the results to the list on the left.
($front, $back) = /^phone\:(\d{3})-(\d{4})/;
Apply this pattern to $_ and assign the results to the list on the left.
Meaning of backslash letters
\n : newline
\r : carriage return
\t : tab
\f : formfeed
\d : a digit (same as [0-9] for ASCII text)
\D : a non-digit
\w : a "word" character, i.e. alphanumeric or underscore (same as [0-9a-zA-Z_] for ASCII text)
\W : a non-word character
\s : a whitespace character, including [ \t\n\r\f]
\S : a non-whitespace character
Reminder: list or scalar context?
A pattern match returns true (1) or false (the empty string) in scalar context, and a list of the captured matches in list context (or (1) if the pattern has no parentheses).
Recall: there are a lot of functions that do different things depending on whether they are used in scalar or list context.

# returns the number of elements
$count = @array;
# returns a reversed string
$revString = reverse $string;
# returns a reversed list
@revArray = reverse @array;
You must always be cautious of this behaviour.
Practical Example of Context
$phone = $string =~ /^.+\:(.+)$/;
$phone contains 1 if the pattern matches, and the empty string ("") otherwise.
($phone) = $string =~ /^.+\:(.+)$/;
$phone contains the matched string (what the parentheses captured).
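A sketch of both assignments side by side, plus a failed match, on a made-up phone record. The printed brackets make the empty-string result of a failed scalar-context match visible:

```perl
use strict;
use warnings;

my $string = "phone:555-1234";

my $scalar = $string =~ /^.+:(.+)$/;     # scalar context: success flag
my ($list) = $string =~ /^.+:(.+)$/;     # list context: the capture

my $miss = "no colon here" =~ /^.+:(.+)$/;   # a failed match
print "[$scalar] [$list] [$miss]\n";     # [1] [555-1234] []
```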


Finding all instances of a match
Use the g modifier with a regular expression:
@sites = $sequence =~ /(TATTA)/g;
Think g for global.
This returns a list of all the (non-overlapping) matches, in order, and stores them in the array.
If you have n pairs of parentheses, the array looks like the following:
($1, $2, ..., $n, $1, $2, ..., $n, ...)
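With two pairs of parentheses the /g result alternates between the $1 and $2 values, which makes it convenient to load into a hash. A sketch on an invented key=value record:

```perl
use strict;
use warnings;

my $record = "a=1, b=2, c=3";

# Two capture groups, so the list alternates ($1, $2, $1, $2, ...).
my @pairs = $record =~ /(\w+)=(\d+)/g;
my %value = @pairs;    # key/value pairs fit neatly into a hash

print "@pairs\n";      # a 1 b 2 c 3
print "$value{b}\n";   # 2
```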
Perl is Greedy
Perl regular expressions try to match the largest possible string that matches your pattern:
"lalaaaaagag" =~ /(la.*ag)/
/la.*ag/ matches 'laag', 'lalag', 'laaaaaag'
$1 contains 'lalaaaaagag'
If this is not what you wanted, put the ? modifier after the quantifier to make it non-greedy:
"lalaaaaagag" =~ /(la.+?ag)/
/(la.+?ag)/ matches as few characters as possible to find a matching pattern
$1 contains 'lalaaaaag'
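The two matches above as a runnable sketch. The expected values are the ones given on the slide; they can be confirmed by tracing the non-greedy .+? as it grows one character at a time until "ag" follows:

```perl
use strict;
use warnings;

my $s = "lalaaaaagag";
my ($greedy) = $s =~ /(la.*ag)/;    # .* grabs as much as possible
my ($lazy)   = $s =~ /(la.+?ag)/;   # .+? grabs as little as possible

print "$greedy\n";   # lalaaaaagag
print "$lazy\n";     # lalaaaaag
```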


Making parentheses forgetful
Sometimes you need parentheses to make your regular expression work, but you don't actually want to keep the results. You can still use parentheses for grouping:
/(?:group)/
Certain characters are overloaded; recall:
\d? means 0 or 1 instances
\d+? means the fewest non-zero number of digits
(?:group) means look for the group of atoms in the string, but don't remember them
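A sketch contrasting a capturing and a non-capturing group on the same (invented) input: the (?:...) version still groups the alternation but leaves it out of the returned list, so the number becomes the first capture:

```perl
use strict;
use warnings;

my $s = "housecats 42";

# Capturing group: 'cat' lands in $1, pushing the number to $2.
my @capturing = $s =~ /house(cat|dog)s? (\d+)/;
# Non-capturing group: same grouping, only the number is returned.
my @forgetful = $s =~ /house(?:cat|dog)s? (\d+)/;

print scalar(@capturing), " ", scalar(@forgetful), "\n";   # 2 1
print "$forgetful[0]\n";                                   # 42
```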
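Example of forgetting
#!/usr/bin/perl
# Method 1: build the pattern with an optional "?:"
use strict;
use warnings;
my $mod;
if (@ARGV && $ARGV[0] eq "-x") {
    $mod = "?:";    # first group becomes non-capturing
} else {
    $mod = "";
}
my $pat1 = "\\w+";
my $pat2 = "\\d+";
while (<STDIN>) {
    next unless /($mod$pat1) ($pat2)/;   # skip lines that don't match
    print $1, "\n";                      # with -x, $1 is the number
}

#!/usr/bin/perl
# Method 2: always capture both groups, then pick one
use strict;
use warnings;
my $ignore;
if (@ARGV && $ARGV[0] eq "-x") {
    $ignore = 1;
} else {
    $ignore = 0;
}
while (<STDIN>) {
    next unless /(\w+) (\d+)/;   # skip lines that don't match
    if ($ignore) {
        print $2, "\n";
    }
    else {
        print $1, "\n";
    }
}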
More examples using s///
Substituting one word for another:
$string =~ s/dogs/cats/
If $string was 'I love dogs', it is now 'I love cats'
Removing trailing white space:
$string =~ s/\s+$//
If $string was 'ATG   ', it is now 'ATG'
Adding 10 to every number in a string:
$string =~ s/(\d+)/$1+10/ge
Note the pattern memory ($1)
g means global (just like in regular expressions)
e is specific to s///: evaluate the replacement on the right as a Perl expression
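The /ge example and the whitespace trim as a runnable sketch on invented sample text. The (my $copy = $original) =~ s/// idiom edits a copy and leaves the original untouched:

```perl
use strict;
use warnings;

my $string = "rooms 4 and 17, floor 2";

# /e evaluates the replacement as Perl code; /g repeats it for every match.
(my $bumped = $string) =~ s/(\d+)/$1 + 10/ge;

# Remove trailing whitespace from a copy.
(my $trimmed = "ATG   ") =~ s/\s+$//;

print "$bumped\n";      # rooms 14 and 27, floor 12
print "[$trimmed]\n";   # [ATG]
```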


tr function
translate or transliterate
General form:
tr/list1/list2/
Even less like a regular expression than s///
Substitutes characters in the first list with the corresponding characters from the second list:
$string =~ tr/a/A/
Every 'a' is translated to an 'A'.
No need for a global modifier with tr: it always acts on every character.


More examples of tr
Converting a named scalar to lowercase:
$ARGV[1] =~ tr/A-Z/a-z/
Counting the number of '*' characters in $_:
$cnt = tr/*/*/
$cnt = $_ =~ tr/*/*/
Changing all non-alphabetic characters to spaces:
tr/a-zA-Z/ /c
Notice the space in the replacement list; c = complement the search list
Deleting all non-alphabetic characters completely:
tr/a-zA-Z//cd
d = delete found but unreplaced characters
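The tr examples above applied to one invented string. With an empty replacement list, tr/*// counts the characters without changing them (equivalent to tr/*/*/):

```perl
use strict;
use warnings;

my $text = "a*b**c 3!";

my $stars = ($text =~ tr/*//);             # count only, nothing changes
(my $spaced  = $text) =~ tr/a-zA-Z/ /c;    # c: non-letters become spaces
(my $letters = $text) =~ tr/a-zA-Z//cd;    # cd: non-letters are deleted

print "$stars\n";        # 3
print "[$spaced]\n";     # [a b  c   ]
print "[$letters]\n";    # [abc]
```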
Using the results of matches within a pattern
\1, \2, \3 refer to what a previous set of parentheses matched:
"abc abc" =~ /(\w+) \1/   # matches
"abc def" =~ /(\w+) \1/   # doesn't match
Can also use $1, $2, etc. to perform some interesting operations:
s/^([^ ]*) *([^ ]*)/$2 $1/   # swap first two words
/(\w+)\s*=\s*\1/             # match 'foo = foo'
Other default variables used in matches:
$` : returns everything before the matched string
$& : returns the entire matched string
$' : returns everything after the matched string
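A sketch covering the three ideas on this slide: a \1 backreference to find doubled words, $1/$2 in a replacement, and the pre-match/match/post-match variables. The sample strings are invented; note that in Perls older than 5.20, merely mentioning $`, $& or $' slowed down every regex match in the program:

```perl
use strict;
use warnings;

# \1 inside the pattern refers back to whatever group 1 matched.
my @doubled = grep { m/\b(\w+) \1\b/ } ("abc abc", "abc def", "the the cat");

# $1 and $2 in the replacement swap the first two words.
(my $swapped = "hello big world") =~ s/^([^ ]*) *([^ ]*)/$2 $1/;

# Text before, of, and after the last successful match.
my ($pre, $match, $post);
if ("housekeeper" =~ m/keep/) {
    ($pre, $match, $post) = ($`, $&, $');
}

print join("|", @doubled), "\n";   # abc abc|the the cat
print "$swapped\n";                # big hello world
print "$pre/$match/$post\n";       # house/keep/er
```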


Example: Celsius <-> Fahrenheit
#! /usr/bin/perl -w
print "Enter temperature: \n";
$line = <STDIN>;
chomp($line);
if ( $line =~ /^([-+]?[0-9]+(?:\.[0-9]*)?)\s*([CF])$/i ) {
$temp = $1;
$scale = $2;
if ( $scale =~ /c/i ) {
$cel = $temp;
$fah = ($cel * 9 / 5) + 32;
}
else {
$fah = $temp;
$cel = ($fah - 32) * 5 / 9;
}
printf( "%.2f C is %.2f F\n", $cel, $fah );
}
else {
printf( "Bad format\n" );
}
Regex on command line
We can execute simple regular expressions on the command line:
$ perl -p -i -e 's/kat/cat/g' in.txt
-p : apply the program to each line of in.txt, printing each line afterwards
-i : write the changes back to in.txt (edit in place)
-e : the program is the text between the quotes ''
