PHP String and Regular Expressions
BRACKETS [ ]
Brackets specify a range of characters to search for in the string:
[0-9] matches any decimal digit from 0 through 9.
[a-z] matches any character from lowercase a through lowercase z.
[A-Z] matches any character from uppercase A through uppercase Z.
[A-Za-z] matches any letter, either uppercase A through Z or lowercase a through z.
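The bracket expressions above can be tried out with a short script. This is only a sketch: it uses preg_match() with Perl-style delimiters (the ereg() family is not available in PHP 7 and later), and classify() is a hypothetical helper name introduced for illustration.

```php
<?php
// Bracket expressions from the list above, anchored so that each one
// must describe the whole (single-character) subject.
$classes = array(
    'digit'  => '/^[0-9]$/',
    'lower'  => '/^[a-z]$/',
    'upper'  => '/^[A-Z]$/',
    'letter' => '/^[A-Za-z]$/',
);

// classify() is a hypothetical helper: it returns the names of every
// class in $classes that the single character $ch belongs to.
function classify($ch, $classes)
{
    $hits = array();
    foreach ($classes as $name => $pattern) {
        if (preg_match($pattern, $ch) === 1) {
            $hits[] = $name;
        }
    }
    return $hits;
}

foreach (array('7', 'q', 'Q', '_') as $ch) {
    echo $ch . ': ' . implode(', ', classify($ch, $classes)) . "\n";
}
```

The underscore belongs to none of the four classes, so its line lists no class names.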
Quantifiers
The frequency or position of bracketed character sequences and single characters can be denoted by a special character, each with a specific meaning. The +, *, ?, {occurrence_range}, and $ flags all follow a character sequence:
p+ matches any string containing at least one p.
p* matches any string containing zero or more ps.
p? matches any string containing zero or one p.
p{2} matches any string containing a sequence of two ps.
p{2,3} matches any string containing a sequence of two or three ps.
p{2,} matches any string containing a sequence of at least two ps.
p$ matches any string with p at the end of it.
Still other flags can precede or be inserted within a character sequence:
^p matches any string with p at the beginning of it.
[^a-zA-Z] matches any character other than a through z and A through Z; used unanchored, it matches any string containing at least one such character.
p.p matches any string containing p, followed by any character, in turn followed by another p.
^.{2}$ matches any string containing exactly two characters.
<b>(.*)</b> matches any string enclosed within <b> and </b> (presumably HTML bold tags).
p(hp)* matches any string containing a p followed by zero or more instances of the sequence hp.
([\$])([0-9]+) finds a dollar sign in a string, followed by one or more digits.
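A quick way to check the quantifier rules above is to pair each pattern with one string that should match and one that should not. The sketch below assumes preg_match() as the matching function; the sample words are made up for illustration.

```php
<?php
// pattern => array(subject expected to match, subject expected not to match)
$cases = array(
    '/p+/'        => array('apple', 'banana'),
    '/p{2}/'      => array('apple', 'grape'),
    '/p$/'        => array('stop', 'spot'),
    '/^p/'        => array('pear', 'apple'),
    '/[^a-zA-Z]/' => array('abc1', 'abc'),
    '/p.p/'       => array('pop', 'pp'),
    '/^.{2}$/'    => array('ab', 'abc'),
);

$results = array();
foreach ($cases as $pattern => $subjects) {
    list($yes, $no) = $subjects;
    // preg_match() returns 1 on a match and 0 otherwise.
    $results[$pattern] = array(preg_match($pattern, $yes), preg_match($pattern, $no));
    printf("%-12s %-6s => %d   %-6s => %d\n",
        $pattern, $yes, $results[$pattern][0], $no, $results[$pattern][1]);
}

// The dollar-sign pattern: group 1 is the sign, group 2 the digits.
// Single quotes keep PHP from treating $42 or \$ specially.
preg_match('/([\$])([0-9]+)/', 'Total: $42', $m);
echo $m[1] . ' / ' . $m[2] . "\n";
```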
ereg()
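The ereg() function executes a case-sensitive search of string for pattern, returning TRUE if the pattern is found and FALSE otherwise. Note that ereg() and the other POSIX functions in this section were deprecated in PHP 5.3 and removed in PHP 7.0; the Perl-compatible preg functions covered later replace them.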
eregi()
Unlike ereg(), the search is case insensitive. This function can be useful when checking the validity of strings, such as passwords.
ereg_replace()
The ereg_replace() function operates much like ereg(), except that the functionality is extended to finding and replacing pattern with replacement instead of simply locating it. If no matches are found, the string will remain unchanged. Like ereg(), ereg_replace() is case sensitive.
eregi_replace()
The eregi_replace() function operates exactly like ereg_replace(), except that the search for pattern in string is not case sensitive.
split()
The split() function divides string into various elements, with the boundaries of each element based on the occurrence of pattern in string. The optional input parameter limit is used to specify the number of elements into which the string should be divided, starting from the left end of the string and working rightward. In cases where the pattern is an alphabetical character, split() is case sensitive. The split() example under Practical Examples breaks a string into pieces based on occurrences of horizontal tabs and newline characters.
spliti()
The spliti() function operates in exactly the same manner as its sibling split(), except that it is case insensitive.
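The difference between case-sensitive and case-insensitive splitting, and the effect of limit, can be sketched as follows. Since split() and spliti() are gone from current PHP, this sketch assumes preg_split() as a stand-in, with the i modifier playing the role of spliti(); the sample string is invented.

```php
<?php
$text = "oneXtwoxthreeXfour";

// Case sensitive, like split("X", $text): only the uppercase X splits.
$sensitive = preg_split('/X/', $text);

// Case insensitive, like spliti("x", $text): both x and X split.
$insensitive = preg_split('/x/i', $text);

// limit = 2: at most two elements, filled from the left; the second
// element keeps the rest of the string untouched.
$limited = preg_split('/x/i', $text, 2);

print_r($sensitive);   // one, twoxthree, four
print_r($insensitive); // one, two, three, four
print_r($limited);     // one, twoxthreeXfour
```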
Practical Examples
ereg() boolean ereg (string pattern, string string [, array regs])
<?php
$username = "jasoN";
if (ereg("([^a-z])", $username))
    echo "Username must be all lowercase!";
?>
<?php
$url = "http://www.apress.com";
// break $url down into three distinct pieces:
// "http://www", "apress", and "com"
$parts = ereg("^(http://www)\.([[:alnum:]]+)\.([[:alnum:]]+)", $url, $regs);
echo $regs[0]; // outputs the entire string "http://www.apress.com"
echo "<br>";
echo $regs[1]; // outputs "http://www"
echo "<br>";
echo $regs[2]; // outputs "apress"
echo "<br>";
echo $regs[3]; // outputs "com"
?>
This returns:
http://www.apress.com
http://www
apress
com
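On PHP 7 and later, where ereg() is no longer available, the same URL breakdown can be approximated with preg_match(). This is a sketch under that assumption; the # delimiter is chosen only to avoid escaping the slashes in the pattern.

```php
<?php
$url = "http://www.apress.com";

// Same pattern as the ereg() version, wrapped in # delimiters.
// PCRE accepts POSIX classes such as [[:alnum:]] inside brackets.
$matched = preg_match('#^(http://www)\.([[:alnum:]]+)\.([[:alnum:]]+)#', $url, $regs);

// $regs[0] is the whole match; $regs[1] to $regs[3] are the three groups.
echo implode("\n", $regs) . "\n";
```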
eregi() is typically used the same way for validation. For example, a script can require the user to provide an alphanumeric password consisting of 8 to 10 characters, or else display an error message.
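A password check of this kind might look like the sketch below. It assumes preg_match() with the i modifier in place of eregi(); isValidPassword() is a hypothetical helper name, and the sample passwords are invented.

```php
<?php
// isValidPassword() is a hypothetical helper: TRUE when $pswd consists
// solely of 8 to 10 alphanumeric characters.
function isValidPassword($pswd)
{
    // \z anchors at the very end of the string, so a trailing newline
    // is not silently accepted; i makes [a-z] match either case.
    return preg_match('/^[a-z0-9]{8,10}\z/i', $pswd) === 1;
}

foreach (array('jasongild', 'short', 'ABCdef1234', 'bad_chars!') as $pswd) {
    if (!isValidPassword($pswd)) {
        echo $pswd . ": must be 8-10 alphanumeric characters!\n";
    } else {
        echo $pswd . ": ok\n";
    }
}
```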
ereg_replace()
string ereg_replace (string pattern, string replacement, string string)
<?php
$text = "This is a link to http://www.wjgilmore.com/.";
echo ereg_replace("http://([a-zA-Z0-9./-]+)$", "<a href=\"\\0\">\\0</a>", $text);
?>
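This returns: This is a link to <a href="http://www.wjgilmore.com/.">http://www.wjgilmore.com/.</a> Note that the sentence's trailing period ends up inside the link, because the character class includes the period and the pattern is anchored to the end of the string.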
split() array split (string pattern, string string [, int limit])
<?php
$text = "this is\tsome text that\nwe might like to parse.";
print_r(split("[\n\t]", $text));
?>
This returns: Array ( [0] => this is [1] => some text that [2] => we might like to parse. )
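sql_regcase() string sql_regcase (string string)
The sql_regcase() function converts each alphabetic character in string into a bracketed expression containing both its uppercase and lowercase forms; other characters are left unchanged.
<?php
$version = "php 4.0";
print sql_regcase($version);
?>
Output: [Pp][Hh][Pp] 4.0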
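Because sql_regcase() was removed along with the rest of the ereg family, its behavior can only be imitated on current PHP. The sketch below is one possible imitation for ASCII letters; regcase() is a hypothetical name, not a PHP built-in.

```php
<?php
// regcase() is a hypothetical stand-in for sql_regcase(): every ASCII
// letter becomes a two-character bracket expression, upper case first.
function regcase($string)
{
    return preg_replace_callback('/[a-z]/i', function ($m) {
        return '[' . strtoupper($m[0]) . strtolower($m[0]) . ']';
    }, $string);
}

$version = "php 4.0";
echo regcase($version) . "\n";

// The result can be dropped into a pattern to get a case-insensitive
// match without using a case-insensitive function or modifier.
$matches = preg_match('/^' . regcase('php') . '$/', 'PhP');
```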
Metacharacters
\A: Matches only at the beginning of the string.
\b: Matches a word boundary.
\B: Matches anything but a word boundary.
\d: Matches a digit character. This is the same as [0-9].
\D: Matches a nondigit character.
\s: Matches a whitespace character.
\S: Matches a nonwhitespace character.
[]: Encloses a character class. A list of useful character classes was provided in the previous section.
(): Encloses a character grouping or defines a back reference.
$: Matches the end of a line.
^: Matches the beginning of a line.
.: Matches any character except for the newline.
\: Quotes the next metacharacter.
\w: Matches any single alphanumeric character or underscore. This is the same as [a-zA-Z0-9_].
\W: Matches any single character that is neither alphanumeric nor an underscore.
Let's consider a few examples:
/sa\b/ Because the word boundary is defined to be on the right side of the string, this will match strings like pisa and lisa, but not sand.
/\blinux\b/i This returns the first case-insensitive occurrence of the word linux.
/sa\B/ The opposite of the word boundary metacharacter is \B, matching on anything but a word boundary. This will match strings like sand and sally, but not Melissa. (Sally with a capital S would match only with the i modifier added.)
/\$\d+/g This returns all instances of strings matching a dollar sign followed by one or more digits. The g modifier is Perl syntax; PHP has no g modifier, and preg_match_all() is used instead to find every occurrence.
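The boundary examples above can be verified directly. The following sketch assumes preg_match() and preg_match_all(); the sample sentence with dollar amounts is invented.

```php
<?php
// array(pattern, subject, expected preg_match() result)
$checks = array(
    array('/sa\b/', 'pisa', 1),
    array('/sa\b/', 'sand', 0),
    array('/sa\B/', 'sand', 1),
    array('/sa\B/', 'Melissa', 0),
    array('/\blinux\b/i', 'I run Linux daily', 1),
);

$allAsExpected = true;
foreach ($checks as $check) {
    list($pattern, $subject, $expected) = $check;
    $actual = preg_match($pattern, $subject);
    printf("%-14s %-18s => %d\n", $pattern, $subject, $actual);
    $allAsExpected = $allAsExpected && ($actual === $expected);
}

// PHP's counterpart to Perl's g modifier: collect every dollar amount.
$count = preg_match_all('/\$\d+/', 'Paid $5, then $120 and $7', $amounts);
print_r($amounts[0]);
```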
preg_match_all() int preg_match_all (string pattern, string string, array pattern_array [, int order])
The preg_match_all() function matches all occurrences of pattern in string, assigning each occurrence to the array pattern_array in the order you specify via the optional input parameter order. The order parameter accepts two values: PREG_PATTERN_ORDER (the default), in which pattern_array[0] holds all full-pattern matches, pattern_array[1] holds all matches of the first parenthesized subpattern, and so on; and PREG_SET_ORDER, in which pattern_array[0] holds the first set of matches, pattern_array[1] the second set, and so on.
preg_quote() string preg_quote (string str [, string delimiter])
The preg_quote() function inserts a backslash before every character of special significance to regular expression syntax. These special characters include: . \ + * ? [ ^ ] $ ( ) { } = ! < > | : -. The optional parameter delimiter is used to specify what delimiter is used for the regular expression, causing it to also be escaped by a backslash.
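The two result orderings, and the effect of the delimiter argument to preg_quote(), are easier to see side by side. The sketch below uses invented sample strings.

```php
<?php
$html = '<b>Name</b> and <b>Title</b>';
$pattern = '/<b>(.*)<\/b>/U'; // U makes .* ungreedy

// Grouped by subpattern: [0] = all full matches, [1] = all first groups.
$total = preg_match_all($pattern, $html, $byPattern, PREG_PATTERN_ORDER);

// Grouped by match: [0] = first match with its groups, [1] = second.
preg_match_all($pattern, $html, $bySet, PREG_SET_ORDER);

print_r($byPattern[1]); // Name, Title
print_r($bySet[0]);     // <b>Name</b>, Name

// preg_quote() with '/' as delimiter also escapes the slash, so the
// literal text can be embedded safely in a /.../ pattern.
$needle = '5/6 + 1.5';
$quoted = preg_quote($needle, '/');
echo $quoted . "\n";
$found = preg_match('/' . $quoted . '/', 'ratio: 5/6 + 1.5 units');
```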
preg_replace() mixed preg_replace (mixed pattern, mixed replacement, mixed str [, int limit])
The preg_replace() function operates identically to ereg_replace(), except that it uses a Perl-based regular expression syntax, replacing all occurrences of pattern with replacement and returning the modified result. The optional input parameter limit specifies how many replacements should take place. Failing to set limit or setting it to -1 will result in the replacement of all occurrences.
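The effect of limit is easiest to see by running the same replacement with and without it. In this sketch the date strings are invented, and $1 to $3 refer to the three parenthesized groups.

```php
<?php
$text = "2024-01-15 and 2023-12-31";
$pattern = '/(\d{4})-(\d{2})-(\d{2})/';

// No limit: every date is rewritten from year-month-day to day/month/year.
$all = preg_replace($pattern, '$3/$2/$1', $text);

// limit = 1: only the first date is rewritten.
$first = preg_replace($pattern, '$3/$2/$1', $text, 1);

echo $all . "\n";   // 15/01/2024 and 31/12/2023
echo $first . "\n"; // 15/01/2024 and 2023-12-31
```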
preg_replace_callback() mixed preg_replace_callback(mixed pattern, callback callback, mixed str [, int limit])
Rather than handling the replacement procedure itself, the preg_replace_callback() function delegates the string-replacement procedure to some other user-defined function. The pattern parameter determines what you're looking for, while the str parameter defines the string you're searching. The callback parameter defines the name of the function to be used for the replacement task. The optional parameter limit specifies how many replacements should take place. Failing to set limit or setting it to -1 will result in the replacement of all occurrences. In the preg_replace_callback() example given later, a function named acronym() is passed into preg_replace_callback() and is used to insert the long form of various acronyms into the target string.
preg_split() array preg_split (string pattern, string string [, int limit [, int flags]])
The preg_split() function operates like split(), except that pattern is a Perl-compatible regular expression rather than a POSIX one. If the optional input parameter limit is specified, only limit number of substrings are returned.
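A short sketch of how limit and flags change the result of preg_split(). The comma-separated sample string is invented; PREG_SPLIT_NO_EMPTY is one of the standard flag constants.

```php
<?php
$csv = "a, b,,c";
$pattern = '/\s*,\s*/'; // a comma with optional surrounding whitespace

// Default: the two adjacent commas produce an empty element.
$plain = preg_split($pattern, $csv);

// limit of -1 means "no limit"; the flag drops empty elements.
$noEmpty = preg_split($pattern, $csv, -1, PREG_SPLIT_NO_EMPTY);

// limit = 2: the second element holds the unsplit remainder.
$limited = preg_split($pattern, $csv, 2);

print_r($plain);
print_r($noEmpty);
print_r($limited);
```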
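preg_grep() array preg_grep (string pattern, array input [, int flags])
<?php
$foods = array("pasta", "steak", "fish", "potatoes");
$food = preg_grep("/^p/", $foods);
print_r($food);
?>
This returns: Array ( [0] => pasta [3] => potatoes )
Note that the returned array keeps the keys of the input array.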
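As a small extension of the example above, the sketch below shows the PREG_GREP_INVERT flag, which returns the elements that do not match, and array_values(), which renumbers the preserved keys.

```php
<?php
$foods = array("pasta", "steak", "fish", "potatoes");

// Elements beginning with p; original keys 0 and 3 are preserved.
$startsWithP = preg_grep('/^p/', $foods);

// PREG_GREP_INVERT selects the elements that do NOT match.
$others = preg_grep('/^p/', $foods, PREG_GREP_INVERT);

// Renumber from zero when the original keys are not needed.
$reindexed = array_values($startsWithP);

print_r($startsWithP);
print_r($others);
print_r($reindexed);
```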
preg_match() int preg_match (string pattern, string string [, array matches [, int flags [, int offset]]])
<?php
$line = "Vim is the greatest word processor ever created!";
if (preg_match("/\bVim\b/i", $line, $match))
    print "Match found!";
?>
This script will confirm a match if the word Vim or vim is located, but not simplevim, vims, or evim.
preg_match_all() int preg_match_all (string pattern, string string, array pattern_array [, int order])
<?php
$userinfo = "Name: <b>Zeev Suraski</b> <br> Title: <b>PHP Guru</b>";
preg_match_all("/<b>(.*)<\/b>/U", $userinfo, $pat_array);
print $pat_array[0][0] . " <br> " . $pat_array[0][1] . "\n";
?>
This returns: Zeev Suraski PHP Guru
preg_quote() string preg_quote (string str [, string delimiter])
<?php
$text = "Tickets for the bout are going for $500.";
echo preg_quote($text);
?>
This returns: Tickets for the bout are going for \$500\.
preg_replace() mixed preg_replace (mixed pattern, mixed replacement, mixed str [, int limit])
<?php
$text = "This is a link to http://www.wjgilmore.com/.";
echo preg_replace("/http:\/\/(.*)\//", "<a href=\"\${0}\">\${0}</a>", $text);
?>
This returns: This is a link to <a href="http://www.wjgilmore.com/">http://www.wjgilmore.com/</a>.
preg_replace_callback() mixed preg_replace_callback(mixed pattern, callback callback, mixed str [, int limit])
<?php
// This function will add the acronym long form
// directly after any acronyms found in $matches
function acronym($matches) {
    $acronyms = array(
        'WWW' => 'World Wide Web',
        'IRS' => 'Internal Revenue Service',
        'PDF' => 'Portable Document Format');
    if (isset($acronyms[$matches[1]]))
        return $matches[1] . " (" . $acronyms[$matches[1]] . ")";
    else
        return $matches[1];
}
// The target text
$text = "The <acronym>IRS</acronym> offers tax forms in <acronym>PDF</acronym> format on the <acronym>WWW</acronym>.";
// Add the acronyms' long forms to the target text
$newtext = preg_replace_callback("/<acronym>(.*)<\/acronym>/U", 'acronym', $text);
print_r($newtext);
?>
This returns: The IRS (Internal Revenue Service) offers tax forms in PDF (Portable Document Format) format on the WWW (World Wide Web).
preg_split() array preg_split (string pattern, string string [, int limit [, int flags]])
<?php
$delimitedText = "+Jason+++Gilmore+++++++++++Columbus+++OH";
// PREG_SPLIT_NO_EMPTY drops the empty field produced by the leading "+"
$fields = preg_split("/\+{1,}/", $delimitedText, -1, PREG_SPLIT_NO_EMPTY);
foreach ($fields as $field)
    echo $field . "<br />";
?>
This returns the following:
Jason
Gilmore
Columbus
OH