Chapter 4
Pattern Matching With Regular Expressions
4.1 Introduction
• Regular expressions, or regexes for short, provide a concise and precise
specification of patterns to be matched in text.
• Example: Suppose you have 150,000 email messages on your drive, and let's
further suppose that you remember that somewhere in there is an email message
from someone named Angie or Anjie. Or was it Angy? But you don't remember
what you called it or where you stored it. Obviously, you have to look for it.
The simplest way is to write a regular expression to search for it:
An[^ dn].*
• Description: this finds words that begin with An, while the cryptic [^ dn] requires
the An to be followed by a character other than (^ means "not" in this context) a
space (to eliminate the very common English word an at the start of a sentence)
or d (to eliminate the common word and) or n (to eliminate Anne, Announcing,
etc.).
• The Java Pattern class can be used in two ways. You can use the Pattern.matches()
method to quickly check if a text (String) matches a given regular expression.
Or you can compile a Pattern with Pattern.compile() and reuse it through a Matcher,
which is more efficient when the regex is used more than once.
• In the regex directory of the darwinsys-api repo, you will find
REDemo.java, which you can run to explore how regexes work.
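The quick-check style can be sketched as follows. This is a minimal illustration, not the original listing: the class name NameSearch, the helper matching(), and the sample strings are made up for this example. It applies the pattern An[^ dn].* with Pattern.matches(), which requires the whole string to match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class, used only for illustration.
class NameSearch {
    // "An", then any character except space, 'd' or 'n', then anything.
    static final String REGEX = "An[^ dn].*";

    // Returns the inputs that match REGEX in their entirety.
    static List<String> matching(List<String> inputs) {
        List<String> hits = new ArrayList<>();
        for (String s : inputs) {
            if (Pattern.matches(REGEX, s)) {
                hits.add(s);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> samples =
            List.of("Angie", "Anjie", "Angy", "Anne", "And so on", "An apple");
        System.out.println(matching(samples)); // prints [Angie, Anjie, Angy]
    }
}
```

Anne, And and An are rejected by the negated class [^ dn], which is exactly the filtering described in the bullet above.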
Table 4.1: Regex Character classes
5 [a-z&&[def]] d, e, or f (intersection)
Regex     Description
X?        X occurs once or not at all
X+        X occurs one or more times
X*        X occurs zero or more times
X{n}      X occurs exactly n times
X{n,}     X occurs n or more times
X{y,z}    X occurs at least y times but not more than z times
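The quantifiers in the table can be tried out with a few whole-string matches. The sketch below is only an illustration; the class QuantifierDemo and the helper whole() are names invented here, and the letter b stands in for X.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; "b" plays the role of X from the table.
class QuantifierDemo {
    // True when the whole input matches the regex.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(whole("ab?c", "ac"));         // true: b occurs zero times
        System.out.println(whole("ab+c", "ac"));         // false: + needs at least one b
        System.out.println(whole("ab*c", "abbbc"));      // true: zero or more b
        System.out.println(whole("ab{2}c", "abbbc"));    // false: exactly two b required
        System.out.println(whole("ab{2,}c", "abbbc"));   // true: two or more b
        System.out.println(whole("ab{1,3}c", "abbbbc")); // false: four b is more than three
    }
}
```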
Regex Metacharacters
The regular expression metacharacters work as shorthand codes for commonly used
character classes and positions.
Table 4.3: Regex Metacharacters
Regex   Description
.       Any character (may or may not match line terminators)
\d      Any digit, short for [0-9]
\D      Any non-digit, short for [^0-9]
\s      Any whitespace character, short for [ \t\n\x0B\f\r]
\S      Any non-whitespace character, short for [^\s]
\w      Any word character, short for [a-zA-Z_0-9]
\W      Any non-word character, short for [^\w]
\b      A word boundary
\B      A non-word boundary
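A few of these shorthands can be seen at work through the regex-aware methods of java.lang.String. This is a small sketch with made-up names (ShorthandDemo and its helpers) and sample strings chosen only for illustration. Note that each backslash is doubled inside a Java string literal.

```java
import java.util.Arrays;

// Hypothetical demo class for the shorthand metacharacters.
class ShorthandDemo {
    // \D removes everything that is not a digit.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // \s+ splits on runs of whitespace (spaces, tabs, newlines).
    static String[] words(String s) {
        return s.trim().split("\\s+");
    }

    // \w+ accepts only letters, digits and underscore.
    static boolean isWord(String s) {
        return s.matches("\\w+");
    }

    // \b restricts the replacement to "cat" standing as a whole word.
    static String catToDog(String s) {
        return s.replaceAll("\\bcat\\b", "dog");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("tel: 555-1234"));                    // 5551234
        System.out.println(Arrays.toString(words("Order  42\tshipped")));   // [Order, 42, shipped]
        System.out.println(isWord("snake_case9"));                          // true
        System.out.println(isWord("two words"));                            // false
        System.out.println(catToDog("cat concat cats"));                    // dog concat cats
    }
}
```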
\d\d\d-\d\d\d-\d\d\d\d
\d{3}-\d{3}-\d{4}
\d{3}[-.]\d{3}[-.]\d{4}
\(?\d{3}[-.)]\d{3}[-.]\d{4}
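The four phone-number patterns grow more tolerant from one to the next. The sketch below assumes they are meant to be read without the stray spaces (that reading is an assumption), and checks three sample numbers against each. The class PhonePatterns and the sample numbers are invented for this illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; the regexes assume the four patterns contain no spaces.
class PhonePatterns {
    static final String[] REGEXES = {
        "\\d\\d\\d-\\d\\d\\d-\\d\\d\\d\\d",   // digits spelled out one by one
        "\\d{3}-\\d{3}-\\d{4}",               // same thing using {n}
        "\\d{3}[-.]\\d{3}[-.]\\d{4}",         // hyphen or dot as separator
        "\\(?\\d{3}[-.)]\\d{3}[-.]\\d{4}"     // optional "(" and ")" around the first group
    };

    // One result per regex, in the order above.
    static boolean[] check(String input) {
        boolean[] results = new boolean[REGEXES.length];
        for (int i = 0; i < REGEXES.length; i++) {
            results[i] = Pattern.matches(REGEXES[i], input);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(check("555-123-4567")));  // [true, true, true, true]
        System.out.println(Arrays.toString(check("555.123.4567")));  // [false, false, true, true]
        System.out.println(Arrays.toString(check("(555)123-4567"))); // [false, false, false, true]
    }
}
```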
4.2.1 Assignment
1. Write a regular expression to print all the names that start with An. Example: Angie,
Anjie or Angy.
Solution:
2. Write a regular expression to print, from a set of strings, each string that starts with
“A” followed by any number of characters.
Solution:
Solution:
Regex : ^dog
Input string to search : dog
Output : found
Solution:
Regex : dog$
Input string to search : abc dog
Output : found
Regex : \bdog\b
Input string to search : abc dog xyz
Output : found
Regex : \bdog\b
Input string to search : abc doggg xyz
Output : not found
Solution:
Regex : \Aabc
Input string to search : abc dogpqr xyz
Output : found
Regex : \Axyz
Input string to search : abc dogpqr xyz
Output : not found
Regex : xyz\z
Input string to search : abc dogpqr xyz
Output : found
9. Subexpression: [...] Matches: "Character class"; any one character from those
listed
Solution:
Regex : a[bc]d
Input string to search : abc abd xyz
Output : found
Regex : a[bc]d
Input string to search : abc abcd xyz
Output : not found
10. Subexpression: [^...] Matches: Any one character not from those listed
Solution:
Regex : a[^bc]d
Input string to search : axd abd xyz
Output : found
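The found / not found results listed for the anchors and character classes can be checked with a small search helper. This is a sketch only: the class AnchorDemo and the method search() are hypothetical names, the regexes are taken without the stray spaces, and the negated example is assumed to be a[^bc]d. The helper uses find(), so the regex may match anywhere in the input unless an anchor says otherwise.

```java
import java.util.regex.Pattern;

// Hypothetical demo class reproducing the found / not found checks.
class AnchorDemo {
    // Reports whether the regex matches anywhere in the input.
    static String search(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find() ? "found" : "not found";
    }

    public static void main(String[] args) {
        System.out.println(search("^dog", "dog"));                 // found
        System.out.println(search("dog$", "abc dog"));             // found
        System.out.println(search("\\bdog\\b", "abc dog xyz"));    // found
        System.out.println(search("\\bdog\\b", "abc doggg xyz"));  // not found
        System.out.println(search("\\Aabc", "abc dogpqr xyz"));    // found
        System.out.println(search("\\Axyz", "abc dogpqr xyz"));    // not found
        System.out.println(search("xyz\\z", "abc dogpqr xyz"));    // found
        System.out.println(search("a[bc]d", "abc abd xyz"));       // found
        System.out.println(search("a[bc]d", "abc abcd xyz"));      // not found
        System.out.println(search("a[^bc]d", "axd abd xyz"));      // found
    }
}
```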
Example:
If the regex is going to be used more than once or twice in a program, it is more effi-
cient to construct and use a Pattern and its Matcher(s).
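This API is large enough to require some explanation. The normal steps for regex
matching in a production program are:
1. Create a Pattern by calling the static method Pattern.compile().
2. Request a Matcher from the pattern by calling pattern.matcher(CharSequence) for
each String (or other CharSequence) you wish to look through.
3. Call (once or more) one of the finder methods (discussed later in this section) in
the resulting Matcher.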
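The usual sequence (compile a Pattern once, obtain a Matcher for each input, then call a finder method) might look like this. It is a minimal sketch; the class PatternSteps, the method countMatches() and the sample sentence are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the compile / matcher / find sequence.
class PatternSteps {
    // Compile once and keep the Pattern for reuse.
    static final Pattern NAME = Pattern.compile("An[^ dn]\\w*");

    static int countMatches(String text) {
        Matcher m = NAME.matcher(text); // a Matcher bound to this input
        int count = 0;
        while (m.find()) {              // finder method, called repeatedly
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Angie, Anjie and Angy match; "and" and "Anne" do not.
        System.out.println(countMatches("Angie and Anjie met Anne; Angy was late.")); // prints 3
    }
}
```

Because the Pattern is compiled only once, calling countMatches() on many inputs avoids recompiling the regex each time.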
Example: Pattern, Matcher, matches() Demo
Matcher methods
• matches() : Used to compare the entire string against the pattern; this is the
same as the routine in java.lang.String. Because it must match the entire String, a
pattern meant to match only part of the input needs .* before and after it.
• lookingAt() : Used to match the pattern only at the beginning of the string.
• find() : Used to match the pattern in the string (not necessarily at the first
character of the string), starting at the beginning of the string or, if the method was
previously called and succeeded, at the first character not matched by the previous
match.
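The difference between the three methods shows up when the same pattern is run over one input. The sketch below is an illustration with invented names (MatcherMethods, whole(), atStart(), all()); each helper creates a fresh Matcher so that the calls do not affect one another.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing matches(), lookingAt() and find().
class MatcherMethods {
    static final Pattern NAME = Pattern.compile("An\\w+");

    // matches(): the entire input must match.
    static boolean whole(String s) {
        return NAME.matcher(s).matches();
    }

    // lookingAt(): the match must start at the beginning of the input.
    static boolean atStart(String s) {
        return NAME.matcher(s).lookingAt();
    }

    // find(): each call continues after the previous match.
    static List<String> all(String s) {
        List<String> found = new ArrayList<>();
        Matcher m = NAME.matcher(s);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "Angie wrote to Anjie";
        System.out.println(whole(input));   // false: text follows the name
        System.out.println(atStart(input)); // true: the input starts with Angie
        System.out.println(all(input));     // [Angie, Anjie]
    }
}
```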
Example: find() Demo
Example: lookingAt() Demo
Example: find() Demo
Example:
Example:
2. Reluctant quantifier (formed by appending a ? to a quantifier): This quantifier
takes the opposite approach to greedy quantifiers. It starts at the first character
and takes one character at a time, matching as few characters as possible while
still letting the overall pattern match.
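The contrast with a greedy quantifier can be sketched on a string with two tagged sections. The class GreedyVsReluctant, the helper first() and the sample markup are assumptions made for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class contrasting .* with .*?
class GreedyVsReluctant {
    // Returns the text of the first match, or null when nothing matches.
    static String first(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<b>one</b><b>two</b>";
        // Greedy: .* grabs as much as it can, so the match runs to the last </b>.
        System.out.println(first("<b>.*</b>", input));  // <b>one</b><b>two</b>
        // Reluctant: .*? grows one character at a time and stops at the first </b>.
        System.out.println(first("<b>.*?</b>", input)); // <b>one</b>
    }
}
```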
Example: Reluctant Quantifier Demo
Example:
• start(), end()
Return the character positions in the string where the match starts and ends;
end() is the index just past the last character that matched.
• groupCount()
Returns the number of parenthesized capture groups, if any; returns 0 if no groups
were used.
• group(int i)
Returns the characters matched by group i of the current match, if i is greater
than or equal to zero and less than or equal to the return value of groupCount().
Group 0 is the entire match, so group(0) (or just group()) returns the entire
portion of the input that matched.
Note:
• The group(int) method lets you retrieve the characters that matched a given
parenthesized group. If you haven’t used any explicit parentheses, you can just treat
whatever matched as “level zero.”
• To find out how many groups are present in the expression, call the groupCount
method on a matcher object. The groupCount method returns an int showing the
number of capturing groups present in the matcher’s pattern.
• There is also a special group, group 0, which always represents the entire expres-
sion. This group is not included in the total reported by groupCount.
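These accessor methods can be sketched together on a phone-number pattern with three capture groups. The class GroupDemo, the method describe() and the sample sentence are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for groupCount(), group(), start() and end().
class GroupDemo {
    // Three parenthesized groups: area code, exchange, line number.
    static final Pattern PHONE = Pattern.compile("(\\d{3})-(\\d{3})-(\\d{4})");

    static String describe(String input) {
        Matcher m = PHONE.matcher(input);
        if (!m.find()) {
            return "no match";
        }
        return m.groupCount() + " groups, whole=" + m.group()
            + ", area=" + m.group(1)
            + ", span=" + m.start() + ".." + m.end();
    }

    public static void main(String[] args) {
        // prints: 3 groups, whole=555-123-4567, area=555, span=5..17
        System.out.println(describe("Call 555-123-4567 now"));
    }
}
```

groupCount() reports 3 here even though group(0) is also available, because group 0 is not counted.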
Example: group()/group(0) Demo
Example: groupCount()
Example: group(int) Demo
Example: group()
Write a Java program to display a formatted phone number.
Solution:
• appendReplacement(StringBuffer, newString)
Copies the text from the end of the previous match (or the start of the input) up to
the current match, then appends the given newString.
• appendTail(StringBuffer)
Appends the text after the last match (normally used after appendReplacement).
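The two methods are typically used in a find() loop when each replacement depends on the text that matched. The following is a sketch under that assumption: it upper-cases every match. The class AppendDemo and the method shout() are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for appendReplacement() and appendTail().
class AppendDemo {
    // Replaces every match of regex with its upper-case form.
    static String shout(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps any $ or \ in the text from being
            // treated as a group reference or an escape.
            m.appendReplacement(sb, Matcher.quoteReplacement(m.group().toUpperCase()));
        }
        m.appendTail(sb); // copy whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(shout("the cat sat on the cat mat", "cat"));
        // prints: the CAT sat on the CAT mat
    }
}
```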
Example: appendReplacement() and appendTail() Demo
Solution:
Note:
Matcher reset(CharSequence): This method resets the matcher and takes as its parameter
the new input (for example, a String) that the matcher will search after the reset.
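One common use of reset(CharSequence) is to reuse a single Matcher across many input lines instead of creating a new one per line. The sketch below assumes that use case; the class ResetDemo, the method countLinesWith() and the sample lines are made up for the example.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for Matcher.reset(CharSequence).
class ResetDemo {
    // Counts the lines in which the pattern is found at least once.
    static int countLinesWith(Pattern p, List<String> lines) {
        Matcher m = p.matcher(""); // one Matcher, created once
        int hits = 0;
        for (String line : lines) {
            m.reset(line);         // drop the old state and search the new input
            if (m.find()) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("An[^ dn]");
        List<String> lines = List.of("Angie called", "and then", "mail from Anjie", "Anne");
        System.out.println(countLinesWith(p, lines)); // prints 2
    }
}
```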
• CASE_INSENSITIVE
Turns on case-insensitive matching
Example:
• COMMENTS
Causes whitespace and comments (from # to end of line) to be ignored in the
pattern.
• DOTALL
Allows dot (.) to match any character, including the newline; without this flag
the dot does not match line terminators.
• MULTILINE
Specifies multiline mode, in which ^ and $ match at the start and end of each line
rather than only at the start and end of the entire input.
• UNICODE_CASE
Enables Unicode-aware case folding (used together with CASE_INSENSITIVE).
• UNIX_LINES
Makes \n the only line terminator recognized by ., ^, and $.
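Flags are passed as the second argument to Pattern.compile() and can be combined with the | operator. The sketch below tries three of them; the class FlagDemo, the helper matches() and the sample patterns are assumptions for illustration only.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for a few Pattern flags.
class FlagDemo {
    // Whole-input match with the given flags (0 means no flags).
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("angie", 0, "ANGIE"));                        // false
        System.out.println(matches("angie", Pattern.CASE_INSENSITIVE, "ANGIE")); // true

        System.out.println(matches("a.b", 0, "a\nb"));                           // false
        System.out.println(matches("a.b", Pattern.DOTALL, "a\nb"));              // true

        // COMMENTS: whitespace and #-comments inside the pattern are ignored.
        String commented =
              "\\d{3}   # three digits\n"
            + "-        # a literal hyphen\n"
            + "\\d{4}   # four digits";
        System.out.println(matches(commented, Pattern.COMMENTS, "555-1234"));    // true

        // Two flags combined with |.
        int both = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;
        System.out.println(matches("A.B", both, "a\nb"));                        // true
    }
}
```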
4.8 Matching Accented or Composite Characters
Solution:
Output:
Solution:
Output:
INPUT: I dream of engines
more engines, all day long
PATTERN ines
more
DEFAULT match true
MultiLine match true
PATTERN engines$
DEFAULT match false
MultiLine match true
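A program along the following lines would produce results like those in the output above. It is a reconstruction under assumptions, not the original solution: the class MultilineDemo and the method found() are invented names, and find() is assumed to be the finder method used.

```java
import java.util.regex.Pattern;

// Hypothetical reconstruction of a MULTILINE demo.
class MultilineDemo {
    static final String INPUT = "I dream of engines\nmore engines, all day long";

    // True when the regex is found anywhere in INPUT with the given flags.
    static boolean found(String regex, int flags) {
        return Pattern.compile(regex, flags).matcher(INPUT).find();
    }

    public static void main(String[] args) {
        System.out.println("INPUT: " + INPUT);
        // The first pattern contains a literal newline; the second relies on $.
        for (String regex : new String[] {"ines\nmore", "engines$"}) {
            System.out.println("PATTERN " + regex);
            System.out.println("DEFAULT match " + found(regex, 0));
            System.out.println("MultiLine match " + found(regex, Pattern.MULTILINE));
        }
    }
}
```

Without MULTILINE, $ matches only at the end of the whole input, which ends in "long", so engines$ fails. With MULTILINE, $ also matches just before the newline that follows the first "engines".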
Hint: Use replaceAll() method of Matcher class
2 Write a Java program to read all mobile numbers present in a given file and
validate them against the criteria below:
-The first digit must be between 7 and 9.
-The remaining 9 digits can be any digit from 0 to 9.
-The mobile number can also have 11 digits, with a 0 at the beginning.
-The mobile number can also have 12 digits, with 91 at the beginning.
A number that satisfies the above criteria is a valid mobile number.
3 Write a program to check whether given email and URL addresses are valid or
not.
Email validation
---------------------
Example: rama@gmail.com
1) name: [a-zA-Z_-]+
2) @ : @
3) subdomain: [a-zA-Z]{2,256} (Ex: gmail, yahoo, etc)
4) dot (.): \.
5) domain: [a-zA-Z]{2,5} (Ex: in, com etc)
URL validation
--------------
Example: https://www.gmail.com
1) URL must start with either http or https : https?
2) followed by :// : ://
3) then it must contain www. : w{3}\.
4) subdomain: [a-zA-Z]{2,256} (Ex: gmail, yahoo, etc)
5) dot (.): \.
6) domain: [a-zA-Z]{2,5} (Ex: in, com etc)
Hint: You are free to add some additional restrictions to the email and URL.
The pattern must satisfy both email and URL.
ECE
EEE
CIVIL
Use the split() method and the case-controlling flags to solve this.
6 Write a program to get the first letter of each word in a string using regex in
Java. For example: the input string is “This is CSE Students” and output of the
program is: TiCS.