4 Pattern Matching With Regular Expressions 1

The document discusses using regular expressions (regexes) for pattern matching in Java. It covers regex syntax, including character classes, quantifiers, and metacharacters. It also provides examples of using the Pattern and Matcher classes to test if a string matches a regex pattern, including compiling a Pattern, getting a Matcher, and calling matching methods like matches(). The document is a reference on regex pattern matching in Java that defines important concepts and provides code examples.

Uploaded by

Gajhodhar Paresh

Contents

4 Pattern Matching with Regular Expressions
4.1 Introduction
4.2 Regular Expression Syntax
4.2.1 Assignment
4.2.2 Using regexes in Java: Test for a Pattern
4.3 Types of Quantifiers
4.4 Finding the Matching Text
4.5 Replacing the Matched Text
4.6 Printing Lines Containing a Pattern
4.7 Pattern.compile() Flags
4.8 Matching Accented or Composite Characters
4.9 Matching Newlines in Text
4.10 Miscellaneous programming Assignment

Chapter 4

Pattern Matching with Regular Expressions

4.1 Introduction
• Regular expressions, or regexes for short, provide a concise and precise specification
of patterns to be matched in text.

• Example: Suppose you have 150,000 email messages on your drive, and suppose further
that you remember that somewhere in there is a message from someone named Angie or
Anjie. Or was it Angy? But you don't remember what you called it or where you stored
it. Obviously, you have to look for it. The simplest way is to write a regular
expression to search for it:

An[^ dn].*

• Description: this finds words that begin with An. The cryptic [^ dn] requires the An
to be followed by a character other than (^ means "not" in this context) a space (to
eliminate the very common English word "an" at the start of a sentence), d (to
eliminate the common word "and"), or n (to eliminate Anne, Announcing, etc.).

• The Java Pattern class can be used in two ways. You can use the static
Pattern.matches() method to quickly check whether a text (String) matches a given
regular expression, or you can compile a Pattern once with Pattern.compile() and reuse
it through Matcher objects (see Section 4.2.2).
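
The search described above can be sketched in Java as follows. This is a minimal
illustration, not part of the original notes: the class name AngieSearch, the helper
mentions(), and the sample lines are assumptions made for the example.

```java
import java.util.regex.Pattern;

class AngieSearch {
    // Compile once; the same Pattern is reused for every line.
    static final Pattern NAME = Pattern.compile("An[^ dn].*");

    // True if the line contains "An" followed by a character
    // other than a space, 'd', or 'n'.
    static boolean mentions(String line) {
        return NAME.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] lines = {            // hypothetical sample data
            "From: Angie",
            "An apple and Anne",
            "Reply to Anjie soon"
        };
        for (String line : lines) {
            System.out.println(line + " -> " + mentions(line));
        }
    }
}
```

The second sample line is rejected because "An " is followed by a space and "Anne" has
an n after the An.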

4.2 Regular Expression Syntax


• To experiment with regular expression syntax, you can use the author's REDemo.java
program.

• REDemo.java is in the regex directory of the online darwinsys-api repo; you can run
it to explore how regexes work.

Regex Character Classes

Table 4.1: Regex Character classes

No.  Character Class   Description
1    [abc]             a, b, or c (simple class)
2    [^abc]            Any character except a, b, or c (negation)
3    [a-zA-Z]          a through z or A through Z, inclusive (range)
4    [a-d[m-p]]        a through d, or m through p: [a-dm-p] (union)
5    [a-z&&[def]]      d, e, or f (intersection)
6    [a-z&&[^bc]]      a through z, except for b and c: [ad-z] (subtraction)
7    [a-z&&[^m-p]]     a through z, and not m through p: [a-lq-z] (subtraction)
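
A few rows of the character-class table can be checked with a short program. This is
only a sketch; the class name CharClassDemo and the helper inClass() are made up for
the illustration.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // True if the one-character string ch belongs to the given character class.
    static boolean inClass(String charClass, String ch) {
        return Pattern.matches(charClass, ch);
    }

    public static void main(String[] args) {
        System.out.println(inClass("[a-d[m-p]]", "n"));    // union: true
        System.out.println(inClass("[a-z&&[def]]", "e"));  // intersection: true
        System.out.println(inClass("[a-z&&[^bc]]", "b"));  // subtraction: false
        System.out.println(inClass("[a-z&&[^m-p]]", "q")); // subtraction: true
    }
}
```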


Regex Quantifiers
Quantifiers specify how many times the preceding element X must occur.

Table 4.2: Regex Quantifiers

Regex    Description
X?       X occurs once or not at all
X+       X occurs one or more times
X*       X occurs zero or more times
X{n}     X occurs exactly n times
X{n,}    X occurs n or more times
X{y,z}   X occurs at least y times but not more than z times
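
The quantifier table can be exercised with Pattern.matches(), which requires the whole
input to match. The sketch below uses a hypothetical class QuantifierTableDemo with a
helper full(); both names are assumptions for illustration.

```java
import java.util.regex.Pattern;

class QuantifierTableDemo {
    // True if the entire input matches the regex.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(full("a?", ""));        // true: zero occurrences allowed
        System.out.println(full("a+", ""));        // false: at least one required
        System.out.println(full("a*", ""));        // true: zero or more
        System.out.println(full("a{3}", "aaa"));   // true: exactly three
        System.out.println(full("a{2,}", "aaaaa")); // true: two or more
        System.out.println(full("a{2,3}", "aaa")); // true: upper bound is inclusive
        System.out.println(full("a{2,3}", "aaaa")); // false: more than three
    }
}
```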

Regex Metacharacters
The regular expression metacharacters work as shorthand for common character classes.

Table 4.3: Regex Metacharacters

Regex   Description
.       Any character (may or may not match line terminators)
\d      Any digit, short for [0-9]
\D      Any non-digit, short for [^0-9]
\s      Any whitespace character, short for [ \t\n\x0B\f\r]
\S      Any non-whitespace character, short for [^\s]
\w      Any word character, short for [a-zA-Z_0-9]
\W      Any non-word character, short for [^\w]
\b      A word boundary
\B      A non-word boundary
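
To see the metacharacters at work, the following sketch counts how often each one
matches in a sample string. The class MetacharDemo, the helper count(), and the sample
text are illustrative assumptions. Note that each backslash is doubled inside a Java
string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetacharDemo {
    // Counts the non-overlapping matches of regex in input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "Room 42, floor 7";   // hypothetical sample text
        System.out.println(count("\\d+", text));   // 2 runs of digits: 42 and 7
        System.out.println(count("\\w+", text));   // 4 words: Room, 42, floor, 7
        System.out.println(count("\\s", text));    // 3 whitespace characters
        System.out.println(count("\\b7\\b", text)); // 1: the 7 stands alone as a word
    }
}
```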

Example: Search Pattern

Phone numbers: one 986-123-4532, two 615-123-9867, three 615.123.9867, four (615)123.9867

\d       any digit (0-9)
.        any character
*        zero or more (wildcard)
[abc]    character class: matches any one of these characters
Mrs?\.   optional character: matches "Mr." or "Mrs."

Example

\d\d\d-\d\d\d-\d\d\d\d
\d{3}-\d{3}-\d{4}
\d{3}[-.]\d{3}[-.]\d{4}
\(?\d{3}[-.)]\d{3}[-.]\d{4}
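
The four patterns above accept progressively more of the sample phone numbers. The
sketch below counts how many of the four numbers each pattern matches in full; the
class name PhonePatterns and the helper countMatches() are assumptions for the
illustration.

```java
import java.util.regex.Pattern;

class PhonePatterns {
    static final String[] NUMBERS = {
        "986-123-4532", "615-123-9867", "615.123.9867", "(615)123.9867"
    };

    static final String[] PATTERNS = {
        "\\d\\d\\d-\\d\\d\\d-\\d\\d\\d\\d",
        "\\d{3}-\\d{3}-\\d{4}",
        "\\d{3}[-.]\\d{3}[-.]\\d{4}",
        "\\(?\\d{3}[-.)]\\d{3}[-.]\\d{4}"
    };

    // Counts the inputs that the regex matches in their entirety.
    static int countMatches(String regex, String[] inputs) {
        Pattern p = Pattern.compile(regex);
        int n = 0;
        for (String s : inputs) {
            if (p.matcher(s).matches()) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        for (String regex : PATTERNS) {
            System.out.println(regex + " matches " + countMatches(regex, NUMBERS)
                    + " of " + NUMBERS.length);
        }
    }
}
```

The first two patterns accept only the dash-separated numbers, the third also accepts
dots, and the fourth additionally accepts the parenthesized area code.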

4.2.1 Assignment
1. Write a regular expression to print all the names that start with An. Example: Angie,
Anjie, or Angy.
Solution:

Regular expression = "An[^ nd].*"

2. Write a regular expression to print, from a set of strings, each string that starts
with “A” followed by any number of characters.
Solution:

Regular expression = "A.*"

3. Subexpression: ^ Matches: Start of line/string
Solution:

Regex: ^dog
Input string to search: dog
Output: found

4. Subexpression: $ Matches: End of line/string
Solution:

Regex: dog$
Input string to search: abc dog
Output: found

5. Subexpression: \b Matches: Word boundary
Solution:

Regex: \bdog\b
Input string to search: abc dog xyz
Output: found
Regex: \bdog\b
Input string to search: abc doggg xyz
Output: not found

6. Subexpression: \B Matches: Not a word boundary
Solution:

Regex: \bdog\B (a word boundary before dog, then no word boundary after it)
Input string to search: abc dogpqr xyz
Output: found

7. Subexpression: \A Matches: Beginning of entire string
Solution:

Regex: \Aabc
Input string to search: abc dogpqr xyz
Output: found
Regex: \Axyz
Input string to search: abc dogpqr xyz
Output: not found

8. Subexpression: \z Matches: End of entire string
Solution:

Regex: xyz\z
Input string to search: abc dogpqr xyz
Output: found

9. Subexpression: [...] Matches: "Character class"; any one character from those listed
Solution:

Regex: a[bc]d
Input string to search: abc abd xyz
Output: found
Regex: a[bc]d
Input string to search: abc abcd xyz
Output: not found

10. Subexpression: [^...] Matches: Any one character not from those listed
Solution:

Regex: a[^bc]d
Input string to search: axd abd xyz
Output: found
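
The found / not found results in solutions 3 to 10 can be reproduced with
Matcher.find(). The following is a small sketch; the class name AnchorChecks and the
helper found() are assumptions, and the regexes and inputs are the ones listed in the
solutions.

```java
import java.util.regex.Pattern;

class AnchorChecks {
    // True if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String s = "abc dogpqr xyz";
        System.out.println(found("^dog", "dog"));                 // true
        System.out.println(found("dog$", "abc dog"));             // true
        System.out.println(found("\\bdog\\b", "abc doggg xyz"));  // false
        System.out.println(found("\\bdog\\B", s));                // true
        System.out.println(found("\\Aabc", s));                   // true
        System.out.println(found("\\Axyz", s));                   // false
        System.out.println(found("xyz\\z", s));                   // true
        System.out.println(found("a[bc]d", "abc abcd xyz"));      // false
        System.out.println(found("a[^bc]d", "axd abd xyz"));      // true
    }
}
```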

4.2.2 Using regexes in Java: Test for a Pattern


Matching regex using matches() in String class
If all you need is to find out whether a given regex matches a string, you can use the
convenient boolean matches() method of the String class, which accepts a regex pattern
in String form as its argument:

Example:

if (inputString.matches(stringRegexPattern)) {
    // it matched... do something with it...
}

Example: String matches() Demo

public class TestString {
    public static void main(String[] args) {
        String pattern = ".*Q[^u]\\d+\\..*";
        String line = "Order QT300. Now!";
        if (line.matches(pattern)) {
            System.out.println(line + " matches \"" + pattern + "\"");
        } else {
            System.out.println("NO MATCH");
        }
    }
}

Matching regexes using Pattern and Matcher(s)


A regular expression is a special sequence of characters that helps you match or find
other strings or sets of strings, using a specialized syntax held in a pattern. They can
be used to search, edit, or manipulate text and data.

If the regex is going to be used more than once or twice in a program, it is more
efficient to construct and use a Pattern and its Matcher(s).

This API is large enough to require some explanation. The normal steps for regex
matching in a production program are:

1. Create a Pattern by calling the static method Pattern.compile().

2. Request a Matcher from the pattern by calling Pattern.matcher(CharSequence)
for each String (or other CharSequence) you wish to look through.

3. Call (once or more) one of the finder methods (discussed later in this section) in
the resulting Matcher.

Example: Pattern, Matcher, matches() Demo

import java.util.regex.*;

public class TestPattern1 {
    public static void main(String[] args) {
        // 1st way: compile a Pattern, then ask it for a Matcher
        Pattern p = Pattern.compile(".s"); // . represents any single character
        Matcher m = p.matcher("as");
        boolean b = m.matches();
        // 2nd way: chain the calls
        boolean b2 = Pattern.compile(".s").matcher("as").matches();
        // 3rd way: static convenience method
        boolean b3 = Pattern.matches(".s", "as");
        System.out.println(b + " " + b2 + " " + b3);
    }
}

Matcher methods

• matches(): Used to compare the entire string against the pattern; this is the same
as the matches() method in java.lang.String. Because it must match the entire String,
the String matches() demo above had to put .* before and after the pattern.

• lookingAt(): Used to match the pattern only at the beginning of the string.

• find(): Used to match the pattern anywhere in the string (not necessarily at the
first character of the string), starting at the beginning of the string or, if the
method was previously called and succeeded, at the first character not matched by the
previous match.

Example: matches() Demo

import java.util.regex.*;

class Main {
    public static void main(String[] args) {
        String patt = "pqr.*";
        String input = "pqr abd pxy";
        Matcher m = Pattern.compile(patt).matcher(input);
        if (m.matches())
            System.out.println("Pattern: " + patt + " found in: " + input);
        else
            System.out.println("Pattern not found");
    }
}
Output:
Pattern: pqr.* found in: pqr abd pxy

Example: find() Demo

import java.util.regex.*;

class Main {
    public static void main(String[] args) {
        String patt = "abd";
        String input = "pqr abd pxy";
        Matcher m = Pattern.compile(patt).matcher(input);
        if (m.find())
            System.out.println("Pattern: " + patt + " found in: " + input);
        else
            System.out.println("Pattern not found");
    }
}
Output:
Pattern: abd found in: pqr abd pxy

Example: lookingAt() Demo

import java.util.regex.*;

class Main {
    public static void main(String[] args) {
        String patt = "pqr";
        String input = "pqr abd pxy";
        Matcher m = Pattern.compile(patt).matcher(input);
        if (m.lookingAt())
            System.out.println("Pattern: " + patt + " found in: " + input);
        else
            System.out.println("Pattern not found");
    }
}
Output:
Pattern: pqr found in: pqr abd pxy

Example: lookingAt() Demo

import java.util.regex.*;

class Main {
    public static void main(String[] args) {
        String patt = "^Q[^u]\\d+\\.";
        String[] input = {
            "QA777. is the next flight. It is on time.",
            "Quack, Quack, Quack!"
        };
        Pattern p = Pattern.compile(patt);
        for (String in : input) {
            boolean found = p.matcher(in).lookingAt();
            if (found)
                System.out.println("Pattern: " + patt + " found in: " + in);
            else
                System.out.println("Pattern not found");
        }
    }
}
Output:
Pattern: ^Q[^u]\d+\. found in: QA777. is the next flight. It is on time.
Pattern not found

The matches and lookingAt Methods


The matches() and lookingAt() methods both attempt to match an input sequence
against a pattern. The difference, however, is that matches requires the entire input
sequence to be matched, while lookingAt does not.
Example: matches() and lookingAt() Demo

import java.util.regex.*;

class Main {
    public static void main(String[] args) {
        String REGEX = "foo";
        String INPUT = "fooooooooooooooooo";
        Matcher m = Pattern.compile(REGEX).matcher(INPUT);
        System.out.println("REGEX is: " + REGEX);
        System.out.println("INPUT is: " + INPUT);
        System.out.println("lookingAt(): " + m.lookingAt());
        System.out.println("matches(): " + m.matches());
    }
}
Output:
REGEX is: foo
INPUT is: fooooooooooooooooo
lookingAt(): true
matches(): false

Example: find() Demo

import java.util.regex.*;

class Main {
    public static void main(String[] args) {
        String patt = "abd";
        String input = "pqr abd pxy abd";
        Matcher m = Pattern.compile(patt).matcher(input);
        // end() is one past the last matched character, hence end() - 1
        while (m.find())
            System.out.println("Pattern found from " +
                    m.start() + " to " + (m.end() - 1));
    }
}
Output:
Pattern found from 4 to 6
Pattern found from 12 to 14

Example:

/* Create a regular expression that accepts alphanumeric
   characters only. Its length must be exactly six characters. */
import java.util.regex.*;

class RegExample {
    public static void main(String[] args) {
        System.out.println(Pattern.matches("[a-zA-Z0-9]{6}",
                "arun32"));     // true
        System.out.println(Pattern.matches("[a-zA-Z0-9]{6}",
                "kkvarun32"));  // false (more than 6 characters)
        System.out.println(Pattern.matches("[a-zA-Z0-9]{6}",
                "JA2Uk2"));     // true
        System.out.println(Pattern.matches("[a-zA-Z0-9]{6}",
                "arun$2"));     // false ($ is not matched)
    }
}

Example:

/* Create a regular expression that accepts 10-digit numeric
   strings starting with 7, 8, or 9 only. */
import java.util.regex.*;

class RegExample {
    public static void main(String[] args) {
        System.out.println("by character classes and quantifiers...");
        System.out.println(Pattern.matches("[789]{1}[0-9]{9}",
                "9953038949"));   // true
        System.out.println(Pattern.matches("[789][0-9]{9}",
                "9953038949"));   // true
        System.out.println(Pattern.matches("[789][0-9]{9}",
                "99530389490"));  // false (11 characters)
        System.out.println(Pattern.matches("[789][0-9]{9}",
                "6953038949"));   // false (starts with 6)
        System.out.println(Pattern.matches("[789][0-9]{9}",
                "8853038949"));   // true
        System.out.println("by metacharacters...");
        System.out.println(Pattern.matches("[789]{1}\\d{9}",
                "8853038949"));   // true
        System.out.println(Pattern.matches("[789]{1}\\d{9}",
                "3853038949"));   // false (starts with 3)
    }
}

4.3 Types of Quantifiers


1. Greedy quantifier: By default, quantifiers are greedy. A greedy quantifier tries
to match the longest text that matches the given pattern. It works by first consuming
as much of the input as it can. If the rest of the pattern then fails to match, it
gives back the last character and tries again, repeating the process until a match is
found or there is nothing left to give back.
Example: Greedy Quantifier Demo

Pattern p = Pattern.compile("g+");
Matcher m = p.matcher("ggg");
while (m.find())
    System.out.println("Pattern found from " + m.start() + " to " + (m.end() - 1));
Output:
Pattern found from 0 to 2

Explanation: The pattern g+ means one or more occurrences of g. The text is ggg.
The greedy matcher matches the longest text even if shorter parts of that text would
also match. In this example, g and gg also match, but the greedy matcher produces ggg.

2. Reluctant quantifier: (Appending a ? after quantifier) This quantifier uses
the approach that is the opposite of greedy quantifiers. It starts by matching as few
characters as possible and extends the match one character at a time, only when the
rest of the pattern requires it.
Example: Reluctant Quantifier Demo

Pattern p = Pattern.compile("g+?");
Matcher m = p.matcher("ggg");
while (m.find())
    System.out.println("Pattern found from " + m.start() + " to " + (m.end() - 1));
Output:
Pattern found from 0 to 0
Pattern found from 1 to 1
Pattern found from 2 to 2

3. Possessive quantifier: (Appending a + after quantifier) This quantifier
matches as many characters as it can, like a greedy quantifier. But if the rest of the
pattern then fails to match, it does not give any characters back from the end.
Example: Possessive Quantifier Demo

Pattern p = Pattern.compile("g++");
Matcher m = p.matcher("ggg");
while (m.find())
    System.out.println("Pattern found from " + m.start() + " to " + (m.end() - 1));
Output:
Pattern found from 0 to 2

Example to Show the Difference between Greedy and Possessive Quantifiers.

Example:

String s = "xfooxxxxxxfoo";
// Greedy Quantifier: .* backs off until foo can match
String patt = ".*foo";
Matcher m = Pattern.compile(patt).matcher(s);
if (m.find())
    System.out.println("Pattern found");
else
    System.out.println("Pattern Not Found");
// Possessive Quantifier: .*+ keeps everything, so foo can never match
String patt1 = ".*+foo";
Matcher m1 = Pattern.compile(patt1).matcher(s);
if (m1.find())
    System.out.println("Pattern found");
else
    System.out.println("Pattern Not Found");
Output:
Pattern found
Pattern Not Found
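
To compare all three quantifier types on the same input, it helps to print the text
that each one actually matches. The sketch below does this for the string used above;
the class name QuantifierKinds and the helper firstMatch() are assumptions for the
illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierKinds {
    // Returns the text of the first match, or null if there is no match.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "xfooxxxxxxfoo";
        System.out.println("greedy:     " + firstMatch(".*foo", s));  // xfooxxxxxxfoo
        System.out.println("reluctant:  " + firstMatch(".*?foo", s)); // xfoo
        System.out.println("possessive: " + firstMatch(".*+foo", s)); // null
    }
}
```

The greedy form backs off from the end of the string until the last foo fits, the
reluctant form stops at the first foo, and the possessive form never releases the
characters that foo would need.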

4.4 Finding the Matching Text


You need to find the text that the regex matched. Use the following Matcher methods:

• start(), end()
start() returns the index in the string of the first character that matched; end()
returns the index just past the last character that matched.

• groupCount()
Returns the number of parenthesized capture groups, if any; returns 0 if no groups
were used.

• group(int i)
Returns the characters matched by group i of the current match, if i is greater
than or equal to zero and less than or equal to the return value of groupCount().
Group 0 is the entire match, so group(0) (or just group()) returns the entire
portion of the input that matched.

Note:

• The group(int) method lets you retrieve the characters that matched a given
parenthesis group. If you haven't used any explicit parentheses, you can just treat
whatever matched as "level zero."

• Exception: the group() method throws IllegalStateException if no match has yet
been attempted, or if the previous match operation failed.

• To find out how many groups are present in the expression, call the groupCount()
method on a matcher object. The groupCount() method returns an int showing the
number of capturing groups present in the matcher's pattern.

• There is also a special group, group 0, which always represents the entire
expression. This group is not included in the total reported by groupCount().

Example:group()/group(0) Demo

String patt = "Q[^u]\\d+\\.";
Pattern r = Pattern.compile(patt);
String line = "Order QT300. Now!";
Matcher m = r.matcher(line);
if (m.find()) {
    System.out.println(patt + " matches \"" +
            m.group(0) +
            "\" in \"" + line + "\"");
} else {
    System.out.println("NO MATCH");
}
Output:
Q[^u]\d+\. matches "QT300." in "Order QT300. Now!"

Example: groupCount()

import java.util.regex.*;

public class RESimple {
    public static void main(String[] argv) {
        String pattern = "(.*)(\\d{6})";
        String input = "abdpxy 100000";
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(input);
        System.out.println("Total group = " + m.groupCount());
    }
}
Output: Total group = 2

Example: group(int) Demo

import java.util.regex.*;

class Main {
    public static void main(String[] args) {
        String patt = "(Q[^u])(\\d+\\.)";
        // there are 2 pairs of parentheses (2 groups)
        String line = "Order QT300. Now!";
        Matcher m = Pattern.compile(patt).matcher(line);
        if (m.find()) {
            System.out.println(line.substring(m.start(), m.end()));
            System.out.println("group(): " + m.group());
            System.out.println("group(0): " + m.group(0));
            System.out.println("group(1): " + m.group(1));
            System.out.println("group(2): " + m.group(2));
        }
        System.out.println("groupCount(): " + m.groupCount());
    }
}
Output:
QT300.
group(): QT300.
group(0): QT300.
group(1): QT
group(2): 300.
groupCount(): 2

Example: group()

import java.util.regex.*;

public class RESimple {
    public static void main(String[] argv) {
        String pattern = "(.*)(\\d{6})";
        String input = "abdpxy 100000";
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(input);
        if (m.find()) {
            // group(1) is "abdpxy " because the greedy .* also takes the space
            System.out.println("Pattern " + pattern + " found in string "
                    + input + " with group " + m.group(0));
            System.out.println("Pattern " + pattern + " found in string "
                    + input + " with group " + m.group(1));
            System.out.println("Pattern " + pattern + " found in string "
                    + input + " with group " + m.group(2));
        }
    }
}
Output:
Pattern (.*)(\d{6}) found in string abdpxy 100000 with group abdpxy 100000
Pattern (.*)(\d{6}) found in string abdpxy 100000 with group abdpxy
Pattern (.*)(\d{6}) found in string abdpxy 100000 with group 100000

Write a Java program to display formatted phone numbers.
Solution:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String regex = "\\b(\\d{3})(\\d{3})(\\d{4})\\b";
        Pattern p = Pattern.compile(regex);
        String source = "1234567890, 12345, and 9876543210";
        Matcher m = p.matcher(source);
        // while (not if), so that every phone number in the source is reported
        while (m.find()) {
            String format = "(" + m.group(1) + ") "
                    + m.group(2) + "-" + m.group(3);
            System.out.println("Phone: " + m.group()
                    + ", Formatted Phone: " + format);
        }
    }
}
Output:
Phone: 1234567890, Formatted Phone: (123) 456-7890
Phone: 9876543210, Formatted Phone: (987) 654-3210

4.5 Replacing the Matched Text


• replaceAll(newString)
Replaces all occurrences that matched with the new string.

Example: replaceAll() Demo

import java.util.regex.*;

class Main {
    public static void main(String[] args) {
        String patt = "\\bfavor\\b";
        String input = "Do me a favor? Fetch my favorite.";
        System.out.println("Input: " + input);
        Matcher m = Pattern.compile(patt).matcher(input);
        System.out.println("Output: " + m.replaceAll("favour"));
    }
}
Output:
Input: Do me a favor? Fetch my favorite.
Output: Do me a favour? Fetch my favorite.

• appendReplacement(StringBuffer, newString)
Appends to the StringBuffer the input text between the previous match (or the start
of the input) and the current match, followed by the given newString.

• appendTail(StringBuffer)
Appends the text after the last match (normally used after appendReplacement()).

Example: appendReplacement() and appendTail() Demo

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class ReplaceAll {
    public static void main(String[] args) {
        String patt = "\\bfavor\\b";
        String input = "Do me a favor? Fetch my favorite. favor";
        System.out.println("Input: " + input);
        Pattern r = Pattern.compile(patt);
        Matcher m = r.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Copy the text before this match,
            // then the replacement word "favour"
            m.appendReplacement(sb, "favour");
        }
        m.appendTail(sb); // copy remainder
        System.out.println("Output: " + sb.toString());
    }
}
Output:
Input: Do me a favor? Fetch my favorite. favor
Output: Do me a favour? Fetch my favorite. favour

4.6 Printing Lines Containing a Pattern


Write a Java program to print the lines that contain a pattern.

Solution:

// Write a simple grep-like program.
import java.util.regex.*;
import java.io.*;

class Main {
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("Usage: java Main <file> <pattern>");
            System.exit(1);
        }
        String patt = args[1];
        Pattern p = Pattern.compile(patt);
        Matcher m = p.matcher(""); // empty input; reset() supplies each line
        // try-with-resources closes the reader even if an exception occurs
        try (BufferedReader br = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = br.readLine()) != null) {
                m.reset(line);
                if (m.find())
                    System.out.println("Match: " + line);
            }
        }
    }
}

input.txt
--------------------
The pursuit of happiness is real.
Life's better when we're happy.
Happiness is a state of mind.
A tensed, and angry mind cannot become happy.
Everybody wants to be happy.

Compile and run:
javac Main.java                       (compile)
java Main input.txt "\\bhappy\\b"     (run/execute)

Output:
Match: Life's better when we're happy.
Match: A tensed, and angry mind cannot become happy.
Match: Everybody wants to be happy.

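Note:
Matcher reset(CharSequence): This method resets the matcher and takes as its parameter
the new input (here a String) that the matcher will search after being reset. This
lets one Matcher be reused for every line instead of creating a new one each time.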
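
The reuse of a single Matcher through reset() can also be tried without a file. The
following sketch counts matching lines in an in-memory array; the class name ResetDemo
and the helper countMatchingLines() are assumptions, and the lines are taken from the
input.txt sample above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResetDemo {
    // Counts the lines that contain the pattern, reusing one Matcher.
    static int countMatchingLines(String regex, String[] lines) {
        Matcher m = Pattern.compile(regex).matcher("");
        int count = 0;
        for (String line : lines) {
            m.reset(line);   // discard old state and search the new line
            if (m.find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] lines = {
            "The pursuit of happiness is real.",
            "Life's better when we're happy.",
            "Happiness is a state of mind.",
            "A tensed, and angry mind cannot become happy.",
            "Everybody wants to be happy."
        };
        System.out.println(countMatchingLines("\\bhappy\\b", lines)); // 3
    }
}
```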

4.7 Pattern.compile() Flags


• CANON_EQ
Enables so-called "canonical equivalence." In other words, characters are matched
by their base character, so that the character e followed by the "combining character
mark" for the acute accent (´) can be matched either by the composite character é or
by the letter e followed by the character mark for the accent.

• CASE_INSENSITIVE
Turns on case-insensitive matching.
Example:

Pattern reCaseInsens = Pattern.compile(pattern,
        Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
reCaseInsens.matcher(input).matches();

• COMMENTS
Causes whitespace and comments (from # to end of line) to be ignored in the pattern.

• DOTALL
Allows dot (.) to match any character, including a line terminator; by default dot
does not match line terminators.

• MULTILINE
Specifies multiline mode, in which ^ and $ match at the start and end of every line,
not just at the start and end of the entire input.

• UNICODE_CASE
Enables Unicode-aware case folding.

• UNIX_LINES
Makes \n the only line terminator recognized in the behavior of ., ^, and $.
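
Several of these flags can be observed with small inputs. The sketch below is only an
illustration; the class name FlagsDemo, its helper methods, and the sample strings are
assumptions. Flags are combined with the bitwise OR operator, and 0 means no flags.

```java
import java.util.regex.Pattern;

class FlagsDemo {
    // CASE_INSENSITIVE (with UNICODE_CASE) ignores letter case.
    static boolean matchesIgnoringCase(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                .matcher(input).matches();
    }

    // COMMENTS lets the pattern carry whitespace and # comments.
    static boolean matchesShortPhone(String input) {
        String regex =
            "\\d{3}   # three digits\n" +
            "[-.]     # separator\n" +
            "\\d{4}   # four digits";
        return Pattern.compile(regex, Pattern.COMMENTS).matcher(input).matches();
    }

    // find() with an arbitrary combination of flags.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String twoLines = "first\nsecond";
        System.out.println(matchesIgnoringCase("hello", "HeLLo"));            // true
        System.out.println(matchesShortPhone("555-1234"));                    // true
        System.out.println(found("first.second", 0, twoLines));               // false
        System.out.println(found("first.second", Pattern.DOTALL, twoLines));  // true
        System.out.println(found("^second", 0, twoLines));                    // false
        System.out.println(found("^second", Pattern.MULTILINE, twoLines));    // true
    }
}
```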

4.8 Matching Accented or Composite Characters
Solution:

// Matching Accented or Composite Characters
import java.util.regex.*;

public class CanonEqDemo {
    public static void main(String[] args) {
        String pattStr = "\u00e9gal"; // égal
        String[] input = {
            "\u00e9gal",  // égal - this one had better match :-)
            "e\u0301gal", // e + "Combining acute accent"
            "e\u02cagal", // e + "modifier letter acute accent"
            "e'gal",      // e + single quote
            "e\u00b4gal", // e + Latin-1 "acute"
        };
        Pattern pattern = Pattern.compile(pattStr,
                Pattern.CANON_EQ);
        for (int i = 0; i < input.length; i++) {
            if (pattern.matcher(input[i]).matches()) {
                System.out.println(pattStr + " matches input " +
                        input[i]);
            } else {
                System.out.println(
                        pattStr + " does not match input " + input[i]);
            }
        }
    }
}

Output:

4.9 Matching Newlines in Text


Problem
You need to match newlines in text.

Solution:

import java.util.regex.*;

class Main {
    public static void main(String[] args) {
        String input = "I dream of engines\nmore engines, all day long";
        System.out.println("INPUT: " + input);
        System.out.println();
        String[] patt = {
            "engines.more engines",
            "ines\nmore",
            "engines$"
        };
        for (int i = 0; i < patt.length; i++) {
            System.out.println("PATTERN " + patt[i]);
            boolean found;
            // default flags: dot skips \n, $ matches only at end of input
            Pattern p1l = Pattern.compile(patt[i]);
            found = p1l.matcher(input).find();
            System.out.println("DEFAULT match " + found);
            // DOTALL lets dot match \n; MULTILINE lets $ match at each line end
            Pattern pml = Pattern.compile(patt[i],
                    Pattern.DOTALL | Pattern.MULTILINE);
            found = pml.matcher(input).find();
            System.out.println("MultiLine match " + found);
            System.out.println();
        }
    }
}

Output:
INPUT: I dream of engines
more engines, all day long

PATTERN engines.more engines
DEFAULT match false
MultiLine match true

PATTERN ines
more
DEFAULT match true
MultiLine match true

PATTERN engines$
DEFAULT match false
MultiLine match true

4.10 Miscellaneous programming Assignment


1 Write a program in Java to remove whitespace from a string.
Hint: Use the replaceAll() method of the Matcher class.

2 Write a Java program to read all mobile numbers present in a given file and validate
them against the criteria below:
- The first digit must be between 7 and 9.
- The remaining 9 digits can each be any digit from 0 to 9.
- The mobile number can also have 11 digits by including 0 at the beginning.
- The mobile number can also have 12 digits by including 91 at the beginning.
A number that satisfies the above criteria is a valid mobile number.

3 Write a program to check whether given email and URL addresses are valid.

mail validation
---------------------
Example: rama@gmail.com
1) name: [a-zA-Z_-]+
2) @ : @
3) subdomain: [a-zA-Z]{2,256} (Ex: gmail, yahoo, etc.)
4) dot (.): \.
5) domain: [a-zA-Z]{2,5} (Ex: in, com, etc.)

url validation
--------------
Example: https://www.gmail.com
1) URL must start with either http or https : https?
2) followed by :// : ://
3) then it must contain www. : w{3}\.
4) subdomain: [a-zA-Z]{2,256} (Ex: gmail, yahoo, etc.)
5) dot (.): \.
6) domain: [a-zA-Z]{2,5} (Ex: in, com, etc.)

Hint: You are free to add some additional restrictions to the email and URL.
The pattern must satisfy both email and URL.
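
One possible way to assemble the listed pieces into a single pattern is sketched
below. This is a starting point under the rules given above, not a complete validator:
the class name EmailUrlSketch, the constant names, and the method isValid() are
assumptions, and real email and URL syntax is considerably broader.

```java
import java.util.regex.Pattern;

class EmailUrlSketch {
    // subdomain, dot, domain (shared by both forms)
    static final String HOST = "[a-zA-Z]{2,256}\\.[a-zA-Z]{2,5}";
    // name, @, host
    static final String EMAIL = "[a-zA-Z_-]+@" + HOST;
    // http or https, ://, www., host
    static final String URL = "https?://w{3}\\." + HOST;
    // Alternation: the input may be either an email or a URL.
    static final Pattern EITHER =
            Pattern.compile("(?:" + EMAIL + ")|(?:" + URL + ")");

    static boolean isValid(String s) {
        return EITHER.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("rama@gmail.com"));        // true
        System.out.println(isValid("https://www.gmail.com")); // true
        System.out.println(isValid("rama@gmail"));            // false: no domain
        System.out.println(isValid("ftp://www.gmail.com"));   // false: wrong scheme
    }
}
```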

4 Write a program to extract the maximum numeric value from a given string.
Example
Input: There are 60 students in cse-d section, 40 in cse-b, and 55 in cse-a
Output: Max: 60

5 Write a program to demonstrate splitting a text by a given pattern. The given input
is “CSE1ECE2EEE3CIVIL”. The output of the program should look like this:
CSE
ECE
EEE
CIVIL
Use split() and the case-controlling flags to solve this.

6 Write a program to get the first letter of each word in a string using regex in
Java. For example: the input string is “This is CSE Students” and the output of the
program is: TiCS.

7 Write a program to demonstrate the differences between the various quantifiers, e.g.
Greedy Quantifiers vs. Reluctant Quantifiers vs. Possessive Quantifiers.
