Java - Regular Expressions: Capturing Groups
Java provides the java.util.regex package for pattern matching with regular expressions. Java's regular expression syntax is very similar to that of the Perl programming language and is easy to learn.
A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. Regular expressions can be used to search, edit, or manipulate text and data.
Pattern Class: A Pattern object is a compiled representation of a regular expression. The Pattern class provides no public constructors. To create a pattern, you must first invoke one of its public static compile methods, which will then return a Pattern object. These methods accept a regular expression as the first argument.
Matcher Class: A Matcher object is the engine that interprets the pattern and performs match operations against an input string. Like the Pattern class, Matcher defines no public constructors. You obtain a Matcher object by invoking the matcher method on a Pattern object.
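A minimal sketch of the compile-then-match workflow just described: Pattern.compile builds the Pattern, pattern.matcher(input) returns the Matcher, and Matcher.find scans the input. The class name PatternMatcherDemo, the helper countMatches, and the sample pattern and input are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternMatcherDemo {
    // Hypothetical helper: counts the non-overlapping matches of regex in input.
    static int countMatches(String regex, String input) {
        Pattern pattern = Pattern.compile(regex); // no public constructor: use the static factory
        Matcher matcher = pattern.matcher(input); // obtain a Matcher from the Pattern
        int count = 0;
        while (matcher.find()) {                  // each call looks for the next match
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // \d+ matches "1", "22" and "333"
        System.out.println(countMatches("\\d+", "a1b22c333")); // prints 3
    }
}
```

Because a compiled Pattern is immutable and thread-safe, it is usually compiled once and reused, while a fresh Matcher is created for each input.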
Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups:
1. ((A)(B(C)))
2. (A)
3. (B(C))
4. (C)
To find out how many groups are present in the expression, call the groupCount method on a matcher object. The groupCount method returns an int showing the number of capturing groups present in the matcher's pattern.
There is also a special group, group 0, which always represents the entire expression. This group is not included in the total reported by groupCount.
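The numbering rules above can be checked with a short sketch that matches ((A)(B(C))) against the string "ABC" and lists every group, including group 0. The class name GroupNumberingDemo and the helper groups are hypothetical; groupCount, matches, and group(int) are standard Matcher methods.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumberingDemo {
    // Returns group(0)..group(groupCount()) for a full match, or null if input does not match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        // groupCount() excludes group 0, so one extra slot is needed
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] g = groups("((A)(B(C)))", "ABC");
        System.out.println("groupCount(): " + (g.length - 1)); // 4
        for (int i = 0; i < g.length; i++) {
            // prints ABC, ABC, A, BC, C for groups 0 to 4
            System.out.println("group(" + i + "): " + g[i]);
        }
    }
}
```

Note that groupCount depends only on the pattern, so it returns 4 even before any match has been attempted.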
Example:
The following example illustrates how to find a digit string in a given alphanumeric string:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
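A minimal sketch of such a program, assuming a hypothetical input string and the pattern (\D*)(\d+)(.*): group 1 captures the leading non-digits, group 2 the digit string, and group 3 the rest of the input. The class name DigitStringDemo and the helper firstDigitString are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DigitStringDemo {
    // (\D*) = leading non-digits, (\d+) = the digit string, (.*) = everything after it
    private static final Pattern DIGITS = Pattern.compile("(\\D*)(\\d+)(.*)");

    // Returns the first run of digits in input, or null if input contains no digit.
    static String firstDigitString(String input) {
        Matcher m = DIGITS.matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String input = "This order was placed for QT3000! OK?"; // hypothetical input
        Matcher m = DIGITS.matcher(input);
        if (m.find()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                System.out.println("group(" + i + "): " + m.group(i));
            }
        } else {
            System.out.println("NO MATCH");
        }
    }
}
```

Using \D* rather than .* for the first group matters: a greedy (.*) would consume as much as possible and leave only the last digit, 0, for group 2.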
Subexpression   Matches
.               Any single character except a line terminator such as newline. Setting the s (DOTALL) flag allows it to match line terminators as well.
\Z              End of the entire string, except for an allowable final line terminator: if the string ends with a newline, \Z matches just before it.
(?: re)         Groups regular expressions without remembering the matched text.
(?> re)         Matches an independent pattern without backtracking.
\b              Matches word boundaries when outside brackets. Matches backspace (0x08) when inside brackets.
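A small sketch exercising several of the subexpressions in the table. The class name SubexpressionDemo, the helper matches, and all sample patterns and inputs are illustrative; the embedded (?s) flag is Java's inline form of DOTALL.

```java
import java.util.regex.Pattern;

public class SubexpressionDemo {
    // True if the whole input matches regex
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // '.' does not match a line terminator unless DOTALL (the s flag) is set
        System.out.println(matches("a.b", "a\nb"));      // false
        System.out.println(matches("(?s)a.b", "a\nb"));  // true

        // \Z may match just before a final line terminator
        System.out.println(Pattern.compile("abc\\Z").matcher("abc\n").find()); // true

        // (?:...) groups without capturing, so groupCount() only counts (c)
        System.out.println(Pattern.compile("(?:ab)+(c)").matcher("").groupCount()); // 1

        // (?>...) is atomic: a+ keeps every 'a' and never gives one back,
        // so the following "ab" cannot match
        System.out.println(matches("(?:a+)ab", "aaab")); // true
        System.out.println(matches("(?>a+)ab", "aaab")); // false
    }
}
```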
public int start(int group): Returns the start index of the subsequence captured by the given group during the previous match operation.
public int end(int group): Returns the offset after the last character of the subsequence captured by the given group during the previous match operation.
public boolean find(): Attempts to find the next subsequence of the input sequence that matches the pattern.
public boolean find(int start): Resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index.
public String replaceAll(String replacement): Replaces every subsequence of the input sequence that matches the pattern with the given replacement string.
public String replaceFirst(String replacement): Replaces the first subsequence of the input sequence that matches the pattern with the given replacement string.
public static String quoteReplacement(String s): Returns a literal replacement String for the specified String. This method produces a String that will work as a literal replacement for s in the appendReplacement method of the Matcher class.
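A minimal sketch of replaceAll, replaceFirst, quoteReplacement, and find(int), assuming a hypothetical pattern "dog" and input sentence. The class name ReplaceDemo and the helpers replaceAllLiterally and nextMatchFrom are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceDemo {
    static final Pattern DOG = Pattern.compile("dog");
    static final String INPUT = "The dog says meow. All dogs say meow."; // hypothetical input

    // Replaces every match with text taken literally ('$' and '\' lose their special meaning)
    static String replaceAllLiterally(String input, String text) {
        return DOG.matcher(input).replaceAll(Matcher.quoteReplacement(text));
    }

    // Start index of the first match at or after from, or -1 if there is none
    static int nextMatchFrom(String input, int from) {
        Matcher m = DOG.matcher(input);
        return m.find(from) ? m.start() : -1; // find(int) resets the matcher first
    }

    public static void main(String[] args) {
        Matcher m = DOG.matcher(INPUT);
        // replaceAll and replaceFirst both reset the matcher before scanning
        System.out.println(m.replaceAll("cat"));   // The cat says meow. All cats say meow.
        System.out.println(m.replaceFirst("cat")); // The cat says meow. All dogs say meow.
        // Unquoted, "$5" would be read as a reference to group 5 and fail
        System.out.println(replaceAllLiterally(INPUT, "$5")); // The $5 says meow. All $5s say meow.
        System.out.println(nextMatchFrom(INPUT, 5)); // 23
    }
}
```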
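import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    public static void main(String[] args) {
        // \b = word boundary; input string assumed, consistent with the output below
        Matcher m = Pattern.compile("\\bcat\\b").matcher("cat cat cat cattie cat");
        int count = 0;
        while (m.find()) {
            count++;
            System.out.println("Match number " + count);
            System.out.println("start(): " + m.start());
            System.out.println("end(): " + m.end());
        }
    }
}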
Match number 1
start(): 0
end(): 3
Match number 2
start(): 4
end(): 7
Match number 3
start(): 8
end(): 11
Match number 4
start(): 19
end(): 22
You can see that this example uses word boundaries to ensure that the letters "c", "a", "t" are not merely a substring of a longer word. It also shows where in the input string each match occurred.
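To see what \b contributes, this sketch counts matches of cat with and without word boundaries on a hypothetical input in which "cat" also appears inside a longer word. The class name WordBoundaryDemo and the helper count are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Counts the non-overlapping matches of regex in input
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String input = "cat concatenate cat"; // hypothetical input
        // with \b the "cat" inside "concatenate" is skipped
        System.out.println("with \\b:    " + count("\\bcat\\b", input)); // 2
        System.out.println("without \\b: " + count("cat", input));      // 3
    }
}
```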
The start method returns the start index of the subsequence captured by the given group during the previous match operation, and end returns the index of the last character matched, plus one. Called without an argument, both methods refer to the entire match (group 0). Both indices are counted from the beginning of the input string. Here is the example explaining the functionality:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
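A minimal sketch of start(int group) and end(int group) applied to capturing groups, assuming a hypothetical key=value pattern and input string. The class name GroupOffsetsDemo and the helper offsets are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupOffsetsDemo {
    // Returns {start(group), end(group)} for the first match, or null if there is no match
    static int[] offsets(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? new int[]{m.start(group), m.end(group)} : null;
    }

    public static void main(String[] args) {
        String input = "id: name=alice"; // hypothetical input
        Matcher m = Pattern.compile("(\\w+)=(\\w+)").matcher(input);
        if (m.find()) {
            for (int g = 0; g <= m.groupCount(); g++) {
                // start(g) is inclusive and end(g) exclusive,
                // so input.substring(m.start(g), m.end(g)) equals m.group(g)
                System.out.println("group " + g + ": \"" + m.group(g) + "\" ["
                        + m.start(g) + ", " + m.end(g) + ")");
            }
        }
    }
}
```

The offsets are positions in the whole input: group 1 ("name") starts at 4, not 0.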
-foo-foo-foo-
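One way to produce a result of this shape is the appendReplacement/appendTail loop referred to in the quoteReplacement description. The sketch assumes the hypothetical pattern a*b, input "aabfooaabfooabfoob", and replacement "-"; the class name AppendReplaceDemo and the helper replaceEach are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppendReplaceDemo {
    // Replaces each match of regex with replacement using appendReplacement/appendTail
    static String replaceEach(String regex, String input, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // appends the text between the previous match and this one, then the replacement
            m.appendReplacement(sb, replacement);
        }
        m.appendTail(sb); // appends whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        // a*b matches "aab", "aab", "ab" and "b"; each becomes "-"
        System.out.println(replaceEach("a*b", "aabfooaabfooabfoob", "-")); // -foo-foo-foo-
    }
}
```

The loop form is useful when each replacement must be computed per match; for a fixed replacement, replaceAll gives the same result.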
public String getMessage() (PatternSyntaxException): Returns a multi-line string containing the description of the syntax error and its index, the erroneous regular expression pattern, and a visual indication of the error index within the pattern.
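A minimal sketch of catching PatternSyntaxException and reading its parts, assuming hypothetical malformed patterns. The class name SyntaxErrorDemo and the helper errorIndex are illustrative; getDescription, getIndex, getPattern, and getMessage are the exception's standard accessors.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class SyntaxErrorDemo {
    // Returns the error index reported for an invalid regex, or -1 if it compiles
    static int errorIndex(String regex) {
        try {
            Pattern.compile(regex);
            return -1;
        } catch (PatternSyntaxException e) {
            return e.getIndex();
        }
    }

    public static void main(String[] args) {
        try {
            Pattern.compile("(unclosed"); // hypothetical malformed pattern: missing ')'
        } catch (PatternSyntaxException e) {
            System.out.println("Description: " + e.getDescription());
            System.out.println("Index: " + e.getIndex());
            System.out.println("Pattern: " + e.getPattern());
            // the multi-line form described above, with a marker under the error index
            System.out.println(e.getMessage());
        }
    }
}
```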