Java - Regular Expressions: Capturing Groups

java regular expressions

Uploaded by

rafael

JAVA - REGULAR EXPRESSIONS

http://www.tutorialspoint.com/java/java_regular_expressions.htm Copyright © tutorialspoint.com

Java provides the java.util.regex package for pattern matching with regular expressions. Java regular
expressions are very similar to those of the Perl programming language and are easy to learn.

A regular expression is a special sequence of characters that helps you match or find other strings or
sets of strings, using a specialized syntax held in a pattern. Regular expressions can be used to search,
edit, or manipulate text and data.

The java.util.regex package primarily consists of the following three classes:

Pattern Class: A Pattern object is a compiled representation of a regular expression. The
Pattern class provides no public constructors. To create a pattern, you must first invoke one
of its public static compile methods, which will then return a Pattern object. These methods
accept a regular expression as the first argument.

Matcher Class: A Matcher object is the engine that interprets the pattern and performs
match operations against an input string. Like the Pattern class, Matcher defines no public
constructors. You obtain a Matcher object by invoking the matcher method on a Pattern
object.

PatternSyntaxException: A PatternSyntaxException object is an unchecked exception
that indicates a syntax error in a regular expression pattern.
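The way the three classes fit together can be sketched as follows. This is a minimal illustration rather than a prescribed usage; the class name RegexClassesDemo and the helper containsMatch are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexClassesDemo {
    // Hypothetical helper: true if the regex compiles and occurs somewhere in the input.
    static boolean containsMatch(String regex, String input) {
        try {
            // Pattern has no public constructor; use the static compile method.
            Pattern pattern = Pattern.compile(regex);
            // A Matcher is obtained from the Pattern, not constructed directly.
            Matcher matcher = pattern.matcher(input);
            return matcher.find();
        } catch (PatternSyntaxException e) {
            // Unchecked exception raised by compile for a malformed pattern.
            System.out.println("Bad pattern: " + e.getDescription());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(containsMatch("\\d+", "order 42"));
        System.out.println(containsMatch("(unclosed", "order 42"));
    }
}
```

The second call passes a pattern with an unclosed parenthesis, so compile throws and the helper reports the problem instead of matching.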

Capturing Groups:

Capturing groups are a way to treat multiple characters as a single unit. They are created by placing
the characters to be grouped inside a set of parentheses. For example, the regular expression (dog)
creates a single group containing the letters "d", "o", and "g".

Capturing groups are numbered by counting their opening parentheses from left to right. In the
expression ((A)(B(C))), for example, there are four such groups:

1 ((A)(B(C)))

2 (A)

3 (B(C))

4 (C)

To find out how many groups are present in the expression, call the groupCount method on a
matcher object. The groupCount method returns an int showing the number of capturing groups
present in the matcher's pattern.

There is also a special group, group 0, which always represents the entire expression. This group is
not included in the total reported by groupCount.
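The numbering rule can be checked with a short sketch. The class and method names below (GroupNumberingDemo, countGroups, groups) are hypothetical and exist only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // Number of capturing groups in the pattern; group 0 is not counted.
    static int countGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Text of group 0 and every numbered group, or an empty array if the input does not match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("groupCount: " + countGroups("((A)(B(C)))"));
        String[] g = groups("((A)(B(C)))", "ABC");
        for (int i = 0; i < g.length; i++) {
            System.out.println("group " + i + ": " + g[i]);
        }
    }
}
```

For the input "ABC", groups 0 and 1 both hold "ABC", group 2 holds "A", group 3 holds "BC", and group 4 holds "C", which follows the left-to-right order of the opening parentheses.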

Example:
The following example illustrates how to find a digit string in a given alphanumeric string:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    public static void main( String args[] ){

        // String to be scanned to find the pattern.
        String line = "This order was placed for QT3000! OK?";
        String pattern = "(.*)(\\d+)(.*)";

        // Create a Pattern object
        Pattern r = Pattern.compile(pattern);

        // Now create matcher object.
        Matcher m = r.matcher(line);
        if (m.find( )) {
            System.out.println("Found value: " + m.group(0) );
            System.out.println("Found value: " + m.group(1) );
            System.out.println("Found value: " + m.group(2) );
        } else {
            System.out.println("NO MATCH");
        }
    }
}

This would produce the following result:

Found value: This order was placed for QT3000! OK?
Found value: This order was placed for QT300
Found value: 0
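Group 2 holds only "0" because the first (.*) is greedy: it consumes as much as it can and leaves \d+ the single digit it needs. One possible variation, shown here as a sketch and not as part of the original example, makes the first group reluctant with (.*?) so that \d+ captures the whole number. The names ReluctantGroupDemo and firstNumber are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantGroupDemo {
    // Returns the first run of digits in the line, or null if there is none.
    static String firstNumber(String line) {
        // (.*?) is reluctant: it matches as little as possible before \d+.
        Matcher m = Pattern.compile("(.*?)(\\d+)(.*)").matcher(line);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println("Found value: "
            + firstNumber("This order was placed for QT3000! OK?"));
    }
}
```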

Regular Expression Syntax:

Here is a table listing the regular expression metacharacter syntax available in Java:

Subexpression   Matches

^               Matches the beginning of a line. Without the MULTILINE flag, only the
                beginning of the input counts.

$               Matches the end of a line. Without the MULTILINE flag, only the end of the
                input counts (allowing for a final line terminator).

.               Matches any single character except a line terminator. The DOTALL flag,
                (?s), allows it to match line terminators as well.

[...]           Matches any single character in brackets.

[^...]          Matches any single character not in brackets.

\A              Matches the beginning of the entire string.

\z              Matches the end of the entire string.

\Z              Matches the end of the entire string, except for an allowable final line
                terminator. If a final newline exists, it matches just before the newline.

re*             Matches 0 or more occurrences of the preceding expression.

re+             Matches 1 or more occurrences of the preceding expression.

re?             Matches 0 or 1 occurrence of the preceding expression.

re{n}           Matches exactly n occurrences of the preceding expression.

re{n,}          Matches n or more occurrences of the preceding expression.

re{n,m}         Matches at least n and at most m occurrences of the preceding expression.

a|b             Matches either a or b.

(re)            Groups regular expressions and remembers the matched text.

(?:re)          Groups regular expressions without remembering the matched text.

(?>re)          Matches an independent pattern without backtracking.

\w              Matches word characters.

\W              Matches nonword characters.

\s              Matches whitespace. Equivalent to [ \t\n\x0B\f\r].

\S              Matches nonwhitespace.

\d              Matches digits. Equivalent to [0-9].

\D              Matches nondigits.

\G              Matches the point where the last match finished.

\n              Back-reference to capture group number "n" (for example, \1).

\b              Matches word boundaries. Not valid inside brackets in Java (Perl treats \b
                there as a backspace, 0x08).

\B              Matches nonword boundaries.

\n, \t, etc.    Matches newlines, carriage returns, tabs, etc.

\Q              Escapes (quotes) all characters up to \E.

\E              Ends quoting begun with \Q.
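A few of the constructs from the table can be tried out with a short sketch. The class SyntaxTableDemo and its methods are hypothetical names for this illustration, and the patterns are only examples.

```java
import java.util.regex.Pattern;

class SyntaxTableDemo {
    // \1 refers back to whatever group 1 captured, so this detects a doubled word.
    static boolean hasRepeatedWord(String text) {
        return Pattern.compile("\\b(\\w+)\\s+\\1\\b").matcher(text).find();
    }

    // (?:...) groups without capturing, so only (\d+) counts as a capturing group.
    static int capturingGroups() {
        return Pattern.compile("(?:ab)+(\\d+)").matcher("").groupCount();
    }

    // \Q...\E quotes metacharacters, so + and . are matched literally.
    static boolean matchesLiterally(String input) {
        return Pattern.compile("\\Q1+1=2.\\E").matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasRepeatedWord("this is is fine"));
        System.out.println(capturingGroups());
        System.out.println(matchesLiterally("1+1=2."));
    }
}
```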

Methods of the Matcher Class:

Here is a list of useful instance methods:

Index Methods:

Index methods provide useful index values that show precisely where the match was found in the
input string:

SN  Methods with Description

1   public int start()
    Returns the start index of the previous match.

2   public int start(int group)
    Returns the start index of the subsequence captured by the given group during the previous
    match operation.

3   public int end()
    Returns the offset after the last character matched.

4   public int end(int group)
    Returns the offset after the last character of the subsequence captured by the given group
    during the previous match operation.
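The group-specific variants can be sketched as follows. The helper groupSpan and the class GroupIndexDemo are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIndexDemo {
    // Returns {start, end} of the given group in the first match, or null if nothing matches.
    static int[] groupSpan(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        // end(group) is the offset just past the group's last character.
        return new int[] { m.start(group), m.end(group) };
    }

    public static void main(String[] args) {
        String input = "key=value";
        for (int group = 0; group <= 2; group++) {
            int[] span = groupSpan("(\\w+)=(\\w+)", input, group);
            System.out.println("group " + group + ": " + span[0] + " to " + span[1]
                + " -> " + input.substring(span[0], span[1]));
        }
    }
}
```

Because end is exclusive, the pair can be passed directly to String.substring to recover the captured text.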

Study Methods:

Study methods review the input string and return a boolean indicating whether or not the pattern is
found:

SN  Methods with Description

1   public boolean lookingAt()
    Attempts to match the input sequence, starting at the beginning of the region, against the
    pattern.

2   public boolean find()
    Attempts to find the next subsequence of the input sequence that matches the pattern.

3   public boolean find(int start)
    Resets this matcher and then attempts to find the next subsequence of the input
    sequence that matches the pattern, starting at the specified index.

4   public boolean matches()
    Attempts to match the entire region against the pattern.
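The find(int start) variant is the least obvious of the four, so here is a small sketch of it. The class FindFromDemo and the helper nextMatchStart are hypothetical names for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindFromDemo {
    // Start index of the first match at or after 'from', or -1 if there is none.
    static int nextMatchStart(String regex, String input, int from) {
        Matcher m = Pattern.compile(regex).matcher(input);
        // find(int) resets the matcher and begins the search at the given index.
        return m.find(from) ? m.start() : -1;
    }

    public static void main(String[] args) {
        String input = "cat cat cat";
        System.out.println(nextMatchStart("cat", input, 0));
        System.out.println(nextMatchStart("cat", input, 1));
        System.out.println(nextMatchStart("cat", input, 9));
    }
}
```

Note that find(int) throws an IndexOutOfBoundsException if the index is negative or greater than the length of the input, so callers may want to validate it first.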

Replacement Methods:

Replacement methods are useful methods for replacing text in an input string:

SN  Methods with Description

1   public Matcher appendReplacement(StringBuffer sb, String replacement)
    Implements a non-terminal append-and-replace step.

2   public StringBuffer appendTail(StringBuffer sb)
    Implements a terminal append-and-replace step.

3   public String replaceAll(String replacement)
    Replaces every subsequence of the input sequence that matches the pattern with the
    given replacement string.

4   public String replaceFirst(String replacement)
    Replaces the first subsequence of the input sequence that matches the pattern with the
    given replacement string.

5   public static String quoteReplacement(String s)
    Returns a literal replacement String for the specified String. This method produces a String
    that will work as a literal replacement in the appendReplacement method of the Matcher
    class.
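quoteReplacement matters when the replacement text contains "$" or "\", which replacement strings otherwise interpret as group references and escapes. The sketch below shows one way to use it; the class QuoteReplacementDemo and the helper replaceLiterally are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteReplacementDemo {
    // Replaces every match of regex with the replacement text taken literally.
    static String replaceLiterally(String input, String regex, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        // Without quoting, "$5" would be read as a reference to group 5.
        return m.replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceLiterally("Total: AMOUNT", "AMOUNT", "$5.00"));
    }
}
```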

The start and end Methods:

The following example counts the number of times the word "cat" appears in the input
string:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    private static final String REGEX = "\\bcat\\b";
    private static final String INPUT =
                                    "cat cat cat cattie cat";

    public static void main( String args[] ){
        Pattern p = Pattern.compile(REGEX);
        Matcher m = p.matcher(INPUT); // get a matcher object
        int count = 0;

        while(m.find()) {
            count++;
            System.out.println("Match number "+count);
            System.out.println("start(): "+m.start());
            System.out.println("end(): "+m.end());
        }
    }
}

This would produce the following result:

Match number 1
start(): 0
end(): 3
Match number 2
start(): 4
end(): 7
Match number 3
start(): 8
end(): 11
Match number 4
start(): 19
end(): 22

You can see that this example uses word boundaries to ensure that the letters "c" "a" "t" are not
merely a substring in a longer word. It also gives some useful information about where in the input
string the match has occurred.

The start method returns the start index of the previous match, and end returns the index of the
last character matched, plus one.

The matches and lookingAt Methods:

The matches and lookingAt methods both attempt to match an input sequence against a pattern.
The difference, however, is that matches requires the entire input sequence to be matched, while
lookingAt does not.

Both methods always start at the beginning of the input string. Here is an example explaining the
functionality:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    private static final String REGEX = "foo";
    private static final String INPUT = "fooooooooooooooooo";
    private static Pattern pattern;
    private static Matcher matcher;

    public static void main( String args[] ){
        pattern = Pattern.compile(REGEX);
        matcher = pattern.matcher(INPUT);

        System.out.println("Current REGEX is: "+REGEX);
        System.out.println("Current INPUT is: "+INPUT);
        System.out.println("lookingAt(): "+matcher.lookingAt());
        System.out.println("matches(): "+matcher.matches());
    }
}

This would produce the following result:

Current REGEX is: foo
Current INPUT is: fooooooooooooooooo
lookingAt(): true
matches(): false

The replaceFirst and replaceAll Methods:

The replaceFirst and replaceAll methods replace text that matches a given regular expression. As
their names indicate, replaceFirst replaces the first occurrence, and replaceAll replaces all
occurrences.

Here is an example using replaceAll:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    private static String REGEX = "dog";
    private static String INPUT = "The dog says meow. " +
                                  "All dogs say meow.";
    private static String REPLACE = "cat";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        // get a matcher object
        Matcher m = p.matcher(INPUT);
        INPUT = m.replaceAll(REPLACE);
        System.out.println(INPUT);
    }
}

This would produce the following result:

The cat says meow. All cats say meow.
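For comparison, a sketch of the same replacement with replaceFirst, which changes only the first occurrence. The class ReplaceFirstDemo and the helper replaceFirstDog are hypothetical names for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceFirstDemo {
    // Replaces only the first occurrence of "dog" in the input.
    static String replaceFirstDog(String input, String replacement) {
        Matcher m = Pattern.compile("dog").matcher(input);
        return m.replaceFirst(replacement);
    }

    public static void main(String[] args) {
        System.out.println(
            replaceFirstDog("The dog says meow. All dogs say meow.", "cat"));
    }
}
```

With the same input as above, only the first sentence changes: "The cat says meow. All dogs say meow."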

The appendReplacement and appendTail Methods:

The Matcher class also provides appendReplacement and appendTail methods for text replacement.

Here is an example explaining the functionality:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    private static String REGEX = "a*b";
    private static String INPUT = "aabfooaabfooabfoob";
    private static String REPLACE = "-";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        // get a matcher object
        Matcher m = p.matcher(INPUT);
        StringBuffer sb = new StringBuffer();
        while(m.find()){
            m.appendReplacement(sb,REPLACE);
        }
        m.appendTail(sb);
        System.out.println(sb.toString());
    }
}

This would produce the following result:

-foo-foo-foo-
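The append-and-replace loop becomes more useful when each replacement is computed from the match itself, which replaceAll with a fixed string cannot do. The following sketch doubles every number in a string; the class ComputedReplacementDemo and the helper doubleNumbers are hypothetical names for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ComputedReplacementDemo {
    // Replaces each run of digits with twice its numeric value.
    static String doubleNumbers(String input) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int doubled = Integer.parseInt(m.group()) * 2;
            // Appends the text between matches, then the computed replacement.
            m.appendReplacement(sb, Integer.toString(doubled));
        }
        // Appends whatever follows the last match.
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("a1b22c"));
    }
}
```

The sketch assumes the numbers fit in an int; Integer.parseInt throws a NumberFormatException for longer digit runs.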

PatternSyntaxException Class Methods:

A PatternSyntaxException is an unchecked exception that indicates a syntax error in a regular
expression pattern. The PatternSyntaxException class provides the following methods to help you
determine what went wrong:

SN  Methods with Description

1   public String getDescription()
    Retrieves the description of the error.

2   public int getIndex()
    Retrieves the error index.

3   public String getPattern()
    Retrieves the erroneous regular expression pattern.

4   public String getMessage()
    Returns a multi-line string containing the description of the syntax error and its index, the
    erroneous regular expression pattern, and a visual indication of the error index within the
    pattern.
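These methods can be exercised by compiling a deliberately malformed pattern, as in the sketch below. The class PatternErrorDemo and the helper tryCompile are hypothetical names, and the exact wording of the description and message is left to the runtime.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class PatternErrorDemo {
    // Returns the exception raised while compiling the regex, or null if it compiles.
    static PatternSyntaxException tryCompile(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        // "[a-z" is missing its closing bracket.
        PatternSyntaxException e = tryCompile("[a-z");
        if (e != null) {
            System.out.println("Description: " + e.getDescription());
            System.out.println("Index: " + e.getIndex());
            System.out.println("Pattern: " + e.getPattern());
            System.out.println("Message: " + e.getMessage());
        }
    }
}
```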
