Java and Regular Expressions
This tutorial describes the usage of regular expressions with Java. It also provides several Java regular expression examples.
Table of Contents
1. Regular Expressions
    1.1. Overview
    1.2. Usage
    1.3. Junit
2. Regular Expressions
    2.1. Common matching symbols
    2.2. Metacharacters
    2.3. Quantifier
3. Using Regular Expressions with String.matches()
    3.1. Overview
    3.2. Examples
4. Pattern and Matcher
5. Java Regex Examples
    5.1. Or
    5.2. Phone number
    5.3. Check for a certain number range
    5.4. Building a link checker
6. Thank you
7. Questions and Discussion
8. Links and Literature
1. Regular Expressions
1.1. Overview
A regular expression defines a search pattern for strings. For a given string, this pattern may match once, several times, or not at all. The abbreviation for regular expression is "regex". The simplest regular expression is a literal string: the regex "Hello World" matches exactly the phrase "Hello World". Another example is "." (dot), which matches any single character; it would match, for example, "a", "z" or "1".
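The two patterns mentioned above can be tried out with String.matches(), which requires the pattern to match the whole string. The following is only a minimal sketch; the class and method names are hypothetical and not part of the tutorial's projects.

```java
// Minimal sketch: a literal pattern and the dot metacharacter.
// BasicMatchDemo and its method names are illustrative assumptions.
class BasicMatchDemo {

    // Returns true if the whole input equals the literal phrase "Hello World".
    static boolean isHelloWorld(String input) {
        return input.matches("Hello World");
    }

    // Returns true if the input is exactly one character
    // (by default the dot does not match line terminators).
    static boolean isSingleCharacter(String input) {
        return input.matches(".");
    }

    public static void main(String[] args) {
        System.out.println(isHelloWorld("Hello World")); // true
        System.out.println(isHelloWorld("Hello"));       // false
        System.out.println(isSingleCharacter("z"));      // true
        System.out.println(isSingleCharacter("zz"));     // false
    }
}
```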
1.2. Usage
Regular expressions can be used to search, edit and manipulate text. Regular expressions are supported by several programming languages, e.g. Java, but also Perl, Groovy, etc. Unfortunately, each language or program supports regex slightly differently. The pattern defined by the regular expression is applied to the string from left to right. Once a source character has been used in a match, it cannot be reused. For example, the regex "aba" will match "ababababa" only two times (aba_aba__).
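The "aba" example can be checked by counting how often Matcher.find() succeeds, since each call continues after the end of the previous match. This is a small sketch under that assumption; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch: counts non-overlapping matches from left to right.
// NonOverlappingDemo and countMatches are illustrative names.
class NonOverlappingDemo {

    // Each successful find() resumes after the previous match,
    // so characters that were already consumed are not reused.
    static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Matches at index 0-3 and 4-7; the trailing "ba" is left over.
        System.out.println(countMatches("aba", "ababababa")); // 2
    }
}
```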
1.3. Junit
Some of the following examples use JUnit to validate the result. You should be able to adjust them if you don't want to use JUnit. To learn about JUnit, please see the JUnit Tutorial.
2. Regular Expressions
The following is an overview of regular expressions. This chapter is intended as a reference for the different regex elements.
2.2. Metacharacters
The following metacharacters have a pre-defined meaning and make certain common patterns easier to use, e.g. \d instead of [0-9].

Table 2.
Regular Expression   Description
\d                   Any digit, short for [0-9]
\D                   A non-digit, short for [^0-9]
\s                   A whitespace character, short for [ \t\n\x0b\r\f]
\S                   A non-whitespace character, short for [^\s]
\w                   A word character, short for [a-zA-Z_0-9]
\W                   A non-word character, short for [^\w]
\S+                  Several non-whitespace characters
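A few of these metacharacters are sketched below. Note that in Java source code the backslash must itself be escaped inside a string literal, so \d is written as "\\d". The class and method names are hypothetical and only serve this illustration.

```java
// Minimal sketch of \d, \s and \w in Java string literals.
// MetacharacterDemo and its methods are illustrative assumptions.
class MetacharacterDemo {

    // True if the input consists only of digits (at least one).
    static boolean isDigits(String s) {
        return s.matches("\\d+");
    }

    // True if the input consists only of word characters (at least one).
    static boolean isWord(String s) {
        return s.matches("\\w+");
    }

    // Replaces every run of whitespace with a single space.
    static String collapseWhitespace(String s) {
        return s.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(isDigits("2024"));                  // true
        System.out.println(isDigits("20a4"));                  // false
        System.out.println(isWord("user_name1"));              // true
        System.out.println(isWord("user-name"));               // false
        System.out.println(collapseWhitespace("a \t b\n c"));  // a b c
    }
}
```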
2.3. Quantifier
A quantifier defines how often an element can occur. The symbols ?, *, + and {} define the quantity of the regular expressions.

Table 3.
Regular Expression   Description                                                    Examples
*                    Occurs zero or more times, is short for {0,}                   X* - Finds no or several letter X, .* - any character sequence
+                    Occurs one or more times, is short for {1,}                    X+ - Finds one or several letter X
?                    Occurs no or one times, ? is short for {0,1}                   X? - Finds no or exactly one letter X
{X}                  Occurs X number of times, {} specifies the repetition          \d{3} - Three digits, .{10} - any character sequence of length 10
                     count of the preceding element
{X,Y}                Occurs between X and Y times                                   \d{1,4} - \d must occur at least once and at a maximum of four times
*?                   ? after a quantifier makes it a "reluctant quantifier",
                     it tries to find the smallest match
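The difference between a normal (greedy) quantifier and a reluctant one is easiest to see on a small input. The following sketch returns the first match of a pattern; the class name, the method name and the sample strings are assumptions chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch comparing greedy and reluctant quantifiers.
// QuantifierDemo and firstMatch are illustrative names.
class QuantifierDemo {

    // Returns the first substring matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b>";
        // Greedy: .* takes as much as possible.
        System.out.println(firstMatch("<.*>", html));          // <b>bold</b>
        // Reluctant: .*? takes as little as possible.
        System.out.println(firstMatch("<.*?>", html));         // <b>
        // {1,4}: at least one and at most four digits.
        System.out.println(firstMatch("\\d{1,4}", "id 123456")); // 1234
        // {3}: exactly three digits are required.
        System.out.println(firstMatch("\\d{3}", "12"));        // null
    }
}
```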
3. Using Regular Expressions with String.matches()
3.1. Overview
package de.vogella.regex.test;

// The class declaration is missing in the source; the name RegexTestStrings is assumed.
public class RegexTestStrings {
    public static final String EXAMPLE_TEST = "This is my small example string which I'm going to use for pattern matching.";

    public static void main(String[] args) {
        // True, because the string starts with a word character
        System.out.println(EXAMPLE_TEST.matches("\\w.*"));
        String[] splitString = EXAMPLE_TEST.split("\\s+");
        System.out.println(splitString.length); // Should be 14
        for (String string : splitString) {
            System.out.println(string);
        }
        // Replace all whitespace with tabs
        System.out.println(EXAMPLE_TEST.replaceAll("\\s+", "\t"));
    }
}
3.2. Examples
For the following example, create the Java project "de.vogella.regex.string". The following class gives several examples of using regular expressions with strings. See the comments for the purpose of each method.
package de.vogella.regex.string;

public class StringMatcher {
    // Returns true if the string matches exactly "true"
    public boolean isTrue(String s) {
        return s.matches("true");
    }

    // Returns true if the string matches exactly "true" or "True"
    public boolean isTrueVersion2(String s) {
        return s.matches("[tT]rue");
    }

    // Returns true if the string matches exactly "true" or "True"
    // or "yes" or "Yes"
    public boolean isTrueOrYes(String s) {
        return s.matches("[tT]rue|[yY]es");
    }

    // Returns true if the string contains "true"
    public boolean containsTrue(String s) {
        return s.matches(".*true.*");
    }

    // Returns true if the string consists of exactly three letters
    public boolean isThreeLetters(String s) {
        return s.matches("[a-zA-Z]{3}");
        // Shorter form of:
        // return s.matches("[a-zA-Z][a-zA-Z][a-zA-Z]");
    }

    // Returns true if the string does not have a digit at the beginning
    public boolean isNoNumberAtBeginning(String s) {
        return s.matches("^[^\\d].*");
    }

    // Returns true if the string consists of an arbitrary number of word characters except b
    public boolean isIntersection(String s) {
        return s.matches("([\\w&&[^b]])*");
    }

    // Returns true if the string contains a number less than 300
    public boolean isLessThenThreeHundret(String s) {
        return s.matches("[^0-9]*[12]?[0-9]{1,2}[^0-9]*");
    }
}
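package de.vogella.regex.string;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import org.junit.Before;
import org.junit.Test;

// Only part of this test class survives in the source. The class name, the
// setup method and the name of the first test method are assumptions.
public class StringMatcherTest {
    private StringMatcher m;

    @Before
    public void setup() {
        m = new StringMatcher();
    }

    @Test
    public void testIsNoNumberAtBeginning() {
        assertTrue(m.isNoNumberAtBeginning("asdfdsf"));
    }

    @Test
    public void testLessThenThreeHundret() {
        assertTrue(m.isLessThenThreeHundret("288"));
        assertFalse(m.isLessThenThreeHundret("3288"));
        assertFalse(m.isLessThenThreeHundret("328 8"));
        assertTrue(m.isLessThenThreeHundret("1"));
        assertTrue(m.isLessThenThreeHundret("99"));
        assertFalse(m.isLessThenThreeHundret("300"));
    }
}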
4. Pattern and Matcher
For advanced regular expressions, the classes java.util.regex.Pattern and java.util.regex.Matcher are used. You first create a Pattern object which defines the regular expression. This Pattern object allows you to create a Matcher object for a given string. This Matcher object then allows you to do regex operations on the string.
package de.vogella.regex.test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTestPatternMatcher {
    public static final String EXAMPLE_TEST = "This is my small example string which I'm going to use for pattern matching.";

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("\\w+");
        // In case you would like to ignore case sensitivity you could use this
        // statement
        // Pattern pattern = Pattern.compile("\\s+", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(EXAMPLE_TEST);
        // Check all occurrences
        while (matcher.find()) {
            System.out.print("Start index: " + matcher.start());
            System.out.print(" End index: " + matcher.end() + " ");
            System.out.println(matcher.group());
        }
        // Now create a new pattern and matcher to replace whitespace with tabs
        Pattern replace = Pattern.compile("\\s+");
        Matcher matcher2 = replace.matcher(EXAMPLE_TEST);
        System.out.println(matcher2.replaceAll("\t"));
    }
}
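One point that is easy to miss when moving from String.matches() to Matcher is that Matcher.matches() requires the whole input to match, while Matcher.find() looks for the pattern anywhere in the input. The following is a minimal sketch of that difference; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Minimal sketch contrasting Matcher.matches() and Matcher.find().
// MatchesVersusFindDemo and its methods are illustrative names.
class MatchesVersusFindDemo {

    // True only if the entire input matches the pattern.
    static boolean wholeInputMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if the pattern occurs somewhere in the input.
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "it is true";
        System.out.println(wholeInputMatches("true", input));      // false
        System.out.println(containsMatch("true", input));          // true
        // Surrounding the pattern with .* makes a whole-input match succeed.
        System.out.println(wholeInputMatches(".*true.*", input));  // true
    }
}
```

This is why the containsTrue() method in the StringMatcher class wraps its pattern in ".*": String.matches() behaves like the whole-input check.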
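5. Java Regex Examples
5.1. Or
package de.vogella.regex.eitheror;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

// The class and method declarations are missing in the source; the names are assumed.
public class EitherOrCheck {
    @Test
    public void testSimpleTrue() {
        String s = "humbapumpa jim";
        assertTrue(s.matches(".*(jim|joe).*"));
        s = "humbapumpa jom";
        assertFalse(s.matches(".*(jim|joe).*"));
        s = "humbaPumpa joe";
        assertTrue(s.matches(".*(jim|joe).*"));
        s = "humbapumpa joe jim";
        assertTrue(s.matches(".*(jim|joe).*"));
    }
}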
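5.2. Phone number
package de.vogella.regex.phonenumber;

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

// The class declaration is missing in the source; the name CheckPhone is assumed.
public class CheckPhone {
    @Test
    public void testSimpleTrue() {
        // Three digits, an optional comma or whitespace character, then four digits
        String pattern = "\\d\\d\\d([,\\s])?\\d\\d\\d\\d";
        String s = "1233323322";
        assertFalse(s.matches(pattern));
        s = "1233323";
        assertTrue(s.matches(pattern));
        s = "123 3323";
        assertTrue(s.matches(pattern));
    }
}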
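5.3. Check for a certain number range
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.junit.Test;

// The class declaration is missing in the source; the name CheckNumber is assumed.
public class CheckNumber {
    @Test
    public void testSimpleTrue() {
        String s = "1233";
        assertTrue(test(s));
        s = "0";
        assertFalse(test(s));
        s = "29 Kasdkf 2300 Kdsdf";
        assertTrue(test(s));
        s = "99900234";
        assertTrue(test(s));
    }

    // Returns true if the string contains at least three consecutive digits
    public static boolean test(String s) {
        Pattern pattern = Pattern.compile("\\d{3}");
        Matcher matcher = pattern.matcher(s);
        return matcher.find();
    }
}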
5.4. Building a link checker
package de.vogella.regex.weblinks;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

    private Pattern htmltag;
    private Pattern link;
    private final String root;

    public List<String> getLinks(String url) {
        List<String> links = new ArrayList<String>();
        try {
            BufferedReader bufferedReader = new BufferedReader(
                    new InputStreamReader(new URL(url).openStream()));
            String s;
            StringBuilder builder = new StringBuilder();
            while ((s = bufferedReader.readLine()) != null) {
                builder.append(s);
            }

                matcher.find();
                String link = matcher.group().replaceFirst("href=\"", "")
                        .replaceFirst("\">", "");
                if (valid(link)) {
                    links.add(makeAbsolute(url, link));
                }
            }
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {