Friday, 27 January 2017

Regular Expressions in Java

Regular expressions are a language of string patterns built into most modern programming languages, including Java from version 1.4 onward. They can be used for searching, extracting, and modifying text. Java provides the java.util.regex package for pattern matching with regular expressions. Although the syntax accepted by this package is similar to that of the Perl programming language, knowledge of Perl is not a prerequisite.
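
As a rough sketch of the three uses named above (searching, extracting, and modifying), the snippet below relies only on java.util.regex. The class name RegexUses, its method names, and the sample text are illustrative assumptions and not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexUses {
    // One or more consecutive digits.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Searching: does the text contain at least one run of digits?
    static boolean containsNumber(String text) {
        return DIGITS.matcher(text).find();
    }

    // Extracting: collect every run of digits, in order of appearance.
    static List<String> extractNumbers(String text) {
        List<String> numbers = new ArrayList<>();
        Matcher matcher = DIGITS.matcher(text);
        while (matcher.find()) {
            numbers.add(matcher.group());
        }
        return numbers;
    }

    // Modifying: replace every run of digits with a single '#'.
    static String maskNumbers(String text) {
        return DIGITS.matcher(text).replaceAll("#");
    }

    public static void main(String[] args) {
        String text = "Order 42 shipped in 3 boxes";
        System.out.println(containsNumber(text));   // true
        System.out.println(extractNumbers(text));   // [42, 3]
        System.out.println(maskNumbers(text));      // Order # shipped in # boxes
    }
}
```

Compiling the pattern once and reusing it is a design choice of this sketch; calling Pattern.compile inside each method would also work.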

Regular expressions, by definition, are string patterns that describe text. These descriptions can be applied in many different ways. The basic language constructs include character classes, quantifiers, and meta-characters.

1. Character Classes - 
Character classes define the content of the pattern, i.e. what the pattern should look for.

(a) .     - Dot, any character (matches line terminators only when the DOTALL flag is set)
(b) \d   - A digit [0-9]
(c) \D  - A non-digit [^0-9]
(d) \s   - A whitespace character [ \t\n\x0B\f\r]
(e) \S  - A non-white space character [^\s]
(f) \w  - A word character [a-zA-Z_0-9]
(g) \W - A non-word character [^\w]
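
The predefined classes above can be tried one character at a time. The following is a minimal sketch; the class name CharacterClassDemo and its helper methods are made up for illustration, while Pattern.matches and Pattern.DOTALL come from java.util.regex.

```java
import java.util.regex.Pattern;

class CharacterClassDemo {
    // True when the whole input matches the regex.
    static boolean matchesAll(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True when '.' matches the whole input with the DOTALL flag set,
    // which allows the dot to match line terminators as well.
    static boolean dotMatchesWithDotAll(String input) {
        return Pattern.compile(".", Pattern.DOTALL).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesAll("\\d", "7"));      // true:  '7' is a digit
        System.out.println(matchesAll("\\D", "7"));      // false: '7' is not a non-digit
        System.out.println(matchesAll("\\s", "\t"));     // true:  tab is whitespace
        System.out.println(matchesAll("\\S", "x"));      // true:  'x' is not whitespace
        System.out.println(matchesAll("\\w", "_"));      // true:  underscore is a word character
        System.out.println(matchesAll("\\W", "-"));      // true:  hyphen is not a word character
        System.out.println(matchesAll(".", "\n"));       // false: dot skips line terminators by default
        System.out.println(dotMatchesWithDotAll("\n"));  // true:  DOTALL changes that
    }
}
```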

2. Quantifiers -
Quantifiers specify how many times a part of a pattern should match or repeat. A quantifier binds to the character, character class, or group immediately to its left.

(a) *   - Match 0 or more occurrences
(b) +   - Match 1 or more occurrences
(c) ?   - Match 0 or 1 occurrences
(d) {n} - Match exactly n occurrences
(e) {n,} - Match at least n occurrences
(f) {n,m} - Match at least n occurrences but not more than m
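
A short sketch of the quantifiers listed above, including how a quantifier binds only to the item on its immediate left. The class name QuantifierDemo and the helper matchesAll are assumptions made for this example. Note that the patterns are written without a space after the comma inside the braces.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // True when the whole input matches the regex.
    static boolean matchesAll(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matchesAll("a*", ""));             // true:  zero occurrences are allowed
        System.out.println(matchesAll("a+", ""));             // false: at least one 'a' is required
        System.out.println(matchesAll("ab?", "a"));           // true:  the 'b' is optional
        System.out.println(matchesAll("\\d{3}", "123"));      // true:  exactly three digits
        System.out.println(matchesAll("\\d{3}", "1234"));     // false: four digits is too many
        System.out.println(matchesAll("\\d{2,}", "12345"));   // true:  two or more digits
        System.out.println(matchesAll("\\d{2,4}", "12345"));  // false: more than four digits

        // The quantifier applies only to what is directly on its left.
        System.out.println(matchesAll("ab+", "abab"));        // false: '+' repeats only 'b'
        System.out.println(matchesAll("(ab)+", "abab"));      // true:  '+' repeats the group "ab"
    }
}
```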

3. Meta-Characters - 
Meta-characters are used to group, divide, and perform special operations in patterns.

(a) \   - Escape the next meta-character
(b) ^  - Match the beginning of the line
(c) .   - Match any character (except newline)
(d) $  - Match the end of the line (or before newline at the end)
(e) |   - Alternation
(f) ()  - Grouping
(g) []  - Custom character-class
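
The meta-characters above can be illustrated with a few small patterns. This is only a sketch: the class name MetaCharacterDemo, the helpers found and firstGroup, and the sample strings are assumptions chosen for the example. It uses find() so that a pattern may match anywhere in the input, which makes the effect of ^ and $ visible.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetaCharacterDemo {
    // True when the regex matches somewhere inside the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Returns the first capturing group when the whole input matches, otherwise null.
    static String firstGroup(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(found("3\\.14", "pi is 3.14"));          // true:  escaped dot matches a literal '.'
        System.out.println(found("3\\.14", "3x14"));                // false: 'x' is not a literal '.'
        System.out.println(found("^cat", "catalog"));               // true:  "cat" is at the beginning
        System.out.println(found("^cat", "bobcat"));                // false: "cat" is not at the beginning
        System.out.println(found("cat$", "bobcat"));                // true:  "cat" is at the end
        System.out.println(found("cat|dog", "hot dog"));            // true:  the second alternative matches
        System.out.println(found("[aeiou]", "rhythm"));             // false: no character from the custom class
        System.out.println(firstGroup("(\\d+)-\\d+", "555-1234"));  // 555:   text captured by the group
    }
}
```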

Now let's look at an example that matches strings and extracts grouped data using regex -


package com.anjan.regex;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDemo {

    // Matching String with regex.
    // matches() succeeds only when the whole value fits the pattern.
    public static void printRegex(String value, String regex) {
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(value);
        if (matcher.matches()) {
            System.out.println(value + " is matching with the \\" + regex);
        } else {
            System.out.println(value + " is not matching with the \\" + regex);
        }
    }

    // Grouping Data of String using regex.
    // group(1) returns the text captured by the first pair of parentheses.
    public static void printGroupData(String value, String regex) {
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(value);
        if (matcher.matches()) {
            String str = matcher.group(1);
            System.out.println("Group Data : " + str);
        } else {
            System.out.println(value + " is not matching with the \\" + regex + ", hence no group data");
        }
    }

    public static void main(String[] args) {
        printRegex("123456", "\\d+");
        printRegex("123456", "\\d{6}");
        printRegex("abABcdCD", "\\w+");
        printRegex("   \t", "\\s+");
        printGroupData("asdf2345hfjd", "\\w+(\\d{4})\\w+");
        printGroupData("123\"hhjhd12s\"123hfjds23hjs", ".*\"(.*)\".*");
    }
}


Output - 

123456 is matching with the \\d+
123456 is matching with the \\d{6}
abABcdCD is matching with the \\w+
    is matching with the \\s+
Group Data : 2345
Group Data : hhjhd12s
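
printGroupData relies on matches(), which succeeds only when the pattern covers the whole input. That is why both patterns pad the group with \w+ or .* on each side. As a hedged alternative sketch, find() can locate the group anywhere in the input, so the padding can be dropped. The class name FindGroupDemo and the method firstGroupAnywhere are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindGroupDemo {
    // Returns the first capturing group of the first match found anywhere
    // in the value, or null when the pattern does not occur at all.
    static String firstGroupAnywhere(String value, String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(value);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // Same inputs as the example above, with the padding removed from the patterns.
        System.out.println(firstGroupAnywhere("asdf2345hfjd", "(\\d{4})"));                    // 2345
        System.out.println(firstGroupAnywhere("123\"hhjhd12s\"123hfjds23hjs", "\"(.*)\""));    // hhjhd12s
    }
}
```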