Regular Expressions
RICHARD A. SANTOS
FORMAL RECURSIVE DEFINITION OF RE
• Regular expressions (REs) are shorthand notations that describe a language. They are used in programming languages
and language tools such as lex, the vi editor, PHP, and Perl.
• They are useful for representing certain sets of strings in an algebraic fashion.
• REs describe exactly the languages generated by Type 3 (regular) grammars, which are the languages accepted by
finite automata (FA).
• Any set represented by a regular expression is called a regular set.
• Any terminal symbol or element that belongs to an alphabet (Σ) is an RE. The null string (ε) and the null set (∅)
are also REs.
• REs were first proposed by the American mathematician Stephen Kleene to describe the algebra of regular sets.
Ken Thompson later used them in the early text editor QED and the UNIX editor ed.
• For instance:
In a regular expression, x* means zero or more occurrences of x. It generates {ε, x, xx, xxx, xxxx, …}
In a regular expression, x+ means one or more occurrences of x. It generates {x, xx, xxx, xxxx, …}
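The difference between x* and x+ can be checked mechanically. The sketch below uses Python's `re` module, which is an assumption of this example and not one of the tools named above; it shares the same postfix * and + operators. The list `candidates` is made up for illustration.

```python
import re

# Python's re module uses the same postfix repetition operators.
star = re.compile(r"x*")   # zero or more occurrences of x
plus = re.compile(r"x+")   # one or more occurrences of x

# Illustrative test strings; "" stands for the null string ε.
candidates = ["", "x", "xx", "xxx", "xy"]

# fullmatch succeeds only when the entire string belongs to the language.
in_star = [s for s in candidates if star.fullmatch(s)]
in_plus = [s for s in candidates if plus.fullmatch(s)]

print(in_star)  # ['', 'x', 'xx', 'xxx']
print(in_plus)  # ['x', 'xx', 'xxx']
```

The only string on which the two expressions disagree is the null string: x* accepts it and x+ does not.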
Example 1:
Write the regular expression for the language accepting all combinations of a's over the set Σ = {a}.
Solution:
"All combinations of a's" means that a may appear zero times, once, twice, and so on. If a appears zero times, the
result is the null string. So the expected set is {ε, a, aa, aaa, …}, and the regular expression is: R = a*
Example 2:
Write the regular expression for the language accepting all combinations of a's except the null string, over the set
Σ = {a}.
Solution:
The regular expression has to be built for the language L = {a, aa, aaa, …}
This set does not contain the null string, so the regular expression is: R = a+
Example 3:
Write the regular expression for the language accepting all strings containing any number of a's and b's.
Solution:
The regular expression is: R = (a + b)*
Here the + between a and b denotes union ("a or b"), not the postfix "one or more" operator.
This gives the set L = {ε, a, b, aa, ab, ba, bb, aba, bab, …}, that is, any combination of a's and b's, including
the null string.
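The sets in Examples 1 to 3 can be reproduced by brute force: generate every string over the alphabet up to some length and keep those the expression accepts. The following is a minimal sketch using Python's `re` module (an assumption of this example). The helper name `language` and its parameters are hypothetical, and union is written `|` in Python, so (a + b)* becomes `(a|b)*`.

```python
import re
from itertools import product

def language(pattern, alphabet, max_len):
    """List the strings over `alphabet`, up to `max_len` symbols, that match `pattern`."""
    regex = re.compile(pattern)
    words = []
    for n in range(max_len + 1):
        # product(..., repeat=n) yields every sequence of n symbols.
        for symbols in product(alphabet, repeat=n):
            word = "".join(symbols)
            # fullmatch requires the whole word to be in the language.
            if regex.fullmatch(word):
                words.append(word)
    return words

print(language(r"a*", "a", 3))       # ['', 'a', 'aa', 'aaa']
print(language(r"a+", "a", 3))       # ['a', 'aa', 'aaa']
print(language(r"(a|b)*", "ab", 2))  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

The empty string `''` in the output plays the role of ε. Only a finite prefix of each language is listed, because the languages themselves are infinite.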
BASIC OPERATIONS ON RE
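1. Union: If R1 and R2 are regular expressions, then R1 | R2 (also written as R1 U R2 or R1 + R2) is also a regular
expression. The symbol + means union, that is, "or".
2. Concatenation: If R1 and R2 are regular expressions, then R1R2 (also written as R1.R2) is also a regular
expression.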
Example 1: Build a regular expression for the set of all strings over Σ = {0, 1} ending with 01.
Solution: The language has the elements {01, 001, 101, 00101, 101001, …}. Hence, the RE is (0 + 1)*01.
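As a quick check of this solution, the expression can be tried on a few strings. This is a sketch in Python's `re` syntax (an assumption of this example), where union is written `|`, so (0 + 1)*01 becomes `(0|1)*01`. The variable names and test strings are illustrative.

```python
import re

# (0 + 1)*01 in the notation above is (0|1)*01 in Python syntax.
ends_with_01 = re.compile(r"(0|1)*01")

accepted = ["01", "001", "101", "00101", "101001"]
rejected = ["", "0", "10", "0110", "011"]

for word in accepted + rejected:
    # fullmatch returns a match object only if the whole word is in the language.
    print(repr(word), ends_with_01.fullmatch(word) is not None)
```

Every string in `accepted` ends with 01 and prints True; every string in `rejected` prints False.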
3. Iteration / Closure: If R is a regular expression over Σ, then L(R*) is the set of strings obtained by
concatenating n strings from L(R) for n ≥ 0, and L(R+) is the same set for n ≥ 1.
Kleene closure: If R1 is a regular expression, then R1* (the Kleene closure of R1) is also a regular expression.
Examples:
a. If R = 01, then the closure of R, denoted by R*, is {ε, 01, 0101, 010101, …}. The set can be described as zero
or more iterations of the same string 01.
b. If Σ = {0, 1}, then the Kleene closure is Σ* = {ε, 0, 1, 00, 01, 10, 11, 000, …}. Σ* can be described as the set
of all combinations of 0 and 1, including ε.
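The closure can also be computed directly from its definition, by repeatedly concatenating strings of the set. The sketch below works on plain Python sets of strings and cuts the result off at a maximum length, since the full closure is infinite. The function names `concat` and `closure` and the `max_len` cutoff are assumptions made for this illustration.

```python
def concat(A, B):
    """Concatenation of two sets of strings: every a followed by every b."""
    return {a + b for a in A for b in B}

def closure(A, max_len):
    """Strings of A* that have at most `max_len` symbols."""
    result = {""}      # n = 0 contributes the null string
    frontier = {""}
    while frontier:
        # Append one more string from A, drop words that are too long or already seen.
        frontier = {w for w in concat(frontier, A) if len(w) <= max_len} - result
        result |= frontier
    return result

by_length = lambda w: (len(w), w)
print(sorted(closure({"01"}, 6), key=by_length))
# ['', '01', '0101', '010101']
print(sorted(closure({"0", "1"}, 2), key=by_length))
# ['', '0', '1', '00', '01', '10', '11']
```

Both outputs start with the empty string, which is why ε belongs to every Kleene closure.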
Precedence of Operation
The basic operations performed on regular expressions are union, concatenation, and closure. Among these three
operations, closure has the highest precedence, concatenation is next, and union has the lowest precedence.
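The effect of these precedence rules shows up when an expression is read with and without parentheses. Python's `re` module (an assumption of this example) follows the same ordering, with union written `|`. The helper `accepts` and the sample expressions over {a, b, c} are made up for illustration.

```python
import re

def accepts(pattern, word):
    """True if the whole word belongs to the language of the pattern."""
    return re.fullmatch(pattern, word) is not None

# Closure binds tighter than concatenation: bc* means b(c*), not (bc)*.
print(accepts(r"bc*", "bcc"))     # True
print(accepts(r"bc*", "bcbc"))    # False
print(accepts(r"(bc)*", "bcbc"))  # True

# Union binds loosest: a|bc* means a | (b(c*)), not (a|b)c*.
print(accepts(r"a|bc*", "a"))     # True
print(accepts(r"a|bc*", "bcc"))   # True
print(accepts(r"a|bc*", "ac"))    # False
print(accepts(r"(a|b)c*", "ac"))  # True
```

Parentheses are needed whenever the intended grouping differs from the default order of closure, then concatenation, then union.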
Practice Exercise:
Identify the regular set for each regular expression (write at least 5 elements for each set, or all elements if
the set has fewer than 5).
1. (0 + 10*)
2. (0*10*)
3. (0 + ε)(1 + ε)
4. (a+b)*
5. (a+b)*abb
6. (11)*
7. (aa)*(bb)*b
8. (aa + ab + ba + bb)*