Regular Expressions

RICHARD A. SANTOS
FORMAL RECURSIVE DEFINITION OF RE
• Regular expressions (RE) are shorthand notations that describe a language. They are used in programming
languages and language tools such as lex, the vi editor, PHP, and Perl.
• They are useful for representing certain sets of strings in an algebraic fashion.
• An RE describes a language generated by a Type 3 (regular) grammar, and that language is accepted by a finite automaton (FA).
• Any set represented by a regular expression is called a regular set.
• Any terminal symbol or element that belongs to an alphabet (Σ) is an RE. The null string (ε) and the null set (∅)
are also REs.
• REs were first proposed by the American mathematician Stephen Kleene to describe the algebra of regular sets.
Ken Thompson later used REs in the early text editor ‘QED’ and the UNIX editor ‘ed’.
• For instance:
In a regular expression, x* means zero or more occurrences of x. It generates {ε, x, xx, xxx, xxxx, ...}
In a regular expression, x+ means one or more occurrences of x. It generates {x, xx, xxx, xxxx, ...}
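The difference between x* and x+ can be checked with a short sketch. It assumes Python's standard `re` module, where `*` and `+` have the same postfix meaning as above; the list `candidates` is a made-up sample, not part of the original notes.

```python
import re

# Python's re module uses the same * and + postfix operators described above.
star = re.compile(r"x*")   # zero or more occurrences of x
plus = re.compile(r"x+")   # one or more occurrences of x

# Hypothetical sample strings; "" stands for the null string ε.
candidates = ["", "x", "xx", "xxx", "xy"]

# fullmatch succeeds only when the whole string belongs to the language.
in_star = [s for s in candidates if star.fullmatch(s)]
in_plus = [s for s in candidates if plus.fullmatch(s)]

print(in_star)  # ['', 'x', 'xx', 'xxx']
print(in_plus)  # ['x', 'xx', 'xxx']
```

The only string accepted by x* but rejected by x+ is the null string.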
Example 1:
Write the regular expression for the language accepting all combinations of a's, over the set ∑ = {a}
Solution:
All combinations of a's means that a may appear zero times, once, twice, and so on. If a appears zero times, the
result is the null string. So we expect the set {ε, a, aa, aaa, ...}, and the regular expression for it is: R = a*
Example 2:
Write the regular expression for the language accepting all combinations of a's except the null string, over the set
∑ = {a}
Solution:
The regular expression has to be built for the language L = {a, aa, aaa, ....}
This set does not contain the null string, so the regular expression is: R = a+
Example 3:
Write the regular expression for the language accepting all strings containing any number of a's and b's.
Solution:
The regular expression will be: R = (a + b)*
This gives the set L = {ε, a, aa, b, bb, ab, ba, aba, bab, ...}, that is, any combination of a and b.
The expression (a + b)* generates every combination of a and b, including the null string.
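As a rough check of Example 3, the sketch below lists every string over {a, b} up to length 2 and confirms that each one is accepted. It assumes Python's `re` syntax, where union is written `|` rather than `+`; the helper `strings_up_to` is a hypothetical name introduced only for this illustration.

```python
import re
from itertools import product

# In Python's re syntax union is written | (a + there would mean "one or more").
pattern = re.compile(r"(a|b)*")

def strings_up_to(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

words = list(strings_up_to("ab", 2))
print(words)  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']

# Every enumerated string belongs to the language of (a + b)*.
print(all(pattern.fullmatch(w) for w in words))  # True
```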
BASIC OPERATIONS ON RE
1. Union: If R1 and R2 are regular expressions, then R1 | R2 (also written as R1 U R2 or R1 + R2) is also a regular
expression. The symbol + here means union and is read as "or".
L(R1|R2) = L(R1) U L(R2).
Example 1: Build a regular expression of a set of all strings ending with 01.
Solution: The language has the elements {01, 001, 101, 00101, 101001, ...}. Hence, the RE is (0 + 1)*01.
Example 2: If language L is {ma, pa} and language M is {be, bop}, then what is L + M?
Solution: L + M is {ma, pa, be, bop}
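Both union examples can be reproduced in a few lines. This is a minimal sketch that models finite languages as Python sets and writes (0 + 1)*01 in Python's `re` syntax, where union is `|`; the test strings in the last list are invented for illustration.

```python
import re

L = {"ma", "pa"}
M = {"be", "bop"}

# The union of two finite languages is ordinary set union.
union = L | M
print(sorted(union))  # ['be', 'bop', 'ma', 'pa']

# (0 + 1)*01 from Example 1, written with | for union.
ends_with_01 = re.compile(r"(0|1)*01")

# Hypothetical sample strings; only those ending in 01 are kept.
accepted = [s for s in ["01", "001", "101001", "10", ""] if ends_with_01.fullmatch(s)]
print(accepted)  # ['01', '001', '101001']
```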
2. Concatenation: If R1 and R2 are regular expressions, then R1R2 (also written as R1.R2) is also a regular
expression.
L(R1R2) = L(R1) concatenated with L(R2).
Example 1: Describe the RE (ab)* in the English language.
Solution: The language has the elements {ε, ab, abab, ababab, ...}. In English, it is described as: the set of all
strings formed by repeating ‘ab’ zero or more times, so every string has an equal number of a's and b's.
Example 2: If language L is {ma, pa} and language M is {be, bop}, then what is LM?
Solution: LM is {mabe, mabop, pabe, pabop}
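Concatenation of two finite languages can be sketched as a set comprehension that pairs every string of the first language with every string of the second. The function name `concat` is an assumption made for this example.

```python
L = {"ma", "pa"}
M = {"be", "bop"}

def concat(A, B):
    """Return every string xy with x taken from A and y taken from B."""
    return {x + y for x in A for y in B}

LM = concat(L, M)
print(sorted(LM))  # ['mabe', 'mabop', 'pabe', 'pabop']

# Order matters: ML is a different language from LM.
ML = concat(M, L)
print(sorted(ML))  # ['bema', 'bepa', 'bopma', 'boppa']
```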
3. Iteration / Closure: If R is a regular expression over Σ, then L(R*) is the set of strings obtained by
concatenating n strings from L(R) for n ≥ 0, and L(R+) is the same set restricted to n ≥ 1.
Kleene closure: If R1 is a regular expression, then R1* (the Kleene closure of R1) is also a regular expression.
L(R1*) = {ε} U L(R1) U L(R1R1) U L(R1R1R1) U ...
Example: If language L is {ma, pa} and language M is {be, bop}, then what is L*?
Solution: L* is {ε, ma, pa, mama, ..., pamamapa, ...}
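Because L* is infinite, a program can only list a bounded part of it. The sketch below builds the strings made from at most `max_pieces` elements of L, following the union {ε} U L U LL U ... term by term; `kleene_star` and `max_pieces` are hypothetical names chosen for this illustration.

```python
def kleene_star(language, max_pieces):
    """Return the strings of L* built from at most `max_pieces` pieces of L."""
    result = {""}    # n = 0 contributes only the null string ε
    current = {""}
    for _ in range(max_pieces):
        # Append one more piece of L to every string built so far.
        current = {x + y for x in current for y in language}
        result |= current
    return result

L = {"ma", "pa"}
closure = kleene_star(L, 2)

# Sort by length first so the shortest strings come first.
print(sorted(closure, key=lambda s: (len(s), s)))
# ['', 'ma', 'pa', 'mama', 'mapa', 'pama', 'papa']

print("pamamapa" in kleene_star(L, 4))  # True
```

Raising `max_pieces` adds longer strings but never removes any, which mirrors the unbounded union in the definition.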
Difference between Closure and Kleene’s Closure:

- Closure is the iteration of an expression 0 to ∞ times, whereas Kleene’s closure is a set of strings that includes ε.
- Closure is applied to an RE, whereas Kleene’s closure is applied to an alphabet Σ.
Examples:
a. If R = 01, then the closure of R, denoted by R*, is {ε, 01, 0101, 010101, ...}. It can be described as the
iteration of the same string 01, zero or more times.
b. If Σ = {0, 1}, then the Kleene’s closure is denoted by Σ* = {ε, 0, 1, 00, 01, 10, 11, 010, 011, 100, ...}. It can be
described as the set of all combinations of 0 and 1, including ε.
Precedence of Operation
The basic operations performed on regular expressions are union, concatenation, and closure. Among these three
(3) operations, closure has the highest precedence, the next highest is for concatenation, and the least
precedence is for union.
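The precedence rules can be observed directly. This sketch assumes Python's `re` module, which follows the same ordering (closure, then concatenation, then union written as `|`); `accepts` is a hypothetical helper defined only for this illustration.

```python
import re

def accepts(pattern, s):
    """Return True when the whole string s belongs to the language of pattern."""
    return re.fullmatch(pattern, s) is not None

# Closure binds tighter than concatenation: ab* means a(b*), not (ab)*.
print(accepts(r"ab*", "abbb"))    # True
print(accepts(r"ab*", "abab"))    # False
print(accepts(r"(ab)*", "abab"))  # True

# Concatenation binds tighter than union: a|bc means a|(bc), not (a|b)c.
print(accepts(r"a|bc", "bc"))     # True
print(accepts(r"a|bc", "ac"))     # False
print(accepts(r"(a|b)c", "ac"))   # True
```

Parentheses override the default ordering, which is why (ab)* and ab* describe different sets.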
Practice Exercise:
Identify the regular set for each regular expression (write at least 5 elements for each set).
1. (0 + 10*)
2. (0*10*)
3. (0 + ε)(1 + ε)
4. (a+b)*
5. (a+b)*abb
6. (11)*
7. (aa)*(bb)*b
8. (aa + ab + ba + bb)*
