Chapter 2: Regular Expressions
Topics
1) Regular Expressions(RE)
2) FA to RE conversion and vice-versa
3) How to prove whether a given language is
regular or not?
4) Closure properties of regular languages
RE’s: Introduction
Regular expressions are an algebraic way to describe
languages.
They describe exactly the regular languages.
If E is a regular expression, then L(E) is the language it
defines.
In computer science and formal language theory, a regular expression
(sometimes called a rational expression) is a sequence of characters
that defines a search pattern.
Usually this pattern is used by string-searching algorithms for
"find" or "find and replace" operations on strings.
RE’s: Introduction…..
Regular expressions are a compact and effective way to represent
any regular language.
A regular expression denotes a language, namely the set of strings
accepted by some finite automaton.
Basis 1: If a is any symbol, then a is a RE, and L(a) = {a}.
Note: {a} is the language containing one string, and that
string is of length 1.
Basis 2: ε is a RE, and L(ε) = {ε}.
Basis 3: ∅ is a RE, and L(∅) = ∅.
RE’s: Introduction…
The set of regular expressions is defined by the following rules:
(i) Every letter of ∑ is a regular expression, and the null string ε
itself is a regular expression.
(ii) If r1 and r2 are regular expressions, then
(a) (r1) (b) r1r2
(c) r1+r2 (d) r1*
(e) r1^+ (positive closure)
are also regular expressions.
(iii) Nothing else is a regular expression.
Regular Expressions vs. Finite Automata
Regular expressions offer a declarative way to express the pattern
of any string we want to accept
E.g., 01*+ 10*
Automata => more machine-like
< input: string , output: [accept/reject] >
Regular expressions => more program syntax-like
Unix environments heavily use regular expressions
E.g., bash shell, grep, vi & other editors, sed
Perl scripting – good for string processing
Lexical analyzers such as Lex or Flex
Regular Expressions
Regular expressions = Finite Automata (DFA, NFA, ε-NFA)
Regular expressions are syntactical expressions; finite automata are machines.
Both describe exactly the regular languages.
Language Operators
Union of two languages:
L U M = all strings that are either in L or in M
Concatenation of two languages:
L . M = all strings that are of the form xy
such that x ∈ L and y ∈ M
The dot operator is usually omitted
In L^i, "i" refers to how many strings from the parent language L are
concatenated to produce each string in the language L^i
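The language operators above can be tried out on small finite languages. The following is a minimal sketch, assuming languages are modelled as Python sets of strings; the function names and the sample languages L and M are illustrative only.

```python
from itertools import product

def union(L, M):
    """L U M: strings that are in L or in M."""
    return L | M

def concat(L, M):
    """L . M: every string xy with x in L and y in M."""
    return {x + y for x, y in product(L, M)}

def power(L, i):
    """L^i: concatenation of i strings drawn from L (L^0 = {epsilon})."""
    result = {""}
    for _ in range(i):
        result = concat(result, L)
    return result

# Hypothetical sample languages, chosen only for illustration.
L = {"0", "01"}
M = {"1", "11"}

print(sorted(union(L, M)))
print(sorted(concat(L, M)))
print(sorted(power(L, 2)))
```

Note that concat(L, M) has three strings rather than four, because 0.11 and 01.1 both give 011.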
Building Regular Expressions
(i) The constants ϵ (null string) and ɸ (empty set) are regular
expressions, denoting the languages {ϵ} and ɸ, respectively.
That is, L(ϵ) = {ϵ} and L(ɸ) = ɸ.
(ii) If a is any symbol, then a is a regular expression. This
expression denotes the language {a}. That is, L(a) = {a}.
(iii) A variable, usually capitalized, such as L, represents
any language.
Building Regular Expressions
Let E be a regular expression and the language represented
by E is L(E)
Then:
L((E)) = L(E)
L(E*) = (L(E))*
Identity Rules for RE
Two regular expressions P and Q are equivalent (denoted as P=Q) if and
only if P represents the same set of strings as Q does.
To show the equivalence of two regular expressions we use some
identities of regular expressions.
Let P, Q and R be regular expressions, ε the null string and Φ the empty set.
Then the identity rules are as follows −
εR = Rε = R
ε* = ε
Φ* = ε
ΦR = RΦ = Φ
Φ + R = R
R + R = R
RR* = R*R = R^+
(R*)* = R*
ε + RR* = R*
(P+Q)R = PR + QR
(P+Q)* = (P*Q*)* = (P*+Q*)*
R*(ε+R) = (ε+R)R* = R*
(R+ε)* = R*
ε + R* = R*
(PQ)*P = P(QP)*
R*R + R = R*R
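The identities above can be spot-checked by comparing the two sides as sets of strings up to a bounded length. The following is a minimal sketch, not a proof: the helper names cat and star, the bound n, and the sample languages P, Q and R are all assumptions made for illustration.

```python
def cat(L, M, n):
    """Concatenation L.M, keeping only strings of length <= n."""
    return {x + y for x in L for y in M if len(x + y) <= n}

def star(L, n):
    """Kleene closure L*, keeping only strings of length <= n."""
    result = {""}
    frontier = {""}
    while frontier:
        frontier = cat(frontier, L, n) - result
        result |= frontier
    return result

n = 6
# Hypothetical sample languages used to instantiate P, Q and R.
P, Q, R = {"a"}, {"ab"}, {"b", "ba"}
EPS, EMPTY = {""}, set()

checks = {
    "(P+Q)* = (P*Q*)*": star(P | Q, n) == star(cat(star(P, n), star(Q, n), n), n),
    "(P+Q)* = (P*+Q*)*": star(P | Q, n) == star(star(P, n) | star(Q, n), n),
    "(PQ)*P = P(QP)*": cat(star(cat(P, Q, n), n), P, n) == cat(P, star(cat(Q, P, n), n), n),
    "eps + RR* = R*": (EPS | cat(R, star(R, n), n)) == star(R, n),
    "R*R + R = R*R": (cat(star(R, n), R, n) | R) == cat(star(R, n), R, n),
    "empty.R = empty": cat(EMPTY, R, n) == EMPTY,
    "empty* = eps": star(EMPTY, n) == EPS,
}
for name, holds in checks.items():
    print(name, holds)
```

A check that prints True only says the identity holds for these sample languages up to length n; a False would be a genuine counterexample.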
Example: how to use these regular expression
properties and language operators?
L = { w | w is a binary string which does not
contain two consecutive 0s or two
consecutive 1s anywhere }
E.g., w = 01010101 is in L, while w = 10010 is not in L
Goal: Build a regular expression for L
Four cases for w, with the regular expression for each:
Case A: w starts with 0 and |w| is even: (01)*
Case B: w starts with 1 and |w| is even: (10)*
Case C: w starts with 0 and |w| is odd: 0(10)*
Case D: w starts with 1 and |w| is odd: 1(01)*
Since L is the union of the four cases, a regular expression for L is
(01)* + (10)* + 0(10)* + 1(01)*
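The four cases can be cross-checked by brute force. The sketch below joins them with union (written "|" in Python's re module instead of "+") and compares the result with the direct definition of L on all binary strings up to length 10; the helper names and the length bound are assumptions for illustration.

```python
import re
from itertools import product

# Cases A to D joined by union; "|" is Python's notation for "+".
PATTERN = re.compile(r"(01)*|(10)*|0(10)*|1(01)*")

def in_L(w):
    """Direct definition: no two consecutive 0s and no two consecutive 1s."""
    return "00" not in w and "11" not in w

def matches(w):
    """True if the whole string w is described by the regular expression."""
    return PATTERN.fullmatch(w) is not None

# Collect every string on which the two descriptions disagree.
mismatches = [
    "".join(bits)
    for n in range(11)
    for bits in product("01", repeat=n)
    if in_L("".join(bits)) != matches("".join(bits))
]
print(mismatches)
```

An empty list means the regular expression and the definition agree on every string tested.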
Examples
Write the regular expression for the language accepting all strings
containing any number of a's and b's.
r.e. = (a + b)*
Precedence of operators (highest first):
* (closure)
. (concatenation)
+ operator
Example:
01* + 1 = ( 0 . ((1)*) ) + 1
Equivalence between regular expressions
and finite automata
Strategy (Kleene's Theorem):
Theorem 1: DFA → Reg Ex
Theorem 2: Reg Ex → ε-NFA
The automata themselves are inter-convertible: ε-NFA → NFA → DFA
DFA to RE construction
The two popular methods for converting a given DFA to its
regular expression are Arden's method and the state elimination method.
DFA to RE construction
DFA → Reg Ex (Theorem 1)
Informally, trace all distinct paths (traversing cycles only once)
from the start state to each of the final states and enumerate all
the expressions along the way.
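Example: q0 --0--> q1 --1--> q2, with a loop labelled 1 on q0, a loop
labelled 0 on q1, and a loop labelled 0,1 on q2
Regular expression: 1* 00* 1 (0+1)*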
Arden's Theorem
Statement −
• Let P and Q be two regular expressions.
• If P does not contain the null string, then
(I) R = Q + RP has a unique solution,
(II) and that solution is R = QP*
Conditions-
To use Arden's Theorem, the following conditions must be satisfied-
• The transition diagram must not have any ∈ transitions.
• There must be only a single initial state.
Cont…
Proof −
R = Q + (Q + RP)P [After putting the value R = Q + RP]
= Q + QP + RPP
When we put the value of R recursively again and again, we get the
following equation −
R = Q + QP + QP^2 + QP^3 + …
R = Q (ϵ + P + P^2 + P^3 + … )
R = QP* [As P* represents (ϵ + P + P^2 + P^3 + …)]
Hence proved.
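The claim that R = QP* satisfies R = Q + RP can be checked on a concrete instance by comparing both sides as sets of strings up to a bounded length. This is a minimal sketch only: the helpers cat and star, the bound n, and the sample choices Q = {0} and P = {10} are assumptions, with P chosen so that it does not contain the null string.

```python
def cat(L, M, n):
    """Concatenation L.M, keeping only strings of length <= n."""
    return {x + y for x in L for y in M if len(x + y) <= n}

def star(L, n):
    """Kleene closure L*, keeping only strings of length <= n."""
    result = {""}
    frontier = {""}
    while frontier:
        frontier = cat(frontier, L, n) - result
        result |= frontier
    return result

n = 7
Q = {"0"}    # hypothetical sample language
P = {"10"}   # hypothetical sample language; it does not contain epsilon

R = cat(Q, star(P, n), n)       # candidate solution R = QP*
rhs = Q | cat(R, P, n)          # right-hand side Q + RP

print(sorted(R, key=len))
print(R == rhs)
```

Both sides agree up to length n, which matches statement (II) for this particular choice of P and Q.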
Assumptions for Applying Arden’s
Theorem
• The transition diagram must not have NULL
transitions
• It must have only one initial state:
Method
Step 1 − Create equations as the following form for all the states of the DFA
having n states with initial state q1.
q1 = q1R11 + q2R21 + … + qnRn1 + ϵ
q2 = q1R12 + q2R22 + … + qnRn2
…………………………………………………………….
…………………………………………………………….
Cont.…
q1 = q1a + q2aa + ε (Substituting value of q3)
= q1a + q1b(b + ab)*aa + ε (Substituting value of q2)
= q1(a + b(b + ab)*aa) + ε
= ε (a + b(b + ab)*aa)* (Using Arden's Theorem)
= (a + b(b + ab)*aa)*
Hence, the regular expression is (a + b(b + ab)*aa)*.
Example 2
Find regular expression for the following
DFA using Arden's Theorem-
Solution-
Step-01:
Form an equation for each state-
A = ∈ + B.1 ……(1)
B = A.0 ……(2)
Step-02:
Bring final state in the form R = Q + RP.
Using (1) in (2), we get-
B = (∈ + B.1).0
B = ∈.0 + B.1.0
B = 0 + B.(1.0) ……(3)
Using Arden's Theorem in (3), we get-
B = 0.(1.0)*
Thus, Regular Expression for the given
DFA = 0(10)*
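The answer 0(10)* can be cross-checked against the automaton described by equations (1) and (2). The sketch below reads the transitions off those equations (A is initial, B is final, A goes to B on 0, B goes to A on 1); treating every other move as rejecting is an assumption, since the diagram itself is not reproduced here.

```python
import re
from itertools import product

# Transitions inferred from equations (1) and (2): B --1--> A and A --0--> B.
# Missing transitions are assumed to lead to rejection.
DELTA = {("A", "0"): "B", ("B", "1"): "A"}
START, FINAL = "A", {"B"}

def accepts(w):
    """Run the automaton on w and report whether it ends in a final state."""
    state = START
    for ch in w:
        state = DELTA.get((state, ch))
        if state is None:
            return False
    return state in FINAL

PATTERN = re.compile(r"0(10)*")

# Strings on which the automaton and the regular expression disagree.
disagreements = [
    "".join(t)
    for n in range(10)
    for t in product("01", repeat=n)
    if accepts("".join(t)) != (PATTERN.fullmatch("".join(t)) is not None)
]
print(disagreements)
```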
Example 3
Find regular expression for the
following DFA using Arden's
Theorem-
Solution-
Step-01:
Form an equation for each state-
q1 = ∈ ……(1)
q2 = q1.a ……(2)
q3 = q1.b + q2.a + q3.a ……(3)
Step-02:
Bring final state in the form R = Q + RP.
Using (1) in (2), we get-
q2 = ∈.a
q2 = a ……(4)
Using (1) and (4) in (3), we get-
q3 = q1.b + q2.a + q3.a
q3 = ∈.b + a.a + q3.a
q3 = (b + a.a) + q3.a ……(5)
Using Arden's Theorem in (5), we get-
q3 = (b + a.a)a*
Thus, Regular Expression for the given
DFA = (b + aa)a*
Exercise
Construct the regular expression for the following FA
State Elimination Method-
This method involves the following steps in finding the regular
expression for any given DFA-
Step-01 (Thumb Rule):
The initial state of the DFA must not have any incoming edge.
• If there exists any incoming edge to the initial state, then create a new
initial state having no incoming edge to it.
Example-
State Elimination Method…..
Step-02 (Thumb Rule):
There must exist only one final state in the DFA.
• If there exist multiple final states in the DFA,
then convert all the final states into non-final
states and create a new single final state.
Example-
State Elimination Method…..
Step-03 (Thumb Rule):
The final state of the DFA must not have any outgoing edge.
• If there exists any outgoing edge from the final state,
then create a new final state having no outgoing edge
from it.
Example-
State Elimination Method…..
Step-04:
Eliminate all the intermediate states one by one.
These states may be eliminated in any order.
In the end,
• Only an initial state going to the final state will be left.
• The cost of this transition is the required regular expression.
NOTE: The state elimination method can be applied to any finite automaton
(NFA, ∈-NFA, DFA etc.)
Example 1
Find regular expression for the following FA-
Solution-
Step-01:
Initial state A has an incoming edge.
So, we create a new initial state qi.
The resulting FA is-
Step-02:
Final state B has an outgoing edge.
So, we create a new final state qf.
The resulting FA is-
Cont….
Step-03:
Now, we start eliminating the intermediate states.
First, let us eliminate state A.
There is a path going from state qi to state B via state A.
So, after eliminating state A, we put a direct path from state
qi to state B having cost ∈.0 = 0
There is a loop on state B using state A.
So, after eliminating state A, we put a direct loop on state B
having cost 1.0 = 10.
Cont…
Step-04:
Now, let us eliminate state B.
• There is a path going from state qi to state qf via state B.
• So, after eliminating state B, we put a direct path from state qi to state qf having
cost 0.(10)*.∈ = 0(10)*
Eliminating state B, we get-
From here, Regular Expression = 0(10)*
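The elimination steps used in this example can be sketched in code. The following is a minimal sketch, assuming edges are stored as a dictionary from (source, target) pairs to regular expressions in Python's re syntax, with the empty string standing for ∈; the function names seq and eliminate are illustrative, and the edge list is read off the steps above (qi to A on ∈, A to B on 0, B to A on 1, B to qf on ∈).

```python
import re
from itertools import product

def seq(*parts):
    """Concatenate regex fragments, skipping epsilon (written as the empty string)."""
    return "".join(f"(?:{p})" for p in parts if p != "")

def eliminate(edges, state):
    """Remove `state`; every path p -> state -> q becomes a direct edge p -> q."""
    loop = edges.get((state, state), "")
    loop_part = f"(?:{loop})*" if loop != "" else ""
    incoming = {p: r for (p, q), r in edges.items() if q == state and p != state}
    outgoing = {q: r for (p, q), r in edges.items() if p == state and q != state}
    remaining = {k: r for k, r in edges.items() if state not in k}
    for p, r_in in incoming.items():
        for q, r_out in outgoing.items():
            path = seq(r_in, loop_part, r_out)
            if (p, q) in remaining:          # parallel edge: take the union
                path = f"{remaining[(p, q)]}|{path}"
            remaining[(p, q)] = path
    return remaining

# FA of this example after adding qi and qf ("" stands for an epsilon edge).
edges = {("qi", "A"): "", ("A", "B"): "0", ("B", "A"): "1", ("B", "qf"): ""}
for state in ["A", "B"]:
    edges = eliminate(edges, state)
result = edges[("qi", "qf")]
print(result)

def equivalent(r1, r2, alphabet="01", max_len=8):
    """Compare two regexes on every string over `alphabet` up to max_len."""
    a, b = re.compile(r1), re.compile(r2)
    return all(
        (a.fullmatch("".join(t)) is None) == (b.fullmatch("".join(t)) is None)
        for n in range(max_len + 1)
        for t in product(alphabet, repeat=n)
    )

print(equivalent(result, r"0(10)*"))
```

The printed expression carries extra grouping, so the sketch compares it with 0(10)* by matching behaviour rather than by text.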
Example 2
Find regular expression for the following
DFA
Solution-
Step 01:
There exist multiple final states.
So, we convert them into a single final
state.
The resulting FA is
Cont…
Step-02:
Now, we start eliminating the intermediate states.
First, let us eliminate state q4.
There is a path going from state q2 to state qf via state q4.
So, after eliminating state q4 , we put a direct path from
state q2 to state qf having cost b.∈ = b.
Cont…
Step-03:
Now, let us eliminate state q3.
There is a path going from state q2 to state qf via state q3.
So, after eliminating state q3 , we put a direct path from
state q2 to state qf having cost c.∈ = c.
Cont…
Step-04:
Now, let us eliminate state q5.
There is a path going from state q2 to state qf via state q5.
So, after eliminating state q5 , we put a direct path from state q2 to state
qf having cost d.∈ = d.
Cont…
Step-05:
Now, let us eliminate state q2.
There is a path going from state q1 to state qf via state q2.
So, after eliminating state q2 , we put a direct path from state q1 to state
qf having cost a.(b+c+d).
Example 3
Solution-
Step-01:
Initial state q1 has an incoming edge.
• So, we create a new initial state qi.
The resulting DFA is-
Step-02:
Final state q2 has an outgoing edge.
• So, we create a new final state qf.
The resulting DFA is-
Example3:
Step-03:
Now, we start eliminating the intermediate states.
First, let us eliminate state q1.
There is a path going from state qi to state q2 via state q1 .
So, after eliminating state q1, we put a direct path from state qi to state q2 having
cost ∈.c*.a = c*a
There is a loop on state q2 using state q1 .
So, after eliminating state q1 , we put a direct loop on state q2 having cost b.c*.a =
bc*a
Eliminating state q1, we get-
Example3:
Step-04:
Now, let us eliminate state q2.
There is a path going from state qi to state qf via state q2 .
So, after eliminating state q2, we put a direct path from state qi to state qf having
cost c*a(d+bc*a)*∈ = c*a(d+bc*a)*
Eliminating state q2, we get-
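The expression c*a(d+bc*a)* obtained in Step-04 can be cross-checked against the DFA implied by the elimination steps. The transitions below are inferred from the costs used in Steps 03 and 04 (a loop c on q1, q1 to q2 on a, q2 to q1 on b, a loop d on q2, start state q1, final state q2); this reading is an assumption, since the diagram is not reproduced here, and missing moves are treated as rejecting.

```python
import re
from itertools import product

# Transitions inferred from the costs in Step-03 and Step-04 (an assumption).
DELTA = {
    ("q1", "c"): "q1",
    ("q1", "a"): "q2",
    ("q2", "b"): "q1",
    ("q2", "d"): "q2",
}
START, FINAL = "q1", {"q2"}

def accepts(w):
    """Run the automaton on w; an undefined move rejects."""
    state = START
    for ch in w:
        state = DELTA.get((state, ch))
        if state is None:
            return False
    return state in FINAL

# c*a(d+bc*a)* in Python's notation ("|" instead of "+").
PATTERN = re.compile(r"c*a(d|bc*a)*")

disagreements = [
    "".join(t)
    for n in range(7)
    for t in product("abcd", repeat=n)
    if accepts("".join(t)) != (PATTERN.fullmatch("".join(t)) is not None)
]
print(disagreements)
```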
Exercises
Find regular expression for the following DFA-
RE to ε-NFA construction
(Thompson Construction)
Reg Ex → ε-NFA (Theorem 2)
Example: (0+1)*01(0+1)*
The ε-NFA is built from three parts joined in sequence:
(0+1)*, then 01, then (0+1)*
[Figure: ε-NFA for (0+1)*01(0+1)*]
Thompson Construction Method
The algorithm works recursively by splitting an expression into its
constituent sub expressions, from which the NFA will be constructed
using a set of rules.
Following are the rules :
Cont…
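1. The union expression s|t is converted to
[Figure: automaton for the union, from Start state q through the branches for a and b to qf]
Case 3 − For a regular expression (a+b), we can construct the following FA
−
[Figure: Start state q1 with one transition on a and one on b to qf, or equivalently a single edge labelled a,b from Start to qf]
Example:-
[Figure: first step of the construction, using ϵ-transitions]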
Cont…
Step 2: Since closure is to be taken next, we construct the automaton for
(a+b)* using the automaton for (a+b).
[Figure: ε-NFA for (a+b)*]
Cont…
[Figures: intermediate steps of the construction]
Cont…
Step 5: Now finally we can construct the automaton for a.(a+b)*.b.b
[Figure: ε-NFA for a.(a+b)*.b.b]
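The recursive construction can be sketched in code for the expression a.(a+b)*.b.b. This is a minimal sketch under stated assumptions: the class name Builder and its methods are hypothetical, each fragment is a (start, accept) pair, ε-moves are stored under the symbol None, and concatenation links two fragments with an ε-move instead of merging states, which is one common variant of the rules.

```python
from collections import defaultdict
from itertools import count, product
import re

class Builder:
    """Builds epsilon-NFA fragments; each fragment is a (start, accept) pair."""

    def __init__(self):
        self.trans = defaultdict(set)   # (state, symbol or None) -> next states
        self._ids = count()

    def _pair(self):
        return next(self._ids), next(self._ids)

    def symbol(self, a):
        s, f = self._pair()
        self.trans[(s, a)].add(f)
        return s, f

    def concat(self, x, y):
        self.trans[(x[1], None)].add(y[0])      # accept of x --eps--> start of y
        return x[0], y[1]

    def union(self, x, y):
        s, f = self._pair()
        self.trans[(s, None)] |= {x[0], y[0]}   # branch into either operand
        self.trans[(x[1], None)].add(f)
        self.trans[(y[1], None)].add(f)
        return s, f

    def star(self, x):
        s, f = self._pair()
        self.trans[(s, None)] |= {x[0], f}      # enter the loop or skip it
        self.trans[(x[1], None)] |= {x[0], f}   # repeat the loop or leave it
        return s, f

    def eclose(self, states):
        """All states reachable through epsilon-moves only."""
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for r in self.trans.get((q, None), ()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    def accepts(self, frag, w):
        current = self.eclose({frag[0]})
        for ch in w:
            moved = set()
            for q in current:
                moved |= self.trans.get((q, ch), set())
            current = self.eclose(moved)
        return frag[1] in current

nb = Builder()
a_or_b = nb.union(nb.symbol("a"), nb.symbol("b"))
nfa = nb.concat(nb.concat(nb.concat(nb.symbol("a"), nb.star(a_or_b)),
                          nb.symbol("b")), nb.symbol("b"))

for w in ["abb", "aabbb", "ab", "bb"]:
    print(w, nb.accepts(nfa, w))
```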
Algebraic Laws of Regular Expressions
Commutative:
E+F = F+E
Identity:
E+Φ = E
εE = Eε = E
Annihilator:
ΦE = EΦ = Φ
Distributive:
E(F+G) = EF + EG
Closure laws:
Φ* = ε
ε* = ε
E^+ = EE*
E? = ε+E
True or False?
Let R and S be two regular expressions. Then:
1. ((R*)*)* = R* ?
2. (R+S)* = R* + S* ?
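Both questions can be explored by comparing the two sides as sets of strings up to a bounded length. This is a minimal sketch: the helpers cat and star, the bound n, and the sample choices R = {a} and S = {b} are assumptions for illustration. Agreement on a sample is only evidence, while a disagreement is a genuine counterexample.

```python
def cat(L, M, n):
    """Concatenation L.M, keeping only strings of length <= n."""
    return {x + y for x in L for y in M if len(x + y) <= n}

def star(L, n):
    """Kleene closure L*, keeping only strings of length <= n."""
    result = {""}
    frontier = {""}
    while frontier:
        frontier = cat(frontier, L, n) - result
        result |= frontier
    return result

n = 5
R, S = {"a"}, {"b"}   # hypothetical sample languages

# Question 1: ((R*)*)* versus R*
lhs1, rhs1 = star(star(star(R, n), n), n), star(R, n)
# Question 2: (R+S)* versus R* + S*
lhs2, rhs2 = star(R | S, n), star(R, n) | star(S, n)

print(lhs1 == rhs1)
print(lhs2 == rhs2)
# Shortest string (ties broken alphabetically) in (R+S)* but not in R* + S*.
witness = min(lhs2 - rhs2, key=lambda w: (len(w), w))
print(witness)
```

For this sample the first pair agrees, consistent with the rule (R*)* = R*, and the second pair differs: a string that mixes a's and b's belongs to (R+S)* but to neither R* nor S*.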
The Pumping Lemma for Regular
Languages
What it is?
The Pumping Lemma is a property of all regular
languages.
How is it used?
A technique that is used to show that a given language is
not regular
Pumping Lemma for Regular Languages
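Let L be a regular language
Then there exists some constant N such that for every
string w in L with |w| ≥ N, there is a way to break w into
three parts, w = xyz, such that:
1) y ≠ ε
2) |xy| ≤ N
3) For all k ≥ 0, the string xy^kz is also in L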
Method to prove that a language
L is not regular
At first, we have to assume that L is regular.
So, the pumping lemma should hold for L.
Use the pumping lemma to obtain a contradiction −
Select w such that |w| ≥ c
Select y such that |y| ≥ 1
Select x such that |xy| ≤ c
Assign the remaining string to z.
Select k such that the resulting string is not in L.
Pumping Lemma: Proof
L is regular => it should have a DFA.
Set N := number of states in the DFA
Pumping Lemma: Proof…
=> We should be able to break w=xyz as follows:
x=a1a2..ai; y=ai+1ai+2..aj; z=aj+1aj+2..am
x's path will be p0..pi
y's path will be pi pi+1..pj (but pi=pj implying a loop)
z's path will be pj pj+1..pm
[Figure: p0 --x--> pi (=pj) --z--> pm, with a loop labelled y^k (for k loops) on pi]
Now consider another string wk = xy^kz, where k≥0
Case k=0
DFA will reach the accept state pm
Case k>0
DFA will loop for y^k, and finally reach the accept state pm for z
In either case, wk ∈ L
This proves part (3) of the lemma
Pumping Lemma: Proof…
For part (1):
Since i<j, y ≠ ε
For part (2):
By PHP, the repetition of states has to occur within
the first N symbols in w
==> |xy|≤N
Applications of Pumping Lemma
Using the Pumping Lemma
Note: We don’t have any control over N, except that it is positive.
We also don’t have any control over how to split w=xyz,
but xyz should respect the P/L conditions (1) and (2).
Cont…
F = {ww | w ∈ {0,1}*} is non-regular
proof:
Suppose F is regular
Let P be the pumping length given by the pumping
lemma
Let s = 0^P 1 0^P 1 ∈ F
Split s into 3 pieces, s = xyz
By the pumping lemma: |xy| ≤ P
Thus y must consist of 0s only.
⇒ xyyz = 0^(P+|y|) 1 0^P 1 ∉ F, a contradiction (→←)
Cont….
E = {0^i 1^j : i > j} is non-regular
proof:
Assume E is regular
Let P be the pumping length
Let s = 0^(P+1) 1^P ∈ E
Split s into 3 pieces, s = xyz
By pumping lemma: xy^iz ∈ E for any i ≥ 0
Since |xy| ≤ P and |y| > 0, y consists of 0s only. Taking i = 0 gives xz ∈ E.
But xz has #(0) ≤ #(1), a contradiction (→←)
Cont…
D = {1^(n^2) : n ≥ 0} is not regular
proof:
Assume D is regular
Let P be the pumping length
Let s = 1^(P^2) ∈ D
Split s into 3 pieces, s = xyz ⇒ xy^iz ∈ D, i ≥ 0
Consider xy^iz ∈ D and xy^(i+1)z ∈ D
⇒ |xy^iz| and |xy^(i+1)z| are perfect squares for any i ≥ 0
If m = n^2, then (n+1)^2 − n^2 = 2n + 1 = 2√m + 1,
so the next perfect square after m is 2√m + 1 away
Let m = |xy^iz|
|y| ≤ |s| = P^2
Let i = P^4
|y| = |xy^(i+1)z| − |xy^iz|
≤ P^2 = (P^4)^(1/2)
< 2(P^4)^(1/2) + 1
≤ 2(|xy^iz|)^(1/2) + 1
= 2√m + 1
→←
Example of using the Pumping Lemma to prove that a
language is not regular
Let Leq = {w | w is a binary string with equal number of 1s
and 0s}
Your Claim: Leq is not regular
Proof:
By contradiction, let Leq be regular
P/L constant should exist
Let N = that P/L constant (chosen by the adversary)
Consider input w = 0^N 1^N
(your choice for the template string)
By pumping lemma, we should be able to break w=xyz,
such that:
1) y ≠ ε
2) |xy| ≤ N
3) For all k≥0, the string xy^kz is also in L
Template string w = 0^N 1^N = 00…0 11…1 (N 0s followed by N 1s)
Proof…
Because |xy|≤N, xy should contain only 0s
(This, and because y ≠ ε, implies y = 0+)
Therefore x can contain at most N-1 0s
Also, all the N 1s must be inside z
By (3), any string of the form xy^kz ∈ Leq for all k≥0
Case k=0 (setting k=0 is referred to as "pumping down"):
xz has at most N-1 0s but has N 1s
Therefore, xy^0z ∉ Leq
This violates the P/L (a contradiction)
(Notice that the above should hold for all possible values of N>0.
Therefore, this completes the proof.)
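Closure properties for Regular
Languages (RL)
Closure property (this is different from Kleene closure):
If a set of regular languages are combined using an
operator, then the resulting language is also regular
Regular languages are closed under:
Union, intersection, complement, difference
Reversal
Kleene closure
Concatenation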
RLs are closed under union
if L and M are two RLs then:
they both have two corresponding regular expressions,
R and S respectively
(L U M) can be represented using the regular
expression R+S
Therefore, (L U M) is also regular
RLs are closed under
complementation
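If L is an RL over ∑, then its complement is L̄ = ∑* − L
To show L̄ is also regular, make the following construction on a DFA for L:
Convert every final state into non-final, and
every non-final state into a final state
[Figure: DFA for L with states q0, qi, qF2, …, qFk, beside the same DFA with final and non-final states swapped]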
DFA construction for L ∩ M
A_L = DFA for L = {Q_L, ∑, q_L, F_L, δ_L}
A_M = DFA for M = {Q_M, ∑, q_M, F_M, δ_M}
Build A_(L ∩ M) = {Q_L × Q_M, ∑, (q_L, q_M), F_L × F_M, δ} such
that:
δ((p,q),a) = (δ_L(p,a), δ_M(q,a)), where p in Q_L, and q in
Q_M
This construction ensures that a string w will be
accepted if and only if w reaches an accepting
state in both input DFAs.
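The product construction can be sketched in code. The following is a minimal sketch, assuming each DFA is a tuple (Q, ∑, start, F, δ) in the same order as above, with δ stored as a dictionary; the two sample DFAs (an even number of 0s, and strings ending in 1) are hypothetical and chosen only for illustration.

```python
from itertools import product

def intersect(dfa_l, dfa_m):
    """Product DFA: both machines run in lockstep on the same input."""
    states_l, sigma, start_l, finals_l, delta_l = dfa_l
    states_m, _, start_m, finals_m, delta_m = dfa_m
    states = set(product(states_l, states_m))
    delta = {
        ((p, q), a): (delta_l[(p, a)], delta_m[(q, a)])
        for (p, q) in states
        for a in sigma
    }
    finals = set(product(finals_l, finals_m))   # accepting in both machines
    return states, sigma, (start_l, start_m), finals, delta

def accepts(dfa, w):
    _, _, state, finals, delta = dfa
    for ch in w:
        state = delta[(state, ch)]
    return state in finals

# Hypothetical DFA for L: binary strings with an even number of 0s.
EVEN0 = ({"e", "o"}, {"0", "1"}, "e", {"e"},
         {("e", "0"): "o", ("e", "1"): "e", ("o", "0"): "e", ("o", "1"): "o"})
# Hypothetical DFA for M: binary strings that end with 1.
ENDS1 = ({"n", "y"}, {"0", "1"}, "n", {"y"},
         {("n", "0"): "n", ("n", "1"): "y", ("y", "0"): "n", ("y", "1"): "y"})

BOTH = intersect(EVEN0, ENDS1)
for w in ["001", "01", "1", "00"]:
    print(w, accepts(BOTH, w))
```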
DFA construction for L ∩ M
[Figure: DFA for L (states q0, qi, qj, qF1, qF2, …), DFA for M (states p0, pi, pj, pF1, pF2, …), and the product DFA for L ∩ M whose states are pairs such as (q0,p0), (qi,pi), (qj,pj) and (qF1,pF1)]
RLs are closed under set
difference
We observe:
L − M = L ∩ M̄ (where M̄ is the complement of M)
RLs are closed under complementation and closed under intersection,
so L − M is also regular
RLs are closed under reversal
Reversal of a string w is denoted by w^R
E.g., w=00111, w^R=11100
Reversal of a language:
L^R = The language generated by reversing all
strings in L
ε-NFA Construction for L^R
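Starting from the DFA for L, build a new ε-NFA for L^R:
• Create a new start state q'0
• Make the old start state the only new final state
(The full construction also reverses every transition and adds
ε-transitions from q'0 to the old final states qF1, qF2, …, qFk.)
[Figure: DFA for L beside the new ε-NFA for L^R]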
Here we are using two Machines for finding the Finite Automata Output
[Figure: machine with states q1 (start) and q2 and transitions labelled 0/1 and 1/0]