
Regular Expressions and Regular Languages

BİL405 - Automata Theory and Formal Languages 1


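Operations on Languages
Remember: A language is a set of strings.

Union: $L \cup M = \{\, w \mid w \in L \text{ or } w \in M \,\}$

Concatenation: $LM = \{\, wx \mid w \in L \text{ and } x \in M \,\}$

Powers: $L^0 = \{\varepsilon\}$, and $L^k = L\,L^{k-1}$ for $k \ge 1$

Kleene Closure: $L^* = \bigcup_{i \ge 0} L^i = L^0 \cup L^1 \cup L^2 \cup \cdots$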



Operations on Languages - Examples
L = {00,11}   M = {1,01,11}

L ∪ M = {00,11,1,01}
L.M = {001,0001,0011,111,1101,1111}
$L^0 = \{\varepsilon\}$   $L^1 = L = \{00,11\}$   $L^2 = \{0000,0011,1100,1111\}$
L* = {ε,00,11,0000,0011,1100,1111,000000,000011,...}

The Kleene closure of every language except two is infinite: if L contains any nonempty string w, then w, ww, www, ... are all in L*. The two exceptions are:

1. ∅* = {ε}
2. {ε}* = {ε}



Regular Expressions
• Regular expressions are an algebraic way to describe languages.
• Regular expressions describe exactly the regular languages.
• If E is a regular expression, then L(E) is the regular language it defines.
• A regular expression is built up from simpler regular expressions using the basis and induction rules of the definition.
• For each regular expression E, we can create a DFA A such that L(E) = L(A).
• For each DFA A, we can create a regular expression E such that L(A) = L(E).



Regular Expressions - Definition
Regular expressions over alphabet Σ

Reg. Expr. E       Language it denotes L(E)

Basis 1: ∅         ∅ (the empty language)
Basis 2: ε         {ε}
Basis 3: a         {a}, for each symbol a ∈ Σ

Note:
{a} is the language containing one string, and that string is of length 1.



Regular Expressions - Definition
Induction 1 – or: If E1 and E2 are regular expressions, then E1+E2 is a regular expression, and L(E1+E2) = L(E1) ∪ L(E2).

Induction 2 – concatenation: If E1 and E2 are regular expressions, then E1E2 is a regular expression, and L(E1E2) = L(E1)L(E2), where L(E1)L(E2) is the set of strings wx such that w is in L(E1) and x is in L(E2).

Induction 3 – Kleene Closure: If E is a regular expression, then E* is a regular expression, and L(E*) = (L(E))*.

Induction 4 – Parentheses: If E is a regular expression, then (E) is a regular expression, and L((E)) = L(E).
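The basis and induction rules translate directly into a recursive function. This is a minimal sketch assuming a hypothetical Python AST (Empty, Eps, Sym, Union, Concat, Star); since L(E) may be infinite, lang(e, n) returns only the strings of length at most n. Parentheses need no node of their own, because the shape of the tree already fixes the grouping.

```python
from dataclasses import dataclass

# Hypothetical AST: one class per basis/induction rule (parentheses need no node).
@dataclass(frozen=True)
class Empty:            # Basis 1: ∅
    pass

@dataclass(frozen=True)
class Eps:              # Basis 2: ε
    pass

@dataclass(frozen=True)
class Sym:              # Basis 3: a single symbol a (assumed to be one character)
    a: str

@dataclass(frozen=True)
class Union:            # Induction 1: E1 + E2
    e1: object
    e2: object

@dataclass(frozen=True)
class Concat:           # Induction 2: E1E2
    e1: object
    e2: object

@dataclass(frozen=True)
class Star:             # Induction 3: E*
    e: object

def lang(e, n):
    """The strings of L(e) with length <= n, by structural recursion on e."""
    if isinstance(e, Empty):
        return set()
    if isinstance(e, Eps):
        return {""}
    if isinstance(e, Sym):
        return {e.a} if n >= 1 else set()
    if isinstance(e, Union):
        return lang(e.e1, n) | lang(e.e2, n)
    if isinstance(e, Concat):
        return {w + x for w in lang(e.e1, n) for x in lang(e.e2, n) if len(w + x) <= n}
    if isinstance(e, Star):
        pieces = lang(e.e, n) - {""}
        result, frontier = {""}, {""}
        while frontier:   # keep appending one more piece until nothing new fits
            frontier = {w + x for w in frontier for x in pieces if len(w + x) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {e!r}")

# 01 + 0, 0(1+0), and 0* (cut off at length 3)
print(sorted(lang(Union(Concat(Sym("0"), Sym("1")), Sym("0")), 5)))   # ['0', '01']
print(sorted(lang(Concat(Sym("0"), Union(Sym("1"), Sym("0"))), 5)))   # ['00', '01']
print(sorted(lang(Star(Sym("0")), 3)))                                 # ['', '0', '00', '000']
```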
Regular Expressions - Parentheses
• Parentheses may be used wherever needed to influence the grouping of
operators.
• We may remove parentheses by using precedence and associativity
rules.
Operator        Precedence   Associativity
*               highest
concatenation   next         left associative
+               lowest       left associative

ab*+c means (a((b)*))+(c)



Regular Expressions - Examples
Alphabet Σ = {0,1}

• L(01) = {01}: L(01) = L(0)L(1) = {0}{1} = {01}
• L(01+0) = {01, 0}: L(01+0) = L(01) ∪ L(0) = (L(0)L(1)) ∪ L(0) = ({0}{1}) ∪ {0} = {01} ∪ {0} = {01, 0}
• L(0(1+0)) = {01, 00}.
– Note the order of precedence of operators: the parentheses force the union to be applied before the concatenation.
• L(0*) = {ε, 0, 00, 000, ...}.
• L((0+10)*(ε+1)) = all strings of 0's and 1's without two consecutive 1's.
• L((0+1)(0+1)) = {00, 01, 10, 11}.
• L((0+1)*) = all strings of 0's and 1's, including the empty string.
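These examples can be checked mechanically with Python's standard re module, under the assumption that the slides' notation is translated into Python syntax: the union + is written |, and ε+1 is written 1?. The helper members is hypothetical; it lists, in order of length, the strings up to a given length that the pattern matches in full.

```python
import re
from itertools import product

def members(pattern, max_len, alphabet="01"):
    """Strings over `alphabet` of length <= max_len that fully match `pattern`."""
    rx = re.compile(pattern)
    found = []
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            s = "".join(t)
            if rx.fullmatch(s):
                found.append(s)
    return found

print(members(r"01|0", 3))        # ['0', '01']               L(01+0)
print(members(r"0(1|0)", 3))      # ['00', '01']              L(0(1+0))
print(members(r"0*", 3))          # ['', '0', '00', '000']    L(0*), cut at length 3
print(members(r"(0|1)(0|1)", 3))  # ['00', '01', '10', '11']  L((0+1)(0+1))

# L((0+10)*(ε+1)) should be exactly the strings without "11" (checked up to length 10).
no_11 = members(r"(0|10)*1?", 10)
print(no_11 == [s for s in members(r"[01]*", 10) if "11" not in s])   # True
```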



Regular Expressions - Examples
All strings of 0's and 1's starting with 0 and ending with 1
0(0+1)*1

All strings of 0's and 1's with an even number of 0's
1*(01*01*)*

All strings of 0's and 1's with at least two consecutive 0's
(0+1)*00(0+1)*

All strings of 0's and 1's without two consecutive 0's
(1+01)*(ε+0)
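A brute-force comparison against the intended English descriptions can catch mistakes in expressions like these. The sketch below uses Python's re module with the same translation (+ becomes |, ε+0 becomes 0?); the helper names all_strings and mismatches are hypothetical. It checks every binary string up to length 10, which is strong evidence but not a proof.

```python
import re
from itertools import product

def all_strings(max_len, alphabet="01"):
    """Every string over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)

def mismatches(pattern, prop, max_len=10):
    """Strings on which the regex and the intended property disagree."""
    rx = re.compile(pattern)
    return [s for s in all_strings(max_len) if bool(rx.fullmatch(s)) != prop(s)]

# (slide expression, Python re translation, intended property)
cases = [
    ("0(0+1)*1",       r"0(0|1)*1",       lambda s: s.startswith("0") and s.endswith("1")),
    ("1*(01*01*)*",    r"1*(01*01*)*",    lambda s: s.count("0") % 2 == 0),
    ("(0+1)*00(0+1)*", r"(0|1)*00(0|1)*", lambda s: "00" in s),
    ("(1+01)*(ε+0)",   r"(1|01)*0?",      lambda s: "00" not in s),
]

for slide_expr, pattern, prop in cases:
    print(slide_expr, mismatches(pattern, prop) or "no mismatches up to length 10")
```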



Equivalence of FA's and Regular Expressions
• We have already shown that DFA's, NFA's, and ε-NFA's are all equivalent.
• To show that FA's are equivalent to regular expressions, we need to establish that
1. For every DFA A we can construct a regular expression R, s.t. L(R) = L(A).
2. For every regular expression R there is an ε-NFA A (and hence a DFA A), s.t. L(A) = L(R).



From DFA's to Regular Expressions
Theorem 3.4: For every DFA A = (Q, Σ, δ, q0, F) there is a regular expression R, s.t. L(R) = L(A).
Proof:
• Let the states of A be {1, 2, ..., n}, with 1 being the start state.
• Let $R_{ij}^{(k)}$ be a regular expression describing the set of labels (strings) of all paths in A from state i to state j whose intermediate states all lie in {1, 2, ..., k}.
– Note that the beginning and end points of a path are not "intermediate", so there is no constraint that i and/or j be less than or equal to k.



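$R_{ij}^{(k)}$ Definition - Basis
Basis: k = 0, i.e., no intermediate states. The only such paths are single arcs and, when i = j, the path of length 0.

Case 1: i ≠ j. $R_{ij}^{(0)} = a_1 + a_2 + \cdots + a_m$, where $a_1, \dots, a_m$ are all the symbols a with δ(i, a) = j; if there is no such symbol, $R_{ij}^{(0)} = \emptyset$.

Case 2: i = j. $R_{ii}^{(0)} = \varepsilon + a_1 + a_2 + \cdots + a_m$, where $a_1, \dots, a_m$ are all the symbols a with δ(i, a) = i; the ε accounts for the path of length 0.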



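$R_{ij}^{(k)}$ Definition - Induction
Consider a path from state i to state j that goes through no intermediate state numbered higher than k.

Case 1: The path does not go through state k at all. In this case, the label of the path is in the language of $R_{ij}^{(k-1)}$.

Case 2: The path goes through state k at least once. We break the path into pieces:
• the first piece goes from state i to state k without passing through k,
• the last piece goes from k to j without passing through k, and
• all the pieces in the middle go from k to itself, without passing through k.

Combining the two cases:
$R_{ij}^{(k)} = R_{ij}^{(k-1)} + R_{ik}^{(k-1)} \left( R_{kk}^{(k-1)} \right)^{*} R_{kj}^{(k-1)}$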



$R_{ij}^{(k)}$ Definition

• If we construct these expressions in order of increasing superscript, then, since each $R_{ij}^{(k)}$ depends only on expressions with a smaller superscript, all expressions are available when we need them.
• Eventually, we have $R_{ij}^{(n)}$ for all i and j. We may assume that state 1 is the start state, although the accepting states could be any set of the states.
• The regular expression for the language of the automaton is then the sum (union) of all expressions $R_{1j}^{(n)}$ such that state j is an accepting state.
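The recursive construction can be sketched in Python as follows. This is a minimal sketch, not the course's reference implementation: regular expressions are plain strings in the slides' notation, None stands for ∅, "ε" stands for ε, and all helper names (is_group, union, concat, star, dfa_to_regex, to_python_re) are hypothetical. Only the rules for ∅ and ε plus (ε+R)* = R* and (R*)* = R* are applied automatically, so the output is correct but usually much longer than a hand-simplified answer.

```python
import re
from itertools import product

def is_group(r):
    """True if r is a single symbol or one fully parenthesized group."""
    if len(r) == 1:
        return True
    if not (r.startswith("(") and r.endswith(")")):
        return False
    depth = 0
    for i, c in enumerate(r):
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
        if depth == 0 and i < len(r) - 1:
            return False
    return True

def union(r, s):
    """r + s; ∅ (None) is the identity for union. Unions are always parenthesized."""
    if r is None:
        return s
    if s is None or r == s:
        return r
    return f"({r}+{s})"

def concat(*parts):
    """Concatenation; ∅ annihilates it and ε is its identity."""
    if any(p is None for p in parts):
        return None
    return "".join(p for p in parts if p != "ε") or "ε"

def star(r):
    """r*, simplified with ∅* = ε* = ε, (ε+R)* = R*, and (R*)* = R*."""
    if r is None or r == "ε":
        return "ε"
    if r.startswith("(ε+") and is_group(r):
        return star(r[3:-1])
    if r.endswith("*") and is_group(r[:-1]):
        return r
    return r + "*" if is_group(r) else f"({r})*"

def dfa_to_regex(n, delta, accepting):
    """States are 1..n with 1 as the start state; delta maps (state, symbol) -> state."""
    states = range(1, n + 1)
    # Basis (k = 0): the symbols on direct arcs i -> j, plus ε when i == j.
    R = {}
    for i in states:
        for j in states:
            r = "ε" if i == j else None
            for (p, a), q in sorted(delta.items()):
                if p == i and q == j:
                    r = union(r, a)
            R[i, j] = r
    # Induction: R_ij^(k) = R_ij^(k-1) + R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1)
    for k in states:
        R = {(i, j): union(R[i, j], concat(R[i, k], star(R[k, k]), R[k, j]))
             for i in states for j in states}
    result = None
    for j in accepting:
        result = union(result, R[1, j])
    return result

def to_python_re(r):
    """Slide notation -> Python re: '+' -> '|', ε -> empty; ∅ -> a pattern that never matches."""
    return "(?!)" if r is None else r.replace("+", "|").replace("ε", "")

# Assumed example DFA: state 1 loops on 1 and moves to 2 on 0; state 2 loops on 0 and 1.
delta = {(1, "1"): 1, (1, "0"): 2, (2, "0"): 2, (2, "1"): 2}
regex = dfa_to_regex(2, delta, accepting=[2])
print(regex)

# Brute-force check: the result should accept exactly the strings containing a 0.
pattern = re.compile(to_python_re(regex))
ok = all(bool(pattern.fullmatch("".join(t))) == ("0" in t)
         for n in range(9) for t in product("01", repeat=n))
print(ok)  # True
```

The DFA in the usage code is an assumption, since the example's figure is not reproduced in the text; it is chosen so that its language (strings with at least one 0) is the language of $R_{12}^{(2)} = 1^{*}0(0+1)^{*}$ from the example.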



Example



Example $R_{ij}^{(1)}$



Example $R_{ij}^{(2)}$

The final regular expression equivalent to the DFA is constructed by taking the union of all the expressions where the first state is the start state and the second state is accepting.
With 1 as the start state and 2 as the only accepting state, we need only the expression $R_{12}^{(2)}$:
$R_{12}^{(2)} = 1^{*}0(0+1)^{*}$



Some Simplification Rules
(ε+R)* = R*

∅R = R∅ = ∅   (∅ is an annihilator for concatenation.)

∅+R = R+∅ = R   (∅ is the identity for union.)



Converting DFA's to Regular Expressions
by Eliminating States
• The previous method is expensive, since we have to construct about $n^3$ expressions.
• There is a more efficient way to convert DFA's to regular expressions: eliminating states.
• When we eliminate a state s, all the paths that went through s no longer exist in the automaton.
– If the language of the automaton is not to change, then for every pair of states q and p we must add, to the arc that goes directly from q to p, the labels of all paths that went from q to p through s.
– Since the label of this arc may now involve strings rather than single symbols, and there may even be an infinite number of such strings, we cannot simply list the strings as a label. Regular expressions are a finite way to represent all such strings.
– Thus, the automata will have regular expressions as labels.
– The language of the automaton is the union, over all paths from the start state to an accepting state, of the language formed by concatenating the languages of the regular expressions along that path.
Converting DFA's to Regular Expressions
by Eliminating States


(Figure: eliminating the state s; the edges are labeled with regular expressions instead of symbols.)
Converting DFA's to Regular Expressions
by Eliminating States
To construct a RegExp from a DFA

1. For each accepting state q, apply the above reduction process to produce an equivalent automaton with regular-expression labels on the arcs. Eliminate all states except q and the start state q0.

2. If q ≠ q0, a two-state automaton will be created (CASE 1).

3. If q = q0, a single-state automaton will be created (CASE 2).

4. The desired regular expression is the sum (union) of all the expressions derived from the reduced automata for each accepting state, by rules (2) and (3).
Converting DFA's to Regular Expressions
by Eliminating States
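CASE 1: If q ≠ q0, a two-state automaton will be created. Let R label the loop on q0, S the arc from q0 to q, T the arc from q back to q0, and U the loop on q (a missing arc has label ∅).

It accepts the regular expression:

(R+SU*T)*SU*

CASE 2: If q = q0, a single-state automaton will be created, with a loop labeled R on q0.

It accepts the regular expression:

R*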
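The state-elimination procedure, including the CASE 1 and CASE 2 formulas, can be sketched in Python. This is a minimal sketch assuming arc labels are regex strings in the slides' notation, stored in a dict keyed by (from, to), where a missing key means ∅ and "ε" means ε; all function names are hypothetical. Because it only looks at arc labels, the same code also works for NFAs.

```python
import re
from itertools import product

def is_group(r):
    """True if r is a single symbol or one fully parenthesized group."""
    if len(r) == 1:
        return True
    if not (r.startswith("(") and r.endswith(")")):
        return False
    depth = 0
    for i, c in enumerate(r):
        depth += (c == "(") - (c == ")")
        if depth == 0 and i < len(r) - 1:
            return False
    return True

def union(r, s):                  # ∅ (None) is the identity for union
    if r is None:
        return s
    if s is None or r == s:
        return r
    return f"({r}+{s})"

def concat(*parts):               # ∅ annihilates, ε is the identity
    if any(p is None for p in parts):
        return None
    return "".join(p for p in parts if p != "ε") or "ε"

def star(r):                      # ∅* = ε* = ε
    if r is None or r == "ε":
        return "ε"
    return r + "*" if is_group(r) else f"({r})*"

def eliminate(arcs, states, s):
    """Remove state s: each path p -> s -> q is added to the label of arc p -> q."""
    loop = star(arcs.get((s, s)))
    rest = [p for p in states if p != s]
    new = {}
    for p in rest:
        for q in rest:
            label = union(arcs.get((p, q)), concat(arcs.get((p, s)), loop, arcs.get((s, q))))
            if label is not None:
                new[p, q] = label
    return new, rest

def to_regex(arcs, states, start, accepting):
    result = None
    for q in accepting:
        a, keep = arcs, list(states)
        for s in states:                     # eliminate all states except start and q
            if s not in (start, q):
                a, keep = eliminate(a, keep, s)
        R = a.get((start, start))
        if q == start:
            part = star(R)                   # CASE 2: R*
        else:                                # CASE 1: (R+SU*T)*SU*
            S, U, T = a.get((start, q)), a.get((q, q)), a.get((q, start))
            part = concat(star(union(R, concat(S, star(U), T))), S, star(U))
        result = union(result, part)
    return result

# The example NFA below, with its symbols already replaced by regular expressions.
arcs = {("A", "A"): "(0+1)", ("A", "B"): "1", ("B", "C"): "(0+1)", ("C", "D"): "(0+1)"}
regex = to_regex(arcs, ["A", "B", "C", "D"], "A", ["C", "D"])
print(regex)  # ((0+1)*1(0+1)+(0+1)*1(0+1)(0+1))

# Check: a 1 in the second or third position from the end.
pattern = re.compile(regex.replace("+", "|"))
ok = all(bool(pattern.fullmatch(w)) == ("1" in w[-3:-1])
         for n in range(9) for w in map("".join, product("01", repeat=n)))
print(ok)  # True
```

The arcs of the example NFA are read off from the elimination steps that follow (A loops on 0+1, A goes to B on 1, B to C and C to D on 0+1, with C and D accepting); the result matches the final regular expression obtained there by hand.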



Example
Convert an NFA to a regular expression

• Replace all symbols on arcs with regular expressions



Example

• Eliminate the state B

NewArcAC = ArcAC + ArcAB ArcBB* ArcBC
= ∅ + 1∅*(0+1)
= 1(0+1)   (using ∅* = ε and ∅ + R = R)



Example

• Eliminate the state C

NewArcAD = ArcAD + ArcAC ArcCC* ArcCD
= ∅ + 1(0+1)∅*(0+1)
= 1(0+1)(0+1)



Example

• Eliminate the state D

NewArcAC = ArcAC + ArcAD ArcDD* ArcDC
= 1(0+1) + ∅∅*∅
= 1(0+1)



Example - Result
Accepting state C (states B and D eliminated):
RE = (ArcAA + ArcAC ArcCC* ArcCA)* ArcAC ArcCC*
= ((0+1) + 1(0+1)∅*∅)* 1(0+1)∅*
= (0+1)*1(0+1)

Accepting state D (states B and C eliminated):
RE = (ArcAA + ArcAD ArcDD* ArcDA)* ArcAD ArcDD*
= ((0+1) + 1(0+1)(0+1)∅*∅)* 1(0+1)(0+1)∅*
= (0+1)*1(0+1)(0+1)

Final Reg Exp = (0+1)*1(0+1) + (0+1)*1(0+1)(0+1)



From Regular Expressions to ε-NFA's
Theorem 3.7: For every regex R we can construct an ε-NFA A, s.t. L(A) = L(R).



From Regular Expressions to ε-NFA's – R+S



From Regular Expressions to ε-NFA's – RS



From Regular Expressions to ε-NFA's – R*



Example: Convert (0+1)*1(0+1) to ε-NFA



Example: Convert (0+1)*1(0+1) to ε-NFA
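The basis cases and the three constructions (R+S, RS, R*) can be sketched in Python, assuming a hypothetical tuple encoding of regular expressions: ("empty",), ("eps",), ("sym", a), ("+", R, S), ("cat", R, S), ("*", R). The sketch follows the standard construction in which every piece has one start and one accepting state; state numbers are arbitrary and may differ from the figures. It builds the ε-NFA for (0+1)*1(0+1) and simulates it using ε-closures.

```python
from itertools import count, product

def build(e, fresh, edges):
    """Add ε-NFA pieces for e to edges and return its (start, accept) states.
    An edge is (p, label, q); label None marks an ε-move."""
    kind = e[0]
    if kind == "cat":                       # RS: accept of R --ε--> start of S
        s1, f1 = build(e[1], fresh, edges)
        s2, f2 = build(e[2], fresh, edges)
        edges.append((f1, None, s2))
        return s1, f2
    s, f = next(fresh), next(fresh)         # every other case gets new start/accept states
    if kind == "eps":
        edges.append((s, None, f))
    elif kind == "sym":
        edges.append((s, e[1], f))
    elif kind == "+":                       # R+S: branch into either, merge at f
        for sub in e[1:]:
            si, fi = build(sub, fresh, edges)
            edges += [(s, None, si), (fi, None, f)]
    elif kind == "*":                       # R*: skip, enter, or loop back
        si, fi = build(e[1], fresh, edges)
        edges += [(s, None, si), (s, None, f), (fi, None, si), (fi, None, f)]
    elif kind != "empty":                   # ∅: two states and no edges
        raise ValueError(f"unknown node {e!r}")
    return s, f

def to_enfa(e):
    edges = []
    start, accept = build(e, count(), edges)
    return start, accept, edges

def eclose(states, edges):
    """All states reachable from `states` using only ε-moves."""
    seen, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for x, label, y in edges:
            if x == p and label is None and y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

def accepts(enfa, w):
    start, accept, edges = enfa
    current = eclose({start}, edges)
    for c in w:
        current = eclose({y for x, label, y in edges if x in current and label == c}, edges)
    return accept in current

zero_or_one = ("+", ("sym", "0"), ("sym", "1"))
regex = ("cat", ("cat", ("*", zero_or_one), ("sym", "1")), zero_or_one)   # (0+1)*1(0+1)
enfa = to_enfa(regex)
print(accepts(enfa, "0110"), accepts(enfa, "0101"))   # True False

# The language should be "the second symbol from the end is 1".
ok = all(accepts(enfa, w) == (len(w) >= 2 and w[-2] == "1")
         for n in range(8) for w in map("".join, product("01", repeat=n)))
print(ok)  # True
```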



Algebraic Laws for Languages –
Associativity and Commutativity
• Commutativity is the property of an operator that says we can switch the order of its operands and get the same result.

• Associativity is the property of an operator that allows us to regroup the operands when the operator is applied twice.

Union is commutative: M ∪ N = N ∪ M

Union is associative: (M ∪ N) ∪ R = M ∪ (N ∪ R)

Concatenation is associative: (MN)R = M(NR)
Concatenation is NOT commutative, i.e., there are M and N such that MN ≠ NM (for example, M = {0} and N = {1} give MN = {01} but NM = {10}).



Algebraic Laws for Languages –
Identities and Annihilators
• An identity for an operator is a value such that when the operator is applied to the identity and some other value, the result is the other value.

• An annihilator for an operator is a value such that when the operator is applied to the annihilator and some other value, the result is the annihilator.

∅ is the identity for union: ∅ ∪ N = N ∪ ∅ = N

{ε} is the left and right identity for concatenation: {ε}N = N{ε} = N

∅ is the left and right annihilator for concatenation: ∅N = N∅ = ∅



Algebraic Laws for Languages –
Distributive and Idempotent
• A distributive law involves two operators, and asserts that one operator can be pushed down to be applied to each argument of the other operator individually.

Concatenation is left and right distributive over union:

R(M ∪ N) = RM ∪ RN
(M ∪ N)R = MR ∪ NR

• An operator is said to be idempotent if the result of applying it to two copies of the same value is that value.

Union is idempotent: M ∪ M = M
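The laws above can be spot-checked on small finite languages. The sketch below represents languages as Python sets of strings (with "" as ε) and uses a hypothetical concat helper; the sample languages are arbitrary choices. Passing such checks on a few samples illustrates the laws but does not prove them.

```python
from itertools import product

def concat(L, M):
    """LM = {wx | w in L, x in M}; languages are sets of strings, "" is ε."""
    return {w + x for w in L for x in M}

EMPTY, EPS = set(), {""}
# Arbitrary small sample languages for the spot check.
samples = [EMPTY, EPS, {"0"}, {"1", "01"}, {"", "10", "11"}]

for M, N, R in product(samples, repeat=3):
    assert M | N == N | M                                       # union is commutative
    assert (M | N) | R == M | (N | R)                           # union is associative
    assert concat(concat(M, N), R) == concat(M, concat(N, R))   # concatenation is associative
    assert EMPTY | N == N | EMPTY == N                          # ∅ is the identity for union
    assert concat(EPS, N) == concat(N, EPS) == N                # {ε} is the identity for concatenation
    assert concat(EMPTY, N) == concat(N, EMPTY) == EMPTY        # ∅ annihilates concatenation
    assert concat(R, M | N) == concat(R, M) | concat(R, N)      # left distributivity
    assert concat(M | N, R) == concat(M, R) | concat(N, R)      # right distributivity
    assert M | M == M                                           # union is idempotent
print("all sampled laws hold")

# Concatenation is not commutative:
print(concat({"0"}, {"1"}), concat({"1"}, {"0"}))   # {'01'} {'10'}
```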



Algebraic Laws for Languages –
Closure Laws
Languages                  Regular Expressions
∅* = {ε}                   ∅* = ε
{ε}* = {ε}                 ε* = ε
L+ = LL* = L*L             R+ = RR* = R*R
L* = L+ ∪ {ε}              R* = R+ + ε
L? = L ∪ {ε}               R? = R + ε
(L*)* = L*                 (R*)* = R*



Algebraic Laws for Languages
Theorem: (L*)* = L*
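The proof of this theorem is not spelled out in the text; one standard argument, by mutual inclusion, goes as follows.

```latex
\begin{align*}
&(\subseteq)\quad w \in (L^*)^* \implies w = w_1 w_2 \cdots w_k \text{ for some } k \ge 0,\ w_i \in L^* \\
&\qquad\implies \text{each } w_i = x_{i1} x_{i2} \cdots x_{i m_i} \text{ with every } x_{ij} \in L \\
&\qquad\implies w = x_{11} \cdots x_{1 m_1} \cdots x_{k1} \cdots x_{k m_k} \in L^*
  \quad (\text{for } k = 0,\ w = \varepsilon \in L^*) \\
&(\supseteq)\quad L^* = (L^*)^1 \subseteq \bigcup_{i \ge 0} (L^*)^i = (L^*)^*
\end{align*}
```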



Discovering Laws for Regular Expressions
• There is an infinite variety of laws about regular expressions that might
be proposed.

• Is there a general methodology that makes proofs of the correct laws easy? YES.
– This methodology only works for laws built from the regular-expression operators (concatenation, union, and closure).

• Methodology: to test a proposed law Exp1 = Exp2,
– Replace each variable in the law (in both Exp1 and Exp2) with a unique symbol to create concrete regular expressions RE1 and RE2.
– Check whether the languages of RE1 and RE2 are equal, i.e., whether L(RE1) = L(RE2). If they are, the law is true; if not, it is false.
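The methodology can be mechanized for small cases with Python's re module. This is a minimal sketch, assuming the concrete expressions are written in the slides' notation and translated by replacing + with | and ε with the empty string; counterexample is a hypothetical helper that compares the two languages on all strings up to a fixed length. Finding no difference up to that length is evidence, not a proof, so an argument that the languages are equal is still needed.

```python
import re
from itertools import product

def to_python_re(expr):
    # Slide notation -> Python re: '+' means union ('|'), ε is the empty string.
    return expr.replace("+", "|").replace("ε", "")

def counterexample(e1, e2, alphabet="abc", max_len=6):
    """The shortest string in exactly one of L(e1), L(e2), or None if none up to max_len."""
    r1, r2 = re.compile(to_python_re(e1)), re.compile(to_python_re(e2))
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            s = "".join(t)
            if bool(r1.fullmatch(s)) != bool(r2.fullmatch(s)):
                return s
    return None

print(counterexample("a(b+c)", "ab+ac"))     # None: R(M+N) = RM + RN survives the test
print(counterexample("(a+b)*", "(a*b*)*"))   # None: (M+N)* = (M*N*)* survives the test
print(counterexample("(a+b)*", "a*b*"))      # ba
```

The third call shows how a false law, (M+N)* = M*N*, is refuted by the concrete string ba.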
Discovering Laws for Regular Expressions



Discovering Laws for Regular Expressions -
Example
Law: R(M+N) = RM + RN

Replace R with a, M with b, and N with c.

⇒ a(b+c) = ab + ac

Then, check whether L(a(b+c)) is equal to L(ab+ac).

If their languages are equal, the law is TRUE.

Since L(a(b+c)) = {ab, ac} = L(ab+ac),

⇒ R(M+N) = RM + RN is a true law.



Discovering Laws for Regular Expressions -
Example
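Law: (M+N)* = (M*N*)*

Replace M with a, and N with b.

⇒ (a+b)* = (a*b*)*

Then, check whether L((a+b)*) is equal to L((a*b*)*).

Since both L((a+b)*) and L((a*b*)*) are the set of all strings of a's and b's (every such string is a concatenation of single symbols, and each single symbol is in L(a*b*)),

⇒ (M+N)* = (M*N*)* is a true law.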

