Unit I
LANGUAGES:
An alphabet is a finite set of symbols. For example, {0, 1} is an alphabet with two symbols, {a, b}
is another alphabet with two symbols, and the English alphabet is also an alphabet.
A string is a finite sequence of symbols of an alphabet. b, a and aabab are examples of strings over
the alphabet {a, b}, and 0, 10 and 001 are examples of strings over the alphabet {0, 1}.
A language is a set of strings over an alphabet. Thus {a, ab, baa} is a language (over alphabet
{a,b}) and {0, 111} is a language (over alphabet {0,1}).
The number of symbols in a string is called the length of the string. For a string w its length is
represented by |w|. The empty string (also called the null string), written ε, is the string with
length 0. That is, it has no symbols. Length can also be defined recursively: |ε| = 0, and
|wa| = |w| + 1 for any string w and any symbol a.
Let u and v be strings. Then uv denotes the string obtained by concatenating u with v, that is, uv is
the string obtained by appending the sequence of symbols of v to that of u. For example
if u = aab and v = bbab, then uv = aabbbab. Note that vu = bbabaab ≠ uv.
A string x is called a substring of another string y if there are strings u and v such that y = uxv.
Note that u and v may be empty, so a string is a substring of itself. A string x is a prefix of
another string y if there is a string v such that y = xv. In that case v is called a suffix of y.
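The definitions above map directly onto ordinary string operations. The following is a small illustrative sketch in Python; the helper names is_prefix, is_suffix and is_substring are made up for this example and are not part of the notes.

```python
# Strings over the alphabet {a, b}, taken from the concatenation example above.
u = "aab"
v = "bbab"

uv = u + v   # concatenation of u with v
vu = v + u   # concatenation of v with u

def is_prefix(x: str, y: str) -> bool:
    """x is a prefix of y if y = xv for some string v."""
    return y.startswith(x)

def is_suffix(x: str, y: str) -> bool:
    """x is a suffix of y if y = ux for some string u."""
    return y.endswith(x)

def is_substring(x: str, y: str) -> bool:
    """x is a substring of y if y = uxv for some strings u and v."""
    return x in y

empty = ""   # the empty string has length 0

print(uv, vu, uv != vu)            # aabbbab bbabaab True
print(len(uv) == len(u) + len(v))  # True
```

Because u and v may be empty, is_substring(uv, uv) and is_substring(empty, uv) both hold.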
The empty set ∅ is a language which has no strings. The set {ε} is a language which
has one string, namely ε. Though ε has no symbols, this set has an object in it. So it is
not empty. For any alphabet ∑, the set of all strings over ∑ (including the empty
string) is denoted by ∑*. Thus a language over alphabet ∑ is a subset of ∑*.
OPERATIONS ON LANGUAGES:
Union
Intersection
Difference
Concatenation
Kleene * closure
Since languages are sets, all the set operations can be applied to languages. Thus the
union, intersection and difference of two languages over an alphabet ∑ are
languages over ∑. The complement of a language L over an alphabet ∑ is ∑* - L and
it is also a language.
Another operation on languages is concatenation. Let L1 and L2 be languages. Then
the concatenation of L1 with L2 is denoted as L1L2 and it is defined as
L1L2 = { uv | u ∈ L1 and v ∈ L2 }. That is, L1L2 is the set of strings obtained by
concatenating strings of L1 with those of L2.
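For finite languages these operations can be tried out directly with Python sets. This is only a sketch: the two sample languages and the function name concat are assumptions chosen for illustration.

```python
# Two small sample languages over {a, b} (hypothetical, for illustration only).
L1 = {"a", "ab"}
L2 = {"b", "ba"}

def concat(A: set, B: set) -> set:
    """Return AB = { uv | u in A and v in B }."""
    return {u + v for u in A for v in B}

union = L1 | L2          # strings in L1 or in L2
intersection = L1 & L2   # strings in both
difference = L1 - L2     # strings in L1 but not in L2
L1L2 = concat(L1, L2)

print(sorted(L1L2))      # ['ab', 'aba', 'abb', 'abba']
```

Concatenating with the language containing only the empty string leaves a language unchanged, while concatenating with the empty set gives the empty set.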
Regular languages are languages that can be generated from one-element languages
by applying certain standard operations a finite number of times. They are the
languages that can be recognized by finite automata. These simple operations include
concatenation, union and Kleene closure. By the use of these operations, regular
languages can be represented by an explicit formula.
Regular expressions can be thought of as the algebraic description of a regular
language. Regular expressions can be defined by the following rules:
1. Every letter of the alphabet ∑ is a regular expression.
2. The null string ε and the empty set ∅ are regular expressions.
3. If r1 and r2 are regular expressions, then
(i) r1 + r2 ( union of r1 and r2 )
(ii) r1r2 ( concatenation of r1 and r2 )
(iii) r1*, r2* ( Kleene closure of r1 and r2 )
are also regular expressions.
4. If a string can be derived from rules 1, 2 and 3, then it is also a regular
expression.
Note that a* means zero or more occurrences of a in the string, while a+ means one
or more occurrences of a in the string. That is, a* denotes the language L = {ε, a, aa,
aaa, ...} and a+ denotes the language L = {a, aa, aaa, ...}. Note also that there can
be more than one regular expression for a given set of strings.
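The difference between a* and a+ can be seen by listing the first few strings of each language. The sketch below uses Python; the helper names star and plus are assumptions for this example, and the lists are only finite slices of the infinite languages. Python's re module uses the same postfix * and + for closure, but it writes the union of two expressions as | rather than +.

```python
import re

def star(symbol: str, max_len: int) -> list:
    """Strings of symbol* with at most max_len symbols (a finite slice)."""
    return [symbol * k for k in range(max_len + 1)]

def plus(symbol: str, max_len: int) -> list:
    """Strings of symbol+ with at most max_len symbols (a finite slice)."""
    return [symbol * k for k in range(1, max_len + 1)]

print(star("a", 3))   # ['', 'a', 'aa', 'aaa']
print(plus("a", 3))   # ['a', 'aa', 'aaa']

# The empty string belongs to a* but not to a+.
print(re.fullmatch(r"a*", "") is not None)   # True
print(re.fullmatch(r"a+", "") is not None)   # False
```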
Example 1:
Write the regular expression for the language accepting all combinations of a's, over
the set ∑ = {a}
Solution:
All combinations of a's means that a may occur zero times, once, twice and so on. If a
appears zero times, that means the null string. That is, we expect the set {ε, a, aa,
aaa, ....}. So we give a regular expression for this as:
R = a*
That is, the Kleene closure of a.
Example 2:
Write the regular expression for the language accepting all combinations of a's except
the null string, over the set ∑ = {a}
Solution:
The regular expression has to be built for the language
L = {a, aa, aaa, ....}
This set indicates that there is no null string. So we can denote regular expression as:
R = a+
Example 3:
Write the regular expression for the language accepting all strings containing any
number of a's and b's.
Solution:
The regular expression will be:
R = (a + b)*
This gives the set L = {ε, a, aa, b, bb, ab, ba, aba, bab, .....}, that is, any combination of
a and b.
The expression (a + b)* denotes any combination of a and b, including the null string.
Example 4:
Write the regular expression for the language accepting all strings that
start with 1 and end with 0, over ∑ = {0, 1}.
Solution:
In every string of the language, the first symbol should be 1 and the last symbol should be 0.
The r.e. is as follows:
R = 1 (0+1)* 0
Example 5:
Write the regular expression for the language starting and ending with a and
having any combination of b's in between.
Solution:
The regular expression will be:
R = a b* a
Example 6:
Write the regular expression for the language starting with a but not having
consecutive b's.
Solution: The regular expression has to be built for the language:
R = (00)*
Example 9:
Write the regular expression for the language in which every string has
at least one 0 and at least one 1.
Solution:
The regular expression will be:
R = [(0 + 1)* 0 (0 + 1)* 1 (0 + 1)*] + [(0 + 1)* 1 (0 + 1)* 0 (0 + 1)*]
Example 10:
Describe the language denoted by following regular expression
R = (1* 0*)
Example 12:
Write the regular expression for the language containing the strings over {0, 1} in
which there are at least two occurrences of 1's between any two occurrences of 0's.
Solution: At least two 1's between two occurrences of 0's can be denoted by
(0111*0)*.
Similarly, if there is no occurrence of 0's, then any number of 1's are also allowed.
Hence the r.e. for required language is:
R = (1 + (0111*0))*
Example 13:
Write the regular expression for the language containing the string in which every 0 is
immediately followed by 11.
Solution:
The regular expression will be:
R = (011 + 1)*
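A regular expression written for a language can be sanity-checked by brute force: enumerate every string up to some length and compare the expression against a direct test of the stated condition. The sketch below does this in Python for Examples 4, 9 and 13. The function names all_strings and agrees, and the length bound of 8, are assumptions of this sketch. The textbook union + is written | in Python's re module. Agreement up to a bounded length is evidence, not a proof.

```python
import re
from itertools import product

def all_strings(alphabet: str, max_len: int):
    """Yield every string over the alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)

def agrees(pattern: str, predicate, alphabet: str, max_len: int = 8) -> bool:
    """True if regex and predicate accept the same strings up to max_len."""
    compiled = re.compile(pattern)
    return all(
        (compiled.fullmatch(w) is not None) == predicate(w)
        for w in all_strings(alphabet, max_len)
    )

# Example 4: starts with 1 and ends with 0.  1(0+1)*0 becomes 1(0|1)*0.
ex4 = agrees(r"1(0|1)*0",
             lambda w: len(w) >= 2 and w[0] == "1" and w[-1] == "0", "01")

# Example 9: at least one 0 and at least one 1.
ex9 = agrees(r"(0|1)*0(0|1)*1(0|1)*|(0|1)*1(0|1)*0(0|1)*",
             lambda w: "0" in w and "1" in w, "01")

# Example 13: every 0 is immediately followed by 11.
def every_zero_followed_by_11(w: str) -> bool:
    return all(w[i + 1:i + 3] == "11" for i, c in enumerate(w) if c == "0")

ex13 = agrees(r"(011|1)*", every_zero_followed_by_11, "01")

print(ex4, ex9, ex13)   # True True True
```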
The order of precedence of the operations is: Kleene closure > concatenation > union. This
rule allows us to reduce the use of parentheses while writing a regular expression.
For example, a + b*c is the simplified form of (a + ((b)*c)). Note that (a + b)* is not
the same as a + b*; a + b* is (a + (b)*).
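Python's re module follows the same precedence order, so the effect of the parentheses can be observed directly. In this sketch the textbook union + is written |, and the list of test strings is an arbitrary choice.

```python
import re

grouped = re.compile(r"(a|b)*")   # (a + b)*
ungrouped = re.compile(r"a|b*")   # a + b*, parsed as a + (b*)

# For each string, show whether (a + b)* and a + b* accept it.
for w in ["", "a", "bbb", "ab", "aa"]:
    in_grouped = grouped.fullmatch(w) is not None
    in_ungrouped = ungrouped.fullmatch(w) is not None
    print(repr(w), in_grouped, in_ungrouped)
```

The strings ab and aa are accepted by (a + b)* but not by a + b*, because a + b* allows either a single a or a run of b's.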
Finite Automata
Finite automata are used to recognize patterns.
A finite automaton takes a string of symbols as input and changes its state accordingly. Each
time an input symbol is read, a transition occurs.
On a transition, the automaton can either move to another state or stay in the
same state.
A finite automaton gives one of two outcomes for an input string: accept or reject. When the
input string has been processed completely and the automaton is in a final state, the string is
accepted; otherwise it is rejected.
It has a set of states and rules for moving from one state to another, and the move taken depends
on the input symbol applied. Based on the states and the set of rules, the input string
can be either accepted or rejected.
Basically, it is an abstract model of a digital computer which reads an input string and
changes its internal state depending on the current input symbol. Every automaton
defines a language, i.e. the set of strings it accepts. The following figure shows some
essential features of a general automaton.
Formal Definition of FA
A finite automaton is a 5-tuple (Q, ∑, δ, q0, F), where:
Q : finite set of states
∑ : finite set of input symbols
q0 : initial state, q0 ∈ Q
F : set of final states, F ⊆ Q
δ : transition function
In a DFA, for a particular input character, the machine goes to one state only.
The transition function is defined on every state for every input symbol. Also, in a DFA a
null (or ε) move is not allowed, i.e., a DFA cannot change state without any input
character.
For example, construct a DFA which accepts the language of all strings ending with 'a'.
Given: ∑ = {a, b}, Q = {q0, q1}, initial state q0, F = {q1}
First, consider a set of acceptable strings in order to
construct an accurate state transition diagram.
L = {a, aa, aaa, aaaa, aaaaa, ba, bba, bbbaa, aba, abba, aaba, abaa}
The above is only a small subset of the acceptable strings; there are many other strings
which end with 'a' and contain symbols from {a, b}.
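A DFA of this kind can be simulated with a transition table and a loop over the input. The sketch below is one possible design consistent with Q = {q0, q1} and F = {q1}: q1 means "the last symbol read was a". The transition table itself is an assumption, since the state diagram is not reproduced in these notes.

```python
# Assumed transition function: (state, symbol) -> next state.
DELTA = {
    ("q0", "a"): "q1",
    ("q0", "b"): "q0",
    ("q1", "a"): "q1",
    ("q1", "b"): "q0",
}
START = "q0"
FINAL = {"q1"}

def accepts(w: str) -> bool:
    """Run the DFA on w and report whether it stops in a final state."""
    state = START
    for symbol in w:
        state = DELTA[(state, symbol)]   # exactly one next state per symbol
    return state in FINAL

samples = ["a", "aa", "ba", "bba", "bbbaa", "aba", "abba", "aaba", "abaa"]
print(all(accepts(w) for w in samples))   # True
print(accepts("ab"), accepts(""))         # False False
```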
1. Create a single start state for the automaton, and mark it as the initial state.
2. For each character in the regular expression, create a new state and add an edge between
the previous state and the new state, with the character as the label.
3. For each operator in the regular expression (such as “*” for zero or more, “+” for one or
more, and “?” for zero or one), create new states and add the appropriate edges to
represent the operator.
4. Mark the final state as the accepting state, which is the state that is reached when the
regular expression is fully matched.
Rules for construction of ε-NFA :
ε-NFA for a+ :
This structure is for a+, which means there must be at least one 'a' in the expression. It is
preceded by an epsilon move and also succeeded by one. There is an epsilon feedback from state q2 to q1 so
that there can be more than one 'a' in the expression.
ε-NFA for a* :
This structure is for a*, which means there can be any number of 'a' in the expression, even 0.
The previous structure is modified slightly so that even if there is no input symbol, i.e. if the
input is the null string, the expression is still valid.
ε-NFA for a + b :
This structure accepts either a or b as input. So there are two paths, both of which lead to the
final state.
ε-NFA for ab :
For concatenation, a must be followed by b. Only then can it reach the final state. Both structures
are allowed here, but as it is an ε-NFA the second structure is recommended.
L = (0+1)*(00 + 11) can be divided into two parts: (0+1)* and (00 + 11). Since they are
concatenated, the two parts will be linearly connected to each other.
The first part can be drawn using the third rule and the second rule. (0+1) is easy to draw
following the third rule and, considering (0+1) as one unit, (0+1)* can also be drawn by applying the
second rule. The first part is as follows.
The final ε-NFA will be : Connecting the two structures linearly gives us our final ε-NFA.
L = b + ba* has two terms. The first term is fairly easy to construct. Since both terms are
connected by the '+' sign, there will be two paths coming out of the first node. The second term is
drawn following the second rule of construction: a*, simply preceded by b. The final
ε-NFA will be :
However, the above features do not add any power to the NFA. In terms of power, NFA and
DFA are equivalent.
Because of these additional features, the NFA has a different transition function; the rest
is the same as for the DFA.
Transition Function
δ : Q × (∑ ∪ {ε}) → 2^Q
As the transition function shows, for any input including null (or ε), an NFA
can go to any number of states. For example, below is an NFA for the above
problem.
One important thing to note is, in NFA, if any path for an input string leads to a
final state, then the input string is accepted. For example, in the above NFA, there
are multiple paths for the input string “00”. Since one of the paths leads to a final
state, “00” is accepted by the above NFA.
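The "any path" acceptance rule can be simulated by tracking the set of all states the NFA could be in after each symbol. The NFA below is a hypothetical one chosen for this sketch (strings over {0, 1} that end with 0), not the NFA from the figure, and it has no ε-moves. The input "00" has two paths, q0 q0 q0 and q0 q0 q1, and is accepted because the second ends in the final state.

```python
# Hypothetical NFA: (state, symbol) -> SET of next states.
# A missing key means there is no move on that symbol.
DELTA = {
    ("q0", "0"): {"q0", "q1"},
    ("q0", "1"): {"q0"},
}
START = "q0"
FINAL = {"q1"}

def accepts(w: str) -> bool:
    """Accept if at least one path for w ends in a final state."""
    current = {START}   # every state reachable on the input read so far
    for symbol in w:
        current = {t for s in current for t in DELTA.get((s, symbol), set())}
    return bool(current & FINAL)

print(accepts("00"), accepts("01"))   # True False
```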
Every DFA is NFA but not vice-versa. Yet there is a way to convert an NFA to
DFA, so there exists an equivalent DFA for every NFA.
1. Both NFA and DFA have the same power and each NFA can be
translated into a DFA.
2. There can be multiple final states in both DFA and NFA.
3. NFA is more of a theoretical concept.
4. DFA is used in Lexical Analysis in Compiler.
5. If the number of states in the NFA is N, then its DFA can have at most
2^N states.
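The conversion mentioned in points 1 and 5 is usually done by the subset construction, where each DFA state is a set of NFA states. The following Python sketch handles an NFA without ε-moves. The function names and the sample NFA (strings ending with 01) are assumptions made for illustration.

```python
from collections import deque

def nfa_to_dfa(delta, start, finals, alphabet):
    """Subset construction for an NFA without epsilon-moves."""
    start_set = frozenset({start})
    dfa_delta = {}
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for a in alphabet:
            # Union of the NFA moves from every state in the current subset.
            target = frozenset(
                t for s in current for t in delta.get((s, a), set())
            )
            dfa_delta[(current, a)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # A subset is final if it contains at least one final NFA state.
    dfa_finals = {S for S in seen if S & finals}
    return seen, dfa_delta, start_set, dfa_finals

# Hypothetical NFA over {0, 1} accepting strings that end with 01.
nfa_delta = {
    ("q0", "0"): {"q0", "q1"},
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}
states, dfa_delta, dfa_start, dfa_finals = nfa_to_dfa(
    nfa_delta, "q0", {"q2"}, "01"
)

def dfa_accepts(w: str) -> bool:
    state = dfa_start
    for a in w:
        state = dfa_delta[(state, a)]
    return state in dfa_finals

print(len(states))   # 3
```

Only the subsets reachable from the start state are built, so this 3-state NFA yields 3 DFA states rather than the upper bound of 2^3 = 8.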
Design an NFA with ∑ = {0, 1} that accepts all strings ending with 01.
Design an NFA with ∑ = {0, 1} in which double '1' is followed by double '0'.
Design an NFA with ∑ = {0, 1} that accepts all strings in which the third symbol from the right
end is always 0.
Design a DFA with ∑ = {0, 1} that accepts those strings which start with 1 and end with 0.
Design a DFA with ∑ = {0, 1} that accepts strings with an even number of 0's and an even number of 1's.
This FA will consider four different stages for input 0 and input 1. The stages could be:
Here q0 is a start state and the final state also. Note carefully that a symmetry of 0's and
1's is maintained. We can associate meanings to each state as:
Design a DFA with ∑ = {0, 1} that accepts the set of all strings with three consecutive 0's.
Design a DFA L(M) = {w | w ∈ {0, 1}*} where w is a string that does not contain consecutive 1's.
Design an FA with ∑ = {0, 1} that accepts the strings with an even number of 0's followed by a single 1.
For the given transition diagram we will first construct the transition table.
State    0           1
→q0      q0          q1
q1       {q1, q2}    q1
For the given transition diagram we will first construct the transition table.
State 0 1
δ'([q1], 0) = ϕ
δ'([q1], 1) = [q0, q1]
Steps:
1. Find all the ε-transitions from each state in Q. The set of states reached this way from
qi is called ε-closure{qi}, where qi ∈ Q.
2. Then the δ' transitions can be obtained. A δ' transition means taking the ε-closure of the δ
moves.
3. Repeat Step 2 for each input symbol and each state of the given NFA.
4. Using the resultant states, the transition table for the equivalent NFA without ε can
be built.
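The steps above can be sketched in Python as follows. Representing an ε-move by the empty string, the function names, and the sample ε-NFA (q0 goes to q1 on a, q1 goes to q2 on ε, q2 loops on b, q2 is final) are all assumptions of this sketch.

```python
EPS = ""   # assumption: the empty string labels an epsilon-move

def eps_closure(delta, states):
    """All states reachable from `states` using only epsilon-moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def remove_epsilon(delta, all_states, finals, alphabet):
    """Build delta' and the final states of an NFA without epsilon-moves."""
    new_delta = {}
    for q in all_states:
        closure_q = eps_closure(delta, {q})
        for a in alphabet:
            moved = {t for s in closure_q for t in delta.get((s, a), set())}
            new_delta[(q, a)] = eps_closure(delta, moved)
    # A state becomes final if its epsilon-closure contains a final state.
    new_finals = {q for q in all_states if eps_closure(delta, {q}) & finals}
    return new_delta, new_finals

delta = {("q0", "a"): {"q1"}, ("q1", EPS): {"q2"}, ("q2", "b"): {"q2"}}
new_delta, new_finals = remove_epsilon(delta, ["q0", "q1", "q2"], {"q2"}, "ab")

print(sorted(new_delta[("q0", "a")]))   # ['q1', 'q2']
print(sorted(new_finals))               # ['q1', 'q2']
```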
States a b
*q1 Ф {q2}
*q2 Ф {q2}
State q1 and q2 become the final state as ε-closure of q1 and q2 contain the final state q2.
The NFA can be shown by the following transition diagram:
Step 2: Find the states that can be reached from the present state for each input symbol. That means
taking the union of the transition values and their closures for each NFA state present in the current state
of the DFA.
Step 3: If we found a new state, take it as current state and repeat step 2.
Step 4: Repeat Step 2 and Step 3 until there is no new state present in the transition table of
DFA.
Step 5: Mark the states of DFA as a final state which contains the final state of NFA.
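Steps 2 to 5 can be sketched as a subset construction that applies the ε-closure after every move. In the Python sketch below the DFA start state is taken to be the ε-closure of the NFA start state, which is an assumption here. The sample ε-NFA is also hypothetical, chosen so that ε-closure(q0) = {q0, q1, q2}.

```python
from collections import deque

EPS = ""   # assumption: the empty string labels an epsilon-move

def eps_closure(delta, states):
    """States reachable from `states` by epsilon-moves only."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def eps_nfa_to_dfa(delta, start, finals, alphabet):
    start_set = eps_closure(delta, {start})
    dfa_delta, seen, queue = {}, {start_set}, deque([start_set])
    while queue:                       # Steps 3 and 4: repeat for new states
        current = queue.popleft()
        for a in alphabet:             # Step 2: moves, then their closures
            moved = {t for s in current for t in delta.get((s, a), set())}
            target = eps_closure(delta, moved)
            dfa_delta[(current, a)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    dfa_finals = {S for S in seen if S & finals}   # Step 5
    return seen, dfa_delta, start_set, dfa_finals

# Hypothetical epsilon-NFA for 0*1*2* with final state q2.
delta = {
    ("q0", "0"): {"q0"}, ("q0", EPS): {"q1"},
    ("q1", "1"): {"q1"}, ("q1", EPS): {"q2"},
    ("q2", "2"): {"q2"},
}
states, dfa_delta, A, dfa_finals = eps_nfa_to_dfa(delta, "q0", {"q2"}, "012")
B = dfa_delta[(A, "1")]
C = dfa_delta[(A, "2")]
print(sorted(A), sorted(B), sorted(C))
# ['q0', 'q1', 'q2'] ['q1', 'q2'] ['q2']
```

The empty set also appears as a DFA state; it acts as the dead state reached when no move is possible.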
Solution:
For state B:
δ'(B, 0) = ε-closure {δ(q3, 0) }
=ϕ
δ'(B, 1) = ε-closure {δ(q3, 1) }
= ε-closure {q4}
= {q4} i.e. state C
For state C:
Now we will obtain δ' transition. Let ε-closure(q0) = {q0, q1, q2} call it as state A.
δ'(A, 0) = A
δ'(A, 1) = B
δ'(A, 2) = C
Now we will find the transitions on states B and C for each input.
Let P and Q be two regular expressions. If P does not contain the null string, then R = Q + RP has
a unique solution, namely R = QP*
Proof −
R = Q + (Q + RP)P [After putting the value R = Q + RP]
= Q + QP + RPP
When we put the value of R recursively again and again, we get the following equation −
R = Q + QP + QP^2 + QP^3 + ...
R = Q (ε + P + P^2 + P^3 + ...)
R = QP* [As P* represents (ε + P + P^2 + P^3 + ...)]
Hence, proved.
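The claim that R = QP* satisfies R = Q + RP can be checked numerically on small examples by treating languages as sets of strings cut off at a maximum length. The Python sketch below uses hypothetical sample languages Q = {b} and P = {a, ab}. The helper names and the length bound are assumptions. The check illustrates the theorem for one case and is not a proof.

```python
def concat(A, B, max_len):
    """AB restricted to strings of length at most max_len."""
    return {u + v for u in A for v in B if len(u + v) <= max_len}

def star(P, max_len):
    """P* restricted to strings of length at most max_len."""
    result = {""}
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more factor from P.
        frontier = concat(frontier, P, max_len) - result
        result |= frontier
    return result

MAX_LEN = 6
Q = {"b"}         # sample language Q
P = {"a", "ab"}   # sample language P; it must not contain the empty string

R = concat(Q, star(P, MAX_LEN), MAX_LEN)   # candidate solution R = QP*
rhs = Q | concat(R, P, MAX_LEN)            # right-hand side Q + RP

print(R == rhs)   # True
```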
The equations for the three states q1, q2, and q3 are as follows −
q3 = q2a
q1 = q1a + q3a + ε
= (a + b(b + ab)*aa)*
Hence, the regular expression is (a + b(b + ab)*aa)*.
q1 = ε0* [As, εR = R]
So, q1 = 0*
q2 = 0*1 + q20
So, q2 = 0*1(0)* [By Arden's theorem]
Pick the start state A. On symbol 'a' the automaton goes to state B,
so we write:
A -> aB
Then we pick state B and write one production for each of its outgoing transitions. The
ε-production is included because B is a final state.
So we get the production below.
B -> aB/bB/ε
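The same procedure can be written as a short Python routine: every transition from X to Y on symbol a gives a production X -> aY, and every final state X gives X -> ε. The DFA below is an assumption reconstructed from the two productions above (A goes to B on a, B loops on a and b, B is final). The function name dfa_to_grammar is made up for this sketch.

```python
def dfa_to_grammar(delta, finals):
    """Return right-linear productions as {nonterminal: [right-hand sides]}."""
    productions = {}
    for (state, symbol), target in delta.items():
        productions.setdefault(state, []).append(symbol + target)
    for state in finals:
        productions.setdefault(state, []).append("ε")
    return productions

# Assumed transitions: (state, symbol) -> next state.
delta = {("A", "a"): "B", ("B", "a"): "B", ("B", "b"): "B"}
grammar = dfa_to_grammar(delta, finals={"B"})

for head, bodies in grammar.items():
    print(head, "->", "/".join(bodies))
# A -> aB
# B -> aB/bB/ε
```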
DFA                                          NFA
The next state is completely determined      The next state is only partially determined
by the current state and current symbol.     by the current state and current input symbol.
The transition function returns only one     The transition function returns zero, one or
state, i.e. δ : Q × ∑ → Q.                   more states, i.e. δ : Q × ∑ → 2^Q.
ii. All strings that don’t contain the substring 110. DEC 2011
Construct a DFA for the language L = { a^n b , n > 0}.
Represent a language over ∑ = {1} having (i) even length of string (ii) odd length of string
(i) Even length of string: R = (11)*
Operator Precedence
Kleene Closure 1
Positive Closure 2
Concatenation 3
Union 4
Construct a DFA for the language over {0, 1}* such that it contains “000” as a substring.