Unit-1 F&CD

Unit-1

1.1 Introduction: Formal Language & Regular Expressions


A formal language is an abstraction of the general characteristics of programming languages. A formal
language consists of a set of symbols and some rules of formation by which these symbols can be combined
into entities called sentences. A formal language is the set of all sentences permitted by the rules of
formation. Although some of the formal languages we study here are simpler than programming languages,
they have many of the same essential features. We can learn a great deal about programming languages
from formal languages. Finally, we will formalize the concept of a mechanical computation by giving a
precise definition of the term algorithm and study the kinds of problems that are (and are not) suitable for
solution by such mechanical means. In the course of our study, we will show the close connection between
these abstractions and investigate the conclusions we can derive from them.
We will look at models that represent features at the core of all computers and their applications. To model
the hardware of a computer, we introduce the notion of an automaton (plural, automata). An automaton is
a construct that possesses all the indispensable features of a digital computer. It accepts input, produces
output, may have some temporary storage, and can make decisions in transforming the input into the output.
1.2 Automata, Related Terminologies
Automata
The term "Automata" is derived from the Greek word "αὐτόματα" which means "self-acting". An
automaton (Automata in plural) is an abstract self-propelled computing device which follows a
predetermined sequence of operations automatically. An automaton with a finite number of states is called
a Finite Automaton (FA) or Finite State Machine (FSM).
Related Terminologies
Alphabet
• Definition − An alphabet is any finite set of symbols.
• Example − ∑ = {a, b, c, d} is an alphabet set where ‘a’, ‘b’, ‘c’, and ‘d’ are symbols.
String
• Definition − A string is a finite sequence of symbols taken from ∑.
• Example − ‘cabcad’ is a valid string on the alphabet set ∑ = {a, b, c, d}
Length of a String
• Definition − It is the number of symbols present in a string. (Denoted by |S|).
• Examples −
o If S = ‘cabcad’, |S|= 6
o If |S|= 0, it is called an empty string (Denoted by λ or ε)
Kleene Star
• Definition − The Kleene star, ∑*, is a unary operator on a set of symbols or strings, ∑, that gives the
infinite set of all possible strings of all possible lengths over ∑ including λ.
• Representation − ∑* = ∑^0 ∪ ∑^1 ∪ ∑^2 ∪ ……. where ∑^p is the set of all possible strings of length p.
• Example − If ∑ = {a, b}, ∑* = {λ, a, b, aa, ab, ba, bb,………..}
Kleene Closure / Plus
• Definition − The set ∑+ is the infinite set of all possible strings of all possible lengths over ∑ excluding
λ.
• Representation − ∑+ = ∑^1 ∪ ∑^2 ∪ ∑^3 ∪ …….
∑+ = ∑* − { λ }
Example − If ∑ = { a, b } , ∑+ = { a, b, aa, ab, ba, bb,………..}
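
To make the ∑* and ∑+ definitions concrete, here is a small illustrative sketch (added to these notes; the function name enumerate_strings is just a made-up helper). It lists every string over ∑ = {a, b} up to a chosen length; letting max_len grow gives ∑*, and dropping the empty string gives ∑+.

from itertools import product

def enumerate_strings(alphabet, max_len):
    # All strings of length 0..max_len over the alphabet (a finite slice of Sigma*)
    for length in range(max_len + 1):
        for combo in product(alphabet, repeat=length):
            yield "".join(combo)          # length 0 yields "" (the empty string λ)

sigma = ["a", "b"]
star = list(enumerate_strings(sigma, 2))   # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
plus = [s for s in star if s != ""]        # Sigma+ excludes the empty string

Running this reproduces exactly the initial segments of the example sets given above.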
1.3 Languages and operations on languages
Language
• Definition − A language is a subset of ∑* for some alphabet ∑. It can be finite or infinite.
• Example − If the language takes all possible strings of length 2 over ∑ = {a, b}, then L = { ab, aa, ba, bb
}
A language is a set of strings (finite or infinite) over some alphabet. In other words, any subset L of ∑* is
a language in the theory of computation.
Some special languages are as follows −
• ∅ (or { }) − the empty language, containing no strings.
• {ε} − a language containing exactly one string, the empty string.
Examples
• ∑ = {0, 1}
L = {x | x ∈ ∑* and x contains an even number of 0’s}
• ∑ = {0, 1, 2, …, 9, .}
L = {x | x ∈ ∑* and x forms a finite-length real number} = {0, 1.5, 9.326, …}
• ∑ = {a, b, c, …, z, A, B, …, Z}
L = {x | x ∈ ∑* and x is a Pascal reserved word} = {BEGIN, END, IF, …}
• ∑ = {Pascal reserved words} ∪ { (, ), ., :, ;, … } ∪ {legal Pascal identifiers}
L = {x | x ∈ ∑* and x is a syntactically correct Pascal program}
• ∑ = {English words}
L = {x | x ∈ ∑* and x is a syntactically correct English sentence}
Operations on Regular Languages
Some of the operations on regular languages are as follows −
Union
Intersection
Difference
Concatenation
Kleene * closure
Let us understand these operations one by one.
Union
If L1 and L2 are two regular languages, their union L1 ∪ L2 will also be regular.
For example, L1 = {a^n | n > 0} and L2 = {b^n | n > 0}
L3 = L1 ∪ L2 = {a^n | n > 0} ∪ {b^n | n > 0} is also regular.
Intersection
If L1 and L2 are two regular languages, their intersection L1 ∩ L2 will also be regular.
For example,
L1 = {a^m b^n | n > 0 and m > 0} and
L2 = {a^m b^n | n > 0 and m > 0} ∪ {b^n a^m | n > 0 and m > 0}
L3 = L1 ∩ L2 = {a^m b^n | n > 0 and m > 0} is also regular.
Concatenation
If L1 and L2 are two regular languages, their concatenation L1.L2 will also be regular.
For example,
L1 = {a^n | n > 0} and L2 = {b^n | n > 0}
L3 = L1.L2 = {a^m b^n | m > 0 and n > 0} is also regular.
Kleene Closure
If L1 is a regular language, its Kleene closure L1* will also be regular.
For example, L1 = (a ∪ b), L1* = (a ∪ b)*
Complement
If L(G) is a regular language, its complement L'(G) will also be regular. The complement of a language can
be found by subtracting the strings which are in L(G) from the set of all possible strings.
For example,
L(G) = {a^n | n > 3}
L'(G) = {a^n | n ≤ 3}
Note: Two regular expressions are equivalent, if languages generated by them are the same.
For example, (a+b*)* and (a+b)* generate the same language. Every string which is generated by (a+b*)*
is also generated by (a+b)* and vice versa .
Example 1
Write the regular expression for the language accepting all combinations of a's over the alphabet ∑ = {a}.
All combinations of a's means a may appear zero times, once, twice and so on. If a appears zero times, that
means the null string. That is, we expect the set {ε, a, aa, aaa, ....}.
So we give a regular expression for this as follows −
R = a*
That is, the Kleene closure of a.
1.4 Grammar & Derivations from a grammar
Grammar
A grammar G can be formally written as a 4-tuple (N, T, S, P) where −
• N or VN is a set of variables or non-terminal symbols.
• T or ∑ is a set of Terminal symbols.
• S is a special variable called the Start symbol, S ∈ N
• P is a set of Production rules for Terminals and Non-terminals. A production rule has the form α → β, where α
and β are strings over VN ∪ ∑ and at least one symbol of α belongs to VN.
Example
Grammar G1 −
({S, A, B}, {a, b}, S, {S → AB, A → a, B → b})
Here,
• S, A, and B are Non-terminal symbols;
• a and b are Terminal symbols
• S is the Start symbol, S ∈ N
• Productions, P : S → AB, A → a, B → b
Example
Grammar G2 −
({S, A}, {a, b}, S, {S → aAb, aA → aaAb, A → ε})
Here,
• S and A are Non-terminal symbols.
• a and b are Terminal symbols.
• ε is an empty string.
• S is the Start symbol, S ∈ N
• Production P : S → aAb, aA → aaAb, A → ε
Derivations from a Grammar
Strings may be derived from other strings using the productions in a grammar. If a grammar G has a
production α → β, we can say that x α y derives x β y in G. This derivation is written as −
x α y ⇒G x β y
Example
Let us consider the grammar −
G2 = ({S, A}, {a, b}, S, {S → aAb, aA → aaAb, A → ε } )
Some of the strings that can be derived are −
S ⇒ aAb using production S → aAb
⇒ aaAbb using production aA → aaAb
⇒ aaaAbbb using production aA → aaAb
⇒ aaabbb using production A → ε
The set of all terminal strings that can be derived from a grammar is the language generated by that
grammar. The language generated by a grammar G is formally defined as
L(G) = { W | W ∈ ∑*, S ⇒G* W }
If L(G1) = L(G2), the Grammar G1 is equivalent to the Grammar G2.
Example
If there is a grammar
G: N = {S, A, B} T = {a, b} P = {S → AB, A → a, B → b}
Here S produces AB, and we can replace A by a and B by b. The only string generated is ab, i.e.,
L(G) = {ab}
Example
Suppose we have the following grammar −
G: N = {S, A, B} T = {a, b} P = {S → AB, A → aA|a, B → bB|b}
The language generated by this grammar −
L(G) = {ab, a^2b, ab^2, a^2b^2, ………}
= {a^m b^n | m ≥ 1 and n ≥ 1}
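The fact that L(G) is just "all terminal strings derivable from S" can be checked mechanically for small grammars. The sketch below (an illustration added to these notes; derive is a hypothetical helper, not standard library code) applies the productions S → AB, A → aA | a, B → bB | b breadth-first and collects the terminal strings up to a length bound.

from collections import deque

# Productions of the grammar above: S -> AB, A -> aA | a, B -> bB | b
productions = {"S": ["AB"], "A": ["aA", "a"], "B": ["bB", "b"]}

def derive(start="S", max_len=4):
    # Terminal strings of length <= max_len derivable from `start`.
    # No production here shrinks its sentential form, so longer forms can be pruned.
    results, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        if form.islower():                     # no non-terminal left: a sentence of L(G)
            results.add(form)
            continue
        for i, sym in enumerate(form):         # rewrite the leftmost non-terminal
            if sym.isupper():
                for rhs in productions[sym]:
                    new = form[:i] + rhs + form[i + 1:]
                    if len(new) <= max_len and new not in seen:
                        seen.add(new)
                        queue.append(new)
                break
    return sorted(results, key=lambda s: (len(s), s))

print(derive())   # ['ab', 'aab', 'abb', 'aaab', 'aabb', 'abbb'] — a^m b^n with m, n ≥ 1

This agrees with L(G) = {a^m b^n | m ≥ 1 and n ≥ 1} restricted to strings of length at most 4.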
Construction of a Grammar Generating a Language
We’ll consider some languages and convert each into a grammar G that produces it.
Example
Problem − Suppose L(G) = {a^m b^n | m ≥ 0 and n > 0}. We have to find out the grammar G which
produces L(G).
Solution
Since L(G) = {a^m b^n | m ≥ 0 and n > 0},
the set of strings accepted can be rewritten as −
L(G) = {b, ab, bb, aab, abb, …….}
Here, the start symbol has to produce at least one ‘b’ preceded by any number of ‘a’s, including none.
To accept the string set {b, ab, bb, aab, abb, …….},
we have taken the productions −
S → aS , S → B, B → b and B → bB
S → B → b (Accepted)
S → B → bB → bb (Accepted)
S → aS → aB → ab (Accepted)
S → aS → aaS → aaB → aab(Accepted)
S → aS → aB → abB → abb (Accepted)
Thus, we can prove every single string in L(G) is accepted by the language generated by the production
set.
Hence the grammar −
G: ({S, B}, {a, b}, S, { S → aS | B, B → b | bB })
Example
Problem − Suppose L(G) = {a^m b^n | m > 0 and n ≥ 0}. We have to find out the grammar G which produces
L(G).
Solution −
Since L(G) = {a^m b^n | m > 0 and n ≥ 0}, the set of strings accepted can be rewritten as −
L(G) = {a, aa, ab, aaa, aab, abb, …….}
Here, the start symbol has to produce at least one ‘a’ followed by any number of ‘b’s, including none.
To accept the string set {a, aa, ab, aaa, aab, abb, …….}, we have taken the productions −
S → aA, A → aA, A → B, B → bB, B → λ
S → aA → aB → aλ → a (Accepted)
S → aA → aaA → aaB → aaλ → aa (Accepted)
S → aA → aB → abB → abλ → ab (Accepted)
S → aA → aaA → aaaA → aaaB → aaaλ → aaa (Accepted)
S → aA → aaA → aaB → aabB → aabλ → aab (Accepted)
S → aA → aB → abB → abbB → abbλ → abb (Accepted)
Thus, we can prove every single string in L(G) is accepted by the language generated by the production
set.
Hence the grammar −
G: ({S, A, B}, {a, b}, S, {S → aA, A → aA | B, B → λ | bB })
1.5 Chomsky Normal Form
According to Noam Chomsky, there are four types of grammars − Type 0, Type 1, Type 2, and Type 3.
The following table shows how they differ from each other −
Grammar Type | Grammar Accepted | Language Accepted | Automaton
Type 0 | Unrestricted grammar | Recursively enumerable language | Turing machine
Type 1 | Context-sensitive grammar | Context-sensitive language | Linear bounded automaton
Type 2 | Context-free grammar | Context-free language | Pushdown automaton
Type 3 | Regular grammar | Regular language | Finite state automaton

Take a look at the following illustration. It shows the scope of each type of grammar −

Type - 3 Grammar

Type-3 grammars generate regular languages. Type-3 grammars must have a single non-terminal on the
left-hand side and a right-hand side consisting of a single terminal or a single terminal followed by a single
non-terminal.
Type-3 grammars are accepted by FINITE AUTOMATA.
The productions must be in the form X → a or X → aY
where X, Y ∈ N (Non terminal)
and a ∈ T (Terminal)
The rule S → ε is allowed if S does not appear on the right side of any rule.
Example
X→ε
X → a | aY
Y→b

Type - 2 Grammar

Type-2 grammars generate context-free languages.


The productions must be in the form A → γ
where A ∈ N (Non terminal)
and γ ∈ (T ∪ N)* (String of terminals and non-terminals).
1) The start symbol can have ε-productions.
2) The L.H.S. of every production has exactly one variable (non-terminal).
3) A context-free grammar can always be rewritten (see Chomsky Normal Form below) so that every
non-terminal generates either a single terminal or exactly two non-terminals.
The languages generated by these grammars are recognized by a non-deterministic pushdown
automaton.
Example
S→Xa
X→a
X → aX
X → abc
X→ε
Type - 1 Grammar
Type-1 grammars generate context-sensitive languages. The productions must be in the form
αAβ→αγβ
where A ∈ N (Non-terminal)
and α, β, γ ∈ (T ∪ N)* (Strings of terminals and non-terminals)
The strings α and β may be empty, but γ must be non-empty.
The rule S → ε is allowed if S does not appear on the right side of any rule. The languages generated by
these grammars are recognized by a linear bounded automaton.
Example
AB → AbBc
A → bcA
B→b

Type - 0 Grammar
Type-0 grammars generate recursively enumerable languages. The productions have no restrictions. They
are any phrase structure grammar, including all formal grammars.
They generate the languages that are recognized by a Turing machine.
The productions can be in the form of α → β where α is a string of terminals and non terminals with at least
one non-terminal and α cannot be null. β is a string of terminals and non-terminals.
Example
S → ACaB
Bc → acB
CB → DB
aD → Db
A CFG is in Chomsky Normal Form if its productions are in the following forms −
• A → a
• A → BC
• S → ε
where A, B, and C are non-terminals and a is a terminal.
Algorithm to Convert into Chomsky Normal Form −
Step 1 − If the start symbol S occurs on some right side, create a new start symbol S’ and a new
production S’→ S.
Step 2 − Remove Null productions. (Using the Null production removal algorithm discussed earlier)
Step 3 − Remove unit productions. (Using the Unit production removal algorithm discussed earlier)
Step 4 − Replace each production A → B1…Bn where n > 2 with A → B1C where C → B2 …Bn. Repeat
this step for all productions having two or more symbols in the right side.
Step 5 − If the right side of any production is in the form A → aB where a is a terminal and A, B are non-
terminal, then the production is replaced by A → XB and X → a. Repeat this step for every production
which is in the form A → aB.
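Steps 4 and 5 are purely mechanical, so they are easy to express in code. The following sketch (added for illustration; it assumes null and unit productions have already been removed in steps 2–3, and the helper names are made up) binarizes long right-hand sides and lifts terminals out of mixed right-hand sides.

def cnf_steps_4_and_5(productions):
    # `productions` maps a non-terminal to a list of right-hand sides,
    # each RHS a list of symbols; lowercase = terminal, uppercase = non-terminal.
    result, term_var, counter = {}, {}, [0]

    def fresh(prefix):
        counter[0] += 1
        return prefix + str(counter[0])

    def var_for(terminal):
        # Step 5: a terminal inside an RHS of length >= 2 gets its own variable T -> terminal
        if terminal not in term_var:
            term_var[terminal] = fresh("T")
            result.setdefault(term_var[terminal], []).append([terminal])
        return term_var[terminal]

    for head, rhss in productions.items():
        for rhs in rhss:
            if len(rhs) >= 2:
                rhs = [var_for(s) if s.islower() else s for s in rhs]
            cur = head
            while len(rhs) > 2:              # Step 4: binarize long right-hand sides
                nxt = fresh("X")
                result.setdefault(cur, []).append([rhs[0], nxt])
                cur, rhs = nxt, rhs[1:]
            result.setdefault(cur, []).append(rhs)
    return result

# Example: the production set obtained after steps 1-3 of the problem below
g = {"S0": [["A", "S", "A"], ["a", "B"], ["a"], ["A", "S"], ["S", "A"]],
     "S":  [["A", "S", "A"], ["a", "B"], ["a"], ["A", "S"], ["S", "A"]],
     "A":  [["b"], ["A", "S", "A"], ["a", "B"], ["a"], ["A", "S"], ["S", "A"]],
     "B":  [["b"]]}
print(cnf_steps_4_and_5(g))

The output has the same shape as the hand-derived answer in the problem below, up to the names chosen for the freshly introduced variables.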
Problem
Convert the following CFG into CNF
S → ASA | aB, A → B | S, B → b | ε
Solution
(1) Since S appears on the R.H.S., we add a new start symbol S0 and the production S0 → S, and the grammar
becomes −
S0 → S, S → ASA | aB, A → B | S, B → b | ε

(2) Now we will remove the null productions −
B → ε and A → ε
After removing B → ε, the production set becomes −
S0 → S, S → ASA | aB | a, A → B | S | ε, B → b
After removing A → ε, the production set becomes −
S0 → S, S → ASA | aB | a | AS | SA | S, A → B | S, B → b
(3) Now we will remove the unit productions.
After removing S → S, the production set becomes −
S0 → S, S → ASA | aB | a | AS | SA, A → B | S, B → b
After removing S0 → S, the production set becomes −
S0 → ASA | aB | a | AS | SA, S → ASA | aB | a | AS | SA
A → B | S, B → b
After removing A → B, the production set becomes −
S0 → ASA | aB | a | AS | SA, S → ASA | aB | a | AS | SA
A → S | b
B → b
After removing A → S, the production set becomes −
S0 → ASA | aB | a | AS | SA, S → ASA | aB | a | AS | SA
A → b | ASA | aB | a | AS | SA, B → b
(4) Now we will find the productions with more than two symbols on the R.H.S.
Here, S0 → ASA, S → ASA and A → ASA violate the restriction of at most two non-terminals on the R.H.S.
Hence, we will apply step 4 to get the following production set −
S0 → AX | aB | a | AS | SA
S → AX | aB | a | AS | SA
A → b | AX | aB | a | AS | SA
B → b
X → SA
(5) We still have to change the productions S0 → aB, S → aB, A → aB, because a terminal appears together with a non-terminal on the R.H.S.
And the final production set, which is in CNF, becomes −
S0 → AX | YB | a | AS | SA
S → AX | YB | a | AS | SA
A → b | AX | YB | a | AS | SA
B → b
X → SA
Y → a
1.6 Regular Expressions, Properties, Identity rules for Regular Expression
Regular Expressions
A Regular Expression can be recursively defined as follows −
• ε is a Regular Expression, denoting the language that contains only the empty string. (L(ε) = {ε})
• φ is a Regular Expression, denoting the empty language. (L(φ) = { })
• x is a Regular Expression, where x ∈ ∑, denoting the language L(x) = {x}
• If X is a Regular Expression denoting the language L(X) and Y is a Regular Expression denoting the
language L(Y), then
o X + Y is a Regular Expression corresponding to the language L(X) ∪ L(Y), where L(X + Y) = L(X) ∪ L(Y).
o X . Y is a Regular Expression corresponding to the language L(X) . L(Y), where L(X.Y) = L(X) . L(Y)
o R* is a Regular Expression corresponding to the language L(R*), where L(R*) = (L(R))*
• Any expression obtained by applying the above rules finitely many times is also a Regular Expression.
Some RE Examples

Regular Expression | Regular Set
(0 + 10*) | L = { 0, 1, 10, 100, 1000, 10000, … }
(0*10*) | L = {1, 01, 10, 010, 0010, …}
(0 + ε)(1 + ε) | L = {ε, 0, 1, 01}
(a+b)* | Set of strings of a’s and b’s of any length, including the null string. So L = { ε, a, b, aa, ab, bb, ba, aaa, ……. }
(a+b)*abb | Set of strings of a’s and b’s ending with the string abb. So L = {abb, aabb, babb, aaabb, ababb, …………..}
(11)* | Set consisting of an even number of 1’s, including the empty string. So L = {ε, 11, 1111, 111111, ……….}
(aa)*(bb)*b | Set of strings consisting of an even number of a’s followed by an odd number of b’s. So L = {b, aab, aabbb, aabbbbb, aaaab, aaaabbb, …………..}
(aa + ab + ba + bb)* | Strings of a’s and b’s of even length, obtained by concatenating any combination of the strings aa, ab, ba and bb, including the null string. So L = {ε, aa, ab, ba, bb, aaab, aaba, …………..}

Any set that represents the value of the Regular Expression is called a Regular Set.
Properties of Regular Sets
Property 1. The union of two regular sets is regular.
Proof −
Let us take two regular expressions
RE1 = a(aa)* and RE2 = (aa)*
So, L1 = {a, aaa, aaaaa,.....} (Strings of odd length excluding Null)
and L2 ={ ε, aa, aaaa, aaaaaa,.......} (Strings of even length including Null)
L1 ∪ L2 = { ε, a, aa, aaa, aaaa, aaaaa, aaaaaa,.......}
(Strings of all possible lengths including Null)
RE (L1 ∪ L2) = a* (which is a regular expression itself)
Hence, proved.
Property 2. The intersection of two regular sets is regular.
Proof −
Let us take two regular expressions
RE1 = a(a*) and RE2 = (aa)*
So, L1 = { a,aa, aaa, aaaa, ....} (Strings of all possible lengths excluding Null)
L2 = { ε, aa, aaaa, aaaaaa,.......} (Strings of even length including Null)
L1 ∩ L2 = { aa, aaaa, aaaaaa,.......} (Strings of even length excluding Null)
RE (L1 ∩ L2) = aa(aa)* which is a regular expression itself.
Hence, proved.
Property 3. The complement of a regular set is regular.
Proof −
Let us take a regular expression −
RE = (aa)*
So, L = {ε, aa, aaaa, aaaaaa, .......} (Strings of even length including Null)
The complement of L is the set of all strings that are not in L.
So, L’ = {a, aaa, aaaaa, .....} (Strings of odd length excluding Null)
RE (L’) = a(aa)* which is a regular expression itself.
Hence, proved.
Property 4. The difference of two regular sets is regular.
Proof −
Let us take two regular expressions −
RE1 = a (a*) and RE2 = (aa)*
So, L1 = {a, aa, aaa, aaaa, ....} (Strings of all possible lengths excluding Null)
L2 = { ε, aa, aaaa, aaaaaa,.......} (Strings of even length including Null)
L1 – L2 = {a, aaa, aaaaa, aaaaaaa, ....}
(Strings of all odd lengths excluding Null)
RE (L1 – L2) = a (aa)* which is a regular expression.
Hence, proved.
Property 5. The reversal of a regular set is regular.
Proof −
We have to prove that L^R is also regular if L is a regular set.
Let L = {01, 10, 11}
RE (L) = 01 + 10 + 11
L^R = {10, 01, 11}
RE (L^R) = 10 + 01 + 11, which is regular
Hence, proved.
Property 6. The closure of a regular set is regular.
Proof −
If L = {a, aaa, aaaaa, .......} (Strings of odd length excluding Null)
i.e., RE (L) = a (aa)*
L* = {ε, a, aa, aaa, aaaa, aaaaa,……………} (Strings of all lengths including Null)
RE (L*) = a*
Hence, proved.
Property 7. The concatenation of two regular sets is regular.
Proof −
Let RE1 = (0+1)*0 and RE2 = 01(0+1)*
Here, L1 = {0, 00, 10, 000, 010, ......} (Set of strings ending in 0)
and L2 = {01, 010,011,.....} (Set of strings beginning with 01)
Then, L1 L2 = {001,0010,0011,0001,00010,00011,1001,10010,.............}
Set of strings containing 001 as a substring which can be represented by an RE − (0 + 1)*001(0 + 1)*
Hence, proved.
Identities Related to Regular Expressions
Given R, P, L, Q as regular expressions, the following identities hold −
• ∅* = ε
• ε* = ε
• RR* = R*R
• R*R* = R*
• (R*)* = R*
• (PQ)*P =P(QP)*
• (a+b)* = (a*b*)* = (a*+b*)* = (a+b*)* = a*(ba*)*
• R + ∅ = ∅ + R = R (The identity for union)
• R ε = ε R = R (The identity for concatenation)
• ∅ L = L ∅ = ∅ (The annihilator for concatenation)
• R + R = R (Idempotent law)
• L (M + N) = LM + LN (Left distributive law)
• (M + N) L = ML + NL (Right distributive law)
• ε + RR* = ε + R*R = R*
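
Identities like (a+b)* = (a*b*)* can be sanity-checked by brute force: enumerate every string over {a, b} up to some length and test whether both expressions accept exactly the same set. The sketch below (added for illustration) uses Python's re module, in which + (union) is written as | and concatenation is juxtaposition. Agreement up to a bound is of course only evidence, not a proof; the identities themselves are proved algebraically.

import re
from itertools import product

def language_upto(pattern, alphabet, max_len):
    # All strings over `alphabet` of length <= max_len matched by `pattern`
    strings = ("".join(p) for n in range(max_len + 1)
               for p in product(alphabet, repeat=n))
    return {s for s in strings if re.fullmatch(pattern, s)}

lhs = language_upto(r"(a|b)*", "ab", 6)        # (a+b)*
rhs = language_upto(r"(a*b*)*", "ab", 6)       # (a*b*)*
print(lhs == rhs)                              # True on all strings of length <= 6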

1.7 Finite Automata- DFA, NFA


Types of finite automata
Finite Automaton can be classified into two types −
• Deterministic Finite Automaton (DFA)
• Non-deterministic Finite Automaton (NDFA / NFA)
Deterministic Finite Automaton (DFA)
Definition:
In DFA, for each input symbol, one can determine the state to which the machine will move. Hence, it is
called Deterministic Automaton. As it has a finite number of states, the machine is called Deterministic
Finite Machine or Deterministic Finite Automaton.
Formal Definition of a DFA
A DFA can be represented by a 5-tuple (Q, ∑, δ, q0, F) where −
• Q is a finite set of states.
• ∑ is a finite set of symbols called the alphabet.
• δ is the transition function where δ: Q × ∑ → Q
• q0 is the initial state from where any input is processed (q0 ∈ Q).
• F is a set of final state/states of Q (F ⊆ Q).

Graphical Representation of a DFA

A DFA is represented by a digraph called a state diagram.


• The vertices represent the states.
• The arcs labeled with an input alphabet show the transitions.
• The initial state is denoted by an empty single incoming arc.
• The final state is indicated by double circles.
Example
Let a deterministic finite automaton be →
• Q = {a, b, c},
• ∑ = {0, 1},
• q0 = a,
• F = {c}, and

Transition function δ as shown by the following table −

Present State | Next State for Input 0 | Next State for Input 1
a | a | b
b | c | a
c | b | c

Its graphical representation would be as follows −


Non-deterministic Finite Automaton
Definition:
In an NDFA, for a particular input symbol, the machine can move to any combination of the states in the
machine. In other words, the exact state to which the machine moves cannot be determined. Hence, it is
called Non-deterministic Automaton. As it has a finite number of states, the machine is called a Non-
deterministic Finite Machine or Non-deterministic Finite Automaton.
Formal Definition of an NDFA
An NDFA can be represented by a 5-tuple (Q, ∑, δ, q0, F) where −
• Q is a finite set of states.
• ∑ is a finite set of symbols called the alphabet.
• δ is the transition function where δ: Q × ∑ → 2^Q
(Here the power set of Q (2^Q) has been taken because in the case of an NDFA, from a state, a transition can occur
to any combination of Q states)
• q0 is the initial state from where any input is processed (q0 ∈ Q).
• F is a set of final state/states of Q (F ⊆ Q).

Graphical Representation of an NDFA: (same as DFA)


An NDFA is represented by digraphs called state diagram.
• The vertices represent the states.
• The arcs labeled with an input alphabet show the transitions.
• The initial state is denoted by an empty single incoming arc.
• The final state is indicated by double circles.

Example
Let a non-deterministic finite automaton be →
• Q = {a, b, c}
• ∑ = {0, 1}
• q0 = a
• F = {c}
The transition function δ as shown below −

Present State | Next State for Input 0 | Next State for Input 1
a | a, b | b
b | c | a, c
c | b, c | c

Its graphical representation would be as follows −

DFA vs NDFA

The following table lists the differences between DFA and NDFA.

DFA − The transition from a state is to a single particular next state for each input symbol. Hence it is called deterministic.
NDFA − The transition from a state can be to multiple next states for each input symbol. Hence it is called non-deterministic.

DFA − Empty string (ε-move) transitions are not seen in a DFA.
NDFA − An NDFA permits empty string (ε-move) transitions.

DFA − Backtracking is allowed in a DFA.
NDFA − In an NDFA, backtracking is not always possible.

DFA − Requires more space.
NDFA − Requires less space.

DFA − A string is accepted by a DFA if it transits to a final state.
NDFA − A string is accepted by an NDFA if at least one of all possible transitions ends in a final state.

DFA − Dead configurations are not allowed.
NDFA − Dead configurations are allowed.

DFA − The number of states is larger, so designing and understanding are more difficult.
NDFA − The number of states is smaller, so designing and understanding are easier.
Acceptability by DFA and NDFA

• A string is accepted by a DFA/NDFA iff the DFA/NDFA, starting at the initial state, ends in an accepting
state (any of the final states) after reading the string wholly.
• A string S is accepted by a DFA/NDFA (Q, ∑, δ, q0, F) iff
δ*(q0, S) ∈ F
• The language L accepted by the DFA/NDFA is
{S | S ∈ ∑* and δ*(q0, S) ∈ F}
• A string S′ is not accepted by a DFA/NDFA (Q, ∑, δ, q0, F) iff
δ*(q0, S′) ∉ F
• The language L′ not accepted by the DFA/NDFA (the complement of the accepted language L) is
{S | S ∈ ∑* and δ*(q0, S) ∉ F}
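
The extended transition function δ* is just δ applied symbol by symbol. A small sketch, added for illustration only (the transition tables are the DFA and NDFA examples from section 1.7, with start state a and final state c):

# DFA of section 1.7
dfa_delta = {("a", "0"): "a", ("a", "1"): "b",
             ("b", "0"): "c", ("b", "1"): "a",
             ("c", "0"): "b", ("c", "1"): "c"}

def dfa_accepts(s, start="a", final={"c"}):
    state = start
    for ch in s:                              # delta*: follow one transition per symbol
        state = dfa_delta[(state, ch)]
    return state in final                     # accepted iff delta*(q0, s) is in F

# NDFA of section 1.7: delta maps to a *set* of possible next states
nfa_delta = {("a", "0"): {"a", "b"}, ("a", "1"): {"b"},
             ("b", "0"): {"c"},      ("b", "1"): {"a", "c"},
             ("c", "0"): {"b", "c"}, ("c", "1"): {"c"}}

def nfa_accepts(s, start="a", final={"c"}):
    states = {start}
    for ch in s:                              # follow every possible transition in parallel
        states = set().union(*(nfa_delta.get((q, ch), set()) for q in states))
    return bool(states & final)               # accepted iff some path ends in a final state

print(dfa_accepts("010"), nfa_accepts("010"))   # True True for this pair of machines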
Example
Let us consider the DFA shown in the figure below. From the DFA, the acceptable strings can be derived.

Strings accepted by the above DFA: {0, 00, 11, 010, 101, ...........}
Strings not accepted by the above DFA: {1, 011, 111, ........}
1.8 Conversion of NFA to DFA

Problem Statement

Let X = (Qx, ∑, δx, q0, Fx) be an NFA which accepts the language L(X). We have to design an equivalent
DFA Y = (Qy, ∑, δy, q0, Fy) such that L(Y) = L(X). The following procedure converts the NDFA to its
equivalent DFA −

Algorithm (steps for converting an NFA to DFA)

Input − An NDFA.
Output − An equivalent DFA.
Step 1 − Start from the start state of the NFA and write it within [ ].
Step 2 − Place the next states of this start state for each input symbol in the next-state columns. Put them
also within [ ].
Step 3 − If any new combination of states appears in a next-state column which has not yet been taken in the present-state
column, then add that combination of states to the present-state column.
Step 4 − If more than one state appears in the present-state column, then the next state for that combination
will be the union of the next states of each of those states.
Step 5 − If no new combination of states appears which has not yet been taken in the present-state column, stop
the process.
Step 6 − The start state of the constructed DFA will be the start state of the NFA.
Step 7 − The final state(s) of the constructed DFA will be those combinations of states that contain
at least one final state of the NFA.
Example
Let us consider the NDFA shown in the figure below.

q δ(q,0) δ(q,1)

a {a,b,c,d,e} {d,e}

b {c} {e}

c ∅ {b}

d {e} ∅

e ∅ ∅

Using the above algorithm, we find its equivalent DFA. The state table of the DFA is shown below.

q δ(q,0) δ(q,1)

[a] [a,b,c,d,e] [d,e]

[a,b,c,d,e] [a,b,c,d,e] [b,d,e]

[d,e] [e] ∅

[b,d,e] [c,e] [e]

[e] ∅ ∅
[c, e] ∅ [b]

[b] [c] [e]

[c] ∅ [b]

The state diagram of the DFA is as follows −
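
In addition to the state diagram, the table above can be reproduced directly by coding the subset construction: start from [a] and keep adding every new combination of states that appears in a next-state column. A small sketch added for illustration (state sets are represented as frozensets):

# NDFA transition table from the example above; missing entries mean the empty set
nfa = {("a", "0"): {"a", "b", "c", "d", "e"}, ("a", "1"): {"d", "e"},
       ("b", "0"): {"c"}, ("b", "1"): {"e"},
       ("c", "1"): {"b"},
       ("d", "0"): {"e"}}

def subset_construction(start="a", symbols=("0", "1")):
    # DFA transition table whose states are sets of NFA states
    dfa, todo = {}, [frozenset({start})]
    while todo:
        current = todo.pop()
        if current in dfa:
            continue
        dfa[current] = {}
        for sym in symbols:
            nxt = frozenset().union(*(nfa.get((q, sym), set()) for q in current))
            dfa[current][sym] = nxt
            if nxt and nxt not in dfa:
                todo.append(nxt)              # a new combination of states appeared
    return dfa

for state, moves in subset_construction().items():
    print(sorted(state), {sym: sorted(t) for sym, t in moves.items()})

The printed rows match the DFA state table above, e.g. [a, b, c, d, e] goes to [a, b, c, d, e] on 0 and to [b, d, e] on 1.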

1.9 Conversion of Regular Expression to NFA & DFA


We can use Thompson's Construction to find a Finite Automaton from a Regular Expression. We
break the regular expression down into its smallest sub-expressions, convert these to NFAs, and finally combine them into a
DFA.
Some basic RE constructions are the following −
Case 1 − For a regular expression ‘a’, we can construct the following FA −

Case 2 − For a regular expression ‘ab’, we can construct the following FA −


Case 3 − For a regular expression (a+b), we can construct the following FA −

Case 4 − For a regular expression (a+b)*, we can construct the following FA −

Method
Step 1 Construct an NFA with Null moves from the given regular expression.
Step 2 Remove Null transition from the NFA and convert it into its equivalent DFA.
Problem
Convert the following RE into its equivalent DFA − 1 (0 + 1)* 0
Solution
We will concatenate three expressions "1", "(0 + 1)*" and "0"

Now we will remove the ε transitions. After we remove the ε transitions from the NDFA, we get the
following −
It is an NDFA corresponding to the RE − 1 (0 + 1)* 0. If you want to convert it into a DFA, simply apply
the method of converting NDFA to DFA .
Finite Automata with Null Moves (NFA-ε)
A Finite Automaton with null moves (FA-ε) can make transitions not only on input symbols from the alphabet
but also without any input symbol. Such a transition without input is called a null move (ε-move).
An NFA-ε is represented formally by a 5-tuple (Q, ∑, δ, q0, F), consisting of
• Q − a finite set of states
• ∑ − a finite set of input symbols
• δ − a transition function δ : Q × (∑ ∪ {ε}) → 2Q
• q0 − an initial state q0 ∈ Q
• F − a set of final state/states of Q (F⊆Q).

The above (FA-ε) accepts a string set − {0, 1, 01}

Removal of Null Moves from Finite Automata

If in an NDFA there is an ε-move from vertex X to vertex Y, we can remove it using the following steps −

• Find all the outgoing edges from Y.


• Copy all these edges starting from X without changing the edge labels.
• If X is an initial state, make Y also an initial state.
• If Y is a final state, make X also a final state.
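
Removing null moves (and, more generally, converting an NFA-ε to an ordinary NFA or DFA) relies on the ε-closure of a state: the set of states reachable using ε-transitions only. A minimal sketch, added here for illustration with a made-up example:

def epsilon_closure(state, eps_moves):
    # `eps_moves` maps a state to the set of states reachable by one epsilon-move
    closure, stack = {state}, [state]
    while stack:
        q = stack.pop()
        for r in eps_moves.get(q, set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

eps_moves = {"q0": {"q1"}, "q1": {"q2"}}     # hypothetical: q0 --eps--> q1 --eps--> q2
print(epsilon_closure("q0", eps_moves))      # {'q0', 'q1', 'q2'}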
Problem
Convert the following NFA-ε to NFA without Null move.

Solution
Step 1 −
Here the ε transition is between q1 and qf, so let q1 be X and qf be Y.
The outgoing edges from qf go to qf for inputs 0 and 1.
Step 2 −
Now we will copy all these edges, starting them from q1, without changing the edges from qf, and get the following FA −

Step 3 −
Here q1 is an initial state, so we make qf also an initial state.
So the FA becomes −

Step 4 −
Here qf is a final state, so we make q1 also a final state.
So the FA becomes −
Example 1:

Design a FA from given regular expression 10 + (0 + 11)0* 1.

Solution: First we will construct the transition diagram for a given regular expression.

Step 1:

Step 2:

Step 3:
Step 4:

Step 5:

Now we have got an NFA without ε. Now we will convert it into the required DFA. For that, we will first write a
transition table for this NFA.

State | Input 0 | Input 1
→q0 | q3 | {q1, q2}
q1 | qf | ϕ
q2 | ϕ | q3
q3 | q3 | qf
*qf | ϕ | ϕ

The equivalent DFA will be:

State | Input 0 | Input 1
→[q0] | [q3] | [q1, q2]
[q1] | [qf] | Φ
[q2] | Φ | [q3]
[q3] | [q3] | [qf]
[q1, q2] | [qf] | [q3]
*[qf] | Φ | Φ

Example 2:

Design a NFA from given regular expression 1 (1* 01* 01*)*.

Solution: The NFA for the given regular expression is as follows:

Step 1:

Step 2:

Step 3:

Example 3:

Construct the FA for regular expression 0*1 + 10.

Solution:

We will first construct FA for R = 0*1 + 10 as follows:


Step 1:

Step 2:

Step 3:

Step 4:
1.10 Introduction to Compilers: phases of the compiler

Introduction of Compiler Design


• A compiler is a translator that converts the high-level language into the machine language.
• High-level language is written by a developer and machine language can be understood by the processor.
• Compiler is used to show errors to the programmer.
• The main purpose of a compiler is to translate the code written in one language into another without changing the
meaning of the program.
• When you execute a program written in a high-level programming language, it executes in two
parts.
• In the first part, the source program is compiled and translated into the object program (low-level language).
• In the second part, the object program is translated into the target program through the assembler.

Fig: Execution process of source program in Compiler

The compiler is software that converts a program written in a high-level language (Source Language) to
a low-level language (Object/Target/Machine Language/0’s, 1’s).

Language processing systems (using Compiler): We know a computer is a logical assembly of


Software and Hardware. The hardware knows a language that is hard for us to grasp; consequently, we
tend to write programs in a high-level language, which is much less complicated for us to comprehend and
keep in mind. Now, these programs go through a series of transformations so that they can readily
be used by machines. This is where language processing systems come in handy.
A translator or language processor is a program that translates an input program written in a
programming language into an equivalent program in another language.

TYPES OF TRANSLATORS: -
• Interpreter
• Compiler
• Preprocessor

The compiler is a type of translator, which takes a program written in a high-level programming language
as input and translates it into an equivalent program in low-level languages such as machine language or
assembly language.
❖ The program written in a high-level language is known as a source program, and the program
converted into low-level language is known as an object (or target) program.
❖ Moreover, the compiler traces the errors in the source program and generates the error report.
Without compilation, no program written in a high-level language can be executed.
❖ After compilation, only the program in machine language is loaded into the memory for
execution.
❖ For every programming language, we have a different compiler; however, the basic tasks
performed by every compiler are the same.
LIST OF COMPILERS
Ada compilers
ALGOL compilers
BASIC compilers
C# compilers
C++ compilers
COBOL compilers
Common Lisp compilers
Java compilers
Pascal compilers
PL/I compilers
Python compilers
Smalltalk compilers
CIL compilers
High-Level Language: A program written closer to human language, for example one containing pre-processor
directives such as #include or #define, is called a high-level language (HLL) program. They are closer to humans but far from machines. These (#) tags are called preprocessor
directives. They direct the pre-processor about what to do.
Pre-Processor: The pre-processor removes all the #include directives by inserting the included files (this is called file
inclusion) and resolves all the #define directives using macro expansion. It performs file inclusion, augmentation,
macro-processing, etc.
Assembly Language: It’s neither in binary form nor high level. It is an intermediate state that is a
combination of machine instructions and some other useful data needed for execution.
Assembler: For every platform (Hardware + OS) we have an assembler. They are not universal,
since for each platform we have one. The output of the assembler is called an object file. It translates
assembly language into machine code.
INTERPRETER: An interpreter is a program that appears to execute a source program as if it were
machine language.

Figure 1.3: Interpreter

Languages such as BASIC, SNOBOL and LISP can be translated using interpreters. JAVA also uses an
interpreter. The process of interpretation can be carried out in the following phases:
Lexical analysis
Syntax analysis
Semantic analysis
Direct execution
Advantages:
Modification of the user program can easily be made and implemented as execution proceeds. The type of
object that a variable denotes may change dynamically. Debugging a program and finding errors is a
simplified task for a program used for interpretation.
The interpreter for the language makes it machine independent.
Disadvantages:
The execution of the program is slower. Memory consumption is more.

Relocatable Machine Code: It can be loaded at any point in memory and run. The addresses within the
program are arranged in such a way that the code can be moved around in memory.

Loader/Linker: It converts the relocatable code into absolute code and tries to run the program, resulting
in a running program or an error message (or sometimes both can happen). The linker combines a variety of
object files into a single file to make it executable. Then the loader loads it into memory and executes it.
Types of Compilers
Cross-Compilers − These are compilers that run on one machine and produce code for another machine.
A cross compiler is a compiler capable of creating executable code for a platform other than the one on
which the compiler is running. Cross compiler tools are used to create executables for embedded systems or for
several platforms.
Single-Pass Compiler − In a single-pass compiler, when a line of source is processed it is scanned and the
tokens are extracted. Then the syntax of the line is inspected and the tree structure and some tables
containing data about each token are constructed. Finally, after the semantic elements are checked for
correctness, the code is generated. The same process is repeated for each line of code until the whole program
is compiled. Usually, the entire compiler is built around the parser, which will call procedures that
perform different functions.
Multi-Pass Compiler − The compiler scans the input source once and makes the first modified structure,
therefore scans the first-produced form and makes a second modified structure, etc., until the object form
is produced. Such a compiler is known as a multi-pass compiler.
Structure of a Compiler
PHASES OF THE COMPILER
Phases of a compiler: A compiler operates in phases. A phase is a logically interrelated operation that
takes the source program in one representation and produces output in another representation. The phases of
a compiler are shown below.
There are two parts of compilation:
Analysis (Machine Independent / Language Dependent)
Synthesis (Machine Dependent / Language Independent)
The compilation process is partitioned into a number of sub-processes called “phases”.

We basically have two parts of compilation, namely the Analysis part and the Synthesis part. The analysis
part creates an intermediate representation from the given source code. The synthesis part creates an
equivalent target program from the intermediate representation.
Symbol Table – It is a data structure used and maintained by the compiler, consisting of all the
identifiers' names along with their types. It helps the compiler to function smoothly by finding the
identifiers quickly.
The analysis of a source program is divided into mainly three phases; together with the synthesis phases, the six phases of a compiler are:
1. Lexical Analyzer –
It is also called a scanner. It takes the output of the preprocessor (which performs file inclusion and
macro expansion) as its input, which is in a pure high-level language. It reads the characters from the
source program and groups them into lexemes (sequences of characters that “go together”). Each lexeme
corresponds to a token. Tokens are defined by regular expressions which are understood by the lexical
analyzer. It also reports lexical errors (e.g., erroneous characters) and removes comments and white space.
(A small tokenizer sketch is given at the end of this section.)
2. Syntax Analyzer –
It is sometimes called a parser. It constructs the parse tree. It takes all the tokens one by one and uses
Context-Free Grammar to construct the parse tree.
Why Grammar?
The rules of programming can be entirely represented in a few productions. Using these productions
we can represent what the program actually is. The input has to be checked whether it is in the desired
format or not.
The parse tree is also called the derivation tree. Parse trees are generally constructed to check for
ambiguity in the given grammar. There are certain rules associated with the derivation tree.
• Any identifier is an expression
• Any number can be called an expression
• Performing any operations in the given expression will always result in an expression. For
example, the sum of two expressions is also an expression.
• The parse tree can be compressed to form a syntax tree
Syntax errors can be detected at this level if the input is not in accordance with the grammar.

3. Semantic Analyzer –
It verifies the parse tree, checking whether it is meaningful or not. It furthermore produces a verified parse tree. It
also performs type checking, label checking, and flow-control checking.
4. Intermediate Code Generator –
It generates intermediate code, a form that can be readily translated into target machine code. We have
many popular intermediate code forms; three-address code is one example. (A short three-address-code
illustration is given at the end of this section.) The intermediate code is converted to
machine language using the last two phases, which are platform dependent.
Up to the intermediate code, the process is the same for every compiler; after that, it depends on the
platform. To build a new compiler we don’t need to build it from scratch. We can take the intermediate
code from an already existing compiler and build the last two parts.
5. Code Optimizer –
It transforms the code so that it consumes fewer resources and produces more speed. The meaning of the
code being transformed is not altered. Optimization can be categorized into two types: machine-
dependent and machine-independent.
6. Target Code Generator –
The main purpose of the Target Code generator is to write code that the machine can understand; it also
performs register allocation, instruction selection, etc. The output depends on the type of assembler. This
is the final stage of compilation. The optimized code is converted into relocatable machine code which
then forms the input to the linker and loader.
All these six phases are associated with the symbol table manager and the error handler, as shown in the
above block diagram.
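
To make the first and fourth phases concrete, here is a small illustrative sketch (not part of the original notes; the token names and the sample statement are made up). It shows how a lexical analyzer can be driven by regular expressions, and what three-address intermediate code for one assignment might look like.

import re

# Token classes defined by regular expressions, as used in the lexical-analysis phase
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),            # white space is discarded by the scanner
]
MASTER = re.compile("|".join("(?P<%s>%s)" % spec for spec in TOKEN_SPEC))

def tokenize(source):
    # Group characters into lexemes and return (token class, lexeme) pairs
    for m in MASTER.finditer(source):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

print(list(tokenize("x = a + b * 3")))
# [('ID', 'x'), ('OP', '='), ('ID', 'a'), ('OP', '+'), ('ID', 'b'), ('OP', '*'), ('NUMBER', '3')]

# After syntax and semantic analysis, the same statement could be lowered to
# three-address code, one operator per instruction:
#   t1 = b * 3
#   t2 = a + t1
#   x  = t2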
