Unit-1 F&CD
The scopes of the four grammar types are nested: every regular (Type-3) language is context-free (Type-2), every context-free language is context-sensitive (Type-1), and every context-sensitive language is recursively enumerable (Type-0). [Illustration of the scope of each type of grammar omitted]
Type - 3 Grammar
Type-3 grammars generate regular languages. A Type-3 grammar must have a single non-terminal on the
left-hand side and a right-hand side consisting of a single terminal or a single terminal followed by a single
non-terminal.
Type-3 grammars are accepted by finite automata.
The productions must be in the form X → a or X → aY
where X, Y ∈ N (non-terminals)
and a ∈ T (terminals)
The rule S → ε is allowed if S does not appear on the right side of any rule.
Example
X→ε
X → a | aY
Y→b
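The example grammar above can be checked mechanically. The following sketch (illustrative only; the dictionary encoding and the function name are our own) tests membership for X → ε | a | aY, Y → b by simulating the right-linear rules left to right, which is exactly why Type-3 grammars correspond to finite automata:

```python
# A minimal membership check for the Type-3 grammar
#   X -> epsilon | a | aY,   Y -> b
# Right-linear rules consume one terminal per step, so a single
# left-to-right pass with recursion on the trailing non-terminal suffices.

GRAMMAR = {
    "X": ["", "a", "aY"],   # X -> epsilon | a | aY
    "Y": ["b"],             # Y -> b
}

def derives(nonterminal, s):
    """Return True if `nonterminal` can derive the terminal string `s`."""
    for rhs in GRAMMAR[nonterminal]:
        if rhs == "":                       # rule of the form X -> epsilon
            if s == "":
                return True
        elif rhs[-1].isupper():             # rule X -> aY: match 'a', recurse on Y
            if s[:1] == rhs[0] and derives(rhs[-1], s[1:]):
                return True
        else:                               # rule X -> a: remainder must equal 'a'
            if s == rhs:
                return True
    return False

print([w for w in ["", "a", "ab", "b", "aa"] if derives("X", w)])
# ['', 'a', 'ab']
```

The derivable strings are exactly {ε, a, ab}, matching a hand derivation from the rules.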
Type - 2 Grammar
Type-2 grammars generate context-free languages. The productions must be in the form A → γ, where A ∈ N (a single non-terminal) and γ is a string of terminals and non-terminals.
Type-2 grammars are accepted by pushdown automata.
Example
S → Xa
X → a | aX | abc | ε
Type - 0 Grammar
Type-0 grammars generate recursively enumerable languages. The productions have no restrictions; they
include all phrase-structure (formal) grammars.
They generate the languages that are recognized by a Turing machine.
The productions can be in the form α → β, where α is a string of terminals and non-terminals containing at
least one non-terminal (so α cannot be null), and β is a string of terminals and non-terminals.
Example
S → ACaB
Bc → acB
CB → DB
aD → Db
A CFG is in Chomsky Normal Form if the productions are in the following forms −
• A → a
• A → BC
• S → ε
where A, B, and C are non-terminals, a is a terminal, and S is the start symbol.
Algorithm to Convert into Chomsky Normal Form −
Step 1 − If the start symbol S occurs on some right side, create a new start symbol S’ and a new
production S’→ S.
Step 2 − Remove Null productions. (Using the Null production removal algorithm discussed earlier)
Step 3 − Remove unit productions. (Using the Unit production removal algorithm discussed earlier)
Step 4 − Replace each production A → B1…Bn where n > 2 with A → B1C where C → B2…Bn. Repeat
this step for every production having more than two symbols on the right side.
Step 5 − If the right side of any production is in the form A → aB, where a is a terminal and A, B are
non-terminals, replace the production with A → XB and X → a. Repeat this step for every production
of the form A → aB.
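Step 4 can be sketched in a few lines. The following illustrative Python (the fresh names C1, C2, … and the list-of-pairs encoding of productions are our own assumptions, not part of the algorithm's statement) breaks every long right-hand side into a chain of two-symbol productions:

```python
# Step 4 of CNF conversion: replace A -> B1 B2 ... Bn (n > 2) by a chain
#   A -> B1 C1, C1 -> B2 C2, ..., C(n-2) -> B(n-1) Bn
# introducing fresh non-terminals C1, C2, ... along the way.

def break_long_rhs(productions):
    """productions: list of (head, rhs) pairs, rhs a list of symbols."""
    result = []
    counter = 0
    for head, rhs in productions:
        while len(rhs) > 2:
            counter += 1
            fresh = f"C{counter}"                 # fresh non-terminal
            result.append((head, [rhs[0], fresh]))
            head, rhs = fresh, rhs[1:]            # continue with the tail
        result.append((head, rhs))
    return result

print(break_long_rhs([("A", ["B1", "B2", "B3", "B4"])]))
# [('A', ['B1', 'C1']), ('C1', ['B2', 'C2']), ('C2', ['B3', 'B4'])]
```

Each introduced production has exactly two non-terminals on the right, as CNF requires.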
Problem
Convert the following CFG into CNF
S → ASA | aB, A → B | S, B → b | ε
Solution
(1) Since S appears on the R.H.S., we add a new start symbol S0 and the production S0 → S to the
production set, and it becomes −
S0 → S, S → ASA | aB, A → B | S, B → b | ε
(a+b)* − Set of strings of a's and b's of any length, including the null string. So L = {ε, a, b, aa, ab, bb, ba, aaa, …}
(a+b)*abb − Set of strings of a's and b's ending with the string abb. So L = {abb, aabb, babb, aaabb, ababb, …}
Any set that represents the value of the Regular Expression is called a Regular Set.
Properties of Regular Sets
Property 1. The union of two regular sets is regular.
Proof −
Let us take two regular expressions
RE1 = a(aa)* and RE2 = (aa)*
So, L1 = {a, aaa, aaaaa,.....} (Strings of odd length excluding Null)
and L2 ={ ε, aa, aaaa, aaaaaa,.......} (Strings of even length including Null)
L1 ∪ L2 = { ε, a, aa, aaa, aaaa, aaaaa, aaaaaa,.......}
(Strings of all possible lengths including Null)
RE (L1 ∪ L2) = a* (which is a regular expression itself)
Hence, proved.
Property 2. The intersection of two regular sets is regular.
Proof −
Let us take two regular expressions
RE1 = a(a*) and RE2 = (aa)*
So, L1 = { a,aa, aaa, aaaa, ....} (Strings of all possible lengths excluding Null)
L2 = { ε, aa, aaaa, aaaaaa,.......} (Strings of even length including Null)
L1 ∩ L2 = { aa, aaaa, aaaaaa,.......} (Strings of even length excluding Null)
RE (L1 ∩ L2) = aa(aa)* which is a regular expression itself.
Hence, proved.
Property 3. The complement of a regular set is regular.
Proof −
Let us take a regular expression −
RE = (aa)*
So, L = {ε, aa, aaaa, aaaaaa, .......} (Strings of even length including Null)
The complement of L is the set of all strings that are not in L.
So, L’ = {a, aaa, aaaaa, .....} (Strings of odd length excluding Null)
RE (L’) = a(aa)* which is a regular expression itself.
Hence, proved.
Property 4. The difference of two regular sets is regular.
Proof −
Let us take two regular expressions −
RE1 = a (a*) and RE2 = (aa)*
So, L1 = {a, aa, aaa, aaaa, ....} (Strings of all possible lengths excluding Null)
L2 = { ε, aa, aaaa, aaaaaa,.......} (Strings of even length including Null)
L1 – L2 = {a, aaa, aaaaa, aaaaaaa, ....}
(Strings of all odd lengths excluding Null)
RE (L1 – L2) = a (aa)* which is a regular expression.
Hence, proved.
Property 5. The reversal of a regular set is regular.
Proof −
We have to prove that LR is also regular if L is a regular set.
Let L = {01, 10, 11}
RE (L) = 01 + 10 + 11
LR = {10, 01, 11}
RE (LR) = 10 + 01 + 11, which is regular
Hence, proved.
Property 6. The closure (Kleene star) of a regular set is regular.
Proof −
If L = {a, aaa, aaaaa, .......} (Strings of odd length excluding Null)
i.e., RE (L) = a (aa)*
L* = {ε, a, aa, aaa, aaaa, aaaaa, ……} (Strings of all lengths including Null, since concatenating odd-length strings reaches every length)
RE (L*) = a*
Hence, proved.
Property 7. The concatenation of two regular sets is regular.
Proof −
Let RE1 = (0+1)*0 and RE2 = 01(0+1)*
Here, L1 = {0, 00, 10, 000, 010, ......} (Set of strings ending in 0)
and L2 = {01, 010,011,.....} (Set of strings beginning with 01)
Then, L1 L2 = {001,0010,0011,0001,00010,00011,1001,10010,.............}
Set of strings containing 001 as a substring which can be represented by an RE − (0 + 1)*001(0 + 1)*
Hence, proved.
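These closure properties can be checked empirically on short strings (a sanity check, not a proof). The sketch below uses Python's `re` module to compare L1 L2 against the claimed expression (0 + 1)*001(0 + 1)* over all binary strings up to length 6; the helper names are our own:

```python
# Empirical check of the concatenation property: L1 = strings ending in 0,
# L2 = strings beginning with 01, and L1 L2 should equal the language of
# (0+1)*001(0+1)* when both are restricted to strings of length <= 6.
import re
from itertools import product

def strings(max_len, alphabet="01"):
    """Yield all strings over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)

def lang(pattern, max_len=6):
    """The set of strings up to max_len fully matching `pattern`."""
    return {w for w in strings(max_len) if re.fullmatch(pattern, w)}

L1 = lang(r"(0|1)*0")       # set of strings ending in 0
L2 = lang(r"01(0|1)*")      # set of strings beginning with 01

# Concatenation L1 L2, kept within the same length bound for a fair comparison.
concat = {u + v for u in L1 for v in L2 if len(u + v) <= 6}
assert concat == lang(r"(0|1)*001(0|1)*")
print("concatenation check passed on all binary strings up to length 6")
```

Splitting any string containing 001 right after the first 0 of that substring shows why the two sides agree.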
Identities Related to Regular Expressions
Given R, P, L, Q as regular expressions, the following identities hold −
• ∅* = ε
• ε* = ε
• RR* = R*R
• R*R* = R*
• (R*)* = R*
• (PQ)*P =P(QP)*
• (a+b)* = (a*b*)* = (a*+b*)* = (a+b*)* = a*(ba*)*
• R + ∅ = ∅ + R = R (The identity for union)
• R ε = ε R = R (The identity for concatenation)
• ∅ L = L ∅ = ∅ (The annihilator for concatenation)
• R + R = R (Idempotent law)
• L (M + N) = LM + LN (Left distributive law)
• (M + N) L = ML + NL (Right distributive law)
• ε + RR* = ε + R*R = R*
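One of these identities, (PQ)*P = P(QP)*, can be spot-checked for a concrete choice of P and Q (again a sanity check, not a proof; the function names are our own):

```python
# Spot-check of the shifting identity (PQ)*P = P(QP)* for P = "a", Q = "b":
# both sides denote { a, aba, ababa, ... }, i.e. a followed by repetitions of ba.

def pq_star_p(p, q, reps):
    """Strings (pq)^k p for k = 0..reps."""
    return {(p + q) * k + p for k in range(reps + 1)}

def p_qp_star(p, q, reps):
    """Strings p (qp)^k for k = 0..reps."""
    return {p + (q + p) * k for k in range(reps + 1)}

assert pq_star_p("a", "b", 4) == p_qp_star("a", "b", 4)
print(sorted(pq_star_p("a", "b", 2)))
# ['a', 'aba', 'ababa']
```

Both constructions produce a(ba)^k, which is why the identity holds for arbitrary regular expressions P and Q.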
Example (DFA)
The transition function δ is shown below −
q δ(q,0) δ(q,1)
a a b
b c a
c b c
Example
Let a non-deterministic finite automaton be →
• Q = {a, b, c}
• ∑ = {0, 1}
• q0 = {a}
• F = {c}
The transition function δ is as shown below −
q δ(q,0) δ(q,1)
a {a, b} {b}
b {c} {a, c}
c {b, c} {c}
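The NDFA above can be run by tracking the set of currently possible states. A minimal sketch (the dictionary encoding of δ is our own choice):

```python
# Subset simulation of the NDFA with Q = {a, b, c}, start a, final {c}.
# After each input symbol, `current` holds every state the NDFA could be in.
DELTA = {
    ("a", "0"): {"a", "b"}, ("a", "1"): {"b"},
    ("b", "0"): {"c"},      ("b", "1"): {"a", "c"},
    ("c", "0"): {"b", "c"}, ("c", "1"): {"c"},
}
START, FINAL = "a", {"c"}

def accepts(word):
    current = {START}
    for symbol in word:
        # union of the next-state sets of every currently possible state
        current = set().union(*(DELTA.get((q, symbol), set()) for q in current))
    return bool(current & FINAL)    # accepted iff some final state is reachable

print(accepts("01"), accepts("1"))
# True False
```

On "01" the state set evolves {a} → {a, b} → {a, b, c}, which contains the final state c, so the string is accepted.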
DFA vs NDFA
The following table lists the differences between DFA and NDFA.
DFA − The transition from a state is to a single particular next state for each input symbol; hence it is called deterministic.
NDFA − The transition from a state can be to multiple next states for each input symbol; hence it is called non-deterministic.
DFA − Empty-string (ε-move) transitions are not seen in a DFA.
NDFA − An NDFA permits empty-string (ε-move) transitions.
DFA − The number of states is greater, so designing and understanding are more difficult.
NDFA − The number of states is smaller, so designing and understanding are easier.
Acceptability by DFA and NDFA
• A string is accepted by a DFA/NDFA iff the DFA/NDFA starting at the initial state ends in an accepting
state (any of the final states) after reading the string wholly.
• A string S is accepted by a DFA/NDFA (Q, ∑, δ, q0, F), iff
δ*(q0, S) ∈ F
Strings accepted by the example DFA (diagram omitted): {0, 00, 11, 010, 101, ...}
Strings not accepted by the example DFA: {1, 011, 111, ...}
1.8 Conversion of NFA to DFA
Problem Statement
Let X = (Qx, ∑, δx, q0, Fx) be an NFA which accepts the language L(X). We have to design an equivalent
DFA Y = (Qy, ∑, δy, q0, Fy) such that L(Y) = L(X). The following procedure converts the NDFA to its
equivalent DFA −
Input − An NDFA.
Output − An equivalent DFA.
Step 1 – Start from the initial state of the NFA and write that state within [ ].
Step 2 – Place the next states for the initial state, for each input, in the next-state columns, also within [ ].
Step 3 – If any new combination of states appears in a next-state column that has not yet been taken in the
present-state column, add that combination to the present-state column.
Step 4 – If more than one state appears in the present-state column, the next state for that combination is
the union of the next states of each of its states.
Step 5 – If no new combination of states appears that has not yet been taken in the present-state column,
stop the process.
Step 6 – The initial state of the constructed DFA is the initial state of the NFA.
Step 7 – The final states of the constructed DFA are the combinations containing at least one final state of
the NFA.
Example
Let us consider the NDFA shown in the figure below.
q δ(q,0) δ(q,1)
a {a,b,c,d,e} {d,e}
b {c} {e}
c ∅ {b}
d {e} ∅
e ∅ ∅
Using the above algorithm, we find its equivalent DFA. The full state table of the DFA, starting from [a], is shown below.
q δ(q,0) δ(q,1)
[a] [a,b,c,d,e] [d,e]
[a,b,c,d,e] [a,b,c,d,e] [b,d,e]
[b,d,e] [c,e] [e]
[d,e] [e] ∅
[c,e] ∅ [b]
[b] [c] [e]
[c] ∅ [b]
[e] ∅ ∅
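Steps 1–7 amount to the classic subset construction. A hedged sketch in Python, applied to the NDFA of this example (representing DFA states as frozensets is our own encoding choice):

```python
# Subset construction for the example NDFA over {a..e} with alphabet {0, 1}.
# Each DFA state is a frozenset of NFA states; new combinations found in the
# next-state columns are queued until no new combination appears (Step 5).
NFA = {
    ("a", "0"): {"a", "b", "c", "d", "e"}, ("a", "1"): {"d", "e"},
    ("b", "0"): {"c"},                     ("b", "1"): {"e"},
    ("c", "0"): set(),                     ("c", "1"): {"b"},
    ("d", "0"): {"e"},                     ("d", "1"): set(),
    ("e", "0"): set(),                     ("e", "1"): set(),
}

def subset_construction(start, symbols=("0", "1")):
    start_state = frozenset({start})
    table, pending = {}, [start_state]
    while pending:                                   # Steps 3 and 5
        state = pending.pop()
        if state in table:
            continue
        table[state] = {}
        for s in symbols:                            # Step 4: union of next states
            nxt = frozenset().union(*(NFA[(q, s)] for q in state))
            table[state][s] = nxt
            if nxt not in table:
                pending.append(nxt)
    return table

dfa = subset_construction("a")
print(len(dfa))   # 9 reachable DFA states (including the dead state ∅)
```

The rows for [d,e], [e], [c,e], and [c] in the table above come out exactly as `subset_construction` computes them.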
Method
Step 1 Construct an NFA with Null moves from the given regular expression.
Step 2 Remove Null transition from the NFA and convert it into its equivalent DFA.
Problem
Convert the following RE into its equivalent DFA − 1 (0 + 1)* 0
Solution
We will concatenate three expressions "1", "(0 + 1)*" and "0"
Now we will remove the ε transitions. After we remove the ε transitions from the NDFA, we get the
following −
It is an NDFA corresponding to the RE − 1 (0 + 1)* 0. If you want to convert it into a DFA, simply apply
the method of converting NDFA to DFA .
Finite Automata with Null Moves (NFA-ε)
A finite automaton with null moves (FA-ε) can make a transition not only on an input symbol from the
alphabet but also without any input symbol. Such a transition without input is called a null move.
An NFA-ε is represented formally by a 5-tuple (Q, ∑, δ, q0, F), consisting of
• Q − a finite set of states
• ∑ − a finite set of input symbols
• δ − a transition function δ : Q × (∑ ∪ {ε}) → 2^Q
• q0 − an initial state, q0 ∈ Q
• F − a set of final states, F ⊆ Q
If in an NDFA there is an ε-move from vertex X to vertex Y, we can remove it using the following steps
−
Solution
Step 1 −
Here the ε transition is between q1 and qf, so let X = q1 and Y = qf.
The outgoing edges from qf go to qf for inputs 0 and 1.
Step 2 −
Now we copy all these edges onto q1, without changing the edges from qf, and get the following FA −
Step 3 −
Here q1 is an initial state, so we make qf also an initial state.
So the FA becomes −
Step 4 −
Here qf is a final state, so we make q1 also a final state.
So the FA becomes −
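The same idea generalizes: compute each state's ε-closure and copy the closure's outgoing edges back to the state. A small illustrative sketch (the two-state machine mirrors the q1/qf example above; the encoding and names are our own):

```python
# Eliminating epsilon-moves: for each state q, every labelled edge leaving
# any state in eps_closure(q) is copied back to q.  The machine below has
# one epsilon-move q1 --eps--> qf, with qf looping on 0 and 1.
EPS = "eps"
NFA_E = {
    ("q1", EPS): {"qf"},
    ("qf", "0"): {"qf"},
    ("qf", "1"): {"qf"},
}

def eps_closure(state):
    """All states reachable from `state` using only epsilon-moves."""
    closure, stack = {state}, [state]
    while stack:
        q = stack.pop()
        for r in NFA_E.get((q, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

def remove_eps(states, symbols=("0", "1")):
    """Build an epsilon-free transition function over `states`."""
    delta = {}
    for q in states:
        for s in symbols:
            # Step 2 of the procedure: copy the closure's labelled edges to q.
            delta[(q, s)] = set().union(
                *(NFA_E.get((r, s), set()) for r in eps_closure(q)))
    return delta

print(remove_eps({"q1", "qf"})[("q1", "0")])
# {'qf'}
```

After removal, q1 gains direct 0- and 1-edges to qf, just as Step 2 above copies qf's edges onto q1.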
Example 1:
Solution: First we construct the transition diagram for the given regular expression, then eliminate the
ε-transitions step by step (the intermediate diagrams are omitted).
Now we have an NFA without ε. To convert it into the required DFA, we first write a transition table for
this NFA.
State 0 1
→q0 q3 {q1, q2}
q1 qf ϕ
q2 ϕ q3
q3 q3 qf
*qf ϕ ϕ
The transition table for the equivalent DFA is −
State 0 1
→[q0] [q3] [q1, q2]
[q1, q2] [qf] [q3]
[q1] [qf] Φ
[q2] Φ [q3]
[q3] [q3] [qf]
*[qf] Φ Φ
1.10 Introduction to Compilers: phases of the compiler
The compiler is software that converts a program written in a high-level language (the source language)
into a low-level language (the object/target/machine language, i.e., 0s and 1s).
TYPES OF TRANSLATORS −
• Interpreter
• Compiler
• Preprocessor
The compiler is a type of translator, which takes a program written in a high-level programming language
as input and translates it into an equivalent program in low-level languages such as machine language or
assembly language.
❖ The program written in a high-level language is known as a source program, and the program
converted into low-level language is known as an object (or target) program.
❖ Moreover, the compiler traces the errors in the source program and generates the error report.
Without compilation, no program written in a high-level language can be executed.
❖ After compilation, only the program in machine language is loaded into the memory for
execution.
❖ For every programming language, we have a different compiler; however, the basic tasks
performed by every compiler are the same.
LIST OF COMPILERS
Ada compilers
ALGOL compilers
BASIC compilers
C# compilers
C++ compilers
COBOL compilers
Common Lisp compilers
Java compilers
Pascal compilers
PL/I compilers
Python compilers
Smalltalk compilers
CIL compilers
High-Level Language: A program that contains preprocessor directives such as #include or #define is
written in a high-level language (HLL). High-level languages are closer to humans but farther from
machines. These (#) tags are called preprocessor directives; they direct the pre-processor about what to do.
Pre-Processor: The pre-processor resolves all the #include directives by inserting the referenced files (file
inclusion) and all the #define directives by macro expansion. It performs file inclusion, augmentation,
macro-processing, etc.
Assembly Language: It’s neither in binary form nor high level. It is an intermediate state that is a
combination of machine instructions and some other useful data needed for execution.
Assembler: For every platform (hardware + OS) we have an assembler. Assemblers are not universal,
since there is one for each platform. The output of the assembler is called an object file. It translates
assembly language to machine code.
INTERPRETER: An interpreter is a program that appears to execute a source program as if it were
machine language.
Languages such as BASIC, SNOBOL, and LISP can be translated using interpreters. Java also uses an
interpreter. The process of interpretation is carried out in the following phases:
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Direct execution
Advantages:
Modification of the user program can easily be made and applied as execution proceeds. The type of
object that a variable denotes may change dynamically. Debugging a program and finding errors is
simplified for a program used for interpretation. The interpreter for the language makes it machine
independent.
Disadvantages:
The execution of the program is slower, and memory consumption is higher.
Relocatable Machine Code: It can be loaded at any point and can be run. The address within the
program will be in such a way that it will cooperate with the program movement.
Loader/Linker: It converts the relocatable code into absolute code and tries to run the program, resulting
in a running program or an error message (or sometimes both). The linker combines a variety of object
files into a single executable file; the loader then loads it into memory and executes it.
Types of Compilers
Cross-Compilers − These compilers run on one machine and generate code for another machine. A cross
compiler produces executable code for a platform other than the one on which the compiler itself runs.
Cross-compilation tools are used to create executables for embedded systems or for multiple platforms.
Single-Pass Compiler − In a single-pass compiler, when a line of source is processed it is scanned and the
tokens are extracted. The syntax of the line is then inspected, and the tree structure and some tables
containing data about each token are constructed. Finally, after the semantic element is tested for
correctness, the code is generated. The same process is repeated for each line of code until the whole program
is compiled. Usually, the entire compiler is built around the parser, which calls procedures that
perform different functions.
Multi-Pass Compiler − The compiler scans the input source once and makes the first modified structure,
therefore scans the first-produced form and makes a second modified structure, etc., until the object form
is produced. Such a compiler is known as a multi-pass compiler.
Structure of a Compiler
PHASES OF THE COMPILER
Phases of a compiler: A compiler operates in phases. A phase is a logically interrelated operation that
takes the source program in one representation and produces output in another representation. The phases
of a compiler are shown below.
There are two parts of compilation:
Analysis (machine independent / language dependent)
Synthesis (machine dependent / language independent)
The compilation process is partitioned into a number of sub-processes called "phases". The analysis
part creates an intermediate representation from the given source code. The synthesis part creates an
equivalent target program from the intermediate representation.
Symbol Table – It is a data structure being used and maintained by the compiler, consisting of all the
identifier’s names along with their types. It helps the compiler to function smoothly by finding the
identifiers quickly.
The analysis of a source program is divided into mainly three phases. They are:
1. Lexical Analyzer –
It is also called a scanner. It takes the output of the preprocessor (which performs file inclusion and
macro expansion) as the input which is in a pure high-level language. It reads the characters from the
source program and groups them into lexemes (sequence of characters that “go together”). Each lexeme
corresponds to a token. Tokens are defined by regular expressions which are understood by the lexical
analyzer. It also removes comments and white space and reports lexical errors (e.g., erroneous characters).
2. Syntax Analyzer –
It is sometimes called a parser. It constructs the parse tree. It takes all the tokens one by one and uses
Context-Free Grammar to construct the parse tree.
Why Grammar?
The rules of programming can be entirely represented in a few productions. Using these productions
we can represent what the program actually is. The input has to be checked whether it is in the desired
format or not.
The parse tree is also called the derivation tree. Parse trees are generally constructed to check for
ambiguity in the given grammar. There are certain rules associated with the derivation tree.
• Any identifier is an expression
• Any number can be called an expression
• Performing any operations in the given expression will always result in an expression. For
example, the sum of two expressions is also an expression.
• The parse tree can be compressed to form a syntax tree
Syntax error can be detected at this level if the input is not in accordance with the grammar.
3. Semantic Analyzer –
It verifies the parse tree, whether it’s meaningful or not. It furthermore produces a verified parse tree. It
also does type checking, Label checking, and Flow control checking.
4. Intermediate Code Generator –
It generates intermediate code, which is a form that can be readily executed by a machine. We have
many popular intermediate codes, for example three-address code. Intermediate code is converted to
machine language using the last two phases, which are platform dependent.
Till intermediate code, it is the same for every compiler out there, but after that, it depends on the
platform. To build a new compiler we don’t need to build it from scratch. We can take the intermediate
code from the already existing compiler and build the last two parts.
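Three-address code can be illustrated with a tiny generator (a sketch; the tuple encoding of the expression tree and the temporary names t1, t2, … are our own):

```python
# Flattening the expression a + b * c into three-address code: each
# instruction has at most one operator, with results held in
# compiler-generated temporaries t1, t2, ...

def gen_tac(node, code, counter=[0]):
    """Emit TAC for `node` into `code`; return the name holding its value.
    (The mutable-default counter is a brevity hack for this sketch.)"""
    if isinstance(node, str):           # leaf: a variable name
        return node
    op, left, right = node              # interior node: (operator, left, right)
    l = gen_tac(left, code, counter)
    r = gen_tac(right, code, counter)
    counter[0] += 1
    temp = f"t{counter[0]}"
    code.append(f"{temp} = {l} {op} {r}")
    return temp

code = []
gen_tac(("+", "a", ("*", "b", "c")), code)
print(code)
# ['t1 = b * c', 't2 = a + t1']
```

The multiplication is emitted first because its result t1 is an operand of the addition, mirroring how a real intermediate-code generator walks the syntax tree bottom-up.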
5. Code Optimizer –
It transforms the code so that it consumes fewer resources and produces more speed. The meaning of the
code being transformed is not altered. Optimization can be categorized into two types: machine-
dependent and machine-independent.
6. Target Code Generator –
The main purpose of the target code generator is to produce code that the machine can understand, and
it also performs register allocation, instruction selection, etc. The output depends on the type of assembler. This
is the final stage of compilation. The optimized code is converted into relocatable machine code which
then forms the input to the linker and loader.
All six phases interact with the symbol table manager and the error handler, as shown in the block
diagram.