

1. What is the relationship between Automata and Complexity Theory and Compiler Design? How do automata and complexity theory help in compiler design? Elaborate in detail.
Automata theory plays a crucial role in compiler design, particularly in the early stage of lexical analysis: it provides a formal framework for recognizing patterns (like keywords, identifiers, and operators) within the source code using finite automata (FA). Complexity theory, in turn, helps analyze the time and space requirements of different compiler algorithms, ensuring efficient compilation by identifying potential bottlenecks and guiding optimization and code generation.

How Automata Theory helps in Compiler Design:
 Lexical Analysis:
The most prominent application of automata in compilers is in the lexical analysis phase, where a deterministic finite automaton (DFA) scans the input source code and identifies tokens (meaningful units like variables, operators, and keywords) based on predefined patterns written as regular expressions (a minimal sketch follows this list).
 Pattern Recognition:
By defining states and transitions in a DFA, the compiler can effectively recognize
specific sequences of characters that constitute valid tokens in the programming
language.
 Error Detection:
When an input sequence does not match any valid transition in the DFA, this indicates a lexical error, allowing the compiler to identify and report such errors early in the compilation process.
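A minimal sketch of this phase in Python, using the standard re module to stand in for the DFA that the token patterns compile to (the token names and patterns here are illustrative assumptions, not a real language specification):

import re

# Illustrative token patterns; a real language would define many more.
TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:if|else|while|return)\b"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER",     r"\d+"),
    ("OPERATOR",   r"[+\-*/=<>]"),
    ("SKIP",       r"[ \t\n]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:                      # no pattern matches: a lexical error
            raise SyntaxError(f"lexical error at position {pos}: {source[pos]!r}")
        if m.lastgroup != "SKIP":          # discard whitespace
            yield (m.lastgroup, m.group())
        pos = m.end()

print(list(tokenize("if x1 = y + 42")))
# [('KEYWORD', 'if'), ('IDENTIFIER', 'x1'), ('OPERATOR', '='), ...]

In production lexers the regular expressions are typically compiled into an explicit DFA table (for example by a generator such as Lex), but the matching behavior is the same.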
How Complexity Theory helps in Compiler Design:
 Algorithm Analysis:
Complexity theory provides tools to analyze the time and space complexity of different
compiler algorithms, such as parsing algorithms (like recursive descent or LR parsing)
and code generation techniques, allowing developers to choose the most efficient
approach for a given scenario.
 Optimization Strategies:
By understanding the computational complexity of different optimization techniques,
compiler designers can identify which optimizations are most likely to yield significant
performance improvements without introducing excessive overhead.
 Trade-offs in Design Decisions:
Complexity analysis helps in making informed decisions about the trade-offs between
different compiler design choices, such as choosing a more complex parsing algorithm
for potentially better optimization opportunities.
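For instance, a table-driven LR parser runs in time linear in the number of input tokens, while a general context-free parsing algorithm such as CYK takes cubic time; a compiler writer who can express the grammar in an LR-parsable form therefore gains an asymptotic advantage on every compilation.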
Key Concepts:
 Finite Automata (FA):
A basic model of computation used in lexical analysis to recognize patterns in input
strings.
 Regular Expressions:
A notation used to describe patterns recognized by a finite automaton, which are often
used to define the syntax of tokens in a programming language.
 Time Complexity:
Measures the execution time of an algorithm as a function of the input size, typically expressed in big-O notation.
 Space Complexity:
Measures the amount of memory required by an algorithm as a function of the input
size.
Example:
 Identifying a valid identifier: A compiler might use a DFA to recognize a valid identifier in
a programming language, where the initial state represents the start of the identifier,
and transitions are made based on whether the current character is a letter or a digit.
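The same idea can be written as an explicit DFA. A minimal sketch in Python, assuming (as the example above does) that a valid identifier starts with a letter and continues with letters or digits:

def is_valid_identifier(s):
    # Two states: "START" (nothing read yet) and "IN_ID" (accepting).
    state = "START"
    for ch in s:
        if state == "START" and ch.isalpha():      # first character must be a letter
            state = "IN_ID"
        elif state == "IN_ID" and ch.isalnum():    # then letters or digits
            state = "IN_ID"
        else:
            return False                           # no valid transition: reject
    return state == "IN_ID"                        # accept only if we end in IN_ID

print(is_valid_identifier("count2"))   # True
print(is_valid_identifier("2count"))   # False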

2. What is the difference between a Parse Tree, an Abstract Syntax Tree, and a Directed Acyclic Graph? Describe clearly with an appropriate example.

A parse tree represents the complete syntactic structure of a string according to a grammar, capturing all details including parentheses and operator precedence. An abstract syntax tree (AST) is a simplified version of a parse tree that focuses on the essential structure of the code, removing details like parentheses that are implied by the tree shape. A directed acyclic graph (DAG) is a further optimized version of an AST in which identical sub-expressions are shared to reduce redundancy, representing the data flow without cycles.
Example:

Consider the expression "a * (b + c) - d":


 Parse Tree:
 Root: "Expression"
 Children:
 "Expression" for "a * (b + c)", which expands further into grammar nodes for "a", the operator "*", and the parenthesized sub-expression "(b + c)", including nodes for the parentheses themselves
 Operator: "-"
 "Term": "d"
 Abstract Syntax Tree (AST):
 Root: "-" (children "*" and "d")
 "*" (children "a" and "+")
 "+" (children "b" and "c")
 The parentheses are gone; the tree shape alone encodes the evaluation order.
 Directed Acyclic Graph (DAG):
 Nodes:
 "Variable": "a", "b", "c", "d"
 "BinaryOperation": "+" (links to "b" and "c")
 "BinaryOperation": "*" (links to "a" and "+")
 "BinaryOperation": "-" (links to "*" and "d")
 Note that "a * (b + c) - d" contains no repeated sub-expression, so its DAG has the same shape as its AST; in an expression such as "a * (b + c) - (b + c)", the single "+" node would be shared by both uses.
Key points:
 Parse Tree:
Represents the full structure of the input string, including all syntax details.
 AST:
Removes unnecessary details from the parse tree, focusing on the essential structure
of the code.
 DAG:
Further optimizes the AST by sharing identical sub-expressions, representing data
flow without cycles.
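A minimal sketch in Python of how a compiler can build a DAG rather than a tree: before creating a node it consults a table keyed by the operator and children, so an identical sub-expression is constructed only once (the Node class and make helper are illustrative, not taken from any particular compiler):

class Node:
    def __init__(self, label, *children):
        self.label, self.children = label, children

_nodes = {}  # (label, child identities) -> existing Node: enables sharing

def make(label, *children):
    key = (label,) + tuple(id(c) for c in children)
    if key not in _nodes:                # reuse an identical sub-expression if present
        _nodes[key] = Node(label, *children)
    return _nodes[key]

# Build "a * (b + c) - (b + c)"; the "+" node is created once and shared.
a, b, c = make("a"), make("b"), make("c")
plus = make("+", b, c)
root = make("-", make("*", a, plus), make("+", b, c))
print(root.children[1] is plus)          # True: the second "+" reuses the first

This lookup technique is commonly known as hash-consing.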
3. What is a symbol table and why is each compiler phase connected to it? Discuss in detail.

The symbol table is used to store essential information about every symbol contained within the program. Virtually every phase of the compiler will use the symbol table. The initialization phase will place keywords, operators, and standard identifiers in it.

A symbol table is a data structure used by a compiler to store information about identifiers (variables, functions, constants) encountered during the compilation process, including their data types, scope, memory locations, and other relevant attributes. It is crucial to almost every phase of compilation, since it allows the compiler to check for semantic correctness and to generate appropriate code based on each identifier's context throughout the program.

How each compiler phase interacts with the symbol table:


 Lexical Analysis:
During this phase, the source code is broken down into tokens (keywords, identifiers,
operators, etc.). When an identifier is recognized, its name is added to the symbol
table along with its data type (if declared).
 Syntax Analysis (Parsing):
The parser uses the information from the symbol table to ensure the syntax of the
program is correct. It checks if the identifiers used in expressions are valid and have
appropriate data types based on their entries in the symbol table.
 Semantic Analysis:
This phase heavily relies on the symbol table to check for semantic errors like
undeclared variables, type mismatches, and scope violations.
For example, if a variable is used before its declaration, the symbol table will have no entry for it, which signals an error.
 Intermediate Code Generation:
The symbol table provides information about the data type and memory location of
variables, which is essential for generating intermediate code that accurately
represents the program's operations.
 Code Optimization:
Optimizers can use the symbol table to analyze the usage of variables and make
informed decisions about code optimizations, such as eliminating redundant
calculations or optimizing memory access.
 Code Generation:
The final machine code generation phase uses the symbol table to determine the
appropriate memory addresses and register assignments for each identifier, ensuring
the generated code correctly reflects the program's variables and functions.
Key points about the symbol table:
 Scope Management:
The symbol table often employs a hierarchical structure to manage the scope of
variables, where entries within a local block are only accessible within that block.
 Data Structures:
Hash tables are commonly used to efficiently search for entries in the symbol table
due to the frequent lookups required during compilation.
 Dynamic Updates:
As the compiler progresses through the source code, the symbol table is constantly
updated with new entries for declared identifiers.
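A minimal sketch of such a scoped symbol table in Python, using a stack of hash tables, one per scope (the attribute fields stored per identifier are illustrative assumptions):

class SymbolTable:
    def __init__(self):
        self.scopes = [{}]                     # global scope at the bottom

    def enter_scope(self):
        self.scopes.append({})                 # entering a block adds a hash table

    def exit_scope(self):
        self.scopes.pop()                      # leaving a block discards its names

    def declare(self, name, **attrs):          # attrs: e.g. type, memory location
        if name in self.scopes[-1]:
            raise NameError(f"redeclaration of {name!r}")
        self.scopes[-1][name] = attrs

    def lookup(self, name):
        for scope in reversed(self.scopes):    # innermost scope wins
            if name in scope:
                return scope[name]
        raise NameError(f"undeclared identifier {name!r}")

st = SymbolTable()
st.declare("x", type="int")
st.enter_scope()
st.declare("x", type="float")                  # shadows the outer x
print(st.lookup("x"))                          # {'type': 'float'}
st.exit_scope()
print(st.lookup("x"))                          # {'type': 'int'}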

4. Why is intermediate code mostly machine independent? Explain clearly in your own words.

Machine-independent intermediate code enhances portability.
For example, suppose a compiler translates the source language directly to its target machine language, without the option of generating intermediate code; then for each new machine a full native compiler is required, because the compiler itself must be modified according to each machine's specifications.
With a machine-independent intermediate code, by contrast:
Retargeting is facilitated.
It is easier to improve the performance of the compiled program by optimizing the intermediate code.
If we generate machine code directly from source code, then for n target machines we need n optimizers and n code generators; with a machine-independent intermediate code, we need only one optimizer.
Intermediate code can be either language-specific (e.g., bytecode for Java) or language-independent (e.g., three-address code). The following are commonly used intermediate code representations:
1. Postfix Notation:
Also known as reverse Polish notation or suffix notation.
In infix notation, the operator is placed between operands, e.g., a + b. Postfix notation positions the operator at the right end, as in ab +.
For any postfix expressions e1 and e2 and a binary operator (+), applying the operator yields e1 e2 +.
Postfix notation eliminates the need for parentheses, as the operator's position and arity allow unambiguous decoding of the expression.
In postfix notation, the operator always follows its operands.
Example 1: The postfix representation of the expression (a + b) * c is: ab + c *
Example 2: The postfix representation of the expression (a – b) * (c + d) + (a – b) is: ab – cd + * ab – +
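A minimal sketch in Python of why no parentheses are needed: a single left-to-right pass with a stack decodes any postfix expression unambiguously (the single-letter operands and sample values are assumptions for illustration):

def eval_postfix(tokens, env):
    stack = []
    for t in tokens:
        if t in "+-*/":
            right, left = stack.pop(), stack.pop()  # operator follows its operands
            stack.append({"+": left + right, "-": left - right,
                          "*": left * right, "/": left / right}[t])
        else:
            stack.append(env[t])                    # an operand: push its value
    return stack.pop()

# (a + b) * c  ==  ab + c *   with a=2, b=3, c=4
print(eval_postfix("ab+c*", {"a": 2, "b": 3, "c": 4}))  # 20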
2. Three-Address Code:
A three-address statement involves a maximum of three references, consisting of two for operands and one for the result.
A sequence of three-address statements collectively forms a three-address code.
The typical form of a three-address statement is x = y op z, where x, y, and z represent memory addresses.
Each variable (x, y, z) in a three-address statement is associated with a specific memory location.
While a standard three-address statement includes three references, some statements contain fewer than three references yet are still categorized as three-address statements.
Example: The three-address code for the expression a + b * c + d is:
T1 = b * c
T2 = a + T1
T3 = T2 + d
where T1, T2, and T3 are temporary variables.
There are 3 ways to represent a Three-Address Code in compiler
design:
i) Quadruples
ii) Triples
iii) Indirect Triples
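As an illustration, the three-address code above for a + b * c + d can be written in the first two of these forms (exact field layouts vary between textbooks):

Quadruples (op, arg1, arg2, result):
(*, b, c, T1)
(+, a, T1, T2)
(+, T2, d, T3)

Triples (op, arg1, arg2), where a result is referred to by the number of the statement that computes it:
(0) (*, b, c)
(1) (+, a, (0))
(2) (+, (1), d)

Indirect triples keep a separate list of pointers to such triples, so that statements can be reordered without renumbering the references.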

3. Syntax Tree:
A syntax tree serves as a condensed representation of a parse tree.
The operator and keyword nodes of the parse tree are moved up to the position of their respective parent nodes in the syntax tree.

The internal nodes are operators and the leaf nodes are operands.

Creating a syntax tree takes into account the parentheses placed within the expression. This contributes to a more intuitive representation, making it easier to discern the sequence in which operands should be processed.

The syntax tree not only condenses the parse tree but also offers an improved visual representation of the program's syntactic structure.
Example: x = (a + b * c) / (a – b * c)
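As a sketch of that tree: the root is the assignment "=", with x as its left child and the division "/" as its right child; "/" has children "+" and "–", and each of those has "a" as its left operand and a "*" node over "b" and "c" as its right operand. Note that the subtree for b * c appears twice; a DAG would share a single "*" node between the two sides.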
