Compilers and Translators Assignment
QUESTION:
WRITE ABOUT THE STRUCTURE OF THE LEXICAL ANALYZER.
Lexical analysis is the first phase of the compilation process; the component that performs it is also known as a scanner. It takes the source code as input and, reading one character at a time, groups the characters into meaningful lexemes.
The lexical analyzer represents these lexemes in the form of tokens: it takes lexemes as input and generates tokens. Lexemes are similar to words, which individually have their own meanings, whereas a group of lexemes in its entirety conveys the meaning.
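The lexeme/token distinction can be sketched as a small data structure. This is an illustrative Python sketch, not taken from any particular compiler; the names (Token, "IDENTIFIER") are assumptions:

```python
from collections import namedtuple

# A token pairs a token type with its lexeme (the matched text).
Token = namedtuple("Token", ["type", "lexeme"])

# The lexeme "count" on its own is just a word; as a token it also
# carries the classification the parser needs.
t = Token("IDENTIFIER", "count")
print(t.type, t.lexeme)  # IDENTIFIER count
```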
The component that performs lexical analysis in compiler design is known as the lexical analyzer; it contains a tokenizer or scanner. The characters making up the source program are read from left to right and grouped into tokens, which are sequences of characters having a collective meaning. The lexical analyzer also detects invalid tokens and generates an error for them. The role of the lexical analyzer in compiler design is to read the character stream from the source code, check that the tokens are legal, and pass the data to the syntax analyzer when it demands it.
[Diagram: the source program enters the lexical analyzer, which sends tokens to the parser (syntax analyzer) in response to its "get next token" requests; both components consult the symbol table.]
In the compilation process, a source program written in a high-level programming language such as C or C++ must go through the lexical analysis phase. This phase involves cleaning up the input text and preparing it for tokenization, which may include removing comments, whitespace, and other non-essential characters. This turns the source code into pure high-level code without comments or whitespace.
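This clean-up step can be sketched as a small preprocessing function. The Python sketch below strips C-style comments with regular expressions; the name strip_comments and the patterns are illustrative assumptions, and a real implementation would also have to avoid stripping comment-like text inside string literals:

```python
import re

def strip_comments(source: str) -> str:
    # Remove block comments first, then line comments.
    # NOTE: a sketch only -- it does not protect comment-like
    # text that appears inside string literals.
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.S)
    source = re.sub(r"//[^\n]*", "", source)
    return source

code = "int x = 1; /* counter */ // note\nint y = 2;"
print(strip_comments(code))
```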
[Diagram: lexemes pass through a scanning step, which removes non-token elements, and an analyzing step, which produces tokens.]
So the lexical analyzer converts the sequence of statements and characters of a high-level programming language into tokens, and those tokens are fed to the parser (also known as the syntax analyzer), which builds the syntax tree; the "get next token" exchange between the lexical analyzer and the parser repeats until the input is consumed. In this process the input text is broken into a sequence of tokens, usually by matching the characters of the input against a set of patterns or regular expressions that define the different types of tokens.
Example of a line of code taken from a C program:
c=a+b
After lexical analysis, a symbol table is generated as given below. This code contains 5 characters and 5 tokens:
c - identifier
= - operator
a - identifier
+ - operator
b - identifier
Each token consists of one or more characters that are collected into a unit before further
processing takes place.
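The pattern-matching step described earlier can be sketched with regular expressions and applied to this example line. The token names and patterns below are illustrative assumptions, not the full rules of any real language:

```python
import re

# Illustrative token patterns; a real language defines many more.
TOKEN_SPEC = [
    ("NUMBER",     r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR",   r"[+\-*/=]"),
    ("SKIP",       r"\s+"),        # whitespace is matched but discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return (token type, lexeme) pairs for the input text."""
    return [(m.lastgroup, m.group())
            for m in MASTER.finditer(text)
            if m.lastgroup != "SKIP"]

print(tokenize("c=a+b"))
# [('IDENTIFIER', 'c'), ('OPERATOR', '='), ('IDENTIFIER', 'a'),
#  ('OPERATOR', '+'), ('IDENTIFIER', 'b')]
```

Each pattern is wrapped in a named group so the name of the group that matched tells us the token type.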
In this tokenization stage, the lexical analyzer determines the type of each token. For
example, in a programming language, the lexical analyzer might classify keywords,
identifiers, operators, and punctuation symbols as separate token types.
The lexical analyzer checks that each token is valid according to the rules of the
programming language. For example, it might check that a variable name is a valid identifier,
or that an operator has the correct syntax.
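The identifier check can be sketched as follows; the pattern and the keyword subset are illustrative assumptions:

```python
import re

# Simplified C-style rule: a letter or underscore, then letters,
# digits, or underscores; keywords are excluded. Both the pattern
# and the keyword set here are an illustrative subset.
IDENT = re.compile(r"[A-Za-z_]\w*\Z")
KEYWORDS = {"int", "if", "while", "return"}

def is_valid_identifier(name: str) -> bool:
    return bool(IDENT.match(name)) and name not in KEYWORDS

print(is_valid_identifier("count"))  # True
print(is_valid_identifier("2fast"))  # False (starts with a digit)
print(is_valid_identifier("int"))    # False (keyword)
```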
As shown in the diagram above, the lexical analyzer and the parser both depend on the symbol table.
The symbol table stores the names of all identifiers together with their types. It makes it easy for the lexical analysis phase or the parsing phase to quickly look up an identifier's record.
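A minimal symbol table along these lines can be sketched as a dictionary mapping identifier names to their attributes; the helper names insert and lookup are illustrative:

```python
# A minimal symbol table: identifier name -> record of attributes.
symbol_table = {}

def insert(name, type_):
    symbol_table[name] = {"type": type_}

def lookup(name):
    # Returns the identifier's record, or None if it is unknown.
    return symbol_table.get(name)

insert("a", "int")
insert("b", "int")
print(lookup("a"))  # {'type': 'int'}
print(lookup("z"))  # None
```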
Upon receiving a "get next token" command from the parser, the lexical analyzer reads input characters until it can identify the next token.
In the final stage, the lexical analyzer generates the output of the lexical analysis process,
which is typically a list of tokens. This list of tokens can then be passed to the next stage of
compilation or interpretation.
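This on-demand handover can be modeled as a generator that yields one token each time the parser asks for it; the pattern and token names below are illustrative assumptions:

```python
import re

PATTERN = re.compile(
    r"(?P<IDENTIFIER>[A-Za-z_]\w*)|(?P<OPERATOR>[+\-*/=])|(?P<SKIP>\s+)")

def token_stream(text):
    # Yield one (token type, lexeme) pair per request, skipping whitespace.
    for m in PATTERN.finditer(text):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

# Each next() call plays the role of the parser's "get next token" request.
stream = token_stream("c = a + b")
print(next(stream))  # ('IDENTIFIER', 'c')
print(next(stream))  # ('OPERATOR', '=')
```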
In summary, lexical analysis is the very first phase of compiler design. It converts a sequence of characters into a sequence of tokens: the lexical analyzer breaks the source text into a series of tokens and removes any extra spaces or comments written in the source code. The role of the lexical analyzer in compiler design is to read character streams from the source code, check for legal tokens, and pass the data to the syntax analyzer when it demands them.