Lexical Analysis in Compiler Design With Example
What is lexical analysis?
Lexical analysis is the first phase of a compiler. It takes the modified
source code, typically the output of a language preprocessor, which is
written in the form of sentences, and converts that sequence of characters
into a sequence of tokens. It also removes extra whitespace and comments
from the source code.
Example
How Pleasant Is The Weather?
In this example, we can easily recognize that there are five words: How,
Pleasant, Is, The, and Weather. This comes naturally to us because we can
recognize the separators: the blanks and the punctuation symbol.
Now imagine the same sentence with its separators put in odd places. We
can still read it, but it takes some time, because the words no longer
come to us immediately.
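The way we pick out words by their separators can be sketched in a few
lines of C. This is only an illustration: the function name count_words is
hypothetical, and it assumes that a word is any unbroken run of letters.

```c
#include <ctype.h>
#include <stddef.h>

/* Count maximal runs of letters. Blanks and punctuation act as separators. */
size_t count_words(const char *text) {
    size_t count = 0;
    int in_word = 0;  /* 1 while the scan is inside a run of letters */

    for (const char *p = text; *p != '\0'; ++p) {
        if (isalpha((unsigned char)*p)) {
            if (!in_word) {   /* first letter of a new word */
                in_word = 1;
                ++count;
            }
        } else {
            in_word = 0;      /* a separator ends the current word */
        }
    }
    return count;
}
```

Calling count_words("How Pleasant Is The Weather?") counts the five words
of the sentence above.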
Basic Terminologies
What's a lexeme?
A lexeme is a sequence of characters in the source program that matches
the pattern of a token. In other words, it is an instance of a token.
What's a token?
A token is a sequence of characters that represents a unit of information
in the source program.
What's a pattern?
A pattern is a description of the form that the lexemes of a token can
take. In the case of a keyword used as a token, the pattern is simply the
sequence of characters that spells the keyword.
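The relationship between the three terms can be sketched in C. The names
below (TokenKind, classify_lexeme, and the helper functions) are
hypothetical, and the keyword list is an assumption that covers only a
handful of C keywords. Each helper encodes one pattern, and a lexeme is
any string that matches it.

```c
#include <ctype.h>
#include <string.h>

typedef enum { TOKEN_KEYWORD, TOKEN_IDENTIFIER, TOKEN_UNKNOWN } TokenKind;

/* Pattern of a keyword: the exact character sequence of the keyword. */
static int matches_keyword(const char *lexeme) {
    static const char *const keywords[] = { "int", "if", "else", "return" };
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; ++i) {
        if (strcmp(lexeme, keywords[i]) == 0) return 1;
    }
    return 0;
}

/* Pattern of an identifier: a letter or underscore, followed by any
   number of letters, digits, or underscores. */
static int matches_identifier(const char *lexeme) {
    if (!(isalpha((unsigned char)lexeme[0]) || lexeme[0] == '_')) return 0;
    for (const char *p = lexeme + 1; *p != '\0'; ++p) {
        if (!(isalnum((unsigned char)*p) || *p == '_')) return 0;
    }
    return 1;
}

/* Map a lexeme to the token whose pattern it matches.
   Keywords are checked first because they also fit the identifier pattern. */
TokenKind classify_lexeme(const char *lexeme) {
    if (matches_keyword(lexeme)) return TOKEN_KEYWORD;
    if (matches_identifier(lexeme)) return TOKEN_IDENTIFIER;
    return TOKEN_UNKNOWN;
}
```

Here "int" and "maximum" are both lexemes. The first is an instance of the
keyword token and the second is an instance of the identifier token.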
The lexical analyzer scans the entire source code of the program and
identifies each token one by one. Scanners are usually implemented to
produce tokens only when requested by a parser. Here is how this works:
1. "Get next token" is a command which is sent from the parser to the
lexical analyzer.
2. On receiving this command, the lexical analyzer scans the input until
it finds the next token.
3. It returns the token to the parser.
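The three steps above can be sketched as a pull-style scanner in C. This
is a minimal illustration, not a complete C lexer: the names Scanner, Tok,
get_next_token, and count_tokens are hypothetical, and the sketch assumes
only three token classes (words, numbers, and single-character symbols).

```c
#include <ctype.h>
#include <stddef.h>

typedef enum { TOK_WORD, TOK_NUMBER, TOK_SYMBOL, TOK_END } TokType;

typedef struct {
    TokType type;
    const char *start;  /* first character of the lexeme inside the input */
    size_t length;      /* number of characters in the lexeme */
} Tok;

typedef struct {
    const char *pos;    /* next unread character of the input */
} Scanner;

/* Steps 2 and 3: scan the input until the next token is found, return it. */
Tok get_next_token(Scanner *s) {
    /* Whitespace is not a token, so it is skipped. */
    while (isspace((unsigned char)*s->pos)) s->pos++;

    Tok t = { TOK_END, s->pos, 0 };
    if (*s->pos == '\0') return t;  /* end of input */

    if (isalpha((unsigned char)*s->pos) || *s->pos == '_') {
        t.type = TOK_WORD;          /* keyword or identifier */
        while (isalnum((unsigned char)*s->pos) || *s->pos == '_') s->pos++;
    } else if (isdigit((unsigned char)*s->pos)) {
        t.type = TOK_NUMBER;
        while (isdigit((unsigned char)*s->pos)) s->pos++;
    } else {
        t.type = TOK_SYMBOL;        /* one-character operator or bracket */
        s->pos++;
    }
    t.length = (size_t)(s->pos - t.start);
    return t;
}

/* Stand-in for a parser (step 1): it requests tokens one at a time and
   simply counts them until the scanner reports the end of the input. */
size_t count_tokens(const char *source) {
    Scanner s = { source };
    size_t n = 0;
    while (get_next_token(&s).type != TOK_END) n++;
    return n;
}
```

Because the scanner keeps only its current position, no token is produced
before the caller asks for it, which matches the on-demand behavior
described above.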
For example, consider the following C code as input to the lexical
analyzer:

    #include <stdio.h>

    int maximum(int x, int y) {
        // This will compare 2 numbers
        if (x > y)
            return x;
        else {
            return y;
        }
    }
Lexeme      Token
int         Keyword
maximum     Identifier
(           Operator
int         Keyword
x           Identifier
,           Operator
int         Keyword
y           Identifier
)           Operator
{           Operator
if          Keyword
Examples of Nontokens
Type          Examples
Macro         NUMS
Whitespace    \n \b \t
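A scanner usually discards nontokens before it looks for the next token.
The sketch below shows one way to do that in C. The function name
skip_nontokens is hypothetical, and the sketch assumes that only
whitespace and // line comments need to be skipped. Macros are not
covered here, since they are normally expanded before scanning.

```c
#include <ctype.h>

/* Advance past whitespace and '//' line comments. Returns a pointer to the
   first character that can start a token, or to the terminating '\0'. */
const char *skip_nontokens(const char *p) {
    for (;;) {
        if (isspace((unsigned char)*p)) {
            p++;                                   /* blank, tab, newline */
        } else if (p[0] == '/' && p[1] == '/') {
            while (*p != '\0' && *p != '\n') p++;  /* skip to end of line */
        } else {
            return p;
        }
    }
}
```

Applied to the comment line of the maximum function above, the function
would skip the comment and the following indentation and stop at the if
keyword.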
Lexical Errors
A character sequence that cannot be scanned into any valid token is a
lexical error. One useful error recovery method is to remove one character
from the remaining input and continue scanning.
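One simple recovery method is to delete one character from the remaining
input and carry on scanning. The sketch below illustrates this in C. The
names count_lexical_errors and is_valid_symbol are hypothetical, and the
set of valid symbols is an assumption made only for this example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumption for this sketch: these are the only characters allowed
   outside of words, numbers, and whitespace. */
static int is_valid_symbol(char c) {
    return c != '\0' && strchr("(){},;<>=+-*/", c) != NULL;
}

/* Scan the whole input and return the number of lexical errors found.
   Recovery: drop one character from the remaining input and continue. */
size_t count_lexical_errors(const char *p) {
    size_t errors = 0;

    while (*p != '\0') {
        if (isspace((unsigned char)*p)) {
            p++;
        } else if (isalpha((unsigned char)*p) || *p == '_') {
            while (isalnum((unsigned char)*p) || *p == '_') p++;
        } else if (isdigit((unsigned char)*p)) {
            while (isdigit((unsigned char)*p)) p++;
        } else if (is_valid_symbol(*p)) {
            p++;
        } else {
            errors++;  /* no valid token can start with this character */
            p++;       /* recovery: remove the character and go on */
        }
    }
    return errors;
}
```

With this symbol set, an input such as "x = 4 @ y $ 2;" contains two
characters that start no valid token. The scanner reports both and still
reaches the end of the input.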
Summary
- Lexical analysis is the first phase of a compiler.
- A lexeme is a sequence of characters in the source program that matches
  the pattern of a token.
- The lexical analyzer scans the entire source code of the program.
- The lexical analyzer identifies tokens and enters them into the symbol
  table.
- A character sequence that cannot be scanned into any valid token is a
  lexical error.
- Removing one character from the remaining input is a useful error
  recovery method.
- The lexical analyzer scans the input program, while the parser performs
  syntax analysis.
- Eliminating unwanted elements, such as extra whitespace and comments,
  eases both lexical analysis and syntax analysis.
- Web browsers use a lexical analyzer to format and display a web page
  with the help of parsed data from JavaScript, HTML, and CSS.
- The biggest drawback of a lexical analyzer is the additional runtime
  overhead required to generate the lexer tables and construct the tokens.