Problems in Compilation
Issues in scanning
By;
Maira
Maimoona
Shomaila
To;
Madam Ayesha
Background knowledge;
computer
software hardware
series of 1s and 0s
What's a lexeme?
A lexeme is a sequence of characters in the
source program that matches the pattern of a
token. It is simply an instance of that token.
What's a token?
A token is a sequence of characters that
represents a unit of information in the source
program.
What is Pattern?
A pattern is a description of the form that the
lexemes of a token may take. For a keyword used as a
token, the pattern is simply the sequence of
characters that forms the keyword.
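The three terms can be tied together with a small sketch. The token names and patterns below are hypothetical, chosen only for illustration: each pattern is written as a regular expression, and each input string is a lexeme that is classified by the first token whose pattern matches it entirely.

```python
import re

# Hypothetical token names and patterns, chosen only for illustration.
PATTERNS = [
    ("KEYWORD", r"if|else|while"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER", r"[0-9]+"),
]

def classify(lexeme):
    """Return the name of the first token whose pattern matches the whole lexeme."""
    for token, pattern in PATTERNS:
        if re.fullmatch(pattern, lexeme):
            return token
    return None  # no pattern describes this lexeme

for lexeme in ["if", "count", "42"]:
    print(lexeme, "is a lexeme of token", classify(lexeme))
```

Here "if" is an instance of KEYWORD because the keyword pattern is listed before the identifier pattern, which would also match it.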
Lexical Analyzer Architecture:
How tokens are recognized
• Lookahead
• Ambiguities
Lookahead;
Lookahead is required to decide where one
token ends and the next token
begins. Simple examples with
lookahead issues are i vs. if and = vs. ==.
Therefore a way to describe the lexemes
of each token is required.
Cont…
A way is needed to resolve ambiguities:
• Is if two variables, i and f, or the keyword if?
• Is == two equal signs, = and =, or the operator ==?
• arr(5, 4) vs. fn(5, 4) in Ada (array
reference syntax and function call syntax are
similar).
Hence, the number of lookahead characters to
consider must be decided, and a way to describe
the lexemes of each token is also needed.
Regular expressions are one of the most
popular ways of representing tokens.
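The = vs. == case can be sketched with one character of lookahead. This is a minimal illustration, not a full scanner: the function name and the token names ASSIGN and EQ are assumptions, and the input is limited to equal signs and whitespace.

```python
def scan_operators(text):
    """Split text made of '=' characters and whitespace into tokens."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "=":
            # Look ahead one character before deciding where the token ends.
            if i + 1 < len(text) and text[i + 1] == "=":
                tokens.append("EQ")      # lexeme '=='
                i += 2
            else:
                tokens.append("ASSIGN")  # lexeme '='
                i += 1
        elif ch.isspace():
            i += 1                       # whitespace separates tokens
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens

print(scan_operators("= == = ="))
```

Without the peek at text[i + 1], the scanner could not tell whether the first = is a complete token or the start of ==.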
Ambiguities
Lex can handle ambiguous specifications.
When more than one expression can
match the current input, lex chooses as
follows:
• The longest match is preferred.
• Among rules that match the same
number of characters, the rule given first
is preferred.
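The two rules above can be imitated in a few lines. This is only a sketch of the selection policy, not how lex is implemented (lex compiles its rules into an automaton), and the rule names and patterns are hypothetical.

```python
import re

# Hypothetical rules in priority order; this is not lex syntax.
RULES = [
    ("IF", r"if"),
    ("ID", r"[a-z]+"),
    ("EQ", r"=="),
    ("ASSIGN", r"="),
]

def next_token(text, pos):
    """Pick the rule with the longest match at pos; the earliest rule wins ties."""
    best = None
    for name, pattern in RULES:
        m = re.compile(pattern).match(text, pos)
        # Only a strictly longer match replaces the current best, so on a
        # tie the rule listed first is kept.
        if m and m.end() > pos and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

print(next_token("if", 0))    # IF and ID both match 2 characters
print(next_token("iffy", 0))  # ID matches more characters than IF
print(next_token("==", 0))    # EQ matches more characters than ASSIGN
```

With this policy "if" is reported as the keyword, while "iffy" is reported as a single identifier rather than the keyword followed by "fy".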
Error Recovery Schemes;
• Panic mode recovery
• Local correction
• Global correction
Lexical error handling
approaches;
Lexical errors can be handled by the
following actions:
• Deleting one character from the
remaining input.
• Inserting a missing character into the
remaining input.
• Replacing a character by another
character.
• Transposing two adjacent characters.
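The four actions above can be sketched as a generator of candidate repairs. The code assumes a hypothetical keyword set and a lowercase alphabet, and it only shows how a single edit can turn a misspelled lexeme into a valid one; a real scanner would apply such repairs to the remaining input rather than to an isolated word.

```python
import string

KEYWORDS = {"if", "else", "while"}  # hypothetical keyword set

def single_edit_repairs(lexeme, alphabet=string.ascii_lowercase):
    """Return every string reachable from lexeme by exactly one edit."""
    splits = [(lexeme[:i], lexeme[i:]) for i in range(len(lexeme) + 1)]
    # Delete one character.
    deletes = {a + b[1:] for a, b in splits if b}
    # Insert one missing character.
    inserts = {a + c + b for a, b in splits for c in alphabet}
    # Replace one character by a different character.
    replaces = {a + c + b[1:] for a, b in splits if b
                for c in alphabet if c != b[0]}
    # Transpose two adjacent characters.
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    return deletes | inserts | replaces | transposes

def suggest_keywords(lexeme):
    """Return the keywords that are one edit away from lexeme, sorted."""
    return sorted(single_edit_repairs(lexeme) & KEYWORDS)

for bad in ["iff", "els", "elsa", "whlie"]:
    print(bad, "->", suggest_keywords(bad))
```

Each of the four sample inputs exercises one action: "iff" is repaired by deletion, "els" by insertion, "elsa" by replacement, and "whlie" by transposition.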