Language Processing Activities
Program Generation
The program generator is a software system which accepts the specification of a program to be generated and generates a program in the target PL. In effect, the program generator introduces a new domain between the application and PL domains, which we call the program generator domain. The specification gap is now the gap between the application domain and the program generator domain. This gap is smaller than the gap between the application domain and the target PL domain.

Reduction in the specification gap increases the reliability of the generated program. Since the generator domain is close to the application domain, it is easy for the designer or programmer to write the specification of the program to be generated. The harder task of bridging the gap to the PL domain is performed by the generator.

This arrangement also reduces the testing effort. Proving the correctness of the program generator amounts to proving the correctness of the transformation, and this would be performed while implementing the generator. To test an application generated by using the generator, it is necessary only to verify the correctness of the specification input to the program generator. This is a much simpler task than verifying the correctness of the generated program. It can be simplified further by providing a good diagnostic (i.e. error indication) capability in the program generator, which would detect inconsistencies in the specification.

It is more economical to develop a program generator than to develop a problem-oriented language. A problem-oriented language suffers a very large execution gap between the PL domain and the execution domain, whereas the program generator has a smaller semantic gap to the target PL domain, which is the domain of a standard procedure-oriented language. The execution gap between the target PL domain and the execution domain is bridged by the compiler or interpreter for the PL.
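A minimal sketch may make the idea concrete. The toy generator below is an invented illustration, not any standard tool: its specification language (one "variable type" pair per line) and its output format are assumptions. It accepts a specification in a domain close to the application and emits a complete program in the target PL (here, C), which an ordinary compiler then carries the rest of the way to the execution domain.

/* genprog.c - a toy program generator (illustrative sketch only).
 * Assumed specification format: one "variable type" pair per line,
 * e.g. "count int". The generator emits a C program that declares
 * each variable and reports its size. */
#include <stdio.h>

int main(void)
{
    char name[64], type[64];

    /* Emit the fixed prologue of the generated program. */
    printf("#include <stdio.h>\nint main(void)\n{\n");

    /* Bridge the gap to the PL domain: turn each specification
     * line into a declaration plus an action. */
    while (scanf("%63s %63s", name, type) == 2) {
        printf("    %s %s;\n", type, name);
        printf("    printf(\"%s occupies %%zu bytes\\n\", sizeof %s);\n",
               name, name);
    }

    printf("    return 0;\n}\n");
    return 0;
}

Given the specification line "count int", the generator writes a complete C program; verifying that short specification is far easier than verifying the generated code by hand.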
Program Execution
Two popular models for program execution are translation and interpretation.

Program translation

The program translation model bridges the execution gap by translating a program written in a PL, called the source program (SP), into an equivalent program in the machine or assembly language of the computer system, called the target program (TP). Characteristics of the program translation model are:

A program must be translated before it can be executed.
The translated program may be saved in a file. The saved program may be executed repeatedly.
A program must be retranslated following modifications.
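On a Unix-like system these characteristics can be seen in how a C program is built and run (the file name square.c is assumed for illustration; cc is the standard C compiler driver):

cc square.c -o square     # translate the source program once
./square                  # execute the saved target program
./square                  # run it again; no retranslation needed
cc square.c -o square     # after modifying square.c, retranslate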
Program interpretation
The interpreter reads the source program and stores it in its memory. During interpretation it takes a source statement, determines its meaning and performs actions which implement it, including computational and input-output actions.

The CPU uses a program counter (PC) to note the address of the next instruction to be executed. This instruction is subjected to the instruction execution cycle, which consists of the following steps:

1. Fetch the instruction.
2. Decode the instruction to determine the operation to be performed, and also its operands.
3. Execute the instruction.

At the end of the cycle, the instruction address in the PC is updated and the cycle is repeated for the next instruction. Program interpretation can proceed in an analogous manner: the PC can indicate which statement of the source program is to be interpreted next. This statement is subjected to the interpretation cycle, which consists of the following steps:

1. Fetch the statement.
2. Analyse the statement and determine its meaning, viz. the computation to be performed and its operands.
3. Execute the meaning of the statement.
From this analogy, we can identify the following characteristics of interpretation:

The source program is retained in the source form itself, i.e. no target program form exists.
A statement is analysed during its interpretation.
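A minimal sketch of the interpretation cycle, written in C under an assumed miniature source language of one-character statements (the language, its semantics and the variable names are all illustrative assumptions):

/* A toy interpreter illustrating the fetch-analyse-execute cycle.
 * The source program is a string of one-character statements:
 * 'i' increments, 'd' decrements, 'p' prints. No target program
 * form is ever produced. */
#include <stdio.h>

int main(void)
{
    const char *program = "iiipdp"; /* source program, kept in source form */
    int pc = 0;                     /* program counter over statements     */
    int acc = 0;                    /* a single accumulator variable       */

    while (program[pc] != '\0') {
        char stmt = program[pc];    /* fetch the statement                 */
        switch (stmt) {             /* analyse it to determine its meaning */
        case 'i': acc++; break;     /* execute the meaning                 */
        case 'd': acc--; break;
        case 'p': printf("acc = %d\n", acc); break;
        }
        pc++;                       /* update the PC and repeat the cycle  */
    }
    return 0;
}

Running this prints acc = 3 and then acc = 2; the statements exist only in source form throughout.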
Comparison
A fixed cost (the translation overhead) is incurred in the use of the program translation model. If the source program is modified, the translation cost must be incurred again, irrespective of the size of the modification. However, execution of the target program is efficient, since the target program is in the machine language. Use of the interpretation model does not incur the translation overhead, but each statement is analysed every time it is to be executed, which makes execution slower. Avoiding the translation overhead is advantageous if a program is modified between executions, as in program testing and debugging.
Analysis and synthesis

The analysis phase uses each component of the source language specification to determine relevant information concerning a statement in the source program. Thus, analysis of a source statement consists of lexical, syntax and semantic analysis. The synthesis phase is concerned with the construction of target language statement(s) which have the same meaning as a source statement. Typically, this consists of two main activities:

Creation of data structures in the target program.
Generation of target code.

We refer to these activities as memory allocation and code generation, respectively.
Semantic analysis
Semantic analysis of declaration statements differs from the semantic analysis of imperative statements. The former results in the addition of information to the symbol table, e.g. the type, length and dimensionality of variables. The latter identifies the sequence of actions necessary to implement the meaning of a source statement. In both cases the structure of a source statement guides the application of the semantic rules. Semantic analysis determines the meaning of a subtree in the intermediate code (IC). It adds information to a table or adds an action to the action sequence, and then modifies the IC to enable further semantic analysis. The analysis ends when the tree has been completely processed.
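As a rough sketch of the declaration case (the structure names and fields below are assumptions for illustration, not a standard interface), semantic analysis of a declaration adds an entry like this to the symbol table:

/* Illustrative symbol table for semantic analysis of declarations. */
#include <stdio.h>
#include <string.h>

struct symbol {             /* one symbol table entry              */
    char name[32];
    char type[16];          /* e.g. "real", "int"                  */
    int  length;            /* size in bytes                       */
    int  dimensionality;    /* 0 for scalars, 1 for vectors, ...   */
};

struct symbol symtab[100];
int nsyms = 0;

/* Semantic analysis of a declaration adds information to the table;
 * an imperative statement would instead append to an action sequence. */
void declare(const char *name, const char *type, int length, int dims)
{
    struct symbol *s = &symtab[nsyms++];
    strcpy(s->name, name);
    strcpy(s->type, type);
    s->length = length;
    s->dimensionality = dims;
}

int main(void)
{
    declare("a", "real", 8, 0);   /* e.g. processing the declaration: real a; */
    printf("%s: %s, %d bytes\n",
           symtab[0].name, symtab[0].type, symtab[0].length);
    return 0;
}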
Language grammars

The grammar of a language L is a set of rules which precisely specify the sentences of L. Natural languages are not formal languages, owing to their rich vocabulary; PLs, however, are formal languages.
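For instance, a small illustrative grammar (an invented example, not drawn from any particular PL) can precisely specify arithmetic expressions with rules such as:

<exp>    ::= <exp> + <term> | <term>
<term>   ::= <term> * <factor> | <factor>
<factor> ::= <id> | ( <exp> )

Every string derivable from <exp> by these rules, such as a + b * c, is a sentence of the language the grammar specifies.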
Source text analysis is based on the grammar of the source language. The component subtasks of the analysis phase are:

Syntax analysis, which determines the syntactic structure of the source statement.
Semantic analysis, which determines the meaning of a statement once its grammatical structure is known.
Syntax analysis determines the grammatical or syntactic structure of the input statement and represents it in an intermediate form from which semantic analysis can be performed.

A compiler must perform two major tasks: the analysis of a source program and the synthesis of its corresponding object program. The analysis task deals with the decomposition of the source program into its basic parts; using these basic parts, the synthesis task builds their equivalent object program modules.

A source program is a string of symbols, each of which is generally a letter, a digit or a special symbol. Certain groups of these symbols form logical entities such as constants, variable names, keywords and operators, and it is desirable for the compiler to identify these various types as classes. The source program is input to a lexical analyser, or scanner, whose purpose is to separate the incoming text into pieces, or tokens, such as constants, variable names, keywords and operators. In essence, the lexical analyser performs low-level syntax analysis. For efficiency reasons, each class of tokens is given a unique internal representation number. The lexical analyser supplies tokens to the syntax analyser.

The syntax analyser is much more complex than the lexical analyser; its function is to take the source program from the lexical analyser and determine the manner in which it is to be decomposed into its constituent parts. That is, the syntax analyser determines the overall structure of the source program.

The semantic analyser uses the output of the syntax analyser; its function is to determine the meaning (or semantics) of the source program. The output of the semantic analyser is passed on to the code generator. At this point the intermediate form of the source language program is usually translated to either assembly language or machine language. The output of the code generator is passed on to a code optimizer.
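The internal representation numbers can be seen in a minimal hand-written scanner. The sketch below is illustrative only; the class numbering and function names are assumptions:

/* A toy scanner: classifies each token in the input and returns
 * the internal representation number of its class. */
#include <stdio.h>
#include <ctype.h>

enum { TOK_ID = 1, TOK_NUM = 2, TOK_OP = 3 };   /* assumed numbering */

int scan(const char **p)
{
    while (isspace((unsigned char)**p)) (*p)++;
    if (**p == '\0') return 0;                  /* end of input        */
    if (isalpha((unsigned char)**p)) {          /* identifier/keyword  */
        while (isalnum((unsigned char)**p)) (*p)++;
        return TOK_ID;
    }
    if (isdigit((unsigned char)**p)) {          /* numeric constant    */
        while (isdigit((unsigned char)**p)) (*p)++;
        return TOK_NUM;
    }
    (*p)++;                                     /* operator or other   */
    return TOK_OP;
}

int main(void)
{
    const char *src = "sum = sum + 10";
    int t;
    while ((t = scan(&src)) != 0)
        printf("token class %d\n", t);          /* prints 1 3 1 3 2 */
    return 0;
}

For the input "sum = sum + 10", the scanner reports the class numbers 1 3 1 3 2, one per token; a real lexical analyser would pass these numbers to the syntax analyser.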
Loaders

The terms loader, linking loader and linkage editor are used in the software literature. The loader is a program which accepts the object program decks, prepares these programs for execution by the computer, and initiates their execution. In particular, the loader must perform four functions:
1. Allocate space in memory for the program (allocation).
2. Resolve symbolic references between object decks (linking).
3. Adjust all address-dependent locations, such as address constants, to correspond to the allocated space (relocation).
4. Physically place the machine instructions and data into memory (loading).
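A sketch of the relocation step (the object-deck layout and names below are assumptions for illustration, not a real object format): each address constant recorded in a relocation list is adjusted by the difference between the assembled origin and the address actually allocated.

/* Illustrative relocation: adjust the address-dependent words of an
 * object deck to correspond to the allocated space. */
#include <stdio.h>

int main(void)
{
    unsigned code[]  = { 0x1004, 0x2008, 0x0001 }; /* first two words hold addresses */
    int      reloc[] = { 0, 1 };                   /* indices of address constants   */
    unsigned assembled_origin = 0x0000;            /* origin assumed at assembly     */
    unsigned load_address     = 0x4000;            /* space allocated by the loader  */

    /* Relocation: add the displacement to every address constant. */
    for (int i = 0; i < 2; i++)
        code[reloc[i]] += load_address - assembled_origin;

    for (int i = 0; i < 3; i++)
        printf("word %d: 0x%04X\n", i, code[i]);   /* 0x5004, 0x6008, 0x0001 */
    return 0;
}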
Lex

Lex is a program that generates lexical analyzers, and it is commonly used together with the Yacc parser generator. Some versions of the original AT&T code are available as open source, as part of systems such as OpenSolaris and Plan 9 from Bell Labs. Another popular open source version of Lex is flex, the "fast lexical analyzer".

The structure of a Lex file is intentionally similar to that of a Yacc file; files are divided into three sections, separated by lines that contain only two percent signs, as follows:

Definition section
%%
Rules section
%%
C code section

The definition section defines macros and imports header files written in C. It is also possible to write any C code here, which will be copied verbatim into the generated source file. The rules section associates regular expression patterns with C statements. When the lexer sees text in the input matching a given pattern, it will execute the associated C code. The C code section contains C statements and functions that are copied verbatim to the generated source file. These statements presumably contain code called by the rules in the rules section. In large programs it is more convenient to place this code in a separate file linked in at compile time. The following example is a complete Lex file:
/*** Definition section ***/
%{
/* C code copied verbatim into the generated source file,
 * including the header needed by printf in the rules below. */
#include <stdio.h>
%}
%%
/*** Rules section ***/
/* [0-9]+ matches a string of one or more digits */
[0-9]+  {
            /* yytext is a string containing the matched text. */
            printf("Saw an integer: %s\n", yytext);
        }
.|\n    { /* Ignore all other characters. */ }
%%
/*** C code section ***/
int main(void)
{
    /* Call the lexer, then quit. */
    yylex();
    return 0;
}

If this input is given to flex, it will be converted into a C file, lex.yy.c, which can be compiled into an executable that matches and outputs strings of integers. For example, given the input:

abc123z.!&*2gj6

the program will print:

Saw an integer: 123
Saw an integer: 2
Saw an integer: 6
Scannerless parsing refers to a technique in which the parser consumes the input character stream directly, without a distinct lexer.
Yacc

The computer program Yacc is a parser generator developed by Stephen C. Johnson at AT&T for the Unix operating system. The name is an acronym for "Yet Another Compiler Compiler". It generates a parser (the part of a compiler that tries to make syntactic sense of the source code) based on an analytic grammar written in a notation similar to BNF.

Yacc used to be available as the default parser generator on most Unix systems. It has since been supplanted as the default by more recent, largely compatible programs such as Berkeley Yacc, GNU Bison, MKS Yacc and Abraxas PCYACC. An updated version of the original AT&T version is included as part of Sun's OpenSolaris project. Each offers slight improvements and additional features over the original Yacc, but the concept has remained the same. Yacc has also been rewritten for other languages, including Ratfor, ML, Ada, Pascal, Java, Python, Ruby and Common Lisp.

The parser generated by Yacc requires a lexical analyzer. Lexical analyzer generators, such as Lex or flex, are widely available. The IEEE POSIX P1003.2 standard defines the functionality and requirements for both Lex and Yacc. Some versions of AT&T Yacc have become open source. For example, source code (for different implementations) is available with the standard distributions of Plan 9 and OpenSolaris. A minimal implementation using Yacc and Lex is sketched below.
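The following pair of files is a representative sketch; the file names calc.y and calc.l, the grammar and the token name are illustrative assumptions, not a canonical example. The Yacc file defines a grammar for sums of integers on one line:

/* calc.y - a minimal Yacc grammar: sums of integers. */
%{
#include <stdio.h>
int yylex(void);
void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }
%}
%token NUMBER
%%
line : expr '\n'        { printf("= %d\n", $1); }
     ;
expr : expr '+' NUMBER  { $$ = $1 + $3; }
     | NUMBER           { $$ = $1; }
     ;
%%
int main(void) { return yyparse(); }

The matching Lex file recognizes the tokens and supplies them to the parser:

/* calc.l - the matching Lex specification. */
%{
#include <stdlib.h>
#include "y.tab.h"   /* token definitions generated by yacc -d */
extern int yylval;
%}
%%
[0-9]+  { yylval = atoi(yytext); return NUMBER; }
[ \t]   { /* skip blanks */ }
.|\n    { return yytext[0]; }
%%

With the traditional tools these are built by running yacc -d calc.y and lex calc.l, then compiling and linking y.tab.c and lex.yy.c with the Lex library (-ll); typing 1 + 2 + 3 then prints = 6.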
---------------xxxxxx--------------