Compiler

Code optimization involves improving the quality and efficiency of code. Quality is measured by size and running time, with running time being more important for large computations. Local optimization focuses on improving code within a block, such as constant folding. Global optimization looks across blocks and functions. The main areas of code optimization are local optimization, loop optimization, and data flow analysis.


CODE OPTIMIZATION: How is the quality of an object program measured? The quality of an object program is measured by its size or its running time. For large computations running time is particularly important; for small computations size may be as important or even more so. What is the more accurate term for code optimization? The more accurate term for code optimization would be “code improvement”. Explain the principal sources of optimization. Code optimization techniques are generally applied after syntax analysis, usually both before and during code generation. The techniques consist of detecting patterns in the program and replacing these patterns by equivalent but more efficient constructs. What are the patterns used for code optimization? The patterns may be local or global, and the replacement strategy may be machine dependent or machine independent. What are the 3 areas of code optimization? Local optimization//Loop optimization//Data flow analysis. Define local optimization. The optimization performed within a block of code is called local optimization. Define constant folding. Deducing at compile time that the value of an expression is a constant, and using the constant instead, is known as constant folding. What do you mean by inner loops? The most heavily traveled parts of a program, the inner loops, are an obvious target for optimization. Typical loop optimizations are the removal of loop-invariant computations and the elimination of induction variables. What is code motion? Code motion is an important modification that decreases the amount of code in a loop. What are the properties of optimizing compilers? A transformation must preserve the meaning of programs//A transformation must, on the average, speed up programs by a measurable amount//A transformation must be worth the effort. Give the block diagram of organization of a code optimizer.

Notes on applying optimizations: @A given optimization technique may have to be applied repeatedly until no further optimization can be obtained. (For example, removing one redundant identifier may introduce another.) @A given optimization technique may give rise to other forms of redundancy, and thus sequences of optimization techniques may have to be repeated. (For example, removing a redundant identifier may give rise to redundant code, and removing redundant code may lead to further redundant identifiers.) @The order in which optimizations are applied may be significant. (How many ways are there of applying n optimization techniques to a given piece of code?)

REGULAR EXPRESSION: Recall from the Introduction that a lexical analyzer uses pattern matching with respect to rules associated with the source language’s tokens. For example, the token then is associated with the pattern t, h, e, n, and the token id might be associated with the pattern “an alphabetic character followed by any number of alphanumeric characters”. The notation of regular expressions is a mathematical formalism ideal for expressing patterns such as these, and thus ideal for expressing the lexical structure of programming languages. Regular expressions represent patterns of strings of symbols. A regular expression r matches a set of strings over an alphabet. This set is denoted L(r) and is called the language determined or generated by r. Let Σ be an alphabet. We define the set RE(Σ) of regular expressions over Σ, the strings they match and thus the languages they determine, as follows: @∅ ∈ RE(Σ) matches no strings. The language determined is L(∅) = ∅. @ε ∈ RE(Σ) matches only the empty string. Therefore, L(ε) = {ε}. @If a ∈ Σ then a ∈ RE(Σ) matches the string a. Therefore, L(a) = {a}. @If r and s are in RE(Σ) and determine the languages L(r) and L(s) respectively, then r|s ∈ RE(Σ) matches all strings matched either by r or by s. Therefore, L(r|s) = L(r) ∪ L(s).
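Constant folding, defined above, can be sketched in a few lines of Python; the nested-tuple expression form is an assumption for illustration, not a fixed compiler IR:

```python
# Minimal constant-folding sketch: an expression is an int literal,
# a variable name (str), or a tuple ("op", left, right).
def fold(expr):
    if isinstance(expr, (int, str)):
        return expr  # literal or variable: nothing to fold
    op, left, right = expr
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        # Both operands are known at compile time: evaluate now.
        if op == "+":
            return left + right
        if op == "*":
            return left * right
    return (op, left, right)

# 2*3 + x folds to 6 + x
print(fold(("+", ("*", 2, 3), "x")))  # ("+", 6, "x")
```

The walk folds bottom-up, so a fully constant subexpression collapses to a single literal even when the enclosing expression still mentions variables.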
COMPILATION TECHNIQUES: Several scholars were involved in the history of compilation; for each of the following techniques, discuss who was involved, the programming language concerned, and the year. @Single-pass table driven with stacks: Bauer and Samelson for Alcon// Dijkstra 1960, Algol for the X-1// Randell 1962, Whetstone Algol @Single-pass recursive descent: Hoare 1962, one procedure per language construct @Multi-pass ad hoc: Fortran I, 6 passes @Multi-pass table driven with stacks: Naur 1962, GIER Algol, 9 passes// Hawkins 1962, Kidsgrove Algol @General syntax-directed table driven: Irons 1961, Algol for the CDC 1604. COMPILER Q & A: 1. What is a Compiler? A compiler is a program that reads a program written in one language (the source language) and translates it into an equivalent program in another language (the target language). As an important part of this translation process, the compiler reports to its user the presence of errors in the source program. State some software tools that manipulate source programs. 1) Structure editors 2) Pretty printers 3) Static checkers 4) Interpreters. What are the cousins of the compiler? The following are the cousins of compilers: 1) Preprocessors 2) Assemblers 3) Loaders 4) Link editors. What are the two main parts of compilation, and what do they perform? @The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program. @The synthesis part constructs the desired target program from the intermediate representation. What is a structure editor? A structure editor takes as input a sequence of commands to build a source program. The structure editor not only performs the text creation and modification functions of an ordinary text editor, but also analyzes the program text, putting an appropriate
hierarchical structure on the source program. What are the advantages of the organization of a code optimizer? @The operations needed to implement high-level constructs are made explicit in the intermediate code, so it is possible to optimize them. @The intermediate code can be independent of the target machine, so the optimizer does not have to change much if the code generator is replaced by one for a different machine. Define local transformation and global transformation. A transformation of a program is called local if it can be performed by looking only at the statements in a basic block; otherwise it is called global. Give examples of function-preserving transformations. @Common subexpression elimination @Copy propagation @Dead-code elimination @Constant folding. What is meant by common subexpressions? An occurrence of an expression E is called a common subexpression if E was previously computed, and the values of variables in E have not changed since the previous computation. What is meant by dead code? A variable is live at a point in a program if its value can be used subsequently; otherwise, it is dead at that point. A statement that computes values that never get used is known as dead code or useless code. What are the techniques used for loop optimization? @Code motion @Induction variable elimination @Reduction in strength. What is meant by reduction in strength? Reduction in strength replaces an expensive operation by a cheaper one, such as a multiplication by an addition. What is meant by loop invariant computation? An expression that yields the same result independent of the number of times the loop is executed is known as a loop invariant computation. Define data flow equations. A typical equation has the form Out[S] = gen[S] ∪ (In[S] − kill[S]) and can be read as, “the information at the end of a statement is either generated within the statement, or enters at the beginning and is not killed as control flows through the statement”. Such equations are called data flow equations. What are the two standard storage allocation strategies? @Static allocation @Stack allocation. Discuss static allocation. In static allocation the position of an activation record in memory is fixed at compile time. Write short notes on an activation tree. An activation tree depicts the way control enters and leaves activations. In an activation tree: @Each node represents an activation of a procedure. @The root represents the activation of the main program. @The node for a is the parent of the node for b if and only if control flows from activation a to b. @The node for a is to the left of the node for b if and only if the lifetime of a occurs before the lifetime of b. Define control stack. A stack which is used to keep track of live procedure activations is known as a control stack. Define heap. A separate area of run-time memory which holds all other information is called a heap. Give the structure of the general activation record.

What are a pretty printer and a static checker? @A pretty printer analyzes a program and prints it in such a way that the structure of the program becomes clearly visible. @A static checker reads a program, analyzes it, and attempts to discover potential bugs without running the program. How many phases does analysis consist of? Analysis consists of three phases: 1) Linear analysis 2) Hierarchical analysis 3) Semantic analysis. What happens in linear analysis? This is the phase in which the stream of characters making up the source program is read from left to right and grouped into tokens, which are sequences of characters having collective meaning. What happens in hierarchical analysis? This is the phase in which characters or tokens are grouped hierarchically into nested collections with collective meaning. What happens in semantic analysis? This is the phase in which certain checks are performed to ensure that the components of a program fit together meaningfully. State some compiler construction tools. @Parser generators @Scanner generators @Syntax-directed translation engines @Automatic code generators @Data flow engines. What is a loader? What does the loading process do? A loader is a program that performs two functions: 1) loading 2) link editing. The process of loading consists of taking relocatable machine code, altering the relocatable addresses, and placing the altered instructions and data in memory at the proper locations. What does link editing do? Link editing allows us to make a single program from several files of relocatable machine code. These files may have been the result of several compilations, and one or more may be library files of routines provided by the system and available to any program that needs them. What is a preprocessor? A preprocessor is one which produces input to compilers. A source program may be divided into modules stored in separate files. The task of collecting the source program is sometimes entrusted to a distinct program called a preprocessor. The preprocessor may also expand macros into source language statements. State some functions of preprocessors. 1) Macro processing 2) File inclusion 3) Rational preprocessors 4) Language extensions. What is a symbol table? A symbol table is a data structure containing a record for each identifier, with fields for the attributes of the identifier. The data structure allows us to find the record for each identifier quickly and to store or retrieve data from that record quickly. State the general phases of a compiler. 1) Lexical analysis 2) Syntax analysis 3) Semantic analysis 4) Intermediate code generation 5) Code optimization 6) Code generation. What is an assembler? An assembler is a program which converts assembly language into machine code. What is the need for separating the analysis phase into lexical analysis and parsing? (Or) What are the issues of the lexical analyzer? @Simpler design is perhaps the most important consideration. The separation of lexical analysis from syntax analysis often allows us to simplify one or the other of these phases.
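The data-flow equation Out[S] = gen[S] ∪ (In[S] − kill[S]) discussed above can be iterated directly with set operations; the straight-line block and the definition names d1–d3 below are hypothetical, for illustration only:

```python
# Forward data flow through a straight-line sequence of statements.
# Each statement S carries gen[S] (definitions it creates) and
# kill[S] (definitions it invalidates): Out[S] = gen[S] | (In[S] - kill[S]).
statements = [
    {"gen": {"d1"}, "kill": {"d3"}},   # d1: x = ...  (kills other defs of x)
    {"gen": {"d2"}, "kill": set()},    # d2: y = ...
    {"gen": {"d3"}, "kill": {"d1"}},   # d3: x = ...  (kills d1)
]

in_set = set()                         # In of the first statement
for s in statements:
    s["out"] = s["gen"] | (in_set - s["kill"])
    in_set = s["out"]                  # Out of S is In of the next statement

print(statements[-1]["out"])           # {'d2', 'd3'}: d1 was killed by d3
```

For code with branches the same equation is iterated to a fixed point over the control-flow graph; the straight-line case needs only one pass.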
@Compiler efficiency is improved @ Compiler portability is enhanced. What is Lexical Analysis? The first phase of
compiler is Lexical Analysis. This is also known as linear analysis in which the stream of characters making up the
source program is read from left-to-right and grouped into tokens that are sequences of characters having a collective
meaning. What is a lexeme? Define a regular set. A Lexeme is a sequence of characters in the source program that is
matched by the pattern for a token. A language denoted by a regular expression is said to be a regular set. What is
a sentinel? What is its usage? A Sentinel is a special character that cannot be part of the source program. Normally
we use ‘eof’ as the sentinel. This is used for speeding up the lexical analyzer. What is a regular expression? State the rules which define regular expressions. A regular expression is a method to describe a regular language. Rules: 1) ε is a regular expression that denotes {ε}, the set containing the empty string. 2) If a is a symbol in Σ, then a is a regular expression that denotes {a}. 3) Suppose r and s are regular expressions denoting the languages L(r) and L(s). Then a) (r)|(s) is a regular expression denoting L(r) ∪ L(s), b) (r)(s) is a regular expression denoting L(r)L(s), c) (r)* is a regular expression denoting (L(r))*, and d) (r) is a regular expression denoting L(r). What are the error-recovery actions in a lexical analyzer? 1. Deleting an extraneous character 2. Inserting a missing character 3. Replacing an incorrect character by a correct character 4. Transposing two adjacent characters. Construct a regular expression for the language L = {w ∈ {a,b}* | w ends in abb}. Ans: (a|b)*abb. What is a recognizer? Recognizers are machines which accept the strings belonging to certain languages. If the valid strings of a language are accepted by the machine, then the language is said to be accepted by that machine; otherwise it is rejected. Syntax analysis is the second phase of the compiler. It gets its input from the tokens and generates a syntax tree or parse tree. Advantages of grammar for syntactic specification: 1. A grammar gives a precise and easy-to-understand syntactic specification of a programming language. 2. An efficient parser can be constructed
automatically from a properly designed grammar. 3. A grammar imparts a structure to a source program that is
useful for its translation into object code and for the detection of errors. 4. New constructs can be added to a
language more easily when there is a grammatical description of the language.
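The language of strings over {a,b} ending in abb, discussed above, is matched by the regular expression (a|b)*abb; a quick check with Python's re module (an illustration, not part of the original notes):

```python
import re

# Strings over {a, b} ending in abb: (a|b)*abb, matched against
# the whole input with fullmatch.
pattern = re.compile(r"(a|b)*abb")

for s in ["abb", "aabb", "babb", "ab", "abba"]:
    print(s, bool(pattern.fullmatch(s)))
# abb, aabb, babb match; ab and abba do not
```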
Discuss stack allocation. In stack allocation a new activation record is pushed onto the stack for each execution of a procedure. The record is popped when the activation ends. What are the 2 approaches to implement dynamic scope? @Deep access @Shallow access. What is padding? Space left unused due to alignment considerations is referred to as padding. What are the 3 areas used by storage allocation strategies? @Static allocation @Stack allocation @Heap allocation. What are the limitations of using static allocation? @The size of a data object and constraints on its position in memory must be known at compile time. @Recursive procedures are restricted, because all activations of a procedure use the same bindings for local names. @Data structures cannot be created dynamically, since there is no mechanism for storage allocation at run time. Define calling sequence and return sequence. A call sequence allocates an activation record and enters information into its fields. A return sequence restores the state of the machine so that the calling procedure can continue execution. When does a dangling reference occur? A dangling reference occurs when there is a reference to storage that has been deallocated. It is a logical error to use dangling references, since the value of deallocated storage is undefined according to the semantics of most languages. Define static scope rule and dynamic scope rule. @The lexical or static scope rule determines the declaration that applies to a name by examining the program text alone. @The dynamic scope rule determines the declaration applicable to a name at run time, by considering the current activations. What is a block? Give its syntax. A block is a statement containing its own data declarations.
Syntax:
{
Declaration statements
}
What is an access link? An access link is a pointer in each activation record which permits a direct implementation of lexical scope for nested procedures. What is known as environment and state? The term environment refers to a function that maps a name to a storage location. The term state refers to a function that maps a storage location to the value held there. How is the run-time memory subdivided? @Generated target code @Data objects @A counterpart of the control stack to keep track of procedure activations.

Metalanguage: a language used to define another language. LANGUAGES: Alphabet: any finite set of symbols; {0, 1} is the binary alphabet. String: a finite sequence of symbols from the alphabet; 1011 is a string of length 4; ε is the empty string. Language: any set of strings on the alphabet; {00, 01, 10, 11} is the set of strings of length 2; ∅ is the empty set. Operations on languages: Union of L and M: L ∪ M = {s | s ∈ L or s ∈ M}. Concatenation of L and M: LM = {st | s ∈ L and t ∈ M}. Kleene closure of L: L* = L⁰ ∪ L¹ ∪ L² ∪ … Positive closure of L: L⁺ = L¹ ∪ L² ∪ … Operations on strings: concatenation: x = dog, y = house, xy = doghouse; exponentiation: s⁰ = ε, s¹ = s, s² = ss.

A compiler is a computer program (or set of programs) that transforms source code written in a programming language (the source language) into another computer language (the target language, often having a binary form known as object code). The most common reason for wanting to transform source code is to create an executable program. Why compile? Answer: writing machine language (numeric codes) is time consuming and tedious:
C7 06 0000 0002
Mov x, 2
x = 2
Assembly language also has a number of defects: Not easy to write//Difficult to read and understand.
HISTORY OF COMPILER: 1) The first compiler was developed between 1954 and 1957: the FORTRAN language and its compiler by a team at IBM led by John Backus//The structure of natural language was studied at about the same time by Noam Chomsky. 2) The related theories and algorithms in the 1960s and 1970s: The classification of languages: the Chomsky hierarchy//The parsing problem was pursued: context-free languages, parsing algorithms//Symbolic methods for expressing the structure of the words of a programming language: finite automata, regular expressions//Methods were developed for generating efficient object code: optimization techniques or code improvement techniques. 3) Programs were developed to automate compiler development for parsing: @Parser generators, such as Yacc by Steve Johnson in 1975 for the Unix system @Scanner generators, such as Lex by Mike Lesk for the Unix system at about the same time. 4) Projects focused on automating the generation of other parts of a compiler: @Code generation was undertaken during the late 1970s and early 1980s @Less success, due to our less than perfect understanding of it.

TYPES OF COMPILERS: 1) One-pass compiler, like early compilers for Pascal: the compilation is done in one pass, hence it is very fast. 2) Threaded code compiler (or interpreter), like most implementations of FORTH: this kind of compiler can be thought of as a database lookup program. It just replaces given strings in the source with given binary code. The level of this binary code can vary; in fact, some FORTH compilers can compile programs that don't even need an operating system. 3) Incremental compiler, like many Lisp systems: individual functions can be compiled in a run-time environment that also includes interpreted functions. Incremental compilation dates back to 1962 and the first Lisp compiler, and is still used in Common Lisp systems. 4) Stage compiler that compiles to the assembly language of a theoretical machine, like some Prolog implementations: this Prolog machine is also known as the Warren abstract machine (or WAM). Byte-code compilers for Java, Python (and many more) are also a subtype of this. 5) Just-in-time (JIT) compiler, used by Smalltalk and Java systems: applications are delivered in bytecode, which is compiled to native machine code just prior to execution. 6) A retargetable compiler is a compiler that can relatively easily be modified to generate code for different CPU architectures. The object code produced by these is frequently of lesser quality than that produced by a compiler developed specifically for a processor. Retargetable compilers are often also cross compilers; GCC is an example of a retargetable compiler//A parallelizing compiler converts a serial input program into a form suitable for efficient execution on a parallel computer architecture.

NOTES: An optimizer attempts to improve the time and space requirements of a program. There are many ways in which code can be optimized, but most are expensive in terms of time and space to implement. Common optimizations include: @removing redundant identifiers @removing unreachable sections of code @identifying common subexpressions @unfolding loops @eliminating procedures. Note that here we are concerned with the general optimization of abstract code.
Example. Consider the TAC code:
temp1 := x
temp2 := temp1
if temp1 = temp2 goto 200
temp3 := temp1 * y
goto 300
200 temp3 := z
300 temp4 := temp2 + temp3
Removing redundant identifiers (just temp2) gives
temp1 := x
if temp1 = temp1 goto 200
temp3 := temp1 * y
goto 300
200 temp3 := z
300 temp4 := temp1 + temp3
Removing redundant code gives
temp1 := x
200 temp3 := z
300 temp4 := temp1 + temp3
Notes. Attempting to find a ‘best’ optimization is expensive, for the reasons noted earlier: techniques may need repeated application, one optimization may expose further redundancies, and the order of application matters.
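The redundant-identifier removal traced in the TAC example above (eliminating temp2 := temp1) can be sketched mechanically as copy propagation. This is a simplified straight-line version without the branch, and the tuple encoding of three-address statements is an illustrative assumption:

```python
# Copy propagation over three-address code held as (dest, op, args) tuples.
# A statement like temp2 := temp1 is a pure copy; every later use of
# temp2 can be replaced by temp1 and the copy statement deleted.
code = [
    ("temp1", "id",   ["x"]),
    ("temp2", "copy", ["temp1"]),
    ("temp3", "*",    ["temp1", "y"]),
    ("temp4", "+",    ["temp2", "temp3"]),
]

optimized = []
replacement = {}                       # copy target -> original name
for dest, op, args in code:
    args = [replacement.get(a, a) for a in args]   # substitute known copies
    if op == "copy" and args[0].startswith("temp"):
        replacement[dest] = args[0]    # record the copy, drop the statement
    else:
        optimized.append((dest, op, args))

print(optimized)
```

After the pass, temp2 no longer appears: its single definition was a copy of temp1, so every use was rewritten and the defining statement removed, mirroring the hand-worked trace above.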

Syntax Analysis/Parsing: Syntax analysis is the second phase of the compiler, also called parsing//The parser converts the tokens produced by the lexical analyser into a tree-like representation called a parse tree//A parse tree describes the syntactic structure of the input//A syntax tree is a compressed representation of the parse tree in which the operators appear as interior nodes and the operands of an operator are the children of the node for that operator//Input: tokens//Output: syntax tree
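The syntax tree described above, with operators as interior nodes and operands as their children, can be sketched with nested tuples (an illustrative encoding, not a fixed representation):

```python
# Syntax tree for a + b * 5: operators are interior nodes,
# operands are leaves, exactly as described above.
tree = ("+", "a", ("*", "b", 5))

def infix(node):
    # Recover a parenthesized expression by an in-order walk.
    if not isinstance(node, tuple):
        return str(node)
    op, left, right = node
    return f"({infix(left)} {op} {infix(right)})"

print(infix(tree))  # (a + (b * 5))
```

Note how the tree shape already encodes precedence: the * node sits below the + node, so no precedence rules are needed when walking it.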

Lexical Analysis or Scanning: The source program is scanned to read the stream of characters, and those characters are grouped to form sequences called lexemes, which produce tokens as output. Token: a token is a sequence of characters that represents a lexical unit, which matches with a pattern, such as keywords, operators, identifiers etc. Lexeme: a lexeme is an instance of a token, i.e., a group of characters forming a token. Pattern: a pattern describes the rule that the lexemes of a token take. It is the structure that must be matched by strings. Once a token is generated, the corresponding entry is made in the symbol table. Input: stream of characters. Output: tokens//Token template: <token-name, attribute-value> (e.g.) c = a + b * 5; Functions of the lexical analyser: 1) Grouping input characters into tokens 2) Stripping out comments and white space 3) Correlating error messages with the source program. Issues (why separate lexical analysis from parsing): @Simpler design @Compiler efficiency @Compiler portability (e.g. Linux to Win).

Semantic Analysis: Semantic analysis is the third phase of the compiler. 1) It checks for semantic consistency. 2) Type information is gathered and stored in the symbol table. 3) It performs type checking. Semantic analysis in English, example: “Jack said Jerry left his assignment at home.” What does “his” refer to? Jack or Jerry? Even worse: “Jack said Jack left his assignment at home.” How many Jacks are there? Which one left the assignment?
{
int Jack = 3;
{
int Jack = 4;
cout << Jack;
}
}
In context free grammars (CFGs), structures are independent of the other structures surrounding them. Backus-Naur form (BNF) notation describes CFGs. Symbols are either tokens or nonterminal symbols. Productions are of the form nonterminal → definition, where the definition defines the structure of a nonterminal. Rules may be recursive, with a nonterminal symbol appearing both on the left side of a production and in its own definition. Metasymbols are used to identify the parts of the production (the arrow) and alternative definitions of a nonterminal (the vertical bar). Parse trees show the derivation of a structure from BNF, e.g., number → DIGIT | DIGIT number. Abstract syntax trees (ASTs) encapsulate the details, and are very useful for converting between structurally similar forms.
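The grouping of characters into tokens described above can be sketched with a priority-ordered list of regular expressions; the token names and patterns here are illustrative assumptions, not a fixed specification:

```python
import re

# Token patterns tried in priority order: keywords before identifiers,
# identifiers as a letter followed by letters/digits, and so on.
TOKEN_SPEC = [
    ("KEYWORD",    r"\b(if|else|while)\b"),
    ("IDENT",      r"[A-Za-z][A-Za-z0-9]*"),
    ("INT_CONST",  r"\d+"),
    ("ASSG_OP",    r"="),
    ("WHITESPACE", r"\s+"),
]
SCANNER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(text):
    # Yield <token-name, lexeme> pairs, stripping out whitespace.
    for m in SCANNER.finditer(text):
        if m.lastgroup != "WHITESPACE":
            yield (m.lastgroup, m.group())

print(list(tokenize("count = 123")))
# [('IDENT', 'count'), ('ASSG_OP', '='), ('INT_CONST', '123')]
```

This matches the token-attribute example from the notes: count = 123 yields an identifier, an assignment operator, and an integer constant, with whitespace discarded.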

Intermediate Code Generation: Intermediate code generation produces intermediate representations for the source program, which are of the following forms: 1) Postfix notation 2) Three address code 3) Syntax tree. The most commonly used form is the three-address code: t1 = inttofloat(5), t2 = id3 * t1, t3 = id2 + t2, id1 = t3. Properties of intermediate code: 1) It should be easy to produce. 2) It should be easy to translate into the target program.

Code Optimization: 1) The code optimization phase gets the intermediate code as input and produces optimized intermediate code as output. 2) It results in faster running machine code. 3) It can be done by reducing the number of lines of code for a program. 4) This phase reduces the redundant code and attempts to improve the intermediate code so that faster-running machine code will result. 5) During code optimization, the result of the program is not affected. To improve the code generation, the optimization involves: 1) Deduction and removal of dead code (unreachable code). 2) Calculation of constants in expressions and terms. 3) Collapsing of repeated expressions into temporaries. 4) Loop unrolling. 5) Moving code outside the loop. 6) Removal of unwanted temporary variables. t1 = id3 * 5.0, id1 = id2 + t1.

CODE GENERATION: Code generation is the final phase of a compiler. It gets input from the code optimization phase and produces the target code or object code as a result. Intermediate instructions are translated into a sequence of machine instructions that perform the same task. Code generation involves: 1) Allocation of registers and memory. 2) Generation of correct references. 3) Generation of correct data types. 4) Generation of missing code. LDF R2, id3// MULF R2, #5.0// LDF R1, id2// ADDF R1, R2// STF id1, R1.

Symbol Table: Example: for (int i = 0; i < n; i++) sum += sqrt(arr[i]); The symbol table is used to store all the information about identifiers used in the program//It is a data structure containing a record for each identifier, with fields for the attributes of the identifier//It allows finding the record for each identifier quickly and storing or retrieving data from that record//Whenever an identifier is detected in any of the phases, it is stored in the symbol table.

Criteria of code optimization: Must preserve the semantic equivalence of the programs//The algorithm should not be modified//A transformation should, on average, speed up the execution of the program//Worth the effort: intellectual and compilation effort should not be spent on insignificant improvements//Transformations should be simple enough to have a good effect.

Classifications of optimization techniques: Peephole optimization (improve code by examining and changing a small sequence (peephole) of code at a time)//Local optimization (optimization within a basic block)//Global optimization (optimization across basic blocks)//Inter-procedural//Intra-procedural//Loop optimization. Factors influencing optimization: The target machine: machine-dependent factors can be parameterized to the compiler for fine tuning. Architecture of the target CPU: number of CPU registers, RISC vs CISC, pipeline architecture, number of functional units. Machine architecture: cache size and type, cache/memory transfer rate. What is RISC vs CISC? RISC stands for Reduced Instruction Set Computer, whereas CISC stands for Complex Instruction Set Computer. RISC processors have a smaller set of instructions with few addressing modes. CISC processors have a larger set of instructions with many addressing modes.

Themes behind optimization techniques: Avoid redundancy: something already computed need not be computed again. Smaller code: less work for CPU, cache, and memory. Fewer jumps: jumps interfere with code pre-fetch. Code locality: code executed close together in time should be generated close together in memory, to increase locality of reference. Extract more information about the code: more information allows better code generation. Optimizing transformations: 1. Compile time evaluation 2. Common sub-expression elimination 3. Code motion 4. Strength reduction 5. Dead code elimination 6. Copy propagation 7. Loop optimization.

Lexical analysis: What do we want to do? Example: if (i == j)// z = 0;//else// z = 1; The input is just a string of characters: \t if (i == j) \n \t \t z = 0;\n \t else \n \t \t z = 1; Goal: partition the input string into substrings, where the substrings are tokens. What is a token? In English: noun, verb, adjective, … In a programming language: identifier, integer, keyword, whitespace. What are tokens for? @Classify program substrings according to role @The output of lexical analysis is a stream of tokens, which is the input to the parser @The parser relies on token distinctions: an identifier is treated differently than a keyword. Tokens correspond to sets of strings. Example: Identifier: strings of letters or digits, starting with a letter//Integer: a non-empty string of digits//Keyword: “else” or “if” or “begin” or …//Whitespace: a non-empty sequence of blanks, newlines, and tabs. TYPICAL TOKENS IN A PL: @Symbols/operators: +, -, *, /, =, <, >, ->, … @Keywords: if, while, struct, float, int, … @Integer and real (floating point) literals: 123, 123.45 @Char (string) literals @Identifiers @Comments @White space. DEFINITIONS: Pattern: a rule that describes a set of strings. Token: a set of strings in the same pattern. Lexeme: the sequence of characters of a token. Attributes for tokens: if more than one lexeme can match the pattern for a token, the scanner must indicate the actual lexeme that matched. This information is given using an attribute associated with the token. Example: the program statement count = 123 yields the following token-attribute pairs: <identifier, pointer to the string “count”>, <assg_op, the operator =>, <integer_const, the integer value 123>. LEXICAL ERROR RECOVERY: A character sequence that can’t be scanned into any valid token is a lexical error. Lexical errors are uncommon, but they still must be handled by a scanner. We won’t stop compilation because of so minor an error. Approaches to lexical error handling include: delete the characters read so far and restart scanning at the next unread character; or delete the first character read by the scanner and resume scanning at the character following it. Both of these approaches are reasonable.

SPECIFICATION OF TOKENS: In the theory of compilation, regular expressions are used to formalize the specification of tokens. Regular expressions are a means for specifying regular languages. Example: letter_(letter_ | digit)*. Each regular expression is a pattern specifying the form of strings. A regular expression is a text string that defines a character pattern. One use of regular expressions is pattern matching, in which a text string is tested to see whether it matches the pattern defined by a regular expression.

STRINGS AND LANGUAGES: An alphabet is any finite set of symbols such as letters, digits, and punctuation. The set {0,1} is the binary alphabet. A string over an alphabet is a finite sequence of symbols drawn from that alphabet. In language theory, the terms “sentence” and “word” are often used as synonyms for “string”. |s| represents the length of a string s; for example, banana is a string of length 6. The empty string ε is the string of length zero. If x and y are strings, then the concatenation of x and y, denoted xy, is also a string; for example, if x = dog and y = house, then xy = doghouse. The empty string is the identity under concatenation; that is, for any string s, εs = sε = s. A language is any countable set of strings over some fixed alphabet. Let L = {A, …, Z}; then {“A”, “B”, “C”, “BF”, …, “ABZ”, …} is considered a language defined over L. Abstract languages like ∅, the empty set, or {ε}, the set containing only the empty string, are languages under this definition.

OPERATIONS IN LANGUAGES: Example: Let L be the set of letters {A, B, …, Z, a, b, …, z} and let D be the set of digits {0, 1, …, 9}. L and D are, respectively, the alphabets of uppercase and lowercase letters and of digits. Other languages can be constructed from L and D, using the operators illustrated above: 1. L ∪ D is the set of letters and digits; strictly speaking, the language with 62 (52 + 10) strings of length one, each of which is either one letter or one digit. 2. LD is the set of 520 (52 × 10) strings of length two, each consisting of one letter followed by one digit, e.g., A1, a1, B0. 3. L⁴ is the set of all 4-letter strings (examples: aaba, bcef). 4. L* is the set of all strings of letters, including ε, the empty string. 5. L(L ∪ D)* is the set of all strings of letters and digits beginning with a letter. 6. D⁺ is the set of all strings of one or more digits.

The standard notation for regular languages is regular expressions. Atomic and compound regular expressions: larger regular expressions are built from smaller ones. Let r and s be regular expressions denoting languages L(r) and L(s), respectively. 1. (r)|(s) is a regular expression denoting the language L(r) ∪ L(s). 2. (r)(s) is a regular expression denoting the language L(r)L(s). 3. (r)* is a regular expression denoting (L(r))*. 4. (r) is a regular expression denoting L(r). This last rule says that we can add additional pairs of parentheses around expressions without changing the language they denote; for example, we may replace the regular expression (a)|((b)*(c)) by a|b*c. EXAMPLE OF REGULAR EXPRESSION: Regular expressions are all around you! Example: phone numbers. Consider (650)-723-3232.
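The phone-number example above can be written as a regular expression; the grouping below is an assumption based on the one format shown, (650)-723-3232:

```python
import re

# (650)-723-3232: a three-digit area code in parentheses,
# then a hyphen, three digits, a hyphen, and four digits.
phone = re.compile(r"\(\d{3}\)-\d{3}-\d{4}")

print(bool(phone.fullmatch("(650)-723-3232")))  # True
print(bool(phone.fullmatch("650-723-3232")))    # False: no parentheses
```

Real phone-number formats vary widely; a production pattern would allow alternative separators and an optional country code.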
