
Symbol Table and Activation Records


Compiler Design

Symbol Tables

• Information in Symbol Table
• Features of Symbol Table
• Simple Symbol Table
• Scoped Symbol Table
• Conclusion

Introduction
• Essential data structure used by compilers to remember
information about identifiers in the source program
• Usually lexical analyzer and parser fill up the entries in the
table, later phases like code generator and optimizer make use
of table information
• Types of symbols stored in the symbol table include variables,
procedures, functions, defined constants, labels, structures etc.
• Symbol tables may vary widely from implementation to
implementation, even for the same language

Information in Symbol Table
• Name
– Name of the identifier
– May be stored directly or as a pointer to a character string in an associated string table,
since names can be arbitrarily long
• Type
– Type of the identifier: variable, label, procedure name etc.
– For variables, its type: basic types, derived types etc.
• Location
– Offset within the program where the identifier is defined
• Scope
– Region of the program where the current definition is valid
• Other attributes: array limits, fields of records, parameters, return values etc.
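
The fields listed above could be gathered into one record per identifier. The Python sketch below is only an illustration: the class names `StringTable` and `SymbolEntry`, the field names, and the sample values are assumptions, not taken from any particular compiler. Names live in a shared string table, and each entry holds a (start, length) reference into it.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


class StringTable:
    """Shared character buffer; entries refer to names by (start, length)."""

    def __init__(self) -> None:
        self._chars: List[str] = []

    def add(self, name: str) -> Tuple[int, int]:
        start = len(self._chars)
        self._chars.extend(name)          # append the characters of the name
        return (start, len(name))

    def get(self, ref: Tuple[int, int]) -> str:
        start, length = ref
        return "".join(self._chars[start:start + length])


@dataclass
class SymbolEntry:
    name_ref: Tuple[int, int]             # pointer into the string table
    kind: str                             # variable, label, procedure, ...
    data_type: Optional[str]              # for variables: basic or derived type
    location: int                         # offset where the identifier is defined
    scope_level: int                      # region in which the definition is valid
    attributes: Dict[str, Any] = field(default_factory=dict)


strings = StringTable()
entry = SymbolEntry(
    name_ref=strings.add("a_rather_long_identifier_name"),
    kind="variable",
    data_type="array of integer",
    location=16,                          # hypothetical offset
    scope_level=1,
    attributes={"array_limits": (0, 9)},  # hypothetical array bounds
)
stored_name = strings.get(entry.name_ref)
```

Because the entry stores only a fixed-size reference, every record has the same size regardless of how long the identifier is.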

Usage of Symbol Table Information
• Semantic Analysis – check correct semantic usage of language constructs,
e.g. types of identifiers
• Code Generation – Types of variables provide their sizes during code
generation
• Error Detection – Undefined variables. Recurrence of error messages can
be avoided by marking the variable type as undefined in the symbol table
• Optimization – Two or more temporaries can be merged if their types are
the same
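
The error-detection point can be sketched as follows, assuming a symbol table modeled as a plain Python dictionary. The function name `check_use` and the message text are hypothetical. The first use of an unknown name reports an error and inserts an entry whose type is marked undefined, so later uses of the same name stay silent.

```python
def check_use(table, name, errors):
    """Return the type of `name`, reporting an undefined name only once."""
    entry = table.get(name)
    if entry is None:
        errors.append(f"undefined variable '{name}'")
        # Mark the type as undefined so the error is not repeated.
        table[name] = {"type": "undefined"}
        return "undefined"
    return entry["type"]


table = {"x": {"type": "integer"}}
errors = []
for used_name in ["x", "y", "y", "y"]:
    check_use(table, used_name, errors)
```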

Operations on Symbol Table
• Lookup – Most frequent operation; whenever an identifier is seen, the table
is searched to check its type or to decide whether a new entry must be created
• Insert – Adding new names to the table, happens mostly in lexical and
syntax analysis phases
• Modify – When a name is defined, all information may not be available,
may be updated later
• Delete – Not very frequent. Needed sometimes, such as when a procedure
body ends

Issues in Symbol Table Design
• Format of entries – Various formats from linear array to tree structured
table
• Access methodology – Linear search, Binary search, Tree search, Hashing,
etc.
• Location of storage – Primary memory, partial storage in secondary
memory
• Scope Issues – In a block-structured language, a variable defined in an outer
block must be visible in inner blocks, but not the other way around
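
One common way to handle the scope issue is a stack of per-block tables. The sketch below assumes that organization; the class `ScopedSymbolTable` and its method names are illustrative only. Lookup searches from the innermost block outward, and leaving a block deletes its entries.

```python
class ScopedSymbolTable:
    def __init__(self):
        self._scopes = [{}]               # index 0 is the outermost block

    def enter_scope(self):
        self._scopes.append({})

    def exit_scope(self):
        if len(self._scopes) == 1:
            raise RuntimeError("cannot leave the outermost scope")
        self._scopes.pop()                # entries of the block are deleted

    def insert(self, name, info):
        current = self._scopes[-1]
        if name in current:
            raise KeyError(f"redeclaration of {name!r} in the same block")
        current[name] = info

    def lookup(self, name):
        # Innermost block first, then the enclosing blocks.
        for scope in reversed(self._scopes):
            if name in scope:
                return scope[name]
        return None


scoped = ScopedSymbolTable()
scoped.insert("x", "integer")
scoped.enter_scope()
scoped.insert("y", "real")
scoped.insert("x", "real")                # hides the outer x inside this block
inner_x, inner_y = scoped.lookup("x"), scoped.lookup("y")
scoped.exit_scope()
outer_x, outer_y = scoped.lookup("x"), scoped.lookup("y")
```

After `exit_scope()`, the inner `y` is no longer visible from the outer block, while the outer `x` is visible again.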

Simple Symbol Table
• Works well for languages with a single scope
• Commonly used techniques are
– Linear table
– Ordered list
– Tree
– Hash table

Linear Table
• Simple array of records with each record corresponding to an
identifier in the program
• Example:

Source program:
    int x, y
    real z
    ...
    procedure abc
    ...
    L1: ...
    ...

Symbol table:
    Name   Type        Location
    x      integer     Offset of x
    y      integer     Offset of y
    z      real        Offset of z
    abc    procedure   Offset of abc
    L1     label       Offset of L1

Linear Table
• If there is no restriction on the length of the string for the name of an
identifier, a string table may be used, with the name field holding pointers into it
• Lookup, insert, modify take O(n) time
• Insertion can be made O(1) by remembering the pointer to the next free
index
• Scanning the most recent entries first may speed up access: due to
program locality, a variable defined just inside a block is
expected to be referred to more often than earlier variables
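
A minimal sketch of such a linear table, assuming a Python list as the array of records. The class name `LinearSymbolTable` and the numeric offsets are hypothetical. Insertion appends at the next free index in O(1), and lookup scans from the most recent entry backwards in O(n).

```python
class LinearSymbolTable:
    def __init__(self):
        self._records = []                # each record: [name, type, location]

    def insert(self, name, type_, location):
        # Appending plays the role of "pointer to the next free index".
        self._records.append([name, type_, location])
        return len(self._records) - 1

    def lookup(self, name):
        # Scan the most recent entries first; return the index or -1.
        for index in range(len(self._records) - 1, -1, -1):
            if self._records[index][0] == name:
                return index
        return -1

    def modify(self, name, type_=None, location=None):
        index = self.lookup(name)
        if index < 0:
            raise KeyError(name)
        if type_ is not None:
            self._records[index][1] = type_
        if location is not None:
            self._records[index][2] = location

    def record(self, index):
        return tuple(self._records[index])


linear = LinearSymbolTable()
for name, type_, offset in [("x", "integer", 0), ("y", "integer", 4),
                            ("z", "real", 8), ("abc", "procedure", 16),
                            ("L1", "label", 40)]:
    linear.insert(name, type_, offset)

z_index = linear.lookup("z")
linear.modify("L1", location=44)          # information filled in later
```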

Ordered List
• Variation of linear tables in which list organization is used
• If the list is kept sorted in some fashion, binary search can be used
with O(log n) lookup time
• Insertion needs more time
• A variant – self-organizing list: neighbourhood of entries
changed dynamically
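
The trade-off of the sorted variant can be sketched with Python's standard `bisect` module. The class name `OrderedSymbolList` is an assumption, and the entries are kept in a sorted array so that binary search applies. Lookup takes O(log n) comparisons, while insertion has to shift later entries to keep the order.

```python
import bisect


class OrderedSymbolList:
    def __init__(self):
        self._names = []                  # kept sorted at all times
        self._infos = []                  # parallel list of attributes

    def insert(self, name, info):
        pos = bisect.bisect_left(self._names, name)
        if pos < len(self._names) and self._names[pos] == name:
            self._infos[pos] = info       # name already present: update
        else:
            self._names.insert(pos, name) # shifting makes insertion O(n)
            self._infos.insert(pos, info)

    def lookup(self, name):
        pos = bisect.bisect_left(self._names, name)  # binary search
        if pos < len(self._names) and self._names[pos] == name:
            return self._infos[pos]
        return None

    def names(self):
        return list(self._names)


ordered = OrderedSymbolList()
for name, info in [("z", "real"), ("x", "integer"),
                   ("abc", "procedure"), ("y", "integer")]:
    ordered.insert(name, info)
```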

Self-Organizing List
• In Fig (a), Identifier4 is the most
recently used symbol, followed by
Identifier2, Identifier3 and so on
• In Fig (b), Identifier5 is accessed
next, accordingly the order changes
• Due to program locality, it is expected
that during compilation, entries near
the beginning of the ordered list will
be accessed more frequently
• This improves lookup time
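
One possible realization of a self-organizing list is the move-to-front rule, sketched below. This is an assumption about the reordering policy, and the class name `SelfOrganizingList` is hypothetical. The example reproduces the access pattern described above; the positions of Identifier5 and Identifier1 before the access are made up for illustration.

```python
class SelfOrganizingList:
    def __init__(self):
        self._entries = []                # most recently used entry first

    def insert(self, name, info):
        self._entries.insert(0, (name, info))

    def lookup(self, name):
        for index, (entry_name, info) in enumerate(self._entries):
            if entry_name == name:
                # Move the accessed entry to the front of the list.
                self._entries.insert(0, self._entries.pop(index))
                return info
        return None

    def order(self):
        return [entry_name for entry_name, _ in self._entries]


symbols = SelfOrganizingList()
for name in ["Identifier1", "Identifier5", "Identifier3",
             "Identifier2", "Identifier4"]:
    symbols.insert(name, {"type": "integer"})

order_before = symbols.order()            # Identifier4 is the most recent
symbols.lookup("Identifier5")
order_after = symbols.order()             # Identifier5 moved to the front
```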

Tree
• Each entry represented by a node of the tree
• Based on string comparison of names, entries less than a reference
node are kept in its left subtree, the others in the right subtree
• Average lookup time O(log n)
• Proper height balancing techniques need to be utilized
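
A tree-organized table might look like the following sketch. It is a plain binary search tree without the height balancing mentioned above, so the O(log n) figure holds only on average. The class names `Node` and `TreeSymbolTable` are illustrative.

```python
class Node:
    def __init__(self, name, info):
        self.name = name
        self.info = info
        self.left = None                  # names smaller than self.name
        self.right = None                 # names greater than self.name


class TreeSymbolTable:
    def __init__(self):
        self.root = None

    def insert(self, name, info):
        if self.root is None:
            self.root = Node(name, info)
            return
        node = self.root
        while True:
            if name == node.name:
                node.info = info          # existing name: update its entry
                return
            side = "left" if name < node.name else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, Node(name, info))
                return
            node = child

    def lookup(self, name):
        node = self.root
        while node is not None:
            if name == node.name:
                return node.info
            node = node.left if name < node.name else node.right
        return None


tree = TreeSymbolTable()
for name, info in [("m", "integer"), ("c", "real"),
                   ("x", "label"), ("a", "procedure")]:
    tree.insert(name, info)
```

Inserting names in already sorted order would degrade this unbalanced tree into a list, which is why balancing techniques are needed in practice.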

Hash Table
• Useful to minimize access time
• Most common method for implementing symbol tables in compilers
• Mapping done using a hash function that ideally yields a unique location in a table
organized as an array
• Access time O(1) on average
• Imperfection of hash function results in several symbols mapped to the same
location – collision resolution strategy needed
• To keep collisions reasonable, hash table is chosen to be of size between n and 2n
for n keys
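
The sketch below resolves collisions by chaining, which is one of several possible strategies. The class name `HashSymbolTable`, the multiplier 31, and the choice of 2n slots for n expected keys are assumptions made for illustration. A second, deliberately undersized table shows that lookups stay correct when several names share a slot.

```python
class HashSymbolTable:
    def __init__(self, expected_symbols):
        # Table size chosen as 2n for n expected keys (within the n..2n range).
        self._size = max(1, 2 * expected_symbols)
        self._buckets = [[] for _ in range(self._size)]

    def _index(self, name):
        h = 0
        for ch in name:                   # every character contributes
            h = (h * 31 + ord(ch)) % self._size
        return h

    def insert(self, name, info):
        bucket = self._buckets[self._index(name)]
        for pair in bucket:
            if pair[0] == name:
                pair[1] = info            # existing name: update
                return
        bucket.append([name, info])       # collision: extend the chain

    def lookup(self, name):
        for entry_name, info in self._buckets[self._index(name)]:
            if entry_name == name:
                return info
        return None

    def longest_chain(self):
        return max(len(bucket) for bucket in self._buckets)


names = ["x", "y", "z", "abc", "L1"]
hashed = HashSymbolTable(expected_symbols=len(names))
crowded = HashSymbolTable(expected_symbols=1)     # only 2 slots: forces collisions
for position, name in enumerate(names):
    hashed.insert(name, position)
    crowded.insert(name, position)
```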

Desirable Properties of Hash Functions
• Should depend on the name of the symbol. Equal emphasis
should be given to each part of the name
• Should be quickly computable
• Should be uniform in mapping names to different parts of the
table. Similar names (such as, data1 and data2) should not
cluster to the same address
• Computed value must be within the range of table index
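
These properties can be illustrated by comparing two hypothetical hash functions. `weak_hash` looks only at the first character, so similar names cluster; `hash_name` is a simple polynomial hash in which every character contributes. The table size 211 and the multiplier 31 are arbitrary choices for the sketch.

```python
TABLE_SIZE = 211                          # hypothetical table size


def weak_hash(name, table_size=TABLE_SIZE):
    # Depends on the first character only: violates "equal emphasis".
    return ord(name[0]) % table_size


def hash_name(name, table_size=TABLE_SIZE):
    h = 0
    for ch in name:                       # each character changes the result
        h = (h * 31 + ord(ch)) % table_size
    return h                              # always within 0 .. table_size - 1


weak_clusters = weak_hash("data1") == weak_hash("data2")
good_separates = hash_name("data1") != hash_name("data2")
in_range = all(0 <= hash_name(n) < TABLE_SIZE
               for n in ["data1", "data2", "x", "a_long_identifier"])
```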

Compiler Design
Runtime Environment Management

• What is Runtime Environment
• Activation Record
• Environment without Local Procedures
• Environment with Local Procedures
• Display
• Conclusion

What is Runtime Environment
• Refers to the snapshot of the program during execution
• Three main segments of a program
– Code
– Static and global variables
– Local variables and arguments
• Memory needed for each of these entities
– Generated code: Text for procedures and programs. Size known at compile time. Space can be
allotted statically before execution
– Data objects:
• Global variables/constants – space known at compile time
• Local variables – space known at compile time
• Dynamically created variables – space allocated on the heap in response to memory allocation requests
– Stack: To keep track of procedure activations

Logical Address Space of Program
Low address | Code | Static | Heap | Free Memory | Stack | High address

• Code occupies the lowest portion
• Global variables are allocated in the static portion
• In the remaining portion of the address space, the stack and heap are
allocated from opposite ends to have maximum flexibility

Static vs. Dynamic Allocation
• Static: compile-time allocation; Dynamic: run-time allocation
• Many compilers use some combination of the following
– Stack storage: for local variables, parameters and so on
– Heap storage: for data that may outlive the call to the procedure that created it
• Stack allocation is valid for procedures since procedure
calls are nested

[Figure: Sketch of a quicksort program]

[Figure: Activations for quicksort]

[Figure: Activation tree representing calls during an execution of quicksort]

Activation Records
• Procedure calls and returns are usually managed by a run-time stack
called the control stack
• Each live activation has an activation record (sometimes called a
frame)
• The root of the activation tree is at the bottom of the stack
• The current execution path determines the content of the stack, with the
record of the most recent activation at the top of the stack
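
The relationship between the activation tree and the control stack can be observed with a small experiment. The sketch below is an assumption-laden illustration: it mirrors the control stack in a Python list, uses a Lomuto-style `partition`, and records a snapshot of the stack at the start of every activation. The input array is arbitrary sample data.

```python
control_stack = []                        # explicit mirror of the control stack
snapshots = []                            # stack contents at each activation


def partition(a, lo, hi):
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i


def quicksort(a, lo, hi):
    control_stack.append(("quicksort", lo, hi))   # push activation record
    snapshots.append(tuple(control_stack))
    if lo < hi:
        p = partition(a, lo, hi)
        quicksort(a, lo, p - 1)
        quicksort(a, p + 1, hi)
    control_stack.pop()                           # pop on return


data = [9, 7, 5, 11, 12, 2, 14, 3, 10, 6]
quicksort(data, 0, len(data) - 1)
root = ("quicksort", 0, len(data) - 1)
deepest = max(len(stack) for stack in snapshots)
```

Every snapshot is one root-to-node path of the activation tree: the root activation is always at the bottom, and the activation that has just started is at the top.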

[Figure: A general activation record]

Activation Record
• Temporary values
• Local data
• A saved machine status
• An “access link”
• A control link
• Space for the return value of the called function
• The actual parameters used by the calling procedure

[Figure: Downward-growing stack of activation records]

Designing Calling Sequences
• Values communicated between caller and callee are generally placed
at the beginning of callee’s activation record
• Fixed-length items are generally placed in the middle
• Items whose size may not be known early enough are placed at the
end of the activation record
• We must locate the top-of-stack pointer judiciously: a common
approach is to have it point to the end of the fixed-length fields

[Figure: Division of tasks between caller and callee]

Calling Sequence
• The caller evaluates the actual parameters
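• The caller stores a return address and the old value of top-sp into the
callee's activation record, then increments top-sp past its own local data
and temporaries and the callee's parameters and status fields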
• The callee saves the register values and other status information.
• The callee initializes its local data and begins execution.

Corresponding Return Sequence
• The callee places the return value next to the parameters
• Using information in the machine-status field, the callee restores top-sp
and other registers, and then branches to the return address that
the caller placed in the status field.
• Although top-sp has been decremented, the caller knows where the
return value is, relative to the current value of top-sp; the caller
therefore may use that value.
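
The division of work between caller and callee can be imitated in a few lines of Python. This is a rough model under stated assumptions, not how a real code generator emits these sequences: the names `ActivationRecord`, `Machine`, and `square_body` are hypothetical, the top of a Python list stands in for top-sp, and the return address is just a number.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ActivationRecord:
    procedure: str
    actual_params: List[Any]
    return_value: Any = None
    control_link: Optional["ActivationRecord"] = None  # caller's record
    access_link: Optional["ActivationRecord"] = None   # unused in this sketch
    saved_status: Dict[str, Any] = field(default_factory=dict)
    local_data: Dict[str, Any] = field(default_factory=dict)


class Machine:
    def __init__(self) -> None:
        self.control_stack: List[ActivationRecord] = []
        self.registers: Dict[str, int] = {"r0": 0}

    def call(self, name, body, arg_exprs, return_address):
        # Calling sequence, caller: evaluate the actual parameters.
        params = [expr() for expr in arg_exprs]
        caller = self.control_stack[-1] if self.control_stack else None
        # Caller: store the return address and the link to the old top.
        record = ActivationRecord(name, params, control_link=caller)
        record.saved_status["return_address"] = return_address
        self.control_stack.append(record)
        # Callee: save register values and other status information.
        record.saved_status["registers"] = dict(self.registers)
        # Callee: initialize local data and execute (done inside body).
        result = body(self, record)
        # Return sequence, callee: place the return value in the record.
        record.return_value = result
        # Callee: restore registers and the stack top, fetch return address.
        self.registers = record.saved_status["registers"]
        self.control_stack.pop()
        resume_at = record.saved_status["return_address"]
        # Caller: the record is popped, but the value is still reachable.
        return record.return_value, resume_at


def square_body(machine, record):
    record.local_data["x"] = record.actual_params[0]
    machine.registers["r0"] = 99          # callee overwrites a register
    return record.local_data["x"] ** 2


machine = Machine()
machine.registers["r0"] = 7
value, resume_at = machine.call("square", square_body,
                                [lambda: 2 + 3], return_address=104)
```

After the call, the caller's register contents are intact and the control stack is back to its previous height, yet the caller can still read the returned value.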
