Symbol Table and Activation Records
Symbol Tables
Information in Symbol Table
Features of Symbol Table
Simple Symbol Table
Scoped Symbol Table
Conclusion
Introduction
• Essential data structure used by compilers to remember
information about identifiers in the source program
• Usually the lexical analyzer and parser fill in the entries in the
table; later phases such as the code generator and optimizer make use
of the table information
• Types of symbols stored in the symbol table include variables,
procedures, functions, defined constants, labels, structures etc.
• Symbol tables may vary widely from implementation to
implementation, even for the same language
Information in Symbol Table
• Name
– Name of the identifier
– May be stored directly or as a pointer to another character string in an associated string table –
names can be arbitrarily long
• Type
– Type of the identifier: variable, label, procedure name etc.
– For variables, its type: basic types, derived types etc.
• Location
– Offset within the program where the identifier is defined
• Scope
– Region of the program where the current definition is valid
• Other attributes: array limits, fields of records, parameters, return values etc.
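The fields above can be pictured as one record per identifier. A minimal Python sketch follows; the class name `SymbolEntry` and all of its field names are illustrative assumptions, not the layout of any particular compiler.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SymbolEntry:
    """One symbol table record (hypothetical layout, for illustration only)."""
    name: str                # identifier name (could instead index a string table)
    kind: str                # variable, label, procedure name, ...
    data_type: str = ""      # for variables: basic or derived type
    offset: int = 0          # location: offset where the identifier is defined
    scope_level: int = 0     # nesting depth of the block that defines it
    attrs: Dict[str, Any] = field(default_factory=dict)  # array limits, parameters, ...


entry = SymbolEntry("a", "variable", "array of integer",
                    offset=8, scope_level=1, attrs={"limits": (0, 9)})
```

Keeping the less common attributes in a dictionary is one possible design choice; it lets entries be created early and filled in later as more information becomes available.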
Usage of Symbol Table Information
• Semantic Analysis – check correct semantic usage of language constructs,
e.g. types of identifiers
• Code Generation – Types of variables provide their sizes during code
generation
• Error Detection – Undefined variables. Repeated error messages for the
same variable can be avoided by marking its type as undefined in the symbol table
• Optimization – Two or more temporaries can be merged if their types are
the same
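The error-detection point can be sketched as follows. The function name `check_use` and the use of a plain dictionary as the table are assumptions made for illustration.

```python
def check_use(table, name, errors):
    """Return the recorded type of `name`, reporting an undefined name only once."""
    if name not in table:
        errors.append(f"undefined variable '{name}'")
        # Marker entry: later uses find it and no second message is produced.
        table[name] = "undefined"
    return table[name]


table = {"x": "integer"}
errors = []
check_use(table, "y", errors)   # first use of y: one error is recorded
check_use(table, "y", errors)   # second use of y: silently returns "undefined"
```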
Operations on Symbol Table
• Lookup – Most frequent. Whenever an identifier is seen, the table is searched
to check its type or to decide whether a new entry must be created
• Insert – Adding new names to the table, happens mostly in lexical and
syntax analysis phases
• Modify – When a name is defined, all information may not be available,
may be updated later
• Delete – Not very frequent. Needed sometimes, such as when a procedure
body ends
Issues in Symbol Table Design
• Format of entries – Various formats from linear array to tree structured
table
• Access methodology – Linear search, Binary search, Tree search, Hashing,
etc.
• Location of storage – Primary memory, partial storage in secondary
memory
• Scope Issues – In a block-structured language, a variable defined in an outer
block must be visible in inner blocks, but not the other way around
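One common way to handle the scope issue is a stack of per-block tables, searched from the innermost block outward. The sketch below assumes this organization; the class and method names are hypothetical.

```python
class ScopedSymbolTable:
    """Stack of per-block tables; index 0 is the outermost block."""

    def __init__(self):
        self.scopes = [{}]

    def enter_block(self):
        self.scopes.append({})

    def exit_block(self):
        self.scopes.pop()          # names of the finished block disappear

    def declare(self, name, info):
        self.scopes[-1][name] = info

    def lookup(self, name):
        # Innermost block first, so inner definitions hide outer ones.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


st = ScopedSymbolTable()
st.declare("x", "integer")         # outer block
st.enter_block()
st.declare("y", "real")            # inner block
inner_sees_x = st.lookup("x")      # outer name is visible inside
st.exit_block()
outer_sees_y = st.lookup("y")      # inner name is not visible outside
```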
Simple Symbol Table
• Works well for languages with a single scope
• Commonly used techniques are
– Linear table
– Ordered list
– Tree
– Hash table
Linear Table
• Simple array of records with each record corresponding to an
identifier in the program
• Example:
Program fragment:
    int x, y
    real z
    ...
    procedure abc
    ...
    L1: ...
    ...

Symbol table:
    Name    Type         Location
    x       integer      Offset of x
    y       integer      Offset of y
    z       real         Offset of z
    abc     procedure    Offset of abc
    L1      label        Offset of L1
Linear Table
• If there is no restriction in the length of the string for the name of an
identifier, string table may be used, with name field holding pointers
• Lookup, insert, modify take O(n) time
• Insertion can be made O(1) by remembering the pointer to the next free
index
• Scanning the most recent entries first may speed up access, owing to
program locality: a variable defined just inside a block is
expected to be referenced more often than variables defined earlier
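A minimal sketch of such a linear table, assuming a fixed-capacity array and a `next_free` index (both names are illustrative): insertion is O(1), while lookup scans backwards from the most recent entry and is O(n).

```python
class LinearTable:
    def __init__(self, capacity=64):
        self.records = [None] * capacity
        self.next_free = 0                 # index of the next free slot

    def insert(self, name, type_, location):
        if self.next_free == len(self.records):
            raise OverflowError("symbol table full")
        self.records[self.next_free] = (name, type_, location)
        self.next_free += 1                # O(1): no search needed

    def lookup(self, name):
        # Most recent entries first, to benefit from program locality.
        for i in range(self.next_free - 1, -1, -1):
            if self.records[i][0] == name:
                return self.records[i]
        return None


table = LinearTable()
table.insert("x", "integer", 0)
table.insert("y", "integer", 4)
table.insert("z", "real", 8)
```

The offsets 0, 4 and 8 are made-up values for the example.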
Ordered List
• Variation of the linear table in which the entries are kept in sorted order
• Once the list is sorted (for example, by name), binary search can be used
with O(log n) lookup time
• Insertion needs more time, since the sorted order must be maintained
• A variant – self-organizing list: the order of entries is
changed dynamically
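The trade-off between fast lookup and slower insertion can be sketched with Python's standard `bisect` module. The class name `OrderedTable` and the two parallel lists are assumptions of this sketch.

```python
import bisect


class OrderedTable:
    def __init__(self):
        self.names = []    # kept sorted
        self.infos = []    # infos[i] belongs to names[i]

    def insert(self, name, info):
        i = bisect.bisect_left(self.names, name)   # O(log n) to find the slot
        self.names.insert(i, name)                 # O(n): later entries shift
        self.infos.insert(i, info)

    def lookup(self, name):
        i = bisect.bisect_left(self.names, name)   # binary search, O(log n)
        if i < len(self.names) and self.names[i] == name:
            return self.infos[i]
        return None


ordered = OrderedTable()
for n, t in [("z", "real"), ("x", "integer"), ("abc", "procedure")]:
    ordered.insert(n, t)
```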
Self-Organizing List
• In Fig (a), Identifier4 is the most
recently used symbol, followed by
Identifier2, Identifier3 and so on
• In Fig (b), Identifier5 is accessed
next, accordingly the order changes
• Due to program locality, it is expected
that during compilation, entries near
the beginning of the ordered list will
be accessed more frequently
• This improves lookup time
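The reordering described above resembles the move-to-front heuristic, sketched below. This is an assumed interpretation of the figures; the initial order beyond the first three identifiers is made up for the example.

```python
def access(order, name):
    """Move `name` to the front of `order` (move-to-front heuristic)."""
    order.remove(name)       # raises ValueError if the name is absent
    order.insert(0, name)
    return order


# Before: Identifier4 is the most recently used, then Identifier2, Identifier3.
order = ["Identifier4", "Identifier2", "Identifier3", "Identifier1", "Identifier5"]
access(order, "Identifier5")   # Identifier5 is now at the front
```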
Tree
• Each entry represented by a node of the tree
• Based on string comparison of names, entries less than a reference
node are kept in its left subtree, the others in the right subtree
• Average lookup time O(log n)
• Proper height-balancing techniques are needed; otherwise lookup can
degrade to O(n) in the worst case
Hash Table
• Useful to minimize access time
• Most common method for implementing symbol tables in compilers
• A hash function maps each name to a location in the table, which is
organized as an array; ideally each name gets a unique location
• Expected access time O(1)
• Imperfection of hash function results in several symbols mapped to the same
location – collision resolution strategy needed
• To keep collisions reasonable, hash table is chosen to be of size between n and 2n
for n keys
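A minimal sketch of a hash-organized table, assuming separate chaining as the collision resolution strategy (the slides do not say which strategy is used). It relies on Python's built-in `hash`; the class name is hypothetical.

```python
class HashSymbolTable:
    def __init__(self, size):
        # One list ("chain") per table location.
        self.buckets = [[] for _ in range(size)]

    def _index(self, name):
        return hash(name) % len(self.buckets)   # always within the table range

    def insert(self, name, info):
        bucket = self.buckets[self._index(name)]
        for i, (existing, _) in enumerate(bucket):
            if existing == name:
                bucket[i] = (name, info)         # update an existing entry
                return
        bucket.append((name, info))              # colliding names share a chain

    def lookup(self, name):
        for existing, info in self.buckets[self._index(name)]:
            if existing == name:
                return info
        return None


n_keys = 4
symbols = HashSymbolTable(size=2 * n_keys)       # table size between n and 2n
symbols.insert("x", "integer")
symbols.insert("abc", "procedure")
```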
Desirable Properties of Hash Functions
• Should depend on the name of the symbol. Equal emphasis
should be given to each part of the name
• Should be quickly computable
• Should be uniform in mapping names to different parts of the
table. Similar names (such as, data1 and data2) should not
cluster to the same address
• Computed value must be within the range of table index
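These properties can be illustrated by comparing two hypothetical hash functions. `hash_name` uses every character of the name (a simple polynomial scheme chosen for this sketch), while `first_char_hash` looks only at the first character and therefore clusters similar names.

```python
def hash_name(name, table_size):
    """Polynomial hash: every character contributes, result is a valid index."""
    h = 0
    for ch in name:
        h = (h * 31 + ord(ch)) % table_size
    return h


def first_char_hash(name, table_size):
    """Poor choice: names sharing a first character always collide."""
    return ord(name[0]) % table_size


SIZE = 211   # assumed table size for the example
good_differs = hash_name("data1", SIZE) != hash_name("data2", SIZE)
poor_collides = first_char_hash("data1", SIZE) == first_char_hash("data2", SIZE)
```

Both functions are cheap to compute and stay within the table range; only the first spreads `data1` and `data2` to different addresses.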
Compiler Design
Runtime Environment Management
What is Runtime Environment
Activation Record
Environment without Local Procedures
Environment with Local Procedures
Display
Conclusion
What is Runtime Environment
• Refers to the program snapshot during execution
• Three main segments of a program
– Code
– Static and global variables
– Local variables and arguments
• Memory needed for each of these entities
– Generated code: Text for procedures and programs. Size known at compile time. Space can be
allotted statically before execution
– Data objects:
• Global variables/constants – space known at compile time
• Local variables – space known at compile time
• Dynamically created variables – space (heap) in response to memory allocation requests
– Stack: To keep track of procedure activations
Logical Address Space of Program
Code | Static | Heap | Free Memory | Stack
(low addresses on the left, high addresses on the right)
Static vs. Dynamic Allocation
• Static: Compile time, Dynamic: Runtime allocation
• Many compilers use some combination of the following
• Stack storage: for local variables, parameters and so on
• Heap storage: Data that may outlive the call to the procedure that created it
• Stack allocation works for procedures because procedure
calls are nested: a callee always returns before its caller does
Sketch of a quicksort program
Activation for Quicksort
Activation tree representing calls during
an execution of quicksort
Activation records
• Procedure calls and returns are usually managed by a run-time stack
called the control stack.
• Each live activation has an activation record (sometimes called a
frame)
• The root of activation tree is at the bottom of the stack
• The current execution path determines the contents of the stack, with the
record of the most recent activation at the top of the stack.
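The relationship between the activation tree and the control stack can be observed with a small quicksort that records its own live activations. This is an illustrative sketch, not the quicksort program from the slides; only `quicksort` activations are recorded, and `partition` is left out of the stack for brevity.

```python
control_stack = []   # index 0 is the bottom of the stack
snapshots = []       # stack contents at the start of each activation


def partition(a, lo, hi):
    pivot, i = a[hi], lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i


def quicksort(a, lo, hi):
    control_stack.append(f"q({lo},{hi})")      # activation begins: push
    snapshots.append(list(control_stack))
    if lo < hi:
        p = partition(a, lo, hi)
        quicksort(a, lo, p - 1)
        quicksort(a, p + 1, hi)
    control_stack.pop()                        # activation ends: pop


data = [3, 1, 2]
quicksort(data, 0, len(data) - 1)
```

Every snapshot starts with the root activation `q(0,2)`, and each one is a path from the root of the activation tree down to the activation that is currently running.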
A General Activation Record
Activation Record
• Temporary values
• Local data
• A saved machine status
• An “access link”
• A control link
• Space for the return value of the called function
• The actual parameters used by the calling procedure
Downward-growing stack of activation records
Designing Calling Sequences
• Values communicated between caller and callee are generally placed
at the beginning of callee’s activation record
• Fixed-length items are generally placed in the middle
• Items whose size may not be known early enough are placed at the
end of the activation record
• We must locate the top-of-stack pointer judiciously: a common
approach is to have it point to the end of fixed length fields.
Division of tasks between caller and callee
Calling sequence
• The caller evaluates the actual parameters
• The caller stores a return address and the old value of top-sp into the
callee's activation record.
• The callee saves the register values and other status information.
• The callee initializes its local data and begins execution.
Corresponding return sequence
• The callee places the return value next to the parameters
• Using information in the machine-status field, the callee restores
top-sp and other registers, and then branches to the return address that
the caller placed in the status field.
• Although top-sp has been decremented, the caller knows where the
return value is, relative to the current value of top-sp; the caller
therefore may use that value.
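The division of work in the calling and return sequences can be simulated as below. This is a simplified sketch under several assumptions: the stack is a Python list that grows toward higher indices, `top_sp` is an index standing in for the top-sp pointer, and the `Frame` fields and the register name `r1` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    params: list                  # actual parameters, stored by the caller
    return_value: object = None   # slot next to the parameters
    return_address: str = ""      # stored by the caller
    saved_top_sp: int = -1        # old value of top-sp
    saved_registers: dict = field(default_factory=dict)
    local_data: dict = field(default_factory=dict)


stack = []
top_sp = -1                       # index of the current frame
registers = {"r1": 0}


def call(callee, args, return_address):
    global top_sp
    # Caller: evaluate actuals, store return address and old top-sp.
    frame = Frame(list(args), return_address=return_address, saved_top_sp=top_sp)
    stack.append(frame)
    top_sp = len(stack) - 1
    # Callee: save registers, initialize local data, execute.
    frame.saved_registers = dict(registers)
    callee(frame)
    # Callee: restore registers and top-sp from the status fields.
    registers.update(frame.saved_registers)
    top_sp = frame.saved_top_sp
    # Caller: the return value sits at a known position relative to top-sp.
    result = stack[top_sp + 1].return_value
    stack.pop()
    return result


def square(frame):
    registers["r1"] = 99                        # callee overwrites a register
    frame.local_data["x"] = frame.params[0]
    frame.return_value = frame.local_data["x"] ** 2


answer = call(square, [7], "after_call")
```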