Symbol Tables
Enables the compiler to do type checking and to determine/control the scope of a variable.
There are no type compatibility constraints or scoping rules at run time.
CS331 Symbol Tables
Scope
Scope rules of a language specify which declaration of a variable is associated with a specific occurrence of the variable.
Scope rules apply to variables, constants, new type definitions and functions.
Static Scope
Also called lexical scope.
Lexical scope rules specify the association of variables with declarations based on examination of the source code alone.
The binding of variables to declarations is done at compile time.
Pascal and C use static scoping rules.
Dynamic Scope
The binding of variables to declarations is done at run time.
Dynamic scoping can be achieved by copying a function verbatim at the place of call.
Pure Lisp and early Lisp dialects, as well as Snobol and APL, use dynamic scoping rules (Common Lisp itself is lexically scoped by default).
Static vs Dynamic
#include <stdio.h>

int i = 10;

void fun1() {
    printf("Inside fun1 %d\n", i);
}

void fun2() {
    int i = 20;   /* visible to fun1 only under dynamic scope */
    fun1();
}

int main() {
    fun1();
    fun2();
    fun1();
}
Static Scope
Inside fun1 10
Inside fun1 10
Inside fun1 10
Dynamic Scope
Inside fun1 10
Inside fun1 20
Inside fun1 10
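C itself is statically scoped, so a C compiler only ever produces the first output. The following is a minimal sketch that simulates the dynamic-scope behaviour with an explicit run-time stack of bindings for i. The names bindings, push_binding, pop_binding and run_dynamic_demo are hypothetical, and the stack has no overflow check.

```c
#include <stddef.h>

/* Hypothetical run-time binding stack for the single name "i". */
#define MAX_BINDINGS 16
static int bindings[MAX_BINDINGS] = { 10 };   /* global i = 10 */
static size_t top = 0;                        /* index of the newest binding */

static void push_binding(int value) { bindings[++top] = value; }
static void pop_binding(void)       { --top; }

/* Under dynamic scope, i resolves to the newest binding on the stack. */
static int fun1(void) { return bindings[top]; }

static int fun2(void)
{
    push_binding(20);      /* int i = 20; */
    int seen = fun1();     /* fun1 sees the caller's i */
    pop_binding();         /* fun2's i disappears when fun2 returns */
    return seen;
}

/* Mirrors main(): fun1(); fun2(); fun1(); and records what fun1 saw. */
void run_dynamic_demo(int out[3])
{
    out[0] = fun1();
    out[1] = fun2();
    out[2] = fun1();
}
```

The recorded values are 10, 20, 10, matching the dynamic-scope output. The lookup depends on the call chain at run time rather than on the program text.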
// Bound Occurrence
A declaration specifies the type of the variable.
A definition allocates storage for the variable.
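As a small illustration of the distinction in C, assuming hypothetical names counter and next_id:

```c
extern int counter;        /* declaration: gives the type, allocates nothing */
int counter = 5;           /* definition: allocates storage and initializes it */

int next_id(int base);     /* function declaration (prototype) */

int next_id(int base)      /* function definition */
{
    return base + counter;
}
```

A name may be declared many times but is defined once. The compiler's type checking relies only on the declarations.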
Blocks
A set of statements enclosed within blocking symbols (BEGIN and END, { and }, etc.) is called a block (compound statement).
Blocks nest inside other blocks.
Blocks are either disjoint or nested.
A block-structured language allows procedures/functions to nest within other procedures/functions.
Most-Closely-Nested Rule
An occurrence of a variable is associated with the innermost enclosing declaration of that variable.
Alternatively, a variable is bound to a declaration of that variable (is in its scope) until the enclosing block ends, provided there are no redeclarations of that variable.

int I = 1;

void fun1() {
    printf("%d\n", I);        /* prints 1: the global I */
    {
        int I = 3;
        printf("%d\n", I);    /* prints 3: the innermost I */
    }
}
Local Variables
Also called automatic variables.
Formal parameters are essentially local variables of the function.
Actual parameters are the items passed in.
Lifetime
The lifetime of a local variable is the function activation.
The lifetime of a static variable is the whole program execution.
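A short sketch of the two lifetimes in C. The function names count_auto and count_static are hypothetical.

```c
/* Automatic variable: created afresh on every activation. */
int count_auto(void)
{
    int calls = 0;
    calls++;
    return calls;          /* always 1 */
}

/* Static variable: one instance that lives for the whole execution. */
int count_static(void)
{
    static int calls = 0;
    calls++;
    return calls;          /* 1, 2, 3, ... across calls */
}
```

Both variables have the same scope (the function body). Only the lifetime differs.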
Scope in C
The scope of a declaration in C is either local or global.
A block in C can have declarations (in C89, only at the beginning of the block).
All storage for a procedure is allocated up front.
The scope extends from the point of declaration until the end of the block.
In C++ (and in C since C99), a variable can be declared anywhere in a block, not just at the beginning.
The scope extends from that point until the end of the block.
Scope Resolution
The scope resolution operator ( :: ) in C++.
From a method, one can use it to access a global variable that has been redeclared locally.
Example
int i = 0;   // (or extern int i;)

class TestC {
public:
    int i;
    TestC() { i = 10; }
    void fun1() {
        printf("In TestC fun1: %d\n", i);
        printf("In TestC fun1, Class: %d\n", TestC::i);
        printf("In TestC fun1, Global:%d\n", ::i);
    }
    void fun2();
};

void TestC::fun2() {
    int i = 20;   // hides both the member i and the global i
    printf("Inside TestC fun2, Local i: %d\n", i);
    printf("Inside TestC fun2, TestC i: %d\n", TestC::i);
    printf("Inside TestC fun2, Global i: %d\n", ::i);
}
Example
#include <stdio.h>
#include "TestC.cpp"

int main() {
    int i = 18;   // local to main; not visible inside TestC
    TestC *test = new TestC();
    test->fun1();
    test->fun2();
    delete test;
}

Output:
In TestC fun1: 10
In TestC fun1, Class: 10
In TestC fun1, Global:0
Inside TestC fun2, Local i: 20
Inside TestC fun2, TestC i: 10
Inside TestC fun2, Global i: 0
Symbol Tables
Programming languages contain declarations and statements.
Declarations are non-executable.
  In fact, they are just compiler handwaving to make programming easier and enforce constraints,
  e.g. type consistency.
A symbol table is a database used by the compiler to maintain information about variables, procedures, etc.
Static Checking
Compiler enforces required declarations, type compatibility, etc. at compile-time
Machine code has no provision for any of this
Special requirements
A database, but with:
Speed: the symbol table is accessed every time an ID or type is referenced.
  The table must be in memory.
Ease of maintenance: the symbol table is the most complex data structure in the compiler.
Flexibility: a language like C does not limit the complexity of variable declarations.
  Must represent variables of arbitrary type.
  Must be able to grow as symbols are added.
Support for duplicate entries: variables with the same name can exist at different nesting levels.
Ability to delete arbitrary elements or groups of elements (e.g. all variables local to a block).
Functions:
Lookup: find the entry for a given name.
Insert: add an entry.
Delete: delete an entry.
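The three operations can be sketched over a plain linked list. This is only an illustration: the Symbol layout and the st_insert, st_lookup and st_delete names are hypothetical, and a real compiler would store many more attributes per entry.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry layout: a name, a nesting level, and a list link. */
typedef struct Symbol {
    char name[32];
    int level;
    struct Symbol *next;
} Symbol;

/* Insert: add an entry at the head, so newer declarations are found first. */
Symbol *st_insert(Symbol **table, const char *name, int level)
{
    Symbol *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    strncpy(s->name, name, sizeof s->name - 1);
    s->name[sizeof s->name - 1] = '\0';
    s->level = level;
    s->next = *table;
    *table = s;
    return s;
}

/* Lookup: find the most recently inserted entry for a given name. */
Symbol *st_lookup(Symbol *table, const char *name)
{
    for (Symbol *s = table; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0) return s;
    return NULL;
}

/* Delete: unlink and free the newest entry for a name; returns 1 on success. */
int st_delete(Symbol **table, const char *name)
{
    for (Symbol **p = table; *p != NULL; p = &(*p)->next) {
        if (strcmp((*p)->name, name) == 0) {
            Symbol *dead = *p;
            *p = dead->next;
            free(dead);
            return 1;
        }
    }
    return 0;
}
```

Inserting at the head means a duplicate name at a deeper nesting level hides the older entry until it is deleted, which is the behaviour the duplicate-entry requirement asks for. Lookup is linear in the number of symbols.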
[Figure: symbol-table stack holding the entries curly, hardy and laurel, with LEVEL 1 marked]
All variables associated with a block can be deleted at one time by adding a constant to the stack pointer
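A minimal sketch of a stack-organized table, assuming a fixed-size array and hypothetical names (StackEntry, stack_insert, stack_lookup, stack_leave_block). Here the stack pointer is an array index, so leaving a block subtracts the number of entries that block added.

```c
#include <string.h>

#define ST_MAX 256

typedef struct {
    const char *name;   /* assumed to outlive the table (e.g. a string literal) */
    int level;
} StackEntry;

static StackEntry st_stack[ST_MAX];
static int st_top = 0;                 /* "stack pointer": next free slot */

/* Push a new entry; returns 0 if the fixed-size table is full. */
int stack_insert(const char *name, int level)
{
    if (st_top == ST_MAX) return 0;
    st_stack[st_top].name = name;
    st_stack[st_top].level = level;
    st_top++;
    return 1;
}

/* Search from the top down so inner declarations hide outer ones. */
int stack_lookup(const char *name)
{
    for (int i = st_top - 1; i >= 0; i--)
        if (strcmp(st_stack[i].name, name) == 0) return st_stack[i].level;
    return -1;                          /* not found */
}

/* Leaving a block: drop its entries in one step by moving the stack pointer. */
void stack_leave_block(int count)
{
    st_top -= count;
}
```

Block deletion costs a single subtraction, but lookup is a linear scan and the array has a fixed capacity.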
Binary tree
Solves the search time and limited size problems.
Average search time (balanced binary tree) is logarithmic.
Tree size can grow dynamically.
Ease of insertion.
Binary tree
Deletion of arbitrary nodes is difficult
But not an issue in ST!
Nodes for a given level are inserted as a block.
Newer levels are deleted before older ones.
The most recently inserted nodes are leaves; if an inner node is at the most recent level, all of its children are at the same level.
All variables in a block can be removed by breaking links without rearranging tree (always at the end of a branch)
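The link-breaking deletion can be sketched as follows. This is an illustration under assumptions: the TreeNode layout and the function names are hypothetical, the tree is an ordinary unbalanced binary search tree, and tree_drop_level is only ever called for the newest level.

```c
#include <stdlib.h>
#include <string.h>

typedef struct TreeNode {
    const char *name;            /* assumed to outlive the tree */
    int level;                   /* nesting level of the declaration */
    struct TreeNode *left, *right;
} TreeNode;

/* New nodes always become leaves, so newer levels hang below older ones. */
TreeNode *tree_insert(TreeNode *root, const char *name, int level)
{
    if (root == NULL) {
        TreeNode *n = malloc(sizeof *n);
        if (n == NULL) return NULL;
        n->name = name;
        n->level = level;
        n->left = n->right = NULL;
        return n;
    }
    if (strcmp(name, root->name) < 0)
        root->left = tree_insert(root->left, name, level);
    else
        root->right = tree_insert(root->right, name, level);
    return root;
}

TreeNode *tree_lookup(TreeNode *root, const char *name)
{
    while (root != NULL) {
        int cmp = strcmp(name, root->name);
        if (cmp == 0) return root;
        root = (cmp < 0) ? root->left : root->right;
    }
    return NULL;
}

static void tree_free(TreeNode *n)
{
    if (n == NULL) return;
    tree_free(n->left);
    tree_free(n->right);
    free(n);
}

/* Delete the newest level: a node at that level heads a subtree made only
   of nodes from the same level, so the link to it can simply be cut. */
TreeNode *tree_drop_level(TreeNode *root, int level)
{
    if (root == NULL) return NULL;
    if (root->level >= level) { tree_free(root); return NULL; }
    root->left = tree_drop_level(root->left, level);
    root->right = tree_drop_level(root->right, level);
    return root;
}
```

No node is moved during deletion. The surviving part of the tree keeps exactly the shape it had before the block was entered.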
Example
[Figure: binary-tree symbol table containing laurel, hardy, moe, curly and larry, with the link to break in order to delete level 3 marked]
Disadvantages of BT
It is common for programmers to declare variables in alphabetical order
Variables are added to the ST in order of declaration.
  The tree degrades to a linked list: search is linear.
Solve at the cost of greater insert and delete time by using a height-balanced (AVL) tree.
  BUT: the shuffling can destroy the order that made deletions easy.
Collisions (duplicate names) are hard to handle.
  Solve at the expense of lookup time.
  Solution: each node has 2 fields, one for the name and one for ...
[Figure: tree nodes moe and curly]
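The degradation to a linked list noted above is easy to observe by measuring tree height. A minimal sketch with hypothetical names (Node, bst_insert, bst_height, height_after_inserts) and an unbalanced binary search tree:

```c
#include <stdlib.h>
#include <string.h>

typedef struct Node {
    const char *name;
    struct Node *left, *right;
} Node;

static Node *bst_insert(Node *root, const char *name)
{
    if (root == NULL) {
        Node *n = malloc(sizeof *n);
        if (n == NULL) return NULL;
        n->name = name;
        n->left = n->right = NULL;
        return n;
    }
    if (strcmp(name, root->name) < 0)
        root->left = bst_insert(root->left, name);
    else
        root->right = bst_insert(root->right, name);
    return root;
}

static int bst_height(const Node *n)
{
    if (n == NULL) return 0;
    int l = bst_height(n->left);
    int r = bst_height(n->right);
    return 1 + (l > r ? l : r);
}

/* Build a tree from names in the given order and report its height. */
int height_after_inserts(const char *const names[], int count)
{
    Node *root = NULL;
    for (int i = 0; i < count; i++)
        root = bst_insert(root, names[i]);
    return bst_height(root);   /* nodes are leaked; acceptable in a short demo */
}
```

With the five names curly, hardy, larry, laurel, moe inserted in alphabetical order the height is 5, one node per level. Inserted in the order larry, hardy, laurel, curly, moe the height is 3.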
Hash Table
Several elements of the uncompressed array are found at a single location in the compressed array.
Convert the key into a pseudo-random number.
  Use this to index the compressed array.
Collisions are resolved by making each array element the head of a linked list of table entries.
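A minimal sketch of such a chained table. The table size of 7, the HashEntry layout and the function names are hypothetical, and the hash simply sums character codes.

```c
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 7   /* hypothetical; a small prime for illustration */

typedef struct HashEntry {
    const char *name;            /* assumed to outlive the table */
    int level;
    struct HashEntry *next;      /* next entry in the same chain */
} HashEntry;

static HashEntry *buckets[TABLE_SIZE];   /* each slot heads a linked list */

/* Sum the character codes, then reduce modulo the table size. */
static unsigned hash_name(const char *name)
{
    unsigned sum = 0;
    for (; *name != '\0'; name++)
        sum += (unsigned char)*name;
    return sum % TABLE_SIZE;
}

/* Insert at the head of the chain so the newest declaration is found first. */
int hash_insert(const char *name, int level)
{
    HashEntry *e = malloc(sizeof *e);
    if (e == NULL) return 0;
    unsigned h = hash_name(name);
    e->name = name;
    e->level = level;
    e->next = buckets[h];
    buckets[h] = e;
    return 1;
}

/* Walk one chain only; returns NULL if the name is absent. */
HashEntry *hash_lookup(const char *name)
{
    for (HashEntry *e = buckets[hash_name(name)]; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0) return e;
    return NULL;
}
```

Because a duplicate name hashes to the same slot and is inserted at the head of its chain, the inner declaration is found before the outer one.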
Hash Function
Most common, simplest:
The string (key) is treated as a number (e.g., add the numeric values for some or all characters in the name).
MOD this number by the table size.
  Ideally, the table size is a prime number.
NAME   NUMERIC VALUE   VALUE MOD TABLE SIZE (3)
a      97              1
b      98              2
c      99              0
d      100             1
We want to get comparable chain lengths in the hash table.
It is difficult to predict chain lengths from a given algorithm because they are determined by the words in the input.
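The additive hash and the resulting chain lengths can be sketched as below, assuming an ASCII character set. The names additive_hash and chain_lengths are hypothetical.

```c
#include <stddef.h>

/* Additive hash: treat the name as a number by summing its character
   codes, then take that number modulo the table size. */
unsigned additive_hash(const char *name, unsigned table_size)
{
    unsigned sum = 0;
    for (; *name != '\0'; name++)
        sum += (unsigned char)*name;
    return sum % table_size;
}

/* Count how many of the given names land in each slot (chain lengths). */
void chain_lengths(const char *const names[], size_t count,
                   unsigned table_size, unsigned lengths[])
{
    for (unsigned i = 0; i < table_size; i++)
        lengths[i] = 0;
    for (size_t i = 0; i < count; i++)
        lengths[additive_hash(names[i], table_size)]++;
}
```

For the names a, b, c, d (ASCII 97 to 100) and a table size of 3, the slots are 1, 2, 0, 1, so the chain lengths are 1, 2, 1. A purely additive hash also sends names made of the same letters (such as ab and ba) to the same slot, which is one way the input words shape the chains.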
Symbol Attributes
Each piece of information associated with a name in a program is called an attribute
Language-dependent
Example: Arrays
Must associate dimensions, upper and lower bound of each dimension
Early Fortran: maximum of 3 dimensions, lower bound always 1.
Most languages: many or unlimited dimensions.
Need a pointer to a list of lower bound / upper bound pairs
A is allocated dynamically on the run-time stack at execution time.
The compiler must generate code to compute N*2 at run time and store it in a temporary variable.
Instead of the upper bound, store a pointer to the ST entry for the temporary variable.
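One possible layout for these array attributes, as a sketch only. The Bound, DimInfo and SymEntry types and both helper functions are hypothetical and are not taken from any particular compiler.

```c
#include <stddef.h>

struct SymEntry;                       /* hypothetical symbol-table entry */

/* A bound is either a compile-time constant or a reference to the
   symbol-table entry of a temporary computed at run time. */
typedef struct {
    int is_constant;
    union {
        long value;                    /* known at compile time */
        const struct SymEntry *temp;   /* temporary holding e.g. N*2 */
    } u;
} Bound;

/* One lower/upper pair per dimension, chained into a list. */
typedef struct DimInfo {
    Bound lower, upper;
    struct DimInfo *next;
} DimInfo;

typedef struct SymEntry {
    const char *name;
    DimInfo *dims;                     /* NULL for non-array symbols */
} SymEntry;

/* Number of dimensions recorded for a symbol. */
int dimension_count(const SymEntry *s)
{
    int n = 0;
    for (const DimInfo *d = s->dims; d != NULL; d = d->next)
        n++;
    return n;
}

/* The size is known at compile time only if every bound is a constant. */
int has_static_bounds(const SymEntry *s)
{
    for (const DimInfo *d = s->dims; d != NULL; d = d->next)
        if (!d->lower.is_constant || !d->upper.is_constant)
            return 0;
    return 1;
}
```

A linked list of pairs places no limit on the number of dimensions, and the tagged union lets the same entry describe both fixed-size and dynamically sized arrays.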