Cs5363 Intro
cs5363
Qing Yi
  Ph.D., Rice University, USA
  Assistant Professor, Department of Computer Science
  Office: SB 4.01.30
  Phone: 458-5671
Research Interests
  Compilers and software development tools: program analysis and optimization for high-performance computing
  Programming languages: type systems, different programming paradigms
  Software engineering: systematic error discovery and verification of software
General Information
Class website
  www.cs.utsa.edu/~qingyi/cs5363
  Check for class handouts and announcements
Textbooks
  Engineering a Compiler, Second Edition. By Keith Cooper and Linda Torczon. Morgan Kaufmann, 2011.
  Programming Language Pragmatics, Second Edition. By Michael Scott. Morgan Kaufmann Publishers, 2006.
Prerequisites
  C/C++/Java programming
  Basic understanding of algorithms and computer architecture
Grading
  Exams (midterm and final): 50%
  Projects: 25%
  Homework: 20%
  Problem solving (challenging problems of the week): 5%
Outline
Compilation vs. interpretation
Functional, imperative, and object-oriented programming
  What are the differences?
Front end (parsing), mid end (optimization), and back end (code generation)
Focus of class
  Language implementation instead of design
  Compilation instead of interpretation
  Algorithms analyzing properties of application programs
  Optimizations that make your code run faster
Programming languages
Express data structures and algorithms
Instruct machines what to do
Communicate between computers and programmers
High-level (human-level) programming languages
    ...
    c = a * a;
    b = c + b;
    ...
Low-level (machine-level) programming languages: better machine efficiency
    ... 00000 01010 11110 01010 ...
[Figure: a program takes program input and produces program output]
Compiler
  Translate programming languages to machine languages
  Translate one programming language to another
  [Figure: at translation (compile) time, the compiler turns source code (c = a * a; b = c + b;) into target code (00000 01010 11110 01010); at run time, the target code reads program input and produces program output]
Interpreter
  [Figure: an interpreter produces the program output directly]
Compilers
  Compiled code can run many times
  Heavy-weight optimizations are affordable
  Can pre-examine programs for errors
  Static analysis has limited capability
  Cannot change programs on the fly
Interpreters
  Re-interpret every expression at run time
  Cannot afford heavy-weight optimizations
  Discover errors only when they occur at run time
  Have full knowledge of program behavior
  Can dynamically change program behavior
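The trade-off above can be sketched in a few lines of Python. This is only an illustration under assumptions: the tuple-based expression tree, the names `interpret` and `compile_expr`, and the use of Python closures as "target code" are all hypothetical and not part of the course material. The interpreter walks the tree on every evaluation, while the compiler translates it once and can afford a small optimization (constant folding) at translation time.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

# Hypothetical tree shape: ("num", 3), ("var", "a"), or (op, left, right).

def interpret(node, env):
    """Walk the tree on every evaluation, as an interpreter would."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return env[node[1]]
    return OPS[kind](interpret(node[1], env), interpret(node[2], env))

def compile_expr(node):
    """Translate the tree once into a closure that can run many times."""
    kind = node[0]
    if kind == "num":
        value = node[1]
        return lambda env: value
    if kind == "var":
        name = node[1]
        return lambda env: env[name]
    op = OPS[kind]
    # Constant folding when both operands are literal numbers:
    # the work is done once, at translation time.
    if node[1][0] == "num" and node[2][0] == "num":
        folded = op(node[1][1], node[2][1])
        return lambda env: folded
    left, right = compile_expr(node[1]), compile_expr(node[2])
    return lambda env: op(left(env), right(env))

# The value of b after "c = a * a; b = c + b;", written as one expression.
tree = ("+", ("*", ("var", "a"), ("var", "a")), ("var", "b"))

run = compile_expr(tree)                       # translation (compile) time
result_compiled = run({"a": 3, "b": 4})        # run time
result_interpreted = interpret(tree, {"a": 3, "b": 4})
```

Both paths compute the same value; the difference is when the tree is examined. `run` can be called repeatedly without revisiting the tree, whereas `interpret` re-inspects every node on each call.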
Programming Paradigms
Functional programming
  Compute new values instead of modifying existing ones (disallow modification of compound data structures)
  Treat functions as first-class objects (can return functions as results, nest functions inside each other)
  Mostly interpreted and used for project prototyping (Lisp, Scheme, ML, Haskell, ...)
Imperative programming
  Emphasize machine efficiency (Fortran, C, Pascal, Algol, ...)
Object-oriented programming
  Combine data and function abstractions
  Separate interface and implementation
  Support subtype polymorphism and inheritance
  Simula, C++, Java, Smalltalk, ...
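As a rough illustration of the three paradigms, the sketch below computes a sum of squares in a functional style and an imperative style, then expresses a similar computation with an object-oriented interface. All names (`sum_squares_functional`, `Shape`, `Square`, `total_area`) are hypothetical examples, and Python is used only because it supports all three styles.

```python
from functools import reduce

def sum_squares_functional(values):
    # Functional style: build new values and pass functions as arguments;
    # nothing is modified in place.
    return reduce(lambda acc, x: acc + x, map(lambda x: x * x, values), 0)

def sum_squares_imperative(values):
    # Imperative style: update a variable step by step.
    total = 0
    for x in values:
        total += x * x
    return total

class Shape:
    # Object-oriented style: the interface, separate from any implementation.
    def area(self):
        raise NotImplementedError

class Square(Shape):
    # One implementation of the interface, bundling data with its functions.
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side

def total_area(shapes):
    # Works for any subtype of Shape (subtype polymorphism).
    return sum(shape.area() for shape in shapes)
```

All three give the same number for matching inputs, for example `sum_squares_functional([1, 2, 3])` and `total_area([Square(1), Square(2), Square(3)])`.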
Fortran
  Led by John Backus around 1954-1956
  Designed for numerical computations
  Introduced variables, arrays, and subroutines
Lisp
  Led by John McCarthy in the late 1950s
  Designed for symbolic computation in artificial intelligence
  Introduced higher-order functions and garbage collection
  Descendants include Scheme, ML, Haskell, ...
Algol
  Led by a committee of designers of Fortran and Lisp in the late 1950s
  Introduced type system and data structuring
  Descendants include Pascal, Modula, C, C++
Simula
  Led by Kristen Nygaard and Ole-Johan Dahl around 1961-1967
  Designed for simulation
  Introduced data abstraction and object-oriented design
  Descendants include C++, Java, Smalltalk
Categorizing Languages
Are these languages compiled or interpreted (sometimes both)?
What paradigms do they belong to?
Objectives of Compilers
Correctness: compilers must preserve semantics of the input program
Usefulness: compilers must do something useful to the input program
  Compare with software testing tools, which must be useful but not necessarily sound
Does the compiled code run with high speed?
Does the compiled code fit in a compact space?
Does the compiler provide feedback on incorrect programs?
Does the compiler allow debugging of incorrect programs?
Does the compiler finish translation with reasonable speed?
Are they sound? Do they produce useful results?
How fast do they run? How fast is the generated code?
Front end
Optimizer (mid end): improve the input program
  Data-flow analysis, redundancy elimination, computation re-structuring
Back end: generate output in a new language
  Native compilers: executable for target machine
  Instruction selection and scheduling, register allocation
Target program
Intermediate Representation
Source program
    for (w = 1; w < 100; w = w * 2);
Parsing: convert input tokens to IR
Abstract syntax tree: structure of program
    forStmt
        assign
        less
        assign
        emptyStmt
The lexical analyzer (characters -> tokens)
The syntax analyzer (tokens -> AST)
Context-sensitive analysis (AST -> symbol tables)
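The first two front-end steps can be sketched for the slide's `for` statement. This is a minimal illustration, not the course's implementation: the token kinds, the `Parser` class, and the deliberately tiny grammar (one optional `<` or `*` per expression) are assumptions made for the example. The node names `forStmt`, `assign`, `less`, and `emptyStmt` follow the tree above; `mul` is a hypothetical name for the multiplication node.

```python
import re

# A number, an identifier, or any other single character.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")

def tokenize(text):
    """Lexical analysis: characters -> tokens, as (kind, text) pairs."""
    tokens = []
    for number, name, symbol in TOKEN_RE.findall(text):
        if number:
            tokens.append(("num", number))
        elif name:
            tokens.append(("kw" if name == "for" else "id", name))
        elif symbol.strip():              # skip stray whitespace
            tokens.append(("sym", symbol))
    return tokens

class Parser:
    """Syntax analysis: tokens -> AST, by recursive descent."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("eof", "")

    def expect(self, text):
        kind, value = self.peek()
        if value != text:
            raise SyntaxError(f"expected {text!r}, found {value!r}")
        self.pos += 1

    def atom(self):
        kind, value = self.peek()
        if kind not in ("num", "id"):
            raise SyntaxError(f"unexpected token {value!r}")
        self.pos += 1
        return (kind, value)

    def expr(self):
        # expr ::= atom [ ("<" | "*") atom ]
        left = self.atom()
        kind, value = self.peek()
        if value in ("<", "*"):
            self.pos += 1
            return ("less" if value == "<" else "mul", left, self.atom())
        return left

    def assign(self):
        kind, name = self.peek()
        if kind != "id":
            raise SyntaxError(f"cannot assign to {name!r}")
        self.pos += 1
        self.expect("=")
        return ("assign", ("id", name), self.expr())

    def for_stmt(self):
        self.expect("for")
        self.expect("(")
        init = self.assign()
        self.expect(";")
        cond = self.expr()
        self.expect(";")
        step = self.assign()
        self.expect(")")
        self.expect(";")                  # the empty loop body
        return ("forStmt", init, cond, step, ("emptyStmt",))

source = "for (w = 1; w < 100; w = w * 2);"
tree = Parser(tokenize(source)).for_stmt()
```

The children of the resulting `forStmt` node appear in the same order as in the tree above: `assign`, `less`, `assign`, `emptyStmt`.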
Data-flow analysis: where data are defined and used
Dependence analysis: when operations can be reordered
Useful for program understanding and verification
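For straight-line code, "where data are defined and used" reduces to linking each use of a variable to the most recent assignment to it. The sketch below assumes a hypothetical instruction format of `(target, operands)` pairs and a hypothetical function name `def_use_chains`; real data-flow analysis also has to handle branches and loops, which this example does not.

```python
def def_use_chains(code):
    """Link every use of a variable to the statement that last defined it.

    `code` is a list of (target, operands) pairs for straight-line code.
    Returns {index_of_definition: [indexes_of_uses]}. Uses with no earlier
    definition (program inputs) are grouped under the key None.
    """
    last_def = {}     # variable name -> index of its latest definition
    chains = {}
    for index, (target, operands) in enumerate(code):
        for name in operands:
            chains.setdefault(last_def.get(name), []).append(index)
        last_def[target] = index
        chains.setdefault(index, [])
    return chains

code = [
    ("c", ["a", "a"]),   # 0: c = a * a
    ("b", ["c", "b"]),   # 1: b = c + b
    ("c", ["b", "c"]),   # 2: c = b * c   (hypothetical extra statement)
]
chains = def_use_chains(code)
```

Here statement 2 uses values defined by statements 0 and 1, so it depends on both and cannot be moved ahead of either, which is the kind of fact dependence analysis relies on.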
Redundancy elimination
Improve data movement and instruction parallelism
In program evolution, improve program modularity/correctness
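One simple form of redundancy elimination is local common-subexpression elimination: within straight-line code, an expression that was already computed and whose operands have not changed is replaced by a copy. The sketch below is an illustration under assumptions; the `(target, op, left, right)` instruction format, the `"copy"` pseudo-operation, and the function name are hypothetical, and commutativity (`a + b` versus `b + a`) is deliberately ignored.

```python
def eliminate_common_subexpressions(code):
    """Local redundancy elimination over straight-line three-address code.

    Each instruction is (target, op, left, right). A repeated
    (op, left, right) whose operands and earlier result are unchanged
    becomes (target, "copy", earlier_result, None).
    """
    available = {}    # (op, left, right) -> variable currently holding it
    result = []
    for target, op, left, right in code:
        key = (op, left, right)
        if key in available:
            result.append((target, "copy", available[key], None))
        else:
            result.append((target, op, left, right))
        # Assigning to target invalidates expressions that read it
        # and expressions whose value was stored in it.
        available = {k: v for k, v in available.items()
                     if target not in (k[1], k[2]) and v != target}
        # The target now holds the expression, unless it overwrote an operand.
        if target not in (left, right):
            available.setdefault(key, target)
    return result

code = [
    ("t1", "*", "a", "a"),
    ("t2", "*", "a", "a"),    # redundant: same as t1
    ("a", "+", "t1", "t2"),   # changes a, so a * a is no longer available
    ("t3", "*", "a", "a"),    # must be recomputed
]
optimized = eliminate_common_subexpressions(code)
```

Only the second instruction is rewritten. The fourth looks identical to the first but is kept, because `a` was reassigned in between.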
Memory management
  Every variable must be allocated a memory location
  Address stored in symbol tables during translation
Instruction selection
  Assembly language of the target machine
  Abstract assembly (three/two address code)
Register allocation
  Most instructions must operate on registers
  Values in registers are faster to access
Instruction scheduling
  Reorder instructions to enhance parallelism/pipelining in processors
Source-to-source translation
  Program understanding: output analysis results
  Code generation/evolution/optimization: output in high-level languages
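Two of the back-end steps above can be sketched briefly: flattening a nested expression into three-address code, and giving each variable a memory offset in a symbol table. Everything here is an assumption made for illustration: the tuple-based expression format, the temporary names `t1`, `t2`, ..., the function names, and the 4-byte word size. A real back end would go on to map these instructions to target-machine instructions and registers.

```python
import itertools

def to_three_address(target, expr):
    """Flatten a nested expression into three-address instructions.

    `expr` is a variable name (str), a number (int), or (op, left, right).
    Returns a list of strings such as "t1 = a * a".
    """
    temps = (f"t{n}" for n in itertools.count(1))
    code = []

    def emit(node):
        if not isinstance(node, tuple):
            return str(node)              # a leaf is already an operand
        op, left, right = node
        left_name, right_name = emit(left), emit(right)
        temp = next(temps)                # fresh temporary for this result
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    value = emit(expr)
    code.append(f"{target} = {value}")
    return code

def allocate(variables, word_size=4):
    """Symbol table sketch: give each variable its own memory offset."""
    return {name: index * word_size for index, name in enumerate(variables)}

# b = (a * a) + b, i.e. "c = a * a; b = c + b;" with c folded into one expression
instructions = to_three_address("b", ("+", ("*", "a", "a"), "b"))
symbol_table = allocate(["a", "b", "c"])
```

Each emitted instruction has at most one operator and two operands, which is what makes the later steps (instruction selection, register allocation, scheduling) tractable to do one instruction at a time.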
Roadmap
Regular expression and context-free grammar (wk1), NFA and DFA (wk2), top-down and bottom-up parsing (wk3), attribute grammar and type checking (wk4)
Intermediate representation (wk5), procedural abstraction and code shape (wk6-7), instruction selection (wk8)
Redundancy elimination (wk9), data-flow analysis and SSA (wk10), scalar optimizations (wk11), instruction scheduling (wk12), register allocation (wk13)
The class project needs to parse input in a small language, perform type checking, perform some analysis/optimization, then output the result
Intermediate projects are due by week 4, week 9, and week 11 respectively (dates will be posted on the class website)
Implementation choices:
  Understanding of concepts/algorithms: smaller projects in scripting languages
  Enjoys programming and debugging: larger projects in C/C++/Java