
COMPUTER SCIENCE NOTES

This PDF contains notes I, Norbert Ochieng, have taken while studying. These notes are
shared here in the hope that they may be useful to others, but I do not guarantee their
accuracy, completeness, or whether they are fully up-to-date.

SEMESTER 1 PART A
CHAPTER 1:
Principles of Programming
Introduction to Scheme
Scheme uses a Read-Eval-Print loop (REPL): each expression typed at the prompt is read, evaluated, and its value printed.

In Scheme, numbers are expressions known as primitive expressions. Other expressions are
compound (or combination) expressions.

See also: SICP 1.1.1

Compound expressions always have the form: (operator operand1 operand2 ... operandn)

Scheme uses prefix notation, for example: (+ 7 6) gives 13. Expressions can also be
nested: 2 + 3 * 4 is equivalent to (+ 2 (* 3 4)) in Scheme.

The value of a compound expression is the result of applying the procedure specified by the
operator to the operands.

To avoid typing out long expressions multiple times when applying different procedures to
them, you can use definitions, for example: (define name value). This

 Creates a new variable - allocates a memory location where values are stored.
 Gives the location a name and links the name to that variable.
 Puts the value in that location.
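A short illustrative session (the names and values here are invented for illustration, not taken
from the course):

(define radius 5)        ; creates a variable called radius holding 5
(define pi 3.14159)
(* 2 pi radius)          ; evaluates to 31.4159, using both variables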

Variables are also primitive expressions; the value bound to a variable can be any kind of value, including a procedure.

Scheme stores the variables in a table of names and values called the environment.

SICP 1.1.3, 2.3.1

Evaluation is recursive. To evaluate a compound expression:

1. Evaluate each sub-expression
2. Apply the value of the first sub-expression (the operator) to the values of the remaining
sub-expressions (the operands).

The value of the first sub-expression must be a procedure.

Procedures and Abstraction

SICP p62

Procedures are defined using lambda. The value of (lambda) is the procedure, which when
applied to a list of operand values evaluates body with substitutions for the corresponding
parameters.

(lambda (param1 param2 ... paramn) body)

The params are zero or more parameters. body is an expression which refers to the
parameters.

An alternative is: (define (name param1 param2 ... paramn) body).
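As an illustration (square is an invented example name), these two definitions are equivalent:

(define square (lambda (x) (* x x)))   ; explicit lambda, then bound to a name
(define (square x) (* x x))            ; the shorthand form
(square 4)                             ; 16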

SICP 1.1.4

There are many different ways of defining equivalent procedures, so what is the best way to
do so? Using abstraction.

Abstraction is the generalisation of a method, e.g., defining double and then using that rather
than (* 2 6). Abstraction is our fundamental design methodology.

What Happens During Evaluation?

Take for example this code:

(define (double x) (* 2 x))


(define (triple x) (+ x (double x)))

SICP p13/14

Scheme uses the substitution model for evaluation, so evaluating some simple code goes like
this:

(triple 5)

(+ 5 (double 5))
(+ 5 (* 2 5))
(+ 5 10)

15

Choice

either/or

SICP 1.1.6

You can do either/or choices like so: (if expression then else).

expression is evaluated to true or false.


then is the value of the if expression if the expression is true
else is the value of the if expression if the expression is false.

#f is false, and #t is true, however anything that isn't #f is considered to be true.

To add more power to choice, you can use logical composition operations.

(and expr1 expr2 ... exprn)

This is evaluated left to right until one expression is false or all are true. Unlike a standard
Scheme procedure application, not all of the operands are necessarily evaluated (and is a special form).

(or expr1 expr2 ... exprn)

This also evaluates left to right, until one is true or all are false.

(not expr)

This returns the opposite value to expr.

One of several

SICP p17

(cond (pred1 expr1) (pred2 expr2) ... (predn exprn) )

The value of (cond) is the value of the first expression whose predicate evaluates to true. A
final clause whose predicate is simply #t always matches, which makes it a useful "catch-all"
case.
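A small illustrative sketch (sign-of is an invented name) showing the #t catch-all clause:

(define (sign-of n)
  (cond ((< n 0) 'negative)
        ((= n 0) 'zero)
        (#t 'positive)))

(sign-of -3)    ; negative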

Evaluation

SICP ex1.5

Applicative vs Normal Order

Normal order does not evaluate the operands until their values are needed: it substitutes
operand expressions for parameters until it obtains an expression involving only primitive
operators, and then evaluates it. Applicative order evaluates the operands first and then
applies the procedure to the resulting values (this is what Scheme uses).

Recursive Definition

Recursive definitions all follow the same general pattern: a base case (or cases), where the value
does not refer to the procedure, and a recursive case, where the value uses the procedure itself.
Which case applies is determined by the values of the parameters.
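For example, factorial (which reappears later in these notes) follows exactly this pattern; this is
a standard sketch rather than the exact code from the course:

(define (factorial n)
  (if (= n 1)
      1                              ; base case: no reference to factorial
      (* n (factorial (- n 1)))))    ; recursive case: uses the procedure itself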

(sequence) allows you to chain together multiple procedures, and the value is the value of the
last expression.

(print) is a special form with a side-effect. It prints a value to the terminal but does not affect
the environment in any way.

Compound Data

            Procedures                                  Data
primitive   + - ... (built in)                          numbers, symbols, booleans (quote)
compound    (lambda), built from primitive and          ?
            other compound procedures

As the ? suggests, you can have compound data, such as lists.

Lists

SICP 2.2.1, p101 (footnote 10)

There are 2 forms of lists: empty and non-empty. A non-empty list is a value followed by
another list. The empty list is the value nil.

Constructors

(cons) allows you to create lists, like so: (cons expr list). This creates a list consisting of the
value of expr followed by list. Lists can also have other lists as values (sublists)

Selectors

(car list) returns the first element (head) of a list.

(cdr list) returns the list without the first element (tail).

There are also shortcuts, such as (cadr list), which is equivalent to (car (cdr list)). There are
also (caddr list), (cadar list), etc.

There are also some tests (predicates) which can be applied to lists. (null? expr) is true if
expr is nil. (list? expr) is true if expr is a list.

Useful list procedures

SICP p100

(list expr1 expr2 ... exprn)

Creates a list containing the values of the expressions

SICP p103

(append list1 list2)

Creates a list containing all the elements of list1, followed by all of the elements of list2

SICP p102

(length list)

This is the number of elements in the list.
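Some illustrative evaluations of these procedures (the values are invented):

(cons 1 '(2 3))          ; (1 2 3)
(car '(1 2 3))           ; 1
(cdr '(1 2 3))           ; (2 3)
(cadr '(1 2 3))          ; 2
(list 1 2 3)             ; (1 2 3)
(append '(1 2) '(3 4))   ; (1 2 3 4)
(length '(1 2 3 4))      ; 4
(null? '())              ; #t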

Local Procedure Definitions

The body of a procedure definition can take the form:

(define ...) (define ...) (expr)

The definitions bind names to values. Then expr is evaluated in a local environment in which
these bindings are valid.

SICP 1.1.8

expr is said to be in scope of these bindings.

(let ((name value) (name value) ...) expr)

The value of the (let) expression is the value of expr in the environment in which each name
corresponds to a value. This avoids evaluating the same code repeatedly.

SICP p63-66

N.B. expr is any expression, including another (let). (let)s can be nested.
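A small sketch (names invented) where let avoids computing the same value twice:

(define (cylinder-volume r h)
  (let ((base (* 3.14159 r r)))   ; base area evaluated once
    (* base h)))

(cylinder-volume 2 10)            ; approximately 125.66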

Computational Processes
A process is represented by a sequence of operations that are generated by a procedure when
it is evaluated.

The state of the computation is stored on the stack. The stack space required for our factorial
program is proportional to n, where n is the number whose factorial is being computed. We say
this is of order n, or O(n). The time complexity is also O(n).

SICP 1.2.3
ADS

O(something) resource means roughly: "resource is used up no faster than something as the
problem size increases"

Factorial Iteratively

To do factorial iteratively, we need to keep a note of the answer so far and a counter which
goes up from 1. These get passed as parameters in our recursion. When counter > n, the final
answer is the same as the answer so far. This has a space complexity of O(1). Any procedure
that has a space complexity of O(1) is called iterative.
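A sketch of this (parameter names invented); the recursive call to iter is in tail position, so the
process it generates is iterative:

(define (factorial n)
  (define (iter counter answer)
    (if (> counter n)
        answer                                    ; counter > n: answer so far is the result
        (iter (+ counter 1) (* counter answer))))
  (iter 1 1))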

Tail Recursion

SICP p35

A procedure is tail-recursive if the recursive call is not a sub-expression of some other
expression in the procedure body. Informally, something is tail recursive if the recursive call is
the last thing done by the procedure. Tail recursive procedures generate iterative processes
(O(1) space complexity).

Complexity

SICP 1.2.2

              Time       Space
Recursive     O(k^n)     O(k^n)    ("exponential")
Iterative     O(n)       O(1)

Abstraction

See: Dictionary

Abstract (adjective)

1. Separated from matter, practice or particular examples.


2. Ideal or theoretical way of regarding things

Abstract (verb)

remove, summarise

Abstraction (noun)

Process of stripping an idea from its concrete accompaniment

First class elements

SICP 1.3.1-1.3.3, p58

An element is a first class element in a language if it can be:

 named by variables
 passed as arguments to procedures
 returned as the results of procedures
 included in data structures

Abstract Data Types


ADTs are defined by a set of interface procedures.

 Constructors - make instances of data


 Selectors - returns value of the components of the data
 Type predicates - test whether something is an instance of the data type

For example, lists are a built in ADT in Scheme. You have constructors (cons, nil), selectors
(car, cdr) and type predicates (list?, null?) .

Using this type of abstraction allows you to split the load of creating a system.
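As an illustration (not from the notes), a tiny point ADT defined purely through interface
procedures, represented here using pairs:

(define (make-point x y) (cons x y))   ; constructor
(define (point-x p) (car p))           ; selectors
(define (point-y p) (cdr p))
(define (point? p) (pair? p))          ; (crude) type predicate

(point-x (make-point 3 4))             ; 3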

Types
A type is a set of values (and operations, procedures, etc - e.g., eq? a b). x :: t means x is of
type t. e.g., in Scheme:

numbers 42 10.5 -7 1e-4


symbols 'h 'thing
booleans #t #f
pairs (2 . 3) (cons 2 3)
procedures #<closure ... >
strings "hello world"
nil '()
unspecified #<unspecified>
characters

The union of all of these types could be called Scheme values.

Type Constructors

Product: A x B is the type of a pair of values, the first of type A and second of type B.
Union: A + B is the type of a value of type A or of type B.
Function: A -> B is the type of a procedure which takes a value of type A and returns a value
of type B.

(+ a b)

(Number x Number) --> Number

(sigma f a b)

sigma :: (number --> number) x number x number --> number

(combine-with operator identity f a b)

combine-with :: ((number x number) --> number) x number x (number --> number) x number
x number --> number

ax^2 + bx + c

quadratic :: number x number x number --> (number --> number)
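A sketch matching the quadratic signature above: given the three coefficients it returns a
procedure of one number (the body here is illustrative):

(define (quadratic a b c)
  (lambda (x) (+ (* a x x) (* b x) c)))

((quadratic 1 2 3) 10)   ; 1*100 + 2*10 + 3 = 123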

Type Checking

Static

 All types checked before program is run (compile time)


 Most languages do this
 This is safer and has no runtime overhead

Dynamic

 Types checked when an expression is evaluated (at run-time)


 Scheme does this
 This is more flexible

Functional languages (Haskell, ML, etc) exist that combine both: they have very powerful
static type checking while keeping much of the flexibility, giving the best of both worlds.

Representing Objects
Assignment

SICP 3.1

(set! name expression)

Changes the value bound to a name to the value of an expression. The name must already be
defined and exist in the environment in which set! is evaluated.
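For example (count is an invented name):

(define count 0)
(set! count (+ count 1))
count    ; 1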

The Substitution Model

SICP 1.1.5

To evaluate a compound procedure, evaluate the body of the procedure with each formal
parameter replaced by the value of the corresponding argument.

SICP 3.2

There are problems with using global variables to store state: there may be multiple instances of
items, and global variables decrease modularity. To work around this, you can encapsulate
state within an object. An object must have internal state and be active. You tell the object
what to do with the state rather than doing it directly.

In Scheme, objects are represented by procedures with local state variables.

There are 4 programming paradigms:

 functional
 imperative
 logic
 object-orientated (sometimes called message-passing)

The Environment Model of Evaluation

The substitution model doesn't work when assignment is involved.

An environment is made of frames, where frames are tables of bindings (names and values)
and a pointer to the enclosing environment. In a frame, a variable can be bound, unbound, the
first binding or a "shadow" binding (overriding any earlier bindings).

A procedure is code plus a pointer to an environment. This is the environment in which the
lambda expression defining the procedure was evaluated.

e.g., "blood-donor" is an object which has state (quantity of blood) and responds to "drain"
requests. A constructor of blood donors takes a quantity of blood and returns a blood donor
object. This object is itself a procedure which responds to requests (drain, etc) and then
returns the procedure to do that event, which itself may take multiple arguments.

An object needs to be defined with initial states, the object procedure and request handling
procedures. The object constructor should return the object.
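A minimal sketch of such a constructor, assuming a single drain request that takes an amount;
the names and details are illustrative, not the exact course code:

(define (make-blood-donor initial-quantity)
  (define quantity initial-quantity)            ; internal state
  (define (drain amount)                        ; request handler
    (set! quantity (- quantity amount))
    quantity)
  (define (dispatch request)                    ; the object: a procedure with local state
    (cond ((eq? request 'drain) drain)
          (#t 'unknown-request)))
  dispatch)

(define donor (make-blood-donor 10))
((donor 'drain) 3)    ; 7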

Mutators

              Primitive Data       Compound Data
Creation      1, 'x, #f            Constructors: cons
                                   Selectors: car, cdr
Assignment    set!                 Mutators: set-car!, set-cdr!

Pointers
Box and Pointer Diagrams

SICP 2.2, 2.3

(cons x y) creates a new pair

The contents of the pair are pointers.

A list terminator can be represented as follows:

e.g., '(1 2 3) is the short form of (cons 1 (cons 2 (cons 3 nil)))

Equality

SICP p257-258

=, two numerical values are the same


eq?, two pointers are the same
equal?, two structures are the same (here, it is the representation of the structures that are
compared)

Say we create two structures: (define x (list 'a 'b)) and (define z1 (cons x x)).

We can now do (set-car! (cdr z1) 'boom), which changes the 'a in the structure to 'boom. This now
makes z1 (('boom 'b) 'boom 'b) and x is now ('boom 'b). However, if we instead (define z2
(cons (list 'a 'b) (list 'a 'b))), z2 is equal? to the original z1 (though not eq?), but (set-car! (cdr z2)
'boom) gives us (('a 'b) 'boom 'b): only one half changes, because the car and cdr of z2 are
distinct lists.

Association List

An association list is a list of records, each of which consists of a key and a value. Each
record is a pair formed by (cons key value).

e.g., ((answer . 23) (y . #t) (z . "Hello")) associates answer with 23, y with #t and z with
"Hello".

The procedure (retrieve key associations) returns the first pair in associations which has key
in its car or returns nil if there is no such record.
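A sketch of retrieve matching the description above (returning nil as '() when there is no such
record; the course's own version may differ in detail):

(define (retrieve key associations)
  (cond ((null? associations) '())                        ; no record found
        ((equal? key (car (car associations)))            ; key matches this record's car
         (car associations))
        (#t (retrieve key (cdr associations)))))

(retrieve 'answer '((answer . 23) (y . #t) (z . "Hello")))   ; (answer . 23)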

Tables

SICP 3.3.3

A one-dimensional table can be created as an association list with a header pair.

To add a new element, set the cdr of the new list element to the pointer from the header pair
and then change the header pair pointer to point at the new element.

CHAPTER 2:
Introduction to Digital Circuit Design
Digital Circuits
 Constructed from discrete state components
 Inputs and outputs can only have two possible states
 They are called logic elements

Logic states can be referred to as: 1 and 0; True and False; On and Off. All are equivalent to
each other, but we tend to use 1 and 0 in this strand.

Physical Representation of States

Logic states are electrically represented by 2 voltage levels. For TTL, these voltage levels are
approximately 5V and 0V.

There are two representation conventions: positive logic and negative logic.

In positive logic, 5V is logic 1 and 0V is logic 0. In negative logic, the inverse is true; 5V is
logic 0 and 0V is logic 1. In this strand, we tend to use the positive logic convention.
Voltages are with respect to earth. High is considered to be logic 1 and low is logic 0.

Simple Gates

AND

out = A.B

A B out
L L L
L H L
H L L
H H H

OR

out = A + B

A B out
L L L
L H H
H L H
H H H

NOT

out = A' (A' denotes the complement of A)

A out
L H
H L

Three-input gates

Three-input gates do exist; they are basically two 2-input gates chained together.

NAND

out = (A.B)'

A B out
L L H
L H H
H L H
H H L

NOR

out = (A + B)' (note this is not the same as out = A' + B')

A B out
L L H
L H L
H L L
H H L

XOR

out = A'.B + A.B'

A B out
L L L
L H H
H L H
H H L

Drawing Conventions
Normally, inputs are on the top and left of a piece of paper, and outputs are on the bottom and
right.

In the diagrams, crossing wires without a dot at the junction are unconnected; wires joined with
a dot are connected.

A combinatorial circuit is one whose outputs are entirely dependent on the current state of the
inputs. All gates also act as buffers.

Dual in-line packages


These hold many types of gate and are connected via pins on either side. There are also 2
special pins, Vcc and GND, which provide the power for the chip. Plastic DIPs tolerate less heat
but are cheap. Ceramic DIPs are more expensive but cope better with heat. They are also
more durable.

Boolean Algebraic Manipulation

Logical Operations on Constants

NOT

1' = 0
0' = 1

AND

0.0 = 0
0.1 = 0
1.0 = 0
1.1 = 1

OR

0+0=0
0+1=1
1+0=1
1+1=1

Logical Operations on Variables

AND

A.0 = 0
A.1 = A

OR

A+0=A
A+1=1

NOT

A.A' = 0
A + A' = 1

All TTLs float high by default.

Commutation

A+B=B+A
A.B = B.A

Association

A + B + C = A + (B + C) = (A + B) + C
A.B.C = A.(B.C) = (A.B).C

de Morgan's Theorem

(A + B)' = A'.B'
(A.B)' = A' + B'
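As an illustrative cross-check (using the Scheme of Chapter 1; not part of the original notes),
both laws can be verified for every input combination:

(define (de-morgan-holds? a b)
  (and (eq? (not (or a b)) (and (not a) (not b)))
       (eq? (not (and a b)) (or (not a) (not b)))))

(and (de-morgan-holds? #t #t) (de-morgan-holds? #t #f)
     (de-morgan-holds? #f #t) (de-morgan-holds? #f #f))   ; #t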

Some Logical Manipulation

A.(B + C)
= A.B + A.C

(A + B).(C + D)
= A.C + B.C + B.D + A.D

A.B + A.B'
= A.(B + B')
= A

A.(A + B)
= A.A + A.B
= A + A.B
= A.(1 + B)
= A

XOR

C = A'.B + A.B'

De Morgan's Law

(Y.Z)' = Y' + Z'

Terminology
Product Term

A single variable or the logical product of several variables, e.g., A, X, A.B.C. This is basically
the AND function. Note that (A.B.C)' is not a product term.

Sum Term

A sum term is a single variable or the logical sum of several variables. The variables may be
in true, or complemented, form. E.g., A + B + C, etc... This is the OR function. Note that here
also, (A + B + C)' is not a sum term.

Sum-of-products

This is product terms added together, e.g., A.B.C + Q.R.S + A.Q. Note that something like
(A.B.C)' + X.Y.C is not in valid sum-of-products form, since a complemented product is not a
product term.

Product-of-sums

This is several sum terms multiplied together, e.g., (A + B + C).(X + H + J). Note that
(A + B + C)'.(C + D) is not in product-of-sums form, since a complemented sum is not a sum term.

Canonical Forms

The first canonical form

If each variable, in either true or complemented form, appears in each product term, then the
expression is known as the canonical sum-of-products and each term is a minterm.

The second canonical form

This is similar to the first canonical form, but for a product-of-sums expression: each variable
appears in each sum term, and each term becomes a maxterm.

Minterm

A minterm is a product term which contains each variable, in either true or complemented form.
When used in the canonical sum-of-products, a minterm represents an input condition that causes
the output to be 1.

Maxterm

A sum term which contains each variable, in either true or complemented form. When used in the
canonical product-of-sums, a maxterm represents an input condition which causes the
function to be 0.

Karnaugh Map
This is based on boolean algebra and is another method of minimisation.

C\AB 00 01 11 10
0 0 0 0 0
1 1 0 1 0

The order of bits on the top row is important. Only one bit can change between columns.

This is essentially a re-arranged truth table. Variables which appear together horizontally or
vertically are logically adjacent.

If there are 2^n maxterms, n is the number of rows that can be looped. The number of grouped
minterms must be a power of 2.

 Make loops as big as possible


 Choose fewest loops possible
 Include all minterms

2^n minterms are logically adjacent if there are n bits changed.

Five and six variable Karnaugh maps can occur - these are represented in 3 dimensions.

CD\AB 00 01 11 10
00 1 1
01
11
10 1 1

CD\AB 00 01 11 10
00 1 1
01
11
10 1 1 1

The map for E is on top of the map for E'.

The same applies to a six-variable Karnaugh Map, which looks like this:

Maxterms can be looped up in a similar way to minterms, but are inverted.

Prime Implicants

These are the biggest adjacent terms which can be looped together. Single isolated implicants
are also prime implicants.

Essential Prime Implicants

This is a prime implicant that contains a minterm not included in any other prime implicants.
Isolated minterms are also essential prime implicants. An essential prime implicant must be
included in the final expression.

Quine-McCluskey Minimisation
1. Find all logically adjacent minterms to produce implicants - Tabulate all the
minterms from the expressions and re-order them so that all the minterms without
any 1's are together, the minterms with one 1 are together, etc. Then you need to
write down pairs of logically adjacent minterms, these will give you the
implicants. Replace the bits that make them logically adjacent with '-'
2. Find all logically adjacent implicants to produce prime implicants. Repeat for all
possible prime implicants - Find all logically adjacent implicants from the last
step using the same process. Repeat until you have all adjacencies.
3. Use a prime implicant table to determine essential prime implicants - From the
previous step, any implicants that can not be reduced any further are prime.
4. Select the minimum number of additional prime implicants to produce minimum
expression - Plot a table of the prime implicants against the original minterms.
Columns that only have one tick are essential.
5. Choose best expression based on implementation issues

Quine-McCluskey is algorithmic. It is tedious and error prone when done by hand. However,
it can be automated, and is guaranteed to find the set of minimal solutions. It works for
maxterms as well as minterms.

Hardware Realisation
IC realisation requires minimising the number of gates. PCB realisation involves minimising
the number of packages (therefore minimising the number of gates and gate types).

 AND-OR realisations, this is a minimised sum-of-products
 OR-AND realisations, this is a minimised product-of-sums.
 All NAND realisations. In an AND-OR realisation, replace all gates with NAND
gates, then any input that goes directly to the second stage of gates (OR) needs to
be inverted (which can be done using NAND gates).
 All NOR realisations. Replace all gates in an OR-AND realisation with NOR
gates, and any input that goes directly to the second stage of gates (AND) needs to be
inverted. This can be done using a NOR gate.

In negative logic, an AND gate is an OR gate and an OR is an AND. When a NAND gate has
positive logic inputs and negative logic outputs, it behaves like an AND. Similarly, if a NAND
gate has negative logic on its inputs and positive logic on its output, it behaves like an OR.

Map-Factoring (Inhibiting Functions)

This can be used when you nearly have a minterm, but there is one maxterm blocking it. If
you realise a design assuming that the maxterm is indeed a minterm (and you then have a
prime implicant) you can then add in an inhibiting function which stops the circuit being true
unless the maxterm is false, making your assumption work as intended.

However, this increases gate delay, but does tend to lead to fewer gates or packages being
used. It can also make it less obvious what a design is for.

Designing multiple-output circuits

You need to treat each output as if it were being generated by an independent circuit and look
for common terms.

Don't Cares

Don't cares are input conditions that will never occur under normal operations and are
marked as output X. You can treat a don't care as either a max-term or a min-term, whichever
is more convenient for you. So, for example, you can loop a don't care with minterms or
maxterms to create a more minimised expression. Don't cares by themselves are not looped
however.

You need to take care, however, in case a don't care term does occur (for example during the
initialisation of flip-flops, etc).

Design Considerations

Which hardware implementation to use? PCB, IC, PLD? Which device technology? TTL,
CMOS transistors, ECL? Hardware environment? Temperature, radiation, pressure,
vibrations, etc...

You need to minimise gates and packages, the gate layers (circuit delay), the number of
interconnects between gates and between packages, maintenance costs, power consumption,
weight, design costs, production costs, hazardous behaviour.

Design Steps

Check each step!

1. Obtain requirements - an imprecise statement of objectives


2. Map requirements into a formal specification - truth table, etc
3. Design the circuit - use minimisation
4. Realise the circuit - consider any further minimisation
5. Analyse the circuit - either by hand or on computer. Allow for production
and environmental consideration.
6. Prototype the circuit - check it in the lab under the full range of conditions
7. Test

Electrical Considerations

Logic Families Standard spec (e.g.,) Military spec (e.g.,)
Old Standard SN7400 SN5400
High Speed SN74H00 SN54H00
Low Power SN74L00 SN54L00
Schottky SN74S00 SN54S00
Low Power Schottky SN74LS00 SN54LS00
Advanced Schottky SN74AS00 SN54AS00
Advanced Low Power Schottky SN74ALS00 SN54ALS00

Temperature ranges: Standard: 0 °C - 70 °C; Military -55 °C - 125 °C

Power Supply

Gates are supplied with power from a power supply via power rails known as Vcc and ground.
This power rail is implied and not actually shown on circuit diagrams. In a dual-inline
package, powering the package automatically powers all the gates.

Normally, the power supply is 5 V. For standard specification gates, the allowed variances
are ±0.25 V and for military this is ±0.5 V. There is also an absolute voltage rating, above
which the gate burns out. This is approximately 7 V.

Totem Pole Output

Output is high when Q1 is on and Q2 is off; conversely, output is low when Q1 is off and Q2 is
on. During transitions, Q1 and Q2 both conduct, but current is limited by R1; this causes a
"spike" to be seen on the supply rail and is known as electrical noise. The spike is caused by
the sudden increase and then decrease in current required by the gate. Capacitors are evenly
spread around a PCB according to some in-house rule of thumb. These "decoupling"
capacitors are connected between supply rail and earth, supplying instantaneous current
which the transistors need.

Fan-out

When the interconnecting node is low, current flows out of the second gate into the first one.
The inverse happens when the interconnect is high.

In a data book, the following notations are used to signify currents:

 IOL - output when low


 IOH - output when high

 IIL - input when low
 IIH - input when high

Unused Inputs

NAND and NOR gates can be used as inverters. 4-input gates can be used with only 2 variables,
etc... However, TTL inputs float high and CMOS inputs float low. For a TTL 4-input AND:
A.B.C.1 = A.B.C, but for a 4-input OR: A + B + C + 1 = 1, which is a tautology.

Unused inputs are susceptible to electrical noise and may slow down gate operation.

These are two different ways to make a 3-input gate work with two inputs (generating logic 0).
The bottom method is preferred.

To generate logic 1, you can do something like this:

The bottom circuit can only be used with low-power Schottky, however. For other devices,
you can tie them to the Vcc rail using a 1k Ω resistor.

TTL Voltage Levels

Output Input
Logic 1 2.4-5.0 V 2.0-5.0 V
Logic 0 0.0-0.4 V 0.0-0.8 V

Levels of Integration

SSI - small scale integration. 1 - 20 gates, up to 100 transistors, few gates and flip flops.

MSI - medium scale integration. 20 - 200 gates, functional building blocks.

LSI - large scale integration. 200 - 200,000 gates, PLDs and early microprocessors.

VLSI - very large scale integration. 500,000+ gates, 32-bit microprocessors, etc...

Propagation Delay

The propagation delay of a gate is the time it takes for the gate output to change in response
to a change on its input. High-to-low delays may differ from low-high-delays.

tPHL is the time between specified reference points on the input and output voltage waveforms
with the output changing from the defined high level to the defined low level.

tPLH is time between specified reference points on the input and output voltage waveforms
with the output changing from the defined low level to the defined high level.

A static hazard is where there's a change from minterm to minterm (static-1) or maxterm to
maxterm (static-0) and a "blip" occurs. A dynamic hazard is where there's a hazard in a
change between max- and min-terms.

Buffers help mask propagation delays and can decrease hazards. This, however, isn't the best
solution to hazards. Waveform analysis is a better indicator for predicting hazards, but it may
not be accurate in reality.

System Organisation

These are dedicated buses.

A bus is a set of wires designed to transfer all bits of a word from a source to a destination.

This is a shared bus.

Multibus (IEEE 796)

This standard defines a bus standard. There are 86 wires (16 bit bi-directional data bus, 24 bit
address bus, 26 bit control bus (used for data-transfer commands, handshaking, etc) and a 20
bit power and ground line bus (8 GND, 8 +5 V, 2 +12 V, 2 -12 V)).

Open Collector Devices

These types of devices can accept voltages of ±15 V on the Vcc rail, and can therefore sink
higher voltages. They are typically indicated by a star over the end of the gate. Because of
this, you can do things like this:

The value of R is calculated according to a formula specified by the manufacturer. The value
must be re-calculated every time an input or output is added or removed.

Wired-AND Gate

You can also do things like this to emulate AND gates:

Which, by inspection, gives us an AND gate.

Tristate Devices

Tristate devices have 3 output states

 TTL logic 1 state


 TTL logic 0 state
 A high-impedance state

A tristate device can sink and drive larger currents than TTL.

Bus Driver/Receiver

Of all the gates connected to a bus wire, only one should drive at once. All gates can be in a
high impedance state, however.

Representation

This is a bidirectional bus

This is a unidirectional bus.

The number represents the bus width.

Signals take a finite time to propagate, and this time is comparable to gate delay.

Characteristic Impedance

Z0 = v / i

Z0
PCB tracks 50 - 150 Ω
Twisted pair 100 - 120 Ω
Coaxial cable 50 - 75 Ω

Multiplexing and Demultiplexing


A multiplexer switches from various inputs to an output, e.g., a mechanical one may be an
input selector on a hi-fi amplifier. An electrical multiplexer offers one logic load, has
normal fan-out, and has a strobe to enable/disable the mux (multiplexer).

A demultiplexer does the opposite - it puts an input onto the addressed output.

Programmable Logic Devices


PLDs have an inverting stage, an ANDing stage and an ORing stage. They have multiple I/Os
and they realise sum-of-product expressions.

A programmer is a device into which an unprogrammed PLD is plugged. Using the programmer's
keyboard and a schematic of the device, internal connections can be located and blown.
Traditionally the method for doing this is:

1. Create boolean equations


2. Enter them into computer program
3. Compile them into JEDEC form

4. Programmer uses JEDEC file to program the PLD.

PROMs are general purpose decoders leading to an ORing stage. Only the ORing stage is
programmable. PROMs are available in different varieties, such as:

 ROMs are programmed by the manufacturer, and are only cost-effective if


manufactured in large quantities.
 PROMs are developed in a lab. Once the fuses are blown, they can't be
reinstated. They are programmed by electrical pulses applied to the outputs.
 EPROMs, these are like PROMs, but UV light resets the fuses.
 EEPROMs, like EPROMs, but electrical pulses are used to reinstate the fuses, not
UV light.

For PLDs, instead of a conventional notation, crosses are put where wires intersect to indicate
fuses being intact.

Programmable Gate Array

Inverters lead to NAND gates which lead to XOR gates. The XOR inverts. NAND gates and
output polarity are programmable. If the polarity of the XOR gate is intact, the NAND gate is
shown, otherwise the other input is set to logical 1 and inverted.

Programmable Array Logic

This is a programmable AND array, but a fixed OR array. In PALs, not every output is
programmable with every possible input combination; however, they are low cost and easily
programmed. If you have any unused AND gates in the array, all their fuses must be left intact,
which sets the output to 0 and so doesn't interfere with the OR.

Some PALs have tristate buffers for bus driving (the tristate selects whether the PAL pin is
driving or receiving), hence the PAL can be used for inputs and outputs.

Programmable Logic Array

This has a programmable AND array and OR array.

PLDs have extra circuitry to allow the device to be checked and the fuse arrays to be
read. Some PLDs have security fuses to stop the device being read.

IDD is continued straight through into DAD.

CHAPTER 3:
Introduction to Computer Systems
Systems

A system is contained within a boundary, either physical or logical. Outside the boundary is
the environment, which the system interacts with via inputs and outputs. The system has no
direct control over the environment, it can only control what happens inside the boundary.
The system receives inputs, but has no control over what these inputs are. It gives outputs but
has no control over what then happens to them.

Systems can be defined in many ways: two definitions could share the same boundary but take a
different way of looking at it, or they could draw the boundary differently. For example, you
could look at a computer system in the following ways:

 A tool that takes commands and returns data


 A collection of components that take in electrical signals and returns electrical
signals
 A device for converting characters to binary code
 A component in an office workflow diagram

A system is analogous to mathematical relations. A system maps inputs to outputs. This is an
abstraction level which is sufficient for many purposes.

Systems are nested, therefore decomposition is normally a good way to study a system.
Systems can "overlap" (at least logically). Definitions can be applied to all systems.

To design a system:

1. Determine boundary
o What can be changed
o What is in the scope of design, what is outside
2. Determine the interfaces/how it interacts with the environment
3. Determine mapping relations
4. Determine what internal data is required to support mappings

o Data storage

To design a system to fit into an existing system, you need to think in terms of separate
components. Each component has a boundary, an interface and I/O mappings. How each
component works internally should be hidden from the other modules. All you need to know is:

 How to use the interfaces (I/O structures)


 What the mappings achieve
 How trustworthy the components are

Hierarchies

Systems are hierarchical structures. Each level is a component of a higher level, and each
component also has components at lower levels. Some components and levels are logical.
Each topic in Computer Science tries to deal with one level at a time. Mixing levels of
hierarchy can lead to confusion.

Science and Engineering


Science explores the physics of computing, i.e., what happens in energy terms, what the
components are and why they behave as they do and the theory underpinning hardware and
software.

Engineering studies ways to build systems. It looks at the hardware and software, and the
network and systems architectures.

Generalisation of Engineering

Engineering principles in one area usually apply in some form in other areas (one principle
can be applied to, say, civil engineering, just as much as CS). For example, good practice in
development, documentation and testing principles, the planning and management, dealing
with cost effectiveness and professional oversight are all traits of engineers.

Students rarely meet reality, however engineering is based in reality. Reality is big, messy
and imprecise. Reality changes goals and resources during projects, demands evidence of
performance and wants things to last forever (but then discards them the following week).

As reality is often too much for engineers to conceive, they simplify it using models. Models
are representations of reality. Programs, diagrams, mathematical definitions, CAD, working
scale models are all examples of models. Models are used to abstract away detail that is not
relevant to the current purpose. Models can be static (diagrams, maths, etc) or dynamic
(programs, train layouts, animated CAD drawings, etc).

Lifecycles
All engineering projects can be characterised in general terms:

 Requirements
 Specifications

 Designs
 Implementations
 Testing
 Maintenance
 Decommission

We will briefly look at each. You should apply these principles in other modules. There are
lots of lifecycle models (waterfall, spiral, etc) that can be used. Lifecycles are abstract
summaries of a chaotic reality - don't expect models to be perfect, but they shouldn't be
dismissed either.

Development methods are different to lifecycles. The methods are practical help with the
development based on expertise/experience of experts.

Requirements

This is what the system is required to do. This is also a constraint - you should not be making
the system do things that are beyond the requirements. Requirements are quite general, as this
is looking at the system from a high level. At lower levels, requirements are more specific to
a particular architecture, platform, etc...

Testing happens against the requirements, therefore the requirements need to be objective and
clear. Requirements also change, and these changes must be documented. Engineers aim to
just meet the requirements; extras are not needed, or in some cases desired.

Specification

A specification is a contract for a development. It meets the requirements and is an abstract
description of the requirements - it is not how the requirements are going to be achieved. It
could use diagrams, mathematics or logical analysis. It needs to be rigorous and
unambiguous. It should be checked against the requirements.

VVC (Verification, Validation and Certification)

Verification - This checks that the data is factually correct.
Validation - This checks that the data fits in with its intended use.
Certification - This checks a system against external documentation and operational standards.

Design

This is less abstract than a specification, however it still does not contain any code. There may
be different designs at different levels. A design also adds more constraints.

Implementation

This must match the design. It also includes documentation - both of the development and for
the users. Implementation is a trivial stage compared to the specification and designs, if they
are done right.

Some tips for implementation are:

 Re-use code wherever possible
 Document the original or re-used code
 Document all changes to re-used code and the rationale behind those changes
 Write re-usable code
 Use a good style of coding and document your code fully
 Document how procedures and packages are used
 Document what each component does and the conditions it expects.

Testing

This is the most important part. It is dependent on the earlier parts and consists of various
parts:

 Unit tests - this tests the internal code


 Module, integration and system tests - testing the individual components and
composition verification
 Acceptance or certification testing - validation that works as intended in
context and within standards

Testing reveals errors, not correctness; it is impossible to show a program is perfect. Tests
should be designed to break the system; you should always be testing for unexpected things
(e.g., illegal inputs, out-of-range data). Testing should occur close to or on the boundary of
the operational envelope.

The specification and design should be used to identify meaningful tests.

Maintain

Most systems change after delivery, perhaps a new platform is introduced, etc... Bug fixes
may also need to be applied. Documentation and code need to be maintained.

Regression testing should be performed.

 Work out what tests should and should not still work
 Run all your tests again and check the results
 Don't just test the changed parts.

Decommission

Few systems are ever thrown away. If a system is used, it is likely to be replaced. You should
try to reuse and recycle. There could be lessons to be learnt from an old system, requirements
and scenarios could be similar to the new system, components might need to be retained and
each of these elements requires careful analysis.

Documentation is crucial

Computer Hierarchy

A computer is a hierarchy of parts (a system). There are many overlapping hierarchies and
views, logical and physical views are both hierarchical and you can look at things from both a
functional and architectural point of view. ICS only covers some of the views and hierarchies
that exist.

Events

Any input is an event. A command or data sent may change the system state. How an event is
determined depends on your view, for example from the view of the CPU only electrical
pulses are valid. Logically this could be a single character, a signal, etc... From a human
view, this could be a string of text, or a whole mouse movement (which relates to multiple
signals).

Keyboard Events

Stallings 7.1

A keypress sends electrical signals to a keyboard transducer. The transducer converts the
character pressed to an IRA (ASCII) bit value.

Mouse Event

Press, release, etc are all considered individual mouse events. The window manager interprets
events and movements to commands. Commands are interpreted to machine instructions in
the form of bits.

Bits are transmitted to the I/O module and memory. Transmission of input is identical for all
bits. In a window, a window manager interprets key presses and mouse movements to
machine instructions or data.

A filename is a logical reference to a distributed set of addresses. Physically this is an index
address and lots of separate data blocks.

A computer takes well formed commands and then uses context to deduce what's intended.

Views and Levels

Stallings 1.1

There are different ways you can look at things:

 Internals - above the hardware level but below the operating system
 Architecture - how the programmer sees the system
 Organisation - how features are laid out

Architecture

 Data representation
 Instruction set
 Addressing
 I/O mechanisms

Most of these are logical abstractions. Physical signals represent bits and they represent
instructions and addresses.

Organisation

 Internal signals, clocks and controllers


 Memory addressing
 Instructions
 Hardware support

Structures and Functions

Stallings 1.2

Structure: CPU, memory, I/O, communications
Functions: storage, processing, movement, control

 control unit - controls execution cycle
 ALU - computational core
 registers - internal memory
 connections - connections between other units

IAS

Stallings 2.1

John von Neumann was at the Princeton Institute for Advanced Studies (IAS). He came up
with the theory for a stored-program computer in about 1945. The first IAS machine was
completed in 1952. Although this is later than Turing's efforts in the 1940s, the IAS system
was public and open from inception, so it became better known.

The IAS architecture stored data and instructions in a single read/write memory. The memory
contents were addressable by locations and sequential execution from one instruction to the
next occurred.

Main Memory

Main memory is a collection of locations, each holding a binary value. Instructions are stored
and read sequentially in one set of addresses. A logical abstraction of this is a stack. All
locations have sequential identities (addresses) which data is stored and accessed by.

Architecture functions and organisation are built round groups of addresses.

instruction 0
instruction 1
instruction 2
instruction .
instruction .
instruction x
data x+1
data x+2
data .
data .
data n-1
data n

Main memory has n locations. Instructions are in locations 0-x. The instruction stack is a
logical data structure with input at the top, and reads happen at the top too.

Words and Bits

Each memory location is 1 word. The IAS machine had 1000 words in memory. The Intel
8088 (1979) had 1 MB. The current Pentium 4 (IA32) architecture has 64 GB of words.

Each word contains a fixed number of bits. The IAS had 40-bit words, but most modern
systems have 2^n bits; for example, Intel processors until the 386 had 16 bit words. Early
Pentiums had 32 bit words and the Pentium II onwards had 64 bit words.

There are many formats for a word. The simplest is the opcode followed by an operand. The
opcode identifies one machine instruction (operation). The operand is the address of data to
be operated on.

Fetch-Execute Cycle

Computer operations are in a series of cycles. For example, retrieve a word from main
memory, execute instruction in a word, etc..

Instruction locations are numbered logically from 0. When the computer starts, a program
counter (pc) is set (i.e., to 0). This causes the 0th instruction to be fetched. The instruction is then
executed and the pc is incremented, so the first instruction is executed next. The fetched
instruction is written to the instruction register (ir) for execution.

The pc and ir are CPU registers: like main memory, but fast and inside the CPU. The pc only
needs to hold an address. CPU buffers are also common, for example between the pc, ir and main
memory; these include the mar (memory address register) and the mbr (memory buffer
register).

The CPU control unit determines when to fetch an instruction. The communication between
the main memory and the CPU uses buses for addresses, control signals and data. The control
unit uses a clock tick (a regular pulse signal): on each tick it starts the next
element of the F-E cycle.
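A highly simplified sketch of the cycle, written in the Scheme of Chapter 1 (memory is modelled
as a list of instructions and everything here is invented for illustration, not a real instruction set):

(define (run memory pc acc)
  (if (>= pc (length memory))
      acc                                         ; pc past the last instruction: stop
      (let ((ir (list-ref memory pc)))            ; fetch the instruction at pc into ir
        (cond ((eq? (car ir) 'load) (run memory (+ pc 1) (cadr ir)))          ; execute, then
              ((eq? (car ir) 'add)  (run memory (+ pc 1) (+ acc (cadr ir))))  ; increment pc
              (#t                   (run memory (+ pc 1) acc))))))

(run '((load 7) (add 3) (add 5)) 0 0)    ; 15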

Fetched instructions might need more memory interaction because an execution can take
place - this is an indirect cycle.

The execution that occurs depends on the instruction in the ir. For example, it could be

 moving data among registers, from CPU to main memory, etc


 modifying control
 I/O
 ALU invocation

Interruptions

Assume end-to-end execution of stored programs. The CPU is very efficient compared to
other peripherals and the F-E cycle is faster than other devices. Rather than the CPU hanging
and waiting for a device, interrupts allow the CPU to continue with other instructions
and return to the task that is waiting on a device when the interrupt arrives.

Different types of interrupts exist:

 Program Interrupts - Illegal processing attempted (for example, a divide by


0, arithmetic overflow or an illegal memory call)
 Timer Interrupts - Allow regular functions to occur (housekeeping, polling
for data, etc)
 Device Interrupts - Come from device controllers, such as I/O. These signals
could be such as "ready", "error", etc.
 Hardware Failure Interrupts - Power out, memory parity error, etc.

Device Handling

A processor fetches an instruction which involves a device call. The execution calls a
set-up program (I/O program) which prepares buffers for data, etc... The device driver (an I/O
command) is then called and the execution terminates.

Preparing for an Interrupt Cycle

This is a variant of the fetch cycle. The current pc is stored in a dedicated address. After the
interrupt is completed, the control unit reads the stored pc value and resumes at that location.

The CPU knows addresses to be used. The interrupt only starts when the previous instruction
is fully executed.

Interrupt Handling

Any device may send an interrupt (signalling that it is ready for I/O, etc), however the
handling details are specific - it might include the OS for example.

Computer Connections

Devices inside a computer are connected using buses, lines and interfaces.

There are many different types of connections - point-to-point (a dedicated link) or a bus (a
link shared by many components). A bus corresponds to a particular traffic type - data,
addresses, etc...

Networking can be wired or wireless. Conceptually it is different to internal communications,
but physically it is becoming similar to buses. Bus and network protocols are slowly
converging.

Internal and external communications have different emphases. Networking has an emphasis
on security, whereas buses tend to emphasise robustness.

Buses link at least two components, and traffic is available to all logical components
connected to that bus. Logically, a bus is dedicated to either data, addresses or control
signals. Physically, a bus comprises of a bundle of separate bus lines.

Buses are things passing values - like a flow of communications. In reality, a bus line carries
current, so it has a value at all points simultaneously - logically this is either 1 or 0. Unfortunately
transfer isn't really instant (assuming that it is is a level of abstraction). A pulse that is either
on or off doesn't exist in the real world. There is a leading edge and a trailing edge as voltage
rises or falls. These edges don't carry any data and are not relevant here.

Buses use serial communications, i.e., 1 bit at a time. There are 8 lines for an 8 bit bus.

Parallel buses are buses of serial lines. Each bus still carries one thing, but a communication
may be split over many buses. In theory, parallel is faster and more efficient, however serial
buses tend to have higher data rates due to the overhead of parallel (assembly and
disassembly of packets).

Early buses had separated, unsynchronised clocks - CPU time was wasted waiting for devices
to catch up. Modern buses are controlled by a single CPU clock, however, variable speeds are
being re-introduced according to application.

Data Bus and Lines

Data buses have one bus line per bit in a word (or multiples of this), e.g., 32-bit words have
32 (or a multiple of this) data lines, each requiring its own connection pin.

Address Bus and Lines

Buses may have a memory-I/O split. Most significant bit may say whether it's intended for
memory or an I/O address.

Control Bus and Lines

One bit is normally sufficient for each control signal. Control lines carry things such as timing
signals (clock/reset), command signals and interrupt signals.

Modules and Interfaces

Stallings 3.3, 3.4, 7.5

Each computer component is a module with an interface. It is based on the engineering
principle of each module being self-contained with minimal interfaces. This is true of
modules in any other engineering (software, etc).

Memory Interface

The memory interface passes fixed length words to the memory. The memory doesn't care
what it's storing, it could be an address or data, or anything.

Control signals (read signal, write signal) tell the memory module what to do, the address bus
carries the address that's being considered, and the data bus will either carry the contents of
the memory to or from the memory.

CPU module

The CPU is "intelligent". It identifies the content of the data bus to be different types and
deals with them accordingly.

Coming in, it has control signals (interrupts) and data signals (instructions and data). Going
out it gives out control signals, addresses and data.

DMA

To save CPU cycles, an architectural extension called Direct Memory Access allows the I/O
module to read/write directly to memory. DMA modules control access between I/O and
memory. It must avoid clashing with the CPU however - a common approach to solve this is
cycle stealing. Cycle stealing happens like this:

1. The CPU is in the normal FE cycle
2. It sends a write signal to the DMA
3. It outputs to the data bus the I/O address, the memory address and the number
of words to be written
4. CPU resumes FE cycles

The DMA only has 1 data register. It only reads one word at a time and writes a word to the
I/O device. The DMA also has a data count register. It holds the number of words it is
supposed to transfer and it is decremented after each transfer. When the data count register is
0, it sends an interrupt to the CPU to signal it is ready.

This only works for memory to I/O. I/O to memory using DMA isn't implemented in most
systems, but is essentially the same thing in reverse.

The DMA tends to interact with the FE cycle like this. The CPU and DMA module share
control of devices. CPU cycles are suspended before bus access occurs and an interrupt is
sent to the DMA. The DMA finishes transferring the current word and releases bus control,
sending a signal to the CPU control unit. When the CPU finishes using the bus, it signals the
DMA to resume.

Stallings 7.13

CPU suspend is not the same as an interrupt - context is not saved and the CPU does nothing
else whilst it is interrupted. The CPU simply waits one bus cycle then takes control.

DMA pause is also not an interrupt. The DMA has no other role in computation, so cannot
defer tasks to a later date.

Module Bus Arrangements

Stallings 3.4

A very simple arrangement is as follows:

However, with external buses there are propagation delays. It takes time to co-ordinate bus
use. More components or devices require longer buses, and this makes co-ordination slower.
Buses also tend to be bottlenecks, the classic example of this being the "von Neumann
bottleneck". Ways to alleviate delays may involve deeper hierarchies (using caches and
bridges), dedicated buses for high-volume traffic and separation of slow and fast devices. A
more accurate bus diagram may look like this, for example:

However this is still a gross simplification. In reality, there are many more layers than this.

Memory Hierarchy

Memory can be simply split up into two forms, internal (registers, caches, main memory, etc)
and external (hard disc drives, memory sticks, etc)... The focus in ICS is on internal.

Total storage space is typically expressed in bytes. The size of a byte was variable in earlier
architectures, but now is standardised on 8 bits. An alternative way of expressing memory is
using words. A word is a "natural" unit of memory, and is architecture specific. Typically, a
word is the length required to represent a number or instruction (commonly 8, 16, 32 bits,
etc). There are many exceptions to this though - variable/multiple word lengths, word lengths
not of the form 2^n, etc...

Addressable units are the memory locations recognised. Each unit is a word length and the
address range is built into the architecture. There are many exceptions, however, the most
common being byte-level addressing within words. The number of addressable units is
2^n where n is the number of address bits. 16 bit addressing can cope with 2^16 units, or 65536,
numbered from 0 to 65535, for example.

A unit of transfer is physically the number of data lines connected to memory. There are
exceptions to this, for example partial words being read, etc...

External transfer units are called a block.

Memory Access and Performance

Different architectures have different methods. Older forms used external memory, which use
sequential/direct access. Newer architectures tend to use random or associative access.

Stallings 6

Access time is dependent on architecture, hardware and organisation. ICS considers these for
random access memory. The parameters used to measure performance are access time (or
latency), memory cycle time and transfer rate.

Latency

This is the inevitable delay whilst processing a command. For reads, the access time is the
time from the request of an address to the memory until the data is ready to use. For writes,
the access time is the time from presentation of the address and data until the write process is
completed.

Cycle Time

Cycle time is the gap between memory access - for example waiting for memory to stabilise,
buses to reset, etc...

Transfer Rate

This is the rate at which data can be transferred in and out of memory. It's the reciprocal of
the cycle time.

Physical Media

This needs to be stable and hold two clearly distinct values, such as semiconductor, magnetic
media, optical discs, etc...

Memory Volatility

Volatile memory requires a power source to maintain state. Memory such as cache is volatile.

Memory Hierarchy

Inboard memory: registers, cache, main memory
Outboard storage: magnetic disc, optical
Off-line storage: magnetic tape, magneto-optical, Write Once Read Many (WORM)

As you move down the hierarchy, you get a lower cost per bit, greater capacity, slower access
and rarer use by the CPU.

More information on memory is available on the course web pages.

Errors and Error Correction

Stallings 5.2

Errors can occur in the semi-conductor. Hard errors are permanent errors where a cell is
always stuck at 0 or 1, or holds a random value. Soft errors are unintended events that change
cell values; causes include physical faults, power surges, alpha particles, etc.

Error detection works on the principle of storing two values, m - the data - and k, which is a
function of m.

If m has M bits and k has K bits, then the stored word is M + K bits. When m is read, f(m) is
calculated again, giving k', which is compared to the stored value of k.

If k' and k are the same, no error has been detected, however if they're different, two things
can occur. m gets sent to the corrector and corrected, if it's correctable, and if not, an error is
reported.

This gives three various levels of detection: no detection, uncorrectable detected error, or
correctable detected error. However, this is not always accurate. An error could have gone
undetected, or a corrected error may not still give the right value.

An early method of detection was parity, which uses 1 extra bit. If the number of 1 bits stored
is even, the parity bit is 0, otherwise it is 1. If a double bit flip occurs, no error is detected, so it is
limited. Also, it cannot correct detected errors.
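As a rough illustration of the idea (a small Python sketch, not part of the original notes), the parity bit is just the count of 1 bits modulo 2; one flipped bit changes it, but two flipped bits cancel out:

    def parity_bit(word):
        # Even parity: 0 if the word holds an even number of 1 bits, else 1.
        return bin(word).count("1") % 2

    data = 0b10110010                           # four 1 bits, so parity is 0
    stored = (data << 1) | parity_bit(data)     # data word followed by its parity bit
    one_flip = stored ^ 0b10                    # flip one data bit
    print(parity_bit(one_flip >> 1) != (one_flip & 1))    # True: error detected
    two_flips = stored ^ 0b110                  # flip two data bits
    print(parity_bit(two_flips >> 1) != (two_flips & 1))  # False: double flip goes undetected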

Hamming Error Detection

This improves detectability (measured by the Hamming distance) and also maximises the data:parity
bit ratio (the information rate).

The Hamming distance is the minimum number of bit flips that can go undetected (i.e., that turn
one valid codeword into another). For parity, this is 2.

Because there is also the chance that parity is incorrect, embedding parity in the data
increases the chance of it being accurate. Only one error in parity gives a parity error.

If two bits are flipped but a single-bit correction is applied, an even more wrong value will be
extracted. Combining old-style parity with Hamming allows you to distinguish between single bit
flips and two-bit flips. If parity is right, but Hamming is wrong, you know 2 bit flips have
occurred, so no correction is possible. This type of method is known as SEC-DED (single
error correction, double error detection) and it is considered adequate for most modern
systems. The probability of error is quite low.

Cache Memory and Memory Addressing

Stallings 4

Caching works on blocks of memory contents, rather than single words. For example, a 16-bit
main memory address may identify a block of data (where a block is 4 words in this
example). In this case, this makes 0000₁₆ the start of the first block, 0004₁₆ the start of the
second, etc... A word is then identified by its offset from the start.

Caching is used to improve memory performance (it was first introduced in 1968 on the IBM
S/360-85) and has become a feature of all modern computing. Cache is typically the fastest
memory available and behaves like a module in the CPU.

All information is still held in main memory. The cache holds copies of some memory blocks
(recently accessed data). CPU reads are initiated as normal, the cache is checked for that
word, and if it's not cached, the block is fetched into cache and the CPU gets served directly
from cache.

Main memory consists of 2^n words with unique addresses 0 to 2^n - 1, grouped into blocks of K
words. This makes M = 2^n / K blocks. Cache consists of C lines of K words, where each line holds a
block, and C <<< M.

There are many different ways of implementing cache. The most common is that all buses
lead to cache and the CPU has no direct memory access (logically).

The cache needs to know which memory block contains which word. Two ways of
accomplishing this is with direct mapping and associative mapping.

Direct Mapping

A section of memory is mapped to one line in cache. This is simple and cheap. However,
swapping could happen a lot: if two blocks which are accessed a lot by a program are mapped
to the same cache line, the program causes these lines to be constantly swapped in and out.
This is called "thrashing" and wastes CPU time.
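A minimal Python sketch of the address arithmetic (block size, line count and the example addresses are made up for illustration): the line number is simply the block number modulo the number of lines, so two far-apart blocks can compete for the same line:

    K, LINES = 4, 128          # assumed: 4-word blocks, 128 cache lines

    def map_address(addr):
        # Split a word address into (tag, line, word offset) for a direct-mapped cache.
        offset = addr % K      # word within the block
        block = addr // K      # block number in main memory
        return block // LINES, block % LINES, offset

    print(map_address(0x0000))   # (0, 0, 0)
    print(map_address(0x0800))   # (4, 0, 0) - same line, different tag: thrashing risk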

Associative Mapping

Here, any memory block can map to any cache line. The main memory address is split into a block
tag and a word offset, and each cache line stores the tag of the block it holds. This isn't as fast to
look up, as more lines need to be checked for a particular word.

There are further cache issues to consider, for example, what happens when a cache line is
overwritten. A delete/write policy needs to be defined, the block and line size needs to be
considered, and there could be multiple caches.

CPU

The ALU performs computation and the other CPU units exist to service the ALU. The ALU
is made of simple digital hardware components. It stores bits and performs Boolean logic
operations. All data comes to and from registers, input is from the control unit and output
goes to registers, with status flags going to the control unit. As a module, the ALU may be
made up of more modules (4 common ones are the status flag, the shifter, the complementer
and the boolean logic unit).

Status Flag

This is linked to the status register, or program status word. All CPUs have at least one.

Condition codes and other status information may consist of:

 The sign of the result of the last operation


 A 'zero' flag (if the last operation returned 0)
 Carry (for multi-word arithmetic operations)
 Equal (a logical operation result)
 Overflow
 Interrupt

Shift and Complement

This is a very common ALU component and is a computational function that occurs
frequently. It can be solved using addition, but it is normally more efficient to use shifts.

Shifting multiplies or divides by 2. 11111111 may become 11111110 if multiplied or
01111111 if divided.
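For example (a small Python sketch; the & 0xFF simply keeps the result within 8 bits):

    x = 0b11111111
    print(bin((x << 1) & 0xFF))   # 0b11111110 - shifting left multiplies by 2
    print(bin(x >> 1))            # 0b1111111  - shifting right divides by 2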

There are also circuits for boolean logic and arithmetic, but the implementation details vary
from architecture to architecture.

Registers

CPUs normally have two types of registers, general-purpose (or user-visible), which are
accessible from low level languages and allows for optimisation of memory references by the
programmer and control & status registers - the control unit controls access to these. Some
are readable, and some are also allowed to be written to by the OS, for example, to control
program execution.

Instructions

A computer operates in response to instructions. The lowest representable form is machine
code (binary). This is normally represented in hex, however, as it is easier to read. The first
human-readable form of the code is assembly, which is a textual representation of machine
code.

Instructions have an opcode and an operand(s). The opcode names a particular CPU
operation. The operand is either data, or related to the data (an address, for example), or a
number, etc... In some cases, operands may be implicit.

Many simple operations combine to form fetch-executions, etc. Sometimes these are called
micro-instructions.

Normally, there is 1 opcode and up to 4 operands. The format of the instruction is
architecture dependent. Architecture designs have different representations for data types and
how the CPU distinguishes them.

Representations
Representing Integers

Binary natural numbers aren't a problem for computers, however, integers are signed (can be
positive or negative) and normally the msb represents the sign (-ve is binary 1 and +ve is
binary 0). Therefore, a binary representation of an integer must contain the sign and the
magnitude. The advantage to this method is that negation can happen due to a single bit flip,
however, there are now two different representations of 0 and arithmetic must consider both
sign and magnitude.

Another method of representing integers is the two's complement method. Here, a positive
number has msb 0. In 4 bits, 1111 is the largest representable negative and 0111 is the largest
representable positive. The range of representable numbers is -2^(n-1) to 2^(n-1) - 1. For 4 bits, this
gives us +7 to -8. 0 is always positive. The main advantages to this method of representation
are that there is a single representation of 0, and the arithmetic is simpler (for a computer).
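A small Python sketch of the encoding (the bit widths are chosen for illustration):

    def to_twos_complement(value, bits):
        # Encode a signed integer as an n-bit two's complement pattern.
        assert -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1
        return value & ((1 << bits) - 1)

    def from_twos_complement(pattern, bits):
        # Decode an n-bit pattern back to a signed integer (msb set means negative).
        return pattern - (1 << bits) if pattern & (1 << (bits - 1)) else pattern

    print(bin(to_twos_complement(-1, 4)))    # 0b1111, the largest representable negative
    print(bin(to_twos_complement(7, 4)))     # 0b111 (i.e. 0111), the largest positive
    print(from_twos_complement(0b1000, 4))   # -8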

Representing Real Numbers

Real numbers are represented in a 32-bit word. There is a 1-bit sign, 8-bit exponent and 23 bits
for the significand. This is defined by the IEEE 754 floating point standard. IEEE 754 also
defines the 1, 11, 52-bit standard for 64-bit systems and extended standards which can be used
in the internals of a CPU only. The standard also gives standard meanings for extreme values,
e.g., 0 is represented by all 0s, infinity is represented by all 1s in the exponent and 0s in the
significand, etc... A de-normalised number has an all-0 exponent, whereas all 1s in
the exponent and a non-0 significand represents 'NaN' - not a number.

Underflow (the numbers get too close to 0 - i.e., too small in magnitude for the 32-bit word to
represent) and overflow (numbers get too large) occur when the exponent can't represent all bits.
0 cannot be represented as a normalised floating point number, hence its special all-zeros pattern.

Like integers, a 32-bit floating point word can hold only one of 2^32 values. Some values
are too precise to represent so they get rounded to the nearest representable one - in the case of
underflow errors this is 0.

There is a trade off between precision and range by changing the number of exponent and
significand bits.
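As an aside (not part of the original notes), Python's struct module can be used to peek at the three fields of a single-precision value and confirm the special patterns described above:

    import struct

    def decompose(x):
        # Split a value into the sign, exponent and significand fields of its
        # IEEE 754 single-precision (1, 8, 23-bit) representation.
        (bits,) = struct.unpack(">I", struct.pack(">f", x))
        return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

    print(decompose(1.0))            # (0, 127, 0): the exponent is stored with a bias of 127
    print(decompose(0.0))            # (0, 0, 0): the special all-zeros pattern
    print(decompose(float("inf")))   # (0, 255, 0): all-ones exponent, zero significand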

Stallings 9.5

Exceptions are errors - where something different happens from normal expectations.
Exceptions tend to be raised by a program and can occur at any level - exception handling is
an essential part of good programming at all levels.

Representing Characters

Text is a thing that you don't do arithmetic on. It consists of printable characters (such as this
text) and non-printing characters (newline, escape, bell, etc). Characters are coded in an order
and this can be exploited for inequality and mathematical ordering ( 1 < A < a < } )

Computers store binary values, so text must be encoded. The most standard encoding (until
recently) was ASCII, but others existed such as the International Reference Alphabet (IRA,
IA5), EBCDIC, etc... These traditionally used 7-bits, but extended forms were available.
Normally, a parity bit was added to pad it to 8 bits for storage.

International standards have taken longer to develop, and have only recently come to
dominate. Most of these standards are derived from ASCII. The modern international
standard is Unicode (aligned with the ISO/IEC 10646 universal character set). Unicode code
points originally fitted in 16 bits, but nowadays can require up to 32.

Representing Other Objects

There are far too many to discuss in detail, but other items, such as images, sound, video,
etc... needs to be represented also. These are traditionally done using large data structures
with special handling procedures.

Addressing Modes
Addresses have several modes, identified by operand types, special flag bits, etc...

Address (A) is the contents of the instructions' address field.
Register (R) is used in specialised addressing modes where contents of the address field
refers to a register.
Effective Address (EA) is address of an actual operand location.
(X) means the contents of memory location X.

Immediate

A = Operand. There is no separate memory reference and this can be used to set constants or
initialise variables.

e.g., LOAD 11111001 . Loads a two's complement value in an 8-bit address field. When
stored, it is padded to word or address size.

This mode is not very common due to capacity limitations

Direct Addressing

The operand holds the effective address. EA = A.

This means there is one memory reference per operand, however there is still limited address
space and for this reason it is not used much.

Indirect Addressing

Operand points to a memory location, which points to EA. EA = (A).

The operand is smaller than a memory word - but the possible addresses are still limited. The
memory location is a full word, and any address is possible. This method of addressing is
used in virtual memories.

Register Addressing

The address directly refers to a CPU register.

Register Indirect Addressing

The address is in a register, not main memory, otherwise it's the same as indirect.

Displacement Addressing

This combines direct and register indirect.

EA = A + (R)

There are three common variants to this:

 relative addressing - the implicit register is the program counter, and A is the displacement from it
 base-register addressing - the register holds a base address and A is the displacement; the register is usually implicit
 indexing - the reverse of base-register: A is the base and the register holds the offset. This is fast for iteration.

Stack Memories
Stacks are standard computer data structures. A sequence of memory locations is set aside for
a stack memory. Stacks operate as FILO (LIFO), so access is only to the top item in memory.
You "push" to an empty address above the current top and "pull" (pop) from the current top
address.

Stallings 10 appendix A

An area of memory is reserved for stack. A pointer is directed to the current top address in a
dedicated stack pointer register, because of this, stacks use register indirect-addressing. The
top few items in a stack may be in a register or cache for fast access.

With stacks, instructions don't need addresses. The opcode always operates on the top value.

Assembler and Machine Code

Stallings 10

To execute an instruction, each element is read and decoded in turn. Micro-operations occur
for decoding addresses and data, and the instruction cycle is itself a sub-unit of the overall execution.

The CPU and instructions should be designed to remove any ambiguity.

The instruction cycle works like this:

1. CPU: Calculate next instruction address
2. Memory: Fetch next instruction
3. CPU: Decode instructions (using micro-instructions, which are architecture dependent)
and get input operands, opcodes and output operands.
4. CPU: Calculate address of input operands
5. Memory: Fetch the address and return the data
6. CPU: Perform the operation (determined by the opcode) on the data.
7. CPU: Calculate the output address
8. Memory: Store the output
9. CPU: Loop to the next item in the F-E cycle.

The exact instruction cycle varies according to the architecture (there may be multiple
operands, implicit operands, etc)...

The maximum number of operands any instruction needs is 4 - two inputs, 1 result and the
next instruction reference. However, due to implicitness, a 4-operand instruction set is rare. 1-
or 2-operand instruction sets are common.

Reverse Polish Notation

Wikipedia

Reverse Polish Notation is useful when coding for a stack-based instruction set. The format
always gives correct precedence and it is most easily envisaged as a graph.
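A small Python sketch of evaluating RPN with an operand stack, the way a stack machine would (the token format is assumed for illustration):

    def eval_rpn(tokens):
        ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
               "*": lambda a, b: a * b, "/": lambda a, b: a / b}
        stack = []
        for tok in tokens:
            if tok in ops:
                b = stack.pop()            # the right operand is on top
                a = stack.pop()
                stack.append(ops[tok](a, b))
            else:
                stack.append(float(tok))   # operands are pushed as they are read
        return stack.pop()

    print(eval_rpn("2 3 4 * +".split()))   # 2 + 3 * 4 = 14.0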

There are different ways of representing machine code:

 Machine code - binary codes that map directly to electrical signals.


 Symbolic instructions: Assembler - mnemonic codes and addresses (normally
hexadecimal). This needs to be "assembled" in machine code.
 High-level programming language - needs to be compiled and assembled.

The instruction set determines functionality and capabilities, and also how easy programming
is. More instructions require more opcodes, and therefore more space, but it makes coding
easier.

There are many issues in dispute about instruction set design, the hardware has never been
stable enough for consensus to occur.

Types of Operation

Stallings 10.3

In 1998, J Hayes said there were 7 types of operation which instructions can be grouped into.

 Data Transfer - apply to transfer chunks of data (e.g., MOVE, LOAD, STORE,
SET, CLEAR, RESET, PUSH, POP). These are usually the simplest operations
to build.
 Arithmetic - ADD, SUB, MULT, etc...
 Logic - like arithmetic, but for any value, for example NOT (bit
flipping/twiddling), AND, EQUAL, bit shifting, etc...
 Conversion - translation between numeric formats (floating point/integer,
normalised/denormalised), length translations, etc...
 I/O - can either be memory mapped (the I/O addresses are in memory address
space, so only one set of instructions is needed for both I/O and memory) or
isolated (separate addresses and opcodes for I/O and memory)
 System Control - usually only available to CPU and OS, allows access to
privileged states and instructions, e.g., modifying special registers (program
counter, etc).
 Transfer of Control - breaking the normal F-E cycle, essentially overriding the
program counter. Using these instructions you can implement things like loops
(JMP), conditionals (Branch if). You can call procedures with a CALL
command, which runs a new program in the F-E cycle. RETURN then goes back
to where you CALLed from to continue. A stack of CALL locations allows you
to implement multiple levels of procedures.

SEMESTER 1 PART B
CHAPTER 4:
Algorithms and Data Structures
Problems and Solutions
A problem is a description of the:

 assumptions on the inputs


 requirements on the outputs

A solution is a description of an algorithm and associated data structures that solve that
problem, independent of language.

e.g., given a non-negative integer n, calculate as output the sum 1 + 2 + ... + n. You could solve this:

 s ← 0
for i ← 1 to n
do s ← s + i;

 s ← n × (n + 1)/2

 if ∃ p . n = 2p
then s ← n/2
s ← s × (n + 1)
else s ← (n + 1)/2
s ← s × n

Are these correct? If we assume they are, how do we compare them to find the best one?

Correctness

You can be formal about correctness using predicates and set theory, but in ADS, we shall be
semi-formal and informal.

An invariant is a property of a data structure that is true at every stable point (e.g.,
immediately before and after the body of a loop (for a loop invariant), or before and after the
procedures and functions of an ADT (for a data invariant)).

The first suggested solution has the loop invariant s = 1 + 2 + ... + i and terminates when i = n.
The other two algorithms are correct by the laws of arithmetic. Although all 3 algorithms are
correct, none can be correctly implemented due to finite limitations. The way to find out which
algorithm is better than others is by measuring resource usage. For our solutions: T1 = λn · (2i + a + t)n +
(2a + t) and T2 = T3 = λn · (m + i + d), where i is the time to increment by 1, a is the time to assign a
value, t is the time to perform the loop test, m is the time to multiply two numbers and d is the
time to divide by 2.

Therefore, we can see that the time resource usage for the second and third algorithms is
constant and for the first algorithm it is linear. If we plotted a graph of T1 against T2 (and T3)
we would see that for small values of n, T1 may use less resource, but as n grows, T1 uses more.
From this, we can conclude that either T2 or T3 is the better algorithm.

Asymptotic Complexity

∃n₀, c . n₀ ∈ N ^ c ∈ R⁺ ^ ∀n . n₀ ≤ n ⇒ f n ≤ c·(g n)

This can also be written as f = Ο(g). However, this means there is an ordering
between f and g (not an equality).

f = Ο(g) means that f grows no faster than g. From this, we can create the related definitions:

 f = Ω(g) - f grows no slower than g. g = Ο(f)


 f = Θ(g) - f grows at the same rate as g. f = Ο(g) ^ f = Ω(g)
 f = ο(g) - f grows slower than g. f = Ο(g) ^ f ≠ Θ(g).
 f = ω(g) - f grows faster than g. f = Ω(g) ^ f ≠ Θ(g).

For asymptotic complexity, given functions f, g and h, we can say they have the properties of
being reflexive and transitive, therefore the relation _ = Ο(_) is a preorder.

Another property we can consider is symmetry.

λn · 1 = Ο(λn · 2) (take n0 = 0 and c = 1). Also, λn · 2 = Ο(λn · 1) (take n0 = 0 and c = 2).


This proves that Ο(_) is not antisymmetric.

To shorten down these expressions, it is traditional to drop the λn, so λn · 1 becomes just 1.

We can also combine functions, so:

 f = Ο(g) => g + f = Θ(g)
This allows us to drop low-order terms, e.g., 7n = Ο(3n²) => 3n² + 7n = Θ(3n²)
 f = Ο(g) ^ h = Ο(k) => f × h = Ο(g × k)
This allows us to drop constants, e.g., 3 = Θ(1), so 3n² = Θ(n²)

We can also combine these laws to get such things as 3n² + 7n = Θ(n²).

There are also two special laws for dealing with logarithms:

 ∀r . r ∈ R⁺ ⇒ log n = ο(n^r)
 1 = n⁰ = ο(log n)

From these rules, we derive most of what we need to know about _ = Ο(_) and its relatives.

In general, we can simplify a polynomial to an ordering between the polynomial and its
largest term; also, we can treat log n as lying between n⁰ and n^r for every positive r.

Classifying functions

We can class functions according to their complexity into various classes:

Name          f n = t    f(kn)
Constant      1          t
Logarithmic   log n      t + log k
Linear        n          kt
Log-linear    n log n    k(t + (log k)n)
Quadratic     n²         k²t
Cubic         n³         k³t
Polynomial    n^p        k^p t
Exponential   b^n        t^k
Incomputable  ∞          ∞

The f(kn) column shows how multiplying the input size by k affects the amount of resource, t,
used.

Complexities of Problems and Solutions

For our example problem 1, how many times is fp = x executed with input size of n = (z - a) +
1? In the best case, this is 1 (x is found immediately), however in the worst case, this is n.
Thus we can say that the complexity class is linear.

The average case time depends on assumptions about probability distribution of inputs. In
practice, the worst case is more useful.

Abstract and Concrete Data Structures


Many of the problems of interest are to implement abstract mathematical structures, such as
sequences, bags, sets, relations (including orders), functions, dictionaries, trees and graphs,
because these are of use in the real world.

Common operations include searching, insertion into, deletion from and modification of the
structures.

However, our machines are concrete, so our implementations of these abstract structures must
be in terms of the concrete structures our machines provide.

We normally consider an abstract data structure as a collection of functions, constants and so
on which are designed to manipulate it. We came across this in Principles of
Programming as Abstract Data Types. Our aim here is to make the more common functions
efficient, at the expense of the uncommon functions.

The relationship between an abstraction and a concretion is captured by an abstraction invariant
(or abstraction relation), e.g.:

 A set of elements s is represented by a list l with the property: s = { l(i) | i is an index of l }


 A set of elements s is represented by an ordered list l with the property: s = { l(i) | i is an
index of l }
 A set s is represented by a sequence of lists q with the property: s = { (concat(q))(i) | i is an
index of concat(q) }

Contiguous Array

This is a concrete data structure, where each item in an array directly follows another. The
advantage of this method is that a component address can be calculated in constant time,
but it is inflexible (you need to find a whole block of spare storage that can hold everything you want
to store in one long row).

Pointer (Access) Structures

This is also a concrete data structure. We already covered pointers in PoP. To look up the pth
item in a pointer structure requires p lookups, so this example is linear in p. Extra space is
also used for pointers; however, it is flexible (it can easily fit in spare space, and things can be
added to and removed from it easily by updating pointers, etc)...

Linear Implementations of Dictionaries

An entry in a dictionary consists of two parts: a key, by which we can address the entry and
the datum, attached to the key.

For entry e we can write key(e) and datum(e), and also e = make_entry(key(e), datum(e)). We
can say that a dictionary, d, is a set of entries, where if e₀ ∈ d ^ e₁ ∈ d satisfy key(e₀) = key(e₁),
then datum(e₀) = datum(e₁). This is a finite partial function.

We can say that the main ADT operations are:

 is_present(d, k)
 lookup(d, k)
 enter(d, e)
 delete(d, k)

Other operations could include modify(d, k, v) and size(d).

Problem

Choose a concrete data structure to support the abstract concept, and algorithms to support
operations over this structure, such that a good balance of efficiency is achieved. It may be
that we need to sacrifice the efficiency of less common operations to make the more common
ones more efficient.

These solutions are simplistic, and there are better ways to accomplish this.

An Array

This is a machine-level structure with a fixed number of cells, all of the same type. Each cell
can store one value and is identified by an address. The time taken to access a cell is
independent of the address of the cell, the size of the array and the content of the array.

Models

 A pair of: an array, e, of length max and a number, s, in the range 0..max, which shows the
current size of the array.
 A triple: as above, but with a boolean array, u, that tells us whether or not each cell is active.

Is a value present?

In the pair model, a linear search in the range 0..s would be needed, whereas for the triple, a
linear search from 0..max, checking u(i) is true.

For the pair model, the complexity is Θ(s) and for the triple, the complexity is Θ(max), and
both are linear.

Value lookup

Again, this is a linear search in both cases and for the pair model, the complexity is Θ(s) and
for the triple, the complexity is Θ(max).

Enter

In the pair model, you go through the list checking that no entry already exists with that key,
then you add a new entry to the end (at s) and increase s. For the triple model, you do a value
present check, and then enter the value into the first inactive cell (where u(i) is false).

For the pair, the complexity is Θ(s); for the triple, it is Θ(max). Both are linear.

Deletion

There are two methods of implementing the deletion method. You could go through the array,
moving data leftwards to over-write entries with key k, or you could go through the array,
replacing key k with a special ghost value, and then compacting when there is no space left.

This is linear, however for the second strategy, you can then rewrite enter to use the first
ghost it finds, causing optimisations.

Linked Lists

A linked list is a machine level structure with a variable number of cells, all of the same type.
Each cell stores one value, and the address of the next cell, so a cell is found by following
pointers. The time to access a cell is linearly dependent on the position of the cell in the list,
but independent of the length of the list or its contents. We already implemented linked lists
in PoP.

Is a value present?

Using a temporary pointer, you can go through the list and compare the value at the
temporary pointer to the value that you're looking for. This is linear in the length of the list.

Value lookup

This is essentially the same as the value present check, but returns the datum of that key, not
just that it exists. Again, it is linear.

Enter

To enter data, you need to create a new cell, set its pointer to the current value of the list
pointer, and then update the list pointer to point at the new cell. This has a
constant complexity.

Deletion

To delete something, you need to find the key to be deleted, and also the key that points to it.
In the key that points to it, you need to update the pointer so that it now has the pointer of the
cell to be deleted (i.e., it points to the next cell). The deleted cell is no longer part of the list
and the garbage collector should come along and clean it up.

The raw material for an algorithm designer consists of assumptions on the input (stronger is
better), requirements on the outputs (weaker is easier) and intelligence and imagination (more
is better). We can improve the assumptions on our dictionary algorithms by asking that keys
be orderable. We can then improve our algorithms to take advantage of this assumption.

Ordered Arrays

If you model the pair so that for any array indices i ≤ j the keys satisfy e(i) ≤ e(j) (i.e., the array
is ordered by key), you can use binary search, a logarithmic algorithm (an improvement over linear), for
the is_present and lookup functions.
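A Python sketch of is_present over an ordered array of (key, datum) pairs, assuming the pair model above (e is the array, s its current size):

    def is_present(e, s, key):
        # Binary search: halve the candidate range each step - Θ(log s).
        lo, hi = 0, s - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if e[mid][0] == key:
                return True
            elif e[mid][0] < key:
                lo = mid + 1
            else:
                hi = mid - 1
        return False

    entries = [(2, "b"), (5, "e"), (9, "i"), (13, "m")]
    print(is_present(entries, len(entries), 9))   # True
    print(is_present(entries, len(entries), 7))   # False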

Enter

Use binary search to find the correct position, then for an existing key, you over-write the old
value. For a new key, the array is slid one place right and the value is written into the hole.
This is linear.

Deletion

This is the same as in an unordered array, but we can find the location in logarithmic time.
The deletion process is still linear overall, though.

Ordered Linked List

is_present and lookup can't use binary search, as we can't find the middle of the list in
constant time. We have to use linear search, but we can terminate earlier if we pass the point
where it would be in the ordering, rather than going straight to the end.

Enter

This is like the non-ordered example, but the key must be inserted in the correct place in the
list. Therefore, it is linear.

Deletion

This is as the unordered list, but you can now stop when the first occurrence of a key is found
(or when you pass where it would be in the ordering). This is again linear, but the constants are lower.

Summary

s is the number of distinct keys in the dictionary. l is the length of the list.

Complexity              is_present / lookup   enter   delete
Unordered array         s                     s       s
Ordered array           log s                 s       s
Unordered linked list   l                     1       l
Ordered linked list     s                     s       s

For the ordered list, l = s. Additionally, leading constants change between the types, so where
complexities appear the same, the leading constants could be much different. The exact value
of the constants depends on the details of the implementation. Additionally, the array solution
can waste space, as l must be allocated, and l > s.

Linear Priority Queues


Bags of items

Bags are sometimes called multisets. They are similar to sets. Sets have members, but bags
can have the same member more than once (multiplicity) - however order is not important, so a
bag is not a sequence.

 a,b,b is a bag with 1 copy of a and 2 copies of b.
 a,b,a is a bag with 2 copies of a and 1 copy of b.
 The empty bag contains no copies of anything.

If we had the concept of the most important element in a bag, then we might have two
operations: enter(b,i) and remove_most_important(b). We will also
need is_empty(b), is_full(b), size(b), etc..., for completeness.

If the highest priority is youngest first, then this is a stack (LIFO)
structure. enter and remove are normally called push and pop respectively. If the oldest is the
highest priority, this is called a queue (FIFO) and enter and remove are normally
called enqueue and dequeue respectively.

Both the stacks and queues abstractions can be implemented efficiently using arrays or linked
lists. Regardless of whether it's implemented as a linked list or an array, the complexity will
be constant.

To implement a queue as a linked list, we would need two pointers, f to the front and e to the
back. The complexity for this is constant. To implement the queue with an array, you would
need an index e of the next insertion point and the number s of items in the queue.
To enqueue something, you write it at index e and then advance e (mod max); to dequeue
something, you return the item at index (e - s) mod max, then decrease s. We use
modulus arithmetic to allow the array to wrap round.
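A Python sketch of the circular-array version just described (the maximum size is chosen arbitrarily):

    class ArrayQueue:
        def __init__(self, max_size):
            self.a = [None] * max_size
            self.max = max_size
            self.e = 0   # next insertion point
            self.s = 0   # current number of items

        def enqueue(self, x):            # Θ(1)
            assert self.s < self.max, "queue full"
            self.a[self.e] = x
            self.e = (self.e + 1) % self.max
            self.s += 1

        def dequeue(self):               # Θ(1)
            assert self.s > 0, "queue empty"
            x = self.a[(self.e - self.s) % self.max]   # the oldest item
            self.s -= 1
            return x

    q = ArrayQueue(4)
    for v in "abc":
        q.enqueue(v)
    print(q.dequeue(), q.dequeue())   # a b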

Following pointers may be slower than an array, so sometimes the array implementation is
the better one.

Trees

Trees are an ADT that can be implemented using arrays or pointers, of which pointers are by
far the most common. Trees can be used to implement other ADTs, such as collections (sets).

Structure

Nodes carry data and are connected by branches. In some schemes only leaf nodes or only
interior nodes (including the root) carry data. Each node has a branching degree and a height
(aka depth) from the root.

Operations on a tree include:

 Get data at root n


 Get subtree t from root n
 Insert new branch b and leaf l at root n.
 Get parent node (apart from when at root) - this is not always implemented.

These operations can be combined to get any data, or insert a new branch and leaf.

Binary Trees

A binary tree is a tree with nodes of degree at most two, and no data in the leaves. The
subtrees are called the left and right subtrees (rather than 'subtree 0' and 'subtree 1'). An
ordered binary tree is one which is either an empty leaf, or stores a datum at the root and:

 all the data in the left subtree are smaller than the datum at the root,
 all the data in the right tree are larger than the datum at the root, and
 any subtree is itself an ordered binary tree.

Usual implementations use pointers to model branches, and records to model nodes. Null
pointers can be used to represent nodes with no data. As mentioned earlier, array
implementations can be used, but they are normally messy.

An ordered binary tree can be used to implement a dictionary. Such a tree would be
structured like this:

These trees combine advantages of an ordered array and a list. The complexity to traverse a
complete tree is log₂ n, however this is only if the tree is perfectly balanced. A completely
unbalanced tree will have the properties of an ordered list and will therefore be linear.

When we are considering trees, we should now consider three data invariants:

 order
 binary
 balance

For more info on red-black trees, see CLRS or Wikipedia.

There are two common balance conditions: Adelson-Velsky & Landis (AVL) trees, which
allow sibling subtrees to differ in height by at most 1, and red-black trees, which allow path
lengths from the root to differ a little (branches are either red or black - black branches count 1, red
branches count 0 and at most 50% can be red).

AVL Trees

In AVL trees, searching and inserts are the same as before, but in the case of inserts, if a tree
is unbalanced afterwards, then the tree must be rebalanced.

Rebalancing

The slope is height(left) - height(right). For the tree to be balanced, slope ∈ {-1, 0, 1} for all
nodes. If the tree is not balanced, it needs rebalancing using 'rotation'.

Wikipedia entry
on tree rotation

To rotate, you should set up a temporary pointer for each node and then re-assemble the tree
by changing connecting pointers.

Deletion from a binary tree

There are different methods of deleting from binary trees. One is using ghosts. A cell could
have a "dead" marker indicated (however, it must still have the contents for ordering
purposes), but this is only useful if deletion does not happen often, otherwise the tree will end
up very large with lots of dead cells. Insert could also be altered to use these dead cells (if
they fit in with the orderings)

Another method that is used is propagation. Here, deletions are pushed down to a leaf, and
pushed-up cells must be checked to ensure they are still ordered. The tree may need
rebalancing after this happens.

Lists will always be quicker than trees for deletion.

What about ternary trees? These would give us log₃ n, which is < log₂ n (where n > 1). In a
binary tree: Θ(log₂ n), but in a ternary tree, Θ(log₃ n). However, log₂ n = Θ(log₃ n) - the base
is a constant, so it can be disregarded for Θ purposes. Also, ternary trees require harder
tests for ordering, so any benefit from having fewer arcs is cancelled out. Tree rebalancing
is much trickier, as rotation cannot be used. In a case where pointers are very slow (such as
on a disk), then ternary trees may be better.

B+-trees

These are sometimes called B-trees, which is technically incorrect. B+ trees are slightly
different to binary trees in that the key and data are separated. B+-trees are commonly used on
disks.
The invariants for a B+ tree are:

 all leaves are at the same depth (highly balanced)
 non-root non-leaf nodes have ⌈m/2⌉..m descendants
 root is a leaf or has 2..m children
 ordered

If m = 3 (as above), this is sometimes called a 2-3 tree. If m = 4, this is called a 2-3-4 tree.

Insertion

A new key is added to the appropriate leaf. Nodes can also propagate up, if there isn't a spare child available.

Priority Queues
There is some ordering over the elements which tells us the importance of the item. One
method of ordering is over time - as we saw earlier with linear priority queues.

(Here, a low number means a high priority.)

The first invariant we should consider is to implement it as a binary tree.
However, to make the invariant stronger, we should use heap ordering. This is where the
highest priority is at the root and each sub-tree is itself heap ordered. e.g., for our example above, a
heap ordered binary tree would look like:

To enter a value, you need to pick any null entry and replace by the new value and then
bubble up, e.g., put 1 as a leaf of 11. 1 is more important than 11, so swap the two values. 1 is
then more important than 9, so swap again and continue until a point is found where 1 has
nothing more important than it, which in this case is the root. This maintains the ordering and
therefore restores the invariant.

To delete a value (the root), pick any leaf, move its value up to the root and then bubble the value down,
which is the opposite of bubbling up - however, as there are two values to swap with,
swapping always occurs with the most important of the children to maintain the ordering.

However, like trees we've looked at before, in the worst case, this can be linear in
complexity, so we should strengthen our queue with a new invariant - balance. This balance
invariant is stronger than the balance invariant we looked at in AVL and red-black trees,
though. Here, we will only allow holes in the last row - this is perfect (or near-perfect)
balance and gives us a logarithmic time complexity.

There is only one undetermined thing to consider - when there are multiple holes to insert
into, which one do we use? By using another data invariant, we can reduce this
nondeterminism.

Left justification is where all the values in the last row are on the left.

From all of these invariants, the algorithm is now completely determined, so we now need to
consider how to implement this tree.

If we used pointers, we'd need two down to the subtrees, and one up to the parent, in addition
to a cell for data. However, as we know pointers are slow, this could be a slow
implementation. Because we have very strong invariants, an array implementation is better,
however the array will have a maximum size, which is another invariant to consider.

Our implementation will therefore have an array h (1..m) and a number s which represents the
size.

If h(k) represents some node, then we need to be able to find parent and children.

The parent can be found at h(⌊k/2⌋), the left child at h(2k) and the right child at h(2k + 1).
h(1) is the root.

As s is the current number of elements, this makes h(s) the deletion point and h(s + 1) the insertion point.

This implementation is called a heap.
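A Python sketch of the array layout and the bubble-up step for insertion (index 0 is left unused so the k/2, 2k, 2k + 1 arithmetic matches the description above; a smaller value means a higher priority):

    class Heap:
        def __init__(self, max_size):
            self.h = [None] * (max_size + 1)   # h[1..max]; h[0] unused
            self.s = 0

        def insert(self, x):
            self.s += 1
            k = self.s
            self.h[k] = x
            while k > 1 and self.h[k] < self.h[k // 2]:   # bubble up past the parent
                self.h[k], self.h[k // 2] = self.h[k // 2], self.h[k]
                k //= 2

    h = Heap(7)
    for x in [9, 4, 11, 1]:
        h.insert(x)
    print(h.h[1])   # 1 - the highest-priority element sits at the root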

Merging Priority Queues

For efficient merging, we need to use a structure called a binomial queue. A binomial queue
is a forest of heap-ordered binomial trees, with at most 1 tree of each height.

Binomial Trees

A single node is a tree of height 0 (B₀), and a tree Bₖ₊₁ is formed where two trees Bₖ are attached
together (one root is attached to the other).

Binomial trees are not binary - they can have many arcs.

To merge two queues together, we add the two forests together (as in binary addition). However,
as we can have at most 1 tree of each height, some trees need to be "carried" (combined with the next tree
up). This process is linear in the number of trees (logarithmic in the number of nodes).

Insertion

If you make x into a single-node queue, and then merge it with an existing queue, you have inserted it into
the queue - this is logarithmic.

Removing

To remove, you need to find the tree with the smallest root and remove it from the queue.
You then remove the root from the removed tree, which gives you a new set of trees of heights
B₀ to Bₖ₋₁, which can be merged back in with the existing queue. This is logarithmic.

Dictionaries as Hash Tables

There are two main implementations of dictionaries. One is with a simple keyspace (i.e.,
the index is 0..max-1, where max is a reasonable number). This can be implemented as an array,
either as a pair of arrays or an array of pairs - where one value indicates whether or not
the cell is active and the other is the contents.

The second implementation is a dictionary with an arbitrary keyspace. Here, a hash function
takes a key and gives us back an index, e.g., if the keys are natural numbers, hash(k) = k mod max.

What if the key is a string? We treat A-Z as base-26 digits and then apply the hashing
function to this. Horner's rule is an efficient way to evaluate this, where #k is the length of the key:

((k(#k) × 26 + k(#k - 1)) × 26 + ...) × 26 + k(1)

Horner's rule is linear; however, if we use a multiplier of 32 instead of 26 we can bit-shift, which
is much faster. Another improvement we could make is to ensure our values don't
overflow by taking mod max after every multiplication. Making max a prime value also
increases efficiency.
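A Python sketch of such a hash function (the choice of 32 as the multiplier and 'A' as the zero digit are illustrative assumptions):

    def hash_string(key, max_size):
        # Horner's rule, taking mod after every step so the value never overflows;
        # multiplying by 32 is done with a shift.
        h = 0
        for ch in key:
            h = ((h << 5) + (ord(ch) - ord('A'))) % max_size
        return h

    print(hash_string("KEY", 101))   # an index in 0..100 (101 is prime)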

A consideration is what happens if we try to put 2 pieces of data in the same cell - in a length
7 array, indexes 3 and 10 both point to the same place, so we get a collision. There are two
hashing methods that work round this:

 Open hashing (sometimes called separate chaining)
 Closed hashing (sometimes called open addressing)

Open Hashing

We have an array of small dictionaries, normally implemented using a linked list, so when a
collision occurs, it is added to the dictionary.

To consider efficiency, we need to consider the loading factor (λ), which is the number of
entries ÷ the number of cells. For open hashing, λ ≤ 1 (assuming a good hash function),
making our average list length 1. Therefore, on average, all operations are Θ(1).

Closed Hashing

In closed hashing, a key points to a sequence of locations and there are several variants of
closed hashing.

Linear Probe

hash(k) + i

When a collision occurs, find the next empty cell and insert it in there.

A collection of full cells is called a cluster, and searching for a key involves searching the
whole cluster until you reach an empty cell. For this to be efficient λ ≤ 0.5.

If deletion occurs, the algorithm breaks, as an empty cell would break up the cluster and the
search would stop early. A ghost must be used in this case.

Quadratic Probing

hash(k) + i²

This is essentially the same as linear, but the offset is i². If the table size is prime, and λ < 0.5,
insertion into the array is guaranteed; however, without this condition, insertion cannot be
guaranteed.

Double Hashing

hash(k) + i.hash′(k)

The first hash computes the start and the second computes the offset from the start.

Of these hashing methods, open hashing is much more popular. With hashing, insert, delete
and find are Θ(1) or close. Performing operations on all the entries on a dictionary in a hash
table isn't so good, however. Creating a hash table also requires that you know the size.

If λ gets close to the limit for the algorithm, so that it would either be unusable or not very
efficient, you can rehash. Rehashing is creating a new table (of approximately twice the size -
the new table size should be prime) and copying all the data across. This is very slow. Open
hashing doesn't necessarily need rehashing, but speed does increase if you do it. You can also
rehash tables down in size as well as up.

There are other dictionary implementations, such as PATRICIA, but we will not be covering
them in this course.

Sorting
This brings us closer to our original definition of a problem. We have assumptions on inputs -
that we have an array (or linked list) a with a given total order on the elements - and
requirements on the outputs - namely that a is ordered (sometimes a different structure may
be used in the output, e.g., a linked list when the original was an array) and bag(a) = bag(a′),
i.e., the output a′ is a permutation of the input a.

We can use loop invariants to define our sorts. There are lots of different sorts, including:

 Bubble sort
 Shell's sort
 Quicksort
 Merge sort
 Insert sort
 Select sort
 Heap sort
 Bucket sort
 Index sort

Bubble sort should never be used.

Selection Sort

The first portion of the array is sorted and the second part isn't, but is known to be all larger
than the front portion.

To sort, the smallest value should be selected from the back portion, and the first element of
the back portion should be swapped with it.

To figure out the efficiency of this, you need to count the number of comparisons to be
made. This will be (n-1) + (n-2) + ... + 1, which is the sum Σ i for i = 1 to n-1, which is Θ(n²) always.
In the worst case, Θ(n) swaps need to be made.
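A Python sketch of selection sort, with the loop invariant noted in a comment:

    def selection_sort(a):
        n = len(a)
        for i in range(n - 1):
            # Invariant: a[0..i-1] is sorted and no larger than anything in a[i..n-1].
            smallest = i
            for j in range(i + 1, n):      # select the smallest of the back portion
                if a[j] < a[smallest]:
                    smallest = j
            a[i], a[smallest] = a[smallest], a[i]   # one swap per pass
        return a

    print(selection_sort([5, 2, 9, 1, 7]))   # [1, 2, 5, 7, 9]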

Insertion Sort

Like selection, but the back is not necessarily related to the front (it can be any size).

To sort, you take the first value of the back portion and insert it into the front portion in the
correct place.

The efficiency of the comparisons in the worst case is 1 + 2 + 3 + ... + n, which gives us the
same sum formula again, which is Θ(n²). However, in the best case, comparisons are only Θ(n).
The efficiency of the swaps at best is 0, but in the worst case this is Θ(n²); however, this
doesn't happen often (only if the list is reverse ordered).

We could have used binary search here, which would have theoretically improved the time to
find the insertion point, but due to a high leading constant this makes it poor in practice.

This sorting algorithm is good for small lists - it has a small leading constant so the Θ(n²) isn't
too bad. However, it is also good for nearly ordered lists, where the efficiency is likely to be
nearly linear.
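A Python sketch of insertion sort; note how a nearly ordered input makes the inner loop exit almost immediately:

    def insertion_sort(a):
        for i in range(1, len(a)):
            x = a[i]           # first element of the back portion
            j = i
            while j > 0 and a[j - 1] > x:   # slide larger elements right
                a[j] = a[j - 1]
                j -= 1
            a[j] = x           # drop x into the hole
        return a

    print(insertion_sort([3, 1, 4, 1, 5]))   # [1, 1, 3, 4, 5]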

There are two theorems regarding sorting:

 Any sort that only compares and swaps neighbours is Ω(n²) in the worst case
 The general sorting problem has a lower bound of Ω(n log n) in the worst case.

Heap Sort

The heap has already been covered when we did priority queues.

1. Every element is put in a heap priority queue (any implementation will do, but a heap is fast).
2. Remove every element and place in successive positions in a, our sorted array.

Each of the n elements is inserted in log n time and each of the n elements is removed in log n time,
so n log n + n log n = Θ(n log n).

However, heap sort is not very memory efficient due to duplicating arrays. Also, it has a very
high leading constant.

The Dutch National Flag Problem

This algorithm deals with sorting three sets of data. You have an array of data sorted into 4
partitions, 3 of known type and one of unknown. The partitions are ordered so the unknown
type is the third one.

To sort, the first element in the unknown type is compared. If it's part of the first partition, the
first element of the second partition is swapped with it, moving the second partition along one
and growing the first partition. If it's part of the second partition, it's left where it is and the
pointer moves on. If it's part of the last partition, it's swapped with the last member of the
unknown partition, hence shrinking the unknown partition and growing the final known
partition by one.
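A Python sketch of this partitioning step (here the three "colours" are values below, within and above a given range; the index names are illustrative):

    def dutch_flag(a, low, high):
        lo, mid, hi = 0, 0, len(a) - 1     # a[mid..hi] is the unknown partition
        while mid <= hi:
            if a[mid] < low:
                a[lo], a[mid] = a[mid], a[lo]   # grow the first partition
                lo += 1
                mid += 1
            elif a[mid] > high:
                a[mid], a[hi] = a[hi], a[mid]   # grow the final partition
                hi -= 1
            else:
                mid += 1                        # already in the middle partition
        return a

    print(dutch_flag([2, 0, 2, 1, 0, 1, 2], 1, 1))   # [0, 0, 1, 1, 2, 2, 2]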

Quicksort

Quicksort was invented by Sir Charles Antony Richard Hoare and it consists of series of
sorted areas and unsorted areas, where the unsorted areas are of a known size laying between
the two sorted areas.

The starting and finishing states are special instances of the above invariant. The starting state
is where the unsorted area is of size n and the sorted areas are of size 0, whereas the finished
state is where the unsorted area is of size 0 and the sorted area is of size n.

The first step sorts to where there are three areas, of which the middle one is called the pivot.
Everything before the pivot is known to be smaller than the pivot, and everything after the pivot
is known to be larger than the pivot - this is accomplished using the Dutch National Flag
algorithm. Each unsorted section is then sorted again using this mechanism, until the array is
totally sorted.

A binary tree could be used to implement Quicksort.

This is called a divide and conquer mechanism, as the two unsorted sections can be sorted
independently (even in parallel). Quicksort gives an average time of n log n, and it was one of
the first practical algorithms to achieve this. However, in the worst case, this could be n² if a
bad pivot (either the smallest or largest thing in the array) is chosen, although the worst case is pretty rare.

If we use a sensible pivot, we can improve the efficiency of Quicksort. A way to do this in
constant time is to use the 'median-of-three' method. This is picking the first, middle and last
values from the unsorted section and then using the median of these values as the pivot. We
don't waste any comparisons here either, as the other 2 compared values can then be placed
based on their relationship to the median.

Older textbooks may give the advice to implement Quicksort using your own recursive
methods, however in modern computing this is not a good idea, as optimising compilers are
often better than humans and this improves the leading constant.

For small arrays, quicksort is not so good, so if the array is short (e.g., less than 10), then
insertion sort should be used. Insertion sort could also be used as part of the Quicksort
mechanism for the final steps.

Overall, Quicksort is a very good algorithm: Θ(n log n)

Merge sort

This is also a divide and conquer mechanism, but works in the opposite way to Quicksort.
Here, an array consists of some sorted arrays with no relationship between them. The first
state consists of n sorted arrays, each of length 1 and the final state consists of 1 sorted array
of length n. Two sorted arrays are merged to create a larger sorted array and this is repeated
until a final sorted array is reached.

This is also Θ(n log n), but it is always this complexity. Merge sort is very good for dealing
with linked lists, but it does require a lot of space.

Shell's Sort

This was invented by Donald Shell, but is sometimes called just 'Shell sort'. It is a divide and
conquer algorithm based on insertion sort.

Shell's sort breaks up the array into lots of little lists (e.g., each one consisting of every 4th
element) and an insertion sort is done on those elements (in-place of the main list) to get a
nearly ordered list which can then be efficiently sorted.

Shell used increments of 2^k, where k gives us suitably small values for the initial sorts, and this
gives us Θ(n²) in the worst case.

However, Hibbard noticed that you didn't need to stick to this exact scheme, and could use
increments of 2^k - 1, which gives us Θ(n^(3/2)) in the worst case and Θ(n^(5/4))
on average.

Sedgewick recommended using overlapping increments for further improvements, and this gives us
Θ(n^(4/3)) in the worst case and Θ(n^(7/6)) on average, which is very close to n log n.

Based on our original invariants, we can come up with two really bad sorting algorithms. We
can either generate every possible permutation until we get the ordered one - this has
complexity n!, which is ω(2^n). Alternatively, you could generate every possible ordered
sequence until we get one which is a permutation of the input - this has a complexity of ∞.

However, based on these principles, we can generate a good sorting algorithm.

Bucket Sort

To do bucket sort, you need to produce a bag consisting of buckets (these buckets contain
either the number of elements it represents, as in the Dutch National Flag problem, or a
linked list of values, as in Quicksort). Then you produce an array that is an ordered version of
the bag.

This algorithm is Θ(n), which is faster than Θ(n log n), however this is not a general sorting
problem, as we must know beforehand what the possible values are. Bucket sort is therefore
poor for general lists, or lists where there's a large number of keys, as it requires a lot of
memory.

In conclusion, insertion sort is the best for small arrays, Shell's sort is best for medium size
arrays and Quicksort (with optimisations) is best for large arrays.

Merge sort and bucket sort can also be appropriate, e.g., if there are a small amount of known
values. Heap sort can also be used if the hardware is optimised for it.

Equivalence Classes
An equivalence class is like an ordering, but it works on equivalence relations.

find(x) tells us what class x is in, however instead of returning the name of the class, it
returns a representative member (a typical value of the class).

union(x, a) unites the equivalence class of x with the equivalence class of a.

Forests can be used to implement this data structure. Each equivalence class is represented as a
tree, and the representative element is the root of the tree.

To find, you start at the tree node and go upwards until the root is hit.

To merge, just add a pointer to point one root to the other root, making one root now a
node of the other tree. When deciding which direction the pointer goes, set the smaller tree to
point at the larger tree. Here, smaller can mean either the fewest nodes (in which case we call
the operation union by size) or the lowest depth (where the operation is called union by height).
In each case, information is required to be kept about each tree telling us either its size
or height.

To improve tree depth, you can use path compression. Once a find has occurred, the pointer of
that node can be updated to point directly at the root of the tree, and the same can be done for
every node along the path to the root.

This is a trade-off - more work now for less work later.

Most implementations combine union by height with path compression, however with path
compression it is difficult to keep an exact figure of height, as the height of a tree might
change, but it is difficult to figure out a new tree height, so an estimate is used called 'rank',
so the process is called 'union by rank with path compression'.
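A Python sketch of union by rank with path compression, using arrays of parents and ranks (an illustrative implementation, not the one prescribed in the course):

    class UnionFind:
        def __init__(self, n):
            self.parent = list(range(n))   # every element starts as its own root
            self.rank = [0] * n            # estimated height of each root's tree

        def find(self, x):
            if self.parent[x] != x:
                self.parent[x] = self.find(self.parent[x])   # path compression
            return self.parent[x]

        def union(self, x, y):
            rx, ry = self.find(x), self.find(y)
            if rx == ry:
                return
            if self.rank[rx] < self.rank[ry]:   # point the lower-rank root at the higher
                rx, ry = ry, rx
            self.parent[ry] = rx
            if self.rank[rx] == self.rank[ry]:
                self.rank[rx] += 1

    uf = UnionFind(5)
    uf.union(0, 1); uf.union(1, 2)
    print(uf.find(2) == uf.find(0))   # True - same equivalence class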

Amortised Analysis

Amortised analysis is finding the average running time per operation over a worst-case
sequence of operations. Because we're considering a sequence, rather than a single operation,
optimisations from earlier path compressions can be taken into account.

For this structure, a sequence of m operations costs Ο(m log* n). The definition of log* n is: log* n = 0 if log n < 1, else
log* n = 1 + log*(log n). This is a very, very slowly growing function, so the total cost is very
close to m (effectively linear).

We can implement this data structure using an array. If key(x) = value(x), then this is a root,
otherwise parent(x) = value(x). Roots also need to hold a second value, showing the rank of
the tree they represent.

Graphs
Graphs are representations of networks, relations and functions of pairs. They are made up of
nodes (or vertices) and they are connected by edges (or arcs).

Graphs have various dimensions to consider, and the most important of these is
directed/undirected.

Another dimension by which graphs differ are weighted/unweighted. The above graphs are
unweighted.

An unweighted graph can be represented as a weighted graph where the weighting is 1 if an
arc exists, otherwise the weighting is ∞.

In a graph, a path is a sequence of touching edges, and a cycle is a path whose start node and
end node are the same and whose length is ≥ 1 edge. From this we can define an
acyclic graph, which is a graph where there are no cycles.

Directed acyclic graphs (DAGs) are an important type of graph - forests are special types of
DAGs and DAGs are very nearly forests, but they don't have the concept of a graph root.

In undirected graphs, it is easier to define a graph component. We can say that A and B are in
the same component if a path exists between A and B. In directed graphs, it is harder to
define what a component is, as it is not necessarily possible to get to a node from another
node, even when it is possible to do so in reverse.

See ICM for more
about strong/weak
connections in graphs

We can consider a graph to be strongly connected if, ∀ nodes A,B, A ≠ B, ∃ path A→B,
B→A. For a graph to be weakly connected, direction is not considered.

The final thing to consider is whether or not the graph is sparse (has few edges) or dense
(almost every node is connected with an edge).

We have a spanning forest if every node in the graph is in a tree and connectivity
between the elements is the same as in the graph (all connections in the forest are the same as in the
original graph).

Now we can consider how we would implement graphs.

Adjacency Matrix

Each cell contains the distance between two points, e.g.,

    A  B  C  D
A   0  3  9  ∞
B   ∞  0  5  ∞
C   3  ∞  0  ∞
D   ∞  ∞  1  0

This tells us (for example) that the arc from A to C has weighting 9. This kind of structure has a
space complexity of Θ(v²), where v is the number of vertices, but it does have a lookup
complexity of Θ(1).

Adjacency List

In the real world, most graphs have a large v and a small e (number of edges), so we could
construct an array of linked lists. Information is stored in the same way as in an adjacency
matrix, except that where no arc exists, the entry simply does not appear in the linked list. This gives
us something that looks like this:

This has a space complexity of Θ(v + e), which is linear in the sum of v and e and an access
complexity of Θ(e) - although this has a very small leading constant.
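
As an illustration only, the example graph above could be stored as an adjacency list in Python using a dictionary of lists (the zero-weight self-distances are simply omitted):

graph = {
    'A': [('B', 3), ('C', 9)],
    'B': [('C', 5)],
    'C': [('A', 3)],
    'D': [('C', 1)],
}

def weight(graph, u, v):
    # Linear scan of u's list - Θ(e) in the worst case; None if no arc exists.
    for node, w in graph.get(u, []):
        if node == v:
            return w
    return None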

Spanning trees

More about spanning trees in CLRS.

Spanning trees can be built with either depth-first (which gives us long, thin trees which can
be implemented using a stack) or breadth-first (giving us short, fat trees and implemented
using a queue) searches. The components in a directed graph can be considered by their
connection.

Topological Sort

The input to this sort is a partial ordering (which is a DAG) and the output is a sequence
representing the partial order.

To do a topological sort, you need to compute a depth-first search tree and then the reverse of
the finished order is a suitable sequence.

Depth-First Search

To do a depth-first search, pick any node and follow the children until it finds a leaf (or a
node where all children from that node have been visited) and put the leaf in the sequence.
Repeat for each child to that node until all the children have been visited and continue with
any other unvisited node.

This algorithm is non-deterministic.
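
A minimal sketch (not from the lectures) of a depth-first search that records the finishing order of nodes; reversing that order gives a topological sort of a DAG. The graph is assumed to be a dictionary mapping each node to a list of its children:

def topological_sort(graph):
    visited, finish_order = set(), []

    def dfs(node):
        visited.add(node)
        for child in graph.get(node, []):
            if child not in visited:
                dfs(child)
        finish_order.append(node)   # a node is finished once all its children are

    for node in graph:
        if node not in visited:
            dfs(node)
    return list(reversed(finish_order))

The order in which unvisited nodes and children are picked is arbitrary, which is where the non-determinism comes from.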

There are two famous algorithms that work with graphs that we can consider. They both take
a weighted, undirected graph and they output a minimal spanning tree. Both of these
algorithms are greedy (one which makes the best possible step and assumes that is the correct
step to take) and are dual mechanisms of each other.

Prim's Algorithm

This has the invariant that the current tree is the minimal tree for its subgraph. To implement this algorithm, you pick a node and then choose the smallest (least weighting) arc to add to the tree. You continue picking the smallest arc that doesn't connect to an already connected node until all nodes are connected - the chosen arc can come from anywhere in the tree.

This can be implemented using a priority queue (a heap is best) and is Θ(e log v) in the worst
case.
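
A hedged sketch of Prim's algorithm using Python's heapq as the priority queue, assuming the graph is an undirected adjacency list of the form graph[u] = [(v, weight), ...]:

import heapq

def prim(graph, start):
    in_tree, tree_edges = {start}, []
    frontier = [(w, start, v) for v, w in graph[start]]
    heapq.heapify(frontier)
    while frontier and len(in_tree) < len(graph):
        w, u, v = heapq.heappop(frontier)   # smallest arc leaving the tree
        if v in in_tree:
            continue                        # would connect an already connected node
        in_tree.add(v)
        tree_edges.append((u, v, w))
        for x, wx in graph[v]:
            if x not in in_tree:
                heapq.heappush(frontier, (wx, v, x))
    return tree_edges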

Kruskal's Algorithm

The invariant in this algorithm is to maintain a forest of minimal trees that span the graph. The algorithm works similarly to Prim's, but you start with n trees of size 1 and then the trees are linked together using the shortest arc (where that arc doesn't link two nodes already in the same tree) until there is only 1 tree left. This can be implemented using a priority queue and the union/find equivalence trees and is again Θ(e log v) in the worst case.
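
A sketch of Kruskal's algorithm along the same lines (illustrative only): the arcs are processed shortest first, with a tiny union-find (as with the equivalence trees described earlier) rejecting any arc that would link two nodes already in the same tree. Sorting the edge list stands in for the priority queue here:

def kruskal(nodes, edges):
    # edges is a list of (weight, u, v) tuples
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    tree_edges = []
    for w, u, v in sorted(edges):           # shortest arcs first
        ru, rv = find(u), find(v)
        if ru != rv:                        # links two different trees
            parent[ru] = rv
            tree_edges.append((u, v, w))
    return tree_edges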

In practice, some evidence points to Kruskal's algorithm being better, but whether or not this
is the case depends on the exact graph structure.

Dijkstra's Shortest Path Algorithm

This takes as inputs a directed, positively weighted graph and a distinguished node v, and then outputs a tree rooted at v that is the minimal of all such trees and reaches as many nodes as possible.

The algorithm works by looking at all nodes going out of a tree and adding an arc to the tree
if it's the shortest route to that node.
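
A minimal sketch (assuming non-negative weights and the same adjacency-list form as above) of Dijkstra's algorithm with a heap; the parent links that are recorded form the shortest-path tree rooted at the source:

import heapq

def dijkstra(graph, source):
    dist, parent = {source: 0}, {}
    queue = [(0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if d > dist.get(u, float('inf')):
            continue                        # stale queue entry
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float('inf')):
                dist[v] = d + w             # shorter route to v found
                parent[v] = u
                heapq.heappush(queue, (dist[v], v))
    return dist, parent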

Biconnectivity

An articulation point is a node in a connected graph whose removal leaves an unconnected graph. A biconnected graph is a connected (undirected) graph with no articulation points.

To find out if a graph is biconnected:

1. Build a DFS tree and number the vertices by the order which they were visited in the DFS -
this gives us N(v).
2. Calculate L(v). This is a number representing how far we can go up the tree without following the tree (i.e., using arcs that are not in the tree but are in the graph - however this can only be done once) and then following the tree downwards. L(v) = min({N(v)} ∪ {N(w) | v → w is a back edge} ∪ {L(w) | v → w is a tree edge}) - this is calculated using a post-order traversal of the DFS tree
3. The root is an articulation point if it has > 1 child, otherwise non-root node v with child w is
an articulation point if L(w) ≥ N(v)

Travelling Salesman

Is there a cycle that:

1. reaches all nodes
2. only visits each node once
3. has a distance of less than k

The famous travelling salesman problem is a form of this, where k is the minimal possible tour length. There are no known good solutions to this; the only solutions currently known are to try every single cycle and see which one is the best, which has very low efficiency.

However, if we have a candidate cycle, how hard is it to check that the cycle does fulfil the criteria? It is only Θ(n) to check. This type of problem is NP-complete - non-deterministic polynomial: a candidate answer can be checked in polynomial time, but no polynomial-time algorithm is known for finding one. We can use heuristics (approximate answers) to tackle such hard problems.

String Searching
We have a source string and a pattern. The source may contain an instance of the pattern
called the target. The question is whether or not the pattern is in the source, and if so, where
in the source. We have two variables, s, the length of the source and p, the pattern length.

Naïve Approach

Break the source into arrays of length p (containing each possible combination) and do a
linear search. This gives us Θ(sp) - linear in s and p.
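
A sketch of the naïve approach in Python - compare the pattern against every alignment of the source, giving roughly s × p character comparisons:

def naive_search(source, pattern):
    s, p = len(source), len(pattern)
    matches = []
    for i in range(s - p + 1):
        if source[i:i + p] == pattern:    # up to p comparisons per alignment
            matches.append(i)
    return matches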

Knuth-Morris-Pratt

See book for full description

The Knuth-Morris-Pratt method uses finite state automata for string matching.

Boyer and Moore's algorithm is an improved algorithm based on Knuth-Morris-Pratt.

Baeza-Yates and Gonnet

"A new approach to string searching"


Comm ACM vol. 35 iss. 10
pages 74-82 October 1992

1. Build a table, the columns are the pattern, the rows are indexed by letters of the alphabet
2. The table is enumerated such that cell is 1 if the row = column, otherwise it is 0
3. For each letter in the source, put the reverse of the pattern row for that letter, offset by the letter's position in the source word
4. If we get 0s in a row, we have a match

This can be implemented using shift and or at hardware level, so is very fast.
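
As a hedged illustration of the bit-parallel idea (this uses the "shift-and" formulation with 1s marking matches, rather than the exact table layout described above): one bitmask per letter records where that letter occurs in the pattern, and the running state is updated with a shift and an AND for each source character:

def shift_and_search(source, pattern):
    p = len(pattern)
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)   # mark positions of ch in the pattern
    state, matches = 0, []
    for j, ch in enumerate(source):
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & (1 << (p - 1)):                # a full-length match ends here
            matches.append(j - p + 1)
    return matches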

When deciding the complexity for this, we have a new variable, a, the size of the alphabet (which may also be constant). This gives us Θ(s + ap), and if a and p are relatively small, we could consider this as just Θ(s).

The construction of the table allows a great deal of flexibility (case insensitivity, etc).

Algorithm Design Strategies


Divide and Conquer

Split the problem into two, solve each part and combine the answers.

Greedy Strategy

Making locally optimal choices produces a globally optimal solution.

The opposite of this is a backtracking solution, keep making choices until you run into a dead
end, then go back to the last choice and change it to produce an optimal solution.

Dynamic Programming

When a sub-result is needed many times, store it the first time it is calculated (memoisation).
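
A tiny illustrative sketch of memoisation in Python - the cache stores each sub-result the first time it is computed, so repeated sub-problems are solved only once:

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Without the cache this is exponential; with it, each value is computed once.
    return n if n < 2 else fib(n - 1) + fib(n - 2)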

Random Algorithms

If we use an algorithm that gives us a false positive (such as in the case of Fermat's Little Theorem for discovering primes), if we repeat the algorithm many times, the probability of a false positive greatly decreases (in the case of Fermat's Little Theorem, by trying 100 values, the chance of error reduces from 0.25 to 2^-200, which is less than the probability of hardware error).

Heuristic Search Methods

There are two main types of heuristic search methods, genetic algorithms and simulated
annealing. These types of algorithms search all possible answers in a semi-intelligent
way. f is the function giving fitness of a value and f(s) is a measure of how good s is. We find
our answer when s is fit enough.

CHAPTER 5:
Digital and Analogue Circuit Design
This is a straight continuation from IDD.

Binary Codes
Binary codes can be used to represent information. The most common formats are pure
binary (for numbers), which is weighted and ASCII (for characters) which is unweighted.

What is weighted?

It is positional notation, e.g., 3456 = 3 × 10^3 + 4 × 10^2 + 5 × 10^1 + 6 × 10^0. For binary, this could be: 0111 = 0 × 2^3 + 1 × 2^2 + 1 × 2^1 + 1 × 2^0 = 0 + 4 + 2 + 1 = 7.

To distinguish pure binary, you could say it was the 8421 code (from the column weightings).

The pure binary sequence has occurrences where multiple digits need to change (e.g., 0111 -> 1000). The multiple digits may change at different rates, so erroneous intermediate states may be generated - causing hazards.

Another code is Gray code, which is used to minimise hazards. Gray code is reflected and unweighted.

Gray Code

Everything but the left-hand column is reflected by the mid-point. The left-most column is
reflected instead. You repeat moving down each column.
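
As an aside (not from the lectures), one common way to generate the n-bit reflected Gray code is to XOR each pure binary count with itself shifted right by one:

def gray_code(bits):
    return [b ^ (b >> 1) for b in range(2 ** bits)]

# gray_code(3) gives 000, 001, 011, 010, 110, 111, 101, 100 -
# only one bit changes between consecutive words.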

Binary Coded Decimal (BCD) codes

BCD codes can represent the decimal numbers 0-9. Different codes could be:

 8421 BCD
 7421 BCD
 5421 BCD
 5211 BCD
 Excess-3 BCD
 Gray BCD

In most cases, the names come from the weightings. Excess-3 is where you add 3 to every
pure-binary number (8421) to get the Excess-3 BCD.

Error Detection Codes

See Introduction to Computer Systems for more information on parity, Hamming detection and Hamming distance.

Minimum distance is the number of bits that need to be changed to make another valid word.

An SN74LS180 is a device that is a parity generator/checker.

Sequential Circuits
In addition to combinatorial circuits, which we have seen so far, we also have sequential
circuits.

Combinatorial circuits are ones whose outputs depend on the current input state. When inputs
change, the outputs do not depend on the previous inputs.

Sequential circuits are similar, but they do also rely on previous input states. It can be
inferred that they have memory.

The outputs can be taken directly from the memory elements also.

Memory Elements

These are called bi-stables, or flip-flops. They are capable of storing 1 bit of data as long as they are powered. Generally, there are two outputs, Q and Q̄, which give opposite outputs. There is also typically a CLK (clock) input and reset lines (which are independent of the clock and are active-low, meaning they float high and can be left hanging).

The type of flip-flop determines the state to which they switch and the inputs give them their
name: SR, D, T and JK.

Clock

A step-down transformer can be used with a Zener diode and a Schmitt trigger inverter to
generate a clock based on mains voltage (50 Hz). Mains voltage is regulated so that a clock
based on it will never be more than a minute out.

Crystals can also be used to create a stable frequency, but long-term drift does occur.

Types of Sequential Circuit

Synchronous

The same clock signal is applied to each flip-flop, and changes in state occur when the clock
changes state from one level to another.

Asynchronous

The behaviour of an asynchronous circuit depends on the order in which the inputs change.
Sometimes, there is an input labelled clock, that provides some level of synchronisation, but
it is normally only applied to one flip-flop. In addition to this style of asynchronous circuit,
you also get gate-level asynchronous circuits, which are combinatorial circuits with feedback.

Gate-level Asynchronous

A gate level asynchronous circuit looks like this:

And this can also be redrawn to make an SR flip-flop.

Flip-flop characteristic tables are like truth-tables, but for sequential circuits. They show the
states of inputs and outputs after the clock arrives.

D-type

D Q(t+1)

0 0

1 1

The value of D is transferred to Q after the clock arrives.

T-type

T Q(t+1)

0 Q(t)

1 Q̄(t)

The value of T determines whether or not Q should change state (toggle).

JK-type

J K Q(t+1) Meaning

0 0 Q(t) no change

0 1 0 reset

1 0 1 set

1 1 Q̄(t) toggle

The JK is a general purpose flip-flop.

SR-type

S R Q(t+1) Meaning

0 0 Q(t) no change

0 1 0 reset

1 0 1 set

1 1 ??? don't use

From the JK-type, you can make the other types of flip-flop. For example, by connecting J and K together, you have a T-type, or by connecting a line to J and the inverted line to K, you have a D-type, and the SR is similar to the JK anyway. You can also make a JK-type from the other types, but it's more complicated.

Sequential Circuit Analysis

X Y Q(t) a b T Q(t+1)

0 0 0 0 0 0 0

0 0 1 0 0 0 1

0 1 0 0 1 1 1

0 1 1 0 0 0 1

1 0 0 0 0 0 0

1 0 1 1 0 1 0

1 1 0 0 1 1 1

1 1 1 1 0 1 0

Note: Y is equivalent to J and X to K.

Frequency Division by 2

Shift Registers and Counters
Shifting and Dividing

When D-type flip-flops are connected in series (the output of each feeding the input of the next, with a common clock), inputs are passed from one flip-flop to the other, providing a shifting operation. A right shift loses the least significant bit, causing the divide by 2 to be rounded down. A 0 must be shifted onto the word for it to work. If you left-shift, you can multiply by 2, but this time the most significant bit is lost.

The output of a T-type flip-flop has half the frequency (twice the period) of its clock input, therefore it can be used for dividing by 2. Connecting them in series will lead to dividing by 2, 4, 8, etc...

Universal shift registers (SN74AS194) can be used to shift left or right, so you can multiply or divide. They can also be parallel loaded.

The modes can be controlled as follows:

S1 S0 Action

0 0 Inhibit clock

0 1 Shift right

1 0 Shift left

1 1 Parallel load

Parallel Loading

By shifting numbers into the shift register, you can then read them off in parallel. For example, if you
shift Q0, Q1, Q2 and Q3 into a 4-bit shift register, you can then read off the values in parallel to get a
parallel word of Q0, Q1, Q2 and Q3.

Asynchronous Counter

Because you can use T-types to divide, if you make them asynchronous and put them in series, you can create a counter. By tying each T input to 1, and then using the Q output of each flip-flop to drive the clock of the next one, you can inspect Q for each flip-flop to get a binary word representing the current count.

Then, by combining Q for the first flip-flop with Q for each subsequent flip-flop into a
NAND, and then into the CLR circuit, you can reset the counter to prevent overflow.

Sequential Circuit Design


Mealy Machine

This is sometimes referred to as a Class A machine.

The outputs are a function of the current state (according to the memory elements) and the
inputs, according to the output decoder. The next state decoder determines the next-state to be
held in the memory elements. It is assumed there is also a clock for the memory elements.

Moore Machine

This is sometimes referred to as a Class B machine.

A Moore machine is like a Class A machine, but the output is only a function of the current state, and not the outside world.

Class C Machine

A Class C machine is like a Class B machine, but with no output decoder.

Class D Machine

A Class D machine has no outside world inputs, i.e., it is a pure signal generator.

State Diagram

A state diagram is a pictorial representation of the state table.

Present State Inputs Next State Outputs

A B x A B y

0 0 0 0 0 0

0 0 1 0 1 0

0 1 0 0 0 1

0 1 1 1 1 0

1 0 0 0 0 1

1 0 1 1 0 0

1 1 0 0 0 1

1 1 1 1 0 0

For the above state table, the state diagram would look like:

The circles show the states and the arrows show what transition occurs with that x/y value pair. Note, the values are the values after the transition, when the circuit has become stable.

Excitation Tables

When designing a sequential circuit, it is necessary to know what inputs are needed to trigger the desired transition (as given in the design). An excitation table gives this information, and is in effect a reverse characteristic table.

D-type

Q(t) Q(t + 1) D(t)

0 0 0

0 1 1

1 0 0

1 1 1

T-type

Q(t) Q(t + 1) T(t)

0 0 0

0 1 1

1 0 1

1 1 0

SR-type

Q(t) Q(t + 1) S(t) R(t)

0 0 0 X

0 1 1 0

1 0 0 1

1 1 X 0

JK-type

Q(t) Q(t + 1) J(t) K(t)

0 0 0 X

0 1 1 X

1 0 X 1

1 1 X 0

From the excitation tables from the individual flip-flops, it is possible to create an excitation
table for a sequential circuit. The table is broken up into 3 sections: Current state; Next state;
and Flip-Flop Inputs. The Current State column lists the state before the clock edge, the Next
State lists the state required after the clock edge, and the Flip-Flop Inputs list the inputs each
input requires to get that desired Next State.

Flip-Flop Selection

Although the choice of flip-flops is often out of your control, if you do have a choice over which flip-flops you could use, some are better suited to tasks than others.

 D-type flip-flops are best for shifting operations (e.g., shift registers).
 T-type flip-flops are best for complementing operations (e.g., binary counters)
 JK-type flip-flops are best for general applications (they are the most versatile, and it is the
easiest to make the other types out of)
 SR-type flip-flops are safe to use also, as long as the circuit is designed properly, so the 1,1
condition never occurs across the inputs.

The basic steps to sequential circuit design are as follows:

1. Create the state diagram from the written specification


2. Minimise the number of states (state reduction)
3. Assign binary values to the states (state assignment)
4. Select the flip-flops to be used
5. Create the excitation table for the circuit
6. Minimise each next state decoder output
7. Minimise each output from the output decoder
8. Analyse the design and check it meets the specification

There are additional stages to consider in sequential circuit design which can be found in
the online DAD notes.

Asynchronous Analysis of Flip-Flops

Modern flip-flops change on a clock edge. Early flip-flops changed on the pulse itself, which
led to difficulties when combining flip-flops into counters and registers.

Switch Debounce

When a mechanical switch is thrown, the contact vibrates a few times before settling in a
state. The amount of vibration is unpredictable, but leads to a sequence of pulses from 0 to 1
rather than a simple transition. If this switch is clocking a signal generator, it will not change
state on one application of the state, but clock through a variety of states, the number being
unpredictable.

A debounce circuit - based on the SR flip-flop - is required to correct this problem.

The mechanical switch used in this circuit must be a break-before-make switch - it must break one contact before it makes the next contact.

SR flip-flop

In the characteristic table for the SR flip-flop, we can see that we should never set SR to 11. Similarly, we should never set an S̄R̄ (active-low) flip-flop to 00. Doing this causes an unstable state that causes Q and Q̄ to oscillate between 00 and 11 forever due to the feedback. In reality, however, this does not really happen, as one gate will be faster than the other, and it'll settle unpredictably on 01 or 10.

Although the inputs are inverted to form an SR flip-flop from an S̄R̄ flip-flop, the same problem remains.

Asynchronous Analysis

If we consider the SR flip-flop where a and b are the inputs and x and y are the feedbacks
from the outputs, we can draw two transition tables:

A = (b·x)′ (i.e., b NAND x)

ab\xy 00 01 11 10

00 1 1 1 1

01 1 1 0 0

11 1 1 0 0

10 1 1 1 1

B = (a·y)′ (i.e., a NAND y)

ab\xy 00 01 11 10

00 1 1 1 1

01 1 1 1 1

11 1 0 0 1

10 1 0 0 1

We can then combine the two:

ab\xy 00 01 11 10

00 11 11 11 11

01 11 11 01 01

11 11 10 00 01

10 11 10 10 11

The ringed terms indicate stable states.

Races in Asynchronous Circuits

Mano p347-348

A race condition is said to exist in an asynchronous sequential circuit when two or more
binary state variables change state in a response to a change in input variable. When unequal
delays are encountered, a race condition may cause the state variables to change in an
unpredictable manner.

If the final state of the circuit does not depend on the order in which the state variables
change, the race is called a non-critical race. If it is possible to end up in two or more stable
states depending on the order in which the state variables change, this is called a critical race.

Analysis of a transition table can show race conditions, but it does assume 0 gate delay, which is not true. The transition table cannot predict the behaviour of a circuit where a precisely timed enable line ensures correct logical behaviour.

Enabling Flip-Flops

If you use a NAND inverter into an S̄R̄ flip-flop, and then use the spare input, you can make an enabling line. When the enable line is low, S and R can take any input and Q and Q̄ will not change. When the enable line is high, it will behave like a normal SR flip-flop. This enabling mechanism is sometimes called a level-sensitive clock.

If you invert S in the above circuit and connect it to R, you will get an enabled D-type flip-
flop, which is a well-behaved circuit called a latch - which is still widely used.

Transition table analysis can be found for common flip-flop types on the lecture slides.

Hazards

A logical analysis of an asynchronous circuit can be achieved by breaking the feedback links
and forming a Karnaugh map. As a Karnaugh map, static hazards can be determined.

Essential hazards occur when an input changes and this is not detected by all the excitation
circuits before the state variables are sent back to the excitation circuits.

Modern Flip-Flop Clocking Mechanisms

Early flip-flops had enabling/disabling clocks - these were called level-sensitive. Although early SR and D-types were well behaved, the T and JK types required the enabling clock pulse widths to be precisely set for each flip-flop.

The basic enabling clock pulse made D-type flip-flops connected in a shift register difficult to control. The pulse duration had to be shorter than the input-output delay of the flip-flop.

Modern flip-flops are synchronised on the edges (transitions) of the clock, not the pulses
(levels). There are three basic mechanisms for implementing this - edge-triggered, master-
slave and master-slave with data lockout - and they are either positive edge clocking (on the
rising edge) or negative edge clocking (on the falling edge). Once the clock has passed the
threshold, the inputs are locked out and the flip-flop is unresponsive to changes on the input. There are
finite times when the inputs must not change. These are:

 Setup time (tsu) is the time for input data to settle in before the triggering edge of the clock
occurs. Typical value is around 20 ns, but in reality this is shorter.
 Hold time (th) is the time required for the input data to remain stable after the triggering edge
of the clock occurs. Typical value is around 5 ns, but in reality this is shorter.

In a manufacturers data book, minimum values are specified, which is the shortest intervals
for which correct operation is guaranteed.

The propagation delay of a flip-flop is defined as the time between the clock reaching the clocking threshold (the transition point) and the outputs responding to the flip-flop inputs as they were immediately before the clock achieved the threshold.

Master-Slave Flip-Flops

A master-slave flip-flop consists of a master flip-flop of the basic types (D, T, SR or JK) and
a slave flip-flop (an SR type) with an inverted clock.

The setup time is determined by the pulse width and the hold time is 0. On the rising edge
two things happen - the master is isolated from the slave and the inputs are read. On the
downward edge, the flip-flop inputs stop being read and the master is connected to the slave.

The clocking edge can be positive or negative according to the device.

The states of the input should not change while the clock is high, otherwise the master is set
or reset and can not be changed again and the output of the slave will change accordingly.
This is a problem for synchronous sequential circuits with asynchronous inputs, where the input changes while the clock is high. A data-lockout variation of the master-slave flip-flop is used that allows the device to load the next-state information into the flip-flop on the preceding edge of the waveform.

In a data-lockout flip-flop, there is a setup and hold time during which the inputs must not change. After the hold time has expired, the inputs may change without creating erroneous results. After the clocking edge (and propagation delay), Q will change accordingly. The hold time after the clocking edge is 0.

Synchronising Asynchronous Circuits

Synchronous circuits which are controlled by asynchronous inputs may cause flip-flop inputs
to change during setup and hold periods - such changes may cause an inappropriate jump in
sequence. It is important to synchronise the asynchronous inputs so they do not occur during
setup and hold times - this is acheived by latching (with a latch) or registering (with a
register) the inputs.

When the input changes during the flip-flop's setup time, metastability can become a problem.
The output is unstable for some time before it settles to a stable state and the output level can
be halfway between 0 and 1 and be recognised inconsistently as either 0 or 1. The length of
time the output is metastable is unpredictable.

Synchronous Circuit Timing Constraints

A flip-flop is limited in speed by the switching times of the transistors that make up the flip-flop. The speed of the clock is determined by such factors as: the propagation delay of the flip-flop, the propagation delay of the next-state decoder and the setup time of the flip-flop.

Maximum Flip-Flop Clocking Frequency

This is the highest rate a bistable circuit can be driven whilst still maintaining stable logic
transitions and states. Minimum values are 20-25 MHz and typical values are more like 25-33
MHz.

Minimum Flip-Flop Pulse Duration

The time interval between specified reference points (about 1.3 V on each edge) on the
leading and trailing edge of the pulse waveform. Minimum values are 20-25 ns.

Races in Synchronous Circuits

A synchronous circuit behaves correctly if the flip-flop inputs do not change during their hold time. Where flip-flops are coupled together directly with a common clock, the output changes after the propagation delay from the clock edge has elapsed. That delay must be longer than the hold time of the next device so it does not see a change during its hold time. However, if flip-flops from different families are connected, this could cause a problem. This is why 0 hold time is important.

Clock Skew

In a synchronous sequential circuit, all flip-flops are supposed to be clocked at the exact same
time. If this does not happen, it causes clock skew. Clock skew can be caused by:

 unequal gate delays in clock buffers
 unequal wire length
 unequal clock threshold voltage

A small amount of clock skew is tolerable, however. To determine the clock skew tolerance, the minimum flip-flop propagation delay needs to be known, but this is not always given in manufacturers' data sheets. The guaranteed minimum setup time (the actual will probably be less) also needs to be known. For master-slave devices with data lockout, the effective allowable clock skew is the minimum propagation delay minus the setup time plus the clock pulse width. Thus, the designer can set the maximum allowable clock skew needed by changing the pulse width.

Computer Memory
There are different types of computer memory:

 Magnetic Disk - non-volatile data not immediately needed, often called backing store or
secondary memory.
 Integrated Circuit Random Access Memory - volatile data immediately needed, this is main or primary memory. Main memory is often supplemented by fast register-based memory that stores some of the items that have been recently stored in or accessed from main memory. This is called cache memory and exists to speed up memory access.
 ROM - non-volatile, usually used for containing bootup instructions. ROMs also store micro-
instructions that define the instruction set of a central processor. Some CPUs allow micro-
instructions to be changed, thus redefining the instruction set. In some cases, the instruction
set could be changed to make particular programs and languages run more efficiently, e.g.,
PERQ.

RAM comes in two main forms: SRAM (Static), based on flip-flops and DRAM (dynamic),
based on storage of charge. DRAM allows for approximately 4 times greater chip density, but
in reading back data in DRAM, the charge is destroyed and must be written back, requiring
greater circuit complexity. DRAM is slower than SRAM, but uses less power.

See ICS/CAR for information about organisation of memory

To connect to RAM, you need to select an address (mar/memory address register), a read/write control and data in/out lines (mbr/memory buffer register).

SRAM

An SRAM cell looks like:

A bank of cells forms RAM. In an 8 bit word, 8 cells are in parallel, and a decoder selects
which line to read.

Memory Address Decoding

Memory chips include a decoder that converts the address into the corresponding word in
memory to be accessed. An AND gate is used to perform this decoding and there is one AND
gate driving the enabling control line. A million word memory requires at least a million
AND gates, with each AND gate having 20 inputs (as in a parallel decoder).

So as the address requirement becomes large, the number of inputs to each AND gate
becomes less tractable to provide, so alternative structures such as the balanced decoder or
tree decoder are adopted.

A parallel decoder is just a normal decoder as looked at before, however a tree decoder works
in multiple stages, with one bit being added at each stage, meaning only 2-input AND gates
are needed.

A balanced decoder is similar but backwards. Two inputs are mixed together, and then the
outputs from those are mixed together until the final mixing stage at the end.

Decoder Type   No. of Address Lines   No. of Output Lines   No. of AND gates   No. of AND gate inputs   Levels of delay

Parallel   4   16   16   64   1

Tree   4   16   28   56   3

Balanced   4   16   24   48   2

The parallel decoder is probably fastest, but the balanced decoder is probably most practical.

The output/input lines are often passed through bus drivers/receivers, so that the same pin on
the package can be used for both input and output and can be connected directly to a bus. The
basic memory cell is a latch that is enabled by the select line going high. Prior to the falling
edge there is a setup time criteria to be met and after the falling edge there is a hold time
criteria to be met.

Since the input data and write-enable signal is applied to each word, it is important for the
address to be stable before the memory enable occurs, otherwise the data could be placed in
the wrong word.

Memory access time is the time it takes to obtain stable output data after the word has been
selected - this is what is meant by 20 ns RAM. CMOS RAM has access times in the order of
15 ns, whilst TTL/CMOS has access time of around 3 ns.

Static RAM is often used as cache memory in DRAM-based high performance computers.
This enhances their performance by storing previously accessed or written data, or storing the
data in contiguous blocks of memory.

See ICS for more information on error correction

Both SRAM and DRAM can suffer from occasional errors in the stored data due to many factors, such as cosmic rays. Hamming codes can be used to aid detection of errors in stored words.

DRAM

A 64 k 1-bit DRAM is physically constructed as a 256 × 256 array of bits. 16 bits of address
are needed to address the 1-bit word. The address is presented as 2 8-bit words in sequence
requiring a row address strobe (RAS) and a column address strobe (CAS). To refresh, the
RAS line is made high, the address is stored and the whole row is stored in a 256-bit latch.
Then RAS goes low and the row is written back from the latch.

In the read cycle, a row is specified and stored in the 256-bit latch. A column is then read and
stored in the column address register. A multiplexer is then used to select 1 bit from the row
latch which is made available at the output. The row address strobe goes low and the row is
written back.

In the write cycle, the row is addressed and stored in the 256-bit latch. WE (Write Enable) is
asserted and then CAS is asserted. The bit on the input line is read into the 256 bit latch
(using a demultiplexer) using the column address register and the latch is written back into
the array when the row address strobe falls to zero.

It is possible to read/write part or all of a row before the writing back process occurs.

Binary Arithmetic
The decimal number system is a positional number system, with numbers in different
columns weighted at base 10. The binary number system is identical, but with base 2.

In digital systems, registers are finite in size and limit the number of digits. If the
computation generates a number that is too big for the register, an overflow will occur. In a
computer system, this is often caught as an interrupt. If you wanted to add two n-bit positive
numbers, you would need an (n + 1)-bit register to hold the result.

Binary subtraction happens in much the same way as decimal, so a subtraction circuit must
examine signs and magnitude and decide what operation to perform and then figure out the
sign of the result. This is complex, so two's complement is used.

The benefit of this is that two positive or negative integers can be added together to produce the correct result. To convert a binary number into two's complement involves forming the one's complement (flipping all the bits) and adding 1. Therefore, a two's complement number with 0 for the msb indicates a positive number, whereas one with the msb set to 1 indicates a negative number, which must be converted back to a normalised number for humans to digest.
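
As a small illustrative sketch, forming the two's complement of a value in an n-bit word (flip all the bits and add 1, keeping only n bits) could look like this in Python:

def twos_complement(value, bits):
    return ((~value) + 1) & ((1 << bits) - 1)

# twos_complement(5, 4) gives 0b1011, the 4-bit representation of -5.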

If subtraction uses two's complement, an addition circuit can be used for both addition and subtraction. If two n-bit positive numbers are to be added, then the result requires (n + 1) bits to store, so given that the msb signifies the sign of the number, two n-bit numbers engaged in adding or subtracting require (n + 2) bits for their storage and result.

Binary multiplication is similar to decimal multiplication:

 Multiply (AND) single digits
 Shift (by the position of the multiplier)
 Add to the accumulator.

If we multiply a negative (two's complement) number by a normal one, we must do two's complement on the answer to get the correct answer. If two n-bit positive numbers are being multiplied, then we might need up to 2n bits to store the result. In the case of a negative number, this becomes (2n + 1).

Binary division is a trial and error process based on long division in decimals. At each step a
trial division is made by subtracting the divisor. If the result is positive, the divisor goes into
the dividend, so the appropriate quotient digit becomes 1. If the result is negative, the
quotient digit becomes 0 and the divisor is added back.

Handling negative numbers is often handled by converting them to unsigned integers and
then applying rules to handle the signs (XOR).

Division using two's complement is possible, but is more complex than this method.

The form of numbers looked at so far are called fixed point numbers. Our digits are weighted by a fixed position. As such, we can continue down the weightings (2^-1, 2^-2, etc). This also works for two's complement. The accuracy of such a number is limited by the number of bits you have, so some rounding may be required.

IEEE floats were covered in ICS.

We can also use floating point numbers, which are defined by an IEEE standard. There is a 1-bit sign (S), an 8-bit exponent (E) and a 23-bit fractional mantissa (F). A floating point number is computed from this representation by: X = (-1)^S × 2^(E - 127) × (1 + F). There are also special cases to allow other representations.

Two's complement also creates a problem with zero. We are only allowed one representation for 0, but there is an extra negative number (-2^(n - 1)) that has no positive counterpart (although it is 1000, it does not behave like 0). Care must be taken to ensure this number is not used.

Two's complement numbers can be used inside the PC without constantly converting to another form (although they must be converted for use outside the computer), however simple left and right shifting for multiplication and division does not work directly with two's complement.

Adding and Subtracting in Hardware

Normally if A-B, the magnitude of B is converted to its two's complement and added to A. If
the msb of the result is 1 then the result must be two's complemented to obtain the magnitude
of the negative number.

We can create a half-adder by truth table analysis:

A B Sum Carry

0 0 0 0

0 1 1 0

1 0 1 0

1 1 0 1

So, Sum = A XOR B and Carry = AB.

A full adder needs to consider the previous carry bit when adding, so this gives us a new truth
table:

A B Ci-1 Sum Carry

0 0 0 0 0

0 0 1 1 0

0 1 0 1 0

0 1 1 0 1

1 0 0 1 0

1 0 1 0 1

1 1 0 0 1

1 1 1 1 1

We can conventionally minimise this, but another solution is to write it out in the canonical
form and use boolean algebra. Using this, Sum = A XOR B XOR Ci-1 and carry = AB +
(A XOR B)Ci-1. Therefore, we can implement a full adder using two half-adders (XOR's can
only have 2 inputs, so we need to chain them).

Although this is the standard textbook design of a full adder, due to constraints with the speed of the carry, they aren't really implemented like this. A full adder is deduced using map factoring (Karnaugh maps and some intuition), which is likely to be less hazardous due to lots of looping.

A 4-bit binary adder uses parallel binary adding, with the adders connected in series. However, the correct addition of a bit depends on the states of the carry bits before it - leading to a "ripple effect" of carry bits being computed and the correct answer propagating through. This ripple effect determines the operational speed.
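
A minimal software sketch (illustrative only) of the Sum and Carry expressions above, chained into a ripple-carry adder where each stage must wait for the previous carry:

def full_adder(a, b, c_in):
    s = a ^ b ^ c_in
    c_out = (a & b) | ((a ^ b) & c_in)
    return s, c_out

def ripple_add(a_bits, b_bits):
    # a_bits and b_bits are lists of 0/1, least significant bit first
    carry, result = 0, []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        result.append(s)
    return result, carry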

We should use look-ahead carry to increase the speed of our adder.

The look-ahead carry provides parallel generation of each carry bit at the expense of
additional gates. The sum function for the full-adder remains the same, but the carry function
is reconsidered.

Two carry bits are needed, a generate and propagate:

 G = AB
 P = A XOR B

So the carry function becomes: C = G + PCi-1 and the sum function becomes: S = P XOR Ci-1.
Both of these terms can be generated using half-adders.

Subtraction uses two's complement. By using XOR gates and a mode line connected to the
gates and Cin you can invert the inputs and then add 1, giving you the numbers in 2's
complement. By setting the mode line to 0 normal addition can still occur on the same gate.

Truth table analysis would give us an expression for the subtractor which can then be used to
construct a circuit.

Comparators and ALUs

A comparator compares two binary words for equality. Most also provide arithmetic
relationships between them - i.e., outputs indicating greater than or less than. Generally the
arithmetic output assumes the words only represent a magnitude.

ALUs are covered
in CAR and ICS

Arithmetic Logic Units provide arithmetic and logic operations on a pair of words.

XORs into a NOR gate or XNORs into an AND gate provide an equality test.

If you realise an XNOR in 5 gates, you can also extract greater than and less than from the
internal nodes, as well as equality from its output.

Comparators (SN74LS85) can be chained, so they normally have 3 additional inputs: AGTBIN, ALTBIN, AEQB, which are taken into account when a comparison is done.

ALUs come in different sorts, and provide different operations on the inputs depending on
what is needed of them. For example, the SN74LS181 provides a 4 bit function selector and a
1-bit mode selector that allows it to choose between logic or arithmetic mode.

ALUs can be cascaded, so they often have additional inputs such as CIN (carry
in), P and G (carry propagate and generate) and COUT (carry out).

Multiplying and Dividing Integers

One way of doing integer multiplication is repeated addition, e.g., 3 × 4 = 4 + 4 + 4. You should select the smaller number to minimise the number of adds, and this is implemented using a sequential circuit. Another way of implementing this is using shifting and adding: m-bit numbers require m shift-and-add operations. This is also a sequential circuit. The final implementation method is by designing a combinatorial circuit that accommodates all possible combinations of inputs.

If the maximum bit size for unsigned integers is n bits, then 2n bits are required to store the results. If negative numbers are to be accommodated, then 2n+1 bits are needed.

To do repeated addition, feedback from an adder is used for one word and then the other
word is the other input from an adder. A down-counter connected to inhibit the clock when it
reaches 0 is used to control the number of additions.

Shift-and-add is similar, but a shift register is used to control the adding process, and the
second word is also attached to a shift register. If negative numbers are to be handled, then
the registers and adder must be n+1 bits wide.
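
A rough software sketch of the shift-and-add idea for unsigned integers (the hardware uses registers and an adder, but the control flow is the same):

def shift_and_add_multiply(a, b):
    product, shift = 0, 0
    while b:
        if b & 1:                  # current multiplier bit is 1
            product += a << shift  # add the multiplicand, shifted into position
        b >>= 1
        shift += 1
    return product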

Standard combinatorial logic can be used to create a combinatorial multiplier. Combinatorial multipliers can also be chained together. For example, you can create a 4 × 4 multiplier with 4 2 × 2 multipliers (where ALSB and AMSB are the lower and upper halves of A) like so:

A × B = (ALSB × BLSB) + 4((ALSB × BMSB) + (AMSB × BLSB)) + 16(AMSB × BMSB)

Array based multipliers, which is based on the AND gate can also be used. Here, half-adders
and full-adders are combined to cascade the output for multiple-byte addition. Array-based
multipliers can be combined in a building block for VLSI replication.

For division, negative numbers are normally converted to unsigned, and then the final sign is
figured out by rule. One method of division is to keep subtracting the divisor from the
dividend until the result of the subtraction is negative - the number of repetitions less 1 is the
quotient. To the negative result, you then add back the divisor to obtain the remainder.

Given that a fast multiplier may already exist, it is possible to consider an approach based on it: dividend = (divisor × quotient) + remainder. The quotient is set to 0 and then incremented; multiply the quotient and divisor, and if the result is greater than the dividend, stop and decrement the quotient for the correct answer.

You can also use restoring division, based on shift and subtract, except when the subtraction
results in a negative number, the divisor must be added back.

Non-restoring division is similar, except that when the subtraction results in a negative
number, the divisor is shifted and added. This results in n additions, compared with an
average of 1.5n additions for the restoring divisor. This is a fast, sequential process.

Combinatorial dividers exist which offer parallel division, but they are not covered here.

Sequential Programmable Devices


Sequential programmable devices are extensions to the PLDs we've already covered in IDD.

A simple way of deriving a sequential circuit using a PLD is to feed the outputs of a combinatorial PLD into D-type registers. Many sequential PLDs already have D-type
registers built in, with the input lines of the PLD taking feedback from the output of the flip-
flops.

The registers have a common clock and clock on a rising edge, therefore we can create
synchronous only circuits. They have a setup time and an (effectively) zero hold time
(because of the AND-OR propagation delays). There are also preset and clear lines which are
independent of the clock.

On power-up, the registered outputs are usually set low, and the outputs are usually tri-state.

An XOR PAL (programmable array logic) can be used to change the type of flip-flop in the register. J and K can be simulated by using two AND gates, one with K and Q and the other with J and Q̄, into an OR gate (replicating a T-type), with Q and T going into the XOR gate leading to the flip-flop.

Although the registered PAL can be used to provide a sequential circuit, the D-type is not ideal for counters; however, an XOR gate interposed between the OR stage and the D input can provide resetting, holding and toggling. You could consider a binary counter as "the
current value, plus one". For the lsb, 1 is added (and there is no carry), but for all other bits,
there may be a carry from a less significant bit. Once the addition has taken place by the
PLD, the next value of the counter is available at the D inputs which is then transferred to the
clocking outputs at the next clocking cycle.

Analogue Circuits
Analogue linear circuits are the opposite of digital circuits in that a continuous range of voltages can exist in the circuit. With linear circuits, relationships are proportional (as in V = IR: V against I is linear, with the constant of proportionality being R).

Electricity

Electricity is an energy supply that is convenient to produce, distribute and use. Emf (electromotive force) causes electric charge to flow. Charge (Q) is the measure of the surplus or deficit of electrons and is measured in Coulombs. The smallest amount of charge possible is the charge on an electron (1.6 × 10^-19 C), which is denoted with the symbol e.

Electric current (I) is the rate at which charge flows (I = dQ/dt). Current can only flow in
closed paths, so we speak of electric circuits. A useful electric circuit consists of at least one
device that produces a desired effect and an energy source that can maintain a current
throughout the circuit.

The fundamental quantities to be found in electric circuit analysis are voltage (V) and current (I). These two are related, since the voltage across a device drives a current through it, or a current through a device generates a voltage across it.

Potential difference is the work (measured in Joules (J)) required to move a unit of positive charge (1 C) between two points - E = QV. It is a measure of the force which tends to move charge from one point to another. Potential difference relative to ground (0 V) is often just called potential. Emf is the potential difference from a voltage source, such as a battery.

Ohm's Law

In conductors, charge can move easily. In insulators, charges can't move easily.

Ohm's Law states that the potential difference across a conductor is proportional to the
current through it. The constant of proportionality is called resistance (R) and is measured in
Ohms (Ω). This gives us V = IR.

Resistors are used for voltage to current conversion, or vice-versa. They are rated for their
ability to dissipate power. The inverse of resistance is called conductance (G) and is
measured in Siemens (S). This gives us V = I/G.

Power

We know that E = QV, however if we measure change across time, then V is a constant and
dE/dt = dQ/dt V. We know that power is the change in energy over time and current is the
change in charge over time, so we can derive that P = IV. This is measured in Watts (W) or Joules per second (J s^-1).

Ideal Sources

As we mentioned earlier, some kind of power source is required for the circuit to be useful. In
circuit analysis, it is often convenient to think about idealised energy sources.

An ideal voltage source is independent of the current passing through it and an ideal current source is independent of the voltage across it.

Real sources are modelled as ideal sources in combination with a resistance, called the
internal resistance.

Kirchoff's Law

Kirchoff's First (Current) Law

The sum of the current entering any node is equal to 0 (Σi = 0).

According to Kirchoff's Current Law, I1 + I2 = I3

Kirchoff's Second (Voltage) Law

The sum of all potential differences around any closed path is 0. (Σv = 0).

e.g., if you have a voltage source (V1) and two devices in the circuit (V2 and V3) - V1 = V2 +
V3

Capacitance

If we move Q Coulombs of charge between 2 uncharged conductors, the voltage difference,
V, between the two conductors is proportional to Q such that Q = VC, where the constant of
proportionality is called the capacitance of the conductor arrangement. It depends on factors
such as the shape and size of the conductors and the material or dielectric between them.
Capacitance has units in Farads (F), but typical circuit values are in the microfarad (µF) range
or smaller.

Since I = dQ/dt and Q = VC, we can derive that I = C dV/dt.

Inductance

If, in a coiled conductor, current is changing with time, it generates a magnetic field which is
changing with time. This induces a voltage across the coil itself which tends to oppose the current. The voltage is proportional to the rate of change of current such that V = L dI/dt. This constant of proportionality (L) is called inductance and it is measured in henries (H).
This is symmetrical to the formula for capacitance, and in ac circuits we find that inductance
has the opposite effect to capacitance.

Network Analysis
Network reduction uses analysis techniques to replace large numbers of components into
equivalent "black box" circuits. There are also general network analysis methods.

 Node - A point where 2 or more elements have a common connection


 Branch - An element or group of elements with two terminals (between two nodes)
 Mesh - A closed loop path, sometimes called a loop.
 Open circuit - A branch of infinite resistance
 Short circuit - A branch of zero resistance.

Delta-Star Transformation

For circuit analysis, it is often convenient to convert from a star arrangement of resistors to a
delta arrangement, or vice-versa. Any star equivalent resistor is given as the product of the
two delta neighbour resistors divided by the sum of all three delta resistors.

From this, we can derive the equation R1 + R2 = R12(R13 + R23) / (R12 + R13 + R23). Similar equations can also be derived for R1 + R3 and R2 + R3, so simultaneous equations can be used to solve for a particular resistor.

For star-delta transformations, we can manipulate algebraicly our existing equations to give
us:

 R12 = R1 + R2 + R1R2/R3
 R13 = R1 + R3 + R1R3/R2
 R23 = R2 + R3 + R2R3/R1

Any delta equivalent resistor is given as the sum of the two star neighbour resistors added to
a number which is their product divided by the remaining non-neighbour resistor.

Thevenin's Theorem

Thevenin's theorem is a circuit reduction technique particularly useful when one only needs
to know the state of a specified branch of the circuit (i.e., the voltage across and current
through the branch)

Any two terminal (resistive) network can be represented by an equivalent circuit consisting of
a voltage (Vt) in series with a resistance (Rt), where Vt is the open circuit voltage and Rt is the
Thevenin resistance - the resistances between the terminals with all energy sources removed
(voltage sources short circuited and current sources open circuited). Thevenin resistance is
measured as Rt = VOC / ISC

Norton's Theorem

This is a dual of Thevenin's Theorem - here, a current source and a resistance is used in
parallel to represent a two-terminal resistive network.

According to Norton's Theorem, any two terminal (resistive) network may be represented as
an equivalent circuit consisting of an ideal current source, In, and a parallel resistance, Rn,
where In is the short circuit current between the terminals and Rn is the same as defined in
Thevenin's theorem.

As Thevenin and Norton both model the same V-I characteristics, you can interchange them. The relation between the current source in a Norton model and the voltage source in a Thevenin model is simply VT = InRn (with Rt = Rn).

The Principle of Superposition

In any linear network containing more than one source of voltage or current, the current in
any element in the network can be found by determining the current in that element when
each source acts alone and then adding the results algebraically.

Node Voltage Analysis

By finding the voltage at an unknown node, then you can figure out all voltages and currents
in the circuit based on that. Kirchoff's Current Law is used to find out the missing voltage.

From Wikipedia, the method is:

1. Label all the nodes in the circuit (e.g. 1, 2, 3...), and select one to be the "reference node." It is
usually most convenient to select the node with the most connections as the reference node.
2. Assign a variable to represent the voltage of each labeled node (e.g. V1, V2, V3...). The values
of these variables, when calculated, will be relative to the reference node (i.e. the reference
node will be 0V).
3. If there is a voltage source between any node and the reference node, by Kirchoff's voltage
law, the voltage at that node is the same as the voltage source's. For example, if there is a 40
V source between node 1 and the reference node, node 0, V1 = 40 V.
4. Note any voltage sources between two nodes. These two nodes form a supernode. By
Kirchoff's voltage law, the voltage difference at these two nodes is the same as the voltage
source. For example, if there is a 60 V source between node 1 and node two, V1 − V2 = 60 V.
5. For all remaining nodes, write a Kirchoff's current law equation for the currents leaving each
node, using the terminal equations for circuit elements to relate circuit elements to currents.
For example, if there is a 10 Ω resistor between nodes 2 and 3, a 1 A current source between
nodes 2 and 4 (leaving node 2), and a 20 Ω resistor between nodes 2 and 5, the KCL equation
would be (V2 − V3)/10 + 1 + (V2 − V5)/20 = 0 A.
6. For all sets of nodes that form a supernode, write a KCL equation, as in the last step for all
currents leaving the supernode, i.e. sum the currents leaving the nodes of the supernode. For
example, if there is a 60 V source between nodes 1 and 2, nodes 1 and 2 form a supernode. If
there is a 40 Ω resistor between nodes 1 and the reference node, a 2 A current source between
nodes 1 and 3 (leaving node 3), and a 30 Ω resistor between nodes 2 and 4, the KCL equation
would be (V1 − 0)/40 + (−2) + (V2 − V4)/30 = 0 A.
7. The KCL and KVL equations form a system of simultaneous equations that can be solved for
the voltage at each node

Mesh Current Analysis

This uses simultaneous equations, KVL (but not KCL), and Ohm's Law to determine
unknown currents in a network. The first step is to identify "loops" within the circuit
encompassing all components and then envisioning circulating currents in each of the loops
(the loop direction is arbitrary, but the equations are easier if all loops are circling in the same
direction). A more formal method from allaboutcircuits.com follows:

1. Draw mesh currents in loops of circuit, enough to account for all components.
2. Label resistor voltage drop polarities based on assumed directions of mesh currents.
3. Write KVL equations for each loop of the circuit, substituting the product IR for E in each
resistor term of the equation. Where two mesh currents intersect through a component,
express the current as the algebraic sum of those two mesh currents (i.e. I 1 + I2) if the currents
go in the same direction through that component. If not, express the current as the difference
(i.e. I1 - I2).
4. Solve for unknown mesh currents (simultaneous equations).
5. If any solution is negative, then the assumed current direction is wrong!
6. Algebraically add mesh currents to find current in components sharing multiple mesh
currents.
7. Solve for voltage drops across all resistors (E=IR).

AC Circuits
In AC circuits, we need to consider magnitude and phase (to the source). We can use
complex numbers (which have a real and imaginary parts) to represent sinusoidal voltages
and currents (known as phasors) and impedences (resistance to flow of ac).

Complex numbers are numbers of the form a + bj (where j is √-1. i is used to represent
current, which is why j is chosen to represent the imaginary number)

Complex Numbers

Arithmetic operations on complex numbers are consistent with those on reals:

 (a + bj) + (c + dj) = (a + c) + (b + d)j.
 (a + bj) - (c + dj) = (a - c) + (b - d)j.
 k(a + bj) = ka + kbj.

This is similar to the way vectors work. Indeed, complex numbers can be represented as
vectors on an Argand diagram.

The operator × j can be interpreted as a 90 degree anticlockwise rotation, e.g., j(2 + j) = -1 + 2j.

Additionally, j^4 = 1.

As our complex numbers are vectors, we can represent them in polar form: z = a + bj = r(cosθ + jsinθ). If we use Taylor expansions, we can show that e^jθ = cosθ + jsinθ. Thus, e^jθ traces out a unit circle on the Argand diagram, and a general complex number z can be written as re^jθ.

When adding and subtracting complex numbers, the Cartesian form must be used. When
multiplying or dividing, either form can be used, but the polar form is more convenient.

AC is distributed over the National Grid as it is easy to change voltage and current levels via
transformer action. When we refer to AC in electricity, we normally refer to circuits in which
the voltages and currents are sinusoidal in waveform. The sinusoid is important as it is easy
to generate with rotating machinery, and addition, subtraction, scaling, differentiation and
integration produce sinusoids of the same frequency (hence linear (RCL) circuits have all
their sinusoids at the source frequency). Mathematically it's a good basis function to represent
other waveforms (Fourier analysis).

Phasors

An ac signal (voltage or current) changes magnitude from instant to instant. Sinusoidal
quantities can be conveniently represented using rotating vectors called phasors. Linear
circuits with energy sources at a given frequency have all voltages and currents oscillating at
that frequency. We can use a phasor diagram that disregards frequency information and
captures the phase and magnitude information only.

By convention, phasors rotate anti-clockwise, so if a phasor has a positive j, we say it leads
by the given angle and if the phasor has a negative j, it lags by the given angle.

To discover the phasor of v, we can use KVL, so v = VR + VC + VL. These are vector
quantities, so they must be added vectorially (e.g., using a vector diagram). In this circuit, we
chose current to be the reference, as it is a series circuit and the same current flows through
all 3 elements. The voltage across each element is different in magnitude and phase, due to their
different impedances. The phasor diagram then shows the phase relationship between the
voltage and current. In this case, the current lags behind the voltage, as the inductive
reactance is larger than the capacitive reactance.

As we know from complex numbers, we can represent an ac quantity by the phasor Ae^jΦ, and
this encapsulates both the magnitude (A) and phase (Φ) information.

Reactances

Now we have removed the dependence on time from our calculations, ac circuit calculations
can be done with apparently constant quantities, e.g., reactance calculations.

 V = IZ, hence
 ZR = R and also
 ZC = -j/ωC (= 1/jωC)
 ZL = jωL

Impedance

Impedance (Z) is the resistance to the flow of ac currents (the inverse of this is admittance,
Y). Z may include:

 a non-frequency dependent part, called resistance (R). The inverse of this is conductance, G.
 a frequency dependent part called the reactance (X). The inverse of this is the susceptance, B.

V/I = Z = R + jX.

In general, X is a reactance, e.g., XC = 1/ωC and XL = ωL, and ±jX is an impedance with 0
resistive component, e.g., ZC = -j/ωC and ZL = jωL.

Thus, resistances and reactances deal with magnitudes, whereas impedances capture
magnitude and phase.

From Ohm's and Kirchhoff's laws, we can derive that series combination works such that ZS =
Z1 + Z2 + ... and parallel combination works such that 1/ZP = 1/Z1 + 1/Z2 + ...
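
As a small worked example (hypothetical component values), the impedance of a series RLC branch and its magnitude and phase follow directly from complex arithmetic:

import cmath, math

R, L, C = 100.0, 50e-3, 10e-6          # 100 ohms, 50 mH, 10 uF (assumed values)
w = 2 * math.pi * 50                   # angular frequency of a 50 Hz source

Z = R + 1j*w*L + 1/(1j*w*C)            # series combination ZR + ZL + ZC
mag, phase = cmath.polar(Z)
print(mag, math.degrees(phase))        # |Z| in ohms and its phase in degrees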

Frequency Response and Bode Diagrams

Often, we need to consider the behaviour of a circuit in the frequency domain, rather than the
time domain, e.g., filters, resonant circuits, etc...

Any signal can be generated by the summation of sinusoids. These signals can be separated
using a filter (e.g., noise from a signal can be removed to isolate a particular frequency).

In the above filter, the filter gain is VO/VI

Low pass filters can clean up a noise contaminated signal after it has passed through a
transmission medium. A high pass filter can remove DC between stages of transistor
amplification, so transistor biasing is not affected.

Band pass filters can be employed which only allow signals within a particular band of
frequencies through.

The effectiveness of filters can be shown by a graph of frequency against gain. Ideal filters of
different types and a more realistic filter graph are shown below:

Frequency Response of Circuits

cos(ωt) → filter → G cos(ωt + Φ)

The system behaviour depends on frequency because reactances depend on frequency. This
behaviour is known as frequency response. The frequency response can be measured by
varying the frequency of the input signal and measuring the amplitude and phase of the
output signal with regard to the input signal in steady state. The magnitude of the voltage
gain, G, and phase difference Φ can be plotted as a function of frequency to describe the
behaviour of the linear circuit.

Logarithmic Scales and Gain

Frequency and gain (but not phase) can vary over huge ranges, so we use logarithmic scales
to represent these. A multiple of 10 of frequency is called a decade (similarly, a multiple of 2
is called an octave in music).

log10P is power P expressed in bels, however as this scale is very coarse, we use decibels (a
tenth of a bel). 10log10P is power P represented in decibels.

As P is proportional to V^2, 20log10V gives the voltage gain in decibels (derived from
logarithmic laws, 10log10V^2 = 20log10V).

0dB represents a unity gain (gain of 1) and negative dB represents attenuation. Gain can also
be a phasor, G = Vo/Vi; taking the input as the phase reference (e^j0), this can be written as Ge^jΦ, where
Φ is the phase shift relative to the input.
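
A short worked example of the decibel conversion (the gain values are hypothetical):

import math

for vo_over_vi in (1.0, 10.0, 0.5, 100.0):
    print(vo_over_vi, 20 * math.log10(vo_over_vi))
# gain 1 -> 0 dB (unity), 10 -> +20 dB, 0.5 -> about -6 dB (attenuation), 100 -> +40 dB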

Bode Diagrams

A Bode diagram is a method of representing frequency response. A Bode diagram plots gain
and phase shift against f on two separate diagrams. They are intuitive and easy to draw.

Since impedances are complex functions of frequency, a filter's behaviour can be described
by a complex function of frequency. At certain frequencies (0 and ∞), the passive RC filter's
gain and phase tend towards certain values (asymptotes). Constructing the Bode diagram
requires drawing the asymptotes in a systematic manner.

If we take a simple example of a filter (Vo/Vi) we have a simple transfer function (from the
voltage divider). This gives us Vo/Vi = 1/(1 + jωT), where T is the circuit's time constant (RC for the simple RC low-pass filter).

We can find the asymptotes in the plot by finding where ωT << 1 and ωT >> 1. ωT << 1 ⇒
| Vo/Vi | ≈ 1, hence GdB ≈ 0 dB. ωT >> 1 ⇒ | Vo/Vi | ≈ 1/ ωT, hence GdB ≈ -20log10(ωT)

In this case, if frequency increases by a factor of 10, then log10(ωT) increases by 1, hence gain
decreases by 20 dB - hence for sufficiently large frequencies, we have a slope of -20 dB per
decade.

We also need to find the point where the two asymptotes meet - this is called the breakpoint.
For first order low pass filters, the breakpoint defines the bandwidth of the filter. If Zc =
1/jωC, there may be no breakpoints. In our example, the breakpoint is where neither ωT << 1
nor ωT >> 1 holds, i.e., around ωT = 1.
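
A small numerical sketch of the first-order low-pass response above, showing the two asymptotes and the breakpoint:

import math

# evaluate |1/(1 + jwT)| in dB at points spanning the breakpoint wT = 1
for wT in (0.01, 0.1, 1.0, 10.0, 100.0):
    gain = abs(1 / complex(1, wT))
    print(wT, 20 * math.log10(gain))
# wT << 1 gives roughly 0 dB, wT = 1 gives about -3 dB (the breakpoint),
# and each further decade of frequency drops the gain by about 20 dB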

For our sample Bode diagram, we can plot:

For the phase plot, we can also consider the cases at the asymptotes, which gives us Φ ≈ 0°
for ωT << 1, Φ ≈ -90° for ωT >> 1. At the breakpoint ωT = 1, Φ ≈ -45°, so the phase plot
looks as follows:

In addition to Bode diagrams, sometimes Nyquist diagrams are used. A Nyquist diagram
represents frequency response by plotting gain and phase shift against frequency on the same
graph. It is the locus of the endpoints of the output phasor for an input of Vi = cosωt as ω is
varied from 0 to ∞. This kind of analysis is useful for control systems analysis.

Resonant Circuits

A mechanical example of a resonant circuit is a mass on a string (a pendulum) that oscillates
at a natural frequency (dependent on factors such as its length) regardless of the force exerted. In
this example, energy is repeatedly transformed between different forms (gravitational
potential and kinetic, with a small amount of heat from friction) - this is typical of resonant
circuits.

In resonant electrical circuits, energy is transferred between the electric field of a capacitor
and the magnetic field of an inductor. The energy that is taken from the supply and
transferred to heat is dependent on the resistance. The peak amplitude is dependent on the
circuit resistance, which is analogous to the effect of friction on the mechanical system.

Sometimes resonance is to be avoided, particularly in mechanical systems, where it can be a
destructive force. Other times, it is desirable, such as in frequency selective or tuned circuits.
In practice, such circuits can be considered as narrow bandpass filters. The higher and
narrower the peak in the response, the better the frequency selectivity. Selectivity depends on
the resistance in the circuit, and this is referred to as the Q factor.

Diodes
RCL elements are linear (i.e., Vin ∝ Vout), however a diode is non-linear and therefore doesn't
obey Ohm's Law (its resistance is undefined). Diode circuits therefore don't have a Thevenin
equivalent, and superposition doesn't hold.

10 mA or more flowing through a diode (from the anode to the cathode) gives a voltage drop
of about 0.6 V. The reverse current is negligible unless the reverse breakdown voltage is
reached (a typical breakdown voltage is 75 V). The diode may be thought of as a one way
conductor that has a small voltage drop across it when conducting.

Applications of Diodes

Rectifiers

A half-wave rectifier can be made using a single diode.

This would give a wave form similar to:

Using four diodes, a full wave rectifier could be made:

Which gives us a the following waveform:

The gaps in between are caused by the small voltage drop that occurs through each diode. In
the half wave rectifier, this is 0.6 V, due to there only being one diode, and in the full wave
rectifier, the drop is 1.2 V as the wave passes across 2 of the 4 diodes.

A crude DC power supply could be made using a full wave rectifier combined with a
capacitor to smooth the waveform and a further regulator (such as a Zener diode) to smooth
the waveform out further.

Diode Clamps

Diode clamps are often built into the inputs of CMOS chips to help protect against static electricity
discharges that occur during handling.

Here, when V1 goes above 5.6 V or below -5.6 V, the appropriate diode will start conducting,
limiting V2 to 5.6 V or -5.6 V

Zener Diodes

Zener diodes have similar characteristics to normal diodes, except they are designed to start
conducting at a specific voltage in reverse bias. Thus, when they are reverse biased, they can
create a constant voltage in a circuit, derived from a higher voltage in the circuit. This process is
called regulation, and this can be easily quantified and measures how the output voltage
changes for a given change in input and how much dampening occurs. An ideal Zener diode
has a regulation of 0 and has the following V/I characteristic graph:

Transistors
The transistor is the most important example of an active component. Unlike passive
components, active components can have power gain, with additional power coming from a
power supply. Transistors allow us to build many types of circuits, such as amplifiers,
oscillators and digital computers (as ICs consist of a large number of transistors).

The transistor is a three terminal device in which either the current (BJT) or voltage (FET) in
one terminal (the control electrode) controls the flow of current between the remaining two
terminals by altering the number of charge carriers in the device.

When designing an amplifier using transistors, we are interested in the basic properties of:

 gain, which tells us how much output we get for a given input
 input impedance - this, along with output impedance, tells us about the loading effects (through
the voltage divider effect) when we connect one circuit to another

For a good amplifier, Zi >>> Zs and Zo <<< Zl (except in power matching RF circuits, when
we want Zi = Zo)

In electronic design most transistors are already implemented in the form of ICs, however
knowing about transistors is useful as:

 the inputs and outputs of ICs are the inputs and outputs of transistors (open collector, etc)
 transistors are powerful interfacing tools between one circuit and another
 often, the right IC doesn't exist for a task
 someone needs to design the ICs in the first place!

Field Effect Transistor (FET)

A FET takes virtually no current into the gate and the drain-source current is controlled by
the gate-source voltage.

BJTs have a larger gain than FETs, so BJTs are usually used for simple single stage
amplifiers. However, the FET operates with no forward biased junctions, so no gate current is
needed. FETs are much easier to use than BJTs. They have extremely high input impedances, so
almost no input current is needed.

In JFETs, the GS junction should be reverse biased to prevent conduction. For MOSFETs,
there is no junction. The gate can be positive or negative with regards to the source.

The drain currents of FETs are more nearly constant than the collector currents of BJTs for
varying voltages across the drain-source/collector-emitter. Additionally, unlike a BJT, a
FET can be made to behave like a voltage controlled variable resistor.

Bipolar Junction Transistor (BJT)

In a BJT, a small base current flows, which controls the larger collector-emitter current.

Further subdivisions of the transistor exist, depending on construction and the charge carrier
polarity (holes or electrons).

In this course, we will be looking at npn BJTs and n-channel FETs.

For the npn BJT, the following rules apply for circuit design (the signs can be reversed to get
the rules for pnp):

1. The collector voltage must be more positive than the emitter voltage
2. The base-emitter and base-collector junctions behave like diodes, with the base-emitter
forward biased and the base-collector reverse biased
3. The max values for current and voltage for the junctions should not be exceeded
4. If rules 1-3 are obeyed, then IC = β IB, where β is the current gain, however, a circuit that
depends on a particular value of β is a bad circuit, as β can vary from transistor to transistor.

BJT characteristics

The V-I characteristics of a BJT collector vary according to IB, as the following graph
shows:

In an ideal transistor, the lines would not be curved, and would be straight across.

The transistor is approximately a controlled current source, but where is the output? If you
pass the collector current through a resistor, the voltage drop across the resistor is
proportional to the collector current, giving the voltage amplification that is needed.

Applying KVL to the circuit gives us VCC = ICR + VCE, or IC = -VCE/R + VCC/R. Using this, we
can apply y = mx + c, where x = VCE and y = IC, so it can be plotted onto the characteristic
graph above. This is called the load line, and has a slope of -1/R and crosses the axis at
VCC/R. The load line is defined by setting appropriate values of VCC and R. As IB varies, the
instantaneous values of voltage and current for the transistor collector-emitter slide up and
down the load line.
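
A small numerical sketch of the load line (supply and resistor values are hypothetical): the end points are cutoff (IC = 0, VCE = VCC) and saturation (VCE ~ 0, IC = VCC/R), and any operating point must satisfy IC = (VCC - VCE)/R.

VCC = 10.0      # supply voltage in volts (assumed)
R = 1000.0      # collector resistor in ohms (assumed)

print("cutoff:     VCE =", VCC, "V, IC = 0 A")
print("saturation: VCE ~ 0 V, IC =", VCC / R, "A")

VCE_q = VCC / 2                 # a quiescent point near the middle of the active region
IC_q = (VCC - VCE_q) / R
print("Q point:    VCE =", VCE_q, "V, IC =", IC_q, "A")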

 QC is the cutoff point, where there is 0 collector current (no base current supplied)
 QS is the saturation point, the maximum transistor current, with the collector a tenth (or so) of
a volt above the emitter
 QO is the operating point, working point, or the quiescent point. The region of the load line
around this point is called the active region, where the best amplification can be achieved.

The first two states are extreme states, which are important when transistors are used as
switches (i.e., in digital electronics, where the aim is to get it to switch between the two
extreme points as quickly and as efficiently as possible). In analogue electronics, the active
region is important.

Emitter-follower (BJT application)

This circuit gives an output that is 0.6 V lower than the input, and clipping occurs when the
input is less than 0.6 V. This circuit has a high input impedance, so it is useful as a buffer. They are most
commonly seen as in the second diagram, where R1 and R2 set the biasing point and the
capacitors remove any dc noise that may be introduced into the circuit.

Common emitter amplifier (BJT application)

In this circuit, VO/VI = -RC/RE

FET characteristics

On the above I/V plot, for sufficiently high values of VDS, the characteristic is nearly
horizontal and so only has a small dependence on VDS. However, the value of I is dependent
on VGS. This makes the device a voltage controlled current source/sink.

If we want to produce a device that amplifies a voltage signal, we pass the DS current
through a resistor connected to the drain of the FET. The output voltage must lie on a load
line which is obtained from an application of KVL:

I = -VDS/R + VDD/R

The straight line formula (y = mx + c) can be applied, where y = I, m = -1/R, x = VDS and c =
VDD/R.

The load line can be defined by setting appropriate values of VDD and R, then, as vGS varies,
the instantaneous values of voltage and current for the transistor drain-source slide up and
down the load line. When no signal is applied to the circuit, VGS must be biased to be roughly
in the centre of the load line.

Self-Biasing Schemes

Rather than manually biasing a transistor, it is often simpler and more elegant to use self-
biasing schemes, e.g.,

 JFET (n-channel) biasing - the GS junction is reverse biased. AC couple the input and put a
resistor between the gate and ground. DC current through the source resistor sets the
appropriate VGS.
 MOSFET (as with BJTs) requires a divider from the drain supply. Gate biasing resistors can
be large (> 1M Ω), because the gate current leakage is so small (in the nanoamps).

Small Signal Analysis

Small signal analysis involves modelling the transistor as an ideal current source of gmvgs in
parallel with resistance rd.

Current is a function of the gate-source and drain-source voltage and the change in current
can be expressed as di = (∂i/∂vGS)dvGS + (∂i/∂vDS)dvDS. For small signals, we can write i = gmvGS +
(1/rd)vDS, where the transconductance gm and drain resistance rd are marginally dependent on
the operating point.

When analysing transistor circuits, we consider DC and AC separately. DC values are
indicated by upper case symbols and small signal AC values are indicated by lower case
symbols. We consider DC first and get the biasing right, since AC properties have a small
dependence on the position of the operating point in the transistor characteristic. We may
then examine the transistor characteristic and the circuit diagram to determine AC behaviour,
represented by the small signal model. DC sources are shorted to ground to form the small
signal model.

Operational Amplifiers
The operational amplifier (op-amp) is a very high gain dc-coupled differential amplifier. It is
a macro-component typically consisting of 10s of transistors packed into an IC.

An op-amp is designed to have a set of specific properties (most importantly, high gain) so
that by choosing suitable feedback arrangements, the amplifier can be designed to carry out
certain operations on the input signals, such as:

 amplification
 filtering
 differentiation/integration
 summation/subtraction

The op-amp is a high-gain differential amplifier with output vo = Av(v+ - v-), where Av is the
gain, v+ is the voltage at the non-inverting input and v- is the voltage at the inverting input.

For op-amps, supplies of ±15 V are usually used (exact specs depend on the chip). The 741
op-amp will work from ±3 V to ±18 V.

Even with no signal applied to the op-amp, typically 1 mV will be at the input and if the
voltage gain of the circuit is 1000, this gives a 1 V output for no input. To get round this, in
an ac circuit, a coupling capacitor could be used at the output to remove the dc and for dc, the
offset null circuit could be used.

Ideal Op-Amps

The ideal op-amp has characteristics such that:

 Gain Av is infinite
 Bandwidth is infinite
 Input impedance is infinite
 Output impedance is zero

From these ideal properties, we can infer the following principles:

1. In a negative feedback arrangement, the output attempts to do whatever is necessary to make
the voltage difference between the inputs 0
2. The input draws no current

Ideal properties are often used in circuit design and analysis, but they should be justified in
relation to other values in the op-amp circuit.

Negative Feedback (Inverting) Op-Amp

According to our ideal rule 1, the non-inverting input is earthed, so the inverting input must
be at the same voltage (0 V). This point is referred to as a virtual earth. Using rule 2, the
current into the inverting input is 0, hence applying KCL, we have: vi/R1 + vo/R2 = 0, which
can be rearranged to give vo/vi = -R2/R1.

The previous expression can be generalised by replacing the resistances with impedances,
allowing, for example, filters to be built.

Non-Inverting Op-Amp

Here, v- = R2/(R1 + R2) × vo. The ideal op-amp rules mean that v- = v+ = vi, hence vo/vi = (R1 + R2)/R2 = 1 + R1/R2.

In order to judge which is the most appropriate op-amp configuration to use, we need to
consider input impedance in addition to gain and phase shift considerations.

For the inverting circuit, the input resistance is R1, however in the non-inverting circuit, for
our ideal op-amp this is infinite, but in the real world this is not the case. By doing a parallel
combination of resistances, combined with gain, we can get a very large value, which is
unpredictable due to A.

Feedback

From the basic equation vo = Av(v+ - v-), we can see that any signal vi = v+ - v- will almost
immediately send the output into positive or negative saturation (i.e., close to one of the
supply voltages). Thus, in linear applications, the op-amp is used in a negative feedback
configuration.

Negative feedback is the process of coupling some of the output of the system (vo) back to the
input (v-) to cancel some of the differential input (v+ - v-). If we use greater amounts of
negative feedback, the circuit characteristics become less dependent on the open-loop
characteristics of the amplifier and more dependent on the feedback network itself. Thus, in
the ideal op-amp analysis, circuit properties are defined by input and feedback circuit
elements only.

The open loop gain is when there is no feedback. When we do have feedback (normally
through a network of resistors), we have closed loop gain. Closed loops give us constant gain
up to the bandwidth limit. You need to take care when designing a circuit to ensure that the
required closed loop gain lies inside the open loop characteristic.

The unity gain bandwidth, fT, will be quoted for a specific op-amp. This is the frequency
when the gain drops to unity and is also called the gain-bandwidth product.

The closed loop gain of the inverting amplifier circuit begins dropping when the open loop
gain approaches R2/R1. This means if we have set R2/R1 to give a gain of 100, this gain may
be limited by the open loop gain of the op-amp above 50 kHz. Indeed, for all
amplifying/filtering applications, we must design inside the open-loop characteristic.

Non-Ideal Op-amp Analysis

Parameter                 Ideal value   Typical real value (for a 741 op-amp)

Open loop gain            ∞             10^5
Gain-bandwidth product    ∞ Hz          1 MHz
Input impedance           ∞ Ω           1 MΩ
Output impedance          0 Ω           100 Ω

For a non-ideal op-amp, we can generalise resistances to impedances and add in the effects of
finite input impedance and non-zero output impedance to the inverting amplifier circuit.

KCL can be applied at the inverting input and if we assume that the current taken by the load
is negligible, we can apply KCL at the op-amp output. We can cancel out the value of v- and
then, if we have a much larger impedance at Zi than at Z1 and a much smaller magnitude at
Zo than at Z2, we can get a formula for the gain: vo/vi = -AZ2/(Z1(A + 1) + Z2) = -Z2/(Z1 + (Z2 +
Z1)/A). If we have a very large A compared to |(1 + Z2/Z1)| we can further approximate to vo/vi ≈
-Z2/Z1. So we can apply ideal analysis to an op-amp inverting circuit, as long as:

 A >> |(1 + Z2/Z1)|
 |Zi| >> |Z1|
 |Zo| << |Z2|
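
A quick numerical check (hypothetical resistive values) that the finite-gain expression above converges on the ideal -Z2/Z1 as A grows:

Z1, Z2 = 1e3, 10e3                        # 1 k and 10 k (assumed), ideal gain -10
for A in (1e2, 1e3, 1e5):                 # increasing open loop gain
    gain = -A * Z2 / (Z1 * (A + 1) + Z2)
    print(A, gain)                        # -9.01, -9.89, then -9.999: approaching -Z2/Z1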

Filters

Filters separate signals of different frequencies by attenuating signals of all unwanted
frequencies. They are often used to remove noise, which is present at all frequencies. To
maximise the signal-noise ratio (SNR), we restrict the bandwidth of a circuit to the minimum
necessary to allow the frequencies contained in the signal to pass and to filter all the other
frequencies. An ideal filter is the "brick-wall" filter of rectangular shape, which is not
realisable in practice.

Depending on the application (which has a given accepted and rejected frequency separation)
we require the cut off to have a given steepness of slope. The corner should be as sharp as
possible, i.e., rapid transition from pass-band to roll-off.

Simple passive RC filters produce gentle roll offs of 20 dB per decade. If a steep roll-off is
required, we could cascade multiple low-pass filters. However, passive filters give us quite
heavy damping (ξ). Comparing the equation for cascading passive filters with the standard
second order equation shows us ξ = 1, therefore Q (the quality factor) = 0.5, giving a large
damping and a "soft knee" in the frequency response.

We've already looked at LC resonance response, so we can use LC filters to reduce the
damping and give a sharper knee to the filter. By cascading LC(R) filters, we can have
arbitrarily steep falloff and corner sharpness, but a large number of components are required
and inductors are bulky, expensive and can pick up interference (e.g., mains hum).

An op-amp with R and C elements can simulate the behaviour of an LCR filter. Such a filter is
called an active filter. Furthermore, due to the large input impedance and low output
impedance of op-amps, loading problems experienced with cascaded passive filters are
removed.

Sallen-Key Filter

The op-amp and voltage divider arrangement form a non-inverting amplifier with gain G = K,
and 1/2πRC defines the cutoff frequency.

As this is a non-inverting amplifier, vo = Kv3. We can then figure out the voltage divider
effect for v3 and then use KCL round node v2. We can combine these three equations to give
us: vo/vi = K/(1 + (3 - K)jωRC - ω^2R^2C^2).
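
Evaluating this transfer function at the corner frequency ω = 1/RC (where ωRC = 1, so the actual R and C values cancel out) shows how K controls the damping - a minimal sketch:

for K in (1.0, 2.0, 2.9):
    H = K / (1 + (3 - K) * 1j - 1)    # the transfer function with wRC = 1
    print(K, abs(H) / K)              # corner gain relative to the passband gain K
# gives 0.5, 1.0 and 10.0: the response peaks more and more as K approaches 3,
# at which point the damping term disappears completely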

When K = 1, this is identical to a low-pass filter. As K increases from 1 to 3, the decreasing
amount of negative feedback has the effect of reducing damping and giving a sharper knee.
Note that by setting K = 3 we can eliminate damping altogether - this gives us an
oscillator!

As this is an op-amp, we don't have loading problems and stages are easy to cascade.

Non-linear Applications of Op-Amps

In linear applications, negative feedback predominates. Even if some positive feedback is
used, negative feedback must dominate for stability, the exception being sine wave
oscillators, which are on the cusp between stability and instability, where positive and negative
feedback are perfectly balanced. In non-linear applications, we may have no feedback (for
comparators), or positive feedback (oscillators and Schmitt triggers). Here, the output
switches to close to the power rails. It is important to note we can not assume v+ = v- for these
applications, as negative feedback is required to enforce this condition.

Comparators

The comparator is used to determine which of two signals is larger, or to know whether a
signal has exceeded a predetermined threshold. Ordinary op-amps are frequently used as
comparators, but a slow slew rate gives a relatively long switching time (20 µs for a 741) due
to stabilising internal compensation. The 748 op-amp has no internal compensation and a
switching time of less than 1 µs, which is still too slow for many logic circuits.

The 311 comparator has a single rail supply and an open collector output (hence requiring a
pull up resistor). This is an example of the special ICs that are intended to be used as
comparators and can move in and out of saturation much faster. However, comparator chips
cannot be used in the place of an op-amp, as they can be unstable in negative feedback
arrangements due to a lack of compensation.

A simple comparator circuit has a slow output swing and multiple triggering, which is very
bad for sequential logic, such as counters.

Connecting Rf introduces positive feedback and gives dual thresholds, one for each
output state. Positive feedback also gives us faster triggering. The resulting hysteresis ΔV is
given by ΔV = R1/(R1 + Rf) × Vpp, where Vpp is the peak-to-peak voltage change of the output.
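
A short worked example of the hysteresis formula (resistor values and output swing are assumptions for illustration):

R1, Rf = 10e3, 100e3          # 10 k and 100 k (assumed)
Vpp = 10.0                    # output swings about 10 V peak-to-peak (assumed)

dV = R1 / (R1 + Rf) * Vpp     # separation between the two switching thresholds
print(dV)                     # about 0.91 V of hysteresis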

This is called a Schmitt trigger, and a graph showing the difference between a Schmitt trigger
and a standard comparator is as follows:

Oscillators

Op-amps can use positive feedback to generate signals (sine wave, square wave, sawtooth,
triangle, etc) which can be used to drive electronic circuits.

As mentioned earlier, we can make an oscillator based on a Sallen-Key filter with K = 3.

The above diagram is that of a Wien-Bridge oscillator, which is popular for generating audio
frequencies. The frequency generated is 1/2πRC.

Charging and discharging a capacitor between two threshold voltages, combined with a
comparator op-amp, gives us a relaxation oscillator.

Digital/Analogue Interfaces
Computer control of analogue processes is widespread. Hence, we need a method to convert
analogue signals to digital and digital signals to analogue.

Digital-to-Analogue Conversion

By changing the values of the resistors, you can weight each bit given its place weighting and
the outputs are summed using an analogue adder (an opamp). These are difficult to fabricate
on ICs due to difficulties in fabricating resistors with an exact value of R, but it is easier to
make identically matched ones. Hence, DACs employing an R-2R ladder network are used.

This circuit transforms binary scaled currents to an output voltage. The resistors must be
precisely matched. It outputs 0 V for 0 binary input and -150/16 V for a maximum binary
input of 15.
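
Those figures are consistent with an output of the form Vout = -Vref × n / 16 with a 10 V reference; the reference voltage and scaling here are assumptions for illustration, since the exact values depend on the ladder and feedback resistors used.

Vref = 10.0                    # assumed reference voltage
for n in (0, 1, 8, 15):        # 4-bit binary input
    print(n, -Vref * n / 16)   # 0 -> 0 V, 15 -> -150/16 = -9.375 V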

Analogue-to-Digital Conversion

Analogue signals vary continuously with time. When converting an analogue signal to a form
that can be used by a computer, we require a digitised, discrete time signal. The signal is
sampled at a given frequency to give us a discrete time signal. The discrete time signals are
digitised, i.e., they are represented by a fixed number of bits and so are quantised
representations of the original analogue signal. The diagram below is an example of such a
"sample and hold" waveform.

Nyquist Sampling Theorem

The sample-and-hold waveform can be low-pass filtered to remove the sharp edges and we
effectively reconstruct the original waveform. However, we need to ask ourselves how
rapidly do we have to sample in order to get an accurate reproduction of the original signal?

The Nyquist sampling theorem states that the sampling frequency, fs, should be a minimum of
twice the highest frequency content of the signal: fs ≥ 2fmax. If this is not observed, a distortion
called aliasing results. A graph showing the results of aliasing is below:

Analogue signals are often passed through a low pass anti-aliasing filter to ensure no
frequency components higher than fs/2 are present when the signal is sampled.

Quantisation Error

If we wish to digitise a sinusoidal signal of amplitude V, then n bits must represent a voltage
swing of 2V. A change of 1 in the binary representation of the voltage then represents ΔVq =
2V/(2^n - 1) ≈ V/2^(n-1). More bits can be used to reduce the quantisation noise.
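
A short worked example (signal amplitude assumed to be 1 V) showing how the quantisation step shrinks as bits are added:

V = 1.0                          # signal amplitude in volts (assumed)
for n in (8, 12, 16):            # converter resolutions
    dVq = 2 * V / (2**n - 1)     # voltage represented by a change of 1 in the binary value
    print(n, dVq)
# 8 bits -> ~7.8 mV steps, 12 bits -> ~0.49 mV, 16 bits -> ~31 uV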

Several different types of ADCs are available and the correct choice depends on the
application - whether speed or accuracy (i.e., number of bits) is required.

Flash Convertors

Flash convertors are fast (in the range of ns) and are typically used for video signals. Many
parallel comparators are used to measure the analogue signal, one for each possible digital
value. A binary encoder generates a digital output corresponding to the highest comparator
number activated by the input voltage.

Typically only 8-bit accuracy is used. An 8-bit convertor requires 255 comparators. A 16-bit
version would require 65535 comparators, which is unwieldy.

Single/Dual Slope Integration Model

The single and dual slope integration methods let the capacitor charge up; it is then
connected to a comparator that stops a counter counting when the capacitor reaches a certain
level. The dual slope method additionally measures the time for both charging and
discharging.

The above diagram shows the single slope method.

Successive Approximation/Tracking ADC

A comparator is used to compare the analogue input with the digital output presented by an
up/down counter. A negative feedback loop is then used to track the input.

CHAPTER 6:
Computer Architectures
The CPU
A CPU is a processing circuit that can calculate, store results and make algorithmic
decisions. It is this latter factor that sets computers apart from calculators.

There are different CPU types, including:

 register machines
 accumulator machines
 stack machines

A very simple processor could be Jorvik-1, an accumulator machine:

The program counter is a register showing the current location in a set of instructions and the
accumulator is where values are stored before and after computation.

An example set of instructions for this machine may be:

Hexadecimal number   Assembler equivalent   Description

31                   ADD A 03h              This is 0011 0001, so add 0011 and increment the pc by 1
F1                   SUB A 01h              Subtract (this is only possible if the accumulator understands signed numbers)
05                   BRANCH 05h             The accumulator is not affected, but the pc increases by 5
01                   NOP                    No operation, just continue to the next step (this can be useful for timing purposes)
11                   INC A                  Increment (really it's just add 1)
00                   HALT                   Don't increment the pc, and have no effect on the accumulator.

However, what is wrong with Jorvik-1?

 The instruction set is very limited
 Is it actually useful?

What features should a CPU have?

 Ability to perform operations
o Add
o Subtract
o Possibly other operations
 Ability to choose what part of program to execute next
o Branch always
o Branch conditional
 Ability to access and alter memory - with one register, auxiliary memory access is
essential

Most CPUs have many more features than this.

Jorvik-1 cannot conditionally branch or access memory, so it does not provide these features.

Let's consider an improved processor - Jorvik-2.

Here, we have changed the format of the opcode, reducing the maximum size of a branch, but
allowing conditional branching. When P = 0, the branch is always taken (pc = pc + ppp),
however if P = 1, the branch is only taken if the accumulator = 0, otherwise the program
counter is just incremented - so we now have a "branch if 0" instruction.

Complexity

Jorvik-2 is more complicated, so requires more hardware. If we consider the propagation
delay in the gates, Jorvik-1 takes 25 ns per cycle (which works out as 40 MHz), whereas the
same cycle on Jorvik-2 might take 70 or 75 ns, depending on specific implementation
details, such as synchronisation between the accumulator and program counter stages. 75 ns
is about 13 MHz, which is a 3 times decrease in speed compared to Jorvik-1.

What does this tell us about complexity? Complexity is related to speed, therefore simple
circuits run fast and complex circuits run slow. Jorvik-1 may be faster, but is not complex
enough to be useful.

If a new feature is important and heavily used, the advantages may outweigh the speed cost.
If the feature is only rarely useful, it may be a burden and slow down the whole CPU.

Addressing
Data can be held in memory and is normally organised into bytes (8 bits), words (16 bits) or
double words (32 bits). Data can also be addressed directly as an operand in an instruction -
for example ADD A 03h has an operand of 03h.

Instructions normally operate dyadically (they require two operands), e.g., ADD A 03h has
two operands - A and 03h. Having only one register (A) is not ideal and is not suited to the
frequent case of dyadic operations. We should consider whether more registers would be
better.

Our Jorvik-2 design is an extreme example of an accumulator machine - all operations use
the accumulator (A) and a secondary operand. Having more registers will make our CPU
more useful. We have already discussed 0 (stack), 1 (accumulator), 2 and 3 register addressing
in operands in ICS, but adding more registers means complex and potentially slower
circuitry. Having fewer registers is normally simpler and faster, but it could lead to harder
programming, perhaps producing slower programs.

A Three Port Register File

Consider a three port register file of four 8-bit registers.

Accessing any three of these 4 registers simultaneously can create design problems. Reducing
to two or one register addressing can ease the problem. Instructions must contain both the
function code (ADD, SUB, etc) and the registers addressed: FFFF XX YY ZZ - FFFF is the
function code and XX YY ZZ are the registers (two inputs and an output) (as in the diagram
above) - this is 10 bits long.

A One Port Register File

This is the standard accumulator model. It is much simpler, only one register file port is
needed. Instructions contain one register address, the second register is always the
accumulator. An instruction on this kind of machine looks like: FFFF RR - 6 bits long.

No Registers - The Stack Model

The stack (or zero-operand) model has no addressable registers, and thus no register file in
the traditional sense.

Instructions only specify the function code (e.g., ADD, SUB, etc). Operands are always TOP
and NEXT (the top two stack items) by default. An instruction in this model is only 4 bits long: FFFF.

We can implement a stack-based machine, Jorvik-3, which adds an L (load) bit to the
opcode which allows us to load something (a number bbb) onto the stack. However, we have
lost a bit from our function code and introduced delays due to new components.

Upon examining Jorvik-3 we start to see some new instructions when we examine the
circuitry. We have a new SET instruction which pushes a value onto the stack, and ADD,
which adds two values from the stack together. However, our design only has a data adder in
the path, so we can no longer do SUB. To get round this, we can SET a negative value onto
the stack and then ADD.

Jorvik-3 has new features, however they come at a cost. We could work around this cost in
various ways:

 making our instruction set 16 bit rather than 8
 improving the way branches are handled - most CPUs don't combine branches with arithmetic
operations, and branch:arithmetic functions are in the ratio of approximately 1:6, meaning
we're wasting about 40% of our bits in unused branches.

Having other functions would also be nice - a SUB function in addition to ADD,
also AND and OR and the ability to DROP things off the stack would also be useful.

It would be difficult (not to mention slow) to introduce these features to our existing Jorvik-
3 architecture, so we need another technology shift to Jorvik-4.

Instructions
Additional information is available on the instruction decoder on the CAR course pages.

The biggest advantage to the Jorvik-4 architecture over our previous architectures is that the
opcode is now 8-bits long. We've already discussed what a CPU should have, but what other
things are important in an architecture? Common sense and quantitative data tell us:

 Speed of execution
 Size of code in memory
 Semantic requirements (what the compiler or programmer needs).

Decoding Instructions

More on Decoding

Jorvik-1 through 3 used unencoded instructions - sometimes called external microcode. Each
bit of the function maps directly onto internal signals.

The Jorvik-4 architecture uses an encoded instruction format. The 8-bit opcode is just an
arbitrary binary pattern, of which there are 256 possible permutations to decide which signals
are to be applied internally. An internal circuit (the instruction decoder) does this, but wastes
time doing it. There are two decoding strategies to use with an instruction decoder, either
using internal microcode - a ROM lookup table that outputs signals for each opcode or
hardwired - a logic circuit that directly generates the signals from opcode inputs. Microcode
is easier to change, but hardwiring is usually faster.

Whether or not we use microcode or hardwiring, there is going to be a decoding delay.


Suppose an ALU operates in 20ns, but a decoder takes 10ns, that means each operation takes
a total of 30ns, or 33 million operations per second. However, with a clever design, we can
overlap instructions, so the decoder can start decoding 10ns before the ALU is due to end the
previous operation, meaning that the new instruction will be ready as soon as the ALU
finishes.

An encoded instruction format relies upon an opcode of n-bits, and each permutation is a
seperate function. However, it is often useful to sub-divide functions in the opcode. We also
need to consider the frequent requirements for data operands. We could assume that any
operands will be stored in the memory byte immediately following the
opcode. ADD and SUB are 1-byte instructions requiring only one opcode byte, but SET is a
2-byte instruction, requiring opcode and operand.

At the moment, Jorvik-4 is only an architecture - there is no defined instruction set. From our
earlier machines, we almost certainly need to include the following instructions:

Instruction Description

SET nn Push nn onto the top of the stack

ADD Add the top two values of the stack together

SUB Subtract the top two items together (NEXT - TOP)

BRANCH nn Branch back or forward by nn place unconditionally

BRZERO nn Branch back or forward by nn places if zero is detected

NOP No operation (normally included by convention)

STOP Stop execution

We can see here which instructions are 2 bytes long and which aren't.

Delays

In addition to the delay we saw introduced by the instruction decoder, there are also delays
introduced by memory latency. These include a delay from fetching an instruction
(instruction fetch, or IF), and also a possible additional delay caused by fetching an operand
(data fetch, or DF).

Our timing diagram for a SET operation may look like:

However, we can not assume that delays just add up to a final figure. Clocks usually have a
uniform clock period, so our operations must be timed with clock ticks.

Here, decode is given 15ns, rather than the 10ns it needs and the operation execution has
30ns, instead of 20, leading to wasted time. A more sensible clock period may be 10ns.

Now that we've taken into account memory latency, we need to reconsider our overlapping
time-saving technique. Only one fetch (instruction or data) can happen at once, so we need to
wait for one IF to finish before starting the next one, but we can overlap the IF with the
decode procedure and then overlap the decode as before. However, if the instruction is a two-
byte instruction and requires a DF also, we're in trouble. We can't start the next IF until that
DF has finished, and as the IF and decode operations take longer than the execution of
operation, we still end up waiting, negating the effects of our overlapping. Therefore, we
want to avoid 2-byte operations whenever possible.

SET is a frequent operation, but common sense tells us that frequent operations should be fast
and compact, so we should look at ways to improve the performance of this
command. Patterson & Hennessey tell us that at least 40% of constants are 4 bits or less, and
0 and 1 are very commonly used. We could improve matters in varying approaches:

 Maybe we could have a special short set instruction (SSET n as well as SET nn)
 A special "Zero" opcode could be used to initialise a zero onto the stack
 We could replace SET 01h, ADD with a special INC function.
 We could also replace SET 01h, SUB with a special DEC function.

If we decide to have a SSET function, we must consider how this would work in practice on
an 8-bit instruction set. If we have XXXX dddd, this gives us 4 bits of opcode and 4 bits of
data and SSET values 0-15. This is too restrictive, as we would only have 16 instructions
available, each with 4 bits of data.

However, we could only have dddd when XXXX = 1111, so opcodes 00h to EFh are
unaffected and we still have 240 possible opcodes, and then opcodes F0h to FFh give
us SSET 0h to SSET Fh. As F0h is SSET 0h, we don't need a special "initialise 0" function
any more.
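
A tiny sketch of this decoding rule (hypothetical Python helper, not part of the Jorvik materials): only the F0h-FFh block is treated as a short set; everything below it remains an ordinary opcode.

def decode(opcode):
    # opcode is an integer in the range 0x00 - 0xFF
    if opcode >= 0xF0:
        return ("SSET", opcode & 0x0F)   # F0h-FFh: short set of a 4-bit constant
    return ("NORMAL", opcode)            # 00h-EFh: one of the 240 ordinary opcodes

print(decode(0xF0))   # ('SSET', 0)  - doubles as the "initialise zero" operation
print(decode(0xF7))   # ('SSET', 7)
print(decode(0x31))   # ('NORMAL', 49)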

The Stack
We covered the concepts of stack-based computing in ICS. Let's consider that we want to
calculate the average of three numbers, X, Y and Z.

Instruction   Stack Content

SET X         X
SET Y         Y X
ADD           Sum1
SET Z         Z Sum1
ADD           Sum2
SET 3         3 Sum2
DIV           Average

Here, we've added yet another command - DIV, which divides. However, if DIV does 3 ÷
Sum2 instead of Sum2 ÷ 3, we have a problem.

Stack order affects a computation, so we should add commands that manipulate the stack.

 SWAP - this swaps the top two elements of the stack so X Y Z becomes Y X Z
 ROT - this rotates the top 3 items on the stack so X Y Z becomes Z X Y
 RROT - this rotates the stack in the opposite direction, so X Y Z becomes Y Z X
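
A tiny interpreter sketch (hypothetical Python, not part of the Jorvik materials) that runs the averaging sequence above, holding the stack as a list whose last element is TOP:

def run(program):
    stack = []
    for op, *arg in program:
        if op == "SET":
            stack.append(arg[0])
        elif op == "ADD":
            top, nxt = stack.pop(), stack.pop()
            stack.append(nxt + top)
        elif op == "DIV":
            top, nxt = stack.pop(), stack.pop()
            stack.append(nxt / top)      # NEXT / TOP, so the divisor must be on top
        elif op == "SWAP":
            stack[-1], stack[-2] = stack[-2], stack[-1]
    return stack

X, Y, Z = 4, 8, 12
print(run([("SET", X), ("SET", Y), ("ADD",),
           ("SET", Z), ("ADD",), ("SET", 3), ("DIV",)]))   # [8.0], the average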

Re-using Data Operands

Say we have the formula (A - B) ÷ A, we can see that A is used twice, but in our current
instruction set, this would require 2 fetches of A into our stack, and as every fetch introduces
a delay penalty, this might not be the optimal solution. Using our current instruction set, an
implementation of this may use 9 bytes of storage and 3 fetch delays.

We could implement a DUP operator which duplicates the top of the stack.

Instruction   Stack Contents

SET A         A
DUP           A A
SET B         B A A
SUB           Sum1 A
DIV           Result
STOP

This requires 8 bytes of storage and only 2 fetch delays.

Another useful command would be the DROP instruction, to throw away the top stack item,
so X Y Z becomes Y Z, for example.

Another approach to stacks is orthogonality and scalability, of which more information is
available in the original lecture slides.

Branches
All architectures have some form of branches. We could use them for:

1. skipping over parts of a program - normally conditional
2. jumping back repeatedly to a certain point (i.e., a loop) - normally conditional.
3. moving to another part of the program - normally conditional

In cases 1 and 2, the branch will probably be quite short. In case 3, the branch distance could
be quite large. According to Patterson and Hennessy, 50% of branches are ≤ 4 bits (based on
3 large test programs). These are almost certainly mostly types 1 and 2.

If all branch operations were 8-bit, we'd need 2 bytes for every branch and 2 fetches.
However, if we have a short 4-bit operand, then 50% of branches will be 1 byte only.

The most common type of branch is a relative branch - these are normally used within loops.
They test a condition and loop back if true. Branches are used to skip over code conditionally.

Long Branches

Long branches are used to jump to another, major part of a program. Normal branches have a
limited range, of about ±2^7 (7 bits and a sign bit), so long branches use a fixed (absolute) 2-
byte address (16 bits). Long branches get relatively little use, compared to other branching
methods. Long branches have been replaced in a lot of cases by procedural code.

Indirect Branch

A memory address is given in the branch opcode. That memory address contains where to
jump to, rather than the opcode containing the target directly. e.g., IBranch 06h would jump to the address stored at
location 06h.

Such branching schemes are specialised (sometimes called a jump table), but can speed up
cases where there are multiple outcomes to consider (e.g., a 'case' in higher level
programming languages). IBranch could also be implicit, using the value on the top of a
stack.

ALU
The ALU is the Arithmetic and Logic Unit. It is capable of monadic (1 operand) and dyadic (2
operand) operations. The ALU requires operands and a function code (to tell it what to do).

The ALU is used for performing calculations (some are floating point, others integer only),
manipulating groups of bits (AND, OR, XOR, NOT) and for testing conditions (test-if-equal,
test-if-greater, test-if-less, test-if-zero, etc).

The ALU also has a carry flag which occurs when arithmetic takes place if the result is too
large to fit in the available bits. The carry flag "catches" the final bit as it falls off the end.

Software vs. Hardware

A hardware multiply instruction may be very fast (1 cycle), but requires complex circuitry. It
can also be done in software using two methods:

 repeated addition: 10 × 3 = 10 + 10 + 10
 shift and add: 10 × 3 = (3 × 0) + (30 × 1)

Calls, Procedures and Traces


Programs consist of often repeated sequences, or 'routines'. An in-line routine is coded at
each point where it is needed - this is inefficient on memory use.

In a real program, a common routine may be done 100s of times, so doing it in-line may
result in an increased possibility of bugs. Complicated procedures may take 10s or even
100s of lines of instructions.

How can we avoid this repetition? One answer is to branch to one part of the program and
then branch back when finished. If we store this return value (where to go back to)
somewhere, we know where we'll be going back to.

To implement this in our architecture, we need new instructions:

 CALL nnnn - this stores the current pc somewhere and branches to the specified address
 RETURN - this gets the stored pc value and branches back to that address (the stored pc value
should really be the instruction after the original CALL, otherwise you would end up in an
endless loop).

We now have a smaller program with greater consistency in code, however
the CALL and RETURN have overhead.

Hardware Considerations

To implement this, we need to consider how we will store the return value. One way is to just
have a second pc register, which allows single level depth, but this is limiting. Adding a stack
to the pc to allow multiple level calls (nested procedures) is more flexible. This is often called
a program stack or return stack.

Now we need to consider how our instructions work. CALL would push pc + i (where i is the
length of the call instruction - i.e., with a 16-bit address this would be pc + 3. Many
architectures automatically increment the program counter after the instruction fetch anyway
so often this isn't something to worry about) onto the stack and put the new address (operand
of CALL) into the pc register. The machine then automatically goes to and executes the
address in the pc.

For the RETURN instruction, the value on the top of the stack is popped onto the pc and the
machine automatically follows this new instruction.
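
Extending that idea as a hypothetical Python sketch (not the Jorvik hardware itself), CALL and RETURN can be modelled with a separate return stack holding saved program counter values:

def run_with_calls(program):
    pc, return_stack = 0, []
    while pc < len(program):
        op, *arg = program[pc]
        if op == "CALL":
            return_stack.append(pc + 1)   # save the address of the next instruction
            pc = arg[0]                   # jump to the routine
            continue
        if op == "RETURN":
            pc = return_stack.pop()       # pop the saved pc and branch back
            continue
        if op == "STOP":
            break
        if op == "PRINT":
            print(arg[0])
        pc += 1

run_with_calls([("CALL", 3),             # 0: call the routine at index 3
                ("PRINT", "back"),       # 1: runs after the RETURN
                ("STOP",),               # 2
                ("PRINT", "in routine"), # 3: the routine body
                ("RETURN",)])            # 4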

Traces

Traces are a useful hand debugging tool, especially for debugging loops. You act as the CPU,
listing each instruction in order of execution, alongside the stack contents. From this you can
check that the program behaves as expected, and possibly identify bugs in the architecture.

Long vs. Short

Usually CALL instructions have 16-bit (or longer) operands, however important routines -
sometimes referred to as 'kernel' or 'core' routines - get called very often. These could be I/O
procedures such as reading a byte from the keyboard or updating a display character or low-
level routines such as multiply (if it's not implemented in hardware).

We could implement a 'Short Call' instruction, like SSet and SBranch. However, this gives us
only 16 possible addresses. Each number should represent a chunk of memory, not a single
location, however - otherwise each routine could only be 1 byte long.

Now our instruction set looks like this:

 F0 - FF SSet
 E0 - EF SBZero
 D0 - DF SBranch
 C0 - CF SCall

SCall 01 might call address 010000b, SCall 02 address 100000b, etc... - each address is bit-shifted so
the routines can be more than 1 byte long. If these addresses are at the beginning of memory, then SCall
00 is a software reset feature, as it runs the code at the very beginning of memory, rebooting
the system.

So, implementing this in our new machine we get the final Jorvik-5, a dual-stack architecture.

However, additional hardware is required in the program counter path to figure out the new
program counter value when incrementing in a call.

Addressing Modes and Memory

Addressing Modes were also covered in ICS.

An addressing mode is a method of addressing an operand. There are many modes, some
essential, others specific. We have already used several addressing modes, consider:

 SET nn - the operand is immediately available in the instruction, this is called immediate
addressing.
 BrZero nn - the address we want is nn steps away, relative to the present pc value - this is
relative addressing.
 LBranch nn - the address we want is nn itself - this is absolute addressing.
 Skip - this is implicit, there are no operands

Implicit

This is typical with accumulator or stack-based machines. The instruction has no operands
and always acts on specific registers.

Stack processors are highly implicit architectures.

Immediate

Here, the operand is the data, e.g., Set 05 or SSet 6h (here, the operand is part of the opcode
format).

The Jorvik-5 could make better use of immediate instructions, e.g., ADD nn adds a value
directly to the top of the stack, instead of needing SET nn, ADD. This operation is a common one, so it may be
worthwhile to implement this, although it does require hardware changes. We can justify this
change if:

 it occurs frequently
 it improves code compactness
 it improves overall performance

In our current Jorvik-5 architecture, an instruction might require 70 ns to execute. In our
new design, each instruction now takes 80 ns to execute, but constants can be added with
only one instruction. Taking into accounts fetches, our immediate add is now 30% faster, but
we have a 14% slowdown for all other instructions.

We need to consider whether this trade off is worth it. If we plot a graph of speed-up against
frequency of immediate adds, we need 30% of instructions to be immediate adds for the
trade-off to break even, and 50% to gain even a modest 10% improvement.

Absolute

This form of addressing is very common.

To implement absolute addressing in our architecture, we need some new instructions:

 CALL nn - we've already covered this
 LOAD nn - push item at location nn onto the stack
 STORE nn - pop item off the stack into location nn.

Relative

This is commonly used for branches and specialised store and load. The operand is added to
an implicit register to form a complete address.

This is useful for writing PIC - position independent code. This kind of code can be placed
anywhere in memory and will work.

Indirect

This addressing mode is more specialised and less common. The operand is an address,
which contains the address of the data to be accessed (an address pointing to an address).
This is sometimes called absolute indirect.

Register Addressing

This is common in register machines. Here, registers are specified in the operands, e.g., ADD
A B. Stack machines also have registers, mainly specialised ones (e.g., the pc or flags such
as carry and zero).

Register Indirect

This is a mixture of register and indirect addressing. Here, the register contains the memory
location of data and then it works like indirect addressing.

High-Level Languages
High-level languages were developed in response to the increasing power of computers. It
was becoming harder to write correct programs as the complexity of the instruction set
increased. High-level languages come in two main types:

 Compiled to target - high-level language statements are translated into machine code and the
program is executed directly by the hardware. This method can generate large programs.
 Compiled to intermediate form - high-level language instructions are compiled into an
intermediate form (bytecode) and these intermediate forms need to be executed by an
interpreter. This method tends to produce machine-independent compact programs, but the
interpreter also needs to be present.

The main requirement for an interpreter to be supported at machine level is the ability to
execute one of many routines quickly. Each 'routine' defines a function in the language
semantics. IBranch allows us to branch to one of many subroutines, so we could implement
an interpreter control loop with this. Each bytecode represents a location in the jump table, so
we can use this to decide where to jump to to implement the instruction.
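
A sketch of that control loop in hypothetical Python (the dictionary stands in for the jump table that IBranch would index):

def op_add(stack):
    stack.append(stack.pop() + stack.pop())

def op_dup(stack):
    stack.append(stack[-1])

jump_table = {0x01: op_add, 0x02: op_dup}   # bytecode value -> implementing routine

def interpret(bytecodes, stack):
    for code in bytecodes:
        jump_table[code](stack)             # dispatch through the table, like IBranch
    return stack

print(interpret([0x02, 0x01], [5]))         # DUP then ADD gives [10]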

Typical languages consist of statements which we need to support in our machine
architecture, such as:

 Variables and assignment
 Conditional statements (if-then-else)
 Case (switch, etc).

Managing Variables

A few values can be kept on the stack or in registers, however most high-level languages have hundreds of values to deal with. Registers and stacks are too small to support this, so we have to use main memory. We could store our variables at fixed addresses (e.g., X is 1000, Y is 1001, etc), but this is difficult to manage and wasteful, as not all variables are needed at all times. We could instead store these values on another stack, so when they are created they are pushed onto the stack and when they are finished with they can be dropped off. This type of implementation is known as a stack frame.

Stack Frames

Stack frames are needed for variables, as these need to be created and deleted dynamically and we have to keep variables in memory. Arrays are also variables and as such have the same issues, but they can contain many separate data items. We need to consider how to manage the storage of these items effectively.

We also need to consider the program structure, which might consist of procedures and
nested calls, recursion and interrupts. We can also have multiple instances of a procedure as
multiple calls to the same procedure can co-exist.

Stack frames help as variables can be added to, or removed from, a stack. The stack resides in
memory, but it can be located anywhere. For an array, the stack can hold a few, or many,
bytes just as easily. As for the program structure, the stack frame can store the context of
each procedure - recursive calls to procedures cause the preservation of the previous state.
Multiple instances can also be implemented in the same way, where each call has its own
state on the stack.

If we assume that we have a new third 'allocation' stack in Jorvik-5, then when a procedure is
called, the first thing that happens is that variables are declared and created on the stack. If
we had a procedure with 2 variables, A and B and an array X of length 4, then A, B, X[0],
X[1], X[2] and X[3] are pushed onto the stack, on top of any data already on the stack. If our
program was recursive, then we could get many stack frames on top of each other, and each one will maintain the correct state. Nesting is essentially the same principle, but a different procedure is called.

Our stack architecture has to place data on the stack to do computations, but the variables are
held in memory - how do we get them onto the stack? A way to accomplish this is using
indirect addressing. If we know a 'top-of-frame' address, then we can work out the address of
a variable in memory.

We need another register to store a pointer to the top of the stack frame, which can be pushed onto the main stack for the computation of indirect addresses. Because of the regularity of this process, we could have an indirect load instruction (LOAD (TOF)+02), if we figure it increases performance, however this is deviating from the stack machine philosophy. This kind of instruction does exist in register-based CPUs, such as the 68000.
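A small sketch of the idea (the frame layout, offsets and memory size here are assumptions for illustration): each variable lives at a fixed offset from the top-of-frame pointer, and its address is found by indirect addressing:

```python
# Sketch of accessing frame variables via a top-of-frame pointer (the frame
# layout, offsets and memory size are assumptions for illustration).

memory = [0] * 64          # toy main memory
TOF = 32                   # top-of-frame register: start of the current frame

# Assumed layout: A at offset 0, B at offset 1, array X[0..3] at offsets 2..5.
def load_local(offset):
    return memory[TOF + offset]        # indirect addressing: TOF + offset

def store_local(offset, value):
    memory[TOF + offset] = value

store_local(0, 7)          # A := 7
store_local(2 + 3, 9)      # X[3] := 9
print(load_local(0), load_local(5))    # -> 7 9
```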

In high-level languages, there is more to variables than this, however. We have currently implemented local variables, which are created when a procedure starts, destroyed after the return, and can only be accessed by the procedure that creates them. However, we also have to consider global variables, which are usually created before the main procedure and can be accessed by any procedure at any time. So, our frame stack has to be accessible in two parts, one for globals and one for our current procedure (locals).

We can use a stack to store our pointers, so when we return, our stack frame can be restored.
With this stack of pointers, we can access variables from higher levels.

Conditional Statements

To deal with conditional statements, we put the values to be compared on the stack, and then use a test operation (which sets the 0 flag if the test is true); we can then do a BrZero.

Case Statements

Here, an Ibranch is used with a jump table.

Arrays

Arrays are typically placed in memory (in a stack frame or elsewhere) and access to the array is computed by using indexes to work out the offset from the array start.
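For example (the base address and element size below are assumed values):

```python
# Sketch: computing the address of an array element from its index, as
# described above.  Base address and element size are assumed values.

BASE = 0x0100        # assumed address of the start of the array
ELEM_SIZE = 2        # assumed size of one element, in bytes

def element_address(index):
    return BASE + index * ELEM_SIZE   # offset from the array start

print(hex(element_address(0)), hex(element_address(3)))  # -> 0x100 0x106
```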

Compilers

As we stated earlier, compilers turn high-level language statements into low-level machine
code by recognising statements in the high-level language such as if-then and case and then
outputting predefined machine code fragments.

Based on the examples given in the lecture slides, we could write a compiler for the Jorvik-5, and with a little work a program could be written to recognise common statements and output Jorvik-5 machine code. There are various complications and refinements to be made, however, but the basic idea is there.
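As a very rough sketch of that idea (the mnemonics and label scheme below are assumptions for illustration, not the actual Jorvik-5 instruction set), a code-fragment emitter for one statement shape might look like:

```python
# A rough sketch of recognising one statement shape and emitting a predefined
# fragment of stack-machine code.  The mnemonics (LOAD, SUB, BrZero, Branch,
# SET, STORE) and the label scheme are assumptions for illustration, not the
# actual Jorvik-5 instruction set.

def compile_if_equal(var_a, var_b, then_code, else_code):
    """Compile 'if (a == b) then ... else ...' into a list of instructions."""
    return [
        f"LOAD {var_a}",        # push a
        f"LOAD {var_b}",        # push b
        "SUB",                  # a - b: zero (0 flag set) when they are equal
        "BrZero then_label",    # branch to the 'then' part if equal
        *else_code,
        "Branch end_label",
        "then_label:",
        *then_code,
        "end_label:",
    ]

print("\n".join(compile_if_equal("A", "B",
                                 ["SET 01", "STORE X"],
                                 ["SET 00", "STORE X"])))
```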

Interrupts
An interrupt occurs when a device external to the CPU triggers an interrupt request line
(IRQ). The CPU responds with an IRQ acknowledge when it has processed the interrupt. An
interrupt is a request from a device to the CPU to perform a predefined task, called the
Interrupt Service Routine (ISR).

However, if there are several devices, things get more complicated. One option is to have an
IRQ line for each device, but this leads to lots of CPU pins. An alternative is to allow all
devices to share an IRQ line, but then we have to consider how to safely share one line.

Shared IRQ Approach

We have to somehow combine the IRQs from each device to signal an interrupt request. A
tri-state OR gate can be used for this, however if more than one device triggers an interrupt,
how does the CPU know which device needs servicing?

We can use the idea of interrupt prioritisation to work round this. A device has the idea of
priority, and can only raise an interrupt if there are no more important devices requesting one.
A method of implementing this is using daisy chaining, where a more important device in the
chain lets the one directly below it know whether or not it (or any devices above it) want to
use the interrupt, so the permission is passed down the chain. A single acknowledge line is
used, with each device connected in tri-state. Only the device that generated the input pays
attention to it.

However, we still have the problem of knowing which device initiated the interrupt, and
which service routine to run.

Vectored Interrupts

To implement this, we can use vectored interrupts:

1. A device triggers an interrupt
2. The CPU acknowledges for more information
3. The device issues a byte on the data bus
4. The CPU reads the byte
5. The byte can be an opcode, or vector address (jump table) - the exact details are architecture
specific.

In Jorvik-5, we can use opcode interrupts, where the byte issued acts as a short call to a subroutine containing the I/O handling. Each device issues a different bit pattern, and hence a different opcode. Negotiation is needed to ensure the correct subroutine is called.

Fully-Vectored Interrupts

If a single opcode is more than 1 byte, then supplying opcodes in this way is tricky. A vector address contains part of the address of a subroutine - e.g., an 8-bit vector (the VPN - vector page number) is combined with another 8-bit number (the VBA - vector base address) to give us the full address. However, to make full use of the 8 bits we can only have 1 location per routine, which is not enough to be useful. Real systems therefore use an interrupt vector table. The VPN is shifted so that each table location is 2 bytes long, and combined with a 7-bit VBA. We can then use indirect addressing for our ISR, as the address of the actual instructions is held in the 16-bit memory location given.
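A sketch of how such a vector table address might be formed (the bit widths, base value and byte order below are assumptions for illustration, not the real Jorvik-5 scheme):

```python
# Sketch of forming a fully-vectored interrupt address.  The device supplies
# an 8-bit vector page number (VPN); shifting it makes each table entry 2 bytes
# wide, and a 7-bit vector base address (VBA) supplies the top bits.

VBA = 0b1111111                     # assumed 7-bit vector base address

def vector_table_entry(vpn):
    """16-bit address of the vector table entry for an 8-bit VPN."""
    return (VBA << 9) | (vpn << 1)  # each entry occupies 2 bytes

def isr_address(memory, vpn):
    """Indirect step: the 2-byte table entry holds the 16-bit ISR address."""
    entry = vector_table_entry(vpn)
    return (memory[entry] << 8) | memory[entry + 1]

mem = bytearray(65536)
mem[vector_table_entry(3)] = 0x12      # high byte of the ISR address
mem[vector_table_entry(3) + 1] = 0x34  # low byte
print(hex(isr_address(mem, 3)))        # -> 0x1234
```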

There are also other considerations we should make. We never know when an interrupt is going to be called, so the present state of the CPU needs to be preserved when the interrupt is called, and the ISR also needs to leave the CPU in the exact state it was in before it started, otherwise program crashes may occur.

History of Computing
Ancient History

Abacuses were the most basic form of computing, and were thought to be invented by the
Babylonians or the Chinese in 1000-500 BC. Common abacuses use base 5 or 10, and binary
did not become the standard until the 1940s. In computing, binary is more stable and less susceptible to noise (being either on or off, not lying inside certain band limits).

Charles Babbage (1791-1871)

Babbage is widely recognised as the father of computing and is most famous for designing
the difference engine and analytical engine.

Difference Engine

The difference engine was commissioned by the British government to provide accurate
logarithmic, trigonometric and other tables for their seafaring merchant and naval vessels. It
could be considered to be an application specific computer (ASC) (i.e., optimised to perform one specific task). It worked by evaluating polynomials using the method of finite differences.

One of the main aims of the project was to cut out the human computer, thus removing any
errors, therefore, the system needed to be able to print out the results to avoid any
transcription errors - much like current times with automatic code generation.

The Difference Engine was constructed from a set of mechanical registers (figure wheels)
storing differences Δ0 yi, Δ1 yi, ... and adding mechanisms and had eight storage locations
with 31 decimal digits.

High levels of accuracy were required to minimise the effect of accumulated errors, i.e., 31 decimal digits is equivalent to 103-bit arithmetic. Fractional numbers were multiplied by a constant, i.e., scaled to fit within the machine's integer representation, and negative numbers were computed using complement notation.

To save cranking, and to speed up operations, the Difference Engine could be set for parallel
"pipelined" operation. The maximum speed of the Difference Engine was 10 calculations a
minute, or a result every 6 seconds. If an attempt was made to run the system faster than this, the momentum of some parts caused the transmission mechanism to bend, resulting in jamming or broken gear teeth.

In summary, the difference engine had register storage, addition with a ripple carry
mechanism, a pipelined architecture and was capable of parallel operations.

Analytical Engine

Charles Babbage designed his analytical engine, which was the first programmable general
purpose computer (although not stored program), however he never built it, due to lack of
funds. It used microcode for complex operations. It was a lot bigger than the difference
engine, and some designs went as far as 50 registers and 40 digit precision.

The main innovations were the separation of the storage and calculation units, giving us the Store (V) and the Mill, containing accumulators (A). Operations were controlled by the programmable card (consisting of operations cards, variable cards and number cards), and this was based on the Jacquard loom. Babbage suggested using a printer, punched cards and a curve drawing machine as outputs, giving the machine both inputs and outputs.

The Analytical Engine used look-ahead carry, speeding up addition considerably (to a constant time, rather than a time linear in the length of the number), but 10 times more machinery was used for the carry than for the addition.

Electromechanical Machines (First generation relay and valve machines)

Konrad Zuse (1910 - 1995)

After Babbage's death, no serious attempts were made to build a general purpose computer
until the 1930's. The first examples (the Z1 and Z3) were discovered after WWII and were
found to be built in Germany between 1936-1941.

The Z1 machine represented numbers in a 22-bit floating point format and the arithmetic unit
was an adder only (therefore subtraction could occur using complement notation) and all
operations had to be reduced to addition. However, it was not very reliable, so the Z3 was

constructed, which was made entirely out of relays and was programmed using punched tape
and a user console.

The Z3 used microcode implemented using a micro sequencer constructed from stepwise relays, and instruction overlapping was used (see earlier in the notes - an instruction is read whilst the result of the last one is being written). A look-ahead carry circuit was constructed from relays, which allowed fast addition to occur.

Harvard Mk I

The Mk I was developed by Howard Aiken and was built by IBM between 1939 and 1944. It
was also known as the IBM Automatic Sequence Control Calculator (ASCC). It was
constructed from electromechanical relays, was 55 feet long, 8 feet high, 2 feet deep and
weighed 5 tons, making it relatively slow and very noisy. Numbers of up to 23 decimal places
could be represented using electromagnetic decimal storage wheels - 3000 of which were
present in the machine. The Harvard Mk I was used by the US Navy for gunnery and ballistic
calculations until 1959 but it was out of date by the time it was commissioned.

Aiken went on to develop the Harvard Mk II, III and IV using valve technology, and he also developed the concept of the Harvard computer architecture, which uses physically separate instruction and data memory. This allowed the next instruction to be read whilst the previous data was being written to memory, and the widths of the address and data buses could be different and optimised to the appropriate size. However, two memory modules were now required and storing programs inside the instruction memory could be difficult.

This is used in modern systems, where it is important to isolate the path between the
processor and memory to maximise performance. Nowadays, a dual independent bus (DIB) is
used. This replaces the system bus with a frontside bus (FSB), connecting the system memory
(via the memory controller) to the CPU, and also to the other buses. The backside bus (BSB)
is used to provide a fast direct channel between the CPU and the L2 cache.

ENIAC

ENIAC (Electrical Numerical Integrator and Computer) used valve technology instead of
relay technology, and consumed 180 kW of power. It was commissioned by the US Army to
calculate firing tables for specific weapons for a range of environmental and weather
conditions. It was completed in May 1944, and has a strong claim to be the first ever general
purpose electronic computer (although this is a topic of much debate). It was developed by a
team led by JP Eckert and JW Mauchly and was decimal, using over 18,000 valves and
weighing 30 tons. It was much faster than anything ever built previously, and multiplication
could occur in under 3 ms. It was described as being "faster than thought".

ENIAC was not a stored program computer, and needed rewiring if it was to do another job. For each program, someone needed to analyse the arithmetic processing required and prepare wiring diagrams - a process which was time consuming and prone to error. This led people such as John von Neumann to develop the idea of storing programs inside the computer's memory - von Neumann originally envisioned programs which could modify and rewrite themselves. This innovation was a major factor that allowed computers to advance. This architecture was called the von Neumann architecture.

The von Neumann architecture used the same memory for instructions and data. The advantages of this were that a single memory module could be used, minimising the number of pins and buses. Instructions and data were treated equally, allowing data to be easily embedded into a program. However, this required the bandwidth of the memory bus to double, and the optimisations made by having separate data and address bus sizes were lost.

Manchester Baby

This is sometimes referred to as the Small Scale Experimental Machine and it was the first
ever electronic binary computer which executed its first stored program on June 21st 1948. It
was created by a team at the University of Manchester between 1947 and 1949 consisting of
Turing, Williams, Kilburn and Tootill. It had a 32-bit word length and used serial binary
arithmetic using two's complement integers. It had a single address format order code and a
random access main store of 32 words - extendable up to 8192 words. The computing speed
was about 1.2 ms per instruction. It used Williams' idea of data storage based on cathode ray
tubes (a Williams-Kilburn Tube) as the main store. This CRT had a big advantage over
existing memory (delay line memory) as it allowed fast, random access to short strings of bits
(i.e., 20-bit or 40-bit). The bits were stored as electrical charges on the phosphor of the CRT
and had to be refreshed roughly every 0.2 s.

Delay line memory worked by electrical pulses being converted to sound pulses and then
being transmitted down a long tube of mercury. A sufficient delay allowed a number of bits
of data to be stored before the first bit was received and then re-transmitted. This was slow,
serial and didn't allow random access to memory.

The Manchester Mark I had the first hard disk - two magnetic drums used for backing store. It was relatively slow, but had 16 times the storage of the CRT memory. The Manchester Mark I was the first to use a paging type memory store, where memory was divided into pages, each one containing 32 40-bit words (the size of a basic Williams-Kilburn tube), which was used as the unit of magnetic drum storage. In the Manchester Mk I, there were 4 pages of RAM main store and a 128-page capacity drum backing store.

Each 40-bit addressable line could hold either 1 40-bit number, or 2 20-bit instructions.

EDSAC

EDSAC (Electronic Delay Storage Automatic Calculator) was built by Cambridge University in 1949. EDSAC stood out from its contemporaries as it added microcode to the architecture, rather than the previous method of hardwiring the control logic.

Collectively, these machines are referred to as the first generation computers and they share a
similar system architecture. A typical example of the first generation machines was the IAS -
a computer designed by von Neumann. The other major architecture of the time was the
Harvard architecture.

This generation introduced isolating the processor from the IO with a dual independent bus architecture, replacing a single system bus with a frontside bus (FSB) and backside bus (BSB).

Second Generation Computers (Early Transistor)

The second generation of computing brought a change from valve to transistor and lasted
between 1955 and 1964.

William B Shockley has been termed the "father of the transistor". He joined Bell Labs in 1936 in the vacuum tubes department before moving to the semiconductor department. The transistor is equivalent to a three-terminal electrical switch (triode). Early transistors - point contact - were difficult to manufacture and were also unreliable. A few months later, work started on the junction transistor. Compared to valves, these were smaller, more reliable, faster and required less power.

The first experimental transistor computer was built in Manchester in 1953. It used 200 point contact transistors, 1300 point diodes and had a power consumption of 150 W. Main memory was implemented using a drum store, meaning that the transistor computer was slower than the Mark 1. The design was used as the basis of the MV950 in 1956, which was built using the more reliable junction transistors.

The next advance in this generation of computing was the non-volatile ferrite core memory.
Each core had a diameter of about 2 mm and two magnetising wires and one sense wire
passed through each one. The cores were arranged in a matrix such that each location was
addressable.

A current passing through one of the magnetising wires would set a core magnetisation which
could then be detected. Passing a current through the other wire would reset the
magnetisation, giving either 1 or 0.

Many computers in this generation were transistor/valve hybrids, or just transistor implementations of the older valve computers. Another key part of this generation was the development of index registers and floating point hardware, the use of dedicated IO processors and hardware to supervise input/output operations, high level programming languages such as COBOL and FORTRAN, and large scale computer manufacturers also supplying compilers and software libraries with their machines.

Third Generation Computers

The IBM System/360 is the system which defines this generation of system - the classic
mainframe computer system.

This machine used Solid Logic Technology (SLT), which was a transition between discrete
transistors and integrated circuits. The System/360 also introduced an early form of
pipelining to increase machine performance.

Concurrency is used to increase the number of operations that are happening simultaneously and there are two major approaches to this, temporal (the overlap of heterogeneous functional units) and spatial (the parallelism of homogeneous units). The System/360 used temporal concurrency, known as pipelining, for its main performance increases, however spatial concurrency (parallelism) was used in some areas, such as the addition, multiply and divide units. Dependencies could also come into play here, for example, one unit may be waiting on the result of another, so some kind of mechanism must be used to stop register source/sink problems - the System/360 used highly specialised dynamic scheduling hardware.

The System/360 used a microprogrammed control unit (from EDSAC), where each processor instruction is interpreted by a sequence of microinstructions called a microprogram.

The success of the transistor and other advances in solid state physics provided the
foundation for another new technology - ICs, or integrated circuits. In 1958, Jack St. Clair
Kilby whilst working at Texas Instruments succeeded in fabricating multiple components
onto a single piece of semiconductor and hence invented the IC. In 1963, Fairchild invented a
device called the 907 containing two logic gates constructed with 8 resistors and 8 transistors.
SSI was born.

TTL (transistor-transistor logic) was the next step; with no resistors, a higher density was possible. CMOS (Complementary Metal Oxide Semiconductor) logic followed later.

The advent of cheap SSI, MSI and eventually LSI components allowed the development of
minicomputers, such as the PDP-11, first developed in 1970.

Computers made from ICs could be generalised as the third generation of computers, and dominated from 1965-1971. In addition to the move from discrete transistors, semiconductor memory came into common use and microprogrammed CPUs became much more common. Multitasking operating systems were also invented.

Fourth Generation Computers

In 1969, Marcian "Ted" Hoff, working for a small company producing memory IC called
Intel, put a general purpose processor onto an IC and called it the 4004. It was a 4-bit
processor in a 16-ping package (at the time, Intel could only produce 16-pin DIPs). The lower
pin count meant a multiplexed address bus had to be used, simplifying the PCB as it required
fewer connections, but adding complexity due to the multiplex logic.

Computers designed using this technology are referred to as fourth generation machines, and
this became the dominant generation between 1971 and today. Here, circuits changed to
VLSI and development became focussed on complicated "system on a chip" architectures.

Processor Design
Design Levels

Processor design requires consideration of all aspects of operation (software and hardware).
For example, from a software point of view, we have A + B (High Level Language), which
becomes Add A, R1 (Instruction Set Architecture, ISA) and then 01010100 (ISA
implementation).

Processor design is based on levels of abstraction. As high level languages hides the details of
the ISA from the programmer, the ISA hides the hardware implementation from the compiler.

At the hardware level, we can abstract further to give us three levels, the processor level
(CPU, memory, IO and how these relate to each other), the register level (combinatorial
logic, registers, sequential circuit) and gate level (logic gates, data buses, flip-flops, etc).

These classifications are hierarchical and lend themselves to a top-down design approach. A typical design process might be:

1. Specify the processor level structure of the system that implements the desired ISA
2. Specify the register level structure of each distinct component identified in step 1
3. Specify the gate level structure of each distinct component identified in step 2

The first thing that needs to be done is to choose an ISA for the desired application (based on operand access):

 Stack based machines might be a Transputer T800
 Accumulator machines might be based on EDSAC or a 6502
 Register-memory machines might be based on the IBM 360 or 68000
 Register-register machines might be based on the MIPS or DLX
 Memory-memory machines might be a DEC Vax.

The key to a good processor design is to optimise performance based on instruction level
simulations.

The next step is to define the ISA. This can be done in programs such as CPUsim, where the ISA is defined as a sequence of microinstructions defining register level transactions.

A processor's performance can then be evaluated in software against benchmarks before a CPU is implemented in hardware, allowing different architectures to be compared.

When top-down design is used, components should be as independent as possible, allowing components to be designed and tested independently. Interconnection should be minimised where possible. Components that have high cohesion and low coupling will help design and testing. Additionally, component boundaries should correspond to physical boundaries; each design component should be a physically replaceable component, e.g., a functional IC or PCB.

The processor level can be represented in a graphical way using a notation called PMS (processor, memory, switches) and block diagrams can be done using seven basic types:

 Processor (P), e.g., CPU, IOP
 Control (K), e.g., program control unit
 Data operation (D), e.g., arithmetic unit
 Memory (M), e.g., main, cache memory, etc
 Transducer (T), e.g., I/O devices
 Links (L), e.g., I/O port
 Switch (S), e.g., multiplexer, crossbar switching network

The symbol could also be qualified by a subscript, e.g., PIO may be a DMA controller.

VHDL

Before VHDL, boolean equations and logic were used to define a processor, similar to IDD/DAD. However, as processors became increasingly complex, schematic capture systems similar to PowerView were used; as systems became bigger and more complex still, this also became unmanageable, so hardware design languages (HDLs) - a textual description of the desired hardware - were used.

VHDL as an acronym is a combination of very high speed integrated circuit (VHSIC) and hardware design language (HDL). It was commissioned by the Department of Defense in America in the early 1980s as an implementation independent method of describing electronic systems. VHDL was based on Ada.

VHDL was primarily designed as a hardware modelling/specification language. It is used in all stages of design, from system level modelling of the specification down to modelling the timing characteristics of implemented systems. Automated synthesis tools can be used to turn VHDL into an IC - i.e., these tools construct and optimise a gate level design into an implementation.

As we design on different levels of abstraction, VHDL uses different styles depending on which level we're using. The two main styles are behavioural, which describes the function of the system as a black box, where input and output timing relationships can be described in detail but with no indication of internal architecture (the system level), and register transfer level (RTL), where a design that is to be implemented physically on silicon has to be designed as a series of registers, interconnection signals and logic. This level describes the implementation's architecture (register level).

Implementations that are to be physically realised have to be written in the RTL style.
Although tools can convert behavioural to RTL, this does not generally produce an optimal
design.

 Memory Architectures
 Cache Memory
 CISC
 RISC

Parallelism
To maximise software performance, a hardware architecture should be chosen to allow as many operations as possible to be performed in parallel. If the desired functionality can be decomposed into parallel processes, significant performance improvements can be made. However, this is based on the assumption that performing operations in parallel will require a shorter time than performing the same operations sequentially. Although this is essentially true, there are two key limiting factors in the performance of parallel architectures:

 The communications overhead, if data needs to be passed between processes. The Minsky
conjecture states that the time wasted in synchronising and passing data between processes
will reduce the system's performance.
 The parallel nature of the intended function is also a limiting factor (Amdahl's Law)

Amdahl's Law

Amdahl's Law gives us the speedup for p parallel processes, and is Sp = Ts/Tp, where Ts is the
execution time of the algorithm on the sequential system and Tp is the execution time of the
algorithm on a parallel system using p processors.

However, this is for an ideal algorithm - not all algorithms can be completely parallelised. If
α is the fraction of the algorithm that can not be sped up in parallel with other processes, then:
Tp = αTs + ((1 - α)Ts)/p, hence Sp = p/(1 + (p - 1)α). By setting the number of processors to infinity, we can calculate the maximum speedup of a system: Sp = limp → ∞ 1/(1/p + (1 - 1/p)α) = 1/α.

Amdahl's Law states that the maximum potential speedup is limited by the sequential part of
the algorithm that can not be parallelised.
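A small worked example of these formulas, assuming a serial fraction α = 0.1 (10% of the algorithm cannot be parallelised):

```python
# Worked example of the Amdahl's Law formulas above, assuming a serial
# fraction alpha = 0.1 (10% of the algorithm cannot be parallelised).

def speedup(p, alpha):
    """S_p = p / (1 + (p - 1) * alpha)"""
    return p / (1 + (p - 1) * alpha)

alpha = 0.1
for p in (2, 4, 16, 1024):
    print(f"p = {p:4d}: S_p = {speedup(p, alpha):.2f}")
print(f"limit as p -> infinity: 1/alpha = {1/alpha:.1f}")
```

Even with 1024 processors, the speed-up here stays just under the 1/α = 10 limit, illustrating how the sequential part dominates.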

Maximum speedup can only be achieved if a truly parallel architecture (a number of parallel processes operating independently of each other, with minimised inter-process communication) is constructed, minimising any sequential elements. The inter-dependencies that exist in the chosen algorithm determine if, and how, it can be partitioned into parallel processes.

Data Dependence

There are three main types of data dependencies that can exist between operations:

 Flow dependence - an operation (A) is dependent on another operation (B) if an execution path exists from B to A and if at least one output variable of B is used as an input by A.
 Anti-dependence, operation B is anti-dependent on A if B follows A in the execution order
and the output of B overlaps the input of A
 Output dependence: Two operations have output dependencies if they write data to the same
variable.

Resource Dependencies

These dependencies arise from sharing conflicts of limited resources within a system. Typical
examples are bus access, memory ports, register ports and functional hardware units. These
are naturally sequential units that can not be accessed by more than one process at a time.

This type of dependency can be overcome by additional hardware, which increases complexity and cost.

In general, processes can operate in parallel if they fulfil Bernstein's conditions. For an
operation A, with an input set of IA and an output set of OA, to operate in parallel with an
operation B and the corresponding sets IB and OB, the following must be true.

IA ∩ OB = ∅
IB ∩ OA = ∅
OA ∩ OB = ∅

The parallel relationship is represented by || and is commutative and associative, but not transitive.
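A minimal sketch of checking Bernstein's conditions with sets (the variable names are made up for illustration):

```python
# Sketch: checking Bernstein's conditions for two operations using Python
# sets for the input (I) and output (O) variable sets.

def can_run_in_parallel(i_a, o_a, i_b, o_b):
    return (not (i_a & o_b)        # I_A ∩ O_B = ∅
            and not (i_b & o_a)    # I_B ∩ O_A = ∅
            and not (o_a & o_b))   # O_A ∩ O_B = ∅

# A: x = y + z    B: w = v * 2      -> independent
print(can_run_in_parallel({"y", "z"}, {"x"}, {"v"}, {"w"}))   # True
# A: x = y + z    B: y = v * 2      -> B writes y, which A reads
print(can_run_in_parallel({"y", "z"}, {"x"}, {"v"}, {"y"}))   # False
```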

Concurrency

Concurrency can be implemented using spatial parallelism or temporal parallelism.

Spatial parallelism is when a number of processing modules are available to the controller.
These can be internal to the processor (e.g., multiple ALUs) or external (i.e., multiple
network cards, dual CPU motherboard).

Temporal parallelism is when the required functionality can not be divided into totally independent processes, but can be divided into a number of processes with flow dependencies between them. Each stage is dependent on the previous stage's output.

Pipelining

Pipelining is stealth parallelism - nobody realises it's there, but we use it every day. In
unpipelined operation, one operation must complete before the next begins, but in pipelined
operation, the next operation can start before the previous one finishes. For example, as soon
as the instruction fetch stage has completed, you can start fetching the next instruction, as the
memory is now not being accessed, but the decode stage is being used.

This allows for optimal use of the architecture, as hardware units are not left idle waiting on something else.

Pipelining is achieved by breaking up the stages of the operation into separate hardware sections separated by registers that can keep state. The addition of registers increases the time for an operation to be processed, but as the throughput increases, more operations can be processed at once, increasing speed. The registers pass state up the pipeline.

However, hardware can not always be split up into units taking the same time, one process
may take 15 ns for example, whereas another may only take 10. The clock speed is therefore
limited by the slowest unit and an attempt must be made to balance the stages. There are two
solutions to this, super-pipelining and a super-scalar architecture. Super-pipelining is splitting
up a stage into two distinct units. This is not always possible, so super-scalar architectures
can be used. Super-scalar architectures are when duplicates of the same hardware unit are
used, which allows two operations to be processed at once, therefore stopping a stall for a
particular hardware unit and increasing clock speed.

Deep pipelining is super-pipelining taken to the extreme, where the pipelines become very long (e.g., the Pentium 4 with a deep pipeline of 31 stages). This leads to increased latency, and a new limiting factor comes into play - register and routing delays.

The speedup of a pipelined system is dependent on the number of stages within the pipeline, i.e., the number of operations that can be overlapped. In general, the best-case pipelined execution time is the time for the non-pipelined system divided by the number of pipeline stages, i.e., the ideal speed-up equals the number of stages.

Hazards prevent the next operation from executing during its designated clock cycle, and can
occur when the results from a previous operation or other result currently in the pipeline is
required. This is called a sequential dependency and additional hardware may be needed to
handle these situations. There are various different types of pipeline hazards:

 Structural hazards - these arise from resource conflicts when the hardware cannot support all
possible combinations of instructions in simultaneous overlapped execution.
 Data hazards - these arise when an instruction depends on the result of a previous instruction
in a way that is exposed by the overlapping of instructions in the pipeline
 Control hazards - these arise from the pipelining of branches or other instructions that change
the pc.

Hazards have a very dramatic effect on the pipeline's performance, reducing or removing any speedup gained by pipelining. Eliminating a hazard often requires that some instructions in the pipeline are allowed to proceed while others are delayed (stalled). A stall causes the pipeline performance to degrade from the ideal performance.

A common measure of a processor's performance is the number of clock cycles required to execute an instruction (CPI). On a pipelined machine, the ideal CPI is 1, and the pipelined CPI can be given by 1 + pipeline stall cycles per instruction. We can then calculate speedup as the average time for a non-pipelined instruction / average time for a pipelined instruction, or (unpipelined CPI × unpipelined clock cycle time) / (pipelined CPI × pipelined clock cycle time). If we ignore the cycle time overhead of pipelining and assume all the stages are perfectly balanced, the cycle times of the two machines are equal, and the speedup becomes unpipelined CPI / (1 + pipeline stall cycles per instruction). If all instructions take the same number of cycles, equal to the number of pipeline stages (the pipeline depth), then the unpipelined CPI is equal to the pipeline depth.
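A small worked example of this, assuming a 5-stage pipeline and an average of 0.4 stall cycles per instruction:

```python
# Worked example of the formulas above, with assumed values: a 5-stage
# pipeline (so the unpipelined CPI equals the pipeline depth, 5) and an
# average of 0.4 stall cycles per instruction.

pipeline_depth = 5
stall_cycles_per_instruction = 0.4

pipelined_cpi = 1 + stall_cycles_per_instruction
speedup = pipeline_depth / pipelined_cpi
print(f"speed-up = {pipeline_depth} / {pipelined_cpi} = {speedup:.2f}")  # ~3.57x
```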

Structural Hazards

When a machine is pipelined, ideally we would want to be able to execute all possible
combinations of instructions within the pipeline. The overlapped execution of instructions
requires pipelining of the functional units and duplication of resources to allow parallel
operations to be performed. If some combination of instructions can not be accommodated
due to a resource conflict, the machine is said to have a structural hazard.

An example of this may be a single memory interface, where a LOAD operation may clash
with the IF stage of another operation. To resolve this, the pipeline can be stalled for one
cycle. The effect of a stall is to effectively occupy the resources for that instruction slot, i.e., a
NOOP.

Introducing stalls does, however, degrade performance, and for a designer to allow structural hazards to occur, there must be a good reason. One is to reduce costs - multiple port memory is more expensive than single port, and other structural hazards may similarly be solved by using more hardware. Another reason is to reduce latency - shorter latency comes from minimising pipeline hardware and registers.

Data Hazards

A side effect of pipelining instructions is that the relative timing between overlapped
instructions is changed. Data hazards occur when the pipeline changes the order of read/write
accesses to operands so that the order differs from the order seen by sequentially executing
instructions on the non-pipelined machines, e.g., accessing data before a previous operation
has finished writing it.

To solve this, you can detect when a data dependency is going to happen (the input of one operation is going to be the output of one already in the pipeline) and then stall all future instructions until the data is valid (i.e., revert back to non-pipelined operation), however this is a massive performance hit. Another method is using a delayed read technique; in this technique, register file read operations are performed in the second half of the cycle and writes in the first half. By then ordering/timing the instructions correctly, writes will have happened before reads.

There are other methods also; forwarding data means extra hardware is used to move register contents between stages, for example, from the stage where the data is written to the registers to the one where it needs to be read. Control logic decides where the correct value should be in the chain, and when a read is requested, the value is returned from the correct set of registers, instead of the ones for that stage. This forwarding method requires an additional three results on the ALU multiplexor and the addition of three paths to the new inputs.

Data hazards can be classified into three types. If we consider two operations, i and j, with i occurring before j:

 Read after write (RAW) - j tries to read a source before i writes it, so j incorrectly gets the wrong value. This type of data hazard is the most common and can be solved with forwarding and delayed read techniques.
 Write after write (WAW) - j tries to write to an operand before it is written by i. The writes end up being performed in the wrong order, leaving the value written by i, instead of the value written by j, in the destination. This hazard is present only in pipelines that write in more than one pipe stage or allow an instruction to proceed whilst a previous instruction is stalled.
 Write after read (WAR) - j writes a new value before it is read by i, so i incorrectly gets the
new value. This can not happen if reads are early and writes are late in our pipeline and this
hazard occurs when there are some instructions that write early in the instruction pipeline and
others that write a source late in the pipeline. Because of the natural structure of pipelines
(reads occur before writes), such hazards are rare and only happen in pipelines for CISC that
support auto-increment addressing and require operands to be read late in the pipeline.

By convention, the hazards are named after the ordering in the pipeline that must be
preserved by the pipeline. Additionally, there is technically a fourth hazard, read after read
(RAR), but as the state isn't changed here, no hazard occurs.

Another technique to overcome pipeline stalls is to allow the compiler to try to schedule the
pipeline to avoid these stalls by rearranging the code sequence to eliminate hazards.

Control Hazards

In general, control hazards cause a greater performance loss for pipelines than data hazards. When a conditional branch is executed, there are two possible outcomes - the branch not being taken and the branch being taken.

The easiest method to deal with a conditional branch is to stall the pipeline as soon as the branch is detected until the new pc is known; however, wasting clock cycles at every branch is a huge loss. With a 30% branch frequency and an ideal CPI of 1, a machine with branch stalls only achieves about half of the ideal speedup from pipelining. The number of lost clock cycles can therefore be reduced by two steps:

 finding out whether the branch is taken or not taken earlier in the pipeline
 compute the target pc, i.e., the address of the branch target earlier

This can be achieved by moving the zero test into the ID stage, making it possible to know the outcome of certain branches at that point, or by computing the branch target address during the ID stage. This requires an additional adder, because the main ALU, which has been used for this function so far, is not usable until the execution stage. Using the DLX architecture, a clock stall of 1 cycle is then needed. In general, the deeper the pipeline, the worse the branch penalty (stalled clock cycles), owing to the increased time to evaluate the branch condition.

To further minimise pipeline stalls caused by branches, branch prediction can be used. Analysing the execution of a program at run time, a guess can be made for each branch as to whether it will be taken or not. Four simple compile time branch prediction schemes are:

 stall prediction - the simplest scheme to handle branches is to freeze or flush the pipeline, holding or deleting any instructions after the branch until the branch destination is known. This simplifies the hardware and compiler at the cost of performance.
 predict taken - as soon as the branch is decoded and the target address is computed, we assume the branch to be taken and begin fetching and executing at the target address. This method only makes sense when the target address is known before the branch outcome.
 predict not taken - higher performance, and only slightly more complex than stall prediction; we assume that the branch is not taken and simply allow the hardware to continue to execute. Care must be taken to not permanently change the machine's state until the branch outcome is definitely known. The complexity arises from having to know when the state may be changed by an instruction and having to know when to back out of a change.
 delayed branch - during the branch delay of n cycles (delay slots) where the branch condition
is executed, independent instructions are executed; the compiler inserts instructions into the
branch delay slots which are executed whether or not the branch is taken. The job of the
compiler (the hard part) is to make successor instructions valid and useful. Three common
branch scheduling schemes are:
o from before branch - the delay slot is filled with an independent instruction from
before the branch. This is the preferred choice, since this instruction will always be
executed
o from target - the branch-delay slot is filled with an instruction from the target of the
branch
o from fall through - the branch-delay slot is scheduled from the not taken fall through.
For this and the previous optimisation to be legal, it must be okay to execute the
instruction when the branch goes in the other direction

All the previous branch prediction schemes are static, compiler based approaches to minimising the effect of branches. An alternative approach is to use additional hardware to dynamically predict the outcome of a branch at run time. The aim of this approach is to enable the processor to resolve the outcome of a branch early in the pipeline, therefore removing control dependencies in the program.

One method of doing this is using branch prediction buffers. The simplest example of this is a
one bit memory array indexed by the lower order address bits. At each address a prediction
based on the previous actions is stored. When an instruction is fetched, its address is looked
up in the BPB during the ID stage. This returns a prediction for this branch (address), e.g., 0
means branch not taken and the next sequential address is fetched and 1 means the branch is
taken and the branch target address is fetched. The processor assumes that this prediction is
correct and acts accordingly. If the prediction is incorrect, the prediction bit is inverted and
stored in the BPB and the pipeline is flushed. A two-bit branch prediction scheme is better for handling loops, as it requires two wrong guesses to actually change the prediction. Two bit prediction schemes allow more of the branch history to be remembered and the prediction to be more accurate. The performance of the BPB can be improved if the behaviour of other previous branch instructions is also considered.
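A minimal sketch of such a two-bit scheme (the table size and indexing below are assumptions for illustration): each BPB entry is a saturating counter, where values 0-1 predict not taken and 2-3 predict taken:

```python
# Minimal sketch of the two-bit scheme: one saturating counter per BPB entry
# (0-1 predict not taken, 2-3 predict taken), indexed by the low-order bits of
# the branch address.  The table size is an assumption for illustration.

BPB_BITS = 4
bpb = [0] * (1 << BPB_BITS)        # 16 two-bit counters, all "strongly not taken"

def predict(branch_addr):
    return bpb[branch_addr & ((1 << BPB_BITS) - 1)] >= 2    # True = predict taken

def update(branch_addr, taken):
    i = branch_addr & ((1 << BPB_BITS) - 1)
    bpb[i] = min(3, bpb[i] + 1) if taken else max(0, bpb[i] - 1)

# A loop branch taken several times then not taken once: the predictor needs
# two wrong guesses before its prediction flips.
for outcome in [True] * 5 + [False]:
    update(0x40, outcome)
print(predict(0x40))   # still True after a single not-taken outcome
```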

Branch predictors that use the behaviour of other branches are called correlating predictors or
two level predictors. This type of predictor tries to identify larger patterns of execution
(branch paths) in an attempt to predict the flow of execution.

Another hardware technique is the branch target buffer (BTB). At the end of the IF stage, the
processor needs to have calculated the next address to be fetched. Therefore, if we can find
out if the currently un-decoded instruction is a branch and can predict the result of that
branch, then the branch penalty will be 0. During the IF stage when an instruction is being

fetched, the pc address is also looked up in the BTB. If there is a match then this instruction
is a branch and the predicted pc is used for the next IF. Unlike the BPB, the BTB must match
the complete address, not just the lower order bits. Otherwise non-branch instructions could
be misinterpreted as branches. The hardware used in this process is the same as that used for
memory caches.

A typical BTB operation may work like this:

 If a match is found in the BTB, this indicates that the current instruction is a branch and is
predicted as taken, therefore use the predicted pc for the next IF.
 If no match, but at the end of the ID, the instruction is determined to be a taken branch, then
the PC and branch addresses are added to the BTB.
 If a match is found but the branch is determined not taken, then the entry is removed from the
BTB, the incorrectly fetched instruction is stopped and then IF restarted at the other target
address.

One final improvement that can be made to the BTB is to store one or more target
instructions instead of, or in addition to, the predicted target address. This allows branch
folding to be used, which allows zero cycle unconditional branches and some conditional
branches. If a match is found in the BTB and it is determined that this instruction is an
unconditional branch, then the stored instruction can be simply substituted.

Multicycle Operations

Until now, we have only considered a simple integer pipeline. In reality, it is impractical to
require floating point operations to complete in one or two clock cycles. This would require
the designer to accept a slow clock speed, use a large amount of floating point logic, or both.
Instead, the floating point pipeline will allow for a longer latency for operations.

A super-scalar architecture is used for floating point operations; however, this introduces a longer latency, structural hazards where the floating point hardware can not be fully pipelined, WAW hazards from having pipelines of different lengths, and an increase in RAW hazards due to the increased latency.

Co-processors

General purpose processors attempt to provide a wide spectrum of support for applications.
Inevitably, this will be less suitable for a particular application. This has led to the
development of co-processor based architectures.

For each operation, the desired algorithms are divided between the CPU and its co-
processors. This process is called hardware/software co-design, and there are four common
strategies:

1. Un-biased - the system is specified in an implementation independent manner with no presumption as to whether components should be implemented in software or hardware.
2. Hardware biased - the system is realised in custom hardware except for those functions which can be executed in software and which still allow the systems to conform to the specified time constraints.
3. Software biased - the system is implemented in software, except for those functions that need to be implemented in hardware to achieve the specific time constraints

4. Hardware acceleration - The complete system is implemented in hardware as dedicated logic.

Partitioning of a system into its hardware and software components can be performed at a
task, function block or statement level.

A CPU can interact with the co-processor using new opcodes - the CPU issues a command
plus data sequence on its memory interface. This is identified and decoded by the co-
processor. The CPU is either halted whilst this command is processed, or if possible,
continues, being re-synchronised when the results are transferred. Additionally, remote
procedure calls (RPCs) can be used, where the CPU again issues a command sequence, which
is interpreted by the co-processor as an RPC. This is a similar process to the previous method,
but with a more complex interface and tighter synchronisation.

Co-processors could also be implemented using the client-server model. The co-processor is
configured as a server process, similar to the RPC mechanism. Data can now be processed
from any process thread running on the CPU. The co-processor must arbitrate and prioritise
between multiple requests. Another alternative is to specify the co-processor as the master,
off loading tasks difficult to implement in hardware, or infrequent tasks, to the CPU more
suited to dealing with these exceptions.

The final method is by parallel process - the co-processor runs as a top-level parallel process
and communication between the two processors is a peer-to-peer arrangement.

The operations performed on the co-processor will normally require it to have access to variables and temporary storage. These can be implemented using various methods, such as:

 Having no external memory - data is either stored in specially constructed memory elements within the FPGA, or is explicitly stored as a result of the function's implementation. This increases the co-processor's potential performance, since no interface hardware is required to access external memory. This implementation is only possible for relatively small data sets, however.
 Sharing the CPU's memory - if truly parallel operation is not required, then the co-processor can use the CPU's memory, removing the communication delay of passing data. However, extra hardware that can negotiate with the CPU to gain access to this memory is required. This memory will have relatively slow access times, reducing the co-processor's performance.
 Dual-port memory - this is based on SRAM memory cells, but with two or more memory interfaces. This allows multiple simultaneous data accesses and fast access times. However, this is very expensive, limiting it to a small size.
 Local memory - this decouples the CPU and the co-processor, allowing parallel operation. This does require the co-processor to have additional hardware to interface to the external memory as before, but now fast SRAM can be used to maximise the co-processor's performance. However, data must now be copied to this memory before processing can start.

CHAPTER 7:
Mathematics for Computer Science
Types of Numbers
Positive Integers

N = {0, 1, 2, 3, ...}

Integers

Z = {..., -3, -2, -1, 0, 1, 2, 3, ...}

Rationals

Q = {p/q : p, q ∈ Z, q ≠ 0}
Irrationals

Numbers not in the sets of rationals or integers, e.g., √2 or π.

Reals

R - union of the sets of rationals and irrationals

The set of reals is infinite and uncountable.

The sets of integers and rationals are infinite and countable.

Intervals on sets of reals

 Closed interval: [a, b] = {x ∈ R : a ≤ x ≤ b}
 Open interval: (a, b) = {x ∈ R : a < x < b}

Therefore, we can write the set of reals as: R = (-∞, ∞).

We can also mix intervals to get half-opened intervals, e.g.: [a, b) or (a, b].

Functions
Informal definition

f : A -> B

Function f maps every value from domain A onto a single value in the range B.

Formal definition

f : A -> B is a function if for every x ∈ A there is exactly one y ∈ B such that f(x) = y.
Image of the Function

The image is that portion of the range for which there exists a mapping from the domain. If f(A) = B, then the function is surjective and (provided it is also injective) can be inverted.

Combining Functions

Classes of Function

Injective Function

One-to-one mapping from domain to range.

The above diagram shows an injective function, however the one below does not show an
injective function.

Surjective Function

Every value in the range is mapped onto from the domain; the image is identical to the range.

Something that is both injective and surjective is bijective.

Properties of Functions

A function is even if f(-x) = f(x) for all x in its domain.

A function is odd if f(-x) = -f(x) for all x in its domain.

Special Types of Functions

 Polynomial functions of degree n.
 Rational functions (f(x)/g(x)) where f(x) and g(x) are polynomials.
 Algebraic functions (sum, difference, quotient, power or product of rationals)

Piecewise Function

Function is built up over a series of intervals, e.g.,:

A more general case could be:

The domain of f(x) is the union of the intervals R1, R2, etc: R1 ∪ R2 ∪ ... ∪ Rn.

To ensure that the piecewise function is indeed a function, the intervals must be pairwise disjoint: Ri ∩ Rj = ∅ for all i ≠ j. If any intervals overlap, it is not pairwise disjoint, and is not a function.

Function Inverse

Let f : A -> B be an injective function. The function g : B -> A is the inverse of f if: g(f(x)) = x for all x ∈ A. g(y) can also be written as f-1(y).

A function is not invertible if it is not injective or not surjective.

Function Composition

Consider 2 functions - f : B -> C and g : A -> B, where the domain of f must be identical to the image of g.

(f·g)(x) = f(g(x))

f·g has domain A and range C. g must be surjective (g(A) = B), otherwise f·g is not a function.

Function Limits

 As x gets closer and closer to a, does the value of function f(x) get closer and closer to some
limit, L?
 Can we make the function value f(x) as close to L as desired by making x sufficiently close
to a?

If both of the above requirements are met, then: limx → a f(x) = L.

A more formal definition is: for every ε > 0 there exists a δ > 0 such that if 0 < |x - a| < δ then |f(x) - L| < ε.

Limit Rules

 Constant: limx → a c = c
 Linear: limx → a x = a
 Power rule: limx → a x^n = a^n
 Functions to powers: limx → a [f(x)]^n = [limx → a f(x)]^n
 Linear roots: limx → a x^(1/n) = a^(1/n), if a > 0 and n is a positive integer, or a < 0 and n is an odd positive integer
 Roots of functions: limx → a [f(x)]^(1/n) = [limx → a f(x)]^(1/n), if n is an odd positive integer, or n is an even positive integer and limx → a f(x) > 0
 If p(x) is a polynomial function of x and a ∈ R, then limx → a p(x) = p(a)
 If q : A -> B is a rational function with A, B ⊆ R and if a ∈ A, then limx → a q(x) = q(a), provided that q(a) exists

Compound Functions Rules

 Constant multiplier: limx → a [c·f(x)] = c·limx → a f(x)
 Sum: limx → a [f(x) + g(x)] = limx → a f(x) + limx → a g(x)
 Difference: limx → a [f(x) - g(x)] = limx → a f(x) - limx → a g(x)
 Product: limx → a [f(x)·g(x)] = limx → a f(x) · limx → a g(x)
 Quotient: limx → a [f(x)/g(x)] = limx → a f(x) / limx → a g(x), provided limx → a g(x) ≠ 0

Tolerances

x has tolerance δ if x ∈ (a - δ, a + δ). f(x) has tolerance ε if f(x) ∈ (L - ε, L + ε).

Infinite Limits

The statement limx → ∞ f(x) = L means that for every ε > 0 there exists a number M such that if x > M then |f(x) - L| < ε. On a graph, this is represented as the function f(x) lying between the horizontal lines y = L ± ε. The line y = L is called the horizontal asymptote of the graph of f(x).

Continuity of a Function

A function is continuous if its graph has:

 no breaks
 no holes
 no vertical asymptotes

Therefore, we can say that there are the following types of discontinuity:

 removable discontinuity
 jump discontinuity (piecewise functions)
 infinite discontinuity (vertical asymptote)

Conditions for Continuity

f : A -> B is continuous at c ∈ A if:

 f(c) ∈ B and
 limx → c f(x) exists and
 limx → c f(x) = f(c)

For special functions

 A polynomial p(x) (p : R -> R) is continuous for every real number x ∈ R.
 A rational function q(x) = f(x)/g(x) is continuous at every real number x ∈ R, except where g(x) = 0.

Continuity on an Interval

Let function f : A -> B be defined on the open interval (a, b) ⊆ A ⊆ R. The function is said to be continuous on (a, b) if:

 limx → c f(x) exists for all c ∈ (a, b)
 limx → c f(x) = f(c)

If we want the function to be continuous on the closed interval [a, b], then we must tighten up the above definition to include:

 f(a) and f(b) are defined


Continuity of Compound Functions

If f and g are continuous, then so are:

 the sum
 the difference
 the product
 the quotient (provided that g(x) ≠ 0)

Continuity of Composite Functions

Consider: f : B -> C and g : A -> B (g is surjective). The domain of f must be identical to
the image of g.

Consider the composition f ∘ g. If limx → c g(x) = b and f is continuous at b, then limx → c f(g(x)) = f(b)
= f(limx → c g(x)). If, in addition, g is continuous at c and f is continuous at g(c), then limx → c f(g(x))
= f(b) = f(limx → c g(x)) = f(g(c)), so the composite function f ∘ g is continuous at c.

Intermediate Value Theorem

If f is continuous on [a, b] and w ∈ [min(f(a), f(b)), max(f(a), f(b))], then there exists at least
one number c in [a, b] such that f(c) = w.

w ∈ [min(f(a), f(b)), max(f(a), f(b))] ⇒ ∃≥1 c ∈ [a, b] [f(c) = w]

Zeroes of a Function

A function f : A -> B is said to have a zero at r ∈ A if f(r) = 0.

Calculus
Derivative of a continuous function

Provided the limit exists: f′(x) = limh → 0 [f(x + h) - f(x)] / h. Therefore, we can consider the
derivative as a function. The domain is the union of all intervals over which f′ exists. Over the
open interval (a, b), f is differentiable if ∃f′(x) ∀x ∈ (a, b). Over the closed interval [a, b], f is
differentiable if:

it is differentiable over the open interval (a, b) and
the one-sided limits limh → 0+ [f(a + h) - f(a)] / h and limh → 0− [f(b + h) - f(b)] / h exist

Derivative of an Inverse Function

Let g be the inverse of function f : A -> B. Then Dy g(y) = 1 / f′(x), where y = f(x), provided that f′(x) ≠ 0.

However, this is not well formed, as the derivative is expressed in terms of x rather than y. We must use our knowledge
of the original formula and derivative to rewrite the derivative entirely in terms of y.

Derivative of composite functions

f(g(x)) = f ∘ g(x)

We can use the chain rule to differentiate this. The chain rule is Dx f(g(x)) = f′(g(x)) · g′(x).

The Exponential

The exponential function is of the form ax, where a ∈ (0,∞) and it is a positive real number
constant. The exponential function is defined in five stages.

1. If x is a positive integer then ax = a × a × a ... x times.


2. If x = 0, then a0 = 1
3. If x = -n, where n is a positive integer, then a-n = 1/an

4. If x = p/q, where this is a rational number (i.e., p, q ∈ Z, q ≠ 0), then ax = (ap)1/q

5. If x ∈ R, then ax = limr → x ar, where r ranges over the rationals

If a ∈ (0,∞), then ax is a continuous function with domain R and range (0,∞).

It is important to remember that this is not the same as the polynomial function. In a
polynomial, the function varies on the base, but here it varies on the exponent. f(x) = ax is
exponential, but f(x) = xn is polynomial.

Monotonicity

If a ∈ (0,1) then ax is a decreasing function.


If a ∈ (1,∞)then ax is an increasing function.

Exponentials can also be combined. If a, b ∈ (0, ∞) and x, y ∈ R,
then ax + y = ax × ay and ax - y = ax ÷ ay. Also, (ax)y = axy and (ab)x = axbx

Exponential Limits

 If a > 1, limx → ∞ ax = ∞ and limx → -∞ ax = 0


 If a < 1 limx → ∞ ax = 0 and limx → -∞ ax = ∞

Derivative of Exponential

If we differentiate from first principles, we get f′(x) = limh → 0 (ax+h − ax) / h = ax limh → 0 (ah − 1) / h. The
remaining limit is an indeterminate form 0/0 and is simply the derivative at x = 0. Hence, we can say that f′(x) = f′(0)ax

Natural Exponential

This is the exponential where f′(0) = 1, we call this number e and f′(x), where f(x) = ex, is ex -
it is equal everywhere to its own derivative.

Natural Logarithm

The inverse of the natural exponential is the natural logarithm. y = ex ⇔ x = ln y

In order to satisfy injectivity, the domain is (0, ∞) and the range is R.

To figure out the derivative of the natural logarithm, we can use the rules we already know
about inverses to get Dy ln y = 1/ex. To get things in terms of y, if we sub y = ex in we get
Dy ln y = 1/y

The natural logarithm takes the normal logarithmic rules.

Say you want to express ax as an exponential to base e.

ax = ef(x) and then you can take natural logarithms of both side to get x ln a = f(x), therefore by
substituting back in, ax = ex ln a.

This can then be used to differentiate ax

Dx ax = Dx ex ln a = ln a ex ln a, which by substituting back in becomes ln a ax.

By differentiating from first principles, we showed Dx ax = ax limh → 0 (ah − 1) / h, but since we now
know that Dx ax = ax ln a, then ax ln a = ax limh → 0 (ah − 1) / h. We can cancel through by ax and get
ln a = limh → 0 (ah − 1) / h, so we can now say that f′(x) = ax ln a
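
A quick numerical check of this result (the values of a, x and the step size h below are arbitrary choices for the illustration):

    from math import log

    a, x, h = 2.0, 1.5, 1e-6
    numeric  = (a ** (x + h) - a ** (x - h)) / (2 * h)   # central-difference estimate of Dx a^x
    analytic = a ** x * log(a)                           # a^x ln a
    print(numeric, analytic)                             # both approximately 1.9605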

Using the rules we learnt earlier for inverses, we can derive the derivative for logax, which
gives us Dy logay = 1/(y ln a)

Iteration Algorithms
Consider xr + 1 = (xr2 + 3) / 4. This has fixed points at 1 and 3. A fixed point is a number which
when put in returns itself (xr + 1 = xr). If we start with numbers near this point (e.g., x0 = 1.5)
we get a sequence that converges towards a point (in this case 1). However, if we put in x0 =
2.9, this converges to 1 also. x0 = 3.1 diverges to ∞
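
A minimal Python sketch of this iteration (the tolerance and iteration cap are arbitrary choices):

    def fixed_point_iteration(F, x0, tol=1e-10, max_iter=100):
        """Iterate x_{r+1} = F(x_r) until successive values agree to within tol."""
        x = x0
        for _ in range(max_iter):
            x_next = F(x)
            if abs(x_next - x) < tol:
                return x_next
            x = x_next
        return x   # may not have converged

    F = lambda x: (x * x + 3) / 4
    print(fixed_point_iteration(F, 1.5))   # converges towards 1
    print(fixed_point_iteration(F, 2.9))   # also converges towards 1
    # starting at 3.1 the iterates grow without bound (divergence)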

Types of Algorithm

The iterative algorithm for computing x will be based on a function of preceding values from
this sequence. The function will determine convergence, and care should be taken to ensure
convergence exists.

MCS covers two main types of algorithm, one-stage (e.g., Newton-Raphson) and two-stage
(e.g., Aitken's method). Two-stage algorithms tend to have faster convergence, but one-stage
algorithms have simpler initial conditions.

The type of algorithm we looked at above is a one-stage algorithm called fixed-point


iteration. To find the fixed points, a graph is used.

Conditions for Stability and Convergence

Under what conditions will fixed-point iterations converge or diverge?
What path does it take?
How fast does it move along this path? Can we choose F(x) to optimise convergence?

Fixed Point Iteration

xr+1 = F(xr)

If it converges to s, limr→∞xr = s and s = F(s), then s is called the fixed point of the iteration
scheme. If the iterative process is initialised with x0 = s then it will remain static at the fixed
point.

Behaviour Close to the Fixed Point

Suppose xr = s + εr and xr+1 = s + εr+1. Substituting into the iteration gives s + εr+1 = F(s + εr), and
to converge we need | εr+1 | < | εr |. If you apply the Taylor series, you can expand this as a polynomial series in ε:
F(s + ε) = Σ∞n = 0 anεn

To compute the an coefficient:

1. Set ε = 0. F(s) = a0, as the other terms depend on ε and vanish.


2. Differentiate - F′(s + ε) = a1 + 2εa2 + 3ε2a3 + ..., then set ε = 0 to eliminate terms containing ε.
3. Differentiate again and set ε = 0 gives us F″(s) = 2a2
4. Assemble to give us F(s + εr) = F(s) + εrF′(s) + εr2/2 F″(s)

As s = F(s), we can subtract from both sides to give us: εr+1 = εrF′(s) + εr2/2 F″(s). For small
values of εr, we can ignore εr2 and higher powers. This is the recurrence relation for the
distance from the fixed point.

So, | εr+1 / εr | ≈ | F′(s) | < 1 is the condition for convergence: if it holds for some range of x0, FPI
converges to s = F(s); if there is no such range, it diverges.

Convergence Paths

The sign of F′(s) determines the sign of the error: sgn(εr + 1) = sgn(F′(s))r+1 · sgn(ε0).

The convergence path can be classified as:

staircase, if F′(s) ∈ (0, 1). This converges to a fixed point from one direction
cobweb or spiral, if F′(s) ∈ (-1, 0). This jumps from side to side of the fixed point.

Rate of Convergence

The rate of convergence is linear if F′(s) ≠ 0: εr + 1 ≈ F′(s)εr.

We have quadratic convergence if F′(s) = 0 and F″(s) ≠ 0: εr + 1 ≈ (F″(s) / 2)εr2.

Quadratic convergence is much faster than linear convergence. We have the freedom to
choose F(x) so that F′(s) = 0.

There are limitations to solving non-linear equations using FPI. Non-uniqueness is a problem
when choosing the iteration function. For any one equation G(x) = 0 there may be many roots, and many
different ways of choosing the iteration function F(x). There may be no universal F(x) for finding all roots; each
root may require its own F(x), and even if there is a universal F(x), the initial values may
require careful choice. F(x) may converge at different rates for different roots.

Due to the shortcomings with FPI, we can use the Newton-Raphson method.

Newton-Raphson

Wikipedia

In Newton-Raphson, the gradient in the proximity of the root is used to guide the search. It is a
universal scheme, where only one function is needed to find all the roots - provided that there
are good initial estimates. Newton-Raphson also converges faster - its convergence is quadratic.
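
A minimal Python sketch of the Newton-Raphson scheme xr+1 = xr − f(xr)/f′(xr); the example function and starting points below are chosen purely for illustration:

    def newton_raphson(f, df, x0, tol=1e-12, max_iter=50):
        """Search for a root of f near x0 using x_{r+1} = x_r - f(x_r)/f'(x_r)."""
        x = x0
        for _ in range(max_iter):
            step = f(x) / df(x)
            x = x - step
            if abs(step) < tol:
                return x
        return x

    f  = lambda x: x * x - 4 * x + 3      # roots at x = 1 and x = 3
    df = lambda x: 2 * x - 4
    print(newton_raphson(f, df, 0.5))     # approximately 1.0
    print(newton_raphson(f, df, 4.0))     # approximately 3.0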

Probability

See ICM notes for


more about sets

We can consider probability in terms of sets:

The empty set - ∅
The universal set - U
Union - A ∪ B
Intersection - A ∩ B
Disjoint sets - A ∩ B = ∅
Difference - A - B
Complement - Ac = U - A

Random Events

Events can be one of two types: deterministic, where the same experimental conditions
always give identical answers, or random, where the same experimental conditions give
different answers, but the average of a large number of experiments are identical.

In nature, random events occur much more frequently than deterministic.

Sample space

The sample space is the set of all possible outcomes of a random experiment. S is the
universal set for the random experiment. Members of S are called sample points. e.g., for a
coin toss, S = {H, T}

Sample spaces come in various forms. The first is a discrete sample space which can be either
finite (e.g., a dice throw or coin toss) or infinite (defined on the set of positive integers, and
are countable). The other sample space is the continuous sample space, which is defined over
the set of reals.

Events

Event A is sure if A = S - this is probability 1. Event A is impossible if A = ∅ - this is
probability 0.

For events A and B, algebra can be used:

either A or B - A ∪ B
both A and B - A ∩ B
the event not A - Ac
event A and not B - A - B
mutually exclusive events (can not occur together) - A ∩ B = ∅

Probability is the likelihood of a specific event occurring and is represented as a real number
in the range [0, 1]. If P(A) = 1, this is a sure event, and if P(A) = 0 then the event is impossible.

The odds against is the probability of an event not occurring: P(Ac) = 1 - P(A)

Estimating Probability

Epistemic means
relating to knowledge

There are two approaches to estimating probability, the first is the a priori (epistemic)
approach.

If event A can occur in h different ways out of a total of n possible ways, P(A) = h/n. This
assumes the different occurrences of A are equally likely and is often referred to as the
uniform distribution of ignorance.

Aleatoric means
relating to experience

The second approach is the a posteriori (aleatoric) approach. If an experiment is


repeated n times and the event A is observed to occur h times, P(A) = h/n, however, n must
be a very large value for this to be accurate.

We can break down probability in terms of atomic events into a union of events (an event or
another event) or an intersection of events (an event and another event). These can be
simplified using mutual exclusivity and independence respectively.

Axioms of Probability

If sample space S is finite, the set of events is the power-set of S. If sample space S is
continuous, the events are measurable subsets.

For every event: P(A) ≥ 0, and for the sure event, P(A) = 1. For mutually
exclusive events A1, A2, A3, ... then P(A1 ∪ A2 ∪ A3 ...) = P(A1) + P(A2) + P(A3) ...

The theorems of probability follow from the axioms. If A1 ⊂ A2 then P(A1) ≤ P(A2) ∧ P(A2 - A1) =
P(A2) - P(A1). For every event P(A) ∈ [0,1]. The impossible event has 0 probability, P(∅) = 0,
and the complementary event has P(Ac) = 1 - P(A).

Conditional Probability

If you consider two events, A and B, the probability of B given A has occurred has the
notation P(B|A), which can be expressed as P(B|A) = P(A ∩ B) / P(A), or equivalently P(A ∩
B) = P(B|A)·P(A).

Mutually exclusive events are disjoint (i.e., P(A ∩ B) = 0), so P(B|A) = 0.

The probability of a set of independent events which are anded together (e.g., event A and
event B, etc) can be expressed as P(A1 ∩ A2 ... An) = ∏ni = 1 P(Ai).

Bayes Theorem

If you consider an event B where A1, A2 ... An are mutually exclusive events which could have
caused B, the sample space S = A1 ∪ A2 ∪ ... ∪ An, i.e., one of these events has to occur.

Bayes' rule gives us the probability that a particular cause Ak was responsible, given that B has occurred, and is expressed
as: P(Ak|B) = P(B|Ak)P(Ak) / Σnj = 1 P(B|Aj)P(Aj). Bayes' theorem is called the theorem of probability of
causes.
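
A small numerical sketch of Bayes' rule in Python (the priors and likelihoods here are invented for illustration):

    # Three mutually exclusive causes A1, A2, A3 with made-up probabilities.
    priors      = [0.5, 0.3, 0.2]    # P(A1), P(A2), P(A3)
    likelihoods = [0.1, 0.4, 0.8]    # P(B | A1), P(B | A2), P(B | A3)

    p_b = sum(p * l for p, l in zip(priors, likelihoods))            # total probability of B = 0.33
    posteriors = [p * l / p_b for p, l in zip(priors, likelihoods)]  # P(Ai | B) by Bayes' rule
    print(posteriors)   # [0.1515..., 0.3636..., 0.4848...]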

Random Variables

If you associate a variable with each point in the sample space, this is referred to as a random
or stochastic variable. A random variable is discrete if the sample space is countable, or
continuous if the sample space is uncountable.

If you consider a set of discrete random variables, X ∈ {xAk, Ak ∈ S} defined over the sample
space S, the probability distribution P(X = xAk) = f(xAk), where f : S → [0, 1], f(x) ≥ 0, ΣAk ∈
Sf(xAk) = 1.

If X is a continuous random variable, the probability that X takes on the value x is 0, as the
sample space is uncountably large.

Density Functions

In a continuous distribution, f(x) is defined to mean P(A ≤ x ≤ B) = ∫BAf(x) dx, i.e., the
probability that x lies between A and B is equal to the corresponding area beneath the
function f(x).

Another method of considering density is the cumulative distribution. Here, the function F(x) = P(X
≤ x) = P(-∞ < X ≤ x). So, we could say that F(x) = ∫x-∞ f(x) dx, hence f(x) = dF(x)/dx.

A graph of F(x) plotted against f(x) may look like:

Bernoulli Trials

A random event (e.g., heads in a coin toss, 4 on a dice) occurs with probability p and fails to
occur with probability 1 - p. A random variable X is the number of times the event occurs
in n Bernoulli trials.

Discrete Probability Distribution

The random variable X is the event that there are x successes and n - x failures, in n trials.
This gives us the probability of successes to be px (where p is the probability of success) and
the probability of failures is (1 - p)n - x

There are n!/(x!(n - x)!) ways of getting this Bernoulli Trial, so the probability distribution
P(X = x) = (n!/(x!(n - x)!))px(1 - p)n - x, which we call the binomial distribution.
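
A minimal sketch of this formula in Python (math.comb computes n!/(x!(n - x)!)):

    from math import comb

    def binomial_pmf(n, p, x):
        """P(X = x): probability of exactly x successes in n Bernoulli trials."""
        return comb(n, x) * p ** x * (1 - p) ** (n - x)

    print(binomial_pmf(10, 0.5, 3))                            # 0.1171875 - 3 heads in 10 fair tosses
    print(sum(binomial_pmf(10, 0.5, x) for x in range(11)))    # 1.0 - the distribution sums to one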

Expectation

The expectation of random variable X is E(X) = Σj xjP(X = xj) if X is discrete, and
E(X) = ∫ x f(x) dx if X is continuous (where f is the density function). This gives us a value μ of X, which is the
expectation of X.

With the expectation, we can derive various theorems:

 If c is a constant and X is any random value, E(cX) = cE(X)


 If X and Y are any random values, E(X + Y) = E(X) + E(Y)
 If X and Y are independent random values, E(XY) = E(X) × E(Y)

Variance

Variance is the expectation of (X - μ)2 → Var(X) = E[(X - μ)2]. The standard deviation is σX =
√Var(X).

For any random value X, Var(X) = E(X2) - μ2.
If c is a constant and X is any random value, Var(cX) = c2Var(X)
If X and Y are independent random values, Var(X + Y) = Var(X) + Var(Y) and Var(X - Y) =
Var(X) + Var(Y).
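
As a concrete check of the definitions above (not of the theorems), the expectation and variance of a fair six-sided die can be computed directly:

    values = [1, 2, 3, 4, 5, 6]
    probs  = [1 / 6] * 6

    mu  = sum(x * p for x, p in zip(values, probs))              # E(X) = 3.5
    var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))  # Var(X) = E[(X - mu)^2] ~ 2.9167
    sigma = var ** 0.5                                           # standard deviation ~ 1.708
    print(mu, var, sigma)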

Binomial Distribution

The binomial distribution has mean μ = np and variance Var(X) = np(1 - p), which gives a
standardised random value Z = (X - np)/√(np(1 - p))

Poisson Distribution

The Poisson distribution is the limiting form of the binomial distribution for rare events. If there is a large sample (i.e., n → ∞), a
small probability (i.e., p → 0), the mean μ = np and variance Var(X) ≈ np, then the
distribution can be expressed as P(X = x) = (μxe-μ)/x!

Normal/Gaussian Distribution

The limiting case of the binomial distribution is when n → ∞: P(Z = z) = 1/√(2π) e−z2/2, which is
the normal distribution.

A non-standardised random value x is Gaussian if P(X = x) = 1/(σ√(2π)) e−(x − μ)2/(2σ2). The sample
space here is the set of reals (i.e., x ∈ R), which means x is a continuous random value.

This approximates the binomial with a standardised random value Z = (X - np)/√(np(1 - p))
when n is large and p is neither close to 0 nor close to 1.

Formal Languages and Automata


We have the problem of an Ada compiler not knowing whether or not an Ada program is
legal Ada, and not English, FORTRAN or some other illegal language. To start with, we can
look at the problem of recognising individual elements (variable names, integers, real
numbers, etc) and then building these together to make a full language recogniser (a lexical
analyser). One definition may be that variable names always start with a letter which is then
followed by a sequence of digits and letters. We can draw a transition graph to describe what
a legal variable is composed of.

Transition Graphs

Transition graphs are pictoral representations of a regular language definition

A transition graph consists of states, an initial state, final states (also called acceptor states)
and named transitions between the states. A double-ringed state represents a final state.
In the above transition graph, 1 is the initial state and, as 2 cannot reach the final state, it is a
dead (error) state.

From the transition graph, a transition table can be drawn up, for the above example:

state  letter  digit  EOS
1      3       2      error
2      error   error  error
3      3       3      accept
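
A direct Python transcription of this table (a sketch; here EOS is simply the end of the string):

    # States: 1 = initial, 2 = dead/error, 3 = accepting.
    def is_variable_name(s):
        state = 1
        for ch in s:
            if ch.isalpha():
                state = 3 if state in (1, 3) else 2   # letter: 1 -> 3, 3 -> 3, 2 -> 2
            elif ch.isdigit():
                state = 3 if state == 3 else 2        # digit: 1 -> 2, 3 -> 3, 2 -> 2
            else:
                return False                          # symbol outside the alphabet
        return state == 3                             # EOS: accept only from state 3

    print(is_variable_name("x27"))    # True
    print(is_variable_name("9abc"))   # False - starts with a digit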

Proving Automata by Induction

Assume we have an on-off switch. We can define S1(n) as "the automaton is off after n pushes if and
only if n is even" and S2(n) as "the automaton is on after n pushes if and only if n is odd". We can
prove ∀n, S1(n) ∧ S2(n) using induction.

As usual, we need to prove the base case (S1(0) ∧ S2(0)) by considering implication in both
directions, and then prove for n + 1.

Central Concepts

An alphabet is a finite non-empty set of symbols, e.g., Σ = {0, 1} or Σ = {a, b, c ... z}, etc

A string is a sequence of symbols of the alphabet, including the empty string (λ). Σk is the set
of strings of length k (note that, strictly, Σ ≠ Σ1: one contains symbols, the other strings of length one). Σ* is the
union of all the Σk for k ≥ 0, and Σ0 = {λ}. We can also define the set that excludes the empty string as Σ+ = Σ* - {λ}

In strings (assume a ∈ Σ), an means a...a (a written n times). If w = abc and v = bcd, then wv =
abcbcd. A prefix is anything that forms the start of a string and a suffix is anything that

forms the end of a string. The length of a string is the number of symbols in it and this can be
represented using |w|.

A language is a set of strings chosen from some Σ*. L is a language over Σ: L ⊆ Σ*.
Languages are sets, so any operation that can be performed on a set can be performed on a
language, e.g., Lc, the complement of L.

Models of Computation

Automata are abstract models of computing machines. They can read input, have a control
mechanism (a finite internal state), a possibly infinite storage capability and an output. In this
course, we will mainly be considering finite state automata (FSA) - machines without storage
or memory. An important point to consider is that some problems are insoluble using finite
state automata, and more powerful concepts are needed.

Deterministic Finite State Automata (DFA)

The DFA is specified using a 5-tuple, M = ⟨Q, Σ, δ, i, F⟩, where

Q is a finite set of states
Σ is a finite set of possible input symbols (the alphabet)
δ is a transition function, such that δ : Q × Σ → Q
i is the initial state (i ∈ Q)
F is a set of final states (F ⊆ Q)

DFA's accept string S, if and only if δ(i, S) ∈ F, where the transition function δ is extended to
deal with strings, such that δ(q, aw) = δ(δ(q, a), w). λ is accepted by M, if and only if i ∈ F.

A language L is accepted by an FSA M if and only if M accepts every string S in L and


nothing else.

Non-deterministic Finite State Automate (NDFA)

The definition of a DFA is such that for every element of the input alphabet, there is exactly
one next state (i.e., δ is a function). An NDFA relaxes this restriction, so there can be more
than one next state from a given input alphabet, so the transition function becomes a relation
ρ.

There are advantages of non-determinism. Although digital computers are deterministic, it is
often convenient to express problems using NDFAs, especially when using search. For
example, the following NDFA defining awa, where w ∈ {a, b}*, is a natural way of
expression:

Any NDFA can be converted into a DFA by considering all possible transition paths in a
NDFA and then drawing a DFA where a node is a set of the nodes in an NDFA.

We can formally define an NDFA as a tuple M = ⟨Q, Σ, ρ, i, F⟩, where:

Q is a finite set of states
Σ is a finite set of possible input symbols (the alphabet)
ρ is a transition relation, ρ ⊆ Q × (Σ ∪ {λ}) × Q
i is an initial state (i ∈ Q)
F is a set of final states (F ⊆ Q)

ρ can also be defined as a transition function which maps a state and a symbol (or λ) to a set
of states.

Hence, we can say a string S is accepted by NDFA M = ⟨Q, Σ, ρ, i, F⟩ if and only if there
exists a state q such that (i, S, q) ∈ ρ and q ∈ F, where the transition relation ρ is extended to
strings by (q, aw, r) ∈ ρ, if and only if there exists a state p such that (q, a, p) ∈ ρ ^ (p, w, r)
∈ ρ. The empty string λ is accepted by M if and only if i ∈ F.

λ Transitions

λ transitions are a feature of an NDFA. A transition does not consume an input symbol and
these transitions are labelled with λ.

When converting an NDFA to a DFA with a λ transition, you always follow a λ transition
when plotting the possible paths.

Converting an NDFA to a DFA

We could say that a DFA was a special case of an NDFA, where there are no non-
deterministic elements. From the formal definitions of a DFA and NDFA, we could hence
prove that for any NDFA, an equivalent DFA could be constructed.

If M is an NDFA ⟨S, Σ, ρ, i, F⟩, then we can construct DFA M′ = ⟨S′, Σ, δ, i′, F′⟩, where:

S′ = 2S
i′ = {i}
F′ is those subsets of S that contain at least one state in F
δ is a function from S′ × Σ into S′ such that for each x ∈ Σ, s′ ∈ S′, δ(s′, x) is the set of all s ∈ S such that (u, x, s) ∈ ρ for some u ∈ s′. In other words, δ(s′, x) is the set of all states in S that can be reached from a state in s′ over an arc named x.

Inductive proof exists


in the lecture slides

Since δ is a function, then M′ is a DFA. All that needs to be done is prove that M and M′
accept the same strings. We can do this using induction on the string length, and using path
equivalence to show that any string accepted by an NDFA is accepted by the same equivalent
DFA.
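
A sketch of this subset construction in Python, for an NDFA without λ transitions (the example relation is invented for illustration):

    def ndfa_to_dfa(alphabet, rho, initial, finals):
        """rho maps (state, symbol) to a set of states; returns the DFA's transition table."""
        start = frozenset([initial])
        delta, seen, work = {}, {start}, [start]
        while work:
            s = work.pop()
            for a in alphabet:
                target = frozenset(q for p in s for q in rho.get((p, a), set()))
                delta[(s, a)] = target
                if target not in seen:
                    seen.add(target)
                    work.append(target)
        dfa_finals = {s for s in seen if s & finals}   # subsets containing an NDFA accept state
        return delta, start, dfa_finals

    # NDFA accepting strings over {a, b} ending in "ab" (an invented example)
    rho = {('q0', 'a'): {'q0', 'q1'}, ('q0', 'b'): {'q0'}, ('q1', 'b'): {'q2'}}
    delta, start, finals = ndfa_to_dfa({'a', 'b'}, rho, 'q0', {'q2'})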

Regular Languages

A language L is regular if a DFA or NDFA exists that accepts L. If M is a DFA, L(M) is the
language accepted by M. Every finite language is regular.

See notes for proof


of set operations

If we consider a regular language as a set, we can perform set operations on one or more
regular languages, and still get regular languages out.

However, not all languages are regular and so cannot be represented by a DFA. For example,
Scheme expressions have a balanced number of opening and closing brackets, and this
property cannot be recognised by a DFA.

The proof for the pumping


lemma is on the lecture slides

We have a theorem called the pumping lemma, which states that if a language contains strings of the
form xnyn for an arbitrarily large n ∈ N, then it must also contain strings of the
form xnym where n ≠ m.

Regular Expressions

Instead of using machines to describe languages, we switch to an algebraic description - this


is a more concise way of representing regular languages.

In regular expressions, + represents union, juxtaposition represents concatenation (note, some


books use . for concatenation) and * is the star closure. For example:

language regular expression

{a} a

{a, b} a+b

{a, b} ∪ {b, c} (a + b) + (b + c)

{a, b}* ∪ {aa, bc} (a + b)* + (aa + bc)

{a, b}*{ab, ba}* (a + b)*(ab + ba)*

{a, b}* ∪ {λ, aa}* (a + b)* + (λ + aa)*

Formal Definition

Let Σ be an alphabet. ∅, λ and a ∈ Σ are regular expressions called primitive regular
expressions. If r1 and r2 are regular expressions, then so are r1 + r2, r1r2, r1* and (r1). A string
is a regular expression if and only if it can be derived from primitive regular expressions by a
finite number of applications of the previous statement. Every regular expression defines a set
of strings; a language.

The lecture slides contain the proof of this.

Generalised Transition Diagrams

A full set of examples


exist in the lecture notes

We can extend transition diagrams to have transitions which consist of regular expressions,
making them more flexible and powerful.

Finite Automata with Output

Moore Machines

A Moore machine is a DFA where the states emit output signals. Moore machines always
emit one extra symbol (i.e., the output has one symbol more in the output string than the input
string) and have no accepting state. In a Moore machine, the state name and the output
symbol are written inside the circle.

Mealy Machines

A Mealy machine is the same as a Moore machine, but it is the transitions that emit the
output symbols. This means that Mealy machines generate output strings of the same length
as the input string. With Mealy machines, the output symbols are written alongside the
transition with an input symbol.

Are Mealy and Moore machines equivalent?

Let Me be a Mealy machine and Mo be a Moore machine.

Previously, our definition of equivalence meant that both machines accept the same strings;
here we need a new concept of equivalence, so we say that the machines are equivalent if
they both give the same output. Under this definition, equivalence would always be a
contradiction, as Mealy and Moore machines always generate strings of different lengths
given the same input. We must therefore disregard the first character output from
a Moore machine.

Constructing the Equivalence

To change Moore machines into Mealy, each state needs to be transformed where the output
on the state is put on each transition leading to that state. This can be proved by induction, but
it is easier to prove by inspection.

Converting a Mealy machine to a Moore machine is harder, as multiple transitions with
different outputs can arrive at the same state. To work round this, you need to replicate a
state, such that all transitions into a particular state give the same output and then the above
transformation process can be reversed. Again, this is proved by inspection.

Pumping Lemma for Regular Languages

Let L be any infinite regular language. Then there exist strings x, y, z (where y ≠ λ) such
that: ∀n > 0, xynz ∈ L

 Let M be a DFA with n states that accepts L


 Consider a string w with length greater than n that is recognised by M
 Thus, the path taken by w visits at least one state twice
 With this information, we can break the string into three parts:
o let x be the prefix of w, up to the first repeated state (x may be λ)
o let y be the substring of w which follows x, such that if it travels round the circuit
exactly once it returns to the same state (this must not be λ)
o let z be the remainder of w
 Thus, w = xyz
 Hence, M accepts any string in the form xynz (where n > 0) and could be represented as:

To apply the pumping lemma, you assume some language L is regular and generate a
contradiction. However, this is not always straightforward, for example in the case of palindromes, so
additional methods based on the lengths of the strings relative to the number of states in the FSA must be applied to create a
contradiction.

Decidability

A problem is effectively soluble if there is an algorithm that gives you the answer in a finite
number of steps - no matter on the inputs. The maximum number of steps must be predictable
before commencing execution of the procedure.

An effective solution to a problem that has a yes or no answer is called a decision procedure.
A problem that has a decision procedure is called decidable.

Equivalence of Regular Expressions

How do we conclude whether or not two regular expressions are equal? "Generating words
and praying for inspiration is not an efficient process".

A regular expression can be converted into a finite automaton by means of an effective


procedure, and vice versa. The length of the procedure is related to the number of symbols in
the regular expression or the number of states in the finite automaton. We can test the
languages are equivalent by:

 producing an effective procedure for testing that an automaton does not accept any strings
 looking at the automaton that is constructed by looking at the differences between the two
machines
 if this difference automaton accepts no strings, then the languages are equivalent

The Complement of a Regular Language

If L is a regular language over alphabet Σ then its complement Lc = Σ* - L is also a regular language. To prove
this, we need to consider:

The DFA M = (Q, Σ, δ, i, F) such that L = L(M) and the DFA is a completion (no missing transitions - that is, δ(q0, w) is always defined).
Lc = L(N), where N is the DFA N = (Q, Σ, δ, i, Q-F); that is, N is exactly like M, but the accepting states have become non-accepting states, and vice versa.

Closure Under Intersection

Proving that the intersection of two regular languages is regular is easy, as we can use boolean
operations, specifically de Morgan's law. That is, L1 ∩ L2 = ¬(¬L1 ∪ ¬L2)

This can be constructed explicitly also. If L1 and L2 are regular languages, so is L1 ∩ L2.
This can be proved by:

considering two languages, L1 and L2, and automata M1 and M2. Here, assume L1 = L(M1) and L2 = L(M2), and that M1 = {Q1, Σ, δ1, i1, F1}, M2 = {Q2, Σ, δ2, i2, F2}. Notice the alphabets of M1 and M2 are the same; this can be done because in general Σ = Σ1 ∪ Σ2.
construct M such that it simulates both M1 and M2. States of M are pairs of states, the first from M1 and the second from M2. When considering the transitions, suppose M is in the state (p, q) where p ∈ Q1 and q ∈ Q2. If a is an input symbol, we have δ((p, q), a) = (δ1(p, a), δ2(q, a)), presuming that we have completed DFAs.

Closure Under Difference

If L and M are regular languages, so is L - M. This is proved by noting that L - M = L ∩ Mc.
If M is regular then Mc is regular, and because L is regular, L ∩ Mc is regular.

Testing for Emptiness

If we have two languages, L1 and L2 defined either by regular expressions or FSAs, we can
produce an FSA which accepts (L1 - L2) + (L2 - L1) - this is the language that accepts strings
that are in L1 but not L2, or are in L2 but not L1. If L1 and L2 are the same language, then the
automaton can not accept any strings.

To make this procedure an effective procedure, we need to show we can test emptiness. We
can create an effective procedure using induction. The base case is that the start state is
reachable. The inductive step is that if state q is reachable from the start
state, and there is an arc from q to p with any label (an input symbol or λ), then p is reachable. In
this manner, we can compute the set of reachable states. If an accepting state is among them,
then the language of the FSA is not empty, otherwise it is.
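
A sketch of this reachability computation in Python (arcs labelled λ are treated like any other arc, since only reachability matters; the example machine is invented):

    def accepts_nothing(arcs, initial, finals):
        """arcs maps each state to the set of states reachable in one transition."""
        reachable, frontier = {initial}, [initial]
        while frontier:
            q = frontier.pop()
            for p in arcs.get(q, set()):
                if p not in reachable:
                    reachable.add(p)
                    frontier.append(p)
        return reachable.isdisjoint(finals)   # the language is empty iff no accept state is reachable

    # An FSA whose accept state f cannot be reached from the start state s:
    print(accepts_nothing({'s': {'a'}, 'a': {'s'}, 'f': {'f'}}, 's', {'f'}))   # True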

You can also test a regular expression directly to see if the language it denotes is empty. Here, the base step is that
∅ denotes the empty language and any other primitive expression does not. The inductive
step is that if r is a regular expression, there are four cases (a short sketch follows the list):

1. r = r1 + r2, then L(r) is empty if and only if L(r1) and L(r2) are empty.
2. r = r1r2 is empty if and only if either L(r1) or L(r2) are empty.
3. r = r1* then L(r) is not empty, as this always includes λ
4. r = (r1) then L(r) is empty if and only if L(r1) is empty.
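
These four cases translate directly into a recursive test. The nested-tuple encoding of regular expressions below is a hypothetical one chosen just for the sketch:

    # Encoding: ('empty',), ('lambda',), ('sym', a), ('union', r1, r2), ('concat', r1, r2),
    # ('star', r1), ('group', r1)
    def denotes_empty_language(r):
        kind = r[0]
        if kind == 'empty':
            return True                  # base step: only the empty-set symbol denotes the empty language
        if kind in ('lambda', 'sym', 'star'):
            return False                 # a star always contains at least the empty string
        if kind == 'union':
            return denotes_empty_language(r[1]) and denotes_empty_language(r[2])
        if kind == 'concat':
            return denotes_empty_language(r[1]) or denotes_empty_language(r[2])
        if kind == 'group':
            return denotes_empty_language(r[1])

    print(denotes_empty_language(('concat', ('sym', 'a'), ('empty',))))           # True
    print(denotes_empty_language(('union', ('empty',), ('star', ('sym', 'a')))))  # False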

Testing for Equivalence

Now we can give a procedure for checking whether two regular languages are equivalent:

1. Construct M1 = M(L1) and M2 = M(L2)
2. Construct the complements M1c and M2c
3. Construct M1 ∩ M2c and M2 ∩ M1c
4. Construct (M1 ∩ M2c) ∪ (M2 ∩ M1c)
5. Prove the constructed automaton is empty

Testing for (In)finiteness

If F is an FSA with n states, and it accepts an input string w such that n < length(w), then F
accepts an infinite language. This is proved with the pumping lemma.

If F accepts infinitely many strings then F accepts some string w such that n < length(w) < 2n.
This is proved by saying:

1. If F accepts an infinite language, then F contains at least one loop.


2. Choose path with just one loop
3. The length of this loop can not be greater than n
4. Then one can construct a path with length at least n + 1 but less than 2n by going round the
circuit the required number of times.

We can then generate an effective procedure. If we have an FSA with n states, we can
generate all strings over the alphabet with length at least n and less than 2n and check whether any is
accepted by the FSA. If one is, the FSA accepts an infinite language; if not, it does not.

Minimising a DFA

For an example,
see lecture 7

Is there a way to minimise a DFA? States p and q are equivalent if and only if for all input
strings w, δ(p, w) is accepting if and only if δ(q, w) is accepting. That is, starting
from p and q with a given input, either both end up in an accepting state (not necessarily the
same one) or neither does (not necessarily the same). States p and q are distinguisable if there
is at least one string w such that one of δ(p, w) and δ(q, w) is accepting and the other is not.

One efficient way of accomplishing this is using the table-filling system. For DFA M = (Q, Σ,
δ, i, F), the base step is that if p is an accepting state and q is non-accepting, then the pair {p, q} is
distinguishable. The inductive step is that if p and q are states such that for some input
symbol a, r = δ(p, a) and s = δ(q, a) are a pair of states known to be distinguishable, then
{p, q} is a pair of distinguishable states.

The procedure can be performed by filling up a table starting with the states we can
distinguish immediately. At the end of the process, those states not marked will show the
equivalent states.

The unmarked pairs of states form an equivalence relation: reflexive, symmetric and transitive.
The automaton can be minimised by substituting states for equivalent states.
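
A sketch of the table-filling procedure in Python, for a completed DFA given as a dictionary (the encoding is invented for the example):

    from itertools import combinations

    def distinguishable_pairs(states, alphabet, delta, finals):
        """delta maps (state, symbol) to a state. Returns the set of distinguishable pairs."""
        # Base step: an accepting and a non-accepting state are distinguishable.
        marked = {frozenset(pair) for pair in combinations(states, 2)
                  if (pair[0] in finals) != (pair[1] in finals)}
        changed = True
        while changed:                    # inductive step, repeated until nothing new is marked
            changed = False
            for p, q in combinations(states, 2):
                pair = frozenset((p, q))
                if pair in marked:
                    continue
                if any(frozenset((delta[(p, a)], delta[(q, a)])) in marked for a in alphabet):
                    marked.add(pair)
                    changed = True
        return marked                     # any unmarked pair is equivalent and may be merged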

Context-Free Grammars
Context-Free Grammars (CFGs) are very easy to write parsers for, and many computer
languages are context free, such as DTDs for XML.

CFGs consist of a series of productions, rules which define what the language can do, as well
as non-terminal symbols, defined by productions and terminal symbols, which exist in the
language we are trying to define.

The context-free grammar for the language anbn is S → aSb | λ, where our non-terminal symbol
is S, the terminals are a and b, and "S → aSb | λ" is our production.

To make a more formal definition of a context-free grammar, we could start with a grammar
and then restrict it to be context-free.

A grammar is specified to be a quadruple G = <V, T, S, P> where

V is a finite set of nonterminal (or variable) symbols
T is a finite set of terminal symbols (disjoint from V)
S ∈ V and is the start symbol
P is a finite set of productions (or rules) in the form α → β, where α and β ∈ (V ∪ T)* and α contains at least one non-terminal

By convention, we use capital letters for non-terminals and small letters for terminal symbols.

To tighten up the above definition to be a context-free grammar, α should consist of a single


non-terminal.

Parsing

There are two ways to show a particular string is in the language of a grammar. The first
involves working from the string towards S, e.g.,

 S → NP VP
 VP → VT NP
 NP → peter
 NP → mary
 VP → runs
 VT → likes

 VT → hates

So, to parse the string "peter hates mary" using the above CFG, the parsing process would
look like:

 peter hates mary


 NP hates mary
 NP VT mary
 NP VT NP
 NP VP
 S

It is also possible to work from S towards the string, and this process is called deriving. For
example, we could have:

start with S
S → NP VP, so rewrite S to NP VP; the notation for this is S ⇒ NP VP
NP VP ⇒ peter VP
peter VP ⇒ peter VT NP
peter VT NP ⇒ peter hates NP
peter hates NP ⇒ peter hates mary



Formally, we can express derivations as follows: given a string of symbols αβγ, where α, γ are strings
from (V ∪ T)* and β is a string from (V ∪ T)+, and a rule β → δ, we can derive αδγ, replacing
β with δ:

we write αβγ ⇒ αδγ to indicate that αδγ is directly derived from αβγ
we are generally interested in the question "Is there a sequence of derivations such that α ⇒ ... ⇒ δ?"

In our example, we could have written peter VP ⇒ ... ⇒ peter hates mary. We say that peter
VP derives peter hates mary, and this is written as peter VP ⇒* peter hates mary.

However, this leaves room for mistakes, as it relies on intuition.

To reduce some (but not all) non-determinism from a derivation, we can consider two types
of derivations, left most and right most. In the left most derivation, the left most nonterminal
is replaced, and in a right most derivation, the right most nonterminal is replaced. A left most
derivation is indicated by ⇒lm* and a right most one by ⇒rm*.

S ⇒* w ⇔ S ⇒lm* w ∧ S ⇒* w ⇔ S ⇒rm* w

Parse/Derivation Trees

These are graphical representations of derivations. For example, for the left most derivation
given above, the parse tree would look like:

Ambiguity

We can consider different sentences to be semantically ambiguous, so different
interpretations give us different parse trees. For example, for the phrase "Tony drinks
Scotch on ice", Tony could either be drinking the drink "Scotch on ice", or he could be drinking
Scotch whilst stood on some ice.

In natural language, ambiguity is part of the language, however in computer languages,


ambiguity is to be avoided.

Parsers

 Parsing is the process of generating a parse tree for a given string.


 A parser is a program which does the parsing.
 Given a string as input, a parser returns a (set of) parse tree(s) that derive the input string.

Languages Generated by a Grammar

A grammar G generates a string α if S ⇒* α, where S is the start symbol of G.
A grammar G generates a language L(G) if L(G) is the set of all terminal strings that can be generated from G, i.e., L(G) = {α ∈ T* | S ⇒* α}

Context Free Grammars and Regular Languages

Not all context-free grammars are regular, however some are. All regular languages can be
expressed by a CFG. A CFG cannot express all languages, however.

For a given CFG, the semiword is a string of terminals (maybe none) concatenated with
exactly one variable on the right, e.g.,: tttV.

To prove this, we have to theorise that for any given finite automaton there is a CFG that
generates exactly the language accepted by the FSA, i.e., all regular languages are context
free. This is proved using a construction algorithm - we start with an FSA and generate a
CFG (a code sketch of the construction appears below):

The variables of the CFG will be all the names of the states in the FSA, although the start state is renamed S.
For every edge labelled a ∈ Σ ∪ {λ}, the transition a between two nodes (A and B) gives a production A → aB (where B = A in the case of a loop)
An accept state A gives us A → λ
The initial state gives us the start symbol S

Now we can prove that this CFG generates exactly the language accepted by the original FSA
by showing that every word accepted by the FSA can be generated by the CFG and every
word generated by the CFG is accepted by the FSA (we know in general that this is not
possible, but our CFG has a particular form).
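
A sketch of this construction in Python, with the FSA given as a list of arcs (the example machine is invented):

    def fsa_to_regular_grammar(arcs, initial, accepting):
        """arcs is a list of (source, symbol, target); returns productions as (head, body) pairs."""
        name = lambda q: 'S' if q == initial else q             # rename the start state to S
        productions = [(name(p), a + name(q)) for (p, a, q) in arcs]
        productions += [(name(f), 'λ') for f in accepting]      # each accept state A gives A -> λ
        return productions

    # FSA: q0 --a--> q1, q1 --b--> q1, q1 --a--> q2, accepting state q2
    print(fsa_to_regular_grammar([('q0', 'a', 'q1'), ('q1', 'b', 'q1'), ('q1', 'a', 'q2')],
                                 'q0', {'q2'}))
    # [('S', 'aq1'), ('q1', 'bq1'), ('q1', 'aq2'), ('q2', 'λ')]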

If we consider some word, say abbaa, accepted by the FSA, then we can construct the path
through the FSA to an accepting state through a sequence of semi-paths. These semi-paths
are formed from the string read from the input followed by the name of the state to which the
string takes us.

 start in S, we have a semipath of S


 read a and go to X, semipath: aX
 then read b and go to Y, semipath: abY
 ...
 read the final a and go to semipath abbaaF
 as F is an accepting state, the word abbaa is accepted

This corresponds to a derivation of the string using the productions already generated from
the FSA.

To prove the second statement, we note that every production of this CFG is of the form N →
tN or N → λ. Thus every working string found in a derivation from this CFG must contain only one
nonterminal and that will be at the very end, thus all derivations from this CFG give a
sequence of semiwords, starting with S and corresponding to a growing semipath through the
FSA. The word itself is only generated by the final accepting nonterminal being turned into a
λ - this means the semipath ends in an accepting state, and the word generated is accepted by
the FSA.

We have a final theorem to complete our proof: if all the productions of a CFG are either of
the form N → semiword or N → word (where the word may be λ), then the language it generates is regular. To see this:

 Draw a state for each nonterminal and an additional state for the accept state
 For every production of the form Nx → wyNz, draw a directed arc from Nx to Nz
 If Nx = Nz then loop
 If Np → wq then we have a final state
 Thus, we have constructed a transition graph
 Any path in the transition graph from start to accept corresponds to a word in the language of
the transition graph and also to the sequence of productions from the CFG which generate the
same word. Conversely, every derivation of a word corresponds to a graph.
 Therefore, this form of CFG is regular.

Regular Grammars

A grammar G = <V, T, S, P> is a regular grammar if all rules in P are of the form:

 A → xB
 A→x

where A and B are non-terminals and x ∈ T* (i.e., x is either λ or a sequence of terminal


symbols).

Thus, we see that all regular languages can be generated by regular grammars, and all regular
grammars generate regular languages. But, it's still quite possible for a CFG which is not in
the format of a regular grammar to generate a regular language.

Ambiguity in CFGs

Ambiguity is particularly a problem if we want to compare parsed structures, e.g., structured
files. A CFG is ambiguous if there exists at least one string in the language accepted by the
CFG for which there is more than one parse tree.

The question we need to ask is whether or not we can develop an algorithm that removes
ambiguity from CFGs. There is no general algorithm that can tell us whether or not a CFG is
ambiguous (this is discussed further in TOC).

If L is a context-free language for which there exists an unambiguous grammar, then L is said to be
unambiguous. However, if every grammar that generates L is ambiguous then the language is
inherently ambiguous.

Are there techniques that might help us avoid ambiguity? We can look at the two causes of
ambiguity:

 no respect for precedence of operators, a structure needs to be forced as required by the


operator precedence
 a group of identical operators can group either from the left or from the right. It does not
matter which, but we must pick one

A technique for removing ambiguity is introducing several new variables each representing
those expressions that can share a level of binding strength.

Total Language Trees

The total language tree is the set of all possible derivations of a CFG starting from the start
symbol S. We can use total language trees to get an idea of what kind of strings are in the
language. Sometimes (but not always), they are useful for checking for ambiguous grammars.

However, if an identical node appears twice in a total language tree, this does not necessarily
mean the language is ambiguous, as the parse trees could just be the difference between left
most and right most derivations.

Pushdown Stack Automata

Pushdown Stack Automata are like NDFA, but they also have a potentially infinite stack (last
in, first out) that can be read, pushed and popped. The automaton observes the stack symbol
at the top of the stack. It uses this and the input symbol to make a transition to another state
and replaces the top of the stack by a string of stack symbols.

The automaton has accept states, but we will also consider the possibility of acceptance by
empty stack.

A non-deterministic pushdown stack automaton (NDPA) is defined by P = (Q, Σ, Γ, δ, q0, z0, F).

Q, Σ, q0 and F are defined as before - states, input symbols, initial state and accept states
Γ is a finite set of symbols called the stack alphabet
z0 ∈ Γ is the stack start symbol. The NDPA stack consists of one instance of this symbol at the outset
δ is revised for NDPAs. As this is a non-deterministic automaton, the result is a set of states, so we can define δ as a function to a set of pairs of states and stack strings. δ takes a state q ∈ Q, an input symbol a ∈ Σ or λ (the empty symbol) and a stack symbol z ∈ Γ, which is removed from the top of the stack, and produces a set of pairs, each a new state and a new string of stack symbols that are pushed onto the stack. If the stack is to be popped then the new string will be empty (λ). Hence, δ : Q × (Σ ∪ {λ}) × Γ → 2(Q × Γ*). (note, this is restricted to finite subsets)

Instantaneous Descriptions

To compute an NDPA, we need to describe sequences of configurations. In the case of an


FSA, the state was enough. In the case of an NPDA, we need stacks and states.

The configuration of an NPDA is called the instantaneous description (ID) and is represented
by a triple (q, w, γ), where q is the state, w is the remaining input and γ is the stack. As before,
we assume the top of the stack is the left hand end.

In DFAs and NDFAs, the δ notation was sufficient to describe sequences of transitions
through which a configuration could move, however for NPDAs, a different notation is
required.

├P, or just ├ (when P is assumed), is used to connect two IDs. We also use the symbol ├* to
signify there are zero or more transitions of the NDPA. The basis of this is that I ├* I for any
ID I, and inductively, I ├* J if there exists some ID K such that I ├ K and K ├* J. That is, I
├* J if there exists a sequence of IDs K1,...,Kn with K1 = I and Kn = J, and ∀i = 1, 2, ..., n - 1, Ki ├ Ki+1

We can also state the following principles of IDs:

 If a sequence of IDs (a computation) is legal, then adding an additional input string to the end
of the input of each ID is also legal
 If a computation is legal, then the computation formed by adding the same string of stack
symbols below the stacks for each ID is also legal
 If a computation is legal and some tail of inputs is not consumed then removing this tail in
each case will not affect the legality

We can express these more formally:

If P = (Q, Σ, Γ, δ, q0, z0, F) is an NPDA and (q, x, α) ├*P (p, y, β), then for any strings w ∈ Σ*
and γ ∈ Γ* it is also true that (q, xw, αγ) ├*P (p, yw, βγ). This captures the first two principles

above, and can be proved by induction.


 If P = (Q, Σ, Γ, δ, q0, z0, F) is an NPDA and (q, xw, α) ├*P (p, yw, β), then it is also true that
(q, x, α) ├*P (p, y, β). This captures the third principle above

Acceptance by Final State

The language accepted by final state by a NDPA P = (Q, Σ, Γ, δ, q0, z0, F) is: L(P) = {w ∈ Σ*
| (q0, w, z0) ├*P (q, λ, α), q ∈ F, α ∈ Γ*}. Note that in this definition, the final stack content is
irrelevant to this definition of acceptance.

Acceptance by Empty Stack

For an NPDA P, the language accepted by empty stack is defined by N(P) = {w | (q0, w, z0) ├*
(q, λ, λ)}, for any state q ∈ Q (note that the N here indicates null stack).

For full proof, see Hopcroft


Motwani and Ullman
"Introduction to Automata
Theory, Languages and
Computation", page 231.

The class of languages that are L(P) for some NDPA P is the same for the class of languages
that are N(P), perhaps for a different P. This class is exactly the context free languages. The
proof of this is based on constructing the relevant automata.

When discussing acceptance by empty stack, the final states (final component) of the NPDA
are irrelevant, so are sometimes omitted.

Converting from empty stack to final state

If L = N(PN) for some NPDA where PN = (Q, Σ, Γ, δN, q0, z0), then there is a NPDA PF such
that L = L(PF). Diagrammatically, this looks like:

We let x0 be a new stack symbol, the initial stack symbol for PF and we construct a new start
state, p0 whose sole function is to push z0, the start symbol of PN onto the top of the stack and
enter state q0, the start state of PN. PF simulates PN until the stack of PN is empty. PF detects
this because it detects x0 on the top of the stack. The state pf is a new state in PF which is the
accepting state. The new automaton transfers to pf whenever it discovers that PN would have
emptied its stack.

We can make a formal definition of this accepting state machine as: PF = (Q ∪ {p0, pf}, Σ, Γ
∪ {x0}, δF, p0, x0, {pf}), where δF is defined by:

δF(p0, λ, x0) = {(q0, z0x0)}. PF makes a spontaneous transition to the start state of PN, pushing its start symbol z0 onto the stack.
∀q ∈ Q, a ∈ Σ ∪ {λ}, y ∈ Γ, δF(q, a, y) contains at least all the pairs in δN(q, a, y)
In addition, (pf, λ) ∈ δF(q, λ, x0), ∀q ∈ Q
No other pairs are found in δF(q, a, y)

It can be shown that w ∈ L(PF) ⇔ w ∈ N(PN). This fact is later used as a proof about
languages that an NPDA will accept.

Converting from final state to empty stack

The construction to convert from an acceptance by final state automaton to an
acceptance by empty stack machine is PN = (Q ∪ {p0, p}, Σ, Γ ∪ {x0}, δN, p0, x0).

δN is defined by:

δN(p0, λ, x0) = {(q0, z0x0)}. The start symbol of PF is pushed onto the stack at the outset and the machine then goes to the start state of PF.
∀q ∈ Q, a ∈ Σ ∪ {λ}, y ∈ Γ, δN(q, a, y) ⊇ δF(q, a, y). PN simulates PF.
(p, λ) ∈ δN(q, λ, y), ∀q ∈ F, y ∈ Γ ∪ {x0} - whenever PF accepts, PN can start emptying its stack without consuming any more input.
∀y ∈ Γ ∪ {x0}, δN(p, λ, y) = {(p, λ)}. Once in p when PF has accepted, PN pops every symbol on its stack until the stack is empty. No further input is consumed.

Equivalence of NPDAs and CFGs

We have shown from our final state to empty stack theorem that if we let L be L(PF) for
some NPDA PF = (Q, Σ, Γ, δF, q0, z0, F), then there is an NPDA PN such that L = N(PN). We can
then show that the following classes of language are equivalent:

 context free languages


 languages accepted by final state in an NPDA
 languages accepted by empty state in an NPDA

If we have a CFG G, the mechanism is to construct an NPDA that simulates the leftmost
derivations of G. At each stage of the derivation, we have a sentential form that describes the
state of the derivation: xAα, where x is a string of terminals. A is the first nonterminal
and α is a string of terminals and nonterminals. We can call Aα the tail of the sentential form,
that is, the string that still needs processing. If we have a sentential form of terminals only,
then the tail is λ.

The proposed NDPA has one state and simulates the sequence of left sentential forms that the
grammar uses to derive w. The tail of xAα appears on the stack with A on the top.

To figure out how the NDPA works, we suppose that the NDPA is in an ID (q, y, Aα)
representing the left sentential form xAα. The string being parsed w has form xy, so we are
trying to match y against Aα. The NPDA will guess non-deterministically one of the rules,
say A → β as appropriate for the next step in the derivation. This guess causes A to be
replaced by β on the stack, so, (q, y, Aα) ├* (q, y, βα). However, this does not necessarily

lead to the next sentential form as β may be prefixed by terminals. At this stage, the terminals
need to be removed. Each terminal on the top of the stack is checked against the next input to
ensure that we have guessed right. If we have not guessed right, then the NPDA branch dies.
If we succeed then we shall eventually reach the left sentential form w, i.e., the terminal
string where all the symbols of the stack have been expanded (if nonterminals) or matched
against input (if terminals). At this point, the stack is empty and we accept by empty stack.

For proof, see


lecture notes

We can make this more precise by supposing G = (V, T, S, R) is a CFG; the construction
of NPDA P is as follows: P = ({q}, T, V ∪ T, δ, q, S) (this NPDA is accepting by empty
stack), where δ is defined as:

for A ∈ V, δ(q, λ, A) = {(q, β) | A → β ∈ R}
for a ∈ T, δ(q, a, a) = {(q, λ)}

To convert from an NPDA to a CFG, a similar approach is used, but in the other direction.
Our problem is that given an NPDA P, we need to find a CFG G whose language is the same
language that P accepts by empty stack. We can observe that:

 a fundamental event as far as an NPDA is concerned is the popping of a symbol from the
stack, while consuming some input - this is a net process and may involve a sequence of steps
including the pushing of other symbols and removal
 this process of popping the element may involve a sequence of state transitions, however
there is a starting state at which the process begins and an ending state where it finishes.

In the construction of the grammar below, we use the following notation: nonterminals are
associated with the net popping of a symbol, say X and the start and finish states, say p and q
for the popping of X. Such a symbol is in the form [pXq].

If we let P = (Q, Σ, Γ, δ, q0, z0, F) be an NDPA, then there is a CFG G such that L(G) = N(P).
We can construct G = (V, Σ, S, R) as follows:

Let a special symbol S be the start symbol
Other symbols: [pXq], where p and q are states in Q and X ∈ Γ
Productions (R): for all states p we have S → [q0z0p]. This generates all the strings that cause P to pop z0 while going from q0 to p; thus for any string w, (q0, w, z0) ├* (p, λ, λ)
Suppose δ(q, a, X) contains the pair (r, Y1...Yn), where a ∈ Σ ∪ {λ}; then for all lists of states r1...rn, G includes the production [qXrn] → a[rY1r1][r1Y2r2]...[rn-1Ynrn]. This production says that one way to pop X and go from q to state rn is to read a (which may be λ) and then use some input to pop Y1 off the stack while going from r to r1, then read some input that pops Y2, etc.

In order to prove that P will accept the strings that satisfy this grammar, we have to prove
that [qXp] ⇒* w ⇔ (q, w, X) ├* (p, λ, λ)

Deterministic Push Down Automata (DPDAs)

A DPDA provides no choice of move in any situation. Such choice might come from:

 More than one pair in δ(q, a, x)
 A choice between consuming a real input symbol and making a λ transition.

P = (Q, Σ, Γ, δ, q0, z, F) is deterministic exactly when:

1. δ(q, a, x) has at most one member for any q ∈ Q, a ∈ Σ ∪ {λ}, x ∈ Γ

2. If δ(q, a, x) is non-empty for some a ∈ Σ, then δ(q, λ, x) must be empty

DPDAs accept by final state all the regular languages, but they also accept some non-regular
languages. However, DPDAs do not accept some context-free languages. All
languages accepted by DPDAs are unambiguous, but not all unambiguous
languages are accepted by a DPDA.

Simplifying CFGs

Every context-free language can be generated by a CFG in which all productions are of the
form A → BC or A → a. This form is called the Chomsky Normal Form. To get to this
form, we need to perform various steps.

Firstly, we remove useless symbols. For a symbol to be useful, it must be both generating and
reachable. X is generating if X ⇒* w, for some terminal string w. Every terminal is therefore
generating. X is reachable if there is a derivation S ⇒* αXβ for some α and β.

To compute generating symbols, we have the basis that every terminal in T is generating. The
inductive step is that if there is a production A → α and every symbol of α is known to be
generating then A is generating.

To compute reachable symbols, we have the basis of S being reachable and the inductive step
of supposing variable A is reachable. Then, for all productions with A as the head (LHS), all
the symbols of the bodies of those productions are also reachable.

For the next step, we need to eliminate nullable symbols. A is nullable if A ⇒* λ. If A is
nullable, then whenever A appears in a production body, A may or may not derive λ.

To compute nullable symbols, we have the base step that if A → λ is a production of G, then
A is nullable. Inductively, if there is a production B → C1C2...Ck, where each Ci is nullable,
then B is nullable. Ci must be a variable to be nullable. The procedure for this is:

1. Suppose G = (V, T, S, P) is a CFG
2. Determine all the nullable symbols
3. Construct G1 = (V, T, S, P1), where P1 is determined as follows: consider A → X1X2...Xk ∈ P, k ≥ 1. Suppose m of the k Xis are nullable; then the new grammar will have 2m versions of this production, with the nullable Xis present and absent. (If m = k, then don't include the A → λ case; also, if A → λ is in P, remove this production)

The final step is to remove unit productions: A → B, where A and B are variables. These

identifying cycles. The procedure for identifying unit pairs, A ⇒* B using only unit
productions serve no use, so can be removed. However, we need to take care when

productions, has the base step of (A, A) is a unit pair, as A ⇒* A. Then by induction, if (A,
B) is a unit pair and B → C is a production where C is a variable, then (A, C) is a unit pair.

To eliminate unit productions, if we're given a CFG G = (V, T, S, P), then construct G1 = (V,
T, S, P1) by finding all the unit pairs, and then for pair (A, B), add to P1 all the productions A
→ α where B → α is a non-unit production of P.

Now we have the three processes for converting a CFG to an equivalent minimised CFG, we
have to consider a safe order for performing these operations to ensure a minimised CFG.
This is:

1. Remove λ productions
2. Eliminate unit productions
3. Eliminate useless symbols

Chomsky Normal Form

To reduce a grammar to the Chomsky Normal Form, we have to perform the following
procedure:

1. Remove λ productions, unit productions and useless symbols, as above. As a result, every
production has the form A → a, or has a body of length 2 or more.
2. For every terminal that appears in a production body of length 2 or more, create a new
variable A and a production A → a. Replace all occurances of a with A in production bodies
of length 2 or more. Now each body is either a single terminal or a string of at least 2
variables with no terminals.
3. For bodies of length greater than 2, introduce new variables until we have all bodies down to
length 2.

The Chomsky Normal Form is helpful in decidability results. For instance, it turns parse
trees into binary trees, so the longest path in the tree can be calculated, which gives a bound on
execution time.

Chomsky Hierarchy of Grammars


Chomsky proposed the following classification for grammars in terms of the language they
generate.

Unrestricted (type 0): α contains at least one non-terminal. Language generated: recursively enumerable. Accepting automaton: Turing machine (a general-purpose computing machine with infinite memory and read/write operations on memory); transitions can take into account the contents of any memory location.

Context-sensitive (type 1): α contains at least one non-terminal and |α| ≤ |β|. Language generated: context-sensitive. Accepting automaton: linear-bounded automaton (like a Turing machine, but with finite memory and restrictions on usage).

Context-free (type 2): α is a single non-terminal. Language generated: context-free. Accepting automaton: non-deterministic pushdown automaton.

Regular (type 3): α is a single non-terminal and β is a sequence of terminals (possibly zero) followed by at most one non-terminal. Language generated: regular. Accepting automaton: finite state automaton.

This can be represented by a Venn diagram.
