Classical and Quantum Computing with C++ and Java Simulations

Yorick Hardy
Willi-Hans Steeb

Springer Basel AG
Contents

List of Figures
Preface

I Classical Computing

1 Algorithms
1.1 Algorithms
1.2 Algorithm Verification
1.3 Random Algorithms
1.4 Total and Partial Functions
1.5 Alphabets and Words

2 Boolean Algebra
2.1 Introduction
2.2 Definitions
2.3 Rules and Laws of Boolean Algebra
2.4 DeMorgan's Theorem
2.5 Further Definitions
2.6 Boolean Function Implementation
2.6.1 Karnaugh Maps
2.6.2 Quine-McCluskey Method
2.7 Example Programs
2.7.1 Efficient Set Operations Using Boolean Algebra
2.7.2 Quine-McCluskey Implementation

3 Number Representation
3.1 Binary, Decimal and Hexadecimal Numbers
3.1.1 Conversion
3.1.2 Arithmetic
3.1.3 Signed Integers
3.1.4 Overflow
3.1.5 Binary-Coded Decimal Form
3.2 Floating Point Representation
3.2.1 Introduction
3.2.2 Representation

4 Logic Gates
4.1 Introduction
4.2 Gates
4.2.1 AND Gate
4.2.2 OR Gate
4.2.3 XOR Gate
4.2.4 NOT Gate (Inverter)
4.2.5 NAND Gate
4.2.6 NOR Gate
4.2.7 XNOR Gate
4.3 Buffer
4.4 Tri-State Logic
4.5 Feedback and Gates

5 Combinational Circuits
5.1 Introduction
5.2 Decoder
5.3 Encoder
5.4 Demultiplexer
5.5 Multiplexer
5.6 Binary Adder
5.6.1 Binary Half Adder
5.6.2 Binary Full Adder
5.6.3 Binary Four-Bit Adder
5.6.4 Faster Addition
5.7 Binary Subtraction
5.8 Binary Multiplication
5.8.1 Unsigned Integer Multiplication
5.8.2 Fast Multiplication
5.8.3 Signed Integer Multiplication
5.9 Binary Division

7 Synchronous Circuits
7.1 Introduction
7.2 Shift Registers
7.3 Binary Counter
7.4 Example Program

8 Recursion
8.1 Introduction
8.2 Example Programs
8.3 Mutual Recursion
8.4 Wavelets and Recursion
8.5 Primitive Recursive Functions
8.6 Backtracking
8.7 Stacks and Recursion Mechanisms
8.7.1 Recursion Using Stacks
8.7.2 Stack Free Recursion

11 Cryptography
11.1 Introduction
11.2 Classical Cypher Systems
11.3 Public Key Cryptography

14 Neural Networks
14.1 Introduction
14.2 Hyperplanes
14.3 Perceptron
14.3.1 Introduction
14.3.2 Boolean Functions
14.3.3 Perceptron Learning
14.3.4 Quadratic Threshold Gates
14.3.5 One and Two Layered Networks

15 Genetic Algorithms
15.1 Introduction
15.2 The Sequential Genetic Algorithm
15.3 Gray Code
15.4 Schemata Theorem
15.5 Markov Chain Analysis
15.6 Bit Set Classes in C++ and Java
15.7 A Bit Vector Class
15.8 Maximum of One-Dimensional Maps
15.9 Maximum of Two-Dimensional Maps
15.10 The Four Colour Problem
15.11 Problems with Constraints
15.11.1 Introduction
15.11.2 Knapsack Problem
15.11.3 Traveling Salesman Problem
15.12 Other Applications for Genetic Algorithms
15.13 Distributed Global Optimization
15.14 Genetic Programming
15.15 Gene Expression Programming

II Quantum Computing

16 Quantum Mechanics
16.1 Hilbert Spaces
16.2 Linear Operators in Hilbert Spaces
16.3 Schmidt Decomposition
16.4 Spin Matrices and Kronecker Product
16.5 Postulates of Quantum Mechanics

20 Teleportation
20.1 Introduction
20.2 Teleportation Algorithm
20.3 Example Program

21 Quantum Algorithms
21.1 Deutsch's Problem
21.2 Simon's Problem
21.3 Quantum Fourier Transform
21.4 Factoring (Shor's Algorithm)
21.5 The Hidden Subgroup Problem
21.6 Unstructured Search (Grover's Algorithm)
21.7 Quantum Key Distribution
21.8 Dense Coding

24 Quantum Hardware
24.1 Introduction
24.2 Trapped Ions
24.3 Cavity Quantum Electrodynamics
24.4 Quantum Dots
24.5 Nuclear Magnetic Resonance Spectroscopy

Bibliography
Index
List of Tables

14.1 Function Table for the Boolean Function (x1·x2) + (x2·x3)
14.2 Training Set for Parity Function
23.1 Error Syndrome for the 5 Qubit Error Correction Code
List of Symbols
∅  empty set
N  natural numbers
N0  N ∪ {0}
Z  integers
Q  rational numbers
R  real numbers
R+  nonnegative real numbers
C  complex numbers
R^n  n-dimensional Euclidean space
C^n  n-dimensional complex linear space
H  Hilbert space
i  := √(−1)
ℜz  real part of the complex number z
ℑz  imaginary part of the complex number z
A ⊂ B  subset A of set B
A ∩ B  the intersection of the sets A and B
A ∪ B  the union of the sets A and B
f ∘ g  composition of two mappings, (f ∘ g)(x) = f(g(x))
ψ, |ψ⟩  wave function
t  independent time variable
x  independent space variable
x ∈ R^n  element x of R^n
‖·‖  norm
x × y  vector product
⊗  Kronecker product, tensor product
∧  exterior product (Grassmann product, wedge product)
⟨ , ⟩, ⟨ | ⟩  scalar product (inner product)
det  determinant of a square matrix
tr  trace of a square matrix
{ , }  Poisson bracket
[ , ]  commutator
[ , ]+  anticommutator
δ_jk  Kronecker delta
δ  delta function
sgn(x)  the sign of x: 1 if x > 0, −1 if x < 0, 0 if x = 0
λ  eigenvalue
ε  real parameter
I  unit operator, unit matrix
U  unitary operator, unitary matrix
Π  projection operator, projection matrix
H  Hamilton function
Ĥ  Hamilton operator
V  potential
b_j, b_j†  Bose operators
c_j, c_j†  Fermi operators
p  momentum
P  momentum operator
L  angular momentum
L̂  angular momentum operator
|β⟩  Bose coherent state
D  differential operator ∂/∂x
Ω+  Møller operator
Y_lm(θ, φ)  spherical harmonics
·  AND operation in Boolean algebra
+  OR operation in Boolean algebra
⊕  XOR operation in Boolean algebra
Ā  negation of A in Boolean algebra
⌊x⌋  the greatest integer which is not greater than x
Preface
Scientific computing is not numerical analysis, the analysis of algorithms, high per-
formance computing or computer graphics. It consists instead of the combination
of all these fields and others to craft solution strategies for applied problems. It is
the original application area of computers and remains the most important. From
meteorology to plasma physics, environmental protection, nuclear energy, genetic en-
gineering, symbolic computation, network optimization, financial applications and
many other fields, scientific applications are larger, more ambitious, more complex
and more necessary. More and more universities are introducing a Department of Scientific Computing or a Department of Computational Science. The components of this new department include Applied Mathematics, Theoretical Physics, Computer Science and Electronic Engineering. This book can serve as a textbook in Scientific Computing. It covers the relevant techniques of classical computing as well as quantum computing. Most of the chapters include C++ and Java simulations.
Chapter 1 covers the description of algorithms and informal verification techniques.
Some basic concepts for computing are also introduced, such as alphabets and words,
and total and partial functions.
Chapter 2 discusses Boolean algebra. The definition of a Boolean algebra is given,
and various properties of the algebra are introduced. The chapter focuses on how
Boolean algebra can be used to implement a computation. Methods are discussed
to obtain efficient implementations.
Chapter 3 deals with number representation for computing devices. This includes
the different implementations of integers, and the representation of real numbers.
Conversion between different representations of numbers is also described.
Chapter 4 gives an overview of logic gates, which serve as the building blocks for
implementing functions in digital electronics. All of the commonly used gates such
as AND, OR, XOR and their negations are discussed.
Chapter 5 shows how to use the gates introduced in Chapter 4 to build circuits
for specific purposes. The circuits described are important components in com-
puting devices. The arithmetic operations such as addition and multiplication are
described, as well as methods to increase the efficiency of the implementations. Var-
ious techniques for programming circuits are also considered, such as programmable
logic devices and programmable gate arrays.
Chapter 6 is about latches. Latches serve as memory for a computing device. We
consider three different types of latches. Using the latches, registers can be con-
structed which provide memory capability in a more useful form.
Chapter 1

Algorithms

1.1 Algorithms

An algorithm [48, 63, 77, 97, 115] is a precise description of how to solve a problem. For example, algorithms can be used to describe how to add and subtract numbers or to prove theorems. Usually algorithms are constructed from some basic accepted knowledge and inference rules or instructions. Thus programs in programming languages such as C++ and Java are algorithms. An algorithm can be viewed as a map f : E → A from the set of input data E to the set of output data A.
Knuth [104] describes an algorithm as a finite set of rules which gives a sequence
of operations for solving a specific type of problem similar to a recipe or procedure.
According to Knuth [104] an algorithm has the following properties:

1. Finiteness. An algorithm must always terminate after a finite number of steps.

2. Definiteness. Each step of an algorithm must be precisely defined; the actions to be carried out must be rigorously specified.

3. Input. An algorithm has zero or more inputs, i.e., quantities which are given to it initially before the algorithm begins.

4. Output. An algorithm has one or more outputs, i.e., quantities which have a specified relation to the inputs.

5. Effectiveness. All operations of the algorithm must be sufficiently basic that they can in principle be done exactly and in a finite length of time.
Not every function can be realized by an algorithm. For example, the task of adding
two arbitrary real numbers does not satisfy finiteness.
Example. The Euclidean algorithm is a method to find the greatest common divisor (GCD) of two integers. The GCD d of two integers a and b is the integer that divides a and b such that every common divisor c of a and b divides d.

1. Let a′ := a and b′ := b.
2. Let r and q be integers such that a′ = qb′ + r and 0 ≤ r < b′.
3. If r is not zero, set a′ := b′ and b′ := r and go to step 2.
4. The GCD is b′.

First we find the GCD of 21 and 18:

• 2) q = 1 and r = 3
• 3) a′ = 18 and b′ = 3
• 2) q = 6 and r = 0
• 4) The GCD is 3.
Now we find the GCD of 113 and 49:

• 2) q = 2 and r = 15
• 3) a′ = 49 and b′ = 15
• 2) q = 3 and r = 4
• 3) a′ = 15 and b′ = 4
• 2) q = 3 and r = 3
• 3) a′ = 4 and b′ = 3
• 2) q = 1 and r = 1
• 3) a′ = 3 and b′ = 1
• 2) q = 3 and r = 0
• 4) The GCD is 1.
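The algorithm can be expressed as a short C++ function. The following is a minimal sketch of our own; the function name gcd is an illustrative choice and does not appear in the book's listings.

// gcd.cpp
// Euclidean algorithm (illustrative sketch)
#include <iostream>
using namespace std;

unsigned long gcd(unsigned long a,unsigned long b)
{
   while(b != 0)              // step 3: repeat while r is not zero
   {
      unsigned long r = a%b;  // step 2: a = q*b + r with 0 <= r < b
      a = b; b = r;           // step 3: a' := b', b' := r
   }
   return a;                  // step 4: the GCD
}

int main()
{
   cout << "gcd(21,18) = "  << gcd(21,18)  << endl;  // 3
   cout << "gcd(113,49) = " << gcd(113,49) << endl;  // 1
   return 0;
}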
An algorithm executes uniquely if, for a given input, the termination of the algo-
rithm is always the same, i.e. the variables, memory, state, output and position of
termination in the algorithm are always the same.
Example. The Euclidean algorithm is deterministic, in other words for any given a and b the algorithm will always give the same result (the GCD of the given values).
Definition. An algorithm which is not deterministic is said to be non-deterministic.
• The algorithm assigns probabilities p1, p2, ..., pn according to merit for each of the options, where

p_i ≥ 0,   Σ_{i=1}^{n} p_i = 1.

• The entropy of this probability distribution is

S := −Σ_{i=1}^{n} p_i log2(p_i).

• The outcome of the event is used for learning, where the learning is weighted using S.
1.2 Algorithm Verification

A basic tool for verification is the principle of mathematical induction. A property P holds for all natural numbers if

• P holds for 1,
• if P holds for k then P holds for k + 1.

So if P holds for 1 it also holds for 2 and therefore also for 3 and so on.
Example. We show by induction that

Σ_{i=1}^{n} i = n(n+1)/2.

• For n = 1

Σ_{i=1}^{1} i = 1 = n(n+1)/2.

• Suppose

Σ_{i=1}^{k} i = k(k+1)/2

then

Σ_{i=1}^{k+1} i = (Σ_{i=1}^{k} i) + (k+1) = k(k+1)/2 + (k+1) = (k+1)(k+2)/2.

Thus

Σ_{i=1}^{n} i = n(n+1)/2.
Example. We show Bernoulli's inequality (1+x)^n ≥ 1 + nx for x ≥ −1.

• For n = 1

(1+x)^1 = 1 + x ≥ 1 + nx.

• Suppose

(1+x)^k ≥ 1 + kx

then

(1+x)^{k+1} = (1+x)^k (1+x) ≥ (1+kx)(1+x)

since 1 + x ≥ 0. Now

(1+kx)(1+x) = 1 + (k+1)x + kx² ≥ 1 + (k+1)x.

Thus

(1+x)^n ≥ 1 + nx.
This method allows us to verify that a property is true for all natural numbers by building on initial truths. The same method can be extended to algorithm verification. A program starts with some conditions known to be true, and verification is the process of determining whether certain desired properties always hold during execution to give the desired result.
Example. We consider the algorithm to add n numbers x1, x2, ..., xn. After the i-th step the invariant

sum = Σ_{j=1}^{i} x_j

holds.

4. Postcondition: sum = Σ_{j=1}^{n} x_j.

For the Euclidean algorithm we have:

2. Invariant: GCD(a, b) = GCD(a′, b′).
   Let r and q be integers such that a′ = qb′ + r and 0 ≤ r < b′.
3. If r is not zero, set a′ := b′ and b′ := r.

Obviously GCD(b′, r) divides GCD(a′, b′). The reverse argument is also easy: GCD(a′, b′) divides both a′ and b′ and therefore divides r. When r = 0 the GCD is b′.
In C and C++ the macro assert in the header file assert.h is provided to help with debugging. It takes one argument which must be an expression with a numerical value. assert aborts the program and prints an error message if the expression evaluates to 0.
Example. We can use assert to make sure that, whenever a program calls the function sum, the function adds at least one number. The function header and the return statement of sum are reconstructed from the surrounding text.

// sum.cpp
#include <iostream>
#include <assert.h>

double sum(double x[],int n)
{
   assert(n > 0);
   int i;
   double sum = 0.0;
   for(i=0;i < n;i++) sum += x[i]; // invariant sum = x[0]+...+x[i]
   return sum;
}

void main(void)
{
   double x[5] = { 0.5,0.3,7.0,-0.3,0.5 };
   cout << "sum=" << sum(x,5) << endl;   // sum=8
   sum(x,0);                             // assertion fails
}

The output is

sum=8
Assertion failed: n>0, file sum.cpp, line 9
1.3 Random Algorithms

Random algorithms exist for numerical integration, but other numerical methods are generally better for not too large dimensions.

The function

void srand(unsigned)

in stdlib.h seeds the random number generator, and the function

time_t time(time_t*)

in time.h returns the current time. Thus the call

srand(time(NULL))

initializes the random number generator. The function rand() generates a random number between 0 and RAND_MAX. Note that the random number sequences generated in this way by the computer are not truly random and are eventually periodic. The number sequences have properties which make them appropriate approximations for random number sequences for use in algorithms. The expression double(rand())/RAND_MAX takes the integer returned by rand() and casts it to type double so that the division by RAND_MAX gives a random number of type double in the unit interval [0,1].
Example. To calculate the value of π we use the fact that the area of a quadrant of the unit circle is π/4. We select points in the unit square at random; the fraction of points which fall inside the quadrant approximates π/4. Typical outputs of the program are

pi=3.13994
pi=3.13806
pi=3.14156
pi=3.13744
// calcpi.cpp
#include <iostream>
#include <time.h>
#include <stdlib.h>

void main(void)
{
   const int n = 500000;
   double x,y,pi;
   int i;
   int in_count = 0;
   srand(time(NULL));
   for(i=0;i<n;i++)
   {
      x = double(rand())/RAND_MAX;
      y = double(rand())/RAND_MAX;
      if(x*x+y*y <= 1) in_count++;   // point lies in the quadrant
   }
   pi = 4.0*double(in_count)/n;
   cout << "pi=" << pi << endl;
}
Example. Annealing [164] is the process of cooling a molten substance with the objective of condensing matter into a crystalline solid. Annealing can be regarded as an optimization process. The configuration of the system during annealing is defined by the set of atomic positions r_i. A configuration of the system is weighted by its Boltzmann probability factor,

e^{−E(r_i)/kT}

where E(r_i) is the energy of the configuration, k is the Boltzmann constant, and T is the temperature. When a substance is subjected to annealing, it is maintained at each temperature for a time long enough to reach thermal equilibrium.
The iterative improvement technique for combinatorial optimization has been com-
pared to rapid quenching of molten metals. During rapid quenching of a molten
substance, energy is rapidly extracted from the system by contact with a massive
cold substrate. Rapid cooling results in metastable system states; in metallurgy, a
glassy substance rather than a crystalline solid is obtained as a result of rapid cool-
ing. The analogy between iterative improvement and rapid cooling of metals stems
from the fact that iterative improvement and rapid cooling of metals accepts only
those system configurations which decrease the cost function. In an annealing (slow
cooling) process, a new system configuration that does not improve the cost function
is accepted based on the Boltzmann probability factor of the configuration. This
criterion for accepting a new system state is called the Metropolis criterion. The
process of allowing a fluid to attain thermal equilibrium at a temperature is also
known as the Metropolis process.
If the initial temperature is too low, the process gets quenched very soon and only
a local optimum is found. If the initial temperature is too high, the process is
very slow. Only a single solution is used for the search and this increases the
chance of the solution becoming stuck at a local optimum. The changing of the
temperature is based on an external procedure which is unrelated to the current
quality of the solution, that is, the rate of change of temperature is independent of
the solution quality. These problems can be rectified by using a population instead
of a single solution. The annealing mechanism can also be coupled with the quality
of the current solution by making the rate of change of temperature sensitive to the
solution quality.
In the following program we apply simulated annealing to find the minimum of the function

f(x) = x² exp(−x/15) sin(x).
// anneal.cpp
// simulated annealing
// x range: [0 : 100]
#include <iostream>
#include <math.h>
#include <stdlib.h>
#include <time.h>

double f(double x) { return x*x*exp(-x/15.0)*sin(x); }

int main()
{
   cout << "Finding the minimum via simulated annealing:" << endl;
   double xlow = 0.0; double xhigh = 100.0;
   double Tmax = 500.0; double Tmin = 1.0;
   double Tstep = 0.1;
   double T;
   srand(time(NULL));
   double s = rand()/double(RAND_MAX);
   // the body of the cooling loop is reconstructed
   double xcurrent = xlow + s*(xhigh-xlow);
   double Ecurrent = f(xcurrent);
   for(T=Tmax;T>Tmin;T-=Tstep)
   {
      // propose a random candidate point
      s = rand()/double(RAND_MAX);
      double xnew = xlow + s*(xhigh-xlow);
      double Enew = f(xnew);
      // Metropolis criterion: always accept improvements, accept
      // worse configurations with probability exp(-(Enew-Ecurrent)/T)
      s = rand()/double(RAND_MAX);
      if((Enew < Ecurrent) || (s < exp(-(Enew-Ecurrent)/T)))
      {
         xcurrent = xnew;
         Ecurrent = Enew;
      }
   }
   cout << "minimum at x = " << xcurrent
        << ", f(x) = " << Ecurrent << endl;
   return 0;
}
The minima of f satisfy f′(x*) = 0, i.e. the transcendental equation

tan(x*) = 15x*/(x* − 30).
1.4 Total and Partial Functions

Definition. A function

f : A1 × A2 × ... × An → B

is said to be n-ary. Unary, binary and ternary are synonyms for 1-ary, 2-ary and 3-ary respectively. In the expression f(a1, a2, ..., an) the a_i are the arguments of f.

Definition. The range of a function f : A → B is the set { f(a) | a ∈ dom(f) } and is denoted rng(f).
For example, the function

g(x, y) = x − y

is binary. The characteristic function of a set A is defined by

χ(x) := 1 if x ∈ A,  χ(x) := 0 if x ∉ A.

The translation operator exp(αD), where D := ∂/∂x, satisfies (exp(αD)f)(x) = f(x + α). The following program applies it to f1(x) = sin(x) with α = 0.5 and to f2(x) = x² with α = 3.
// trans.cpp
#include <iostream>
#include <math.h>

double f1(double x)
{
   return sin(x);
}

int f2(int x)
{
   return x*x;
}

void main()
{
   double x1 = 1.0;
   double alpha1 = 0.5;
   int x2 = 5;
   int alpha2 = 3;
   // (exp(alpha*D) f)(x) = f(x+alpha)
   cout << "f1(x1+alpha1) = " << f1(x1+alpha1) << endl;  // sin(1.5)
   cout << "f2(x2+alpha2) = " << f2(x2+alpha2) << endl;  // 64
}
1.5 Alphabets and Words

Definition. Let x, y ∈ Σ* where x = a1a2...an and y = b1b2...bm. The concatenation of x and y is xy = a1a2...anb1b2...bm.

For any symbol a ∈ Σ, a^m denotes the word of length m consisting of m a's.

For sets X and Y of words we define

• XY = { xy | x ∈ X, y ∈ Y }
• X⁰ = {ε} and X^{n+1} = X^n X, for n ≥ 0
• X* = ⋃_{n=0}^{∞} X^n
• X⁺ = ⋃_{n=1}^{∞} X^n
This is the L-language for this ruleset. Each word in the derivation is simply the concatenation of the previous two words in the derivation. We can prove this fact by induction. Let L(w_j) denote the mapping from the bit string w_j to the next derivation using the ruleset, and let w_j be the j-th bit string in the derivation, starting from 0. We have

w0 = 0
w1 = L(w0) = 1
w2 = L(w1) = L(1) = 01 = w0w1
w3 = L(w2) = L(01) = 101 = w1w2

By induction

w_{j+1} = L(w_j) = L(w_{j-2}w_{j-1}) = L(w_{j-2})L(w_{j-1}) = w_{j-1}w_j.
The following Java program shows how to implement the derivation. We use the StringBuffer class which is built into Java. The StringBuffer class implements a mutable sequence of characters. The method

StringBuffer append(String str)

appends the string str to the end of the character sequence.
// LSystem.java
class LSystem
{
   public static void map(StringBuffer sold,StringBuffer snew)
   {
      int i;
      for(i=0;i < sold.length();i++)
      {
         if(sold.charAt(i) == '0') snew.append("1");
         if(sold.charAt(i) == '1') snew.append("01");
      }
   } // end method map

   public static void main(String[] args)
   {
      // the initialization of the buffers is reconstructed
      StringBuffer sold = new StringBuffer("01101");
      StringBuffer snew = new StringBuffer("");
      map(sold,snew);
      System.out.println("snew = " + snew);  // 10101101

      StringBuffer s0 = new StringBuffer("0");
      StringBuffer s1 = new StringBuffer("");
      int j;
      for(j=0;j < 6;j++)
      {
         map(s0,s1);
         s0 = s1;
         System.out.println("s = " + s0);
         s1 = new StringBuffer("");
      }
   }
}
The encoding for a String begins with two bytes for the length of the string. The
first byte is the high order byte and the second byte is the low order byte. The
character encoding follows this. A zero value is encoded as two bytes
11000000,10000000.
The bytes are written in left to right order. All ASCII codes from 1 to 127 are
written using a single byte with a leading 0 bit,
0(0-6)

where (0-6) indicates that the bits indexed by 0, 1, ..., 6 are written in the remaining bit positions. All codes in the range 128 to 2047 are encoded as two bytes

110(6-10), 10(0-5).

Finally all codes in the range 2048 to 65535 are encoded as three bytes

1110(12-15), 10(6-11), 10(0-5).
Thus the string "UTF example" would be encoded as the bytes (in hexadecimal)

00, 0B, 55, 54, 46, 20, 65, 78, 61, 6D, 70, 6C, 65.
The following Java program uses the above methods to illustrate the encoding.
// UTFexample.java
import java.io.*;

class UTFexample
{
   public static void main(String[] args) throws IOException
   {
      String s = "UTF example";
      DataOutputStream output =
         new DataOutputStream(new FileOutputStream("myout.dat"));
      output.writeUTF(s);
      output.flush();
      output.close();
      DataInputStream input =
         new DataInputStream(new FileInputStream("myout.dat"));
      String t = input.readUTF();
      input.close();
      System.out.println("t = " + t);
   }
}
Chapter 2
Boolean Algebra
2.1 Introduction
Boolean algebra forms the theoretical basis for classical computing. It can be used
to describe the circuits which are used as building blocks for classical computing.
In this chapter we introduce the definitions of Boolean algebra and the rules for
manipulation. We introduce the standard forms for manipulation and describe how
Boolean algebra can be used to describe functions. Efficiency is an important issue
in computing and we describe the methods of Karnaugh maps and Quine-McCluskey to simplify expressions.
At the end of the chapter two programs are given to illustrate the concepts. The first
example program uses the properties of Boolean algebra to efficiently implement sets
in C++. This implementation reduces the memory requirements for a set since only
one bit of information is needed for each element of the set. The second example is an
implementation of the Quine-McCluskey method in C++. The Quine-McCluskey method is easier to implement on computer whereas the Karnaugh map method is easier to do by hand.
The smallest Boolean algebra consists of two elements usually labelled 0 and 1 or
false and true but larger Boolean algebras exist.
2.2 Definitions

Definition. A Boolean algebra is a closed algebraic system containing a set B of two or more elements and two operations

· : B × B → B,   + : B × B → B.

The operations satisfy the following axioms.

• Identity Elements. There exist unique elements 0, 1 ∈ B such that for every A ∈ B
  1. A + 0 = A
  2. A · 1 = A

• Commutativity. For every A0, A1 ∈ B
  1. A0 + A1 = A1 + A0
  2. A0 · A1 = A1 · A0

• Associativity. For every A0, A1, A2 ∈ B
  1. A0 + (A1 + A2) = (A0 + A1) + A2
  2. A0 · (A1 · A2) = (A0 · A1) · A2

• Distributivity. For every A0, A1, A2 ∈ B
  1. A0 + (A1 · A2) = (A0 + A1) · (A0 + A2)
  2. A0 · (A1 + A2) = (A0 · A1) + (A0 · A2)

• Complement. For every A ∈ B there exists a unique element Ā ∈ B such that
  1. A + Ā = 1
  2. A · Ā = 0
Example. The smallest Boolean algebra consists of the identity elements {0, 1}. The Boolean algebra can be summarised in a table.

A0  A1 | A0 + A1 | A0 · A1 | Ā0
0   0  |    0    |    0    |  1
0   1  |    1    |    0    |  1
1   0  |    1    |    0    |  0
1   1  |    1    |    1    |  0
Example. The set P(X) (set of all subsets of the finite set X) of a non-empty set X, with · the intersection of sets, + the union of sets and the complement with respect to X as negation, forms a Boolean algebra with identity elements 0 = ∅ and 1 = X. This Boolean algebra has 2^|X| members, where |X| denotes the cardinality (number of elements) of X.

Example. The set A of all functions from the set {p1, p2, ..., pn} into {0, 1} (i.e. a function in the set assigns 0 or 1 to each of p1, p2, ..., pn), with ·, + and negation defined pointwise by the definitions in the first example, forms a Boolean algebra. For example, if f1, f2 ∈ A then (f1 · f2)(p) = f1(p) · f2(p).
• Double negation. (Ā)‾ = A

• Idempotence.
  1. A · A = A
  2. A + A = A

• Absorption.
  1. A + 1 = 1
  2. 0 · A = 0
  3. A0 + A0 · A1 = A0
  4. A0 · (A0 + A1) = A0
  5. A0 · Ā1 + A1 = A0 + A1
  6. (A0 + Ā1) · A1 = A0 · A1
The double negation property is obvious. The idempotence property follows from

1. A · A = A · A + 0 = (A · A) + (A · Ā) = A · (A + Ā) = A · 1 = A
2. A + A = (A + A) · 1 = (A + A) · (A + Ā) = A + (A · Ā) = A + 0 = A

The absorption properties are derived as follows

1. 1 = A + Ā = A + (Ā · 1) = (A + Ā) · (A + 1) = 1 · (A + 1) = A + 1
2. 0 = A · Ā = A · (Ā + 0) = (A · Ā) + (A · 0) = 0 + (A · 0) = A · 0
3. A0 + A0 · A1 = A0 · 1 + A0 · A1 = A0 · (1 + A1) = A0
4. A0 · (A0 + A1) = (A0 · A0) + (A0 · A1) = A0 + A0 · A1 = A0
5. A0 · Ā1 + A1 = (A0 + A1) · (Ā1 + A1) = (A0 + A1) · 1 = A0 + A1
6. (A0 + Ā1) · A1 = (A0 · A1) + (Ā1 · A1) = (A0 · A1) + 0 = A0 · A1
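Since B = {0, 1} is finite, such laws can also be checked exhaustively. The following short C++ program is our own illustration (not one of the book's listings); it verifies absorption properties 3 and 4 for all values of A0 and A1.

// boollaws.cpp
// exhaustive check of two absorption properties over B = {0,1}
#include <iostream>
using namespace std;

int main()
{
   for(int A0=0;A0<=1;A0++)
   for(int A1=0;A1<=1;A1++)
   {
      bool p3 = ((A0 | (A0 & A1)) == A0);  // A0 + A0.A1 = A0
      bool p4 = ((A0 & (A0 | A1)) == A0);  // A0.(A0 + A1) = A0
      cout << "A0=" << A0 << " A1=" << A1
           << "  property 3: " << p3
           << "  property 4: " << p4 << endl;
   }
   return 0;
}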
2.4 DeMorgan's Theorem

DeMorgan's theorem states that

(A0 + A1)‾ = Ā0 · Ā1,   (A0 · A1)‾ = Ā0 + Ā1.

Thus the left-hand side of the two identities involves two operations and the right-hand side three operations. DeMorgan's theorem can be proved using the properties given above. It describes the relationships between the operations +, · and negation. This theorem is very important for building combinational circuits consisting of only one type of operation.
A0  A1 | A0 · A1 | A0 + A1 | Ā0
0   0  |    0    |    0    |  1
0   1  |    0    |    1    |  1
1   0  |    0    |    1    |  0
1   1  |    1    |    1    |  0
2.5 Further Definitions

Definition. A Boolean function is a map f : {0, 1}^n → {0, 1} where {0, 1}^n is the set of all n-tuples consisting of zeros and ones.

Definition. Boolean variables are variables which may only take on the values 0 or 1.

Definition. Bit is short for binary digit which refers to a 0 or 1.

We will use the notation B^n := B × B × ... × B (n times). Thus B^n = {0, 1}^n.
A Boolean function can be specified by a truth table:

A0  A1  ...  A_{n-1} | f
0   0   ...  0       | f(A0 = 0, A1 = 0, ..., A_{n-1} = 0)
0   0   ...  1       | f(A0 = 0, A1 = 0, ..., A_{n-1} = 1)
...
1   1   ...  1       | f(A0 = 1, A1 = 1, ..., A_{n-1} = 1)

The rows of the table are over all combinations of A0, A1, ..., A_{n-1}. There are 2^n such combinations. Thus the truth table has 2^n rows.

Every Boolean function can be written as a sum of products (SOP) form. To see this we construct product forms P_j = l_{j,1} · l_{j,2} · ... · l_{j,n} for each row in the truth table of f where f = 1, with l_{j,i} = A_i if the entry for A_i is 1 and l_{j,i} = Ā_i if the entry for A_i is 0. If f = 1 in m of the rows of the truth table then

f = P_1 + P_2 + ... + P_m.
Example. Consider the parity function for two bits with truth table Table 2.2.

A0  A1 | P(A0, A1)
0   0  |    1
0   1  |    0
1   0  |    0
1   1  |    1

Thus P(A0, A1) = Ā0 · Ā1 + A0 · A1.
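The construction of the SOP form can be carried out mechanically. The following C++ sketch is our own illustration (not one of the book's listings); it prints the canonical SOP form of the parity function directly from its truth table, writing ~ for negation.

// paritysop.cpp
// canonical SOP form of the parity function P(A0,A1)
#include <iostream>
using namespace std;

int main()
{
   int P[4] = { 1, 0, 0, 1 };  // rows (A0,A1) = 00, 01, 10, 11
   bool first = true;
   for(int row=0;row<4;row++)
   {
      if(P[row] == 0) continue;  // product forms only where P = 1
      int A0 = (row >> 1) & 1, A1 = row & 1;
      if(!first) cout << " + ";
      cout << (A0 ? "A0" : "~A0") << "." << (A1 ? "A1" : "~A1");
      first = false;
   }
   cout << endl;  // output: ~A0.~A1 + A0.A1
   return 0;
}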
Definition. A canonical SOP form is a SOP form over n variables, where each variable or its negation is present in every product form, in other words a Boolean expression E is in canonical SOP form if it can be written as

E = l_{1,1} · l_{1,2} · ... · l_{1,n} + l_{2,1} · l_{2,2} · ... · l_{2,n} + ... + l_{m,1} · l_{m,2} · ... · l_{m,n}

where l_{i,j} = A_j or l_{i,j} = Ā_j.
A0  A1 | A0 ⊕ A1
0   0  |   0
0   1  |   1
1   0  |   1
1   1  |   0
• A ⊕ A = 0
• A ⊕ Ā = 1
• A0 ⊕ A1 = A1 ⊕ A0
• Ā0 ⊕ A1 = A0 ⊕ Ā1
• (A0 ⊕ A1) ⊕ A2 = A0 ⊕ (A1 ⊕ A2)
• A0 ⊕ A1 = Ā0 · A1 + A0 · Ā1
• (A0 · A1) ⊕ A0 = (A0 ⊕ A1) · A0 = A0 · Ā1
The XOR operation can be used to swap two values a and b (for example integers in C++ and Java):

1. a := a ⊕ b
2. b := a ⊕ b
3. a := a ⊕ b

By analysing the variables at each step in terms of the original a and b the swapping action becomes clear. In the second step we have (a ⊕ b) ⊕ b = a ⊕ 0 = a. In the third step we have (a ⊕ b) ⊕ a = b ⊕ 0 = b.
In C, C++ and Java the XOR operation is denoted by ^. The following C++ program illustrates the swapping.

// xor.cpp
#include <iostream>

void main(void)
{
   int a = 23;
   int b = -565;
   a = a^b;   // step 1
   b = a^b;   // step 2: b is now 23
   a = a^b;   // step 3: a is now -565
   cout << "a = " << a << endl;  // a = -565
   cout << "b = " << b << endl;  // b = 23
}
A0 · A1 = 1
A0 ⊕ A1 = 1

For Boolean functions there exist universal sets of operations with only one element. The NAND and NOR operations can each be used to build any other function, which we will show in the next section.
Negation, AND and OR can all be expressed using only the NAND operation (A0 · A1)‾:

• Ā = (A · A)‾
• A0 · A1 = ((A0 · A1)‾)‾ = ((A0 · A1)‾ · (A0 · A1)‾)‾
• A0 + A1 = (Ā0 · Ā1)‾ = ((A0 · A0)‾ · (A1 · A1)‾)‾

Example. We show now how to implement the NOR operation using only NAND operations. As mentioned earlier, DeMorgan's laws are important to achieve this. With X := ((A0 · A0)‾ · (A1 · A1)‾)‾ implementing A0 + A1 as above, we have

(A0 + A1)‾ = (X · X)‾.

It can also be shown that the NOR gate is sufficient to build an implementation of any Boolean function.
2.6 Boolean Function Implementation

Data are represented by bit strings a_{n-1}a_{n-2}...a_0, a_i ∈ {0, 1}. Bit strings of length n can represent up to 2^n different data elements. Functions on bit strings are then calculated by Boolean functions of the individual bits. For example, the nonnegative integers can be represented by

a_{n-1}a_{n-2}...a_0 → Σ_{i=0}^{n-1} a_i 2^i.

For n = 32 the largest value is

Σ_{i=0}^{31} 2^i = 2^32 − 1 = 4294967295.

This relates to the data type unsigned long in C and C++. Java has only signed data types.
More generally, n bits can represent the points

{ x | x ∈ R, x = b + j·(c − b)/(2^n − 1), j = 0, 1, ..., 2^n − 1 }

of an interval [b, c] using the map

a_{n-1}a_{n-2}...a_0 → b + (c − b)/(2^n − 1) · Σ_{i=0}^{n-1} a_i 2^i.

So we find

a_{n-1}a_{n-2}...a_0 = 00...0 → b

and

a_{n-1}a_{n-2}...a_0 = 11...1 → c.
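The decoding map can be written as a small C++ function. This sketch is our own illustration (the function name decode is an assumption, not from the book's listings), and it assumes n < 32.

// decode.cpp
// decode an n-bit string to a point of the interval [b,c]
#include <iostream>
using namespace std;

double decode(unsigned long bits,int n,double b,double c)
{
   unsigned long sum = 0;
   for(int i=0;i<n;i++)
      if((bits >> i) & 1UL) sum += (1UL << i);  // sum of a_i 2^i
   return b + (c - b)*double(sum)/double((1UL << n) - 1);
}

int main()
{
   // 8-bit examples on [0,1]: 00000000 -> 0 and 11111111 -> 1
   cout << decode(0,8,0.0,1.0)   << endl;  // 0
   cout << decode(255,8,0.0,1.0) << endl;  // 1
   cout << decode(128,8,0.0,1.0) << endl;  // 128/255
   return 0;
}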
Minimizing the number of gates in an implementation decreases cost and the number
of things that can go wrong.
One way to reduce the number of gates is to use the properties of the Boolean
algebra to eliminate literals.
2.6.1 Karnaugh Maps

Example. The full adder (Table 2.4) consists of two outputs (one for the sum and one for the carry) and three inputs (the carry from another adder and the two bits to be added).

Cin  A0  A1 | S  Cout
0    0   0  | 0  0
0    0   1  | 1  0
0    1   0  | 1  0
0    1   1  | 0  1
1    0   0  | 1  0
1    0   1  | 0  1
1    1   0  | 0  1
1    1   1  | 1  1

Thus

S = Ā0 · A1 · C̄in + A0 · Ā1 · C̄in + Ā0 · Ā1 · Cin + A0 · A1 · Cin

and

Cout = A0 · A1 · C̄in + Ā0 · A1 · Cin + A0 · Ā1 · Cin + A0 · A1 · Cin.

Simplification for Cout yields Cout = A0 · A1 + A0 · Cin + A1 · Cin, as the following Karnaugh map for Cout shows (rows labelled by Cin, columns by A0A1):

      A0A1
Cin   00  01  11  10
0     0   0   1   0
1     0   1   1   1
Note that adjacent columns and rows only differ in the assignment of one variable (only one bit differs). This is important for the simplification algorithm to work correctly. Suppose two adjacent squares have the value 1 and only differ in the variable A. Writing the corresponding product forms as p · A and p · Ā, the canonical SOP form can be simplified using

p · A + p · Ā = p · (A + Ā) = p.

In fact this can be extended to any 2^n adjacent squares in a row or column. The first column is adjacent to the last column ("wrap around"), and the same applies to rows. The simplification is indicated by circling the adjacent squares involved in the simplification. Overlapping circles are allowed due to the idempotence property. If two circles are "adjacent" in the sense that they cover the same columns (rows) in adjacent rows (columns) they may be joined to form one circle encircling all the appropriate squares. The only restriction is that the number of rows and columns encircled must be a power of 2, i.e. 1, 2, 4, 8, .... This is due to the algebraic simplification used. Each set of encircled squares is called a group and the squares are said to be covered. There are two algorithms for this method.
Algorithm 1.

1. Count the number of adjacencies (adjacent 1-squares) for each 1-square on the Karnaugh map.

2. Select an uncovered 1-square with the fewest adjacencies.

3. Circle the 1-square so that the circle covers the most uncovered 1-squares.

4. Repeat from step 2 until all 1-squares are covered.

Algorithm 2.

1. Circle all 1-squares so that each circle covers the most 1-squares.

2. Eliminate all circles that do not contain at least one 1-square that is not covered by another circle.
The SOP form is the OR of product forms representing the groups of the Karnaugh map. The variable A_i is in the product form if A_i = 1 is constant in the group; Ā_i is in the product form if A_i = 0 is constant in the group. For example, in the Karnaugh map for Cout the column A0A1 = 11 forms one group (circled below), giving the product form A0 · A1:

      A0A1
Cin   00  01  11  10
0     0   0  (1)  0
1     0   1  (1)  1
Example. The truth table for a decimal incrementer (4-bit) with 4 inputs and 4 outputs is given by Table 2.5.

[Karnaugh maps for two of the output bits, with d denoting don't-care entries for input combinations that do not occur.]
2.6.2 Quine-McCluskey Method

The Quine-McCluskey method simplifies a canonical SOP form f = P_{0,1} + P_{0,2} + ..., where P_{i,j} denotes the j-th product form with exactly i negated Boolean variables. The method is as follows:

1. Set QM(n) := the set of product forms of the canonical SOP form of f over n Boolean variables.

2. Set m := n.

3. Set

QM(m − 1) := QM(m)

and

QM_{m,i} := { P ∈ QM(m) | P has m Boolean variables of which i are negated }.

4. For each pair of elements

e1 = l_{1,1} · l_{1,2} · ... · l_{1,m} ∈ QM_{m,i}

and

e2 = l_{2,1} · l_{2,2} · ... · l_{2,m} ∈ QM_{m,i+1},  where i = 0, 1, ..., m − 1,

which differ in only one literal l_{1,j} ≠ l_{2,j}, set

QM(m − 1) := (QM(m − 1) − {e1, e2}) ∪ { l_{1,1} · ... · l_{1,j-1} · l_{1,j+1} · ... · l_{1,m} }.

5. Set m := m − 1.

6. If m > 1 repeat from step 3.
Example. Consider the Boolean functions of the two variables f0 and f1 given by the truth table

f0  f1 | f̄0·f1 + f0·f̄1 | f̄0·f1 + f0·f1
0   0  |       0        |       0
0   1  |       1        |       1
1   0  |       1        |       0
1   1  |       0        |       1

The two product forms of f̄0·f1 + f0·f̄1 (the XOR function) differ in both literals, so the method cannot simplify this expression. For f = f̄0·f1 + f0·f1 the method proceeds as follows.

• m = 2.
  QM(2) = { f̄0·f1, f0·f1 }
  QM_{0,2} = { f0·f1 }
  QM_{1,2} = { f̄0·f1 }
  QM_{2,2} = ∅

• m = 2. The two product forms differ only in the literal for f0, so step 4 gives
  QM(1) = { f1 }

• m = 1.
  QM(0) = { f1 }
  QM_{0,1} = { f1 }
  QM_{1,1} = ∅

Thus f simplifies to f1.
Example. We simplify the expression for Cout of the full adder.

• m = 3.
  QM(3) = { A0·A1·C̄in, Ā0·A1·Cin, A0·Ā1·Cin, A0·A1·Cin }
  QM_{0,3} = { A0·A1·Cin }
  QM_{1,3} = { A0·A1·C̄in, Ā0·A1·Cin, A0·Ā1·Cin }
  QM_{2,3} = ∅
  QM_{3,3} = ∅

• m = 3. Pairing A0·A1·Cin with Ā0·A1·Cin gives
  QM(2) = { A0·A1·C̄in, A1·Cin, A0·Ā1·Cin }

• m = 3. Pairing A0·A1·Cin with A0·Ā1·Cin gives
  QM(2) = { A0·A1·C̄in, A1·Cin, A0·Cin }

• m = 3. Pairing A0·A1·Cin with A0·A1·C̄in gives
  QM(2) = { A0·A1, A1·Cin, A0·Cin }

• m = 2.
  QM(1) = { A0·A1, A1·Cin, A0·Cin }
  QM_{0,2} = QM(1)
  QM_{1,2} = ∅
  QM_{2,2} = ∅
  QM_{3,2} = ∅

• m = 1.
  QM(0) = { A0·A1, A1·Cin, A0·Cin }
  QM_{0,1} = ∅
  QM_{1,1} = ∅
  QM_{2,1} = ∅
  QM_{3,1} = ∅

Thus

Cout = A0·A1 + A1·Cin + A0·Cin

and we have reduced the expression to one consisting of only two types of operations and 5 operations in total. This is a large reduction compared to the original total of 11 operations. The example also illustrates that the process is long but simple enough to implement, making it a good application for a computing device.
2.7 Example Programs

2.7.1 Efficient Set Operations Using Boolean Algebra

A set can be represented efficiently by a bit string. Suppose the universe of elements is {u0, u1, ..., u_{n-1}}. A subset A of the universe is represented by the bit string

A := a0 a1 ... a_{n-1}

where a_i = 1 if u_i ∈ A and a_i = 0 otherwise. The union of two sets is then the bitwise OR of their bit strings, the intersection is the bitwise AND, and the complement is the bitwise negation. For example if

A = 11010100
B = 01101101

then

A ∪ B = 11111101
A ∩ B = 01000100
Ā     = 00101011.
The following C++ program bitset.cpp implements these concepts. The class BitSet implements all the bitwise operations introduced above. We could also use the bitset class which is part of the Standard Template Library, which includes all the methods needed to implement complement, intersection and union.
// bitset.cpp
#include <iostream>
#include <string>

class SetElementBase
{
   public:
   virtual void output(ostream&) = 0;
};

class BitSet
{
   protected:
   char *set;
   int len;
   SetElementBase **universe;
   static int byte(int);
   static char bit(int);
   public:
   BitSet(SetElementBase**,int,int*,int);
   BitSet(const BitSet&);
   BitSet &operator=(const BitSet&);
   BitSet operator+(const BitSet&) const;  // union
   BitSet operator*(const BitSet&) const;  // intersection
   BitSet operator-(void) const;           // complement
   void output(ostream&) const;
   ~BitSet();
};
void main(void)
{
   SetElement<int> s1(5);
   SetElement<string> s2(string("element"));
   SetElement<double> s3(3.1415927);
   SetElement<int> s4(8);
   SetElement<int> s5(16);
   SetElement<int> s6(3);
   SetElement<string> s7(string("string"));
   SetElement<double> s8(2.7182818);
   SetElement<int> s9(32);
   SetElement<int> s10(64);
   SetElementBase *universe[10] = {&s1,&s2,&s3,&s4,&s5,
                                   &s6,&s7,&s8,&s9,&s10};
The program quine.cpp simplifies the carry and sum bits from the full adder. In the main function we consider the expressions for Cout and S for the full adder. We use an array of three char to represent a product form: a 1 indicates that the literal in the product form is not a negated variable and a 0 indicates that the literal is a negated variable. The variable is identified by the index in the array, for example the program uses index 0 for A0, index 1 for A1 and index 2 for Cin. These arrays (representing product forms) are placed in an array representing the final SOP form.
// quine.cpp
#include <iostream>

struct QMelement
{
   int nvars,used;
   char *product;
   int *vars;
   QMelement *next;
};

// complementary, AddItem, DeleteItem and QuineMcKluskey are
// defined elsewhere in the program; the function header and
// loop opening below are reconstructed
void QuineRecursive(QMelement *sets[],int index)
{
   if(index == 0) return;
   int i,j;
   QMelement *item1 = sets[index], *item2;
   while(item1 != (QMelement*)NULL)
   {
      if(item1->next != (QMelement*)NULL)
      {
         item2 = item1->next;
         while(item2 != (QMelement*)NULL)
         {
            if(complementary(item1,item2))
            {
               char *product = new char[item1->nvars-1];
               int *vars = new int[item1->nvars-1];
               for(i=0,j=0;i<item1->nvars;i++)
                  if(item1->product[i] == item2->product[i])
                  {
                     product[j] = item1->product[i];
                     vars[j++] = i;
                  }
               AddItem(sets[index-1],product,item1->nvars-1,vars);
               delete[] product;
               delete[] vars;
               item1->used = item2->used = 1;
            }
            item2 = item2->next;
         }
      }
      item2 = item1;
      item1 = item1->next;
      if(item2->used) DeleteItem(sets[index],item2);
   }
   QuineRecursive(sets,index-1);
}
void main(void)
{
   // carry flag
   char c1[3]={1,1,0},c2[3]={0,1,1},c3[3]={1,0,1},c4[3]={1,1,1};
   // sum
   char s1[3]={0,1,0},s2[3]={1,0,0},s3[3]={0,0,1},s4[3]={1,1,1};
   char *Cout[4];
   char *S[4];
   char *names[3] = {"A0","A1","Cin"};
   Cout[0] = c1;
   Cout[1] = c2;
   Cout[2] = c3;
   Cout[3] = c4;
   S[0] = s1;
   S[1] = s2;
   S[2] = s3;
   S[3] = s4;
   cout << "Cout="; QuineMcKluskey(Cout,4,3,names);
   cout << endl << "S="; QuineMcKluskey(S,4,3,names);
   cout << endl;
}
Chapter 3

Number Representation

3.1 Binary, Decimal and Hexadecimal Numbers

For any integer r > 1, every positive integer n can be represented uniquely in the form

n = a_0·r^0 + a_1·r^1 + a_2·r^2 + ... + a_m·r^m

where 0 ≤ a_i ≤ r − 1 for 0 ≤ i ≤ m and a_m > 0 and r^0 = 1. This can be proved by induction on n. Suppose

n = a_0·r^0 + a_1·r^1 + ... + a_m·r^m

where 0 ≤ a_i ≤ r − 1 for 0 ≤ i ≤ m and a_m > 0, and let k be the least integer in {0, 1, ..., m} with a_k < r − 1, if such a k exists. Either k exists, which gives

n + 1 = (a_k + 1)·r^k + a_{k+1}·r^{k+1} + ... + a_m·r^m

(the digits a_0, ..., a_{k-1}, which all equal r − 1, become 0), or all a_i = r − 1, in which case n + 1 = r^{m+1}.
Example. The number 23 (in decimal notation) has the binary representation 10111, since 2^4 + 2^2 + 2 + 1 = 23. The decimal number 101 has the binary representation 1100101, i.e. 2^6 + 2^5 + 2^2 + 1 = 101.
A procedure for finding the binary representation of a number n is to find the highest power 2^m which is ≤ n, subtract 2^m from n, then find the highest power 2^j which is ≤ n − 2^m, and so on.
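A minimal C++ sketch of this subtract-the-highest-power procedure follows; it is our own illustration and not one of the book's listings.

// highpow.cpp
// binary representation by repeatedly subtracting the
// highest power of two not exceeding n
#include <iostream>
using namespace std;

int main()
{
   unsigned long n = 23;
   int started = 0;
   for(int m=31;m>=0;m--)
   {
      unsigned long p = 1UL << m;  // 2^m
      if(p <= n) { cout << 1; n -= p; started = 1; }
      else if(started) cout << 0;
   }
   cout << endl;  // output: 10111
   return 0;
}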
3. In today's CPUs the lengths of the storage elements (called registers) are generally multiples of 8 bits (typically 32 or 64 bits). The general purpose registers are 32 bits long or 64 bits long. Thus it is convenient to show contents as multiples and fractions of 16, i.e. hexadecimal. The storage sizes are 8 bits (a byte), 16 bits (a word), 32 bits (a doubleword), 64 bits (a quadword), and 80 bits (a tenbyte), all multiples and fractions of 16.
Thus, although we think in decimal and the computer thinks in binary, hexadecimal
is a number system that captures some of the important elements of both. In the
remainder of this section we discuss the binary, decimal, and hexadecimal number
systems and the methods for converting from one number system to another.
3.1.1 Conversion
In this section we describe the conversion from binary to hexadecimal, from hex-
adecimal to binary, binary to decimal, decimal to binary, decimal to hexadecimal,
and hexadecimal to decimal.
Binary to Hexadecimal. A binary number ...b8 b7 b6 b5 b4 b3 b2 b1 b0 has the value

... + 256·b8 + 128·b7 + 64·b6 + 32·b5 + 16·b4 + 8·b3 + 4·b2 + 2·b1 + b0

which can be written

(8·b3 + 4·b2 + 2·b1 + b0) + 16·(8·b7 + 4·b6 + 2·b5 + b4) + 256·(8·b11 + 4·b10 + 2·b9 + b8) + ...

Each of the sums in parentheses is a number between 0 (if all the b values are 0) and 15 (if all the b values are 1). These are exactly the digits in the hexadecimal number system. Thus to convert from binary to hexadecimal, we must gather up groups of 4 binary digits.
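The grouping can be expressed directly in C++. The following sketch is our own illustration (not a book listing); it assumes the input length is a multiple of 4.

// bin2hex.cpp
// convert binary to hexadecimal by grouping 4 bits
#include <iostream>
using namespace std;

int main()
{
   const char* bin = "10110111";  // binary input
   const char* digits = "0123456789ABCDEF";
   for(int i=0;bin[i] != '\0';i += 4)
   {
      int v = 0;
      for(int j=0;j<4;j++) v = 2*v + (bin[i+j]-'0');  // value of the group
      cout << digits[v];
   }
   cout << endl;  // output: B7
   return 0;
}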
In assembly language the letter b appended to a number indicates that the number is in binary representation and the letter h indicates that the number is in hexadecimal representation. The letter d indicates a decimal number. The default is decimal.
In C, C++, and Java decimal, octal and hexadecimal numbers are available. Hexadecimal numbers are indicated by 0x... in C, C++ and Java. For example the decimal number 91 would be expressed as 0x5B, since

5·16^1 + 11·16^0 = 91.
Binary to Decimal. Write the binary sequence in its place-value summation form
and then evaluate it.
Example.

10101010b = 2^7 + 2^5 + 2^3 + 2^1 = 128 + 32 + 8 + 2 = 170d.
Decimal to Binary. Note that

Σ_{i=k}^{m} a_i 2^{i-k} = 2 Σ_{i=k+1}^{m} a_i 2^{i-k-1} + a_k.

The remainder after integer division by 2 gives a_k, and we continue until the division gives 0. The following example illustrates this.
This method works because we want to find the coefficients b0, b1, b2, ... (which are 0 or 1) of 2^0, 2^1, 2^2, and so on. Thus, in the preceding example,

345 = b10·2^10 + b9·2^9 + ... + b1·2 + b0.

Dividing by 2,

345/2 = b10·2^9 + b9·2^8 + ... + b1 + (b0/2).

Thus b0 is the remainder on division by 2 and

172 = b10·2^9 + b9·2^8 + ... + b1

is the quotient.
The following C++ program finds the binary representation of a non-negative integer. The operator % is used to calculate the remainder after integer division.
// remain.cpp
#include <iostream.h>

void main()
{
   int i;
   unsigned long N = 345;
   unsigned long array[32];
   for(i=0;i<32;i++) { array[i] = 0; }
   for(i=0;i<32;i++) { array[i] = N%2; N = N/2; }
   for(i=31;i>=0;i--) { cout << array[i]; }
}
Decimal to Hexadecimal. Repeated division by 16 gives the hexadecimal digits as remainders. This works for the same reason that the method for decimal-to-binary conversion works. That is, division by 16 produces as a remainder the coefficient (h0) of 16^0, and as a quotient the decimal number minus the quantity (h0·16^0), divided by 16.

Example. The following program converts the decimal number 15947 to hexadecimal.
// remain2.cpp
#include <iostream.h>

void main()
{
   int i;
   unsigned long N = 15947;
   unsigned char array[8];
   for(i=0;i<8;i++)
      array[i] = 0;
   for(i=0;i<8;i++)
   {
      array[i] = N%16;
      if(array[i] > 9)
         array[i] += 'A'-10;
      else
         array[i] += '0';
      N = N/16;
   }
   for(i=7;i>=0;i--)
      cout << array[i];
}
3.1.2 Arithmetic
The rules for addition of binary numbers are:
0 + 0 = 0
0 + 1 = 1
1 + 0 = 1
1 + 1 = (1)0

where (1) denotes a carry of 1. Note that 10b is the binary equivalent of 2 decimal.
Thus the sum 1 + 1 requires two bits to represent it, namely 10, the binary form
of the decimal number 2. This can be expressed as follows: one plus one yields a
sum bit s = 0 and a carry bit c = 1. If we ignore the carry bit and restrict the sum
to the single bit s, then we obtain 1 + 1 = O. This is a very useful special form of
addition known as modulo-2 addition.
Doing arithmetic in the binary and hexadecimal number systems is best shown by
examples and best learned by practice.
Example. Decimal Arithmetic

   45
 + 57
 ----
  102

Remember that 7 + 5 is 2 with a 1 carry in decimal. 5 + 4 + the carried 1 is 0 with a 1 carry.
Example. Binary Arithmetic

   1011
 + 1001
 ------
  10100
Example. Hexadecimal Arithmetic

   1A        FF
 +  5      +  3
 ----      ----
   1F       102
Example. Binary Multiplication

    1001   multiplicand
  x  110   multiplier
  ------
    0000
   1001
  1001
  ------
  110110

High speed multiplication techniques use addition and subtraction or uniform multiple shifts. Binary divisions can be performed by a series of subtractions and shifts.
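The basic shift-and-add scheme can be sketched in C++ as follows; this is our own illustration (the function name multiply is an assumption, not from the book's listings).

// shiftadd.cpp
// unsigned multiplication as a series of shifts and additions
#include <iostream>
using namespace std;

unsigned long multiply(unsigned long a,unsigned long b)
{
   unsigned long result = 0;
   while(b != 0)
   {
      if(b & 1UL) result += a;  // add shifted multiplicand if bit is 1
      a <<= 1;                  // shift multiplicand left
      b >>= 1;                  // next multiplier bit
   }
   return result;
}

int main()
{
   cout << multiply(9,6) << endl;  // 54, i.e. 1001b x 110b = 110110b
   return 0;
}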
Example. Show 65712d as a binary (a) byte, (b) word, (c) double word. We have

65712d = 10000000010110000b

which requires 17 bits. Thus

(a) 10110000 (truncated to the lowest 8 bits)
(b) 0000000010110000 (truncated to the lowest 16 bits)
(c) 00000000000000010000000010110000
The largest integer number which fits into a register with 32 bits is

2^32 − 1 = Σ_{k=0}^{31} 2^k = 4294967295.

The largest integer number which fits into a register with 64 bits is

2^64 − 1 = Σ_{k=0}^{63} 2^k = 18446744073709551615.
3.1.3 Signed Integers

Storing negative integers presents a more difficult problem since the negative sign has to be represented (by a 0 or a 1) or some indication has to be made (in binary!) that the number is negative. There have been many interesting and ingenious ways invented to represent negative numbers in binary. We discuss three of these here:

1. Sign and magnitude
2. One's complement
3. Two's complement
The sign and magnitude representation is the simplest method to implement negative integers. Knuth [105] used sign and magnitude in his mythical MIX computer. In sign and magnitude representation of signed numbers, the leftmost (most significant) bit represents the sign: 0 for positive and 1 for negative.
Example. The positive integer number 31 stored in a double word (32 bits) using
sign and magnitude representations is
00000000000000000000000000011111b
Thus the negative integer -31 becomes
10000000000000000000000000011111b
There are two drawbacks to sign and magnitude representation of signed numbers:
1. There are two representations of 0:
+0 = 00000000000000000000000000000000b

and

-0 = 10000000000000000000000000000000b.
Thus the CPU has to make two checks every time it tests for O. Checks for 0
are done frequently, and it is inefficient to make two such checks.
2. Obviously,
a + (-b)
is not the same as
a- b.
What this means is that the logic designer must build separate circuits for
subtracting; the adding circuit used for a + b is not sufficient for calculating
a-b.
Example. The following shows that 52 − 31 and 52 + (−31) are not the same in sign and magnitude representation. We have 52 − 31 = 21, but adding the bit patterns for 52 and −31 directly gives

  00000000000000000000000000110100b   (+52)
+ 10000000000000000000000000011111b   (−31)
= 10000000000000000000000001010011b   (−83)

Thus this shows that the sign and magnitude representation is not useful for implementations on CPUs. Furthermore 31 − 31 gives

  00000000000000000000000000011111b   (+31)
+ 10000000000000000000000000011111b   (−31)
= 10000000000000000000000000111110b   (−62)

instead of 0.
One's Complement
One's complement method of storing signed integers was used in computers more in
the past than it is currently. Here we assume again that 32 bits are given (double
word). In one's complement, the leftmost bit is still 0 if the integer is positive. For
example,
00000000000000000000000000011111b
still represents +31 in binary. To represent the negative of this, however, we replace
all O's with l's and alII's with O's. Thus
11111111111111111111111111100000b
represents -31. Note that the leftmost bit is again 1. Notice that in assembly
language one starts counting from zero from the rightmost bit.
For example, −1 is represented by

11111111111111111111111111111110b

and +1 by

00000000000000000000000000000001b.
Thus the second drawback to sign and magnitude representation has been elimi-
nated. This means a - b is the same as a + (-b). Thus the circuit designer need only
include an adder; it can also be used for subtraction by replacing all subtractions
a - b with a + (-b).
The following example shows, however, that this adder must do a little more than just add.

Example. We show that 52 − 31 and 52 + (−31) are the same in one's complement representation. For 52 + (−31) we have

  00000000000000000000000000110100b   (+52)
+ 11111111111111111111111111100000b   (−31)
= (1) 00000000000000000000000000010100b

The overflow bit is carried around and added (end-around carry):

  00000000000000000000000000010100b
+ 00000000000000000000000000000001b
= 00000000000000000000000000010101b

which is 21.
The adder for one's complement arithmetic is more complicated; it must carry
around any overflow bit in order to work correctly for subtraction. The first draw-
back is still with us, however. In one's complement, there are still two representa-
tions of 0
00000000 00000000 00000000 00000000b   positive 0

and

11111111 11111111 11111111 11111111b   negative 0

when viewed as a double word.
One's complement is implemented in C, C++ and Java with the ~ operator. The following program shows an application. The header of the function binary and the array declaration are reconstructed.

// complement.cpp
#include <iostream.h>

char* binary(unsigned long N)
{
   char *array = new char[36];
   for(int i=34,j=27;i>=0;i--)
      if((i == j) && (i != 0)) { array[i] = ' '; j -= 9; }
      else { array[i] = N%2 + '0'; N = N/2; }
   array[35] = '\0';
   return array;
}

void main()
{
   int a = 17;   // binary 000000000 00000000 00000000 0010001
   cout << "a = " << a << endl << binary(a) << endl;
   int b = ~a;   // binary 111111111 11111111 11111111 1101110
   cout << "~a = " << b << endl << binary(b) << endl;
}
Two's Complement

The two's complement method of storing signed integers is used in most present-day CPUs, including the 386, 486, Pentium and the DEC Alpha. The two's complement is formed by

(1) taking the one's complement (inverting all bits) and
(2) adding 1.

Example. Using two's complement and a double word (32 bits), the decimal number 31 is stored as

00000000000000000000000000011111b.

Inverting all bits and adding 1, the number −31 is stored as

11111111111111111111111111100001b

in two's complement. If we have registers with 32 bits then we can store the integer numbers (n = 32)

−2147483648 to 2147483647.

Although taking the two's complement of a number is more difficult than taking its one's complement, addition of two's complement numbers is simpler than addition in one's complement or in signed-magnitude representations.
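The construction can be checked directly in C++, since ~x gives the one's complement and adding 1 gives the two's complement. The following small sketch is our own illustration.

// twoscomp.cpp
// two's complement = one's complement + 1
#include <iostream>
using namespace std;

int main()
{
   long x = 31;
   long ones = ~x;         // one's complement: invert all bits
   long twos = ones + 1;   // add 1
   cout << twos << endl;          // -31
   cout << (twos == -x) << endl;  // 1
   return 0;
}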
3.1.4 Overflow

If we do arithmetic operations with 32 bit registers overflow will occur in the following cases:

1. if we go beyond the range 0 to 4294967295 for the data type unsigned long in C and C++. This means we add numbers so that the sum is larger than 4294967295. Also negative numbers are out of range.

2. if we go beyond the range −2147483648 to 2147483647 for the data type long in C and C++.

Example. Consider

4294967295 + 1 = 4294967296.

The number on the right-hand side is out of the range for a 32 bit register for the C and C++ data type unsigned long. Since

4294967295 = 11111111111111111111111111111111b

for unsigned long, the addition of 1 yields

00000000000000000000000000000000b

with one overflow bit. Thus the output is 0.

Example. Consider

−2147483648 + (−3) = −2147483651.

The number on the right-hand side is out of range for a 32 bit register for long. Since

−2147483648 = 10000000000000000000000000000000b

and

−3 = 11111111111111111111111111111101b

we obtain

01111111111111111111111111111101b.

Thus the output is 2147483645.
// overflow.cpp
#include <iostream.h>

int main()
{
   unsigned long a = 4294967295;
   unsigned long b = 1;
   unsigned long r1;
   r1 = a + b;
   cout << "r1 = " << r1 << endl;   // 0
   unsigned long e = 0;
   unsigned long f = -1;
   unsigned long r3 = e + f;
   cout << "r3 = " << r3 << endl;   // 4294967295
   long g = -2147483648;
   long h = -3;
   long r4 = g + h;
   cout << "r4 = " << r4 << endl;   // 2147483645
   return 0;
}
The range of the data type unsigned long is 0 to 4294967295. The binary representation of 4294967295 is

11111111111111111111111111111111b.

This is the largest number which fits into 32 bits. Thus if we add 1 to this binary number under the assumption that 32 bits are given we find

00000000000000000000000000000000b

with 1 overflow bit. The output of the C++ program ignores the overflow bit and displays the output 0. Similarly we find the outputs

r3 = 4294967295

and

r4 = 2147483645

of the program.
Java has the signed data type long. The size is 64 bits. Thus the range is
-9223372036854775808 to 9223372036854775807.
3.1.5 Binary-Coded Decimal Form

In binary-coded decimal (BCD) form each digit N_i of a decimal number N_10 is replaced by its 4-bit binary equivalent (B_i)_2 = (N_i)_10. Thus, a 9 in N_10 is mapped into 1001, an 8 into 1000, a 7 into 0111, and so on. For example, if N_10 = 7109_10, then the decimal-to-BCD conversion process takes the form

7_10 → 0111, 1_10 → 0001, 0_10 → 0000, 9_10 → 1001

leading to

N_10 = 0111000100001001_10

where the underlined subscript 10 is our notation for binary-coded decimal. This conversion process is, in fact, the same as that used for changing a hexadecimal number to binary. In this case, there are 10 digits instead of 16, so only 10 of the 16 possible 4-bit binary numbers are needed. Also, each 4-bit group must be assigned weight 10 rather than 16. For example, we get

7109_10 = ((7·10 + 1)·10 + 0)·10 + 9.

Conversion from BCD to ordinary decimal form is achieved by replacing 4-bit groups with the equivalent decimal digit. For instance,

0010 1000 0100 1001 0000 0101_10

implies that N′_10 = 284905_10. Conversion between binary (base 2) and BCD requires the decimal-binary conversion procedure, in addition to the decimal digit-encoding procedure discussed above.

Not all the possible binary patterns correspond to BCD numbers. The six 4-bit patterns

1010, 1011, 1100, 1101, 1110, 1111

do not represent decimal digits.
// bcd.cpp
#include <iostream.h>

void main()
{
   int i;
   unsigned long N = 15947;
   unsigned char array[4];
   unsigned char mask[2] = {0x0F,0xF0};
   unsigned char shift[2] = {0,4};
   for(i=0;i<4;i++)
      array[i] = 0;
   for(i=0;i<8;i++)
   { array[i/2] |= (N%10) << shift[i%2]; N = N/10; }
   for(i=7;i>=0;i--)
   { cout << char((((array[i/2]&mask[i%2]))>>shift[i%2])+'0'); }
}
3.2 Floating Point Representation

3.2.1 Introduction

There are actually three formats: one that requires 32 bits, one that is used for 64 bits, and one for 80 bits. We describe the 32-bit format, called the short real format, here.
The table lists seven numeric data types showing the data format for each type. The
table also shows the approximate range of normalized values that can be represented
with each type. Denormal values are also supported in each of the real types, as
required by IEEE Std 854.
[Table of the numeric data types: bits, significant decimal digits, and approximate normalized range.]
All operands are stored in memory with the least significant digits starting at the
initial (lowest) memory address. Numeric instructions access and store memory
operands using only this initial address.
3.2.2 Representation
The first step to understanding how a binary fraction is stored using short real
format is to normalize it. This is similar to putting a decimal point number into the
familiar scientific notation in which we have a sign, an exponent, and a mantissa.
To normalize a binary fraction, we write it so that the first 1 is just to the left of
the binary point.
0.000111101 = 0·(1/2) + 0·(1/2²) + 0·(1/2³) + 1·(1/2⁴) + 1·(1/2⁵) + 1·(1/2⁶) + 1·(1/2⁷) + 0·(1/2⁸) + 1·(1/2⁹)

The normalized form is

1.11101 · 2^{−4}.
The next step is to represent the important parts of the normalized fraction in 32 bits. The important parts are those that will allow us to recover the original number (and allow the computer to perform operations on it). These parts are the

1. Sign
2. Exponent
3. Mantissa

In the IEEE short real format, the sign is stored in the leftmost bit, the exponent is stored in the next 8 bits, after some alteration, and the mantissa is stored in the rightmost 23 bits, again after a minor adjustment.
1. To store the sign. Use 0 for a positive number and 1 for a negative number.

2. To store the exponent. Add 127 (1111111b) to it. The number 127 is called a bias, and the resulting exponent is called a biased exponent. Biased exponents may range from 1 to 254, so that exponents range from −126 to +127.

3. To store the mantissa. Remove the leftmost 1 and store the rest of the fraction left-adjusted. This technique of not storing the first 1 before the binary point is a common way to store mantissas. It is called hidden bit storage. Computer circuitry knows that the 1 is really part of the mantissa.
Example. Find 0.0390625 (base 10) as it would be stored in short real format.

Step 1. Convert to binary:

.0390625_10 = .0000101_2

Step 2. Normalize the binary fraction:

.0000101_2 = 1.01_2 · 2^{−5}

Step 3. Bias the exponent: −5 + 127 = 122 = 1111010_2, stored as the 8 bits 01111010.

Step 4. Remove the leading 1 of the mantissa and store the fraction left-adjusted in 23 bits:

Sign  Exponent  Fraction
0     01111010  01000000000000000000000

The following C++ program implements this algorithm. The only difference is that the actual conversion to binary is delayed until after the normalization procedure. We use the above test example. For the output we find

0.0390625 (base 10) = 0 01111010 01000000000000000000000 (floating point base 2)
// float2bin.cpp
#include <iostream.h>
#include <math.h>

// normalize f to 1 <= |f| < 2; e gives the exponent (reconstructed)
void normalize(float& f,int& e)
{
   e = 0;
   while(fabs(f) >= 2.0) { f /= 2.0; e++; }
   while(fabs(f) < 1.0)  { f *= 2.0; e--; }
}

// store the short real representation of f in b
// (reconstructed in part)
void float2bin(float f,char b[35])
{
   int i, e1;
   b[0] = (f < 0.0) ? '1' : '0';  // sign bit
   normalize(f,e1);
   f = fabs(f);
   // remove the leftmost 1 bit
   f -= 1;
   int e = e1 + 127;              // biased exponent in b[2..9]
   for(i=9;i>=2;i--) { b[i] = e%2 + '0'; e = e/2; }
   for(i=11;i<=33;i++)            // 23 mantissa bits in b[11..33]
   {
      f *= 2.0;
      if(f >= 1.0) { b[i] = '1'; f -= 1.0; }
      else b[i] = '0';
   }
   b[1]=b[10]=' ';
   b[34]='\0';
}
void main(void)
{
   char b[35];
   float f = 0.0390625;
   float2bin(f,b);
   cout << f << " (base 10) = "
        << b << " (floating point base 2)" << endl;
}
Example. Which decimal number does the following bit pattern represent, interpreted as short real format?

10111110111101000000000000000000

Grouped into sign, exponent and fraction the pattern reads

1 01111101 11101000000000000000000

The sign bit is 1, so the number is negative. The biased exponent is 01111101b = 125, so the exponent is 125 − 127 = −2. Restoring the hidden 1 bit to the fraction

.11101000000000000000000

results in

1.11101000000000000000000

which is

1.11101_2.

Multiplying by 2^{−2} (provided by the exponent) yields

.0111101_2 = 1/4 + 1/8 + 1/16 + 1/32 + 1/128 = 0.4765625_10

so the bit pattern represents −0.4765625_10.
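The interpretation can be checked in C++ by copying the 32-bit pattern into a float. The following sketch is our own illustration; 0xBEF40000 is the pattern above written in hexadecimal, and the code assumes unsigned int is 32 bits and that float uses the IEEE short real format.

// bits2float.cpp
// reinterpret a 32-bit short real pattern as a float
#include <iostream>
#include <cstring>
using namespace std;

int main()
{
   unsigned int bits = 0xBEF40000u;  // 1 01111101 11101000000000000000000
   float f;
   memcpy(&f,&bits,sizeof(float));   // copy the bit pattern
   cout << f << endl;                // -0.4765625
   return 0;
}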
Chapter 4
Logic Gates
4.1 Introduction
A digital electronic system uses a building-block approach. Many small operational
units are interconnected to make up the overall system. The system's most basic
unit is the gate circuit. These circuits have one output and one or more inputs. The
most basic description of operation is given by the function table, which lists all
possible combinations of inputs along with the resulting output in terms of voltage,
high and low. Table 4.1(a) shows a function table for a 2-input circuit. This table
indicates that if both inputs are low or both are high, the output will be low. If
one input is high and the other is low, a high level will result on the output line.
As we deal with logic design, it is appropriate to use 1s and 0s rather than voltage levels. Thus, we must choose a positive (H = 1, L = 0) or negative (H = 0, L = 1) logic scheme. Once this choice is made, we use the function table to generate a truth table. The truth table describes inputs and outputs in terms of 1s and 0s rather than voltage levels. Function tables are used by manufacturers of logic gates to specify gate operation. The manufacturer conventionally defines gates in terms of positive logic.
Table 4.1: Function Table and Truth Tables for a Logic Circuit
4.2 Gates
4.2.1 AND Gate
The AND gate has one output and two or more inputs. The output will equal 0 for all combinations of input values except when all inputs equal 1. When each input is 1, the output will also equal 1. Figure 4.1 shows the AND gate. Table 4.2 shows the function and positive logic truth tables. The AND gate will function as an OR gate for negative logic, but the gate is named for its positive logic function.
A1  A2 | X
0   0  | 0
0   1  | 0
1   0  | 0
1   1  | 1
The AND operation can be interpreted as the multiplication of a set of 1-bit numbers; a 0 among the input variables makes the result (product) 0; the product is 1 if and only if all the inputs are 1. For this reason the AND function is written as a product expression

X_AND := A1 · A2

or

X_AND := A1 · ... · An

if we have n inputs. Alternative AND symbols in common use are ∧ and &. The latter is the AND designator in the standard box symbol for an AND gate. As with multiplication the symbol · is sometimes omitted from AND expressions, so that A1 · A2 reduces to A1A2.
4.2.2 OR Gate
The OR gate has one output and two or more inputs. If all inputs are equal to 0,
the output will be equal to O. The presence of a 1 bit leads to an output of 1. Table
4.3 describes this operation in terms of a truth table. The standard symbol for a
2-input OR gate is shown in Figure 4.2.
A1  A2 | X
0   0  | 0
0   1  | 1
1   0  | 1
1   1  | 1
The OR operation takes its name from the fact that the output X is 1 if and only if A1 is 1 or A2 is 1. In other words, the output X of an OR gate is 1 if and only if the number of 1s applied as input is one or greater. The OR function is written as

X_OR := A1 + A2 + ... + An.

Thus, + denotes OR in this context, and is read as "or" rather than plus. An alternative OR symbol is ∨.

In CMOS the 4071 is a quad two-input OR gate and the CMOS 4072 is a dual four-input OR gate.
4.2.3 XOR Gate

The XOR (exclusive OR) gate has the truth table

A1  A2 | X
0   0  | 0
0   1  | 1
1   0  | 1
1   1  | 0
For this reason, XOR is also called the odd-parity function, and is the basis of error-handling circuits. This versatile function can also be interpreted as (numerical) summation modulo 2. Thus, another definition of XOR equivalent to the definition given above is

X_XOR := (A1 + A2 + ... + An) mod 2.

The XOR gate is a special gate and is widely employed in digital circuits that perform mathematical functions. The symbol for the XOR gate is shown in the next figure. The use of the generic odd number 2k + 1 as the function designator in the standard box symbol reflects the fact that the output is 1 if and only if 2k + 1 inputs are 1, for k = 0, 1, 2, .... In logic expressions, the XOR operator is ⊕, which is read as exclusive OR, ring-sum, or sum modulo 2. Thus, we can write

X_XOR = A1 ⊕ A2 ⊕ ... ⊕ An.

The CMOS 4030 is a quad two-input exclusive OR gate.
4.2.4 NOT Gate (Inverter)

The NOT gate (inverter) has a single input and a single output. The output is the
complement (inverse) of the input.

A  X
0  1
1  0
In CMOS the 4069 is a hex inverter. Each of the six inverters is a single stage.
The NOT gate can be combined with the AND, OR and XOR gates to provide the
NAND, NOR and XNOR gates.
4.2.5 NAND Gate

The NAND gate produces the complement of the AND function,

X = ¬(A1 · A2 · ... · An)

which indicates that the inputs A1, A2, ..., An are first ANDed and then the result
is inverted. Thus a NAND gate always produces an output that is the inverse (op-
posite) of an AND gate. The gate symbol is therefore formed by appending the
graphic inversion symbol (a small circle) to the corresponding AND symbol.
A1 A2  X
 0  0  1
 0  1  1
 1  0  1
 1  1  0
Since both inverters and AND gates can be constructed from NAND gates, the
NAND gate is itself functionally complete. The AND gate and
inverter form a functionally complete set. This means that any logic function realized
by logic gates can be realized with the AND and NOT functions. For example the
XOR gate can be represented by

A1 ⊕ A2 = NAND(NAND(A1, NAND(A1, A2)), NAND(A2, NAND(A1, A2)))

using four NAND gates.
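The four-NAND construction can be checked exhaustively with a short program (a minimal sketch; the helper NAND and the comparison against the built-in ^ operator are illustrative):

// nand_xor.cpp -- sketch: building XOR from NAND gates only
#include <iostream>
using namespace std;

int NAND(int a, int b) { return !(a && b); }

int main(void)
{
   for(int a = 0; a <= 1; a++)
      for(int b = 0; b <= 1; b++)
      {
         int n = NAND(a, b);
         int x = NAND(NAND(a, n), NAND(b, n));   // four NAND gates
         cout << a << " XOR " << b << " = " << x
              << "  (expected " << (a ^ b) << ")" << endl;
      }
   return 0;
}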
4.2.6 NOR Gate

The NOR gate produces the complement of the OR function,

X = ¬(A1 + A2 + ... + An)

which indicates that A1, A2, ..., An are first ORed and then the result is inverted.
A NOR gate always gives an output that is the inverse of the OR gate. The gate is
characterized by the tables and symbols of Table 4.7 and Figure 4.7.
A1 A2  X
 0  0  1
 0  1  0
 1  0  0
 1  1  0
All other gates can be constructed from NOR gates. For example, the XOR gate
can be found as

A1 ⊕ A2 = ¬(¬(A1 + A2) + ¬(¬A1 + ¬A2))

where each complement and each sum is realized by a NOR gate.

4.2.7 XNOR Gate

The XNOR gate is the inverse of the XOR gate; its output is 1 if and only if an
even number of inputs are 1.

A1 A2  X
 0  0  1
 0  1  0
 1  0  0
 1  1  1
4.3 Buffer
The buffer is an IC device that provides no change in logic at the output, but does
provide a high input load impedance, and therefore good output drive capability. It
works the same way as an emitter-follower circuit. The output of a MOS micropro-
cessor, for example, has very poor drive capability when driving a TTL device. By
inserting a buffer between the output of the MOS microprocessor and the input of
the TTL device, we can solve the problem. The buffer provides an input load the
processor can handle and an output drive that is TTL-compatible. The truth table
and the symbol for a buffer are shown in Table 4.9 and Figure 4.10.
A  X
0  0
1  1
As an example consider the buffering of MPU buses. The MPU, RAM and ROM
are chips that are generally manufactured using CMOS technology. The decoders,
gates, inverters, tri-state buffers, and output register are all TTL devices, usually
LS-TTL to minimize power requirements and loading.
(a) Tri-state inverter with enable line:

Inputs     Output
A  E       X
0  0       Z
0  1       1
1  0       Z
1  1       0

(b) Tri-state buffer with disable line:

Inputs     Output
A  E       X
0  0       0
0  1       Z
1  0       1
1  1       Z
Figure 4.11: (a) A tri-state inverter with an enable line, (b) a tri-state buffer with
a disable line
Tri-state buffers are often used in applications where several logic signals are to be
connected to a common line called a bus. Many types of logic circuits are currently
available with tri-state outputs. Other tri-state circuits include flip-flops, registers,
memories, and almost all microprocessors and microprocessor interface chips. In
CMOS the 40097 is a hex non-inverting buffer with 3-state outputs. The 3-state
outputs are controlled by two enable inputs.
4.5 Feedback and Gates
In Figure 4.12 there is a feedback loop from the output to the input of the NAND
gate, implying the Boolean equation

X(t) = ¬(A(t) · X(t)).

For A(t) = 1 this requires X(t) = ¬X(t): if X(t) = 1, the equation gives X(t) = 0.
Similarly, for X(t) = 0 we have a logically inconsistent situation, since X(t) cannot
be 0 and 1 at the same time.
The inconsistency present in this example disappears if the NAND gate has a
nonzero propagation delay t_pd, which also makes a better model for the behaviour
of a physical gate. Our equation changes to

X(t + t_pd) = ¬(A(t) · X(t)).

Hence, if A(t) changes from 0 to 1 at some time t, this change will cause X(t) to
change from 1 to 0 at some time t + t_pd. Owing to our equation, this second change
will change X(t) back from 0 to 1 at t + 2t_pd, and so on. Hence, the value of X(t) must
change every t_pd time units. This type of regular and spontaneous changing, called
oscillation, is an extreme form of unstable behaviour. However it is not logically
inconsistent. This type of behaviour plays an important role in generating the clock
signal that controls synchronous circuits. Spontaneous oscillation of the above kind
involves narrow pulses of width tpd that tend to be filtered out by the gate through
which they pass. Consequently, such an oscillation usually dies out quickly.
Chapter 5
Combinational Circuits
5.1 Introduction
A combinational circuit consists of gates representing Boolean connectives; it is free
of feedback loops. A combinational circuit has no state; its output depends solely
on the momentary input values. Examples are the full adder, comparator, decoder
and multiplexer. In reality, however, signal changes propagate through a sequence
of gates with a finite speed. This is due to the capacitive loads of the amplifying
transistors. Hence circuits have a certain propagation delay.
5.2 Decoder
In digital computers, binary codes are used to represent many different types of
information, such as instructions, numerical data, memory addresses, and control
commands. A code group that contains N bits can have 2^N different combinations,
each of which represents a different piece of information. A logic circuit is required
which can take the N-bit code as logic inputs and then generate an appropriate
output signal to identify which of the 2^N different combinations is present. Such a
circuit is called a decoder.
Thus a 1-out-of-n decoder is a circuit with n outputs and N = log2(n) = ld(n) inputs.
The outputs X_j are numbered from 0 to n - 1. An output goes to 1 when the input
number A is identical to the number j of the relevant output. Figure 5.1 shows the
truth table for a 1-out-of-4 decoder. The variables Ao and Al represent the binary
code of the decimal number m. The sum of the products (disjunctive normal form)
of the recoding functions can be taken directly from the truth table. The circuit is
also shown using AND and NOT gates. The functions are

X0 = ¬A1 · ¬A0,  X1 = ¬A1 · A0,  X2 = A1 · ¬A0,  X3 = A1 · A0.

Inputs       Outputs
m  A1 A0     X0 X1 X2 X3
0   0  0      1  0  0  0
1   0  1      0  1  0  0
2   1  0      0  0  1  0
3   1  1      0  0  0  1
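The behaviour of the 1-out-of-4 decoder can be modelled directly from the functions above (a behavioural sketch only, not the gate-level circuit):

// decoder.cpp -- sketch: 1-out-of-4 decoder, X_j = 1 when input equals j
#include <iostream>
using namespace std;

void decode(int a1, int a0, int x[4])
{
   x[0] = (!a1) & (!a0);   // X0 = NOT A1 AND NOT A0
   x[1] = (!a1) &   a0 ;   // X1 = NOT A1 AND A0
   x[2] =   a1  & (!a0);   // X2 = A1 AND NOT A0
   x[3] =   a1  &   a0 ;   // X3 = A1 AND A0
}

int main(void)
{
   int x[4];
   for(int m = 0; m < 4; m++)
   {
      decode((m >> 1) & 1, m & 1, x);
      cout << "m = " << m << " : ";
      for(int j = 0; j < 4; j++) cout << x[j];
      cout << endl;
   }
   return 0;
}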
5.3 Encoder
A decoder takes an input code and activates the one corresponding output. An
encoder performs the opposite operation; it generates a binary code corresponding
to which input has been activated. A commonly used IC encoder is represented in
Figure 5.2. It has eight active LOW inputs, which are kept normally high. When
one of the inputs is driven to 0, the binary output code is generated corresponding
to that input. For example, when input I3 = 0, the outputs will be CBA = 011,
which is the binary equivalent of decimal 3. When I6 = 0, the outputs will be
CBA = 110. For some encoders, if more than one input is made low the output
would be garbage. For a priority encoder, the outputs would be the binary code for
the highest-numbered input that is activated. For example, assume that the encoder
of the Figure is a priority encoder and that inputs I4 and I7 are simultaneously made
low. The output code will be CBA = 111 corresponding to I7. No matter how many
inputs are activated, the code for the highest one will appear at the output.
[Figure 5.2: An 8-line-to-3-line encoder with inputs I0 to I7 and outputs C, B, A.]

A = I1 + I3 + I5 + I7
B = I2 + I3 + I6 + I7
C = I4 + I5 + I6 + I7
V = I0 + I1 + I2 + I3 + I4 + I5 + I6 + I7
The output V is used to indicate when an input is 1 for the encoder; it differentiates
between the 0 input I0 and when no inputs are 1. The encoder is not a priority
encoder; it performs a bitwise OR on all the inputs which are set to 1.
In CMOS the 4532 is an 8-input priority encoder with eight active HIGH priority
inputs (I0 to I7), three active HIGH outputs (O0 to O2), an active HIGH enable
input (Ein), an active HIGH enable output (Eout) and an active HIGH group select
output (GS). Data is accepted on inputs I0 to I7. The binary code corresponding
to the highest priority input (I0 to I7) which is HIGH is generated on O0 to O2 if
Ein is HIGH. Input I7 is assigned the highest priority. GS is HIGH when one or
more priority inputs and Ein are HIGH. Eout is HIGH when I0 to I7 are LOW and
Ein is HIGH. Ein, when LOW, forces all outputs (O0 to O2, GS, Eout) LOW. The
circuit is given below.
[Figure: gate-level circuit of the 4532 priority encoder.]

O2 = Ein · (I4 + I5 + I6 + I7)
O1 = Ein · (I2 · ¬I4 · ¬I5 + I3 · ¬I4 · ¬I5 + I6 + I7)
O0 = Ein · (I1 · ¬I2 · ¬I4 · ¬I6 + I3 · ¬I4 · ¬I6 + I5 · ¬I6 + I7)
Eout = Ein · (¬I0 · ¬I1 · ¬I2 · ¬I3 · ¬I4 · ¬I5 · ¬I6 · ¬I7)
GS = Ein · (I0 + I1 + I2 + I3 + I4 + I5 + I6 + I7).
Inputs                              Outputs
Ein I7 I6 I5 I4 I3 I2 I1 I0        GS O2 O1 O0 Eout
L   X  X  X  X  X  X  X  X         L  L  L  L  L
H   L  L  L  L  L  L  L  L         L  L  L  L  H
H   H  X  X  X  X  X  X  X         H  H  H  H  L
H   L  H  X  X  X  X  X  X         H  H  H  L  L
H   L  L  H  X  X  X  X  X         H  H  L  H  L
H   L  L  L  H  X  X  X  X         H  H  L  L  L
H   L  L  L  L  H  X  X  X         H  L  H  H  L
H   L  L  L  L  L  H  X  X         H  L  H  L  L
H   L  L  L  L  L  L  H  X         H  L  L  H  L
H   L  L  L  L  L  L  L  H         H  L  L  L  L
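The priority behaviour summarized in the table can be modelled in a few lines (a behavioural sketch of a 4532-style encoder, not its gate-level circuit; the names encode and gs are illustrative):

// priority_encoder.cpp -- sketch: 8-to-3 priority encoder behaviour
#include <iostream>
using namespace std;

// returns the binary code of the highest active input;
// gs is set when at least one input is active (group select)
int encode(const int I[8], int& gs)
{
   gs = 0;
   for(int j = 7; j >= 0; j--)      // the highest input has priority
      if(I[j]) { gs = 1; return j; }
   return 0;                        // no input active
}

int main(void)
{
   int I[8] = { 0, 0, 0, 1, 0, 0, 1, 0 };   // inputs 3 and 6 active
   int gs;
   int code = encode(I, gs);
   cout << "GS = " << gs << ", code = " << code << endl;   // code = 6
   return 0;
}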
5.4 Demultiplexer
A demultiplexer can be used to distribute input information D to various outputs.
It represents an extension of the 1-out-of-n decoder. The addressed output does
not go to one, but assumes the value of the input variable D. Figure 5.4 shows
its implementation using AND and NOT gates. If we set D = const = 1, the
demultiplexer operates as a 1-out-of-n decoder.
The following figure shows the basic mode of operation and the circuit.
In CMOS the 4555 is a dual 1-of-4 decoder/demultiplexer. Each has two address
inputs (A0 and A1), an active LOW enable input (E) and four mutually exclusive
outputs which are active HIGH (O0 to O3). When used as a decoder and E is HIGH,
O0 to O3 are LOW. When used as a demultiplexer, the appropriate output is
selected by the information on A0 and A1 with E as data input. All unselected
outputs are LOW.
Inputs        Outputs
E  A0 A1      O0 O1 O2 O3
L  L  L       H  L  L  L
L  H  L       L  H  L  L
L  L  H       L  L  H  L
L  H  H       L  L  L  H
H  X  X       L  L  L  L
5.5 Multiplexer
A multiplexer or data selector is a logic circuit that accepts several data inputs
and allows only one of them at a time to get through to the output. It is an
extension of an encoder. The routing of the desired data input to the output is
controlled by SELECT inputs (sometimes referred to as ADDRESS inputs). There
are many IC multiplexers with various numbers of data inputs and select inputs.
Thus the opposite of a demultiplexer is a multiplexer. The following figure shows
the multiplexer circuit.
In CMOS technology, a multiplexer can be implemented using both gates and ana-
log switches (transmission gates). When analog switches are employed, signal trans-
mission is bidirectional. In this case, therefore, the multiplexer is identical to the
demultiplexer. The circuit is then known as an analog multiplexer/demultiplexer.
In CMOS the 4019 provides four multiplexing circuits with common select inputs
(SA, SB). Each circuit contains two inputs (An, Bn) and one output (On). It may
be used to select four bits of information from one of two sources. The A inputs
are selected when SA is HIGH, the B inputs are selected when SB is HIGH. When
SA and SB are HIGH, the output (On) is the logical OR of the An and Bn inputs
(On = An + Bn). When SA and SB are LOW, the output (On) is LOW independent
of the multiplexer inputs.
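A 4-to-1 multiplexer can be modelled as a table lookup driven by the SELECT inputs (a minimal sketch; the function mux4 is illustrative, not a particular IC):

// mux.cpp -- sketch: 4-to-1 multiplexer selecting one data input
#include <iostream>
using namespace std;

int mux4(const int d[4], int a1, int a0)
{
   return d[2*a1 + a0];   // the SELECT inputs choose the routed data line
}

int main(void)
{
   int d[4] = { 1, 0, 0, 1 };
   for(int m = 0; m < 4; m++)
      cout << "select " << m << " -> " << mux4(d, (m>>1)&1, m&1) << endl;
   return 0;
}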
5.6 Binary Adder

5.6.1 Binary Half Adder

The binary addition of two bits obeys

0 + 0 = 0,   0 + 1 = 1,   1 + 0 = 1.

The sum 1 + 1 requires two bits to represent it, namely 10, the binary form of
two (decimal). This can be expressed as follows: one plus one yields a sum bit
S = 0 and a carry bit C = 1. If we ignore the carry bit and restrict the sum to the
single bit S, then we obtain 1 + 1 = 0. This is a very useful special form of addition
known as modulo-2 addition.
The half adder circuit can be realized using an XOR gate and an AND gate. One
output gives the sum of the two bits and the other gives the carry. In CMOS the
4081 can be used for the AND gate and the 4030 for the XOR gate.
The circuit and the table for the inputs and outputs are shown below.
Inputs     Outputs
A0 A1      S  C
0  0       0  0
0  1       1  0
1  0       1  0
1  1       0  1

S = A0 ⊕ A1,   C = A0 · A1.
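The two equations translate directly into C++ using the bitwise operators ^ (XOR) and & (AND); the following minimal sketch prints the half adder truth table:

// halfadder.cpp -- sketch: half adder from XOR and AND
#include <iostream>
using namespace std;

void halfadder(int a0, int a1, int& s, int& c)
{
   s = a0 ^ a1;   // sum   = A0 XOR A1
   c = a0 & a1;   // carry = A0 AND A1
}

int main(void)
{
   int s, c;
   for(int a0 = 0; a0 <= 1; a0++)
      for(int a1 = 0; a1 <= 1; a1++)
      {
         halfadder(a0, a1, s, c);
         cout << a0 << " + " << a1 << " : S = " << s << ", C = " << c << endl;
      }
   return 0;
}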
5.6.2 Binary Full Adder

A full adder adds three bits: two operand bits A0, A1 and a carry-in bit A2. We can
construct logic expressions for these operations. The five gates (three AND gates,
one OR gate and one XOR gate) used in the full adder given below lead to the
logic equations

Y1 = A0 ⊕ A1 ⊕ A2   (sum)
Y0 = A0 · A1 + A0 · A2 + A1 · A2   (carry)

Note that + is the logical OR operation, · is the AND operation and ⊕ is the
XOR operation.
[Figure: full adder circuit with inputs A0, A1, A2 and outputs Y1 (sum) and Y0 (carry).]
Inputs        Outputs
A0 A1 A2      Y0 Y1
0  0  0       0  0
0  0  1       0  1
0  1  0       0  1
0  1  1       1  0
1  0  0       0  1
1  0  1       1  0
1  1  0       1  0
1  1  1       1  1
5.6.3 Binary Four-Bit Adder

To add two four-bit numbers, four adders are connected in parallel as shown for the
addition of binary 1110 (14 decimal) and 0111 (7 decimal) to give the sum 10101
(21 in decimal). By joining more full adders to the left of the system, numbers with
more bits can be added.
carry    1 1 1 0
           1 1 1 0    (14 decimal)
         + 0 1 1 1    ( 7 decimal)
        ------------
sum      1 0 1 0 1    (21 decimal)
In CMOS the 4008 is a 4-bit binary full adder with two 4-bit data inputs, a carry
input, four sum outputs, and a carry output. The IC uses full look-ahead across
4-bits to generate the carry output. This minimizes the necessity for extensive
look-ahead and carry cascading circuits.
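The full adder equations and the ripple-carry connection can be modelled behaviourally (a minimal sketch; the parameter name cout_ avoids a clash with the stream object cout):

// rippleadder.cpp -- sketch: full adder and a 4-bit ripple-carry adder
#include <iostream>
using namespace std;

// full adder: sum by XOR of all three inputs, carry by majority
void fulladder(int a0, int a1, int cin, int& s, int& cout_)
{
   s     = a0 ^ a1 ^ cin;
   cout_ = (a0 & a1) | (a0 & cin) | (a1 & cin);
}

int main(void)
{
   int a[4] = { 0, 1, 1, 1 };   // 1110 = 14, least significant bit first
   int b[4] = { 1, 1, 1, 0 };   // 0111 =  7, least significant bit first
   int s[4], c = 0;
   for(int i = 0; i < 4; i++)   // the carry ripples from stage to stage
      fulladder(a[i], b[i], c, s[i], c);
   cout << "sum = " << c;       // the final carry is the fifth bit
   for(int i = 3; i >= 0; i--) cout << s[i];
   cout << endl;                // prints 10101 (21 decimal)
   return 0;
}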
C2 = A2·B2 + A2·A1·B1 + A1·B2·B1 +
     A2·A1·A0·B0 + A2·A0·B1·B0 + A1·A0·B2·B0 + A0·B2·B1·B0 +
     A2·A1·A0·Cin + A2·A1·B0·Cin + A2·A0·B1·Cin + A1·A0·B2·Cin +
     A2·B1·B0·Cin + A1·B2·B0·Cin + A0·B2·B1·Cin + B2·B1·B0·Cin.
Each full adder requires 2 levels of gates to calculate the carry bit. Thus we have
reduced the carry computation from 6 levels of gates to 2 levels of gates. The circuit
for the computation is given below.
Figure 5.10: Circuit for the Carry Bit of a 3-bit Adder
Similarly the calculation for the carry bit of a 4-bit adder will be reduced from 8
levels of gates to 2 levels of gates.
5.7 Binary Subtraction

Binary subtraction can be performed using addition,

A - B = A + (-B)

where the negation is two's complement.

[Figure: a 4-bit adder computing A - B.]

5.8 Binary Multiplication

5.8.1 Unsigned Integer Multiplication

Let

A = Σ_{i=0}^{n} a_i 2^i   and   B = Σ_{i=0}^{n} b_i 2^i.

The product A · B can be computed with the shift-and-add algorithm:

1. j := 0, result := 0
2. if b_j is 1 add A to result
3. shift left A
4. increment j
5. repeat steps 2 to 4 for each bit of B
To ensure that the product of two n-bit numbers can be represented the output may
be extended to 2n bits.
[Figure: multiplication circuit computing the product;
CA = controlled 4-bit adder, SL = logical 4-bit shift left.]
The Russian peasant method [104] uses the same technique for multiplication, with a
small change to simplify implementation. It is a practical method for multiplication
by hand, since it involves only the operations of doubling, halving and adding.
Western visitors to Russia in the nineteenth century found the method in wide use
there, from which the method derives its name.
1. j := 0, result := 0
2. if b_0 is 1 add A to result
3. shift left A
4. shift right B
5. increment j

The steps 2 to 5 are repeated until B = 0.
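The doubling-and-halving steps translate directly into C++ shift operations (a minimal sketch of the algorithm just described):

// peasant.cpp -- sketch of the Russian peasant multiplication algorithm
#include <iostream>
using namespace std;

unsigned long multiply(unsigned long a, unsigned long b)
{
   unsigned long result = 0;
   while(b > 0)
   {
      if(b & 1) result += a;   // add A when the low bit of B is 1
      a <<= 1;                 // double A (shift left)
      b >>= 1;                 // halve B  (shift right)
   }
   return result;
}

int main(void)
{
   cout << "3*101 = " << multiply(3, 101) << endl;   // 303
   return 0;
}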
5.8.2 Fast Multiplication

For two 2-bit numbers the partial products are arranged as

                B1      B0
                A1      A0
      ----------------------
             A0·B1   A0·B0
    A1·B1    A1·B0
      ----------------------
P3      P2      P1      P0

P0 = A0 · B0
P1 = (A0 · B1) ⊕ (A1 · B0)
P2 = (A1 · B1) ⊕ (A1 · A0 · B1 · B0)
P3 = A1 · A0 · B1 · B0
5.8.3 Signed Integer Multiplication

A block of 1s running from bit k up to bit n in a number contributes

Σ_{j=k}^{n} 2^j = 2^{n+1} - 2^k.

This can be extended for two's complement numbers. The two's complement (nega-
tion) of the same number gives 2^N - 2^{n+1} + 2^k. The contribution of 2^N is an overflow
and does not influence the operation. Thus addition and subtraction can be per-
formed whenever a 1 and 0 are adjacent in the bit representations.
For Booth's algorithm we introduce an extra bit Q-1 which is used to determine the
boundaries of blocks of 0s and 1s. The final product is in AQ. A and Q are n-bit
registers. The arithmetic shift right (SHR) operation shifts all bits one position
right, and leaves the highest order bit (sign bit) at its previous value.

1. A := 0, Q-1 := 0
   M := Multiplicand
   Q := Multiplier
   C := n
2. If Q0 Q-1 = 01 then A := A + M
3. If Q0 Q-1 = 10 then A := A - M
4. SHR A, Q, Q-1
5. decrement C
6. repeat steps 2 to 5 while C > 0
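The register-level algorithm can be modelled in C++ (a behavioural sketch; holding the n-bit registers in masked ints and the final sign extension are implementation assumptions, not part of the original text):

// booth.cpp -- sketch of Booth's algorithm for n-bit two's complement operands
#include <iostream>
using namespace std;

int booth(int m, int q, int n)   // returns m*q for n-bit operands
{
   int mask = (1 << n) - 1;
   int A = 0, Q = q & mask, Qm1 = 0;   // registers A, Q and the extra bit Q-1
   for(int c = 0; c < n; c++)
   {
      int pair = ((Q & 1) << 1) | Qm1;
      if(pair == 1) A = (A + m) & mask;   // Q0 Q-1 = 01 : A := A + M
      if(pair == 2) A = (A - m) & mask;   // Q0 Q-1 = 10 : A := A - M
      // arithmetic shift right of the combined register A Q Q-1
      Qm1 = Q & 1;
      Q   = ((Q >> 1) | ((A & 1) << (n - 1))) & mask;
      A   = ((A >> 1) | (A & (1 << (n - 1)))) & mask;   // keep the sign bit
   }
   int result = (A << n) | Q;                             // product in AQ
   if(result & (1 << (2*n - 1))) result -= (1 << (2*n));  // sign extend
   return result;
}

int main(void)
{
   cout << "7 * -3 = " << booth(7 & 15, -3 & 15, 4) << endl;   // -21
   return 0;
}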
5.9 Binary Division

The restoring division algorithm for unsigned numbers is as follows.

1. A := 0
   M := Divisor, Q := Dividend, C := n
2. SHL AQ
3. A := A - M
4. if A < 0 then set Q0 := 0 and restore A := A + M, otherwise set Q0 := 1
5. decrement C
6. repeat steps 2 to 5 while C > 0

The quotient is then in Q and the remainder in A. For signed numbers the
following algorithm can be used.
1. Load the divisor into the M register and the dividend into the AQ registers.
   The dividend must be expressed as a 2n-bit two's complement number. Thus,
   for example, the 4-bit number 0111 becomes 00000111, and 1001 becomes
   11111001.
2. Shift AQ left one bit position.
3. If M and A have the same signs, perform A := A - M; otherwise, A := A + M.
4. The above operation is successful if the sign of A is the same before and after
   the operation. If it is successful set Q0 := 1, otherwise set Q0 := 0 and restore A.
5. Repeat steps (2) through (4) as many times as there are bit positions in Q.
6. The remainder is in A. If the signs of the divisor and dividend are the same,
   the quotient is in Q; otherwise the correct quotient is the two's complement
   of Q.
5.10 Magnitude Comparator

In CMOS the 4585 is a 4-bit magnitude comparator which compares two 4-bit
words (A and B) and indicates whether they are 'less than', 'equal to' or 'greater
than'. Each word has four parallel inputs (A0 to A3 and B0 to B3), A3 and B3
being the most significant inputs. Three outputs are provided: A greater than B
(OA>B), A less than B (OA<B) and A equal to B (OA=B). Three expander inputs
(IA>B, IA<B and IA=B) allow cascading of the devices without external gates. For
proper comparison operation the expander inputs to the least significant position
must be connected as follows:

IA>B = HIGH,   IA=B = HIGH,   IA<B = LOW.
For words greater than 4 bits, units can be cascaded by connecting outputs OA<B and
OA=B to the corresponding inputs of the next more significant comparator (input
IA>B is connected to HIGH). Operation is not restricted to binary codes; the devices
will work with any monotonic code. Table 5.5 displays the truth table for the CMOS
4585. The following notation is used: H = HIGH state (the more positive voltage),
L = LOW state (the less positive voltage), X = state is immaterial. The upper 11
lines describe the normal operation under all conditions that will occur in a single
device or in a serial expansion scheme. The lower 2 lines describe the operation
under abnormal conditions on the cascading inputs. These conditions occur when
the parallel expansion technique is used. The circuit consists of 8 XNOR gates and
one NAND gate.
In CMOS the 74LV688 is an 8-bit magnitude comparator. It compares two 8-bit
numbers provided on the inputs P0 to P7 and Q0 to Q7. The single output is
P=Q.
Table 5.6 shows the function table for the CMOS 74LV688 and Figure 5.14 the logic
diagram for the CMOS 74LV688.
Inputs                    Output
Data         Enable
Pn, Qn       E            P=Q
P = Q        L            L
X            H            H
P > Q        L            H
P < Q        L            H
Carry propagate (P) and carry generate (G) outputs are provided to allow a full
look-ahead carry scheme for fast simultaneous carry generation for the four bits in
the package. Fast arithmetic operations on long words are obtainable by using the
MC14582B as a second order look-ahead block. An inverted ripple carry input (Cn)
and a ripple carry output (Cn+4) are included for ripple-through operation.
When the device is in the subtract mode (LHHL), comparison of two 4-bit words
present at the A and B inputs is provided using the A = B output. It assumes a
high-level state when indicating equality. Also, when the ALU is in the subtract
mode the Cn+4 output can be used to indicate relative magnitude. The function
table of the device is as follows.
Function Select   Active Low Inputs/Outputs           Active High Inputs/Outputs
S3 S2 S1 S0       Logic          Arithmetic*          Logic          Arithmetic*
                  (MC=H)         (MC=L, Cn=L)         (MC=H)         (MC=L, Cn=H)
L  L  L  L        ¬A             A minus 1            ¬A             A
L  L  L  H        ¬(A·B)         A·B minus 1          ¬(A+B)         A+B
L  L  H  L        ¬A+B           A·¬B minus 1         ¬A·B           A+¬B
L  L  H  H        Logic 1        minus 1              Logic 0        minus 1
L  H  L  L        ¬(A+B)         A plus (A+¬B)        ¬(A·B)         A plus A·¬B
L  H  L  H        ¬B             A·B plus (A+¬B)      ¬B             (A+B) plus A·¬B
L  H  H  L        ¬(A⊕B)         A minus B minus 1    A⊕B            A minus B minus 1
L  H  H  H        A+¬B           A+¬B                 A·¬B           A·¬B minus 1
H  L  L  L        ¬A·B           A plus (A+B)         ¬A+B           A plus A·B
H  L  L  H        A⊕B            A plus B             ¬(A⊕B)         A plus B
H  L  H  L        B              A·B plus (A+B)       B              (A+¬B) plus A·B
H  L  H  H        A+B            A+B                  A·B            A·B minus 1
H  H  L  L        Logic 0        A plus A             Logic 1        A plus A
H  H  L  H        A·¬B           A·B plus A           A+¬B           (A+B) plus A
H  H  H  L        A·B            A·¬B plus A          A+B            (A+¬B) plus A
H  H  H  H        A              A                    A              A minus 1
The * indicates that the inputs are expressed in two's complement form. For arith-
metic functions with Cn in the opposite state, the resulting function is as shown
plus 1.
Thus, for active high inputs, the basic logic functions are achieved with the following
selections and MC = H.
S3 S2 S1 S0   Logic Function
L  L  L  L    NOT
H  L  H  H    AND
H  H  H  L    OR
L  H  L  L    NAND
L  L  L  H    NOR
L  H  H  L    XOR
S3 S2 S1 S0   Cn   Arithmetic Function
H  L  L  H    L    Addition
L  H  H  L    H    Subtraction
L  L  L  L    L    Increment
H  H  H  H    L    Decrement
A ROM essentially consists of a decoder of the binary encoded input number, called
the address, an array of OR gates, and a set of output drivers. The decoder yields
a selector signal for each input value, addressing each cell.
[Figure: an 8-word ROM; the address decoder selects one of the cells 0 to 7 and the
selected cell drives the data outputs through the OR-gate array.]

If the addressed cell stores the word 11111, the data outputs give

D4 D3 D2 D1 D0 = 1 1 1 1 1

which is the binary representation of 31. The ROM can be used to speed up certain
tasks. For example it can store multiplication tables. Of course ROMs can also be
used to store identification strings or any other data.
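A ROM used as a stored multiplication table amounts to replacing computation by a table lookup (a minimal sketch; the array rom and the 4-bit addressing scheme are illustrative):

// rom.cpp -- sketch: a ROM as a lookup table storing a multiplication table
#include <iostream>
using namespace std;

int main(void)
{
   int rom[16];                   // 16 cells addressed by 4 bits
   for(int a = 0; a < 4; a++)     // program the ROM: address = a*4 + b
      for(int b = 0; b < 4; b++)
         rom[4*a + b] = a * b;
   // "reading" the ROM replaces the multiplication by a table lookup
   cout << "3*2 = " << rom[4*3 + 2] << endl;
   return 0;
}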
5.13 Combinational Programmable Logic Devices
A programmable gate is one where the inputs to the gate can be selected from
a given set of inputs (for example from other gates). If no inputs are selected
we assume that all inputs are 0. We introduce a new notation to simplify circuit
representation. A cross × indicates a programmable connection to a gate, in other
words a connection which can be removed (for example a fuse that can be burnt
open). A dot · indicates a fixed connection. The following figure shows an AND
gate with programmable inputs (the inputs from A0 and A2 can still be removed)
and an OR gate with two fixed inputs.
In the following examples we use two inputs, four AND gates and one OR gate for
the output. In general, for n inputs and m outputs, 2^n AND gates and m OR gates
are required. One way to implement a programmable AND gate is to have an AND
gate with 2n inputs (for an input and its inverse) and to set the input to 1 whenever
an input is not connected. Similarly for the OR gate an input can be set to 0. A
special case is when no input is selected: the output of the gate must be zero (as
if the gate is not present). In this case we set all inputs to the gate to 0. In this
way gates with a fixed number of inputs can be used as programmable gates. For
each architecture we show the circuit before programming and after programming
the XOR operation.
PROM stands for programmable read only memory. These devices consist of a
number of fixed AND gates (fixed in input) and programmable OR gates. The
AND gates are over all possible inputs. For an n variable system there are 2^n AND
gates. All connections are initially closed, the unwanted connections are then burnt
by applying a voltage to the appropriate inputs. Once a PROM is programmed
it cannot be reprogrammed. The EPROM or erasable PROM can be erased (all
connections are closed). The EEPROM is an electrically erasable PROM.
PAL stands for programmable array logic. A number of programmable AND gates
feed fixed OR gates in these devices. The AND gates represent the product forms
of the desired expression's SOP form. Specific AND gates are dedicated to specific
OR gates.
GAL stands for generic array logic. They are used to emulate PALs. Different types
of PALs can then be replaced with a single device type (the GAL device).
PLA stands for programmable logic array. These devices provide the greatest
programming flexibility through the use of programmable AND gates feeding pro-
grammable OR gates. Any AND gate can feed any OR gate.
Figure 5.23: Example of a Combinational FPGA Cell
To design a circuit using an FPGA the cell functions must be specified as well as the
connections between cells. Determining which connections must be closed is called
routing. For example an FPGA may have a grid pattern where the output can be
connected to four adjacent cells. Each outward-going arrow is a duplicate of the
function output. Each input arrow can be configured to be closed or open.
5.15 VHDL
VHDL [151] is a standardized language that is not tied to any single tool vendor
or hardware manufacturer. It is a complete programming language with built-in
mechanisms to handle and synchronize parallel processes, and also supports abstract
data types and high level modelling. The IEEE adopted VHDL as a standard in
1987.
VHDL was initially intended to describe digital electronics systems. It can be used
to model existing hardware for verification and testing, and also for synthesis.
-- eqcomp4.vhd
6.1 Introduction
The combinational circuits introduced so far perform a function and, except for
the ROM, do not provide any memory. The ROM provides a static memory, the
content is predetermined. A system providing dynamic memory is required to store
data which cannot be predetermined. Any two-state (bistable) system which can be
dynamically controlled will provide this function. Many systems acting in parallel
can provide the required data width for operations provided by combinational cir-
cuits. The bistable systems are called latches and the parallel combinations are
called registers.
Combinational circuits are free of loops. In this chapter we examine circuits with
feedback loops. This is what allows them to store information. Propagation delays
are important in the analysis of these circuits.
The following chapter introduces mechanisms for an external source of timing. The
timing system helps describe the logic functions of these circuits under specific
conditions.
In this chapter we discuss the SR latch and JK latch which use two inputs. One
sets the logical value of the latch to 0 and the other sets the logical value of the latch
to 1. The D latch has only one input and remembers the logical value of the input
for one time interval. The D register and JK register use D latches and JK latches
respectively, to provide the same logical action as the latches, but with different
physical characteristics.
6.2 SR Latch
This circuit has two inputs labelled S (Set) and R (Reset), and two outputs Q and
¬Q, and consists of two NOR gates connected in a feedback arrangement.

[Figure: SR latch built from two cross-coupled NOR gates with inputs R, S and
outputs Q, ¬Q.]

The following table summarizes the characteristics of the operation of the latch.
S and R should not both be set at the same time as this gives an undetermined
value for Q.
St Rt Qt   Qt+1
0  0  0    0
0  0  1    1
0  1  0    0
0  1  1    0
1  0  0    1
1  0  1    1
1  1  0    -
1  1  1    -
The circuit is stable when S = R = 0 (Qt+1 = Qt). The output is time dependent
and there is a delay from the time that one of S or R is set to one and the time
when the circuit is stable again. If S = 0 and R = 1 the system is reset. If S = 1
and R = 0 the system is set. The logical equation for Qt+1 (if St and Rt are not 1
at the same time) is

Qt+1 = St + ¬Rt · Qt.
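The characteristic equation can be iterated directly to simulate the latch (a minimal sketch; the input sequences are illustrative):

// srlatch.cpp -- sketch: simulating Q(t+1) = S + (NOT R)·Q(t)
#include <iostream>
using namespace std;

int main(void)
{
   int Q = 0;                            // initial state
   int S[5] = { 1, 0, 0, 1, 0 };         // input sequences
   int R[5] = { 0, 0, 1, 0, 0 };
   for(int t = 0; t < 5; t++)
   {
      Q = S[t] | ((!R[t]) & Q);          // characteristic equation
      cout << "S=" << S[t] << " R=" << R[t] << " -> Q=" << Q << endl;
   }
   return 0;
}

The run shows set (S=1), hold (S=R=0) and reset (R=1) behaviour in turn.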
6.3 D Latch
The input S = R = 1 must be avoided when using the SR latch. The D latch
overcomes this by using only a single D input. The output is always the same as the
last D input.
The D latch is sometimes called the data latch because it stores 1 bit of information.
It is also called the delay latch because it delays the output of the 0 or 1 (in an
environment where a CLOCK input is provided, the delay is one clock cycle). The
characteristic table for the D latch is as follows:
D   Qt+1
0   0
1   1
The latch described above is called transparent since the output Q is the same as
the input D. An extra input can be introduced to indicate when to set the output
identical to the given input.
6.4 JK Latch
The JK latch takes two inputs. Unlike the SR latch all input combinations are
valid. The J input performs the set function while the K input performs the reset
function. When J = K = 1 the toggle function is performed (the outputs are
inverted).
Jt Kt Qt   Qt+1
0  0  0    0
0  0  1    1
0  1  0    0
0  1  1    0
1  0  0    1
1  0  1    1
1  1  0    1
1  1  1    0
6.5 D Register
The transparency of a D latch can be undesirable. It may be preferable to accept
an input value upon the rising edge of a control signal, and retain the stored value
before and after the transition. Latches are level-sensitive whereas registers are
edge-sensitive. An example implementation of this is the master-slave latch pair. It
consists of two latches connected in series with each enable input the inverse of the
other. This separates the storage of the input D and the output Q. The boxes with
the symbols D, G and Q represent D latches.
6.6 JK Register
Similar to the D register, the principle for the JK register is based on the JK latch.
A master-slave configuration can again be used to implement this register.
Each JK latch has two additional AND gates, one for each input, where the appropriate
CK or ¬CK is the second input to each AND gate. This construction has the same
purpose as the G input in a D latch. A variation on the register is to use the Q and
¬Q feedback loops directly to the first input and not for each latch. The following
figure shows a JK master-slave register.
Chapter 7

Synchronous Circuits

7.1 Introduction
Circuits that react immediately to the stimulus of the input are called asynchronous.
This term is a combination of the Greek words meaning "without regard to time".
In digital systems it is important that outputs change at precise points in time.
Circuits that operate in this manner are called synchronous. Digital circuits often
use time reference signals called clocks. A clock signal is nothing more than a
square wave that has a precise known period. The clock will be the timing reference
that synchronizes all circuit activity and tells the device when it should execute
its function. Thus the clock signal is the signal that causes things to happen at
regularly spaced intervals. In particular, operations in the system are made to take
place at times when the clock signal is making a transition from 0 to 1 or from
1 to O. These transitions are pointed out in the figure. The O-to-l transition is
called the rising edge or positive-going edge of the clock signal. The synchronous
action of the clock signal is the result of using clocked latches, which are designed to
change states on either (but not both) the rising edge or the falling edge of the clock
signal. In other words, the clocked latches will change states at the appropriate
clock transition and will rest between successive clock pulses. The frequency of the
clock pulses is generally determined by how long it takes the latches and gates to
respond to the level changes by the clock pulse, that is, the propagation delays of
the various logic circuits.
[Figure: a clock signal, a square wave alternating between 0 and 1 as a function of
time.]
Many ways of designing and controlling latches have evolved over the years. They
differ not only in their logic design but also how they use the clock signal. Let
us consider a latch. During the period t1 : t2 when the clock is enabled (C = 1),
any change made to the data signal may enter the latch immediately. After some
propagation delay, these changes affect the latch's data output Q (and also ¬Q) during
the period t3 : t4. Thus, ignoring the brief and somewhat uncertain transition
periods when the data and clock signals are actually changing values, the latch
responds to all input changes that occur when C is at the active 1 level. For this
reason latches are said to be level sensitive or level-triggered.
[Timing diagram: data input changes are accepted during t1 : t2 while the clock C
is high; the output Q may change during t3 : t4.]
To obtain latch behavior, we must ensure that the period tl : t2 (when input data
changes are accepted) and the period t3 : t4 (when the output data changes) do not
overlap. One way a latch can meet this requirement is by accepting input changes
when C = 1, and changing its output when C = 0. This pulse mode of operation
was used in some early designs for bistables. The clocking method most commonly
used in modern latch design is edge triggering, in which a transition or edge of the
clock signal C causes the actions required in t1 : t2 and t3 : t4 to take place, as
shown in the figure.
[Timing diagram: with edge triggering, data input changes are accepted and the
output Q may change only around the triggering edge of the clock.]
7.2 Shift Registers

Identification of shift registers may be made by noting how data is loaded into and
read from the storage unit. In the following figure we have a register 8 bits wide.
The registers are classified as:
1. Serial-in serial-out
2. Serial-in parallel-out
3. Parallel-in serial-out
4. Parallel-in Parallel-out
A simple four-bit shift register is displayed in the following figure. It uses four D-
latches. Data bits (Os and Is) are fed into the D input of latch 1. This input is
labelled as the serial data input. The clear input will reset all four D latches to 0
when activated by a LOW. A pulse at the clock input will shift the data from the
serial-data input to the position A (Q of latch 1). The indicators (A, B, C, D) across
the top of the figure show the contents of the register. This register can be classified
as a serial-in parallel-out unit if data is read from the parallel outputs (A, B, C, D)
across the top.
In CMOS the 4014B is a fully synchronous edge-triggered 8-bit static shift regis-
ter with eight synchronous parallel inputs, a synchronous serial data input, a syn-
chronous parallel enable, a LOW to HIGH edge-triggered clock input and buffered
parallel outputs from the last three stages.
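The serial-in parallel-out operation can be simulated by shifting an array one place per clock pulse (a minimal sketch; the data sequence is illustrative):

// shiftreg.cpp -- sketch: serial-in parallel-out 4-bit shift register
#include <iostream>
using namespace std;

int main(void)
{
   int reg[4] = { 0, 0, 0, 0 };           // contents A, B, C, D
   int serial[6] = { 1, 0, 1, 1, 0, 1 };  // serial data input
   for(int t = 0; t < 6; t++)             // one clock pulse per data bit
   {
      for(int i = 3; i > 0; i--) reg[i] = reg[i-1];   // shift D<-C<-B<-A
      reg[0] = serial[t];                             // A <- serial input
      cout << "after pulse " << t+1 << " : ";
      for(int i = 0; i < 4; i++) cout << reg[i];
      cout << endl;                                   // parallel output
   }
   return 0;
}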
7.3 Binary Counter
Next we discuss the basic counter operation. Figure 7.6 shows the schematic
representation of a 4-bit counter. This counter contains four latches, one per bit,
with outputs labelled A, B, C, and D. Two inputs are shown, the clock pulse
input, CP, and Reset. The counter operates such that the states of the four latches
represent a binary number equal to the number of pulses that have been applied to
the CP input. The diagram shows the sequence which the latch outputs follow as
pulses are applied. The A output represents the LSB (least significant bit) and D
is the MSB (most significant bit) of the binary count. For example, after the fifth
input pulse, the outputs are DCBA = 0101, which is the binary equivalent of 5. The
CP input has a small circle and triangle to indicate that the latches in the counter
change states on the negative-going edge of the clock pulses. Counters that trigger
on positive-going edges are also available and they do not have the circle on the CP
input.
D C B A
0 0 0 0   before 1st input pulse
0 0 0 1   after 1st input pulse
0 0 1 0   after 2nd input pulse
0 0 1 1   after 3rd input pulse

(16 different possible states in all)
In addition to counting pulses, all counters can perform frequency division. This is
illustrated in the following figure for the 4-bit, MOD-16 counter. The state of the
A output is seen to change at a rate exactly 1/2 that of the CP input. The B output
is 1/2 that of the A output and 1/4 of the CP input. The C output is 1/2 the frequency
of the B output and 1/8 of the input frequency, and the D output is 1/2 the frequency
of the C output and 1/16 of the input frequency. In general, the waveform out of the MSB
latch of a counter will divide the input frequency by the MOD number.
The counters described above can count up from zero to some maximum count and
then reset to zero. There are several IC counters that can count in either direction
and are called up/down counters. The following figure shows the two basic up/down
counter arrangements. The first counter has a single CP input that is
used for both count-up and count-down operations. The UP/DOWN input is used
to control the counting direction. One logic level applied to this input causes the
counter to count up from 0000 to 1111 as pulses are applied to CP. The other logic
level applied to UP/DOWN causes the counter to count down from 1111 to 0000
as pulses are applied to CP. The second counter does not use an UP/DOWN
control input. Instead, it uses separate clock inputs CPU and CPD for counting up
and down, respectively. Pulses applied to CPU cause the counter to count up, and
pulses applied to CPD cause the counter to count down. Only one CP input can
be pulsed at a time, or erratic operation will occur.
In CMOS the 4516 is an edge triggered synchronous up/down 4-bit binary counter
with a clock input and an up/down count control input.
7.3 Binary Counter 131
[Figure: the two up/down counter arrangements, one with a single CP input and
an UP/DOWN control, the other with separate CPU and CPD clock inputs.]

[Figure: a mod-4 ripple counter built from two JK latches with J = K = 1.]
The CLK input of the second latch is driven by the output of the first latch. The
CLK input of the first latch is driven by an external clock signal. Every second
toggle action of the first latch will cause the second latch to toggle. The output
A is the least significant bit and B is the most significant bit of the binary counter.
This ripple action of one latch depending on the output of the previous latch can
necessitate potentially long clock periods, due to propagation delays. To avoid this
lag, latches can be updated in parallel. The latches are driven by the same external
clock at their CLK inputs. This is illustrated below in another mod-4 counter.
[Figure: a synchronous mod-4 counter; both JK latches are clocked by the same
external clock.]
VHDL can also be used to simulate a synchronous circuit. For example, consider

[Figure: inputs A and B feed an OR gate (≥1) whose output provides Y and drives
a D register clocked by CLK; the register output Q provides X.]

entity simple is
  port(A, B, CLK: in bit;
       X, Y: out bit);
end simple;
7.4 Example Program

The instruction BTFSC is the "bit test f and skip if clear" instruction. If the tested
bit of f is 0 the next instruction is skipped. Thus if the BTFSC is executed with the
operands STATUS and 0, the carry flag (STATUS register bit 0) is tested to determine
if the next instruction is executed. The instruction DECFSZ is the "decrement f and
skip if zero" instruction. The value of register f is decremented, and if the result is
zero the next instruction is skipped.
; multiply.asm
;*******************************************************************
;   Multiplies two 8 bit numbers
;      00000011 (decimal 3)
;   and
;      01100101 (decimal 101)
;   and stores the result (16 bits)
;      00000001 00101111 (decimal 303)
;   in LBYTE and HBYTE
;      LBYTE: 00101111
;      HBYTE: 00000001
;*******************************************************************
PROCESSOR 16f84
INCLUDE "p16f84.inc"
; Variable Declarations
LBYTE  EQU  H'11'   ; variable at address 0x11 in SRAM
HBYTE  EQU  H'12'   ; variable at address 0x12 in SRAM
COUNT  EQU  H'13'   ; variable at address 0x13 in SRAM
NOA    EQU  H'20'   ; first number at address 0x20 in SRAM
NOB    EQU  H'21'   ; second number at address 0x21 in SRAM
       ORG    H'00'
Start
       BSF    STATUS, RP0    ; select bank 1
       MOVLW  B'11111111'
       MOVWF  PORTA          ; configure PORTA pins as inputs
       MOVLW  B'00000000'
       MOVWF  PORTB          ; configure PORTB pins as outputs
       BCF    STATUS, RP0    ; back to bank 0
       CLRF   LBYTE
       CLRF   HBYTE
       MOVLW  8
       MOVWF  COUNT          ; loop over the 8 bits
       MOVLW  B'00000011'
       MOVWF  NOA            ; first number (3)
       MOVLW  B'01100101'
       MOVWF  NOB            ; second number (101)
       MOVF   NOB, W
       BCF    STATUS, 0      ; clear the carry flag
LOOP
       RRF    NOA, F         ; low bit of NOA into the carry
       BTFSC  STATUS, 0
       ADDWF  HBYTE, F       ; add NOB when the bit was 1
       RRF    HBYTE, F       ; shift the 16-bit result right
       RRF    LBYTE, F
       DECFSZ COUNT, F
       GOTO   LOOP
       MOVF   HBYTE, W
       MOVWF  PORTB
Stop   GOTO   Stop
       END
Chapter 8
Recursion
8.1 Introduction
Recursion is a fundamental concept in mathematics and computer science. It is a
useful tool for simplifying solutions to problems. A recursive solution is possible if
a problem can be solved using the solution of a simpler problem of the same type
and a solution to the simplest of problems of the same type is known. A recursive
solution to a problem thus consists of a solution for the simplest case (the base
case) together with a rule which reduces the general case to simpler cases of the
same type.
Let us now list some recursive structures. One of the most important recursive
structures is the string. The string manipulation functions can be implemented using
recursion, for example to find the length of a string and to reverse a string. The
linear linked list is a recursive structure; it has a head followed by a linked list. An
example implementation of a recursive linked list is given in the next chapter;
it allows lists to be copied, compared, searched and items to be inserted and deleted
recursively. Another structure which is recursive is the binary tree.
Another example is the numerical evaluation of the double integral

∫_a^b ∫_c^d f(x,y) dx dy.
The evaluation of the outer integral requires us to know the value of the integrand
at selected points, and calculation of the integrand requires the evaluation of an
integral, so that the subprocess is the same as the main process.
Example. The set N0 × N0 is bijective with N0. To see this we write the elements
of N0 × N0 in a table.
We now write down the elements of this table by moving along the diagonals which
go from north-east to south-west, that is, we write them in the sequence
(0,0), (0,1), (1,0), (0,2), (1,1), (2,0), (0,3), (1,2), ...
Since there are (k+1) pairs (r,s) with r + s = k, we see that the pair (m,n) occurs
in the position

1 + 2 + ... + (m+n) + m = (m+n)(m+n+1)/2 + m.

Thus the bijection is

f(m,n) = (m+n)(m+n+1)/2 + m.
We have two functions g and h from N0 -> N0 such that f^{-1}(r) = (g(r), h(r)).
They are given by the following formulas. Find s ∈ N0 so that

s(s+1)/2 ≤ r < (s+1)(s+2)/2.

Let m be

m = r - s(s+1)/2

and n = s - m. Then g(r) = m and h(r) = n.
Example. Let n = 0, 1, ... and f(n) = 2^n. Then we can find the recursive definition
as follows:

f(n+1) = 2^{n+1} = 2 · 2^n = 2f(n).

Thus f(n+1) = 2f(n) where f(0) = 1.

Example. The Fibonacci numbers satisfy the recursion

F_{n+2} = F_{n+1} + F_n,   n = 0, 1, 2, ...

where F0 = F1 = 1.
Example. The Bessel functions Jn(x) are solutions of the linear second order dif-
ferential equation

x^2 y'' + x y' + (x^2 - n^2) y = 0,   n = 0, 1, 2, ....

They satisfy the recurrence relation

J_{n+1}(x) = (2n/x) Jn(x) - J_{n-1}(x),   n = 0, 1, 2, ...

where

J0(x) = Σ_{j=0}^{∞} (-1)^j x^{2j} / Π_{k=1}^{j} (2k)^2.
Example. Consider the initial value problem

dy/dx = f(x, y(x)),   y(x0) = y0

which can be written as the integral equation

y(x) = y0 + ∫_{x0}^{x} f(s, y(s)) ds.

The Picard iteration

y_{n+1}(x) = y0 + ∫_{x0}^{x} f(s, yn(s)) ds

is recursive. As an example we consider

dy/dx = x + y,   x0 = 0,   y(x0) = 1.

Then

y_{n+1}(x) = 1 + ∫_0^x (s + yn(s)) ds.

We find

y0(x) = 1
y1(x) = 1 + x + x^2/2
y2(x) = 1 + x + x^2 + x^3/6
y3(x) = 1 + x + x^2 + x^3/3 + x^4/24.
8.2 Example Programs

Example. The Tower of Hanoi problem: n discs and three pegs A, B and C are
given. The task is to move the pile of discs from peg A to peg B under the following
rules.
• Initially all n discs are on peg A with the largest disc at the bottom and discs
decrease in size towards the top of the pile. If disc 1 is above disc 2 then disc
1 is smaller than disc 2.
• Only one disc may be moved at a time. A disc must be moved from the top
of a pile on one peg to the top of a pile on another peg. A larger disc may not
be placed on a smaller one.
If n = 1 we can move the disc from A to B. If n = 2 we can move a disc from A to C,
then A to B, then C to B. This is the inspiration for the solution to the general
problem. If n > 2 we move the pile of n - 1 discs from A to C, move the disc on A
to B and move the pile on peg C to peg B.
// hanoi.cpp
#include <iostream>

void hanoi(int n, char from, char to, char via)
{
   if(n == 0) return;
   hanoi(n-1, from, via, to);
   std::cout << from << " -> " << to << std::endl;
   hanoi(n-1, via, to, from);
}

int main(void)
{
   std::cout << "Tower of Hanoi with 1 disc:" << std::endl;
   hanoi(1,'A','B','C');
   std::cout << "Tower of Hanoi with 2 discs:" << std::endl;
   hanoi(2,'A','B','C');
   std::cout << "Tower of Hanoi with 3 discs:" << std::endl;
   hanoi(3,'A','B','C');
   return 0;
}
Example. The power a^n can be computed recursively using

a^n = (a^{n/2})^2        n even
a^n = a · (a^{n/2})^2    n odd

with a ∈ R and n ∈ N, where n/2 is calculated using integer division (i.e. ⌊n/2⌋). The
program power.cpp implements the solution.
// power.cpp
#include <iostream>
#include <iomanip>
using namespace std;

double power(double a, int n)
{
   if(n == 0) return 1.0;
   double power_ndiv2 = power(a, n/2);
   if(n % 2)
      return a*power_ndiv2*power_ndiv2;
   return power_ndiv2*power_ndiv2;
}

int main(void)
{
   cout << "3.4^0=" << power(3.4,0) << endl;
   cout << "2^24=" << setprecision(9) << power(2,24) << endl;
   cout << "3.1415^7=" << setprecision(9) << power(3.1415,7) << endl;
   return 0;
}
3.4^0=1
2^24=16777216
3.1415^7=3019.66975
Example. Sorting a sequence can be described using an order relation R with the
properties:

1. R ⊆ A × A. We view the statement (a, b) ∈ R with a, b ∈ A as a proposition.
   We also write (a, b) ∈ R as aRb.
2. aRb and bRc implies aRc (transitivity).
A fast sorting method would be to place the elements of S in a tree as they occur
in the sequence and traverse the tree to find the sorted sequence. Another fast
sorting algorithm called quicksort is implemented using recursion. The algorithm
first partitions the sequence around an element Si such that all elements on the
left of Si have the property sjRs i and all elements to the right of Si do not. The
next step is to sort each of the partitions, and we use quicksort to do this (i.e.
recursively). The program qsort.cpp uses the function partition to partition the
sequence at each step of the qsort algorithm. This is the most important part of
the algorithm. The function takes an element of the array and rearranges the array
such that all elements before it are less than the given element and all elements after
it are greater than the given element.
// qsort.cpp
#include <iostream>
#include <string>
using namespace std;

// orderings: return > 0 if a should come after b
int less_int(const int& a, const int& b) { return a > b; }
int less_string(const string& a, const string& b) { return a > b; }

template <class T>
void partition(T* array, int n, int (*R)(const T&, const T&), int& pe)
{
   pe = n - 1;                 // use the last element as pivot
   int p = 0;
   while(p < pe)
   {
      if(R(array[p], array[pe]) > 0)
      {
         // move array[p] behind the pivot
         T temp = array[p];
         for(int i = p; i < pe; i++) array[i] = array[i+1];
         array[pe] = temp;
         pe--;
      }
      else p++;
   }
}

template <class T>
void qsort(T* array, int n, int (*R)(const T&, const T&))
{
   if(n <= 1) return;
   int pelement;
   partition(array, n, R, pelement);
   qsort(array, pelement, R);
   qsort(array + pelement + 1, n - pelement - 1, R);
}

int main(void)
{
   int test1[9] = {1,5,3,7,2,9,4,6,8};
   string test2[6] = {"orange","grape","apple","pear","banana","peach"};
   int i;
   qsort<int>(test1, 9, less_int);
   qsort<string>(test2, 6, less_string);
   for(i=0; i<9; i++) cout << test1[i] << " ";
   cout << endl;
   for(i=0; i<6; i++) cout << test2[i] << " ";
   cout << endl;
   return 0;
}
1 2 3 4 5 6 7 8 9
apple banana grape orange peach pear
Example. The Ackermann function is defined by

f(n, m) := m + 1                     if n = 0
f(n, m) := f(n - 1, 1)               if m = 0
f(n, m) := f(n - 1, f(n, m - 1))     otherwise
// acker.cpp
#include <iostream>
using namespace std;

unsigned long ackermann(unsigned long n, unsigned long m)
{
   if(n == 0) return m + 1;
   if(m == 0) return ackermann(n - 1, 1);
   return ackermann(n - 1, ackermann(n, m - 1));
}

int main(void)
{
   cout << "f(1,1)=" << ackermann(1,1) << " "
        << "f(2,1)=" << ackermann(2,1) << " "
        << "f(3,1)=" << ackermann(3,1) << endl;
   cout << "f(1,2)=" << ackermann(1,2) << " "
        << "f(2,2)=" << ackermann(2,2) << " "
        << "f(3,2)=" << ackermann(3,2) << endl;
   cout << "f(1,3)=" << ackermann(1,3) << " "
        << "f(2,3)=" << ackermann(2,3) << " "
        << "f(3,3)=" << ackermann(3,3) << endl;
   return 0;
}
Example. The logistic map f(x) = 4x(1 - x) generates the sequence

x_{t+1} = f(x_t) = 4 x_t (1 - x_t),   t = 0, 1, 2, ...

where x0 ∈ [0,1] is the initial value. Thus we can implement the function to compute
x_t recursively. Of course it makes more sense to implement the function using
iteration [164].
// logistic.cpp
#include <iostream>
using namespace std;

double logistic(int t, double x0)
{
   if(t == 0) return x0;
   double x = logistic(t - 1, x0);
   return 4.0*x*(1.0 - x);
}

int main(void)
{
   cout << "x100 = " << logistic(100, 0.3899)
        << " when x0=0.3899" << endl;
   cout << "x500 = " << logistic(500, 0.5)
        << " when x0=0.5" << endl;
   cout << "x10000 = " << logistic(10000, 0.89881)
        << " when x0=0.89881" << endl;
   return 0;
}
Example. Consider the polynomial

p5(x) = a5 x^5 + a4 x^4 + a3 x^3 + a2 x^2 + a1 x + a0

where x, a5, a4, a3, a2, a1 and a0 are given numbers. Finding p5 would involve
5 + 4 + 3 + 2 + 1 = 15 multiplications and 5 additions. Rewriting this in the form
(Horner's rule)

p5(x) = ((((a5 x + a4) x + a3) x + a2) x + a1) x + a0

reduces the number of multiplications to five and we still have five additions. In
general, let

pn(x) = an x^n + a_{n-1} x^{n-1} + ... + a1 x + a0

which can be rewritten as

pn(x) = (...((an x + a_{n-1}) x + a_{n-2}) x + ... + a1) x + a0.

Then we have n multiplications and n additions. The next program shows a non-
recursive implementation of Horner's rule in C++.
// horner1.cpp
#include <iostream>

void main(void)
{
  const double a[5] = { 1.0,0.5,0.0,-18.0,3.0 };

// horner2.cpp
#include <iostream>

void main(void)
{
  const double a[5] = { 1.0,0.5,0.0,-18.0,3.0 };
p(x) = 3x^4-18x^3+x/2+1
p(0.0) = 1
p(-1.0) = 21.5
p(5.0) = -371.5
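Since the two listings are truncated in this reproduction, the following sketch (an assumed implementation which reproduces the printed output; the name horner3.cpp and the coefficient convention a[i] for x^i are illustrative) evaluates the same polynomial by a recursive form of Horner's rule:

// horner3.cpp -- sketch: recursive evaluation of Horner's rule
// p(x) = 3x^4 - 18x^3 + x/2 + 1, coefficients a[0..n] with a[i] for x^i
#include <iostream>
using namespace std;

double horner(const double a[], int n, double x)
{
   if(n == 0) return a[0];                   // constant polynomial
   return horner(a + 1, n - 1, x)*x + a[0];  // p(x) = q(x)*x + a0
}

int main(void)
{
   const double a[5] = { 1.0, 0.5, 0.0, -18.0, 3.0 };
   cout << "p(0.0) = "  << horner(a, 4, 0.0)  << endl;   //  1
   cout << "p(-1.0) = " << horner(a, 4, -1.0) << endl;   //  21.5
   cout << "p(5.0) = "  << horner(a, 4, 5.0)  << endl;   // -371.5
   return 0;
}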
Example. The following Java program is used to construct a graphic pattern called
a Hilbert curve. Each curve Hi consists of four half-sized copies of H_{i-1} with a
different orientation. The Hilbert curve is the limit of this construction process,
i.e. H∞. Thus we can implement the methods A(), B(), C() and D() to draw the
four copies for each step in the construction of the Hilbert curve using recursion.
Lines are drawn to connect the four copies. For example, the first three steps in
constructing the Hilbert curve are given below.
// Hilbert.java
import java.awt.*;
import java.awt.event.*;

public HilbertCurve() { n = 5; }

public void A()
{
  if(n > 0)
  {
    Graphics g = getGraphics(); n--;
    D(); g.drawLine(x, y, x-h, y); x -= h;
    A(); g.drawLine(x, y, x, y-h); y -= h;
    A(); g.drawLine(x, y, x+h, y); x += h;
    B(); n++;
  }
}

public void B()
{
  if(n > 0)
  {
    Graphics g = getGraphics(); n--;
    C(); g.drawLine(x, y, x, y+h); y += h;
    B(); g.drawLine(x, y, x+h, y); x += h;
8.3 Mutual Recursion

The Jacobi elliptic functions can be defined as the inverse of the elliptic integral of
the first kind [53]. Thus, if we write

x(φ, k) = ∫_0^φ ds/√(1 - k^2 sin^2(s))                                  (8.1)

then the Jacobi elliptic functions sn, cn and dn are given by

sn(x, k) = sin(φ),  cn(x, k) = cos(φ),  dn(x, k) = √(1 - k^2 sin^2(φ)). (8.2)

For k = 0 we obtain

sn(x, 0) = sin(x),  cn(x, 0) = cos(x),  dn(x, 0) = 1                    (8.3)

and for k = 1

sn(x, 1) = tanh(x),  cn(x, 1) = dn(x, 1) = 2/(e^x + e^{-x}).            (8.4)

The functions obey the double angle identities

sn(2x, k) = 2 sn(x,k) cn(x,k) dn(x,k) / (1 - k^2 sn^4(x,k))             (8.5)
cn(2x, k) = (1 - 2 sn^2(x,k) + k^2 sn^4(x,k)) / (1 - k^2 sn^4(x,k))     (8.6)
dn(2x, k) = (1 - 2 k^2 sn^2(x,k) + k^2 sn^4(x,k)) / (1 - k^2 sn^4(x,k)) (8.7)
The expansions of the Jacobi elliptic functions in powers of x up to order 3 are given
by

sn(x, k) = x - (1 + k^2) x^3/3! + ...       (8.8)
cn(x, k) = 1 - x^2/2! + ...                 (8.9)
dn(x, k) = 1 - k^2 x^2/2! + ...             (8.10)
We can now use the identities (8.5)-(8.7) and the expansions (8.8)-(8.10) to im-
plement the Jacobi elliptic functions using one recursive call. The recursive call in
scdn uses half of the provided parameter x. In other words the absolute value of
the parameter passed in the recursive call is always smaller (by a factor of 2). This
guarantees that for fixed ε > 0 the parameter |x| will satisfy |x| < ε after a finite
number of recursive calls. At this point a result is returned immediately using the
polynomial approximations (8.8)-(8.10). This ensures that the algorithm will complete
successfully. The recursive call is possible due to the identities for the sn, cn and dn
functions given in (8.5)-(8.7). Since the identities depend on all three functions sn,
cn and dn we can calculate all three at each step instead of repeating calculations
for each of sn, cn and dn [81]. Lastly some optimization was done to reduce the
number of multiplications used in the double angle formulas. We also use the fact
that the denominator of all three identities is the same.
The advantage of this approach is that all three Jacobi elliptic functions are found
with one function call. Furthermore the cases k = 0 and k = 1 include the sine,
cosine, tanh and sech functions. Obviously, for these special cases faster routines are
available. Elliptic functions belong to the class of doubly periodic functions in which
2K plays a similar role to π in the theory of circular functions, where K = F(π/2, k)
is the complete elliptic integral of the first kind. We have the identities

sn(x + 2K, k) = -sn(x, k),  cn(x + 2K, k) = -cn(x, k),  dn(x + 2K, k) = dn(x, k).

To reduce the argument of the Jacobi elliptic functions we can also apply these
identities.
The recursion method described above can be implemented using C++ as follows.
The arguments to the function scdn are

• x, the argument of the functions.
• k2, the square of the modulus k.
• eps, the upper bound on the argument x for application of the Taylor expan-
  sion approximation.
• s, c and d, references used to return the values sn(x,k), cn(x,k) and dn(x,k).

Using the implementation we calculate the sine, cosine, the constant function 1,
the hyperbolic tangent and the hyperbolic secant for the value 3.14159.
// jacobi.cpp
#include <iostream>
#include <cmath>
using namespace std;

// forward declaration
void scdn(double,double,double,double&,double&,double&);

int main(void)
{
  double x, k, k2, eps;
  x = 3.14159;
  eps = 0.01;
  double res1, res2, res3;

  // sin, cos, 1 of x
  k = 0.0;  k2 = k*k;
  scdn(x,k2,eps,res1,res2,res3);
  cout << "sin(x) = " << res1 << endl;
  cout << "cos(x) = " << res2 << endl;
  cout << "1(x)   = " << res3 << endl;

  // tanh, sech, sech of x
  k = 1.0;  k2 = k*k;
  scdn(x,k2,eps,res1,res2,res3);
  cout << "tanh(x) = " << res1 << endl;
  cout << "sech(x) = " << res2 << endl;
  cout << "sech(x) = " << res3 << endl;
  return 0;
}

// recursive halving as described in the text
void scdn(double x,double k2,double eps,double& s,double& c,double& d)
{
  if(fabs(x) < eps)   // Taylor expansions (8.8)-(8.10)
  {
    s = x - (1.0 + k2)*x*x*x/6.0;
    c = 1.0 - x*x/2.0;
    d = 1.0 - k2*x*x/2.0;
  }
  else                // double angle identities (8.5)-(8.7)
  {
    double sh, ch, dh;
    scdn(x/2.0,k2,eps,sh,ch,dh);
    double sh2 = sh*sh;          // sn^2(x/2)
    double sh4 = k2*sh2*sh2;     // k^2 sn^4(x/2)
    double denom = 1.0 - sh4;    // common denominator
    s = 2.0*sh*ch*dh/denom;
    c = (1.0 - 2.0*sh2 + sh4)/denom;
    d = (1.0 - 2.0*k2*sh2 + sh4)/denom;
  }
}
8.4 Wavelets and Recursion

Wavelets are built from solutions of the dilation equation

φ(x) = Σ_{k=0}^{M-1} c_k φ(2x - k)

where the range of the summation is determined by the specified number of nonzero
coefficients M. The number of the coefficients is not arbitrary and is determined
by constraints of orthogonality and normalization. Owing to the periodic boundary
condition we have

c_k := c_{k+nM}

where n ∈ N. Generally, the area under the wavelet curve over all space should be
unity, i.e.

∫_R φ(x) dx = 1.

It follows that

Σ_{k=0}^{M-1} c_k = 2.

In the Hilbert space L^2(R), the function φ is orthogonal to its translations; i.e.

∫_R φ(x) φ(x - k) dx = 0,   k ≠ 0.

What is desired is a function ψ which is also orthogonal to its dilations, or scales,
i.e.,

∫_R ψ(x) ψ(2x - k) dx = 0.
Such a function ψ does exist and is given by (the so-called associated wavelet func-
tion)

ψ(x) = Σ_k (-1)^k c_{1-k} φ(2x - k).

The orthogonality conditions on φ can be written as

Σ_k c_k c_{k-2m} = 2 δ_{0m}

which means that the above sum is zero for all m not equal to zero, and that the
sum of the squares of all coefficients is two. Another equation which can be derived
from the above conditions is

Σ_k (-1)^k c_{1-k} c_{k-2m} = 0.
A way to solve for φ is to construct a matrix of coefficient values. This is a square
M × M matrix where M is the number of nonzero coefficients. The matrix is
designated L with entries

L_{ij} = c_{2i-j}.

This matrix has an eigenvalue equal to 1, and its corresponding (normalized) eigen-
vector contains, as its components, the values of the function φ at integer values of
x. Once these values are known, all other values of the function φ can be generated
by applying the recursion equation to get values at half-integer x, quarter-integer x,
and so on down to the desired dilation. This determines the accuracy of the function
approximation.
An example for ψ is the Haar function

ψ(x) = 1 for 0 ≤ x < 1/2,   ψ(x) = -1 for 1/2 ≤ x < 1,   ψ(x) = 0 otherwise

with

φ(x) = 1 for 0 ≤ x < 1,   φ(x) = 0 otherwise.

The functions

ψ_{m,n}(x) := 2^{-m/2} ψ(2^{-m} x - n),   m, n ∈ Z

form a basis in the Hilbert space L^2(R).
Wavelet        c0          c1          c2          c3          c4          c5
Haar           1.0         1.0
Daubechies-4   (1+√3)/4    (3+√3)/4    (3-√3)/4    (1-√3)/4
Daubechies-6   0.332671    0.806891    0.459877    -0.135011   -0.085441   0.035226
The wavelet transform operates on a finite set of N input data, where N is a power
of two; this value will be referred to as the input block size. These data are passed
through two convolution functions, each of which creates an output stream that is
half the length of the original input. These convolution functions are filters; one
half of the output is produced by the "low-pass" filter

a_i = (1/2) Σ_{j=0}^{N-1} c_{(2i-j+1) mod N} f_j,   i = 0, 1, ..., N/2 - 1

and the other half by the "high-pass" filter

b_i = (1/2) Σ_{j=0}^{N-1} (-1)^j c_{(j-2i) mod N} f_j,   i = 0, 1, ..., N/2 - 1

where N is the input block size, c_j are the coefficients, f is the input function, and
a and b are the output functions. In the case of the lattice filter, the low- and
high-pass outputs are usually referred to as the odd and even outputs, respectively.
In many situations, the odd or low-pass output contains most of the information
content of the original input signal. The even, or high-pass output contains the
difference between the true input and the value of the reconstructed input if it were
to be reconstructed from only the information given in the odd output. In general,
higher order wavelets (i.e. those with more nonzero coefficients) tend to put more
information into the odd output, and less into the even output. If the average
amplitude of the even output is low enough, then the even half of the signal may
be discarded without greatly affecting the quality of the reconstructed signal. An
important step in wavelet-based data compression is finding wavelet functions which
cause the even terms to be nearly zero.
The Haar wavelet represents a simple interpolation scheme. After passing these data
through the filter functions, the output of the low-pass filter consists of the average
of every two samples, and the output of the high-pass filter consists of the difference
of every two samples. The high-pass filter contains less information than the low
pass output. If the signal is reconstructed by an inverse low-pass filter of the form

f^L_j = Σ_{i=0}^{N/2-1} c_{(2i-j+1) mod N} a_i,   j = 0, 1, ..., N - 1

then the result is a duplication of each entry from the low-pass filter output. This is
a wavelet reconstruction with 2x data compression. Since the perfect reconstruction
is a sum of the inverse low-pass and inverse high-pass filters, the output of the inverse
high-pass filter can be calculated. This is the result of the inverse high-pass filter
function

f^H = f - f^L

where each f is the vector with elements f_i. Using other coefficients and other orders
of wavelets yields similar results, except that the outputs are not exactly averages
and differences, as in the case using the Haar coefficients.
#include <iostream>
#include <cmath>
using namespace std;

int main()
{
  const double pi = 3.14159;
  int n = 16;   // n must be a power of 2

  // Daubechies-4 coefficients c[0..3], remaining entries zero
  double* c = new double[n];
  for(int m=0; m < n; m++) c[m] = 0.0;
  c[0] = (1.0+sqrt(3.0))/4.0; c[1] = (3.0+sqrt(3.0))/4.0;
  c[2] = (3.0-sqrt(3.0))/4.0; c[3] = (1.0-sqrt(3.0))/4.0;

  double* f = new double[n];
  // input signal
  int k;
  for(k=0; k < n; k++)
    f[k] = sin(2.0*pi*(k+1)/n);
  // low-pass filter output a (length n/2)
  double* a = new double[n/2];
  int i, j;
  for(i=0; i < n/2; i++)
  {
    a[i] = 0.0;
    for(j=0; j < n; j++)
    {
      if(2*i-j+1 < 0) a[i] += c[2*i-j+1+n]*f[j];
      else a[i] += c[2*i-j+1]*f[j];
    }
    a[i] = 0.5*a[i];
  }
  // inverse
  double* fL = new double[n];
  double* fH = new double[n];
8.5 Primitive Recursive Functions

To begin, we take as variables the letters n, x1, x2, .... We write x for (x1, ..., xk).
Next we list the basic, incontrovertibly computable functions which we use as build-
ing blocks for all others.
These functions are called initial functions. We sometimes call the projections pick-
out functions, and P1 the identity function, written id(x) = x.
Next, we specify the ways we allow new functions to be defined from ones we already
have.
For a function of one variable, if d ∈ N0 and h are already given, then f is defined
by primitive recursion via

f(0) = d
f(n + 1) = h(f(n), n).

For functions of two or more variables, if g and h are already defined then f is given
by primitive recursion on h with basis g as

f(0, x) = g(x)
f(n + 1, x) = h(f(n, x), n, x).
The reason we allow both n and x as well as f(n, x) to appear in h is that we may
wish to keep track of both the stage we are at and the input.
The primitive recursive functions are exactly those which are either an initial func-
tion or can be obtained from the initial functions by a finite number of applications
of the basic operations.
product(0, x2) = 0
product(x1 + 1, x2) = sum(product(x1, x2), x2)
Example. The predecessor of x is primitive recursive, defined by

pred(0) = 0
pred(x + 1) = x
Example. The truncated difference

minus(x1, x2) = x1 - x2 if x1 ≥ x2, and 0 otherwise

is primitive recursive:

minus(x1, 0) = x1
minus(x1, x2 + 1) = pred(minus(x1, x2))
Example. We can now describe the mod function

   mod(n, m) = { n              if n < m
               { mod(n - m, m)  otherwise

   mod(0, m) = 0
   mod(n + 1, m) = k(mod(n, m), n, m)
3. C is closed under composition: if f comes from g and h1, ..., hr by composition
(f(x) = g(h1(x), ..., hr(x))) and g, h1, ..., hr ∈ C, then f ∈ C.
Theorem. The set of all primitive recursive functions is primitive recursively closed.
[48, 63, 97]
Theorem. Any primitive recursively closed set contains every primitive recursive
function. [48, 63, 97]
• If f : N_0^(k+1) → N_0 is μ-recursive then

   μf : N_0^k → N_0

defined by

   (μf)(y) := min{ x ∈ N_0 : f(x, y) = 0 }

is μ-recursive. In other words μ acts as a minimalization operator in the sense
that it maps y to the minimum x such that f(x, y) = 0.
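A direct, hedged sketch of the minimalization operator: the search below returns
the least x with f(x, y) = 0 and, like the mathematical operator, fails to terminate
when no such x exists. The test function f is a made-up example.

#include <iostream>

// f(x, y) = |x*x - y|, which is zero exactly when x*x = y
unsigned long f(unsigned long x, unsigned long y)
{
   return x*x >= y ? x*x - y : y - x*x;
}

// mu searches upward from 0 for the least zero of g(., y)
unsigned long mu(unsigned long (*g)(unsigned long, unsigned long),
                 unsigned long y)
{
   unsigned long x = 0;
   while(g(x, y) != 0) x++;
   return x;
}

int main()
{
   std::cout << mu(f, 49) << std::endl;   // prints 7, since 7*7 = 49
   return 0;
}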
8.6 Backtracking

A common approach to finding a solution, when no simple solution algorithm is
available, is trial and error. Suppose a configuration is built up by a number of
well defined steps and then tested to see if it is a solution. If it is not a solution we
return to an earlier step and try a different option. Backtracking is the technique of
returning to an earlier stage in a solution process to make a different choice in the
attempt to find solutions.
Example. 8-Queens problem. A chessboard is 8 columns wide and 8 rows high. The
8 queens problem requires us to place 8 queens on the chessboard so that no queen
is attacking another. A queen attacks another if it is on the same row, column or
diagonal as the other queen. An example solution is
Q#######
#####Q##
#######Q
##Q#####
######Q#
###Q####
#Q######
####Q###
The recursive solution is to place a queen on a position which is not attacked, row
by row. If there is no position available for a queen, the algorithm returns to the
previous row and moves the queen to the next position which is not attacked.
Checking every possible placement of the 8 queens would include configurations which
are obviously incorrect and would take a long time. This algorithm therefore uses a
technique called pruning to reduce the number of configurations to test. We can form a tree
according to the square we place each queen in. The root of the tree corresponds to
an empty board. The branches from the root correspond to each possible placement
of a queen. By rejecting certain options earlier, the corresponding branches and
entire sub-trees are removed from consideration.
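Before the program listing, here is a hedged, self-contained sketch of the
backtracking search itself. The helper names safe and solve are illustrative;
the book's program uses its own queens function.

#include <iostream>

const char SPACE = '#';

// check whether a queen at (row, col) is attacked by queens in earlier rows
bool safe(char board[8][8], int row, int col)
{
   for(int r = 0; r < row; r++)
   {
      if(board[r][col] == 'Q') return false;                  // same column
      int d = row - r;
      if(col-d >= 0 && board[r][col-d] == 'Q') return false;  // left diagonal
      if(col+d < 8  && board[r][col+d] == 'Q') return false;  // right diagonal
   }
   return true;
}

// place one queen per row; backtrack when a row has no safe square
bool solve(char board[8][8], int row)
{
   if(row == 8) return true;              // all queens placed
   for(int col = 0; col < 8; col++)
      if(safe(board, row, col))
      {
         board[row][col] = 'Q';
         if(solve(board, row+1)) return true;
         board[row][col] = SPACE;         // undo and try the next column
      }
   return false;                          // forces backtracking in the caller
}

int main()
{
   char board[8][8];
   for(int i = 0; i < 8; i++)
      for(int j = 0; j < 8; j++) board[i][j] = SPACE;
   if(solve(board, 0))
      for(int i = 0; i < 8; i++)
      {
         for(int j = 0; j < 8; j++) std::cout << board[i][j];
         std::cout << std::endl;
      }
   return 0;
}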
// queens.cpp
#include <iostream>
   }
}

void main(void)
{
   char board[8][8];
   int i,j;
   for(i=0; i<8; i++)
      for(j=0; j<8; j++)
         board[i][j] = SPACE;
   queens(board,0);
}
Q#######
#####Q##
#######Q
##Q#####
######Q#
###Q####
#Q######
####Q###
8.7 Stacks and Recursion Mechanisms

The CALL routine pushes the specified registers and return address onto the stack,
and then transfers control to the specified function.
RETURN has two arguments.
The RETURN routine restores the specified registers from the stack and then returns
control to the return address on the stack. In shortened form
RETURN (registers)
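As an illustration of this mechanism (a sketch, not the book's pseudocode), the
following program emulates a recursive factorial by managing an explicit stack of
pending arguments: pushing corresponds to CALL, popping to RETURN.

#include <iostream>
#include <stack>

unsigned long factorial(unsigned long n)
{
   std::stack<unsigned long> frames;   // plays the role of the call stack
   // "CALL": push an argument for each pending invocation
   while(n > 1) { frames.push(n); n--; }
   unsigned long result = 1;           // base case: 0! = 1! = 1
   // "RETURN": pop each frame and combine it with the partial result
   while(!frames.empty()) { result *= frames.top(); frames.pop(); }
   return result;
}

int main()
{
   std::cout << factorial(10) << std::endl;   // 3628800
   return 0;
}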
The Fibonacci sequence is defined by the recurrence

   F(0) = F(1) = 1,   F(n+2) = F(n+1) + F(n),   n = 0, 1, 2, ...
// fibonacci.cpp
#include <iostream>

void main(void)
{
   int i;
   unsigned long F0 = 1, F1 = 1;
   unsigned long temp;
   for(i=0; i<10; i++)
   {
      cout << F0 << " ";
      temp = F1;
      F1 = F0 + F1;
      F0 = temp;
   }
   cout << endl;
}
#include <iostream>

char A,B,C;
unsigned long n;

void hanoi()
{
   if(n==1) cout << A << " -> " << B << endl;
   else
   {
      n--; C = B^C; B = B^C; C = B^C;   // swap B and C
      hanoi();
      C = B^C; B = B^C; C = B^C;        // swap B and C back
      cout << A << " -> " << B << endl;
      C = A^C; A = A^C; C = A^C;        // swap A and C
      hanoi();
      n++; C = A^C; A = A^C; C = A^C;   // swap A and C back
   }
}

void main(void)
{
   A = 'A'; B = 'B'; C = 'C'; n = 1;
   cout << "Tower of Hanoi with 1 disc:" << endl;
   hanoi();
   A = 'A'; B = 'B'; C = 'C'; n = 2;
   cout << "Tower of Hanoi with 2 discs:" << endl;
   hanoi();
   A = 'A'; B = 'B'; C = 'C'; n = 3;
   cout << "Tower of Hanoi with 3 discs:" << endl;
   hanoi();
}
Chapter 9
Abstract Data Types
9.1 Introduction
Programming languages such as C++ and Java have built-in data types (so-called
basic data types or primitive data types) such as integers that represent information
and have operations that can be performed on them (such as multiplication and
addition). For example, the built-in basic data types in C++ are short, int, long,
float, double and char.
An abstract data type (ADT) consists of data and the operations which can be per-
formed on it. Generally the data is represented with standard data types of the
language in which it is implemented but can also include other abstract data types.
The operations defined on the ADT provide access to the information and manipu-
lation of the data without knowing the implementation of the ADT. The abstract
data type is implemented using constructors, data fields and methods (functions).
Information hiding is when ADT data is inaccessible (no operation can retrieve the
data). Encapsulation refers to the hiding of inner details (such as implementation).
In C++ the concepts of public, private and protected data fields and methods
are important in the implementation of an ADT. Public members of an ADT are
always accessible. Private members are only accessible by the ADT itself, and
protected members are only accessible by the ADT and any derived ADTs. A derived
ADT may override the accessibility of members by forcing all inherited members to
a specified level if they are more accessible. A minimal sketch of these access
levels is given below.
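In the following sketch the class and member names are invented for illustration.

#include <iostream>

class Counter
{
private:
   int count;                       // hidden representation (information hiding)
protected:
   void set(int c) { count = c; }   // available to derived ADTs only
public:
   Counter() : count(0) {}
   void increment() { count++; }    // the public interface
   int value() const { return count; }
};

int main()
{
   Counter c;
   c.increment(); c.increment();
   std::cout << c.value() << std::endl;   // prints 2
   return 0;
}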
For example the Standard Template Library [2] in C++ introduces many ADTs
such as vector, list, stack, queue and set. Standard C++ now includes the abstract
data type string. SymbolicC++ [169] includes the template classes Rational,
Complex, Quaternion, Vector, Matrix, Polynomial and Sum. Operations such as
addition and multiplication, determinant, trace and inverse of matrices are included
in the Matrix class. An instance of Matrix could then be used in the same way
that integers are used, without knowing the internal differences.
9.2 Linked List

A linked list is useful in the implementations of dynamic arrays, stacks, strings and
sets. The linked list is the basic ADT in some languages, for example LISP. LISP
stands for List Processing. All the program instructions in LISP operate on lists.
Linked lists are most useful in environments with dynamic memory allocation. With
dynamic memory allocation dynamic arrays can grow and shrink with less cost than
in a static memory allocation environment. Linked lists are also useful to manage
dynamic memory environments. Diagrammatically a linear linked list can be viewed
as a chain of data elements, each linked to the next.

The list consists of data elements. Each data element has an associated link to the
next item in the list. The last item in the list has no link. In C++ we can implement
this by using a null pointer. The first element of the list is called the head, the last
element is called the tail.
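The building block is a node holding one data element and a link to the next node,
as in the following self-contained sketch (the test program in main is illustrative):

#include <iostream>

template <class T>
struct listitem
{
   T data;              // the data element
   listitem<T>* next;   // link to the next item; a null link marks the end
};

int main()
{
   // build the list 1 -> 2 -> 3 by hand
   listitem<int> c = {3, 0}, b = {2, &c}, a = {1, &b};
   for(listitem<int>* p = &a; p != 0; p = p->next)
      std::cout << p->data << " ";
   std::cout << std::endl;   // prints 1 2 3
   return 0;
}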
Extensions to the ADT include double-linked lists, where links exist to the next data
element and the previous one, allowing easier access to data earlier in the list, and sorted
linked lists. We use a template class definition so that the linked list can store any
kind of data without having to change or reimplement any functionality.

The following class is a C++ implementation of the ADT list. It has methods
for creating and destroying a list, copying one list to another (assignment
operator), adding items to and removing items from the list (additem, insertitem,
removeitem), merging lists (operators for addition), iteration (first, next, last,
position, data) and indexing elements (operator[]).
// list.h
#ifndef LIST_HEADER
#define LIST_HEADER
{
   head = new listitem<T>;
   head->data = li->data;
   head->next = NULL;
   current = head;
   li = li->next;
}
while(li != NULL)
{
   current->next = new listitem<T>;
   current = current->next;
   current->data = li->data;
   current->next = NULL;
   li = li->next;
}
current = head;
}
size = l.size;
while(head != NULL)
{
   current = head;
   head = head->next;
   delete current;
}
head = current = NULL;
if(li != NULL)
{
   head = new listitem<T>;
   head->data = li->data;
   head->next = NULL;
   current = head;
   li = li->next;
}
while(li != NULL)
{
   current->next = new listitem<T>;
   current = current->next;
   current->data = li->data;
   current->next = NULL;
   li = li->next;
}
current = head;
return *this;
}
current = head;
}
else
{
   li = current->next;
   current->next = new listitem<T>;
   current->next->data = t;
   current->next->next = li;
}
size++;
}
current = li;
size--;
}
}
return T();
}
#endif
Now the ADT is used in an example program to illustrate the available operations.
// listeg.cpp
#include <iostream>
#include "list.h"

void main(void)
{
   int i;
   list<int> l1;
   l1.additem(1);
   l1.additem(2);
   l1.additem(3);
   l1.additem(5);
   l1.additem(8);
   list<int> l2(l1), l3;
   l3 = l2;
   l1.next();
   l1.insertitem(13); l1.insertitem(21,5);
   l1.first();
   cout << "l1: ";
   do cout << l1.data() << " ";
   while(l1.next());
   cout << endl;
   cout << "l2: ";
   do cout << l2.data() << " ";
   while(l2.next());
   cout << endl;
   cout << "l3: ";
   do cout << l3.data() << " ";
   while(l3.next());
   cout << endl;
   list<int> l4 = l1+l2;
   cout << "l4: ";
   do cout << l4.data() << " ";
   while(l4.next());
   cout << endl;
   l4 += l3;
   l4.first();
   cout << "l4: ";
   do cout << l4.data() << " ";
   while(l4.next());
   cout << endl;
   l4.first();
   cout << "The first item of l4 is " << l4.data() << endl;
   l4.last();
   cout << "The last item of l4 is " << l4.data() << endl;
   cout << "The fourth item of l4 is " << l4[3] << endl;
   l4.position(3);
   l4.removeitem();
   l4.removeitem(7);
   l4.first();
   cout << "l4: ";
   do
      cout << l4.data() << " ";
   while(l4.next());
   cout << endl;
   cout << "Size of l1 is " << l1.getsize() << endl;
   cout << "Size of l2 is " << l2.getsize() << endl;
   cout << "Size of l3 is " << l3.getsize() << endl;
   cout << "Size of l4 is " << l4.getsize() << endl;
}

l1: 1 2 13 3 5 8 21
l2: 1 2 3 5 8
l3: 1 2 3 5 8
l4: 1 2 13 3 5 8 21 1 2 3 5 8
l4: 1 2 13 3 5 8 21 1 2 3 5 8 1 2 3 5 8
The first item of l4 is 1
The last item of l4 is 8
The fourth item of l4 is 3
l4: 1 2 13 5 8 21 1 3 5 8 1 2 3 5 8
Size of l1 is 7
Size of l2 is 5
Size of l3 is 5
Size of l4 is 15
The linked list can also be viewed as a recursive structure with the first element
followed by a linked list. This view can make the implementation of many methods
easier.
Suppose an item is to be inserted into the sorted list. The item either comes before
the head of the list (which can be easily implemented) or after the head in which
case the item actually has to be inserted in a list with the second element of the list
as the head. Similarly to delete an item either the head must be removed or the
item must be deleted from a list with the second element of the list as the head.
In each of the above cases the rest of the list is a list itself and the method can be
applied recursively. Usually the simplest case for each recursive method is for the
empty list.
Special care must be taken when destroying the list. Simply deleting the head of
the list will create a memory leak. The remaining list must be destroyed before the
head can be destroyed. In the implementation the head data is part of the class
data and so the memory leak is avoided.
The data members of the class are few since most of the data management is done
by the recursive structure. The data member head stores the data for the node in
the linked list. Since a linked list can be empty, a node with the data member empty
set to one represents an empty list. A pointer tail provides access to the rest of
the list.
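Based on this description, a minimal self-contained sketch of a recursive list might
look as follows. The method names Insert, Delete, Empty, Head and Tail follow the
example program later in this section; the bodies are illustrative assumptions, not
the book's rlist.h.

#include <iostream>

template <class T>
class RList
{
private:
   T head;           // data stored in this node
   int empty;        // 1 represents the empty list
   RList<T>* tail;   // the rest of the list, itself a list
public:
   RList() : empty(1), tail(0) {}
   ~RList() { delete tail; }   // destroy the rest before this node
   int Empty() const { return empty; }
   T Head() const { return head; }
   RList<T>* Tail() const { return tail; }
   void Insert(const T& t)     // insert at the front of the list
   {
      if(empty) { head = t; empty = 0; if(tail == 0) tail = new RList<T>; }
      else
      {
         RList<T>* r = new RList<T>;
         r->head = head; r->empty = 0; r->tail = tail;
         head = t; tail = r;
      }
   }
   void Delete(const T& t)     // delete the first occurrence of t
   {
      if(empty) return;
      if(head == t)
      {
         if(tail->empty) empty = 1;
         else
         {
            head = tail->head;
            RList<T>* old = tail; tail = old->tail;
            old->tail = 0; delete old;
         }
      }
      else tail->Delete(t);    // recurse into the rest of the list
   }
};

int main()
{
   RList<int> L;
   for(int i = 1; i <= 8; i++) L.Insert(i);
   L.Delete(5);
   for(RList<int>* p = &L; !p->Empty(); p = p->Tail())
      std::cout << p->Head() << " ";
   std::cout << std::endl;     // prints 8 7 6 4 3 2 1
   return 0;
}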
// rlist.h
#ifndef RLIST_HEADER
#define RLIST_HEADER
#include <assert.h>
#endif
Now the ADT is used in an example program to illustrate the available operations.
// rlisteg.cpp
#include <iostream>
#include "rlist.h"

int main(void)
{
   RList<int> L;
   int i;
   for(i=1; i<=8; i++)
      L.Insert(i);
   RList<int>* LX = &L;
   cout << "The initial list is: " << endl;
   RList<int>* R = L.Reverse(&L);
   RList<int>* LP = R;
   L.Delete(5);
   LP = &L;
   while(!LP -> Empty())
   {
      cout << LP -> Head() << ' ';
      LP = LP -> Tail();
   }
   cout << endl;
   return 0;
}
8 7 6 5 4 3 2 1
9.3 Stack
The stack is a LIFO (last in, first out) structure. The last value stored (and not yet
retrieved) is the only value that can be retrieved. The traditional analogy is a stack
of plates where only the top plate can be removed, and a plate can only be placed
on top of the stack. Due to the dynamic nature of a stack the implementation is
based on the linked list.

Since we have already created a list ADT which can grow or shrink in size as needed,
we can reduce the amount of work needed to implement the stack ADT. The list
enables access to any element in the structure; the stack can be viewed as a restricted
list with access to only the tail. The operation of putting data on the stack is referred
to as "pushing" data onto the stack, and the operation of retrieving data from the
stack is referred to as "popping" data off the stack. The stack is an important
structure for implementing recursion.
Access to stack
K
C
A
T
S
We implement the stack as a class in C++. The class has methods for creating
a stack using an empty list (the constructor), copying one stack to another (the
assignment operator), pushing data onto the stack (push which simply adds the data
to the end of the list) and popping data off the stack (pop which simply removes the
last element of the list). No destructor is needed since the list destructor is called
automatically.
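As a self-contained illustration of the LIFO discipline (independent of the
list-based implementation in stack.h below), a direct linked sketch might look as
follows; all names are illustrative.

#include <iostream>

template <class T>
class Stack
{
   struct node { T data; node* next; };
   node* top_;    // the only accessible element
   int size_;
public:
   Stack() : top_(0), size_(0) {}
   ~Stack() { while(size_ > 0) pop(); }
   void push(const T& t)   // place a new plate on top
   {
      node* n = new node;
      n->data = t; n->next = top_;
      top_ = n; size_++;
   }
   T pop()                 // remove the top plate
   {
      node* n = top_;
      T t = n->data;
      top_ = n->next;
      delete n; size_--;
      return t;
   }
   int getsize() const { return size_; }
};

int main()
{
   Stack<int> s;
   for(int i = 1; i <= 3; i++) s.push(i);
   while(s.getsize() > 0) std::cout << s.pop() << " ";   // prints 3 2 1
   std::cout << std::endl;
   return 0;
}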
// stack.h
#ifndef STACK_HEADER
#define STACK_HEADER
#include "list.h"
using namespace std;
#endif
Now the ADT is used in an example program to illustrate the available operations.
// stackeg.cpp
#include <iostream>
#include "stack.h"

void main(void)
{
   int i;
   stack<int> s1;
   s1.push(1);
   s1.push(2);
   s1.push(3);
   s1.push(5);
   s1.push(7);
   s1.push(11);
   stack<int> s2(s1);
   stack<int> s3;
   s3 = s1;
   stack<int> s4;
   cout << "Size of s1 is " << s1.getsize() << endl;
   cout << "s1: ";
   while(s1.getsize()>0) { cout << (i=s1.pop()) << " "; s4.push(i); }
   cout << endl << "s2: ";
   while(s2.getsize()>0) cout << s2.pop() << " ";
   cout << endl << "s3: ";
   while(s3.getsize()>0) cout << s3.pop() << " ";
   cout << endl << "s4: ";
   while(s4.getsize()>0) cout << s4.pop() << " ";
   cout << endl;
}

Size of s1 is 6
s1: 11 7 5 3 2 1
s2: 11 7 5 3 2 1
s3: 11 7 5 3 2 1
s4: 1 2 3 5 7 11
9.4 Tree
A tree is a branching structure. It has a starting node called a root node. An n-ary
tree can have up to n branches from each node to other nodes. A binary tree is a
2-ary tree. Every node in a tree is the root of a subtree. A tree is noncyclic, in other
words there is only one path between any two nodes in a tree.

In general an n-ary tree has a search time O(log_n s), where s is the number of elements
in the tree. For a linear structure such as the linked list the search time is O(s).
Diagrammatically, a binary tree can be viewed as a root node with branches down
to the root nodes of its left and right subtrees.
We implement a binary tree as a class in C++. The class has methods for creating
a new binary tree, destroying a binary tree, copying one binary tree to another
(assignment operator), adding an item and removing an item from the tree (additem
and removeitem), determining if an item is present in the tree (find) and iterating
through the tree (first, last, next and previous).
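As a hedged illustration of the search principle, the following self-contained sketch
implements insertion and lookup in a plain binary search tree; the book's tree.h
is more general (it allows up to limit children per node), so this is not the same
implementation.

#include <iostream>

template <class T>
struct node
{
   T data;
   node* left;
   node* right;
   node(const T& t) : data(t), left(0), right(0) {}
};

template <class T>
void additem(node<T>*& tn, const T& t)
{
   if(tn == 0) { tn = new node<T>(t); return; }
   if(t < tn->data) additem(tn->left, t);   // smaller items to the left
   else additem(tn->right, t);              // larger or equal to the right
}

template <class T>
bool find(node<T>* tn, const T& t)
{
   if(tn == 0) return false;
   if(t == tn->data) return true;
   return (t < tn->data) ? find(tn->left, t) : find(tn->right, t);
}

int main()
{
   node<int>* root = 0;
   int v[] = {4, 1, 2, 7, 5};
   for(int i = 0; i < 5; i++) additem(root, v[i]);
   std::cout << find(root, 7) << " " << find(root, 3) << std::endl;   // 1 0
   return 0;
}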
// tree.h
#ifndef TREE_HEADER
#define TREE_HEADER
#include "list.h"
#include "stack.h"
int i = 0;
list<treenode<T>*> *l;
if(tn == NULL)
{
   tn = new treenode<T>;
   tn->data = t;
   return;
}
if(t < tn->data) l = &(tn->leftchildren);
else l = &(tn->rightchildren);
if(l->getsize() == 0)
{
   l->additem(new treenode<T>);
   (*l)[0]->data = t;
   return;
}
else
{
   while((i < l->getsize()) && ((*l)[i]->data < t)) i++;
   if((l->getsize() < limit-1) &&
      (tn->leftchildren.getsize()+tn->rightchildren.getsize() < limit))
   {
      l->insertitem(new treenode<T>,i);
      (*l)[i]->data = t;
   }
   else additem(t,(*l)[i]);
}
}
current = current->rightchildren->data();
}
}
Now the ADT is used in an example program to illustrate the available operations.
// treeeg.cpp
#include <iostream>
#include "tree.h"

void main(void)
{
   int i;
   Tree<int> t;
   t.insert(4); t.insert(1); t.insert(2); t.insert(7); t.insert(5);
   Tree<int> t2(t);
   Tree<int> t3;
   t3 = t;
   if(t2==t) cout << "t2==t" << endl;
   if(t3==t) cout << "t3==t" << endl;
   for(i=0; i<t.size(); i++)
      cout << "t[" << i << "] = " << t[i] << endl;
   cout << endl;
}

t2==t
t3==t
t[0] = 1
t[1] = 2
t[2] = 4
t[3] = 5
t[4] = 7
Chapter 10
Error Detection and Correction
10.1 Introduction
Due to external influences and the imperfection of physical devices, errors can occur
in data representation and data transmission. This chapter examines some methods
of limiting the effect of errors in data representation and transmission. Error control
coding should protect digital data against errors which occur during transmission
over a noisy communication channel or during storage in an unreliable memory.
The last decade has been characterized not only by an exceptional increase in data
transmission and storage requirements, but also by rapid developments in micro-
electronics providing us with both a need for, and the possibility to, implement
sophisticated algorithms for error control.
The data representation examined here is strings of bits (binary strings, binary
sequences) a_{n-1}a_{n-2}...a_0, as defined before. Therefore an error is a bit
flip, i.e. some a_i is replaced by its complement.
We discuss single bit error detection in the form of parity checks, Hamming codes
for single bit error correction and finally the noiseless coding theorem which de-
scribes the limitations of coding systems and the requirements on codes to reduce the
probability of error. Another commonly used error detection scheme, the weighted
checksum, is also discussed.
The result of the parity function is a single bit stored in an extra bit a_n; the bit is
stored or transmitted with the data. If an odd number of errors occur, the result of
the parity function over a_{n-1}a_{n-2}...a_0 will not concur with a_n. The parity of the
bit string must be calculated when the data is sent or stored, and when the data is
received or retrieved. The bit reserved for the parity information can take the values
0 or 1. To ensure the meaning of the bit is consistent we introduce the following
definitions.

The odd-parity function sets a_n such that a_n a_{n-1} ... a_0 has an odd number of 1s.
The even-parity function sets a_n such that a_n a_{n-1} ... a_0 has an even number of 1s.

a_n is called the parity bit. Either parity function can be used, but consistency must
be ensured so that results are meaningful.
Example. Consider the bit string 1101. P_odd(1101) = 0. The stored string is then
01101. Suppose an error occurs giving 01001; then P_odd(1001) = 1 and an error is
detected. Suppose an error occurs in the parity bit giving 11101. P_odd(1101) = 0
and once again an error is detected. If two errors occur, for example 11001, then
P_odd(1001) = 1 and the errors are not detected.
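A small sketch of the parity computation (illustrative, not one of the book's
listings): the bit string is taken as the low n bits of an unsigned integer.

#include <iostream>

// return the odd-parity bit for the n low-order bits of a
int oddParityBit(unsigned long a, int n)
{
   int ones = 0;
   for(int i = 0; i < n; i++)          // count the 1s in a
      if((a >> i) & 1) ones++;
   return (ones % 2 == 0) ? 1 : 0;     // make the total number of 1s odd
}

int main()
{
   // the example from the text: 1101 has three 1s, so the parity bit is 0
   std::cout << oddParityBit(0xD, 4) << std::endl;   // prints 0
   return 0;
}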
10.3 Hamming Codes
Definition. The Hamming distance d_H of two bit strings a_{n-1}a_{n-2}...a_0 and
b_{n-1}b_{n-2}...b_0 of the same length n is the number of positions that differ, formally

   d_H(a_{n-1}a_{n-2}...a_0, b_{n-1}b_{n-2}...b_0) := sum_{i=0}^{n-1} (a_i - b_i)^2.
• d_H(a, b) ≥ 0
• d_H(a, b) = 0 iff a = b
• d_H(a, b) = d_H(b, a)
• d_H(a, b) ≤ d_H(a, c) + d_H(c, b)

The first three properties are easy to see. The last property follows from the fact
that (a_i - b_i)^2 = |a_i - b_i| for a_i, b_i ∈ {0, 1}, and each term-wise distance
|a_i - b_i| satisfies the triangle inequality.
// hdist.cpp
#include <iostream.h>

void main(void)
{
   unsigned long x = 186;   // 10111010b
   unsigned long y = 117;   // 01110101b
   int dH = 0;
   // count the positions in which x and y differ
   for(unsigned long z = x^y; z != 0; z >>= 1)
      if(z & 1) dH++;
   cout << "Hamming distance = " << dH << endl;   // 6
}
The Hamming distance can be used as a tool for error correction. For a set
C ⊂ {0, 1}^n of allowable bit strings for data representation, we define the minimum
distance

   δ(C) := min_{a,b ∈ C, a ≠ b} d_H(a, b).

It is then possible to detect up to δ(C) - 1 errors in a bit string from C. The minimum
distance principle for error correction is to select c ∈ C for a bit string x ∈ {0, 1}^n
such that d_H(c, x) is a minimum.
Theorem. If the minimum distance principle for error correction is used and

   δ(C) ≥ 2e + 1

then up to e errors in a bit string from C can be corrected.

Proof. Let a_e be the bit string a ∈ C with up to e errors. Let b ∈ C and b ≠ a. Then

   d_H(a, a_e) + d_H(a_e, b) ≥ d_H(a, b) ≥ δ(C) ≥ 2e + 1.

Since d_H(a, a_e) ≤ e it follows that d_H(a_e, b) ≥ e + 1 > d_H(a_e, a), so the
minimum distance principle selects a. •

For δ(C) = 3 only one error can be corrected. C is called a code and the elements
of C are called code words.
Theorem. An upper bound for the number s of code words of length n which can
correct up to e errors, if the minimum distance principle is used, is given by

   s ≤ 2^n / sum_{i=0}^{e} (n choose i).

Proof. Since the code words can correct up to e errors we have d_H(a, b) ≥ 2e + 1 for
any two code words a and b. We consider the number of binary sequences of length n
within Hamming distance e of a code word: there are sum_{i=0}^{e} (n choose i) of
them, and these spheres around the code words are disjoint, so
s · sum_{i=0}^{e} (n choose i) ≤ 2^n. •

For e = 1 we find

   s ≤ 2^n / (1 + n).
A Hamming code is the best code that can detect and correct one error, in the sense
that it contains the most code words. Let H_r be an r x (2^r - 1) matrix with entries
h_{i,j} ∈ {0, 1}, no two columns the same and no zero columns. For example

   H_2 = ( 0 1 1 )
         ( 1 0 1 )

   H_3 = ( 0 0 0 1 1 1 1 )
         ( 0 1 1 0 0 1 1 )
         ( 1 0 1 0 1 0 1 )
and the code words of C_{H_r} are the bit strings x = x_1 x_2 ... x_{2^r - 1} with

   H_r x^T = 0   (modulo 2).

The Hamming code C_{H_r} has |C_{H_r}| = 2^(2^r - r - 1) code words. Since addition
is modulo 2, we find that C_{H_3} consists of the 16 code words

   0000000   0001111   0010110   0011001
   0100101   0101010   0110011   0111100
   1000011   1001100   1010101   1011010
   1100110   1101001   1110000   1111111
Example. Consider the bit string 0101011. We find

   H_3 (0 1 0 1 0 1 1)^T = (1 1 1)^T ≠ (0 0 0)^T.

So it is not a valid code word. Assuming at most one error, the code word 0101011
must have been 0101010 ∈ C_{H_3}.
In the previous example the result of the test was nonzero. The last row determines
the even parity of the bits in positions 1, 3, 5 and 7, where the first bit is numbered
as 1. The second row determines the even parity of bits 2, 3, 6 and 7, and the first
row that of bits 4, 5, 6 and 7. For the last row, the first bit (in binary) of all the
positions listed is 1. For the second row, the second bit of the positions listed is 1.
For the first row, the third bit of the positions listed is 1. Thus if a bit string fails
the test, the resulting bit string can be used to determine the position of the error.
This is possible because the columns of H_3 are numerically ascending.

Example. From the above example the test result was 111. The result indicates the
error is in the last position (position 7 = 111 in binary), giving the desired code
word 0101010.
For all a, b, c ∈ C

• a ⊕ 0 = a
• a ⊕ a = 0, therefore -a = a
Given a column vector a = (a_1 a_2 ... a_n)^T, and a t x n weight matrix W, the
column coded version of a is

   a_c = ( a  ) = ( I_n ) a.
         ( Wa )   ( W   )

Let

   H = ( W  -I_t ).

An encoded vector a_c containing valid data is guaranteed to satisfy the equation
H a_c = 0, which is seen as follows:

   H a_c = ( W  -I_t ) ( I_n ) a = (W I_n - I_t W) a = (W - W) a = 0.
                       ( W   )

Matrices can be encoded in a similar manner. Each data matrix A has a set of
column, row and full weighted checksum matrices A_c, A_r and A_f:

   A_c = ( A  ),   A_r = ( A   A W^T ),   A_f = ( A     A W^T   ).
         ( WA )                                 ( W A   W A W^T )
and LP(ai) = 1.
i=O
{ I l l Ill}
4'16'16'4'8'4 .
3 1 1 1 2 1
Es(A) - - iog 2 - - - log2 - - - iog2 -
4 4 8 8 16 16
2.375
It takes 3 bits to specify which message was received. The value 2.375 can be
interpreted as the average number of bits needed to communicate this information.
This can be achieved by assigning shorter codes to those messages of higher
probability and longer codes to those of lower probability.
Let, = If! denote the number of elements in r. The bounds of, are given by
1 2 L p(a) 2 LT N(Es(A)+8) = ,TN(Es(A)+8)
aEr aEr
and
1- E:S; LP(a) :s; LT N(Es(A)-8) = ,2- N(Es(AH).
aEr aEr
Thus we find
and each element of Γ can be encoded uniquely in a bit string of length N(E_S(A) +
δ). The other sequences are encoded as bit strings of length N(E_S(A) + δ) but
will not be correctly decoded. Since these sequences are not in Γ they have
probability less than ε.

2. If E_S(A) - δ bits are available for each message from A then there exists
N_0(δ, ε) such that for all N > N_0 sequences of messages from A of length N
are coded into binary sequences with probability of error greater than 1 - ε.

Let λ > 0, λ < δ. Then 2^(N(E_S(A)-δ)) sequences of messages from A can be
encoded uniquely. The rest will not be correctly decoded. There exists N_0
such that for N > N_0,

   p(a) ≤ 2^(-N(E_S(A)-λ))

for a ∈ Γ. Let P_c denote the probability that the sequence is correctly decoded. Then

   P_c ≤ 2^(N(E_S(A)-δ)) 2^(-N(E_S(A)-λ)) = 2^(-N(δ-λ))

which tends to zero as N → ∞.
// hamming.cpp
#include <iostream>

void hammingcode(int x)
{
   int size = (1<<x)-1;
void main(void)
{
   cout << "Hamming codes of length 3:" << endl;
   hammingcode(2);
   cout << "Hamming codes of length 7:" << endl;
   hammingcode(3);
}
The following C++ program implements a weighted checksum. The function encode
takes a matrix (2-dimensional array) and a vector (1-dimensional array) as
arguments and calculates the vector with checksum information using matrix
multiplication. The function checksum takes a matrix and a vector as arguments. It
determines the matrix for the checksum test, and determines if matrix multiplication
with the supplied vector gives the zero vector (the checksum test is satisfied).
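As a hedged reconstruction of the two functions from the behaviour described above
(signatures inferred from the calls in the program below), one possibility is the
following, together with a small illustrative test.

#include <iostream>

// encode: datac = (data ; W*data), i.e. the n data entries followed by
// t weighted checksum entries
void encode(int n, int t, int **W, int *data, int *datac)
{
   for(int i = 0; i < n; i++) datac[i] = data[i];
   for(int i = 0; i < t; i++)
   {
      datac[n+i] = 0;
      for(int j = 0; j < n; j++)
         datac[n+i] += W[i][j]*data[j];   // checksum row i
   }
}

// checksum: verify W*data against the stored checksum entries,
// i.e. test H*datac = 0
int checksum(int n, int t, int **W, int *datac)
{
   for(int i = 0; i < t; i++)
   {
      int s = 0;
      for(int j = 0; j < n; j++) s += W[i][j]*datac[j];
      if(s != datac[n+i]) return 0;       // test failed
   }
   return 1;                              // zero vector: test satisfied
}

int main()
{
   int data[3] = {1, 2, 3}, datac[5];
   int w0[3] = {1, 1, 1}, w1[3] = {1, 2, 3};
   int *W[2] = {w0, w1};
   encode(3, 2, W, data, datac);
   std::cout << checksum(3, 2, W, datac) << std::endl;   // prints 1
   datac[0] = 7;                                         // corrupt an entry
   std::cout << checksum(3, 2, W, datac) << std::endl;   // prints 0
   return 0;
}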
// checksum.cpp
#include <iostream.h>

void main(void)
{
   int data[7] = {3,8,1,7,9,200,5};
   int datac[10];
   int **W = new int*[3];
   int i;
   for(i=0; i<3; i++)
      W[i] = new int[7];
   encode(7,3,W,data,datac);
   if(checksum(7,3,W,datac))
      cout << "Checksum satisfied." << endl;
   else
      cout << "Checksum failed." << endl;
   i = datac[4];
   datac[4] = 0;
   if(checksum(7,3,W,datac))
      cout << "Checksum satisfied." << endl;
   else
      cout << "Checksum failed." << endl;
   datac[4] = i;
   i = datac[9];
   datac[9] = 0;
   if(checksum(7,3,W,datac))
      cout << "Checksum satisfied." << endl;
   else
      cout << "Checksum failed." << endl;
   for(i=0; i<3; i++)
      delete[] W[i];
   delete[] W;
}
void update(byte[] b)

in class CRC32 is used to update the CRC-32 calculation when the bytes in the byte
array are added to the data used to calculate the checksum. The method

byte[] getBytes()

in class String is used to provide the data for the calculation. The method

void reset()

in class CRC32 resets the calculation so that the CRC-32 checksum can be calculated
with new data. The method

long getValue()

is used to get the value of the checksum for the given data. If the value is not the
expected value then the checksum indicates an error.
// Cksum.java
class Cksum
{
   public static void main(String[] args)
   {
      long csum;
      java.util.zip.CRC32 code = new java.util.zip.CRC32();
      String data = "Checksum example";
      code.update(data.getBytes());
      csum = code.getValue();       // checksum of the original data
      code.reset();
      data = "Ch-cksum exmaple";    // deliberately corrupted data
      code.update(data.getBytes());
      if(csum == code.getValue())
         System.out.println("Checksum satisfied.");
      else
         System.out.println("Checksum failed.");
   }
}
Chapter 11
Cryptography

11.1 Introduction
Cryptology is the science which is concerned with methods of providing secure stor-
age and transport of information. Cryptography can be defined as the area within
cryptology which is concerned with techniques based on a secret key for concealing
or enciphering data. Only someone who has access to the key is capable of decipher-
ing the encrypted information. In principle this is impossible for anyone else to do.
Cryptanalysis is the area within cryptology which is concerned with techniques for
deciphering encrypted data without prior knowledge of which key has been used.
Suppose A (the transmitter, normally called Alice) wishes to send a message enci-
phered to B (the receiver, normally called Bob). Often the original text is simply
denoted by M, and the encrypted message by C. A possible method is for A to use a
secret key K for encrypting the message M to C, which can then be transmitted and
decrypted by B (assuming B possesses the key K). We denote by C = EK(M) the
message M encrypted using the key K, and M = DK (C) the message C decrypted
using the key K. We assume that an attacker (normally called Eve) can easily
read any communication between Alice and Bob. The communication method must
attempt to send the message in a form which Eve cannot understand and possibly
also include authentication of the transmitter and receiver.
11.2 Classical Cypher Systems

// transpose.cpp
#include <iostream>
#include <string>

int transpose(string &m, int *p, int l)
{
   int i, j, len;
   len = m.length();
   if(len%l) return 0;
   char *temp = new char[l];
   for(i=0; i < len; i++)
   {
      temp[i%l] = m[l*(i/l)+p[i%l]];
      if((i%l) == l-1)
         for(j=i-l+1; j < i+1; j++) m[j] = temp[j%l];
   }
   delete[] temp;
   return 1;
}
void main(void)
{
   string m = "A sample message";
   int p1[2] = {1,0}, p1i[2] = {1,0};
   int p2[4] = {3,1,0,2}, p2i[4] = {2,1,3,0};
   int p3[8] = {5,1,7,0,2,3,4,6}, p3i[8] = {3,1,4,5,6,0,7,2};
   cout << "m = " << m << endl;
   transpose(m,p1,2);
   cout << "Enciphering m using p1 = " << m << endl;
   transpose(m,p1i,2);
   cout << "Deciphered using p1i = " << m << endl;
   transpose(m,p2,4);
   cout << "Enciphering m using p2 = " << m << endl;
   transpose(m,p2i,4);
   cout << "Deciphered using p2i = " << m << endl;
   transpose(m,p3,8);
   cout << "Enciphering m using p3 = " << m << endl;
   transpose(m,p3i,8);
   cout << "Deciphered using p3i = " << m << endl;
}
m = A sample message
Enciphering m using p1 =  Aaspmelm seaseg
Deciphered using p1i = A sample message
Enciphering m using p2 = a Asepmlsm eeasg
Deciphered using p2i = A sample message
Enciphering m using p3 = p eAsamlame essg
Deciphered using p3i = A sample message
A keyword may be provided with the message to derive the permutation. For
example, the permutation may be specified by arranging the letters of the first word
in alphabetical order. For example, if the reference word is "word" and is placed at
the beginning of the message as "dowr", the permutation can be inferred to be p2
in the above example, and the rest of the message can be deciphered.

In this case the permutation serves as the key. There are N! permutations of length
N. The identity permutation is not of any use, so the total number of useful keys
is N! - 1.
Example. In this example the function substitute takes the message m to encipher,
and the number n by which to shift the alphabet. The substitution is only applied
to the letters 'A'-'Z' and 'a'-'z'.

// substitute.cpp
#include <iostream>
#include <string>

void substitute(string &m, int n)
{
   int i, l;
   l = m.length();
   while(n < 0) n += 26;
   for(i=0; i < l; i++)
      if((m[i] >= 'A')&&(m[i] <= 'Z'))
         m[i] = (m[i]-'A'+n)%26+'A';
      else if((m[i] >= 'a')&&(m[i] <= 'z'))
         m[i] = (m[i]-'a'+n)%26+'a';
}
void main(void)
{
   string m = "A sample message";
   cout << "m = " << m << endl;
   substitute(m,1);
   cout << "Caesar cipher with n=1  = " << m << endl;
   substitute(m,-1);
   substitute(m,-1);
   cout << "Caesar cipher with n=-1 = " << m << endl;
   substitute(m,1);
   substitute(m,10);
   cout << "Caesar cipher with n=10 = " << m << endl;
   substitute(m,-10);
   cout << "m = " << m << endl;
}

m = A sample message
Caesar cipher with n=1  = B tbnqmf nfttbhf
Caesar cipher with n=-1 = Z rzlokd ldrrzfd
Caesar cipher with n=10 = K ckwzvo wocckqo
m = A sample message
If each alphabet is viewed as a key then there are only 26 keys. The first alphabet
is the one we already use, so 25 useful keys are left. If permutations of the alphabet
are used instead of only shifts a total of 26! - 1 useful keys are available.
A word can be used for a key to identify for each symbol to encode which row of the
Vigenere table to use. For example, the word "CIPHER" indicates that the third,
ninth, sixteenth, eighth, fifth and eighteenth rows are to be used for enciphering.
Thus the symbol at position i is encoded using the row identified by the symbol in
the i mod l position of the key word, where l is the number of symbols in the key
word.
Example. We modify the previous program to use the Vigenere table and a keyword.
The function vigenere takes three arguments. The argument decipher determines
if the function enciphers or deciphers the message. The argument m is the message
to be enciphered, and k is used as the index for the row in the Vigenere table.

// vigenere.cpp
#include <iostream>
#include <string>

void vigenere(string &m, string k, int decipher)
{
   int i, n, l;
   n = k.length();
   l = m.length();
   for(i=0; i < l; i++)
      if((m[i] >= 'A')&&(m[i] <= 'Z'))
         if(decipher)
            m[i] = (m[i]-'A'+26-(k[i%n]-'A'))%26+'A';
         else
            m[i] = (m[i]-'A'+k[i%n]-'A')%26+'A';
      else if((m[i]>='a')&&(m[i]<='z'))
         if(decipher)
            m[i] = (m[i]-'a'+26-(k[i%n]-'A'))%26+'a';
         else
            m[i] = (m[i]-'a'+k[i%n]-'A')%26+'a';
}
void main(void)
{
   string m = "A sample message";
   string k = "CIPHER";
   cout << "m = " << m << endl;
   vigenere(m,"CIPHER",0);
   cout << "Cipher with Vigenere table and keyword CIPHER = "
        << m << endl;
   vigenere(m,"CIPHER",1);
   cout << "m = " << m << endl;
}

m = A sample message
Cipher with Vigenere table and keyword CIPHER = C hhqgnm tijuivl
m = A sample message
11.3 Public Key Cryptography
The RSA system is a well known public key system. It uses the fact that the product
of two prime numbers is easy to calculate, but to factor the product into the two
prime numbers is difficult. First two prime numbers p and q are generated, and the
product n = pq calculated. Then e is determined such that

   gcd(e, (p - 1)(q - 1)) = 1.

Suppose we have a message with non-negative integer value M. The ciphered
message is represented by

   C = M^e mod n.

The message is deciphered as follows:

   M = C^d mod n.

Definition. Euler's totient function φ(n) is the number of positive integers smaller
than n and relatively prime to n. For a prime number p we have φ(p) = p - 1. Thus
for φ(n) we find

   φ(n) = φ(p)φ(q) = (p - 1)(q - 1)

where n = pq as given above.
Theorem. If gcd(a, n) = 1 then a^φ(n) ≡ 1 (mod n).

The theorem is called Euler's theorem. For the proof we refer to [171]. The theorem
is of interest because it can be used to prove that encipherment and decipherment
using the RSA system are inverse operations. In other words if we have a message
M enciphered

   C = M^e mod n

and deciphered according to

   M' = C^d mod n

with ed ≡ 1 (mod φ(n)), then M' = M. Again we refer to [171].
The public key in this system is (e, n) and the private key d. The method can be
improved to include verification of the sender and remove transport of the private
key from the sender to the receiver. Suppose the sender has a public key (e1, n1) and
a private key d1. Similarly suppose the receiver has public key (e2, n2) and private
key d2. Let the message to be encoded be M. Thus an encoded message would be

   C = (M^d1 mod n1)^e2 mod n2.

In other words the sender encodes the message using a private key and then using
the public key of the receiver. The receiver can decode the message using

   M = (C^d2 mod n2)^e1 mod n1.

The receiver decodes the message by first using a private key and then using the
public key of the sender. Using this method only public keys are exchanged. Since
the receiver can only decode the message using the sender's public key, the message
source can be verified.
The RSA system relies on the fact that two large prime numbers p and q can be
found. It is generally quite slow to check if numbers are prime, since the obvious
method is to check for any factors. Define the Jacobi symbol J(a, p), with

   J(1, p) := 1.

The probabilistic test for a candidate prime p is that

   gcd(a, p) = 1   and   J(a, p) ≡ a^((p-1)/2) (mod p)

for a ∈ {1, 2, ..., p - 1}. If p is not prime then the test will fail in more than 50%
of the cases. Every time an a is successfully tested the probability that p is a prime
number increases.
First the prime numbers must be generated to implement the algorithm. To
perform faster encryption a table of prime numbers is used. The prime numbers are
generated with the following C++ program, and can then be read by a program
which needs prime numbers. The program takes one parameter on the command
line to indicate how many prime numbers to generate. The program output is a
list of prime numbers which can be used in other C++ programs. The standard
error output stream is used to output how many prime numbers have been found.
The program output can be redirected in UNIX and Windows systems with the
command

   gprime 10000 > primes.dat

which generates 10000 prime numbers and places them in the file primes.dat. The
header file list.h contains the implementation of the ADT list class developed
earlier.
// gprime.cpp
#include <iostream>
#include <ctype.h>
#include <stdlib.h>
#include <math.h>
#include "list.h"

   if(argc == 1) return 1;
   count = atoi(argv[1]);
   l.additem(2); l.additem(3);
   cout << count << endl;
      cerr << i << " \r";
      i++;
   }
}
for(; i < count; n += type(2))
{
   success = 1;
   sn = (type)(sqrt(n)+1);
   for(j=0; success&&(j<l.getsize())&&(l[j]<sn); j++)
      if((n%l[j]) == type(0)) success = 0;
   if(success)
   {
      l.additem(n);
      cout << n << endl;
      cerr << i << " \r";
      i++;
   }
}
cerr << endl;
return 0;
}
Similarly the program gkeys.cpp generates an array of key values using the prime
numbers generated by gprime.cpp. The RSA program can then simply use an index
to specify the key. The generation of prime numbers and keys takes a long time;
it is much faster to do the long calculations once and then just use the precalculated
results in the algorithm.
// gkeys.cpp
#include <fstream>
#include <stdlib.h>
#include <time.h>

type primelist(int i)
{
   type data;
   int j;
   ifstream primes("primes.dat");
   primes >> j;
   for(j=0; (j<=i)&&!primes.eof()&&!primes.fail(); j++)
      primes >> data;
   primes.close();
   return data;
}

   while(r != type(0))
   {
      r = a%b;
      if(r != type(0)) { a = b; b = r; }
   }
   return b;
}
primes >> maxprime;
total = int(double(maxprime)*(maxprime-1)/2);
primes.close();
if(argc == 1) return 1;
count = atoi(argv[1]);
if(count > total) count = total;
cout << count << endl;
srand(time(NULL));
for(i=0; maxkeys<=count && i<maxprime; i++)
   for(j=i+1; maxkeys<count && j<maxprime; j++)
   {
      type temp,temp2,p,q,n,e,d;
      p = primelist(i);
      q = primelist(j);
      n = p*q;
      temp = (p-type(1))*(q-type(1));
      d = e = type(0);
         { d = (q*temp+1)/e; p = temp; }
      }
      if((e != type(0))&&(d != type(0)))
      {
         maxkeys++;
         cout << n << " ";
         cout << e << " ";
         cout << d << endl;
      }
      cerr << (total--) << " left to try, " << maxkeys
           << " generated \r";
      cerr.flush();
   }
   if(maxkeys<count) cout << "Not enough keys generated.";
   cerr << endl;
}
In the following program it is important to use the class Verylong [169], which
provides a theoretically unbounded integer type, since even for small prime numbers
< 2^16 the calculations used in the RSA system can exceed the bounds of the data
type unsigned long, depending on the underlying hardware platform. The program
performs the RSA encoding of a message using the previously generated keys. We
again use a recursive implementation for raising an integer to an integer power, this
time using Verylong and modulo arithmetic.
// rsa.cpp
#include <fstream>
#include <stdlib.h>
#include <time.h>
#include <assert.h>
#include "verylong.h"

   keys >> j;
   for(j=0; (j<=i)&&!keys.eof()&&!keys.fail(); j++)
   { keys >> n; keys >> e; keys >> d; }
   keys.close();
}

{
   Verylong temp;

void main(void)
{
   int i,len,maxkeys;
   Verylong e,d,n;
   char m[18];
   Verylong mt[17];
   ifstream keys("keys.dat");
   srand(time(NULL));
   keylist(rand()%maxkeys,n,e,d);
   rsa(mt,e,n,len);
   rsa(mt,d,n,len);
   vltoc(mt,m,len);
Initial message :
65,32,115,97,109,112,108,101,32,109,101,115,115,97,103,101
A sample message
Encrypted message :
696340,554727,635395,510042,702669,39492,737693,78176,554727,702669,
78176,635395,635395,510042,635068,78176
Decrypted message :
65,32,115,97,109,112,108,101,32,109,101,115,115,97,103,101
A sample message
Chapter 12
Finite State Machines
12.1 Introduction
Finite state machines [49, 67] provide a visual representation of algorithms. Algo-
rithms are implemented on a machine with a finite number of states representing
the state of the algorithm. This provides an abstract way of designing algorithms.
The chapter will only cover deterministic machines (the actions of the machines are
determined uniquely).
The reason for studying these machines is to determine the requirements necessary
to perform arbitrary functions. Certain machines (as will be illustrated) cannot
perform certain functions. Computer scientists are interested in the requirements
for functions to be performed and in which functions can be performed.
Finite state machines can be used to understand these problems. Finite state ma-
chines are concerned with taking an input, changing between internal states, and
generating an output (which may just be the machine's final state). This describes
all computing devices. Thus in an abstract way it is possible to consider what is
computable. Any machine required to solve arbitrary problems must be described
in terms of a basic set of features and operations which determine what the machine
can do. From the description, algorithms to solve problems can be constructed.
Furthermore the basic operations must be reasonable in the sense that it must be
known that the operations can be performed in a finite amount of time. The features
of a machine can, for example, be memory and the ability to output.
In this chapter we discuss finite automata, finite automata with output and Turing
machines. It will become evident that with each improvement the machines can
compute more. We show some problems which are computable by Turing machines
and not by finite automata. Turing machines are used as the basis for deciding what
is computable and what is not.
12.2 Finite Automata

A finite automaton consists of:

• A finite set S of states. One state is designated as the start state. Some states
may be designated as final states.

• A finite set of transitions for each state and symbol in the alphabet.
Transitions are ordered triples (a, b, c) where a, b ∈ S and c ∈ Σ, and b is uniquely
determined by a and c.
Visually the finite automaton can be represented with circles for the states and
directed edges between states for the transitions. This visual representation is called
a transition diagram. A "-" in a state denotes the start state. A "+" in a state
denotes a final state.
Finite automata can be used to define languages. The language consists of all input
words accepted by the finite automata. The automaton can only accept input words,
it cannot provide any output except for failing or accepting. The only memory the
finite automaton possesses is the state it is currently in and its transitions. This
is obviously a limitation. More effective computing machines such as push-down
automata (using a stack as memory) and Turing machines can increase the number
of computable functions.
Now we provide some examples to show some of the uses of finite automata.
The start state is S_even and the final state is S_odd. The table for the transitions is
given by Table 12.1. The transition diagram is shown in Figure 12.1.

The finite automaton only accepts bit strings which pass the odd parity test. If
S_even were selected as the final state instead of S_odd, the finite automaton would
only accept bit strings which pass the even parity test. Note that it is not necessary
to label the states in the transition diagram since this does not change the operation
of the finite automaton.
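A two-state sketch of this parity automaton in C++ (illustrative; the states S_even
and S_odd become an enum):

#include <iostream>
#include <string>

int main()
{
   enum State { Seven, Sodd };
   std::string input = "01101001";
   State s = Seven;                      // start state
   for(char c : input)
      if(c == '1') s = (s == Seven) ? Sodd : Seven;   // a 1 toggles the state
      // reading a 0 leaves the state unchanged
   std::cout << (s == Sodd ? "accepted (odd parity)"
                           : "rejected (even parity)") << std::endl;
   return 0;
}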
with the start state S_start and final states S_{0,3} and S_{1,3}, and transition
Table 12.2. The transition diagram is given by Figure 12.2.
This finite automaton only accepts code words from the Hamming code C_{H_2}.
12.3 Finite Automata with Output
• For each state, the symbol from Γ to output when the state is entered.
The transition diagrams already introduced can be extended for Moore machines
by writing the output symbol in the circle for each state. Unlike finite automata, a
Moore machine does not accept or reject input strings, rather it processes them. If
S is a state in a Moore machine then the notation S- denotes the fact that S is a
start state.
This machine has only the memory of which state it is in and its transitions. In this
respect it is no more powerful than a finite automaton. But its relation to practical
usage is stronger since now the machine is able to give us information about the
input provided, beyond a simple accept or fail. The ability to output is also tied to
memory. If a machine can read its own output at a later stage it may be able to
compute more. These ideas are incorporated into the Turing machines.

These machines can be coupled so that the output of one machine can be used as
input for another. A Moore machine exists for any pair of coupled Moore machines.
The set of states for such a machine is the Cartesian product of the sets of states of
each of the machines. Let S_1 = {s_{1,0}, s_{1,1}, ...} and S_2 = {s_{2,0}, s_{2,1}, ...},
where s_{1,0} and s_{2,0} are the start states, be the states of the first and second
Moore machines respectively, and let the output for s_{i,j} be denoted by o_{i,j}. The
combined machine has states S_1 x S_2, start state (s_{1,0}, s_{2,0}), output o_{2,j} for
state (s_{1,i}, s_{2,j}), and transitions (s_{1,i}, s_{2,j}) → (s_{1,k}, s_{2,l}) whenever
s_{1,i} → s_{1,k} is a transition for the given input in the first machine and
s_{2,j} → s_{2,l} is a transition for input o_{1,k} in the second machine. Thus
combining Moore machines provides no extra computing power to this class of machines.
Example. Table 12.3 describes a simple Moore machine which performs the NOT
operation. Here Σ := {0, 1} and Γ := {0, 1}.
Example. This example shows how an n-bit incrementer which increments any n-bit
number modulo 2^n (since 2^n ≡ 0 modulo 2^n) can be implemented with a Moore
machine. The bits are fed in from low order to high order. For example the decimal
number 11 with bit representation 1011 will be input as 1, 1, 0 and then 1. The
transition table is given by Table 12.4.
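A compact sketch of the incrementer (illustrative): the machine remains in a carry
state until the first 0 bit absorbs the carry.

#include <iostream>
#include <string>

int main()
{
   std::string bits = "1101";     // 11 decimal, low-order bit first: 1,1,0,1
   bool carry = true;             // start state: a 1 must still be added
   std::string out;
   for(char c : bits)
   {
      int b = c - '0';
      out += char('0' + (b ^ (carry ? 1 : 0)));   // output bit
      carry = carry && (b == 1);                  // stay in carry state on 1
   }
   std::cout << out << std::endl;   // 0011 = 12 decimal, low-order bit first
   return 0;
}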
The transition diagrams already introduced can be extended for Mealy machines by
writing the input and output symbols as an ordered pair (i, o) for each transition.
Unlike finite automata, a Mealy machine does not accept or reject input strings,
rather it processes them. Similarly to Moore machines, Mealy machines can be
combined. Using a similar proof to the one for Moore machines, the combination of
Mealy machines provides no extra computing power.

Example. Table 12.5 describes a simple Mealy machine which performs the NOT
operation. Here

   S := {s_0},  Σ := {0, 1},  Γ := {0, 1}.

The transition diagram is given in Figure 12.5, with the transitions labelled (0,1)
and (1,0).
Example. This example shows how an n-bit incrementer which increments any n-bit
number modulo 2^n (since 2^n ≡ 0 modulo 2^n) can be implemented with a Mealy
machine. The bits are fed in from low order to high order. For example the number
11 with bit representation 1011 will be input as 1, 1, 0 and then 1. The transition
table is given by Table 12.6.
For every Moore machine there is an equivalent Mealy machine and conversely. For
the proof we refer to [49]. This is simply a matter of showing how to gain the same
output for Moore and Mealy machines with the same input.
Example. We can use a Turing machine to perform the parity check. Let

   S := {S_even, S_odd, S_fin}

and

   Σ := {0, 1},  Γ := {0, 1}.

The start state is S_even and the halt state is S_fin. The table for the transitions is
given by Table 12.7. The symbol r in the movement column instructs the tape head
to move one cell right. The transition diagram is given by Figure 12.7.
The Turing machine only accepts bit strings which pass the odd parity test. Note
that it is not necessary to label the states in the transition diagram since this does
not change the operation of the Turing machine.
Example. Now we use a Turing machine to calculate the parity bit for odd parity
and place it in the cell of the tape immediately after the bit string used for input.
Let

   S := {S_even, S_odd, S_fin1, S_fin2}

and

   Σ := {0, 1},  Γ := {0, 1}.

The start state is S_even and the halt states are S_fin1 and S_fin2. The table for the
transitions is given by Table 12.8. The transition diagram is given by Figure 12.8.
Example. Now we use a Turing machine to negate (NOT) a bit sequence (one's
complement). The states are

   S := {S_start, S_halt}.

S_start is the start state and S_halt is a halt state. The alphabets are

   Σ := {0, 1},  Γ := {0, 1}.

The transition table is given by Table 12.9. The transition diagram is given by
Figure 12.9.
Example. Now we consider a Turing machine which has no finite automaton
equivalent. The Turing machine reverses a bit string. The alphabets are

   Σ := {0_I, 1_I}  and  Γ := {0_O, 1_O, 0_I, 1_I}.

S_start is the start state and S_halt is the halt state. The input and output alphabet
are different so that the machine can differentiate between input and output symbols.
The input and output will be interpreted as binary digits, but using different
alphabets means the machine can remember what it has already done. The transitions
are given by Table 12.10. The transition diagram is given by Figure 12.10.
// turing.cpp
#include <iostream>
#include <string>

class Tapecell
{
   protected:
   char symbol;
   Tapecell *next,*previous;

class Transition
{
   public:
   int state,nextstate;
   char input,output,movement,halt;
   Transition(int s = 0, char i = ' ', char o = ' ',
              char m = 'l', int ns = 0, char h = 0)
    : state(s), input(i), output(o),
      nextstate(ns), movement(m), halt(h) { }
};

class TuringMachine
{
   protected:
   Tapecell *tape;
   Transition *table;
   int ccell,state,tentries,crash,sstate;
   void tcrash(char);
   void add(char);
   Transition *lookup(Tapecell *);
   int ishalt(int);
   public:
   TuringMachine(Transition *,int,int);
   ~TuringMachine();
   void run(const string &,int);
   static char left,right;
};
// constructor
TuringMachine::TuringMachine(Transition *ttable,int entr,int strt)
{
   int i;

// destructor
TuringMachine::~TuringMachine()
{
   Tapecell *cell = tape;
   delete[] table;
   if(cell != (Tapecell *)NULL)
      while(cell->next != (Tapecell *)NULL)
      {
         cell = cell->next;
         if(cell->previous != (Tapecell *)NULL)
            delete cell->previous;
      }
   if(cell != (Tapecell *)NULL)
      delete cell;
}

      tcrash(cell->symbol);
   }
   halt = ishalt(state);
}
}
if(!crash)
{
   cell = tape;
   cout << "Successful completion, tape:" << endl;
   while(cell != (Tapecell *)NULL)
   {
      cout << cell->symbol;
      cell = cell->next;
   }
   cout << endl;
}
cell = tape;
if(cell != (Tapecell *)NULL)
   while(cell->next != (Tapecell*)NULL)
   {
      cell = cell->next;
      if(cell->previous != (Tapecell*)NULL)
         delete cell->previous;
   }
if(cell != (Tapecell *)NULL)
   delete cell;
tape = (Tapecell *)NULL;
}
void main(void)
{
   // parity calculation Turing machine transitions
   Transition paritytable[8] = {
      Transition(1,'0','0',TuringMachine::right,1,0),
      Transition(1,'1','1',TuringMachine::right,0,0),
      Transition(1,' ','0',TuringMachine::right,2,0),
      Transition(0,'0','0',TuringMachine::right,0,0),
      Transition(0,'1','1',TuringMachine::right,1,0),
      Transition(0,' ','1',TuringMachine::right,3,0),
      Transition(2,' ',' ',TuringMachine::right,2,1),   // halt state
      Transition(3,' ',' ',TuringMachine::right,3,1)    // halt state
   };
   // transitions for the Turing machine which reverses its input
   Transition reversetable[29] = {
      Transition(0,'a',' ',TuringMachine::right,10,0),
      Transition(0,'b',' ',TuringMachine::right,11,0),
      Transition(10,'a','a',TuringMachine::right,10,0),
      Transition(10,'b','b',TuringMachine::right,10,0),
      Transition(10,'0','0',TuringMachine::left,20,0),
      Transition(10,'1','1',TuringMachine::left,20,0),
      Transition(10,' ',' ',TuringMachine::left,20,0),
      Transition(20,'a','0',TuringMachine::left,30,0),
      Transition(20,'b','0',TuringMachine::left,31,0),
      Transition(30,'a','a',TuringMachine::left,30,0),
      Transition(30,'b','b',TuringMachine::left,30,0),
      Transition(30,'0','0',TuringMachine::left,30,0),
      Transition(30,'1','1',TuringMachine::left,30,0),
      Transition(30,' ','0',TuringMachine::right,0,0),
      Transition(11,'a','a',TuringMachine::right,11,0),
      Transition(11,'b','b',TuringMachine::right,11,0),
      Transition(11,'0','0',TuringMachine::left,21,0),
      Transition(11,'1','1',TuringMachine::left,21,0),
      Transition(11,' ',' ',TuringMachine::left,21,0),
      Transition(21,'a','1',TuringMachine::left,30,0),
      Transition(21,'b','1',TuringMachine::left,31,0),
      Transition(31,'a','a',TuringMachine::left,31,0),
      Transition(31,'b','b',TuringMachine::left,31,0),
      Transition(31,'0','0',TuringMachine::left,31,0),
      Transition(31,'1','1',TuringMachine::left,31,0),
      Transition(31,' ','1',TuringMachine::right,0,0),
      Transition(50,' ',' ',TuringMachine::right,50,1)   // halt state
   };
   string paritycheck = "01101001";
   string reversecheck = "01101001";
   TuringMachine parity(paritytable,8,0);
   cout << "Parity calculation with input "
        << paritycheck << endl;
   parity.run(paritycheck,8);
   TuringMachine reverse(reversetable,29,0);
   cout << "Reverse input "
        << reversecheck << endl;
   reverse.run(reversecheck,8);
   paritycheck[6] = 'a';
   cout << "Crash parity calculation with input "
        << paritycheck << endl;
   parity.run(paritycheck,8);
}
Chapter 13
Computability and Complexity

13.1 Introduction
Once we have the building blocks for a computing device, we can construct the
device and give it tasks to perform. Some tasks are more difficult than others.
Some tasks may even be impossible for the computing device to perform. This is
the concept of computability. Since tasks can be represented as functions, we need to
determine the computability of functions. The computable functions are obviously
limited by the computing device, but if we choose a sufficiently general computing
device it can serve as a measure for computability.
We also need a measure of the difficulty of tasks. This measure indicates how fast
the task can be done. Some problems are inherently difficult such as prime number
factorization as used in public key cryptography systems, and therefore take a long
time to perform. This is referred to as the complexity of the problem. In general two
measures of complexity are often used, the time complexity and space complexity.
Time complexity describes the amount of time taken to do a task given the input.
Space complexity refers to the amount of memory required to perform the task given
the input. More precisely the measure of complexity is applied to algorithms, since
some algorithms are more efficient than others.
The complexity of sequences of symbols has been analysed [106, 109]. Thus if an
algorithm can be transformed into an appropriate sequence of symbols, the com-
plexity of the sequence can be used as a measure of the complexity of the algorithm.
An example is given in [161].
13.2 Computability
Church's thesis states that the intuitively computable functions are exactly the
partial recursive functions. Sometimes Church's thesis is called the Church-Turing
thesis because it can be formulated as the intuitively computable functions are
the functions which can be computed by Turing machines. To show that these two
statements are equivalent requires that we show that every partial recursive function
can be computed by a Turing machine and every Turing machine computes a partial
recursive function. It is simple to see how to implement the successor function, at
least it is simple to build a binary incrementer Turing machine (in the previous
chapter we showed how to achieve this using Moore and Mealy machines). The
projection operation is also not difficult to implement on a Turing machine. It can
be achieved by reading from the least significant bit to the most significant bit and
if the bit is 0 blank every second word (bit sequence) and if the bit is 1 blank every
first word (bit sequence). We can introduce new symbols to indicate the end of
words and the end of the words on the tape to simplify the implementation. The
zero function is trivial to implement using a Turing machine. It is also necessary to
show that primitive recursion and composition can be realised somehow with Turing
machines. Composition should pose no problem, if new symbols are introduced again
to make the task easier. The composition is a combination of the Turing machines
implementing each of the functions in the composition, and a control structure.
Primitive recursion can be implemented by writing n, n - 1, ..., 0 on the tape after
the input number. The value for n = 0 is part of the Turing machine structure,
independent of the contents of the tape. Once the function value is known for zero,
the value at n = 1 can be calculated and so on, up to n + 1. So we expect that
a Turing machine can compute all primitive recursive functions. A motivation for
the thesis is that Turing machines can compute anything that we can. Given as
much paper as needed we can compute certain functions using basic operations; for
a Turing machine the paper is formally defined by the tape and the basic operations
are formally defined by transitions. Any step in the computation is determined by
the contents of the paper; a Turing machine operates uniquely according to the tape
contents. Since we use the term "intuitively computable", the statement cannot be
proven. A proof would require a definition of intuitive computability.
To simplify the machine, if there is no arc for a given state and input then the
machine continues the last motion and replaces the symbol with itself (a transition
to a state for this machine always has the same motion of the tape head). Also an
arc with no label replaces the symbol on the tape with itself and moves the tape
head left. We assume that the machine we wish to simulate uses only binary for
input and output. For each state, a valid transition can be represented in a finite
number of bits, i.e. a fixed number to represent the current and next state, and a
single bit to represent the input, output and movement. The description here uses
a tape which is infinite to the left, with the description of the Turing machine to be
simulated starting at the rightmost position of the tape. The description consists of
transitions represented in binary, where the end of a transition description is marked
by the symbol X. The end of the table of transitions is marked by a Y. Additional
symbols are used to mark the state of the machine. The start state is assumed to
begin immediately under the tape head.
Now we consider some problems the Turing machine cannot solve. For the halting
problem we ask whether a Turing machine H exists which always halts, when given
as input a representation of another Turing machine and its input, and gives an
output indicating whether the given Turing machine halts or not. A simple extension gives
the machine H' which halts whenever the input machine does not halt, and never
halts when the input machine does halt (achieved by a simple loop between two
states for any symbol read from the tape). Furthermore we require that the input
machine take its own description as input. If we use as input to the machine H', the
machine H' itself with itself again as input, we obtain a machine which halts only
when the machine does not halt. Thus such a Turing machine H' does not exist.
13.3 Gödel's Incompleteness Theorem

We can work with an alphabet which contains only a single letter, e.g. the letter I.
The words constructed from this alphabet (apart from the empty word) are: I, II, III,
etc. These words can, in a trivial way, be identified with the natural numbers
1, 2, 3, .... Such an extreme standardization of the "material" is advisable for some
considerations. On the other hand, it is often convenient to use the diversity
of an alphabet consisting of several elements.

The use of an alphabet consisting of one element does not imply any essential
limitation. We can associate the words W over an alphabet A consisting of N
elements with natural numbers G(W), in such a way that each natural number is
associated with at most one word. Similar arguments apply to words over an alphabet
consisting of one element. Such a representation G is called a Gödel numbering [63]
(also called arithmetization) and G(W) is the Gödel number of the word W with
respect to G. The following are the requirements for an arithmetization of W:
1. If $W_1 \neq W_2$ then $G(W_1) \neq G(W_2)$.

2. There exists an algorithm such that for any given word W, the corresponding
natural number G(W) can be computed in a finite number of steps.

3. For any natural number n, it can be decided whether n is the Gödel number
of a word W over A in a finite number of steps.

4. There exists an algorithm such that if n is the Gödel number of a word W over
A, then this word W (which is unique by argument (1)) can be constructed in
a finite number of steps.
Here is an example of a Gödel numbering. Consider the alphabet with the letters
a, b, c. A word is constructed by any finite concatenation of these - that is, a
placement of these letters side by side in a line. For example, abcbba is a word. We
can then number the words as follows. Given a word $x_1 x_2 \cdots x_n$ where each $x_i$ is a, b or c, we assign to it the number
$$2^{d_1} \cdot 3^{d_2} \cdot \ldots \cdot p_n^{d_n}$$
where $p_i$ is the $i$th prime number (so $p_1 = 2$) and
$$d_i := \begin{cases} 1 & \text{if } x_i \text{ is } a \\ 2 & \text{if } x_i \text{ is } b \\ 3 & \text{if } x_i \text{ is } c. \end{cases}$$
To show that this numbering satisfies the criteria given above, we use the fundamental
theorem of arithmetic: any natural number $\ge 2$ can be represented as a product of
primes, and that product is, except for the order of the primes, unique.
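For example, the word abcbba receives the Gödel number
$2^1 \cdot 3^2 \cdot 5^3 \cdot 7^2 \cdot 11^2 \cdot 13^1 = 173423250$.
A small sketch of this computation (the program and its variable names are our own):

// goedel.cpp
// Gödel number of a word over the alphabet { a, b, c }
#include <iostream>
using namespace std;

int main()
{
 const char* word = "abcbba";
 int primes[] = { 2, 3, 5, 7, 11, 13 }; // enough primes for 6 letters
 long long g = 1;
 for(int i=0; word[i] != '\0'; i++)
 {
  int d = word[i] - 'a' + 1;             // d = 1 for a, 2 for b, 3 for c
  long long p = 1;
  for(int k=0; k<d; k++) p *= primes[i]; // primes[i] raised to the power d
  g *= p;
 }
 cout << g << endl; // 173423250
 return 0;
}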
We may number all kinds of objects, not just alphabets. In general, the criteria for
a numbering to be useful are:
1. No two objects have the same number.
2. Given any object, we can "effectively" find the number that corresponds to it.
3. Given any number, we can "effectively" determine if it is assigned to an object
and, if so, to which object.
Now consider the formula $\forall y\, \neg p(x, y)$, where $p(m, y)$ expresses that y is the Gödel
number of a proof of the theorem with Gödel number m, and let m be the Gödel
number of this formula. The formula states that there is no proof for the theorem x. Let
$$A := \forall y\, \neg p(m, y).$$
Thus if number theory is consistent there exists a theorem such as A which cannot
be proved.
13.4 Complexity
13.4.1 Complexity of Bit Strings
Usually the complexity of an algorithm is expressed in terms of the size of the
input. Many different definitions of complexity have been proposed in the litera-
ture. A few are algorithmic complexity (Kolmogorov-Chaitin) [41], the Lempel-Ziv
complexity [109], the logical depth of Bennett [13], the effective measure of com-
plexity of Grassberger [76], the complexity of a system based on its diversity [94],
the thermodynamic depth [111], and a statistical measure of complexity [113].
We may describe the time complexity in terms of the total number of operations
required for a certain input size, or we may choose some basic operation as the
most expensive (such as multiplication or comparison) and use that to describe
the complexity of an algorithm. We can represent any program as a bitstring,
for example by calculating the Godel number of the program and using the bit
representation of this number. We can then use, as a measure of complexity, the
compressibility of the bit string. Here we use the measure defined by Lempel and
Ziv [109, 161].
Given a binary string $S = s_1 s_2 \ldots s_n$ of finite length n, we denote by $S(i,j)$ the
substring $s_i s_{i+1} \ldots s_j$ (or the empty word if $i > j$) of S and by $v(S)$ the set of all substrings
of S. If $S_1$ and $S_2$ are two strings, $S_1 S_2$ denotes the concatenation (appending) of
$S_2$ to $S_1$. The complexity in the sense of Lempel and Ziv of a finite string is
evaluated from the point of view of a simple self-delimiting learning machine which,
as it scans a given n digit string $S = s_1 s_2 \ldots s_n$ from left to right, adds a new
string to its memory every time it discovers a substring of consecutive digits not
previously encountered. We begin with the complexity of the empty string as 0.
Suppose we have already scanned the first r digits
$$R = S(1, r)\,\circ$$
where $\circ$ indicates that we know the complexity c(R). We have to determine whether the
rest of the string $S(r+1, n)$ can be produced by a simple copy operation. To do
this we consider the substrings
$$Q_{r+i} := S(r+1, r+i).$$
For $i < 1$ we use the empty string as $Q_{r+i}$. Initially we consider $i = 1$. The substring
$RQ_{r+i}$ can be produced by a simple copy operation if
$$Q_{r+i} \in v(RQ_{r+i-1}).$$
If this is the case and the substring begins at $s_j$ with $j \le r$, we can simply copy
$s_{j+k}$ to $s_{r+k}$ for $k = 1, 2, \ldots, i$, so we try $i+1$. For $r + i = n$ we have the special
case $c(RQ_{r+i}) = c(R) + 1$. If the copy operation is not possible, we also have
$c(RQ_{r+i}) = c(R) + 1$ and repeat the process using $RQ_{r+i}$ as R and $i = 1$.
For example, a bitstring consisting of only 0s or only 1s has complexity 2. A string
of alternating 0s and 1s has complexity 3, with the decomposition
$$0 \circ 1 \circ 01010101\ldots$$
The string 01101000011101001 has complexity 6.
// complex.cpp
// Lempel-Ziv complexity of a binary string
// (the function complexity is a reconstruction; the original listing is incomplete)
#include <iostream>
#include <cstring>
using namespace std;

int complexity(const char* s)
{
 int n = strlen(s);
 if(n == 0) return 0;
 int c = 1, r = 1; // the first digit is always a new word
 while(r < n)
 {
  int i = 0; // length of the longest copyable extension
  while(r+i < n)
  {
   int j, k;
   for(j=0; j<r; j++) // can s[r..r+i] be copied starting at position j?
   { for(k=0; k<=i && s[j+k]==s[r+k]; k++); if(k > i) break; }
   if(j == r) break;  // no copy possible: a new word begins here
   i++;
  }
  c++; r += i+1; // the new word is the copyable part plus one digit
 }
 return c;
}

int main()
{
 cout << complexity("0101010101") << endl;          // 3
 cout << complexity("1010101010101010101") << endl; // 3
 cout << complexity("01101000011101001") << endl;   // 6
 cout << complexity("1011001011") << endl;          // 5
 return 0;
}
Definition. The space complexity [114] $C_S(T, n)$ of a Turing machine T is the maximum
number of tape cells into which T writes, taken over all inputs of length n.
Definition. The class of decision problems for which there exists a polynomial
Turing machine is called the P-class of problems, denoted by the set P.

Definition. The NP-class of problems (denoted by the set NP) consists of those problems
for which, when given a potential solution, there exists a polynomial Turing machine
to determine if the solution is valid. NP stands for non-deterministic polynomial.
Thus if we can find a potential solution, for example by construction using random
numbers such that the probability of constructing an actual solution is sufficiently
high, the validity of the solution can be efficiently checked.
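For example, for the satisfiability problem a potential solution is an assignment of
truth values, and its validity can be checked in polynomial time. A minimal sketch
(the formula and the assignment are our own example):

// satcheck.cpp
// verifying a potential solution of a satisfiability problem in
// polynomial time
#include <iostream>
using namespace std;

int main()
{
 // CNF formula (x1 v ~x2) & (~x1 v x3) & (x2 v x3):
 // +i stands for xi, -i stands for ~xi
 int clauses[3][2] = { { 1, -2 }, { -1, 3 }, { 2, 3 } };
 bool x[4] = { false, true, false, true }; // x1 = true, x2 = false, x3 = true
 bool ok = true;
 for(int c=0; c<3; c++)
 {
  bool sat = false;
  for(int j=0; j<2; j++)
  {
   int lit = clauses[c][j];
   if((lit > 0) ? x[lit] : !x[-lit]) sat = true;
  }
  if(!sat) ok = false; // one unsatisfied clause refutes the assignment
 }
 cout << (ok ? "valid" : "invalid") << endl; // valid
 return 0;
}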
Definition. A problem A in NP is NP-complete if, whenever a polynomial algorithm
exists for A, a polynomial algorithm exists for every other problem in NP. An
important question in complexity is whether the classes P and NP are the same.
This reduces to the question whether $A \in P$ for any NP-complete problem A.
Cook's theorem [114, 183] states that the satisfiability problem is NP-complete.
Since the satisfiability problem is in NP, there exists a polynomial Turing machine
that can check the validity of a solution. The proof of the theorem consists of
analysing the Turing machine and constructing a logical formula, in a polynomial
number of operations, which describes the operation of the Turing machine. The
formula introduces a polynomial number of Boolean variables. The formula is a
conjunction of disjunctions which express the requirements on the Turing machine. For
example, the Turing machine can only be in one state at a time. Thus if the
satisfiability problem is polynomially reducible to a problem A in NP, then A is also
NP-complete.
Chapter 14
Neural Networks
14.1 Introduction
An artificial neural network is an abstract simulation of a real nervous system that
contains a collection of neuron nets communicating with each other via axon connections.
Such a model bears a strong resemblance to axons and dendrites in a nervous
system. The first fundamental modelling of neural nets was proposed in 1943 by
McCulloch and Pitts in terms of a computational model of "nervous activity". The
McCulloch-Pitts neuron is a binary device and each neuron has a fixed threshold
logic. This model led to the work of John von Neumann, Marvin Minsky, Frank
Rosenblatt, and many others. Hebb postulated [85] that neurons are appropriately
interconnected by self-organization and that "an existing pathway strengthens the
connections between the neurons". He proposed that the connectivity of the brain
is continually changing as an organism learns different functional tasks, and that
cell assemblies are created by such changes. By embedding a vast number of simple
neurons in an interactive nervous system, it is possible to provide computational
power for very sophisticated information processing.
The neuron is the basic processor in neural networks. Each neuron has one output,
which is generally related to the state of the neuron - its activation - and which
may fan out to several other neurons. Each neuron receives several inputs over
these connections, called synapses. The inputs are the activations of the incoming
neurons multiplied by the weights of the synapses. The activation of the neuron is
computed by applying a threshold function to the sum of these weighted inputs.
This threshold function is generally some form of nonlinear function.
The basic artificial neuron (Cichocki and Unbehauen [45], Fausett [65], Hassoun [82],
Haykin [83], Rojas [139], Steeb [164]) can be modelled as a multi-input nonlinear
device with weighted interconnections $w_{ji}$, also called synaptic weights or strengths.
The cell body (soma) is represented by a nonlinear limiting or threshold function f.
The simplest model of an artificial neuron sums the n weighted inputs and passes
the result through a nonlinearity according to the equation
$$y_j = f\left(\sum_{i=0}^{n} w_{ji} x_i\right)$$
where
$$x_0 = 1.$$
The basic artificial neuron is characterized by its nonlinearity and the threshold
$\theta_j$. The McCulloch-Pitts model of the neuron used only the binary (hard-limiting)
function (step function or Heaviside function), i.e.
$$H(x) := \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x \le 0. \end{cases}$$
In this model a weighted sum of all inputs is compared with a threshold $\theta_j$. If this
sum exceeds the threshold, the neuron output is set to 1, otherwise to 0. For bipolar
representation we can use the sign function
$$\mathrm{sign}(x) := \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -1 & \text{if } x < 0. \end{cases}$$
The threshold (step) function may be replaced by a more general nonlinear function
and consequently the output of the neuron $y_j$ can either assume a value of a discrete
set (e.g. $\{-1, 1\}$) or vary continuously (e.g. between -1 and 1 or generally between
$y_{\min}$ and $y_{\max} > y_{\min}$). The activation level or the state of the neuron is measured
by the output signal $y_j$, e.g. $y_j = 1$ if the neuron is firing (active) and $y_j = 0$ if the
neuron is quiescent in the unipolar case and $y_j = -1$ for the bipolar case.

In the basic neural model the output signal is usually determined by a monotonically
increasing sigmoid function of a weighted sum of the input signals. Such a sigmoid
function is
$$y_j = \frac{1}{1 + \exp(-\lambda u_j)}$$
where $\lambda$ is a positive constant or variable which controls the steepness (slope) of the
sigmoidal function. The quantity $u_j$ is given by
$$u_j := \sum_{i=0}^{n} w_{ji} x_i.$$
// thresh.cpp
// (the threshold functions and the input vector x are reconstructions;
// the original listing is incomplete)
#include <iostream>
#include <math.h>
using namespace std;

// weighted sum u = sum_{i=0}^{n} w[i]*x[i]; x[0] = 1 carries the bias
double net(double* w, double* x, int n)
{
 double u = 0.0;
 for(int i=0; i<=n; i++) u += w[i]*x[i];
 return u;
}

int H(double* w, double* x, int n) // Heaviside (hard limiter)
{ return (net(w,x,n) > 0.0) ? 1 : 0; }

int sign(double* w, double* x, int n) // bipolar hard limiter
{
 double u = net(w,x,n);
 if(u > 0.0) return 1;
 if(u < 0.0) return -1;
 return 0;
}

double unipolar(double* w, double* x, int n) // logistic function
{ return 1.0/(1.0 + exp(-net(w,x,n))); }

double bipolar(double* w, double* x, int n)  // hyperbolic tangent
{ return tanh(net(w,x,n)); }

int main()
{
 int n = 5;          // length of input vector includes bias
 double theta = 0.5; // threshold
 // allocate memory for weight vector w
 double* w = new double[n];
 w[0] = -theta;
 w[1] = 0.7; w[2] = -1.1; w[3] = 4.5; w[4] = 1.5;
 double* x = new double[n];
 x[0] = 1.0; // bias input
 x[1] = 0.5; x[2] = 1.0; x[3] = -0.3; x[4] = 0.2; // sample input (assumed)
 int r1 = H(w,x,n-1);
 cout << "r1 = " << r1 << endl;
 int r2 = sign(w,x,n-1);
 cout << "r2 = " << r2 << endl;
 double r3 = unipolar(w,x,n-1);
 cout << "r3 = " << r3 << endl;
 double r4 = bipolar(w,x,n-1);
 cout << "r4 = " << r4 << endl;
 delete [] w;
 delete [] x;
 return 0;
}
14.2 Hyperplanes
Hyperplanes are used to describe the function of a perceptron. They are used to
classify points in space as being elements of one of two half spaces.
Any point $x \notin H_{p,\alpha}$ in $\mathbf{R}^n$ has the property that either $x \in H^+_{p,\alpha}$ or $x \in H^-_{p,\alpha}$.
These definitions can also be expressed in terms of a fixed point on the hyperplane.
Suppose $a \in \mathbf{R}^n$ is a point on the hyperplane $H_{p,\alpha}$. Any point x on the hyperplane
must satisfy
$$p^T x - p^T a = \alpha - \alpha = 0.$$
Thus
$$H_{p,a} = \{\, x \mid p^T(x - a) = 0,\; x \in \mathbf{R}^n \,\}$$
$$H^+_{p,a} = \{\, x \mid p^T(x - a) > 0,\; x \in \mathbf{R}^n \,\}$$
$$H^-_{p,a} = \{\, x \mid p^T(x - a) < 0,\; x \in \mathbf{R}^n \,\}.$$
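A small sketch of this classification (the vectors p, a and x are our own example):

// halfspace.cpp
// classifying a point relative to the hyperplane H_{p,a}
#include <iostream>
using namespace std;

int main()
{
 double p[3] = { 1.0, -2.0, 0.5 }; // normal vector
 double a[3] = { 0.0, 0.0, 0.0 };  // a point on the hyperplane
 double x[3] = { 1.0, 1.0, 3.0 };  // point to classify
 double s = 0.0;
 for(int i=0; i<3; i++) s += p[i]*(x[i] - a[i]);
 if(s > 0.0) cout << "x is in H+" << endl;       // this case: s = 0.5
 else if(s < 0.0) cout << "x is in H-" << endl;
 else cout << "x lies on the hyperplane" << endl;
 return 0;
}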
Definition. Two sets of points A and B in the n-dimensional space $\mathbf{R}^n$ are called
linearly separable if $n + 1$ real numbers $w_0, w_1, \ldots, w_n$ exist, such that every point
$(x_1, x_2, \ldots, x_n) \in A$ satisfies $\sum_{i=1}^{n} w_i x_i \ge w_0$ and every point $(x_1, x_2, \ldots, x_n) \in B$
satisfies $\sum_{i=1}^{n} w_i x_i < w_0$.
Definition. Two sets A and B of points in the n-dimensional space $\mathbf{R}^n$ are called
absolutely linearly separable if $n + 1$ real numbers $w_0, w_1, \ldots, w_n$ exist such that every
point $(x_1, x_2, \ldots, x_n) \in A$ satisfies $\sum_{i=1}^{n} w_i x_i > w_0$ and every point $(x_1, x_2, \ldots, x_n) \in B$
satisfies $\sum_{i=1}^{n} w_i x_i < w_0$.
Definition. The open (closed) positive half space associated with the n-dimensional
weight vector w is the set of all points $x \in \mathbf{R}^n$ for which $w^T x > 0$ ($w^T x \ge 0$). The
open (closed) negative half space associated with w is the set of all points $x \in \mathbf{R}^n$
for which $w^T x < 0$ ($w^T x \le 0$).
Example. Consider points $x \in \mathbf{R}^4$ with $p^T x = 4$; this is a hyperplane $H_{p,4}$. The
point $(1, 1, 0, 1)^T$ can be used to describe the two half spaces.
To understand the separation better, we can examine the effect of the division on
subspaces. The hyperplane divides the subspace corresponding to $x_3$ around the
origin. The hyperplane divides the subspace corresponding to $x_1$ around 1. The
same applies for the subspaces corresponding to $x_2$ and $x_4$. Thus we can classify
the points $(s, t, u, v)$ with $s, t, v \ge 1$ and $u \le 0$.
14.3 Perceptron
14.3.1 Introduction
The perceptron is the simplest form of a neural network used for the classification
of special types of patterns said to be linearly separable (i.e. patterns that lie
on opposite sides of a hyperplane). It consists of a single neuron with adjustable
synaptic weights $w_i$ and threshold $\theta$, with output
$$y = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} w_i x_i \ge \theta \\ 0 & \text{otherwise.} \end{cases}$$
The origin of the inputs is not important, irrespective of whether they come from
other perceptrons or another class of computing units. The geometric interpretation
of the processing performed by perceptrons is the same as with McCulloch-Pitts
elements. A perceptron separates the input space into two half-spaces. For points
belonging to one half-space the result of the computation is 0, for points belonging
to the other it is 1.
We can also formulate this definition using the Heaviside step function
$$H(x) := \begin{cases} 1 & \text{for } x \ge 0 \\ 0 & \text{for } x < 0. \end{cases}$$
Thus
$$H\left(\sum_{i=1}^{n} w_i x_i - \theta\right) = \begin{cases} 1 & \text{for } \sum_{i=1}^{n} w_i x_i \ge \theta \\ 0 & \text{for } \sum_{i=1}^{n} w_i x_i < \theta. \end{cases}$$
The equation
$$\sum_{i=1}^{n} w_i x_i = \theta$$
defines a hyperplane which divides the Euclidean space $\mathbf{R}^n$ into two half spaces.
In many cases it is more convenient to deal with perceptrons of threshold zero only.
This corresponds to linear separations which are forced to go through the origin of
the input space. The threshold $\theta$ of a perceptron with a threshold is converted
into the weight $-\theta$ of an additional input channel connected to the constant 1. This
extra weight connected to a constant is called the bias of the element. Thus the
input vector $(x_1, x_2, \ldots, x_n)$ must be extended with an additional 1, and the resulting
$(n+1)$-dimensional vector is
$$(x_0, x_1, \ldots, x_n), \qquad x_0 = 1$$
whereby $w_0 = -\theta$.
The threshold computation of a perceptron will be expressed using scalar products.
The arithmetic test computed by the perceptron is thus
$$w^T x \ge 0$$
where w and x are the extended weight and input vectors.
Example. If we are looking for the weights and threshold needed to implement the
AND function with a perceptron, the input vectors and their associated outputs are
$$(0,0) \mapsto 0, \quad (0,1) \mapsto 0, \quad (1,0) \mapsto 0, \quad (1,1) \mapsto 1.$$
If a perceptron with threshold zero is used, the input vectors must be extended and
the desired mappings are
$$(1,0,0) \mapsto 0, \quad (1,0,1) \mapsto 0, \quad (1,1,0) \mapsto 0, \quad (1,1,1) \mapsto 1.$$
A perceptron with three still unknown weights $(w_0, w_1, w_2)$ can carry out this task.
Example. The AND gate can be simulated using the perceptron. The AND gate is
given by
Input Output
0 0 0
0 1 0
1 0 0
1 1 1
Let
$$\theta = \frac{3}{2}, \qquad x_0 = (0,0)^T, \quad x_1 = (0,1)^T, \quad x_2 = (1,0)^T, \quad x_3 = (1,1)^T.$$
Then
$$w^T = (1, 1)$$
and the evaluation of $H(w^T x_j - \theta)$ for $j = 0, 1, 2, 3$ yields
$$H(w^T x_0 - \theta) = H(0 - \tfrac{3}{2}) = H(-\tfrac{3}{2}) = 0$$
$$H(w^T x_1 - \theta) = H(1 - \tfrac{3}{2}) = H(-\tfrac{1}{2}) = 0$$
$$H(w^T x_2 - \theta) = H(1 - \tfrac{3}{2}) = H(-\tfrac{1}{2}) = 0$$
$$H(w^T x_3 - \theta) = H(2 - \tfrac{3}{2}) = H(\tfrac{1}{2}) = 1.$$
Thus the perceptron simulates the AND gate.
Next we look for weights and a threshold reproducing the Boolean function given
in Table 14.1 with $y = H(w^T x - \theta)$.
Xl X2 X3 Y
0 0 0 0
0 0 1 0
0 1 0 1
0 1 1 1
1 0 0 0
1 0 1 0
1 1 0 1
1 1 1 0
Table 14.1: Function Table for the Boolean Function $(\overline{x}_1 \cdot x_2) + (x_2 \cdot \overline{x}_3)$
The condition that the perceptron reproduce the table leads to the inequalities
$$0 < \theta, \quad w_1 < \theta, \quad w_2 > \theta, \quad w_3 < \theta,$$
$$w_1 + w_2 > \theta, \quad w_1 + w_3 < \theta, \quad w_2 + w_3 > \theta, \quad w_1 + w_2 + w_3 < \theta$$
which admit, for example, the solution
$$w_1 = -1, \quad w_2 = 2, \quad w_3 = -1, \quad \theta = \frac{1}{2}.$$
14.3.2 Boolean Functions
Since we are considering logical functions of two variables, there are four possible
combinations for the input. The outputs for the four inputs are four bits which
uniquely distinguish each logical function. We use the number defined by these four
bits as a subscript for the name of the function. The function $(x_1, x_2) \mapsto 0$, for
example, is denoted by $f_0$ (since 0 corresponds to the bit string 0000). The AND
function is denoted by $f_8$ (since 8 corresponds to the bit string 1000), whereby the
output bits are ordered according to the following ordering of the inputs: (1,1),
(0,1), (1,0), (0,0).
Two of the functions cannot be computed in this way. They are the function XOR
(exclusive OR) (the function $f_6$) and the function XNOR ($f_9$). No line can produce the
necessary separation of the input space. This can also be shown analytically.
Let $w_1$ and $w_2$ be the weights of a perceptron with two inputs, and $\theta$ its threshold.
If the perceptron computes the XOR function the following four inequalities must
be fulfilled:
$$0 < \theta, \qquad w_2 \ge \theta, \qquad w_1 \ge \theta, \qquad w_1 + w_2 < \theta.$$
Since the threshold $\theta$ is positive, according to the first inequality, $w_1$ and $w_2$ are
positive too, according to the second and third inequalities. Therefore the inequality
$w_1 + w_2 < \theta$ cannot be true. This contradiction implies that no perceptron capable
of computing the XOR function exists. An analogous proof holds for the function $f_9$.
Thus using
$$y = H(w^T x - \theta)$$
we cannot represent all Boolean functions. However, we can realize the universal
NAND-gate (or universal NOR-gate). Thus any Boolean function can be realized
using a network of linear threshold gates. For example, the XOR gate can be
constructed as in Figure 14.1, with threshold $\theta = \frac{1}{2}$.
With $\theta > 0$ the output of the network is
$$y = H(w_1 t_1 + w_2 t_2 - \theta)
   = H\big(w_1 H(w_1 x_1 + w_2 s - \theta) + w_2 H(w_1 s + w_2 x_2 - \theta) - \theta\big)$$
where
$$s = H(w_1 x_1 + w_2 x_2 - \theta).$$
// xor.cpp
// XOR built from linear threshold gates
// (the gate construction below is a reconstruction; the original is incomplete)
#include <iostream>
using namespace std;

int H(double x) { return (x >= 0.0) ? 1 : 0; } // Heaviside step function

int XOR(int x1, int x2)
{
 int s = H(1.5 - x1 - x2); // s = NAND(x1,x2)
 int t1 = H(x1 + s - 1.5); // t1 = AND(x1,s)
 int t2 = H(x2 + s - 1.5); // t2 = AND(x2,s)
 return H(t1 + t2 - 0.5);  // XOR = OR(t1,t2)
}

int main()
{
 cout << "XOR(0,0) = " << XOR(0,0) << endl;
 cout << "XOR(0,1) = " << XOR(0,1) << endl;
 cout << "XOR(1,0) = " << XOR(1,0) << endl;
 cout << "XOR(1,1) = " << XOR(1,1) << endl;
 return 0;
}
/*
XOR(0,0) = 0
XOR(0,1) = 1
XOR(1,0) = 1
XOR(1,1) = 0
*/
14.3.3 Perceptron Learning
Learning algorithms can be divided into supervised and unsupervised methods. Su-
pervised learning denotes a method in which some input vectors are collected and
presented to the network. The output computed by the network is observed and
the deviation from the expected answer is measured. The weights are corrected ac-
cording to the magnitude of the error in the way defined by the learning algorithm.
Unsupervised learning is used when, for a given input, the exact numerical output
a network should produce is unknown. Assume, for example, that some points in
two-dimensional space are to be classified into three clusters. We can use a classifier
network with three output lines. Each of the three computing units at the output
must specialize by firing only for inputs corresponding to elements of each cluster.
If one unit fires, the others must keep silent. In this case we do not know a priori
which unit is going to specialize on which cluster. Generally we do not even know
how many well-defined clusters are present. The network must organize itself in
order to be able to associate clusters with units.
Supervised learning is further divided into methods which use reinforcement or error
correction. Reinforcement learning is used when after each presentation of an input-
output example we only know whether the network produces the desired result or
not. The weights are updated based on this information so that only the input
vector can be used for weight correction. In learning with error correction, the
magnitude of the error, together with the input vector, determines the magnitude
of the corrections to the weights (corrective learning).
The perceptron learning algorithm is an example of supervised learning with
reinforcement. Some variants use supervised learning with error correction.
The proof of convergence of the perceptron learning algorithm assumes that each
perceptron performs the test $w^T x > 0$. So far we have been working with perceptrons
which perform the test $w^T x \ge 0$. If a perceptron with threshold zero can
linearly separate two finite sets of input vectors, then only a small adjustment to its
weights is needed to obtain an absolute linear separation. This is a direct corollary
of the following proposition.
Proposition. Two finite sets of points, A and B, in n-dimensional space which are
linearly separable are also absolutely linearly separable.
A usual approach for starting the learning algorithm is to initialize the network
weights randomly and to improve these initial parameters, looking at each step to see
whether a better separation of the training set can be achieved. We identify points
(Xl, X2,'" ,xn ) in n-dimensional space with the vector x with the same coordinates.
Let P and N be two finite sets of points in Rn which we want to separate linearly.
A weight vector is sought so that the points in P belong to its associated positive
half-space and the points in N to the negative half-space. The error of a perceptron
with weight vector w is the number of incorrectly classified points. The learning
algorithm must minimize this error function E(w). Now we introduce the perceptron
learning algorithm, stated below. The training set consists of two sets, P and N, in n-dimensional
extended input space. We look for a vector w capable of absolutely separating both
sets, so that all vectors in P belong to the open positive half-space and all vectors
in N to the open negative half-space of the linear separation.
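A minimal statement of the algorithm (standard form; the program classify.cpp
below implements it):

Start: Generate an initial weight vector w at random.
Test: Choose a vector $x \in P \cup N$ at random. If $x \in P$ and $w^T x > 0$, or
$x \in N$ and $w^T x < 0$, go to Test.
Add: If $x \in P$ and $w^T x \le 0$, set $w := w + x$ and go to Test.
Subtract: If $x \in N$ and $w^T x \ge 0$, set $w := w - x$ and go to Test.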
This algorithm makes a correction to the weight vector whenever one of the selected
vectors in P or N has not been classified correctly. The perceptron convergence
theorem guarantees that if the two sets P and N are linearly separable the vector
w is updated only a finite number of times. The routine can be stopped when all
vectors are classified correctly.
Example. We want to separate the sets (in extended input space)
$$P = \{\, (1, 2.0, 2.0),\ (1, 1.5, 1.5) \,\}$$
corresponding to the points
$$\{\, (2.0, 2.0),\ (1.5, 1.5) \,\}$$
and
$$N = \{\, (1, 0, 1),\ (1, 1, 0),\ (1, 0, 0) \,\}$$
corresponding to the points
$$\{\, (0, 1),\ (1, 0),\ (0, 0) \,\}.$$
// classify.cpp
// perceptron learning algorithm
// (missing parts of the original listing reconstructed)
#include <iostream>
#include <stdlib.h>
#include <time.h>
using namespace std;

// P: p vectors, N: n vectors, both in d-dimensional extended space;
// on return w is a weight vector separating P and N
void classify(double** P, double** N, int p, int n, double* w, int d)
{
 int i, j, k, classified = 0;
 double sum, *x;
 srand(time(NULL));
 for(i=0; i<d; i++) w[i] = double(rand())/RAND_MAX;
 k = 0;
 while(!classified)
 {
  i = rand()%(p+n); // choose one of the p+n vectors
  if(i<p) x = P[i]; else x = N[i-p];
  for(j=0,sum=0; j<d; j++) sum += w[j]*x[j];
  if((i<p) && (sum<=0))  // misclassified vector from P
   for(j=0; j<d; j++) w[j] += x[j];
  if((i>=p) && (sum>=0)) // misclassified vector from N
   for(j=0; j<d; j++) w[j] -= x[j];
  k++;
  classified = 1;
  // check if the vectors are classified
  if((k%(2*p+2*n)) == 0)
  {
   for(i=0; (i<p)&&classified; i++)
   {
    for(j=0,sum=0; j<d; j++) sum += w[j]*P[i][j];
    if(sum <= 0) classified = 0;
   }
   for(i=0; (i<n)&&classified; i++)
   {
    for(j=0,sum=0; j<d; j++) sum += w[j]*N[i][j];
    if(sum >= 0) classified = 0;
   }
  }
  else classified = 0;
 }
}

int main()
{
 int i;
 double** P = new double*[2];
 for(i=0; i<2; i++) P[i] = new double[3];
 P[0][0] = 1.0; P[0][1] = 2.0; P[0][2] = 2.0;
 P[1][0] = 1.0; P[1][1] = 1.5; P[1][2] = 1.5;
 double** N = new double*[3];
 for(i=0; i<3; i++) N[i] = new double[3];
 N[0][0] = 1.0; N[0][1] = 0.0; N[0][2] = 1.0;
 N[1][0] = 1.0; N[1][1] = 1.0; N[1][2] = 0.0;
 N[2][0] = 1.0; N[2][1] = 0.0; N[2][2] = 0.0;
 double* w = new double[3];
 classify(P,N,2,3,w,3);
 cout << "w = (" << w[0] << "," << w[1] << "," << w[2] << ")" << endl;
 return 0;
}
14.3.4 Quadratic Threshold Gates

A quadratic threshold gate computes
$$y = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i}^{n} w_{ij} x_i x_j \ge \theta \\ 0 & \text{otherwise} \end{cases}$$
for $x \in \mathbf{R}^n$ and
$$y = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} w_{ij} x_i x_j \ge \theta \\ 0 & \text{otherwise} \end{cases}$$
for $x \in \{0, 1\}^n$.
for x E {O, 1}n. The only difference between the above two equations is the range of
the index j of the second summation in the double-summation term. The bounds
on the double summations eliminate WijXiXj and WjiXjXi duplications. Quadratic
threshold gates greatly increase the number of realizable Boolean functions when
compared to linear threshold gates.
Example. Consider the quadratic threshold gate which classifies points in $\mathbf{R}^2$
according to $g(x, y) \ge 0$ and $g(x, y) < 0$, where
$$g(x, y) = -x - y + 3xy + \frac{1}{2}.$$
This quadratic threshold gate can be used to implement the XNOR operation.
In the plane, the curve $g(x, y) = 0$ separates the region with $g(x, y) > 0$, which
contains the points (0,0) and (1,1), from the two regions with $g(x, y) < 0$, which
contain the points (0,1) and (1,0).
The following program illustrates a quadratic threshold gate for the XNOR opera-
tion.
// quadratic.cpp
// (the function f, the weights and the deallocations are reconstructions;
// the original listing is incomplete)
#include <iostream>
using namespace std;

// quadratic threshold gate in extended space (x[0] = 1):
// y = 1 if sum_i wv[i]*x[i] + sum_{1<=i<=j<=n} wm[i][j]*x[i]*x[j] >= 0
double f(double* x, double* wv, double** wm, int n)
{
 double sum = 0.0;
 for(int i=0; i<=n; i++) sum += wv[i]*x[i];
 for(int i=1; i<=n; i++)
  for(int j=i; j<=n; j++) sum += wm[i][j]*x[i]*x[j];
 return (sum >= 0.0) ? 1.0 : 0.0;
}

int main()
{
 int i;
 int n = 2;
 double T = 0.5; // constant term in g(x,y)
 double* x = new double[n+1];
 double* wv = new double[n+1];
 double** wm = new double*[n+1];
 for(i=0; i<=n; i++)
 {
  wm[i] = new double[n+1];
  for(int j=0; j<=n; j++) wm[i][j] = 0.0;
 }
 // weights for g(x,y) = -x - y + 3xy + 1/2
 wv[0] = T; wv[1] = -1.0; wv[2] = -1.0;
 wm[1][2] = 3.0;
 x[0] = 1.0;
 // case 1
 x[1] = 0.0; x[2] = 0.0;
 double r00 = f(x,wv,wm,n);
 cout << "r00 = " << r00 << endl; // 1
 // case 2
 x[1] = 0.0; x[2] = 1.0;
 double r01 = f(x,wv,wm,n);
 cout << "r01 = " << r01 << endl; // 0
 // case 3
 x[1] = 1.0; x[2] = 0.0;
 double r10 = f(x,wv,wm,n);
 cout << "r10 = " << r10 << endl; // 0
 // case 4
 x[1] = 1.0; x[2] = 1.0;
 double r11 = f(x,wv,wm,n);
 cout << "r11 = " << r11 << endl; // 1
 delete [] x;
 delete [] wv;
 for(i=0; i<=n; i++) delete [] wm[i];
 delete [] wm;
 return 0;
}
14.3.5 One and Two Layered Networks

The input sites are entry points for information into the network and do not perform
any computation. Results are transmitted to the output sites. The set N consists
of all computing elements in the network. The edges between all computing units
are weighted, as are the edges between input and output sites and computing units.
Layered architectures are those in which the set of computing units N is subdivided
into $\ell$ subsets $N_1, N_2, \ldots, N_\ell$ in such a way that only connections from units in $N_1$
go to units in $N_2$, from units in $N_2$ to units in $N_3$, etc. The input sites are only
connected to the units in the subset $N_1$, and the units in the subset $N_\ell$ are the only
ones connected to the output sites. The units in $N_\ell$ are the output units of the
network. The subsets $N_i$ are called the layers of the network. The set of input sites
is called the input layer, the set of output units is called the output layer. All other
layers with no direct connections from or to the outside are called hidden layers.
Usually the units in a layer are not connected to each other and the output sites
are omitted from the graphical representation. A neural network with a layered
architecture does not contain cycles. The input is processed and relayed from one
layer to the next, until the final result has been computed.

In layered architectures normally all units from one layer are connected to all other
units in the following layer. If there are m units in the first layer and n units in the
second one, the total number of weights is mn. The total number of connections
can be rather large.
For the AND gate the desired outputs for the four extended input vectors are
$$y_0 = 0, \quad y_1 = 0, \quad y_2 = 0, \quad y_3 = 1.$$
The perceptron learning calculations yield the weight vector and threshold
$$w^T = (0.6, 1.05), \qquad \theta = 1.3$$
with which we can simulate the AND gate.
In the program percand.cpp we use the notation of the extended space. Furthermore,
the threshold is also initialized to a small random value at t = 0.
// percand.cpp
// (the functions change and distance and the input patterns are
// reconstructions; the original listing is incomplete)
#include <iostream>
#include <math.h>
using namespace std;

double H(double z)
{
 if(z >= 0.0) return 1.0;
 else return 0.0;
}

// one learning epoch: delta rule over all m patterns
void change(double** x, double* yt, double* w, double eta, int m, int n)
{
 for(int k=0; k<m; k++)
 {
  double sum = 0.0;
  for(int i=0; i<n; i++) sum += w[i]*x[k][i];
  double y = H(sum);
  for(int i=0; i<n; i++) w[i] += eta*(yt[k] - y)*x[k][i];
 }
}

// Euclidean distance between the weight vectors w and wt
double distance(double* w, double* wt, int n)
{
 double sum = 0.0;
 for(int i=0; i<n; i++) sum += (w[i]-wt[i])*(w[i]-wt[i]);
 return sqrt(sum);
}

int main()
{
 // number of input vectors (patterns) is m = 4
 // length of each input vector n = 3
 int m = 4;
 int n = 3;
 double** x = new double*[m];
 for(int k=0; k<m; k++) x[k] = new double[n];
 // input patterns in extended space (x[k][0] = 1 is the bias input)
 x[0][0] = 1.0; x[0][1] = 0.0; x[0][2] = 0.0;
 x[1][0] = 1.0; x[1][1] = 0.0; x[1][2] = 1.0;
 x[2][0] = 1.0; x[2][1] = 1.0; x[2][2] = 0.0;
 x[3][0] = 1.0; x[3][1] = 1.0; x[3][2] = 1.0;
 // desired output
 double* yt = new double[m];
 yt[0] = 0.0; yt[1] = 0.0; yt[2] = 0.0; yt[3] = 1.0;
 // weight vector, w[0] = - theta (threshold)
 double* w = new double[n];
 // initialized to small random numbers
 w[0] = 0.01; w[1] = 0.005; w[2] = 0.006;
 // learning rate
 double eta = 0.5;
 double* wt = new double[n];
 for(int i=0; i<n; i++) wt[i] = w[i];
 for(;;)
 {
  change(x,yt,w,eta,m,n);
  double dist = distance(w,wt,n);
  if(dist < 0.0001) break;
  for(int i=0; i<n; i++) wt[i] = w[i];
 }
 cout << "w = (" << w[0] << "," << w[1] << "," << w[2] << ")" << endl;
 for(int k=0; k<m; k++) delete [] x[k];
 delete [] x;
 delete [] w;
 delete [] wt;
 delete [] yt;
 return 0;
}
The calculations for the XOR gate are as follows. We work in the extended space.
The network maps 1) input layer -> hidden layer, 2) hidden layer -> output. The
input patterns from the hidden layer are (1,0,0), (1,0,1), (1,1,0) and (1,0,0). Thus
the first and the last patterns are the same. The weights are given in the program
XOR1.cpp below. Consider the input pattern (1,0,0) from the hidden layer (already
considered above).
// XOR1.cpp
// (the function map and the input -> hidden weights are reconstructions;
// the original listing is incomplete)
#include <iostream>
using namespace std;

double H(double s)
{
 if(s >= 0.0) return 1.0;
 else return 0.0;
}

// forward pass through the two-layer network
double map(double*** w, double* x, int size2, int size3)
{
 double* z = new double[size3];
 z[0] = 1.0; // bias signal for the hidden layer
 for(int j=0; j<size2; j++)
 {
  double sum = 0.0;
  for(int i=0; i<size3; i++) sum += w[0][j][i]*x[i];
  z[j+1] = H(sum);
 }
 double y = 0.0;
 for(int i=0; i<size3; i++) y += w[1][0][i]*z[i];
 delete [] z;
 y = H(y);
 return y;
}

int main()
{
 int size1, size2, size3;
 size1 = 2; size2 = 2; size3 = 3;
 int i, j, k;
 double*** w = NULL;
 w = new double**[size1];
 for(i=0; i<size1; i++)
 {
  w[i] = new double*[size2];
  for(j=0; j<size2; j++) w[i][j] = new double[size3];
 }
 // input layer -> hidden layer (weights reconstructed)
 w[0][0][0] = -0.5; w[0][0][1] = 1.0; w[0][0][2] = -1.0;
 w[0][1][0] = -0.5; w[0][1][1] = -1.0; w[0][1][2] = 1.0;
 // hidden layer -> output layer
 w[1][0][0] = -0.5; w[1][0][1] = 1.0; w[1][0][2] = 1.0;
 w[1][1][0] = 0.0; w[1][1][1] = 0.0; w[1][1][2] = 0.0;
 // input patterns
 int p = 4; // number of input patterns
 int n = 3; // length of each input pattern
 double** x = NULL;
 x = new double*[p];
 for(k=0; k<p; k++) x[k] = new double[n];
 x[0][0] = 1.0; x[0][1] = 0.0; x[0][2] = 0.0;
 x[1][0] = 1.0; x[1][1] = 0.0; x[1][2] = 1.0;
 x[2][0] = 1.0; x[2][1] = 1.0; x[2][2] = 0.0;
 x[3][0] = 1.0; x[3][1] = 1.0; x[3][2] = 1.0;
 double result;
 result = map(w,x[0],size2,size3);
 cout << "result = " << result << endl; // => 0
 result = map(w,x[1],size2,size3);
 cout << "result = " << result << endl; // => 1
 result = map(w,x[2],size2,size3);
 cout << "result = " << result << endl; // => 1
 result = map(w,x[3],size2,size3);
 cout << "result = " << result << endl; // => 0
 return 0;
}
Let
$$\{\, x_k, d_k \,\}$$
be the training data, where $k = 0, 1, \ldots, m-1$. Here m is the number of training
examples (patterns). The $x_k$ ($k = 0, 1, \ldots, m-1$) are the input patterns and the
$d_k$ are the corresponding (desired) output patterns. One complete presentation
of the entire training set during the learning process is called an epoch, i.e. the sequence
$$x_0, d_0, \quad x_1, d_1, \quad \ldots, \quad x_{m-1}, d_{m-1}.$$
The first example $x_0, d_0$ in the epoch is presented to the network, and the sequence
of forward and backward computations described below is performed, resulting in
certain adjustments to the synaptic weights and threshold levels of the network.
Then the second example $x_1, d_1$ in the epoch is presented, and the sequence of
forward and backward computations is repeated, resulting in further adjustments
to the synaptic weights and threshold levels. This process is continued until the last
training pattern $x_{m-1}, d_{m-1}$ is taken into account.
Consider functions of the form
$$G(x, w, \alpha, \theta) = \sum_{j=1}^{N} \alpha_j f(w_j^T x + \theta_j).$$
Hornik et al. [92], employing the Stone-Weierstrass theorem, and Funahashi [69]
proved similar theorems stating that a one-hidden-layer feedforward neural network
is capable of approximating uniformly any continuous multivariate function to any
desired degree of accuracy.
14.4 Multilayer Perceptrons
We consider one hidden layer. The notations we use follow closely Hassoun [82].
Thus we consider a two-layer feedforward architecture. This network receives a set
of scalar signals
$$x_0, x_1, \ldots, x_{n-1}$$
where $x_0$ is a bias signal set to 1. This set of signals constitutes an input vector
$x_k \in \mathbf{R}^n$. The layer receiving this input signal is called the hidden layer. The hidden
layer has J units. The output of the hidden layer is a J-dimensional real-valued
vector $z_k = (z_0, z_1, \ldots, z_{J-1})$, where we set $z_0 = 1$ (bias signal). The vector $z_k$
supplies the input for the output layer of L units. The output layer generates an
L-dimensional vector $y_k$ in response to the input vector $x_k$ which, when the network
is fully trained, should be identical (or very close) to the desired output vector $d_k$
associated with $x_k$.
The two activation functions $f_h$ (input layer to hidden layer) and $f_o$ (hidden layer
to output layer) are assumed to be differentiable functions. We use the logistic
functions
$$f_h(s) := \frac{1}{1 + \exp(-\lambda_h s)}, \qquad f_o(s) := \frac{1}{1 + \exp(-\lambda_o s)}.$$
Both are of the form
$$f(s) = \frac{1}{1 + \exp(-\lambda s)}$$
with the derivative
$$\frac{df}{ds} = \lambda f (1 - f)$$
since $df/ds = \lambda \exp(-\lambda s)/(1 + \exp(-\lambda s))^2$.
The components of the desired output vector $d_k$ must be chosen within the range
of $f_o$. We denote by $w_{ji}$ the weight of the jth hidden unit associated with the input
signal $x_i$. Thus the index i runs from 0 to n-1, where $x_0 = 1$, and j runs from 1 to
J-1. We set $w_{0i} = 0$. Now we have m input/output pairs of vectors
$$\{\, (x_k, d_k) \,\}$$
where the index k runs from 0 to m-1. The aim of the algorithm is to adaptively
adjust the $(J-1)n + LJ$ weights of the network such that the underlying
function/mapping represented by the training set is approximated or learned. Since
the learning is supervised, i.e. the target outputs are available, we can define an
error function. We denote by $w_{lj}$ the weight of the lth output unit associated with
the input signal $z_j$ from the hidden layer. We derive a supervised learning rule for
adjusting the weights $w_{ji}$ and $w_{lj}$ such that the error function given below
is minimized (in a local sense) over the training set. Here w represents the set of
all weights in the network.
Since the targets for the output units are given, we can use the delta rule directly
for updating the $w_{lj}$ weights. We define the update
$$w_{lj}^{\mathrm{new}} = w_{lj} + \eta_o (d_l - y_l) f_o'(\mathrm{net}_l) z_j$$
where
$$\mathrm{net}_l := \sum_{j=0}^{J-1} w_{lj} z_j$$
is the weighted sum for the lth output unit, $f_o'$ is the derivative of $f_o$ with respect to
$\mathrm{net}_l$, and $w_{lj}^{\mathrm{new}}$ and $w_{lj}$ are the updated (new) and current weight values, respectively.
Here $\eta_o$ is a learning rate. The $z_j$ values are calculated by propagating the input
vector x through the hidden layer according to
$$z_j = f_h\left(\sum_{i=0}^{n-1} w_{ji} x_i\right), \qquad j = 1, 2, \ldots, J-1$$
and $z_0 = 1$ (bias signal). For the hidden-layer weights $w_{ji}$
we do not have a set of target values (desired outputs) for hidden units. However, we
can derive the learning rule for hidden units by attempting to minimize the output-layer
error. This amounts to propagating the output errors $(d_l - y_l)$ back through
the output layer toward the hidden units in an attempt to estimate dynamic targets
for these units. Thus a gradient descent is performed on the criterion function
$$E(w) = \frac{1}{2} \sum_{l=0}^{L-1} (d_l - y_l)^2$$
where w represents the set of all weights in the network. The gradient is calculated
with respect to the hidden weights
$$\Delta w_{ji} = -\eta_h \frac{\partial E}{\partial w_{ji}}, \qquad j = 1, 2, \ldots, J-1, \quad i = 0, 1, \ldots, n-1$$
where the partial derivative is to be evaluated at the current weight values. We find
$$\frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial z_j} \frac{\partial z_j}{\partial \mathrm{net}_j} \frac{\partial \mathrm{net}_j}{\partial w_{ji}}$$
where
$$\mathrm{net}_j := \sum_{i=0}^{n-1} w_{ji} x_i.$$
Evaluating the three factors we obtain
$$\Delta w_{ji} = \eta_h \left( \sum_{l=0}^{L-1} (d_l - y_l) f_o'(\mathrm{net}_l) w_{lj} \right) f_h'(\mathrm{net}_j)\, x_i.$$
Now we can define an estimated target $d_j$ for the jth hidden unit implicitly in terms
of the backpropagated error signal as follows
$$d_j - z_j := \sum_{l=0}^{L-1} (d_l - y_l) f_o'(\mathrm{net}_l) w_{lj}.$$
The complete approach for updating weights in a feedforward neural net utilizing
these rules can be summarized as follows. We do a pattern-by-pattern updating of
the weights.

1. Initialization. Initialize all weights to small random values and refer to them as
current weights $w_{lj}$ and $w_{ji}$.

2. Learning rate. Set the learning rates $\eta_o$ and $\eta_h$ to small positive values.

3. Presentation of training example. Select an input pattern $x_k$ from the training set
(preferably at random) and propagate it through the network, thus generating hidden-
and output-unit activities based on the current weight settings. Thus find $z_j$ and $y_l$.

4. Forward computation. Use the desired target vector $d_k$ associated with $x_k$, and
employ the update rules above to compute the output-layer and hidden-layer weight
changes. The current weights are used in these computations. In general, enhanced
error correction may be achieved if one employs the updated output-layer weights.

7. Test for convergence. This is done by checking the output error function to
see if its magnitude is below some given threshold. Iterate the computation by
presenting new epochs of training examples to the network until the free parameters
of the network stabilize their values. The order of presentation of training examples
should be randomized from epoch to epoch. The learning rate parameter is typically
adjusted (and usually decreased) as the number of training iterations increases.
The following table gives the training set for the odd parity function over four bits.
The equation is
$$P = \overline{A_3 \oplus A_2 \oplus A_1 \oplus A_0}$$
where P is the odd parity bit and $A_0$, $A_1$, $A_2$ and $A_3$ are the inputs.
Inputs Parity
0 0 0 0 1
0 0 0 1 0
0 0 1 0 0
0 0 1 1 1
0 1 0 0 0
0 1 0 1 1
0 1 1 0 1
0 1 1 1 0
1 0 0 0 0
1 0 0 1 1
1 0 1 0 1
1 0 1 1 0
1 1 0 0 1
1 1 0 1 0
1 1 1 0 0
1 1 1 1 1
// backpr2.cpp
// back propagation
#include <iostream>
#include <math.h> // for exp
using namespace std;

int main()
{
 int k, i, j, l, t; // summation indices
 // k runs over all input patterns k = 0, 1, .., m-1
 // l runs over all output units l = 0, 1, .., L-1
 // j runs over all the hidden layer units j = 0, 1, .., J-1
 // i runs over the length of the input vector i = 0, 1, .., n-1
 // learning rates
 double etao = 0.05;
 double etah = 0.05;
 // memory allocations
 double** x = NULL;
 int m = 16; // number of input vectors for parity problem
 int n = 5;  // length of each input vector for parity problem
 // input vectors
 x = new double*[m];
 for(k=0; k<m; k++) x[k] = new double[n];
 x[0][0] = 1.0; x[0][1] = 0.0; x[0][2] = 0.0; x[0][3] = 0.0;
 x[0][4] = 0.0;
 x[1][0] = 1.0; x[1][1] = 0.0; x[1][2] = 0.0; x[1][3] = 0.0;
 x[1][4] = 1.0;
 x[2][0] = 1.0; x[2][1] = 0.0; x[2][2] = 0.0; x[2][3] = 1.0;
 x[2][4] = 0.0;
 x[3][0] = 1.0; x[3][1] = 0.0; x[3][2] = 0.0; x[3][3] = 1.0;
 x[3][4] = 1.0;
 x[4][0] = 1.0; x[4][1] = 0.0; x[4][2] = 1.0; x[4][3] = 0.0;
 x[4][4] = 0.0;
 x[5][0] = 1.0; x[5][1] = 0.0; x[5][2] = 1.0; x[5][3] = 0.0;
 x[5][4] = 1.0;
 x[6][0] = 1.0; x[6][1] = 0.0; x[6][2] = 1.0; x[6][3] = 1.0;
 x[6][4] = 0.0;
 x[7][0] = 1.0; x[7][1] = 0.0; x[7][2] = 1.0; x[7][3] = 1.0;
 x[7][4] = 1.0;
 x[8][0] = 1.0; x[8][1] = 1.0; x[8][2] = 0.0; x[8][3] = 0.0;
 x[8][4] = 0.0;
 x[9][0] = 1.0; x[9][1] = 1.0; x[9][2] = 0.0; x[9][3] = 0.0;
 x[9][4] = 1.0;
 x[10][0] = 1.0; x[10][1] = 1.0; x[10][2] = 0.0; x[10][3] = 1.0;
 x[10][4] = 0.0;
 x[11][0] = 1.0; x[11][1] = 1.0; x[11][2] = 0.0; x[11][3] = 1.0;
 x[11][4] = 1.0;
 x[12][0] = 1.0; x[12][1] = 1.0; x[12][2] = 1.0; x[12][3] = 0.0;
 x[12][4] = 0.0;
 x[13][0] = 1.0; x[13][1] = 1.0; x[13][2] = 1.0; x[13][3] = 0.0;
 x[13][4] = 1.0;
 x[14][0] = 1.0; x[14][1] = 1.0; x[14][2] = 1.0; x[14][3] = 1.0;
 x[14][4] = 0.0;
 x[15][0] = 1.0; x[15][1] = 1.0; x[15][2] = 1.0; x[15][3] = 1.0;
 x[15][4] = 1.0;
 Wc[0][0] = 0.0; Wc[0][1] = 0.0; Wc[0][2] = 0.0; Wc[0][3] = 0.1;
 Wc[0][4] = -0.2;
 Wc[1][0] = -0.2; Wc[1][1] = 0.5; Wc[1][2] = -0.5; Wc[1][3] = 0.3;
 Wc[1][4] = 0.1;
 Wc[2][0] = -0.3; Wc[2][1] = -0.3; Wc[2][2] = 0.7; Wc[2][3] = 0.1;
 Wc[2][4] = -0.2;
 Wc[3][0] = 0.2; Wc[3][1] = 0.1; Wc[3][2] = 0.5; Wc[3][3] = -0.3;
 // new
 double** Wnew = NULL;
 Wnew = new double*[J];
 for(j=0; j<J; j++) Wnew[j] = new double[n];
 // weight matrix (hidden layer -> output layer)
 // current
 double** Whc = NULL;
 Whc = new double*[L];
 for(l=0; l<L; l++) Whc[l] = new double[J];
 // new
 double** Whnew = NULL; Whnew = new double*[L];
 for(l=0; l<L; l++) Whnew[l] = new double[J];
 // training session
 int T = 10000; // number of iterations
 for(t=0; t<T; t++)
 {
  E[k] = 0.0;
  double sum = 0.0;
  for(l=0; l<L; l++)
   sum += (d[k][l] - y[l])*(d[k][l] - y[l]);
  E[k] = sum/2.0;
  totalE += E[k];
  } // end for loop over all input patterns
  if(totalE < 0.0005) goto L;
  else totalE = 0.0;
 } // end training session
 L:
 cout << "number of iterations " << t << endl;
 // input (1,0,0,0,1)
 for(j=1; j<J; j++)
 {
  netj[j] = scalar(x[1],Wc[j],n);
  z[j] = fh(netj[j]);
 }
 for(l=0; l<L; l++) // (this output loop reconstructed; lost in the original)
 {
  netl[l] = scalar(z,Whc[l],J);
  y[l] = fo(netl[l]);
  cout << "y[" << l << "] = " << y[l] << endl;
 }
 // input (1,0,0,1,0)
 for(j=1; j<J; j++)
 {
  netj[j] = scalar(x[2],Wc[j],n);
  z[j] = fh(netj[j]);
 }
 for(l=0; l<L; l++) // (this output loop reconstructed; lost in the original)
 {
  netl[l] = scalar(z,Whc[l],J);
  y[l] = fo(netl[l]);
  cout << "y[" << l << "] = " << y[l] << endl;
 }
 // input (1,0,0,1,1)
 for(j=1; j<J; j++)
 {
  netj[j] = scalar(x[3],Wc[j],n);
  z[j] = fh(netj[j]);
 }
 for(l=0; l<L; l++)
 {
  netl[l] = scalar(z,Whc[l],J);
  y[l] = fo(netl[l]);
  cout << "y[" << l << "] = " << y[l] << endl;
 }
 // input (1,0,1,0,0)
 for(j=1; j<J; j++)
 {
  netj[j] = scalar(x[4],Wc[j],n);
  z[j] = fh(netj[j]);
 }
 for(l=0; l<L; l++)
 {
  netl[l] = scalar(z,Whc[l],J);
  y[l] = fo(netl[l]);
  cout << "y[" << l << "] = " << y[l] << endl;
 }
 // input (1,0,1,0,1)
 for(j=1; j<J; j++)
 {
  netj[j] = scalar(x[5],Wc[j],n);
  z[j] = fh(netj[j]);
 }
 for(l=0; l<L; l++)
 {
  netl[l] = scalar(z,Whc[l],J);
  y[l] = fo(netl[l]);
  cout << "y[" << l << "] = " << y[l] << endl;
 }
 // input (1,0,1,1,0)
 for(j=1; j<J; j++)
 {
  netj[j] = scalar(x[6],Wc[j],n);
  z[j] = fh(netj[j]);
 }
 for(l=0; l<L; l++)
 {
  netl[l] = scalar(z,Whc[l],J);
  y[l] = fo(netl[l]);
  cout << "y[" << l << "] = " << y[l] << endl;
 }
 // input (1,0,1,1,1)
 for(j=1; j<J; j++)
 {
  netj[j] = scalar(x[7],Wc[j],n);
  z[j] = fh(netj[j]);
 }
 for(l=0; l<L; l++)
 {
  netl[l] = scalar(z,Whc[l],J);
  y[l] = fo(netl[l]);
  cout << "y[" << l << "] = " << y[l] << endl;
 }
 // input (1,1,0,0,0)
 for(j=1; j<J; j++)
 {
  netj[j] = scalar(x[8],Wc[j],n);
  z[j] = fh(netj[j]);
 }
 for(l=0; l<L; l++)
 {
  netl[l] = scalar(z,Whc[l],J);
  y[l] = fo(netl[l]);
  cout << "y[" << l << "] = " << y[l] << endl;
 }
 // input (1,1,0,0,1)
 for(j=1; j<J; j++)
 {
  netj[j] = scalar(x[9],Wc[j],n);
  z[j] = fh(netj[j]);
 }
 for(l=0; l<L; l++)
 {
  netl[l] = scalar(z,Whc[l],J);
  y[l] = fo(netl[l]);
  cout << "y[" << l << "] = " << y[l] << endl;
 }
 // input (1,1,0,1,0)
 for(j=1; j<J; j++)
 {
  netj[j] = scalar(x[10],Wc[j],n);
  z[j] = fh(netj[j]);
 }
 for(l=0; l<L; l++)
 {
  netl[l] = scalar(z,Whc[l],J);
  y[l] = fo(netl[l]);
  cout << "y[" << l << "] = " << y[l] << endl;
 }
 // input (1,1,0,1,1)
 for(j=1; j<J; j++)
 {
  netj[j] = scalar(x[11],Wc[j],n);
  z[j] = fh(netj[j]);
 }
 for(l=0; l<L; l++)
 {
  netl[l] = scalar(z,Whc[l],J);
  y[l] = fo(netl[l]);
  cout << "y[" << l << "] = " << y[l] << endl;
 }
 // input (1,1,1,0,0)
 for(j=1; j<J; j++)
 {
  netj[j] = scalar(x[12],Wc[j],n);
  z[j] = fh(netj[j]);
 }
 for(l=0; l<L; l++)
 {
  netl[l] = scalar(z,Whc[l],J);
  y[l] = fo(netl[l]);
  cout << "y[" << l << "] = " << y[l] << endl;
 }
 // input (1,1,1,0,1)
 for(j=1; j<J; j++)
 {
  netj[j] = scalar(x[13],Wc[j],n);
  z[j] = fh(netj[j]);
 }
 for(l=0; l<L; l++)
 {
  netl[l] = scalar(z,Whc[l],J);
  y[l] = fo(netl[l]);
  cout << "y[" << l << "] = " << y[l] << endl;
 }
 // input (1,1,1,1,0)
 for(j=1; j<J; j++)
 {
  netj[j] = scalar(x[14],Wc[j],n);
  z[j] = fh(netj[j]);
 }
 for(l=0; l<L; l++)
 {
  netl[l] = scalar(z,Whc[l],J);
  y[l] = fo(netl[l]);
  cout << "y[" << l << "] = " << y[l] << endl;
 }
 // input (1,1,1,1,1)
 for(j=1; j<J; j++)
 {
  netj[j] = scalar(x[15],Wc[j],n);
  z[j] = fh(netj[j]);
 }
 for(l=0; l<L; l++)
 {
  netl[l] = scalar(z,Whc[l],J);
  y[l] = fo(netl[l]);
  cout << "y[" << l << "] = " << y[l] << endl;
 }
 return 0;
}
The output is

y[0] = 5.96064e-07
y[0] = 5.32896e-07
y[0] = 0.989954
y[0] = 0.0183719
y[0] = 0.986117
y[0] = 0.98594
y[0] = 0.0110786
y[0] = 0.0200707
y[0] = 0.998834
y[0] = 0.998846
y[0] = 0.00840843
y[0] = 0.983464
y[0] = 0.00589264
y[0] = 0.00599696
y[0] = 0.996012

The values y[0] approximate the parity function.
Chapter 15
Genetic Algorithms
15.1 Introduction
Evolutionary methods have gained considerable popularity as general-purpose
robust optimization and search techniques. The failure of traditional optimization
techniques in searching complex, uncharted and vast payoff landscapes riddled
with multimodality and complex constraints has generated interest in alternative
approaches.
Genetic algorithms (Holland [89], Goldberg [72], Michalewicz [116], Steeb [164]) are
self-adapting strategies for searching, based on the random exploration of the solu-
tion space coupled with a memory component which enables the algorithms to learn
the optimal search path from experience. They are the most prominent, widely used
representatives of evolutionary algorithms, a class of probabilistic search algorithms
based on the model of organic evolution. The starting point of all evolutionary
algorithms is the population (also called farm) of individuals (also called animals,
chromosomes, strings). The individuals are composed of genes which may take on a
number of values (in most cases 0 and 1) called alleles. The value of a gene is called
its allelic value, and it ranges on a set that is usually restricted to {0, 1}. Thus these
individuals are represented as binary strings of fixed length, for example
"10001011101"
If the binary string has length N, then 2N binary strings can be formed. If we
describe a DNA molecule the alphabet would be a set of 4 symbols, {A, C, G, T}
where A stands for Adenine, C stands for Cytosine, G stands for Guanine and
T stands for Thymine. Strings of length N from this set allow for 4N different
individuals. We can also associate unsigned integers with these strings.
For example
"TCCGAT"
can be associated with an unsigned integer, e.g. by interpreting the string as a
number in base 4 (see the sketch below).
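A small sketch (the digit assignment A = 0, C = 1, G = 2, T = 3 is our own choice):

// dna.cpp
// associating an unsigned integer with a string over { A, C, G, T }
#include <iostream>
#include <string>
using namespace std;

int main()
{
 string s = "TCCGAT";
 unsigned long n = 0;
 for(int i=0; i<int(s.length()); i++)
 {
  unsigned long d = 0;
  switch(s[i]) { case 'A': d = 0; break; case 'C': d = 1; break;
                 case 'G': d = 2; break; case 'T': d = 3; break; }
  n = 4*n + d; // interpret the string as a base 4 number
 }
 cout << n << endl; // 3427
 return 0;
}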
For the four colour problem we also use an alphabet of 4 symbols, {R, G, B, Y}
where R stands for red, G stands for green, B stands for blue and Y stands for
yellow.
Each of the individuals represents a search point in the space of potential solu-
tions to a given optimization problem. Then random operators model selection,
reproduction, crossover and mutation. The optimization problem gives quality in-
formation (the fitness function, or fitness for short) for the individuals and the selection
process favours individuals of higher fitness to transfer their information (string) to
the next generation. The fitness of each string is the corresponding function value.
Genetic algorithms are specifically designed to treat problems involving large search
spaces containing multiple local minima. The algorithms have been applied to a
large number of optimization problems. Examples are solutions of ordinary differ-
ential equations, the smooth genetic algorithm, genetic algorithms in coding theory,
Markov chain analysis, the DNA molecule.
Search methods can be classified as follows.

(1) Classical or calculus-based. This uses a deterministic approach to find the best
solution. This method requires the knowledge of the gradient or higher-order
derivatives. The technique can be applied to well-behaved problems.
(2) Enumerative. With these methods, all possible solutions are generated and
tested to find the optimal solution. This requires excessive computation in
problems involving a large number of variables.
(3) Random. Guided random search methods are enumerative in nature; how-
ever, they use additional information to guide the search process. Simulated
annealing and evolutionary algorithms are typical examples of this class of
search methods.
15.2 The Sequential Genetic Algorithm
Step 2. Compute the fitness $f(A_i(t))$ of each individual $A_i(t)$ of the current population A(t).
3) Mutation. The standard mutation operator modifies each allele of each individual
of the population with a certain probability, the mutation probability $p_m$, which is
a system parameter. Usually, the new allelic value is randomly chosen with uniform
probability distribution.
4) Local search. The necessity of this operator for optimization problems is still
under debate. Local search is usually a simple gradient-descent heuristic search
that carries each solution to a local optimum. The idea behind this is that search in
the space of local optima is much more effective than search in the whole solution
space.
The purpose of parent selection (also called setting up the farm of animals) in a
genetic algorithm is to give more reproductive chances, on the whole, to those pop-
ulation members that are the most fit. We use a binary string as a chromosome to
represent the real value of the variable x. The length of the binary string depends on
the required precision. A population or farm could look like
"10101110011111110"
"00111101010100001"
"10101110010000110"
For the crossover operation the individuals of the population are randomly paired.
Each pair is then recombined, choosing one point in accordance with a uniformly
distributed probability over the length of the individual strings (parents) and cutting
them in two parts accordingly. The new individuals (offspring) are formed from the
first part of one parent and the last part of the other. An example is
1011011000100101 parent
0010110110110111 parent
I I
1011010110110101 child
0010111000100111 child
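A small sketch of one-point crossover (the crossing site is our own choice):

// crossover.cpp
#include <iostream>
#include <string>
using namespace std;

int main()
{
 string p1 = "1011011000100101";
 string p2 = "0010110110110111";
 int cut = 6; // crossing site
 string c1 = p1.substr(0,cut) + p2.substr(cut);
 string c2 = p2.substr(0,cut) + p1.substr(cut);
 cout << c1 << " child" << endl;
 cout << c2 << " child" << endl;
 return 0;
}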
The mutation operator modifies each allele (a bit in the bitstring) of each individual
of the population with a certain probability. The new allele value is randomly chosen
with uniform probability distribution. An example is
1011011001011001 parent
I
1011111001011001 child
The bit position is randomly selected. Whether the child is selected is decided by
the fitness function. A sketch of the mutation operator follows.
We have to map the binary string into a real number x in a given interval [a, b]
(a < b). The length of the binary string depends on the required precision. The
total length of the interval is b - a. The binary string is denoted by
$$s_{N-1} s_{N-2} \ldots s_1 s_0$$
where $s_0$ is the least significant bit (LSB) and $s_{N-1}$ is the most significant bit (MSB).
In the first step we convert from base 2 to base 10
$$m = \sum_{i=0}^{N-1} s_i 2^i.$$
In the second step we calculate the corresponding real number on the interval [a, b]
$$x = a + m \frac{b - a}{2^N - 1}.$$
Obviously if the bit string is given by "000...00" we obtain x = a and if the
bitstring is given by "111...11" we obtain x = b.
In the two-dimensional case we consider the domain
$$[a, b] \times [c, d]$$
which is a subset of $\mathbf{R}^2$. The coordinates are $x_1$ and $x_2$, i.e. $x_1 \in [a, b]$ and $x_2 \in [c, d]$.
Given a bitstring
$$s_{N-1} s_{N-2} \ldots s_1 s_0$$
of length $N = N_1 + N_2$. The block
$$s_{N_1 - 1} s_{N_1 - 2} \ldots s_0$$
yields
$$m_1 = \sum_{i=0}^{N_1 - 1} s_i 2^i$$
and therefore
$$x_1 = a + m_1 \frac{b - a}{2^{N_1} - 1}.$$
The block
$$s_{N-1} s_{N-2} \ldots s_{N_1}$$
yields
$$m_2 = \sum_{i=N_1}^{N-1} s_i 2^{i - N_1}$$
and therefore
$$x_2 = c + m_2 \frac{d - c}{2^{N_2} - 1}.$$
Example. In the one-dimensional case consider the binary string 10101101 of length
8 and the interval [-1, 1]. Therefore m = 173. Thus
$$x = -1 + 173 \cdot \frac{2}{2^8 - 1} = 0.357.$$
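A short sketch of this decoding (our own program, using the values of the example):

// decode.cpp
// decoding a bitstring (MSB first) into a real number in [a,b]
#include <iostream>
#include <string>
using namespace std;

int main()
{
 string s = "10101101";
 double a = -1.0, b = 1.0;
 unsigned long m = 0;
 for(int i=0; i<int(s.length()); i++) m = 2*m + (s[i] - '0');
 int N = s.length();
 double x = a + m*(b - a)/((1UL << N) - 1.0);
 cout << "m = " << m << endl; // 173
 cout << "x = " << x << endl; // 0.356863
 return 0;
}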
// reverse.cpp
#include <iostream>

// reverse the order of the bits of n (body reconstructed; lost in the original)
unsigned long reverse(unsigned long n)
{
 unsigned long r = 0;
 for(; n > 0; n >>= 1) r = (r << 1) | (n & 1);
 return r;
}

int main() { std::cout << reverse(23) << std::endl; return 0; } // prints 29
15.3 Gray Code

The advantage of the Gray code for genetic algorithms is that the mutation operator
does not cause a large change in the numeric value of an animal in the population.
Large changes are provided by additions of randomly initialized animals to the
population at regular intervals. Thus mutation provides a more local search.
The conversion from standard binary encoding to binary Gray code is achieved as
follows. If we want to convert the binary sequence $b_{n-1} b_{n-2} \ldots b_0$ to its binary Gray
code $g_{n-1} g_{n-2} \ldots g_0$, the binary Gray code is given by
$$g_{n-1} = b_{n-1}, \qquad g_i = b_{i+1} \oplus b_i \quad \text{for } 0 \le i \le n-2.$$
To use numerical values in calculations we need to apply the inverse Gray encoding.
To convert the binary Gray code $g_{n-1} g_{n-2} \ldots g_0$ to the binary number $b_{n-1} b_{n-2} \ldots b_0$
we use
$$b_i = g_{n-1} \oplus g_{n-2} \oplus \cdots \oplus g_i.$$
The following Java program gives an implementation. We apply the built-in BitSet
class in Java.
// Gray.java
// (class structure and declarations reconstructed; the original
// listing is incomplete)
import java.util.*;

public class Gray
{
 static int size;

 public static void main(String[] args)
 {
  BitSet[] b = new BitSet[8];
  size = 3;
  for(int i=0;i<8;i++)
  {
   b[i] = new BitSet(size);
   if((i&1)==1) b[i].set(0);
   if((i&2)==2) b[i].set(1);
   if((i&4)==4) b[i].set(2);
   System.out.println("binary to gray "+btos(b[i])+" "
    +btos(b[i]=graycode(b[i])));
  }
  for(int i=0;i<8;i++)
  {
   System.out.println("gray to binary "+btos(b[i])+" "
    +btos(inversegraycode(b[i])));
  }
 }

 static String btos(BitSet b)
 {
  String s = "";
  for(int i=0;i<size;i++)
  {
   if(b.get(i)) s = "1"+s;
   else s = "0"+s;
  }
  return s;
 }

 static BitSet graycode(BitSet b)
 {
  BitSet g = new BitSet(size);
  BitSet gsr = new BitSet(size); // b shifted right by one position
  for(int i=0;i<size;i++)
  {
   if(b.get(i))
   {
    g.set(i);
    if(i>0) gsr.set(i-1);
   }
  }
  g.xor(gsr); // g_i = b_i XOR b_{i+1}
  return g;
 }

 static BitSet inversegraycode(BitSet b)
 {
  BitSet ig = new BitSet(size);
  for(int i=0;i<size;i++)
  {
   int sum = 0;
   for(int j=i;j<size;j++)
   {
    if(b.get(j)) sum++;
   }
   if((sum%2)==1) ig.set(i);
   else ig.clear(i);
  }
  return ig;
 }
}
15.4 Schemata Theorem
A schema is a similarity template describing a subset of strings with similarities at
certain string positions. A schema over the binary alphabet is a string over the
extended alphabet
$$V := \{\, 0, 1, * \,\}.$$
A schema matches a particular string if at every location in the schema a 1 matches
a 1 in the string, a 0 matches a 0, and a * matches either. As an example, consider
the strings and schemata of length 5.
For alphabets of cardinality k, there are $(k+1)^l$ schemata, where l is the length of
the string. Furthermore, recall that in a string population with n members there
are at most $n \cdot 2^l$ schemata contained in a population, because each string is itself a
representative of $2^l$ schemata. These counting arguments give us some feel for the
magnitude of information being processed by genetic algorithms.
All schemata are not created equal. Some are more specific than others. The schema
011*1** is a more definite statement about important similarity than the schema
0******. Furthermore, certain schemata span more of the total string length than
others. The schema 1****1* spans a larger portion of the string than the schema
1*1****. To quantify these ideas, two schema properties are introduced: schema
order and defining length. The order of a schema H, denoted o(H), is the number of
fixed positions (0s and 1s) in the template. The defining length of a schema H,
denoted $\delta(H)$, is the distance between the first and the last specific string position.
Example. The order of the schema 011*1** is 4, whereas the order of the schema
0****** is 1.

Example. The schema 011*1** has defining length $\delta = 4$ because the last specific
position is 5 and the first specific position is 1. Thus $\delta(H) = 5 - 1 = 4$.
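A small sketch computing both quantities (our own program):

// schema.cpp
// order o(H) and defining length delta(H) of a schema over { 0, 1, * }
#include <iostream>
#include <string>
using namespace std;

int main()
{
 string H = "011*1**";
 int first = -1, last = -1, order = 0;
 for(int i=0; i<int(H.length()); i++)
  if(H[i] != '*')
  {
   order++;
   if(first < 0) first = i;
   last = i;
  }
 cout << "o(H) = " << order << endl;            // 4
 cout << "delta(H) = " << last - first << endl; // 4
 return 0;
}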
Schemata provide the basic means for analyzing the net effect of reproduction and
genetic operators on building blocks contained within the population. Let us consider
the individual and combined effects of reproduction, crossover, and mutation
on schemata contained within a population of strings. Suppose at a given time step
t there are m(H, t) examples of a particular schema H contained within the population
A(t). During reproduction, a string is copied according to its fitness, or more
precisely a string $A_i$ gets selected with probability
$$p_i = \frac{f_i}{\sum_{j=1}^{n} f_j}.$$
Thus we expect
$$m(H, t+1) = \frac{m(H, t)\, n\, f(H)}{\sum_{j=1}^{n} f_j(t)}$$
where f(H) is the average fitness of the strings representing schema H at time t.
The average fitness of the entire population is defined as
$$\bar{f} := \frac{1}{n} \sum_{j=1}^{n} f_j.$$
Thus
$$m(H, t+1) = m(H, t) \frac{f(H)}{\bar{f}(t)}.$$
Assuming that $f(H)/\bar{f}$ remains relatively constant for $t = 0, 1, \ldots$, the preceding
equation is a linear difference equation $x(t+1) = a x(t)$ with constant coefficient
which has the solution $x(t) = a^t x(0)$. A particular schema grows as the ratio of the
average fitness of the schema to the average fitness of the population. Schemata with
fitness values above the population average will receive an increasing number of samples
in the next generation, while schemata with fitness values below the population
average will receive a decreasing number of samples. This behaviour is carried out
with every schema H contained in a particular population A in parallel. In other
words, all the schemata in a population grow or decay according to their schema
averages under the operation of reproduction alone. Above-average schemata grow
and below-average schemata die off. Suppose we assume that a particular schema
H remains an amount $c\bar{f}$ above average with c a constant. Under this assumption
we find
$$m(H, t) = m(H, 0)\,(1 + c)^t.$$
Theorem. Under the selection, crossover, and mutation operators of the standard genetic
algorithm, short, low-order, above-average schemata receive exponentially
increasing trials in subsequent populations.

The short, low-order, and above average schemata are called building blocks. The
fundamental theorem indicates that building blocks are expected to dominate the
population. It is necessary to determine if the original goal of function optimization
is promoted by this fact. The preceding theorem does not answer this question.
Rather, the connection between the fundamental theorem and the observed
optimizing properties of the genetic algorithm is provided by the following conjecture.
15.5 Markov Chain Analysis

Consider populations of M individuals, where each individual is one of the $a^L$
possible strings of length L over an alphabet of a symbols. A population can be
described by the number of occurrences of each possible individual, where the number
of occurrences n is written in unary form as
$$o(n) = \underbrace{00\cdots0}_{n \text{ times}}$$
which gives an $M + a^L - 1$ bit representation. We use the '1' symbol to mark the end
of the number of occurrences of one individual and the beginning of the number of
occurrences of the next. Thus the number of different populations is
$$\binom{M + a^L - 1}{a^L - 1}.$$
When the new population consists only of individuals generated by selection, crossover
and mutation the following equation for Q is obtained
$$Q_{k,v} = M!\, \prod_{j=0}^{a^L - 1} \frac{(p_{k,j})^{n_{v,j}}}{n_{v,j}!}$$
where $p_{k,j}$ is the probability that individual j occurs in population k, and the
$n_{v,j}$ (the numbers of occurrences in population v) are generated according to the
multinomial distribution based on the $p_{k,j}$. Furthermore Vose [180] derived a bound,
in terms of the probability of mutation $\mu$, valid for any population k. Suzuki [168]
analysed the modified elitist strategy for genetic algorithms. This strategy always
selects the fittest individual of the previous population to be a member of the new
population.
Here
$$H(x) := \begin{cases} 1 & x \ge 0 \\ 0 & x < 0 \end{cases}$$
and
$$\delta_{j, i_k} = \begin{cases} 1 & j = i_k \\ 0 & \text{otherwise} \end{cases}$$
where $i_k$ denotes the fittest individual in population k.
The matrix Q consists of submatrices Q(i) of size N(i) x N(i) along the diagonal
and zero above these matrices. For the size N(i) we have
N(i) = (M -1M-1
+ 0:£ - i)
where Q(i) denotes the submatrix associated with the ith fittest individual of the
i k . The eigenvalues of each submatrix Q(i) are eigenvalues of Q. Furthermore, the
eigenvalues have magnitude not more than one. Denote by $q_k^n$ the probability that the n-th generation (population) is population k, and by K the set of all populations which include the fittest individual. To demonstrate the convergence of the genetic algorithm using the modified elitist strategy, Suzuki [168] showed that there exists a constant C such that
$$\sum_{k\in K} q_k^n \ge 1 - C|\lambda^*|^n$$
where $\lambda^*$ is the eigenvalue with greatest magnitude. Thus, with enough iterations, the probability that a population includes the fittest individual is close to unity.
15.6 Bit Set Classes in C++ and Java

The operation setbit sets a bit at a given position b (i.e. the bit at the position b is set to 1).
unsigned long b = 3;
unsigned long x = 15;
x |= (1 << b);  // shortcut for x = x | (1 << b);
The operation clearbit clears a bit at a given position b (i.e. the bit at the position b is set to 0).

unsigned long b = 3;
unsigned long x = 15;
x &= ~(1 << b);  // shortcut for x = x & ~(1 << b);
The operation swapbit swaps the bit at the position b, i.e. if the bit is 0 it is set to 1 and if the bit is 1 it is set to 0.

unsigned long b = 3;
unsigned long x = 15;
x ^= (1 << b);  // shortcut for x = x ^ (1 << b);
The operation testbit returns 1 or 0 depending on whether the bit at the position b is set or not.

unsigned long b = 3;
unsigned long x = 15;
unsigned long result = ((x & (1 << b)) != 0);
The operations setbit, clearbit, swapbit and testbit are written as functions.
This leads to the following program.
// mysetbit.cpp
#include <iostream.h>

int main()
{
   unsigned long b = 3;
   unsigned long x = 10;  // binary 1010
   setbit(x,b);
   cout << "x = " << x << endl;  // 10 => binary 1010
   clearbit(x,b);
   cout << "x = " << x << endl;  // 2 => binary 10
   swapbit(x,b);
   cout << "x = " << x << endl;  // 10 => binary 1010
   return 0;
}
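The definitions of setbit, clearbit, swapbit and testbit are not reproduced in this excerpt. A minimal sketch consistent with the shortcut operations above would be:

// Sketch of the four bit operations (assumed definitions).
void setbit(unsigned long& x,unsigned long b)   { x |= (1UL << b); }
void clearbit(unsigned long& x,unsigned long b) { x &= ~(1UL << b); }
void swapbit(unsigned long& x,unsigned long b)  { x ^= (1UL << b); }
unsigned long testbit(unsigned long x,unsigned long b)
{ return ((x & (1UL << b)) != 0); }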
Java has a BitSet class which includes the following methods (member functions):

void clear(int bitIndex)    the bit with index bitIndex in this BitSet
                            is changed to the clear (false) state
boolean get(int bitIndex)   returns the value of the bit with the
                            specified index
void or(BitSet set)         performs a logical OR of this bit set with
                            the bit set argument
void xor(BitSet set)        performs a logical XOR of this bit set with
                            the bit set argument
The BitSet class will be used in the program for the four colour problem.
In C++ we can use the standard template library's bitset class. The methods are
Constructors
bitset<N> s construct bitset for N bits
bitset<N> s(aBitSet) copy constructor
bitset<N> s(ulong) create bitset representing an
unsigned long value
Assignment
Other operations
~s                     bitwise complement of s
s « n shift set left by n
s » n shift set right by n
s.to_string() return string representation of set
// bitset1.cpp
#include <iostream>
#include <bitset>
#include <string>
using namespace std;

int main()
{
   const unsigned long n = 32;
   bitset<n> s;
   cout << s.set() << endl;    // set all bits to 1
   bitset<n> t;
   cout << t.reset() << endl;  // set all bits to false
   t.set(23);
   t.set(27);
   bitset<n> u;
   u = s & t;
   cout << "u = " << u << endl;
   bitset<n> v;
   v = s | t;
   cout << "v = " << v << endl;
   bitset<n> w;
   w = s ^ t;
   cout << "w = " << w << endl;
   bitset<n> z;
   z = ~w;
   cout << "z = " << z << endl;
   return 0;
}
#include <string.h>

#ifndef __BITVECTOR
#define __BITVECTOR

class BitVector
{
   protected:
   unsigned char *bitvec;
   int len;

   public:
   BitVector();
   BitVector(int nbits);
   BitVector(const BitVector& b);  // copy constructor
   ~BitVector();
   void SetBit(int bit,int val=1);
   int GetBit(int bit) const;
   void ToggleBit(int bit);
   BitVector operator & (const BitVector&) const;
   BitVector& operator &= (const BitVector&);
   BitVector operator | (const BitVector&) const;
   BitVector& operator |= (const BitVector&);
   BitVector operator ^ (const BitVector&) const;
   BitVector& operator ^= (const BitVector&);
   friend BitVector operator ~ (const BitVector&);
   BitVector& operator = (const BitVector&);
   int operator[] (int bit) const;
   void SetLength(int nbits);
};

BitVector::BitVector()
{
   len = 0;
   bitvec = NULL;
}

BitVector::BitVector(int nbits)
{
   len = nbits/8+((nbits%8)?1:0);
   bitvec = new unsigned char[len];
}
BitVector::~BitVector()
{
   if(bitvec != NULL) delete[] bitvec;
}
#endif
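The remaining member functions are omitted in this excerpt. A minimal sketch of the three bit-manipulation members, consistent with the storage scheme used in the constructors (8 bits per unsigned char), is:

// Sketch of the bit-manipulation members (assumed implementations):
// bit number 'bit' is kept in byte bit/8 at position bit%8.
void BitVector::SetBit(int bit,int val)
{
   if(val) bitvec[bit/8] |=  (1 << (bit%8));
   else    bitvec[bit/8] &= ~(1 << (bit%8));
}

int BitVector::GetBit(int bit) const
{ return (bitvec[bit/8] >> (bit%8)) & 1; }

void BitVector::ToggleBit(int bit)
{ bitvec[bit/8] ^= (1 << (bit%8)); }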
15.8 Maximum of One-Dimensional Maps

As fitness functions we consider
$$f(x) = \cos(x)$$
and
$$g(x) = \cos(x) - \sin(2x)$$
in the interval $[0, 2\pi]$. In this interval the function f has two global maxima, at 0 and $2\pi$. The function g has three maxima. The global maximum is at 5.64891 and the two local maxima are at 0 and 2.13862.
// x_value
double x_value(int* arr,int& N,double a,double b)

// setup of farm
void setup(int** farm,int M,int N)

// mutate an individual
void mutate(int** farm,int M,int N)
Here N is the length of the binary string and M is the size of the population, which is kept constant at each time step. For the given problem we select N = 10 and M = 12. The binary string $s_{N-1}s_{N-2}\cdots s_0$ is mapped into the integer number m and then into the real number x in the interval $[0, 2\pi]$ as described above.
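An implementation of x_value along these lines could look as follows (a sketch under the stated mapping; only the names match the prototypes above):

// Sketch: decode the binary string arr[0..N-1] into the integer
// m = sum_i arr[i]*2^i and map it into the interval [a,b].
double x_value(int* arr,int& N,double a,double b)
{
   double m = 0.0, p = 1.0;
   for(int i=0;i<N;i++) { m += arr[i]*p; p *= 2.0; }
   return a + m*(b - a)/(p - 1.0);  // after the loop p equals 2^N
}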
The farm is set up using a random number generator. In our implementation the crossing function selects the two fittest strings from the two parents and the two children. The parents are selected by a random number generator. With a population of 12 strings in the farm we find after 100 iterations both maxima of the function f, at 0 and at $2\pi$. A typical result is that five strings are related to the maximum at x = 0 and seven strings are related to the maximum at x = $2\pi$. For the fitness function g we find the global maximum and the second highest maximum after 100 iterations.
// genetic.cpp
// A simple genetic algorithm
// finding the global maximum of
// the function f in the interval [a,b].
#include <iostream.h>
#include <stdlib.h>
#include <time.h>   // for srand(), rand()
#include <math.h>   // for cos(), sin(), pow

time_t t;
srand((unsigned) time(&t));
for(int j=0; j<M; j++)
{
   for(int k=0; k<N; k++)
   {
      farm[j][k] = rand()%2;
   }
}
}
double res[4];
int r1 = rand()%M;
int r2 = rand()%M;
// random returns a value between
// 0 and one less than its parameter
while(r2 == r1) r2 = rand()%M;
res[0] = f_value(f,farm[r1],N,a,b);
res[1] = f_value(f,farm[r2],N,a,b);
int r3 = rand()%(N-2) + 1;
res[2] = f_value(f,temp[0],N,a,b);
res[3] = f_value(f,temp[1],N,a,b);
// mutate an individual
void mutate(int** farm,int& M,int& N,double& a,double& b)
{
   double res[2];
   int r4 = rand()%N;
   int r1 = rand()%M;
   res[0] = f_value(f,farm[r1],N,a,b);
   int v1 = farm[r1][r4];
   if(v1 == 0) farm[r1][r4] = 1;
   if(v1 == 1) farm[r1][r4] = 0;
   double a1 = f_value(f,farm[r1],N,a,b);
   if(a1 < res[0]) farm[r1][r4] = v1;
   int r5 = rand()%N;
   int r2 = rand()%M;
   res[1] = f_value(f,farm[r2],N,a,b);
   int v2 = farm[r2][r5];
   if(v2 == 0) farm[r2][r5] = 1;
   if(v2 == 1) farm[r2][r5] = 0;
   double a2 = f_value(f,farm[r2],N,a,b);
   if(a2 < res[1]) farm[r2][r5] = v2;
}
void main()
{
   int M = 12;  // population (farm) has 12 individuals (animals)
   int N = 10;  // length of binary string
In the program given above we store a bit as an int. This wastes a lot of memory space. A more economical use of memory is to use a string, for example "1000111101". Then we use 1 byte for each 1 or 0. An even more economical approach is to manipulate the bits themselves. In the following we use the class BitVector described above to manipulate the bits. The BitVector class is included in the header file bitvect.h.
// findmax.cpp
#include <iostream.h>
#include <math.h>
#include <stdlib.h>
#include <time.h>
#include "bitvect.h"
void main(void)
{
int M = 12;
int N = 10;
int i, j, k;
for(k=0;k<1000;k++)
{
crossings(farm,M,N,a,b);
mutate(farm,M,N,a,b);
}
for(j=0; j<N; j++)
   cout << "farm[1][" << j << "]=" << farm[1][j] << endl;
cout << endl;
for(j=0; j<M; j++)
   cout << "fitness f_value[" << j << "]="
        << f_value(f,farm[j],N,a,b)
        << " x_value[" << j << "]=" << x_value(farm[j],N,a,b) << endl;
delete [] farm;
}
A typical output is

farm[1][8]=0
farm[1][9]=0
15.9 Maximum of Two-Dimensional Maps

We use the following notation. N is the length of the chromosome (binary string). The chromosome includes both the contribution from the x variable and that from the y variable. The size of N depends on the required precision. M denotes the size of the farm (population), which is kept constant at each time step. First we have to decide about the precision. We assume that the required precision is four decimal places for each variable. First we find the domain of the variable x, i.e. b−a. The precision requirement implies that the range [a, b] should be divided into at least (b−a)·10000 equal size ranges. Thus we have to find an integer number N1 such that
$$(b-a)\cdot 10000 \le 2^{N_1} - 1.$$
The domain of variable y has length d−c. The same precision requirement implies that we have to find an integer N2 such that
$$(d-c)\cdot 10000 \le 2^{N_2} - 1.$$
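The search for the smallest such N1 and N2 is a short loop; the following sketch (not from the book; the domain end points are purely illustrative) computes the required bit lengths.

// bits.cpp
// Sketch: smallest N with 2^N - 1 >= (hi-lo)*10^decimals.
#include <iostream>
#include <cmath>
using namespace std;

int bits(double lo,double hi,int decimals)
{
   double ranges = (hi - lo)*pow(10.0,decimals);
   int N = 1;
   while(pow(2.0,N) - 1.0 < ranges) N++;
   return N;
}

int main()
{
   cout << "N1 = " << bits(-3.0,12.1,4) << endl;  // illustrative domain
   cout << "N2 = " << bits(4.1,5.8,4) << endl;    // illustrative domain
   return 0;
}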
Next we generate the farm. To optimize the function f using a genetic algorithm, we create a population of M chromosomes. All N bits in all chromosomes are initialized randomly using a random number generator.

Let us denote the chromosomes by $v_0, v_1, \ldots, v_{M-1}$. During the evaluation phase we decode each chromosome and calculate the fitness function values f(x, y) from the (x, y) values just decoded.
Now the system constructs a roulette wheel for the selection process. First we calculate the total fitness F of the population
$$F := \sum_{i=0}^{M-1} f(v_i).$$
The selection probabilities and cumulative probabilities are
$$p_i := \frac{f(v_i)}{F}, \qquad q_i := \sum_{k=0}^{i} p_k, \qquad i = 0, 1, \ldots, M-1.$$
Obviously, $q_{M-1} = 1$. Now we spin the roulette wheel M times. First we generate a (random) sequence of M numbers in the range [0, 1]. Each time we select a single chromosome for a new population as follows. Let $r_0$ be the first random number. Then $q_k < r_0 \le q_{k+1}$ for a certain k, and we select chromosome k+1 for the new population. We do the same selection process for all the other M−1 random numbers. This leads to a new farm of chromosomes. Some of the chromosomes can now occur twice.
We now apply the recombination operator, crossover, to the individuals in the new population. For the probability of crossover we choose $p_c = 0.25$. We proceed in the following way: for each chromosome in the (new) population we generate a random number r from the range [0, 1]. Thus we generate again a sequence of M random numbers in the interval [0, 1]. If r < 0.25, we select the given chromosome for crossover. If the number of selected chromosomes is even, we can pair them. If the number of selected chromosomes were odd, we would either add one extra chromosome or remove one selected chromosome. Now we mate the selected chromosomes randomly. For each pair, we generate a random integer number pos from the range [0, N−2]. The number pos indicates the position of the crossing point. We do the same for the second pair of chromosomes and so on. This leads to a new farm of chromosomes.
Thus we have completed one iteration (i.e., one generation) of the while loop in the
genetic procedure. Next we find the fitness function for the new population and
the total fitness of the new population, which should be higher compared to the
old population. The fitness value of the fittest chromosome of the new population
should also be higher than the fitness value of the fittest chromosome in the old
population. Now we are ready to run the selection process again and apply the
genetic operators, evaluate the next generation and so on. A stopping condition
could be that the total fitness does not change anymore.
// twodim.cpp
#include <iostream.h>
#include <math.h>
#include <stdlib.h>
#include <time.h>

// function to optimize
double f(double x,double y)
{
   return exp(-(x-1.0)*(x-1.0)*y*y/2.0);
   // return x*y;  (alternative objective)
}
   fitnessVector[i] =
     fitnessValue(f,farm[i],length,domain,xLength,yLength);
}
{
count = 0;
while(randomVector[i] > cum_prob_Vector[count]) count++;
for(j=0; j<length; j++)
{
newFarm[i][j] = farm[count][j];
}
}
delete [] fitnessVector;
delete [] probabilityVector;
delete [] cum_prob_Vector;
delete [] randomVector;
time_t t;
srand((unsigned) time(&t));
delete [] chosen;
delete [] randomVector;
time_t t;
srand((unsigned) time(&t));
int a, b;
for(i=0; i<totalbits; i++)
{
if(randomVector[i] < 0.01)
{
   if(i >= length)
   {
      a = i/length; b = i%length;
   }
   else
   {
      a = 0; b = i;
   }
   if(farm[a][b] == 0)
      farm[a][b] = 1;
   else
      farm[a][b] = 0;
}
}
delete [] randomVector;
}
fitnessValue(f,farm[i],length,domain,xLength,yLength);
<< max;
delete [] fitnessVector;
}
int main()
{
   int size = 32;         // population size
   int precision = 6;     // precision
   int iterations = 10000;
   // total length
   int length = xLength + yLength;
   cout << "\n the chromosome length is: " << length;
int i;
setup(farm,size,length);
// iteration loop
int t;
printFinalResult(farm,length,size,domain,xLength,yLength,iterations);
return 0;
}
The program below solves the four colour problem for the map in Figure 15.1(a).

[Figure 15.1: (a) a map with regions numbered 0 to 9; (b) a colouring of the regions with the colours R, G, B and Y.]
// Colour.java
System.out.println(GA(adjM,"RGBY",10));
}

while(p2==p1) p2=(int)(Math.random()*(p.length-1));
while(c2==c1) c2=(int)(Math.random()*(p[0].length()-1));
if(c2<c1) { int temp=c2; c2=c1; c1=temp; }
int i,f;
for(i=0,f=0;i<4;i++)
{
   if(fitness(adjM,temp[i],temp[i].length())
      >fitness(adjM,temp[f],temp[f].length()))
      f=i;
}
{ String tmp=temp[f]; temp[f]=temp[0]; temp[0]=tmp; }
for(i=1,f=1;i<4;i++)
{
   if(fitness(adjM,temp[i],temp[i].length())
      >fitness(adjM,temp[f],temp[f].length()))
      f=i;
}
{ String tmp=temp[f]; temp[f]=temp[1]; temp[1]=tmp; }
p[p1]=temp[2]; p[p2]=temp[3];
}
$$c(\pi) = \sum_{i=1}^{n} c_{i\pi(i)}.$$
The value $c(\pi)$ is usually referred to as the length (or cost or weight) of the permutation $\pi$. The traveling salesman problem is one of the standard problems in combinatorial optimization and has many important applications like routing or production scheduling with job-dependent set-up times. Another example is the knapsack problem, where the weight which can be carried is the constraint. The norm of an n × n matrix A over the real numbers $\mathbf{R}$ is given by
$$\|A\| = \max_{\|x\|=1} \|Ax\|$$
i.e. the length of the vector $x \in \mathbf{R}^n$ must be 1. This problem can be solved with
the Lagrange multiplier method, which is as follows. Let M be a manifold and f be a real-valued function of class $C^{(1)}$ on some open set containing M. We consider the problem of finding the extrema of the function $f|_M$. This is called a problem of constrained extrema. Assume that f has a constrained extremum at $x^* = (x_1^*, x_2^*, \ldots, x_n^*)$. Let $g_1(x) = 0, \ldots, g_m(x) = 0$ be the constraints (manifolds). Then there exist real numbers $\lambda_1, \ldots, \lambda_m$ such that $x^*$ is a critical point of the function
$$f(x) + \sum_{j=1}^{m} \lambda_j g_j(x).$$
The numbers $\lambda_1, \ldots, \lambda_m$ are called Lagrange multipliers. For the problem of finding the norm of an n × n matrix one considers the functions
The most difficult problem in genetic algorithms is the inclusion of constraints. Con-
straints are usually classified as equality or inequality relations. Equality constraints
may be included into the system. It would appear that inequality constraints pose
no particular problem. A genetic algorithm generates a sequence of parameters to be
tested using the system model, objective function, and the constraints. We simply
run the model, evaluate the fitness function, and check to see if any constraints are
violated. If not, the parameter set is assigned the fitness value corresponding to the
objective function evaluation. If constraints are violated, the solution is infeasible
and thus does not have a fitness. This procedure is fine except that many practical
problems are highly constrained; finding a feasible point is almost as difficult as find-
ing the best. As a result, we usually want to get some information out of infeasible
solutions, perhaps by degrading their fitness ranking in relation to the degree of
constraint violation. This is what is done in a penalty method. In a penalty method,
a constrained problem in optimization is transformed to an unconstrained problem
by associating a cost or penalty with all constraint violations. This cost is included
in the objective function evaluation.
For inequality constraints $h_i(x) \ge 0$, $i = 1, 2, \ldots, n$, one minimizes
$$g(x) + r\sum_{i=1}^{n} \Phi[h_i(x)]$$
where $\Phi$ is the penalty function and r is the penalty coefficient. Other approaches
use decoders or repair algorithms.
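A penalised objective can be sketched as follows (an assumed quadratic penalty for constraints $h_i(x) \ge 0$; the names Phi and penalised are illustrative):

// Sketch of a penalty method: Phi vanishes for feasible values and
// grows quadratically with the amount of constraint violation.
double Phi(double h) { return (h >= 0.0) ? 0.0 : h*h; }

// g: objective value at x, h: the n constraint values at x,
// r: penalty coefficient; returns the value to be minimized.
double penalised(double g,const double* h,int n,double r)
{
   double p = 0.0;
   for(int i=0;i<n;i++) p += Phi(h[i]);
   return g + r*p;
}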
As an example consider the knapsack problem. Given weights $w_i > 0$ and values $v_i$, $i = 0, 1, \ldots, n-1$, the problem is to
$$\text{maximize } \sum_{i=0}^{n-1} v_i x_i \qquad \text{subject to } \sum_{i=0}^{n-1} w_i x_i \le M$$
where $x_i \in \{0,1\}$. Here $x_i = 0$ means that item i should not be included in the knapsack, and $x_i = 1$ means that it should be included.
Although we do not know yet how to obtain the solution, the way to fill the knapsack
to carry the most value is to take the sleeping bag, food, mosquito repellent, first-aid
kit, flashlight, water purifier, and change of clothes, for a total value of 149 with a
total weight of 19 kilograms. An interesting aspect of the solution is that it is not
directly limited by the weight restriction. There are ways of filling the knapsack
with exactly 20 kilograms, such as substituting the change of clothes for the camp
stove and rain gear, but this decreases the total value.
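Since there are only $2^{12}$ subsets of the twelve items, the claimed optimum can be verified by exhaustive enumeration. A brute-force sketch (not part of the book's program; the weights and values are those of the data file shown below):

// knapcheck.cpp
// Sketch: enumerate all subsets and keep the best feasible value.
#include <iostream>
using namespace std;

int main()
{
   double w[12] = {11,7,5,4,3,3,3,2,2,2,2,1};         // weights
   double v[12] = {20,10,11,5,25,50,15,12,6,4,5,30};  // values
   double maxw = 20.0, bestv = 0.0;
   for(int s=0;s<(1<<12);s++)
   {
      double tw = 0.0, tv = 0.0;
      for(int i=0;i<12;i++)
         if(s & (1<<i)) { tw += w[i]; tv += v[i]; }
      if((tw <= maxw) && (tv > bestv)) bestv = tv;
   }
   cout << "best value = " << bestv << endl;  // 149 for these data
   return 0;
}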
The following program uses a genetic algorithm to solve the problem. We use the
header file bitvect.h given above.
// knapsack.cpp
#include <fstream.h>
#include <time.h>
#include <stdlib.h>
#include "bitvect.h"
struct item
{
char name[50];
double weight;
double value;
};
for(i=0;i<n;i++)
{
   data >> list[i].name;
   data >> list[i].weight;
   data >> list[i].value;
}
data >> max;
}
{
   int i;
   double tweight = 0.0;
   double tvalue = 0.0;
   for(i=0;i<n;i++)
   {
      if(b.GetBit(i))
      {
         tweight += list[i].weight;
         tvalue += list[i].value;
      }
      if(tweight > max)
      { tvalue = -1.0; i = n; }
   }
   return tvalue;
}
while(i<tries)
{
   while(pos2 == pos) pos2 = rand()%n;
   newanim -> ToggleBit(pos2);
   if(value(*newanim,n,max,list) > 0) i = tries;
   else { newanim -> ToggleBit(pos2); i++; pos2 = pos; }
}
if(value(*newanim,n,max,list) > value(farm[animal],n,max,list))
   farm[animal] = *newanim;
delete newanim;
}
for(i=pos;i<n;i++)
{
   newanim1 -> SetBit(i,farm[animal2][i]);
   newanim2 -> SetBit(i,farm[animal1][i]);
}
delete newanim1;
delete newanim2;
}
void main()
{
   item* list = NULL;
   int n, m = 100, i, iterations = 500, besti = 0;
   double max, bestv = 0.0, bestw = 0.0, temp;
   BitVector *farm = NULL;
   readitems("knapsack.dat",list,n,max);
for(i=0;i<m;i++)
   farm[i].SetLength(n);
setupfarm(farm,m,n,list,max);
for(i=0;i<iterations;i++)
{
   crossing(farm,m,n,list,max);
   mutate(farm,m,n,list,max);
}
for(i=0;i<m;i++)
   if((temp=value(farm[i],n,max,list)) > bestv)
   { bestv=temp; besti=i; }
The data file knapsack.dat contains the number of items, one line per item (name, weight, value), and the maximum weight:

12
tent 11 20
canteen_(filled) 7 10
change_of_clothes 5 11
camp_stoves 4 5
sleeping_bag 3 25
dried_food 3 50
first-aid_kit 3 15
mosquito_repellent 2 12
flashlight 2 6
novel 2 4
rain_gear 2 5
water_purifier 1 30
20
The output is

Items to take
change_of_clothes,
sleeping_bag,
dried_food,
first-aid_kit,
mosquito_repellent,
flashlight,
water_purifier,
We can extend the knapsack problem to one with m knapsacks. The capacity of knapsack j is denoted by $M_j$. The problem statement becomes
$$\text{maximize } \sum_{j=1}^{m}\sum_{i=0}^{n-1} v_i x_{i,j} \qquad \text{subject to } \sum_{i=0}^{n-1} w_i x_{i,j} \le M_j, \quad j = 1, \ldots, m$$
where $x_{i,j} \in \{0,1\}$ and each item is placed in at most one knapsack. Here $x_{i,j} = 0$ means that item i should not be included in knapsack j, and $x_{i,j} = 1$ means that it should be included. The meanings of $w_i$ and $v_i$ are the same as for the single knapsack problem.
The structure of $\Omega$ and $f(\pi)$ depends on the problem considered. A typical problem is the traveling salesman problem. The traveling salesman problem is deceptively simple to state. Given the distances separating a certain number of towns, the aim is to find the shortest tour that visits each town once and ends at the town it started from. As there are several engineering and scientific problems equivalent to a traveling salesman problem, the problem is of practical importance. The number
a traveling salesman problem. The problem is of practical importance. The number
of all possible tours is finite, therefore in principle the problem is solvable. However,
the brute force strategy is not only impractical but completely useless even for a
moderate number of towns n, because the number of possible tours grows factorially
with n. The traveling salesman problem is the best-known example of the whole class
of problems called NP-complete (or NP-hard), which makes the problem especially
interesting theoretically. The NP-complete problems are transformable into each
other, and the computation time required to solve any of them grows faster than
any power of the size of the problem. There are strong arguments that a polynomial
time algorithm may not exist at all. Therefore, the aim of the calculations is usually
to find near-optimum solutions.
The following C++ program permut.cpp finds all permutations of the numbers 1, 2, ..., n. The array element p[0] takes the value 0 at the beginning of the program. The end of the enumeration is indicated by p[0] = 1.
// permut.cpp
// permutation of the numbers 1, 2, ..., n
#include <iostream.h>

int main()
{
   int i, j, k, t, tau;
   unsigned long n = 3;
   // starting permutation
   // identity 1, 2, ..., n -> 1, 2, ..., n
   for(i=0; i<=n; i++)
   {
      p[i] = i;
      cout << "p[" << i << "] = " << p[i] << " ";
   }
   cout << endl;
   int test = 1;
   do
   {
      i = n-1;
      while(p[i] > p[i+1]) i = i - 1;
      if(i > 0) test = 1; else test = 0;
      j = n;
      while(p[j] <= p[i]) j = j - 1;
Goldberg and Lingle [73] suggested a crossover operator, the so-called partially mapped crossover. They believe it will lead to an efficient solution of the traveling salesman problem. A partially mapped crossover proceeds as follows. We explain with an example how the partially mapped operator works, using N = 12 cities numbered 1 to 12. Assume that the parents are
(1 2 3 4 5 6 7 8 9 10 11 12)   a1
(7 3 6 11 4 12 5 2 10 9 1 8)   a2

Select two random numbers r1, r2 with $0 \le r_1 \le N-1$, $0 \le r_2 \le N-1$, $r_1 \le r_2$. Let r1 = 3, r2 = 6. Truncate the parents using r1 and r2 (the symbol | marks the cut points):

(1 2 3 | 4 5 6 7 | 8 9 10 11 12)
(7 3 6 | 11 4 12 5 | 2 10 9 1 8)

Exchanging the middle sections gives

(1 2 3 | 11 4 12 5 | 8 9 10 11 12)
(7 3 6 | 4 5 6 7 | 2 10 9 1 8)

Now some cities occur twice while others are missing in the new arrays. The crossing defines the mappings 4 <-> 11, 5 <-> 4, 6 <-> 12, 7 <-> 5. Marking the conflicting cities outside the cut points by X we have

(1 2 3 | 11 4 12 5 | 8 9 10 X X)
(X 3 X | 4 5 6 7 | 2 10 9 1 8)

Each X is resolved by following the mappings until a free city is found; for example the number 6 at position 2 of the second child is replaced by the number 12. Consequently, the children are

(1 2 3 11 4 12 5 8 9 10 7 6)
(11 3 12 4 5 6 7 2 10 9 1 8)
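The steps of this example can be cast into a function. The following sketch (an assumed implementation, not the book's code) produces the two children c1, c2 from the parents p1, p2 and the cut points r1, r2:

// Sketch of partially mapped crossover (PMX) for permutations.
void pmx(int* p1,int* p2,int* c1,int* c2,int n,int r1,int r2)
{
   for(int i=0;i<n;i++) { c1[i] = p1[i]; c2[i] = p2[i]; }
   // swap the middle segments
   for(int i=r1;i<=r2;i++) { c1[i] = p2[i]; c2[i] = p1[i]; }
   // outside the segment, follow the mapping chains until no
   // conflict with a segment value remains
   for(int i=0;i<n;i++)
   {
      if(i >= r1 && i <= r2) continue;
      int again = 1;
      while(again)
      {
         again = 0;
         for(int j=r1;j<=r2;j++)
            if(c1[i] == c1[j]) { c1[i] = c2[j]; again = 1; }
      }
      again = 1;
      while(again)
      {
         again = 0;
         for(int j=r1;j<=r2;j++)
            if(c2[i] == c2[j]) { c2[i] = c1[j]; again = 1; }
      }
   }
}

With the parents above and r1 = 3, r2 = 6 this reproduces the two children of the worked example.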
Bac and Perov [4] proposed another crossing operator using the permutation group. We illustrate the operator with an example and a C++ program. Let the parents be given by the permutations

(0 1 2 3 4 5 6 7 8 9) -> (8 7 3 4 5 6 0 2 1 9)   parent 1
(0 1 2 3 4 5 6 7 8 9) -> (7 6 0 1 2 9 8 4 3 5)   parent 2

The first child is obtained by applying parent 1 and then parent 2, for example

0 -> 8 -> 3
1 -> 7 -> 4
2 -> 3 -> 1

and the second child by composing the permutations in the opposite order. Thus the children are

(0 1 2 3 4 5 6 7 8 9) -> (3 4 1 2 9 8 7 0 6 5)
(0 1 2 3 4 5 6 7 8 9) -> (2 0 8 7 3 9 1 5 4 6)
// tspperm.cpp
#include <iostream.h>

int main()
{
   int n = 10;
   int i;
   crossing(a1,a2,a3,a4,n);
   cout << endl;
   cout << endl;
   return 0;
}
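The function crossing used here is not reproduced in the excerpt. A sketch consistent with the composition rule illustrated above (a1, a2 the parents; a3, a4 the children) is:

// Sketch: children as the two compositions of the parent permutations.
void crossing(int* a1,int* a2,int* a3,int* a4,int n)
{
   for(int i=0;i<n;i++)
   {
      a3[i] = a2[a1[i]];  // apply parent 1, then parent 2
      a4[i] = a1[a2[i]];  // apply parent 2, then parent 1
   }
}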
In the following program we use these operators to find solutions to the traveling
salesman problem.
// tsp.cpp
//
// traveling salesman problem
#include <fstream.h>
#include <stdlib.h>
#include <time.h>
#include "bitvect.h"
BitVector used(cities);
int city,i,j;
srand(time(NULL));
for(i=O;i<n;i++)
{
for(j=O;j<cities;j++) used.SetBit(j,O);
for(j=O;j<cities;j++)
{
city = rand()%cities;
if(!used.GetBit(city)) { farm[i][j]=city; used.SetBit(city); }
else j--;
}
}
}
   result1 = child1;
else result1 = farm[seq1];
if(distance(farm[seq2],cities,dist) > distance(child2,cities,dist))
   result2 = child2;
else result2 = farm[seq2];
result3 = ((result1 == farm[seq1])?child1:farm[seq1]);
result4 = ((result2 == farm[seq2])?child2:farm[seq2]);
farm[seq1] = result1;
farm[seq2] = result2;
delete [] result3;
delete [] result4;
}
for(i=0;i<cities;i++)
{
   if((i<pos1) || (i>=pos2))
      while((pos = insequence(child1[i],child1,pos1,pos2)) >= 0)
         child1[i] = child2[pos];
   if((i<pos1) || (i>=pos2))
      while((pos = insequence(child2[i],child2,pos1,pos2)) >= 0)
         child2[i] = child1[pos];
}
void main(void)
{
   int N = 20;  // number of animals/chromosomes
   int iterations = 300;
   cout << N << endl;
   int** farm = NULL;
   int i,j;
   double** dist = NULL;  // array of distances
   int cities;            // number of cities
   readdist("tsp.dat",dist,cities);
   cout << "Cities: " << cities << endl;
   farm = new int*[N];
   for(i=0;i<N;i++) farm[i] = new int[cities];
setupfarm(farm,N,cities);
for(i=0;i<iterations;i++)
{
mutate(farm,N,cities,dist);
permutate(farm,N,cities,dist);
pmx(farm,N,cities,dist);
}
for(i=0;i<N;i++)
{
   for(j=0;j<cities;j++) cout << farm[i][j] << " ";
   cout << " distance:" << distance(farm[i],cities,dist) << endl;
}
destroydist(dist,cities);
}
The data file tsp.dat contains the number of cities followed by the pairwise distances between them:

8
14.5413
20.7663
13.5059
19.6041
10.4139
4.60977
14.5344
6.34114
5.09313
9.12195
5
12.0416
14.0357
8.70919
10.4938
11.2432
18.3742
18.8788
14.213
7.5326
12.6625
17.7071
9.72677
15.4729
10.5361
7.2111
10.198
10
A typical output is

Cities: 8
7 4 2 3 1 5 0 6 distance:64.8559
7 4 2 3 1 5 0 6 distance:64.8559
7 4 2 3 1 5 0 6 distance:64.8559
0 6 5 1 3 2 4 7 distance:66.1875
7 4 2 3 1 5 0 6 distance:64.8559
7 4 2 3 1 5 0 6 distance:64.8559
7 4 2 3 1 5 0 6 distance:64.8559
0 6 5 1 3 2 4 7 distance:66.1875
7 4 2 3 1 5 0 6 distance:64.8559
4 2 1 3 0 6 5 7 distance:67.9889
7 4 2 3 1 5 0 6 distance:64.8559
7 4 2 3 1 5 0 6 distance:64.8559
0 6 5 1 3 2 4 7 distance:66.1875
0 6 5 1 3 2 4 7 distance:66.1875
7 4 2 3 1 5 0 6 distance:64.8559
7 4 2 3 1 5 0 6 distance:64.8559
7 4 2 3 1 5 0 6 distance:64.8559
7 4 2 3 1 5 6 0 distance:66.1875
7 4 2 3 1 5 0 6 distance:64.8559
7 4 2 3 1 5 0 6 distance:64.8559
The bin packing problem is as follows. Given bins each of size S and n objects with sizes $s_0, s_1, \ldots, s_{n-1}$, the task is to pack the objects into as few bins as possible,
$$\text{subject to } \sum_{i=0}^{n-1} s_i x_{i,j} \le S, \qquad \sum_{j} x_{i,j} = 1,$$
where $x_{i,j} \in \{0,1\}$ indicates whether object i is placed in bin j.
Genetic algorithms can also be applied to Steiner's problem. In this problem there are n villages. Village j requires $L_j$ phone lines to a station. A line costs c per kilometer. Determine where to place a single station such that the total cost for the phone lines is minimized. Let $x_j$ be the position of village j and s the position of the station. The total cost is
$$\sum_{j=0}^{n-1} c\,L_j\,|x_j - s| .$$
The algorithm starts with an initial guess s(0). This can be randomly generated or specifically chosen. Each iteration of the algorithm produces a new point s(i). We use the transformation T(m1, m2, s) to generate two new points for every point s in a set. The transformations operate on the part of the point identified by m1 and m2, where m1 and m2 are bit positions. Each resolution must use a power of 2 as the number of bits of representation.

1. Set the resolution n := 1.

2. For j = 0, 1, ..., n generate the candidate sets
$$P(j) := \bigcup_{k=0}^{2^j-1} T\!\left(k\,2^{n-j},\;(k+1)\,2^{n-j}-1,\;s(i)\right).$$
As an example we consider the function
$$\cos(x) - \sin(2x).$$
Since the discrete global optimization technique attempts to find a minimum function value, we use $-(\cos(x) - \sin(2x))$ in the program for function evaluation. For the transformation we use simple bit inversion (i.e. we apply NOT to the selected bits). Valafar [172] recommends using the Gray code to transform the selected bits. The chosen transform influences the effectiveness of the technique.
// dgo.cpp
#include <iostream>
#include <math.h>
#include <stdlib.h>
#include <time.h>
if(!init)
{
pow2table[0]=1;
for(p2=1;p2<32;p2++) pow2table[p2]=2*pow2table[p2-1];
}
if(y>31) return 0;
return pow2table[y];
}
min = f(S);
while(n < maxres)
{
q = P = new unsigned int[pow2(n+1)-1];
newmin = 0;
for(j=0;j <= n;j++)
for(k=0;k < pow2(j);k++)
{
   *q = T(k*pow2(n-j),(k+1)*pow2(n-j)-1,S);
   if((f(*q) < min)&&(S != *q))
{
min = f(*q);
S = *q;
newmin = 1;
}
q++;
}
delete[] P;
if ( !newmin) n++;
}
}
void main(void)
{
   unsigned int S;
   double x;
   srand(time(NULL));
   S = rand();
   dgo(S,f,T);
   x = 2*pi*S/(pow2(8*sizeof(unsigned int))-1);
   cout << "For cos(x)-sin(2x) DGO gives " << -f(S)
        << " at " << x << endl;
}
$$\cos(x), \qquad \cos(2x), \qquad \frac{1}{1+x}$$
using 20 data points. We use SymbolicC++ [169] to create the expressions for the functions and to evaluate the functions at the data points. The generation of a
functions and to evaluate the functions at the data points. The generation of a
tree for a symbolic expression is simple. The function type is randomly determined,
and if the function takes any parameters these must also be generated. For every
function parameter generated the probability that the next parameter is a leaf node
(a constant, or variable, but not a function) is increased. Thus we can ensure
that a randomly generated tree will not exceed a certain depth. The crossover
function randomly selects two individuals from the population, and then selects the
subtrees to be swapped. This is done by randomly determining if the current node
in the tree will be used as the subtree or if one of its branches will be used. The
process is repeated until a node is selected or a leaf node is found. An improvement
would be to use the roulette wheel method to select candidates for crossover. The
function fitness takes a symbolic tree, converts it to a symbolic expression and
then calculates the error for each data point. The sum of these errors is used as the
fitness. A fitness of zero is a perfect match.
The program below uses small populations and few iterations of the algorithm in order to reduce the execution time and memory requirements. To achieve better results, larger populations should be used to perform a more extensive global search for the optimum.
// sreg.cpp
#include <iostream.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>
#include "Msymbol.h"
struct stree
{
int type;
struct stree *arg[2];
};
{
   if(st.type >= ops1)
   {
      destroy(*(st.arg[0]));
      delete st.arg[0];
   }
   if(st.type >= ops2)
   {
      destroy(*(st.arg[1]));
      delete st.arg[1];
   }
}
if(q > p)
   type = rand()%ops;
else
   type = rand()%ops1;

if(n == 1) ind = 0;
if(st[ind].type < ops1)
{ destroy(st[ind]); st[ind] = generate(); return; }
else if(st[ind].type < ops2)
   st[n] = copy(st[ind1]);
   st[n+1] = copy(st[ind2]);
   d1 = 1;
}
while(!d2)
{
   arg2 = rand()%2;
   double p = double(rand())/RAND_MAX;
   if(stn2->type < ops2) arg2 = 0;
   stp2 = stn2;
   stn2 = stn2->arg[arg2];
   if((stn2->type < ops1) || (p > 0.5))
      d2 = 1;
}
void main(void)
{
   const double pi = 3.1415927;
   double data[20*2];
   int i;
   srand(time(NULL));
The program outputs the function which best fits the data points, together with the data point and function evaluation for the different input values. The left column of numbers contains the desired values, and the right column the obtained values.
50/50 generations
cos(x)
1 1
0.809024 0.809024
0.309041 0.309041
-0.308981 -0.308981
-0.808987 -0.808987
-1 -1
-0.809061 -0.809061
-0.309101 -0.309101
0.308921 0.308921
0.80895 0.80895
50/50 generations
x^(2)
0 0
4 4
16 16
36 36
64 64
100 100
144 144
196 196
256 256
324 324
60/60 generations
sin(cos(2*x))
1 0.841471
-0.653644 -0.608083
-0.1455 -0.144987
0.843854 0.74721
-0.957659 -0.817847
0.408082 0.39685
0.424179 0.411573
-0.962606 -0.820683
0.834223 0.740775
-0.127964 -0.127615
100/100 generations
sin((exp(x)*x+x^(2))*sin(x)*x)
1 0
0.333333 0.396537
0.2 0.429845
0.142857 -0.199608
0.111111 0.976007
0.0909091 0.997236
0.0769231 -0.246261
0.0666667 0.327578
0.0588235 0.762098
0.0526316 -0.998154
0.047619 -0.677241
0.0434783 -0.998545
0.04 0.209575
0.037037 -0.863178
0.0344828 -0.987161
0.0322581 0.492514
0.030303 -0.326958
0.0285714 -0.801911
0.027027 -0.328644
0.025641 0.782954
A gene is a symbolic string with a head and a tail. Each symbol represents an
operation. For example the operation "+" takes two arguments and adds them.
The operation "x" would evaluate to the value of the variable x. The tail consists
only of operations which take no arguments. The string represents expressions in
prefix notation, i.e. 5 - 3 would be stored as "- 5 3". The reason for the tail is to
ensure that the expression is always complete. Suppose the string has h symbols in
the head which is specified as an input to the algorithm, and t symbols in the tail
which is determined from h. Thus if n is the maximum number of arguments for an
operation we must have
h+t-1 = hn.
The left-hand side is the total number of symbols except for the very first symbol.
The right-hand side is the total number of arguments required for all operations. We
assume, of course, that each operation requires the maximum number of arguments
so that any string of this length is a valid string for the expression. Thus the equation
states that there must be enough symbols to serve as arguments for all operations.
Now we can determine the required length for the tail
t = h(n - 1) + 1.
Suppose we use h = 8, and n = 2 for arithmetic operations. Thus the tail length
must be t = 9. So the total gene length is 17. We could then represent the expression
$$\cos(x^2 + 2) - \sin(x)$$
by the gene

-c+*xx2s|x1x226x31

The vertical bar | is used to indicate the beginning of the tail. Here c represents cos() and s represents sin(). A mutation changing the gene

-c+*xx2s|x1x226x31

into

-c++x2*x|x1x226x31

yields a new valid expression, namely
$$\cos((x + 2) + x^2) - 1.$$
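Evaluating such a gene is a simple recursion over the prefix string. The following sketch (an assumed implementation; the gene is stored without the display marker |) shows why the tail guarantees a complete expression: the recursion consumes only as many symbols as the expression needs.

// Sketch: recursive evaluation of a gene in prefix notation.
#include <cmath>

double evalr(const char*& g,double x)
{
   char c = *g++;                  // consume one symbol
   switch(c)
   {
      case 'c': return cos(evalr(g,x));
      case 's': return sin(evalr(g,x));
      case '+': { double a = evalr(g,x); return a + evalr(g,x); }
      case '-': { double a = evalr(g,x); return a - evalr(g,x); }
      case '*': { double a = evalr(g,x); return a * evalr(g,x); }
      case 'x': return x;
      default : return c - '0';    // the digits 0..9 as constants
   }
}

double eval(const char* gene,double x)
{
   const char* g = gene;
   return evalr(g,x);   // e.g. eval("-c+*xx2sx1x226x31",1.0)
}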
• Recombination. The crossover operation. This can be one point (the chromosomes are split in two and corresponding sections are swapped), two point (the chromosomes are split in three and the middle portion is swapped) or gene (one entire gene is swapped between chromosomes) recombination. Typically the sum of the probabilities of recombination is taken as 0.7.
In the following program we implement these techniques. The example is the same as
for genetic programming. This implementation is faster and more accurate than the
implementation for genetic programming. This is a result of the relative simplicity of
gene expression programming. For simplicity we use only one gene in a chromosome
and only one point recombination.
// gep.cpp
#include <iostream.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>
#include <string.h>
            cout << ")";
            break;
   case 's': cout << "sin("; printr(e);
             cout << ")";
             break;
   case 'e': cout << "exp("; printr(e); cout << ")";
             break;
   case '+': cout << '(';
             printr(e);
             cout << '+';
             printr(e);
             cout << ')';
             break;
   case '-': cout << '(';
             printr(e);
             cout << '-';
             printr(e);
             cout << ')';
             break;
   case '*': cout << '(';
             printr(e);
             cout << '*';
             printr(e);
             cout << ')';
             break;
   }
}
// initial calculations
bestf = fitness(population,data,N);
best = population;
for(i=0,sumf=0.0,iter=population;i < P;i++,iter+=gene_len)
{
   f = fitness(iter,data,N);
   sumf += f;
   if(f < bestf)
   {
      bestf = f;
      best = population+i*gene_len;
   }
}
lastf = 0.0;
for(j=0;j < P;j++)
{
   f = fitness(population+j*gene_len,data,N);
   if((lastf<=r) && (r<f+lastf))
   {
      elim[j] = 1;
      j = P;
   }
   lastf += f;
}
}
// insertion
if(double(rand())/RAND_MAX < pi)
{
   // find a position in the head of this gene for insertion
   // -gene_len for the gene since we have already moved
   // onto the next gene
   replace = i-gene_len;
   rp = rand()%h;
   // a random position for insertion source
   replace2 = rand()%pop_len;
   // a random length for insertion from the gene
   rlen = rand()%(h-rp);
   // create the new gene
   char *c = new char[gene_len];
   // copy the shifted portion of the head
   strncpy(c+rp+rlen,population+replace+rp,h-rp-rlen);
   // copy the tail
   strncpy(c+h,population+replace+h,t);
   // copy the segment to be inserted
   strncpy(c+rp,population+replace2,rlen);
// recombination
if(double(rand())/RAND_MAX < pr)
{
   // find a position in the gene for one point recombination
   replace = i-gene_len;
   rlen = rand()%gene_len;
   // a random gene for recombination
   replace2 = (rand()%P)*gene_len;
   // create the new genes
   char *c[5];
   c[0] = population+replace;
   c[1] = population+replace2;
   c[2] = new char[gene_len];
   c[3] = new char[gene_len];
   c[4] = new char[gene_len];
   strncpy(c[2],c[0],rlen);
   strncpy(c[2]+rlen,c[1]+rlen,gene_len-rlen);
   strncpy(c[3],c[1],rlen);
   strncpy(c[3]+rlen,c[0]+rlen,gene_len-rlen);
   // take the fittest genes
   for(j=0;j < 4;j++)
      for(k=j+1;k < 4;k++)
         if(fitness(c[k],data,N) < fitness(c[j],data,N))
         {
            strncpy(c[4],c[j],gene_len);
            strncpy(c[j],c[k],gene_len);
            strncpy(c[k],c[4],gene_len);
         }
   delete[] c[2];
   delete[] c[3];
   delete[] c[4];
}
}
// fitness
for(i=0,sumf=0.0,iter=population;i < P;i++,iter+=gene_len)
{
   f = fitness(iter,data,N);
   sumf += f;
   if(f < bestf)
   {
      bestf = f;
      best = population+i*gene_len;
   }
}
iterations++;
}
print(best);
cout << endl;
cout << "Fitness of " << bestf << " after "
     << iterations << " iterations." << endl;
for(i=0;i < N;i++)
   cout << data[2*i+1] << " " << eval(best,data[2*i]) << endl;
delete[] population;
delete[] elim;
}
void main(void)
{
   double data[20*2];
   int i;
   srand(time(NULL));
   gep(data,10,50,0.001);
   cout << endl;
   gep(data,10,50,0.001);
   cout << endl;
   gep(data,10,50,0.001);
   cout << endl;
}
/*
Results:
cos(x)
Fitness of 1.95888e-16 after 6 iterations.
1 1
0.809024 0.809024
0.309041 0.309041
-0.308981 -0.308981
-0.808987 -0.808987
-1 -1
-0.809061 -0.809061
-0.309101 -0.309101
0.308921 0.308921
0.80895 0.80895
(x*x)
Fitness of 0 after 0 iterations.
0 0
4 4
16 16
36 36
64 64
100 100
144 144
196 196
256 256
324 324
cos((((x+x)+x)-x))
Fitness of 1.59324e-16 after 191 iterations.
1 1
-0.653644 -0.653644
-0.1455 -0.1455
0.843854 0.843854
-0.957659 -0.957659
0.408082 0.408082
0.424179 0.424179
-0.962606 -0.962606
0.834223 0.834223
-0.127964 -0.127964
*/
Part II
Quantum Computing
Chapter 16
Quantum Mechanics
It follows that
$$(cf, g) = \bar{c}\,(f, g)$$
and
$$(f, cg) = c\,(f, g).$$

Definition. A linear space E is called a normed space if for every $f \in E$ there is associated a real number $\|f\|$, the norm of the vector f, satisfying the norm axioms. The scalar product induces the norm
$$\|f\| := \sqrt{(f, f)}.$$
Then $x^T y = 0$.
Definition. A sequence $\{f_n\}$ ($n \in \mathbf{N}$) of elements in a normed space E is called a Cauchy sequence if, for every $\varepsilon > 0$, there exists a number $M_\varepsilon$ such that $\|f_p - f_q\| < \varepsilon$ for $p, q > M_\varepsilon$.
Example. The vector space C([a, b]) of all continuous (real or complex valued) functions on an interval [a, b] with the norm
$$\|f\| = \max_{[a,b]} |f(x)|$$
is a Banach space.
Before we discuss some examples of Hilbert spaces we give the definitions of strong
and weak convergence in Hilbert spaces.
in the Hilbert space $L_2[0, \pi]$. The sequence does not tend to a limit in the sense of strong convergence. However, the sequence tends to 0 in the sense of weak convergence.
Let us now give several examples of Hilbert spaces which are important in quan-
tum mechanics (and quantum computing). Although quantum computing is mainly
discussed for finite dimensional Hilbert spaces (see chapter 17), infinite dimensional
Hilbert spaces are now also discussed [29, 30, 31, 112, 175].
Example. Every finite dimensional vector space with an inner product is a Hilbert space. Let $\mathbf{C}^n$ be the linear space of n-tuples of complex numbers with the scalar product
$$(u, v) := \sum_{j=1}^{n} \bar{u}_j v_j .$$
Then $\mathbf{C}^n$ is a Hilbert space. Let $u \in \mathbf{C}^n$. We write the vector u as a column vector.
Example. By $l_2(\mathbf{N})$ we mean the set of all infinite dimensional vectors (sequences) $u = (u_1, u_2, \ldots)^T$ of complex numbers $u_j$ such that
$$\sum_{j=1}^{\infty} |u_j|^2 < \infty .$$
This space is linear, since
$$\sum_{j=1}^{\infty} |u_j + v_j|^2 \le \sum_{j=1}^{\infty}\left(|u_j|^2 + |v_j|^2 + 2|u_j||v_j|\right) \le 2\sum_{j=1}^{\infty}\left(|u_j|^2 + |v_j|^2\right) < \infty .$$
The scalar product is defined as
$$(u, v) := \sum_{j=1}^{\infty} \bar{u}_j v_j = \bar{u}^T v .$$
It can also be proved that this pre-Hilbert space is complete. Therefore $l_2(\mathbf{N})$ is a Hilbert space. As an example, let us consider
$$u = \left(1, \frac{1}{2}, \frac{1}{3}, \ldots, \frac{1}{n}, \ldots\right)^T .$$
Since
$$\sum_{n=1}^{\infty} \frac{1}{n^2} < \infty$$
we find $u \in l_2(\mathbf{N})$.
Example. Let M be a Lebesgue measurable subset of $\mathbf{R}^n$. By $L_2(M)$ we mean the space of all Lebesgue measurable functions f on M with
$$\int_M |f|^2\, dm < \infty .$$
The integration is performed in the Lebesgue sense. The scalar product in $L_2(M)$ is defined as
$$(f, g) := \int_M \overline{f(x)}\, g(x)\, dm$$
where $\bar{f}$ denotes the complex conjugate of f. It can be shown that this pre-Hilbert space is complete. Therefore $L_2(M)$ is a Hilbert space. Instead of dm we also write
dx in the following. If the Riemann integral exists then it is equal to the Lebesgue integral. However, the Lebesgue integral exists also in cases in which the Riemann integral does not exist.
Example. Consider the linear space $M_n$ of all n × n matrices over C. The trace of an n × n matrix $A = (a_{jk})$ is given by
$$\mathrm{tr}\,A := \sum_{j=1}^{n} a_{jj} .$$
A scalar product is given by
$$(A, B) := \mathrm{tr}(AB^*)$$
where tr denotes the trace and $B^*$ denotes the conjugate transpose matrix of B. We recall that tr(C + D) = tr C + tr D, where C and D are n × n matrices. For example, if A is the n × n unit matrix we find $(A, A) = \mathrm{tr}(AA^*) = n$.
Example. Consider the linear space of all infinite dimensional matrices $A = (a_{jk})$ over C such that
$$\sum_{j=1}^{\infty}\sum_{k=1}^{\infty} |a_{jk}|^2 < \infty .$$
A scalar product is given by
$$(A, B) := \mathrm{tr}(AB^*)$$
where tr denotes the trace and $B^*$ denotes the conjugate transpose matrix of B. We recall that tr(C + D) = tr C + tr D where C and D are infinite dimensional matrices. The infinite dimensional unit matrix does not belong to this Hilbert space.
Example. Let D be an open set of the Euclidean space $\mathbf{R}^n$. Now $L_2(D)_{pq}$ denotes the space of all q × p matrix functions Lebesgue measurable on D such that
$$\int_D \mathrm{tr}\, f(x) f(x)^*\, dm < \infty$$
where m denotes the Lebesgue measure, * denotes the conjugate transpose, and tr is the trace of the q × q matrix. We define the scalar product as
$$(f, g) := \int_D \mathrm{tr}\, f(x) g(x)^*\, dm .$$
Theorem. All complex infinite dimensional separable Hilbert spaces are isomorphic to $l_2(\mathbf{N})$ and consequently are mutually isomorphic.

Definition. Let S be a subset of the Hilbert space H. The subset S is dense in H if for every $f \in H$ there exists a Cauchy sequence $\{f_j\}$ in S such that $f_j \to f$ as $j \to \infty$.
It follows from this definition that, if K is a subspace of H, then so too is the set $K^\perp$ of vectors orthogonal to all those in K. The subspace $K^\perp$ is termed the orthogonal complement of K in H. Moreover, any vector f in H may be uniquely decomposed into components $f_K$ and $f_{K^\perp}$, lying in K and $K^\perp$, respectively, i.e. $f = f_K + f_{K^\perp}$.

An orthonormal set $\{\phi_j : j \in I\}$ satisfies $(\phi_j, \phi_k) = \delta_{jk}$, where I is a countable index set and $\delta_{jk}$ denotes the Kronecker delta, i.e.
$$\delta_{jk} := \begin{cases} 1 & \text{for } j = k\\ 0 & \text{for } j \ne k .\end{cases}$$
Example. Consider the Hilbert space $H = \mathbf{C}^4$. The scalar product is defined as
$$(u, v) := \sum_{j=1}^{4} \bar{u}_j v_j .$$
The Bell states
$$\Phi^{\pm} = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\0\\0\\\pm 1\end{pmatrix}, \qquad \Psi^{\pm} = \frac{1}{\sqrt{2}}\begin{pmatrix}0\\1\\\pm 1\\0\end{pmatrix}$$
form an orthonormal basis in this Hilbert space.
Let
$$u = \frac{1}{2}\begin{pmatrix}1\\1\\1\\1\end{pmatrix}$$
be a normalized state in $\mathbf{C}^4$. Then the expansion coefficients are given by $(\Phi^{\pm}, u)$ and $(\Psi^{\pm}, u)$; we find $(\Phi^+, u) = (\Psi^+, u) = 1/\sqrt{2}$ and $(\Phi^-, u) = (\Psi^-, u) = 0$. Consequently
$$u = \frac{1}{\sqrt{2}}\left(\Phi^+ + \Psi^+\right).$$
Example. Consider the Hilbert space $L_2(-\pi, \pi)$ with the orthonormal basis
$$B = \left\{\phi_k(x) := \frac{1}{\sqrt{2\pi}}\exp(ikx) \;:\; k \in \mathbf{Z}\right\}.$$
Let $f \in L_2(-\pi, \pi)$ with f(x) = x. Then the expansion coefficients are
$$(\phi_k, f) = \frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi} e^{-ikx}\, x\, dx .$$
Theorem. Every separable Hilbert space has at least one orthonormal basis. Moreover, for an orthonormal basis $\{\phi_n : n \in I\}$ and all $f, g \in H$ we have the Parseval relation
$$(f, g) = \sum_{n \in I} (f, \phi_n)\overline{(g, \phi_n)} .$$
Example. In the Hilbert space $L_2(-1, 1)$ the set
$$B = \left\{\sqrt{\frac{2l+1}{2}}\,P_l(x) \;:\; l = 0, 1, 2, \ldots\right\}$$
forms an orthonormal basis. The polynomials $P_l$ are called the Legendre polynomials. For the first four Legendre polynomials we find $P_0(x) = 1$, $P_1(x) = x$, $P_2(x) = \frac{1}{2}(3x^2 - 1)$ and $P_3(x) = \frac{1}{2}(5x^3 - 3x)$.
Further orthonormal bases are
$$B = \left\{\frac{1}{\sqrt{a}}\exp(2\pi i n x/a) \;:\; n \in \mathbf{Z}\right\}$$
in $L_2(0, a)$, the real form
$$\left\{\sqrt{\frac{2}{a}}\cos\left(\frac{2\pi x n}{a}\right),\; \sqrt{\frac{2}{a}}\sin\left(\frac{2\pi x n}{a}\right)\right\},$$
the plane-wave basis
$$B = \left\{\frac{1}{(2\pi)^{n/2}}\exp(i\mathbf{k}\cdot\mathbf{x})\right\}$$
and
$$B = \left\{\frac{1}{a^{3/2}}\, e^{i2\pi\mathbf{n}\cdot\mathbf{x}/a}\right\}$$
for the cube of side a.
where
$$l = 0, 1, 2, 3, \ldots, \qquad m = -l, -l+1, \ldots, +l$$
and $0 \le \phi < 2\pi$, $0 \le \theta < \pi$. The functions $Y_{lm}$ are called spherical harmonics. The orthogonality relation is given by
$$(Y_{lm}, Y_{l'm'}) := \int_{\theta=0}^{\pi}\int_{\phi=0}^{2\pi} \overline{Y_{lm}(\theta,\phi)}\, Y_{l'm'}(\theta,\phi)\, d\Omega = \delta_{ll'}\delta_{mm'} .$$
For example
$$Y_{00}(\theta, \phi) = \frac{1}{\sqrt{4\pi}}, \qquad Y_{10}(\theta, \phi) = \sqrt{\frac{3}{4\pi}}\cos\theta .$$
Example. In the Hilbert space $L_2(\mathbf{R})$ an orthonormal basis is given in terms of the functions $H_k(x)e^{-x^2/2}$ (with suitable normalization constants), k = 0, 1, 2, .... The functions $H_k$ are called the Hermite polynomials. For the first four Hermite polynomials we find $H_0(x) = 1$, $H_1(x) = 2x$, $H_2(x) = 4x^2 - 2$, $H_3(x) = 8x^3 - 12x$.

Example. In the Hilbert space $L_2(0, \infty)$ an orthonormal basis is given in terms of the functions $L_n(x)e^{-x/2}$ (with suitable normalization constants), n = 0, 1, 2, .... The functions $L_n$ are called Laguerre polynomials. For the first four Laguerre polynomials we find $L_0(x) = 1$, $L_1(x) = -x + 1$, $L_2(x) = x^2 - 4x + 2$, $L_3(x) = -x^3 + 9x^2 - 18x + 6$.
In many applications in quantum mechanics such as spin-orbit coupling we need the tensor product of Hilbert spaces. The tensor product also plays a central role in quantum computing. Let $H_1$ and $H_2$ be two Hilbert spaces. We first consider the algebraic tensor product, considering the spaces merely as linear spaces. The algebraic tensor product space is the linear space of all formal finite sums
$$h = \sum_{j=1}^{n} (f_j \otimes g_j), \qquad f_j \in H_1,\ g_j \in H_2$$
subject to $c(f \otimes g) = (cf) \otimes g = f \otimes (cg)$, where $c \in \mathbf{C}$. Let $f_j, h_l \in H_1$ and $g_j, k_l \in H_2$. We endow this linear space with the inner product
$$\left(\sum_j f_j \otimes g_j,\; \sum_l h_l \otimes k_l\right) := \sum_{j,l} (f_j, h_l)(g_j, k_l) .$$
As an example we consider the two Hilbert spaces $H_1 = L_2(a, b)$ and $H_2 = L_2(c, d)$. Then the tensor product Hilbert space $H_1 \otimes H_2$ is readily seen to be the space of the functions $f(x_1, x_2)$ with $a < x_1 < b$, $c < x_2 < d$ and the scalar product
$$(f, g) := \int_c^d\!\!\int_a^b \overline{f(x_1, x_2)}\, g(x_1, x_2)\, dx_1 dx_2 .$$
Let $H_1 = L_2(a, b)$ and $H_2 = L_2(c, d)$. Then we have the following

Theorem. Let $\{\phi_n : n \in \mathbf{N}\}$ be an orthonormal basis in the Hilbert space $L_2(a, b)$ and let $\{\psi_m : m \in \mathbf{N}\}$ be an orthonormal basis in the Hilbert space $L_2(c, d)$. Then the set
$$\{\,\phi_n(x_1)\psi_m(x_2) \;:\; n \in \mathbf{N},\ m \in \mathbf{N}\,\}$$
is an orthonormal basis in the Hilbert space $L_2((a \le x_1 \le b) \times (c \le x_2 \le d))$.
It is easy to verify by integration that the set is an orthonormal set over the rectangle. To prove completeness it suffices to show that every continuous function f with
$$\int_c^d\!\!\int_a^b |f(x_1, x_2)|^2\, dx_1 dx_2 < \infty$$
whose Fourier coefficients with respect to the set are all zero, vanishes identically over the rectangle.
In some textbooks and articles the so-called Dirac notation is used to describe Hilbert space theory in quantum mechanics (Dirac [59]). Let H be a Hilbert space and $H^*$ the dual space.
Definition. The linear operator A is termed bounded if the set of numbers $\|Af\|$ is bounded as f runs through the normalized vectors in D(A). In this case, we define $\|A\|$, the norm of A, to be the supremum, i.e. the least upper bound, of $\|Af\|$, as f runs through these normalized vectors, i.e.
$$\|A\| := \sup_{\|f\|=1} \|Af\| .$$
Example. Let $H = \mathbf{C}^n$. Then all n × n matrices over C are bounded linear operators. If $I_n$ is the n × n identity matrix we have $\|I_n\| = 1$.

Example. Consider the Hilbert space $L_2(0, a)$ with a > 0. Let $Af(x) := xf(x)$. Then $D(A) = L_2(0, a)$ and $\|A\| = a$.
It follows from this definition that $\|Af\| \le \|A\|\,\|f\|$ for all $f \in D(A)$.

If A is bounded, we may take D(A) to be H, since, even if this domain is originally defined to be a proper subset of H, we can always extend it to the whole of this space as follows. By continuity, A extends uniquely to the closure of D(A). We may then extend A to the full Hilbert space H, by defining it to be zero on $D(A)^{\perp}$, the orthogonal complement of D(A).

On the other hand, if A is unbounded, then in general D(A) does not comprise the whole of H and cannot be extended to do so.
Example. Consider the differential operator d/dx acting on the Hilbert space H of square-integrable functions of the real variable x. The domain of this operator consists of those functions f(x) for which both $\int |f(x)|^2 dx$ and $\int |df(x)/dx|^2 dx$ are finite, and this set of functions does not comprise the whole of H.
Definition. Let A be a bounded operator in H. We define $A^*$, the adjoint operator of A, by the formula
$$(f, A^* g) = (Af, g) \qquad \text{for all } f, g \in H .$$
In the case where A is an unbounded operator in H we again define its adjoint, $A^*$, by the same formula, except that f is confined to D(A) and g to the domain $D(A^*)$, which is specified as follows: g belongs to $D(A^*)$ if there is a vector $g_A$ in H such that
$$(f, g_A) = (Af, g) \qquad \text{for all } f \text{ in } D(A)$$
in which case $g_A = A^* g$. The operator A is termed self-adjoint if $D(A^*) = D(A)$ and $A^* = A$. The coincidence of $D(A^*)$ with $D(A)$ is essential here. The domain of a self-adjoint operator is dense in H.

Remark. If merely (Af, g) = (f, Ag) for all $f, g \in D(A)$, and if D(A) is dense in H, i.e., if $A \subset A^*$, then A is called hermitian (symmetric); $D(A^*)$ may be larger than D(A), in which case $A^*$ is a proper extension of A.
Definition. Let A be a linear operator with dense domain. Then its nullspace is defined by
$$N(A) := \{\, u \in H : Au = 0 \,\} .$$
An operator A is called positive if
$$(f, Af) \ge 0$$
for all $f \in D(A)$. For the product of two operators we set
$$(AB)v := A(Bv)$$
with domain $D(AB) = \{\, v \in D(B) : Bv \in D(A)\,\}$. Therefore $D(A^*A)$ may be smaller than $D(A)$.
Next we summarize the algebraic properties of the operator norm. It follows from the definitions of the norm and the adjoint of a bounded operator, together with the triangle inequality, that if A, B are bounded operators and $c \in \mathbf{C}$, then
$$\|cA\| = |c|\,\|A\|, \qquad \|A^*A\| = \|A\|^2, \qquad \|A+B\| \le \|A\| + \|B\|, \qquad \|AB\| \le \|A\|\,\|B\| .$$
Definition. Suppose that K is a subspace of H. Then since any vector f in H may be resolved into uniquely defined components $f_K$ and $f_{K^\perp}$ in K and $K^\perp$, respectively, we may define a linear operator $\Pi$ by the formula $\Pi f = f_K$. This is termed the projection operator from H to K, or simply the projection operator or projector for the subspace K.

It follows from this definition and the orthogonality of $f_K$ and $f_{K^\perp}$ that $\|\Pi f\| \le \|f\|$, and therefore that $\Pi$ is bounded. It also follows from the definition of $\Pi$ that
$$\Pi^2 = \Pi = \Pi^* .$$
For example, in $\mathbf{C}^2$ the matrices
$$\Pi_1 = \frac{1}{2}\begin{pmatrix}1 & 1\\ 1 & 1\end{pmatrix}, \qquad \Pi_2 = \frac{1}{2}\begin{pmatrix}1 & -1\\ -1 & 1\end{pmatrix}$$
are projection matrices.
Definition. An operator U is termed unitary if
$$(Uf, Ug) = (f, g)$$
for all vectors f, g in H, and if U has an inverse $U^{-1}$, i.e. $UU^{-1} = U^{-1}U = I$, where I is the identity operator, i.e. If = f for all $f \in H$.

In other words, a unitary operator is an invertible one which preserves the form of the scalar product in H. The above definition of unitarity is equivalent to the condition that
$$U^*U = UU^* = I, \qquad \text{i.e. } U^* = U^{-1} .$$
A unitary mapping of H onto a second Hilbert space H' is an invertible transformation V, from H to H', such that $(Vf, Vg) = (f, g)$ for all $f, g \in H$.
Next we discuss operator convergence. Suppose that A and the sequence $\{A_n\}$ are bounded linear operators in H.

Example. Consider the Hilbert space $L_2(\mathbf{R})$ and let $A_n$ be the translation operator.

A density matrix takes the form
$$\rho = \sum_n w_n \Pi(\phi_n)$$
where $\{\Pi(\phi_n)\}$ are the projection operators for an orthonormal sequence $\{\phi_n\}$ of vectors and $\{w_n\}$ is a sequence of non-negative numbers whose sum is unity. Thus, a density matrix is bounded and positive. The trace is defined by
$$\mathrm{tr}(B) := \sum_n (\phi_n, B\phi_n)$$
where $\{\phi_n : n \in I\}$ is any orthonormal basis set. The value of tr(B), which is infinite for some operators, is independent of the choice of basis.

It follows from these definitions of density matrices and trace that a density matrix is a positive operator whose trace is equal to unity.
Example. The Bell states
$$\Phi^{\pm} = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\0\\0\\\pm 1\end{pmatrix}, \qquad \Psi^{\pm} = \frac{1}{\sqrt{2}}\begin{pmatrix}0\\1\\\pm 1\\0\end{pmatrix}$$
form a basis in $\mathbf{C}^4$. Consider the density matrix (we apply the Dirac notation)
$$\rho = \frac{1}{2}|\Psi^-\rangle\langle\Psi^-| + \frac{1}{8}I_4 .$$
This density matrix describes the Werner state. This mixed state, a 5/8 vs. 3/8 singlet-triplet mixture, can be produced by mixing equal amounts of singlets and random, uncorrelated pairs.
The pure states of a quantum mechanical system are given by normalized vectors in a Hilbert space. The expectation value of an observable A (self-adjoint operator), for the state represented by $|\psi\rangle$, is
$$\langle A\rangle = \langle\psi|A|\psi\rangle .$$
For a density matrix $\rho$ we have
$$\rho = \rho^*, \qquad \mathrm{tr}(\rho) = 1, \qquad \rho \ge 0 .$$
For a statistical mixture of pure states, given by an orthonormal set of vectors
$$\{\,\psi_n : n = 1, 2, \ldots, N\,\}$$
with weights $w_n$, where
$$\sum_{n=1}^{N} w_n = 1, \qquad w_i \ge 0 \text{ for } i = 1, 2, \ldots, N,$$
the expectation value is
$$\langle A\rangle = \sum_{n=1}^{N} w_n \langle\psi_n|A|\psi_n\rangle .$$
The corresponding density matrix is
$$\rho := \sum_{n=1}^{N} w_n |\psi_n\rangle\langle\psi_n| .$$
The time evolution $U_t = e^{iKt}$, with K self-adjoint, satisfies $U_0 = I$.
Example. Let
$$K = \begin{pmatrix}0 & i\\ -i & 0\end{pmatrix}.$$
Then
$$U_t = e^{iKt} = \begin{pmatrix}\cos t & -\sin t\\ \sin t & \cos t\end{pmatrix}.$$
Next we consider linear operators in tensor product spaces. Suppose that $H_1$ and $H_2$ are Hilbert spaces, and that H is a third Hilbert space, defined in terms of $H_1$ and $H_2$ as follows. We recall that for each pair of vectors $f_1, f_2$ in $H_1, H_2$, respectively, there is a vector in H, denoted by $f_1 \otimes f_2$. If $A_1$ and $A_2$ are operators in $H_1$ and $H_2$, respectively, we define the operator $A_1 \otimes A_2$ in $H_1 \otimes H_2$ by the formula
$$(A_1 \otimes A_2)(f_1 \otimes f_2) := (A_1 f_1) \otimes (A_2 f_2) .$$
Similarly, we may define the tensor product $H_1 \otimes H_2 \otimes \cdots \otimes H_n$ as well as that, $A_1 \otimes A_2 \otimes \cdots \otimes A_n$, of operators $A_1, \ldots, A_n$. In standard notation, one writes
$$\bigotimes_{j=1}^{n} H_j = H_1 \otimes H_2 \otimes \cdots \otimes H_n$$
and
$$\bigotimes_{j=1}^{n} A_j = A_1 \otimes A_2 \otimes \cdots \otimes A_n .$$
For 2 × 2 matrices $A_1$ and $A_2$ the products $A_1 \otimes A_2$ and $A_2 \otimes A_1$ are 4 × 4 matrices; in general $A_1 \otimes A_2 \ne A_2 \otimes A_1$.
Let us now discuss the spectrum of a linear operator. Let T be a linear operator whose domain D(T) and range R(T) both lie in the same complex linear topological space X. In our case X is a Hilbert space H. We consider the linear operator
$$T_\lambda := \lambda I - T$$
where $\lambda$ is a complex number and I the identity operator. The distribution of the values of $\lambda$ for which $T_\lambda$ has an inverse, and the properties of the inverse when it exists, are called the spectral theory for the operator T. We discuss the general theory of the inverse of $T_\lambda$ (Yosida [185]).

Definition. If $\lambda_0$ is such that the range $R(T_{\lambda_0})$ is dense in X and $T_{\lambda_0}$ has a continuous inverse $(\lambda_0 I - T)^{-1}$, we say that $\lambda_0$ is in the resolvent set $\rho(T)$ of T, and we denote this inverse $(\lambda_0 I - T)^{-1}$ by $R(\lambda_0; T)$ and call it the resolvent (at $\lambda_0$) of T. All complex numbers $\lambda$ not in $\rho(T)$ form a set $\sigma(T)$ called the spectrum of T. The spectrum $\sigma(T)$ is decomposed into the disjoint sets $P\sigma(T)$, $C\sigma(T)$ and $R\sigma(T)$ with the following properties:

$P\sigma(T)$ is the totality of complex numbers $\lambda$ for which $T_\lambda$ does not have an inverse. $P\sigma(T)$ is called the point spectrum of T. In other words the point spectrum $P\sigma(T)$ is the set of eigenvalues of T; that is
$$P\sigma(T) := \{\,\lambda \in \mathbf{C} : Tf = \lambda f \text{ for some nonzero } f \text{ in } X\,\} .$$

$C\sigma(T)$ is the totality of complex numbers $\lambda$ for which $T_\lambda$ has a discontinuous inverse with domain dense in X. $C\sigma(T)$ is called the continuous spectrum of T.

$R\sigma(T)$ is the totality of complex numbers $\lambda$ for which $T_\lambda$ has an inverse whose domain is not dense in X. $R\sigma(T)$ is called the residual spectrum of T.
From these definitions and the linearity of the operator T we find the

Proposition. A necessary and sufficient condition for $\lambda_0 \in P\sigma(T)$ is that the equation
$$Tf = \lambda_0 f$$
has a solution $f \ne 0$ ($f \in X$). In this case $\lambda_0$ is called an eigenvalue of T, and f the corresponding eigenvector. The null space $N(\lambda_0 I - T)$ of $T_{\lambda_0}$ is called the eigenspace of T corresponding to the eigenvalue $\lambda_0$ of T. It consists of the vector 0 and the totality of eigenvectors corresponding to $\lambda_0$. The dimension of the eigenspace corresponding to $\lambda_0$ is called the multiplicity of the eigenvalue $\lambda_0$.

Example. If the linear space X is of finite dimension, then any bounded linear operator T is represented by a matrix $(t_{ij})$. The eigenvalues of T are obtained as
the roots of the algebraic equation, the so-called secular or characteristic equation of the matrix $(t_{ij})$:
$$\det(\lambda\delta_{ij} - t_{ij}) = 0$$
where det(·) denotes the determinant of the matrix.

Example. Consider the multiplication operator
$$Tf(x) := xf(x),$$
that is,
$$D(T) = \{\, f(x) : f(x) \text{ and } xf(x) \in L_2(\mathbf{R}) \,\}$$
and Tf(x) = xf(x) for $f(x) \in D(T)$. Then every real number $\lambda_0$ is in $C\sigma(T)$, i.e. T has a purely continuous spectrum consisting of the entire real axis. For the proof we refer to Yosida [185].
For a self-adjoint operator H one has the estimate
$$\|R(\lambda; H)\| \le \frac{1}{|\Im(\lambda)|} .$$
Moreover,
$$\Im\big((\lambda I - H)f, f\big) = \Im(\lambda)\,\|f\|^2, \qquad f \in D(H) .$$
Example. Let U be a unitary operator. The spectrum lies on the unit circle $|\lambda| = 1$; i.e. the interior and exterior of the unit circle belong to the resolvent set $\rho(U)$. The residual spectrum is empty.

Example. Consider the linear bounded self-adjoint operator in the Hilbert space $l_2(\mathbf{N})$
$$A = \begin{pmatrix}0 & 1 & 0 & 0 & \cdots\\ 1 & 0 & 1 & 0 & \cdots\\ 0 & 1 & 0 & 1 & \cdots\\ \vdots & \vdots & \vdots & \vdots & \ddots\end{pmatrix}.$$
In other words
$$a_{ij} = \begin{cases}1 & \text{if } i = j+1\\ 1 & \text{if } i = j-1\\ 0 & \text{otherwise}\end{cases}$$
with $i, j \in \mathbf{N}$. We find $\mathrm{spec}\,A = [-2, 2]$, i.e. we have a continuous spectrum [163].
Example. The operator $-d^2/dx^2$, with a suitably chosen domain in $L_2(\mathbf{R})$, has a purely continuous spectrum consisting of the nonnegative real axis. The negative real axis belongs to the resolvent set.

Example. The operator $-d^2/dx^2 + x^2$, with a suitably chosen domain in $L_2(\mathbf{R})$, has a pure point spectrum consisting of the positive odd integers, each of which is a simple eigenvalue.

Example. Let $H = l_2(\mathbf{Z})$ and let A be the unitary (bilateral shift) operator that maps $(\ldots, u_{-1}, u_0, u_1, \ldots)$ onto $(\ldots, u_0, u_1, u_2, \ldots)$. The point spectrum is empty and the continuous spectrum is the entire unit circle in the $\lambda$ plane.
Example. In the Hilbert space $H = l_2(\mathbf{N}\cup\{0\})$, annihilation and creation operators denoted by b and $b^*$ are defined as follows. They have a common domain and act on the basis states $\varphi_n$ as
$$b\varphi_n = \sqrt{n}\,\varphi_{n-1}, \qquad b^*\varphi_n = \sqrt{n+1}\,\varphi_{n+1} .$$
We find that $b^*$ is the adjoint of b. We can show that
$$b^* b - b b^* = -I$$
where I denotes the identity operator. The operator $N = b^* b$ with domain $D_2$ is called the particle-number operator. Its action on the states $\varphi_n$ is given by
$$N\varphi_n = n\,\varphi_n$$
where n = 0, 1, 2, .... Thus the eigenvalues of N are 0, 1, 2, .... The point spectrum of b is the entire complex plane. The point spectrum of $b^*$ is empty. The equation
$$b^* u = \lambda u$$
implies u = 0. We can show that the residual spectrum of $b^*$ is the entire complex plane.

Remark. Instead of the notation $\varphi_n$ the notation $|n\rangle$ is used in physics, where n = 0, 1, 2, ....
For a self-adjoint operator A with purely discrete spectrum we have the spectral decomposition
$$A = \sum_n \lambda_n \Pi(\varphi_n)$$
where $\Pi(\varphi_n)$ is the projection operator and $\lambda_n$ the eigenvalue for $\varphi_n$.
In general, even when the operator A does not have a discrete spectrum, it may still be resolved into a linear combination of projection operators according to the spectral theorem [137], which serves to express A as a Stieltjes integral
$$A = \int \lambda\, dE(\lambda)$$
where $\{E(\lambda)\}$ is a family of intercommuting projectors such that
$$E(-\infty) = 0, \qquad E(\infty) = I, \qquad E(\lambda) \le E(\lambda') \text{ if } \lambda < \lambda'$$
and $E(\lambda')$ converges strongly to $E(\lambda)$ as $\lambda'$ tends to $\lambda$ from above. Here $E(\lambda)$ is a function of A, i.e. $\chi_\lambda(A)$, where
$$\chi_\lambda(x) = \begin{cases}1 & \text{for } x < \lambda\\ 0 & \text{for } x \ge \lambda .\end{cases}$$
If A has a purely discrete spectrum, then
$$E(\lambda) = \sum_{\lambda_n < \lambda} \Pi(\phi_n) .$$
In general, it follows from the spectral theorem that, for any positive N, we may express A in the form $A = A_N + A_N'$, where
$$A_N = \int_{-N-0}^{N+0} \lambda\, dE(\lambda) \qquad \text{and} \qquad A_N' = \int_{-\infty}^{-N-0} \lambda\, dE(\lambda) + \int_{N+0}^{\infty} \lambda\, dE(\lambda) .$$
Thus, A is decomposed into parts, $A_N$ and $A_N'$, whose spectra lie inside and outside the interval [−N, N], respectively, and
$$A = \lim_{N\to\infty} A_N .$$
Since A and B are generally unbounded, one cannot say that AB = BA unless the domains of AB and BA happen to be the same, whereas $E(\lambda)$ and $F(\lambda)$ are defined on all of H; however ABu = BAu for all u (if any) such that both sides of the equation are meaningful. Commuting operators A and B are said to have a simple joint spectrum or to form a complete set of commuting observables if there is an element $\chi$ in H such that the closed linear span of the elements $E(\lambda)F(\mu)\chi$ is all of H. If A and B are two bounded operators in a Hilbert space we can define the commutator [A, B] := AB − BA in the sense that for all $u \in H$ we have $[A, B]u = ABu - BAu$.
16.3 Schmidt Decomposition

Consider a pure state in the tensor product of two Hilbert spaces ℋ₁ and ℋ₂,

|ψ⟩₁₂ = Σ_{i=1}^m Σ_{j=1}^n a_{ij} |i⟩₁ ⊗ |j⟩₂

where a_{ij} ∈ C and { |i⟩₁ : i = 1, ..., m } and { |j⟩₂ : j = 1, ..., n } are orthonormal
bases for ℋ₁ and ℋ₂, respectively. Let

|φ̃ᵢ⟩ := Σ_{j=1}^n a_{ij} |j⟩₂.
We notice that the |φ̃ᵢ⟩ need not be mutually orthogonal or normalized. Thus |ψ⟩₁₂
can be written as

|ψ⟩₁₂ = Σ_{i=1}^m |i⟩₁ ⊗ |φ̃ᵢ⟩.

Let

ρ₁ := tr₂(|ψ⟩₁₂ ₁₂⟨ψ|)

and let

ρ₂ := tr₁(|ψ⟩₁₂ ₁₂⟨ψ|).
Theorem.
1. ρ₁ and ρ₂ have the same nonzero eigenvalues λ₁, ..., λ_k (with the same multi-
plicities) and any extra dimensions are made up with zero eigenvalues, where
k ≤ min(m, n). There is no need for ℋ₁ and ℋ₂ to have the same dimension,
so the number of zero eigenvalues of ρ₁ and ρ₂ can differ.
2. The state |ψ⟩₁₂ can be written as

|ψ⟩₁₂ = Σ_{i=1}^k √λᵢ |i⟩₁ ⊗ |φᵢ⟩
where the |φᵢ⟩ are (not necessarily orthogonal) states in ℋ₂. Taking the partial trace
of |ψ⟩₁₂ ₁₂⟨ψ| over ℋ₂ and equating it to ρ₁ = Σᵢ λᵢ |i⟩₁⟨i| forces ⟨φⱼ|φᵢ⟩ = δᵢⱼ.
Hence it turns out that the { |φᵢ⟩ } are orthogonal after all. Thus at most min(m, n)
eigenvalues are non-zero. Consequently, the set of states { |φ̂ᵢ⟩ := λᵢ^{−1/2} |φ̃ᵢ⟩ }
is an orthonormal set in the Hilbert space ℋ₂, where we exclude the zero eigenvalues.
It follows that

|ψ⟩₁₂ = Σ_{i=1}^k √λᵢ |i⟩₁ ⊗ |φ̂ᵢ⟩

and

ρ₂ = Σ_{i=1}^k λᵢ |φ̂ᵢ⟩⟨φ̂ᵢ|.
Example. Let ℋ₁ = C³, ℋ₂ = C² and

|ψ⟩ = (1/√2) (1 0 0)ᵀ ⊗ (1 0)ᵀ + (1/√2) (0 0 1)ᵀ ⊗ (0 1)ᵀ.

We have

ρ₁ = ½ (1 0 0)ᵀ(1 0 0) + ½ (0 0 1)ᵀ(0 0 1),  ρ₂ = ½ (1 0)ᵀ(1 0) + ½ (0 1)ᵀ(0 1).

Thus λ₁ = λ₂ = ½ and the state is entangled.
The matrices O"x, O"y, o"z, are called the Pauli spin matrices. Let I be the 2 x 2 unit
matrix. We find the following relationships. After squaring the spin matrices, we
have
0"; = I, O"~ = I, 0"; = I.
Since the squares of the spin matrices are the 2 × 2 unit matrix, their eigenvalues
are ±1. The anticommutators are given by

[σ_a, σ_b]₊ := σ_a σ_b + σ_b σ_a = 2δ_{ab} I,  a, b ∈ {x, y, z}.

The trace of a matrix is the sum of the diagonal terms. For all three Pauli spin
matrices the trace is zero. The Pauli spin matrices are self-adjoint operators (her-
mitian matrices) and therefore have real eigenvalues. The commutators are given
by

[σ_x, σ_y] = 2iσ_z,  [σ_y, σ_z] = 2iσ_x,  [σ_z, σ_x] = 2iσ_y.
We also introduce σ_± := ½(σ_x ± iσ_y) and Λ_± := ½(I ± σ_z). The two matrices
Λ_± are projection matrices. The four matrices σ_±, Λ_± form an orthonormal basis
in the Hilbert space M₂ of the 2 × 2 matrices with scalar product ⟨A, B⟩ := tr(AB*).
Let us now study the action of the spin matrices on spin vectors. A vector u ∈ C²
can be written as

u = c₁|↑⟩ + c₂|↓⟩,  |↑⟩ := (1 0)ᵀ,  |↓⟩ := (0 1)ᵀ.

Furthermore we find

σ_+|↓⟩ = |↑⟩,  σ_+|↑⟩ = 0,  σ_−|↑⟩ = |↓⟩,  σ_−|↓⟩ = 0

and

σ_+² = 0,  σ_−² = 0.
In studying spin systems such as the Heisenberg model, the XY model and the
Dirac spin matrices we have to introduce the Kronecker product (Steeb [162]). Also
in the spectral representation of hermitian matrices the Kronecker product plays an
important role.
Let A be an m × n matrix and B a p × q matrix. Then A ⊗ B is an (mp) × (nq)
matrix, where ⊗ denotes the Kronecker product (sometimes also called tensor
product or direct product). One has

(A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD)

where the size of the matrices must be such that the matrix products exist. Further
rules are

A ⊗ (B + C) = A ⊗ B + A ⊗ C
(A ⊗ B)ᵀ = Aᵀ ⊗ Bᵀ
B ⊗ A = P(A ⊗ B)Q,  P, Q permutation matrices
tr(A ⊗ B) = tr(A) tr(B),  det(A ⊗ B) = (det A)^m (det B)^n  (A n × n, B m × m)

where tr denotes the trace and det the determinant. The Kronecker product of two
orthogonal matrices is again an orthogonal matrix.
With the help of the eigenvalues and eigenvectors of a hermitian matrix A we can
reconstruct the matrix A using the Kronecker product:

A = Σ_{j=1}^n λⱼ uⱼ ⊗ uⱼ*

where λⱼ are the eigenvalues and uⱼ the corresponding normalized eigenvectors
(column vectors) of A, and uⱼ* denotes the conjugate transpose of uⱼ.
Example. Let

A = ( 0 1 )
    ( 1 0 ).

Since the eigenvalues are λ₁ = 1 and λ₂ = −1 with the normalized eigenvectors

u₁ = (1/√2)(1 1)ᵀ,  u₂ = (1/√2)(1 −1)ᵀ

we find A = u₁ ⊗ u₁* − u₂ ⊗ u₂*.
For the Pauli spin matrices we obtain

σ_x ⊗ σ_x = ( 0 0 0 1 )
            ( 0 0 1 0 )
            ( 0 1 0 0 )
            ( 1 0 0 0 ),

σ_y ⊗ σ_y = (  0 0 0 −1 )
            (  0 0 1  0 )
            (  0 1 0  0 )
            ( −1 0 0  0 ),

σ_z ⊗ σ_z = ( 1  0  0 0 )
            ( 0 −1  0 0 )
            ( 0  0 −1 0 )
            ( 0  0  0 1 ).

For N spins we define

σ_{a,j} := I ⊗ ... ⊗ I ⊗ σ_a ⊗ I ⊗ ... ⊗ I

where I is the 2 × 2 unit matrix, a = x, y, z and σ_a is the a-th Pauli matrix in the
j-th location. Thus σ_{a,j} is a 2^N × 2^N matrix. Analogously, we define the spin
matrices S_{a,j} := ½σ_{a,j} (in units where ħ = 1) and write S_j := (S_{x,j}, S_{y,j}, S_{z,j}).
As an example consider the two-point Heisenberg model

Ĥ = J Σ_{j=1}^2 S_j · S_{j+1}

where J is the so-called exchange constant (J > 0 or J < 0) and · denotes the scalar
product. We impose cyclic boundary conditions, i.e. S₃ ≡ S₁. It follows that
Ĥ = J(S₁ · S₂ + S₂ · S₁) = 2J S₁ · S₂.

Since

S_{x,1} = S_x ⊗ I,  S_{x,2} = I ⊗ S_x

etc., where I is the 2 × 2 unit matrix, it follows that

S₁ · S₂ = ¼ (σ_x ⊗ σ_x + σ_y ⊗ σ_y + σ_z ⊗ σ_z).

Thus we obtain

Ĥ = (J/2) ( 1  0  0 0 )
          ( 0 −1  2 0 )
          ( 0  2 −1 0 )
          ( 0  0  0 1 )

which can be written as

Ĥ = (J/2)(1) ⊕ (J/2) ( −1  2 ) ⊕ (J/2)(1)
                     (  2 −1 )

where ⊕ denotes the direct sum of matrices. The eigenvalues and eigenvectors can
now easily be calculated. We define
|↑↑⟩ := |↑⟩ ⊗ |↑⟩,  |↑↓⟩ := |↑⟩ ⊗ |↓⟩,  |↓↑⟩ := |↓⟩ ⊗ |↑⟩,  |↓↓⟩ := |↓⟩ ⊗ |↓⟩

where |↑⟩ and |↓⟩ have been given above. Obviously these vectors form the standard
basis in C⁴. One sees at once that |↑↑⟩ and |↓↓⟩ are eigenvectors of the Hamilton
operator with eigenvalue J/2. This means the eigenvalue J/2 is degenerate. The
eigenvalues of the
matrix

(J/2) ( −1  2 )
      (  2 −1 )

are J/2 and −3J/2, and the corresponding normalized eigenvectors are

(1/√2)(|↑↓⟩ + |↓↑⟩),  (1/√2)(|↑↓⟩ − |↓↑⟩).
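The construction of the two-point Heisenberg Hamilton operator is easily checked
numerically. The following C++ sketch (our own illustration; it uses std::complex
rather than a symbolic type, and J = 1 is an arbitrary example value) builds
Ĥ = (J/2)(σ_x ⊗ σ_x + σ_y ⊗ σ_y + σ_z ⊗ σ_z) and prints its real 4 × 4 matrix.

// heisenberg2.cpp
#include <iostream>
#include <complex>
#include <vector>
using namespace std;
typedef complex<double> cplx;
typedef vector<vector<cplx> > Mat;

// Kronecker product of two square complex matrices
Mat kron(const Mat& A,const Mat& B)
{
 int m = A.size(), p = B.size();
 Mat C(m*p, vector<cplx>(m*p));
 for(int i=0;i<m;i++) for(int j=0;j<m;j++)
  for(int k=0;k<p;k++) for(int l=0;l<p;l++)
   C[i*p+k][j*p+l] = A[i][j]*B[k][l];
 return C;
}

int main()
{
 const cplx I(0.0,1.0);
 double J = 1.0;                     // exchange constant (assumed value)
 Mat sx = {{0,1},{1,0}};
 Mat sy = {{0,-I},{I,0}};
 Mat sz = {{1,0},{0,-1}};
 Mat xx = kron(sx,sx), yy = kron(sy,sy), zz = kron(sz,sz);
 for(int i=0;i<4;i++){
  for(int j=0;j<4;j++)
   cout << (0.5*J*(xx[i][j]+yy[i][j]+zz[i][j])).real() << " ";
  cout << endl;                      // prints (J/2)[[1,0,0,0],[0,-1,2,0],...]
 }
 return 0;
}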
Remark. More than sixty years after the formulation of quantum mechanics, the
interpretation of its formalism remains the most controversial problem of current
research in the foundations of physics, and it divides the community of physicists
into numerous opposing schools of thought. A more detailed discussion of the
interpretation of measurement in quantum mechanics is given in chapter 18.
16.5 Postulates of Quantum Mechanics

PI. The pure states of a quantum system S are described by normalized vectors
ψ which are elements of a Hilbert space ℋ that describes S. The pure states of a
quantum mechanical system are rays in a Hilbert space ℋ (i.e., unit vectors, with
an arbitrary phase). Specifying a pure state in quantum mechanics is the most that
can be said about a physical system. In this respect, it is analogous to a classical
pure state. The concept of a state as a ray in a Hilbert space leads to the probability
interpretation in quantum mechanics. Given a physical system in the state ψ, the
probability that it is in the state χ is |(ψ, χ)|². Clearly 0 ≤ |(ψ, χ)|² ≤ 1.
While the phase of a vector ψ has no physical significance, the relative phase of
two vectors does. This means that for |α| = 1, |(αψ, χ)| is independent of α, but
|(ψ₁ + αψ₂, χ)| is not. It is most convenient to regard pure states ψ simply as
vectors in ℋ, and to normalize them when appropriate in a calculation.
Example. For a Hamilton operator H acting on C³ the entries of the unitary matrix
exp(−iHt/ħ) are combinations of cos(ωt) and sin(ωt), where ω is the characteristic
frequency of H. Applying exp(−iHt/ħ) to an initial state ψ(0) yields the state
ψ(t) = exp(−iHt/ħ)ψ(0). The probability

p(t) = |(ψ(t), ψ(0))|²

can then be evaluated explicitly.
PIV. If the state of the system is described by the normalized vector ψ, then a
measurement of a will yield the eigenvalue λⱼ with probability

pⱼ = |(φⱼ, ψ)|².

Notice that (φⱼ, ψ) can be complex. It is obvious that 0 ≤ pⱼ ≤ 1.

PV. Immediately after a measurement which yields the value λⱼ the state of the
system is described by Πⱼψ, where Πⱼ is the projection operator which projects onto
the eigenspace of the eigenvalue λⱼ.
The type of time evolution implied by PV is incompatible with the unitary time
evolution implied by PII. PIV can be replaced by the weaker postulate:

PIV′. If the state of the system is described by an eigenvector φⱼ of A, then a
measurement of a will yield the eigenvalue λⱼ with certainty.

Clearly PIV′ is a special case of PIV but it is not a statement about probabilities.
The replacement of PIV by PIV′ eliminates the immediate need for PV since the
state is φⱼ before and after the measurement.
PVI. Quantum mechanical observables are self-adjoint operators on ℋ. The ex-
pected (average) value of the observable b with the corresponding self-adjoint oper-
ator B in the normalized state ψ is

⟨B⟩ = (ψ, Bψ).

More generally, for a state described by a density matrix ρ,

ρ(B) = tr(ρB)/tr(ρ)

where tr denotes the trace. If ρ has rank 1, then ρ(B) describes a pure state with
ρ/tr(ρ) the projection onto ψ. Otherwise, ρ(B) is a convex linear combination of
pure states,

ρ(B) = Σⱼ αⱼ (φⱼ, Bφⱼ).
PVII. The Hamilton operator H is the infinitesimal generator of the unitary group

U(t) := exp(−itH/ħ)

of time translations. The unit of action ħ (= h/2π) has the same dimension as pq,
where p is the momentum and q is the position [59].

The momentum operator p is the infinitesimal generator of the unitary space trans-
lation group

exp(iq · p/ħ)

where

q · p := Σ_{k=1}^N Σ_{j=1}^3 q_{kj} p_{kj}.
We recall that

exp(a · ∇)u(q) = u(q + a)

where u is a smooth function and

a · ∇ := Σ_{k=1}^N Σ_{j=1}^3 a_{kj} ∂/∂q_{kj}.

The angular momentum operator J is the infinitesimal generator of the unitary
space rotation group

exp(−iθ · J).
Remark. This leads to the quantization. Consider the energy conservation equation

E = Σ_{k=1}^N Σ_{j=1}^3 p²_{kj}/(2m_k) + V(q).
The transition to quantum mechanics is made by the substitution

p_{kj} → −iħ ∂/∂q_{kj}.

Applying this operator relation to a wave function ψ(q, t) we obtain the Schrödinger
equation.
The time translation group U(t) determines the dynamics. There are two standard
descriptions: the Schrödinger picture and the Heisenberg picture. In the Schrödinger
picture, the states ψ ∈ ℋ evolve in time according to the Schrödinger equation, while
the observables do not evolve. The time-dependent normalized state ψ(t) yields the
expectation

⟨B⟩(t) = (ψ(t), Bψ(t)).
The second description of dynamics is the Heisenberg picture, in which the states
remain fixed, and the observables evolve in time according to the automorphism
group

B(t) = Σ_{n=0}^∞ ((it/ħ)ⁿ/n!) [H, [H, ..., [H, B], ...]] = exp(iHt/ħ) B exp(−iHt/ħ).
Postulate PVII ensures that the results of an experiment, i.e., inner products (ψ, χ),
are independent of the time at which the experiment is performed. This means

(U(t)ψ, U(t)χ) = (ψ, χ).

For the Hilbert space one takes ℋ = L²(R^{3N}), and the position operators act as
multiplication operators, q̂_{kj}ψ = q_{kj}ψ. This choice is called the Schrödinger
representation (as distinct from the Schrödinger picture). The function ψ(q) ∈ ℋ
has the interpretation of giving the probability distribution

p(q) = |ψ(q)|²
for the position of the particles in R^{3N}. Using postulate PVII, we find that the
momentum operators are represented by

p̂_{kj} = −iħ ∂/∂q_{kj}.
The Hamilton operator is

Ĥ = Σ_{k=1}^N Σ_{j=1}^3 p̂²_{kj}/(2m_k) + V(q̂)

i.e.

Ĥ = −Σ_{k=1}^N Σ_{j=1}^3 (ħ²/(2m_k)) ∂²/∂q²_{kj} + V(q̂).
In other words the Hamilton operator Ĥ follows from the Hamilton function H via
the quantization p_{kj} → p̂_{kj}, q_{kj} → q̂_{kj}. The operator q̂_{kj} is defined by
q̂_{kj} f(q) := q_{kj} f(q). We find the (canonical) commutation relations

[q̂_{kj}, q̂_{k′j′}] = 0
[p̂_{kj}, p̂_{k′j′}] = 0
[p̂_{kj}, q̂_{k′j′}] = −iħ δ_{kk′} δ_{jj′} I.
Thus far the spin of the particle is not taken into account. We have spin 0 for π
mesons, spin ½ for electrons, muons, protons, or neutrons, spin 1 for photons, and
higher spins for other particles or nuclei. To consider spin-dependent forces (for
example the coupling of the spin magnetic moment to a magnetic field) we have to
extend the Hilbert space L²(R^{3N}) to the N-fold tensor product

⊗_{k=1}^N L²(R³, S).

Here L²(R³, S) denotes functions defined on R³ with values in the finite dimen-
sional spin space S. For spin zero particles we have S = C, and we are reduced
to L²(R^{3N}). For nonzero spin s, we have S = C^{2s+1}. We write ψ(q) as a vector
with components ψ(q, ζ). A space rotation (generated by the angular momentum
observable J) will rotate both q and ζ, the latter by a linear transformation of the
ζ coordinates according to an N-fold tensor product of a representation of the spin
group SU(2). The group SU(2) consists of all 2 × 2 unitary matrices with determinant
equal to one. The spin representation is chosen with a definite symmetry under
permutations of the particle coordinates (q_k, ζ_k), k = 1, 2, ..., N. The standard
choices are the totally symmetric representation for integer spin particles and the
totally antisymmetric representation for half-integer spin particles.
The choice of antisymmetry for atomic and molecular problems with spin ½ is known
as the Pauli exclusion principle. One can prove that integer spin particles cannot be
antisymmetrized and half-integer spin particles cannot be symmetrized. Particles
with integer spin are called bosons. Those with half-integer spin are called fermions.
Chapter 17
Quantum Bits and Quantum Computation

17.1 Introduction
Digital computers are based on devices that can take on only two states, one of
which is denoted by 0 and the other by 1. By concatenating several 0s and 1s
together, 0-1 combinations can be formed to represent as many different entities
as desired. A single 0 or 1 is called a bit. In general,
n bits can be used to distinguish among 2ⁿ distinct entities and each addition of
a bit doubles the number of possible combinations. Computers use strings of bits
to represent numbers, letters, punctuation marks, and any other useful pieces of
information. In a classical computer, the processing of information is done by logic
gates. A logic gate maps the state of its input bits into another state according to a
truth table. Quantum computers require quantum logic, something fundamentally
different from classical Boolean logic. This difference leads to a greater efficiency of
quantum computation over its classical counterpart.
In the last few years a large number of authors have studied quantum computing
([122], [8]). The most exciting development in quantum information processing has
been the discovery of quantum algorithms - for integer factorization and the discrete
logarithm - that run exponentially faster than the best known classical algorithms.
These algorithms take classical input (such as the number to be factored) and yield
classical outputs (the factors), but obtain their speedup by using quantum interfer-
ence among computation paths during the intermediate steps. A quantum network is a
quantum computing device consisting of quantum logic gates whose computational
steps are synchronised in time. Quantum computation is defined as a unitary evo-
lution of the network which takes its initial state input into some final state output.
In a quantum computer the quantum bit [8, 18, 21, 132, 138, 156, 178] or simply
qubit is the natural extension of the classical notion of bit. A qubit is a quantum
two-level system, that in addition to the two pairwise orthonormal states 10) and
|1⟩ in the Hilbert space C² can be set in any superposition of the form

|ψ⟩ = c₀|0⟩ + c₁|1⟩,  c₀, c₁ ∈ C.

Since |ψ⟩ is normalized, i.e. ⟨ψ|ψ⟩ = 1, and ⟨0|0⟩ = ⟨1|1⟩ = 1, ⟨0|1⟩ = 0, we have

|c₀|² + |c₁|² = 1.
Any quantum two-level system is a potential candidate for a qubit. Examples are
the polarization of a photon, the polarization of a spin-1/2 particle (electron), the
relative phase and intensity of a single photon in two arms of an interferometer, or
an arbitrary superposition of two atomic states. Thus the classical Boolean states
0 and 1 can be represented by a fixed pair of orthogonal states of the qubit. In the
following we set

|0⟩ := ( 1 )   |1⟩ := ( 0 )
       ( 0 ),         ( 1 ).
Often the representations of 10) and 11) are reversed, this changes the matrix repre-
sentation of operators but all computations and results are equivalent. In the follow-
ing we think of a qubit as a spin-1/2 particle. The states 10) and 11) will correspond
respectively to the spin-down and spin-up eigenstates along a pre-arranged axis of
quantization, for example set by an external constant magnetic field. Although a
qubit can be prepared in an infinite number of different quantum states (by choosing
different complex coefficients c₀ and c₁) it cannot be used to transmit more than one bit of
information. This is because no detection process can reliably differentiate between
non-orthogonal states. However, qubits (and more generally information encoded
in quantum systems) can be used in systems developed for quantum cryptogra-
phy, quantum teleportation or quantum dense coding. The problem of measuring a
quantum system is a central one in quantum theory. In a classical computer, it is
possible in principle to inquire at any time (and without disturbing the computer)
about the state of any bit in the memory. In a quantum computer, the situation
is different. Qubits can be in superposed states, or can even be entangled with
each other, and the mere act of measuring the quantum computer alters its state.
Performing a measurement on a qubit in a state given above will return 0 with
probability |c₀|² and 1 with probability |c₁|². The state of the qubit after the mea-
surement (post-measurement state) will be |0⟩ or |1⟩ (depending on the outcome),
and not c₀|0⟩ + c₁|1⟩. We think of the measuring apparatus as a Stern-Gerlach de-
vice [23, 68] into which the qubits (spins) are sent when we want to measure them.
When measuring a state, the outcomes 0 and 1 will be recorded with probabilities
|c₀|² and |c₁|² at the respective detector plates.
17.2 Quantum Bits and Quantum Registers

A collection of qubits forms a quantum register. For instance, the binary form of 9
(decimal) is 1001 and loading a quantum register with this value is done by preparing
four qubits in the state

|1⟩ ⊗ |0⟩ ⊗ |0⟩ ⊗ |1⟩.

Thus the state |9⟩ ≡ |1001⟩ is an element in the Hilbert space C¹⁶. In the literature
the notation

|1⟩|0⟩|0⟩|1⟩

is sometimes used, i.e. the symbol ⊗ is omitted. Consider first the case with two
quantum bits. Then we have the basis

|00⟩, |01⟩, |10⟩, |11⟩.
Two states |a⟩ and |b⟩ of this basis are orthogonal if aⱼ ≠ bⱼ for at least one j. For
an n-bit register, the most general state can be written as

|ψ⟩ = Σ_{x=0}^{2ⁿ−1} c_x |x⟩

where

Σ_{x=0}^{2ⁿ−1} |c_x|² = 1.
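A quantum register is easily modelled on a classical computer as an array of 2ⁿ
complex amplitudes. The following minimal C++ sketch (our own illustration; the
variable names and the example state are ours) stores the state |9⟩ = |1001⟩ of a
four-qubit register and verifies the normalization condition.

// register.cpp
#include <iostream>
#include <complex>
#include <vector>
using namespace std;

int main()
{
 int n = 4;                           // number of qubits (example value)
 int dim = 1 << n;                    // 2^n basis states
 vector<complex<double> > c(dim, 0.0);
 c[9] = 1.0;                          // the state |9> = |1001>
 double total = 0.0;
 for(int x=0;x<dim;x++) total += norm(c[x]);  // sum of |c_x|^2
 cout << "sum |c_x|^2 = " << total << endl;
 return 0;
}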
Measuring the state of a register is done by passing, one by one, the various spins
that form the register into a Stern-Gerlach apparatus and recording the results. For
instance a two-bit register initially prepared in the state

(1/√2)(|00⟩ + |11⟩)

will, with equal probability, result in either two successive clicks in the down-detector
or two successive clicks in the up-detector. The post-measurement state will be either
|00⟩ or |11⟩.

17.3 Entangled States

Next we ask when a state |u⟩ ∈ C⁴ can be written as

|u⟩ = |x⟩ ⊗ |y⟩,  |x⟩, |y⟩ ∈ C²

where ⊗ denotes the Kronecker product [162, 163]. In other words, what is the
condition on |u⟩ such that |x⟩ and |y⟩ exist? If no such |x⟩ and |y⟩ exist then |u⟩
is said to be entangled. If |x⟩ and |y⟩ do exist we say that |u⟩ is not entangled. As
an example the product state

|0⟩ ⊗ (1/√2)(|0⟩ + |1⟩) = (1/√2)(|00⟩ + |01⟩)
is not entangled. The Bell basis states [132] (which form a basis in C⁴) are given by

|Φ±⟩ := (1/√2)(|00⟩ ± |11⟩),  |Ψ±⟩ := (1/√2)(|01⟩ ± |10⟩).   (17.1)

The singlet state Ψ⁻ has the same form in any single-qubit basis {|α⟩, |β⟩}: up to a
phase factor e^{iθ},

(1/√2)(|αβ⟩ − |βα⟩) = e^{iθ} (1/√2)(|↓↑⟩ − |↑↓⟩).

Thus measurement of Ψ⁻ always yields opposite outcomes for the two qubits, inde-
pendent of the basis. The Bell states

Φ⁺ = (1/√2) ( 1 )   Ψ⁺ = (1/√2) ( 0 )   Ψ⁻ = (1/√2) (  0 )
            ( 0 )               ( 1 )               (  1 )
            ( 0 )               ( 1 )               ( −1 )
            ( 1 ),              ( 0 ),              (  0 )

are entangled. The entangled state Ψ⁺ is also called the EPR state, after Einstein,
Podolsky and Rosen [60]. Entangled states exhibit nonlocal correlations. This
means that two entangled systems which have interacted in the past and are no
longer interacting still show correlations. These correlations are used for example
in dense coding and quantum error-correction techniques [19, 132]. The Bell states
can be characterized as the simultaneous eigenvectors of the 4 × 4 matrices σ_x ⊗ σ_x
and σ_z ⊗ σ_z.
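This characterization can be checked directly. The following C++ sketch (our own
illustration with real arithmetic) applies σ_x ⊗ σ_x and σ_z ⊗ σ_z to the Bell state
Φ⁺ and prints the results; both reproduce Φ⁺, i.e. the eigenvalue is +1 in each case.

// bell_eigen.cpp
#include <iostream>
#include <vector>
#include <cmath>
using namespace std;
typedef vector<double> Vec;

// matrix-vector multiplication
Vec mul(const vector<Vec>& M,const Vec& v)
{
 Vec w(v.size(),0.0);
 for(size_t i=0;i<v.size();i++)
  for(size_t j=0;j<v.size();j++) w[i] += M[i][j]*v[j];
 return w;
}

int main()
{
 double s = 1.0/sqrt(2.0);
 Vec phiplus = {s,0.0,0.0,s};                                 // (|00>+|11>)/sqrt(2)
 vector<Vec> xx = {{0,0,0,1},{0,0,1,0},{0,1,0,0},{1,0,0,0}};  // sx (x) sx
 vector<Vec> zz = {{1,0,0,0},{0,-1,0,0},{0,0,-1,0},{0,0,0,1}};// sz (x) sz
 Vec a = mul(xx,phiplus), b = mul(zz,phiplus);
 cout << "(sx(x)sx)phi+ : ";
 for(size_t i=0;i<4;i++) cout << a[i] << " ";
 cout << endl << "(sz(x)sz)phi+ : ";
 for(size_t i=0;i<4;i++) cout << b[i] << " ";
 cout << endl;
 return 0;
}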
The measure of entanglement for pure states E(u) is defined as follows [19, 132]:

E(u) := S(ρ_A),  ρ_A := tr_B |u⟩⟨u|

where S is the von Neumann entropy

S(ρ) := −tr(ρ log₂ ρ).

Thus 0 ≤ E ≤ 1. If E = 1 we call the pure state maximally entangled. If E = 0,
the pure state is not entangled.
Example. For a product state |ψ⟩ = |χ⟩ ⊗ |φ⟩ we find

ρ_B := tr_A(|ψ⟩⟨ψ|)
     = (⟨↑| ⊗ I₂)|ψ⟩⟨ψ|(|↑⟩ ⊗ I₂) + (⟨↓| ⊗ I₂)|ψ⟩⟨ψ|(|↓⟩ ⊗ I₂)
     = |φ⟩⟨φ|.

Thus

E_ψ = S(ρ_B) = −tr(ρ_B log₂ ρ_B) = 0.

This state is not entangled.
The Schmidt number can also be used to characterize entanglement. The Schmidt
number is the number of nonzero eigenvalues of ρ_A and ρ_B. A pure state is entangled
if its Schmidt number is greater than one. In this case we have E > 0. Otherwise
the pure state is not entangled and we have E = 0.
Next we derive the requirement for a state in C⁴ to be entangled. We use the
representation

|u⟩ = (u₁ u₂ u₃ u₄)ᵀ,  |x⟩ = (x₁ x₂)ᵀ,  |y⟩ = (y₁ y₂)ᵀ.

Since |u⟩ is normalized at least one of u₁, u₂, u₃, u₄ is nonzero. From the normaliza-
tion conditions and |u⟩ = |x⟩ ⊗ |y⟩ we find

|u₁|² + |u₂|² + |u₃|² + |u₄|² = 1   (17.2)
|x₁|² + |x₂|² = 1   (17.3)
|y₁|² + |y₂|² = 1   (17.4)
x₁y₁ = u₁   (17.5)
x₁y₂ = u₂   (17.6)
x₂y₁ = u₃   (17.7)
x₂y₂ = u₄   (17.8)
A necessary and sufficient condition for |x⟩ and |y⟩ to exist is

u₁u₄ = u₂u₃.   (17.9)

From (17.3)-(17.8) we obtain explicit expressions (17.18)-(17.21) for the components
x₁, x₂, y₁ and y₂, and the decomposition is unique up to a phase factor.
This follows from the fact that if |u⟩ = |x⟩ ⊗ |y⟩ is a decomposition of |u⟩ then so is
(e^{iθ}|x⟩) ⊗ (e^{−iθ}|y⟩), θ ∈ R. Conversely, suppose |u⟩ = |x₁⟩ ⊗ |y₁⟩ = |x₂⟩ ⊗ |y₂⟩
with components x_{ik} and y_{il}. Since |u⟩ is nonzero, x_{1k} y_{1l} ≠ 0 for some
k, l ∈ {1, 2}. Then x_{1k} can be written as x_{1k} = c x_{2k}, c ∈ C, which gives c y_{1l} =
y_{2l}. Let k′ := 3 − k and l′ := 3 − l. If x_{1k′} is nonzero then x_{2k′} is nonzero and
x_{1k′} y_{1l} = x_{2k′} y_{2l}, so that x_{1k′} = c x_{2k′}. Similarly if y_{1l′} is nonzero then c y_{1l′} = y_{2l′}.
The normalization conditions give |c| = 1. Thus the decomposition is unique up to a phase factor.
Next we describe the relation between condition (17.9) and the measure of entan-
glement introduced above. Since

|u⟩⟨u| = ( u₁ū₁ u₁ū₂ u₁ū₃ u₁ū₄ )
         ( u₂ū₁ u₂ū₂ u₂ū₃ u₂ū₄ )
         ( u₃ū₁ u₃ū₂ u₃ū₃ u₃ū₄ )   (17.25)
         ( u₄ū₁ u₄ū₂ u₄ū₃ u₄ū₄ )
we find

ρ_A = tr_B|u⟩⟨u| = ( |u₁|² + |u₂|²   u₁ū₃ + u₂ū₄ )
                   ( u₃ū₁ + u₄ū₂   |u₃|² + |u₄|² )   (17.26)

ρ_B = tr_A|u⟩⟨u| = ( |u₁|² + |u₃|²   u₁ū₂ + u₃ū₄ )
                   ( u₂ū₁ + u₄ū₃   |u₂|² + |u₄|² )   (17.27)

The 2 × 2 density matrices ρ_A and ρ_B given by (17.26) and (17.27) are hermitian and
have the same eigenvalues. Thus the eigenvalues λ₁ and λ₂ are real. The matrices
are also positive semi-definite, i.e. for all |a⟩ ∈ C² we have ⟨a|ρ_{A,B}|a⟩ ≥ 0. Thus
the eigenvalues are non-negative. The eigenvalues are given by

λ_{1,2} = ½ (1 ± √(1 − 4|u₁u₄ − u₂u₃|²)).   (17.28)
We have

tr(tr_A|u⟩⟨u|) = 1   (17.29)
tr(tr_B|u⟩⟨u|) = 1   (17.30)

and therefore

λ₁ + λ₂ = 1   (17.31)

where we used the fact that the trace of an n × n matrix is the sum of its eigenvalues.
This can also be seen from (17.28). Thus 0 ≤ λ₁, λ₂ ≤ 1. Now we have

ρ_A† = ρ_A  and  ρ_B† = ρ_B

and the entanglement is given by

E_u = S(ρ_A) = S(ρ_B) = −λ₁ log₂ λ₁ − λ₂ log₂ λ₂.
Using the computer algebra system SymbolicC++ [169] the expression u₁u₄ − u₂u₃
can be evaluated symbolically and compared against 0, which then provides the in-
formation whether the state is entangled or not. SymbolicC++ includes among
other classes a template class Complex and a Sum class to do the symbolic manip-
ulations. If the state is entangled, then we can use (17.28) together with the formula
for E_u to find the entanglement E.
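For readers who prefer a purely numerical check, the following C++ sketch (our
own, using double precision rather than SymbolicC++) evaluates the eigenvalues
(17.28) and the entanglement E for a Bell state and for a product state.

// entangle.cpp
#include <iostream>
#include <complex>
#include <cmath>
using namespace std;
typedef complex<double> cplx;

// entanglement E = S(rho_A) of a normalized state u in C^4
double E(const cplx u[4])
{
 double d = abs(u[0]*u[3]-u[1]*u[2]);          // |u1 u4 - u2 u3|
 double l1 = 0.5*(1.0+sqrt(1.0-4.0*d*d));
 double l2 = 1.0-l1;
 double e = 0.0;
 if(l1 > 0.0) e -= l1*log2(l1);
 if(l2 > 0.0) e -= l2*log2(l2);
 return e;
}

int main()
{
 double s = 1.0/sqrt(2.0);
 cplx bell[4] = {s,0.0,0.0,s};                 // (|00>+|11>)/sqrt(2)
 cplx prod[4] = {s,s,0.0,0.0};                 // |0> (x) (|0>+|1>)/sqrt(2)
 cout << "E(Bell)    = " << E(bell) << endl;   // 1, maximally entangled
 cout << "E(product) = " << E(prod) << endl;   // 0, not entangled
 return 0;
}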
A remark is in order about the precision of the numerical calculation of the condi-
tion u₁u₄ = u₂u₃ and of the entanglement E as tests for non-entanglement. Testing
the condition u₁u₄ = u₂u₃ has the advantage that it involves only multiplications of
complex numbers, and the normalization factor of the vector |u⟩ need not be taken
into account. On the other hand, if the difference |u₁u₄ − u₂u₃| is of order O(10⁻¹⁵),
the term log₂ λ₁ takes the form

log₂(1 + O(10⁻³⁰)).

For the data type double, 1 + O(10⁻³⁰) is rounded to 1, so the logarithm evaluates
to zero. Thus the entanglement E is less affected by the problem of floating point
comparison. However, in calculating E we have to take into account the normalization
factor of the vector |u⟩. Warnings should be issued if E or |u₁u₄ − u₂u₃| are close to
zero when we use
the data type double. Java and a number of computer algebra systems provide
arbitrary precision floating point numbers. For example, Java has the class
BigDecimal. Then we can work with higher precision. An
important special case arises when one of the components of the vector lu) is equal
to zero. For example, say U4 = O. If the state lu) is non-entangled then U2 or U3
must be zero.
The analysis of separability can be extended to higher dimensions; for example, Steeb
and Hardy [167] consider when states in C⁹ can be separated into a product of two
states in C³. They only considered separability of pure states.
The more general question of the separability of mixed states has been considered
in [90, 91, 127].
17.4 Quantum Gates
The time evolution of a quantum state is given by

ψ(t) = e^{−itH/ħ} ψ(0)

and since H is self-adjoint, exp(−itH/ħ) is unitary. Thus the evolution of states in
quantum computation is described by unitary operations. A quantum gate acting
on a single qubit is a 2 × 2 unitary matrix, which can be parametrized by four real
parameters θ, δ, η, τ ∈ R.
Theorem. Given any unitary transformation U and ε > 0 there exist simple unitary
transformations U₁, U₂, ..., U_k such that

‖U − U_k U_{k−1} ⋯ U₁‖ < ε.
This theorem is important for the discussion of universality (see section 17.4.6).
The quantum NOT gate is the unitary matrix

U_NOT = ( 0 1 )
        ( 1 0 )

since U_NOT|0⟩ = |1⟩ and U_NOT|1⟩ = |0⟩. For the two quantum bit case the NOT
operation on both qubits would then be the unitary 4 × 4 matrix U_NOT ⊗ U_NOT.
This can be extended to any dimension. The unitary matrix U_NOT is a permutation
matrix. The NOT gate is a special case of a one-parameter family of unitary matrices
U(α), which reduces to the NOT gate if α = 1.
Next we consider the Walsh-Hadamard gate U_H. Note that it evolves classical states
into superpositions and therefore cannot be regarded as classical. U_H is given by the
2 × 2 unitary matrix

U_H = (1/√2) ( 1  1 )
             ( 1 −1 )

since

U_H|0⟩ = (1/√2)(|0⟩ + |1⟩),  U_H|1⟩ = (1/√2)(|0⟩ − |1⟩).
The Walsh-Hadamard gate is the product of a reflection and the rotation matrix
with θ = −π/4. The Walsh-Hadamard gate is quite useful when extended using the
Kronecker product. If we take an n-bit quantum register initially in the state

|00...0⟩

and apply U_H to every single qubit of the register, the resulting state is

|ψ⟩ = 2^{−n/2} Σ_{x=0}^{2ⁿ−1} |x⟩.
More generally, suppose the register is prepared in the state |y⟩, where

y = y₀ + y₁·2 + ... + y_{n−1}·2^{n−1}.

Then

(U_H ⊗ U_H ⊗ ... ⊗ U_H)|y⟩ = 2^{−n/2} Σ_{x=0}^{2ⁿ−1} (−1)^{x*y} |x⟩

where

x * y := (x₀ · y₀) ⊕ (x₁ · y₁) ⊕ ... ⊕ (x_{n−1} · y_{n−1}).
This means with a linear number of operations (i.e. n applications of UH) we have
generated a register state that contains an exponential (2n) number of distinct terms.
Using quantum registers, n elementary operations can generate a state containing
all 2n possible numerical values of the register. In contrast, in classical registers
n elementary operations can only prepare one state of the register representing
one specific number. It is this ability to create quantum superpositions which
makes quantum parallel processing possible. If after preparing the register in a
coherent superposition of several numbers all subsequent computational operations
are unitary and linear (i.e. preserve the superpositions of states) then with each
computational step the computation is performed simultaneously on all the numbers
present in the superposition.
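Since the amplitudes of (U_H ⊗ ... ⊗ U_H)|y⟩ have the closed form 2^{−n/2}(−1)^{x*y},
the transform can be simulated without any matrix multiplication. The following
C++ sketch (our own illustration; n = 3 and y = 5 are arbitrary example values)
computes the amplitudes directly from the parity of the bitwise AND of x and y.

// walsh.cpp
#include <iostream>
#include <vector>
#include <cmath>
using namespace std;

// parity of the set bits of v, i.e. x0.y0 (+) x1.y1 (+) ... for v = x & y
int parity(int v) { int p = 0; while(v) { p ^= (v & 1); v >>= 1; } return p; }

int main()
{
 int n = 3, y = 5;                   // register size and input (examples)
 int dim = 1 << n;
 vector<double> c(dim);
 for(int x=0;x<dim;x++)
  c[x] = (parity(x & y) ? -1.0 : 1.0)/sqrt(double(dim));
 for(int x=0;x<dim;x++)
  cout << "amplitude of |" << x << "> : " << c[x] << endl;
 return 0;
}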
The XOR gate acts on two qubits as

U_XOR|a, b⟩ := |a, a ⊕ b⟩.

Consequently

U_XOR|00⟩ = |00⟩,  U_XOR|01⟩ = |01⟩,  U_XOR|10⟩ = |11⟩,  U_XOR|11⟩ = |10⟩.

The vectors |00⟩, |01⟩, |10⟩ and |11⟩ form an orthonormal basis in C⁴. If we consider
the basis in this order, the matrix representation of U_XOR is

U_XOR = ( 1 0 0 0 )
        ( 0 1 0 0 )
        ( 0 0 0 1 )
        ( 0 0 1 0 ).
If we consider the order |11⟩, |10⟩, |01⟩, |00⟩, the matrix representation is

( 0 1 0 0 )
( 1 0 0 0 )
( 0 0 1 0 )
( 0 0 0 1 ).
Both matrices are permutation matrices. Sometimes in the literature the definition

U_XOR|a, b⟩ := |a ⊕ b, b⟩

is used. Furthermore the XOR gate is also called the controlled NOT gate (CNOT
gate). The name comes from the fact that the gate effects a logical NOT on the
second qubit (target bit), if and only if the first qubit (control bit) is in state |1⟩.
We see that U_XOR cannot be written as a Kronecker product of 2 × 2 matrices.
Two interacting magnetic dipoles sufficiently close to each other can be used to
implement this operation. In circuit diagrams the XOR gate is denoted by the usual
controlled NOT symbol.
The exchange gate U_EXCH swaps two qubits:

|00⟩ ↦ |00⟩,  |01⟩ ↦ |10⟩,  |10⟩ ↦ |01⟩,  |11⟩ ↦ |11⟩.

We have

U_EXCH := |00⟩⟨00| + |10⟩⟨01| + |01⟩⟨10| + |11⟩⟨11|.

The matrix representation is

U_EXCH = ( 1 0 0 0 )
         ( 0 0 1 0 )
         ( 0 1 0 0 )
         ( 0 0 0 1 ).
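Because U_XOR and U_EXCH are permutation matrices, they can be applied to
basis states by index arithmetic alone. A minimal C++ sketch of this (our own;
the function names are illustrative):

// gates_perm.cpp
#include <iostream>
using namespace std;

// U_XOR |a,b> = |a, a xor b> as a map on the basis index s = 2a + b
int xorgate(int s)  { int a = (s>>1)&1, b = s&1; return (a<<1) | (a^b); }
// U_EXCH |a,b> = |b,a>
int exchgate(int s) { int a = (s>>1)&1, b = s&1; return (b<<1) | a; }

int main()
{
 for(int s=0;s<4;s++)
  cout << "UXOR|"  << (s>>1) << (s&1) << "> = |"
       << (xorgate(s)>>1)  << (xorgate(s)&1)  << ">,  "
       << "UEXCH|" << (s>>1) << (s&1) << "> = |"
       << (exchgate(s)>>1) << (exchgate(s)&1) << ">" << endl;
 return 0;
}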
The conditional phase shift gate acts as

U_CPHASE(φ)|a, b⟩ := e^{iφ(a·b)} |a, b⟩

where a, b ∈ {0, 1} and · denotes the classical AND operation. Thus we have

U_CPHASE(φ) = diag(1, 1, 1, e^{iφ}).

The gate performs a conditional phase shift, i.e. a multiplication by a phase factor
e^{iφ} only if the two qubits are both in their |1⟩ state. The three other basis states
are unaffected. An important special case is φ = π. The phase shift gate which acts
on one qubit is defined as (in matrix notation)

( e^{−iφ}    0    )
(    0    e^{iφ} ).
Next we consider the Toffoli gate, which is a special case of Deutsch's gate (given
below) for α = 1. Thus we obtain the matrix representation

U_TOFFOLI = ( 1 0 0 0 0 0 0 0 )
            ( 0 1 0 0 0 0 0 0 )
            ( 0 0 1 0 0 0 0 0 )
            ( 0 0 0 1 0 0 0 0 )
            ( 0 0 0 0 1 0 0 0 )
            ( 0 0 0 0 0 1 0 0 )
            ( 0 0 0 0 0 0 0 1 )
            ( 0 0 0 0 0 0 1 0 ).
Figure 17.5: Toffoli Gate
Deutsch's gate is given by

U_D(α) = ( 1 0 0 0 0 0 0 0 )
         ( 0 1 0 0 0 0 0 0 )
         ( 0 0 1 0 0 0 0 0 )
         ( 0 0 0 1 0 0 0 0 )
         ( 0 0 0 0 1 0 0 0 )
         ( 0 0 0 0 0 1 0 0 )
         ( 0 0 0 0 0 0 i cos(πα/2)  sin(πα/2) )
         ( 0 0 0 0 0 0 sin(πα/2)  i cos(πα/2) ).
Deutsch's gate, described in the previous section, is one such gate [57]. It is a class
of gates described by a real parameter. Thus to prove that a set of quantum gates
is a universal set, all that is required is to show that the set can implement Deutsch's
gate. For example, U_XOR together with the set of all single qubit unitary trans-
formations forms such a universal set. It has also been shown that a combination of
single and double qubit operations [6, 57] can also form a universal set.
17.4.7 Functions

Now we illustrate how to construct a simple transformation implementing a classical
function. We consider only the case where one qubit is changed, i.e. the function
we compute over the input gives the value 0 or 1. A simple permutation, which
is unitary, and its inverse allow functions with a greater number of output qubits
to be computed. Suppose the input consists of n qubits, and the function to be
calculated is f : {0, 1, ..., 2ⁿ − 1} → {0, 1}. The unitary transform given by

U_f := Σ_{j=0}^{2ⁿ−1} Σ_{k=0}^{1} |j⟩⟨j| ⊗ |k ⊕ f(j)⟩⟨k|

computes f; the target qubit is changed using ⊕ (XOR) to ensure unitarity.
For example, the sum bit of the full adder, f(a, b, c) = a ⊕ b ⊕ c, would be
implemented as

U_f |a, b, c⟩ ⊗ |d⟩ = |a, b, c⟩ ⊗ |d ⊕ a ⊕ b ⊕ c⟩.
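A sketch of this construction in C++ (our own; it realizes U_f as a permutation of
the basis states |j⟩|k⟩ for the full-adder sum bit, mapping state indices rather than
storing the full matrix):

// uf.cpp
#include <iostream>
using namespace std;

// sum bit of the full adder: f(a,b,c) = a xor b xor c for j = 4a+2b+c
int f(int j) { return ((j>>2)^(j>>1)^j) & 1; }

// U_f maps the basis state |j>|k> to |j>|k xor f(j)>
int Uf(int state)
{
 int j = state >> 1, k = state & 1;
 return (j << 1) | (k ^ f(j));
}

int main()
{
 for(int s=0;s<16;s++)   // 3 input qubits and 1 target qubit
  cout << "|" << (s>>1) << "," << (s&1) << "> -> |"
       << (Uf(s)>>1) << "," << (Uf(s)&1) << ">" << endl;
 return 0;
}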
Next we describe how quantum computers deal with functions ([8], [123]). Consider
a function

f : {0, 1, ..., 2^m − 1} → {0, 1, ..., 2ⁿ − 1}

where m and n are positive integers. A classical device computes f by evolving each
labelled input

0, 1, ..., 2^m − 1

into its respective labelled output

f(0), f(1), ..., f(2^m − 1).
Quantum computers, due to the unitary (and therefore reversible) nature of their
evolution, compute functions in a slightly different way. It is not directly possible
to compute a function f by a unitary operation that evolves |x⟩ into |f(x)⟩. If f is
not a one-to-one mapping (i.e. if f(x) = f(y) for some x ≠ y), then two orthogonal
kets |x⟩ and |y⟩ would be evolved into the same state

|f(x)⟩ = |f(y)⟩

which violates unitarity. One way to compute functions which are not one-to-
one mappings, while preserving the reversibility of computation, is by keeping the
record of the input. To achieve this, a quantum computer uses two registers; the first
register to store the input data, the second one for the output data. Each possible
input x is represented by the state Ix), the quantum state of the first register.
Analogously, each possible output y = f(x) is represented by Iy), the quantum
state of the second register. States corresponding to different inputs and different
outputs are orthogonal,

⟨x|x′⟩ = δ_{xx′},  ⟨y|y′⟩ = δ_{yy′}.

Thus

(⟨x′| ⊗ ⟨y′|)(|x⟩ ⊗ |y⟩) = δ_{xx′} δ_{yy′}.
The function evaluation is then determined by a unitary evolution operator U_f that
acts on both registers:

U_f (|x⟩ ⊗ |0⟩) = |x⟩ ⊗ |f(x)⟩.
A reversible function evaluation, i.e. the one that keeps track of the input, is as
good as a regular, irreversible evaluation. This means that if a given function can
be computed in polynomial time, it can also be computed in polynomial time using
a reversible computation. The computations we are considering here are not only
reversible but also quantum, and we can do much more than computing values of
f(x) one by one. We can prepare a superposition of all input values as a single state

2^{−m/2} Σ_{x=0}^{2^m−1} |x⟩ ⊗ |0⟩

and by running the computation U_f only once, we obtain

|ψ_f⟩ := U_f (2^{−m/2} Σ_{x=0}^{2^m−1} |x⟩ ⊗ |0⟩) = 2^{−m/2} Σ_{x=0}^{2^m−1} |x⟩ ⊗ |f(x)⟩

which involves all of the 2^m values f(0), ..., f(2^m − 1).
How much information about f does the state |ψ_f⟩ contain? No quantum measure-
ment can extract all of the 2^m values f(0), ..., f(2^m − 1) from it.
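The effect of a single application of U_f on the uniform superposition can be simu-
lated classically for small m. The following C++ sketch (our own illustration; m = 2
and the table f are arbitrary example choices) shows that every value f(x) appears
in the resulting state vector.

// parallel.cpp
#include <iostream>
#include <vector>
#include <cmath>
using namespace std;

int main()
{
 int m = 2;                           // data register size (example)
 int f[4] = {1,0,0,1};                // example classical function values
 int dim = 1 << (m+1);                // data register plus one output qubit
 vector<double> c(dim,0.0);
 for(int x=0;x<(1<<m);x++)            // uniform superposition |x>|0>
  c[(x<<1)|0] = 1.0/sqrt(double(1<<m));
 vector<double> d(dim,0.0);           // apply U_f: |x>|y> -> |x>|y xor f(x)>
 for(int x=0;x<(1<<m);x++)
  for(int y=0;y<2;y++)
   d[(x<<1)|(y^f[x])] += c[(x<<1)|y];
 for(int s=0;s<dim;s++)
  if(d[s] != 0.0)
   cout << "|" << (s>>1) << ">|" << (s&1) << "> amplitude " << d[s] << endl;
 return 0;
}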
Nielsen and Chuang [123] use a different notation. The initial state is assumed to
be of the form

|d⟩ ⊗ |P⟩

where |d⟩ is the state of the m-qubit data register, and |P⟩ is a state of the n-qubit
program register. The two registers are not entangled. The dynamics of the gate
array is given by

|d⟩ ⊗ |P⟩ → G(|d⟩ ⊗ |P⟩)
where G is a unitary operator. This operation is implemented by some fixed quan-
tum gate array. A unitary operator U, acting on m qubits, is said to be implemented
by this gate array if there exists a state |P_U⟩ of the program register such that

G(|d⟩ ⊗ |P_U⟩) = (U|d⟩) ⊗ |P_U′⟩

for all states |d⟩ of the data register, and some state |P_U′⟩ of the program register.
To see that |P_U′⟩ does not depend on |d⟩, suppose that for two data states |d₁⟩
and |d₂⟩ the gate array produces the program states |P₁′⟩ and |P₂′⟩, respectively.
Taking the inner product of the two corresponding outputs and using the unitarity
of G and U yields

⟨P₁′|P₂′⟩ = 1

provided ⟨d₁|d₂⟩ ≠ 0. Thus

|P₁′⟩ = |P₂′⟩

and therefore there is no |d⟩ dependence of |P_U′⟩. The case ⟨d₁|d₂⟩ = 0 follows
by similar reasoning. Nielsen and Chuang [123] show how to construct quantum
gate arrays that can be programmed to perform different unitary operations on a
data register, depending on the input to some program register. Furthermore, they
show that a universal quantum register gate array - a gate array which can be
programmed to perform any unitary operation - exists only if one allows the gate
array to operate in a probabilistic fashion. Since the number of possible unitary
operations on m qubits is infinite, it follows that a universal gate array would require
an infinite number of qubits in the program register, and thus no such array exists.
Suppose distinct (up to a global phase) unitary operators U₁, ..., U_N are implemented
by some programmable quantum gate array. Nielsen and Chuang [123] showed that
the program register is then at least N dimensional, that is, contains at least log₂ N
qubits. Moreover, the corresponding programs |P₁⟩, ..., |P_N⟩ are mutually orthog-
onal. A deterministic programmable gate array must have as many Hilbert space
dimensions in the program register as the number of programs implemented.
Consider now the evaluation of a function f_n built from intermediate steps f₁, ..., f_n
using unitary operators U₁, ..., U_n, where each operator Uᵢ places its result in register
i (of appropriate size). The final result is placed in register n. We assume each
register i is in an initial state |0⟩ᵢ. The register indicated by 0 is in the initial state
|a⟩ which serves as a parameter to the algorithm. Each Uᵢ successively places the
result fᵢ(a) of the computation in register i, given the values a, f₁(a), ..., f_{i−1}(a)
as parameters. Thus application of the operators gives

U_n ⋯ U₂U₁ (|a⟩ ⊗ |0⟩₁ ⊗ ⋯ ⊗ |0⟩_n) = |a⟩ ⊗ |f₁(a)⟩ ⊗ |f₂(a)⟩ ⊗ ⋯ ⊗ |f_n(a)⟩.

Since we only require the result f_n(a) we can apply the inverse operations
U†_{n−1}, U†_{n−2}, ..., U†₁:

U†₁U†₂ ⋯ U†_{n−1} (|a⟩ ⊗ |f₁(a)⟩ ⊗ ⋯ ⊗ |f_n(a)⟩) = |a⟩ ⊗ |0⟩₁ ⊗ ⋯ ⊗ |0⟩_{n−1} ⊗ |f_n(a)⟩.

This can be understood by examining the register content after each unitary oper-
ation.
Each step depends only on previously calculated values, thus reversing each compu-
tation from U_{n−1} to U₁ does not destroy the final result. This method of regaining
the use of temporary registers is termed garbage disposal, as it eliminates "garbage"
in temporary registers which is no longer useful.
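The pattern is easy to illustrate with classical reversible operations. In the following
C++ sketch (our own; f1 and f2 are arbitrary example functions) a temporary
register is computed, used, and then uncomputed, so that it returns to 0 while the
final result survives.

// garbage.cpp
#include <iostream>
using namespace std;

int f1(int a)        { return (a+1)&3; }   // example intermediate step
int f2(int a,int r1) { return (a^r1)&3; }  // example final step

int main()
{
 int a = 2;            // input parameter
 int r1 = 0, r2 = 0;   // registers, initially |0>
 r1 ^= f1(a);          // U1: place f1(a) in register 1
 r2 ^= f2(a,r1);       // U2: place f2(a,f1(a)) in register 2 (final result)
 r1 ^= f1(a);          // U1 inverse: uncompute, register 1 back to 0
 cout << "result = " << r2 << ", temporary register = " << r1 << endl;
 return 0;
}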
17.6 Quantum Copying
Theorem. Given an arbitrary state |ψ⟩, no unitary matrix U exists such that

U|ψ, 0⟩ = |ψ, ψ⟩.

Proof. Suppose U does exist. Then for a state |a⟩ and a different state |b⟩ we have

U|a, 0⟩ = |a, a⟩,  U|b, 0⟩ = |b, b⟩.

Now, if U copies arbitrary states,

U((|a⟩ + |b⟩) ⊗ |0⟩) = (|a⟩ + |b⟩) ⊗ (|a⟩ + |b⟩)

while linearity requires

U((|a⟩ + |b⟩) ⊗ |0⟩) = |a⟩ ⊗ |a⟩ + |b⟩ ⊗ |b⟩.

This is a contradiction since in general

(|a⟩ + |b⟩) ⊗ (|a⟩ + |b⟩) ≠ |a⟩ ⊗ |a⟩ + |b⟩ ⊗ |b⟩.
Mozyrsky et al. [122] derived a Hamilton operator for copying the basis up and
down states of a quantum two-state system - a qubit - onto n copy qubits (n ≥ 1)
initially prepared in the down state. The qubit states are labelled by quantum
numbers qⱼ = 0 (down) and qⱼ = 1 (up), for spin j. The states of the n + 1 spins
are then expanded in the basis

|q₀⟩ ⊗ |q₁⟩ ⊗ ⋯ ⊗ |q_n⟩.

The copying Hamilton operator is not unique. One chooses a particular transforma-
tion that allows analytical calculation and, for n = 1, yields a controlled-NOT gate.
They considered the following unitary transformation.
The sum in the fourth term, over {qᵢ}, runs over all the other quantum states of the
system, i.e., excluding the three states involved in the copying process. The first two
terms accomplish the desired copying transformation. The third term is needed for
unitarity since the quantum evolution is reversible. General phase factors, such as
e^{iρ}, are allowed in these terms. The eigenvalues of the Hamilton operator in the
selected subspace are then given by

E₃ = −(ħ/Δt)ρ − (2πħ/Δt)N₃

where N₃ is an integer.
Universal optimum cloning [33, 35] is an attempt to provide the best copy of an
arbitrary quantum state given the constraints of quantum mechanics. The specific
constraints specified for the copy operation are as follows.

1. The density operators of the source and destination states must be identical
after the copy operation.
2. All pure states should copy equally well. This can be implemented, for ex-
ample, by requiring that, for a certain distance measure, the copied state is
always a fixed distance d from the original pure state.
3. The distance between the state to be copied and the copy must be a minimum.
The distance between the original state before and after copying must also be
minimized.
Using the Bures distance for density operators, Buzek and Hillery [35] found that
the following transformation satisfies the given constraints:

U_QOC |j⟩ ⊗ |0⟩ ⊗ |q⟩ := √(2/3) |j⟩ ⊗ |j⟩ ⊗ |a_j⟩ + √(1/6) (|0⟩ ⊗ |1⟩ + |1⟩ ⊗ |0⟩) ⊗ |a_{1−j}⟩

for j ∈ {0, 1}, where |q⟩ is the initial state of the ancillary system used in the copying
process and |a₀⟩ and |a₁⟩ are orthonormal states in the Hilbert space of the ancillary
system. Using a slightly different approach Bruß et al. [33] found the same trans-
formation.
17.7 Example Programs

Let us assume the state |u⟩ ∈ C⁴ is not entangled, i.e. u₁u₄ = u₂u₃. The C++
program decompose.cpp calculates the decomposition into |x⟩ and |y⟩ using
(17.18)-(17.21), assuming |u⟩ is normalized. We use two-dimensional arrays of
data type double to represent the states, with an array of two double variables
representing the real and imaginary parts of each complex number. Owing to the
numerical calculation of |x⟩ and |y⟩ these states can contain small rounding errors.
// decompose.cpp
#include <iostream>
#include <cmath>
using namespace std;
// decompose a normalized product state u = x (x) y in C^4;
// complex numbers are stored as pairs (real part,imaginary part)
void factor(double x[2][2],double y[2][2],double u[4][2])
{
 double x1n = sqrt(u[0][0]*u[0][0]+u[0][1]*u[0][1]+u[1][0]*u[1][0]+u[1][1]*u[1][1]);
 double x2n = sqrt(u[2][0]*u[2][0]+u[2][1]*u[2][1]+u[3][0]*u[3][0]+u[3][1]*u[3][1]);
 int i = (x1n > 0.0) ? 0 : 2;   // use (u1,u2) if x1 != 0, else (u3,u4)
 double n = (i==0) ? x1n : x2n;
 x[0][0] = x1n; x[0][1] = 0.0; x[1][0] = x2n; x[1][1] = 0.0;
 y[0][0] = u[i][0]/n;   y[0][1] = u[i][1]/n;
 y[1][0] = u[i+1][0]/n; y[1][1] = u[i+1][1]/n;
 // phase of x2 relative to x1: x2 = <y|(u3,u4)> since |y| = 1
 if(i==0 && x2n > 0.0) {
  x[1][0] = u[2][0]*y[0][0]+u[2][1]*y[0][1]+u[3][0]*y[1][0]+u[3][1]*y[1][1];
  x[1][1] = u[2][1]*y[0][0]-u[2][0]*y[0][1]+u[3][1]*y[1][0]-u[3][0]*y[1][1];
 }
}
We consider now three applications for quantum computing [166] and give the sim-
ulation using SymbolicC++ [169]. First we show how entangled states can be gen-
erated from unentangled states using unitary transformations. The quantum circuit
is also given. Next we consider a quantum circuit for swapping two bits. The third
application deals with teleportation [17, 28]. Finally, we consider the Greenberger-
Horne-Zeilinger state [96]. Then we provide the SymbolicC++ [169] implementation
of these applications.
In our first example we start from the standard basis (unentangled states) in the
Hilbert space C⁴ and transform it into the Bell basis. The Bell states are given by
(17.1). They are entangled. Entangled states exhibit nonlocal correlations. This
means that two entangled systems which have interacted in the past and are no
longer interacting still show correlations. These correlations are used for example
in dense coding and quantum error-correction techniques [162, 17]. To transform
the standard basis into the Bell states we apply two unitary transformations: first
the Walsh-Hadamard gate on the first qubit, U_H ⊗ I, and then the U_XOR gate.
As our second example we consider the swapping of a pair of bits. The circuit for
swapping a pair of bits consists of three XOR gates in sequence, where the middle
gate interchanges the roles of control and target qubit. Multiplying the three per-
mutation matrices yields

( 1 0 0 0 )
( 0 0 1 0 )
( 0 1 0 0 )
( 0 0 0 1 ).

Thus we find the permutation matrix U_EXCH. This permutation matrix cannot be
represented as the Kronecker product of 2 × 2 matrices.
As third application consider the Greenberger-Horne-Zeilinger (GHZ) state

|Ψ⟩_GHZ := (1/√2)(|000⟩ + |111⟩) ≡ (1/√2)(|0⟩ ⊗ |0⟩ ⊗ |0⟩ + |1⟩ ⊗ |1⟩ ⊗ |1⟩).

As a vector in C⁸,

|Ψ⟩_GHZ = (1/√2) (1 0 0 0 0 0 0 1)ᵀ

where T stands for transpose. If we consider 000 and 111 to be the binary represen-
tations of "0" and "7", respectively, the GHZ state simply represents the coherent
superposition (1/√2)(|"0"⟩ + |"7"⟩). In this state all three qubits are either 0 or 1
but none of the qubits has a well-defined value of its own. Measurement of any one
qubit will immediately result in the other two qubits attaining the same value. For
example, measuring the first qubit as 1 projects the state onto |111⟩.
// qthree.cpp
#include <iostream>
#include <cmath>
#include "Vector.h"
#include "Matrix.h"
#include "Rational.h"
#include "Msymbol.h"
using namespace std;

typedef Sum<Rational<int> > C;   // coefficient type (assumed typedef)

template <class T> Vector<T> Bell(const Vector<T>& v)
{
 Matrix<T> I(2,2),H(2,2),X(4,4);
 I.identity();
 // Walsh-Hadamard gate
 H[0][0] = H[0][1] = H[1][0] = power(sqrt(T(2)),-1);
 H[1][1] = -power(sqrt(T(2)),-1);
 Matrix<T> UH = kron(H,I);
 // X is the XOR (controlled NOT) gate
 X[0][0] = T(1); X[0][1] = T(0); X[0][2] = T(0); X[0][3] = T(0);
 X[1][0] = T(0); X[1][1] = T(1); X[1][2] = T(0); X[1][3] = T(0);
 X[2][0] = T(0); X[2][1] = T(0); X[2][2] = T(0); X[2][3] = T(1);
 X[3][0] = T(0); X[3][1] = T(0); X[3][2] = T(1); X[3][3] = T(0);
 return (X*(UH*v));
}
// (the gate functions XOR and Swap are constructed analogously)
template <class T> Vector<T> Teleport(const Vector<T>& v)
{
 Matrix<T> I(2,2),H(2,2),X(4,4),NOT(2,2);
 Vector<T> result;
 int i;
 I.identity();
 H[0][0] = H[0][1] = H[1][0] = power(sqrt(T(2)),-1);
 H[1][1] = -power(sqrt(T(2)),-1);
 NOT[0][0] = T(0); NOT[0][1] = T(1); NOT[1][0] = T(1); NOT[1][1] = T(0);
 X[0][0] = T(1); X[0][1] = T(0); X[0][2] = T(0); X[0][3] = T(0);
 X[1][0] = T(0); X[1][1] = T(1); X[1][2] = T(0); X[1][3] = T(0);
 X[2][0] = T(0); X[2][1] = T(0); X[2][2] = T(0); X[2][3] = T(1);
 X[3][0] = T(0); X[3][1] = T(0); X[3][2] = T(1); X[3][3] = T(0);
 Matrix<T> U1=kron(I,kron(H,I));
 Matrix<T> U2=kron(I,X);
 Matrix<T> U3=kron(X,I);
 Matrix<T> U4=kron(H,kron(I,I));
 Matrix<T> U5=kron(I,X);
 Matrix<T> U6=kron(I,kron(I,H));
 Matrix<T> U7=dsum(I,dsum(I,dsum(NOT,NOT)));
 Matrix<T> U8=kron(I,kron(I,H));
 result=U8*(U7*(U6*(U5*(U4*(U3*(U2*(U1*v)))))));
 // simplify powers of sqrt(2)
 for(i=0;i<8;i++)
 {
  while(result[i].put(power(sqrt(T(2)),-6),power(T(2),-3)));
  while(result[i].put(power(sqrt(T(2)),-4),power(T(2),-2)));
  while(result[i].put(power(sqrt(T(2)),-2),power(T(2),-1)));
 }
 return result;
}
// project qubit `qubit` of the state v onto the value `outcome`
// and renormalize the resulting state
template <class T> Vector<T> Measure(const Vector<T>& v,int qubit,int outcome)
{
 Vector<T> result(v);
 T D(0);
 int i, skip = 1-outcome;
 int len = v.length()/int(pow(2,qubit+1));
 for(i=0;i<v.length();i++)
 {
  if(!(i%len)) skip = 1-skip;
  if(skip) result[i] = T(0);
  else D += result[i]*result[i];
 }
 result /= sqrt(D);
 return result;
}
// print the nonzero components of a state vector with basis kets
ostream& print(ostream& o,const Vector<C>& v)
{
 static const char *b2[] = {"|0>","|1>"};
 static const char *b4[] = {"|00>","|01>","|10>","|11>"};
 static const char *b8[] = {"|000>","|001>","|010>","|011>",
                            "|100>","|101>","|110>","|111>"};
 const char **b = b2;
 int i;
 if(v.length()==2) b=b2;
 if(v.length()==4) b=b4;
 if(v.length()==8) b=b8;
 for(i=0;i<v.length();i++)
  if(!v[i].is_Number() || v[i].nvalue()!=C(0))
   o << "+(" << v[i] << ")" << b[i];
 return o;
}
int main(void)
{
 Vector<C> zero(2),one(2);
 Vector<C> zz(4),zo(4),oz(4),oo(4),qreg;
 Vector<C> tp00,tp01,tp10,tp11,psiGHZ;
 Sum<Rational<int> > a("a",0),b("b",0);
 int i;
 zero[0] = C(1); zero[1] = C(0);                  // |0>
 one[0]  = C(0); one[1]  = C(1);                  // |1>
 zz[0]=C(1); zz[1]=C(0); zz[2]=C(0); zz[3]=C(0);  // |00>
 zo[0]=C(0); zo[1]=C(1); zo[2]=C(0); zo[3]=C(0);  // |01>
 oz[0]=C(0); oz[1]=C(0); oz[2]=C(1); oz[3]=C(0);  // |10>
 oo[0]=C(0); oo[1]=C(0); oo[2]=C(0); oo[3]=C(1);  // |11>
 cout << endl;
 cout << "UXOR|00> = "; print(cout,XOR(zz)) << endl;
 // ... (similarly for |01>, |10>, |11>)
 cout << endl;
 cout << "UBELL|00> = "; print(cout,Bell(zz)) << endl;
 // ...
 cout << endl;
 cout << "USWAP|00> = "; print(cout,Swap(zz)) << endl;
 // ...
 qreg = kron(a*zero+b*one,kron(zero,zero))(0);
 cout << "UTELEPORT("; print(cout,qreg) << ") = ";
 print(cout,qreg=Teleport(qreg)) << endl;
 cout << "Results after measurement of first 2 qubits:" << endl;
 tp00 = Measure(Measure(qreg,0,0),1,0);
 tp01 = Measure(Measure(qreg,0,0),1,1);
 tp10 = Measure(Measure(qreg,0,1),1,0);
 tp11 = Measure(Measure(qreg,0,1),1,1);
 for(i=0;i<8;i++)
 {
  while(tp00[i].put(a*a,C(1)-b*b));
  while(tp00[i].put(power(sqrt(C(1)/C(2)),-2),C(2)));
  while(tp01[i].put(a*a,C(1)-b*b));
  while(tp01[i].put(power(sqrt(C(1)/C(2)),-2),C(2)));
  while(tp10[i].put(a*a,C(1)-b*b));
  while(tp10[i].put(power(sqrt(C(1)/C(2)),-2),C(2)));
  while(tp11[i].put(a*a,C(1)-b*b));
  while(tp11[i].put(power(sqrt(C(1)/C(2)),-2),C(2)));
 }
 cout << " |00> : "; print(cout,tp00) << endl;
 cout << " |01> : "; print(cout,tp01) << endl;
 cout << " |10> : "; print(cout,tp10) << endl;
 cout << " |11> : "; print(cout,tp11) << endl;
 cout << endl;
 psiGHZ = (kron(Matrix<C>(zz),Matrix<C>(zero))/sqrt(C(2))
          +kron(Matrix<C>(oo),Matrix<C>(one))/sqrt(C(2)))(0);
 cout << "Greenberger-Horne-Zeilinger state: ";
 print(cout,psiGHZ) << endl;
 cout << "Measuring qubit 0 as 1 yields: ";
 print(cout,Measure(psiGHZ,0,1)) << endl;
 cout << "Measuring qubit 1 as 1 yields: ";
 print(cout,Measure(psiGHZ,1,1)) << endl;
 cout << "Measuring qubit 2 as 0 yields: ";
 print(cout,Measure(psiGHZ,2,0)) << endl;
 return 0;
}
UH|0>  = +(sqrt(2)^(-1))|0>+(sqrt(2)^(-1))|1>
UH|1>  = +(sqrt(2)^(-1))|0>+(-sqrt(2)^(-1))|1>
UXOR|00> = +(1)|00>
UXOR|01> = +(1)|01>
UXOR|10> = +(1)|11>
UXOR|11> = +(1)|10>
USWAP|00> = +(1)|00>
USWAP|01> = +(1)|10>
USWAP|10> = +(1)|01>
USWAP|11> = +(1)|11>
In 1996 Schack and Brun [141] described a powerful C++ library for solving quantum
systems. The core of the library consists of the C++ classes State and Operator,
which represent state vectors and operators in Hilbert space. However, the disad-
vantage of this C++ library is that the constants (for example the coupling constant
in a Hamilton operator) can only be treated numerically, i.e. as the data type
double. In SymbolicC++ we can treat constants either symbolically or numerically.
Using the method set we can switch from a symbolic representation of a constant
to a numeric representation. Using the approach of Schack and Brun it is also
difficult to construct the CNOT operator on any two qubits of a state. In 1995, a
year before the paper of Schack and Brun [141], Steeb [159] described a computer
algebra package based on Reduce and Lisp that can handle Bose, Fermi and coupled
Bose-Fermi systems. Since spin operators can be expressed with Fermi operators,
the package can also deal with spin systems. It also has the advantage that constants
can be treated either numerically or symbolically.
Two other simulations are described by Ömer [124] and Pritzker [134]. Both are
implemented in C++. However, these implementations can also only use numeric
representations and not symbolic representations. Neither of them implements the
Kronecker product and direct sum to aid the construction of operators such as we
have used for the simulation of teleportation.

The OpenQubit simulation [134] implements the classes QState, which represents
the state of the entire quantum computer, and QRegister, which refers to specific
qubits from QState to be used as a quantum register. Further support for quantum
algorithms is provided by four operators denoted by R_x, R_y, Ph and CNot, which
are rotations, phase changes and the controlled NOT gate. The implementation
supports the simulation of measurement. Shor's factoring algorithm has been
successfully implemented using this system.
Chapter 18
Measurement and Quantum States

18.1 Introduction
The interpretation of measurements in quantum mechanics is still under discussion
(Healey [84], Bell [9], Redhead [136]). Besides the Copenhagen interpretation we
have the many-worlds interpretations (Everett interpretations), the modal interpre-
tations, the decoherence interpretations, the interpretations in terms of (nonlocal)
hidden variables, the quantum logical interpretations.
It is well known that the conceptual foundations of quantum mechanics have been
plagued by a number of paradoxes, or conceptual puzzles, which have attracted a
host of mutually incompatible attempted resolutions - such as that presented by
Schrödinger [143], popularly known as the paradox of Schrödinger's cat, and the
EPR paradox, named after the last initials of its authors, Einstein, Podolsky, and
Rosen [60].
Consider a spin-½ particle in the superposition

|Φ⟩ = c₁|↑⟩ + c₂|↓⟩

where c₁, c₂ ∈ C and

⟨Φ|Φ⟩ = 1.

Thus

|c₁|² + |c₂|² = 1.
Let |R=↑⟩ and |R=↓⟩ denote the up and down pointer-reading eigenstates of an
S_z-measuring apparatus. Thus |R=↑⟩ and |R=↓⟩ are eigenstates of the pointer-
reading operator R̂. According to quantum mechanics (with no wave-function
collapse), if the apparatus ideally measures the particle, the combined system evolves
into the entangled superposition

|Ψ⟩ = c₁|↑⟩ ⊗ |R=↑⟩ + c₂|↓⟩ ⊗ |R=↓⟩.

Common sense insists that after the measurement, the pointer reading is definite.
According to the orthodox value-assignment rule, however, the pointer reading is
definite only if the quantum state is an eigenstate of I ⊗ R̂,
the pointer-reading operator, where I is the identity operator. Since |Ψ⟩ is not an
eigenstate of I ⊗ R̂, the pointer reading is indefinite. The interpretations of quantum
mechanics mentioned above attempt to deal with this aspect of the measurement
problem. However their solutions run into a technical difficulty which is called the
basis degeneracy problem.
18.3 Copenhagen Interpretation
prob",(A E 0) = p.
Here p is a real number between zero and one (including those limits), A is a quan-
tum dynamical variable, 0 is a (Borel) set of real numbers, and 1/J is a mathematical
representative of an instantaneous quantum state. In quantum state 1/J, the proba-
bility of finding that the value of A lies in 0 is p. How is the phrase "of finding" to
be understood? This probability is calculated according to the appropriate quantum
algorithm. For example,
where 1/J is the system's state vector, and PA(O) is the projection operator corre-
sponding to the property A E O. On the present interpretation, a quantum state
may be legitimately ascribed to a single quantum system (and not just to a large
ensemble of similar systems), but only in certain circumstances. A system does not
always have a quantum state. These circumstances are not universal. Nevertheless,
every quantum system always has a dynamical state. Consequently, there can be
no general identification between a system's quantum state and its dynamical state;
nor is it always true that one determines the other.
The Born rules apply directly to possessed values of quantities, and only deriva-
tively to results of measurements of these quantities. In this view every quantum
dynamical variable always has a precise real value on any quantum system to which
it pertains, and the Born rules simply state the probability for that value to lie in
any given interval. Thus the Born rules assign probabilities to events involving a
quantum system σ of the form "The value of A on σ lies in Ω". A properly con-
ducted measurement of the value of A on σ would find that value in Ω just in case
the value actually lies in Ω.
Since the statement of the Born rules then involves explicit reference to measure-
ment (or observation), to complete the interpretation it is necessary to say what
constitutes a measurement. Proponents of the Copenhagen interpretation have typ-
ically either treated measurement (or observation) or cognates as primitive terms
in quantum mechanics, or else have taken each to refer vaguely to suitable interac-
tions involving a classical system. If measurement remains a primitive term, then
it is natural to interpret it epistemologically as referring to an act of some observer
which, if successful, gives him or her knowledge of some structural feature of a phe-
nomenon. But then, quantum mechanics seems reduced to a tool for predicting
the results of such observations.
A motivation behind the construction of such theories has been the belief that some
more complete account of microscopic processes is required than that provided by
quantum mechanics according to the Copenhagen interpretation (Healey [84]). The
general idea has been to construct such an account by introducing additional quan-
tities, over and above the usual quantum dynamical variables (such as de Broglie's
pilot wave, Bohm's quantum potential, or fluctuations in Vigier's random ether),
and additional dynamical laws governing these quantities and their coupling to the
usual quantum variables. The primary object is to permit the construction of a de-
tailed dynamical history of each individual quantum system which would underlie
the statistical predictions of quantum mechanics concerning measurement results.
Though it would be consistent with this aim for such dynamical histories to conform
only to indeterministic laws, it has often been thought preferable to consider in the
first instance deterministic hidden variable theories. A deterministic hidden vari-
able theory would underlie the statistical predictions of quantum mechanics much
as classical mechanics underlies the predictions of classical statistical mechanics. In
both cases, the results of the statistical theory would be recoverable after averaging
over ensembles of individual systems, provided that these ensembles are sufficiently
typical: but the statistical theory would give demonstrably incorrect predictions for
certain atypical ensembles.
Bell [9] showed that no deterministic hidden variable theory can reproduce the
predictions of quantum mechanics for certain composite systems without violating
a principle of locality. This principle is based on basic assumptions concerning the
lack of physical connection between spatially distant components of such systems;
and the impossibility of there being any such connection with the property that a
change in the vicinity of one component should instantaneously produce a change
in the behaviour of the other. Further work attempting to extend Bell's result
to apply to indeterministic hidden variable theories has shown that there may be
a small loophole still open for the construction of such a theory compatible with
the relativistic requirement that no event affects other events outside of its future
light-cone.
Existing hidden variable theories, such as that of Vigier [179], are explicitly nonlocal,
and do involve superluminal propagation of causal influence on individual quantum
systems, although it is held that exploiting such influences to transmit information
superluminally would be extremely difficult, if not impossible. It might be thought
that any superluminal transmission of causal signals is explicitly inconsistent with
relativity theory; if this were so, such nonlocal hidden variable theories could be
immediately rejected on this ground alone. However, relativity does not explicitly
forbid such transmission. Nonlocal hidden variable theories like that of Vigier can
conform to the letter of relativity by introducing a preferred frame, that of the
subquantum ether, with respect to which superluminal propagation is taken to
occur. By doing so they avoid the generation of so-called causal paradoxes. However
they violate the spirit of relativity theory by reintroducing a privileged reference
frame. The princi-
ple that a fundamental theory can be given a relativistically invariant formulation
seems so fundamental to contemporary physics that no acceptable interpretation of
quantum mechanics should violate it.
A hidden variable theory is a separate and distinct theory from quantum mechanics.
To offer such a theory is not to present an interpretation of quantum mechanics but
to change the subject. One reason is that a hidden variable theory incorporates
quantities additional to the quantum dynamical variables. Another is that hidden
variable theories are held to underlie quantum mechanics in a way similar to that
in which classical mechanics underlies the distinct theory of statistical mechanics.
A final reason is that a hidden variable theory (at least typically) is held to be
empirically equivalent to quantum mechanics only with respect to a restricted range
of conceivable experiments, while leading to conflicting predictions concerning a
range of possible further experiments which may, indeed, be extremely hard to
actualize.
quantum state immediately after the conclusion of the interaction is related to the
initial state as follows:

|φᵢ^σ⟩ ⊗ |ψ^α⟩ → |φᵢ^σ⟩ ⊗ |ψ^α_{[...,aᵢ]}⟩

for each eigenvector |φᵢ^σ⟩ of A, where the |ψᵢ^α⟩ are orthonormal vectors and [aᵢ]
stands for a recording of the eigenvalue aᵢ of A. The dots indicate that results of
earlier good observations may also be recorded in the state of α. It follows from the
linearity of the Schrödinger equation that an arbitrary normalized initial object
system quantum state Σᵢ cᵢ|φᵢ^σ⟩ with Σᵢ |cᵢ|² = 1 evolves as

Σᵢ cᵢ |φᵢ^σ⟩ ⊗ |ψ^α⟩ → Σᵢ cᵢ |φᵢ^σ⟩ ⊗ |ψ^α_{[...,aᵢ]}⟩.
Each component |φᵢ^σ⟩ ⊗ |ψ^α_{[...,aᵢ]}⟩ with nonzero coefficient cᵢ in the superposition on
the right-hand side corresponds to a distinct state in which the observer has recorded
the i-th eigenvalue for the measured quantity on the object system, while the object
system remains in the corresponding eigenstate |φᵢ^σ⟩. Moreover, all these states are
equally real. Every possible result is recorded in some observer state |ψ^α_{[...,aᵢ]}⟩, and
there is no unique actual result. For a sequence of good observations by a single
observer, consisting of multiple pairwise interactions between the apparatus system
and each member of a set of object systems, Everett is able to show the following. If a
good observation is repeated on a single object system in circumstances in which that
system remains undisturbed in the intervening interval (in the sense that the total
Hamilton operator commutes with the operator representing the observed quantity),
then the eigenvalues recorded for the two observations are the same, in every observer
state. This is exactly what would be predicted by an observer who represents each
object system independently by a quantum state vector and regarded the first of each
sequence of repeated measurements on it as projecting the relevant object system's
quantum state onto an eigenvector corresponding to the initially recorded eigenvalue.
This is the first respect in which, for each observer, a good observation appears
to obey the projection postulate. Everett shows that each observer will get the
right probabilities for results of arbitrary good observations on a system which has
been subjected to an initial good observation, if, following this initial observation,
one assigns to the system the quantum state it would have had if projection had
then occurred. For the following two probabilities are demonstrably equal: the
probability of result bj in a subsequent good observation of B on (J by an observer
corresponding to a who applies the projection postulate to the state of (J alone after
498 Chapter 18. Measurement and Quantum States
Consider a combined system in the state

|Ψ⟩ = Σᵢ cᵢ |Aᵢ⟩ ⊗ |Bᵢ⟩

where the {|Aᵢ⟩} and {|Bᵢ⟩} vectors are orthonormal, and are therefore eigenstates
of self-adjoint operators (observables) A and B associated with systems 1 and 2,
respectively. This biorthogonal expansion picks out the Schmidt basis. The basis
degeneracy problem arises because the biorthogonal decomposition is unique just in
case all of the nonzero |cᵢ| are different. When |c₁| = |c₂|, we can biorthogonally
expand |Ψ⟩ in more than one basis. Many-world interpreters address this
problem by hypothesizing that when the combined system occupies state |Ψ⟩, the
two branches of the superposition split into separate worlds, in some sense. The
pointer reading becomes definite relative to its branch. For instance, in the "up"
world, the particle has spin up and the apparatus possesses the corresponding pointer
reading. In this way, many-world interpreters explain why we always see definite
pointer readings, instead of superpositions.
Elby and Bub [62] proved that when a quantum state can be written in the tri-
orthogonal form

|Ψ⟩ = Σᵢ cᵢ |Aᵢ⟩ ⊗ |Bᵢ⟩ ⊗ |Cᵢ⟩

then, even if some of the cᵢ are equal, no alternative bases exist such that |Ψ⟩ can
be rewritten

|Ψ⟩ = Σᵢ dᵢ |A′ᵢ⟩ ⊗ |B′ᵢ⟩ ⊗ |C′ᵢ⟩.
Therefore the triorthogonal decomposition picks out a special basis. This preferred
basis can be used to address the basis degeneracy problem. The tridecompositional
uniqueness theorem provides many-world interpretations, decoherence interpretations,
and modal interpretations with a rigorous solution to the basis degeneracy
problem. Several interpretations of quantum mechanics can make use of this special
basis. For instance, many-world adherents can claim that a branching of worlds
occurs in the preferred basis picked out by the unique triorthogonal decomposition.
Modal interpreters can postulate that the triorthogonal basis helps to pick out which
observables possess definite values at a given time. Decoherence theorists can cite
the uniqueness of the triorthogonal decomposition as a principled reason for assert-
ing that pointer readings become classical upon interacting with the environment.
When the environment interacts with the combined particle-apparatus system the
following state results
where IE±) is the state of the rest of the universe after the environment interacts
with the apparatus. As time passes, these environmental states quickly approach
orthogonality:
⟨E₊|E₋⟩ ≈ 0.
In this limit, we have a triorthogonal decomposition of |Ψ⟩. Even if c₁ = c₂, the
triorthogonal decomposition is unique. In other words, no transformed bases exist
such that |Ψ⟩ can be expanded as
Therefore, |Ψ⟩ picks out a preferred basis. Many-world interpreters can postulate
that this basis determines the branches into which the universe splits. For the proof
we refer to the literature (Elby and Bub [62]).
Thus the measurement is classically correlated, but the result is random. Further
measurements will retain this correlation, giving the observer the illusion that the
projection postulate is satisfied. The mutual information shared with the original
system vanishes; thus no information is obtained about the state of the original
system.
Chapter 19
Quantum State Machines
19.1 Introduction
In this chapter we introduce the quantum state machine [80, 119]. The quantum
state machine is an extension of classical finite state machines used to represent the
computations possible in quantum computing. Quantum state machines introduce
amplitudes for transitions between states to represent the parallelism available. A
quantum state machine consists of the following components.
• A finite set S of states where the elements are uniquely identified with orthonormal
states in a Hilbert space H of dimension at least |S|. One state
s₀ ∈ S is designated as the start state. We will use the one-to-one function
m : S → H to denote the relationship between states and elements of the
Hilbert space. We will use the notation |s⟩ = m(s).
• A finite set of transitions for each combination of two (possibly identical) states
and symbols in the alphabet. Transitions are ordered 4-tuples (a, b, c, d_{a,b,c})
where a, b ∈ S, c ∈ A and d_{a,b,c} ∈ ℂ. We require that Σ_{b,c} |d_{a,b,c}|² = 1 where the
sum is over all transitions from a. We will also define d_{a,b,c} to be zero when
no transition exists between a and b for input c. The values d_{a,b,c} must also
satisfy

Σ_{t∈S} d_{s,t,c} d̄_{s′,t,c} = δ_{s,s′}

for every pair of states s, s′ ∈ S, where δ_{s,s′} is the Kronecker delta.
The condition

Σ_{t∈S} d_{s,t,c} d̄_{s′,t,c} = δ_{s,s′}

ensures that each input symbol acts as a unitary transformation on the state space.
If, after the whole input string has been read, the machine is found in a final state,
we say that a₁a₂…aₙ is accepted, otherwise it is rejected. Thus the input symbols
define a sequence of unitary operations to apply to an initial state, or a program.
This can be thought of as a program of quantum operations controlled classically,
which is exactly the way we have described quantum algorithms. The end of the
input corresponds to a measurement, i.e. we have to determine if the machine is in
a final state. We cannot define halt states, since the initial state may evolve into a
superposition of halt states and states which are not. This is the quantum halting
problem [110, 101]. The quantum finite automaton cannot crash on an input, since
it simply performs the transition with amplitude (probability) 0.
D(s, n) = Σ_{s₁,…,sₙ∈S; a₁,…,aₙ∈A} ∏_{j=1}^{n} d_{s_{j−1},s_j,a_j}.
Graphically, we can represent quantum automata in the same way as with finite
automata, with the additional labelling of arcs between states with the complex
amplitudes for the corresponding transition. The description of quantum automata
The function U(w)s₀ is called the response function of the q-automaton. A function
R : A* → H is realizable by the q-automaton if R(w) = U(w)s₀. A word w is
accepted if U(w)s₀ ∈ F. The language of the q-automaton is the set of all words
accepted by the q-automaton.
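As an illustration of the acceptance condition, the following standalone C++ sketch
(not part of the book's program collection) simulates a hypothetical two-state
q-automaton: each input symbol acts as a unitary transform on the state vector, and
the accepting subspace is taken to be the span of |1⟩. The particular unitaries are
assumptions chosen only for illustration.

// minimal sketch of a two-state q-automaton over the alphabet {a, b}
#include <iostream>
#include <complex>
#include <cmath>
#include <string>
using namespace std;

typedef complex<double> cplx;

struct State { cplx c0, c1; };   // amplitudes of |0> and |1>

// hypothetical transition unitaries: 'a' is a rotation, 'b' a phase flip
State step(const State& s, char symbol)
{
 State t;
 if(symbol=='a')                 // rotation by 45 degrees
 {
  double r = 1.0/sqrt(2.0);
  t.c0 = r*s.c0 - r*s.c1;
  t.c1 = r*s.c0 + r*s.c1;
 }
 else                            // 'b': sign change on |1>
 {
  t.c0 = s.c0;
  t.c1 = -s.c1;
 }
 return t;
}

int main(void)
{
 State s = { 1.0, 0.0 };         // start state s0 = |0>
 string word = "aab";
 for(size_t i=0;i<word.length();i++) s = step(s,word[i]);
 // probability that the word is accepted (final state |1>)
 cout << "P(accept " << word << ") = " << norm(s.c1) << endl;
 return 0;
}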
• R is realizable by a q-automaton.
• There exists an orthonormal basis |ψⱼ⟩ for H and an orthonormal basis |ψⱼ(x)⟩
for every x ∈ A such that ⟨R(xw)|ψⱼ⟩ = ⟨R(x)|ψⱼ(x)⟩.
We can define the tensor product of two finalizing q-automata q₁ = (H₁, s₁, A, U₁, F₁)
and q₂ = (H₂, s₂, A, U₂, F₂) over the same input alphabet as
We can extend the languages accepted by a finalizing q-automaton. For 0 ≤ η < 1
a word w is η-accepted by a q-automaton q = (H, s₀, A, U, F) if
For further results in the theory of quantum automata we refer to Gudder [80].
T := A ∪ Γ ∪ {Δ}.
The input string is placed in the first cells of the tape; the rest of the cells
are filled with Δ. The content of a cell is identified with orthonormal states
in a Hilbert space H_T of dimension at least |A| + |Γ| + 1. The Hilbert space
describing the tape is thus H_M := ⊗_{−∞}^{∞} H_T. The one-to-one mapping m_T,
with m_T : T → H_T, associates elements in the tape cells with the elements of
the Hilbert space H_T. We use the notation |t⟩ := m_T(t).
• A tape head that can read the contents of a tape cell, put a symbol from Γ or
the Δ symbol in the tape cell and move one cell right or left. All these actions
take place simultaneously. If the head is at cell[i] and moves left (right) then
the head will be at cell[i − 1] (cell[i + 1]). The position of the tape head
is identified with orthonormal states in an infinite dimensional Hilbert space
H_TH. The one-to-one mapping m_TH : ℤ → H_TH associates the tape head
position (an integer specifying the cell) with elements of the Hilbert space.
We use the notation |j⟩ := m_TH(j).
• A finite set of transitions for states and symbols from A ∪ Γ ∪ {Δ}. A transition
is an ordered 6-tuple (a, b, c, d, e, f_{a,b,c,d,e}) with

1. Σ_{c,d,e} f_{s,t,c,d,e} f̄_{s′,t′,c,d,e} = δ_{s,s′} δ_{t,t′}

2. Σ_{a,b} f_{a,b,c,d,e} f̄_{a,b,c′,d′,e′} = δ_{c,c′} δ_{d,d′} δ_{e,e′}
The quantum Turing machine has a tape which is infinitely long in both directions.
This does not provide any additional computing power over the tape which is only
infinite in one direction, but it does make the description of the quantum Turing
machine simpler since we can avoid crashing the machine, which corresponds to the
lack of a unitary transform to describe what happens when the machine is at cell[0].
where |ψ₀⟩ is the initial contents of the tape, using l cells. The evolution of the
machine is described by a unitary operator U, which in turn is specified by the
transitions. Thus after n steps of execution the machine is in the state
The unitary evolution U can be described in terms of the amplitudes of the transi-
tions.
U = Σ_{x,a,b,c,d,e} f_{a,b,c,d,e} |c⟩⟨a| ⊗ |x + δ_{r,e} − δ_{l,e}⟩⟨x| ⊗ (⊗_{−∞}^{x−1} I_T) ⊗ |d⟩⟨b| ⊗ (⊗_{x+1}^{∞} I_T)
Σ_{c,d,e} f_{s,t,c,d,e} f̄_{s′,t′,c,d,e} = δ_{s,s′} δ_{t,t′}

and

Σ_{a,b} f_{a,b,c,d,e} f̄_{a,b,c′,d′,e′} = δ_{c,c′} δ_{d,d′} δ_{e,e′}.
We cannot determine when a quantum Turing machine halts in the same way as
for quantum automata. A quantum automaton relies on a finite input string which
describes the running of the machine and explicitly determines when the machine
halts. The tape of the quantum Turing machine cannot fulfill this role since the
machine can modify any cell on the tape; the input is not "consumed". Deutsch [54]
suggested reserving one cell of the tape which is always in one of two orthonormal
states to indicate when the machine has halted. The cell contents can become
entangled with the rest of the machine, giving a superposition of halted machines
and machines which have not halted [110, 101]. If it is known that for any input
of length n the quantum Turing machine will halt after t(n) steps, we can use the
state indicating the halt status of the machine as a control (in the same way as
the controlled NOT) for the transformation U of the quantum Turing machine, and
measure after t(n) steps with certainty that the machine has halted. Deutsch also
suggested the existence of a universal quantum Turing machine which can simulate
any other quantum Turing machine. Yu Shi [147] discusses why this cannot be the
case.
Chapter 20
Teleportation
20.1 Introduction
Quantum teleportation is the disembodied transport of an unknown quantum state
1'IjI) from one place to another. All protocols for accomplishing such transport re-
quire nonlocal correlations, or entanglement, between systems shared by sender and
receiver. The sender is normally called Alice and the receiver is called Bob. Most
attention has focused on teleporting the states of finite-dimensional systems, such
as the two-dimensional polarization of a photon or the discrete level structure of an
atom. First proposed in 1993 by Charles Bennett and his colleagues [17, 24, 138],
quantum teleportation thus allows physicists to take a photon or any other quantum
scale particle such as an atom and transfer its properties (such as the polarization)
to another photon even if the two photons are on opposite sides of the galaxy. This
scheme transports the particle's properties to the remote location and not the par-
ticle itself. The state of the original particle must be destroyed to create an exact
reconstruction at the other end. This is a consequence of the no cloning theorem. A
role in the teleportation scheme is played by an entangled ancillary pair of particles
which will be initially shared by Alice and Bob.
[Figure: the teleportation scheme — an EPR source distributes an entangled pair of particles to Alice and Bob; Alice combines the teleported state with her particle of the pair.]
and the entangled pair of particles 2 and 3 shared by Alice and Bob is in the state
Alice gets particle 2 and Bob particle 3. This entangled state contains no information
on the individual particles 2 and 3. It only indicates that the two particles will be
in opposite states. Alice then performs a joint Bell-state measurement on the initial
particle 1 and particle 2 projecting them also onto an entangled state. After Alice has
sent the result of her measurement as classical information to Bob, he can perform
a unitary transformation on the other ancillary particle resulting in it being in the
state of the original particle.
Most experiments are done with photons which are spin-1 particles. The information
to be teleported is the polarization state of the photon. The Innsbruck experiment is
a simplified version of the teleportation described above. In this experiment photons
are used. The photon is a particle with spin 1 and rest mass O. If the photon moves
in positive z direction, i.e. the wave vector k is given by (0, 0, k)ᵀ, we have the wave
functions
where e₁ := (1, 0, 0)ᵀ and e₂ := (0, 1, 0)ᵀ. Thus we have two transverse waves.
Although the photon is a spin-1 particle the vectors s and k can only be parallel
(or antiparallel). The wave φ₊₁ is in a state of positive helicity and the wave φ₋₁
is in a state of negative helicity. In the Innsbruck experiment at the sending sta-
tion of the quantum teleporter, Alice encodes photon M with a specific state: 45
degree polarization. This photon travels towards a beamsplitter. Meanwhile two
additional entangled photons A and B are created. Thus they have complementary
polarizations. For example, if photon A is later measured to have horizontal (0 de-
grees) polarization, then the other photon B must collapse into the complementary
state of vertical (90 degrees) polarization. Now entangled photon A arrives at the
beamsplitter at the same time as the message photon M. The beamsplitter causes
each photon either to continue towards detector 1 or change course and travel to
20.2 Teleportation Algorithm 509
detector 2. In 1/4 of all cases, in which the two photons go off into different de-
tectors, Alice does not know which photon went to which detector. Owing to the
fact that the two photons are now indistinguishable, the message photon M loses
its original identity and becomes entangled with A. The polarization value for each
photon is now indeterminate, but since the two photons travel towards different de-
tectors Alice knows that the two photons must have complementary polarizations.
Since message particle M must have complementary polarization to particle A, then
the other entangled particle B must now attain the same polarization value as M.
Therefore teleportation is successful. The receiver Bob sees that the polarization
value of the particle B is 45 degrees, which is the initial value of the message photon.
In the experimental version of this setup executed at the University of Innsbruck,
the 45-degree polarization detector would always fire when detector 1 and detector 2
fired. Except in rare instances attributable to background noise, it was never the case
that the 135-degree polarization detector fired in coincidence with detectors 1 and 2.
Teleportation can also be understood using the quantum circuit shown in the fol-
lowing figure.
[Figure: quantum circuit for teleportation with inputs A, B and C.]
In the figure A is the input |ψ⟩, B the input |0⟩ and C the input |0⟩. Now we study
what happens when we feed the product state |ψ00⟩ into the quantum circuit. From
the circuit we have the following eight 8 × 8 unitary matrices
Applying the first four unitary matrices U₄U₃U₂U₁ to the input state we obtain
Thus the state |ψ⟩ will be transferred to the lower output, where both other outputs
will come out in the state (|0⟩ + |1⟩)/√2. If the two upper outputs are measured
in the standard basis (|0⟩ versus |1⟩), two random classical bits will be obtained in
addition to the quantum state |ψ⟩ on the lower output.
Consider the case when the qubit to be teleported is one qubit of an entangled pair.
The first two qubits are entangled. Applying the teleportation algorithm to the
second, third and fourth qubits yields
The first and last qubits are now entangled, whereas the first and second are no
longer entangled. Thus we have achieved entanglement swapping.
#include "Vector.h"
#include "Matrix.h"
#include "Rational.h"
#include "Msymbol.h"
using namespace std;

Matrix<T> NOT(2,2);   // bit flip (NOT) gate
Matrix<T> H(2,2);     // Hadamard gate
H[0][0] = T(1)/sqrt(T(2)); H[0][1] = T(1)/sqrt(T(2));
H[1][0] = T(1)/sqrt(T(2)); H[1][1] = T(-1)/sqrt(T(2));
Matrix<T> I(2,2);     // 2 x 2 identity matrix
I.identity();
Matrix<T> X(4,4);     // controlled NOT gate
X[0][0] = T(1); X[0][1] = T(0); X[0][2] = T(0); X[0][3] = T(0);
X[1][0] = T(0); X[1][1] = T(1); X[1][2] = T(0); X[1][3] = T(0);
X[2][0] = T(0); X[2][1] = T(0); X[2][2] = T(0); X[2][3] = T(1);
X[3][0] = T(0); X[3][1] = T(0); X[3][2] = T(1); X[3][3] = T(0);
Matrix<T> U1=kron(I,kron(H,I));
Matrix<T> U2=kron(I,X);
Matrix<T> U3=kron(X,I);
Matrix<T> U4=kron(H,kron(I,I));
Matrix<T> U5=kron(I,X);
Matrix<T> U6=kron(I,kron(I,H));
Matrix<T> U7=dsum(I,dsum(I,dsum(NOT,NOT)));
Matrix<T> U8=kron(I,kron(I,H));
result=U8*(U7*(U6*(U5*(U4*(U3*(U2*(U1*v)))))));
for(i=0;i<8;i++)
{
 while(result[i].put(power(sqrt(T(2)),-6),power(T(2),-3)));
 while(result[i].put(power(sqrt(T(2)),-4),power(T(2),-2)));
 while(result[i].put(power(sqrt(T(2)),-2),power(T(2),-1)));
}
return result;
}
len = v.length()/int(pow(2,qubit+1));
for(i=0;i<v.length();i++)
{
 if(!(i%len)) skip = 1-skip;   // toggle between kept and zeroed blocks
 if(skip) result[i] = T(0);
 else D += result[i]*result[i];
}
result /= sqrt(D);             // renormalize after the measurement
return result;
}
if(v.length()==2) b=b2;
if(v.length()==4) b=b4;
if(v.length()==8) b=b8;
for(i=0;i<v.length();i++)
 if(!v[i].is_Number() || v[i].nvalue()!=C(0))
  o << "+(" << v[i] << ")" << b[i];
return o;
}
void main(void)
{
Vector<C> zero(2),one(2);
Vector<C> zz(4),zo(4),oz(4),oo(4),qreg;
Vector<C> tp00,tp01,tp10,tp11;
Sum<Rational<int> > a("a",0),b("b",0);
int i;
qreg = kron(a*zero+b*one,kron(zero,zero));
cout << "TELEPORT("; print(cout,qreg) << ") = ";
print(cout,qreg=Teleport(qreg)) << endl;
cout << "Results after measurement of first 2 qubits:" << endl;
tp00 = Measure(Measure(qreg,0,0),1,0);
tp01 = Measure(Measure(qreg,0,0),1,1);
tp10 = Measure(Measure(qreg,0,1),1,0);
tp11 = Measure(Measure(qreg,0,1),1,1);
for(i=0;i<8;i++)
{
 while(tp00[i].put(a*a,C(1)-b*b));
 while(tp00[i].put(power(sqrt(C(1)/C(2)),-2),C(2)));
 while(tp01[i].put(a*a,C(1)-b*b));
 while(tp01[i].put(power(sqrt(C(1)/C(2)),-2),C(2)));
 while(tp10[i].put(a*a,C(1)-b*b));
 while(tp10[i].put(power(sqrt(C(1)/C(2)),-2),C(2)));
 while(tp11[i].put(a*a,C(1)-b*b));
 while(tp11[i].put(power(sqrt(C(1)/C(2)),-2),C(2)));
}
cout << " |00> "; print(cout,tp00) << endl;
cout << " |01> "; print(cout,tp01) << endl;
cout << " |10> "; print(cout,tp10) << endl;
cout << " |11> "; print(cout,tp11) << endl;
cout << endl;
}
Chapter 21
Quantum Algorithms

21.1 Deutsch's Problem

Consider the functions f : {0,1} → {0,1}. There are four such functions: the
constant functions, which map all inputs to 0 or all inputs to 1, and the varying
functions, which have f(0) ≠ f(1). In other words
f₁(0) = 0, f₁(1) = 0
f₂(0) = 1, f₂(1) = 1
f₃(0) = 0, f₃(1) = 1
f₄(0) = 1, f₄(1) = 0.
The first two functions are constant. The task is to determine for such a function
whether it is constant or varying using only one calculation of the function. In the
classical case it is necessary to compute f twice before it is known whether it is
constant or varying. For example if f(0) = 0, the function could be f₁ or f₃;
similarly for any other single evaluation two of the functions have the same value.
In quantum computing the following solution was found [47, 178]. The function is
implemented on quantum hardware with the unitary transformation Uf such that
where ⊕ denotes the XOR operation. We apply the transformation to the state
This gives
(1/2)(|0⟩ ⊗ |0 ⊕ f(0)⟩ − |0⟩ ⊗ |1 ⊕ f(0)⟩) + (1/2)(|1⟩ ⊗ |0 ⊕ f(1)⟩ − |1⟩ ⊗ |1 ⊕ f(1)⟩).
Since U_f is linear and quantum mechanics allows the superposition of states, we have
calculated the function f for both inputs at once. This feature of quantum parallelism
makes it possible to solve the problem. Applying the Hadamard transform U_H to
the first qubit yields
(U_H ⊗ I₂)U_f|ψ⟩ = (1/(2√2)) |0⟩ ⊗ (|0 ⊕ f(0)⟩ − |1 ⊕ f(0)⟩ + |0 ⊕ f(1)⟩ − |1 ⊕ f(1)⟩)
                 + (1/(2√2)) |1⟩ ⊗ (|0 ⊕ f(0)⟩ − |1 ⊕ f(0)⟩ − |0 ⊕ f(1)⟩ + |1 ⊕ f(1)⟩).
If f is constant we have

(U_H ⊗ I₂)U_f|ψ⟩ = (1/√2) |0⟩ ⊗ (|0 ⊕ f(0)⟩ − |1 ⊕ f(0)⟩).
If f is varying we have
Thus measuring the first qubit as zero indicates the function f is constant and
measuring the first qubit as one indicates the function f is varying. The function f
was only calculated once using U_f, and so the problem is solved.
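The following standalone C++ sketch (ours, not the book's program) simulates the
algorithm for all four functions by representing the two-qubit state as a vector of
four real amplitudes in the basis order |00⟩, |01⟩, |10⟩, |11⟩; U_f is applied as the
permutation |x⟩|y⟩ → |x⟩|y ⊕ f(x)⟩.

// minimal sketch simulating Deutsch's algorithm
#include <iostream>
#include <cmath>
using namespace std;

int main(void)
{
 double r = 1.0/sqrt(2.0);
 int f[4][2] = { {0,0}, {1,1}, {0,1}, {1,0} };  // f1, f2, f3, f4

 for(int k=0;k<4;k++)
 {
  // input state (|0>+|1>)/sqrt2 (x) (|0>-|1>)/sqrt2
  double psi[4] = { 0.5, -0.5, 0.5, -0.5 };

  // apply Uf: amplitude of |x>|y> moves to |x>|y XOR f(x)>
  double phi[4];
  for(int x=0;x<2;x++)
   for(int y=0;y<2;y++)
    phi[2*x + (y ^ f[k][x])] = psi[2*x + y];

  // apply the Hadamard transform to the first qubit
  double out[4];
  for(int y=0;y<2;y++)
  {
   out[0+y] = r*(phi[0+y] + phi[2+y]);
   out[2+y] = r*(phi[0+y] - phi[2+y]);
  }

  // probability of measuring the first qubit as 0
  double p0 = out[0]*out[0] + out[1]*out[1];
  cout << "f" << k+1 << ": P(first qubit = 0) = " << p0
       << (p0 > 0.5 ? "  (constant)" : "  (varying)") << endl;
 }
 return 0;
}

Running the sketch yields probability 1 for f₁ and f₂ and probability 0 for f₃ and f₄,
in agreement with the analysis above.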
The algorithm has been implemented using nuclear magnetic resonance techniques
[43, 98].
Deutsch and Jozsa [47, 55] generalized the problem. Let f : {0,1}ⁿ → {0,1} be
a Boolean function. Assume that f is either constant or balanced. Thus f maps
only to 0, only to 1, or to an equal number of 0's and 1's over all possible inputs:

|{x ∈ {0,1}ⁿ | f(x) = 0}| = |{x ∈ {0,1}ⁿ | f(x) = 1}|.

The problem is to determine if f is constant or balanced using only one function
evaluation. Let U_f be the unitary operator which implements the function f,
U_f |x⟩ ⊗ (1/√2)(|0⟩ − |1⟩) = (1/√2)|x⟩ ⊗ (|0 ⊕ f(x)⟩ − |1 ⊕ f(x)⟩) = ((−1)^{f(x)}/√2)|x⟩ ⊗ (|0⟩ − |1⟩).
where x ∈ {0,1}ⁿ and j ∗ x denotes the bitwise AND of j and x followed by the
XOR of the resulting bits. The algorithm starts with the n + 1 qubit state

|ψ⟩ := (1/√2)|0⟩ ⊗ (|0⟩ − |1⟩).
The first step is to apply the Walsh-Hadamard transform to the first n qubits of
|ψ⟩:

(⊗ⁿ U_H)|ψ⟩ = (1/√2ⁿ) Σ_{x∈{0,1}ⁿ} |x⟩ ⊗ (1/√2)(|0⟩ − |1⟩).
The probability that a measurement of the first n qubits yields |00…0⟩ is

|(1/2ⁿ) Σ_{x∈{0,1}ⁿ} (−1)^{f(x)}|² = 1 if f is constant, and 0 if f is balanced.
Thus, after applying these transformations, measuring all |0⟩ for the first n qubits
indicates that f is constant and any other result indicates that f is balanced. The
network representation for the algorithm is given in Figure 21.1.
[Figure 21.1: network for the Deutsch-Jozsa algorithm — the n-qubit register passes through Hadamard gates, U_f and Hadamard gates again, while the ancilla |0⟩ − |1⟩ is left unchanged.]
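The probability formula above can be checked numerically. The following standalone
C++ sketch (not from the book) evaluates (1/2ⁿ) Σ_x (−1)^{f(x)} for one constant and
one balanced function with n = 3; the two sample functions are assumptions chosen
for illustration.

// minimal numerical check of the Deutsch-Jozsa probability formula
#include <iostream>
#include <cmath>
using namespace std;

double p_all_zero(int n, int (*f)(int))
{
 double s = 0.0;
 for(int x=0;x<(1<<n);x++) s += (f(x) ? -1.0 : 1.0);
 s /= double(1<<n);
 return s*s;    // probability of measuring |00...0>
}

int constant_f(int x) { return 1; }       // constant function
int balanced_f(int x) { return x & 1; }   // balanced: lowest bit of x

int main(void)
{
 cout << "constant: P(|00...0>) = " << p_all_zero(3,constant_f) << endl; // 1
 cout << "balanced: P(|00...0>) = " << p_all_zero(3,balanced_f) << endl; // 0
 return 0;
}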
f : {0,1}ⁿ → {0,1}ᵐ,  m ≥ n.
Only one function evaluation is required. This is the next step in the solution.
The final step is to apply the Walsh-Hadamard transform to the first n qubits again.
We have
Thus if s ∗ k = 0,

(j ⊕ s) ∗ k = j ∗ k.
If s ∗ k ≠ 0 then the amplitude of the state |k⟩ ⊗ |f(j)⟩ is zero. Thus measuring the
n + m qubits yields with certainty a number t such that t ∗ s = 0. Repeating the above
procedure O(n) times will yield enough linearly independent t and corresponding
equations of the form t ∗ s = 0 so that s can be found.
If f is one-to-one, then each measurement will yield a random value. The resulting
s determined from the equations must then be tested; for example the values f(0)
and f(s) can be checked for equality.
The expected time of the algorithm is O(nT_f(n) + G(n)) where T_f(n) is the time
required to compute f on inputs of n bits and G(n) is the time required to solve n
linear equations for n unknowns in {0, 1}.
The algorithm does not guarantee completion in polynomial time for the worst case.
A quantum algorithm which is guaranteed to complete in polynomial time (or less)
is called an exact quantum polynomial time algorithm. Brassard and Høyer [27]
discovered an exact quantum polynomial time algorithm to solve Simon's problem.
Their algorithm is based on Simon's solution with the addition that after each
iteration, 0 and the values of t already found are removed from the superposition. Thus
the time to determine the n linear equations is precisely determined.
We can eliminate the state with the first n qubits equal to t as follows. Suppose the
lth bit of t is 1. We begin with the final state of an iteration of Simon's solution.
We add an auxiliary qubit |0⟩ and then apply the controlled NOT with qubit l as
the control and the auxiliary qubit as the target, which gives the state
U_CNOT(l, n+m+1) (1/2ⁿ) Σ_{j=0}^{2ⁿ−1} Σ_{k=0}^{2ⁿ−1} (−1)^{j∗k} |k⟩ ⊗ |f(j)⟩ ⊗ |0⟩
 = (1/2ⁿ) Σ_{j=0}^{2ⁿ−1} Σ_{p=0}^{1} Σ_{k: k_l = p} (−1)^{j∗k} |k⟩ ⊗ |f(j)⟩ ⊗ |p⟩
where k_l is the lth bit of k, and 0t = 0 and 1t = t. We obtain this result by separating
the sum over those k with the lth bit 0 and those with the lth bit 1. We also make
use of the fact that the k form a group with ⊕, the bitwise XOR operation. Thus
(k ⊕ pt) will take all the values over k.
Now we apply the operator which maps Ix) to Ix EB t), for the first n qubits, only
when the auxiliary qubit is 11) and leaves the first n qubits unchanged otherwise.
In other words we apply the operator
which can be built using n controlled-controlled-NOT gates. Applying the operator
U_{⊕t} yields

(1/2ⁿ) Σ_{j=0}^{2ⁿ−1} Σ_{p=0}^{1} Σ_{k: k_l = 0} (−1)^{j∗(k ⊕ pt)} |k⟩ ⊗ |f(j)⟩ ⊗ |p⟩.
Since the first n + m qubits are independent of p we can discard the auxiliary qubit.
Thus the final state is
This state is of the same form as the final state of each iteration of Simon's solution,
except for the fact that the probability of measuring the first n qubits as the t
already found is 0 and the probability of measuring some other t′ with t′ ∗ s = 0 is
greater. This process can be repeated to eliminate all of the t values with t ∗ s = 0
already found. The above modification to the algorithm does not ensure that the
first n qubits will never be measured as 0. To remove this possibility, Brassard
and Høyer [27] use a modification of Grover's search algorithm which, under certain
conditions, succeeds with probability 1. The technique is discussed in Section 21.6.
x̂(k) = (1/n) Σ_{j=0}^{n−1} x(j) e^{−i2πkj/n}

x(j) = Σ_{k=0}^{n−1} x̂(k) e^{i2πkj/n}.
The transform is unitary and is called the quantum Fourier transform. We have
[47,162]
U_QFT = (1/√n) \begin{pmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & e^{-i2\pi/n} & e^{-i4\pi/n} & \cdots & e^{-i(n-1)2\pi/n} \\
1 & e^{-i4\pi/n} & e^{-i8\pi/n} & \cdots & e^{-i(n-1)4\pi/n} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & e^{-i(n-1)2\pi/n} & e^{-i(n-1)4\pi/n} & \cdots & e^{-i(n-1)^2 2\pi/n}
\end{pmatrix}
The transform can be implemented using m(m+1)/2 single and double qubit gates
[47, 138]. Let H_k denote the Hadamard transform acting on qubit k and U_PS(k, j, φ)
denote the phase shift transform U_PS(φ) acting on qubits k and j. We can rewrite
the transform as (by relabeling the sums)
(1/√2ᵐ) ⊗_{l=0}^{m−1} (|0⟩ + exp(−i2πk·2^{l−m})|1⟩).
[Figure: network for the quantum Fourier transform acting on the qubits |k₀⟩, …, |k_{m−1}⟩.]
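The matrix form of the transform is easy to check numerically. The following
standalone C++ sketch (not from the book) constructs U_QFT for n = 4 and verifies
that it is unitary.

// minimal sketch: build U_QFT and check unitarity
#include <iostream>
#include <complex>
#include <cmath>
#include <algorithm>
using namespace std;

typedef complex<double> cplx;
const int n = 4;

int main(void)
{
 cplx U[n][n];
 const double pi = 3.14159265358979323846;
 for(int j=0;j<n;j++)
  for(int k=0;k<n;k++)
   U[j][k] = exp(cplx(0.0,-2.0*pi*j*k/n))/sqrt(double(n));

 // largest deviation of U U^dagger from the identity matrix
 double dev = 0.0;
 for(int j=0;j<n;j++)
  for(int k=0;k<n;k++)
  {
   cplx s = 0.0;
   for(int l=0;l<n;l++) s += U[j][l]*conj(U[k][l]);
   dev = max(dev, abs(s - cplx(j==k ? 1.0 : 0.0)));
  }
 cout << "max |U U* - I| = " << dev << endl;  // of order 1e-16
 return 0;
}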
f_{a,N}(x) := aˣ mod N
where a is any randomly chosen positive integer (a < N) which is coprime with N,
i.e. which has no common factors with N. If a is not coprime with N, then the
factors of N are trivially found by computing the greatest common divisor of a and
N.
Example. Let N = 21 and a = 11. Thus N and a have no common factors. Now
for x = 0, 1, 2, 3, 4, 5, 6 we find the values 1, 11, 16, 8, 4, 2, 1. The period r of
f_{a,N} satisfies

aʳ = 1 mod N

for the smallest positive integer r. Obviously we find

11⁶ = 1 mod 21
Since aʳ − 1 = (a^{r/2} − 1)(a^{r/2} + 1) is divisible by N when r is even, the factors of
N can be obtained from gcd(a^{r/2} ± 1, N), provided that r is even and

a^{r/2} mod N ≠ N − 1.

For the example given above these two conditions are met. When a is chosen
randomly the two conditions are satisfied with probability greater than 1/2 [47].
The greatest common divisor can be found using the Euclidean algorithm. The Euclidean
algorithm runs in polynomial time on a classical computer. For the example
given above with N = 21, a = 11 and r = 6 we find 11³ mod 21 = 8,
gcd(8 − 1, 21) = 7 and gcd(8 + 1, 21) = 3, so that 21 = 3 · 7.
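Only the period finding step requires the quantum computer. The following standalone
C++ sketch (ours, not the book's code) carries out the classical bookkeeping for this
example: it finds the order of a mod N by brute force and extracts the factors from
the greatest common divisors.

// minimal sketch: order finding and factor extraction for N = 21, a = 11
#include <iostream>
using namespace std;

unsigned long gcd(unsigned long x, unsigned long y)
{ while(y) { unsigned long t = x % y; x = y; y = t; } return x; }

int main(void)
{
 const unsigned long N = 21, a = 11;
 unsigned long r = 1, p = a % N;
 while(p != 1) { p = (p*a) % N; r++; }          // order of a mod N
 cout << "order r = " << r << endl;              // r = 6

 if(r % 2 == 0)
 {
  unsigned long h = 1;
  for(unsigned long i=0;i<r/2;i++) h = (h*a) % N;      // a^(r/2) mod N
  cout << "gcd(a^(r/2)-1,N) = " << gcd(h-1,N) << endl; // 7
  cout << "gcd(a^(r/2)+1,N) = " << gcd(h+1,N) << endl; // 3
 }
 return 0;
}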
The quantum computer is prepared with two quantum registers, X and Y, each
consisting of a string of qubits initialized to the Boolean value zero. The register X
consists of 2L qubits where

2^{2L} − 1 ≥ N².
This can be achieved by applying the Hadamard transform to each qubit of the
input register X, i.e.

|ψ₁⟩ := ((U_H ⊗ U_H ⊗ ⋯ ⊗ U_H) ⊗ I) |0⟩ ⊗ |0⟩ = (1/2^L) Σ_{x=0}^{2^{2L}−1} |x⟩ ⊗ |0⟩.

Applying the unitary transformation which implements f_{a,N} then gives

|ψ₂⟩ := (1/2^L) Σ_{x=0}^{2^{2L}−1} |x⟩ ⊗ |f_{a,N}(x)⟩.
At this stage, all the possible values of f_{a,N} are encoded in the state of the second
register. However, they are not all accessible at the same time. We are not
interested in the values themselves, but only in the periodicity of the function f_{a,N}.
Observing the second quantum register as u would yield a superposition of the |x⟩
weighted by

g_u(x) := 1 if aˣ = u mod N, and g_u(x) := 0 otherwise.
The next step is to Fourier transform the first register. This means we apply a
unitary operator that maps the state onto
|ψ₃⟩ = (1/2^{2L}) Σ_{x=0}^{2^{2L}−1} Σ_{k=0}^{2^{2L}−1} exp(2πixk/2^{2L}) |k⟩ ⊗ |f_{a,N}(x)⟩.
The sum over x is restricted to those x such that

aˣ = aᵐ mod N.

Using x = m + br with b ∈ ℕ₀ the sum becomes
which is congruent to

rk mod (2^{2L} − 1).
The above probability has well-defined peaks when rk is close to a multiple of 2^{2L},
i.e.

rk ≈ d·2^{2L}

for some d < N. Thus, knowing L (and therefore 2^{2L}) and the fact that the positions
of the peaks k will be close to numbers of the form d·2^{2L}/r, we can find the period r
using continued fraction techniques. To explicitly construct the unitary evolution
that takes the state |ψ₁⟩ into the state |ψ₂⟩ is a rather nontrivial task [118].
There are rφ(r) states which can be used to determine r [148], where φ is Euler's
totient function. Each state occurs in the superposition with probability at least
1/(3r²), so the probability of measuring a state which can be used to determine r
is φ(r)/(3r). Using the fact that
Now the algorithms of Deutsch, Simon, Shor and Kitaev, as well as others, can be
formulated group theoretically as a hidden subgroup problem [93, 100, 121]. Let f
be a function from a finitely generated group G to a finite set X such that f is
constant on the cosets of a subgroup K and distinct on each coset. The cosets of K
are the sets

g·K := {g·k | k ∈ K},  g ∈ G.

The cosets partition G, i.e. the union of all the cosets is the set of the group G and
every two cosets are equal or their intersection is empty. Thus we write f : G → X
and

K = {k ∈ G | f(k·g) = f(g) ∀ g ∈ G}.

The problem is, for given f and G, to determine the hidden subgroup K.
We describe briefly how the above mentioned problems can be expressed in terms
of the hidden subgroup problem.
Deutsch's problem. We set G = ℤ₂ = {0, 1} with the group operation · = ⊕, the
XOR operation. The only subgroups are {0} and {0, 1}. If K = {0} then f is
balanced and if K = {0, 1} then f is constant.
Simon's problem. For Simon's problem we have G = {0,1}ⁿ with · = ⊕, the bitwise
XOR operation. Simon's problem requires that f(x) = f(y) if and only if x = y or
x = y ⊕ s. Immediately we see that K = {0, s}.
that it is known that a is of order r, i.e. aʳ = 1. Thus f(x₁, y₁) = f(x₂, y₂) if and
only if

x₁ − x₂ = −m(y₁ − y₂) mod r.

Equivalently f(x, y) = f(x·s, y·t) if and only if s = −mt mod r. The hidden
subgroup (which is used to determine m) is
|g·K⟩ := (1/√|K|) Σ_{k∈K} |g·k⟩.
and measure the function value, the measurement projects the first register onto
one of the cosets of K (since the cosets form a partition of G). From the coset we
would like to determine K. The states |g·k⟩ are all displaced by g with respect
to the group operation. We can associate with this the idea of a periodic sequence g·k
where the "period" of the sequence is the generator of the subgroup K. Thus we
can try to apply a transform analogous to the quantum Fourier transform. The
transform is constructed using techniques from group representation theory. For
more information see Jozsa [99, 100].
The above mentioned problems are all defined with Abelian groups. The construc-
tion of the Fourier transform is also for Abelian groups. Non-Abelian hidden sub-
group problems create more difficulties; for some results on these problems see for
example Ivanyos et al. [95], and Rötteler and Beth [140].
⊗ⁿ U := U ⊗ U ⊗ ⋯ ⊗ U   (n times).
Denote by X_T the set of all bit sequences of length n which satisfy P, and by X_F
the set of all bit sequences of length n which do not satisfy P. Thus applying U_P
to |ψ₀⟩ gives
= (1/√2^{n+1}) U_P ( Σ_{x∈X_T} |x⟩ ⊗ |0⟩ − Σ_{x∈X_T} |x⟩ ⊗ |1⟩ + Σ_{x∈X_F} |x⟩ ⊗ |0⟩ − Σ_{x∈X_F} |x⟩ ⊗ |1⟩ )

= (1/√2^{n+1}) ( Σ_{x∈X_F} |x⟩ − Σ_{x∈X_T} |x⟩ ) ⊗ (|0⟩ − |1⟩)
The amplitudes of the bit sequences satisfying P are negative. If the last qubit had
only been |0⟩, the sequences satisfying P would be marked with a |1⟩ in the last
qubit, but measuring would yield any sequence with equal probability; in general
obtaining a sequence which satisfies P would have low probability. The state |ψ₁⟩ is
not much better, but can be manipulated to increase the probability of measuring
a sequence which satisfies P. The next step is to increase the absolute value of the
amplitudes of the elements of X_T, i.e. those elements in the superposition with
negative amplitudes. This is done with the inversion about average operation. This
operation maps each amplitude aᵢ to 2A − aᵢ where A is the average of all the
amplitudes. We note

2A − aᵢ = A + (A − aᵢ)

which explains the terminology "inversion about the average". The operation is
represented by the 2ⁿ × 2ⁿ unitary matrix U_IA (where IA indicates inversion about
the average)
U_IA = (2/2ⁿ) \begin{pmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{pmatrix} − I_{2ⁿ}
where I_{2ⁿ} is the 2ⁿ × 2ⁿ unit matrix. If only one state satisfies P, the inversion about
average operation inverts and increases the amplitude of the state with negative
amplitude while the other amplitudes decrease. The process is repeated (calculate
P on the bit sequences, with the amplitudes of those states satisfying P negative, and
inversion about the average) approximately (π/4)√2ⁿ times for a greater than 50%
chance of obtaining the state which satisfies P [26]. The algorithm also works when
more than one x satisfies P [26]. Unlike classical algorithms, applying the process further will lead
to a decrease in the probability of measuring the required state. This is due to
the fact that the states are normalized, and operations perform a rotation in the
state space. Since iterations of the algorithm always perform the same rotation, the
rotation must at some stage necessarily move away from the desired state, although
it may approach the desired state again under further iteration of the algorithm. We
can think of the algorithm as the rotation of a ray from the origin to the surface of
the unit ball. For the case of a single qubit with real amplitudes the rotation is on
the unit circle. Figure 21.3 gives the network representation of the algorithm.
[Figure 21.3: network for Grover's algorithm — the inputs |0⟩, …, |0⟩ and |0⟩ − |1⟩ pass through Hadamard gates, after which U_P and the inversion about average are applied (π/4)√2ⁿ times.]
The algorithm has been generalized for the case when the amplitudes of the states
in the superposition are initially in an arbitrary configuration [22].
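The following standalone C++ sketch (not from the book) iterates the two steps of
the algorithm, the conditional sign flip and the inversion about the average, on a
vector of 2ⁿ real amplitudes; the choices n = 6 and marked element 42 are arbitrary.

// minimal sketch of Grover iterations with a single marked element
#include <iostream>
#include <vector>
#include <cmath>
using namespace std;

int main(void)
{
 const int n = 6, N = 1 << n, marked = 42;
 vector<double> a(N, 1.0/sqrt(double(N)));   // uniform superposition

 int steps = int(0.25*3.14159265358979*sqrt(double(N))); // ~ (pi/4)sqrt(2^n)
 for(int s=0;s<steps;s++)
 {
  a[marked] = -a[marked];                    // conditional sign flip (U_P)
  double avg = 0.0;
  for(int i=0;i<N;i++) avg += a[i];
  avg /= N;
  for(int i=0;i<N;i++) a[i] = 2.0*avg - a[i];   // inversion about average
 }
 cout << "after " << steps << " iterations: P(marked) = "
      << a[marked]*a[marked] << endl;        // close to 1
 return 0;
}

For N = 64 the sketch performs 6 iterations and reports a probability of about 0.997
for the marked element, illustrating the square root behaviour.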
Bennett et al. [20] found lower bounds for the unstructured search problem and
proved that a square root speed up (obtained by Grover's algorithm) is optimal.
Consider any quantum algorithm A for solving the unstructured search problem.
First we do a test run of A on the function f ≡ 0. Define the query magnitude of x
to be Σ_t |a_{x,t}|², where a_{x,t} is the amplitude with which A queries x at time t. The
expectation value of the query magnitude is T/2ⁿ. Thus,
Let the states of the algorithm A run on f be |φ₀⟩, |φ₁⟩, …, |φ_T⟩. We run the
algorithm A on the function

g(x) := 1 for x = y, and g(x) := 0 for x ≠ y.
Suppose the final state of A run on g is |ψ_T⟩. Then [20] ‖|φ_T⟩ − |ψ_T⟩‖ must be
small and
where
Now we prepare the database state as a tensor product of all items in the database,
the identifier of the item we are searching for, a qubit to store the search result and
|s⟩. A superposition state of all the items in the database could also be used, but this
would require determining for each bit sequence if the item is in the database, which
reduces the efficiency to no better than classical. We also assume that the database
is maintained in this tensor product form, since constructing the quantum database
each time for a search again reduces the efficiency to no better than classical. The
initial state is
This state associates the data |dⱼ⟩ with the identifier |xⱼ⟩. We define the unitary
operator U′_P by
From this point the algorithm proceeds as before. The probability of registering the
second quantum register as the id representing x_id is greater than ½, and identifies
the element in the database to examine.
Grover's algorithm has an interesting property when ¼ of the states in the superposition
are states satisfying P. We consider again the state |ψ₁⟩.
Now we have |X_F| = (3/4)·2ⁿ and |X_T| = (1/4)·2ⁿ. Applying U_IA to the first n qubits (i.e.
we apply U_IA ⊗ I) will ensure that measurement of the first n qubits will yield a
state from X_T. To see this we calculate the average A of the amplitudes of the first
n qubits. We find

A = (1/√2ⁿ)(3/4 − 1/4) = 1/(2√2ⁿ).

Since each amplitude of a state in X_F is 1/√2ⁿ, these amplitudes become 2A − 1/√2ⁿ = 0.
Brassard and Høyer [27] discuss a related algorithm which works more generally,
when the probability of measuring a state which satisfies P is ½. Let |ψ⟩ be the
state

|ψ⟩ := |X_T⟩ + |X_F⟩,

where |X_T⟩ is the superposition of all states satisfying P and |X_F⟩ is the superposition
of all states not satisfying P. There is no constraint on the amplitudes of the
states, except that |ψ⟩ must be normalized. Obviously we must have ⟨X_T|X_F⟩ = 0.
Suppose ⟨X_T|X_T⟩ = t; then ⟨X_F|X_F⟩ = 1 − t. The algorithm transforms |ψ⟩ to
|ψ′⟩ with

|ψ′⟩ = (2i(1 − t) − 1)|X_T⟩ + i(1 − 2t)|X_F⟩.

If t = ½ then the amplitudes of all states in X_F are zero. If t = 0 the algorithm
changes the global phase only. Let A be a quantum algorithm which evolves |0⟩ (an
appropriate tensor product of |0⟩ qubits) into the superposition |ψ⟩, represented as
is the single qubit phase change gate, ignoring the global phase. Similarly we need
S₀ which takes the state |0⟩ (n qubits) to i|0⟩. The transform we use is described
by

G = U_A S₀ U_A⁻¹ S_P.

We need to calculate ⟨X_T|G|X_T⟩ and ⟨X_F|G|X_F⟩. We have
Thus we obtain the desired result. This technique is used to remove |0⟩ from the
superposition in the exact quantum algorithm for the solution of Simon's problem,
since at least one qubit has probability ½ of being measured as |1⟩ and every other
qubit has either probability 0 or ½.
Let B₀ denote the basis {|0⟩, |1⟩} and B₁ denote the basis

{(|0⟩ + |1⟩)/√2, (|0⟩ − |1⟩)/√2}.

Suppose Alice transmits the key and Bob is to receive the key. Alice and Bob agree
to use one-to-one mappings f₀ : B₀ → {0, 1} and f₁ : B₁ → {0, 1} to uniquely
convert between 0 and 1 and a given basis.
For each bit in the key Alice randomly chooses a basis from B₀ and B₁ and sends the
quantum state from that basis which corresponds to the bit. Bob randomly chooses
a basis from B₀ and B₁ and measures the quantum state he receives relative to this
basis. On average 50% of the time the basis chosen by Alice and Bob will be the
same. After Bob has received all the bits, Alice and Bob communicate on an open
channel to determine which quantum states were prepared and measured using the
same basis. This determines which bits are used, since Alice and Bob have the same
bit values in these cases. Suppose now a third party Eve attempts to obtain the
key from the quantum states sent by Alice to Bob. Eve attempts to measure the
states being sent from Alice to Bob by randomly choosing a basis from B₀ and B₁,
which she chooses correctly 50% of the time. Eve then resends the quantum state
using the basis she guesses and the value she measured. When Bob uses the same
basis as Alice when Eve does not, Bob will measure the correct value 50% of the
time. This means that when Alice and Bob use the same basis when Eve attempts
to obtain the key Bob will obtain an incorrect value 25% of the time. Alice and
Bob agree after sending all the quantum states to use a number of the states where
the same basis was used to determine if someone tried to obtain the key. If enough
states are used an error rate of larger than (say) 5% may be agreed to indicate the
transmission was potentially influenced by a third party and is insecure. The rest
of the corresponding states are used as the key if the error rate is low enough.
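The following standalone C++ sketch (ours, not the book's code) simulates the sifting
stage of the protocol without an eavesdropper: measurement in the wrong basis is
modelled as a fair coin flip, and only the positions where the bases agree are kept.
The seed value and key length are arbitrary.

// minimal sketch of the quantum key distribution sifting stage
#include <iostream>
#include <cstdlib>
#include <vector>
using namespace std;

int main(void)
{
 srand(12345);
 const int nbits = 16;
 vector<int> abit(nbits), abase(nbits), bbase(nbits), bbit(nbits);

 for(int i=0;i<nbits;i++)
 {
  abit[i]  = rand() % 2;   // Alice's key bit
  abase[i] = rand() % 2;   // Alice's basis (0 = B0, 1 = B1)
  bbase[i] = rand() % 2;   // Bob's basis
  // same basis: Bob reads the bit exactly; different basis: random result
  bbit[i] = (abase[i] == bbase[i]) ? abit[i] : rand() % 2;
 }

 cout << "shared key bits: ";
 for(int i=0;i<nbits;i++)
  if(abase[i] == bbase[i]) cout << bbit[i];   // kept positions
 cout << endl;
 return 0;
}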
Other techniques have been described for quantum key distribution based on Bell's
inequality [61], measurement uncertainty [12, 14, 15], and a distribution scheme
where quantum states are reused [36] using entanglement. These schemes rely on
the fact that the third party (Eve) cannot pretend to be Alice and Bob, i.e. Eve can
only inspect the quantum states which are sent, and cannot receive a state and send
another. If this were not the case Eve could impersonate Bob and Alice without
either party knowing.
The transmitter (Alice) and the receiver (Bob) each have one quantum subsystem
which together form the quantum system of an already prepared EPR state
|φ⟩ := (1/√2)(|00⟩ + |11⟩) ≡ (1/√2)(|0⟩ ⊗ |0⟩ + |1⟩ ⊗ |1⟩).
We let the first system denote Alice's quantum subsystem (qubit) of the EPR state
and the second system denote Bob's quantum subsystem (qubit) of the EPR state.
Alice can transform |φ⟩ to any one of the Bell basis states, for example

( \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} ⊗ I₂ ) |φ⟩ = (1/√2)(|00⟩ − |11⟩) ≡ |Φ⁻⟩.

Alice has two bits representing the values 0, 1, 2 or 3. She transforms |φ⟩ according
to the following table.
The transformations are obviously unitary. Alice then sends her qubit to
Bob. Now Bob applies a controlled NOT using the first (Alice's) qubit as the control
and then applies the Hadamard transform to the first qubit. Finally a controlled
NOT is applied to yield the data. The following table describes the quantum state
after the transformation.
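The following standalone C++ sketch (not from the book) traces dense coding on a
four-dimensional state vector with the basis order |00⟩, |01⟩, |10⟩, |11⟩. The encoding
table used here (I, σ_x, σ_z and σ_xσ_z for the values 0, 1, 2, 3) is one conventional
choice and may differ from the book's table.

// minimal sketch of dense coding
#include <iostream>
#include <cmath>
#include <utility>
using namespace std;

void encode(double v[4], int bits)
{
 if(bits & 1) { swap(v[0],v[2]); swap(v[1],v[3]); }  // sigma_x on qubit 1
 if(bits & 2) { v[2] = -v[2]; v[3] = -v[3]; }        // sigma_z on qubit 1
}

void decode(double v[4])
{
 swap(v[2],v[3]);                      // controlled NOT, first qubit control
 double r = 1.0/sqrt(2.0);
 for(int y=0;y<2;y++)                  // Hadamard on the first qubit
 {
  double a = v[y], b = v[2+y];
  v[y] = r*(a+b); v[2+y] = r*(a-b);
 }
}

int main(void)
{
 double r = 1.0/sqrt(2.0);
 for(int bits=0;bits<4;bits++)
 {
  double v[4] = { r, 0.0, 0.0, r };    // (|00> + |11>)/sqrt2
  encode(v,bits);
  decode(v);
  for(int i=0;i<4;i++)                 // find the resulting basis state
   if(v[i]*v[i] > 0.99)
    cout << "sent " << bits << "  ->  measured |"
         << (i>>1) << (i&1) << ">" << endl;
 }
 return 0;
}

Each pair of classical bits is recovered as a distinct basis state of the two qubits,
although only one qubit was physically transmitted.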
Chapter 22
Quantum Information Theory

22.1 Introduction
The concepts of classical information theory can be extended to quantum information
theory. Since in general a measurement yields a result only with some probability, we
may consider using these probabilities in classical information theory. However the
probabilities do not contain phase information, which cannot be neglected. Thus the
definitions are given in terms of the density operator. These probabilities depend on
the basis used for measurement. A density operator ρ over an n-dimensional Hilbert
space H is a positive operator with unit trace. The trace tr(A) is defined as
tr(A) := Σ_{j=1}^{n} ⟨βⱼ|A|βⱼ⟩

where |βⱼ⟩ for j = 1, …, n is any orthonormal basis in H. Thus tr(ρ) = 1. The
eigenvalues of a density operator are non-negative. By the spectral theorem
every density operator can be represented as a mixture of pure states
ρ = Σ_{j=1}^{n} pⱼ |αⱼ⟩⟨αⱼ|
where |αⱼ⟩ for j = 1, …, n are the orthonormal eigenvectors of ρ (which form a
basis in H), and

pⱼ ∈ ℝ,  pⱼ ≥ 0,  Σ_{j=1}^{n} pⱼ = 1.
where N_s(α, γ, β) is the relative number of experiments where the first particle has
positive spin with the polarizer at angle α and negative with the polarizer at angle β,
and spin s with the polarizer at angle γ. In a local hidden variable theory, these
quantities are available. Since

N(α, γ) = N₊(α, β, γ) + N₋(α, β, γ),
A variant of Bell's inequality derived by Clauser, Horne, Shimony and Holt is given
by [127]

|⟨AB⟩ + ⟨AB′⟩ + ⟨A′B⟩ − ⟨A′B′⟩| ≤ 2.
The operators A and A′ (respectively B and B′) are normalized, noncommuting
and can be measured by an observer. The expectation values can be calculated if
the quantum state is known, and they are also experimentally observable, by repeating
the measurements sufficiently many times with identically prepared initial pairs of
quantum systems. The validity of this inequality for all combinations of independent
measurements on both systems is necessary, although not sufficient, for the existence
of a local hidden variable model.
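For the singlet state the quantum mechanical correlation is ⟨A(α)B(β)⟩ = −cos(α − β).
The following standalone C++ sketch (ours, not the book's code) evaluates the
combination above at the standard angles, where it reaches 2√2 > 2, violating the
inequality.

// minimal sketch: CHSH value for the singlet correlation
#include <iostream>
#include <cmath>
using namespace std;

double E(double alpha, double beta) { return -cos(alpha - beta); }

int main(void)
{
 const double pi = 3.14159265358979323846;
 double a = 0.0, ap = pi/2.0, b = pi/4.0, bp = -pi/4.0;
 double chsh = E(a,b) + E(a,bp) + E(ap,b) - E(ap,bp);
 cout << "|CHSH| = " << fabs(chsh) << endl;   // 2.82843 ~ 2 sqrt(2)
 return 0;
}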
|ψ⟩ := (1/√2)(|001⟩ + |110⟩).

We consider the basis α = {|L⟩, |R⟩} and the basis β = {|H⟩, |V⟩} described by
Expressing only one qubit of |ψ⟩ in the basis α and the rest in the basis β, and
expressing each qubit in the basis α yields
None of the results obtained are consistent with the final equation for |ψ⟩ in the α
basis for all qubits.
where tr_B denotes the partial trace over B, i.e. we use I ⊗ |βⱼ⟩ as the basis for the
trace where |βⱼ⟩ is an orthonormal basis in B. The measure of entanglement E(AB)
is then defined as

E(AB) := S(ρ_A).

This describes the entanglement for pure states.
we have
where

ρ_{A,j} = tr_B(|ψⱼ⟩⟨ψⱼ|).
Example. For the Werner state
it has been shown that E(W) ≈ 0.117 [19]. If we use the definition given for pure
states we obtain 1.
Thus we can define the entanglement of a state as the minimum distance to any
state in the set S [176, 177]. The Hilbert-Schmidt norm is defined as

‖A‖_HS := √tr(A*A)

where the minimum is taken over all states σ ∈ S which are not entangled.
Three processes are used to increase correlations between two quantum subsystems,
i.e. to distill locally a subensemble of highly entangled states from an original
ensemble of less entangled states,
where

ρⱼ = Vⱼ ρ Vⱼ*.
Two results are required before we can prove the quantum noiseless coding theorem.
If the quantum channel C has dimension d and any projection Γ onto a d-dimensional
subspace of M has the property
Since the channel has only dimension d, the final decoded state ω(a)_{M′} is only
supported on a d-dimensional subspace of M′. Let Γ denote the projection onto this
subspace. In other words ω(a)_{M′} results from a unitary transformation of the separable
state ω(a)_C ⊗ 0_{M′−C} where 0_{M′−C} is the initial state introduced once the state
is transmitted. Let ω(a)_{M′,Γ} denote ω(a)_{M′} in the subspace. The d eigenstates of
ω(a)_{M′,Γ} (denoted by |φ(a)₁⟩, …, |φ(a)_d⟩) form an orthonormal basis in this subspace.
Let λ(a)₁, …, λ(a)_d denote the eigenvalues corresponding to these eigenstates. Then
ω(a)_{M′,Γ} can be expressed as

ω(a)_{M′,Γ} = Σ_{k=1}^{d} λ(a)_k |φ(a)_k⟩⟨φ(a)_k|.

Denote by |ψ(a)_k⟩ the state |φ(a)_k⟩ ⊗ |0_{Γ⊥}⟩, which is the state |φ(a)_k⟩ extended in
M′. The projection operator Γ is given by

Γ = Σ_{k=1}^{d} |ψ(a)_k⟩⟨ψ(a)_k|.
Now

tr(|a_M⟩⟨a_M| Σ_{k=1}^{d} λ(a)_k |ψ(a)_k⟩⟨ψ(a)_k|) ≤ tr(|a_M⟩⟨a_M| Σ_{k=1}^{d} |ψ(a)_k⟩⟨ψ(a)_k|)

since each eigenvalue satisfies λ(a)_k ≤ 1.
for some fixed 0 ≤ η ≤ 1, then there exists a transposition scheme with fidelity
F(M, M′) > 1 − 2η.
where
and
Obviously

|γ_a|² + |γ_{a⊥}|² = 1.
Let the dimension of M be N, let |1⟩, …, |d⟩ denote an orthonormal basis in Λ, and
|d + 1⟩, …, |N⟩ denote an orthonormal basis in Λ⊥. Also let |1_C⟩, …, |d_C⟩ be an
orthonormal basis in C and |(d + 1)_E⟩, …, |N_E⟩ be an orthonormal basis in E (the
system representing the information lost during transmission). The states in Λ
will be used for transmission over the channel C. The initial state is prepared as
|a_M⟩ ⊗ |0_C⟩ ⊗ |0_E⟩ where |0_E⟩ is the initial state for the system which represents the
loss of information, and we require ⟨0_E|k_E⟩ = 0 for k = d + 1, …, N. The following
unitary transformation is used to prepare the state for transmission.
γ_a Σ_{k=1}^{d} ⟨k|a⟩ |0⟩ ⊗ |k_C⟩ ⊗ |0_E⟩ + γ_{a⊥} Σ_{k=d+1}^{N} ⟨k|a⊥⟩ |0⟩ ⊗ |0_C⟩ ⊗ |k_E⟩

where we used

|a_C⟩ := Σ_{k=1}^{d} ⟨k|a⟩ |k_C⟩,   |a_E⟩ := Σ_{k=d+1}^{N} ⟨k|a⊥⟩ |k_E⟩.
Σ_a p(a)(1 − |γ_{a⊥}|²)² ≥ 1 − 2 Σ_a p(a)|γ_{a⊥}|²
It is given that

tr(ρΓ) = Σ_a p(a) ⟨a_M|Γ|a_M⟩ > 1 − η.

Thus

Σ_a p(a)|γ_a|² > 1 − η.
where λ_a and |a⟩ are the eigenvalues and orthonormal eigenstates of ρ. The Von
Neumann entropy is then

S(ρ) = −Σ_a λ_a log₂ λ_a.
a
For the density matrix ® N P of N identical and independent systems the eigenval-
ues and orthonormal eigenstates are given by the products of N eigenvalues and
eigenvectors of p.
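The following standalone C++ sketch (not from the book) computes the Von Neumann
entropy from the eigenvalues of a density matrix and illustrates the statement above
for N = 2: the eigenvalues of ρ ⊗ ρ are products of eigenvalues, so the entropy
doubles. The eigenvalues 1/4 and 3/4 are an arbitrary example.

// minimal sketch: Von Neumann entropy from eigenvalues
#include <iostream>
#include <vector>
#include <cmath>
using namespace std;

double S(const vector<double>& lambda)
{
 double s = 0.0;
 for(size_t a=0;a<lambda.size();a++)
  if(lambda[a] > 0.0) s -= lambda[a]*log(lambda[a])/log(2.0);
 return s;
}

int main(void)
{
 vector<double> rho;          // eigenvalues of a qubit density matrix
 rho.push_back(0.25); rho.push_back(0.75);

 vector<double> rho2;         // eigenvalues of rho (x) rho: all products
 for(size_t a=0;a<rho.size();a++)
  for(size_t b=0;b<rho.size();b++) rho2.push_back(rho[a]*rho[b]);

 cout << "S(rho)         = " << S(rho)  << endl;   // ~ 0.811278
 cout << "S(rho (x) rho) = " << S(rho2) << endl;   // twice as large
 return 0;
}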
If we interpret the eigenvalue λ_a as the probability that the eigenstate |a⟩ is transmitted,
then the Von Neumann entropy is the classical Shannon entropy of these
probabilities, and so, following page 206, the number of sequences of N eigenstates
which are likely to be transmitted is bounded above by 2^{N(S(ρ)+δ)} and below by
(1 − ε)2^{N(S(ρ)−δ)}.
1. If the quantum channel C has dimension at least 2^{S(ρ)+δ} then there exists
N₀(δ, ε) such that for all N > N₀ sequences of eigenstates of ρ of length N can
be transmitted via C with fidelity greater than 1 − ε.

tr((⊗_N ρ)Γ) > 1 − ε/2
2. If the quantum channel C has dimension at most 2^{S(ρ)−δ} then there exists
N₀(δ, ε) such that for all N > N₀ sequences of eigenstates of ρ of length N
cannot be transmitted with fidelity greater than ε.

F(M, M′) < ε.
•
where the maximum is over all probability distributions with Σ_{a∈A} p(a) = 1. The
quantity

H(A, C, c) := S(Σ_{a∈A} p(a)c(a)) − Σ_{a∈A} p(a)S(c(a))
where b is the output from the alphabet B. We denote by p(b|a) the probability
that b is the output (where x_b is identified with b) if the input was a. The Shannon
information is given by
Chapter 23
Quantum Error Detection and Correction

23.1 Introduction
We use the state |x⟩ ⊗ |0⟩, where |x⟩ is an encoded quantum state with the necessary
property that it can be used to determine if any error of E₁, …, E_n has occurred,
and the second quantum register will hold the number of the type of error which
occurred. Further let S denote the operator for the error syndrome [138].

(E ⊗ I) |x⟩ ⊗ |0⟩ = Σ_{j=1}^{n} E_j|x⟩ ⊗ |0⟩.

S(E ⊗ I) |x⟩ ⊗ |0⟩ = Σ_{j=1}^{n} E_j|x⟩ ⊗ |j⟩.
Measuring the second register identifies the error. Suppose the measurement
corresponds to |k⟩; then the error is easily repaired since
This illustrates that the additional difficulties in quantum error correction can, to
some degree, be overcome by the properties of quantum mechanics itself. In classical
error correction codes, duplication is used to overcome errors. This simple approach
cannot be directly applied in quantum error correcting codes since this would involve
a violation of the no cloning theorem. In the following section a code is introduced
which involves duplication of certain properties of a state, and does not violate the
no-cloning theorem. These duplications are specific to certain types of errors. The
code words for |0⟩ and |1⟩ must be orthogonal to make sure they are distinguishable.
Further error correcting techniques introduced are fault tolerant error correction
codes [58, 149, 157], which allow for some errors occurring in the error correction
process, and fault tolerant quantum gates [149].
|0⟩ → |000⟩
|1⟩ → |111⟩

Thus the qubit |ψ₀⟩ = α|0⟩ + β|1⟩ is mapped to

|ψ₁⟩ = α|000⟩ + β|111⟩.
Thus a single bit flip error can be corrected by a majority value correction scheme.
First the additional syndrome register must be added. We apply the operator

S₀ = (I ⊗ U_S) U_XOR(1,5) U_XOR(1,6) U_XOR(2,4) U_XOR(2,6) U_XOR(3,4) U_XOR(3,5)

where U_XOR(i,j) denotes the CNOT operator working with the ith qubit as the
control and the jth qubit as the target, and
The bit flip errors are corrected for each of the three 3-qubit registers in the same
way as above. The phase error (i.e. at most one sign change) is dealt with in
exactly the same way using the subspace described by {|000⟩ + |111⟩, |000⟩ − |111⟩}
instead of {|0⟩, |1⟩}. It is important to note that the total phase is ignored. Using
this procedure we can correct both bit flips and sign changes if they occur. The
operators I, σ_x, σ_y and σ_z are described by

σ_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},  σ_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix},  σ_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.
Furthermore the unit matrix I effects no error on a qubit, σ_x effects a bit flip, σ_z
effects a sign change and σ_y effects a bit flip and sign change. All these errors can
be corrected by the nine-qubit code. Thus any linear combination of these errors
can be corrected. Consider the arbitrary phase change
which can also be corrected by this scheme. Thus the scheme can correct any one-qubit
error.
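The bit flip part of the scheme is easy to simulate classically. The following
standalone C++ sketch (ours, not the book's code) encodes α|0⟩ + β|1⟩ with the
three-qubit repetition code, flips one qubit, and applies the majority-vote correction
branch by branch; for a single bit flip this agrees with the syndrome-based correction
described above.

// minimal sketch of the three qubit bit flip code
#include <iostream>
using namespace std;

int main(void)
{
 double alpha = 0.6, beta = 0.8;      // |alpha|^2 + |beta|^2 = 1
 double psi[8] = {0};                 // three qubit register
 psi[0] = alpha; psi[7] = beta;       // alpha|000> + beta|111>

 int flipped = 1;                     // bit flip error on the middle qubit
 double chi[8];
 for(int i=0;i<8;i++) chi[i ^ (1 << flipped)] = psi[i];

 // majority vote: move each nonzero amplitude to |000> or |111>
 double fixed[8] = {0};
 for(int i=0;i<8;i++)
 {
  if(chi[i] == 0.0) continue;
  int ones = ((i>>2)&1) + ((i>>1)&1) + (i&1);
  fixed[(ones >= 2) ? 7 : 0] += chi[i];
 }
 cout << "amplitude of |000> : " << fixed[0] << endl;  // 0.6
 cout << "amplitude of |111> : " << fixed[7] << endl;  // 0.8
 return 0;
}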
This coding is obviously orthogonal and can correct any single bit flip error. But
we require that the coding can correct phase errors as well. This can be done by
noting that a phase change is a bit flip when we apply the Hadamard gate
and use the basis {|0′⟩ = U_H|0⟩, |1′⟩ = U_H|1⟩}. Applying the Hadamard transform
to all seven qubits in the code gives
Thus the code can still correct any single qubit error.
This code still requires seven qubits to encode a single qubit. Thus a 128 qubit
system requires 896 qubits to operate reliably. With such a large number of qubits
it is possible that interactions with the environment involving more than single qubits
become a larger problem. Thus it is desirable to encode a qubit with as few
qubits as possible; in the next section we show that a qubit can be encoded reliably
with fewer than 7 qubits.
Σ_{k=0}^{31} |μ_k|² = Σ_{k=0}^{31} |ν_k|² = 1

Σ_{k=0}^{31} Σ_{l=0}^{31} μ_k ν̄_l ⟨k|E|l⟩ = Σ_{k=0}^{31} Σ_{l=0}^{31} ν_k μ̄_l ⟨k|E|l⟩ = 0,

where μ_k and ν_k are the amplitudes of the encodings for |0⟩ and |1⟩, respectively.
For the code they obtain [108]
where

|b₁⟩ = |000⟩ + |111⟩
|b₂⟩ = |000⟩ − |111⟩
|b₃⟩ = |100⟩ + |011⟩
|b₄⟩ = |100⟩ − |011⟩
|b₅⟩ = |010⟩ + |101⟩
|b₆⟩ = |010⟩ − |101⟩
|b₇⟩ = |110⟩ + |001⟩
|b₈⟩ = |110⟩ − |001⟩.
The code was discovered by assuming that the absolute values of the non-zero
amplitudes are equal and real. Thus a solution is described exclusively by the
signs of the amplitudes. A computer search was used to find the code. A surprising
feature of the scheme is that the error correcting technique is the exact reverse of the
encoding technique [108], i.e. we apply the same transformations but in the reverse
order. The following figure illustrates the encoding process

|qabcd⟩ → |q′a′b′c′d′⟩.

For the decoding we follow the process from right to left giving

|q′a′b′c′d′⟩ → |qabcd⟩.
In the figure the π is a controlled phase change (multiplication with −1); the other
gates have the usual meanings. A filled connection (circle) indicates the operation is
only applied when the corresponding qubit is |1⟩ and an empty connection (circle)
indicates the operation is only applied when the corresponding qubit is |0⟩.

[Figure: encoding network for the five-qubit code, mapping |a⟩|b⟩|q⟩|c⟩|d⟩ to |a′⟩|b′⟩|q′⟩|c′⟩|d′⟩.]

The error syndrome and the result of the error on the state α|0⟩ + β|1⟩ (where
|α|² + |β|² = 1) is listed in Table 23.1. Another 5 qubit code [19, 58], also found by
a computer search, is given by
|0⟩ → |0̄⟩ := ¼(+|00000⟩
+|11000⟩ + |10001⟩ + |00011⟩ + |00110⟩ + |01100⟩
−|10100⟩ − |01001⟩ − |10010⟩ − |00101⟩ − |01010⟩
−|11110⟩ − |11101⟩ − |11011⟩ − |10111⟩ − |01111⟩)

|1⟩ → |1̄⟩ := ¼(+|11111⟩
+|00111⟩ + |01110⟩ + |11100⟩ + |11001⟩ + |10011⟩
−|01011⟩ − |10110⟩ − |01101⟩ − |11010⟩ − |10101⟩
−|00001⟩ − |00010⟩ − |00100⟩ − |01000⟩ − |10000⟩).
We note that

|1̄⟩ = (⊗_{j=1}^{5} U_NOT) |0̄⟩,
Table 23.1: Error Syndrome for the 5 Qubit Error Correction Code
Any two elements in the group either commute or anticommute, i.e. for A, B ∈ P

• if [A, B] = AB − BA ≠ 0 it follows that [A, B]₊ = AB + BA = 0,
• if [A, B]₊ ≠ 0 it follows that [A, B] = 0.

A consequence is that the set
Let |ψ⟩ be a codeword from C_S. We suppose that E, an error, has operated on
|ψ⟩. Let M ∈ S; E and M either commute or anticommute. Suppose E and M
commute; then the state E|ψ⟩ is an eigenstate of M
Chapter 24
Quantum Hardware

24.1 Introduction
• Storage. Qubits must be stored for long enough for a required algorithm
to complete and a result to be obtained. The discovery of quantum error
correcting codes decreases the hardware requirements at the cost of using
more qubits.
• Reliability. Algorithms must run reliably. Fault-tolerant gates and error cor-
recting codes can be used to satisfy the requirement provided that the hard-
ware only introduces errors which can be corrected.
remains coherent for a time comparable to the lifetime of the excited state, with
oscillating relative phase. To measure a qubit, a laser tuned to a transition from
the ground state to a short-lived excited state is used to illuminate the ion. An ion
in the state |0⟩ repeatedly absorbs and reemits the laser light. An ion in the state
|1⟩ will remain dark. Due to Coulomb repulsion, the ions are sufficiently separated
to be addressed by pulsed lasers. A laser tuned to the frequency ω of the transition
focused on the appropriate ion induces Rabi oscillations between |0⟩ and |1⟩. Using
the appropriate laser pulse timing and phase, any one-qubit unitary transformation
can be applied to the ion.
The Coulomb repulsion between ions is used to achieve the interaction between
ions. The mutual Coulomb repulsion between the ions results in a spectrum of
coupled normal modes of vibration for the trapped ions. When the laser is correctly
tuned, then the absorption or emission of a laser photon by a single ion causes a
normal mode involving many ions to recoil coherently. The vibrational mode of
lowest frequency ν is the center-of-mass mode. The ions can be laser cooled to
levels such that each vibrational mode is likely to occupy its quantum-mechanical
ground state. A laser tuned to the frequency ω − ν and applied to an ion for the
time required to rotate |1⟩ to |0⟩ and the center-of-mass oscillation to transition
from its ground state to the first excited state, causes the information of the qubit
to be transferred to the collective state of motion of all the ions. Similarly, the
information can be transferred to another ion while returning the center-of-mass
oscillator to its ground state. Thus two ions can interact, and two qubit operations
can be performed.
of the harmonically trapped ion. Manipulation between the four basis eigenstates
spanning the two qubit register is achieved by applying a pair of off-resonant laser
beams to the ion, which drives stimulated Raman transitions between basis states.
When the difference frequency δ is set near ω₀, transitions are coherently driven
between internal states |S⟩ while preserving |n⟩. For δ ≈ ω₀ − ω_x (respectively
δ ≈ ω₀ + ω_x) transitions are coherently driven between |1⟩ ⊗ |↓⟩ and |0⟩ ⊗ |↑⟩
(respectively |0⟩ ⊗ |↓⟩ and |1⟩ ⊗ |↑⟩).
Briegel et al. [32] propose two other methods to implement a phase shift gate.
The first method involves moving the potentials of the traps towards each other in
a state-dependent way while leaving the shape of the potential unchanged. This
results in two kinds of phase shifts. A single particle kinetic phase shift and an
interaction phase shift due to coherent interactions between two atoms.
The second method involves changing the shape of the potentials with time, de-
pending on the internal states of the particles. The atoms are initially trapped in
two displaced wells. The barrier between the wells is removed (quickly) for atoms in
a state Ib) while atoms in state la) experience no change. The atoms are allowed to
oscillate for some time and then the barrier is raised (again quickly) such that the
atoms are trapped again in their initial positions. The atoms acquire a kinematic
566 Chapter 24. Quantum Hardware
phase due to the oscillations within their respective wells and an interaction phase
due to the collision.
Both methods implement the phase change gate similar to the transform given
above, except for some additional overall phase introduced in the transform.
24.4 Quantum Dots

Lent and Porod [130] suggest creating a cell of five quantum dots in the shape of an
"X" containing two electrons. The two electrons position themselves in two opposite
corner quantum dots due to Coulomb repulsion. There are two such configurations,
which can be identified with 0 and 1.
[Figure 24.1: Two Possible Configurations for Quantum Dot Cells]
The ground state of the system will be an equal superposition of the two basic
configurations with electrons at opposite corners. The quantum dots are labelled
0 for the top left corner, 1 for the top right corner, 2 for the bottom left corner,
3 for the bottom right corner and 4 for the center. With n_i denoting the number
of electrons in quantum dot i,

\sum_{i=0}^{4} n_i = 2.

Thus the cell polarity can be defined, for example, as

P = \frac{(n_0 + n_3) - (n_1 + n_2)}{2}

so that the two diagonal configurations have polarity 1 and -1.
We can identify a polarity of -1 with binary 0 and a polarity of 1 with binary
1, where the polarities are those of the configurations given above. Polarities between
-1 and 1 are interpreted as superposition states; for example, the ground state has
a net polarity of 0. The electrons can move between quantum dots by quantum
tunneling if the dots are close enough. If two cells are to interact, they must be
sufficiently distant that electrons cannot tunnel between them; their interaction
is then due only to the Coulomb interaction between the electrons. Suppose two
adjacent cells are configured as follows.
[Figure: two adjacent quantum dot cells]
Forcing the first cell into the configuration with polarity 1 causes the second cell to
reconfigure to the minimum energy configuration, aligning its polarity with the first.
This allows signals to propagate along a series of cells, forming quantum wires.
Classical operations are also possible using these cells, for example the majority
gate, in which a central cell is surrounded by four outer cells.

[Figure: a majority gate built from five quantum dot cells]
Forcing any three outer cells into given configurations causes the remaining outer cell
to take the majority configuration, since this achieves the lowest energy state. A
majority gate can be used to construct any other classical gate. The OR gate can
be constructed by fixing the polarity of one of the inputs to 1, and the AND gate
by fixing the polarity of one of the inputs to -1. A full adder has been constructed
using these principles.
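A minimal classical C++ sketch of this construction (the function names are our own):

// majority.cpp -- classical gates from the majority function (sketch).
#include <iostream>

// Majority vote of three bits: the configuration the fourth cell settles into.
int majority(int a, int b, int c) { return (a & b) | (b & c) | (a & c); }

// Fixing one input to 1 (polarity +1) yields OR, to 0 (polarity -1) yields AND.
int OR(int a, int b)  { return majority(a, b, 1); }
int AND(int a, int b) { return majority(a, b, 0); }

int main()
{
   for(int a = 0; a <= 1; a++)
      for(int b = 0; b <= 1; b++)
         std::cout << a << " " << b
                   << "  OR=" << OR(a, b)
                   << "  AND=" << AND(a, b) << std::endl;
   return 0;
}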
Another approach using quantum dots, aimed specifically at quantum computing, has
been proposed [34]. This approach does not use cells of quantum dots, but rather
the spin of a single electron confined in a quantum dot. The manipulation
of more than one qubit is required for computation; this is achieved by coupling
quantum dots. Due to the Coulomb interaction and the Pauli exclusion principle,
the ground state for two qubits is an entangled spin state. The system is described
by the Heisenberg Hamiltonian
H_s(t) = J(t) \mathbf{S}_1 \cdot \mathbf{S}_2

where J(t) is the exchange coupling between the spins S_1 and S_2. If the exchange
coupling is pulsed such that

\frac{1}{\hbar} \int_0^T J(t)\, dt = \pi \pmod{2\pi}

then the associated unitary evolution describes the swap operation U_sw of the quantum
states of the two qubits. The XOR operation is obtained as

U_{XOR} = \exp\left(\frac{i\pi}{2} S_1^z\right) \exp\left(-\frac{i\pi}{2} S_2^z\right) U_{sw}^{1/2} \exp\left(i\pi S_1^z\right) U_{sw}^{1/2},

a combination of U_sw^{1/2} and single qubit rotations. The XOR operation combined
with single-qubit rotations forms a universal set of quantum gates, so any quantum
algorithm can be implemented using these operations. With only two quantum dots
a gate operation can be performed using uniform magnetic fields; for more qubits,
local magnetic fields are necessary. This requirement is relaxed by noting that, using
the swap operation, a qubit state can be transferred to a quantum dot where the
operation can take place and then back to its original position without influencing
the other qubit states.
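The pulse condition can be checked numerically. The following sketch (our own, in units ℏ = 1, using the eigenvalues S_1·S_2 = 1/4 on the triplet and -3/4 on the singlet) verifies that a pulse with φ = ∫J(t)dt/ℏ = π implements the swap gate up to an overall phase:

// exchange.cpp -- pulsed exchange coupling yields the swap gate (sketch).
// With phi = (1/hbar) * integral of J(t) dt, the evolution is
//   U(phi) = exp(-i phi S1.S2)
//          = exp(-i phi/4) P_triplet + exp(3 i phi/4) P_singlet.
// For phi = pi this equals exp(-i pi/4) * SWAP.
#include <iostream>
#include <complex>
#include <cmath>

using complexd = std::complex<double>;
const double pi = 3.141592653589793;

int main()
{
   const complexd i(0.0, 1.0);
   // Singlet projector |s><s| with |s> = (|01> - |10>)/sqrt(2),
   // basis ordering |00>, |01>, |10>, |11>.
   double Ps[4][4] = { {0,0,0,0}, {0,0.5,-0.5,0}, {0,-0.5,0.5,0}, {0,0,0,0} };
   double phi = pi;
   complexd U[4][4];
   for(int r = 0; r < 4; r++)
      for(int c = 0; c < 4; c++)
      {
         double Pt = ((r == c) ? 1.0 : 0.0) - Ps[r][c];  // triplet projector
         U[r][c] = std::exp(-i*phi/4.0)*Pt + std::exp(3.0*i*phi/4.0)*Ps[r][c];
      }
   // Remove the global phase exp(-i pi/4) and print: the result is SWAP.
   for(int r = 0; r < 4; r++)
   {
      for(int c = 0; c < 4; c++)
         std::cout << U[r][c]*std::exp(i*phi/4.0) << " ";
      std::cout << std::endl;
   }
   return 0;
}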
24.5 Nuclear Magnetic Resonance Spectroscopy

In nuclear magnetic resonance [50, 51, 103, 132] implementations, the qubit is identified
with a nuclear spin in a molecule. A spin can be aligned (|0⟩) or antialigned
(|1⟩) with an applied magnetic field, giving the basis of computation. The spins
take a long time to relax or decohere. The technique emulates a quantum computation
using a large number of spins. Since the spin-active nuclei in each molecule of a liquid
sample are largely isolated from the spins in all other molecules, each molecule is
effectively an independent quantum computer. The computation is possible due to
the existence of pseudo-pure states, whose transformation properties are identical
to those of true pure states. Results of computations are then determined by, for
example, thermodynamic averaging. The method is chosen to average out unwanted
fluctuating properties so that only underlying coherent properties are measured.
Alternatively, methods such as optical pumping and dynamic nuclear polarization
can be used to cool the system to a ground state. This leads to an ensemble quantum
computer.
Using a pulsed rotating magnetic field with frequency ω determined by the energy
splitting between the spin-up and spin-down states, Rabi oscillations of the spin are
induced. The appropriate timing of the pulses can perform any unitary transform
on a single spin. All spins are exposed to the rotating magnetic field, but only
those on resonance respond. The spins have dipole-dipole interactions which can be
exploited to perform two-qubit operations. The XOR (controlled NOT) operation
has been implemented using Pound-Overhauser double resonance and also using a
spin-coherence double resonance pulse sequence.
Average Hamiltonian theory can be used to implement quantum gates. The evolution
of the state over a time T is expressed in terms of a time independent average
Hamiltonian \bar{H}(T). The total Hamiltonian H_{tot}(t) = H_{int} + H_{ext}(t) is separated
into a time invariant internal Hamiltonian H_{int} and a time dependent external
Hamiltonian H_{ext}(t). The evolution operator is

U(T) = \mathcal{T} \exp\left(-i \int_0^T H_{tot}(t)\, dt\right) = e^{-i\bar{H}T}

where \mathcal{T} is the Dyson time ordering operator. For sufficiently small T the Magnus
expansion can be used to determine \bar{H}. The coupling between qubits is always
active, thus it is useful to have an operation which suppresses the undesirable couplings.
This can be achieved by an experimental method for "tracing out" or averaging out
unwanted degrees of freedom. The CNOT operation can be expressed, for example, as

U_{CNOT} = (I \otimes U_H) \exp\left(\frac{i\pi}{4}\left(I \otimes I - \sigma_z \otimes I - I \otimes \sigma_z + \sigma_z \otimes \sigma_z\right)\right) (I \otimes U_H)

with U_H the Hadamard transform, since the only two-body Hamiltonian available in
liquid state nuclear magnetic resonance spectroscopy is the scalar coupling \sigma_z \otimes \sigma_z.
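A short C++ check of this decomposition (our own sketch; basis ordering |00⟩, |01⟩, |10⟩, |11⟩ with σ_z|0⟩ = |0⟩):

// cnot.cpp -- CNOT from the scalar coupling sigma_z x sigma_z (sketch).
// The diagonal gate D = diag(exp(i pi (1 - z1 - z2 + z1*z2)/4)) is the
// controlled phase gate; conjugating the target qubit with Hadamard
// transforms turns it into CNOT.
#include <iostream>
#include <complex>
#include <cmath>

using complexd = std::complex<double>;
const double pi = 3.141592653589793;

int main()
{
   const complexd i(0.0, 1.0);
   const double s = 1.0/std::sqrt(2.0);
   // I tensor H (Hadamard on the second qubit), basis |00>,|01>,|10>,|11>.
   double IH[4][4] = { {s,s,0,0}, {s,-s,0,0}, {0,0,s,s}, {0,0,s,-s} };
   complexd M[4][4];
   for(int r = 0; r < 4; r++)
      for(int c = 0; c < 4; c++)
      {
         complexd sum = 0.0;
         for(int k = 0; k < 4; k++)
         {
            // z1, z2 are the sigma_z eigenvalues of basis state k.
            double z1 = (k < 2) ? 1.0 : -1.0, z2 = (k % 2 == 0) ? 1.0 : -1.0;
            complexd d = std::exp(i*pi*(1.0 - z1 - z2 + z1*z2)/4.0);
            sum += IH[r][k]*d*IH[k][c];    // (I x H) D (I x H)
         }
         M[r][c] = sum;
      }
   for(int r = 0; r < 4; r++)              // prints the CNOT matrix
   {
      for(int c = 0; c < 4; c++) std::cout << M[r][c] << " ";
      std::cout << std::endl;
   }
   return 0;
}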
Using nuclear magnetic resonance techniques, Deutsch's algorithm [43, 98], Grover's
algorithm [44, 173] and a generalization of Shor's algorithm [174] have been implemented.
Maximally entangled states have also been prepared using this technique.
Chapter 25
Internet Resources
In the following we give a collection of web sites which provide information about
quantum computing. The web sites provide tutorials, information on experimental
implementations and electronic versions of papers.
http://issc.rau.ac.za
The web site for the International School for Scientific Computing. The school
offers courses in scientific computing including a course on classical and quantum
computing.
http://xxx.lanl.gov
The web site of the Los Alamos National Laboratory pre-print archive. The
site provides access to pre-prints in the fields of physics, mathematics, nonlinear
sciences and computer science. A search engine is also provided.
http://www.qubit.org
The Centre for Quantum Computation, part of the University of Oxford, conducts
theoretical and experimental research into all aspects of quantum information
processing, and into the implications of the quantum theory of computation for
physics itself.
http://www.theory.caltech.edu/~preskill/ph229
Quantum Information and Computation course notes. Overview of classical
complexity theory, quantum complexity, efficient quantum algorithms, quantum
error-correcting codes, fault-tolerant quantum computation, physical implemen-
tations of quantum computation.
http://www.openqubit.org
A quantum computation simulation project on Intel based architectures. The
project goal is to develop a system for describing and testing quantum computing
algorithms.
http://squint.stanford.edu/
A collaboration between researchers at Stanford University and U.C. Berkeley,
involving the experimental and theoretical study of quantum-mechanical systems,
and how they can be utilized to process and store information.
http://qso.lanl.gov/qc/
An overview of the work done at Los Alamos on quantum computation and
cryptography is provided. A number of papers are also provided in electronic
form.
http://theory.caltech.edu/~quic/
Quantum Information and Computation (QUIC). A collaboration of groups at
MIT, Caltech and USC investigating experimental, theoretical and modelling
aspects of quantum computation.
http://www.research.ibm.com/quantuminfo/
Quantum Information and Information Physics at IBM Research Yorktown. The
group's main work is in quantum information and computation theory, but
they also study other aspects of the relation between physics and information
processing.
http://www.iro.umontreal.ca/labs/theorique/index.html.en
Laboratory for Theoretical and Quantum Computing of the Computer Science
Department of the University of Montreal. Includes a bibliography of quantum
cryptography.
http://www.fysel.ntnu.no/Optics/qcr/
Quantum cryptography in Norway. As the first large task of the project, they are
building a demonstrator of a point-to-point quantum key distribution channel.
Such channels have already been built and tested by other research groups; the
basic principles are well known, but the remaining challenge is approaching
practical applications, and they are working in this direction.
http://www.nd.edu/~qcahome/
Quantum-dot Cellular Automata. A web site exploring the possibilities of using
quantum dots to form quantum wires and to construct gates. The web site
provides tutorials, simulations and electronic versions of some papers on the subject.
Bibliography
[1] Adami C. and N. J. Cerf, "What Information Theory Can Tell Us About Quantum Reality",
http://xxx.lanl.gov, quant-ph/9806047
[2] Ammeraal L., STL for C++ Programmers, John Wiley, Chichester, 1997.
[3] Ash R. B., Information Theory, Dover Publications, New York, 1990.
[4] Bac Fam Quang and Perov V. L., "New evolutionary genetic algorithms for
NP-complete combinatorial problems", Biological Cybernetics 69, 229-234,
1993.
[5] Balakrishnan A. V., Applied Functional Analysis, Second Edition, Springer-Verlag, New York, 1981.
[6] Barenco A., "A Universal Two-Bit Gate for Quantum Computation",
http://xxx.lanl.gov, quant-ph/9505016
[7] Barenco A. et al., "Elementary gates for quantum computation" ,
http://xxx.lanl.gov, quant-ph/9503016
[8] Barenco A., "Quantum Physics and Computers", Contemporary Physics 37,
375-389, 1996.
[9] Bell J. S., Speakable and unspeakable in quantum mechanics, Cambridge Uni-
versity Press, Cambridge, 1989.
[10] Ben-Ari M., Mathematical Logic for Computer Science, Prentice Hall, New
York, 1993.
[11] Benioff P., "Models of Quantum Turing Machines",
http://xxx.lanl.gov, quant-ph/9708054
[12] Bennett C. H. and G. Brassard, "Quantum cryptography: Public-key distri-
bution and coin tossing", Proceedings of IEEE International Conference on
Computers, Systems and Signal Processing, Bangalore, India, 175-179 (1984).
[15] Bennett C. H., "Quantum cryptography using any two nonorthogonal states",
Phys. Rev. Lett. 68, 3121-3124 (1992).
[16] Bennett C. H. and S.J. Wiesner, "Communication via one- and two-particle
operations on Einstein-Podolsky-Rosen states", Phys. Rev. Lett. 69, 2881-2884
(1992).
[17] Bennett C. H., G. Brassard, C. Crépeau, R. Jozsa, A. Peres and W. K.
Wootters, "Teleporting an Unknown Quantum State via Dual Classical and
Einstein-Podolsky-Rosen Channels", Phys. Rev. Lett. 70, 1895-1899 (1993).
[28] Brassard G., Braunstein S. L. and R. Cleve, Physica D 120, 43-47 (1998).
[45] Cichocki A. and Unbehauen R., Neural Networks for Optimization and Signal
Processing, John Wiley, Chichester, 1993.
[46] Cirac J. I. and P. Zoller, "Quantum Computations with Cold Trapped Ions",
Physical Review Letters 74, 4091-4094, 1995.
[49] Cohen D. I. A., Introduction to Computer Theory, Revised Edition, Wiley, New
York, 1991.
[50] Cory D. G., M. D. Price and T. F. Havel, "Nuclear Magnetic Resonance Spec-
troscopy: An Experimentally Accessible Paradigm for Quantum Computing" ,
http://xxx.lanl.gov, quant-ph/9709001
[51] Cory D. G. et al., "NMR Based Quantum Information Processing: Achieve-
ments and Prospects" ,
http://xxx.lanl.gov, quant-ph/0004104
[52] Cybenko G., "Approximation by superpositions of a sigmoidal function", Mathematics
of Control, Signals and Systems 2, 303-314, 1989.
[61] Ekert A., "Quantum cryptography based on Bell's theorem", Phys. Rev. Lett.
67, 661-663 (1991).
[62] Elby A. and J. Bub, "Triorthogonal uniqueness theorem and its relevance to
the interpretation of quantum mechanics", Physical Review A 49, 4213-4216,
1994.
[63] Epstein R. L. and Carnielli W. A., Computability, Wadsworth & Brooks/Cole,
Pacific Grove, California (1989).
[64] Everett III H., "Relative state formulation of quantum mechanics", Reviews of
Modern Physics 29, 454-462, 1957.
[65] Fausett L., Fundamentals of Neural Networks: Architecture, Algorithms and
Applications, Prentice Hall, Englewood Cliffs, N. J., 1994.
[66] Ferreira C., "Gene Expression Programming: a New Adaptive Algorithm for
Solving Problems",
http://xxx.lanl.gov, cs.AI/0102027
[67] Feynman R. P., A. J. G. Hey (Editor) and R. W. Allen (Editor), Feynman
Lectures on Computation, Perseus Books, 1996.
[68] Feynman R. P., R. B. Leighton and M. Sands, The Feynman Lectures on
Physics Volume III, Addison-Wesley, Reading, MA, 1966.
[69] Funahashi K.-I., "On the approximate realization of continuous mappings by
neural networks", Neural Networks 2, 183-192, 1989.
[70] Gardiner S. A., J. I. Cirac and P. Zoller, "Measurement of Arbitrary Observables
of a Trapped Ion",
http://xxx.lanl.gov, quant-ph/9606026
[71] Glimm J. and A. Jaffe, Quantum Physics, Springer-Verlag, New York, 1981.
[72J Goldberg D. E., Genetic Algorithms in Search, Optimization and Machine
Learning, Addison-Wesley, Reading, MA, 1989.
[73] Goldberg D. E. and R. Lingle, "Alleles, Loci, and the TSP", in Greffenstette,
J. J. (Editor), Proceedings of the First International Conference on Genetic
Algorithms, Lawrence Erlbaum Associates, Hillsdale, NJ, 1985.
[74] Gottesman D., "A Class of Quantum Error-Correcting Codes Saturating the
Quantum Hamming Bound" ,
http://xxx.lanl.gov, quant-ph/9604038
[75] Gottesman D., "An Introduction to Quantum Error Correction",
http://xxx.lanl.gov, quant-ph/0004072
[76J Grassberger P., Int. Journ. Theor. Phys. 25, 907, 1986.
[77] Grassmann W. K. and J.-P. Tremblay, Logic and Discrete Mathematics: A
Computer Science Perspective, Prentice Hall, New Jersey, 1996.
[81] Hardy Y., W.-H. Steeb and R. Stoop, "Jacobi Elliptic Functions, Nonlinear
Evolution Equations and Recursion", International Journal of Modern Physics
C 11, 27-31, 2000.
[82] Hassoun M. H., Fundamentals of Artificial Neural Networks, The MIT Press,
Cambridge Massachusetts, 1995.
[83] Haykin S., Neural Networks, Macmillan College Publishing Company, New
York, 1994.
[84] Healey R., The philosophy of quantum mechanics, Cambridge University Press,
Cambridge, 1990.
[85] Hebb D.O., The Organization of Behaviour, John Wiley, New York, 1949.
[86] Holevo A. S., "The Capacity of the Quantum Channel with General Signal States",
http://xxx.lanl.gov, quant-ph/9611023
[87] Holevo A. S., "Coding Theorems for Quantum Communication Channels",
http://xxx.lanl.gov, quant-ph/9708046
[88] Holevo A. S., "Coding Theorems for Quantum Channels",
http://xxx.lanl.gov, quant-ph/9809023
[89] Holland J. H., Adaptation in Natural and Artificial Systems, University of
Michigan Press, Ann Arbor, 1975.
[90] Horodecki M., P. Horodecki and R. Horodecki, "Separability of mixed states:
necessary and sufficient conditions",
http://xxx.lanl.gov, quant-ph/9605038
[91] Horodecki P., M. Lewenstein, G. Vidal and I. Cirac, "Operational criterion
and constructive checks for the separability of low rank density matrices",
http://xxx.lanl.gov, quant-ph/0002089
[92] Hornik K., M. Stinchcombe and H. White, "Multilayer feedforward networks
are universal approximators", Neural Networks 2, 359-366, 1989.
[93] Høyer P., "Conjugated Operators in Quantum Algorithms",
ftp://ftp.imada.sdu.dk/pub/papers/pp-1997/34.ps.gz
[95] Ivanyos G., F. Magniez and M. Santha, "Efficient quantum algorithms for
some instances of the non-Abelian hidden subgroup problem",
http://xxx.lanl.gov, quant-ph/0102014
[96] Jianwei Pan and A. Zeilinger, Phys. Rev. A 57, 2208-2212 (1998).
[97] Jones N. D., Computability Theory: An Introduction, Academic Press, New
York, 1973.
[100] Jozsa R., "Quantum factoring, discrete logarithms and the hidden subgroup
problem",
http://xxx.lanl.gov, quant-ph/0012084
[101] Kieu T. D. and M. Danos, "The halting problem for universal quantum com-
puters" ,
http://xxx.lanl.gov, quant-ph/9811001
[103] Knill E., I. Chuang and R. Laflamme, "Effective Pure States for Bulk Quantum
Computation" ,
http://xxx.lanl.gov, quant-ph/9706053
[108] Laflamme R., C. Miquel, J. P. Paz and W. H. Zurek, "Perfect Quantum Error
Correction Code" ,
http://xxx.lanl.gov, quant-ph/9602019
[109] Lempel A. and J. Ziv, "On the Complexity of Finite Sequences", IEEE Trans-
actions on Information Theory 22, 75-81, 1976.
[110] Linden N. and S. Popescu, "The Halting Problem for Quantum Computers",
http://xxx.lanl.gov, quant-ph/9806054
[111] Lloyd S. and H. Pagels, "Complexity as thermodynamic depth", Ann. Phys.
188, 186-213, 1988.
[112] Lloyd S. and Braunstein S. L., "Quantum Computation over Continuous Variables",
Physical Review Letters 82, 1784-1787, 1999.
[113] Lopez-Ruiz R., Mancini H. L. and X. Calbet, "A statistical measure of com-
plexity", Phys. Lett. A 209, 321-326, 1995.
[114] Lovász L., Computational Complexity,
http://zoo.cs.yale.edu/classes/cs460/Spring98/complex.ps
[115] Mallozzi J. S. and N. J. De Lillo, Computability with Pascal, Prentice Hall,
New Jersey, 1984.
[116] Michalewicz Z., Genetic Algorithms + Data Structures = Evolution Programs,
Third Edition, Springer-Verlag, Berlin, 1996.
[117] Minsky M. L., Computation: Finite and Infinite Machines, Prentice Hall, New
York, 1967.
[118] Miquel C., J. P. Paz and R. Perazzo, "Factoring in a dissipative quantum
computer" , Physical Review A 54, 2605-2613, 1996.
[119] Moore C. and J. P. Crutchfield, "Quantum Automata",
http://xxx.lanl.gov, quant-ph/9707031
[120] Monroe C., D. M. Meekhof, B. E. King, W. M. Itano and D. J. Wineland,
"Demonstration of a Fundamental Quantum Logic Gate", Physical Review
Letters 75, 4714-4717, 1995.
[121] Mosca M. and A. Ekert, "The Hidden Subgroup Problem and Eigenvalue
Estimation on a Quantum Computer" ,
http://xxx.lanl.gov, quant-ph/9903071
[122] Mozyrsky D., V. Privman and M. Hillery, "A Hamiltonian for quantum copying",
Physics Letters A 226, 253-256, 1997.
[123] Nielsen M. A. and I. L. Chuang, "Programmable Quantum Gate Arrays",
Physical Review Letters 79, 321-324, 1997.
[124] Ömer B., http://tph.tuwien.ac.at/~oemer
[125] Ozawa M., "Quantum Turing machines: Local transition, preparation, mea-
surement and halting" ,
http://xxx.lanl.gov, quant-ph/9809038
[126] Ozawa M., "Entanglement measures and the Hilbert-Schmidt distance",
http://xxx.lanl.gov, quant-ph/0002036
[151] Skahill K., VHDL for Programmable Logic, Addison-Wesley, Reading Mas-
sachusetts, 1996.
[152] Stakgold I., Boundary Value Problems of Mathematical Physics, Volume I,
MacMillan, New York, 1967.
[153] Stallings W., Computer Organization and Architecture: Designing for Performance,
Fourth Edition, Prentice Hall, 1996.
[154] Steane A., "Multiple-Particle Interference and Quantum Error Correction",
http://xxx.lanl.gov, quant-ph/9601029
[155] Steane A., "The Ion Trap Quantum Information Processor",
http://xxx.lanl.gov, quant-ph/9608011
[156] Steane A., "Quantum computing",
http://xxx.lanl.gov, quant-ph/9708022
[157] Steane A., "Efficient fault-tolerant quantum computing",
http://xxx.lanl.gov, quant-ph/9809054
[158] Steane A. and D. M. Lucas, "Quantum computing with trapped ions, atoms
and light",
http://xxx.lanl.gov, quant-ph/0004053
[159] Steeb W.-H., "Bose-Fermi Systems and Computer Algebra", Found. Phys.
Lett. 8 73-82, 1995.
[160] Steeb W.-H., Problems and Solutions in Theoretical and Mathematical Physics,
Volume I, World Scientific, Singapore, 1996.
[161] Steeb W.-H. and F. Solms, "Complexity, chaos and one-dimensional maps",
South African Journal of Science 92, 353-354, 1996.
[162] Steeb W.-H., Matrix Calculus and Kronecker Product with Applications and
C++ Programs, World Scientific, Singapore, 1997.
[163] Steeb W.-H., Hilbert Spaces, Wavelets, Generalized Functions and Modern
Quantum Mechanics, Kluwer Academic Publishers, Dordrecht, 1998.
[164] Steeb W.-H., The Nonlinear Workbook, World Scientific, Singapore, 1999.
[165] Steeb W.-H. and Y. Hardy, "Entangled Quantum States and a C++ Imple-
mentation", International Journal of Modern Physics C 11, 69-77, 2000.
[166] Steeb W.-H. and Y. Hardy, "Quantum Computing and SymbolicC++ Simu-
lations", International Journal of Modern Physics C 11, 323-334, 2000.
[167] Steeb W.-H. and Y. Hardy, "Entangled Quantum States", International Journal
of Theoretical Physics 39, 2765, 2000.
[168] Suzuki J., "A Markov Chain Analysis on Simple Genetic Algorithms", IEEE
Transactions on Systems, Man and Cybernetics 25, 655-659, 1995.
[169] Tan K. S., W.-H. Steeb and Y. Hardy, SymbolicC++ (2nd extended and revised
edition), Springer Verlag, London, 2000.
[171] van der Lubbe J. C. A., Basic Methods of Cryptography, Cambridge University
Press, Cambridge, 1998.
[172] Valafar H., Distributed Global Optimization and Its Applications, Ph.D. Thesis,
Purdue University, 1995.
[186] Zukowski M., "Violations of Local Realism in the Innsbruck GHZ experiment",
http://xxx.lanl.gov, quant-ph/9811013
Index