

Yorick Hardy
Willi-Hans Steeb

Classical and
Quantum Computing
with C++ and Java Simulations

Springer Basel AG
Authors:

Yorick Hardy and Willi-Hans Steeb


International School for Scientific Computing
Rand Afrikaans University
P.O. Box 524
Auckland Park 2006
South Africa

2000 Mathematics Subject Classification: 68Q01, 81P68

A CIP catalogue record for this book is available from the
Library of Congress, Washington D.C., USA

Deutsche Bibliothek Cataloging-in-Publication Data


Hardy, Yorick:
Classical and quantum computing with C++ and Java simulations /
Yorick Hardy ; Willi-Hans Steeb. - Basel ; Boston ; Berlin : Birkhäuser,
2001
ISBN 978-3-7643-6610-0 ISBN 978-3-0348-8366-5 (eBook)
DOI 10.1007/978-3-0348-8366-5

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in other ways, and storage in data banks. For any kind of use permission of the copyright owner must be obtained.

© 2001 Springer Basel AG


Originally published by Birkhäuser Verlag in 2001
Cover design: Micha Lotrovsky, 4106 Therwil, Switzerland
Printed on acid-free paper produced from chlorine-free pulp. TCF ∞

9 8 7 6 5 4 3 2 1
www.birkhauser-science.com


Contents

List of Tables xiii
List of Figures xv
List of Symbols xix
Preface xxi

I Classical Computing

1 Algorithms
1.1 Algorithms 3
1.2 Algorithm Verification 6
1.3 Random Algorithms 10
1.4 Total and Partial Functions 15
1.5 Alphabets and Words 18

2 Boolean Algebra
2.1 Introduction 23
2.2 Definitions 24
2.3 Rules and Laws of Boolean Algebra 26
2.4 DeMorgan's Theorem 27
2.5 Further Definitions 27
2.6 Boolean Function Implementation 32
2.6.1 Karnaugh Maps 35
2.6.2 Quine-McKluskey Method 38
2.7 Example Programs 41
2.7.1 Efficient Set Operations Using Boolean Algebra 41
2.7.2 Quine-McKluskey Implementation 46

3 Number Representation
3.1 Binary, Decimal and Hexadecimal Numbers 51
3.1.1 Conversion 53
3.1.2 Arithmetic 58
3.1.3 Signed Integers 60
3.1.4 Overflow 67
3.1.5 Binary-Coded Decimal Form 70
3.2 Floating Point Representation 72
3.2.1 Introduction 72
3.2.2 Representation 74

4 Logic Gates
4.1 Introduction 79
4.2 Gates 80
4.2.1 AND Gate 80
4.2.2 OR Gate 81
4.2.3 XOR Gate 82
4.2.4 NOT Gate (Inverter) 83
4.2.5 NAND Gate 84
4.2.6 NOR Gate 85
4.2.7 XNOR Gate 86
4.3 Buffer 87
4.4 Tri-State Logic 88
4.5 Feedback and Gates 89

5 Combinational Circuits
5.1 Introduction 91
5.2 Decoder 92
5.3 Encoder 93
5.4 Demultiplexer 96
5.5 Multiplexer 97
5.6 Binary Adder 98
5.6.1 Binary Half Adder 98
5.6.2 Binary Full Adder 99
5.6.3 Binary Four-Bit Adder 100
5.6.4 Faster Addition 101
5.7 Binary Subtraction 102
5.8 Binary Multiplication 103
5.8.1 Unsigned Integer Multiplication 103
5.8.2 Fast Multiplication 105
5.8.3 Signed Integer Multiplication 106
5.9 Binary Division 107
5.10 Magnitude Comparator 108
5.11 4-Bit ALU 110
5.12 Read Only Memory (ROM) 112
5.13 Combinational Programmable Logic Devices 113
5.14 Programmable Gate Arrays 117
5.15 VHDL 118

6 Latches and Registers
6.1 Introduction 119
6.2 SR Latch 120
6.3 D Latch 121
6.4 JK Latch 122
6.5 D Register 123
6.6 JK Register 124

7 Synchronous Circuits
7.1 Introduction 125
7.2 Shift Registers 127
7.3 Binary Counter 129
7.4 Example Program 133

8 Recursion
8.1 Introduction 135
8.2 Example Programs 140
8.3 Mutual Recursion 152
8.4 Wavelets and Recursion 156
8.5 Primitive Recursive Functions 162
8.6 Backtracking 165
8.7 Stacks and Recursion Mechanisms 168
8.7.1 Recursion Using Stacks 168
8.7.2 Stack Free Recursion 169

9 Abstract Data Types
9.1 Introduction 171
9.2 Linked List 172
9.3 Stack 187
9.4 Tree 190

10 Error Detection and Correction
10.1 Introduction 197
10.2 Parity Function 198
10.3 Hamming Codes 199
10.4 Weighted Checksum 204
10.5 Noiseless Coding Theorem 205
10.6 Example Programs 208

11 Cryptography
11.1 Introduction 215
11.2 Classical Cypher Systems 216
11.3 Public Key Cryptography 221

12 Finite State Machines
12.1 Introduction 229
12.2 Finite Automata 230
12.3 Finite Automata with Output 233
12.4 Turing Machines 238
12.5 Example Programs 244

13 Computability and Complexity
13.1 Introduction 251
13.2 Computability 252
13.2.1 Church's Thesis 252
13.2.2 The Halting Problem 253
13.3 Gödel's Incompleteness Theorem 254
13.3.1 Gödel Numbering 254
13.3.2 Gödel's Incompleteness Theorem 256
13.4 Complexity 256
13.4.1 Complexity of Bit Strings 256
13.4.2 NP-class of Problems 259

14 Neural Networks
14.1 Introduction 261
14.2 Hyperplanes 266
14.3 Perceptron 268
14.3.1 Introduction 268
14.3.2 Boolean Functions 272
14.3.3 Perceptron Learning 275
14.3.4 Quadratic Threshold Gates 279
14.3.5 One and Two Layered Networks 282
14.3.6 Perceptron Learning Algorithm 283
14.3.7 The XOR Problem and Two-Layered Networks 289
14.4 Multilayer Perceptrons 294
14.4.1 Introduction 294
14.4.2 Cybenko's Theorem 295
14.4.3 Back-Propagation Algorithm 296

15 Genetic Algorithms
15.1 Introduction 313
15.2 The Sequential Genetic Algorithm 315
15.3 Gray Code 320
15.4 Schemata Theorem 323
15.5 Markov Chain Analysis 326
15.6 Bit Set Classes in C++ and Java 328
15.7 A Bit Vector Class 333
15.8 Maximum of One-Dimensional Maps 337
15.9 Maximum of Two-Dimensional Maps 346
15.10 The Four Colour Problem 356
15.11 Problems with Constraints 360
15.11.1 Introduction 360
15.11.2 Knapsack Problem 362
15.11.3 Traveling Salesman Problem 368
15.12 Other Applications for Genetic Algorithms 380
15.13 Distributed Global Optimization 381
15.14 Genetic Programming 384
15.15 Gene Expression Programming 392

II Quantum Computing

16 Quantum Mechanics
16.1 Hilbert Spaces 403
16.2 Linear Operators in Hilbert Spaces 417
16.3 Schmidt Decomposition 431
16.4 Spin Matrices and Kronecker Product 434
16.5 Postulates of Quantum Mechanics 442

17 Quantum Bits and Quantum Computation
17.1 Introduction 451
17.2 Quantum Bits and Quantum Registers 452
17.2.1 Quantum Bits 452
17.2.2 Quantum Registers 453
17.3 Entangled States 455
17.4 Quantum Gates 463
17.4.1 Introduction 463
17.4.2 NOT Gate 464
17.4.3 Walsh-Hadamard Gate 465
17.4.4 XOR and the Controlled NOT Gate 467
17.4.5 Other Quantum Gates 468
17.4.6 Universal Sets of Quantum Gates 471
17.4.7 Functions 472
17.5 Garbage Disposal 476
17.6 Quantum Copying 477
17.7 Example Programs 480

18 Measurement and Quantum States
18.1 Introduction 491
18.2 Measurement Problem 492
18.3 Copenhagen Interpretation 493
18.4 Hidden Variable Theories 495
18.5 Everett Interpretation 496
18.6 Basis Degeneracy Problem 498
18.7 Information Theoretic Viewpoint 500

19 Quantum State Machines
19.1 Introduction 501
19.2 Quantum Automata 501
19.3 Quantum Turing Machines 504

20 Teleportation
20.1 Introduction 507
20.2 Teleportation Algorithm 508
20.3 Example Program 511

21 Quantum Algorithms
21.1 Deutsch's Problem 515
21.2 Simon's Problem 519
21.3 Quantum Fourier Transform 522
21.4 Factoring (Shor's Algorithm) 524
21.5 The Hidden Subgroup Problem 528
21.6 Unstructured Search (Grover's Algorithm) 530
21.7 Quantum Key Distribution 537
21.8 Dense Coding 539

22 Quantum Information Theory
22.1 Introduction 541
22.2 Von Neumann Entropy 542
22.3 Measures of Entanglement 543
22.3.1 Bell's Inequality 543
22.3.2 Entanglement of Formation 545
22.3.3 Conditions on Entanglement Measures 546
22.4 Quantum Coding 548
22.5 Holevo Bound 554

23 Quantum Error Detection and Correction
23.1 Introduction 555
23.2 The Nine-qubit Code 556
23.3 The Seven-qubit Code 558
23.4 Efficiency and the Five-qubit Code 559
23.5 Stabilizer Codes 561

24 Quantum Hardware
24.1 Introduction 563
24.2 Trapped Ions 564
24.3 Cavity Quantum Electrodynamics 565
24.4 Quantum Dots 566
24.5 Nuclear Magnetic Resonance Spectroscopy 569

25 Internet Resources 571

Bibliography 573
Index 585
List of Tables

2.1 AND, OR and Complement 27
2.2 Parity Function 29
2.3 XOR Truth Table 29
2.4 Full Adder 34
2.5 4-bit Decimal Incrementer 37
2.6 Two's Complement Operation on 2 Bits 39

4.1 Function Table and Truth Tables for a Logic Circuit 79
4.2 Truth Table for the AND Gate 80
4.3 Truth Table for the OR Gate 81
4.4 Truth Table for the XOR Gate 82
4.5 Truth Table for the NOT Gate 83
4.6 Truth Table for the NAND Gate 84
4.7 Truth Table for the NOR Gate 85
4.8 Truth Table for the XNOR Gate 86
4.9 Truth Table for the Buffer 87

5.1 Truth Table for the CMOS 4532 95
5.2 Truth Table for CMOS 4555 96
5.3 Half Adder Truth Table 98
5.4 Full Adder Truth Table 99
5.5 Truth Table for the CMOS 4585 109
5.6 Function Table for CMOS 74LV688 109

6.1 Characteristic Table for the SR Latch 120
6.2 Characteristic Table for the D Latch 121
6.3 Characteristic Table for the JK Latch 122

7.1 Counting Sequence 129

8.1 Coefficients for Three Wavelet Functions 158

12.1 Parity Check Finite Automaton - Transitions 231
12.2 Hamming Code Finite Automaton - Transitions 232
12.3 Moore Machine for the NOT Operation - Transitions 234
12.4 n-bit Incrementer Moore Machine - Transitions 235
12.5 Mealy Machine for the NOT Operation - Transitions 236
12.6 n-bit Incrementer Mealy Machine - Transitions 237
12.7 Parity Check Turing Machine - Transitions 239
12.8 Parity Calculation Turing Machine - Transitions 240
12.9 Turing Machine for the NOT Operation - Transitions 241
12.10 Bit Reversal Turing Machine - Transitions 242

14.1 Function Table for the Boolean Function (x_1 · x_2) + (x_2 · x_3) 271
14.2 Training Set for Parity Function 301

15.1 3 Bit Binary Gray Code 320

21.1 Dense Coding: Alice's Transformations 540
21.2 Dense Coding: Bob's Transformations 540

23.1 Error Syndrome for the 5 Qubit Error Correction Code 561
List of Figures

4.1 Symbol for 2-input AND Gate 80
4.2 Symbol for 2-input OR Gate 81
4.3 Symbol for 2-input XOR Gate 82
4.4 Symbol for the NOT Gate 83
4.5 Symbol for 2-input NAND Gate 84
4.6 XOR Implemented With NAND Gates 84
4.7 Symbol for 2-input NOR Gate 85
4.8 XOR Implemented With NOR Gates 85
4.9 Symbol for 2-input XNOR Gate 86
4.10 Symbol for the Buffer 87
4.11 (a) A tri-state inverter with an enable line, (b) a tri-state buffer with a disable line 88
4.12 NAND Gate With Feedback 89

5.1 Truth Table and Circuit of a 1-out-of-4 Decoder 92
5.2 Typical IC Encoder 93
5.3 Circuit for the CMOS 4532 94
5.4 Demultiplexer Circuit 96
5.5 Multiplexer Circuit 97
5.6 Half Adder Circuit 98
5.7 Full Adder Circuit 99
5.8 Two Full Adders in Parallel 100
5.9 Four Bit Adder Consisting of Four Adders 100
5.10 Circuit for the Carry Bit of a 3-bit Adder 101
5.11 Binary Subtraction Using the Two's Complement 102
5.12 Unsigned 4-bit Multiplication 104
5.13 2-bit Fast Unsigned Multiplication 105
5.14 Logic Diagram for the CMOS 74LV688 109
5.15 Logic Diagram for a ROM 112
5.16 Input Representation for Programmable Gates 113
5.17 PROM Device Architecture 114
5.18 PROM Implementation of XOR 114
5.19 PAL Device Architecture 115
5.20 PAL Implementation of XOR 115
5.21 PLA Device Architecture 116
5.22 PLA Implementation of XOR 116
5.23 Example of a Combinational FPGA Cell 117
5.24 Grid Pattern for PGA Design 117

6.1 Logic Diagram for the SR Latch 120
6.2 Logic Diagram for the D Latch 121
6.3 Logic Diagram for the D Latch with Enable 121
6.4 Logic Diagram for the JK Latch 122
6.5 D Register Using Two D Latches 123
6.6 Logic Diagram for the D Register 123
6.7 JK Register Using Two JK Latches 124
6.8 Logic Diagram for the JK Register 124

7.1 Example Clock Signal 125
7.2 Level Sensitive Latch 126
7.3 Edge Triggered Latch 126
7.4 Types of Shift Registers 127
7.5 Logic Diagram of a 4-bit Serial-Load Shift-Right Register 128
7.6 Four-Bit Binary Counter 129
7.7 Counter Waveforms Showing Frequency Division 130
7.8 Representation of Two Types of Up/Down Counters 131
7.9 2-bit Binary Ripple Counter 131
7.10 2-bit Binary Parallel Counter 131
7.11 Synchronous Circuit Specified in VHDL Program 132

8.1 First 3 Steps in the Construction of the Hilbert Curve 149
8.2 A Solution to the 8-Queens Problem 165

9.1 Diagrammatic Representation of a Linked List 172
9.2 Diagrammatic Representation of a Stack 187
9.3 Diagrammatic Representation of a Binary Tree 190

12.1 Parity Check Finite Automaton 231
12.2 Hamming Code Finite Automaton 232
12.3 Moore Machine for the NOT Operation 234
12.4 n-bit Incrementer Moore Machine 235
12.5 Mealy Machine for the NOT Operation 236
12.6 n-bit Incrementer Mealy Machine 237
12.7 Parity Check Turing Machine 239
12.8 Parity Calculation Turing Machine 240
12.9 Turing Machine for the NOT Operation 241
12.10 Bit Reversal Turing Machine 243

13.1 Universal Turing Machine 253

14.1 XOR Implementation Using NAND Operations 273
14.2 Quadratic Threshold Gate for XOR 280
14.3 A Three-layered Network for the Computation of XOR 289

15.1 A Map for the Four Colour Problem 356

17.1 NOT Gate 465
17.2 Walsh-Hadamard Gate 466
17.3 XOR Gate 468
17.4 Phase Shift Gate 469
17.5 Toffoli Gate 470
17.6 Quantum Circuit to Generate Bell States 483
17.7 Quantum Circuit to Swap a Pair of Bits 483

20.1 Teleportation 507
20.2 Experimental Realization of Teleportation 509
20.3 Quantum Circuit for Teleportation 510

21.1 Network Representation to Solve Deutsch's Problem 518
21.2 Network for the Quantum Fourier Transform 523
21.3 Network Representation of Grover's Algorithm 532
21.4 Quantum Key Distribution 537

23.1 Encoding for the 5-qubit Error Correction Code 560

24.1 Two Possible Configurations for Quantum Dot Cells 566

List of Symbols

∅             empty set
N             natural numbers
N_0           N ∪ {0}
Z             integers
Q             rational numbers
R             real numbers
R^+           nonnegative real numbers
C             complex numbers
R^n           n-dimensional Euclidean space
C^n           n-dimensional complex linear space
H             Hilbert space
i             := √(−1)
ℜz            real part of the complex number z
ℑz            imaginary part of the complex number z
A ⊂ B         subset A of set B
A ∩ B         the intersection of the sets A and B
A ∪ B         the union of the sets A and B
f ∘ g         composition of two mappings, (f ∘ g)(x) = f(g(x))
ψ, |ψ⟩        wave function
t             independent time variable
x             independent space variable
x ∈ R^n       element x of R^n
‖·‖           norm
x × y         vector product
⊗             Kronecker product, tensor product
∧             exterior product (Grassmann product, wedge product)
⟨ , ⟩, ⟨ | ⟩  scalar product (inner product)
det           determinant of a square matrix
tr            trace of a square matrix
{ , }         Poisson product
[ , ]         commutator
[ , ]_+       anticommutator
δ_jk          Kronecker delta
δ             delta function
sgn(x)        the sign of x: 1 if x > 0, −1 if x < 0, 0 if x = 0
λ             eigenvalue
ε             real parameter
I             unit operator, unit matrix
U             unitary operator, unitary matrix
Π             projection operator, projection matrix
H             Hamilton function
Ĥ             Hamilton operator
V             potential
b_j, b_j†     Bose operators
c_j, c_j†     Fermi operators
p             momentum
P             momentum operator
L             angular momentum
L̂             angular momentum operator
|β⟩           Bose coherent state
D             differential operator ∂/∂x
Ω^+           Møller operator
Y_lm(θ, φ)    spherical harmonics
·             AND operation in Boolean algebra
+             OR operation in Boolean algebra
⊕             XOR operation in Boolean algebra
Ā             negation of A in Boolean algebra
⌊x⌋           the greatest integer which is not greater than x
Preface

Scientific computing is not numerical analysis, the analysis of algorithms, high performance computing or computer graphics. It consists instead of the combination of all these fields and others to craft solution strategies for applied problems. It is the original application area of computers and remains the most important. From meteorology to plasma physics, environmental protection, nuclear energy, genetic engineering, symbolic computation, network optimization, financial applications and many other fields, scientific applications are larger, more ambitious, more complex and more necessary. More and more universities are introducing a Department of Scientific Computing or a Department of Computational Science. The components of this new department include Applied Mathematics, Theoretical Physics, Computer Science and Electronic Engineering. This book can serve as a textbook in Scientific Computing. It contains all the techniques (including quantum computing). Most of the chapters include C++ and Java simulations.
Chapter 1 covers the description of algorithms and informal verification techniques.
Some basic concepts for computing are also introduced, such as alphabets and words,
and total and partial functions.
Chapter 2 discusses Boolean algebra. The definition of a Boolean algebra is given,
and various properties of the algebra are introduced. The chapter focuses on how
Boolean algebra can be used to implement a computation. Methods are discussed
to obtain efficient implementations.
Chapter 3 deals with number representation for computing devices. This includes
the different implementations of integers, and the representation of real numbers.
Conversion between different representations of numbers is also described.
Chapter 4 gives an overview of logic gates, which serve as the building blocks for
implementing functions in digital electronics. All of the commonly used gates such
as AND, OR, XOR and their negations are discussed.
Chapter 5 shows how to use the gates introduced in Chapter 4 to build circuits for specific purposes. The circuits described are important components in computing devices. The arithmetic operations such as addition and multiplication are described, as well as methods to increase the efficiency of the implementations. Various techniques for programming circuits are also considered, such as programmable logic devices and programmable gate arrays.
Chapter 6 is about latches. Latches serve as memory for a computing device. We consider three different types of latches. Using the latches, registers can be constructed which provide memory capability in a more useful form.
xxii Preface

Chapter 7 considers synchronous circuits. To perform a computation, certain operations must be applied at specific times. This chapter describes how the timing of operations can be achieved.
Chapter 8 illustrates the technique of recursion and its usefulness through a number of problems with recursive solutions. The implementation of recursion is also discussed. The description of recursive functions is also given, which is important for discussions on computability.
Chapter 9 serves as an introduction to the concept of abstract data types. The data
types support a specific organization of information. Three examples are provided.
Chapter 10 is devoted to classical error detection and correction. Specifically the
chapter deals with Hamming codes to correct single bit errors, and the technique of
weighted checksums. Besides these techniques, the noiseless coding theorem (which
gives bounds on how reliably information can be transmitted with limited resources)
is also discussed in detail.
Chapter 11 deals with cryptography. Methods to encrypt information using a private key are considered. The public key cryptography technique, which uses a computationally difficult system to provide security and in which only public keys are exchanged, is described.
Chapter 12 introduces computing models in the form of state machines. A finite
automaton is described, and improvements are considered leading to the Turing
machine. An example program provided aids in the understanding of the operation
of a Turing machine.
Chapter 13 discusses the concepts of computability in terms of Turing machines,
functions, and complexity. Complexity is described in terms of the repetitiveness of
bit strings.
Chapter 14 provides an extensive discussion of neural networking techniques. The
different models are illustrated through example programs. The networks discussed
are the single layer perceptron and multilayer perceptron models.
Chapter 15 is concerned with the technique of random searches through a solution
space for an optimum in the form of genetic algorithms. Several problems illustrate
the strengths of variations to the genetic algorithm.
Chapter 16 considers the theoretical background needed for quantum computing. Hilbert spaces are defined and linear operators and important properties are discussed. The postulates of quantum mechanics give the formal description of quantum systems, their evolution and the operations that can be performed on them.
Chapter 17 describes the fundamentals of quantum computation. The basic unit
of storage, how a quantum computation evolves, what operations can be performed
and some important operations are discussed.
Chapter 18 deals with the different approaches proposed to explain the process of
measurement in quantum mechanics.
Chapter 19 is about quantum state machines. Extensions to classical state machines
are considered which can model quantum computations.
Preface xxiii

Chapter 20 is devoted to the description of the teleportation algorithm. The algorithm is an important illustration of what can be achieved using the properties of quantum mechanics.
Chapter 21 covers six quantum algorithms which display a significant advantage over current classical algorithms. The problems include Deutsch's problem, which cannot be solved classically, secure key distribution for cryptography, factoring and database searching.
Chapter 22 discusses quantum information theory. The Von Neumann entropy is
introduced, and measurement of entanglement is considered. Finally bounds on
communication of quantum states with limited resources are considered in the form
of the quantum noiseless coding theorem and the Holevo bound.
Chapter 23 shows how to avoid errors in quantum states due to interaction with
the environment. Some of the techniques from classical error correction apply to
quantum error correction, but a new theory is developed to tolerate new errors
possible in the quantum computation model.
Chapter 24 explains some approaches to the physical implementation of a quantum
computing device. The device must have some properties such as the ability to store
quantum information and support the operations on quantum systems.
Chapter 25 lists sites on the internet where more information on quantum computation can be found.
Ends of proofs and ends of examples are indicated by special symbols. Any useful suggestions and comments are welcome. The e-mail addresses of the authors are:
whs@na.rau.ac.za
steeb_wh@yahoo.com
yorickhardy@yahoo.com

The web pages of the authors are


http://zeus.rau.ac.za
http://issc.rau.ac.za
Part I
Classical Computing
Chapter 1
Algorithms

1.1 Algorithms
An algorithm [48, 63, 77, 97, 115] is a precise description of how to solve a problem. For example algorithms can be used to describe how to add and subtract numbers or to prove theorems. Usually algorithms are constructed with some basic accepted knowledge and inference rules or instructions. Programs in programming languages such as C++ and Java are thus algorithms. An algorithm can therefore be viewed as a map f : E → A from the set of input data E to the set of output data A.
Knuth [104] describes an algorithm as a finite set of rules which gives a sequence of operations for solving a specific type of problem, similar to a recipe or procedure. According to Knuth [104] an algorithm has the following properties:

1. Finiteness. An algorithm must always terminate after a finite number of steps.

2. Definiteness. Each step of an algorithm must be precisely defined; the actions


to be carried out must be rigorously and unambiguously specified for each
case.

3. Input. An algorithm has zero or more inputs, i.e., quantities which are given
to it initially before the algorithm begins.

4. Output. An algorithm has one or more outputs, i.e., quantities which have a
specified relation to the inputs.

5. Effectiveness. This means that all of the operations to be performed in the


algorithm must be sufficiently basic that they can, in principle, be done exactly
and in a finite length of time by a man using pencil and paper.

Not every function can be realized by an algorithm. For example, the task of adding
two arbitrary real numbers does not satisfy finiteness.


Example. The Euclidean algorithm is a method to find the greatest common divisor (GCD) of two integers. The GCD d of two integers a and b is the integer that divides both a and b, and such that every common divisor c of a and b divides d.
1. Let a' := a and b' := b

2. Let r and q be integers such that a' = qb' + r and 0 ≤ r < b'

3. If r is not zero

(a) Let a' := b' and b' := r


(b) Goto 2

4. The GCD is b'.


To illustrate this we find the GCD of 21 and 18:

• 1) a' = 21 and b' = 18

• 2) q = 1 and r = 3

• 3) a' = 18 and b' = 3

• 2) q = 6 and r = 0

• 4) The GCD is 3.
Now we find the GCD of 113 and 49:

• 1) a' = 113 and b' = 49

• 2) q = 2 and r = 15
• 3) a' = 49 and b' = 15

• 2) q = 3 and r = 4
• 3) a' = 15 and b' = 4

• 2) q = 3 and r = 3

• 3) a' = 4 and b' = 3

• 2) q = 1 and r = 1

• 3) a' = 3 and b' = 1

• 2) q = 3 and r = 0

• 4) The GCD is 1.
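The steps above can be implemented directly. Below is a minimal C++ sketch of the Euclidean algorithm (the function name gcd and the test values are our own choices, not a listing from the book):

// gcd.cpp
// a minimal sketch of the Euclidean algorithm described above

#include <iostream>

using namespace std;

// returns the greatest common divisor of a and b (a, b > 0)
int gcd(int a,int b)
{
   while(b != 0)     // step 3: repeat while r is not zero
   {
      int r = a % b; // step 2: a = q*b + r with 0 <= r < b
      a = b;         // step 3(a): a' := b'
      b = r;         //            b' := r
   }
   return a;         // step 4: the GCD is b' (now stored in a)
}

int main(void)
{
   cout << "gcd(21,18) = " << gcd(21,18) << endl;   // 3
   cout << "gcd(113,49) = " << gcd(113,49) << endl; // 1
   return 0;
}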

Definition. Execution of an algorithm refers to the following (or execution) of the


steps given in the algorithm.

Definition. Termination of an algorithm is when an algorithm finishes, there is


nothing more to be done.

An algorithm executes uniquely if, for a given input, the termination of the algo-
rithm is always the same, i.e. the variables, memory, state, output and position of
termination in the algorithm are always the same.

Definition. An algorithm is said to be deterministic if the algorithm execution is


uniquely determined by the input.

Example. The Euclidean algorithm is deterministic, in other words for any given a and b the algorithm will always give the same result (the GCD of the given values).
Definition. An algorithm which is not deterministic is said to be non-deterministic.

Example. An algorithm which follows a certain path with some probability is non-deterministic. For example a learning algorithm can use probabilities as follows:

• Suppose there are n options (paths of execution) available.

• The algorithm assigns probabilities p_1, p_2, ..., p_n according to merit for each of the options, where

  p_i ≥ 0,  Σ_{i=1}^{n} p_i = 1.

• The algorithm calculates the Shannon entropy

  S := − Σ_{i=1}^{n} p_i log_2(p_i)

which is a measure of the information available about the options.

• The algorithm then chooses an option (according to the given probabilities).

• The outcome of the event is used for learning where the learning is weighted using S (a small sketch follows this list).
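As a small illustration of the steps above, the following C++ sketch (our own, not a listing from the book) assigns merit-based probabilities to three options, computes the Shannon entropy S and selects an option according to the probabilities.

// choose.cpp
// sketch: probabilistic choice of an option and its Shannon entropy

#include <iostream>
#include <math.h>
#include <stdlib.h>
#include <time.h>

using namespace std;

int main(void)
{
   const int n = 3;
   double p[n] = { 0.5, 0.3, 0.2 }; // probabilities assigned by merit

   // Shannon entropy S = -sum p_i log2(p_i)
   double S = 0.0;
   for(int i=0;i<n;i++) S -= p[i]*log(p[i])/log(2.0);
   cout << "S = " << S << endl; // approximately 1.485

   // choose an option according to the given probabilities
   srand(time(NULL));
   double s = rand()/double(RAND_MAX);
   int option = 0;
   double cumulative = p[0];
   while(s > cumulative && option < n-1) cumulative += p[++option];
   cout << "option = " << option << endl;

   return 0;
}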

1.2 Algorithm Verification


To illustrate the method of program verification we first introduce mathematical
induction.

If P is a property with respect to natural numbers N and

• P holds for 1,
• if P holds for k then P holds for k + 1,

then P holds for all natural numbers.

So if P holds for 1 it also holds for 2 and therefore also for 3 and so on.

Example.

• For n = 1:

  Σ_{i=1}^{1} i = 1 = n(n + 1)/2.

• Suppose

  Σ_{i=1}^{k} i = k(k + 1)/2.

  Then

  Σ_{i=1}^{k+1} i = Σ_{i=1}^{k} i + (k + 1) = k(k + 1)/2 + (k + 1) = (k + 1)(k + 2)/2.

Thus

Σ_{i=1}^{n} i = n(n + 1)/2

for n a natural number.



Example. Let x ≥ −1 and n ∈ N.

• For n = 1:

  (1 + x)^1 = 1 + x ≥ 1 + 1·x.

• Suppose

  (1 + x)^k ≥ 1 + kx.

  Then, since 1 + x ≥ 0,

  (1 + x)^{k+1} = (1 + x)^k (1 + x) ≥ (1 + kx)(1 + x),

  and

  (1 + kx)(1 + x) = 1 + (k + 1)x + kx² ≥ 1 + (k + 1)x.

Thus

(1 + x)^n ≥ 1 + nx

for n a natural number.

This method allows us to verify that a property is true for all natural numbers by building on initial truths. The same method can be extended for algorithm verification. A program starts with some conditions known to be true, and verification is the process of determining whether certain desired properties always hold during execution to give the desired result.

Definition. An assertion is a statement about a particular condition at a certain


point in an algorithm.

Definition. A precondition is an assertion at the beginning of a sub-algorithm.

Definition. A postcondition is an assertion at the end of a sub-algorithm.

Definition. An invariant is a condition which is always true at a particular point in


an algorithm.

To prove that an algorithm is correct it is necessary to prove that the postcondition


for each sub-algorithm holds if the precondition for the same sub-algorithm holds.
It is also necessary to prove that invariants hold at their positions in the algorithm.
The preconditions, postconditions and invariants must of course reflect the purpose
of the algorithm.

Example. We consider the algorithm to add n numbers x_1, x_2, ..., x_n:

1. Let sum be a variable which takes numerical values, initially set to 0.

2. Precondition: x_1, ..., x_n are the n numbers to be added.

3. For each of the n numbers x_i (i = 1, 2, ..., n) do sum := sum + x_i.

   Invariant: after processing x_i,

   sum = Σ_{j=1}^{i} x_j.

4. Postcondition:

   sum = Σ_{j=1}^{n} x_j.

5. The desired result is given by sum.

Example. The correctness of the Euclidean algorithm is based on a simple invariant.

1. Let a' := a and b' := b

2. Invariant: GCD(a, b) = GCD(a', b')

   Let r and q be integers such that a' = qb' + r and 0 ≤ r < b'

3. If r is not zero

   (a) Let a' := b' and b' := r

   (b) Goto 2

4. The GCD is b'.

To prove the invariant holds we need to prove that after step 2

GCD(a', b') = GCD(b', r).

Obviously GCD(b', r) divides GCD(a', b'), since any common divisor of b' and r divides a' = qb' + r as well. The reverse argument is also easy: GCD(a', b') divides both a' and b' and therefore divides r = a' − qb'. When r = 0 the GCD is b'.

In C and C++ the function assert in the header file assert.h is provided to help with debugging. The function takes one argument which must be an expression with numerical value. The function assert aborts the program and prints an error message if the expression evaluates to 0.

Example. We can use assert to make sure that, whenever a program calls the
function sum, the function adds at least one number.

// sum.cpp

#include <iostream>
#include <assert.h>

using namespace std;

double sum(int n,double x[])
{
   // precondition: x is an array of n doubles
   assert(n > 0);

   int i;
   double sum = 0.0;

   for(i=0;i < n;i++) sum += x[i]; // invariant: sum = x[0]+...+x[i]

   // postcondition: sum = x[0]+...+x[n-1]
   return sum;
}

int main(void)
{
   double x[5] = { 0.5,0.3,7.0,-0.3,0.5 };

   cout << "sum=" << sum(5,x) << endl;
   cout << "sum=" << sum(-1,x) << endl;

   return 0;
}

The output is:

sum=8
Assertion failed: n>0, file sum.cpp, line 9

abnormal program termination



1.3 Random Algorithms


Random or stochastic algorithms use random numbers to try to solve a problem. Generally the technique is used where approximations are acceptable and a completely accurate answer is difficult to obtain.

Examples include Monte Carlo methods, genetic algorithms, simulated annealing and neural networks. These algorithms are usually non-deterministic.

Random algorithms exist for numerical integration, but other numerical methods are generally better for not too large dimensions.

In C++ the functions rand and

void srand(unsigned)

in stdlib.h and

time_t time(time_t *)

in time.h can be used to generate uniformly distributed random numbers. The function call

srand(time(NULL))

initializes the random number generator. The function rand() generates a random number between 0 and RAND_MAX. Note that the random number sequences generated in this way by the computer are not truly random and are eventually periodic. The number sequences have properties which make them appropriate approximations for random number sequences for use in algorithms. The statement double(rand())/RAND_MAX takes the integer returned by rand() and casts it to type double so that the division by RAND_MAX gives a random number of type double in the unit interval [0, 1].

Example. To calculate the value of π we use the fact that the area of a quadrant of the unit circle

x² + y² ≤ 1,  x, y ≥ 0

is π/4. By generating random coordinates in the first quadrant, the proportion of coordinates in the unit circle to the total number of coordinates generated approximates this area.

A few examples of the output are given below

pi=3.13994
pi=3.13806
pi=3.14156
pi=3.13744

// calcpi.cpp

#include <iostream>
#include <time.h>
#include <stdlib.h>

using namespace std;

int main(void)
{
   const int n = 500000;
   double x,y,pi;
   int i;

   // initialize the counter of the number of points
   // found in the unit circle to zero
   int in_count = 0;

   // initialize the random number generator with a
   // seed value given by the current time
   srand(time(NULL));

   for(i=0;i<n;i++)
   {
      x = double(rand())/RAND_MAX;
      y = double(rand())/RAND_MAX;
      if(x*x+y*y <= 1) in_count++;
   }

   pi = 4.0*double(in_count)/n;
   cout << "pi=" << pi << endl;

   return 0;
}

Example. Annealing [164] is the process of cooling a molten substance with the objective of condensing matter into a crystalline solid. Annealing can be regarded as an optimization process. The configuration of the system during annealing is defined by the set of atomic positions r_i. A configuration of the system is weighted by its Boltzmann probability factor,

e^{−E(r_i)/kT}

where E(r_i) is the energy of the configuration, k is the Boltzmann constant, and T is the temperature. When a substance is subjected to annealing, it is maintained at each temperature for a time long enough to reach thermal equilibrium.

The iterative improvement technique for combinatorial optimization has been compared to rapid quenching of molten metals. During rapid quenching of a molten substance, energy is rapidly extracted from the system by contact with a massive cold substrate. Rapid cooling results in metastable system states; in metallurgy, a glassy substance rather than a crystalline solid is obtained as a result of rapid cooling. The analogy between iterative improvement and rapid cooling of metals stems from the fact that both accept only those system configurations which decrease the cost function. In an annealing (slow cooling) process, a new system configuration that does not improve the cost function is accepted based on the Boltzmann probability factor of the configuration. This criterion for accepting a new system state is called the Metropolis criterion. The process of allowing a fluid to attain thermal equilibrium at a temperature is also known as the Metropolis process.

The simulated annealing procedure is presented below. Simulated annealing essentially consists of repeating the Metropolis procedure for different temperatures. The temperature is gradually decreased at each iteration of the simulated annealing algorithm.

If the initial temperature is too low, the process gets quenched very soon and only
a local optimum is found. If the initial temperature is too high, the process is
very slow. Only a single solution is used for the search and this increases the
chance of the solution becoming stuck at a local optimum. The changing of the
temperature is based on an external procedure which is unrelated to the current
quality of the solution, that is, the rate of change of temperature is independent of
the solution quality. These problems can be rectified by using a population instead
of a single solution. The annealing mechanism can also be coupled with the quality
of the current solution by making the rate of change of temperature sensitive to the
solution quality.

In the following program we apply simulated annealing to find the minimum of the function

f(x) = x² exp(−x/15) sin x.
1.3 Random Algorithms 13

// anneal.cpp
// simulated annealing
// x range: [0 : 100]

#include <iostream>
#include <math.h>
#include <stdlib.h>
#include <time.h>

using namespace std;

inline double f(double &x)
{
   return sin(x)*x*x*exp(-x/15.0);
}

inline int accept(double &Ecurrent,double &Enew,double &T,double &s)
{
   double dE = Enew - Ecurrent;
   double k = 1.380662e-23; // Boltzmann constant

   if(dE < 0.0)
      return 1;
   if(s < exp(-dE/(k*T)))
      return 1;
   else return 0;
}

int main()
{
   cout << "Finding the minimum via simulated annealing:" << endl;
   double xlow = 0.0; double xhigh = 100.0;
   double Tmax = 500.0; double Tmin = 1.0;
   double Tstep = 0.1;
   double T;

   srand(time(NULL));
   double s = rand()/double(RAND_MAX);

   double xcurrent = s*(xhigh - xlow);
   double Ecurrent = f(xcurrent);

   for(T=Tmax;T>Tmin;T-=Tstep)
   {
      s = rand()/double(RAND_MAX);
      double xnew = s*(xhigh - xlow);
      double Enew = f(xnew);
      if(accept(Ecurrent,Enew,T,s))
      {
         xcurrent = xnew;
         Ecurrent = Enew;
      }
   }

   cout << "The minimum found is " << Ecurrent << " at x = "
        << xcurrent << endl;

   return 0;
}

Typical outputs are given below.

Finding the minimum via simulated annealing:


The minimum found is -121.796 at x = 29.8397

Finding the minimum via simulated annealing:


The minimum found is -121.796 at x = 29.8397

Finding the minimum via simulated annealing:


The minimum found is -121.749 at x = 29.874

The global minimum of f is found as one of the solutions to the transcendental equation

tan(x*) = 15x*/(x* − 30)

in the interval [0, 100], with x* ≈ 19π/2.
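As a numerical check (ours, not part of the book's program), one can evaluate f and an approximation of f' near x = 19π/2 ≈ 29.845:

// checkmin.cpp
// numerical check of the minimum of f(x) = x^2 exp(-x/15) sin(x)

#include <iostream>
#include <math.h>

using namespace std;

double f(double x) { return x*x*exp(-x/15.0)*sin(x); }

int main(void)
{
   const double pi = 3.14159265358979;
   double x = 19.0*pi/2.0;              // approx 29.845
   double h = 1.0e-5;
   double df = (f(x+h)-f(x-h))/(2.0*h); // central difference for f'(x)
   // f(x) should be close to -121.8 and f'(x) close to zero,
   // since x = 19*pi/2 only approximates the stationary point
   cout << "f(" << x << ") = " << f(x)
        << ", f'(" << x << ") = " << df << endl;
   return 0;
}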



1.4 Total and Partial Functions


Functions and algorithms are closely related since algorithms are used to implement
functions in computing and algorithms can be described in terms of functions.

Definition. A function f : A → B is a total function if f associates every a ∈ A with exactly one image f(a) in B.

Definition. A function f : A → B is a partial function if it is total on some subset A' of A. The set A' is then called the domain of f and is denoted dom(f). If a ∈ A \ A', f(a) is said to be undefined, otherwise it is said to be defined.

Definition. Suppose f : A → B and g : A → B. By definition, f and g are equal if and only if for each a ∈ A, either

1. both f(a) and g(a) are defined, and f(a) = g(a); or

2. both f(a) and g(a) are undefined.

Definition. A function

f : A_1 × A_2 × ... × A_n → B

is said to be n-ary. Unary, binary and ternary are synonyms for 1-ary, 2-ary and 3-ary respectively. In the expression

f(a_1, a_2, ..., a_n)

we say that a_1, a_2, ..., a_n are the arguments of f.

Definition. The range of a function f : A → B is the set { f(a) | a ∈ dom(f) } and is denoted rng(f).

Definition. The function f : A → B is onto if rng(f) = B. f is one to one if f(a) = f(a') implies a = a' for all a, a' ∈ A.

Example. Let A = N_0 × N_0 and B = N_0. The function f : A → B defined by

f(x, y) = x + y

is a total binary function.

Example. The function g : A → B defined by

g(x, y) = x − y

is a partial function with domain { (x, y) | x ≥ y }.



Definition. A partial or total n-ary function f is said to be effectively computable if there is an effective process which, when given any n argument values x_1, ..., x_n, will either

1. eventually halt, yielding f(x_1, ..., x_n) if it is defined, or

2. never halt if f(x_1, ..., x_n) is undefined.

Definition. The characteristic function of a set A is defined as

χ(x) := 1 if x ∈ A, 0 if x ∉ A.

Definition. Let f : A → B and g : B → C where f is partial and g is total. The composition g ∘ f : A → C is defined as

(g ∘ f)(a) := g(f(a)).

If f is total then g ∘ f is total.

Example. In the theory of Lie transformation groups the following function

exp(αD)f

plays a central role, where f : R^n → R is an analytic function, D is the differential operator

D := Σ_{i=1}^{n} D_i ∂/∂x_i

where D_i : R^n → R are analytic functions and α is a parameter (α ∈ R). If n = 1 and D := d/dx we have

exp(αD)f(x) = f(x + α).

Thus the argument x of the function f maps to x + α (translation). The following C++ program shows an implementation of this function.

// trans.cpp

#include <iostream>
#include <math.h>

using namespace std;

template <class T> T translation(T (*f)(T),T x,T alpha)
{
   return f(x + alpha);
}

double f1(double x)
{
   return sin(x);
}

int f2(int x)
{
   return x*x;
}

int main()
{
   double x1 = 1.0;
   double alpha1 = 0.5;

   cout << "f1(x1=" << x1 << ") = " << f1(x1) << endl;
   cout << "f1(" << x1 << " + " << alpha1 << ") = "
        << translation(f1,x1,alpha1) << endl;

   int x2 = 5;
   int alpha2 = 3;

   cout << "f2(x2=" << x2 << ") = " << f2(x2) << endl;
   cout << "f2(" << x2 << " + " << alpha2 << ") = "
        << translation(f2,x2,alpha2) << endl;

   return 0;
}

1.5 Alphabets and Words


Alphabets and words are used as the inputs and outputs for computing. They are
used in complexity and computability analysis. Words can also be considered as
sequences of characters, i.e. strings.

Definition. An alphabet is any finite set of symbols.

Definition. A word over an alphabet Σ is any finite string of symbols from Σ. Σ* denotes the set of all words over Σ.

Definition. The length of a word x is the number of symbols contained in x and is denoted by |x|.

Definition. The word of length 0 is called the null or empty word and is denoted by ε or λ.

Definition. Let x, y ∈ Σ* where x = a_1 a_2 ... a_n and y = b_1 b_2 ... b_m. The concatenation of x and y is xy = a_1 a_2 ... a_n b_1 b_2 ... b_m.

Definition. x ∈ Σ* is a prefix of y ∈ Σ* if there exists z ∈ Σ* such that y = xz.

For any symbol a ∈ Σ, a^m denotes the word of length m consisting of m a's.

Definition. Let X, Y ⊆ Σ*.

• XY = { xy | x ∈ X, y ∈ Y }

• 1. X^0 = {ε}
  2. X^{n+1} = X^n X, for n ≥ 0

• X* = ∪_{n=0}^{∞} X^n

• X+ = ∪_{n=1}^{∞} X^n

The set Σ^n is the set of all words of length n over Σ.
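For example, with Σ = {0, 1} and n = 2, Σ² = {00, 01, 10, 11}. A small C++ sketch (our own, not a listing from the book) that enumerates Σ^n for the binary alphabet:

// words.cpp
// enumerate all words of length n over the alphabet {0,1}

#include <iostream>
#include <string>

using namespace std;

// recursively extend the word w until it has length n
void words(string w,int n)
{
   if((int)w.length() == n) { cout << w << endl; return; }
   words(w+"0",n);
   words(w+"1",n);
}

int main(void)
{
   words("",2); // prints 00 01 10 11
   return 0;
}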



Example. Lindenmayer systems or L-systems consist of a set of rules for modifying a word to produce a new word. Lindenmayer systems play a role in modelling biological systems. The rules specify, for each symbol in the alphabet, a word with which to replace it. This system is called a 0L-system. The L-language corresponding to a ruleset is the set of all words derived by successive application of the ruleset to all symbols in the alphabet. An example ruleset for the alphabet {0, 1} is

0 → 1,  1 → 01.

Thus beginning with 0, this produces a series of derivations as follows

{0, 1, 01, 101, 01101, 10101101, ...}.

This is the L-language for this ruleset. Each word in the derivation is simply the concatenation of the previous two words in the derivation. We can prove this fact by induction. Let L(w_j) denote the mapping from the bit string w_j to the next derivation using the ruleset, and let w_j be the j-th bit string in the derivation starting from 0. We have

w_0 = 0
w_1 = L(w_0) = 1
w_2 = L(w_1) = L(1) = 01 = w_0 w_1
w_3 = L(w_2) = L(01) = 101 = w_1 w_2

By induction

w_{j+1} = L(w_j) = L(w_{j-2} w_{j-1}) = L(w_{j-2}) L(w_{j-1}) = w_{j-1} w_j.

The following Java program shows how to implement the derivation. We use the StringBuffer class which is built into Java. The StringBuffer class implements a mutable sequence of characters. The method

StringBuffer append(String str)

in class StringBuffer appends the String str to the StringBuffer.



// LSystem.java

class LSystem
{
   public static void map(StringBuffer sold,StringBuffer snew)
   {
      int i;
      for(i=0;i < sold.length();i++)
      {
         if(sold.charAt(i) == '0') snew.append("1");
         if(sold.charAt(i) == '1') snew.append("01");
      }
   } // end method map

   public static void main(String[] args)
   {
      StringBuffer sold = new StringBuffer("01101");
      StringBuffer snew = new StringBuffer("");

      map(sold,snew);
      System.out.println("snew = " + snew); // 10101101

      StringBuffer s0 = new StringBuffer("0");
      StringBuffer s1 = new StringBuffer("");

      int j;
      for(j=0;j < 6;j++)
      {
         map(s0,s1);
         s0 = s1;
         System.out.println("s = " + s0);
         s1 = new StringBuffer("");
      }
   }
}

Example. UTF-8 encoding is an efficient method of coding characters and words from many languages as integers. The encoding uses variable length codes to obtain the efficiency, by noting that the most common characters used are from the ASCII character set. The Java language uses the methods writeUTF() in class DataOutputStream and readUTF() in class DataInputStream to implement the encoding and decoding to and from UTF-8. ASCII codes (the codes numbered from 1 to 127) are stored in 8 bits with the highest order bit set to zero.

The encoding for a String begins with two bytes for the length of the string. The first byte is the high order byte and the second byte is the low order byte. The character encoding follows this. A zero value is encoded as two bytes

11000000, 10000000.

The bytes are written in left to right order. All ASCII codes from 1 to 127 are written using a single byte with a leading 0 bit,

0(0-6)

where (0-6) indicates that the bits indexed by 0, 1, ..., 6 are written in the remaining bit positions. All codes in the range 128 to 2047 are encoded as two bytes

110(6-10), 10(0-5).

Finally all codes in the range 2048 to 65535 are encoded as three bytes

1110(12-15), 10(6-11), 10(0-5).

Thus the string "UTF example" would be encoded as the bytes (in hexadecimal)

00, 0B, 55, 54, 46, 20, 65, 78, 61, 6D, 70, 6C, 65.

The following Java program uses the above methods to illustrate the encoding.

// UTFexample.java

import java.io.*;

public class UTFexample
{
   public static void main(String[] args) throws IOException
   {
      DataOutputStream output =
         new DataOutputStream(new FileOutputStream("myout.dat"));

      String s = new String("UTF example");
      System.out.println("s = " + s);

      output.writeUTF(s);
      output.flush();
      output.close();

      DataInputStream input =
         new DataInputStream(new FileInputStream("myout.dat"));

      String t = input.readUTF();
      input.close();

      System.out.println("t = " + t);
   }
}
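The byte-level rules above can also be applied directly. The following Java sketch (our own, not a listing from the book; the characters chosen are arbitrary examples) encodes a single character value into one, two or three bytes according to its range:

// UTFencode.java
// sketch: encode one character value (1..65535) following the rules above

public class UTFencode
{
   public static void encode(int c)
   {
      if(c >= 1 && c <= 127)       // one byte: 0(0-6)
         System.out.printf("%02X%n",c);
      else if(c <= 2047)           // two bytes: 110(6-10), 10(0-5)
         System.out.printf("%02X %02X%n",
            0xC0 | (c >> 6), 0x80 | (c & 0x3F));
      else                         // three bytes: 1110(12-15), 10(6-11), 10(0-5)
         System.out.printf("%02X %02X %02X%n",
            0xE0 | (c >> 12), 0x80 | ((c >> 6) & 0x3F), 0x80 | (c & 0x3F));
   }

   public static void main(String[] args)
   {
      encode('U');    // 55
      encode(0x0646); // two-byte range: D9 86
      encode(0x4E2D); // three-byte range: E4 B8 AD
   }
}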
Chapter 2
Boolean Algebra

2.1 Introduction
Boolean algebra forms the theoretical basis for classical computing. It can be used
to describe the circuits which are used as building blocks for classical computing.

In this chapter we introduce the definitions of Boolean algebra and the rules for
manipulation. We introduce the standard forms for manipulation and describe how
Boolean algebra can be used to describe functions. Efficiency is an important issue
in computing and we describe the methods of Karnaugh maps and Quine-McKluskey
to simplify expressions.

At the end of the chapter two programs are given to illustrate the concepts. The first
example program uses the properties of Boolean algebra to efficiently implement sets
in C++. This implementation reduces the memory requirements for a set since only
one bit of information is needed for each element of the set. The second example is an
implementation of the Quine-McKluskey method in C++. The Quine-McKluskey
method is easier to implement on computer whereas the Karnaugh map method is
easier to do by hand.

The smallest Boolean algebra consists of two elements usually labelled 0 and 1 or
false and true but larger Boolean algebras exist.


2.2 Definitions
Definition. A Boolean algebra is a closed algebraic system containing a set B of two or more elements and two operations

· : B × B → B,  + : B × B → B

with the following properties:

• Identity Elements. There exist unique elements 0, 1 ∈ B such that for every A ∈ B
  1. A + 0 = A
  2. A · 1 = A

• Commutativity. For every A_0, A_1 ∈ B
  1. A_0 + A_1 = A_1 + A_0
  2. A_0 · A_1 = A_1 · A_0

• Associativity. For every A_0, A_1, A_2 ∈ B
  1. A_0 + (A_1 + A_2) = (A_0 + A_1) + A_2
  2. A_0 · (A_1 · A_2) = (A_0 · A_1) · A_2

• Distributivity. For every A_0, A_1, A_2 ∈ B
  1. A_0 + (A_1 · A_2) = (A_0 + A_1) · (A_0 + A_2)
  2. A_0 · (A_1 + A_2) = (A_0 · A_1) + (A_0 · A_2)

• Complement. For every A ∈ B there exists $\bar{A}$ ∈ B such that
  1. $A + \bar{A} = 1$
  2. $A \cdot \bar{A} = 0$

The operations · and + are referred to as the AND and OR operations respectively. 0 is called the identity element for the OR operation and 1 is called the identity element for the AND operation. The complement will also be referred to as the NOT or negation operation. The AND operation is sometimes referred to as conjunction. The OR operation is sometimes referred to as disjunction.

From the properties of identity elements and complements we find

$\bar{0} = 1$ and $\bar{1} = 0$.


Example. The smallest Boolean algebra consists of the identity elements {0, 1}. The Boolean algebra can be summarised in a table.

A_0 | A_1 | A_0 + A_1 | A_0 · A_1 | $\bar{A}_0$
 0  |  0  |     0     |     0     |  1
 0  |  1  |     1     |     0     |  1
 1  |  0  |     1     |     0     |  0
 1  |  1  |     1     |     1     |  0

Example. The set P(X) (set of all subsets of the finite set X) of a non-empty set X, with · the intersection of sets, + the union of sets and the complement with respect to X as negation, forms a Boolean algebra with identity elements 0 = ∅ and 1 = X. This Boolean algebra has 2^{|X|} members, where |X| denotes the cardinality (number of elements) of X.

Example. The set A of all functions from the set {p_1, p_2, ..., p_n} into {0, 1} (i.e. a function in the set assigns 0 or 1 to each of p_1, p_2, ..., p_n) with ·, + and negation described pointwise by the definitions in the first example forms a Boolean algebra. For example, if f_1, f_2 ∈ A then

(f_1 + f_2)(p_i) = f_1(p_i) + f_2(p_i)

and

(f_1 · f_2)(p_i) = f_1(p_i) · f_2(p_i).

The Boolean algebra has 2^{2^n} members and is called a free Boolean algebra on the generators p_1, p_2, ..., p_n.

Example. Let {p_1, p_2, ...} be a countably infinite set. Then we can again form a free Boolean algebra on this generating set by considering finite Boolean expressions in the p_i.

2.3 Rules and Laws of Boolean Algebra


The following are consequences of the definitions.

• Double negation. $\bar{\bar{A}} = A$

• Idempotence.
  1. A · A = A
  2. A + A = A

• Absorption.
  1. A + 1 = 1
  2. 0 · A = 0
  3. A_0 + A_0 · A_1 = A_0
  4. A_0 · (A_0 + A_1) = A_0
  5. $A_0 \cdot \bar{A}_1 + A_1 = A_0 + A_1$
  6. $(A_0 + \bar{A}_1) \cdot A_1 = A_0 \cdot A_1$

The double negation property is obvious. The idempotence property follows from

1. $A \cdot A = A \cdot A + 0 = (A \cdot A) + (A \cdot \bar{A}) = A \cdot (A + \bar{A}) = A \cdot 1 = A$
2. $A + A = (A + A) \cdot 1 = (A + A) \cdot (A + \bar{A}) = A + (A \cdot \bar{A}) = A + 0 = A$

The absorption properties are derived as follows

1. $1 = A + \bar{A} = A + (\bar{A} \cdot 1) = (A + \bar{A}) \cdot (A + 1) = 1 \cdot (A + 1) = A + 1$
2. $0 = A \cdot \bar{A} = A \cdot (\bar{A} + 0) = (A \cdot \bar{A}) + (A \cdot 0) = 0 + (A \cdot 0) = A \cdot 0$
3. $A_0 + A_0 \cdot A_1 = A_0 \cdot 1 + A_0 \cdot A_1 = A_0 \cdot (1 + A_1) = A_0$
4. $A_0 \cdot (A_0 + A_1) = (A_0 \cdot A_0) + (A_0 \cdot A_1) = A_0 + A_0 \cdot A_1 = A_0$
5. $A_0 \cdot \bar{A}_1 + A_1 = (A_0 + A_1) \cdot (\bar{A}_1 + A_1) = (A_0 + A_1) \cdot 1 = A_0 + A_1$
6. $(A_0 + \bar{A}_1) \cdot A_1 = (A_0 \cdot A_1) + (\bar{A}_1 \cdot A_1) = (A_0 \cdot A_1) + 0 = A_0 \cdot A_1$

2.4 DeMorgan's Theorem


Another property of Boolean algebra is given by DeMorgan's theorem

$\overline{A_0 + A_1} = \bar{A}_0 \cdot \bar{A}_1, \qquad \overline{A_0 \cdot A_1} = \bar{A}_0 + \bar{A}_1.$

Thus the left-hand side of the two identities involves two operations and the right-hand side three operations. DeMorgan's theorem can be proved using the properties given above. It describes the relationships between the operations +, · and negation. To see that $\bar{A}_0 \cdot \bar{A}_1$ is the complement of $A_0 + A_1$ we check the two complement properties:

$(A_0 + A_1) \cdot (\bar{A}_0 \cdot \bar{A}_1) = (A_0 \cdot \bar{A}_0 \cdot \bar{A}_1) + (A_1 \cdot \bar{A}_0 \cdot \bar{A}_1) = 0 \cdot \bar{A}_1 + \bar{A}_0 \cdot 0 = 0 + 0 = 0$

$(A_0 + A_1) + (\bar{A}_0 \cdot \bar{A}_1) = (\bar{A}_0 + A_0 + A_1) \cdot (\bar{A}_1 + A_0 + A_1) = (1 + A_1) \cdot (1 + A_0) = 1 \cdot 1 = 1$

This theorem is very important for building combinational circuits consisting of only
one type of operation.
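Since here B = {0, 1}, the theorem can also be verified exhaustively. A small C++ check (our own, not one of the book's listings):

// demorgan.cpp
// exhaustive check of DeMorgan's theorem over {0,1}

#include <iostream>

using namespace std;

int main(void)
{
   for(int a0=0;a0<=1;a0++)
   for(int a1=0;a1<=1;a1++)
   {
      // !(a0|a1) == (!a0)&(!a1)  and  !(a0&a1) == (!a0)|(!a1)
      cout << (!(a0|a1) == ((!a0)&(!a1))) << " "
           << (!(a0&a1) == ((!a0)|(!a1))) << endl; // prints 1 1
   }
   return 0;
}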

2.5 Further Definitions


We will use the set B = {0, 1} and the operations AND, OR and complement defined by Table 2.1.

A_0 | A_1 | A_0 · A_1 | A_0 + A_1 | $\bar{A}_0$
 0  |  0  |     0     |     0     |  1
 0  |  1  |     0     |     1     |  1
 1  |  0  |     0     |     1     |  0
 1  |  1  |     1     |     1     |  0

Table 2.1: AND, OR and Complement.



Definition. A Boolean function is a map f : {0, 1}^n → {0, 1} where {0, 1}^n is the set of all n-tuples consisting of zeros and ones.

Definition. Boolean variables are variables which may only take on the values 0 or 1.

Definition. Bit is short for binary digit, which refers to a 0 or 1.

Definition. A literal is a variable or the complement of a variable.

We will use the notation B^n := B × B × ... × B (n times). Thus B^n = {0, 1}^n.

Definition. Any function f : B^n → B can be represented with a truth table.

A_0  A_1  ...  A_{n-1} | f(A_0, A_1, ..., A_{n-1})
 0    0   ...    0     | f(A_0 = 0, A_1 = 0, ..., A_{n-1} = 0)
 0    0   ...    1     | f(A_0 = 0, A_1 = 0, ..., A_{n-1} = 1)
 .    .   ...    .     | ...
 1    1   ...    1     | f(A_0 = 1, A_1 = 1, ..., A_{n-1} = 1)

The rows of the table range over all combinations of A_0, A_1, ..., A_{n-1}. There are 2^n such combinations. Thus the truth table has 2^n rows.

Definition. Two functions f : B^n → B, g : B^n → B are equivalent if

f(A_0, ..., A_{n-1}) = g(A_0, ..., A_{n-1})

for all A_0, ..., A_{n-1} ∈ {0, 1}.

Definition. A product form is an AND of a number of literals l_1 · l_2 · ... · l_m.

Definition. A sum of products (SOP) form is an OR of product forms.

Definition. Disjunctive normal form is a disjunction of conjunctions of literals. This is equivalent to SOP form.

Definition. Conjunctive normal form is a conjunction of disjunctions of literals.


Theorem. Any function f : B^n → B can be represented in SOP form.

To see this we construct product forms P_j = l_{j,1} · l_{j,2} · ... · l_{j,n} for each row j in the truth table of f where f = 1, with l_{j,i} = A_i if the entry for A_i is 1 and l_{j,i} = $\bar{A}_i$ if the entry for A_i is 0. If f = 1 in m of the rows of the truth table then

f = P_1 + P_2 + ... + P_m.

Example. Consider the parity function for two bits with truth table Table 2.2.

A_0 | A_1 | P(A_0, A_1)
 0  |  0  |  1
 0  |  1  |  0
 1  |  0  |  0
 1  |  1  |  1

Table 2.2: Parity Function
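Applying the construction of the theorem to the two rows with P = 1 gives the SOP form $P(A_0, A_1) = \bar{A}_0 \cdot \bar{A}_1 + A_0 \cdot A_1$ (this expression is our reading of the table, not quoted from the book). A quick C++ check (our own):

// parity.cpp
// check the SOP form of the two-bit parity function against its truth table

#include <iostream>

using namespace std;

int main(void)
{
   for(int a0=0;a0<=1;a0++)
   for(int a1=0;a1<=1;a1++)
   {
      int sop = ((!a0)&(!a1)) | (a0&a1); // SOP built from the rows with P = 1
      int parity = !(a0^a1);             // 1 if a0 and a1 agree
      cout << a0 << " " << a1 << " " << sop
           << (sop==parity ? " ok" : " mismatch") << endl;
   }
   return 0;
}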

Definition. A canonical SOP form is a SOP form over n variables, where each variable or its negation is present in every product form. In other words a Boolean expression E is in canonical SOP form if it can be written as

E = l_{1,1} · l_{1,2} · ... · l_{1,n} + l_{2,1} · l_{2,2} · ... · l_{2,n} + ... + l_{m,1} · l_{m,2} · ... · l_{m,n}

where l_{i,j} = A_j or l_{i,j} = $\bar{A}_j$.

Definition. The exclusive OR function is defined by the following table.

A_0 | A_1 | A_0 ⊕ A_1
 0  |  0  |  0
 0  |  1  |  1
 1  |  0  |  1
 1  |  1  |  0

Table 2.3: XOR Truth Table


Some more properties of the XOR operation are given below:

• A ⊕ A = 0
• $A \oplus \bar{A} = 1$
• A_0 ⊕ A_1 = A_1 ⊕ A_0
• $\bar{A}_0 \oplus A_1 = \overline{A_0 \oplus A_1}$
• (A_0 ⊕ A_1) ⊕ A_2 = A_0 ⊕ (A_1 ⊕ A_2)
• $A_0 \oplus A_1 = \bar{A}_0 \cdot A_1 + A_0 \cdot \bar{A}_1$
• $(A_0 \cdot A_1) \oplus A_0 = (A_0 \oplus A_1) \cdot A_0 = A_0 \cdot \bar{A}_1$

The XOR operation can be used to swap two values a and b (for example integers in C++ and Java):

1. a := a ⊕ b
2. b := a ⊕ b
3. a := a ⊕ b

By analysing the variables at each step in terms of the original a and b the swapping action becomes clear. In the second step we have (a ⊕ b) ⊕ b = a ⊕ 0 = a. In the third step we have (a ⊕ b) ⊕ a = b ⊕ 0 = b.

In C, C++ and Java the XOR operation is denoted by ^. The following C++ program illustrates the swapping.

// xor.cpp

#include <iostream>

using namespace std;

int main(void)
{
   int a = 23;
   int b = -565;

   cout << "a = " << a << ", b = " << b << endl;

   a ^= b; b ^= a; a ^= b;
   cout << "a = " << a << ", b = " << b << endl;

   return 0;
}

The results are

a = 23, b = -565
a = -565, b = 23

Definition. The operation NOT(A0 + A1) (the complement of the OR) is called the
NOR function.

Example. Let A0 = 0 and A1 = 0. Then

NOT(A0 + A1) = 1.

Definition. The operation NOT(A0 · A1) (the complement of the AND) is called the
NAND function.

Example. Let A0 = 1 and A1 = 0. Then

NOT(A0 · A1) = 1.

Definition. The operation NOT(A0 ⊕ A1) (the complement of the XOR) is called the
XNOR function.

Example. Let A0 = 1 and A1 = 1. Then

NOT(A0 ⊕ A1) = 1.

Definition. A universal set of operations is a set of operations which can be used to
build any Boolean function.

For Boolean functions there exist universal sets of operations with only one element.

For simplicity of implementation, it is useful to know the minimum number of
parameters a Boolean function must take in order to be able to build any other
Boolean function. Obviously functions taking only a single parameter cannot fulfill
this requirement. The minimum number of parameters is thus at least two.

The NAND and NOR operations can be used to build any other function which we
will show in the next section.

2.6 Boolean Function Implementation


The physical implementation of a Boolean function is achieved by interconnection of
gates. A gate is an electronic circuit that produces an output signal (representing 0
or 1) according to the signal states (again representing 0 or 1) of its inputs. The task
is then to build an implementation of a Boolean function with gates of prescribed
types. This is not always possible, for example the NOT operation cannot implement
two bit operations and the OR operation cannot be used to implement the AND and
NOT operations. In the previous section it was shown informally that any Boolean
function can be implemented with AND, OR and NOT operations (in SOP form).
Therefore to show that any Boolean function can be implemented with a set of gates
it is sufficient to show that AND, OR and NOT can be implemented with the set of
gates.

The NAND gate is sufficient to build an implementation of any Boolean function.
Since NOT(A) = NOT(A · A), a single NAND gate acts as an inverter, and

• A0 · A1 = NOT(NOT(A0 · A1)) = NOT(NOT(A0 · A1) · NOT(A0 · A1))
• A0 + A1 = NOT(A̅0 · A̅1) = NOT(NOT(A0 · A0) · NOT(A1 · A1))

Example. We show now how to implement the NOR operation using only NAND
operations. As mentioned earlier De Morgan's laws are important to achieve this:

NOT(A0 + A1) = A̅0 · A̅1 = NOT(Z · Z),  where  Z = NOT(NOT(A0 · A0) · NOT(A1 · A1)).

It can also be shown that the NOR gate is sufficient to build an implementation of
any Boolean function.
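The following small C++ program (a sketch added here for illustration; the helper
function NAND is introduced for this purpose) verifies the above constructions by
enumerating all input combinations.

// nanduniv.cpp

#include <iostream>

using namespace std;

// NAND(a,b) = NOT(a AND b) for one-bit values
int NAND(int a,int b) { return !(a && b); }

int main(void)
{
   for(int a0=0;a0<=1;a0++)
   for(int a1=0;a1<=1;a1++)
   {
      int nota0 = NAND(a0,a0);                    // NOT a0
      int and01 = NAND(NAND(a0,a1),NAND(a0,a1));  // a0 AND a1
      int or01  = NAND(NAND(a0,a0),NAND(a1,a1));  // a0 OR a1
      int nor01 = NAND(or01,or01);                // a0 NOR a1
      cout << a0 << " " << a1 << " : " << nota0 << " "
           << and01 << " " << or01 << " " << nor01 << endl;
   }
   return 0;
}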

Data are represented by bit strings a_{n-1} a_{n-2} … a0, ai ∈ {0, 1}. Bit strings of length
n can represent up to 2^n different data elements. Functions on bit strings are then
calculated by

f(a_{n-1} … a0) = f_{m-1}(a_{n-1}, …, a0) f_{m-2}(a_{n-1}, …, a0) … f0(a_{n-1}, …, a0)

with f : B^n → B^m and fi : B^n → B. In other words a function of a bit string of
length n gives a bit string of length say m; each bit in the output string is therefore
a function of the input bits. It is sufficient then to consider functions with an output
of only one bit.

Example. The set

{ z | z ∈ N0, 0 ≤ z < 2^n }

can be represented by

a_{n-1} a_{n-2} … a0 → Σ_{i=0}^{n-1} ai · 2^i.

If n = 32 the largest integer number we can represent is

Σ_{i=0}^{n-1} 2^i = 2^n − 1 = 4294967295.

This relates to the data type unsigned long in C and C++. Java has only signed
data types.

Example. The set

{ x | x ∈ R, x = b + j · (c − b)/(2^n − 1), j = 0, 1, …, 2^n − 1 }

where b, c ∈ R and c > b can be represented by

a_{n-1} a_{n-2} … a0 → b + (c − b)/(2^n − 1) · Σ_{i=0}^{n-1} ai · 2^i.

So we find

a_{n-1} a_{n-2} … a0 = 00…0 → b

and

a_{n-1} a_{n-2} … a0 = 11…1 → c.
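The following short C++ program (a sketch added here; the values n = 8, b = -1 and
c = 1 are chosen arbitrarily for the illustration) evaluates this mapping for the
all-one and all-zero bit strings.

// interval.cpp

#include <iostream>

using namespace std;

int main(void)
{
   int n = 8;
   double b = -1.0, c = 1.0;   // the interval [b,c]
   unsigned long a = 0xFF;     // the bit string 11...1 (n = 8 ones)
   cout << b + (c-b)/((1UL<<n)-1)*a << endl;   // 1, i.e. c
   a = 0;                      // the bit string 00...0
   cout << b + (c-b)/((1UL<<n)-1)*a << endl;   // -1, i.e. b
   return 0;
}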

Minimizing the number of gates in an implementation decreases cost and the number
of things that can go wrong.

One way to reduce the number of gates is to use the properties of the Boolean
algebra to eliminate literals.

Example. The full adder (Table 2.4) consists of two outputs (one for the sum and
one for the carry) and three inputs (the carry from another adder and the two bits
to be added).

Cin  A0  A1 | S  Cout
 0    0   0 | 0   0
 0    0   1 | 1   0
 0    1   0 | 1   0
 0    1   1 | 0   1
 1    0   0 | 1   0
 1    0   1 | 0   1
 1    1   0 | 0   1
 1    1   1 | 1   1

Table 2.4: Full Adder

Thus

S = A̅0 · A1 · C̅in + A0 · A̅1 · C̅in + A̅0 · A̅1 · Cin + A0 · A1 · Cin

and

Cout = A0 · A1 · C̅in + A̅0 · A1 · Cin + A0 · A̅1 · Cin + A0 · A1 · Cin.

Simplification for Cout yields (duplicating the term A0 · A1 · Cin twice, using the
idempotence property)

Cout = A0 · A1 · C̅in + A0 · A1 · Cin + A̅0 · A1 · Cin + A0 · A1 · Cin
       + A0 · A̅1 · Cin + A0 · A1 · Cin
     = A0 · A1 · (C̅in + Cin) + (A̅0 + A0) · A1 · Cin + A0 · (A̅1 + A1) · Cin
     = A0 · A1 + A1 · Cin + A0 · Cin.

The half adder is a full adder where Cin = 0 is fixed.
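The simplification can be checked by brute force. The following short C++ program
(a sketch added here, not part of the adder circuit) compares the canonical SOP
form for Cout with the simplified expression for all 8 input combinations.

// coutcheck.cpp

#include <iostream>

using namespace std;

int main(void)
{
   for(int A0=0;A0<=1;A0++)
   for(int A1=0;A1<=1;A1++)
   for(int Cin=0;Cin<=1;Cin++)
   {
      // canonical SOP form from the truth table
      int sop = (A0&&A1&&!Cin)||(!A0&&A1&&Cin)
              ||(A0&&!A1&&Cin)||(A0&&A1&&Cin);
      // simplified expression
      int simplified = (A0&&A1)||(A1&&Cin)||(A0&&Cin);
      if(sop != simplified) cout << "mismatch" << endl;
   }
   cout << "Cout check completed" << endl;
   return 0;
}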


Karnaugh maps and the Quine-McKluskey method [153] for simplification of
Boolean expressions are discussed next.

2.6.1 Karnaugh Maps


Karnaugh maps can be used to simplify Boolean expressions of up to six variables.
When an expression has more than four variables the Karnaugh map has dimension
greater than 2. For 2, 3 and 4 variables the Karnaugh map is represented by 2 x 2,
2 x 4 and 4 x 4 grids, respectively. The rows and columns of the grid represent the
values of variables. Each square contains the expression value for the variables with
values given by the row and column.

Example. The Karnaugh map for the carry flag

Cout = A0 · A1 · C̅in + A̅0 · A1 · Cin + A0 · A̅1 · Cin + A0 · A1 · Cin

of the full adder is as follows

          A0A1
Cin    00  01  11  10
 0      0   0   1   0
 1      0   1   1   1

Note that adjacent columns and rows only differ in the assignment of one variable
(only one bit differs). This is important for the simplification algorithm to work
correctly. Suppose two adjacent squares have the value 1 and only differ in the
variable A. Writing the corresponding product forms as p · A and p · A̅, the canonical
SOP form can be simplified using

p · A + p · A̅ = p · (A + A̅) = p.

In fact this can be extended to any 2^n adjacent squares in a row or column. The
first column is adjacent to the last column ("wrap around"), and the same applies
to rows. The simplification is indicated by circling the adjacent squares involved in
the simplification. Overlapping circles are allowed due to the idempotence property.
If two circles are "adjacent" in the sense that they cover the same columns (rows)
in adjacent rows (columns) they may be joined to form one circle encircling all the
appropriate squares. The only restriction is that the number of rows and columns
encircled are a power of 2, i.e. 1, 2, 4, 8, …. This is due to the algebraic simplification
used. Each set of encircled squares is called a group and the squares are said to be
covered. There are two algorithms for this method.

Algorithm 1.

1. Count the number of adjacencies (adjacent 1-squares) for each 1-square on the
Karnaugh map.

2. Select an uncovered 1-square with the fewest number of adjacencies. (An
arbitrary choice may be required.)

3. Circle the 1-square so that the circle covers the most uncovered 1-squares.

4. If all the 1-squares are not yet covered goto 2.

Algorithm 2.

1. Circle all 1-squares so that each circle covers the most 1-squares.

2. Eliminate all circles that do not contain at least one 1-square that is not
covered by another circle.

3. Introduce a minimum number of circles to complete the cover.

The SOP form is the OR of product forms representing the groups of the Karnaugh
map. The variable Ai is in the product form if Ai = 1 is constant in the group, A̅i
is in the product form if Ai = 0 is constant in the group.

Example. The Karnaugh map for the carry flag

Cout = A0 · A1 · C̅in + A̅0 · A1 · Cin + A0 · A̅1 · Cin + A0 · A1 · Cin

is the same after application of either algorithm

          A0A1
Cin    00  01  11  10
 0      0   0   1   0
 1      0   1   1   1

with three groups of two 1-squares each: the column A0A1 = 11 (giving the product
form A0 · A1), the squares with Cin = 1, A1 = 1 (giving A1 · Cin) and the squares with
Cin = 1, A0 = 1 (giving A0 · Cin). Thus Cout = A0 · A1 + A1 · Cin + A0 · Cin.

An advantage of Karnaugh maps is that simpler expressions can be found when
certain inputs cannot occur. A "don't care" symbol d is placed in the squares on the
Karnaugh map which represent the inputs which cannot occur. These d-squares can
be interpreted as 1-squares or 0-squares for optimal grouping.

Example. The truth table for a decimal incrementer (4-bit) with 4 inputs and 4
outputs is given by Table 2.5.

Number in  Number out | I3 I2 I1 I0 | O3 O2 O1 O0
    0          1      |  0  0  0  0 |  0  0  0  1
    1          2      |  0  0  0  1 |  0  0  1  0
    2          3      |  0  0  1  0 |  0  0  1  1
    3          4      |  0  0  1  1 |  0  1  0  0
    4          5      |  0  1  0  0 |  0  1  0  1
    5          6      |  0  1  0  1 |  0  1  1  0
    6          7      |  0  1  1  0 |  0  1  1  1
    7          8      |  0  1  1  1 |  1  0  0  0
    8          9      |  1  0  0  0 |  1  0  0  1
    9          0      |  1  0  0  1 |  0  0  0  0
                      |  1  0  1  0 |  d  d  d  d
                      |  1  0  1  1 |  d  d  d  d
                      |  1  1  0  0 |  d  d  d  d
                      |  1  1  0  1 |  d  d  d  d
                      |  1  1  1  0 |  d  d  d  d
                      |  1  1  1  1 |  d  d  d  d

Table 2.5: 4-bit Decimal Incrementer

The Karnaugh maps for O0 and O3 are:

  O0      I1I0                O3      I1I0
I3I2   00  01  11  10       I3I2   00  01  11  10
 00     1   0   0   1        00     0   0   0   0
 01     1   0   0   1        01     0   0   1   0
 11     d   d   d   d        11     d   d   d   d
 10     1   0   d   d        10     1   0   d   d

Therefore O0 = I̅0 and O3 = I̅0 · I3 + I0 · I1 · I2.



2.6.2 Quine-McKluskey Method


This method provides an algorithmic description for simplification which lends itself
to implementation in programming languages such as C++ and Java. It is also
more general than the method of Karnaugh maps and can handle an arbitrary
number of Boolean variables. Suppose the Boolean expression to be simplified has
the representation in canonical SOP form

E = P0,1(A0, A1, …, A_{n-1}) + P0,2(A0, A1, …, A_{n-1}) + …
  + P1,1(A0, A1, …, A_{n-1}) + P1,2(A0, A1, …, A_{n-1}) + …

where Pi,j denotes the jth product form with exactly i negated Boolean variables.
The method is as follows

1. Let QM(n) := { Pi,j | i = 0, 1, …, n; j = 1, 2, … }

2. Set m := n.

3. Set

   QM(m − 1) := QM(m)

   and

   QMi,m := { P ∈ QM(m) | P has m Boolean variables of which i are negated }

4. For each pair of elements

   e1 = l_{1,1} · l_{1,2} · … · l_{1,m} ∈ QMi,m

   and

   e2 = l_{2,1} · l_{2,2} · … · l_{2,m} ∈ QMi+1,m,  where i = 0, 1, …, m − 1,

   which differ in only one literal l_{1,j} ≠ l_{2,j}, set

   QM(m − 1) := (QM(m − 1) − {e1, e2}) ∪ { l_{1,1} · … · l_{1,j-1} · l_{1,j+1} · … · l_{1,m} }

5. Set m := m − 1.

6. If (m > 0) goto step 3.

7. E is the OR of all the elements of QM(0).



Example. We consider the two's complement operation on two bits.

I0  I1 | O0  O1
 0   0 |  0   0
 0   1 |  1   1
 1   0 |  1   0
 1   1 |  0   1

Table 2.6: Two's Complement Operation on 2 Bits

From the truth table, O1 = I̅0 · I1 + I0 · I1 in canonical SOP form.

• m = 2.
  QM(2) = { I̅0 · I1, I0 · I1 }
  QM0,2 = { I0 · I1 }
  QM1,2 = { I̅0 · I1 }
  QM2,2 = ∅

• m = 2.
  QM(1) = { I1 }
  QM0,2 = { I0 · I1 }
  QM1,2 = { I̅0 · I1 }
  QM2,2 = ∅

• m = 1.
  QM(0) = { I1 }
  QM0,1 = { I1 }
  QM1,1 = ∅

The method yields O1 = I1.

Example. The method applied to the carry flag

Cout = A0 · A1 · C̅in + A̅0 · A1 · Cin + A0 · A̅1 · Cin + A0 · A1 · Cin

of the full adder is as follows

• m = 3.
  QM(3) = { A0 · A1 · C̅in, A̅0 · A1 · Cin, A0 · A̅1 · Cin, A0 · A1 · Cin }
  QM0,3 = { A0 · A1 · Cin }
  QM1,3 = { A0 · A1 · C̅in, A̅0 · A1 · Cin, A0 · A̅1 · Cin }
  QM2,3 = ∅
  QM3,3 = ∅

• m = 3.
  QM(2) = { A0 · A1 · C̅in, A1 · Cin, A0 · A̅1 · Cin }
  QM0,3 = { A0 · A1 · Cin }
  QM1,3 = { A0 · A1 · C̅in, A̅0 · A1 · Cin, A0 · A̅1 · Cin }
  QM2,3 = ∅
  QM3,3 = ∅

• m = 3.
  QM(2) = { A0 · A1 · C̅in, A1 · Cin, A0 · Cin }
  QM0,3 = { A0 · A1 · Cin }
  QM1,3 = { A0 · A1 · C̅in, A̅0 · A1 · Cin, A0 · A̅1 · Cin }
  QM2,3 = ∅
  QM3,3 = ∅

• m = 3.
  QM(2) = { A0 · A1, A1 · Cin, A0 · Cin }
  QM0,3 = { A0 · A1 · Cin }
  QM1,3 = { A0 · A1 · C̅in, A̅0 · A1 · Cin, A0 · A̅1 · Cin }
  QM2,3 = ∅
  QM3,3 = ∅

• m = 2.
  QM(1) = { A0 · A1, A1 · Cin, A0 · Cin }
  QM0,2 = QM(1)
  QM1,2 = ∅
  QM2,2 = ∅
  QM3,2 = ∅

• m = 1.
  QM(0) = { A0 · A1, A1 · Cin, A0 · Cin }
  QM0,1 = ∅
  QM1,1 = ∅
  QM2,1 = ∅
  QM3,1 = ∅

Thus we have reduced the expression to one consisting of only two types of operations
and 5 operations in total. This is a large reduction compared to the original total
of 11 operations. The example also illustrates that the process is long but simple
enough to implement, making it a good application for a computing device.

2.7 Example Programs


2.7.1 Efficient Set Operations Using Boolean Algebra
Let
U := { o0, o1, …, o_{n-1} }

be the universal set of n objects. Any subset A of U can be represented with a
sequence of bits

A = a0 a1 … a_{n-1}

where ai = 1 if oi ∈ A and ai = 0 otherwise. For example, let n = 8 and consider
the bitstring

10010111.

Now the set A ∪ B (union of A and B) corresponds to

(a0 + b0)(a1 + b1) … (a_{n-1} + b_{n-1})

and A ∩ B (intersection of A and B) corresponds to

(a0 · b0)(a1 · b1) … (a_{n-1} · b_{n-1}).

The complement of A corresponds to

A̅ := a̅0 a̅1 … a̅_{n-1}.

For example if

A = 11010100
B = 01101101

then

A ∪ B = 11111101
A ∩ B = 01000100
A̅     = 00101011.

The following C++ program bitset.cpp implements these concepts. The class
BitSet implements all the bitwise operations introduced above. We could also use
the bitset class which is part of the standard template library, and which includes
all the methods needed to implement complement, intersection and union.

// bitset.cpp

#include <iostream>
#include <string>

using namespace std;

class SetElementBase
{
   public:
   virtual void output(ostream&) = 0;
};

template <class T>
class SetElement: public SetElementBase
{
   protected:
   T data;
   public:
   SetElement(T t) : data(t) {}
   virtual void output(ostream& o) { o << data; }
};

class BitSet
{
   protected:
   char *set;
   int len;
   SetElementBase **universe;
   static int byte(int);
   static char bit(int);
   public:
   BitSet(SetElementBase**,int,int*,int);
   BitSet(const BitSet&);
   BitSet &operator=(const BitSet&);
   BitSet operator+(const BitSet&) const;  // union
   BitSet operator*(const BitSet&) const;  // intersection
   BitSet operator-(void) const;           // complement
   void output(ostream&) const;
   ~BitSet();
};

int BitSet::byte(int n) { return n>>3; }

char BitSet::bit(int n) { return 1<<(n%8); }

// Create a BitSet with universe un of n elements,
// with m elements given by el
BitSet::BitSet(SetElementBase **un,int n,int *el,int m)
{
   int i;
   len = n;
   universe = un;
   set = new char[byte(len)+1];
   for(i=0;i<byte(len)+1;i++) set[i] = 0;
   if(m > 0)
   for(i=0;i<m;i++) set[byte(el[i])] |= bit(el[i]);
}

BitSet::BitSet(const BitSet &b)
{
   int i;
   len = b.len;
   universe = b.universe;
   set = new char[byte(len)+1];
   for(i=0;i<byte(len)+1;i++) set[i] = b.set[i];
}

BitSet &BitSet::operator=(const BitSet &b)
{
   if(this != &b)
   {
      int i;
      delete[] set;
      len = b.len;
      universe = b.universe;
      set = new char[byte(len)+1];
      for(i=0;i<byte(len)+1;i++) set[i] = b.set[i];
   }
   return *this;
}

BitSet BitSet::operator+(const BitSet &b) const
{
   if(universe == b.universe)
   {
      int i;
      BitSet c(universe,len,NULL,0);
      for(i=0;i<byte(len)+1;i++) c.set[i] = set[i]|b.set[i];
      return c;
   }
   else return *this;
}

BitSet BitSet::operator*(const BitSet &b) const
{
   if(universe == b.universe)
   {
      int i;
      BitSet c(universe,len,NULL,0);
      for(i=0;i<byte(len)+1;i++) c.set[i] = set[i]&b.set[i];
      return c;
   }
   else return *this;
}

BitSet BitSet::operator-(void) const
{
   int i;
   BitSet b(universe,len,NULL,0);
   for(i=0;i<byte(len)+1;i++) b.set[i] = ~set[i];
   return b;
}

void BitSet::output(ostream &o) const
{
   int i,start = 0;
   o << "{";
   for(i=0;i<len;i++)
   if((set[byte(i)]&bit(i)) != 0)
   { if(start) o << ", "; universe[i]->output(o); start = 1; }
   o << "}";
}

BitSet::~BitSet() { delete[] set; }

ostream &operator<<(ostream &o,const BitSet &b)
{
   b.output(o);
   return o;
}

int main(void)
{
   SetElement<int> s1(5);
   SetElement<string> s2(string("element"));
   SetElement<double> s3(3.1415927);
   SetElement<int> s4(8);
   SetElement<int> s5(16);
   SetElement<int> s6(3);
   SetElement<string> s7(string("string"));
   SetElement<double> s8(2.7182818);
   SetElement<int> s9(32);
   SetElement<int> s10(64);
   SetElementBase *universe[10] = {&s1,&s2,&s3,&s4,&s5,
                                   &s6,&s7,&s8,&s9,&s10};

   cout << "Universe=" << (-BitSet(universe,10,NULL,0)) << endl;
   cout << "Empty set=" << BitSet(universe,10,NULL,0) << endl;
   int a[7] = {0,1,2,5,6,8,9};
   int b[4] = {3,5,7,8};
   BitSet A(universe,10,a,7);
   BitSet B(universe,10,b,4);
   cout << "A=" << A << endl;
   cout << "B=" << B << endl;
   cout << "-A=" << (-A) << endl;
   cout << "A+B=" << (A+B) << endl;
   cout << "A*B=" << (A*B) << endl;
   return 0;
}

The output of the bitset.cpp program is

Universe={5, element, 3.14159, 8, 16, 3, string, 2.71828, 32, 64}
Empty set={}
A={5, element, 3.14159, 3, string, 32, 64}
B={8, 3, 2.71828, 32}
-A={8, 16, 2.71828}
A+B={5, element, 3.14159, 8, 3, string, 2.71828, 32, 64}
A*B={3, 32}

2.7.2 Quine-McKluskey Implementation


The next C++ program illustrates the Quine-McKluskey method for the full adder.
The algorithm is modified slightly to make the implementation easier. For an n-
Boolean variable expression the program maintains n + 1 sets S0, …, Sn where Si
contains product forms of exactly i variables. Initially only Sn is non-empty and
contains all the product forms from the expression to be simplified. The program fills
the set Si by combining and simplifying two product forms from Si+1 which only
differ in one literal (using previously discussed methods). Once all product
forms have been simplified, the product forms used in simplification are removed
from Si+1. As input, the program takes an array of product forms (where product
forms are arrays of characters) where the value 1 means the variable is present in
the product form and 0 means the negation of the variable is present in the product
form. Thus the arrays can be constructed directly from truth tables.

The program quine.cpp simplifies the carry and sum bits from the full adder. In
the main function we consider the expressions for Cout and S for the full adder. We
use an array of three char to represent a product form; a 1 indicates that the literal
in the product form is not a negated variable and a 0 indicates that the literal is a
negated variable. The variable is identified by the index in the array, for example
the program uses index 0 for A0, index 1 for A1 and index 2 for Cin. These arrays
(representing product forms) are placed in an array representing the final SOP form.

The function complementary searches for complementary literals in a pair of product
forms in order to perform simplification, AddItem is used to add elements to the
sets maintained in the algorithm, DeleteItem removes elements from these sets
(the simplification). The function QuineRecursive is the main implementation
of the algorithm. It is implemented using recursion. This is not necessary but
simplifies the implementation. The function QuineMcKluskey prepares the data for
the QuineRecursive function.

The output of the program is

Cout=A0.A1+A1.Cin+A0.Cin
S=NOT(A0).A1.NOT(Cin)+A0.NOT(A1).NOT(Cin)+NOT(A0).NOT(A1).Cin+A0.A1.Cin

which is the same result obtained in earlier examples. The sum bit could not be
simplified.

// quine.cpp

#include <iostream>

using namespace std;

struct QMelement
{
   int nvars,used;
   char *product;
   int *vars;
   QMelement *next;
};

int complementary(QMelement *p1,QMelement *p2)
{
   int sum = 0,i;
   if(p1->nvars != p2->nvars) return 0;
   for(i=0;i<p1->nvars;i++) sum += (p1->vars[i] != p2->vars[i]);
   if(sum == 0)
      for(i=0;i<p1->nvars;i++)
         sum += (p1->product[i] != p2->product[i]);
   else sum = 0;
   return (sum == 1);
}

void AddItem(QMelement* &list,char *product,int nvars,int *vars)
{
   int i;
   QMelement *item;
   if(list == (QMelement*)NULL)
   {
      list = new QMelement;
      list->nvars = nvars;
      list->next = (QMelement*)NULL;
      list->product = new char[nvars];
      list->vars = new int[nvars];
      list->used = 0;
      for(i=0;i<nvars;i++)
      {
         list->product[i] = product[i];
         list->vars[i] = vars[i];
      }
   }
   else
   {
      item = list;
      while(item->next != (QMelement*)NULL) item = item->next;
      item = (item->next = new QMelement);
      item->nvars = nvars;
      item->next = (QMelement*)NULL;
      item->product = new char[nvars];
      item->vars = new int[nvars];
      item->used = 0;
      for(i=0;i<nvars;i++)
      {
         item->product[i] = product[i];
         item->vars[i] = vars[i];
      }
   }
}

void DeleteItem(QMelement* &set,QMelement *item)
{
   QMelement *last = set;
   if(item == set) set = set->next;
   else
   {
      while(last->next != item) { last = last->next; }
      last->next = item->next;
   }
   delete[] item->product;
   delete[] item->vars;
   delete item;
}

void DeleteItems(QMelement *set)
{
   QMelement *item = set,*next;
   while(item != (QMelement*)NULL)
   {
      next = item->next;
      delete[] item->product;
      delete[] item->vars;
      delete item;
      item = next;
   }
}

void QuineRecursive(QMelement **sets,int index)
{
   if(index < 1) return;
   if(sets[index] == (QMelement*)NULL) return;
   int i,j;
   QMelement *item1 = sets[index],*item2;
   while(item1 != (QMelement*)NULL)
   {
      if(item1->next != (QMelement*)NULL)
      {
         item2 = item1->next;
         while(item2 != (QMelement*)NULL)
         {
            if(complementary(item1,item2))
            {
               char *product = new char[item1->nvars-1];
               int *vars = new int[item1->nvars-1];
               for(i=0,j=0;i<item1->nvars;i++)
                  if(item1->product[i] == item2->product[i])
                  {
                     product[j] = item1->product[i];
                     vars[j++] = item1->vars[i];
                  }
               AddItem(sets[index-1],product,item1->nvars-1,vars);
               delete[] product;
               delete[] vars;
               item1->used = item2->used = 1;
            }
            item2 = item2->next;
         }
      }
      item2 = item1;
      item1 = item1->next;
      if(item2->used) DeleteItem(sets[index],item2);
   }
   QuineRecursive(sets,index-1);
}

void QuineMcKluskey(char **sop,int nproducts,int nvars,const char **names)
{
   int i,j,*vars = new int[nvars];
   QMelement **sets = new QMelement*[nvars+1];
   for(i=0;i<=nvars;i++)
   { sets[i] = (QMelement*)NULL; if(i < nvars) vars[i] = i; }
   for(i=0;i<nproducts;i++) AddItem(sets[nvars],sop[i],nvars,vars);
   QuineRecursive(sets,nvars);
   delete[] vars;
   for(i=0;i<=nvars;i++)
   {
      QMelement *item = sets[i];
      while(item != (QMelement*)NULL)
      {
         for(j=0;j<item->nvars;j++)
         {
            if(item->product[j] == 1) cout << names[item->vars[j]];
            else cout << "NOT(" << names[item->vars[j]] << ")";
            if(j != item->nvars-1) cout << ".";
         }
         item = item->next;
         if(item != (QMelement*)NULL) cout << "+";
         else if(i != nvars)
            if(sets[i+1] != (QMelement*)NULL) cout << "+";
      }
      DeleteItems(sets[i]);
   }
   delete[] sets;
}

int main(void)
{
   // carry flag
   char c1[3]={1,1,0},c2[3]={0,1,1},c3[3]={1,0,1},c4[3]={1,1,1};
   // sum
   char s1[3]={0,1,0},s2[3]={1,0,0},s3[3]={0,0,1},s4[3]={1,1,1};
   char *Cout[4];
   char *S[4];
   const char *names[3] = {"A0","A1","Cin"};
   Cout[0] = c1;
   Cout[1] = c2;
   Cout[2] = c3;
   Cout[3] = c4;
   S[0] = s1;
   S[1] = s2;
   S[2] = s3;
   S[3] = s4;
   cout << "Cout="; QuineMcKluskey(Cout,4,3,names);
   cout << endl << "S="; QuineMcKluskey(S,4,3,names);
   cout << endl;
   return 0;
}
Chapter 3
Number Representation

3.1 Binary, Decimal and Hexadecimal Numbers


We are accustomed to using the decimal number system. For example the (decimal)
number 34062 can be written as

34062 = 3 · 10^4 + 4 · 10^3 + 0 · 10^2 + 6 · 10^1 + 2 · 10^0

where 10^1 = 10 and 10^0 = 1. In general, any positive integer can be represented in
one and only one way in the form

a0 · 10^0 + a1 · 10^1 + a2 · 10^2 + … + ak · 10^k

where 0 ≤ ai ≤ 9 for 0 ≤ i ≤ k and ak > 0. This number is denoted

ak a_{k-1} … a1 a0

in standard decimal notation.

For any integer r > 1, every positive integer n can be represented uniquely in the
form

a0 · r^0 + a1 · r^1 + a2 · r^2 + … + am · r^m

where 0 ≤ ai ≤ r − 1 for 0 ≤ i ≤ m and am > 0 and r^0 = 1. This can be proved by
induction on n.

In particular, every positive integer can be represented in binary notation

a0 · 2^0 + a1 · 2^1 + … + am · 2^m

where ai ∈ {0, 1} for 0 ≤ i ≤ m and am = 1.


Obviously we have 1 = 1 · r^0. Further if

a = a0 · r^0 + a1 · r^1 + … + am · r^m

where 0 ≤ ai ≤ r − 1 for 0 ≤ i ≤ m and am > 0 and r^0 = 1, let k be the least
integer in {0, 1, …, m} with ak < r − 1. Either such a k exists, which gives

a + 1 = (ak + 1) · r^k + a_{k+1} · r^{k+1} + … + am · r^m

(the digits a0, …, a_{k-1}, all equal to r − 1, become 0), or all digits equal r − 1,
which gives

a + 1 = r^{m+1}.

Example. The number 23 (in decimal notation) has the binary representation 10111,
since 2^4 + 2^2 + 2^1 + 2^0 = 23. The decimal number 101 has the binary representation
1100101, i.e. 2^6 + 2^5 + 2^2 + 2^0 = 101.

A procedure for finding the binary representation of a number n is to find the highest
power 2^m which is ≤ n, subtract 2^m from n, then find the highest power 2^j which
is ≤ n − 2^m, etc.

Although the computer operates on binary-coded data, it is often more convenient


for us to view this data in hexadecimal (base 16). There are three reasons for this:

1. Binary machine code is usually long and difficult to assimilate. Hexadecimal,


like decimal, is much easier to read.

2. There is a direct correspondence between binary and hexadecimal. Thus, we


can easily translate from hexadecimal to binary.

3. In today's CPUs the length of the storage elements (called registers) is gener-
ally a multiple of 8 bits (typically 32 or 64 bits). The general purpose registers
are 32 bits long or 64 bits long. Thus it is convenient to show contents as
multiples and fractions of 16 - hexadecimal. The storage sizes are 8 bits (a
byte), 16 bits (a word), 32 bits (a doubleword), 64 bits (a quadword), and 80
bits (a tenbyte) - all multiples and fractions of 16.

Thus, although we think in decimal and the computer thinks in binary, hexadecimal
is a number system that captures some of the important elements of both. In the
remainder of this section we discuss the binary, decimal, and hexadecimal number
systems and the methods for converting from one number system to another.

3.1.1 Conversion
In this section we describe the conversion from binary to hexadecimal, from hex-
adecimal to binary, binary to decimal, decimal to binary, decimal to hexadecimal,
and hexadecimal to decimal.

Binary to Hexadecimal. To see the one-to-one correspondence between hexadecimal
and binary, notice that if we use b to represent a bit and

… b8 b7 b6 b5 b4 b3 b2 b1 b0

is a binary number, then it has a value of

… + b8 · 2^8 + b7 · 2^7 + b6 · 2^6 + b5 · 2^5 + b4 · 2^4 + b3 · 2^3 + b2 · 2^2 + b1 · 2^1 + b0

or

… + 256b8 + 128b7 + 64b6 + 32b5 + 16b4 + 8b3 + 4b2 + 2b1 + b0

which can be written

… + (8b11 + 4b10 + 2b9 + b8) · 16^2 + (8b7 + 4b6 + 2b5 + b4) · 16 + (8b3 + 4b2 + 2b1 + b0).

Each of the sums in parentheses is a number between 0 (if all the b values are 0)
and 15 (if all the b values are 1). These are exactly the digits in the hexadecimal
number system. Thus to convert from binary to hexadecimal, we must gather up
groups of 4 binary digits.

Example. Convert the following binary word to hexadecimal.

0010 1011 0011 1000 b
  2    B    3    8

That is, 0010101100111000b = 2B38h.
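The grouping of 4 binary digits can also be done by a short program. The following
C++ sketch (added here for illustration) converts a binary string to hexadecimal.

// bin2hex.cpp

#include <iostream>
#include <string>

using namespace std;

int main(void)
{
   string bin = "0010101100111000";
   string hex;
   const char digits[] = "0123456789ABCDEF";
   // pad on the left so that the length is a multiple of 4
   while(bin.length()%4 != 0) bin = "0" + bin;
   // gather groups of 4 binary digits
   for(string::size_type i=0;i<bin.length();i+=4)
   {
      int v = 0;
      for(int j=0;j<4;j++) v = 2*v + (bin[i+j]-'0');
      hex += digits[v];
   }
   cout << bin << "b = " << hex << "h" << endl;   // 2B38h
   return 0;
}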

The notation in assembly language is as follows. The letter b indicates that the
number is in binary representation and the letter h indicates that the number is in
hexadecimal representation. This is the notation used in assembly language. The
letter d indicates a decimal number. The default value is decimal.

In C, C++, and Java decimal, octal and hexadecimal numbers are available. Hex-
adecimal numbers are indicated by 0x… in C, C++ and Java. For example the
decimal number 91 would be expressed as 0x5B, since

5 · 16^1 + 11 · 16^0 = 91.

Hexadecimal to Binary. To convert from hexadecimal to binary, we perform the
opposite process from that used to convert from binary to hexadecimal. We must
expand each hexadecimal digit to four binary digits.

Example. Convert the hexadecimal number D0h to binary. We find

D → 1101,   0 → 0000.

That is, D0h = 11010000b.

Example. Convert FFh to binary.

F → 1111,   F → 1111.

Thus FFh = 11111111b.

Binary to Decimal. Write the binary sequence in its place-value summation form
and then evaluate it.

Example.

10101010b = 2^7 + 2^5 + 2^3 + 2^1 = 128 + 32 + 8 + 2 = 170d.

Decimal to Binary. Divide the decimal number successively by 2; remainders are
the coefficients of 2^0, 2^1, 2^2, …. We know that any non-negative integer a can be
written in the form

a = a0 + a1 · 2^1 + … + am · 2^m.

The technique follows simply from

Σ_{i=k}^{m} ai · 2^{i-k} = 2 · Σ_{i=k+1}^{m} ai · 2^{i-k-1} + ak.

The remainder after integer division by 2 gives ak, and we continue until the division
gives 0. The following example illustrates this.
gives O. The following example illustrates this.

Example. Convert 345d to binary.

345/2 = 172, remainder 1; coefficient of 2^0 is 1
172/2 = 86, remainder 0; coefficient of 2^1 is 0
86/2 = 43, remainder 0; coefficient of 2^2 is 0
43/2 = 21, remainder 1; coefficient of 2^3 is 1
21/2 = 10, remainder 1; coefficient of 2^4 is 1
10/2 = 5, remainder 0; coefficient of 2^5 is 0
5/2 = 2, remainder 1; coefficient of 2^6 is 1
2/2 = 1, remainder 0; coefficient of 2^7 is 0
1/2 = 0, remainder 1; coefficient of 2^8 is 1

Thus, 345d = 101011001b.

This method works because we want to find the coefficients b0, b1, b2, … (which are
0 or 1) of 2^0, 2^1, 2^2, and so on. Thus, in the preceding example,

345 = b10 · 2^10 + b9 · 2^9 + … + b1 · 2 + b0.

Dividing by 2,

345/2 = b10 · 2^9 + b9 · 2^8 + … + b1 + (b0/2).

Thus b0 is the remainder on division by 2 and

b10 · 2^9 + b9 · 2^8 + … + b1

is the quotient.

The following C++ program finds the binary representation of a non-negative inte-
ger. The operator % is used to calculate the remainder after integer division.

// remain.cpp

#include <iostream>

using namespace std;

int main(void)
{
   int i;
   unsigned long N = 345;
   unsigned long array[32];
   for(i=0;i<32;i++) { array[i] = 0; }
   for(i=0;i<32;i++) { array[i] = N%2; N = N/2; }
   for(i=31;i>=0;i--) { cout << array[i]; }
   return 0;
}

Decimal to Hexadecimal. Divide the decimal number successively by 16; remainders
are the coefficients of 16^0, 16^1, 16^2, …

Example.

302/16 = 18, remainder 14; coefficient of 16^0 is E
18/16 = 1, remainder 2; coefficient of 16^1 is 2
1/16 = 0, remainder 1; coefficient of 16^2 is 1

Therefore, 302d = 12Eh.

This works for the same reason that the method for decimal-to-binary conversion
works. That is, division by 16 produces as a remainder the coefficient (h0) of 16^0,
and as a quotient the decimal number minus the quantity (h0 · 16^0), divided by 16.

Hexadecimal to Decimal. Write the hexadecimal number in its place-value summa-
tion form and then evaluate.

Example.

CA14h = C · 16^3 + A · 16^2 + 1 · 16^1 + 4 · 16^0.

Thus

CA14h = 12 · 4096 + 10 · 256 + 1 · 16 + 4 = 51732d.
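In C and C++ the standard library function strtoul carries out this place-value
evaluation. The following short program (a sketch added here) converts CA14h to
decimal.

// hex2dec.cpp

#include <iostream>
#include <cstdlib>

using namespace std;

int main(void)
{
   // strtoul(string,end,base) evaluates the string in the given base
   unsigned long n = strtoul("CA14",NULL,16);
   cout << n << endl;   // 51732
   return 0;
}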

The following C++ program finds the hexadecimal representation of a non-negative
integer.

// remain2.cpp

#include <iostream>

using namespace std;

int main(void)
{
   int i;
   unsigned long N = 15947;
   unsigned char array[8];

   for(i=0;i<8;i++)
      array[i] = 0;

   for(i=0;i<8;i++)
   {
      array[i] = N%16;
      if(array[i] > 9)
         array[i] += 'A'-10;
      else
         array[i] += '0';
      N = N/16;
   }

   for(i=7;i>=0;i--)
      cout << array[i];
   return 0;
}

The output is 00003E4B.



3.1.2 Arithmetic
The rules for addition of binary numbers are:

0 + 0 = 0
0 + 1 = 1
1 + 0 = 1
1 + 1 = (1) 0

where (1) denotes a carry of 1. Note that 10b is the binary equivalent of 2 decimal.
Thus the sum 1 + 1 requires two bits to represent it, namely 10, the binary form
of the decimal number 2. This can be expressed as follows: one plus one yields a
sum bit s = 0 and a carry bit c = 1. If we ignore the carry bit and restrict the sum
to the single bit s, then we obtain 1 + 1 = 0. This is a very useful special form of
addition known as modulo-2 addition.

Doing arithmetic in the binary and hexadecimal number systems is best shown by
examples and best learned by practice.
Example. Decimal Arithmetic

   45
 + 57
 ----
  102

Remember that 7 + 5 is 2 with a 1 carry in decimal. 5 + 4 + the carried 1 is 0 with
a 1 carry.

Example. Binary Arithmetic

   1011
 + 1001
 ------
  10100

Remember that 1 + 1 is 0 with a 1 carry in binary.

Example. Binary Arithmetic

   1111
 + 1111
 ------
  11110

Example. Hexadecimal Arithmetic

   1A
 +  5
 ----
   1F

In decimal A + 5 is 10 + 5 = 15. Thus 15d = Fh.

Example. Hexadecimal Arithmetic

   FF
 +  3
 ----
  102

F + 3 is 15 + 3 in decimal. Thus 18d = 12h. Thus we write down a 2, carry a 1.

Binary multiplication can be done by repeated addition. The following example
shows how to multiply the two binary numbers 1001 (9 decimal) and 110 (6 decimal).

     1001   multiplicand
   x  110   multiplier
   ------
     0000   first partial product
    1001    second partial product
   1001     third partial product
   ------
   110110   product (54 decimal)

High speed multiplication techniques use addition and subtraction or uniform mul-
tiple shifts. Binary divisions can be performed by a series of subtractions and shifts.
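The shift-and-add scheme of the example above can be written directly in C++.
The following program (a sketch added here) multiplies 1001b by 110b by adding
shifted partial products.

// binmult.cpp

#include <iostream>

using namespace std;

int main(void)
{
   unsigned long multiplicand = 9;   // 1001b
   unsigned long multiplier = 6;     // 110b
   unsigned long product = 0;
   while(multiplier != 0)
   {
      // add the partial product if the current multiplier bit is 1
      if(multiplier & 1) product += multiplicand;
      multiplicand <<= 1;   // shift for the next partial product
      multiplier >>= 1;
   }
   cout << product << endl;   // 54
   return 0;
}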

3.1.3 Signed Integers


We can easily see how positive integers are stored. For example, 345 is stored as
101011001. This will not fit into a byte because it has more than 8 bits, but it fits
into a word (2 consecutive bytes). A byte has 8 bits, a word has 16 bits and a double
word has 32 bits.

Example. Show 65712d as a binary (a) byte, (b) word, (c) double word. We have

65712d = 10000000010110000b

Thus

(a) Does not fit in a byte (it is too large).

(b) Does not fit in a word (it is too large).

(c) 00000000000000010000000010110000

The largest integer number which fits into a register with 32 bits is

2^32 − 1 = Σ_{k=0}^{31} 2^k = 4294967295.

The largest integer number which fits into a register with 64 bits is

2^64 − 1 = Σ_{k=0}^{63} 2^k = 18446744073709551615.

Storing negative integers presents a more difficult problem since the negative sign
has to be represented (by a 0 or a 1) or some indication has to be made (in binary!)
that the number is negative. There have been many interesting and ingenious ways
invented to represent negative numbers in binary. We discuss three of these here:

1. Sign and magnitude

2. One's complement

3. Two's complement

Sign and Magnitude

The sign and magnitude representation is the simplest method to implement nega-
tive integers. Knuth [105] used sign and magnitude in his mythical MIX computer.
In sign and magnitude representation of signed numbers, the leftmost (most signif-
icant) bit represents the sign:

0 for positive

and

1 for negative.

Example. The positive integer number 31 stored in a double word (32 bits) using
sign and magnitude representation is

00000000000000000000000000011111b

Thus the negative integer -31 becomes

10000000000000000000000000011111b

There are two drawbacks to sign and magnitude representation of signed numbers:
1. There are two representations of 0:

+0 = 00000000000000000000000000000000b

and

-0 = 10000000000000000000000000000000b.

Thus the CPU has to make two checks every time it tests for 0. Checks for 0
are done frequently, and it is inefficient to make two such checks.
2. Obviously,
a + (-b)
is not the same as
a- b.
What this means is that the logic designer must build separate circuits for
subtracting; the adding circuit used for a + b is not sufficient for calculating
a-b.
Example. The following shows that

52 - 31
and

52 + (-31)
are not the same in sign and magnitude representation.

   52 =  00000000 00000000 00000000 00110100b
 - 31 = -00000000 00000000 00000000 00011111b
 --------------------------------------------
   21 =  00000000 00000000 00000000 00010101b

On the other hand

    52 = 00000000 00000000 00000000 00110100b
 + -31 = 10000000 00000000 00000000 00011111b
 --------------------------------------------
         10000000 00000000 00000000 01010011b

which in sign and magnitude representation is -83, not 21.

Thus this shows that the sign and magnitude representation is not useful for imple-
mentations on CPUs.

Furthermore 31 - 31 gives

   31 =  00000000 00000000 00000000 00011111b
 - 31 = -00000000 00000000 00000000 00011111b
 --------------------------------------------
    0 =  00000000 00000000 00000000 00000000b

and 31 + (-31) gives

    31 = 00000000 00000000 00000000 00011111b
 + -31 = 10000000 00000000 00000000 00011111b
 --------------------------------------------
         10000000 00000000 00000000 00111110b

which represents -62, not 0.



One's Complement
One's complement method of storing signed integers was used in computers more in
the past than it is currently. Here we assume again that 32 bits are given (double
word). In one's complement, the leftmost bit is still 0 if the integer is positive. For
example,

00000000000000000000000000011111b

still represents +31 in binary. To represent the negative of this, however, we replace
all 0's with 1's and all 1's with 0's. Thus

11111111111111111111111111100000b

represents -31. Note that the leftmost bit is again 1. Notice that in assembly
language one starts counting from zero from the rightmost bit.

Example. Using a double word of storage, -1 is stored as

11111111111111111111111111111110b

since 1 is stored in binary as

00000000000000000000000000000001b

Thus the second drawback to sign and magnitude representation has been elimi-
nated. This means a - b is the same as a + (-b). Thus the circuit designer need only
include an adder; it can also be used for subtraction by replacing all subtractions
a - b with a + (-b).

The following example shows, however, that this adder must do a little more than
just add.

Example. We show that 52 - 31 and 52 + (-31) are the same in one's complement
representation. For 52 - 31 we have

   52 =  00000000 00000000 00000000 00110100b
 - 31 = -00000000 00000000 00000000 00011111b
 --------------------------------------------
   21 =  00000000 00000000 00000000 00010101b

Next we consider the one's complement. Since

31 = 00000000 00000000 00000000 00011111b

we find for the one's complement

-31 = 11111111 11111111 11111111 11100000b

Addition of the two terms 52 + (-31) yields

00000000 00000000 00000000 00010100b

plus an overflow bit. Addition of the overflow bit to the right-most bit yields

00000000 00000000 00000000 00010101b

which is 21 in binary representation.

The adder for one's complement arithmetic is more complicated; it must carry
around any overflow bit in order to work correctly for subtraction. The first draw-
back is still with us, however. In one's complement, there are still two representa-
tions of 0:

00000000 00000000 00000000 00000000b   positive 0

and

11111111 11111111 11111111 11111111b   negative 0

when viewed as a double word.
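The end-around carry can be modelled in a few lines of C++. The following program
(a sketch added here, using 8-bit values for brevity) computes 52 + (-31) in one's
complement.

// onesadd.cpp

#include <iostream>

using namespace std;

unsigned char ones_add(unsigned char a,unsigned char b)
{
   unsigned int s = a + b;
   // add any carry out of the top bit back to the rightmost bit
   if(s > 0xFF) s = (s & 0xFF) + 1;
   return (unsigned char) s;
}

int main(void)
{
   unsigned char a = 52;    // 00110100b
   unsigned char b = ~31;   // one's complement of 31: 11100000b
   cout << (int) ones_add(a,b) << endl;   // 21
   return 0;
}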

One's complement is implemented in C, C++ and Java with the ~ operator. The
following program shows an application.

// complement.cpp

#include <iostream>

using namespace std;

char *binary(unsigned int N)
{
   static char array[36];

   for(int i=34,j=27;i>=0;i--)
      if((i == j) && (i != 0)) { array[i] = ' '; j -= 9; }
      else { array[i] = N%2 + '0'; N = N/2; }
   array[35] = '\0';
   return array;
}

int main(void)
{
   int a = 17;   // binary 000000000 00000000 00000000 0010001
   cout << "a = " << a << endl << binary(a) << endl;
   int b = ~a;   // binary 111111111 11111111 11111111 1101110
   cout << "~a = " << b << endl << binary(b) << endl;
   return 0;
}

Two's Complement

The two's complement method of storing signed integers is used in most present-day
CPUs, including the 386, 486, Pentium and DEC Alpha. The two's complement is
formed by

(1) forming the one's complement and then

(2) adding 1.

Example. Using two's complement and a double word (32 bits), the decimal
number 31 is stored as

00000000000000000000000000011111b.

Consequently -31 is stored as

11111111111111111111111111100001b.

We can easily check that

a - b = a + (-b)

and that there is only one way to represent 0, i.e., +0 and -0 are stored the same,
namely

00000000000000000000000000000000b.

The two's complement of a number is the true (one's) complement of the number
plus 1.

With n bits we can represent numbers from

-2^{n-1} to 2^{n-1} − 1

in two's complement. If we have registers with 32 bits then we can store the integer
numbers (n = 32)

-2147483648 to 2147483647.
Although taking the two's complement of a number is more difficult than taking its
one's complement, addition of two's complement numbers is simpler than addition
in one's complement or in signed-magnitude representations.

Next we consider some examples of addition in two's complement. We assume that


32 bits are given.

Addition of Two Positive Numbers

   +3 = 00000000 00000000 00000000 00000011b
   +4 = 00000000 00000000 00000000 00000100b
 --------------------------------------------
   +7 = 00000000 00000000 00000000 00000111b

Addition of Two Negative Numbers

   -4 = 11111111 11111111 11111111 11111100b
   -1 = 11111111 11111111 11111111 11111111b
 --------------------------------------------
   -5 = 11111111 11111111 11111111 11111011b

Addition of One Positive and One Negative Number

   -7 = 11111111 11111111 11111111 11111001b
   +5 = 00000000 00000000 00000000 00000101b
 --------------------------------------------
   -2 = 11111111 11111111 11111111 11111110b


In two's complement, it is possible to add or subtract signed numbers, regardless of
the sign. Using the usual rules of binary addition, the result comes out correctly,
including the sign. The carry is ignored. This is a very significant advantage. If this
were not the case, we would have to correct the result for sign every time, causing
a much slower addition or subtraction time. For the sake of completeness, let us
state that two's complement is simply the most convenient representation to use
for microprocessors. All signed integers will be implicitly represented internally in
two's complement notation.
The following Java program shows an implementation of two's complement.
// Twocomp.java

public class Twocomp
{
   public static void main(String[] args)
   {
      int r1 = 14;   // binary 1110

      // two's complement to find the negative number
      // of a given integer number.
      // The operation ~ gives the one's complement
      // and then we add 1 to find the two's complement
      int r2 = ~r1; r2++;
      System.out.println(r2);   // => -14
   }
}

3.1.4 Overflow
If we do arithmetic operations with 32 bit registers overflow will occur in these cases:

1. if we go beyond the range 0 to 4294967295 for the data type unsigned long
in C and C++. This means we add numbers so that the sum is larger than
4294967295. Also negative numbers are out of range.

2. if we go out of the range -2147483648 to 2147483647 (long in C and C++).
This means if we add or subtract numbers which go beyond this range.

Example. Consider the sum

4294967295 + 1 = 4294967296.

The number on the right-hand side is out of the range for a 32 bit register for the
C and C++ data type unsigned long. Since

4294967295 = 11111111111111111111111111111111b

for unsigned long, the addition of 1 yields

00000000000000000000000000000000b

with one overflow bit. Thus the output is 0.

Example. Consider the sum

(-2147483648) + (-3) = -2147483651.

The number on the right-hand side is out of range for a 32 bit register for long.
Since

-2147483648 = 10000000000000000000000000000000b

and

-3 = 11111111111111111111111111111101b

we obtain

01111111111111111111111111111101b.

Thus the output is 2147483645.

Consider the following C++ program.

// overflow.cpp

#include <iostream>

using namespace std;

int main(void)
{
   unsigned long a = 4294967295;
   unsigned long b = 1;
   unsigned long r1;
   r1 = a + b;
   cout << "r1 = " << r1 << endl;   // 0

   unsigned long c = 4294967295;
   unsigned long d = 2;
   unsigned long r2;
   r2 = c + d;
   cout << "r2 = " << r2 << endl;   // 1

   unsigned long e = 0;
   unsigned long f = -1;
   unsigned long r3 = e + f;
   cout << "r3 = " << r3 << endl;   // 4294967295

   long g = -2147483648;
   long h = -3;
   long r4 = g + h;
   cout << "r4 = " << r4 << endl;   // 2147483645

   unsigned long i = 0xFFFFFFFF;   // hexadecimal number
   unsigned long j = 0x1;          // hexadecimal number
   unsigned long r5 = i + j;
   cout << "r5 = " << r5 << endl;   // 0

   return 0;
}

The range of the data type unsigned long is 0 to 4294967295. The binary repre-
sentation of 4294967295 is

11111111 11111111 11111111 11111111b

This is the largest number which fits into 32 bits. Thus if we add 1 to this binary
number under the assumption that 32 bits are given we find

00000000 00000000 00000000 00000000b

with 1 overflow bit. The output of the C++ program ignores the overflow bit and
displays the output 0.

Remark. Obviously the overflow flag of the CPU will be set.

Analogously we understand the other three outputs,

r2 = 1
r3 = 4294967295
r4 = 2147483645

of the program.

Java has the signed data type long. The size is 64 bits. Thus the range is

-9223372036854775808 to 9223372036854775807.

3.1.5 Binary-Coded Decimal Form


The discussion so far has assumed that decimal numbers are translated into base-2
form for processing by digital circuits. An alternative approach is to encode the
decimal digits into binary form, but maintain the base-lO positional notation in
which all digits are weighted by powers of 10. Such numbers are called binary-coded
decimal numbers or, if the context is clear, simply decimal numbers. We usually
restrict the term binary-coded decimal, which is abbreviated BCD, to refer to the
most widely used code of this sort.

An unsigned decimal number

N10 = d_{n-1} d_{n-2} … d1 d0

is converted into the standard BCD form by mapping each digit di separately into
a 4-bit binary number Bi

where (Bi)2 = (di)10. Thus, a 9 in N10 is mapped into 1001, an 8 into 1000, a 7 into
0111, and so on. For example, if N10 = 7109_10, then the decimal-to-BCD conversion
process takes the form

7 → 0111,  1 → 0001,  0 → 0000,  9 → 1001

leading to

N10 = 0111 0001 0000 1001_10

where the underlined subscript 10 is our notation for binary-coded decimal. This
conversion process is, in fact, the same as that used for changing a hexadecimal
conversion process is, in fact, the same as that used for changing a hexadecimal
number to binary. In this case, there are 10 digits instead of 16, so only 10 of the 16
possible 4-bit binary numbers are needed. Also, each 4-bit group must be assigned
weight 10 rather than 16. For example, we get

0111 0001 0000 1001_10 = 7 · 10^3 + 1 · 10^2 + 0 · 10^1 + 9 · 10^0 = 7109

assuming N as an integer. The weight of an individual bit in N10 is of the form
j · 10^i, where j is 8, 4, 2, or 1. Standard BCD is therefore sometimes called 8421
decimal code.

Conversion from BCD to ordinary decimal form is achieved by replacing 4-bit groups
with the equivalent decimal digit. For instance,

0010 1000 0100 1001 0000 0101_10 → 284905

implying that N′10 = 284905_10. Conversion between binary (base 2) and BCD re-
quires the decimal-binary conversion procedure, in addition to the decimal digit-
encoding procedure discussed above.
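The replacement of 4-bit groups can be done with shifts and masks. The following
C++ program (a sketch added here) converts the BCD value 0x7109 back to the
integer 7109.

// bcd2dec.cpp

#include <iostream>

using namespace std;

int main(void)
{
   unsigned long bcd = 0x7109;   // BCD encoding of 7109
   unsigned long n = 0, weight = 1;
   while(bcd != 0)
   {
      n += (bcd & 0xF)*weight;   // low 4 bits give one decimal digit
      bcd >>= 4;                 // next 4-bit group
      weight *= 10;              // weighted by powers of 10
   }
   cout << n << endl;   // 7109
   return 0;
}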

Not all the possible binary patterns correspond to BCD numbers. The six 4-bit
patterns

{ 1010, 1011, 1100, 1101, 1110, 1111}


are not needed to represent decimal digits, and therefore cannot appear in the digit
positions of a BCD number. These six digit patterns appear in BCD numbers only
as the result of an error, such as failure of a hardware component or a program-
ming mistake. The fact that some n-bit patterns are unused implies that a larger
n is needed to represent a given range of numbers if BCD code is employed in
place of binary. For example, to represent all integers from zero to a million re-
quires 24 bits of BCD code, but only 20 bits of binary code. Note that 2^20 > 10^6.
Therefore, BCD numbers have slightly greater storage requirements than binary
numbers. Arithmetic operations on BCD numbers are more complicated than their
binary counterparts. The main advantage of BCD code is that it eliminates most
of the need for time-consuming base-1O-to-base-2 and base-2-to-base-1O conversions.
Digital computers are designed primarily to process binary numbers, but many have
features to support BCD operations.

The following C++ program converts a number (N = 15947) to BCD form.

// bcd.cpp

#include <iostream>

using namespace std;

int main(void)
{
   int i;
   unsigned long N = 15947;
   unsigned char array[4];
   unsigned char mask[2] = {0x0F,0xF0};
   unsigned char shift[2] = {0,4};
   for(i=0;i<4;i++)
      array[i] = 0;
   for(i=0;i<8;i++)
   { array[i/2] |= (N%10) << shift[i%2]; N = N/10; }
   for(i=7;i>=0;i--)
   { cout << char(((array[i/2]&mask[i%2]) >> shift[i%2])+'0'); }
   return 0;
}

3.2 Floating Point Representation


3.2.1 Introduction
We have seen how to store integer numbers in bit sequences, and in Chapter 2 an
example illustrated a similar method for representing real numbers on a certain in-
terval with a guaranteed accuracy. The interval was divided into equal subintervals
according to the number of bit string combinations allowed. Certain real numbers
can be converted without loss of accuracy. These are the numbers which are bound-
aries for the subintervals. The other real numbers will be encoded, with error, to one
of these numbers. This is the fixed point method of storing real numbers. The max-
imum accuracy of the fractions is uniquely specified by the size of the subintervals.
Thus we can identify a fractional part and an integer part giving a fixed separation
of bits representing the integer and fractional parts respectively. Traditionally this
separation is indicated by a decimal point in the symbolic representation. Floating
point number representations use unequal subintervals to represent real numbers.
For example small numbers require better accuracy in calculations and thus are rep-
resented with smaller intervals. Larger numbers use larger intervals. To compromise
we may require that the ratio of the error in representation to the number being
represented be approximately constant.

Storing floating-point numbers presents a problem similar to that of storing signed


integers. For integers, some indication of a positive or negative sign has to be rep-
resented. For floating-point instructions some method must be devised for showing
where the decimal point should go. That is, we must distinguish between the frac-
tional part to the right of the decimal point - called the mantissa - and the integer
portion to the left of the decimal point. Different methods have been used in the
past and different methods continue to be used by the various manufacturers of com-
puters. There have been so many different ways of coding a floating-point number
into binary that the Institute of Electrical and Electronics Engineers (IEEE) has
proposed a standard format.

There are actually three formats - one that requires 32 bits, one that is used for 64
bits, and one for 80 bits. We describe the 32-bit format, called the short real format,
here.

The table lists seven numeric data types showing the data format for each type. The
table also shows the approximate range of normalized values that can be represented
with each type. Denormal values are also supported in each of the real types, as
required by IEEE Std 854.

Table: Numeric Data Types

Data Type        Bits   Significant Digits   Approximate Normalized
                        (Decimal)            Range (Decimal)

Word Integer      16     4                   -32768 ≤ x ≤ +32767
Short Integer     32     9                   -2 × 10^9 ≤ x ≤ +2 × 10^9
Long Integer      64     18                  -9 × 10^18 ≤ x ≤ +9 × 10^18
Packed Decimal    80     18                  -99…99 ≤ x ≤ +99…99 (18 digits)
Single Real       32     7                   1.18 × 10^-38 < |x| < 3.40 × 10^38
Double Real       64     15-16               2.23 × 10^-308 < |x| < 1.79 × 10^308
Extended Real     80     19                  3.37 × 10^-4932 < |x| < 1.18 × 10^4932

All operands are stored in memory with the least significant digits starting at the
initial (lowest) memory address. Numeric instructions access and store memory
operands using only this initial address.

3.2.2 Representation
The first step to understanding how a binary fraction is stored using short real
format is to normalize it. This is similar to putting a decimal point number into the
familiar scientific notation in which we have a sign, an exponent, and a mantissa.
To normalize a binary fraction, we write it so that the first 1 is just to the left of
the binary point.

Example. Consider the binary number

0.000111101 = 0·(1/2) + 0·(1/2^2) + 0·(1/2^3) + 1·(1/2^4) + 1·(1/2^5)
            + 1·(1/2^6) + 1·(1/2^7) + 0·(1/2^8) + 1·(1/2^9).

Then the normalized representation is

1.11101 · 2^-4.

The next step is to represent the important parts of the normalized fraction in 32
bits. The important parts are those that will allow us to recover the original number
(and allow the computer to perform operations on it). These parts are the

1. Sign

2. Exponent (whose base is understood to be 2)

3. Mantissa

In the IEEE short real format, the sign is stored in the leftmost bit, the exponent
is stored in the next 8-bits, after some alteration, and the mantissa is stored in the
rightmost 23 bits, again after a minor adjustment.

1. To store the sign. 0 for positive, 1 for negative.

2. To store the exponent. Add 127 (1111111b) to it. The number 127 is called a
bias, and the resulting exponent is called a biased exponent. Biased exponents
may range from 1 to 254, so that exponents range from -126 to +127.

3. To store the mantissa. Remove the leftmost 1 and store the rest of the fraction
left-adjusted. This technique of not storing the first 1 before the binary point
is a common way to store mantissas. It is called hidden bit storage. Computer
circuitry knows that the 1 is really part of the mantissa.

Example. Find 0.0390625 (base 10) as it would be stored in short real format.

Step 1. Convert the fraction to binary

0.0390625 (base 10) = 0.0000101 (base 2)

Step 2. Normalize the binary fraction.

0.0000101 normalized is 1.01 · 2^-5

Step 3. Calculate the sign, the exponent, and the mantissa.

Sign: 0, since this is a positive number

Exponent: -5 + 127 = 122 (base 10) = 01111010 (base 2)

Mantissa: .01 left-adjusted into a field of width 23 is

.01000000000000000000000

Thus the entire number is represented by the bitstring

0    01111010    01000000000000000000000
Sign Exponent    Fraction

The following C++ program implements this algorithm. The only difference is that
the actual conversion to binary is delayed until after the normalization procedure.
We use the above test example. For the output we find

0.0390625 (base 10) =
0 01111010 01000000000000000000000 (floating point base 2)

// float2bin.cpp

#include <iostream>
#include <cmath>

using namespace std;

void normalize(float &f,char &e)
{
   e = 0;
   // numbers 2 or larger we reduce
   // down to 1 plus a fraction,
   // and a compensating exponent
   while(fabs(f) >= 2)
   {
      f /= 2;
      e++;
   }

   // numbers smaller than 1 we promote
   // up to 1 plus a fraction,
   // and a compensating exponent
   while(fabs(f) < 1)
   {
      f *= 2;
      e--;
   }
}

void float2bin(float f,char *b)
{
   char e1;
   int e;
   int i;

   normalize(f,e1);

   // add the bias
   e = int(e1) + 127;

   // set the sign bit
   b[0] = (f < 0) ? '1' : '0';

   f = fabs(f);
   // remove the leftmost 1 bit
   f -= 1;

   b[1] = b[10] = ' ';

   // convert the exponent
   for(i=8;i>0;i--)
   {
      b[i+1] = e%2 + '0';
      e /= 2;
   }

   // convert the mantissa
   for(i=1;i<24;i++)
   {
      int bit = (f >= pow(2.0,-i));
      b[i+10] = (bit) ? '1' : '0';
      if(bit) f -= pow(2.0,-i);
   }

   b[34] = '\0';
}

int main(void)
{
   char b[35];
   float f = 0.0390625;

   float2bin(f,b);
   cout << f << " (base 10) = "
        << b << " (floating point base 2)" << endl;
   return 0;
}

Example. What number is stored as

10111110111101000000000000000000?

We recover the parts as

1  01111101  11101000000000000000000

Sign: 1, so the number is negative

Exponent: 01111101b = 125 (base 10), 125 - 127 = -2

Mantissa: affixing 1 to the left of

.11101000000000000000000

results in

1.11101000000000000000000

which is

1.11101 (base 2).

Multiplying by 2^-2 (provided by the exponent) yields

0.0111101b = 1/4 + 1/8 + 1/16 + 1/32 + 1/128 = 0.4765625

so the number stored is -0.4765625 (base 10).
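The decoding can also be left to the hardware. The following C++ program (a
sketch added here; it assumes that float uses the 32-bit IEEE short real format
and that unsigned int is 32 bits wide) copies the bit pattern of the example into
a float.

// bin2float.cpp

#include <iostream>
#include <cstring>

using namespace std;

int main(void)
{
   // 10111110 11110100 00000000 00000000b written in hexadecimal
   unsigned int bits = 0xBEF40000;
   float f;
   // reinterpret the 32-bit pattern as a float
   memcpy(&f,&bits,sizeof(float));
   cout << f << endl;   // -0.476562 (the stored value is -0.4765625)
   return 0;
}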
Chapter 4
Logic Gates

4.1 Introduction
A digital electronic system uses a building-block approach. Many small operational
units are interconnected to make up the overall system. The system's most basic
unit is the gate circuit. These circuits have one output and one or more inputs. The
most basic description of operation is given by the function table, which lists all
possible combinations of inputs along with the resulting output in terms of voltage,
high and low. Table 4.1(a) shows a function table for a 2-input circuit. This table
indicates that if both inputs are low or both are high, the output will be low. If
one input is high and the other is low, a high level will result on the output line.
As we deal with logic design, it is appropriate to use 1s and 0s rather than voltage
levels. Thus, we must choose a positive (H = 1, L = 0) or negative (H = 0, L =
1) logic scheme. Once this choice is made, we use the function table to generate a
truth table. The truth table describes inputs and outputs in terms of 1s and 0s
rather than voltage levels. Function tables are used by manufacturers of logic gates
to specify gate operation. The manufacturer conventionally defines gates in terms
of positive logic.

Inputs  Output      Inputs  Output      Inputs  Output
A1  A2    X         A1  A2    X         A1  A2    X
L   L     L          0   0    0          1   1    1
L   H     H          0   1    1          1   0    0
H   L     H          1   0    1          0   1    0
H   H     L          1   1    0          0   0    1

L = low voltage level, H = high voltage level
(a) Function table   (b) Positive logic   (c) Negative logic

Table 4.1: Function Table and Truth Tables for a Logic Circuit


4.2 Gates
4.2.1 AND Gate
The AND gate has one output and two or more inputs. The output will equal 0 for
all combinations of input values except when all inputs equal 1. When each input
is 1, the output will also equal 1. Figure 4.1 shows the AND gate. Table 4.2 shows
the function and positive logic truth tables. The AND gate will function as an OR
gate for negative logic, but the gate is named for its positive logic function.

A1 A2 X
0 0 0
0 1 0
1 0 0
1 1 1

Table 4.2: Truth Table for the AND Gate

Figure 4.1: Symbol for 2-input AND Gate

The AND operation can be interpreted as the multiplication of a set of 1-bit numbers;
a 0 among the input variables makes the result (product) 0; the product is 1 if and
only if all the inputs are 1. For this reason the AND function is written as a product
expression

X_AND := A1 · A2

or

X_AND := A1 · … · An

if we have n inputs. Alternative AND symbols in common use are ∧ and &. The
latter is the AND designator in the standard box symbol for an AND gate. As with
multiplication the symbol · is sometimes omitted from AND expressions, so that
A1 · A2 reduces to A1A2.

In CMOS the 4081 provides quad two-input AND gates.



4.2.2 OR Gate
The OR gate has one output and two or more inputs. If all inputs are equal to 0,
the output will be equal to O. The presence of a 1 bit leads to an output of 1. Table
4.3 describes this operation in terms of a truth table. The standard symbol for a
2-input OR gate is shown in Figure 4.2.

A1 A2 X
0 0 0
0 1 1
1 0 1
1 1 1

Table 4.3: Truth Table for the OR Gate

Figure 4.2: Symbol for 2-input OR Gate

The OR operation takes its name from the fact that the output X is 1 if and only
if A1 is 1 or A2 is 1. In other words, the output X of an OR gate is 1 if and only if
the number of 1s applied as input is one or greater.

We can have more than 2 input lines. Then X is 1 if and only if A1 is 1 or A2 is 1 or
… or An is 1. This interpretation leads to the use of the symbol ≥ 1 in the OR box.
By a somewhat weak analogy with numerical addition, the OR function is usually
written as a sum expression

X_OR := A1 + A2 + … + An.

Thus, + denotes OR in this context, and is read as "or" rather than plus. An
alternative OR symbol is ∨.

In CMOS the 4071 is a quad two-input OR gate and the CMOS 4072 is a dual
four-input OR gate.

4.2.3 XOR Gate


If the exclusive OR gate (XOR gate) has two inputs, then the output gives a 1 when
either input is 1, but not when both are 1. If the input is A1 = 0 and A2 = 0, then
the output is 0. Table 4.4 shows the truth table for the XOR gate.

A1 A2 X
0 0 0
0 1 1
1 0 1
1 1 0

Table 4.4: Truth Table for the XOR Gate

The generalization of XOR to n input variables is most easily specified in terms of
the parity of the number of 1s among the n input variables:

X_XOR(A1, A2, ..., An) := 1 if an odd number of inputs are 1, and 0 otherwise.

For this reason, XOR is also called the odd-parity function, and is the basis of error-
handling circuits. This versatile function can also be interpreted as (numerical)
summation modulo 2. Thus, another definition of XOR equivalent to the definition
given above is

X_XOR(A1, A2, ..., An) := (A1 + A2 + ... + An) mod 2.

The XOR gate is a special gate and is widely employed in digital circuits that
perform mathematical functions.

The symbol for the XOR gate is shown in the next figure.

Figure 4.3: Symbol for 2-input XOR Gate

The use of the generic odd number 2k + 1 as the function designator in the standard
box symbol reflects the fact that the output is 1 if and only if 2k + 1 inputs are
1, for k = 0, 1, 2, .... In logic expressions, the XOR operator is ⊕, which is read as
exclusive OR, ring-sum, or sum modulo 2. Thus, we can write

X_XOR = A1 ⊕ A2 ⊕ ... ⊕ An
The CMOS 4030 is a quad two-input exclusive OR gate.
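Since XOR over n inputs is summation modulo 2, a short fold over the inputs reproduces the odd-parity function. The following minimal C++ check (the four example inputs are arbitrary) compares the two definitions given above.

#include <iostream>

// XOR over n inputs equals the sum of the inputs modulo 2 (odd parity).
int main() {
    int a[] = {1, 0, 1, 1};                 // example inputs A1, ..., A4
    int x = 0, sum = 0;
    for (int ai : a) { x ^= ai; sum += ai; }
    std::cout << x << " " << sum % 2 << std::endl;  // both print 1
}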

4.2.4 NOT Gate (Inverter)


The inverter (also called NOT gate) performs the NOT or INVERT function and
has one output and one input. The output level is always opposite to the input level.
Thus an inverter converts a 0 to 1 and a 1 to 0, an operation known variously as
inversion, complementation, or the NOT function. NOT is denoted by an overbar
in functional expressions and by a small circle, the inversion symbol, in circuit
diagrams. We write

X_NOT = A̅.

Table 4.5 shows the truth table for the inverter. Figure 4.4 shows the symbol for
the inverter.

 A   X
 0   1
 1   0
Table 4.5: Truth Table for the NOT Gate

Figure 4.4: Symbol for the NOT Gate

In CMOS the 4069 is a hex inverter. Each of the six inverters is a single stage.

The NOT gate can be combined with the AND, OR and XOR gate to provide the
NAND, NOR and XNOR gate.

4.2.5 NAND Gate


The NAND gate is an AND gate followed by an inverter. A NAND gate can have
two or more inputs. Thus the NAND gate is formed by appending NOT to AND.
The output will be 0 only when all inputs are 1. Its logic expression is

X = NOT(A1 · A2 · ... · An)

which indicates that the inputs A1, A2, ..., An are first ANDed and then the result
is inverted. Thus a NAND gate always produces an output that is the inverse (op-
posite) of an AND gate. The gate symbol is therefore formed by appending the
graphic inversion symbol (a small circle) to the corresponding AND symbol.

A1 A2 X
0 0 1
0 1 1
1 0 1
1 1 0

Table 4.6: Truth Table for the NAND Gate

Figure 4.5: Symbol for 2-input NAND Gate

Since both inverters and AND gates can be constructed from NAND gates, the
NAND gate is seen to be functionally complete by itself. The AND gate and
inverter form a functionally complete set. This means that any logic function realized
by logic gates can be realized with the AND and NOT functions. For example the
XOR gate can be represented by
XOR gate can be represented by

Figure 4.6: XOR Implemented With NAND Gates

In CMOS the 4011 provides a quad two-input NAND gate.
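The functional completeness of NAND can be demonstrated in a few lines of C++. The sketch below builds NOT, AND, OR and XOR from a single NAND primitive; the four-NAND XOR used here is one standard construction and is not necessarily the exact circuit of Figure 4.6.

#include <iostream>

// All gates below are built from the single NAND primitive,
// illustrating that NAND is functionally complete.
int NAND(int a, int b) { return !(a && b); }
int NOT(int a)         { return NAND(a, a); }
int AND(int a, int b)  { return NOT(NAND(a, b)); }
int OR(int a, int b)   { return NAND(NOT(a), NOT(b)); }
int XOR(int a, int b) {                       // four NAND gates
    int n = NAND(a, b);
    return NAND(NAND(a, n), NAND(b, n));
}

int main() {
    for (int a = 0; a <= 1; ++a)
        for (int b = 0; b <= 1; ++b)          // prints the XOR truth table
            std::cout << a << " " << b << " " << XOR(a, b) << std::endl;
}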



4.2.6 NOR Gate


The NOR gate is an OR gate followed by an inverter. The NOR gate can have two
or more inputs. Thus the NOR gate combines the OR and NOT operations such
that the output will be 0 when any input is 1. Its logical expression is

X = NOT(A1 + A2 + ... + An)

which indicates that A1, A2, ..., An are first ORed and then the result is inverted.
A NOR gate always gives an output that is the inverse of the OR gate. The gate is
characterized by the tables and symbols of Table 4.7 and Figure 4.7.
characterized by the tables and symbols of Table 4.7 and Figure 4.7.

A1 A2 X
0 0 1
0 1 0
1 0 0
1 1 0

Table 4.7: Truth Table for the NOR Gate

Figure 4.7: Symbol for 2-input NOR Gate

All other gates can be constructed from NOR gates. For example, the XOR gate
can be found as

Figure 4.8: XOR Implemented With NOR Gates

In CMOS the 4001B is a quad two-input NOR gate.



4.2.7 XNOR Gate


The exclusive-NOR or XNOR gate produces a 1 output only when the inputs are
at the same logic level. The exclusive-NOR gate is also known as the even-parity
function for obvious reasons. The truth table is given in Table 4.8.

A1 A2 X
0 0 1
0 1 0
1 0 0
1 1 1

Table 4.8: Truth Table for the XNOR Gate

Figure 4.9: Symbol for 2-input XNOR Gate

The XNOR gate is not a universal gate.

In CMOS the 4077 provides a quadruple exclusive-NOR gate.



4.3 Buffer
The buffer is an IC device that provides no change in logic at the output, but does
provide a high input impedance, and therefore good output drive capability. It
works the same way as an emitter-follower circuit. The output of a MOS micropro-
cessor, for example, has very poor drive capability when driving a TTL device. By
inserting a buffer between the output of the MOS microprocessor and the input of
the TTL device, we can solve the problem. The buffer provides an input load the
processor can handle and an output drive that is TTL-compatible. The truth table
and the symbol for a buffer are shown in Table 4.9 and Figure 4.10.

 A   X
 0   0
 1   1

Table 4.9: Truth Table for the Buffer

Figure 4.10: Symbol for the Buffer

As an example consider the buffering of MPU buses. The MPU, RAM and ROM
are chips that are generally manufactured using CMOS technology. The decoders,
gates, inverters, tri-state buffers, and output register are all TTL devices, usually
LS-TTL to minimize power requirements and loading.

In CMOS the 4041B is a quadruple true/complement buffer which provides both an
inverted active LOW output and a non-inverted active HIGH output (O) for each
input (I).

4.4 Tri-State Logic


The development of bus organized computers led to the development of a type
of logic circuitry that has three distinct output states. These devices, called tri-
state logic (TSL) devices, have a third output condition in addition to the normal
HIGH and LOW logic voltage levels. This third output condition is called the high-
impedance, or high-Z state. Thus tri-state circuits have the high impedance state Z
as a normal output variable in addition to the usual 0 and 1 values. We can convert
any logic circuit C to tri-state form simply by inserting a switch S in its output line
X. The control input signal E of S is called an enable signal if E = 1 makes X = Y
(the output of C), and E = 0 makes X = Z. Otherwise it is a disable signal. When
E = 1 the output X is said to be enabled, and assumes its normal 0-1 levels
determined by C. The ENABLE input determines the output operation so that the
output either acts as a normal TTL output (ENABLE=1) or as a high-Z output
(ENABLE=0). In the enabled condition, the circuit behaves exactly as any logic
buffer gate, producing an output voltage level equivalent to the input logic level.
In the disabled high-Z state, the output terminal acts as if it were disconnected
from the buffer; in other words, think of it as a virtual open circuit.

 (a) Inverter, enable E     (b) Buffer, disable E
 Inputs    Output           Inputs    Output
 A    E      X              A    E      X
 0    0      Z              0    0      0
 0    1      1              0    1      Z
 1    0      Z              1    0      1
 1    1      0              1    1      Z

Figure 4.11: (a) A tri-state inverter with an enable line, (b) a tri-state buffer with
a disable line

Tri-state buffers are often used in applications where several logic signals are to be
connected to a common line called a bus. Many types of logic circuits are currently
available with tri-state outputs. Other tri-state circuits include flip-flops, registers,
memories, and almost all microprocessors and microprocessor interface chips. In
CMOS the 40097 is a hex non-inverting buffer with 3-state outputs. The 3-state
outputs are controlled by two enable inputs.

4.5 Feedback and Gates


Any logical circuit in which signal flow is unidirectional, a so-called feedforward
circuit, has a finite memory span bounded by the maximum value of the
combined propagation delays along any path from a primary input to a primary
output. In order to construct a circuit with unbounded memory span from unidi-
rectional logic elements, it is necessary to create a closed signal or feedback loop.
Feedback is a basic property of sequential circuits (flip-flops and latches). The
problem caused by feedback in a purely combinational logic circuit, that is, one
with gate delay zero, is that it can lead to a logical inconsistency.

Figure 4.12: NAND Gate With Feedback

In Figure 4.12 there is a feedback loop from the output to the input of the NAND
gate, implying that the Boolean equation

X(t) = NOT(A(t) · X(t))

must be satisfied. This equation is satisfied by A(t) = 0 and X(t) = 1, since

NOT(0 · 1) = 1.

Consequently, the signal configuration is consistent and stable. If A(t) = 1 and
X(t) = 1 we obtain a logically inconsistent situation, since

NOT(A(t) · X(t)) = NOT(1 · 1) = 0.

Similarly, for X(t) = 0 we have a logically inconsistent situation, since X(t) cannot
be 0 and 1 at the same time.

The inconsistency present in this example disappears if the NAND gate has a
nonzero propagation delay tpd, which also makes a better model for the behaviour
of a physical gate. Our equation changes to

X(t) = NOT(A(t - tpd) · X(t - tpd)).

The output signal X(t) is no longer a function of its present value. Instead, it
depends on the past value X(t - tpd), which can differ from X(t). In particular,
when A(t - tpd) = 1, we can satisfy the equation with

X(t) = NOT X(t - tpd).

Hence, if A(t) changes from 0 to 1 at some time t, this change will cause X(t) to
change from 1 to 0 at time t + tpd. Owing to our equation, this second change

will change X(t) from 0 to 1 at t + 2tpd, and so on. Hence, the value of X(t) must
change every tpd time units. This type of regular and spontaneous changing, called
oscillation, is an extreme form of unstable behaviour. However it is not logically
inconsistent. This type of behaviour plays an important role in generating the clock
signal that controls synchronous circuits. Spontaneous oscillation of the above kind
involves narrow pulses of width tpd that tend to be filtered out by the gates through
which they pass. Consequently, such an oscillation usually dies out quickly.
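The oscillation is easy to reproduce in a discrete-time simulation where one loop iteration corresponds to one propagation delay tpd. A minimal C++ sketch (the switching time t = 2 is an arbitrary choice):

#include <iostream>

// Discrete-time model of the NAND gate with feedback:
// X(t) = NOT(A(t - tpd) AND X(t - tpd)), with tpd = one time step.
int main() {
    int x = 1;                       // stable state while A = 0
    for (int t = 0; t < 8; ++t) {
        int a = (t >= 2) ? 1 : 0;    // A switches from 0 to 1 at t = 2
        x = !(a && x);               // new output one delay later
        std::cout << "t=" << t << " A=" << a << " X=" << x << std::endl;
    }                                // X oscillates 0,1,0,1,... once A = 1
}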
Chapter 5
Combinational Circuits

5.1 Introduction
A combinational circuit consists of gates representing Boolean connectives; it is free
of feedback loops. A combinational circuit has no state; its output depends solely
on the momentary input values. Examples are the full adder, comparator, decoder
and multiplexer. In reality, however, signal changes propagate through a sequence
of gates with a finite speed. This is due to the capacitive loads of the amplifying
transistors. Hence circuits have a certain propagation delay.

In this chapter we consider circuits such as adders, multipliers and comparators


which can be used to build an arithmetic logic unit. Optimizations for some of
these circuits are also considered.

Every Boolean function can be expressed in a normal form consisting of disjunctions


of conjunctions, and it can therefore be implemented by two levels of gates only. Of
considerable technical relevance are devices which represent two levels of gates in
a general form. A specific function is selected by opening (or closing) connections
between specific gates. This is called programming the device, and the device is a
programmable logic device (PLD). The gates in a PLD are of the AND, OR and
NOT types. Programming happens electrically under computer control. PLDs are
highly attractive for reducing the number of discrete components in circuits.
A specific form of a PLD is the read-only memory (ROM). Another example is
programmable array logic (PAL) where the AND gates are programmable.


5.2 Decoder
In digital computers, binary codes are used to represent many different types of
information, such as instructions, numerical data, memory addresses, and control
commands. A code group that contains N bits can have 2^N different combinations,
each of which represents a different piece of information. A logic circuit is required
which can take the N-bit code as logic inputs and then generate an appropriate
output signal to identify which of the 2^N different combinations is present. Such a
circuit is called a decoder.

Thus a 1-out-of-n decoder is a circuit with n outputs and N = log2 n = ld n inputs;
the outputs Xj are numbered from 0 to n - 1. An output goes to 1 when the input
number A is identical to the number j of the relevant output. Figure 5.1 shows the
truth table for a 1-out-of-4 decoder. The variables A0 and A1 represent the binary
code of the decimal number m. The sum of the products (disjunctive normal form)
of the recoding functions can be taken directly from the truth table. The circuit is
also shown using AND and NOT gates. The functions are

X0 = A̅1 · A̅0,   X1 = A̅1 · A0,   X2 = A1 · A̅0,   X3 = A1 · A0.
Most integrated-circuit decoders can decode 2-, 3-, or 4-bit input codes.

In CMOS the 4028 is a 4-bit BCD to 1-of-10 active HIGH decoder.

 Inputs      Outputs
 m  A1 A0    X3 X2 X1 X0
 0  0  0     0  0  0  1
 1  0  1     0  0  1  0
 2  1  0     0  1  0  0
 3  1  1     1  0  0  0

Figure 5.1: Truth Table and Circuit of a 1-out-of-4 Decoder
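The recoding functions translate directly into code. A minimal C++ sketch of the 1-out-of-4 decoder, printing the complete truth table:

#include <iostream>

// 1-out-of-4 decoder: output X_j is 1 exactly when the input number
// A1 A0 (binary) equals j, following the functions given above.
int main() {
    for (int a1 = 0; a1 <= 1; ++a1)
        for (int a0 = 0; a0 <= 1; ++a0) {
            int x0 = !a1 && !a0;
            int x1 = !a1 &&  a0;
            int x2 =  a1 && !a0;
            int x3 =  a1 &&  a0;
            std::cout << a1 << a0 << " -> "
                      << x3 << x2 << x1 << x0 << std::endl;
        }
}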



5.3 Encoder
A decoder takes an input code and activates the one corresponding output. An
encoder performs the opposite operation; it generates a binary code corresponding
to which input has been activated. A commonly used Ie encoder is represented in
Figure 5.2. It has eight active LOW inputs, which are kept normally high. When
one of the inputs is driven to 0, the binary output code is generated corresponding
to that input. For example, when input 13 = 0, the outputs will be CBA = 011,
which is the binary equivalent of decimal 3. When 16 = 0, the outputs will be
CBA = 110. For some encoders, if more than one input is made low the output
would be garbage. For a priority encoder, the outputs would be the binary code for
the highest-numbered input that is activated. For example, assume that the encoder
of the Figure is a priority encoder and that inputs 14 and 17 are simultaneously made
low. The output code will be CBA = 111 corresponding to 17 . No matter how many
inputs are activated, the code for the highest one will appear at the output.


Figure 5.2: Typical IC Encoder


A simple 8-input encoder is given by

A = I1 + I3 + I5 + I7
B = I2 + I3 + I6 + I7
C = I4 + I5 + I6 + I7
V = I0 + I1 + I2 + I3 + I4 + I5 + I6 + I7

The output V is used to indicate when an input is 1 for the encoder; it differentiates
between the input I0 being 1 and no inputs being 1. This encoder is not a priority
encoder; it performs a bitwise OR over all the inputs which are set to 1.

In CMOS the 4532 is an 8-input priority encoder with eight active HIGH priority
inputs (I0 to I7), three active HIGH outputs (O0 to O2), an active HIGH enable
input (Ein), an active HIGH enable output (Eout) and an active HIGH group select
output (GS). Data is accepted on inputs I0 to I7. The binary code corresponding
to the highest priority input (I0 to I7) which is HIGH is generated on O0 to O2 if
Ein is HIGH. Input I7 is assigned the highest priority. GS is HIGH when one or
more priority inputs and Ein are HIGH. Eout is HIGH when I0 to I7 are LOW and
Ein is HIGH. Ein, when LOW, forces all outputs (O0 to O2, GS, Eout) LOW. The
circuit is given below.


Figure 5.3: Circuit for the CMOS 4532



The logic equations are

O2 = Ein · (I4 + I5 + I6 + I7)
O1 = Ein · (I2 · I̅4 · I̅5 + I3 · I̅4 · I̅5 + I6 + I7)
O0 = Ein · (I1 · I̅2 · I̅4 · I̅6 + I3 · I̅4 · I̅6 + I5 · I̅6 + I7)
Eout = Ein · I̅0 · I̅1 · I̅2 · I̅3 · I̅4 · I̅5 · I̅6 · I̅7
GS = Ein · (I0 + I1 + I2 + I3 + I4 + I5 + I6 + I7).

Inputs Outputs
Ein I7 I6 I5 I4 I3 I2 I1 I0 GS O2 O1 O0 Eout
L X X X X X X X X L L L L L
H L L L L L L L L L L L L H
H H X X X X X X X H H H H L
H L H X X X X X X H H H L L
H L L H X X X X X H H L H L
H L L L H X X X X H H L L L
H L L L L H X X X H L H H L
H L L L L L H X X H L H L L
H L L L L L L H X H L L H L
H L L L L L L L H H L L L L

Table 5.1: Truth Table for the CMOS 4532
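The priority behaviour can be captured in a few lines of C++. The sketch below models the 4532 at the functional level, with a scan loop standing in for the gate equations above; the input and output names follow the text.

#include <iostream>

// Sketch of a 4532-style priority encoder: the highest HIGH input of
// I7..I0 wins; GS flags "some input active", Eout enables the next stage.
void encode(int i[8], int ein) {
    int o2 = 0, o1 = 0, o0 = 0, gs = 0, eout = 0;
    if (ein) {
        int highest = -1;
        for (int k = 7; k >= 0; --k)
            if (i[k]) { highest = k; break; }   // priority: highest index
        if (highest >= 0) {
            gs = 1;
            o2 = (highest >> 2) & 1;
            o1 = (highest >> 1) & 1;
            o0 = highest & 1;
        } else {
            eout = 1;                           // no input active
        }
    }
    std::cout << o2 << o1 << o0 << " GS=" << gs
              << " Eout=" << eout << std::endl;
}

int main() {
    int i[8] = {0, 0, 0, 0, 1, 0, 0, 1};        // I4 and I7 active
    encode(i, 1);                               // prints 111 GS=1 Eout=0
}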



5.4 Demultiplexer
A demultiplexer can be used to distribute input information D to various outputs.
It represents an extension of the 1-out-of-n decoder. The addressed output does
not go to one, but assumes the value of the input variable D. Figure 5.4 shows
its implementation using AND and NOT gates. If we make D = const = 1, the
demultiplexer operates as a 1-out-of-n decoder.

The logic functions are

X0 = D · A̅0 · A̅1,   X1 = D · A0 · A̅1,   X2 = D · A̅0 · A1,   X3 = D · A0 · A1.

The following figure shows the basic mode of operation and the circuit.


Figure 5.4: Demultiplexer Circuit

In CMOS the 4555 is a dual 1-of-4 decoder/demultiplexer. Each has two address
inputs (A0 and A1), an active LOW enable input (E̅) and four mutually exclusive
outputs which are active HIGH (O0 to O3). When used as a decoder and E̅ is HIGH,
then O0 to O3 are LOW. When used as a demultiplexer, the appropriate output is
selected by the information on A0 and A1 with E̅ as data input. All unselected
outputs are LOW.

The truth table is given by Table 5.2.

 Inputs          Outputs
 E̅   A0  A1     O0  O1  O2  O3
 L   L   L      H   L   L   L
 L   H   L      L   H   L   L
 L   L   H      L   L   H   L
 L   H   H      L   L   L   H
 H   X   X      L   L   L   L
Table 5.2: Truth Table for CMOS 4555



5.5 Multiplexer
A multiplexer or data selector is a logic circuit that accepts several data inputs
and allows only one of them at a time to get through to the output. It is an
extension of an encoder. The routing of the desired data input to the output is
controlled by SELECT inputs (sometimes referred to as ADDRESS inputs). There
are many IC multiplexers with various numbers of data inputs and select inputs.
Thus the opposite of a demultiplexer is a multiplexer. The following figure shows
the multiplexer circuit.


Figure 5.5: Multiplexer Circuit

The logic function is

X = D0 · A̅0 · A̅1 + D1 · A0 · A̅1 + D2 · A̅0 · A1 + D3 · A0 · A1.
In CMOS technology, a multiplexer can be implemented using both gates and ana-
log switches (transmission gates). When analog switches are employed, signal trans-
mission is bidirectional. In this case, therefore, the multiplexer is identical to the
demultiplexer. The circuit is then known as an analog multiplexer/demultiplexer.

In CMOS the 4019 provides four multiplexing circuits with common select inputs
(SA, SB). Each circuit contains two inputs (An, Bn) and one output (On). It may
be used to select four bits of information from one of two sources. The A inputs
are selected when SA is HIGH, the B inputs are selected when SB is HIGH. When
SA and SB are HIGH, the output (On) is the logical OR of the An and Bn inputs
(On = An + Bn). When SA and SB are LOW, the output (On) is LOW independent
of the multiplexer inputs.

5.6 Binary Adder


5.6.1 Binary Half Adder
Consider the task of adding two 1-bit numbers A0 and A1. Obviously,

0 + 0 = 0,   0 + 1 = 1,   1 + 0 = 1.

The sum 1 + 1 requires two bits to represent it, namely 10, the binary form of
two (decimal). This can be expressed as follows: one plus one yields a sum bit
S = 0 and a carry bit C = 1. If we ignore the carry bit and restrict the sum to the
single bit S, then we obtain 1 + 1 = 0. This is a very useful special form of addition
known as modulo-2 addition.

The half adder circuit can be realized using an XOR gate and an AND gate. One
output gives the sum of the two bits and the other gives the carry. In CMOS for
the AND gate the 4081 can be used and for the XOR gate the 4030 can be used.

The circuit and the truth table for the inputs and outputs are shown below.

Figure 5.6: Half Adder Circuit

Inputs Outputs
A0 A1 S C
0 0 0 0
0 1 1 0
1 0 1 0
1 1 0 1

Table 5.3: Half Adder Truth Table

The logic functions for S and C are

S = A0 ⊕ A1,   C = A0 · A1.

5.6.2 Binary Full Adder


A full adder adds three bits at a time, a necessary operation when two multi-bit
binary numbers are added. The figure shows a full adder. It consists of three AND
gates, an OR gate and an XOR gate. The full adder computes the numerical sum
of the three input bits in unsigned binary code. For example, when all three inputs
are 1, the output is Y0Y1 = 11, which is the binary representation of the decimal
number three. One of the inputs is the carry bit from the previous addition.

We can construct logic expressions for these operations. The five gates (three AND
gates, one OR gate and one XOR gate) used in the full adder given below lead to
the logic equations

Y1 = A0 ⊕ A1 ⊕ A2,   Y0 = A0 · A1 + A0 · A2 + A1 · A2.

Note that + is the logical OR operation, · is the AND operation and ⊕ is the XOR
operation.

Figure 5.7: Full Adder Circuit

Inputs Outputs
A0 A1 A2 Y0 Y1
0 0 0 0 0
0 0 1 0 1
0 1 0 0 1
0 1 1 1 0
1 0 0 0 1
1 0 1 1 0
1 1 0 1 0
1 1 1 1 1

Table 5.4: Full Adder Truth Table
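The full adder equations translate directly into C++. The sketch below also chains four full adders into a 4-bit ripple-carry adder, using the 1110 + 0111 example of the next subsection; the LSB-first array layout is our own choice for the illustration.

#include <iostream>

// Full adder: Y1 = A0 XOR A1 XOR A2 (sum), Y0 = majority function (carry).
void fulladder(int a0, int a1, int a2, int& sum, int& carry) {
    sum   = a0 ^ a1 ^ a2;
    carry = (a0 & a1) | (a0 & a2) | (a1 & a2);
}

int main() {
    // Ripple 4-bit addition 1110 + 0111 = 10101 (14 + 7 = 21).
    int a[4] = {0, 1, 1, 1};   // 1110, least significant bit first
    int b[4] = {1, 1, 1, 0};   // 0111, least significant bit first
    int s[5], carry = 0;
    for (int i = 0; i < 4; ++i)
        fulladder(a[i], b[i], carry, s[i], carry);
    s[4] = carry;
    for (int i = 4; i >= 0; --i) std::cout << s[i];
    std::cout << std::endl;    // prints 10101
}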



5.6.3 Binary Four-Bit Adder


To add 3 + 3 (decimal) (i.e. 11 + 11 in binary), two full adders in parallel are required,
as shown in the figure. Actually the first full adder, FA1, need only be a half adder
since it handles just two bits. The output is 6 (decimal) or 110 in binary.

Figure 5.8: Two Full Adders in Parallel

To add two four-bit numbers, four adders are connected in parallel as shown for the
addition of binary 1110 (14 decimal) and 0111 (7 decimal) to give the sum 10101
(21 in decimal). By joining more full adders to the left of the system, numbers with
more bits can be added.

Figure 5.9: Four Bit Adder Consisting of Four Adders

In CMOS the 4008 is a 4-bit binary full adder with two 4-bit data inputs, a carry
input, four sum outputs, and a carry output. The IC uses full look-ahead across
4-bits to generate the carry output. This minimizes the necessity for extensive
look-ahead and carry cascading circuits.

5.6.4 Faster Addition


The 4-bit adder calculates a carry bit three times before the final carry value can be
passed to another 4-bit adder. This is known as carry cascading. A speed increase
is obviously possible if the final carry value can be calculated immediately from the
input values (to reduce the carry cascading). Let Ai, Bi and Ci-1 (i = 0, 1, 2) denote
the inputs for each stage i of a 3-bit adder. The equations for the carry bits are
given by

C-1 := Cin
Ci = Ai · Bi + Ai · Ci-1 + Bi · Ci-1,   i = 0, 1, 2.

Thus for Cout we find

C2 = A2·B2 + A2·A1·B1 + A1·B2·B1
   + A2·A1·A0·B0 + A2·A0·B1·B0 + A1·A0·B2·B0 + A0·B2·B1·B0
   + A2·A1·A0·Cin + A2·A1·B0·Cin + A2·A0·B1·Cin + A1·A0·B2·Cin
   + A2·B1·B0·Cin + A1·B2·B0·Cin + A0·B2·B1·Cin + B2·B1·B0·Cin.

Each full adder requires 2 levels of gates to calculate the carry bit. Thus we have
reduced the carry computation from 6 levels of gates to 2 levels of gates. The circuit
for the computation is given below.

Figure 5.10: Circuit for the Carry Bit of a 3-bit Adder

Similarly the calculation for the carry bit of a 4-bit adder is reduced from 8
levels of gates to 2 levels of gates.
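The equivalence of the two-level expression for C2 with the rippled carries can be verified exhaustively; with three bits of A, three bits of B and Cin there are only 128 cases. A brute-force C++ check:

#include <iostream>

// Check that the two-level sum-of-products expression for C2 agrees
// with the rippled carries Ci = Ai·Bi + Ai·C(i-1) + Bi·C(i-1).
int main() {
    for (int v = 0; v < 128; ++v) {              // enumerate all inputs
        int A[3] = { v & 1, (v >> 1) & 1, (v >> 2) & 1 };
        int B[3] = { (v >> 3) & 1, (v >> 4) & 1, (v >> 5) & 1 };
        int cin = (v >> 6) & 1;

        int c = cin;                             // ripple computation
        for (int i = 0; i < 3; ++i)
            c = (A[i] & B[i]) | (A[i] & c) | (B[i] & c);

        int sop =                                // two-level lookahead form
            (A[2]&B[2]) | (A[2]&A[1]&B[1]) | (A[1]&B[2]&B[1]) |
            (A[2]&A[1]&A[0]&B[0]) | (A[2]&A[0]&B[1]&B[0]) |
            (A[1]&A[0]&B[2]&B[0]) | (A[0]&B[2]&B[1]&B[0]) |
            (A[2]&A[1]&A[0]&cin) | (A[2]&A[1]&B[0]&cin) |
            (A[2]&A[0]&B[1]&cin) | (A[1]&A[0]&B[2]&cin) |
            (A[2]&B[1]&B[0]&cin) | (A[1]&B[2]&B[0]&cin) |
            (A[0]&B[2]&B[1]&cin) | (B[2]&B[1]&B[0]&cin);

        if (c != sop) std::cout << "mismatch at " << v << std::endl;
    }
    std::cout << "done" << std::endl;            // prints only "done"
}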

5.7 Binary Subtraction


Binary subtraction can be built in the same way as binary addition. However, it
is possible to perform binary subtraction using binary addition which can simplify
circuits designed to perform binary arithmetic. In chapter 3 a number of ways to
represent negative integers in binary format were introduced. The two's complement
method made it possible to add signed integers using standard binary addition.
Thus it makes sense to use the two's complement format to enable subtraction
using existing methods.

To subtract B from A, where B and A are 4-bit numbers, we use

A - B = A + (-B)

where the negation is the two's complement.

The circuit is as follows

4-bit Adder

A-B

Figure 5.11: Binary Subtraction Using the Two's Complement



5.8 Binary Multiplication


5.8.1 Unsigned Integer Multiplication
Binary multiplication can be implemented with simple addition. This is very slow.
Some improvement can be obtained by using the distributive properties of binary
arithmetic. Suppose

A = Σ_{i=0}^{n} ai 2^i   and   B = Σ_{i=0}^{n} bi 2^i

with ai, bi ∈ {0, 1} are to be multiplied. Using distributivity we find

A · B = Σ_{i=0}^{n} bi (2^i A).

Multiplying by a power of 2 can be implemented as a simple shift operation. The
following algorithm summarizes the technique.
following algorithm summarizes the technique.

1. j := 0, result := 0

2. if bj is 1 add A to result

3. shift left A

4. increment j

5. if j ≤ n goto (2)
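A direct C++ transcription of the shift-and-add algorithm, here as arithmetic on unsigned integers rather than gates:

#include <iostream>

// Shift-and-add multiplication following steps 1-5 above.
unsigned multiply(unsigned a, unsigned b, int n) {
    unsigned result = 0;
    for (int j = 0; j <= n; ++j) {   // one pass per bit b_j
        if ((b >> j) & 1) result += a;
        a <<= 1;                     // next pass adds A * 2^(j+1)
    }
    return result;
}

int main() {
    std::cout << multiply(14, 7, 3) << std::endl;  // prints 98
}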


The algorithm requires a number of steps. To keep the circuits simple, control
mechanisms introduced in Chapter 7 will be needed. The alternative is a more
complicated implementation consisting of calculating all the shifted values of A first
and then adding each input where the input lines of B are 1. The shift left operation
is simple. To perform addition controlled by the bj inputs it suffices to examine, for
each output line, the input value I, the calculated output O (in this case for addition)
and bj. Thus the final output is given by

X = b̅j · I + bj · O.

To ensure that the product of two n-bit numbers can be represented the output may
be extended to 2n bits.

(CA = controlled 4-bit adder, SL = logical 4-bit shift left)

Figure 5.12: Unsigned 4-bit Multiplication

The Russian peasant method [104] uses the same technique for multiplication, with a
small change to simplify implementation. It is a practical method for multiplication
by hand, since it involves only the operations of doubling, halving and adding.
Western visitors to Russia in the nineteenth century found the method in wide use
there, from which the method derives its name.

1. j := 0, result := 0

2. if b0 is 1 add A to result

3. shift left A

4. shift right B

5. increment j

6. if j ≤ n goto (2)
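In C++ the Russian peasant method becomes the sketch below; it replaces the counter j of steps 1, 5 and 6 with the equivalent termination test B ≠ 0, which holds because B is halved on every pass.

#include <iostream>

// Russian peasant multiplication: test the low bit of B, double A, halve B.
unsigned peasant(unsigned a, unsigned b) {
    unsigned result = 0;
    while (b != 0) {
        if (b & 1) result += a;  // step 2: b0 is 1
        a <<= 1;                 // step 3: double (shift left) A
        b >>= 1;                 // step 4: halve (shift right) B
    }
    return result;
}

int main() {
    std::cout << peasant(23, 41) << std::endl;  // prints 943
}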



5.8.2 Fast Multiplication


Instead of using an iterative algorithm, multiplication can be performed in parallel
by determining each output bit from the input bits only (at the same time). Consider
the following multiplication of the two 2-bit numbers A1A0 and B1B0.

                  B1      B0
                  A1      A0
        --------------------
               A0·B1   A0·B0
       A1·B1   A1·B0
 ---------------------------
 P3       P2      P1      P0

The product is given by

P0 = A0 · B0
P1 = (A0 · B1) ⊕ (A1 · B0)
P2 = (A1 · B1) ⊕ (A1 · A0 · B1 · B0)
P3 = A1 · A0 · B1 · B0


Figure 5.13: 2-bit Fast Unsigned Multiplication
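The four product equations can be checked against ordinary multiplication over all 16 input combinations. A short C++ verification:

#include <iostream>

// Gate-level 2-bit multiplier: product bits P3 P2 P1 P0 from the
// equations above, compared with ordinary integer multiplication.
int main() {
    for (int a = 0; a < 4; ++a)
        for (int b = 0; b < 4; ++b) {
            int A0 = a & 1, A1 = (a >> 1) & 1;
            int B0 = b & 1, B1 = (b >> 1) & 1;
            int P0 = A0 & B0;
            int P1 = (A0 & B1) ^ (A1 & B0);
            int P2 = (A1 & B1) ^ (A1 & A0 & B1 & B0);
            int P3 = A1 & A0 & B1 & B0;
            int p  = (P3 << 3) | (P2 << 2) | (P1 << 1) | P0;
            std::cout << a << "*" << b << "=" << p
                      << (p == a * b ? "" : " (error)") << std::endl;
        }
}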



5.8.3 Signed Integer Multiplication


Two signed integers can be multiplied by multiplying their absolute values and
setting the appropriate sign afterwards. For two's complement numbers Booth's
algorithm is used. It is an efficient algorithm for multiplying signed integers and
optimizes the calculation in the following way: suppose an N-bit unsigned integer
has a block of consecutive 1s in its binary representation. If the first 1 is at bit k
and the last 1 is at bit n then the decimal value of this block is 2^(n+1) - 2^k.
This follows from the fact that

Σ_{k=0}^{n} 2^k = 2^(n+1) - 1.

This can be extended to two's complement numbers. The two's complement (nega-
tion) of the same number gives 2^N - 2^(n+1) + 2^k. The contribution of 2^N is an
overflow and does not influence the operation. Thus addition and subtraction can be per-
formed whenever a 1 and 0 are adjacent in the bit representations.

For Booth's algorithm we introduce an extra bit Q-1 which is used to determine the
boundaries of blocks of 0s or 1s. The final product is in AQ. A and Q are n-bit
registers. The arithmetic shift right (SHR) operation shifts all bits one position
right, and leaves the highest order bit (sign bit) at its previous value.

1. A := 0, Q-1 := 0
   M := Multiplicand
   Q := Multiplier
   C := n

2. If Q0 = Q-1 goto (5)

3. If Q0Q-1 = 01, A := A + M

4. If Q0Q-1 = 10, A := A - M

5. Arithmetic SHR A, Q, Q-1
   C := C - 1

6. If C is nonzero goto (2)
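A C++ sketch of Booth's algorithm for n = 8; the 8-bit width and the test operands are our choice, and the code assumes that >> on a negative signed integer is an arithmetic shift, which holds for common compilers.

#include <iostream>

// Booth's algorithm for 8-bit two's complement operands (n = 8).
int booth(signed char m, signed char q) {
    int A = 0, Q = (unsigned char)q, Qm1 = 0;
    for (int c = 0; c < 8; ++c) {
        int q0 = Q & 1;
        if (q0 == 0 && Qm1 == 1) A += m;   // Q0 Q-1 = 01: A := A + M
        if (q0 == 1 && Qm1 == 0) A -= m;   // Q0 Q-1 = 10: A := A - M
        // arithmetic SHR of the pair A,Q together with Q-1
        Qm1 = Q & 1;
        Q = ((Q >> 1) | ((A & 1) << 7)) & 0xFF;
        A >>= 1;                           // assumed arithmetic shift on A
    }
    // product in AQ: A is the high byte, Q the low byte
    return (short)(((A & 0xFF) << 8) | Q);
}

int main() {
    std::cout << booth(-7, 3) << std::endl;   // prints -21
}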

5.9 Binary Division


Binary division is modelled on the long division process. For unsigned n-bit
numbers the algorithm is as follows.

1. A := 0
   M := Divisor, Q := Dividend, C := n

2. SHL AQ

3. A := A - M

4. if (A < 0) goto (6)

5. Q0 := 1, goto (7)

6. Q0 := 0, A := A + M

7. decrement C

8. if (C > 0) goto (2)
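A C++ transcription of this restoring scheme for unsigned operands; the operand width n = 4 and the example 14 divided by 3 are our choice.

#include <iostream>

// Restoring division for unsigned n-bit operands, following steps 1-8.
void divide(unsigned q, unsigned m, int n) {
    int a = 0;
    for (int c = 0; c < n; ++c) {
        a = (a << 1) | ((q >> (n - 1)) & 1);   // step 2: SHL AQ
        q = (q << 1) & ((1u << n) - 1);
        a -= m;                                // step 3
        if (a < 0) a += m;                     // step 6: restore, Q0 = 0
        else       q |= 1;                     // step 5: Q0 = 1
    }
    std::cout << q << " remainder " << a << std::endl;
}

int main() {
    divide(14, 3, 4);                          // prints 4 remainder 2
}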


Two's complement division:

1. Load the divisor into the M register and the dividend into the AQ registers.
   The dividend must be expressed as a 2n-bit two's complement number. Thus,
   for example, the 4-bit number 0111 becomes 00000111, and 1001 becomes
   11111001.

2. Shift left AQ 1 bit position.

3. If M and A have the same signs, perform A := A - M; otherwise, A := A + M.

4. The above operation is successful if the sign of A is the same before and after
   the operation.

   (a) If the operation is successful or (A = 0 AND Q = 0) then set Q0 = 1.

   (b) If the operation is unsuccessful and (A ≠ 0 OR Q ≠ 0), then set Q0 = 0
       and restore the previous value of A.

5. Repeat steps (2) through (4) as many times as there are bit positions in Q.

6. The remainder is in A. If the signs of the divisor and dividend are the same,
   then the quotient is in Q; otherwise the correct quotient is the two's complement
   of Q.

5.10 Magnitude Comparator


One of the basic operations in computation is comparing two integer numbers. Let
a and b be two integers. We have to consider three cases: a > b, a = b and a < b. For
combinational circuits the comparison is done bit by bit. For example, 5 in binary
is 0101b and 3 in binary is 0011b. First we compare the most significant bits of 5
and 3. Both are 0. Thus we move to the next bit (from left to right). For 5 the bit
is set, namely 1, and for 3 we have 0. Thus 5 > 3.
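The bit-by-bit comparison is easily expressed in C++; the first differing bit, scanned from the most significant position, decides the result.

#include <iostream>

// Magnitude comparison of two n-bit words, most significant bit first.
void compare(unsigned a, unsigned b, int n) {
    for (int i = n - 1; i >= 0; --i) {
        int ai = (a >> i) & 1, bi = (b >> i) & 1;
        if (ai != bi) {                 // first differing bit decides
            std::cout << (ai ? "a > b" : "a < b") << std::endl;
            return;
        }
    }
    std::cout << "a = b" << std::endl;
}

int main() {
    compare(5, 3, 4);                   // 0101 vs 0011: prints a > b
}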

In CMOS the 4585 is a four-bit magnitude comparator which compares two 4-bit
words (A and B), whether they are 'less than', 'equal to' or 'greater than'. Each
word has four parallel inputs (A0 to A3) and (B0 to B3); A3 and B3 are the most
significant inputs. Three outputs are provided: A greater than B (O_A>B), A less
than B (O_A<B) and A equal to B (O_A=B). Three expander inputs (I_A>B, I_A<B
and I_A=B) allow cascading of the devices without external gates. For proper
comparison operation the expander inputs to the least significant position must be
connected as follows:

I_A>B = I_A=B = HIGH,   I_A<B = LOW.

For words greater than 4 bits, units can be cascaded by connecting outputs O_A<B and
O_A=B to the corresponding inputs of the next significant comparator (input I_A>B is
connected to a HIGH). Operation is not restricted to binary codes; the devices will
work with any monotonic code. Table 5.5 displays the truth table for the CMOS
4585. The following notation is used: H = HIGH state (the more positive voltage),
L = LOW state (the less positive voltage), X = state is immaterial. The upper 11
lines describe the normal operation under all conditions that will occur in a single
device or in a serial expansion scheme. The lower 2 lines describe the operation
under abnormal conditions on the cascading inputs. These conditions occur when
the parallel expansion technique is used. The circuit consists of 8 XNOR gates and
one NAND gate.

In CMOS the 74LV688 is an 8-bit magnitude comparator. It takes two 8-bit numbers
provided by the inputs P0 to P7 and Q0 to Q7. The single output P=Q is active
LOW: it is LOW if and only if the two words are equal and the enable input E̅ is
LOW.

Table 5.6 shows the function table for the CMOS 74LV688 and Figure 5.14 the logic
diagram for the CMOS 74LV688.
diagram for the CMOS 74LV688.

 Comparing inputs                    Cascading inputs      Outputs
 A3,B3  A2,B2  A1,B1  A0,B0          I_A>B I_A<B I_A=B     O_A>B O_A<B O_A=B
 A3>B3  X      X      X              H     X     X         H     L     L
 A3<B3  X      X      X              X     X     X         L     H     L
 A3=B3  A2>B2  X      X              H     X     X         H     L     L
 A3=B3  A2<B2  X      X              X     X     X         L     H     L
 A3=B3  A2=B2  A1>B1  X              H     X     X         H     L     L
 A3=B3  A2=B2  A1<B1  X              X     X     X         L     H     L
 A3=B3  A2=B2  A1=B1  A0>B0          H     X     X         H     L     L
 A3=B3  A2=B2  A1=B1  A0<B0          X     X     X         L     H     L
 A3=B3  A2=B2  A1=B1  A0=B0          X     L     H         L     L     H
 A3=B3  A2=B2  A1=B1  A0=B0          H     L     L         H     L     L
 A3=B3  A2=B2  A1=B1  A0=B0          X     H     L         L     H     L
 A3=B3  A2=B2  A1=B1  A0=B0          X     H     H         L     H     H
 A3=B3  A2=B2  A1=B1  A0=B0          L     L     L         L     L     L

Table 5.5: Truth Table for the CMOS 4585

 Inputs               Output
 Data       Enable
 Pn, Qn     E̅         P=Q
 P = Q      L         L
 X          H         H
 P > Q      L         H
 P < Q      L         H



Figure 5.14: Logic Diagram for the CMOS 74LV688



5.11 4-Bit ALU


The arithmetic logic unit (ALU) is responsible for the arithmetic and logic oper-
ations discussed so far. Typically some input lines are used to select the required
functionality. The ALU combined with memory (such as registers) and control logic
is essentially all that is required for a classical computer. In CMOS the MC14581B
is a 4-bit ALU capable of providing 16 functions of two Boolean variables and 16
binary arithmetic operations on two 4-bit words. The level of the mode control
input determines whether the output function is logic or arithmetic. The desired
logic function is selected by applying the appropriate binary word to the select in-
puts (S0 through S3) with the mode control input HIGH, while the desired arithmetic
operation is selected by applying a LOW to the mode control input, the required
level to carry in, and the appropriate word to the select inputs. The word inputs
and function outputs can be operated on with either active high or active low data.
The arithmetic functions interpret the input words as two's complement numbers.
As noted in the table, when Cn is opposite to the given value, the result is the given
arithmetic function plus 1 because Cn is interpreted as a carry bit.

Carry propagate (P) and carry generate (G) outputs are provided to allow a full
look-ahead carry scheme for fast simultaneous carry generation for the four bits in
the package. Fast arithmetic operations on long words are obtainable by using the
MC14582B as a second order look-ahead block. An inverted ripple carry input (Cn)
and a ripple carry output (Cn+4) are included for ripple through operation.

When the device is in the subtract mode (LHHL), comparison of two 4-bit words
present at the A and B inputs is provided using the A = B output. It assumes a
high-level state when indicating equality. Also, when the ALU is in the subtract
mode the Cn+4 output can be used to indicate relative magnitude as shown in this
table

 Data level    Cn   Cn+4   Magnitude
 Active High   H    H      A ≤ B
               L    H      A < B
               H    L      A > B
               L    L      A ≥ B
 Active Low    L    L      A ≤ B
               H    L      A < B
               L    H      A > B
               H    H      A ≥ B

The truth table is as follows.

 Function Select   Inputs/Outputs Active Low        Inputs/Outputs Active High
 S3 S2 S1 S0       Logic         Arithmetic*        Logic         Arithmetic*
                   (MC=H)        (MC=L, Cn=L)       (MC=H)        (MC=L, Cn=H)
 L  L  L  L        A̅             A minus 1          A̅             A
 L  L  L  H        A̅·̅B̅           A·B minus 1        A̅+̅B̅           A+B
 L  L  H  L        A̅+B           A·B̅ minus 1        A̅·B           A+B̅
 L  L  H  H        Logic 1       minus 1            Logic 0       minus 1
 L  H  L  L        A̅+̅B̅           A plus (A+B̅)       A̅·̅B̅           A plus A·B̅
 L  H  L  H        B̅             A·B plus (A+B̅)     B̅             (A+B) plus A·B̅
 L  H  H  L        A̅⊕̅B̅           A minus B minus 1  A⊕B           A minus B minus 1
 L  H  H  H        A+B̅           A+B̅                A·B̅           A·B̅ minus 1
 H  L  L  L        A̅·B           A plus (A+B)       A̅+B           A plus A·B
 H  L  L  H        A⊕B           A plus B           A̅⊕̅B̅           A plus B
 H  L  H  L        B             A·B̅ plus (A+B)     B             (A+B̅) plus A·B
 H  L  H  H        A+B           A+B                A·B           A·B minus 1
 H  H  L  L        Logic 0       A plus A           Logic 1       A plus A
 H  H  L  H        A·B̅           A·B plus A         A+B̅           (A+B) plus A
 H  H  H  L        A·B           A·B̅ plus A         A+B           (A+B̅) plus A
 H  H  H  H        A             A                  A             A minus 1

The * indicates that the inputs are expressed in two's complement form. For arithmetic
functions with Cn in the opposite state, the resulting function is as shown plus 1.
Thus, for active high inputs, the basic logic functions are achieved with the following
selections and MC = H.

 S3 S2 S1 S0   Logic Function
 L  L  L  L    NOT
 H  L  H  H    AND
 H  H  H  L    OR
 L  H  L  L    NAND
 L  L  L  H    NOR
 L  H  H  L    XOR

The basic arithmetic operations of addition, subtraction, increment and decrement
are provided when MC = L.

 S3 S2 S1 S0   Cn   Arithmetic Function
 H  L  L  H    L    Addition
 L  H  H  L    H    Subtraction
 L  L  L  L    L    Increment
 H  H  H  H    L    Decrement

5.12 Read Only Memory (ROM)


Another frequently encountered purely combinational circuit is the read-only mem-
ory. Its structure is such that any Boolean function of n variables, where n is the
number of inputs, can be generated. Since complicated functions are usually not
perceived as functions, but rather as individual values corresponding to the possible
combinations of input values, the device is called a memory rather than a function
generator.

A ROM essentially consists of a decoder of the binary encoded input number, called
the address, an array of OR gates, and a set of output drivers. The decoder yields
a selector signal for each input value, addressing each cell.


Figure 5.15: Logic Diagram for a ROM

As an example, when the combination A2 = 1, A1 = 1 and A0 = 0 is applied, the
decoder selects line 6, which means that the output lines give the bit string

00011111

which is the binary representation of 31. The ROM can be used to speed up certain
tasks. For example it can store multiplication tables. Of course ROMs can also be
used to store identification strings or any other data.
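Functionally, a ROM is a fixed array lookup. The sketch below models an 8-word, 8-bit ROM; the stored contents (a table of squares) are an assumed example and not the contents of Figure 5.15.

#include <iostream>

// A ROM is a fixed function table: an 8-word, 8-bit ROM addressed by
// a 3-bit input A2 A1 A0, here storing the squares 0..49 as an example.
int main() {
    unsigned char rom[8] = {0, 1, 4, 9, 16, 25, 36, 49};   // n*n
    unsigned address = 6;                   // A2 A1 A0 = 1 1 0
    std::cout << (int)rom[address] << std::endl;   // prints 36
}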

5.13 Combinational Programmable Logic Devices


In the previous section the ROM was introduced. To make circuit design easier more
flexible approaches are available. Many hardware implementations implement the
SOP form (disjunctive normal form) of an expression directly. The circuit designer
determines the connections to make for a given configuration of gates (for example
selecting the inputs to the OR gates in the ROM configuration).

A programmable gate is one where the inputs to the gate can be selected from
a given set of inputs (for example from other gates). If no inputs are selected
we assume that all inputs are 0. We introduce a new notation to simplify circuit
representation. A cross × indicates a programmable connection to a gate, in other
words a connection which can be removed (for example a fuse that can be burnt
open). A dot · indicates a fixed connection. The following figure shows an AND
gate with programmable inputs (the inputs from A0 and A2 can still be removed)
and an OR gate with two fixed inputs.


Figure 5.16: Input Representation for Programmable Gates

In the following examples we use two inputs, four AND gates and one OR gate for
the output. In general, for n inputs and m outputs, 2^n AND gates and m OR gates
are required. One way to implement a programmable AND gate is to have an AND
gate with 2n inputs (for an input and its inverse) and to set the input to 1 whenever
an input is not connected. Similarly for the OR gate an input can be set to 0. A
special case is when no input is selected; the output of the gate must be zero (as
if the gate is not present). In this case we set all inputs to the gate to 0. In this
way gates with a fixed number of inputs can be used as programmable gates. For
each architecture we show the circuit before programming and after programming
the XOR operation.

PROM stands for programmable read only memory. These devices consist of a
number of fixed AND gates (fixed in input) and programmable OR gates. The
AND gates are over all possible inputs. For an n variable system there are 2n AND
gates. All connections are initially closed, the unwanted connections are then burnt
by applying a voltage to the appropriate inputs. Once a PROM is programmed
it cannot be reprogrammed. The EPROM or erasable PROM can be erased (all
connections are closed). The EEPROM is an electrically erasable PROM.


Figure 5.17: PROM Device Architecture


Figure 5.18: PROM Implementation of XOR



PAL stands for programmable array logic. A number of programmable AND gates
feed fixed OR gates in these devices. The AND gates represent the product forms
of the desired expression's SOP form. Specific AND gates are dedicated to specific
OR gates.

GAL stands for generic array logic. They are used to emulate PALs. Different types
of PALs can then be replaced with a single device type (the GAL device).


Figure 5.19: PAL Device Architecture


Figure 5.20: PAL Implementation of XOR



PLA stands for programmable logic array. These devices provide the greatest
programming flexibility through the use of programmable AND gates feeding pro-
grammable OR gates. Any AND gate can feed any OR gate.


Figure 5.21: PLA Device Architecture


Figure 5.22: PLA Implementation of XOR



5.14 Programmable Gate Arrays


A programmable gate array (PGA) consists of an array (or matrix) of cells. Each
cell's function and connections can be selected. Two kinds of field programmable
gate arrays (FPGA) are commonly used. The first kind consists of cells which are
loadable and erasable as a whole like an EPROM (i.e. all the cells are erased and
written at the same time). The second type consists of cells which are not modifiable
but the connections between them are. For example, suppose a cell can implement
any of the 16 possible Boolean functions of two variables. The cell can be programmed
using a multiplexer to select the inputs used and a second multiplexer to determine
the output. Of course, a multiplexer to select only the output function (where all
16 functions are implemented separately) is also possible. In the following figure the
input multiplexers are assumed to be fixed (until reconfigured) and selection signals
are not shown.

Figure 5.23: Example of a Combinational FPGA Cell

To design a circuit using an FPGA the cell functions must be specified as well as the
connections between cells. Determining which connections must be closed is called
routing. For example an FPGA may have a grid pattern where the output can be
connected to four adjacent cells. Each outward going arrow is a duplicate of the
function output. Each input arrow can be configured to be closed or open.

Figure 5.24: Grid Pattern for PGA Design



5.15 VHDL
VHDL [151] is a standardized language that is not tied to any single tool vendor
or hardware manufacturer. It is a complete programming language with built-in
mechanisms to handle and synchronize parallel processes, and also supports abstract
data types and high level modelling. The IEEE adopted VHDL as a standard in
1987.

VHDL was initially intended to describe digital electronics systems. It can be used
to model existing hardware for verification and testing, and also for synthesis.

The following code example is a VHDL description of a 4-bit equality comparator.


The data types bit and bit_vector are basic data types in VHDL. The as-
signment operator is <= and the comparison operator is =. A comment is preceded
by --; everything following on the same line is part of the comment. The entity
declaration describes the communication mechanism. It describes pins (port) used
for input and output. In the example below a and b represent two 4-bit inputs.
Thus a(0) refers to the first bit of a and b(3) refers to the last bit of b. We re-
fer to the logic values as '0' and '1'. The data type bit_vector uses literals in
double quotes, for example "0010". The architecture declaration describes the
functionality. In this case the architecture dataflow is associated with the
entity eqcomp4. It ensures that the output pin equals is always defined.

-- eqcomp4.vhd

-- eqcomp4 is a four bit equality comparator


entity eqcomp4 is
  port (a, b   : in bit_vector(3 downto 0);
        equals : out bit);  -- equals is active high
end eqcomp4;

architecture dataflow of eqcomp4 is
begin
  equals <= '1' when (a = b) else '0';
end dataflow;
Chapter 6
Latches and Registers

6.1 Introduction
The combinational circuits introduced so far perform a function and, except for
the ROM, do not provide any memory. The ROM provides a static memory, the
content is predetermined. A system providing dynamic memory is required to store
data which cannot be predetermined. Any two state (bistable) system which can be
dynamically controlled will provide this function. Many systems acting in parallel
can provide the required data width for operations provided by combinational cir-
cuits. The bistable systems are called latches and the parallel combination registers.
Combinational circuits are free of loops. In this chapter we examine circuits with
feedback loops. This is what allows them to store information. Propagation delays
are important in the analysis of these circuits.

The following chapter introduces mechanisms for an external source of timing. The
timing system helps describe the logic functions of these circuits under specific
conditions.

In this chapter we discuss the SR latch and JK latch which use two inputs. One
sets the logical value of the latch to 0 and the other sets the logical value of the latch
to 1. The D latch has only one input and remembers the logical value of the input
for one time interval. The D register and JK register use D latches and JK latches
respectively, to provide the same logical action as the latches, but with different
physical characteristics.


6.2 SR Latch
This circuit has two inputs labelled S (Set) and R (Reset), and two outputs Q and
Q̅, and consists of two NOR gates connected in a feedback arrangement.


Figure 6.1: Logic Diagram for the SR Latch

The following table summarizes the characteristics of the operation of the latch.
The S and R inputs should not both be set at the same time as this gives an
undetermined value for Q.

St Rt Qt Qt+1
0 0 0 0
0 0 1 1
0 1 0 0
0 1 1 0
1 0 0 1
1 0 1 1
1 1 0 -
1 1 1 -

Table 6.1: Characteristic Table for the SR Latch

The circuit is stable when S = R = 0 (Qt+1 = Qt). The output is time dependent
and there is a delay from the time that one of S or R is set to one and the time
when the circuit is stable again. If S = 0 and R = 1 the system is reset. If S = 1
and R = 0 the system is set. The logical equation for Qt+1 (if St and Rt are not 1
at the same time) is

Qt+1 = St + R̅t · Qt.

6.3 D Latch
The input S = R = 1 must be avoided when using the SR latch. The D latch
overcomes this by using only a single D input. The output is always the same as the
last D input.


Figure 6.2: Logic Diagram for the D Latch

The D latch is sometimes called the data latch because it stores 1 bit of information.
It is also called the delay latch because it delays the output of the 0 or 1 (in an
environment where a CLOCK input is provided, the delay is one clock cycle). The
characteristic table for the D latch is as follows:

D Qt+1
0 0
1 1

Table 6.2: Characteristic Table for the D Latch

The latch described above is called transparent since the output Q is the same as
the input D. An extra input can be introduced to indicate when to set the output
identical to the given input.


Figure 6.3: Logic Diagram for the D Latch with Enable

The G input is called an enable input.



6.4 JK Latch
The JK latch takes two inputs. Unlike the SR latch all input combinations are
valid. The J input performs the set function while the K input performs the reset
function. When J = K = 1 the toggle function is performed (the outputs are
inverted).

Figure 6.4: Logic Diagram for the JK Latch

The characteristic table describes the functionality of the circuit.

Jt Kt Qt Qt+1
0 0 0 0
0 0 1 1
0 1 0 0
0 1 1 0
1 0 0 1
1 0 1 1
1 1 0 1
1 1 1 0

Table 6.3: Characteristic Table for the JK Latch

The logic equation for Qt+1 is

Qt+1 = Jt · Q̅t + K̅t · Qt.

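Both characteristic equations can be tabulated with a few lines of C++; the functions below are straightforward transcriptions of the equations in Sections 6.2 and 6.4.

#include <iostream>

// Characteristic equations of the latches:
// SR latch: Q(t+1) = S + (not R)·Q(t), with S = R = 1 excluded,
// JK latch: Q(t+1) = J·(not Q(t)) + (not K)·Q(t), covering the toggle case.
int sr(int s, int r, int q) { return s | (!r & q); }
int jk(int j, int k, int q) { return (j & !q) | (!k & q); }

int main() {
    std::cout << "J K Q -> Q'" << std::endl;
    for (int j = 0; j <= 1; ++j)
        for (int k = 0; k <= 1; ++k)
            for (int q = 0; q <= 1; ++q)
                std::cout << j << " " << k << " " << q << " -> "
                          << jk(j, k, q) << std::endl;
    std::cout << "SR set:   " << sr(1, 0, 0) << std::endl;   // prints 1
    std::cout << "SR reset: " << sr(0, 1, 1) << std::endl;   // prints 0
}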


6.5 D Register
The transparency of a D latch can be undesirable. It may be preferable to accept
an input value upon the rising edge of a control signal, and retain the stored value
before and after the transition. Latches are level-sensitive whereas registers are
edge-sensitive. An example implementation of this is the master-slave latch pair. It
consists of two latches connected in series with each enable input the inverse of the
other. This separates the storage of the input D and the output Q. The boxes with
the symbols D, G and Q represent D latches.


Figure 6.5: D Register Using Two D Latches

This is called an edge-triggered D register. Typically CK is a periodic signal. The
signals CK and C̅K̅ must never be active at the same time to guarantee nontrans-
parency. The output of a transparent D latch can be used for this purpose. The
following figure shows a D master-slave register.

Figure 6.6: Logic Diagram for the D Register



6.6 JK Register
Similar to the D register, the principle for the J K register is based on the J Klatch.
A master-slave configuration can again be used to implement this register.


Figure 6.7: JK Register Using Two JK Latches

Each JK latch has two additional AND gates, one for each input, where the appropriate
CK or C̅K̅ is the second input to each AND gate. This construction has the same
purpose as the G input in a D latch. A variation on the register is to use the Q and
Q̅ feedback loops directly to the first input and not for each latch. The following
figure shows a JK master-slave register.

Figure 6.8: Logic Diagram for the JK Register


Chapter 7
Synchronous Circuits

7.1 Introduction
Circuits that react immediately to the stimulus of the input are called asynchronous.
This term is a combination of the Greek words meaning "without regard to time".
In digital systems it is important that outputs change at precise points in time.
Circuits that operate in this manner are called synchronous. Digital circuits often
use time reference signals called clocks. A clock signal is nothing more than a
square wave that has a precise known period. The clock will be the timing reference
that synchronizes all circuit activity and tells the device when it should execute
its function. Thus the clock signal is the signal that causes things to happen at
regularly spaced intervals. In particular, operations in the system are made to take
place at times when the clock signal is making a transition from 0 to 1 or from
1 to 0. These transitions are pointed out in the figure. The 0-to-1 transition is
called the rising edge or positive-going edge of the clock signal. The synchronous
action of the clock signal is the result of using clocked latches, which are designed to
change states on either (but not both) the rising edge or the falling edge of the clock
signal. In other words, the clocked latches will change states at the appropriate
clock transition and will rest between successive clock pulses. The frequency of the
clock pulses is generally determined by how long it takes the latches and gates to
respond to the level changes initiated by the clock pulse, that is, the propagation
delays of the various logic circuits.


Figure 7.1: Example Clock Signal


Many ways of designing and controlling latches have evolved over the years. They
differ not only in their logic design but also in how they use the clock signal. Let
us consider a latch. During the period t1:t2 when the clock is enabled, C = 1,
any change made to the data signal may enter the latch immediately. After some
propagation delay, these changes affect the latch's data output Q (and also Q̅) during
the period t3:t4. Thus, ignoring the brief and somewhat uncertain transition
periods when the data and clock signals are actually changing values, the latch
responds to all input changes that occur when C is at the active 1 level. For this
reason latches are said to be level sensitive or level-triggered.


Figure 7.2: Level Sensitive Latch

To obtain latch behavior, we must ensure that the period t1:t2 (when input data
changes are accepted) and the period t3:t4 (when the output data changes) do not
overlap. One way a latch can meet this requirement is by accepting input changes
when C = 1, and changing its output when C = 0. This pulse mode of operation
was used in some early designs for bistables. The clocking method most commonly
used in modern latch design is edge triggering, in which a transition or edge of the
clock signal C causes the actions required in t1:t2 and t3:t4 to take place, as
shown in the figure.


Figure 7.3: Edge Triggered Latch



7.2 Shift Registers


Shift registers are classed as sequential logic circuits, and as such they are con-
structed from latches. Thus shift registers are chains of latches which allow data
applied to the input to be advanced by one latch with each clock pulse. After pass-
ing through the chain, the data is available at the output with a delay but otherwise
is unchanged. Shift registers are used as temporary memories and for shifting data
to the left or right. Shift registers are also used for changing serial to parallel data
or parallel to serial data.

Identification of shift registers may be made by noting how data is loaded into and
read from the storage unit. In the following figure we have a register 8 bits wide.
The registers are classified as:

1. Serial-in serial-out

2. Serial-in parallel-out

3. Parallel-in serial-out

4. Parallel-in Parallel-out


Figure 7.4: Types of Shift Registers



A simple four-bit shift register is displayed in the following figure. It uses four D
latches. Data bits (0s and 1s) are fed into the D input of latch 1. This input is
labelled as the serial data input. The clear input will reset all four D latches to 0
when activated by a LOW. A pulse at the clock input will shift the data from the
serial-data input to the position A (Q of latch 1). The indicators (A, B, C, D) across
the top of the figure show the contents of the register. This register can be classified
as a serial-in parallel-out unit if data is read from the parallel outputs (A, B, C, D)
across the top.


Figure 7.5: Logic Diagram of a 4-bit Serial-Load Shift-Right Register

In CMOS the 4014B is a fully synchronous edge-triggered 8-bit static shift regis-
ter with eight synchronous parallel inputs, a synchronous serial data input, a syn-
chronous parallel enable, a LOW to HIGH edge-triggered clock input and buffered
parallel outputs from the last three stages.

7.3 Binary Counter

Latches can be connected in various arrangements to function as binary counters
that count input clock pulses. A wide variety of counters are available as standard
integrated-circuit packages and it is seldom necessary to construct a counter from
individual latches. We review the external operating characteristics of currently
available IC counters.

Next we discuss the basic counter operation. Figure 7.6 shows the schematic
representation of a 4-bit counter. This counter contains four latches, one per bit,
with outputs labelled A, B, C, and D. Two inputs are shown, the clock pulse
input, CP, and Reset. The counter operates such that the states of the four latches
represent a binary number equal to the number of pulses that have been applied to
the CP input. The diagram shows the sequence which the latch outputs follow as
pulses are applied. The A output represents the LSB (least significant bit) and D
is the MSB (most significant bit) of the binary count. For example, after the fifth
input pulse, the outputs DCBA = 0101, which is the binary equivalent of 5. The
CP input has a small circle and triangle to indicate that the latches in the counter
change states on the negative-going edge of the clock pulses. Counters that trigger
on positive-going edges are also available and they do not have the circle on the CP
input.

Figure 7.6: Four-Bit Binary Counter

D C B A
0 0 0 0   before 1st input pulse
0 0 0 1   after 1st input pulse      16 different
0 0 1 0   after 2nd input pulse      possible states
0 0 1 1   after 3rd input pulse
  ...
1 1 1 1   after 15th input pulse
0 0 0 0   after 16th input pulse     sequence
0 0 0 1   after 17th input pulse     repeats

Table 7.1: Counting Sequence



In general, a counter with N latches can count from 0 up to 2^N - 1, for a total
of 2^N different states. The total number of different states is called the counter's
MOD number. The counter in the figure is a MOD-16 counter. A counter with N
latches would be a MOD-2^N counter. Some IC counters are designed so that the
user can vary the counting sequence through the appropriate external additional
logic connections. These are usually referred to as variable-MOD counters.

In addition to counting pulses, all counters can perform frequency division. This is
illustrated in the following figure for the 4-bit, MOD-16 counter. The state of the
A output is seen to change at a rate exactly 1/2 that of the CP input. The B output
is 1/2 that of the A output and 1/4 of the CP input. The C output is 1/2 the frequency
of the B output and 1/8 the input frequency, and the D output is 1/2 the frequency of
the C output and 1/16 the input frequency. In general, the waveform out of the MSB
latch of a counter will divide the input frequency by the MOD number.

[Figure: waveforms of CP and of the outputs A, B, C and D; each output toggles
at half the rate of the previous one.]

Figure 7.7: Counter Waveforms Showing Frequency Division
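The counting sequence and the frequency division are easily reproduced with a short
simulation. The following sketch is our own illustration (not a model of a particular
IC): the counter state is increased on every negative-going edge of a software clock,
and the bit outputs A, B, C, D then divide the clock frequency by 2, 4, 8 and 16.

// counter.cpp
// Sketch: a MOD-16 counter triggered on negative-going clock edges.

#include <iostream>

using namespace std;

int main(void)
{
   int count = 0;  // the latch outputs DCBA as a 4-bit number
   int clock = 1;  // clock line, initially HIGH
   for(int t=0;t<32;t++)
   {
      int next = 1-clock;           // the clock toggles every half period
      if((clock==1) && (next==0))   // negative-going edge
         count = (count+1)%16;      // MOD-16: 1111 wraps to 0000
      clock = next;
      int A = count&1, B = (count>>1)&1, C = (count>>2)&1, D = (count>>3)&1;
      cout << D << C << B << A << endl; // A toggles at half the clock rate,
   }                                    // D at 1/16 of it
   return 0;
}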

The counters described above can count up from zero to some maximum count and
then reset to zero. There are several IC counters that can count in either direction
and are called up/down counters. The following figure shows the two basic up/down
counter arrangements. The counter in this figure has a single CP input that is
used for both count-up and count-down operations. The UP/DOWN input is used
to control the counting direction. One logic level applied to this input causes the
counter to count up from 0000 to 1111 as pulses are applied to CP. The other logic
level applied to UP/DOWN causes the counter to count down from 1111 to 0000
as pulses are applied to CP. The second counter does not use an UP/DOWN
control input. Instead, it uses separate clock inputs CPu and CPD for counting up
and down, respectively. Pulses applied to CPu cause the counter to count up, and
pulses applied to CPD cause the counter to count down. Only one CP input can
be pulsed at one time, or erratic operations will occur.

In CMOS the 4516 is an edge-triggered synchronous up/down 4-bit binary counter
with a clock input and an up/down count control input.

[Figure: left, a counter with a single CP input and an Up/Down control input;
right, a counter with separate CPu and CPD clock inputs.]

Figure 7.8: Representation of Two Types of Up/Down Counters

An example of a mod-4 ripple counter implemented with clocked JK latches is given
below. The JK latch was chosen for the simple toggle capability.

[Figure: two JK latches with J = K = 1; the external clock drives the CLK input
of the first latch and the output of the first latch drives the CLK input of the
second.]

Figure 7.9: 2-bit Binary Ripple Counter

The CLK input of the second latch is driven by the output of the first latch. The
CLK input of the first latch is driven by an external clock signal. Every second
toggle action of the first latch will cause the second latch to toggle. The output
A is the least significant bit and B is the most significant bit of the binary counter.

This ripple action of one latch depending on the output of the previous latch can
necessitate long clock cycles, due to the accumulated propagation delays. To avoid
this lag, latches can be updated in parallel. The latches are driven by the same
external clock at their CLK inputs. This is illustrated below in another mod-4 counter.

[Figure: two JK latches driven by a common clock at their CLK inputs; the output
of the first latch gates the J and K inputs of the second.]

Figure 7.10: 2-bit Binary Parallel Counter
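The ripple behaviour can be seen in a few lines of C++. In this sketch (our own
illustration) the second latch toggles only when the output A makes a HIGH-to-LOW
transition, since A drives its CLK input.

// ripple.cpp
// Sketch: the 2-bit ripple counter; B toggles on the falling edge of A.

#include <iostream>

using namespace std;

int main(void)
{
   int A = 0, B = 0;
   for(int pulse=1;pulse<=8;pulse++)
   {
      int oldA = A;
      A = 1-A;                // the first latch toggles on every pulse
      if((oldA==1) && (A==0)) // falling edge of A reaches the second CLK
         B = 1-B;             // only then does the second latch toggle
      cout << B << A << endl; // counts 01, 10, 11, 00, ...
   }
   return 0;
}

In the parallel counter both latches see the external clock directly, so the delay of
the first latch no longer adds to the clock period; the toggling of the second latch is
instead gated by the output of the first.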



VHDL can also be used to simulate a synchronous circuit. For example, consider

[Figure: inputs A and B feed an OR gate producing Y, and an XOR gate whose
output passes through a D latch clocked by CLK to produce X.]

Figure 7.11: Synchronous Circuit Specified in VHDL Program

The VHDL program is given below.


-- simple.vhd

entity simple is
port(A, B, CLK: in bit;
     X, Y: out bit);
end simple;

architecture break_out of simple is

begin
Y <= A or B;

p1: process begin
wait until CLK = '1';
X <= A xor B;
end process;
end break_out;

7.4 Example Program


For the PIC16F8X processor, the rotate instruction can be used to do fast multipli-
cation. For example, if we want to multiply two 8 bit numbers and store the result
in two 8 bit registers (HBYTE and LBYTE), we apply the instruction rotate right f
through carry, where f is an 8 bit register. For the PIC16F8X, RRF is the "rotate
right f through carry" instruction.

The instruction BTFSC is the "bit test f and skip if clear" instruction. If the tested
bit of f is 0 the next instruction is skipped. Thus if the BTFSC is executed with the
operands STATUS and 0, the carry flag (STATUS register bit 0) is tested to determine
if the next instruction is executed. The instruction DECFSZ is the "decrement f and
skip if zero" instruction. The value of register f is decremented, and if the result is
zero the next instruction is skipped.
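For reference, the shift-and-add algorithm used by the assembly routine below can
be written in a few lines of C++. This is our own sketch mirroring the register
usage of the PIC program: the multiplier NOA is rotated right, and whenever a 1
falls into the carry the multiplicand NOB is added to the high byte, after which the
16-bit result HBYTE:LBYTE is rotated right through the carry.

// multiply.cpp
// Sketch: 8-bit shift-and-add multiplication, 3 * 101 = 303.

#include <iostream>

using namespace std;

int main(void)
{
   unsigned int NOA = 3, NOB = 101;    // the two 8-bit factors
   unsigned int HBYTE = 0, LBYTE = 0;  // 16-bit result
   unsigned int carry;
   for(int count=8;count>0;count--)
   {
      carry = NOA&1; NOA >>= 1;        // RRF NOA: bit 0 into the carry
      if(carry) HBYTE += NOB;          // BTFSC + ADDWF: add multiplicand
      carry = HBYTE&1; HBYTE >>= 1;    // RRF HBYTE: rotate result right ...
      LBYTE = (LBYTE>>1)|(carry<<7);   // RRF LBYTE: ... through the carry
   }
   cout << 256*HBYTE+LBYTE << endl;    // prints 303
   return 0;
}

In this sketch HBYTE is wider than 8 bits, so the carry of the addition simply
remains in HBYTE; on the PIC the carry flag set by ADDWF is rotated back into
HBYTE by the following RRF.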

; multiply.asm
;*******************************************************************
; Multiplies two 8 bit numbers
;     00000011 (decimal 3)
; and
;     01100101 (decimal 101)
; and stores the result (16 bits)
;     00000001 00101111 (decimal 303)
; in LBYTE and HBYTE
;     LBYTE: 00101111
;     HBYTE: 00000001
;
; RRF  rotate right f through carry
;      The contents of register f are rotated
;      one bit to the right through the Carry Flag.
;*******************************************************************
        PROCESSOR 16f84
        INCLUDE "p16f84.inc"

; Variable Declarations
LBYTE   EQU H'11'       ; variable at address 0x11 in SRAM
HBYTE   EQU H'12'       ; variable at address 0x12 in SRAM
COUNT   EQU H'13'       ; variable at address 0x13 in SRAM
NOA     EQU H'20'       ; first number at address 0x20 in SRAM
NOB     EQU H'21'       ; second number at address 0x21 in SRAM

        ORG H'00'

Start
        BSF STATUS, RP0
        MOVLW B'11111111'
        MOVWF PORTA
        MOVLW B'00000000'
        MOVWF PORTB
        BCF STATUS, RP0

        CLRF LBYTE
        CLRF HBYTE
        MOVLW 8
        MOVWF COUNT

        MOVLW B'00000011'
        MOVWF NOA
        MOVLW B'01100101'
        MOVWF NOB

        MOVF NOB, W
        BCF STATUS, 0

LOOP
        RRF NOA, F
        BTFSC STATUS, 0
        ADDWF HBYTE, F
        RRF HBYTE, F
        RRF LBYTE, F
        DECFSZ COUNT, F
        GOTO LOOP

        MOVF HBYTE, 0
        MOVWF PORTB
Stop    GOTO Stop

        END
Chapter 8
Recursion

8.1 Introduction
Recursion is a fundamental concept in mathematics and computer science. It is a
useful tool for simplifying solutions to problems. A recursive solution is possible if
a problem can be solved using the solution of a simpler problem of the same type
and a solution to the simplest of problems of the same type is known. A recursive
solution to a problem consists of

• A solution to a simplest problem of the same type (base problem or stopping
  condition)

• A method to solve the problem if the solution to a simpler problem of the
  same type is known

Let us now list some recursive structures. One of the most important recursive
structures is the string. The string manipulation functions can be implemented using
recursion, for example to find the length of a string or to reverse a string. The
linear linked list is a recursive structure; it has a head followed by a linked list. An
example implementation of a recursive linked list is given in the next chapter; it
allows lists to be copied, compared, searched and items to be inserted and deleted
recursively. Another structure which is recursive is the binary tree.

In mathematics, recursion is the name given to the technique of defining a function


in terms of itself. Any recursive definition must have an explicit definition for some
value or values of the argument(s), otherwise the definition is circular. Recursion
can also occur in another form if a process is defined in terms of subprocesses, one
of which is identical to the main process.


For example, consider the double integral

∫_a^b ∫_c^d f(x,y) dx dy.

One method of evaluation is to write the double integral as a repeated integral

∫_a^b ( ∫_c^d f(x,y) dx ) dy.

The evaluation of the outer integral requires us to know the value of the integrand
at selected points, and calculation of the integrand requires the evaluation of an
integral, so that the subprocess is the same as the main process.

Example. The set N_0^2 is bijective with N_0. To see this we write the elements of N_0^2
in a table.

(0,0) (0,1) (0,2) (0,3)


(1,0) (1,1) (1,2) (1,3)
(2,0) (2,1) (2,2) (2,3)
(3,0) (3,1) (3,2) (3,3)

We now write down the elements of this table by moving along the diagonals which
go from north-east to south-west, that is, we write them in the sequence

(0,0), (0,1), (1,0), (0,2), (1,1), (2,0), (0,3), (1,2), ...

Since there are (k+1) pairs (r,s) with r+s = k, we see that the pair (m,n) occurs
in the position

1 + 2 + ... + (m+n) + m = (m+n)(m+n+1)/2 + m.

Hence we have a bijection f : N_0^2 → N_0 given by

f(m,n) = (m+n)(m+n+1)/2 + m.

We have two functions g and h from N_0 to N_0 such that f^{-1}(r) = (g(r), h(r)).
They are given by the following formulas. Find s ∈ N_0 so that

s(s+1)/2 ≤ r < (s+1)(s+2)/2.

Let m be

m = r - s(s+1)/2.

Then m ≤ s, and g(r) = m, and h(r) = s - m.
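These formulas translate directly into a short program; the following is our own
sketch checking that the pair (g(r), h(r)) indeed inverts f.

// pairing.cpp
// Sketch: the bijection f(m,n) = (m+n)(m+n+1)/2 + m and its inverse.

#include <iostream>

using namespace std;

unsigned long f(unsigned long m,unsigned long n)
{ return (m+n)*(m+n+1)/2 + m; }

void finverse(unsigned long r,unsigned long &g,unsigned long &h)
{
   unsigned long s = 0;
   // find s with s(s+1)/2 <= r < (s+1)(s+2)/2
   while((s+1)*(s+2)/2 <= r) s++;
   g = r - s*(s+1)/2;   // m = r - s(s+1)/2
   h = s - g;           // n = s - m
}

int main(void)
{
   unsigned long g, h;
   for(unsigned long r=0;r<8;r++)
   {
      finverse(r,g,h);
      cout << r << " -> (" << g << "," << h << ")"
           << "  f = " << f(g,h) << endl;
   }
   return 0;
}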

We can use f to obtain bijections

f_k : N_0^k → N_0

for all k using recursion. We define f_1 to be the identity, and we define f_2 by
f_2 := f. If f_k has been defined, then f_{k+1} is defined by

f_{k+1}(n_1, n_2, ..., n_{k+1}) := f(n_1, f_k(n_2, ..., n_{k+1})).

The inverse of f_k has as its components composites of g and h. For instance

f_3^{-1}(r) = (g(r), g(h(r)), h(h(r))).

Example. Let n = 0,1,... and f(n) = 2^n. Then we can find the recursive definition
as follows

f(n+1) = 2^{n+1} = 2·2^n = 2f(n).

Thus f(n+1) = 2f(n) where f(0) = 1.

Example. Another typical example of recursion is the Fibonacci sequence given by

F_{n+2} = F_{n+1} + F_n,   n = 0,1,2,...

where F_0 = F_1 = 1.

Example. The Bessel functions J_n(x) are solutions of the linear second order dif-
ferential equation

x^2 y'' + x y' + (x^2 - n^2) y = 0,   n = 0,1,2,....

A recurrence formula for Bessel functions is given by

J_{n+1}(x) = (2n/x) J_n(x) - J_{n-1}(x),   n = 0,1,2,...

where

J_0(x) = Σ_{j=0}^∞ (-1)^j x^{2j} / Π_{k=1}^{j} (2k)^2.
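As an illustration (our own sketch, not part of the text) the recurrence can be used
to climb from J_0 and J_1 to higher orders. The series for J_1 used below,
J_1(x) = Σ_{j=0}^∞ (-1)^j (x/2)^{2j+1}/(j!(j+1)!), is the standard one; note also
that the upward recurrence is numerically unstable once n exceeds x, so it is only
suitable for small n.

// bessel.cpp
// Sketch: J0 and J1 by power series, then the recurrence
// J_{n+1}(x) = (2n/x) J_n(x) - J_{n-1}(x).

#include <iostream>
#include <cmath>

using namespace std;

// sum_{j>=0} (-1)^j (x/2)^{2j+nu} / (j! (j+nu)!), nu = 0 or 1
double Jseries(int nu,double x)
{
   double term = pow(x/2.0,nu); // j = 0 term, since 0! = 1! = 1
   double sum = 0.0;
   for(int j=0;j<25;j++)
   {
      sum += term;
      term *= -(x/2.0)*(x/2.0)/((j+1.0)*(j+1.0+nu));
   }
   return sum;
}

int main(void)
{
   double x = 2.0;
   double Jm = Jseries(0,x), J = Jseries(1,x);
   cout << "J0(2) = " << Jm << "  J1(2) = " << J << endl;
   for(int n=1;n<5;n++)         // recurrence gives J2, J3, J4, J5
   {
      double Jp = (2.0*n/x)*J - Jm;
      Jm = J; J = Jp;
      cout << "J" << n+1 << "(2) = " << J << endl;
   }
   return 0;
}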

Example. Given a first order differential equation

dy/dx = f(x, y(x)),   y(x_0) = y_0

where f is an analytic function of x and y. Formal integration yields

y(x) = y_0 + ∫_{x_0}^{x} f(s, y(s)) ds.

Thus a recursive definition for an approximation of y(x) is given by

y_{n+1}(x) = y_0 + ∫_{x_0}^{x} f(s, y_n(s)) ds.

This is known as Picard's method.



As an example we consider

dy/dx = x + y,   x_0 = 0,   y(x_0) = 1.

The approximation at each step is given by

y_{n+1}(x) = 1 + ∫_0^x (s + y_n(s)) ds.

The method yields a polynomial approximation after each step. Thus

y_0(x) = 1
y_1(x) = 1 + x + x^2/2
y_2(x) = 1 + x + x^2 + x^3/6
y_3(x) = 1 + x + x^2 + x^3/3 + x^4/24.
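Since every iterate is a polynomial, the integration can be carried out exactly on an
array of coefficients. The following program is our own sketch of this idea for the
initial value problem above.

// picard.cpp
// Sketch: Picard iteration for dy/dx = x + y, y(0) = 1,
// with y_n stored as polynomial coefficients y[0] + y[1]x + ...

#include <iostream>

using namespace std;

int main(void)
{
   const int N = 8;
   double y[N+2], g[N+2];
   int i, n;
   for(i=0;i<=N+1;i++) y[i] = 0.0;
   y[0] = 1.0;            // y_0(x) = 1
   for(n=0;n<4;n++)
   {
      for(i=0;i<=N+1;i++) g[i] = y[i];
      g[1] += 1.0;        // integrand s + y_n(s)
      for(i=N+1;i>=1;i--) y[i] = g[i-1]/i; // integrate term by term
      y[0] = 1.0;         // constant of integration, y(0) = 1
      cout << "y" << n+1 << ": ";
      for(i=0;i<=N;i++) cout << y[i] << " ";
      cout << endl;
   }
   return 0;
}

Each output line lists the coefficients of 1, x, x^2, ... of the next iterate,
reproducing y_1, y_2 and y_3 above.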

8.2 Example Programs


Example. The Towers of Hanoi problem illustrates the benefits of recursion very
well. The problem has an easy recursive solution. The problem is as follows

• There are 3 pegs A, B and C.

• There are n discs of different sizes.

• Initially all n discs are on peg A with the largest disc at the bottom and discs
decrease in size towards the top of the pile. If disc 1 is above disc 2 then disc
1 is smaller than disc 2.

• Only one disc may be moved at a time. A disc must be moved from the top
of a pile on one peg to the top of a pile on another peg. A larger disc may not
be placed on a smaller one.

• The task is to move all the discs from peg A to peg B.

If n = 1 we can move the disc from A to B. If n = 2 we can move a disc from A to C,
A to B, C to B. This is the inspiration for the solution to the general problem. If
n > 1 move the pile of n - 1 discs from A to C, move the disc on A to B and move
the pile on peg C to peg B.

// hanoi.cpp

#include <iostream>

using namespace std;

void hanoi(unsigned long n,char A,char B,char C)
{
 if(n==1)
  cout << A << " -> " << B << endl;
 else
 {
  hanoi(n-1,A,C,B);
  cout << A << " -> " << B << endl;
  hanoi(n-1,C,B,A);
 }
}

void main(void)
{
 cout << "Tower of Hanoi with 1 disc:" << endl;
 hanoi(1,'A','B','C');
 cout << "Tower of Hanoi with 2 discs:" << endl;
 hanoi(2,'A','B','C');
 cout << "Tower of Hanoi with 3 discs:" << endl;
 hanoi(3,'A','B','C');
 cout << "Tower of Hanoi with 4 discs:" << endl;
 hanoi(4,'A','B','C');
}

The output of the program is

Tower of Hanoi with 1 disc:


A -> B
Tower of Hanoi with 2 discs:
A -> C
A -> B
C -> B
Tower of Hanoi with 3 discs:
A -> B
A -> C
B -> C
A -> B
C -> A
C -> B
A -> B
Tower of Hanoi with 4 discs:
A -> C
A -> B
C -> B
A -> C
B -> A
B -> C
A -> C
A -> B
C -> B
C -> A
B -> A
C -> B
A -> C
A -> B
C -> B

Example. The number of multiplications involved in calculating an integer power of
a number can be reduced significantly with a recursive solution. Using the identity

a^n = (a^{n/2})^2      n even
a^n = a·(a^{n/2})^2    n odd

with a ∈ R and n ∈ N, where n/2 is calculated using integer division (i.e. ⌊n/2⌋).
The program power.cpp implements the solution.

// power.cpp

#include <iostream>
#include <iomanip>

using namespace std;

double power(double a,unsigned int n)
{
 double power_ndiv2;

 if(n == 0) return 1.0;

 power_ndiv2 = power(a,n/2);
 if(n % 2)
  return a*power_ndiv2*power_ndiv2;
 return power_ndiv2*power_ndiv2;
}

void main(void)
{
 cout << "3.4^0=" << power(3.4,0) << endl;
 cout << "2^24=" << setprecision(9) << power(2,24) << endl;
 cout << "3.1415^7=" << setprecision(9) << power(3.1415,7) << endl;
}

The output of the program is

3.4^0=1

2^24=16777216

3.1415^7=3019.66975

Example. If R is a relation on a set A and S is a sequence of elements from A then


the sequence S can be sorted. For R to be a relation we require

1. R ⊆ A × A.
   We view the statement (a,b) ∈ R with a, b ∈ A as a proposition. We also
   write (a,b) ∈ R as aRb.

2. aRb and bRc implies aRc.
A fast sorting method would be to place the elements of S in a tree as they occur
in the sequence and traverse the tree to find the sorted sequence. Another fast
sorting algorithm called quicksort is implemented using recursion. The algorithm
first partitions the sequence around an element Si such that all elements on the
left of Si have the property sjRs i and all elements to the right of Si do not. The
next step is to sort each of the partitions, and we use quicksort to do this (i.e.
recursively). The program qsort. cpp uses the function partition to partition the
sequence at each step of the qsort algorithm. This is the most important part of
the algorithm. The function takes an element of the array and rearranges the array
such that all elements before are less than the given element and all elements after
are greater than the given element.
// qsort.cpp

#include <iostream>
#include <string>

using namespace std;

// general definition of ordering R(t1,t2)
// returns >0 if t2 R t1, <=0 otherwise
template <class T>
void partition(T *array,int n,int (*R)(T,T),int &p)
{
 // partition around the first element of the array
 // any element could have been used.
 // p is the index of the element around which the
 // partition is made
 // pe (declared below) points to the element after
 // the second partition
 int i = n-1, pe = 1;
 T temp1,temp2;

 p=0;
 while(i > 0)
 {
  if(R(array[p],array[pe]) > 0)
  {
   temp1 = array[p]; temp2 = array[p+1];
   array[p++] = array[pe]; // put element in first partition
   array[p] = temp1;       // move element around which partition
                           // is made, one element right
   if(pe-p > 0)            // if the second partition is not empty
    array[pe] = temp2;     // move second partition one element right
  }
  pe++;
  i--;
 }
}

template <class T>
void qsort(T *array,int n,int (*R)(T,T))
{
 int pelement;
 if(n <= 1) return;

 partition(array,n,R,pelement);
 qsort(array,pelement,R);
 qsort(array+pelement+1,n-pelement-1,R);
}

int less_int(int n1,int n2) { return n1>n2; }

int less_string(string n1,string n2) { return (n1>n2); }

void main(void)
{
 int test1[9] = {1,5,3,7,2,9,4,6,8};
 string test2[6] = {"orange","grape","apple","pear","banana","peach"};
 int i;

 qsort<int>(test1,9,less_int);
 qsort<string>(test2,6,less_string);
 for(i=0;i<9;i++) cout << test1[i] << " ";
 cout << endl;
 for(i=0;i<6;i++) cout << test2[i] << " ";
 cout << endl;
}

The output of the program is

1 2 3 4 5 6 7 8 9
apple banana grape orange peach pear

Example. The Ackermann function f : N_0 × N_0 → N is defined as

f(n,m) := m + 1                  if n = 0
f(n,m) := f(n-1, 1)              if m = 0
f(n,m) := f(n-1, f(n, m-1))      otherwise

Thus f is defined recursively. Ackermann's function f is a total function; each
pair of numbers (n,m) yields a value f(n,m) of the function. We see that f(n,m)
depends only on the values of f(r,p) with r < n, or r = n and p < m.

// acker.cpp
#include <iostream>

using namespace std;

unsigned long ackermann(unsigned long n,unsigned long m)
{
 if(n==0) return m+1;
 if(m==0) return ackermann(n-1,1);
 return ackermann(n-1,ackermann(n,m-1));
}

void main(void)
{
 cout << "f(1,1)=" << ackermann(1,1) << " "
      << "f(2,1)=" << ackermann(2,1) << " "
      << "f(3,1)=" << ackermann(3,1) << endl;
 cout << "f(1,2)=" << ackermann(1,2) << " "
      << "f(2,2)=" << ackermann(2,2) << " "
      << "f(3,2)=" << ackermann(3,2) << endl;
 cout << "f(1,3)=" << ackermann(1,3) << " "
      << "f(2,3)=" << ackermann(2,3) << " "
      << "f(3,3)=" << ackermann(3,3) << endl;
}

The output of the program is

f(1,1)=3 f(2,1)=5 f(3,1)=13
f(1,2)=4 f(2,2)=7 f(3,2)=29
f(1,3)=5 f(2,3)=9 f(3,3)=61

Example. The logistic map f : [0,1] → [0,1] is given by

f(x) = 4x(1 - x).

The map can be written as a difference equation

x_{t+1} = 4x_t(1 - x_t),   t = 0,1,2,...

where x_0 ∈ [0,1] is the initial value. Thus we can implement the function to compute
x_t recursively. Of course it makes more sense to implement the function using
iteration [164].

// logistic.cpp

#include <iostream>

using namespace std;

double logistic(unsigned int t,double x0)
{
 double x;
 if(t==0) return x0;
 x = logistic(t-1,x0);
 return 4.0*x*(1.0-x);
}

void main(void)
{
 cout << "x100 = " << logistic(100,0.3899)
      << " when x0=0.3899" << endl;
 cout << "x500 = " << logistic(500,0.5)
      << " when x0=0.5" << endl;
 cout << "x10000 = " << logistic(10000,0.89881)
      << " when x0=0.89881" << endl;
}

The output of the program is

x100 = 0.744501 when x0=0.3899
x500 = 0 when x0=0.5
x10000 = 0.311571 when x0=0.89881

Example. Horner's rule is used to reduce the number of multiplications in evaluating
a polynomial. Consider, for example, the polynomial

P_5(x) = a_5 x^5 + a_4 x^4 + a_3 x^3 + a_2 x^2 + a_1 x + a_0

where x, a_5, a_4, a_3, a_2, a_1 and a_0 are given numbers. Finding P_5 would involve
5 + 4 + 3 + 2 + 1 = 15 multiplications and 5 additions. Rewriting this in the form
(Horner's rule)

P_5(x) = ((((a_5 x + a_4)x + a_3)x + a_2)x + a_1)x + a_0

reduces the number of multiplications to five and we still have five additions. In
general, let

P_n(x) = a_n x^n + a_{n-1} x^{n-1} + ... + a_1 x + a_0

which can be rewritten as

P_n(x) = (...((a_n x + a_{n-1})x + a_{n-2})x + ... + a_1)x + a_0.

Then we have n multiplications and n additions. The next program shows a non-
recursive implementation of Horner's rule in C++.

// horner1.cpp

#include <iostream>

using namespace std;

template <class T>
T P(T x,const T *a,int n)
{
 T s = a[n];
 while(--n >= 0) s = s*x + a[n];
 return s;
}

void main(void)
{
 const double a[5] = { 1.0,0.5,0.0,-18.0,3.0 };

 cout << "P(x) = 3x^4-18x^3+x/2+1" << endl;
 cout << "P(0.0) = " << P(0.0,a,4) << endl;
 cout << "P(-1.0) = " << P(-1.0,a,4) << endl;
 cout << "P(5.0) = " << P(5.0,a,4) << endl;
}

A recursive implementation of Horner's rule is given in the next program.

// horner2.cpp

#include <iostream>

using namespace std;

template <class T>
T P(T x,const T *a,int n)
{
 if(n==0)
  return a[0];
 return a[0]+x*P(x,a+1,n-1);
}

void main(void)
{
 const double a[5] = { 1.0,0.5,0.0,-18.0,3.0 };

 cout << "P(x) = 3x^4-18x^3+x/2+1" << endl;
 cout << "P(0.0) = " << P(0.0,a,4) << endl;
 cout << "P(-1.0) = " << P(-1.0,a,4) << endl;
 cout << "P(5.0) = " << P(5.0,a,4) << endl;
}

The output of both programs is

P(x) = 3x^4-18x^3+x/2+1
P(0.0) = 1
P(-1.0) = 21.5
P(5.0) = -371.5

Example. The following Java program is used to construct a graphic pattern called
a Hilbert curve. Each curve H_i consists of four half-sized copies of H_{i-1} with a
different orientation. The Hilbert curve is the limit of this construction process,
i.e. H_∞. Thus we can implement the methods A(), B(), C() and D() to draw the
four copies for each step in the construction of the Hilbert curve using recursion.
Lines are drawn to connect the four copies. For example, the first three steps in
constructing the Hilbert curve are given below.

Figure 8.1: First 3 Steps in the Construction of the Hilbert Curve

// Hilbert.java

import java.awt.*;
import java.awt.event.*;

public class Hilbert extends Frame
implements WindowListener, ActionListener
{
 public Hilbert()
 {
  addWindowListener(this);
  drawButton.addActionListener(this);
  setTitle("Hilbert");
  Panel parameterPanel = new Panel();
  parameterPanel.setLayout(new GridLayout(2,1));
  Panel nStepsPanel = new Panel();
  nStepsPanel.add(new Label("no of steps = "));
  nStepsPanel.add(nStepsField);
  Panel buttonPanel = new Panel();
  buttonPanel.add(drawButton);
  parameterPanel.add(nStepsPanel);
  parameterPanel.add(buttonPanel);
  add("North",parameterPanel);
  add("Center",hilbertCurve);
  setSize(400,400); setVisible(true);
 }
 public static void main(String[] args)
 { new Hilbert(); }

 public void actionPerformed(ActionEvent action)
 {
  if(action.getSource() == drawButton)
   hilbertCurve.setSteps(Integer.parseInt(nStepsField.getText()));
  System.out.println(Integer.parseInt(nStepsField.getText()));
 }

 public void windowClosing(WindowEvent event)
 { System.exit(0); }
 public void windowClosed(WindowEvent event){}
 public void windowOpened(WindowEvent event){}
 public void windowDeiconified(WindowEvent event){}
 public void windowIconified(WindowEvent event){}
 public void windowActivated(WindowEvent event){}
 public void windowDeactivated(WindowEvent event){}

 TextField nStepsField = new TextField("5",5);
 Button drawButton = new Button("Draw");
 HilbertCurve hilbertCurve = new HilbertCurve();
}

class HilbertCurve extends Canvas
{
 private int x, y, h, n, len;

 public HilbertCurve() { n = 5; }

 public void A()
 {
  if(n > 0)
  {
   Graphics g = getGraphics(); n--;
   D(); g.drawLine(x, y, x-h, y); x-=h;
   A(); g.drawLine(x, y, x, y-h); y-=h;
   A(); g.drawLine(x, y, x+h, y); x+=h;
   B(); n++;
  }
 }
 public void B()
 {
  if(n > 0)
  {
   Graphics g = getGraphics(); n--;
   C(); g.drawLine(x, y, x, y+h); y+=h;
   B(); g.drawLine(x, y, x+h, y); x+=h;
   B(); g.drawLine(x, y, x, y-h); y-=h;
   A(); n++;
  }
 }
 public void C()
 {
  if(n > 0)
  {
   Graphics g = getGraphics(); n--;
   B(); g.drawLine(x, y, x+h, y); x+=h;
   C(); g.drawLine(x, y, x, y+h); y+=h;
   C(); g.drawLine(x, y, x-h, y); x-=h;
   D(); n++;
  }
 }
 public void D()
 {
  if(n > 0)
  {
   Graphics g = getGraphics(); n--;
   A(); g.drawLine(x, y, x, y-h); y-=h;
   D(); g.drawLine(x, y, x-h, y); x-=h;
   D(); g.drawLine(x, y, x, y+h); y+=h;
   C(); n++;
  }
 }

 public void paint(Graphics g)
 {
  Dimension size = getSize();
  h = 4*Math.min(size.width,size.height)/5;
  x = size.width/2+h/2;
  y = size.height/2+h/2;

  for(int i=len=1;i<n;i++) len = 2*len+1;
  h/=len; A();
 }

 public void setSteps(int nSteps)
 { n = nSteps; repaint(); }
}

8.3 Mutual Recursion


If the solution to a problem relies on the solution to another problem which in turn
relies on a simpler problem of the first type, a recursive solution can be implemented.
Mutual recursion refers to the recursive dependence of one solution on another. This
concept can be extended for more than two problems.

The Jacobi elliptic functions can be defined as inverses of the elliptic integral of the
first kind [53]. Thus, if we write

x(φ, k) = ∫_0^φ ds / √(1 - k^2 sin^2 s)                                   (8.1)

where k ∈ [0,1], we then define the following functions

sn(x,k) := sin(φ),  cn(x,k) := cos(φ),  dn(x,k) := √(1 - k^2 sin^2(φ)).   (8.2)

For k = 0 we obtain

sn(x,0) ≡ sin(x),  cn(x,0) ≡ cos(x),  dn(x,0) ≡ 1                         (8.3)

and for k = 1 we have

sn(x,1) ≡ tanh(x),  cn(x,1) ≡ dn(x,1) ≡ 2/(e^x + e^{-x}).                 (8.4)

We have the following identities

sn(x,k) = 2 sn(x/2,k) cn(x/2,k) dn(x/2,k) / (1 - k^2 sn^4(x/2,k))         (8.5)

cn(x,k) = (1 - 2 sn^2(x/2,k) + k^2 sn^4(x/2,k)) / (1 - k^2 sn^4(x/2,k))   (8.6)

dn(x,k) = (1 - 2 k^2 sn^2(x/2,k) + k^2 sn^4(x/2,k)) / (1 - k^2 sn^4(x/2,k))  (8.7)
(8.7)

The expansions of the Jacobi elliptic functions in powers of x up to order 3 are given
by

sn(x,k) = x - (1 + k^2) x^3/3! + ...                                      (8.8)

cn(x,k) = 1 - x^2/2! + ...                                                (8.9)

dn(x,k) = 1 - k^2 x^2/2! + ...                                            (8.10)

For x sufficiently small these will be good approximations.

We can now use the identities (8.5)-(8.7) and the expansions (8.8)-(8.10) to im-
plement the Jacobi elliptic functions using one recursive call. The recursive call in
scdn uses half of the provided parameter x. In other words the absolute value of
the parameter passed in the recursive call is always smaller (by a factor of 2). This
guarantees that for fixed ε > 0 the parameter |x| will satisfy |x| < ε after a finite
number of recursive calls. At this point a result is returned immediately using the
polynomial approximations (8.8)-(8.10). This ensures that the algorithm will complete
successfully. The recursive call is possible due to the identities for the sn, cn and dn
functions given in (8.5)-(8.7). Since the identities depend on all three functions sn,
cn and dn we can calculate all three at each step instead of repeating calculations
for each of sn, cn and dn [81]. Lastly some optimization was done to reduce the
number of multiplications used in the double angle formulas. We also use the fact
that the denominator of all three identities is the same.

The advantage of this approach is that all three Jacobi elliptic functions are found
with one function call. Furthermore the cases k = 0 and k = 1 include the sine,
cosine, tanh and sech functions. Obviously, for these special cases faster routines are
available. Elliptic functions belong to the class of doubly periodic functions in which
2K plays a similar role to π in the theory of circular functions, where K = F(π/2, k)
is the complete elliptic integral of the first kind. We have the identities

sn(x±2K, k) ≡ -sn(x, k),  cn(x±2K, k) ≡ -cn(x, k),  dn(x±2K, k) ≡ dn(x, k).



To reduce the argument of the Jacobi elliptic functions we can also apply these
identities.

The recursion method described above can be implemented using C++ as follows.
The arguments to the function scdn are

• x, the first argument to sn, cn and dn.

• k2, the square of the second argument to sn, cn and dn.

• eps, the upper bound on the argument x for application of the Taylor expan-
sion approximation.

• s, a variable for the value of sn(x, k).

• c, a variable for the value of cn(x, k).

• d, a variable for the value of dn(x, k).

Using the implementation we calculate the sine, cosine, identity, hyperbolic tan
and hyperbolic sec functions for the value 3.14159.

// jacobi.cpp

#include <iostream>
#include <cmath>

using namespace std;

// forward declaration
void scdn(double,double,double,double&,double&,double&);

void main(void)
{
 double x, k, k2, eps;
 x = 3.14159;
 eps = 0.01;

 double res1,res2,res3;

 cout << "x = " << x << endl;

 // sin,cos,1 of x
 k = 0.0;
 k2 = k*k;
 scdn(x,k2,eps,res1,res2,res3);
 cout << "sin(x) = " << res1 << endl;
 cout << "cos(x) = " << res2 << endl;
 cout << "1(x) = " << res3 << endl;

 // tanh,sech,sech of x
 k = 1.0;
 k2 = k*k;
 scdn(x,k2,eps,res1,res2,res3);
 cout << "tanh(x) = " << res1 << endl;
 cout << "sech(x) = " << res2 << endl;
 cout << "sech(x) = " << res3 << endl;
}

void scdn(double x,double k2,double eps,double &s,double &c,double &d)
{
 if(fabs(x) < eps)
 {
  double x2 = x*x/2.0;
  s = x*(1.0 - (1.0 + k2)*x2/3.0);
  c = 1.0 - x2;
  d = 1.0 - k2*x2;
 }
 else
 {
  double sh,ch,dh;

  scdn(x/2.0,k2,eps,sh,ch,dh); // recursive call

  double sh2 = sh*sh;
  double sh4 = k2*sh2*sh2;
  double denom = 1.0 - sh4;

  s = 2.0*sh*ch*dh/denom;
  c = (1.0 - 2.0*sh2 + sh4)/denom;
  d = (1.0 - 2.0*k2*sh2 + sh4)/denom;
 }
}

8.4 Wavelets and Recursion


The discrete wavelet transform (or DWT) is an orthogonal transform which can be
applied to a finite group of data. Functionally, it is very much like the discrete
Fourier transform, in that the transforming function is orthogonal, a signal passed
twice through the transformation is unchanged, and the input signal is assumed to
be a set of discrete-time samples. Both transforms are convolutions. Whereas the
basis function of the Fourier transform is sinusoidal, the wavelet basis is a set of
functions which are defined by a recursive difference equation

φ(x) = Σ_{k=0}^{M-1} c_k φ(2x - k)

where the range of the summation is determined by the specified number of nonzero
coefficients M. The number of the coefficients is not arbitrary and is determined
by constraints of orthogonality and normalization. Owing to the periodic boundary
condition we have

c_k := c_{k+nM}

where n ∈ N. Generally, the area under the wavelet curve over all space should be
unity, i.e.

∫_R φ(x) dx = 1.

It follows that

Σ_{k=0}^{M-1} c_k = 2.

In the Hilbert space L^2(R), the function φ is orthogonal to its translations; i.e.

∫_R φ(x) φ(x - k) dx = 0,   k ≠ 0.

What is desired is a function ψ which is also orthogonal to its dilations, or scales,
i.e.

∫_R ψ(x) ψ(2x - k) dx = 0.

Such a function ψ does exist and is given by (the so-called associated wavelet func-
tion)

ψ(x) = Σ_k (-1)^k c_{1-k} φ(2x - k)

which is dependent on the solution of φ. Normalization requires that

Σ_k c_k c_{k-2m} = 2 δ_{0,m}

which means that the above sum is zero for all m not equal to zero, and that the
sum of the squares of all coefficients is two. Another equation which can be derived
from the above conditions is

Σ_k (-1)^k c_{1-k} c_{k-2m} = 0.

A way to solve for φ is to construct a matrix of coefficient values. This is a square
M × M matrix where M is the number of nonzero coefficients. The matrix is
designated L with entries

L_{ij} = c_{2i-j}.

This matrix has an eigenvalue equal to 1, and its corresponding (normalized) eigen-
vector contains, as its components, the value of the function φ at integer values of
x. Once these values are known, all other values of the function φ can be generated
by applying the recursion equation to get values at half-integer x, quarter-integer x,
and so on down to the desired dilation. This determines the accuracy of the function
approximation.
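For the Daubechies-4 coefficients this eigenvector can be computed numerically. The
following is our own sketch (not part of the text): since the eigenvalue 1 is the
dominant eigenvalue of L in this case, repeated application of L together with the
normalization Σ_k φ(k) = 1 converges to the desired values of φ at the integers.

// dilation.cpp
// Sketch: phi at the integers for Daubechies-4, as the eigenvector
// of L_ij = c_{2i-j} with eigenvalue 1 (found by iterating L).

#include <iostream>
#include <cmath>

using namespace std;

int main(void)
{
   double s3 = sqrt(3.0);
   double c[4] = { (1.0+s3)/4.0, (3.0+s3)/4.0,
                   (3.0-s3)/4.0, (1.0-s3)/4.0 };
   double v[4] = { 0.25, 0.25, 0.25, 0.25 }; // guess for phi(0)..phi(3)
   double w[4];
   for(int iter=0;iter<50;iter++)
   {
      for(int i=0;i<4;i++)                   // w = L v with L_ij = c_{2i-j}
      {
         w[i] = 0.0;
         for(int j=0;j<4;j++)
            if((2*i-j >= 0) && (2*i-j < 4)) w[i] += c[2*i-j]*v[j];
      }
      double sum = w[0]+w[1]+w[2]+w[3];
      for(int i=0;i<4;i++) v[i] = w[i]/sum;  // normalize the sum to 1
   }
   for(int i=0;i<4;i++)
      cout << "phi(" << i << ") = " << v[i] << endl;
   // expected: phi(1) = (1+sqrt(3))/2, phi(2) = (1-sqrt(3))/2
   return 0;
}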
An example for ψ is the Haar function

ψ(x) = 1 for 0 ≤ x < 1/2,  ψ(x) = -1 for 1/2 ≤ x < 1,  ψ(x) = 0 otherwise

and φ is given by

φ(x) = 1 for 0 ≤ x < 1,  φ(x) = 0 otherwise.

The functions

ψ_{m,n}(x) := 2^{-m/2} ψ(2^{-m} x - n),   m, n ∈ Z

form a basis in the Hilbert space L^2(R).

This class of wavelet functions is constrained, by definition, to be zero outside of a
small interval. This is what makes the wavelet transform able to operate on a finite
set of data, a property which is formally called compact support. The recursion
relation ensures that a wavelet function ψ is non-differentiable everywhere. The
following table lists coefficients for three wavelet transforms.

Wavelet        c0         c1         c2         c3          c4          c5
Haar           1.0        1.0
Daubechies-4   (1+√3)/4   (3+√3)/4   (3-√3)/4   (1-√3)/4
Daubechies-6   0.332671   0.806891   0.459877   -0.135011   -0.085441   0.035226

Table 8.1: Coefficients for Three Wavelet Functions

The pyramid algorithm operates on a finite set of N input data, where N is a power
of two; this value will be referred to as the input block size. These data are passed
through two convolution functions, each of which creates an output stream that is
half the length of the original input. These convolution functions are filters; one
half of the output is produced by the "low-pass" filter

a_i = (1/2) Σ_{j=0}^{N-1} c_{2i-j+1} f_j,   i = 0, 1, ..., N/2 - 1

and the other half is produced by the "high-pass" filter function

b_i = (1/2) Σ_{j=0}^{N-1} (-1)^j c_{j-2i} f_j,   i = 0, 1, ..., N/2 - 1

where N is the input block size, c_j are the coefficients (subscripts taken modulo N),
f is the input function, and a and b are the output functions. In the case of the
lattice filter, the low- and high-pass outputs are usually referred to as the odd and
even outputs, respectively. In many situations, the odd or low-pass output contains
most of the information content of the original input signal. The even, or high-pass
output contains the difference between the true input and the value of the
reconstructed input if it were to be reconstructed from only the information given in
the odd output. In general, higher order wavelets (i.e. those with more nonzero
coefficients) tend to put more information into the odd output, and less into the even
output. If the average amplitude of the even output is low enough, then the even
half of the signal may be discarded without greatly affecting the quality of the
reconstructed signal. An important step in wavelet-based data compression is finding
wavelet functions which cause the even terms to be nearly zero.

The Haar wavelet represents a simple interpolation scheme. After passing these data
through the filter functions, the output of the low-pass filter consists of the average
of every two samples, and the output of the high-pass filter consists of the difference
of every two samples. The high-pass filter contains less information than the low-pass
output. If the signal is reconstructed by an inverse low-pass filter of the form

f^L_j = Σ_{i=0}^{N/2-1} c_{2i-j+1} a_i,   j = 0, 1, ..., N - 1

then the result is a duplication of each entry from the low-pass filter output. This is
a wavelet reconstruction with 2× data compression. Since the perfect reconstruction
is a sum of the inverse low-pass and inverse high-pass filters, the output of the inverse
high-pass filter can be calculated. This is the result of the inverse high-pass filter
function

f^H_j = Σ_{i=0}^{N/2-1} (-1)^j c_{j-1-2i} b_i,   j = 0, 1, ..., N - 1.

The perfectly reconstructed signal is

f = f^L + f^H

where each f is the vector with elements f_j. Using other coefficients and other orders
of wavelets yields similar results, except that the outputs are not exactly averages
and differences, as in the case using the Haar coefficients.

The following C++ program implements the Haar wavelet transform.


// wavelet.cpp

#include <iostream>
#include <cmath>

using namespace std;

void main(void)
{
 const double pi = 3.14159;
 int n = 16; // n must be a power of 2
 double* f = new double[n];

 // input signal
 int k;
 for(k=0; k < n; k++)
  f[k] = sin(2.0*pi*(k+1)/n);

 double* c = new double[n];
 for(k=0; k < n; k++)
  c[k] = 0.0;

 c[0] = 1.0; c[1] = 1.0;

 double* a = new double[n/2];
 for(k=0; k < n/2; k++)
  a[k] = 0.0;
 double* b = new double[n/2];
 for(k=0; k < n/2; k++)
  b[k] = 0.0;

 int i, j;
 for(i=0; i < n/2; i++)
 {
  for(j=0; j < n; j++)
  {
   if(2*i-j+1 < 0) a[i] += c[2*i-j+1+n]*f[j];
   else a[i] += c[2*i-j+1]*f[j];
  }
  a[i] = 0.5*a[i];
 }

 for(i=0; i < n/2; i++)
 {
  for(j=0; j < n; j++)
  {
   if(j-2*i < 0) b[i] += pow(-1.0,j)*c[j-2*i+n]*f[j];
   else b[i] += pow(-1.0,j)*c[j-2*i]*f[j];
  }
  b[i] = 0.5*b[i];
 }

 for(k=0; k < n/2; k++)
  cout << "a[" << k << "] = " << a[k] << endl;
 for(k=0; k < n/2; k++)
  cout << "b[" << k << "] = " << b[k] << endl;

 // inverse
 double* fL = new double[n];
 double* fH = new double[n];

 for(j=0; j < n; j++)
  fL[j] = 0.0;
 for(j=0; j < n; j++)
  fH[j] = 0.0;

 for(j=0; j < n; j++)
 {
  for(i=0; i < n/2; i++)
  {
   if(2*i-j+1 < 0) fL[j] += c[2*i-j+1+n]*a[i];
   else fL[j] += c[2*i-j+1]*a[i];
  }
 }

 for(k=0; k < n; k++)
  cout << "fL[" << k << "] = " << fL[k] << endl;

 for(j=0; j < n; j++)
 {
  for(i=0; i < n/2; i++)
  {
   if(j-1-2*i < 0) fH[j] += pow(-1.0,j)*c[j-1-2*i+n]*b[i];
   else fH[j] += pow(-1.0,j)*c[j-1-2*i]*b[i];
  }
 }

 for(k=0; k < n; k++)
  cout << "fH[" << k << "] = " << fH[k] << endl;

 // input signal reconstructed
 double* g = new double[n];
 for(k=0; k < n; k++)
  g[k] = fL[k] + fH[k];

 for(k=0; k < n; k++)
  cout << "g[" << k << "] = " << g[k] << endl;
}

8.5 Primitive Recursive Functions


Let N_0 be the natural numbers including 0, i.e. {0, 1, 2, ...}. For a function to be
computable, there must be an algorithm or procedure for computing it. So in a
formal definition of this class of functions we must replace the intuitive, semantic
ideas with precise descriptions of the functions [48, 63, 77, 97, 115].

To begin, we take as variables the letters n, x_1, x_2, .... We write x for (x_1, ..., x_k).

Next we list the basic, incontrovertibly computable functions which we use as build-
ing blocks for all others.

zero function          Z : N_0 → N_0          Z(n) = 0 for all n
successor function     S : N_0 → N_0          S(n) = n + 1
projection functions   P_i^k : N_0^k → N_0    P_i^k(x_1, ..., x_k) = x_i for 1 ≤ i ≤ k

These functions are called initial functions. We sometimes call the projections pick-
out functions, and P_1^1 the identity function, written id(x) = x.

Next, we specify the ways we allow new functions to be defined from ones we already
have.

Composition. If g is a function of m variables and h_1, ..., h_m are functions of k
variables, which are already defined, then composition yields the function

f(x) = g(h_1(x), ..., h_m(x)).

Primitive recursion. For functions of one variable the schema is

f(0) = d
f(n+1) = h(f(n), n)

where d is a number and h is a function already defined.

For functions of two or more variables, if g and h are already defined then f is given
by primitive recursion on h with basis g as

f(0, x) = g(x)
f(n+1, x) = h(f(n, x), n, x).

The reason we allow both n and x as well as f(n, x) to appear in h is that we may
wish to keep track of both the stage we are at and the input.

The primitive recursive functions are exactly those which are either an initial func-
tion or can be obtained from the initial functions by a finite number of applications
of the basic operations.

Example. The sum x_1 + x_2 is primitive recursive defined by

sum(0, x_2) = P_1^1(x_2)
sum(x_1 + 1, x_2) = S(sum(x_1, x_2)).

Example. The product x_1 x_2 is primitive recursive defined by

product(0, x_2) = 0
product(x_1 + 1, x_2) = sum(product(x_1, x_2), x_2).

Example. The predecessor of x is primitive recursive defined by

pred(0) = 0
pred(x + 1) = x.

Example. The function

minus(x_1, x_2) = x_1 - x_2 if x_1 ≥ x_2, and 0 otherwise

is primitive recursive:

minus(x_1, 0) = x_1
minus(x_1, x_2 + 1) = pred(minus(x_1, x_2)).

Example. We can now describe the mod function

mod(n, m) = n if n < m, and mod(n - m, m) otherwise.

The definition is as follows.

mod(0, m) = 0
mod(n+1, m) = k(mod(n, m), n, m)

k(p, n, m) = minus(S(p), product(m, minus(S(p), pred(m))))

The function k is a composition of primitive recursive functions (some of which are
compositions themselves) and is primitive recursive. Thus mod(n, m) is primitive
recursive.
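These schemas can be followed literally in code. The sketch below (our own addition)
builds sum, product, pred and minus from the initial functions and the construction
rules only; it is of course hopelessly inefficient, but it demonstrates that the
definitions above really do compute.

// primrec.cpp
// Sketch: functions built strictly by the primitive recursion schema.

#include <iostream>

using namespace std;

unsigned long Z(unsigned long) { return 0; }     // zero function
unsigned long S(unsigned long n) { return n+1; } // successor function

// sum(0,x2) = x2, sum(x1+1,x2) = S(sum(x1,x2))
unsigned long sum(unsigned long x1,unsigned long x2)
{ if(x1==0) return x2; return S(sum(x1-1,x2)); }

// product(0,x2) = 0, product(x1+1,x2) = sum(product(x1,x2),x2)
unsigned long product(unsigned long x1,unsigned long x2)
{ if(x1==0) return Z(x2); return sum(product(x1-1,x2),x2); }

// pred(0) = 0, pred(x+1) = x
unsigned long pred(unsigned long x)
{ if(x==0) return 0; return x-1; }

// minus(x1,0) = x1, minus(x1,x2+1) = pred(minus(x1,x2))
unsigned long minus(unsigned long x1,unsigned long x2)
{ if(x2==0) return x1; return pred(minus(x1,x2-1)); }

int main(void)
{
   cout << sum(3,4) << " " << product(3,4) << " "
        << minus(3,7) << " " << minus(7,3) << endl; // 7 12 0 4
   return 0;
}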

The Ackermann function f is not primitive recursive. It is obviously not an initial
function. Nor can it be defined for the case f(0, m) as a number independent of m.
Thus the only way the Ackermann function can be primitive recursive is if it is a
composition of primitive recursive functions. Since for m ≠ 0 and n ≠ 0 f relies on
an evaluation of f itself it cannot be a composition of primitive recursive functions.

Definition. A set C of total functions from N_0^n to N_0 (for all n) is called primitive
recursively closed if it satisfies the following conditions.

1. all initial functions are in C

2. C is closed under primitive recursion, i.e. if f comes from g and h by primitive
   recursion and g, h ∈ C then f ∈ C

3. C is closed under composition, i.e. if f comes from g and h_1, ..., h_r by composition
   (f(x) = g(h_1(x), ..., h_r(x))) and g, h_1, ..., h_r ∈ C then f ∈ C.

Theorem. The set of all primitive recursive functions is primitive recursively closed.
[48, 63, 97]

Theorem. Any primitive recursively closed set contains every primitive recursive
function. [48, 63, 97]

Definition. The definition of μ-recursive is

• The primitive recursive functions are μ-recursive.

• If

  f : N_0^{k+1} → N_0

  is μ-recursive then

  μf : N_0^k → N_0

  defined by

  μf(y) := min{ x | f(x, y) = 0 with f(u, y) defined for u ≤ x }

  is μ-recursive. The function μf will be undefined if no such x exists.

• Functions defined by composition and primitive recursion of μ-recursive func-
  tions are also μ-recursive.

In other words μ acts as a minimalization operator in the sense that it maps to the
minimum x such that f(x, y) = 0.
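In code the μ operator is an unbounded search from x = 0 upwards. The sketch
below (our own illustration) applies it to f(x,y) = y ∸ x·x, so that μf(y) is the
smallest x with x^2 ≥ y; the caveat of the definition is visible directly: if f never
reaches zero the loop does not terminate.

// mu.cpp
// Sketch: the mu (minimalization) operator as an unbounded search.

#include <iostream>

using namespace std;

// f(x,y) = y monus x*x, zero as soon as x*x >= y
unsigned long f(unsigned long x,unsigned long y)
{ return (y > x*x) ? y - x*x : 0; }

// mu f(y) = min { x : f(x,y) = 0 }; loops forever if no such x exists
unsigned long mu(unsigned long (*f)(unsigned long,unsigned long),
                 unsigned long y)
{
   unsigned long x = 0;
   while(f(x,y) != 0) x++;
   return x;
}

int main(void)
{
   for(unsigned long y=0;y<=10;y++)
      cout << "mu f(" << y << ") = " << mu(f,y) << endl;
   return 0;
}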

The Ackermann function is μ-recursive.



8.6 Backtracking
A common approach to finding a solution, when no simple solution algorithm is
available, is trial and error. Suppose a configuration is built up by a number of
well defined steps and then tested to see if it is a solution. If it is not a solution we
return to an earlier step and try a different option. Backtracking is the technique of
returning to an earlier stage in a solution process to make a different choice in the
attempt to find solutions.

Example. 8-Queens problem. A chessboard is 8 columns wide and 8 rows high. The
8 queens problem requires us to place 8 queens on the chessboard so that no queen
is attacking another. A queen attacks another if it is on the same row, column or
diagonal as the other queen. An example solution is

[Figure: an 8 × 8 board with one queen in each row and each column, no two
queens on a common diagonal.]

Figure 8.2: A Solution to the 8-Queens Problem

The recursive solution is to place a queen on a position which is not attacked, row
by row. If there is no position available for a queen, the algorithm returns to the
previous row and moves the queen to the next position which is not attacked.
Checking every possible placement of the 8 queens would include configurations
which are obviously incorrect and would also take a long time. This algorithm uses
a technique called pruning to reduce the number of configurations to test. We can
form a tree according to the square we place each queen in. The root of the tree
corresponds to an empty board. The branches from the root correspond to each
possible placement of a queen. By rejecting certain options early, the corresponding
branches and entire sub-trees are removed from consideration.

// queens.cpp

#include <iostream>

using namespace std;

const char QUEEN = 'Q';
const char SPACE = '#';

void printboard(char board[8][8])
{
 int i,j;
 for(i=0;i<8;i++)
 {
  for(j=0;j<8;j++)
   cout << board[i][j];
  cout << endl;
 }
 cout << endl;
}

int attacking(char board[8][8],int row,int col)
{
 int i;
 for(i=0;i<8;i++)
 {
  if((board[row][i]==QUEEN)||(board[i][col]==QUEEN)) return 1;
  if((i+row-col>=0)&&(i+row-col<8))
   if(board[i+row-col][i]==QUEEN)
    return 1;
  if((col+row-i>=0)&&(col+row-i<8))
   if(board[col+row-i][i]==QUEEN)
    return 1;
 }
 return 0;
}

void queens(char board[8][8],int row)
{
 int i;
 if(row<0) return;
 if(row>7) { printboard(board); return; }
 for(i=0;i<8;i++)
  if(!attacking(board,row,i))
  {
   board[row][i]=QUEEN;
   queens(board,row+1);
   board[row][i]=SPACE;
  }
}

void main(void)
{
 char board[8][8];
 int i,j;

 for(i=0;i<8;i++)
  for(j=0;j<8;j++)
   board[i][j]=SPACE;
 queens(board,0);
}

The program output is


Q#######
####Q###
#######Q
#####Q##
##Q#####
######Q#
#Q######
###Q####

Q#######
#####Q##
#######Q
##Q#####
######Q#
###Q####
#Q######
####Q###

8.7 Stacks and Recursion Mechanisms


8.7.1 Recursion Using Stacks
Most implementations of recursion rely on a stack which is a last-in first-out (LIFO)
storage mechanism. The stack has two operations, namely to push data onto the
top of a stack and pop data off the top of the stack. For a recursive function call
all the local data must be preserved. This is done by pushing the local data onto
a stack. After the recursive function call has completed, the local data is popped
off the stack into registers or other local storage. This stack can also be used to
return the result of a function call. Next we introduce two routines called CALL
and RETURN to implement the recursion. CALL has three arguments.

1. The address of the function to enter.

2. The address of the first register of workspace to be preserved.

3. The number of registers to be preserved.

The CALL routine pushes the specified registers and return address onto the stack,
and then transfers control to the specified function.
RETURN has two arguments.

1. The address of the first register of the work-space to be restored.

2. The number of registers to be restored.

The RETURN routine restores the specified registers from the stack and then returns
control to the return address on the stack. In shortened form

CALL(function, registers):   PUSH currentaddress
                             PUSH registers
                             GOTO function

RETURN(registers):           POP into registers
                             POP into returnaddress
                             GOTO returnaddress

A simple recursive function would then be

function: IF (basecase) ... RETURN(registers)
          ...
          CALL(function, registers)
          ...
          RETURN(registers)
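The same mechanism can be imitated with an explicit stack in an ordinary program.
The following sketch (our own, not the book's routines) evaluates n! without
recursive calls: the pending arguments are pushed as CALL would push registers,
and popped off again as RETURN restores them.

// stackrec.cpp
// Sketch: replacing recursion by an explicit stack, here for n!.

#include <iostream>

using namespace std;

int main(void)
{
   unsigned long stack[32];
   int top = 0;                           // stack pointer
   unsigned long n = 10, result = 1;
   while(n > 1) stack[top++] = n--;       // CALL: push the argument
   while(top > 0) result *= stack[--top]; // RETURN: pop and combine
   cout << "10! = " << result << endl;    // prints 3628800
   return 0;
}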

8.7.2 Stack Free Recursion


It is sometimes possible to convert a recursive function to an iterative one. The
stack is very important in implementing recursion. It allows the recursive function
to return to a previous state which it has stored. This is also possible if the function
to be computed has an inverse. The inverse allows an iterative routine to return to
previous values of the function evaluation.

Example. The Fibonacci sequence defined recursively by

F_{n+2} = F_{n+1} + F_n,   n = 0, 1, 2, ...

where F_0 = F_1 = 1 can be implemented iteratively by simply storing the previous
two function evaluations. In this case the inverse is given by

F_n = F_{n+2} - F_{n+1},   n = 0, 1, 2, ...

but it is not necessary to use this information.

// fibonacci.cpp

#include <iostream>

using namespace std;

void main(void)
{
 int i;
 unsigned long F0 = 1, F1 = 1;
 unsigned long temp;

 for(i=0;i<10;i++)
 {
  cout << F0 << " ";
  temp = F1;
  F1 = F0 + F1;
  F0 = temp;
 }
 cout << endl;
}

To remove the dependence on a stack, the changes made to variables by a recursive


call must be reversible. This excludes some variables such as those used exclusively
for return values. If we consider the towers of Hanoi problem, the algorithm just
swaps variables and decrements a variable. This can obviously be reversed. Swap-
ping two variables is reversed by the same action and the decrement is reversed by
an increment. The changes must be made immediately before the recursive call and
reversed immediately after the recursive call.
// hanoi2.cpp

#include <iostream>

using namespace std;

char A,B,C;
unsigned long n;

void hanoi()
{
 if(n==1) cout << A << " -> " << B << endl;
 else
 {
  n--; C = B^C; B = B^C; C = B^C; // swap B and C
  hanoi();
  C = B^C; B = B^C; C = B^C;      // swap B and C back
  cout << A << " -> " << B << endl;
  C = A^C; A = A^C; C = A^C;      // swap A and C
  hanoi();
  n++; C = A^C; A = A^C; C = A^C; // swap A and C back
 }
}

void main(void)
{
 A = 'A'; B = 'B'; C = 'C'; n = 1;
 cout << "Tower of Hanoi with 1 disc:" << endl;
 hanoi();
 A = 'A'; B = 'B'; C = 'C'; n = 2;
 cout << "Tower of Hanoi with 2 discs:" << endl;
 hanoi();
 A = 'A'; B = 'B'; C = 'C'; n = 3;
 cout << "Tower of Hanoi with 3 discs:" << endl;
 hanoi();
}
Chapter 9
Abstract Data Types

9.1 Introduction

Programming languages such as C++ and Java have built-in data types (so-called
basic data types or primitive data types) such as integers that represent information
and have operations that can be performed on them (such as multiplication and
addition). For example the built-in basic data types in C++ are short, int, long,
float, double and char.

An abstract data type (ADT) consists of data and the operations which can be per-
formed on it. Generally the data is represented with standard data types of the
language in which it is implemented but can also include other abstract data types.
The operations defined on the ADT provide access to the information and manipu-
lation of the data without knowing the implementation of the ADT. The abstract
data type is implemented using constructors, data fields and methods (functions).
Information hiding is when ADT data is inaccessible (no operation can retrieve the
data). Encapsulation refers to the hiding of inner details (such as implementation).
In C++ the concepts of public, private and protected data fields and methods
are important in the implementation of an ADT. Public members of an ADT are
always accessible. Private members are only accessible by the ADT itself and pro-
tected members are only accessible by the ADT and any derived ADTs. A derived
ADT may override the accessibility of members by forcing all inherited members to
a specified level if they are more accessible.
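As a small illustration (ours, not from the text) the three access levels might appear
in a counter ADT as follows.

// access.cpp
// Sketch: public, protected and private members of a small ADT.

#include <iostream>

using namespace std;

class Counter
{
public:
   Counter() { count = 0; }
   void increment() { count++; }   // the public interface of the ADT
   int value() const { return count; }
protected:
   void reset() { count = 0; }     // usable by derived ADTs only
private:
   int count;                      // hidden representation
};

class ResettableCounter : public Counter
{
public:
   void clear() { reset(); }       // a derived ADT may use protected members
};

int main(void)
{
   ResettableCounter c;
   c.increment(); c.increment();
   cout << c.value() << endl;      // 2
   c.clear();
   cout << c.value() << endl;      // 0
   return 0;
}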

For example the Standard Template Library [2] in C++ introduces many ADTs
such as vector, list, stack, queue and set. Standard C++ now includes the ab-
stract data type string. SymbolicC++ [169] includes the template classes Rational,
Complex, Quaternion, Vector, Matrix, Polynomial and Sum. Operations such as
addition and multiplication, determinant, trace and inverse of matrices are included
in the matrix class. An instance of Matrix could then be used in the same way
that integers are used without knowing the internal differences.

In the following sections various useful ADTs will be introduced.


9.2 Linked List


The linked list is a useful data structure that can dynamically grow according to
data storage requirements. This is done by viewing data as consisting of a unit of
data and a link to more units of data.

A linked list is useful in the implementations of dynamic arrays, stacks, strings and
sets. The linked list is the basic ADT in some languages, for example LISP. LISP
stands for List Processing. All the program instructions in LISP operate on lists.

Linked lists are most useful in environments with dynamic memory allocation. With
dynamic memory allocation dynamic arrays can grow and shrink with less cost than
in a static memory allocation environment. Linked lists are also useful to manage
dynamic memory environments. Diagrammatically a linear linked list can be viewed
as follows.

Figure 9.1: Diagrammatic Representation of a Linked List

The list consists of data elements. Each data element has an associated link to the
next item in the list. The last item in the list has no link. In C++ we can implement
this by using a null pointer. The first element of the list is called the head, the last
element is called the tail.

Extensions to the ADT include double-linked lists where links exist for the next data
and the previous data allowing easier access to data earlier in the list, and sorted
linked lists. We use a template class definition so that the linked list can store any
kind of data without having to change or reimplement any functionality.

The following class is a C++ implementation of the ADT list. It has methods
for creating and destroying a list, copying one list to another (assignment oper-
ator), adding items to and removing items from the list (additem, insertitem,
removeitem), merging lists (operators for addition), iteration (first, next, last,
position, data) and indexing elements (operator[]).

// list.h

#ifndef LIST_HEADER
#define LIST_HEADER

using namespace std;

template <class T>


struct listitem { T data; listitem *next; };

template <class T>


class list
{
protected:
listitem<T> *head;
listitem<T> *current;
int size;
public:
list();
list(const list&);
~list();
list &operator=(const list&);
void additem(T);
void insertitem(T);
int insertitem(T,int);
void removeitem(void);
int removeitem(int);
list operator+(const list&) const;
list &operator+=(const list&);
T operator[] (int);
T data(void);
int next(void);
int first(void);
int last(void);
int position(int);
int getsize(void);
};

template <class T>


list<T>::list() { head=current=NULL; size=0; }

template <class T>


list<T>::list(const list &l)
{
listitem<T> *li=l.head;
head=current=NULL;
size=l.size;
if(li!=NULL)

{
head=new listitem<T>;
head->data=li->data;
head->next=NULL;
current=head;
li=li ->next;
}
while (li !=NULL)
{
current->next=new listitem<T>;
current=current->next;
current->data=li->data;
current->next=NULL;
li=li->next;
}
current=head;
}

template <class T>


list<T>::~list()
{
while (head !=NULL)
{
current=head;
head=head->next;
delete current;
}
}

template <class T>


list<T> &list<T>::operator=(const list &l)
{
 listitem<T> *li=l.head;
if(this==&l) return *this;

size=l.size;
while(head!=NULL)
{
current=head;
head=head->next;
delete current;
}

head=current=NULL;
if(li!=NULL)
{
head=new listitem<T>;

head->data=li->data;
head->next=NULL;
current=head;
li=li->next;
}
while(li!=NULL)
{
current->next=new listitem<T>;
current=current->next;
current->data=li->data;
current->next=NULL;
li=li ->next;
}
current=head;
return *this;
}

template <class T>


void list<T>::additem(T t)
{
listitem<T> *li=head;
if (head==NULL)
{
head=new listitem<T>;
head->data=t;
head->next=NULL;
current=head;
}
else
{
while(li->next!=NULL) li=li->next;
li->next=new listitem<T>;
li=li ->next;
li->data=t;
li->next=NULL;
}
size++;
}

template <class T>


void list<T>::insertitem(T t)
{
 listitem<T> *li;
if (head==NULL)
{
head=new listitem<T>;
head->data=t;
head->next=NULL;

current=head;
}
else
{
li=current->next;
current->next=new listitem<T>;
current->next->data=t;
current->next->next=li;
}
size++;
}

template <class T>


int list<T>::insertitem(T t,int i)
{
int j=O;
listitem<T> *li1, *li2;
li1=head;
while((j<i)&&(li1->next!=NULL)) { li1=li1->next; j++; }
if (j==i)
{
li2=li1->next;
li1->next=new listitem<T>;
li1=li1->next;
li1->data=t;
li1->next=li2;
size++;
return 1;
}
return 0;
}

template <class T>


void list<T>::removeitem(void)
{
listitem<T> *li=head;
if(head==NULL) return;
if (head==current)
{
delete head;
head=current=NULL;
size--;
}
if(current!=NULL)
{
while(li->next!=current) li=li->next;
li->next=current->next;
delete current;

current=li;
size--;
}
}

template <class T>


int list<T>: :removeitem(int i)
{
int j=O;
listitem<T> *li1, *li2;
li1=head;
if(head==NULL) return 0;
while((j<i-1)&&(li1->next!=NULL)) { li1=li1->next; j++; }
if (j==i-1)
{
li2=li1->next;
li1->next=li1->next->next;
if(li2!=NULL) delete li2;
size--;
return 1;
}
return 0;
}

template <class T>


list<T> list<T>::operator+(const list &l) const
{
 list<T> l2(*this);
 l2+=l;
 return l2;
}

template <class T>


list<T> &list<T>::operator+=(const list &l)
{
listitem<T> *li=l.head;
while(li!=NULL) { additem(li->data); li=li->next; }
return *this;
}

template <class T>


T list<T>::operator[] (int i)
{
int j=O;
listitem<T> *li=head;
if(li==NULL) return T();
while((j<i)&&(li->next!=NULL)) { li=li->next; j++; }
if(j==i) return li->data;

return T();
}

template <class T>


T list<T>::data(void)
{
if(current==NULL) return T();
return current->data;
}

template <class T>


int list<T>::next(void)
{
if(current->next==NULL) return 0;
current=current->next;
return 1;
}

template <class T>


int list<T>::first(void)
{
current=head;
if(head==NULL) return 0;
return 1;
}

template <class T>


int list<T>::last(void)
{
current=head;
if(head==NULL) return 0;
while(current->next!=NULL) current=current->next;
return 1;
}

template <class T>


int list<T>::position(int i)
{
int res;
res=first();
while(res&&(i>0)) { next(); i--; }
return res;
}

template <class T>


int list<T>::getsize(void) { return size; }

#endif

Now the ADT is used in an example program to illustrate the available operations.

// listeg.cpp

#include <iostream>
#include "list.h"

using namespace std;

void main(void)
{
 list<int> l1;
 l1.additem(1);
 l1.additem(2);
 l1.additem(3);
 l1.additem(5);
 l1.additem(8);
 list<int> l2(l1),l3;
 l3=l2;
 l1.next();
 l1.insertitem(13); l1.insertitem(21,5);
 l1.first();
 cout << "l1: ";
 do cout << l1.data() << " ";
 while(l1.next());
 cout << endl;
 cout << "l2: ";
 do cout << l2.data() << " ";
 while(l2.next());
 cout << endl;
 cout << "l3: ";
 do cout << l3.data() << " ";
 while(l3.next());
 cout << endl;
 list<int> l4=l1+l2;
 cout << "l4: ";
 do cout << l4.data() << " ";
 while(l4.next());
 cout << endl;
 l4+=l3;
 l4.first();
 cout << "l4: ";
 do cout << l4.data() << " ";
 while(l4.next());
 cout << endl;
 l4.first();
 cout << "The first item of l4 is " << l4.data() << endl;
 l4.last();
 cout << "The last item of l4 is " << l4.data() << endl;
 cout << "The fourth item of l4 is " << l4[3] << endl;
 l4.position(3);
 l4.removeitem();
 l4.removeitem(7);
 l4.first();
 cout << "l4: ";
 do
  cout << l4.data() << " ";
 while(l4.next());
 cout << endl;
 cout << "Size of l1 is " << l1.getsize() << endl;
 cout << "Size of l2 is " << l2.getsize() << endl;
 cout << "Size of l3 is " << l3.getsize() << endl;
 cout << "Size of l4 is " << l4.getsize() << endl;
}

The program output is:

l1: 1 2 13 3 5 8 21
l2: 1 2 3 5 8
l3: 1 2 3 5 8
l4: 1 2 13 3 5 8 21 1 2 3 5 8
l4: 1 2 13 3 5 8 21 1 2 3 5 8 1 2 3 5 8
The first item of l4 is 1
The last item of l4 is 8
The fourth item of l4 is 3
l4: 1 2 13 5 8 21 1 3 5 8 1 2 3 5 8
Size of l1 is 7
Size of l2 is 5
Size of l3 is 5
Size of l4 is 15

The linked list can also be viewed as a recursive structure with the first element
followed by a linked list. This view can make the implementation of many methods
easier.

Suppose an item is to be inserted into the sorted list. The item either comes before
the head of the list (which can be easily implemented) or after the head in which
case the item actually has to be inserted in a list with the second element of the list
as the head. Similarly to delete an item either the head must be removed or the
item must be deleted from a list with the second element of the list as the head.

The benefits of the recursive structure are demonstrated with a function Reverse
which reverses the list. This is as simple as removing the head of the list, reversing
the rest of the list and then adding the head to the end of the list. The assignment
and equality operators also demonstrate the benefits of the structure with simple
implementations. Assignment first deletes the current list (if the list is not being
assigned to itself). The head is copied first and then the rest of the list can be
copied. Equality is determined by testing the equality of the heads of the two lists. On
success the equality of the rest of the two lists is tested. A linear search also has
a simple implementation. First the head of the list is examined to see if it contains
the data. If this is not the case, the rest of the list is searched.

In each of the above cases the rest of the list is a list itself and the method can be
applied recursively. Usually the simplest case for each recursive method is for the
empty list.

Special care must be taken when destroying the list. Simply deleting the head of
the list will create a memory leak. The remaining list must be destroyed before the
head can be destroyed. In the implementation the head data is part of the class
data and so the memory leak is avoided.

The data members of the class are few since most of the data management is done
by the recursive structure. The data member head stores the data for the node in
the linked list. Since a linked list can be empty, a node with the data member empty
set to one represents an empty list. A pointer tail provides access to the rest of
the list.

// rlist.h

#ifndef RLIST_HEADER
#define RLIST_HEADER

#include <assert.h>

using namespace std;

template <class T>
class RList
{
   public:
      RList();
      RList(const RList&);
      ~RList();
      RList &operator = (const RList&);
      int operator == (const RList&);
      void Insert(const T&);
      int Search(const T&);
      int Delete(const T&);
      T Head(void);
      RList *Tail(void);
      int Empty(void);
      RList *Reverse(RList*);
   private:
      T head;
      RList *tail;
      int empty;
};

template <class T>
RList<T>::RList() { empty = 1; }

template <class T>
RList<T>::RList(const RList<T> &RL)
{
   empty = RL.empty;
   if(!empty)   // an empty list has no head or tail to copy
   {
      head = RL.head;
      tail = new RList<T>(*RL.tail);
   }
}

template <class T>
RList<T>::~RList() { if(!empty) delete tail; }

template <class T>
RList<T> &RList<T>::operator=(const RList<T> &RL)
{
   if(this == &RL) return *this;
   if(!empty) delete tail;
   empty = RL.empty;
   if(!empty)
   {
      head = RL.head;
      tail = new RList<T>(*RL.tail);
   }
   return *this;
}

template <class T>
int RList<T>::operator==(const RList<T> &RL)
{
   if(this == &RL) return 1;
   if(empty&&RL.empty) return 1;
   if(empty||RL.empty) return 0;   // one list empty, the other not
   if(head != RL.head) return 0;
   return (*tail == *RL.tail);
}

template <class T>
void RList<T>::Insert(const T &toInsert)
{
   if(empty)
   {
      head = toInsert;
      tail = new RList<T>;
      empty = 0;
   }
   else tail->Insert(toInsert);
}

template <class T>
int RList<T>::Search(const T &toSearch)
{
   if(empty) return 0;
   else if(head == toSearch) return 1;
   else return tail->Search(toSearch);
}

template <class T>
int RList<T>::Delete(const T &toDelete)
{
   if(empty) return 0;
   else if(head == toDelete)
   {
      if(tail->empty)      // deleting the only element leaves an empty list
      {
         delete tail;
         empty = 1;
         return 1;
      }
      head = tail->head;   // pull the next element into this node
      RList<T> *t = tail;
      tail = t->tail;      // splice out the old tail node
      t->empty = 1;        // so that deleting it does not free the rest of the list
      delete t;
      return 1;
   }
   else return tail->Delete(toDelete);
}

template <class T>
T RList<T>::Head(void)
{
   assert(!empty);
   return head;
}

template <class T>
RList<T> *RList<T>::Tail(void)
{
   assert(!empty);
   return tail;
}

template <class T>
int RList<T>::Empty(void) { return empty; }

template <class T>
RList<T> *RList<T>::Reverse(RList<T> *RL)
{
   if(RL->Empty())
   {
      RList<T> *temp;
      temp = new RList<T>;
      return temp;
   }
   else
   {
      RList<T> *R;
      R = Reverse(RL->Tail());
      (*R).Insert(RL->Head());
      return R;
   }
}

#endif

Now the ADT is used in an example program to illustrate the available operations.

// rlisteg.cpp

#include <iostream>
#include "rlist.h"

using namespace std;

int main(void)
{
   RList<int> L;
   int i;
   for(i=1; i<=8; i++)
      L.Insert(i);

   RList<int> *LX = &L;
   cout << "The initial list is: " << endl;
   while(!LX->Empty())
   {
      cout << LX->Head() << ' ';
      LX = LX->Tail();
   }
   cout << endl << endl;

   RList<int> *R = L.Reverse(&L);
   RList<int> *LP = R;
   while(!LP->Empty())
   {
      cout << LP->Head() << ' ';
      LP = LP->Tail();
   }
   cout << endl << endl;

   cout << "what happened to the initial list: " << endl;
   LP = &L;
   while(!LP->Empty())
   {
      cout << LP->Head() << ' ';
      LP = LP->Tail();
   }
   cout << endl;

   cout << "remove some items: " << endl;
   L.Delete(1);
   L.Delete(4);
   L.Delete(8);
   LP = &L;
   while(!LP->Empty())
   {
      cout << LP->Head() << ' ';
      LP = LP->Tail();
   }
   cout << endl;

   cout << "is 3 in the list: " << L.Search(3) << endl;
   cout << "is 4 in the list: " << L.Search(4) << endl;

   return 0;
}

The program output is:

The initial list is:
1 2 3 4 5 6 7 8

8 7 6 5 4 3 2 1

what happened to the initial list:
1 2 3 4 5 6 7 8
remove some items:
2 3 5 6 7
is 3 in the list: 1
is 4 in the list: 0

9.3 Stack
The stack is a LIFO (last in, first out) structure. The last value stored (and not yet
retrieved) is the only value that can be retrieved. The traditional analogy is a stack
of plates where only the top plate can be removed, and a plate can only be placed
on top of the stack. Due to the dynamic nature of a stack the implementation is
based on the linked list.

Since we have already created a list ADT which can grow or shrink in size as needed
we can reduce the amount of work needed to implement the stack ADT. The list
enables access to any element in the structure; the stack can be viewed as a restricted
list with access to only the tail. The operation of putting data on the stack is referred
to as "pushing" data onto the stack, and the operation of retrieving data from the
stack is referred to as "popping" data off the stack. The stack is an important
structure for implementing recursion.

Diagrammatically a stack can be viewed as follows.


Figure 9.2: Diagrammatic Representation of a Stack

We implement the stack as a class in C++. The class has methods for creating
a stack using an empty list (the constructor), copying one stack to another (the
assignment operator), pushing data onto the stack (push which simply adds the data
to the end of the list) and popping data off the stack (pop which simply removes the
last element of the list). No destructor is needed since the list destructor is called
automatically.

// stack.h

#ifndef STACK_HEADER
#define STACK_HEADER

#include "list.h"

using namespace std;

template <class T>
class stack
{
   protected:
      list<T> stacklist;
   public:
      stack();
      stack(const stack&);
      stack &operator=(const stack&);
      void push(T);
      T pop(void);
      int getsize(void);
};

template <class T>
stack<T>::stack() : stacklist() {}

template <class T>
stack<T>::stack(const stack &s) : stacklist(s.stacklist) {}

template <class T>
stack<T> &stack<T>::operator=(const stack &s)
{ stacklist=s.stacklist; return *this; }

template <class T>
void stack<T>::push(T t) { stacklist.additem(t); }

template <class T>
T stack<T>::pop(void)
{
   T data;
   stacklist.last();
   data=stacklist.data();
   stacklist.removeitem();
   return data;
}

template <class T>
int stack<T>::getsize() { return stacklist.getsize(); }

#endif

Now the ADT is used in an example program to illustrate the available operations.

// stackeg.cpp

#include <iostream>
#include "stack.h"

using namespace std;

int main(void)
{
   int i;
   stack<int> s1;
   s1.push(1);
   s1.push(2);
   s1.push(3);
   s1.push(5);
   s1.push(7);
   s1.push(11);
   stack<int> s2(s1);
   stack<int> s3;
   s3=s1;
   stack<int> s4;
   cout << "Size of s1 is " << s1.getsize() << endl;
   cout << "s1: ";
   while(s1.getsize()>0) { cout << (i=s1.pop()) << " "; s4.push(i); }
   cout << endl << "s2: ";
   while(s2.getsize()>0) cout << s2.pop() << " ";
   cout << endl << "s3: ";
   while(s3.getsize()>0) cout << s3.pop() << " ";
   cout << endl << "s4: ";
   while(s4.getsize()>0) cout << s4.pop() << " ";
   cout << endl;
   return 0;
}

The program output is:

Size of s1 is 6
s1: 11 7 5 3 2 1
s2: 11 7 5 3 2 1
s3: 11 7 5 3 2 1
s4: 1 2 3 5 7 11

9.4 Tree

A tree is a branching structure. It has a starting node called a root node. An n-ary
tree can have up to n branches from each node to other nodes. A binary tree is a
2-ary tree. Every node in a tree is the root of a subtree. A tree is noncyclic, in other
words there is only one path between any two nodes in a tree.

A binary tree is useful for classification by proposition. If P(x, y) is a proposition
regarding x and y then, when a node represents x, all items y for which P(x, y) is false
should be accessible only via the left branch from the node, and all items y for which
P(x, y) is true should be accessible only via the right branch from the node. This
can be used to sort elements. Binary trees can also be searched more quickly than
linear structures such as a linked list.

In general an n-ary tree has a search time $O(\log_n s)$ where s is the number of elements
in the tree. For a linear structure such as the linked list the search time is $O(s)$.
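As a short illustration (a minimal sketch, not the tree class developed below), take
the proposition P(x, y) := (y >= x). Inserting by this proposition yields a binary
search tree, and an in-order traversal visits the items in sorted order:

// bst.cpp

#include <iostream>

using namespace std;

struct node { int data; node *left, *right; };

node *insert(node *n,int y)
{
   if(n == NULL) { n = new node; n->data = y; n->left = n->right = NULL; }
   else if(y < n->data) n->left  = insert(n->left,y);   // P(x,y) false
   else                 n->right = insert(n->right,y);  // P(x,y) true
   return n;
}

void inorder(node *n)   // visits the items in sorted order
{
   if(n == NULL) return;
   inorder(n->left); cout << n->data << " "; inorder(n->right);
}

int main(void)
{
   node *root = NULL;
   int a[5] = {4,1,2,7,5};
   for(int i=0;i<5;i++) root = insert(root,a[i]);
   inorder(root);   // prints 1 2 4 5 7
   cout << endl;
   return 0;
}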
Diagrammatically a binary tree can be viewed as follows.

Figure 9.3: Diagrammatic Representation of a Binary Tree

We implement a binary tree as a class in C++. The class has methods for creating
a new binary tree, destroying a binary tree, copying one binary tree to another
(assignment operator), adding an item and removing an item from the tree (additem
and removeitem), determining if an item is present in the tree (find) and iterating
through the tree (first, last, next and previous).

// tree.h

#ifndef TREE_HEADER
#define TREE_HEADER

#include "list.h"
#include "stack.h"

using namespace std;

template <class T>
struct treenode
{
   T data;
   list<treenode<T>*> leftchildren;
   list<treenode<T>*> rightchildren;
};

template <class T>
class tree
{
   protected:
      treenode<T> *root;
      treenode<T> *current;
      stack<treenode<T>*> traverse;
      void additem(T,treenode<T>*&);
      void copy_subtree(treenode<T>*&,const treenode<T>*&);
      void delete_subtree(treenode<T>*);
      treenode<T> *find(treenode<T>*,T);
      unsigned int limit;
   public:
      tree(unsigned int);
      tree(const tree&);
      ~tree();
      tree &operator=(const tree&);
      void additem(T);
      int removeitem(void);
      int find(T);
      void first(void);
      void last(void);
      void next(void);
      void previous(void);
      T &data(void);
};

template <class T>
void tree<T>::additem(T t,treenode<T> *&tn)
{
   int i=0;
   list<treenode<T>*> *l;
   if(tn==NULL)
   {
      tn=new treenode<T>;
      tn->data=t;
      return;
   }
   if(t<tn->data) l=&(tn->leftchildren);
   else l=&(tn->rightchildren);

   if(l->getsize()==0)
   {
      l->additem(new treenode<T>);
      (*l)[0]->data=t;
      return;
   }
   else
   {
      while((i<l->getsize())&&((*l)[i]->data<t)) i++;
      if((l->getsize()<limit-1)&&
         (tn->leftchildren.getsize()+tn->rightchildren.getsize()<limit))
      {
         l->insertitem(new treenode<T>,i);
         (*l)[i]->data=t;
      }
      else additem(t,(*l)[i]);
   }
}

template <class T>
void tree<T>::copy_subtree(treenode<T> *&t1,const treenode<T> *&t2)
{
   int i;
   if(t2==NULL) { t1=NULL; return; }
   t1=new treenode<T>;
   t1->data=t2->data;
   for(i=0;i<t2->leftchildren.getsize();i++)
   {
      t1->leftchildren.additem(NULL);
      copy_subtree(t1->leftchildren[i],t2->leftchildren[i]);
   }
   for(i=0;i<t2->rightchildren.getsize();i++)
   {
      t1->rightchildren.additem(NULL);
      copy_subtree(t1->rightchildren[i],t2->rightchildren[i]);
   }
}

template <class T>
void tree<T>::delete_subtree(treenode<T> *t1)
{
   int i;
   if(t1==NULL) return;
   for(i=0;i<t1->leftchildren.getsize();i++)
      delete_subtree(t1->leftchildren[i]);
   for(i=0;i<t1->rightchildren.getsize();i++)
      delete_subtree(t1->rightchildren[i]);
   delete t1;
}

template <class T>
treenode<T> *tree<T>::find(treenode<T> *tn,T t)
{
   int i;
   list<treenode<T>*> *l;
   treenode<T> *result=NULL;
   if(tn==NULL) return NULL;
   if(tn->data==t) return tn;
   if(t<tn->data) l=&(tn->leftchildren);
   else l=&(tn->rightchildren);
   for(i=0;(i<l->getsize())&&(result==NULL);i++) result=find((*l)[i],t);
   return result;
}

template <class T>
tree<T>::tree(unsigned int n) { root=current=NULL; limit=n; }

template <class T>
tree<T>::tree(const tree &t)
{
   root=current=NULL;
   limit=t.limit;
   copy_subtree(root,t.root);
   current=root;
}

template <class T>
tree<T>::~tree()
{
   delete_subtree(root);
}

template <class T>
tree<T> &tree<T>::operator=(const tree<T> &t)
{
   if(&t==this) return *this;
   delete_subtree(root);
   copy_subtree(root,t.root);
   current=root;
   return *this;
}

template <class T>
void tree<T>::additem(T t)
{
   additem(t,root);
}

template <class T>
int tree<T>::removeitem(void);   // declared, not implemented here

template <class T>
int tree<T>::find(T t)
{
   treenode<T> *result;
   result=find(root,t);
   if(result!=NULL)
   {
      current=result;
      return 1;
   }
   return 0;
}

template <class T>
void tree<T>::first(void)
{
   current=root;
   if(current==NULL) return;
   while(current->leftchildren.getsize()>0)
   {
      current->leftchildren.first();
      current=current->leftchildren.data();
   }
}

template <class T>
void tree<T>::last(void)
{
   current=root;
   if(current==NULL) return;
   while(current->rightchildren.getsize()>0)
   {
      current->rightchildren.last();
      current=current->rightchildren.data();
   }
}

template <class T>
void tree<T>::next(void);       // declared, not implemented here

template <class T>
void tree<T>::previous(void);   // declared, not implemented here

template <class T>
T &tree<T>::data(void)
{
   return current->data;
}

#endif

Now the ADT is used in an example program to illustrate the available operations.

// treeeg.cpp

#include <iostream>
#include "tree.h"

using namespace std;

int main(void)
{
   int i;
   Tree<int> t;

   t.insert(4); t.insert(1); t.insert(2); t.insert(7); t.insert(5);
   Tree<int> t2(t);
   Tree<int> t3;

   t3=t;
   if(t2==t) cout << "t2==t" << endl;
   if(t3==t) cout << "t3==t" << endl;
   for(i=0;i<t.size();i++)
      cout << "t[" << i << "] = " << t[i] << endl;
   cout << endl;
   return 0;
}

The program output is:

t2==t
t3==t
t[0] = 1
t[1] = 2
t[2] = 4
t[3] = 5
t[4] = 7
Chapter 10
Error Detection and Correction

10.1 Introduction
Due to external influences and the imperfection of physical devices, errors can occur
in data representation and data transmission. This chapter examines some methods
of limiting the effect of errors in data representation and transmission. Error control
coding should protect digital data against errors which occur during transmission
over a noisy communication channel or during storage in an unreliable memory.
The last decade has been characterized not only by an exceptional increase in data
transmission and storage requirements, but also by rapid developments in micro-
electronics providing us with both a need for, and the possibility to, implement
sophisticated algorithms for error control.

The data representation examined here is strings of bits (binary strings, binary
sequences)

$$a_{n-1}a_{n-2}\ldots a_0$$

where $a_i \in \{0,1\}$, $i = 0,1,\ldots,n-1$ and

$$E^n = \{0,1\} \times \{0,1\} \times \cdots \times \{0,1\} \quad (n \text{ times})$$

as defined before. Therefore an error is a bit flip, i.e. we have $\bar{a}_i$ instead of $a_i$ for some $i$.

We discuss single bit error detection in the form of parity checks, Hamming codes
for single bit error correction and finally the noiseless coding theorem which de-
scribes the limitations of coding systems and the requirements on codes to reduce the
probability of error. Another commonly used error detection scheme, the weighted
checksum, is also discussed.


10.2 Parity Function


In data transmission it is important to identify errors in the transmission. If the
probability of error is low enough, for example if the probability of a bit error is
so small that bit strings of length n are unlikely to have more than one error, then
detecting a single error is sufficient. If this error can be detected the data can be
transmitted again until it is transmitted without error. The parity function can be
used for this purpose. The parity function can be used to detect an odd number of
errors in a bit string.

The result of the parity function is a single bit stored in an extra bit $a_n$; the bit is
stored or transmitted with the data. If an odd number of errors occur the result of
the parity function over $a_{n-1}a_{n-2}\ldots a_0$ will not concur with $a_n$. The parity of the
bit string must be calculated when the data is sent or stored, and when the data is
received or retrieved. The bit reserved for the parity information can take the values
0 or 1. To ensure the meaning of the bit is consistent we introduce the following
definitions.

Definition. The even-parity function of a bit string is given by

$$P_{even}(a_{n-1}a_{n-2}\ldots a_0) := a_{n-1} \oplus a_{n-2} \oplus \cdots \oplus a_0.$$

Definition. The odd-parity function of a bit string is given by

$$P_{odd}(a_{n-1}a_{n-2}\ldots a_0) := 1 \oplus a_{n-1} \oplus a_{n-2} \oplus \cdots \oplus a_0.$$

The odd-parity function sets $a_n$ such that $a_n a_{n-1}\ldots a_0$ has an odd number of 1s.
The even-parity function sets $a_n$ such that $a_n a_{n-1}\ldots a_0$ has an even number of 1s.
$a_n$ is called the parity bit. Either parity function can be used, but consistency must
be ensured so that results are meaningful.

Example. Consider the bit string 1101. $P_{odd}(1101) = 0$. The stored string is then
01101. Suppose an error occurs giving 01001; then $P_{odd}(1001) = 1$ and an error is
detected. Suppose an error occurs in the parity bit giving 11101. $P_{odd}(1101) = 0$
and once again an error is detected. If two errors occur, for example 11001, then
$P_{odd}(1001) = 1$, which concurs with the stored parity bit, and the errors are not detected.
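A minimal sketch of computing the parity bit in C++ follows (the function names
even_parity and odd_parity are illustrative; this program is not one of the book's
listings). The even-parity bit is the XOR of all bits of the string:

// parity.cpp

#include <iostream>

using namespace std;

int even_parity(unsigned long a)
{
   int p = 0;
   while(a) { p ^= (a & 1); a >>= 1; }   // XOR of all bits
   return p;
}

int odd_parity(unsigned long a) { return 1 ^ even_parity(a); }

int main(void)
{
   unsigned long m = 13;   // 1101b
   cout << "P_even(1101) = " << even_parity(m) << endl;   // 1
   cout << "P_odd(1101)  = " << odd_parity(m) << endl;    // 0
   return 0;
}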

10.3 Hamming Codes


The Hamming code [3, 67] is a well-known type of error correction algorithm used
for detecting and correcting memory errors. The algorithm was developed by R.W.
Hamming and is able to detect single-bit errors and correct them. The algorithm is
also able to detect double-bit errors and nibble-bit errors but is not able to correct
them. First we have to introduce the Hamming distance.

Definition. The Hamming distance $d_H$ of two bit strings $a_{n-1}a_{n-2}\ldots a_0$ and
$b_{n-1}b_{n-2}\ldots b_0$ of the same length n is the number of positions that differ, formally

$$d_H(a_{n-1}a_{n-2}\ldots a_0,\; b_{n-1}b_{n-2}\ldots b_0) := \sum_{i=0}^{n-1}(a_i-b_i)^2.$$

We can easily see that $d_H$ is a metric on $E^n$. For all $a, b, c \in E^n$ we have

• $d_H(a,b) \ge 0$

• $d_H(a,b) = 0$ iff $a = b$

• $d_H(a,b) = d_H(b,a)$

• $d_H(a,c) \le d_H(a,b) + d_H(b,c)$

The first three properties are easy to see. The last property follows from the fact
that

$$(a-c)^2 = (a-b+b-c)^2 = (a-b)^2 + 2(a-b)(b-c) + (b-c)^2 \le (a-b)^2 + (b-c)^2$$

for $a, b, c \in \{0,1\}$.

Example. Let A = 10111010 and B = 01110101. The Hamming distance is 6.

The following C++ program calculates the Hamming distance.

// hdist.cpp

#include <iostream>

using namespace std;

int main(void)
{
   unsigned long x = 186; // 10111010b
   unsigned long y = 117; // 01110101b
   int dH = 0;

   for(int i=8*sizeof(unsigned long)-1; i >= 0; i--)
   {
      // Add 1 to the Hamming distance if the bit in position
      // i differs for x and y. The AND (&) operator isolates
      // the bit and the XOR (^) operator performs the comparison.
      dH += ((((1UL << i) & x) ^ ((1UL << i) & y)) > 0) ? 1 : 0;
   }

   cout << "dH(" << x << "," << y << ") = " << dH << endl;
   return 0;
}

The Hamming distance can be used as a tool for error correction. For a set $C \subset E^n$
of allowable bit strings for data representation, we define the minimum distance

$$\delta(C) := \min_{a,b \in C,\; a \neq b} d_H(a,b).$$

It is then possible to detect up to $\delta(C)-1$ errors in a bit string from C. The minimum
distance principle for error correction is to select $c \in C$ for a bit string $x \in E^n$ such
that $d_H(c,x)$ is a minimum.

Theorem. If the minimum distance principle for error correction is used and

$$\delta(C) \ge 2e+1$$

then up to e errors in a bit string from C can be corrected.

Proof. Let $a_e$ be the bit string $a \in C$ with up to e errors. Let $b \in C$ and $b \neq a$; then

$$d_H(a,a_e) + d_H(a_e,b) \ge d_H(a,b) \ge \delta(C) \ge 2e+1$$

so that $d_H(a_e,b) \ge e+1 > e \ge d_H(a_e,a)$. Thus a is the code word closest to $a_e$
and the minimum distance principle recovers a.

For $\delta(C) = 3$ only one error can be corrected. C is called a code and the elements
of C are called code words.

Theorem. An upper bound of the number s of code words of length n which can
correct up to e errors, if the minimum distance principle is used, is given by

$$s \le \frac{2^n}{\sum_{i=0}^{e}\binom{n}{i}}.$$

Proof. Since the code words can correct up to e errors, the sets of binary sequences
within distance e of two different code words are disjoint. We consider the number
of binary sequences of length n which would be corrected to a specific code word c.
This is simply the number of binary sequences of length n derived from c with up
to e errors,

$$\sum_{i=0}^{e}\binom{n}{i}.$$

There are s code words, and a maximum of $2^n$ possible binary sequences of length n,
which gives

$$s\sum_{i=0}^{e}\binom{n}{i} \le 2^n.$$

Thus the bound for s follows.

For e = 1 we find

$$s \le \frac{2^n}{\sum_{i=0}^{1}\binom{n}{i}} = \frac{2^n}{1+n}.$$

Suppose n+1 is a power of 2, i.e. $n+1 = 2^m$. The above bound reduces to

$$s \le 2^{n-m}.$$
A Hamming code is the best code that can detect and correct one error in the sense
that it contains the most code words. Let $H_r$ be an $r \times (2^r-1)$ matrix with entries
$h_{i,j} \in \{0,1\}$, no two columns the same and no zero columns.

Example. For r = 2 and r = 3 we have

$$H_2 = \begin{pmatrix} 0&1&1\\ 1&0&1 \end{pmatrix}, \qquad
H_3 = \begin{pmatrix} 0&0&0&1&1&1&1\\ 0&1&1&0&0&1&1\\ 1&0&1&0&1&0&1 \end{pmatrix}.$$

The Hamming code is now given by

$$C_{H_r} := \{\, a \in E^{2^r-1} \;:\; H_r a = 0 \pmod 2 \,\}$$

where we use column representation of bit strings,

$$a_{n-1}a_{n-2}\ldots a_0 \mapsto a = (a_0\; a_1\; \ldots\; a_{n-1})^T,$$

and the matrix-vector product is evaluated modulo 2. The Hamming code $C_{H_r}$ has
$|C_{H_r}| = 2^{2^r-r-1}$ code words. Since addition is modulo 2, we find for $a, b \in C_{H_r}$

$$H_r(a \oplus b) = H_r a \oplus H_r b = 0.$$

Example. Using the matrices given above, for $H_2$ we find the two code words
$(0\;0\;0)^T$ and $(1\;1\;1)^T$, and for $H_3$ the 16 code words of $C_{H_3}$ are the columns of

0 1 1 0 0 0 0 1     1 0 1 1 1 0 0 1
0 0 0 1 0 0 1 1     0 0 1 1 1 1 1 0
0 1 0 0 1 1 0 1     0 0 1 0 0 1 1 1
0 1 0 0 0 1 1 1     1 1 0 0 1 0 1 0
0 0 0 1 1 0 0 1     1 1 0 1 0 0 1 1
0 1 1 0 1 0 1 1     0 1 0 1 0 1 0 0
0 0 1 1 0 1 0 1     0 1 0 0 1 1 0 1

Suppose the bit string 1101010 is received. We test if it is a valid code word:

$$H_3\begin{pmatrix}0\\1\\0\\1\\0\\1\\1\end{pmatrix} = \begin{pmatrix}1\\1\\1\end{pmatrix} \neq \begin{pmatrix}0\\0\\0\end{pmatrix}.$$

So it is not a valid code word. Assuming at most one error, the received word 1101010
must have been $0101010 \in C_{H_3}$.

In the previous example the result of the test was nonzero. The last row determines
the even parity of the bits in positions 1, 3, 5 and 7, where the first bit is numbered
as 1. The second row determines the even parity of bits 2, 3, 6 and 7, and the first
row that of bits 4, 5, 6 and 7. For the last row the first bit of all the positions listed
is 1. For the second row the second bit in the positions listed is 1. For the first row
the third bit in the positions listed is 1. Thus if a bit string fails the test, the resulting
bit string can be used to determine the position of the error. This is possible because
the columns of $H_3$ are numerically ascending.

Example. From the above example the test result was 111. The result indicates the
error is in the last position, giving the desired code word 0101010.
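The following minimal sketch (not one of the book's example programs) carries out
this correction procedure for the received word of the example: it computes the
syndrome $H_3 r$ modulo 2, reads it as a binary number, and flips the bit at that
position.

// syndrome.cpp

#include <iostream>

using namespace std;

int main(void)
{
   int H3[3][7] = { {0,0,0,1,1,1,1},
                    {0,1,1,0,0,1,1},
                    {1,0,1,0,1,0,1} };
   int r[7] = {0,1,0,1,0,1,1};   // received word 1101010 in column form
   int s = 0;
   for(int i=0;i<3;i++)
   {
      int bit = 0;
      for(int j=0;j<7;j++) bit = (bit + H3[i][j]*r[j])%2;
      s = 2*s + bit;             // syndrome read as a binary number
   }
   if(s != 0) r[s-1] ^= 1;       // flip the offending bit (positions 1..7)
   for(int j=0;j<7;j++) cout << r[j];
   cout << endl;                 // prints 0101010, the corrected code word
   return 0;
}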

The Hamming code $C_{H_r}$ forms an abelian group with group operation $\oplus$, the
bitwise XOR operation (addition modulo 2). For all $a, b, c \in C$

• $a \oplus 0 = a$

• $a \oplus a = 0$, therefore $-a = a$

• $a \oplus (b \oplus c) = (a \oplus b) \oplus c$ due to the associativity of $\oplus$

• $a \oplus b = b \oplus a$, so the group is abelian

10.4 Weighted Checksum


Central to the weighted checksum representation is the weight matrix W. It is a
$t \times n$ matrix that generates t checksums from a column vector of length n. The
weighted checksum representation of a column vector is found by appending these
checksums to the end of the vector, making it a separable code. The number of
checksums is typically much smaller than the data it is calculated on. So it relies
on a probabilistic model to catch most, but not all, errors in the data.

Given a column vector $a = (a_1\; a_2\; \ldots\; a_n)^T$, and a $t \times n$ weight matrix W, the
column coded version of a is

$$a_c = \begin{pmatrix} a \\ Wa \end{pmatrix}.$$

Let

$$H = (W\;\; -I_t).$$

An encoded vector $a_c$ containing valid data is guaranteed to satisfy the equation
$Ha_c = 0$, which is seen as follows:

$$Ha_c = (W\;\; -I_t)\begin{pmatrix} a \\ Wa \end{pmatrix} = (WI_n - I_tW)a = 0.$$

Matrices can be encoded in a similar manner. Each data matrix A has a set of
column, row and full weighted checksum matrices $A_c$, $A_r$ and $A_f$:

$$A_c = \begin{pmatrix} A \\ WA \end{pmatrix}, \qquad A_r = (A\;\; AW^T), \qquad
A_f = \begin{pmatrix} A & AW^T \\ WA & WAW^T \end{pmatrix}.$$

Matrix addition, multiplication, LU decompositions, transpose, and multiplication
by a scalar all preserve the weighted checksum property.

10.5 Noiseless Coding Theorem


In this section an overview of Shannon's noiseless coding theorem is given following
Schumacher [144]. Suppose A is a source of messages $\{a_0,\ldots,a_n\}$ with probabilities
$p(a_0),\ldots,p(a_n)$. The probabilities satisfy

$$0 \le p(a_i) \le 1 \quad \text{and} \quad \sum_{i=0}^{n} p(a_i) = 1.$$

Definition. The Shannon entropy $E_S(A)$ of A is defined by

$$E_S(A) := -\sum_a p(a)\log_2 p(a).$$

The Shannon entropy is also called the missing information.

Example. Let the probabilities for the messages be

$$\left\{\frac{1}{4},\frac{1}{16},\frac{1}{16},\frac{1}{4},\frac{1}{8},\frac{1}{4}\right\}.$$

Thus $p(a_0) = \frac{1}{4}$, $p(a_1) = \frac{1}{16}$, ..., $p(a_5) = \frac{1}{4}$. Then

$$E_S(A) = -\frac{3}{4}\log_2\frac{1}{4} - \frac{1}{8}\log_2\frac{1}{8} - \frac{2}{16}\log_2\frac{1}{16} = 2.375.$$

It takes 3 bits to specify which message was received. The value 2.375 can be inter-
preted as the average number of bits needed to communicate this information. This
can be achieved by assigning shorter codes to those messages of higher probability
and longer codes to those of lower probability.
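As a quick check of this value, a minimal sketch (not one of the book's listings)
computing the Shannon entropy of the example distribution:

// entropy.cpp

#include <iostream>
#include <math.h>

using namespace std;

int main(void)
{
   double p[6] = { 1.0/4, 1.0/16, 1.0/16, 1.0/4, 1.0/8, 1.0/4 };
   double E = 0.0;
   for(int i=0;i<6;i++)
      E -= p[i]*log(p[i])/log(2.0);   // -sum p log2(p)
   cout << "E_S(A) = " << E << endl;  // 2.375
   return 0;
}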

First we introduce the weak law of large numbers.

The weak law of large numbers
Let $x_1, x_2, \ldots, x_N$ be N independent, identically distributed random variables, each
with mean $\bar{x}$ and finite variance. Given $\delta, \epsilon > 0$ there exists $N_0(\delta,\epsilon)$ such
that for $N > N_0$

$$P\left(\left|\frac{1}{N}\sum_{i=1}^{N}x_i - \bar{x}\right| > \delta\right) < \epsilon.$$

Now suppose A produces a sequence of independent messages $a = a_1a_2\ldots a_N$ with
probability

$$p(a) = p(a_1)p(a_2)\cdots p(a_N).$$

Define the random variables $\alpha_i := -\log_2 p(a_i)$ for messages $a_i$ generated by A,
each with mean $\bar{\alpha} = E_S(A)$. It follows that for $\delta, \epsilon > 0$ there exists
$N_0(\delta,\epsilon)$ such that for $N > N_0$

$$P\left(\left|\frac{-\log_2 p(a)}{N} - E_S(A)\right| > \delta\right) < \epsilon$$

with $-\log_2 p(a) = \sum_{i=1}^{N}\alpha_i$. We assume now $N > N_0$. Define

$$\Gamma := \left\{\, a \;:\; \left|\frac{-\log_2 p(a)}{N} - E_S(A)\right| \le \delta \,\right\}.$$

So with probability greater than $1-\epsilon$ a sequence a is in $\Gamma$ and satisfies

$$2^{-N(E_S(A)-\delta)} \ge p(a) \ge 2^{-N(E_S(A)+\delta)}.$$

Let $\gamma = |\Gamma|$ denote the number of elements in $\Gamma$. The bounds of $\gamma$ are given by

$$1 \ge \sum_{a\in\Gamma} p(a) \ge \sum_{a\in\Gamma} 2^{-N(E_S(A)+\delta)} = \gamma\, 2^{-N(E_S(A)+\delta)}$$

and

$$1-\epsilon \le \sum_{a\in\Gamma} p(a) \le \sum_{a\in\Gamma} 2^{-N(E_S(A)-\delta)} = \gamma\, 2^{-N(E_S(A)-\delta)}.$$

Thus we find

$$(1-\epsilon)\,2^{N(E_S(A)-\delta)} \le \gamma \le 2^{N(E_S(A)+\delta)}.$$

Noiseless coding theorem

Let A be a message source and $\epsilon, \delta > 0$.

1. If $E_S(A) + \delta$ bits per message are available to encode messages from A then
there exists $N_0(\delta,\epsilon)$ such that for all $N > N_0$ sequences of messages from A of
length N can be coded into binary sequences with probability of error less than $\epsilon$.

Using the above results we have

$$\gamma \le 2^{N(E_S(A)+\delta)}$$

and each element of $\Gamma$ can be encoded uniquely in a bit string of length
$N(E_S(A)+\delta)$. The other sequences are encoded as bit strings of length
$N(E_S(A)+\delta)$ but will not be correctly decoded. Since these sequences are not in
$\Gamma$ they have probability less than $\epsilon$.

2. If $E_S(A) - \delta$ bits per message are available to encode messages from A then
there exists $N_0(\delta,\epsilon)$ such that for all $N > N_0$ sequences of messages from A of
length N are coded into binary sequences with probability of error greater than $1-\epsilon$.

Let $\lambda, \theta > 0$ with $\lambda < \delta$. Then $2^{N(E_S(A)-\delta)}$ sequences of messages
from A can be encoded uniquely. The rest will not be correctly decoded. There exists
$N_0$ such that for $N > N_0$,

$$p(a) \le 2^{-N(E_S(A)-\lambda)}$$

for $a \in \Gamma$, and the sequences outside $\Gamma$ have total probability less than $\theta$. Let
$P_c$ denote the probability that the sequence is correctly decoded. Then

$$P_c < \theta + 2^{N(E_S(A)-\delta)}\,2^{-N(E_S(A)-\lambda)} = \theta + 2^{N(\lambda-\delta)}.$$

So for $\theta = \frac{\epsilon}{2}$, and N large enough that $2^{N(\lambda-\delta)} < \frac{\epsilon}{2}$,
we obtain $P_c < \epsilon$ and the probability of error is greater than $1-\epsilon$.

The fidelity F of a coding-decoding scheme is defined to be the probability that
a message sequence is decoded correctly, in other words the probability of error is
$1-F$.

10.6 Example Programs


The following C++ program generates the Hamming code of a given length. The
function increment takes an array of type char as input, where the entries are 0 or 1,
and does a binary increment on the entries. The function genmatrix generates the
matrix used to generate the Hamming codes, i.e. the matrix with column entries of
0 and 1 and no zero columns. The function hammingcode iterates through all binary
codes determining which codes satisfy the criteria of the Hamming code using the
generated matrix.

// hamming.cpp

#include <iostream>

using namespace std;

void increment(char *c,int n)
{
   int i,added = 0;
   for(i=0;(i<n) && (!added);i++)
      if(c[i] == 1)
         c[i] = 0;
      else
      {
         added = 1;
         c[i] = 1;
      }
}

void genmatrix(char **m,int x)
{
   int i,j;
   char *c = new char[x];
   for(i=0;i < x;i++)
      c[i] = 0;
   for(i=0;i < ((1<<x)-1);i++)
   {
      increment(c,x);
      for(j=0;j < x;j++)
         m[j][i] = c[j];
   }
   delete[] c;
}

void hammingcode(int x)
{
   int size = (1<<x)-1;
   int number = 1<<size;
   char *c = new char[size];
   char **m = new char*[x];
   int h,i,j,sum,iszero;

   for(i=0;i < size;i++)
      c[i] = 0;
   for(i=0;i < x;i++)
      m[i] = new char[size];
   genmatrix(m,x);
   for(h=0;h<number;h++)
   {
      iszero = 1;
      for(i=0;i < x;i++)
      {
         sum = 0;
         for(j=0;j < size;j++) sum += m[i][j]*c[j];
         if(sum%2 == 1) iszero = 0;
      }
      if(iszero)
      {
         cout << "( ";
         for(i=0;i < size-1;i++)
            if(c[i]) cout << "1" << ", ";
            else cout << "0" << ", ";
         cout << char('0'+c[size-1]) << " )" << endl;
      }
      increment(c,size);
   }
   for(i=0;i < x;i++) delete[] m[i];
   delete[] m;
   delete[] c;
}

int main(void)
{
   cout << "Hamming codes of length 3:" << endl;
   hammingcode(2);
   cout << "Hamming codes of length 7:" << endl;
   hammingcode(3);
   return 0;
}

The program output is

( 0, 0, 0 )
( 1, 1, 1 )
Hamming codes of length 7:
( 0, 0, 0, 0, 0, 0, 0 )
( 1, 1, 1, 0, 0, 0, 0 )
( 1, 0, 0, 1, 1, 0, 0 )
( 0, 1, 1, 1, 1, 0, 0 )
( 0, 1, 0, 1, 0, 1, 0 )
( 1, 0, 1, 1, 0, 1, 0 )
( 1, 1, 0, 0, 1, 1, 0 )
( 0, 0, 1, 0, 1, 1, 0 )
( 1, 1, 0, 1, 0, 0, 1 )
( 0, 0, 1, 1, 0, 0, 1 )
( 0, 1, 0, 0, 1, 0, 1 )
( 1, 0, 1, 0, 1, 0, 1 )
( 1, 0, 0, 0, 0, 1, 1 )
( 0, 1, 1, 0, 0, 1, 1 )
( 0, 0, 0, 1, 1, 1, 1 )
( 1, 1, 1, 1, 1, 1, 1 )

which is the same as the results calculated earlier in this chapter.



The following C++ program implements a weighted checksum. The function encode
takes a matrix (2-dimensional array) and a vector (1-dimensional array) as argu-
ments and calculates the vector with checksum information using matrix multipli-
cation. The function checksum takes a matrix and a vector as arguments. It de-
termines the matrix for the checksum test, and determines if matrix multiplication
with the supplied vector gives the zero vector (the checksum test is satisfied).

// checksum.cpp

#include <iostream>

using namespace std;

void encode(int n,int t,int **w,int *a,int *ac)
{
   int i,j;

   for(i=0;i < n;i++)
      ac[i] = a[i];
   for(i=0;i < t;i++)
   {
      ac[n+i] = 0;
      for(j=0;j < n;j++)
         ac[n+i] += w[i][j]*a[j];
   }
}

int checksum(int n,int t,int **w,int *ac)
{
   int i,j,sum;

   for(i=0;i < t;i++)
   {
      sum = 0;
      for(j=0;j < n;j++)
         sum += w[i][j]*ac[j];
      sum -= ac[n+i];
      if(sum != 0)
         return 0;
   }
   return 1;
}

int main(void)
{
   int data[7] = {3,8,1,7,9,200,5};
   int datac[10];
   int **W = new int*[3];
   int i;

   for(i=0;i<3;i++)
      W[i] = new int[7];

   W[0][0] = 1; W[1][0] = 1; W[2][0] = 1;
   W[0][1] = 0; W[1][1] = 1; W[2][1] = 1;
   W[0][2] = 1; W[1][2] = 0; W[2][2] = 1;
   W[0][3] = 0; W[1][3] = 0; W[2][3] = 0;
   W[0][4] = 1; W[1][4] = 1; W[2][4] = 0;
   W[0][5] = 0; W[1][5] = 1; W[2][5] = 0;
   W[0][6] = 1; W[1][6] = 0; W[2][6] = 1;

   encode(7,3,W,data,datac);
   if(checksum(7,3,W,datac))
      cout << "Checksum satisfied." << endl;
   else
      cout << "Checksum failed." << endl;

   i = datac[4];
   datac[4] = 0;
   if(checksum(7,3,W,datac))
      cout << "Checksum satisfied." << endl;
   else
      cout << "Checksum failed." << endl;
   datac[4] = i;

   i = datac[9];
   datac[9] = 0;
   if(checksum(7,3,W,datac))
      cout << "Checksum satisfied." << endl;
   else
      cout << "Checksum failed." << endl;

   for(i=0;i<3;i++)
      delete[] W[i];
   delete[] W;
   return 0;
}

Java includes a class java.util.zip.CRC32 which implements the CRC-32 cyclic
redundancy check checksum algorithm. The method

void update(byte[] b)

in class CRC32 is used to update the CRC-32 calculation when the bytes in the byte
array are added to the data used to calculate the checksum. The method

byte[] getBytes()

in class String is used to provide the data for the calculation. The method

void reset()

in class CRC32 resets the calculation so that the CRC-32 checksum can be calculated
with new data. The method

long getValue()

is used to get the value of the checksum for the given data. If the value is not the
expected value then the checksum indicates an error.

// Cksum.java

class Cksum
{
   public static void main(String[] args)
   {
      long csum;
      java.util.zip.CRC32 code;
      String data = "Checksum example";
      String output;

      code = new java.util.zip.CRC32();
      code.update(data.getBytes());
      csum = code.getValue();

      output = "\"" + data + "\"" + " has a CRC32 checksum of ";
      output += Long.toString(csum);
      System.out.println(output);

      code.reset();
      data = "Ch-cksum exmaple";
      code.update(data.getBytes());

      output = "\"" + data + "\"" + " has a CRC32 checksum of ";
      output += Long.toString(code.getValue());
      System.out.println(output);

      if(csum == code.getValue())
         System.out.println("Checksum satisfied.");
      else
         System.out.println("Checksum failed.");
   }
}

The program output is

"Checksum example" has a CRC32 checksum of 1413948801


"Ch-cksum exmaple" has a CRC32 checksum of 2843844351
Checksum failed.
Chapter 11
Cryptography

11.1 Introduction
Cryptology is the science which is concerned with methods of providing secure stor-
age and transport of information. Cryptography can be defined as the area within
cryptology which is concerned with techniques based on a secret key for concealing
or enciphering data. Only someone who has access to the key is capable of decipher-
ing the encrypted information. In principle this is impossible for anyone else to do.
Cryptanalysis is the area within cryptology which is concerned with techniques for
deciphering encrypted data without prior knowledge of which key has been used.

Suppose A (the transmitter, normally called Alice) wishes to send a message enci-
phered to B (the receiver, normally called Bob). Often the original text is simply
denoted by M, and the encrypted message by C. A possible method is for A to use a
secret key K for encrypting the message M to C, which can then be transmitted and
decrypted by B (assuming B possesses the key K). We denote by C = EK(M) the
message M encrypted using the key K, and M = DK (C) the message C decrypted
using the key K. We assume that an attacker (normally called Eve) can easily
read any communication between Alice and Bob. The communication method must
attempt to send the message in a form which Eve cannot understand and possibly
also include authentication of the transmitter and receiver.


11.2 Classical Cypher Systems


We can distinguish between two types of classical cypher systems, namely transposi-
tion systems and substitution systems. A transposition cipher is based on changing
the sequence of the characters in the message. In other words the enciphered mes-
sage is a permutation of the original message. A substitution cipher does not change
the order of the components of a message but replaces the original components with
new components. We give three example programs. In the first example we con-
sider the transposition cipher where the positions of the symbols in a message are
rearranged. In the second example the cipher substitutes one symbol for
another, in this case a cyclic substitution is used. The substitution only depends on
the symbol being replaced. In the third example a more advanced substitution is
performed using the symbol to be replaced and the symbol's position in the message.

Example. The function transpose takes a text message m, a permutation p and


the size of the permutation l as arguments. If the length len of the string m is not
a multiple of l the permutation cannot be applied to the last len%l bytes of the
string, where % is the modulus operator. The function returns without enciphering
the text in this case. To overcome this the string could be lengthened with (for
example) spaces. To decipher the message the same algorithm can be applied with
the inverse permutation.

// transpose.cpp

#include <iostream>
#include <string>

using namespace std;

int transpose(string &m,int *p,int l)
{
   int i,j,len;

   len = m.length();
   if(len%l) return 0;
   char *temp = new char[l];
   for(i=0;i < len;i++)
   {
      temp[i%l] = m[l*(i/l)+p[i%l]];
      if((i%l) == l-1)
         for(j=i-l+1;j < i+1;j++) m[j] = temp[j%l];
   }
   delete[] temp;
   return 1;
}

int main(void)
{
   string m = "A sample message";
   int p1[2] = {1,0}, p1i[2] = {1,0};
   int p2[4] = {3,1,0,2}, p2i[4] = {2,1,3,0};
   int p3[8] = {5,1,7,0,2,3,4,6}, p3i[8] = {3,1,4,5,6,0,7,2};
   cout << "m = " << m << endl;
   transpose(m,p1,2);
   cout << "Enciphering m using p1 = " << m << endl;
   transpose(m,p1i,2);
   cout << "Deciphered using p1i = " << m << endl;
   transpose(m,p2,4);
   cout << "Enciphering m using p2 = " << m << endl;
   transpose(m,p2i,4);
   cout << "Deciphered using p2i = " << m << endl;
   transpose(m,p3,8);
   cout << "Enciphering m using p3 = " << m << endl;
   transpose(m,p3i,8);
   cout << "Deciphered using p3i = " << m << endl;
   return 0;
}

The program output is

m = A sample message
Enciphering m using p1 =  Aaspmelm seaseg
Deciphered using p1i = A sample message
Enciphering m using p2 = a Asepmlsm eeasg
Deciphered using p2i = A sample message
Enciphering m using p3 = p eAsamlame essg
Deciphered using p3i = A sample message

A keyword may be provided with the message to derive the permutation. For
example the permutation may be specified by arranging the letters of the first word
in alphabetical order. For example if the reference word is "word" and is placed at
the beginning of the message as "dowr" the permutation can be inferred to be p2
in the above example, and the rest of the message can be deciphered.

In this case the permutation serves as the key. There are N! permutations of length
N. The identity permutation is not of any use so the total number of useful keys
is N! - 1.

A substitution cipher is based on replacing components of the message. For example,


we can replace characters in a text message with other characters. A one-to-one
mapping serves this purpose. A simple substitution cipher is the Caesar substitution.
A cyclic shift of the alphabet is used. In other words if 'A' is the first letter, 'B' the
second letter and so on, then the n-th letter maps to the ((n+k) mod 26)-th letter,
where k is an integer which defines the substitution.

Example. In this example the function substitute takes the message m to encipher,
and the number n by which to shift the alphabet. The substitution is only applied
to the letters 'A'-'Z' and 'a'-'z'.

// substitute.cpp

#include <iostream>
#include <string>

using namespace std;

void substitute(string &m,int n)
{
   int i,l;

   l = m.length();
   while(n < 0) n += 26;
   for(i=0;i < l;i++)
      if((m[i] >= 'A')&&(m[i] <= 'Z'))
         m[i] = (m[i]-'A'+n)%26+'A';
      else if((m[i] >= 'a')&&(m[i] <= 'z'))
         m[i] = (m[i]-'a'+n)%26+'a';
}

int main(void)
{
   string m = "A sample message";
   cout << "m = " << m << endl;
   substitute(m,1);
   cout << "Caesar cipher with n=1 = " << m << endl;
   substitute(m,-1);
   substitute(m,-1);
   cout << "Caesar cipher with n=-1 = " << m << endl;
   substitute(m,1);
   substitute(m,10);
   cout << "Caesar cipher with n=10 = " << m << endl;
   substitute(m,-10);
   cout << "m = " << m << endl;
   return 0;
}

The program output is

m = A sample message
Caesar cipher with n=1 = B tbnqmf nfttbhf
Caesar cipher with n=-1 = Z rzlokd ldrrzfd
Caesar cipher with n=10 = K ckwzvo wocckqo
m = A sample message
If each alphabet is viewed as a key then there are only 26 keys. The first alphabet
is the one we already use, so 25 useful keys are left. If permutations of the alphabet
are used instead of only shifts a total of 26! - 1 useful keys are available.

A more advanced substitution is obtained using a Vigenere table. The substitution


rule changes with each position in the message. Each symbol is used as an index
for the column of the table. A keyword is repeated below the message. The symbol
in the keyword string is used as an index for the row of the table. The table has
the standard alphabet as the first row and the previous row shifted left for each row
following. In other words the first row is 'A' 'B' 'C' ... , the second row is 'B' 'C'
'D' ... and so on.

A word can be used for a key to identify for each symbol to encode which row of the
Vigenere table to use. For example, the word "CIPHER" indicates that the third,
ninth, sixteenth, eighth, fifth and eighteenth rows are to be used for enciphering.
Thus the symbol at position i is encoded using the row identified by the symbol in
the i mod l position of the key word, where l is the number of symbols in the key
word.

Example. We modify the previous program to use the Vigenere table and a keyword.
The function vigenere takes three arguments. The argument decipher determines
if the function enciphers or deciphers the message. The argument m is the message
to be enciphered, and k is used as the index for the row in the Vigenere table.

// vigenere.cpp

#include <iostream>
#include <string>

using namespace std;

void vigenere(string &m,string k,int decipher)
{
   int i,l,n;

   n = k.length();
   l = m.length();
   for(i=0;i < l;i++)
      if((m[i] >= 'A')&&(m[i] <= 'Z'))
         if(decipher)
            m[i] = (m[i]-'A'+26-(k[i%n]-'A'))%26+'A';
         else
            m[i] = (m[i]-'A'+k[i%n]-'A')%26+'A';
      else if((m[i]>='a')&&(m[i]<='z'))
         if(decipher)
            m[i] = (m[i]-'a'+26-(k[i%n]-'A'))%26+'a';
         else
            m[i] = (m[i]-'a'+k[i%n]-'A')%26+'a';
}

int main(void)
{
   string m = "A sample message";
   string k = "CIPHER";
   cout << "m = " << m << endl;
   vigenere(m,k,0);
   cout << "Cipher with Vigenere table and keyword CIPHER = "
        << m << endl;
   vigenere(m,k,1);
   cout << "m = " << m << endl;
   return 0;
}

The program output is

m = A sample message
Cipher with Vigenere table and keyword CIPHER = C hhqgnm tijuivl
m = A sample message

11.3 Public Key Cryptography


In a public key system two keys are used, one for enciphering and one for deciphering
the message. A system which relies on a public key and a private key is called an
asymmetrical cipher system. These systems rely on a function which is easy to
calculate but whose inverse is difficult to calculate without extra information.

The RSA system is a well known public key system. It uses the fact that the product
of two prime numbers is easy to calculate, but to factor the product into the two
prime numbers is difficult. First two prime numbers p and q are generated, and the
product n = pq calculated. Then e is determined as follows:

$$3 < e < (p-1)(q-1), \qquad \gcd(e,(p-1)(q-1)) = 1.$$

Lastly d must be determined from

$$ed = 1 \bmod (p-1)(q-1).$$

Suppose we have a message with non-negative integer value M. The ciphered mes-
sage is represented by

$$C = M^e \bmod n.$$

The message is deciphered as follows:

$$M = C^d \bmod n.$$
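Example (a toy illustration with artificially small primes, not taken from the text).
Let p = 3 and q = 11, so n = 33 and (p-1)(q-1) = 20. Choose e = 7, which satisfies
$\gcd(7,20) = 1$; then d = 3 since $ed = 21 = 1 \bmod 20$. For the message M = 5 we
obtain $C = 5^7 \bmod 33 = 14$, and deciphering gives
$C^d \bmod n = 14^3 \bmod 33 = 2744 \bmod 33 = 5 = M$.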

Definition. Euler's totient function $\varphi(n)$ is the number of positive integers smaller
than n and relatively prime to n. For a prime number p we have $\varphi(p) = p-1$. Thus
for $\varphi(n)$ we find

$$\varphi(n) = \varphi(p)\varphi(q) = (p-1)(q-1)$$

where n = pq as given above.

Theorem. For all $a, n \in \mathbb{N}$ with $0 < a < n$ and $\gcd(a,n) = 1$

$$a^{\varphi(n)} = 1 \bmod n.$$

The theorem is called Euler's theorem. For the proof we refer to [171]. The theorem
is of interest because it can be used to prove that encipherment and decipherment
using the RSA system are inverse operations. In other words if we have a message
M enciphered

$$C = M^e \bmod n$$

and deciphered according to

$$M' = C^d \bmod n$$

with $ed = 1 \bmod \varphi(n)$ then $M' = M$. Again we refer to [171].

The public key in this system is (e, n) and the private key d. The method can be
improved to include verification of the sender and remove transport of the private
key from the sender to the receiver. Suppose the sender has a public key $(e_1, n_1)$ and
a private key $d_1$. Similarly suppose the receiver has public key $(e_2, n_2)$ and private
key $d_2$. Let the message to be encoded be M. Thus an encoded message would be

$$C = (M^{d_1} \bmod n_1)^{e_2} \bmod n_2.$$

In other words the sender encodes the message using a private key and then using
the public key of the receiver. The receiver can decode the message using

$$M = (C^{d_2} \bmod n_2)^{e_1} \bmod n_1.$$

The receiver decodes the message by first using a private key and then using the
public key of the sender. Using this method only public keys are exchanged. Since
the receiver can only decode the message using the sender's public key the message
source can be verified.

The RSA system relies on the fact that two large prime numbers p and q can be
found. It is generally quite slow to check if numbers are prime, since the obvious
method is to check for any factors. Define the Jacobi symbol as follows:

$$J(1,p) := 1$$

$$J(a,p) := \begin{cases} (-1)^{(p^2-1)/8}\,J(a/2,\,p) & a \text{ even} \\
(-1)^{(a-1)(p-1)/4}\,J(p \bmod a,\,a) & a \text{ odd} \end{cases}$$

Suppose we wish to test if p is prime. Select $a \in \{1,2,\ldots,p-1\}$ and calculate
$\gcd(a,p)$. If

$$\gcd(a,p) \neq 1$$

then p is not a prime number. Otherwise if

$$J(a,p) \neq a^{(p-1)/2} \bmod p$$

p is not prime. If p is prime then

$$\gcd(a,p) = 1$$

and

$$J(a,p) = a^{(p-1)/2} \bmod p$$

for all $a \in \{1,2,\ldots,p-1\}$. If p is not prime then the test will fail in more than 50%
of the cases. Every time a is successfully tested the probability that p is a prime
number increases.
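A minimal sketch of this probabilistic test in C++ follows (the helper names powmod,
jacobi and maybe_prime are illustrative assumptions, and built-in long arithmetic is
used instead of a large-integer class):

// jacobi.cpp

#include <iostream>
#include <stdlib.h>

using namespace std;

long gcd(long a,long b) { while(b) { long r = a%b; a = b; b = r; } return a; }

long powmod(long a,long n,long m)   // a^n mod m by repeated squaring
{
   long r = 1; a %= m;
   while(n) { if(n&1) r = (r*a)%m; a = (a*a)%m; n >>= 1; }
   return r;
}

long jacobi(long a,long p)   // the recursion given in the text
{
   if(a == 0) return 0;
   if(a == 1) return 1;
   if(a%2 == 0)
      return ((((p*p-1)/8)%2) ? -1 : 1)*jacobi(a/2,p);
   return ((((a-1)*(p-1)/4)%2) ? -1 : 1)*jacobi(p%a,a);
}

int maybe_prime(long p,int trials)
{
   for(int i=0;i<trials;i++)
   {
      long a = 1+rand()%(p-1);
      if(gcd(a,p) != 1) return 0;
      long j = (jacobi(a,p)+p)%p;   // map -1 to p-1
      if(j != powmod(a,(p-1)/2,p)) return 0;
   }
   return 1;   // probably prime
}

int main(void)
{
   // typically prints 1 0 (101 is prime, 91 = 7*13 is not)
   cout << maybe_prime(101,20) << " " << maybe_prime(91,20) << endl;
   return 0;
}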

First the prime numbers must be generated to implement the algorithm. To per-
form faster encryption a table of prime numbers is used. The prime numbers are
generated with the following C++ program, and then can be read by a program
which needs prime numbers. The program takes one parameter on the command
line to indicate how many prime numbers to generate. The program output is a
list of prime numbers which can be used in other C++ programs. The standard
error output stream is used to output how many prime numbers have been found.
The program output can be redirected in UNIX and Windows systems with the
command

genprime 10000 > primes.dat

which generates 10000 prime numbers and places them in the file primes.dat. The
header file list.h contains the implementation of the ADT list class developed
earlier.

// gprime.cpp

#include <iostream>
#include <ctype.h>
#include <stdlib.h>
#include <math.h>
#include "list.h"

using namespace std;

typedef unsigned long type;

int main(int argc,char *argv[])
{
   list<type> l;
   int i,j,count,success;
   type n(5),sn;

   if(argc == 1) return 1;
   count = atoi(argv[1]);
   l.additem(2); l.additem(3);
   cout << count << endl;

   for(i=0;i < count-1;n+=type(2))
   {
      success = 1;
      sn = (type)(sqrt(n)+1);
      for(j=0;success&&(j<l.getsize())&&(l[j]<sn);j++)
         if((n%l[j]) == type(0)) success = 0;
      if(success)
      {
         l.additem(n);
         cout << n << endl;
         cerr << i << " \r";
         i++;
      }
   }
   for(;i < count;n+=type(2))
   {
      success = 1;
      sn = (type)(sqrt(n)+1);
      for(j=0;success&&(j<l.getsize())&&(l[j]<sn);j++)
         if((n%l[j]) == type(0)) success = 0;
      if(success)
      {
         l.additem(n);
         cout << n << endl;
         cerr << i << " \r";
         i++;
      }
   }
   cerr << endl;
   return 0;
}

Similarly the program gkeys.cpp generates an array of key values using the prime
numbers generated by gprime.cpp. The RSA program can then simply use an index
to specify the key. The generation of prime numbers and keys takes a long time;
it is much faster to do the long calculations once and then just use precalculated
results in the algorithm.

// gkeys.cpp

#include <fstream>
#include <stdlib.h>
#include <time.h>

using namespace std;

typedef unsigned long type;

type primelist(int i)
{
   type data;
   int j;
   ifstream primes("primes.dat");

   primes >> j;
   for(j=0;(j<=i)&&!primes.eof()&&!primes.fail();j++)
      primes >> data;
   primes.close();
   return data;
}

type GCD(type a,type b)
{
   type r(1);

   while(r != type(0))
   {
      r = a%b;
      if(r != type(0)) { a = b; b = r; }
   }
   return b;
}

int main(int argc,char *argv[])
{
   int i,j,count,maxprime,maxkeys = 0;
   int total;
   ifstream primes("primes.dat");

   primes >> maxprime;
   total = int((double(maxprime)*(maxprime-1))/2);
   primes.close();

   if(argc == 1) return 1;
   count = atoi(argv[1]);
   if(count > total) count = total;
   cout << count << endl;

   srand(time(NULL));
   for(i=0;maxkeys<=count&&i<maxprime;i++)
      for(j=i+1;maxkeys<count&&j<maxprime;j++)
      {
         type temp,p,q,n,e,d;

         p = primelist(i);
         q = primelist(j);
         n = p*q;
         temp = (p-type(1))*(q-type(1));
         d = e = type(0);

         for(p=type(4);p < temp;p++)
            if(GCD(p,temp) == type(1))
            {
               e = p;
               for(q=type(1);p != temp;q++)
                  if(((q*temp+1)%e) == 0)
                     { d = (q*temp+1)/e; p = temp; }
            }
         if((e != type(0))&&(d != type(0)))
         {
            maxkeys++;
            cout << n << " ";
            cout << e << " ";
            cout << d << endl;
         }
         cerr << (total--) << " left to try, " << maxkeys
              << " generated \r";
         cerr.flush();
      }
   if(maxkeys<count) cout << "Not enough keys generated.";
   cerr << endl;
   return 0;
}

In the following program it is important to use the class Verylong [169], which
provides a theoretically unbounded integer type, since even for small prime numbers
$< 2^{16}$ the calculations used in the RSA system can exceed the bounds of the data
type unsigned long, depending on the underlying hardware platform. The program
performs the RSA encoding of a message using the previously generated keys. We
again use a recursive implementation for raising an integer to an integer power, this
time using Verylong and modulo arithmetic.

// rsa.cpp

#include <iostream>
#include <fstream>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <assert.h>
#include "verylong.h"

using namespace std;

void keylist(int i,Verylong &n,Verylong &e,Verylong &d)
{
   int j;
   ifstream keys("keys.dat");

   keys >> j;
   for(j=0;(j<=i)&&!keys.eof()&&!keys.fail();j++)
      { keys >> n; keys >> e; keys >> d; }
   keys.close();
}

Verylong powermodn(Verylong a,Verylong n,Verylong mod)
{
   Verylong temp;

   if(n == Verylong(0)) return Verylong(1);

   a %= mod;
   temp = powermodn(a,n/Verylong(2),mod);
   temp = (temp*temp)%mod;
   if(n%Verylong(2) == Verylong(1)) return (a*temp)%mod;
   return temp;
}

void rsa(Verylong *m,Verylong e,Verylong n,int len)
{
   int i;
   for(i=0;i < len;i++)
      m[i] = powermodn(m[i],e,n);
}

void vltoc(Verylong *l,char *c,int n)
{
   int i;
   for(i=0;i < n;i++) c[i] = char(int(l[i]));
}

void ctovl(char *c,Verylong *l,int n)
{
   int i;
   for(i=0;i < n;i++) l[i] = Verylong((unsigned)c[i]);
}

int main(void)
{
   int i,len,maxkeys;
   Verylong e,d,n;
   char m[18];
   Verylong mt[17];
   ifstream keys("keys.dat");

   keys >> maxkeys; keys.close();

   srand(time(NULL));
   keylist(rand()%maxkeys,n,e,d);

   strcpy(m,"A sample message");
   len = strlen(m);
   ctovl(m,mt,len);

   cout << "Initial message :" << endl;
   for(i=0;i < len-1;i++) cout << mt[i] << ",";
   cout << mt[i] << endl;
   cout << m << endl << endl;

   rsa(mt,e,n,len);

   cout << "Encrypted message :" << endl;
   for(i=0;i < len-1;i++) cout << mt[i] << ",";
   cout << mt[i] << endl << endl;

   rsa(mt,d,n,len);
   vltoc(mt,m,len);

   cout << "Decrypted message :" << endl;
   for(i=0;i < len-1;i++) cout << mt[i] << ",";
   cout << mt[i] << endl;
   cout << m << endl;
   return 0;
}

The program output is

Initial message :
65,32,115,97,109,112,108,101,32,109,101,115,115,97,103,101
A sample message

Encrypted message :
696340,554727,635395,510042,702669,39492,737693,78176,554727,702669,
78176,635395,635395,510042,635068,78176

Decrypted message :
65,32,115,97,109,112,108,101,32,109,101,115,115,97,103,101
A sample message
Chapter 12
Finite State Machines

12.1 Introduction
Finite state machines [49, 67] provide a visual representation of algorithms. Algo-
rithms are implemented on a machine with a finite number of states representing
the state of the algorithm. This provides an abstract way of designing algorithms.
The chapter will only cover deterministic machines (the actions of the machines are
determined uniquely).

The reason for studying these machines is to determine the requirements necessary
to perform arbitrary functions. Certain machines (as will be illustrated) cannot
perform certain functions. Computer scientists are interested in the
requirements for functions to be performed and what functions can be performed.
Finite state machines can be used to understand these problems. Finite state ma-
chines are concerned with taking an input, changing between internal states, and
generating an output (which may just be the machine's final state). This describes
all computing devices. Thus in an abstract way it is possible to consider what is
computable. Any machine required to solve arbitrary problems must be described
in terms of a basic set of features and operations which determine what the machine
can do. From the description, algorithms to solve problems can be constructed.
Furthermore the basic operations must be reasonable in the sense that it must be
known that the operations can be performed in a finite amount of time. The features
of a machine can, for example, be memory and the ability to output.

In this chapter we discuss finite automata, finite automata with output and Turing
machines. It will become evident that with each improvement the machines can
compute more. We show some problems which are computable by Turing machines
and not by finite automata. Turing machines are used as the basis for deciding what
is computable and what is not.


12.2 Finite Automata


In this section we discuss a simple machine type and consider some computations
these machines can perform. The basic operations are transitions between states.
The features are states which serve as memory.

Definition. A finite automaton consists of

• A finite set S of states. One state is designated as the start state. Some states
may be designated as final states.

• An alphabet $\Sigma$ of possible input symbols.

• A finite set of transitions for each state and symbol in the alphabet. Transi-
tions are ordered triples (a, b, c) where $a, b \in S$ and $c \in \Sigma$, and b is uniquely
determined by a and c.

An input string of elements of $\Sigma$ is provided to the finite automaton. The finite
automaton reads each symbol in the string and causes a transition between states.
(a, b, c) represents a transition from state a to state b when the symbol c is read. If
all symbols of the string have been read and the finite automaton is not in a final
state then the finite automaton is said to fail and the input string is not accepted.
The finite automaton also fails if no transition exists for a symbol read. If the finite
automaton has not failed and all symbols in the input string have been read, the
finite automaton terminates successfully and the input string is accepted.

Visually the finite automaton can be represented with circles for the states and
directed edges between states for the transitions. This visual representation is called
a transition diagram. A "-" in a state denotes the start state. A "+" in a state
denotes a final state.

Finite automata can be used to define languages. The language consists of all input
words accepted by the finite automata. The automaton can only accept input words,
it cannot provide any output except for failing or accepting. The only memory the
finite automaton possesses is the state it is currently in and its transitions. This
is obviously a limitation. More effective computing machines such as push-down
automata (using a stack as memory) and Turing machines can increase the number
of computable functions.

Now we provide some examples to show some of the uses of finite automata.

Example. We can use a finite automaton to perform a parity check. Let

and ~:= {O, I}.

The start state is Seven and the final state Sodd. The table for the transitions is given
by Table 12.1. The transition diagram is shown in Figure 12.1.

State Input Symbol Next State


Sodd 0 Sodd
Sodd 1 Seven
Seven 0 Seven
Seven 1 Sodd

Table 12.1: Parity Check Finite Automaton - Transitions

Figure 12.1: Parity Check Finite Automaton

The finite automaton only accepts bit strings which pass the odd parity test. If
S_even were selected as the final state instead of S_odd the finite automaton would only
accept bit strings which pass the even parity test. Note that it is not necessary to
label the states in the transition diagram since this does not change the operation
of the finite automaton.
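This automaton is small enough to implement directly. The following is a minimal
C++ sketch (not one of the book's program listings), with state 0 playing the role
of S_even and state 1 the role of S_odd:

// parityfa.cpp
// A sketch of the parity check finite automaton of Table 12.1.
#include <iostream>
#include <string>

using namespace std;

// returns 1 if the bit string is accepted (odd parity), 0 otherwise
int accepts(const string &input)
{
  int state = 0;                            // start state S_even
  for(size_t i=0;i < input.length();i++)
  {
    if(input[i] == '1') state = 1 - state;  // reading 1 toggles the state
    else if(input[i] != '0') return 0;      // no transition exists: fail
  }
  return (state == 1);                      // accepted if final state is S_odd
}

int main(void)
{
  cout << accepts("1011") << endl;          // 1: three ones, odd parity
  cout << accepts("1001") << endl;          // 0: two ones, even parity
  return 0;
}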

Example. Consider the finite automaton with Σ := {0, 1},

S := {S_start, S_{0,1}, S_{0,2}, S_{0,3}, S_{1,1}, S_{1,2}, S_{1,3}, S_NA},

with the start state S_start and final states S_{0,3} and S_{1,3}, and transition Table 12.2.
The transition diagram is given by Figure 12.2.

State      Input    Next State

S_start    0        S_{0,1}
S_start    1        S_{1,1}
S_{0,1}    0        S_{0,2}
S_{0,1}    1        S_NA
S_{0,2}    0        S_{0,3}
S_{0,2}    1        S_NA
S_{0,3}    0        S_NA
S_{0,3}    1        S_NA
S_{1,1}    0        S_NA
S_{1,1}    1        S_{1,2}
S_{1,2}    0        S_NA
S_{1,2}    1        S_{1,3}
S_{1,3}    0        S_NA
S_{1,3}    1        S_NA
S_NA       0        S_NA
S_NA       1        S_NA

Table 12.2: Hamming Code Finite Automaton - Transitions


Figure 12.2: Hamming Code Finite Automaton

This finite automaton only accepts code words from the Hamming code C_H2, i.e. the words 000 and 111.

12.3 Finite Automata with Output


Now we extend the abilities of the machine by letting it output symbols. This
allows the machine to act on its environment. The output may be used by other
devices and more input may be generated, so a machine that reacts to its environment
can be constructed. Two extensions to finite automata that achieve this are Moore
machines and Mealy machines.

Definition. A Moore machine consists of


• A finite set S of states. One state is designated as the start state.

• An alphabet Σ of possible input symbols.

• An alphabet Γ of possible output symbols.

• A finite set of transitions for each state and symbol in the alphabet Σ. Tran-
sitions are ordered triples (a, b, c) where a, b ∈ S and c ∈ Σ, and b is uniquely
determined by a and c.

• For each state, the symbol from Γ to output when the state is entered.
The transition diagrams already introduced can be extended for Moore machines
by writing the output symbol in the circle for each state. Unlike finite automata, a
Moore machine does not accept or reject input strings, rather it processes them. If
S is a state in a Moore machine then the notation S− denotes the fact that S is a
start state.

This machine has only the memory of which state it is in and its transitions. In this
respect it is no more powerful than a finite automaton. But its relation to practical
usage is stronger since now the machine is able to give us information about the
input provided, beyond a simple accept or fail. The ability to output is also tied to
memory. If a machine can read its own output at a later stage it may be able to
compute more. These ideas are incorporated into the Turing machines.

These machines can be coupled so that the output of one machine can be used as
input for another. A Moore machine exists for any pair of coupled Moore machines.
The set of states for such a machine is the Cartesian product of the sets of states of
each of the machines. Let S_1 = {S_{1,0}, S_{1,1}, ...} and S_2 = {S_{2,0}, S_{2,1}, ...}, where S_{1,0}
and S_{2,0} are the start states, be the states of the first and second Moore machines
respectively. Let the output for S_{i,j} be denoted by O_{i,j}. The combined machine has
states S_1 × S_2, start state (S_{1,0}, S_{2,0}), output O_{2,j} for state (S_{1,i}, S_{2,j}), and a
transition (S_{1,i}, S_{2,j}) → (S_{1,k}, S_{2,l}) whenever S_{1,i} → S_{1,k} is a transition for some
input of the first machine and S_{2,j} → S_{2,l} is the transition for the input O_{1,k} in the
second machine. Thus combining Moore machines provides no extra computing power
to this class of machines.

Example. Table 12.3 describes a simple Moore machine which performs the NOT
operation. Here

S := {S0, S1, S2}, Σ := {0, 1}, Γ := {0, 1}.
State    Output    Input    Next State

S0−      0         0        S1
                   1        S2
S1       1         0        S1
                   1        S2
S2       0         0        S1
                   1        S2

Table 12.3: Moore Machine for the NOT Operation - Transitions

Figure 12.3: Moore Machine for the NOT Operation



Example. This example shows how an n-bit incrementer which increments any n-bit
number modulo 2^n (2^n ≡ 0 modulo 2^n) can be implemented with a Moore machine.
The bits are fed in from low order to high order. For example the decimal number
11 with bit representation 1011 will be input as 1, 1, 0 and then 1. The transition
table is given by Table 12.4. Here

S := {S0, S1, S2, S3, S4}, Σ := {0, 1}, Γ := {0, 1}.

The transition diagram is given by Figure 12.4.

State    Output    Input    Next State

S0−      0         0        S1
                   1        S2
S1       1         0        S3
                   1        S4
S2       0         0        S1
                   1        S2
S3       0         0        S3
                   1        S4
S4       1         0        S3
                   1        S4

Table 12.4: n-bit Incrementer Moore Machine - Transitions


Figure 12.4: n-bit Incrementer Moore Machine
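Since the transition table is finite, the machine can be realized as a pair of lookup
arrays. The following minimal C++ sketch (not from the book's listings) transcribes
Table 12.4, encoding the states S0, ..., S4 as 0, ..., 4; the output attached to the
entered state is printed after every transition:

// mooreinc.cpp
#include <iostream>
#include <string>

using namespace std;

int main(void)
{
  // next[state][input bit] and output[state] transcribed from Table 12.4
  int next[5][2] = {{1,2},{3,4},{1,2},{3,4},{3,4}};
  char output[5] = {'0','1','0','0','1'};  // output of S0 is never emitted here
  string in = "1101";      // 11 decimal, fed in from low order to high order
  int state = 0;           // start state S0
  for(size_t i=0;i < in.length();i++)
  {
    state = next[state][in[i]-'0'];
    cout << output[state]; // Moore machine: output belongs to the state
  }
  cout << endl;            // prints 0011, i.e. 12 decimal, low order first
  return 0;
}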



Definition. A Mealy machine consists of


• A finite set S of states. One state is designated as the start state.

• An alphabet Σ of possible input symbols.

• An alphabet Γ of possible output symbols.

• A finite set of transitions for each state and symbol in the alphabet Σ. Tran-
sitions are ordered triples (a, b, c) where a, b ∈ S and c ∈ Σ, and b is uniquely
determined by a and c.

• For each transition, the symbol from Γ to output.

The transition diagrams already introduced can be extended for Mealy machines by
writing the input and output symbols as an ordered pair (i, o) for each transition.
Unlike finite automata, a Mealy machine does not accept or reject input strings,
rather it processes them. Similarly to Moore machines, Mealy machines can be
combined. Using a similar proof to the one for Moore machines, the combination of
Mealy machines provides no extra computing power.

Example. Table 12.5 describes a simple Mealy machine which performs the NOT
operation. Here

S := {S0}, Σ := {0, 1}, Γ := {0, 1}.

The transition diagram is given in Figure 12.5.

State    Input    Output    Next State

S0−      0        1         S0
         1        0         S0

Table 12.5: Mealy Machine for the NOT Operation - Transitions

Figure 12.5: Mealy Machine for the NOT operation



Example. This example shows how an n-bit incrementer which increments any n-bit
number modulo 2^n (2^n ≡ 0 modulo 2^n) can be implemented with a Mealy machine.
The bits are fed in from low order to high order. For example the number 11 with
bit representation 1011 will be input as 1, 1, 0 and then 1. The transition table is
given by Table 12.6. Here

S := {S0, S1, S2}, Σ := {0, 1}, Γ := {0, 1}.

State    Input    Output    Next State

S0−      0        1         S1
         1        0         S2
S1       0        0         S1
         1        1         S1
S2       0        1         S1
         1        0         S2

Table 12.6: n-bit Incrementer Mealy Machine - Transitions


Figure 12.6: n-bit Incrementer Mealy Machine
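The same machine can be transcribed in C++; in contrast to the Moore machine
sketch above, the output symbol now belongs to the transition rather than to the
state. A minimal sketch (not from the book's listings) of Table 12.6:

// mealyinc.cpp
#include <iostream>
#include <string>

using namespace std;

int main(void)
{
  // next[state][input bit] and out[state][input bit] from Table 12.6
  int next[3][2] = {{1,2},{1,1},{1,2}};
  char out[3][2] = {{'1','0'},{'0','1'},{'1','0'}};
  string in = "1101";      // 11 decimal, fed in from low order to high order
  int state = 0;           // start state S0
  for(size_t i=0;i < in.length();i++)
  {
    int b = in[i]-'0';
    cout << out[state][b]; // Mealy machine: output belongs to the transition
    state = next[state][b];
  }
  cout << endl;            // prints 0011, i.e. 12 decimal, low order first
  return 0;
}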

For every Moore machine there is an equivalent Mealy machine and conversely. For
the proof we refer to [49]. The proof amounts to showing how to obtain the same
output from a Moore machine and a Mealy machine for the same input.

12.4 Turing Machines


Turing machines are more powerful than the finite automata discussed in the pre-
vious section because they have memory.

Definition. A Turing machine consists of


• A finite set of states S one of which is designated the start state. Some states
may be designated as halt states which cause the Turing machine to terminate
execution.
• An alphabet Σ of possible input symbols.
• An alphabet Γ of possible output symbols.

• The blank symbol Δ.

• A tape or memory device which consists of adjacent cells labelled
cell[0], cell[1], ....
Cells of the tape can contain a single symbol from Σ ∪ Γ ∪ {Δ}. The input
string is placed in the first cells of the tape, the rest of the cells are filled
with Δ.
• A tape head that can read the contents of a tape cell, put a symbol from Γ or
the Δ symbol in the tape cell and move one cell right or left. All these actions
take place simultaneously. If the tape head tries to move left from cell[0] the
Turing machine is said to crash. If the head is at cell[i] and moves left (right)
then the head will be at cell[i − 1] (cell[i + 1]).
• A finite set of transitions for states and symbols from Σ ∪ Γ ∪ {Δ}. A transition
is an ordered 5-tuple (a, b, c, d, e) with
a ∈ S, b ∈ Σ ∪ Γ ∪ {Δ}, c ∈ S, d ∈ Γ ∪ {Δ}
and e ∈ {r, l}. Here a is the current state, b is the symbol read by the tape
head, c is the next state, d is the symbol for the tape head to write in the
current cell and e = r (e = l) moves the tape head right (left). The elements
c, d and e are uniquely determined by a and b. If an input symbol is read and
no transition corresponds to the current state and symbol read the Turing
machine is said to crash.
Input strings of symbols from E which cause the Turing machine to end on a halt
state are said to be accepted by the Turing machine. Graphically states are repre-
sented with circles and transitions with directed edges between states labelled with
a triple (a, b, c) where a is the symbol read from the tape, b is the symbol to write to
the tape and c is the direction to move the tape head (l or r). A "-" in a state will
represent a start state and a "+" in a state will represent a halt state. Obviously
Turing machines can do at least as much as Mealy machines (and therefore also
Moore machines) and finite automata.

Example. We can use a Turing machine to perform the parity check. Let

S := {S_even, S_odd, S_fin}

and

Σ := {0, 1}, Γ := {0, 1}.

The start state is S_even and halt state S_fin. The table for the transitions is given by
Table 12.7. The symbol r in the movement column instructs the tape head to move
one cell right. The transition diagram is given by Figure 12.7.

State     Input Symbol    Output Symbol    Movement    Next State

S_odd     0               0                r           S_odd
S_odd     1               1                r           S_even
S_even    0               0                r           S_even
S_even    1               1                r           S_odd
S_odd     Δ               Δ                r           S_fin

Table 12.7: Parity Check Turing Machine - Transitions


Figure 12.7: Parity Check Turing Machine

The Turing machine only accepts bit strings which pass the odd parity test. Note
that it is not necessary to label the states in the transition diagram since this does
not change the operation of the Turing machine.

Example. Now we use a Turing machine to calculate the parity bit for odd parity
and place it in the cell of the tape immediately after the bit string used for input.
Let

S := {S_even, S_odd, S_fin1, S_fin2}

and

Σ := {0, 1}, Γ := {0, 1}.

The start state is S_even and halt states S_fin1 and S_fin2. The table for the transitions
is given by Table 12.8. The transition diagram is given by Figure 12.8.

State     Input Symbol    Output Symbol    Movement    Next State

S_odd     0               0                r           S_odd
S_odd     1               1                r           S_even
S_even    0               0                r           S_even
S_even    1               1                r           S_odd
S_odd     Δ               0                r           S_fin1
S_even    Δ               1                r           S_fin2

Table 12.8: Parity Calculation Turing Machine - Transitions


Figure 12.8: Parity Calculation Turing Machine



Example. Now we use a Turing machine to negate (NOT) a bit sequence (one's
complement). The states are

S := {S_start, S_halt}.

S_start is the start state and S_halt is a halt state. The alphabets are

Σ := {0, 1}, Γ := {0, 1}.

The transition table is given by Table 12.9. The transition diagram is given by
Figure 12.9.

State      Input    Output    Movement    Next State

S_start    0        1         r           S_start
S_start    1        0         r           S_start
S_start    Δ        Δ         r           S_halt

Table 12.9: Turing Machine for the NOT Operation - Transitions


Figure 12.9: Turing Machine for the NOT Operation



Example. Now we consider a Turing machine which has no finite automaton equiv-
alent. The Turing machine reverses a bit string. The states are

S := {S_start, S_{0,1}, S_{0,2}, S_{0,3}, S_{1,1}, S_{1,2}, S_{1,3}, S_halt},

Σ := {0_I, 1_I} and Γ := {0_O, 1_O, 0_I, 1_I}. S_start is the start state and S_halt is the
halt state. The input and output alphabet are different so that the machine can
differentiate between input and output symbols. The input and output will be
interpreted as binary digits but using different alphabets means the machine can
remember what it has already done. The transitions are given by Table 12.10. The
transition diagram is given by Figure 12.10.

State      Input    Output    Movement    Next State

S_start    0_O      0_O       r           S_halt
           1_O      1_O       r           S_halt
           0_I      Δ         r           S_{0,1}
           1_I      Δ         r           S_{1,1}
S_{0,1}    0_I      0_I       r           S_{0,1}
           1_I      1_I       r           S_{0,1}
           0_O      0_O       l           S_{0,2}
           1_O      1_O       l           S_{0,2}
           Δ        Δ         l           S_{0,2}
S_{0,2}    0_I      0_O       l           S_{0,3}
           1_I      0_O       l           S_{1,3}
S_{0,3}    0_I      0_I       l           S_{0,3}
           1_I      1_I       l           S_{0,3}
           0_O      0_O       l           S_{0,3}
           1_O      1_O       l           S_{0,3}
           Δ        0_O       r           S_start
S_{1,1}    0_I      0_I       r           S_{1,1}
           1_I      1_I       r           S_{1,1}
           0_O      0_O       l           S_{1,2}
           1_O      1_O       l           S_{1,2}
           Δ        Δ         l           S_{1,2}
S_{1,2}    1_I      1_O       l           S_{1,3}
           0_I      1_O       l           S_{0,3}
S_{1,3}    0_I      0_I       l           S_{1,3}
           1_I      1_I       l           S_{1,3}
           0_O      0_O       l           S_{1,3}
           1_O      1_O       l           S_{1,3}
           Δ        1_O       r           S_start

Table 12.10: Bit Reversal Turing Machine - Transitions



L1 = (Δ, 1_O, r)
L2 = (Δ, 0_O, r)
L3 = (1_I, 0_O, l)
L4 = (0_I, 1_O, l)

Figure 12.10: Bit Reversal Turing Machine



12.5 Example Programs


A general Turing machine is implemented in C++. The Turing machines for parity
check, parity calculation and bit string reversal are constructed and tested. The
program contains three classes: Tapecell, which provides support for a dynami-
cally growing tape, Transition, which is used to implement transition tables, and
TuringMachine, which implements the Turing machine. The constructor for class
Transition takes as arguments an integer to identify the state, a symbol which
when read causes the transition, a symbol to output, a movement right (1) or left
(-1), the state to change to and a value to indicate if the state is a halt state (0 indi-
cates the state is not a halt state). The protected methods of TuringMachine are
tcrash to print an error message when the machine crashes, add to extend the list
dynamically to accommodate new symbols on the tape, lookup to find the transition
for the current symbol on the tape and current state of the machine, and ishalt to
determine if the current state is a halt state. The constructor of TuringMachine
takes as arguments a transition table, an integer specifying how many transitions
the Turing machine has and an integer identifying the start state. The destructor
deallocates the list used for the tape. The method run takes as arguments a string
as input and an integer specifying the length of the input.

II turing.cpp

#include <iostream>
#include <string>

using namespace std;

class TuringMachine; II forward declaration

class Tapecell
{
protected:
char symbol;
Tapecell *next,*previous;

friend class TuringMachine;


};

class Transition
{
public:
int state,nextstate;
  char input,output,movement,halt;
  Transition(int s = 0,char i = ' ',char o = ' ',
             char m = 1,int ns = 0,char h = 0)
   : state(s),input(i),output(o),
     nextstate(ns),movement(m),halt(h) { }

};

class TuringMachine
{
protected:
Tapecell *tape;
Transition *table;
int ccell,state,tentries,crash,sstate;
void tcrash(char);
void add(char);
Transition *lookup(Tapecell *);
int ishalt(int);

public:
TuringMachine(Transition *,int,int);
  ~TuringMachine();
void run(const string &,int);
static char left,right;
};

II constructor
TuringMachine::TuringMachine(Transition *ttable,int entr,int strt)
{
  int i;

  table = new Transition[entr];
  tape = (Tapecell *)NULL;
  for(i=0;i < entr;i++)
    table[i] = ttable[i];
  ccell = -1;
  sstate = strt;
  tentries = entr;
}

II destructor
TuringMachine::~TuringMachine()
{
Tapecell *cell = tape;

delete[] table;
if(cell != (Tapecell *)NULL)
while(cell->next != (Tapecell *)NULL)
{
cell = cell->next;
if(cell->previous != (Tapecell *)NULL)
delete cell->previous;
}
if(cell != (Tapecell *)NULL)

delete cell;
}

void TuringMachine::run(const string &input,int len)
{
  int i,halt;
  Tapecell *cell = tape;
  Transition *trans;

  for(i=0;i < len;i++)
    add(input[i]);
  ccell = 0;
  crash = 0;
  state = sstate;
  if(cell == (Tapecell *)NULL)
  {
    add(' ');
    cell = tape;
  }
halt = ishalt(state);
while(!crash && !halt)
{
trans = lookup(cell);
if(trans == (Transition *)NULL)
tcrash(cell->symbol);
else
{
if(!crash && !halt)
{
        cell->symbol = trans->output;
        state = trans->nextstate;
if(trans->movement < 0)
{
if(cell->previous == (Tapecell *)NULL)
tcrash(cell->symbol);
else
{
cell = cell->previous;
ccell--;
}
}
else if(trans->movement > 0)
{
if(cell->next == (Tapecell *)NULL)
add(' ');
cell = cell->next; ccell++;
}
else

tcrash(cell->symbol);
}
halt = ishalt(state);
}
}
if(!crash)
{
    cell = tape;
    cout << "Successful completion, tape:" << endl;
    while(cell != (Tapecell *)NULL)
    {
      cout << cell->symbol;
      cell = cell->next;
    }
    cout << endl;
}
cell = tape;
if (cell != (Tapecell *)NULL)
while(cell->next != (Tapecell*)NULL)
{
cell = cell->next;
if(cell->previous != (Tapecell*)NULL)
delete cell->previous;
}
if(cell != (Tapecell *)NULL)
delete cell;
tape = (Tapecell *)NULL;
}

void TuringMachine::tcrash(char symbol)
{
  crash = 1;
  cout << "The Turing Machine crashed at state " << state
       << " and cell " << ccell
       << " with symbol \"" << symbol
       << "\"" << endl;
}

void TuringMachine::add(char symbol)
{
  if(tape == (Tapecell *)NULL)
  {
    tape = new Tapecell;
    tape->next = tape->previous = (Tapecell *)NULL;
    tape->symbol = symbol;
  }
else
{

    Tapecell *cell = tape;

    while(cell->next != (Tapecell *)NULL)
      cell = cell->next;
    cell->next = new Tapecell;
    cell->next->previous = cell;
    cell->next->next = (Tapecell *)NULL;
    cell->next->symbol = symbol;
  }
}

Transition *TuringMachine::lookup(Tapecell *cell)
{
  int i;
  for(i=0;i < tentries;i++)
    if((table[i].state == state)&&(table[i].input == cell->symbol))
      return &(table[i]);
  return (Transition *)NULL;
}

int TuringMachine::ishalt(int state)
{
  int i;
  for(i=0;i < tentries;i++)
    if((table[i].state == state)&&(table[i].halt == 1))
      return 1;
  return 0;
}

char TuringMachine::left = -1,TuringMachine::right = 1;

int main(void)
{
  // parity calculation Turing Machine transitions
  Transition paritytable[8] = {
   Transition(1,'0','0',TuringMachine::right,1,0),
   Transition(1,'1','1',TuringMachine::right,0,0),
   Transition(1,' ','0',TuringMachine::right,2,0),
   Transition(0,'0','0',TuringMachine::right,0,0),
   Transition(0,'1','1',TuringMachine::right,1,0),
   Transition(0,' ','1',TuringMachine::right,3,0),
   Transition(2,' ',' ',TuringMachine::right,2,1), // halt state
   Transition(3,' ',' ',TuringMachine::right,3,1)  // halt state
  };

  // string reverse Turing Machine transitions
  Transition reversetable[29] = {
   Transition(0,'0','0',TuringMachine::right,50,0),
   Transition(0,'1','1',TuringMachine::right,50,0),
   Transition(0,'a',' ',TuringMachine::right,10,0),
   Transition(0,'b',' ',TuringMachine::right,11,0),
   Transition(10,'a','a',TuringMachine::right,10,0),
   Transition(10,'b','b',TuringMachine::right,10,0),
   Transition(10,'0','0',TuringMachine::left,20,0),
   Transition(10,'1','1',TuringMachine::left,20,0),
   Transition(10,' ',' ',TuringMachine::left,20,0),
   Transition(20,'a','0',TuringMachine::left,30,0),
   Transition(20,'b','0',TuringMachine::left,31,0),
   Transition(30,'a','a',TuringMachine::left,30,0),
   Transition(30,'b','b',TuringMachine::left,30,0),
   Transition(30,'0','0',TuringMachine::left,30,0),
   Transition(30,'1','1',TuringMachine::left,30,0),
   Transition(30,' ','0',TuringMachine::right,0,0),
   Transition(11,'a','a',TuringMachine::right,11,0),
   Transition(11,'b','b',TuringMachine::right,11,0),
   Transition(11,'0','0',TuringMachine::left,21,0),
   Transition(11,'1','1',TuringMachine::left,21,0),
   Transition(11,' ',' ',TuringMachine::left,21,0),
   Transition(21,'a','1',TuringMachine::left,30,0),
   Transition(21,'b','1',TuringMachine::left,31,0),
   Transition(31,'a','a',TuringMachine::left,31,0),
   Transition(31,'b','b',TuringMachine::left,31,0),
   Transition(31,'0','0',TuringMachine::left,31,0),
   Transition(31,'1','1',TuringMachine::left,31,0),
   Transition(31,' ','1',TuringMachine::right,0,0),
   Transition(50,' ',' ',TuringMachine::right,50,1) // halt state
  };

  string paritycheck = "01101001";
  string reversecheck = "01101001";

  TuringMachine parity(paritytable,8,0);
  cout << "Parity calculation with input "
       << paritycheck << endl;
  parity.run(paritycheck,8);

  // the reversal machine reads its input in the symbols 'a' (for 0)
  // and 'b' (for 1), so the bit string is translated first
  string reverseinput = reversecheck;
  for(int i=0;i < 8;i++)
    reverseinput[i] = (reversecheck[i]=='0') ? 'a' : 'b';
  TuringMachine reverse(reversetable,29,0);
  cout << "Reverse input "
       << reversecheck << endl;
  reverse.run(reverseinput,8);

  paritycheck[6] = 'a';
  cout << "Crash parity calculation with input "
       << paritycheck << endl;
  parity.run(paritycheck,8);
  return 0;
}

The output of the program is

Parity calculation with input 01101001
Successful completion, tape:
011010011
Reverse input 01101001
Successful completion, tape:
10010110
Crash parity calculation with input 011010a1
The Turing Machine crashed at state 1 and cell 6 with symbol "a"
Chapter 13
Computability and Complexity

13.1 Introduction
Once we have the building blocks for a computing device, we can construct the
device and give it tasks to perform. Some tasks are more difficult than others.
Some tasks may even be impossible for the computing device to perform. This is
the concept of computability. Since tasks can be represented as functions, we need to
determine the computability of functions. The computable functions are obviously
limited by the computing device, but if we choose a sufficiently general computing
device it can serve as a measure for computability.

We also need a measure of the difficulty of tasks. This measure indicates how fast
the task can be done. Some problems are inherently difficult such as prime number
factorization as used in public key cryptography systems, and therefore take a long
time to perform. This is referred to as the complexity of the problem. In general two
measures of complexity are often used, the time complexity and space complexity.
Time complexity describes the amount of time taken to do a task given the input.
Space complexity refers to the amount of memory required to perform the task given
the input. More precisely the measure of complexity is applied to algorithms, since
some algorithms are more efficient than others.

Usually the complexity of an algorithm is described in terms of the size n of the
input. The notation f(n) is (of order) O(g(n)) is used to indicate that there exists
c ∈ R with c > 0 and N_0 ∈ N such that for all N > N_0, |f(N)| ≤ c|g(N)|. For
example (n + 1)^2 is O(n^2) and also O(n^3), since (n + 1)^2 ≤ 4n^2 for all n ≥ 1.

The complexity of sequences of symbols has been analysed [106, 109]. Thus if an
algorithm can be transformed into an appropriate sequence of symbols, the com-
plexity of the sequence can be used as a measure of the complexity of the algorithm.
An example is given in [161].


13.2 Computability

Computability is formulated with respect to given computing models. For example


the Turing machine is a computing model. We could define computability in terms
of what is computable with a Turing machine. A difficulty arises when we note that
a Turing machine can compute more than a finite automaton. Other computing
models exist, but if they are proven to be equivalent to the Turing machine model,
the computable functions remain the same. The computing model must be reason-
able in the sense that the components of the model must be achievable. We need to
determine a reasonable computing model such that no other computing model can
compute more functions.

13.2.1 Church's Thesis

Church's thesis states that the intuitively computable functions are exactly the
partial recursive functions. Sometimes Church's thesis is called the Church-Turing
thesis because it can be formulated as the intuitively computable functions are
the functions which can be computed by Turing machines. To show that these two
statements are equivalent requires that we show that every partial recursive function
can be computed by a Turing machine and every Turing machine computes a partial
recursive function. It is simple to see how to implement the successor function, at
least it is simple to build a binary incrementer Turing machine (in the previous
chapter we showed how to achieve this using Moore and Mealy machines). The
projection operation is also not difficult to implement on a Turing machine. It can
be achieved by reading from the least significant bit to the most significant bit and
if the bit is 0 blank every second word (bit sequence) and if the bit is 1 blank every
first word (bit sequence). We can introduce new symbols to indicate the end of
words and the end of the words on the tape to simplify the implementation. The
zero function is trivial to implement using a Turing machine. It is also necessary to
show that primitive recursion and composition can be realised somehow with Turing
machines. Composition should pose no problem, if new symbols are introduced again
to make the task easier. The composition is a combination of the Turing machines
implementing each of the functions in the composition, and a control structure.
Primitive recursion can be implemented by writing n, n - 1, ... ,0 on the tape after
the input number. The value for n = 0 is part of the Turing machine structure,
independent of the contents of the tape. Once the function value is known for zero,
the value at n = 1 can be calculated and so on, up to n + 1. So we expect that
a Turing machine can compute all primitive recursive functions. A motivation for
the thesis is that Turing machines can compute anything that we can. Given as
much paper as needed we can compute certain functions using basic operations, for
a Turing machine the paper is formally defined by the tape and the basic operations
are formally defined by transitions. Any step in the computation is determined by
the contents of the paper, a TUring machine operates uniquely according to the tape
contents. Since we use the term "intuitively computable" , the statement cannot be
proven. A proof would require a definition of intuitive computability.
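As an illustration of the first step, the successor function on binary numbers can
be written down as a transition table for the Turing machine program turing.cpp
of the previous chapter. The following sketch is not one of the book's listings and
assumes the Transition and TuringMachine classes are in scope; the bits are fed
in from low order to high order, as for the Moore and Mealy incrementers.

// a binary successor (increment) Turing machine
Transition inctable[7] = {
 Transition(0,'1','0',TuringMachine::right,0,0), // 1 + carry: write 0, keep carry
 Transition(0,'0','1',TuringMachine::right,1,0), // 0 + carry: write 1, carry absorbed
 Transition(0,' ','1',TuringMachine::right,2,0), // all bits were 1: new high bit
 Transition(1,'0','0',TuringMachine::right,1,0), // copy the remaining bits
 Transition(1,'1','1',TuringMachine::right,1,0),
 Transition(1,' ',' ',TuringMachine::right,2,0),
 Transition(2,' ',' ',TuringMachine::right,2,1)  // halt state
};

// usage, for example inside main():
//   TuringMachine increment(inctable,7,0);
//   string n = "1101";    // 11 decimal, low order bit first
//   increment.run(n,4);   // resulting tape: 0011, i.e. 12 decimal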

13.2.2 The Halting Problem


An interesting task for a computing model is simulation. If a computing model
A can, in some way, simulate another computing model B then A is at least as
powerful as B. There exists a Turing machine, called a universal Turing machine,
which can simulate any other Turing machine. As input, the table of transitions
and the input of the simulated machine must be stored on the tape. Since we can
number the states from 1 to n we can represent states by a bit string or simply a
symbol duplicated i times for state i. We are completely free to choose the number
of symbols for the representation of states and symbols. We also require a method
of tracking which state the machine is in and which input symbol must be read next.
The following universal Turing machine is due to Minsky [67, 117].

Figure 13.1: Universal Turing Machine

To simplify the machine, if there is no arc for a given state and input then the
machine continues the last motion and replaces the symbol with itself (a transition
to a state for this machine always has the same motion of the tape head). Also an

arc with no label replaces the symbol on the tape with itself and moves the tape
head left. We assume that the machine we wish to simulate uses only binary for
input and output. For each state, a valid transition can be represented in a finite
number of bits, i.e. a fixed number to represent the current and next state, and a
single bit to represent the input, output and movement. The description here uses
a tape which is infinite to the left, with the description of the Turing machine to be
simulated starting at the rightmost position of the tape. The description consists of
transitions represented in binary, where the end of a transition description is marked
by the symbol X. The end of the table of transitions is marked by a Y. Additional
symbols are used to mark the state of the machine. The start state is assumed to
begin immediately under the tape head.

Now we consider some problems the Turing machine cannot solve. For the halting
problem we consider whether a Turing machine H exists which always halts, when
given as input a representation of another Turing machine and its input, and which
gives an output indicating whether the given Turing machine halts or not. A simple
extension gives the machine H' which halts whenever the input machine does not
halt, and never halts when the input machine does halt (achieved by a simple loop
between two states for any symbol read from the tape). Furthermore we require that
the input machine take its own description as input. If we use as input to the machine
H' the machine H' itself, with itself again as input, we obtain a machine which halts
only when the machine does not halt. Thus such a Turing machine H' does not exist.
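The structure of this argument can be made explicit in C++-like pseudocode. The
function halts below is hypothetical; the point of the argument is precisely that it
cannot be implemented, so the sketch is not a runnable program.

// hypothetical: true whenever program p halts on input x (the machine H)
bool halts(const char *program, const char *input);

void trouble(const char *program)   // the machine H'
{
  if(halts(program,program))        // feed the machine its own description
    while(1) { }                    // loop forever if it would halt
  // and halt immediately if it would not
}

// trouble applied to its own description halts if and only if it does not
// halt; this contradiction shows that halts cannot exist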

13.3 Gödel's Incompleteness Theorem


Gödel's Incompleteness Theorem states that not all theorems in number theory can
be proved. An important part of the proof is the Gödel numbering given below. This
can be used to describe any sequence of symbols, for example a theorem's proof, in
terms of the natural numbers.

13.3.1 Gödel Numbering

We can work with an alphabet which contains only a single letter, e.g. the letter I.
The words constructed from this alphabet (apart from the empty word) are: I, II, III,
etc. These words can, in a trivial way, be identified with the natural numbers 0,
1, 2, .... Such an extreme standardization of the "material" is advisable for some
considerations. On the other hand, it is often convenient to have the diversity of an
alphabet consisting of several elements at one's disposal.

The use of an alphabet consisting of one element does not imply any essential
limitation. We can associate the words W over an alphabet A consisting of N
elements with natural numbers G(W), in such a way that each natural number is
associated with at most one word. Similar arguments apply to words of an alphabet
consisting of one element. Such a representation G is called a Gödel numbering [63]
(also called arithmetization) and G(W) is the Gödel number of the word W with
respect to G. The following are the requirements for an arithmetization of W:
1. If W_1 ≠ W_2 then G(W_1) ≠ G(W_2).
2. There exists an algorithm such that for any given word W, the corresponding
natural number G(W) can be computed in a finite number of steps.
3. For any natural number n, it can be decided whether n is the Gödel number
of a word W over A in a finite number of steps.
4. There exists an algorithm such that if n is the Gödel number of a word W over
A, then this word W (which is unique by requirement (1)) can be constructed in
a finite number of steps.
Here is an example of a Gödel numbering. Consider the alphabet with the letters
a, b, c. A word is constructed by any finite concatenation of these - that is, a
placement of these letters side by side in a line. For example, abcbba is a word. We
can then number the words as follows:
Given a word x_1 x_2 ... x_n where each x_i is a, b or c, we assign to it the number

2^{d_1} · 3^{d_2} · ... · p_{n-1}^{d_n}

where p_i is the ith prime number (and 2 is the 0th prime) and

         1 if x_i is a
d_i := { 2 if x_i is b
         3 if x_i is c

The empty word is given the number 0.

For example, the word acbc has the number 2^1 · 3^3 · 5^2 · 7^3 = 463050, and abc has
the number 2^1 · 3^2 · 5^3 = 2250. The number 7350 represents aabb because 7350 =
2^1 · 3^1 · 5^2 · 7^2.
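The numbering is easily computed. The following is a minimal C++ sketch (not one
of the book's listings) for words over the alphabet {a, b, c}; since the product grows
quickly, an unsigned long only accommodates short words.

// godel.cpp
#include <iostream>
#include <string>

using namespace std;

unsigned long godel(const string &word)
{
  unsigned long primes[9] = {2,3,5,7,11,13,17,19,23};
  if(word.length() == 0) return 0;     // the empty word is numbered 0
  unsigned long g = 1;
  for(size_t i=0;i < word.length() && i < 9;i++)
  {
    int d = word[i]-'a'+1;             // d = 1, 2, 3 for a, b, c
    for(int k=0;k < d;k++) g *= primes[i];
  }
  return g;
}

int main(void)
{
  cout << godel("acbc") << endl;       // 463050
  cout << godel("abc") << endl;        // 2250
  cout << godel("aabb") << endl;       // 7350
  return 0;
}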

To show that this numbering satisfies the criteria given above, we use the funda-
mental theorem of arithmetic:
Any natural number ≥ 2 can be represented as a product of primes, and that
product is, except for the order of the primes, unique.
We may number all kinds of objects, not just alphabets. In general, the criteria for
a numbering to be useful are:
1. No two objects have the same number.

2. Given any object, we can "effectively" find the number that corresponds to it.
3. Given any number, we can "effectively" determine if it is assigned to an object
and, if so, to which object.

13.3.2 Gödel's Incompleteness Theorem


Now we give an overview of the incompleteness theorem [10]. We assume that
number theory is consistent, i.e. a theorem and its logical negation cannot both be
proved. If both could be proved, the theory would not be interesting.

Since we have already considered a Gödel numbering we can associate numbers
with theorems and proofs in number theory. Let the predicate p(i, j) of two natural
numbers be true if and only if i is the Gödel number associated with a formula
B(x) with one free variable x and j is the Gödel number associated with the proof
of B(i). Furthermore, if p(i, j) is true then a proof can be constructed for these
specific integers.

Now consider

∀y ¬p(x, y)

with the Gödel number m. This states that there is no proof for the theorem x. Let

A := ∀y ¬p(m, y).

Thus A states that there is no proof of A. Suppose A can be proved and n is
the Gödel number for the proof. Thus p(m, n) is true and can be proved. But a
proof exists for A which means we can prove ∀y ¬p(m, y) which implies ¬p(m, n),
a contradiction. Suppose instead that ¬A can be proved, i.e. ∃y p(m, y) can be
proved. Thus there exists n such that p(m, n) can be proved, and n is a proof of A
which is a contradiction.

Thus if number theory is consistent there exists a theorem such as A which cannot
be proved.

13.4 Complexity
13.4.1 Complexity of Bit Strings
Usually the complexity of an algorithm is expressed in terms of the size of the
input. Many different definitions of complexity have been proposed in the litera-
ture. A few are algorithmic complexity (Kolmogorov-Chaitin) [41], the Lempel-Ziv
complexity [109], the logical depth of Bennett [13], the effective measure of com-
plexity of Grassberger [76], the complexity of a system based on its diversity [94],
the thermodynamic depth [111], and a statistical measure of complexity [113].

We may describe the time complexity in terms of the total number of operations
required for a certain input size, or we may choose some basic operation as the
most expensive (such as multiplication or comparison) and use that to describe
the complexity of an algorithm. We can represent any program as a bitstring,
for example by calculating the Gödel number of the program and using the bit
representation of this number. We can then use, as a measure of complexity, the

compressibility of the bit string. Here we use the measure defined by Lempel and
Ziv [109, 161].

Given a binary string S = s_1 s_2 ... s_n of finite length n, we denote by S(i, j) the
substring s_i s_{i+1} ... s_j (or the empty word if i > j) of S and by v(S) the set of all
substrings of S. If S_1 and S_2 are two strings, S_1 S_2 denotes the concatenation
(appending) of S_1 and S_2. The complexity in the sense of Lempel and Ziv of a finite
string is evaluated from the point of view of a simple self-delimiting learning machine,
which as it scans a given n digit string S = s_1 s_2 ... s_n from left to right, adds a new
string to its memory every time it discovers a substring of consecutive digits not
previously encountered. We begin with the complexity of the empty string as 0.
Suppose we have already scanned the first r digits

R := S(1, r)

and that the complexity c(R) is known. We have to determine if the rest of the
string S(r + 1, n) can be produced by a simple copy operation. To do this we
consider the substrings

Q_{r+i} := S(r + 1, r + i),   1 ≤ i ≤ n − r.

For i < 1 we use the empty string as Q_{r+i}. Initially we consider i = 1. The substring
R Q_{r+i} can be produced by a simple copy operation if

Q_{r+i} ∈ v(R Q_{r+i-1}).

If this is the case and the substring begins at s_j with j ≤ r, we can simply copy
s_{j+k-1} to s_{r+k} for k = 1, 2, ..., i, so we try i + 1. For r + i = n we have the special
case c(R Q_{r+i}) = c(R) + 1. If this is not the case, we have c(R Q_{r+i}) = c(R) + 1 and
repeat the process using R Q_{r+i} as R and i = 1.

For example the bitstring consisting of only 0s or only 1s has complexity 2. Alter-
nating 0s and 1s has complexity 3:

0 · 1 · 01010101 ...

The string 01101000011101001 has complexity 6:

0 · 1 · 10 · 100 · 001 · 1101001.

We give an implementation of this algorithm below.



II complex.cpp

#include <iostream>
#include <cstring>

using namespace std;

int substring(char *s,int r,int i)
{
  int j;

  for(j=0;j <= r;j++)
    if(strncmp(s+r+1,s+j,i) == 0) return 1;

  return 0;
}

int complexity(char *s)
{
  static char *laststring = "";
  static int c,r;
  int n = strlen(s);
  int i;

  if(n == 0) return 0;
  if(laststring != s) { c = 1; r = 0; }
  laststring = s;
  if(r == n-1) return c;

  for(i=1;i < n-r;i++)
    if(!substring(s,r,i))
    {
      c++;
      r += i;
      return complexity(s);
    }

  return ++c;
}

int main(void)
{
  char *str1 = "0101010101";
  char *str2 = "1010101010101010101";
  char *str3 = "01101000011101001";
  char *str4 = "1011001011";

  cout << str1 << " has complexity " << complexity(str1) << endl;
  cout << str2 << " has complexity " << complexity(str2) << endl;
  cout << str3 << " has complexity " << complexity(str3) << endl;
  cout << str4 << " has complexity " << complexity(str4) << endl;
  return 0;
}

The program output is

0101010101 has complexity 3


1010101010101010101 has complexity 3
01101000011101001 has complexity 6
1011001011 has complexity 5

13.4.2 NP-class of Problems


Definition. We can define the time complexity [114] C_t(T, n) of a Turing machine T
as the maximum number of transitions between states for an input of length n on
the tape.

Definition. The space complexity [114] C_s(T, n) of a Turing machine T is the max-
imum number of cells into which T writes.

Definition. A Turing machine T is called polynomial if there exists a polynomial
p(n) such that C_t(T, n) is O(p(n)).

We consider problems on decidability, problems which require a 'yes' or 'no' answer.

Definition. The class of problems on decidability for which there exists a polynomial
Turing machine is called the P-class of problems, denoted by the set P.

Definition. The NP-class of problems (denoted by the set NP) are those problems
for which, when given a potential solution, there exists a polynomial Turing machine
to determine if the solution is valid. The NP stands for non-deterministic polynomial.
Thus if we can find a potential solution, for example by construction using random
numbers such that the probability of constructing an actual solution is sufficiently
high, the validity of the solution can be efficiently checked.

Consider the problem of satisfiability. A logic formula consisting of n Boolean vari-
ables requires (as a worst case) checking 2^n combinations of truth values before we
know if the formula can be satisfied. The truth table method requires the evalua-
tion of all 2^n combinations; for other techniques there always exists a formula for
which the worst case holds. If we have an assignment of truth values to the Boolean
variables, a polynomial operation can check if the assignment satisfies the formula.
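The last remark can be made concrete: checking a given assignment against a
formula in conjunctive normal form takes time proportional to the length of the
formula. The representation below (each clause a list of signed variable indices) is
chosen for this sketch and is not from the book.

// satcheck.cpp
#include <iostream>
#include <vector>
#include <cstdlib>

using namespace std;

// a literal +i stands for x_i, a literal -i for the negation of x_i
bool satisfies(const vector<vector<int> > &clauses,const vector<bool> &x)
{
  for(size_t c=0;c < clauses.size();c++)
  {
    bool clausetrue = false;
    for(size_t l=0;l < clauses[c].size();l++)
    {
      int lit = clauses[c][l];
      bool v = x[abs(lit)-1];
      if((lit > 0 && v) || (lit < 0 && !v)) { clausetrue = true; break; }
    }
    if(!clausetrue) return false;   // one false clause falsifies the formula
  }
  return true;                      // every clause contains a true literal
}

int main(void)
{
  // the formula (x1 OR NOT x2) AND (x2 OR x3)
  vector<vector<int> > f(2);
  f[0].push_back(1); f[0].push_back(-2);
  f[1].push_back(2); f[1].push_back(3);
  vector<bool> x(3,false);
  x[0] = true; x[2] = true;        // the assignment x1 = 1, x2 = 0, x3 = 1
  cout << satisfies(f,x) << endl;  // 1: the assignment satisfies the formula
  return 0;
}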

Definition. A problem NPC on decidability is NP-complete if every problem in
NP is polynomially reducible to NPC.

If A is NP-complete then we can reformulate any problem in NP to the form of
A. Thus if we can solve A we can solve any other problem in NP; furthermore if
a polynomial algorithm exists for A then a polynomial algorithm exists for every
other problem in NP. An important question in complexity is whether the classes P
and NP are the same. This reduces to the question whether A ∈ P for A any
NP-complete problem.

Cook's theorem [114, 183] states that the satisfiability problem is NP-complete.
Since the satisfiability problem is in NP, there exists a polynomial Turing machine
that can check the validity of a solution. The proof of the theorem consists of
analysing the Turing machine and constructing a logical formula, in a polynomial
number of operations, which describes the operation of the Turing machine. The
formula introduces a polynomial number of Boolean variables. The formula is a
conjunction of disjunctions which are the requirements on the Turing machine. For
example, the Turing machine can only be in one state at a time. Thus if the
satisfiability problem is polynomially reducible to a problem A in NP, then A is also
NP-complete.
Chapter 14
Neural Networks

14.1 Introduction
Artificial neural networks is an abstract simulation of a real nervous system that
contains a collection of neuron nets communicating with each other via axon connec-
tions. Such a model bears a strong resemblance to axons and dendrites in a nervous
system. The first fundamental modelling of neural nets was proposed in 1943 by
McCulloch and Pitts in terms of a computational model of "nervous activity". The
McCulloch-Pitts neuron is a binary device and each neuron has a fixed threshold
logic. This model lead to the works of John von Neumann, Marvin Minsky, Frank
Rosenlatt, and many others. Hebb postulated [85], that neurons were appropriately
interconnected by self-organization and that "an existing pathway strengthens the
connections between the neurons". He proposed that the connectivity of the brain
is continually changing as an organism learns different functional tasks, and that
cell assemblies are created by such changes. By embedding a vast number of simple
neurons in an interactive nervous system, it is possible to provide computational
power for very sophisticated information processing.
The neuron is the basic processor in neural networks. Each neuron has one output,
which is generally related to the state of the neuron - its activation - and which
may fan out to several other neurons. Each neuron receives several inputs over
these connections, called synapses. The inputs are the activations of the incoming
neurons multiplied by the weights of the synapses. The activation of the neuron is
computed by applying a threshold function to this product. This threshold function
is generally some form of nonlinear function.
The basic artificial neuron (Cichocki and Unbehauen [45], Fausett [65], Hassoun [82],
Haykin [83], Rojas [139], Steeb [164]) can be modelled as a multi-input nonlinear
device with weighted interconnections w_ji, also called synaptic weights or strengths.
The cell body (soma) is represented by a nonlinear limiting or threshold function f.
The simplest model of an artificial neuron sums the n weighted inputs and passes
the result through a nonlinearity according to the equation

y_j = f( Σ_{i=1}^n w_ji x_i − θ_j )

where f is a threshold function, also called an activation function, θ_j (θ_j ∈ R) is
the external threshold, also called an offset or bias, w_ji are the synaptic weights
or strengths, x_i are the inputs (i = 1, 2, ..., n), n is the number of inputs and y_j
represents the output. The activation function is also called the nonlinear transfer
characteristic or the squashing function. The activation function f is a monotoni-
cally increasing function.

A threshold value θ_j may be introduced by employing an additional input x_0 equal
to +1 and the corresponding weight w_j0 equal to minus the threshold value. Thus
we can write

y_j = f( Σ_{i=0}^n w_ji x_i )

where
w_j0 = −θ_j,   x_0 = 1.

The basic artificial neuron is characterized by its nonlinearity and the threshold
θ_j. The McCulloch-Pitts model of the neuron used only the binary (hard-limiting)
function (step function or Heaviside function), i.e.

H(x) := { 1 if x ≥ 0
        { 0 if x < 0.

In this model a weighted sum of all inputs is compared with a threshold θ_j. If this
sum exceeds the threshold, the neuron output is set to 1, otherwise to 0. For bipolar
representation we can use the sign function

sign(x) := {  1 if x > 0
           {  0 if x = 0
           { −1 if x < 0.

The threshold (step) function may be replaced by a more general nonlinear function
and consequently the output of the neuron y_j can either assume a value of a discrete
set (e.g. {−1, 1}) or vary continuously (e.g. between −1 and 1 or generally between
y_min and y_max > y_min). The activation level or the state of the neuron is measured
by the output signal y_j, e.g. y_j = 1 if the neuron is firing (active) and y_j = 0 if the
neuron is quiescent in the unipolar case and y_j = −1 for the bipolar case.

In the basic neural model the output signal is usually determined by a monotonically
increasing sigmoid function of a weighted sum of the input signals. Such a sigmoid
function can be described for example by

y_j = f(u_j) = tanh(λ u_j)

for a symmetrical (bipolar) representation. For an unsymmetrical unipolar repre-
sentation we have

y_j = f(u_j) = 1/(1 + exp(−λ u_j))

where λ is a positive constant or variable which controls the steepness (slope) of the
sigmoidal function. The quantity u_j is given by

u_j := Σ_{i=0}^n w_ji x_i.

The following program, thresh.cpp, gives an implementation of these threshold
functions.

II thresh.cpp

#include <iostream>
#include <cmath>

using namespace std;

int H(double* w,double* x,int m)
{
  double sum = 0.0;
  for(int i=0; i<=m; i++)
  {
    sum += w[i]*x[i];
  }
  if(sum >= 0.0) return 1;
  else return 0;
}

int sign(double* w,double* x,int m)
{
  double sum = 0.0;
  for(int i=0; i<=m; i++)
  {
    sum += w[i]*x[i];
  }
  if(sum >= 0.0) return 1;
  else return -1;
}

double unipolar(double* w,double* x,int m)
{
  double lambda = 1.0;
  double sum = 0.0;
  for(int i=0; i<=m; i++)
  {
    sum += w[i]*x[i];
  }
  return 1.0/(1.0 + exp(-lambda*sum));
}

double bipolar(double* w,double* x,int m)
{
  double lambda = 1.0;
  double sum = 0.0;
  for(int i=0; i<=m; i++)
  {
    sum += w[i]*x[i];
  }
  return tanh(lambda*sum);
}

int main()
{
  int n = 5;            // length of input vector includes bias
  double theta = 0.5;   // threshold
  // allocate memory for weight vector w
  double* w = NULL;
  w = new double[n];

  w[0] = -theta;
  w[1] = 0.7; w[2] = -1.1; w[3] = 4.5; w[4] = 1.5;

  // allocate memory for input vector x
  double* x = NULL;
  x = new double[n];

  x[0] = 1.0; // bias
  x[1] = 0.7; x[2] = 1.2; x[3] = 1.5; x[4] = -4.5;

  int r1 = H(w,x,n-1);
  cout << "r1 = " << r1 << endl;

  int r2 = sign(w,x,n-1);
  cout << "r2 = " << r2 << endl;

  double r3 = unipolar(w,x,n-1);
  cout << "r3 = " << r3 << endl;

  double r4 = bipolar(w,x,n-1);
  cout << "r4 = " << r4 << endl;

  delete [] w;
  delete [] x;

  return 0;
}

14.2 Hyperplanes
Hyperplanes are used to describe the function of a perceptron. They are used to
classify points in space as being elements of one of two half spaces.

Definition. A hyperplane H_{p,α} is a subset of R^n defined by

H_{p,α} := { x | p^T x = α, x ∈ R^n }

with p ∈ R^n, α ∈ R, and p^T denotes the transpose of p.

A hyperplane H_{p,α} defines two closed half spaces

{ x | p^T x ≥ α, x ∈ R^n },   { x | p^T x ≤ α, x ∈ R^n }

and two open half spaces

H^+_{p,α} := { x | p^T x > α, x ∈ R^n },   H^−_{p,α} := { x | p^T x < α, x ∈ R^n }.

Any point x ∉ H_{p,α} in R^n has the property that either x ∈ H^+_{p,α} or x ∈ H^−_{p,α}.

These definitions can also be expressed in terms of a fixed point on the hyperplane.
Suppose a ∈ R^n is a point on the hyperplane H_{p,α}. Any point x on the hyperplane
must satisfy

p^T x − p^T a = α − α = 0.

Thus we obtain the definitions

H_{p,a} := { x | p^T (x − a) = 0, x ∈ R^n }
H^+_{p,a} := { x | p^T (x − a) > 0, x ∈ R^n }
H^−_{p,a} := { x | p^T (x − a) < 0, x ∈ R^n }.

If S_1 is a subset of one closed half space and S_2 a subset of the other, and S_1 and S_2
are not both subsets of the hyperplane itself, then S_1 and S_2 are said to be properly
separated by H_{p,α}.



Definition. Two sets of points A and B in the n-dimensional space R^n are called
linearly separable if n + 1 real numbers w_0, w_1, ..., w_n exist, such that every point
(x_1, x_2, ..., x_n) ∈ A satisfies Σ_{i=1}^n w_i x_i ≥ w_0 and every point (x_1, x_2, ..., x_n) ∈ B
satisfies Σ_{i=1}^n w_i x_i < w_0.

Definition. Two sets A and B of points in the n-dimensional space R^n are called
absolutely linearly separable if n + 1 real numbers w_0, w_1, ..., w_n exist such that every
point (x_1, x_2, ..., x_n) ∈ A satisfies Σ_{i=1}^n w_i x_i > w_0 and every point (x_1, x_2, ..., x_n) ∈
B satisfies Σ_{i=1}^n w_i x_i < w_0.

Definition. The open (closed) positive half space associated with the n-dimensional
weight vector w is the set of all points x ∈ R^n for which w^T x > 0 (w^T x ≥ 0). The
open (closed) negative half space associated with w is the set of all points x ∈ R^n
for which w^T x < 0 (w^T x ≤ 0).

Example. Consider the plane in R^4 described, for instance, by

x_1 + 2x_2 − x_3 + x_4 = 4

with normal vector

p = (1, 2, −1, 1)^T.

It is a hyperplane H_{p,4}. The point a = (1, 1, 0, 1)^T lies on the plane and can be used
to describe the two half spaces H^+_{p,4} and H^−_{p,4}. To understand the separation
better, we can examine the effect of the division on subspaces. The hyperplane
divides the subspace corresponding to x_3 around the origin. The hyperplane divides
the subspace corresponding to x_1 around 1. The same applies for the subspaces
corresponding to x_2 and x_4. Thus we can classify the following points:

(s, t, u, v)^T ∈ H^+_{p,4},   s, t, v ≥ 1, u ≤ 0
(s, t, u, v)^T ∈ H^−_{p,4},   s, t, v ≤ 1, u ≥ 0

where at least one equality does not hold. Considering two- and three-dimensional
subspaces leads to an even better description of the division of the vector space.
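The classification can be carried out numerically. A minimal C++ sketch (not from
the book's listings) which determines, for a hyperplane through a point a with
normal vector p, to which half space a given point x belongs; the vectors in main
are those of the example above:

// halfspace.cpp
#include <iostream>

using namespace std;

// returns +1, 0 or -1 if x lies in the open positive half space,
// on the hyperplane, or in the open negative half space respectively
int classify(const double *p,const double *a,const double *x,int n)
{
  double s = 0.0;
  for(int i=0;i < n;i++) s += p[i]*(x[i]-a[i]);   // p^T (x - a)
  if(s > 0.0) return 1;
  if(s < 0.0) return -1;
  return 0;
}

int main(void)
{
  double p[4] = {1.0,2.0,-1.0,1.0};    // normal vector of the example
  double a[4] = {1.0,1.0,0.0,1.0};     // point on the hyperplane
  double x[4] = {2.0,1.0,-1.0,1.0};    // s, t, v >= 1 and u <= 0
  cout << classify(p,a,x,4) << endl;   // 1: open positive half space
  return 0;
}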

14.3 Perceptron
14.3.1 Introduction
The perceptron is the simplest form of a neural network used for the classification
of special types of patterns said to be linearly separable (i.e. patterns that lie
on opposite sides of a hyperplane). It consists of a single neuron with adjustable
synaptic weights w_i and threshold θ.

Definition. A perceptron is a computing unit with threshold θ which, when receiving
the n real inputs x_1, x_2, ..., x_n through edges with the associated weights w_1, w_2,
..., w_n, outputs 1 if the inequality

Σ_{i=1}^n w_i x_i ≥ θ

holds and otherwise outputs zero.

The origin of the inputs is not important irrespective of whether they come from
other perceptrons or another class of computing units. The geometric interpretation
of the processing performed by perceptrons is the same as with McCulloch-Pitts
elements. A perceptron separates the input space into two half-spaces. For points
belonging to one half-space the result of the computation is 0, for points belonging
to the other it is 1.
We can also formulate this definition using the Heaviside step function

H(x) := { 1 for x ≥ 0
        { 0 for x < 0.

Thus

H( Σ_{i=1}^n w_i x_i − θ ) = { 1 for Σ_{i=1}^n w_i x_i ≥ θ
                             { 0 for Σ_{i=1}^n w_i x_i < θ.

With w_1, w_2, ..., w_n and θ given, the equation

Σ_{i=1}^n w_i x_i = θ

defines a hyperplane which divides the Euclidean space R^n into two half spaces.

Example. For n = 3 the plane

w_1 x_1 + w_2 x_2 + w_3 x_3 = θ

divides R^3 into two half spaces.

In many cases it is more convenient to deal with perceptrons of threshold zero only.
This corresponds to linear separations which are forced to go through the origin of
the input space. The threshold of the perceptron with a threshold has been converted
into the weight −θ of an additional input channel connected to the constant 1. This
extra weight connected to a constant is called the bias of the element. Thus the
input vector (x_1, x_2, ..., x_n) must be extended with an additional 1 and the resulting
(n + 1)-dimensional vector

(x_0, x_1, x_2, ..., x_n)

is called the extended input vector, where

x_0 = 1.

The extended weight vector associated with this perceptron is

(w_0, w_1, w_2, ..., w_n)

whereby w_0 = −θ.
The threshold computation of a perceptron will be expressed using scalar products.
The arithmetic test computed by the perceptron is thus

w^T x ≥ θ

if w and x are the weight and input vectors, and

w^T x ≥ 0

if w and x are the extended weight and input vectors.

Example. If we are looking for the weights and threshold needed to implement the
AND function with a perceptron, the input vectors and their associated outputs are

(0, 0) ↦ 0, (0, 1) ↦ 0, (1, 0) ↦ 0, (1, 1) ↦ 1.

If a perceptron with threshold zero is used, the input vectors must be extended and
the desired mappings are

(1, 0, 0) ↦ 0, (1, 0, 1) ↦ 0, (1, 1, 0) ↦ 0, (1, 1, 1) ↦ 1.

A perceptron with three still unknown weights (w_0, w_1, w_2) can carry out this task.

Example. The AND gate can be simulated using the perceptron. The AND gate is
given by

Input    Output
0 0      0
0 1      0
1 0      0
1 1      1

Thus the input patterns are

x_0 = (0, 0)^T, x_1 = (0, 1)^T, x_2 = (1, 0)^T, x_3 = (1, 1)^T.

Let

θ = 3/2.

Then

w^T = (1, 1)

and the evaluation of H(w^T x_j − θ) for j = 0, 1, 2, 3 yields

H(w^T x_0 − θ) = H(0 − 3/2) = H(−3/2) = 0

H(w^T x_1 − θ) = H(1 − 3/2) = H(−1/2) = 0

H(w^T x_2 − θ) = H(1 − 3/2) = H(−1/2) = 0

H(w^T x_3 − θ) = H(2 − 3/2) = H(1/2) = 1.

Example. Consider the Boolean function

f(x_1, x_2, x_3) = (x̄_1 · x_2) + (x_2 · x̄_3).

This Boolean function can be represented by

y = H(w^T x − θ)

where w^T = (−1, 2, −1) and θ = 1/2 since

x_1  x_2  x_3  y
0    0    0    0
0    0    1    0
0    1    0    1
0    1    1    1
1    0    0    0
1    0    1    0
1    1    0    1
1    1    1    0

Table 14.1: Function Table for the Boolean Function (x̄_1 · x_2) + (x_2 · x̄_3)

Thus to find w and θ we have to solve the following inequalities

0 < θ
w_1 < θ
w_2 > θ
w_3 < θ
w_1 + w_2 > θ
w_1 + w_3 < θ
w_2 + w_3 > θ
w_1 + w_2 + w_3 < θ

which admits the solution

w_1 = −1, w_2 = 2, w_3 = −1, θ = 1/2.
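The solution can be verified in a few lines of C++ (a sketch, not one of the book's
listings), evaluating H(w^T x − θ) for all eight inputs of Table 14.1:

// booleancheck.cpp
#include <iostream>

using namespace std;

int main(void)
{
  double w[3] = {-1.0,2.0,-1.0};
  double theta = 0.5;
  for(int x1=0;x1 <= 1;x1++)
  for(int x2=0;x2 <= 1;x2++)
  for(int x3=0;x3 <= 1;x3++)
  {
    double s = w[0]*x1 + w[1]*x2 + w[2]*x3;
    int y = (s >= theta);            // y = H(w^T x - theta)
    cout << x1 << " " << x2 << " " << x3 << " -> " << y << endl;
  }
  return 0;
}

The output reproduces the function table above.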

14.3.2 Boolean Functions


Which logical functions can be implemented with a single perceptron? A perceptron
network is capable of computing any logical function since perceptrons are even more
powerful than unweighted McCulloch-Pitts elements. If we reduce the network to a
single element, which functions are still computable? Taking the Boolean functions
of two variables we can gain some insight into this problem.

Since we are considering logical functions of two variables, there are four possible
combinations for the input. The outputs for the four inputs are four bits which
uniquely distinguish each logical function. We use the number defined by these four
bits as a subindex for the name of the functions. The function (x_1, x_2) ↦ 0, for
example, is denoted by f_0 (since 0 corresponds to the bit string 0000). The AND
function is denoted by f_8 (since 8 corresponds to the bit string 1000), whereby the
output bits are ordered according to the following ordering of the inputs: (1,1),
(0,1), (1,0), (0,0).

The sixteen possible functions of two variables are thus

f_0(x_1, x_2) = f_0000(x_1, x_2) = 0
f_1(x_1, x_2) = f_0001(x_1, x_2) = x̄_1 · x̄_2
f_2(x_1, x_2) = f_0010(x_1, x_2) = x_1 · x̄_2
f_3(x_1, x_2) = f_0011(x_1, x_2) = x̄_2
f_4(x_1, x_2) = f_0100(x_1, x_2) = x̄_1 · x_2
f_5(x_1, x_2) = f_0101(x_1, x_2) = x̄_1
f_6(x_1, x_2) = f_0110(x_1, x_2) = x_1 ⊕ x_2
f_7(x_1, x_2) = f_0111(x_1, x_2) = x̄_1 + x̄_2
f_8(x_1, x_2) = f_1000(x_1, x_2) = x_1 · x_2
f_9(x_1, x_2) = f_1001(x_1, x_2) = x̄_1 ⊕ x_2
f_10(x_1, x_2) = f_1010(x_1, x_2) = x_1
f_11(x_1, x_2) = f_1011(x_1, x_2) = x_1 + x̄_2
f_12(x_1, x_2) = f_1100(x_1, x_2) = x_2
f_13(x_1, x_2) = f_1101(x_1, x_2) = x̄_1 + x_2
f_14(x_1, x_2) = f_1110(x_1, x_2) = x_1 + x_2
f_15(x_1, x_2) = f_1111(x_1, x_2) = 1.
The function f_0 is the zero function whereas f_14 is the inclusive OR-function.
Perceptron-computable functions are those for which the points whose function value
is 0 can be separated from the points whose function value is 1 using a line. For the
AND function and OR function we can find such a separation.

Two of the functions cannot be computed in this way. They are the function XOR
(exclusive OR) (function f_6) and the function XNOR f_9. No line can produce the
necessary separation of the input space. This can also be shown analytically.

Let w_1 and w_2 be the weights of a perceptron with two inputs, and θ its threshold.
If the perceptron computes the XOR function the following four inequalities must
be fulfilled:

x_1 = 0, x_2 = 0:  w_1 x_1 + w_2 x_2 = 0          ⟹  0 < θ
x_1 = 1, x_2 = 0:  w_1 x_1 + w_2 x_2 = w_1        ⟹  w_1 ≥ θ
x_1 = 0, x_2 = 1:  w_1 x_1 + w_2 x_2 = w_2        ⟹  w_2 ≥ θ
x_1 = 1, x_2 = 1:  w_1 x_1 + w_2 x_2 = w_1 + w_2  ⟹  w_1 + w_2 < θ.

Since the threshold θ is positive, according to the first inequality, w_1 and w_2 are
positive too, according to the second and third inequalities. Therefore the inequality
w_1 + w_2 < θ cannot be true. This contradiction implies that no perceptron capable
of computing the XOR function exists. An analogous proof holds for the function f_9.

A perceptron can only compute linearly separable functions. When n = 2, 14 out
of the 16 possible Boolean functions are linearly separable. When n = 3, 104 out
of 256 and when n = 4, 1882 out of 65536 possible functions are linearly separable.
No formula for expressing the number of linearly separable functions as a function
of n has yet been found.

Thus using

y = H(w^T x − θ)

we cannot represent all Boolean functions. However we can realize the universal
NAND-gate (or universal NOR-gate). Thus any Boolean function can be realized
using a network of linear threshold gates. For example the XOR gate can be con-
structed as in Figure 14.1.

Figure 14.1: XOR Implementation Using NAND Operations

Now the NAND gate can be represented by

w_1 = w_2 = −1/3,   θ = −1/2.

This is a solution to the set of inequalities

−θ ≥ 0,   w_1 − θ ≥ 0,   w_2 − θ ≥ 0,   w_1 + w_2 − θ < 0.

Thus the XOR gate can be simulated by

y = H(w_1 t_1 + w_2 t_2 − θ)
  = H(w_1 H(w_1 x_1 + w_2 s − θ) + w_2 H(w_1 s + w_2 x_2 − θ) − θ)
  = H(w_1 H(w_1 x_1 + w_2 H(w_1 x_1 + w_2 x_2 − θ) − θ)
     + w_2 H(w_1 H(w_1 x_1 + w_2 x_2 − θ) + w_2 x_2 − θ) − θ)

where s = H(w_1 x_1 + w_2 x_2 − θ) = NAND(x_1, x_2), t_1 = NAND(x_1, s) and
t_2 = NAND(x_2, s).
The program below implements this equation.

// xor.cpp

#include <iostream>

using namespace std;

int H(double x) { return (x>=0); }

int NAND(int x1,int x2) { return H(-x1/3.0-x2/3.0+0.5); }

int XOR(int x1,int x2)
{
  int s = NAND(x1,x2);
  int t1 = NAND(x1,s);
  int t2 = NAND(x2,s);
  return NAND(t1,t2);
}

int main(void)
{
  cout << "XOR(0,0) = " << XOR(0,0) << endl;
  cout << "XOR(0,1) = " << XOR(0,1) << endl;
  cout << "XOR(1,0) = " << XOR(1,0) << endl;
  cout << "XOR(1,1) = " << XOR(1,1) << endl;
  return 0;
}

The program output is

XOR(0,0) = 0
XOR(0,1) = 1
XOR(1,0) = 1
XOR(1,1) = 0

14.3.3 Perceptron Learning

A learning algorithm is an adaptive method by which a network of computing units


self-organize to implement the desired behavior. This is done in some learning
algorithms by presenting some examples of the desired input-output mapping to
the network. A correction step is executed iteratively until the network learns to
produce the desired response.

Learning algorithms can be divided into supervised and unsupervised methods. Su-
pervised learning denotes a method in which some input vectors are collected and
presented to the network. The output computed by the network is observed and
the deviation from the expected answer is measured. The weights are corrected ac-
cording to the magnitude of the error in the way defined by the learning algorithm.

Unsupervised learning is used when, for a given input, the exact numerical output
a network should produce is unknown. Assume, for example, that some points in
two-dimensional space are to be classified into three clusters. We can use a classifier
network with three output lines. Each of the three computing units at the output
must specialize by firing only for inputs corresponding to elements of each cluster.
If one unit fires, the others must keep silent. In this case we do not know a priori
which unit is going to specialize on which cluster. Generally we do not even know
how many well-defined clusters are present. The network must organize itself in
order to be able to associate clusters with units.

Supervised learning is further divided into methods which use reinforcement or error
correction. Reinforcement learning is used when after each presentation of an input-
output example we only know whether the network produces the desired result or
not. The weights are updated based on this information so that only the input
vector can be used for weight correction. In learning with error correction, the
magnitude of the error, together with the input vector, determines the magnitude
of the corrections to the weights (corrective learning).

The perceptron learning algorithm is an example of supervised learning with rein-
forcement. Some variants use supervised learning with error correction.

The proof of convergence of the perceptron learning algorithm assumes that each
perceptron performs the test w^T x > 0. So far we have been working with percep-
trons which perform the test w^T x ≥ 0. If a perceptron with threshold zero can
linearly separate two finite sets of input vectors, then only a small adjustment to its
weights is needed to obtain an absolute linear separation. This is a direct corollary
of the following proposition.

Proposition. Two finite sets of points, A and B, in n-dimensional space which are
linearly separable are also absolutely linearly separable.

A usual approach for starting the learning algorithm is to initialize the network
weights randomly and to improve these initial parameters, looking at each step to see
whether a better separation of the training set can be achieved. We identify the point
(x1, x2, ..., xn) in n-dimensional space with the vector x with the same coordinates.

Let P and N be two finite sets of points in R^n which we want to separate linearly.
A weight vector is sought so that the points in P belong to its associated positive
half-space and the points in N to the negative half-space. The error of a perceptron
with weight vector w is the number of incorrectly classified points. The learning
algorithm must minimize this error function E(w). Now we introduce the perceptron
learning algorithm. The training set consists of two sets, P and N, in n-dimensional
extended input space. We look for a vector w capable of absolutely separating both
sets, so that all vectors in P belong to the open positive half-space and all vectors
in N to the open negative half-space of the linear separation.

Algorithm. Perceptron learning

start: The weight vector w(t = 0) is generated randomly.

test: A vector x ∈ P ∪ N is selected randomly,
      if x ∈ P and w(t)^T x > 0 goto test,
      if x ∈ P and w(t)^T x ≤ 0 goto add,
      if x ∈ N and w(t)^T x < 0 goto test,
      if x ∈ N and w(t)^T x ≥ 0 goto subtract,

add: set w(t + 1) = w(t) + x and t := t + 1, goto test

subtract: set w(t + 1) = w(t) - x and t := t + 1, goto test

This algorithm makes a correction to the weight vector whenever one of the selected
vectors in P or N has not been classified correctly. The perceptron convergence
theorem guarantees that if the two sets P and N are linearly separable the vector
w is updated only a finite number of times. The routine can be stopped when all
vectors are classified correctly.

Example. Consider the sets in the extended space

P = { (1,2.0,2.0), (1,1.5,1.5)}

N = { (1,0,1), (1,1,0), (1,0,0)}.

Thus in R^2 we consider the two sets of points

{ (2.0, 2.0), (1.5, 1.5) }

and

{ (0,1), (1,0), (0,0) }.

These two sets are separable by the line x1 + x2 = 3/2. Thus w^T = (-3/2, 1, 1).

The following C++ program implements the algorithm.

// classify.cpp

#include <iostream>
#include <stdlib.h>
#include <time.h>

using namespace std;

// P: p points to be classified positively, N: n points to be classified
// negatively, w: weight vector of length d (extended space)
void classify(double **P,double **N,int p,int n,double *w,int d)
{
  int i,j,k,classified = 0;
  double *x,sum;

  srand(time(NULL));
  for(i=0;i<d;i++) w[i] = double(rand())/RAND_MAX;
  k = 0;
  while(!classified)
  {
    // select one of the p+n training vectors at random
    i = rand()%(p+n);
    if(i<p) x = P[i]; else x = N[i-p];
    for(j=0,sum=0;j<d;j++) sum += w[j]*x[j];
    if((i<p) && (sum<=0))
      for(j=0;j<d;j++) w[j] += x[j];
    if((i>=p) && (sum>=0))
      for(j=0;j<d;j++) w[j] -= x[j];
    k++;
    classified = 1;
    // periodically check whether all vectors are classified correctly
    if((k%(2*p+2*n)) == 0)
    {
      for(i=0;(i<p)&&classified;i++)
      {
        for(j=0,sum=0;j<d;j++) sum += w[j]*P[i][j];
        if(sum <= 0) classified = 0;
      }
      for(i=0;(i<n)&&classified;i++)
      {
        for(j=0,sum=0;j<d;j++) sum += w[j]*N[i][j];
        if(sum >= 0) classified = 0;
      }
    }
    else classified = 0;
  }
}

int main(void)
{
  double **P = new double*[2];
  P[0] = new double[3]; P[1] = new double[3];
  P[0][0] = 1.0; P[0][1] = 2.0; P[0][2] = 2.0;
  P[1][0] = 1.0; P[1][1] = 1.5; P[1][2] = 1.5;

  double **N = new double*[3];
  N[0] = new double[3]; N[1] = new double[3]; N[2] = new double[3];
  N[0][0] = 1.0; N[0][1] = 0.0; N[0][2] = 1.0;
  N[1][0] = 1.0; N[1][1] = 1.0; N[1][2] = 0.0;
  N[2][0] = 1.0; N[2][1] = 0.0; N[2][2] = 0.0;

  double *w = new double[3];

  classify(P,N,2,3,w,3);

  cout << "w = ( " << w[0] << " , " << w[1]
       << " , " << w[2] << " )" << endl;

  delete[] P[0]; delete[] P[1];
  delete[] N[0]; delete[] N[1]; delete[] N[2];
  delete[] P; delete[] N;
  delete[] w;
  return 0;
}

The program output is

w = ( -1.59917 , 1.47261 , 1.2703 )



14.3.4 Quadratic Threshold Gates


Thus far we have considered linear threshold gates. We can consider using nonlinear
threshold gates to simulate functions which cannot be simulated with linear thresh-
old gates (for example the XOR operation). This can be accomplished by expanding
the number of inputs to a linear threshold gate. For example, one can do this by
feeding the products or AND of inputs as new inputs to the linear threshold gate.
In this case, we require a fixed preprocessing layer of AND gates that artificially in-
creases the dimensionality of the input space. We expect that the resulting Boolean
function (which is now only partially specified) becomes a threshold function and
hence realizable using a single linear threshold gate. The realization of a Boolean
logic function by the preceding process leads to a quadratic threshold gate. The
general transfer characteristics for an n-input quadratic threshold gate are given by

y = 1 if Σ_{i=1}^{n} w_i x_i + Σ_{i=1}^{n} Σ_{j=i}^{n} w_ij x_i x_j ≥ θ, and y = 0 otherwise

for x ∈ R^n, and

y = 1 if Σ_{i=1}^{n} w_i x_i + Σ_{i=1}^{n} Σ_{j=i+1}^{n} w_ij x_i x_j ≥ θ, and y = 0 otherwise

for x ∈ {0,1}^n. The only difference between the above two equations is the range of
the index j of the second summation in the double-summation term. For binary
inputs x_i x_i = x_i, so the diagonal terms can be absorbed into the linear weights
w_i. The bounds on the double summations eliminate w_ij x_i x_j and w_ji x_j x_i
duplications. Quadratic threshold gates greatly increase the number of realizable
Boolean functions when compared to linear threshold gates.

Example. Consider the quadratic threshold gate with weights w1 = w2 = -1, w12 = 3
and threshold θ = -1/2. This quadratic threshold gate can be used to implement the
XNOR operation. The gate classifies points in R^2 according to g(x,y) ≥ 0 and
g(x,y) < 0 where

g(x,y) = -x - y + 3xy + 1/2.

The gate is illustrated in Figure 14.2.

[Figure: the unit square, showing the curve g(x,y) = 0 which separates the region
with g(x,y) > 0, containing the points (0,0) and (1,1), from the two regions with
g(x,y) < 0, containing the points (0,1) and (1,0).]

Figure 14.2: Quadratic Threshold Gate for XNOR

The following program illustrates a quadratic threshold gate for the XNOR opera-
tion.

// quadratic.cpp

#include <iostream>

using namespace std;

// quadratic threshold gate: returns 1 if the weighted sum of the inputs
// and of the products of inputs is nonnegative, otherwise 0
double f(double *x,double *wv,double **wm,int n)
{
  double sum = 0.0;
  int i,j;

  for(i=0;i<=n;i++)
    sum += wv[i]*x[i];
  for(i=0;i<=n;i++)
    for(j=i;j<=n;j++)
      sum += wm[i][j]*x[i]*x[j];

  if(sum >= 0) return 1.0;
  return 0.0;
}

int main(void)
{
  int i;
  int n = 2;
  double T = 0.5; // threshold term, coupled to the bias input x[0] = 1

  double *x = new double[n+1];
  double *wv = new double[n+1];

  double **wm = new double*[n+1];
  for(i=0;i<=n;i++)
    wm[i] = new double[n+1];

  wv[0] = T; wv[1] = -1.0; wv[2] = -1.0;

  wm[0][0] = 0.0; wm[0][1] = 0.0; wm[0][2] = 0.0;
  wm[1][0] = 0.0; wm[1][1] = 0.0; wm[1][2] = 3.0;
  wm[2][0] = 0.0; wm[2][1] = 0.0; wm[2][2] = 0.0;

  x[0] = 1.0;
  // case 1
  x[1] = 0.0; x[2] = 0.0;
  double r00 = f(x,wv,wm,n);
  cout << "r00 = " << r00 << endl;
  // case 2
  x[1] = 0.0; x[2] = 1.0;
  double r01 = f(x,wv,wm,n);
  cout << "r01 = " << r01 << endl;
  // case 3
  x[1] = 1.0; x[2] = 0.0;
  double r10 = f(x,wv,wm,n);
  cout << "r10 = " << r10 << endl;
  // case 4
  x[1] = 1.0; x[2] = 1.0;
  double r11 = f(x,wv,wm,n);
  cout << "r11 = " << r11 << endl;

  delete[] x;
  delete[] wv;
  for(i=0;i<=n;i++) delete[] wm[i];
  delete[] wm;
  return 0;
}

14.3.5 One and Two Layered Networks


We now consider feed-forward networks structured in successive layers of computing
units. The networks we consider must be defined in a more precise way in terms of
their architecture. The atomic elements of any architecture are the computing units
and their interconnections. Each computing unit collects the information from n
input lines with an integration function mapping R^n → R. The total excitation computed
in this way is then evaluated using an activation function f: R → R. In perceptrons
the integration function is the sum of the inputs. The activation function, also called
output function, compares the sum with a threshold. We can generalize f to produce all
values between 0 and 1. For the integration function some functions other than addition can
also be considered. In this case the networks can compute some difficult functions
with fewer computing units.

Definition. A network architecture is a tuple (I, N, O, E) consisting of a set I of
input sites, a set N of computing units, a set O of output sites and a set E of weighted
directed edges. A directed edge is a tuple (u, v, w) whereby u ∈ I ∪ N, v ∈ N ∪ O
and w ∈ R.

The input sites are entry points for information into the network and do not perform
any computation. Results are transmitted to the output sites. The set N consists
of all computing elements in the network. The edges between all computing units
are weighted, as are the edges between input and output sites and computing units.

Layered architectures are those in which the set of computing units N is subdivided
into ℓ subsets N_1, N_2, ..., N_ℓ in such a way that only connections from units in N_1
go to units in N_2, from units in N_2 to units in N_3, etc. The input sites are only
connected to the units in the subset N_1, and the units in the subset N_ℓ are the only
ones connected to the output sites. The units in N_ℓ are the output units of the
network. The subsets N_i are called the layers of the network. The set of input sites
is called the input layer, the set of output units is called the output layer. All other
layers with no direct connections from or to the outside are called hidden layers.
Usually the units in a layer are not connected to each other and the output sites
are omitted from the graphical representation. A neural network with a layered
architecture does not contain cycles. The input is processed and relayed from one
layer to the next, until the final result has been computed.

In layered architectures normally all units from one layer are connected to all
units in the following layer. If there are m units in the first layer and n units in the
second one, the total number of weights is mn. The total number of connections
can be rather large; for example, two layers of 100 units each are already joined by
10000 weights.

14.3.6 Perceptron Learning Algorithm


Here we use w0 = -θ for the first component of w. For the input vectors x we use
x0 = 1 for the first component.

A simple perceptron learning algorithm is

1. Initialize the connection weight vector w to small random values.

2. Initialize an acceptable error tolerance ε0.

3. Set Emax = 0.

4. For each of the input patterns { x_j : j = 0, 1, ..., m - 1 } do the following

(a) Calculate the output y_j via

y_j = H(w^T x_j)

where H is the Heaviside function.

(b) Calculate the difference between the output y_j and the desired output ỹ_j
of the network

d_j := ỹ_j - y_j.

(c) Calculate the changes in the connection strengths

Δw = η d_j x_j

where η is the learning rate.

(d) Update the connection weight w according to

w → w + Δw.

(e) Set Emax = max(Emax, |d_j|).

5. If Emax > ε0 return to step 3.

Example. Consider the AND gate. Let

w^T = (0.2, 0.1, 0.05),  ε0 = 0.01,  η = 0.5

with the input patterns

x0 = (1,0,0),  x1 = (1,1,0),  x2 = (1,0,1),  x3 = (1,1,1).

The desired output is

ỹ0 = 0,  ỹ1 = 0,  ỹ2 = 0,  ỹ3 = 1.
The calculations yield

1) (a) y0 = H(w^T x0) = 1 ⇒ d0 = ỹ0 - y0 = -1 ⇒ Δw = η d0 (1,0,0) = (-0.5, 0, 0)
       ⇒ w = (-0.3, 0.1, 0.05)
   (b) y1 = H(w^T x1) = 0 ⇒ d1 = 0 ⇒ Δw = 0
   (c) y2 = H(w^T x2) = 0 ⇒ d2 = 0 ⇒ Δw = 0
   (d) y3 = H(w^T x3) = 0 ⇒ d3 = ỹ3 - y3 = 1 ⇒ Δw = η d3 (1,1,1) = (0.5, 0.5, 0.5)
       ⇒ w = (0.2, 0.6, 0.55),  Emax = 1

2) (a) y0 = H(w^T x0) = 1 ⇒ d0 = -1 ⇒ Δw = (-0.5, 0, 0)
       ⇒ w = (-0.3, 0.6, 0.55)
   (b) y1 = H(w^T x1) = 1 ⇒ d1 = -1 ⇒ Δw = (-0.5, -0.5, 0)
       ⇒ w = (-0.8, 0.1, 0.55)
   (c) y2 = H(w^T x2) = 0 ⇒ d2 = 0 ⇒ Δw = 0
   (d) y3 = H(w^T x3) = 0 ⇒ d3 = 1 ⇒ Δw = (0.5, 0.5, 0.5)
       ⇒ w = (-0.3, 0.6, 1.05),  Emax = 1

3) (a) y0 = H(w^T x0) = 0 ⇒ d0 = 0 ⇒ Δw = 0
   (b) y1 = H(w^T x1) = 1 ⇒ d1 = -1 ⇒ Δw = (-0.5, -0.5, 0)
       ⇒ w = (-0.8, 0.1, 1.05)
   (c) y2 = H(w^T x2) = 1 ⇒ d2 = -1 ⇒ Δw = (-0.5, 0, -0.5)
       ⇒ w = (-1.3, 0.1, 0.55)
   (d) y3 = H(w^T x3) = 0 ⇒ d3 = 1 ⇒ Δw = (0.5, 0.5, 0.5)
       ⇒ w = (-0.8, 0.6, 1.05),  Emax = 1

4) (a) y0 = H(w^T x0) = 0 ⇒ d0 = 0 ⇒ Δw = 0
   (b) y1 = H(w^T x1) = 0 ⇒ d1 = 0 ⇒ Δw = 0
   (c) y2 = H(w^T x2) = 1 ⇒ d2 = -1 ⇒ Δw = (-0.5, 0, -0.5)
       ⇒ w = (-1.3, 0.6, 0.55)
   (d) y3 = H(w^T x3) = 0 ⇒ d3 = 1 ⇒ Δw = (0.5, 0.5, 0.5)
       ⇒ w = (-0.8, 1.1, 1.05),  Emax = 1

5) (a) y0 = H(w^T x0) = 0 ⇒ d0 = 0 ⇒ Δw = 0
   (b) y1 = H(w^T x1) = 1 ⇒ d1 = -1 ⇒ Δw = (-0.5, -0.5, 0)
       ⇒ w = (-1.3, 0.6, 1.05)
   (c) y2 = H(w^T x2) = 0 ⇒ d2 = 0 ⇒ Δw = 0
   (d) y3 = H(w^T x3) = 1 ⇒ d3 = 0 ⇒ Δw = 0,  Emax = 1

6) (a) y0 = H(w^T x0) = 0 ⇒ d0 = 0 ⇒ Δw = 0
   (b) y1 = H(w^T x1) = 0 ⇒ d1 = 0 ⇒ Δw = 0
   (c) y2 = H(w^T x2) = 0 ⇒ d2 = 0 ⇒ Δw = 0
   (d) y3 = H(w^T x3) = 1 ⇒ d3 = 0 ⇒ Δw = 0,  Emax = 0

Thus with

w^T = (0.6, 1.05),  θ = 1.3

we can simulate the AND gate.

In the extended space we have

w^T = (-1.3, 0.6, 1.05).


In the program percand.cpp we use the notation of the extended space. Further-
more, the threshold is also initialized to a small random value at t = 0.

// percand.cpp

#include <iostream>
#include <math.h>

using namespace std;

// Heaviside step function
double H(double z)
{
  if(z >= 0.0) return 1.0;
  else return 0.0;
}

// scalar product of two vectors of length n
double scalar(double* u,double* v,int n)
{
  double result = 0.0;
  for(int i=0; i<n; i++)
    result += u[i]*v[i];
  return result;
}

// sum of the absolute differences of two vectors of length n
double distance(double* u,double* v,int n)
{
  double result = 0.0;
  for(int i=0; i<n; i++)
    result += fabs(u[i] - v[i]);
  return result;
}

// one epoch of perceptron learning with learning rate eta
void change(double** x,double* yt,double* w,double eta,int m,int n)
{
  double* d = new double[m];
  for(int j=0; j<m; j++)
  {
    d[j] = yt[j] - H(scalar(w,x[j],n));
    for(int i=0; i<n; i++)
      w[i] = w[i] + eta*d[j]*x[j][i];
  }
  delete[] d;
}

int main()
{
  // number of input vectors (patterns) is m = 4
  // length of each input vector is n = 3
  int m = 4;
  int n = 3;
  double** x = new double*[m];
  for(int k=0; k<m; k++)
    x[k] = new double[n];

  x[0][0] = 1.0; x[0][1] = 0.0; x[0][2] = 0.0;
  x[1][0] = 1.0; x[1][1] = 0.0; x[1][2] = 1.0;
  x[2][0] = 1.0; x[2][1] = 1.0; x[2][2] = 0.0;
  x[3][0] = 1.0; x[3][1] = 1.0; x[3][2] = 1.0;

  // desired output
  double* yt = new double[m];
  yt[0] = 0.0; yt[1] = 0.0; yt[2] = 0.0; yt[3] = 1.0;

  // weight vector, w[0] = - theta (threshold)
  double* w = new double[n];
  // initialized to small values
  w[0] = 0.01; w[1] = 0.005; w[2] = 0.006;

  // learning rate
  double eta = 0.5;

  double* wt = new double[n];
  int i;
  for(i=0; i<n; i++)
    wt[i] = w[i];

  for(;;)
  {
    change(x,yt,w,eta,m,n);
    double dist = distance(w,wt,n);
    if(dist < 0.0001) break;
    for(i=0; i<n; i++)
      wt[i] = w[i];
  }

  // display the weight vector
  for(i=0; i<n; i++)
    cout << "w[" << i << "] = " << w[i] << " ";

  delete[] w;
  delete[] wt;
  delete[] yt;
  for(i=0; i<m; i++)
    delete[] x[i];
  delete[] x;

  return 0;
}

The output is given by

w[0] = -1.49  w[1] = 1.005  w[2] = 0.506

Thus with

w0 = -θ = -1.49,  w1 = 1.005,  w2 = 0.506

we can simulate the AND gate.
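As a quick check, the following short program (a sketch, not part of the original
text) evaluates H(w0 + w1 x1 + w2 x2) with these weights for the four input pairs;
only the input (1,1) gives the output 1.

// checkand.cpp

#include <iostream>

using namespace std;

int main(void)
{
  double w[3] = { -1.49, 1.005, 0.506 }; // w[0] = -theta from the run above
  for(int x1=0;x1<=1;x1++)
    for(int x2=0;x2<=1;x2++)
    {
      double s = w[0] + w[1]*x1 + w[2]*x2;
      cout << "AND(" << x1 << "," << x2 << ") = " << (s >= 0.0) << endl;
    }
  return 0;
}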

14.3.7 The XOR Problem and Two-Layered Networks


The properties of a two-layered network can be discussed using the case of the XOR
function as an example. A single perceptron cannot compute this function, but
a two-layered network can. The network in Figure 14.3 is capable of doing this.
The network consists of an input layer, a hidden layer and an output layer and
three computing units. One of the units in the hidden layer computes the function
x1 ∧ ¬x2, and the other the function ¬x1 ∧ x2. The third unit computes the OR
function, so that the result of the complete network computation is

(x1 ∧ ¬x2) ∨ (¬x1 ∧ x2).
Figure 14.3: A Three-layered Network for the Computation of XOR

The calculations for the XOR gate are as follows. We work in the extended space.
The input vectors are

x0 = (1,0,0),  x1 = (1,0,1),  x2 = (1,1,0),  x3 = (1,1,1).

1) input layer → hidden layer. The weights are

w000 = -0.5,  w001 = 1.0,  w002 = -1.0
w010 = -0.5,  w011 = -1.0,  w012 = 1.0.

The weight has three indices. The first index indicates the layer, in this case 0 for
the input layer. The second index indicates to which node in the hidden layer it
points, where the number for the hidden node is incremented by 1 so that we can
assign the index 0 to the bias in the hidden layer. The third index indicates the
number of the neuron.

Consider the input vector x0 = (1,0,0):

a) H((w000, w001, w002)(1,0,0)^T) = H(-0.5) = 0 =: z1
b) H((w010, w011, w012)(1,0,0)^T) = H(-0.5) = 0 =: z2

Consider the input vector x1 = (1,0,1):

a) H((w000, w001, w002)(1,0,1)^T) = H(-1.5) = 0 =: z1
b) H((w010, w011, w012)(1,0,1)^T) = H(+0.5) = 1 =: z2

Consider the input vector x2 = (1,1,0):

a) H((w000, w001, w002)(1,1,0)^T) = H(+0.5) = 1 =: z1
b) H((w010, w011, w012)(1,1,0)^T) = H(-1.5) = 0 =: z2

Consider the input vector x3 = (1,1,1):

a) H((w000, w001, w002)(1,1,1)^T) = H(-0.5) = 0 =: z1
b) H((w010, w011, w012)(1,1,1)^T) = H(-0.5) = 0 =: z2

2) hidden layer → output. The input patterns from the hidden layer, in the form
(1, z1, z2), are (1,0,0), (1,0,1), (1,1,0) and (1,0,0). Thus the first and the last
patterns are the same. The weights are

w100 = -0.5,  w101 = 1.0,  w102 = 1.0.

Consider input pattern (1,0,0) from the hidden layer

H((w100, w101, w102)(1,0,0)^T) = H(-0.5) = 0.

Consider input pattern (1,0,1) from the hidden layer

H((w100, w101, w102)(1,0,1)^T) = H(+0.5) = 1.

Consider input pattern (1,1,0) from the hidden layer

H((w100, w101, w102)(1,1,0)^T) = H(+0.5) = 1.

The input pattern (1,0,0) from the hidden layer was already considered above and
yields 0. Thus we have simulated the XOR gate using a hidden layer.



// XOR1.cpp

#include <iostream>

using namespace std;

// Heaviside step function
double H(double s)
{
  if(s >= 0.0) return 1.0;
  else return 0.0;
}

double map(double*** w,double* testpattern,int size2,int size3)
{
  int k;
  // hidden layer vector (1, z1, z2)
  double* z = new double[size3];
  z[0] = 1.0; z[1] = 0.0; z[2] = 0.0;

  // input layer to hidden layer
  for(k=0; k<size3; k++)
  {
    z[1] += w[0][0][k]*testpattern[k];
    z[2] += w[0][1][k]*testpattern[k];
  }
  z[1] = H(z[1]);
  z[2] = H(z[2]);

  // hidden layer to output layer
  double y = 0.0;
  for(k=0; k<size3; k++)
    y += w[1][0][k]*z[k];

  delete[] z;

  y = H(y);
  return y;
}

int main()
{
  int size1 = 2, size2 = 2, size3 = 3;
  int i, j;

  double*** w = new double**[size1];
  for(i=0; i<size1; i++)
  {
    w[i] = new double*[size2];
    for(j=0; j<size2; j++)
      w[i][j] = new double[size3];
  }
  w[0][0][0] = -0.5; w[0][0][1] = 1.0;  w[0][0][2] = -1.0;
  w[0][1][0] = -0.5; w[0][1][1] = -1.0; w[0][1][2] = 1.0;

  w[1][0][0] = -0.5; w[1][0][1] = 1.0;  w[1][0][2] = 1.0;
  w[1][1][0] = 0.0;  w[1][1][1] = 0.0;  w[1][1][2] = 0.0;

  // input patterns
  int p = 4; // number of input patterns
  int n = 3; // length of each input pattern
  double** x = new double*[p];
  for(int k=0; k<p; k++)
    x[k] = new double[n];
  x[0][0] = 1.0; x[0][1] = 0.0; x[0][2] = 0.0;
  x[1][0] = 1.0; x[1][1] = 0.0; x[1][2] = 1.0;
  x[2][0] = 1.0; x[2][1] = 1.0; x[2][2] = 0.0;
  x[3][0] = 1.0; x[3][1] = 1.0; x[3][2] = 1.0;

  double result = map(w,x[0],size2,size3);
  cout << "result = " << result << endl; // => 0

  result = map(w,x[1],size2,size3);
  cout << "result = " << result << endl; // => 1

  result = map(w,x[2],size2,size3);
  cout << "result = " << result << endl; // => 1

  result = map(w,x[3],size2,size3);
  cout << "result = " << result << endl; // => 0

  return 0;
}

14.4 Multilayer Perceptrons


14.4.1 Introduction
In a practical application of the back-propagation algorithm, learning results from
the many presentations of a prescribed set of training examples to the multilayer
perceptron. One complete presentation of the entire training set during the learning
process is called an epoch. The learning process is maintained on an epoch-by-epoch
basis until the synaptic weights and threshold levels of the network stabilize and the
average squared error over the entire training set converges to some minimum value.
Randomizing the order of presentation of training examples from one epoch to the
next may improve the learning rate. This randomization tends to make the search in
weight space stochastic over the learning cycles, thus avoiding the possibility of limit
cycles in the evolution of the synaptic weight vectors. In our notation we follow
Hassoun [82] closely. For a given training set, back-propagation learning may thus
proceed in one of two basic ways.

Let

{ x_k, d_k }

be the training data, where k = 0, 1, ..., m - 1. Here m is the number of training
examples (patterns). The vectors x_k are the input patterns and the vectors d_k are
the corresponding (desired) output patterns.

1. Pattern Mode. In the pattern mode of back-propagation learning, weight updat-
ing is performed after the presentation of each training example; this is the mode
of operation for which the derivation of the back-propagation algorithm presented
here applies. To be specific, consider an epoch consisting of m training examples
(patterns) arranged in the order

x0, d0,  x1, d1,  ...,  x_{m-1}, d_{m-1}.

The first example x0, d0 in the epoch is presented to the network, and the sequence
of forward and backward computations described below is performed, resulting in
certain adjustments to the synaptic weights and threshold levels of the network.
Then the second example x1, d1 in the epoch is presented, and the sequence of
forward and backward computations is repeated, resulting in further adjustments
to the synaptic weights and threshold levels. This process is continued until the last
training pattern x_{m-1}, d_{m-1} is taken into account.

2. Batch Mode. In the batch mode of back-propagation learning, weight updating
is performed after the presentation of all the training examples that constitute an
epoch.
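The difference between the two modes can be made concrete with a small program.
The following sketch (not from the text) trains a single sigmoid unit on the AND
function once with pattern-mode updates and once with batch-mode updates; the
initial weights, learning rate and epoch count are arbitrary choices.

// modes.cpp

#include <iostream>
#include <math.h>

using namespace std;

double f(double s) { return 1.0/(1.0 + exp(-s)); }

int main(void)
{
  double x[4][3] = {{1,0,0},{1,0,1},{1,1,0},{1,1,1}}; // bias input x0 = 1
  double d[4] = { 0.0, 0.0, 0.0, 1.0 }; // desired outputs (AND)
  double eta = 0.5;
  int T = 2000;

  // pattern mode: the weights change after every example
  double wp[3] = { 0.1, -0.1, 0.05 };
  for(int t=0;t<T;t++)
    for(int k=0;k<4;k++)
    {
      double y = f(wp[0]*x[k][0]+wp[1]*x[k][1]+wp[2]*x[k][2]);
      for(int i=0;i<3;i++) wp[i] += eta*(d[k]-y)*y*(1.0-y)*x[k][i];
    }

  // batch mode: the corrections are summed and applied once per epoch
  double wb[3] = { 0.1, -0.1, 0.05 };
  for(int t=0;t<T;t++)
  {
    double dw[3] = { 0.0, 0.0, 0.0 };
    for(int k=0;k<4;k++)
    {
      double y = f(wb[0]*x[k][0]+wb[1]*x[k][1]+wb[2]*x[k][2]);
      for(int i=0;i<3;i++) dw[i] += eta*(d[k]-y)*y*(1.0-y)*x[k][i];
    }
    for(int i=0;i<3;i++) wb[i] += dw[i];
  }

  for(int k=0;k<4;k++)
    cout << "pattern mode: " << f(wp[0]+wp[1]*x[k][1]+wp[2]*x[k][2])
         << "   batch mode: " << f(wb[0]+wb[1]*x[k][1]+wb[2]*x[k][2]) << endl;
  return 0;
}

Both runs drive the four outputs toward 0, 0, 0, 1; only the schedule of the weight
updates differs.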

14.4.2 Cybenko's Theorem


Single-hidden-layer neural networks are universal approximators. A rigorous math-
ematical proof for the universality of feedforward layered neural networks employing
continuous sigmoid type activation functions, as well as other more general activa-
tion units, was given by Cybenko [52]. Cybenko's proof is based on the Hahn-Banach
theorem. The following is the statement of Cybenko's theorem.

Theorem. Let f be any continuous sigmoid-type function, for example

f(s) = 1/(1 + exp(-λs)),  λ > 0.


Then, given any continuous real-valued function g on [0,1]^n (or any other compact
subset of R^n) and ε > 0, there exist vectors w1, w2, ..., wN, α, and θ and a
parameterized function

G(·, w, α, θ): [0,1]^n → R

such that

|G(x, w, α, θ) - g(x)| < ε  for all x ∈ [0,1]^n

where

G(x, w, α, θ) = Σ_{j=1}^{N} α_j f(w_j^T x + θ_j)

and

w_j ∈ R^n,  θ_j ∈ R,  w = (w1, w2, ..., wN)

α = (α1, α2, ..., αN),  θ = (θ1, θ2, ..., θN).

For the proof we refer to the paper by Cybenko [52].

Thus a one-hidden-layer feedforward neural network is capable of approximating
uniformly any continuous multivariate function to any desired degree of accuracy.
This implies that any failure of a function mapping by a multilayer network must
arise from an inadequate choice of parameters, i.e., poor choices for w1, w2, ..., wN,
α and θ, or an insufficient number of hidden nodes.

Hornik et al. [92], employing the Stone-Weierstrass theorem, and Funahashi [69]
proved similar theorems stating that a one-hidden-layer feedforward neural network
is capable of approximating uniformly any continuous multivariate function to any
desired degree of accuracy.
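The form of G can be illustrated numerically. In the following sketch (not from the
text) each pair of steep sigmoids forms a bump over one subinterval of [0,1], and
the weighted sum of the bumps approximates g(x) = sin(πx); the number of
subintervals N and the steepness of the sigmoids are arbitrary choices.

// cybenko.cpp

#include <iostream>
#include <math.h>

using namespace std;

double f(double s) { return 1.0/(1.0 + exp(-s)); }

int main(void)
{
  const double pi = 3.14159265358979;
  const int N = 20;           // number of subintervals
  const double steep = 100.0; // steepness of the sigmoids
  double alpha[2*N], w[2*N], theta[2*N];
  for(int j=0;j<N;j++)
  {
    double a = double(j)/N, b = double(j+1)/N;
    double gm = sin(pi*(a+b)/2.0); // value of g at the midpoint
    // the bump gm*( f(steep*(x-a)) - f(steep*(x-b)) )
    alpha[2*j] = gm;    w[2*j] = steep;   theta[2*j] = -steep*a;
    alpha[2*j+1] = -gm; w[2*j+1] = steep; theta[2*j+1] = -steep*b;
  }
  for(double x=0.05;x<1.0;x+=0.15)
  {
    double G = 0.0;
    for(int j=0;j<2*N;j++) G += alpha[j]*f(w[j]*x + theta[j]);
    cout << "x = " << x << "  G(x) = " << G
         << "  g(x) = " << sin(pi*x) << endl;
  }
  return 0;
}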

14.4.3 Back-Propagation Algorithm

We consider one hidden layer. The notations we use follow closely Hassoun [82].
Thus we consider a two-layer feedforward architecture. This network receives a set
of scalar signals

x0, x1, ..., x_{n-1}

where x0 is a bias signal set to 1. This set of signals constitutes an input vector
x_k ∈ R^n. The layer receiving this input signal is called the hidden layer. The hidden
layer has J units. The output of the hidden layer is a J-dimensional real-valued
vector z_k = (z0, z1, ..., z_{J-1}), where we set z0 = 1 (bias signal). The vector z_k
supplies the input for the output layer of L units. The output layer generates an
L-dimensional vector y_k in response to the input vector x_k which, when the network
is fully trained, should be identical (or very close) to the desired output vector d_k
associated with x_k.

The two activation functions f_h (input layer to hidden layer) and f_o (hidden layer
to output layer) are assumed to be differentiable functions. We use the logistic
functions

f_h(s) := 1/(1 + exp(-λ_h s)),   f_o(s) := 1/(1 + exp(-λ_o s))

where λ_h, λ_o ≥ 1. The logistic function

f(s) = 1/(1 + exp(-λs))

satisfies the nonlinear differential equation

df/ds = λ f (1 - f).
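This identity is easily checked numerically. The following sketch (not part of the
text) compares a central difference quotient of the logistic function with λf(1 - f)
for λ = 10.

// logistic.cpp

#include <iostream>
#include <math.h>

using namespace std;

int main(void)
{
  double lambda = 10.0, h = 1.0e-6;
  for(double s=-1.0;s<=1.0;s+=0.5)
  {
    double fs = 1.0/(1.0 + exp(-lambda*s));
    double fp = 1.0/(1.0 + exp(-lambda*(s+h)));
    double fm = 1.0/(1.0 + exp(-lambda*(s-h)));
    cout << "s = " << s
         << "  difference quotient = " << (fp-fm)/(2.0*h)
         << "  lambda*f*(1-f) = " << lambda*fs*(1.0-fs) << endl;
  }
  return 0;
}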

The components of the desired output vector d_k must be chosen within the range
of f_o. We denote by w_ji the weight of the jth hidden unit associated with the input
signal x_i. Thus the index i runs from 0 to n - 1, where x0 = 1, and j runs from 1 to
J - 1. We set w_0i = 0. Now we have m input/output pairs of vectors

(x_k, d_k)

where the index k runs from 0 to m - 1. The aim of the algorithm is to adap-
tively adjust the (J - 1)n + LJ weights of the network such that the underlying
function/mapping represented by the training set is approximated or learned. We
can define an error function since the learning is supervised, i.e. the target outputs
are available. We denote by w_lj the weight of the lth output unit associated with
the input signal z_j from the hidden layer. We derive a supervised learning rule for
adjusting the weights w_ji and w_lj such that the error function

E(w) = (1/2) Σ_{l=0}^{L-1} (d_l - y_l)^2

is minimized (in a local sense) over the training set. Here w represents the set of
all weights in the network.

Since the targets for the output units are given, we can use the delta rule directly
for updating the w_lj weights. We define

Δw_lj := -η_o ∂E/∂w_lj

where η_o is the learning rate. Since

∂E/∂w_lj = -(d_l - y_l) f'_o(net_l) z_j

we find using the chain rule

Δw_lj = η_o (d_l - y_l) f'_o(net_l) z_j,   w_lj^new = w_lj + Δw_lj

where l = 0, 1, ..., L - 1 and j = 0, 1, ..., J - 1. Here

net_l := Σ_{j=0}^{J-1} w_lj z_j

is the weighted sum for the lth output unit, f'_o is the derivative of f_o with respect to
net_l, and w_lj^new and w_lj are the updated (new) and current weight values, respectively.
The z_j values are calculated by propagating the input vector x through the hidden
layer according to

z_j = f_h( Σ_{i=0}^{n-1} w_ji x_i ) = f_h(net_j)

where j = 1, 2, ..., J - 1 and z0 = 1 (bias signal). For the hidden-layer weights w_ji
we do not have a set of target values (desired outputs) for hidden units. However, we
can derive the learning rule for hidden units by attempting to minimize the output-
layer error. This amounts to propagating the output errors (d_l - y_l) back through
the output layer toward the hidden units in an attempt to estimate dynamic targets
for these units. Thus a gradient descent is performed on the criterion function

E(w) = (1/2) Σ_{l=0}^{L-1} (d_l - y_l)^2

where w represents the set of all weights in the network. The gradient is calculated
with respect to the hidden weights

Δw_ji = -η_h ∂E/∂w_ji,   j = 1, 2, ..., J - 1,   i = 0, 1, ..., n - 1

where the partial derivative is to be evaluated at the current weight values. We find

∂E/∂w_ji = (∂E/∂z_j)(∂z_j/∂net_j)(∂net_j/∂w_ji)

where

∂net_j/∂w_ji = x_i,   ∂z_j/∂net_j = f'_h(net_j).

We used the chain rule in this derivation. Since

∂E/∂z_j = -Σ_{l=0}^{L-1} (d_l - y_l) f'_o(net_l) w_lj

we obtain

Δw_ji = η_h [ Σ_{l=0}^{L-1} (d_l - y_l) f'_o(net_l) w_lj ] f'_h(net_j) x_i.

Now we can define an estimated target d_j for the jth hidden unit implicitly in terms
of the backpropagated error signal as follows

d_j - z_j := Σ_{l=0}^{L-1} (d_l - y_l) f'_o(net_l) w_lj.

The complete approach for updating weights in a feedforward neural net utilizing
these rules can be summarized as follows. We do a pattern-by-pattern updating of
the weights.

1. Initialization. Initialize all weights to small random values and refer to them as
current weights w_lj and w_ji.

2. Learning rate. Set the learning rates η_o and η_h to small positive values.

3. Presentation of training example. Select an input pattern x_k from the training set
(preferably at random), propagate it through the network, thus generating hidden-
and output-unit activities based on the current weight settings. Thus find z_j and y_l.

4. Forward computation. Use the desired target vector d_k associated with x_k, and
employ

Δw_lj = η_o (d_l - y_l) f'_o(net_l) z_j

to compute the output layer weight changes Δw_lj.

5. Backward computation. Use

Δw_ji = η_h [ Σ_{l=0}^{L-1} (d_l - y_l) f'_o(net_l) w_lj ] f'_h(net_j) x_i

to compute the hidden layer weight changes. The current weights are used in these
computations. In general, enhanced error correction may be achieved if one employs
the updated output-layer weights

w_lj^new = w_lj + Δw_lj.

However, this comes at the added cost of recomputing y_l and f'_o(net_l).

6. Update weights. Update all weights according to

w_ji^new = w_ji + Δw_ji

and

w_lj^new = w_lj + Δw_lj

for the hidden and for the output layers, respectively.

7. Test for convergence. This is done by checking the output error function to
see if its magnitude is below some given threshold. Iterate the computation by
presenting new epochs of training examples to the network until the free parameters
of the network stabilize their values. The order of presentation of training examples
should be randomized from epoch to epoch. The learning rate parameter is typically
adjusted (and usually decreased) as the number of training iterations increases.
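Steps 3 to 6 can be condensed into a few lines of code. The following sketch (not
from the text) performs one pattern-mode update for a small network with n = 3,
J = 3 and L = 1, taking λh = λo = 1; the pattern and the initial weights are
arbitrary example numbers.

// bpstep.cpp

#include <iostream>
#include <math.h>

using namespace std;

double f(double s) { return 1.0/(1.0 + exp(-s)); } // lambda = 1, f' = f(1-f)

int main(void)
{
  const int n = 3, J = 3, L = 1;
  double x[n] = { 1.0, 0.0, 1.0 }; // one input pattern (bias x0 = 1)
  double d[L] = { 1.0 };           // its desired output
  double W[J][n]  = {{0,0,0},{0.1,-0.2,0.3},{-0.3,0.2,0.1}}; // hidden weights
  double Wh[L][J] = {{0.2,-0.1,0.3}};                        // output weights
  double etao = 0.5, etah = 0.5;

  // step 3: forward propagation
  double z[J], netj[J], netl[L], y[L];
  z[0] = 1.0;
  for(int j=1;j<J;j++)
  {
    netj[j] = 0.0;
    for(int i=0;i<n;i++) netj[j] += W[j][i]*x[i];
    z[j] = f(netj[j]);
  }
  for(int l=0;l<L;l++)
  {
    netl[l] = 0.0;
    for(int j=0;j<J;j++) netl[l] += Wh[l][j]*z[j];
    y[l] = f(netl[l]);
  }
  cout << "output before the update: " << y[0] << endl;

  // step 5 (with the current output weights): backpropagated error
  double back[J];
  for(int j=0;j<J;j++)
  {
    back[j] = 0.0;
    for(int l=0;l<L;l++) back[j] += (d[l]-y[l])*y[l]*(1.0-y[l])*Wh[l][j];
  }
  // steps 4 and 6: delta rule for the output layer
  for(int l=0;l<L;l++)
    for(int j=0;j<J;j++)
      Wh[l][j] += etao*(d[l]-y[l])*y[l]*(1.0-y[l])*z[j];
  // steps 5 and 6: hidden layer update
  for(int j=1;j<J;j++)
    for(int i=0;i<n;i++)
      W[j][i] += etah*back[j]*z[j]*(1.0-z[j])*x[i];

  return 0;
}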

An example of the back-propagation algorithm applied to the XOR problem is given
in [164]. In the C++ program we apply the back-propagation algorithm to the parity
function, where m = 16 is the number of input vectors, each of length 5 (includes
the bias input). The training set is given in Table 14.2. The number of hidden
layer units is 5, which includes the bias input z0 = 1. The neural network must
calculate the parity bit such that the total number of 1 bits, data bits plus parity
bit, is odd. By modifying m, n, J and L the program can easily be adapted to other
problems. The arrays x[i] are the input values. The value x[i][0] is always 1
for the threshold. The arrays d[i] are the desired outputs for each input x[i]. In
this case d[i] is the odd-parity bit calculated from x[i][1]-x[i][4]. In the program
the value y[0], after each calculation, gives the neural network approximation of
the parity calculation.

The following table gives the training set for the odd parity function over four bits.
The equation is

P = A3 ⊕ A2 ⊕ A1 ⊕ A0 ⊕ 1

where P is the odd parity bit and A0, A1, A2 and A3 are the inputs.

A3 A2 A1 A0 | P
 0  0  0  0 | 1
 0  0  0  1 | 0
 0  0  1  0 | 0
 0  0  1  1 | 1
 0  1  0  0 | 0
 0  1  0  1 | 1
 0  1  1  0 | 1
 0  1  1  1 | 0
 1  0  0  0 | 0
 1  0  0  1 | 1
 1  0  1  0 | 1
 1  0  1  1 | 0
 1  1  0  0 | 1
 1  1  0  1 | 0
 1  1  1  0 | 0
 1  1  1  1 | 1

Table 14.2: Training Set for Parity Function
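The table can be generated with a few lines of code. The following sketch (not part
of the original text) uses P = 1 ⊕ A3 ⊕ A2 ⊕ A1 ⊕ A0.

// paritytable.cpp

#include <iostream>

using namespace std;

int main(void)
{
  for(int a=0;a<16;a++)
  {
    int A3 = (a>>3)&1, A2 = (a>>2)&1, A1 = (a>>1)&1, A0 = a&1;
    int P = 1^(A3^A2^A1^A0); // parity bit, makes the total number of 1s odd
    cout << A3 << " " << A2 << " " << A1 << " " << A0 << "  " << P << endl;
  }
  return 0;
}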

// backpr2.cpp
// back propagation

#include <iostream>
#include <math.h> // for exp

using namespace std;

// activation function (input layer -> hidden layer)
double fh(double net)
{
  double lambdah = 10.0;
  return 1.0/(1.0 + exp(-lambdah*net));
}

// activation function (hidden layer -> output layer)
double fo(double net)
{
  double lambdao = 10.0;
  return 1.0/(1.0 + exp(-lambdao*net));
}

// scalar product of two vectors of given length
double scalar(double* a1,double* a2,int length)
{
  double result = 0.0;
  for(int i=0; i<length; i++)
  {
    result += a1[i]*a2[i];
  }
  return result;
}

int main()
{
  int k, i, j, l, t; // summation indices
  // k runs over all input patterns k = 0, 1, .., m-1
  // l runs over all output units l = 0, 1, .., L-1
  // j runs over all the hidden layer units j = 0, 1, .., J-1
  // i runs over the length of the input vector i = 0, 1, .., n-1

  // learning rates
  double etao = 0.05;
  double etah = 0.05;

  double lambdao = 10.0;
  double lambdah = 10.0;

  // memory allocations
  int m = 16; // number of input vectors for Parity problem
  int n = 5;  // length of each input vector for Parity problem
  // input vectors
  double** x = new double*[m];
  for(k=0; k<m; k++) x[k] = new double[n];

  x[0][0] = 1.0;  x[0][1] = 0.0;  x[0][2] = 0.0;  x[0][3] = 0.0;  x[0][4] = 0.0;
  x[1][0] = 1.0;  x[1][1] = 0.0;  x[1][2] = 0.0;  x[1][3] = 0.0;  x[1][4] = 1.0;
  x[2][0] = 1.0;  x[2][1] = 0.0;  x[2][2] = 0.0;  x[2][3] = 1.0;  x[2][4] = 0.0;
  x[3][0] = 1.0;  x[3][1] = 0.0;  x[3][2] = 0.0;  x[3][3] = 1.0;  x[3][4] = 1.0;
  x[4][0] = 1.0;  x[4][1] = 0.0;  x[4][2] = 1.0;  x[4][3] = 0.0;  x[4][4] = 0.0;
  x[5][0] = 1.0;  x[5][1] = 0.0;  x[5][2] = 1.0;  x[5][3] = 0.0;  x[5][4] = 1.0;
  x[6][0] = 1.0;  x[6][1] = 0.0;  x[6][2] = 1.0;  x[6][3] = 1.0;  x[6][4] = 0.0;
  x[7][0] = 1.0;  x[7][1] = 0.0;  x[7][2] = 1.0;  x[7][3] = 1.0;  x[7][4] = 1.0;
  x[8][0] = 1.0;  x[8][1] = 1.0;  x[8][2] = 0.0;  x[8][3] = 0.0;  x[8][4] = 0.0;
  x[9][0] = 1.0;  x[9][1] = 1.0;  x[9][2] = 0.0;  x[9][3] = 0.0;  x[9][4] = 1.0;
  x[10][0] = 1.0; x[10][1] = 1.0; x[10][2] = 0.0; x[10][3] = 1.0; x[10][4] = 0.0;
  x[11][0] = 1.0; x[11][1] = 1.0; x[11][2] = 0.0; x[11][3] = 1.0; x[11][4] = 1.0;
  x[12][0] = 1.0; x[12][1] = 1.0; x[12][2] = 1.0; x[12][3] = 0.0; x[12][4] = 0.0;
  x[13][0] = 1.0; x[13][1] = 1.0; x[13][2] = 1.0; x[13][3] = 0.0; x[13][4] = 1.0;
  x[14][0] = 1.0; x[14][1] = 1.0; x[14][2] = 1.0; x[14][3] = 1.0; x[14][4] = 0.0;
  x[15][0] = 1.0; x[15][1] = 1.0; x[15][2] = 1.0; x[15][3] = 1.0; x[15][4] = 1.0;

  // desired output vectors corresponding to the set of input vectors x
  int L = 1; // number of outputs for Parity problem
  double** d = new double*[m];
  for(k=0; k<m; k++) d[k] = new double[L];
  d[0][0] = 1.0;  d[1][0] = 0.0;  d[2][0] = 0.0;  d[3][0] = 1.0;
  d[4][0] = 0.0;  d[5][0] = 1.0;  d[6][0] = 1.0;  d[7][0] = 0.0;
  d[8][0] = 0.0;  d[9][0] = 1.0;  d[10][0] = 1.0; d[11][0] = 0.0;
  d[12][0] = 1.0; d[13][0] = 0.0; d[14][0] = 0.0; d[15][0] = 1.0;

  // error function for each input vector
  double* E = new double[m];

  double totalE = 0.0; // sum of E[k], k = 0, 1, .., m-1

  // weight matrix (input layer -> hidden layer);
  // the number of hidden layer units J includes the bias unit 0
  // current
  int J = 5;
  double** Wc = new double*[J];
  for(j=0; j<J; j++) Wc[j] = new double[n];

  Wc[0][0] = 0.0;  Wc[0][1] = 0.0;  Wc[0][2] = 0.0;  Wc[0][3] = 0.1;  Wc[0][4] = -0.2;
  Wc[1][0] = -0.2; Wc[1][1] = 0.5;  Wc[1][2] = -0.5; Wc[1][3] = 0.3;  Wc[1][4] = 0.1;
  Wc[2][0] = -0.3; Wc[2][1] = -0.3; Wc[2][2] = 0.7;  Wc[2][3] = 0.1;  Wc[2][4] = -0.2;
  Wc[3][0] = 0.2;  Wc[3][1] = 0.1;  Wc[3][2] = 0.5;  Wc[3][3] = -0.3; Wc[3][4] = -0.1;
  Wc[4][0] = -0.3; Wc[4][1] = -0.1; Wc[4][2] = 0.1;  Wc[4][3] = 0.3;  Wc[4][4] = 0.2;

  // new
  double** Wnew = new double*[J];
  for(j=0; j<J; j++) Wnew[j] = new double[n];

  // weight matrix (hidden layer -> output layer)
  // current
  double** Whc = new double*[L];
  for(l=0; l<L; l++) Whc[l] = new double[J];

  // the printed listing shows only the first three initial values;
  // the remaining two must also be initialized and are set to zero here
  Whc[0][0] = -0.2; Whc[0][1] = 0.3; Whc[0][2] = 0.5;
  Whc[0][3] = 0.0;  Whc[0][4] = 0.0;

  // new
  double** Whnew = new double*[L];
  for(l=0; l<L; l++) Whnew[l] = new double[J];

  // vector in hidden layer
  double* z = new double[J];
  z[0] = 1.0;

  // vector output layer (output layer units)
  // for the Parity problem the output layer has only one element
  double* y = new double[L];

  // increment matrix (input layer -> hidden layer)
  double** delW = new double*[J];
  for(j=0; j<J; j++) delW[j] = new double[n];

  // increment matrix (hidden layer -> output layer)
  double** delWh = new double*[L];
  for(l=0; l<L; l++) delWh[l] = new double[J];

  // net vector (input layer -> hidden layer)
  double* netj = new double[J];
  netj[0] = 0.0;

  // net vector (hidden layer -> output layer)
  double* netl = new double[L];

  // training session
  int T = 10000; // number of iterations
  for(t=0; t<T; t++)
  {

    // loop over all input patterns
    for(k=0; k<m; k++)
    {
      for(j=1; j<J; j++)
      {
        netj[j] = scalar(x[k],Wc[j],n);
        z[j] = fh(netj[j]);
      }

      for(l=0; l<L; l++)
      {
        netl[l] = scalar(z,Whc[l],J);
        y[l] = fo(netl[l]);
      }

      for(l=0; l<L; l++)
        for(j=0; j<J; j++)
          delWh[l][j] =
            etao*(d[k][l]-y[l])*lambdao*fo(netl[l])*(1.0-fo(netl[l]))*z[j];

      double* temp = new double[J];
      for(j=0; j<J; j++)
        temp[j] = 0.0;

      for(j=0; j<J; j++)
        for(l=0; l<L; l++)
          temp[j] +=
            (d[k][l]-y[l])*fo(netl[l])*(1.0-fo(netl[l]))*Whc[l][j];

      for(j=0; j<J; j++)
        for(i=0; i<n; i++)
          delW[j][i] =
            etah*temp[j]*lambdah*fh(netj[j])*(1.0-fh(netj[j]))*x[k][i];

      delete[] temp;

      for(i=0; i<n; i++)
        delW[0][i] = 0.0;

      // updating the weight matrices
      for(j=0; j<J; j++)
        for(i=0; i<n; i++)
          Wnew[j][i] = Wc[j][i] + delW[j][i];

      for(l=0; l<L; l++)
        for(j=0; j<J; j++)
          Whnew[l][j] = Whc[l][j] + delWh[l][j];

      // setting new to current
      for(j=0; j<J; j++)
        for(i=0; i<n; i++)
          Wc[j][i] = Wnew[j][i];

      for(l=0; l<L; l++)
        for(j=0; j<J; j++)
          Whc[l][j] = Whnew[l][j];

      E[k] = 0.0;
      double sum = 0.0;
      for(l=0; l<L; l++)
        sum += (d[k][l] - y[l])*(d[k][l] - y[l]);

      E[k] = sum/2.0;
      totalE += E[k];
    } // end loop over all input patterns
    if(totalE < 0.0005) goto L;
    else totalE = 0.0;
  } // end training session
  L:
  cout << "number of iterations = " << t << endl;

  // output after training
  for(j=0; j<J; j++)
    for(i=0; i<n; i++)
      cout << "Wc[" << j << "][" << i << "] = "
           << Wc[j][i] << endl;
  cout << endl;

  for(l=0; l<L; l++)
    for(j=0; j<J; j++)
      cout << "Whc[" << l << "][" << j << "] = "
           << Whc[l][j] << endl;
  // testing the Parity function on all m = 16 input patterns
  // (the original listing repeats this block once for each pattern;
  // a loop over k is equivalent)
  for(k=0; k<m; k++)
  {
    for(j=1; j<J; j++)
    {
      netj[j] = scalar(x[k],Wc[j],n);
      z[j] = fh(netj[j]);
    }
    for(l=0; l<L; l++)
    {
      netl[l] = scalar(z,Whc[l],J);
      y[l] = fo(netl[l]);
      cout << "y[" << l << "] = " << y[l] << endl;
    }
  }

  // release memory
  for(k=0; k<m; k++) { delete[] x[k]; delete[] d[k]; }
  delete[] x; delete[] d; delete[] E;
  for(j=0; j<J; j++) { delete[] Wc[j]; delete[] Wnew[j]; delete[] delW[j]; }
  delete[] Wc; delete[] Wnew; delete[] delW;
  for(l=0; l<L; l++) { delete[] Whc[l]; delete[] Whnew[l]; delete[] delWh[l]; }
  delete[] Whc; delete[] Whnew; delete[] delWh;
  delete[] z; delete[] y; delete[] netj; delete[] netl;

  return 0;
}

The output is

number of iterations = 10000

Wc[0][0] = 0
Wc[0][1] = 0
Wc[0][2] = 0
Wc[0][3] = 0.1
Wc[0][4] = -0.2
Wc[1][0] = -0.890614
Wc[1][1] = 0.199476
Wc[1][2] = -0.592286
Wc[1][3] = 0.605594
Wc[1][4] = 0.604114
Wc[2][0] = -0.379614
Wc[2][1] = -0.777377
Wc[2][2] = 0.777529
Wc[2][3] = 0.758172
Wc[2][4] = 0.760994
Wc[3][0] = 0.538437
Wc[3][1] = 0.372678
Wc[3][2] = 0.512117
Wc[3][3] = -0.656055
Wc[3][4] = -0.65043
Wc[4][0] = -0.0856427
Wc[4][1] = -0.165472
Wc[4][2] = 0.161642
Wc[4][3] = 0.151453
Wc[4][4] = 0.151421

Whc[0][0] = -2.05814
Whc[0][1] = 1.47181
Whc[0][2] = -2.45669
Whc[0][3] = 1.37033
Whc[0][4] = 3.96504

y[0] = 0.987144
y[0] = 5.96064e-07
y[0] = 5.32896e-07
y[0] = 0.989954
y[0] = 0.0183719
y[0] = 0.986117
y[0] = 0.98594
y[0] = 0.0110786
y[0] = 0.0200707
y[0] = 0.998834
y[0] = 0.998846
y[0] = 0.00840843
y[0] = 0.983464
y[0] = 0.00589264
y[0] = 0.00599696
y[0] = 0.996012
The values y[0] approximate the parity function.
Chapter 15
Genetic Algorithms

15.1 Introduction
Evolutionary methods have gained considerable popularity as general-purpose
robust optimization and search techniques. The failure of traditional optimization
techniques in searching complex, uncharted and vast payoff landscapes riddled
with multimodality and complex constraints has generated interest in alternative
approaches.

Genetic algorithms (Holland [89], Goldberg [72], Michalewicz [116], Steeb [164]) are
self-adapting strategies for searching, based on the random exploration of the solu-
tion space coupled with a memory component which enables the algorithms to learn
the optimal search path from experience. They are the most prominent, widely used
representatives of evolutionary algorithms, a class of probabilistic search algorithms
based on the model of organic evolution. The starting point of all evolutionary
algorithms is the population (also called farm) of individuals (also called animals,
chromosomes, strings). The individuals are composed of genes which may take on a
number of values (in most cases 0 and 1) called alleles. The value of a gene is called
its allelic value, and it ranges on a set that is usually restricted to {0, 1}. Thus these
individuals are represented as binary strings of fixed length, for example

"10001011101"

Each individual can be uniquely represented as an unsigned integer. For example
the bit string given above corresponds to the integer

2^10 + 2^6 + 2^4 + 2^3 + 2^2 + 2^0 = 1117.

If the binary string has length N, then 2^N binary strings can be formed. If we
describe a DNA molecule the alphabet would be a set of 4 symbols, {A, C, G, T},
where A stands for Adenine, C stands for Cytosine, G stands for Guanine and
T stands for Thymine. Strings of length N from this set allow for 4^N different
individuals. We can also associate unsigned integers with these strings.


For example

"TCCGAT"

is associated with an unsigned integer; encoding A, C, G, T as the base-4 digits
0, 1, 2, 3, for instance, gives 3·4^5 + 4^4 + 4^3 + 2·4^2 + 3 = 3427.
For the four colour problem we also use an alphabet of 4 symbols, {R, G, B, Y}
where R stands for red, G stands for green, B stands for blue and Y stands for
yellow.

Each of the individuals represents a search point in the space of potential solu-
tions to a given optimization problem. Random operators then model selection,
reproduction, crossover and mutation. The optimization problem gives quality in-
formation (fitness function, or fitness for short) for the individuals, and the selection
process favours individuals of higher fitness to transfer their information (string) to
the next generation. The fitness of each string is the corresponding function value.
Genetic algorithms are specifically designed to treat problems involving large search
spaces containing multiple local minima. The algorithms have been applied to a
large number of optimization problems. Examples are solutions of ordinary differ-
ential equations, the smooth genetic algorithm, genetic algorithms in coding theory,
Markov chain analysis, and the DNA molecule.

In the fundamental approach to finding an optimal solution, a fitness function (also
called cost function) is used to represent the quality of the solution. The objective
function to be optimized can be viewed as a multidimensional surface where the
height of a point on the surface gives the value of the function at that point. In
the case of a minimization problem, the wells represent high-quality solutions while
the peaks represent low-quality solutions. In the case of a maximization problem,
the higher the point in the topography, the better the solution.

The search techniques can be classified into three basic categories.

(1) Classical or calculus-based. This uses a deterministic approach to find the best
solution. This method requires the knowledge of the gradient or higher-order
derivatives. The technique can be applied to well-behaved problems.

(2) Enumerative. With these methods, all possible solutions are generated and
tested to find the optimal solution. This requires excessive computation in
problems involving a large number of variables.

(3) Random. Guided random search methods are enumerative in nature; however,
they use additional information to guide the search process. Simulated
annealing and evolutionary algorithms are typical examples of this class of
search methods.

15.2 The Sequential Genetic Algorithm


The genetic algorithm evolves a multiset of elements called a population of indi-
viduals or farm of animals. Each individual A_i (i = 1, ..., n) of the population A
represents a trial solution of the optimization problem to be solved. Individuals
are usually represented by strings of variables, each element of which is called a
gene. The value of a gene is called its allelic value, and it ranges on a set that is
usually restricted to {0, 1}.

The population of individuals is also called a farm of animals in the literature.


Furthermore an individual or animal is also called a chromosome or string.

A genetic algorithm is capable of maximizing a given fitness function f computed


on each individual of the population. If the problem is to minimize a given objec-
tive function, then it is required to map increasing objective function values into
decreasing f values. This can be achieved by a monotonically decreasing function.
The standard genetic algorithm is the following sequence:

Step 1. Randomly generate an initial population A(0) := (A_1(0), ..., A_n(0)).

Step 2. Compute the fitness f(A_i(t)) of each individual A_i(t) of the current popu-
lation A(t).

Step 3. Generate an intermediate population A_r(t) by applying the reproduction
operator.

Step 4. Generate A(t + 1) by applying some other operators to A_r(t).

Step 5. Set t := t + 1; if not (end_test) goto Step 2.

The most commonly used operators are the following:

1) Reproduction (selection). This operator produces a new population, A_r(t), ex-
tracting with repetition individuals from the old population, A(t). The extraction
can be carried out in several ways. One of the most commonly used methods is
roulette wheel selection, where the extraction probability P_r(A_i(t)) of each individual
A_i(t) is proportional to its fitness f(A_i(t)).

2) Crossover. This operator is applied in probability, where the crossover probability
is a system parameter, p_c. To apply the standard crossover operator (several vari-
ations have been proposed) the individuals of the population are randomly paired.
Each pair is then recombined, choosing one point in accordance with a uniformly
distributed probability over the length of the individual strings (parents) and cut-
ting them in two parts accordingly. The new individuals (offspring) are formed by
the juxtaposition of the first part of one parent and the last part of the other parent.

3) Mutation. The standard mutation operator modifies each allele of each individual
of the population in probability, where the mutation probability is a system param-
eter, Pm. Usually, the new allelic value is randomly chosen with uniform probability
distribution.

4) Local search. The necessity of this operator for optimization problems is still
under debate. Local search is usually a simple gradient-descent heuristic search
that carries each solution to a local optimum. The idea behind this is that search in
the space of local optima is much more effective than search in the whole solution
space.
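The following compact program is a sketch (not taken from the text) combining the
first three operators: roulette wheel selection, one-point crossover and bit mutation,
applied to the toy problem of maximizing f(x) = x(16 - x) over 4-bit strings. The
population size, the probabilities pc and pm and the number of generations are
arbitrary choices.

// ga.cpp

#include <iostream>
#include <string>
#include <stdlib.h>
#include <time.h>

using namespace std;

// fitness of a binary string: decode it to an integer x, evaluate x*(16-x)
double fitness(const string& s)
{
  int x = 0;
  for(int i=0;i<(int)s.length();i++) x = 2*x + (s[i]-'0');
  return double(x*(16-x)) + 1.0; // + 1 keeps all fitness values positive
}

int main(void)
{
  srand(time(NULL));
  const int n = 6, len = 4, generations = 50;
  double pc = 0.7, pm = 0.05;
  string pop[n];
  for(int i=0;i<n;i++)
  {
    pop[i] = "";
    for(int j=0;j<len;j++) pop[i] += char('0' + rand()%2);
  }

  for(int t=0;t<generations;t++)
  {
    // reproduction: roulette wheel selection
    double total = 0.0;
    for(int i=0;i<n;i++) total += fitness(pop[i]);
    string next[n];
    for(int i=0;i<n;i++)
    {
      double r = total*rand()/RAND_MAX, acc = 0.0;
      int j = 0;
      while(j < n-1 && (acc += fitness(pop[j])) < r) j++;
      next[i] = pop[j];
    }
    // one-point crossover of consecutive pairs
    for(int i=0;i+1<n;i+=2)
      if(double(rand())/RAND_MAX < pc)
      {
        int cut = 1 + rand()%(len-1);
        string a = next[i], b = next[i+1];
        next[i]   = a.substr(0,cut) + b.substr(cut);
        next[i+1] = b.substr(0,cut) + a.substr(cut);
      }
    // mutation: flip each bit with probability pm
    for(int i=0;i<n;i++)
      for(int j=0;j<len;j++)
        if(double(rand())/RAND_MAX < pm)
          next[i][j] = (next[i][j]=='0') ? '1' : '0';
    for(int i=0;i<n;i++) pop[i] = next[i];
  }
  for(int i=0;i<n;i++)
    cout << pop[i] << "  fitness = " << fitness(pop[i]) << endl;
  return 0;
}

After a few generations most individuals cluster around the bit string 1000, i.e.
x = 8, where f attains its maximum.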

The purpose of parent selection (also called setting up the farm of animals) in a
genetic algorithm is to give more reproductive chances, on the whole, to those pop-
ulation members that are the most fit. We use a binary string as a chromosome to
represent the real value of the variable x. The length of the binary string depends on
the required precision. A population or farm could look like

"10101110011111110"
"00111101010100001"

"11111110101010111" <- individual (chromosome, animal, string)

"10101110010000110"

For the crossover operation the individuals of the population are randomly paired.
Each pair is then recombined, choosing one point in accordance with a uniformly
distributed probability over the length of the individual strings (parents) and cutting
them in two parts accordingly. The new individuals (offspring) are formed by the
juxtaposition of the first part of one parent and the last part of the other. An example is

1011011000100101 parent
0010110110110111 parent
      |       |
1011010110110101 child
0010111000100111 child

The mutation operator modifies each allele (a bit in the bitstring) of each individual
of the population in probability. The new allele value is randomly chosen with
uniform probability distribution. An example is

1011011001011001 parent
    |
1011111001011001 child

The bit position is randomly selected. Whether the child is selected is decided by
the fitness function.

We have to map the binary string into a real number x within a given interval [a, b]
(a < b). The length of the binary string depends on the required precision. The
total length of the interval is b - a. The binary string is denoted by

s_{N-1} s_{N-2} ... s_1 s_0

where s_0 is the least significant bit (LSB) and s_{N-1} is the most significant bit (MSB).
In the first step we convert from base 2 to base 10

m = Σ_{i=0}^{N-1} s_i 2^i.

In the second step we calculate the corresponding real number in the interval [a, b]

x = a + m (b - a)/(2^N - 1).

Obviously if the bit string is given by "000...00" we obtain x = a and if the
bit string is given by "111...11" we obtain x = b.
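The decoding can be written as a small function. The following sketch (not from
the text) maps a binary string to a real number in [a, b]; applied to the string
10101101 on [-1, 1] it reproduces the value computed in the first example below.

// decode.cpp

#include <iostream>
#include <string>

using namespace std;

// decode the binary string s (most significant bit first) into [a,b]
double decode(const string& s,double a,double b)
{
  double m = 0.0, maxm = 1.0;
  for(int i=0;i<(int)s.length();i++)
  {
    m = 2.0*m + (s[i]-'0');
    maxm *= 2.0; // maxm = 2^N
  }
  return a + m*(b-a)/(maxm-1.0);
}

int main(void)
{
  cout << decode("10101101",-1.0,1.0) << endl; // => 0.356863
  return 0;
}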

We consider the two-dimensional case. The extension to higher dimensions is
straightforward. Consider the two-dimensional domain

[a, b] × [c, d]

which is a subset of R^2. The coordinates are x1 and x2, i.e. x1 ∈ [a, b] and x2 ∈ [c, d].
Given a bitstring

s_{N-1} s_{N-2} ... s_1 s_0

of length N = N1 + N2. The block

s_{N1-1} s_{N1-2} ... s_0

is identified with m1, i.e.

m1 = Σ_{i=0}^{N1-1} s_i 2^i

and therefore

x1 = a + m1 (b - a)/(2^N1 - 1).

The block

s_{N-1} s_{N-2} ... s_{N1}

is identified with the variable m2, i.e.

m2 = Σ_{i=N1}^{N-1} s_i 2^{i-N1}

and therefore

x2 = c + m2 (d - c)/(2^N2 - 1).

Example. In the one-dimensional case consider the binary string 10101101 of length
8 and the interval [-1, 1]. Therefore

m = 2^7 + 2^5 + 2^3 + 2^2 + 2^0 = 173.

Thus

x = -1 + 173 · 2/(2^8 - 1) = -1 + 346/255 ≈ 0.357.

Example. In the two-dimensional case consider the binary string 0000000000000000
with N1 = N2 = 8 and the domain [-1, 1] × [-1, 1]. Then we find m1 = m2 = 0,
x1 = -1 and x2 = -1.

Reversing a bit string can also be used as a technique to introduce variation in
genetic algorithms. The operation is useful for implementing the Fourier transform.
It is quite simple to reverse a bit sequence; for example, the following C++ program
implements the operation on integers. The size of the data type unsigned int is
assumed to be 4 bytes (32 bits). The least significant bit of i is placed in the least
significant bit position of r; then i is shifted right and r is shifted left. The process
is repeated for each bit in i.
// reverse.cpp

#include <iostream>

using namespace std;

unsigned int reverse(unsigned int i)
{
  int j;
  unsigned int r = 0;
  int len = sizeof(i)*8;

  for(j=0;j<len;j++)
  {
    r = r*2 + (i%2); // shift r left, append the LSB of i
    i /= 2;          // shift i right
  }

  return r;
}

int main(void)
{
  cout << reverse(23) << endl;

  // The output is 3892314112

  return 0;
}

Since 23 is the bitstring

00000000 00000000 00000000 00010111


we obtain

11101000 00000000 00000000 00000000

which is 3892314112 in decimal.



15.3 Gray Code


The Gray code is an encoding of numbers so that adjacent numbers have a single
digit differing by 1. It plays an important role in genetic algorithms. The binary
Gray code can be used instead of the usual interpretation of binary values. The
binary Gray code is an encoding of integers so that incrementing an integer value
involves complementing exactly one bit in the bit string representation. For example
the 3-bit binary Gray code is given in Table 15.1.

Decimal Binary Gray code


0 000 000
1 001 001
2 010 011
3 011 010
4 100 110
5 101 111
6 110 101
7 111 100

Table 15.1: 3 Bit Binary Gray Code

The advantage of the Gray code for genetic algorithms is that the mutation operator
does not cause a large change in the numeric value of an animal in the population.
Large changes are provided by additions of randomly initialized animals to the
population at regular intervals. Thus mutation would provide a more local search.

The conversion from standard binary encoding to binary Gray code is achieved as
follows. If we want to convert the binary sequence b_{n-1} b_{n-2} ... b_0 to its binary Gray
code g_{n-1} g_{n-2} ... g_0, the binary Gray code is given by

g_{n-1} = b_{n-1},   g_i = b_{i+1} ⊕ b_i  for 0 ≤ i < n - 1.

To use numerical values in calculations we need to apply the inverse Gray encoding.
To convert the binary Gray code g_{n-1} g_{n-2} ... g_0 to the binary number
b_{n-1} b_{n-2} ... b_0 we use

b_i = g_{n-1} ⊕ g_{n-2} ⊕ ... ⊕ g_i.

The following Java program gives an implementation. We apply the built-in BitSet
class in Java.

/ / Gray. java

import java.util.*;

public class Gray


{
static int size;

public static void main(String args[])


{
BitSet[] b=new BitSet[8];

size=3;

for(int i=0;i<8;i++)
{
b[i]=new BitSet(size);
if«i&l)==l) b[i] .set(O);
if«i&2)==2) b[i] .set(l);
if«i&4)==4) b[i] .set(2);
System.out.println("binary to gray "+btos(b[i])+"
+btos(b[i]=graycode(b[i])));
}
for(int i=0;i<8;i++)
{
System.out.println(lI gray to binary "+btos(b[i])+"
+btos(inversegraycode(b[i])));
}
}

 private static String btos(BitSet b)
 {
  String s = new String();

  for(int i=0;i<size;i++)
  {
   if(b.get(i)) s = "1"+s;
   else s = "0"+s;
  }
  return s;
 }

 private static BitSet graycode(BitSet b)
 {
  BitSet g = new BitSet(size);
  BitSet gsr = new BitSet(size);

  // gsr is b shifted right by one position
  for(int i=0;i<size;i++)
  {
   if(b.get(i))
   {
    g.set(i);
    if(i>0)
     gsr.set(i-1);
   }
  }
  g.xor(gsr);
  return g;
 }

 private static BitSet inversegraycode(BitSet b)
 {
  BitSet ig = new BitSet(size);

  for(int i=0;i<size;i++)
  {
   int sum = 0;
   for(int j=i;j<size;j++)
   {
    if(b.get(j)) sum++;
   }
   if((sum%2)==1)
    ig.set(i);
   else
    ig.clear(i);
  }
  return ig;
 }
}

15.4 Schemata Theorem


A schema (Holland [89], Goldberg [72]) is a similarity template describing a subset of
strings with similarities at certain string positions. We consider the binary alphabet
{0, 1}. We introduce a schema by appending a special symbol to this alphabet. We
add the * or don't care symbol which matches either 0 or 1 at a particular position.
With this extended alphabet we can now create strings (schemata) over the ternary
alphabet

{ 0, 1, * }.

A schema matches a particular string if at every location in the schema a 1 matches
a 1 in the string, a 0 matches a 0, and a * matches either. As an example, consider
the strings and schemata of length 5. The schema

*101*

describes a subset with four members

01010, 01011, 11010, 11011

We consider a population of individuals (strings) $A_j$, $j = 1,2,\ldots,n$ contained in
the population $\mathbf{A}(t)$ at time (or generation) $t$ ($t = 0,1,2,\ldots$) where the boldface
is used to denote a population. Besides notation to describe populations, strings,
bit positions, and alleles, we need a convenient notation to describe the schemata
contained in individual strings and populations. Let us consider a schema H taken
from the three-letter alphabet

$$V := \{0, 1, *\}.$$

For alphabets of cardinality k, there are $(k + 1)^l$ schemata, where $l$ is the length of
the string (for the binary alphabet and $l = 3$ there are $3^3 = 27$ schemata). Furthermore,
recall that in a string population with n members there are at most $n \cdot 2^l$ schemata
contained in a population because each string is itself a representative of $2^l$ schemata.
These counting arguments give us some feel for the magnitude of information being
processed by genetic algorithms.

All schemata are not created equal. Some are more specific than others. The schema
011*1** is a more definite statement about important similarity than the schema
0******. Furthermore, certain schemata span more of the total string length than
others. The schema 1****1* spans a larger portion of the string than the schema
1*1****. To quantify these ideas, two schema properties are introduced: schema
order and defining length.

Definition. The order of a schema H, denoted by o(H), is the number of fixed
positions (in a binary alphabet, the number of 1's and 0's) present in the template.

Example. The order of the schema 011*1** is 4, whereas the order of the schema
0****** is 1.

Definition. The defining length of a schema H, denoted by $\delta(H)$, is the distance
between the first and last specific string position.

Example. The schema 011*1** has defining length $\delta = 4$ because the last specific
position is 5 and the first specific position is 1. Thus $\delta(H) = 5 - 1 = 4$.
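
The two schema properties are easily computed. The following C++ sketch (the
helper functions order and deflength are our own) determines the order and the
defining length of a schema given as a character string:

// schema.cpp
// order and defining length of a schema

#include <iostream>
using namespace std;

// number of fixed (non-*) positions
int order(const char* h)
{
  int o = 0;
  for(int i=0; h[i] != '\0'; i++)
    if(h[i] != '*') o++;
  return o;
}

// distance between first and last fixed position
int deflength(const char* h)
{
  int first = -1, last = -1;
  for(int i=0; h[i] != '\0'; i++)
    if(h[i] != '*') { if(first < 0) first = i; last = i; }
  return (first < 0) ? 0 : last - first;
}

int main()
{
  cout << order("011*1**") << " " << deflength("011*1**") << endl; // 4 4
  cout << order("0******") << " " << deflength("0******") << endl; // 1 0
  return 0;
}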

Schemata provide the basic means for analyzing the net effect of reproduction and
genetic operators on building blocks contained within the population. Let us consider
the individual and combined effects of reproduction, crossover, and mutation
on schemata contained within a population of strings. Suppose at a given time step
t there are m(H, t) examples of a particular schema H contained within the population
$\mathbf{A}(t)$. During reproduction, a string is copied according to its fitness, or more
precisely a string $A_i$ gets selected with probability

$$p_i = \frac{f_i}{\sum_{j=1}^{n} f_j}.$$

After picking a non-overlapping population of size n with replacement from the
population $\mathbf{A}(t)$, we expect to have m(H, t + 1) representatives of the schema H in
the population at time t + 1 as given by

$$m(H, t+1) = \frac{m(H,t)\, n\, f(H)}{\sum_{j=1}^{n} f_j(t)}$$

where f(H) is the average fitness of the strings representing schema H at time t.
The average fitness of the entire population is defined as

$$\bar{f} := \frac{1}{n} \sum_{j=1}^{n} f_j.$$

Thus we can write the reproductive schema growth equation as follows

$$m(H, t+1) = m(H, t)\,\frac{f(H)}{\bar{f}(t)}.$$

Assuming that $f(H)/\bar{f}$ remains relatively constant for $t = 0, 1, \ldots$, the preceding
equation is a linear difference equation $x(t+1) = ax(t)$ with constant coefficient
which has the solution $x(t) = a^t x(0)$. A particular schema grows as the ratio of the
average fitness of the schema to the average fitness of the population. Schemata with
fitness values above the population average will receive an increasing number of samples
in the next generation, while schemata with fitness values below the population
average will receive a decreasing number of samples. This behaviour is carried out
with every schema H contained in a particular population $\mathbf{A}$ in parallel. In other
words, all the schemata in a population grow or decay according to their schema
averages under the operation of reproduction alone. Above-average schemata grow
and below-average schemata die off. Suppose we assume that a particular schema
H remains an amount $c\bar{f}$ above average with c a constant. Under this assumption
we find

$$m(H, t+1) = m(H, t)\,\frac{\bar{f} + c\bar{f}}{\bar{f}} = (1 + c)\,m(H, t).$$

Starting at t = 0 and assuming a stationary value of c, we obtain the equation

$$m(H, t) = m(H, 0)(1 + c)^t.$$

This is a geometric progression, the discrete analog of an exponential form. Reproduction
allocates exponentially increasing (decreasing) numbers of trials to above-
(below-) average schemata. The fundamental theorem of genetic algorithms is as
follows (Goldberg [72]).

Theorem. Under the selection, crossover, and mutation operators of the standard
genetic algorithm, short, low-order, and above average schemata receive exponentially
increasing trials in subsequent populations.

The short, low-order, and above average schemata are called building blocks. The
fundamental theorem indicates that building blocks are expected to dominate the
population. It is necessary to determine if the original goal of function optimization
is promoted by this fact. The preceding theorem does not answer this question.
Rather, the connection between the fundamental theorem and the observed optimizing
properties of the genetic algorithm is provided by the following conjecture.

The Building Block Hypothesis. The globally optimal strings of

$$f : \Omega \to \mathbf{R} \quad \text{with} \quad \Omega = \{0,1\}^n$$

may be partitioned into substrings that are given by the bits of the fixed positions
of building blocks.

15.5 Markov Chain Analysis


Vose [180] showed that the stochastic transition through the genetic operations of
crossover and mutation can be fully described by the transition matrix Q of size
$N \times N$ where the matrix element $Q_{k,v}$ is the conditional probability that population
v is generated from population k. The total number of different populations is
denoted by N. Suppose populations consist of M individuals, each of length L over
an alphabet of size a. We denote by $n_{k,j}$ the number of individuals in population k
of type j where $0 \le j < a^L$. We use the notation

$$\underline{0}(n) = \underbrace{0\,0\cdots 0}_{n\ \text{times}}.$$

The population can be represented by

$$\underline{0}(n_{k,0})\;1\;\underline{0}(n_{k,1})\;1\;\cdots\;1\;\underline{0}(n_{k,a^L-1})$$

which is an $M + a^L - 1$ bit representation. We use the '1' symbol to mark the end
of the number of occurrences of one individual and the beginning of the number of
occurrences of the next. Thus the number of different populations is

$$N = \binom{M + a^L - 1}{M}.$$
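
As a small check of this count: for M = 2 individuals of length L = 1 over the
binary alphabet (a = 2, so there are $a^L = 2$ types of individuals) we obtain
$\binom{2+2-1}{2} = 3$ different populations, namely {0,0}, {0,1} and {1,1}.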

When the new population consists only of individuals generated by selection, crossover
and mutation the following equation for Q is obtained

$$Q_{k,v} = M! \prod_{j=0}^{a^L-1} \frac{(p_{k,j})^{n_{v,j}}}{n_{v,j}!}$$

where $p_{k,j}$ is the probability that individual j occurs in population k, and $n_{v,j}$ is
generated according to the multinomial distribution based on $p_{k,j}$. Furthermore
Vose [180] derived

for any population k. Here $\mu$ is the probability of mutation. Suzuki [168] analysed
the modified elitist strategy for genetic algorithms. This strategy always selects the
fittest individual $i_k$ of a population k to be in the next generation (population), and
the other M - 1 individuals are obtained by the operations of selection, crossover
and mutation. He obtained

$$Q_{k,v} = H(i_k - i_v)\,(M-1)! \prod_{j=0}^{a^L-1} \frac{(p_{k,j})^{\,n_{v,j}-\delta_{j,i_k}}}{(n_{v,j} - \delta_{j,i_k})!}$$

where $i_k$ is the fittest individual of population k, and

$$H(x) := \begin{cases} 1 & x \ge 0 \\ 0 & x < 0 \end{cases}$$

and $\delta_{j,i_k}$ denotes the Kronecker symbol, i.e.

$$\delta_{j,i_k} = \begin{cases} 1 & j = i_k \\ 0 & \text{otherwise.} \end{cases}$$

The matrix Q consists of submatrices Q(i) of size $N(i) \times N(i)$ along the diagonal
and zero above these matrices, where Q(i) denotes the submatrix associated with
the ith fittest individual. For the size N(i) we have

$$N(i) = \binom{M - 1 + a^L - i}{M - 1}.$$

The eigenvalues of each submatrix Q(i) are eigenvalues of Q. Furthermore, the
eigenvalues have magnitude not more than one. Denote by $q_k^n$ the probability that
the nth generation (population) is population k, and by K the set of all populations
which include the fittest individual. To demonstrate the convergence of the genetic
algorithm using the modified elitist strategy Suzuki [168] showed that there exists
a constant C such that

$$\sum_{k \in K} q_k^n \ge 1 - C|\lambda_*|^n$$

where $\lambda_*$ is the eigenvalue of greatest magnitude smaller than one. Thus, with
enough iterations, the probability that a population includes the fittest individual
is close to unity.

15.6 Bit Set Classes in C++ and Java


In genetic algorithms bitwise operations play the central role. In this section we
describe these operations. The basic bit operations setbit, clearbit, swapbit
and testbit can be implemented in C++ as follows. The bit position b runs from
0 to 31, counting from right to left in the bit string. In C, C++, and Java
the bitwise operators are:

&    bitwise AND
|    bitwise OR (inclusive OR)
^    bitwise XOR (exclusive OR)
~    NOT operator (one's complement)
>>   right-shift operator
<<   left-shift operator

The operation setbit sets a bit at a given position b (i.e. the bit at the position b is
set to 1).

unsigned long b = 3;
unsigned long x = 15;
x |= (1 << b);   // shortcut for x = x | (1 << b);

The operation clearbit clears a bit at a given position b (i.e. the bit at the position
b is set to 0).

unsigned long b = 3;
unsigned long x = 15;
x &= ~(1 << b);  // shortcut for x = x & ~(1 << b);

The operation swapbit swaps the bit at the position b, i.e. if the bit is 0 it is set to
1 and if the bit is 1 it is set to 0.

unsigned long b = 3;
unsigned long x = 15;
x ^= (1 << b);   // shortcut for x = x ^ (1 << b);

The operation testbit returns 1 or 0 depending on whether the bit at the position
b is set or not.

unsigned long b = 3;
unsigned long x = 15;
unsigned long result = ((x & (1 << b)) != 0);

The operations setbit, clearbit, swapbit and testbit are written as functions.
This leads to the following program.

// mysetbit.cpp

#include <iostream.h>

inline void setbit(unsigned long& x, unsigned long b)
{
  x |= (1 << b);
}

inline void clearbit(unsigned long& x, unsigned long b)
{
  x &= ~(1 << b);
}

inline void swapbit(unsigned long& x, unsigned long b)
{
  x ^= (1 << b);
}

inline unsigned long testbit(unsigned long x, unsigned long b)
{
  return ((x & (1 << b)) != 0);
}

int main()
{
  unsigned long b = 3;
  unsigned long x = 10;          // binary 1010
  setbit(x,b);
  cout << "x = " << x << endl;   // 10 => binary 1010

  clearbit(x,b);
  cout << "x = " << x << endl;   // 2 => binary 10

  swapbit(x,b);
  cout << "x = " << x << endl;   // 10 => binary 1010

  unsigned long r = testbit(x,b);
  cout << "r = " << r << endl;   // 1

  unsigned long y = 17;          // 17 => binary 10001

  setbit(y,b);
  cout << "y = " << y << endl;   // 25 => binary 11001
  clearbit(y,b);
  cout << "y = " << y << endl;   // 17 => binary 10001

  unsigned long s = testbit(y,b);
  cout << "s = " << s << endl;   // 0

  unsigned long z = 8;           // 8 => binary 1000

  unsigned long t = testbit(z,b);
  cout << "t = " << t << endl;   // 1

  return 0;
}

Java has a BitSet class which includes the following methods (member functions):

void and(BitSet set)       performs a logical AND

void andNot(BitSet set)    clears all of the bits in this BitSet
                           whose corresponding bit is set in the
                           specified BitSet

void clear(int bitIndex)   the bit with index bitIndex in this BitSet
                           is changed to the clear (false) state

boolean get(int bitIndex)  returns the value of the bit with the
                           specified index

void or(BitSet set)        performs a logical OR of this bit set with
                           the bit set argument

void xor(BitSet set)       performs a logical XOR of this bit set with
                           the bit set argument

The constructors are

BitSet()            creates a new bit set

BitSet(int nbits)   creates a bit set whose initial size is the
                    specified number of bits

The BitSet class will be used in the program for the four colour problem.

In C++ we can use the standard template library's bitset class. The methods are

Constructors
bitset<N> s            construct bitset for N bits
bitset<N> s(aBitSet)   copy constructor
bitset<N> s(ulong)     create bitset representing an
                       unsigned long value

Bit level operations
s.flip()       flip all bits
s.flip(i)      flip position i
s.reset()      set all bits to false
s.reset(i)     set bit position i to false
s.set()        set all bits to true
s.set(i)       set bit position i to true
s.test(i)      test if bit position i is true

Operations on entire collection
s.any()        return true if any bit is true
s.none()       return true if all bits are false
s.count()      return number of true bits

Assignment
s1 &= s2       bitwise AND and assign
s1 |= s2       bitwise inclusive OR and assign
s1 ^= s2       bitwise exclusive OR and assign
s1 <<= n       shift left n and assign
s1 >>= n       shift right n and assign

Combination with other bitsets
s1 & s2        bitwise AND
s1 | s2        bitwise inclusive OR
s1 ^ s2        bitwise exclusive OR
s1 == s2       return true if two sets are the same

Other operations
~s             bitwise complement of s
s << n         shift set left by n
s >> n         shift set right by n
s.to_string()  return string representation of set

The following small program shows an application of the bitset class.



// bitset1.cpp

#include <iostream>
#include <bitset>
#include <string>
using namespace std;

int main()
{
  const unsigned long n = 32;
  bitset<n> s;
  cout << s.set() << endl;      // set all bits to 1

  cout << s.flip(12) << endl;   // flip at position 12

  bitset<n> t;
  cout << t.reset() << endl;    // set all bits to false

  t.set(23);
  t.set(27);

  bitset<n> u;
  u = s & t;
  cout << "u = " << u << endl;

  bitset<n> v;
  v = s | t;
  cout << "v = " << v << endl;
  bitset<n> w;
  w = s ^ t;
  cout << "w = " << w << endl;
  bitset<n> z;
  z = w ^ w;                    // XOR with itself gives the zero bitset
  cout << "z = " << z << endl;

  cout << "z.to_string() = " << z.to_string();

  return 0;
}

15.7 A Bit Vector Class


// bitvect.h
// Bit Vector Class

#include <string.h>

#ifndef __BITVECTOR
#define __BITVECTOR

const unsigned char _BV_BIT[8] = { 1,2,4,8,16,32,64,128 };

class BitVector
{
 protected:
  unsigned char *bitvec;
  int len;
 public:
  BitVector();
  BitVector(int nbits);
  BitVector(const BitVector& b);  // copy constructor
  ~BitVector();
  void SetBit(int bit,int val=1);
  int GetBit(int bit) const;
  void ToggleBit(int bit);
  BitVector operator & (const BitVector&) const;
  BitVector& operator &= (const BitVector&);
  BitVector operator | (const BitVector&) const;
  BitVector& operator |= (const BitVector&);
  BitVector operator ^ (const BitVector&) const;
  BitVector& operator ^= (const BitVector&);
  friend BitVector operator ~ (const BitVector&);
  BitVector& operator = (const BitVector&);
  int operator[] (int bit) const;
  void SetLength(int nbits);
};

BitVector::BitVector()
{
  len = 0;
  bitvec = NULL;
}

BitVector::BitVector(int nbits)
{
  len = nbits/8 + ((nbits%8)?1:0);
  bitvec = new unsigned char[len];
}

BitVector::BitVector(const BitVector &b)
{
  len = b.len;
  bitvec = new unsigned char[len];
  memcpy(bitvec,b.bitvec,len);
}

BitVector::~BitVector()
{
  if(bitvec != NULL) delete[] bitvec;
}

void BitVector::SetBit(int bit,int val)
{
  if(bit < 8*len)
  {
    if(val) bitvec[bit/8] |= _BV_BIT[bit%8];
    else bitvec[bit/8] &= ~_BV_BIT[bit%8];
  }
}

int BitVector::GetBit(int bit) const
{
  if(bit < 8*len) return ((bitvec[bit/8] & _BV_BIT[bit%8])?1:0);
  return -1;
}

void BitVector::ToggleBit(int bit)
{
  if(bit < 8*len) bitvec[bit/8] ^= _BV_BIT[bit%8];
}

BitVector BitVector::operator & (const BitVector &b) const
{
  int i;
  int mlen = (len > b.len)?len:b.len;
  BitVector ret(mlen*8);
  for(i=0;i<mlen;i++)
    ret.bitvec[i] = bitvec[i] & b.bitvec[i];
  return ret;
}

BitVector& BitVector::operator &= (const BitVector &b)
{
  int i;
  int mlen = (len > b.len)?len:b.len;
  for(i=0;i<mlen;i++)
    bitvec[i] &= b.bitvec[i];
  return *this;
}

BitVector BitVector::operator | (const BitVector &b) const
{
  int i;
  int mlen = (len > b.len)?len:b.len;
  BitVector ret(mlen*8);
  for(i=0;i<mlen;i++)
    ret.bitvec[i] = bitvec[i] | b.bitvec[i];
  return ret;
}

BitVector& BitVector::operator |= (const BitVector &b)
{
  int i;
  int mlen = (len > b.len)?len:b.len;
  for(i=0;i<mlen;i++)
    bitvec[i] |= b.bitvec[i];
  return *this;
}

BitVector BitVector::operator ^ (const BitVector &b) const
{
  int i, mlen = (len > b.len)?len:b.len;
  BitVector ret(mlen*8);
  for(i=0;i<mlen;i++)
    ret.bitvec[i] = bitvec[i] ^ b.bitvec[i];
  return ret;
}

BitVector& BitVector::operator ^= (const BitVector &b)
{
  int i;
  int mlen = (len > b.len)?len:b.len;
  for(i=0;i<mlen;i++)
    bitvec[i] ^= b.bitvec[i];
  return *this;
}

BitVector operator ~ (const BitVector &b)
{
  int i;
  BitVector ret(b.len*8);
  for(i=0;i<b.len;i++)
    ret.bitvec[i] = ~b.bitvec[i];
  return ret;
}

BitVector& BitVector::operator = (const BitVector& b)
{
  if(bitvec == b.bitvec) return *this;
  if(bitvec != NULL) delete[] bitvec;
  len = b.len;
  bitvec = new unsigned char[len];
  memcpy(bitvec,b.bitvec,len);
  return *this;
}

int BitVector::operator[] (int bit) const
{
  return GetBit(bit);
}

void BitVector::SetLength(int nbits)
{
  if(bitvec != NULL) delete[] bitvec;
  len = nbits/8 + ((nbits%8)?1:0);
  bitvec = new unsigned char[len];
}

#endif

15.8 Maximum of One-Dimensional Maps


As an example we consider the following fitness functions

$$f(x) = \cos(x)$$

and

$$g(x) = \cos(x) - \sin(2x)$$

in the interval $[0, 2\pi]$. In this interval the function f has two global maxima, at
0 and $2\pi$. The function g has three maxima. The global maximum is at 5.64891
and the two local maxima are at 0 and 2.13862.

A simple C++ program would include the following functions

// fitness function of individual
double f(double)

// fitness function value of individual
double f_value(double (*func)(double),int* arr,int& N,
               double a,double b)

// x_value
double x_value(int* arr,int& N,double a,double b)

// setup of farm
void setup(int** farm, int M, int N)

// crossing two individuals
void crossings(int** farm, int M, int N)

// mutate an individual
void mutate(int** farm, int M, int N)

Here N is the length of the binary string and M is the size of the population, which
is kept constant at each time step. For the given problem we select N = 10 and
M = 12. The binary string $s_{N-1}s_{N-2}\cdots s_0$ is mapped into the integer number m
and then into the real number x in the interval $[0, 2\pi]$ as described above.
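
For example, with N = 10 the string 1111111111 gives $m = 2^{10} - 1 = 1023$ and
thus $x = 0 + 1023 \cdot 2\pi/(2^{10}-1) = 2\pi$, while the string 0000000000 gives m = 0
and x = 0.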

The farm is set up using a random number generator. In our implementation the
crossing function selects the two fittest strings from the two parents and the two
children. The parents are selected by a random number generator. With a population
of 12 strings in the farm we find after 100 iterations both the maxima at 0
and $2\pi$ for the function f. A typical result is that five strings are related to the
maximum at x = 0 and seven strings are related to the maximum at $x = 2\pi$. For
the fitness function g we find the global maximum and the second highest maximum
after 100 iterations.

// genetic.cpp
// A simple genetic algorithm
// finding the global maximum of
// the function f in the interval [a,b].

#include <iostream.h>
#include <stdlib.h>
#include <time.h>   // for srand(), rand()
#include <math.h>   // for cos(), sin(), pow()

// fitness function where maximum is to be found
double f(double x)
{
  return cos(x) - sin(2*x);
}

// fitness function value for individual
double f_value(double (*func)(double),int* arr,int& N,
               double a,double b)
{
  double res;
  double m = 0.0;
  for(int j=0; j<N; j++)
  {
    double k = j;
    m += arr[N-j-1]*pow(2.0,k);
  }
  double x = a + m*(b-a)/(pow(2.0,N)-1.0);
  res = func(x);
  return res;
}

// x_value at global maximum
double x_value(int* arr,int& N,double a,double b)
{
  double m = 0.0;
  for(int j=0; j<N; j++)
  {
    double k = j;
    m += arr[N-j-1]*pow(2.0,k);
  }
  double x = a + m*(b-a)/(pow(2.0,N)-1.0);
  return x;
}

// setup the population (farm)
void setup(int** farm, int M, int N)
{
  time_t t;
  srand((unsigned) time(&t));
  for(int j=0; j<M; j++)
  {
    for(int k=0; k<N; k++)
    {
      farm[j][k] = rand()%2;
    }
  }
}

// cross two individuals
void crossings(int** farm,int& M,int& N,double& a,double& b)
{
  int K = 2;
  int** temp = NULL;
  temp = new int*[K];
  for(int i=0; i<K; i++)
  {
    temp[i] = new int[N];
  }

  double res[4];
  int r1 = rand()%M;
  int r2 = rand()%M;
  // rand()%M returns a value between
  // 0 and one less than M
  while(r2 == r1) r2 = rand()%M;

  res[0] = f_value(f,farm[r1],N,a,b);
  res[1] = f_value(f,farm[r2],N,a,b);

  for(int j=0; j<N; j++)
  {
    temp[0][j] = farm[r1][j];
    temp[1][j] = farm[r2][j];
  }

  int r3 = rand()%(N-2) + 1;

  for(j=r3; j<N; j++)
  {
    temp[0][j] = farm[r2][j];
    temp[1][j] = farm[r1][j];
  }

  res[2] = f_value(f,temp[0],N,a,b);
  res[3] = f_value(f,temp[1],N,a,b);

  if(res[2] > res[0])
  {
    for(j=0; j<N; j++)
      farm[r1][j] = temp[0][j];
    res[0] = res[2];
  }

  if(res[3] > res[1])
  {
    for(j=0; j<N; j++)
      farm[r2][j] = temp[1][j];
    res[1] = res[3];
  }
  for(j=0; j<K; j++)
    delete[] temp[j];
  delete[] temp;
}

// mutate an individual
void mutate(int** farm,int& M,int& N,double& a,double& b)
{
  double res[2];
  int r4 = rand()%N;
  int r1 = rand()%M;
  res[0] = f_value(f,farm[r1],N,a,b);
  int v1 = farm[r1][r4];
  if(v1 == 0) farm[r1][r4] = 1;
  if(v1 == 1) farm[r1][r4] = 0;
  double a1 = f_value(f,farm[r1],N,a,b);
  if(a1 < res[0]) farm[r1][r4] = v1;

  int r5 = rand()%N;
  int r2 = rand()%M;
  res[1] = f_value(f,farm[r2],N,a,b);
  int v2 = farm[r2][r5];
  if(v2 == 0) farm[r2][r5] = 1;
  if(v2 == 1) farm[r2][r5] = 0;
  double a2 = f_value(f,farm[r2],N,a,b);
  if(a2 < res[1]) farm[r2][r5] = v2;
}

void main()
{
  int M = 12;  // population (farm) has 12 individuals (animals)
  int N = 10;  // length of binary string

  int** farm = NULL;  // allocate memory for population

  farm = new int*[M];


  for(int i=0; i<M; i++)
  {
    farm[i] = new int[N];
  }

  setup(farm, M, N);

  double a = -1.0; double b = 1.0;  // interval [a,b]

  for(int k=0; k<1000; k++)
  {
    crossings(farm,M,N,a,b);
    mutate(farm,M,N,a,b);
  } // end for loop

  for(int j=0; j<N; j++)
  {
    cout << "farm[1][" << j << "] = " << farm[1][j] << endl;
  }
  cout << endl;

  for(j=0; j<M; j++)
    cout << "fitness f_value[" << j << "] = "
         << f_value(f,farm[j],N,a,b)
         << "  " << "x_value[" << j << "] = "
         << x_value(farm[j],N,a,b) << endl;

  for(j=0; j<M; j++)
    delete[] farm[j];
  delete[] farm;
}

In the program given above we store a bit as an int. This wastes a lot of memory
space. A more economical use of memory is to use a string, for example "1000111101".
Then we use 1 byte for each '1' or '0'. An even better use is to manipulate the
bits themselves. In the following we use the class BitVector described above to
manipulate the bits. The BitVector class is included in the header file bitvect.h.

// findmax.cpp

#include <iostream.h>
#include <math.h>
#include <stdlib.h>
#include <time.h>
#include "bitvect.h"

double f(double x) { return cos(x)-sin(2*x); }

double f_value(double (*func)(double),const BitVector &arr,
               int &N,double a,double b)
{
  double res, m = 0.0;
  for(int j=0;j<N;j++)
  {
    double k = j;
    m += arr[N-j-1]*pow(2.0,k);
  }
  double x = a + m*(b-a)/(pow(2.0,N)-1.0);
  res = func(x);
  return res;
}

double x_value(const BitVector &arr,int &N,double a,double b)
{
  double m = 0.0;
  for(int j=0;j<N;j++)
  {
    double k = j;
    m += arr[N-j-1]*pow(2.0,k);
  }
  double x = a + m*(b-a)/(pow(2.0,N)-1.0);
  return x;
}

void setup(BitVector *farm,int M,int N)
{
  srand((unsigned)time(NULL));
  for(int j=0;j<M;j++)
    for(int k=0;k<N;k++)
      farm[j].SetBit(k,rand()%2);
}

void crossings(BitVector *farm,int &M,int &N,double &a,double &b)
{
  int K = 2, j;
  BitVector *temp = new BitVector[K];
  for(int i=0;i<K;i++) temp[i].SetLength(N);
  double res[4];
  int r1 = rand()%M;
  int r2 = rand()%M;
  while(r2 == r1) r2 = rand()%M;
  res[0] = f_value(f,farm[r1],N,a,b);
  res[1] = f_value(f,farm[r2],N,a,b);
  for(j=0;j<N;j++)
  {
    temp[0].SetBit(j,farm[r1][j]);
    temp[1].SetBit(j,farm[r2][j]);
  }
  int r3 = rand()%(N-2)+1;
  for(j=r3;j<N;j++)
  {
    temp[0].SetBit(j,farm[r2][j]);
    temp[1].SetBit(j,farm[r1][j]);
  }
  res[2] = f_value(f,temp[0],N,a,b);
  res[3] = f_value(f,temp[1],N,a,b);  // fitness of the second child
  if(res[2] > res[0])
  {
    farm[r1] = temp[0];
    res[0] = res[2];
  }
  if(res[3] > res[1])
  {
    farm[r2] = temp[1];
    res[1] = res[3];
  }
  delete[] temp;
}

void mutate(BitVector *farm,int &M,int &N,double &a,double &b)
{
  double res[2];
  int r4 = rand()%N;
  int r1 = rand()%M;
  res[0] = f_value(f,farm[r1],N,a,b);
  farm[r1].ToggleBit(r4);
  double a1 = f_value(f,farm[r1],N,a,b);

  if(a1 < res[0]) farm[r1].ToggleBit(r4);

  int r5 = rand()%N;
  int r2 = rand()%M;
  res[1] = f_value(f,farm[r2],N,a,b);
  farm[r2].ToggleBit(r5);
  double a2 = f_value(f,farm[r2],N,a,b);
  if(a2 < res[1]) farm[r2].ToggleBit(r5);
}

void main(void)
{
  int M = 12;
  int N = 10;
  int i, j, k;

  BitVector* farm = new BitVector[M];
  for(i=0;i<M;i++) farm[i].SetLength(N);

  setup(farm,M,N);

  double a = 0.0, b = 6.28318;

  for(k=0;k<1000;k++)
  {
    crossings(farm,M,N,a,b);
    mutate(farm,M,N,a,b);
  }
  for(j=0;j<N;j++)
    cout << "farm[1][" << j << "]=" << farm[1][j] << endl;
  cout << endl;
  for(j=0;j<M;j++)
    cout << "fitness f_value[" << j << "]="
         << f_value(f,farm[j],N,a,b)
         << " x_value[" << j << "]=" << x_value(farm[j],N,a,b) << endl;
  delete[] farm;
}

A typical output is

farm[1][0]=1
farm[1][1]=1
farm[1][2]=1
farm[1][3]=0
farm[1][4]=1
farm[1][5]=0
farm[1][6]=0
farm[1][7]=0
farm[1][8]=0
farm[1][9]=0

fitness f_value[0]=1.75411  x_value[0]=5.6997
fitness f_value[1]=1.75411  x_value[1]=5.6997
fitness f_value[2]=1.75411  x_value[2]=5.6997
fitness f_value[3]=1.75411  x_value[3]=5.6997
fitness f_value[4]=1.75411  x_value[4]=5.6997
fitness f_value[5]=1.75411  x_value[5]=5.6997
fitness f_value[6]=1        x_value[6]=0
fitness f_value[7]=0.59771  x_value[7]=0.196541
fitness f_value[8]=1.75411  x_value[8]=5.6997
fitness f_value[9]=1        x_value[9]=0
fitness f_value[10]=1.75411 x_value[10]=5.6997
fitness f_value[11]=1.75411 x_value[11]=5.6997

15.9 Maximum of Two-Dimensional Maps

Here we consider the problem of finding the maximum of a two-dimensional
bounded function $f : [a, b] \times [c, d] \to \mathbf{R}$, where $a, b, c, d \in \mathbf{R}$, $a < b$ and $c < d$.
We follow in our presentation closely Michalewicz [116]. Michalewicz also gives a
detailed example.

We use the following notation. N is the length of the chromosome (binary string).
The chromosome includes both the contributions from the x variable and y variable.
The size of N depends on the required precision. M denotes the size of the farm
(population) which is kept constant at each time step. First we have to decide about
the precision. We assume that the required precision is four decimal places
for each variable. First we find the domain of the variable x, i.e. $b-a$. The precision
requirement implies that the range $[a, b]$ should be divided into at least $(b-a) \cdot 10000$
equal size ranges. Thus we have to find the smallest integer number $N_1$ such that

$$(b-a) \cdot 10000 \le 2^{N_1}.$$

The domain of variable y has length $d - c$. The same precision requirement implies
that we have to find the smallest integer $N_2$ such that

$$(d-c) \cdot 10000 \le 2^{N_2}.$$

The total length of a chromosome (solution vector) is then $N = N_1 + N_2$. The first
$N_1$ bits code x and the remaining $N_2$ bits code y.
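
As a worked example: for the domain $[a,b] = [-2,2]$ and four decimal places,
$(b-a) \cdot 10000 = 40000$, and since $2^{15} = 32768 < 40000 \le 65536 = 2^{16}$ we
obtain $N_1 = 16$.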

Next we generate the farm. To optimize the function f using a genetic algorithm,
we create a population of M chromosomes. All N bits in all chromosomes
are initialized randomly using a random number generator.

Let us denote the chromosomes by $v_0, v_1, \ldots, v_{M-1}$. During the evaluation phase
we decode each chromosome and calculate the fitness function values f(x, y) from
the (x, y) values just decoded.

Now the system constructs a roulette wheel for the selection process. First we
calculate the total fitness F of the population

$$F := \sum_{i=0}^{M-1} f(v_i).$$

Next we calculate the probability of a selection $p_i$ and the cumulative probability $q_i$
for each chromosome $v_i$

$$p_i := \frac{f(v_i)}{F}, \qquad q_i := \sum_{k=0}^{i} p_k, \qquad i = 0, 1, \ldots, M-1.$$

Obviously, $q_{M-1} = 1$. Now we spin the roulette wheel M times. First we generate a
(random) sequence of M numbers in the range [0, 1]. Each time we select a single
chromosome for a new population as follows. Let $r_0$ be the first random number.
Then $q_k < r_0 < q_{k+1}$ for a certain k, and we select chromosome k + 1 for the new
population. We do the same selection process for all the other M - 1 random
numbers. This leads to a new farm of chromosomes. Some of the chromosomes can
now occur twice.
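
As a small worked example with hypothetical numbers: for M = 4 chromosomes
with fitness values 1, 2, 3 and 4 we have F = 10, the selection probabilities are
$p_0 = 0.1$, $p_1 = 0.2$, $p_2 = 0.3$, $p_3 = 0.4$ and the cumulative probabilities are
$q_0 = 0.1$, $q_1 = 0.3$, $q_2 = 0.6$, $q_3 = 1.0$. The random number 0.45 satisfies
$q_1 < 0.45 < q_2$, so chromosome 2 is copied into the new population.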

We now apply the recombination operator, crossover, to the individuals in the new
population. For the probability of crossover we choose $p_c = 0.25$. We proceed in the
following way: for each chromosome in the (new) population we generate a random
number r from the range [0, 1]. Thus we generate again a sequence of M random
numbers in the interval [0, 1]. If r < 0.25, we select the given chromosome for crossover.
If the number of selected chromosomes is even, we can pair them. If the number
of selected chromosomes were odd, we would either add one extra chromosome or
remove one selected chromosome. Now we mate the selected chromosomes randomly.
For each pair, we generate a random integer number pos from the
range [0, N - 2]. The number pos indicates the position of the crossing point. We
do the same process for the second pair of chromosomes and so on. This leads
to a new farm of chromosomes.

The next operator, mutation, is performed on a bit-by-bit basis. The probability
of mutation is $p_m = 0.01$, so we expect that (on average) 1% of the bits would undergo
mutation. There are $N \times M$ bits in the whole population; we expect (on average)
$0.01 \cdot N \cdot M$ mutations per generation. Every bit has an equal chance to be mutated,
so, for every bit in the population, we generate a random number r from the range
[0, 1]. If r < 0.01, we mutate the bit. This means that we have to generate $N \cdot M$
random numbers. Then we translate the bit position into the chromosome number and
the bit number within the chromosome. Then we swap the bit. This leads to a new
population of the same size M.

Thus we have completed one iteration (i.e., one generation) of the while loop in the
genetic procedure. Next we find the fitness function for the new population and
the total fitness of the new population, which should be higher compared to the
old population. The fitness value of the fittest chromosome of the new population
should also be higher than the fitness value of the fittest chromosome in the old
population. Now we are ready to run the selection process again and apply the
genetic operators, evaluate the next generation and so on. A stopping condition
could be that the total fitness does not change anymore.

// twodim.cpp

#include <iostream.h>
#include <math.h>
#include <stdlib.h>
#include <time.h>

// function to optimize
double f(double x, double y)
{
  return exp(-(x-1.0)*(x-1.0)*y*y/2.0);
  // an alternative test function: return x*y;
}

// determines the chromosome length required
// to obtain the desired precision
int cLength(int precision, double rangeStart, double rangeEnd)
{
  int length = 0;
  double total = (rangeEnd - rangeStart)*pow(10.0,precision);
  while(total > pow(2.0,length)) length++;
  return length;
}

void setup(int** farm,int size,int length)
{
  int i, j;
  time_t t;
  srand((unsigned) time(&t));

  for(i=0; i<size; i++)
    for(j=0; j<length; j++)
      farm[i][j] = rand()%2;
}

void printFarm(int** farm,int length,int size)
{
  int i, j;
  for(i=0; i<size; i++)
  {
    cout << "\n";
    for(j=0; j<length; j++)
    {
      cout << farm[i][j];
    }
  }
}

double xValue(int* chromosome,int xLength,double* domain)
{
  int i;
  double m = 0.0;
  for(i=0; i<xLength; i++)
  {
    m += chromosome[xLength-i-1]*pow(2.0,i);
  }
  double x =
    domain[0] + m*(domain[1]-domain[0])/(pow(2.0,xLength)-1.0);
  return x;
}

double yValue(int* chromosome,int yLength,int length,double* domain)
{
  int i;
  double m = 0.0;
  for(i=0; i<yLength; i++)
  {
    m += chromosome[length-i-1]*pow(2.0,i);
  }
  double y =
    domain[2] + m*(domain[3]-domain[2])/(pow(2.0,yLength)-1.0);
  return y;
}

double fitnessValue(double (*f)(double,double), int* chromosome,
                    int length, double* domain,int xLength,int yLength)
{
  double x = xValue(chromosome, xLength, domain);
  double y = yValue(chromosome, yLength, length, domain);
  double result = f(x,y);
  return result;
}

// A new farm is set up by using a roulette wheel
// parent selection process
void roulette(int** farm,int length,int size,double* domain,
              int xLength,int yLength)
{
  int i, j;
  // fitness vector contains the fitness of each
  // individual chromosome on the farm
  double* fitnessVector = NULL;
  fitnessVector = new double[size];

  for(i=0; i<size; i++)
  {
    fitnessVector[i] =
      fitnessValue(f,farm[i],length,domain,xLength,yLength);
  }

  // total fitness of the farm
  double totalFitness = 0.0;
  for(i=0; i<size; i++)
  {
    totalFitness += fitnessVector[i];
  }

  // calculate probability vector
  double* probabilityVector = NULL;
  probabilityVector = new double[size];
  for(i=0; i<size; i++)
  {
    probabilityVector[i] = fitnessVector[i]/totalFitness;
  }

  // calculate cumulative probability vector
  double cumulativeProb = 0.0;
  double* cum_prob_Vector = NULL;
  cum_prob_Vector = new double[size];

  for(i=0; i<size; i++)
  {
    cumulativeProb += probabilityVector[i];
    cum_prob_Vector[i] = cumulativeProb;
  }

  // setup random vector
  double* randomVector = NULL;
  randomVector = new double[size];
  time_t t;
  srand((unsigned) time(&t));

  for(i=0; i<size; i++)
    randomVector[i] = rand()/double(RAND_MAX);

  // create new population
  int count;
  int** newFarm = NULL;
  newFarm = new int*[size];
  for(i=0; i<size; i++)
    newFarm[i] = new int[length];

  for(i=0; i<size; i++)

  {
    count = 0;
    while(randomVector[i] > cum_prob_Vector[count]) count++;
    for(j=0; j<length; j++)
    {
      newFarm[i][j] = farm[count][j];
    }
  }

  for(i=0; i<size; i++)
    for(j=0; j<length; j++)
      farm[i][j] = newFarm[i][j];

  delete[] fitnessVector;
  delete[] probabilityVector;
  delete[] cum_prob_Vector;
  delete[] randomVector;

  for(i=0; i<size; i++)
    delete[] newFarm[i];
  delete[] newFarm;
} // end function roulette

void crossing(int** farm,int size,int length)
{
  int i, j, k, m;
  int count = 0;
  int* chosen = NULL;
  chosen = new int[size];

  double* randomVector = NULL;
  randomVector = new double[size];

  time_t t;
  srand((unsigned) time(&t));

  for(i=0; i<size; i++)
    randomVector[i] = rand()/double(RAND_MAX);

  // fill chosen with indexes of all random values < 0.25
  for(i=0; i<size; i++)
  {
    if(randomVector[i] < 0.25)
    {
      chosen[count] = i;
      count++;
    }
  }

  // if chosen contains an odd number of chromosomes
  // one more chromosome is to be selected
  if((count%2 != 0) || (count == 1))
  {
    int index = 0;
    while(randomVector[index] < 0.25) index++;
    count++;
    chosen[count-1] = index;
  }

  // cross chromosomes with index given in chosen
  int** temp = NULL;
  temp = new int*[2];
  for(i=0; i<2; i++)
  {
    temp[i] = new int[length];
  }

  for(i=0; i<count; i=i+2)
  {
    for(j=0; j<length; j++)
    {
      temp[0][j] = farm[chosen[i]][j];
      temp[1][j] = farm[chosen[i+1]][j];
    }
    int position = rand()%length;

    for(k=position; k<length; k++)
    {
      temp[0][k] = farm[chosen[i+1]][k];
      temp[1][k] = farm[chosen[i]][k];
    }

    for(m=0; m<length; m++)
    {
      farm[chosen[i]][m] = temp[0][m];
      farm[chosen[i+1]][m] = temp[1][m];
    }
  }

  delete[] chosen;
  delete[] randomVector;

  for(i=0; i<2; i++)
    delete[] temp[i];
  delete[] temp;
} // end function crossing

void mutate(int** farm,int size,int length)
{
  int i;
  int totalbits = size*length;

  double* randomVector = NULL;
  randomVector = new double[totalbits];

  time_t t;
  srand((unsigned) time(&t));

  for(i=0; i<totalbits; i++)
    randomVector[i] = rand()/double(RAND_MAX);

  int a, b;
  for(i=0; i<totalbits; i++)
  {
    if(randomVector[i] < 0.01)
    {
      if(i >= length)
      {
        a = i/length; b = i%length;
      }
      else
      {
        a = 0; b = i;
      }
      if(farm[a][b] == 0)
        farm[a][b] = 1;
      else
        farm[a][b] = 0;
    }
  }

  delete[] randomVector;
}

void printFinalResult(int** farm,int length,int size,double* domain,
                      int xLength,int yLength,int iterations)
{
  int i;

  double* fitnessVector = NULL;
  fitnessVector = new double[size];

  for(i=0; i<size; i++)
    fitnessVector[i] =
      fitnessValue(f,farm[i],length,domain,xLength,yLength);

  // search for chromosome with maximum fitness
  double x, y;
  int pos = 0;
  double max = fitnessVector[0];

  for(i=1; i<size; i++)
  {
    if(fitnessVector[i] > max)
    {
      max = fitnessVector[i];
      pos = i;
    }
  }

  x = xValue(farm[pos], xLength, domain);
  y = yValue(farm[pos], yLength, length, domain);

  // displaying the result
  cout << "\n\n After " << iterations
       << " iterations the fitnesses are: \n";
  for(i=0; i<size; i++)
  {
    cout << "\n fitness of chromosome "
         << i << ": " << fitnessVector[i];
  }

  cout << "\n\n The maximum fitness: f(" << x << "," << y << ") = "
       << max;

  delete[] fitnessVector;
}

int main()
{
  int size = 32;           // population size
  int precision = 6;       // precision
  int iterations = 10000;

  double domain[4];        // variables specifying domain
  double x1, x2, y1, y2;
  x1 = -2.0; x2 = 2.0;
  y1 = -2.0; y2 = 2.0;

  domain[0] = x1; domain[1] = x2;
  domain[2] = y1; domain[3] = y2;

  int xLength = cLength(precision, domain[0], domain[1]);
  cout << "\n\n the xLength is: " << xLength;

  int yLength = cLength(precision, domain[2], domain[3]);
  cout << "\n the yLength is: " << yLength;

  // total length
  int length = xLength + yLength;
  cout << "\n the chromosome length is: " << length;

  int i;

  // allocate memory for farm
  int** farm = NULL;
  farm = new int*[size];
  for(i=0; i<size; i++) { farm[i] = new int[length]; }

  setup(farm,size,length);

  cout << "\n\n The initial farm: \n";
  printFarm(farm,length,size);
  cout << endl;

  // iteration loop
  int t;

  for(t=0; t<iterations; t++)
  {
    roulette(farm,length,size,domain,xLength,yLength);
    crossing(farm,size,length);
    roulette(farm,length,size,domain,xLength,yLength);
    mutate(farm,size,length);
  }

  printFinalResult(farm,length,size,domain,xLength,yLength,iterations);

  for(i=0; i<size; i++)
  { delete[] farm[i]; }
  delete[] farm;

  return 0;
}

15.10 The Four Colour Problem


A map is called n-colourable [42] if each region of the map can be assigned a colour
from n different colours such that no two adjacent regions have the same colour. The
four colour conjecture is that every map is 4-colourable. In 1890 Heawood proved
that every map is 5-colourable. In 1976 Appel and Haken proved the four colour
conjecture with extensive use of computer calculations.

We can describe the m regions of a map using an $m \times m$ adjacency matrix A where
$A_{ij} = 1$ if region i is adjacent to region j and $A_{ij} = 0$ otherwise. We set $A_{ii} = 0$. For
the fitness function we can determine the number of adjacent regions which have
the same colour. The lower the number, the fitter the individual.

The program below solves the four colour problem for the map in Figure 15.1(a).

Figure 15.1: A Map for the Four Colour Problem. Panel (a) shows the map with
its ten regions labelled 0 to 9; panel (b) shows a 4-colouring of the same map using
the colours Y, B, R and G.

Individuals are represented as strings of characters, where each character represents
the colour for the region corresponding to the character's position in the string. The
member population is the number of individuals in the population, and mu is the
probability that an individual is mutated. The method fitness evaluates the fitness
of a string using the adjacency matrix to determine when adjacent regions have the
same colour. If the fitness is equal to 0 we have found a solution. The adjacency matrix
can be modified to solve for any map. The method mutate determines for each
individual in the population whether the individual is mutated, and mutates a component
of the individual by randomly changing the colour. The method crossing
performs the crossing operation as discussed previously. The genetic algorithm is
implemented in the method GA. The arguments are an adjacency matrix, a string
specifying which colours to use and the number of regions on the map. It returns a
string specifying a solution to the problem. One such solution is YBRBYGYRYB, where
R stands for red, G for green, B for blue and Y for yellow. This corresponds to the
colouring in Figure 15.1(b).

// Colour.java

public class Colour
{
 static int population = 1000;
 static double mu = 0.01;

 public static void main(String[] args)
 {
  int[][] adjM = {{0,1,0,1,0,0,0,0,0,0},
                  {1,0,1,0,0,1,0,0,0,0},
                  {0,1,0,0,0,0,1,0,0,0},
                  {1,0,0,0,1,1,0,0,0,0},
                  {0,0,0,1,0,1,0,1,0,0},
                  {0,1,0,1,1,0,1,0,1,1},
                  {0,0,1,0,0,1,0,0,0,1},
                  {0,0,0,0,1,0,0,0,1,0},
                  {0,0,0,0,0,1,0,1,0,1},
                  {0,0,0,0,0,1,1,0,1,0}};

  System.out.println(GA(adjM,"RGBY",10));
 }

 static int fitness(int[][] adjM,String s,int N)
 {
  int count = 0;
  for(int i=0;i < N-1;i++)
  {
   for(int j=i+1;j < N;j++)
   {
    if((s.charAt(i) == s.charAt(j)) && (adjM[i][j] == 1))
     count++;
   }
  }
  return count;
 }

 static void mutate(String[] p,String colors)
 {
  int j;
  for(int i=0;i<p.length;i++)
  {
   if(Math.random()<mu)
   {
    int pos=(int)(Math.random()*(p[i].length()-1));
    int mut=(int)(Math.random()*(colors.length()-2));
    char[] ca1=p[i].toCharArray();
    char[] ca2=colors.toCharArray();
    for(j=0;ca1[pos]!=ca2[j];j++) {};
    ca1[pos]=ca2[(j+mut)%colors.length()];
    p[i]=new String(ca1);
   }
  }
 }

 static void crossing(String[] p,int[][] adjM)
 {
  int p1=(int)(Math.random()*(p.length-1));
  int p2=p1;
  int c1=(int)(Math.random()*(p[0].length()-1));
  int c2=c1;

  while(p2==p1) p2=(int)(Math.random()*(p.length-1));
  while(c2==c1) c2=(int)(Math.random()*(p[0].length()-1));
  if(c2<c1) {int temp=c2; c2=c1; c1=temp;}

  String[] temp=new String[4];
  temp[0]=p[p1]; temp[1]=p[p2];
  temp[2]=p[p1].substring(0,c1)+p[p2].substring(c1+1,c2)
         +p[p1].substring(c2+1,p[p1].length()-1);
  temp[3]=p[p2].substring(0,c1)+p[p1].substring(c1+1,c2)
         +p[p2].substring(c2+1,p[p2].length()-1);

  int i,f;
  for(i=0,f=0;i<4;i++)
  {
   if(fitness(adjM,temp[i],temp[i].length())
      >fitness(adjM,temp[f],temp[f].length()))
    f=i;
  }
  {String tmp=temp[f]; temp[f]=temp[0]; temp[0]=tmp;}
  for(i=1,f=1;i<4;i++)
  {
   if(fitness(adjM,temp[i],temp[i].length())
      >fitness(adjM,temp[f],temp[f].length()))
    f=i;
  }
  {String tmp=temp[f]; temp[f]=temp[1]; temp[1]=tmp;}
  p[p1]=temp[2]; p[p2]=temp[3];
 }

 static String GA(int[][] adjM,String colors,int N)
 {
  int maxfitness,mfi=0;
  String[] p = new String[population];
  char[] temp = new char[N];

  for(int i=0;i < population;i++)
  {
   for(int j=0;j < N;j++)
   {
    temp[j] = colors.charAt((int)(Math.random()*colors.length()));
   }
   p[i]=new String(temp);
  }
  maxfitness=fitness(adjM,p[0],p[0].length());
  while(maxfitness!=0)
  {
   mutate(p,colors);
   crossing(p,adjM);
   for(int i=0;i<p.length;i++)
   {
    if(fitness(adjM,p[i],p[i].length())<maxfitness)
    {
     maxfitness=fitness(adjM,p[i],p[i].length());
     mfi=i;
    }
   }
  }
  return p[mfi];
 }
}

15.11 Problems with Constraints


15.11.1 Introduction
Thus far, we have only discussed genetic algorithms for searching unconstrained
objective functions. Many practical problems contain one or more constraints that
must also be satisfied. A typical example is the traveling salesman problem, where
all cities must be visited exactly once. In a more mathematical formulation, the
traveling salesman problem is stated as follows. For a given $n \times n$ distance matrix
$C = (c_{ij})$, find a cyclic permutation $\pi$ of the set $\{1, 2, \ldots, n\}$ that minimizes the
function

$$c(\pi) = \sum_{i=1}^{n} c_{i\pi(i)}.$$

The value $c(\pi)$ is usually referred to as the length (or cost or weight) of the permutation
$\pi$. The traveling salesman problem is one of the standard problems in
combinatorial optimization and has many important applications like routing or
production scheduling with job-dependent set-up times. Another example is the
knapsack problem, where the weight which can be carried is the constraint. The
norm of an $n \times n$ matrix over the real numbers $\mathbf{R}$ is given by

$$\|A\| := \sup_{\|x\|=1} \|Ax\|.$$

This is a problem with the constraint

$$\|x\| = 1,$$

i.e. the length of the vector $x \in \mathbf{R}^n$ must be 1. This problem can be solved with
the Lagrange multiplier method. The Lagrange multiplier method is as follows. Let
M be a manifold and f be a real valued function of class $C^{(1)}$ on some open set
containing M. We consider the problem of finding the extrema of the function $f|_M$.
This is called a problem of constrained extrema. Assume that f has a constrained
extremum at $x^* = (x_1^*, x_2^*, \ldots, x_n^*)$. Let $g_1(x) = 0, \ldots, g_m(x) = 0$ be the constraints
(manifolds). Then there exist real numbers $\lambda_1, \ldots, \lambda_m$ such that $x^*$ is a critical point
of the function

$$F(x) = f(x) + \lambda_1 g_1(x) + \cdots + \lambda_m g_m(x).$$

The numbers $\lambda_1, \ldots, \lambda_m$ are called Lagrange multipliers. For the problem of finding
the norm of an $n \times n$ matrix one considers the function

$$F(x) := \|Ax\| + \lambda\|x\|$$

where $\lambda$ is the Lagrange multiplier.

The most difficult problem in genetic algorithms is the inclusion of constraints. Con-
straints are usually classified as equality or inequality relations. Equality constraints
may be included into the system. It would appear that inequality constraints pose
no particular problem. A genetic algorithm generates a sequence of parameters to be
tested using the system model, objective function, and the constraints. We simply
run the model, evaluate the fitness function, and check to see if any constraints are
violated. If not, the parameter set is assigned the fitness value corresponding to the
objective function evaluation. If constraints are violated, the solution is infeasible
and thus does not have a fitness. This procedure is fine except that many practical
problems are highly constrained; finding a feasible point is almost as difficult as find-
ing the best. As a result, we usually want to get some information out of infeasible
solutions, perhaps by degrading their fitness ranking in relation to the degree of
constraint violation. This is what is done in a penalty method. In a penalty method,
a constrained problem in optimization is transformed to an unconstrained problem
by associating a cost or penalty with all constraint violations. This cost is included
in the objective function evaluation.

Consider, for example, the original constrained problem in minimization form:

minimize g(x) subject to

$$h_i(x) \ge 0, \qquad i = 1, 2, \ldots, n$$

where x is an m vector. We transform this to the unconstrained form:

minimize

$$g(x) + r \sum_{i=1}^{n} \Phi[h_i(x)]$$

where $\Phi$ is the penalty function and r is the penalty coefficient. Other approaches
use decoders or repair algorithms.
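
The following C++ sketch illustrates the penalty transformation for a one-dimensional
problem; the objective g, the constraint h and the penalty choice
$\Phi(h) = (\min(0,h))^2$ are our own illustrative assumptions, and a crude scan over
x stands in for the genetic algorithm:

// penalty.cpp
// penalty method: minimize g(x) subject to h(x) >= 0

#include <iostream>
using namespace std;

double g(double x) { return (x-3.0)*(x-3.0); } // objective, minimum at x = 3
double h(double x) { return 2.0 - x; }         // constraint x <= 2

double penalized(double x,double r)
{
  double viol = (h(x) < 0.0) ? h(x) : 0.0;     // Phi[h(x)] = min(0,h(x))^2
  return g(x) + r*viol*viol;
}

int main()
{
  double r = 100.0;                            // penalty coefficient
  double best = -5.0, bestval = penalized(best,r);
  for(double x=-5.0; x<=5.0; x+=0.001)
    if(penalized(x,r) < bestval) { bestval = penalized(x,r); best = x; }
  cout << "minimum near x = " << best << endl; // close to the boundary x = 2
  return 0;
}

For large r the minimum of the penalized function approaches the constrained
minimum at x = 2.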

A detailed discussion of problems with constraints is given by Michalewicz [116]. He
proposes that appropriate data structures and specialized genetic operators should
do the job of taking care of constraints. He then introduces an approach to handle
problems with linear constraints (domain constraints, equalities, and inequalities).
We consider here the knapsack problem and the traveling salesman problem, applying
genetic algorithms.

15.11.2 Knapsack Problem


Formally, the knapsack problem can be stated as follows.

Problem. Given M, the capacity of the knapsack,

$$\{\, w_i \mid w_i > 0,\ i = 0, 1, \ldots, n-1 \,\}$$

the weights of the n objects, and

$$\{\, v_i \mid v_i > 0,\ i = 0, 1, \ldots, n-1 \,\}$$

their corresponding values,

$$\text{maximize} \quad \sum_{i=0}^{n-1} v_i x_i \qquad \text{subject to} \quad \sum_{i=0}^{n-1} w_i x_i \le M$$

where $x_i \in \{0,1\}$. Here $x_i = 0$ means that item i should not be included in the
knapsack, and $x_i = 1$ means that it should be included.

As an example for the knapsack problem we consider the following problem. A
hiker planning a backpacking trip feels that he can comfortably carry at most 20
kilograms. After laying out all the items that he wants to take and discovering that
their total weight exceeds 20 kilograms, he assigns to each item a "value" rating, as
shown in the table. Which items should he take to maximize the value of what he
can carry without exceeding 20 kilograms?

Table. An Instance of the Knapsack Problem

Item                 Weight   Value
Tent                   11       20
Canteen (filled)        7       10
Change of clothes       5       11
Camp stoves             4        5
Sleeping bag            3       25
Dried food              3       50
First-aid kit           3       15
Mosquito repellent      2       12
Flashlight              2        6
Novel                   2        4
Rain gear               2        5
Water purifier          1       30

Although we do not know yet how to obtain the solution, the way to fill the knapsack
to carry the most value is to take the sleeping bag, dried food, mosquito repellent,
first-aid kit, flashlight, water purifier, and change of clothes, for a total value of 149
with a total weight of 19 kilograms. An interesting aspect of the solution is that it
is not directly limited by the weight restriction. There are ways of filling the knapsack
with exactly 20 kilograms, such as substituting the change of clothes for the camp
stove and rain gear, but this decreases the total value.
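
As a check against the table: the chosen items have total value
11 + 25 + 50 + 15 + 12 + 6 + 30 = 149 and total weight
5 + 3 + 3 + 3 + 2 + 2 + 1 = 19 kilograms.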

The following program uses a genetic algorithm to solve the problem. We use the
header file bitvect.h given above.

// knapsack.cpp

#include <fstream.h>
#include <time.h>
#include <stdlib.h>
#include "bitvect.h"

struct item
{
  char name[50];
  double weight;
  double value;
};

void readitems(char *file,item *&list,int &n,double &max)
{
  int i;
  ifstream data(file);
  data >> n;
  list = new item[n];

  for(i=0;i<n;i++)
  {
    data >> list[i].name;
    data >> list[i].weight;
    data >> list[i].value;
  }
  data >> max;
}

void destroyitems(item *list)
{
  delete[] list;
}

double value(const BitVector &b,int n,double max,item *list)
{
  int i;
  double tweight = 0.0;
  double tvalue = 0.0;

  for(i=0;i<n;i++)
  {
    if(b.GetBit(i))
    {
      tweight += list[i].weight;
      tvalue += list[i].value;
    }
    if(tweight > max)
    { tvalue = -1.0; i = n; }
  }
  return tvalue;
}

void mutate(BitVector *farm,int m,int n,item *list,double max)
{
  const int tries = 1000;
  int animal = rand()%m;
  int i = 0, pos, pos2;
  BitVector* newanim = new BitVector(farm[animal]);
  pos2 = pos = rand()%n;
  newanim -> ToggleBit(pos);

  while(i<tries)
  {
    while(pos2 == pos) pos2 = rand()%n;
    newanim -> ToggleBit(pos2);
    if(value(*newanim,n,max,list) > 0) i = tries;
    else { newanim -> ToggleBit(pos2); i++; pos2 = pos; }
  }
  if(value(*newanim,n,max,list) > value(farm[animal],n,max,list))
    farm[animal] = *newanim;

  delete newanim;
}

void crossing(BitVector *farm,int m,int n,item *list,double max)
{
  int animal1 = rand()%m;
  int animal2 = rand()%m;
  int i, pos;

  while(animal2 == animal1) animal2 = rand()%m;

  BitVector *newanim1 = new BitVector(farm[animal1]);
  BitVector *newanim2 = new BitVector(farm[animal2]);
  pos = rand()%n;

  for(i=pos;i<n;i++)
  {
    newanim1 -> SetBit(i,farm[animal2][i]);
    newanim2 -> SetBit(i,farm[animal1][i]);
  }

  if(value(*newanim1,n,max,list) > value(farm[animal1],n,max,list))
    farm[animal1] = *newanim1;
  if(value(*newanim2,n,max,list) > value(farm[animal2],n,max,list))
    farm[animal2] = *newanim2;

  delete newanim1;
  delete newanim2;
}

void setupfarm(BitVector *farm,int m,int n,item *list,double max)
{
  const int tries = 2000;
  double temp;
  int i,j,k;
  srand(time(NULL));
  for(i=0;i<m;i++)
  {
    for(j=0;j<n;j++) farm[i].SetBit(j,0);
    temp = 0.0;
    k = 0;
    while((temp < max) && (k < tries))
    {
      j = rand()%n;
      if(!farm[i].GetBit(j)) temp += list[j].weight;
      if(temp < max) farm[i].SetBit(j);
      k++;
    }
  }
}

void main()
{
  item* list = NULL;
  int n, m = 100, i, iterations = 500, besti = 0;
  double max, bestv = 0.0, bestw = 0.0, temp;
  BitVector *farm = NULL;
  readitems("knapsack.dat",list,n,max);

  farm = new BitVector[m];
  for(i=0;i<m;i++)
    farm[i].SetLength(n);

  setupfarm(farm,m,n,list,max);

  for(i=0;i<iterations;i++)
  {
    crossing(farm,m,n,list,max);
    mutate(farm,m,n,list,max);
  }

  for(i=0;i<m;i++)
    if((temp=value(farm[i],n,max,list)) > bestv)
    { bestv = temp; besti = i; }

  cout << "Items to take : " << endl << endl;
  for(i=0;i<n;i++)
  {
    if(farm[besti].GetBit(i))
    {
      cout << list[i].name << "," << endl;
      bestw += list[i].weight;
    }
  }
  cout << endl;
  cout << "for a weight of " << bestw
       << "kg and value of " << bestv << endl;
  delete[] farm;
  destroyitems(list);
}

The input file knapsack.dat is

12
tent 11 20
canteen_(filled) 7 10
change_of_clothes 5 11
camp_stoves 4 5
sleeping_bag 3 25
dried_food 3 50
first-aid_kit 3 15
mosquito_repellent 2 12
flashlight 2 6
novel 2 4
rain_gear 2 5
water_purifier 1 30
20

The output is

Items to take

change_of_clothes,
sleeping_bag,
dried_food,
first-aid_kit,
mosquito_repellent,
flashlight,
water_purifier,

for a weight of 19kg and value of 149

We can extend the knapsack problem to one with m knapsacks. The capacity of
knapsack j is denoted by $M_j$. The problem statement becomes

$$\text{maximize} \quad \sum_{j=0}^{m-1} \sum_{i=0}^{n-1} v_i x_{i,j} \quad \text{subject to} \quad \sum_{i=0}^{n-1} w_i x_{i,j} \le M_j \quad \text{and} \quad \sum_{j=0}^{m-1} x_{i,j} = 1$$

where $x_{i,j} \in \{0, 1\}$. Here $x_{i,j} = 0$ means that item i should not be included in
knapsack j, and $x_{i,j} = 1$ means that it should be included. The meanings of $w_i$ and
$v_i$ are the same as for the single knapsack problem.

15.11.3 Traveling Salesman Problem


The traveling salesman problem is a combinatorial optimization problem. Many
combinatorial optimization problems like the traveling salesman problem can be
formulated as follows. Let

$$\pi = \{i_1, i_2, \ldots, i_n\}$$

be some permutation from the set

$$\{1, 2, \ldots, n\}.$$

The number of permutations is n!. Let $\Omega$ be the space of feasible solutions (states)
and $f(\pi)$ the optimality function (criterion). It is necessary to find $\pi^*$ such that

$$\pi^* = \{i_1^*, i_2^*, \ldots, i_n^*\} = \arg\min_{\pi \in \Omega} f(\pi).$$

The structure of $\Omega$ and $f(\pi)$ depends on the problems considered. A typical problem
is the traveling salesman problem. The traveling salesman problem is deceptively
simple to state. Given the distances separating a certain number of towns, the aim
is to find the shortest tour that visits each town once and ends at the town it
started from. As there are several engineering and scientific problems equivalent to
a traveling salesman problem, the problem is of practical importance. The number
of all possible tours is finite, therefore in principle the problem is solvable. However,
the brute force strategy is not only impractical but completely useless even for a
moderate number of towns n, because the number of possible tours grows factorially
with n. The traveling salesman problem is the best-known example of the whole class
of problems called NP-complete (or NP-hard), which makes the problem especially
interesting theoretically. The NP-complete problems are transformable into each
other, and the computation time required to solve any of them grows faster than
any power of the size of the problem. There are strong arguments that a polynomial
time algorithm may not exist at all. Therefore, the aim of the calculations is usually
to find near-optimum solutions.
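
As a small worked example of the cost function (with an assumed distance matrix):
for n = 3 towns with

$$C = \begin{pmatrix} 0 & 2 & 9 \\ 1 & 0 & 6 \\ 7 & 3 & 0 \end{pmatrix}$$

and the cyclic permutation $\pi$ with $\pi(1) = 2$, $\pi(2) = 3$, $\pi(3) = 1$, we obtain
$c(\pi) = c_{12} + c_{23} + c_{31} = 2 + 6 + 7 = 15$.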

The following C++ program permut.cpp finds all permutations of the numbers
1, 2, ..., n. The array element p[0] takes the value 0 at the beginning of the
program. The end of the evaluation is indicated by p[0] = 1.

// permut.cpp
// permutation of the numbers 1,2,...,n

#include <iostream.h>

int main()
{
  int i, j, t, tau;
  unsigned long n = 3;

  int* p = NULL; p = new int[n+1];

  // starting permutation
  // identity 1, 2, ..., n -> 1, 2, ..., n
  for(i=0; i<=n; i++)
  {
    p[i] = i;
    cout << "p[" << i << "] = " << p[i] << " ";
  }
  cout << endl;

  int test = 1;

  do
  {
    i = n - 1;
    while(p[i] > p[i+1]) i = i - 1;
    if(i > 0) test = 1; else test = 0;
    j = n;
    while(p[j] <= p[i]) j = j - 1;

    t = p[i]; p[i] = p[j]; p[j] = t;

    i = i + 1; j = n;
    while(i < j)
    {
      t = p[i]; p[i] = p[j]; p[j] = t;
      i = i + 1; j = j - 1;
    }
    // display result
    for(tau=0; tau<=n; tau++)
      cout << "p[" << tau << "] = " << p[tau] << " ";
    cout << endl;
  } while(test == 1);
  return 0;
}

Goldberg and Lingle [73] suggested a crossover operator, the so-called partially
mapped crossover. They believe it will lead to an efficient solution of the traveling
salesman problem. A partially mapped crossover proceeds as follows. We number
the cities from 0 to N - 1. Let N = 10.

We explain with an example how the partially mapped operator works. Assume
that the parents are

(1 2 3 4 5 6 7 8 9 10 11 12)   a1

(7 3 6 11 4 12 5 2 10 9 1 8)   a2

a1 and a2 are integer arrays. Positions count from 0 to n - 1, where n = 12. We
select two random numbers r1 and r2

$$0 \le r_1 \le n-1, \qquad 0 \le r_2 \le n-1, \qquad r_1 \le r_2.$$

Let r1 = 3, r2 = 6. Truncate parents using r1 and r2.

(1 2 3 | 4 5 6 7 | 8 9 10 11 12)

(7 3 6 | 11 4 12 5 | 2 10 9 1 8)

We obtain the subarrays

s1 = (4 5 6 7) and s2 = (11 4 12 5).

Next we do the crossing

(1 2 3 | 11 4 12 5 | 8 9 10 11 12)

(7 3 6 | 4 5 6 7 | 2 10 9 1 8)

Now some cities occur twice while others are missing in the new array. The crossing
defines the mappings

11 -> 4,  4 -> 5,  12 -> 6,  5 -> 7   (*)

4 -> 11,  5 -> 4,  6 -> 12,  7 -> 5   (**)

Positions which must be fixed are indicated by X

(1 2 3 | 11 4 12 5 | 8 9 10 X X)

(X 3 X | 4 5 6 7 | 2 10 9 1 8)

We fix the first array using the mapping (*).



a) number 11 at position 10 must be fixed

   i) map 11 -> 4, but 4 is in array s2

   ii) map 4 -> 5, but 5 is in array s2

   iii) map 5 -> 7, o.k., 7 is not in array s2

Thus replace number 11 at position 10 by number 7.

b) number 12 at position 11 must be fixed

   i) map 12 -> 6, o.k., 6 is not in array s2

Thus replace number 12 at position 11 by number 6.

We fix the second array using the mapping (**).

a) number 7 at position 0 must be fixed

   i) map 7 -> 5, but 5 is in array s1

   ii) map 5 -> 4, but 4 is in array s1

   iii) map 4 -> 11, o.k., 11 is not in array s1

Thus replace number 7 at position 0 by number 11.

b) number 6 at position 2 must be fixed

   i) map 6 -> 12, o.k., 12 is not in array s1

Thus replace number 6 at position 2 by number 12. Consequently, the children are

(1 2 3 11 4 12 5 8 9 10 7 6)

(11 3 12 4 5 6 7 2 10 9 1 8)
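The children can be checked with a small routine; the following sketch (not from the text; the names are hypothetical) verifies that a PMX child visits each of the cities 1, ..., n exactly once, i.e. that it is still a valid tour.

// validtour.cpp
// A minimal sketch: check that a PMX child is a
// permutation of the cities 1, 2, ..., n.
#include <iostream.h>

int validtour(int *t,int n)
{
  int i;
  int *seen = new int[n+1];
  for(i=0;i<=n;i++) seen[i] = 0;
  for(i=0;i<n;i++)
  {
    if((t[i] < 1) || (t[i] > n)) { delete[] seen; return 0; }
    seen[t[i]]++;   // count each city visited
  }
  for(i=1;i<=n;i++)
    if(seen[i] != 1) { delete[] seen; return 0; }
  delete[] seen;
  return 1;
}

int main(void)
{
  int child[12] = { 1,2,3,11,4,12,5,8,9,10,7,6 };
  cout << validtour(child,12) << endl;  // prints 1
  return 0;
}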

Bac and Perov [4] proposed another operator of crossings using the permutation
group. We illustrate the operator with an example and a C++ program. Let the
parents be given by

(0 1 2 3 4 5 6 7 8 9) -> (8 7 3 4 5 6 0 2 1 9) parent 1
(0 1 2 3 4 5 6 7 8 9) -> (7 6 0 1 2 9 8 4 3 5) parent 2

The permutation map yields

0 -> 8 -> 3
1 -> 7 -> 4
2 -> 3 -> 1

etc. Thus the children are given by

(0 1 2 3 4 5 6 7 8 9) -> (3 4 1 2 9 8 7 0 6 5)
(0 1 2 3 4 5 6 7 8 9) -> (2 0 8 7 3 9 1 5 4 6)

The implementation of this permutation is straightforward.

// tspperm.cpp
#include <iostream.h>

void crossing(int* a1,int* a2,int* a3,int* a4,int n)
{
  int i;
  for(i=0; i<n; i++)
  {
    int p = a1[i];
    a3[i] = a2[p];
  }
  for(i=0; i<n; i++)
  {
    int q = a2[i];
    a4[i] = a1[q];
  }
}

int main()
{
  int n = 10;
  int i;

  int* a1 = NULL; int* a2 = NULL;
  int* a3 = NULL; int* a4 = NULL;

  a1 = new int[n]; a2 = new int[n];
  a3 = new int[n]; a4 = new int[n];
  a1[0] = 8; a1[1] = 7; a1[2] = 3; a1[3] = 4;
  a1[4] = 5; a1[5] = 6; a1[6] = 0; a1[7] = 2;
  a1[8] = 1; a1[9] = 9;
  a2[0] = 7; a2[1] = 6; a2[2] = 0; a2[3] = 1;
  a2[4] = 2; a2[5] = 9; a2[6] = 8; a2[7] = 4;
  a2[8] = 3; a2[9] = 5;

  crossing(a1,a2,a3,a4,n);

  cout << endl;
  for(i=0; i<n; i++)
  {
    cout << "a3[" << i << "] = " << a3[i] << " ";
    if(((i+1)%2) == 0) { cout << endl; }
  }
  cout << endl;
  for(i=0; i<n; i++)
  {
    cout << "a4[" << i << "] = " << a4[i] << " ";
    if(((i+1)%2) == 0) { cout << endl; }
  }

  delete [] a1; delete [] a2;
  delete [] a3; delete [] a4;

  return 0;
}

In the following program we use these operators to find solutions to the traveling
salesman problem.

// tsp.cpp
//
// traveling salesman problem

#include <iostream.h>
#include <fstream.h>
#include <stdlib.h>
#include <time.h>
#include "bitvect.h"

void readdist(char* filename,double **&dist,int &cities)
{
  int i,j;
  ifstream d(filename);
  d >> cities;
  dist = new double*[cities];
  for(i=0;i<cities;i++)
    dist[i] = new double[cities];
  for(i=0;i<cities;i++)
    for(j=i+1;j<cities;j++)
    {
      d >> dist[i][j];
      dist[j][i] = dist[i][j];
    }
  for(i=0;i<cities;i++) dist[i][i] = 0;
  cout << "d[0][0] = " << dist[0][0] << endl;
  d.close();
}

void destroydist(double **dist,int cities)
{
  for(int i=0;i<cities;i++) delete[] dist[i];
  delete[] dist;
}

double distance(int *seq,int cities,double **dist)
{
  double sumdist = 0.0;
  for(int i=1;i<cities;i++)
    sumdist += dist[seq[i]][seq[i-1]];
  sumdist += dist[seq[0]][seq[cities-1]];
  return sumdist;
}

void setupfarm(int **farm,int n,int cities)
{
  BitVector used(cities);
  int city,i,j;
  srand(time(NULL));
  for(i=0;i<n;i++)
  {
    for(j=0;j<cities;j++) used.SetBit(j,0);
    for(j=0;j<cities;j++)
    {
      city = rand()%cities;
      if(!used.GetBit(city)) { farm[i][j] = city; used.SetBit(city); }
      else j--;
    }
  }
}

void mutate(int **farm,int n,int cities,double **dist)
{
  int i;
  int seq = rand()%n;
  int pos1 = rand()%cities;
  int pos2 = rand()%cities;
  while(pos2 == pos1) pos2 = rand()%cities;
  int *mutated = new int[cities];
  for(i=0;i<cities;i++) mutated[i] = farm[seq][i];
  mutated[pos1] = farm[seq][pos2];
  mutated[pos2] = farm[seq][pos1];
  if(distance(farm[seq],cities,dist) > distance(mutated,cities,dist))
  {
    delete[] farm[seq]; farm[seq] = mutated;
  }
  else delete[] mutated;
}

void permutate(int** farm,int n,int cities,double** dist)
{
  int i;
  int seq1 = rand()%n;
  int seq2 = rand()%n;
  int *result1, *result2, *result3, *result4;
  while(seq2 == seq1) seq2 = rand()%n;
  int *child1 = new int[cities];
  int *child2 = new int[cities];
  for(i=0;i<cities;i++)
  {
    child1[i] = farm[seq2][farm[seq1][i]];
    child2[i] = farm[seq1][farm[seq2][i]];
  }
  if(distance(farm[seq1],cities,dist) > distance(child1,cities,dist))
    result1 = child1;
  else result1 = farm[seq1];
  if(distance(farm[seq2],cities,dist) > distance(child2,cities,dist))
    result2 = child2;
  else result2 = farm[seq2];
  result3 = ((result1 == farm[seq1])?child1:farm[seq1]);
  result4 = ((result2 == farm[seq2])?child2:farm[seq2]);
  farm[seq1] = result1;
  farm[seq2] = result2;
  delete [] result3;
  delete [] result4;
}

int insequence(int el,int *seq,int p1,int p2)
{
  for(int i=p1;i<p2;i++)
    if(seq[i] == el) return i;
  return -1;
}

void pmx(int **farm,int n,int cities,double **dist)
{
  int i,pos;
  int seq1 = rand()%n;
  int seq2 = rand()%n;
  int *result1, *result2, *result3, *result4;
  while(seq2 == seq1) seq2 = rand()%n;
  int pos1 = rand()%cities;
  int pos2 = rand()%cities;
  while(pos2 == pos1) pos2 = rand()%cities;
  if(pos2 < pos1) { i = pos2; pos2 = pos1; pos1 = i; }
  int *child1 = new int[cities];
  int *child2 = new int[cities];
  for(i=0;i<cities;i++)
  {
    if((i<pos2) && (i>=pos1))
    {
      child1[i] = farm[seq2][i];
      child2[i] = farm[seq1][i];
    }
    else
    {
      child1[i] = farm[seq1][i];
      child2[i] = farm[seq2][i];
    }
  }

  for(i=0;i<cities;i++)
  {
    if((i<pos1) || (i>=pos2))
      while((pos = insequence(child1[i],child1,pos1,pos2)) >= 0)
        child1[i] = child2[pos];
    if((i<pos1) || (i>=pos2))
      while((pos = insequence(child2[i],child2,pos1,pos2)) >= 0)
        child2[i] = child1[pos];
  }

  if(distance(farm[seq1],cities,dist) > distance(child1,cities,dist))
    result1 = child1;
  else result1 = farm[seq1];
  if(distance(farm[seq2],cities,dist) > distance(child2,cities,dist))
    result2 = child2;
  else result2 = farm[seq2];
  result3 = ((result1 == farm[seq1])?child1:farm[seq1]);
  result4 = ((result2 == farm[seq2])?child2:farm[seq2]);
  farm[seq1] = result1;
  farm[seq2] = result2;
  delete [] result3;
  delete [] result4;
}

int main(void)
{
  int N = 20;            // number of animals/chromosomes
  int iterations = 300;
  cout << N << endl;
  int** farm = NULL;
  int i,j;
  double** dist = NULL;  // array of distances
  int cities;            // number of cities
  readdist("tsp.dat",dist,cities);
  cout << "Cities: " << cities << endl;
  farm = new int*[N];
  for(i=0;i<N;i++) farm[i] = new int[cities];
  setupfarm(farm,N,cities);
  for(i=0;i<iterations;i++)
  {
    mutate(farm,N,cities,dist);
    permutate(farm,N,cities,dist);
    pmx(farm,N,cities,dist);
  }
  for(i=0;i<N;i++)
  {
    for(j=0;j<cities;j++) cout << farm[i][j] << " ";
    cout << " distance:" << distance(farm[i],cities,dist) << endl;
  }
  destroydist(dist,cities);
  return 0;
}

The input file tsp.dat is:

8
14.5413
20.7663
13.5059
19.6041
10.4139
4.60977
14.5344
6.34114
5.09313
9.12195
5
12.0416
14.0357
8.70919
10.4938
11.2432
18.3742
18.8788
14.213
7.5326
12.6625
17.7071
9.72677
15.4729
10.5361
7.2111
10.198
10

A typical output is

Cities: 8
7 4 2 3 1 5 0 6  distance:64.8559
7 4 2 3 1 5 0 6  distance:64.8559
7 4 2 3 1 5 0 6  distance:64.8559
0 6 5 1 3 2 4 7  distance:66.1875
7 4 2 3 1 5 0 6  distance:64.8559
7 4 2 3 1 5 0 6  distance:64.8559
7 4 2 3 1 5 0 6  distance:64.8559
0 6 5 1 3 2 4 7  distance:66.1875
7 4 2 3 1 5 0 6  distance:64.8559
4 2 1 3 0 6 5 7  distance:67.9889
7 4 2 3 1 5 0 6  distance:64.8559
7 4 2 3 1 5 0 6  distance:64.8559
0 6 5 1 3 2 4 7  distance:66.1875
0 6 5 1 3 2 4 7  distance:66.1875
7 4 2 3 1 5 0 6  distance:64.8559
7 4 2 3 1 5 0 6  distance:64.8559
7 4 2 3 1 5 0 6  distance:64.8559
7 4 2 3 1 5 6 0  distance:66.1875
7 4 2 3 1 5 0 6  distance:64.8559
7 4 2 3 1 5 0 6  distance:64.8559

15.12 Other Applications for Genetic Algorithms


Many other optimization problems can be solved using genetic algorithms. Two of
them are the bin packing problem and Steiner's problem.

The bin packing problem is as follows. Given bins each of size $S$ and $m$ objects with sizes

$$s_0, s_1, \ldots, s_{m-1}$$

minimize the number of bins $n$

$$\text{subject to} \quad \sum_{i=0}^{m-1} s_i x_{i,j} \le S, \qquad \sum_{j=0}^{n-1} x_{i,j} = 1,$$

where $x_{i,j} \in \{0,1\}$. We use $x_{i,j} = 1$ when object $i$ is in bin $j$.

Genetic algorithms can also be applied to Steiner's problem. In this problem there
are $n$ villages. Village $j$ requires $L_j$ phone lines to a station. A line costs $c$ per
kilometer. Determine where to place a single station such that the total cost for the
phone lines is minimized. The set of positions of the villages is

$$\{x_0, x_1, \ldots, x_{n-1}\}.$$

Thus we must minimize

$$\sum_{j=0}^{n-1} c L_j \|x_j - s\|$$

where $s$ is the location of the station.
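A minimal sketch (not from the text; names are hypothetical) of the cost to be minimized, for villages at positions (x[j], y[j]) in the plane:

// steinercost.cpp
// A minimal sketch: total line cost for a candidate
// station position (sx,sy) in Steiner's problem.
#include <iostream.h>
#include <math.h>

double cost(double *x,double *y,double *L,int n,double c,
            double sx,double sy)
{
  double sum = 0.0;
  for(int j=0;j<n;j++)  // c * L[j] * distance from village j to the station
    sum += c*L[j]*sqrt((x[j]-sx)*(x[j]-sx)+(y[j]-sy)*(y[j]-sy));
  return sum;
}

int main(void)
{
  double x[2] = { 0.0, 4.0 }, y[2] = { 0.0, 0.0 };
  double L[2] = { 1.0, 1.0 };
  // station halfway between two villages 4 km apart
  cout << cost(x,y,L,2,1.0,2.0,0.0) << endl;  // prints 4
  return 0;
}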



15.13 Distributed Global Optimization


Distributed global optimization [172] is a technique which attempts to overcome
some of the limitations of other techniques for global optimization such as gradient
descent, Monte Carlo methods and genetic algorithms. It does not suffer from the loss of
speed due to extensive use of random numbers or from the lack of a stopping condition,
and it does not require the tuning of parameters (such as the population size in genetic
algorithms) to aid effectiveness. Furthermore the function to be optimized does not
have to be continuous or differentiable.

The algorithm starts with an initial guess S(0). This can be randomly generated or
specifically chosen. Each iteration of the algorithm produces a new point S(i). We
use the transformation T(m1, m2, S) to generate two new points for every
point S in a set. The transformations operate on the part of the point identified by
m1 and m2, where m1 and m2 are bit positions. Each resolution must use a power
of 2 as the number of bits of representation.
1. Set the resolution n := 1.

2. From S(i) generate $2^{n+1} - 1$ points as follows.

   (a) For j = 0, ..., n set
   $$P(j) := \bigcup_{k=0}^{2^j - 1} T\left(k2^{n-j},\ (k+1)2^{n-j} - 1,\ S(i)\right).$$

   (b) The points are in $\bigcup_j P(j) \cup \{S(i)\}$.

3. Find the point k with minimum function value from the $2^{n+1}$ points.

4. If this is a new minimum set S(i+1) := k and goto 2.

5. Increase the resolution, i.e. increment n.

6. If the resolution is less than the maximum resolution goto 2.

7. Stop.
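For example, at resolution $n = 1$ step 2 generates $2^{n+1} - 1 = 3$ points: $P(0) = \{T(0, 1, S(i))\}$ transforms the two selected bits together, while $P(1) = \{T(0, 0, S(i)),\ T(1, 1, S(i))\}$ transforms each bit separately. Together with $S(i)$ itself these are the $2^{n+1} = 4$ points examined in step 3.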
Now we give an example program to calculate the maximum of the function

$$\cos(x) - \sin(2x).$$

Since the distributed global optimization technique attempts to find a minimum func-
tion value, we use $-(\cos(x) - \sin(2x))$ in the program for function evaluation. For
the transformation we use simple bit inversion (i.e. we apply NOT to the selected
bits). Valafar [172] recommends using the Gray code to transform the selected bits.
The chosen transform influences the effectiveness of the technique.
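As a sketch of such an alternative (the program below uses plain bit inversion instead; the function name grayT and the exact bit-field convention are assumptions), a Gray-code based transform could look as follows. It has the same signature as T below, so it could be passed to dgo in its place.

unsigned int grayT(int m1,int m2,unsigned int x)
{
  // width of the selected bit field m1..m2
  unsigned int width = m2-m1+1;
  unsigned int mask = ((width < 32) ? (1u << width)-1 : ~0u) << m1;
  unsigned int field = (x & mask) >> m1;      // extract the selected bits
  unsigned int gray = field^(field >> 1);     // binary -> Gray code
  return (x & ~mask) | ((gray << m1) & mask); // re-insert transformed bits
}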

// dgo.cpp

#include <iostream>
#include <math.h>
#include <stdlib.h>
#include <time.h>

using namespace std;

const double pi = 3.1415927;

// maximum bit resolution between 1 and 32
const int maxres = 20;

typedef double (*function)(unsigned int);
typedef unsigned int (*transform)(int,int,unsigned int);

unsigned int pow2(unsigned int y)
{
  static unsigned int pow2table[32];
  static int init = 0;
  unsigned int p2;

  if(!init)
  {
    pow2table[0] = 1;
    for(p2=1;p2<32;p2++) pow2table[p2] = 2*pow2table[p2-1];
    init = 1;
  }
  if(y > 31) return 0;
  return pow2table[y];
}

void dgo(unsigned int &S,function f,transform T)
{
  int n = 1,j,k,newmin;
  unsigned int *P,*q;
  double min;

  min = f(S);
  while(n < maxres)
  {
    q = P = new unsigned int[pow2(n+1)-1];
    newmin = 0;
    for(j=0;j <= n;j++)
    for(k=0;k < pow2(j);k++)
    {
      *q = T(k*pow2(n-j),(k+1)*pow2(n-j)-1,S);
      if((f(*q) < min)&&(S != *q))

      {
        min = f(*q);
        S = *q;
        newmin = 1;
      }
      q++;
    }
    delete[] P;
    if(!newmin) n++;
  }
}

double f(unsigned int a)
{
  double x = 2*pi*a/(pow2(8*sizeof(unsigned int))-1);
  return -cos(x)+sin(2*x);
}

unsigned int T(int m1,int m2,unsigned int x)
{
  unsigned int mask = pow2(m2+1)-pow2(m1);
  return x^mask;  // invert (NOT) the selected bits
}

int main(void)
{
  unsigned int S;
  double x;

  srand(time(NULL));
  S = rand();
  dgo(S,f,T);
  x = 2*pi*S/(pow2(8*sizeof(unsigned int))-1);
  cout << "For cos(x)-sin(2x) DGO gives " << -f(S)
       << " at " << x << endl;
  return 0;
}

The program output is

For cos(x)-sin(2x) DGO gives 1.76017 at 5.64887



15.14 Genetic Programming


Genetic programming [107] uses the techniques of genetic algorithms to generate
"programs". The programs are evaluated according to their effectiveness in solving
a certain problem. It may not be possible to find a program that solves a given
problem exactly. The constructs and instructions provided by a language may be
insufficient to solve a problem. In this case the genetic algorithm techniques attempt
to approximate a solution. A simple illustration of genetic programming is symbolic
regression, which attempts to find a function which best fits a given set of data
points. We construct the function symbolically; the function consists of the basic
operations +, - and * acting on polynomials of a single variable, or one of the
functions cos, sin or exp. We exclude division to avoid division by zero errors.
Obviously not all functions can be simulated exactly without division.
Since each operation takes at most two arguments, the functions can be represented
as a binary tree. For mutation, a binary operation in the tree is selected and
randomly changed to one of the other operations, or a leaf node is replaced with
a randomly generated tree. For crossover, subtrees are swapped. We keep only
the fittest third of the population and replace the rest with new individuals. This
attempts to provide the variety of functions needed to find the best fit.

In the following program we apply the technique to the functions

cos(x), cos(2x)

using 10 data points, and

$$\frac{1}{1+x}$$

using 20 data points. We use SymbolicC++ [169] to create the expressions for the
functions and to evaluate the functions at the data points. The generation of a
tree for a symbolic expression is simple. The function type is randomly determined,
and if the function takes any parameters these must also be generated. For every
function parameter generated the probability that the next parameter is a leaf node
(a constant, or variable, but not a function) is increased. Thus we can ensure
that a randomly generated tree will not exceed a certain depth. The crossover
function randomly selects two individuals from the population, and then selects the
subtrees to be swapped. This is done by randomly determining if the current node
in the tree will be used as the subtree or if one of its branches will be used. The
process is repeated until a node is selected or a leaf node is found. An improvement
would be to use the roulette wheel method to select candidates for crossover. The
function fitness takes a symbolic tree, converts it to a symbolic expression and
then calculates the error for each data point. The sum of these errors is used as the
fitness. A fitness of zero is a perfect match.
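For data points $(x_i, y_i)$, $i = 0, 1, \ldots, N-1$, the fitness of a candidate function $f$ is thus

$$\mathrm{fitness}(f) = \sum_{i=0}^{N-1} |f(x_i) - y_i|$$

so that smaller values are better and zero is a perfect fit.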

The program below uses small populations and few iterations of the algorithm in
order to reduce the execution time and memory requirements. To achieve better
results, larger populations should be used to perform a more extensive global search
for the optimum.

// sreg.cpp

#include <iostream.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>
#include "Msymbol.h"

const double pmutate = 0.3;
const double pleaf = 1.7;
const int ops = 9;
const int ops1 = 3;
const int ops2 = 6;
const char types[ops] = {'0','1','x','c','s','e','+','-','*'};

struct stree
{
  int type;
  struct stree *arg[2];
};

Sum<double> zero(0.0), one(1.0), x("x",1);

Sum<double> symbolic(struct stree st)
{
  switch(types[st.type])
  {
    case '0' : return zero;
    case '1' : return one;
    case 'x' : return x;
    case 'c' : return cos(symbolic(*(st.arg[0])));
    case 's' : return sin(symbolic(*(st.arg[0])));
    case 'e' : return exp(symbolic(*(st.arg[0])));
    case '+' : return symbolic(*(st.arg[0]))+symbolic(*(st.arg[1]));
    case '-' : return symbolic(*(st.arg[0]))-symbolic(*(st.arg[1]));
    case '*' : return symbolic(*(st.arg[0]))*symbolic(*(st.arg[1]));
    case '/' : return symbolic(*(st.arg[0]))/symbolic(*(st.arg[1]));
  }
  return zero;
}

void destroy(struct stree st)
{
  if(st.type >= ops1)
  {
    destroy(*(st.arg[0]));
    delete st.arg[0];
  }
  if(st.type >= ops2)
  {
    destroy(*(st.arg[1]));
    delete st.arg[1];
  }
}

struct stree generate(double p=0.1)
{
  double q = double(rand())/RAND_MAX;
  int type;
  struct stree st;

  if(q > p)
    type = rand()%ops;
  else
    type = rand()%ops1;

  st.type = type;

  if(type >= ops1)
  { st.arg[0] = new struct stree; *(st.arg[0]) = generate(p*pleaf); }
  if(type >= ops2)
  { st.arg[1] = new struct stree; *(st.arg[1]) = generate(p*pleaf); }
  if((symbolic(st) == zero) || (symbolic(st) == one))
  {
    destroy(st);
    return generate(p);
  }
  return st;
}

void mutate(struct stree *st,int n)
{
  int ind = rand()%n;
  int mut = rand();
  int branch = rand()%2;
  double p = double(rand())/RAND_MAX;

  if(n == 1) ind = 0;
  if(st[ind].type < ops1)
  { destroy(st[ind]); st[ind] = generate(); return; }
  else if(st[ind].type < ops2)
  { mut %= ops2-ops1; mut += ops1; branch = 0; }
  else
  { mut %= ops-ops2; mut += ops2; }

  if((p < pmutate) || (st[ind].type < ops2))
    st[ind].type = mut;
  else mutate(st[ind].arg[branch],1);
}

struct stree copy(struct stree st)
{
  stree str = st;
  if(st.type >= ops1)
  {
    str.arg[0] = new struct stree;
    *(str.arg[0]) = copy(*(st.arg[0]));
  }
  if(st.type >= ops2)
  {
    str.arg[1] = new struct stree;
    *(str.arg[1]) = copy(*(st.arg[1]));
  }
  return str;
}

void crossover(struct stree *st,int n)
{
  int ind1 = rand()%n;
  int ind2 = rand()%n;

  while(st[ind1].type < ops1) ind1 = rand()%n;
  while(st[ind2].type < ops1) ind2 = rand()%n;

  st[n] = copy(st[ind1]);
  st[n+1] = copy(st[ind2]);

  struct stree *stn1,*stn2,*stp1,*stp2;

  int d1 = 0,d2 = 0,arg1 = 0,arg2 = 0;

  stp1 = stn1 = &(st[ind1]); stp2 = stn2 = &(st[ind2]);

  while(!d1)
  {
    arg1 = rand()%2;
    double p = double(rand())/RAND_MAX;
    if(stn1->type < ops2) arg1 = 0;
    stp1 = stn1;
    stn1 = stn1->arg[arg1];
    if((stn1->type < ops1) || (p > 0.5))
      d1 = 1;
  }
  while(!d2)
  {
    arg2 = rand()%2;
    double p = double(rand())/RAND_MAX;
    if(stn2->type < ops2) arg2 = 0;
    stp2 = stn2;
    stn2 = stn2->arg[arg2];
    if((stn2->type < ops1) || (p > 0.5))
      d2 = 1;
  }

  struct stree *temp = stp1->arg[arg1];
  stp1->arg[arg1] = stp2->arg[arg2];
  stp2->arg[arg2] = temp;
}

double fitness(struct stree st,double* data,int n)
{
  double sum = 0.0;
  int i;
  for(i=0;i < n; i++)
  {
    x.set(*(data++));
    sum += fabs(symbolic(st).nvalue()-*(data++));
  }
  return sum;
}

void gp(int n,int m,double *data,int d)
{
  int i,j,k;
  struct stree *st = new struct stree[n+2];

  for(i=0;i < n+2;i++) st[i] = generate();

  for(i=0;i < m;i++)
  {
    cout << i+1 << "/" << m << " generations \r";
    cout.flush();
    mutate(st,n);
    crossover(st,n);
    for(j=0;j < n+2;j++)
    {
      for(k=j+1;k < n+2;k++)
      if(fitness(st[j],data,d) > fitness(st[k],data,d))
      {
        struct stree temp = st[j];
        st[j] = st[k];
        st[k] = temp;
      }
    }
    for(j=n/3+1;j < n+2;j++)
    { destroy(st[j]); if(j < n) st[j] = generate(); }
  }
  cout << endl;
  Sum<double> f = symbolic(st[0]);
  cout << f << endl;
  for(i=0;i < 2*d;i+=2)
  {
    x.set(data[i]);
    cout << data[i+1] << " " << f.nvalue() << endl;
  }
  for(i=0;i < n;i++) destroy(st[i]);
  delete [] st;
}

int main(void)
{
  const double pi = 3.1415927;
  double data[20*2];
  int i;

  srand(time(NULL));

  for(i=0;i < 20;i+=2) { data[i] = i*pi/10; data[i+1] = cos(data[i]); }
  gp(6,50,data,10);
  cout << endl;

  for(i=0;i < 20;i+=2) { data[i] = i; data[i+1] = i*i; }
  gp(6,50,data,10);
  cout << endl;

  for(i=0;i < 20;i+=2) { data[i] = i; data[i+1] = cos(2*i); }
  gp(8,60,data,10);
  cout << endl;

  for(i=0;i < 40;i+=2) { data[i] = i; data[i+1] = 1.0/(i+1.0); }
  gp(3,100,data,20);
  cout << endl;
  return 0;
}

The program outputs the function which best fits the data points, and the data point
and function evaluation for the different input values. The left column of numbers
contains the desired values, the right column the obtained values.

50/50 generations
cos (x)
1 1
0.809024 0.809024
0.309041 0.309041
-0.308981 -0.308981
-0.808987 -0.808987
-1 -1
-0.809061 -0.809061
-0.309101 -0.309101
0.308921 0.308921
0.80895 0.80895

50/50 generations
x^(2)
0 0
4 4
16 16
36 36
64 64
100 100
144 144
196 196
256 256
324 324

60/60 generations
sin(cos(2*x))
1 0.841471
-0.653644 -0.608083
-0.1455 -0.144987
0.843854 0.74721
-0.957659 -0.817847
0.408082 0.39685
0.424179 0.411573
-0.962606 -0.820683
0.834223 0.740775
-0.127964 -0.127615

100/100 generations
sin((exp(x)*x+x^(2))*sin(x)*x)
1 0
0.333333 0.396537
0.2 0.429845
0.142857 -0.199608

0.111111 0.976007
0.0909091 0.997236
0.0769231 -0.246261
0.0666667 0.327578
0.0588235 0.762098
0.0526316 -0.998154
0.047619 -0.677241
0.0434783 -0.998545
0.04 0.209575
0.037037 -0.863178
0.0344828 -0.987161
0.0322581 0.492514
0.030303 -0.326958
0.0285714 -0.801911
0.027027 -0.328644
0.025641 0.782954

15.15 Gene Expression Programming


Gene expression programming is a genome/phenome genetic algorithm [66] which
combines the simplicity of genetic algorithms and the abilities of genetic program-
ming. In a sense gene expression programming is a generalization of genetic algo-
rithms and genetic programming.

A gene is a symbolic string with a head and a tail. Each symbol represents an
operation. For example the operation "+" takes two arguments and adds them.
The operation "x" would evaluate to the value of the variable x. The tail consists
only of operations which take no arguments. The string represents expressions in
prefix notation, i.e. 5 - 3 would be stored as "- 5 3". The reason for the tail is to
ensure that the expression is always complete. Suppose the string has h symbols in
the head, which is specified as an input to the algorithm, and t symbols in the tail,
which is determined from h. Thus if n is the maximum number of arguments for an
operation we must have
$$h + t - 1 = hn.$$
The left-hand side is the total number of symbols except for the very first symbol.
The right-hand side is the total number of arguments required for all operations. We
assume, of course, that each operation requires the maximum number of arguments
so that any string of this length is a valid string for the expression. Thus the equation
states that there must be enough symbols to serve as arguments for all operations.
Now we can determine the required length for the tail

t = h(n - 1) + 1.

Suppose we use h = 8, and n = 2 for arithmetic operations. Thus the tail length
must be t = 9. So the total gene length is 17. We could then represent the expression

$$\cos(x^2 + 2) - \sin(x)$$

with the string

-c+*xx2s|x1x226x31

The vertical bar | is used to indicate the beginning of the tail. Here c represents cos()
and s represents sin().

A chromosome is a series of genes. The genes combine to form an expression using


some operation with the same number of arguments as genes in the chromosome.
For example the expressions of genes of a chromosome may be added together.

A number of operations are applied to chromosomes.

• Replication. The chromosome is unchanged. The roulette wheel selection


technique can be used to select chromosomes for replication.

• Mutation. Randomly change symbols in a chromosome. Symbols in the tail
of a gene may not operate on any arguments. Typically 2 point mutations per
chromosome are used.

• Insertion. A portion of a chromosome is chosen to be inserted in the head of
a gene. The tail of the gene is unaffected. Thus symbols are removed from the
end of the head to make room for the inserted string. Typically a probability
of 0.1 of insertion is used. Suppose +x2 is to be inserted into

-c+*xx2s|x1x226x31

at the fourth position in the head. We obtain

-c++x2*x|x1x226x31

which represents
$$\cos((x + 2) + x^2) - 1.$$

• Gene transposition. One gene in a chromosome is randomly chosen to be the


first gene. All other genes in the chromosome are shifted downwards in the
chromosome to make place for the first gene.

• Recombination. The crossover operation. This can be one point (the chro-
mosomes are split in two and corresponding sections are swapped), two point
(chromosomes are split in three and the middle portion is swapped) or gene
(one entire gene is swapped between chromosomes) recombination. Typically
the sum of the probabilities of recombination used is 0.7.

In the following program we implement these techniques. The example is the same as
for genetic programming. This implementation is faster and more accurate than the
implementation for genetic programming. This is a result of the relative simplicity of
gene expression programming. For simplicity we use only one gene in a chromosome
and only one point recombination.

// gep.cpp

#include <iostream.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>
#include <string.h>

const double pi = 3.1415927;

const int nsymbols = 8;
// 2 terminal symbols (no arguments) x and 1
const int terminals = 2;
// terminal symbols first
const char symbols[nsymbols] = {'1','x','c','s','e','+','-','*'};
const int n = 2;  // for +,- and *
int h = 5;

double evalr(char *&e,double x)
{
  switch(*(e++))
  {
    case '1' : return 1.0;
    case 'x' : return x;
    case 'c' : return cos(evalr(e,x));
    case 's' : return sin(evalr(e,x));
    case 'e' : return exp(evalr(e,x));
    // evaluate the left operand first: the order in which the
    // two recursive calls advance the pointer e must be fixed
    case '+' : { double a = evalr(e,x); return a+evalr(e,x); }
    case '-' : { double a = evalr(e,x); return a-evalr(e,x); }
    case '*' : { double a = evalr(e,x); return a*evalr(e,x); }
    default  : return 0.0;
  }
}

double eval(char *e,double x)
{
  char *c = e;
  return evalr(c,x);
}

void printr(char *&e)
{
  switch(*(e++))
  {
    case '1' : cout << '1';
               break;
    case 'x' : cout << 'x';
               break;
    case 'c' : cout << "cos("; printr(e);
               cout << ")";
               break;
    case 's' : cout << "sin("; printr(e);
               cout << ")";
               break;
    case 'e' : cout << "exp("; printr(e); cout << ")";
               break;
    case '+' : cout << '(';
               printr(e);
               cout << '+';
               printr(e);
               cout << ')';
               break;
    case '-' : cout << '(';
               printr(e);
               cout << '-';
               printr(e);
               cout << ')';
               break;
    case '*' : cout << '(';
               printr(e);
               cout << '*';
               printr(e);
               cout << ')';
               break;
  }
}

void print(char *e)
{
  char *c = e;
  printr(c);
}

double fitness(char *c,double *data,int N)
{
  int j;
  double sum = 0.0;

  // the total error over all data points
  for(j=0; j<N; j++)
    sum += fabs(eval(c,data[2*j])-data[2*j+1]);
  return sum;
}

// N number of data points
// population of size P
// eps = accuracy required

void gep(double *data,int N,int P,double eps)
{
  int i,j,k,replace,replace2,rlen,rp;
  int t = h*(n-1)+1;
  int gene_len = h+t;
  int pop_len = P*gene_len;
  int iterations = 0;
  char *population = new char[pop_len];
  char *elim = new char[P];
  int toelim = P/2;
  double bestf,f;     // best fitness, fitness value
  double sumf = 0.0;  // sum of fitness values
  double pm = 0.1;    // probability of mutation
  double pi = 0.4;    // probability of insertion
  double pr = 0.7;    // probability of recombination
  double r,lastf;     // random numbers and roulette wheel selection
  char *best = (char*)NULL,*iter;  // best gene, iteration variable

  // initialize the population
  for(i=0;i < pop_len;i++)
    if(i%gene_len < h)
      population[i] = symbols[rand()%nsymbols];
    else
      population[i] = symbols[rand()%terminals];

  // initial calculations
  bestf = fitness(population,data,N);
  best = population;
  for(i=0,sumf=0.0,iter=population;i < P;i++,iter+=gene_len)
  {
    f = fitness(iter,data,N);
    sumf += f;
    if(f < bestf)
    {
      bestf = f;
      best = population+i*gene_len;
    }
  }

  while(bestf >= eps)
  {
    // reproduction
    // roulette wheel selection
    for(i=0;i<P;i++)
      elim[i] = 0;
    for(i=0;i < toelim;i++)
    {
      r = sumf*(double(rand())/RAND_MAX);
      lastf = 0.0;
      for(j=0;j < P;j++)
      {
        f = fitness(population+j*gene_len,data,N);
        if((lastf<=r) && (r<f+lastf))
        {
          elim[j] = 1;
          j = P;
        }
        lastf += f;
      }
    }

    for(i=0;i < pop_len;)
    {
      if(population+i == best)
        i += gene_len;  // never modify/replace best gene
      else for(j=0;j < gene_len;j++,i++)
      {
        // mutation or elimination due to failure in selection
        // for reproduction
        if((double(rand())/RAND_MAX<pm) || elim[i/gene_len])
          if(i%gene_len < h)
            population[i] = symbols[rand()%nsymbols];
          else
            population[i] = symbols[rand()%terminals];
      }

      // insertion
      if(double(rand())/RAND_MAX<pi)
      {
        // find a position in the head of this gene for insertion
        // -gene_len for the gene since we have already moved
        // onto the next gene
        replace = i-gene_len;
        rp = rand()%h;
        // a random position for insertion source
        replace2 = rand()%pop_len;
        // a random length for insertion from the gene
        rlen = rand()%(h-rp);
        // create the new gene
        char *c = new char[gene_len];
        // copy the unchanged front portion of the head
        strncpy(c,population+replace,rp);
        // copy the shifted portion of the head
        strncpy(c+rp+rlen,population+replace+rp,h-rp-rlen);
        // copy the tail
        strncpy(c+h,population+replace+h,t);
        // copy the segment to be inserted
        strncpy(c+rp,population+replace2,rlen);

        // if the gene is fitter use it
        if(fitness(c,data,N) < fitness(population+replace,data,N))
          strncpy(population+replace,c,h);
        delete[] c;
      }

      // recombination
      if(double(rand())/RAND_MAX < pr)
      {
        // find a position in the gene for one point recombination
        replace = i-gene_len;
        rlen = rand()%gene_len;
        // a random gene for recombination
        replace2 = (rand()%P)*gene_len;
        // create the new genes
        char *c[5];
        c[0] = population+replace;
        c[1] = population+replace2;
        c[2] = new char[gene_len];
        c[3] = new char[gene_len];
        c[4] = new char[gene_len];
        strncpy(c[2],c[0],rlen);
        strncpy(c[2]+rlen,c[1]+rlen,gene_len-rlen);
        strncpy(c[3],c[1],rlen);
        strncpy(c[3]+rlen,c[0]+rlen,gene_len-rlen);
        // take the fittest genes
        for(j=0;j < 4;j++)
        for(k=j+1;k < 4;k++)
        if(fitness(c[k],data,N) < fitness(c[j],data,N))
        {
          strncpy(c[4],c[j],gene_len);
          strncpy(c[j],c[k],gene_len);
          strncpy(c[k],c[4],gene_len);
        }
        delete[] c[2];
        delete[] c[3];
        delete[] c[4];
      }
    }
}

II fitness
for(i=O,sumf=O.O,iter=populationji < Pji++,iter+=gene_Ien)
{
f = fitness(iter,data,N)j
sumf += fj
if (f < bestf)
{
bestf = fj
15.15 Gene Expression Programming 399

best population+i*gene_len;
}
}
iterations++;
}

print(best);
cout « endl;
cout « "Fitness of " « bestf « " after "
« iterations « " iterations." « endl;
for(i=O;i < N;i++)
cout « data [2*i +1] « " " « eval(best, data [2*i]) « endl;

delete[] population;
delete [] elim;
}

int main(void)
{
  double data[20*2];
  int i;

  srand(time(NULL));

  for(i=0;i < 20;i+=2)
  {
    data[i] = i*pi/10;
    data[i+1] = cos(data[i]);
  }
  gep(data,10,50,0.001);
  cout << endl;

  for(i=0;i < 20;i+=2)
  {
    data[i] = i;
    data[i+1] = i*i;
  }
  gep(data,10,50,0.001);
  cout << endl;

  for(i=0;i < 20;i+=2)
  {
    data[i] = i;
    data[i+1] = cos(2*i);
  }
  gep(data,10,50,0.001);
  cout << endl;
  return 0;
}

/*
Results:

cos(x)
Fitness of 1.95888e-16 after 6 iterations.
1 1
0.809024 0.809024
0.309041 0.309041
-0.308981 -0.308981
-0.808987 -0.808987
-1 -1
-0.809061 -0.809061
-0.309101 -0.309101
0.308921 0.308921
0.80895 0.80895

(x*x)
Fitness of 0 after 0 iterations.
0 0
4 4
16 16
36 36
64 64
100 100
144 144
196 196
256 256
324 324

cos(((x+x)+x)-x)
Fitness of 1.59324e-16 after 191 iterations.
1 1
-0.653644 -0.653644
-0.1455 -0.1455
0.843854 0.843854
-0.957659 -0.957659
0.408082 0.408082
0.424179 0.424179
-0.962606 -0.962606
0.834223 0.834223
-0.127964 -0.127964
*/
Part II
Quantum Computing
Chapter 16
Quantum Mechanics

16.1 Hilbert Spaces


In this chapter we introduce the Hilbert space which plays the central role in quan-
tum mechanics. For a more detailed discussion of this subject we refer to the books
of Balakrishnan [5], Richtmyer [137], Sewell [146], Stakgold [152], Steeb [163], Weid-
mann [181], Yosida [185]. Moreover the proofs of the theorems given in this chapter
can be found in these books. We assume that the reader is familiar with the notion
of a linear space. First we introduce the pre-Hilbert space.

Definition. A linear space L is called a pre-Hilbert space if there is defined a numer-
ical function called the scalar product (or inner product) which assigns to every pair
f, g of vectors of L (f, g ∈ L) a complex number. The scalar product satisfies the
conditions

(a) $(f,f) \ge 0$, $(f,f) = 0$ iff $f = 0$

(b) $(f,g) = \overline{(g,f)}$

(c) $(cf,g) = c(f,g)$ where c is an arbitrary complex number

(d) $(f_1 + f_2, g) = (f_1,g) + (f_2,g)$

where $\overline{(g,f)}$ denotes the complex conjugate of $(g,f)$.

It follows that
$$(f, g_1 + g_2) = (f, g_1) + (f, g_2)$$
and
$$(f, cg) = \bar{c}(f,g).$$
Definition. A linear space E is called a normed space, if for every f ∈ E there is
associated a real number $\|f\|$, the norm of the vector f, such that

(a) $\|f\| \ge 0$, $\|f\| = 0$ iff $f = 0$

(b) $\|cf\| = |c|\,\|f\|$ where c is an arbitrary complex number

(c) $\|f + g\| \le \|f\| + \|g\|$.


The conditions imply that
$$\|f - g\| \ge \bigl|\,\|f\| - \|g\|\,\bigr|.$$
This can be seen as follows. From
$$\|f - g\| + \|g\| \ge \|f\|$$
we obtain
$$\|f - g\| \ge \|f\| - \|g\|.$$
On the other hand
$$\|f - g\| = |-1|\,\|g - f\| \ge \|g\| - \|f\|.$$
The topology of a normed linear space E is thus defined by the distance
$$d(f,g) = \|f - g\|.$$
If a scalar product is given we can introduce a norm. The norm of f is defined by
$$\|f\| := \sqrt{(f,f)}.$$
A vector f ∈ L is called normalized if $\|f\| = 1$.
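For instance, in $\mathbf{C}^2$ with the scalar product $(u, v) := u_1\bar{v}_1 + u_2\bar{v}_2$, the vector $f = (1, i)^T$ satisfies $(f,f) = 1\cdot\bar{1} + i\cdot\bar{i} = 2$, so that $\|f\| = \sqrt{2}$ and $f/\sqrt{2}$ is normalized.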


Definition. Two functions f ∈ L and g ∈ L are called orthogonal if
$$(f,g) = 0.$$
Example. Consider the pre-Hilbert space $\mathbf{R}^4$ and two vectors x and y with $x^T y = 0$. Then x and y are orthogonal.

Definition. A sequence $\{f_n\}$ ($n \in \mathbf{N}$) of elements in a normed space E is called a
Cauchy sequence if, for every $\epsilon > 0$, there exists a number $M_\epsilon$ such that $\|f_p - f_q\| < \epsilon$
for $p, q > M_\epsilon$.

Definition. A normed space E is said to be complete if every Cauchy sequence of
elements in E converges to an element in E.

Definition. A complete pre-Hilbert space is called a Hilbert space.

Definition. A complete normed space is called a Banach space.

Example. The vector space C([a, b]) of all continuous (real or complex valued)
functions on an interval [a, b] with the norm
$$\|f\| = \max_{[a,b]} |f(x)|$$
is a Banach space.

A Hilbert space will be denoted by H in the following. A Banach space will be
denoted by B in the following.

Theorem. Every pre-Hilbert space L admits a completion H which is a Hilbert
space.

Example. Let L = Q. Then H = R.

Before we discuss some examples of Hilbert spaces we give the definitions of strong
and weak convergence in Hilbert spaces.

Definition. A sequence $\{f_n\}$ of vectors in a Hilbert space H is said to converge
strongly to f if
$$\|f_n - f\| \to 0$$
as $n \to \infty$. We write $s\text{-}\lim_{n\to\infty} f_n = f$.

Definition. A sequence $\{f_n\}$ of vectors in a Hilbert space H is said to converge
weakly to f if
$$(f_n, g) \to (f, g)$$
as $n \to \infty$, for any vector g in H. We write $w\text{-}\lim_{n\to\infty} f_n = f$.

It can be shown that strong convergence implies weak convergence. The converse is
not generally true, however.

Example. Consider the sequence
$$f_n(x) := \sin(nx), \qquad n = 1, 2, \ldots$$
in the Hilbert space $L^2[0, \pi]$. The sequence does not tend to a limit in the sense
of strong convergence. However, the sequence tends to 0 in the sense of weak
convergence.

Let us now give several examples of Hilbert spaces which are important in quan-
tum mechanics (and quantum computing). Although quantum computing is mainly
discussed for finite dimensional Hilbert spaces (see chapter 17), infinite dimensional
Hilbert spaces are now also discussed [29, 30, 31, 112, 175].

Example. Every finite dimensional vector space with an inner product is a Hilbert
space. Let $\mathbf{C}^n$ be the linear space of n-tuples of complex numbers with the scalar
product
$$(u, v) := \sum_{j=1}^{n} u_j \bar{v}_j.$$
Then $\mathbf{C}^n$ is a Hilbert space. Let $u \in \mathbf{C}^n$. We write the vector u as a column vector
$$u = (u_1, u_2, \ldots, u_n)^T.$$
Thus we can write the scalar product in matrix notation
$$(u, v) = u^T \bar{v}$$
where $u^T$ is the transpose of u.

Example. By $l_2(\mathbf{N})$ we mean the set of all infinite dimensional vectors (sequences)
$u = (u_1, u_2, \ldots)^T$ of complex numbers $u_j$ such that
$$\sum_{j=1}^{\infty} |u_j|^2 < \infty.$$
Here $l_2(\mathbf{N})$ is a linear space with the operations ($a \in \mathbf{C}$)
$$au = (au_1, au_2, \ldots)^T$$
$$u + v = (u_1 + v_1, u_2 + v_2, \ldots)^T$$
with $v = (v_1, v_2, \ldots)^T$ and $\sum_{j=1}^{\infty} |v_j|^2 < \infty$. One has
$$\sum_{j=1}^{\infty} |u_j + v_j|^2 \le \sum_{j=1}^{\infty}\left(|u_j|^2 + |v_j|^2 + 2|u_j||v_j|\right) \le 2\sum_{j=1}^{\infty}\left(|u_j|^2 + |v_j|^2\right) < \infty.$$

The scalar product is defined as
$$(u, v) := \sum_{j=1}^{\infty} u_j \bar{v}_j = u^T \bar{v}.$$
It can also be proved that this pre-Hilbert space is complete. Therefore $l_2(\mathbf{N})$ is a
Hilbert space. As an example, let us consider
$$u = \left(1, \frac{1}{2}, \frac{1}{3}, \ldots, \frac{1}{n}, \ldots\right)^T.$$
Since
$$\sum_{n=1}^{\infty} \frac{1}{n^2} < \infty$$
we find that $u \in l_2(\mathbf{N})$. Let
$$u = \left(1, \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{3}}, \ldots, \frac{1}{\sqrt{n}}, \ldots\right)^T.$$
Then $u \notin l_2(\mathbf{N})$, since $\sum_{n=1}^{\infty} 1/n$ diverges.

Example. $L^2(M)$ is the space of Lebesgue square-integrable functions on M, where
M is a Lebesgue measurable subset of $\mathbf{R}^n$, where $n \in \mathbf{N}$. If $f \in L^2(M)$, then
$$\int_M |f|^2\, dm < \infty.$$
The integration is performed in the Lebesgue sense. The scalar product in $L^2(M)$
is defined as
$$(f,g) := \int_M f(x)\overline{g(x)}\, dm$$
where $\bar{g}$ denotes the complex conjugate of g. It can be shown that this pre-Hilbert
space is complete. Therefore $L^2(M)$ is a Hilbert space. Instead of dm we also write

dx in the following. If the Riemann integral exists then it is equal to the Lebesgue
integral. However, the Lebesgue integral exists also in cases in which the Riemann
integral does not exist.

Example. Consider the linear space $M_n$ of all $n \times n$ matrices over $\mathbf{C}$. The trace of
an $n \times n$ matrix $A = (a_{jk})$ is given by
$$\mathrm{tr}\, A := \sum_{j=1}^{n} a_{jj}.$$
We define a scalar product by
$$(A, B) := \mathrm{tr}(AB^*)$$
where tr denotes the trace and $B^*$ denotes the conjugate transpose matrix of B. We
recall that $\mathrm{tr}(C+D) = \mathrm{tr}\,C + \mathrm{tr}\,D$, where C and D are $n \times n$ matrices. For example,
if A is the $n \times n$ unit matrix we find
$$(A, A) = \mathrm{tr}(AA^*) = \mathrm{tr}(A) = n.$$
Thus $\|A\| = \sqrt{n}$.

Example. Consider the linear space of all infinite dimensional matrices $A = (a_{jk})$
over $\mathbf{C}$ such that
$$\sum_{j=1}^{\infty}\sum_{k=1}^{\infty} |a_{jk}|^2 < \infty.$$
We define a scalar product by
$$(A, B) := \mathrm{tr}(AB^*)$$
where tr denotes the trace and $B^*$ denotes the conjugate transpose matrix of B. We
recall that $\mathrm{tr}(C+D) = \mathrm{tr}\,C + \mathrm{tr}\,D$ where C and D are infinite dimensional matrices.
The infinite dimensional unit matrix does not belong to this Hilbert space.

Example. Let D be an open set of the Euclidean space $\mathbf{R}^n$. Now $L^2(D)_{pq}$ denotes
the space of all $q \times p$ matrix functions Lebesgue measurable on D such that
$$\int_D \mathrm{tr}\, f(x)f(x)^*\, dm < \infty$$
where m denotes the Lebesgue measure, * denotes the conjugate transpose, and tr
is the trace of the $q \times q$ matrix. We define the scalar product as
$$(f,g) := \int_D \mathrm{tr}\, f(x)g(x)^*\, dm.$$
Then $L^2(D)_{pq}$ is a Hilbert space.

Theorem. All complex infinite dimensional separable Hilbert spaces are isomorphic to $l_2(\mathbf{N})$
and consequently are mutually isomorphic.

Definition. Let S be a subset of the Hilbert space H. The subset S is dense in H
if for every $f \in H$ there exists a Cauchy sequence $\{f_j\}$ in S such that $f_j \to f$ as
$j \to \infty$.

Definition. A Hilbert space is called separable if it contains a countable dense subset
$\{f_1, f_2, \ldots\}$.

Example. The set of all $u = (u_1, u_2, \ldots)^T$ in $l_2(\mathbf{N})$ with only finitely many nonzero
components $u_j$ is dense in $l_2(\mathbf{N})$.

Example. Let $C^{(2)}(\mathbf{R})$ be the linear space of the once continuously differentiable
functions that vanish at infinity together with their first derivative and which are
square integrable. Then $C^{(2)}(\mathbf{R})$ is dense in $L^2(\mathbf{R})$.

In almost all applications in quantum mechanics the underlying Hilbert space is
separable.

Definition. A subspace K of a Hilbert space H is a subset of vectors which themselves
form a Hilbert space.

It follows from this definition that, if K is a subspace of H, then so too is the set $K^\perp$
of vectors orthogonal to all those in K. The subspace $K^\perp$ is termed the orthogonal
complement of K in H. Moreover, any vector f in H may be uniquely decomposed
into components $f_K$ and $f_{K^\perp}$, lying in K and $K^\perp$, respectively, i.e.
$$f = f_K + f_{K^\perp}.$$

Example. Consider the Hilbert space $H = l_2(\mathbf{N})$. Then the vectors
$$(u_1, \ldots, u_N, 0, 0, \ldots)$$
with $u_n = 0$ for $n > N$, form a subspace K. The orthogonal complement $K^\perp$ of K
then consists of the vectors
$$(0, \ldots, 0, u_{N+1}, u_{N+2}, \ldots)$$
with $u_n = 0$ for $n \le N$.

Definition. A sequence $\{\phi_j\}$, $j \in I$ and $\phi_j \in H$, is called an orthonormal sequence if
$$(\phi_j, \phi_k) = \delta_{jk}$$
where I is a countable index set and $\delta_{jk}$ denotes the Kronecker delta, i.e.
$$\delta_{jk} := \begin{cases} 1 & \text{for } j = k \\ 0 & \text{for } j \ne k \end{cases}$$

Definition. An orthonormal sequence $\{\phi_j\}$ in H is an orthonormal basis if every
$f \in H$ can be expressed as
$$f = \sum_{j \in I} a_j \phi_j$$
for some constants $a_j \in \mathbf{C}$. The expansion coefficients $a_j$ are given by
$$a_j = (f, \phi_j).$$

Example. Consider the Hilbert space $H = \mathbf{C}^4$. The scalar product is defined as
$$(u, v) := \sum_{j=1}^{4} u_j \bar{v}_j.$$
The Bell basis for $\mathbf{C}^4$ is an orthonormal basis given by
$$\Phi^+ = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\0\\0\\1\end{pmatrix}, \qquad
\Phi^- = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\0\\0\\-1\end{pmatrix}, \qquad
\Psi^+ = \frac{1}{\sqrt{2}}\begin{pmatrix}0\\1\\1\\0\end{pmatrix}, \qquad
\Psi^- = \frac{1}{\sqrt{2}}\begin{pmatrix}0\\1\\-1\\0\end{pmatrix}.$$
Let u be a normalized state in $\mathbf{C}^4$. Then the expansion coefficients are given by
$$a_{\Phi^\pm} = (u, \Phi^\pm), \qquad a_{\Psi^\pm} = (u, \Psi^\pm).$$
Consequently
$$u = a_{\Phi^+}\Phi^+ + a_{\Phi^-}\Phi^- + a_{\Psi^+}\Psi^+ + a_{\Psi^-}\Psi^-.$$
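For example, for the normalized state $u = (1, 0, 0, 0)^T$ we find $(u, \Phi^+) = (u, \Phi^-) = 1/\sqrt{2}$ and $(u, \Psi^+) = (u, \Psi^-) = 0$, so that
$$u = \frac{1}{\sqrt{2}}\Phi^+ + \frac{1}{\sqrt{2}}\Phi^-.$$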

Example. Let $H = L^2(-\pi, \pi)$. Then an orthonormal basis is given by
$$\left\{ \phi_k(x) := \frac{1}{\sqrt{2\pi}} \exp(ikx) \;:\; k \in \mathbf{Z} \right\}.$$
Let $f \in L^2(-\pi, \pi)$ with $f(x) = x$. Then the expansion coefficients are
$$a_k = (f, \phi_k) = \int_{-\pi}^{\pi} f(x)\overline{\phi_k(x)}\, dx = \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} x \exp(-ikx)\, dx.$$

Remark. We call the expansion
$$f = \sum_{k} a_k \phi_k$$
the Fourier expansion of f.
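Carrying out the integration by parts yields, for $k \ne 0$,
$$a_k = \frac{1}{\sqrt{2\pi}}\left[\frac{x e^{-ikx}}{-ik}\right]_{-\pi}^{\pi} = \frac{\sqrt{2\pi}\, i\, (-1)^k}{k}$$
(the remaining integral vanishes for integer $k \ne 0$), and $a_0 = 0$ since $f(x) = x$ is odd.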



Theorem. Every separable Hilbert space has at least one orthonormal basis.

Inequality of Schwarz. Let $f, g \in H$. Then
$$|(f,g)| \le \|f\| \cdot \|g\|.$$

Triangle inequality. Let $f, g \in H$. Then
$$\|f + g\| \le \|f\| + \|g\|.$$

Let $B = \{\phi_n : n \in I\}$ be an orthonormal basis in a Hilbert space H, where I is the
countable index set. Then

(1) $(\phi_n, \phi_m) = \delta_{nm}$

(2) for all $f \in H$: $\quad f = \sum_{n \in I} (f, \phi_n)\phi_n$

(3) for all $f, g \in H$: $\quad (f,g) = \sum_{n \in I} (f, \phi_n)\overline{(g, \phi_n)}$

(4) if $(f, \phi_n) = 0$ for all $\phi_n \in B$, then $f = 0$

(5) for all $f \in H$: $\quad \|f\|^2 = \sum_{n \in I} |(f, \phi_n)|^2$

Equation (3) is called Parseval's relation.


Examples of orthonormal bases.
1) Rn or en
1 o o
o 1 o
B= o o
o
o o 1

B = { (Ejk ); j, k = 1,2, ... ,n}


where (Ejk ) is the matrix with a one for the entry in row j, column k and zero
everywhere else.

3), 4) Trigonometric bases on $L^2(-\pi, \pi)$, $|x| < \pi$.

5) $L^2[-1, 1]$
$$B = \left\{ \sqrt{\frac{2l+1}{2}}\, P_l(x) \;:\; l = 0, 1, 2, \ldots \right\}$$
The polynomials $P_l$ are called the Legendre polynomials. For the first four Legendre
polynomials we find $P_0(x) = 1$, $P_1(x) = x$, $P_2(x) = \frac{1}{2}(3x^2 - 1)$ and $P_3(x) = \frac{1}{2}(5x^3 - 3x)$.

6) $L^2[0, a]$ with $a > 0$
$$B = \left\{ \frac{1}{\sqrt{a}} \exp(2\pi i n x/a) \;:\; n \in \mathbf{Z} \right\}$$
Further orthonormal bases of $L^2[0, a]$ are
$$\left\{ \frac{1}{\sqrt{a}},\ \sqrt{\frac{2}{a}}\cos\left(\frac{2\pi x n}{a}\right),\ \sqrt{\frac{2}{a}}\sin\left(\frac{2\pi x n}{a}\right) \;:\; n = 1, 2, \ldots \right\}$$
and
$$\left\{ \frac{1}{\sqrt{a}},\ \sqrt{\frac{2}{a}}\cos\left(\frac{\pi x n}{a}\right) \;:\; n = 1, 2, \ldots \right\}.$$

7) $L^2([-\pi, \pi]^n)$
$$B = \left\{ \frac{1}{(2\pi)^{n/2}} \exp(i\mathbf{k}\cdot\mathbf{x}) \right\}$$
where $|x_j| < \pi$ and $k_j \in \mathbf{Z}$.

8) $L^2([0,a] \times [0,a] \times [0,a])$
$$B = \left\{ \frac{1}{a^{3/2}}\, e^{i 2\pi \mathbf{n}\cdot\mathbf{x}/a} \right\}$$
where $a > 0$ and $n_j \in \mathbf{Z}$.

9) $L^2(S^2)$ where $S^2 := \{(x_1, x_2, x_3) : x_1^2 + x_2^2 + x_3^2 = 1\}$
$$Y_{lm}(\theta, \phi) := \frac{(-1)^{l+m}}{2^l l!}\sqrt{\frac{2l+1}{4\pi}\frac{(l-|m|)!}{(l+|m|)!}}\; \sin^{|m|}\theta\; \frac{d^{l+|m|}}{d(\cos\theta)^{l+|m|}}(\sin\theta)^{2l}\; e^{im\phi}$$
where
$$l = 0, 1, 2, 3, \ldots, \qquad m = -l, -l+1, \ldots, +l$$
and $0 \le \phi < 2\pi$, $0 \le \theta < \pi$. The functions $Y_{lm}$ are called spherical harmonics.
The orthogonality relation is given by
$$(Y_{lm}, Y_{l'm'}) := \int_{\theta=0}^{\pi}\int_{\phi=0}^{2\pi} Y_{lm}(\theta,\phi)\overline{Y_{l'm'}(\theta,\phi)}\, d\Omega = \delta_{ll'}\delta_{mm'}$$
where $d\Omega = \sin\theta\, d\theta\, d\phi$. The first few spherical harmonics are given by
$$Y_{00}(\theta,\phi) = \frac{1}{\sqrt{4\pi}}, \qquad Y_{10}(\theta,\phi) = \sqrt{\frac{3}{4\pi}}\cos\theta.$$

10) $L^2(\mathbf{R})$
$$B = \left\{ \phi_k(x) := \left(2^k k! \sqrt{\pi}\right)^{-1/2} e^{-x^2/2} H_k(x) \;:\; k = 0, 1, 2, \ldots \right\}$$
The functions
$$H_k(x) := (-1)^k e^{x^2} \frac{d^k}{dx^k} e^{-x^2}$$
are called the Hermite polynomials. For the first four Hermite polynomials we find
$H_0(x) = 1$, $H_1(x) = 2x$, $H_2(x) = 4x^2 - 2$, $H_3(x) = 8x^3 - 12x$.

11) $L^2(0, \infty)$
$$B = \left\{ \phi_n(x) := \frac{1}{n!}\, e^{-x/2} L_n(x) \;:\; n = 0, 1, 2, \ldots \right\}$$
where
$$L_n(x) := e^x \frac{d^n}{dx^n}\left(x^n e^{-x}\right).$$
The functions $L_n$ are called Laguerre polynomials. For the first four Laguerre poly-
nomials we find $L_0(x) = 1$, $L_1(x) = -x + 1$, $L_2(x) = x^2 - 4x + 2$, $L_3(x) = -x^3 + 9x^2 - 18x + 6$.
In many applications in quantum mechanics such as spin-orbit coupling we need
the tensor product of Hilbert spaces. The tensor product also plays a central role
in quantum computing. Let $H_1$ and $H_2$ be two Hilbert spaces. We first consider
the algebraic tensor product, considering the spaces merely as linear spaces. The
algebraic tensor product space is the linear space of all formal finite sums
$$h = \sum_{j=1}^{n} (f_j \otimes g_j), \qquad f_j \in H_1,\ g_j \in H_2$$
with the following expressions identified
$$c(f \otimes g) = (f \otimes cg) = (cf \otimes g)$$
$$(f_1 + f_2) \otimes g = (f_1 \otimes g) + (f_2 \otimes g)$$
$$f \otimes (g_1 + g_2) = (f \otimes g_1) + (f \otimes g_2)$$
where $c \in \mathbf{C}$. Let $f_j, h_l \in H_1$ and $g_j, k_l \in H_2$. We endow this linear space with the
inner product
$$\left(\sum_j f_j \otimes g_j,\ \sum_l h_l \otimes k_l\right) := \sum_{j,l} (f_j, h_l)(g_j, k_l).$$
Thus we have a pre-Hilbert space. The completion of this space is denoted by
$H_1 \otimes H_2$ and is called the tensor product Hilbert space.

As an example we consider the two Hilbert spaces $H_1 = L^2(a,b)$ and $H_2 = L^2(c,d)$.
Then the tensor product Hilbert space $H_1 \otimes H_2$ is readily seen to be
$$L^2((a,b) \times (c,d))$$
the space of the functions $f(x_1, x_2)$ with $a < x_1 < b$, $c < x_2 < d$ and
$$\int_c^d \int_a^b |f(x_1, x_2)|^2\, dx_1 dx_2 < \infty.$$
The inner product is defined by
$$(f, g) := \int_c^d \int_a^b f(x_1, x_2)\overline{g(x_1, x_2)}\, dx_1 dx_2.$$

Let $H_1 = L^2(a,b)$ and $H_2 = L^2(c,d)$. Then we have the following

Theorem. Let
$$\{\phi_n : n \in \mathbf{N}\}$$
be an orthonormal basis in the Hilbert space $L^2(a,b)$ and let
$$\{\psi_m : m \in \mathbf{N}\}$$
be an orthonormal basis in the Hilbert space $L^2(c,d)$. Then the set
$$\{\phi_n(x_1)\psi_m(x_2) \;:\; n \in \mathbf{N},\ m \in \mathbf{N}\}$$
is an orthonormal basis in the Hilbert space $L^2((a \le x_1 \le b) \times (c \le x_2 \le d))$.

It is easy to verify by integration that the set is an orthonormal set over the rectangle.
To prove completeness it suffices to show that every continuous function f with
$$\int_c^d \int_a^b |f(x_1, x_2)|^2\, dx_1 dx_2 < \infty$$
whose Fourier coefficients with respect to the set are all zero, vanishes identically
over the rectangle.

In some textbooks and articles the so-called Dirac notation is used to describe
Hilbert space theory in quantum mechanics (Dirac [59]). Let H be a Hilbert space
and $H^*$ be the dual space endowed with the multiplication law of the form
$$\lambda \cdot \phi := \bar{\lambda}\phi$$
where $\lambda \in \mathbf{C}$ and $\phi \in H^*$. The inner product can be viewed as a bilinear form
(duality)
$$\langle\,\cdot\,|\,\cdot\,\rangle : H^* \times H \to \mathbf{C}$$
such that the linear mappings
$$\langle\phi| : \psi \to \langle\phi|\psi\rangle, \qquad \langle\,\cdot\,| : H^* \to H'$$
$$|\psi\rangle : \phi \to \langle\phi|\psi\rangle, \qquad |\,\cdot\,\rangle : H \to H^{*\prime}$$
where prime denotes the space of linear continuous functionals on the corresponding
space, are monomorphisms. The vectors $\langle\phi|$ and $|\psi\rangle$ are called bra and ket vectors,
respectively. The ket vector $|\phi\rangle$ is uniquely determined by a vector $\phi \in H$, therefore
we write $|\phi\rangle \in H$. A dyadic product of a bra vector $\langle\phi_2|$ and a ket vector $|\phi_1\rangle$ is a
linear operator defined as
$$(|\phi_1\rangle\langle\phi_2|)\,\psi := \langle\phi_2|\psi\rangle\,|\phi_1\rangle.$$
In some chapters we will adopt the Dirac notation.

16.2 Linear Operators in Hilbert Spaces


A linear operator, A, in a Hilbert space, H, is a linear transformation of a linear
manifold, V(A) (c H), into H. The manifold V(A) is termed the domain of def-
inition, or simply the domain, of A. Throughout this section we consider linear
operators.

Definition. The linear operator A is termed bounded if the set of numbers, IIAIII, is
bounded as I runs through the normalized vectors in V(A). In this case, we define
418 Chapter 16. Quantum Mechanics

IIAII, the norm of A, to be the supremum, i.e. the least upper bound, of I AI I , as I
runs through these normalized vectors, i.e.

IIAII := sup IIAIII·


11/11=1

Example. Let $H = \mathbf{C}^n$. Then all $n \times n$ matrices over $\mathbf{C}$ are bounded linear operators.
If $I_n$ is the $n \times n$ identity matrix we have $\|I_n\| = 1$.

Example. Consider the Hilbert space $L^2(0,a)$ with $a > 0$. Let $(Af)(x) := xf(x)$.
Then $D(A) = L^2(0,a)$ and $\|A\| = a$.
It follows from this definition that
$$\|Af\| \le \|A\|\,\|f\| \quad \text{for all vectors } f \text{ in } D(A).$$
If A is bounded, we may take D(A) to be H, since, even if this domain is originally
defined to be a proper subset of H, we can always extend it to the whole of this
space as follows. Since
$$\|Af_m - Af_n\| \le \|A\|\,\|f_m - f_n\|$$
we conclude that the convergence of a sequence of vectors $\{f_n\}$ in D(A) implies that
of $\{Af_n\}$. Hence, we may extend the definition of A to $\overline{D(A)}$, the closure of D(A),
by defining
$$A \lim_{n\to\infty} f_n := \lim_{n\to\infty} A f_n.$$
We may then extend A to the full Hilbert space H, by defining it to be zero on
$\overline{D(A)}^\perp$, the orthogonal complement of $\overline{D(A)}$.

On the other hand, if A is unbounded, then in general, D(A) does not comprise the
whole of H and cannot be extended to do so.

Example. Consider the differential operator d/dx acting on the Hilbert space, H,
of square-integrable functions of the real variable x. The domain of this operator
consists of those functions f(x) for which both $\int |f(x)|^2\, dx$ and $\int |df(x)/dx|^2\, dx$ are
finite, and this set of functions does not comprise the whole of H.
Definition. Let A be a bounded operator in H. We define $A^*$, the adjoint operator
of A, by the formula
$$(f, A^*g) := (Af, g) \quad \text{for all } f, g \in H.$$

Definition. The operator A is termed self-adjoint if $A^* = A$ or, equivalently, if
$$(f, Ag) = (Af, g) \quad \text{for all } f, g \in H.$$

Example. Let $H = \mathbf{C}^2$. Then
$$A = \begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}$$
is a self-adjoint operator (hermitian matrix).

In the case where A is an unbounded operator in H we again define its adjoint, $A^*$,
by the same formula, except that f is confined to D(A) and g to the domain $D(A^*)$,
which is specified as follows: g belongs to $D(A^*)$ if there is a vector $g_A$ in H such
that
$$(f, g_A) = (Af, g) \quad \text{for all } f \text{ in } D(A)$$
in which case $g_A = A^*g$. The operator A is termed self-adjoint if $D(A^*) = D(A)$ and
$A^* = A$. The coincidence of $D(A^*)$ with D(A) is essential here. The domain of a
self-adjoint operator is dense in H.

Remark. If merely $(Af, g) = (f, Ag)$ for all $f, g \in D(A)$, and if D(A) is dense in H,
i.e., if $A \subset A^*$, then A is called hermitian (symmetric); $D(A^*)$ may be larger than
D(A), in which case $A^*$ is a proper extension of A.

Definition. Let A be a linear operator with dense domain. Then its nullspace is
defined by
$$N(A) := \{ u \in H : Au = 0 \}.$$

Definition. A self-adjoint operator A is termed positive if
$$(f, Af) \ge 0$$
for all vectors f in D(A).

Example. Let B be a bounded operator. We define $A := B^*B$. Then A is a bounded
self-adjoint operator and the operator A is positive.

Remark. If B is unbounded, then $B^*B$ need not be self-adjoint.

Remark. An operator product AB is defined on a domain
$$D(AB) = \{ v \in D(B) : Bv \in D(A) \}$$
and then
$$(AB)v := A(Bv).$$
Therefore $D(A^*A)$ may be smaller than D(A).

Next we summarize the algebraic properties of the operator norm. It follows from
the definitions of the norm and the adjoint of a bounded operator, together with
the triangle inequality, that if A, B are bounded operators and $c \in \mathbf{C}$, then
$$\|cA\| = |c|\,\|A\|$$
$$\|A^*A\| = \|A\|^2$$
$$\|A + B\| \le \|A\| + \|B\|$$
$$\|AB\| \le \|A\|\,\|B\|.$$
Definition. Suppose that K is a subspace of H. Then since any vector f in H may be
resolved into uniquely defined components $f_K$ and $f_{K^\perp}$ in K and $K^\perp$, respectively,
we may define a linear operator $\Pi$ by the formula $\Pi f = f_K$. This is termed the
projection operator from H to K, or simply the projection operator or projector for
the subspace K.

It follows from this definition and the orthogonality of $f_K$ and $f_{K^\perp}$ that
$$\|\Pi f\| \le \|f\|$$
and therefore that $\Pi$ is bounded. It also follows from the definition of $\Pi$ that
$$\Pi^2 = \Pi = \Pi^*.$$
This formula is generally employed as a definition of a projection operator, $\Pi$, since
it implies that the set of elements $\{\Pi f\}$ form a subspace of H, as f runs through
the vectors in H.

Example. Let $H = \mathbf{R}^2$. Then
$$\Pi_1 = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \qquad \Pi_2 = \frac{1}{2}\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}$$
are projection operators (projection matrices). We have $\Pi_1\Pi_2 = 0$.


Example. In the case where K is a one-dimensional subspace, consisting of the
scalar multiples of a normalized vector $\phi$, the projection operator $\Pi = \Pi(\phi)$ is given
by
$$\Pi(\phi)f = (\phi, f)\,\phi.$$
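For example, for the normalized vector $\phi = \frac{1}{\sqrt{2}}(1, 1)^T$ in $\mathbf{R}^2$ this formula yields
$$\Pi(\phi) = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$$
which is the projection matrix $\Pi_1$ of the example above.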

Definition. An operator, U, in a Hilbert space H is termed a unitary operator if
$$(Uf, Ug) = (f, g)$$
for all vectors f, g in H, and if U has an inverse $U^{-1}$, i.e. $UU^{-1} = U^{-1}U = I$, where
I is the identity operator, i.e. $If = f$ for all $f \in H$.

In other words, a unitary operator is an invertible one which preserves the form
of the scalar product in H. The above definition of unitarity is equivalent to the
condition that
$$U^*U = UU^* = I \quad \text{i.e.} \quad U^* = U^{-1}.$$
A unitary mapping of H onto a second Hilbert space H' is an invertible transforma-
tion V, from H to H', such that
$$(Vf, Vg)_{H'} = (f, g)_H.$$
Example. Let $H = \mathbf{C}^4$. Then the permutation matrix
$$U = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}$$
is a unitary operator (unitary matrix).

Next we discuss operator convergence. Suppose that A and the sequence $\{A_n\}$ are
bounded linear operators in H.

Definition. The sequence of operators $A_n$ is said to converge uniformly, or in norm,
to A as $n \to \infty$ if
$$\|A_n - A\| \to 0.$$

Definition. The sequence of operators $A_n$ is said to converge strongly to A if $A_n f$
tends strongly to $Af$ for all vectors f in H.

Definition. The sequence of operators $A_n$ is said to converge weakly to A if $A_n f$
tends weakly to $Af$ for all vectors f in H.

Example. Consider the Hilbert space $L^2(\mathbf{R})$. Let $A_n$ be the translation operator
$$(A_n f)(x) := f(x + 2n).$$
The operator $A_n$ converges weakly to the zero operator. However, $A_n$ does not
converge strongly to anything, since if it did the limit would have to be zero, whereas,
for any f, $\|A_n f\| = \|f\|$, which does not tend to zero.

From these definitions, it follows that norm convergence implies strong operator
convergence, which in turn implies weak operator convergence. The converse state-
ments are applicable only if H is finite dimensional, i.e. $H = \mathbf{C}^n$.

Definition. A density matrix, $\rho$, is an operator in H of the form
$$\rho = \sum_{n} w_n \Pi(\phi_n)$$
where $\{\Pi(\phi_n)\}$ are the projection operators for an orthonormal sequence $\{\phi_n\}$ of
vectors and $\{w_n\}$ is a sequence of non-negative numbers whose sum is unity. Thus,
a density matrix is bounded and positive.

Definition. The trace of a positive operator B is defined to be
$$\mathrm{tr}(B) := \sum_{n \in I} (\phi_n, B\phi_n)$$
where $\{\phi_n : n \in I\}$ is any orthonormal basis set. The value of tr(B), which is
infinite for some operators, is independent of the choice of basis.

It follows from these definitions of density matrices and trace that a density matrix
is a positive operator whose trace is equal to unity.
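For example, in $\mathbf{C}^2$ with an orthonormal basis $\{\phi_1, \phi_2\}$ and $w_1 = w_2 = \frac{1}{2}$, the density matrix $\rho = \frac{1}{2}\Pi(\phi_1) + \frac{1}{2}\Pi(\phi_2) = \frac{1}{2}I_2$ satisfies $\mathrm{tr}(\rho) = 1$ and $\mathrm{tr}(\rho^2) = \frac{1}{2} < 1$.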

Example. Consider the Hilbert space $H = \mathbf{C}^4$. The Bell states
$$\Phi^\pm = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\0\\0\\\pm 1\end{pmatrix}, \qquad \Psi^\pm = \frac{1}{\sqrt{2}}\begin{pmatrix}0\\1\\\pm 1\\0\end{pmatrix}$$
form a basis in $\mathbf{C}^4$. Consider the density matrix (we apply the Dirac notation)
$$\rho = \frac{5}{8}|\Psi^-\rangle\langle\Psi^-| + \frac{1}{8}\left(|\Psi^+\rangle\langle\Psi^+| + |\Phi^+\rangle\langle\Phi^+| + |\Phi^-\rangle\langle\Phi^-|\right).$$
This density matrix describes the Werner state. This mixed state, a 5/8 vs. 3/8 singlet-
triplet mixture, can be produced by mixing equal amounts of singlets and random
uncorrelated spins, or equivalently by sending one spin of an initially pure singlet
through a 50% depolarizing channel [19, 182]. Since the Bell states are orthonormal,
we find
$$\mathrm{tr}(\rho) = \frac{5}{8} + 3\cdot\frac{1}{8} = 1, \qquad \rho^2 \ne \rho$$
and
$$\mathrm{tr}(\rho^2) = \left(\frac{5}{8}\right)^2 + 3\left(\frac{1}{8}\right)^2 = \frac{7}{16} < 1.$$
We can characterize a mixed state by $\mathrm{tr}(\rho^2) < 1$.

The pure states of a quantum mechanical system are given by normalized vectors in
a Hilbert space. The expectation value of an observable A (self-adjoint operator),
for the state represented by $|\psi\rangle$ is
$$\langle A \rangle_\psi := \langle\psi|A\psi\rangle \equiv \mathrm{tr}(\rho A) \quad \text{where} \quad \rho = |\psi\rangle\langle\psi|.$$
We have
$$\rho = \rho^*, \qquad \mathrm{tr}(\rho) = 1, \qquad \rho^2 = \rho.$$
For a statistical mixture of pure states, given by an orthonormal set of vectors
$$\{\psi_n : n = 1, 2, \ldots, N\}$$
in a Hilbert space, with respective probabilities
$$\{w_n : n = 1, 2, \ldots, N\}$$
where
$$\sum_{n=1}^{N} w_n = 1, \qquad w_i \ge 0 \quad \text{for } i = 1, 2, \ldots, N$$
the expectation value of an observable is
$$\langle A \rangle = \sum_{n=1}^{N} w_n \langle\psi_n|A|\psi_n\rangle.$$

Using the density matrix
$$\rho := \sum_{n=1}^{N} w_n |\psi_n\rangle\langle\psi_n|$$
we can also write
$$\langle A \rangle = \mathrm{tr}(\rho A)$$
where
$$\mathrm{tr}(\rho) = 1, \qquad \rho^* = \rho$$
and
$$\rho^2 \ne \rho \quad \text{and} \quad \mathrm{tr}(\rho^2) < 1 \quad \text{if } w_n \ne 0 \text{ for more than one } n.$$
Definition. A one-parameter group of unitary transformations of H is a family {U_t} of unitary operators in H, with t running through the real numbers, such that

U_{t+s} = U_t U_s,   U_0 = I.

The group is said to be continuous if U_t converges strongly to I as t tends to zero; or equivalently, if U_t converges strongly to U_{t_0} as t tends to t_0, for any real t_0. In this case, Stone's theorem tells us that there is a unique self-adjoint operator, K, in H such that

(d/dt) U_t f = iK U_t f = i U_t K f   for all f in D(K).

This equation is formally expressed as

U_t = e^{iKt}

and iK is termed the infinitesimal generator of the group {U_t}.

Example. Let

K = \begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}.

Then

U_t = e^{iKt} = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}.
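The closed form can be checked numerically by summing the exponential series. The following C++ sketch (truncating the series after 30 terms, which is more than sufficient here) compares exp(iKt) with the rotation matrix and checks that |det U_t| = 1.

// group.cpp
#include <cmath>
#include <complex>
#include <iostream>
using namespace std;
typedef complex<double> cx;

int main(void)
{
   double t = 0.7;
   // A = iKt with K = [[0,i],[-i,0]]  ==>  A = [[0,-t],[t,0]]
   cx A[2][2] = { { 0.0, -t }, { t, 0.0 } };
   cx U[2][2] = { { 1.0, 0.0 }, { 0.0, 1.0 } };  // running sum
   cx P[2][2] = { { 1.0, 0.0 }, { 0.0, 1.0 } };  // current term A^n/n!
   for(int n=1;n<=30;n++)
   {
   cx T[2][2];
   for(int i=0;i<2;i++)
   for(int j=0;j<2;j++)
   {
   T[i][j] = 0.0;
   for(int k=0;k<2;k++) T[i][j] += P[i][k]*A[k][j];
   T[i][j] /= double(n);
   }
   for(int i=0;i<2;i++)
   for(int j=0;j<2;j++) { P[i][j] = T[i][j]; U[i][j] += T[i][j]; }
   }
   cout << U[0][0] << " vs cos(t) = " << cos(t) << endl;
   cout << U[1][0] << " vs sin(t) = " << sin(t) << endl;
   cx det = U[0][0]*U[1][1] - U[0][1]*U[1][0];
   cout << "|det U| = " << abs(det) << endl;     // 1
   return 0;
}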

Next we consider linear operators in tensor product space. Suppose that H1 and H2 are Hilbert spaces, and that H is a third Hilbert space, defined in terms of H1 and H2 as follows. We recall that for each pair of vectors f1, f2 in H1, H2, respectively, there is a vector in H, denoted by f1 ⊗ f2, such that

(f1 ⊗ f2, g1 ⊗ g2) = (f1, g1)(f2, g2).

If A1 and A2 are operators in H1 and H2, respectively, we define the operator A1 ⊗ A2 in H1 ⊗ H2 by the formula

(A1 ⊗ A2)(f1 ⊗ f2) := (A1 f1) ⊗ (A2 f2).

A1 ⊗ A2 is called the tensor product of A1 and A2.

Similarly, we may define the tensor product H1 ⊗ H2 ⊗ ... ⊗ Hn as well as the tensor product A1 ⊗ A2 ⊗ ... ⊗ An of operators A1, ..., An. In standard notation, one writes

⊗_{j=1}^n Hj = H1 ⊗ H2 ⊗ ... ⊗ Hn

and

⊗_{j=1}^n Aj = A1 ⊗ A2 ⊗ ... ⊗ An.

Example. For

A1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},   A2 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}

we obtain

A1 ⊗ A2 = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix},   A2 ⊗ A1 = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.

We see that A1 ⊗ A2 ≠ A2 ⊗ A1.
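The Kronecker product is straightforward to implement. The following C++ sketch (using C++11 initializer lists) computes A1 ⊗ A2 and A2 ⊗ A1 for the matrices above and shows that the two products differ.

// kron.cpp
#include <iostream>
#include <vector>
using namespace std;
typedef vector<vector<double> > Matrix;

// Kronecker product of an (m x n) and a (p x q) matrix
Matrix kron(const Matrix& A, const Matrix& B)
{
   size_t m = A.size(), n = A[0].size(), p = B.size(), q = B[0].size();
   Matrix C(m*p, vector<double>(n*q, 0.0));
   for(size_t i=0;i<m;i++)
   for(size_t j=0;j<n;j++)
   for(size_t k=0;k<p;k++)
   for(size_t l=0;l<q;l++)
   C[i*p+k][j*q+l] = A[i][j]*B[k][l];
   return C;
}

int main(void)
{
   Matrix A1 = { {0,1},{1,0} }, A2 = { {1,0},{0,0} };
   Matrix C1 = kron(A1,A2), C2 = kron(A2,A1);
   // print both 4x4 products row by row; they differ
   for(size_t i=0;i<4;i++)
   {
   for(size_t j=0;j<4;j++) cout << C1[i][j] << " ";
   cout << "   ";
   for(size_t j=0;j<4;j++) cout << C2[i][j] << " ";
   cout << endl;
   }
   return 0;
}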



Let us now discuss the spectrum of a linear operator. Let T be a linear operator whose domain D(T) and range R(T) both lie in the same complex linear topological space X. In our case X is a Hilbert space H. We consider the linear operator

T_λ := λI − T

where λ is a complex number and I the identity operator. The distribution of the values of λ for which T_λ has an inverse, and the properties of the inverse when it exists, are called the spectral theory for the operator T. We discuss the general theory of the inverse of T_λ (Yosida [185]).

Definition. If λ_0 is such that the range R(T_{λ_0}) is dense in X and T_{λ_0} has a continuous inverse (λ_0 I − T)^{-1}, we say that λ_0 is in the resolvent set ρ(T) of T, and we denote this inverse (λ_0 I − T)^{-1} by R(λ_0; T) and call it the resolvent (at λ_0) of T. All complex numbers λ not in ρ(T) form a set σ(T) called the spectrum of T. The spectrum σ(T) is decomposed into disjoint sets Pσ(T), Cσ(T) and Rσ(T) with the following properties:

Pσ(T) is the totality of complex numbers λ for which T_λ does not have an inverse. Pσ(T) is called the point spectrum of T. In other words the point spectrum Pσ(T) is the set of eigenvalues of T; that is

Pσ(T) := { λ ∈ C : Tf = λf for some nonzero f in X }.

Cσ(T) is the totality of complex numbers λ for which T_λ has a discontinuous inverse with domain dense in X. Cσ(T) is called the continuous spectrum of T.

Rσ(T) is the totality of complex numbers λ for which T_λ has an inverse whose domain is not dense in X. Rσ(T) is called the residual spectrum of T.

From these definitions and the linearity of the operator T we find the

Proposition. A necessary and sufficient condition for λ_0 ∈ Pσ(T) is that the equation

Tf = λ_0 f

has a solution f ≠ 0 (f ∈ X). In this case λ_0 is called an eigenvalue of T, and f the corresponding eigenvector. The null space N(λ_0 I − T) of T_{λ_0} is called the eigenspace of T corresponding to the eigenvalue λ_0 of T. It consists of the vector 0 and the totality of eigenvectors corresponding to λ_0. The dimension of the eigenspace corresponding to λ_0 is called the multiplicity of the eigenvalue λ_0.

Theorem. Let X be a complex Banach space, and T a closed linear operator with its domain D(T) and range R(T) both in X. Then, for any λ_0 ∈ ρ(T), the resolvent (λ_0 I − T)^{-1} is an everywhere defined continuous linear operator. For the proof we refer to Yosida [185].

Example. If the linear space X is of finite dimension, then any bounded linear operator T is represented by a matrix (t_ij). The eigenvalues of T are obtained as the roots of the algebraic equation, the so-called secular or characteristic equation of the matrix (t_ij):

det(λδ_ij − t_ij) = 0

where det(·) denotes the determinant of the matrix.

Example. Consider the Hilbert space H = L²(R). Let T be defined by

Tf(x) := xf(x)

that is,

D(T) = { f(x) : f(x) and xf(x) ∈ L²(R) }

and Tf(x) = xf(x) for f(x) ∈ D(T). Then every real number λ_0 is in Cσ(T), i.e. T has a purely continuous spectrum consisting of the entire real axis. For the proof we refer to Yosida [185].

Example. Let X be the Hilbert space l²(N). Let T be defined by

T(u_1, u_2, ...)^T := (0, u_1, u_2, ...)^T.

Then 0 is in the residual spectrum of T, since R(T) is not dense in l²(N).

Example. Let H be a self-adjoint operator in a Hilbert space H. The spectrum σ(H) lies on the real axis. The resolvent set ρ(H) of H comprises all the complex numbers λ with ℑ(λ) ≠ 0, and the resolvent R(λ; H) is a bounded linear operator with the estimate

‖R(λ; H)‖ ≤ 1/|ℑ(λ)|.

Moreover,

ℑ((λI − H)f, f) = ℑ(λ)‖f‖²,   f ∈ D(H).

Example. Let U be a unitary operator. The spectrum lies on the unit circle |λ| = 1; i.e. the interior and exterior of the unit circle are in the resolvent set ρ(U). The residual spectrum is empty.

Example. Consider the linear bounded self-adjoint operator in the Hilbert space l²(N)

A = \begin{pmatrix} 0 & 1 & 0 & 0 & \cdots \\ 1 & 0 & 1 & 0 & \cdots \\ 0 & 1 & 0 & 1 & \cdots \\ \vdots & & \ddots & & \ddots \end{pmatrix}.

In other words

a_ij = 1 if i = j + 1,   a_ij = 1 if i = j − 1,   a_ij = 0 otherwise

with i, j ∈ N. We find spec A = [−2, 2], i.e. we have a continuous spectrum [163].

Example. The operator −d²/dx², with a suitably chosen domain in L²(R), has a purely continuous spectrum consisting of the nonnegative real axis. The negative real axis belongs to the resolvent set.

Example. The operator −d²/dx² + x², with a suitably chosen domain in L²(R), has a pure point spectrum consisting of the positive odd integers, each of which is a simple eigenvalue.

Example. Let H = l²(Z). Let A be the unitary two-sided shift operator that maps

(..., u_{−1}, u_0, u_1, ...)

onto

(..., u_{−2}, u_{−1}, u_0, ...).

The point spectrum is empty and the continuous spectrum is the entire unit circle in the λ plane.

Example. In the Hilbert space H = l²(N ∪ {0}), annihilation and creation operators denoted by b and b* are defined as follows. They have a common domain

D_1 = D(b) = D(b*) = { u = (u_0, u_1, u_2, ...) : Σ_{n=0}^∞ n|u_n|² < ∞ }.

Then bu and b*u are given by

b(u_0, u_1, u_2, ...)^T := (u_1, √2 u_2, √3 u_3, ...)^T

b*(u_0, u_1, u_2, ...)^T := (0, u_0, √2 u_1, √3 u_2, ...)^T.

The physical interpretation for a simple model is that the vector

φ_n = (0, 0, ..., 0, u_n = 1, 0, ...)^T

represents a state of a physical system in which n particles are present. In particular, φ_0 represents the vacuum state, i.e.

bφ_0 = b(1, 0, 0, ...)^T = (0, 0, ...)^T.

The action of the operators b and b* on these states is given by

bφ_n = √n φ_{n−1},   b*φ_n = √(n+1) φ_{n+1}.

We find that b* is the adjoint of b. We can show that

b*b − bb* = −I

in the sense that for all u in a certain domain D_2 (⊂ D_1)

b*bu − bb*u = −u.

The operator I denotes the identity operator. The operator N̂ = b*b with domain D_2 is called the particle-number operator. Its action on the states φ_n is given by

N̂φ_n = nφ_n

where n = 0, 1, 2, .... Thus the eigenvalues of N̂ are n = 0, 1, 2, .... The point spectrum of b is the entire complex plane. The point spectrum of b* is empty. The equation

b*u = λu

implies u = 0. We can show that the residual spectrum of b* is the entire complex plane.
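On a computer, b and b* can only be represented on a truncated Fock space. The following C++ sketch (the truncation dimension M is our choice) builds the M × M truncations and evaluates b*b − bb*; the result is −I except in the last diagonal entry, an artefact of the truncation.

// fock.cpp
#include <cmath>
#include <iostream>
using namespace std;

const int M = 6;   // truncated Fock space dimension

int main(void)
{
   double b[M][M] = {0}, bd[M][M] = {0};
   // b phi_n = sqrt(n) phi_{n-1}
   for(int n=1;n<M;n++) b[n-1][n] = sqrt(double(n));
   // b* is the transpose of b (real matrix)
   for(int n=0;n<M;n++)
   for(int m=0;m<M;m++) bd[n][m] = b[m][n];

   // C = b*b - bb*
   double C[M][M];
   for(int i=0;i<M;i++)
   for(int j=0;j<M;j++)
   {
   C[i][j] = 0.0;
   for(int k=0;k<M;k++) C[i][j] += bd[i][k]*b[k][j] - b[i][k]*bd[k][j];
   }
   // prints -1 on the diagonal, except M-1 = 5 in the last entry
   // (truncation artefact)
   for(int i=0;i<M;i++)
   {
   for(int j=0;j<M;j++) cout << C[i][j] << " ";
   cout << endl;
   }
   return 0;
}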

Remark. Instead of the notation φ_n the notation |n⟩ is used in physics, where n = 0, 1, 2, ....

Remark. A point λ_0 in the spectrum σ(A) of a self-adjoint operator A is a limit point of σ(A) if it is either an eigenvalue of infinite multiplicity or an accumulation point of σ(A). The set σ_l(A) of all limit points of σ(A) is called the essential spectrum of A, and its complement σ_d(A) = σ(A) \ σ_l(A), i.e., the set of all isolated eigenvalues of finite multiplicity, is called the discrete spectrum of A.

Now we discuss the spectral analysis of a self-adjoint operator. Suppose that A is a self-adjoint, possibly unbounded, operator in H and φ is a vector in H, such that

Aφ = λφ

where λ is a number; then φ is termed an eigenvector and λ the corresponding eigenvalue of A. The self-adjointness of A ensures that λ is real. The self-adjoint operator A is said to have a discrete spectrum if it has a set of eigenvectors { φ_n : n ∈ I } which form an orthonormal basis in H. In this case, A may be expressed in the form

A = Σ_n λ_n Π(φ_n)

where Π(φ_n) is the projection operator and λ_n the eigenvalue for φ_n.

In general, even when the operator A does not have a discrete spectrum, it may still be resolved into a linear combination of projection operators according to the spectral theorem [137] which serves to express A as a Stieltjes integral

A = ∫ λ dE(λ)

where { E(λ) } is a family of intercommuting projectors such that

E(−∞) = 0,   E(∞) = I,   E(λ) ≤ E(λ') if λ < λ'

and E(λ') converges strongly to E(λ) as λ' tends to λ from above. Here E(λ) is a function of A, i.e. χ_λ(A), where

χ_λ(x) = 1 for x < λ,   χ_λ(x) = 0 for x ≥ λ.

In the particular case where A has a discrete spectrum, i.e.

A = Σ_n λ_n Π(φ_n),

then

E(λ) = Σ_{λ_n < λ} Π(φ_n).

In general, it follows from the spectral theorem that, for any positive N, we may express A in the form

A = A_N + A_N'

where

A_N = ∫_{−N−0}^{N+0} λ dE(λ)   and   A_N' = ∫_{−∞}^{−N−0} λ dE(λ) + ∫_{N+0}^{∞} λ dE(λ).

Thus, A is decomposed into parts, A_N and A_N', whose spectra lie inside and outside the interval [−N, N], respectively, and

A = lim_{N→∞} A_N

on the domain of A. This last formula expresses unbounded operators as limits of bounded ones.

Let us consider two self-adjoint operators

A := ∫_R λ dE(λ),   B := ∫_R λ dF(λ).

They are said to commute if

E(λ)F(μ) = F(μ)E(λ)   for all λ, μ.

Since A and B are generally unbounded, one cannot say that AB = BA unless the domains of AB and BA happen to be the same, whereas E(λ) and F(μ) are defined on all of H; however ABu = BAu for all u (if any) such that both sides of the equation are meaningful. Commuting operators A and B are said to have a simple joint spectrum or to form a complete set of commuting observables if there is an element χ in H such that the closed linear span of the elements

{ E(λ)F(μ)χ : −∞ < λ, μ < ∞ }

is all of H. If A and B are two bounded operators in a Hilbert space we can define the commutator [A, B] := AB − BA in the sense that for all u ∈ H we have

[A, B]u = (AB)u − (BA)u = A(Bu) − B(Au).

Important special cases are discussed in Steeb [163].

16.3 Schmidt Decomposition


Let H1 and H2 be two finite dimensional Hilbert spaces with the underlying field C. We define for the coupled system H1 ⊗ H2 (described by the density operator ρ_{H1⊗H2})

ρ_{H1} := tr_{H2}(ρ_{H1⊗H2})

where tr_{H2} denotes the partial trace over H2, i.e. we use |m⟩ ⊗ |β_j⟩ as the basis for the trace, where |β_j⟩ is an orthonormal basis in H2. Let dim(H1) = m and dim(H2) = n.

Thus dim(H1 ⊗ H2) = m·n. An arbitrary normalized vector |ψ⟩_12 in the tensor product space H1 ⊗ H2 can be expanded as

|ψ⟩_12 = Σ_{i=1}^m Σ_{j=1}^n a_ij |i⟩_1 ⊗ |j⟩_2

where a_ij ∈ C and { |i⟩_1 | i = 1, ..., m } and { |j⟩_2 | j = 1, ..., n } are orthonormal bases for H1 and H2, respectively. Let

|ĩ⟩ := Σ_{j=1}^n a_ij |j⟩_2.

We notice that the |ĩ⟩ need not be mutually orthogonal or normalized. Thus |ψ⟩_12 can be written as

|ψ⟩_12 = Σ_{i=1}^m |i⟩_1 ⊗ |ĩ⟩.

Let

ρ_12 := |ψ⟩_12 ⟨ψ|_12

and let

ρ_1 := tr_2(ρ_12),   ρ_2 := tr_1(ρ_12)

be the partial traces.

Theorem.
1. ρ_1 and ρ_2 have the same nonzero eigenvalues λ_1, ..., λ_k (with the same multiplicities) and any extra dimensions are made up with zero eigenvalues, where k ≤ min(m, n). There is no need for H1 and H2 to have the same dimension, so the number of zero eigenvalues of ρ_1 and ρ_2 can differ.
2. The state |ψ⟩_12 can be written as

|ψ⟩_12 = Σ_{i=1}^k √λ_i |i⟩ ⊗ |Φ_i⟩

where |i⟩ (respectively |Φ_i⟩) are orthonormal eigenvectors of ρ_1 in H1 (respectively ρ_2 in H2) belonging to λ_i. This expression is called the Schmidt polar form or Schmidt decomposition.

Proof. As stated above |ψ⟩_12 can be written as

|ψ⟩_12 = Σ_{i=1}^m |i⟩ ⊗ |φ̃_i⟩

where the |φ̃_i⟩ are (not necessarily orthogonal) states in H2. Taking the partial trace of |ψ⟩_12 ⟨ψ|_12 over H2 and equating to

ρ_1 = Σ_{i=1}^m λ_i |i⟩⟨i|

gives

⟨φ̃_j|φ̃_i⟩ = λ_i δ_ij.

Hence it turns out that the { |φ̃_i⟩ } are orthogonal after all. Thus at most min(m, n) eigenvalues are non-zero. Consequently, the set of states

|φ_i⟩ := λ_i^{−1/2} |φ̃_i⟩

is an orthonormal set in the Hilbert space H2, where we exclude the zero eigenvalues. It follows that

|ψ⟩_12 = Σ_{i=1}^k √λ_i |i⟩ ⊗ |φ_i⟩

and taking the partial trace over H1 gives

ρ_2 = Σ_{i=1}^k λ_i |φ_i⟩⟨φ_i|.

Example. Consider dim(H1) = 3, dim(H2) = 2 and

|ψ⟩ = (1/√2)(1, 0, 0)^T ⊗ (1, 0)^T + (1/√2)(0, 0, 1)^T ⊗ (0, 1)^T.

We have

ρ_1 = ½(1, 0, 0)^T (1, 0, 0) + ½(0, 0, 1)^T (0, 0, 1),   ρ_2 = ½(1, 0)^T (1, 0) + ½(0, 1)^T (0, 1).

The eigenvalues of ρ_1 are ½, ½ and 0. The eigenvalues of ρ_2 are ½ and ½. Thus the Schmidt decomposition is

|ψ⟩ = √½ (1, 0, 0)^T ⊗ (1, 0)^T + √½ (0, 0, 1)^T ⊗ (0, 1)^T.
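The Schmidt coefficients of a state with coefficient matrix A = (a_ij) are the eigenvalues of ρ_2 = A*A. For a 3 ⊗ 2 state this is a 2 × 2 eigenvalue problem, solvable with the quadratic formula. A minimal C++ sketch (using the real amplitudes of the example above):

// schmidt.cpp
#include <cmath>
#include <iostream>
using namespace std;

int main(void)
{
   // amplitudes a_ij of |psi> = sum a_ij |i>1 (x) |j>2
   double a[3][2] = { { 1.0/sqrt(2.0), 0.0 },
                      { 0.0,           0.0 },
                      { 0.0, 1.0/sqrt(2.0) } };

   // rho2 = A^T A (A real here), a 2x2 matrix
   double r[2][2] = { {0.0,0.0},{0.0,0.0} };
   for(int j=0;j<2;j++)
   for(int k=0;k<2;k++)
   for(int i=0;i<3;i++) r[j][k] += a[i][j]*a[i][k];

   // eigenvalues of a 2x2 symmetric matrix via the quadratic formula
   double tr  = r[0][0]+r[1][1];
   double det = r[0][0]*r[1][1]-r[0][1]*r[1][0];
   double d   = sqrt(tr*tr-4.0*det);
   cout << "Schmidt coefficients: "
        << 0.5*(tr+d) << ", " << 0.5*(tr-d) << endl;  // 0.5, 0.5
   return 0;
}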

16.4 Spin Matrices and Kronecker Product


In this section we study spin systems. In Pauli's nonrelativistic theory of spin certain spin wave functions, vectors, or spinor functions - along with spin operators, or matrices - are introduced to facilitate computation. We define

|↑⟩ := (1, 0)^T   spin-up vector
|↓⟩ := (0, 1)^T   spin-down vector

and

σ_x := \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},   σ_y := \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix},   σ_z := \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.

The matrices σ_x, σ_y, σ_z are called the Pauli spin matrices. Let I be the 2 × 2 unit matrix. We find the following relationships. After squaring the spin matrices, we have

σ_x² = I,   σ_y² = I,   σ_z² = I.

Since the squares of the spin matrices are the 2 × 2 unit matrix, their eigenvalues are ±1. The anticommutators are given by

σ_x σ_y + σ_y σ_x = 0,   σ_y σ_z + σ_z σ_y = 0,   σ_z σ_x + σ_x σ_z = 0

where 0 is the 2 × 2 zero matrix. Summarizing these results we have

σ_i σ_j + σ_j σ_i = 2δ_ij I

where i and j may independently be x, y, or z and δ_ij is the Kronecker delta. The matrices I, σ_x, σ_y and σ_z form an orthogonal basis in the Hilbert space M_2 of the 2 × 2 matrices. This means every 2 × 2 matrix can be written as

M = c_x σ_x + c_y σ_y + c_z σ_z + c_1 I
where c_x, c_y, c_z, c_1 ∈ C. Another orthonormal basis (standard basis) is given by

\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},   \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},   \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix},   \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.

The trace of a matrix is the sum of the diagonal terms. For all three Pauli spin matrices the trace is zero. The Pauli spin matrices are self-adjoint operators (hermitian matrices) and therefore have real eigenvalues. The commutators are given by

[σ_x, σ_y] = 2iσ_z,   [σ_y, σ_z] = 2iσ_x,   [σ_z, σ_x] = 2iσ_y.

These three relationships may be combined in a single equation σ × σ = 2iσ, where × denotes the vector product and σ = (σ_x, σ_y, σ_z)^T. We define

σ_± := ½(σ_x ± iσ_y).

These are the spin-flip operators. We define

Λ_± := ½(I ± σ_z).

The two matrices are projection matrices. As mentioned above the four matrices σ_±, Λ_± form an orthonormal basis in the Hilbert space M_2.
Let us now study the action of spin matrices on spin vectors. A vector u ∈ C² can be written as

u = u_1 |↑⟩ + u_2 |↓⟩

where u_1, u_2 ∈ C. We find the following relations

σ_x |↑⟩ = |↓⟩,   σ_x |↓⟩ = |↑⟩,

σ_y |↑⟩ = i|↓⟩,   σ_y |↓⟩ = −i|↑⟩,

σ_z |↑⟩ = |↑⟩,   σ_z |↓⟩ = −|↓⟩

and

σ_+ |↓⟩ = |↑⟩,   σ_+ |↑⟩ = 0,   σ_− |↑⟩ = |↓⟩,   σ_− |↓⟩ = 0.

Furthermore we find σ_x = σ_+ + σ_− and σ_y = −i(σ_+ − σ_−). The projection operators Λ_± select the positive or negative spin components of a vector

Λ_+ u = u_1 |↑⟩

and

Λ_− u = u_2 |↓⟩.

The matrices σ_± and Λ_± obey

σ_±² = 0,   Λ_±² = Λ_±,   σ_+ σ_− = Λ_+,   σ_− σ_+ = Λ_−.

In studying spin systems such as the Heisenberg model, the XY model and the Dirac spin matrices we have to introduce the Kronecker product (Steeb [162]). Also in the spectral representation of hermitian matrices the Kronecker product plays an important role.

Definition. Let A = (a_ij) be an m × n matrix and let B be a p × q matrix. Then

A ⊗ B := (a_ij B).

A ⊗ B is an (mp) × (nq) matrix and ⊗ is the Kronecker product (sometimes also called tensor product or direct product).

We have the following properties. Let A be an m × n matrix, B be a p × q matrix, C be an n × r matrix and D be an r × s matrix. Then

(A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD)

where AC and BD denote the ordinary matrix product. An extension is

(A ⊗ B)(C ⊗ D)(E ⊗ F) = (ACE) ⊗ (BDF).

The size of the matrices must be such that the matrix products exist. Further rules are

A ⊗ (B + C) = A ⊗ B + A ⊗ C
(A ⊗ B)^T = A^T ⊗ B^T
B ⊗ A = P(A ⊗ B)Q

where P and Q are certain permutation matrices. Let A be an m × m matrix and let B be a p × p matrix. Then

tr(A ⊗ B) = (tr A)(tr B)

(A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}   if A^{-1} and B^{-1} exist

det(A ⊗ B) = (det A)^p (det B)^m

where tr denotes the trace and det the determinant. The Kronecker product of two orthogonal matrices is again an orthogonal matrix.

Theorem. Let A be an m × m matrix and B be a p × p matrix. Let λ_1, λ_2, ..., λ_m be the eigenvalues of A. Let μ_1, μ_2, ..., μ_p be the eigenvalues of B. Then λ_j μ_k (j = 1, ..., m; k = 1, ..., p) are the eigenvalues of A ⊗ B. Let u_j (j = 1, ..., m) be the eigenvectors of A. Let v_k (k = 1, ..., p) be the eigenvectors of B. Then u_j ⊗ v_k (j = 1, ..., m; k = 1, ..., p) are the eigenvectors of A ⊗ B.

Theorem. Let A be an m × m matrix and B be a p × p matrix. Let λ_1, λ_2, ..., λ_m be the eigenvalues of A. Let μ_1, μ_2, ..., μ_p be the eigenvalues of B. Then λ_j + μ_k (j = 1, ..., m; k = 1, ..., p) are the eigenvalues of A ⊗ I_p + I_m ⊗ B. Let u_j (j = 1, ..., m) be the eigenvectors of A. Let v_k (k = 1, ..., p) be the eigenvectors of B. Then u_j ⊗ v_k (j = 1, ..., m; k = 1, ..., p) are the eigenvectors of A ⊗ I_p + I_m ⊗ B.

For the proofs we refer to Steeb [162].

Theorem. Let A be an n x n hermitian matrix. Assume that the eigenvalues


A1,' .. ,An are all distinct. Then the normalized eigenvectors
U1, U2, ... , Un are
orthonormal and form an orthonormal basis in the Hilbert space Thenen.

A= LAjuj@Uj.
n

j=l

Example. We consider the matrix

A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}

with eigenvalues λ_1 = +1, λ_2 = −1. The normalized eigenvectors are given by

u_1 = (1/√2)(1, 1)^T,   u_2 = (1/√2)(1, −1)^T.

Since

(a, b)^T ⊗ (d, e) = \begin{pmatrix} ad & ae \\ bd & be \end{pmatrix}

we find that the spectral representation of the matrix A is given by

A = ½\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} − ½\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} = λ_1 Π_1 + λ_2 Π_2.

Moreover we have

Π_1² = Π_1,   Π_2² = Π_2   and   Π_1 Π_2 = 0

where Π_1 and Π_2 are projection matrices.
The Dirac spin matrices α_1, α_2, α_3 and β can be constructed using the Pauli spin matrices and the Kronecker product. These matrices play a central role in the description of the electron. We define

α_1 := σ_x ⊗ σ_x = \begin{pmatrix} 0&0&0&1 \\ 0&0&1&0 \\ 0&1&0&0 \\ 1&0&0&0 \end{pmatrix},   α_2 := σ_x ⊗ σ_y = \begin{pmatrix} 0&0&0&-i \\ 0&0&i&0 \\ 0&-i&0&0 \\ i&0&0&0 \end{pmatrix},

α_3 := σ_x ⊗ σ_z = \begin{pmatrix} 0&0&1&0 \\ 0&0&0&-1 \\ 1&0&0&0 \\ 0&-1&0&0 \end{pmatrix},   β := σ_z ⊗ I = \begin{pmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&-1&0 \\ 0&0&0&-1 \end{pmatrix}.

The 4 × 4 matrices α_1, α_2, α_3 and β satisfy the rules

β² = I,   α_i α_j + α_j α_i = 2δ_ij I,   α_i β + β α_i = 0

where I is the 4 × 4 unit matrix.

The spin matrices are defined by

S_x := ½σ_x,   S_y := ½σ_y,   S_z := ½σ_z

(in units with ħ = 1).

Definition. Let j = 1, 2, ..., N. We define

σ_{a,j} := I ⊗ ... ⊗ I ⊗ σ_a ⊗ I ⊗ ... ⊗ I

where I is the 2 × 2 unit matrix, a = x, y, z and σ_a is the a-th Pauli matrix in the j-th location. Thus σ_{a,j} is a 2^N × 2^N matrix. Analogously, we define

S_{a,j} := I ⊗ ... ⊗ I ⊗ S_a ⊗ I ⊗ ... ⊗ I.

In the following we set

S_j := (S_{x,j}, S_{y,j}, S_{z,j})^T.
We calculate the eigenvalues and eigenvectors for the two-point Heisenberg model. The Heisenberg model is used to describe interacting spin systems [162]. The model is given by

Ĥ = J Σ_{j=1}^2 S_j · S_{j+1}

where J is the so-called exchange constant (J > 0 or J < 0) and · denotes the scalar product. We impose cyclic boundary conditions, i.e. S_3 ≡ S_1. It follows that

Ĥ = J(S_1 · S_2 + S_2 · S_1).

Therefore

Ĥ = J(S_{x,1}S_{x,2} + S_{y,1}S_{y,2} + S_{z,1}S_{z,2} + S_{x,2}S_{x,1} + S_{y,2}S_{y,1} + S_{z,2}S_{z,1}).

Since

S_{x,1} = S_x ⊗ I,   S_{x,2} = I ⊗ S_x

etc., where I is the 2 × 2 unit matrix, it follows that

Ĥ = J[(S_x ⊗ I)(I ⊗ S_x) + (S_y ⊗ I)(I ⊗ S_y) + (S_z ⊗ I)(I ⊗ S_z)
   + (I ⊗ S_x)(S_x ⊗ I) + (I ⊗ S_y)(S_y ⊗ I) + (I ⊗ S_z)(S_z ⊗ I)].

Thus we obtain

Ĥ = 2J[(S_x ⊗ S_x) + (S_y ⊗ S_y) + (S_z ⊗ S_z)].

Since S_a = ½σ_a (a = x, y, z) we obtain

S_x ⊗ S_x = ¼\begin{pmatrix} 0&0&0&1 \\ 0&0&1&0 \\ 0&1&0&0 \\ 1&0&0&0 \end{pmatrix}

etc. Then the Hamilton operator Ĥ is given by the 4 × 4 symmetric matrix

Ĥ = (J/2)\begin{pmatrix} 1&0&0&0 \\ 0&-1&2&0 \\ 0&2&-1&0 \\ 0&0&0&1 \end{pmatrix} = (J/2)\left[ (1) ⊕ \begin{pmatrix} -1&2 \\ 2&-1 \end{pmatrix} ⊕ (1) \right]

where ⊕ denotes the direct sum of matrices. The eigenvalues and eigenvectors can now easily be calculated. We define

|↑↑⟩ := |↑⟩ ⊗ |↑⟩,   |↑↓⟩ := |↑⟩ ⊗ |↓⟩,   |↓↑⟩ := |↓⟩ ⊗ |↑⟩,   |↓↓⟩ := |↓⟩ ⊗ |↓⟩

where |↑⟩ and |↓⟩ have been given above. Obviously these vectors form the standard basis in C^4. One sees at once that |↑↑⟩ and |↓↓⟩ are eigenvectors of the Hamilton operator with eigenvalues J/2 and J/2, respectively. This means the eigenvalue J/2 is degenerate. The eigenvalues of the matrix

(J/2)\begin{pmatrix} -1 & 2 \\ 2 & -1 \end{pmatrix}

are given by J/2 and −3J/2. The corresponding eigenvectors are given by

(1/√2)(|↑↓⟩ + |↓↑⟩),   (1/√2)(|↑↓⟩ − |↓↑⟩).
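The Hamilton operator of the two-point model can be built directly from Kronecker products. The following C++ sketch constructs Ĥ = 2J(S_x⊗S_x + S_y⊗S_y + S_z⊗S_z) for J = 1 (an arbitrary choice) and prints the real 4 × 4 matrix together with the eigenvalues found above.

// heisenberg.cpp
#include <complex>
#include <iostream>
using namespace std;
typedef complex<double> cx;

// Kronecker product of two 2x2 complex matrices -> 4x4
void kron(const cx A[2][2], const cx B[2][2], cx C[4][4])
{
   for(int i=0;i<2;i++) for(int j=0;j<2;j++)
   for(int k=0;k<2;k++) for(int l=0;l<2;l++)
   C[2*i+k][2*j+l] = A[i][j]*B[k][l];
}

int main(void)
{
   double J = 1.0;
   cx sx[2][2] = { { 0.0, 0.5 }, { 0.5, 0.0 } };              // S_x
   cx sy[2][2] = { { 0.0, cx(0,-0.5) }, { cx(0,0.5), 0.0 } }; // S_y
   cx sz[2][2] = { { 0.5, 0.0 }, { 0.0, -0.5 } };             // S_z

   cx H[4][4] = {}, T[4][4];
   kron(sx,sx,T); for(int i=0;i<4;i++) for(int j=0;j<4;j++) H[i][j] += 2.0*J*T[i][j];
   kron(sy,sy,T); for(int i=0;i<4;i++) for(int j=0;j<4;j++) H[i][j] += 2.0*J*T[i][j];
   kron(sz,sz,T); for(int i=0;i<4;i++) for(int j=0;j<4;j++) H[i][j] += 2.0*J*T[i][j];

   // prints (J/2) times the direct sum (1) + (-1 2; 2 -1) + (1)
   for(int i=0;i<4;i++)
   {
   for(int j=0;j<4;j++) cout << real(H[i][j]) << " ";
   cout << endl;
   }
   cout << "eigenvalues: " << J/2 << " (three-fold), " << -1.5*J << endl;
   return 0;
}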

16.5 Postulates of Quantum Mechanics


Quantum mechanics, as opposed to classical mechanics, gives a probabilistic description of nature. The probabilistic interpretation of measurement is contained in one of the standard postulates of quantum mechanics (Glimm and Jaffe [71], Prugovecki [135], Schommers [142]).

Remark. More than sixty years after the formulation of quantum mechanics the
interpretation of this formalism is by far the most controversial problem of current
research in the foundations of physics and divides the community of physicists into
numerous opposing schools of thought. There is an immense diversity of opinions
and a huge variety of interpretations. A more detailed discussion of the interpreta-
tion of the measurement in quantum mechanics is given in chapter 18.

The standard postulates of quantum mechanics are

PI. The pure states of a quantum system, S, are described by normalized vectors ψ which are elements of a Hilbert space, H, that describes S. The pure states of a quantum mechanical system are rays in a Hilbert space H (i.e., unit vectors, with an arbitrary phase). Specifying a pure state in quantum mechanics is the most that can be said about a physical system. In this respect, it is analogous to a classical pure state. The concept of a state as a ray in a Hilbert space leads to the probability interpretation in quantum mechanics. Given a physical system in the state ψ, the probability that it is in the state χ is |(ψ, χ)|². Clearly

0 ≤ |(ψ, χ)|² ≤ 1.

While the phase of a vector ψ has no physical significance, the relative phase of two vectors does. This means for |a| = 1, |(aψ, χ)| is independent of a, but |(ψ_1 + aψ_2, χ)| is not. It is most convenient to regard pure states ψ simply as vectors in H, and to normalize them in an appropriate calculation.

PII. The states evolve in time according to

iħ dψ/dt = Hψ

where H is a self-adjoint operator which specifies the dynamics of the system S.


This equation is called the Schrödinger equation. The formal solution takes the form

ψ(t) = exp(−iHt/ħ)ψ(0)

where ψ(0) ≡ ψ(t = 0) with (ψ(0), ψ(0)) = 1. It follows that (ψ(t), ψ(t)) = 1.

Example. Consider the Hamilton operator

Ĥ = (ħω/√2)\begin{pmatrix} 0&1&0 \\ 1&0&1 \\ 0&1&0 \end{pmatrix}

in the Hilbert space C³, where ω is the constant frequency. Then we find

exp(−iĤt/ħ) = \begin{pmatrix} ½+½\cos(ωt) & −(i/√2)\sin(ωt) & ½\cos(ωt)−½ \\ −(i/√2)\sin(ωt) & \cos(ωt) & −(i/√2)\sin(ωt) \\ ½\cos(ωt)−½ & −(i/√2)\sin(ωt) & ½+½\cos(ωt) \end{pmatrix}.

Let

ψ(0) = (1/√3)(1, 1, 1)^T

be the initial state. Then

exp(−iĤt/ħ)ψ(0) = (1/√3)\begin{pmatrix} \cos(ωt) − (i/√2)\sin(ωt) \\ \cos(ωt) − i√2\sin(ωt) \\ \cos(ωt) − (i/√2)\sin(ωt) \end{pmatrix}.

The probability

p(t) = |(ψ(t), ψ(0))|²

is given by

p(t) = 1 − (1/9)\sin²(ωt).
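Since ψ(t) is given in closed form, the probability p(t) can be checked numerically without the matrix exponential. A minimal C++ sketch (the value of ω is an arbitrary choice):

// evolution.cpp
#include <cmath>
#include <complex>
#include <iostream>
using namespace std;
typedef complex<double> cx;

int main(void)
{
   double w = 2.0;   // frequency omega
   for(int s=0;s<=4;s++)
   {
   double t = 0.3*s, c = cos(w*t), sn = sin(w*t);
   // components of sqrt(3)*psi(t) from the closed form above
   cx psi[3] = { cx(c,-sn/sqrt(2.0)),
                 cx(c,-sn*sqrt(2.0)),
                 cx(c,-sn/sqrt(2.0)) };
   // (psi(0), psi(t)) with psi(0) = (1,1,1)/sqrt(3)
   cx amp = (psi[0]+psi[1]+psi[2])/3.0;
   cout << "t = " << t << "  |amp|^2 = " << norm(amp)
        << "  1-(1/9)sin^2 = " << 1.0 - sn*sn/9.0 << endl;
   }
   return 0;
}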


PIII. Every observable, a, is associated with a self-adjoint operator A. The only possible outcome of a measurement of a is an eigenvalue λ_j of A, i.e.

Aφ_j = λ_j φ_j

where φ_j is an eigenfunction.

PIV. If the state of the system is described by the normalized vector ψ, then a measurement of a will yield the eigenvalue λ_j with probability

P_j = |(φ_j, ψ)|².

Notice that (φ_j, ψ) can be complex. It is obvious that 0 ≤ P_j ≤ 1.

In order for successive measurements of a to yield the same value λ_j it is necessary to have the projection postulate:

PV. Immediately after a measurement which yields the value λ_j the state of the system is described by Π_j ψ, where Π_j is the projection operator which projects onto the eigenspace of the eigenvalue λ_j.

The type of time evolution implied by PV is incompatible with the unitary time evolution implied by PII. PIV can be replaced by the weaker postulate:

PIV'. If a quantum system is described by the state φ_j then a measurement of a will yield the value λ_j.

Clearly PIV' is a special case of PIV but it is not a statement about probabilities. The replacement of PIV by PIV' eliminates the immediate need for PV since the state is φ_j before and after the measurement.

PVI. Quantum mechanical observables are self-adjoint operators on H. The expected (average) value of the observable b with the corresponding self-adjoint operator B in the normalized state ψ is

E_ψ(B) := (ψ, Bψ).

Examples of observables are the Hamiltonian (energy) observable, the momentum


observable, and the position observable.

The statistical mixtures in quantum mechanics lead to quantum statistical mechanics. The usual statistical mixture is described by a positive trace class operator ρ, yielding the expectation

ρ(B) = tr(ρB)/tr(ρ)

where tr denotes the trace. If ρ has rank 1, then ρ(B) is a pure state with ρ/tr(ρ) the projection onto ψ. Otherwise, ρ(B) is a convex linear combination of pure states,

ρ(B) = Σ_j α_j (φ_j, Bφ_j)

where the φ_j are the (orthonormal) eigenvectors of ρ and Σ_j α_j = 1.

PVII. The Hamilton operator H is the infinitesimal generator of the unitary group

U(t) := exp(−itH/ħ)

of time translations. The unit of action ħ (= h/2π) has the same dimension as pq, where p is the momentum and q is the position [59].

The momentum operator p is the infinitesimal generator of the unitary space translation group

exp(iq·p/ħ)

where

q·p := Σ_{k=1}^N Σ_{j=1}^3 q_{kj} p_{kj}.

We recall that

exp(a·∇)u(q) = u(q + a)

where u is a smooth function, ∇ := (∂/∂q_{11}, ..., ∂/∂q_{N3})^T and q = (q_{11}, q_{12}, q_{13}, q_{21}, ..., q_{N3}).

The angular momentum operator Ĵ is the infinitesimal generator for the unitary space rotation group

exp(−iθ·Ĵ).
Remark. This leads to the quantization. Consider the energy conservation equation

E = Σ_{k=1}^N Σ_{j=1}^3 p_{kj}²/(2m_k) + V(q).

We make the formal substitution

p_{kj} → −iħ ∂/∂q_{kj},   E → iħ ∂/∂t.

We arrive at the formal operator relation

iħ ∂/∂t = −Σ_{k=1}^N Σ_{j=1}^3 (ħ²/(2m_k)) ∂²/∂q_{kj}² + V(q).

Applying this operator relation to a wave function ψ(q, t) we obtain the Schrödinger equation.

The time translation group U(t) determines the dynamics. There are two standard descriptions: the Schrödinger picture and the Heisenberg picture. In the Schrödinger picture, the states ψ ∈ H evolve in time according to the Schrödinger equation, while the observables do not evolve. The vectors satisfy the Schrödinger equation. The time-dependent normalized state ψ(t) yields the expectation

E_{ψ(t)}(B) = (ψ(t), Bψ(t)).

The second description of dynamics is the Heisenberg picture, in which the states remain fixed, and the observables evolve in time according to the automorphism group

B → B(t) = e^{itH/ħ} B e^{−itH/ħ} = U(t)* B U(t).

Obviously we assume that the Hamilton operator does not depend explicitly on t. Thus the observables B satisfy the dynamical equation (Heisenberg equation of motion)

−iħ dB(t)/dt = [Ĥ, B(t)]

with the formal solution

B(t) = Σ_{n=0}^∞ ((it/ħ)^n/n!) [Ĥ, [Ĥ, ..., [Ĥ, B], ...]] = exp(iĤt/ħ) B exp(−iĤt/ħ).

The relation between the Heisenberg and Schrödinger pictures is given by

(ψ(t), Bψ(t)) = (ψ, B(t)ψ)

where ψ = ψ(t = 0).

Postulate PVII ensures that the results of an experiment, i.e., inner products (ψ, χ), are independent of the time at which the experiment is performed. This means

|(ψ, χ)| = |(ψ(t), χ(t))|.


Theorem. Every symmetry of H can be implemented either by a unitary transformation U on H,

ψ' = Uψ

or by an antiunitary operator A on H,

ψ' = Aψ.

The interpretation of this result is that every symmetry of H can be regarded as a coordinate transformation. In particular, the group of time translations is implemented by a unitary group of operators U(t). Only certain discrete symmetries (e.g., time inversion in nonrelativistic quantum mechanics) are implemented by antiunitary transformations.

Example. In nonrelativistic quantum mechanics one usual representation for a system of N particles moving in a potential V is H = L²(R^{3N}). This choice is called the Schrödinger representation (as distinct from the Schrödinger picture). The function ψ(q) ∈ H has the interpretation of giving the probability distribution

ρ(q) = |ψ(q)|²

for the position of the particles in R^{3N}. Using postulate PVII, we find

p_{kj} → p̂_{kj} = −iħ ∂/∂q_{kj}

and a nonrelativistic Hamilton function of the form

H = Σ_{k=1}^N Σ_{j=1}^3 p_{kj}²/(2m_k) + V(q)

becomes the elliptic differential operator

Ĥ = −Σ_{k=1}^N Σ_{j=1}^3 (ħ²/(2m_k)) ∂²/∂q_{kj}² + V(q).

In other words the Hamilton operator Ĥ follows from the Hamilton function H via the quantization

p_{kj} → p̂_{kj},   q_{kj} → q̂_{kj}.

The operator q̂_{kj} is defined by q̂_{kj} f(q) := q_{kj} f(q). We find for the (canonical) commutation relations

[q̂_{kj}, q̂_{k'j'}] = 0,   [p̂_{kj}, p̂_{k'j'}] = 0,   [p̂_{kj}, q̂_{k'j'}] = −iħ δ_{kk'} δ_{jj'} I.

They are preserved by the Heisenberg equation of motion.

Thus far the spin of the particle is not taken into account. We have spin 0 for π mesons, spin ½ for electrons, muons, protons, or neutrons, spin 1 for photons, and higher spins for other particles or nuclei. To consider spin-dependent forces (for example the coupling of the spin magnetic moment to a magnetic field) we have to extend the Hilbert space L²(R^{3N}) to the N-fold tensor product

⊗_{k=1}^N L²(R³, S).

Here L²(R³, S) denotes functions defined on R³ with values in the finite dimensional spin space S. For spin zero particles we have S = C, and we are reduced to L²(R^{3N}). For nonzero spin s, we have S = C^{2s+1}. We write ψ(q) as a vector with components ψ(q, ζ). A space rotation (generated by the angular momentum observable J) will rotate both q and ζ, the latter by a linear transformation of the ζ coordinates according to an N-fold tensor product of a representation of the spin group SU(2). The group SU(2) consists of all 2 × 2 matrices with

UU* = I   and   det U = 1.

Particles of a given type are indistinguishable. To obtain indistinguishable particles, we restrict ourselves to a subset of ⊗_{k=1}^N L²(R³, S) invariant under an irreducible representation of the symmetric group (permutation group) of the N particle coordinates

(q_k, ζ_k), k = 1, 2, ..., N. The standard choices are the totally symmetric representation for integer spin particles and the totally antisymmetric representation for half-integer spin particles.

The choice of antisymmetry for atomic and molecular problems with spin ½ is known as the Pauli exclusion principle. One can prove that integer spin particles cannot be antisymmetrized and half-integer spin particles cannot be symmetrized. Particles with integer spin are called bosons. Those with half-integer spin are called fermions.

Postulate VIII. A quantum mechanical state is symmetric under the permutation


of identical bosons, and antisymmetric under the permutation of identical fermions.
Chapter 17
Quantum Bits and Quantum Computation

17.1 Introduction
Digital computers are based on devices that can take on only two states, one of
which is denoted by 0 and the other by 1. By concatenating several Os and Is
together, 0-1 combinations can be formed to represent as many different entities
as desired. A combination containing a single 0 or 1 is called a bit. In general,
n bits can be used to distinguish among 2n distinct entities and each addition of
a bit doubles the number of possible combinations. Computers use strings of bits
to represent numbers, letters, punctuation marks, and any other useful pieces of
information. In a classical computer, the processing of information is done by logic
gate. A logic gate maps the state of its input bits into another state according to a
truth table. Quantum computers require quantum logic, something fundamentally
different to classical Boolean logic. This difference leads to a greater efficiency of
quantum computation over its classical counterpart.

In the last few years a large number of authors have studied quantum computing
([122], [8]). The most exciting development in quantum information processing has
been the discovery of quantum algorithms - for integer factorization and the discrete
logarithm - that run exponentially faster than the best known classical algorithms.
These algorithms take classical input (such as the number to be factored) and yield classical outputs (the factors), but obtain their speedup by using quantum interference between computation paths during the intermediate steps.
quantum computing device consisting of quantum logic gates whose computational
steps are synchronised in time. Quantum computation is defined as a unitary evo-
lution of the network which takes its initial state input into some final state output.


17.2 Quantum Bits and Quantum Registers


17.2.1 Quantum Bits

In a quantum computer the quantum bit [8, 18, 21, 132, 138, 156, 178] or simply qubit is the natural extension of the classical notion of bit. A qubit is a quantum two-level system that, in addition to the two pairwise orthonormal states |0⟩ and |1⟩ in the Hilbert space C², can be set in any superposition of the form

|ψ⟩ = c_0|0⟩ + c_1|1⟩,   c_0, c_1 ∈ C.

Since |ψ⟩ is normalized, i.e. ⟨ψ|ψ⟩ = 1, ⟨0|0⟩ = 1, ⟨1|1⟩ = 1, and ⟨0|1⟩ = 0 we have

|c_0|² + |c_1|² = 1.

Any quantum two-level system is a potential candidate for a qubit. Examples are the polarization of a photon, the polarization of a spin-1/2 particle (electron), the relative phase and intensity of a single photon in two arms of an interferometer, or an arbitrary superposition of two atomic states. Thus the classical Boolean states, 0 and 1, can be represented by a fixed pair of orthogonal states of the qubit. In the following we set

|0⟩ := (1, 0)^T,   |1⟩ := (0, 1)^T.

Often the representations of |0⟩ and |1⟩ are reversed; this changes the matrix representation of operators, but all computations and results are equivalent. In the following we think of a qubit as a spin-1/2 particle. The states |0⟩ and |1⟩ will correspond respectively to the spin-down and spin-up eigenstates along a pre-arranged axis of quantization, for example set by an external constant magnetic field. Although a qubit can be prepared in an infinite number of different quantum states (by choosing different complex coefficients c_j) it cannot be used to transmit more than one bit of information. This is because no detection process can reliably differentiate between non-orthogonal states. However, qubits (and more generally information encoded in quantum systems) can be used in systems developed for quantum cryptography, quantum teleportation or quantum dense coding. The problem of measuring a quantum system is a central one in quantum theory. In a classical computer, it is possible in principle to inquire at any time (and without disturbing the computer) about the state of any bit in the memory. In a quantum computer, the situation is different. Qubits can be in superposed states, or can even be entangled with each other, and the mere act of measuring the quantum computer alters its state. Performing a measurement on a qubit in the state given above will return 0 with probability |c_0|² and 1 with probability |c_1|². The state of the qubit after the measurement (post-measurement state) will be |0⟩ or |1⟩ (depending on the outcome), and not c_0|0⟩ + c_1|1⟩. We think of the measuring apparatus as a Stern-Gerlach device [23, 68] into which the qubits (spins) are sent when we want to measure them. When measuring such a state, the outcomes 0 and 1 will be recorded with probabilities |c_0|² and |c_1|² on the respective detector plates.

17.2.2 Quantum Registers


To study quantum computing we need a collection of qubits: a quantum register. This leads to the tensor product (product Hilbert space) of copies of the Hilbert space C². Since we consider finite dimensional Hilbert spaces over C we can identify the tensor product with the Kronecker product. As in the classical case, a register can be used to encode more complicated information.

For instance, the binary form of 9 (decimal) is 1001 and loading a quantum register with this value is done by preparing four qubits in the state

|9⟩ ≡ |1001⟩ ≡ |1⟩ ⊗ |0⟩ ⊗ |0⟩ ⊗ |1⟩.

Thus the state |9⟩ ≡ |1001⟩ is an element in the Hilbert space C^{16}. In the literature the notation

|1⟩|0⟩|0⟩|1⟩

is sometimes used, i.e. the symbol ⊗ is omitted. Consider first the case with two quantum bits. Then we have the basis

|00⟩ = |0⟩ ⊗ |0⟩,   |01⟩ = |0⟩ ⊗ |1⟩,   |10⟩ = |1⟩ ⊗ |0⟩,   |11⟩ = |1⟩ ⊗ |1⟩

in the Hilbert space C^4. Thus the number of states is 2² = 4.

Consider now the n qubit case. We use the notation

|a⟩ = |a_{n−1} ... a_1 a_0⟩ = |a_{n−1}⟩ ⊗ ... ⊗ |a_1⟩ ⊗ |a_0⟩

which denotes a quantum register prepared with the value

a = a_0 + 2a_1 + ... + 2^{n−1}a_{n−1}.

Thus the scalar product in the product space is

⟨a|b⟩ = Π_{j=0}^{n−1} ⟨a_j|b_j⟩.

Two states |a⟩ and |b⟩ are orthogonal if a_j ≠ b_j for at least one j. For an n-bit register, the most general state can be written as

|ψ⟩ = Σ_{x=0}^{2^n−1} c_x |x⟩

where

Σ_{x=0}^{2^n−1} |c_x|² = 1.

Thus |ψ⟩ is a state in the Hilbert space C^{2^n}.
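On a classical computer an n-qubit register can be simulated as a vector of 2^n complex amplitudes, with basis states built from Kronecker products. A minimal C++ sketch preparing |9⟩ = |1001⟩ (the helper function name kron is our choice):

// register.cpp
#include <complex>
#include <iostream>
#include <vector>
using namespace std;
typedef vector<complex<double> > State;

// Kronecker product of two state vectors
State kron(const State& a, const State& b)
{
   State c(a.size()*b.size());
   for(size_t i=0;i<a.size();i++)
   for(size_t j=0;j<b.size();j++) c[i*b.size()+j] = a[i]*b[j];
   return c;
}

int main(void)
{
   State zero(2), one(2);
   zero[0] = 1.0;  one[1] = 1.0;   // |0> = (1,0)^T, |1> = (0,1)^T

   // |9> = |1001> = |1> (x) |0> (x) |0> (x) |1>
   State reg = kron(kron(kron(one, zero), zero), one);

   // the single nonzero amplitude sits at index 9 of the C^16 vector
   for(size_t x=0;x<reg.size();x++)
   if(norm(reg[x]) > 0.0) cout << "amplitude 1 at basis state " << x << endl;
   return 0;
}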

Quantum data processing consists of applying a sequence of unitary transformations to the state vector |ψ⟩. This state describes the situation in which several different values of the register are present simultaneously; just as in the case of the qubit, there is no classical counterpart to this situation, and there is no way to gain a complete knowledge of the state of a register through a single measurement.

Measuring the state of a register is done by passing, one by one, the various spins that form the register into a Stern-Gerlach apparatus and recording the results. For instance a two-bit register initially prepared in the state

|ψ⟩ = (1/√2)(|0⟩ ⊗ |0⟩ + |1⟩ ⊗ |1⟩)

will, with equal probability, result in either two successive clicks in the down-detector or two successive clicks in the up-detector. The post-measurement state will be either

|0⟩ ⊗ |0⟩   or   |1⟩ ⊗ |1⟩

depending on the outcome. A record of a click-up followed by a click-down, or the opposite (click-down followed by click-up), signals an experimental or a preparation error, because neither

|1⟩ ⊗ |0⟩   nor   |0⟩ ⊗ |1⟩

appear in the state |ψ⟩.

17.3 Entangled States


Entangled quantum states [165] are an important component of quantum computing techniques such as quantum error-correction, dense coding and quantum teleportation. Entanglement is the characteristic trait of quantum mechanics which enforces its entire departure from classical lines of thought. We consider entanglement of pure states. Thus a basic question in quantum computing is as follows: given a normalized state |u⟩ in the Hilbert space C^4, can two normalized states |x⟩ and |y⟩ in the Hilbert space C² be found such that

|x⟩ ⊗ |y⟩ = |u⟩

where ⊗ denotes the Kronecker product [162, 163]. In other words, what is the condition on |u⟩ such that |x⟩ and |y⟩ exist? If no such |x⟩ and |y⟩ exist then |u⟩ is said to be entangled. If |x⟩ and |y⟩ do exist we say that |u⟩ is not entangled. As an example, a product state such as |↑⟩ ⊗ |↑⟩ is not entangled. The Bell basis states [132] (which form a basis in C^4) are given by

Φ^± := (1/√2)(|↑↑⟩ ± |↓↓⟩),   Ψ^± := (1/√2)(|↑↓⟩ ± |↓↑⟩)          (17.1)

where |ab⟩ = |a⟩ ⊗ |b⟩ and

|↑⟩ := (1, 0)^T,   |↓⟩ := (0, 1)^T.

Consider an arbitrary orthonormal basis {|α⟩, |β⟩} in C². The state |α⟩ can be expressed as

|α⟩ = a|↑⟩ + b|↓⟩

with a, b ∈ C and |a|² + |b|² = 1. Thus

|β⟩ = e^{iθ}(b̄|↑⟩ − ā|↓⟩)

where θ ∈ R, and ā and b̄ denote the complex conjugate of a and b respectively. Now consider the state

(1/√2)(|αβ⟩ − |βα⟩) = e^{iθ}(1/√2)(|↓↑⟩ − |↑↓⟩).

Thus measurement of Ψ^− always yields opposite outcomes for the two qubits, independent of the basis. The Bell states

Φ^+ = (1/√2)(1, 0, 0, 1)^T,   Ψ^+ = (1/√2)(0, 1, 1, 0)^T,   Ψ^− = (1/√2)(0, 1, −1, 0)^T

are entangled. The entangled state Ψ^+ is also called the EPR state, after Einstein, Podolsky and Rosen [60]. Entangled states exhibit nonlocal correlations. This means that two entangled systems which have interacted in the past and are no longer interacting still show correlations. These correlations are used for example in dense coding and quantum error-correction techniques [19, 132]. The Bell states can be characterized as the simultaneous eigenvectors of the 4 × 4 matrices

σ_1 ⊗ σ_1   and   σ_3 ⊗ σ_3

where σ_1 and σ_3 are the Pauli spin matrices σ_x and σ_z.

The measure of entanglement for pure states E(u) is defined as follows [19, 132]

E(u) := S(ρ_A) = S(ρ_B)

where the density matrices are defined as

ρ_A := tr_B |u⟩⟨u|,   ρ_B := tr_A |u⟩⟨u|

and

S(ρ) := −tr(ρ log₂ ρ).

Thus 0 ≤ E ≤ 1. If E = 1 we call the pure state maximally entangled. If E = 0, the pure state is not entangled.

As an example consider Bohm's singlet state

|ψ⟩ := (1/√2)(|↑↓⟩ − |↓↑⟩)

which is Ψ^− in the Bell basis. Since

|ψ⟩⟨ψ| = ½\begin{pmatrix} 0&0&0&0 \\ 0&1&-1&0 \\ 0&-1&1&0 \\ 0&0&0&0 \end{pmatrix}

we find

ρ_B := tr_A(|ψ⟩⟨ψ|) = (⟨↑| ⊗ I_2)|ψ⟩⟨ψ|(|↑⟩ ⊗ I_2) + (⟨↓| ⊗ I_2)|ψ⟩⟨ψ|(|↓⟩ ⊗ I_2) = ½ I_2

where I_2 is the 2 × 2 unit matrix. Thus

E_ψ = S(ρ_B) = −tr(ρ_B log₂ ρ_B) = 1.

This state is maximally entangled. As another example consider

|φ⟩ := |↑↓⟩ ≡ |↑⟩ ⊗ |↓⟩.

Since

|φ⟩⟨φ| = \begin{pmatrix} 0&0&0&0 \\ 0&1&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \end{pmatrix}

we find

ρ_B = tr_A(|φ⟩⟨φ|) = |↓⟩⟨↓|.

Thus

E_φ = S(ρ_B) = −tr(ρ_B log₂ ρ_B) = 0.

This state is not entangled.

The Schmidt number can also be used to characterize entanglement. The Schmidt number is the number of nonzero eigenvalues of ρ_A and ρ_B. A pure state is entangled if its Schmidt number is greater than one. In this case we have E > 0. Otherwise the pure state is not entangled and we have E = 0.
Next we derive the requirement for a state in C^4 to be entangled. We use the representation

|u⟩ = (u_1, u_2, u_3, u_4)^T,   |x⟩ = (x_1, x_2)^T,   |y⟩ = (y_1, y_2)^T.

Since |u⟩ is normalized at least one of u_1, u_2, u_3, u_4 is nonzero. From the normalization conditions and |x⟩ ⊗ |y⟩ = |u⟩ we find

|u_1|² + |u_2|² + |u_3|² + |u_4|² = 1          (17.2)
|x_1|² + |x_2|² = 1          (17.3)
|y_1|² + |y_2|² = 1          (17.4)
x_1 y_1 = u_1          (17.5)
x_1 y_2 = u_2          (17.6)
x_2 y_1 = u_3          (17.7)
x_2 y_2 = u_4          (17.8)

From (17.5)-(17.8) we find that the condition on |u⟩ is given by

u_1 u_4 = u_2 u_3.          (17.9)

From (17.3)-(17.8) we obtain

|x_1|² = |u_1|² + |u_2|²          (17.10)
|x_2|² = |u_3|² + |u_4|²          (17.11)
|y_1|² = |u_1|² + |u_3|²          (17.12)
|y_2|² = |u_2|² + |u_4|²          (17.13)

Let

x_1 = |x_1|e^{iα_1},   x_2 = |x_2|e^{iα_2},   y_1 = |y_1|e^{iβ_1},   y_2 = |y_2|e^{iβ_2}.

Now equations (17.5)-(17.8) become

α_1 + β_1 = arg(u_1) mod 2π          (17.14)
α_1 + β_2 = arg(u_2) mod 2π          (17.15)
α_2 + β_1 = arg(u_3) mod 2π          (17.16)
α_2 + β_2 = arg(u_4) mod 2π          (17.17)

Suppose that (17.9) holds, then a solution is given by

x_1 = √(|u_1|² + |u_2|²) e^{iα_1},   x_2 = √(|u_3|² + |u_4|²) e^{iα_2}          (17.18)
y_1 = √(|u_1|² + |u_3|²) e^{iβ_1},   y_2 = √(|u_2|² + |u_4|²) e^{iβ_2}          (17.19)
α_1 = 0,   α_2 = arg(u_3) − β_1          (17.20)
β_1 = arg(u_1),   β_2 = arg(u_2)          (17.21)
The decomposition (if possible) of |u⟩ is not unique. This follows from the fact that if |u⟩ = |x⟩ ⊗ |y⟩ is a decomposition of |u⟩ then

(e^{iθ}|x⟩) ⊗ (e^{−iθ}|y⟩),   θ ∈ R

is also a decomposition of |u⟩.

Suppose |u⟩ has the two decompositions |x_1⟩ ⊗ |y_1⟩ and |x_2⟩ ⊗ |y_2⟩. There exist

x_{1k}, y_{1l}, x_{2k}, y_{2l} ≠ 0          (17.22)
x_{1k} y_{1l} = u_j          (17.23)
x_{2k} y_{2l} = u_j          (17.24)

for some k, l ∈ {1, 2}. x_{1k} can be written as x_{1k} = c x_{2k}, c ∈ C, which gives c y_{1l} = y_{2l}. Let k' := 3 − k and l' := 3 − l. If x_{1k'} is nonzero then x_{2k'} is nonzero and x_{1k'} y_{1l} = x_{2k'} y_{2l} so that x_{1k'} = c x_{2k'}. Similarly if y_{1l'} is nonzero then c y_{1l'} = y_{2l'}. Thus the decomposition is unique up to a phase factor.

Next we describe the relation between condition (17.9) and the measure of entanglement introduced above. Since

|u⟩⟨u| = \begin{pmatrix} u_1ū_1 & u_1ū_2 & u_1ū_3 & u_1ū_4 \\ u_2ū_1 & u_2ū_2 & u_2ū_3 & u_2ū_4 \\ u_3ū_1 & u_3ū_2 & u_3ū_3 & u_3ū_4 \\ u_4ū_1 & u_4ū_2 & u_4ū_3 & u_4ū_4 \end{pmatrix}          (17.25)

we find

ρ_A = tr_B|u⟩⟨u| = \begin{pmatrix} |u_1|²+|u_2|² & u_1ū_3+u_2ū_4 \\ u_3ū_1+u_4ū_2 & |u_3|²+|u_4|² \end{pmatrix}          (17.26)

ρ_B = tr_A|u⟩⟨u| = \begin{pmatrix} |u_1|²+|u_3|² & u_1ū_2+u_3ū_4 \\ u_2ū_1+u_4ū_3 & |u_2|²+|u_4|² \end{pmatrix}          (17.27)

The 2 × 2 density matrices ρ_A and ρ_B given by (17.26) and (17.27) are hermitian and have the same eigenvalues. Thus the eigenvalues λ_1 and λ_2 are real. The matrices are also positive semi-definite, i.e. for all |a⟩ ∈ C² we have ⟨a|ρ_{A,B}|a⟩ ≥ 0. Thus the eigenvalues are non-negative. The eigenvalues are given by

λ_{1,2} = ½(1 ± √(1 − 4|u_1u_4 − u_2u_3|²)).          (17.28)

Since |u⟩ is normalized we have

tr(tr_A|u⟩⟨u|) = 1          (17.29)
tr(tr_B|u⟩⟨u|) = 1          (17.30)

and therefore

λ_1 + λ_2 = 1          (17.31)

where we used the fact that the trace of an n × n matrix is the sum of its eigenvalues. This can also be seen from (17.28). Thus 0 ≤ λ_1, λ_2 ≤ 1. Now we have

det(tr_B(|u⟩⟨u|)) = (u_1u_4 − u_2u_3)(ū_1ū_4 − ū_2ū_3) = |u_1u_4 − u_2u_3|².          (17.32)

Thus if u_1u_4 = u_2u_3 the determinant is equal to 0. Since the determinant of an n × n matrix is the product of the eigenvalues we find that one eigenvalue is equal to 0 and owing to (17.31) the other eigenvalue is 1. Obviously the entanglement can be written as

E(u) = −(λ log₂ λ + (1 − λ) log₂(1 − λ))          (17.33)

where λ ∈ {λ_1, λ_2} is one of the eigenvalues given above. Using these facts and lim_{λ→0} λ log₂ λ = 0 we find that E(u) = 0 if condition (17.9) is satisfied. Thus if λ = 0 or λ = 1 we have E(u) = 0. For λ = ½ the entanglement E(u) has a maximum and we find E(u) = 1. Vice versa we can prove that if E(u) = 0 the condition (17.9) follows. When (17.9) holds we find for the squares of the density operators that

ρ_A² = ρ_A   and   ρ_B² = ρ_B.

Using the computer algebra system SymbolicC++ [169] the expression u_1u_4 − u_2u_3 can be evaluated symbolically and compared against 0, which then provides the information whether the state is entangled or not. SymbolicC++ includes among other classes a template class Complex and a Sum class to do the symbolic manipulations. If the state is entangled, then we can use equations (17.28) and (17.33) to find the entanglement E.

A remark is in order about the precision of the numerical calculations of the condition u_1u_4 = u_2u_3 and the entanglement E to test for non-entanglement. To test the condition u_1u_4 = u_2u_3 has the advantage that it consists of only multiplication of complex numbers and the normalization factor of the vector |u⟩ must not be taken into account. On the other hand if the difference |u_1u_4 − u_2u_3| is of order O(10^{−15}) the term

(1 − λ) log₂(1 − λ)

can be taken as

log(1 + O(10^{−30})).

Therefore 1 + O(10^{−30}) is rounded to 1 for the data type double, so the logarithm evaluates to 0. Thus the entanglement E is less affected by the problem of the floating point comparison. However in calculating E we have to take into account the normalization factor of the vector |u⟩. Warnings should be issued if E or |u_1u_4 − u_2u_3| are close to zero when we use the data type double. Java and a number of computer algebra systems admit a data type of arbitrary precision of floating point numbers. For example, Java has the abstract data type BigDecimal. Then we can work with higher precision. An important special case arises when one of the components of the vector |u⟩ is equal to zero. For example, say u_4 = 0. If the state |u⟩ is non-entangled then u_2 or u_3 must be zero.
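The test (17.9) and the entanglement (17.33) via (17.28) can be sketched in C++ with the data type double as follows (subject to the floating point caveats discussed above; the function names are our choice):

// entangle.cpp
#include <cmath>
#include <complex>
#include <iostream>
using namespace std;
typedef complex<double> cx;

// entanglement E(u) of a normalized state u in C^4, using (17.28), (17.33)
double E(const cx u[4])
{
   double d   = abs(u[0]*u[3]-u[1]*u[2]);        // |u1 u4 - u2 u3|
   double lam = 0.5*(1.0+sqrt(1.0-4.0*d*d));     // eigenvalue of tr_B |u><u|
   if(lam <= 0.0 || lam >= 1.0) return 0.0;      // guard against log(0)
   return -(lam*log(lam)+(1.0-lam)*log(1.0-lam))/log(2.0);
}

int main(void)
{
   double s = 1.0/sqrt(2.0);
   cx bell[4]    = { s, 0.0, 0.0, s };           // (|00>+|11>)/sqrt(2)
   cx product[4] = { 0.0, 1.0, 0.0, 0.0 };       // |0> (x) |1>

   cout << "Bell state:    u1u4-u2u3 = " << bell[0]*bell[3]-bell[1]*bell[2]
        << "  E = " << E(bell) << endl;          // 0.5, E = 1
   cout << "product state: u1u4-u2u3 = "
        << product[0]*product[3]-product[1]*product[2]
        << "  E = " << E(product) << endl;       // 0,   E = 0
   return 0;
}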

The analysis of separability can be extended to higher dimensions; for example Steeb and Hardy [167] consider when states in C^9 can be separated into a product of two states in C³. They have only considered separability of pure states.

The more general question of the separability of mixed states has been considered
in [90, 91, 127].

17.4 Quantum Gates


17.4.1 Introduction
Quantum computation is a unitary transformation, where a measurement is per-
formed at the end to extract the result. A unitary transformation is itself reversible;
therefore, we have to use reversible gates in order to be able to implement quan-
tum gates. A unitary transformation may operate on a single qubit or multiple
qubits. Some transformations on multiple qubits cannot be expressed as a sequence
of operations on single qubits.

States evolve according to the Schrödinger equation

iħ dψ/dt = Hψ

where H is a linear self-adjoint operator. The formal solution is given by

ψ(t) = e^{−itH/ħ}ψ(0)

and since H is self-adjoint exp(−itH/ħ) is unitary. Thus the evolution of states in quantum computation is described by unitary operations.

For example, a general unitary transformation in the two-dimensional space C² can be defined as follows

U(θ, δ, η, τ) := e^{iδ}\begin{pmatrix} e^{iη}\cos θ & e^{iτ}\sin θ \\ -e^{-iτ}\sin θ & e^{-iη}\cos θ \end{pmatrix}

with θ, δ, η, τ ∈ R.

Thus any quantum gate operating on a single qubit is given by an appropriate choice of θ, δ, η, and τ. Single qubit operations are not sufficient to implement arbitrary unitary transforms required by quantum algorithms. Thus it is important to determine if some basic set of unitary operations is sufficient to implement any unitary transform.

Definition. A unitary transformation on n qubits is called simple if n − 2 of the qubits always remain unchanged by the transformation.

Theorem. Given any unitary transformation U and ε > 0 there exist simple unitary transformations U_1, U_2, ..., U_k such that

‖U − U_k ··· U_2 U_1‖ < ε

where k is a polynomial function of 2^n and log₂(1/ε) and

‖A‖ := max_{‖x‖=1} ‖A|x⟩‖.

This theorem is important for the discussion of universality (see section 17.4.6).

Next we discuss some important quantum gates.

17.4.2 NOT Gate

The corresponding quantum gate of the classical NOT gate is implemented via a unitary matrix U_NOT that evolves the basis states into the corresponding states according to the same truth table. The quantum version of the classical NOT gate is the unitary operation U_NOT such that

U_NOT|0⟩ = |1⟩,   U_NOT|1⟩ = |0⟩.

Since |0⟩ = (1, 0)^T and |1⟩ = (0, 1)^T we find the unitary matrix

U_NOT = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.

The quantum NOT gate for the two quantum bit case would then be the unitary 4 × 4 matrix

U_NOT ⊗ U_NOT = \begin{pmatrix} 0&0&0&1 \\ 0&0&1&0 \\ 0&1&0&0 \\ 1&0&0&0 \end{pmatrix}

since (U_NOT ⊗ U_NOT)(|a⟩ ⊗ |b⟩) = (U_NOT|a⟩) ⊗ (U_NOT|b⟩). This can be extended to any dimension. The unitary matrix U_NOT is a permutation matrix. The NOT gate is a special case of a one-parameter family of unitary matrices U(α), obtained for α = 1.

The NOT gate is denoted as

Figure 17.1: NOT Gate

17.4.3 Walsh-Hadamard Gate


In quantum mechanics, the notation of gates can be extended to operations that have no classical counterpart. For instance, the operation U_H (Walsh-Hadamard gate) evolves the basis states according to

U_H|0⟩ = (1/√2)(|0⟩ + |1⟩),   U_H|1⟩ = (1/√2)(|0⟩ − |1⟩).

Note that it evolves classical states into superpositions and therefore cannot be regarded as classical. Thus U_H is given by the 2 × 2 unitary matrix

U_H = (1/√2)\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}

since

U_H|1⟩ = (1/√2)\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = (1/√2)\begin{pmatrix} 1 \\ -1 \end{pmatrix} = (1/√2)(|0⟩ − |1⟩).

The unitary operation represented by the unitary matrix U_H corresponds to a 45° rotation of the polarization. This is intrinsically nonclassical because it transforms Boolean states into superpositions. The inverse matrix of U_H is given by

U_H^{-1} = U_H   since U_H² = I.

The Walsh-Hadamard gate is related to the rotation matrix

U_R(θ) := \begin{pmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{pmatrix}

at θ = −π/4; indeed U_H = diag(1, −1) U_R(−π/4), the extra reflection being needed since det U_H = −1.

The Walsh-Hadamard gate is denoted as

Figure 17.2: Walsh-Hadamard Gate

The Walsh-Hadamard gate is quite useful when extended using the Kronecker product. If we take an n-bit quantum register initially in the state

|00...0⟩

and apply U_H to every single qubit of the register, the resulting state is

(U_H ⊗ U_H ⊗ ... ⊗ U_H)|00...0⟩ = (1/√2)(|0⟩ + |1⟩) ⊗ ... ⊗ (1/√2)(|0⟩ + |1⟩).

Thus we can write

|ψ⟩ = (1/2^{n/2}) Σ_{x=0}^{2^n−1} |x⟩.

When the initial configuration of the qubits is

y = y_0 + y_1 2 + ... + y_{n−1} 2^{n−1},

in other words the register is prepared as

|y_{n−1} ... y_1 y_0⟩,

applying the Walsh-Hadamard transform to each qubit yields

(U_H ⊗ U_H ⊗ ... ⊗ U_H)|y⟩ = (1/2^{n/2}) Σ_{x=0}^{2^n−1} (−1)^{x*y} |x⟩

where

x*y = (x_0·y_0) ⊕ (x_1·y_1) ⊕ ... ⊕ (x_{n−1}·y_{n−1}).
This means with a linear number of operations (i.e. n applications of U_H) we have generated a register state that contains an exponential (2^n) number of distinct terms. Using quantum registers, n elementary operations can generate a state containing all 2^n possible numerical values of the register. In contrast, in classical registers n elementary operations can only prepare one state of the register representing one specific number. It is this ability of creating quantum superpositions which makes quantum parallel processing possible. If, after preparing the register in a coherent superposition of several numbers, all subsequent computational operations are unitary and linear (i.e. preserve the superpositions of states), then with each computational step the computation is performed simultaneously on all the numbers present in the superposition.
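The n applications of U_H can be simulated in time proportional to n2^n by applying the gate qubit by qubit to the amplitude vector, without ever forming the 2^n × 2^n matrix. A minimal C++ sketch:

// hadamard.cpp
#include <cmath>
#include <complex>
#include <iostream>
#include <vector>
using namespace std;
typedef vector<complex<double> > State;

// apply the Walsh-Hadamard gate to qubit q of the state psi
void hadamard(State& psi, int q)
{
   double s = 1.0/sqrt(2.0);
   size_t mask = size_t(1) << q;
   for(size_t x=0;x<psi.size();x++)
   if(!(x & mask))
   {
   complex<double> a = psi[x], b = psi[x|mask];
   psi[x]      = s*(a+b);
   psi[x|mask] = s*(a-b);
   }
}

int main(void)
{
   int n = 3;
   State psi(size_t(1)<<n);
   psi[0] = 1.0;                            // |00...0>

   for(int q=0;q<n;q++) hadamard(psi, q);   // n gate applications

   // 2^n equal amplitudes 1/2^{n/2}
   for(size_t x=0;x<psi.size();x++)
   cout << "amplitude of |" << x << "> = " << real(psi[x]) << endl;
   return 0;
}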

17.4.4 XOR and the Controlled NOT Gate


The most important two-qubit quantum gate is the XOR gate. It is defined as

U_XOR|a, b⟩ := |a, a ⊕ b⟩.

Consequently

U_XOR|00⟩ = |00⟩,   U_XOR|01⟩ = |01⟩,   U_XOR|10⟩ = |11⟩,   U_XOR|11⟩ = |10⟩.

The vectors |00⟩, |01⟩, |10⟩ and |11⟩ form an orthonormal basis in C^4. If we consider the basis in this order, the matrix representation of U_XOR is

U_XOR = \begin{pmatrix} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&0&1 \\ 0&0&1&0 \end{pmatrix}.

If we consider the order |11⟩, |10⟩, |01⟩, |00⟩, the matrix representation is

\begin{pmatrix} 0&1&0&0 \\ 1&0&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \end{pmatrix}.

Both matrices are permutation matrices. Sometimes in the literature the definition

U_XOR|a, b⟩ := |a ⊕ b, b⟩

is used. Furthermore the XOR gate is also called the controlled NOT gate (CNOT gate). The name comes from the fact that the gate effects a logical NOT on the second qubit (target bit), if and only if the first qubit (control bit) is in state |1⟩. We see that U_XOR cannot be written as a Kronecker product of 2 × 2 matrices. Two interacting magnetic dipoles sufficiently close to each other can be used to implement this operation. The XOR gate is denoted by


Figure 17.3: XOR Gate

17.4.5 Other Quantum Gates


The exchange gate simply swaps two bits, i.e. it applies the transform

|00⟩ ↦ |00⟩,   |01⟩ ↦ |10⟩,   |10⟩ ↦ |01⟩,   |11⟩ ↦ |11⟩.

We have

U_EXCH := |00⟩⟨00| + |10⟩⟨01| + |01⟩⟨10| + |11⟩⟨11|.

The matrix representation is

U_EXCH = \begin{pmatrix} 1&0&0&0 \\ 0&0&1&0 \\ 0&1&0&0 \\ 0&0&0&1 \end{pmatrix}

which is a permutation matrix.



The phase shift gate is defined on two qubits as

U_PS(φ)|a, b⟩ := e^{iφ(a·b)}|a, b⟩

where a, b ∈ {0, 1} and · denotes the classical AND operation. Thus we have

U_PS(φ)|00⟩ = |00⟩,   U_PS(φ)|01⟩ = |01⟩,   U_PS(φ)|10⟩ = |10⟩,   U_PS(φ)|11⟩ = e^{iφ}|11⟩.

The gate performs a conditional phase shift, i.e. a multiplication by the phase factor e^{iφ} only if the two qubits are both in their |1⟩ state. The three other basis states are unaffected. An important special case is φ = π. The phase shift gate is denoted by

Figure 17.4: Phase Shift Gate

The phase shift gate which acts on one qubit is defined as (in matrix notation)

\begin{pmatrix} e^{-iφ} & 0 \\ 0 & e^{iφ} \end{pmatrix}.

The Toffoli gate is a classically universal, reversible, 3-input, 3-output gate. It is sometimes also called the controlled controlled NOT gate. It transforms a state according to

|a, b, c⟩ ↦ |a, b, (a·b) ⊕ c⟩.

The NOT gate can be constructed as

|1, 1, a⟩ ↦ |1, 1, ¬a⟩,

and the AND gate can be implemented as

|a, b, 0⟩ ↦ |a, b, a·b⟩.

The gate can be described in terms of U_XOR

U_TOFFOLI := |0⟩⟨0| ⊗ I_2 ⊗ I_2 + |1⟩⟨1| ⊗ U_XOR.



We can also describe the gate as a special case of Deutsch's gate (given below) with α = 1. Thus we obtain the matrix representation

1 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0
0 0 1 0 0 0 0 0
0 0 0 1 0 0 0 0
UTOFFOLI = 0 0 0 0 1 0 0 0
0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 1
0 0 0 0 0 0 1 0

The Toffoli gate is denoted by

Figure 17.5: Toffoli Gate

The Fredkin gate is described by

U_FREDKIN := |0⟩⟨0| ⊗ I_2 ⊗ I_2 + |1⟩⟨1| ⊗ U_EXCH.

The gate is also called the controlled exchange gate. It is also possible to construct AND and NOT gates from the Fredkin gate.

Deutsch's gate acts on elements of the Hilbert space C^8. It leaves the basis states |a_1, a_2, a_3⟩ with a_1·a_2 = 0 unchanged, where a_1, a_2, a_3 ∈ {0, 1}, and applies the 2 × 2 unitary matrix

\begin{pmatrix} i\cos(πα/2) & \sin(πα/2) \\ \sin(πα/2) & i\cos(πα/2) \end{pmatrix}

to the third qubit if a_1 = a_2 = 1. Thus U_D(α) is a unitary 8 × 8 matrix given by

U_D(α) = \begin{pmatrix}
1&0&0&0&0&0&0&0 \\
0&1&0&0&0&0&0&0 \\
0&0&1&0&0&0&0&0 \\
0&0&0&1&0&0&0&0 \\
0&0&0&0&1&0&0&0 \\
0&0&0&0&0&1&0&0 \\
0&0&0&0&0&0&i\cos(πα/2)&\sin(πα/2) \\
0&0&0&0&0&0&\sin(πα/2)&i\cos(πα/2)
\end{pmatrix}

where we used the following ordering of the basis states |a_1 a_2 a_3⟩:

000, 001, 010, 011, 100, 101, 110, 111.

17.4.6 Universal Sets of Quantum Gates

In classical computing we described how any Boolean function can be expressed in terms of the NAND (or NOR) operation. Is there a single quantum gate which can be used to implement any other quantum gate?

Deutsch's gate, described in the previous section, is one such gate [57]. It is a class of gates described by a real parameter. Thus to prove that a set of quantum gates is a universal set, all that is required is to show that the set can implement Deutsch's gate. For example U_XOR together with the set of all single qubit unitary transformations is a universal set of quantum gates [7, 138].

It has also been shown that a combination of single and double qubit operations [6, 57] can also form a universal set.

17.4.7 Functions
Now we illustrate how to construct a simple transformation implementing a classical
function. We consider only the case where one qubit is changed, i.e. the function
we compute over the input gives the value 0 or 1. A simple permutation, which
is unitary, and its inverse allows functions with a greater number of output qubits
to be computed. Suppose the input consists of n qubits, and the function to be
calculated is f : {0, 1, ..., 2^n − 1} → {0, 1}. A unitary transform is given by

U_f := Σ_{j=0}^{2^n−1} Σ_{k=0}^{1} |j⟩⟨j| ⊗ |k ⊕ f(j)⟩⟨k|

which is a permutation. The terms in the sum consist of the mapping of f,

(|j⟩ ⊗ |f(j)⟩)(⟨j| ⊗ ⟨0|),

and the terms

(|j⟩ ⊗ |1 ⊕ f(j)⟩)(⟨j| ⊗ ⟨1|)

to ensure unitarity.

For example, the sum bit of the full adder would be implemented as

U_SUM := |0000⟩⟨0000| + |0011⟩⟨0010| + |0101⟩⟨0100| + |0110⟩⟨0110| +
         |1001⟩⟨1000| + |1010⟩⟨1010| + |1100⟩⟨1100| + |1111⟩⟨1110| +
         |0001⟩⟨0001| + |0010⟩⟨0011| + |0100⟩⟨0101| + |0111⟩⟨0111| +
         |1000⟩⟨1001| + |1011⟩⟨1011| + |1101⟩⟨1101| + |1110⟩⟨1111|
and the carry bit would be implemented as

U_CARRY := |0000⟩⟨0000| + |0010⟩⟨0010| + |0100⟩⟨0100| + |0111⟩⟨0110| +
           |1000⟩⟨1000| + |1011⟩⟨1010| + |1101⟩⟨1100| + |1111⟩⟨1110| +
           |0001⟩⟨0001| + |0011⟩⟨0011| + |0101⟩⟨0101| + |0110⟩⟨0111| +
           |1001⟩⟨1001| + |1010⟩⟨1011| + |1100⟩⟨1101| + |1110⟩⟨1111|.

To compute a more complex function which maps to n bits, we describe each of the n
functions which map to a single bit. This is possible since each bit value can be
viewed as a function of the input, separate from the other n − 1 bits.
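
The construction of U_f is mechanical. The sketch below (ours; the helper f computes the sum bit of the full adder, so the result is the 16 × 16 permutation matrix corresponding to U_SUM above) builds U_f = Σ_j Σ_k |j⟩⟨j| ⊗ |k ⊕ f(j)⟩⟨k| for n = 3 input qubits.

// Illustrative sketch: U_f as a permutation matrix for n = 3.
#include <iostream>
using namespace std;

int f(int j)   // sum bit of the full adder: a (+) b (+) c
{
   int a = (j>>2)&1, b = (j>>1)&1, c = j&1;
   return a^b^c;
}

int main()
{
   const int n = 3, dim = 1 << (n+1);   // dim = 16
   int U[16][16];
   for(int r=0;r<dim;r++)
      for(int c=0;c<dim;c++) U[r][c] = 0;
   for(int j=0;j<(1<<n);j++)
      for(int k=0;k<2;k++)              // |j, k (+) f(j)> <j, k|
         U[2*j+(k^f(j))][2*j+k] = 1;
   // every row and column holds exactly one 1: U_f is a permutation
   for(int r=0;r<dim;r++)
   {
      for(int c=0;c<dim;c++) cout << U[r][c];
      cout << endl;
   }
   return 0;
}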

Next we describe how quantum computers deal with functions ([8], [123]). Consider
a function
f : {0, 1, ..., 2^m − 1} → {0, 1, ..., 2^n − 1}
where m and n are positive integers. A classical device computes f by evolving each
labelled input
0, 1, ..., 2^m − 1
into its respective labelled output

f(0), f(1), ..., f(2^m − 1).

Quantum computers, due to the unitary (and therefore reversible) nature of their
evolution, compute functions in a slightly different way. It is not directly possible
to compute a function f by a unitary operation that evolves |x⟩ into |f(x)⟩. If f is
not a one-to-one mapping (i.e. if f(x) = f(y) for some x ≠ y), then two orthogonal
kets |x⟩ and |y⟩ can be evolved into the same state

|f(x)⟩ = |f(y)⟩.

Thus this violates unitarity. One way to compute functions which are not one-to-
one mappings, while preserving the reversibility of computation, is by keeping the
record of the input. To achieve this, a quantum computer uses two registers; the first
register to store the input data, the second one for the output data. Each possible
input x is represented by the state Ix), the quantum state of the first register.
Analogously, each possible output y = f(x) is represented by Iy), the quantum
state of the second register. States corresponding to different inputs and different
outputs are orthogonal,

⟨x|x'⟩ = δ_{xx'},   ⟨y|y'⟩ = δ_{yy'}.

Thus
(⟨x'| ⊗ ⟨y'|)(|x⟩ ⊗ |y⟩) = δ_{xx'} δ_{yy'}.
The function evaluation is then determined by a unitary evolution operator Uf that
acts on both registers
U_f |x⟩ ⊗ |0⟩ = |x⟩ ⊗ |f(x)⟩.
A reversible function evaluation, i.e. the one that keeps track of the input, is as
good as a regular, irreversible evaluation. This means that if a given function can
be computed in polynomial time, it can also be computed in polynomial time using
a reversible computation. The computations we are considering here are not only
reversible but also quantum, and we can do much more than computing values of
f(x) one by one. We can prepare a superposition of all input values as a single state

|ψ⟩ := (1/√(2^m)) Σ_{x=0}^{2^m−1} |x⟩ ⊗ |0⟩

and by running the computation U_f only once, we can compute all of the 2^m values
f(0), ..., f(2^m − 1) in the state

|Ψ⟩ := U_f |ψ⟩ = (1/√(2^m)) Σ_{x=0}^{2^m−1} |x⟩ ⊗ |f(x)⟩.

How much information about f does the state |Ψ⟩ contain? No quantum measure-
ment can extract all of the 2^m values

f(0), f(1), ..., f(2^m − 1)


from |Ψ⟩. Imagine, for instance, performing a measurement on the first register
of |Ψ⟩. Quantum mechanics enables us to infer several facts. Since each value x
appears with the same complex amplitude in the first register of the state |Ψ⟩, the
outcome of the measurement is equiprobable and can be any value ranging from 0 to
2^m − 1. Assuming that the result of the measurement is |j⟩, the post-measurement
state of the two registers (i.e. the state of the registers after the measurement) is

|Ψ'⟩ = |j⟩ ⊗ |f(j)⟩.


Thus a subsequent measurement on the second register would yield with certainty
the result |f(j)⟩, and no additional information about f can be gained.

Nielsen and Chuang [123] use a different notation. The initial state is assumed to
be of the form
|d⟩ ⊗ |P⟩
where |d⟩ is the state of the m-qubit data register, and |P⟩ is a state of the n-qubit
program register. The two registers are not entangled. The dynamics of the gate
array is given by
|d⟩ ⊗ |P⟩ → G(|d⟩ ⊗ |P⟩)
where G is a unitary operator. This operation is implemented by some fixed quan-
tum gate array. A unitary operator U, acting on m qubits, is said to be implemented
by this gate array if there exists a state |P_U⟩ of the program register such that

G(|d⟩ ⊗ |P_U⟩) = (U|d⟩) ⊗ |P_U'⟩

for all states |d⟩ of the data register, and some state |P_U'⟩ of the program register.
To see that |P_U'⟩ does not depend on |d⟩, suppose that

G(|d_1⟩ ⊗ |P⟩) = (U|d_1⟩) ⊗ |P_1'⟩

and

G(|d_2⟩ ⊗ |P⟩) = (U|d_2⟩) ⊗ |P_2'⟩.



Taking the inner product of these two equations, we find that

⟨P_1'|P_2'⟩ = 1,

provided ⟨d_1|d_2⟩ ≠ 0. Thus
|P_1'⟩ = |P_2'⟩
and therefore there is no |d⟩ dependence of |P_U'⟩. The case ⟨d_1|d_2⟩ = 0 follows
by similar reasoning. Nielsen and Chuang [123] show how to construct quantum
gate arrays that can be programmed to perform different unitary operations on a
data register, depending on the input to some program register. Furthermore, they
show that a universal quantum register gate array - a gate array which can be
programmed to perform any unitary operation - exists only if one allows the gate
array to operate in a probabilistic fashion. Since the number of possible unitary
operations on m qubits is infinite, it follows that a universal gate array would require
an infinite number of qubits in the program register, and thus no such array exists.

Suppose distinct (up to a global phase) unitary operators U_1, ..., U_N are implemented
by some programmable quantum gate array. Nielsen and Chuang [123] showed that
the program register is then at least N dimensional, that is, contains at least log_2 N
qubits. Moreover, the corresponding programs |P_1⟩, ..., |P_N⟩ are mutually orthog-
onal. A deterministic programmable gate array must have as many Hilbert space
dimensions in the program register as the number of programs implemented.

17.5 Garbage Disposal


In performing a calculation, an algorithm may use a number of temporary registers
for storing intermediate results. Since operations in quantum computing involve
unitary (reversible) operations, these temporary registers cannot be forced into
some initial state independent of the register contents. The unitarity of operations
does however provide a mechanism to return temporary registers to some initial
state.

Suppose an algorithm requires the application of the sequence of unitary operators

U_1, U_2, ..., U_n

where each operator U_i places the result in register i (of appropriate size). The final
result is placed in register n. We assume each register i is in an initial state |0⟩_i.
The register indicated by 0 is in the initial state |a⟩ which serves as a parameter to
the algorithm. Each U_i successively places the result f_i(a) of the computation in
register i, given the values a, f_1(a), ..., f_{i−1}(a) as parameters. Thus application of
the operators gives

U_n ⋯ U_2 U_1 (|a⟩ ⊗ |0⟩_1 ⊗ ⋯ ⊗ |0⟩_n) = |a⟩ ⊗ |f_1(a)⟩ ⊗ ⋯ ⊗ |f_n(a)⟩.

Since we only require the result f_n(a) we can apply the inverse operations

U†_{n−1}, U†_{n−2}, ..., U†_1

to return the temporary registers to their initial states:

U†_1 U†_2 ⋯ U†_{n−1}(|a⟩ ⊗ |f_1(a)⟩ ⊗ ⋯ ⊗ |f_n(a)⟩) = |a⟩ ⊗ |0⟩_1 ⊗ ⋯ ⊗ |0⟩_{n−1} ⊗ |f_n(a)⟩.
This can be understood by examining the register content after each unitary oper-
ation

U_1(|a⟩ ⊗ |0⟩_1 ⊗ ⋯ ⊗ |0⟩_n) = |a⟩ ⊗ |f_1(a)⟩ ⊗ |0⟩_2 ⊗ ⋯ ⊗ |0⟩_n

U_2(|a⟩ ⊗ |f_1(a)⟩ ⊗ |0⟩_2 ⊗ ⋯ ⊗ |0⟩_n) = |a⟩ ⊗ |f_1(a)⟩ ⊗ |f_2(a)⟩ ⊗ |0⟩_3 ⊗ ⋯ ⊗ |0⟩_n

Each step depends only on previously calculated values, thus reversing each compu-
tation from U_{n−1} to U_1 does not destroy the final result. This method of regaining
the use of temporary registers is termed garbage disposal, as it eliminates "garbage"
in temporary registers which is no longer useful.
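
Because the operators involved act as permutations on basis states, the compute-copy-uncompute pattern can already be imitated with classical bits. The toy sketch below (ours; the register layout is hypothetical) computes an intermediate value with a Toffoli-style update, copies the result with a CNOT-style update, and then reapplies the self-inverse first step to restore the temporary register to 0.

// Illustrative sketch: compute, copy, then uncompute the temporary.
#include <iostream>
using namespace std;

int main()
{
   int a0 = 1, a1 = 1;   // input register |a>
   int r1 = 0;           // temporary register, initially |0>
   int r2 = 0;           // result register, initially |0>
   r1 ^= a0 & a1;        // U_1 (Toffoli-style): r1 = a0 AND a1
   r2 ^= r1;             // U_2 (CNOT-style): copy the result
   r1 ^= a0 & a1;        // U_1 is self-inverse: clears r1 again
   cout << "r1 = " << r1 << " (restored), r2 = " << r2 << endl;
   return 0;
}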

17.6 Quantum Copying


In this section we consider the problems associated with duplicating information
in quantum computing. Copying is an extensively used operation in classical al-
gorithms. In this section we show that copying of arbitrary states in quantum
computing is not possible, and then consider limited copying processes.

Theorem. Given an arbitrary state |ψ⟩, no unitary matrix U exists such that
U|ψ, 0⟩ = |ψ, ψ⟩.

Proof. Suppose U does exist. Then for a state |a⟩ and a different state |b⟩ we have

U|a, 0⟩ = |a, a⟩,   U|b, 0⟩ = |b, b⟩.

Now
U((|a⟩ + |b⟩) ⊗ |0⟩) = (|a⟩ + |b⟩) ⊗ (|a⟩ + |b⟩)
and
U((|a⟩ + |b⟩) ⊗ |0⟩) = |a⟩ ⊗ |a⟩ + |b⟩ ⊗ |b⟩.
This is a contradiction since in general

|a⟩ ⊗ |a⟩ + |b⟩ ⊗ |b⟩ ≠ |a⟩ ⊗ |a⟩ + |a⟩ ⊗ |b⟩ + |b⟩ ⊗ |a⟩ + |b⟩ ⊗ |b⟩.

This is called the no-cloning theorem. It means that in general it is impossible to
make an exact copy of qubits and quantum registers. However, it is simple to copy
a quantum register holding purely classical data (i.e. data not in a superposition)
with a CNOT gate for every qubit in the register.
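
The following sketch (ours, plain C++ with real amplitudes for brevity) makes both halves of this remark concrete: a CNOT with a blank target copies each basis state, but applied to the superposition (a|0⟩ + b|1⟩) ⊗ |0⟩ it produces the entangled state a|00⟩ + b|11⟩ rather than the product state that cloning would require.

// Illustrative sketch: CNOT copies basis states, not superpositions.
#include <iostream>
using namespace std;

int main()
{
   double a = 0.6, b = 0.8;              // a|0> + b|1>, a^2 + b^2 = 1
   double psi[4] = { a, 0.0, b, 0.0 };   // (a|0> + b|1>) (x) |0>
   double out[4];
   // CNOT: |00>->|00>, |01>->|01>, |10>->|11>, |11>->|10>
   out[0] = psi[0]; out[1] = psi[1];
   out[2] = psi[3]; out[3] = psi[2];
   // out = a|00> + b|11>, entangled unless a = 0 or b = 0
   for(int k=0;k<4;k++) cout << out[k] << " ";
   cout << endl;
   return 0;
}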

Mozyrsky et al. [122] derived a Hamilton operator for copying the basis up and
down states of a quantum two-state system (a qubit) onto n copy qubits (n ≥ 1)
initially prepared in the down state. The qubit states are denoted by the quantum
numbers q_j = 0 (down) and q_j = 1 (up), for spin j. The states of the n + 1 spins
will then be expanded in the basis of states |q_1 q_2 ... q_{n+1}⟩.

The copying process imposes the two conditions

|100...0⟩ → |111...1⟩

|000...0⟩ → |000...0⟩


up to possible phase factors. Therefore, a unitary transformation that corresponds
to quantum evolution over the time interval Δt is not unique. Thus the Hamilton
operator is not unique. One chooses a particular transformation that allows analyt-
ical calculation and, for n = 1, yields the controlled-NOT gate. They considered the
following unitary transformation.

U = e^{iβ}|111...1⟩⟨100...0| + e^{ip}|000...0⟩⟨000...0| + e^{iα}|100...0⟩⟨111...1|
    + Σ_{{q_j}} |q_1 q_2 q_3 ... q_{n+1}⟩⟨q_1 q_2 q_3 ... q_{n+1}|.

The sum in the fourth term, over {q_j}, is over all the other quantum states of the
system, i.e., excluding the three states

|111...1⟩, |100...0⟩, |000...0⟩.

The first two terms accomplish the desired copying transformation. The third term
is needed for unitarity since the quantum evolution is reversible. General phase
factors are allowed in these terms. Thus

U|000...0⟩ = e^{ip}|000...0⟩

U|100...0⟩ = e^{iβ}|111...1⟩.

To calculate the Hamilton operator Ĥ according to

U = exp(−iĤΔt/ħ)

we diagonalize the unitary matrix U. The diagonalization is simple because we only


have to work in the subspace of the three special states

|111...1⟩, |100...0⟩, |000...0⟩.


The part related to the state |000...0⟩ is diagonal. In the subspace labeled by
|111...1⟩, |100...0⟩, |000...0⟩, in that order, the unitary matrix U is represented
by the matrix

( 0        e^{iβ}   0
  e^{iα}   0        0
  0        0        e^{ip} ).

The eigenvalues of U are

λ_{1,2} = ±e^{i(α+β)/2},   λ_3 = e^{ip}.

Thus the eigenvalues of the Hamilton operator in the selected subspace are given by

E_3 = −(ħ/Δt) p − (2πħ/Δt) N_3

with N_3 an integer.

Universal optimum cloning [33, 35] is an attempt to provide the best copy of an
arbitrary quantum state given the constraints of quantum mechanics. The specific
constraints for the copy operation are as follows.
1. The density operators of the source and destination states must be identical
   after the copy operation.
2. All pure states should copy equally well. This can be implemented, for ex-
ample, by requiring that, for a certain distance measure, the copied state is
always a fixed distance d from the original pure state.

3. The distance between the state to be copied and the copy must be a minimum.
The distance between the original state before and after copying must also be
minimized.
Using the Bures distance for density operators, Buzek and Hillery [35] found that
the following transformation satisfies the given constraints:

U_QCM(|j⟩ ⊗ |0⟩ ⊗ |q⟩) := √(2/3) |j⟩ ⊗ |j⟩ ⊗ |a_j⟩ + √(1/6) (|0⟩ ⊗ |1⟩ + |1⟩ ⊗ |0⟩) ⊗ |a_{1−j}⟩

where |q⟩ is the initial state of the ancillary system used in the copying process and
|a_0⟩ and |a_1⟩ are orthonormal states in the Hilbert space of the ancillary system.
Using a slightly different approach Bruß et al. [33] found the same transformation.

17.7 Example Programs


In the following we give an implementation of the decomposition of a non-entangled
state [165] as described earlier in the chapter.

Let us assume the state |ψ⟩ ∈ C^2 ⊗ C^2 is not entangled, i.e. u_1 u_4 = u_2 u_3. The C++
program decompose.cpp will calculate the decomposition into |x⟩ and |y⟩ using
(17.18)-(17.21), assuming |ψ⟩ is normalized. We use a two-dimensional array of
data type double to represent the state |ψ⟩, where an array of two double variables
represents the real and imaginary parts of each complex number. Owing to the
numerical calculation of |x⟩ and |y⟩ these states can contain small rounding errors.

// decompose.cpp

#include <iostream>
#include <cmath>
using namespace std;

void factor(double x[2][2],double y[2][2],double u[4][2])
{
   double x1n,x2n,y1n,y2n,u1s,u2s,u3s,u4s;

   u1s = u[0][0]*u[0][0]+u[0][1]*u[0][1];
   u2s = u[1][0]*u[1][0]+u[1][1]*u[1][1];
   u3s = u[2][0]*u[2][0]+u[2][1]*u[2][1];
   u4s = u[3][0]*u[3][0]+u[3][1]*u[3][1];

   x1n = sqrt(u1s+u2s); x2n = sqrt(u3s+u4s);
   y1n = sqrt(u1s+u3s); y2n = sqrt(u2s+u4s);

   double au1,au2,au3,a4;

   if(u1s==0.0) au1 = 0.0;
   else au1 = acos(u[0][0]/sqrt(u1s));
   if(u2s==0.0) au2 = 0.0;
   else au2 = acos(u[1][0]/sqrt(u2s));
   if(u3s==0.0) au3 = 0.0;
   else au3 = acos(u[2][0]/sqrt(u3s));
   a4 = au3-au1;

   x[0][0] = x1n;          x[0][1] = 0.0;
   x[1][0] = x2n*cos(a4);  x[1][1] = x2n*sin(a4);
   y[0][0] = y1n*cos(au1); y[0][1] = y1n*sin(au1);
   y[1][0] = y2n*cos(au2); y[1][1] = y2n*sin(au2);
}

void displayfactor(double u[4][2])
{
   int i;
   double x[2][2],y[2][2];

   cout << "u = ( ";
   for(i=0;i<4;i++) cout << u[i][0] << "+" << u[i][1] << "i ";
   cout << ")" << endl;
   factor(x,y,u);
   cout << "x = ( " << x[0][0] << "+" << x[0][1] << "i "
        << x[1][0] << "+" << x[1][1] << "i )" << endl;
   cout << "y = ( " << y[0][0] << "+" << y[0][1] << "i "
        << y[1][0] << "+" << y[1][1] << "i )" << endl << endl;
}

// u[4] represents 4 complex numbers
// where u[j][0] is the real part of u[j]
// and u[j][1] is the imaginary part of u[j]
int main()
{
   double u[4][2];
   // ( 0.5 0.5 0.5 0.5 )
   u[0][0] = 0.5; u[0][1] = 0.0; u[1][0] = 0.5; u[1][1] = 0.0;
   u[2][0] = 0.5; u[2][1] = 0.0; u[3][0] = 0.5; u[3][1] = 0.0;
   displayfactor(u);
   // ( 0.5 -0.5 -0.5 0.5 )
   u[0][0] = 0.5;  u[0][1] = 0.0; u[1][0] = -0.5; u[1][1] = 0.0;
   u[2][0] = -0.5; u[2][1] = 0.0; u[3][0] = 0.5;  u[3][1] = 0.0;
   displayfactor(u);
   // i is equivalent to (0,1) in the implementation
   // ( i/sqrt(2) 1/sqrt(2) 0 0 )
   u[0][0] = 0.0; u[0][1] = 1.0/sqrt(2.0);
   u[1][0] = 1.0/sqrt(2.0); u[1][1] = 0.0;
   u[2][0] = 0.0; u[2][1] = 0.0;
   u[3][0] = 0.0; u[3][1] = 0.0;
   displayfactor(u);
   // ( 0.7 0.3 2.1 0.9 ) once normalized
   double size = sqrt(0.7*0.7 + 0.3*0.3 + 2.1*2.1 + 0.9*0.9);
   u[0][0] = 0.7/size; u[0][1] = 0.0;
   u[1][0] = 0.3/size; u[1][1] = 0.0;
   u[2][0] = 2.1/size; u[2][1] = 0.0;
   u[3][0] = 0.9/size; u[3][1] = 0.0;
   displayfactor(u);
   return 0;
}

We consider now three applications for quantum computing [166] and give the sim-
ulation using SymbolicC++ [169]. First we show how entangled states can be gen-
erated from unentangled states using unitary transformations. The quantum circuit
is also given. Next we consider a quantum circuit for swapping two bits. The third
application deals with teleportation [17, 28]. Finally, we consider the Greenberger-
Horne-Zeilinger state [96]. Then we provide the SymbolicC++ [169] implementation
of these applications.

In our first example we start from the standard basis (unentangled states) in the
Hilbert space C^4 and transform them into the Bell states. The Bell states are
defined as

Φ⁺ := (1/√2)(|00⟩ + |11⟩) ≡ (1/√2)(1, 0, 0, 1)^T,
Φ⁻ := (1/√2)(|00⟩ − |11⟩) ≡ (1/√2)(1, 0, 0, −1)^T,
Ψ⁺ := (1/√2)(|01⟩ + |10⟩) ≡ (1/√2)(0, 1, 1, 0)^T,
Ψ⁻ := (1/√2)(|01⟩ − |10⟩) ≡ (1/√2)(0, 1, −1, 0)^T.

The Bell states also form a basis in C^4. They are entangled. Entangled states
exhibit nonlocal correlations. This means that two entangled systems which have
interacted in the past and are no longer interacting still show correlations. These
correlations are used for example in dense coding and quantum error-correction
techniques [162, 17]. To transform the standard basis into the Bell states we apply
the following two unitary transformations. The first unitary transformation is given
by

U_H ⊗ I_2

where U_H is the Hadamard matrix and I_2 is the 2 × 2 unit matrix. Our second
unitary transformation is U_XOR.

Applying these two unitary matrices to the states |00⟩, |01⟩, |10⟩ and |11⟩ we find

U_XOR(U_H ⊗ I_2)|00⟩ = Φ⁺,   U_XOR(U_H ⊗ I_2)|01⟩ = Ψ⁺,

U_XOR(U_H ⊗ I_2)|10⟩ = Φ⁻,   U_XOR(U_H ⊗ I_2)|11⟩ = Ψ⁻.



These operations can be represented by the quantum circuit

Figure 17.6: Quantum Circuit to Generate Bell States

As our second example we consider the swapping of a pair of bits. The circuit for
swapping a pair of bits is given by

Figure 17.7: Quantum Circuit to Swap a Pair of Bits

This circuit is represented by the product of three permutation matrices. Multiplying
them out we find the permutation matrix

U_EXCH =
( 1 0 0 0
  0 0 1 0
  0 1 0 0
  0 0 0 1 ).

This permutation matrix cannot be represented as the Kronecker product of 2 × 2
matrices.

Finally we consider the Greenberger-Horne-Zeilinger (GHZ) state [96]. This state is
an entangled superposition of three qubits and is given by

|Ψ⟩_GHZ := (1/√2)(|000⟩ + |111⟩) ≡ (1/√2)(|0⟩ ⊗ |0⟩ ⊗ |0⟩ + |1⟩ ⊗ |1⟩ ⊗ |1⟩).

Thus in the Hilbert space C^8 we have

|Ψ⟩_GHZ = (1/√2)(1, 0, 0, 0, 0, 0, 0, 1)^T

where T stands for transpose. If we consider 000 and 111 to be the binary represen-
tations of "0" and "7", respectively, the GHZ state simply represents the coherent

superposition (1/√2)(|"0"⟩ + |"7"⟩). In this state all three qubits are either 0 or 1
but none of the qubits has a well-defined value of its own. Measurement of any one
qubit will immediately result in the other two qubits attaining the same value. For
example, measuring the first qubit with result 1 leaves the system in the state |111⟩.

The implementation in SymbolicC++ [169] is as follows. The Matrix class of Sym-
bolicC++ includes the method kron for the Kronecker product of two matrices and
the method dsum for the direct sum of two matrices. The overloaded operators *
and + are used for matrix multiplication and addition. The identity matrix is also
implemented. Thus the code for the three quantum circuits is as follows.

// qthree.cpp

#include <iostream>
#include <cassert>
#include "Vector.h"
#include "Matrix.h"
#include "Rational.h"
#include "Msymbol.h"
using namespace std;

typedef Sum<Rational<int> > C;

template <class T> Vector<T> Hadamard(Vector<T> v)
{
   assert(v.length() == 2);
   Matrix<T> H(2,2);
   H[0][0] = T(1)/sqrt(T(2)); H[0][1] = T(1)/sqrt(T(2));
   H[1][0] = T(1)/sqrt(T(2)); H[1][1] = T(-1)/sqrt(T(2));
   return (H*v);
}

template <class T> Vector<T> XOR(Vector<T> v)
{
   assert(v.length() == 4);
   Matrix<T> X(4,4);
   X[0][0] = T(1); X[0][1] = T(0); X[0][2] = T(0); X[0][3] = T(0);
   X[1][0] = T(0); X[1][1] = T(1); X[1][2] = T(0); X[1][3] = T(0);
   X[2][0] = T(0); X[2][1] = T(0); X[2][2] = T(0); X[2][3] = T(1);
   X[3][0] = T(0); X[3][1] = T(0); X[3][2] = T(1); X[3][3] = T(0);
   return (X*v);
}

template <class T> Vector<T> Bell(Vector<T> v)
{
   assert(v.length() == 4);

   Matrix<T> I(2,2),H(2,2),X(4,4);
   I.identity();

   H[0][0] = T(1)/sqrt(T(2)); H[0][1] = T(1)/sqrt(T(2));
   H[1][0] = T(1)/sqrt(T(2)); H[1][1] = T(-1)/sqrt(T(2));

   Matrix<T> UH = kron(H,I);

   X[0][0] = T(1); X[0][1] = T(0); X[0][2] = T(0); X[0][3] = T(0);
   X[1][0] = T(0); X[1][1] = T(1); X[1][2] = T(0); X[1][3] = T(0);
   X[2][0] = T(0); X[2][1] = T(0); X[2][2] = T(0); X[2][3] = T(1);
   X[3][0] = T(0); X[3][1] = T(0); X[3][2] = T(1); X[3][3] = T(0);

   return (X*(UH*v));
}

template <class T> Vector<T> Swap(Vector<T> v)
{
   assert(v.length() == 4);
   Matrix<T> S(4,4);
   S[0][0] = T(1); S[0][1] = T(0); S[0][2] = T(0); S[0][3] = T(0);
   S[1][0] = T(0); S[1][1] = T(0); S[1][2] = T(0); S[1][3] = T(1);
   S[2][0] = T(0); S[2][1] = T(0); S[2][2] = T(1); S[2][3] = T(0);
   S[3][0] = T(0); S[3][1] = T(1); S[3][2] = T(0); S[3][3] = T(0);
   return XOR(S*XOR(v));
}

template <class T> Vector<T> Teleport(Vector<T> v)
{
   int i;
   assert(v.length() == 8);
   Vector<T> result;
   Matrix<T> NOT(2,2),H(2,2),I(2,2),X(4,4);

   NOT[0][0] = T(0); NOT[0][1] = T(1);
   NOT[1][0] = T(1); NOT[1][1] = T(0);

   H[0][0] = T(1)/sqrt(T(2)); H[0][1] = T(1)/sqrt(T(2));
   H[1][0] = T(1)/sqrt(T(2)); H[1][1] = T(-1)/sqrt(T(2));

   I.identity();

   X[0][0] = T(1); X[0][1] = T(0); X[0][2] = T(0); X[0][3] = T(0);
   X[1][0] = T(0); X[1][1] = T(1); X[1][2] = T(0); X[1][3] = T(0);
   X[2][0] = T(0); X[2][1] = T(0); X[2][2] = T(0); X[2][3] = T(1);
   X[3][0] = T(0); X[3][1] = T(0); X[3][2] = T(1); X[3][3] = T(0);

   Matrix<T> U1 = kron(I,kron(H,I));
   Matrix<T> U2 = kron(I,X);
   Matrix<T> U3 = kron(X,I);
   Matrix<T> U4 = kron(H,kron(I,I));
   Matrix<T> U5 = kron(I,X);
   Matrix<T> U6 = kron(I,kron(I,H));
   Matrix<T> U7 = dsum(I,dsum(I,dsum(NOT,NOT)));
   Matrix<T> U8 = kron(I,kron(I,H));

   result = U8*(U7*(U6*(U5*(U4*(U3*(U2*(U1*v)))))));
   for(i=0;i<8;i++)
   {
      while(result[i].put(power(sqrt(T(2)),-6),power(T(2),-3)));
      while(result[i].put(power(sqrt(T(2)),-4),power(T(2),-2)));
      while(result[i].put(power(sqrt(T(2)),-2),power(T(2),-1)));
   }
   return result;
}

// The outcome after measuring value for qubit.
// Since the probabilities may be symbolic this function
// cannot simulate a measurement where random outcomes
// have the correct distribution
template <class T>
Vector<T> Measure(Vector<T> v,unsigned int qubit,unsigned int value)
{
   assert(pow(2,qubit)<v.length());
   assert(value==0 || value==1);
   int i,len,skip = 1-value;
   Vector<T> result(v);
   T D = T(0);

   len = v.length()/int(pow(2,qubit+1));
   for(i=0;i<v.length();i++)
   {
      if(!(i%len)) skip = 1-skip;
      if(skip) result[i] = T(0);
      else D += result[i]*result[i];
   }
   result /= sqrt(D);
   return result;
}

// for output clarity
ostream &print(ostream &o,Vector<C> v)
{
   const char *b2[2] = {"|0>","|1>"};
   const char *b4[4] = {"|00>","|01>","|10>","|11>"};
   const char *b8[8] = {"|000>","|001>","|010>","|011>",
                        "|100>","|101>","|110>","|111>"};
   const char **b;
   int i;

   if(v.length()==2) b = b2;
   if(v.length()==4) b = b4;
   if(v.length()==8) b = b8;

   for(i=0;i<v.length();i++)
      if(!v[i].is_Number() || v[i].nvalue()!=C(0))
         o << "+(" << v[i] << ")" << b[i];
   return o;
}

int main(void)
{
   Vector<C> zero(2),one(2);
   Vector<C> zz(4),zo(4),oz(4),oo(4),qreg;
   Vector<C> tp00,tp01,tp10,tp11,psiGHZ;
   Sum<Rational<int> > a("a",0),b("b",0);
   int i;

   zero[0] = C(1); zero[1] = C(0);
   one[0]  = C(0); one[1]  = C(1);
   zz = kron(Matrix<C>(zero),Matrix<C>(zero))(0);
   zo = kron(Matrix<C>(zero),Matrix<C>(one))(0);
   oz = kron(Matrix<C>(one),Matrix<C>(zero))(0);
   oo = kron(Matrix<C>(one),Matrix<C>(one))(0);

   cout << "UH|0>    = "; print(cout,Hadamard(zero)) << endl;
   cout << "UH|1>    = "; print(cout,Hadamard(one)) << endl;
   cout << endl;
   cout << "UXOR|00> = "; print(cout,XOR(zz)) << endl;
   cout << "UXOR|01> = "; print(cout,XOR(zo)) << endl;
   cout << "UXOR|10> = "; print(cout,XOR(oz)) << endl;
   cout << "UXOR|11> = "; print(cout,XOR(oo)) << endl;
   cout << endl;
   cout << "UBELL|00> = "; print(cout,Bell(zz)) << endl;
   cout << "UBELL|01> = "; print(cout,Bell(zo)) << endl;
   cout << "UBELL|10> = "; print(cout,Bell(oz)) << endl;
   cout << "UBELL|11> = "; print(cout,Bell(oo)) << endl;
   cout << endl;
   cout << "USWAP|00> = "; print(cout,Swap(zz)) << endl;

cout « "USWAPI01> = II., print (cout,Swap(zo» « endl;


cout « "USWAP 110> = II., print (cout,Swap(oz» « endl;
cout « "USWAP 111> = II., print (cout,Swap(oo» « endl;
cout « endl;

qreg=kron(a*zero+b*one,kron(zero,zero»(0);
cout « "UTELEPORT("; print(cout,qreg) « ") = ";
print(cout,qreg=Teleport(qreg» « endl;
cout « "Results after measurement of first 2 qubits:" « endl;
tpOO = Measure(Measure(qreg,O,O),l,O);
tpOl = Measure(Measure(qreg,O,O),l,l);
tpl0 = Measure(Measure(qreg,O,l),l,O);
tpll = Measure(Measure(qreg,O,l),l,l);
for(i=0;i<8;i++)
{
while(tpOO[i].put(a*a,C(l)-b*b»;
while(tpOO[i].put(power(sqrt(C(1)/C(2»,-2),C(2»);
while(tpOl[i].put(a*a,C(l)-b*b»;
while(tpOl[i].put(power(sqrt(C(1)/C(2»,-2),C(2»);
while(tpl0[i].put(a*a,C(1)-b*b»;
while(tpl0[i] .put(power(sqrt(C(1)/C(2»,-2),C(2»);
while(tpll[i].put(a*a,C(l)-b*b»;
while(tpll[i].put(power(sqrt(C(1)/C(2»,-2),C(2»);
}
cout « 100>
II print (cout,tpOO) « endl;
cout « 101>
II print (cout, tpOi) « endl;
cout « II 110> print (cout,tpl0) « endl;
cout « II 111> print (cout, tpll) « endl;
cout « endl;

psiGHZ=(kron(Matrix<C>(zz),Matrix<C>(zero»/sqrt(C(2»
+kron(Matrix<C>(oo),Matrix<C>(one»/sqrt(C(2»)(0);
cout « "Greenberger-Horne-Zeilinger state: ";
print(cout,psiGHZ) « endl;
cout « "Measuring qubit 0 as 1 yields: ";
print(cout,Measure(psiGHZ,O,l» «endl;
cout « "Measuring qubit 1 as 1 yields: ";
print(cout,Measure(psiGHZ,l,l» «endl;
cout « "Measuring qubit 2 as 0 yields: ";
print(cout,Measure(psiGHZ,2,0» «endl;
}

The program generates the following output:

UH|0>    = +(sqrt(2)^(-1))|0>+(sqrt(2)^(-1))|1>
UH|1>    = +(sqrt(2)^(-1))|0>+(-sqrt(2)^(-1))|1>

UXOR|00> = +(1)|00>
UXOR|01> = +(1)|01>
UXOR|10> = +(1)|11>
UXOR|11> = +(1)|10>

UBELL|00> = +(sqrt(2)^(-1))|00>+(sqrt(2)^(-1))|11>
UBELL|01> = +(sqrt(2)^(-1))|01>+(sqrt(2)^(-1))|10>
UBELL|10> = +(sqrt(2)^(-1))|00>+(-sqrt(2)^(-1))|11>
UBELL|11> = +(sqrt(2)^(-1))|01>+(-sqrt(2)^(-1))|10>

USWAP|00> = +(1)|00>
USWAP|01> = +(1)|10>
USWAP|10> = +(1)|01>
USWAP|11> = +(1)|11>

UTELEPORT(+(a)|000>+(b)|100>) = +(1/2*a)|000>+(1/2*b)|001>
                                +(1/2*a)|010>+(1/2*b)|011>
                                +(1/2*a)|100>+(1/2*b)|101>
                                +(1/2*a)|110>+(1/2*b)|111>
Results after measurement of first 2 qubits:
|00> +(a)|000>+(b)|001>
|01> +(a)|010>+(b)|011>
|10> +(a)|100>+(b)|101>
|11> +(a)|110>+(b)|111>

Greenberger-Horne-Zeilinger state: +(sqrt(2)^(-1))|000>
                                   +(sqrt(2)^(-1))|111>
Measuring qubit 0 as 1 yields: +(sqrt(2)^(-1)
*sqrt(sqrt(2)^(-2))^(-1))|111>
Measuring qubit 1 as 1 yields: +(sqrt(2)^(-1)
*sqrt(sqrt(2)^(-2))^(-1))|111>
Measuring qubit 2 as 0 yields: +(sqrt(2)^(-1)
*sqrt(sqrt(2)^(-2))^(-1))|000>

In 1996 Schack and Brun [141] described a powerful C++ library for solving quantum
systems. The core of the library consists of the C++ classes State and Operator, which
represent state vectors and operators in Hilbert space. However the disadvantage
of this C++ library is that the constants (for example the coupling constant in a
Hamilton operator) can only be treated numerically, i.e. it is of data type double.
In SymbolicC++ we can treat constants either symbolically or numerically. Using
the method set we can switch from a symbolic representation of a constant to a
numeric representation. Using the approach of Schack and Brun it is also difficult to
construct the CNOT operator on any two qubits of a state. In 1995, a year before the
paper of Schack and Brun [141], Steeb [159] described a computer algebra package
based on Reduce and Lisp that can handle Bose, Fermi and coupled Bose-Fermi
systems. Since spin operators can be expressed with Fermi operators, the package

can also deal with spin-systems. It also has the advantage that constants can be
treated either numerically or symbolically.

Two other simulations are described by Omer [124] and Pritzker [134]. Both are
implemented in C++. However, these implementations can also only use numeric
representations and not symbolic representations. None of them implement the
Kronecker product and direct sum to aid the construction of operators such as we
have used for the simulation of teleportation.

The OpenQubit simulation [134] implements the classes QState, which represents
the state of the entire quantum computer, and QRegister, which refers to specific
qubits from QState to be used as a quantum register. Further support for quan-
tum algorithms is provided by four operators denoted by Rx, Ry, Ph and CNot
which are rotations, phase changes and the controlled NOT gate. The implementa-
tion supports the simulation of measurement. Shor's factoring algorithm has been
successfully implemented using this system.

The simulation described by Omer [124] attempts to reduce the requirements on


the classical computer used for simulation by reducing the storage requirements for
states, using a bitvector, by never storing zero amplitudes for states, and by
using other techniques such as hashing. The class quBaseState represents the
state of the quantum computer. The class quSubState references qubits from a
quBaseState, and provides access to the registers of the quantum computer. The
system provides a number of methods to describe operators. The class opMatrix
represents an arbitrary 2^n x 2^n matrix. The class opEmbedded is used to describe
operators which are applied to subspaces of the quantum system, and the class
opPermutation is used for the permutation operators. The system provides op-
erators for the identity, arbitrary single qubit transformations, the qubit swapping
operation, controlled NOT, the Toffoli gate, and phase change amongst others.
Shor's algorithm has also been illustrated in this system.
Chapter 18
Measurement and Quantum States

18.1 Introduction
The interpretation of measurements in quantum mechanics is still under discussion
(Healey [84], Bell [9], Redhead [136]). Besides the Copenhagen interpretation we
have the many-worlds interpretations (Everett interpretations), the modal interpre-
tations, the decoherence interpretations, the interpretations in terms of (nonlocal)
hidden variables, and the quantum logical interpretations.

A satisfactory interpretation of quantum mechanics would involve several things.


It would provide a way of understanding the central notions of the theory which
permits a clear and exact statement of its key principles. It would include a demon-
stration that, with this understanding, quantum mechanics is a consistent, empiri-
cally adequate, and explanatorily powerful theory. And it would give a convincing
and natural resolution of the paradoxes. A satisfactory interpretation of quantum
mechanics should make it clear what the world would be like if quantum mechanics
were true. But this further constraint would not be neutral between different at-
tempted interpretations. There are those, particularly in the Copenhagen tradition,
who would reject this further constraint on the grounds that, in their view, quan-
tum mechanics should not be taken to describe (microscopic) reality, but only our
intersubjectively communicable experimental observations of it. It would therefore
be inappropriate to criticize a proposed interpretation solely on the grounds that it
does not meet this last constraint. But this constraint will certainly appeal to philo-
sophical realists, and for them at least it should count in favour of an interpretation
if it meets this constraint.

It is well known that the conceptual foundations of quantum mechanics have been
plagued by a number of paradoxes, or conceptual puzzles, which have attracted a
host of mutually incompatible attempted resolutions - such as that presented by
Schrodinger [143], popularly known as the paradox of Schrodinger's cat, and the
EPR paradox, named after the last initials of its authors, Einstein, Podolsky, and
Rosen [60].


18.2 Measurement Problem


Consider a spin-1/2 particle initially described by a superposition of eigenstates of
S_z, the z component of spin:

|Φ⟩ = c_1|↑⟩ + c_2|↓⟩

where |↑⟩ and |↓⟩ are the up and down eigenstates of S_z and

⟨Φ|Φ⟩ = 1.
Thus

|c_1|^2 + |c_2|^2 = 1.
Let |R =↑⟩ and |R =↓⟩ denote the up and down pointer-reading eigenstates of an
S_z-measuring apparatus. Thus |R =↑⟩ and |R =↓⟩ are eigenstates of the operator
R̂. According to quantum mechanics (with no wave-function collapse), if the appa-
ratus ideally measures the particle, the combined system evolves into an entangled
superposition

|Ψ⟩ = c_1|↑⟩ ⊗ |R =↑⟩ + c_2|↓⟩ ⊗ |R =↓⟩.

Common sense insists that after the measurement, the pointer reading is definite.
According to the orthodox value-assignment rule, however, the pointer reading is
definite only if the quantum state is an eigenstate of

I ⊗ R̂,

the pointer-reading operator, where I is the identity operator. Since |Ψ⟩ is not an
eigenstate of I ⊗ R̂, the pointer reading is indefinite. The interpretations of quantum
mechanics mentioned above attempt to deal with this aspect of the measurement
problem. However their solutions run into a technical difficulty which is called the
basis degeneracy problem.

18.3 Copenhagen Interpretation


In the Copenhagen view, the Born rules explicitly concern the probabilities for
various possible measurement results (Healey [84]). They do not concern possessed
values of dynamical variables. On each system there will always be some dynamical
variables which do not possess precise values. In the Copenhagen interpretation,
the Born rules assign probabilities. They have the form

prob_ψ(A ∈ Ω) = p.
Here p is a real number between zero and one (including those limits), A is a quan-
tum dynamical variable, Ω is a (Borel) set of real numbers, and ψ is a mathematical
representative of an instantaneous quantum state. In quantum state ψ, the proba-
bility of finding that the value of A lies in Ω is p. How is the phrase "of finding" to
be understood? This probability is calculated according to the appropriate quantum
algorithm. For example,

prob_ψ(A ∈ Ω) = ⟨ψ|P_A(Ω)ψ⟩

where ψ is the system's state vector, and P_A(Ω) is the projection operator corre-
sponding to the property A ∈ Ω.
may be legitimately ascribed to a single quantum system (and not just to a large
ensemble of similar systems), but only in certain circumstances. A system does not
always have a quantum state. These circumstances are not universal. Nevertheless,
every quantum system always has a dynamical state. Consequently, there can be
no general identification between a system's quantum state and its dynamical state;
nor is it always true that one determines the other.

The Born rules apply directly to possessed values of quantities, and only deriva-
tively to results of measurements of these quantities. In this view every quantum
dynamical variable always has a precise real value on any quantum system to which
it pertains, and the Born rules simply state the probability for that value to lie in
any given interval. Thus the Born rules assign probabilities to events involving a
quantum system σ of the form "The value of A on σ lies in Ω". A properly con-
ducted measurement of the value of A on σ would find that value in Ω just in case
the value actually lies in Ω.

Since the statement of the Born rules then involves explicit reference to measure-
ment (or observation), to complete the interpretation it is necessary to say what
constitutes a measurement. Proponents of the Copenhagen interpretation have typ-
ically either treated measurement (or observation) or cognates as primitive terms
in quantum mechanics, or else have taken each to refer vaguely to suitable interac-
tions involving a classical system. If measurement remains a primitive term, then
it is natural to interpret it epistemologically as referring to an act of some observer
which, if successful, gives him or her knowledge of some structural feature of a phe-
nomenon. But then, quantum mechanics seems reduced to a tool for predicting

what is likely to be observed in certain (not very precisely specified) circumstances,


with nothing to say about the events in the world which are responsible for the
results of those observations we make, and with no interesting implications for a
world without observers. This instrumentalist/pragmatist conception of quantum
mechanics has often gone along with the Copenhagen interpretation. On the other
hand, if a measurement is a suitable interaction with a classical system, we need to
know what interactions are suitable.
In what we call the weak version of the Copenhagen interpretation, the dynamical
properties of an individual quantum system are fully specified by means of its quan-
tum state. A dynamical variable A possesses a precise real value ai on a system if
and only if that system is describable by a quantum state for which the Born rules
assign probability one to the value ai of A. In that state, a measurement of A would
certainly yield the value ai. In other states, for which there is some chance that
value ai would result if A were measured, and some chance that it would not, it is
denied that A has any precise value prior to an actual measurement. Within the
limits of experimental accuracy, measurement of a dynamical variable always yields
a precise real value as its result, and this raises the question of the significance to be
attributed to this value, given that it is typically not the value the variable possessed
just before the measurement, nor the value it would have had if no measurement had
taken place. Thus the measured variable acquires the measured value as a result of
the measurement. Then the Born rules explicitly concern the probabilities that dy-
namical variables acquire certain values upon measurement. By ascribing a precise
real value to a variable given earlier, one concludes that after a precise measurement
of a dynamical variable, a system is describable by a quantum state for which the
Born rules assign probability one to the measured value of that variable.
Since, in this version, measurement effects significant changes in the dynamical
properties of a system, it is important for a proponent of the interpretation to spec-
ify in just what circumstances such changes occur. One might expect that such a
specification would be forthcoming in purely quantum mechanical terms, through
a quantum mechanical account of measuring interactions. Such an account would
show how a physical interaction between one quantum system and another, which
proceeds wholly in accordance with the principles of quantum mechanics, can effect
a correlation between an initial value of the measured variable on one system (the
object system) and a final recording property on the other (apparatus) system. The
problem of giving such an account has become known as the quantum measurement
problem. A solution to the measurement problem would explain the reference to
measurement in the Born rules in purely physical (quantum mechanical) terms, and
would also show to what extent the projection postulate may be considered a valid
principle of quantum mechanics. The key difficulty may be stated quite simply.
It is that many initial states of an object system give rise to final compound ob-
ject+apparatus quantum states which, in the present interpretation, imply that the
apparatus fails to register any result at all. For, in such a final compound quantum
state, the Born rules do not assign probability one to any recording property of the
apparatus system (Healey [84]).

18.4 Hidden Variable Theories

A motivation behind the construction of such theories has been the belief that some
more complete account of microscopic processes is required than that provided by
quantum mechanics according to the Copenhagen interpretation (Healey [84]). The
general idea has been to construct such an account by introducing additional quan-
tities, over and above the usual quantum dynamical variables (such as de Broglie's
pilot wave, Bohm's quantum potential, or fluctuations in Vigier's random ether),
and additional dynamical laws governing these quantities and their coupling to the
usual quantum variables. The primary object is to permit the construction of a de-
tailed dynamical history of each individual quantum system which would underlie
the statistical predictions of quantum mechanics concerning measurement results.
Though it would be consistent with this aim for such dynamical histories to conform
only to indeterministic laws, it has often been thought preferable to consider in the
first instance deterministic hidden variable theories. A deterministic hidden vari-
able theory would underlie the statistical predictions of quantum mechanics much
as classical mechanics underlies the predictions of classical statistical mechanics. In
both cases, the results of the statistical theory would be recoverable after averaging
over ensembles of individual systems, provided that these ensembles are sufficiently
typical: but the statistical theory would give demonstrably incorrect predictions for
certain atypical ensembles.

Bell [9] showed that no deterministic hidden variable theory can reproduce the
predictions of quantum mechanics for certain composite systems without violating
a principle of locality. This principle is based on basic assumptions concerning the
lack of physical connection between spatially distant components of such systems;
and the impossibility of there being any such connection with the property that a
change in the vicinity of one component should instantaneously produce a change
in the behaviour of the other. Further work attempting to extend Bell's result
to apply to indeterministic hidden variable theories has shown that there may be
a small loophole still open for the construction of such a theory compatible with
the relativistic requirement that no event affects other events outside of its future
light-cone.

Existing hidden variable theories, such as that of Vigier [179], are explicitly nonlocal,
and do involve superluminal propagation of causal influence on individual quantum
systems, although it is held that exploiting such influences to transmit information
superluminally would be extremely difficult, if not impossible. Any superluminal
transmission of causal signals would be explicitly inconsistent with relativity theory.
If this were so, such nonlocal hidden variable theories could be immediately rejected
on this ground alone. Relativity does not explicitly forbid such transmission. Nonlo-
cal hidden variable theories like that of Vigier can conform to the letter of relativity
by introducing a preferred frame, that of the subquantum ether, with respect to
which superluminal propagation is taken to occur. By doing so they avoid the gen-
eration of so-called causal paradoxes. However they violate the spirit of relativity

theory by reintroducing just such a privileged reference frame. The princi-
ple that a fundamental theory can be given a relativistically invariant formulation
seems so fundamental to contemporary physics that no acceptable interpretation of
quantum mechanics should violate it.

A hidden variable theory is a separate and distinct theory from quantum mechanics.
To offer such a theory is not to present an interpretation of quantum mechanics but
to change the subject. One reason is that a hidden variable theory incorporates
quantities additional to the quantum dynamical variables. Another is that hidden
variable theories are held to underlie quantum mechanics in a way similar to that
in which classical mechanics underlies the distinct theory of statistical mechanics.
A final reason is that a hidden variable theory (at least typically) is held to be
empirically equivalent to quantum mechanics only with respect to a restricted range
of conceivable experiments, while leading to conflicting predictions concerning a
range of possible further experiments which may, indeed, be extremely hard to
actualize.

18.5 Everett Interpretation


Everett's interpretation has proven most influential in the development of the present
interactive interpretation (Everett [64], Bell [9], Healey [84]). The Everett interpre-
tation may be regarded as the prototype of all interactive interpretations, since
it was the earliest and most influential attempt to treat measurement as a physi-
cal interaction internal to a compound quantum system, one component of which
represents the observer or measuring apparatus. The Everett interpretation, like
the present interactive interpretation, rejects the projection postulate. Both inter-
pretations maintain that all interactions, including measurement interactions, may
be treated as internal to a compound system, the universe, whose state evolves al-
ways in accordance with a deterministic law such as the time-dependent Schrodinger
equation. Both deny that it is necessary to appeal to any extra quantum-mechanical
notions such as that of a classical system, or an observer, in order to give a precise
and empirically adequate quantum mechanical model of a measurement interaction.
Finally, both interpretations undertake to explain how, and to what extent, quan-
tum interactions internal to a compound system can come to mimic the effects of
the projection postulate.

According to Everett, all observers correspond to quantum systems, which may


be called, for convenience, apparatus systems. An observation or measurement is
simply a quantum interaction of a certain type between an apparatus system α and
an object system σ, which (provided this compound system is isolated) proceeds in
accordance with the time-dependent Schrödinger equation governed by the Hamilton
operator for the pair of systems concerned. In particular, for a good observation
of a dynamical variable A whose associated self-adjoint operator Â has a complete
set of eigenvectors {|φ_i⟩}, the interaction Hamilton operator is such that the joint

quantum state immediately after the conclusion of the interaction is related to the
initial state as follows:

|ψ^{σ⊕α}⟩ = |φ_i^σ⟩ ⊗ |ψ^α⟩ → |ψ'^{σ⊕α}⟩ = |φ_i^σ⟩ ⊗ |ψ_{[...,a_i]}^α⟩

for each eigenvector |φ_i^σ⟩ of Â, where the |ψ_i^α⟩ are orthonormal vectors and [a_i]
stands for a recording of the eigenvalue a_i of A. The dots indicate that results of
earlier good observations may also be recorded in the state of α. It follows from the
linearity of the Schrödinger equation that an arbitrary normalized initial object
system quantum state Σ_i c_i|φ_i^σ⟩ with

Σ_i |c_i|^2 = 1

gives rise to the following transformation:

(Σ_i c_i|φ_i^σ⟩) ⊗ |ψ^α⟩ → Σ_i c_i|φ_i^σ⟩ ⊗ |ψ_{[...,a_i]}^α⟩.

Each component |φ_i^σ⟩ ⊗ |ψ_{[...,a_i]}^α⟩ with nonzero coefficient c_i in the superposition on
the right-hand side corresponds to a distinct state in which the observer has recorded
the ith eigenvalue for the measured quantity on the object system, while the object
system remains in the corresponding eigenstate |φ_i^σ⟩. Moreover, all these states are
equally real. Every possible result is recorded in some observer state |ψ_{[...,a_i]}^α⟩, and
there is no unique actual result. For a sequence of good observations by a single
observer, consisting of multiple pairwise interactions between the apparatus system
and each member of a set of object systems, Everett is able to show the following. If a
good observation is repeated on a single object system in circumstances in which that
system remains undisturbed in the intervening interval (in the sense that the total
Hamilton operator commutes with the operator representing the observed quantity),
then the eigenvalues recorded for the two observations are the same, in every observer
state. This is exactly what would be predicted by an observer who represents each
object system independently by a quantum state vector and regarded the first of each
sequence of repeated measurements on it as projecting the relevant object system's
quantum state onto an eigenvector corresponding to the initially recorded eigenvalue.
This is the first respect in which, for each observer, a good observation appears
to obey the projection postulate. Everett shows that each observer will get the
right probabilities for results of arbitrary good observations on a system which has
been subjected to an initial good observation, if, following this initial observation,
one assigns to the system the quantum state it would have had if projection had
then occurred. For the following two probabilities are demonstrably equal: the
probability of result b_j in a subsequent good observation of B on σ by an observer
corresponding to α who applies the projection postulate to the state of σ alone after

an initial good observation of A on σ by α yielding result a_i; and the probability,
assuming that the state of the compound σ ⊕ α evolves according to the Schrödinger
equation, that after the B measurement the observer state of α will record the values
a_i and b_j, conditional on the observer state of α after the A measurement recording
the result a_i of the initial observation. This demonstration explains how, for each
the result ai of the initial observation. This demonstration explains how, for each
observer, it is as if a good observation prepared a corresponding eigenstate of the
observed system. However this still does not suffice to establish that everything is
as if projection actually occurs. There are two further consequences of projection.
If projection really occurred, then each of several independent observers performing
repeated good observations of the same quantity on an otherwise undisturbed system
would necessarily obtain the same result. Moreover, if projection really occurred,
then the state of α immediately after a good observation would be one of the |ψ_i^α⟩. In
this apparatus state the pointer position quantity has its ith eigenvalue, recording
that the observed quantity had its ith eigenvalue on σ. In this apparatus state
the probability is 1 that a subsequent observation of the pointer position quantity
would reveal that it has its ith eigenvalue. Consequently, the result of a subsequent
observation of the pointer position quantity on an undisturbed apparatus system
will reveal that the pointer position quantity had, at the conclusion of the initial
interaction with σ, the value which recorded the result of the measurement on σ.

18.6 Basis Degeneracy Problem


Many-world, decoherence, and modal interpretations of quantum mechanics suffer
from a basis degeneracy problem arising from the nonuniqueness of some biorthog-
onal decompositions. According to the biorthogonal decomposition theorem, any
quantum state vector describing two systems can, for a certain choice of bases, be
expanded in the simple form

|Ψ⟩ = Σ_i c_i|A_i⟩ ⊗ |B_i⟩

where the {|A_i⟩} and {|B_i⟩} vectors are orthonormal, and are therefore eigenstates
of self-adjoint operators (observables) A and B associated with systems 1 and 2,
respectively. This biorthogonal expansion picks out the Schmidt basis. The basis
degeneracy problem arises because the biorthogonal decomposition is unique just in
case all of the nonzero |c_i| are different. When |c_1| = |c_2|, we can biorthogonally
expand |Ψ⟩ in an infinite number of bases. If c_1 = c_2, then the biorthogonal decomposition of the


apparatus with the particle-environment system is not unique, and therefore gives us
no principled reasoning for singling out the pointer-reading basis. This is the basis
degeneracy problem. The basis degeneracy problem arises in the context of many-
world interpretations. Many-world interpretations [62] address the measurement

problem by hypothesizing that when the combined system occupies state |Ψ⟩, the
two branches of the superposition split into separate worlds, in some sense. The
pointer reading becomes definite relative to its branch. For instance, in the "up"
world, the particle has spin up and the apparatus possesses the corresponding pointer
reading. In this way, many-world interpreters explain why we always see definite
pointer readings, instead of superpositions.

Elby and Bub [62] proved that when a quantum state can be written in the tri-
orthogonal form

|Ψ⟩ = Σ_i c_i|A_i⟩ ⊗ |B_i⟩ ⊗ |C_i⟩

then, even if some of the c_i are equal, no alternative bases exist such that |Ψ⟩ can
be rewritten in this form.
Therefore the triorthogonal decomposition picks out a special basis. This preferred
basis can be used to address the basis degeneracy problem. The tridecompositional
uniqueness theorem provides many-world interpretations, decoherence interpreta-
tions, and modal interpretations with a rigorous solution to the basis degeneracy
problem. Several interpretations of quantum mechanics can make use of this special
basis. For instance, many-world adherents can claim that a branching of worlds
occurs in the preferred basis picked out by the unique triorthogonal decomposition.
Modal interpreters can postulate that the triorthogonal basis helps to pick out which
observables possess definite values at a given time. Decoherence theorists can cite
the uniqueness of the triorthogonal decomposition as a principled reason for assert-
ing that pointer readings become classical upon interacting with the environment.

When the environment interacts with the combined particle-apparatus system the
following state results:

|Ψ⟩ = c_1|↑⟩ ⊗ |R =↑⟩ ⊗ |E_+⟩ + c_2|↓⟩ ⊗ |R =↓⟩ ⊗ |E_−⟩

where |E_±⟩ is the state of the rest of the universe after the environment interacts
with the apparatus. As time passes, these environmental states quickly approach
orthogonality:
⟨E_+|E_−⟩ ≈ 0.
In this limit, we have a triorthogonal decomposition of |Ψ⟩. Even if c_1 = c_2, the
triorthogonal decomposition is unique. In other words, no transformed bases exist
such that |Ψ⟩ can be expanded in an alternative triorthogonal form.

Therefore, |Ψ⟩ picks out a preferred basis. Many-world interpreters can postulate
that this basis determines the branches into which the universe splits. For the proof
we refer to the literature (Elby and Bub [62]).

18.7 Information Theoretic Viewpoint


The classical concept of information can be extended to the von Neumann entropy in
quantum mechanics. From a classical viewpoint entropy can never be negative. By
extending the definitions of conditional, combined and mutual entropy Adami and
Cerf [1, 38, 39] have concluded that negative entropies are needed in the quantum
case. The measurement problem is then explained as follows. Suppose the system
is in a superposition
|ψ⟩ = a|α⟩ + b|β⟩
where |α⟩ and |β⟩ are orthonormal and |a|^2 + |b|^2 = 1.

Now we introduce an ancillary system A to perform the measurement, resulting in
the combined state
|ψ_A⟩ = a|α⟩ ⊗ |α⟩ + b|β⟩ ⊗ |β⟩.
Finally the observer must interact with the system A to observe the measured value:

|ψ_AO⟩ = a|α⟩ ⊗ |α⟩ ⊗ |α⟩ + b|β⟩ ⊗ |β⟩ ⊗ |β⟩.

Since we are only interested in the measurement, the original system is ignored. In the
mathematical representation this involves taking the partial trace of |ψ_AO⟩⟨ψ_AO|
with respect to the original system. This yields the mixed state

ρ_AO = |a|^2 |α⟩⟨α| ⊗ |α⟩⟨α| + |b|^2 |β⟩⟨β| ⊗ |β⟩⟨β|.
Thus the measurement is classically correlated, but the result is random. Further
measurements will retain this correlation giving the observer the illusion of the pro-
jection postulate being satisfied. The mutual information shared with the original
system vanishes, thus no information is obtained about the state of the original
system.
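
The partial trace in this argument can be carried out numerically. The sketch below (ours; it uses ⟨α|β⟩ = 0 to identify |α⟩ and |β⟩ with the computational basis states |0⟩ and |1⟩) traces the original system out of |ψ_AO⟩⟨ψ_AO| and prints the reduced density matrix diag(|a|^2, 0, 0, |b|^2) of the ancilla-observer pair.

// Illustrative sketch: partial trace over the original system.
#include <iostream>
#include <complex>
using namespace std;

int main()
{
   complex<double> a(0.6,0.0), b(0.8,0.0);    // |a|^2 + |b|^2 = 1
   complex<double> psi[8];                    // qubits: system, A, O
   for(int s=0;s<8;s++) psi[s] = 0.0;
   psi[0] = a;                                // a|000>
   psi[7] = b;                                // b|111>
   // rho[r][c] = sum_s psi[4s+r] conj(psi[4s+c]), r,c index A and O
   complex<double> rho[4][4];
   for(int r=0;r<4;r++)
      for(int c=0;c<4;c++)
      {
         rho[r][c] = 0.0;
         for(int s=0;s<2;s++) rho[r][c] += psi[4*s+r]*conj(psi[4*s+c]);
      }
   for(int r=0;r<4;r++)                       // diag(0.36, 0, 0, 0.64)
   {
      for(int c=0;c<4;c++) cout << rho[r][c].real() << " ";
      cout << endl;
   }
   return 0;
}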
Chapter 19
Quantum State Machines

19.1 Introduction
In this chapter we introduce the quantum state machine [80, 119]. The quantum
state machine is an extension of classical finite state machines used to represent the
computations possible in quantum computing. Quantum state machines introduce
amplitudes for transitions between states to represent the parallelism available.

19.2 Quantum Automata


Definition. A quantum automaton consists of

• A finite set S of states, where the elements are uniquely identified with or-
thonormal states in a Hilbert space H of dimension at least |S|. One state
s_0 ∈ S is designated as the start state. We will use the one-to-one function
m : S → H to denote the relationship between states and elements of the
Hilbert space. We will use the notation |s⟩ = m(s).

• A sub-Hilbert space H_A of H, and the corresponding projection operator P_A
from H into H_A.

• An alphabet A of possible input symbols.

• A finite set of transitions for each combination of two (possibly identical) states
and symbols in the alphabet. Transitions are ordered 4-tuples (a, b, c, d_{a,b,c})
where a, b ∈ S, c ∈ A and d_{a,b,c} ∈ C. We require that Σ_{b,c} |d_{a,b,c}|^2 = 1 where the
sum is over all transitions from a. We will also define d_{a,b,c} to be zero when
no transition exists between a and b for input c. The values d_{a,b,c} must also
satisfy

Σ_{t∈S} d_{s,t,c} d̄_{s',t,c} = δ_{s,s'}

for every pair of states s, s' ∈ S, where δ_{s,s'} is the Kronecker delta and d̄
denotes the complex conjugate.


The condition
L ds,t,cdsl,t,c = ds,s'
tES

is used to enforce the unitarity of transformations. We can construct a unitary
transition matrix $U_c$ for every input symbol c from the values $d_{a,b,c}$ where a and b
determine the column and row in the matrix. The above condition only refers to
entries in the matrix $U_c$, and corresponds directly to the unitarity condition in terms of the
entries of the matrix obtained by multiplying $U_c$ with the complex conjugate of the
transpose of $U_c$. Since we identify transitions with unitary operators in the Hilbert
space, we can describe a computational path as the product of unitary operators.
For the input symbols $a_1a_2\ldots a_n$ the finite quantum automaton defines the evolution
of the initial state according to
$$|s_n\rangle = U_{a_n}U_{a_{n-1}}\cdots U_{a_1}|s_0\rangle.$$
From $|s_n\rangle$ we can define the words which are accepted. If
$$P_A|s_n\rangle = |s_n\rangle$$
we say that $a_1a_2\ldots a_n$ is accepted, otherwise it is rejected. Thus the input symbols
define a sequence of unitary operations to apply to an initial state, or a program.
This can be thought of as a program of quantum operations controlled classically,
which is exactly the way we have described quantum algorithms. The end of the
input corresponds to a measurement, i.e. we have to determine if the machine is in
a final state. We cannot define halt states, since the initial state may evolve into a
superposition of halt states and states which are not. This is the quantum halting
problem [110, 101]. The quantum finite automaton cannot crash on an input, since
it simply performs the transition with amplitude (probability) 0.

We can also define the amplitude for a state s after n steps, for a given input word
$a_1a_2\ldots a_n\in A^n$, as
$$D(s,n) = \sum_{s_1,s_2,\ldots,s_{n-1}\in S}\ \prod_{j=1}^{n} d_{s_{j-1},s_j,a_j}$$
where $s_n := s$ and $s_0$ is the start state. It is easy to show that
$$\sum_{s\in S}|D(s,n)|^2 = 1.$$

Graphically, we can represent quantum automata in the same way as with finite
automata, with the additional labelling of arcs between states with the complex
amplitudes for the corresponding transition. The description of quantum automata

is tied closely to unitary transformations in a Hilbert space on an initial state.


It is much simpler to analyse quantum algorithms in terms of unitary operations
in the Hilbert space H. We consider quantum automata since they are language
acceptors. Finite automata are also language acceptors so the classical and quantum
machines can be compared to determine which types of languages they accept. This
is an important question, since it would be useful to know if quantum machines can
achieve more than their classical counterparts.

A q-automaton is a 4-tuple $(H, s_0, A, U)$, where H is a finite dimensional Hilbert
space, $s_0\in H$ is the initial state, A is a finite alphabet, and U is a mapping from A
to the unitary operators acting on H. The q-automaton is useful since it is described
in terms of the Hilbert space, and the unitary operations defined on its states. For
words
$$a_1a_2\ldots a_n\in A\times A\times\cdots\times A \quad (n\ \text{times})$$
we denote by $U(a_1a_2\ldots a_n)$ the product $U_{a_n}U_{a_{n-1}}\cdots U_{a_1}$ of the operators derived
from the symbols $a_1$ to $a_n$. A finalizing q-automaton is a 5-tuple $(H, s_0, A, U, F)$
where $(H, s_0, A, U)$ is a q-automaton and F is a subspace of H such that either
$s_0\in F$ or $s_0\in F^\perp$. We denote by $P_F$ the projection onto the subspace F. The
probability that a word w causes the q-automaton to reach a final state in F is given
by
$$\|P_F U(w)s_0\|^2.$$
The function $U(w)s_0$ is called the response function of the q-automaton. A function
$R : A^*\to H$ is realizable by the q-automaton if $R(w) = U(w)s_0$. A word w is
accepted if $U(w)s_0\in F$. The language of the q-automaton is the set of all words
accepted by the q-automaton.

For any $R : A^*\to H$ the following are equivalent:

• R is realizable by a q-automaton.

• There exists a unitary operator $U_x$ for every $x\in A$ such that $R(xw) = U(x)R(w)$ for $w\in A^*$.

• There exists an orthonormal basis $|\psi_j\rangle$ for H and an orthonormal basis $|\psi_j(x)\rangle$
for every $x\in A$ such that $\langle R(xw)|\psi_j\rangle = \langle R(w)|\psi_j(x)\rangle$.

We can define the tensor product of two finalizing q-automata $q_1 = (H_1, s_1, A, U_1, F_1)$
and $q_2 = (H_2, s_2, A, U_2, F_2)$ over the same input alphabet as
$$q_1\otimes q_2 := (H_1\otimes H_2,\ s_1\otimes s_2,\ A,\ U_1\otimes U_2,\ F_1\otimes F_2).$$
Thus the language accepted by $q_1\otimes q_2$ is the intersection of the languages accepted
by $q_1$ and $q_2$.

We can extend the languages accepted by a finalizing q-automaton. For $0\leq\eta<1$
a word w is η-accepted by a q-automaton $q = (H, s_0, A, U, F)$ if
$$\|P_F U(w)s_0\|^2 > \eta.$$
The η-accepted language for q is the set of all words η-accepted by q.

For further results in the theory of quantum automata we refer to Gudder [80].
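
As an illustration of these definitions, the following minimal sketch in plain C++ simulates a q-automaton with a two-dimensional Hilbert space. The single input symbol, the rotation angle π/8 for its unitary operator, the word "aaaa" and the choice of F as the span of the second basis vector are all assumptions made for the example; the program computes the acceptance probability $\|P_F U(w)s_0\|^2$.

// qautomaton.cpp
// A minimal sketch of a q-automaton (H, s0, A, U, F), assuming a
// two-dimensional Hilbert space, the one-letter alphabet A = {a}
// with U_a a rotation by pi/8, and F spanned by the second basis
// state.
#include <iostream>
#include <cmath>
#include <string>
using namespace std;

int main(void)
{
 const double theta = atan(1.0)/2.0; // pi/8
 // unitary transition matrix U_a (a real rotation)
 double U[2][2] = { { cos(theta), -sin(theta) },
                    { sin(theta),  cos(theta) } };
 double s[2] = { 1.0, 0.0 }; // initial state s0

 string w = "aaaa"; // input word
 for(size_t i=0;i<w.length();i++)
 {
  double t0 = U[0][0]*s[0] + U[0][1]*s[1];
  double t1 = U[1][0]*s[0] + U[1][1]*s[1];
  s[0] = t0; s[1] = t1;
 }
 // probability that w drives the automaton into F:
 // || P_F U(w) s0 ||^2 with P_F = |1><1|
 cout << "acceptance probability = " << s[1]*s[1] << endl;
 // four rotations by pi/8 give a total angle pi/2, probability 1
 return 0;
}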

We proposed that Turing machines implement the classically computable functions,
so we will extend Turing machines to the quantum case, since this will provide a
better comparison of the complexity and computability of problems in quantum
computing with that of the classical case.

19.3 Quantum Turing Machines


Definition. A quantum Turing machine [11, 54, 125, 147] consists of
• A finite set of states S. One state, $s_0\in S$, is designated as the start state.
The states in S are identified with orthonormal states in a Hilbert space $H_S$
of dimension at least |S|. The one-to-one mapping $m_S : S\to H_S$ specifies the
association of states and elements of the Hilbert space. We use the notation
$|s\rangle := m_S(s)$.

• An alphabet A of possible input symbols.

• An alphabet Γ of possible output symbols.

• The blank symbol Δ.


• A tape or memory device which consists of adjacent cells labelled
$$\ldots,\ \mathrm{cell}[-1],\ \mathrm{cell}[0],\ \mathrm{cell}[1],\ \ldots.$$
Cells of the tape can contain a single symbol from
$$T := A\cup\Gamma\cup\{\Delta\}.$$
The input string is placed in the first cells of the tape, the rest of the cells
are filled with Δ. The content of a cell is identified with orthonormal states
in a Hilbert space $H_T$ of dimension at least $|A| + |\Gamma| + 1$. The Hilbert space
describing the tape is thus $H_M := \bigotimes_{-\infty}^{\infty} H_T$. The one-to-one mapping $m_T$,
with $m_T : T\to H_T$, associates elements in the tape cells with the elements of
the Hilbert space $H_T$. We use the notation $|t\rangle := m_T(t)$.

• A tape head that can read the contents of a tape cell, put a symbol from Γ or
the Δ symbol in the tape cell and move one cell right or left. All these actions
take place simultaneously. If the head is at cell[i] and moves left (right) then
the head will be at cell[i-1] (cell[i+1]). The position of the tape head
is identified with orthonormal states in an infinite dimensional Hilbert space
$H_{TH}$. The one-to-one mapping $m_{TH} : \mathbb{Z}\to H_{TH}$ associates the tape head
position (an integer specifying the cell) with elements of the Hilbert space.
We use the notation $|j\rangle := m_{TH}(j)$.

• A finite set of transitions for states and symbols from $A\cup\Gamma\cup\{\Delta\}$. A transition
is an ordered 6-tuple $(a, b, c, d, e, f_{a,b,c,d,e})$ with
$$a\in S,\quad b\in A\cup\Gamma\cup\{\Delta\},\quad c\in S,\quad d\in\Gamma\cup\{\Delta\},\quad e\in\{r,l\}$$
and $f_{a,b,c,d,e}\in\mathbb{C}$. Here a is the current state, b is the symbol read by the
tape head, c is the next state, d is the symbol for the tape head to write in
the current cell and e = r (e = l) moves the tape head right (left). If no
transition exists for (a, b, c, d, e) we define $f_{a,b,c,d,e} = 0$. We require that

1. $\displaystyle\sum_{c,d,e} f_{s,t,c,d,e}\overline{f_{s',t',c,d,e}} = \delta_{s,s'}\delta_{t,t'}$

2. $\displaystyle\sum_{a,b} f_{a,b,c,d,e}\overline{f_{a,b,c',d',e'}} = \delta_{c,c'}\delta_{d,d'}\delta_{e,e'}$

The quantum Turing machine has a tape which is infinitely long in both directions.
This does not provide any additional computing power over the tape which is only
infinite in one direction, but it does make the description of the quantum Turing
machine simpler since we can avoid crashing the machine, which corresponds to the
lack of a unitary transform to describe what happens when the machine is at cell[0].

The state of a quantum Turing machine at any time is described by a normalized
state in the Hilbert space
$$H_{TM} := H_S\otimes H_{TH}\otimes H_M.$$
The initial state of the machine is given by
$$|QTM_0\rangle := |s_0\rangle\otimes|0\rangle\otimes\left(\bigotimes_{-\infty}^{-1}|\Delta\rangle\right)\otimes|\psi_0\rangle\otimes\left(\bigotimes_{l}^{\infty}|\Delta\rangle\right)$$
where $|\psi_0\rangle$ is the initial contents of the tape, using l cells. The evolution of the
machine is described by a unitary operator U, which in turn is specified by the
transitions. Thus after n steps of execution the machine is in the state
$$|QTM_n\rangle = U^n|QTM_0\rangle.$$

The unitary evolution U can be described in terms of the amplitudes of the transitions:
$$U = \sum_{x,a,b,c,d,e} f_{a,b,c,d,e}\,|c\rangle\langle a|\otimes|x + \delta_{r,e} - \delta_{l,e}\rangle\langle x|\otimes\left(\bigotimes_{-\infty}^{x-1} I_T\right)\otimes|d\rangle\langle b|\otimes\left(\bigotimes_{x+1}^{\infty} I_T\right)$$
where $I_T$ is the identity operator for a tape cell.

Considering
$$UU^* = U^*U = I = \sum_{x,a,b}|a\rangle\langle a|\otimes|x\rangle\langle x|\otimes\left(\bigotimes_{-\infty}^{x-1} I_T\right)\otimes|b\rangle\langle b|\otimes\left(\bigotimes_{x+1}^{\infty} I_T\right)$$
we obtain the constraints
$$\sum_{c,d,e} f_{s,t,c,d,e}\overline{f_{s',t',c,d,e}} = \delta_{s,s'}\delta_{t,t'}$$
and
$$\sum_{a,b} f_{a,b,c,d,e}\overline{f_{a,b,c',d',e'}} = \delta_{c,c'}\delta_{d,d'}\delta_{e,e'}.$$

We cannot determine when a quantum Turing machine halts in the same way as
for quantum automata. Quantum automata rely on a finite input string which
describes the running of the machine and explicitly determines when the machine
halts. The tape of the quantum Turing machine cannot fulfill this role since the
machine can modify any cell on the tape; the input is not "consumed". Deutsch [54]
suggested reserving one cell of the tape which is always in one of two orthonormal
states to indicate when the machine has halted. The cell contents can become
entangled with the rest of the machine, giving a superposition of halted machines
and machines which have not halted [110, 101]. If it is known that for any input
of length n the quantum Turing machine will halt after t(n) steps, we can use the
state indicating the halt status of the machine as a control (in the same way as
the controlled NOT) for the transformation U of the quantum Turing machine, and
measure after t(n) steps with certainty that the machine has halted. Deutsch also
suggested the existence of a universal quantum Turing machine which can simulate
any other quantum Turing machine. Yu Shi [147] discusses why this cannot be the
case.
Chapter 20
Teleportation

20.1 Introduction
Quantum teleportation is the disembodied transport of an unknown quantum state
$|\psi\rangle$ from one place to another. All protocols for accomplishing such transport
require nonlocal correlations, or entanglement, between systems shared by sender and
receiver. The sender is normally called Alice and the receiver is called Bob. Most
attention has focused on teleporting the states of finite-dimensional systems, such
as the two-dimensional polarization of a photon or the discrete level structure of an
atom. First proposed in 1993 by Charles Bennett and his colleagues [17, 24, 138]
quantum teleportation thus allows physicists to take a photon or any other quantum
scale particle such as an atom and transfer its properties (such as the polarization)
to another photon even if the two photons are on opposite sides of the galaxy. This
scheme transports the particle's properties to the remote location and not the par-
ticle itself. The state of the original particle must be destroyed to create an exact
reconstruction at the other end. This is a consequence of the no cloning theorem. A
role in the teleportation scheme is played by an entangled ancillary pair of particles
which will be initially shared by Alice and Bob.

teleported
state ~

entangled par
~
~ ,~
a>

EPR-source

Figure 20.1: Teleportation


20.2 Teleportation Algorithm


Suppose particle 1 (in the following we assume that it is a spin-½ particle) which
Alice wants to teleport is in the initial state
$$|\psi\rangle_1 = a|0\rangle + b|1\rangle$$
and the entangled pair of particles 2 and 3 shared by Alice and Bob is in the state
$$|\psi^-\rangle_{23} = \frac{1}{\sqrt2}(|01\rangle - |10\rangle).$$
Alice gets particle 2 and Bob particle 3. This entangled state contains no information
on the individual particles 2 and 3. It only indicates that the two particles will be
in opposite states. Alice then performs a joint Bell-state measurement on the initial
particle 1 and particle 2 projecting them also onto an entangled state. After Alice has
sent the result of her measurement as classical information to Bob, he can perform
a unitary transformation on the other ancillary particle resulting in it being in the
state of the original particle.

Most experiments are done with photons which are spin-1 particles. The information
to be teleported is the polarization state of the photon. The Innsbruck experiment is
a simplified version of the teleportation described above. In this experiment photons
are used. The photon is a particle with spin 1 and rest mass 0. If the photon moves
in the positive z direction, i.e. the wave vector k is given by $(0,0,k)^T$, we have the wave
functions
$$\phi_{\pm1} := \frac{1}{\sqrt2}(e_1 \pm ie_2)\,e^{i(kz-\omega t)}$$
where $e_1 := (1,0,0)^T$ and $e_2 := (0,1,0)^T$. Thus we have two transverse waves.
Although the photon is a spin-1 particle the vectors s and k can only be parallel
(or antiparallel). The wave $\phi_1$ is in a state of positive helicity and the wave $\phi_{-1}$
is in a state of negative helicity. In the Innsbruck experiment at the sending sta-
tion of the quantum teleporter, Alice encodes photon M with a specific state: 45
degree polarization. This photon travels towards a beamsplitter. Meanwhile two
additional entangled photons A and B are created. Thus they have complementary
polarizations. For example, if photon A is later measured to have horizontal (0 de-
grees) polarization, then the other photon B must collapse into the complementary
state of vertical (90 degrees) polarization. Now entangled photon A arrives at the
beamsplitter at the same time as the message photon M. The beamsplitter causes
each photon either to continue towards detector 1 or change course and travel to

detector 2. In 1/4 of all cases, in which the two photons go off into different de-
tectors, Alice does not know which photon went to which detector. Owing to the
fact that the two photons are now indistinguishable, the message photon M loses
its original identity and becomes entangled with A. The polarization value for each
photon is now indeterminate, but since the two photons travel towards different de-
tectors Alice knows that the two photons must have complementary polarizations.
Since message particle M must have complementary polarization to particle A, then
the other entangled particle B must now attain the same polarization value as M.
Therefore teleportation is successful. The receiver Bob sees that the polarization
value of the particle B is 45 degrees, which is the initial value of the message photon.
In the experimental version of this setup executed at the University of Innsbruck,
the 45-degree polarization detector would always fire when detector 1 and detector 2 fired.
Except in rare instances attributable to background noise, it was never the case that
the 135-degree polarization detector fired in coincidence with detectors 1 and 2.

Figure 20.2: Experimental Realization of Teleportation



Teleportation can also be understood using the quantum circuit shown in the fol-
lowing figure.

Figure 20.3: Quantum Circuit for Teleportation

In the figure A is the input $|\psi\rangle$, B the input $|0\rangle$ and C the input $|0\rangle$. Now we study
what happens when we feed the product state $|\psi 00\rangle$ into the quantum circuit. From
the circuit we have the following eight 8×8 unitary matrices

$$U_1 = I_2\otimes U_H\otimes I_2,\quad U_2 = I_2\otimes U_{XOR},\quad U_3 = U_{XOR}\otimes I_2,\quad U_4 = U_H\otimes I_2\otimes I_2,$$
$$U_5 = I_2\otimes U_{XOR},\quad U_6 = I_2\otimes I_2\otimes U_H,\quad U_7 = I_4\oplus U_{NOT}\oplus U_{NOT},\quad U_8 = I_2\otimes I_2\otimes U_H$$

where ⊕ denotes the direct sum of matrices [162] and
$$U_H = \frac{1}{\sqrt2}\begin{pmatrix}1&1\\1&-1\end{pmatrix},\qquad U_{NOT} = \begin{pmatrix}0&1\\1&0\end{pmatrix},\qquad U_{XOR} = \begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&0&1\\0&0&1&0\end{pmatrix}.$$
Applying the first four unitary matrices $U_4U_3U_2U_1$ to the input state we obtain
$$\frac{a}{2}(|000\rangle + |100\rangle + |011\rangle + |111\rangle) + \frac{b}{2}(|010\rangle - |110\rangle + |001\rangle - |101\rangle).$$
Applying the remaining four unitary matrices $U_8U_7U_6U_5$ to this state yields
$$\frac{a}{2}(|000\rangle + |100\rangle + |010\rangle + |110\rangle) + \frac{b}{2}(|001\rangle + |101\rangle + |011\rangle + |111\rangle).$$



This state can be rewritten as
$$\frac{1}{\sqrt2}(|0\rangle + |1\rangle)\otimes\frac{1}{\sqrt2}(|0\rangle + |1\rangle)\otimes(a|0\rangle + b|1\rangle).$$
Thus the state $|\psi\rangle$ will be transferred to the lower output, where both other outputs
will come out in the state $(|0\rangle + |1\rangle)/\sqrt2$. If the two upper outputs are measured
in the standard basis ($|0\rangle$ versus $|1\rangle$), two random classical bits will be obtained in
addition to the quantum state $|\psi\rangle$ on the lower output.

Consider the case when the qubit to be teleported is one qubit of an entangled pair.
The first two qubits are entangled. Applying the teleportation algorithm to the
second, third and fourth qubits yields
$$\frac{1}{\sqrt2}(|00\rangle + |11\rangle)\otimes|00\rangle \longrightarrow \frac{1}{\sqrt2}(|0000\rangle + |1001\rangle).$$
The first and last qubits are now entangled, whereas the first and second are no
longer entangled. Thus we have achieved entanglement swapping.

20.3 Example Program


The following program uses SymbolicC++ [169, 166] to implement the teleportation
algorithm. It builds the matrix using the description given in the previous section.
It uses the direct sum and Kronecker product [162] extensively.
// teleport.cpp

#include "Vector.h"
#include "Matrix.h"
#include "Rational.h"
#include "Msymbol.h"
using namespace std;

typedef Sum<Rational<int> > C;

template <class T> Vector<T> Teleport(Vector<T> v)


{
int i;
assert(v.length() == 8);
Vector<T> result;

 Matrix<T> NOT(2,2);
 NOT[0][0] = T(0); NOT[0][1] = T(1);
 NOT[1][0] = T(1); NOT[1][1] = T(0);

 Matrix<T> H(2,2);
 H[0][0] = T(1)/sqrt(T(2)); H[0][1] = T(1)/sqrt(T(2));
 H[1][0] = T(1)/sqrt(T(2)); H[1][1] = T(-1)/sqrt(T(2));

 Matrix<T> I(2,2);
 I.identity();

 Matrix<T> X(4,4);
 X[0][0] = T(1); X[0][1] = T(0); X[0][2] = T(0); X[0][3] = T(0);
 X[1][0] = T(0); X[1][1] = T(1); X[1][2] = T(0); X[1][3] = T(0);
 X[2][0] = T(0); X[2][1] = T(0); X[2][2] = T(0); X[2][3] = T(1);
 X[3][0] = T(0); X[3][1] = T(0); X[3][2] = T(1); X[3][3] = T(0);

 Matrix<T> U1=kron(I,kron(H,I));
 Matrix<T> U2=kron(I,X);
 Matrix<T> U3=kron(X,I);
 Matrix<T> U4=kron(H,kron(I,I));
 Matrix<T> U5=kron(I,X);
 Matrix<T> U6=kron(I,kron(I,H));
 Matrix<T> U7=dsum(I,dsum(I,dsum(NOT,NOT)));
 Matrix<T> U8=kron(I,kron(I,H));

 result=U8*(U7*(U6*(U5*(U4*(U3*(U2*(U1*v)))))));
 for(i=0;i<8;i++)
 {
  while(result[i].put(power(sqrt(T(2)),-6),power(T(2),-3)));
  while(result[i].put(power(sqrt(T(2)),-4),power(T(2),-2)));
  while(result[i].put(power(sqrt(T(2)),-2),power(T(2),-1)));
 }
 return result;
}

// The outcome after measuring value for qubit.
// Since the probabilities may be symbolic this function
// cannot simulate a measurement where random outcomes
// have the correct distribution
template <class T>
Vector<T> Measure(Vector<T> v,unsigned int qubit,unsigned int value)
{
 assert(pow(2,qubit)<v.length());
 assert(value==0 || value==1);
 int i,len,skip = 1-value;
 Vector<T> result(v);
 T D = T(0);

 len = v.length()/int(pow(2,qubit+1));
 for(i=0;i<v.length();i++)
 {
  if(!(i%len)) skip = 1-skip;
  if(skip) result[i] = T(0);
  else D += result[i]*result[i];
 }
 result/=sqrt(D);
 return result;
}

// for output clarity
ostream &print(ostream &o,Vector<C> v)
{
 char *b2[2]={"|0>","|1>"};
 char *b4[4]={"|00>","|01>","|10>","|11>"};
 char *b8[8]={"|000>","|001>","|010>","|011>",
              "|100>","|101>","|110>","|111>"};
 char **b,i;

 if(v.length()==2) b=b2;
 if(v.length()==4) b=b4;
 if(v.length()==8) b=b8;

 for(i=0;i<v.length();i++)
  if(!v[i].is_Number() || v[i].nvalue()!=C(0))
   o << "+(" << v[i] << ")" << b[i];
 return o;
}

void main(void)
{
 Vector<C> zero(2),one(2);
 Vector<C> zz(4),zo(4),oz(4),oo(4),qreg;
 Vector<C> tp00,tp01,tp10,tp11;
 Sum<Rational<int> > a("a",0),b("b",0);
 int i;

 zero[0] = C(1); zero[1] = C(0);
 one[0]  = C(0); one[1]  = C(1);

 zz = kron(Matrix<C>(zero),Matrix<C>(zero))(0);
 zo = kron(Matrix<C>(zero),Matrix<C>(one))(0);
 oz = kron(Matrix<C>(one),Matrix<C>(zero))(0);
 oo = kron(Matrix<C>(one),Matrix<C>(one))(0);

 qreg=kron(a*zero+b*one,kron(zero,zero))(0);
 cout << "UTELEPORT("; print(cout,qreg) << ") = ";
 print(cout,qreg=Teleport(qreg)) << endl;
 cout << "Results after measurement of first 2 qubits:" << endl;
 tp00 = Measure(Measure(qreg,0,0),1,0);
 tp01 = Measure(Measure(qreg,0,0),1,1);
 tp10 = Measure(Measure(qreg,0,1),1,0);
 tp11 = Measure(Measure(qreg,0,1),1,1);
 for(i=0;i<8;i++)
 {
  while(tp00[i].put(a*a,C(1)-b*b));
  while(tp00[i].put(power(sqrt(C(1)/C(2)),-2),C(2)));
  while(tp01[i].put(a*a,C(1)-b*b));
  while(tp01[i].put(power(sqrt(C(1)/C(2)),-2),C(2)));
  while(tp10[i].put(a*a,C(1)-b*b));
  while(tp10[i].put(power(sqrt(C(1)/C(2)),-2),C(2)));
  while(tp11[i].put(a*a,C(1)-b*b));
  while(tp11[i].put(power(sqrt(C(1)/C(2)),-2),C(2)));
 }
 cout << "|00> "; print(cout,tp00) << endl;
 cout << "|01> "; print(cout,tp01) << endl;
 cout << "|10> "; print(cout,tp10) << endl;
 cout << "|11> "; print(cout,tp11) << endl;
 cout << endl;
}
The program generates the following output:

UTELEPORT(+(a)|000>+(b)|100>) = +(1/2*a)|000>+(1/2*b)|001>
+(1/2*a)|010>+(1/2*b)|011>
+(1/2*a)|100>+(1/2*b)|101>
+(1/2*a)|110>+(1/2*b)|111>
Results after measurement of first 2 qubits:
|00> +(a)|000>+(b)|001>
|01> +(a)|010>+(b)|011>
|10> +(a)|100>+(b)|101>
|11> +(a)|110>+(b)|111>
Chapter 21
Quantum Algorithms

21.1 Deutsch's Problem


Deutsch's problem [54] is given as follows. Suppose we have a boolean function
$$f : \{0,1\}\to\{0,1\}.$$
There are four such functions, the constant functions which map all inputs to 0 or
all inputs to 1, and the varying functions which have $f(0)\neq f(1)$. In other words
$$f_1(0) = 0,\quad f_1(1) = 0$$
$$f_2(0) = 1,\quad f_2(1) = 1$$
$$f_3(0) = 0,\quad f_3(1) = 1$$
$$f_4(0) = 1,\quad f_4(1) = 0.$$

The first two functions are constant. The task is to determine for such a function if
it is constant or varying using only one calculation of the function. In the classical
case it is necessary to compute f twice before it is known whether it is constant or
varying. For example if f(0) = 0, the function could be $f_1$ or $f_3$; similarly for any
other single evaluation two of the functions have the same value.

In quantum computing the following solution was found [47, 178]. The function is
implemented on quantum hardware with the unitary transformation $U_f$ such that
$$U_f|x\rangle\otimes|y\rangle = |x\rangle\otimes|y\oplus f(x)\rangle$$
where ⊕ denotes the XOR operation. We apply the transformation to the state
$$|\psi\rangle := \frac{1}{2}(|0\rangle + |1\rangle)\otimes(|0\rangle - |1\rangle).$$


This gives
$$U_f|\psi\rangle = \frac12(|0\rangle\otimes|0\oplus f(0)\rangle - |0\rangle\otimes|1\oplus f(0)\rangle) + \frac12(|1\rangle\otimes|0\oplus f(1)\rangle - |1\rangle\otimes|1\oplus f(1)\rangle).$$
Since $U_f$ is linear and quantum mechanics allows the superposition of states we have
calculated the function f with each input twice. This feature of quantum parallelism
makes it possible to solve the problem. Applying the Hadamard transform $U_H$ to
the first qubit yields
$$(U_H\otimes I_2)U_f|\psi\rangle = \frac{1}{2\sqrt2}(|0\rangle\otimes(|0\oplus f(0)\rangle - |1\oplus f(0)\rangle + |0\oplus f(1)\rangle - |1\oplus f(1)\rangle))$$
$$+ \frac{1}{2\sqrt2}(|1\rangle\otimes(|0\oplus f(0)\rangle - |1\oplus f(0)\rangle - |0\oplus f(1)\rangle + |1\oplus f(1)\rangle)).$$

If f is constant we have f(0) = f(1) and we find
$$(U_H\otimes I_2)U_f|\psi\rangle = \frac{1}{\sqrt2}|0\rangle\otimes(|0\oplus f(0)\rangle - |1\oplus f(0)\rangle).$$
If f is varying we have
$$0\oplus f(0) = 1\oplus f(1),\qquad 1\oplus f(0) = 0\oplus f(1)$$
and we find
$$(U_H\otimes I_2)U_f|\psi\rangle = \frac{1}{\sqrt2}|1\rangle\otimes(|0\oplus f(0)\rangle - |1\oplus f(0)\rangle).$$
Thus measuring the first qubit as zero indicates the function f is constant and
measuring the first qubit as one indicates the function f is varying. The function f
was only calculated once using $U_f$, and so the problem is solved.

The algorithm has been implemented using nuclear magnetic resonance techniques
[43,98].

A balanced boolean function is a boolean function which maps to an equal number
of 0's and 1's. In other words $f : \{0,1\}^n\to\{0,1\}$ is balanced if
$$|\{x\in\{0,1\}^n \mid f(x) = 0\}| = |\{x\in\{0,1\}^n \mid f(x) = 1\}|.$$

Deutsch and Jozsa [47, 55] generalized the problem. Let $f : \{0,1\}^n\to\{0,1\}$ be
a boolean function. Assume that f is either constant or balanced. Thus f maps
only to 0, only to 1, or to an equal number of 0's and 1's for all possible inputs.
The problem is to determine if f is constant or balanced using only one function
evaluation. Let $U_f$ be the unitary operator which implements the function f,
$$U_f|x\rangle\otimes|y\rangle = |x\rangle\otimes|y\oplus f(x)\rangle$$
where $x\in\{0,1\}^n$ and $y\in\{0,1\}$. We note first that
$$U_f\left(|x\rangle\otimes\frac{1}{\sqrt2}(|0\rangle - |1\rangle)\right) = \frac{1}{\sqrt2}|x\rangle\otimes(|0\oplus f(x)\rangle - |1\oplus f(x)\rangle) = \frac{(-1)^{f(x)}}{\sqrt2}|x\rangle\otimes(|0\rangle - |1\rangle).$$
We also use the fact that
$$\left(\bigotimes_n U_H\right)|x\rangle = \frac{1}{\sqrt{2^n}}\sum_{j\in\{0,1\}^n}(-1)^{j*x}|j\rangle$$
where $x\in\{0,1\}^n$ and $j*x$ denotes the bitwise AND of j and x followed by the
XOR of the resulting bits
$$j*x := j_0x_0\oplus j_1x_1\oplus\cdots\oplus j_{n-1}x_{n-1}.$$
The initial state is
$$|\psi\rangle := \frac{1}{\sqrt2}|0\ldots0\rangle\otimes(|0\rangle - |1\rangle).$$

The first step is to apply the Walsh-Hadamard transform to the first n qubits of
$|\psi\rangle$:
$$\left(\bigotimes_n U_H\otimes I_2\right)|\psi\rangle = \frac{1}{\sqrt{2^n}}\sum_{x\in\{0,1\}^n}|x\rangle\otimes\frac{1}{\sqrt2}(|0\rangle - |1\rangle).$$
Applying the unitary operator $U_f$ yields
$$\frac{1}{\sqrt{2^n}}\sum_{x\in\{0,1\}^n}(-1)^{f(x)}|x\rangle\otimes\frac{1}{\sqrt2}(|0\rangle - |1\rangle).$$
Applying the Walsh-Hadamard transform on the first n qubits gives
$$\frac{1}{2^n}\sum_{j\in\{0,1\}^n}\sum_{x\in\{0,1\}^n}(-1)^{f(x)}(-1)^{j*x}|j\rangle\otimes\frac{1}{\sqrt2}(|0\rangle - |1\rangle).$$
The probability that a measurement of the first n qubits yields $|00\ldots0\rangle$ is
$$\left|\frac{1}{2^n}\sum_{x\in\{0,1\}^n}(-1)^{f(x)}\right|^2 = \begin{cases}1 & \text{if } f \text{ is constant}\\ 0 & \text{if } f \text{ is balanced.}\end{cases}$$
Thus, after applying these transformations, measuring all $|0\rangle$ for the first n qubits
indicates that f is constant and any other result indicates that f is balanced. The
network representation for the algorithm is given in Figure 21.1.

Figure 21.1: Network Representation to Solve Deutsch's Problem
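
The algorithm can be checked numerically. The following minimal sketch in plain C++ (the varying function $f_4$ chosen here is an assumption for illustration) simulates the circuit for n = 1 by direct manipulation of the four amplitudes; the probability of measuring the first qubit as 1 is 1 for the varying functions and 0 for the constant ones.

// deutsch.cpp
// A minimal sketch of Deutsch's algorithm for one of the four
// functions f : {0,1} -> {0,1}. We prepare
// |psi> = (1/2)(|0>+|1>)(|0>-|1>), apply U_f and the Hadamard
// transform on the first qubit; the measurement statistics of the
// first qubit then decide whether f is constant or varying.
#include <iostream>
#include <cmath>
using namespace std;

int f(int x) { return 1-x; } // the varying function f4

int main(void)
{
 // amplitudes in the basis |00>,|01>,|10>,|11>
 double psi[4] = { 0.5, -0.5, 0.5, -0.5 };

 // U_f |x>|y> = |x>|y XOR f(x)>
 double phi[4];
 for(int x=0;x<2;x++)
  for(int y=0;y<2;y++)
   phi[2*x + (y^f(x))] = psi[2*x + y];

 // Hadamard transform on the first qubit
 double s = 1.0/sqrt(2.0), out[4];
 for(int y=0;y<2;y++)
 {
  out[0+y] = s*(phi[0+y] + phi[2+y]);
  out[2+y] = s*(phi[0+y] - phi[2+y]);
 }

 double p1 = out[2]*out[2] + out[3]*out[3]; // P(first qubit = 1)
 cout << "P(first qubit = 1) = " << p1 << endl;
 cout << (p1 > 0.5 ? "f is varying" : "f is constant") << endl;
 return 0;
}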



21.2 Simon's Problem


Simon's problem [150] can be viewed as a generalization of Deutsch's problem. Suppose
we have a function
$$f : \{0,1\}^n\to\{0,1\}^m,\qquad m\geq n.$$
Furthermore, it is given that either f is one to one, or there exists a non-trivial s
such that for all $x\neq y$
$$f(x) = f(y) \iff y = x\oplus s.$$
The problem is to determine if f is one to one, and if not to find s.

We start with the initial state
$$\bigotimes_{n+m}|0\rangle.$$
Next we apply the Walsh-Hadamard transform to each of the first n qubits,
$$\frac{1}{\sqrt{2^n}}\sum_{j=0}^{2^n-1}|j\rangle\otimes|0\ldots0\rangle.$$
Only one function evaluation is required. This is the next step in the solution:
$$\frac{1}{\sqrt{2^n}}\sum_{j=0}^{2^n-1}|j\rangle\otimes|f(j)\rangle.$$
The final step is to apply the Walsh-Hadamard transform to the first n qubits again:
$$\frac{1}{2^n}\sum_{j=0}^{2^n-1}\sum_{k=0}^{2^n-1}(-1)^{j*k}|k\rangle\otimes|f(j)\rangle.$$

Suppose now that for all $x\neq y$
$$f(x) = f(y) \iff y = x\oplus s.$$
The amplitude of the state $|k\rangle\otimes|f(j)\rangle$ is given by
$$\frac{1}{2^n}\left((-1)^{j*k} + (-1)^{(j\oplus s)*k}\right).$$
We have
$$(j\oplus s)*k = (j_0\oplus s_0)k_0 + (j_1\oplus s_1)k_1 + \cdots + (j_{n-1}\oplus s_{n-1})k_{n-1} \bmod 2$$
$$= j_0k_0 + s_0k_0 + j_1k_1 + s_1k_1 + \cdots + j_{n-1}k_{n-1} + s_{n-1}k_{n-1} \bmod 2$$
$$= j*k + s*k \bmod 2.$$
Thus if $s*k = 0$,
$$(j\oplus s)*k = j*k.$$
If $s*k\neq 0$ then the amplitude of the state $|k\rangle\otimes|f(j)\rangle$ is zero. Thus measuring the
n+m qubits yields with certainty a number t such that $t*s = 0$. Repeating the above
procedure O(n) times will yield enough linearly independent t and corresponding
equations of the form $t*s = 0$ so that s can be found.

If f is one to one, then each measurement will yield a random value. The resulting
s determined from the equations must then be tested, for example the values f(0)
and f(s) can be checked for equality.

The expected time of the algorithm is $O(nT_f(n) + G(n))$ where $T_f(n)$ is the time
required to compute f on inputs of n bits and G(n) is the time required to solve n
linear equations for n unknowns in {0,1}.
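
The measurement statistics of one iteration can also be computed classically for small n. The following minimal sketch in plain C++ (with n = 3 and the hidden value s = 5 chosen as assumptions for the example) evaluates the amplitudes $\frac{1}{2^n}\sum_{j'}(-1)^{j'*t}$ over the preimages of each function value and confirms that only values t with t*s = 0 can be measured.

// simon.cpp
// A minimal sketch of one iteration of Simon's algorithm for n = 3,
// with the hidden period s built into a table for f. We compute the
// probability of measuring each t in the first register and verify
// that only t with t*s = 0 occur.
#include <iostream>
using namespace std;

int dot(int a,int b) // bitwise AND followed by XOR of the bits
{
 int c = a & b, r = 0;
 while(c) { r ^= (c&1); c >>= 1; }
 return r;
}

int main(void)
{
 const int n = 3, N = 1 << n, s = 5; // hidden s = 101 (binary)
 int f[8];
 for(int j=0;j<N;j++) f[j] = (j < (j^s)) ? j : (j^s); // f(j)=f(j^s)

 // after H, U_f, H the amplitude of |t>|v> is
 // (1/2^n) sum_{j: f(j)=v} (-1)^(j*t)
 for(int t=0;t<N;t++)
 {
  double p = 0.0;
  for(int v=0;v<N;v++) // possible function values
  {
   double amp = 0.0;
   for(int j=0;j<N;j++)
    if(f[j]==v) amp += (dot(j,t) ? -1.0 : 1.0)/N;
   p += amp*amp;
  }
  if(p > 1e-12)
   cout << "t = " << t << "  P = " << p
        << "  t*s = " << dot(t,s) << endl;
 }
 return 0;
}

Each of the four values t with t*s = 0 appears with probability 1/4, in agreement with the analysis above.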

The algorithm does not guarantee completion in polynomial time for the worst case.
A quantum algorithm which is guaranteed to complete in polynomial time (or less)
is called an exact quantum polynomial time algorithm. Brassard and Høyer [27]
discovered an exact quantum polynomial time algorithm to solve Simon's problem.
Their algorithm is based on Simon's solution with the addition that after each
iteration, 0 and values of t already found are removed from the superposition. Thus
the time to determine the n linear equations is precisely determined.

We can eliminate the state with the first n qubits equal to t as follows. Suppose the
lth bit of t is 1. We begin with the final state of an iteration of Simon's solution.
We add an auxiliary qubit $|0\rangle$ and then apply the controlled NOT with qubit l as
the control and the auxiliary qubit as the target which gives the state
$$U_{CNOT}(l, n+m+1)\,\frac{1}{2^n}\sum_{j=0}^{2^n-1}\sum_{k=0}^{2^n-1}(-1)^{j*k}|k\rangle\otimes|f(j)\rangle\otimes|0\rangle$$
$$= \frac{1}{2^n}\sum_{j=0}^{2^n-1}\sum_{p=0}^{1}\sum_{\substack{k=0\\ k_l=0}}^{2^n-1}(-1)^{j*(k\oplus pt)}|k\oplus pt\rangle\otimes|f(j)\rangle\otimes|p\rangle$$
where $k_l$ is the lth bit of k and $0t = 0$ and $1t = t$. We obtain this result by separating
the sum over those k with the lth bit 0 and those with the lth bit 1. We also make
use of the fact that the k form a group with ⊕, the bitwise XOR operation. Thus
$(k\oplus pt)$ will take all the values over k.

Now we apply the operator which maps $|x\rangle$ to $|x\oplus t\rangle$, for the first n qubits, only
when the auxiliary qubit is $|1\rangle$ and leaves the first n qubits unchanged otherwise.
In other words we apply the operator $U_{\oplus t}$ with
$$U_{\oplus t}(|x\rangle\otimes|y\rangle\otimes|p\rangle) = |x\oplus pt\rangle\otimes|y\rangle\otimes|p\rangle$$
which can be built using n controlled controlled NOT gates. Applying the operator
$U_{\oplus t}$ yields

$$U_{\oplus t}\,\frac{1}{2^n}\sum_{j=0}^{2^n-1}\sum_{p=0}^{1}\sum_{\substack{k=0\\ k_l=0}}^{2^n-1}(-1)^{j*(k\oplus pt)}|k\oplus pt\rangle\otimes|f(j)\rangle\otimes|p\rangle$$
$$= \frac{1}{2^n}\sum_{j=0}^{2^n-1}\sum_{p=0}^{1}\sum_{\substack{k=0\\ k_l=0}}^{2^n-1}(-1)^{j*(k\oplus pt)}|k\rangle\otimes|f(j)\rangle\otimes|p\rangle$$
$$= \frac{1}{2^n}\sum_{j=0}^{2^n-1}\sum_{p=0}^{1}\sum_{\substack{k=0\\ k_l=0}}^{2^n-1}(-1)^{j*k}|k\rangle\otimes|f(j)\rangle\otimes\left((-1)^{j*(pt)}|p\rangle\right).$$
Since the first n+m qubits are independent of p we can discard the auxiliary qubit.
Thus the final state is
$$\frac{\sqrt2}{2^n}\sum_{j=0}^{2^n-1}\sum_{\substack{k=0\\ k_l=0}}^{2^n-1}(-1)^{j*k}|k\rangle\otimes|f(j)\rangle.$$

This state is of the same form as the final state of each iteration of Simon's solution,
except for the fact that the probability of measuring the first n qubits as the t
already found is 0 and the probability of measuring some other t' with $t'*s = 0$ is
greater. This process can be repeated to eliminate all of the t values with $t*s = 0$
already found. The above modification to the algorithm does not ensure that the
first n qubits will never be measured as 0. To remove this possibility, Brassard
and Høyer [27] use a modification of Grover's search algorithm which, under certain
conditions, succeeds with probability 1. The technique is discussed in Section 21.6.

21.3 Quantum Fourier Transform


The discrete Fourier transform of a set of data $x(0), x(1),\ldots, x(n-1)\in\mathbb{C}$ is given
by [160, 162]
$$\hat{x}(k) = \frac{1}{n}\sum_{j=0}^{n-1}x(j)e^{-i2\pi kj/n}.$$
The inverse is given by
$$x(j) = \sum_{k=0}^{n-1}\hat{x}(k)e^{i2\pi kj/n}.$$
The discrete Fourier transform is useful in finding periodicity in a sequence. The
definition can be extended for quantum states as follows when $n = 2^m$ and provided
$$|x(0)|^2 + |x(1)|^2 + \cdots + |x(n-1)|^2 = n.$$
The transform can then be written as follows
$$U_{QFT}|j\rangle = \frac{1}{\sqrt n}\sum_{k=0}^{n-1}e^{-i2\pi jk/n}|k\rangle.$$
The transform is unitary and is called the quantum Fourier transform. We have
[47, 162]
$$U_{QFT} = \frac{1}{\sqrt n}\begin{pmatrix}
1 & 1 & 1 & \cdots & 1\\
1 & e^{-i2\pi/n} & e^{-i4\pi/n} & \cdots & e^{-i(n-1)2\pi/n}\\
1 & e^{-i4\pi/n} & e^{-i8\pi/n} & \cdots & e^{-i(n-1)4\pi/n}\\
\vdots & \vdots & \vdots & & \vdots\\
1 & e^{-i(n-1)2\pi/n} & e^{-i(n-1)4\pi/n} & \cdots & e^{-i(n-1)^2 2\pi/n}
\end{pmatrix}.$$

The transform can be implemented using $\frac{m(m+1)}{2}$ single and double qubit gates
[47, 138]. Let $H_k$ denote the Hadamard transform acting on qubit k and $U_{PS}(k,j,\phi)$
denote the phase shift transform $U_{PS}(\phi)$ acting on qubits k and j. We can rewrite
the transform as (by relabeling the sums)
$$U_{QFT}|k\rangle = \frac{1}{\sqrt{2^m}}\bigotimes_{l=0}^{m-1}\left(|0\rangle + \exp(-i2\pi k2^{l-m})|1\rangle\right)
= \frac{1}{\sqrt{2^m}}\bigotimes_{l=0}^{m-1}\left(|0\rangle + \prod_{j=0}^{m-1}\exp(-i2\pi k_j2^{j+l-m})|1\rangle\right)$$
where $k_j$ denotes the jth bit of k.

The network representation is given in Figure 21.2.

Figure 21.2: Network for the Quantum Fourier Transform
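
For small n the matrix $U_{QFT}$ can be constructed directly. The following minimal sketch in plain C++ builds the 8×8 matrix (m = 3) from the entries $\frac{1}{\sqrt n}e^{-i2\pi jk/n}$ given above and verifies numerically that it is unitary.

// qft.cpp
// A minimal sketch constructing the n x n matrix U_QFT given above
// and checking numerically that U_QFT multiplied by its conjugate
// transpose is the identity matrix.
#include <iostream>
#include <complex>
#include <cmath>
#include <vector>
using namespace std;

int main(void)
{
 const int n = 8; // n = 2^m with m = 3
 const double pi = 4.0*atan(1.0);
 vector<vector<complex<double> > > U(n, vector<complex<double> >(n));

 for(int j=0;j<n;j++)
  for(int k=0;k<n;k++)
   U[j][k] = polar(1.0/sqrt(double(n)), -2.0*pi*j*k/n);

 // maximum deviation of U U* from the identity
 double err = 0.0;
 for(int j=0;j<n;j++)
  for(int k=0;k<n;k++)
  {
   complex<double> sum = 0.0;
   for(int l=0;l<n;l++) sum += U[j][l]*conj(U[k][l]);
   double d = abs(sum - complex<double>(j==k ? 1.0 : 0.0));
   if(d > err) err = d;
  }
 cout << "max |U U* - I| = " << err << endl; // of order 1e-16
 return 0;
}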



21.4 Factoring (Shor's Algorithm)


Shor [148] invented an algorithm for a quantum computer that could be used to find
the prime factors of integer numbers in polynomial time. The mathematical basis
of Shor's algorithm is as follows [8, 18, 118, 148]. The aim is to find the prime
factors of an integer N. Let N = 21. Then N can be written as N = 3·7, where 3
and 7 are the prime factors of N = 21. This factorization problem can be related
to finding the period r of the function (x = 0, 1, 2, ...)
$$f_{a,N}(x) := a^x \bmod N$$
where a is any randomly chosen positive integer (a < N) which is coprime with N,
i.e. which has no common factors with N. If a is not coprime with N, then the
factors of N are trivially found by computing the greatest common divisor of a and
N.

Example. Let N = 21 and a = 11. Thus N and a have no common factors. Now
for x = 0, 1, 2, 3, 4, 5, 6 we find
$$f_{11,21}(0) = 1,\quad f_{11,21}(1) = 11,\quad f_{11,21}(2) = 16,\quad f_{11,21}(3) = 8,$$
$$f_{11,21}(4) = 4,\quad f_{11,21}(5) = 2,\quad f_{11,21}(6) = 1.$$
Thus we see that the period is r = 6. The period r can also be found by solving the
equation
$$a^r = 1 \bmod N$$
for the smallest positive integer r. Obviously we find
$$11^6 = 1 \bmod 21.$$

Since
$$f_{a,N}(x+r) = f_{a,N}(x)$$
for all $x\in\mathbb{N}$, we have
$$a^r = 1 \bmod N.$$
If r is even, this is equivalent to
$$(a^{r/2} + 1)(a^{r/2} - 1) = 0 \bmod N.$$
Knowing the period of $f_{a,N}$, we can factor N provided r is even and
$$a^{r/2} \not\equiv -1 \pmod N.$$
For the example given above these two conditions are met. When a is chosen
randomly the two conditions are satisfied with probability greater than 1/2 [47].

The factors of N are then given by
$$\gcd(a^{r/2} + 1, N),\qquad \gcd(a^{r/2} - 1, N).$$
The greatest common divisor can be found using the Euclidean algorithm. The Euclidean
algorithm runs in polynomial time on a classical computer. For the example
given above with N = 21, a = 11 and r = 6 we find
$$\gcd(11^3 + 1, 21) = 3,\qquad \gcd(11^3 - 1, 21) = 7.$$
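
The classical part of the computation is easily checked. The following minimal sketch in plain C++ finds the period of $f_{a,N}$ by direct search, which is the step the quantum computer replaces, and then applies the Euclidean algorithm for the example N = 21, a = 11.

// period.cpp
// A minimal sketch of the classical post-processing in Shor's
// algorithm: for N = 21 and a = 11 we find the period r of
// f_{a,N}(x) = a^x mod N by direct search and read off the
// factors via gcd.
#include <iostream>
using namespace std;

unsigned long gcd(unsigned long a,unsigned long b)
{ while(b) { unsigned long t = a%b; a = b; b = t; } return a; }

unsigned long powmod(unsigned long a,unsigned long x,unsigned long N)
{
 unsigned long r = 1;
 for(;x>0;x--) r = (r*a)%N;
 return r;
}

int main(void)
{
 const unsigned long N = 21, a = 11;
 unsigned long r = 1;
 while(powmod(a,r,N) != 1) r++; // period of a^x mod N
 cout << "r = " << r << endl;   // r = 6
 if(r%2==0 && powmod(a,r/2,N) != N-1)
 {
  unsigned long h = powmod(a,r/2,N); // a^(r/2) mod N
  cout << "gcd(a^(r/2)+1,N) = " << gcd(h+1,N) << endl; // 3
  cout << "gcd(a^(r/2)-1,N) = " << gcd(h-1,N) << endl; // 7
 }
 return 0;
}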


Next we describe how a quantum computer can find the period r of the number a and
therefore factorize N. Shor's technique for finding the period of a periodic function
consists of evaluating the function on a superposition of exponentially many argu-
ments, computing a parallel Fourier transform on the superposition, then sampling
the Fourier power spectrum to obtain the function's period.

The quantum computer is prepared with two quantum registers, X and Y, each
consisting of a string of qubits initialized to the Boolean value zero
$$|\psi_0\rangle := |0\rangle\otimes|0\rangle.$$
The X register is used to hold arguments of the function f whose unknown period r is
sought, and the Y register is used to store values of the function. The width of the X
register is chosen so that its number of possible Boolean states is comfortably greater
than the square of the anticipated period r. The Y register is made sufficiently wide
to store values of the function f. We first set the input register X into an equally
weighted superposition of all possible states from 0 to $2^{2L} - 1$, where
$$2^{2L} - 1 \geq N^2.$$
This can be achieved by applying the Hadamard transform to each qubit of the
input register X, i.e.
$$|\psi_1\rangle := ((U_H\otimes U_H\otimes\cdots\otimes U_H)\otimes I)\,|0\rangle\otimes|0\rangle = \frac{1}{2^L}\sum_{x=0}^{2^{2L}-1}|x\rangle\otimes|0\rangle.$$
Applying the operator $I\otimes U_{f_{a,N}}$ to this state we obtain the state
$$|\psi_2\rangle := \frac{1}{2^L}\sum_{x=0}^{2^{2L}-1}|x\rangle\otimes|f_{a,N}(x)\rangle.$$
At this stage, all the possible values of $f_{a,N}$ are encoded in the state of the second
register. However, they are not all accessible at the same time. We are not

interested in the values themselves, but only in the periodicity of the function $f_{a,N}$.
Observing the second quantum register as u would yield (up to normalization)
$$\sum_{x=0}^{2^{2L}-1} g_u(x)\,|x\rangle\otimes|u\rangle$$
with
$$g_u(x) := \begin{cases}1 & a^x = u \bmod N\\ 0 & \text{otherwise.}\end{cases}$$

The next step is to Fourier transform the first register. This means we apply a
unitary operator that maps the state onto
$$|\psi_3\rangle = \frac{1}{2^{2L}}\sum_{x=0}^{2^{2L}-1}\sum_{k=0}^{2^{2L}-1}\exp(2\pi ixk/2^{2L})\,|k\rangle\otimes|f_{a,N}(x)\rangle.$$
The probability for finding the state $|k\rangle\otimes|a^m \bmod N\rangle$ is
$$P(k) = \left|\frac{1}{2^{2L}}\sum_{x}\exp(2\pi ixk/2^{2L})\right|^2$$
where the sum is over all numbers
$$x\in\{0, 1, \ldots, 2^{2L}-1\}$$
such that
$$a^x = a^m \bmod N.$$
Using x = m + br with $b\in\mathbb{N}_0$ the sum becomes
$$P(k) = \left|\frac{1}{2^{2L}}\sum_{b}\exp(2\pi i(m+br)k/2^{2L})\right|^2.$$

Factoring out the term $\exp(2\pi imk/2^{2L})$ yields
$$P(k) = \left|\frac{1}{2^{2L}}\sum_{b}\exp(2\pi ib\{rk\}_{2^{2L}}/2^{2L})\right|^2$$
where $\{rk\}_{2^{2L}}$ is an integer in the interval
$$-2^{2L}/2 < \{rk\}_{2^{2L}} \leq 2^{2L}/2$$
which is congruent to
$$rk \bmod 2^{2L}.$$
The above probability has well-defined peaks if
$$\{rk\}_{2^{2L}}$$
is small (less than r), i.e., if rk is close to a multiple of $2^{2L}$,
$$rk \approx d\,2^{2L}$$
for some d < N. Thus, knowing L and therefore $2^{2L}$ and the fact that the position
of the peaks k will be close to numbers of the form $d\,2^{2L}/r$, we can find the period r
using continued fraction techniques. To explicitly construct the unitary evolution
that takes the state $|\psi_1\rangle$ into the state $|\psi_2\rangle$ is a rather nontrivial task [118].

There are $r\phi(r)$ states which can be used to determine r [148], where φ is Euler's
totient function. Each state occurs in the superposition with probability at least
$1/(3r^2)$, so the probability of measuring a state which can be used to determine r
is $\phi(r)/(3r)$. Using the fact that
$$\frac{\phi(r)}{r} > \frac{u}{\log_2(\log_2 r)}$$
for some u, the probability of measuring such a state is at least $u/(3\log_2(\log_2 r))$.
Thus repeating the algorithm $O(\log_2(\log_2 r))$ times ensures a high probability of
successfully measuring a state from which r can be determined.

21.5 The Hidden Subgroup Problem


In the previous sections we discussed important algorithms which illustrate the
characteristics of quantum computation. Shor [148] also discussed the problem of
finding the discrete logarithm, i.e. suppose we have $x = a^y$ then the discrete log
of x with base a is y. Kitaev was able to generalize these problems in terms of
the Abelian stabilizer problem [102]. Let G be any group acting on a finite set X.
Each element g of G acts as a map $g : X\to X$ such that for all $g_1, g_2\in G$ we
have $g_1(g_2(x)) = (g_1\cdot g_2)(x)$ where · is the group operation. The stabilizer $S_G(x)$ of
$x\in X$ is the subgroup of G with s(x) = x for all $s\in S_G(x)$. The problem is to find
the stabilizer for a given x. When G is abelian then we have the Abelian stabilizer
problem.

Now the algorithms of Deutsch, Simon, Shor and Kitaev as well as others can be
formulated group theoretically as a hidden subgroup problem [93, 100, 121]. Let f
be a function from a finitely generated group G to a finite set X such that f is
constant on the cosets of a subgroup K and distinct on each coset. The cosets of K
are the sets
$$g\cdot K := \{g\cdot k \mid k\in K\},\qquad g\in G.$$
The cosets partition G, i.e. the union of all the cosets is the set of the group G and
every two cosets are equal or their intersection is empty. Thus we write $f : G\to X$
and
$$K = \{k\in G \mid f(k\cdot g) = f(g)\ \forall g\in G\}.$$
The problem is, for given f and G, to determine the hidden subgroup K.

We describe briefly how the above mentioned problems can be expressed in terms
of the hidden subgroup problem.

Deutsch's problem. We set $G = \mathbb{Z}_2 = \{0,1\}$ with the group operation · = ⊕, the
XOR operation. The only subgroups are {0} and {0,1}. If K = {0} then f is
balanced and if K = {0,1} then f is constant.

Simon's problem. For Simon's problem we have $G = \{0,1\}^n$ with · = ⊕, the bitwise
XOR operation. Simon's problem requires that f(x) = f(y) if and only if x = y or
$x = y\oplus s$. Immediately we see that K = {0, s}.

Shor's factorization algorithm. Let $f_{a,N}(x) := a^x \bmod N$ where a is given and is
coprime with N and $x\in G = \mathbb{Z}_N$, the group of integers with addition modulo N. In
this case K = {0, r, 2r, ..., kr} where r is the period of $f_{a,N}$ and kr is the greatest
multiple of r which is less than N.

Discrete logarithm. Let G be the group $\mathbb{Z}_r\times\mathbb{Z}_r$ where $\mathbb{Z}_r$ is the additive group of
integers modulo r. Let a, b with $b = a^m$ be given. We wish to find m. Further,
let f be defined by $f(x,y) = a^xb^y$. Obviously $f(x,y) = a^{x+my}$. We assume
that it is known that a is of order r, i.e. $a^r = 1$. Thus $f(x_1,y_1) = f(x_2,y_2)$ if and
only if
$$x_1 - x_2 = -m(y_1 - y_2) \bmod r.$$
Equivalently $f(x,y) = f(x\cdot s, y\cdot t)$ if and only if $s = -mt \bmod r$. The hidden
subgroup (which is used to determine m) is
$$K = \{(k, -km) \mid k = 0, 1, \ldots, r-1\}.$$

Abelian stabilizer problem. Let G be a group acting on a finite set X. Let $f : G\to X$
be defined by f(g) = g(x) for a given $x\in X$. Here $K = S_G(x)$.
The quantum Fourier transform is an important component of quantum algorithms.
Defining a quantum Fourier transform on an Abelian group [93, 99, 100] is necessary
for a description of the algorithm to solve the Abelian subgroup problem. Suppose
$U_f$ is the unitary operation which implements f, i.e.
$$U_f(|g\rangle\otimes|0\rangle) = |g\rangle\otimes|f(g)\rangle$$
where $g\in G$. For any $k_1, k_2\in K$ we have $f(g\cdot k_1) = f(g\cdot k_2)$. Define for $g\in G$
$$|g\cdot K\rangle := \frac{1}{\sqrt{|K|}}\sum_{k\in K}|g\cdot k\rangle.$$
Thus (f is constant on a coset of K)
$$U_f(|g\cdot K\rangle\otimes|0\rangle) = |g\cdot K\rangle\otimes|f(g)\rangle.$$


If we apply $U_f$ to the superposition
$$|G\rangle\otimes|0\rangle = \sum_{g\in G}|g\rangle\otimes|0\rangle$$
and measure the function value, the measurement projects the first register onto
one of the cosets of K (since the cosets form a partition of G). From the coset we
would like to determine K. The states $|g\cdot k\rangle$ are all displaced by g with respect
to the group operation. We can associate the idea of a periodic sequence $g\cdot k$
where the "period" of the sequence is the generator of the subgroup K. Thus we
can try to apply a transform analogous to the quantum Fourier transform. The
transform is constructed using techniques from group representation theory. For
more information see Jozsa [99, 100].

The above mentioned problems are all defined with Abelian groups. The construction
of the Fourier transform is also for Abelian groups. Non-Abelian hidden subgroup
problems create more difficulties; for some results on these problems see for
example Ivanyos et al. [95], and Rotteler and Beth [140].

21.6 Unstructured Search (Grover's Algorithm)


In some problems it is necessary to determine if any element in a set S satisfies a
certain property. An immediate example is the problem of satisfiability. Suppose
$P(x_1, x_2,\ldots,x_n)$ is a predicate on n boolean variables; P can be satisfied if some
combination of assignments to the boolean variables results in P being true. If 0
represents false and 1 represents true, we can directly associate bit sequences with
assignments to the n boolean variables in the predicate. The task is to determine
if any bit sequence causes the predicate to be true. Classically every assignment
might have to be checked until one satisfies the predicate. This process is of order
$O(2^n)$. Grover [78, 79] proposed a solution that is of order $O(2^{n/2})$. Let
$$|\psi_0\rangle := \frac{1}{\sqrt{2^{n+1}}}\sum_{x\in\{0,1\}^n}|x\rangle\otimes(|0\rangle - |1\rangle).$$
We use the notation
$$\bigotimes_n U := \underbrace{U\otimes U\otimes\cdots\otimes U}_{n\ \text{times}}.$$

Now suppose the unitary operator $U_P$ performs the operation
$$U_P|x\rangle\otimes|y\rangle = |x\rangle\otimes|y\oplus P(x)\rangle.$$
Denote by $X_T$ the set of all bit sequences of length n which satisfy P, and by $X_F$
the set of all bit sequences of length n which do not satisfy P. Thus applying $U_P$
to $|\psi_0\rangle$ gives
$$U_P|\psi_0\rangle = \frac{1}{\sqrt{2^{n+1}}}U_P\left(\sum_{x\in X_T}|x\rangle\otimes|0\rangle - \sum_{x\in X_T}|x\rangle\otimes|1\rangle + \sum_{x\in X_F}|x\rangle\otimes|0\rangle - \sum_{x\in X_F}|x\rangle\otimes|1\rangle\right)$$
$$= \frac{1}{\sqrt{2^{n+1}}}\left(\sum_{x\in X_T}|x\rangle\otimes|1\rangle - \sum_{x\in X_T}|x\rangle\otimes|0\rangle + \sum_{x\in X_F}|x\rangle\otimes|0\rangle - \sum_{x\in X_F}|x\rangle\otimes|1\rangle\right)$$
$$= \frac{1}{\sqrt{2^{n+1}}}\left(\sum_{x\in X_F}|x\rangle - \sum_{x\in X_T}|x\rangle\right)\otimes(|0\rangle - |1\rangle) =: |\psi_1\rangle.$$

The amplitudes of the bit sequences satisfying P are negative. If the last qubit had
only been $|0\rangle$, the sequences satisfying P would be marked with a $|1\rangle$ in the last
qubit, but measuring would yield any sequence with equal probability; in general
obtaining a sequence which satisfies P would have low probability. The state $|\psi_1\rangle$ is
not much better, but can be manipulated to increase the probability of measuring
a sequence which satisfies P. The next step is to increase the absolute value of the
amplitudes of the elements of $X_T$, i.e. those elements in the superposition with negative amplitudes.
This is done with the inversion around average operation. This operation maps each
amplitude $a_i$ to $2A - a_i$ where A is the average of all the amplitudes. We note
$$2A - a_i \equiv A + (A - a_i)$$
which explains the terminology "inversion around the average". The operation is
represented by the $2^n\times 2^n$ unitary matrix $U_{IA}$ (where IA indicates inversion about
the average)
$$(U_{IA})_{ij} := \frac{2}{2^n} - \delta_{ij},\qquad i,j = 1,2,\ldots,2^n.$$

For n = 1 and n = 2 we have the matrices
$$U_{IA} = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix},\qquad
U_{IA} = \begin{pmatrix}
-\frac12 & \frac12 & \frac12 & \frac12\\
\frac12 & -\frac12 & \frac12 & \frac12\\
\frac12 & \frac12 & -\frac12 & \frac12\\
\frac12 & \frac12 & \frac12 & -\frac12
\end{pmatrix}.$$

The operator $U_{IA}$ can be written as
$$U_{IA} = \begin{pmatrix}\frac{2}{2^n} & \cdots & \frac{2}{2^n}\\ \vdots & & \vdots\\ \frac{2}{2^n} & \cdots & \frac{2}{2^n}\end{pmatrix} - I_{2^n}
= \left(\bigotimes_n U_H\right)\mathrm{diag}(2,0,\ldots,0)\left(\bigotimes_n U_H\right) - I_{2^n}
= \left(\bigotimes_n U_H\right)\mathrm{diag}(1,-1,\ldots,-1)\left(\bigotimes_n U_H\right)$$

where $I_{2^n}$ is the $2^n\times 2^n$ unit matrix. If only one state satisfies P, the inversion about
average operation inverts and increases the amplitude of the state with negative
amplitude while the other amplitudes decrease. The process is repeated (calculate
P on the bit sequences with the amplitudes of those states satisfying P negative, and
inversion about the average) $\frac{\pi}{4}\sqrt{2^n}$ times for a greater than 50% chance of obtaining
the state which satisfies P [26]. The algorithm also works when more than one x
satisfies P [26]. Unlike classical algorithms, applying the process further will lead
to a decrease in the probability of measuring the required state. This is due to
the fact that the states are normalized, and operations perform a rotation in the
state space. Since iterations of the algorithm always perform the same rotation, the
rotation must at some stage necessarily move away from the desired state, although
it may approach the desired state again under further iteration of the algorithm. We
can think of the algorithm as the rotation of a ray from the origin to the surface of
the unit ball. For the case of a single qubit with real amplitudes the rotation is on
the unit circle. Figure 21.3 gives the network representation of the algorithm.

Figure 21.3: Network Representation of Grover's Algorithm
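
The behaviour of the iteration is easy to reproduce numerically. The following minimal sketch in plain C++ (n = 10 and the satisfying sequence x0 = 123 are assumptions for the example) applies the phase flip and the inversion about the average directly to the $2^n$ amplitudes of the first register.

// grover.cpp
// A minimal sketch of Grover iterations on the first-register
// amplitudes only: the oracle negates the amplitude of the single
// satisfying sequence x0, and inversion about the average maps
// each amplitude a_i to 2A - a_i.
#include <iostream>
#include <cmath>
#include <vector>
using namespace std;

int main(void)
{
 const int n = 10, N = 1 << n, x0 = 123; // P(x) true only for x0
 const double pi = 4.0*atan(1.0);
 vector<double> a(N, 1.0/sqrt(double(N)));

 int steps = int(pi/4.0*sqrt(double(N)));
 for(int s=0;s<steps;s++)
 {
  a[x0] = -a[x0];              // phase flip of the state satisfying P
  double A = 0.0;              // average of all amplitudes
  for(int i=0;i<N;i++) A += a[i];
  A /= N;
  for(int i=0;i<N;i++) a[i] = 2.0*A - a[i]; // inversion about average
 }
 cout << "iterations = " << steps
      << "  P(measure x0) = " << a[x0]*a[x0] << endl; // close to 1
 return 0;
}

Running further iterations shows the probability decreasing again, in agreement with the rotation picture described above.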

The algorithm has been generalized for the case when the amplitudes of the states
in the superposition are initially in an arbitrary configuration [22].

Bennett et al. [20] found lower bounds for the unstructured search problem and
proved that a square root speed-up (obtained by Grover's algorithm) is optimal.

Consider any quantum algorithm A for solving the unstructured search problem.
First we do a test run of A on the function $f\equiv 0$. Define the query magnitude of x
to be $\sum_t |\alpha_{x,t}|^2$, where $\alpha_{x,t}$ is the amplitude with which A queries x at time t. The
expectation value of the query magnitude is $T/2^n$, where T is the number of steps. Thus there exists an x with
$$\sum_{t=1}^{T}|\alpha_{x,t}|^2 \leq \frac{T}{2^n}.$$
For such x, the Cauchy-Schwarz inequality gives
$$\sum_{t=1}^{T}|\alpha_{x,t}| \leq \sqrt{T\sum_{t=1}^{T}|\alpha_{x,t}|^2} \leq \frac{T}{\sqrt{2^n}}.$$
Let the states of the algorithm A run on f be $|\phi_0\rangle, |\phi_1\rangle,\ldots,|\phi_T\rangle$. We run the
algorithm A on the function
$$g(x) := \begin{cases}1 & x = y\\ 0 & x\neq y.\end{cases}$$
Suppose the final state of A run on g is $|\psi_T\rangle$. Then [20] $\||\phi_T\rangle - |\psi_T\rangle\|$ must be
small whenever y is chosen with small query magnitude, so that for T much smaller than $\sqrt{2^n}$ the algorithm cannot reliably distinguish g from f. Here $\||x\rangle\|$ denotes the norm $\sqrt{\langle x|x\rangle}$.


The algorithm has also been proposed as a fast database search. This can be achieved
as follows. Suppose we search for the data relating to an item $x_{id}$. The predicate
$P_{id}$ will search for a match for $x_{id}$. For simplicity we assume that there are $2^n$ items
stored in the database. It is simple to construct the n bit quantum register
$$|s\rangle := \frac{1}{\sqrt{2^n}}\sum_{x\in\{0,1\}^n}|x\rangle.$$
Now we prepare the database state as a tensor product of all items in the database,
the identifier of the item we are searching for, a qubit to store the search result and
$|s\rangle$. A superposition state of all the items in the database could also be used, but this
would require determining for each bit sequence if the item is in the database, which
reduces the efficiency to no better than classical. We also assume that the database
is maintained in this tensor product form since constructing the quantum database
each time for a search again reduces the efficiency to no better than classical. The
initial state is
$$|\psi\rangle := \left(\bigotimes_{j}|x_j\rangle\otimes|d_j\rangle\right)\otimes|x_{id}\rangle\otimes|0\rangle\otimes|s\rangle.$$
This state associates the data $|d_j\rangle$ with the identifier $|x_j\rangle$. We define the unitary
operator $U_P'$ to set the search-result qubit when the register $|s\rangle$ holds an identifier
whose item matches $x_{id}$. Thus applying $U_P'$ to $|\psi\rangle$ marks the matching identifier
in the superposition $|s\rangle$. From this point the algorithm proceeds as before. The probability of registering the
second quantum register as the id representing $x_{id}$ is greater than $\frac12$ and identifies
the element in the database to examine.

Grover's algorithm has an interesting property when $\frac14$ of the states in the superposition
are states satisfying P. We consider again the state $|\psi_1\rangle$
$$|\psi_1\rangle = \frac{1}{\sqrt{2^{n+1}}}\left(\sum_{x\in X_F}|x\rangle - \sum_{x\in X_T}|x\rangle\right)\otimes(|0\rangle - |1\rangle).$$
Now we have $|X_F| = \frac34 2^n$ and $|X_T| = \frac14 2^n$. Applying $U_{IA}$ to the first n qubits (i.e.
we apply $U_{IA}\otimes I$) will ensure that measurement of the first n qubits will yield a
state from $X_T$. To see this we calculate the average A of the amplitudes of the first
n qubits. We find
$$A = \frac{1}{2^n}\left(\frac34 2^n\frac{1}{\sqrt{2^n}} - \frac14 2^n\frac{1}{\sqrt{2^n}}\right) = \frac{1}{2\sqrt{2^n}}.$$
Since each amplitude of a state in $X_F$ is $\frac{1}{\sqrt{2^n}}$, these amplitudes become $2A - \frac{1}{\sqrt{2^n}} = 0$.
The amplitudes of states in $X_T$ are $-\frac{1}{\sqrt{2^n}}$. These amplitudes become $2A + \frac{1}{\sqrt{2^n}} = \frac{2}{\sqrt{2^n}}$.
In this case a single iteration of Grover's algorithm guarantees success.

Brassard and Høyer [27] discuss a related algorithm which works more generally,
when the probability of measuring a state which satisfies P is $\frac12$. Let $|\psi\rangle$ be the
state
$$|\psi\rangle := |X_T\rangle + |X_F\rangle,$$
where $|X_T\rangle$ is the superposition of all states satisfying P and $|X_F\rangle$ is the superposition
of all states not satisfying P. There is no constraint on the amplitudes of the
states, except that $|\psi\rangle$ must be normalized. Obviously we must have $\langle X_T|X_F\rangle = 0$.
Suppose $\langle X_T|X_T\rangle = t$, then $\langle X_F|X_F\rangle = 1 - t$. The algorithm transforms $|\psi\rangle$ to
$|\psi'\rangle$ with
$$|\psi'\rangle = (2i(1-t) - 1)|X_T\rangle + i(1-2t)|X_F\rangle.$$
If $t = \frac12$ then the amplitudes of all states in $X_F$ are zero. If t = 0 the algorithm
changes the global phase only. Let A be a quantum algorithm which evolves $|0\rangle$ (an
appropriate tensor product of $|0\rangle$ qubits) into the superposition $|\psi\rangle$, represented as

the unitary operator $U_A$. Instead of multiplying amplitudes of $|X_T\rangle$ by −1 we need
to multiply them by i. We achieve this by applying
$$U_P\left(I_{2^n}\otimes\begin{pmatrix}1 & 0\\ 0 & i\end{pmatrix}\right)U_P$$
to $|\psi\rangle\otimes|0\rangle$ where
$$\begin{pmatrix}1 & 0\\ 0 & i\end{pmatrix}$$
is the single qubit phase change gate, ignoring the global phase. Similarly we need
$S_0$ which takes the state $|0\rangle$ (n qubits) to $i|0\rangle$. The transform we use is described
by
$$G = U_AS_0U_A^{-1}S_P.$$
We need to calculate $\langle X_T|G|X_T\rangle$ and $\langle X_F|G|X_F\rangle$. We have
$$S_P(|X_T\rangle + |X_F\rangle) = i|X_T\rangle + |X_F\rangle.$$

The following calculations lead to the desired result for $G|\psi\rangle$:
$$U_A|0\rangle = |X_T\rangle + |X_F\rangle$$
$$U_A^{-1}|X_F\rangle = |0\rangle - U_A^{-1}|X_T\rangle$$
$$U_A^{-1}S_P|\psi\rangle = iU_A^{-1}|X_T\rangle + |0\rangle - U_A^{-1}|X_T\rangle = (i-1)U_A^{-1}|X_T\rangle + |0\rangle.$$
We need to determine $S_0U_A^{-1}S_P|\psi\rangle$. First we calculate $\langle 0|U_A^{-1}|X_T\rangle = \langle X_T|U_A|0\rangle = t$.
Thus we can define $|\phi\rangle := U_A^{-1}|X_T\rangle - t|0\rangle$ orthogonal to $|0\rangle$ so that
$$U_A^{-1}|X_T\rangle = t|0\rangle + |\phi\rangle$$
$$U_A|\phi\rangle = (1-t)|X_T\rangle - t|X_F\rangle$$
$$S_0U_A^{-1}S_P|\psi\rangle = (i-1)(it|0\rangle + |\phi\rangle) + i|0\rangle$$
$$U_AS_0U_A^{-1}S_P|\psi\rangle = it(i-1)(|X_T\rangle + |X_F\rangle) + (i-1)((1-t)|X_T\rangle - t|X_F\rangle) + i(|X_T\rangle + |X_F\rangle)$$
$$= (2i(1-t) - 1)|X_T\rangle + i(1-2t)|X_F\rangle.$$
Thus we obtain the desired result. This technique is used to remove $|0\rangle$ from the
superposition in the exact quantum algorithm for the solution of Simon's problem
since at least one qubit has probability $\frac12$ of being measured as $|1\rangle$ and every other
qubit has either probability 0 or $\frac12$.

21.7 Quantum Key Distribution


In cryptography systems a key is used to decipher an encrypted message. It is
necessary to guarantee security when transmitting keys, otherwise the encryption
system achieves nothing. In the case of public key cryptography systems, a public
key transmitted securely gives a third party even less information to attempt to
decode messages. Thus it would be useful to be able to transmit keys securely. In
classical terms this cannot be guaranteed. A third party can obtain the key by
copying the key while it is transmitted. Due to the no cloning theorem, copying
of quantum states can cause disturbance in the states. Transmitting a key using
quantum states can thus be used to detect a third party attempting to copy the key.

Let $B_0$ denote the basis $\{|0\rangle, |1\rangle\}$ and $B_1$ denote the basis
$$\left\{\frac{1}{\sqrt2}(|0\rangle + |1\rangle),\ \frac{1}{\sqrt2}(|0\rangle - |1\rangle)\right\}.$$
Suppose Alice transmits the key and Bob is to receive the key. Alice and Bob agree
to use one to one mappings $f_0 : B_0\to\{0,1\}$ and $f_1 : B_1\to\{0,1\}$ to uniquely
convert between 0 and 1 and a given basis.

Figure 21.4: Quantum Key Distribution



For each bit in the key Alice randomly chooses a basis from $B_0$ and $B_1$ and sends the
quantum state from that basis which corresponds to the bit. Bob randomly chooses
a basis from $B_0$ and $B_1$ and measures the quantum state he receives relative to this
basis. On average 50% of the time the basis chosen by Alice and Bob will be the
same. After Bob has received all the bits, Alice and Bob communicate on an open
channel to determine which quantum states were prepared and measured using the
same basis. This determines which bits are used, since Alice and Bob have the same
bit values in these cases. Suppose now a third party Eve attempts to obtain the
key from the quantum states sent by Alice to Bob. Eve attempts to measure the
states being sent from Alice to Bob by randomly choosing a basis from $B_0$ and $B_1$,
which she chooses correctly 50% of the time. Eve then resends the quantum state
using the basis she guessed and the value she measured. When Bob uses the same
basis as Alice when Eve does not, Bob will measure the correct value 50% of the
time. This means that when Alice and Bob use the same basis while Eve attempts
to obtain the key, Bob will obtain an incorrect value 25% of the time. Alice and
Bob agree, after sending all the quantum states, to use a number of the states where
the same basis was used to determine if someone tried to obtain the key. If enough
states are used, an error rate larger than (say) 5% may be agreed to indicate that the
transmission was potentially influenced by a third party and is insecure. The rest
of the corresponding states are used as the key if the error rate is low enough.
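
The 25% error rate introduced by an intercept-resend attack can be estimated with a simple classical simulation. The following minimal sketch in plain C++ models each measurement in the wrong basis as a uniformly random outcome; the number of transmitted states and the seed are assumptions for the example.

// bb84.cpp
// A minimal sketch of the key distribution protocol with an
// intercept-resend eavesdropper. Measuring a state in the wrong
// basis yields a uniformly random bit; we count the error rate on
// the positions where Alice and Bob used the same basis (about 25%
// when Eve is present).
#include <iostream>
#include <cstdlib>
using namespace std;

int main(void)
{
 const int bits = 100000;
 int same = 0, errors = 0;
 srand(17);
 for(int i=0;i<bits;i++)
 {
  int abit = rand()%2, abase = rand()%2;
  // Eve measures in a random basis and resends
  int ebase = rand()%2;
  int ebit = (ebase==abase) ? abit : rand()%2;
  // Bob measures in a random basis
  int bbase = rand()%2;
  int bbit = (bbase==ebase) ? ebit : rand()%2;
  if(abase==bbase) { same++; if(bbit!=abit) errors++; }
 }
 cout << "error rate = " << double(errors)/same << endl; // ~ 0.25
 return 0;
}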

Other techniques have been described for quantum key distribution based on Bell's
inequality [61], measurement uncertainty [12, 14, 15], and a distribution scheme
where quantum states are reused [36] using entanglement. These schemes rely on
the fact that the third party (Eve) cannot pretend to be Alice and Bob, i.e. Eve can
only inspect the quantum states which are sent, and cannot receive a state and send
another. If this were not the case Eve could impersonate Bob and Alice without
either party knowing.

21.8 Dense Coding


Dense coding [16, 138, 156] uses an EPR pair and a single qubit to transmit two
classical bits of information. The EPR pair is shared ahead of time between the
transmitter and receiver. Only a single qubit is needed to transfer the two classical
bits of information. This is possible due to entanglement. Entanglement makes it
possible to transform a two qubit state to one of four orthogonal states by inter-
acting with only one qubit. This choice can be identified with two classical bits
of information. Dense coding illustrates the use of entanglement as an information
resource. The scheme is also secure in the sense that communication is only possible
between the two parties which share the EPR pair.

The transmitter (Alice) and the receiver (Bob) each have one quantum subsystem
which together form the quantum system of an already prepared EPR state

$$|\phi\rangle := \frac{1}{\sqrt2}(|00\rangle + |11\rangle) \equiv \frac{1}{\sqrt2}(|0\rangle\otimes|0\rangle + |1\rangle\otimes|1\rangle).$$
We let the first system denote Alice's quantum subsystem (qubit) of the EPR state
and the second system denote Bob's quantum subsystem (qubit) of the EPR state.
Alice can transform $|\phi\rangle$ to any one of the Bell basis states according to
$$\left(\begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}\otimes I_2\right)|\phi\rangle = \frac{1}{\sqrt2}(|00\rangle + |11\rangle) \equiv \Phi^+$$
$$\left(\begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}\otimes I_2\right)|\phi\rangle = \frac{1}{\sqrt2}(|10\rangle + |01\rangle) \equiv \Psi^+$$
$$\left(\begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}\otimes I_2\right)|\phi\rangle = \frac{1}{\sqrt2}(|00\rangle - |11\rangle) \equiv \Phi^-$$
$$\left(\begin{pmatrix}0 & 1\\ -1 & 0\end{pmatrix}\otimes I_2\right)|\phi\rangle = \frac{1}{\sqrt2}(-|10\rangle + |01\rangle) \equiv \Psi^-$$
where $I_2$ is the 2×2 unit matrix.



Alice has two bits representing the values 0, 1, 2 or 3. She transforms $|\phi\rangle$ according
to the following table.

Value   Initial state   Transformed state
0       $|\phi\rangle$         $\frac{1}{\sqrt2}(|00\rangle + |11\rangle)$
1       $|\phi\rangle$         $\frac{1}{\sqrt2}(|10\rangle + |01\rangle)$
2       $|\phi\rangle$         $\frac{1}{\sqrt2}(-|10\rangle + |01\rangle)$
3       $|\phi\rangle$         $\frac{1}{\sqrt2}(|00\rangle - |11\rangle)$

Table 21.1: Dense Coding: Alice's Transformations

The transformations are obviously unitary. Alice then sends her qubit to
Bob. Now Bob applies a controlled NOT using the first (Alice's) qubit as the control
and then applies the Hadamard transform to the first qubit. Finally a controlled
NOT is applied to yield the data. The following table describes the quantum state
after the transformation.

Value   Initial state                          Transformed state   Final state
0       $\frac{1}{\sqrt2}(|00\rangle + |11\rangle)$    $|00\rangle$          $|00\rangle$
1       $\frac{1}{\sqrt2}(|10\rangle + |01\rangle)$    $|01\rangle$          $|01\rangle$
2       $\frac{1}{\sqrt2}(-|10\rangle + |01\rangle)$   $|11\rangle$          $|10\rangle$
3       $\frac{1}{\sqrt2}(|00\rangle - |11\rangle)$    $|10\rangle$          $|11\rangle$

Table 21.2: Dense Coding: Bob's Transformations

Thus the transformed state uniquely determines the value 0,1,2 or 3.
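
The two tables can be verified with a short calculation. The following minimal sketch in plain C++ applies each of Alice's four encodings to the EPR state and then Bob's decoding sequence (controlled NOT, Hadamard transform on the first qubit, controlled NOT) to the four real amplitudes.

// densecoding.cpp
// A minimal sketch verifying the two tables: Alice applies one of
// the four encodings to (|00>+|11>)/sqrt(2); Bob applies a
// controlled NOT, the Hadamard transform on the first qubit, and a
// final controlled NOT. The result is the basis state |value>.
#include <iostream>
#include <cmath>
using namespace std;

void cnot(double v[4]) { double t=v[2]; v[2]=v[3]; v[3]=t; }
void hfirst(double v[4])
{
 double s=1.0/sqrt(2.0);
 for(int y=0;y<2;y++)
 {
  double a=v[y], b=v[2+y];
  v[y]=s*(a+b); v[2+y]=s*(a-b);
 }
}

int main(void)
{
 double s = 1.0/sqrt(2.0);
 // Alice's four transformed states (Table 21.1)
 double enc[4][4] = { { s, 0, 0, s },   // value 0
                      { 0, s, s, 0 },   // value 1
                      { 0, s,-s, 0 },   // value 2
                      { s, 0, 0,-s } }; // value 3
 for(int value=0;value<4;value++)
 {
  double v[4];
  for(int i=0;i<4;i++) v[i]=enc[value][i];
  cnot(v); hfirst(v); cnot(v); // Bob's decoding
  for(int i=0;i<4;i++)
   if(fabs(v[i])>1e-12)
    cout << "value " << value << " -> |"
         << (i>>1) << (i&1) << ">" << endl;
 }
 return 0;
}

The output reproduces the final-state column of Table 21.2.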


Chapter 22
Quantum Information Theory

22.1 Introduction
The concepts of classical information theory can be extended to quantum infor-
mation theory. Since in general a measurement yields a given result only with some probability, we
may suggest using these probabilities in classical information theory. However the
probabilities do not contain phase information, which cannot be neglected. Thus the
definitions are given in terms of the density operator. These probabilities depend on
the basis used for measurement. A density operator ρ over an n-dimensional Hilbert
space H is a positive operator with unit trace. The trace tr(A) is defined as
$$\mathrm{tr}(A) := \sum_{j=1}^{n}\langle\beta_j|A|\beta_j\rangle$$
where $|\beta_j\rangle$ for j = 1, ..., n is any orthonormal basis in H. Thus tr(ρ) = 1. The
eigenvalues of a density operator are greater than or equal to zero. By the spectral theorem
every density operator can be represented as a mixture of pure states
$$\rho = \sum_{j=1}^{n}p_j|\alpha_j\rangle\langle\alpha_j|$$
where $|\alpha_j\rangle$ for j = 1, ..., n are the orthonormal eigenvectors of ρ (which form a
basis in H), and
$$p_j\in\mathbb{R},\qquad p_j\geq 0,\qquad \sum_{j=1}^{n}p_j = 1.$$
The eigenvalue $p_j$ is the probability of finding the state $|\alpha_j\rangle$.


22.2 Von Neumann Entropy


The Von Neumann entropy of a system A, represented by the density operator $\rho_A$,
is defined as [38]
$$S(A) := -\mathrm{tr}_A(\rho_A\log_2\rho_A).$$
The log function is base 2, following the convention of classical information theory,
and gives the qubit as the natural unit of information. Pure states have density
operators which are projection operators, their eigenvalues are 0 and 1, and so the
Von Neumann entropy for a pure state is always zero.
Example. We consider the Werner state.

The Von Neumann entropy is

For two quantum systems, A and B, the combined entropy is given by


S(AB) := -trAB(PAB log2 PAB).
The conditional and mutual entropies (S(AIB) and S(A : B) respectively) on the
quantum systems A and B are defined as [38, 39]
S(AIB) = -trAB(PAB log2 PAIB)
and
S(A : B) = -trAB(PAB log2 PA:B)
where
PAIB

It can be verified that the following identities hold:

S(AB) = S(A) + S(B|A)

and

S(A : B) = S(A) + S(B) − S(AB).
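
As a quick consistency check of the last identity (our own example, not from the text), consider a product state ρ_AB = ρ_A ⊗ ρ_B with diagonal marginals. The eigenvalues of the tensor product are the products of the eigenvalues, so S(AB) = S(A) + S(B) and the mutual entropy vanishes.

#include <cmath>
#include <cstdio>
#include <vector>

double entropy(const std::vector<double>& p){
  double s = 0.0;
  for(double x : p) if(x > 0.0) s -= x*std::log2(x);
  return s;
}

int main(){
  std::vector<double> a = { 0.7, 0.3 }, b = { 0.5, 0.5 };
  std::vector<double> ab;                        // eigenvalues of rho_A (x) rho_B
  for(double x : a) for(double y : b) ab.push_back(x*y);
  double SA = entropy(a), SB = entropy(b), SAB = entropy(ab);
  std::printf("S(A)=%f S(B)=%f S(AB)=%f\n", SA, SB, SAB);
  std::printf("S(A:B) = S(A)+S(B)-S(AB) = %f\n", SA + SB - SAB); // 0 for a product state
  return 0;
}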

22.3 Measures of Entanglement

22.3.1 Bell's Inequality


Bell's inequality [40, 128, 129, 145, 170] was initially used to show that a local hidden
variable model of quantum mechanics is inadequate. The assumption of a local hidden
variable model leads to an inequality that must be satisfied, but which experiment
shows to be violated. As an example, consider a correlation experiment in which
a particle of total spin zero decays into two particles each with spin ½. A rotatable
polarizer and detector are set up for each particle, sufficiently far from the source,
so that the correlation between the spin orientations can be investigated. The first
polarizer, with angular setting α, only lets the first particle through if its spin in
the direction n_α takes the value ½. The second polarizer, with angular setting β,
lets the second particle through only if its spin in the direction n_β takes the value ½.
The detectors respond if the spin is positive, otherwise it is negative. A measure of
the correlation is N(α, β), defined by the relative number of experiments resulting
in the first particle registering as positive with the first polarizer at angle α, and the
second as positive with the second polarizer at angle β. We have

N(α, β) = N₊(α, γ, β) + N₋(α, γ, β)

where N_s(α, γ, β) is the relative number of experiments where the first particle has
positive spin with the polarizer at angle α and negative with β, and spin s with the
polarizer at angle γ. In a local hidden variable theory, these quantities are available.
Since

N(α, γ) = N₊(α, β, γ) + N₋(α, β, γ),

N(γ, β) = N₊(γ, α, β) + N₋(γ, α, β),

we have N(α, γ) ≥ N₋(α, β, γ) and N(γ, β) ≥ N₊(γ, α, β). Thus

N(α, β) ≤ N(α, γ) + N(γ, β).

For coplanar detectors, α = 0, β = π/2 and γ = π/4 the inequality is not satisfied.
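
The violation can be checked numerically. The sketch below (ours) uses the standard quantum mechanical prediction for the singlet state, N(α, β) = ½ sin²((α − β)/2), which the text does not derive and which we state here as an assumption.

#include <cmath>
#include <cstdio>

// Quantum mechanical prediction for the singlet correlation experiment
// (both detectors register positive): N(a,b) = (1/2) sin^2((a-b)/2)
double N(double a, double b){ double s = std::sin(0.5*(a-b)); return 0.5*s*s; }

int main(){
  const double pi = 3.14159265358979323846;
  double alpha = 0.0, beta = 0.5*pi, gamma = 0.25*pi;
  double lhs = N(alpha, beta), rhs = N(alpha, gamma) + N(gamma, beta);
  std::printf("N(alpha,beta) = %f, N(alpha,gamma)+N(gamma,beta) = %f\n", lhs, rhs);
  std::printf("Bell inequality %s\n", lhs <= rhs ? "satisfied" : "violated");
  return 0;
}

Running the sketch gives 0.25 on the left-hand side against approximately 0.146 on the right, so the inequality fails for these settings.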

A variant of Bell's inequality derived by Clauser, Horne, Shimony and Holt is given
by [127]

|⟨AB⟩ + ⟨AB′⟩ + ⟨A′B⟩ − ⟨A′B′⟩| ≤ 2.

The operators A and A′ (respectively B and B′) are normalized, noncommuting
and can be measured by an observer. The expectation values can be calculated if
the quantum state is known, and they are also experimentally observable, by repeating
the measurements sufficiently many times with identically prepared initial pairs of
quantum systems. The validity of this inequality for all combinations of independent
measurements on both systems is necessary, although not sufficient, for the existence
of a local hidden variable model.

The nonlocal nature of entangled states can be illustrated without a statistical
formulation [25, 186]. Consider the GHZ (Greenberger-Horne-Zeilinger) state

|ψ⟩ := (1/√2)(|001⟩ + |110⟩).

We consider the basis α = {|L⟩, |R⟩} and the basis β = {|H⟩, |V⟩} described by

α: |0⟩ = (1/√2)(|L⟩ + |R⟩), |1⟩ = (1/√2)(|L⟩ − |R⟩),

β: |0⟩ = (1/√2)(|H⟩ + |V⟩), |1⟩ = (1/√2)(|H⟩ − |V⟩).

Expressing only one qubit of |ψ⟩ in the basis α and the rest in the basis β, and
expressing each qubit in the basis α yields

|ψ⟩ = ½(|LHH⟩ + |RVH⟩ − |LVV⟩ − |RHV⟩)   (αββ)
    = ½(|HLH⟩ + |VRH⟩ − |VLV⟩ − |HRV⟩)   (βαβ)
    = ½(|HVL⟩ + |VHL⟩ − |HHR⟩ − |VVR⟩)   (ββα)
    = ½(|LLL⟩ + |RRL⟩ − |LRR⟩ − |RLR⟩)   (ααα)

Measuring in the basis β for each qubit must yield a duplicate result. Measuring
two qubits in the β basis as given in the previous equations allows us to deduce the
result of measuring the other qubit in the α basis. Let βⱼ denote the result after
measuring qubit j in the β basis. As an example, from the first equality, if β₂ = β₃
then the first qubit is |L⟩ and |R⟩ otherwise. Thus we construct the following table.

Outcomes in β basis   Outcomes in α basis

β₁ = β₂ = β₃          LLR
β₁ = β₂ ≠ β₃          RRR
β₁ = β₃ ≠ β₂          RLL
β₂ = β₃ ≠ β₁          LRL

None of the results obtained are consistent with the final equation for |ψ⟩ in the α
basis for all qubits.

22.3.2 Entanglement of Formation


Here we introduce a measure of entanglement for mixed states. It is called the
entanglement of formation [19]. Another measure is distillable entanglement [19].
Let A and B be two quantum systems. We define for the coupled system A ⊗ B

ρ_A := tr_B(ρ_AB)

where tr_B denotes the partial trace over B, i.e. we use I ⊗ |βⱼ⟩ as the basis for the
trace where |βⱼ⟩ is an orthonormal basis in B. The measure of entanglement E(AB)
is then defined as

E(AB) := S(ρ_A).
This describes the entanglement for pure states.

Example. For the state |ψ⟩ := (1/√3)(|00⟩ + |01⟩ + |10⟩) we find

ρ_A = tr_B|ψ⟩⟨ψ| = (1/3)(2|0⟩⟨0| + |0⟩⟨1| + |1⟩⟨0| + |1⟩⟨1|).

Thus

E(AB) = S(ρ_A) = −(½ + √5/6) log₂(½ + √5/6) − (½ − √5/6) log₂(½ − √5/6) ≈ 0.55

where we used the eigenvalues ½ ± √5/6 of ρ_A.
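
The eigenvalues and the entanglement in this example can be computed directly from the 2 × 2 matrix ρ_A (a sketch of ours).

#include <cmath>
#include <cstdio>

int main(){
  // rho_A = (1/3) [[2,1],[1,1]] from the example above
  double a = 2.0/3, b = 1.0/3, d = 1.0/3;          // symmetric 2x2 matrix
  double tr = a + d, det = a*d - b*b;
  double disc = std::sqrt(0.25*tr*tr - det);
  double l1 = 0.5*tr + disc, l2 = 0.5*tr - disc;   // eigenvalues 1/2 +- sqrt(5)/6
  double E = -l1*std::log2(l1) - l2*std::log2(l2); // E(AB) = S(rho_A)
  std::printf("eigenvalues %f %f, E(AB) = %f\n", l1, l2, E);
  return 0;
}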


For mixed states we take the minimum entanglement as defined above over all
possible ensembles of pure states realizing ρ_AB. Thus if

ρ_AB = Σⱼ pⱼ |ψⱼ⟩⟨ψⱼ|

we have

E(AB) := min Σⱼ pⱼ S(ρ_{A,j})

where

ρ_{A,j} = tr_B(|ψⱼ⟩⟨ψⱼ|).

Example. For the Werner state W given in Section 22.2
it has been shown that E(W) ≈ 0.117 [19]. If we use the definition given for pure
states we obtain 1.

22.3.3 Conditions on Entanglement Measures


The states which are not entangled can be written as a convex combination of
tensor products of pure states. We define the set of states which are not entangled,
or separable, for a particular tensor product space ℋ₁ ⊗ ℋ₂ as

S = { ρ | ρ = Σⱼ pⱼ ρ₁,ⱼ ⊗ ρ₂,ⱼ, Σⱼ pⱼ = 1, pⱼ ≥ 0 }

where ρ₁,ⱼ and ρ₂,ⱼ are density operators over ℋ₁ and ℋ₂, respectively.
Thus we can define the entanglement of a state as the minimum distance to any
state in the set S [176, 177]. The Hilbert-Schmidt norm is defined as

‖A‖_HS := √(tr(A*A)).

Thus a measure of entanglement may be defined as [184]

E_HS(ρ) := min_{σ∈S} ‖ρ − σ‖²_HS

where the minimum is taken over all states σ ∈ S which are not entangled.

Three processes are used to increase correlations between two quantum subsystems,
i.e. to distill locally a subensemble of highly entangled states from an original
ensemble of less entangled states.

1. Local general measurements. The measurements are performed separately by
two parties. They are described by the operators Aⱼ and Bⱼ with the condition

Σⱼ Aⱼ*Aⱼ = I and Σⱼ Bⱼ*Bⱼ = I.

The joint action is described by

ρ_AB → Σ_{j,k} (Aⱼ ⊗ Bₖ) ρ_AB (Aⱼ* ⊗ Bₖ*).

2. Classical communication. The actions of the two parties can be correlated.
This can be described by a complete measurement on the whole space. If ρ_AB
describes the initial state shared by the two parties, local general measurement
and classical communication gives

ρ_AB → Σⱼ (Aⱼ ⊗ Bⱼ) ρ_AB (Aⱼ* ⊗ Bⱼ*).

3. Post-selection. The general measurement is not complete. The density matrix

ρⱼ = (Aⱼ ⊗ Bⱼ) ρ_AB (Aⱼ* ⊗ Bⱼ*) / tr((Aⱼ ⊗ Bⱼ) ρ_AB (Aⱼ* ⊗ Bⱼ*))

describes a subensemble of the original ensemble with appropriate normaliza-
tion.

A manipulation involving these techniques constitutes a purification procedure.


There are some properties we would like a measure E of entanglement to have
[176, 177].

1. E(ρ) = 0 if and only if ρ ∈ S.

2. For any unitary transforms U₁ in ℋ₁ and U₂ in ℋ₂,

E(ρ) = E((U₁ ⊗ U₂) ρ (U₁* ⊗ U₂*)).

3. The expected entanglement cannot increase due to local general measurement,
classical communication and post-selection described by Σⱼ Vⱼ*Vⱼ = I, i.e.

Σⱼ tr(σⱼ) E(σⱼ / tr(σⱼ)) ≤ E(σ)

where

σⱼ = Vⱼ σ Vⱼ*.

So if we define the measure of entanglement according to the distance, as above, we
require that the "distance" measure ensures that the entanglement measure obeys
these three properties. Thus the distance does not have to be a metric.
Vedral et al. [176, 177] also give sufficient conditions for the distance measure to
define a measure of entanglement. Ozawa has shown [126] that the Hilbert-Schmidt
norm does not satisfy these conditions. Thus the Hilbert-Schmidt norm may not be
useful for the definition of an entanglement measure.
A related question is to determine which states belong to S. The Peres-Horodecki
criterion [90, 91, 127] gives a necessary and, under certain conditions, sufficient
criterion for separability.
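
For the Bell state (|00⟩ + |11⟩)/√2 the partial transpose of the density matrix has the negative eigenvalue −½, which signals entanglement. The following sketch is ours; the matrix below is the partial transpose over the second qubit written out by hand, and the code verifies that (|01⟩ − |10⟩)/√2 is an eigenvector with eigenvalue −½.

#include <cmath>
#include <cstdio>

int main(){
  // Partial transpose (over the second qubit) of the density matrix of
  // (|00>+|11>)/sqrt(2), in the basis |00>, |01>, |10>, |11>
  double M[4][4] = { {0.5, 0,   0,   0  },
                     {0,   0,   0.5, 0  },
                     {0,   0.5, 0,   0  },
                     {0,   0,   0,   0.5} };
  const double r = 1.0/std::sqrt(2.0);
  double v[4] = { 0, r, -r, 0 }, Mv[4] = { 0, 0, 0, 0 };
  for(int i = 0; i < 4; ++i)
    for(int j = 0; j < 4; ++j) Mv[i] += M[i][j]*v[j];
  for(int i = 0; i < 4; ++i)
    std::printf("(Mv)[%d] = %+.4f,  -0.5*v[%d] = %+.4f\n", i, Mv[i], i, -0.5*v[i]);
  // a negative eigenvalue of the partial transpose indicates entanglement
  return 0;
}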

22.4 Quantum Coding


From classical information theory we have the noiseless and noisy coding theorems.
Schumacher [144] developed the quantum equivalents which we present here. As in
the classical case we wish to transmit information over some sort of channel. An
important difference is due to the no-cloning theorem. Quantum states cannot be
copied across the channel since this would violate the no-cloning theorem. Instead,
for a quantum transmission channel, transposition can be used. For example tele-
portation can be used to transmit information. The sender transposes the source
system M with another system X which can be conveyed to the transmitter. The
system X serves as the channel. The receiver then transposes X with the system
M'. If M and M' are identical, the inverse operation can be used to transfer the
information. In other words, suppose U transposes the state from M to X; then U⁻¹
transposes the state from X to M′:

U(|ψ_M⟩ ⊗ |0_X⟩) = |0_M⟩ ⊗ |ψ_X⟩

U⁻¹(|0_{M′}⟩ ⊗ |ψ_X⟩) = |ψ_{M′}⟩ ⊗ |0_X⟩.
Suppose the state to be transmitted is |a_M⟩. In transmission some information may
be lost (for example due to noise, or a channel with lower dimension than the source
system), so the end state must be represented by a density operator w(a)_{M′} (this
is due to the fact that we need to trace over the portion of the system which is
lost, and the original state may have been entangled). The probability that w(a)_{M′}
represents |a_{M′}⟩ is given by

tr(|a_{M′}⟩⟨a_{M′}| w(a)_{M′}).

Thus a measure of the fidelity would be

F(M, M′) = Σₐ p(a) tr(|a_M⟩⟨a_M| w(a)_{M′})

where p(a) is the probability that a is sent and

Σₐ p(a) = 1.

Two results are required before we can prove the quantum noiseless coding theorem.

Lemma 1. Let the ensemble of signals in M be described by the density operator ρ
with

ρ = Σₐ p(a) |a_M⟩⟨a_M|.

If the quantum channel C has dimension d and any projection Γ onto a d-dimensional
subspace of M has the property

tr(ρΓ) < η

for some fixed η, then F(M, M′) < η.

Since the channel has only dimension d, the final decoded state w(a)_{M′} is only sup-
ported on a d-dimensional subspace of M′. Let Γ denote the projection onto this
subspace. In other words w(a)_{M′} results from a unitary transformation of the sepa-
rable state w(a)_C ⊗ 0_{M′−C} where 0_{M′−C} is the initial state introduced once the state
is transmitted. Let w(a)_{M′,Γ} denote w(a)_{M′} in the subspace. The d eigenstates of
w(a)_{M′,Γ} (denoted by |φ(a)₁⟩, ..., |φ(a)_d⟩) form an orthonormal basis in this subspace.
Let λ(a)₁, ..., λ(a)_d denote the eigenvalues corresponding to these eigenstates. Then
w(a)_{M′,Γ} can be expressed as

w(a)_{M′,Γ} = Σ_{k=1}^{d} λ(a)ₖ |φ(a)ₖ⟩⟨φ(a)ₖ|.

Denote by |ψ(a)ₖ⟩ the state |φ(a)ₖ⟩ ⊗ |0_{Γ⊥}⟩ which is the state |φ(a)ₖ⟩ extended in
M′. The projection operator Γ is given by

Γ = Σ_{k=1}^{d} |ψ(a)ₖ⟩⟨ψ(a)ₖ|.

Now

tr(|a_M⟩⟨a_M| w(a)_{M′}) = tr(|a_M⟩⟨a_M| Σ_{k=1}^{d} λ(a)ₖ |ψ(a)ₖ⟩⟨ψ(a)ₖ|)

≤ tr(|a_M⟩⟨a_M| Σ_{k=1}^{d} |ψ(a)ₖ⟩⟨ψ(a)ₖ|) = tr(|a_M⟩⟨a_M| Γ)

since 0 ≤ λ(a)ₖ ≤ 1.

Thus for the fidelity we have

F(M, M′) = Σₐ p(a) tr(|a_M⟩⟨a_M| w(a)_{M′}) ≤ tr(ρΓ) < η.

Lemma 2. Let the ensemble of signals in M be described by the density operator ρ.
If the quantum channel C has dimension d and there exists a projection Γ onto a
d-dimensional subspace of M which has the property

tr(ρΓ) > 1 − η

for some fixed 0 ≤ η ≤ 1, then there exists a transposition scheme with fidelity
F(M, M′) > 1 − 2η.

The proof is by construction. Let Γ_G be such a projection, projecting to the subspace
G. Let Γ_{G⊥} denote the projection onto the subspace of M orthogonal to G. This
allows us to rewrite |a_M⟩ in a more usable form:

|a_M⟩ = Γ_G|a_M⟩ + Γ_{G⊥}|a_M⟩ = γ_G|a_G⟩ + γ_{G⊥}|a_{G⊥}⟩

where

γ_G := √(⟨a_M|Γ_G|a_M⟩), |a_G⟩ := Γ_G|a_M⟩/γ_G

and

γ_{G⊥} := √(⟨a_M|Γ_{G⊥}|a_M⟩), |a_{G⊥}⟩ := Γ_{G⊥}|a_M⟩/γ_{G⊥}.

Obviously

|γ_G|² + |γ_{G⊥}|² = 1.

Let the dimension of M be N, |1⟩, ..., |d⟩ denote an orthonormal basis in G, and
|d + 1⟩, ..., |N⟩ denote an orthonormal basis in G⊥. Also let |1_C⟩, ..., |d_C⟩ be an
orthonormal basis in C and |(d + 1)_E⟩, ..., |N_E⟩ be an orthonormal basis in E (the
system representing the information lost during transmission). The states in G
will be used for transmission over the channel C. The initial state is prepared as
|a_M⟩ ⊗ |0_C⟩ ⊗ |0_E⟩ where |0_E⟩ is the initial state for the system which represents the
loss of information, and we require ⟨0_E|k_E⟩ = 0 for k = d + 1, ..., N. The following
unitary transformation is used to prepare the state for transmission:

U_{MC}(|k⟩ ⊗ |0_C⟩ ⊗ |0_E⟩) = |0⟩ ⊗ |k_C⟩ ⊗ |0_E⟩,  k = 1, ..., d

U_{MC}(|k⟩ ⊗ |0_C⟩ ⊗ |0_E⟩) = |0⟩ ⊗ |0_C⟩ ⊗ |k_E⟩,  k = d + 1, ..., N.

Applying the transformation to |a_M⟩ gives

|e⟩ := U_{MC}(|a_M⟩ ⊗ |0_C⟩ ⊗ |0_E⟩)

= U_{MC}((γ_G Σ_{k=1}^{d} ⟨k|a_G⟩|k⟩ + γ_{G⊥} Σ_{k=d+1}^{N} ⟨k|a_{G⊥}⟩|k⟩) ⊗ |0_C⟩ ⊗ |0_E⟩)

= γ_G Σ_{k=1}^{d} ⟨k|a_G⟩ |0⟩ ⊗ |k_C⟩ ⊗ |0_E⟩ + γ_{G⊥} Σ_{k=d+1}^{N} ⟨k|a_{G⊥}⟩ |0⟩ ⊗ |0_C⟩ ⊗ |k_E⟩

= γ_G|0⟩ ⊗ |a_C⟩ ⊗ |0_E⟩ + γ_{G⊥}|0⟩ ⊗ |0_C⟩ ⊗ |a_E⟩

where we used

|a_C⟩ := Σ_{k=1}^{d} ⟨k|a_G⟩ |k_C⟩, |a_E⟩ := Σ_{k=d+1}^{N} ⟨k|a_{G⊥}⟩ |k_E⟩.

Only system C is used in transmission so systems M and E must be traced over,
yielding

tr_{M,E}(|e⟩⟨e|) = |γ_G|²|a_C⟩⟨a_C| + |γ_{G⊥}|²|0_C⟩⟨0_C|.

After the transmission the system is augmented with an initial state for the receiving
system, in terms of two systems equivalent to M and G⊥. Thus we work with the
operator

v(a)_{M′} = |γ_G|²|0⟩⟨0| ⊗ |a_C⟩⟨a_C| ⊗ |0_E⟩⟨0_E| + |γ_{G⊥}|²|0⟩⟨0| ⊗ |0_C⟩⟨0_C| ⊗ |0_E⟩⟨0_E|.
For the decoding step we use U_{MC}⁻¹ which gives

w(a)_{M′} = |γ_G|²|a_G⟩⟨a_G| + |γ_{G⊥}|²|0⟩⟨0|.

The fidelity is given by

F(M, M′) = Σₐ p(a) tr(|a_M⟩⟨a_M| w(a)_{M′})

= Σₐ p(a) (|γ_G|²|⟨a_M|a_G⟩|² + |γ_{G⊥}|²(⟨a_M|0⟩)²)

= Σₐ p(a) (|γ_G|⁴ + |γ_{G⊥}|²(⟨a_M|0⟩)²)

≥ Σₐ p(a) (1 − |γ_{G⊥}|²)²

> 1 − 2 Σₐ p(a) |γ_{G⊥}|².

It is given that

tr(ρΓ_G) = Σₐ p(a) ⟨a_M|Γ_G|a_M⟩ > 1 − η.

Thus

Σₐ p(a) |γ_G|² > 1 − η, i.e. Σₐ p(a) |γ_{G⊥}|² < η.

This gives the fidelity

F(M, M′) > 1 − 2η.
Suppose we have a density matrix ρ representing the ensemble of states which may
be transmitted. We can write ρ as

ρ = Σₐ λₐ |a⟩⟨a|

where λₐ and |a⟩ are the eigenvalues and orthonormal eigenstates of ρ. The Von
Neumann entropy is then

S(ρ) = −Σₐ λₐ log₂ λₐ.

For the density matrix ⊗^N ρ of N identical and independent systems the eigenval-
ues and orthonormal eigenstates are given by the products of N eigenvalues and
eigenvectors of ρ.

If we interpret the eigenvalue λₐ as the probability that the eigenstate |a⟩ is trans-
mitted then the Von Neumann entropy is the classical Shannon entropy of these
probabilities, and so following page 206, the number of sequences of N eigenstates
which are likely to be transmitted is bounded above by 2^{N(S(ρ)+δ)} and below by
(1 − ε)2^{N(S(ρ)−δ)}.

Quantum Noiseless Coding Theorem

1. If the quantum channel C has dimension at least 2^{S(ρ)+δ} then there exists
N₀(δ, ε) such that for all N > N₀ sequences of eigenstates of ρ of length N can
be transmitted via C with fidelity greater than 1 − ε.

There exists N₀(δ, ε) such that for all N > N₀

tr((⊗^N ρ)Γ) > 1 − ε/2

where Γ is a projection onto a subspace with dimension dim(C)^N (the channel
being used N times). Γ projects
to a subspace containing no more than 2^{N(S(ρ)+δ)} likely eigenstates of ⊗^N ρ,
where the orthonormal basis is given by a subset of the orthonormal eigenstates
of ⊗^N ρ. The sum of the corresponding eigenvalues is greater than 1 − ε/2. Thus,
by Lemma 2, the fidelity

F(M, M′) > 1 − ε.

2. If the quantum channel C has dimension at most 2^{S(ρ)−δ} then there exists
N₀(δ, ε) such that for all N > N₀ sequences of eigenstates of ρ of length N
cannot be transmitted with fidelity greater than ε.

There exists N₀(δ, ε) such that for all N > N₀

tr((⊗^N ρ)Γ) < ε

where Γ is any projection onto a subspace with dimension dim(C)^N. Following
the reasoning on page 206 the sum of the eigenvalues in any such subspace
is bounded by ε, which gives the above result. Thus, by Lemma 1,

F(M, M′) < ε.



22.5 Holevo Bound

The quantum noiseless coding theorem discussed in the previous section dealt only
with pure states; further work has been done in considering mixed states for coding
[86, 87, 88, 132].
Let A denote an input alphabet, C the set of quantum states in the Hilbert space
of the quantum communication channel and

c : A → C

the mapping from the input alphabet to quantum states. The quantum states are
represented by density operators. Further let p(a) for a ∈ A denote the probability
that a will be required to be sent over the quantum channel. The channel capacity
is then given by

max_p ( S(Σ_{a∈A} p(a)c(a)) − Σ_{a∈A} p(a)S(c(a)) )

where the maximum is over all probability distributions and Σ_{a∈A} p(a) = 1. The
quantity

H(A, C, c) := S(Σ_{a∈A} p(a)c(a)) − Σ_{a∈A} p(a)S(c(a))

is called the Holevo information. Decoding is described by the positive operators
X_b with

Σ_b X_b = I

where b is the output from the alphabet B. We denote by p(b|a) the probability
that b is the output (where X_b is identified with b) if the input was a. The Shannon
information is given by

I_S(p, A, X) := Σ_{a∈A} Σ_{b∈B} p(a) p(b|a) log₂ ( p(b|a) / Σ_{a′∈A} p(a′) p(b|a′) ).

With these quantities we can describe the quantum entropy bound,

max_X I_S(p, A, X) ≤ H(A, C, c)

where equality is achieved if and only if the operators p(a)c(a) commute.
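
For an ensemble of pure states the second term vanishes and the Holevo information reduces to S(Σₐ p(a)c(a)). A sketch (our own example) for the ensemble c(0) = |0⟩⟨0|, c(1) = |+⟩⟨+| with p(0) = p(1) = ½:

#include <cmath>
#include <cstdio>

int main(){
  // Mixture (1/2)|0><0| + (1/2)|+><+| as a 2x2 matrix; both signal states
  // are pure, so the Holevo information is just the entropy of the mixture.
  double a = 0.75, b = 0.25, d = 0.25;
  double tr = a + d, det = a*d - b*b;
  double disc = std::sqrt(0.25*tr*tr - det);
  double l1 = 0.5*tr + disc, l2 = 0.5*tr - disc;
  double H = -l1*std::log2(l1) - l2*std::log2(l2);
  std::printf("Holevo information = %f qubits\n", H);   // ~0.601 < 1
  return 0;
}

The value stays strictly below 1 qubit, reflecting that nonorthogonal signal states cannot be distinguished perfectly.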


Chapter 23
Quantum Error Detection and Correction

23.1 Introduction

The algorithms discussed in Chapter 21 rely on having isolated systems to store
information. In practical applications this is not possible, and the environment in-
teracts with the systems causing decoherence. Suppose the data is contained in the
state |x⟩, and the environment is described by |E⟩. The initial state of the entire
system is described by the tensor product of the states |x⟩ ⊗ |E⟩, which evolves ac-
cording to some unitary operation U. The state |x⟩ evolves according to the unitary
operation U_x which describes the algorithm. In classical error correction codes, all
that needs to be corrected are bit flips (see Chapter 10). In the quantum case errors
such as bit flips, phase changes and rotations complicate the error correction tech-
niques. Since arbitrary errors in an encoding of information cannot be corrected,
only certain types of errors are assumed to occur (this was also the case in classical
error correction, where an upper bound on the number of bits that could be flipped
in a transmission was assumed). The types of errors depend on the implementation.
For example, suppose the types of errors (which we assume are distinguishable due
to an encoding) are described by the unitary basis E₁, ..., Eₙ so that all errors are
a linear combination [138]

E = Σ_{j=1}^{n} cⱼ Eⱼ.

We use the state |x⟩ ⊗ |0⟩, where |x⟩ is an encoded quantum state with the necessary
property that it can be used to determine if any error of E₁, ..., Eₙ has occurred,
and the second quantum register will hold the number of the type of error which
occurred. Further let S denote the operator for the error syndrome [138]:

S(Eⱼ ⊗ I)(|x⟩ ⊗ |0⟩) := Eⱼ|x⟩ ⊗ |j⟩.

Now the encoded state with errors is given by

(E ⊗ I)(|x⟩ ⊗ |0⟩) = Σ_{j=1}^{n} cⱼ Eⱼ|x⟩ ⊗ |0⟩.


Applying the operator for the error syndrome gives

S(E ⊗ I)(|x⟩ ⊗ |0⟩) = Σ_{j=1}^{n} cⱼ Eⱼ|x⟩ ⊗ |j⟩.

Measuring the second register identifies the error. Suppose the measurement corre-
sponds to |k⟩, then the error is easily repaired since

(Eₖ⁻¹ ⊗ I)((Eₖ ⊗ I)|x⟩ ⊗ |k⟩) = |x⟩ ⊗ |k⟩.

This illustrates that the additional difficulties in quantum error correction can, to
some degree, be overcome by the properties of quantum mechanics itself. In classical
error correction codes, duplication is used to overcome errors. This simple approach
cannot be directly applied in quantum error correcting codes since this would involve
a violation of the no-cloning theorem. In the following section a code is introduced
which involves duplication of certain properties of a state, and does not violate the
no-cloning theorem. These duplications are specific to certain types of errors. The
code words for |0⟩ and |1⟩ must be orthogonal to make sure they are distinguishable.

Further error correcting techniques introduced are fault tolerant error correction
codes [58, 149, 157] which allow for some errors occurring in the error correction
process, and fault tolerant quantum gates [149].
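
The classical core of this syndrome idea can be illustrated with the three-bit repetition code (a sketch of ours, tracking only computational basis labels; superpositions are corrected term by term since the syndrome extraction is unitary).

#include <cstdio>

int main(){
  for(int bit = 0; bit <= 1; ++bit)
    for(int err = 0; err < 3; ++err){
      int c[3] = { bit, bit, bit };         // repetition encoding |b> -> |bbb>
      c[err] ^= 1;                          // a single bit flip error
      int s1 = c[0]^c[1], s2 = c[0]^c[2];   // error syndrome (parities)
      // syndrome 11 -> flip bit 0, 10 -> bit 1, 01 -> bit 2
      if(s1 && s2) c[0] ^= 1; else if(s1) c[1] ^= 1; else if(s2) c[2] ^= 1;
      std::printf("bit %d, error on %d -> corrected %d%d%d\n",
                  bit, err, c[0], c[1], c[2]);
    }
  return 0;
}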

23.2 The Nine-qubit Code


The nine-qubit code invented by Shor [75] can be used to correct a bit flip error and
a phase error. The code does this by duplicating the states from the orthonormal
basis and the phase of the state, and then using correction by majority, assuming
that only one such error has occurred. Only two duplications are needed so that a
"majority out of three correction" scheme can be used. The coding applies to one
qubit. Overcoming a bit flip error is achieved by the mapping

|0⟩ → |000⟩

|1⟩ → |111⟩.

Thus the qubit |ψ₀⟩ = α|0⟩ + β|1⟩ is mapped to

|ψ₁⟩ = α|000⟩ + β|111⟩.

Thus a single bit flip error can be corrected by a majority value correction scheme.
First the additional syndrome register must be added. We apply the operator

S₀ = (I ⊗ U_S) U_XOR(1,5) U_XOR(1,6) U_XOR(2,4) U_XOR(2,6) U_XOR(3,4) U_XOR(3,5)

where U_XOR(i,j) denotes the CNOT operator working with the ith qubit as the
control and the jth qubit as the target, and

U_S = |000⟩⟨000| + |111⟩⟨111| + |100⟩⟨011| + |010⟩⟨101| +
|001⟩⟨110| + |110⟩⟨001| + |101⟩⟨010| + |011⟩⟨100|.

So to correct the error we simply apply U_XOR(4,1) U_XOR(5,2) U_XOR(6,3). It is simple to


extend this to correct phase errors in one qubit. The mapping is extended to

|0⟩ → (1/(2√2))(|000⟩ + |111⟩) ⊗ (|000⟩ + |111⟩) ⊗ (|000⟩ + |111⟩)

|1⟩ → (1/(2√2))(|000⟩ − |111⟩) ⊗ (|000⟩ − |111⟩) ⊗ (|000⟩ − |111⟩).

The bit flip errors are corrected for each of the three 3-qubit registers in the same
way as above. The phase error (i.e. at most one sign change) is dealt with in
exactly the same way using the subspace described by {|000⟩ + |111⟩, |000⟩ − |111⟩}
instead of {|0⟩, |1⟩}. It is important to note that the total phase is ignored. Using
this procedure we can correct both bit flips and sign changes if they occur. The
operators I, σ_x, σ_y and σ_z described by

σ_x = ( 0 1 ; 1 0 ), σ_y = ( 0 −i ; i 0 ), σ_z = ( 1 0 ; 0 −1 )

form a basis for all 2 × 2 matrices since every 2 × 2 matrix A can be written as

A = c₀I + c₁σ_x + c₂σ_y + c₃σ_z, cⱼ ∈ ℂ.

Furthermore the unit matrix I effects no error on a qubit, σ_x effects a bit flip, σ_z
effects a sign change and σ_y effects a bit flip and sign change. All these errors can
be corrected by the nine-qubit code. Thus any linear combination of these errors
can be corrected. Consider the arbitrary phase change

|0⟩ → |0⟩, |1⟩ → e^{iφ}|1⟩

which can also be corrected by this scheme. Thus the scheme can correct any one-
qubit error.
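
The decomposition coefficients follow from cⱼ = tr(σⱼA)/2. A small C++ sketch (ours) decomposes the arbitrary phase change above into the Pauli basis.

#include <complex>
#include <cstdio>

using cplx = std::complex<double>;

int main(){
  const cplx i(0.0, 1.0);
  double phi = 0.7;                                  // arbitrary phase change
  cplx A[2][2] = { {1.0, 0.0}, {0.0, std::exp(i*phi)} };
  // coefficients in A = c0*I + c1*sigma_x + c2*sigma_y + c3*sigma_z,
  // obtained from c_j = tr(sigma_j A)/2
  cplx c0 = 0.5*(A[0][0] + A[1][1]);
  cplx c1 = 0.5*(A[0][1] + A[1][0]);
  cplx c2 = 0.5*i*(A[0][1] - A[1][0]);
  cplx c3 = 0.5*(A[0][0] - A[1][1]);
  std::printf("c0 = %+f%+fi\n", c0.real(), c0.imag());
  std::printf("c1 = %+f%+fi\n", c1.real(), c1.imag());
  std::printf("c2 = %+f%+fi\n", c2.real(), c2.imag());
  std::printf("c3 = %+f%+fi\n", c3.real(), c3.imag());
  return 0;
}

Only c₀ and c₃ are nonzero, so the phase change is a combination of the identity and σ_z, both of which the code handles.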

23.3 The Seven-qubit Code

A slightly more efficient code is a seven qubit code invented by Steane [133, 154].
The code uses classical Hamming codes to do the error correction for both phase
errors and bit flip errors. The classical seven-bit Hamming code can correct any
single-bit error. First we consider the mapping which takes |0⟩ to the equally weighted
superposition of the even-weight codewords of the [7,4] Hamming code, and |1⟩ to the
equally weighted superposition of the odd-weight codewords:

|0⟩ → (1/√8) Σ_{x an even weight codeword} |x⟩, |1⟩ → (1/√8) Σ_{x an odd weight codeword} |x⟩.

This coding is obviously orthogonal and can correct any single bit flip error. But
we require that the coding can correct phase errors as well. This can be done by
noting that a phase change is a bit flip when we apply the Hadamard gate

U_H = (1/√2) ( 1 1 ; 1 −1 )

and use the basis {|0′⟩ = U_H|0⟩, |1′⟩ = U_H|1⟩}. Applying the Hadamard transform
to all seven qubits in the code maps the two code words to

(1/√2)(|0⟩_enc + |1⟩_enc) and (1/√2)(|0⟩_enc − |1⟩_enc)

so that the code words remain superpositions of Hamming codewords and a phase
error appears as a correctable bit flip error.
Thus the code can still correct any single qubit error.
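
The bit strings appearing in the two code words can be enumerated from the parity check matrix of the Hamming code. The sketch below is ours and assumes one standard choice of check matrix (column j is the binary representation of j); other choices permute the codewords.

#include <cstdio>

// number of ones in the binary representation
int weight(int x){ int w = 0; for(; x; x >>= 1) w += x & 1; return w; }

int main(){
  // rows of the parity check matrix as bit masks: 1010101, 0110011, 0001111
  const int h[3] = { 0x55, 0x33, 0x0F };
  for(int pass = 0; pass < 2; ++pass){
    std::printf("%s-weight Hamming codewords (code word for |%d>):\n",
                pass == 0 ? "even" : "odd", pass);
    for(int x = 0; x < 128; ++x){
      bool in_code = true;
      for(int r = 0; r < 3; ++r)
        if(weight(x & h[r]) % 2) in_code = false;    // parity check fails
      if(in_code && weight(x) % 2 == pass){
        for(int b = 6; b >= 0; --b) std::printf("%d", (x >> b) & 1);
        std::printf(" ");
      }
    }
    std::printf("\n");
  }
  return 0;
}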
This code still requires seven qubits to encode a single qubit. Thus a 128 qubit
system requires 896 qubits to operate reliably. With such a large number of qubits
it is possible that interactions with the environment involving not only single qubits
can become a larger problem. Thus it is desirable to encode a qubit with as few
qubits as possible. In the next section we show that a qubit can be encoded reliably
with fewer than 7 qubits.

23.4 Efficiency and the Five-qubit Code

A measure of efficiency of a code is how many qubits the code needs to correct
an arbitrary single qubit error. From the above we know that any code correcting
arbitrary errors needs to be able to correct the three errors described by σ_x, σ_y
and σ_z. These errors map codeword subspaces into subspaces. To distinguish these
errors the subspaces must be orthogonal. Suppose the code consists of n qubits.
Three errors can occur for each qubit or no errors occur. Furthermore the two
states must be orthogonal in each of these subspaces. Thus [108]

2(3n + 1) ≤ 2ⁿ

so that at least n = 5 qubits are required.
The requirement of orthonormality of the encoding is used to aid a derivation of the
code. Thus we obtain the constraints

Σ_{k=0}^{31} |μₖ|² = Σ_{k=0}^{31} |νₖ|² = 1

Σ_{k=0}^{31} Σ_{l=0}^{31} μ̄ₖ νₗ ⟨k|E|l⟩ = Σ_{k=0}^{31} Σ_{l=0}^{31} ν̄ₖ μₗ ⟨k|E|l⟩ = 0

for each correctable error E, where μₖ and νₖ are the amplitudes of the encodings
for |0⟩ and |1⟩, respectively.
For the code they obtain [108] an encoding which can be expressed in terms of the states
|b₁⟩ = |000⟩ + |111⟩
|b₂⟩ = |000⟩ − |111⟩
|b₃⟩ = |100⟩ + |011⟩
|b₄⟩ = |100⟩ − |011⟩
|b₅⟩ = |010⟩ + |101⟩
|b₆⟩ = |010⟩ − |101⟩
|b₇⟩ = |110⟩ + |001⟩
|b₈⟩ = |110⟩ − |001⟩.

The code was discovered by assuming that the absolute value of the non-zero am-
plitudes were equal and real. Thus a solution would be described exclusively by the
signs of the amplitudes. A computer search was used to find the code. A surprising
feature of the scheme is that the error correcting technique is the exact reverse of the
encoding technique [108], i.e. we apply the same transformations but in the reverse
order. The following figure illustrates the encoding process

|qabcd⟩ → |q′a′b′c′d′⟩.

For the decoding we follow the process from right to left giving

|q′a′b′c′d′⟩ → |qabcd⟩.

[Figure 23.1: Encoding for the 5-qubit Error Correction Code. Circuit taking the inputs |a⟩, |b⟩, |q⟩, |c⟩, |d⟩ to the outputs |a′⟩, |b′⟩, |q′⟩, |c′⟩, |d′⟩.]

In the figure the π is a controlled phase change (multiplication with −1); the other
gates have the usual meanings. A filled connection (circle) indicates the operation is
only applied when the corresponding qubit is |1⟩ and an empty connection (circle)
indicates the operation is only applied when the corresponding qubit is |0⟩. The error
syndrome and the result of the error on the state α|0⟩ + β|1⟩ (where |α|² + |β|² = 1)
is listed in Table 23.1. Another 5 qubit code [19, 58] also found by a computer
search is given by
|0⟩ → |0̄⟩ := ¼(+|00000⟩
+ |11000⟩ + |10001⟩ + |00011⟩ + |00110⟩ + |01100⟩
− |10100⟩ − |01001⟩ − |10010⟩ − |00101⟩ − |01010⟩
− |11110⟩ − |11101⟩ − |11011⟩ − |10111⟩ − |01111⟩)

|1⟩ → |1̄⟩ := ¼(+|11111⟩
+ |00111⟩ + |01110⟩ + |11100⟩ + |11001⟩ + |10011⟩
− |01011⟩ − |10110⟩ − |01101⟩ − |11010⟩ − |10101⟩
− |00001⟩ − |00010⟩ − |00100⟩ − |01000⟩ − |10000⟩).

We note that

|1̄⟩ = (⊗_{j=1}^{5} U_NOT)|0̄⟩,

i.e. |1̄⟩ is obtained from |0̄⟩ by applying the NOT gate to each of the five qubits,
and the signs are chosen to satisfy the orthonormality constraints.
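
These properties are easy to verify mechanically. The following sketch (ours) rebuilds |1̄⟩ from |0̄⟩ by the five-fold bit flip and checks normalization and orthogonality of the two code words.

#include <cstdio>
#include <cstdlib>

int main(){
  // amplitudes (times 4) of the code word for |0>, indexed by the
  // 5-bit computational basis label
  const char* plus0[]  = { "00000","11000","10001","00011","00110","01100" };
  const char* minus0[] = { "10100","01001","10010","00101","01010",
                           "11110","11101","11011","10111","01111" };
  int a0[32] = {0}, a1[32] = {0};
  for(const char* s : plus0)  a0[std::strtol(s, nullptr, 2)] = +1;
  for(const char* s : minus0) a0[std::strtol(s, nullptr, 2)] = -1;
  for(int x = 0; x < 32; ++x) a1[x ^ 31] = a0[x];  // flip all five bits
  int norm0 = 0, norm1 = 0, inner = 0;
  for(int x = 0; x < 32; ++x){
    norm0 += a0[x]*a0[x]; norm1 += a1[x]*a1[x]; inner += a0[x]*a1[x];
  }
  // each squared norm is 16 (amplitudes 1/4); the code words are orthogonal
  std::printf("norm0 = %d, norm1 = %d, <0bar|1bar> = %d\n", norm0, norm1, inner);
  return 0;
}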



Syndrome |a′b′c′d′⟩   Resulting state

|0000⟩                +α|0⟩ + β|1⟩
|1101⟩                −α|1⟩ + β|0⟩
|1111⟩                −α|0⟩ + β|1⟩
|0001⟩                +α|0⟩ − β|1⟩
|1010⟩                +α|0⟩ − β|1⟩
|1100⟩                +α|0⟩ − β|1⟩
|0101⟩                +α|0⟩ − β|1⟩
|0011⟩                −α|0⟩ − β|1⟩
|1000⟩                −α|0⟩ − β|1⟩
|0100⟩                −α|0⟩ − β|1⟩
|0010⟩                −α|0⟩ − β|1⟩
|0110⟩                −α|1⟩ − β|0⟩
|0111⟩                −α|1⟩ − β|0⟩
|1011⟩                −α|1⟩ − β|0⟩
|1110⟩                −α|1⟩ − β|0⟩
|1001⟩                −α|1⟩ − β|0⟩

Table 23.1: Error Syndrome for the 5 Qubit Error Correction Code

23.5 Stabilizer Codes

The Pauli spin matrices (including the unit matrix) with additional phases of ±1
and ±i,

P := { ±I, ±iI, ±σ_x, ±iσ_x, ±σ_y, ±iσ_y, ±σ_z, ±iσ_z },

form a group with respect to matrix multiplication:

×     | I      σ_x     σ_y     σ_z
I     | I      σ_x     σ_y     σ_z
σ_x   | σ_x    I       iσ_z    −iσ_y
σ_y   | σ_y    −iσ_z   I       iσ_x
σ_z   | σ_z    iσ_y    −iσ_x   I

Any two elements in the group either commute or anticommute, i.e. for A, B ∈ P

• if [A, B] = AB − BA ≠ 0 it follows that [A, B]₊ = AB + BA = 0,

• if [A, B]₊ ≠ 0 it follows that [A, B] = 0.

A consequence is that the set

Pₙ := { A₁ ⊗ A₂ ⊗ ··· ⊗ Aₙ | Aⱼ ∈ P }

also forms a group, where n ∈ ℤ⁺.
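
The commute-or-anticommute property can be verified by direct matrix multiplication (a sketch of ours).

#include <complex>
#include <cstdio>
#include <cmath>

using cplx = std::complex<double>;
struct M2 { cplx a, b, c, d; };             // 2x2 complex matrix

M2 mul(const M2& x, const M2& y){
  return { x.a*y.a + x.b*y.c, x.a*y.b + x.b*y.d,
           x.c*y.a + x.d*y.c, x.c*y.b + x.d*y.d };
}
bool zero(const M2& x){
  return std::abs(x.a) + std::abs(x.b) + std::abs(x.c) + std::abs(x.d) < 1e-12;
}

int main(){
  const cplx i(0.0, 1.0);
  M2 p[4] = { {1, 0, 0, 1},                 // I
              {0, 1, 1, 0},                 // sigma_x
              {0, -i, i, 0},                // sigma_y
              {1, 0, 0, -1} };              // sigma_z
  const char* n[4] = { "I", "x", "y", "z" };
  for(int j = 0; j < 4; ++j)
    for(int k = 0; k < 4; ++k){
      M2 ab = mul(p[j], p[k]), ba = mul(p[k], p[j]);
      M2 comm = { ab.a-ba.a, ab.b-ba.b, ab.c-ba.c, ab.d-ba.d };
      M2 anti = { ab.a+ba.a, ab.b+ba.b, ab.c+ba.c, ab.d+ba.d };
      std::printf("(%s,%s): %s\n", n[j], n[k],
                  zero(comm) ? "commute" : (zero(anti) ? "anticommute" : "neither"));
    }
  return 0;
}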



Let S be an Abelian subgroup of Pₙ. We define the quantum code C_S as the set of
states

C_S := { |ψ⟩ | U|ψ⟩ = |ψ⟩ ∀U ∈ S }.

The set S is called the stabilizer of the code C_S. The set S must be Abelian since
for M, N ∈ S and |ψ⟩ ∈ C_S

MN|ψ⟩ = M|ψ⟩ = |ψ⟩

NM|ψ⟩ = N|ψ⟩ = |ψ⟩.

Thus

[M, N]|ψ⟩ = 0.

For any M ∈ S, the inverse M⁻¹ = M* is also in S. If S contains −⊗_{j=1}^{n} I then the
code is trivial, i.e. the code contains only the zero element of the underlying Hilbert
space. Obviously a subset of Pₙ forms a basis for all operations on n qubits (i.e.
the elements of Pₙ, unique up to phase). Suppose E ∈ Pₙ such that [E, M]₊ = 0 for
some M ∈ S. For |φ⟩, |ψ⟩ ∈ C_S we have

⟨φ|E|ψ⟩ = ⟨φ|EM|ψ⟩ = −⟨φ|ME|ψ⟩ = −⟨φ|E|ψ⟩

since M|ψ⟩ = |ψ⟩ and M*|φ⟩ = |φ⟩ (so that ⟨φ|M = ⟨φ|).
Thus ⟨φ|E|ψ⟩ = 0. So if for every E, F ∈ Pₙ there exists M ∈ S such that
[M, F*E]₊ = 0, where E and F introduce errors in at most t qubits, the code
C_S can correct all t qubit errors. If for any errors E and F we have F*E ∈ S,
then the errors E and F can be corrected, but not distinguished. A code with this
property is called a degenerate quantum code, otherwise it is called a nondegenerate
quantum code. The construction of these codes is described by Gottesman [74] and
Calderbank et al. [37]. If a code encodes k qubits into a Hilbert space of dimension
2ⁿ and corrects up to t errors we have the quantum Hamming bound

2^k Σ_{j=0}^{t} 3^j C(n, j) ≤ 2ⁿ

for nondegenerate codes, where C(n, j) denotes the binomial coefficient.
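
For k = 1 and t = 1 the bound reads 2(1 + 3n) ≤ 2ⁿ, which first holds at n = 5, the five-qubit code of the previous section (a quick check, ours).

#include <cstdio>

int main(){
  // quantum Hamming bound for k = 1 encoded qubit and t = 1 error:
  // 2(1 + 3n) <= 2^n; find the smallest number of qubits n
  for(int n = 1; n <= 10; ++n){
    long lhs = 2L*(1 + 3*n), rhs = 1L << n;
    std::printf("n=%2d : %4ld <= %4ld ? %s\n", n, lhs, rhs,
                lhs <= rhs ? "yes" : "no");
  }
  return 0;
}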

Let |ψ⟩ be a codeword from C_S. We suppose that E, an error, has operated on
|ψ⟩. Let M ∈ S; E and M either commute or anticommute. Suppose E and M
commute, then the state E|ψ⟩ is an eigenstate of M,

ME|ψ⟩ = EM|ψ⟩ = E|ψ⟩,

with eigenvalue 1; if E and M anticommute the state E|ψ⟩ is an eigenstate of M,

ME|ψ⟩ = −EM|ψ⟩ = −E|ψ⟩,

with eigenvalue −1. Thus the eigenvalues corresponding to the eigenstate E|ψ⟩ of
the operators in S give the error syndrome. Measuring the state with respect to M
does not destroy the information, and is used to determine the error syndrome.
Chapter 24
Quantum Hardware

24.1 Introduction

Computation is ultimately a physical process. In practice, the range of physically re-
alizable devices determines what is computable and the resources, such as computer
time, required to solve a given problem. Computing machines can exploit a variety
of physical processes and structures to provide distinct trade-offs in resource require-
ments. An example is the development of parallel computers with their trade-off of
overall computation time against the number of processors employed.

Hardware used to implement quantum algorithms has certain requirements.

• Storage. Qubits must be stored for long enough for a required algorithm
to complete and a result to be obtained. The discovery of quantum error
correcting codes decreases the hardware requirements at the cost of using
more qubits.

• Isolation. Quantum registers must be sufficiently isolated from the environ-
ment to minimize decoherence errors. If error correcting codes are used the
requirement reduces slightly to ensuring that only correctable errors can occur.

• Measurement. Quantum registers must be measured to obtain results. This
process must be efficient and reliable.

• Gates. Algorithms involve the manipulation and controlled manipulation of
qubits, which are described by some set of gates. These gates must be effi-
ciently implementable in the hardware system.

• Reliability. Algorithms must run reliably. Fault-tolerant gates and error cor-
recting codes can be used to satisfy the requirement provided that the hard-
ware only introduces errors which can be corrected.

A number of different approaches have been suggested.


24.2 Trapped Ions

One method proposed to implement a quantum computing device is using an ion
trap [132, 155, 158]. Each qubit is carried by a single ion held in a linear Paul trap.
The quantum state of each ion is a linear combination of the ground state |0⟩ and a
long-lived metastable excited state |1⟩. A linear combination of the two states

c₀|0⟩ + c₁|1⟩

remains coherent for a time comparable to the lifetime of the excited state, with
oscillating relative phase. To measure a qubit, a laser tuned to a transition from
the ground state to a short lived excited state is used to illuminate the ion. An ion
in the state |0⟩ repeatedly absorbs and reemits the laser light. An ion in the state
|1⟩ will remain dark. Due to Coulomb repulsion, the ions are sufficiently separated
to be addressed by pulsed lasers. A laser tuned to the frequency ω of the transition
focused on the appropriate ion induces Rabi oscillations between |0⟩ and |1⟩. Using
the appropriate laser pulse timing and phase, any one-qubit unitary transformation
can be applied to the ion.
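
A resonant pulse of area θ and laser phase φ realizes a single-qubit rotation which, in the standard form used for Rabi oscillations (stated here as an assumption; the text gives no explicit matrix), can be simulated as follows (sketch ours).

#include <complex>
#include <cmath>
#include <cstdio>

using cplx = std::complex<double>;

int main(){
  const cplx i(0.0, 1.0);
  const double pi = 3.14159265358979323846;
  const double phi = 0.0;                       // laser phase
  for(double theta : { 0.5*pi, pi }){           // pi/2 pulse and pi pulse
    cplx U[2][2] = {
      { std::cos(0.5*theta), -i*std::exp(i*phi)*std::sin(0.5*theta) },
      { -i*std::exp(-i*phi)*std::sin(0.5*theta), std::cos(0.5*theta) } };
    cplx psi0 = 1.0, psi1 = 0.0;                // ion prepared in |0>
    cplx out0 = U[0][0]*psi0 + U[0][1]*psi1;
    cplx out1 = U[1][0]*psi0 + U[1][1]*psi1;
    std::printf("theta = %.3f : P(0) = %.3f, P(1) = %.3f\n",
                theta, std::norm(out0), std::norm(out1));
  }
  return 0;
}

A pulse area of π transfers the population completely from |0⟩ to |1⟩, the so-called π pulse.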

The Coulomb repulsion between ions is used to achieve the interaction between
ions. The mutual Coulomb repulsion between the ions results in a spectrum of
coupled normal modes of vibration for the trapped ions. When the laser is correctly
tuned, then the absorption or emission of a laser photon by a single ion causes a
normal mode involving many ions to recoil coherently. The vibrational mode of
lowest frequency ν is the center-of-mass mode. The ions can be laser cooled to
temperatures below that required for the center-of-mass mode, to levels such that
each vibrational mode is likely to occupy its quantum-mechanical ground state. A
laser tuned to the frequency ω − ν and applied to an ion for the time required to
rotate |1⟩ to |0⟩ and the center-of-mass oscillation to transition from its ground
state to the first excited state, causes the information of the qubit to be transferred
to the collective state of motion of all the ions. Similarly, the information can be
transferred to another ion while returning the center-of-mass oscillator to its ground
state. Thus two ions can interact, and two qubit operations can be performed.

Measurement of arbitrary observables of trapped ions is described by Gardiner et al.


[70]. A method of performing quantum computations without the need for cooling
of the trapped ions has also been proposed [131].

Monroe et al. [120] demonstrated the operation of a two-qubit controlled NOT
quantum logic gate. The two qubits are stored in the internal and external degrees of
freedom of a single trapped ion, which is first laser cooled to the zero-point energy. In
their implementation, the target qubit |S⟩ is spanned by two ²S₁/₂ hyperfine ground
states of a single ⁹Be⁺ ion, abbreviated by the equivalent spin-½ states |↓⟩ and
|↑⟩, which are separated in frequency by ω₀/2π ≈ 1.25 GHz. The control qubit |n⟩
is spanned by the first two quantized harmonic oscillator states of the trapped ion
(|0⟩ and |1⟩), separated in frequency by the vibrational frequency ω_x/2π ≈ 11 MHz
of the harmonically trapped ion. Manipulation between the four basis eigenstates
spanning the two qubit register is achieved by applying a pair of off-resonant laser
beams to the ion, which drives stimulated Raman transitions between basis states.
When the difference frequency δ is set near ω₀, transitions are coherently driven
between internal states |S⟩ while preserving |n⟩. For δ ≈ ω₀ − ω_x (respectively
δ ≈ ω₀ + ω_x) transitions are coherently driven between |1⟩ ⊗ |↓⟩ and |0⟩ ⊗ |↑⟩
(respectively |0⟩ ⊗ |↓⟩ and |1⟩ ⊗ |↑⟩).

24.3 Cavity Quantum Electrodynamics


Instead of using the vibrational modes, as mentioned in the previous section, neutral
atoms can be trapped in a small high finesse optical cavity which interact because
of the coupling due to the normal modes of the electromagnetic field in the cavity
[32, 132, 158].

As described in the previous section, it is again possible to induce a transition in


one atom conditioned on the state of another atom. Alternatively, a qubit may be
stored in the polarization of a photon. The atoms are used to cause interactions
between photons. It has already been demonstrated that the circular polarization
of one photon can influence the phase of another photon. The first photon is stored
in the cavity where left polarization (I L)) does not couple to the atom and right
polarization (IR)) couples strongly. The second photon traverses the cavity where
again only the right polarization couples strongly with the atom. The second photon
acquires a phase shift e^{iΔ} only if both polarizations are right circular, i.e. the following
transform is implemented:

|L⟩₁ ⊗ |L⟩₂ → |L⟩₁ ⊗ |L⟩₂
|L⟩₁ ⊗ |R⟩₂ → |L⟩₁ ⊗ |R⟩₂
|R⟩₁ ⊗ |L⟩₂ → |R⟩₁ ⊗ |L⟩₂
|R⟩₁ ⊗ |R⟩₂ → e^{iΔ}|R⟩₁ ⊗ |R⟩₂.

Briegel et al. [32] propose two other methods to implement a phase shift gate.
The first method involves moving the potentials of the traps towards each other in
a state-dependent way while leaving the shape of the potential unchanged. This
results in two kinds of phase shifts. A single particle kinetic phase shift and an
interaction phase shift due to coherent interactions between two atoms.

The second method involves changing the shape of the potentials with time, de-
pending on the internal states of the particles. The atoms are initially trapped in
two displaced wells. The barrier between the wells is removed (quickly) for atoms in
a state Ib) while atoms in state la) experience no change. The atoms are allowed to
oscillate for some time and then the barrier is raised (again quickly) such that the
atoms are trapped again in their initial positions. The atoms acquire a kinematic
phase due to the oscillations within their respective wells and an interaction phase
due to the collision.

Both methods implement the phase change gate similar to the transform given
above, except for some additional overall phase introduced in the transform.

24.4 Quantum Dots

A quantum dot is a small metal or semiconductor box that holds a discrete number
of electrons. This number can be changed by adjusting electric fields in the neigh-
bourhood of the dot, for example by applying a voltage to a nearby metal gate.
We refer to quantum dots that have either zero or one electron in them. In a solid
most electrons are tightly bound to the atom. A few others are not bound to any
atom and are only confined in the quantum dots when present. We denote by ○ an
empty quantum dot and by ● a quantum dot with a single electron.

Advanced semiconductor growth techniques such as molecular beam epitaxy al-


low the fabrication of semiconductor sandwich structures with interfaces of very
high precision. The various layers in the sandwich can be made to have different
properties by selecting an appropriate material during growth. Differences in the
bandgaps of different materials can be utilized in this way to create an effective
electronic potential energy which electrons experience. The layers can be grown
sufficiently thin so that quantum mechanical confinement becomes important. A
quasi two-dimensional electron gas forms in this thin layer. Patterning techniques
such as optical and electron-beam lithography can be used to reduce this to a quasi
one-dimensional (quantum wire) or quasi zero-dimensional (quantum dot) system.
A different scheme to reduce the quasi dimensionality is using electrostatic confine-
ment. A metallic pattern on top of the sandwich with a negative bias confines the
electron gas, two parallel strips create a quantum wire and two staggered patterns of
parallel strips can be used to create quantum dots. Other techniques include using
scanning tunneling microscopes and chemical self assembly.

Lent and Porod [130] suggest creating a cell of 5 quantum dots in the shape of an
"X" containing 2 electrons. The two electrons will be positioned in two opposite
corner quantum dots due to Coulomb repulsion. There are two such configurations
which can be identified with 0 and 1.

[Figure 24.1: Two Possible Configurations for Quantum Dot Cells. The two electrons occupy one or the other pair of opposite corners.]

The ground state of the system will be an equal superposition of the two basic
configurations with electrons at opposite corners. The quantum dots are labelled
as 0 for the top left corner, 1 for the top right corner, 2 for the bottom left corner,
3 for the bottom right corner and 4 for the center. Thus the cell polarity can be
defined as

P := (n₀ + n₃) − (n₁ + n₂)

where nᵢ ∈ {0, 1} denotes the number of electrons in quantum dot i and

Σ_{i=0}^{4} nᵢ = 2.

We can identify the polarity of −1 with binary 0 and a polarity of 1 with binary
1, where the polarities are of the configurations given above. Polarities between
−1 and 1 are interpreted as superposition states, for example the ground state has
a net polarity of 0. The electrons can move between quantum dots if they are
close enough due to quantum tunneling. If two cells must interact, they must be
sufficiently distant to prevent electrons tunneling between them. Their interaction
is due to Coulomb interaction between the electrons. Suppose two adjacent cells are
configured as follows.

[Diagram: two adjacent five-dot quantum dot cells in their initial configurations.]

Forcing the first cell into the configuration with polarity 1 causes the second cell to
reconfigure to a minimum energy configuration.

[Diagram: after the reconfiguration both cells are in the polarity-1 configuration.]
This allows signals to propagate along a series of quantum dot cells acting as quantum wires.
Classical operations are also possible using these cells, for example the majority
function can be implemented by the following arrangement.

[Diagram: five cells arranged in a cross; three outer cells act as inputs and the remaining outer cell as output, coupled through the center cell.]

Forcing any three outer cells into some configuration causes the last outer cell
to be forced to the majority configuration to achieve the lowest energy state. A
majority gate can be used to construct any other classical gate. The OR gate can
be constructed by fixing the polarity of one of the inputs to 1, and the AND gate
by fixing the polarity of one of the inputs to −1; a small sketch of this idea follows
below. A full adder has been constructed
using these principles.
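
The logic of the majority cell is easily sketched in C++ with polarities ±1 identified with binary 0 and 1 as above (our illustration).

#include <cstdio>

// majority of three cell polarities (each -1 or +1)
int maj(int a, int b, int c){ return (a + b + c > 0) ? 1 : -1; }

int main(){
  for(int a = -1; a <= 1; a += 2)
    for(int b = -1; b <= 1; b += 2){
      // OR: one input fixed to polarity +1; AND: one input fixed to -1
      std::printf("a=%+d b=%+d : OR=%+d AND=%+d\n",
                  a, b, maj(a, b, +1), maj(a, b, -1));
    }
  return 0;
}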

Measurement can be achieved by passing a current through a very narrow constric-


tion near a corner of a cell. An electron in the corner would prevent current from
flowing.

Another approach using quantum dots aimed specifically at quantum computing has
been proposed [34]. This approach does not use cells of quantum dots, but rather
the electron spin of an electron contained in a quantum dot. The manipulation
of more than one qubit is required for computation. This is achieved by coupling
quantum dots. Due to the Coulomb interaction and the Pauli exclusion principle
the ground state for two qubits is an entangled spin state. The system is described
by the Heisenberg Hamiltonian

H_s(t) = J(t) S₁ · S₂

where J(t) is the exchange coupling between S₁ and S₂. If the exchange coupling is
pulsed such that

(1/ℏ) ∫₀^{τ_s} J(t) dt = J₀τ_s/ℏ = π (mod 2π)

then the associated unitary evolution describes the swapping U_sw of the quantum
states of the two qubits. The XOR operation is obtained as

U_XOR = exp(i(π/2)S₁^z) exp(−i(π/2)S₂^z) U_sw^{1/2} exp(iπS₁^z) U_sw^{1/2},
a combination of U_sw^{1/2} and single qubit rotations. The XOR operation combined
with single qubit rotations is a universal set of quantum gates. Thus any quantum
algorithm can be implemented using these operations. With only two quantum dots
a gate operation can be performed using uniform magnetic fields. For more qubits
local magnetic fields are necessary. The requirement is reduced by noting that using
the swap operation, a qubit state can be transferred to a qubit where an operation
can take place and then back to the original position without influencing the other
qubit states.

24.5 Nuclear Magnetic Resonance Spectroscopy

Most implementations of quantum computing devices utilize submicroscopic assem-


blies of quantum spins which can be difficult to prepare, isolate, manipulate and
observe.

In nuclear magnetic resonance [50, 51, 103, 132] implementations, the qubit is iden-
tified with a nuclear spin in a molecule. A spin can be aligned (|0⟩) or antialigned
(|1⟩) with an applied magnetic field giving the basis of computation. The spins
take a long time to relax or decohere. The technique emulates a quantum compu-
tation using a large number of spins. Spin-active nuclei in each molecule of a liquid
sample are largely isolated from the spins in all other molecules, so each molecule is
effectively an independent quantum computer. The computation is possible due to
the existence of pseudo-pure states, whose transformation properties are identical
to those of true pure states. Results of computations are then determined by, for
example, thermodynamic averaging. The method is chosen to average out unwanted
fluctuating properties so that only underlying coherent properties are measured. Al-
ternatively methods such as optical pumping and dynamic nuclear polarization can
be used to cool the system to a ground state. This leads to an ensemble quantum
computer.

Using a pulsed rotating magnetic field with frequency ω determined by the energy
splitting between the spin-up and spin-down states, Rabi oscillations of the spin are
induced. The appropriate timing of the pulses can perform any unitary transform
on a single spin. All spins are exposed to the rotating magnetic field but only
those on resonance respond. The spins have dipole-dipole interactions which can be
exploited to perform two-qubit operations. The XOR (controlled NOT) operation
has been implemented using Pound-Overhauser double resonance and also using a
spin-coherence double resonance pulse sequence.

Average Hamiltonian theory can be used to implement quantum gates. The evo-
lution of the state at a time T is solved in terms of the time independent average
Hamiltonian H̄(T). The total Hamiltonian H_tot(t) = H_int + H_ext(t) is separated
into a time invariant internal Hamiltonian H_int and a time dependent Hamiltonian
H_ext(t). After a period of evolution the overall dynamics is described by

U(T) = 𝒯 exp(−i ∫₀^T H_tot(τ) dτ) = e^{−iH̄T}

where 𝒯 is the Dyson time ordering operator. For sufficiently small T the Magnus
expansion can be used to determine H̄. The coupling between qubits is always
active, thus it is useful to have an operation to suppress the undesirable couplings.
This can be achieved by an experimental method for "tracing out" or averaging out
unwanted degrees of freedom. The CNOT operation can be expressed in terms of
the two-qubit operation exp(i(π/4)σ_z ⊗ σ_x) together with single qubit operations,
where exp(i(π/4)σ_z ⊗ σ_x) is implemented by conjugating the evolution under the
coupling σ_z ⊗ σ_z with single qubit rotations, since the only two-body Hamiltonian
available in liquid state nuclear magnetic resonance spectroscopy is the scalar
coupling σ_z ⊗ σ_z.

Using nuclear magnetic resonance techniques Deutsch's algorithm [43, 98], Grover's
algorithm [44, 173] and a generalization of Shor's algorithm [174] have been imple-
mented. Maximally entangled states using this technique have also been achieved.
Chapter 25
Internet Resources

In the following we give a collection of web sites which provide information about
quantum computing. The web sites provide tutorials, information on experimental
implementations and electronic versions of papers.

http://issc.rau.ac.za
The web site for the International School for Scientific Computing. The school
offers courses in scientific computing including a course on classical and quantum
computing.

http://xxx.lanl.gov
The web site of the Los Alamos National Laboratory pre-print archive. The
site provides access to pre-prints in the fields of physics, mathematics, nonlinear
sciences and computer science. A search engine is also provided.

http://www.qubit.org
The Centre for Quantum Computation, part of the University of Oxford, conducts
theoretical and experimental research into all aspects of quantum information
processing, and into the implications of the quantum theory of computation for
physics itself.

http://www.theory.caltech.edu/~preskill/ph229
Quantum Information and Computation course notes. Overview of classical
complexity theory, quantum complexity, efficient quantum algorithms, quantum
error-correcting codes, fault-tolerant quantum computation, physical implemen-
tations of quantum computation.

http://www.openqubit.org
A quantum computation simulation project on Intel based architectures. The
project goal is to develop a system for describing and testing quantum computing
algorithms.


http://squint.stanford.edu/
A collaboration between researchers at Stanford University and U.C. Berkeley,
involving the experimental and theoretical study of quantum-mechanical systems,
and how they can be utilized to process and store information.

http://qso.lanl.gov/qc/
An overview of the work done at Los Alamos on quantum computation and
cryptography is provided. A number of papers are also provided in electronic
form.

http://theory.caltech.edu/~quic/
Quantum Information and Computation (QUIC). A collaboration of groups at
MIT, Caltech and USC investigating experimental, theoretical, and modelling
aspects of quantum computation.

http://www.research.ibm.com/quantuminfo/
Quantum Information and Information Physics at IBM Research Yorktown. The
group's work main work is in quantum information and computation theory, but
they also study other aspects of the relation between physics and information
processing.

http://www.iro.umontreal.ca/labs/theorique/index.html.en
Laboratory for Theoretical and Quantum Computing of the Computer Science
Department of the University of Montreal. Includes a bibliography of quantum
cryptography.

http://www.fysel.ntnu.no/Optics/qcr/
Quantum cryptography in Norway. As the first large task of the project, they are
building the demonstrator of a point-to-point quantum key distribution channel.
Some have already been built and tested by other research groups. The basic
principles are well known, but what remains a challenge is approaching practical
applications. They are working in this direction.

http://www.nd.edu/~qcahome/
Quantum-dot Cellular Automata. A web site exploring the possibilities of using
quantum-dots to form quantum wires and to construct gates. The web site pro-
vides tutorials, simulations and electronic versions of some papers on the subject.
Bibliography

[1] Adami C. and N. J. Cerf, "What Information Theory Can Tell Us About Quan-
tum Reality",
http://xxx.lanl.gov, quant-ph/9806047.
[2] Ammeraal L., STL for C++ Programmers, John Wiley, Chichester, 1997.
[3] Ash R. B., Information Theory, Dover Publications, New York, 1990.
[4] Bac Fam Quang and Perov V. L., "New evolutionary genetic algorithms for
NP-complete combinatorial problems", Biological Cybernetics 69, 229-234,
1993.
[5] Balakrishnan A. V., Applied Functional Analysis, Second Edition, Springer-
Verlag, New York, 1981.
[6] Barenco A., "A Universal Two-Bit Gate for Quantum Computation",
http://xxx.lanl.gov, quant-ph/9505016
[7] Barenco A. et al., "Elementary gates for quantum computation" ,
http://xxx.lanl.gov, quant-ph/9503016
[8] Barenco A., "Quantum Physics and Computers", Contemporary Physics 37,
375-389, 1996.
[9] Bell J. S., Speakable and unspeakable in quantum mechanics, Cambridge Uni-
versity Press, Cambridge, 1989.
[10] Ben-Ari M., Mathematical Logic for Computer Science, Prentice Hall, New
York, 1993.
[11] Benioff P., "Models of Quantum Turing Machines",
http://xxx.lanl.gov, quant-ph/9708054
[12] Bennett C. H. and G. Brassard, "Quantum cryptography: Public-key distri-
bution and coin tossing", Proceedings of IEEE International Conference on
Computers, Systems and Signal Processing, Bangalore, India, 175-179 (1984).

[13] Bennett C. H., In Emerging Syntheses in Science, ed. D. Pines, Addison-


Wesley, Reading, MA, 1988.
[14] Bennett C. H., G. Brassard, and N. D. Mermin, "Quantum cryptography
without Bell's theorem", Phys. Rev. Lett. 68, 557-559 (1992).

[15] Bennett C. H., "Quantum cryptography using any two nonorthogonal states",
Phys. Rev. Lett. 68, 3121-3124 (1992).

[16] Bennett C. H. and S.J. Wiesner, "Communication via one- and two-particle
operations on Einstein-Podolsky-Rosen states", Phys. Rev. Lett. 69, 2881-2884
(1992).
[17] Bennett C. H., G. Brassard, C. Crépeau, R. Jozsa, A. Peres and W. K.
Wootters, "Teleporting an Unknown Quantum State via Dual Classical and
Einstein-Podolsky-Rosen Channels", Phys. Rev. Lett. 70, 1895-1899 (1993).

[18] Bennett C. H., "Quantum Information and Computation", Physics Today,
October, 24-30, 1995.

[19] Bennett C. H., D. P. DiVincenzo, J. A. Smolin and W. K. Wootters, "Mixed
State Entanglement and Quantum Error Correction",
http://xxx.lanl.gov, quant-ph/9604024
[20] Bennett C. H., E. Bernstein, G. Brassard and U. Vazirani, "Strengths and
Weaknesses of Quantum Computing", SIAM Journal on Computing 26, 1510-
1523, 1997.
[21] Berthiaume A., "Quantum Computation",
http://andre.cs.depaul.edu/Andre/publicat.htm

[22] Biron D., O. Biham, E. Biham, M. Grassl and D. A. Lidar, "Generalized


Grover Search Algorithm for Arbitrary Initial Amplitude Distribution" ,
http://xxx.lanl.gov, quant-ph/9801066

[23] Böhm A., Quantum Mechanics, Springer-Verlag, 1986.


[24] Bouwmeester D. et al., "Experimental Quantum Teleportation", Nature 390,
575, 1997.
[25] Bouwmeester D., J.-W. Pan, M. Daniell, H. Weinfurter and A. Zeilinger, "Ob-
servation of three-photon Greenberger-Horne-Zeilinger entanglement",
http://xxx.lanl.gov, quant-ph/9810035
[26] Boyer M., Brassard G., Hoyer P. and A. Tapp, "Tight bounds on quantum
searching" ,
http://xxx.lanl.gov, quant-ph/9605034

[27] Brassard G. and P. Hoyer, "An Exact Quantum Polynomial-Time Algorithm


for Simon's Problem",
http://xxx.lanl.gov, quant-ph/9704027

[28] Brassard G., Braunstein S. L. and R. Cleve, Physica D 120 43-47 (1998).

[29] Braunstein S. L., "Error Correction for Continuous Quantum Variables",


Physical Review Letters 80, 4084-4087, 1998.

[30] Braunstein S. L. and H. J. Kimble, "Teleportation of Continuous Quantum


Variables", Physical Review Letters 80, 869-872, 1998.
[31] Braunstein S. L. and H. J. Kimble, "Dense coding for continuous variables",
Physical Review A 61, 042302-1-042302-4, 2000.
[32] Briegel H.-J., T. Calarco, D. Jaksch, J. I. Cirac and P. Zoller, "Quantum
computing with neutral atoms" ,
http://xxx.lanl.gov, quant-ph/9904010
[33] Bruß D., D. P. DiVincenzo, A. Ekert, C. A. Fuchs, C. Macchiavello and J. A.
Smolin, "Optimal universal and state-dependent quantum cloning",
http://xxx.lanl.gov, quant-ph/9705038
[34] Burkard G., D. Loss and D. P. DiVincenzo, "Coupled quantum dots as quan-
tum gates",
http://xxx.lanl.gov, quant-ph/9808026
[35] Buzek V., and M. Hillery, "Universal Optimal Cloning of Qubits and Quantum
Registers" ,
http://xxx.lanl.gov, quant-ph/9801009
[36] Cabello A., "Quantum Key Distribution Based on Entanglement Swapping",
http://xxx.lanl.gov, quant-ph/9911025
[37] Calderbank A. R., E. M. Rains, P. W. Shor, and N. J. A. Sloane, "Quantum
Error Correction and Orthogonal Geometry" ,
http://xxx.lanl.gov, quant-ph/9605005
[38] Cerf N.J. and C. Adami, "Negative entropy and information in quantum me-
chanics" ,
http://xxx.lanl.gov, quant-ph/9512022
[39] Cerf N.J. and C. Adami, "Quantum mechanics of measurement",
http://xxx.lanl.gov, quant-ph/9605002
[40] Cerf N.J. and C. Adami, "Entropic Bell Inequalities",
http://xxx.lanl.gov, quant-ph/9608047
[41] Chaitin G. J., Information, Randomness and Incompleteness, World Scientific,
Singapore, 1987.
[42] Chartrand C. and L. Lesniak, Graphs and Digraphs, Third Edition, Chapman
and Hall, London, 1996.
[43] Chuang I. L., L. M. K. Vandersypen, Xinlan Zhou, D. W. Leung and S. Lloyd,
"Experimental realization of a quantum algorithm" ,
http://xxx.lanl.gov, quant-ph/9801037
[44] Chuang I. L., N. Gershenfeld and M. Kubinec, "Experimental Implementation
of Fast Quantum Searching" ,
http://squint.stanford.edu/qc/nmrqc-grover/index.html

[45] Cichocki A. and Unbehauen R., Neural Networks for Optimization and Signal
Processing, John Wiley, Chichester, 1993.

[46] Cirac J. I. and P. Zoller, "Quantum Computations with Cold Trapped Ions",
Physical Review Letters 74, 4091-4094, 1995.

[47] Cleve R., A. Ekert, C. Macchiavello and M. Mosca, "Quantum Algorithms


Revisited", Proc. Roy. Soc. Lond. A 454, 339-354, 1998.
[48] Cohen D. E., Computability and Logic, John Wiley, New York, 1987.

[49] Cohen D. I. A., Introduction to Computer Theory, Revised Edition, Wiley, New
York, 1991.
[50] Cory D. G., M. D. Price and T. F. Havel, "Nuclear Magnetic Resonance Spec-
troscopy: An Experimentally Accessible Paradigm for Quantum Computing",
http://xxx.lanl.gov, quant-ph/9709001
[51] Cory D. G. et al., "NMR Based Quantum Information Processing: Achieve-
ments and Prospects" ,
http://xxx.lanl.gov, quant-ph/0004104
[52] Cybenko G., Approximation by superpositions of a sigmoidal function, Math-
ematics of Control, Signals and Systems 2, 303-314, 1989

[53] Davis H. T., Introduction to Nonlinear Differential and Integral Equations,
Dover Publications, New York, 1962.
[54] Deutsch D., "Quantum theory, the Church-Turing principle and the universal
quantum computer" ,
Proc. Royal Soc. London A 400, 97-117,1985.
[55] Deutsch D. and R. Jozsa, Proc. Royal Soc. London A 439, 553, 1992.
[56] Deutsch D., A. Barenco and A. Ekert, "Universality in Quantum Computa-
tion" ,
http://xxx.lanl.gov, quant-ph/9505018
[57] DiVincenzo D. P., "Two-bit gates are universal for quantum computation",
http://xxx.lanl.gov, cond-mat/9407022
[58] DiVincenzo D. P. and P. W. Shor, "Fault-Tolerant Error correction with effi-
cient Quantum Codes" ,
http://xxx.lanl.gov, quant-ph/9605031
[59] Dirac P. A. M., The Principles of Quantum Mechanics, Clarendon Press, Ox-
ford, 1958.
[60] Einstein A., B. Podolski and N. Rosen, "Can quantum mechanical description
of reality be considered complete 7" ,
Physical Review 47, 777-780, 1935.
[61] Ekert A., "Quantum cryptography based on Bell's theorem", Physical Review Letters 67, 661-663, 1991.
[62] Elby A. and J. Bub, "Triorthogonal uniqueness theorem and its relevance to the interpretation of quantum mechanics", Physical Review A 49, 4213-4216, 1994.
[63] Epstein R. L. and W. A. Carnielli, Computability, Wadsworth & Brooks/Cole, Pacific Grove, California, 1989.
[64] Everett III H., "Relative state formulation of quantum mechanics", Reviews of Modern Physics 29, 454-462, 1957.
[65] Fausett L., Fundamentals of Neural Networks: Architecture, Algorithms and Applications, Prentice Hall, Englewood Cliffs, N. J., 1994.
[66] Ferreira C., "Gene Expression Programming: a New Adaptive Algorithm for Solving Problems",
http://xxx.lanl.gov, cs.AI/0102027
[67] Feynman R. P., A. J. G. Hey (Editor) and R. W. Allen (Editor), Feynman Lectures on Computation, Perseus Books, 1996.
[68] Feynman R. P., R. B. Leighton and M. Sands, The Feynman Lectures on Physics, Volume III, Addison-Wesley, Reading, MA, 1966.
[69] Funahashi K.-I., "On the approximate realization of continuous mappings by neural networks", Neural Networks 2, 183-192, 1989.
[70] Gardiner S. A., J. I. Cirac and P. Zoller, "Measurement of Arbitrary Observables of a Trapped Ion",
http://xxx.lanl.gov, quant-ph/9606026
[71] Glimm J. and A. Jaffe, Quantum Physics, Springer-Verlag, New York, 1981.
[72] Goldberg D. E., Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, MA, 1989.
[73] Goldberg D. E. and R. Lingle, "Alleles, Loci, and the TSP", in Grefenstette J. J. (Editor), Proceedings of the First International Conference on Genetic Algorithms, Lawrence Erlbaum Associates, Hillsdale, NJ, 1985.
[74] Gottesman D., "A Class of Quantum Error-Correcting Codes Saturating the Quantum Hamming Bound",
http://xxx.lanl.gov, quant-ph/9604038
[75] Gottesman D., "An Introduction to Quantum Error Correction",
http://xxx.lanl.gov, quant-ph/0004072
[76] Grassberger P., Int. Journ. Theor. Phys. 25, 907, 1986.
[77] Grassmann W. K. and J.-P. Tremblay, Logic and Discrete Mathematics: A Computer Science Perspective, Prentice Hall, New Jersey, 1996.
[78] Grover L. K., "Quantum computers can search arbitrarily large databases by a single query",
http://xxx.lanl.gov, quant-ph/9706005
[79] Grover L. K., "Quantum Mechanics helps in searching for a needle in a haystack",
http://xxx.lanl.gov, quant-ph/9706033
[80] Gudder S., "Quantum Automata: An Overview", Int. Journ. Theor. Phys. 38, 2261, 1999.
[81] Hardy Y., W.-H. Steeb and R. Stoop, "Jacobi Elliptic Functions, Nonlinear Evolution Equations and Recursion", International Journal of Modern Physics C 11, 27-31, 2000.
[82] Hassoun M. H., Fundamentals of Artificial Neural Networks, The MIT Press, Cambridge, Massachusetts, 1995.
[83] Haykin S., Neural Networks, Macmillan College Publishing Company, New York, 1994.
[84] Healey R., The Philosophy of Quantum Mechanics, Cambridge University Press, Cambridge, 1990.
[85] Hebb D. O., The Organization of Behaviour, John Wiley, New York, 1949.
[86] Holevo A. S., "The Capacity of Quantum Channel with General Signal States",
http://xxx.lanl.gov, quant-ph/9611023
[87] Holevo A. S., "Coding Theorems for Quantum Communication Channels",
http://xxx.lanl.gov, quant-ph/9708046
[88] Holevo A. S., "Coding Theorems for Quantum Channels",
http://xxx.lanl.gov, quant-ph/9809023
[89] Holland J. H., Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, 1975.
[90] Horodecki M., P. Horodecki and R. Horodecki, "Separability of mixed states: necessary and sufficient conditions",
http://xxx.lanl.gov, quant-ph/9605038
[91] Horodecki P., M. Lewenstein, G. Vidal and I. Cirac, "Operational criterion and constructive checks for the separability of low rank density matrices",
http://xxx.lanl.gov, quant-ph/0002089
[92] Hornik K., M. Stinchcombe and H. White, "Multilayer feedforward networks are universal approximators", Neural Networks 2, 359-366, 1989.
[93] Høyer P., "Conjugated Operators in Quantum Algorithms",
ftp://ftp.imada.sdu.dk/pub/papers/pp-1997/34.ps.gz
[94] Huberman B. A. and T. Hogg, Physica D 22, 376, 1986.

[95] Ivanyos G., F. Magniez and M. Santha, "Efficient quantum algorithms for some instances of the non-Abelian hidden subgroup problem",
http://xxx.lanl.gov, quant-ph/0102014
[96] Jianwei Pan and A. Zeilinger, Physical Review A 57, 2208-2212, 1998.
[97] Jones N. D., Computability Theory: An Introduction, Academic Press, New York, 1973.
[98] Jones J. A. and M. Mosca, "Implementation of a Quantum Algorithm to Solve Deutsch's Problem on a Nuclear Magnetic Resonance Quantum Computer",
http://xxx.lanl.gov, quant-ph/9801027
[99] Jozsa R., "Quantum Algorithms and the Fourier Transform",
http://xxx.lanl.gov, quant-ph/9707033
[100] Jozsa R., "Quantum factoring, discrete logarithms and the hidden subgroup problem",
http://xxx.lanl.gov, quant-ph/0012084
[101] Kieu T. D. and M. Danos, "The halting problem for universal quantum computers",
http://xxx.lanl.gov, quant-ph/9811001
[102] Kitaev A. Y., "Quantum measurements and the Abelian Stabilizer Problem",
http://xxx.lanl.gov, quant-ph/9511026
[103] Knill E., I. Chuang and R. Laflamme, "Effective Pure States for Bulk Quantum Computation",
http://xxx.lanl.gov, quant-ph/9706053
[104] Knuth D. E., The Art of Computer Programming, Volume 1, Fundamental Algorithms, Addison-Wesley, Reading, Massachusetts, 1981.
[105] Knuth D. E., The Art of Computer Programming, Volume 2, Seminumerical Algorithms, Addison-Wesley, Reading, Massachusetts, 1981.
[106] Kolmogorov A. N., "Three approaches to the quantitative definition of information", Probl. Inform. Transmission 1, 1-7, 1965.
[107] Koza J. R., Genetic Programming, The MIT Press, Cambridge, Massachusetts, 1993.
[108] Laflamme R., C. Miquel, J. P. Paz and W. H. Zurek, "Perfect Quantum Error Correction Code",
http://xxx.lanl.gov, quant-ph/9602019
[109] Lempel A. and J. Ziv, "On the Complexity of Finite Sequences", IEEE Transactions on Information Theory 22, 75-81, 1976.
[110] Linden N. and S. Popescu, "The Halting Problem for Quantum Computers",
http://xxx.lanl.gov, quant-ph/9806054
[111] Lloyd S. and H. Pagels, "Complexity as thermodynamic depth", Ann. Phys.
188, 186-213, 1988.

[112] Lloyd S. and S. L. Braunstein, "Quantum Computation over Continuous Variables", Physical Review Letters 82, 1784-1787, 1999.
[113] López-Ruiz R., H. L. Mancini and X. Calbet, "A statistical measure of complexity", Phys. Lett. A 209, 321-326, 1995.
[114] Lovász L., Computational Complexity,
http://zoo.cs.yale.edu/classes/cs460/Spring98/complex.ps
[115] Mallozzi J. S. and N. J. De Lillo, Computability with Pascal, Prentice Hall, New Jersey, 1984.
[116] Michalewicz Z., Genetic Algorithms + Data Structures = Evolution Programs, Third Edition, Springer-Verlag, Berlin, 1996.
[117] Minsky M. L., Computation: Finite and Infinite Machines, Prentice Hall, New York, 1967.
[118] Miquel C., J. P. Paz and R. Perazzo, "Factoring in a dissipative quantum computer", Physical Review A 54, 2605-2613, 1996.
[119] Moore C. and J. P. Crutchfield, "Quantum Automata",
http://xxx.lanl.gov, quant-ph/9707031
[120] Monroe C., D. M. Meekhof, B. E. King, W. M. Itano and D. J. Wineland, "Demonstration of a Fundamental Quantum Logic Gate", Physical Review Letters 75, 4714-4717, 1995.
[121] Mosca M. and A. Ekert, "The Hidden Subgroup Problem and Eigenvalue Estimation on a Quantum Computer",
http://xxx.lanl.gov, quant-ph/9903071
[122] Mozyrsky D., V. Privman and M. Hillery, "A Hamiltonian for quantum copying", Physics Letters A 226, 253-256, 1997.
[123] Nielsen M. A. and I. L. Chuang, "Programmable Quantum Gate Arrays", Physical Review Letters 79, 321-324, 1997.
[124] Ömer B., http://tph.tuwien.ac.at/~oemer
[125] Ozawa M., "Quantum Turing machines: Local transition, preparation, measurement and halting",
http://xxx.lanl.gov, quant-ph/9809038
[126] Ozawa M., "Entanglement measures and the Hilbert-Schmidt distance",
http://xxx.lanl.gov, quant-ph/0002036
[127] Peres A., "Quantum Entanglement: Criteria and Collective Tests",
http://xxx.lanl.gov, quant-ph/9707026
[128] Peres A., "All the Bell Inequalities",
http://xxx.lanl.gov, quant-ph/9807017
[129] Popescu S., "Bell's inequalities and density matrices. Revealing 'hidden' nonlocality",
http://xxx.lanl.gov, quant-ph/9502005
[130] Porod W., "Quantum-dot devices and quantum-dot cellular automata", International Journal of Bifurcation and Chaos 7, 2199-2218, 1997.
[131] Poyatos J. F., J. I. Cirac and P. Zoller, "Quantum gates with 'hot' trapped ions",
http://xxx.lanl.gov, quant-ph/9712012
[132] Preskill J., Quantum Information and Computation,
http://www.theory.caltech.edu/~preskill/ph229
[133] Preskill J., "Fault-tolerant quantum computation",
http://xxx.lanl.gov, quant-ph/9712048
[134] Pritzker Y., http://www.openqubit.org
[135] Prugovečki E., Quantum Mechanics in Hilbert Space, Second Edition, Academic Press, New York, 1981.
[136] Redhead M., Incompleteness, Nonlocality, and Realism, Clarendon Press, Oxford, 1990.
[137] Richtmyer R. D., Principles of Advanced Mathematical Physics, Volume I, Springer-Verlag, New York, 1978.
[138] Rieffel E. and W. Polak, "An Introduction to Quantum Computing for Non-Physicists",
http://xxx.lanl.gov, quant-ph/9809016
[139] Rojas R., Neural Networks, Springer-Verlag, Berlin, 1996.
[140] Rötteler M. and T. Beth, "Polynomial-Time Solution to the Hidden Subgroup Problem for a Class of non-abelian Groups",
http://xxx.lanl.gov, quant-ph/9812070
[141] Schack R. and T. A. Brun, "A C++ library using quantum trajectories to solve quantum master equations",
http://xxx.lanl.gov, quant-ph/9608004
[142] Schommers W. (Editor), Quantum Theory and Pictures of Reality, Springer-Verlag, Berlin, 1989.
[143] Schrödinger E., "Discussion of Probability Relations Between Separated Systems", Proceedings of the Cambridge Philosophical Society 31, 555-563, 1935.
[144] Schumacher B., "Quantum coding", Physical Review A 51, 2738-2747, 1995.
[145] Schwabl F., Quantum Mechanics, Second Revised Edition, Springer-Verlag, Berlin, 1995.
[146] Sewell G. L., Quantum Theory of Collective Phenomena, Clarendon Press, Oxford, 1986.
[147] Shi Yu, "On quantum generalization of the Church-Turing universality of computation",
http://xxx.lanl.gov, quant-ph/9805083
[148] Shor P. W., "Proceedings of the 35th Annual Symposium on the Foundations of Computer Science", edited by S. Goldwasser (Los Alamitos, CA: IEEE Computer Society Press), p. 124, 1994.
[149] Shor P. W., "Fault-Tolerant Quantum Computation",
http://xxx.lanl.gov, quant-ph/9605011
[150] Simon D., "On the power of quantum computation", SIAM J. on Computing 26, 1474-1483, 1997.
[151] Skahill K., VHDL for Programmable Logic, Addison-Wesley, Reading, Massachusetts, 1996.
[152] Stakgold I., Boundary Value Problems of Mathematical Physics, Volume I, Macmillan, New York, 1967.
[153] Stallings W., Computer Organization and Architecture: Designing for Performance, Fourth Edition, Prentice Hall, 1996.
[154] Steane A., "Multiple-Particle Interference and Quantum Error Correction",
http://xxx.lanl.gov, quant-ph/9601029
[155] Steane A., "The Ion Trap Quantum Information Processor",
http://xxx.lanl.gov, quant-ph/9608011
[156] Steane A., "Quantum computing",
http://xxx.lanl.gov, quant-ph/9708022
[157] Steane A., "Efficient fault-tolerant quantum computing",
http://xxx.lanl.gov, quant-ph/9809054
[158] Steane A. and D. M. Lucas, "Quantum computing with trapped ions, atoms and light",
http://xxx.lanl.gov, quant-ph/0004053
[159] Steeb W.-H., "Bose-Fermi Systems and Computer Algebra", Found. Phys. Lett. 8, 73-82, 1995.

[160] Steeb W.-H., Problems and Solutions in Theoretical and Mathematical Physics,
Volume I, World Scientific, Singapore, 1996.
[161] Steeb W.-H. and F. Solms, "Complexity, chaos and one-dimensional maps", South African Journal of Science 92, 353-354, 1996.
[162] Steeb W.-H., Matrix Calculus and Kronecker Product with Applications and C++ Programs, World Scientific, Singapore, 1997.
[163] Steeb W.-H., Hilbert Spaces, Wavelets, Generalized Functions and Modern Quantum Mechanics, Kluwer Academic Publishers, Dordrecht, 1998.
[164] Steeb W.-H., The Nonlinear Workbook, World Scientific, Singapore, 1999.
[165] Steeb W.-H. and Y. Hardy, "Entangled Quantum States and a C++ Implementation", International Journal of Modern Physics C 11, 69-77, 2000.
[166] Steeb W.-H. and Y. Hardy, "Quantum Computing and SymbolicC++ Simulations", International Journal of Modern Physics C 11, 323-334, 2000.
[167] Steeb W.-H. and Y. Hardy, "Entangled Quantum States", International Journal of Theoretical Physics 39, 2765, 2000.

[168] Suzuki J., "A Markov Chain Analysis on Simple Genetic Algorithms", IEEE Transactions on Systems, Man and Cybernetics 25, 655-659, 1995.
[169] Tan K. S., W.-H. Steeb and Y. Hardy, SymbolicC++ (2nd extended and revised edition), Springer-Verlag, London, 2000.
[170] Terhal B. M., "Bell Inequalities and The Separability Criterion",
http://xxx.lanl.gov, quant-ph/9911057
[171] van der Lubbe J. C. A., Basic Methods of Cryptography, Cambridge University Press, Cambridge, 1998.
[172] Valafar H., Distributed Global Optimization and Its Applications, Ph.D. Thesis, Purdue University, 1995.
[173] Vandersypen L. M. K., M. Steffen, M. H. Sherwood, C. S. Yannoni, G. Breyta and I. L. Chuang, "Implementation of a three-quantum-bit search algorithm",
http://xxx.lanl.gov, quant-ph/9910075
[174] Vandersypen L. M. K., M. Steffen, G. Breyta, C. S. Yannoni, R. Cleve and I. L. Chuang, "Experimental realization of order-finding with a quantum computer",
http://xxx.lanl.gov, quant-ph/0007017
[175] van Loock P. and S. L. Braunstein, "Unconditional teleportation of continuous-variable entanglement", Physical Review A 61, 010302-1-010302-4, 1999.
[176] Vedral V., M. B. Plenio, M. A. Rippin and P. L. Knight, "Quantifying Entanglement",
http://xxx.lanl.gov, quant-ph/9702027
[177] Vedral V. and M. B. Plenio, "Entanglement Measures and Purification Procedures",
http://xxx.lanl.gov, quant-ph/9707035
[178] Vedral V. and M. B. Plenio, "Basics of Quantum Computation",
http://xxx.lanl.gov, quant-ph/9802065
[179] Vigier J. P., "Non-locality Causality and Aether in Quantum Mechanics", Astronomische Nachrichten 303, 55-80, 1982.
[180] Vose M. D., "Modelling of Genetic Algorithms", Foundations of Genetic Algorithms 2, 63-73, 1992.
[181] Weidmann J., Linear Operators in Hilbert Space, Springer-Verlag, New York, 1980.
[182] Werner R. F., Physical Review A 40, 4277, 1989.
[183] Wilf H. S., Algorithms and Complexity,
http://www.cis.upenn.edu/~wilf, 1994.
[184] Witte C. and M. Trucks, "A new entanglement measure induced by the Hilbert-Schmidt norm", Phys. Lett. A 257, 14-20, 1999.
[185] Yosida K., Functional Analysis, Springer-Verlag, Berlin, 1978.
[186] Żukowski M., "Violations of Local Realism in the Innsbruck GHZ experiment",
http://xxx.lanl.gov, quant-ph/9811013
Index

Abelian group, 203
Abelian stabilizer problem, 528
Absolutely linearly separable, 267
Abstract data type, 171
Activation function, 262, 282
Adjoint operator, 418
Algebraic tensor product, 415
Algorithm, 3
Alice, 507
Alphabet, 18
ALU, 110
AND, 27
Angular momentum operator, 445
Animals, 313
Annihilation operator, 428
Arguments, 15
Arithmetic logic unit, 110
Arithmetization, 255
Assertion, 7
Asynchronous circuit, 125
Average Hamiltonian theory, 569
Backtracking, 165
Banach space, 405
Basic data types, 171
Basis degeneracy problem, 492
Batch mode, 294
BCD, 70
Bell basis, 410, 455
Bell's inequality, 543
Bessel functions, 138
Bias, 269
Bin packing problem, 380
Binary
    adder, 98
    coded decimal numbers, 70
    digit, 28
    division, 59, 107
    four-bit adder, 100
    full adder, 99
    half adder, 98
    multiplication, 59, 103
    notation, 51
    subtraction, 102
    tree, 190
binary Gray code, 320
Biorthogonal decomposition theorem, 498
Bit, 28
Bob, 507
Boolean
    algebra, 24
    function, 28
    variables, 28
Booth's algorithm, 106
Bounded, 417
Bra vector, 417
Buffer, 87
Bures distance, 479
Byte, 52
Canonical SOP form, 29
Carry cascading, 101
Cauchy sequence, 404
Cavity quantum electrodynamics, 565
Characteristic equation, 427
Characteristic function, 16
Chromosomes, 313
Church's thesis, 252
Clauser, Horne, Shimony, Holt inequality, 543
Clocks, 125
Code, 200
Code words, 200
Commutator, 431
Compact support, 158
Comparator, 108
Complement, 27
Complete, 405
Complexity, 251
    space, 259
    time, 259
Computability, 251
Concatenation, 18
Conjunction, 24
Conjunctive normal form, 28
Continuous spectrum, 426
Converge
    strongly, 405, 421
    uniformly, 421
    weakly, 405, 421
Cook's theorem, 260
Coprime, 524
Cost function, 314
Creation operator, 428
Crossover, 315
Crossover operation, 316
Cryptography, 215
Cryptology, 215
Decimal incrementer, 37
Decimal number system, 51
Decoder, 92
Decoherence, 555
Degenerate quantum code, 562
DeMorgan's theorem, 27
Demultiplexer, 96
Dense coding, 539
Density matrix, 422
Deterministic algorithm, 5
Deutsch's problem, 515
Dirac notation, 417
Dirac spin matrices, 439
Discrete Fourier transform, 522
Discrete logarithm, 528
Discrete spectrum, 429
Discrete wavelet transform, 156
Disjunction, 24
Disjunctive normal form, 28
Distance, 404
Distributed global optimization, 381
Domain, 15, 417
Doubleword, 52
Dyadic product, 417
Edge triggering, 126
Effectively computable, 16
Eigenspace, 426
Eigenvalue, 426
Eigenvector, 426
Empty word, 18
Encapsulation, 171
Encoder, 93
    priority, 93
Ensemble quantum computer, 569
Entangled, 455
Entanglement of formation, 545
entanglement swapping, 511
Epoch, 294
EPR state, 456
EPROM, 114
Equivalence, 28
Error function, 297
Error syndrome, 555
Euclidean algorithm, 4
Euler's theorem, 221
Euler's totient function, 221
Even-parity function, 198
Exact quantum polynomial time, 520
Exclusive OR, 29
Execution, 5
Exponent, 74
Feedback, 89
Fibonacci sequence, 137
Fidelity, 207
Finite automata, 230
    quantum, 501
Fitness function, 314
Fixed point number, 72
Floating point number, 72
Four colour problem, 356
Fourier expansion, 411
Free Boolean algebra, 25
Full adder, 34
Fundamental theorem of arithmetic, 255
GAL, 115
Garbage disposal, 476
Gate
    AND, 80, 270
    CNOT, 468
    controlled controlled NOT, 469
    controlled exchange, 470
    controlled NOT, 468
    Deutsch, 470
    exchange, 468
    Fredkin, 470
    NAND, 84
    NOR, 85
    NOT, 83
    OR, 81
    phase shift, 469
    Toffoli, 469
    Walsh-Hadamard, 465
    XNOR, 86
    XOR, 82, 467
Gates, 32
GCD, 4
Gene expression programming, 392
Generic Array Logic, 115
Genetic programming, 384
Gödel
    incompleteness theorem, 254
    number, 255
    numbering, 254
Gray code, 320
Greatest common divisor, 4
Half adder, 34
Hamming code, 199, 201
Hamming distance, 199
Heisenberg model, 440
Heisenberg picture, 446
Hermite polynomials, 415
Hexadecimal, 52
Hidden subgroup problem, 528
Hilbert curve, 149
Hilbert space, 403, 405
Hilbert-Schmidt norm, 546
Holevo information, 554
Horner's rule, 147
Hyperplane, 266
Individuals, 313
Inequality of Schwarz, 412
Information hiding, 171
Initial functions, 162
Inner product, 403
Innsbruck experiment, 508
Integration function, 282
Invariant, 7
Inverter, 83
Jacobi elliptic functions, 152
Jacobi symbol, 222
Karnaugh maps, 35
Ket vector, 417
Knapsack problem, 362
Kronecker delta, 410
Kronecker product, 437
L-language, 19
L-system, 19
Lagrange multiplier method, 360
Laguerre polynomials, 415
Last-in first-out, 168, 187
Latch, 119
Lebesgue square-integrable functions, 407
Legendre polynomials, 413
Level-triggered, 126
LIFO, 168, 187
Lindenmayer system, 19
Linearly separable, 267
Linked list, 172
Literal, 28
Local search, 316
Logistic function, 296
Logistic map, 146
Magnitude comparator, 108
Mantissa, 74
Mathematical induction, 6
McCulloch-Pitts model, 262
Mealy machine, 236
Metropolis criterion, 12
Minimum distance principle for error correction, 200
Mixed state, 423
Modulo-2 addition, 58
Moore machine, 233
Multiplexer, 97
Multiplicity, 426
Mutation, 316
Mutation operator, 316
Mutual recursion, 152
NAND, 31
NMR, 569
No-cloning theorem, 477
Noiseless coding theorem, 205
    quantum, 553
Non-deterministic algorithm, 5
Nondegenerate quantum code, 562
NOR, 31
Norm, 418
Normalize, 74
Normalized representation, 74
Normed space, 403
NOT, 27
NP-class of problems, 259
NP-complete, 259
Nuclear magnetic resonance spectroscopy, 569
Null word, 18
Nullspace, 419
Odd-parity function, 82, 198
One to one, 15
One's complement method, 63
One-parameter group, 424
Onto, 15
OR, 27
Orthogonal, 404
Orthogonal complement, 409
Orthonormal sequence, 410
Overflow, 67
P-class of problems, 259
PAL, 115
Parent selection, 316
Parity function, 29, 198
Parseval's relation, 412
Partial function, 15
Partial trace, 431, 545
Particle-number operator, 429
Pattern mode, 294
Pauli spin matrices, 435
Penalty method, 361
Perceptron, 268
Peres-Horodecki criterion, 547
Permutation group, 372
PGA, 117
Picard's method, 138
PLA, 116
PLD, 113
Point spectrum, 426
Population, 313
Positive, 419
Postcondition, 7
Pre-Hilbert space, 403
Precondition, 7
Prefix, 18
Primitive data types, 171
Primitive recursive functions, 162
Primitive recursively closed, 164
Product form, 28
Programmable
    array logic, 115
    gate array, 117
    logic array, 116
    logic device, 113
    read only memory, 114
Projection operator, 420
PROM, 114
Properly separated, 266
Pruning, 165
Pseudo-pure states, 569
Pure state, 423
Purification procedure, 547
Q-automaton, 503
    finalizing, 503
Quadratic threshold gate, 279
Quantization, 445
Quantum
    bit, 452
    dot, 566
    entropy bound, 554
    Fourier transform, 522
    halting problem, 502
    Hamming bound, 562
    key distribution, 537
    network, 451
    register, 453
Qubit, 452
Quicksort, 143
Quine-McKluskey method, 38
Random algorithms, 10
Range, 15
Read only memory, 112
Recursion, 135
Register, 52, 119
Reproduction, 315
Residual spectrum, 426
Resolvent, 426
Resolvent set, 426
Response function, 503
ROM, 112
Russian peasant method, 104
Scalar product, 403
Schmidt decomposition, 432
Schmidt number, 458
Schmidt polar form, 432
Schrödinger equation, 443
Schrödinger picture, 446
Secular equation, 427
Self-adjoint, 418
Separable, 409, 546
Shannon entropy, 5, 205
Shannon information, 554
Short real format, 72
Sign, 74
Simple unitary transformation, 463
Simulated annealing, 12
Spectral theorem, 430
Spectral theory, 426
Spectrum, 426
Spherical harmonics, 414
Spin matrices, 440
Stabilizer, 562
Stack, 187
Steiner's problem, 380
Stochastic algorithms, 10
Strings, 313
Subspace, 409
Sum of products form, 28
Symbolic regression, 384
Synaptic weights, 261
Synchronous circuits, 125
Tenbyte, 52
Tensor product, 415, 425
Termination, 5
Threshold function, 262
Total function, 15
Trace, 408, 422
Transition diagram, 230
Translation operator, 421
Trapped ions, 564
Tree, 190
Tri-state logic, 88
Triangle inequality, 412
Tridecompositional uniqueness theorem, 499
Truth table, 28, 451
Turing machine, 238
    polynomial, 259
    quantum, 504
    universal, 253
Two's complement method, 65
Unitary operator, 420
Universal set of operations, 31
VHDL, 118
Vigenère table, 219
Weighted checksum, 204
Werner state, 422
Word, 18, 52
XNOR, 31
XOR, 29