Approximate Iterative Algorithms

Anthony Almudevar
Department of Biostatistics and Computational Biology,
University of Rochester, Rochester, NY, USA
CRC Press/Balkema is an imprint of the Taylor & Francis Group, an informa business
© 2014 Taylor & Francis Group, London, UK
Typeset by MPS Limited, Chennai, India
Printed and Bound by CPI Group (UK) Ltd, Croydon, CR0 4YY
All rights reserved. No part of this publication or the information contained
herein may be reproduced, stored in a retrieval system, or transmitted in any
form or by any means, electronic, mechanical, by photocopying, recording or
otherwise, without written prior permission from the publisher.
Although all care is taken to ensure integrity and the quality of this publication
and the information herein, no responsibility is assumed by the publishers nor
the author for any damage to the property or persons as a result of operation
or use of this publication and/or the information contained herein.
Library of Congress Cataloging-in-Publication Data
Almudevar, Anthony, author.
Approximate iterative algorithms / Anthony Almudevar, Department of
Biostatistics and Computational Biology, University of Rochester, Rochester, NY, USA.
pages cm
Includes bibliographical references and index.
ISBN 978-0-415-62154-0 (hardback) — ISBN 978-0-203-50341-6 (eBook PDF)
1. Approximation algorithms. 2. Functional analysis. 3. Probabilities.
4. Markov processes. I. Title.
QA76.9.A43A46 2014
519.2/33—dc23
2013041800
Published by: CRC Press/Balkema
P.O. Box 11320, 2301 EH Leiden, The Netherlands
e-mail: Pub.NL@taylorandfrancis.com
www.crcpress.com – www.taylorandfrancis.com
ISBN: 978-0-415-62154-0 (Hardback)
ISBN: 978-0-203-50341-6 (eBook PDF)
Table of contents

1 Introduction 1

PART I
Mathematical background 3

2 Real analysis and linear algebra 5


2.1 Definitions and notation 5
2.1.1 Numbers, sets and vectors 5
2.1.2 Logical notation 6
2.1.3 Set algebra 6
2.1.4 The supremum and infimum 7
2.1.5 Rounding off 7
2.1.6 Functions 7
2.1.7 Sequences and limits 8
2.1.8 Infinite series 9
2.1.9 Geometric series 11
2.1.10 Classes of real valued functions 11
2.1.11 Graphs 12
2.1.12 The binomial coefficient 13
2.1.13 Stirling’s approximation of the factorial 13
2.1.14 L’Hôpital’s rule 14
2.1.15 Taylor’s theorem 14
2.1.16 The l p norm 14
2.1.17 Power means 15
2.2 Equivalence relationships 16
2.3 Linear algebra 16
2.3.1 Matrices 16
2.3.2 Eigenvalues and spectral decomposition 18
2.3.3 Symmetric, Hermitian and positive definite matrices 21
2.3.4 Positive matrices 22
2.3.5 Stochastic matrices 24
2.3.6 Nonnegative matrices and graph structure 25

3 Background – measure theory 27


3.1 Topological spaces 27
3.1.1 Bases of topologies 29
3.1.2 Metric space topologies 29
3.2 Measure spaces 30
3.2.1 Formal construction of measures 31
3.2.2 Completion of measures 33
3.2.3 Outer measure 34
3.2.4 Extension of measures 35
3.2.5 Counting measure 36
3.2.6 Lebesgue measure 36
3.2.7 Borel sets 36
3.2.8 Dynkin system theorem 37
3.2.9 Signed measures 37
3.2.10 Decomposition of measures 38
3.2.11 Measurable functions 39
3.3 Integration 41
3.3.1 Convergence of integrals 42
3.3.2 Lp spaces 43
3.3.3 Radon-Nikodym derivative 43
3.4 Product spaces 44
3.4.1 Product topologies 44
3.4.2 Product measures 45
3.4.3 The Kolmogorov extension theorem 49

4 Background – probability theory 51


4.1 Probability measures – basic properties 52
4.2 Moment generating functions (MGF) and cumulant generating functions (CGF) 59
4.2.1 Moments and cumulants 61
4.2.2 MGF and CGF of independent sums 62
4.2.3 Relationship of the CGF to the normal distribution 62
4.2.4 Probability generating functions 63
4.3 Conditional distributions 63
4.4 Martingales 66
4.4.1 Stopping times 68
4.5 Some important theorems 68
4.6 Inequalities for tail probabilities 70
4.6.1 Chernoff bounds 71
4.6.2 Chernoff bound for the normal distribution 71
4.6.3 Chernoff bound for the gamma distribution 72
4.6.4 Sample means 72
4.6.5 Some inequalities for bounded random variables 73
4.7 Stochastic ordering 74
4.7.1 MGF ordering of the gamma and exponential distribution 75
4.7.2 Improved bounds based on hazard functions 76
4.8 Theory of stochastic limits 77
4.8.1 Covergence of random variables 77

4.8.2 Convergence of measures 78


4.8.3 Total variation norm 79
4.9 Stochastic kernels 82
4.9.1 Measurability of measure kernels 83
4.9.2 Continuity of measure kernels 84
4.10 Convergence of sums 85
4.11 The law of large numbers 86
4.12 Extreme value theory 88
4.13 Maximum likelihood estimation 89
4.14 Nonparametric estimates of distributions 91
4.15 Total variation distance for discrete distributions 92

5 Background – stochastic processes 97


5.1 Counting processes 97
5.1.1 Renewal processes 98
5.1.2 Poisson process 99
5.2 Markov processes 100
5.2.1 Discrete state spaces 101
5.2.2 Global properties of Markov chains 102
5.2.3 General state spaces 106
5.2.4 Geometric ergodicity 109
5.2.5 Spectral properties of Markov chains 111
5.3 Continuous-time Markov chains 111
5.3.1 Birth and death processes 114
5.4 Queueing systems 114
5.4.1 Queueing systems as birth and death processes 115
5.4.2 Utilization factor 116
5.4.3 General queueing systems and embedded Markov chains 116
5.5 Adapted counting processes 118
5.5.1 Asymptotic behavior 119
5.5.2 Relationship to adapted events 123

6 Functional analysis 125


6.1 Metric spaces 126
6.1.1 Contractive mappings 126
6.2 The Banach fixed point theorem 128
6.2.1 Stopping rules for fixed point algorithms 130
6.3 Vector spaces 131
6.3.1 Quotient spaces 133
6.3.2 Basis of a vector space 133
6.3.3 Operators 133
6.4 Banach spaces 134
6.4.1 Banach spaces and completeness 136
6.4.2 Linear operators 137
6.5 Norms and norm equivalence 139
6.5.1 Norm dominance 140
6.5.2 Equivalence properties of norm equivalence classes 140

6.6 Quotient spaces and seminorms 142


6.7 Hilbert spaces 144
6.8 Examples of Banach spaces 146
6.8.1 Finite dimensional spaces 146
6.8.2 Matrix norms and the submultiplicative property 147
6.8.3 Weighted norms on function spaces 147
6.8.4 Span seminorms 149
6.8.5 Operators on span quotient spaces 151
6.9 Measure kernels as linear operators 153
6.9.1 The contraction property of stochastic kernels 153
6.9.2 Stochastic kernels and the span seminorm 153

7 Fixed point equations 157


7.1 Contraction as a norm equivalence property 158
7.2 Linear fixed point equations 160
7.3 The geometric series theorem 161
7.4 Invariant transformations of fixed point equations 163
7.5 Fixed point algorithms and the span seminorm 164
7.5.1 Approximations in the span seminorm 166
7.5.2 Magnitude of fixed points in the span seminorm 167
7.6 Stopping rules for fixed point algorithms 168
7.6.1 Fixed point iteration in the span seminorm 169
7.7 Perturbations of fixed point equations 169

8 The distribution of a maximum 173


8.1 General approach 174
8.2 Bounds on M̄ based on MGFs 174
8.2.1 Sample means 176
8.2.2 Gamma distribution 177
8.3 Bounds for varying marginal distributions 178
8.3.1 Example 180
8.4 Tail probabilities of maxima 181
8.4.1 Extreme value distributions 182
8.4.2 Tail probabilities based on Boole’s inequality 182
8.4.3 The normal case 183
8.4.4 The gamma(α, λ) case 184
8.5 Variance mixtures based on random sample sizes 185
8.6 Bounds for maxima based on the first two moments 186
8.6.1 Stability 188

PART II
General theory of approximate iterative algorithms 189

9 Background – linear convergence 191


9.1 Linear convergence 191
9.2 Construction of envelopes – the nonstochastic case 194

9.3 Construction of envelopes – the stochastic case 196


9.4 A version of l’Hôpital’s rule for series 196

10 A general theory of approximate iterative algorithms (AIA) 199


10.1 A general tolerance model 201
10.2 Example: a preliminary model 201
10.3 Model elements of an AIA 202
10.3.1 Lipschitz kernels 202
10.3.2 Lipschitz convolutions 203
10.4 A classification system for AIAs 204
10.4.1 Relative error model 206
10.5 General inequalities 208
10.5.1 Hilbert space models of AIAs 209
10.6 Nonexpansive operators 213
10.6.1 Application of general inequalities to nonexpansive AIAs 214
10.6.2 Weakly contractive AIAs 216
10.6.3 Examples 216
10.6.4 Stochastic approximation (Robbins-Monro algorithm) 218
10.7 Rates of convergence for AIAs 220
10.7.1 Monotonicity of the Lipschitz kernel 221
10.7.2 Case I – strongly contractive models with nonvanishing bounds 221
10.7.3 Case II – rapidly vanishing approximation error 222
10.7.4 Case III – approximation error decreasing at contraction rate 223
10.7.5 Case IV – approximation error greater than contraction rate 224
10.7.6 Case V – contraction rates approaching 1 224
10.7.7 Adjustments for relative error models 227
10.7.8 A comparison of Banach space and Hilbert space models 228
10.8 Stochastic approximation as a weakly contractive algorithm 230
10.9 Tightness of algorithm tolerance 231
10.10 Finite bounds 232
10.10.1 Numerical example 233
10.11 Summary of convergence rates for strongly contractive models 235

11 Selection of approximation schedules for coarse-to-fine AIAs 239


11.1 Extending the tolerance model 239
11.1.1 Comparison model for tolerance schedules 241
11.1.2 Regularity conditions for the computation function 242
11.2 Main result 242
11.3 Examples of cost functions 243
11.4 A general principle for AIAs 245

PART III
Application to Markov decision processes 247

12 Markov decision processes (MDP) – background 249


12.1 Model definition 250
12.2 The optimal control problem 253
12.2.1 Adaptive control policies 253
12.2.2 Optimal control policies 253
12.3 Dynamic programming and linear operators 255
12.3.1 The dynamic programming operator (DPO) 256
12.3.2 Finite horizon dynamic programming 257
12.3.3 Infinite horizon problem 258
12.3.4 Classes of MDP 259
12.3.5 Measurability of the DPO 260
12.4 Dynamic programming and value iteration 261
12.4.1 Value iteration and optimality 262
12.5 Regret and ε-optimal solutions 265
12.6 Banach space structure of dynamic programming 267
12.6.1 The contraction property 269
12.6.2 Contraction properties of the DPO 269
12.6.3 The equivalence of uniform convergence and contraction for the DPO 272
12.7 Average cost criterion for MDP 274

13 Markov decision processes – value iteration 279


13.1 Value iteration on quotient spaces 279
13.2 Contraction in the span seminorm 281
13.2.1 Contraction properties of the DPO 281
13.3 Stopping rules for value iteration 283
13.4 Value iteration in the span seminorm 283
13.5 Example: M/D/1/K queueing system 284
13.6 Efficient calculation of |||QJ |||SP 288
13.7 Example: M/D/1/K system with optimal control of service capacity 291
13.8 Policy iteration 292
13.9 Value iteration for the average cost optimization 293

14 Model approximation in dynamic programming – general theory 295


14.1 The general inequality for MDPs 295
14.2 Model distance 298
14.3 Regret 300
14.4 A comment on the approximation of regret 302
14.5 Example 304

15 Sampling based approximation methods 309


15.1 Modeling maxima 310

15.1.1 Nonuniform sample allocation: dependence on qmin, and the ‘Curse of the Supremum Norm’ 313
15.1.2 Some queueing system examples 314
15.1.3 Truncated geometric model 315
15.1.4 M/G/1/K queueing model 316
15.1.5 Restarting schemes 318
15.2 Continuous state/action spaces 320
15.3 Parametric estimation of MDP models 321

16 Approximate value iteration by truncation 327


16.1 Truncation algorithm 328
16.2 Regularity conditions for tolerance-cost model 329
16.2.1 Suboptimal orderings 329
16.3 Example 330

17 Grid approximations of MDPs with continuous state/action spaces 333


17.1 Discretization methods 333
17.2 Complexity analysis 335
17.3 Application of approximation schedules 336

18 Adaptive control of MDPs 341


18.1 Regret bounds for adaptive policies 342
18.2 Definition of an adaptive MDP 343
18.3 Online parameter estimation 345
18.4 Exploration schedule 347

Bibliography 351
Subject index 357
Chapter 1

Introduction

The scope of this volume is quite specific. Suppose we wish to determine the solution
V ∗ to a fixed point equation V = TV for some operator T. Under suitable conditions,
V ∗ will be the limit of an iterative algorithm

V0 = v0
Vk = TVk−1 , k = 1, 2, . . . , (1.1)

where v0 is some initial solution. Such algorithms are ubiquitous in applied mathemat-
ics, and their properties well known.
Then suppose (1.1) is replaced with an approximation

V0 = v0
Vk = T̂k Vk−1 , k = 1, 2, . . . , (1.2)

where each T̂k is close to T in some sense. The subject of this book is the analysis
of algorithms of the form (1.2). The material in this book is organized around three
questions:

(Q1) If (1.1) converges to V ∗ , under what conditions does (1.2) also converge to V ∗ ?
(Q2) How does the approximation affect the limiting properties of (1.2)? How close
is the limit of (1.2) to V ∗ , and what is the rate of convergence (particularly in
comparison to that of (1.1))?
(Q3) If (1.2) is subject to design, in the sense that an approximation parameter, such
as grid size, can be selected for each T̂k , can an approximation schedule be
determined which minimizes approximation error as a function of computation
time?
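The distinction between (1.1) and (1.2) can be made concrete with a small numerical sketch. The operator T and the perturbation sequence below are invented for illustration and are not taken from the text: T is a contraction on R with fixed point V* = 2, and each T̂k adds an error term that vanishes at the contraction rate.

```python
# Illustrative sketch of (1.1) vs (1.2); T and the perturbations are
# arbitrary choices, not the book's examples.

def T(v):
    # A contractive operator on R with Lipschitz constant 0.5;
    # its fixed point solves v = 0.5*v + 1, i.e. V* = 2.
    return 0.5 * v + 1.0

def exact_iteration(v0, n):
    v = v0
    for _ in range(n):
        v = T(v)              # V_k = T V_{k-1}  -- algorithm (1.1)
    return v

def approximate_iteration(v0, n):
    v = v0
    for k in range(1, n + 1):
        v = T(v) + 0.5 ** k   # V_k = T^_k V_{k-1}, with error vanishing at the contraction rate
    return v

v_exact = exact_iteration(0.0, 60)
v_approx = approximate_iteration(0.0, 60)
print(v_exact, v_approx)      # both near the fixed point V* = 2
```

With an approximation error that decays this quickly, the approximate algorithm still converges to the same fixed point, which anticipates the case analysis of Chapter 10.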

From a theoretical point of view, the purpose of this book is to show how quite
straightforward principles of functional analysis can be used to resolve these ques-
tions with a high degree of generality. From the point of view of applications, the
primary interest is in dynamic programming and Markov decision processes (MDP),
with emphasis on approximation methods and computational efficiency. The emphasis
is less on the construction of specific algorithms than on the development of theoretical
tools with which broad classes of algorithms can be defined, and hence analyzed
with a common theory.
The book is divided into three parts. Chapters 2–8 cover background material in
real analysis, linear algebra, measure theory, probability theory and functional analy-
sis. This section is fairly extensive in comparison to other volumes dealing specifically
with MDPs. The intention is that the language of functional analysis be used to express
concepts from the other disciplines, in as general but concise a manner as possible.
By necessity, many proofs are omitted in these chapters, but suitable references are
given when appropriate.
Chapters 9–11 form the core of the volume, in the sense that the questions (Q1)–
(Q3) are largely considered here. Although a number of examples are considered (most
notably, an analysis of the Robbins-Monro algorithm), the main purpose is to deduce
properties of general classes of approximate iterative algorithms on Banach and Hilbert
spaces.
The remaining chapters deal with Markov decision processes (MDPs), which form
the principal motivation for the theory presented here. A foundational theory of MDPs
is given in Chapters 12 and 13, from the point of view of functional analysis, while
the chapters that follow discuss approximation methods.
Finally, I would like to acknowledge the patience and support of colleagues and
family, especially Cynthia, Benjamin and Jacob.
Part I

Mathematical background
Chapter 2

Real analysis and linear algebra

In this chapter we first define notation, then review a number of important results
in real analysis and linear algebra of which use will be made in later chapters. Most
readers will be familiar with the material, but in a number of cases it will be important
to establish which of several commonly used conventions will be used. It will also
prove convenient from time to time to have a reference close at hand. This may be
especially true of the section on spectral decomposition.

2.1 DEFINITIONS AND NOTATION

In this section we describe the notational conventions and basic definitions to be used
throughout the book.

2.1.1 Numbers, sets and vectors


A set is a collection of distinct objects of any kind. Each member of a set is referred to
as an element, and is represented once. A set E may be indexed. That is, given an index
set T , each element may be assigned a unique index t ∈ T , and all indices in T are
assigned to exactly one element of E, denoted xt . We may then write E = {xt ; t ∈ T }.
The set of (finite) real numbers is denoted R, and the set of extended real numbers
is denoted R̄ = R ∪ {−∞, ∞}. The restriction to nonnegative real numbers is written
R+ = [0, ∞) and R̄+ = R+ ∪ {∞}. We use standard notation for open, closed, left closed
and right closed intervals (a, b), [a, b], [a, b), (a, b]. A reference to an interval I on R̄
may be any of these types.
The set of (finite) integers will be denoted I, while the extended integers will be
I∞ = I ∪ {−∞, ∞}. The set of natural numbers N is taken to be the set of positive
integers, while N0 is the set of nonnegative integers. A rational number is any real
number expressible as a ratio of integers.
Then C denotes the complex numbers z = a + bi ∈ C, where i = √−1 is the imaginary
number and a, b ∈ R. Note that i is added and multiplied as though it were a real
number, in particular i² = −1. Multiplication is defined by z₁z₂ = (a₁ + b₁i)(a₂ + b₂i) =
a₁a₂ − b₁b₂ + (a₁b₂ + a₂b₁)i. The conjugate of z = a + bi ∈ C is written z̄ = a − bi,
so that zz̄ = a² + b² ∈ R. Together, z and z̄, without reference to their order, form
a conjugate pair.
The absolute value of a ∈ R is denoted |a| = √a², while |z| = (zz̄)^{1/2} = (a² + b²)^{1/2} ∈
R is also known as the magnitude or modulus of z ∈ C.

If S is a set of any type of number, Sd, d ∈ N, denotes the set of d-dimensional
vectors s̃ = (s1, . . . , sd), which are ordered collections of numbers si ∈ S. In particular,
the set of d-dimensional real vectors is written Rd. When 0, 1 ∈ S, we may write the
zero or one vector 0̃ = (0, . . . , 0), 1̃ = (1, . . . , 1), so that c1̃ = (c, . . . , c).
A collection of d numbers from S is unordered if no reference is made to the
order (they are unlabeled). Otherwise the collection is ordered, that is, it is a vector.
An unordered collection from S differs from a set in that a number s ∈ S may be
represented more than once. Braces {. . .} enclose a set while parentheses ( . . . ) enclose a
vector (braces will also be used to denote indexed sequences, when the context is clear).

2.1.2 Logical notation


We will make use of conventional logical notation. We write S1 ⇒ S2 if statement S1
implies statement S2 , and S1 ⇔ S2 whenever S1 ⇒ S2 and S2 ⇒ S1 both hold. In addition,
‘for all’ is written ∀, ‘there exists’ is written ∃ and ‘such that’ is written ∋.

2.1.3 Set algebra


If x is, or is not, an element of E, we write x ∈ E or x ∉ E. If all elements in A are also
in B then A is a subset of B, that is A ⊂ B. If A ⊂ B and B ⊂ A then A = B. If A ⊂ B
but A ≠ B, then A is a strict subset of B. Define the empty set, or null set, ∅, which
contains no elements. We may write ∅ ⊂ A for any set A.
Set algebra is defined for the class of all subsets of a nonempty set , commonly
known as a universe. Any set we consider may only contain elements of . This always
includes both ∅ and . Set operations include union (A ∪ B) = (A or B) = (A ∨ B) (all
elements in either A or B), intersection (A ∩ B) = (A and B) = (A ∧ B) (all elements in
both A and B), complementation (∼A) = (not A) = (Ac ) (all elements in  not in A),
relative complementation, or set difference, (B ∼ A) = (B − A) = (B not A) = (BAc ) (all
elements in B not in A). For any indexed collection of subsets At ⊂ , t ∈ T , the union
is ∪t∈T At , the set of all elements in at least one At , and the intersection is ∩t∈T At , the
set of all elements in all At . De Morgan’s Law applies to any index set T (finite or
infinite), that is,

∪t∈T Act = (∩t∈T At )c and ∩t∈T Act = (∪t∈T At )c .
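De Morgan’s Law is easy to verify computationally on a finite universe. The following sketch uses Python’s built-in set type; the particular universe and indexed collection are arbitrary examples, not from the text.

```python
# Verify both forms of De Morgan's Law for an indexed collection A_t
# inside a small universe Omega = {0, ..., 9}.
universe = set(range(10))

def complement(a):
    # complement relative to the universe
    return universe - a

A = [{1, 2, 3}, {2, 4, 6, 8}, {3, 5, 7}]   # an arbitrary indexed collection

# union of complements equals complement of intersection
union_of_complements = set.union(*[complement(a) for a in A])
complement_of_intersection = complement(set.intersection(*A))
assert union_of_complements == complement_of_intersection

# intersection of complements equals complement of union
intersection_of_complements = set.intersection(*[complement(a) for a in A])
complement_of_union = complement(set.union(*A))
assert intersection_of_complements == complement_of_union
```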

The cardinality of a set E is the number of elements it contains, and is denoted |E|.
If |E| < ∞ then E is a finite set. We have |∅| = 0. If |E| = ∞, this statement does not
suffice to characterize the cardinality of E. Two sets A, B are in a 1-1 correspondence
if a collection of pairs (a, b), a ∈ A, b ∈ B can be constructed such that each element of
A and of B is in exactly one pair. In this case, A and B are of equal cardinality. The
pairing is known as a bijection.
If the elements of A can be placed in a 1-1 correspondence with N we say A is
countable (is denumerable). We also adopt the convention of referring to any subset of
a countable set as countable. This means all finite sets are countable. If for countable
A we have |A| = ∞ then A is infinitely countable. Note that by some conventions, the
term countable is reserved for infinitely countable sets. For our purposes, it is more
natural to consider the finite sets as countable.

All infinitely countable sets are of equal cardinality with N, and so are
mutually of equal cardinality. Informally, a set is countable if it can be written
as a list, finite or infinite. The set Nd is countable since, for example, N2 =
{(1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1), . . .}. The set of rational numbers is countable,
since the pairing of numerator and denominator, in any canonical representation, is a
subset of N2 .
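The diagonal enumeration of N2 quoted above can be generated directly; this short sketch walks the diagonals i + j = s in the order used in the text.

```python
# Enumerate N x N as a single list by walking the diagonals i + j = s,
# reproducing the order (1,1), (1,2), (2,1), (1,3), (2,2), (3,1), ...

def diagonal_pairs(n_terms):
    pairs = []
    s = 2                       # diagonals are grouped by the sum i + j
    while len(pairs) < n_terms:
        for i in range(1, s):   # walk the diagonal i + j = s
            pairs.append((i, s - i))
            if len(pairs) == n_terms:
                break
        s += 1
    return pairs

print(diagonal_pairs(6))  # [(1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1)]
```

Since every pair is reached after finitely many steps, the enumeration is a bijection between N and N2, which is exactly the definition of countability.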
A set A is uncountable (is nondenumerable) if |A| = ∞ but A is not countable. The
set of real numbers, or any nonempty interval of real numbers, is uncountable.
If A1 , . . . , Ad are d sets, then A1 × A2 × · · · × Ad = ×ᵈᵢ₌₁ Ai is a product set, consisting
of the set of all ordered selections of one element from each set ai ∈ Ai . A vector
is an element of a product set, but a product set is more general, since the sets Ai need
not be equal, or even contain the same type of element. The definition may be extended
to arbitrary forms of index sets.

2.1.4 The supremum and infimum


For any set E ⊂ R, x = max E if x ∈ E and y ≤ x ∀y ∈ E. Similarly x = min E if x ∈ E and
y ≥ x ∀y ∈ E. The quantities min E or max E need not exist (consider E = (0, 1)).
The supremum of E, denoted sup E is the least upper bound of E. Similarly, the
infimum of E, denoted inf E is the greatest lower bound of E. In contrast with the
min, max operations, the supremum and infimum always exist, possibly equalling
−∞ or ∞. For example, if E = (0, 1), then inf E = 0 and sup E = 1. That is, inf E or
sup E need not be elements of E. All numbers in R̄ are both upper and lower bounds
of the empty set ∅, which means

inf ∅ = ∞ and sup ∅ = −∞.

If E = {xt ; t ∈ T } is an indexed set we write, when possible,

max E = max xt , min E = min xt , sup E = sup xt , inf E = inf xt .


t∈T t∈T t∈T t∈T

For two numbers a, b ∈ R̄, we may use the notations max{a, b} = a ∨ b = max(a, b)
and min{a, b} = a ∧ b = min(a, b).
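These conventions can be illustrated numerically. The sketch below approximates the open interval (0, 1) by a finite grid, for which min and max always exist, and adopts inf ∅ = ∞, sup ∅ = −∞ for the empty set, matching the convention above; the grid resolution is an arbitrary choice.

```python
# sup/inf of finite collections, with the empty-set convention
# inf(empty) = +infinity, sup(empty) = -infinity.
import math

def sup(xs):
    return max(xs, default=-math.inf)

def inf(xs):
    return min(xs, default=math.inf)

# points of (0, 1) on a fine grid; the endpoints 0 and 1 are excluded,
# so the grid's min/max approach but never equal inf E = 0, sup E = 1
grid = [k / 1000 for k in range(1, 1000)]
assert 0 < inf(grid) and sup(grid) < 1
assert inf([]) == math.inf and sup([]) == -math.inf
```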

2.1.5 Rounding off


Rounding off will proceed by the floor and ceiling conventions ⌊1.99⌋ = 1 and
⌈1.01⌉ = 2.
When we write x ≈ 3.45, we mean x ∈ [3.445, 3.455). This convention is adopted
throughout.
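These floor and ceiling conventions correspond to the standard library functions in most languages; for instance, in Python:

```python
# Floor and ceiling via the standard library.
import math

assert math.floor(1.99) == 1    # floor: greatest integer not exceeding x
assert math.ceil(1.01) == 2     # ceiling: least integer not less than x
assert math.floor(-1.5) == -2   # floor rounds toward -infinity
assert math.ceil(-1.5) == -1    # ceiling rounds toward +infinity
```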

2.1.6 Functions
If X, Y are two sets, then a function f : X → Y assigns a unique element of Y to each
element of X, in particular y = f (x). We refer to X and Y as the domain and range
(or codomain) of f . The image of a subset A ⊂ X is f (A) = {f (x) ∈ Y | x ∈ A}, and the
preimage (or inverse image) of a subset B ⊂ Y is f −1 (B) = {x ∈ X | f (x) ∈ B}. We say f is
injective (or one-to-one) if f (x1 ) ≠ f (x2 ) whenever x1 ≠ x2 , f is surjective (alternatively,
many-to-one or onto) if Y = f (X), and f is bijective if it is both injective and surjective.
An injective, surjective or bijective function is also referred to as an injection, surjection
or bijection. A bijective function f is invertible, and possesses a unique inverse function
f −1 : Y → X which is also bijective, and satisfies x = f −1 (f (x)). Only bijective functions
are invertible. Note that a preimage may be defined for any function, despite what is
suggested by the notation.
An indicator function f maps a domain X to {0, 1} by specifying a set E ⊂ X
and setting f (x) = 1 if x ∈ E and f (x) = 0 otherwise. This may be written explicitly as
f (x) = I{x ∈ E}, or IE when the context is clear.
For real valued functions f , g, (f ∨ g)(x) = f (x) ∨ g(x), (f ∧ g)(x) = f (x) ∧ g(x). We
write f ≡ c for constant c if f (x) = c ∀x. A function f on R satisfying f (x) = −f (−x)
or f (x) = f (−x) is an odd or even function. A real valued function f will sometimes
be decomposed into positive and negative components f = f⁺ − f⁻, where f⁺(x) =
f (x)I{f (x) > 0} and f⁻(x) = |f (x)|I{f (x) < 0}.
For mappings f : X → Y and g : Y → Z, where f is surjective, we denote the
composition (g ◦ f ) : X → Z, evaluated by g(f (x)) ∈ Z ∀x ∈ X.

2.1.7 Sequences and limits


A sequence of real numbers a0 , a1 , a2 , . . . will be written {ak }. Depending on the context,
a0 may or may not be defined. For any sequence of real numbers, by limk→∞ ak = a ∈ R
is always meant that ∀ε > 0 ∃K ∋ k > K ⇒ |a − ak | < ε. A reference to limk→∞ ak implies
an assertion that a limit exists. This will sometimes be written ak → a or ak →k a when
the context makes the meaning clear.
When a limit exists, a sequence is convergent. If a sequence does not converge it is
divergent. This excludes the possibility of a limit ∞ or −∞ for a convergent sequence.
However, it is sometimes natural to think of a sequence with a ‘limit’ in {−∞, ∞}. We
can therefore write limk→∞ ak = ∞ if ∀M ∃K ∋ k > K ⇒ ak > M, and limk→∞ ak = −∞
if ∀M ∃K ∋ k > K ⇒ ak < M. Either sequence is properly divergent.
If ak+1 ≥ ak , the sequence must possess a limit a, possibly ∞. This is written
ak ↑ a. Similarly, if ak+1 ≤ ak , there exists a limit ak ↓ a, possibly −∞. Then {ak } is a
nondecreasing or nonincreasing sequence (or increasing, decreasing when the defining
inequalities are strict).
Then lim supk→∞ ak = limk→∞ supi≥k ai . This quantity is always defined since bk =
supi≥k ai defines a nonincreasing sequence. Similarly lim inf k→∞ ak = limk→∞ inf i≥k ai
always exists. We always have lim inf k→∞ ak ≤ lim supk→∞ ak and limk→∞ ak exists if
and only if a = lim inf k→∞ ak = lim supk→∞ ak , in which case limk→∞ ak = a.
When limit operations are applied to sequences of real values functions, the limits
are assumed to be evaluated pointwise. Thus, if we write fn ↑ f , this means that fn (x) ↑
f (x) for all x, and therefore fn is a nondecreasing sequence of functions, with analogous
conventions used for the remaining types of limits.
Note that pointwise convergence of a function limn→∞ fn = f is distinct
from uniform convergence of a sequence of functions, which is equivalent to
limn→∞ supx |fn (x) − f (x)| = 0. Of course, uniform convergence implies pointwise con-
vergence, but the converse does not hold. Unless uniform convergence is explicitly
stated, pointwise convergence is intended.
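The tail suprema and infima defining lim sup and lim inf can be examined on a truncated sequence. The example sequence ak = (−1)^k (1 + 1/k) is an illustrative choice, not from the text: its lim sup is 1, its lim inf is −1, and its limit does not exist.

```python
# Tail suprema/infima of a_k = (-1)^k (1 + 1/k): the tail sups decrease
# to 1 and the tail infs increase to -1, so lim sup = 1, lim inf = -1.

def tail_sup(a, k):
    return max(a[k:])   # sup over indices i >= k (finite truncation)

def tail_inf(a, k):
    return min(a[k:])   # inf over indices i >= k (finite truncation)

a = [(-1) ** k * (1 + 1 / k) for k in range(1, 2001)]
print(tail_sup(a, 1000), tail_inf(a, 1000))  # close to 1 and -1
```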

When the context is clear, we may use the more compact notation d̃ = (d1 , d2 , . . . )
to represent a sequence {dk }. If ã = {ak } and b̃ = {bk } then we write ã ≤ b̃ if ak ≤ bk
for all k.
Let S be the class of all sequences of finite positive real numbers which con-
verge to zero, and let S − be those sequences in S which are nonincreasing. If {ak } ∈ S
we define the lower and upper convergence rates λl {ak } = lim inf k→∞ ak+1 /ak and
λu {ak } = lim supk→∞ ak+1 /ak . If 0 < λl {ak } ≤ λu {ak } < 1 then {ak } converges linearly.
If λu {ak } = 0 or λl {ak } = 1 then {ak } converges superlinearly or sublinearly, respec-
tively. We also define a weaker characterization of linear convergence by setting
λ̂l {ak } = lim inf k→∞ ak^{1/k} and λ̂u {ak } = lim supk→∞ ak^{1/k} .
When λl {ak } = λu {ak } = ρ we write λ{ak } = ρ. Similarly λ̂l {ak } = λ̂u {ak } = ρ is
written λ̂{ak } = ρ.
A sequence {ak } is of order {bk } if lim supk ak /bk < ∞, and may be written ak =
O(bk ). If ak = O(bk ) and bk = O(ak ) we write ak = Θ(bk ). Similarly, for two real valued
mappings ft , gt on (0, ∞) we write ft = O(gt ) if lim supt→∞ ft /gt < ∞, and ft = Θ(gt ) if
ft = O(gt ) and gt = O(ft ).
A sequence {bk } dominates {ak } if limk ak /bk = 0, which may be written ak =
o(bk ). A stronger condition holds if λu {ak } < λl {bk }, in which case we say {bk } linearly
dominates {ak }, which may be written ak = oₗ(bk ). Similarly, for two real
valued mappings ft , gt on (0, ∞) we write ft = o(gt ) if limt→∞ ft /gt = 0, that is, gt
dominates ft .
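The ratio ak+1/ak used to define these convergence rates is easy to examine numerically. The three example sequences below are illustrative choices exhibiting linear, superlinear and sublinear convergence respectively.

```python
# Successive-term ratios a_{k+1}/a_k for three example sequences.

def ratio_estimate(a_of_k, k):
    return a_of_k(k + 1) / a_of_k(k)

linear = lambda k: 0.5 ** k             # ratios equal 0.5: linear convergence
superlinear = lambda k: 0.5 ** (k * k)  # ratios -> 0: superlinear convergence
sublinear = lambda k: 1.0 / k           # ratios -> 1: sublinear convergence

print(ratio_estimate(linear, 50))          # 0.5
print(ratio_estimate(superlinear, 5))      # very small
print(ratio_estimate(sublinear, 10 ** 6))  # near 1
```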

2.1.8 Infinite series


Suppose we are given a sequence {ak }. The corresponding series (or infinite series) is
denoted

∑_{k=1}^∞ ak = ∑_k ak = a1 + a2 + · · · .

Some care is needed in defining a sum of an infinite collection of numbers. First, define
partial sums


Sn = ∑_{k=1}^{n} ak = a1 + a2 + · · · + an , n ≥ 1.

We may set S0 = 0. It is natural to think of evaluating a series by sequentially adding
each an to a cumulative total Sn−1 . In this case, the total sum equals limn Sn , assuming
the limit exists. We say that the series (or simply, the sum) exists if the limit exists
(including −∞ or ∞). The series is convergent if the sum exists and is finite. A series
is divergent if it is not convergent, and is properly divergent if the sum exists but is
not finite.

It is important to establish whether or not the value of the series depends on the
order of the sequence. Precisely, suppose σ : N → N is a bijective mapping (essentially,
an infinite permutation). If the series ∑_k ak exists, we would like to know if

∑_k ak = ∑_k aσ(k) . (2.1)

Since these two quantities are limits of distinct partial sums, equality need not hold.
This question has a quite definite resolution. A series ∑_k ak is called absolutely convergent
if ∑_k |ak | is convergent (so that all convergent series of nonnegative sequences are
absolutely convergent). A convergent series is unconditionally convergent if (2.1)
holds for all permutations σ. It may be shown that a series is absolutely convergent
if and only if it is unconditionally convergent. Therefore, a convergent series may be
defined as conditionally convergent if either it is not absolutely convergent, or if (2.1)
does not hold for at least one σ. Interestingly, by the Riemann series theorem, if ∑_k ak
is conditionally convergent then for any L ∈ R̄ there exists a permutation σL for which
∑_k aσL(k) = L.
There exist many well known tests for series convergence, which can be found in most calculus textbooks.
Let E = {at ; t ∈ T} be a countably infinite indexed set of extended real numbers. For example, we may have T = N^d. When there is no ambiguity, we can take ∑_t at to be the sum of all elements of E. Of course, in this case the implication is that the sum does not depend on the summation order. This is the case if and only if there is a bijective mapping σ : N → T for which ∑_k aσ(k) is absolutely convergent. If this holds, it holds for all such bijective mappings. All that is needed is to verify that the cumulative sum of the elements |at|, taken in any order, remains bounded. This is written, when possible,

∑_{t∈T} at = ∑_t at .

We also define for a sequence {ak} the product ∏_{k=1}^∞ ak. We will usually be interested in products of positive sequences, so this may be converted to a series by the log transformation:

log( ∏_{k=1}^∞ ak ) = ∑_{k=1}^∞ log(ak)

so that the issues are largely the same as for series. Similarly, for indexed set E = {at ; t ∈ T}, we may define ∏_{t∈T} at = ∏_t at when no ambiguity arises. This will be the case when, for example, either at ∈ (0, 1] for all t or at ∈ [1, ∞) for all t.
Finally, we make note of the following convention. We will sometimes be interested in summing over a strict subset T′ ⊂ T of the index set. This poses no particular problem if the series ∑_t at is well defined. If it happens that T′ = ∅, we will take

∑_{t∈∅} at = 0 and ∏_{t∈∅} at = 1.    (2.2)
Real analysis and linear algebra 11

2.1.9 Geometric series

We will make use of the following geometric series:

∑_{i=0}^∞ [(i + m)!/i!] r^i = m!/(1 − r)^{m+1}    for r² < 1, m = 0, 1, 2, . . .

∑_{i=0}^n r^i = (1 − r^{n+1})/(1 − r)    for r ≠ 1.    (2.3)
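Both identities can be spot-checked numerically; the following is an illustrative sketch (the values of r, n and m are chosen arbitrarily):

```python
def geom_partial(r, n):
    # sum_{i=0}^{n} r^i, computed term by term
    return sum(r**i for i in range(n + 1))

r, n = 0.5, 20
closed_form = (1 - r**(n + 1)) / (1 - r)       # (2.3), valid for r != 1
assert abs(geom_partial(r, n) - closed_form) < 1e-12

# infinite series, m = 0 case: sum_{i>=0} r^i = 1/(1 - r) for r^2 < 1
assert abs(geom_partial(r, 200) - 1 / (1 - r)) < 1e-12

# m = 1 case: sum_{i>=0} [(i + 1)!/i!] r^i = 1!/(1 - r)^2
s = sum((i + 1) * r**i for i in range(200))
assert abs(s - 1 / (1 - r)**2) < 1e-12
```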

2.1.10 Classes of real valued functions


Suppose X is a subset of R̄. The real valued function f : X → R̄ is a bounded func-
tion if supx∈X |f (x)| < ∞. In addition f is bounded below or bounded above if
infx∈X f (x) > −∞ or supx∈X f (x) < ∞.
A real valued function f : X → R̄ is lower semicontinuous at x0 if xn →n x0 implies
lim inf n f (xn ) ≥ f (x0 ), or upper semicontinuous at x0 if xn → x0 implies lim supn f (xn ) ≤
f (x0 ). We use the abbreviations lsc and usc. A function is, in general, lsc (usc) if it is
lsc (usc) at all x0 ∈ X . Equivalently, f is lsc if {x ∈ X | f (x) ≤ λ} is closed for all λ ∈ R,
and is usc if {x ∈ X | f (x) ≥ λ} is closed for all λ ∈ R. A function is continuous (at x0) if and only if it is both lsc and usc (at x0). Note that only sequences in X are required for the definition, so that if f is lsc or usc on X, it is also lsc or usc on X′ ⊂ X.
A set X ⊂ Rd is convex if for any p ∈ [0, 1] and any x1 , x2 ∈ X we also have
px1 + (1 − p)x2 ∈ X . A real valued function f : X → R on a convex set X is con-
vex if for any p ∈ [0, 1] and any x1 , x2 ∈ X we have pf (x1 ) + (1 − p)f (x2 ) ≥ f (px1 +
(1 − p)x2). Additionally, f is strictly convex if pf (x1) + (1 − p)f (x2) > f (px1 + (1 − p)x2) whenever p ∈ (0, 1) and x1 ≠ x2. If −f is (strictly) convex then f is (strictly) concave.
The usual kth order partial derivatives, when they exist, are written
∂k f /∂xi1 . . . ∂xik , and if d = 1 the kth total derivative is written d k f /dxk = f (k) (x). A
derivative is a function on X , unless evaluation at a specific value of x ∈ X is indicated,
as in d k f /dxk |x=x0 = f (k) (x0 ). The first and second total derivative will also be written
f  (x) and f  (x) when the context is clear.
The following function spaces are commonly defined: C(X ) is the set of all contin-
uous real valued functions on X , while Cb (X ) ⊂ C(X ) denotes all bounded continuous
functions on X . In addition, C k (X ) ⊂ C(X ) is the set of all continuous functions
on X for which all order 1 ≤ j ≤ k derivatives exist and are continuous on X , with
C∞(X ) ⊂ C(X ) denoting the class of functions with continuous derivatives of all orders (the infinitely differentiable functions). Note that a function on R may possess derivatives f′(x) everywhere (which are consistent in direction), without f′(x) being continuous.
When defining a function space, the convention that X is open, with X̄ representing the closure of X when needed, is sometimes adopted. This ensures that the conventional definitions of continuity and differentiability apply (formally, any bounded function defined on a finite set X is continuous, since the only convergent sequences in X are constant ones).

2.1.11 Graphs
A graph is a collection of nodes and edges. Most commonly, there are m nodes uniquely
labeled by elements of set V = {1, . . . , m}. We may identify the set of nodes as V
(although sometimes unlabeled graphs are studied). An edge is a connection between
two nodes, of which there are two types. A directed edge is any ordered pair from V,
and an undirected edge is any unordered pair from V. Possibly, the two nodes defining
an edge are the same, which yields a self edge. If E is any set of edges, then G = (V, E)
defines a graph. If all edges are directed (undirected), the graph is described as directed
(undirected), but a graph may contain both types.
It is natural to imagine a dynamic process on a graph defined by node occupancy.
A directed edge (v1, v2) denotes the possibility of a transition from v1 to v2. Accordingly,
a path within a directed graph G = (V, E) is any sequence of nodes v0 , v1 , . . . , vn for
which (vi−1 , vi ) ∈ E for 1 ≤ i ≤ n. This describes a path from v0 to vn of length n (the
number of edges needed to construct the path).
It will be instructive to borrow some of the terminology associated with the theory
of Markov chains (Section 5.2). For example, if there exists a path starting at i and
ending at j we say that j is accessible from i, which is written i → j. If i → j and j → i
then i and j communicate, which is written i ↔ j. The connectivity properties of a
directed graph are concerned with statements of this kind, as well as lengths of the
relevant paths.
The adjacency matrix adj(G) of graph G is an m × m 0-1 matrix with element gi,j = 1 if and only if the graph contains directed edge (i, j). The path properties of G can be deduced directly from the iterates adj(G)^n (conventions for matrices are given in Section 2.3.1).

Theorem 2.1 For any directed graph G with adjacency matrix AG = adj(G) there exists a path of length n from node i to node j if and only if element i, j of AG^n is positive.
Proof Let g[k]i,j be element i, j of AG^k. All such elements are nonnegative. Suppose, as an induction hypothesis, the theorem holds for all paths of length n′, for any n′ < n. We may write

g[n]i,j = ∑_{k=1}^m g[n′]i,k g[n − n′]k,j ,

from which we conclude that g[n]i,j > 0 if and only if for some k we have g[n′]i,k > 0 and g[n − n′]k,j > 0. Under the induction hypothesis, the latter statement is equivalent to the claim that for all n′ < n there is a node k for which there exists a path of length n′ from i to k and a path of length n − n′ from k to j. In turn, this claim is equivalent to the claim that there exists a path of length n from i to j. The induction hypothesis clearly holds for n = 1, which completes the proof. ///
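Theorem 2.1 can be checked on a small example; the following pure-Python sketch (the 3-node directed cycle is illustrative, not from the text) computes powers of an adjacency matrix:

```python
def matmul(A, B):
    # plain matrix product of two square nonnegative matrices
    m = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(m)]
            for i in range(m)]

def matpow(A, n):
    P = A
    for _ in range(n - 1):
        P = matmul(P, A)
    return P

# directed cycle on three nodes (0-indexed): 0 -> 1 -> 2 -> 0
A = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]
# a path of length 2 exists from node 0 to node 2 (0 -> 1 -> 2) ...
assert matpow(A, 2)[0][2] > 0
# ... but there is no path of length 2 from 0 to 1:
assert matpow(A, 2)[0][1] == 0
# every node returns to itself in exactly 3 steps:
A3 = matpow(A, 3)
assert all(A3[i][i] > 0 for i in range(3))
```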

It is interesting to compare Theorem 2.1 to the Chapman-Kolmogorov equations (5.4) associated with the theory of Markov chains. It turns out that many important properties of a Markov chain can be understood as the path properties of a directed graph. It is especially important to note that in Theorem 2.1 we can, without loss of generality, replace the '1' elements in AG with any positive number. Accordingly, we give an alternative version, Theorem 2.2, for nonnegative matrices.

Theorem 2.2 Let A be an n × n matrix of nonnegative elements ai,j. Let a[k]i,j be element i, j of A^k. Then a[n]i,j > 0 if and only if there exists a finite sequence of n + 1 indices v0, v1, . . . , vn, with v0 = i, vn = j, for which a_{vk−1,vk} > 0 for 1 ≤ k ≤ n.
Proof The proof follows that of Theorem 2.1. ///

The implications of this type of path structure are discussed further in Sections
2.3.4 and 5.2.

2.1.12 The binomial coefficient

For any n ∈ N0 the factorial is written n! = ∏_{i=1}^n i. By convention, 0! = 1 (compare to (2.2)). The binomial coefficient is

(n choose k) = n!/(k!(n − k)!) ,    n ≥ k, n, k ∈ N0.

Given m ≥ 2, if ni ∈ N0, i = 1, . . . , m, and n = n1 + · · · + nm, then the multinomial coefficient is

(n choose n1, . . . , nm) = n!/( ∏_{i=1}^m ni! ) .

The Binomial Theorem states that for a, b ∈ R and n ∈ N the following equality holds

(a + b)^n = ∑_{i=0}^n (n choose i) a^i b^{n−i} .    (2.4)
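Equation (2.4) is easy to verify with exact integer arithmetic; a brief sketch (the values of a, b and n are arbitrary):

```python
from math import comb   # comb(n, i) is the binomial coefficient n!/(i!(n-i)!)

def binomial_expansion(a, b, n):
    # right-hand side of (2.4)
    return sum(comb(n, i) * a**i * b**(n - i) for i in range(n + 1))

assert binomial_expansion(3, 2, 5) == (3 + 2)**5        # both equal 3125
# setting a = b = 1 shows the binomial coefficients sum to 2^n:
assert sum(comb(10, i) for i in range(11)) == 2**10
```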

2.1.13 Stirling’s approximation of the factorial


The factorial n! can be approximated accurately using series expansions. See, for exam-
ple, Feller (1968) (Chapter 2, Volume 1). Stirling’s approximation for the factorial is
given by

sn = (2π)1/2 nn+1/2 e−n , n ≥ 1,

and if we set n! = sn ρn , we have

e1/(12n+1) < ρn < e1/(12n) . (2.5)

The approximation is quite sharp, guaranteeing that (a) limn→∞ n!/sn = 1; (b) 1 <
n!/sn < e1/12 < 1.087 for all n ≥ 1; (c) (12n + 1)−1 < log(n!) − log(sn ) < (12n)−1 for all
n ≥ 1.
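The bounds in (c) can be confirmed numerically; in the sketch below, math.lgamma(n + 1) supplies log(n!) so that large n pose no overflow problem:

```python
import math

def log_stirling(n):
    # log(s_n), where s_n = (2 pi)^(1/2) n^(n + 1/2) e^(-n)
    return 0.5 * math.log(2 * math.pi) + (n + 0.5) * math.log(n) - n

for n in (1, 5, 50, 500):
    log_ratio = math.lgamma(n + 1) - log_stirling(n)   # = log(n!) - log(s_n)
    # bound (c): 1/(12n + 1) < log(n!) - log(s_n) < 1/(12n)
    assert 1 / (12 * n + 1) < log_ratio < 1 / (12 * n)
```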

2.1.14 L’Hôpital’s rule


Suppose f , g ∈ C(X ) for open interval X , and for x0 ∈ X we have limx→x0 f (x) =
limx→x0 g(x) = b, where b ∈ {−∞, 0, ∞}. The ratio f (x0 )/g(x0 ) is not defined, but the
limit limx→x0 f (x)/g(x) may be. If f , g ∈ C 1 (X − {x0 }), and g  (x)  = 0 for x ∈ X − {x0 }
then l’Hôpital’s Rule states that

lim f (x)/g(x) = lim f  (x)/g  (x),


x→x0 x→x0

provided the right hand limit exists.
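A numerical illustration (the choice f (x) = sin(x), g(x) = x, x0 = 0 is a standard 0/0 example, not taken from the text): the rule gives the limit f′(0)/g′(0) = cos(0)/1 = 1, and the ratio approaches it as x → 0:

```python
import math

# f(x) = sin(x) and g(x) = x both vanish at x0 = 0, so f(x0)/g(x0) is
# undefined; l'Hopital's rule gives lim f(x)/g(x) = f'(0)/g'(0) = 1.
for x in (0.1, 0.01, 0.001):
    ratio = math.sin(x) / x
    # sin(x)/x = 1 - x^2/6 + O(x^4), so the error stays below x^2:
    assert abs(ratio - 1.0) < x**2
```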

2.1.15 Taylor’s theorem


Suppose f is n times differentiable at x0. The nth order Taylor's polynomial about x0 is defined as

Pn(x; x0) = ∑_{i=0}^n [f^{(i)}(x0)/i!] (x − x0)^i ,    (2.6)

and the remainder term is given by

Rn(x; x0) = f (x) − Pn(x; x0).    (2.7)

The use of Pn (x; x0 ) to approximate f (x) is made precise by Taylor’s Theorem:

Theorem 2.3 Suppose f is n + 1 times differentiable on [a, b], f ∈ C^n([a, b]), and x0 ∈ [a, b]. Then for each x ∈ [a, b] there exists η(x), satisfying min(x, x0) ≤ η(x) ≤ max(x, x0), for which

Rn(x; x0) = [f^{(n+1)}(η(x))/(n + 1)!] (x − x0)^{n+1} ,    (Lagrange form)    (2.8)

as well as η′(x), also satisfying min(x, x0) ≤ η′(x) ≤ max(x, x0), for which

Rn(x; x0) = [f^{(n+1)}(η′(x))/n!] (x − η′(x))^n (x − x0).    (Cauchy form)    (2.9)

The Lagrange form of the remainder term is the one commonly intended, and
we adopt that convention here, although it is worth noting that alternative forms are
also used.
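The Lagrange form yields a computable error bound. The following sketch checks it for f (x) = e^x about x0 = 0 (the function and evaluation point are illustrative choices, not from the text):

```python
import math

def taylor_poly_exp(x, n):
    # P_n(x; 0) for f(x) = e^x, where every derivative f^(i)(0) = 1
    return sum(x**i / math.factorial(i) for i in range(n + 1))

x, n = 0.5, 4
remainder = math.exp(x) - taylor_poly_exp(x, n)        # R_n(x; 0)
# Lagrange form: R_n = e^eta x^(n+1)/(n+1)! for some eta in (0, x),
# hence 0 < R_n <= e^x x^(n+1)/(n+1)!:
bound = math.exp(x) * x**(n + 1) / math.factorial(n + 1)
assert 0 < remainder <= bound
```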

2.1.16 The l^p norm

The l^p norm for p ≥ 1 is defined for any x = (x1, . . . , xn) ∈ Rn by

‖x‖p = ( ∑_{i=1}^n |xi|^p )^{1/p}

for p < ∞, and

‖x‖∞ = max_i |xi|

when p = ∞.

2.1.17 Power means

For a collection of positive numbers ã = (a1, . . . , an) the power mean is defined as Mp[ã] = (n^{−1} ∑_{i=1}^n ai^p)^{1/p} for finite nonzero p. The definition is extended to p = 0, −∞, ∞ by the existence of well defined limits, yielding M−∞[ã] = mini{ai}, M0[ã] = (∏_{i=1}^n ai)^{1/n} and M∞[ã] = maxi{ai}.

Theorem 2.4 Suppose for positive numbers ã = (a1, . . . , an) and real number p ∈ (−∞, 0) ∪ (0, ∞) we define power mean Mp[ã] = (n^{−1} ∑_{i=1}^n ai^p)^{1/p}. Then

lim_{p→0} Mp[ã] = ( ∏_{i=1}^n ai )^{1/n} = M0[ã] ,    (2.10)

lim_{p→∞} Mp[ã] = max_i{ai} = M∞[ã] and    (2.11)

lim_{p→−∞} Mp[ã] = min_i{ai} = M−∞[ã] ,    (2.12)

which justifies the conventional definitions of M−∞[ã], M0[ã] and M∞[ã]. In addition, −∞ ≤ p < q ≤ ∞ implies Mp[ã] ≤ Mq[ã], with equality if and only if all elements of ã are equal.
Proof By l’Hôpital’s Rule,
n
n−1
p 
n
i=1 log(ai )ai
lim log(Mp [ã] ) = lim  p = n−1 log(ai ) = log(M0 [ã] ).
p→0 p→0 n−1 ni=1 ai i=1

Relabel ã so that a1 = maxi {ai }. Then


 1/p

n
−1
lim Mp [ã] = lim n a1 (ai /a1 ) p
= a1 = max{ai } = M∞ [ã] .
p→∞ p→∞ i
i=1

The final limit of (2.12) can be obtained by replacing ai with 1/ai .


That the final statement of the theorem holds for 0 < p < q < ∞ follows from
Jensen’s inequality (Theorem 4.13), and the extension to 0 ≤ p < q ≤ ∞ follows from
the limits in (2.12). It then follows that the statement holds for −∞ ≤ p < q ≤ 0 after
replacing ai with 1/ai , and therefore it holds for −∞ ≤ p < q ≤ ∞. ///

The cases p = 1, 0, −1 correspond to the arithmetic mean, geometric mean and harmonic mean, which will be denoted AM[ã] ≥ GM[ã] ≥ HM[ã], respectively.
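The monotonicity of Mp[ã] in p, and the special cases above, can be verified numerically; a sketch for an arbitrary positive tuple:

```python
import math

def power_mean(a, p):
    # M_p for a tuple of positive numbers, including the limiting cases
    n = len(a)
    if p == 0:
        return math.prod(a) ** (1.0 / n)       # geometric mean M_0
    if p == math.inf:
        return max(a)                          # M_inf
    if p == -math.inf:
        return min(a)                          # M_-inf
    return (sum(x**p for x in a) / n) ** (1.0 / p)

a = (1.0, 2.0, 4.0, 8.0)
ps = (-math.inf, -2, -1, 0, 1, 2, math.inf)
means = [power_mean(a, p) for p in ps]
# M_p is strictly increasing in p here, since the a_i are not all equal:
assert all(m1 < m2 for m1, m2 in zip(means, means[1:]))
# p = 0 gives the geometric mean (1*2*4*8)^(1/4):
assert abs(power_mean(a, 0) - 64 ** 0.25) < 1e-12
```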

2.2 EQUIVALENCE RELATIONSHIPS

The notion of equivalence relationships and classes will play an important role in our
analysis. Suppose X is a set of objects, and ∼ defines a binary relation between two
objects x, y ∈ X .
Definition 2.1 A binary relation ∼ on a set X is an equivalence relation if it satisfies
the following three properties for any x, y, z ∈ X :

Reflexivity x ∼ x.
Symmetry If x ∼ y then y ∼ x.
Transitivity If x ∼ y and y ∼ z then x ∼ z.

Given an equivalence relation, an equivalence class is any set of the form Ex =


{y ∈ X | y ∼ x}. If y ∈ Ex then Ey = Ex . Each element x ∈ X is in exactly one equivalence
class, so ∼ induces a partition of X into equivalence classes.
In Euclidean space, ‘is parallel to’ is an equivalence relation, while ‘is perpendicular
to’ is not.
For finite sets, cardinality is a property of a specific set, while for infinite sets,
cardinality must be understood as an equivalence relation.

2.3 LINEAR ALGEBRA

Formal definitions of both a field and a vector space are given in Section 6.3. For the
moment we simply note that the notion of real numbers can be generalized to that
of a field K, which is a set of scalars that is closed under the rules of addition and
multiplication comparable to those available for R. Both R and C are fields.
A vector space V ⊂ Kn is any set of vectors x ∈ Kn which is closed under linear and
scalar composition, that is, if x, y ∈ V then ax + by ∈ V for all scalars a, b. This means
the zero vector 0 must be in V, and that x ∈ V implies −x ∈ V.

Elements x1, . . . , xm of Kn are linearly independent if ∑_{i=1}^m ai xi = 0 implies ai = 0 for all i. Equivalently, no xi is a linear combination of the remaining vectors. The span
of a set of vectors x̃ = (x1 , . . . , xn ), denoted span(x̃), is the set of all linear combina-
tions of vectors in x̃, which must be a vector space. Suppose the vectors in x̃ are not
linearly independent. This means that, say, xm is a linear combination of the remaining
vectors, and so any linear combination in span(x̃) including xm may be replaced with
one including only the remaining vectors, so that span(x̃) = span(x1 , . . . , xm−1 ). The
dimension of a vector space V is the minimum number of vectors whose span equals
V. Clearly, this equals the number in any set of linearly independent vectors which
span V. Any such set of vectors forms a basis for V. Any vector space has a basis.

2.3.1 Matrices
Let Mm,n (K) be the set of m × n matrices A, for which Ai,j ∈ K (or, when required for
clarity, [A]i,j ∈ K) is the element of the ith row and jth column. When the field need not
be given, we will write Mm,n = Mm,n (K). We will generally be interested in Mm,n (C),
noting that the real matrices Mm,n (R) ⊂ Mm,n (C) can be considered a special case of

complex matrices, so that any resulting theory holds for both types. This is important
to note, since even when interest is confined to real valued matrices, complex numbers
enter the analysis in a natural way, so it is ultimately necessary to consider complex
vectors and matrices. Definitions associated with real matrices (transpose, symmetric,
and so on) have analogous definitions for complex matrices, which reduce to the more familiar definitions when the matrix is real.
The square matrices are denoted as Mm = Mm,m . Elements of Mm,1 are column
vectors and elements of M1,m are row vectors. A matrix in Mm,n is equivalently an
ordered set of m row vectors or n column vectors. The transpose AT ∈ Mn,m of a matrix A ∈ Mm,n has elements [AT]j,i = Ai,j. For A ∈ Mn,k, B ∈ Mk,m we always understand matrix multiplication to mean that C = AB ∈ Mn,m possesses elements Ci,j = ∑_{k′=1}^k Ai,k′ Bk′,j, so that matrix multiplication is generally not commutative. Then (AT)T = A and (AB)T = BT AT where the product is permitted.
In the context of matrix algebra, a vector x ∈ Kn is usually assumed to be a
column vector in Mn,1 . Therefore, if A ∈ Mm,n then the expression Ax is understood to
be evaluated by matrix multiplication. Similarly, if x ∈ Km we may use the expression
xT A, understanding that x ∈ Mm,1 .
When A ∈ Mm,n (C), the conjugate matrix is written Ā, and is the component-wise
conjugate of A. The identity ĀB̄ = AB holds. The conjugate transpose (or Hermitian
adjoint) of A is A∗ = ĀT . As with the transpose operation, (A∗ )∗ = A and (AB)∗ = B∗ A∗
where the product is permitted. This generally holds for arbitrary products, that is
(ABC)∗ = (BC)∗ A∗ = C ∗ B∗ A∗ , and so on. For A ∈ Mm,n (R), we have A = Ā and A∗ =
AT , so the conjugate transpose may be used in place of the transpose operation when
matrices are real valued. We always may write (A + B)∗ = A∗ + B∗ and (A + B)T =
AT + BT where dimensions permit.
A matrix A ∈ Mn (C) is diagonal if the only nonzero elements are on the diag-
onal, and can therefore be referred to by the diagonal elements diag(a1, . . . , an) = diag(A1,1, . . . , An,n). A diagonal matrix is positive diagonal or nonnegative diagonal if all diagonal elements are positive or nonnegative.
The identity matrix I ∈ Mm is the matrix uniquely possessing the property that
A = IA = AI for all A ∈ Mm . For Mm (C), I is diagonal, with diagonal entries equal to 1.
For any matrix A ∈ Mm there exists at most one matrix A−1 ∈ Mm for which AA−1 = I,
referred to as the inverse of A. An inverse need not exist (for example, if the elements
of A are constant).
The inner product (or scalar product) of two vectors x, y ∈ Cn is defined as ⟨x, y⟩ = y∗x (a more general definition of the inner product is given in Definition 6.13). For any x ∈ Cn we have ⟨x, x⟩ = ∑_i x̄i xi = ∑_i |xi|², so that ⟨x, x⟩ is a nonnegative real number, and ⟨x, x⟩ = 0 if and only if x = 0. The magnitude, or norm, of a vector may be taken as ‖x‖ = (⟨x, x⟩)^{1/2} (a formal definition of a norm is given in Definition 6.6).
Two vectors x, y ∈ Cn are orthogonal if ⟨x, y⟩ = 0. A set of vectors x1, . . . , xm is orthogonal if ⟨xi, xj⟩ = 0 when i ≠ j. A set of m orthogonal vectors are linearly independent, and so form the basis for an m dimensional vector space. If in addition ‖xi‖ = 1 for all i, the vectors are orthonormal.
A matrix Q ∈ Mn (C) is unitary if Q∗Q = QQ∗ = I. Equivalently, Q is unitary if and only if (i) its column vectors are orthonormal; (ii) its row vectors are orthonormal; (iii) it possesses inverse Q−1 = Q∗. The more familiar term orthogonal matrix is usually reserved for a real valued unitary matrix (otherwise the definition need not be changed).

A unitary matrix preserves magnitude, since ⟨Qx, Qx⟩ = (Qx)∗(Qx) = x∗Q∗Qx = x∗Ix = x∗x = ‖x‖².

A matrix Q ∈ Mn (C) is a permutation matrix if each row and column contains


exactly one 1 entry, with all other elements equal to 0. Then y = Qx is a permutation
of the elements of x ∈ Cn . A permutation matrix is always orthogonal.
Suppose A ∈ Mm,n and let α ⊂ {1, . . . , m}, β ⊂ {1, . . . , n} be any two nonempty sub-
sets of indices. Then A[α, β] ∈ M|α|,|β| is the submatrix of A obtained by deleting all
elements except for Ai,j , i ∈ α, j ∈ β. If A ∈ Mn , and α = β, then A[α, α] is a principal
submatrix.
The determinant associates a scalar with A ∈ Mm (C) through the recursive formula

det(A) = ∑_{i=1}^m (−1)^{i+j} Ai,j det(A(i,j)) = ∑_{j=1}^m (−1)^{i+j} Ai,j det(A(i,j)),

where A(i,j) ∈ Mm−1 (C) is the matrix obtained by deleting the ith row and jth column of A. Note that in the respective expressions any j or i may be chosen, yielding the same number, although the choice may have implications for computational efficiency. As is well known, for A ∈ M1 (C) we have det(A) = A1,1 and for A ∈ M2 we have det(A) = A1,1 A2,2 − A1,2 A2,1. In general, det(AT) = det(A), det(A∗) = det(A)∗ (the scalar conjugate), det(AB) = det(A) det(B), and det(I) = 1, which implies det(A−1) = det(A)^{−1} when the inverse exists.
A large class of algorithms are associated with the problem of determining a solution x ∈ Km to the linear system of equations Ax = b for some fixed A ∈ Mm and b ∈ Km.

Theorem 2.5 The following statements are equivalent for A ∈ Mm (C); a matrix satisfying any one of them is referred to as nonsingular, and any other matrix in Mm (C) as singular:

(i) The column vectors of A are linearly independent.
(ii) The row vectors of A are linearly independent.
(iii) det(A) ≠ 0.
(iv) Ax = b possesses a unique solution for any b ∈ Km.
(v) x = 0 is the only solution of Ax = 0.


Matrices A, B ∈ Mn are similar if there exists a nonsingular matrix S for which B = S−1AS. Similarity is an equivalence relation (Definition 2.1). A matrix is diagonalizable if it is similar to a diagonal matrix. Diagonalization offers a number of advantages. We always have B^k = S−1A^kS, so that if A is diagonal, this expression is particularly easy to evaluate. More generally, diagonalization can make apparent the behavior of a matrix interpreted as a transformation. Suppose in the diagonalization B = S−1AS we know that S is orthogonal, and that A is real. Then the action of B on a vector is decomposed into S (a change in coordinates), A (elementwise scalar multiplication) and S−1 (the inverse change in coordinates).

2.3.2 Eigenvalues and spectral decomposition


For A ∈ Mn (C), x ∈ Cn , and λ ∈ C we may define the eigenvalue equation

Ax = λx, (2.13)

and if the pair (λ, x) is a solution to this equation for which x ≠ 0, then λ is an eigenvalue of A and x is an associated eigenvector of λ. Any such solution (λ, x) may be called an eigenpair. Clearly, if x is an eigenvector, so is any nonzero scalar multiple. Let Rλ be the set of all eigenvectors x associated with λ. If x, y ∈ Rλ then ax + by ∈ Rλ, so that Rλ is a vector space. The dimension of Rλ is known as the geometric multiplicity of λ. We may refer to Rλ as an eigenspace (or eigenmanifold). In general, the spectral properties of a matrix are those pertaining to the set of eigenvalues and eigenvectors.
If A ∈ Mn (R), and λ is an eigenvalue, then so is λ̄, with associated eigenvectors Rλ̄ = R̄λ. Thus, in this case eigenvalues and eigenvectors occur in conjugate pairs. Similarly, if λ is real there exists a real associated eigenvector.
The eigenvalue equation may be written (A − λI)x = 0. However, by Theorem 2.5
this has a nonzero solution if and only if A − λI is singular, which occurs if and only if
pA (λ) = det(A − λI) = 0. By construction of a determinant, pA (λ) is an order n polyno-
mial in λ, known as the characteristic polynomial of A. The set of all eigenvalues of A
is equivalent to the set of solutions to the characteristic equation pA (λ) = 0 (including
complex roots). The multiplicity of an eigenvalue λ as a root of pA (λ) is referred to as its
algebraic multiplicity. A simple eigenvalue has algebraic multiplicity 1. The geometric
multiplicity of an eigenvalue can be less, but never more, than the algebraic multiplic-
ity. A matrix with equal algebraic and geometric multiplicities for each eigenvalue is a
nondefective matrix, and is otherwise a defective matrix.
We therefore denote the set of all eigenvalues as σ(A). An important fact is that σ(A^k) consists exactly of the eigenvalues σ(A) raised to the kth power, since if (λ, x) solves Ax = λx, then A^2x = Aλx = λAx = λ^2x, and so on. A quantity of particular importance is the spectral radius ρ(A) = max{|λ| : λ ∈ σ(A)}. There is sometimes interest in ordering the eigenvalues by magnitude. If there exists an eigenvalue λ1 = ρ(A), this is sometimes referred to as the principal eigenvalue, and any associated eigenvector is a principal eigenvector.
In addition we have the following theorem:
Theorem 2.6 Suppose A, B ∈ Mn, and |A| ≤ B, where |A| is the element-wise absolute value of A. Then ρ(A) ≤ ρ(|A|) ≤ ρ(B).
In addition, if all elements of A ∈ Mn (R) are nonnegative, then ρ(A′) ≤ ρ(A) for any principal submatrix A′.
Proof See Theorem 8.1.18 of Horn and Johnson (1985). ///

Suppose we may construct n eigenvalues λ1, . . . , λn, with associated eigenvectors ν1, . . . , νn. Then let Λ ∈ Mn be the diagonal matrix with ith diagonal element λi, and let V ∈ Mn be the matrix with ith column vector νi. By virtue of (2.13) we can write

AV = VΛ.    (2.14)

If V is invertible (equivalently, there exist n linearly independent eigenvectors, by Theorem 2.5), then

A = VΛV−1,    (2.15)

so that A is diagonalizable. Alternatively, if A is diagonalizable, then (2.14) can be obtained from (2.15) and, since V is invertible, there must be n independent

eigenvectors. The following theorem expresses the essential relationship between


diagonalization and spectral properties.
Theorem 2.7 For square matrix A ∈ Mn (C):

(i) Any set of k ≤ n eigenvectors ν1 , . . . , νk associated with distinct eigenvalues


λ1 , . . . , λk are linearly independent,
(ii) A is diagonalizable if and only if there exist n linearly independent eigenvectors,
(iii) If A has n distinct eigenvalues, it is diagonalizable (this follows from (i) and (ii)),
(iv) A is diagonalizable if and only if it is nondefective.

Right and Left Eigenvectors


The eigenvectors defined by (2.13) may be referred to as right eigenvectors, while left
eigenvectors are nonzero solutions to

x∗ A = λx∗ , (2.16)

(note that some conventions do not explicitly refer to complex conjugates x∗ in (2.16)).
This similarly leads to the equation x∗ (A − λI) = 0, which by an argument identical to
that used for right eigenvectors, has nonzero solutions if and only if pA (λ) = 0, giving
the same set of eigenvalues as those defined by (2.13). There is therefore no need to
distinguish between ‘right’ and ‘left’ eigenvalues. Then, fixing eigenvalue λ we may
refer to the left eigenspace Lλ as the set of solutions x to (2.16) (in which case, Rλ now becomes the right eigenspace of λ).
The essential relationship between the eigenspaces is summarized in the following
theorem:

Theorem 2.8 Suppose A ∈ Mn (C).

(i) For any λ ∈ σ(A), Lλ and Rλ have the same dimension.


(ii) For any distinct eigenvalues λ1 , . . . , λm from σ(A), any selection of vectors
xi ∈ Rλi for i = 1, . . . , m are linearly independent. The same holds for selections
from distinct Lλ .
(iii) Right and left eigenvectors associated with distinct eigenvalues are orthogonal.

Proof Proofs may be found in, for example, Chapter 1 of Horn and Johnson
(1985). ///

Next, if V is invertible, multiply both sides of (2.15) by V−1, yielding

V−1A = ΛV−1.

Just as the column vectors of V are right eigenvectors, we can set U∗ = V−1, in which case the ith column vector υi of U is a solution x to the left eigenvector equation (2.16) corresponding to eigenvalue λi (the ith element on the diagonal of Λ). This gives the diagonalization

A = VΛU∗.

Since U∗V = I, indefinite multiplication of A yields the spectral decomposition:

A^m = VΛ^mU∗ = ∑_{i=1}^n λi^m νi υi∗ .    (2.17)
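For a small symmetric matrix the spectral decomposition can be written out by hand and checked. In the sketch below (the matrix is illustrative, not from the text), A = [[2, 1], [1, 2]] has eigenpairs (3, (1, 1)/√2) and (1, (1, −1)/√2), and since A is symmetric the left and right eigenvectors coincide:

```python
import math

A = [[2.0, 1.0], [1.0, 2.0]]
lams = [3.0, 1.0]                      # eigenvalues of A
s = 1 / math.sqrt(2)
vs = [[s, s], [s, -s]]                 # orthonormal eigenvectors of A

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def spectral_power(m):
    # A^m = sum_i lambda_i^m v_i v_i^T, in the spirit of (2.17);
    # for symmetric A the left and right eigenvectors agree.
    return [[sum(lam**m * v[r] * v[c] for lam, v in zip(lams, vs))
             for c in range(2)] for r in range(2)]

A3 = matmul(matmul(A, A), A)           # A^3 by direct multiplication
S3 = spectral_power(3)
assert all(abs(A3[i][j] - S3[i][j]) < 1e-12
           for i in range(2) for j in range(2))
```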

The apparent recipe for a spectral decomposition is to first determine the roots
of the characteristic polynomial, and then to solve each resulting eigenvalue equa-
tion (2.13) after substituting an eigenvalue. This seemingly straightforward procedure
proves to be of little practical use in all but the simplest cases, and spectral decompo-
sitions are often difficult to construct using any method. However, a complete spectral
decomposition need not be the objective. First, it may not even exist for many other-
wise interesting models. Second, there are many important problems related to A
that can be solved using spectral theory, but without the need for a complete spectral
decomposition. For example:

(i) Determining bounds ‖Ax‖ ≤ a‖x‖ or ‖Ax‖ ≥ b‖x‖,
(ii) Determining the convergence rate of the limit limk→∞ A^k = A∞,
(iii) Verifying the existence of a scalar λ and vector ν for which Aν = λν, and guaranteeing that (for example) λ and ν are both real and positive.

Basic spectral theory relies on the identification of special matrix forms which impose specific properties on the spectrum. We next discuss two cases.

2.3.3 Symmetric, Hermitian and positive definite matrices


A matrix A ∈ Mn (C) is Hermitian if A = A∗ . A Hermitian real valued matrix is
symmetric, that is, A = AT . The spectral properties of Hermitian matrices are quite
definitive (see, for example, Chapter 4, Horn and Johnson (1985)).

Theorem 2.9 A matrix A ∈ Mn (C) is Hermitian if and only if there exists a unitary matrix U and real diagonal matrix Λ for which A = UΛU∗.
A matrix A ∈ Mn (R) is symmetric if and only if there exists a real orthogonal Q and real diagonal matrix Λ for which A = QΛQT.
Clearly, the matrices Λ and U may be identified with the eigenvalues and eigenvectors of A, with the n eigenvalue equation solutions given by the respective columns of AU = UΛU∗U = UΛ. An important implication of this is that all eigenvalues of a Hermitian matrix are real, and eigenvectors may be selected to be orthonormal.
If we interpet x ∈ Cn as a column vector x ∈ Mn,1 we have quadratic form x∗ Ax,
which is interpretable either as a 1 × 1 complex matrix, or as a scalar in C, as is
convenient.
If A is Hermitian, then (x∗Ax)∗ = x∗A∗x = x∗Ax. This means if z = x∗Ax ∈ C, then z = z̄, equivalently x∗Ax ∈ R. A Hermitian matrix A is positive definite if and only if x∗Ax > 0 for all x ≠ 0. If instead x∗Ax ≥ 0 then A is positive semidefinite. A nonsymmetric matrix satisfying xTAx > 0 can be replaced by A′ = (A + AT)/2, which is symmetric, and also satisfies xTA′x > 0.

Theorem 2.10 If A ∈ Mn (C) is Hermitian then x∗Ax is real. If, in addition, A is positive definite then all of its eigenvalues are positive. If it is positive semidefinite then all of its eigenvalues are nonnegative.

If A is positive semidefinite, and we let λmin and λmax be the smallest and largest eigenvalues in σ(A) (all of which are nonnegative real numbers) then it can be shown that

λmin = min_{‖x‖=1} x∗Ax and λmax = max_{‖x‖=1} x∗Ax.

If A is positive definite then λmin > 0. In addition, since the eigenvalues of A^2 are the squares of the eigenvalues of A, and since for a Hermitian matrix A∗ = A, we may also conclude

λmin = min_{‖x‖=1} ‖Ax‖ and λmax = max_{‖x‖=1} ‖Ax‖,

for any positive semidefinite matrix A.


Any diagonalizable matrix A possesses a kth root, A^{1/k}, meaning A = (A^{1/k})^k. Given the diagonalization A = Q−1ΛQ, this is easily seen to be A^{1/k} = Q−1Λ^{1/k}Q, where [Λ^{1/k}]i,j = Λi,j^{1/k}. If A is a real symmetric positive definite matrix then A^{1/2} is real, symmetric and nonsingular.
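The square root construction can be checked on a small real symmetric positive definite matrix whose eigendecomposition is known in closed form (the example matrix is illustrative):

```python
import math

A = [[2.0, 1.0], [1.0, 2.0]]           # symmetric, eigenvalues 3 and 1 (> 0)
s = 1 / math.sqrt(2)
Q = [[s, s], [s, -s]]                  # columns are orthonormal eigenvectors

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

L_half = [[math.sqrt(3.0), 0.0], [0.0, 1.0]]     # Lambda^(1/2)
QT = [[Q[j][i] for j in range(2)] for i in range(2)]
R = matmul(matmul(Q, L_half), QT)      # candidate square root A^(1/2)

RR = matmul(R, R)
assert all(abs(RR[i][j] - A[i][j]) < 1e-12
           for i in range(2) for j in range(2))
# R is itself real and symmetric, as claimed:
assert abs(R[0][1] - R[1][0]) < 1e-12
```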

2.3.4 Positive matrices


A real valued matrix A ∈ Mm,n (R) is positive or nonnegative if all elements are posi-
tive or nonnegative, respectively. This may be conveniently written A > 0 or A ≥ 0 as
appropriate.
The spectral properties of A ≥ 0 are quite precisely characterized by the Perron-
Frobenius Theorem which is discussed below.
If P ∈ Mn is a permutation matrix then the matrix PT AP is obtained from A by a
common permutation of the row and column indices.

Definition 2.2 A matrix A ∈ Mn (R) is reducible if n = 1 and A = 0, or there exists a permutation matrix P for which

PTAP = [ B  C ]
       [ 0  D ]    (2.18)

where B and D are square matrices. Otherwise, A is irreducible.


The essential feature of a matrix of the form (2.18) is that the block of zeros is of
dimension a × b where a + b = n. It can be seen that this same block remains 0 in
any power (PT AP)k . The same will be true for A, subject to a label permutation.
Clearly, this structure will not change under any relabeling, which is the essence of the
permutation transformation. The following property of irreducible matrices should be
noted:
Real analysis and linear algebra 23

Theorem 2.11 If A ∈ Mn (R) is irreducible, then each column and row must contain
at least 1 nondiagonal nonzero element.
Proof Suppose all nondiagonal elements of row i of matrix A ∈ Mn (R) are 0. After
relabeling i as n, there exists a 1 × (n − 1) block of 0’s conforming to (2.18). Similarly,
if all nondiagonal elements of column j are 0, relabeling j as 1 yields a similar block
of 0’s. ///

Irreducibility may be characterized in the following way:

Theorem 2.12 For a nonnegative matrix A ∈ Mn (R) the following statements are
equivalent:

(i) A is irreducible,
(ii) the matrix (I + A)^{n−1} is positive,
(iii) for each pair i, j there exists k for which [Ak ]i,j > 0.

Condition (iii) is often strengthened:

Definition 2.3 A nonnegative matrix A ∈ Mn is primitive if there exists k for which Ak is positive.
Clearly, Definition 2.3 implies statement (iii) of Theorem 2.12, so that a primitive matrix is also irreducible.
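Both characterizations are directly computable. The sketch below (NumPy, illustrative; the Wielandt bound on the search depth is a standard fact not stated in the text) tests irreducibility via condition (ii) of Theorem 2.12 and primitivity via Definition 2.3:

```python
import numpy as np

def is_irreducible(A):
    """Theorem 2.12 (ii): A >= 0 is irreducible iff (I + A)^(n-1) > 0."""
    n = A.shape[0]
    M = np.linalg.matrix_power(np.eye(n) + A, n - 1)
    return bool(np.all(M > 0))

def is_primitive(A):
    """Definition 2.3: A is primitive if A^k > 0 for some k.
    For an n x n primitive matrix, k <= n^2 - 2n + 2 suffices (Wielandt's
    bound), so the finite search below is exhaustive."""
    n = A.shape[0]
    P = np.eye(n)
    for _ in range(n * n - 2 * n + 2):
        P = P @ A
        if np.all(P > 0):
            return True
    return False

# A directed 3-cycle: irreducible, but not primitive (powers cycle forever).
C = np.array([[0., 1., 0.], [0., 0., 1.], [1., 0., 0.]])
assert is_irreducible(C) and not is_primitive(C)

# Adding a positive diagonal makes it primitive.
assert is_primitive(C + np.eye(3))
```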
The main theorem follows (see, for example, Horn and Johnson (1985)):

Theorem 2.13 (Perron-Frobenius Theorem) For any primitive matrix A ∈ Mn , the following hold:

(i) ρ(A) > 0,
(ii) there exists a simple eigenvalue λ1 = ρ(A),
(iii) there is a positive eigenvector ν1 associated with λ1 ,
(iv) |λ| < λ1 for any other eigenvalue λ,
(v) any nonnegative eigenvector is a scalar multiple of ν1 .

If A is nonnegative and irreducible, then (i)−(iii) hold.
If A is nonnegative, then ρ(A) is an eigenvalue, which possesses a nonnegative eigenvector. Furthermore, if v is a positive eigenvector of A, then its associated eigenvalue is ρ(A).
One of the important consequences of Theorem 2.13 is that an irreducible matrix
A possesses a unique principal eigenvalue ρ(A), which is real and positive, with a
positive principal eigenvector. Noting that AT is also irreducible, we may conclude
that the left principal eigenvector is also positive.
We cannot rule out ρ(A) = 0 for A ≥ 0 (A ≡ 0, among other examples). However, a
convenient lower bound for ρ(A) exists, a consequence of Theorem 2.6, which implies
that maxi Ai,i ≤ ρ(A).

Suppose a nonnegative matrix A ∈ Mn is diagonalizable, and ρ(A) > 0. A normalized spectral decomposition follows from (2.17):

    (ρ(A)^{−1} A)^m = Σ_{i=1}^{n} (ρ(A)^{−1} λi )^m νi υi∗ .

To fix ideas, suppose A is primitive. By Theorem 2.13 there exists a unique principal
eigenvalue, say λ1 = ρ(A), and any other eigenvalue satisfies |λj | < ρ(A). Then
    (ρ(A)^{−1} A)^m = ν1 υ1∗ + O( m^{m2−1} (ρ(A)^{−1} |λSLEM |)^m ),    (2.19)
where λSLEM is the second largest eigenvalue in magnitude and m2 is the algebraic
multiplicity of λSLEM , that is, any eigenvalue other than λ1 (not necessarily unique)
maximizing |λj |. Since |λSLEM | < ρ(A) we have limit
    lim_{m→∞} (ρ(A)^{−1} A)^m = ν1 υ1∗ ,    (2.20)

where ν1 , υ1 are the principal right and left eigenvectors, with convergence at a geometric rate O( (ρ(A)^{−1} |λSLEM |)^m ). For this reason, the quantity |λSLEM | is often of considerable interest. Note that in this representation, the normalization ⟨νi , υi ⟩ = 1 is implicit.
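The convergence (2.20) is easy to observe numerically. A NumPy sketch (illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(2)

# A random entrywise-positive (hence primitive) matrix.
A = rng.uniform(0.5, 1.5, size=(4, 4))

# Right eigen-decomposition: the Perron root is the eigenvalue of largest modulus.
w, V = np.linalg.eig(A)
k = int(np.argmax(np.abs(w)))
rho = w[k].real                     # Perron root: real, positive, simple
nu1 = V[:, k].real                  # principal right eigenvector

# Principal left eigenvector (right eigenvector of A^T), scaled so that
# <nu1, up1> = 1, matching the normalization implicit in (2.20).
wl, W = np.linalg.eig(A.T)
up1 = W[:, int(np.argmax(np.abs(wl)))].real
up1 = up1 / (nu1 @ up1)

# (rho^{-1} A)^m converges to the rank-one matrix nu1 up1^T.
M = np.linalg.matrix_power(A / rho, 50)
assert np.allclose(M, np.outer(nu1, up1), atol=1e-6)
```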
However, existence of the limit (2.20) for primitive matrices does not depend on
the diagonalizability of A, and is a direct consequence of Theorem 2.13. When A is
irreducible, the limit (2.20) need not exist, but a weaker statement involving asymptotic
averages will hold. These conclusions are summarized in the following theorem:
Theorem 2.14 Suppose the nonnegative matrix A ∈ Mn (R) is irreducible. Let ν1 , υ1 be the principal right and left eigenvectors, normalized so that ⟨ν1 , υ1 ⟩ = 1. Then

    lim_{N→∞} N^{−1} Σ_{m=1}^{N} (ρ(A)^{−1} A)^m = ν1 υ1∗ .    (2.21)

If A is primitive, then (2.20) also holds.


Proof See, for example, Theorems 8.5.1 and 8.6.1 of Horn and Johnson (1985). ///

A version of (2.21) is available for nonnegative matrices which are not necessarily
irreducible, but which satisfy certain other regularity conditions (Theorem 8.6.2, Horn
and Johnson (1985)).

2.3.5 Stochastic matrices


We say A ∈ Mn is a stochastic matrix if A ≥ 0, and each row sums to 1. It is easily seen that A1 = 1, where 1 denotes the vector of all ones, and so λ = 1 and v = 1 form an eigenpair. Since 1 > 0, by Theorem 2.13 we must have ρ(A) = 1.
In addition, for a general stochastic matrix, any positive eigenvector v satisfies Av = v.

If A is also irreducible then λ = 1 is a simple eigenvalue, so any solution to Av = v must be a multiple of 1 (in particular, any positive eigenvector must be a multiple of 1). If A is primitive, any nonnegative eigenvector v must be a multiple of 1. In addition, all eigenvalues other than the principal have modulus |λj | < 1.
We will see that it can be very advantageous to verify the existence of a principal eigenpair (λ1 , ν1 ) where λ1 = ρ(A) and ν1 > 0. This holds for any stochastic matrix.
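A brief NumPy illustration (not from the text): for a row-stochastic matrix, (1, 1) is an eigenpair and ρ(A) = 1.

```python
import numpy as np

rng = np.random.default_rng(3)

# Random row-stochastic matrix: nonnegative rows normalized to sum to 1.
A = rng.uniform(size=(5, 5))
A = A / A.sum(axis=1, keepdims=True)

ones = np.ones(5)
assert np.allclose(A @ ones, ones)               # A1 = 1, so (1, 1) is an eigenpair

eigvals = np.linalg.eigvals(A)
assert np.isclose(np.max(np.abs(eigvals)), 1.0)  # spectral radius rho(A) = 1
```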

2.3.6 Nonnegative matrices and graph structure


The theory of nonnegative matrices can be clarified by associating with a square matrix A ≥ 0 a graph G(A) possessing directed edge (i, j) if and only if Ai,j > 0. Following Theorems 2.1–2.2 of Section 2.1.11, we know that [An ]i,j > 0 if and only if there is a path of length n from i to j within G(A).
By (iii) of Theorem 2.12 we may conclude that A is irreducible if and only if all
pairs of nodes in G(A) communicate (see the definitions of Section 2.1.11).
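This correspondence between matrix powers and path counts can be checked directly for a small 0/1 adjacency matrix. A NumPy sketch (illustrative, not from the text):

```python
import numpy as np

# Adjacency matrix of a small directed graph: edges 0->1, 0->2, 1->2, 2->0.
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]])

# [A^2]_{i,j} counts the paths of length 2 from i to j.
A2 = A @ A
assert A2[0, 2] == 1      # the single path 0 -> 1 -> 2
assert A2[0, 0] == 1      # the single path 0 -> 2 -> 0
assert A2[0, 1] == 0      # no path of length 2 from 0 to 1
assert (A2 > 0)[0, 2]     # [A^2]_{0,2} > 0 iff a path of length 2 exists
```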
Some important properties associated with primitive matrices are summarized in
the following theorems.
Theorem 2.15 If A ∈ Mn (R) is a primitive matrix then for some finite k′ we have Ak > 0 for all k ≥ k′.

Proof By Definition 2.3 there exists finite k′ for which Ak′ > 0. Let i, j be any ordered pair of nodes in G(A). Since a primitive matrix is irreducible, we may conclude from Theorem 2.11 that there exists node k such that (k, j) is an edge in G(A). By Theorem 2.2 there exists a path of length k′ from i to k, and therefore also a path of length k′ + 1 from i to j. This holds for any i, j, therefore by Theorem 2.2 Ak′+1 > 0. The proof is completed by successively incrementing k′. ///

Thus, for a primitive matrix A all pairs of nodes in G(A) communicate, and in addition there exists k′ such that for any ordered pair of nodes i, j there exists a path from i to j of any length k ≥ k′.
Any irreducible matrix with positive diagonal elements is also primitive:
Theorem 2.16 If A ∈ Mn (R) is an irreducible matrix with positive diagonal elements, then A is also a primitive matrix.
Proof Let i, j be any ordered pair of nodes in G(A). There exists at least one path from i to j. Suppose one of these paths has length k. Since, by hypothesis, Aj,j > 0 the edge (j, j) is included in G(A), and can be appended to any path ending at j. This means there also exists a path of length k + 1 from i to j. The proof is completed by noting that there must be some finite k′ such that any two nodes may be joined by a path of length no greater than k′, in which case Ak′ > 0. ///

A matrix can be irreducible but not primitive. For example, if the nodes of G(A)
can be partitioned into subsets V1 , V2 such that all edges (i, j) are formed by nodes
from distinct subsets, then A cannot be primitive. To see this, suppose i, j ∈ V1 . Then
any path from i to j must be of even length, so that the conclusion of Theorem 2.15
cannot hold. However, if G(A) includes all edges not ruled out by this restriction, it is
easily seen that A is irreducible.
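This two-class obstruction is easy to see in the smallest case. A NumPy sketch (illustrative, not from the text) with V1 = {0} and V2 = {1}:

```python
import numpy as np

# Bipartite structure: every edge crosses between V1 = {0} and V2 = {1}.
A = np.array([[0., 1.],
              [1., 0.]])

# Irreducible: (I + A)^{n-1} > 0 (Theorem 2.12 (ii), here n = 2).
assert np.all(np.linalg.matrix_power(np.eye(2) + A, 1) > 0)

# Not primitive: powers alternate between A and I, so no power is positive.
for k in range(1, 20):
    assert not np.all(np.linalg.matrix_power(A, k) > 0)
```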

Finally, we characterize the connectivity properties of a reducible nonnegative matrix. Consider the representation (2.18). Without loss of generality we may take
matrix. Consider the representation (2.18). Without loss of generality we may take
the identity permutation P = I. Then the nodes of G(A) may be partitioned into V1
and V2 in such a way that there can be no edge (i, j) for which i ∈ V1 and j ∈ V2 . This
means that no node in V2 is accessible from any node in V1 , that is, there cannot be
any path beginning in V1 and ending in V2 .
We will consider this issue further in Section 5.2, where it has quite intuitive
interpretations.
Chapter 3

Background – measure theory

Measure theory provides a rigorous mathematical foundation for the study of, among other things, integration and probability theory. The study of stochastic processes, and of related control problems, can proceed some distance without reference to measure theoretic ideas. However, certain issues cannot be resolved fully without it, for example, the very existence of an optimal control in general models. In addition, if we wish to develop models which do not assume that all random quantities are stochastically independent, which we sooner or later must, the theory of martingale processes becomes indispensable, an understanding of which is greatly aided by a familiarity with measure theoretic ideas. Above all, foundational ideas of measure theory will be required for the function analytic construction of iterative algorithms.

3.1 TOPOLOGICAL SPACES

Suppose we are given a set Ω, and a sequence xk ∈ Ω, k ≥ 1. It is important to have a precise definition of the convergence of xk to a limit. If Ω ⊂ Rn the definition is standard, but if Ω is a collection of, for example, functions or sets, more than one useful definition can be offered. We may consider pointwise convergence, or uniform convergence, of a sequence of real-valued functions, each being the more appropriate for one or another application.
One approach to this problem is to state an explicit definition for convergence (xn →n x ∈ R iff ∀ε > 0 ∃N: sup_{n≥N} |xn − x| < ε). The much more comprehensive approach is to endow Ω with additional structure which induces a notion of proximity. This is achieved through the notion of a neighborhood of any x ∈ Ω, a type of subset which includes x. If xk remains in any neighborhood of x for all large enough k then we can say that xk converges to x.
This idea is formalized by the topology:
Definition 3.1 Let O be a collection of subsets of a set Ω. Then (Ω, O) is a topological space if the following conditions hold:
(i) Ω ∈ O and ∅ ∈ O,
(ii) if A, B ∈ O then A ∩ B ∈ O,
(iii) for any collection of sets {At } in O (countable or uncountable) we have ∪t At ∈ O.
In this case O is referred to as a topology on Ω. If ω ∈ O ∈ O then O is a neighborhood of ω.

The sets O are called open sets. Any complement of an open set is a closed set. They need not conform to the common understanding of an open set, since the power set P(Ω) (that is, the set of all possible subsets) satisfies the definition of a topological space. However, the class of open sets in (−∞, ∞) as usually understood does satisfy the definition of a topological space, so the term ‘open’ is a useful analogy.
A certain flexibility of notation is possible. We may explicitly write the topological space as (Ω, O). When it is not necessary to refer to specific properties of the topology O, we can simply refer to Ω alone as a topological space. In this case an open set O ⊂ Ω is understood to be an element of some topology O on Ω.
Topological spaces allow a definition of convergence and continuity:

Definition 3.2 If (Ω, O) is a topological space, and ωk is a sequence in Ω, then ωk converges to ω ∈ Ω if and only if for every neighborhood O of ω there exists K such that ωk ∈ O for all k ≥ K.
A mapping f : X → Y between topological spaces X, Y is continuous if for any
open set E in Y the preimage f −1 (E) is an open set in X.
A continuous bijective mapping f : X → Y between topological spaces X, Y is
a homeomorphism if the inverse mapping f −1 : Y → X is also continuous. Two
topological spaces are homeomorphic if there exists a homeomorphism f : X → Y.

We may have more than one topology on Ω. In particular, if O and O′ are topologies on Ω then if O′ ⊂ O we say O′ is a weaker topology than O, which is a stronger topology than O′. Since convergence is defined as a condition imposed on a class of open sets, a weaker topology necessarily has a less stringent definition of convergence. The weakest topology is O = {Ω, ∅}, in which case all sequences converge to all elements of Ω. The strongest topology is the set of all subsets of Ω. Since this topology includes all singletons, the only convergent sequences are constant ones, which essentially summarizes the notion of convergence on sets of countable cardinality.
We can see that the definition of continuity for a mapping between topological
spaces f : X → Y requires that Y is small enough, and that X is large enough. Thus, if f
is continuous, it will remain continuous if Y is replaced by a weaker topology, or X is
replaced by a stronger topology. In fact, any f is continuous if Y is the weakest topology,
or X is the strongest topology. We also note that the definitions of semicontinuity of
Section 2.1.10 apply directly to real-valued functions on topologies.
The study of topology is especially concerned with those properties which are unal-
tered by homeomorphisms. From this point of view, two homeomorphic topological
spaces are essentially the same.
If Ω′ ⊂ Ω and O′ = {U ∩ Ω′ | U ∈ O}, then (Ω′, O′) is also a topological space, sometimes referred to as the subspace topology. Note that Ω′ need not be an element of O.
An open cover of a subset E of a topological space X is any collection Uα , α ∈ I
of open sets containing E in its union. We say E is a compact set if any open covering
of E contains a finite subcovering of E (the definition may be applied to X itself). This
idea is a generalization of the notion of bounded closure (see Theorem 3.3). Similarly,
a set E is a countably compact set if any countable open covering of E contains a finite
subcovering of E. Clearly, countable compactness is a strictly weaker property than
compactness.

3.1.1 Bases of topologies


We say B(O) ⊂ O is a base for O if all open sets are unions of sets in B(O). This suggests that a topology may be constructed from a suitable class of subsets G of Ω by taking all unions of members of G and then including Ω and ∅. As might be expected, not all classes G yield a topology in this manner, but conditions under which this is the case are well known:

Theorem 3.1 A class of subsets G of Ω is a base for some topology if and only if the following two conditions hold: (i) every point x ∈ Ω is in at least one G ∈ G; (ii) if x ∈ G1 ∩ G2 for G1 , G2 ∈ G then there exists G3 ∈ G for which x ∈ G3 ⊂ G1 ∩ G2 .

The proof of Theorem 3.1 can be found in, for example, Kolmogorov and Fomin
(1970) (Chapter 3 of this reference can be recommended for this topic).

3.1.2 Metric space topologies


Definition 3.3 For any set X a mapping d : X × X → [0, ∞) is called a metric, and
(X, d) is a metric space, if the following axioms hold:

Identifiability For any x, y ∈ X we have d(x, y) = 0 if and only if x = y,


Symmetry For any x, y ∈ X we have d(x, y) = d(y, x),
Triangle inequality For any x, y, z ∈ X we have d(x, z) ≤ d(x, y) + d(y, z).

Convergence in a metric space follows from the metric, so that we write xn →n x if limn d(xn , x) = 0. Of course, this formulation assumes that x ∈ X, and we may have sequences exhibiting ‘convergent like’ behavior even if there is no limit in X.

Definition 3.4 A sequence {xn } in a metric space (X, d) is a Cauchy sequence if for any ε > 0 there exists N such that d(xn , xm ) < ε for all n, m ≥ N. A metric space is complete if all Cauchy sequences converge to a limit in X.

Generally any metric space can always be completed by extending X to include all
limits of Cauchy sequences (see Royden (1968), Section 5.4).
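For instance (an illustration, not from the text), the Newton iterates for √2 form a Cauchy sequence of rationals whose limit is irrational, so the rationals with d(x, y) = |x − y| are not complete; exact rational arithmetic makes this easy to verify:

```python
from fractions import Fraction

# Newton iteration x -> (x + 2/x)/2 for sqrt(2), carried out in exact
# rational arithmetic: every iterate is a rational number.
x = Fraction(2)
iterates = [x]
for _ in range(6):
    x = (x + 2 / x) / 2
    iterates.append(x)

# Successive distances d(x_n, x_{n+1}) shrink rapidly (Cauchy behavior) ...
gaps = [abs(iterates[i + 1] - iterates[i]) for i in range(6)]
assert all(gaps[i + 1] < gaps[i] for i in range(5))

# ... and x_n^2 -> 2, yet no rational satisfies x^2 = 2: the limit lies
# outside the rationals, which are therefore not complete.
assert abs(iterates[-1] ** 2 - 2) < Fraction(1, 10**20)
```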

Definition 3.5 Given metric space (X, d), we say x ∈ X is a point of closure of E ⊂ X
if it is a limit of a sequence contained entirely in E. In addition, the closure Ē of E is
set of all points of closure of E. We say A is a dense subset of B if A ⊂ B and Ā = B.

Clearly, any point in E is a point of closure of E, so that E ⊂ Ē. A metric space is


separable if there is a countable dense subset of X. The real numbers are separable,
since the rational numbers are a dense subset of R.
A metric space also has natural topological properties. We may define an open
ball Bδ (x) = {y|d(y, x) < δ}.

Theorem 3.2 The class of all open balls of a metric space (X, d) is the base of a
topology.

Proof We make use of Theorem 3.1. We always have x ∈ Bδ (x), so condition (i) holds. Next, suppose x ∈ Bδ1 (y1 ) ∩ Bδ2 (y2 ). Then for some ε > 0 we have d(x, y1 ) < δ1 − ε and d(x, y2 ) < δ2 − ε. Then by the triangle inequality x ∈ Bε (x) ⊂ Bδ1 (y1 ) ∩ Bδ2 (y2 ), which completes the proof. ///

A topology on a metric space generated by the open balls is referred to as the metric
topology, which always exists by Theorem 3.2. For this reason, every metric space can
be regarded as a topological space. We adopt this convention, with the understanding
that the topology being referred to is the metric topology. We then say a topological
space (, O) is metrizable (completely metrizable) if it is homeomorphic to a metric
space (complete metric space), in which case there exists a metric which induces the
topology O. This generalizes the notion of a metric space. Homeomorphisms form an
equivalence class, and metrics are equivalent if they induce the same topology.
Additional concepts of continuity exist for mappings f : X → Y between metric spaces (X , dx ) and (Y, dy ). We say f is uniformly continuous if for every ε > 0 there exists δ > 0 such that dx (x1 , x2 ) < δ implies dy (f (x1 ), f (x2 )) < ε. A family of functions F mapping X to Y is equicontinuous at x0 ∈ X if for every ε > 0 there exists δ > 0 such that for any x ∈ X satisfying dx (x0 , x) < δ we have supf ∈F dy (f (x0 ), f (x)) < ε. We say F is equicontinuous if it is equicontinuous at all x0 ∈ X .
Theorem 3.3 (Heine-Borel Theorem) In the metric topology of Rm a set S is
compact if and only if it is closed and bounded.

3.2 MEASURE SPACES

In elementary probability, we have a set of possible outcomes Ω, and the ability to assign a probability P(A) to any subset of outcomes A ⊂ Ω. If we ignore the interpretation of P(A) as a probability, then P becomes simply a set function, which, as we expect of a function, maps a set of objects to a number. Formally, we write, or would like to write, P : P(Ω) → [0, 1], where P(E) is the power set of E, or the class of all subsets of E. It is easy enough to write a rule y = x^2 + x + 1 which maps any number x to a number y, but this can become more difficult when the function domain is a power set. If Ω = {1, 2, . . .} is countable, we can use the following process. We first choose a probability for each singleton in Ω, say P({i}) = pi , then extend the definition by setting P(E) = Σi∈E pi . Of course, there is nothing preventing us from defining an alternative set function, say P∗ (E) = maxi∈E pi , which would possess at least some of the properties expected of a probability function. We would therefore like to know if we may devise a precise enough definition of a probability function so that any choice of pi yields exactly one extension, since definitions of random variables on countable spaces are usually given as probabilities of singletons.
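This singleton-to-set extension is concrete enough to code directly (an illustration, not from the text): with geometric weights p_i = 2^{-i}, the set function P(E) = Σ_{i∈E} p_i is additive on disjoint sets, while the alternative P∗(E) = max_{i∈E} p_i is not.

```python
# Singleton probabilities on Omega = {1, 2, ...}: p_i = 2^{-i}, summing to 1.
def p(i):
    return 2.0 ** (-i)

def P(E):
    """Extension of the singleton probabilities: P(E) = sum_{i in E} p_i."""
    return sum(p(i) for i in E)

def P_star(E):
    """Alternative set function P*(E) = max_{i in E} p_i."""
    return max(p(i) for i in E)

evens = set(range(2, 40, 2))
odds = set(range(1, 40, 2))

# P is additive on disjoint sets; P* is not.
assert abs(P(evens | odds) - (P(evens) + P(odds))) < 1e-12
assert P_star(evens | odds) != P_star(evens) + P_star(odds)
```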
The situation is made somewhat more complicated when Ω is uncountable. It is universally accepted that a random variable X on R can be completely defined by the cumulative distribution function F(x) = P{X ≤ x}, which provides a rule for calculating only a very small range of elements of P(R). We can, of course, obtain probabilities of intervals through subtraction, that is P{X ∈ (a, b]} = F(b) − F(a), and so on, eventually for open and closed intervals, and unions of intervals. We achieve the same effect if we use a density f (x) to calculate probabilities P{X ∈ E} = ∫E f (x)dx, since our methods

of calculating an integral almost always assume E is a collection of intervals. We are therefore confronted with the same problem, that is, we would like to define probabilities for a simple class of events E ∈ E ⊂ P(Ω), for example, singletons or half intervals (−∞, x], and extend the probability set function P to the power set P(Ω) in such a way that P satisfies a set of axioms we regard as essential to our understanding of a probability calculus.
The mathematical issues underlying such a construction relate to the concept of measurability, and it is important to realize that it affects both countable and uncountable sets Ω. Suppose we propose a random experiment consisting of the selection of any positive integer at random. We should have no difficulty deciding that the set Ek consisting of all integers divisible by k should have probability 1/k. To construct such a probability rule, we may set Ω = I+ , and define a class of subsets F0 as those E ⊂ I+ for which the limit

    P∗ (E) = lim_{n→∞} n^{−1} |E ∩ {1, . . . , n}|
exists. Then P∗ defines a randomly chosen integer X about which we can say P(X is divisible by 7) = 1/7 or P(X is a square number) = 0. But we are also assuming that each integer i has equal probability pi = α. If we extend P in the way we proposed, we would end up with P(Ω) equalling 0 or ∞, whereas the probability that the outcome is in Ω can only be 1. Similarly, it is possible to partition the unit interval into a countable number of uncountably denumerable sets Ei which are each translations, modulo 1, of a single member. Therefore, if we attempt to impose a uniform probability on the unit interval, we would require that P(E) for each E ∈ E has the same probability, and we would similarly be forced to conclude that P(Ω) equals 0 or ∞. Both of these examples are the same in the sense that some principle of uniformity forces us to assign a common probability to an infinite number of disjoint outcomes.
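The natural-density set function P∗ above can be approximated numerically (an illustration, not from the text): the density of multiples of 7 approaches 1/7, while the density of squares tends to 0.

```python
def density(indicator, n):
    """Approximate P*(E) = lim n^{-1} |E intersect {1,...,n}|."""
    return sum(1 for i in range(1, n + 1) if indicator(i)) / n

n = 100_000
# Multiples of 7 have density 1/7; perfect squares have density 0.
assert abs(density(lambda i: i % 7 == 0, n) - 1 / 7) < 1e-3
assert density(lambda i: int(i ** 0.5) ** 2 == i, n) < 5e-3
```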
As we will next show, the solution to these problems differs somewhat for countable and uncountable Ω. For countable Ω, the object will be to extend P fully to P(Ω), and the method for doing so will explicitly rule out examples such as the randomly chosen integer, by insisting at the start that Σi∈Ω pi = 1. It could be, and has been (Dubins and Savage (1976)), argued that this type of restriction (formally known as countable additivity, see below) is not really needed. It essentially forces P to be continuous in some sense, which might not be an essential requirement for a given application. We could have a perfectly satisfactory definition of a randomly chosen positive integer by restricting our definition to a subset of P(Ω), as we have done. In fact, this is precisely how we deal with uncountable Ω, by first devising a rule for calculating P(E) for intervals E, then extending P to sets which may be constructed from a countable number of set operations on the intervals, better known as the Borel sets (see below for formal definition). The final step adds all subsets of all Borel sets of probability zero. This class of sets is considerably smaller than P(Ω) for uncountable Ω, and means that a probability set function is really no more complex an object than a function on Ω.

3.2.1 Formal construction of measures


Our discussion suggests that probabilities and integrals are similar objects, and both usually are based on the construction of a measure space. We do so here, although it is possible to construct an axiomatic theory of probability based on the integral operator, without the mediation of the measure space, the probability itself constructed as the expectation of an indicator function (Whittle (2000)).
We have already outlined the steps in the creation of a probability measure, or of measures in general (we need not insist that the measure of Ω is 1, or even finite). The first step is to define the sets on which the measure will be constructed.
Definition 3.6 Let F be a collection of subsets of a set Ω. Then F is a field (or algebra) if the following conditions hold:

(i) Ω ∈ F,
(ii) if E ∈ F then Ec ∈ F,
(iii) if E1 , E2 ∈ F then E1 ∪ E2 ∈ F.

If (iii) is replaced with

(iv) if E1 , E2 , . . . ∈ F then ∪i Ei ∈ F,

then F is a σ-field (or σ-algebra).


Condition (iii) extends to all finite unions, so we say that a field is closed under complementation and finite union, and a σ-field is closed under complementation and countable union. Both contain the empty set ∅ = Ωc . By De Morgan's Law (A ∪ B)c = Ac ∩ Bc , so that a field (or σ-field) is closed under all finite (or countable) set operations. For a field (or σ-field) F, a measurable partition of E ∈ F is any partition of E consisting of elements of F.
Then for any class of subsets E of Ω we define σ(E) as the smallest σ-field containing E, or equivalently the intersection of all σ-fields containing E (which must also be a σ-field). It is usually referred to as the σ-field generated by E. This always exists, since P(Ω) is a σ-field, but the intention is usually that σ(E) will be considerably smaller.
If Ω′ ⊂ Ω and F = σ(E) then σ(E ∩ Ω′) = F ∩ Ω′. If F, F′ are two σ-fields on Ω and F′ ⊂ F, we say that F′ is a sub σ-field of F.


Example 3.1 Let F0 be a class of sets consisting of Ω = (−∞, ∞), and all finite unions of intervals (a, b], including (−∞, b] and ∅ = (b, b]. This class of sets is closed under finite union and complementation, and so is a field on Ω. Then σ(F0 ) is the σ-field consisting of all intervals, and all sets obtainable from countably many set operations on intervals. Note that σ(F0 ) could be equivalently defined as the smallest σ-field containing all intervals in Ω, or all closed bounded intervals, all open sets, all sets (−∞, b], and so on.
We next define a measure:
Definition 3.7 A set function µ : F → R̄+ , where F is a σ-field on Ω, is a measure if µ(∅) = 0 and if it is countably additive, that is, for any countable collection of disjoint sets E1 , E2 , . . . we have Σi µ(Ei ) = µ(∪i Ei ). If F is a field, then µ is called a measure if countable additivity holds whenever ∪i Ei ∈ F.
If Definition 3.7 did not require that µ(∅) = 0, then it would hold for µ ≡ ∞. However, that µ(∅) = 0 for any other measure would follow from countable additivity, since we would have µ(E′) < ∞ for some E′, and µ(E′) = µ(E′) + µ(∅).

A measure µ is a finite measure if µ(Ω) < ∞, and is a stochastic measure, or probability measure, if µ(Ω) = 1. We sometimes need to consider a substochastic measure, for which µ(Ω) ≤ 1. We say µ is a σ-finite measure if there exists a countable collection of subsets Ei ∈ F such that ∪i Ei = Ω with µ(Ei ) < ∞. We refer to (Ω, F) as a measurable space if F is a σ-field on Ω, then (Ω, F, µ) is a measure space if µ is a measure on (Ω, F). We may also refer specifically to a finite, probability or σ-finite measure space as appropriate.
We have assumed that µ(E) is always nonnegative. Under some conventions the
term positive measure is used instead. We will encounter signed measures, that is, set
functions which share the properties of a measure, but are allowed to take negative
values.
We have already referred to the countable additivity property as a type of continuity condition. Formally, we may define sequences of sets in terms of increasing unions. If E1 ⊂ E2 ⊂ . . . we write Ei ↑ E = ∪j Ej . For any sequence A1 , A2 , . . . we have Ei = ∪_{j=1}^{i} Aj ↑ ∪_{j=1}^{∞} Aj = ∪_{j=1}^{∞} Ej , so that increasing sequences appear quite naturally. By taking complements, we equivalently have for any decreasing sequence F1 ⊃ F2 ⊃ . . . the limit Fi ↓ F = ∩j Fj , and any sequence A1 , A2 , . . . generates a decreasing sequence by setting Fi = ∩_{j=1}^{i} Aj ↓ ∩_{j=1}^{∞} Aj = ∩_{j=1}^{∞} Fj .
This leads to a definition of continuity for measure spaces. It is important to note
that continuity holds axiomatically for any countably additive measure (as all measures
in this book will be), so this need not be verified independently. We summarize a
number of such properties:
Theorem 3.4 Suppose we are given a measure space (Ω, F, µ). The following statements hold.
(i) A ⊂ B implies µ(A) ≤ µ(B),
(ii) µ(A) + µ(B) = µ(A ∪ B) + µ(A ∩ B),
(iii) Ei ↑ E implies limi µ(Ei ) = µ(E),
(iv) Fi ↓ F implies limi µ(Fi ) = µ(F),
(v) For any sequence A1 , A2 , . . . in F we have limi µ(∪_{j=1}^{i} Aj ) = µ(∪_{j=1}^{∞} Aj ),
(vi) For any sequence A1 , A2 , . . . in F we have limi µ(∩_{j=1}^{i} Aj ) = µ(∩_{j=1}^{∞} Aj ).

Proof (i) Write the disjoint union B = A ∪ (B − A), then µ(B) = µ(A) + µ(B − A). (ii) Write the disjoint unions A = (A − B) ∪ AB, B = (B − A) ∪ AB, A ∪ B = (A − B) ∪ (B − A) ∪ AB, then apply additivity. (iii) We write D1 = E1 , Di = Ei − Ei−1 for i ≥ 2. The sequence D1 , D2 , . . . is disjoint, with Ei = ∪_{j=1}^{i} Dj and E = ∪i Di . So, by countable additivity we have µ(E) = µ(∪i Di ) = Σi µ(Di ) = limi Σ_{j=1}^{i} µ(Dj ) = limi µ(Ei ). Then (v) follows after setting Ei = ∪_{j=1}^{i} Aj and applying (iii). Finally, (iv) and (vi) follow by expressing a decreasing sequence as an increasing sequence of the complements, then applying (iii) and (v). ///
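Continuity from below ((iii) above) is easy to visualize on a countable space (an illustration, not from the text): with µ({i}) = 2^{-i} on Ω = {1, 2, . . .}, the increasing sets Ei = {1, . . . , i} ↑ Ω give µ(Ei ) → µ(Ω) = 1.

```python
def mu(E):
    """Countably additive measure on subsets of {1, 2, ...}: mu({i}) = 2^{-i}."""
    return sum(2.0 ** (-i) for i in E)

# Increasing sequence E_i = {1, ..., i} with union Omega = {1, 2, ...}.
values = [mu(range(1, i + 1)) for i in range(1, 60)]

# mu(E_i) is nondecreasing and converges to mu(Omega) = 1.
assert all(values[i] <= values[i + 1] for i in range(len(values) - 1))
assert abs(values[-1] - 1.0) < 1e-15
```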

3.2.2 Completion of measures


Any measure satisfies µ(∅) = 0 but ∅ need not be the only set of measure zero. We can refer to any set of measure zero as a null set. It seems reasonable to assign a measure of zero to any subset of a null set, since, if it was assigned a measure, it could only be 0 under the axioms of a measure. However, the definition of a measure space (Ω, F, µ) does not force F to contain all subsets of null sets, and counterexamples can be readily constructed. Accordingly, we offer the following definition:

Definition 3.8 A measure space (Ω, F, µ) is complete if A ∈ F whenever A ⊂ B and µ(B) = 0.
Any measure space may be completed by considering the class of subsets M = {A | A ⊂ B, µ(B) = 0}, and setting µ∗ (A) = 0 for all A ∈ M and µ∗ (B) = µ(B) for all B ∈ F. It can be shown that F ∗ = F ∪ M is a σ-field and µ∗ is a measure, so that (Ω, F ∗ , µ∗ ) is a complete measure space.

3.2.3 Outer measure


Definition 3.9 A set function λ on all subsets of Ω is an outer measure if the following properties are satisfied:

(i) λ(∅) = 0,
(ii) A ⊂ B ⇒ λ(A) ≤ λ(B),
(iii) A ⊂ ∪_{i=1}^{∞} Ai ⇒ λ(A) ≤ Σ_{i=1}^{∞} λ(Ai ).

Property (ii) is referred to as monotonicity and property (iii) is referred to as countable subadditivity.
The outer measure differs from the measure in that it is defined on all subsets, so
does not require a definition of measurability. However, it does induce a concept of
measurability, and in fact directly induces a measure space.

Definition 3.10 Given an outer measure λ on Ω, a set E is λ-measurable if λ(A) = λ(A ∩ E) + λ(A ∩ Ec ) for all A ⊂ Ω. By countable subadditivity, this condition can be replaced by λ(A) ≥ λ(A ∩ E) + λ(A ∩ Ec ) for all A ⊂ Ω.

Theorem 3.5 Given an outer measure λ on Ω, any set E for which λ(E) = 0 is λ-measurable.
Proof Suppose A ⊂ Ω and λ(E) = 0. By monotonicity 0 ≤ λ(AE) ≤ λ(E) = 0 and λ(A) ≥ λ(AEc ), so that Definition 3.10 holds. ///

We can always restrict λ to a class of subsets E of Ω. In fact, inducing a measure space from λ is quite straightforward:

Theorem 3.6 Given an outer measure λ on Ω, the class B of λ-measurable sets is a σ-field in which λ is a complete measure.
Proof See, for example, Theorem 12.1 of Royden (1968). ///

Many authors reserve a distinct symbol for a set function restricted to a class of subsets. Theorem 3.6 then describes a measure space (Ω, B, λB ) where λB is λ restricted to B.

3.2.4 Extension of measures


Theorem 3.6 permits the construction of a measure space by restricting an outer measure λ to the class of λ-measurable sets, which can be shown to be a σ-field. The complementary procedure is to define a measure µ0 on a simpler class of subsets E, then to extend µ0 to a measure µ on a σ-field which includes E. The objective is to do this so that µ and µ0 agree on the original sets E while µ satisfies Definition 3.7. In addition, we wish to know if any measure µ which achieves this is unique.
We have already introduced the field, in addition to the σ-field (Definition 3.6). In addition, the idea of extending any class of sets E to the σ-field σ(E) is well defined. A class of sets, simpler than a field, from which an extension may be constructed is given by the following definition:
Definition 3.11 A class of subsets A of Ω is a semifield (semialgebra) if the following
conditions hold:
(i) A, B ∈ A ⇒ A ∩ B ∈ A.
(ii) A ∈ A ⇒ Ac is a finite disjoint union of sets in A.

A σ-field is a field, and a field is in turn a semifield. The latter is a quite intuitive object.
The set of right-closed intervals in R, together with (−∞, a], (a, ∞) and ∅, is a semifield,
and the construction is easily extended to Rn.
If A is a semifield, then the class of subsets F0 consisting of ∅ and all finite disjoint
unions of sets in A can be shown to be a field, in particular, the field generated by
semifield A.
Theorem 3.7 Suppose A is a semifield on Ω and F0 is the field generated by A. Let
µ be a nonnegative set function on A satisfying the following conditions:

(i) If ∅ ∈ A then µ(∅) = 0,
(ii) if A ∈ A is a finite disjoint union of sets A1, . . . , An in A then µ(A) = ∑ni=1 µ(Ai),
(iii) if A ∈ A is a countable disjoint union of sets A1, A2, . . . in A then µ(A) ≤ ∑∞i=1 µ(Ai).

Then there exists a unique extension of µ to a measure on F0.


We have used the term outer measure to refer to a set of axioms applicable to a
set function defined on all subsets of a space Ω. The term is also used to describe a
specific constructed set function associated with Lebesgue measure (see Section 3.2.6).
Definition 3.12 Suppose µ is a nonnegative set function defined on a class of subsets
A of Ω which contains ∅ and covers Ω. Suppose µ(∅) = 0. The outer measure µ∗
induced by µ is defined as

µ∗(E) = inf { ∑∞i=1 µ(Ai) : E ⊂ ∪i Ai },    (3.1)

where the infimum is taken over all countable covers of E ⊂ Ω from A.
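When the covering class A is finite, the infimum in (3.1) ranges over finitely many covers and can be computed by brute force. The toy sketch below is our own illustration under that simplifying assumption:

```python
from itertools import combinations

def induced_outer_measure(E, A, mu):
    """Compute mu*(E) of (3.1) when the class A is a finite list of
    frozensets and mu is a table of their values: minimize the total
    mu over all subcollections of A that cover E."""
    E = frozenset(E)
    if not E:
        return 0.0                       # mu*(empty set) = 0
    best = float("inf")                  # inf over an empty family of covers
    for r in range(1, len(A) + 1):
        for cover in combinations(A, r):
            if E <= frozenset().union(*cover):
                best = min(best, sum(mu[S] for S in cover))
    return best

A = [frozenset({1}), frozenset({2, 3}), frozenset({1, 2})]
mu = {A[0]: 1.0, A[1]: 1.5, A[2]: 3.0}
# Cheapest cover of {1, 2} is {1} together with {2, 3}: 1.0 + 1.5.
print(induced_outer_measure({1, 2}, A, mu))   # prints 2.5
```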


It may be shown that under Definition 3.12 the set function (3.1) is an outer
measure in the sense of Definition 3.9. Our main extension theorem follows (see, for
example, Section 12.2, Royden (1968)).
36 Approximate iterative algorithms

Theorem 3.8 (Carathéodory Extension Theorem) Suppose µ is a measure on a
field F0 of subsets of Ω. Let µ∗ be the outer measure induced by µ. Let F∗ be the
set of all µ∗-measurable sets, and let µ′ be µ∗ restricted to F∗. Then F∗ is a σ-field
containing F0 on which µ′ is a measure.
If µ is finite, or σ-finite, then so is µ′. Let µ′′ be the restriction of µ∗ to σ(F0). If µ
is σ-finite then µ′′ is the unique extension of µ to σ(F0).
The progression from a semifield to a field, and finally to a σ-field is a natural
one. In Rn the semifield is adequate to describe assignment of measures to n-rectangles
and their finite compositions. If this can be done in a coherent manner, extension to a
measure space follows as described in Theorem 3.8.
However, it must be noted that several extensions are described in Theorem 3.8.
The extension to the µ∗-measurable sets F∗ is complete (see Theorem 3.6). Formally,
this is not the same extension as that to σ(F0). In other words the measure spaces
(Ω, F∗, µ′) and (Ω, σ(F0), µ′′) are not generally the same. In fact, this distinction plays
a notable role in the theory of stochastic optimization.

3.2.5 Counting measure


We will usually encounter one of two types of measures. If we are given a countable
set S, then µ(E) = |E ∩ S| is called counting measure, and satisfies the definition of a
measure. It is important to note that E is not necessarily a subset of S, so that a counting
measure can be defined on any space containing S. Many games of chance are good
examples of counting measures.
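As a quick illustration (a helper of our own, not from the text), the counting measure induced by a finite S can be written directly; note that the argument E need not be contained in S:

```python
def counting_measure(S):
    """Return mu with mu(E) = |E ∩ S| (Section 3.2.5)."""
    S = frozenset(S)
    return lambda E: len(frozenset(E) & S)

mu = counting_measure({1, 2, 3})
print(mu({2, 3, 4}))   # prints 2: only 2 and 3 lie in S
print(mu({4, 5}))      # prints 0
```

Finite additivity over disjoint sets, e.g. µ({1}) + µ({2, 4}) = µ({1, 2, 4}), follows directly from the cardinality of a disjoint union.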

3.2.6 Lebesgue measure


The second commonly encountered measure is the Lebesgue measure. On Ω =
(−∞, ∞) the set of intervals (a, b], taken to include (−∞, b] and (a, ∞), is a semifield
(we may also simply take the class of all intervals). We assign measure m((a, b]) = b − a
to all bounded intervals, and ∞ to all unbounded intervals. An application of Theorems
3.7 and 3.8 yields Lebesgue measure, which is the completion of a measure which
consistently measures the length of intervals. The same procedure may be used to
construct Lebesgue measure in Rn by assigning the usual geometric volume to rectangles.
Whether or not a set is Lebesgue measurable can be resolved by Definition 3.10 and
the outer measure referenced in Theorem 3.8. Given the axiom of choice, there exists a
subset of [0, 1), known as the Vitali set, which is not Lebesgue measurable (Section 3.4,
Royden (1968)).
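The coherence requirement that the measure assigned to a finite union of intervals equals its total length, with overlaps counted once, can be imitated numerically. The merge-based helper below is our own sketch, not part of the formal construction:

```python
def length_of_union(intervals):
    """Lebesgue measure of a finite union of bounded intervals (a, b],
    computed by sorting endpoints and merging overlapping runs."""
    spans = sorted((a, b) for a, b in intervals if b > a)
    total = 0.0
    cur_a = cur_b = None
    for a, b in spans:
        if cur_b is None or a > cur_b:   # disjoint from the current run
            if cur_b is not None:
                total += cur_b - cur_a
            cur_a, cur_b = a, b
        else:                            # overlap: extend the current run
            cur_b = max(cur_b, b)
    if cur_b is not None:
        total += cur_b - cur_a
    return total

# (0, 1] and (0.5, 2] merge into (0, 2]; (3, 4] is disjoint: 2 + 1.
print(length_of_union([(0, 1), (0.5, 2), (3, 4)]))   # prints 3.0
```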

3.2.7 Borel sets


For any topological space (Ω, O) the Borel sets are taken to be B = σ(O), that is, the
smallest σ-field containing all open sets (or equivalently all closed sets). Suppose we
are given a measure space (Ω, F, µ). If Ω is also a topological space (Ω, O), this will
generally mean we expect all open sets to be measurable, so that O ⊂ F and therefore
B ⊂ F. Thus, when all open sets are measurable, µ may be restricted to the Borel sets.
Any measure defined on B is a Borel measure.

There can be an important advantage to characterizing measurability in terms
of a topology, provided the topology has sufficient structure. For example, if Ω is
metrizable, then the Borel sets form the smallest class of subsets containing all open
sets which is closed under a countable number of union and intersection operations
(Proposition 7.11, Bertsekas and Shreve (1978)).
At this point, we have defined the properties sufficient to define a type of space
possessing a useful balance of generality and structure, namely the Polish space.
Definition 3.13 A Polish space is a separable completely metrizable topological space.
A Borel space is a Borel subset of a Polish space.

3.2.8 Dynkin system theorem


The Dynkin system theorem is a quite straightforward statement describing the rela-
tionship between various classes of subsets. It permits a number of quite elegant proofs,
and turns out to play a specific role in the theory of dynamic programming.
Definition 3.14 Given a set Ω and classes of subsets E and L

(i) E is a π-system if A, B ∈ E implies AB ∈ E.


(ii) L is a λ-system if
(a) Ω ∈ L,
(b) A, B ∈ L and B ⊂ A implies A − B ∈ L,
(c) An ∈ L for n ≥ 1, with An ↑ A implies A ∈ L.

Here, we refer to π- and λ-systems, while other conventions refer to a λ-system as
a Dynkin system, or D-system.
A σ-field is both a π-system and a λ-system. A λ-system is closed under comple-
mentation. A λ-system that is also a π-system (or is closed under finite union) is also
a σ-field. The main theorem follows (see, for example, Billingsley (1995) for a proof):
Theorem 3.9 (Dynkin System Theorem) Given set Ω, if E is a π-system and L is
a λ-system, then E ⊂ L implies σ(E) ⊂ L.
An important consequence is the following:
Theorem 3.10 Given set Ω, if E is a π-system, and µ1, µ2 are two finite measures
on σ(E) for which µ1(E) = µ2(E) for all E ∈ E, then µ1(E′) = µ2(E′) for all E′ ∈ σ(E).
Proof Let E′ be the collection of sets E for which µ1(E) = µ2(E). It is easily verified
that E′ is a λ-system. If E ⊂ E′ then by Theorem 3.9 σ(E) ⊂ E′. ///

A topology is also a π-system, so that any measures which agree on the open sets
must agree on the Borel sets by Theorem 3.10. The intervals (a, b], with ∅, also form
a π-system.

3.2.9 Signed measures


Under Definition 3.7 the value of a measure is always nonnegative, which certainly
conforms to the intuitive notion of measure. However, even when this is the intention

we may have the need to perform algebraic operations on them, and it will prove quite
useful to consider vector spaces of measures. In this case, an operation involving two
measures such as µ1 + µ2 would result in a new measure, say ν = µ1 + µ2 . To be sure,
ν could be evaluated by addition ν(E) = µ1 (E) + µ2 (E) in R for any measurable set E,
but it is an entirely new measure. Subtraction seems just as reasonable, and we can
define a set function by the evaluation ν(E) = µ1 (E) − µ2 (E), represented algebraically
as ν = µ1 − µ2 . Of course, ν(E) might be negative, but we would expect it to share the
essential properties of a measure.
Accordingly, Definition 3.7 can be extended to set functions admitting negative
values.
Definition 3.15 A set function µ : F → R̄, where F is a σ-field on Ω, is a signed
measure if µ(∅) = 0 and if it is countably additive, that is for any countable collection
of disjoint sets E1, E2, . . . we have ∑i µ(Ei) = µ(∪i Ei), where the summation is either
absolutely convergent or properly divergent.
This definition does not appear to differ significantly from Definition 3.7, but the
possibility of negative values introduces some new issues. For example, suppose we
wish to modify Lebesgue measure m on R by assigning negative measure below 0,
that is:
ms (E) = −m(E ∩ (−∞, 0)) + m(E ∩ [0, ∞)).
We must be able to assign a measure ms((−∞, ∞)), which by symmetry should be 0.
However, countable additivity fails for the subsets (i − 1, i], i ∈ I, since the implied
summation is not absolutely convergent.
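This failure of absolute convergence is easy to see numerically. Each interval (i − 1, i] with i ≥ 1 receives ms-value +1 and each with i ≤ 0 receives −1; any finite enumeration sums to the same total, but the partial sums behave completely differently under different orderings of the index set, which is exactly what Definition 3.15 rules out (an illustrative script of our own):

```python
def ms_unit(i):
    """ms((i-1, i]): +1 for intervals inside [0, inf), -1 below 0."""
    return 1.0 if i >= 1 else -1.0

def partial_sums(order):
    """Running sums of ms over the intervals in the given order."""
    s, out = 0.0, []
    for i in order:
        s += ms_unit(i)
        out.append(s)
    return out

# Two enumerations of the same 398 indices i = -198, ..., 199.
alternating = [k for pair in zip(range(1, 200), range(0, -199, -1))
               for k in pair]                      # 1, 0, 2, -1, 3, -2, ...
positives_first = list(range(1, 200)) + list(range(0, -199, -1))

# Same final total, wildly different partial-sum behavior.
print(max(partial_sums(alternating)))       # prints 1.0
print(max(partial_sums(positives_first)))   # prints 199.0
```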
When signed measures are admitted, the notion of a positive measure must be
clarified. It is possible, for example, to have µ(A) ≥ 0, with µ(B) < 0 for some B ⊂ A.
Accordingly, we say a measurable set A is positive if µ(B) ≥ 0 for all measurable B ⊂ A.
A set is negative if it is positive for −µ. A measure on (Ω, F) is positive (negative) if Ω
is a positive (negative) set. A set is a null set if it is both positive and negative.
The monotonicity property does not hold for signed measures. If A is positive and
B is (strictly) negative, then we have µ(A ∪ B) < µ(A). If µ is a positive measure on
(Ω, F) then µ(Ω) < ∞ forces all measurable sets to be of finite measure. Similarly, a
signed measure is finite if all measurable sets are of finite measure. In fact, to define
a signed measure as finite, it suffices to assume µ(Ω) is finite. Otherwise, suppose for
some E ∈ F we have µ(E) = ∞. Definition 3.15 precludes assignment of a measure to
Ec ∈ F. The definition of the σ-finite property is the same for signed measures as for
positive measures.

3.2.10 Decomposition of measures


We will encounter the situation in which we are given a single measurable space M =
(Ω, F) on which a class of measures is to be defined. Suppose for example, we have a
probability measure P on Ω = [0, 1] which assigns a probability of 1 to outcome
1/2. For this case we need only a σ-field generated by {1/2}. However, we may wish
to consider a family of probability measures defined on a single σ-field, as well as a
method of calculating expected values. For greater generality, we might like F to be the
Borel sets on [0, 1] when continuous random variables arise, but we would also like to

include our singular example. This poses no particular problem, since this probability
measure is easily described by P(E) = I{1/2 ∈ E} for all Borel sets E.
To clarify this issue, we introduce a few definitions.

Definition 3.16 Let ν and µ be two measures on M = (Ω, F). If µ(E) = 0 ⇒ ν(E) = 0
for all E ∈ F, then ν is absolutely continuous with respect to µ. This is written ν ≪ µ,
and we also say ν is dominated by µ. If ν ≪ µ and µ ≪ ν then ν and µ are equivalent.
Conversely ν and µ are singular if there exists E ∈ F for which ν(E) = µ(Ec) = 0, also
written ν ⊥ µ.

If ν is absolutely continuous with respect to a counting measure on S ⊂ Ω = R,
and µ is absolutely continuous with respect to Lebesgue measure, then ν ⊥ µ since
ν(Sc) = µ(S) = 0. Note that a pair of measures need not be either singular or equivalent
(consider a measure describing a random waiting time W which is continuous above
0, but for which P(W = 0) > 0, and Lebesgue measure on the positive real numbers).
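For measures on a finite space determined by point masses, the relations of Definition 3.16 can be verified by brute force over all subsets (a toy checker with names of our own choosing):

```python
from itertools import combinations

def all_subsets(omega):
    """All subsets of a finite set, as frozensets."""
    xs = sorted(omega)
    return [frozenset(c) for r in range(len(xs) + 1)
            for c in combinations(xs, r)]

def dominated(nu, mu, omega):
    """nu << mu: every mu-null set is also nu-null."""
    total = lambda m, E: sum(m.get(x, 0) for x in E)
    return all(total(nu, E) == 0
               for E in all_subsets(omega) if total(mu, E) == 0)

def singular(nu, mu, omega):
    """nu ⊥ mu: some E has nu(E) = 0 and mu(E^c) = 0."""
    total = lambda m, E: sum(m.get(x, 0) for x in E)
    return any(total(nu, E) == 0 and total(mu, frozenset(omega) - E) == 0
               for E in all_subsets(omega))

omega = {1, 2, 3}
mu = {1: 0.5, 2: 0.5}           # mass on {1, 2}
nu = {1: 1.0}                   # mass on {1}: nu << mu
rho = {3: 1.0}                  # mass on {3}: rho ⊥ mu
print(dominated(nu, mu, omega), singular(rho, mu, omega))  # prints True True
```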
The Lebesgue Decomposition Theorem will prove useful:

Theorem 3.11 (Lebesgue Decomposition Theorem) Suppose ν, µ are two σ-finite
signed measures defined on a common measurable space (Ω, F). Then there exists a
unique decomposition ν = ν0 + ν1 for which ν1 ≪ µ and ν0 ⊥ µ.

We have noted that signed measures arise naturally as differences of positive mea-
sures. It turns out that any signed measure can be uniquely represented this way. This
is a consequence of the Jordan-Hahn Decomposition Theorem.

Theorem 3.12 (Jordan-Hahn Decomposition Theorem) Suppose µ is a signed
measure on F.

(i) [Hahn Decomposition] There exists E ∈ F such that E is positive and Ec is
negative. This decomposition is unique up to null sets.
(ii) [Jordan Decomposition] There exists a unique pair of mutually singular
(positive) measures µ+ , µ− for which µ = µ+ − µ− .

The uniqueness of the Jordan decomposition µ = µ+ − µ− permits the definition
of the total variation measure |µ|(E) = µ+(E) + µ−(E).
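On a finite space, where a signed measure is determined by its values on singletons, the Hahn positive set and the Jordan pair can be read off directly; the sketch below is our own illustration:

```python
def jordan_decomposition(mass):
    """mass: dict mapping each point to its signed mass.
    Returns the Hahn positive set E and the Jordan pair (mu+, mu-)."""
    E = frozenset(x for x, m in mass.items() if m >= 0)   # positive set
    def mu_plus(A):
        """mu+(A) = mu(A ∩ E)."""
        return sum(mass[x] for x in A if x in E)
    def mu_minus(A):
        """mu-(A) = -mu(A ∩ E^c)."""
        return -sum(mass[x] for x in A if x not in E)
    return E, mu_plus, mu_minus

mass = {"a": 2.0, "b": -3.0, "c": 1.0}
E, mu_plus, mu_minus = jordan_decomposition(mass)
print(sorted(E))                        # prints ['a', 'c']
print(mu_plus(mass) + mu_minus(mass))   # |mu|(Omega): prints 6.0
```

Mutual singularity of the pair is visible directly: µ− vanishes on E and µ+ vanishes on its complement.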

3.2.11 Measurable functions


A commonly encountered assumption is that a function is a (usually Borel or Lebesgue)
‘measurable mapping’. It is worth discussing what is meant by this, understanding
that this definition does impose some restrictions on the functions we may consider.
We have the definition:

Definition 3.17 Given two measurable spaces (Ωi, Fi), i = 1, 2, a mapping f : Ω1 → Ω2
is measurable if f −1(A) ∈ F1 for all A ∈ F2.

Rather like the definition of a continuous mapping between topologies, a measurable
mapping remains measurable if F1 is replaced by a strictly larger σ-field, or if F2
is replaced by a strictly smaller σ-field.

The following theorem is easily proven by noting that f −1(A ∪ B) = f −1(A) ∪ f −1(B)
and f −1(Ac) = f −1(A)c.

Theorem 3.13 If f maps a measurable space (Ω, F) to range X then the collection
FX of sets E ⊂ X for which f −1(E) ∈ F is a σ-field.

By Definition 3.17 the idea of a measurable function depends on separate definitions
of measurability on the domain and the range. In many applications, when using
real-valued functions it suffices to take the Borel sets as measurable on the range.
We therefore adopt the convention that a function f : X → R is F-measurable, or a
measurable mapping on (X, F), if it is a measurable mapping from (X, F) to (R, B).
This simplifies somewhat the characterization of a real-valued measurable function.
Suppose f is a real-valued function on (X, F). Suppose further that

{x ∈ X | f (x) ≤ α} ∈ F for all α ∈ R. (3.2)

By Theorem 3.13 the class of subsets E ⊂ R for which f −1(E) ∈ F is a σ-field, and by
assumption it contains all intervals (−∞, α], and so also contains the Borel sets (since
this is the smallest σ-field containing these intervals). Of course, <, > or ≥ could
replace ≤ in the inequalities of (3.2). We therefore say a real-valued function f is
Borel measurable, or Lebesgue measurable, if F are the Borel sets, or the Lebesgue
sets. Similarly, measurability of a mapping from a measurable space (Ω, F) to Rn will
be defined with respect to the Borel sets on Rn.
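On finite spaces the mechanism of Theorem 3.13 and the criterion (3.2) can be explored directly: collect every subset of the range whose preimage lies in F; a function is measurable exactly when this collection exhausts the range's subsets (again an illustrative sketch with names of our own choosing):

```python
from itertools import combinations

def preimage_field(f, domain, range_space, F):
    """FX = {E ⊆ range : f^{-1}(E) ∈ F}, a σ-field by Theorem 3.13."""
    ys = sorted(range_space)
    FX = []
    for r in range(len(ys) + 1):
        for combo in combinations(ys, r):
            E = frozenset(combo)
            pre = frozenset(x for x in domain if f(x) in E)
            if pre in F:
                FX.append(E)
    return FX

domain = {1, 2, 3, 4}
# F generated by the split {1, 2} | {3, 4}
F = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), frozenset(domain)}

f1 = lambda x: 0 if x <= 2 else 1     # constant on each block: measurable
f2 = lambda x: x % 2                  # mixes the blocks: not measurable

print(len(preimage_field(f1, domain, {0, 1}, F)))   # prints 4 (all subsets of {0, 1})
print(len(preimage_field(f2, domain, {0, 1}, F)))   # prints 2 (only empty set and {0, 1})
```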
Nonmeasurable mappings usually exist, and are easily constructed using indicator
functions of nonmeasurable sets.
We note that composition preserves measurability.

Theorem 3.14 If f , g are measurable mappings from (Ω1, F1) to (Ω2, F2), and from
(Ω2, F2) to (Ω3, F3) respectively, then the composition g ◦ f is a measurable mapping
from (Ω1, F1) to (Ω3, F3).

Note that Theorem 3.14 does not state that compound mappings of Lebesgue
measurable mappings are Lebesgue measurable, since only preimages of Borel sets
(and not Lebesgue sets, a strictly larger class) need be Lebesgue measurable.
If X is a topological space (usually a metric space), then F(X ) will be the set of
mappings f : X → R which are measurable with respect to the Borel sets on X and R.

Theorem 3.15 If f , g ∈ F(X), then so are f + g, fg, f ∨ g and f ∧ g. If fn is a countable
sequence of measurable mappings, then lim supn fn, lim inf n fn, supn fn and inf n fn are
measurable mappings.
Closure of the class of measurable functions is given only for countable operations.
Suppose E ⊂ [0, 1] is not Lebesgue measurable. For each z ∈ E define function fz (x) =
I{x = z}. Then supz∈E fz = I{x ∈ E}, which is not Lebesgue measurable, even though
each fz is.
In the context of a measure space, the notion of the equality of two functions is
often usefully reduced to equivalence classes of functions. In particular, given a measure
space (Ω, F, µ), we might find it useful to consider f and g equal if f (x) = g(x)
except for x ∈ E for which µ(E) = 0. Such properties are said to hold almost everywhere.
Another random document with
no related content on Scribd:
kasvoni koivun runkoa vastaan ja sydämeni löi kuin suonenvedossa.
———

"Liina!"

Mikä ääni se oli, joka niin suloisesti mainitsi nimeäni? Minä


kuuntelin, mutta en liikkunut paikoiltani. "Liina!" vielä toisen kerran,
ja kuin salama käännyin sinne päin. Kaksi suurta silmää loisti minun
silmiini.

"Jansu!" — — — Voimakas ääni herätti minut viimeisestä


hämmästyksestäni, nostin pääni jälleen ylös, vaan ei enää kylmästä
koivunrungosta, vaan Jansun rinnoilta. — "Jansu, suljetko minut
todellakin syliisi, vai onko tämä unennäköä?" kysyin pelokkaasti,
mutta myöskin ihastuksella.

"Liina, minun Liinani, olen todellakin kerran päässyt sinun luoksesi!


Et voi arvata, kuinka suuresti olen sinua ikävöinyt. Mutta nyt en enää
eroa sinusta, sillä ilman sinutta en voi elää. Liina, sano minulle,
voitko puoleksikaan rakastaa minua niin paljon kuin minä sinua.
Tahdotko ainiaaksi yhdistyä minun kanssani?" — Minä painoin
kasvoni uudestaan hänen rintaansa vastaan. — "Liina, sinä et puhu
mitään, sinä rakastat vielä Fredrikiä!" Ja koko hänen ruumiinsa
vapisi. Minä katsoin hänen silmiinsä ja sanoin selvällä äänellä:

"Jansu, sinun seurassasi vaan katoaa minulta kaikki ikävät ja


huolet, sinun kanssasi vaan olen täydellisesti onnellinen." — — —
Suun antaminen yhdisti sielun sieluun.

"Liina, sano minulle vielä, että olet minun morsiameni. Minä


tahdon selvään kuulla sen sinun suustasi." Minä täytin hänen
tahtonsa.
Tämän jälkeen kysyi hän minulta, miksi olin itkenyt. Minä kerroin
hänelle kaikki. Hän surkutteli, hän painoi minut uudestaan rintaansa
vastaan, suuteli minua yhä uudestaan ja minä vakuutin, ett'en enää
milloinkaan tuntisi semmoista ikävää. — Sitte kertoi hän,
kysymykseeni, että hän heti tutkintonsa suoritettuaan oli
matkustanut Pietarista, ehtinyt päivänlaskunaikana kotiin ja,
tervehdittyään vanhempiaan, kiirehtinyt minun luokseni.

"Mutta, Jansu, ihmiset ovat jo kaikki lähteneet pois ja


Juhannuskokkokin alkaa sammua, meidän täytyy myöskin lähteä
kotiin, jos emme tahdo odottaa sadetta."

"Sinä olet oikeassa", sanoi hän, minä pistin käteni hänen


kainaloonsa ja me tulimme kotiin. Salamaa löi ja ukkonen kuului,
mutta meillä oli niin paljo puhumista, ettemme sitä juuri
huomanneetkaan. Pääsimme kuitenkin kuivina kotiin.

Jansun vanhemmille oli aika kotona käynyt pitkäksi, he tulivat


meille ja siellä odottivat minun vanhempaini kanssa meitä. Suurella
ilolla he ottivat meidät vastaan, ja kuinka suuri vanhan seppä-isän
ihastus oli, kun hän kuuli meidän liittomme, sitä ei voi sanoin
selittää. Hän painoi meitä rintaansa vastaan ja sanoi pojalleen:
"Jansu, minä olen sinulle perinnöksi koettanut rakkaasti suojella
Liinaa kuin kalleinta aarrettani, sillä tiesin, että sydämesi oli häneen
kiintynyt lapsuudesta asti. — Ja, Liina, kysyi hän minulta, eikö minun
Jaanini — nyt nimitti hän Jansua ensimmäisen kerran näin — eikö
minun Jaanini käsivarsi voi kantaa sinua elämän tiellä? Eikö hän, kun
hän tuossa solakkana seisoo, ole sinun mieleisesi?" — Isän ylpeys
ilmaantui hänen sanoissaan. Jansun suuruuden ja kauneuden
huomasin minäkin nyt vasta, ja ylpeys kohotti minunkin rintaani. Että
minun piti saada semmoinen mies!
"Isä, hän on kauniimpi kuin Apollo", sanoin minä, ymmärsi isä sen
taikka ei. — Hänen äitinsäkin ilo oli suuri; mutta minun vanhempieni
iloon sekaantui salainen mielipaha.

"Niin pian pitää meidän erota rakkaasta apulaisestamme", sanoi


isä värisevin äänin ja äiti käänsi kasvonsa pois meistä. Minä
heittäydyin hänen syliinsä. Me syleilimme toisiamme ja meidän
kyyneleemme vuotivat.

"Tulkaa katsomaan, kuinka päivä nousee taas kirkkaana!" huusi


seppä-isä.

Jansu tuli minun luokseni, otti minut äitini sylistä omaan


kainaloonsa ja vei akkunan luo. "Katso Liina", sanoi hän, että kaikki
kuulivat, "yöllinen rajuilma ja pimeys ovat niin kadonneet, ettei
yhtään pilveä ole jäänyt, ja päivä tulee hehkuvin poskin
huoneestaan, kuin onnellinen morsian! Katso, niin pitää kerran
virolaisten vapaus ja elinvoima nouseman. Ja mekin yhdistymme,
voidaksemme paremmin työtämme tehdä, sillä helppoa se työ ei tule
olemaan. Siinä mies usein taistelussa väsyy ja naisen suloinen
rakkaus häntä silloin virkistää ja lääkitsee hänen sydämensä haavat."

"Jumala siunatkoon teidän liittonne!" sanoivat molemmat isät kuin


yhdestä suusta, pannen kätensä meidän päämme päälle.

Loppulause.

"Kuule, äiti, joku tulee ajaen!" — "Niin, se on meidän rakas


isämme, se on isä, minä näen jo Vaskan!" Näin huusivat minun
seitsenvuotias Annani ja kaksitoistavuotias Arturi ja juoksivat isäänsä
vastaan. Hän tuli eräästä maanviljelyskokouksesta. Minä jäin
odottamaan portaille, joka nyt oli kaunistettu koivuilla ja
kukkaseppeleillä ja johon illallispöytä oli valmistettu. Sieltä tulivat he
myöskin kohta, koivukäytävästä, esille ja ajoivat viheriäksi
maalatusta portista sisään. Arturi, joka oli kuin toinen Jansu, istui
kaksin reisin ystävänsä, Vaskan, seljässä. Anna taas, valkoisissa
vaatteissa, juhannuskukista ja harakankelloista tehty seppele
mustilla kiharoilla, oli rakkaan isänsä vieressä. Suurella ruohoisella
pihalla, jonka keskellä iso tammi hiljaa kohisee, pysähtyy kallein
tavarakuormani portaiden eteen. Jaani astuu kärryiltä, panee lapset
maahan. Minä käyn hänen luoksensa ja hän suutelee minua.

"Liina", sanoo hän, istuen viheriäiselle penkille ja ottaa minut


syliinsä. "Kuinka suurella ikävällä odotin tätä päivää 15 vuotta sitten,
joka niin lempeästi yhdisti meidät, ja siitä olen suuressa
kiitollisuuden velassa Jumalalle. Sillä jos sinä et olisi näinä
viitenätoista vuotena ollut minun apunani, et olisi ottanut osaa
töihini, pyrintöihini ja toimituksiini, et olisi neuvonut ja lohduttanut
minua, niin ei olisi voimani kestänyt, enkä olisi aina pysynyt niin
vakavana, kuin nyt. Sillä aina, missä hyvä asia kohottaa päätänsä,
siinä on niin paljo kateutta, suvaitsemattomuutta ja pahuutta
vastassa, että rohkeus ja totuuden tunto pitää olla suuri ja
horjumatoin, jos siinä jaksaa iloisesti ja toivoen pyrkiä eteenpäin.
Kyllä tuntee sydämeni iloa, kun katson ympärilleni ja näen
maanviljelys-, kirjallisuus- ja lauluja opettajaseurat, jotka jo ovat
elossa, ja luen sanomalehtiä omalla kielelläni. Mutta pian
sekaantuvat jälleen nämä ilot epäilyksiin. Kuinka paljon voivat
meidän pienet seuramme uhrata kansan sivistyksen hyväksi, kun
suuret maksut heitä kaikkia rasittavat? Mitä auttaa meidän
Aleksanteri-koulumme, jonka me kymmenvuotisen, kovan taistelun
jälkeen kohta saamme toimeen, kun siinä koulutetut nuoret miehet
eivät kuitenkaan saa mitään valtionvirkaa."

"Rakas Jaani, tänäpäivänä ennen puoltapäivää olin lasten kanssa


Linnun väen luona vieraisilla. Minä tapasin siellä perheenmiehen ja
perheennaisen koivuilla ja kukilla kaunistetussa tuvassa, puhtaaksi
pestyn pöydän ääressä istumassa ison, kirkkaan akkunan luona. He
kuuntelivat mielihalulla 'Saaremaa onupoja’n' kirjoituksia, joita luki
heidän vanhin poikansa Kaarlo. Hän on käynyt kihlakunnan koulun
läpi ja luki niin hyvällä äänenpainolla, että minun täytyi ihmetellä.
Pöydällä oli vielä muutamia muitakin kirjoja, niiden joukossa C.R.
Jacobsonin kolme isänmaan puhetta, Väinämöisen kanteleen kielet,
Viljandin laulaja ja myöskin 'Eesti postimees'. Minä istuin myös
heidän joukkoonsa ja kuuntelin myös hyvillä mielin. Lapset leikkivät
heidän pienempien lastensa kanssa omenatarhassa. Annettiin meille
sen jälkeen myös kauniisti katetussa pöydässä munaruokaa ja
paksua viilipiimää syödä. Mutta ennenkuin me lähdimme, täytyi
minun vielä kuulla, kuinka sinua joka haaralla kiitettiin. Perheenmies
tunnusti liikutuksella, että hän vasta siitä asti on ruvennut oikeen
elämään, kun sinä ostit tämän kartanon ja myit talonpojille heidän
arentimaansa kohtuulliseen hintaan, josta he joka vuosi maksavat
arentia niin paljon, kuin heidän sisääntulonsa kannattaa, etteivät
velkaannu. Ainoastaan siten he ovat päässeet suuresta
köyhyydestään, johon olivat vajonneet. Ja siten voivat he saavuttaa
ihmisarvonsa."

"Rakas Liina, varmaankaan ei voinut Linnun emäntä unhottaa,


kuinka sinä olet opettanut hänelle huoneiden puhdistusta ja
kaikenlaisia muitakin hyviä tapoja, joita hän nyt koettaa omaistensa
hyväksi toteuttaa. Mutta se on aina sinun tapasi, että muistat vaan
ne kiitokset, jotka minun osakseni tulevat, mutta annat omat hyvät
työsi mennä yhdestä korvasta sisään, toisesta ulos", nuhteli Jaani
minua hymyillen.

"Ei", väitin minä, "minun sydämeni ei pysy kylmänä näitä


kuullessani, vaan tunnen suurta iloa, jos huomaan, että minäkin olen
voinut tehdä jotakin hyvää kansani eduksi. Mutta sen kuitenkin aina
muistan, etten ole suuria saanut aikaan."

"Rakas Liina, elä unhota sitä, mitä jo ennen olen sinulle sanonut,
että sinä, kun olet minun tukenani ollut, olet myös kansallesi paljon
hyvää tehnyt. Mies parkoja, joilta puuttuu sellaiset tukeet! Milloinka
alkavat myös naisemme vapautua orjuuden ikeestä? Oi, Liina, puhu
tovereillesi, huuda, ehkä kuulee vielä joku heistä, jotka oman
kansansa keskuudesta ovat sekaantuneet saksalaisiin."

Samassa tulivat lapsemme tallin luota, jossa olivat katselleet


Vaskan riisumista ja talliin viemistä. He kysyivät isältään, oliko hän
käynyt katsomassa heidän rakasta isoisäänsä ja isoäitiään.

"Kävin minä. He lähettivät teille paljon terveisiä ja käskivät teitä


tulla pian heitä katsomaan, sillä heillä on teitä hyvin ikävä, kun eivät
niin pitkään aikaan ole nähneet. Myöskin vanha setä ikävöi teitä
nähdä ja tulee kohta veljensä luota tänne."

"Me ajamme sinne ja tuomme itse hänet kotiin!" huusivat


molemmat lapset yhdestä suusta.

"Ovatko terveinä kaikki?" kysyin minä, ja Jaani vastasi:

"Vanhempamme viihtyvät hyvästi omalla tilallaan. Todella ovat he


siitä asti, kun viimeisen velkansa talon hinnasta maksoivat, käyneet
koko joukon nuoremmiksi. He kiittävät myöskin sitä onnea, että
voivat ostaa maansa toistensa viereen ja että vanhoilla päivillään
saivat semmoisen onnen, jota eivät nuorena voineet toivoakkaan.
Isäni kertoi vielä minulle toivovansa, että sinun nuorin veljesi kihlaa
hänen kauniin kasvattityttärensä, joka on äidille ollut niin suurena
apuna, jotta hän, äidin kuoleman jälkeen, joutuisi rakkaan pojan
kanssa naimisiin. Sitä minäkin sydämestäni toivon. Nuorempi
sisaresi, nykyinen äidin oikea käsi, on vanhemman veljen kanssa
mennyt sisaren luo vieraaksi. Sieltä he kaikki yhdessä tulevat meille,
myöskin sisar, miehineen ja lapsineen, ja vievät meidät vanhempien
luona käymään."

"Oi, kuinka hyvä, kuinka hyvä!" huusivat lapset.

"Mutta nyt olen kaikki kertonut. Vatsa vaatii ruokaa ja juusto ja voi
pöydällä näyttävät niin hyviltä, että niitä täytyy ruveta syömään",
sanoi armas Jaanini. Minä istuin hänen viereensä pöydän ääreen ja
lapset meitä vastapäätä toiselle puolelle ja rupesimme syömään,
juuri kuin aurinko meni mailleen.

Tämä kaikki tapahtui eilen juhannuspäivänä. Nyt istun varhaisesta


aamusta alkaen kirjotuspöytäni ääressä, lopettaen elämäkertani ja
täyttäen Jaanini toivon. Sen tähden:

Rakkaat Viron sisaret! Minun mieheni on yksi niitä, jotka


taistelevat vapauden puolesta, että inhimillisempi elämä alkaisi ja
hengen kevät rupeisi kukoistamaan. Kuinka paljon aikaa, voimia ja
itsensä uhrausta tämä työ vaatii, sen olen näiden viidentoista vuoden
kuluessa huomannut miehenikin toimesta. Ja olen jo kertonut hänen
omilla sanoillaan, minä apuna olen hänelle ollut hänen töissään,
pyrinnöissään ja toiveissaan. Mutta ne ovat harvassa, jotka voivat
joka tilassa, joka paikassa seisoa tukena miestensä rinnalla ja tehdä
työtä yhdessä. Mutta naiset eivät voi kartuttaa henkensä voimia
toisella tiellä kuin miehetkään. Se tie kulkee hyvien koulujen kautta
vapauden helmaan, jossa ainoastaan kaikki hyvät voimat voivat
kehittyä. Sentähden, rakkaat Viron sisaret, perustakaa myöskin
tyttärillenne kouluja, niinkuin pojillennekin kouluja toimitetaan.
Hankkikaa heille kunniakkaampi kasvatus, kuin teillä itsellänne on
ollut, ettei teidän poikanne tarvitse mennä suurinta onneaan
etsimään vieraan kansan naisista, vaan löytävät sen omien
neitojensa joukosta. — Mutta kuulkaa tekin, rakkaat sisaret, joita
Saksan koulut ovat vieroittaneet omasta kansastanne! Kuulkaa
minun rukoilevaa ääntäni ja osoittakaa ystävällistä mieltä,
myötätuntoisuutta ja rakkautta ja ojentakaa kätenne Viron sisarille ja
auttakaa heitä vapautumaan pimeyden kahleista! Tulkaa takaisin ja
älkää siroittako kukkianne vieraille miehille! Ruvetkaa joka puolelta,
alhaalta ja ylhäältä, niinkuin sukukansamme suomalaiset naiset ovat
olleet tukena miesten rinnalla, miehillenne avuksi, että Vironkin
kansa nousisi kukoistukseensa, niinkuin kaikki muutkin kansat,
jokainen ajallaan, on noussut. Niin, yhdistäkäämme rakkautemme ja
rientäkäämme myöskin me eteenpäin!
*** END OF THE PROJECT GUTENBERG EBOOK LIINA ***

Updated editions will replace the previous one—the old editions will
be renamed.

Creating the works from print editions not protected by U.S.


copyright law means that no one owns a United States copyright in
these works, so the Foundation (and you!) can copy and distribute it
in the United States without permission and without paying
copyright royalties. Special rules, set forth in the General Terms of
Use part of this license, apply to copying and distributing Project
Gutenberg™ electronic works to protect the PROJECT GUTENBERG™
concept and trademark. Project Gutenberg is a registered trademark,
and may not be used if you charge for an eBook, except by following
the terms of the trademark license, including paying royalties for use
of the Project Gutenberg trademark. If you do not charge anything
for copies of this eBook, complying with the trademark license is
very easy. You may use this eBook for nearly any purpose such as
creation of derivative works, reports, performances and research.
Project Gutenberg eBooks may be modified and printed and given
away—you may do practically ANYTHING in the United States with
eBooks not protected by U.S. copyright law. Redistribution is subject
to the trademark license, especially commercial redistribution.

START: FULL LICENSE


THE FULL PROJECT GUTENBERG LICENSE
PLEASE READ THIS BEFORE YOU DISTRIBUTE OR USE THIS WORK

To protect the Project Gutenberg™ mission of promoting the free


distribution of electronic works, by using or distributing this work (or
any other work associated in any way with the phrase “Project
Gutenberg”), you agree to comply with all the terms of the Full
Project Gutenberg™ License available with this file or online at
www.gutenberg.org/license.

Section 1. General Terms of Use and


Redistributing Project Gutenberg™
electronic works
1.A. By reading or using any part of this Project Gutenberg™
electronic work, you indicate that you have read, understand, agree
to and accept all the terms of this license and intellectual property
(trademark/copyright) agreement. If you do not agree to abide by all
the terms of this agreement, you must cease using and return or
destroy all copies of Project Gutenberg™ electronic works in your
possession. If you paid a fee for obtaining a copy of or access to a
Project Gutenberg™ electronic work and you do not agree to be
bound by the terms of this agreement, you may obtain a refund
from the person or entity to whom you paid the fee as set forth in
paragraph 1.E.8.

1.B. “Project Gutenberg” is a registered trademark. It may only be


used on or associated in any way with an electronic work by people
who agree to be bound by the terms of this agreement. There are a
few things that you can do with most Project Gutenberg™ electronic
works even without complying with the full terms of this agreement.
See paragraph 1.C below. There are a lot of things you can do with
Project Gutenberg™ electronic works if you follow the terms of this
agreement and help preserve free future access to Project
Gutenberg™ electronic works. See paragraph 1.E below.

1.C. The Project Gutenberg Literary Archive Foundation (“the
Foundation” or PGLAF), owns a compilation copyright in the
collection of Project Gutenberg™ electronic works. Nearly all the
individual works in the collection are in the public domain in the
United States. If an individual work is unprotected by copyright law
in the United States and you are located in the United States, we do
not claim a right to prevent you from copying, distributing,
performing, displaying or creating derivative works based on the
work as long as all references to Project Gutenberg are removed. Of
course, we hope that you will support the Project Gutenberg™
mission of promoting free access to electronic works by freely
sharing Project Gutenberg™ works in compliance with the terms of
this agreement for keeping the Project Gutenberg™ name associated
with the work. You can easily comply with the terms of this
agreement by keeping this work in the same format with its attached
full Project Gutenberg™ License when you share it without charge
with others.

1.D. The copyright laws of the place where you are located also
govern what you can do with this work. Copyright laws in most
countries are in a constant state of change. If you are outside the
United States, check the laws of your country in addition to the
terms of this agreement before downloading, copying, displaying,
performing, distributing or creating derivative works based on this
work or any other Project Gutenberg™ work. The Foundation makes
no representations concerning the copyright status of any work in
any country other than the United States.

1.E. Unless you have removed all references to Project Gutenberg:

1.E.1. The following sentence, with active links to, or other
immediate access to, the full Project Gutenberg™ License must
appear prominently whenever any copy of a Project Gutenberg™
work (any work on which the phrase “Project Gutenberg” appears,
or with which the phrase “Project Gutenberg” is associated) is
accessed, displayed, performed, viewed, copied or distributed:
This eBook is for the use of anyone anywhere in the United
States and most other parts of the world at no cost and with
almost no restrictions whatsoever. You may copy it, give it away
or re-use it under the terms of the Project Gutenberg License
included with this eBook or online at www.gutenberg.org. If you
are not located in the United States, you will have to check the
laws of the country where you are located before using this
eBook.

1.E.2. If an individual Project Gutenberg™ electronic work is derived
from texts not protected by U.S. copyright law (does not contain a
notice indicating that it is posted with permission of the copyright
holder), the work can be copied and distributed to anyone in the
United States without paying any fees or charges. If you are
redistributing or providing access to a work with the phrase “Project
Gutenberg” associated with or appearing on the work, you must
comply either with the requirements of paragraphs 1.E.1 through
1.E.7 or obtain permission for the use of the work and the Project
Gutenberg™ trademark as set forth in paragraphs 1.E.8 or 1.E.9.

1.E.3. If an individual Project Gutenberg™ electronic work is posted
with the permission of the copyright holder, your use and distribution
must comply with both paragraphs 1.E.1 through 1.E.7 and any
additional terms imposed by the copyright holder. Additional terms
will be linked to the Project Gutenberg™ License for all works posted
with the permission of the copyright holder found at the beginning
of this work.

1.E.4. Do not unlink or detach or remove the full Project
Gutenberg™ License terms from this work, or any files containing a
part of this work or any other work associated with Project
Gutenberg™.

1.E.5. Do not copy, display, perform, distribute or redistribute this
electronic work, or any part of this electronic work, without
prominently displaying the sentence set forth in paragraph 1.E.1
with active links or immediate access to the full terms of the Project
Gutenberg™ License.

1.E.6. You may convert to and distribute this work in any binary,
compressed, marked up, nonproprietary or proprietary form,
including any word processing or hypertext form. However, if you
provide access to or distribute copies of a Project Gutenberg™ work
in a format other than “Plain Vanilla ASCII” or other format used in
the official version posted on the official Project Gutenberg™ website
(www.gutenberg.org), you must, at no additional cost, fee or
expense to the user, provide a copy, a means of exporting a copy, or
a means of obtaining a copy upon request, of the work in its original
“Plain Vanilla ASCII” or other form. Any alternate format must
include the full Project Gutenberg™ License as specified in
paragraph 1.E.1.

1.E.7. Do not charge a fee for access to, viewing, displaying,
performing, copying or distributing any Project Gutenberg™ works
unless you comply with paragraph 1.E.8 or 1.E.9.

1.E.8. You may charge a reasonable fee for copies of or providing
access to or distributing Project Gutenberg™ electronic works
provided that:

• You pay a royalty fee of 20% of the gross profits you derive
from the use of Project Gutenberg™ works calculated using the
method you already use to calculate your applicable taxes. The
fee is owed to the owner of the Project Gutenberg™ trademark,
but he has agreed to donate royalties under this paragraph to
the Project Gutenberg Literary Archive Foundation. Royalty
payments must be paid within 60 days following each date on
which you prepare (or are legally required to prepare) your
periodic tax returns. Royalty payments should be clearly marked
as such and sent to the Project Gutenberg Literary Archive
Foundation at the address specified in Section 4, “Information
about donations to the Project Gutenberg Literary Archive
Foundation.”

• You provide a full refund of any money paid by a user who
notifies you in writing (or by e-mail) within 30 days of receipt
that s/he does not agree to the terms of the full Project
Gutenberg™ License. You must require such a user to return or
destroy all copies of the works possessed in a physical medium
and discontinue all use of and all access to other copies of
Project Gutenberg™ works.

• You provide, in accordance with paragraph 1.F.3, a full refund of
any money paid for a work or a replacement copy, if a defect in
the electronic work is discovered and reported to you within 90
days of receipt of the work.

• You comply with all other terms of this agreement for free
distribution of Project Gutenberg™ works.

1.E.9. If you wish to charge a fee or distribute a Project Gutenberg™
electronic work or group of works on different terms than are set
forth in this agreement, you must obtain permission in writing from
the Project Gutenberg Literary Archive Foundation, the manager of
the Project Gutenberg™ trademark. Contact the Foundation as set
forth in Section 3 below.

1.F.

1.F.1. Project Gutenberg volunteers and employees expend
considerable effort to identify, do copyright research on, transcribe
and proofread works not protected by U.S. copyright law in creating
the Project Gutenberg™ collection. Despite these efforts, Project
Gutenberg™ electronic works, and the medium on which they may
be stored, may contain “Defects,” such as, but not limited to,
incomplete, inaccurate or corrupt data, transcription errors, a
copyright or other intellectual property infringement, a defective or
damaged disk or other medium, a computer virus, or computer
codes that damage or cannot be read by your equipment.

1.F.2. LIMITED WARRANTY, DISCLAIMER OF DAMAGES - Except for
the “Right of Replacement or Refund” described in paragraph 1.F.3,
the Project Gutenberg Literary Archive Foundation, the owner of the
Project Gutenberg™ trademark, and any other party distributing a
Project Gutenberg™ electronic work under this agreement, disclaim
all liability to you for damages, costs and expenses, including legal
fees. YOU AGREE THAT YOU HAVE NO REMEDIES FOR
NEGLIGENCE, STRICT LIABILITY, BREACH OF WARRANTY OR
BREACH OF CONTRACT EXCEPT THOSE PROVIDED IN PARAGRAPH
1.F.3. YOU AGREE THAT THE FOUNDATION, THE TRADEMARK
OWNER, AND ANY DISTRIBUTOR UNDER THIS AGREEMENT WILL
NOT BE LIABLE TO YOU FOR ACTUAL, DIRECT, INDIRECT,
CONSEQUENTIAL, PUNITIVE OR INCIDENTAL DAMAGES EVEN IF
YOU GIVE NOTICE OF THE POSSIBILITY OF SUCH DAMAGE.

1.F.3. LIMITED RIGHT OF REPLACEMENT OR REFUND - If you
discover a defect in this electronic work within 90 days of receiving
it, you can receive a refund of the money (if any) you paid for it by
sending a written explanation to the person you received the work
from. If you received the work on a physical medium, you must
return the medium with your written explanation. The person or
entity that provided you with the defective work may elect to provide
a replacement copy in lieu of a refund. If you received the work
electronically, the person or entity providing it to you may choose to
give you a second opportunity to receive the work electronically in
lieu of a refund. If the second copy is also defective, you may
demand a refund in writing without further opportunities to fix the
problem.

1.F.4. Except for the limited right of replacement or refund set forth
in paragraph 1.F.3, this work is provided to you ‘AS-IS’, WITH NO
OTHER WARRANTIES OF ANY KIND, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR ANY PURPOSE.

1.F.5. Some states do not allow disclaimers of certain implied
warranties or the exclusion or limitation of certain types of damages.
If any disclaimer or limitation set forth in this agreement violates the
law of the state applicable to this agreement, the agreement shall be
interpreted to make the maximum disclaimer or limitation permitted
by the applicable state law. The invalidity or unenforceability of any
provision of this agreement shall not void the remaining provisions.

1.F.6. INDEMNITY - You agree to indemnify and hold the Foundation,
the trademark owner, any agent or employee of the Foundation,
anyone providing copies of Project Gutenberg™ electronic works in
accordance with this agreement, and any volunteers associated with
the production, promotion and distribution of Project Gutenberg™
electronic works, harmless from all liability, costs and expenses,
including legal fees, that arise directly or indirectly from any of the
following which you do or cause to occur: (a) distribution of this or
any Project Gutenberg™ work, (b) alteration, modification, or
additions or deletions to any Project Gutenberg™ work, and (c) any
Defect you cause.

Section 2. Information about the Mission of Project Gutenberg™

Project Gutenberg™ is synonymous with the free distribution of
electronic works in formats readable by the widest variety of
computers including obsolete, old, middle-aged and new computers.
It exists because of the efforts of hundreds of volunteers and
donations from people in all walks of life.

Volunteers and financial support to provide volunteers with the
assistance they need are critical to reaching Project Gutenberg™’s
goals and ensuring that the Project Gutenberg™ collection will
remain freely available for generations to come. In 2001, the Project
Gutenberg Literary Archive Foundation was created to provide a
secure and permanent future for Project Gutenberg™ and future
generations. To learn more about the Project Gutenberg Literary
Archive Foundation and how your efforts and donations can help,
see Sections 3 and 4 and the Foundation information page at
www.gutenberg.org.

Section 3. Information about the Project Gutenberg Literary Archive Foundation

The Project Gutenberg Literary Archive Foundation is a non-profit
501(c)(3) educational corporation organized under the laws of the
state of Mississippi and granted tax exempt status by the Internal
Revenue Service. The Foundation’s EIN or federal tax identification
number is 64-6221541. Contributions to the Project Gutenberg
Literary Archive Foundation are tax deductible to the full extent
permitted by U.S. federal laws and your state’s laws.

The Foundation’s business office is located at 809 North 1500 West,
Salt Lake City, UT 84116, (801) 596-1887. Email contact links and up
to date contact information can be found at the Foundation’s website
and official page at www.gutenberg.org/contact

Section 4. Information about Donations to the Project Gutenberg Literary Archive Foundation

Project Gutenberg™ depends upon and cannot survive without
widespread public support and donations to carry out its mission of
increasing the number of public domain and licensed works that can
be freely distributed in machine-readable form accessible by the
widest array of equipment including outdated equipment. Many
small donations ($1 to $5,000) are particularly important to
maintaining tax exempt status with the IRS.

The Foundation is committed to complying with the laws regulating
charities and charitable donations in all 50 states of the United
States. Compliance requirements are not uniform and it takes a
considerable effort, much paperwork and many fees to meet and
keep up with these requirements. We do not solicit donations in
locations where we have not received written confirmation of
compliance. To SEND DONATIONS or determine the status of
compliance for any particular state visit www.gutenberg.org/donate.

While we cannot and do not solicit contributions from states where
we have not met the solicitation requirements, we know of no
prohibition against accepting unsolicited donations from donors in
such states who approach us with offers to donate.

International donations are gratefully accepted, but we cannot make
any statements concerning tax treatment of donations received from
outside the United States. U.S. laws alone swamp our small staff.

Please check the Project Gutenberg web pages for current donation
methods and addresses. Donations are accepted in a number of
other ways including checks, online payments and credit card
donations. To donate, please visit: www.gutenberg.org/donate.

Section 5. General Information About Project Gutenberg™ electronic works

Professor Michael S. Hart was the originator of the Project
Gutenberg™ concept of a library of electronic works that could be
freely shared with anyone. For forty years, he produced and
distributed Project Gutenberg™ eBooks with only a loose network of
volunteer support.

Project Gutenberg™ eBooks are often created from several printed
editions, all of which are confirmed as not protected by copyright in
the U.S. unless a copyright notice is included. Thus, we do not
necessarily keep eBooks in compliance with any particular paper
edition.

Most people start at our website which has the main PG search
facility: www.gutenberg.org.

This website includes information about Project Gutenberg™,
including how to make donations to the Project Gutenberg Literary
Archive Foundation, how to help produce our new eBooks, and how
to subscribe to our email newsletter to hear about new eBooks.