Approximate Iterative Algorithms
Anthony Almudevar
Department of Biostatistics and Computational Biology,
University of Rochester, Rochester, NY, USA
CRC Press/Balkema is an imprint of the Taylor & Francis Group, an informa business
© 2014 Taylor & Francis Group, London, UK
Typeset by MPS Limited, Chennai, India
Printed and Bound by CPI Group (UK) Ltd, Croydon, CR0 4YY
All rights reserved. No part of this publication or the information contained
herein may be reproduced, stored in a retrieval system, or transmitted in any
form or by any means, electronic, mechanical, by photocopying, recording or
otherwise, without written prior permission from the publisher.
Although all care is taken to ensure integrity and the quality of this publication
and the information herein, no responsibility is assumed by the publishers nor
the author for any damage to the property or persons as a result of operation
or use of this publication and/or the information contained herein.
Library of Congress Cataloging-in-Publication Data
Almudevar, Anthony, author.
Approximate iterative algorithms / Anthony Almudevar, Department of
Biostatistics and Computational Biology, University of Rochester, Rochester, NY, USA.
pages cm
Includes bibliographical references and index.
ISBN 978-0-415-62154-0 (hardback) — ISBN 978-0-203-50341-6 (eBook PDF)
1. Approximation algorithms. 2. Functional analysis. 3. Probabilities.
4. Markov processes. I. Title.
QA76.9.A43A46 2014
519.2/33—dc23
2013041800
Published by: CRC Press/Balkema
P.O. Box 11320, 2301 EH Leiden, The Netherlands
e-mail: Pub.NL@taylorandfrancis.com
www.crcpress.com – www.taylorandfrancis.com
ISBN: 978-0-415-62154-0 (Hardback)
ISBN: 978-0-203-50341-6 (eBook PDF)
Table of contents
1 Introduction 1
PART I
Mathematical background 3
PART II
General theory of approximate iterative algorithms 189
PART III
Application to Markov decision processes 247
Bibliography 351
Subject index 357
Chapter 1
Introduction
The scope of this volume is quite specific. Suppose we wish to determine the solution
V ∗ to a fixed point equation V = TV for some operator T. Under suitable conditions,
V ∗ will be the limit of an iterative algorithm
V0 = v0
Vk = TVk−1 , k = 1, 2, . . . , (1.1)
where v0 is some initial solution. Such algorithms are ubiquitous in applied mathemat-
ics, and their properties well known.
Then suppose (1.1) is replaced with an approximation
V0 = v0
Vk = T̂k Vk−1 , k = 1, 2, . . . , (1.2)
where each T̂k is close to T in some sense. The subject of this book is the analysis
of algorithms of the form (1.2). The material in this book is organized around three
questions:
(Q1) If (1.1) converges to V ∗ , under what conditions does (1.2) also converge to V ∗ ?
(Q2) How does the approximation affect the limiting properties of (1.2)? How close
is the limit of (1.2) to V ∗ , and what is the rate of convergence (particularly in
comparison to that of (1.1))?
(Q3) If (1.2) is subject to design, in the sense that an approximation parameter, such
as grid size, can be selected for each T̂k , can an approximation schedule be
determined which minimizes approximation error as a function of computation
time?
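As a minimal numerical sketch of this setting (illustrative only: the operator T, the vanishing perturbation 1/k², and all names below are chosen for the example, not taken from the text), consider a scalar contraction T and a perturbed operator T̂_k of the form (1.2):

```python
# Illustrative sketch: exact vs. approximate fixed point iteration for the
# scalar contraction T(v) = 0.5*v + 1, whose fixed point is V* = 2.

def T(v):
    return 0.5 * v + 1.0

def T_hat(v, k):
    # Approximate operator T_hat_k: T plus a vanishing perturbation 1/k^2.
    return T(v) + 1.0 / k**2

def iterate(op, v0, n):
    """Run V_k = op(V_{k-1}, k) for k = 1, ..., n starting from v0."""
    v = v0
    for k in range(1, n + 1):
        v = op(v, k)
    return v

exact = iterate(lambda v, k: T(v), 0.0, 60)   # iteration (1.1)
approx = iterate(T_hat, 0.0, 60)              # iteration (1.2)

# Both iterations approach V* = 2; the residual error of the approximate
# iteration is controlled by the summable perturbation sum_k 1/k^2.
print(exact, approx)
```

Here the perturbed iteration still converges to a neighborhood of V∗ because the perturbations vanish; questions (Q1)–(Q3) concern exactly how such error accumulates in general.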
From a theoretical point of view, the purpose of this book is to show how quite
straightforward principles of functional analysis can be used to resolve these ques-
tions with a high degree of generality. From the point of view of applications, the
primary interest is in dynamic programming and Markov decision processes (MDP),
with emphasis on approximation methods and computational efficiency. The emphasis
is less on the construction of specific algorithms than on the development of theoretical
tools with which broad classes of algorithms can be defined, and hence analyzed
with a common theory.
The book is divided into three parts. Chapters 2–8 cover background material in
real analysis, linear algebra, measure theory, probability theory and functional analy-
sis. This section is fairly extensive in comparison to other volumes dealing specifically
with MDPs. The intention is that the language of functional analysis be used to express
concepts from the other disciplines, in as general but concise a manner as possible.
By necessity, many proofs are omitted in these chapters, but suitable references are
given when appropriate.
Chapters 9–11 form the core of the volume, in the sense that the questions (Q1)–
(Q3) are largely considered here. Although a number of examples are considered (most
notably, an analysis of the Robbins-Monro algorithm), the main purpose is to deduce
properties of general classes of approximate iterative algorithms on Banach and Hilbert
spaces.
The remaining chapters deal with Markov decision processes (MDPs), which form
the principal motivation for the theory presented here. A foundational theory of MDPs
is given in Chapters 12 and 13, from the point of view of functional analysis, while
the remaining chapters discuss approximation methods.
Finally, I would like to acknowledge the patience and support of colleagues and
family, especially Cynthia, Benjamin and Jacob.
Part I
Mathematical background
Chapter 2
Real analysis and linear algebra
In this chapter we first define notation, then review a number of important results
in real analysis and linear algebra of which use will be made in later chapters. Most
readers will be familiar with the material, but in a number of cases it will be important
to establish which of several commonly used conventions will be used. It will also
prove convenient from time to time to have a reference close at hand. This may be
especially true of the section on spectral decomposition.
In this section we describe the notational conventions and basic definitions to be used
throughout the book.
The cardinality of a set E is the number of elements it contains, and is denoted |E|.
If |E| < ∞ then E is a finite set. We have |∅| = 0. If |E| = ∞, this statement does not
suffice to characterize the cardinality of E. Two sets A, B are in a 1-1 correspondence
if a collection of pairs (a, b), a ∈ A, b ∈ B can be constructed such that each element of
A and of B is in exactly one pair. In this case, A and B are of equal cardinality. The
pairing is known as a bijection.
If the elements of A can be placed in a 1-1 correspondence with N we say A is
countable (is denumerable). We also adopt the convention of referring to any subset of
a countable set as countable. This means all finite sets are countable. If for countable
A we have |A| = ∞ then A is infinitely countable. Note that by some conventions, the
term countable is reserved for infinitely countable sets. For our purposes, it is more
natural to consider the finite sets as countable.
All infinitely countable sets are of equal cardinality with N, and so are
mutually of equal cardinality. Informally, a set is countable if it can be written
as a list, finite or infinite. The set N^d is countable since, for example, N² =
{(1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1), . . .}. The set of rational numbers is countable,
since the pairing of numerator and denominator, in any canonical representation, is a
subset of N².
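The diagonal listing of N² above can be sketched directly (an illustrative snippet; the function name is chosen here, and pairs are generated along anti-diagonals of constant coordinate sum):

```python
# Enumerating N x N along anti-diagonals, matching the listing
# {(1,1), (1,2), (2,1), (1,3), (2,2), (3,1), ...} in the text.

def diagonal_enumeration(n_terms):
    """Return the first n_terms pairs (i, j) of N^2 in diagonal order."""
    out = []
    s = 2  # i + j is constant along each anti-diagonal
    while len(out) < n_terms:
        for i in range(1, s):
            out.append((i, s - i))
            if len(out) == n_terms:
                break
        s += 1
    return out

print(diagonal_enumeration(6))
# The bijection with N is simply the position of each pair in this list.
```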
A set A is uncountable (is nondenumerable) if |A| = ∞ but A is not countable. The
set of real numbers, or any nonempty interval of real numbers, is uncountable.
If A1 , . . . , Ad are d sets, then A1 × A2 × · · · × Ad = ×_{i=1}^{d} Ai is a product set, consisting of the set of all ordered selections (a1 , . . . , ad ) of one element ai ∈ Ai from each set. A vector
is an element of a product set, but a product set is more general, since the sets Ai need
not be equal, or even contain the same type of element. The definition may be extended
to arbitrary forms of index sets.
For two numbers a, b ∈ R̄, we may use the notations max{a, b} = a ∨ b
and min{a, b} = a ∧ b.
2.1.6 Functions
If X, Y are two sets, then a function f : X → Y assigns a unique element of Y to each
element of X, in particular y = f (x). We refer to X and Y as the domain and range
(or codomain) of f . The image of a subset A ⊂ X is f (A) = {f (x) ∈ Y | x ∈ A}, and the
preimage (or inverse image) of a subset B ⊂ Y is f −1 (B) = {x ∈ X | f (x) ∈ B}. We say f is
When the context is clear, we may use the more compact notation d̃ = (d1 , d2 , . . . )
to represent a sequence {dk }. If ã = {ak } and b̃ = {bk } then we write ã ≤ b̃ if ak ≤ bk
for all k.
Let S be the class of all sequences of finite positive real numbers which con-
verge to zero, and let S − be those sequences in S which are nonincreasing. If {ak } ∈ S
we define the lower and upper convergence rates λl {ak } = lim inf k→∞ ak+1 /ak and
λu {ak } = lim supk→∞ ak+1 /ak . If 0 < λl {ak } ≤ λu {ak } < 1 then {ak } converges linearly.
If λu {ak } = 0 or λl {ak } = 1 then {ak } converges superlinearly or sublinearly, respec-
tively. We also define a weaker characterization of linear convergence by setting
λ̂l {ak } = lim inf k→∞ ak^{1/k} and λ̂u {ak } = lim supk→∞ ak^{1/k} .
When λl {ak } = λu {ak } = ρ we write λ{ak } = ρ. Similarly λ̂l {ak } = λ̂u {ak } = ρ is
written λ̂{ak } = ρ.
A sequence {ak } is of order {bk } if lim supk ak /bk < ∞, and may be written ak =
O(bk ). If ak = O(bk ) and bk = O(ak ) we write ak = Θ(bk ). Similarly, for two real valued
mappings ft , gt on (0, ∞) we write ft = O(gt ) if lim supt→∞ ft /gt < ∞, and ft = Θ(gt ) if
ft = O(gt ) and gt = O(ft ).
A sequence {bk } dominates {ak } if limk ak /bk = 0, which may be written ak =
o(bk ). A stronger condition holds if λu {ak } < λl {bk }, in which case we say {bk } lin-
early dominates {ak }, which may be written ak = o (bk ). Similarly, for two real
valued mappings ft , gt on (0, ∞) we write ft = o(gt ) if limt→∞ ft /gt = 0, that is, gt
dominates ft .
∑_{k=1}^{∞} ak = ∑_k ak = a1 + a2 + · · · .
Some care is needed in defining a sum of an infinite collection of numbers. First, define
partial sums
Sn = ∑_{k=1}^{n} ak = a1 + a2 + · · · + an ,  n ≥ 1.
It is important to establish whether or not the value of the series depends on the
order of the sequence. Precisely, suppose σ : N → N is a bijective mapping (essentially,
an infinite permutation). If the series ∑_k ak exists, we would like to know if

∑_k ak = ∑_k aσ(k) .    (2.1)
Since these two quantities are limits of distinct partial sums, equality need not hold.
This question has a quite definite resolution. A series ∑_k ak is called absolutely convergent
if ∑_k |ak | is convergent (so that all convergent series of nonnegative sequences are
absolutely convergent). A convergent series is unconditionally convergent if (2.1)
holds for all permutations σ. It may be shown that a series is absolutely convergent
if and only if it is unconditionally convergent. Therefore, a convergent series may be
defined as conditionally convergent if either it is not absolutely convergent, or if (2.1)
does not hold for at least one σ. Interestingly, by the Riemann series theorem, if ∑_k ak
is conditionally convergent then for any L ∈ R̄ there exists a permutation σL for which
∑_k aσL (k) = L.
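The Riemann series theorem can be illustrated numerically (a sketch with illustrative names and tolerances): greedily rearranging the conditionally convergent alternating harmonic series 1 − 1/2 + 1/3 − · · · drives its partial sums toward any chosen target L.

```python
# Greedy rearrangement of the alternating harmonic series toward a target L:
# take unused positive terms while below L, unused negative terms while above.

def rearrange_to(L, n_terms=100000):
    pos = (1.0 / k for k in range(1, 10**7, 2))    # 1, 1/3, 1/5, ...
    neg = (-1.0 / k for k in range(2, 10**7, 2))   # -1/2, -1/4, ...
    s = 0.0
    for _ in range(n_terms):
        s += next(pos) if s <= L else next(neg)
    return s

print(rearrange_to(3.0))   # partial sums approach 3.0
print(rearrange_to(-1.0))  # partial sums approach -1.0
```

Once the running sum first crosses L, it remains within the magnitude of the largest remaining term of either sign, which tends to zero, so any target is attainable.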
There exist many well known tests for series convergence, which can be found in
most calculus textbooks.
Let E = {at ; t ∈ T } be an infinitely countable indexed set of extended real numbers.
For example, we may have T = N^d . When there is no ambiguity, we can take ∑_t at
to be the sum of all elements of E. Of course, in this case the implication is that the
sum does not depend on the summation order. This is the case if and only if there
is a bijective mapping σ : N → T for which ∑_k aσ(k) is absolutely convergent. If this
holds, it holds for all such bijective mappings. All that is needed is to verify that the
cumulative sum of the elements |at |, taken in any order, remains bounded. This is
written, when possible,

∑_{t∈T} at = ∑_t at .
We also define for a sequence {ak } the product ∏_{k=1}^{∞} ak . We will usually be interested
in products of positive sequences, so this may be converted to a series by the log
transformation:

log ∏_{k=1}^{∞} ak = ∑_{k=1}^{∞} log(ak ).
Two geometric series identities will prove useful:

∑_{i=0}^{∞} ((i + m)!/i!) r^i = m!/(1 − r)^{m+1}  for r² < 1, m = 0, 1, 2, . . . ,    (2.2)

∑_{i=0}^{n} r^i = (1 − r^{n+1})/(1 − r)  for r ≠ 1.    (2.3)
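Both geometric series identities, ∑_{i≥0} ((i + m)!/i!) r^i = m!/(1 − r)^{m+1} and the finite sum (1 − r^{n+1})/(1 − r), can be checked numerically (an illustrative sketch; the sample values r = 0.5, m = 3, n = 10 are chosen here):

```python
# Numerical check of the two geometric series identities.
from math import factorial, isclose

def lhs_infinite(r, m, n_terms=200):
    """Truncated series sum_{i=0}^{n_terms-1} ((i+m)!/i!) r^i."""
    return sum(factorial(i + m) // factorial(i) * r**i for i in range(n_terms))

r, m = 0.5, 3
# Infinite-series identity: limit equals m!/(1-r)^{m+1} = 6/0.0625 = 96.
assert isclose(lhs_infinite(r, m), factorial(m) / (1 - r)**(m + 1), rel_tol=1e-9)

# Finite geometric sum identity (2.3).
n = 10
assert isclose(sum(r**i for i in range(n + 1)), (1 - r**(n + 1)) / (1 - r))
print("identities verified for r=0.5, m=3, n=10")
```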
2.1.11 Graphs
A graph is a collection of nodes and edges. Most commonly, there are m nodes uniquely
labeled by elements of set V = {1, . . . , m}. We may identify the set of nodes as V
(although sometimes unlabeled graphs are studied). An edge is a connection between
two nodes, of which there are two types. A directed edge is any ordered pair from V,
and an undirected edge is any unordered pair from V. Possibly, the two nodes defining
an edge are the same, which yields a self edge. If E is any set of edges, then G = (V, E)
defines a graph. If all edges are directed (undirected), the graph is described as directed
(undirected), but a graph may contain both types.
It is natural to imagine a dynamic process on a graph defined by node occupancy.
A directed edge (v1 , v2 ) denotes the possibility of a transition from v1 to v2 . Accordingly,
a path within a directed graph G = (V, E) is any sequence of nodes v0 , v1 , . . . , vn for
which (vi−1 , vi ) ∈ E for 1 ≤ i ≤ n. This describes a path from v0 to vn of length n (the
number of edges needed to construct the path).
It will be instructive to borrow some of the terminology associated with the theory
of Markov chains (Section 5.2). For example, if there exists a path starting at i and
ending at j we say that j is accessible from i, which is written i → j. If i → j and j → i
then i and j communicate, which is written i ↔ j. The connectivity properties of a
directed graph are concerned with statements of this kind, as well as lengths of the
relevant paths.
The adjacency matrix adj(G) of graph G is an m × m 0-1 matrix with element
gi,j = 1 if and only if the graph contains directed edge (i, j). The path properties of G
can be deduced directly from the iterates adj(G)n (conventions for matrices are given
in Section 2.3.1).
Theorem 2.1 For any directed graph G with adjacency matrix AG = adj(G) there
exists a path of length n from node i to node j if and only if element i, j of AnG is
positive.
Proof Let g[k]i,j be element i, j of A_G^k . All such elements are nonnegative. Suppose,
as an induction hypothesis, the theorem holds for all paths of length n′, for any n′ < n.
We may write

g[n]i,j = ∑_{k=1}^{m} g[n′]i,k g[n − n′]k,j ,

from which we conclude that g[n]i,j > 0 if and only if for some k we have g[n′]i,k > 0
and g[n − n′]k,j > 0. Under the induction hypothesis, the latter statement is equivalent
to the claim that for any n′ < n there is a node k for which there exists a path of length
n′ from i to k and a path of length n − n′ from k to j. In turn, this claim is equivalent
to the claim that there exists a path of length n from i to j. The induction hypothesis
clearly holds for n = 1, which completes the proof. ///
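Theorem 2.1 can be illustrated on a small directed graph (an illustrative sketch; the 3-cycle and node labels 0, 1, 2 are chosen here for the example):

```python
# Positive entries of A_G^n mark node pairs joined by a path of length exactly n.

def mat_mult(A, B):
    m = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(m)]
            for i in range(m)]

# Directed 3-cycle on nodes {0, 1, 2}: edges 0->1, 1->2, 2->0.
A = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]

A2 = mat_mult(A, A)
A3 = mat_mult(A2, A)

# Length-2 paths: 0->2, 1->0, 2->1; every length-3 path returns to its start,
# so A^3 is the identity matrix.
print(A2)
print(A3)
```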
graph. It is especially important to note that in Theorem 2.1 we can, without loss of
generality, replace the ‘1’ elements in AG with any positive number. Accordingly, we
give an alternative version of Theorem 2.2 for nonnegative matrices.
The implications of this type of path structure are discussed further in Sections
2.3.4 and 5.2.
The Binomial Theorem states that for a, b ∈ R and n ∈ N the following equality
holds
(a + b)^n = ∑_{i=0}^{n} C(n, i) a^i b^{n−i} ,    (2.4)

where C(n, i) = n!/(i! (n − i)!) is the binomial coefficient.
Stirling's approximation to the factorial is sn = (2πn)^{1/2} (n/e)^n . The approximation is quite sharp, guaranteeing that (a) limn→∞ n!/sn = 1; (b) 1 <
n!/sn < e^{1/12} < 1.087 for all n ≥ 1; (c) (12n + 1)^{−1} < log(n!) − log(sn ) < (12n)^{−1} for all
n ≥ 1.
Theorem 2.3 Suppose f is n + 1 times differentiable on [a, b], with f ∈ C^n ([a, b]), and x0 ∈
[a, b]. Then for each x ∈ [a, b] there exists η(x), satisfying min(x, x0 ) ≤ η(x) ≤ max(x, x0 ),
for which

Rn (x; x0 ) = (f^{(n+1)} (η(x))/(n + 1)!) (x − x0 )^{n+1} .  (Lagrange form)  (2.8)
The Lagrange form of the remainder term is the one commonly intended, and
we adopt that convention here, although it is worth noting that alternative forms are
also used.
‖x‖∞ = maxi |xi |

when p = ∞.
Theorem 2.4 Suppose for positive numbers ã = (a1 , . . . , an ) and real number p ∈
(−∞, 0) ∪ (0, ∞) we define the power mean Mp [ã] = (n^{−1} ∑_{i=1}^{n} ai^p )^{1/p} . Then

lim_{p→0} Mp [ã] = (∏_{i=1}^{n} ai )^{1/n} = M0 [ã] ,    (2.10)
which justifies the conventional definitions of M−∞ [ã] , M0 [ã] and M∞ [ã]. In addition,
−∞ ≤ p < q ≤ ∞ implies Mp [ã] ≤ Mq [ã], with equality if and only if all elements of ã
are equal.
Proof By l'Hôpital's Rule,

lim_{p→0} log(Mp [ã]) = lim_{p→0} (n^{−1} ∑_{i=1}^{n} log(ai ) ai^p )/(n^{−1} ∑_{i=1}^{n} ai^p ) = n^{−1} ∑_{i=1}^{n} log(ai ) = log(M0 [ã] ).
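Both conclusions of Theorem 2.4 are easy to check numerically (an illustrative sketch; the sample vector ã = (1, 2, 4) is chosen here):

```python
# The power mean M_p is nondecreasing in p, and M_p -> M_0 (the geometric
# mean) as p -> 0.
from math import prod, isclose

def power_mean(a, p):
    n = len(a)
    if p == 0:
        return prod(a) ** (1.0 / n)          # geometric mean M_0
    return (sum(x**p for x in a) / n) ** (1.0 / p)

a = [1.0, 2.0, 4.0]
# Monotonicity: M_{-1} <= M_0 <= M_1 <= M_2 (harmonic <= geometric <=
# arithmetic <= quadratic), with equality only for constant a.
means = [power_mean(a, p) for p in (-1, 0, 1, 2)]
assert all(m1 <= m2 for m1, m2 in zip(means, means[1:]))

# The limit (2.10): M_p for small p is close to M_0 = (1*2*4)^{1/3} = 2.
assert isclose(power_mean(a, 1e-8), power_mean(a, 0), rel_tol=1e-6)
print(means)
```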
The notion of equivalence relationships and classes will play an important role in our
analysis. Suppose X is a set of objects, and ∼ defines a binary relation between two
objects x, y ∈ X .
Definition 2.1 A binary relation ∼ on a set X is an equivalence relation if it satisfies
the following three properties for any x, y, z ∈ X :
Reflexivity x ∼ x.
Symmetry If x ∼ y then y ∼ x.
Transitivity If x ∼ y and y ∼ z then x ∼ z.
Formal definitions of both a field and a vector space are given in Section 6.3. For the
moment we simply note that the notion of real numbers can be generalized to that
of a field K, which is a set of scalars that is closed under the rules of addition and
multiplication comparable to those available for R. Both R and C are fields.
A vector space V ⊂ Kn is any set of vectors x ∈ Kn which is closed under linear and
scalar composition, that is, if x, y ∈ V then ax + by ∈ V for all scalars a, b. This means
the zero vector 0 must be in V, and that x ∈ V implies −x ∈ V.
Elements x1 , . . . , xm of K^n are linearly independent if ∑_{i=1}^{m} ai xi = 0 implies ai = 0
for all i. Equivalently, no xi is a linear combination of the remaining vectors. The span
of a set of vectors x̃ = (x1 , . . . , xm ), denoted span(x̃), is the set of all linear combinations
of vectors in x̃, which must be a vector space. Suppose the vectors in x̃ are not
linearly independent. This means that, say, xm is a linear combination of the remaining
vectors, and so any linear combination in span(x̃) including xm may be replaced with
one including only the remaining vectors, so that span(x̃) = span(x1 , . . . , xm−1 ).
dimension of a vector space V is the minimum number of vectors whose span equals
V. Clearly, this equals the number in any set of linearly independent vectors which
span V. Any such set of vectors forms a basis for V. Any vector space has a basis.
2.3.1 Matrices
Let Mm,n (K) be the set of m × n matrices A, for which Ai,j ∈ K (or, when required for
clarity, [A]i,j ∈ K) is the element of the ith row and jth column. When the field need not
be given, we will write Mm,n = Mm,n (K). We will generally be interested in Mm,n (C),
noting that the real matrices Mm,n (R) ⊂ Mm,n (C) can be considered a special case of
complex matrices, so that any resulting theory holds for both types. This is important
to note, since even when interest is confined to real valued matrices, complex numbers
enter the analysis in a natural way, so it is ultimately necessary to consider complex
vectors and matrices. Definitions associated with real matrices (transpose, symmetric,
and so on) have analogous definitions for complex matrices, which reduce to the more
familiar definitions when the matrix is real.
The square matrices are denoted as Mm = Mm,m . Elements of Mm,1 are column
vectors and elements of M1,m are row vectors. A matrix in Mm,n is equivalently an
ordered set of m row vectors or n column vectors. The transpose A^T ∈ Mn,m of a matrix
A ∈ Mm,n has elements [A^T ]j,i = Ai,j . For A ∈ Mn,k , B ∈ Mk,m we always understand matrix
multiplication to mean that C = AB ∈ Mn,m possesses elements Ci,j = ∑_{k′=1}^{k} Ai,k′ Bk′,j , so
that matrix multiplication is generally not commutative. Then (A^T )^T = A and (AB)^T =
B^T A^T where the product is permitted.
In the context of matrix algebra, a vector x ∈ K^n is usually assumed to be a
column vector in Mn,1 . Therefore, if A ∈ Mm,n then the expression Ax is understood to
be evaluated by matrix multiplication. Similarly, if x ∈ Km we may use the expression
xT A, understanding that x ∈ Mm,1 .
When A ∈ Mm,n (C), the conjugate matrix is written Ā, and is the component-wise
conjugate of A. Conjugation respects matrix products, in the sense that Ā B̄ is the conjugate of AB. The conjugate transpose (or Hermitian
adjoint) of A is A∗ = ĀT . As with the transpose operation, (A∗ )∗ = A and (AB)∗ = B∗ A∗
where the product is permitted. This generally holds for arbitrary products, that is
(ABC)∗ = (BC)∗ A∗ = C ∗ B∗ A∗ , and so on. For A ∈ Mm,n (R), we have A = Ā and A∗ =
AT , so the conjugate transpose may be used in place of the transpose operation when
matrices are real valued. We always may write (A + B)∗ = A∗ + B∗ and (A + B)T =
AT + BT where dimensions permit.
A matrix A ∈ Mn (C) is diagonal if the only nonzero elements are on the diag-
onal, and can therefore be referred to by the diagonal elements diag(a1 , . . . , an ) =
diag(A1,1 , . . . , An,n ). A diagonal matrix is positive diagonal or nonnegative diagonal if
all diagonal elements are positive or nonnegative, respectively.
The identity matrix I ∈ Mm is the matrix uniquely possessing the property that
A = IA = AI for all A ∈ Mm . For Mm (C), I is diagonal, with diagonal entries equal to 1.
For any matrix A ∈ Mm there exists at most one matrix A−1 ∈ Mm for which AA−1 = I,
referred to as the inverse of A. An inverse need not exist (for example, if the elements
of A are constant).
The inner product (or scalar product) of two vectors x, y ∈ C^n is defined as ⟨x, y⟩ =
y∗ x (a more general definition of the inner product is given in Definition 6.13). For any
x ∈ C^n we have ⟨x, x⟩ = ∑_i x̄i xi = ∑_i |xi |² , so that ⟨x, x⟩ is a nonnegative real number,
and ⟨x, x⟩ = 0 if and only if x = 0. The magnitude, or norm, of a vector may be taken
as ‖x‖ = ⟨x, x⟩^{1/2} (a formal definition of a norm is given in Definition 6.6).
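The convention ⟨x, y⟩ = y∗x, and the resulting norm, can be sketched with NumPy (illustrative values; note that `np.vdot` conjugates its first argument, so ⟨x, y⟩ corresponds to `np.vdot(y, x)`):

```python
# Inner product <x, y> = y* x on C^2, and the induced norm ||x|| = <x, x>^{1/2}.
import numpy as np

x = np.array([1 + 1j, 2 - 1j])
y = np.array([0 + 1j, 1 + 0j])

inner = np.vdot(y, x)          # <x, y> = y* x = sum_i conj(y_i) x_i
self_inner = np.vdot(x, x)     # <x, x> = sum_i |x_i|^2, a nonnegative real

norm = np.sqrt(self_inner.real)  # ||x|| = <x, x>^{1/2}
print(inner, self_inner, norm)
```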
The determinant of A ∈ Mm (C) may be evaluated by the cofactor expansion
det(A) = ∑_{i=1}^{m} (−1)^{i+j} Ai,j det(A(i,j) ) for any fixed column j, or det(A) =
∑_{j=1}^{m} (−1)^{i+j} Ai,j det(A(i,j) ) for any fixed row i, where A(i,j) ∈ Mm−1 (C) is the matrix
obtained by deleting the ith row and jth column of A. Note that in the respective
expressions any j or i may be chosen, yielding the same number, although the choice
may have implications for computational efficiency. As is well known, for A ∈ M1 (C)
we have det(A) = A1,1 and for A ∈ M2 (C) we have det(A) = A1,1 A2,2 − A1,2 A2,1 . In
general, det(A^T ) = det(A), det(A∗ ) is the complex conjugate of det(A), det(AB) =
det(A) det(B), and det(I) = 1, which implies det(A−1 ) = det(A)^{−1} when the inverse
exists.
A large class of algorithms is associated with the problem of determining a solution
x ∈ K^m to the linear system of equations Ax = b for some fixed A ∈ Mm and b ∈ K^m .
Theorem 2.5 The following statements are equivalent for A ∈ Mm (C); a matrix
satisfying any one of them is referred to as nonsingular, and any other matrix in Mm (C)
is singular:
Consider the equation

Ax = λx,    (2.13)
If the pair (λ, x) is a solution to this equation for which x ≠ 0, then λ is an eigenvalue
of A and x is an associated eigenvector of λ. Any such solution (λ, x) may be called an
eigenpair. Clearly, if x is an eigenvector, so is any nonzero scalar multiple. Let Rλ be
the set of all eigenvectors x associated with λ. If x, y ∈ Rλ then ax + by ∈ Rλ , so that Rλ
is a vector space. The dimension of Rλ is known as the geometric multiplicity of λ. We
may refer to Rλ as an eigenspace (or eigenmanifold). In general, the spectral properties
of a matrix are those pertaining to the set of eigenvalues and eigenvectors.
If A ∈ Mn (R), and λ is an eigenvalue, then so is λ̄, with associated eigenvectors
Rλ̄ = R̄λ . Thus, in this case eigenvalues and eigenvectors occur in conjugate pairs.
Similarly, if λ is real there exists a real associated eigenvector.
The eigenvalue equation may be written (A − λI)x = 0. However, by Theorem 2.5
this has a nonzero solution if and only if A − λI is singular, which occurs if and only if
pA (λ) = det(A − λI) = 0. By construction of a determinant, pA (λ) is an order n polyno-
mial in λ, known as the characteristic polynomial of A. The set of all eigenvalues of A
is equivalent to the set of solutions to the characteristic equation pA (λ) = 0 (including
complex roots). The multiplicity of an eigenvalue λ as a root of pA (λ) is referred to as its
algebraic multiplicity. A simple eigenvalue has algebraic multiplicity 1. The geometric
multiplicity of an eigenvalue can be less, but never more, than the algebraic multiplic-
ity. A matrix with equal algebraic and geometric multiplicities for each eigenvalue is a
nondefective matrix, and is otherwise a defective matrix.
We therefore denote the set of all eigenvalues as σ(A). An important fact is that
σ(Ak ) consists exactly of the eigenvalues σ(A) raised to the kth power, since if (λ, x)
solves Ax = λx, then A2 x = Aλx = λAx = λ2 x, and so on. A quantity of particular
importance is the spectral radius ρ(A) = max{|λ| | λ ∈ σ(A)}. There is sometimes interest
in ordering the eigenvalues by magnitude. If there exists an eigenvalue λ1 = ρ(A), this
is sometimes referred to as the principal eigenvalue, and any associated eigenvector is
a principal eigenvector.
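Two of the facts above, that σ(A^k) consists of the kth powers of σ(A) and that ρ(A) is the largest eigenvalue magnitude, are easily checked numerically (an illustrative sketch; the triangular matrix below is chosen so its eigenvalues can be read off the diagonal):

```python
# Eigenvalues of A^2 are the squares of the eigenvalues of A, and the
# spectral radius rho(A) is the largest eigenvalue magnitude.
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 0.5]])   # triangular: eigenvalues are 2 and 0.5

eigs = np.linalg.eigvals(A)
eigs_sq = np.linalg.eigvals(A @ A)   # should be {4, 0.25}

rho = max(abs(l) for l in eigs)      # spectral radius = 2
print(sorted(eigs.real), sorted(eigs_sq.real), rho)
```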
In addition we have the following theorem:
Theorem 2.6 Suppose A, B ∈ Mn , and |A| ≤ B, where |A| is the element-wise absolute
value of A. Then ρ(A) ≤ ρ(|A|) ≤ ρ(B).
In addition, if all elements of A ∈ Mn (R) are nonnegative, then ρ(A′ ) ≤ ρ(A) for
any principal submatrix A′ .
Proof See Theorem 8.1.18 of Horn and Johnson (1985). ///
AV = VΛ.    (2.14)

A = VΛV^{−1} ,    (2.15)
x∗ A = λx∗ , (2.16)
(note that some conventions do not explicitly refer to complex conjugates x∗ in (2.16)).
This similarly leads to the equation x∗ (A − λI) = 0, which by an argument identical to
that used for right eigenvectors, has nonzero solutions if and only if pA (λ) = 0, giving
the same set of eigenvalues as those defined by (2.13). There is therefore no need to
distinguish between ‘right’ and ‘left’ eigenvalues. Then, fixing eigenvalue λ, we may
refer to the left eigenspace Lλ as the set of solutions x to (2.16) (in which case, Rλ now
becomes the right eigenspace of λ).
The essential relationship between the eigenspaces is summarized in the following
theorem:
Proof Proofs may be found in, for example, Chapter 1 of Horn and Johnson
(1985). ///
V^{−1} A = ΛV^{−1} .
Just as the column vectors of V are right eigenvectors, we can set U∗ = V^{−1} , in which
case the ith column vector υi of U is a solution x to the left eigenvector equation (2.16)
corresponding to eigenvalue λi (the ith element on the diagonal of Λ). This gives the
diagonalization

A = VΛU∗ .
A^m = VΛ^m U∗ = ∑_{i=1}^{n} λi^m νi υi∗ .    (2.17)
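The decomposition above can be verified numerically for a small diagonalizable matrix (an illustrative sketch; the matrix and power m = 5 are chosen here, and `U_star` plays the role of U∗ = V^{−1}):

```python
# Spectral decomposition check: A^m = V Lambda^m U*, with U* = V^{-1}.
import numpy as np

A = np.array([[0.6, 0.3],
              [0.4, 0.7]])

lam, V = np.linalg.eig(A)        # columns of V are right eigenvectors
U_star = np.linalg.inv(V)        # rows of U* = V^{-1} give left eigenvectors

m = 5
Am = V @ np.diag(lam**m) @ U_star   # reconstruct A^m from the decomposition
assert np.allclose(Am, np.linalg.matrix_power(A, m))
print(Am)
```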
The apparent recipe for a spectral decomposition is to first determine the roots
of the characteristic polynomial, and then to solve each resulting eigenvalue equa-
tion (2.13) after substituting an eigenvalue. This seemingly straightforward procedure
proves to be of little practical use in all but the simplest cases, and spectral decompo-
sitions are often difficult to construct using any method. However, a complete spectral
decomposition need not be the objective. First, it may not even exist for many other-
wise interesting models. Second, there are many important problems related to A
that can be solved using spectral theory, but without the need for a complete spectral
decomposition. For example:
Basic spectral theory relies on the identification of special matrix forms which
impose specific properties on the spectrum. We next discuss two cases.
Theorem 2.9 A matrix A ∈ Mn (C) is Hermitian if and only if there exists a unitary
matrix U and real diagonal matrix Λ for which A = UΛU∗ .
A matrix A ∈ Mn (R) is symmetric if and only if there exists a real orthogonal Q
and real diagonal matrix Λ for which A = QΛQ^T .
Clearly, the matrices Λ and U may be identified with the eigenvalues and eigenvectors
of A, with the n eigenvalue equation solutions given by the respective columns of
AU = UΛU∗ U = UΛ. An important implication of this is that all eigenvalues of a
Hermitian matrix are real, and eigenvectors may be selected to be orthonormal.
If we interpret x ∈ C^n as a column vector x ∈ Mn,1 we have the quadratic form x∗ Ax,
which is interpretable either as a 1 × 1 complex matrix, or as a scalar in C, as is
convenient.
If A is Hermitian, then (x∗ Ax)∗ = x∗ A∗ x = x∗ Ax. This means if z = x∗ Ax ∈ C, then
z = z̄, equivalently x∗ Ax ∈ R. A Hermitian matrix A is positive definite if and only
if x∗ Ax > 0 for all x ≠ 0. If instead x∗ Ax ≥ 0 then A is positive semidefinite. A non-
symmetric matrix satisfying x^T Ax > 0 can be replaced by A′ = (A + A^T )/2, which is
symmetric, and also satisfies x^T A′ x > 0.
If A is positive definite then λmin > 0. In addition, since the eigenvalues of A2 are the
squares of the eigenvalues of A, and since for a Hermitian matrix A∗ = A, we may also
conclude
Theorem 2.11 If A ∈ Mn (R) is irreducible, then each column and row must contain
at least one nonzero nondiagonal element.
Proof Suppose all nondiagonal elements of row i of matrix A ∈ Mn (R) are 0. After
relabeling i as n, there exists a 1 × (n − 1) block of 0’s conforming to (2.18). Similarly,
if all nondiagonal elements of column j are 0, relabeling j as 1 yields a similar block
of 0’s. ///
Theorem 2.12 For a nonnegative matrix A ∈ Mn (R) the following statements are
equivalent:
(i) A is irreducible;
(ii) the matrix (I + A)^{n−1} is positive;
(iii) for each pair i, j there exists k for which [A^k ]i,j > 0.
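Condition (ii) gives a direct computational test for irreducibility (an illustrative sketch; the function name and the two example matrices are chosen here):

```python
# Irreducibility test from Theorem 2.12(ii): a nonnegative A in M_n(R) is
# irreducible iff (I + A)^{n-1} is entrywise positive.
import numpy as np

def is_irreducible(A):
    n = A.shape[0]
    M = np.linalg.matrix_power(np.eye(n) + A, n - 1)
    return bool((M > 0).all())

cycle = np.array([[0., 1., 0.],
                  [0., 0., 1.],
                  [1., 0., 0.]])          # 3-cycle: every pair communicates
blocked = np.array([[1., 1.],
                    [0., 1.]])            # no path from node 2 to node 1

print(is_irreducible(cycle), is_irreducible(blocked))
```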
Assuming the decomposition (2.17), we may write

(ρ(A)^{−1} A)^m = ∑_{i=1}^{n} (ρ(A)^{−1} λi )^m νi υi∗ .
To fix ideas, suppose A is primitive. By Theorem 2.13 there exists a unique principal
eigenvalue, say λ1 = ρ(A), and any other eigenvalue satisfies |λj | < ρ(A). Then
(ρ(A)^{−1} A)^m = ν1 υ1∗ + O(m^{m2 −1} (ρ(A)^{−1} |λSLEM |)^m ),    (2.19)
where λSLEM is the second largest eigenvalue in magnitude and m2 is the algebraic
multiplicity of λSLEM , that is, any eigenvalue other than λ1 (not necessarily unique)
maximizing |λj |. Since |λSLEM | < ρ(A) we have the limit

lim_{m→∞} (ρ(A)^{−1} A)^m = ν1 υ1∗ ,    (2.20)
where ν1 , υ1 are the principal right and left eigenvectors, with convergence at a geometric
rate O((ρ(A)^{−1} |λSLEM |)^m ). For this reason, the quantity |λSLEM | is often of
considerable interest. Note that in this representation, the normalization ⟨νi , υi ⟩ = 1 is
implicit.
However, existence of the limit (2.20) for primitive matrices does not depend on
the diagonalizability of A, and is a direct consequence of Theorem 2.13. When A is
irreducible, the limit (2.20) need not exist, but a weaker statement involving asymptotic
averages will hold. These conclusions are summarized in the following theorem:
Theorem 2.14 Suppose the nonnegative matrix A ∈ Mn (R) is irreducible. Let ν1 , υ1 be
the principal right and left eigenvectors, normalized so that ⟨ν1 , υ1 ⟩ = 1. Then

lim_{N→∞} N^{−1} ∑_{m=1}^{N} (ρ(A)^{−1} A)^m = ν1 υ1∗ .    (2.21)
A version of (2.21) is available for nonnegative matrices which are not necessarily
irreducible, but which satisfy certain other regularity conditions (Theorem 8.6.2, Horn
and Johnson (1985)).
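To see why irreducibility alone yields only the averaged limit (2.21), consider the 2×2 permutation matrix (our own minimal example): it is irreducible but not primitive, its powers oscillate between I and A, yet the Cesàro averages converge to ν1υ1^*.

```python
import numpy as np

A = np.array([[0., 1.],
              [1., 0.]])   # irreducible, not primitive; rho(A) = 1, eigenvalues +1 and -1

# Powers oscillate: A^m equals A for odd m and I for even m, so (2.20) fails ...
print(np.linalg.matrix_power(A, 7))

# ... but the Cesaro average in (2.21) converges to nu_1 ups_1^* = [[0.5, 0.5], [0.5, 0.5]].
N = 1000
avg = sum(np.linalg.matrix_power(A, m) for m in range(1, N + 1)) / N
print(avg)
```

With ν1 = υ1 = (1, 1)^T/√2 and ⟨ν1, υ1⟩ = 1, the predicted limit ν1υ1^* has every entry equal to 1/2, which the average attains exactly for even N.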
Thus, for a primitive matrix A all pairs of nodes in G(A) communicate, and in addition there exists k′ such that for any ordered pair of nodes i, j there exists a path from i to j of any length k ≥ k′.
Any irreducible matrix with positive diagonal elements is also primitive:
Theorem 2.16 If A ∈ Mn (R) is an irreducible matrix with positive diagonal elements, then A is also a primitive matrix.
Proof Let i, j be any ordered pair of nodes in G(A). There exists at least one path from i to j. Suppose one of these paths has length k. Since, by hypothesis, Aj,j > 0, the edge (j, j) is included in G(A), and can be appended to any path ending at j. This means there also exists a path of length k + 1 from i to j. The proof is completed by noting that there must be some finite k′ such that any two nodes may be joined by a path of length no greater than k′; since such a path may be extended one step at a time, any two nodes are in fact joined by a path of length exactly k′, in which case A^{k′} > 0. ///
A matrix can be irreducible but not primitive. For example, if the nodes of G(A)
can be partitioned into subsets V1 , V2 such that all edges (i, j) are formed by nodes
from distinct subsets, then A cannot be primitive. To see this, suppose i, j ∈ V1 . Then
any path from i to j must be of even length, so that the conclusion of Theorem 2.15
cannot hold. However, if G(A) includes all edges not ruled out by this restriction, it is
easily seen that A is irreducible.
Measure theory provides a rigorous mathematical foundation for the study of, among
other things, integration and probability theory. The study of stochastic processes,
and of related control problems, can proceed some distance without reference to mea-
sure theoretic ideas. However, certain issues cannot be resolved fully without it, for
example, the very existence of an optimal control in general models. In addition, if we
wish to develop models which do not assume that all random quantities are stochastically independent, which we sooner or later must, the theory of martingale processes becomes indispensable, and an understanding of it is greatly aided by familiarity with measure theoretic ideas. Above all, foundational ideas of measure theory will be required for the functional analytic construction of iterative algorithms.
The sets O ∈ O are called open sets, and any complement of an open set is a closed set. Open sets need not conform to the common understanding of the term, since the power set P(Ω) (that is, the set of all possible subsets of Ω) satisfies the definition of a topological space. However, the class of open sets in (−∞, ∞) as usually understood does satisfy the definition of a topological space, so the term ‘open’ is a useful analogy.
A certain flexibility of notation is possible. We may explicitly write the topological space as (Ω, O). When it is not necessary to refer to specific properties of the topology O, we can simply refer to Ω alone as a topological space. In this case an open set O ⊂ Ω is understood to be an element of some topology O on Ω.
Topological spaces allow a definition of convergence and continuity:
Theorem 3.1 A class of subsets G of Ω is a base for some topology if and only if the following two conditions hold: (i) every point x ∈ Ω is in at least one G ∈ G; (ii) if x ∈ G1 ∩ G2 for G1, G2 ∈ G then there exists G3 ∈ G for which x ∈ G3 ⊂ G1 ∩ G2.
The proof of Theorem 3.1 can be found in, for example, Kolmogorov and Fomin
(1970) (Chapter 3 of this reference can be recommended for this topic).
Definition 3.4 A sequence {xn} in a metric space (X, d) is a Cauchy sequence if for any ε > 0 there exists N such that d(xn, xm) < ε for all n, m ≥ N. A metric space is complete if all Cauchy sequences converge to a limit in X.
Any metric space can be completed by extending X to include all limits of Cauchy sequences (see Royden (1968), Section 5.4).
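As a concrete illustration of Definition 3.4 (ours, not from the text): the Newton iterates for √2, computed exactly over the rationals, form a Cauchy sequence in the incomplete metric space (Q, |·|) whose limit lies outside the space.

```python
from fractions import Fraction

# Newton iteration x_{n+1} = (x_n + 2/x_n) / 2, carried out in exact rational
# arithmetic so every term of the sequence provably lies in Q.
x = Fraction(2)
seq = [x]
for _ in range(6):
    x = (x + 2 / x) / 2
    seq.append(x)

# Successive distances d(x_n, x_{n+1}) shrink rapidly (Cauchy behaviour) ...
gaps = [abs(float(seq[i + 1] - seq[i])) for i in range(len(seq) - 1)]
print(gaps)

# ... yet the limit, sqrt(2) = 1.41421..., is irrational, so it is not in Q.
print(float(seq[-1]))
```

Completing (Q, |·|) by adjoining all such Cauchy limits is one standard construction of R.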
Definition 3.5 Given metric space (X, d), we say x ∈ X is a point of closure of E ⊂ X if it is the limit of a sequence contained entirely in E. In addition, the closure Ē of E is the set of all points of closure of E. We say A is a dense subset of B if A ⊂ B and Ā = B.
Theorem 3.2 The class of all open balls of a metric space (X, d) is the base of a
topology.
Proof We make use of Theorem 3.1. We always have x ∈ Bδ(x), so condition (i) holds. Next, suppose x ∈ Bδ1(y1) ∩ Bδ2(y2). Then for some ε > 0 we have d(x, y1) < δ1 − ε and d(x, y2) < δ2 − ε. Then by the triangle inequality x ∈ Bε(x) ⊂ Bδ1(y1) ∩ Bδ2(y2), which completes the proof. ///
A topology on a metric space generated by the open balls is referred to as the metric topology, which always exists by Theorem 3.2. For this reason, every metric space can be regarded as a topological space. We adopt this convention, with the understanding that the topology being referred to is the metric topology. We then say a topological space (Ω, O) is metrizable (completely metrizable) if it is homeomorphic to a metric space (complete metric space), in which case there exists a metric which induces the topology O. This generalizes the notion of a metric space. Homeomorphism defines an equivalence relation, and metrics are equivalent if they induce the same topology.
Additional concepts of continuity exist for mappings f : X → Y between metric
spaces (X, dx) and (Y, dy). We say f is uniformly continuous if for every ε > 0 there exists δ > 0 such that dx(x1, x2) < δ implies dy(f(x1), f(x2)) < ε. A family of functions F mapping X to Y is equicontinuous at x0 ∈ X if for every ε > 0 there exists δ > 0 such that for any x ∈ X satisfying dx(x0, x) < δ we have sup_{f∈F} dy(f(x0), f(x)) < ε. We say F is equicontinuous if it is equicontinuous at all x0 ∈ X.
Theorem 3.3 (Heine-Borel Theorem) In the metric topology of Rm a set S is
compact if and only if it is closed and bounded.
exists. Then P∗ defines a randomly chosen integer X about which we can say
P(X is divisible by 7) = 1/7 or P(X is a square number) = 0. But we are also assuming that each integer i has equal probability pi = α. If we extend P in the way we proposed, we would end up with P(Ω) equalling 0 or ∞, whereas the probability that the outcome is in Ω can only be 1. Similarly, it is possible to partition the unit interval into a countable collection E of uncountable sets, each of which is a translation (modulo 1) of a single member. Therefore, if we attempt to impose a uniform probability on the unit interval, we would require that each E ∈ E have the same probability, and we would similarly be forced to conclude that P(Ω) equals 0 or ∞.
Both of these examples are the same in the sense that some principle of uniformity forces us to assign a common probability to an infinite number of disjoint outcomes.
As we will next show, the solution to these problems differs somewhat for countable and uncountable Ω. For countable Ω, the object will be to extend P fully to P(Ω), and the method for doing so will explicitly rule out examples such as the randomly chosen integer, by insisting at the start that Σ_{i∈Ω} pi = 1. It could be, and has been (Dubins and Savage (1976)), argued that this type of restriction (formally known as countable additivity, see below) is not really needed. It essentially forces P to be continuous in some sense, which might not be an essential requirement for a given application. We could have a perfectly satisfactory definition of a randomly chosen positive integer by restricting our definition to a subset of P(Ω), as we have done. In fact, this is precisely how we deal with uncountable Ω, by first devising a rule for calculating P(E) for intervals E, then extending P to sets which may be constructed from a countable number of set operations on the intervals, better known as the Borel sets (see below for a formal definition). The final step adds all subsets of all Borel sets of probability zero. This class of sets is considerably smaller than P(Ω) for uncountable Ω, which means that a probability set function is really no more complex an object than a function on Ω.
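For countable Ω this extension is entirely concrete. The sketch below (the function `P` and the geometric choice pi = 2^{−i} on Ω = {1, 2, . . .} are our own hypothetical illustration, truncated at a large index for the numerical check) defines P(E) = Σ_{i∈E} pi and recovers P(Ω) = 1:

```python
def P(E, n_max=200):
    """P(E) = sum of p_i = 2^{-i} over the i in E, approximated over i <= n_max.
    E is given as a membership predicate on the positive integers."""
    return sum(2.0 ** -i for i in range(1, n_max + 1) if E(i))

print(P(lambda i: True))          # P(Omega) = 1, since the p_i sum to 1
print(P(lambda i: i % 2 == 0))    # P(even) = 1/4 + 1/16 + ... = 1/3
```

In contrast to the uniform case pi = α, here the requirement Σ pi = 1 is met, and every subset of Ω receives a well-defined probability.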
(i) Ω ∈ F,
(ii) if E ∈ F then E^c ∈ F,
(iii) if E1, E2 ∈ F then E1 ∪ E2 ∈ F,
(iv) if E1, E2, . . . ∈ F then ∪i Ei ∈ F.
Example 3.1 Let F0 be the class of sets consisting of Ω = (−∞, ∞), together with all finite unions of intervals of the form (a, b], including (−∞, b], (a, ∞) and ∅ = (b, b]. This class of sets is closed under finite union and complementation, and so is a field on Ω. Then σ(F0) is the σ-field consisting of all intervals, and all sets obtainable from countably many set operations on intervals. Note that σ(F0) could be equivalently defined as the smallest σ-field containing all intervals in Ω, or all closed bounded intervals, all open sets, all sets (−∞, b], and so on.
We next define a measure:
Definition 3.7 A set function µ : F → R̄+, where F is a σ-field on Ω, is a measure if µ(∅) = 0 and if it is countably additive, that is, for any countable collection of disjoint sets E1, E2, . . . we have Σ_i µ(Ei) = µ(∪i Ei). If F is a field, then µ is called a measure if countable additivity holds whenever ∪i Ei ∈ F.
If Definition 3.7 did not require that µ(∅) = 0, then it would hold for µ ≡ ∞. However, that µ(∅) = 0 for any other measure follows from countable additivity, since we would have µ(E′) < ∞ for some E′, and µ(E′) = µ(E′) + µ(∅).
Proof (i) Write the disjoint union B = A ∪ (B − A), then µ(B) = µ(A) + µ(B − A).
(ii) Write the disjoint unions A = (A − B) ∪ AB, B = (B − A) ∪ AB, A ∪ B = (A − B) ∪ (B − A) ∪ AB, then apply additivity. (iii) We write D1 = E1, Di = Ei − Ei−1 for i ≥ 2. The sequence D1, D2, . . . is disjoint, with Ei = ∪_{j=1}^{i} Dj and E = ∪i Di. So, by countable additivity we have µ(E) = µ(∪i Di) = Σ_i µ(Di) = lim_i Σ_{j=1}^{i} µ(Dj) = lim_i µ(Ei). Then (v) follows after setting Ei = ∪_{j=1}^{i} Aj and applying (iii). Finally, (iv) and (vi) follow by expressing a decreasing sequence as an increasing sequence of the complements, then applying (iii) and (iv). ///
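Countable additivity and the continuity step (iii) can both be checked numerically for Lebesgue measure on intervals (our own illustration, with series truncated for the computation):

```python
# Countable additivity: the disjoint intervals (2^{-(i+1)}, 2^{-i}], i = 0, 1, ...,
# have union (0, 1], and their lengths sum to m((0, 1]) = 1.
partial = sum(2.0 ** -i - 2.0 ** -(i + 1) for i in range(60))
print(partial)    # approaches 1.0

# Continuity from below, as in step (iii): for the increasing sets
# E_i = (0, 1 - 1/i] with union E = (0, 1), m(E_i) = 1 - 1/i increases to m(E) = 1.
lengths = [1 - 1 / i for i in range(1, 8)]
print(lengths)
```

Truncating the series at sixty terms leaves an error of 2^{−60}, well below floating-point resolution here.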
of zero to any subset of a null set, since, if it was assigned a measure, it could only be 0
under the axioms of a measure. However, the definition of a measure space (, F, µ)
does not force F to contain all subsets of null sets, and counterexamples can be readily
constructed. Accordingly, we offer the following definition:
(i) λ(∅) = 0,
(ii) A ⊂ B ⇒ λ(A) ≤ λ(B),
(iii) A ⊂ ∪_{i=1}^{∞} Ai ⇒ λ(A) ≤ Σ_{i=1}^{∞} λ(Ai).
Theorem 3.5 Given an outer measure λ on , any set E for which λ(E) = 0 is
λ-measurable.
Proof Suppose A ⊂ Ω and λ(E) = 0. By monotonicity 0 ≤ λ(AE) ≤ λ(E) = 0 and λ(A) ≥ λ(AE^c), so that Definition 3.10 holds. ///
Many authors reserve a distinct symbol for a set function restricted to a class of
subsets. Theorem 3.6 then describes a measure space (, B, λB ) where λB is λ restricted
to B.
A σ-field is a field, which is a semifield. The latter is a quite intuitive object. The
set of right closed intervals in R, including (−∞, a] and (a, ∞) and ∅ is a semifield,
which is easily extended into Rn .
If A is a semifield, then the class of subsets F0 consisting of ∅ and all finite disjoint
unions of sets in A can be shown to be a field, in particular, the field generated by
semifield A.
Theorem 3.7 Suppose A is a semifield on and F0 is the field generated by A. Let
µ be a nonnegative set function on A satisfying the following conditions:
A topology is also a π-system, so that any measures which agree on the open sets
must agree on the Borel sets by Theorem 3.10. The intervals (a, b], with ∅, also form
a π-system.
we may have the need to perform algebraic operations on them, and it will prove quite
useful to consider vector spaces of measures. In this case, an operation involving two
measures such as µ1 + µ2 would result in a new measure, say ν = µ1 + µ2 . To be sure,
ν could be evaluated by addition ν(E) = µ1 (E) + µ2 (E) in R for any measurable set E,
but it is an entirely new measure. Subtraction seems just as reasonable, and we can
define a set function by the evaluation ν(E) = µ1 (E) − µ2 (E), represented algebraically
as ν = µ1 − µ2 . Of course, ν(E) might be negative, but we would expect it to share the
essential properties of a measure.
Accordingly, Definition 3.7 can be extended to set functions admitting negative
values.
Definition 3.15 A set function µ : F → R̄, where F is a σ-field on Ω, is a signed measure if µ(∅) = 0 and if it is countably additive, that is, for any countable collection of disjoint sets E1, E2, . . . we have Σ_i µ(Ei) = µ(∪j Ej), where the summation is either absolutely convergent or properly divergent.
This definition does not appear to differ significantly from Definition 3.7, but the
possibility of negative values introduces some new issues. For example, suppose we
wish to modify Lebesgue measure m on R by assigning negative measure below 0,
that is:
ms (E) = −m(E ∩ (−∞, 0)) + m(E ∩ [0, ∞)).
We must be able to assign a measure ms((−∞, ∞)), which by symmetry should be 0. However, countable additivity fails for the subsets (i − 1, i], i ∈ Z, since the implied summation is not absolutely convergent.
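The order dependence behind this failure is easy to exhibit. In the sketch below (ours), the terms ±1 are the ms-measures of the unit intervals (i − 1, i]; two different enumerations of the same collection give entirely different partial-sum behaviour:

```python
# Each unit interval (i-1, i] has m_s-measure +1 (for i >= 1) or -1 (for i <= 0).
# Enumeration (a): alternate one positive interval with one negative one ...
order_a = [(+1 if k % 2 == 0 else -1) for k in range(20)]
# ... enumeration (b): take two positive intervals for every negative one.
order_b = [(+1 if k % 3 != 2 else -1) for k in range(30)]

def partial_sums(terms):
    """Running partial sums of a sequence of measures."""
    s, out = 0, []
    for t in terms:
        s += t
        out.append(s)
    return out

print(partial_sums(order_a)[-1])   # stays near 0
print(partial_sums(order_b)[-1])   # grows without bound: 10 after 30 terms
```

Since the series is not absolutely convergent, no single value can consistently be assigned to ms((−∞, ∞)), which is exactly what Definition 3.15 rules out.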
When signed measures are admitted, the notion of a positive measure must be clarified. It is possible, for example, to have µ(A) ≥ 0, with µ(B) < 0 for some B ⊂ A. Accordingly, we say a measurable set A is positive if µ(B) ≥ 0 for all measurable B ⊂ A. A set is negative if it is positive for −µ. A measure on (Ω, F) is positive (negative) if Ω is a positive (negative) set. A set is a null set if it is both positive and negative.
The monotonicity property does not hold for signed measures. If A is positive and B is (strictly) negative, then we have µ(A ∪ B) < µ(A). If µ is a positive measure on (Ω, F) then µ(Ω) < ∞ forces all measurable sets to be of finite measure. Similarly, a signed measure is finite if all measurable sets are of finite measure. In fact, to define a signed measure as finite, it suffices to assume µ(Ω) is finite. Otherwise, suppose for some E ∈ F we have µ(E) = ∞; then Definition 3.15 precludes assignment of a measure to E^c ∈ F. The definition of the σ-finite property is the same for signed measures as for positive measures.
include our singular example. This poses no particular problem, since this probability
measure is easily described by P(E) = I{1/2 ∈ E} for all Borel sets E.
To clarify this issue, we introduce a few definitions.
Definition 3.16 Let ν and µ be two measures on M = (Ω, F). If µ(E) = 0 ⇒ ν(E) = 0 for all E ∈ F, then ν is absolutely continuous with respect to µ. This is written ν ≪ µ, and we also say ν is dominated by µ. If ν ≪ µ and µ ≪ ν then ν and µ are equivalent. Conversely, ν and µ are singular if there exists E ∈ F for which ν(E) = µ(E^c) = 0, also written ν ⊥ µ.
We have noted that signed measures arise naturally as differences of positive mea-
sures. It turns out that any signed measure can be uniquely represented this way. This
is a consequence of the Jordan-Hahn Decomposition Theorem.
The following theorem is easily proven by noting that f^{−1}(A ∪ B) = f^{−1}(A) ∪ f^{−1}(B) and f^{−1}(A^c) = f^{−1}(A)^c.
Theorem 3.13 If f maps a measurable space (Ω, F) to range X then the collection FX of sets E ⊂ X for which f^{−1}(E) ∈ F is a σ-field.
By Theorem 3.13 the class of subsets E ⊂ R for which f^{−1}(E) ∈ F is a σ-field, and by assumption it contains all intervals (−∞, α], and so also contains the Borel sets (since this is the smallest σ-field containing these intervals). Of course, <, > or ≥ could replace ≤ in the inequalities of (3.2). We therefore say a real-valued function f is Borel measurable, or Lebesgue measurable, if F are the Borel sets, or the Lebesgue sets, respectively. Similarly, measurability of a mapping from a measurable space (Ω, F) to R^n will be defined wrt the Borel sets on R^n.
Nonmeasurable mappings usually exist, and are easily constructed using indicator
functions of nonmeasurable sets.
We note that composition preserves measurability.
Theorem 3.14 If f, g are measurable mappings from (Ω1, F1) to (Ω2, F2), and from (Ω2, F2) to (Ω3, F3) respectively, then the composition g ◦ f is a measurable mapping from (Ω1, F1) to (Ω3, F3).
Note that Theorem 3.14 does not state that compound mappings of Lebesgue
measurable mappings are Lebesgue measurable, since only preimages of Borel sets
(and not Lebesgue sets, a strictly larger class) need be Lebesgue measurable.
If X is a topological space (usually a metric space), then F(X ) will be the set of
mappings f : X → R which are measurable with respect to the Borel sets on X and R.