
G. J. Chaitin

Algorithmic Information Theory

Abstract: This paper reviews algorithmic information theory, which is an attempt to apply information-theoretic and probabilistic ideas to recursive function theory. Typical concerns in this approach are, for example, the number of bits of information required to specify an algorithm, or the probability that a program whose bits are chosen by coin flipping produces a given output. During the past few years the definitions of algorithmic information theory have been reformulated. The basic features of the new formalism are presented here and certain results of R. M. Solovay are reported.

Historical introduction
To our knowledge, the first publication of the ideas of algorithmic information theory was the description of R. J. Solomonoff's ideas given in 1962 by M. L. Minsky in his paper, "Problems of formulation for artificial intelligence" [1]:

"Consider a slightly different form of inductive inference problem. Suppose that we are given a very long 'data' sequence of symbols; the problem is to make a prediction about the future of the sequence. This is a problem familiar in discussions concerning 'inductive probability.' The problem is refreshed a little, perhaps, by introducing the modern notion of universal computer and its associated language of instruction formulas. An instruction sequence will be considered acceptable if it causes the computer to produce a sequence, perhaps infinite, that begins with the given finite 'data' sequence. Each acceptable instruction sequence thus makes a prediction, and Occam's razor would choose the simplest such sequence and advocate its prediction. (More generally, one could weight the different predictions by weights associated with the simplicities of the instructions.) If the simplicity function is just the length of the instruction, we are then trying to find a minimal description, i.e., an optimally efficient encoding of the data sequence.

"Such an induction method could be of interest only if one could show some significant invariance with respect to choice of defining universal machine. There is no such invariance for a fixed pair of data strings. For one could design a machine which would yield the entire first string with a very small input, and the second string only for some very complex input. On the brighter side, one can see that in a sense the induced structure on the space of data strings has some invariance in an 'in the large' or 'almost everywhere' sense. Given two different universal machines, the induced structures cannot be desperately different. We appeal to the 'translation theorem' whereby an arbitrary instruction formula for one machine may be converted into an equivalent instruction formula for the other machine by the addition of a constant prefix text. This text instructs the second machine to simulate the behavior of the first machine in operating on the remainder of the input text. Then for data strings much larger than this translation text (and its inverse) the choice between the two machines cannot greatly affect the induced structure. It would be interesting to see if these intuitive notions could be profitably formalized.

"Even if this theory can be worked out, it is likely that it will present overwhelming computational difficulties in application. The recognition problem for minimal descriptions is, in general, unsolvable, and a practical induction machine will have to use heuristic methods. [In this connection it would be interesting to write a program to play R. Abbott's inductive card game [2].]"

Algorithmic information theory originated in the independent work of Solomonoff (see [1, 3-6]), of A. N. Kolmogorov and P. Martin-Löf (see [7-14]), and of G. J. Chaitin (see [15-26]). Whereas Solomonoff weighted together all the programs for a given result into a probability measure, Kolmogorov and Chaitin concentrated their attention on the size of the smallest program. Recently it has been realized by Chaitin and independently by L. A. Levin that if programs are stipulated to be self-delimiting, these two differing approaches become essentially equivalent. This paper attempts to cast into a unified scheme the recent work in this area by Chaitin [23, 24] and by R. M. Solovay [27, 28]. The reader may also find it interesting to examine the parallel efforts of Levin (see [29-35]). There has been a substantial amount of other work in this general area, often involving variants of the definitions deemed more suitable for particular applications (see, e.g., [36-47]).

Algorithmic information theory of finite computations [23]

Definitions
Let us start by considering a class of Turing machines with the following characteristics. Each Turing machine has three tapes: a program tape, a work tape, and an output tape. There is a scanning head on each of the three tapes. The program tape is read-only and each of its squares contains a 0 or a 1. It may be shifted in only one direction. The work tape may be shifted in either direction and may be read and erased, and each of its squares contains a blank, a 0, or a 1. The work tape is initially blank. The output tape may be shifted in only one direction. Its squares are initially blank, may have a 0, a 1, or a comma written on them, and cannot be rewritten. Each Turing machine of this type has a finite number n of states, and is defined by an n x 3 table, which gives the action to be performed and the next state as a function of the current state and the contents of the square of the work tape that is currently being scanned. The first state in this table is by convention the initial state. There are eleven possible actions: halt, shift work tape left/right, write blank/0/1 on work tape, read square of program tape currently being scanned and copy onto square of work tape currently being scanned and then shift program tape, write 0/1/comma on output tape and then shift output tape, and consult oracle. The oracle is included for the purpose of defining relative concepts. It enables the Turing machine to choose between two possible state transitions, depending on whether or not the binary string currently being scanned on the work tape is in a certain set, which for now we shall take to be the null set.

From each Turing machine M of this type we define a probability P, an entropy H, and a complexity I. P(s) is the probability that M eventually halts with the string s written on its output tape if each square of the program tape is filled with a 0 or a 1 by a separate toss of an unbiased coin. By "string" we shall always mean a finite binary string. From the probability P(s) we obtain the entropy H(s) by taking the negative base-two logarithm, i.e., H(s) is -log2 P(s). A string p is said to be a program if when it is written on M's program tape and M starts computing scanning the first bit of p, then M eventually halts after reading all of p and without reading any other squares of the tape. A program p is said to be a minimal program if no other program makes M produce the same output and has a smaller size. And finally the complexity I(s) is defined to be the least n such that for some contents of its program tape M eventually halts with s written on the output tape after reading precisely n squares of the program tape; i.e., I(s) is the size of a minimal program for s. To summarize, P is the probability that M calculates s given a random program, H is -log2 P, and I is the minimum number of bits required to specify an algorithm for M to calculate s.

It is important to note that blanks are not allowed on the program tape, which is imagined to be entirely filled with 0's and 1's. Thus programs are not followed by endmarker blanks. This forces them to be self-delimiting; a program must indicate within itself what size it has. Thus no program can be a prefix of another one, and the programs for M form what is known as a prefix-free set or an instantaneous code. This has two very important effects: It enables a natural probability distribution to be defined on the set of programs, and it makes it possible for programs to be built up from subroutines by concatenation. Both of these desirable features are lost if blanks are used as program endmarkers. This occurs because there is no natural probability distribution on programs with endmarkers; one, of course, makes all programs of the same size equiprobable, but it is also necessary to specify in some arbitrary manner the probability of each particular size. Moreover, if two subroutines with blanks as endmarkers are concatenated, it is necessary to include additional information indicating where the first one ends and the second begins.
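
The way a prefix-free set induces a natural probability distribution on programs can be made concrete. The following Python sketch (ours, not the paper's; the set of codewords is a made-up example) checks the prefix-free property and evaluates the Kraft sum of exp2(-length(p)) over the codewords, which is at most 1 for any prefix-free set; this is what allows exp2(-length(p)) to be read as the probability that coin flipping produces the program p.

    from itertools import combinations

    def is_prefix_free(codes):
        """True if no codeword is a prefix of another."""
        return not any(a.startswith(b) or b.startswith(a)
                       for a, b in combinations(codes, 2))

    def kraft_sum(codes):
        """Sum of 2**-len(p) over the codewords."""
        return sum(2.0 ** -len(p) for p in codes)

    # A hypothetical prefix-free set of programs (an instantaneous code).
    programs = ["00", "01", "100", "101", "11"]
    assert is_prefix_free(programs)
    print(kraft_sum(programs))  # 1.0 here; never exceeds 1 for a prefix-free set
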
Here is an example of a specific Turing machine M of the above type. M counts the number n of 0's up to the first 1 it encounters on its program tape, then transcribes the next n bits of the program tape onto the output tape, and finally halts. So M outputs s iff it finds length(s) 0's followed by a 1 followed by s on its program tape. Thus P(s) = exp2(-2 length(s) - 1), H(s) = 2 length(s) + 1, and I(s) = 2 length(s) + 1. Here exp2(x) is the base-two exponential function 2^x. Clearly this is a very special-purpose computer which embodies a very limited class of algorithms and yields uninteresting functions P, H, and I.
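
This particular M is simple enough to simulate exactly. The sketch below is our illustration, not part of the original paper: run_M parses a program of the form 0^n 1 s and returns s, and the enumeration at the bottom confirms that for s = "10" exactly one of the 2^5 equally likely program prefixes of length 5 makes M halt with output s, so that P(s) = exp2(-2 length(s) - 1) as claimed.

    from itertools import product

    def run_M(program):
        """Simulate M: expect 0^n 1 s with len(s) = n, and output s."""
        n = 0
        while n < len(program) and program[n] == "0":
            n += 1                      # count the leading 0's
        if n == len(program):
            return None                 # no 1 found: M never halts
        s = program[n + 1 : n + 1 + n]  # the n bits following the 1
        if len(s) < n or len(program) != 2 * n + 1:
            return None                 # wrong length: not a program for M
        return s

    # P("10") = fraction of length-5 coin-flip sequences that are a program
    # outputting "10"; the single such program is "00110".
    L = 5
    halting = [p for p in ("".join(b) for b in product("01", repeat=L))
               if run_M(p) == "10"]
    print(len(halting) / 2 ** L)        # 0.03125 = exp2(-5)
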
On the other hand it is easy to see that there are "general-purpose" Turing machines that maximize P and minimize H and I; in fact, consider those universal Turing machines which will simulate an arbitrary Turing machine if a suitable prefix indicating the machine to simulate is added to its programs. Such Turing machines yield essentially the same P, H, and I. We therefore pick, somewhat arbitrarily, a particular one of these, U, and the definitive definition of P, H, and I is given in terms of it. The universal Turing machine U works as follows. If U finds i 0's followed by a 1 on its program tape, it simulates the computation that the ith Turing machine of the above type performs upon reading the remainder of the program tape. By the ith Turing machine we mean the one that comes ith in a list of all possible defining tables in which the tables are ordered by size (i.e., number of states) and lexicographically among those of the same size. With this choice of Turing machine, P, H, and I can be dignified with the following titles: P(s) is the algorithmic probability of s, H(s) is the algorithmic entropy of s, and I(s) is the algorithmic information of s. Following Solomonoff [3], P(s) and H(s) may also be called the a priori probability and entropy of s. I(s) may also be termed the descriptive, program-size, or information-theoretic complexity of s. And since P is maximal and H and I are minimal, the above choice of special-purpose Turing machine shows that P(s) >= exp2(-2 length(s) - O(1)), H(s) <= 2 length(s) + O(1), and I(s) <= 2 length(s) + O(1).

We have defined P(s), H(s), and I(s) for individual strings s. It is also convenient to consider computations which produce finite sequences of strings. These are separated by commas on the output tape. One thus defines the joint probability P(s_1, ..., s_n), the joint entropy H(s_1, ..., s_n), and the joint complexity I(s_1, ..., s_n) of an n-tuple s_1, ..., s_n. Finally one defines the conditional probability P(t_1, ..., t_m | s_1, ..., s_n) of the m-tuple t_1, ..., t_m given the n-tuple s_1, ..., s_n to be the quotient of the joint probability of the n-tuple and the m-tuple divided by the joint probability of the n-tuple. In particular P(t|s) is defined to be P(s, t)/P(s). And of course the conditional entropy is defined to be the negative base-two logarithm of the conditional probability. Thus by definition H(s, t) = H(s) + H(t|s). Finally, in order to extend the above definitions to tuples whose members may either be strings or natural numbers, we identify the natural number n with its binary representation.

Basic relationships
We now review some basic properties of these concepts. The relation

H(s, t) = H(t, s) + O(1)

states that the probability of computing the pair s, t is essentially the same as the probability of computing the pair t, s. This is true because there is a prefix that converts any program for one of these pairs into a program for the other one. The inequality

H(s) <= H(s, t) + O(1)

states that the probability of computing s is not less than the probability of computing the pair s, t. This is true because a program for s can be obtained from any program for the pair s, t by adding a fixed prefix to it. The inequality

H(s, t) <= H(s) + H(t) + O(1)

states that the probability of computing the pair s, t is not less than the product of the probabilities of computing s and t, and follows from the fact that programs are self-delimiting and can be concatenated. The inequality

O(1) <= H(t|s) <= H(t) + O(1)

is merely a restatement of the previous two properties. However, in view of the direct relationship between conditional entropy and relative complexity indicated below, this inequality also states that being told something by an oracle cannot make it more difficult to obtain t. The relationship between entropy and complexity is

H(s) = I(s) + O(1),

i.e., the probability of computing s is essentially the same as 1/exp2(the size of a minimal program for s). This implies that a significant fraction of the probability of computing s is contributed by its minimal programs, and that there are few minimal or near-minimal programs for a given result. The relationship between conditional entropy and relative complexity is

H(t|s) = I_s(t) + O(1).

Here I_s(t) denotes the complexity of t relative to a set having a single element which is a minimal program for s. In other words,

I(s, t) = I(s) + I_s(t) + O(1).

This relation states that one obtains what is essentially a minimal program for the pair s, t by concatenating the following two subroutines:

a minimal program for s
a minimal program for calculating t using an oracle for the set consisting of a minimal program for s.
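
The way self-delimiting subroutines combine by plain concatenation can be seen in the toy encoding of the machine M above. The sketch below is our illustration under that assumption: encode produces the self-delimiting codeword 0^n 1 s, and a decoder reading the concatenation of two codewords recovers both strings with no separator between them, which is the mechanism behind H(s, t) <= H(s) + H(t) + O(1).

    def encode(s):
        """Self-delimiting code for s: 0^len(s) followed by 1 followed by s."""
        return "0" * len(s) + "1" + s

    def decode_one(tape):
        """Read exactly one codeword off the front; return (string, rest)."""
        n = tape.index("1")             # number of leading 0's
        return tape[n + 1 : 2 * n + 1], tape[2 * n + 1 :]

    s, t = "101", "0110"
    tape = encode(s) + encode(t)        # concatenation, no endmarkers needed
    s2, rest = decode_one(tape)
    t2, rest = decode_one(rest)
    assert (s2, t2) == (s, t) and rest == ""
    # The pair costs the sum of the subroutine sizes; on a real universal
    # machine only a fixed O(1) prefix must be added in front.
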
Algorithmic randomness
Consider an arbitrary string s of length n. From the fact that H(n) + H(s|n) = H(n, s) = H(s) + O(1), it is easy to show that H(s) <= n + H(n) + O(1), and that less than exp2(n - k + O(1)) of the s of length n satisfy H(s) < n + H(n) - k. It follows that for most s of length n, H(s) is approximately equal to n + H(n). These are the most complex strings of length n, the ones which are most difficult to specify, the ones with highest entropy, and they are said to be the algorithmically random strings of length n. Thus a typical string s of length n will have H(s) close to n + H(n), whereas if s has pattern or can be distinguished in some fashion, then it can be compressed or coded into a program that is considerably smaller.

That H(s) is usually n + H(n) can be thought of as follows: In order to specify a typical string s of length n, it is necessary first to specify its size n, which requires H(n) bits, and it is necessary then to specify each of the n bits in s, which requires n more bits and brings the total to n + H(n). In probabilistic terms this can be stated as follows: the sum of the probabilities of all the strings of length n is essentially equal to P(n), and most strings s of length n have probability P(s) essentially equal to P(n)/2^n. On the other hand, one of the strings of length n that is least random and that has most pattern is the string consisting entirely of 0's. It is easy to see that this string has entropy H(n) + O(1) and probability essentially equal to P(n), which is another way of saying that almost all the information in it is in its length. Here is an example in the middle: If p is a minimal program of size n, then it is easy to see that H(p) = n + O(1) and P(p) is essentially 2^-n. Finally it should be pointed out that since H(s) = H(n) + H(s|n) + O(1) if s is of length n, the above definition of randomness is equivalent to saying that the most random strings of length n have H(s|n) close to n, while the least random ones have H(s|n) close to 0.
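
The counting behind these claims is a pigeonhole argument that can be checked directly: there are fewer than exp2(m) programs shorter than m bits, and each program outputs at most one string, so fewer than exp2(n - k) of the exp2(n) strings of length n can be computed from a program shorter than n - k bits. A small sketch of the count (our framing, with the H(n) term suppressed for simplicity):

    def max_fraction_compressible(n, k):
        """Upper bound on the fraction of n-bit strings producible by any
        fixed machine from a program of fewer than n - k bits."""
        programs = 2 ** (n - k) - 1     # number of bitstrings of length < n - k
        return programs / 2 ** n        # each yields at most one output

    for k in (1, 5, 10):
        print(k, max_fraction_compressible(20, k))   # always below 2**-k
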
Later we shall show that even though most strings are algorithmically random, i.e., have nearly as much entropy as possible, an inherent limitation of formal axiomatic theories is that a lower bound n on the entropy of a specific string can be established only if n is less than the entropy of the axioms of the formal theory. In other words, it is possible to prove that a specific object is of complexity greater than n only if n is less than the complexity of the axioms being employed in the demonstration. These statements may be considered to be an information-theoretic version of Gödel's famous incompleteness theorem.

Now let us turn from finite random strings to infinite ones, or equivalently, by invoking the correspondence between a real number and its dyadic expansion, to random reals. Consider an infinite string X obtained by flipping an unbiased coin, or equivalently a real x uniformly distributed in the unit interval. From the preceding considerations and the Borel-Cantelli lemma it is easy to see that with probability one there is a c such that H(X_n) > n - c for all n, where X_n denotes the first n bits of X, that is, the first n bits of the dyadic expansion of x. We take this property to be our definition of an algorithmically random infinite string X or real x.

Algorithmic randomness is a clear-cut property for infinite strings, but in the case of finite strings it is a matter of degree. If a cutoff were to be chosen, however, it would be well to place it at about the point at which H(s) is equal to length(s). Then an infinite random string could be defined to be one for which all initial segments are finite random strings, within a certain tolerance.

Now consider the real number Omega defined as the halting probability of the universal Turing machine U that we used to define P, H, and I; i.e., Omega is the probability that U eventually halts if each square of its program tape is filled with a 0 or a 1 by a separate toss of an unbiased coin. Then it is not difficult to see that Omega is in fact an algorithmically random real, because if one were given the first n bits of the dyadic expansion of Omega, then one could use this to tell whether each program for U of size less than n ever halts or not. In other words, when written in binary the probability of halting Omega is a random or incompressible infinite string. Thus the basic theorem of recursive function theory that the halting problem is unsolvable corresponds in algorithmic information theory to the theorem that the probability of halting is algorithmically random if the program is chosen by coin flipping.
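
Omega can be approximated from below: run all programs in parallel and add exp2(-length(p)) each time a program p is found to halt. The sketch below is ours and uses the special-purpose machine M from the earlier example rather than U, so that the enumeration stays trivial; for M the halting programs are exactly the strings 0^n 1 s with length(s) = n, and the lower bounds climb to Omega_M = 1 (almost every coin-flip sequence makes M halt), whereas for the universal machine U the limit is the algorithmically random Omega.

    from itertools import product

    def M_halts(program):
        """The toy machine M halts iff the program is 0^n 1 s, len(s) = n."""
        n = program.find("1")
        return n >= 0 and len(program) == 2 * n + 1

    def omega_lower_bound(max_len):
        """Sum 2**-len(p) over halting programs p with len(p) <= max_len."""
        total = 0.0
        for length in range(1, max_len + 1):
            for bits in product("01", repeat=length):
                p = "".join(bits)
                if M_halts(p):
                    total += 2.0 ** -length
        return total

    for L in (3, 7, 11, 15):
        print(L, omega_lower_bound(L))  # 0.75, 0.9375, ... climbing toward 1
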
This concludes our review of the most basic facts regarding the probability, entropy, and complexity of finite objects, namely strings and tuples of strings. Before presenting some of Solovay's remarkable results regarding these concepts, and in particular regarding Omega, we would like to review the most important facts which are known regarding the probability, entropy, and complexity of infinite objects, namely recursively enumerable sets of strings.

Algorithmic information theory of infinite computations [24]
In order to define the probability, entropy, and complexity of r.e. (recursively enumerable) sets of strings it is necessary to consider unending computations performed on our standard universal Turing machine U. A computation is said to produce an r.e. set of strings A if all the members of A and only members of A are eventually written on the output tape, each followed by a comma. It is important that U not be required to halt if A is finite. The members of the set A may be written in arbitrary order, and duplications are ignored. A technical point: If there are only finitely many strings written on the output tape, and the last one is infinite or is not followed by a comma, then it is considered to be an "unfinished" string and is also ignored. Note that since computations may be endless, it is now possible for a semi-infinite portion of the program tape to be read.

The definitions of the probability P(A), the entropy H(A), and the complexity I(A) of an r.e. set of strings A may now be given. P(A) is the probability that U produces the output set A if each square of its program tape is filled with a 0 or a 1 by a separate toss of an unbiased coin. H(A) is the negative base-two logarithm of P(A). And I(A) is the size in bits of a minimal program that produces the output set A, i.e., I(A) is the least n such that there is a program tape contents that makes U undertake a computation in the course of which it reads precisely n squares of the program tape and produces the set of strings A. In order to define the joint and conditional probability and entropy we need a mechanism for encoding two r.e. sets A and B into a single set A join B. To obtain A join B one prefixes each string in A with a 0 and each string in B with a 1 and takes the union of the two resulting sets. Enumerating A join B is equivalent to simultaneously enumerating A and B. So the joint probability P(A, B) is P(A join B), the joint entropy H(A, B) is H(A join B), and the joint complexity I(A, B) is I(A join B). These definitions can obviously be extended to more than two r.e. sets, but it is unnecessary to do so here. Lastly, the conditional probability P(B|A) of B given A is the quotient of P(A, B) divided by P(A), and the conditional entropy H(B|A) is the negative base-two logarithm of P(B|A). Thus by definition H(A, B) = H(A) + H(B|A).
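
In this representation the join is a one-line operation. The following sketch (ours, with made-up example sets) forms A join B and splits it back apart, which is why enumerating A join B is the same as enumerating A and B simultaneously:

    def join(A, B):
        """Tag members of A with '0' and members of B with '1', then unite."""
        return {"0" + s for s in A} | {"1" + s for s in B}

    def split(J):
        """Recover the pair (A, B) from A join B."""
        return ({s[1:] for s in J if s.startswith("0")},
                {s[1:] for s in J if s.startswith("1")})

    A, B = {"0", "11"}, {"1", "00"}
    assert split(join(A, B)) == (A, B)
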
As before, one obtains the following basic inequalities:

H(A, B) = H(B, A) + O(1),
H(A) <= H(A, B) + O(1),
H(A, B) <= H(A) + H(B) + O(1),
O(1) <= H(B|A) <= H(B) + O(1),
I(A, B) <= I(A) + I(B) + O(1).

In order to demonstrate the third and the fifth of these relations one imagines two unending computations to be occurring simultaneously. Then one interleaves the bits of the two programs in the order in which they are read. Putting a fixed size prefix in front of this, one obtains a single program for performing both computations simultaneously whose size is O(1) plus the sum of the sizes of the original programs.

So far things look much as they did for individual strings. But the relationship between entropy and complexity turns out to be more complicated for r.e. sets than it was in the case of individual strings. Obviously the entropy H(A) is always less than or equal to the complexity I(A), because of the probability contributed by each minimal program for A:

H(A) <= I(A).

But how about bounds on I(A) in terms of H(A)? First of all, it is easy to see that if A is a singleton set whose only member is the string s, then H(A) = H(s) + O(1) and I(A) = I(s) + O(1). Thus the theory of the algorithmic information of individual strings is contained in the theory of the algorithmic information of r.e. sets as the special case of sets having a single element:

For singleton A, I(A) = H(A) + O(1).

There is also a close but not an exact relationship between H and I in the case of sets consisting of initial segments of the set of natural numbers (recall we identify the natural number n with its binary representation). Let us use the adjective "initial" for any set consisting of all natural numbers less than a given one:

For initial A, I(A) = H(A) + O(log H(A)).

Moreover, it is possible to show that there are infinitely many initial sets A for which I(A) > H(A) + O(log H(A)). This is the greatest known discrepancy between I and H for r.e. sets. It is demonstrated by showing that occasionally the number of initial sets A with H(A) < n is appreciably greater than the number of initial sets A with I(A) < n. On the other hand, with the aid of a crucial game-theoretic lemma of D. A. Martin, Solovay [28] has shown that

I(A) <= 3 H(A) + O(log H(A)).

These are the best results currently known regarding the relationship between the entropy and the complexity of an r.e. set; clearly much remains to be done. Furthermore, what is the relationship between the conditional entropy and the relative complexity of r.e. sets? And how many minimal or near-minimal programs for an r.e. set are there?

We would now like to mention some other results concerning these concepts. Solovay has shown that:

There are exp2(n - H(n) + O(1)) singleton sets A with H(A) < n,
There are exp2(n - H(n) + O(1)) singleton sets A with I(A) < n.

We have extended Solovay's result as follows:

There are exp2(n - H'(n) + O(1)) finite sets A with H(A) < n,
There are exp2(n - H(L_n) + O(log H(L_n))) sets A with I(A) < n,
There are exp2(n - H'(L_n) + O(log H'(L_n))) sets A with H(A) < n.

Here L_n is the set of natural numbers less than n, and H' is the entropy relative to the halting problem; if U is provided with an oracle for the halting problem instead of one for the null set, then the probability, entropy, and complexity measures one obtains are P', H', and I' instead of P, H, and I. Two final results:

I'(A, the complement of A) <= H(A) + O(1);

the probability that the complement of an r.e. set has cardinality n is essentially equal to the probability that a set r.e. in the halting problem has cardinality n.

More advanced results [27]
The previous sections outline the basic features of the new formalism for algorithmic information theory obtained by stipulating that programs be self-delimiting instead of having endmarker blanks. Error terms in the basic relations which were logarithmic in the previous approach [9] are now of the order of unity.

In the previous approach the complexity of n is usually log2 n + O(1), there is an information-theoretic characterization of recursive infinite strings [25, 26], and much is known about complexity oscillations in random infinite strings [14]. The corresponding properties in the new approach have been elucidated by Solovay in an unpublished paper [27]. We present some of his results here. For related work by Solovay, see the publications [28, 48, 49].

Recursive bounds on H(n)
Following [23, p. 337], let us consider recursive upper and lower bounds on H(n). Let f be an unbounded recursive function, and consider the series Sum exp2(-f(n)) summed over all natural numbers n. If this infinite series converges, then H(n) < f(n) + O(1) for all n. And if it diverges, then the inequalities H(n) > f(n) and H(n) < f(n) each hold for infinitely many n. Thus, for example, for any eps > 0, H(n) < log n + log log n + (1 + eps) log log log n + O(1) for all n, and H(n) > log n + log log n + log log log n for infinitely many n, where all logarithms are base two. See [50] for the results on convergence used to prove this.
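
The dividing line in this criterion is easy to observe numerically. In the sketch below (our illustration), partial sums of Sum exp2(-f(n)) are computed for f(n) = log n + 2 log log n, for which the series converges, and for f(n) = log n + log log n, for which it diverges; by the criterion the first f is an upper bound on H(n) to within O(1) while the second is not.

    from math import log2

    def partial_sum(f, N):
        """Partial sum of exp2(-f(n)) for n = 2 .. N."""
        return sum(2.0 ** -f(n) for n in range(2, N + 1))

    f_conv = lambda n: log2(n) + 2 * log2(log2(n))   # series converges
    f_div = lambda n: log2(n) + log2(log2(n))        # series diverges

    for N in (10 ** 3, 10 ** 4, 10 ** 5):
        print(N, partial_sum(f_conv, N), partial_sum(f_div, N))
    # The first sums level off; the second keep growing like log log N.
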
Solovay has obtained the following results regarding recursive upper bounds on H, i.e., recursive h such that H(n) < h(n) for all n. First he shows that there is a recursive upper bound on H which is almost correct infinitely often, i.e., |H(n) - h(n)| < c for infinitely many values of n. In fact, the lim sup of the fraction of values of i less than n such that |H(i) - h(i)| < c is greater than 0. However, he also shows that the values of n for which |H(n) - h(n)| < c must in a certain sense be quite sparse. In fact, he establishes that if h is any recursive upper bound on H then there cannot exist a tolerance c and a recursive function f such that there are always at least n different natural numbers i less than f(n) at which h(i) is within c of H(i). It follows that the lim inf of the fraction of values of i less than n such that |H(i) - h(i)| < c is zero.

The basic idea behind his construction of h is to choose f so that Sum exp2(-f(n)) converges "as slowly" as possible. As a byproduct he obtains a recursive convergent series of rational numbers Sum a_n such that if Sum b_n is any recursive convergent series of rational numbers, then lim sup a_n/b_n is greater than zero.

Nonrecursive infinite strings with simple initial segments
At the high-order end of the complexity scale for infinite strings are the random strings, and the recursive strings are at the low order end. Is anything else there? More formally, let X be an infinite binary string, and let X_n be the first n bits of X. If X is recursive, then we have H(X_n) = H(n) + O(1). What about the converse, i.e., what can be said about X given only that H(X_n) = H(n) + O(1)? Obviously H(X_n) = H(n, X_n) + O(1) = H(n) + H(X_n|n) + O(1). So H(X_n) = H(n) + O(1) iff H(X_n|n) = O(1). Then using a relativized version of the proof in [37, pp. 525-526], one can show that X is recursive in the halting problem. Moreover, by using a priority argument Solovay is actually able to construct a nonrecursive X that satisfies H(X_n) = H(n) + O(1).

Equivalent definitions of an algorithmically random real
Pick a recursive enumeration O_0, O_1, O_2, ... of all open intervals with rational endpoints. A sequence of open sets U_0, U_1, U_2, ... is said to be simultaneously r.e. if there is a recursive function h such that U_n is the union of those O_i whose index i is of the form h(n, j), for some natural number j. Consider a real number x in the unit interval. We say that x has the Solovay randomness property if the following holds. Let U_0, U_1, U_2, ... be any simultaneously r.e. sequence of open sets such that the sum of the usual Lebesgue measure of the U_n converges. Then x is in only finitely many of the U_n. We say that x has the Chaitin randomness property if there is a c such that H(X_n) > n - c for all n, where X_n is the string consisting of the first n bits of the dyadic expansion of x. Solovay has shown that these randomness properties are equivalent to each other, and that they are also equivalent to Martin-Löf's definition [10] of randomness.

The entropy of initial segments of algorithmically random and of Omega-like reals
Consider a random real x. By the definition of randomness, H(X_n) > n + O(1). On the other hand, for any infinite string X, random or not, we have H(X_n) <= n + H(n) + O(1). Solovay shows that the above bounds are each sometimes sharp. More precisely, consider a random X and a recursive function f such that Sum exp2(-f(n)) diverges (e.g., f(n) = integer part of log2 n). Then there are infinitely many natural numbers n such that H(X_n) >= n + f(n). And consider an unbounded monotone increasing recursive function g (e.g., g(n) = integer part of log log n). There are infinitely many natural numbers n such that it is simultaneously the case that H(X_n) <= n + g(n) and H(n) >= f(n).

Solovay has obtained much more precise results along these lines about Omega and a class of reals which he calls "Omega-like." A real number is said to be an r.e. real if the set of all rational numbers less than it is an r.e. subset of the rational numbers. Roughly speaking, an r.e. real x is Omega-like if for any r.e. real y one can get in an effective manner a good approximation to y from any good approximation to x, and the quality of the approximation to y is at most O(1) binary digits worse than the quality of the approximation to x. The formal definition of Omega-like is as follows.

The real x is said to dominate the real y if there is a partial recursive function f and a constant c with the property that if q is any rational number that is less than x, then f(q) is defined and is a rational number that is less than y and satisfies the inequality c|x - q| >= |y - f(q)|. And a real number is said to be Omega-like if it is an r.e. real that dominates all r.e. reals. Solovay proves that Omega is in fact Omega-like, and that if x and y are Omega-like, then H(X_n) = H(Y_n) + O(1), where X_n and Y_n are the first n bits in the dyadic expansions of x and y. It is an immediate corollary that if x is Omega-like then H(X_n) = H(Omega_n) + O(1), and that all Omega-like reals are algorithmically random. Moreover Solovay shows that the algorithmic probability P(s) of any string s is always an Omega-like real.

In order to state Solovay's results contrasting the behavior of H(Omega_n) with that of H(X_n) for a typical real number x, it is necessary to define two extremely slowly growing monotone functions alpha and alpha'. alpha(n) = min H(j) (j >= n), and alpha' is defined in the same manner as alpha except that H is replaced by H', the algorithmic entropy relative to the halting problem. It can be shown (see [29, pp. 90-91]) that alpha goes to infinity, but more slowly than any monotone partial recursive function does. More precisely, if f is an unbounded nondecreasing partial recursive function, then alpha(n) is less than f(n) for almost all n for which f(n) is defined. Similarly alpha' goes to infinity, but more slowly than any monotone partial function recursive in the halting problem does. More precisely, if f is an unbounded nondecreasing partial function recursive in the halting problem, then alpha'(n) is less than f(n) for almost all n for which f(n) is defined. In particular, alpha'(n) is less than alpha(alpha(n)) for almost all n.

We can now state Solovay's results. Consider a real number x uniformly distributed in the unit interval. With probability one there is a c such that H(X_n) > n + H(n) - c holds for infinitely many n. And with probability one, H(X_n) > n + alpha(n) + O(log alpha(n)). Whereas if x is Omega-like, then the following occurs: H(X_n) < n + H(n) - alpha(n) + O(log alpha(n)), and for infinitely many n we have H(X_n) < n + alpha'(n) + O(log alpha'(n)). This shows that the complexity of initial segments of the dyadic expansions of Omega-like reals is atypical. It is an open question whether H(Omega_n) - n tends to infinity; Solovay suspects that it does.

Algorithmic information theory and metamathematics
There is something paradoxical about being able to prove that a specific finite string is random; this is perhaps captured in the following antinomies from the writings of M. Gardner [51] and B. Russell [52]. In reading them one should interpret "dull," "uninteresting," and "indefinable" to mean "random," and "interesting" and "definable" to mean "nonrandom."

"[Natural] numbers can of course be interesting in a variety of ways. The number 30 was interesting to George Moore when he wrote his famous tribute to 'the woman of 30,' the age at which he believed a married woman was most fascinating. To a number theorist 30 is more likely to be exciting because it is the largest integer such that all smaller integers with which it has no common divisor are prime numbers. ... The question arises: Are there any uninteresting numbers? We can prove that there are none by the following simple steps. If there are dull numbers, we can then divide all numbers into two sets - interesting and dull. In the set of dull numbers there will be only one number that is the smallest. Since it is the smallest uninteresting number it becomes, ipso facto, an interesting number. [Hence there are no dull numbers!]" [51].

"Among transfinite ordinals some can be defined, while others cannot; for the total number of possible definitions is aleph-null, while the number of transfinite ordinals exceeds aleph-null. Hence there must be indefinable ordinals, and among these there must be a least. But this is defined as 'the least indefinable ordinal,' which is a contradiction" [52].

Here is our incompleteness theorem for formal axiomatic theories whose arithmetic consequences are true. The setup is as follows: The axioms are a finite string, the rules of inference are an algorithm for enumerating the theorems given the axioms, and we fix the rules of inference and vary the axioms. Within such a formal theory a specific string cannot be proven to be of entropy more than O(1) greater than the entropy of the axioms of the theory. Conversely, there are formal theories whose axioms have entropy n + O(1) in which it is possible to establish all true propositions of the form "H(specific string) >= n."

Proof. Consider the enumeration of the theorems of the formal axiomatic theory in order of the size of their proofs. For each natural number k, let s* be the string in the theorem of the form "H(s) >= n" with n greater than H(axioms) + k which appears first in this enumeration. On the one hand, if all theorems are true, then H(s*) > H(axioms) + k. On the other hand, the above prescription for calculating s* shows that H(s*) <= H(axioms) + H(k) + O(1). It follows that k < H(k) + O(1). However, this inequality is false for all k >= k*, where k* depends only on the rules of inference. The apparent contradiction is avoided only if s* does not exist for k = k*, i.e., only if it is impossible to prove in the formal theory that a specific string has H greater than H(axioms) + k*.

Proof of Converse. The set T of all true propositions of the form "H(s) < k" is r.e. Choose a fixed enumeration of T without repetitions, and for each natural number n let s* be the string in the last proposition of the form "H(s) < n" in the enumeration. It is not difficult to see that H(s*, n) = n + O(1). Let p be a minimal program for the pair s*, n. Then p is the desired axiom, for H(p) = n + O(1) and to obtain all true propositions of the form "H(s) >= n" from p one enumerates T until all s with H(s) < n have been discovered. All other s have H(s) >= n.
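
The search in the first proof is completely mechanical and can be written down. The sketch below is our paraphrase, not the paper's: theorems stands for any enumeration of the theorems of the formal theory in order of proof size, rendered as strings such as "H(1011) >= 20" (a hypothetical format chosen here for illustration). Given such an enumeration, the routine finds s*, the first string proved to have entropy greater than H(axioms) + k; the contradiction in the text arises because this short search routine, plus a self-delimiting description of k, is itself a program for s* of size H(axioms) + H(k) + O(1).

    import re

    def find_s_star(theorems, h_axioms, k):
        """Scan an enumeration of theorems and return the first string
        asserted to have entropy greater than h_axioms + k. If no such
        theorem ever appears, the scan runs forever - exactly what the
        incompleteness theorem says must happen once k is large enough."""
        pattern = re.compile(r"H\(([01]+)\) >= (\d+)")
        for thm in theorems:
            m = pattern.fullmatch(thm)
            if m and int(m.group(2)) > h_axioms + k:
                return m.group(1)       # this is s*

    # Hypothetical stand-in for a theorem enumerator, for demonstration only.
    toy_theorems = ["H(01) >= 3", "H(1011) >= 20", "H(0110) >= 40"]
    print(find_s_star(toy_theorems, h_axioms=25, k=5))   # -> 0110
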
We developed this information-theoretic approach to metamathematics before being in possession of the notion of self-delimiting programs (see [20-22] and also [53]); the technical details are somewhat different when programs have blanks as endmarkers. The conclusion to be drawn from all this is that even though most strings are random, we will never be able to explicitly exhibit a string of reasonable size which demonstrably possesses this property. A less pessimistic conclusion to be drawn is that it is reasonable to measure the power of formal axiomatic theories in information-theoretic terms. The fact that in some sense one gets out of a formal theory no more than one puts in should not be taken too seriously: a formal theory is at its best when a great many apparently independent theorems are shown to be closely interrelated by reducing them to a handful of axioms. In this sense a formal axiomatic theory is valuable for the same reason as a scientific theory; in both cases information is being compressed, and one is also concerned with the tradeoff between the degree of compression and the length of proofs of interesting theorems or the time required to compute predictions.

Algorithmic information theory and biology
Above we have pointed out a number of open problems. In our opinion, however, the most important challenge is to see if the ideas of algorithmic information theory can contribute in some form or manner to theoretical mathematical biology in the style of von Neumann [54], in which genetic information is considered to be an extremely large and complicated program for constructing organisms. We alluded briefly to this in a previous paper [21], and discussed it at greater length in a publication [19] of somewhat limited access.

Von Neumann wished to isolate the basic conceptual problems of biology from the detailed physics and biochemistry of life as we know it. The gist of his message is that it should be possible to formulate mathematically and to answer in a quite general setting such fundamental questions as "How is self-reproduction possible?", "What is an organism?", "What is its degree of organization?", and "How probable is evolution?". He achieved this for the first question; he showed that exact self-reproduction of universal Turing machines is possible in a particular deterministic model universe.

There is such an enormous difference between dead and organized living matter that it must be possible to give a quantitative structural characterization of this difference, i.e., of degree of organization. One possibility [19] is to characterize an organism as a highly interdependent region, one for which the complexity of the whole is much less than the sum of the complexities of its parts. C. H. Bennett [55] has suggested another approach based on the notion of "logical depth." A structure is deep "if it is superficially random but subtly redundant, in other words, if almost all its algorithmic probability is contributed by slow-running programs. A string's logical depth should reflect the amount of computational work required to expose its buried redundancy." It is Bennett's thesis that "a priori the most probable explanation of 'organized information' such as the sequence of bases in a naturally occurring DNA molecule is that it is the product of an extremely long evolutionary process." For related work by Bennett, see [56].

This, then, is the fundamental problem of theoretical biology that we hope the ideas of algorithmic information theory may help to solve: to set up a nondeterministic model universe, to formally define what it means for a region of space-time in that universe to be an organism and what is its degree of organization, and to rigorously demonstrate that, starting from simple initial conditions, organisms will appear and evolve in degree of organization in a reasonable amount of time and with high probability.

Acknowledgments
The quotation by M. L. Minsky in the first section is reprinted with the kind permission of the publisher American Mathematical Society from Mathematical Problems in the Biological Sciences, Proceedings of Symposia in Applied Mathematics XIV, pp. 42-43, copyright (c) 1962. We are grateful to R. M. Solovay for permitting us to include several of his unpublished results in the section entitled "More advanced results." The quotation by M. Gardner in the section on algorithmic information theory and metamathematics is reprinted with his kind permission, and the quotation by B. Russell in that section is reprinted with permission of the Johns Hopkins University Press. We are grateful to C. H. Bennett for permitting us to present his notion of logical depth in print for the first time in the section on algorithmic information theory and biology.

References
1. M. L. Minsky, "Problems of Formulation for Artificial Intelligence," Mathematical Problems in the Biological Sciences, Proceedings of Symposia in Applied Mathematics XIV, R. E. Bellman, ed., American Mathematical Society, Providence, RI, 1962, p. 35.
2. M. Gardner, "An Inductive Card Game," Sci. Amer. 200, No. 6, 160 (1959).
3. R. J. Solomonoff, "A Formal Theory of Inductive Inference," Info. Control 7, 1, 224 (1964).
4. D. G. Willis, "Computational Complexity and Probability Constructions," J. ACM 17, 241 (1970).
5. T. M. Cover, "Universal Gambling Schemes and the Complexity Measures of Kolmogorov and Chaitin," Statistics Department Report 12, Stanford University, CA, October, 1974.
6. R. J. Solomonoff, "Complexity Based Induction Systems: Comparisons and Convergence Theorems," Report RR-329, Rockford Research, Cambridge, MA, August, 1976.
7. A. N. Kolmogorov, "On Tables of Random Numbers," Sankhyā A25, 369 (1963).
8. A. N. Kolmogorov, "Three Approaches to the Quantitative Definition of Information," Prob. Info. Transmission 1, No. 1, 1 (1965).
9. A. N. Kolmogorov, "Logical Basis for Information Theory and Probability Theory," IEEE Trans. Info. Theor. IT-14, 662 (1968).
10. P. Martin-Löf, "The Definition of Random Sequences," Info. Control 9, 602 (1966).
11. P. Martin-Löf, "Algorithms and Randomness," Intl. Stat. Rev. 37, 265 (1969).
12. P. Martin-Löf, "The Literature on von Mises' Kollektivs Revisited," Theoria 35, Part 1, 12 (1969).
13. P. Martin-Löf, "On the Notion of Randomness," Intuitionism and Proof Theory, A. Kino, J. Myhill, and R. E. Vesley, eds., North-Holland Publishing Co., Amsterdam, 1970, p. 73.
14. P. Martin-Löf, "Complexity Oscillations in Infinite Binary Sequences," Z. Wahrscheinlichk. verwand. Geb. 19, 225 (1971).
15. G. J. Chaitin, "On the Length of Programs for Computing Finite Binary Sequences," J. ACM 13, 547 (1966).
16. G. J. Chaitin, "On the Length of Programs for Computing Finite Binary Sequences: Statistical Considerations," J. ACM 16, 145 (1969).
17. G. J. Chaitin, "On the Simplicity and Speed of Programs for Computing Infinite Sets of Natural Numbers," J. ACM 16, 407 (1969).
18. G. J. Chaitin, "On the Difficulty of Computations," IEEE Trans. Info. Theor. IT-16, 5 (1970).
19. G. J. Chaitin, "To a Mathematical Definition of 'Life,'" ACM SICACT News 4, 12 (1970).
20. G. J. Chaitin, "Information-theoretic Limitations of Formal Systems," J. ACM 21, 403 (1974).
21. G. J. Chaitin, "Information-theoretic Computational Complexity," IEEE Trans. Info. Theor. IT-20, 10 (1974).
22. G. J. Chaitin, "Randomness and Mathematical Proof," Sci. Amer. 232, No. 5, 47 (1975). (Also published in the Japanese and Italian editions of Sci. Amer.)
23. G. J. Chaitin, "A Theory of Program Size Formally Identical to Information Theory," J. ACM 22, 329 (1975).
24. G. J. Chaitin, "Algorithmic Entropy of Sets," Comput. & Math. Appls. 2, 233 (1976).
25. G. J. Chaitin, "Information-theoretic Characterizations of Recursive Infinite Strings," Theoret. Comput. Sci. 2, 45 (1976).
26. G. J. Chaitin, "Program Size, Oracles, and the Jump Operation," Osaka J. Math., to be published in Vol. 14, No. 1, 1977.
27. R. M. Solovay, "Draft of a paper ... on Chaitin's work ... done for the most part during the period of Sept.-Dec. 1974," unpublished manuscript, IBM Thomas J. Watson Research Center, Yorktown Heights, NY, May, 1975.
28. R. M. Solovay, "On Random R.E. Sets," Proceedings of the Third Latin American Symposium on Mathematical Logic, Campinas, Brazil, July, 1976. To be published.
29. A. K. Zvonkin and L. A. Levin, "The Complexity of Finite Objects and the Development of the Concepts of Information and Randomness by Means of the Theory of Algorithms," Russ. Math. Surv. 25, No. 6, 83 (1970).
30. L. A. Levin, "On the Notion of a Random Sequence," Soviet Math. Dokl. 14, 1413 (1973).
31. P. Gács, "On the Symmetry of Algorithmic Information," Soviet Math. Dokl. 15, 1477 (1974); "Corrections," Soviet Math. Dokl. 15, No. 6, v (1974).
32. L. A. Levin, "Laws of Information Conservation (Nongrowth) and Aspects of the Foundation of Probability Theory," Prob. Info. Transmission 10, 206 (1974).
33. L. A. Levin, "Uniform Tests of Randomness," Soviet Math. Dokl. 17, 337 (1976).
34. L. A. Levin, "Various Measures of Complexity for Finite Objects (Axiomatic Description)," Soviet Math. Dokl. 17, 522 (1976).
35. L. A. Levin, "On the Principle of Conservation of Information in Intuitionistic Mathematics," Soviet Math. Dokl. 17, 601 (1976).
36. D. E. Knuth, Seminumerical Algorithms. The Art of Computer Programming, Volume 2, Addison-Wesley Publishing Co., Inc., Reading, MA, 1969. See Ch. 2, "Random Numbers," p. 1.
37. D. W. Loveland, "A Variant of the Kolmogorov Concept of Complexity," Info. Control 15, 510 (1969).
38. T. L. Fine, Theories of Probability - An Examination of Foundations, Academic Press, Inc., New York, 1973. See Ch. V, "Computational Complexity, Random Sequences, and Probability," p. 118.
39. J. T. Schwartz, On Programming: An Interim Report on the SETL Project. Installment I: Generalities, Lecture Notes, Courant Institute of Mathematical Sciences, New York University, 1973. See Item 1, "On the Sources of Difficulty in Programming," p. 1, and Item 2, "A Second General Reflection on Programming," p. 12.
40. T. Kamae, "On Kolmogorov's Complexity and Information," Osaka J. Math. 10, 305 (1973).
41. C. P. Schnorr, "Process Complexity and Effective Random Tests," J. Comput. Syst. Sci. 7, 376 (1973).
42. M. E. Hellman, "The Information Theoretic Approach to Cryptography," Information Systems Laboratory, Center for Systems Research, Stanford University, April, 1974.
43. W. L. Gewirtz, "Investigations in the Theory of Descriptive Complexity," Courant Computer Science Report 5, Courant Institute of Mathematical Sciences, New York University, October, 1974.
44. R. P. Daley, "Minimal-program Complexity of Pseudo-recursive and Pseudo-random Sequences," Math. Syst. Theor. 9, 83 (1975).
45. R. P. Daley, "Noncomplex Sequences: Characterizations and Examples," J. Symbol. Logic 41, 626 (1976).
46. J. Gruska, "Descriptional Complexity (of Languages) - A Short Survey," Mathematical Foundations of Computer Science 1976, A. Mazurkiewicz, ed., Lecture Notes in Computer Science 45, Springer-Verlag, Berlin, 1976, p. 65.
47. J. Ziv, "Coding Theorems for Individual Sequences," undated manuscript, Bell Laboratories, Murray Hill, NJ.
48. R. M. Solovay, "A Model of Set-theory in which Every Set of Reals is Lebesgue Measurable," Ann. Math. 92, 1 (1970).
49. R. Solovay and V. Strassen, "A Fast Monte-Carlo Test for Primality," SIAM J. Comput. 6, 84 (1977).
50. G. H. Hardy, A Course of Pure Mathematics, Tenth edition, Cambridge University Press, London, 1952. See Section 218, "Logarithmic Tests of Convergence for Series and Integrals," p. 417.
51. M. Gardner, "A Collection of Tantalizing Fallacies of Mathematics," Sci. Amer. 198, No. 1, 92 (1958).
52. B. Russell, "Mathematical Logic as Based on the Theory of Types," From Frege to Gödel: A Source Book in Mathematical Logic, 1879-1931, J. van Heijenoort, ed., Harvard University Press, Cambridge, MA, 1967, p. 153; reprinted from Amer. J. Math. 30, 222 (1908).
53. M. Levin, "Mathematical Logic for Computer Scientists," MIT Project MAC TR-131, June, 1974, pp. 145, 153.
54. J. von Neumann, Theory of Self-reproducing Automata, University of Illinois Press, Urbana, 1966; edited and completed by A. W. Burks.
55. C. H. Bennett, "On the Thermodynamics of Computation," undated manuscript, IBM Thomas J. Watson Research Center, Yorktown Heights, NY.
56. C. H. Bennett, "Logical Reversibility of Computation," IBM J. Res. Develop. 17, 525 (1973).

Received February 2, 1977; revised March 9, 1977

The author is located at the IBM Thomas J. Watson Research Center, Yorktown Heights, New York 10598.
