WHAT IS INFORMATION?
ABSTRACT. The main aim of this work is to contribute to the elucidation of the
concept of information by comparing three different views about this matter: the
view of Fred Dretske’s semantic theory of information, the perspective adopted by
Peter Kosso in his interaction-information account of scientific observation, and
the syntactic approach of Thomas Cover and Joy Thomas. We will see that these
views involve very different concepts of information, each one useful in its own
field of application. This comparison will allow us to argue in favor of a termino-
logical ‘cleansing’: it is necessary to make a terminological distinction among
the different concepts of information, in order to avoid conceptual confusions
when the word ‘information’ is used to elucidate related concepts such as knowledge, observation or entropy.
from the right-hand term of (2.7) when (2.5) and (2.6) are used.6
Therefore, we cannot accept Dretske’s response to the critics who
accuse him of misunderstanding Shannon’s theory: his ‘interpre-
tation’ of the formulas by means of these new quantities is not
compatible with the formal structure of the theory.
It might be argued that this is a minor formal detail, but this
detail has relevant conceptual consequences. When Dretske defines
E(rj ), that is, the contribution of rj to the equivocation E as a
summation over the si (equation (2.6)), he makes the error of
supposing that this individual contribution is a function only of the particular state rj of the receiver. But the equivocation E is a magnitude that essentially depends on the communication channel. Hence, any individual contribution to E must preserve that dependence. Understanding this conceptual point allows us to
retain Dretske’s proposal by appropriately correcting his formal
approach. In order to introduce such a correction, we must define
the individual contribution of the pair (si , rj ) to the equivocation E
as E(si, rj) = log 1/p(si/rj), so that E = Σi Σj p(si, rj) E(si, rj). Since in Dretske’s proposal the individual contribution to the equivocation is not a function of the particular state si of the source (equation (2.6)), p(sA/rB) = 1 does not make the individual contribution to the equivocation E equal to zero and, therefore, he cannot guarantee that I(sA, rB) = I(sA). Only when the new formulas are properly corrected can the idea roughly expressed by Dretske be stated with precision. In summary, Dretske’s definition of informational content
says nothing that cannot be said in terms of the theory of informa-
tion adapted, in the right way, to deal with particular amounts of
information.
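To make the point concrete, the following fragment offers a minimal numerical check, not taken from the paper: the two-state source and receiver and the channel probabilities are invented for illustration, and the corrected contribution is assumed to take the form E(si, rj) = log 1/p(si/rj) introduced above. It verifies that these pairwise contributions, weighted by p(si, rj), average to the equivocation E, and that a pair with p(si/rj) = 1 contributes zero, so that I(si, rj) = I(si) for that pair.

```python
from math import log2

# A minimal numerical check (invented two-state source and receiver).
p_s = {'s1': 0.7, 's2': 0.3}                          # source probabilities p(si)
p_r_given_s = {('r1', 's1'): 1.0, ('r2', 's1'): 0.0,  # channel probabilities p(rj/si)
               ('r1', 's2'): 0.2, ('r2', 's2'): 0.8}
receivers = ('r1', 'r2')

# joint, receiver and backward probabilities
p_sr = {(s, r): p_s[s] * p_r_given_s[(r, s)] for s in p_s for r in receivers}
p_r = {r: sum(p_sr[(s, r)] for s in p_s) for r in receivers}
p_s_given_r = {(s, r): p_sr[(s, r)] / p_r[r] for (s, r) in p_sr}

# equivocation E = sum_j p(rj) E(rj), with E(rj) = sum_i p(si/rj) log 1/p(si/rj)
E_rj = {r: sum(p_s_given_r[(s, r)] * log2(1 / p_s_given_r[(s, r)])
               for s in p_s if p_s_given_r[(s, r)] > 0) for r in receivers}
E = sum(p_r[r] * E_rj[r] for r in receivers)

# corrected pairwise contribution, assumed form: E(si, rj) = log 1/p(si/rj)
E_pair = {pair: log2(1 / p_s_given_r[pair]) for pair, p in p_sr.items() if p > 0}

# (1) the pairwise contributions, weighted by p(si, rj), recover E
assert abs(E - sum(p_sr[pair] * E_pair[pair] for pair in E_pair)) < 1e-12

# (2) a pair with p(si/rj) = 1 contributes zero, so I(si, rj) = I(si) for it
assert p_s_given_r[('s2', 'r2')] == 1.0 and E_pair[('s2', 'r2')] == 0.0
print(E)
```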
If the semantic character of Dretske’s proposal is not based on
the definition of informational content, what is it based on? For
Dretske, information qualifies as a semantic concept in virtue of
the intentionality inherent in its transmission. And the ultimate
source of this intentionality is the nomic character of the regular-
ities on which the transmission of information depends. The channel
probabilities p(rj /si ) do not represent a set of mere de facto correla-
tions; they are determined by a network of lawful connections
between the states of the source and the states of the receiver:
“The conditional probabilities used to compute noise, equivo-
cation, and amount of transmitted information (and therefore
the conditional probabilities defining the informational content
of the signal) are all determined by the lawful relations that
exist between source and signal. Correlations are irrelevant
unless these correlations are a symptom of lawful connections”
(Dretske, 1981, p. 77). It is true that, in many technological applica-
tions of information theory, statistical data are used to determine the
relevant probabilities. Nevertheless, even in these cases it is assumed
that the statistical correlations are not accidental, but manifesta-
tions of underlying lawful regularities. Indeed, there is normally
an elaborate body of theory that stands behind the attributions of
probabilities. In short, the source of the semantic character of infor-
mation is its intentionality; and information inherits its intentional
properties from the lawful regularities on which it depends.7
Dretske emphasizes the semantic character of information
because it is precisely this character that relates information to
knowledge. Even if the properties F and G are perfectly correlated
– whatever is F is G and vice-versa –, this does not assure us that
we can know that ‘x is G’ by knowing that ‘x is F’. If the correlation
between F and G is a mere coincidence, there is no information about G to be obtained from F and, hence, no knowledge.
In recent decades, some authors have abandoned the linguistic approach to the problem of scientific observation – an approach shared by the positivistic tradition and by its anti-positivistic critics – to focus on the study of the observational instruments and
methods used in natural sciences. From this perspective, Dudley
Shapere, Harold Brown and Peter Kosso agree in their attempt to
elucidate the scientific use of the term ‘observable’ by means of
the concept of information. Thus, Shapere proposes the following
analysis: “x is directly observed (observable) if: (i) information
is received (can be received) by an appropriate receptor; and (ii)
that information is (can be) transmitted directly, i.e., without
interference, to the receptor from the entity x (which is the
source of information)” (Shapere, 1982, p. 492). In turn, Brown
defines observation in science in the following way: “To observe
an item I is to gain information about I from the examination of
another item I*, where I* is an item that we (epistemically) see
and I is a member of the causal chain that produced I*” (Brown,
1987, p. 93). By using a terminology more familiar to physicists,
Kosso replaces the idea of causal chain by the idea of interaction;
but again, the concept of information becomes central when he
entity with the same ontological status as energy, and whose essen-
tial property is to manifest itself as structure when added to matter
(cfr. Stonier, 1990).
This kind of situation, in which the correlations between two points A and B are explained by lawful regularities but there is no signal propagation between them,10 shows that Dretske and Kosso are using two different concepts of information. According
to the semantic concept, information is defined by its capability of
providing knowledge. From this view, the possibility of controlling
the states at A to send information to B is not a necessary condi-
tion for defining an information channel between A and B: the only
requirement for an informational link between both points is the
possibility of knowing the state at A by looking at B. According to
the physical concept, information is a physical entity whose essen-
tial feature is its capability of being generated at one point of the
physical space and transmitted to another point. This view requires
an information-bearing signal that can be modified at the transmitter
end in order to carry information to the receiver end. Therefore, if
there is no physical link between A and B, it is impossible to define
an information channel between them: we cannot control the states
at A to send information to B.
The divergence between the semantic view and the physical view
of information acquires great relevance when the concept of infor-
mation is applied to philosophical problems. In particular, when the
concept is used to elucidate the notion of scientific observation,
this interpretative divergence becomes explicit in the case of the
so-called ‘negative experiments’. Negative experiments were origi-
nally proposed as a theoretical tool for analyzing the measurement
problem in quantum mechanics (cfr. Jammer, 1974, pp. 495–496);
but here we will only use them to show the consequences of the
choice between both concepts of information. In a negative exper-
iment, it is assumed that an event has been observed by noting
the absence of some other event; this is the case of neutral weak
currents, which are observed by noticing the absence of charged
muons (cfr. Brown, 1987, pp. 70–75). But the conceptual core of
negative experiments can be understood by means of a very simple
example. Let us suppose a tube at whose middle point a particle is emitted at t0 towards one of its ends. Let us also suppose
Cover and Thomas first define the relative entropy D(p//q) between two probability distributions p and q as D(p//q) = Σx p(x) log [p(x)/q(x)], a non-negative quantity that is equal to zero if and only if p = q.12 With these elements, they define the mutual information I(X, Y) – which we called ‘transinformation’ – as the relative entropy between the joint distribution p(x, y) and the product distribution p(x)p(y):

(4.6) I(X, Y) = Σx Σy p(x, y) log [p(x, y)/(p(x)p(y))]
Thus, the mutual information is the reduction in the uncertainty of X due to the knowledge of Y.
On the basis of these concepts, Cover and Thomas demonstrate the relationships among entropy, joint entropy, conditional entropy and mutual information, and express them in the well-known diagram in which I(X, Y) = H(X) − H(X/Y) = H(Y) − H(Y/X) = H(X) + H(Y) − H(X, Y).
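These relationships can be checked numerically. The following fragment is a minimal sketch, not taken from Cover and Thomas; the joint distribution is invented purely for illustration. It computes I(X, Y) as the relative entropy between p(x, y) and p(x)p(y), as in (4.6), and verifies that it coincides with H(X) − H(X/Y).

```python
from math import log2

# Invented joint distribution p(x, y) over two binary variables.
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
p_x = {x: sum(v for (a, _), v in p_xy.items() if a == x) for x in (0, 1)}
p_y = {y: sum(v for (_, b), v in p_xy.items() if b == y) for y in (0, 1)}
product = {(x, y): p_x[x] * p_y[y] for x in p_x for y in p_y}

def relative_entropy(p, q):
    """D(p//q) = sum_k p(k) log p(k)/q(k), with the convention 0 log 0/q = 0."""
    return sum(pk * log2(pk / q[k]) for k, pk in p.items() if pk > 0)

# mutual information as in equation (4.6): I(X, Y) = D(p(x, y) // p(x)p(y))
I = relative_entropy(p_xy, product)

# entropy H(X) and conditional entropy H(X/Y)
H_x = -sum(v * log2(v) for v in p_x.values())
H_x_given_y = -sum(v * log2(v / p_y[y]) for (x, y), v in p_xy.items())

# I(X, Y) is the reduction in the uncertainty of X due to the knowledge of Y
assert abs(I - (H_x - H_x_given_y)) < 1e-12
print(I)
```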
p is the explanatory algorithm for such data, then this result can be used to justify the choice of the shortest – the simplest – explanation of the data. Although one may disagree with this interpretation of Occam’s Razor, it must be admitted that it is an interesting and precise elucidation of the concept of simplicity usually invoked in scientific research.
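The idea admits a rough computational illustration, offered here as a toy sketch rather than anything from the paper: the length of a program’s source text stands in for the uncomputable Kolmogorov complexity, and the data string and the two candidate ‘explanatory algorithms’ are invented for the example. Occam’s choice then amounts to preferring the shortest candidate that actually reproduces the data.

```python
# Toy illustration of the algorithmic reading of Occam's Razor: source-code
# length is a crude stand-in for the length of the shortest program that
# generates the data. Both candidate programs below are invented examples,
# written as Python expressions and evaluated only for this demonstration.
data = "01" * 500

candidate_programs = {
    "short": '"01" * 500',
    "long": '"".join("0" if i % 2 == 0 else "1" for i in range(1000))',
}

# keep only the candidates that reproduce the data, then choose the shortest
adequate = {name: src for name, src in candidate_programs.items() if eval(src) == data}
chosen = min(adequate, key=lambda name: len(adequate[name]))
print(chosen)  # 'short': the simplest adequate explanation is preferred
```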
These are only some examples of the many applications of the
syntactic concept of information. Other fields where the concept is
useful are the generalization of gambling processes, the theory of
optimal investment in the stock market and the computation of error rates in optimal hypothesis testing. From this syntactic perspective,
communication theory is only an application – of course, a very
important one – of the theory of information: “While it is clear
that Shannon was motivated by problems in communication
theory, we treat information theory as a field of its own with
applications to communication theory and statistics” (Cover and
Thomas, 1991, p. viii). In summary, from the syntactic approach the
concept of information acquires a generality that makes it a powerful
formal tool for science. However, this generality is obtained at
the cost of losing its meaningful links with concepts such as knowledge or communication. From this view, the word ‘information’ does
not belong to the language of factual sciences or to ordinary
language: it has no semantic content. The concept of information is a
scientific but completely formal concept, whose ‘meaning’ only has
a syntactic dimension; its generality derives from this exclusively
syntactic nature. Therefore, the theory of information becomes a
mathematical theory, a chapter of the theory of probability: only when its concepts are semantically interpreted can the theory be applied to very different fields.
5. WHAT IS INFORMATION?
NOTES
1. Here we work with discrete situations, but the definitions can be extended to
the continuous case (cfr. Cover and Thomas, 1991, pp. 224–225).
2. In his original paper, Shannon (1948, p. 379) discusses the reason for the
choice of a logarithmic function and, in particular, of the logarithm to the
base 2 for measuring information.
3. If the natural logarithm is used, the resulting unit of information is called
‘nat’ – a contraction of natural unit –. If the logarithm to base 10 is used,
then the unit of information is the Hartley. The existence of different units
for measuring information shows the importance of distinguishing between
the amount of information associated with an event and the number of binary
symbols necessary to codify the event.
4. Shannon’s Second Theorem demonstrates that the channel capacity is the
maximum rate at which we can send information over the channel and recover
the information at the receiver with a vanishingly low probability of error
(cfr., for instance, Abramson, 1963, pp. 165–182).
WHAT IS INFORMATION? 133
5. Dretske uses Is(r) for the transinformation and Is(ra ) for the new individual
transinformation. We have adapted Dretske’s terminology in order to bring it
closer to the most usual terminology in this field.
6. In fact, the right-hand term of (2.7), when (2.5) and (2.6) are used, is: Σi Σj p(si, rj) I(si, rj) = Σi Σj p(si, rj)[I(si) − E(rj)] = Σi Σj p(si, rj) log 1/p(si) − Σi Σj p(si, rj) Σk p(sk/rj) log 1/p(sk/rj). Perhaps Dretske made the mistake referred to above by misusing the subscripts of the summations.
7. Dretske says that, in this context, it is not relevant to discuss where the inten-
tional character of laws comes from: “For our purpose it is not important
where natural laws acquire this puzzling property. What is important is
that they have it” (Dretske, 1981, p. 77).
8. I have argued for this view of scientific observation elsewhere (Lombardi,
“Observación e Información”, future publication in Analogia): if we want every state of the receiver to let us know which state of the observed entity occurred, it is necessary that the so-called “backward probabilities” p(si/rj) (cfr. Abramson, 1963, p. 99) have the value 0 or 1, and this happens in an equivocation-free channel. This explains why noise does not prevent observation: indeed, practical situations usually involve noisy channels, and much technological effort is devoted to designing appropriate filters to block the noise-bearing spurious signals. I have also argued that, unlike the informational account of observation, the causal account does not allow us to recognize (i) situations that are observationally equivalent but causally different, and (ii) situations that are physically – and, therefore, causally – identical but informationally different and which, for this reason, represent different cases of observation.
9. The experiment included in the well-known article of Einstein, Podolsky and
Rosen (1935).
10. Note that this kind of situation does not always involve a common cause. In Dretske’s example of the source transmitting to two receivers, the correlations between RA and RB can be explained by a common cause at S. But it is usually accepted that quantum EPR-correlations cannot be explained by means of a common-cause argument (cfr. Hughes, 1989). However, in both cases the correlations depend on underlying nomic regularities.
11. This does not mean that Cover and Thomas are absolutely original. For
example, Reza (1961, p. 1) considers information theory as a new chapter
of the theory of probability; however, he presents the subject in the orthodox way, in terms of communication and signal transmission. An author who adopts a completely syntactic approach is Khinchin (1957); nevertheless, his text is not as rich in applications as the book of Cover and Thomas and was not so widely used.
12. In the definition of D(p//q), the convention – based on continuity arguments
– that 0 log 0/q = 0 and p log p/0 = ∞ is used. D(p//q) is also referred to
as the ‘distance’ between the distributions p and q; however, it is not a true
distance between distributions since it is not symmetric and does not satisfy
the triangle inequality.
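A minimal numerical check of the asymmetry mentioned in note 12, with two distributions invented for the purpose:

```python
from math import log2

# Two invented distributions over the same two outcomes.
p = {0: 0.5, 1: 0.5}
q = {0: 0.9, 1: 0.1}

def D(p, q):
    """Relative entropy D(p//q) = sum_k p(k) log p(k)/q(k) (0 log 0/q = 0)."""
    return sum(pk * log2(pk / q[k]) for k, pk in p.items() if pk > 0)

print(D(p, q), D(q, p))  # the two values differ, so D is not symmetric
```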
REFERENCES
Abramson, N.: 1963, Information Theory and Coding. New York: McGraw-Hill.
Bell, D.A.: 1957, Information Theory and its Engineering Applications. London:
Pitman & Sons.
Brown, H.I.: 1987, Observation and Objectivity. New York/Oxford: Oxford
University Press.
Cover, T. and J.A. Thomas: 1991, Elements of Information Theory. New York:
John Wiley & Sons.
Dennett, D.C.: 1969, Content and Consciousness. London: Routledge & Kegan Paul.
Dretske, F.: 1981, Knowledge and the Flow of Information. Cambridge, MA: MIT
Press.
Einstein, A., B. Podolsky and N. Rosen: 1935, Can Quantum-Mechanical
Description of Physical Reality be Considered Complete? Physical Review 47:
777–780.
Hughes, R.I.G.: 1989, The Structure and Interpretation of Quantum Mechanics.
Cambridge, MA: Harvard University Press.
Jammer, M.: 1974, The Philosophy of Quantum Mechanics. New York: John
Wiley & Sons.
Khinchin, A.I.: 1957, Mathematical Foundations of Information Theory. New
York: Dover Publications.
Kosso, P.: 1989, Observability and Observation in Physical Science. Dordrecht:
Kluwer Academic Publishers.
Reza, F.M.: 1961, Introduction to Information Theory. New York: McGraw-Hill.
Shannon, C.: 1948, A Mathematical Theory of Communication. Bell System Technical Journal 27: 379–423.
Shapere, D.: 1982, The Concept of Observation in Science and Philosophy.
Philosophy of Science 49: 485–525.
Stonier, T.: 1990, Information and the Internal Structure of the Universe. London:
Springer-Verlag.
Wicken, J.S.: 1987, Entropy and Information: Suggestions for Common
Language. Philosophy of Science 54: 176–193.