WHAT IS INFORMATION?
ABSTRACT. The main aim of this work is to contribute to the elucidation of the
concept of information by comparing three different views about this matter: the
view of Fred Dretske’s semantic theory of information, the perspective adopted by
Peter Kosso in his interaction-information account of scientific observation, and
the syntactic approach of Thomas Cover and Joy Thomas. We will see that these
views involve very different concepts of information, each one useful in its own
field of application. This comparison will allow us to argue in favor of a termino-
logical ‘cleansing’: it is necessary to make a terminological distinction among
the different concepts of information, in order to avoid conceptual confusions
when the word ‘information’ is used to elucidate related concepts such as knowledge, observation or entropy.
from the right-hand term of (2.7) when (2.5) and (2.6) are used.6
Therefore, we cannot accept Dretske’s response to the critics who
accuse him of misunderstanding Shannon’s theory: his ‘interpre-
tation’ of the formulas by means of these new quantities is not
compatible with the formal structure of the theory.
It might be argued that this is a minor formal detail, but this
detail has relevant conceptual consequences. When Dretske defines
E(rj ), that is, the contribution of rj to the equivocation E as a
summation over the si (equation (2.6)), he makes the error of
supposing that this individual contribution is a function only of the particular state rj of the receiver. But the equivocation E is a magnitude that essentially depends on the communication channel. Hence, any individual contribution to E must preserve that dependence. Understanding this conceptual point allows us to
retain Dretske’s proposal by appropriately correcting his formal
approach. In order to introduce such a correction, we must define
the individual contribution of the pair (si , rj ) to the equivocation E
as E(si, rj) = log 1/p(si/rj), so that E = Σi Σj p(si, rj) E(si, rj). Since in Dretske’s proposal the individual contribution to the equivocation is not a function of the particular state si of the source (equation (2.6)), p(sA/rB) = 1 does not make the individual contribution to the equivocation E equal to zero and, therefore, he cannot guarantee that I(sA, rB) = I(sA). Only when the new formulas are properly corrected can the idea roughly expressed by Dretske be stated with precision. In summary, Dretske’s definition of informational content
says nothing that cannot be said in terms of the theory of informa-
tion adapted, in the right way, to deal with particular amounts of
information.
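To make the point concrete, the following fragment offers a minimal numerical check, not taken from the paper: the two-state source and receiver and the channel probabilities are invented for illustration, and the corrected contribution is assumed to take the form E(si, rj) = log 1/p(si/rj) introduced above. It verifies that these pairwise contributions, weighted by p(si, rj), average to the equivocation E, and that a pair with p(si/rj) = 1 contributes zero, so that I(si, rj) = I(si) for that pair.

```python
from math import log2

# A minimal numerical check (invented two-state source and receiver).
p_s = {'s1': 0.7, 's2': 0.3}                          # source probabilities p(si)
p_r_given_s = {('r1', 's1'): 1.0, ('r2', 's1'): 0.0,  # channel probabilities p(rj/si)
               ('r1', 's2'): 0.2, ('r2', 's2'): 0.8}
receivers = ('r1', 'r2')

# joint, receiver and backward probabilities
p_sr = {(s, r): p_s[s] * p_r_given_s[(r, s)] for s in p_s for r in receivers}
p_r = {r: sum(p_sr[(s, r)] for s in p_s) for r in receivers}
p_s_given_r = {(s, r): p_sr[(s, r)] / p_r[r] for (s, r) in p_sr}

# equivocation E = sum_j p(rj) E(rj), with E(rj) = sum_i p(si/rj) log 1/p(si/rj)
E_rj = {r: sum(p_s_given_r[(s, r)] * log2(1 / p_s_given_r[(s, r)])
               for s in p_s if p_s_given_r[(s, r)] > 0) for r in receivers}
E = sum(p_r[r] * E_rj[r] for r in receivers)

# corrected pairwise contribution, assumed form: E(si, rj) = log 1/p(si/rj)
E_pair = {pair: log2(1 / p_s_given_r[pair]) for pair, p in p_sr.items() if p > 0}

# (1) the pairwise contributions, weighted by p(si, rj), recover E
assert abs(E - sum(p_sr[pair] * E_pair[pair] for pair in E_pair)) < 1e-12

# (2) a pair with p(si/rj) = 1 contributes zero, so I(si, rj) = I(si) for it
assert p_s_given_r[('s2', 'r2')] == 1.0 and E_pair[('s2', 'r2')] == 0.0
print(E)
```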
If the semantic character of Dretske’s proposal is not based on
the definition of informational content, what is it based on? For
Dretske, information qualifies as a semantic concept in virtue of
the intentionality inherent in its transmission. And the ultimate
source of this intentionality is the nomic character of the regular-
ities on which the transmission of information depends. The channel
probabilities p(rj /si ) do not represent a set of mere de facto correla-
tions; they are determined by a network of lawful connections
between the states of the source and the states of the receiver:
“The conditional probabilities used to compute noise, equivo-
cation, and amount of transmitted information (and therefore
the conditional probabilities defining the informational content
of the signal) are all determined by the lawful relations that
exist between source and signal. Correlations are irrelevant
unless these correlations are a symptom of lawful connections”
(Dretske, 1981, p. 77). It is true that, in many technological applica-
tions of information theory, statistical data are used to determine the
relevant probabilities. Nevertheless, even in these cases it is assumed
that the statistical correlations are not accidental, but manifesta-
tions of underlying lawful regularities. Indeed, there is normally
an elaborate body of theory that stands behind the attributions of
probabilities. In short, the source of the semantic character of infor-
mation is its intentionality; and information inherits its intentional
properties from the lawful regularities on which it depends.7
Dretske emphasizes the semantic character of information
because it is precisely this character that relates information to
knowledge. Even if the properties F and G are perfectly correlated
– whatever is F is G and vice-versa –, this does not assure us that
we can know that ‘x is G’ by knowing that ‘x is F’. If the correlation
between F and G is a mere coincidence, there is no information about G to be obtained from F and, hence, no knowledge.
In recent decades, some authors have abandoned the linguistic approach to the problem of scientific observation – an approach shared by the positivistic tradition and by its anti-positivistic critics – to focus on the study of the observational instruments and
methods used in natural sciences. From this perspective, Dudley
Shapere, Harold Brown and Peter Kosso agree in their attempt to
elucidate the scientific use of the term ‘observable’ by means of
the concept of information. Thus, Shapere proposes the following
analysis: “x is directly observed (observable) if: (i) information
is received (can be received) by an appropriate receptor; and (ii)
that information is (can be) transmitted directly, i.e., without
interference, to the receptor from the entity x (which is the
source of information)” (Shapere, 1982, p. 492). In turn, Brown
defines observation in science in the following way: “To observe
an item I is to gain information about I from the examination of
another item I*, where I* is an item that we (epistemically) see
and I is a member of the causal chain that produced I*” (Brown,
1987, p. 93). By using a terminology more familiar to physicists,
Kosso replaces the idea of causal chain by the idea of interaction;
but again, the concept of information becomes central when he
entity with the same ontological status as energy, and whose essen-
tial property is to manifest itself as structure when added to matter
(cfr. Stonier, 1990).
This kind of situation, in which the correlations between two points A and B are explained by lawful regularities but there is no signal propagation between them,10 shows that Dretske and Kosso are using two different concepts of information. According
to the semantic concept, information is defined by its capability of
providing knowledge. From this view, the possibility of controlling
the states at A to send information to B is not a necessary condi-
tion for defining an information channel between A and B: the only
requirement for an informational link between both points is the
possibility of knowing the state at A by looking at B. According to
the physical concept, information is a physical entity whose essen-
tial feature is its capability of being generated at one point of the
physical space and transmitted to another point. This view requires
an information-bearing signal that can be modified at the transmitter
end in order to carry information to the receiver end. Therefore, if
there is no physical link between A and B, it is impossible to define
an information channel between them: we cannot control the states
at A to send information to B.
The divergence between the semantic view and the physical view
of information acquires great relevance when the concept of infor-
mation is applied to philosophical problems. In particular, when the
concept is used to elucidate the notion of scientific observation,
this interpretative divergence becomes explicit in the case of the
so-called ‘negative experiments’. Negative experiments were origi-
nally proposed as a theoretical tool for analyzing the measurement
problem in quantum mechanics (cfr. Jammer, 1974, pp. 495–496);
but here we will only use them to show the consequences of the
choice between both concepts of information. In a negative exper-
iment, it is assumed that an event has been observed by noting
the absence of some other event; this is the case of neutral weak
currents, which are observed by noticing the absence of charged
muons (cfr. Brown, 1987, pp. 70–75). But the conceptual core of
negative experiments can be understood by means of a very simple
example. Let us suppose a tube at whose middle point a particle is emitted at t0 towards one of its ends. Let us also suppose
Cover and Thomas first define the relative entropy D(p//q) between two probability distributions p and q as D(p//q) = Σx p(x) log [p(x)/q(x)], a non-negative quantity that is equal to zero if and only if p = q.12 With these elements, they define the mutual information I(X, Y) – which we called ‘transinformation’ – as the relative entropy between the joint distribution p(x, y) and the product distribution p(x)p(y):

(4.6) I(X, Y) = Σx Σy p(x, y) log [p(x, y)/(p(x)p(y))]
Thus, the mutual information is the reduction in the uncertainty of X due to the knowledge of Y.
On the basis of these concepts, Cover and Thomas demonstrate the relationships among entropy, joint entropy, conditional entropy and mutual information, and express them in the well-known diagram in which I(X, Y) = H(X) − H(X/Y) = H(Y) − H(Y/X) = H(X) + H(Y) − H(X, Y).
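These relationships can be checked numerically. The following fragment is a minimal sketch, not taken from Cover and Thomas; the joint distribution is invented purely for illustration. It computes I(X, Y) as the relative entropy between p(x, y) and p(x)p(y), as in (4.6), and verifies that it coincides with H(X) − H(X/Y).

```python
from math import log2

# Invented joint distribution p(x, y) over two binary variables.
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
p_x = {x: sum(v for (a, _), v in p_xy.items() if a == x) for x in (0, 1)}
p_y = {y: sum(v for (_, b), v in p_xy.items() if b == y) for y in (0, 1)}
product = {(x, y): p_x[x] * p_y[y] for x in p_x for y in p_y}

def relative_entropy(p, q):
    """D(p//q) = sum_k p(k) log p(k)/q(k), with the convention 0 log 0/q = 0."""
    return sum(pk * log2(pk / q[k]) for k, pk in p.items() if pk > 0)

# mutual information as in equation (4.6): I(X, Y) = D(p(x, y) // p(x)p(y))
I = relative_entropy(p_xy, product)

# entropy H(X) and conditional entropy H(X/Y)
H_x = -sum(v * log2(v) for v in p_x.values())
H_x_given_y = -sum(v * log2(v / p_y[y]) for (x, y), v in p_xy.items())

# I(X, Y) is the reduction in the uncertainty of X due to the knowledge of Y
assert abs(I - (H_x - H_x_given_y)) < 1e-12
print(I)
```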
p is the explanatory algorithm for such data, then this result can be used to justify the choice of the shortest – the simplest – explanation of the data. Although one may disagree with this interpretation of Occam’s Razor, it must be admitted that it is an interesting and precise elucidation of the concept of simplicity usually invoked in scientific research.
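The idea admits a rough computational illustration, offered here as a toy sketch rather than anything from the paper: the length of a program’s source text stands in for the uncomputable Kolmogorov complexity, and the data string and the two candidate ‘explanatory algorithms’ are invented for the example. Occam’s choice then amounts to preferring the shortest candidate that actually reproduces the data.

```python
# Toy illustration of the algorithmic reading of Occam's Razor: source-code
# length is a crude stand-in for the length of the shortest program that
# generates the data. Both candidate programs below are invented examples,
# written as Python expressions and evaluated only for this demonstration.
data = "01" * 500

candidate_programs = {
    "short": '"01" * 500',
    "long": '"".join("0" if i % 2 == 0 else "1" for i in range(1000))',
}

# keep only the candidates that reproduce the data, then choose the shortest
adequate = {name: src for name, src in candidate_programs.items() if eval(src) == data}
chosen = min(adequate, key=lambda name: len(adequate[name]))
print(chosen)  # 'short': the simplest adequate explanation is preferred
```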
These are only some examples of the many applications of the
syntactic concept of information. Other fields where the concept is
useful are the generalization of gambling processes, the theory of
optimal investment in the stock market and the computation of error rates in optimal hypothesis testing. From this syntactic perspective,
communication theory is only an application – of course, a very
important one – of the theory of information: “While it is clear
that Shannon was motivated by problems in communication
theory, we treat information theory as a field of its own with
applications to communication theory and statistics” (Cover and
Thomas, 1991, p. viii). In summary, from the syntactic approach the
concept of information acquires a generality that makes it a powerful
formal tool for science. However, this generality is obtained at
the cost of losing its meaningful links with concepts such as knowledge or communication. From this view, the word ‘information’ does
not belong to the language of factual sciences or to ordinary
language: it has no semantic content. The concept of information is a
scientific but completely formal concept, whose ‘meaning’ only has
a syntactic dimension; its generality derives from this exclusively
syntactic nature. Therefore, the theory of information becomes a
mathematical theory, a chapter of the theory of probability: only when its concepts are semantically interpreted can the theory be applied to very different fields.
5. WHAT IS INFORMATION?
NOTES
1. Here we work with discrete situations, but the definitions can be extended to
the continuous case (cfr. Cover and Thomas, 1991, pp. 224–225).
2. In his original paper, Shannon (1948, p. 379) discusses the reason for the
choice of a logarithmic function and, in particular, of the logarithm to the
base 2 for measuring information.
3. If the natural logarithm is used, the resulting unit of information is called
‘nat’ – a contraction of natural unit –. If the logarithm to base 10 is used,
then the unit of information is the Hartley. The existence of different units
for measuring information shows the importance of distinguishing between
the amount of information associated with an event and the number of binary
symbols necessary to codify the event.
4. Shannon’s Second Theorem demonstrates that the channel capacity is the
maximum rate at which we can send information over the channel and recover
the information at the receiver with a vanishingly low probability of error
(cfr., for instance, Abramson, 1963, pp. 165–182).
WHAT IS INFORMATION? 133
5. Dretske uses Is(r) for the transinformation and Is(ra ) for the new individual
transinformation. We have adapted Dretske’s terminology in order to bring it
closer to the most usual terminology in this field.
6. In fact, the right-hand term of (2.7), when (2.5) and (2.6) are used, is: Σi Σj p(si, rj) I(si, rj) = Σi Σj p(si, rj)[I(si) − E(rj)] = Σi Σj p(si, rj) log 1/p(si) − Σi Σj p(si, rj) Σk p(sk/rj) log 1/p(sk/rj). Perhaps Dretske made the mistake referred to above by misusing the subscripts of the summations.
7. Dretske says that, in this context, it is not relevant to discuss where the inten-
tional character of laws comes from: “For our purpose it is not important
where natural laws acquire this puzzling property. What is important is
that they have it” (Dretske, 1981, p. 77).
8. I have argued for this view of scientific observation elsewhere (Lombardi,
“Observación e Información”, future publication in Analogia): if we want every state of the receiver to let us know which state of the observed entity occurred, it is necessary that the so-called “backward probabilities” p(si/rj) (cfr. Abramson, 1963, p. 99) have the value 0 or 1, and this happens in an equivocation-free channel. This explains why noise does not prevent observation: indeed, practical situations usually involve noisy channels, and much technological effort is devoted to designing appropriate filters to block the noise-bearing spurious signals. I have also argued that, unlike the informational account of observation, the causal account does not allow us to recognize (i) situations that are observationally equivalent but causally different, and (ii) situations that are physically – and, therefore, causally – identical but informationally different and which, for this reason, represent different cases of observation.
9. The experiment included in the well-known article of Einstein, Podolsky and
Rosen (1935).
10. Note that this kind of situation does not always involve a common cause. In Dretske’s example of the source transmitting to two receivers, the correlations between RA and RB can be explained by a common cause at S. But it is usually accepted that quantum EPR-correlations cannot be explained by means of a common-cause argument (cfr. Hughes, 1989). However, in both cases the correlations depend on underlying nomic regularities.
11. This does not mean that Cover and Thomas are absolutely original. For
example, Reza (1961, p. 1) considers information theory as a new chapter
of the theory of probability; however, he presents the subject in the orthodox way, in terms of communication and signal transmission. An author who adopts a completely syntactic approach is Khinchin (1957); nevertheless, his text is not as rich in applications as the book of Cover and Thomas and was not so widely used.
12. In the definition of D(p//q), the convention – based on continuity arguments
– that 0 log 0/q = 0 and p log p/0 = ∞ is used. D(p//q) is also referred to
as the ‘distance’ between the distributions p and q; however, it is not a true
distance between distributions since it is not symmetric and does not satisfy
the triangle inequality.
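A minimal numerical check of the asymmetry mentioned in note 12, with two distributions invented for the purpose:

```python
from math import log2

# Two invented distributions over the same two outcomes.
p = {0: 0.5, 1: 0.5}
q = {0: 0.9, 1: 0.1}

def D(p, q):
    """Relative entropy D(p//q) = sum_k p(k) log p(k)/q(k) (0 log 0/q = 0)."""
    return sum(pk * log2(pk / q[k]) for k, pk in p.items() if pk > 0)

print(D(p, q), D(q, p))  # the two values differ, so D is not symmetric
```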
REFERENCES
Abramson, N.: 1963, Information Theory and Coding. New York: McGraw-Hill.
Bell, D.A.: 1957, Information Theory and its Engineering Applications. London:
Pitman & Sons.
Brown, H.I.: 1987, Observation and Objectivity. New York/Oxford: Oxford
University Press.
Cover, T. and J.A. Thomas: 1991, Elements of Information Theory. New York:
John Wiley & Sons.
Dennett, D.C.: 1969, Content and Consciousness. London: Routledge & Kegan Paul.
Dretske, F.: 1981, Knowledge and the Flow of Information. Cambridge, MA: MIT
Press.
Einstein, A., B. Podolsky and N. Rosen: 1935, Can Quantum-Mechanical
Description of Physical Reality be Considered Complete? Physical Review 47:
777–780.
Hughes, R.I.G.: 1989, The Structure and Interpretation of Quantum Mechanics.
Cambridge, MA: Harvard University Press.
Jammer, M.: 1974, The Philosophy of Quantum Mechanics. New York: John
Wiley & Sons.
Khinchin, A.I.: 1957, Mathematical Foundations of Information Theory. New
York: Dover Publications.
Kosso, P.: 1989, Observability and Observation in Physical Science. Dordrecht:
Kluwer Academic Publishers.
Reza, F.M.: 1961, Introduction to Information Theory. New York: McGraw-Hill.
Shannon, C.: 1948, A Mathematical Theory of Communication. Bell System Technical Journal 27: 379–423.
Shapere, D.: 1982, The Concept of Observation in Science and Philosophy.
Philosophy of Science 49: 485–525.
Stonier, T.: 1990, Information and the Internal Structure of the Universe. London:
Springer-Verlag.
Wicken, J.S.: 1987, Entropy and Information: Suggestions for Common
Language. Philosophy of Science 54: 176–193.