Artificial Intelligence Review 10: 21-35, 1996.
© 1996 Kluwer Academic Publishers. Printed in the Netherlands.
On the Referential Competence of Some
Machines*
DIEGO MARCONI
Facoltà di Lettere e Filosofia, Palazzo Tartara, V. G. Ferraris 109, 13100 Vercelli,
Italy, and Center for Cognitive Science, Via Lagrange 3, 10123 Torino, Italy, E.U.
E-mail: erme@rs950.cisi.unito.it
Abstract. The main reason why systems of natural language understanding are often
said not to "really" understand natural language is their lack of referential competence.
A traditional system, even an ideal one, cannot relate language to the perceived
world, whereas - obviously - a human speaker can. The paper argues that the recognition abilities underlying the application of language to the world are indeed a
prerequisite of semantic competence.
If a system of the traditional kind were appropriately endowed with the analytic
abilities of a system of artificial vision, it would display (partial) referential competence: e.g. it would be able to verify sentences. In response to Searle's objections
to the so-called "robot reply", the paper argues that such an integrated system could
not be considered as essentially on a par with a purely inferential system of the
traditional kind, unless one were prepared to regard even the human understanding
system as "purely syntactic" (and therefore incapable of genuine understanding).
Key Words: natural language, vision, understanding, semantic competence, reference.
1. WHY MACHINES DO NOT UNDERSTAND NATURAL LANGUAGE
There are performances which stand in a criterial relation to understanding, in Wittgenstein's sense. For example, if a person can summarize a text we say that he has understood it (whereas if he cannot, we doubt that he understood). If a person can answer questions concerning the topics a text is about, and his answers appear to be based on the information contained in the text, we say that person has understood the text - whereas if he cannot answer, it is legitimate to raise doubts about his understanding. If a person can correctly translate the text into another language, we say she understood (but if she cannot, and yet does know the second language, we are inclined to say that she did not understand). Such are the "paradigmatic" cases in which we say that somebody understands a text in a natural language: Wittgenstein (1953) would say that our use of such words as 'understanding' and 'to understand' is intertwined with such performances and the ability to carry them out. Of course, understanding is not identical with summarizing, or answering questions, or translating (or at any rate, it would be highly unnatural to say so). However, we probably learn
how to use the concept of understanding by learning how to assess such
performances.
1.1. Natural-language understanding systems
Today, we have artificial systems which can carry out such tasks, with different
degrees of success (for a recent survey, see Gazdar 1993). They are called
"natural-language understanding systems" precisely because they are capable
of one or the other among such performances.1 However, in spite of the fact
that these systems can carry out the very performances on the basis of which
we normally say of a human being that she understands a language, many would
say that such systems do not really understand natural language.
Of course, the present systems are not as good as human beings at carrying
out such tasks: their translations are often clumsy,2 their summaries unintelligent, and the questions they can answer relatively few in number (Gazdar 1993,
pp. 162-163, 168). Moreover, the existing systems can (usually) carry out only one or the other of such tasks: in contrast with human beings, they are either
translators or question-answering devices or automatic abstractors. Finally, the
range of texts that each system can process is strongly restricted, lexically
at any rate. In order to overcome such limitations, more is required than just
building huge lexical databases or integrating complex systems into one big
system: we need to solve problems which have not even been formulated clearly
so far, from metaphoric language to pragmatic competence and "contextual"
knowledge.
However, even though the AI community is concentrating on these kinds of limitations, I surmise that it is not essentially because of them that natural-language processing systems are said not to really understand natural language.
To realize this, imagine we have been successful in building a very sophisticated "understanding" system of the standard type. Such a system would have
a perfect syntactic analyzer, a vast lexical database, and a semantic interpreter
capable of compositionally constructing fully analytic semantic representations:
they would be as explicit as we need them to be in order for the system to carry
out - thanks to a reasoning module - all the inferences that could be plausibly
attributed to a competent, or even a very competent speaker. From 'There are four
elephants in the living-room' our system would infer that there are four large
animals in the living-room, that there are four elephants in the house, that there
is an even number of elephants in the living-room, that there are higher mammals
(to be more precise, proboscideans) in the living-room; it could even infer that
the living-room's furniture is likely to be badly spoiled. Let us call 'inferential
competence' (Marconi 1987, 1991) a speaker's ability to manage a network of
connections among words, underlying such performances as semantic inference,
paraphrase, definition, retrieval of a word from its definition, synonym-finding,
and so forth; we could then say that the ideal system's inferential competence
would be satisfactory to the highest degree.
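By way of illustration only, the kind of inferential competence just described can be pictured as lookup over a hand-coded lexical network; the following Python fragment is a deliberately minimal sketch, with all entries and names invented, not a description of any actual system.

    # A minimal sketch of purely inferential competence: a hand-coded lexical network
    # in which hypernym links license inferences of the kind attributed above to the
    # ideal system. The entries and names are invented for illustration.

    HYPERNYMS = {
        "elephant": ["large animal", "proboscidean", "higher mammal"],
        "living-room": ["room", "part of a house"],
    }

    def inferable(premise_terms, conclusion_term):
        """A conclusion term follows if it is a hypernym of some term in the premise."""
        return any(conclusion_term in HYPERNYMS.get(t, []) for t in premise_terms)

    premise = ["elephant", "living-room"]       # 'There are four elephants in the living-room'
    print(inferable(premise, "proboscidean"))   # True: there are proboscideans in the living-room
    print(inferable(premise, "flamingo"))       # False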
1.2. Inabilities of the ideal system
Why would we say, even of such a system, that it really does not understand
the language it can process? What is it that the system cannot do? Some philosophers would say that the system doesn't know the truth-conditions of the sentences
it processes. However, knowledge of truth-conditions, to the extent that it is
relevant to understanding, is partly structural competence (Partee 1981)3 and partly
inferential competence, and we just assumed that the system possesses both.
Notice in particular that it would not be right to say that the system doesn't
know a sentence's truth-conditions in the sense that it cannot establish, for each
situation S, whether the sentence is true or false in S. For if situation S is described
in language, the system can indeed determine whether a sentence is true or false
in S. This is exactly what systems do which (like our system) can answer questions relative to a text's topic: such systems determine whether certain sentences
(corresponding to the questions) are true or false in the situations described by
the texts they have processed. As we assumed that there are no limitations - either
lexical or syntactic or discursive or of any other kind - to the texts the system
can process, we conclude that our system can indeed determine whether a given
sentence is true or false in any situation which can be described in the language
the system can interpret.
On the other hand, our system cannot establish whether a sentence is true or
false in a situation which is not given to it through language. For example, it
cannot determine whether a sentence is true or false in the real world: it cannot
verify the sentence, unless the "real world" is given to it through a linguistic
description. If you place the system in a room and require it to evaluate the
sentence 'There are at least four seats in this room', the system won't do it.
A strictly related inability can be highlighted by focussing on the reference
of single words. We assumed that the system has remarkable inferential ability:
for example, it can draw many inferences concerning elephants, i.e. inferences
involving sentences where the word 'elephant' occurs. Still, one could claim - perhaps Searle (1980) would claim - that such inferences as the system can
carry out are not about elephants at all: strictly speaking, it would not even be
correct to say that they involve sentences in which the word 'elephant' occurs.
The system can indeed manipulate strings of symbols including a symbol which
materially coincides with the English word 'elephant'. Such a symbol, however,
is devoid of meaning for the system: emphatically, it does not mean "elephant"
(i.e. it does not mean what the English word 'elephant' means). Whatever conclusions the system can infer are not in themselves about elephants: they are
strings of symbols, meaningless for the system, which we (the system's users)
interpret as pertaining to elephants.
There is much that is wrong with this familiar argument. First of all, it is
certainly wrong to oppose knowledge of meaning and the ability to manipulate
symbols, as if "genuine" knowledge of meaning were forever something else with
respect to symbol-manipulating ability. On the contrary, knowing the meaning
of something is, essentially, being able to use it. It is still true that "The account
according to which understanding a language consists in being able to use it
. . . is the only account now in the field" (Putnam 1979, p. 199). The problem is not whether knowledge of meaning can be "reduced" to symbol-manipulation, but what kind of symbol-manipulating abilities count as knowledge of meaning - as it is obvious that many such abilities would not be regarded as adequate. Secondly, it is wrong to say that the system does not know the meaning of 'elephant' because it does not know the word's reference. There is a sense in which the
system does know the reference of 'elephant': it knows that the word refers
to elephants, i.e. to large mammals, Proboscideans, living in Africa or India
(or zoos), etc. For example, it would certainly be incorrect to say that, for
all the system knows, its conclusions might be about flamingos rather than
elephants. The system can very well tell elephants (mammals, Proboscideans,
etc.) from flamingos, which are birds, waders, pink or white (not grey like
elephants) etc.
Nevertheless, the argument does point to real inadequacies of the system and
its understanding of language. One thing the system cannot do is recognize
elephants in the real world or in a photograph, any more than it can verify a sentence
about elephants. It's the system's referential incompetence which underlies our
feeling that natural-language understanding systems are only metaphorically such,
for they do not really understand natural language.
2. SEMANTIC COMPETENCE AND RECOGNITION
It follows that, in order to have a genuinely competent artificial system - a system
that could really understand natural language - we ought to build a referentially competent system, i.e. a system which could apply words to the real world.
Referential competence is the ability underlying such performances as naming,
answering questions concerning the obtaining situation, obeying orders such as
'Close the door!', following directions etc. These performances are partly based
on the ability to recognize objects and actions. This, in turn, is not a purely
linguistic ability; under a certain description it can even be regarded as non-linguistic (Marconi, in press). This may have led some to deny that referential
competence is part of semantic competence. For example, Wilks (1982) remarked
that he knew enough of the meaning of 'uranium' to use the word effectively
even though he could not recognize uranium, and knew no one who could.
Therefore, referential ability is not necessary for semantic competence. We say
that we know the meaning of 'X' even when we are unable to identify instances
of X: thus 'knowing the meaning' is not used so as to include referential competence.
Wilks's contention that recognitional ability is not necessary for semantic
competence can be construed in two different ways, 'weak' and 'strong'. In the
weak interpretation, it is claimed that one can have some competence relative
to a word although one lacks full referential competence relative to that same
word. In this sense the thesis is true: if I know a lot about uranium without
being able to recognize uranium in the real world, I will not be denied competence relative to the word 'uranium'. Note, however, that such inability does
not necessarily amount to a complete lack of referential competence. I cannot
recognize uranium; but if I am presented - on a tabletop, say - with a fruit I don't
know, an animal I never saw before and a bit of uranium and I am asked to
pick the uranium, I will easily do it. As far as uranium is concerned, I no doubt
lack full recognitional ability but I do possess some ability to discriminate. It
makes sense to regard such ability as part of referential competence.
Wilks's thesis in the weak version does not entail that recognitional ability
is irrelevant to semantic competence. In the strong version, however, the thesis
holds that it is possible to have full semantic competence relative to a word
although one has no referential competence associated with it - not even any
ability to discriminate, as in the case of uranium. In this form, the thesis is
(first of all) very hard to test: if a speaker has even limited competence relative
to a word, she usually has some ability to discriminate in its application. If one
knows anything at all concerning opals one knows that they are precious stones,
so one can tell opals from cats or books. If the word 'pangolin' is not totally
unknown to you, you know that pangolins are animals (not celestial bodies or
Indian military men). On the other hand, in ordinary cases the strong thesis seems
false: if I cannot tell cats from tigers, or violins from cellos, my competence
will be considered as defective, whatever my zoological or musical learning.
The question arises, however, of what degree or amount of recognitional
ability counts as referential competence. Suppose we take it as established that
some degree of recognitional ability is a necessary condition of referential competence; is any degree of recognitional ability a sufficient condition of referential
competence? Secondly, any view which connects semantic competence with the
ability to recognize referents and verify sentences is open to the charge of verificationism, the discredited theory according to which to know the meaning of
a sentence is to be able to verify it. Thus the second question is: does the view
that referential competence involves recognition and verification abilities entail
verificationism?
2.1. Recognition procedures are fallible
Concerning the first question: philosophers of a realist bent - partisans of the idea
of objective reference - have pointed out that possession of the methods of inquiry
by which we ascertain whether something is or is not gold is neither a necessary, nor a sufficient condition of referential knowledge about the word 'gold'.
Knowing the reference of 'gold' is knowing that the word refers to gold: which
does not require that one can recognize gold by the chemical analyst's or the
jeweler's methods. On the other hand, such methods do not guarantee that one
has access to the reference of 'gold' (witness Putnam's (1975) science-fictional
examples: we might discover that such methods are and always were defective,
for they fail to pick out gold and nothing but gold).
This may be countered by challenging the very notion of objective reference
(on the one hand) and on the other hand by pointing out that the alleged knowledge "that 'gold' refers to gold" has no content whatsoever as long as it is not
explicated by some "i.e." clause.4 But there is another objection, which cannot
be so dismissed. The "methods" one hints at while trying to describe recognitional ability have little in common with the sophisticated scientific methods
that are discussed in connection with the realists' objections. A normal speaker's
application of words such as 'cat', or 'water', or 'gold' is based on rough, macroscopic identification criteria, close to those underlying pattern recognition: not
on DNA, or chemical or spectrographic analysis. However, such macroscopic
recognition criteria are even more conspicuously fallible and unreliable than
"scientific" methods. They make us identify hydrogen peroxide as water, iron
pyrites as gold, plastic imitation wood as wood; under certain conditions,
even porcelain cats as cats. But of course, 'cat' does not refer to porcelain
cats nor 'gold' to iron pyrites. It is thus incorrect to label 'referential competence' a recognition ability which is so far removed from actually identifying
reference.
In reply to this objection, I should like to make three points. First of all,
there is the question of what kind of recognition abilities count for semantic
competence. If a person mistook a porcelain cat for a cat, and called it 'cat', would
we say that she is linguistically incompetent? She made a mistake all right, but
was that a mistake in the use of language? If not, that suggests that the kind of
recognition ability which counts as constitutive of semantic competence in the
case of a word such as 'cat' or 'wood' (as opposed to words such as 'diabetes'
or 'neutrino') is just the availability of those rough identification criteria which
are said to be "so far removed from actually identifying reference".
The second point concerns the phrase 'under certain conditions' ("under certain
conditions, one could mistake a porcelain cat for a cat"). It should be pointed
out that, under certain conditions, one could make any mistake whatsoever. One
could even fail to recognize a cube, under certain conditions of light. Would
that make one incompetent in the use of the word 'cube'? Certainly not. Would
that show, then, that recognitional ability is irrelevant to semantic competence?
Not that either: for if a person, endowed with normal sight, failed to recognize the
cube under normal conditions, we would indeed call him incompetent.
This leads me to the third point: it is impossible in principle to establish
what amount or degree of recognitional ability counts as referential competence. The amount and nature of the recognitional ability which is regarded as
relevant or even necessary to linguistic competence varies widely from word
category to word category and even from word to word within the same category
('common cold' vs. 'sickle-cell anemia'), depending on social and natural factors.
An artificial system that is to be made referentially competent in the sense
in which a normal speaker is would have to be taught very different abilities in
different cases. However, it would not have to be made into a fully competent
encyclopedic scientist: for it is not that kind of recognition ability which we regard
as relevant to linguistic competence in most cases.
2.2. Verificationism
Now for the second question, i.e. the charge of verificationism. Notice first of
all that it is not my intention to identify semantic competence with the ability
to verify. The question is, at most, whether verification abilities are relevant to
understanding. I believe they are in the limited sense that has been stressed above.
Let us repeat: as far as words such as 'cat', 'yellow' or 'walk' are concerned,
the inability to verify (under normal circumstances) simple sentences in which
they occur would be regarded as evidence of semantic incompetence. Which of
course does not mean that the same should be said of such words as 'although',
'eight', or 'function'.5 Nor does it mean that recognition (and verification)
abilities are a sufficient condition of semantic competence. One standard objection to verificationism is based on the fact that there are lots of sentences we seem
to understand, although we have no idea of how to go about verifying them:
sentences like 'God exists', 'Positrons are made of quarks', 'Aristotle liked
onions' (Fodor 1981, p. 216). This objection is irrelevant to the view I have
been defending. I do not hold that to understand a sentence is, or requires, to
know how to verify it. The understanding of a sentence is a complex process
drawing from both structural and lexical competence; lexical competence, in
its turn, is partly inferential and partly referential. For some sentences, the ability
to verify them is a necessary condition of linguistic competence in the above
specified sense. But note that for many sentences, the process by which they
are validated does not directly involve "the real world", or "experience", or
perception. Neither 'Positrons are made of quarks' nor 'Aristotle liked onions'
would be validated by being directly correlated with perceptual input. This does
not mean that such sentences only appear to be about the real world (but are
really about, say, our database). "To be about the real world" is an intricate and
obscure notion, which certainly does not reduce to being verifiable by appeal
to perceptual input. Fodor (1981, p. 219) may well be right that "being about
the real world" is a holistic notion, meaning that a proposition may be said to
be about the world thanks to a very roundabout itinerary through many layers
of our knowledge, both theoretical and perceptual. Thus:
a) not all sentences that may be said to be "about the real world" are therefore
to be verified in perception;
b) verification is not necessarily verification-in-perception;
c) understanding is not identical with, nor does it require, the availability of a method of verification, in any sense;
d) yet for some sentences, the ability to verify them is a necessary condition
of understanding.
It could still be objected that, even within such limitations, the ability to verify
a sentence is at most a symptom of understanding; it cannot be a necessary condition. The argument runs as follows. Most cases of understanding are cases of
understanding in absentia: in most cases, the texts and speeches we understand
- daily newspapers, novels, our friends' accounts of their feats - are not about
the scene we have under our eyes at the moment of understanding. In all such
cases, verification is simply impossible. There are, indeed, exceptions: there
are cases of understanding in praesentia. Examples are: reading the instructions
for a household appliance while looking at the machine itself and its controls;
obeying an order such as 'Take the book in front of you and read the first line
on p. 28'; listening to a person who is telling us about his medical condition.
But such cases, though frequent, are not the most frequent. To account for
natural language understanding is essentially to account for understanding in
absentia: verification simply does not come into the picture.
Moreover, it has been plausibly argued (Johnson-Laird 1983, p. 246) that the
understanding of fictional discourse is not essentially different from the understanding of non-fictional discourse: the distinction, all-important as it is in other
respects, is irrelevant from the standpoint of language processing. A fortiori,
one could say, in absentia understanding cannot differ in kind from understanding
in praesentia.6 So, even in the case of understanding in praesentia the possibility of verification cannot be crucial.
However, the argument as it stands fails to draw the (obvious) distinction
between not being in a position to verify a sentence and being unable to verify
it. Right now, I am not in a position to verify the sentence 'There are six people
sitting in the next room', but it would clearly be inappropriate to say that I am
unable to verify it, or that I don't know how to verify it. The clearest cases of
understanding in absentia seem to be of this type: they are cases in which one
is not in a position to verify whatever is asserted, but would know how to do it
(of course, one is usually unwilling to). The same purpose would be served by
a distinction between the ability to verify a sentence and the possibility of
verifying it: I may have the ability without there being the objective possibility,
or vice-versa. What we lack in the case of in absentia understanding is the
possibility of verification: which proves nothing concerning our possessing the
ability to verify, or the role it plays in understanding.
However, if what matters (when it does matter) is not actual verification but
the ability to verify, why should we want our system to carry out actual verifications? The answer is simple: it is the only way to effectively show that the
system does possess the required abilities. As long as we do not face the problem
of actual verification, we'll tend to have systems construct semantic representations (of single sentences or whole texts) which are nothing but formulas of a
more or less formal language, themselves in need of interpretation. The only
way to build a system to which we would be prepared to grant genuine semantic
competence is to build a system that can actually verify natural language sentences. Of course understanding - even understanding in praesentia - does not
consist in or require actual verification, but there is no better evidence of understanding than actual verification.
3. A REFERENTIALLY COMPETENT ARTIFICIAL SYSTEM
So let us go back to our artificial system, and wonder what would be required
for it to be referentially competent. First of all, the system must be able to perceive
- typically, to see7 - the real world, just like us. For an artificial system, the beginning of referential competence is to be found in artificial vision.
A possible misunderstanding must be avoided. There is a naive picture of
the relation of perception to semantic competence which keeps coming back,
in spite of Wittgenstein's attempts at dispelling it and of Putnam's more recent
criticism (in Putnam 1981). In this naive view, part of semantic competence is
represented by a certain store of mental images associated with words, such as
the image of a dog, of a table, of a running man. Thanks to these images we
can apply to the real world words such as 'dog', 'table' or 'run': this is done
by comparing our images with the output of perception (particularly, of vision).
Today, this picture may be somehow supported by reference to prototype theory
(although the theory does not license it; see Rosch 1978). Now, the point is not
that we do not have mental images: perhaps there are good reasons to believe that
we do have something of the kind (see Tye 1991). The point is that, in the
naive picture, the images' use in relation to the real world or the perceptual
scene is left undescribed. In Putnam's (1981, p. 19) words, "one could possess
any system of images you please and not possess the ability to use the sentences in situationally appropriate ways . . . For the image, if not accompanied
by the ability to act in a certain way, is just a picture, and acting in accordance
with a picture is itself an ability that one may or may not have". In other words,
in the naive picture the whole explanatory burden is carried by the relation of
comparison between an image and the perceptual scene; but such a relation (or
process, or whatever it is) is itself unexplained.
Anyway, systems of artificial vision (as described by Rosenfeld 1988) are
not organized like that: there is no store of images to be compared with the
perceptual scene. Classes of objects the system can recognize (e.g. tables or cubes)
are identified with classes of shapes, which are themselves interpreted as relational structures, i.e. labelled graphs where the nodes represent object parts and
the arcs represent relations between parts: a node is labelled with an ideal property
value or a set of constraints on such a value, whereas an arc is labelled with a
relation value, or a set of constraints on such a value. For example, a table is
identified with a class of shapes expressed by a relational structure, whose nodes
represent parts of the table (top, legs) while the arcs represent relations between
two parts. Node and arc labels are not absolute values, but constraints on possible
values. The problem of recognizing a table in a scene is then, as Rosenfeld (1988,
p. 286) says, the problem of "finding subgraphs of the scene graph that are
close matches to the object graph, or that satisfy the constraints defined by the
object graph". The scene graph is the result of a sequence of processing stages.
In the first stage, the image provided by a sensor is digitized, i.e. converted
into an array of numbers "representing brightness or color values at a discrete
grid of points in the image plane" (Rosenfeld 1988, p. 266), or average values
in the neighborhoods of such points (elements of the array are called pixels).
In the second stage (segmentation), pixels are classified according to several
criteria, such as brightness, or belonging to the same local pattern (e.g. a vertical
stroke). In the third stage (resegmentation), parts of the image such as rectilinear strokes, curves, angles etc. are explicitly recognized and labelled. In the
fourth stage, properties and relations of such local patterns are identified: both
their geometric properties and relations, and (e.g.) the distribution of grey levels
through a given local pattern, color relations between two patterns etc. The
scene graph's nodes are the local patterns with their properties, and its arcs are
the relations among local patterns, with their values. To recognize a table in
a scene is thus - as we saw - to find a subgraph of the scene graph which satisfies the constraints associated with the table-graph. In practice, recognition is complicated by several factors: it is hard to make it invariant with respect to different illumination conditions, and 3-D vision raises many additional problems. In what follows I shall disregard these kinds of problems (though they are of course far from trivial) to focus on others.
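To make the constraint-matching picture concrete, the following toy fragment sketches recognition as subgraph matching in the spirit of Rosenfeld's description; the object model, the scene graph, the property names and the brute-force matcher are all invented placeholders for the far more sophisticated machinery of real vision systems.

    # A toy sketch (illustrative assumptions throughout) of recognition as constraint
    # matching: an object model is a labelled graph whose node labels constrain part
    # properties and whose arc labels constrain relations between parts; recognizing
    # the object is finding a subgraph of the scene graph satisfying those constraints.
    from itertools import permutations

    TABLE_MODEL = {
        "nodes": {"top": lambda p: p.get("shape") == "slab",
                  "leg": lambda p: p.get("shape") == "stick" and p.get("orientation") == "vertical"},
        "arcs": {("top", "leg"): lambda r: r == "above"},
    }

    def matches(model, scene):
        """Map model nodes onto distinct scene regions so that every node and arc
        constraint is satisfied (brute-force subgraph matching)."""
        model_nodes = list(model["nodes"])
        for assignment in permutations(scene["regions"], len(model_nodes)):
            binding = dict(zip(model_nodes, assignment))
            if not all(model["nodes"][n](scene["regions"][binding[n]]) for n in model_nodes):
                continue
            if all(check(scene["relations"].get((binding[a], binding[b])))
                   for (a, b), check in model["arcs"].items()):
                return binding
        return None

    # A toy scene graph, as it might come out of segmentation and feature extraction.
    scene = {
        "regions": {"r1": {"shape": "slab"},
                    "r2": {"shape": "stick", "orientation": "vertical"}},
        "relations": {("r1", "r2"): "above"},
    }
    print(matches(TABLE_MODEL, scene))   # {'top': 'r1', 'leg': 'r2'}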
From our viewpoint, the relational structure associated with the class of tables, together with the matching algorithm which applies it to the analyzed scene, represents the content of the system's8 referential competence relative to the word 'table'. If a system were endowed with this kind of competence, plus a minimal amount of structural semantic competence (to repeat: the ability to determine the meaning of a complex expression from its syntactic structure and the meanings of its constituents) and inferential competence, it could verify sentences such as 'There is a vase on a table', 'There is a vase on the table', 'There are two small chairs in front of the table', etc.
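A correspondingly minimal sketch of the verification step might run as follows: the structural analysis of a sentence of the form 'There is an X on the Y' dispatches to per-word referential algorithms, which are then checked against the relations of the analyzed scene. Again, the lexicon, the scene description and the function name are illustrative assumptions, not features of any existing system.

    # Illustrative sketch (not an existing system): verifying 'There is a vase on the
    # table' by composing per-word referential algorithms over an analyzed scene.

    def verify_there_is_on(figure_word, ground_word, scene, lexicon):
        """Verify a sentence of the form 'There is a FIGURE on the GROUND'."""
        figures = [r for r, props in scene["regions"].items() if lexicon[figure_word](props)]
        grounds = [r for r, props in scene["regions"].items() if lexicon[ground_word](props)]
        return any(scene["relations"].get((f, g)) == "on" for f in figures for g in grounds)

    # One referential algorithm per lexical item, here reduced to simple property tests.
    lexicon = {"vase": lambda p: p.get("kind") == "vase",
               "table": lambda p: p.get("kind") == "table"}
    scene = {"regions": {"r1": {"kind": "vase"}, "r2": {"kind": "table"}},
             "relations": {("r1", "r2"): "on"}}
    print(verify_there_is_on("vase", "table", scene, lexicon))   # True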
3.1. Searle and the robot reply
Before we go on to look at the many problems that come up as soon as we try to extend the system's referential competence, let's pause to consider a standard objection to the claim that such a system would understand language in a fuller sense than a traditional, purely inferential system would. This is Searle's answer to what he labels "the robot reply". The robot reply is one possible objection to Searle's own well-known Chinese room argument; an objection that Searle himself formulates and immediately dismisses.
The robot reply goes like this (Searle 1980, p. 420):
Suppose we put a computer inside a robot, and this computer would not just take in formal
symbols as input and give out formal symbols as output, but rather would actually operate
the robot in such a way that the robot does something very much like perceiving, walking,
. . . anything you like. The robot would, for example, have a television camera attached to it
that enabled it to 'see', it would have arms and legs that enabled it to 'act', and all of this
would be controlled by its computer 'brain'. Such a robot would . . . have genuine understanding and other mental states.
So, the robot reply is in line with our present suggestion. The essence of Searle's objection is that the Chinese room thought experiment still applies. Suppose I am in the room, inputting Chinese symbols and rules for their manipulation. And suppose further that, as Searle (1980, p. 420) says,
unknown to me, some of the Chinese symbols that come to me come from a television camera
attached to the robot [notice that Searle is proposing that we treat the perceived scenes as just
more Chinese symbols] and other Chinese symbols that I am giving out serve to make the
motors inside the robot move the robot's legs or arms . . . I know none of these other facts. I am receiving "information" from the robot's "perceptual" apparatus, and I am giving out "instructions" to its motor apparatus without knowing either of these facts. I am the robot's homunculus, but . . . I don't understand anything except the rules for symbol manipulation. . . . The robot has no intentional states at all . . . And furthermore, by instantiating the program I have no
intentional states of the relevant type.
There is some confusion here: who is supposed to be the equivalent of the computer inside the robot - (me inside the room)? or (me + the robot inside the room)? It appears to be (me inside the room), but part of the room is really - unknown
to me - the robot. So let that be the situation. Searle's main point seems to be
that scenes or other perceptual contents are just more symbols to the computer,
and the rules attaching words (e.g.) to perceived elements are just rules for the
manipulation of symbols, like all others. The computer inside the robot is still
unaware of anything but uninterpreted symbols and rules for their manipulation. It doesn't see scenes and attach words to their constituents, it just inputs
symbols - of different kinds, to tell the truth - and manipulates them according
to syntactic rules.
But then - the obvious reply would be - couldn't we give the same description of our own cognitive structure and performance? Perceptions are just symbols
of a certain kind; they are manipulated according to special rules, different from
the rules that connect words to other words - though still, of course, rules for
symbol manipulation. What else?9 If everything is just symbol manipulation for
the computer, why not for us? One further reply could be the following. Suppose
I am right in holding that Searle doesn't show that in our case, more than symbol
manipulation is going on (or better: he doesn't show that our own cognitive activities could not be described as symbol manipulation). Then why go over to
computers which perceive at all? Why aren't we satisfied with mere intralinguistic
connections? Couldn't one claim that a traditional system - one only endowed
with inferential competence - does understand language, or anyway, that Searle
did not show that it doesn't, for he did not show that what we do is more than
manipulating symbols? The answer is: no, one could not sustain such a claim.
For the point is that there are performances - such as recognition, or verification - that the traditional, inferential system could not carry out, whereas we
can; and such performances are in a strong criterial connection with the understanding of language. Searle is wrong in holding that symbol manipulation
is insufficient for understanding, but he is certainly right in pointing out that inferential manipulation - a special case of symbol manipulation - is not
enough.
In his later reply to further objections (Searle 1982), Searle insists that "in
the Chinese room the agent . . . doesn't attach any semantic content to the
formal symbols" (p. 345). This must be true of the computer inside the robot
as well: even in that case, the computer isn't attaching any semantic content to
the symbols. It is indeed connecting words with perceptual inputs, but that doesn't
count as attaching semantic content, as long as the computer does not know that this
is what it is doing (that certain symbols are scenes, while others are words).
But then comes the crucial question: how do we know that this is what we are
doing (aside from knowing that symbols of a certain type, coming from a certain
channel, are attached to symbols of another kind, coming from a different channel:
for the computer too knows that much)? As long as Searle doesn't answer this
question, his claim that the seeing computer isn't attaching any semantic content
to the symbols it manipulates is empty.10
3.2. Complications
Let us now go back to our referentially competent system and face some immediate complications. A system of the kind we designed is - obviously - a pattern recognition
system. However, the part of the lexicon whose application is essentially governed
by pattern recognition is strongly limited. Here are three examples of complications that immediately arise for a system whose referential competence is based
on pattern recognition. First, a relatively easy case. Think of the word 'box'. Even
its application is not based merely on the identification of a shape: and the
reason is not simply that there are prism-like boxes, cylindrical boxes, cubic boxes
and more, but that it is essential to a box to be a container. A parallelepiped of
solid wood, size 25 x 10 x 5 cm, is not a box. A parallelepiped of the same
size that has a groove parallel to its base is not a box either. That an object is
recognized (correctly in normal cases) as a box depends on a large amount of
knowledge, most of which is not available to a mere pattern recognizer: it very
much depends "on the context", i.e. on the social nature and function of the
place where it is located, on the function the object itself can be presumed to play,
etc. There are many common words which raise the same problem: 'desk', 'ball'
(as opposed to 'sphere'), 'dish', 'level', 'aerial'.
Secondly, a not-so-easy problem. Assume the system can verify sentences such
as 'There is a book on the table', 'There is a book on a table' and the like.
Now suppose the system has to verify the sentence
There is something on the table
Notice that the system works in an entirely top-down fashion: to put it very
roughly, it goes from linguistic analysis of the sentence, to referential algorithms, to their application to the scene. It does not start with the scene, so to
speak; it starts with lexical items and their associated referential algorithms.
But of course there is no referential algorithm corresponding to 'something', or
to an existentially quantified variable. So the system should probably go through
the notion of an anomaly, or an interruption of texture in a surface. Such a
solution, however, would be far from general: it clearly would not work with
sentences such as
There is something in the box
or
There is something in the room.
Cases like this raise a general problem, which is further magnified by words
like 'toy' or 'plant'. There is no such thing as the typical aspect of a toy as
such: the word 'toy' applies to a variety of objects which look very different from
one another, such as dolls, trains, balls etc. Thus - so the intuition goes - in
order to verify the sentence
There is a toy on the table
we human beings do not go in search of something "looking like a toy". Rather,
we go through the following steps: first, we look at the tabletop; second, we
identify whatever objects are on it; third, we decide whether any of th&n can
be described as a toy. We can hardly be assumed to start with a list of toy-types
{ball, doll, puppet, skates, train, etc. etc.} and proceed to determine whether
any such object is on the table. However, that would be the only procedure
open to the system as we designed it: in order to verify the sentence 'There is
a toy on the table' the system must verify at least one sentence in a list constructed
from the inferential meaning of ' t o y ' , i.e. it must verify either 'There is a doll
on the table' or 'There is a ball on the table' etc. As far as I know, today's systems
of vision can (under certain conditions) recognize objects in a scene starting
with the objects, not starting with the scene (see Rosenfeld 1988, pp. 287-288).
They can determine whether and where in a scene a given object is located starting
with the object's definition, but they cannot determine which objects are present
starting with a scene's analysis.
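That top-down procedure can be pictured, very schematically, as follows; the toy-type list, the recognizer and the scene format are invented for the sake of the example.

    # Toy sketch of the only procedure open to the system as designed: 'toy' is
    # expanded, via its inferential meaning, into a list of toy-types, and the
    # corresponding sentences are verified top-down one by one.

    TOY_TYPES = ["doll", "ball", "puppet", "train"]   # from the inferential meaning of 'toy'

    def recognize_on_table(object_word, scene):
        """Stand-in for a top-down referential algorithm: is an object of this
        type located on the table in the analyzed scene?"""
        return object_word in scene.get("on_table", [])

    def verify_toy_on_table(scene):
        # 'There is a doll on the table' or 'There is a ball on the table' or ...
        return any(recognize_on_table(t, scene) for t in TOY_TYPES)

    print(verify_toy_on_table({"on_table": ["book", "ball"]}))   # True
    print(verify_toy_on_table({"on_table": ["book"]}))           # False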
In order to solve problems of this kind - let us call them "problems of bottom-up recognition" - we would have to think of a different architecture for the system:
we must imagine a kind of back-and-forth procedure in which aspects of the scene
(contours, corners and the like) would select possible shapes in a pre-defined
catalogue, activating top-down algorithms some of which would fail, whereas
others might succeed and undergo a further selection by a perhaps more tightly
structured scene. But all this is very hard to imagine, not to say design. One could
perhaps train a neural network to do bottom-up recognition of objects on a tabletop, but it is very hard to figure out how exactly it would work (as usual
with neural networks: see McCloskey 1991).
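Very schematically, and with every ingredient invented for the occasion, such a back-and-forth procedure might be pictured as a hypothesize-and-test loop of the following kind.

    # Purely illustrative hypothesize-and-test loop: bottom-up cues from the scene
    # select candidate shape models from a pre-defined catalogue, and each candidate
    # is then tested top-down against the scene. Cues, catalogue, and the stand-in
    # matcher are all invented assumptions, not a design.

    CATALOGUE = {
        "ball": {"curved_contour"},
        "box":  {"straight_contour", "right_angle"},
        "doll": {"curved_contour", "straight_contour"},
    }

    def propose(cues):
        """Bottom-up step: keep the models whose characteristic cues appear in the scene."""
        return [m for m, needed in CATALOGUE.items() if needed <= cues]

    def test_top_down(model, scene):
        """Top-down step: stand-in for running the model's matching algorithm."""
        return model in scene["objects"]

    def recognize(scene):
        return [m for m in propose(scene["cues"]) if test_top_down(m, scene)]

    print(recognize({"cues": {"curved_contour"}, "objects": ["ball", "cup"]}))   # ['ball']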
To conclude: all this shows that it is very hard even to imagine a system whose
referential c o m p e t e n c e would be c o m p a r a b l e to our own. Philosophically,
however, the point is of principle: could the system be said to be semantically
competent - really competent - if only for the fraction of the lexicon where it
is referentially competent? Would it really understand 'The book is on the table',
'There are two cups and just one spoon', etc.? If your intuitions are like mine,
you will answer 'Yes'. But even if your answer were to be ' N o ' , the resulting
discussion is bound to be extremely instructive.
NOTES
* Research leading to this article was partly supported by the Italian MURST - 40% funds.
1 Since the repudiation of Turing's test (Turing 1964), the stress has tended to be placed on how
such performances are carried out, i.e. on the structure of the programs and the kind of data
they have access to. Here I am neither implying that such features are irrelevant to whether a
system can legitimately be said to understand language, nor restoring some version of Turing's
test as having definitional import. I am simply suggesting that no one would think we were dealing
with (prospective) natural-language understanding systems if they were not capable of these kinds
of performances.
2 For surveys of work on machine translation, see Hutchins 1986, Allegranza 1993.
3 Structural competence is the ability to determine the meaning of a complex expression from
its syntactic structure and the meanings of its constituents. Partee (1981, pp. 61-62) has called
'structural semantics' the part of semantic theory which accounts for semantic compositionality, i.e. for the effects of syntax on meaning.
4 Fodor (1981) admits that "it is, of course, not very interesting to say that 'chair' refers to chair, since we have no theory of reference and . . . no mechanism to realize the theory . . . but at least it's true" (p. 223). The point is that, if we only knew that much, we could not speak a language.
5 Thus I am certainly not assuming that "every nonlogical concept is reducible to sensation concepts" (Fodor 1981, p. 213).
6 If understanding in absentia were qualitatively different from understanding in praesentia then the understanding of fictional discourse - which is mostly in absentia - would differ in most cases from the understanding of non-fictional discourse, which can be in praesentia.
7 It is occasionally remarked that there is no principled reason to assign pride of place to vision among the perceptual modalities: an artificial system's referential competence might be based on its tactile or olfactory capacity. And there is no denying that even in the case of human beings, recognition is sometimes sustained by, or even entirely based upon, how something smells (think of alcohol or gasoline) or feels to the touch (as with discrimination among different kinds of cloth). Vision is, however, crucial to most human recognition procedures. Therefore, imagining an artificial system endowed with artificial sight (rather than artificial smell or touch) makes it easier to draw comparisons with human performances and the abilities underlying them. A referential competence based on vision is ipso facto closer to human referential competence. However, I do not mean to deny that an artificial referential competence could be based on other perceptual modalities. It is an interesting question whether such non-visually-based competence could underlie a linguistic competence similar to the one we have.
8 One should not for a moment conclude that our referential competence should be conceived along exactly the same lines. For instance, I am not attributing to the system any form of partial referential competence, such as the ability to discriminate uranium with respect to animals and plants, which I claimed we possess. As far as the system is concerned referential competence is an all-or-none affair: either the system has a referential algorithm associated with the word 'uranium' or it lacks it, and if it does lack it, then its direct referential competence relative to that word is simply nil (naturally, the system may have a referential algorithm which is imperfect from the standpoint of, say, scientific standards of classification: but that is another matter).
9 This reply is in sympathy with other replies to Searle, particularly with those of Martin Ringle and Drew McDermott. Ringle (1980, p. 445) says: "If the causality with which Searle is concerned involves nothing more than direct connectivity between internal processes and sensori-motor states, it would seem that he is really talking about functional properties, not physical ones [. . .] Connecting actual sensorimotor mechanisms to a perceptronlike internal processor should, therefore, satisfy causality requirements of this sort". And McDermott (1982, p. 340): "If . . . you deny that a computer interpreting any set of rules . . . understands, then you beg the question, and this is what Searle does".
10 Cf. Pylyshyn's question (1980, p. 443): "What licenses us ever to say that a symbol refers?"
REFERENCES
Allegranza, V. (1993). Le forme dell'interlingua. Osservazioni sui modelli linguistici della traduzione
automatica. Sistemi lntelligenti 5: 121-157.
Fodor, J. (1981). Tom Swift and His Procedural Grandmother. In Representations, 204-224.
Harvester: Brighton.
Gazdar, G. (1993). The Handling of Natural Language. In Broadbent, D. (ed.) The Simulation of
Human Intelligence, 151-177. Blackwell: Oxford.
Hutchins, W. J. (1986). Machine Translation: Past, Present, Future. Ellis Horwood: Chichester.
Johnson-Laird, Ph. (1983). Mental Models. Cambridge Univ. Press: Cambridge.
Marconi, D. (1987). Two Aspects of Lexical Competence. Lingua e Stile 22: 385-395.
Marconi, D. (1991). Understanding and Reference. Sémiotiques 1: 9-25.
Marconi, D. (in press). On the Structure of Lexical Competence. To be published in Proceedings
of the Aristotelian Society, January 1995.
McDermott, D. (1982). Minds, Brains, Programs, and Persons. Behavioural and Brain Sciences 5:
339-341.
McCloskey, M. (1991). Networks and Theories - The Place of Connectionism in Cognitive Science.
Psychological Science 2: 387-395.
Partee, B. (1981). Montague Grammar, Mental Representations and Reality. In Öhman, S. & Kanger,
S. (eds.), Philosophy and Grammar, 59-78. Reidel: Dordrecht.
Putnam, H. (1975). The Meaning of 'Meaning'. In Philosophical Papers, Vol. 2, Cambridge Univ.
Press: Cambridge.
Putnam, H. (1979). Reference and Understanding. In Margalit, A. (ed.), Meaning and Use, 199-217.
Reidel: Dordrecht.
Putnam, H. (1981). Brains in a Vat. In Reason, Truth and History, 1-21. Cambridge Univ. Press:
Cambridge.
Pylyshyn, Z. (1980). The 'Causal Power' of Machines. Behavioural and Brain Sciences 3: 442-444.
Ringle, M. (1980). Mysticism as a Philosophy of Artificial Intelligence. Behavioural and Brain
Sciences 3: 444-446.
Rosch, E. (1978). Principles of Categorization. In Rosch, E. & Lloyd, B. B. (eds.) Cognition and
Categorization, Erlbaum: Hillsdale NJ.
Rosenfeld, A. (1988). Computer Vision. Advances in Computers Vol. 27, 265-308. Academic
Press: New York.
Searle, J. (1980). Minds, Brains and Programs. Behavioural and Brain Sciences 3: 417-457.
Searle, J. (1982). The Chinese Room Revisited. Behavioural and Brain Sciences 5: 345-348.
Turing, A.M. (1964). Computing Machinery and Intelligence. In Anderson, A. R. (ed.) Minds and
Machines, 4--30. Prentice Hall: Englewood Cliffs NJ.
Tye, M. (1991). The Imagery Debate. M.I.T. Press: Cambridge MA.
Wilks, Y. (1982). Some Thoughts on Procedural Semantics. In Lehnert, W. G. & Ringle, M. H.
(eds.), Strategies for Natural Language Processing, 495-516. Erlbaum: Hillsdale NJ.
Wittgenstein, L. (1953). Philosophical Investigations. Blackwell: Oxford.