

Artificial Intelligence Review 10: 21-35, 1996. © 1996 Kluwer Academic Publishers. Printed in the Netherlands.

On the Referential Competence of Some Machines*

DIEGO MARCONI
Facoltà di Lettere e Filosofia, Palazzo Tartara, V. G. Ferraris 109, 13100 Vercelli, Italy, and Center for Cognitive Science, Via Lagrange 3, 10123 Torino, Italy, E.U. E-mail: erme@rs950.cisi.unito.it

Abstract. The main reason why systems of natural language understanding are often said not to "really" understand natural language is their lack of referential competence. A traditional system, even an ideal one, cannot relate language to the perceived world, whereas - obviously - a human speaker can. The paper argues that the recognition abilities underlying the application of language to the world are indeed a prerequisite of semantic competence. If a system of the traditional kind were appropriately endowed with the analytic abilities of a system of artificial vision, it would display (partial) referential competence: e.g. it would be able to verify sentences. In response to Searle's objections to the so-called "robot reply", the paper argues that such an integrated system could not be considered as essentially on a par with a purely inferential system of the traditional kind, unless one were prepared to regard even the human understanding system as "purely syntactic" (and therefore incapable of genuine understanding).

Key Words: natural language, vision, understanding, semantic competence, reference

1. WHY MACHINES DO NOT UNDERSTAND NATURAL LANGUAGE

There are performances which stand in a criterial relation to understanding, in Wittgenstein's sense. For example, if a person can summarize a text we say that he has understood it (whereas if he cannot, we doubt that he understood). If a person can answer questions concerning the topics a text is about, and his answers appear to be based on the information contained in the text, we say that person has understood the text - whereas if he cannot answer, it is legitimate to raise doubts about his understanding. If a person can correctly translate the text into another language, we say she understood (but if she cannot, and yet does know the second language, we are inclined to say that she did not understand). Such are the "paradigmatic" cases in which we say that somebody understands a text in a natural language: Wittgenstein (1953) would say that our use of such words as 'understanding' and 'to understand' is intertwined with such performances and the ability to carry them out. Of course, understanding is not identical with summarizing, or answering questions, or translating (or at any rate, it would be highly unnatural to say so). However, we probably learn how to use the concept of understanding by learning how to assess such performances.

1.1. Natural-language understanding systems

Today, we have artificial systems which can carry out such tasks, with different degrees of success (for a recent survey, see Gazdar 1993).
They are called "natural-language understanding systems" precisely because they are capable of one or the other among such performances. I However, in spite of the fact that these systems can carry out the very performances on the basis of which we normally say of a human being that she understands a language, many would say that such systems do not really understand natural language. Of course, the present systems are not as good as human beings at carrying out such tasks: their translations are often clumsy,~ their summaries unintelligent, the questions they can answer, relatively few in number (Gazdar 1993, pp. 162-163, 168). Moreover, the existing systems can (usually) carry out one or the other among such tasks: in contrast with human beings, they are either translators or question-answering devices or automatic abstractors. Finally, the range of texts that each system can process is strongly restricted, lexically at any rate. In order to overcome such limitations, more is required that just building huge lexical databases or integrating complex systems into one big system: we need to solve problems which have not even been formulated clearly so far, from metaphoric language to pragmatic competence and "contextual" knowledge. However, even though the AI community is concentrating on these kind of limitations, I surmise that it is not essentially because of them that naturallanguage processing systems are said not to really understand natural language. To realize this, imagine we have been successful in building a very sophisticated "understanding" system of the standard type. Such a system would have a perfect syntactic analyzer, a vast lexical database, and a semantic interpreter capable of compositionally constructing fully analytic semantic representations: they would be as explicit as we need them to be in order for the system to carry out - thanks to a reasoning module - all the inferences that could be plausibly attributed to a competent, or even a very competent speaker. From 'There are four elephants in the living-room' our system would infer that there are four large animals in the living-room, that there are four elephants in the house, that there is an even number of elephants in the living-room, that there are higher mammals (to be more precise, proboscideans) in the living-room; it could even infer that the living-room's furniture is likely to be badly spoiled. Let us call 'inferential competence' (Marconi 1987, 1991) a speaker's ability to manage a network of connections among words, underlying such performances as semantic inference, paraphrase, definition, retrieval of a word from its definition, synonym-finding, and so forth; we could then say that the ideal system's inferential competence would be satisfactory to the highest degree. 124 ON THE R E F E R E N T I A L C O M P E T E N C E OF SOME M A C H I N E S 23 1.2. Inabilities of the ideal system Why would we say, even of such a system, that it really does not understand the language it can process? What is it that the system cannot do? some philosophers would say that the system doesn't know the truth-conditions of the sentences it processes. However, knowledge of truth-conditions, to the extent that it is relevant to understanding, is partly structural competence (Partee 1981)3 and partly inferential competence, and we just assumed that the system possesses both. 
1.2. Inabilities of the ideal system

Why would we say, even of such a system, that it really does not understand the language it can process? What is it that the system cannot do? Some philosophers would say that the system doesn't know the truth-conditions of the sentences it processes. However, knowledge of truth-conditions, to the extent that it is relevant to understanding, is partly structural competence (Partee 1981)[3] and partly inferential competence, and we just assumed that the system possesses both.

Notice in particular that it would not be right to say that the system doesn't know a sentence's truth-conditions in the sense that it cannot establish, for each situation S, whether the sentence is true or false in S. For if situation S is described in language, the system can indeed determine whether a sentence is true or false in S. This is exactly what systems do which (like our system) can answer questions relative to a text's topic: such systems determine whether certain sentences (corresponding to the questions) are true or false in the situations described by the texts they have processed. As we assumed that there are no limitations - either lexical or syntactic or discursive or of any other kind - to the texts the system can process, we conclude that our system can indeed determine whether a given sentence is true or false in any situation which can be described in the language the system can interpret. On the other hand, our system cannot establish whether a sentence is true or false in a situation which is not given to it through language. For example, it cannot determine whether a sentence is true or false in the real world: it cannot verify the sentence, unless the "real world" is given to it through a linguistic description. If you place the system in a room and require it to evaluate the sentence 'There are at least four seats in this room', the system won't do it.

A strictly related inability can be highlighted by focusing on the reference of single words. We assumed that the system has remarkable inferential ability: for example, it can draw many inferences concerning elephants, i.e. inferences involving sentences where the word 'elephant' occurs. Still, one could claim - perhaps Searle (1980) would claim - that such inferences as the system can carry out are not about elephants at all: strictly speaking, it would not even be correct to say that they involve sentences in which the word 'elephant' occurs. The system can indeed manipulate strings of symbols including a symbol which materially coincides with the English word 'elephant'. Such a symbol, however, is devoid of meaning for the system: emphatically, it does not mean "elephant" (i.e. it does not mean what the English word 'elephant' means). Whatever conclusions the system can infer are not in themselves about elephants: they are strings of symbols, meaningless for the system, which we (the system's users) interpret as pertaining to elephants.

There is much that is wrong with this familiar argument. First of all, it is certainly wrong to oppose knowledge of meaning and the ability to manipulate symbols, as if "genuine" knowledge of meaning were forever something else with respect to symbol-manipulating ability. On the contrary, knowing the meaning of something is, essentially, being able to use it. It is still true that "The account according to which understanding a language consists in being able to use it ... is the only account now in the field" (Putnam 1979, p. 199). The problem is not whether knowledge of meaning can be "reduced" to symbol-manipulation, but what kind of symbol-manipulating abilities count as knowledge of meaning - as it is obvious that many such abilities would not be regarded as adequate. Secondly, it is wrong to say that the system does not know the meaning of 'elephant' on the ground that it does not know the word's reference. There is a sense in which the system does know the reference of 'elephant': it knows that the word refers to elephants, i.e.
to large mammals, proboscideans, living in Africa or India (or zoos), etc. For example, it would certainly be incorrect to say that, for all the system knows, its conclusions might be about flamingos rather than elephants. The system can very well tell elephants (mammals, proboscideans, etc.) from flamingos, which are birds, waders, pink or white (not grey like elephants), etc.

Nevertheless, the argument does point to real inadequacies of the system and its understanding of language. One thing the system cannot do is recognize elephants in the real world or in a photograph, any more than it can verify a sentence about elephants. It is the system's referential incompetence which underlies our feeling that natural-language understanding systems are only metaphorically such, for they do not really understand natural language.

2. SEMANTIC COMPETENCE AND RECOGNITION

It follows that, in order to have a genuinely competent artificial system - a system that could really understand natural language - we ought to build a referentially competent system, i.e. a system which could apply words to the real world. Referential competence is the ability underlying such performances as naming, answering questions concerning the obtaining situation, obeying orders such as 'Close the door!', following directions, etc. These performances are partly based on the ability to recognize objects and actions. This, in turn, is not a purely linguistic ability; under a certain description it can even be regarded as nonlinguistic (Marconi, in press). This may have led some to deny that referential competence is part of semantic competence. For example, Wilks (1982) remarked that he knew enough of the meaning of 'uranium' to use the word effectively even though he could not recognize uranium, and knew no one who could. Therefore, referential ability is not necessary for semantic competence: we say that we know the meaning of 'X' even when we are unable to identify instances of X; thus 'knowing the meaning' is not used so as to include referential competence.

Wilks's contention that recognitional ability is not necessary for semantic competence can be construed in two different ways, 'weak' and 'strong'. In the weak interpretation, it is claimed that one can have some competence relative to a word although one lacks full referential competence relative to that same word. In this sense the thesis is true: if I know a lot about uranium without being able to recognize uranium in the real world, I will not be denied competence relative to the word 'uranium'. Note, however, that such inability does not necessarily amount to a complete lack of referential competence. I cannot recognize uranium; but if I am presented - on a tabletop, say - with a fruit I don't know, an animal I never saw before and a bit of uranium, and I am asked to pick the uranium, I will easily do it. As far as uranium is concerned, I no doubt lack full recognitional ability, but I do possess some ability to discriminate. It makes sense to regard such ability as part of referential competence. Wilks's thesis in the weak version does not entail that recognitional ability is irrelevant to semantic competence. In the strong version, however, the thesis holds that it is possible to have full semantic competence relative to a word although one has no referential competence associated with it - not even any ability to discriminate, as in the case of uranium.
In this form, the thesis is (first of all) very hard to test: if a speaker has even limited competence relative to a word, she usually has some ability to discriminate in its application. If one knows anything at all concerning opals, one knows that they are precious stones, so one can tell opals from cats or books. If the word 'pangolin' is not totally unknown to you, you know that pangolins are animals (not celestial bodies or Indian military men). On the other hand, in ordinary cases the strong thesis seems false: if I cannot tell cats from tigers, or violins from cellos, my competence will be considered as defective, whatever my zoological or musical learning.

The question arises, however, of what degree or amount of recognitional ability counts as referential competence. Suppose we take it as established that some degree of recognitional ability is a necessary condition of referential competence; is any degree of recognitional ability a sufficient condition of referential competence? Secondly, any view which connects semantic competence with the ability to recognize referents and verify sentences is open to the charge of verificationism, the discredited theory according to which to know the meaning of a sentence is to be able to verify it. Thus the second question is: does the view that referential competence involves recognition and verification abilities entail verificationism?

2.1. Recognition procedures are fallible

Concerning the first question: philosophers of a realist bent - partisans of the idea of objective reference - have pointed out that possession of the methods of inquiry by which we ascertain whether something is or is not gold is neither a necessary nor a sufficient condition of referential knowledge about the word 'gold'. Knowing the reference of 'gold' is knowing that the word refers to gold: which does not require that one can recognize gold by the chemical analyst's or the jeweler's methods. On the other hand, such methods do not guarantee that one has access to the reference of 'gold' (witness Putnam's (1975) science-fictional examples: we might discover that such methods are and always were defective, for they fail to pick out gold and nothing but gold). This may be countered by challenging the very notion of objective reference, on the one hand, and on the other hand by pointing out that the alleged knowledge "that 'gold' refers to gold" has no content whatsoever as long as it is not explicated by some "i.e. clause".[4] But there is another objection, which cannot be so dismissed. The "methods" one hints at while trying to describe recognitional ability have little in common with the sophisticated scientific methods that are discussed in connection with the realists' objections. A normal speaker's application of words such as 'cat', or 'water', or 'gold' is based on rough, macroscopic identification criteria, close to those underlying pattern recognition: not on DNA, or chemical, or spectrographic analysis. However, such macroscopic recognition criteria are even more conspicuously fallible and unreliable than "scientific" methods. They make us identify hydrogen peroxide as water, iron pyrites as gold, plastic imitation wood as wood; under certain conditions, even porcelain cats as cats. But of course, 'cat' does not refer to porcelain cats, nor 'gold' to iron pyrites. It is thus incorrect to label 'referential competence' a recognition ability which is so far removed from actually identifying reference.

In reply to this objection, I should like to make three points. First of all, there is the question of what kind of recognition abilities count for semantic competence. If a person mistook a porcelain cat for a cat, and called it 'cat', would we say that she is linguistically incompetent? She made a mistake all right, but was that a mistake in the use of language? If not, that suggests that the kind of recognition ability which counts as constitutive of semantic competence in the case of a word such as 'cat' or 'wood' (as opposed to words such as 'diabetes' or 'neutrino') is just the availability of those rough identification criteria which are said to be "so far removed from actually identifying reference".

The second point concerns the phrase 'under certain conditions' ("under certain conditions, one could mistake a porcelain cat for a cat"). It should be pointed out that, under certain conditions, one could make any mistake whatsoever. One could even fail to recognize a cube, under certain conditions of light. Would that make one incompetent in the use of the word 'cube'? Certainly not. Would that show, then, that recognitional ability is irrelevant to semantic competence? Not that either: for if a person, endowed with normal sight, failed to recognize the cube under normal conditions, we would indeed call him incompetent.

This leads me to the third point: it is impossible in principle to establish what amount or degree of recognitional ability counts as referential competence. The amount and nature of the recognitional ability which is regarded as relevant or even necessary to linguistic competence varies widely from word category to word category, and even from word to word within the same category ('common cold' vs. 'sickle-cell anemia'), depending on social and natural factors. An artificial system that was to be made referentially competent in the sense in which a normal speaker is would have to be taught very different abilities in different cases. However, it would not have to be made into a fully competent encyclopedic scientist: for it is not that kind of recognition ability which we regard as relevant to linguistic competence in most cases.

2.2. Verificationism

Now for the second question, i.e. the charge of verificationism. Notice first of all that it is not my intention to identify semantic competence with the ability to verify. The question is, at most, whether verification abilities are relevant to understanding. I believe they are, in the limited sense that has been stressed above. Let us repeat: as far as words such as 'cat', 'yellow' or 'walk' are concerned, the inability to verify (under normal circumstances) simple sentences in which they occur would be regarded as evidence of semantic incompetence. Which of course does not mean that the same should be said of such words as 'although', 'eight', or 'function'.[5] Nor does it mean that recognition (and verification) abilities are a sufficient condition of semantic competence. One standard objection to verificationism is based on the fact that there are lots of sentences we seem to understand although we have no idea of how to go about verifying them: sentences like 'God exists', 'Positrons are made of quarks', 'Aristotle liked onions' (Fodor 1981, p. 216). This objection is irrelevant to the view I have been defending. I do not hold that to understand a sentence is, or requires, to know how to verify it.
The understanding of a sentence is a complex process drawing on both structural and lexical competence; lexical competence, in its turn, is partly inferential and partly referential. For some sentences, the ability to verify them is a necessary condition of linguistic competence in the sense specified above. But note that for many sentences, the process by which they are validated does not directly involve "the real world", or "experience", or perception. Neither 'Positrons are made of quarks' nor 'Aristotle liked onions' would be validated by being directly correlated with perceptual input. This does not mean that such sentences only appear to be about the real world (but are really about, say, our database). "To be about the real world" is an intricate and obscure notion, which certainly does not reduce to being verifiable by appeal to perceptual input. Fodor (1981, p. 219) may well be right that "being about the real world" is a holistic notion, meaning that a proposition may be said to be about the world thanks to a very roundabout itinerary through many layers of our knowledge, both theoretical and perceptual. Thus: a) not all sentences that may be said to be "about the real world" are therefore to be verified in perception; b) verification is not necessarily verification-in-perception; c) understanding is not identical with, and does not require, the availability of a method of verification, in any sense; d) yet for some sentences, the ability to verify them is a necessary condition of understanding.

It could still be objected that, even within such limitations, the ability to verify a sentence is at most a symptom of understanding; it cannot be a necessary condition. The argument runs as follows. Most cases of understanding are cases of understanding in absentia: in most cases, the texts and speeches we understand - daily newspapers, novels, our friends' accounts of their feats - are not about the scene we have under our eyes at the moment of understanding. In all such cases, verification is simply impossible. There are, indeed, exceptions: there are cases of understanding in praesentia. Examples are: reading the instructions for a household appliance while looking at the machine itself and its controls; obeying an order such as 'Take the book in front of you and read the first line on p. 28'; listening to a person who is telling us about his medical condition. But such cases, though frequent, are not the most frequent. To account for natural language understanding is essentially to account for understanding in absentia: verification simply does not come into the picture. Moreover, it has been plausibly argued (Johnson-Laird 1983, p. 246) that the understanding of fictional discourse is not essentially different from the understanding of non-fictional discourse: the distinction, all-important as it is in other respects, is irrelevant from the standpoint of language processing. A fortiori, one could say, in absentia understanding cannot differ in kind from understanding in praesentia.[6] So, even in the case of understanding in praesentia the possibility of verification cannot be crucial.

However, the argument as it stands fails to draw the (obvious) distinction between not being in a position to verify a sentence and being unable to verify it. Right now, I am not in a position to verify the sentence 'There are six people sitting in the next room', but it would clearly be inappropriate to say that I am unable to verify it, or that I don't know how to verify it.
The clearest cases of understanding in absentia seem to be of this type: they are cases in which one is not in a position to verify whatever is asserted, but would know how to do it (of course, one is usually unwilling to). The same purpose would be served by a distinction between the ability to verify a sentence and the possibility of verifying it: I may have the ability without there being the objective possibility, or vice versa. What we lack in the case of in absentia understanding is the possibility of verification: which proves nothing concerning our possessing the ability to verify, or the role it plays in understanding.

However, if what matters (when it does matter) is not actual verification but the ability to verify, why should we want our system to carry out actual verifications? The answer is simple: it is the only way to effectively show that the system does possess the required abilities. As long as we do not face the problem of actual verification, we will tend to have systems construct semantic representations (of single sentences or whole texts) which are nothing but formulas of a more or less formal language, themselves in need of interpretation. The only way to build a system to which we would be prepared to grant genuine semantic competence is to build a system that can actually verify natural language sentences. Of course understanding - even understanding in praesentia - does not consist in or require actual verification, but there is no better evidence of understanding than actual verification.

3. A REFERENTIALLY COMPETENT ARTIFICIAL SYSTEM

So let us go back to our artificial system, and wonder what would be required for it to be referentially competent. First of all, the system must be able to perceive - typically, to see[7] - the real world, just like us. For an artificial system, the beginning of referential competence is to be found in artificial vision.

A possible misunderstanding must be avoided. There is a naive picture of the relation of perception to semantic competence which keeps coming back, in spite of Wittgenstein's attempts at dispelling it and of Putnam's more recent criticism (in Putnam 1981). In this naive view, part of semantic competence is represented by a certain store of mental images associated with words, such as the image of a dog, of a table, of a running man. Thanks to these images we can apply to the real world words such as 'dog', 'table' or 'run': this is done by comparing our images with the output of perception (particularly, of vision). Today, this picture may be somehow supported by reference to prototype theory (although the theory does not license it; see Rosch 1978). Now, the point is not that we do not have mental images: perhaps there are good reasons to believe that we do have something of the kind (see Tye 1991). The point is that, in the naive picture, the images' use in relation to the real world or the perceptual scene is left undescribed. In Putnam's (1981, p. 19) words, "one could possess any system of images you please and not possess the ability to use the sentences in situationally appropriate ways ... For the image, if not accompanied by the ability to act in a certain way, is just a picture, and acting in accordance with a picture is itself an ability that one may or may not have".

In other words, in the naive picture the whole explanatory burden is carried by the relation of comparison between an image and the perceptual scene; but such a relation (or process, or whatever it is) is itself unexplained. In any case, systems of artificial vision (as described by Rosenfeld 1988) are not organized like that: there is no store of images to be compared with the perceptual scene. Classes of objects the system can recognize (e.g. tables or cubes) are identified with classes of shapes, which are themselves interpreted as relational structures, i.e. labelled graphs where the nodes represent object parts and the arcs represent relations between parts: a node is labelled with an ideal property value or a set of constraints on such a value, whereas an arc is labelled with a relation value, or a set of constraints on such a value. For example, a table is identified with a class of shapes expressed by a relational structure, whose nodes represent parts of the table (top, legs) while the arcs represent relations between two parts. Node and arc labels are not absolute values, but constraints on possible values. The problem of recognizing a table in a scene is then, as Rosenfeld (1988, p. 286) says, the problem of "finding subgraphs of the scene graph that are close matches to the object graph, or that satisfy the constraints defined by the object graph".

The scene graph is the result of a sequence of processing stages. In the first stage, the image provided by a sensor is digitized, i.e. converted into an array of numbers "representing brightness or color values at a discrete grid of points in the image plane" (Rosenfeld 1988, p. 266), or average values in the neighborhoods of such points (elements of the array are called pixels). In the second stage (segmentation), pixels are classified according to several criteria, such as brightness, or belonging to the same local pattern (e.g. a vertical stroke). In the third stage (resegmentation), parts of the image such as rectilinear strokes, curves, angles, etc. are explicitly recognized and labelled. In the fourth stage, properties and relations of such local patterns are identified: both their geometric properties and relations, and (e.g.) the distribution of grey levels through a given local pattern, color relations between two patterns, etc. The scene graph's nodes are the local patterns with their properties, and its arcs are the relations among local patterns, with their values. To recognize a table in a scene is thus - as we saw - to find a subgraph of the scene graph which satisfies the constraints associated with the table-graph. In practice, recognition is complicated by several factors: it is hard to make it invariant with respect to different illumination conditions, and 3-D vision raises many additional problems. In what follows I shall disregard these kinds of problems (though they are of course far from trivial) to focus on others.
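
To fix ideas, here is a toy sketch of this kind of constraint-based matching: an object graph for 'table' whose nodes and arcs carry constraints, and a brute-force matcher that looks for a subgraph of a (tiny, hand-made) scene graph satisfying them. The property names, numerical thresholds and the scene itself are invented for the illustration and are not drawn from any actual vision system:

    # A minimal sketch of relational-structure matching: model nodes carry
    # constraints on part properties, model arcs carry constraints on
    # relations between parts; recognition = finding an assignment of
    # scene nodes to model parts that satisfies all constraints.
    from itertools import permutations

    # Scene graph: local patterns with their properties (nodes) and
    # relations between patterns (arcs).  Hand-made for illustration.
    scene_nodes = {
        "p1": {"shape": "slab",  "width": 1.2,  "height": 0.05},
        "p2": {"shape": "stick", "width": 0.06, "height": 0.7},
        "p3": {"shape": "stick", "width": 0.06, "height": 0.7},
        "p4": {"shape": "blob",  "width": 0.3,  "height": 0.25},
    }
    scene_arcs = {
        ("p2", "p1"): {"relation": "supports"},
        ("p3", "p1"): {"relation": "supports"},
        ("p4", "p1"): {"relation": "on_top_of"},
    }

    # Object graph for 'table': constraints on parts and on relations.
    table_model = {
        "nodes": {
            "top":  lambda n: n["shape"] == "slab" and n["width"] > 0.5,
            "leg1": lambda n: n["shape"] == "stick" and n["height"] > 0.3,
            "leg2": lambda n: n["shape"] == "stick" and n["height"] > 0.3,
        },
        "arcs": {
            ("leg1", "top"): lambda a: a.get("relation") == "supports",
            ("leg2", "top"): lambda a: a.get("relation") == "supports",
        },
    }

    def match(model, nodes, arcs):
        """Return one assignment of scene nodes to model parts satisfying
        all node and arc constraints, or None if there is none."""
        slots = list(model["nodes"])
        for assignment in permutations(nodes, len(slots)):
            binding = dict(zip(slots, assignment))
            if not all(ok(nodes[binding[s]]) for s, ok in model["nodes"].items()):
                continue
            if all(ok(arcs.get((binding[a], binding[b]), {}))
                   for (a, b), ok in model["arcs"].items()):
                return binding
        return None

    print(match(table_model, scene_nodes, scene_arcs))
    # e.g. {'top': 'p1', 'leg1': 'p2', 'leg2': 'p3'}

Real systems of course search far more cleverly than by enumerating assignments, but the structure that matters here is the same: constraints on parts and on relations between parts, checked against an analyzed scene.
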
From our viewpoint, the relational structure associated with the class of tables, together with the matching algorithm which applies it to the analyzed scene, represents the content of the system's[8] referential competence relative to the word 'table'. If a system were endowed with this kind of competence, plus a minimal amount of structural semantic competence (to repeat: the ability to determine the meaning of a complex expression from its syntactic structure and the meanings of its constituents) and inferential competence, it could verify sentences such as 'There is a vase on a table', 'There is a vase on the table', 'There are two small chairs in front of the table', etc.

3.1. Searle and the robot reply

Before we go on to look at the many problems that come up as soon as we try to extend the system's referential competence, let us pause to consider a standard objection to the claim that such a system would understand language in a fuller sense than a traditional, purely inferential system would. This is Searle's answer to what he labels "the robot reply". The robot reply is one possible objection to Searle's own well-known Chinese room argument; an objection that Searle himself formulates and immediately dismisses. The robot reply goes like this (Searle 1980, p. 420):

    Suppose we put a computer inside a robot, and this computer would not just take in formal symbols as input and give out formal symbols as output, but rather would actually operate the robot in such a way that the robot does something very much like perceiving, walking, ... anything you like. The robot would, for example, have a television camera attached to it that enabled it to 'see', it would have arms and legs that enabled it to 'act', and all of this would be controlled by its computer 'brain'. Such a robot would ... have genuine understanding and other mental states.

So, the robot reply is in line with our present suggestion. The essence of Searle's objection is that the Chinese room thought experiment still applies. Suppose I am in the room, inputting Chinese symbols and rules for their manipulation. And suppose further that, as Searle (1980, p. 420) says,

    unknown to me, some of the Chinese symbols that come to me come from a television camera attached to the robot [notice that Searle is proposing that we treat the perceived scenes as just more Chinese symbols] and other Chinese symbols that I am giving out serve to make the motors inside the robot move the robot's legs or arms ... I know none of these other facts. I am receiving "information" from the robot's "perceptual" apparatus, and I am giving out "instructions" to its motor apparatus without knowing either of these facts. I am the robot's homunculus, but ... I don't understand anything except the rules for symbol manipulation. ... The robot has no intentional states at all ...
    And furthermore, by instantiating the program I have no intentional states of the relevant type.

There is some confusion here: who is supposed to be the equivalent of the computer inside the robot - me inside the room, or me plus the robot inside the room? It appears to be me inside the room; but part of the room is really - unknown to me - the robot. So let that be the situation. Searle's main point seems to be that scenes or other perceptual contents are just more symbols to the computer, and the rules attaching words (e.g.) to perceived elements are just rules for the manipulation of symbols, like all others. The computer inside the robot is still unaware of anything but uninterpreted symbols and rules for their manipulation. It doesn't see scenes and attach words to their constituents, it just inputs symbols - of different kinds, to tell the truth - and manipulates them according to syntactic rules. But then - the obvious reply would be - couldn't we give the same description of our own cognitive structure and performance? Perceptions are just symbols of a certain kind; they are manipulated according to special rules, different from the rules that connect words to other words - though still, of course, rules for symbol manipulation. What else?[9] If everything is just symbol manipulation for the computer, why not for us?

One further reply could be the following. Suppose I am right in holding that Searle doesn't show that, in our case, more than symbol manipulation is going on (or better: he doesn't show that our own cognitive activities could not be described as symbol manipulation). Then why go over to computers which perceive at all? Why aren't we satisfied with mere intralinguistic connections? Couldn't one claim that a traditional system - one only endowed with inferential competence - does understand language, or anyway, that Searle did not show that it doesn't, for he did not show that what we do is more than manipulating symbols? The answer is: no, one could not sustain such a claim. For the point is that there are performances - such as recognition, or verification - that the traditional, inferential system could not carry out, whereas we can; and such performances are in a strong criterial connection with the understanding of language. Searle is wrong in holding that symbol manipulation is insufficient for understanding, but he is certainly right in pointing out that inferential manipulation - a special case of symbol manipulation - is not enough.

In his later reply to further objections (Searle 1982), Searle insists that "in the Chinese room the agent ... doesn't attach any semantic content to the formal symbols" (p. 345). This must be true of the computer inside the robot as well: even in that case, the computer isn't attaching any semantic content to the symbols. It is indeed connecting words with perceptual inputs, but that doesn't count as attaching semantic content, as long as the computer does not know that this is what it is doing (that certain symbols are scenes, while others are words). But then comes the crucial question: how do we know that this is what we are doing (aside from knowing that symbols of a certain type, coming from a certain channel, are attached to symbols of another kind, coming from a different channel: for the computer too knows that much)?
As long as Searle doesn't answer this question, his claim that the seeing computer isn't attaching any semantic content to the symbols it manipulates is empty.[10]

3.2. Complications

Let us now go back to our referentially competent system, for which complications immediately arise. A system of the kind we designed is - obviously - a pattern recognition system. However, the part of the lexicon whose application is essentially governed by pattern recognition is strongly limited. Here are three examples of complications that immediately arise for a system whose referential competence is based on pattern recognition.

First, a relatively easy case. Think of the word 'box'. Even its application is not based merely on the identification of a shape: and the reason is not simply that there are prism-like boxes, cylindrical boxes, cubic boxes and more, but that it is essential to a box to be a container. A parallelepiped of solid wood, size 25 x 10 x 5 cm, is not a box. A parallelepiped of the same size that has a groove parallel to its base is not a box either. That an object is recognized (correctly, in normal cases) as a box depends on a large amount of knowledge, most of which is not available to a mere pattern recognizer: it very much depends "on the context", i.e. on the social nature and function of the place where it is located, on the function the object itself can be presumed to play, etc. There are many common words which raise the same problem: 'desk', 'ball' (as opposed to 'sphere'), 'dish', 'level', 'aerial'.

Secondly, a not-so-easy problem. Assume the system can verify sentences such as 'There is a book on the table', 'There is a book on a table' and the like. Now suppose the system has to verify the sentence

    There is something on the table

Notice that the system works in an entirely top-down fashion: to put it very roughly, it goes from linguistic analysis of the sentence, to referential algorithms, to their application to the scene. It does not start with the scene, so to speak; it starts with lexical items and their associated referential algorithms. But of course there is no referential algorithm corresponding to 'something', or to an existentially quantified variable. So the system should probably go through the notion of an anomaly, or an interruption of texture in a surface. Such a solution, however, would be far from general: it clearly would not work with sentences such as

    There is something in the box

or

    There is something in the room

Cases like this raise a general problem, which is further magnified by words like 'toy' or 'plant'. There is no such thing as the typical appearance of a toy as such: the word 'toy' applies to a variety of objects which look very different from one another, such as dolls, trains, balls, etc. Thus - so the intuition goes - in order to verify the sentence

    There is a toy on the table

we human beings do not go in search of something "looking like a toy". Rather, we go through the following steps: first, we look at the tabletop; second, we identify whatever objects are on it; third, we decide whether any of them can be described as a toy. We can hardly be assumed to start with a list of toy-types {ball, doll, puppet, skates, train, etc.} and proceed to determine whether any such object is on the table.
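
That list-based strategy is nevertheless easy to write down. The toy sketch below shows it as a purely top-down system would have to execute it - one referential algorithm per toy-type, tried in turn against the tabletop region of the scene; the scene representation, the recognizers and all names are invented for the illustration:

    # A minimal sketch of top-down verification of 'There is a toy on the
    # table': the system has no recognizer for 'toy' itself, so it runs,
    # one by one, the referential algorithms for the toy-types listed in
    # its lexicon.  Everything here is an illustrative assumption.

    # Referential algorithms: one recognizer per lexical item, each a
    # predicate over an object found in the analyzed scene.
    def looks_like_doll(obj):  return obj.get("kind") == "doll"
    def looks_like_ball(obj):  return obj.get("kind") == "ball"
    def looks_like_train(obj): return obj.get("kind") == "toy train"

    REFERENTIAL_ALGORITHMS = {
        "doll": looks_like_doll,
        "ball": looks_like_ball,
        "train": looks_like_train,
    }

    # The 'inferential meaning' of 'toy', reduced to a list of more
    # specific words, each of which does have a referential algorithm.
    TOY_TYPES = ["doll", "ball", "train"]

    def objects_on_the_table(scene):
        """The part of the scene already segmented as lying on the
        tabletop (here simply given)."""
        return scene["on_table"]

    def verify_toy_on_table(scene):
        """Try each toy-type's referential algorithm against the
        tabletop region; succeed as soon as one of them succeeds."""
        for word in TOY_TYPES:
            recognize = REFERENTIAL_ALGORITHMS[word]
            if any(recognize(obj) for obj in objects_on_the_table(scene)):
                return True   # verified via 'There is a <word> on the table'
        return False

    example_scene = {"on_table": [{"kind": "book"}, {"kind": "ball"}]}
    print(verify_toy_on_table(example_scene))   # True: found via 'ball'
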
However, that enumerative procedure would be the only one open to the system as we designed it: in order to verify the sentence 'There is a toy on the table' the system must verify at least one sentence in a list constructed from the inferential meaning of 'toy', i.e. it must verify either 'There is a doll on the table' or 'There is a ball on the table', etc. As far as I know, today's systems of vision can (under certain conditions) recognize objects in a scene starting with the objects, not starting with the scene (see Rosenfeld 1988, pp. 287-288). They can determine whether and where in a scene a given object is located starting with the object's definition, but they cannot determine which objects are present starting with a scene's analysis. In order to solve problems of this kind - let us call them "problems of bottom-up recognition" - we would have to think of a different architecture for the system: we must imagine a kind of back-and-forth procedure in which aspects of the scene (contours, corners and the like) would select possible shapes in a pre-defined catalogue, activating top-down algorithms, some of which would fail, whereas others might succeed and undergo a further selection by a perhaps more tightly structured scene. But all this is very hard to imagine, not to say design. One could perhaps train a neural network to do bottom-up recognition of objects on a tabletop, but it is very hard to figure out how exactly it would work (as usual with neural networks: see McCloskey 1991).

To conclude: all this shows that it is very hard even to imagine a system whose referential competence would be comparable to our own. Philosophically, however, the point is one of principle: could the system be said to be semantically competent - really competent - at least for the fraction of the lexicon where it is referentially competent? Would it really understand 'The book is on the table', 'There are two cups and just one spoon', etc.? If your intuitions are like mine, you will answer 'Yes'. But even if your answer were to be 'No', the resulting discussion is bound to be extremely instructive.

NOTES

* Research leading to this article was partly supported by the Italian MURST - 40% funds.
1. Since the repudiation of Turing's test (Turing 1964), the stress has tended to be placed on how such performances are carried out, i.e. on the structure of the programs and the kind of data they have access to. Here I am neither implying that such features are irrelevant to whether a system can legitimately be said to understand language, nor restoring some version of Turing's test as having definitional import. I am simply suggesting that no one would think we were dealing with (prospective) natural-language understanding systems if they were not capable of these kinds of performances.
2. For surveys of work on machine translation, see Hutchins 1986, Allegranza 1993.
3. Structural competence is the ability to determine the meaning of a complex expression from its syntactic structure and the meanings of its constituents. Partee (1981, pp. 61-62) has called 'structural semantics' the part of semantic theory which accounts for semantic compositionality, i.e. for the effects of syntax on meaning.
4. Fodor (1981) admits that "it is, of course, not very interesting to say that 'chair' refers to chair, since we have no theory of reference and ... no mechanism to realize the theory ... but at least it's true" (p. 223).
The point is that, if we only knew that much, we could not speak a language.
5. Thus I am certainly not assuming that "every nonlogical concept is reducible to sensation concepts" (Fodor 1981, p. 213).
6. If understanding in absentia were qualitatively different from understanding in praesentia, then the understanding of fictional discourse - which is mostly in absentia - would differ in most cases from the understanding of non-fictional discourse, which can be in praesentia.
7. It is occasionally remarked that there is no principled reason to assign pride of place to vision among the perceptual modalities: an artificial system's referential competence might be based on its tactile or olfactory capacity. And there is no denying that even in the case of human beings, recognition is sometimes sustained by, or even entirely based upon, how something smells (think of alcohol or gasoline) or feels to the touch (as with discrimination among different kinds of cloth). Vision is, however, crucial to most human recognition procedures. Therefore, imagining an artificial system endowed with artificial sight (rather than artificial smell or touch) makes it easier to draw comparisons with human performances and the abilities underlying them. A referential competence based on vision is ipso facto closer to human referential competence. However, I do not mean to deny that an artificial referential competence could be based on other perceptual modalities. It is an interesting question whether such non-visually based competence could underlie a linguistic competence similar to the one we have.
8. One should not for a moment conclude that our referential competence should be conceived along exactly the same lines. For instance, I am not attributing to the system any form of partial referential competence, such as the ability to discriminate uranium with respect to animals and plants, which I claimed we possess. As far as the system is concerned, referential competence is an all-or-none affair: either the system has a referential algorithm associated with the word 'uranium' or it lacks it, and if it does lack it, then its direct referential competence relative to that word is simply nil (naturally, the system may have a referential algorithm which is imperfect from the standpoint of, say, scientific standards of classification: but that is another matter).
9. This reply is in sympathy with other replies to Searle, particularly with those of Martin Ringle and Drew McDermott. Ringle (1980, p. 445) says: "If the causality with which Searle is concerned involves nothing more than direct connectivity between internal processes and sensori-motor states, it would seem that he is really talking about functional properties, not physical ones [...] Connecting actual sensorimotor mechanisms to a perceptronlike internal processor should, therefore, satisfy causality requirements of this sort". And McDermott (1982, p. 340): "If ... you deny that a computer interpreting any set of rules ... understands, then you beg the question, and this is what Searle does".
10. Cf. Pylyshyn's question (1980, p. 443): "What licenses us ever to say that a symbol refers?"

REFERENCES

Allegranza, V. (1993). Le forme dell'interlingua. Osservazioni sui modelli linguistici della traduzione automatica. Sistemi Intelligenti 5: 121-157.
Fodor, J. (1981). Tom Swift and His Procedural Grandmother. In Representations, 204-224. Harvester: Brighton.
Gazdar, G. (1993). The Handling of Natural Language. In Broadbent, D. (ed.),
The Simulation of Human Intelligence, 151-177. Blackwell: Oxford.
Hutchins, W. J. (1986). Machine Translation: Past, Present, Future. Ellis Horwood: Chichester.
Johnson-Laird, Ph. (1983). Mental Models. Cambridge Univ. Press: Cambridge.
Marconi, D. (1987). Two Aspects of Lexical Competence. Lingua e Stile 22: 385-395.
Marconi, D. (1991). Understanding and Reference. Sémiotiques 1: 9-25.
Marconi, D. (in press). On the Structure of Lexical Competence. To be published in Proceedings of the Aristotelian Society, January 1995.
McDermott, D. (1982). Minds, Brains, Programs, and Persons. Behavioral and Brain Sciences 5: 339-341.
McCloskey, M. (1991). Networks and Theories - The Place of Connectionism in Cognitive Science. Psychological Science 2: 387-395.
Partee, B. (1981). Montague Grammar, Mental Representations and Reality. In Öhman, S. & Kanger, S. (eds.), Philosophy and Grammar, 59-78. Reidel: Dordrecht.
Putnam, H. (1975). The Meaning of 'Meaning'. In Philosophical Papers, Vol. 2. Cambridge Univ. Press: Cambridge.
Putnam, H. (1979). Reference and Understanding. In Margalit, A. (ed.), Meaning and Use, 199-217. Reidel: Dordrecht.
Putnam, H. (1981). Brains in a Vat. In Reason, Truth and History, 1-21. Cambridge Univ. Press: Cambridge.
Pylyshyn, Z. (1980). The 'Causal Power' of Machines. Behavioral and Brain Sciences 3: 442-444.
Ringle, M. (1980). Mysticism as a Philosophy of Artificial Intelligence. Behavioral and Brain Sciences 3: 444-446.
Rosch, E. (1978). Principles of Categorization. In Rosch, E. & Lloyd, B. B. (eds.), Cognition and Categorization. Erlbaum: Hillsdale, NJ.
Rosenfeld, A. (1988). Computer Vision. Advances in Computers, Vol. 27, 265-308. Academic Press: New York.
Searle, J. (1980). Minds, Brains and Programs. Behavioral and Brain Sciences 3: 417-457.
Searle, J. (1982). The Chinese Room Revisited. Behavioral and Brain Sciences 5: 345-348.
Turing, A. M. (1964). Computing Machinery and Intelligence. In Anderson, A. R. (ed.), Minds and Machines, 4-30. Prentice Hall: Englewood Cliffs, NJ.
Tye, M. (1991). The Imagery Debate. M.I.T. Press: Cambridge, MA.
Wilks, Y. (1982). Some Thoughts on Procedural Semantics. In Lehnert, W. G. & Ringle, M. H. (eds.), Strategies for Natural Language Processing, 495-516. Erlbaum: Hillsdale, NJ.
Wittgenstein, L. (1953). Philosophical Investigations. Blackwell: Oxford.