Spatial Cognition
EDITOR
Maxim I. Stamenov
(Bulgarian Academy of Sciences)
EDITORIAL BOARD
David Chalmers (University of Arizona)
Gordon G. Globus (University of California at Irvine)
Ray Jackendoff (Brandeis University)
Christof Koch (California Institute of Technology)
Stephen Kosslyn (Harvard University)
Earl Mac Cormac (Duke University)
George Mandler (University of California at San Diego)
John R. Searle (University of California at Berkeley)
Petra Stoerig (Universität Düsseldorf)
Francisco Varela (C.R.E.A., Ecole Polytechnique, Paris)
Volume 26
Edited by
SEÁN Ó NUALLÁIN
Nous Research and Dublin City University
Part I
Epistemological Issues
Men and Women, Maps and Minds: Cognitive bases of sex-related
differences in reading and interpreting maps 3
Gary L. Allen
A Theoretical Framework for the Study of Spatial Cognition 19
Maurizio Tirassa, Antonella Carassa and Giuliano Geminiani
Describers and Explorers: A method for investigating cognitive maps 33
Antonella Carassa, Alessia Aprigliano and Giuliano Geminiani
The Functional Separability of Self-Reference and Object-to-Object
Systems in Spatial Memory 45
M. Jeanne Sholl
In Search for an Overall Organizing Principle in Spatial Mental
Models: A question of inference 69
Robin Hörnig, Berry Claus and Klaus Eyferth
Describing the Topology of Spherical Regions using the ‘RCC’
Formalism 83
Nicholas Mark Gotts
Cognitive Mapping in Rats and Humans: The tent-maze, a place
learning task in visually disconnected environments 105
Marie-Claude Grobéty, Muriel Morand and Françoise Schenk
Spatial Cognition Without Spatial Concepts 127
Arnold Smith
Part II
Software Applications: Multimedia, GIS, diagrammatic reasoning
and beyond
CHAMELEON Meets Spatial Cognition 149
Paul Mc Kevitt
SONAS: Multimodal, Multi-User Interaction with a Modelled
Environment 171
John Kelleher, Tom Doris and Qamir Hussain (DCU), and Seán Ó Nualláin
Designing Real-Time Software Advisors for 3D Spatial Operations 185
Mike and Ann Eisenberg
Using Spatial Semantics to Discover and Verify Diagrammatic
Demonstrations of Geometric Propositions 199
Robert K. Lindsay
Formal Specifications of Image Schemata for Interoperability in
Geographic Information Systems 213
Andrew U. Frank and Martin Raubal
Using a Spatial Display to Represent the Temporal Structure of
Multimedia Documents 233
Mireille Bétrancourt, Anne Pellegrin and Laurent Tardif
Part III
Language and Space
A Computational Multi-layered Model for the Interpretation of
Locative Expressions 249
Luca Anibaldi and Seán Ó Nualláin
The Composition of Conceptual Structure for Spatial Motion
Imperatives 267
John Gurney and Elizabeth Klipple
Modelling Spatial Inferences in Text Understanding 285
Ute Schmid, Sylvia Wiebrock and Fritz Wysotzki
Part IV
Memory, Consciousness and Space
Given-New Versus New-Given? An analysis of reading times for
spatial descriptions 317
Thom Baguley and Stephen J. Payne
A Connectionist Model of the Processes Involved in Generating and
Exploring Visual Mental Images 329
Mathias Bollaert
Working Memory and Mental Synthesis: A dual-task approach 347
David G. Pearson and Robert H. Logie
I wish to thank Velibor Korolija, Mary Hegarty, and John Kelleher for their help
in organising the event. Dedicated to the people of Omagh, and all other victims
of arbitrary borders.
Introduction
Spatial Cognition – Foundations and applications
Seán Ó Nualláin
Mind III, the third annual conference of the Cognitive Science Society of Ireland
(CSSI), took place in DCU from August 17 to August 19, 1998. It was spon-
sored by Nous Research (nous@dna.ie). The Mind conferences focus on a
different theme every year; in 1998, 55 researchers from 10 different countries
representing the disciplines of Psychology, Computer Science, Linguistics and
Geography convened to discuss how people think and communicate about space.
With respect to consciousness studies, several of the papers here discuss the
detailed structure of the representation of space in short-term memory (STM).
Cognitive scientists study many different phenomena within the general
topic of spatial cognition. Those concerned with environmental cognition study
people’s internal “cognitive maps” of their environment, and how they learn the
layout of new places they visit. Other cognitive scientists are concerned with
how we communicate about space, e.g. in giving route directions. Some others
study the power of spatial metaphors, both in language and graphics, for
communicating and reasoning about non-spatial information.
Apart from the obvious human interest of this research, there are massive
technological consequences. We are moving fast into the age of intelligent
multimedia, an age in which computers will show an ability to process multi-
modal inputs in real time. These advances have highlighted the need to better
understand how people think about space, so that we can design “user-friendly”
computer and information systems. For example, cognitive science is informing
the development of computer interfaces, multimedia, virtual reality and in-car
navigation systems.
This collection is divided into four thematic sections. The first covers
epistemological issues, where the topics range from a formalization of the topology
of spherical regions to descriptions of gender differences in spatial cognition and
the way in which mass media disrupt normal spatial intuitions. We then
venture into the area of software applications. We find here an intriguing range
of applications, from intelligent multimedia (intellimedia) products which process
multimodal signals (indeed, in the case of the Sonas system, multi-user multi-
modal signals) in real time, to geographical information systems (GIS) and tools
for diagrammatic reasoning. The third section's analysis of space and language
emerges from questions such as how symbols like cognitive maps arise from sensory
stimulation. The critical issues — for example the semantics of prepositions like
“at” and “across” — merge here with questions about how to represent sentences
involving them in modelled environments. Finally we come to the section most
relevant to this Advances in Consciousness Research series: memory, conscious-
ness, and space. We are concerned here in particular with unpacking the old
coarse-grained STM versus LTM dichotomy, and with articulating the detailed
structure of the former.
Gary Allen, with whose contribution we open this book, must be lauded for his
courage! On the one hand, he is exploring the politically sensitive area of
cognitive sex differences; on the other, he is concerned with the development of
a conceptual taxonomy for spatial abilities in which these differences will
emerge in some principled fashion. His conclusions, which focus, inter alia, on
sex differences with respect to the use of working memory, make interesting
reading indeed. (In section four, we shall note a paper by Pearson and Logie that
re-addresses this issue of working memory.)
The following three contributions all focus on a theme central to this area:
the world considered in terms of the set of possible actions one can perform on
it (egocentric cognition) or as some kind of shared map (intersubjective cogni-
tion) — Ó Nualláin (1995, 2000) contains an analysis of this distinction.
Tirassa’s counterpoint to this distinction is “coupled” versus “decoupled”
architectures. Only the latter can truly be termed representational. The former can
be further divided into pure reflex actions, and information pickup in a Gibsonian
manner based on affordances. Representational architectures, on the other hand,
begin with the deictic, where organisms represent only what they can perceive.
Once object permanence (in the Piagetian sense) is achieved, we enter the realm
of base-level representations. The formal, “meta” level, accessible only to higher
primates, involves the ability to represent one's own representations. Tirassa's
work is strongly influenced by the situated cognition view that interaction is key.
Carassa et al.’s work, which follows, analyses the shift that occurs from egocen-
tric to intersubjective cognition as familiarity with a scene grows.
Sholl's superb paper, which follows, is concerned with finding experimental
In this section, we detect a range of new techniques, at least through a glass darkly.
The section begins with intellimedia products designed to achieve the age-old goal
of “making computers invisible”: products so small, and so natural in use, that
nobody notices them being used. We then move on to aids to diagrammatic
reasoning that will perhaps help us all improve our reasoning abilities. It is
obviously too much to expect that we will all be capable of the insights of a
Kekulé, Faraday or Einstein as a result of such products.
At a time when he was groping toward the cyclic model of benzene which
made him famous, Kekulé had a dream involving the archetypal image of a snake
swallowing its own tail (though it has to be said that Kekulé gave several different
versions of this story). Faraday's insight, followed up by Maxwell, conceived of the
electromagnetic medium as a fluid under stress, and of the electromagnetic field as
vortices in that fluid. Lastly, Einstein's investigation of special relativity began with
a thought experiment in which he imagined riding astride a light beam.
Paul McKevitt’s paper gives an introduction to the general topic of intellimedia,
as exemplified by Aalborg’s “Chameleon” system. He describes other systems such
as those emerging from the Saarbrücken Vitra (visual translator) environment: Soccer
(which is self-explanatory!) and Moses, which shows how a user generates step-
by-step route information. Situated artificial communicators are exemplified by
Rickheit's system, which shows how a model airplane can be constructed from
wooden bits by human instructions. Okada's Aesopworld, McKevitt would agree, is
hamstrung by a pre-Kantian (and perhaps pre-pre-Socratic) view of mind. Chameleon
itself is an exciting system which accepts multimodal input, with a dialogue manager
to handle the natural language element, a gesture recogniser to track 2D pointing
gestures, a laser system to act as a system pointer as the system explains directions,
and customised speech and syntax/semantics modules. The frame-based knowledge
representation and the blackboard model to integrate the different modalities are
relatively undeveloped, but the system’s “Topsy” learning capacity adds some
valuable functionality.
The SONAS system, still in its infancy, is introduced by John Kelleher, Tom
Doris and Qamir Hussain. We have already mentioned its multimodal, multi-user
environment; we are currently experimenting with the basic architecture
across a range of different graphics environments, including VRML and OpenGL.
The limitations of this print medium prevent proper appreciation of the next
paper by the Eisenbergs. Hypergami is an educational application for the creation
of polyhedra; there exist advisors that suggest how to operate on these. The
charm of these structures eludes the written word or two-dimensional diagram.
Bob Lindsay’s paper for this volume “Using spatial semantics to discover and
verify diagrammatic demonstrations of geometric propositions” differs from and
is considerably more informative than the original conference title “Discovering
diagrammatic representations”. As he points out, diagrams and maps help
preserve spatial representations; language-like representations don’t do so. We
are, in a sense, back to the killing fields of the mental imagery debate about
whether the representations and processes underlying expression of mental
images are analog or propositional. Lindsay would probably agree with
Shepard's conclusion that mental imagery and its associated cognitive processes
are functionally similar to vision processing. Diagrams must be reflected on, with
respect to reasoning, vis-à-vis their comprehension, generation and use. Lindsay's
work, which dates back thirty years, must be considered in the context of
complementary work by Iwasaki, Barwise, MacDougal, and others.
Lindsay emphasises that what geometric rules are about is constraints. These
can be expressed in forms as simple as spreadsheets. He goes on to detail the
Archimedes system. Andrew Frank, who was one of our keynote speakers,
emphasises that a formal specification of spatial objects and relations is critical
for geographical information systems (GIS). He speculates that image schemata
(in the Lakoff-Johnson sense) for geographic space are different from those for
table-top space. With many references, he gives a thorough description of image
schemata in general, passing on to an exemplification before reaching his
conclusions.
Bétrancourt et al. attack the problem of conveying the temporal structure of
multimedia documents. After reviewing spatial multimedia authoring problems
and the cognition literature, they indicate how their system “Madeus” conveys
temporal organisation by a spatial description. With a scholarly thoroughness,
they analyse some experimental results. Finally, on http://www.compapp.dcu.ie/
~tdoris/mind3.html, the reader can view a set of papers that didn’t make it into
this book.
We begin this section with two papers on the intrinsically difficult topic of
lexical semantics of spatial expressions. Smith's paper in section I has given us
a basic insight into how complicated this area is; consider, simply, the use of the
preposition “in” in these two (three!) sentences:
Part of the educational experience of Mind-3 was the precision of the analyses
of the structure of memory. Baguley's neat demonstration opens this section for
us. He argues that the episodic construction trace is a fundamental concept in the
analysis of short-term memory. There is a considerable computational cost for
introducing new objects as compared with old objects. Analysis of processing costs
during reading is done in the context of mental models theory, and the experi-
ments make interesting reading.
We end with two enthralling experimental and computational contributions
to this, the topic most relevant to consciousness studies. Bollaert first performed
experiments on mental manipulation, and then ran a simulation compatible with
his and other experimental results. His hypotheses included an expectation that
response times would be longer with larger inter-object paths, and that stated
spatial relations should result in a shorter response time than unstated ones. His
eventual model proposes three separate modules: an object memory for higher
visual areas, a feature memory for intermediate visual areas, and a shape memory
for retinotopically organised areas. His neural network simulation shows patterns
compatible with the results of the experiments. Finally, we come to the article
closest to the expectations that might be hatched by the writer of this book’s
blurb. Pearson and Logie’s paper is mainly a review proposing a framework in
which specialist components act as temporary storage buffers for visual-spatial
and verbal material, while a central executive generates and maintains visual
images.
I hope these pages convey some of the excitement of the event.
Reference
Ó Nualláin, Seán (1995, 2000). The Search for Mind. Exeter: Intellect.
Part I
Epistemological Issues
Men and Women, Maps and Minds
Cognitive bases of sex-related differences
in reading and interpreting maps
Gary L. Allen
University of South Carolina
Three translations that are fundamental to many map interpretation tasks were
examined in this study, specifically, (a) translation of vertical viewing angle from
a same-plane three-dimensional view to an overhead two-dimensional view; (b)
translation of scale; and (c) translation of orientation so that the configuration of
objects in the environment is congruent with the configuration of objects on the
map. Previous research suggested that the most likely problem to be encountered
by women would be the third type of translation, which involves recognition of
a consistent configuration of objects despite array rotation (Evans 1980).
Method
A set of 140 pairs of color slides was prepared. Participants in the study were
instructed to look at the first slide in a pair very carefully because they subse-
quently were to compare the arrangement of objects shown in the first picture to
that of objects shown in a second picture. Appropriate illustrations of experimen-
tal manipulations were included in practice trials. Participants responded “yes” as
soon as they determined that the two pictures portrayed the same array of objects
(accommodating vertical viewing angle, scale, and array orientation) or “no” as
soon as they determined that the arrays in the two pictures could not be the
same. Data were collected from 12 men and 12 women who were undergraduates
in psychology courses.
Results
A signal detection analysis revealed that men were generally more sensitive than
women to the differences between correct pairings and mismatches, collapsing
across all three manipulations. Yet, the data showed no difference between the
sexes with regard to bias for responding either “yes” or “no.” An analysis of hits,
that is, correct “yes” responses when the two pictures showed the same array,
also showed no difference between men (79% accuracy) and women (75%
accuracy) across the different manipulations. Correct identification of identical
arrays was at an above-chance level of performance by both sexes on all problem
types. In contrast, an analysis of correct rejections, that is, correct “no” responses
when the two pictures showed different arrays, revealed a significant main
effect, with men (90% accuracy) performing more accurately than women (70%
accuracy). Women showed two instances in which their performance was
basically at chance level, correct rejections involving manipulation of both
vertical viewing angle and scale (58% accuracy) and correct rejections involving
manipulation of both scale and array rotation (58% accuracy). An analysis of
response times revealed no differences between the sexes.
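As a concrete gloss on these terms, the following sketch (not the authors' analysis code) computes the standard equal-variance signal detection measures — sensitivity (d′) and response bias (c) — from hit and false-alarm rates, using the rounded group accuracies reported above purely for illustration.

```python
# A minimal sketch of equal-variance signal detection measures: sensitivity
# (d') and response bias (c). The rates below are the rounded group
# accuracies reported in the text, used only for illustration.
from scipy.stats import norm

def sdt_measures(hit_rate, false_alarm_rate):
    """Return (d', c) under the standard equal-variance Gaussian model."""
    z_hit = norm.ppf(hit_rate)         # z-transform of the hit rate
    z_fa = norm.ppf(false_alarm_rate)  # z-transform of the false-alarm rate
    return z_hit - z_fa, -(z_hit + z_fa) / 2

# Men: 79% hits; 90% correct rejections imply a 10% false-alarm rate.
print(sdt_measures(0.79, 0.10))  # d' ~ 2.09, c ~ 0.24
# Women: 75% hits; 70% correct rejections imply a 30% false-alarm rate.
print(sdt_measures(0.75, 0.30))  # d' ~ 1.20, c ~ -0.08
```

On these rounded group rates, the sensitivity gap is clear while both bias values remain modest, consistent with the pattern reported above.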
Discussion
Figure 1. Mean accuracy for men and women in matching arrays consisting of three objects
requiring one, two, or three transformations typical of map interpretation
One of the most frequent uses of maps in everyday life involves referring to
street maps or road maps for wayfinding purposes. A street map requires the user
to establish correspondence between the arrangement of pathways in the environ-
ment and the arrangement of pathways on the map. The purposes of this study
were to determine the impact of
map orientation on the ease with which this correspondence is established and to
examine the relationship between spatial abilities as assessed using traditional
psychometric tests and the ability to establish correspondence between environ-
ment and map. Previous research has demonstrated clearly that misalignment of
environment and map leads to errors in orientation and wayfinding (Levine
1982). Misalignment results in precisely the kind of conflict between spatial
frames of reference that, according to prior research on sex-related differences
(Evans 1980; Witkin et al. 1962), should lead to difficulties for women on map
interpretation tasks.
Method
The battery included the Surface Development Test to assess visualization ability,
the Cube Comparison Test to assess spatial relations ability, the Hidden Figures
Test to assess flexibility of closure, the Gestalt Completion Test to assess speed of
closure, the Map Memory Test to assess visual memory ability, and the Map
Planning Task to assess visual scanning ability. Scores from these tests were
expected to be highly intercorrelated. Data were collected from 34 men and 69
women who were undergraduates in psychology courses.
Results
The results showed that men performed more accurately than women on the map
verification task when the map to be verified was misaligned with the model
town (83% versus 74% accuracy) but not when the map and town were aligned
(89% versus 87%). As in the case of Experiment 1, separate analyses were done
on hits and correct rejections. The difference between men and women was
significant in both analyses, but the performance advantage for males in the case
of hits (85% versus 82%) was only a third of that observed in the case of correct
rejections (87% versus 78%). An analysis of response times in the map verifica-
tion task showed a significant effect of alignment but no sex-related effects or
interactions. Mean response time for trials when map and town were aligned was
4.4 seconds, as compared to 6.7 seconds when they were misaligned.
Results from the battery of psychometric tests showed sex-related differenc-
es in favor of men on all except the test of visual memory, in which no differ-
ence was found. As expected, spatial ability test scores correlated significantly
with each other, with the single exception of Map Memory (visual memory) and
Map Planning (visual scanning). Scores from tests of visualization, spatial
relations, speed of closure, flexibility of closure, visual memory, and visual
scanning were all positively correlated with performance on the map verification
task, and these correlations were equal to or larger than the correlation between
sex of participant and map verification performance. Thus, statistically removing
the influence of these abilities individually eliminated the significant correlation
between sex of participant and performance on the map verification tasks, except
in the case of visual memory. Because the statistical relationship between sex of
participant and visual memory scores was not significant, removing its influence
obviously would not affect the relationship between sex of participant and map
verification performance.
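The partialling step described here can be illustrated with the first-order partial correlation formula (a sketch, not the authors' analysis; the function name is ours), using the r values from Table 1:

```python
# Sketch of the first-order partial correlation used to "statistically remove"
# an ability's influence: r_xy.z for x = sex, y = map verification, z = one
# ability score at a time. The r values are taken from Table 1.
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Sex vs. map verification (r = .29), controlling for Surface Development
# (sex-ability r = .36, ability-verification r = .41):
print(round(partial_r(0.29, 0.36, 0.41), 2))  # 0.17, below the .20 cutoff
# for reliability at p < .05 noted under Table 1
```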
Discussion
The results of this study indicated that women were at no disadvantage vis-à-vis
men in establishing correspondence between environment and map when their
perspective of the environment was aligned with their perspective of the map.
Figure 2. Percent correct for men and women when map and town were aligned
versus misaligned
Table 1. Correlations among sex of participant, scores on six psychometric tests, and
performance on the map verification task
1 2 3 4 5 6 7
1. Sex –
2. Map Verification .29 –
3. Surface Development .36 .41 –
4. Cube Comparison .24 .31 .71 –
5. Hidden Figures .25 .25 .53 .49 –
6. Gestalt Completion .20 .29 .46 .35 .28 –
7. Map Memory .09 .29 .32 .27 .30 .25 –
8. Map Planning .20 .26 .59 .49 .45 .25 .19
Note: Correlations .20 or larger are statistically reliable at p < .05.
When the map was misaligned, however, their performance level was significant-
ly below that of men, although still above chance. As in Experiment 1, women
were less reliable than men in correctly rejecting inaccurate maps. But in this
study, they also had less success in accurately identifying correct maps. Thus,
women appear to be more at general risk than men for orientation errors in
instances in which maps used as wayfinding aids are misaligned with the
environment they represent.
Women’s difficulties with certain map-reading situations were related to
their lower performance on tests assessing complex spatial abilities, such as
visualization, spatial relations, flexibility of closure, and spatial scanning.
Although basic visual memory ability was related to map reading skill, it did not
differentiate men from women in this regard. The fact that complex abilities, but
not visual memory per se, mediated sex-related differences in map reading
performance provides support for the working memory load hypothesis. Specifi-
cally, it implicates difficulties with the simultaneous storage and processing of
spatial information rather than with storage alone. Performance on the map
verification task can be interpreted in a manner consistent with this view, as
well. Establishing correspondence between route and map requires additional
processing when misalignment must be accommodated as compared to instances
in which the two are viewed from a compatible perspective. In short, the load on
working memory is greater with misalignment.
Method
Results
Figure 3. Effect of direction of depicted travel on men’s and women’s ability to identify
accurate maps after viewing a slide presentation simulating travel along a route
Discussion
The results from this study showed that women generally had difficulty with this
task; they were unable to identify correct maps reliably under any conditions and
unable to detect errors in maps unless the errors appeared early in the route (for
all directions of travel depicted on the map) or late in the route (for horizontally
depicted directions of travel). Men’s performance differed in that they were
much more accurate in identifying correct maps.
At this point, it is difficult to suggest the cognitive basis for the sex-related
difference found in this study. Clearly, the idea of alignment, interpreted as
correspondence between straight-ahead movement through the environment and
bottom-to-top directional travel on a map, cannot account for the findings. This
alignment did not facilitate performance. One factor that does stand out is that
the middle portion of the route posed a major challenge for accuracy. If partici-
pants were reduced to chance-level performance with respect to the middle
portion of the route, then their reliability in judging the accuracy of correct maps
and the inaccuracy of incorrect maps with errors in the middle of the route
would be low. This pattern fits the data from women participants, but the results
concerning correct rejections also show that men were unreliable in identifying
incorrect maps with errors in the middle portion. Thus, a definitive explanation
is elusive.
Recent studies have indicated that verbal descriptions can provide a flexible
spatial representation that can be translated readily into a cartographic product
(Taylor & Tversky 1992). In other words, accurate maps of environments can be
sketched by individuals who hear a verbal description of the layout involved.
This study was designed to examine sex-related differences in the ability to draw
a route on a map of a neighborhood using information obtained from a descrip-
tion of that route. Based on previous research suggesting more frequent left-right
confusion in women than in men, it was expected that men’s routes sketched on
maps would be more accurate than would women’s routes.
Method
A verbal description of a 1.7 km route was prepared to conform with suggested
conventions for effective route directions, including the veridical temporal-spatial
sequencing of features encountered along the route, the inclusion of more
descriptive information at choice points than at other places along the route, and
the inclusion of more descriptive information at the conclusion of the route than
at other places (Allen 1997). The description included 63 communicative
statements organized into 21 units. Each unit either described a place or described
movement from one place to the next. The description far exceeded the amount
of information that an individual would normally be expected to retain temporari-
ly in memory. After hearing the description, participants were provided a map of
the neighborhood through which the route proceeded and were instructed to mark
with a pencil the course of movement described in the route directions. A point
of origin was designated on the map. The accuracy of route maps was deter-
mined by dividing them into 21 units that corresponded to the 21 units in the
route directions. The accuracy of each unit was scored by two judges, and
composite accuracy was then computed for the initial one-third, middle one-third,
and final one-third of the route. Data were collected from 25 men and 25 women
who were undergraduates in psychology courses. All participants reported being
unfamiliar with the route involved in the study.
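As a purely illustrative sketch of the composite scoring just described (the judgment values are invented; this is not the authors' scoring procedure in code), per-unit accuracies averaged over the two judges can be aggregated into thirds of the 21-unit route:

```python
# Illustrative sketch of the composite accuracy scoring: 21 per-unit
# judgments from each of two judges are averaged per unit, then summarized
# as percent correct for the initial, middle, and final thirds of the route.
def composite_thirds(unit_scores):
    """unit_scores: 21 per-unit accuracies in route order -> % per third."""
    assert len(unit_scores) == 21
    thirds = (unit_scores[:7], unit_scores[7:14], unit_scores[14:])
    return [round(100 * sum(t) / len(t)) for t in thirds]

# Invented 0/1 judgments from two judges, averaged per unit:
judge1 = [1, 1, 1, 1, 1, 1, 0,  1, 0, 1, 0, 0, 1, 0,  0, 1, 0, 1, 0, 0, 1]
judge2 = [1, 1, 1, 1, 1, 1, 1,  0, 0, 1, 0, 1, 1, 0,  0, 1, 1, 0, 0, 0, 1]
mean_scores = [(a + b) / 2 for a, b in zip(judge1, judge2)]
print(composite_thirds(mean_scores))  # e.g. [93, 43, 43]
```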
Results
The accuracy scores revealed that men were more accurate than women in
sketching the initial one-third (80% versus 66%) and middle one-third (55%
versus 42%) of the route on the map. There was no difference between men and
women on the final one-third of the route (44% versus 52%).
Figure 4. Accuracy of men’s and women’s routes sketched on a map after hearing route
directions
Discussion
The data indicated that men were generally more accurate than women in
sketching the course of a route on a map while relying on information obtained
from a verbal description of that route. The performance of both men and
women showed clear evidence of a primacy effect, which is common in memory
tasks in which a list must be recalled in serial order. Interestingly, the pattern of
performance by women showed some indication of a recency effect, as well.
Because the order of sketching was constrained — it always began at the
designated point of origin — it is not the case that the final portion of the map was
drawn first in such cases. Instead, a recency effect would suggest a somewhat
different strategy based on a linear-order representation built from the end-
anchors inward rather than from start to finish. More research is needed to
develop this hypothesis and test it adequately.
Taken together, the empirical studies addressing the four topics provided
substantial evidence that, in the population sampled, certain aspects of women’s
map reading and interpretation skills were less reliable than those of men. The
differences were significant but not overwhelming, widespread but not all-
encompassing. The evidence hints at the cognitive bases for observed differences
in performance, but additional research is required before more definitive
statements are possible. No findings suggested deficits that would be impervious
to remediation based on widely recognized principles of skill acquisition.
With regard to the cognitive bases for the performance differences ob-
served, the evidence points to two aspects of spatial working memory. First,
women appear to be at greater risk than men of losing temporarily stored
information about spatial relations among objects, particularly if a cognitive
operation such as scale or perspective translation is performed on the stored
information or while the information is being stored. This risk could be attributed
to a simplistic notion of reduced working memory capacity, but it could also be
tied to the differential efficiency of various formats for storing spatial relations
(specifically, proposition lists versus images). Future studies designed to differ-
entiate between more efficient images and less efficient proposition lists would
be crucial for exploring this possibility.
The second aspect of working memory that is implicated involves the
inspection of spatial relations. While attempting to match object configurations
in the absence of a defining frame of reference, women may respond on the
basis of limited sampling of spatial relations because of the problem with
temporary storage mentioned above. Such a state of affairs would yield reliable
success in the event of environment-to-map matches but less reliable rejection of
environment-to-map mismatches.
This problem with limited sampling need not necessarily extend to corre-
spondence between routes and maps, however. When attempting to match route
patterns, the sequential format of the information may make it much easier to
detect route-to-map mismatches than is the case with object configurations.
Specifically, limited sampling of the beginning or end portions of the route
would yield quick detection of errors. As is frequently the case with sequentially
organized information, the middle portion provides the greatest challenge to
memory, especially with limited learning opportunities. Studies focusing specifi-
cally on the effect of limited sampling of spatial relations in array-to-map or
route-to-map matching tasks are needed to explore the hypothesis emerging from
these studies.
These findings are not without practical implications. The use of maps as
wayfinding and orientation aids may be differentially problematic for women
under certain circumstances. There is a need to compare the results from
experiments such as these with findings from research comparing the effective-
ness for men and women of map-based automated wayfinding aids, such as those
available in some automobiles. Available evidence would lead to the prediction
that such aids would be ineffective for many women, especially under the
conditions of divided attention (i.e., navigation while driving) that are typical of
their use.
Results from the experimental studies described in this presentation do not
address the larger issue of how sex-related differences in cognition emerged.
However, they may provide a measure of progress toward a necessary milestone
on the way to that goal, the milestone of a better understanding of the differenc-
es that require explanation. Men appear to be somewhat less affected than
women by visuo-spatial working memory demands in the context of tasks
requiring that a correspondence be established between a spatial array and a
symbolic representation of that array.
What questions should be asked about the origins of this difference?
Speculative accounts frequently suggest a link between sex-related differences
in spatial abilities and differences in travel behavior during human evolution.
However, perhaps even more extensive speculation is needed to account for
differences involving maps and minds. Because they are artifacts, maps must
reflect certain aspects of the modes of thought underlying their creation. Is it
possible that map reading as an activity taps the vestiges of male-characteristic
cognitive tendencies expressed in the historic conventions associated with map
design and use? Answering this question requires interdisciplinary inquiry far
beyond experimental studies of map reading and interpretation, but this is the
type of question that can attract scholarly attention from a variety of disciplines
to the study of spatial cognition.
References
Allen, Gary L. (1997). From knowledge to words to wayfinding: Issues in the production
and comprehension of route directions. In S. Hirtle & A. Frank (Eds.), Spatial
Information Theory: A theoretical basis for GIS (pp. 363–372). Berlin: Springer.
Allen, Gary L. (1999). Spatial abilities, cognitive maps, and wayfinding. In R. G.
Golledge (Ed.), Wayfinding Behavior: Cognitive mapping and other spatial processes
(pp. 46–80). Baltimore: Johns Hopkins University Press.
Baddeley, Alan D. (1986). Working Memory. New York: Oxford University Press.
Baker, R. Robin. (1981). Human Navigation and the Sixth Sense. New York: Simon and
Schuster.
Beatty, William W. & Alexander I. Tröster (1987). Gender differences in geographic
knowledge. Sex Roles, 16, 565–589.
Bever, Thomas G. (1992). The logical and extrinsic sources of modularity. In M. Gunnar
& M. Maratsos (Eds.), Modularity and Constraints in Language and Cognition (pp.
179–212). Hillsdale, NJ: Lawrence Erlbaum Associates.
Chang, Kang-tsung & James R. Antes (1987). Sex and cultural differences in map
reading. American Cartographer, 14, 29–42.
Ekstrom, Ruth. B., John W. French & Harry H. Harman (1976). Kit of Factor-referenced
Cognitive Tests. Princeton, NJ: Educational Testing Service.
Evans, Gary W. (1980). Environmental cognition. Psychological Bulletin, 88, 259–287.
Gilmartin, Patricia P. (1986). Maps, mental imagery, and gender in the recall of geo-
graphic information. American Cartographer, 13, 335–344.
Gilmartin, Patricia P. & Jeffrey C. Patton (1984). Comparing the sexes on spatial
abilities: Map-use skills. Annals of the Association of American Geographers, 74,
605–619.
Lawton, Carol A. (1994). Gender differences in way-finding strategies: Relationship to
spatial ability and spatial anxiety. Sex Roles, 30, 765–779.
Levine, Marvin (1982). You-are-here maps: Psychological considerations. Environment
and Behavior, 14, 202–237.
Logie, Robert H. (1995). Visuo-Spatial Working Memory. Hillsdale, NJ: Erlbaum.
Lohman, David F. (1988). Spatial abilities as traits, processes, and knowledge. In R.
Sternberg (Ed.), Advances in the Psychology of Human Intelligence (Vol. 4; pp.
181–248). Hillsdale, NJ: Erlbaum.
Money, John, Duane Alexander & H. T. Walker, Jr. (1965). A Standardized Road-Map
Test of Direction Sense. Baltimore: Johns Hopkins University Press.
Montello, Daniel R., Kristin L. Lovelace, Reginald G. Golledge & Carole M. Self (1999).
Sex-related differences and similarities in spatial and geographic abilities. Annals of
the Association of American Geographers, 89, 515–534.
Mumaw, Randy J., James W. Pellegrino, Robert V. Kail & Phillip Carter (1984).
Different slopes for different folks: Process analysis of spatial aptitude. Memory
and Cognition, 12, 515–521.
Porteus, Stanley D. (1965). Porteus Maze Test: Fifty years' application. Palo Alto, CA:
Pacific Books.
Taylor, Holly A. & Barbara Tversky (1992). Descriptions and depictions of environments.
Memory and Cognition, 20, 483–496.
Voyer, Daniel, Susan Voyer & M. Phillip Bryden (1995). Magnitude of sex differences
in spatial abilities: A meta-analysis and consideration of critical values. Psychologi-
cal Bulletin, 117, 250–270.
Witkin, H. A., R. B. Dyk, H. F. Faterson, D. R. Goodenough & S. A. Karp (1962).
Psychological Differentiation: Studies of development. New York: John Wiley and
Sons.
A Theoretical Framework for the Study
of Spatial Cognition
Maurizio Tirassa
Università di Torino
Antonella Carassa
Università di Padova
Giuliano Geminiani
Università di Torino
1. Introduction
invariants in the world that are relevant for that species’ interactions with it; and
adaptivity may be viewed as the species’ capability of maintaining compatibility.
Many different forms of adaptivity may be conceived of. Therefore, on the
one hand, when describing nonhuman species, we should avoid the anthropocen-
tric fallacy of conceiving of them as simply representing a greater or smaller
subset of what our species is able to represent. On the other hand, just because,
say, insects are unlikely to entertain representations doesn’t mean that representa-
tions do not exist in more sophisticated species.
It is a consequence of this picture that it may be misleading to study
adaptivity in terms of behaviors, if the term is taken to refer to factual descrip-
tions of what organisms objectively do in an objectively defined world. Behavior
is in the (representational) observer’s eye only, not in the organism observed:
what organisms do is not to behave, but to interact with their subjectively
defined environment. It is more appropriate to study interaction in terms of the
structure that generates and controls it. We call this structure the species’
cognitive architecture.
An organism should not be viewed as just cast into the world, a stranger in
a strange land: it has instead to be born prepared for the interaction with the
niche it will find itself in. In lower organisms, whose lifespan is too brief
and nervous system too simple to allow for individual differentiation or learning,
this may mean that each architectural component has to be completely developed
from the start. In general terms, however, it means rather that the possible modifica-
tions that an architecture may physiologically undergo are implicitly defined in
the architecture itself (Barkow, Cosmides & Tooby 1992; Cosmides & Tooby
1994; Lorenz 1965). Some architectures may be more rigid and some more
flexible, so as to let each individual follow its own developmental trajectory,
according to the particular interactions it has with the environment; but, in any
case, the space of possible developmental trajectories is intrinsic to the initial state
of the architecture and is therefore a species-level property, and an adaptive one.
The innate endowment of a species thus determines not the specific interac-
tions that each of its members will entertain with the world, but the whole space
of that species’ possible interactions with its subjective environment; the
complexity of such space varies in accordance with the complexity of the
species’ architecture. Learning is not a natural kind, but the innate capability of
a cognitive architecture of undergoing specific types of modifications, possibly
triggered in part by specific types of interactions with the subjectively defined
environment.
To say that cognitive architectures are innate also means that they, like most
biological traits, are the product of evolution. Although adaptivity may be viewed
as the property of a whole species as well as of each of its members, natural
selection ultimately operates upon the slight individual variations existing
between the latter. The phylogeny of cognitive architectures is therefore a side
effect of the differences in the innate endowments of the individuals that make
up a species (due to statistical differences in the species’ genetic pool as well as
to mutation, recombination, etc.), plus the differences in their respective
reproductive success.
To summarize: the living organisms we are interested in are those that engage
in active interactions with their subjectively construed environment by way of
self-organization (Maturana & Varela 1980; Varela, Thompson & Rosch 1991).
The structure which governs an organism’s interaction with the environment, thus
maintaining that organism’s adaptivity, is its innate cognitive architecture. The
cognitive architecture of a species defines the subjective structure that its
members will superimpose on the world. Our position so far may therefore be
described as a Kantian version of constructivism.
In the next sections, we will sketch some types of cognitive architectures,
that is, some ways in which an organism’s internal dynamics may co-evolve with
the subjectively relevant dynamics in the environment so as to generate an adaptive
interaction. Each type will superimpose a specific type of subjective structure on
space, which will have relevant consequences on locomotion. The main division
we will draw is between those architectures whose internal dynamics are entirely
coupled to the dynamics in the world, and those that have at least some capabili-
ty of decoupling their internal dynamics from the external ones. The latter
correspond to representational architectures. From a neurobiological point of
view, we expect this division to mirror the division between species whose
nervous system has no prosencephalic differentiation and those whose nervous
system has at least some. These two main classes will be further decomposed
into subclasses according to a criterion of complexity.
4. Coupled architectures
The internal dynamics of these cognitive architectures are entirely coupled to the
external ones. These organisms have no internal model of their environment and
are therefore only capable of external cognition: to them, the world is the only
possible model of itself.
Since concepts are the active constructions of a representational mind which
superimposes its own a priori categories on the world, coupled architectures have
no concepts of any sort. This implies that their subjective environment does not
build upon the existence of objects. To say that coupled architectures have no
object-based construction of space refers to something far more primitive than
object permanence. As we will argue later, the latter term refers to an organism’s
capability of realizing that objects exist even when they are out of immediate
perception, and therefore of recognizing them as being the same in different
presentations. The level logically antecedent to object permanence is object
impermanence: the difference, however, relates to the type of representation
entertained by an organism, rather than to whether that organism entertains
representations of any sort. Coupled architectures, instead, have no representa-
tions at all, so that the point here is not whether objects are or are not perma-
nent, but simply whether they exist.
Reflex-based architectures
The simplest types of coupled architectures are only composed of reflexes. The
internal dynamics of a reflex-based architecture depend exclusively on the
Affordance-based architectures
The organisms that belong to this second class of coupled architectures have
internal states that play a role in their interactions with the environment; we
borrow the term affordance from Gibson (1977) to refer to them.
Although the internal dynamics of an affordance-based architecture are still
entirely coupled to the environmental ones, the picture becomes far more
complex than it was with reflex-based organisms. The coupling here is flexible, in
that the internal states contribute in determining what environmental dynamics
are currently the most relevant, among the several available at each moment.
Thus, an individual who is looking for prey and one who is looking for a mate will
react to different affordances; and both will have to be able to adjust their
internal dynamics if required by the external ones (e.g., if a predator is detected).
These architectures may thus be described as dynamically ascribing a compara-
tive weight to each affordance available, according to the current internal state,
and then reacting to the balance of weights that has thus been created. Of course, the
criterion with which these weights are allocated is part of the architecture itself.
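The weight-balancing scheme just described can be made concrete with a small sketch. The chapter proposes no computational model, so this is purely illustrative; the affordance names, saliences, and drive weights are all invented:

```python
# Purely illustrative sketch of affordance weighting: the current internal
# state modulates the weight of each available affordance, and the organism
# reacts to whichever weighted affordance dominates the balance.
def select_affordance(affordances, internal_state):
    """affordances: name -> base salience; internal_state: name -> drive weight."""
    weighted = {name: salience * internal_state.get(name, 1.0)
                for name, salience in affordances.items()}
    return max(weighted, key=weighted.get)

# An individual looking for prey weights prey-related affordances highly,
# but a detected predator can still dominate the balance:
affordances = {"approach_prey": 0.6, "flee_predator": 0.9}
print(select_affordance(affordances, {"approach_prey": 2.0}))  # approach_prey
print(select_affordance(affordances, {"approach_prey": 1.0}))  # flee_predator
```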
The subjective space of these organisms is composed of all the affordances
that are available at each moment. It has therefore a proper structure, although a
non-objectual one, because the various affordances are spatially oriented with
respect to the organism’s egocentric positioning and because they vary in
attractiveness or aversiveness.
The interactions of these types of organisms with their subjective environ-
ment may therefore be conceived of as a complex and continuously changing
balance between the affordances available. The complexity of this balance may
vary greatly from species to species. Some species can stabilize specific types of
affordance, so as to let them govern the interaction over a certain interval of time;
this interval may be longer or shorter, thus making the stabilization more or less
permanent. The honeybee, for example, is capable of permanently fixing the
flight trajectory that leads from the hive to an interesting source of food; the
desert ant impermanently keeps track of the direction that will lead it back to the
nest at the end of the current cycle of exploration; and the housefly is incapable
of stabilizing its affordances at all.
What is interesting, in describing the interactions of all these different
species in terms of a dynamic balance between affordances, is that there is no
need to ascribe special representational or quasi-representational capabilities to
honeybees or to desert ants with respect to houseflies; it suffices for an explana-
tion of the differences between these species that the former be capable of
assigning a permanent or impermanent relative weight to certain environmental
affordances. Since the nervous systems of all these insects are roughly similar,
any other solution would be implausible from a neurobiological point of view.
5. Decoupled architectures
The architectures that belong to this second main class are those whose internal
dynamics are decoupled from the external ones; that is, those that entertain
representations. These should correspond roughly to the species whose nervous
system includes prosencephalic structures.
The concept of representation lies at the very heart of cognitive science, but
it comes in different acceptations; it is therefore necessary to explain briefly
what we mean by it.
We reject the computationalist framework, according to which mental
representations are pieces of information internally stored in some predefined
formal code (Newell & Simon 1976). This position is nowadays philosophically
and psychologically unacceptable (Bruner 1990; Edelman 1992; Harnad 1990;
Nagel 1986; Putnam 1988; Searle 1980, 1992), if only because of its rather
controvertible consequence that whatever physical object undergoes internal
changes due to world events could then be considered representational —
including autonomous robots à la Brooks (Vera & Simon 1993) as well as
thermostats and computers. On the other hand, the perceived failure of symbolic
accounts of cognition has lately led many researchers to completely reject the
very idea of representation (e.g., Brooks 1991) while, at the same time, keeping
Deictic architectures
ence is that an affordance (in our acceptation of the term) is not an object
proper, but simply the potential for a certain action. The action that an affor-
dance calls for will possibly be executed according to its relative weight as
compared with the other affordances available. To entertain a deictic representa-
tion means instead to view an object as the experience of the possible interac-
tions that concern that particular world entity. Thus, it is not that the concept of
affordance applies to nonrepresentational architectures and not to representational
ones; it is only that it means something very different in the two cases (which
is why we have restricted our use of the term to the former case).
The idea of a deictic representation may also seem to apply to any represen-
tational architecture, at least under a certain acceptance of the concept (e.g.,
Glenberg 1997; Millikan 1998). From this point of view, the main difference
between a deictic architecture and a more sophisticated one is that the former has
no object permanence, that is, that it is incapable of singling an object out and
possibly labeling it as an individual entity, and thereby of realizing that it exists
even when it is out of immediate perception. This makes a great difference in
the subjective structure of the environment.
As regards the subjective structure of space, a deictic organism will only
interact with the space it can currently perceive; but, differently from what
happens in a coupled architecture, it will represent that space as a region wherein
proper objects exist (which is also why we have started to use the term percep-
tion only here). It will therefore be able to plan a trajectory in this region,
deciding in advance what path to follow (according to criteria such as distance
or dangerousness), what obstacles to avoid and how, and so on. This capability,
although confined to the space that is currently perceived, allows nonetheless
highly sophisticated interactions, at least as compared to those that are possible
to lower-level architectures.
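As a concrete illustration of this kind of within-view planning — choosing a trajectory through currently perceived space by trading distance off against dangerousness — here is a small sketch built on Dijkstra's algorithm. It is not a model proposed in the chapter; the grid, costs, and weighting are invented:

```python
# Illustrative sketch: plan a path through a perceived region, where each
# cell carries a danger level and the step cost trades distance off against
# danger. None marks an obstacle to be avoided.
import heapq

def plan_path(grid, start, goal, danger_weight=3.0):
    """Dijkstra over a 2D grid; returns the cheapest path or None."""
    rows, cols = len(grid), len(grid[0])
    frontier = [(0.0, start, [start])]
    visited = set()
    while frontier:
        cost, pos, path = heapq.heappop(frontier)
        if pos == goal:
            return path
        if pos in visited:
            continue
        visited.add(pos)
        r, c = pos
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] is not None:
                step = 1.0 + danger_weight * grid[nr][nc]  # distance + danger
                heapq.heappush(frontier, (cost + step, (nr, nc), path + [(nr, nc)]))
    return None  # no path within the perceived region

# 0 = safe, 0.5 = risky, None = obstacle:
grid = [[0, 0, 0.5],
        [None, 0, 0.5],
        [0, 0, 0]]
print(plan_path(grid, (0, 0), (2, 2)))  # detours around the risky cells
```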
In principle, two particularly simple forms of learning are possible in a
deictic architecture, one consisting in the acquisition of a novel way to cope with
a deictic object, and the other in the addition of a new deictic object to the
architecture’s subjective ontology, possibly as a specialization of a previously
existing one. Thus, if an animal is capable of creating a new deictic object (say,
my mate in a monogamous species), it may be able to interact with it in ways
that would be specific to that particular individual, while at the same time never
being able to realize that it is an individual object (because there is no such thing
as an individual object in the animal’s subjective ontology).
The capability of forming a deictic object which happens to be “objectively”
composed of only one instance also allows for the creation of a nest. Although,
in order to go back to it, the organism has either to perceive it or to resort to
Metarepresentational architectures
on.2 This makes it possible in turn to formulate theories about the world, to
reason formally, to reuse in a certain type of interaction the features of the world
that were relevant to a different type of interaction, and to communicate in a
mentalist way with conspecifics.
As regards the subjective structure of space, a metarepresentational species
is capable of creating abstract regions or zones with abstract borders and
landmarks and, most important, of entertaining survey maps. We do not conceive
of survey maps as allocentric: since a representation is, by definition, someone’s
subjective point of view, they can only be egocentric. Survey maps result instead
from the pretense of being dislocated in a different position (say, one kilometer
above the city) and watching the world from that perspective. This makes it possible
to draw spatial inferences and therefore to plan in advance a path in a known as
well as in a partially unknown region. It must be remembered, however, that
these plans are always, by necessity, partial: they are not recipes for action to be
followed blindly, but guides for action to be further specified in the interaction
with the real world (Tirassa 1997).
Metacognitive organisms have two further capabilities. First, given the
appropriate cognitive tools, they can externalize their representations by means
of drawings or language. Second, given the ability to understand the representa-
tions entertained by a conspecific and to affect them in a desired way, they may
communicate spatial information to one another with the aid of these externalized
tools.
6. Future developments
The classification we have proposed here has a very high level of abstraction and
currently relies fundamentally on analytical considerations. The next step of our
research will be to derive, for each class of architectures described, a description
of the types of interaction that it may generate with regard to the control of
space and locomotion. Subsequently, we will check the empirical validity of our
analysis by looking for confirming or disconfirming evidence from neurobiology,
ethology and autonomous robotics. This is also likely to lead to a refinement of
the classification.
The ultimate goal of this research is to build a taxonomy of active organ-
isms based on three columns, namely, cognitive architecture, neurobiology and
embodiment, and interaction, that is analytically and empirically consistent at
each level.
Acknowledgments
We are grateful to Alessandra Valpiani for her participation in different phases of the research.
Maurizio Tirassa has been supported by the Italian Ministry of University and Scientific and
Technological Research (MURST), 40% Program Nuovi paradigmi per l’interazione e nuovi ambienti
di comunicazione, 1995–1997. Antonella Carassa and Giuliano Geminiani have also been separately
supported by the MURST, 60% for the years 1996 and 1997.
Notes
1. In this perspective, the term cognition and its derivatives should have been kept for the species
that possess a mind, that is, those that entertain representations. On the other hand, the term is
widely used in the relevant literature (e.g., Maturana & Varela 1980) and most alternatives are
not less ambiguous from a philosophical point of view (control architecture, for instance, is
typically used in autonomous robotics and would have therefore been inappropriate to the
discussion of biological entities, let alone representational ones).
2. Let us remark that to be capable of using symbols is not the same thing as being a symbol system
as postulated in classical cognitivism (see also the discussion at the beginning of this section).
References
Agre, Philip E. & David Chapman (1990). What are plans for? Robotics and Autonomous
Systems, 6, 17–34.
Allen, Colin & Marc Bekoff (1997). Species of mind. Cambridge, MA: MIT Press.
Barkow, John H., Leda Cosmides & John Tooby (Eds.) (1992). The adapted mind.
Evolutionary psychology and the generation of culture. Oxford: Oxford University
Press.
Brooks, Rodney A. (1991). Intelligence without representation. Artificial Intelligence, 47,
139–159.
Bruner, Jerome S. (1990). Acts of meaning. Cambridge, MA: Harvard University Press.
Cosmides, Leda & John Tooby (1994). Beyond intuition and instinct blindness: Toward
an evolutionarily rigorous cognitive science. Cognition, 50, 41–77.
Edelman, Gerald M. (1992). Bright air, brilliant fire: On the matter of the mind. New
York: Basic Books.
Geminiani, Giuliano, Antonella Carassa & Bruno G. Bara (1996). Causality by contact.
In J. Oakhill & A. Garnham (Eds.), Mental models in cognitive science. Essays in
honor of Phil Johnson-Laird. Hove, UK: Erlbaum/Taylor & Francis.
Gibson, J. J. (1977). The theory of affordances. In R. E. Shaw & J. Bransford (Eds.),
Perceiving, acting, and knowing. Hillsdale, NJ: Erlbaum.
Glenberg, Arthur M. (1997). What memory is for. Behavioral and Brain Sciences, 20,
1–55.
Griffin, David R. (1978). Prospects for a cognitive ethology. Behavioral and Brain Sciences, 1, 527–538.
Harnad, Stevan (1990). The symbol grounding problem. Physica D, 42, 335–346.
Lorenz, Konrad (1965). Evolution and modification of behavior. Chicago & London: The
University of Chicago Press.
Maturana, Humberto D. & Francisco J. Varela (1980). Autopoiesis and cognition. The
realization of the living. Dordrecht: Reidel.
Millikan, Ruth G. (1998). A common structure for concepts of individuals, stuffs, and
real kinds: More Mama, more milk, and more mouse. Behavioral and Brain
Sciences, 21, 55–100.
Nagel, Thomas (1986). The view from nowhere. Oxford: Oxford University Press.
Newell, Allen & Herbert A. Simon (1976). Computer science as empirical inquiry: Symbols and search. Communications of the Association for Computing Machinery, 19, 113–126.
Papi, Floriano (1990). Homing phenomena: Mechanisms and classification. Ethology
Ecology & Evolution, 2, 3–10.
Prato Previde, Emanuela, Marco Colombetti, Marco Poli & Emanuela Cenami Spada
(1992). The mind of organisms: Some issues about animal cognition. International
Journal of Comparative Psychology, 6, 79–119.
Premack, Ann J., David Premack & Dan Sperber (Eds.) (1995). Causal cognition: A
multidisciplinary debate. Oxford: Clarendon.
Putnam, Hilary (1988). Representation and reality. Cambridge, MA: MIT Press.
Searle, John R. (1980). Minds, brains, and programs. Behavioral and Brain Sciences, 3,
417–424.
Searle, John R. (1983). Intentionality: An essay in the philosophy of mind. Cambridge:
Cambridge University Press.
Searle, John R. (1992). The rediscovery of the mind. Cambridge, MA: MIT Press.
Tirassa, Maurizio (1997). Mental states in communication. In Proceedings of the 2nd
European Conference on Cognitive Science, Manchester, UK.
Tirassa, Maurizio (1999a). Taking the trivial doctrine seriously: Functionalism,
eliminativism, and materialism. Behavioral and Brain Sciences, 22, 851–852.
Tirassa, Maurizio (1999b). Communicative competence and the architecture of the
mind/brain. Brain and Language, 68, 419–441.
Varela, Francisco J., Evan Thompson & Eleanor Rosch (1991). The embodied mind.
Cognitive science and human experience. Cambridge, MA: MIT Press.
Vera, Alonzo H. & Herbert A. Simon (1993). Situated action: A symbolic interpretation. Cognitive Science, 17, 7–48.
Describers and Explorers
A method for investigating cognitive maps
Antonella Carassa, Alessia Aprigliano and Giuliano Geminiani
Università di Torino
1. Introduction
The task given to the participants was a cooperative one, and the investigator observed how they communicated maps to each other in order to reach their common goal. This differs from the normal setup of experiments, in which participants present verbal reports to an investigator. Another feature was that half the participants had no prior acquaintance with the environment in which the experiment took place, and everyone was informed of this fact. In our hypothesis, this condition forces one participant, the describer, to tune his description to the knowledge of the environment he attributes to the partner, the explorer.
The general aim of this research is to show how it is possible, by following the experimental method presented here, to infer the type of spatial representation of the describer by analyzing his/her verbal descriptions of complex routes and relating them to the explorative behaviors those descriptions induce in the explorer.
2. Methods
Subjects
We recruited 40 children in their last year of primary school. At this age (range 9.3–10.2 years) spatial ability is comparable to that of adults (Cornell, Heth & Alberts 1994; Heth, Cornell & Alberts 1997). Furthermore, although both types of map are possessed at this age, we believe that cultural influences, above all educational ones, on the use of survey descriptions are not yet overriding.
Twenty participants (11 girls, 9 boys) attended the school that was the
setting for the way-finding experiment, and were therefore familiar with the
environment. The other twenty participants (11 girls, 9 boys) came from a
different school and had no prior knowledge of that environment.
Procedure
The experiment was carried out in a primary school that had two stories and a cross-shaped ground plan. Two itineraries were designated, each ranging over the two floors and each requiring the exploration of four target locations (labeled with colored cards) along the trail before arriving at a destination location, where a goal object was placed.
The subjects were divided into pairs (see later) and each member of the pair
was assigned a specific role. One subject (the describer) accompanied the
investigator on first one and then the other trail. The investigator then gave the
describer the following instructions: “The game starts now. You have to explain
to your teammate the route he has to take in order to collect all the cards and the
two final objects I have shown you. The winning team will be the one that
collects most cards and most final objects in the shortest time. If you think you know a better way to reach all these items than the route you followed with me, then you can describe that to your teammate”.
The investigator then took the describer to a room and introduced him to
his explorer teammate.
The describer’s task was now to explain to the explorer how to reach the intermediate and final targets. The explorer had previously been motivated by the investigator, and the competitive nature of the game had been made clear. The entire conversation between describer and explorer was recorded for subsequent analysis. After receiving what he/she considered enough information to find the objects, the explorer went off to find them, followed by the investigator, who noted the trail taken on a plan of the building.
Thus the task of the describer was to give information to the explorer teammate that would enable the latter to reach the greatest number of target locations in the shortest possible time, without any restriction as to the route taken. In fact the shortest route was not the one followed initially with the investigator. The task is such, therefore, that if the describer has a survey-type representation, he/she would tend to give a description with a survey-type perspective, allowing the teammate to find shortcuts, which are characteristic of a survey as opposed to a landmark representation. Since the description is given to a real explorer whose subsequent behavior is observed, we can overcome the limitations of an analysis that considers descriptions only in terms of indicators: the explorer functions as an external judge of the description furnished, since his/her behavior will reflect the type of information received.
The subjects were paired off in order to form four relations between
describer and explorer with respect to familiarity with the environment:
A. Both describer and explorer know the environment.
B. The describer is familiar with the environment, the explorer not.
C. The describer is not familiar with the environment, the explorer is.
D. Neither describer nor explorer is familiar with the environment.
These four pairs of subjects allow us to investigate the two methodological
hypotheses which are:
a. The description given depends on how familiar the describer is with the
environment.
Data analysis
Because there were only five pairs of subjects in each experimental situation, we carried out a non-statistical analysis of the data. Describers’ verbal reports were analyzed in terms of route and survey indicators, which according to Taylor & Tversky (1996) characterize route and survey maps (Table 1).
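As a rough illustration of this kind of indicator analysis, the following Python sketch counts occurrences of route-style and survey-style terms in a description. The indicator vocabularies are invented stand-ins in the spirit of Taylor & Tversky (1996), not the coding scheme actually used in this study:

    # Illustrative indicator vocabularies (hypothetical, not the study's scheme).
    ROUTE_TERMS = {"left", "right", "ahead", "behind", "turn", "go", "you"}
    SURVEY_TERMS = {"north", "south", "east", "west", "above", "beside",
                    "between", "corner", "wing", "floor"}

    def count_indicators(description: str):
        # Count words that match either indicator vocabulary.
        words = description.lower().split()
        route = sum(w.strip(".,") in ROUTE_TERMS for w in words)
        survey = sum(w.strip(".,") in SURVEY_TERMS for w in words)
        return {"route": route, "survey": survey}

    print(count_indicators(
        "Go up the stairs, turn left, and the card is behind the door "
        "in the north corner of the east wing."))
    # {'route': 4, 'survey': 4}

A real coding scheme would of course be applied by trained judges over transcripts; the sketch only makes the route/survey contrast concrete.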
3. Results
Survey indicators were most frequent in the situations in which the describer was highly familiar with the environment (A and B). Situation B (where only the describer is highly familiar with the environment) is characterized by a high number of both survey and route indicators.
In situations C and D (describer not very familiar with the environment) the number of survey indicators is lower than in the other two situations; in situation C the total number of indicators is low compared to the other situations, while in situation D route indicators dominate.
Table 3 shows the results of the analyses of the quantity, efficacy and
flexibility of path and neighbourhood exploration.
Table 3. Results of the analyses of the quantity, efficacy and flexibility of path and neighbourhood exploration

    Experimental   Quantity:                 Efficacy (explored/described):      Flexibility:
    situation      backtracking   detours    final targets   intermediate        avoidance of   shortcuts
                                                             targets             backtracking
    A                    9           20      6 (0.67)        26 (1.08)                2              3
    B                   12           22      5 (0.62)        24 (1.04)                3              7
    C                   11           25      8 (1.33)        20 (1.43)                2             10
    D                    4           15      9 (1.5)         18 (1.06)                3              3
The results in the table concerning the quantity of path and neighborhood exploration show that the poorest performance is in situation D (both describer and explorer have low familiarity). The number of backtracks was high in situation B, where explorers with low familiarity obtained results comparable to those of explorers familiar with the environment (A and C) as a result of their interaction with describers familiar with the environment. Explorers familiar with the environment achieved similar performance irrespective of the familiarity of the describers (situations A and C).
With regard to efficacy of exploration, it is surprising that explorers familiar with the environment were successful even when they interacted with describers not familiar with it. This success cannot be due to the information communicated by their describers, as shown by the high value of the explored/described ratio. Conversely, the targets were explored with more efficacy by the explorers with high familiarity and by those who interacted with highly familiar describers. As regards flexibility, the most interesting finding was
that explorers who interacted with both highly familiar and poorly familiar
describers (situations B and C) used a high number of new shortcuts.
4. Discussion
In situation B the route indicators are set in a rich survey context, while in situation D they are not.
In further support of our conclusion that route indicators have different
meanings in the two cases, we note that exploration behavior related to route
differed greatly for final targets compared to backtracking. This illustrates the
importance of relating the description to the exploration behavior the description
induces and helps us better understand the meaning of the descriptions. By
analyzing the exploration behavior it is possible to reveal the real survey-type
meaning of the descriptions used in situation B. In this situation, the low-familiarity explorer in fact makes use of a survey-type description; this is shown by the high number of backtracks made and target locations found, and also by the large number of shortcuts taken that were not mentioned by the describer.
Analysis of the behavior thus allows us to distinguish between the descriptions
in situations A and C: the larger number of route indicators used in situation C
is understandable if we consider the number of explored target-locations.
In conclusion, we have devised a useful methodology for studying spatial
representations, in which subjects with differing familiarity with an environment
interact, and the descriptions communicated are analyzed and related to the
exploration behavior induced. Although the method is complicated, it overcomes
some of the limits of other experimental methodologies, one of which is that
analysis of descriptions in terms of route and survey only does not allow
unequivocal interpretation of results since a given verbal description contains
both survey and route indicators, and route indicators can assume survey
significance in a survey context. The analysis of drawings, maps and diagrams produced by a describer also presents a number of difficulties: analysis criteria are still not well defined; graphic abilities and visuo-motor coordination vary considerably between individuals, especially those of developmental age, confounding the analysis; and cultural factors may also influence expertise in using street plans, maps and building plans.
The most interesting aspect of the method we propose is that it allows experiments to be designed in which individuals with differing familiarity with a given environment are compared. Since familiarity determines the type of spatial representation (as shown by Golledge 1978 and Evans 1985, for example), it becomes possible to relate different behaviors to underlying spatial representations.
Acknowledgments
We are grateful to the schoolchildren of the A. Schweitzer and A. Modigliani schools (Segrate, Milano) who participated in this study, and to Erminielda Peron for helpful suggestions. Antonella Carassa and Giuliano Geminiani were both supported by MURST grants (60%) for the years 1996 and 1997.
References
Biel, Anders & Gunilla Torell (1979). The mapped environment: Cognitive aspects of
children’s drawings. Man-Environment Systems, 9, 187–194.
Chown, Eric, Stephen Kaplan & David Kortenkamp (1995). Prototypes, Location and
Associative Networks (PLAN): Towards a unified theory of cognitive mapping.
Cognitive Science, 19, 1–51.
Cornell, Edward H., Donald C. Heth & Denise M. Alberts (1994). Place recognition and
way finding by children and adults. Memory and Cognition, 22, 633–643.
Evans, Gary W. (1980). Environmental cognition. Psychological Bulletin, 88, 259–287.
Evans, Gary W. (1985). Review of Lindberg E.: Acquisition of cognitive maps of large
scale environments. Scandinavian Journal of Psychology, 26, 95–96.
Golledge, Reginald G. (1978). Learning about urban environments. In T. Carlstein, D. Parkes & N. Thrift (Eds.), Timing Space and Spacing Time. London: Edward Arnold.
Golledge, Reginald G. (1987). Environmental cognition. In D. Stokols & I. Altman (Eds.),
Handbook of Environmental Psychology. New York: John Wiley and Sons.
Golledge, Reginald G. (1990). Cognition of physical and built environments. In T. Gärling & G. Evans (Eds.), Environment, Cognition and Action: A Multidisciplinary Integrative Approach. New York: Oxford University Press.
Hart, Roger A. & Gary T. Moore (1973). The development of spatial cognition: A review. In R. M. Downs & D. Stea (Eds.), Image and Environment: Cognitive Mapping and Spatial Behavior. Chicago: Aldine.
Heth, Donald C., Edward H. Cornell & Denise M. Alberts (1997). Differential use of
landmarks by 8- and 12-year-old children during route reversal navigation. Journal
of Environmental Psychology, 17, 199–213.
Lynch, Kevin (1960). The Image of the City. Cambridge, MA: MIT Press.
Shemyakin, F. N. (1962). General problems of orientation in space and space representations. In B. G. Ananyev (Ed.), Psychological Science in the USSR (Vol. 1). Washington, DC: US Joint Publications Research Service.
Siegel, Alexander W. & Sheldon H. White (1975). The development of spatial representations of large scale environments. In H. W. Reese (Ed.), Advances in Child Development and Behavior. New York: Academic Press.
Taylor, Holly A. & Barbara Tversky (1996). Perspective in spatial description. Journal of
Memory and Language, 35, 371–391.
Torell, Gunilla (1990). Children’s conceptions of large-scale environments. Göteborg Psychological Reports, 20, II.
The Functional Separability of Self-Reference and
Object-to-Object Systems in Spatial Memory
M. Jeanne Sholl
Boston College
When people learn the layout of large-scale space by walking around within it,
their ability to orient, either in imagination or in actuality, to surrounding
landmarks that are hidden from view is generally accurate and highly flexible
(Presson & Hazelrigg 1984; Thorndyke & Hayes-Roth 1982; Sholl 1987). This
facility likely requires using locally available spatial cues to establish one’s
location and facing direction, and then retrieving the location of nonvisible
landmarks within that frame of reference. Sholl (Easton & Sholl 1995; Sholl &
Nolin 1997) has proposed a coordinate-system model of retrieval to account for
the seeming ease with which people retrieve the direction of unseen landmarks
from a variety of actual and imagined locations and facing directions in a known
environment (e.g., Easton & Sholl 1995; Sholl 1987). The model’s scope is
limited to the representation and retrieval of spatial knowledge learned by
navigating an environment on foot. It distinguishes between the storage of metric
object-to-object relations in an allocentrically organized long-term spatial
memory system and the retrieval of object-to-object relations within a body-
centered coordinate system. In its conceptualization, the model draws heavily
from animal models of navigation (Gallistel 1990; O’Keefe 1991), because the
components of a model of spatial retrieval need ultimately to fit within a general
model of human spatial navigation.
It should be acknowledged at the outset that metric models of spatial representation cannot account for all human spatial behavior. A metric model of
large-scale space that conforms to the axioms of Euclidean geometry codes the
straight-line distances and angles inter-connecting the landmarks contained within
the space. A mental representation that preserves the metric properties of
Euclidean space should produce spatial judgments that uphold Euclidean axioms
(see Montello 1992, for a more in-depth discussion of this point). That is, inter-
landmark distance judgments should be symmetrical (i.e., the distance of
Landmark A from Landmark B should be estimated to be the same as the
distance from Landmark B to Landmark A) and judgments of relative direction
should be consistent with the triangle inequality axiom (i.e., estimates of the
spatial angles at the vertices of a triangle formed by three landmarks [e.g.,
Landmarks 2, 3 & 4 in Figure 1A] should not sum to more than 180°). However,
it is not uncommon for people to make systematically distorted and non-commu-
tative judgments of inter-landmark distance and angle in violation of Euclidean
axioms (e.g., Moar & Bower 1983; Sadalla, Burroughs & Staplin 1980; Tversky
1981; etc.). In spite of that, animals, including humans, skillfully navigate
complex trajectories through cluttered environments to destinations which cannot
be seen from the starting point, behavior which is consistent with an underlying
representation that preserves the metric properties of space.
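To make these two consistency checks concrete, the following sketch (in Python, with invented judgment values; not part of the original chapter) tests a set of judged inter-landmark distances for symmetry and a set of judged triangle angles for the Euclidean 180° sum:

    import math

    # Hypothetical judged distances between landmarks A, B, C (arbitrary units).
    judged_distance = {("A", "B"): 120.0, ("B", "A"): 95.0,   # asymmetric
                       ("B", "C"): 80.0,  ("C", "B"): 80.0,
                       ("A", "C"): 150.0, ("C", "A"): 150.0}

    def symmetric(d, x, y, tol=1e-6):
        # A Euclidean metric requires d(x, y) == d(y, x).
        return abs(d[(x, y)] - d[(y, x)]) <= tol

    # Hypothetical judged interior angles (in degrees) of triangle ABC.
    judged_angle = {"A": 70.0, "B": 75.0, "C": 55.0}

    def angles_euclidean(a, tol=1e-6):
        # In the Euclidean plane the interior angles must sum to 180 degrees.
        return abs(sum(a.values()) - 180.0) <= tol

    print(symmetric(judged_distance, "A", "B"))  # False: symmetry violated
    print(angles_euclidean(judged_angle))        # False: angles sum to 200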
Current theories of spatial memory resolve the dilemma of routine non-
metric spatial judgments coupled with highly evolved navigation skills by
proposing a metric level of representation, plus another level of representation to
account for non-metric behavior. For example, Huttenlocher, Hedges & Duncan
(1991) propose a categorical level of representation which when combined with
imprecise metric knowledge produces systematically biased judgments of spatial
location. Similarly, Poucet (1993) suggests that both topological and metric
representational systems are needed to account for contradictory spatial behaviors
observed in animals. The present model focuses on a metric level of representa-
tion and retrieval, while acknowledging that other types of representation are
needed to account for the full range of human spatial behavior.
Key terms
Before describing the model, a few key terms will be defined. An allocentric
representation is an environment-centered representation which stores inter-
landmark Euclidean relations independently of any momentary perspective of the
viewer. Large-scale space refers to a scale of space which is large enough to walk around in and which contains landmarks that cannot be simultaneously apprehended. The immobile, behaviorally relevant landmarks that structure large-scale space are the objects represented in the object-to-object system. The coordinate system is an orthogonal set of reference axes used to compute the Euclidean coordinates of the objects represented in the vector space (see Figure 1).
Figure 1. 1A is a schematic illustration of a cognitive vector space, 1B illustrates a self-
reference system consisting of a representation of the front-back/right-left axes of the body,
and 1C illustrates superposition of a self-reference system on the cognitive vector space. The
metric coordinates of the objects represented in the vector space are computed by the self-
reference system.
If the model is correct in its premise that the cognitive vector system and the
self-reference system are functionally separable, then it ought to be possible to
influence the accessibility of spatial relations within one system independently of
the other. For example, a variable which influences the accessibility of a
landmark’s relative direction within the self-reference system should have no
influence on its accessibility in the cognitive vector space, and complementarily,
a variable which influences the accessibility of relative direction within the
cognitive vector space should not influence its accessibility within the self-reference system. The experiments reported here tease apart accessibility in the
self-reference system from accessibility in the cognitive vector space.
Consider an example to help clarify the distinction being made between accessi-
bility in the self-reference system and the cognitive vector space. Figure 1C
illustrates the arrangement of the self-reference system in the cognitive vector
space if a person were to imagine herself at Landmark 3, facing in the direction
indicated by the front pole of the front/back axis. Assume that the person is then
asked to point in the direction of Landmark 2 from her imagined location in the
environment. Retrieval of the relative direction of Landmark 2 takes place at two
levels. At the level of the cognitive vector space, the retrieval of the direction of
Landmark 2 relative to Landmark 3 can be thought of as corresponding to the
activation of the vector connecting them. There are various potential variables
that could affect the speed of vector activation. One obvious possibility is vector
strength, which could vary as a function of the frequency of travel between two
landmarks, with stronger vectors being more quickly activated than weaker
vectors. Inter-landmark distance is another possibility: longer vectors may take
longer to activate than shorter vectors. Yet a third possibility is the directness of
the connection between two landmarks. Landmarks which are directly connected
by a vector should be more accessible than landmarks connected indirectly
through one or more other landmarks (e.g., in Figure 1A, Landmarks 4 and 5 are
indirectly connected through their connections to Landmark 3). Once the target
vector is activated, then its Euclidean coordinates are calculated within the self-
reference system. In this example, the direction of Landmark 2 relative to the
body is computed. As previously discussed, the location of Landmark 2 in the
anterior half of the self-reference system (body space) means that the coordinates
of the vector connecting Landmark 2 to Landmark 3 (Vector3–2) will be comput-
ed more quickly than the coordinates of vectors in posterior body space (e.g.,
Vector3–4, Vector3–5).
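The two retrieval levels can be sketched computationally. In the following minimal Python illustration (our construal of the model, not the author’s implementation; the landmark coordinates, standpoint, and heading are all invented), an allocentric vector is first selected in the object-to-object system and then re-expressed as a body-centered bearing by the self-reference system:

    import math

    # Object-to-object system: allocentric (map-like) landmark coordinates.
    landmarks = {"L1": (0.0, 4.0), "L2": (2.0, 3.0), "L3": (2.0, 1.0),
                 "L4": (3.0, -1.0), "L5": (0.0, -2.0)}

    def egocentric_bearing(standpoint, heading_deg, target):
        # Level 1: activate the allocentric vector from standpoint to target.
        sx, sy = landmarks[standpoint]
        tx, ty = landmarks[target]
        allo = math.degrees(math.atan2(ty - sy, tx - sx))
        # Level 2: the self-reference system rotates the vector into the
        # body frame; 0 degrees = straight ahead of the imagined heading.
        return (allo - heading_deg + 180.0) % 360.0 - 180.0

    # Imagine standing at Landmark 3, facing "north" (90 degrees on the map).
    b = egocentric_bearing("L3", 90.0, "L2")
    print(f"L2 is at {b:+.0f} deg; "
          f"{'front' if abs(b) < 90 else 'back'} of body space")

Variables such as vector strength or directness would, on this construal, affect the first step; the front-back asymmetry affects the second.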
Earlier accounts of the accessibility of inter-object relations in object-to-
object representations did not allow for the possibility of differential accessibility
of inter-object relations (Levine, Jankovic & Palij 1982). In their proposal that
cognitive maps have picture-like properties, Levine et al. proposed that sequen-
tially experienced inter-object relations are read into a representational system
which makes all inter-object relations, even those not directly experienced,
simultaneously and equally available. The principle of the equi-availability of
inter-object relations has been confirmed by earlier findings (Levine et al. 1982;
Sholl 1987); however, in these instances, either the test spaces were very simple
or care was taken to ensure that variables which might affect inter-object
accessibility were controlled. Thus, given the complexity of the large-scale
environments along with prior findings suggesting functional differences between
environmental landmarks differing in familiarity, geographic prominence, and
cultural importance (Sadalla, Burroughs & Staplin 1980), inter-object relations
are likely to be differentially accessible in an object-to-object system represent-
ing natural environments.
The extent to which the accessibility of relative direction within the self-to-object
system is independent of its accessibility within the object-to-object system
should be demonstrable with a point-to-unseen-targets task developed by Sholl
(1987). The task manipulates facing direction, landmark set, and egocentric target
location in a Latin square design. In the task, a central location in a known
environment (or test space) serves as a reference location, and people’s knowl-
edge of the direction of landmarks relative to the reference location is tested
from two diametrically opposed facing directions. The test space is bisected at the reference location along an imaginary axis orthogonal to the facing-direction axis. For example, in the schematic illustration below (Figure 2), east and west facing
directions are tested, and a north-south axis divides the test space into an eastern
and western half. An equal number of landmarks from each side of the test space
are selected as targets.
Figure 2. Schematic of the experimental design for the point-to-unseen-targets task.
[Figure 3. Hypothetical pointing-latency profiles (Panels A–C): latencies for landmarks in front versus in back of the body, plotted separately for the east (E) and west (W) facing directions and landmark sets.]
Figure 3A illustrates the typical front-back effect. In both the east and west
facing-direction conditions, landmarks located in front of the body are pointed to
faster than landmarks behind the body, regardless of whether the landmarks are
from the eastern or western landmark sets. This finding illustrates the differential
accessibility of spatial relations in the self-reference system, in the special case
in which there is no effect of landmark set. An effect of landmark set is
expected if on average the relative directions of the members of one landmark
set are more easily accessible in the cognitive vector space than the members of
the other landmark set. Figures 3B and 3C illustrate the more complex pattern of
results expected if a front-back effect is combined with an effect of landmark
set. An advantage of western over eastern landmarks is illustrated in Figure 3B
and an advantage of eastern over western landmarks in Figure 3C. The darker gray areas of the bars in Figures 3B and C depict the additional time required to point to the less accessible landmark set.1
The experiments reported in this paper show that accessibility within a
body-centered reference system is indeed separable from accessibility in the
object-to-object system. Experiment 1 produced the latency profile illustrated in
Figure 3C, a finding consistent with an effect of landmark set combined with a
front-back effect. However, because the latency profile was produced by a Latin
square design, a main effect of landmark set is indistinguishable from an
interaction between facing direction and egocentric target location. Experiments
2 and 3 ruled out the possibility that an interaction accounted for the latency
profile observed in Experiment 1 and demonstrated that landmark size accounted
for the observed effect of landmark set.
General method
Test space
Figure 4 shows a map of the Chestnut Hill Campus of Boston College, which
served as the test space in the present study. The reference location in Experi-
ments 1 and 3 is depicted on the map by the dot to the left of the letter B, and
Experiment 2’s reference location is depicted by the dot below the letter D.2 A
line drawn parallel to the vertical edges of the map through the reference
location divided the test space into eastern and western halves.3 The east/west
facing directions were orthogonal to the line bisecting the test space. Target
landmarks were selected from the two halves of the test space and will be described
separately for each experiment. To the extent possible, target landmarks were evenly
distributed across the eastern and western halves of the test space.
Apparatus
The point-to-unseen-targets task was controlled on-line by an Apple IIe microcomputer with a Z80 microprocessor and equipped with a Mountain Computer
Clock. Participants made their pointing responses with a modified Apple II
joystick, connected to the computer via an analog/digital interface card. A
detailed description of the apparatus and the supporting software routines is given
by Sholl (1987).
Procedure
All testing was done in a room located at the dot to the left of the letter A in
Figure 4. The testing room provided no outside view of the campus. In order to
ensure that each participant was familiar with the target landmarks, the experi-
menter read a list of landmarks to the participant, and participants indicated
whether they knew each landmark’s location. If a participant did not know a
landmark’s location, the experimenter gave a brief verbal description of its
whereabouts. The list contained all the target landmarks as well as some foils
that were not targets. After landmark familiarization, the experiment proceeded
under computer control. First, the participant was instructed how to use the
joystick to make pointing responses. These instructions were followed by a set
of 12 practice trials in which participants pointed to the randomly ordered
numbers 1 to 12 as they imagined them to be on the clockface and were
provided accuracy feedback to help them fine-tune their responses.
Each practice and experimental trial had an identical sequence. First, there
was a joystick centering routine, in which the joystick was physically placed at
the origin of its coordinate system. Once the joystick was centered, the partici-
pant initiated the trial, and the name of the target was presented at the center of
the CRT screen. Participants made their pointing response by moving the joystick
radially in the direction they imagined the target to be. When the joystick
movement exceeded a radius of 120 units in the “joystick” space, the computer
recorded both the response latency from the onset of the target name and the
angle of the response to the nearest degree (see Sholl 1987, for a more
detailed description).
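The recording step can be paraphrased in modern Python (a schematic reconstruction, not Sholl’s original Apple IIe routine; read_joystick is a hypothetical stand-in for the hardware interface):

    import math, time

    RESPONSE_RADIUS = 120  # threshold in joystick units, as described above

    def record_response(read_joystick, onset_time):
        # Poll the joystick until its displacement from the centered origin
        # exceeds the threshold; return latency and pointing angle.
        while True:
            x, y = read_joystick()
            if math.hypot(x, y) > RESPONSE_RADIUS:
                latency = time.monotonic() - onset_time
                angle = round(math.degrees(math.atan2(y, x))) % 360
                return latency, angle  # latency in s, angle to nearest degree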
The experimental trials followed directly after the practice trials. Participants
were told to take as much time as they needed to imagine the campus from the
specified position, to move the joystick in the direction they imagined each
landmark to be in relation to their body, and to point to the center of each target.
The reference location and facing direction were described using local visual
56 M. JEANNE SHOLL
cues, which never served as target landmarks. The order of the two within-
subjects facing-direction conditions was counter-balanced across subjects. Order
was included in analyses of variance as a control variable and is not reported.4
Target landmarks were presented in a different random order for each participant
and in each facing-direction condition. After the experimental trials were
completed, there were twelve more clock trials with no accuracy feedback. These
trials served as a control check for the relative speed of forward and rearward
joystick movements.
Data analysis
Although three variables are manipulated in a Latin Square design, only two
variables are entered into the analysis of variance (ANOVA). In the experiments
reported here, egocentric target location and facing direction were entered in the
ANOVA, and the results are reported as a function of these two variables.
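Why only two of the three variables can be entered is easy to see from a toy enumeration (Python; the 2 × 2 coding below is our simplification of the design): because each landmark set lies geographically to one side of the bisecting axis, the set tested on a trial is fully determined by facing direction and egocentric target location.

    # Toy enumeration of the Latin square cells (assumed 2 x 2 structure).
    # Facing east, the eastern landmark set lies in front of the body;
    # facing west, it lies behind.
    for facing in ("east", "west"):
        for location in ("front", "back"):
            landmark_set = ("eastern"
                            if (facing == "east") == (location == "front")
                            else "western")
            print(facing, location, "->", landmark_set)
    # east front -> eastern    east back -> western
    # west front -> western    west back -> eastern
    # Because set = f(facing, location), a main effect of landmark set and a
    # facing x location interaction produce identical cell-mean patterns.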
Experiment 1
Method
Target Landmarks. The eastern landmark set consisted of the Recreation Com-
plex, Alumni Stadium, Robsham Theater, Roberts Center, Parking Garage, and
More Hall. The western landmark set included Cushing Hall, Gasson Hall,
Carney Hall, Main Gate, Lyons Hall, and O’Connell Hall. Mean familiarity ratings, collected from a separate group of Boston College undergraduates (n = 22)
on a 5-point scale, were 4.07 and 4.02 for the eastern and western landmark sets,
respectively, t (10) < 1.0. High ratings indicate high familiarity. On average, the
distance of eastern and western landmarks from Reference Location B was 293.1
m and 245.4 m, respectively, t (10) < 1.0.
Results
Landmark targets. Mean response latencies are shown in Figure 5A. There was
a main effect of egocentric target location, F (1, 10) = 6.15, p = .03, MSe = .56,
qualified by an interaction between egocentric target location and imagined
facing direction, F (1, 10) = 7.89, p = .02, MSe = .43. The interaction reflected a
significant 1.23 s front-back effect for the east facing-direction condition and no
front-back effect for the west facing-direction condition. As shown in Figure 5B,
pointing error did not mimic pointing latencies. For pointing error, there was a
front-back effect, F (1, 10) = 13.68, p = .004, MSe = 141.70, which did not interact
with facing direction condition, F (1, 10) < 1.0.
Figure 5. Mean pointing latency (in Panel A) and error (in Panel B) as a function of facing
direction and front/back target location in Experiment 1. Error bars are standard errors of
the mean.
Clock numeral targets. Pointing responses on the second set of clock trials
(excluding the clock numerals 3 and 9) were analyzed to test whether forward
and rearward movements of the joystick were equivalent in speed and accuracy.
For forward and rearward joystick movement, mean latency was 1.603 s and
1.634 s, respectively, and mean error was 9.17° and 8.75°, respectively. Neither
difference was significant, ts (11) < 1.0.
Discussion
Before concluding that the latency profile reflects the main effects of landmark set and
egocentric target location, the possibility that the profile is actually attributable
to an interaction between egocentric target location and facing direction needs to
be considered.
While additive main effects of landmark set and egocentric target location
are consistent with the present model’s premise of separate levels of accessibility
when retrieving spatial relations, the alternative of an egocentric target location
by facing direction interaction is consistent with a different view of how spatial
memory is organized and retrieved. In Zipser’s (1987) distributed field view
model of place knowledge, a representation of large-scale space is not stored as
an integrated network of metric spatial relations, but instead consists of multiple
individual representations, each storing a local view encoded from a different
vantage point. The different viewpoint-specific representations are connected
together with motor programs for how to get from one represented scene to
another. The idea that knowledge of large scale space is made up of a collection
of viewpoint-specific representations has more recently been espoused by
McNamara and his colleagues to account for their findings of viewpoint specific-
ity in spatial representations formed from static viewing points (e.g., Roskos-
Ewoldsen, McNamara, Shelton & Carr 1998; Shelton & McNamara 1997).
It seems reasonable to assume that a local-view model of spatial memory
would predict that the accessibility of relative landmark direction is, at least in
part, a function of whether the target landmark is represented in the local view
activated by the facing direction instruction used in the present task. Certainly
different viewpoints in a cluttered environment have differentially expansive
vistas which, dependent upon the number of landmarks visible in the vista, are
more or less revealing of inter-object relations. For example, a view of an
expansive vista in which numerous landmarks are simultaneously visible provides
more direct information about inter-object relations than a view of a limited vista
containing few visible landmarks. In this regard, it is notable that Reference
Location B is at the top of a steep flight of 116 stairs connecting the “lower” and
“middle”5 parts of campus.
When facing east at Reference Location B, the stairs extend downward in
front of you, and the local view is an aerial view revealing a large portion of the
lower campus, especially in the winter when the trees have lost their leaves. In
contrast when facing west, the local view is a ground-level view of middle
campus, with a single building (Devlin Hall) at its center and limited vistas to
either side. Accordingly, a viewpoint-specific representation of the more expansive eastern vista may code more inter-object relations directly, and hence more accessibly, than a viewpoint-specific representation of the less expansive western vista.
Experiment 2
Method
Target landmarks. The eastern and western landmark sets were identical to those
in Experiment 1, except McElroy Commons was substituted for Cushing Hall,
which could not be used because it switches into the eastern landmark set at
Reference Location D. With this change, the mean familiarity of the western
landmark set was 4.10. The mean distance of the eastern and western landmarks
from Reference Location D was 334.3 m and 243.8 m, respectively, t(10) = 1.10,
p > .10.
Results
Landmark Targets. As shown in Figure 6A, a latency profile similar to the one
observed in Experiment 1 was observed in Experiment 2. There was a main effect of egocentric target location, F (1, 10) = 10.74, MSe = .45, p = .008, and a
marginally significant interaction between egocentric target location and facing
direction, F (1, 10) = 4.03, MSe = 1.20, p = .07. As shown in Figure 6B, there was
an effect of egocentric target location on pointing errors, F (1, 10) = 5.35,
MSe = 251.5, p = .04. Targets located in the front part of body space (M = 24.6°)
were pointed to more accurately than targets located in the back part of body
space (M = 35.2°). There were no other main or interaction effects.
Figure 6. Mean pointing latency (in Panel A) and error (in Panel B) as a function of facing
direction and front/back target location in Experiment 2. Error bars are standard errors of
the mean.
Clock numeral targets. For forward and rearward joystick movements, mean
latency was 1.655 s and 1.681 s, respectively, and mean error was 9.75° and
11.67°, respectively. Neither difference was significant, ts (11) < 1.0.
Discussion
Two variables might have differentiated the two landmark sets: One variable is an attitudinal variable, landmark pleasantness, and the other is a physical variable, landmark size.
With respect to students’ attitudes about landmarks, it is important to consider that the campus is divided up functionally as well as geographically. In
Experiments 1 and 2, all the landmarks in the eastern landmark set were located
on lower campus, which contains residential, leisure, and recreational facilities,
and all landmarks in the western landmark set, with the exception of one upper-
campus residence hall, were academic buildings located on middle campus.
Because lower campus buildings serve primarily nonacademic functions and
middle campus landmarks serve primarily academic functions, lower campus
buildings may be perceived as more pleasant, and hence may be more accessible
cognitively, than the middle campus buildings. Places perceived to be pleasant
are both more likely to be physically approached (Mehrabian & Russell 1974) and to elicit an attitude of approach and exploration (Russell & Mehrabian 1978).
To determine whether middle campus landmarks were perceived as less
pleasant than lower campus landmarks, a group of 37 Boston College students
were asked to rate the pleasantness of 28 landmarks on a 7-point scale (1 = very
unpleasant, 7 = very pleasant). Pleasantness was defined as how much participants
would enjoy themselves and/or how comfortable they would feel when visiting
the landmark. As expected, academic buildings were perceived as less pleasant
(M = 3.55) than nonacademic structures (M = 4.80), t (26) = 2.26. However, the
pleasantness ratings for the lower and middle/upper target landmarks used in
Experiments 1 (Ms = 4.57 & 4.16, respectively) and 2 (Ms = 4.57 & 4.08,
respectively) were not significantly different, and therefore pleasantness cannot
account for an effect of landmark set.
Another variable that might differentiate the landmarks comprising the
eastern and western landmark sets is their size. As can be seen in Figure 4, two
eastern landmarks, the Recreation Complex and Alumni Stadium, cover a
particularly large area. There are no western landmarks of comparable size.
Pointing latencies may be faster for eastern than western landmarks because on
average the former cover a larger area than the latter. In the coordinate-system
model of retrieval, information about landmark size would be stored directly in
the object-to-object system. While speculative, one possible mechanism underly-
ing an effect of landmark size is that nodes coding larger landmarks are more
quickly activated in the network than nodes coding smaller landmarks. Another
possible mechanism is that the self-reference system computes a range of polar
angles for each target, with the size of the range positively related to the size of
the landmark. If so, pointing responses to large landmarks could be executed
with less precision than pointing responses to small landmarks. This latter possibility implies that pointing responses to large landmarks should be more variable than those to small landmarks.
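For a back-of-envelope sense of this second mechanism, the range of polar angles a landmark subtends grows with its size. In the Python sketch below, the widths and the distance are invented for illustration, not taken from the campus map:

    import math

    def subtended_angle_deg(width_m, distance_m):
        # Polar-angle range subtended by a landmark of a given width
        # at a given distance from the observer.
        return math.degrees(2 * math.atan(width_m / (2 * distance_m)))

    print(round(subtended_angle_deg(150, 250), 1))  # large landmark: 33.4 deg
    print(round(subtended_angle_deg(30, 250), 1))   # small landmark: 6.9 deg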
Experiment 3
Method
Target Landmarks. Four matched pairs, each pair consisting of one eastern and
one western landmark, were created for this experiment. Pairs were matched on
distance, area, and familiarity as well as being directional mirror images. The
target pairs (listed as eastern landmark/western landmark) were Robsham
Theater/Bapst Library; Tennis Courts/O’Neill Plaza; Stadium-Forum/Upper Campus
Dorms; Roberts Center/Carney Hall. Matching produced eastern and western
landmark sets equated on distance (205 m vs. 273 m), t (6) < 1.0, familiarity (4.24 vs.
3.93), t (6) < 1.0, and area (7650 sq m vs. 7608 sq m), t (6) < 1.0.
Results
Landmark targets. Mean response latencies are shown in Figure 7A. There was
a 1.0 s front-back effect, F (1, 14) = 4.14, MSe = 3.84, p < .06. There were no
other main or interaction effects. As in Experiments 1 and 2, there was a front-
back effect for pointing accuracy, F (1, 14) = 10.04, MSe = 84.21.
Figure 7. Mean pointing latency (in Panel A) and error (in Panel B) as a function of facing
direction and front/back target location in Experiment 3. Error bars are standard errors of
the mean.
Clock numeral targets. Mean latencies for forward (M = 1.785 s) and rearward
(M = 1.75 s) joystick movements did not differ significantly, t (15) < 1.0. Further-
more, there was no statistically significant difference in the accuracy with which
forward (M = 7.25°) and rearward (M = 8.5°) movements were executed, t
(15) < 1.0.
Discussion
To determine whether this effect is attributable to less precise pointing responses for large
targets, the variability of pointing responses was computed with landmark as the
unit of analysis. For each experiment, variability in pointing response angles was
averaged across the landmark’s front/back location in body space and then
correlated with landmark size. The correlations are listed in Table 1.
General discussion
Notes
1. In the depicted latency profiles, the magnitude of the landmark-set effect is arbitrarily set equal
to the magnitude of the front-back effect.
2. Locations C and E were not used in this study.
3. Although the north-south axis is not parallel to the vertical edges of the map, for ease of
communication compass points will be used to describe direction as if it were.
4. In all the ANOVAs conducted, the only effect of the order-of-the-facing-direction variable was
in one higher-order interaction.
5. The spatial modifiers are attributable to the fact that the campus is situated on the side of a hill.
References
Gallistel, C. R. (1990). The organization of learning. Cambridge, MA: MIT Press.
Gibson, J. J. (1979). The ecological approach to visual perception. Boston: Houghton Mifflin.
Hardwick, D. A., McIntyre, C. W. & Pick, H. L. (1976). The content and manipulation of cognitive maps in children and adults. Monographs of the Society for Research in Child Development, 41, 1–55.
Huttenlocher, J., Hedges, L. V. & Duncan, S. (1991). Categories and particulars: Proto-
type effects in estimating spatial location. Psychological Review, 98, 352–376.
Kosslyn, S. (1975). Information representation in visual images. Cognitive Psychology, 7,
341–370.
Levine, M., Jankovic, I. N. & Palij, M. (1982). Principles of spatial problem solving. Journal of Experimental Psychology: General, 111, 157–175.
Mehrabian, A. & Russell, J. A. (1974). An approach to environmental psychology.
Cambridge, MA: MIT Press.
Moar, I. & Bower, G. H. (1983). Inconsistency in spatial knowledge. Memory and
Cognition, 11, 107–113.
O’Keefe, J. (1991). The hippocampal cognitive map and navigational strategies. In J. Paillard (Ed.), Brain and Space (pp. 273–295). London: Oxford University Press.
Poucet, B. (1993). Spatial cognitive maps in animals: New hypotheses on their structure
and neural mechanisms. Psychological Review, 100, 163–182.
Presson, C. C. & Hazelrigg, M. D. (1984). Building spatial representations through primary and secondary learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 723–732.
Presson, C. C. & Montello, D. R. (1994). Updating after rotational and translational body
movements: Coordinate structure of perspective space. Perception, 23, 1447–1455.
Rieser, J. J. (1989). Access to spatial structure from novel points of observation. Journal
of Experimental Psychology: Learning, Memory & Cognition, 15, 1157–1165.
Roskos-Ewoldsen, B., McNamara, T. P., Shelton, A. L. & Carr, W. (1998). Mental
representations of large and small spatial layouts are orientation dependent. Journal
of Experimental Psychology: Learning, Memory & Cognition, 24, 215–226.
Russell, J. A. & Mehrabian, A. (1978). Approach-avoidance and affiliation as functions
of the emotion-eliciting quality of an environment. Environment and Behavior, 10,
355–387.
Sadalla, E. K., Burroughs, W. J. & Staplin, L. J. (1980). Reference points in spatial cognition. Journal of Experimental Psychology: Human Learning and Memory, 6, 516–528.
Shelton, A. L. & McNamara, T. P. (1997). Multiple views of spatial memory. Psycho-
nomic Bulletin and Review, 4, 102–106.
Shepard, R. N. & Hurwitz, S. (1984). Upward direction, mental rotation, and discrimina-
tion of left and right turns in maps. Cognition, 18, 161–193.
Sholl, M. J. (1987). Cognitive maps as orienting schemata. Journal of Experimental
Psychology: Learning, Memory & Cognition, 13, 615–628.
matrix specifying the relative position of the referent within the intrinsic
reference frame of the relatum. In our example, the possible position of the
refrigerator is constrained to the negative region on the left/right-axis of the
reference frame of Torsten.
(2) The bowl [= referent] is standing on the refrigerator [= relatum].
Die Schüssel steht auf dem Kühlschrank.
If sentence (1) is followed by the second input sentence (2), the mental model is
extended by a third node denoting the new referent bowl. This node is then
connected with the node for the relatum of the second sentence, the refrigerator.
Again, the corresponding matrix specifies the position of the referent (bowl) with
respect to the reference frame of the relatum (refrigerator). We want to empha-
size that in this case, the positions of the referents of the two sentences are not
specified with respect to a unique reference frame, because they are linguistically
related to different relata. As the example illustrates, the mental model specifies
object positions with respect to as many different reference frames as there are
relata used in the text. This turns out to be a specific characteristic of the
intrinsic reading of spatial relational expressions.
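A toy rendering of this point (our formalization, not the authors’ notation): under the intrinsic reading, each referent is stored in the frame of its own relatum, so the model contains as many reference frames as relata, and no single frame localizes both objects.

    # Positions under the intrinsic reading of sentences (1) and (2), keyed
    # by (relatum, referent); axes = (right, front, top) of the relatum's
    # intrinsic frame, so a negative "right" value means "to the left of".
    intrinsic_model = {
        ("Torsten", "refrigerator"): (-1, 0, 0),  # left of Torsten
        ("refrigerator", "bowl"):    (0, 0, 1),   # on top of the refrigerator
    }

    # The bowl's position relative to Torsten is not represented anywhere;
    # relating the two objects would require an inference across frames.
    frames_used = {relatum for (relatum, _) in intrinsic_model}
    print(sorted(frames_used))  # ['Torsten', 'refrigerator']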
Given the deictic reading of left of in (1), the sentence linguistically
localizes the referent refrigerator relative to the relatum Torsten with respect to
a deictic reference frame. The deictic reference frame, which is linguistically not
explicitly indicated, is commonly assumed to be the egocentric reference frame
of the speaker or the hearer (see Miller and Johnson-Laird 1976; Levelt 1996).
The position of the referent is linguistically given relative to the relatum but has
to be determined within the deictic reference frame. If sentence (2) is read deictically as well, the position of the bowl must also be determined within the
deictic reference frame. Obviously, the deictic reading leads to a mental model
that specifies the position of every object mentioned in the text within one
unique reference frame: the deictic reference frame. The mental model will
include an Ego-node for the speaker or hearer providing the deictic reference
frame, which will be connected with every object node in the graph. Since the
deictic reference frame is the unique reference frame relative to which every
object position has to be determined, it sets up an overall organizing principle of
the mental model. On the other hand, given an intrinsic reading, the mental
model seems not to be governed by an overall organizing principle. That the
“intrinsic perspective system”, as Levelt calls it, is less coherent than the deictic one in terms of its inferential potential was demonstrated by Levelt (1996, 1986): Converseness and transitivity hold in the deictic perspective system but
not in the intrinsic one. From the point of view of the psychology of text comprehension, a recipient who has interpreted sentence (1) egocentrically and is now receiving sentence (2) would have to infer that the bowl is on his left in order to establish coherence.
Primarily for intuitive reasons, we think it is plausible that the egocentric
reference frame may set up an overall organizing principle. But our modelling
approach predicts higher cognitive effort for such a principle if the text contin-
ues by mentioning a movement of the protagonist, e.g., a reorientation. If the text
is continued with sentence (4), the object configuration changes relative to Ego.
(4) Torsten turns to his left.
Torsten wendet sich nach links.
Since the mental model does not represent the description but the situation
described, the representation has to be updated according to the mentioned
reorientation. For example, the refrigerator is now to be located behind Ego. As
is predicted by our modelling approach, an update of the egocentric mental model in the case of a reorientation of the protagonist requires updating all, and only, those arcs that connect the Ego-node with an object-node, i.e. every object position specified relative to the egocentric reference frame. Consequently, if the
egocentric reference frame sets up an overall organizing principle, the update of
the mental model in case of a reorientation is predicted to be maximally expen-
sive in terms of required inferences.
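The update-cost claim can be sketched in Python (our construal, with invented coordinates and a two-dimensional simplification of the chapter’s transformation matrices): on a reorientation of Ego, every arc leaving the Ego-node must be recomputed, while arcs expressed in the room frame are untouched.

    import math

    def rotate(vec, deg):
        # Rotate a 2-D position vector by deg degrees (counter-clockwise).
        r = math.radians(deg)
        x, y = vec
        return (x * math.cos(r) - y * math.sin(r),
                x * math.sin(r) + y * math.cos(r))

    # Arcs of the mental-model graph, keyed by (relatum, referent); each
    # position is expressed in the relatum's reference frame (x = right,
    # y = front for Ego; room axes for the kitchen).
    arcs = {("Ego", "refrigerator"): (-1.0, 0.0),  # egocentric localization
            ("Ego", "bowl"):         (-1.0, 0.5),  # egocentric localization
            ("kitchen", "stove"):    (2.0, 1.0)}   # allocentric localization

    def ego_turns_left(arcs):
        # A +90 degree turn of Ego rotates every Ego-relative position by
        # -90 degrees; arcs anchored in the room frame need no update.
        return {k: (rotate(v, -90) if k[0] == "Ego" else v)
                for k, v in arcs.items()}

    updated = ego_turns_left(arcs)
    print(updated[("Ego", "refrigerator")])  # recomputed: about (0.0, 1.0)
    print(updated[("kitchen", "stove")])     # unchanged: (2.0, 1.0)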
While maintaining that a coherent egocentric mental model has to be governed by an overall organizing principle, the cognitive effort for an update of the mental model could be diminished by using a unique reference frame that is invariant under a reorientation of Ego. For example, Klatzky (1998) acknowledges the possibility that a reorientation of Ego may be represented by means of an allocentric reference frame (sometimes called an “absolute reference frame”, e.g.
Levelt 1996). As Klatzky further points out, the reference direction of an
allocentric reference frame, i.e. its orientation, might be derived from the
geometry of a room. The texts we investigate introduce the protagonist standing
in a room, as in (5).
(5) Torsten is standing in the middle of the kitchen.
Torsten steht inmitten der Küche.
Thus, the egocentric mental model must include a node for the room (here: the kitchen), which, like any other node in the graph, provides a three-dimensional
reference frame. The room reference frame might then set up an allocentric
overall organizing principle: Every object mentioned in the text is to be located
within the allocentric reference frame provided by the room-node. Contrary to
the egocentric reference frame, the allocentric reference frame captures the intuition that the object configuration does not change when Ego reorients. There is no need for an update of object positions that are specified with respect
to the allocentric reference frame. Therefore, we suggest that in addition to the
egocentric reference frame the egocentric mental model might be supplied with
an allocentric reference frame. Hörnig, Claus, and Eyferth (1997) present further
arguments in favor of an allocentric reference frame for an egocentric mental
model, that allows for a representational distinction between movements of Ego
and objects. Especially if the cognitive demand of the task at hand is high,
egocentric object localizations might be abandoned in favour of allocentric
localizations.
Those objects that are linguistically related to the protagonist constitute a
special class of objects, in that they are localized within the egocentric reference
frame (because of the protagonist’s perspective) as well as within the allocentric
reference frame (because of the suggested allocentric overall organizing princi-
ple). We call this special class of objects anchoring objects because they anchor
the egocentric reference frame within the allocentric one. In our view, anchoring
objects establish egocentric orientation within the allocentrically defined sur-
rounding environment. Objects not classified as anchoring objects will be called
critical objects. While anchoring objects are always localized egocentrically,
critical objects might be localized either egocentrically or allocentrically. A third
possibility would be that a critical object is localized only within the reference
frame of the relatum, e.g., an anchoring object. In this case, it would be accessi-
ble in the mental model only after the corresponding anchoring object has
already been accessed.
Evidence reported by Levinson (1996) may cast doubt on our suggestion that recipients will construct their egocentric mental models using an allocentric
reference frame. Levinson compared native speakers of Dutch and Tzeltal.
Whereas Dutch, as well as English and German, makes extensive use of ego-
centrically defined localizations (deixis), spatial descriptions in Tzeltal are made
with respect to an allocentric or absolute reference frame. As Levinson could
demonstrate in a variety of investigations, this difference of the languages is
reflected in the behaviour of the native speakers even in nonverbal tasks. For
example, if subjects had to judge the identity of two arrays of five cards A, B, C,
D, E in a recognition task, after they had been reoriented by 180°, native
speakers of Dutch exhibited an egocentrically defined identity criterion. Neglect-
ing the intervening reorientation, two arrays were judged as identical if the
arrangement from left to right was kept identical. In contrast, native speakers of
Tzeltal took the reorientation into account, judging the arrays as identical if the arrangement was now reversed: E, D, C, B, A. These findings clearly indicate
that native speakers of Dutch, and presumably those of English and German too,
code object arrangements egocentrically, while native speakers of Tzeltal code
them allocentrically. This preference for egocentric coding should show up in
constructing egocentric mental models as well. On the other hand, as we argued
above, an exclusively egocentric coding is cognitively very demanding in case of
an update. Thus, even if people prefer to code object positions egocentrically, the
cognitive demand of the task at hand might prevent them from doing so.
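The two identity criteria in this card task can be made concrete with a small schematic (ours, not part of Levinson's study): after Ego turns 180°, an egocentric criterion compares the left-to-right order as currently seen, whereas an allocentric criterion requires the absolute arrangement, and hence the reversed apparent order, to match.

```python
# Schematic of the identity criteria in the 180-degree rotation card task.
# An array is the left-to-right sequence of card labels as Ego sees it.

def egocentric_match(before, after_rotation):
    # Dutch criterion: left-to-right order preserved from Ego's point of
    # view, neglecting the intervening reorientation.
    return before == after_rotation

def allocentric_match(before, after_rotation):
    # Tzeltal criterion: same absolute arrangement; after Ego turns 180
    # degrees, the apparent order of an unmoved array is reversed.
    return before == list(reversed(after_rotation))

original = ["A", "B", "C", "D", "E"]
unmoved_seen_after_turn = ["E", "D", "C", "B", "A"]
assert allocentric_match(original, unmoved_seen_after_turn)
assert not egocentric_match(original, unmoved_seen_after_turn)
```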
If people prefer to code object positions egocentrically as long as the
cognitive demand allows them to do so, one might expect an influence of the
modality of text presentation. As compared with aural text presentation, visual
presentation can be shown to interfere with constructing and maintaining spatial
mental models (e.g., Kaup, Kelter, Habel, and Clauser 1997). The reading
process by itself reduces the cognitive capacities available for other spatial
tasks. While a listener might be able to cope with the high cognitive demand of
constructing and updating a mental model governed by an egocentric overall
organizing principle, a reader might have to abandon his preference for egocen-
tric coding. Instead, he or she might code positions of critical objects allocentric-
ally in order to reduce the cognitive demand required for updating the mental
model in case of a reorientation.
Empirical evaluation
is taken as the most dominant one because canonically (i.e., in upright posture) it
is aligned with gravity. The left/right-axis is dominated by the front/back-
axis, from which it is derived. Object access in egocentric mental models is
claimed to proceed according to the dominance relations of the axes. Within the
horizontal plane, objects located on the front/back-axis should be accessed faster
than those on the left/right-axis. A second principle rests on the asymmetry of
the front/back-axis and states that front is perceptually as well as functionally
more salient than back. Therefore, objects in front of Ego are predicted to be
accessed faster than objects behind. Since the mental transformation model and
the spatial framework model make contradictory predictions for access latencies
for objects behind and beside Ego, the adequacy of the models can be evaluated
empirically. As a third alternative, Franklin and Tversky (1990) consider the
equiavailability model, i.e., that objects in any direction are equally well
accessible. In a series of experiments on egocentric mental models, Franklin and
Tversky (1990; see also Bryant, Tversky and Franklin 1992) report access latency patterns
that agree with the spatial framework model, but contradict the mental transfor-
mation model as well as the equiavailability model. Thus, we may conclude that
these egocentric mental models are organized in a way that reflects the properties
of the egocentric reference frame.
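As a compact summary (our encoding, not the authors'), the three models order the predicted access latencies for probes in the horizontal plane as follows; the critical disagreement concerns whether probes behind Ego are answered faster or slower than probes to either side.

```python
# Qualitative latency predictions for horizontal-plane probes, fastest
# group first; directions within one inner list are predicted equal.
PREDICTIONS = {
    "spatial_framework":     [["front"], ["behind"], ["left", "right"]],
    "mental_transformation": [["front"], ["left", "right"], ["behind"]],
    "equiavailability":      [["front", "behind", "left", "right"]],
}
```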
Hypotheses. As Bryant and Tversky (1992: 29) put it: “A spatial framework is
a mental model that specifies the spatial relations among objects with respect to
an observer in the environment.” We take this as a statement about objects
localized within the egocentric reference frame, e.g., anchoring objects. Because
these objects are represented immediately within the egocentric reference frame
they are directly accessible without any need for a mental transformation. On the
other hand, the mental transformation model makes sense only by modelling the
access to objects that are not immediately represented within the egocentric
reference frame, but whose locations are solely defined with respect to the
environment, i.e. — in our terms — with respect to an allocentric reference
frame. To put it the other way round: Since Barbara Tversky and her co-workers
could demonstrate that access to objects linguistically located within the egocen-
tric reference frame, i.e. anchoring objects, exhibits the spatial framework
pattern, the question of an overall organizing principle can be evaluated empiri-
cally. If the egocentric reference frame sets up an overall organizing principle,
the position of every object mentioned in a text has to be specified with respect
to the egocentric reference frame, if need be by an inference. As a consequence,
object access latencies should exhibit the spatial framework pattern for critical
objects in the same way as for anchoring objects. On the other hand, if an allo-
they had to indicate the remembered direction by pressing the corresponding key
on the numerical keypad (RT2). No feedback was given about the correctness
of the response.
Every subject was probed 24 times (three probes for each of the eight
stories). One half of the probes involved anchoring objects, the other half
involved critical objects. The three object probes for each text as well as the
order in which they were presented were held constant for each text. The correct
direction differed depending on the variant of the story that a subject had
received, i.e. each test object was probed for every direction (varied between
subjects). The variants of the eight different texts were combined in a way that
every subject had to respond to three object probes for each object type (anchor-
ing and critical object) in each direction (front, right, back, left).
In the first experiment (reading), subjects read the description of the object
arrangement on a single display on a PC screen. Reading times were self-paced.
In the testing phase, reorientation sentences and object probes were also present-
ed visually on the PC screen. In the second experiment (listening), subjects heard
the description of the object arrangement via earphones. To mimic self-paced
reading, subjects could call for the description of any one of the four walls by
pressing a corresponding key without restriction for order and frequency. In the
testing phase, reorientation sentences and object probes were also presented
aurally via earphones. During listening, nothing was displayed on the screen. The
material was exactly the same in both experiments.
Table 1. Mean response times for determining directions for object probes (RT1 in ms), by object type and direction, for Experiment 1 (reading) and Experiment 2 (listening).

                    Experiment 1: Reading          Experiment 2: Listening
                    front   right   behind  left   front   right   behind  left
anchoring objects   2773    3665    2994    4161   3059    4094    3593    3980
critical objects    4206    4081    4304    4150   3476    4080    3357    4261
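The anchoring-object means in Table 1 can be checked directly against the spatial framework prediction that front/back probes are answered faster than left/right probes. A minimal sketch (ours; the values are read off Table 1):

```python
# Observed mean RT1 (ms) for anchoring objects, from Table 1.
reading   = {"front": 2773, "right": 3665, "behind": 2994, "left": 4161}
listening = {"front": 3059, "right": 4094, "behind": 3593, "left": 3980}

for rt in (reading, listening):
    front_back = (rt["front"] + rt["behind"]) / 2
    left_right = (rt["left"] + rt["right"]) / 2
    # Spatial framework pattern: front/back-axis probes are faster.
    assert front_back < left_right
```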
Discussion
As expected, the response time pattern for anchoring objects corresponded to the
spatial framework pattern in both experiments. Objects located on the front/back-
axis are responded to faster than objects located on the left/right-axis in the
reading condition as well as in the listening condition. Because anchoring objects
are always represented immediately within the egocentric reference frame, and
therefore are directly accessible, response times agree with the spatial framework
model. We do not regard this claim as undermined by the fact that the asymmetry
of the front/back-axis was not confirmed.
For critical objects, the spatial framework pattern was obtained in the
listening condition but not in the reading condition. When subjects had to read
the texts, the direction of critical objects had no influence at all. Critical objects
were expected to exhibit the spatial framework pattern in the same way as
anchoring objects only if they were immediately located within the egocentric
reference frame, i.e., if egocentric object positions had been inferred. This
was the case with listeners but not with readers. We take this to indicate that
people prefer to code all object positions egocentrically if possible: the egocentric
reference frame sets up an overall organizing principle in egocentric mental
models. Even if a referent is linguistically localized to a relatum different from
the protagonist, a reader taking the protagonist’s perspective tries to infer the
object position relative to Ego already during comprehension. These online
inferences establish coherence.
As the finding for critical objects in the reading condition indicates, objects
not linguistically related to the protagonist are not always located within the
egocentric reference frame. We explain the fact that readers do not code the
positions of critical objects egocentrically by the high cognitive demand of
updating the mental model when Ego has to reorient. Because the reading
process reduces the cognitive resources available to construct and maintain the
spatial mental model, the expense of an update is diminished by abandoning
egocentric localizations of critical objects. That reading is more difficult than
listening is confirmed by the higher error rates with visual presentation (21%)
than with aural presentation (11%). Since response times for critical objects in
the reading condition do not correspond to the predictions of the mental transfor-
mation model, we lack explicit evidence that critical objects instead are localized
within an allocentric reference frame. But on the one hand, the mental transfor-
mation model stipulates that a mental reorientation of Ego takes more time if it
is towards the back than towards the side. One would expect that reading times
for reorientation sentences reflect this difference, because the egocentric mental
model requires a corresponding update: a mental rotation. In fact, reading times
for reorientation sentences did not differ significantly whether the protagonist
was described as turning to his back (6423 ms), to his right (7313 ms), or to his
left (6801 ms). If these reading time data elucidate the concept of a mental
reorientation as addressed by the mental transformation model, they cast doubt
on the claim that a reorientation of Ego to the back takes more time than to either side.
On the other hand, critical objects seem not to be accessed within the reference
frames of anchoring objects in the reading condition, because in this case, access
latencies should also reflect access latencies for anchoring objects. That critical
objects in the reading condition were equally well accessible in either direction
at least indirectly speaks in favour of an allocentric localization. More direct
evidence for an allocentric reference frame in an egocentric mental model must
await future work.
In the light of the findings of Levinson (1996), the observed preference for
an egocentric overall organizing principle should not be taken as a universal
principle, but has to be restricted to native speakers of languages that express
object locations egocentrically, as German, English, and Dutch do, for example.
It should not be generalized to speakers of languages like Tzeltal.
Acknowledgments
This research was supported by the Deutsche Forschungsgemeinschaft (DFG) in the project
“Modelling Inferences in Mental Models” (Wy 20/2–1) within the priority program on spatial
cognition (“Raumkognition”).
Correspondence should be addressed to Robin Hörnig, Sekr. FR 5–8, Department of Computer
Science, Technical University of Berlin, Franklinstr. 28/29, D-10587 Berlin; email: rhoernig@cs.tu-
berlin.de
References
Ambler, A. P. & Robin J. Popplestone (1975). Inferring the positions of bodies from
specified spatial relationships. Artificial Intelligence, 6, 157–174.
Bryant, David J. & Barbara Tversky (1992). Assessing spatial frameworks with object and
direction probes. Bulletin of the Psychonomic Society, 30, 29–32.
Bryant, David J., Barbara Tversky & Nancy Franklin (1992). Internal and external spatial
frameworks for representing described scenes. Journal of Memory & Language, 31,
74–98.
Claus, Berry, Klaus Eyferth, Carsten Gips, Robin Hörnig, Ute Schmid, Sylvia Wiebrock
& Fritz Wysotzki (1998). Reference frames for spatial inferences in text comprehen-
sion. In C. Freksa, C. Habel & K. F. Wender (Eds.), Spatial Cognition. An interdisci-
plinary approach to representing and processing spatial knowledge (241–266). Berlin:
Springer.
Franklin, Nancy & Barbara Tversky (1990). Searching imagined environments. Journal of
Experimental Psychology: General, 119, 63–76.
Franklin, Nancy, Barbara Tversky & Vicky Coon (1992). Switching points of view in
spatial mental models. Memory & Cognition, 20, 507–518.
Hörnig, Robin, Berry Claus & Klaus Eyferth (1997). Objektzugriff in Mentalen
Modellen: Eine Frage der Perspektive. In C. Umbach, M. Grabski & R. Hörnig
(Eds.), Perspektive in Sprache und Raum (81–104). Wiesbaden: Deutscher Universi-
tätsverlag.
Kaup, Barbara, Stephanie Kelter, Christopher Habel & Constanze Clauser (1997). Zur
Wahl des repräsentierten Bildausschnitts beim Aufbau mentaler Modelle während
der Textrezeption. In C. Umbach, M. Grabski & R. Hörnig (Eds.), Perspektive in
Sprache und Raum (61–79). Wiesbaden: Deutscher Universitätsverlag.
Klatzky, Roberta L. (1998). Allocentric and egocentric spatial representations: definitions,
distinctions, and interconnections. In C. Freksa, C. Habel & K. F. Wender (Eds.),
Spatial cognition. An interdisciplinary approach to representing and processing
spatial knowledge (1–17). Berlin: Springer.
Levelt, Willem J. M. (1986). Zur sprachlichen Abbildung des Raumes: Deiktische und
intrinsische Perspektive. In H.-G. Bosshardt (Ed.) Perspektive auf Sprache. Inter-
disziplinäre Beiträge zum Gedenken an Hans Hörmann (187–211). Berlin: de
Gruyter.
Levelt, Willem J. M. (1996). Perspective taking and ellipsis in spatial description. In P.
Bloom, M. A. Peterson, L. Nadel & M. F. Garrett (Eds.), Language and Space
(77–108). Cambridge, MA: MIT Press.
Levinson, Stephen C. (1996). Frames of reference and Molyneux's question: Crosslinguistic
evidence. In P. Bloom, M. A. Peterson, L. Nadel & M. F. Garrett (Eds.), Language and
Space (109–169). Cambridge, MA: MIT Press.
Describing the Topology of Spherical Regions
Nicholas Mark Gotts
Introduction
or metric, precise or vague, etc.) and different kinds of space (atomistic or not,
one-, two-, or three-dimensional). This paper bears on whether RCC would be a
suitable foundation for such a family, and if not, why not.
It has been claimed for RCC (Randell, Cui and Cohn 1992) that its use of
extended regions — rather than points — as basic spatial entities, makes it a
more suitable formalism for non-mathematical qualitative spatial reasoning than
topology as developed by mathematicians, avoiding many complexities and
counterintuitive results while remaining rich enough for everyday reasoning. The
analysis in this paper leads to a topological taxonomy of spherical regions,
confirming RCC’s usefulness in clarifying and sharpening intuitive spatial
concepts. However, RCC turns out to be insufficiently expressive to capture some
desirable limitations on the properties of geographical regions, or to express
some significant constraints between properties of regions that can themselves be
formulated within it. Because of these limitations, it is concluded that RCC will
probably not be an adequate basis for formalizing “non-mathematical” topologi-
cal reasoning.
of assertions implied by the dots, it would follow that those axioms must be
inconsistent with some finite subset of them, implying some finite limit on the
number of separate pieces a region may have.
There is a second problem, which undermines the idea that we can define
each geographical region independently of any other: it is quite possible to have
two regions that meet the restrictions suggested above, while the union of the
two does not. This is easier to describe for two planar than two spherical regions.
Let A and B be equal-sized squares sharing a side, each having an infinite
sequence of semicircular “bites” removed from the shared side: the first with the
middle third of the side as its diameter, the second the middle third of the upper
third, the next the middle third of the uppermost ninth, etc. A and B both meet
the restrictions suggested above, but the union of the two is a “region” with an
infinite number of holes.
This paper restricts the range of intended models of RCC to those in which
the variables standing for regions range over parts of a locally Euclidean space
(Armstrong 1979). Such a space is of uniform dimensionality (ruling out a disc
with a one-dimensional “spike” protruding from it, for example). For any two
non-boundary points, sufficiently small neighbourhoods of the two points will be
topologically indistinguishable; in the case of a two-dimensional space, such
neighbourhoods will be topological discs. If a locally Euclidean space has
boundaries, sufficiently small neighbourhoods of any two boundary points will
again be topologically indistinguishable; in the two-dimensional case, these
neighbourhoods will be half-discs, and the boundaries themselves will be simple
closed curves. Regions will be assumed to consist of one or more separate
components, each of which is regular closed (Requicha and Tilove 1978), is
path-connected (Armstrong 1979), and is such that every boundary point is the
boundary point of at least one maximal connected component of the region’s
interior. These further restrictions serve to rule out unacceptable regions of
various kinds. When the space containing the regions is the sphere S2 (or the
plane E2), a region obeying them is uniformly two-dimensional, includes all the
points on its own boundary, and has no isolated lines or points removed. In S2 it
will also have a finite number of components. These restrictions may not exclude
all we would like to exclude, but avoid excluding anything we want to permit.
Notice that we do not insist that all regions should themselves be locally
Euclidean spaces, only that the space containing all the regions should be one.
RCC has two parts, concerning topology and convexity. The topological part, to
which attention is confined here, uses a single primitive: C(x,y), read: “region x
is connected to region y”. Given the interpretation of “region” adopted above, it
means that x and y share at least one point. The current version of RCC’s
topological part was first presented in Randell, Cui and Cohn (1992). The
presentation here is similar to that in Gotts (1994).
The first two axioms ensure that C is reflexive and symmetric:
Axiom (1) ∀ x C(x,x)
Axiom (2) ∀ x,y [C(x,y) → C(y,x)].
Additional relations are defined in terms of C. Those used here are:
DC(x,y) ≡def ¬C(x,y) (x is disconnected from y),
P(x,y) ≡def ∀ z [C(z,x) → C(z,y)] (x is part of y),
PP(x,y) ≡def P(x,y) ∧ ¬P(y,x) (x is a proper part of y),
EQ(x,y) ≡def P(x,y) ∧ P(y,x) (x coincides with y),
O(x,y) ≡def ∃ z [P(z,x) ∧ P(z,y)] (x overlaps with y),
DR(x,y) ≡def ¬O(x,y) (x is discrete from y),
PO(x,y) ≡def O(x,y) ∧ ¬P(x,y) ∧ ¬P(y,x) (x partially overlaps with y),
EC(x,y) ≡def C(x,y) ∧ ¬O(x,y) (x is externally connected with y),
TPP(x,y) ≡def PP(x,y) ∧ ∃z [EC(z,x) ∧ EC(z,y)] (x is a tangential proper part
of y),
NTPP(x,y) ≡def PP(x,y) ∧ ¬∃z[EC(z,x) ∧ EC(z,y)]
(x is a non-tangential proper part of y).
Any pair of RCC regions must be related by one of a set of eight binary relations,
shown in Figure 1 (TPPi and NTPPi are inverses of TPP and NTPP respectively).
Figure 1. The eight basic relations between regions a and b: DC(a,b), EC(a,b), PO(a,b), TPP(a,b), NTPP(a,b), EQ(a,b), NTPPi(a,b), TPPi(a,b).
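As an illustration only (ours; RCC itself is a first-order theory over continuous regions, not a program), the defined relations can be mirrored in a toy model in which a region is a finite set of grid cells and the primitive C holds when the two cell sets touch:

```python
# A toy grid model of the RCC relations (not the intended continuous
# interpretation, in which regions are regular closed parts of a space).
# A region is a frozenset of integer cells; C(x, y) is read as "some cell
# of x coincides with or is 8-adjacent to some cell of y", approximating
# "the closures share a point".

def adjacent(a, b):
    return abs(a[0] - b[0]) <= 1 and abs(a[1] - b[1]) <= 1

def C(x, y):   return any(adjacent(p, q) for p in x for q in y)
def O(x, y):   return bool(x & y)                      # share a cell
def P(x, y):   return x <= y                           # part of
def PP(x, y):  return P(x, y) and not P(y, x)          # proper part
def EQ(x, y):  return x == y                           # coincide
def DC(x, y):  return not C(x, y)                      # disconnected
def DR(x, y):  return not O(x, y)                      # discrete
def EC(x, y):  return C(x, y) and not O(x, y)          # externally connected
def PO(x, y):  return O(x, y) and not P(x, y) and not P(y, x)

def outside_witness(x, y):
    """A cell not in y but adjacent to x: when PP(x, y) holds this is a
    witness z with EC(z, x) and EC(z, y), as in the TPP definition."""
    for p in x:
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                c = (p[0] + dx, p[1] + dy)
                if c not in y:
                    return frozenset({c})
    return None

def TPP(x, y):   return PP(x, y) and outside_witness(x, y) is not None
def NTPP(x, y):  return PP(x, y) and outside_witness(x, y) is None

# A 3x3 block b, its centre cell n (an NTPP), and an edge strip t (a TPP):
b = frozenset((i, j) for i in range(3) for j in range(3))
n = frozenset({(1, 1)})
t = frozenset({(0, 0), (0, 1), (0, 2)})
assert NTPP(n, b) and TPP(t, b) and EC(frozenset({(3, 3)}), b)
```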
Further axioms guarantee the existence of a universal region (U) (which will
always be a locally Euclidean space for the purposes of this paper), and some
“quasi-Boolean” functions (these functions are quasi-Boolean rather than Boolean
three parts of the dark region in the lower left subfigure of Figure 3. A connect-
ed region of the sphere not so divided, like each of these three parts, is referred
to here as interior-connected (INTCON). The number and arrangement of
maximal interior-connected parts in each maximal connected part forms the
second level of structure. The topology of the maximal interior-connected parts
forms the third.
CON(x) can be defined thus (“x cannot be divided into two DC parts”):
CON(x) ≡def ∀y∀z[EQ(x,SUM(y,z)) → C(y,z)].
For disconnected regions, we might define a “separation-number”: the number of
maximal connected parts it divides into (upper-case italics here stand for
variables ranging over the natural numbers):
SEPNUM(x,1) ≡def CON(x),
SEPNUM(x,N+1) ≡def
∃y,z[EQ(x,SUM(y,z)) ∧ DC(y,z) ∧ CON(y) ∧ SEPNUM(z,N)].
However, this definition requires at least part of arithmetic (addition of the
natural numbers) to be added to RCC-theory. We do not need arithmetic axioms
sufficient to support multiplication; axioms for addition only, so-called “Pres-
burger arithmetic” (Harel 1987: 179–180) suffice. These also allow us to state
that one number is at least as large as another. Alternatively, and staying within
RCC, we could avoid the need for any arithmetic axioms, defining a succession
of as many predicates as we require, as follows:
SEPNUM2(x) ≡def ∃y,z[EQ(x,SUM(y,z)) ∧ DC(y,z) ∧ CON(y) ∧ CON(z)],
SEPNUM3(x) ≡def ∃ y,z[EQ(x,SUM(y,z)) ∧ DC(y,z) ∧ CON(y) ∧
SEPNUM2(z)]
… concluding with a predicate SEPMANY, which applies to all regions with
more than some fixed number of separate parts. However, we will see that
avoiding explicit arithmetic has drawbacks.
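In a toy model such as the grid sketch above, the separation number need not be built up recursively: a flood fill over the connection relation yields the maximal connected parts directly. A sketch, reusing adjacent() from the model above:

```python
# Separation number as the count of maximal connected parts, found by
# flood fill over the 8-adjacency that underlies C in the toy model.

def components(region):
    """The maximal connected parts (MAX-Ps) of a grid region."""
    remaining, parts = set(region), []
    while remaining:
        frontier = [remaining.pop()]
        part = set(frontier)
        while frontier:
            p = frontier.pop()
            neighbours = {q for q in remaining if adjacent(p, q)}
            remaining -= neighbours
            part |= neighbours
            frontier.extend(neighbours)
        parts.append(frozenset(part))
    return parts

def SEPNUM(region):  # number of maximal connected parts
    return len(components(region))

def CON(region):     # connected: cannot be split into two DC parts
    return SEPNUM(region) == 1

assert SEPNUM(frozenset({(0, 0), (1, 1), (5, 5)})) == 2
```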
If the universal region U is the one-dimensional counterpart of the sphere
(the circle, S1), regions other than U can only differ topologically in their
number of MAX-Ps (maximal CON components), as illustrated in Figure 2.
If U is more than one-dimensional, however, this is only the first stage in
a topological taxonomy of regions. Figure 3 shows CON, SEPNUM2, and
SEPNUM3 spherical regions: the simplest possible example of each at the top,
a more complicated example of each below. This figure uses a representational
format which is re-used several times: a spherical region is shown as a distinc-
Just as SEPNUM was defined from CON, we can define INTSEPNUM, associat-
ing a region with the number of its maximal interior-connected parts.
INTSEPNUM can be applied to both connected and disconnected regions: the
INTSEPNUM of the latter being the sum of the INTSEPNUMs of its maximal
connected parts. The left part of Figure 4 shows a connected region with three
interior-connected subregions: pairs of these meet at different numbers of points.
The right shows a region with three connected subregions, themselves composed
of one, two and three interior-connected parts.
Figure 4. Left: a connected region with three interior-connected parts (x1–x3). Right: a region whose three components consist of one, two and three interior-connected parts (y1.1–y3.2).
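Interior-connectedness also has a natural counterpart in the toy grid model: two cells sharing only a corner are connected (C holds) but meet at a single point, so maximal interior-connected parts correspond to 4-connected (edge-adjacent) components, whereas components()/SEPNUM() above use 8-adjacency. A sketch:

```python
# INTSEPNUM in the toy model: count 4-connected components, since
# corner-only contact is exactly a point connection.

def edge_adjacent(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1

def int_components(region):
    remaining, parts = set(region), []
    while remaining:
        frontier = [remaining.pop()]
        part = set(frontier)
        while frontier:
            p = frontier.pop()
            neighbours = {q for q in remaining if edge_adjacent(p, q)}
            remaining -= neighbours
            part |= neighbours
            frontier.extend(neighbours)
        parts.append(frozenset(part))
    return parts

def INTSEPNUM(region):
    return len(int_components(region))

# Two cells meeting only at a corner: one connected part (SEPNUM 1,
# using the 8-adjacent components() above), but two interior-connected
# parts.
corner = frozenset({(0, 0), (1, 1)})
assert INTSEPNUM(corner) == 2
```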
In the spherical case, INTCON regions will coincide with those we will call
“line-connected” (LINECON): these are connected, and cannot be divided into
two parts linked only at isolated points. Defining line-connectedness in RCC
terms is more complicated than defining INTCON: we first need some intermedi-
ate terms. MAX-P(x,y) means that x is a maximal connected part (we will
sometimes say simply “component”) of y:
MAX-P(x,y) ≡def CON(x) ∧ P(x,y) ∧ ¬∃z[PP(x,z) ∧ P(z,y) ∧ CON(z)].
Next, we define the predicate SBNUM1(x,y) (“Regions x and y do not overlap,
and share exactly one boundary”):
On the left of Figure 5, it is clear that the upper region meets the INTCON
definition, while the lower picture shows a choice of NTPP which is not part of
a CON NTPP. On the right, the same pair of regions are used to illustrate the
LINECON definition; it should be clear that no choice of a part w of y in the
lower region would allow subparts u and v to be chosen to meet the conditions
required.
Note that in one-dimensional models of RCC no region is line-connected. In
three dimensions, some but not all line-connected regions are also surface-
connected: i.e., the parts of any two-part division share at least one surface, as
illustrated in Figure 6. Only these would be INTCON.
We can distinguish two kinds of topological property which a region may have:
extrinsic, and intrinsic. The former, but not the latter, depend on how the region
is embedded in some containing space — in the current context, generally the
sphere. INTCON is an extrinsic property, while LINECON is an intrinsic one
(however, since interior-connected and line-connected regions coincide on a
sphere, we can use them interchangeably in our classification). In point-set
topological terms, two regions are intrinsically different if there is no homeo-
morphism (one-to-one continuous function with a continuous inverse) between
the points of the two. The distinction is illustrated in Figure 7. The region at the
top left is intrinsically different from the other three, which are homeomorphic.
(In all but the upper left subfigure, the two maximal INTCON subregions that
touch each other both have an interior “hole”; in the upper left subfigure, only
one of the pair does.) Note that the regions in the two lower subfigures can be
continuously transformed into each other on the surface of a sphere, but not into
the somewhat lighter-shaded region at top right, even though they are homeo-
morphic to it. There is thus no topological difference between the regions shown
in the two lower subfigures, an extrinsic topological difference between these
and the region in the top right subfigure, and an intrinsic topological difference
between the region in the top left subfigure and the other three.
Figure 7. Four regions: the two lower ones show no topological difference; they differ extrinsically from the region at top right; and the region at top left differs intrinsically from the other three.
Classifying spherical regions first by their intrinsic topology, and only using
differences in the extrinsic topology to distinguish regions with the same
intrinsic topology, has the advantage that the intrinsic classification used in the
spherical case can be used almost without change for finite regions of the plane:
the only difference being that the entire sphere has no corresponding finite
planar region.
What do we need to add to the existing RCC axioms to specify that the
universal region U is a topological sphere, given the assumption that it is a
locally Euclidean space? We can actually go a long way towards this using the
identity between interior-connected and line-connected spherical regions. If we
assert that:
Axiom (9) ∀ x[INTCON(x) ≡ LINECON(x)],
we rule out all possible Us of one dimension (where any line-segment is interior-
connected: each consists of two parts which are joined only at a point. The
regions at the left and centre are all interior-connected but that in the centre is
not what we will call well-connected, or WELLCON: its boundary includes an
anomalous point, where two otherwise locally disconnected parts of the region
touch. (Note that this region, like that on its right, has only one boundary, as one
can get from any boundary-point to any other without going through the interior;
but this boundary is not a simple closed curve.)
Figure 8. Example regions, labelled SCON and not SCON.
regions can fall: a connected region can be simply-connected or not, and it can
be well-connected, interior-connected but not well-connected, or not interior-
connected. However, one of the six is empty: a two-dimensional simply-connect-
ed region that is interior-connected must also be well-connected. The central
region of Figure 8 shows why. A spherical (or planar) region which is interior-
connected but not well-connected must be divisible into two CON parts which
share at least two separate boundaries (at least one of which will be a point), as
shown by the line through the dark region. Such a region cannot therefore be
simply-connected.
The classification of CON spherical regions implicit in Figure 8 can be
extended in a useful way to disconnected regions. A region of which all compo-
nents are SCON will be called ALLSCON, one of which all components are
INTCON will be called ALLINTCON, and one of which all components are
WELLCON, ALLWELLCON. Notice that since a connected region is its own
sole component, SCON regions are also ALLSCON, etc.
The only simply-connected and well-connected spherical regions are the
disc, and the sphere itself. A spherical simply-connected region other than these
must be a collection of discs, each pair either disjoint or sharing a single point,
and with no “cycles” such that disc 1 touches disc 2, disc 2 touches disc 3... disc
n-1 touches disc n, and disc n touches disc 1.
Spherical and finite planar well-connected regions can be classified simply
by the number of (simple closed curve) boundaries they have. Classification of
spherical and finite planar regions which are interior-connected but not necessari-
ly well-connected is considerably more complicated.
We can begin classifying all spherical interior-connected regions in the
same way as those that are well-connected, by their number of separate bound-
aries. However, the three interior-connected regions shown in black in Figure 9
have the same number of boundaries (two in each case), but are intrinsically
distinct. We need to be able to characterise the individual boundaries.
Classification of interior-connected spherical regions is easier if we realize
that the complement of a INTCON region is always an ALLSCON region, and
vice versa. Consider the first characterisation of a simply-connected region given
above: that any circle embedded in it can be continuously shrunk to a point
within it without any part of it leaving the region. This applies to any ALLSCON
region, since a circle can only lie within a single part. If we consider a non-
ALLSCON region embedded in the sphere, there must be a circle embedded in
it which cannot be so shrunk. Imagine that the non-ALLSCON region covers the
equator, and that the equator itself is such a circle (this can always be ensured by
stretching the region in the right way — a circle embedded in a sphere divides
it into two discs, and these can be made into hemispheres by appropriate
stretching). For the circle to be unshrinkable, part of the COMPL of the non-
ALLSCON region must lie in each hemisphere, and at most, these parts of the
COMPL will connect at a finite number of points (if they connected along a
segment of the equator, the unshrinkable circle would not, as supposed, be
embedded in the non-ALLSCON region). Thus the spherical COMPL of a non-
ALLSCON region must be non-INTCON. Conversely, the complement of an
ALLSCON region in a sphere must be INTCON, as there will be no circle
within the ALLSCON region separating the complement into two parts touching
only at a finite set of points. We can subtract as many disjoint SCON regions as
we please from the sphere, and the remainder will always be INTCON, but as
soon as we subtract a non-SCON region, the remainder will no longer be
INTCON. The constraints of this paragraph are specific to the sphere: they do
not apply to the plane (where the complement of the INTCON but non-
WELLCON region of Figure 8 includes all of the plane outside the region’s
outer rim, and is thus non-SCON); nor to compact surfaces which are non-
simply-connected, such as the torus.
This analysis suggests several ways of classifying spherical interior-connected
regions.
1. Divide such regions into classes according to the number of separate
boundaries they have (their CBNUM, for “complement boundary number”:
the SBNUM of a region and its COMPL).
2. Divide into classes according to their total number of boundary lobes
(LOBENUM), summed over all boundaries. A “boundary lobe” is a simple
closed curve making up part or all of a boundary. The inner boundaries of
the three regions in Figure 9 have 2, 4 and 4 lobes.
3. Combine (1) and (2), defining a class by both CBNUM and LOBENUM.
Figure 9. Three intrinsically distinct interior-connected regions, each with two boundaries; their boundary lobes are numbered in the figure.
or if there are no more than two components, and all the boundary-lobes of each
component are topologically equivalent, there will be only one possible embed-
ding. In all other cases, there are distinct ways to embed a given collection of
components. There are, however, generally fewer (and never more) such
embeddings in the sphere than in the plane: given a spherical embedding of an
intrinsically-defined region, a distinct planar embedding can be produced from
every topologically distinct class of points in the spherical complement of the
region: simply pierce the sphere at such a point, expand this hole so its edge
surrounds the surface, and flatten the surface into the plane. This may produce
obviously different results, as shown in Figure 11(b), or may simply interchange
topologically indistinguishable components, as in the upper pair of possibilities
in Figure 11(c).
Figure 11. (a) A spherical embedding of a region; (b), (c) distinct planar embeddings produced by piercing the sphere at topologically distinct points of the complement (components labelled A, B, C).
two boundaries with the remainder of the region, that can be removed without
increasing the region’s number of components. For a spherical interior-connected
region (other than the entire sphere, for which both LOBENUM and
EXCONNUM are 0), this will be one less than the LOBENUM. One can also
calculate EXCONNUM for connected but not interior-connected regions. This is
equal to the sum of the EXCONNUMs of the maximal interior-connected
subregions, plus a contribution from the way in which these components are
connected to each other. This contribution can be calculated by constructing a
“multigraph” in which each interior-connected component is represented by a
vertex, and each connecting point between two such components as an edge. The
contribution from this level of the region’s structure is then the number of edges
that can be removed from this multigraph without disconnecting the remainder.
Finally, the EXCONNUM of a disconnected region is just the arithmetic sum of
the EXCONNUMs of its maximal connected components.
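The multigraph computation just described is easy to make concrete: the number of removable edges of a multigraph is its cycle rank, i.e. the number of edges minus the number of vertices plus the number of connected components of the graph. A minimal sketch (ours):

```python
# Each interior-connected component is a vertex, each point of contact
# between two such components an edge; the number of edges removable
# without disconnecting the remainder is the multigraph's cycle rank.

from collections import defaultdict

def cycle_rank(vertices, edges):
    """edges: list of (u, v) pairs; parallel edges are allowed."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, ncomp = set(), 0
    for v in vertices:
        if v in seen:
            continue
        ncomp += 1                      # a new connected component
        stack = [v]
        while stack:
            w = stack.pop()
            if w not in seen:
                seen.add(w)
                stack.extend(adj[w])
    return len(edges) - len(vertices) + ncomp

# Two components touching at two distinct points: one edge is removable.
assert cycle_rank(["p", "q"], [("p", "q"), ("p", "q")]) == 1
```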
The number of boundaries (CBNUM) of a connected region is equal to the
sum of the numbers of boundaries of its interior-connected sub-regions, minus
the number of such components, plus one. Clearly, this is so for a connected
region that is also interior-connected. If we consider adding further maximal
interior-connected sub-regions one at a time, each when added must have exactly
one of its boundaries amalgamated with one of the existing boundaries, maintain-
ing the arithmetic relation. Note that this would not be the case if U were, for
example, a torus, on which two annuli (an annulus being a disc with a smaller
disc removed from its interior) could share points on both pairs of their bound-
aries, so that the connected but not interior-connected SUM of the two would
have just two boundaries, not three as the spherical formula requires. Again, the
CBNUM of a disconnected region is calculated by summing those of its maximal
connected components.
Some further constraints on integer-valued topological properties of
spherical regions are listed below. Some hold only for regions with particular
properties, others across all regions.
Consider first, constraints on single regions.
1. SEPNUM(x) ≤ INTSEPNUM(x)
2. SEPNUM(x) ≤ CBNUM(x)
3. LOBENUM(x) ≤ INTSEPNUM(x) + EXCONNUM(x) (If the region is well-
connected, and is not the whole sphere, equality holds.)
4. CBNUM(x) − SEPNUM(x) ≤ EXCONNUM(x) (Again, equality holds if the
region is WELLCON, and is not the whole of the sphere.)
Next, consider constraints on regions related by the COMPL function.
5. CBNUM(x) = CBNUM(COMPL(x))
6. INTSEPNUM(x) = EXCONNUM(COMPL(x))+1
7. If x is CON, then CBNUM(x) = SEPNUM(COMPL(x)).
Note that these could be considered to hold even when x is the entire sphere, if
we allowed for the existence of a null region, but constraint (6) would not then
hold if x were the null region. Constraint (6) is illustrated in Figure 12, in which
the eight maximal INTCON subregions of the dark-shaded region are numbered.
Subtracting the seven white regions from its complement leaves that complement
unaffected, but no further regions could be subtracted from it in the same way
without disconnecting it.
Next, constraints on regions related by the SUM function.
8. For SEPNUM, INTSEPNUM, CBNUM and LOBENUM, an upper limit for
SUM(x,y) is the arithmetic sum of the values for x and y. If x and y are DC,
these maxima will be achieved. If C(x,y), then the SEPNUM and CBNUM
maxima will not be achieved, while if x and y have some stretch of com-
mon boundary, none of them will be achieved.
9. For EXCONNUM, the same maximum will hold if the boundaries of the
two regions have no points of contact, and will be achieved if the regions
are DC.
Finally, constraints on regions related by the PROD function.
10. An upper limit on the EXCONNUM of PROD(x,y) is set by the sum of the
values for x and y.
11. If the boundaries of the two regions have no points of contact, the same
maximum applies for SEPNUM, INTSEPNUM, CBNUM and LOBENUM.
Constraints of these kinds have potential uses in the kinds of topological
reasoning we might expect a human problem-solver to perform: for example,
calculating how many separate parcels of land or boundaries an amalgamation of two
estates might have. They are also of potential use in checking the consistency of
spatial databases. RCC's inability to express them is thus a serious drawback.
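As an illustration of the database-checking use just mentioned, constraints (1)-(7) can be encoded as assertions over stored topological attributes of a region and its complement. The sketch below is hedged: the record layout and field names are our own assumptions, not part of RCC.

```python
from dataclasses import dataclass

@dataclass
class RegionTopology:
    # Hypothetical stored attributes of a spherical region.
    sepnum: int      # maximal connected parts
    intsepnum: int   # maximal interior-connected parts
    cbnum: int       # separate boundaries
    lobenum: int     # boundary lobes, summed over all boundaries
    exconnum: int    # removable "extra connections"

def check(x, comp):
    """Return the numbered constraints from the text that region x and
    its complement comp violate."""
    bad = []
    if not x.sepnum <= x.intsepnum:
        bad.append(1)
    if not x.sepnum <= x.cbnum:
        bad.append(2)
    if not x.lobenum <= x.intsepnum + x.exconnum:
        bad.append(3)
    if not x.cbnum - x.sepnum <= x.exconnum:
        bad.append(4)
    if x.cbnum != comp.cbnum:
        bad.append(5)
    if x.intsepnum != comp.exconnum + 1:
        bad.append(6)
    if x.sepnum == 1 and x.cbnum != comp.sepnum:   # x is CON
        bad.append(7)
    return bad

# A disc on the sphere: its complement is also a disc; nothing is violated.
disc = RegionTopology(sepnum=1, intsepnum=1, cbnum=1, lobenum=1, exconnum=0)
assert check(disc, disc) == []
```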
Conclusions
This paper has discussed an attempt to develop a specialized version of the RCC
calculus, to deal with the particular domain of “well-behaved” two-dimensional
parts of a spherical surface such as that of the Earth. The three-level hierarchical
structure of RCC regions (the whole region and its maximal connected and
interior-connected subregions), and some of the numerical relations listed above,
arise as consequences of the basic RCC axioms. Other features of the classifica-
tion of spherical regions arise from these basic axioms, plus the additional
constraints imposed by taking the universal region to be a sphere. In describing
Acknowledgments
The support of the EPSRC under grant no. GR/H 78955, and helpful comments from Tony Cohn and
three anonymous referees, are gratefully acknowledged. This paper was written while the author was
employed successively in the School of Computer Studies, University of Leeds, UK and the
Department of Computer Science, University of Wales Aberystwyth, UK. An earlier and considerably
shorter version of the paper appeared as University of Leeds School of Computer Studies Research
Report 96.24.
References
Armstrong, Mark A. (1979). Basic Topology. London: McGraw-Hill.
Cohn, Anthony, Brandon Bennett, John Gooday & Nicholas Gotts (1997). Qualitative
spatial representation and reasoning with the region connection calculus. Geoinfor-
matica, 1, 275–316.
Egenhofer, Max & Robert Franzosa (1995). On the equivalence of topological relations.
International Journal of Geographical Information Systems, 9(2), 133–152.
Gotts, Nicholas (1994). How far can we “C”? Defining a “doughnut” through connection
alone. In Jon Doyle, Erik Sandewall and Pietro Torasso (Eds.) Principles of
Knowledge Representation and Reasoning: Proceedings of the fourth international
conference (KR’94), 246–257. San Francisco: Morgan Kaufmann.
Gotts, Nicholas (1996a). An axiomatic approach to topology for spatial information
systems [School of Computer Studies Research Report, 96.25]. University of Leeds.
Gotts, Nicholas (1996b). Formalizing commonsense topology: The INCH calculus. In
Proceedings, fourth international symposium on artificial intelligence and mathemat-
ics, AI/MATH-96, 72–75.
Gotts, Nicholas, John Gooday & Anthony Cohn (1996). A connection based approach to
commonsense topological description and reasoning. The Monist, 79(1), 51–75.
Grzegorczyk, Andrzej (1951). Undecidability of some topological theories. Fundamenta
Mathematicae, 38, 137–152.
Harel, David (1987). Algorithmics: the spirit of computing. Addison-Wesley.
Randell, David, Zhan Cui & Anthony Cohn (1992). A spatial logic based on regions and
connection. In Bernhard Nebel, Charles Rich & William Swartout (Eds.) Principles
of Knowledge Representation and Reasoning: Proceedings of the third international
conference (KR’92), 165–176. San Mateo: Morgan Kaufmann.
Requicha, Aristides & Robert Tilove (1978). Mathematical foundations of constructive
solid geometry: General topology of closed regular sets [Technical Report, TM-27a].
University of Rochester.
Worboys, Michael and Petros Bofakos (1993). A canonical model for a class of areal
spatial objects. In D. Abel and B.C. Ooi (Eds.), Advances in Spatial Databases: Third
international symposium, SSD’93 [Lecture Notes in Computer Science, 692].
Springer-Verlag.
Cognitive Mapping in Rats and Humans
The tent-maze, a place learning task in visually
disconnected environments
M.-C. Grobéty, M. Morand and F. Schenk
Introduction
The term cognitive map is generally used to describe the mental representations
that people or animals build of their spatial environment and that reflect their
spatial knowledge. Although many questions remain open regarding the formation,
the nature and the substrate of these representations, the concept of a cognitive
map is widely used by cognitive psychologists and neuroethologists, because a
subject’s cognitive mapping abilities can be approached through his or her spatial
behaviour and ability to solve spatial problems. In the human literature, a
distinction between at least three elements of spatial knowledge has been made
(Chase & Chi 1981; Siegel & White 1975): landmarks, route knowledge, and
survey or configurational knowledge. This last level is thought to include
relational information about elements that cannot be seen simultaneously, and to
allow one to localise places that are not perceptually available. Such places or
reference points can then serve in determining one’s position in the environment
(Gärling et al. 1986). Whether and how rats and humans can attain this
configurational knowledge of an environment is tested here in a place learning
task set in visually disconnected environments.
Coping with visually disconnected environments is part of our everyday
experience. Imagine viewing the arrangement of objects in a room, then walking
into an adjacent room: note how easily you can point to the first room’s features,
which are then out of view. Upon moving from one bounded region to the next,
we usually notice the spatial arrangement of the features. As more of the first
region is occluded from view, a reciprocal amount of the
All experiments were based on the following principles. Rats had to find a goal
(escape hole or baited box) whose location was fixed in space. The environment
was divided into two concentric areas. (1) A central circular or square area was
delimited by curtains, and was visually homogeneous. Four possible escape
holes, covered by lids, were distributed regularly on the arena surface. Only one
of them was connected to the animal cage. Olfactory cues in the inside arena
were controlled by rotation of the substrate of the inside arena between trials.
The correct hole was only identifiable by its fixed spatial position. The enclosed
area was accessible only through doors or barriers of thick fringe curtains. (2)
Surrounding this central arena was a circular platform giving access to the rich
and well-differentiated visuospatial cues of the experimental room. It was
assumed that free exploration of this environment from the platform around the
enclosed arena would provide sufficient orientation information to allow rats to
orient themselves in the inside enclosure. The number of trials and the special
probe tests are detailed in each of the following sections (for more details see
Schenk et al. 1995 and Schenk et al. 1997).
Several variations of the general design were tested with rats. Design
variations focused on the importance of the type of connections permitted
between the inside arena and the outside environment and on the role that they
may play in facilitating the elaboration of configurational relationships between
the two environments. The following specific questions were addressed: (1) is
place learning facilitated when only a few predetermined entrances are available?
(2) Does a fixed entrance position assist in structuring the inside environment?
(3) Will the shape of the entrance paths influence the path integration efficiency
of the studied subjects?
[Figure 1 bar charts: time spent in hole areas (%), 0–100, for the first probe trial (fringes: group 1; 4 doors: group 2) and the second probe trial (fringes: groups 1 and 2); significance marks *** as given in the caption.]
Figure 1. Mean relative time spent in each hole sector during the probe trials. The
discrimination of the training sector (trn) in the different groups is based on separate one-
way ANOVAs (n.s. = not significant; ** p<.01; *** p<.001). For the second probe trial, the
group “4 doors” was tested with a mere fringe curtain.
Access into the internal arena was either restricted through 4 orthogonally placed
doors (4-door condition), or it was not predetermined (free access through
“fringes”). In the latter case, a circular curtain extended by tight fringes (10 cm)
touching the ground circumscribed the internal arena. Rats could not see through
the fringes but could pass through and make their way in and out at any point.
Training consisted of trials during which the rats learned how to leave the central
arena through one hole out of four, allowing escape into the subject’s home cage.
After several training sessions (3 days, 4 trials per day), knowledge regarding the
goal position was assessed during a first probe trial. For the probe test, all holes
were equivalent and none were connected to the home cage. The subjects’
knowledge of the spatial position of the goal place was evaluated as a function
of the time spent searching in the four hole areas. Further training trials and a
second probe test were then run.
The discrimination of the goal was optimal in the 4-door condition, indicat-
ing that rats can learn a place on the basis of the interaction between the
collection of piecemeal information memorised from the outside arena and path
integration information. In the free access through “fringes” condition, the
discrimination of the goal was less clearly marked in the first probe test (Fig-
ure 1). However, rats in the free access through “fringes” condition merely
required more time to learn the task, and demonstrated good place learning
performance in the second probe trial (one day later). In addition, rats originally trained with
four doors were not affected when having to locate the hole in the free access
condition (second probe trial, group 2, Figure 1). The 4-door condition implies
repeated use of the same entrances and a lower variability in the paths leading to
the goal, facilitating route learning. Therefore, this design inherently simplifies
the collection and the integration of path integration and visual information. The
repeated use of only four orthogonal entrances helped anchor the inner arena into
a representation of the room environment but these predetermined entrances were
no longer necessary after this learning phase. The experimental condition
minimising the complexity of the path integration information was the least
difficult for the rats. This underlines the importance of the path integration
process in these tasks and more generally its essential role in successful place
learning. By constantly keeping track of one’s own movements through space,
path integration allows one first to integrate directions and distances between
features and to build configurational knowledge of the environment. But it is also
essential to orient and guide one’s path to a memorised goal location.
In the above two designs, rats could have implemented a simplified strategy to
locate the goal. They could have defined a preferred entrance point and memor-
ised only the relative position of the goal from this access point (route learning).
Although we did not observe such stereotyped behaviour, a testing procedure that
would exclude the use of such a strategy was utilised in a second series of
experiments. In a squared four-door arena, 3 of the doors served as frequent
access doors during training (“familiar doors”). The fourth door was only
accessible twice during training (“new door”). Two types of access doors were
used: (1) “Direct doors” were simple fringed openings in the arena wall,
allowing straight entry into the enclosed arena. (2) “Zigzag doors” allowed
access via Z-shaped alleys with a section parallel to the arena walls (Figure 2).
Surprisingly, in the “direct door” condition, probe trials with access through
any of the familiar doors indicated a weak discrimination of the training position,
whereas probe trials via the new door indicated a more significant discrimination
of the training position. In the “Zigzag door” condition, rats demonstrated a
significant spatial discrimination in probe trials when access into the homogeneous
arena took place through one of the three familiar doors. In this condition,
however, a transfer probe trial with access via the “new door” indicated a poor
discrimination of the training position. Rats indeed demonstrated a strange search
pattern focused around the new entrance door.
Therefore, rats in the easiest test situation (“direct and familiar” door) and
in the most complex one (“zigzag and unfamiliar”) performed less efficiently
than in the intermediate conditions (“direct and unfamiliar” or “zigzag and
familiar”). This may suggest that an unspecific and, of course, uncontrolled level
of arousal affects the expression of a spatial bias during probe trials. The
novelty of the access path in the probe test seemed to raise the level of arousal
in the direct-door condition, focusing activity around the correct goal position.
In the zigzag condition, a novel entrance door either made the task too difficult
or was so attractive in itself that it disrupted searching behaviour. Unfortunately,
we cannot rule out at this point that the subjects were disoriented. However, the
strange search pattern of the animals, focused around the entrance area instead
of the escape holes, suggests a strong attractiveness of the new entrance alley
rather than pure disorientation. Novelty is known to induce re-exploration (see
Thinus-Blanc et al. 1998 for a review). This re-exploration burst may be more
apparent when the subjects have clear objects or points on which to focus their
re-exploratory behaviour. More often, however, novelty increases the general
level of activity due to a shift in priority between achievement of the task and
curiosity or fear. The negative results of several experiments on cognitive
mapping may be partially due to this shifting factor (Sutherland et al. 1987;
Benhamou 1996). As we will see in the following experiments, human subjects
also reacted strongly to the novelty of an entrance.
This series of experiments, however, clearly indicates that, at least in the
direct entrance condition, rats do not need to rely upon familiar access and that
they are therefore able to identify a place in the visually homogeneous arena of
the tent-maze without relying on route strategies. According to our hypothesis,
this suggests that they were able to build and use configurational knowledge of
the visually disconnected environments.
Figure 2. Experimental set-up and time spent by the rats in the four sectors following access
to the arena via a new door.
Standardised spatial tests in cue-controlled environments are rarely
encountered in the spatial literature on human adults. Neuropsychologists
and neurologists working with patients suffering from spatial disorientation
mainly use visuospatial psychometric tests (Beaumont 1998) or tests of general
culturo-spatial knowledge. However, the demand for standardised procedures for
testing spatial orientation and place learning capacities in a real space of
movement is evident (see for example Van der Linden & Meuleman 1998).
We have already seen that the tent-maze test we developed for rats is
ecologically relevant for humans, as building complex representations of
disconnected environments is a common problem of everyday life. But would
such a laboratory procedure really be adequate for humans? Would human
subjects develop configurational knowledge in such an environment? If so, how
quickly? With or without verbal instructions? Would new entrance directions and
complex entrance paths affect human performance? Is this task suitable for
distinguishing between subjects favouring spatial and non-spatial strategies?
Our first human version of the test consisted of an enclosed octagonal arena
(3.5 m diameter) with eight possible blind entrance doors. The arena, completely
homogeneous from the inside, was situated in the main hall of a building, an
external environment rich in visuospatial information. The assigned task was to
find a goal: a “well done” sign written under a white plastic disc or card. The
goal disc was placed on the arena floor among other identical ones and could be
discriminated from the other discs only by its position in space.
Forty-nine adult subjects (of both sexes, aged 16–50), divided into four
groups, were tested according to the following procedure (see Figure 3). During
the first two training trials only four white discs (the goal and three others in
symmetrical positions) were present in the arena. Subjects were first conducted
to a blind entrance door (E1). Once there, they were instructed to enter the
arena, to explore it and to look at the objects within it, and, when finished, to
leave through any door. This was the only verbal instruction in the whole
experiment, except for the A+ and B+ groups (see below). A second training
trial was allowed, this time using entrance E2. Subsequent to this learning phase,
an initial test was run with 60 identical discs on the floor of the arena. The goal
disc was of course located at the same position as during the learning phase. The
entrance door for this test was either a familiar one (door E1, group A) or a new
one (group B). Prior to entering the arena, some of the subjects of each group
were given the instruction “the object is at the same place as before” (instruction
groups A+ and B+). Training was then prolonged by two trials using the first
two entrance doors (E1 and E2). A second test was then conducted with 120
discs on the arena floor. For all groups, a new entrance door and no verbal
instructions were used. This time, in order to assess to what extent subjects
would focus their search around the correct baited place, no “well done” card
was present (similar to the rat procedure in the probe trials, when no escape hole
was connected to the cage).
all. Through the course of the six tests, this group of subjects was not aware that
the goal was always at the same place (and was even surprised when so
informed). They behaved in a systematic manner in each trial, opening the discs
one after another, row after row. Were these systematic subjects unable to build
a cognitive map of the environment, or did they enter the test with an a priori
non-spatial search hypothesis? Unfortunately, this first series of subjects was not
tested further. The subjects with a systematic search pattern were removed from
their respective groups for the following analyses.
Figure 4. Position of the first disc overturned by the subjects during test 1 (group A, N = 10;
group B, N = 13; group A+, N = 10; group B+, N = 9). The grey square represents the position
of the goal (centre) and the 8 discs around it. The choice of a first disc located inside this
square area was considered a hit. The positions of the 3 other initial discs are also
represented.
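The hit criterion can be stated compactly. A minimal sketch (our reading of the caption), treating disc positions as grid coordinates:

```python
# The first overturned disc counts as a hit if it lies in the 3 x 3 block
# of discs centred on the goal (the grey square of Figure 4).
def is_hit(first_disc, goal):
    return (abs(first_disc[0] - goal[0]) <= 1
            and abs(first_disc[1] - goal[1]) <= 1)

assert is_hit((4, 5), (5, 5)) and not is_hit((2, 5), (5, 5))
```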
The increased number of discs present during the test (60 instead of 4) had
no influence on the search strategy or the spatial precision of the subjects of
groups A, A+ and B+. Subjects entering through a familiar door (groups A and
A+) overturned their first disc right in the goal area. In contrast, seven of the 13
subjects entering through a new door (group B) were disoriented: the first discs
they overturned were situated in the proximity of the entrance point. On the
other hand, subjects of instruction group B+, also entering via a new door, were
well oriented toward the goal spot, as shown in Figure 4.
A second series of experiments was run with human subjects. This design was
chosen to replicate the zigzag alley entrances used with the rats. A double wall
of curtains created a blind circular alley, 80 cm wide, surrounding an enclosed
hexagonal arena. The subjects, guided by the experimenter, would enter the
blind alley from one direction, then turn around the arena between the
curtained walls, and finally enter the arena from another direction. Of course,
for humans the amplitude of the deviation from the straight entrance direction was
larger than for rats. Two turn conditions were chosen: subjects could either turn
around the periphery of the arena with full access to visual cues all along the
way, or turn inside the blind alley with no view of the visually rich environment
during the turn. The amplitude of the turns varied from 120° around the peripheral
arena ("120° around") to 240° around ("240° around") or 240° in the blind alley
("240° blind alley").
Figure 5. (A) Mean number of discs opened to find the baited one. (B) Number of discs
overturned in an area around the goal in Test 2.
Forty-two subjects were tested. The general set-up was similar to the
previous one with 3 exceptions: the shape of the arena (hexagonal instead of
octagonal), the presence of the surrounding blind alley, and the number of white
discs used during the test (n = 120). After two training trials, the subjects were
divided into 3 groups according to the type of entrance used ("120° around",
"240° around" or "240° blind alley"). All subjects, whatever their turn amplitude,
entered the arena through the same new door.
As in the first experiments, one subject with a systematic search pattern was
removed from the following results.
This test was more difficult than the previous one. The percentage of
subjects overturning a first disc in the hit zone (one disc around the goal) was,
respectively, 69% ("120° around"), 46% ("240° around") and 71% ("240° blind
alley"). Fifty percent of all the subjects did not find the goal within the first 10
overturned discs. To take the performance of these subjects into account, we
calculated for all groups the mean proportion of the first ten overturned discs per
sector (4 sectors). If a subject had found the goal in fewer than ten discs, the
mean proportion of discs overturned per sector was calculated over all the discs
he or she had opened. Subjects of the "120° around" and "240° blind alley" entrance
groups significantly focused their search behaviour on the correct sector (Figure 6).
Subjects from the third group ("240° around") did not significantly identify the
correct sector and showed the lowest success in goal retrieval. Theoretically, the
240° path around the arena, allowing full access to the visually rich environment
during the rotation, should have been less disorienting than the 240° blind one.
However, the results showed the reverse tendency.
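To make the measure just described concrete, the following sketch is illustrative only (not the authors' code; the function name and data are invented): it computes the per-sector proportions for one subject, truncating at ten discs or at the goal, whichever comes first.

import numpy as np

def sector_proportions(disc_sectors, goal_index, n_sectors=4):
    # disc_sectors: sector (0 = goal sector, 1..3 = other sectors) of
    # each disc the subject overturned, in order of overturning.
    # goal_index: index at which the goal was found, or None.
    if goal_index is not None and goal_index < 10:
        # Goal found within ten discs: use all discs opened so far.
        sample = disc_sectors[:goal_index + 1]
    else:
        # Otherwise use only the first ten overturned discs.
        sample = disc_sectors[:10]
    counts = np.bincount(sample, minlength=n_sectors)
    return counts / len(sample)

# A subject who found the goal on the fourth disc, searching mostly
# in the goal sector:
print(sector_proportions(np.array([0, 0, 1, 0]), goal_index=3))
# -> [0.75 0.25 0.   0.  ]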
[Figure 6 plot: mean proportion of overturned discs (0 to 1) in each of the four sectors (sector -1, goal sector, sector +1, opposite sector) for the entrance conditions 120° around, 240° around, 240° blind alley and 240° around + instructions; focus on the goal sector significant (***) in all conditions except "240° around" (ns).]
Figure 6. Mean proportion of white discs overturned per subject in each sector for the first
ten discs. This proportion reflects how much subjects focused on the goal sector during their
search.
A fourth group of subjects was run using the same procedure as the "240° around"
group, with the addition of the verbal instruction "the goal is at the same place as before"
("240° around + instructions"). Here again, as in the first series of
experiments, this simple instruction, given prior to the start of the test, was
sufficient to improve the subjects' results, and 70% of them hit the goal
zone with their first choice. These subjects were not disoriented by the 240°
turn and focused their search on the correct sector.
As in the rats' experiments, arousal level appeared to interfere
with the production of the correct spatial behaviour. When the task condition was
simple, the search was spatially oriented. When the difficulty of the task
increased slightly, but not enough to raise the attention level, path integration
information was not fully taken into account and the search reverted to
random. However, if the task became clearly difficult, attention was
increased and spatial performance was good again. One might assume
that a further increase in task difficulty would eventually cause spatial
performance to deteriorate. The 240° turn in the blind alley is long
enough to challenge the subjects but not enough to disorient them. Subjects
reported feeling progressively stressed and disoriented when moving in the blind
alley and therefore tried to be more attentive to their movements. This factor was
sufficient to improve their performance. It is however important to note that
this "240° blind alley" group contained both the best performances (the highest
number of goals found with the first opened disc) and the worst (the highest
number of goals not found in a 90-sec. search). This may reflect the variability
of individual spatial competencies and strategies, each subject being challenged
by the difficulty of the task.
Was the performance of the subjects in the tent-maze related to their spatial
capabilities in everyday life? To approach this question, we regrouped the results
of 51 subjects of the last experiment (across different testing procedures) and
compared their performance in the maze with a self-evaluation of their spatial
capabilities. The subjects' spatial capabilities in everyday life were self-evaluated
in a questionnaire filled out before the tent-maze test. The questions are
summarised in Figure 7; subjects could answer on a 4-point scale. Performance
in the tent-maze was expressed as the distance of the first
overturned disc from the position of the goal and regrouped into three classes
of spatial precision (1 = 0 to 1 disc away from the goal, 2 = 2
discs away, 3 = 3 or more discs away). A repeated-factors ANOVA run on
this measure of performance and the scores of each question per subject showed
a significant difference between groups (F(2,49) = 3.625; p = 0.035). The most
efficient subjects in the tent-maze evaluated their own orientation capabilities as
higher than did the least spatially efficient group (first overturned
disc 3 or more discs away).
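Purely as a hypothetical illustration of the comparison involved (the reported analysis is an ANOVA with repeated factors; here a plain one-way ANOVA is substituted for simplicity, and the scores are invented), such a test could be run as follows.

from scipy import stats

# Invented self-evaluation scores (4-point scale) for the three
# spatial-precision classes defined above.
class1 = [4, 3, 4, 3, 4]   # first disc 0-1 away from the goal
class2 = [3, 3, 2, 3]      # first disc 2 discs away
class3 = [2, 2, 3, 1, 2]   # first disc 3 or more discs away

# One-way ANOVA across the three classes.
f, p = stats.f_oneway(class1, class2, class3)
print(f"F = {f:.3f}, p = {p:.3f}")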
Conclusion
The experiments presented here shed some light on the controversy about the
existence of cognitive maps in animals (Bennett 1996; Poucet & Benhamou
1997). The place learning tests in the tent-maze meet the operationally
defined conditions for demonstrating a configurational representation
of the environment (no specific snapshot memory, capacity for detours and
new routes). Rats, like human subjects, demonstrated their capability to orient in
a homogeneous arena through interconnections with a surrounding varied environment.
More specifically, they were able to identify a specific location on the
basis of a combination of path integration information and piecemeal visual
information obtained from the surrounding environment. The use of a new
entrance door, not yet experienced and hypothetically not yet coded in the
representation of the environment, disrupted the rats' accuracy only in the most
complex designs tested, but also affected human performance in some
conditions. Rats, as well as humans, were however able to develop new routes
to the goal when tested in appropriate conditions (the "Direct doors"
new entrance for rats and the new entrance + instruction for humans, respectively).
In the first experiment, when human subjects were tested without verbal instructions, a new
entrance direction seemed to reset the searching hypothesis, leading to what is
or drawing ability also have the advantage of being adequate for humans at any
age as well as for animals. They allow comparative approaches, which can only
be beneficial to the understanding of brain mechanisms underlying spatial
cognition of species which, however different they are, live in the same environ-
ments and solve similar spatial problems.
References
Spatial Cognition Without Spatial Concepts
Arnold Smith
National Research Council, Canada
1. Introduction
The intent of this chapter is to suggest that all of the ways of thinking about
human spatial cognition that loosely adhere to an information-processing
paradigm, whether these are quantitative or qualitative, are based on a set of
assumptions about the nature of language and mind that is incomplete and can be
no more than partially correct. When we examine these assumptions carefully,
we find that they are quite inadequate to explain what is going on. This lack of
adequacy, in turn, is related to the very slow progress (one might even say
‘failure’) of computational language understanding and artificial intelligence more
generally. Spatial cognition is far from being the only domain in which one can
begin to see this, but it is perhaps as good a vantage point as any from which to
develop a sense for the problems.
The core of the argument is that language and conceptual thinking, which
we of course use all the time, constitute a faculty that in humans is superimposed
on a quite different mode of knowing and awareness, and that it is the latter, far
more than the former, that we rely on in order to act in and reason about our
physical environment. I will argue that language and its associated concepts are
much too abstract to mediate our direct interaction with the world, and that
language would not work at all, even for communication, without the rich non-
conceptual substrate on which it depends. The implication is that we cannot
directly model the world and its relationships, nor can we mimic human cogni-
tion and its achievements, using concepts rooted in language as a basis for model
construction. What should we be doing instead? Well, that is a long story … a
story we are still learning to tell.
This chapter represents work in progress. When the author began this
2. A representational perspective
the house”. In addition, the continual presence of a viewpoint onto the scene
meant that there was automatically a context that could be used to aid the
interpretation of deixis and other forms of reference. And there were further
benefits.
Nevertheless there were problems that were not so easily solved. Confining
our attention to the interpretation of spatial referring expressions, we can see at
least the following hard problems:
“Put the mailbox (car) in front of the house.”
The interpretation of this kind of sentence requires the system to find a suitable
place to position the mailbox or car. But cars typically go on roads or driveways
— mailboxes typically don’t. Almost all objects should be placed on a support-
ing surface, and objects should never be placed in such a way that they inter-
penetrate each other. A system that pays no attention to such issues looks
ridiculous, and arguably does not successfully understand English. So we see that
the interpretation of ‘in front of’ seems to depend on complex social conventions
about object type, as well as on many issues of physical simulation. It is possible
of course to wriggle out of this kind of problem by saying that such consider-
ations are not part of the meaning of the phrase, but instead are part of the
felicity conditions for interpretation of the utterance. From a system design
perspective, however, this is not a way out. Unless we can address such prob-
lems, we can’t design a usable system.
Similar considerations lead us to realize that collision detection and path
planning are relevant to spatial expressions involving action verbs, and that many
issues of friction and complex object geometry, for example, are relevant to
prepositions dealing with placement.
Or take an expression such as “Go and stand on the sidewalk”, and consider
the problem of the system choosing a suitable reference location. In this sen-
tence, we appear to have a generic reference to the category sidewalk, rather than
a reference to a unique or salient sidewalk object (it isn’t even clear if ‘sidewalk’
is a count noun or a mass noun in this kind of usage). Now in an urban model
there may be many stretches of sidewalk. Yet in a particular context of utterance
such as this, it is not the case that any point on any sidewalk in the city is as
good as any other. A new special kind of felicity condition applies here. The
reader is invited to propose a general rule that governs this case, and to think
about how many other similar rules might be needed for a realistic application.
It is issues such as these that explain why we have been making so little
progress in artificial intelligence generally. We like to think that we are making
incremental progress, and are gradually understanding how to solve the myriad
problems that arise. But is it not possible that we are deeply on the wrong track?
Are we perhaps missing something crucial?
Let us return to spatial reasoning in particular. Think about our real interaction
with physical environments. Imagine what it feels like to find your way through
an unfamiliar forest, to run across a rock-strewn landscape, to pilot a white-water
canoe down a swiftly-running river, to figure out a way to climb up a steep
mountainside. These are the essential tasks of navigation through the environ-
ment, the essence of spatial reasoning and cognition. Most of the mental work is
inarticulate, best carried out in a state of relaxed but intense concentration —
with a mind that is mostly quiet, where logical analysis, if carried out at all, is
secondary to intuitive appraisal and the experience of the body. If you’re trying
to step across a rushing stream on a series of rocks that look as if they span it,
your decision about whether you can safely leap to that next boulder depends on
many things — the length of leap you’re capable of, the difference in height of
the surfaces of the rock you’re leaping from and the one you’ll land on, the
texture and wetness of the surfaces, the kind of shoes you’re wearing, your
fitness, your fatigue, the consequences of a slip… But all of this is assessed in
a gut-feel decision that takes perhaps a second or two, and pretty clearly does not
involve analytical modelling.
Or consider the white-water canoeist. He or she is in an environment in
which there are very few objects and only a few important boundaries. Alertness
is high, and visual assessment is constant and skilled, but navigation is dependent
on the “feel” of the water. This is skilled performance. This is the way we used
to interact with the world before we acquired language, before we began to build
abstractions to analyse our experience.
For a third example consider a pair of dancers, engaged in an interpretive
modern dance. In some sense everything in the dance is about spatial relation-
ships — relationships between the dancers, the positions and movements of their
limbs and bodies. In this case we can take the role of an observer, perhaps a
dance critic. He or she could find words to describe what the dancers are doing,
but the description would fall far short of the dance itself, no matter how
eloquent or how detailed it might be. And in the description, metaphors would
abound — a literal description would be particularly pointless.
Now think about spatial concepts and terms: adjacency, superposition,
connectedness, “on”, “next to”, “behind”, “above”, “in”, “at”. And think of some
4. Evolution of language
and delineations that language gives us are rich enough by now that we can
always stitch together a description of what we are experiencing. But we
shouldn’t be fooled into thinking that when we do this, we have captured the
essence of it. It isn’t even easy to say how comprehensive our conceptual models
of the world are — because consciousness appears to be normally focussed on
our conceptual thinking, it is hard to be aware of what is not conceptually
mapped.
From an evolutionary point of view, language is recent, and we can come
to see it as still light-weight and fragile. When we abstract from our interaction
with the world, and build structures and representations on a basis of language
and its concepts, we are already pulling back from our direct experience.
Conceptual abstraction is powerful — all the more so now that we have comput-
ers to help us build and maintain more elaborate structures than we can keep in
our heads — but it is dangerous precisely because of its disconnection from its
subject-matter. To a far greater degree than we usually realize, we let our
conceptual models of reality, which are fashioned as much by our culture and our
media as by our own visceral experience, determine what and how we perceive.
There are several points to note here. One is that the mapping of linguistic
concepts to the richness of the world is extremely sparse, far too sparse to form
a basis for action or systematic reconstruction. Another is that despite the
apparent richness of detail, we have no difficulty acting, or even imagining
ourselves acting, in such situations. On the contrary, there is often an exhilarating
sense of absorption in the fluid handling of this kind of complexity. It is pretty
clear that we have a way of apprehending the world, and even “reasoning” about it,
that is very different from language. Not only do we have such a way, it is
presumably the basis for much of our behaviour (cf. Polanyi 1962).
Of course, when we talk about what we do, when we want to describe what
is happening, we superimpose rational categories, categories of classification,
dimension, shape, quantity, quality, etc. When we tell a story, or describe a
scene, we have in mind (in some sense) the kind of visually-rich imagined world
that serves us well in action, and we use the short brush-strokes of language to
lightly sketch what we “see”. Our hearers draw on their own experience to
imaginatively reconstruct their own rich images of what may have been. It is a
wonderful process.
But to imagine that the world is really captured in those brief brush-strokes,
and that we can use them as foundations for valid models, is to dangerously
misinterpret what is going on. Because we talk to each other and to ourselves so
much, we tend to take the categories as primary, and to project them back out
into the world as if they were elemental and ‘given’. We assume that in our
analysis of what the world is like, we can start from clean, clear, discrete
elements, and can elaborate towards the complexity of the world from that. It is
a compositional view, a view that lends itself well to structuring and manipula-
tion, to symbolic reasoning, and to computation. We are so used to thinking in
this kind of way that it is hard to imagine any other.
Unfortunately, or interestingly anyway, the world doesn’t seem to be like
that. Even the independent existence of objects, which we take absolutely for
granted, is far more problematic than it appears (Smith 1996). One aspect of the
problem is that the world is more fluid, messier, more continuous, than our
conceptual models of it. And the fluidity and the messiness are not simply at the
periphery — they involve breakdown, unexpected change, and underlying
continuity across sharp consequential boundaries.
At some level, it is not even obvious that a scenario such as fording a river
on rocks is complex. Any three-year-old child can easily understand what is
happening, and many six- and seven-year-olds can do it with skill and grace. Do
we bring extrinsic complexity with us into the situation because of our unexam-
ined habits of scientific thought?
Another pervasive source of trouble for our models is the apparently
inescapable context-dependence of every situation, every interpretation. The
“Y2K” problem is one of the current big examples — all those programmers,
comfortably ensconced in the 20th century, failed to notice the contextual
dependence of their assumptions. All symbolic modelling involves abstraction,
and all abstraction involves ignoring details and particulars, in an attempt to say
something whose value will transcend its original context. But something about
the fabric of the world never quite lets us get away with this attempt to tear out
a patch of cloth from the whole. The ideal of context-independence is always
something of an illusion.
6. What do we do?
In fact, as we have noted above, we as humans have wonderful skills for dealing
with all of this at the level of the intuitive, or of direct apprehension. But as
individuals and as a society we want the world to be under our control, and so
we are constantly trying to neaten things up so that our analytical categories do
7. Conclusion
References
Berman, Morris (1981) The Re-enchantment of the World. Ithaca, New York: Cornell
University Press.
Csikszentmihalyi, Mihaly (1991) Flow: The Psychology of Optimal Experience. New York:
HarperCollins.
Hernández, D. (1994) Qualitative Representation of Spatial Knowledge. Berlin: Springer-
Verlag.
Space Under Stress
Chris Speed and Deborah García-Tobin

Introduction
The aim of the work reported here is to promote the exploration of spatial
understanding and identity in a time when our abilities for spatial perception are
under constant duress from the influx of new media technologies. These technol-
ogies demand a perceptual shift between different ideas and representations of
spaces — local, global, networked, real, and virtual. Given the complexity and
subtleties of these shifts, it is unsurprising that the Cartesian model of space
proves unable to provide a satisfactory framework for understanding these new
media spaces. In order to illustrate the nature of these shifts and ‘slips’, present-
ed below are three examples of our perception of time and space under the stress
of new media as it breaks conventional forms and structures. These shifts are
then discussed in relation to previous attempts by late twentieth century theorists
to reconcile mental space and real space (Bourdieu 1977; Lefebvre 1991). It is
then argued that the problems created by new media technologies for these
spatial models require a new approach, consolidating the break away from the
Cartesians, reuniting the spatial and temporal aspects of information and repre-
sentation, whilst providing a flexibility capable of embracing non-linearity.
Television channel, our expectations for travel, movement and speed, demanding
we orientate ourselves in anticipation of the next move. It is not so much the
new technologies of immersive virtual reality environments, or the stunning
sensory effects now routinely found in films or theme parks, that concern us here.
Rather, the focus is on the seemingly insignificant events which infiltrate our daily life.
Example one
Virgin Radio’s traffic information during its breakfast show is unusual because
of the way in which it asks us to comprehend the traffic problems across the
country. In a dynamic, noisy minute and a half, a loud male voice asks us to join
him in contacting his colleagues who, as he speaks, are hovering over ‘Britain’s
worst hit traffic spots’. One by one, from helicopter or some other high vantage
point, voices describe how slowly cars appear to be moving on roads below: ‘I’m
circling over junction 8 of the M6, and as I look under my wing, I can see a
long tailback stretching back to junction 7, all queuing to get past a jack-
knifed lorry that has shed its load’. This continues whilst another three or four
flying traffic specialists recount the sights from their cockpits.
The problems of this particular case may be unclear from the point of view
of the reader, but from the point of view of a driver listening to the report,
some consideration is required to establish whether their particular route will be
hampered that morning. When we make a journey we concentrate on the
combination of roads that make up our route from the perspective of being
behind the steering wheel, and any news regarding road status is expected to
come from an announcer located somewhere in the nowhere land behind
the car stereo; a geographical model bound by the conventions of a ‘face on’
environment and point of view. Thus, when a voice from above proclaims
that a road ‘below’ is experiencing problems, it introduces an additional interface,
one that involves a different model of geography, the bird’s-eye view, causing
some initial confusion to the driver. This clash of geographic models is only made
possible by the use of technologies and the aspirations of a radio station
attempting to make a dull aspect of its service more spectacular, more filmic.
It is doubtful that this situation produces more than momentary confusion for the
driver before they are able to make sense of the information. However, the ‘slip’
caused by the collision of two models for geographic space and the absorption
of one model into the other appears to be an interesting and complex phenome-
non worthy of further consideration.
Example two
BBC Television Breakfast News often attempts to stage debates around
current affairs in the twenty-minute slots between hourly and half-hourly news
bulletins. This is ambitious not only because it is early in the morning for most
viewers to take such discussions seriously, but because they often feature up to
four participants. Traditionally, the group is made up of three people in ‘the studio’ and one
other contributor who is located in some ‘far off’ place, joining the debate via
telematics and appearing framed on a screen behind the breakfast presenter. Once the
debate begins, the programme controllers are able to cut between cameras to
construct a coherent ‘in-studio’ discussion chaired by the presenter, using a well-
established cinema technique, the 180° rule. Confusion sets in when the virtual
participant is introduced, switching from studio camera to the live feed coming
from another location. Announcing the introduction of the new member to the
group gives the viewers just enough time to observe the mystification on the faces of the
other participants as to where to look for their adversary. Since the new member
of the debate is superimposed onto the stage via ‘blue screen’ technology, the
180° camera rule is rendered useless: he cannot be seen to exist in the
studio. Hence, the only space to which the panel can look is into the camera, or,
from the point of view of the viewer, into the screen where our virtual guest will
appear. Unfortunately, such a complex conceptual conclusion is not always reached
in time by all members of the group, some of whom recognise the new guest on
an off-set monitor and can be seen to look off screen, left and down, to where
the virtual participant can be seen.
The conundrum that has to be thought out carefully by each
participant, if a consistent spatial geography is to be communicated to the viewers,
breaks so many spatial and geographical rules that it is inevitable that people are
quickly confused and disorientated.
Further to the complex mixing of spatial models, the broadcast team appear
to make things worse by attempting to locate some people and not others. The
backdrop behind the virtual guest is clearly superimposed and goes some way to
locating him: a digitally blurred and colour-saturated still of the former BBC
Television headquarters (to those who can make it out), with the words ‘Central
London’ across the top of the screen supposedly clarifying the point. The debate
continues and our man from Central London performs well; in fact, his
lengthy answers provide an opportunity to introduce another framing device
enabling us to see the presenter and his virtual guest at the same time. Composed
within a saturated blue background hover the two moving images of presenter
and guest, clearly in different frames and labelled accordingly: ‘Studio’ and
‘Central London’.
If the presenter is in the ‘Studio’ and our virtual guest in ‘Central London’,
where is the frame located? Where is home to ourselves, if home is where we
locate our own point of view on the proceedings? Typically, we place ourselves
in the studio audience, but then we are alienated from that warm, well-lit space
and located somewhere with over-saturated blue walls and Helvetica type
labelling the windows through which we see anything. The naming of the ‘Studio’
and ‘Central London’ seems to suggest a breakdown in spatial conventions. By
introducing Central London as a means of locating the virtual guest, the production
team are forced to locate the studio; but without a geographical context for
it, it must remain the ‘Studio’, an ambiguous term, but one that the audience
have taken on board with the introduction of such new media.
Example three
In contrast to the above case, the televisual space offered by the BBC’s Match
of the Day soccer programme represents the synthesis of new models for geography
in a new media age. For years television audiences have absorbed the idea
of what television studios look like and where they are, because studios have grown
out of the theatre, with a stage, backdrop, audience, and left and right wings.
They were even given addresses: Wood Lane, Pebble Mill, Camden Lock, etc.
New media technology over the past few years has introduced a different kind
of television studio. The BBC Nine o’clock News is a well-known example of a
hyper-studio. Its introductory sequence, although computer-generated imagery,
looks like a traditional TV studio, with a backdrop and lighting rig overhead.
Although simulated graphics are employed, they still adhere to the spatial
convention that is a ‘Television Studio’.
Since then we have seen many virtual sets employed, and the distinct
nature of computer-graphic imagery enables an audience to recognise when a set
is not a set but rather an example of the blue-screen technique. Whilst much of the
imagery aspires to remove the presenters from the ‘TV studio’, as an audience
we are still able to deconstruct a scene and establish that, somewhere behind all
the special effects, there is a TV studio in the traditional sense.
In contrast to this illusion, BBC’s Match of the Day adopts a spatial
metaphor that aspires not to remove the set from the TV studio, but to assertively
relocate it in a different space, a strategy that once again confuses another
model of geography. The superimposed imagery behind the presenters is simply
a window, a window overlooking a football pitch late at night. As the camera
pans left to right to complete an idea of the space, the viewers see a West stand
to the left with its floodlight, a central stand with its pediment implying the
halfway line of the pitch directly below, and an East stand, with its floodlight,
to complete the composition. For an audience used to live games
from stadia around the world, the composition demands that we locate the presentation
team in a press booth above such a stadium, and no longer in a TV studio
at all. Whilst representing a successful synthesis of subject and metaphor, it
suggests that once again new media is eroding the conventions of space that we
use to locate ourselves in our relationship with television.
Over the past twenty years the social and psychological sciences have provided
the most thorough work in attempting to model and make sense of our dealings
with space. In particular, significant contributions have been made by Bourdieu
(1977), Lefebvre (1991) and Foucault (1980). Coming from quite different
directions, they represent the consolidation of Social Spatialisation as a field of
study addressing the relationship between objects and their environments.
Prior to their work, post-Enlightenment aspirations described ‘mental images’ of
our environment that we carried around to enable our successful navigation
through geographies. However, whilst revealing many aspects of human spatial
interaction, much of this work struggles to account for the effects of new media
technologies, as highlighted in the examples given above. Bourdieu takes
a structural approach to establishing key methods for making sense of space
and time, constructing his habitus, a system using different class patterns and
activities to represent a vocabulary for our actions in time and space. It is
described thus:
Experience is a system and consequently the objective world is constructed
through the imposition of cultural categories on reality. Perception thus takes
place through a mediating value-framework which differentiates the facticity
of the environment in which one lives (Shields 1991: 32).
In this way habitus becomes his proposal for the system that motivates
humans to interact with a world and, in turn, allows them to make sense of it.
It treats the mind as a metaphor of a world made up of objects,
which in turn consist of metaphors embodied in tasks: systems within systems that
can be identified as resulting from class-oriented childhood experiences. However,
whilst Bourdieu successfully deconstructs a Kabyle village and house using
his metaphors and ‘time-geography’ (Hagerstrand 1973, 1974, 1975), his work has been
criticised for over-structuring. Before long the reader is left wondering
which structure came first, the Kabyle village or Bourdieu’s habitus. Although
Bourdieu embraces the complexity of four-dimensional space, his preoccupation
with aspects of task makes it very difficult to take account of the
impact new technology has upon lifestyles as they adapt to take on board new
representations of space. Leisure space, such as watching television, can be
happily absorbed into Bourdieu’s work, but what happens to our models of the
world as we watch telematic news events is disregarded. In addition, he provides
no proposals for the stresses his ‘habitus’ comes under as our lifestyles change as
a result of these technologies.
Foucault represents a move away from Bourdieu’s causal proposals, in which
behaviour is the result of a social situation, and instead looks to identify where the
motivation comes from. Not discounting Bourdieu’s social effects, Foucault tries to identify
how they become embroiled with other aspirations that inform how we discover
space. Foucault set out to readdress a conceptual model for space, not
dissimilar to the aspirations of the Enlightenment, but instead of proposing a
mental map that mirrors the world, he explores the idea of a spatial impression
of reality: a framework for space that embraces aspects of a Cartesian means of
sorting space, whilst placing them in a social and psychological context.
Territory is no doubt a geographical notion, but it’s first of all a juridico-
political one: the area controlled by a certain kind of power. Field is an
economico-juridical notion. Displacement: what displaces itself is an army, a
squadron, a population. Domain is a juridico-political notion. Soil is a histor-
ico-geological notion. Region is a fiscal, administrative, military notion.
Horizon is a pictorial, but also a strategic notion. (Foucault 1980: 68).
Through his dispositif Foucault outlines the complexity on which our position or state
is founded by identifying three parts: (i) from formative acts that respond
to a new situation (an ‘emergency’) comes a strategy for action; (ii) a
‘jurisdiction’ is developed as the action becomes stable and ordered; and (iii) the
actions become heterogeneous as we struggle, and succeed, to fit new events into
the initial strategy. In this way, Foucault finds that we interpret space through
strategies that form order and, in turn, procedures.
There is an administration of knowledge, a politics of knowledge, relations of
power which pass via knowledge and which, if one tries to transcribe them,
lead one to consider forms of domination designated by such notions as field,
region and territory. (Foucault 1980: 69).
survive and maintain order through actions, much of the rhetoric of the dispositif
relies upon strategy, and upon Cartesian models of order, to the extent that one
senses the work is almost military in its conception.
The work is enjoyable because it recognises the procedural manner in which
we navigate through space, but its sense of rigour fails to explore the dilemmas
we face as we become caught in the spatial dichotomies that new media generate.
Certainly, whilst aspiring to escape Cartesian time and space, Foucault’s work is
in the end effective because it utilises a linear and objective relationship
with time and space, and thus becomes problematic when it takes on board the
non-linear and hyper-directional nature of new media.
Lefebvre attempted to avoid the problem of adopting systems loaded with
time and space properties (Foucault) or social positions (Bourdieu) by identifying
a problem in the Cartesian semantics that describe space. He suggested that, in
order to fulfil the Enlightenment panacea, space was split into an
abstraction of space, identifiable through experience, and a phenomenon of space,
which is how it manifests itself in systems and form. In particular, the development
and application of perspective within paintings represented for him the conquest
and embodiment of space, one that inevitably empowered the bourgeois
component of society.
What interests Lefebvre is less the techniques than the politics of modernism,
which he treats as a politics of space by virtue of its involvement in the
production of a spatial imaginary whose assumptions and entailments extended
far beyond the canvas. (Gregory 1994: 393)
Conclusions
It has been shown that Bourdieu’s task-based assumptions become inadequate
because he fails to take on board the multi-tasking the human must perform
in order to fit together opposing spatial models and develop a new
task. Foucault’s work is more hopeful, since it addresses our struggle to make
sense of emergencies. Unfortunately, his adoption of the Cartesian framework of
space and time makes it hard to see how his three-part sequence of events
describing that struggle can operate so deftly in a non-linear context. Because
Lefebvre opens his writings up with his third concept of ‘spaces of
representation’, we can more readily see how the condensation of a digital culture
can be absorbed and reconciled within our spatial concepts. Indeed, his foresight
in dissolving the Cartesian relationship between abstract space and physical space
helps his model accommodate non-linear events. However, his ‘spatial practices’, used to
term and categorise spaces, come under stress because they require a consistency
derived from social activity. But if social activity is being transformed and
indeed simulated (as is the case with the Match of the Day set) through new
References
CHAMELEON Meets Spatial Cognition
Paul Mc Kevitt
Queen’s University, Belfast
1. Introduction
Figure 2. 2D plan of the ‘A’ building at Fredrik Bajers Vej 7, Aalborg University. Left:
ground floor; Right: 1st floor.
Presently, there is one static camera which calibrates the plans on the table
and the laser, and interprets the user’s pointing while the system points to
locations and draws routes with the laser. Inputs are simultaneous speech and/or
pointing gestures, and outputs are synchronised speech synthesis and pointing. We
currently run all of CHAMELEON on a 200 MHz Intel Pentium computer (r2d2),
which handles input for the Campus Information System in real time. More
details on CHAMELEON and the IntelliMedia WorkBench can be found in
Brøndsted et al. (1998).
We present here a sample dialogue which the current first prototype can process.
The example includes user intentions which are instructions and queries, and
exophoric/deictic reference.
USER: Show me Tom’s office.
CHAMELEON: [points]
This is Tom’s office.
USER: Point to Thomas’ office.
CHAMELEON: [points] This is Thomas’ office.
USER: Where is the computer room?
CHAMELEON: [points] The computer room is here.
USER: [points to instrument repair] Whose office is this?
CHAMELEON: [points] This is not an office, this is instrument repair.
USER: [points] Whose office is this?
CHAMELEON: [points] This is Paul’s office.
USER: Show me the route from Lars Bo Larsen’s office to Hanne
Gade’s office.
CHAMELEON: [draws route] This is the route from Lars Bo’s office to
Hanne’s office.
USER: Show me the route from Paul Mc Kevitt’s office to instrument
repair.
CHAMELEON: [draws route] This is the route from Paul’s office to instrument
repair.
USER: Show me Paul’s office.
CHAMELEON: [points] This is Paul’s office.
Note that CHAMELEON assumes Paul Dalsgaard as the default Paul (see Note 1), although
there are two Pauls. A later prototype of the system should become active here and ask
the user a question, first pointing out that there are two Pauls and then asking
which one he/she means. CHAMELEON can process deictic reference (“Whose
office is this?”), which is one of the most frequently occurring phenomena in
IntelliMedia. However, spatial relations (e.g. “Who’s in the office beside him?”)
are another phenomenon occurring regularly in language and vision integration
which is not yet implemented in CHAMELEON. There are also other projective
spatial relations such as “left”, “right”, “above”, “below”, and queries like
“Who’s in the office two up from him?”, which occur regularly.
each other or the blackboard. Semantic representations are frames in the spirit of
Minsky (1975). The intention is that all modules in the system will produce and
read frames. The frame semantics was first presented in Mc Kevitt and
Dalsgaard (1997); for the sample dialogue given in Section 2.2,
CHAMELEON’s actual blackboard history in terms of frames (messages) is
shown in Appendix A.
The dialogue manager makes decisions about which actions to take and
accordingly sends commands to the output modules (laser and speech synthesiser)
via the blackboard. At present the functionality of the dialogue manager is
to integrate and react to information coming in from the speech/NLP and gesture
modules and to send synchronised commands to the laser system and the
speech synthesiser modules.
The domain model contains a database of all locations and their functional-
ity, tenants and coordinates. The model is organised in a hierarchical structure:
areas, buildings and rooms. Rooms are described by an identifier for the room
(room number) and the type of the room (office, corridor, toilet, etc.). The
model includes functions that return information about a room or a person.
Possible inputs are coordinates or room number for rooms and name for persons,
but in principle any attribute can be used as key and any other attribute can be
returned. Furthermore, a path planner is provided, calculating the shortest route
between two locations.
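The chapter gives no code for the domain model, but its behaviour is easy to sketch. The miniature below is hypothetical (room numbers, tenants, coordinates and the adjacency graph are invented, and breadth-first search merely stands in for whatever planner is actually used): any attribute serves as a lookup key, and the planner returns a shortest route.

from collections import deque

# Invented miniature of the domain model: rooms keyed by room number,
# with type, tenant and (x, y) coordinates, plus an adjacency list
# used by the path planner.
ROOMS = {
    "A2-201": {"type": "office", "tenant": "Hanne", "coords": (5, 2)},
    "A2-202": {"type": "office", "tenant": "Jorgen", "coords": (4, 2)},
    "A2-105": {"type": "corridor", "tenant": None, "coords": (4, 1)},
}
ADJACENT = {"A2-201": ["A2-105"], "A2-202": ["A2-105"],
            "A2-105": ["A2-201", "A2-202"]}

def lookup(**query):
    # Any attribute can serve as key: lookup(tenant="Hanne"),
    # lookup(coords=(4, 2)), etc., as the text describes.
    return [dict(r, room=n) for n, r in ROOMS.items()
            if all(r.get(k) == v for k, v in query.items())]

def shortest_route(src, dst):
    # Breadth-first search over the adjacency graph; with unit step
    # costs this yields a shortest route between two locations.
    frontier, seen = deque([[src]]), {src}
    while frontier:
        path = frontier.popleft()
        if path[-1] == dst:
            return path
        for nxt in ADJACENT.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])
    return None

For instance, lookup(tenant="Hanne") returns Hanne’s office record, and shortest_route("A2-201", "A2-202") returns the route through the corridor.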
A design principle of imposing as few physical constraints as possible on
the user (e.g. no data gloves or touch screens) leads to the inclusion of a vision-
based gesture recogniser. Currently, it tracks a pointer via a camera mounted in
the ceiling. Using one camera, the gesture recogniser is able to track 2D pointing
gestures in real time. Only two gestures are recognised at present: pointing and
not-pointing. From each digitised image the background is subtracted, leaving
only the motion (and some noise) within the image. This motion is analysed in
order to find the direction of the pointing device and its tip. By temporally
segmenting these two parameters, a clear indication of the position the user is
pointing to at a given time is found. The error of the tracker is less than one
pixel (through an interpolation process) for the pointer.
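A minimal sketch of one such tracking step follows, under stated assumptions: greyscale frames as numpy arrays, the principal axis of the motion blob taken as the pointer’s direction, and its extremal point as the tip. All names and thresholds are invented, since the chapter specifies only background subtraction.

import numpy as np

def locate_pointer(frame, background, thresh=30, min_pixels=50):
    # Background subtraction: keep pixels that differ clearly from
    # the static reference image.
    motion = np.abs(frame.astype(int) - background.astype(int)) > thresh
    ys, xs = np.nonzero(motion)
    if len(xs) < min_pixels:      # too little motion: noise only
        return None
    pts = np.column_stack([xs, ys]).astype(float)
    centre = pts.mean(axis=0)
    # The principal axis of the motion blob approximates the
    # pointer's orientation (the sign of the axis is ambiguous).
    _, _, vt = np.linalg.svd(pts - centre, full_matrices=False)
    direction = vt[0]
    # The motion pixel furthest along that axis approximates the tip.
    proj = (pts - centre) @ direction
    tip = pts[np.argmax(proj)]
    return tip, direction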
A laser system acts as a “system pointer”. It can be used for pointing to
positions, drawing lines and displaying text. The laser beam is controlled in real
time (30 kHz). It can scan frames containing up to 600 points with a refresh rate
of 50 Hz, thus drawing very steady images on surfaces. It is controlled by a
standard Pentium PC host computer. The pointer tracker and the laser pointer
have been carefully calibrated so that they can work together. An automatic
calibration procedure involving both the camera and the laser has been set up, in
which they are tested by asking the laser to follow the pointer.
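The mathematics of the calibration is not given in the chapter; one standard way to realise it, sketched here purely as an assumption, is to fit a homography mapping camera pixels to laser coordinates from point correspondences (the laser draws known points, the camera detects them).

import numpy as np

def fit_homography(cam_pts, laser_pts):
    # Direct Linear Transform: each correspondence (x, y) -> (u, v)
    # contributes two linear constraints on the nine entries of H.
    # At least four correspondences are needed.
    rows = []
    for (x, y), (u, v) in zip(cam_pts, laser_pts):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # H is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)

def cam_to_laser(H, x, y):
    # Apply the homography to a single camera point.
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w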
3. Frame semantics

An input frame has the general form

[MODULE
[INPUT: input
[INTENTION: intention-type
[TIME: timestamp]

where MODULE is the name of the input module producing the frame, INPUT
can be at least UTTERANCE or GESTURE, input is the utterance or gesture, and
intention-type includes different types of utterances and gestures. An utterance
input frame can at least have intention-type (1) query?, (2) instruction! and (3)
declarative. An example of an utterance input frame is:
[SPEECH-RECOGNISER
[UTTERANCE: (Point to Hanne’s office)
[INTENTION: instruction!
[TIME: timestamp]
A gesture input frame has the same form, with GESTURE in place of UTTERANCE; its
intention-type can be at least (1) pointing, (2) mark-area, and (3) indicate-direction.
An example of a gesture input frame is:
[GESTURE
[GESTURE: coordinates (3, 2)
[INTENTION: pointing
[TIME: timestamp]
An example of an output frame, here for the laser, is:

[LASER
[INTENTION: description (pointing)
[LOCATION: coordinates (5, 2)
[TIME: timestamp]
Integration frames are all those other than input/output frames. An example
utterance integration frame is:
[NLP
[INTENTION: description (pointing)
[LOCATION: office (tenant Hanne) (coordinates (5, 2))
[UTTERANCE: (This is Hanne’s office)
[TIME: timestamp]
Things become even more complex with the occurrence of references and spatial
relationships:
[MODULE
[INTENTION: intention-type
[LOCATION: location
[LOCATION: location
[LOCATION: location
[SPACE-RELATION: beside
[REFERENT: person
[LOCATION: location
[TIME: timestamp]
An example of such an integration frame is:
[DOMAIN-MODEL
[INTENTION: query? (who)
[LOCATION: office (tenant Hanne) (coordinates (5, 2))
[LOCATION: office (tenant Jørgen) (coordinates (4, 2))
[LOCATION: office (tenant Børge) (coordinates (3, 1))
[SPACE-RELATION: beside
[REFERENT: (person Paul-Dalsgaard)
[LOCATION: office (tenant Paul-Dalsgaard) (coordinates (4, 1))
[TIME: timestamp]
Here we derive all the frames appearing on the blackboard for the example query
“Who’s in the office beside him?”:
PROCESSING(5):
NLP:
(1) wakes up when it detects registering of F-int
(2) reads F-int and sees it’s from DOMAIN-MODEL
(3) produces updated F-int (intention + utterance)
(4) places and registers updated F-int on blackboard:
FRAME(F-int)(4):
[NLP
[INTENTION: declarative (who)
[LOCATION: office (tenant Hanne) (coordinates (5, 2))
[LOCATION: office (tenant Jørgen) (coordinates (4, 2))
[LOCATION: office (tenant Børge) (coordinates (3, 1))
[SPACE-RELATION: beside
[REFERENT: (person Paul-Dalsgaard)
[LOCATION: office (tenant Paul-Dalsgaard) (coordinates (4, 1))
[UTTERANCE: (Børge, Jørgen and Hanne’s offices are beside Paul Dalsgaard’s office)
[TIME: timestamp]
PROCESSING(6):
LASER:
(1) wakes up when it detects registering of F-int
(2) reads F-int and sees it’s from DOMAIN-MODEL
(3) produces F-out (pruning + registering)
(4) places and registers F-out on blackboard:
FRAME(F-out)(1):
[LASER
[INTENTION: description (pointing)
[LOCATION: coordinates (5, 2)
[LOCATION: coordinates (4, 2)
[LOCATION: coordinates (3, 1)
[SPACE-RELATION: beside
[REFERENT: (person Paul-Dalsgaard)
[LOCATION: coordinates (4, 1)
[TIME: timestamp]
PROCESSING(7):
SPEECH-SYNTHESIZER:
(1) wakes up when it detects registering of F-int
The representation of the spatial relation “beside” as given in the frame seman-
tics above is similar to what Herskovits (1986) termed a spatial proposition,
(〈relation name〉 〈LO〉 〈sequence of ROs〉)
where LO is an object to be localised and RO is a reference object. In our example
above the RO is Paul Dalsgaard’s office and the LOs are the other offices beside it.
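Since the frame semantics handling of “beside” is noted below as not yet implemented, the following is purely a hypothetical sketch of how such a DOMAIN-MODEL integration frame might be assembled, with frames rendered as Python dictionaries and “beside” read as adjacency of office grid coordinates (both choices are assumptions, not CHAMELEON’s actual design; diacritics in names are dropped).

def beside(office_a, office_b):
    # Two offices count as 'beside' when their grid coordinates
    # differ by at most one step in each direction (an assumption).
    (xa, ya), (xb, yb) = office_a["coords"], office_b["coords"]
    return (xa, ya) != (xb, yb) and abs(xa - xb) <= 1 and abs(ya - yb) <= 1

def beside_frame(referent, offices):
    # Build an integration frame locating every office beside the
    # referent's office, mirroring the frame shown in the text.
    ro = offices[referent]
    locations = [{"office": name, "coords": o["coords"]}
                 for name, o in offices.items()
                 if name != referent and beside(o, ro)]
    return {"module": "DOMAIN-MODEL",
            "intention": "query? (who)",
            "locations": locations,
            "space-relation": "beside",
            "referent": referent,
            "time": "timestamp"}

offices = {"Paul-Dalsgaard": {"coords": (4, 1)},
           "Hanne": {"coords": (5, 2)},
           "Jorgen": {"coords": (4, 2)},
           "Borge": {"coords": (3, 1)}}
print(beside_frame("Paul-Dalsgaard", offices))

With the coordinates taken from the frames above, this reproduces the three offices listed in the text’s example frame.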
Blocher and Stopp (1995) give a detailed computational model for repre-
senting and generating spatial relations for SOCCER, a system which automati-
cally generates reports of short soccer games. Their focus is more on generating
spatial relations rather than processing them as input queries. Maaß (1994) looks
at the area of route descriptions and how a speaker presents step-by-step relevant
route information in a 3D environment, with an implementation called MOSES.
Specifically addressed is the interaction between the spatial relation and the
presentation representation used for natural language descriptions. Again, the
focus here is on generating spatial relations rather than recognising them. SOCCER
and MOSES are part of a general project called VITRA (VIsual TRAnslator)
concerning the design and construction of integrated knowledge-based systems
for translating visual information into natural language descriptions (Herzog and
Wazinski 1994).
The L0 project (Feldman et al. 1996) focusses on combining not only vision
and natural language modelling, but also learning. The task is to build a system
that can learn the appropriate fragment of any natural language from sentence-
picture pairs. Important lessons have been learned in the subtle semantics of
spatial language, especially since L0 is multilingual (English, Mixtec, German,
Bengali, and Japanese) and spatial language varies considerably across languages.
The L0 implementation of spatial language modelling is conducted mainly in the
connectionist computational framework. Situated Artificial
Communicators (SFB-360) (Rickheit and Wachsmuth 1996) is a collaborative
research project at the University of Bielefeld, Germany, which focusses on
modelling what a person does when, with a partner, he or she cooperatively
solves a simple assembly task in a given situation. The object chosen is a model
airplane (Baufix) to be constructed by a robot from the components of a wooden
building kit with instructions from a human. SFB-360 includes equivalents of the
modules in CHAMELEON, although there is no learning module comparable to
Topsy. What SFB-360 gains in size it may lose in integration; i.e., it is not yet
clear that all the technology from the subprojects has been fitted together, and in
particular what exactly the semantic representations passed between the modules
are. The DACS process communication system currently used in CHAMELEON
is a useful product from SFB-360.
Gandalf is a communicative humanoid which interacts with users in
MultiModal dialogue, using and interpreting gestures, facial expressions,
body language and spoken dialogue (Thórisson 1997). Gandalf is an application
of an architecture called Ymir which includes perceptual integration of multi-
modal events, distributed planning and decision making, layered input analysis
and motor-control with human-like characteristics and an inherent knowledge of
time. Ymir has a blackboard architecture and includes modules equivalent to
those in CHAMELEON. However, there is no vision/image processing module
since gesture tracking is done with the use of a data glove and body tracking suit
and an eye tracker is used for detecting the user’s eye gaze. Also, Ymir has no
learning module equivalent to Topsy. Ymir’s architecture is even more distribut-
ed than CHAMELEON’s with many more modules interacting with each other.
Also, Ymir’s semantic representation is much more distributed with smaller
chunks of information than our frames being passed between modules.
AESOPWORLD is an integrated comprehension and generation system for
integration of vision, language and motion (Okada 1997). It includes a model of
mind consisting of nine domains according to the contents of mental activities
and five levels along the process of concept formation. The system simulates the
protagonist, the fox of the AESOP fable “The Fox and the Grapes”; his mental
and physical behaviour are shown by graphic displays, a voice generator, and a
music generator which expresses his emotional states. AESOPWORLD has an
agent-based distributed architecture and also uses frames as semantic representa-
tions. It has many modules in common with CHAMELEON although again there
is no vision input to AESOPWORLD which uses computer graphics to depict
scenes. AESOPWORLD has an extensive planning module but conducts more
traditional planning than CHAMELEON’s Topsy.
The INTERACT project (Waibel et al. 1996) involves developing Multi-
Modal Human Computer Interfaces including the modalities of speech, gesture
and pointing, eye-gaze, lip motion and facial expression, handwriting, face
recognition and tracking, and sound localisation. The main concern is with
improving recognition accuracies of modality specific component processors as
well as developing optimal combinations of multiple input signals to deduce user
intent more reliably in cross-modal speech-acts. INTERACT also uses a frame
representation for integrated semantics from gesture and speech and partial
hypotheses are developed in terms of partially filled frames. The output of the
interpreter is obtained by unifying the information contained in the partial
frames. Although Waibel et al. present good work on multimodal interfaces it is
not clear that they have developed an integrated platform which can be used for
developing multimodal applications.
tion about things on a physical table and, in particular, the Campus Information
System domain. Next, we discussed the frame semantics representation of
CHAMELEON and how the query, “Who’s in the office beside him?” is
processed through the semantics with all associated frames and module interac-
tions. More details on CHAMELEON and the IntelliMedia WorkBench can be
found in Brøndsted et al. (1998).
There are a number of avenues for future work with CHAMELEON. The
frame semantics handling of “beside” has yet to be implemented, and the next
step is to move on to modelling other projective spatial relations. Also,
CHAMELEON presently provides route descriptions through laser pointing, but
more detailed verbal descriptions could be given hand-in-hand with those drawn
by the laser, mentioning “left”, “right” and other turns along routes. It is hoped
that more complex decision taking can be introduced to operate over semantic
representations in the dialogue manager or blackboard using, for example, the
HUGIN software tool (Jensen F. 1996) based on Bayesian Networks (Jensen F.V.
1996). The gesture module will be augmented so that it can handle gestures other
than pointing. Topsy will be asked to do more complex learning and processing
of input/output from frames. The microphone array has yet to be integrated into
CHAMELEON and set to work.
Intelligent MultiMedia will be important in the future of international
computing and media development and IntelliMedia 2000+ at Aalborg Universi-
ty, Denmark brings together the necessary ingredients from research, teaching
and links to industry to enable its successful implementation. Our CHAMELEON
platform and IntelliMedia WorkBench application are ideal for testing integrated
processing of language and vision for the future of SuperinformationhighwayS.
Acknowledgments
This opportunity is taken to acknowledge support from the Faculty of Science and Technology,
Aalborg University, Denmark and Paul Mc Kevitt would also like to acknowledge the British
Engineering and Physical Sciences Research Council (EPSRC) for their generous support
under grant B/94/AF/1833 for the Integration of Natural Language, Speech and Vision Processing
(Advanced Fellow) and LIMSI-CNRS, Orsay, France where he was a Visiting Professor whilst
completing this paper. Annelies Braffort, Tom Brøndsted, Paul Dalsgaard, Rachid Gherbi, Lars Bo
Larsen, Michael Manthey, Thomas B. Moeslund, and Kristian G. Olesen are acknowledged for useful
discussions.
Note
References
Beck, A. (1991). Description of the EUROTRA framework. The Eurotra Formal Specifi-
cations, Studies in Machine Translation and Natural Language Processing. C.
Copeland, J. Durand, S. Krauwer and B. Maegaard (Eds) Vol 2, 7–40. Luxembourg:
Office for Official Publications of the Commission of the European Community.
Blocher, A. and E. Stopp (1995) Time-dependent generation of minimal sets of spatial
descriptions. In Working notes of the Workshop on Representation and Processing of
Spatial Descriptions, Oliver, Patrick Luke (Ed), 1–7, Fourteenth IJCAI, Montréal,
Canada.
Brøndsted, T. (1998) nlparser. WWW: http://www.kom.auc.dk/~tb/nlparser.
Brøndsted, T., P. Dalsgaard, L. B. Larsen, M. Manthey, P. Mc Kevitt, T. B. Moeslund,
K. G. Olesen (1998) A platform for developing Intelligent MultiMedia applications.
Technical Report R-98–1004, Center for PersonKommunikation (CPK), Institute for
Electronic Systems (IES), Aalborg University, Denmark, May.
Christensen, Heidi, Børge Lindberg and Pall Steingrimsson (1998) Functional specifica-
tion of the CPK Spoken LANGuage recognition research system (SLANG). Center for
Personkommunikation, Aalborg University, Denmark, March.
Denis, M. and M. Carfantan (Eds.) (1993) Images et langages: multimodalité et modél-
isation cognitive. Actes du Colloque Interdisciplinaire du Comité National de la
Recherche Scientifique, Salle des Conférences, Siège du CNRS, Paris, April.
Feldman, Jerome, George Lakoff, David Bailey, Srini Narayanan, Terry Regier and
Andreas Stolcke (1996) L0 — the first five years of an automated language acquisi-
tion project. In Artificial Intelligence Review, 8, 175–187.
Fink, Gernot A., Nils Jungclaus, Franz Kummert, Helge Ritter and Gerhard Sagerer
(1996) A distributed system for integrated speech and image understanding. In
Proceedings of the International Symposium on Artificial Intelligence, Rogelio Soto
(Ed.), 117–126. Cancun, Mexico.
Herskovits, Annette (1986) Language and spatial cognition: an interdisciplinary study of the
prepositions in English. Cambridge, England: Cambridge University Press.
Herzog, Gerd and Peter Wazinski (1994) VIsual TRAnslator: linking perceptions and
natural language descriptions. In Artificial Intelligence Review, 8, 175–187.
Infovox (1994) INFOVOX: Text-to-speech converter user’s manual (version 3.4). Solna,
Sweden: Telia Promotor Infovox AB.
Jensen, Finn V. (1996) An introduction to Bayesian Networks. London, England: UCL
Press.
Jensen, Frank (1996) Bayesian belief network technology and the HUGIN system. In
Proceedings of UNICOM seminar on Intelligent Data Management, Alex Gammerman
(Ed.), 240–248. Chelsea Village, London, England, April.
Kosslyn, S. M. and J. R. Pomerantz (1977) Imagery, propositions and the form of internal
representations. In Cognitive Psychology, 9, 52–76.
Leth-Espensen, P. and B. Lindberg (1996) Separation of speech signals using eigen-
filtering in a dual beamforming system. In Proc. IEEE Nordic Signal Processing
Symposium (NORSIG), Espoo, Finland, September, 235–238.
Maaß, Wolfgang (1994) From vision to multimodal communication: incremental route
descriptions. In Artificial Intelligence Review, 8, 159–174.
Maaß, Wolfgang (Ed.) (1996) Proceedings of the ECAI-96 Workshop on the representa-
tions and processes between vision and natural language. Twelfth European Confer-
ence on Artificial Intelligence (ECAI-96), Budapest, Hungary.
Manthey, Michael J. (1998) The Phase Web Paradigm. In International Journal of
General Systems, special issue on General Physical Systems Theories, K. Bowden
(Ed.).
Mc Kevitt, Paul (1994) Visions for language. In Proceedings of the Workshop on Integra-
tion of Natural Language and Vision processing, Twelfth American National
Conference on Artificial Intelligence (AAAI-94), Seattle, Washington, USA, August,
47–57.
Mc Kevitt, Paul (Ed.) (1995/1996) Integration of Natural Language and Vision Processing
(Vols. I–IV). Dordrecht, The Netherlands: Kluwer Academic Publishers.
Mc Kevitt, Paul (1997a) SuperinformationhighwayS. In “Sprog og Multimedier” (Speech
and Multimedia), Tom Brøndsted and Inger Lytje (Eds.), 166–183, April 1997.
Aalborg, Denmark: Aalborg Universitetsforlag (Aalborg University Press).
Mc Kevitt, Paul (1997b) IntelliMedia TourGuide: understanding reference at the lan-
guage/vision interface. In Proceedings of the European Science Foundation (ESF)
Network on Converging Computing Methodologies in Astronomy (CCMA), Final
Conference on Advanced Techniques and Methods for Astronomical Information
Handling, M. C. Maccarone, F. Murtagh, M. Kurtz and A. Bijaoui (Eds.), 93–103.
Sonthofen, Germany, September, Observatoire de la Cote d’Azur — B. P. 4229
06304 Cedex 4, France. (also published on Web; see http://newb6.u-
strasbg.fr/~ccma).
Mc Kevitt, Paul and Paul Dalsgaard (1997) A frame semantics for an IntelliMedia
TourGuide. In Proceedings of the Eighth Ireland Conference on Artificial Intelligence
(AI-97), Volume 1, 104–111. University of Ulster, Magee, Derry, Northern Ireland,
September.
Minsky, Marvin (1975) A framework for representing knowledge. In The Psychology of
Computer Vision, P. H. Winston (Ed.), 211–277. New York: McGraw-Hill.
Nielsen, Claus, Jesper Jensen, Ove Andersen, and Egon Hansen (1997) Speech synthesis
based on diphone concatenation. Technical Report, No. CPK971120-JJe (in confi-
dence), Center for PersonKommunikation, Aalborg University, Denmark.
Okada, Naoyuki (1997) Integrating vision, motion and language through mind. In
Proceedings of the Eighth Ireland Conference on Artificial Intelligence (AI-97),
Volume 1, 7–16. University of Ulster, Magee, Derry, Northern Ireland, September.
Olivier, Patrick Luke (Ed.) (1995) Working notes of the workshop on Representation and
processing of spatial expressions. Fourteenth International Joint Conference on
Artificial Intelligence (IJCAI-95), Montreal, Canada, August.
Olivier, Patrick Luke (Ed.) (1996) Proceedings of the ECAI-96 workshop on the represen-
tations and processing of spatial expressions. Twelfth European Conference on
Artificial Intelligence (ECAI-96), Budapest, Hungary.
Olivier, Patrick Luke (Ed.) (1997) Proceedings of the AAAI-97 workshop on language and
space. Fourteenth American National Conference on Artificial Intelligence
(AAAI-97), Providence, Rhode Island.
Pentland, Alex (Ed.) (1993) Looking at people: recognition and interpretation of human
action. IJCAI-93 Workshop (W28) at the 13th International Joint Conference on
Artificial Intelligence (IJCAI-93), Chambéry, France, August.
Power, Kevin, Caroline Matheson, Dave Ollason and Rachel Morton (1997) The
grapHvite book (version 1.0). Cambridge, England: Entropic Cambridge Research
Laboratory Ltd.
Pylyshyn, Zenon (1973) What the mind’s eye tells the mind’s brain: a critique of mental
imagery. In Psychological Bulletin, 80, 1–24.
Retz-Schmidt, Gudula (1988) Various views on spatial prepositions. In AI Magazine,
9, 95–105.
Rickheit, Gert and Ipke Wachsmuth (1996) Collaborative Research Centre “Situated
Artificial Communicators” at the University of Bielefeld, Germany. In Integration
of Natural Language and Vision Processing, Volume IV, Recent Advances, Mc Kevitt,
Paul (ed.), 11–16. Dordrecht, The Netherlands: Kluwer Academic Publishers.
Thórisson, Kristinn R. (1997) Layered action control in communicative humanoids. In
Proceedings of Computer Graphics Europe ’97, June 5–7, Geneva, Switzerland.
Waibel, Alex, Minh Tue Vo, Paul Duchnowski and Stefan Manke (1996) Multimodal
interfaces. In Integration of Natural Language and Vision Processing, Volume IV,
Recent Advances, Mc Kevitt, Paul (Ed.), 145–165. Dordrecht, The Netherlands:
Kluwer Academic Publishers.
Zelinsky-Wibbelt, Cornelia (Ed.) (1993) The semantics of prepositions: from mental
processing to natural language processing (NLP 3). Berlin, Germany: Mouton de
Gruyter.
Appendix A
Blackboard in practice
Here we show the complete blackboard (with all frames) exactly as produced by
CHAMELEON for the example dialogue given in Section 2.
Received: nlp(intention(instruction(pointing)),location(person(tb),type(office)),
time(889524794))
which is passed on to dialog_manager
Received: dialog_manager(output(laser(point(coordinates(249,623))),
speech_synthesizer(utterance(“This is Toms office”))))
Calling laser: laser(point(coordinates(249,623)))
Calling speech_synthesizer: speech_synthesizer(utterance(“This is Toms office”))
Received: nlp(intention(instruction(pointing)),location(person(tbm),type(office)),
time(889524818))
which is passed on to dialog_manager
Received: dialog_manager(output(laser(point(coordinates(278,623))),
speech_synthesizer(utterance(“This is Thomass office”))))
Calling laser: laser(point(coordinates(278,623)))
Calling speech_synthesizer: speech_synthesizer(utterance(“This is Thomass office”))
Received: nlp(intention(query(where)),location(place(a2_221)),
time(889524831))
which is passed on to dialog_manager
Received: dialog_manager(output(laser(point(coordinates(132,500))),
speech_synthesizer(utterance(“computer room is here”))))
Calling laser: laser(point(coordinates(132,500)))
Calling speech_synthesizer: speech_synthesizer(utterance(“computer room is here”))
Received: nlp(intention(query(who)),location(this($Deixis),type(office)),
time(889524864))
which is passed on to dialog_manager
Received: dialog_manager(output(laser(point(coordinates(658,546))),
speech_synthesizer(utterance(“This is not an office, this is instrument repair”))))
Calling laser: laser(point(coordinates(658,546)))
Calling speech_synthesizer:
speech_synthesizer(utterance(“This is not an office, this is instrument repair”))
Received: nlp(intention(query(who)),location(this($Deixis),type(office)),
time(889524885))
which is passed on to dialog_manager
Received: dialog_manager(output(laser(point(coordinates(223,568))),
speech_synthesizer(utterance(“This is Pauls office”))))
Calling laser: laser(point(coordinates(223,568)))
Calling speech_synthesizer: speech_synthesizer(utterance(“This is Pauls office”))
Received: nlp(intention(instruction(show_route)),source(location(person(lbl),
type(office))),
destination(location(person(hg),type(office))),time(889524919))
which is passed on to dialog_manager
Received: dialog_manager(output(laser(route(coordinates(278,585,278,603,249,
603,220,603,197,603,197,623))),
speech_synthesizer(utterance(“This is the route from Lars Bos office to Hannes office”))))
Calling laser:
laser(route(coordinates(278,585,278,603,249,603,220,603,197,603,197,623)))
Calling speech_synthesizer:
speech_synthesizer(utterance(“This is the route from Lars Bos office to Hannes office”))
Received: nlp(intention(instruction(show_route)),source(location(person(pmck),
type(office))),destination(location(place(a2_105))),time(889524942))
which is passed on to dialog_manager
Received: dialog_manager(output(laser(route(coordinates(174,453,153,453,153,
481,153,500,153,510,153,540,153,569,153,599,153,603,184,603,197,603,220,603,
249,603,278,603,307,603,330,603,330,655,354,655,911,655,884,655,884,603,810,
603,759,603,717,603,717,570,696,570))),
speech_synthesizer(utterance(“This is the route from Pauls office to instrument repair”))))
Calling laser: laser(route(coordinates(174,453,153,453,153,481,153,500,153,
510,153,540,153,569,153,599,153,603,184,603,197,603,220,603,249,603,278,603,
307,603,330,603,330,655,354,655,911,655,884,655,884,603,810,603,759,603,717,
603,717,570,696,570)))
Calling speech_synthesizer:
speech_synthesizer(utterance(“This is the route from Pauls office to instrument repair”))
Received: nlp(intention(instruction(pointing)),location(person(pd),type(office)),
time(889524958))
which is passed on to dialog_manager
Received: dialog_manager(output(laser(point(coordinates(220,585))),
speech_synthesizer(utterance(“This is Pauls office”))))
SONAS: Multimodal, Multi-User Interaction
with a Modelled Environment
John Kelleher, Tom Doris, Qamir Hussain and Seán Ó Nualláin
NOUS Research and Dublin City University
Introduction
This paper describes some of the work being carried out by the IME (Interaction
with Modelled Environments) group at NOUS on the SONAS system. We
describe the theoretical foundations of the system, and discuss the research being
carried out in the area of natural language and vision integration. Finally, we
give a description of the general architecture of the current SONAS implementa-
tion. The primary aim of this research is to develop a system for navigating and
interacting with a 3D environment through spoken natural language, gesture,
hand-written characters/icons, and whatever other modes emerge.
Motivation
Over the last 40 years computer systems have developed at a staggering rate in
both power and complexity. This development has given modern computer users
access to massive amounts of information. Indeed, the modern user has so much
information at their fingertips that it is becoming difficult for them to extract the
desired information from the deluge that is presented to them.
The primary cause of this is that very little progress has been made in the means
by which users interact with these systems. Natural language and visual images,
in particular, share a number of structural features:
– A hierarchical organisation,
– A syntax—semantics division,
– Ambiguity.
These similarities have led many researchers to believe that visual images are
closely connected to the mental models that underlie the human use of language.
The advantages of a system that links language and vision are manifold. The
visual model grounds the language and situates the dialogue in time and space;
thus references can almost always be constrained by locality. Also in such a
system the model of the world is in some sense closed. This gives us the ability
to exhaustively represent certain properties of objects in the world. This exhaus-
tive representation allows a planner to ignore possible effects of the frame
problem. In addition, the system has the ability to give the user immediate feedback
about how it understands what is being said. This is an enormous advantage and
addresses a major drawback of more traditional natural language interfaces that
was highlighted by Dennett (1991, pp. 57–58):
Surely a major source of widespread scepticism about “machine understand-
ing” of natural language is that such systems almost never avail themselves of
anything like a visual workspace in which to parse or analyse the input. If they
did, the sense that they were actually understanding what they processed would
be greatly heightened (whether or not it would still be, as some insist, an
illusion). As it is, if a computer says, “I see what you mean” in response to
input, there is a strong temptation to dismiss the assertion as an obvious fraud.
However, if one wishes to claim that a natural language system has a cognitively
plausible understanding of what it is “hearing”, the system must not only
consider the relationship between expressions of a knowledge representation
language and natural language expressions, but must also rely on the definition of
a referential semantics. To achieve this we associate naive physical attributes
with each object in the environment, and use these attributes as selectional
restrictions on the applicability of a verb to an object.
While a major part of our work is directed towards the development of NLP
input to the system, we are also extremely interested in utilising and organising
input from and feedback to the user through other modalities, especially vision.
The design of the visual front end has been based on Ben Shneiderman's
mantra of “overview first, zoom and filter, then details on demand.”
There are two visual displays in the user interface: a 2D-overview map of
the world and a 3D OpenGL viewer which displays the location in the world
where the viewer is currently situated.
The user can position themselves in the world through the 2D overview
map, inputting instructions through mouse, language, gesture or a combination of
these. The result of this input is displayed as a scene change in the OpenGL 3D
environment. If the user wishes to interact directly with any object that they
come across in the world, they can do so again through multiple modalities. It is
this process of object manipulation that we discuss in the next section.
In the Spoken Image project, the predecessor to the SONAS system, a three-
dimensional view of a physical environment is always present on the screen, seen
through the eyes of a user agent that moves around and modifies the world in
response to the user's utterances and gestures. The goal of the project was a
system that anyone could use without training to quickly build a house or a town
scene, modifying almost any of the details until it is exactly as the user has
envisioned it.
The original project was developed at the NRC, Canada. This system takes
natural language input via the keyboard. The Alvey Natural Language Toolkit is
then used for the initial stages of the linguistic analysis. The output of the ANLT
parser is a logical form representation of the English input, which forms the
basis for further interpretation in the system. The physical model against which
linguistic propositions are evaluated, and in which most actions are carried out,
is built from the data structures that directly support the three-dimensional
graphical display. Elements of any particular scene are instances of classes in an
object-oriented system implemented in C++. The way the physical model is
linked with the interpretation of the logical form expressions is conceptually
simple. Word-senses are defined as instances in another object-oriented hierar-
chy, whose classes group concepts and grammatical operators. Each class has its
own interpretation methods, which can be made as specific as needed. The
instances of any model class are the actual objects that appear in a world —
none of the model classes have instances until a world is created or loaded.
A world can contain several ‘viewers’, and the user can ask to look through
the eyes of any viewer in the scene. The currently selected viewer is the one that
responds to instructions to move and look around, while the ‘bodies’ of the other
viewers appear in the world where they were most recently abandoned. This
system was single user, and lacked provision for gestural and spoken input.
[Figure 5. The SONAS class hierarchy. WorldObject subsumes Physical objects (with attributes such as Name, RefNum, Mass, Shape, energy_state (solid/liquid/gas), WorldLocation, Colour, and support and anchoring flags), Agents and Users (with a Password); Temporal entities subsume Events (with initial and final Relational_States), Collectives and Actions (with an Agent), including Telic_Actions (with Theme and Reference WorldObjects and a Goal equal to the Final State); Relations hold between Geo-Concepts; and Motions (functions from initial to final points) specialise into StackMotion, SlideMotion and SlideAlong.]
A Telic Action has as attributes an Agent (the user's avatar), instances of the
Relational State classes, namely Initial State, Final State, and Goal State (which
in the case of a Telic Action is equal to the Final State), and instances of the
WorldObject classes representing the Reference and Figure of the action. In our
approach to analysing the semantic meaning of a spatial expression we follow
the multiple relations model proposed by Herskovits (1986). This is a multilayer
approach in which a spatial relation is considered at both a conceptual and a
semantic level.
At the conceptual level, the objects involved in the action are considered
with respect to their position in the physical ontology of objects. Here the
physical attributes of the objects are checked against the selectional restrictions
associated with the verb. In the SONAS system, this is achieved by checking the
Physical Attributes of the WorldObject instances representing the Figure and
Reference against restrictions associated with the verb. If the Physical Attributes
of the Figure and Reference are found to be consistent with the selectional
restrictions on the verb, we then proceed with the semantic layer of analysis.
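A minimal sketch of this conceptual-level check, assuming objects are flat dictionaries of naive physical attributes and each verb carries required attributes for its Figure and Reference; the verb table and attribute names are hypothetical, loosely echoing the Physical attributes in Figure 5.

VERB_RESTRICTIONS = {
    # verb: (required attributes of the Figure, required attributes of the Reference)
    "stack": ({"solid": True, "anchored": False}, {"supporting": True}),
    "pour":  ({"solid": False}, {}),
}

def satisfies(obj, required):
    return all(obj.get(attr) == val for attr, val in required.items())

def conceptual_check(verb, figure, reference):
    """Admit the verb only if Figure and Reference pass its selectional restrictions."""
    fig_req, ref_req = VERB_RESTRICTIONS[verb]
    return satisfies(figure, fig_req) and satisfies(reference, ref_req)

box = {"solid": True, "anchored": False}
table = {"solid": True, "supporting": True}
print(conceptual_check("stack", box, table))   # True: proceed to the semantic layer
print(conceptual_check("pour", box, table))    # False: reject this reading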
At the semantic level in the Herskovits approach, the Reference and Figure
objects are reduced to geometric conceptualisations (Point, Line, Plane etc). Also
there is an ideal meaning for each preposition. These ideal meanings are relations
between two or more geometric concepts. Examples of these ideal meanings are
coincidence of points and contiguity of two surfaces. A set of use types for each
preposition is derived from the ideal meaning by applying adaptations and shifts
which are dependent upon which geometric concepts the Reference and Figure
have been reduced to (Figure 4).
[Diagram: a spatial locative expression is decomposed into a Figure object and a Reference object; each is reduced to a geometric conceptualisation (point, line, plane, volume, etc.), and ideal meanings over these conceptualisations (coincidence of points, contiguity of two surfaces, etc.) are adapted into use types.]
Figure 4. The conceptual composition of a spatial locative expression at the semantic level
in the Herskovits ontology
In the SONAS system the Use Types level of abstraction in the Herskovits
ontology is reflected in the set of relational classes in the class hierarchy (Figures
5 and 7). The initial state and final state attributes of the telic action are instanc-
es of these relational classes; these are used in conjunction with the Motion
classes' functions to compute the translation required for the action. This
translation is then applied to the Figure object, altering values in the Theme
object's Physical State. The next time through the event loop the changed state
values cause the object's display function to update the scene in response to the
user input.
These classes reflect the geometric forms that the Figure and Reference objects
are reduced to at the Semantic layer of analysis.
[Diagram: the Physical class alongside the Geo-Conceptualisation classes (Point, Line, Area with Fuzzy_Area and Surfaces specialisations, and Volume, each carrying a Dimension attribute), and the Relation class (two Geo-Concept attributes and a Boolean test) with its specialisations Coincidence_of_points and Contiguity_of_two_surfaces, whose tests are CoinPoints(Ref, Fig) and Cont2Surf(Ref, Fig).]
Figure 7. General form of the Relation class set. This set of classes represents the Use Type
level of abstraction in the Herskovits ontology
Much of the impetus for SONAS has its roots in the prevailing HCI para-
digm. For this reason the system has been designed to have multimodal capabili-
ties. The ultimate goal of multimodal user interfaces is the increased transparency
of computational devices to end users. While natural language input represents
the most obvious target of exploitation for designers of such systems, in the
physical world people often find it difficult to express a complex series of actions
in spoken form. Yet we can express such actions with ease through the medium
of gesture, or by ‘actually doing it’. Thus gesture is a useful element to add to
multimodal access, and a practically indispensable one when modelling spatial
environments.
Either modality on its own has strengths and weaknesses, but the two
together should provide a medium rich enough to express most, if not all, of the
instructions that a non-technically proficient user would wish to express to an
information system. The increase in complexity for the developer of such
applications is considerable, however, when we grant the user the ability to
arbitrarily combine gesture and speech when interacting.
Sonas Architecture
The Sonas system is built upon the Open Agent Architecture developed at SRI.
The OAA provides a general ontology for the development of distributed agent
applications, in addition to the ‘Facilitator’ software, which to a first approximation
can be described as the middleware that provides inter-agent communication services.
The Facilitator also maintains a blackboard, which agents may write data to and
read data from. The data on the blackboard is stored in Horn clause form,
functor(arg, arg, …); e.g. noun([phone_number,’phone number’]) encodes that there
is a noun whose written form is ‘phone number’ and which refers to the symbol
phone_number.
Upon connection to the Facilitator, the agent must declare the services that
it provides. Such services are referred to as ‘solvables’ in keeping with the
Prolog-inspired nomenclature. Such solvables also define the interface to the
agent, and generally reveal their purpose upon inspection. For instance, say(Text)
is a solvable declared by a text to speech agent, and its purpose and effect is
immediately obvious from its form.
Agents provide solvables, and may also issue solve requests. To do this an
agent formulates the goal that it needs solved (a goal may be the retrieval of
information or the performance of a task) and passes it to the facilitator. The
facilitator then uses its database of solvables to decide which agent can provide
the needed services. It then sends the solve request to that agent, and upon
completion of the task, it returns the solution to the original agent.
Typical interactions involve many such exchanges between members of the
agent community. Additionally, the facilitator provides a backtracking mechanism
similar to Prolog’s. The ‘language’ which agents communicate in is referred to
as the Interagent Communication Language (ICL); basically it consists of Horn
clauses and has few reserved symbols.
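The routing cycle just described can be miniaturised as follows. This is a toy sketch of facilitator-style dispatch with hypothetical agent and solvable names; it matches solvables by functor only and omits ICL unification and backtracking.

class Facilitator:
    def __init__(self):
        self.solvables = {}                     # functor -> providing agent

    def register(self, agent, functors):
        for functor in functors:                # the agent declares its solvables
            self.solvables[functor] = agent

    def solve(self, functor, *args):
        agent = self.solvables[functor]         # route the goal to a provider
        return agent.handle(functor, *args)

class SpeechAgent:
    def handle(self, functor, text):
        if functor == "say":
            return f"<speaking: {text}>"

facilitator = Facilitator()
facilitator.register(SpeechAgent(), ["say"])
print(facilitator.solve("say", "This is Toms office"))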
Designing Real-Time Software Advisors for 3D Spatial Operations

Mike and Ann Eisenberg
University of Colorado, Boulder

While the theme of visual (or more broadly visuospatial) thinking is often struck
in the discourse of mathematicians and scientists, trying to pin down just what
this notion might mean is difficult. Geometers and topologists may generate a
body of introspective folklore about their own mental operations; but cognitive
scientists, in attempting to treat the topic rigorously, are faced with a host of
puzzles relating to topics such as mental imagery, visual memory, the relation-
ship between the senses of vision and touch, the relationship between visual and
linguistic cognition, and many more.
This paper is not the occasion for an attempt to sort out, or even summarize,
the myriad questions over what “mathematical visualization” might be, or what
role it plays in mathematical cognition and education. Nonetheless, it is worth
mentioning that there is some evidence (besides the purely anecdotal) that spatial
thinking is an identifiable component of human intelligence (Gardner 1983); that
it is an important predictor of success in college physics courses (Siemankowski
and McKnight 1971); and that certain types of abilities linked with spatial
thinking may be taught (see, for instance, Brinkmann’s (1966) report of a
curriculum for teaching visual thinking, or Olson’s work in developing an
educational toy to help teach very young children the often-problematic concept
of diagonality (Olson 1970)). It is our firm belief that — consistent with the
folklore — much of mathematical thinking is indeed distinctly visual in nature,
and that mathematical education would therefore profit from a greater emphasis
on activities that strengthen and exercise visual and spatial reasoning.
That said, however, there are delicate pedagogical questions about how to
help students develop such reasoning abilities. It would certainly be possible to
create “visual workbooks”, or drill-and-practice software systems, focusing
perhaps on the types of problems (e.g., mental-rotation tasks) typical of standard-
ized psychological tests of visual thinking. While such efforts may prove
effective in raising students’ test scores, it is our belief that they would prove
appear in the transcript window at the top of the figure. When this expression is
evaluated, the user generates a new solid shape; this shape may then be “unfold-
ed” to produce both the solid form shown in the ThreeD window at bottom right,
and the folding pattern (also known as a folding net) in the TwoD window at
upper right. In the figure, the user has shaded in several faces of the folding net
on the screen.
Much more detail about the HyperGami system — and the wealth of
polyhedral forms and sculptures that may be created with the program — can be
found in Eisenberg and Nishioka (1997). The figure and description above,
however, should serve to indicate the basic idea behind the program: namely,
that the student may design new three-dimensional shapes on the screen by
applying functions (such as the “truncation” function) to pre-existing shapes.
Once a new 3D solid has been created, the program will attempt to create a
folding net for the shape; this net may now be decorated through a variety of
means (including Scheme language expressions), printed out and folded into an
attractive tangible model. By way of illustration, Figure 2 shows two examples
of polyhedral figures designed with HyperGami (both are relatively complex
shapes, constructed from multiple pieces).
Figure 2. Two HyperGami polyhedra: a variant of the regular icosahedron (left), and a
great stellated dodecahedron (right).
The previous section summarized the HyperGami application and suggested the
(immense) range of polyhedra that can be produced in the system. While these
shapes and sculptures have an undeniable appeal, they leave an important
question unresolved — namely, that of mathematical content. To put the matter
another way: why should these shapes have importance for mathematicians,
scientists, and engineers; and what are the important ideas or themes that we as
educators should stress in discussing these shapes with our students?
There are several types of answer to these questions. One style of response
stresses the occurrence of polyhedral forms in nature: the tetrahedral arrangement
of the bonds of carbon, the space-filling forms of crystallography, the geodesic-
dome-like shape of certain microorganisms (Cf. Senechal 1988). This style of
answer might be called “naturalistic”: it emphasizes the surprisingly ubiquitous
character of polyhedra in the world and (by implication) would encourage
students to develop familiarity with the shapes, as a bird watcher might learn the
identities of songbirds. A related theme — one that naturally accompa-
nies this urge toward easy familiarity with shapes — is the notion of developing
a “taxonomy” of shapes, spotting a family resemblance between distinct polyhe-
dra. This is a theme that one often encounters in mathematical writing on the
subject: the classical polyhedra are related to each other by a variety of opera-
tions including stellation, vertex and edge truncation, taking the dual of a solid
(identifying the faces of one with the vertices of another, and vice versa),
“capping” of faces, and so forth. Finding such relationships between shapes is an
activity with an ancient pedigree: the fifteenth book of Euclid’s Elements (added
after Euclid’s death) described the inscription of a cube inside an octahedron and
vice versa (Coxeter 1973, p. 30); while (according to Senechal (1988)), the 16th-
century goldsmith Wenzel Jamnitzer wrote a book entitled Perspectiva Cor-
porum Regularium in which “each of the five regular solids is presented in
exquisite variation.” (p. 11) Among contemporary authors, Loeb (1991), for
instance, writes in the introduction to his book Space Structures that “one aim of
the present volume is to present fundamental principles underlying this variety of
(polyhedral) structures, to show their family relationships and how they may be
transformed into one another…” (p. xix). Similarly, Holden (1971) suggests that
by truncating selected features of a solid (e.g., the corners of a cube) “(you) can
engage in a useful exercise, which will cultivate your abilities in visualizing
spacial relations and in specifying symmetries.” (p. 57)
Finding relationships between polyhedra is an indispensable way of knitting
together the huge variety of shapes encountered in nature and mathematics.
While the HyperGami system does afford students a medium in which many
such relationships may be encountered in principle (i.e., the operations of
truncation, capping, stretching and others are available to students), the program
currently offers little guidance or advice in helping students think about the
types of operations that could be employed to transform one polyhedron into
another. The “advisors” to be described in the following section are intended to
strengthen in particular the skill of seeing the potential relationships between
polyhedra.
As an example of what this type of “seeing” entails — and before we
describe our own attempts to represent aspects of this skill in software — it is
worth looking at a vivid example of a talented seventeenth-century “visualizer”
at work. In his book Harmonices Mundi, Johannes Kepler provides sketches of
various polyhedra, some of which suggest the author’s remarkable ability to
interpret polyhedral forms. Consider, for instance, Kepler’s interpretation of the
icosahedron. In his sketch, Kepler displays both the entire icosahedron and a
“parsing” of the shape into three portions, as suggested by the two parts of
Figure 3 below. (For a view of Kepler’s original sketch, see Cromwell (1997)).
In Kepler’s interpretation of the icosahedron, the shape is composed of two
pentagonal pyramids joined onto either side of a pentagonal antiprism. By
dividing the shape in this manner — by “seeing” the icosahedron as composed
of three component shapes — Kepler is able to highlight the relationship between
the icosahedron and other, simpler shapes. (At the same time, Kepler’s sketch
suggests other possible polyhedra that one might wish to construct — e.g., it
would be interesting to vary his construction so that it employs a square, as
opposed to pentagonal, antiprism with two pyramidal caps.)
Figure 3. Kepler’s “parsing” of the icosahedron (at left) into two pyramidal caps and a
central pentagonal antiprism.
Ideally, a suite of online advisors built into HyperGami should help students
work toward the type of understanding of polyhedra exhibited by Kepler. One
advisor might, e.g., suggest ways in which to “slice” polyhedra — in much the
Figure 4. A truncated tetrahedron (left) is redisplayed with one of its hexagonal faces
highlighted to show three pairs of parallel edges (at center). At right, a new shape derived
from the truncated tetrahedron by adding an “edge cap” on the highlighted face.
Figure 5. An icosahedron, redisplayed with a “slicing rectangle” at center. Note that the
upper right vertex of the highlighted rectangle is one of the hidden vertices in the view at
left. At the right, the two resulting halves of the sliced icosahedron.
In the second instance, our slicing-advisor takes the icosahedron as input and
redisplays the shape with the highlighted pentagon as shown in Figure 6. Here,
the advisor is providing us with a set of vertices, all linked by edges, through
which the icosahedron might be sliced. (Note the difference between this piece
of advice and that shown in Figure 5: in that case, the four edges of the “slicing
rectangle” did not all appear among the edges of the icosahedron.)
Figure 6. An icosahedron, redisplayed with a set of “slicing edges” (in this case forming a
regular pentagon) at center. The resulting slices are shown at right. (Note the similarity to
the “slicing” operation used by Kepler as represented earlier in Figure 3).
Two more examples of advisors at work may serve to indicate the general utility
of the idea. The “triangle-exchange” advisor simply looks for instances within a
starting polyhedron of two triangular faces that share an edge. Once found, these
two triangles will be suggested as a possible site for the operation shown
graphically in Figure 7. Here, the original edge between the two triangles is
removed and replaced by an edge between the (hitherto unconnected) opposite
vertices of the two faces. In the figure we see the operation of the “triangle-
exchange” advisor as applied to a capped cube. The original shape is given as
input to the advisor, which highlights the pair of adjacent triangles shown at
center, indicating that these two faces are a possible site for an exchange
operation. Once that operation is performed, we obtain the shape depicted at the
right of the figure.
Figure 7. A capped cube (left), redisplayed with two adjacent triangles highlighted (at
center). When these two triangles are “exchanged”, their adjoining edge is removed and
replaced by an edge between the formerly unconnected vertices to form the “notched” shape
at right.
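The triangle-exchange operation lends itself to a compact sketch. Assuming faces are stored as vertex-index triples (a hypothetical representation, not HyperGami's own data structures), the exchange can be written as:

def triangle_exchange(tri_a, tri_b):
    """Replace the edge shared by two adjacent triangles with the edge joining
    their opposite vertices, returning the two new triangular faces."""
    shared = set(tri_a) & set(tri_b)
    if len(shared) != 2:
        raise ValueError("faces do not share exactly one edge")
    p, q = sorted(shared)
    a = next(v for v in tri_a if v not in shared)   # vertex opposite the shared edge
    b = next(v for v in tri_b if v not in shared)
    return (a, b, p), (a, b, q)

print(triangle_exchange((0, 1, 2), (1, 2, 3)))
# ((0, 3, 1), (0, 3, 2)): the shared edge 1-2 is replaced by the new edge 0-3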
Finally, the “truncation advisor” looks for vertices in a solid such that the line
connecting the vertex with the “midpoint” of the solid is an axis of rotational
symmetry for the solid. As an example, consider once more the capped cube (as
shown again in Figure 8). Here, the advisor suggests the topmost “cap” vertex as
a likely site for truncation, since the capped cube has fourfold rotational symme-
try about the axis joining that vertex and the center of the solid. (In contrast,
none of the other vertices of the solid has this property.) Once the truncation
operation has been performed, we obtain the shape at the right of the figure.
Figure 8. A capped cube (at left) is redisplayed with a highlighted “top” point (at center).
After the solid is truncated at the suggested vertex we obtain the shape at right.
The already-developed software advisors described above represent only one step
toward our eventual goal of building computational advisors for visual and
spatial reasoning. Some interesting HyperGami operations (such as “stretching”
shapes, or joining two shapes together at a face) have yet to be accompanied by
advisors; moreover, some interesting geometric operations (such as stellating
polyhedra) are not currently implemented in HyperGami at all. The current set
of advisors, while representing an interesting beginning, is still relatively simple:
for instance, the “slicing” advisor described in the previous section can suggest
only one slicing operation at a time, and hence only one of the two slicing
operations implicit in Kepler’s parsing of the icosahedron shown in Figure 3
earlier. It would thus be desirable to extend even the current set of advisors so
that they could suggest richer variations or combinations of polyhedral opera-
tions: just to take one more example, an advisor could suggest multiple vertex
truncation operations instead of only one (as in our current implementation of the
truncation advisor).
These are short-term goals for improvement of our existing system.
Currently we are also at work devising computational models whose purpose
is to provide a meaningful metric of the “visualizability” of solids — the
difficulty or ease with which a shape is likely to be imagined by a typical
student. As part of this effort, we have conducted psychological experiments to
Acknowledgments
This work is supported in part by the National Science Foundation under awards no. REC-961396
and CDA-9616444, by a Young Investigator award (IRI-9258684), and by the National Science
Foundation and the Advanced Research Projects Agency under Cooperative Agreement No.
CDA-9408607. The second author is supported by a fellowship from the National Physical Science
Consortium.
Notes
1. See also the many fine papers in Sutherland and Mason (1995) around this general theme.
2. More detail may be found by following the HyperGami link from the first author’s World Wide
Web page: www.cs.colorado.edu/~duck/
3. Again, see (Eisenberg, Nishioka, and Schreiner 1997) for more details.
Using Spatial Semantics to Discover and Verify Diagrammatic Demonstrations of Geometric Propositions

Robert K. Lindsay
University of Michigan
Introduction
The Logic Theorist (LT) of Newell, Shaw & Simon (1957) introduced a
computational style, the first employed in artificial intelligence. The program
sought proofs to theorems of the sentential calculus, an unquantified formal
logic. (An unquantified logic lacks the universal quantifier “for all x” and the
existential quantifier “there exists an x” present in a full predicate logic.) Axioms
and theorems were represented as uninterpreted “sentences,” that is, strings of
characters selected from a finite alphabet. The logical rules of inference, also
represented as character strings, can be applied to sentences to produce other
sentences. A proof is a sequence of rule applications that, when applied to an
initial set of sentences (axioms and previously-proved sentences), yields a
sentence that is the conclusion of the theorem. To discover this sequence of rule
applications, the program determined how the goal and givens differed and
found rules that would reduce that difference.
LT represented the mathematical system in a way that mimicked its standard
printed structure, namely one-dimensional objects (strings) with an implicit
hierarchical phrase-structure imposed by the formation rules (the grammar) of the
language. For example, one axiom-sentence is “(p or p) implies p” in which the
connectives “or” and “implies” could be written as single characters since they
have no internal structure, but are here written as English words for clarity. The
symbol “p” in this sentence is a variable that stands for any well-formed
sentence of the logic. The reader may think of sentences as being either true or
false and interpret the logic as showing how the connectives (interpreted as
logical operations) can combine true sentences to yield other true sentences.
However, LT had no such understanding and merely manipulated sentences
according to its rules. Parentheses indicate phrase grouping in the usual way
(e.g., “implies” connects two subphrases in the axiom above), but again this is
our interpretation, not LT’s. The rules of the calculus formally define the legal
transformations of sentences that result in (i) legal substitutions of sentences for
variables, (ii) the replacement of a connective by its definition, and (iii) detach-
ment, i.e., the rule of modus ponens (“p” together with “p implies q” allows one
to conclude “q”). Logics and other fully formal branches of mathematics use
these representations — a linear surface structure with an implicit hierarchical
phrase structure — extensively. The rules used by the Logic Theorist maintained
the syntactic constraints: they applied only to well-formed sentences and each
sentence generated was well-formed by definition.
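As a toy illustration of rules (i) and (iii), the following sketch represents sentences as nested tuples; the encoding is ours for exposition, not Newell, Shaw and Simon's.

def substitute(sentence, var, value):
    """Rule (i): substitute a well-formed sentence for every occurrence of a variable."""
    if sentence == var:
        return value
    if isinstance(sentence, tuple):
        return tuple(substitute(part, var, value) for part in sentence)
    return sentence

def modus_ponens(p, implication):
    """Rule (iii): from p and (p implies q), detach q; otherwise return None."""
    if implication[0] == "implies" and implication[1] == p:
        return implication[2]
    return None

axiom = ("implies", ("or", "p", "p"), "p")        # "(p or p) implies p"
instance = substitute(axiom, "p", "q")            # "(q or q) implies q"
print(modus_ponens(("or", "q", "q"), instance))   # 'q'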
This logic is said to be uninterpreted because the variables do not refer to
real or even abstract objects or sets of objects or any things that are related or
interacting. As noted above, as far as LT was concerned they did not even refer
to truth values. In the case of uninterpreted sentences such as those of the
sentential calculus the computational style of the Logic Theorist seems quite
natural and powerful. However, for interpreted systems such as predicate logic,
whose sentences are normally meant to refer to sets of objects with various
properties and structure, or geometry, where the sentences are intended as
descriptions of real objects in space, or most other fields of mathematics that are
“about” something, one might also represent these interpretations, i.e., the
semantics, and bring those constraints to bear on proof discovery.
Another early system, the Geometry Machine (Gelernter, 1959; Gelernter,
Hansen & Loveland, 1960) employed essentially the method of LT, but intro-
duced semantics in a limited way, by employing a heuristic to prune the set of
rule-application options to be considered. A “diagram” illustrating the theorem
was constructed by the programmer and supplied to the Geometry Machine as a
list of points and lines, represented as coordinate pairs and endpoint pairs,
respectively. The program could then determine the falsity of certain conjectures,
vis-a-vis this supplied interpretation, by examining the diagram. For example,
when it conjectured that two triangles were congruent, their perimeters in the
diagram were compared in the obvious way, by numerical computation. If the
perimeters differed, no attempt was made to prove congruency.
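The pruning heuristic is easy to sketch. Assuming the diagram supplies each triangle as three coordinate pairs (as in the Geometry Machine's point-and-line lists), a perimeter comparison rejects refuted congruence conjectures before any proof is attempted.

from math import dist

def perimeter(tri):
    a, b, c = tri
    return dist(a, b) + dist(b, c) + dist(c, a)

def worth_attempting_congruence(tri1, tri2, eps=1e-6):
    """Prune the conjecture when the diagram already refutes it."""
    return abs(perimeter(tri1) - perimeter(tri2)) < eps

t1 = [(0, 0), (4, 0), (0, 3)]
t2 = [(10, 0), (14, 0), (10, 3)]                 # a translated copy in the diagram
print(worth_attempting_congruence(t1, t2))       # True: go ahead and try the proof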
Subsequent to the Geometry Machine, other work has been done in an
attempt to incorporate semantic interpretations into geometry theorem proving.
The goal of such work has in part been to reduce the space of formal operations
in such a way that the structural properties of space (distance, direction, etc.) are
maintained. In addition, other constraints, called situation constraints, may be
stated and enforced. Situation constraints are not simply properties of space, but
are assertions about a particular diagram, such as that two specific line segments
must always be of the same length. The program can be instructed to construct
additional elements of the diagram, and can be instructed to manipulate existing
features, for example by translating an object in a given direction or rotating an
object about a given point. These manipulations are called simulation construc-
tions, and are accomplished by an algorithm that makes incremental changes
while maintaining all constraints. Finally, the program can look at what it has
constructed and what results from its manipulations and can detect certain newly
emerged properties, such as that two objects have become superimposed, or that
a new polygon has emerged by the conjoining of previously disconnected line
segments.
Thus the primitive operations of this system are not processes that recog-
nize, analyze, and manipulate linear-hierarchical formal sentences while main-
taining their grammatical structure, but are perceptual and cognitive (simulated
motor) processes that operate on structures that represent and maintain the
semantic structure of two-dimensional space.
Given this functionality, it has been possible to “demonstrate” to the
program certain propositions of geometry. Many of these were taken from the
interesting compendia by Loomis (1940) and Nelsen (1993). Examples may be
found in Lindsay (1998).
The result of this work is a system that can understand geometry in a
limited sense. It can verify that performing the manipulations while honoring the
spatial structure and maintaining the imposed situation constraints results in a
representation that establishes the truth of a proposition (such as that the
hypotenuse square of a right triangle has area equal to the sum of the areas of
the squares on the other two sides). The program’s understanding of the generali-
ty of such demonstrations is limited since it is based on observation and manipu-
lation of a particular instance of a diagram. That problem is addressed by other
computational methods that I will not describe here (see Lindsay 1998).
Discovering demonstrations
[Figure 1. The diagram for the determinant demonstration: a parallelogram with vertices a = (0, 0), b = (A, B), c = (A+C, B+D) and e = (0, D), with f = (A, 0) and further construction points labelled d, g, h, j, k, l and n.]

The determinant

    | A  B |
    | C  D |  =  AD − BC

is the difference in the areas of the two rectangles. The proposition as stated by
Nelsen is that the value of this determinant equals the area of the large parallelo-
gram. Transforming this statement to eliminate the minus sign yields the target
proposition stated above.
Let us call the method now to be described Discover-1. It also works in
exactly the same form for other demonstrations that depend upon establishing an
area equivalence, including other examples in Lindsay (1998).
Discover-1 first attempts to see if it can verify the target from what it
knows already. It does this by consulting a list of known area-equivalences
(which initially is empty) and attempting to combine them by substitutions of
equivalences and by using the properties of symmetry and transitivity of the
equivalence relation. The knowledge of the properties of the equivalence relation
and of substitution of equals is built into the program and is not related to its
knowledge of space. Without such extra-spatial knowledge none of these “proofs
without words” could be understood (or even verified). Euclid's geometric axiom
system includes such properties as axioms (his “common notions”), and
the axiom system of Hilbert explicitly presupposes such properties as well. Thus
predicative (“verbal”) reasoning is an essential component of understanding these
demonstrations. It is also essential for understanding geometry more generally,
since predications are required for the statement of general conclusions.
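This combination of substitution, symmetry and transitivity can be sketched as a search over multisets of object names whose areas are asserted to sum equally. The representation and the names below are hypothetical; the program itself operates over its own assertions rather than strings.

from collections import Counter, deque

def rewrites(expr, equivs):
    """One substitution step: if either side of a known equivalence is contained
    in expr, replace that side by the other (symmetry is built in)."""
    out = []
    for lhs, rhs in equivs:
        for a, b in ((lhs, rhs), (rhs, lhs)):
            if all(expr[k] >= n for k, n in a.items()):
                out.append(expr - a + b)          # multiset difference, then union
    return out

def follows(target_lhs, target_rhs, equivs):
    """Transitivity via breadth-first closure over single rewrites."""
    start, goal = Counter(target_lhs), Counter(target_rhs)
    seen, queue = {frozenset(start.items())}, deque([start])
    while queue:
        expr = queue.popleft()
        if expr == goal:
            return True
        for nxt in rewrites(expr, equivs):
            key = frozenset(nxt.items())
            if key not in seen:
                seen.add(key)
                queue.append(nxt)
    return False

# area(ade) = area(adk) + area(dek), and adk + dek = square_s, so ade = square_s.
equivs = [(Counter(["ade"]), Counter(["adk", "dek"])),
          (Counter(["adk", "dek"]), Counter(["square_s"]))]
print(follows(["ade"], ["square_s"], equivs))     # True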
Discover-1 next examines each side of the target proposition equation. Each
object mentioned on either side is considered in turn. For each of these objects,
every possible decomposition into a set of other objects is detected, such that the
set exactly covers the initial object without overlap. For example, triangle ade
can be decomposed into triangle adk plus triangle dek.
These computations are not done algebraically, but make use of the pixel
representation and its underlying eight-neighborhood relations that permit one to
find the immediate neighbors of a given pixel. The method is a “color-spreading”
iteration; see Ullman (1985). The interior of an object is marked by first labeling
the border with one “color”, say red. (This merely means that the associated
pixels are marked with a “red” tag; when the diagram is displayed on a color
monitor they in fact show as red.) Starting with an interior point of the object,
that point is colored a different color, say orange, and each neighbor is exam-
ined. If a neighbor is border-colored or interior-colored already, it is ignored.
Other neighbors are colored the interior-color and added to the list of points
whose neighbors remain to be examined. The iteration continues until no more
pixels remain on that list. Notice that this procedure works for closed objects of
any shape, including non-convex polygons, circles, and any closed curve, such
as Figure 2.
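The colour-spreading iteration can be sketched directly. Here the diagram is assumed to be a dictionary from pixel coordinates to colour tags, with the border already tagged; both the representation and the tags are illustrative.

NEIGHBOURS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
              if (dx, dy) != (0, 0)]                      # the eight-neighbourhood

def colour_spread(pixels, seed, interior="orange", border="red"):
    """Mark the interior of a closed object, starting from one interior point."""
    todo = [seed]
    pixels[seed] = interior
    while todo:
        x, y = todo.pop()
        for dx, dy in NEIGHBOURS:
            p = (x + dx, y + dy)
            if p in pixels and pixels[p] not in (border, interior):
                pixels[p] = interior                      # colour it and remember it
                todo.append(p)
    return pixels

grid = {(x, y): None for x in range(5) for y in range(5)}
for i in range(5):                                        # a closed square border
    for p in ((i, 0), (i, 4), (0, i), (4, i)):
        grid[p] = "red"
colour_spread(grid, (2, 2))
print(sum(v == "orange" for v in grid.values()))          # 9 interior pixels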
It should be pointed out that the discovery of an interior point is an
interesting problem itself. This is a problem that human perception appears
readily able to solve, provided the object is not too complex. Exactly what makes
an object too complex for human perception is a matter for empirical study, but
it is easy to find non-trivial examples such as Figure 3. Unlike the simple cases
of Figures 1 and 2, where a person can immediately determine if a point is
within a given object, examining Figure 3 requires a slower, sequential analysis,
perhaps aided by pointing and tracing with a pencil.
Here Discover-1 relies on the fact that its objects are polygons. It discovers
interior points by examination of several candidates such as the intersection of
diagonals (which always works for non-degenerate convex polygons). The
general problem of finding an interior point cannot be solved by Discover-1, and
is indeed a subtle perceptual computation in the general case if the representation
is a pixel array.
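The diagonal-intersection candidate is straightforward for convex polygons given as ordered vertex lists with at least four vertices (for a triangle the vertex centroid would serve instead); the code below is a sketch under that assumption.

def intersect(p1, p2, p3, p4):
    """Intersection point of the (assumed non-parallel) lines p1p2 and p3p4."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

def interior_point(poly):
    """The diagonals v0v2 and v1v3 have interleaved endpoints, so in a
    non-degenerate convex polygon they cross at an interior point."""
    return intersect(poly[0], poly[2], poly[1], poly[3])

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(interior_point(square))                             # (2.0, 2.0)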
The next step in finding decompositions is to determine which other objects
are contained within the putative container object. This is true of a polygon if its
vertices are now either the interior-color or the border-color. This is not a
sufficient condition for complete containment (consider a non-convex container),
but partially overlapping false candidates are eliminated in the following steps.
Next, the collection of all subsets of the set of contained objects is enumerated.
Figure 2. How many closed curves are there, 0, 1, 2, or 3? Which of x, y, and z are within
one of them?
For example if there are three objects a, b, and c, there are seven non-null
subsets: {a}, {b}, {c}, {a,b}, {a,c}, {b,c}, and {a,b,c}. In general, with n objects
there are 2^n − 1 non-null subsets. Each of these subsets may or may not exactly
decompose the initial object. There may be a very large number of such subsets
to try. For example, there are nine objects that are at least partly within parallelo-
gram bcda, yielding 511 subsets to examine (excluding the null set). To cull this
large number, Discover-1 first considers the nine objects pair-wise and deter-
mines which pairs overlap; any two objects that overlap cannot both be part of
an exact decomposition. Testing for overlap is done with a color-spreading
procedure also. First the interior of one member of the pair is colored a distinct
color, say yellow, and then the second interior is colored differently, say green,
but coloring stops if yellow is encountered. Culling the 511 subsets mentioned
above in this way leaves only 38 to be checked for exact fit.
This last step is done by tracing the boundary of the putative container and
then tracing the boundaries of the subset of partially contained objects, halting if
Figure 3. How many closed curves are there, 0, 1, 2, or 3? Which of x, y, and z are within
one of them? Inspired by Minsky and Papert (1969).
the number of retracings is inconsistent with exact decomposition. The details are
given in Lindsay (1998).
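The culling and checking steps can be sketched as follows. The object names echo the triangle ade example above; area sums stand in here for the boundary-tracing exact-fit test, so this illustrates the enumeration, not the tracing itself.

from itertools import combinations

def exact_decompositions(container_area, areas, overlaps):
    """Enumerate non-null subsets, skip any containing an overlapping pair,
    and keep those whose areas sum to the container's area."""
    names = list(areas)
    bad = {frozenset(p) for p in combinations(names, 2) if overlaps(*p)}
    hits = []
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            if any(frozenset(p) in bad for p in combinations(subset, 2)):
                continue               # overlapping objects cannot both take part
            if abs(sum(areas[n] for n in subset) - container_area) < 1e-6:
                hits.append(subset)
    return hits

areas = {"adk": 6.0, "dek": 4.0, "adek": 7.0}             # candidate pieces of ade
overlaps = lambda a, b: "adek" in (a, b) and a != b       # adek overlaps the others
print(exact_decompositions(10.0, areas, overlaps))        # [('adk', 'dek')]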
The result of these steps is to produce a set of area equivalences of the
form: the area of the container equals the sum of the areas of the contained.
With these additions to the list of known area equivalences, a test is again made
to see if the target proposition follows from these by substitution, symmetry, and
transitivity.
If this fails, as it does in this example, Discover-1 determines all possible
congruencies that it sees in the diagram. For example, each pair of triangles is
considered in turn. The program determines if one can be rotated and translated
onto the other. If so, it generates the program that could demonstrate this by
using the simulation construction algorithm. This program could actually be
executed to effect the demonstration. However, here it is assumed that the
program at this stage of its education knows that two triangles are congruent if
they have sides that are equal pair-wise in the same order. (If they are equal but
not in the same order they are still in fact congruent but one of them would have
to be flipped in three-dimensional space to achieve coincidence, and Discover-1
does not know this because it does not know about three-dimensional space, so
it cannot generate a simulation-construction program to do this and fails to detect
this form of congruency.)
Every pair of congruent figures yields an area equivalence statement, and
these are added to the list of known equivalences. In our example, this yields
enough information to verify the target proposition.
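The side-matching test can be sketched as below, with triangles as ordered vertex triples. The signed-area check is our way of operationalising "same order": it admits only pairs related by rotation and translation, never by flipping.

from math import dist

def sides(tri):
    a, b, c = tri
    return (dist(a, b), dist(b, c), dist(c, a))

def signed_area(tri):
    (x1, y1), (x2, y2), (x3, y3) = tri
    return 0.5 * ((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))

def congruent_same_order(t1, t2, eps=1e-6):
    if signed_area(t1) * signed_area(t2) < 0:
        return False                   # opposite windings: a flip would be needed
    s1, s2 = sides(t1), sides(t2)
    return any(all(abs(x - y) < eps for x, y in zip(s1, s2[k:] + s2[:k]))
               for k in range(3))

t1 = [(0, 0), (4, 0), (0, 3)]
t2 = [(1, 1), (5, 1), (1, 4)]          # a translated copy
print(congruent_same_order(t1, t2))    # True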
Discussion
Discover-1 works both forward and backward, beginning with the goal proposi-
tion but then working from all present figures to see what inferences can be
made. This proves to be a successful strategy for these demonstrations. However,
Discover-1 is tailored to the kinds of operations that are useful in this and related
demonstrations, those involving establishing a relation among areas, such as in
the determinant problem and the Pythagorean Theorem. It is not a general
method for discovering demonstrations.
Different methods, Discover-2, Discover-3, and so forth, will be needed for
different classes of problems. What these methods will have in common is that
they each will construct sequences of reasoning that reduce to manipulations of
a diagram, that is, they work on the semantics of the diagram rather than the
syntax of formal propositions. Again, this is not to say that they do not also use
predicative reasoning or are not descriptive.
A more subtle question is whether these semantic methods employ predica-
tive reasoning in the diagrammatic manipulations themselves. Here again, the
answer is yes, although the predications are related to the semantics of space
rather than to the algebra of equivalence relations. I call the reasoning of
ARCHIMEDES (the program described above) diagrammatic not to allege that it
is totally non-predicative, but because the
structure of space is maintained by representations of diagrams, and this structure
is used in substantive ways to make inferences.
Not all mental processes should be characterized as predicative. For example,
the sophisticated and skilled perceptual-motor processes exhibited by most
mammals are more appropriately characterized as temporally continuous process-
es, albeit processes that on some occasions yield a result that can be stated as a
predication. Some manipulations of diagrams may appropriately be described as
non-predicative perceptual-like processes. However, reasoning refers to a special
class of cognitive mental processes that, by definition, employ explicit predications.
In most if not all cases these predications include generalizations, certainly so in
the case of mathematics. Therefore if one insists that diagrammatic computations
they do such things as find interior points and area decompositions. Each of
these functions could be realized with a variety of methods. For example, curves
might be described algebraically and locations of points with respect to these
curves could be computed numerically or symbolically. In ARCHIMEDES they
were implemented by edge-following and color-spreading methods. Color-
spreading computations have the virtue that they apply to arbitrary closed curves
as readily as to polygons, and that is consonant with human performance even as
measured informally by self-observation. Thus color-spreading methods bear a
certain degree of psychological plausibility that algebraic methods do not,
although this in and of itself offers minimal support for psychological claims.
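Purely as an illustration (not the ARCHIMEDES implementation), here is a minimal sketch of the color-spreading idea, assuming a finite pixel grid and a 4-connected spread that stops at edge pixels; all names and the grid bound are invented.

import qualified Data.Set as Set

type Pixel = (Int, Int)

-- Spread a "color" from a seed pixel to 4-connected neighbours,
-- stopping at the drawn curve (the edge pixels) and the grid bounds.
-- The returned set is the colored interior region.
spread :: Set.Set Pixel -> Pixel -> Set.Set Pixel
spread edges seed = go [seed] Set.empty
  where
    go [] colored = colored
    go (p:ps) colored
      | p `Set.member` colored || p `Set.member` edges = go ps colored
      | not (inBounds p)                               = go ps colored
      | otherwise = go (neighbours p ++ ps) (Set.insert p colored)
    neighbours (x, y) = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    inBounds (x, y) = x >= 0 && x < 100 && y >= 0 && y < 100  -- assumed grid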
Having selected edge-following and color-spreading as the methods that
implement the interior finding and decomposition functions, one may still choose
among a variety of computational styles that follow edges and spread colors. In
the present work these have been implemented as pixel-array operations on a
digital computer using a list processing language. Once an implementation has
been made the operations have a certain efficiency profile (i.e., solution-time for
each problem). The same function and method (e.g., area-decomposition by
color-spreading) could be implemented in different ways on different computer
architectures, say an array processor, neural net, or brain. A fully implemented
psychological theory would be tested primarily by comparing its efficiency
profile with that of humans, using the usual methods of empirical psychology.
For example, it is not clear why finding interiors by color-spreading should
be more difficult for complex stimuli than for simple ones (compare Figure 2
and Figure 3), as alluded to above. And yet humans, even when working from
a physical display, readily get lost on complex figures and are aided by actual
pencil shading or tracing analogous to the calculations of the color spreading
method. When working from mental images rather than physical diagrams,
humans have much greater difficulty keeping track of what has been examined,
and yet are able to do so for at least intermediate levels of complexity. A good
psychological model would posit implementations that separate the spreading
from the bookkeeping aspects of these methods and show how they respond
differentially to memory loads.
Present knowledge from cognitive neuroscience gives little assistance in determining how such algorithms could be embodied in the brain. To the extent that
the implementations are plausible as putative psychological processes by virtue
of a reasonable efficiency profile, they might serve as a guide for neuroscience
explorations. Thus it is conceivable, as the methods of cognitive neuroscience
develop, that one could seek evidence that the brain does or does not implement
these methods in specific ways with specific neural structures and functions.
Acknowledgments
This material is based on work supported by the United States National Science Foundation under
Grant No. IRI-9526942.
Introduction
tions, most of them expressing spatial relations (Johnson 1987). Image schemata
(Johnson 1987; Lakoff 1987) are the fundamental experiential elements from
which spatial meaning is constructed, but so far image schemata have mostly
resisted formal descriptions. This paper shows exemplar formalizations of image
schemata important in the geographic context (PATH) and in tabletop space
(CONTAINER, SURFACE). This investigation is, therefore, part of the quest for
naive or commonsense physics (Hayes 1978, 1985; Hobbs & Moore 1985) and
in particular for “Naive Geography” (Egenhofer & Mark 1995).
The next section argues why the formalization of spatial relations in
geographic space is crucial for further advances in the standardization and
interoperability of GIS. In Section 3 the specification of image schemata is
discussed and Section 4 describes methods to formalize image schemata.
Section 5 gives a comprehensive method — built upon linguistics — to discover
and formally describe image schemata. Section 6 explains exemplar image
schemata for geographic and tabletop space (i.e., PATH, CONTAINER, SURFACE) and presents their formalizations. Section 7 presents conclusions,
discusses open questions, and suggests directions for further research.
The spatial domain — in which GIS facts are situated — is fundamental for human living and one of the major sources of human experience (Barrow 1992). Human language exploits the commonality of spatial experience among people and uses spatial situations metaphorically to structure purely abstract situations in order to communicate them (Lakoff & Johnson 1980; Johnson 1987). The formalization of spatial relations has, therefore, been an active area of research at least since 1989 (Mark et al. 1995).
Topological relations between simply connected regions were treated in
(Egenhofer 1989) and extensive work has followed from this (Egenhofer 1994).
Metric relations between point-like objects, especially cardinal directions (Frank 1991a; Frank 1991b; Freksa 1991; Hernández 1991) and approximate distances (Frank 1992; Frank 1996b; Hernández et al. 1995), were discussed. Other efforts dealt
with orderings among configurations of points (Schlieder 1995) and formal
descriptions of terrain and relations in terrain (Frank et al. 1986), but formal
methods were also used to formally describe the working of administrative
systems — e.g., cadastre (Frank 1996a). Linguists have made systematic efforts
to clarify the meaning of spatial prepositions (Herskovits 1986, 1997; Lakoff
1987). However, it remains an open question how to combine these interesting
Johnson (1987) proposes that people use recurring, imaginative patterns — so-
called image schemata — to comprehend and structure their experiences while
moving through and interacting with their environment. Image schemata are
supposed to be pervasive, well defined, and full of sufficient internal structure
to constrain people’s understanding and reasoning. They are more abstract than
mental pictures and less abstract than logical structures because they are constantly operating in people’s minds while people are experiencing the world
(Kuhn & Frank 1991). An image schema can, therefore, be seen as a very
generic, maybe universal, and abstract structure that helps people to establish a
connection between different experiences that have this same recurring structure
in common.
Despite efforts, success in specifying spatial image schemata has been limited.
An early paper (Kuhn & Frank 1991) gave algebraic definitions for the CONTAINER (“in”) and SURFACE (“on”) schemata for a discussion of user
interface design. At the level of detail and for the purpose of the paper, the two
specifications were isomorphic. A recent effort by Rodríguez and Egenhofer
(1997) introduced more operations and differentiated the CONTAINER schema
from the SURFACE schema for small-scale space, using operations such as
remove, jerk, and has_contact, and compared the application to objects in small-
scale and large-scale (geographic) space. Raubal et al. (1997) presented a
methodology based on image schemata to structure people’s wayfinding tasks.
Image schemata were represented in the form of predicates in which the predicate name referred to the image schema and the argument(s) referred to the
object(s) involved in the image schema (see also Raubal 1997).
In a recent paper (Frank 1998), formal descriptions of the small-scale space image schemata CONTAINER, SURFACE, and LINK were given and some of the methodological difficulties were reviewed. The large-scale space image schemata LOCATION, PATH, REGION, and BOUNDARY were treated in (Frank & Raubal 1998).
The concept of image schemata is not well defined in the cognitive and linguistic
literature (Lakoff & Johnson 1980; Johnson 1987; Lakoff 1987). Researchers in
the past have used a working definition that implied that image schemata
describe spatial (and similar physical) relations between objects. Most have
concentrated on spatial prepositions like “in”, “on”, etc., and assumed that these
relate directly to the image schemata (Freundschuh & Sharma 1996; Raubal et al.
1997; Raubal 1997).
Image schemata are seen as fundamental and independent of the type of
space and spatial experience. But a single schema can appear in multiple, closely
related situations. For example, “in” is used for a bowl of fruit (“Der Apfel ist
in der Schale.” — “The apple is in the fruit bowl.”), but also for closed containers (“Das Geld ist im Beutel.” — “The money is in the purse.”). “Prototype
effects” as described by Rosch (1973a,b, 1978) also seem to apply. For example,
different levels of detail can be selected to describe the same image schema.
Predicate calculus
Relations calculus
The behavior of topological relations (Egenhofer 1994; Papadias & Sellis 1994),
but also cardinal directions and approximate distances (Hernández et al. 1995;
Freksa 1991; Frank 1992; Frank 1996b) can be analyzed using the relations
calculus (Schröder 1895; Tarski 1941; Maddux 1991). Properties of relations are
described as the outcome of the combination (the “;” operator) of two relations.
The description abstracts away the individuals related (in comparison to the
predicate calculus) and gives a simple algebra over relations. This leads to
succinct and easy-to-read tables, as long as the combination of only a few
relations is considered.
a (R;S) c = aRb and bSc, for some b
for example: North;NorthEast = {North or NorthEast}
meet;inside = {inside, covered, overlap}
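As an illustration of the “;” operator, the following small sketch represents relations extensionally as lists of pairs over a finite domain; Rel, compose and conv are invented names, not part of the cited formalism.

import Data.List (nub)

type Rel a = [(a, a)]

-- a (R;S) c holds iff there is some b with aRb and bSc.
compose :: Eq a => Rel a -> Rel a -> Rel a
compose r s = nub [ (a, c) | (a, b) <- r, (b', c) <- s, b == b' ]

-- The converse of a relation.
conv :: Rel a -> Rel a
conv r = [ (b, a) | (a, b) <- r ]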
Functions
Functions are more appropriate to capture the semantics of image schemata with
respect to operations. Relation composition is replaced by function composition
(the “.” operator). In order to use this notation flexibly, a “curried” form of
function writing must be used (Bird & Wadler 1988; Bird & de Moor 1997).
(f . g) x = f (g x).
Function composition can be described by tables as well, but these grow even
faster than relation composition tables. Axiomatic descriptions as algebras are
more compact but also more difficult to read.
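In curried style the “.” operator is ordinary function composition; a trivial, self-contained example (illustrative name):

-- (f . g) x = f (g x): the inner function applies first.
addThenDouble :: Int -> Int
addThenDouble = (* 2) . (+ 1)
-- addThenDouble 3 == 8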
Model-based
A model of the scene is constructed and used for reasoning — there is some
evidence that this is also one of the methods humans apply (Knauff et al. 1995).
A fundamental set of operations to construct any possible state of the model and
a sufficient number of “observe” operations to differentiate any of these states
are provided. In addition, more complex operations can be constructed using the
given operations.
The simplest model is to use the constructors of the scene directly and to
represent each scene as the sequence of constructors that created it (Rodríguez
1997). This gives a (possibly executable) model for a functional or relation-oriented description.
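A minimal sketch of this idea follows, with invented constructors and an invented observer (none of these names come from Rodríguez 1997):

-- A scene represented as the sequence of constructor applications
-- that created it; "observe" operations are functions over the history.
data Constructor = MakePlace String          -- introduce a place
                 | MakePath  String String   -- direct path between places
                 | PutPerson String String   -- person moved to a place
  deriving Show

type Scene = [Constructor]

-- Observe where a person currently is: the last recorded move wins.
isAt :: String -> Scene -> Maybe String
isAt person scene =
  case [ place | PutPerson p place <- scene, p == person ] of
    [] -> Nothing
    ps -> Just (last ps)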
Such models can be ontological — modeling some subset of the existing
world — or they can be epistemological — modeling exclusively the human
conceptualization of the world. More than one epistemological view can follow
from an ontological model.
Tools used
Mark and Frank (1996) showed how image schemata can be deduced from
natural-language expressions describing geographic situations. The image schema
that has been in the speaker’s mind while making a statement can be inferred
from the preposition (e.g., in, on, under) used (Mark 1989). The same approach
was also used by Freundschuh & Sharma (1996), Raubal et al., (1997) and Frank
(1998). A number of restrictions and assumptions are necessary to make progress
with this line of investigation:
Assumption of polysemy
A single word may have multiple meanings (e.g., the English word “spring” can
be the verb “to jump”, a season, a source, etc.). We assume that polysemy helps
to initially separate what are potentially different meanings of a word for
formalization. If the meanings are the same after formal description is achieved,
the assumed polysemy can be dropped.
We assume that image-schematic details for geographic space are separate from
image-schematic details for small-scale space (Montello 1993; Couclelis 1992).
Some of Johnson’s (1987) suggested image schemata use terminology from
geographic space (e.g., PATH), others suggest that the same image schema (e.g.,
SURFACE) is used for different types of spaces. If the same terminology is
used, we assume here — for methodological reasons — polysemy.
The examples given here are in German (with English translations), as German is the authors’ native language; the results can be compared with the English-language situation and some differences observed (Herskovits 1986; Montello 1995). The language examples are the driving force here, and the concentration is on the epistemology.
names, and all relations are known or inferred from the image schemata. This is
all typically not the case in natural human discourse.
A path connects places. We differentiate between the simple “direct path” and
the “indirect path”, which consists of a sequence of “direct paths.” At this level,
different types of paths are not differentiated (i.e., no particulars of railways,
highways, etc., are considered).
A direct path connects locations directly, without any intervening location
(at the level of detail considered). A direct path has a start and an end location
(Figure 1a). At this level of detail there is no need to model ‘path’ as an object; it is just a relation between two places (path (a, b)).
Es gibt einen Weg von Wien nach Baden.
There is a path from Vienna to Baden.
For this environment (but not for a city with one-way streets) the path relation
is symmetric (Figure 1b):
ind-path (a, b) = [path (a, a1) & path (a1, a2) & path (a2, …) & … & path
(…, bn) & path (bn, b)]
conv (ind-path) = ind-path
The indirect path is derived using transitive closure. The details of the algorithm must deal with the cyclic and bi-directional graphs formed by path networks, and are well known from shortest-path algorithms (Dijkstra 1959; Sedgewick 1983).
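A small sketch of ind-path as the symmetric-transitive closure of the direct-path relation over a finite set of places is given below; the names are illustrative, and a weighted shortest-path algorithm such as Dijkstra’s would replace the naive iteration for actual route finding.

import Data.List (nub)

type Place = String
type Path  = (Place, Place)

-- Symmetric closure: conv path = path in this environment.
sym :: [Path] -> [Path]
sym ps = nub (ps ++ [ (b, a) | (a, b) <- ps ])

-- Transitive closure by iterating composition to a fixed point;
-- this terminates because the set of places is finite.
indPath :: [Path] -> [Path]
indPath = close . sym
  where
    close r =
      let r' = nub (r ++ [ (a, c) | (a, b) <- r, (b', c) <- r, b == b' ])
      in if length r' == length r then r else close r'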
Persons (and other autonomous and movable objects)
Persons move to places and are then “in” the place, unless they move further:
Er ist nach Györ gefahren; jetzt wartet er dort auf dich.
He went to Györ; now he is waiting there for you.
scene2 = move (place2, scene1) ⇒ isIn (place2, scene2)
If a person is found “in” place p1 at time t1 and place p2 at time t2 one can
deduce a move (Figure 3):
Figure 3. Move
Simon war letzte Woche in der Steiermark; jetzt ist er wieder in Wien.
Ist er am Samstag oder am Sonntag nach Hause gefahren?
Last week Simon was in Styria; now he is back in Vienna.
Did he drive home on Saturday or Sunday? (move inferred in the time in-
between)
Figure 4. Cube is in box. Not permitted: paste label on cube
Formal model
A function composition model can be constructed, and the rules listed are directly coded. The central operation “move” is shown below for one example, with the arguments: relation type, object, target, scene:
move i a b s =
    if fRel' In b s          -- rule 7.2: in blocks target of movement
    then error ("in blocked: already in")
    else …                   -- continuation of the definition elided in the original
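Since the excerpt above is only a fragment, a self-contained, hedged reading follows; the Scene encoding, the helper fRel' and the update in the final case are assumptions for illustration, not the full rule set.

data RelType = In | On deriving (Eq, Show)

type Obj   = String
type Scene = [(RelType, Obj, Obj)]   -- (relation, figure, reference) facts

-- fRel' r b s: is object b currently the figure of relation r in scene s?
fRel' :: RelType -> Obj -> Scene -> Bool
fRel' r b = any (\(r', f, _) -> r' == r && f == b)

-- move: relation type, object, target, scene (argument order as above).
move :: RelType -> Obj -> Obj -> Scene -> Scene
move i a b s
  | fRel' In b s = error "in blocked: already in"        -- rule 7.2
  | otherwise    = (i, a, b) : filter (\(_, f, _) -> f /= a) s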
Methodological
The method used here is borrowed from linguistics. For linguistic demonstrations, a single utterance that is acceptable to a native speaker is sufficient to demonstrate the existence of a construct. Is a single commonsense reasoning
chain as given here sufficient? It documents that at least one situation exists
where the suggested spatial inference is made — thus it demonstrates at least one
aspect of a spatial relation in (one human’s) cognition. In order to verify the universality of such spatial inference mechanisms, extended human-subjects testing among people with different native languages is needed.
Language-independent primitives
railway lines, cars cannot follow a footpath, etc., and similar restrictions apply in
other cases). Possibly, the current approach of trying to capture image schemata
with the definition of spatial prepositions is too limited. Raubal et al. (1997) used
prepositions and semantic connotation to investigate superimpositions of image
schemata. Another interesting approach is to look at affordances. Affordances
seem to be closely related to image schemata because both of these concepts
help people to understand a spatial situation in order to know what to do (Gibson
1979). Affordances might be operational building blocks of image schemata but
further research in this area is needed (Jordan et al. 1998).
Are image schemata the atoms of spatial cognition or are there smaller semantic
units from which image schemata can be composed? It appears as if there were smaller pieces from which the more complex image schemata could be built, but one could also argue that these smaller pieces are the image schemata proper.
Acknowledgments
Numerous discussions with Werner Kuhn, Max Egenhofer, David Mark, and Andrea Rodríguez have
contributed to our understanding of image schemata and their importance for GIS. We appreciate the
efforts of Roswitha Markwart to edit the manuscript. Damir Medak prepared the environment used
here for formalization. Funding from the Oesterreichische Nationalbank and the Chorochronos project
supported the base work underlying the formalization presented here.
References
Asperti, Andrea & Giuseppe Longo (1991). Categories, Types and Structures: An introduction to category theory for the working computer scientist. Cambridge, Mass.: MIT Press.
Barr, Michael & Charles Wells (1990). Category Theory for Computing Science. London:
Prentice Hall.
Barrow, John D. (1992). Pi in the Sky: Counting, thinking and being. Oxford: Clarendon
Press.
Bird, Richard & Oege de Moor (1997). Algebra of Programming. London: Prentice Hall.
Bird, Richard & Philip Wadler (1988). Introduction to Functional Programming. Hemel
Hempstead, UK: Prentice Hall International.
Buehler, Kurt & Lance McKee (Eds.) (1996). The OpenGIS Guide: An Introduction to
interoperable geoprocessing. Wayland, Mass., USA: The OGIS Project Technical
Committee of the Open GIS Consortium.
Campari, Irene & Andrew U. Frank (1995). Cultural differences and cultural aspects in
GIS. In Nyerges, T., D. Mark, R. Laurini & M. Egenhofer (Eds.), Cognitive Aspects
of Human-Computer Interaction for Geographic Information Systems (249–266).
Dordrecht, The Netherlands: Kluwer.
Chevallier, J. (1981). Land information systems: A global and system theoretic approach.
In proceedings of FIG International Federation of Surveyors, Montreux, Switzerland,
3, 301.2.
Clocksin, William F. & Christopher S. Mellish (1981). Programming in Prolog. Berlin:
Springer-Verlag.
Couclelis, Helen (1992). People manipulate objects (but cultivate fields): Beyond the
raster-vector debate in GIS. In Frank, A. U., I. Campari & U. Formentini (Eds.),
Theories and Methods of Spatio-Temporal Reasoning in Geographic Space [Lecture
Notes in Computer Science 639, 65–77]. Berlin: Springer-Verlag.
Dijkstra, Edsger (1959). A note on two problems in connection with graphs. Numerische
Mathematik, 1, 269–271.
Egenhofer, Max (1989). Spatial query languages. Ph.D. thesis, University of Maine,
Orono, USA.
Egenhofer, Max (1992). Why not SQL! International Journal of Geographical Information
Systems, 6(2), 71–85.
Egenhofer, Max (1994). Deriving the composition of binary topological relations. Journal
of Visual Languages and Computing, 5(2), 133–149.
Egenhofer, Max & David Mark (1995). Naive geography. In Frank, A. U. & W. Kuhn
(Eds.), Spatial Information Theory: A theoretical basis for GIS. Int. Conference
COSIT’95 [Lecture Notes in Computer Science 988, 1–15]. Berlin: Springer-Verlag.
Frank, Andrew U. (1991a). Qualitative spatial reasoning with cardinal directions. In proceedings of 7th Austrian Symposium on Artificial Intelligence, Vienna, Austria, 157–167.
Frank, Andrew U. (1991b). Qualitative spatial reasoning about cardinal directions. In Mark, D. & D. White (Eds.), proceedings of Auto-Carto 10, ACSM-ASPRS, Baltimore, 148–167.
Frank, Andrew U. (1992). Qualitative spatial reasoning about distances and directions in
geographic space. Journal of Visual Languages and Computing, 1992(3), 343–371.
Frank, Andrew U. (1994). Qualitative temporal reasoning in GIS: Ordered time scales.
Technical Report. Dept. of Geoinformation, Technical University Vienna.
Frank, Andrew U. (1996a). An object-oriented, formal approach to the design of cadastral
systems. In Kraak, M. & M. Molenaar (Eds.), proceedings of 7th Int. Symposium on
Spatial Data Handling, SDH’96, Delft, The Netherlands, 5A.19–5A.35.
Frank, Andrew U. (1996b). Qualitative spatial reasoning: Cardinal directions as an example. International Journal of Geographical Information Systems, 10(3), 269–290.
Johnson, Mark (1987). The Body in the Mind: The bodily basis of meaning, imagination,
and reason. Chicago: University of Chicago Press.
Jones, Mark (1991). An introduction to Gofer. Technical Report. Yale University, USA.
Jones, Mark (1994). The implementation of the Gofer functional programming system. Technical Report YALEU/DCS/RR-1030. Yale University, USA.
Jordan, Troy, Martin Raubal, Bryce Gartrell & Max Egenhofer (1998). An affordance-
based model of place in GIS. In proceedings of 8th Int. Symposium on Spatial Data
Handling, SDH’98, Vancouver, Canada. 98–109.
Knauff, Markus, Reinhold Rauh & Christoph Schlieder (1995). Preferred mental models
in qualitative spatial reasoning: A cognitive assessment of Allen’s calculus. In
proceedings of 17th Annual Conference of the Cognitive Science Society, Hillsdale,
NJ, 200–205.
Kuhn, Werner (1993). Metaphors create theories for users. In Frank, A. U. & I. Campari
(Eds.), Spatial Information Theory: A theoretical basis for GIS. European Conference
COSIT’93 [Lecture Notes in Computer Science 716, 366–376]. Berlin: Springer-
Verlag.
Kuhn, Werner (1994). Defining semantics for spatial data transfers. In Waugh, T. & R.
Healey (Eds.), proceedings of 6th Int. Symposium on Spatial Data Handling,
SDH’94, Edinburgh, U. K., 973–987.
Kuhn, Werner & Andrew U. Frank (1991). A formalization of metaphors and image-
schemas in user interfaces. In Mark, D. & A. U. Frank (Eds.), Cognitive and
Linguistic Aspects of Geographic Space [NATO ASI Series, 419–434]. Dordrecht:
Kluwer.
Kuhn, Werner & Andrew U. Frank (1999). The use of functional programming in the
specification and testing process. In Goodchild, M., M. Egenhofer, R. Fegeas & C.
Kottman (Eds.), Interoperating Geographic Information Systems [International Series
in Engineering and Computer Science]. Boston: Kluwer.
Lakoff, George (1987). Women, Fire, and Dangerous Things: What categories reveal
about the mind. Chicago: University of Chicago Press.
Lakoff, George & Mark Johnson (1980). Metaphors We Live By. Chicago: University of
Chicago Press.
Liang, S., Paul Hudak & A. M. Jones (1995). Monad transformers and modular interpreters. In proceedings of ACM Symposium on Principles of Programming Languages.
Maddux, R. (1991). The origin of relation algebras in the development and axiomatization of the calculus of relations. Studia Logica, 50(3–4), 421–455.
Mark, David (1989). Cognitive image-schemata for geographic information: relations to
user views and GIS interfaces. In proceedings of GIS/LIS’89, Orlando, Florida,
551–560.
Mark, David (1993). Toward a theoretical framework for geographic entity types. In
Frank, A. U. & I. Campari (Eds.), Spatial Information Theory: A theoretical basis for
GIS. European Conference COSIT’93 [Lecture Notes in Computer Science 716,
270–283]. Berlin: Springer-Verlag.
Mark, David, David Comas, Max Egenhofer, Scott Freundschuh, Michael Gould & Joan
Nunes (1995). Evaluating and refining computational models of spatial relations
through cross-linguistic human-subjects testing. In Frank, A. U. & W. Kuhn (Eds.),
Spatial Information Theory: A theoretical basis for GIS. Int. Conference COSIT’95
[Lecture Notes in Computer Science 988, 553–568]. Berlin: Springer-Verlag.
Mark, David & Andrew U. Frank (1996). Experiential and formal models of geographic
space. Environment and Planning, Series B, 23, 3–24.
McCarthy, John (1980). Circumscription: A form of non-monotonic reasoning. Artificial
Intelligence, 13, 27–39.
McCarthy, John (1985). Epistemological problems of artificial intelligence. In Brachman,
R. & H. Levesque (Eds.), Readings in Knowledge Representation (24–30). Los Altos,
Calif.: Morgan Kaufmann.
McCarthy, John (1986). Applications of circumscription to formalizing common sense
knowledge. Artificial Intelligence, 28, 89–116.
Montello, Daniel R. (1993). Scale and multiple psychologies of space. In Frank, A. U. &
I. Campari (Eds.), Spatial Information Theory: A theoretical basis for GIS. European
COSIT’93 [Lecture Notes in Computer Science 716, 312–321]. Berlin: Springer-
Verlag.
Montello, Daniel R. (1995). How significant are cultural differences in spatial cognition?
In Frank, A. U. & W. Kuhn (Eds.), Spatial information theory: A theoretical basis for
GIS. Int. Conference COSIT’95 [Lecture Notes in Computer Science 988, 485–500].
Berlin: Springer-Verlag.
Papadias, Dimitris & Timos Sellis (1994). A pictorial language for the retrieval of spatial
relations from image databases. In proceedings of 6th Int. Symposium on Spatial
Data Handling, SDH’94, Edinburgh, U. K.
Peterson, John, Kevin Hammond, Lennart Augustsson, Brian Boutel, W. Burton, J. Fasel,
A. Gordon, J. Hughes, P. Hudak, T. Johnsson, M. Jones, E. Meijer, S. Jones, A.
Reid & P. Wadler (1997). The Haskell 1.4 Report. Available from:
http://haskell.org/report/index.html.
Rashid, A., B. Shariff, Max Egenhofer & David Mark (1998). Natural-language spatial
relations between linear and areal objects: The topology and metric of English-
language terms. International Journal of Geographic Information Science, 12(3),
215–246.
Raubal, Martin (1997). Structuring space with image schemata. Master’s Thesis, University of Maine, Orono, USA.
Raubal, Martin, Max Egenhofer, Dieter Pfoser & Nectaria Tryfona (1997). Structuring
space with image schemata: Wayfinding in airports as a case study. In Hirtle, S. &
A. Frank (Eds.), Spatial Information Theory: A theoretical basis for GIS. Int. Confer-
ence COSIT’97 [Lecture Notes in Computer Science 1329, 85–102]. Berlin:
Springer-Verlag.
Reiter, R. (1984). Towards a logical reconstruction of relational database theory. In Brodie, M., M. Mylopoulos & L. Schmidt (Eds.), On Conceptual Modelling,
Introduction
With the integration of dynamic data such as video, sound or objects moving on the screen, electronic documents today have a temporal dimension in addition to the spatial, logical and hypertext dimensions. Thus these documents, so-called multimedia documents, are characterized not only by components of various natures — static (text, graph, table, image) or dynamic (sound, video, animation) — but also by the temporal organization of these components. The description of the sequence of objects in time is called the temporal scenario.
In the spatial domain, it has been demonstrated that humans reason from analogical representations, called mental models, that represent “objects, states of affairs, sequences of events, the way the world is…” (Johnson-Laird 1983, p. 397). According to this theoretical framework, providing a diagram that depicts the situation improves subjects’ ability to solve spatial problems, since the diagram helps subjects elaborate and process the mental model of the situation.
It has been argued (Baddeley 1993) that temporal and causal relationships are also coded spatially in working memory. Indeed, Glenberg & Langston (1992) showed that the understanding of a text describing a four-step procedure was improved if a diagram was provided, as evidenced by subjects’ performance on inferences about the succession of steps. Moreover, it has been shown that subjects understand the functioning of engineered or natural systems better when a diagram is provided (Mayer 1989; Kieras 1992).
Thus, regarding the process of editing multimedia documents, the research on mental models supports the hypothesis that a diagram should facilitate the authors’ specification of the temporal structure of a multimedia document. Moreover, the literature provides guidelines on how to design a diagram that is cognitively appropriate. For our purpose, two aspects are of particular interest.
First, the temporal dimension is spontaneously drawn from left to right, following the Western writing order. Research on the graphic representation of temporal relations by children of three different cultures (English, Hebrew and Arab) showed an effect of language on temporal concepts: English-speaking children represented the temporal concepts from left to right, and Arabic-speaking children represented them from right to left (Tversky, Kugelmass & Winter 1991). Moreover, subjects solved inferences on
temporal relations better when the diagram was oriented from left to right than from right to left. In a classification task, Winn (1982) showed that subjects who saw a diagram directed from left to right performed better than subjects who saw the diagram in the other direction (from right to left). Winn concluded that English speakers read diagrams the same way they read their language: from left to right and from top to bottom. Thus, diagrams which are not oriented in this way are a source of difficulty in processing and learning.
Second, in order to be effective adjuncts to instruction, diagrams should “support” three cognitive processes involved in learning (Mayer 1989):
– they should guide learners’ attention to relevant passages and act as an “external memory”;
– they should help learners organize ideas within a coherent structure;
– they should facilitate the integration of the new structure with existing knowledge in long-term memory, mainly by way of metaphors and analogies.
Even though the editing process is not a learning process, we regard these three properties as relevant, since editing still requires the processing of a mental model in working memory.
Finally, another difficulty we pointed out in the first section is that of representing a set of solutions graphically. There are two difficulties from the cognitive point of view.
First, whereas language supports non-determinism, a graphic representation of temporal relations is necessarily deterministic: for example, “A before B” does not say how far A is before B, but a drawing instantiates this delay as a fixed value. To our knowledge, no research has ever addressed the issue of integrating a set of solutions in a single diagram. Second, due to our limited working memory capacity, we experience difficulty in mentally representing the simultaneous motion of objects. For example, Hegarty (1992) showed that users did not mentally animate a pulley system as a whole, but in a piecemeal fashion. Similarly, if the objects’ positions in time were represented statically, it would be difficult for users to generate mentally all the possible combinations.
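To make the non-determinism concrete, the following small sketch (not MADEUS code; all names invented) models a flexible delay under “A before B” as a range rather than the single fixed value a drawing must show.

type Seconds = Double

-- A flexible delay ("spring"): any duration within [springMin, springMax].
data Spring = Spring { springMin :: Seconds, springMax :: Seconds }

-- Given the end time of A, the admissible start times for B
-- under "A before B" with this flexible delay.
admissibleStarts :: Spring -> Seconds -> (Seconds, Seconds)
admissibleStarts (Spring lo hi) endA = (endA + lo, endA + hi)

-- Is a proposed start time for B attainable under the constraint?
attainable :: Spring -> Seconds -> Seconds -> Bool
attainable sp endA startB =
  let (lo, hi) = admissibleStarts sp endA
  in lo <= startB && startB <= hi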
The next section describes the diagrammatic solution proposed in MADEUS to represent temporal relationships. We examine to what extent the choices made in MADEUS correspond to the guidelines from the literature. More details on the MADEUS application are available in Jourdan, Layaïda, Sabry-Ismail & Roisin (1997).
Static visualization
In order to help the author survey the whole set of solutions, MADEUS allows the visualization of the interval of displacement, so that the author can anticipate the behavior of the selected object (criterion of foreseeability). In the example of Figure 1, the author has selected the object Text2. A red stroke appears under the line on which the selected object (Text2) is placed, indicating its latitude of displacement. In our example, Text2 is limited on its left-hand side by the end of Video1 (Text2 after Video1) and on its right-hand side by the end of Text1 (Text2 during Text1).
The objects which are connected to the selected object by rigid constraints are colored blue; rigid constraints (for example starts, finishes, meets) do not introduce flexibility, so these objects follow the selected object in its moves. The springs, which act as shock absorbers for the selected object, are colored green.
Dynamic visualization
The objective is to allow the author to access the set of solutions of the scenario by manipulating the objects in the temporal view. Thus, the authors do not have to generate mentally all the combinations of object moves, which would be difficult because of the limited capacity for mental animation we mentioned before. They can directly manipulate the position of all moving objects and actually generate the whole set of solutions.
When the authors move the selected object within its entire interval of validity, the display system updates, in real time, the representation of the corresponding solution. It is worth noting that the limits of this displacement always correspond to the complete absorption of a flexible delay of the scenario (a green spring in the temporal view). Consequently the authors should easily understand the reason for a blocking. Indeed, the reactions to manipulation provide the authors with immediate feedback, where static representation and manipulation are coherent.
Thus, the diagrammatic representation proposed in MADEUS is consistent with guidelines deduced from the research. However, is this representation adapted to the specific task of authoring multimedia documents?
The experiment
This experiment was designed to comply with both ergonomic and cognitive
concerns. We focused the analysis on three questions:
– how do people spontaneously represent temporal relations between objects, especially when they are undetermined? From an ergonomic point of view, does that fit in with the diagrammatic choices made in MADEUS?
– does the diagram facilitate the authors’ processing of temporal relations, compared to the text of constraints alone?
– does manipulation enable authors to reason on the set of solutions?
Methods
Participants
Eleven graduates, 8 male and 3 female, were tested individually. Seven held degrees in computer science, and all used computers at work. Their ages ranged from 27 to 37.
Material
For the first task, we used two scenarios. A text of constraints and a temporal view were derived from each one. Scenario 2 was scenario 1 in reverse, with its objects renamed. Half of the subjects saw the text of constraints of scenario 1 and the temporal view of scenario 2, whereas the other half saw the text of constraints of scenario 2 and the temporal view of scenario 1.
Procedure
The experiment consisted of two successive tasks, designed to be close to the
real activity of multimedia document authoring. The participants were briefly
presented with the objectives of a multimedia document authoring tool. Then the
constraints language was explained.
In the first task, the participants were presented with the text of constraints from one of the scenarios and were asked to describe the corresponding presentation in a written text. They were told they could make a drawing if they wanted. Then, they were asked three inference questions about temporal relationships, whose answers were not explicitly indicated in the constraints. Secondly, the temporal view from the other scenario was presented to the participants and the representation options (springs, vertical lines) were briefly explained. As with the text of constraints, participants were asked to describe the corresponding presentation and to answer three questions. The order in which the two scenarios were presented was counterbalanced across subjects.
The second task aimed to evaluate whether subjects were able to infer the
set of solutions from the diagram, and to explore it by manipulation. From a
temporal view presented on the screen, the subjects were asked to solve three
problems about the attainability of temporal situations. They were told to answer first without manipulating, and then to verify the accuracy of their answer by
manipulating the temporal view. A second temporal view was then presented on
the screen, on which subjects were similarly asked to solve three problems.
Time spent on each task and performance were recorded by the experimenter. Moreover, the subjects rated on a scale the difficulty of the task and their confidence in their answers.
Results
Task 1: Reasoning from the textual source or from the temporal view
Describing the presentation took significantly less time from the temporal view (m = 256 s.) than from the list of constraints (m = 526 s.) (F(1,10) = 11.66, p < .01). Moreover, 36% of subjects (4) produced an accurate description of the presentation from the constraints, whereas 82% of subjects (9) did so from the temporal view.
Table 1. Number of questions correctly answered and response times (in seconds) in the first task (m = mean and SD = standard deviation)

                        Accuracy        Response time
Temporal view           m = 2.73        m = 32.81
                        SD = 0.47       SD = 23.29
Text of constraints     m = 1.91        m = 66.27
                        SD = 0.83       SD = 56.89
Describing the presentation and answering the questions were judged significantly more difficult to perform from the text of constraints than from the temporal view (F(1,10) = 13.4, p < .005, without interaction).
In this task, subjects had to solve 6 problems (see appendix), 4 of which required only one or two moves (simple manipulation) and 2 of which required the manipulation of two distinct objects in addition to the object mentioned in the question (complex manipulation). Subjects were first asked whether the manipulation was possible (without manipulating) and then asked to perform it.
For the simple manipulation problems, only five errors were made (11%), usually due to misunderstanding of the vertical lines (inflexible constraints).
The goals of the two complex manipulation problems were attainable. For the first problem (question 3 in the appendix), 8 subjects out of 11 gave the correct answer without manipulation. In order to achieve the goal, the manipulation required first moving another object (Image1), due to the “before” constraint between Video1 and Image1 and to the “starting” constraint between Text1 and Video1 (Figure 1). With manipulation, only 3 subjects succeeded in achieving the goal; the other subjects did not get the idea of moving Image1. For the second complex manipulation problem, only one subject failed to give the correct answer, both with and without manipulation. However, 4 subjects were not sure that “the
spring was long enough”. This is quite inconsistent with the concept of a spring in the temporal view, which denotes a flexible delay with no intrinsic duration.
Appendix
Acknowledgments
We are grateful to André Bisseret, Gilles Bisson, Muriel Jourdan and Cécile Roisin for their useful
comments on earlier drafts, and to Claude Castelluccia for proofreading this manuscript.
Notes
1. The patterns used here to distinguish the types of components — video, text, image, sound —
have not been tested yet.
It seems at first sight that the task of formalising the use of spatial language
sufficiently to simulate people’s real use of it is an intractable one.
For example, consider the use of “put” in the following sentences:
1. Put the book on the table.
2. Put the water in the bottle.
3. Put the thread through the eye of the needle.
Similarly, let’s attend to the spatial preposition “on” in the following:
1. Stack the book on the pile of books.
2. Place the book on the table.
3. Float the rubber duck on the water.
We now come to our first generalisation. People will often use general-purpose verbs (e.g. “put”) and prepositions (“on”, etc.) when the context is sufficiently clear to disambiguate them. The context can be derived from physical properties of objects (for example, a pile of books requires that the action be the stacking sense of put) or, in the absence of such naïve-physics evidence, from the preposition employed. For example, specifying that the object must go through the eye of the needle obviates the need to use thread as a verb instead of put: put the thread through the eye rather than thread the needle.
uses of a given preposition. This position is seen to fail when the approach is applied to cases where the use of a preposition diverges from the standard geometric relation: tolerance and sense-shift phenomena.
Hence, to account for the flexibility and adaptability in the use of spatial relations, the class of approaches exemplified by (Lakoff 1987) and (Herskovits 1986) claims that each preposition identifies a central or ideal meaning from which the others can be obtained via transformations, “categorizing relationships of schematicity” (Langacker 1988).
In particular, representing the meaning of locative prepositions as the result of a schematization, namely a process that involves the systematic selection of certain aspects of a referent scene to represent the whole while disregarding the remaining aspects (Talmy 1983), should enable the disambiguation of the preposition. Hence, to select, idealize, approximate and conceptualize a given meaning, we reduce its indeterminacy by specializing the real physical scene to a sketchy semantic content.
This reduction cannot solve the disambiguation problem without detailed information about the abstract spatial relation applied to geometric objects — such as points, lines, surfaces and ribbons — and about the objects involved in the scene. Here the claim is that there is a strict interaction between the spatial properties of the entities involved in the description and the general world knowledge encoded in the ontology of the system. In addition, we suggest that such information should be part of the conceptual representation of events and physical objects. Thus, the fact that an action spatially identifies a flat region of space, conceptualized either as a container or as a cognitively salient surface, must be available to the process of interpretation.
In spatial locative expressions, where the location of an object is described via a reference to the known location of another object, the literature offers several different terms: we follow Herskovits’ definitions and use the terms Figure Object (or Figure for short) and Reference Object respectively. For example, in the sentence put the book on the stack, we describe that locative action as a spatial relation between the book (Figure) and the stack (Reference Object).
“…two or three ideal geometric objects (e.g., points, lines, surfaces, volumes, vectors) — in fact, ideal meanings are usually those simple relations that most linguists and workers in artificial intelligence have proposed as meanings of the prepositions. These relations play indeed an important role, but as something akin to prototypes, not as truth-conditional meanings” (Herskovits 1986: 39).
Coincidence of points, inclusion of a point in a line or in an area, and contiguity of two surfaces are some examples of relations considered by Herskovits as ideal meanings. They are directly reflected by language, so that idealizations, approximations and conceptualizations mediate between a canonical view of the world and language.
An important issue arising from this study of spatial prepositions concerns the proposal by (Landau & Jackendoff 1993) about the “what” and “where” systems, which perform object identification and object localization separately. This hypothesis states that preposition use and noun use access only the encodings produced by distinct neural systems, see (Ungerleider & Mishkin 1982). Thus, when naming objects, we use their detailed geometric properties; but, when locating them, we use rough representations such as points and lines. On the contrary, we may need fine information about the shape of objects and locations in order to apply the appropriate preposition, even if its selectional restrictions specify no shapes for Figure and Reference Object. The position adopted seems to fail to distinguish selectional restrictions from knowledge of object shape, so we follow Herskovits in rejecting their hypothesis.
Moreover, further evidence shows the limits of computational models that ignore schematization and its relationship with perceptual, conceptual and spatial cognitive analysis.
According to their most salient senses, we can put forward a rough classification of prepositions, sorted into two different groups:
Table 1

Location                        Motion
At/on/in                        Across
Upon                            Along
Against                         To/from
Inside/outside                  Around
Within/without                  Away from
Near/far from                   Toward
Next                            Up/down (to)
Beside                          Into/(out of)
By                              Onto/off
Between                         Out
Beyond                          Through
Opposite                        Via
Above/below                     About
Under/over                      Ahead of
On (the) top of/bottom of       Past
Behind
In front of/back of
Left/right (side) of
East/north/south/west of
Contrary to a common belief, not all prepositions treat Figure and Reference Object as simple geometric objects; the region occupied must be known in order to determine whether a preposition applies. In fact a closer investigation reveals that the following assumption is unfounded: “If the prepositional predicate applies to objects of any shape, then its truth in particular cases can be assessed without referring to the object’s shape”. For example, in motion prepositions such a predicate is approximated by the motion of the centroid — centre of gravity — of the object, ignoring rotations and translations around it.
The transformations result in different idealizations of the spatial objects: when an object is described as being at some reference object, both objects are idealized as points. To this end, a set of geometric description functions was collected by Herskovits (Herskovits 1986: Chapter 5). Usually, several coercions have to be applied in sequence until the appropriate conceptualization is found. The schematization of the spatial objects depends on functional part-whole relations, but sometimes also on certain aspects of the referent scene, such as the distance between Figure and Reference Object for the projective prepositions (to the right, above, behind, etc.).
Hence our research on spatial expressions moves toward the design of a multilevel system, taking into account different modules of analysis:
Table 2
Conceptual Level
Naïve Semantics: a hierarchy describing relations between conceptual entities —
objects and events — by means of prototype theory and other salient properties.
Semantic Level
Lexical Semantics: a cognitive representation that refers directly to the linguistic
locative expressions.
Reference Semantics: a basic spatial representation establishing the connection between
the ideal meaning of spatial relations and the spatial entities: a structural description —
proximity, surrounding, alignment, etc. — to obtain a local frame of reference for
various configurations of objects.
Pragmatic Level
Spatial Configuration: a geometric description of objects’ location, specifying the location and orientation of objects with respect to the local axes by appropriate topological-metric information (measures of angles and distances).
From this point of view, the schematization described above (Herskovits 1998) and conceptual structures (Jackendoff 1990, 1995) play a central role in our model, as related modules of the Semantic Level. Both these modules refer directly to locative expressions, so they concern the linguistic part and are independent of context-specific information. Hence world knowledge, which is required for the appropriate handling of specific situations, is included in the latter level for processing locative actions in terms of topological-metric knowledge of angles and distances in the given context.
On the one hand, such a modularization allows abstraction from the concrete scene; on the other, it also permits the interaction between semantic and pragmatic representations.
Let’s assume that each lexical item in a sentence specifies how its conceptual arguments are linked to syntactic positions in the phrase it heads. A thematic role, or θ-role, is a term for an argument position in the conceptual structure, so for locative actions Agent and Theme are particular structural positions as well.
Thus the first fundamental assumption asserts that every content-bearing major phrasal constituent of a sentence (S, NP, AP, PP) corresponds to a conceptual constituent of some major conceptual category. The basic machinery under this approach is that the specification of conceptual structure within arguments of the verb can be regarded as largely orthogonal to the positions of indices on arguments, that is, to the way the verb links its arguments to the syntactic structure. The following rules use this lexical information to integrate the readings of syntactic complements and subjects with indexed argument positions in the conceptual structure of the head (a toy sketch in code follows the rules):
1. for each indexed constituent in the conceptual structure of a lexical item,
substitute the conceptual structure of that phrase, say YP, that satisfies the
co-indexed position in the subcategorization feature of that lexical item, if
its conceptual category matches that of the constituent;
2. if the lexical item is a verb, substitute the conceptual structure of the
constituent indexed in the conceptual structure of that lexical item, if its
conceptual category matches that of the constituent.
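The toy sketch below illustrates rule 1; all types and names are invented, and category matching is deliberately simplified.

data Cat = ObjectCat | PathCat | PlaceCat
  deriving (Eq, Show)

data CS = Slot Int Cat        -- an indexed, still-open argument position
        | Filled Cat String   -- a saturated conceptual constituent
        | Fn String [CS]      -- a conceptual function over arguments
  deriving Show

category :: CS -> Cat
category (Slot _ c)   = c
category (Filled c _) = c
category (Fn _ _)     = ObjectCat   -- crude default, enough for the sketch

-- Substitute the conceptual structure of a phrase into the co-indexed
-- argument position, provided the conceptual categories match.
link :: Int -> CS -> CS -> CS
link i phrase (Slot j c)
  | i == j && category phrase == c = phrase
link i phrase (Fn f args) = Fn f (map (link i phrase) args)
link _ _ cs = cs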
To express alternations, the subcategorization feature stipulates merely an optional post-verbal phrase of arbitrary major phrasal category. This is possible because of the selectional restrictions of the verb: only specific NPs or PPs can correspond to a conceptual constituent of the proper category, and Objects must always be expressed by NPs, just as Paths are expressed by PPs — in the unmarked case.
Going further, a small class of Path-nouns, as in “getting down by climbing this route”, behaves semantically like a PP, even though its syntax is clearly that of an NP. The reason is that the NP expresses a Path, hence it must fuse with the realization of the Path-constituent, or map into Paths instead of Objects.
Moreover, selectional restrictions can restrict the range of the spatial region occupied by objects assuming the Destination role, enabling a geometrical conceptualization for individual or collective (physical) objects in the sense of Herskovits’s analysis. From this point of view, the spatial relations expressed by locative prepositions govern the correspondence between Conceptual Structures and possible spatial relations in the relative spatial representation.
In the end, adopting a fine-grained framework — the conceptual structure — seems necessary to outline the relations among multiple structures of lexical entries and to account for syntactic alternations without loss of generality.
Table 3

                          Telic                      Paratelic
Means-ends dimension      Essential goals            No essential goals
                          Imposed goals              Freely-chosen goals
                          Unavoidable goals          Avoidable goals
                          Reactive                   Proactive
                          Goal-oriented              Behaviour-oriented
                          End-oriented               Process-oriented
                          Attempts to complete       Attempts to prolong
                          activity                   activity
Time dimension            Future-oriented            Present-oriented
                          Planned                    Spontaneous
Intensity dimension       Low intensity preferred    High intensity preferred
                          Synergies avoided          Synergies sought
Figure 2. [Diagram: the Spatial_Ident relation between Figure_Obj and Ref_Obj specializes loc_in according to the geometric description of the Reference Object: loc_in_area, loc_in_volume, loc_in_bounded_area and loc_in_fuzzy_area]
meaning of a preposition. For example, the preposition in deals with the following relations: “located in” (loc_in for short) and “destination inward a Reference Object” (dest_in). This spatial relation is further specialized when it is associated with the geometric description of the Reference Object, that is, the Destination of the action; for example, a spatial relation of location in an area may be restricted to loc_in_fuzzy_area or loc_in_bounded_area.
Referring back to the theories of (Herskovits 1986), such locative actions are discriminated according to the region of space, in terms of its geometric description, identified by the spatial relation between Figure and Reference Object.
As a result, the following conceptual entities are given for verb lexical entries in English and Italian:
NoContact
PUT OVER, HANG
[CAUSE ([Object ], [Event TELIC ([Object ], [Path TO ([Place OVER ([Object ])])])])]
PUT ALONG
[CAUSE ([Object ], [Event TELIC ([Object ], [Path ALONG ([Place ON ([Object ])])])])]
Contiguity
ALIGN, SET IN ROWS allineare
Adherence
TWIST, RING attorcigliare
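The bracketed formulas above can be rendered directly as an algebraic datatype; the sketch below covers only the constructors these entries use, with invented Haskell names.

data Object = Object String            deriving Show
data Place  = Over Object | On Object  deriving Show
data Path   = To Place | Along Place   deriving Show
data Event  = Telic Object Path        deriving Show
data CS     = Cause Object Event       deriving Show

-- PUT OVER / HANG: cause the theme to end up over the reference object.
putOver :: Object -> Object -> Object -> CS
putOver agent theme ref = Cause agent (Telic theme (To (Over ref)))

-- PUT ALONG: cause the theme to move along (on) the reference object.
putAlong :: Object -> Object -> Object -> CS
putAlong agent theme ref = Cause agent (Telic theme (Along (On ref)))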
Finally, in our approach lexical semantics plays a central role, indicating the relationship between the grammatical relations of sentences and the thematic cases of the action described. The fundamental contribution of this proposal lies in conceiving the appropriate layers of lexical knowledge representation and their use in an NL interface.
Moreover, although the model used here derives all linguistic knowledge (pragmatic, semantic and syntactic) from a cognition-action system, so that both pragmatic and semantic structures are made of similar cognitive material, they still constitute separate functional processes at different moments in generating natural language sentences. The user can simply use a pragmatic procedure to generate a communicative act, or he can take his own pragmatic procedure as the object or content of a higher conceptualization: he has shifted cognitive levels.
Austin (1962) notes that “To say something is to do something”, that is
References
Apter, M. J. (1989). Reversal Theory: Motivation, emotion and personality. London: Routledge.
Austin, J. L. (1962). How to Do Things with Words. New York: Oxford University Press.
Bennett, D. C. (1975). Spatial and Temporal Uses of English Prepositions: An essay in
stratificational semantics. London: Longman.
Herskovits, A. (1986). Language and Spatial Cognition. Cambridge, England: Cambridge
University Press.
Herskovits, A. (1997). Language, spatial cognition and vision. In O. Stock (Ed.),
Temporal and Spatial Reasoning. Boston: Kluwer.
1. Introduction
We are concerned with how the full interpretation of language about space and
motion is computed from the words in the sentence, the grammatical structures,
and the context of utterance. We have implemented a system which simulates
this act of interpretation in some detail: a natural language interface to a virtual-reality geographic information system (the NLVR system). Our goal is to
provide an expressively powerful human/machine interface by simulating human
understanding of natural language imperatives. Examples of the sentences we
will discuss include: turn around, drop down to the ground, zoom northward, and
veer off thirty six degrees to your right. These are the kinds of sentence that a
viewer can utter in order to navigate through our virtual environment. Utterances
such as these raise two problems for a theory of meaning:
The problem of computing lexical, phrasal, and full sentence meaning of
sentences uttered within an extra-linguistic context; how does the context
constrain meaning?
The problem of composition of meaning; how is the meaning of a sentence
composed from the meanings of its parts?
On our view these two problems are linked. In Sections 4 and 5 we motivate and
present our theory which (a) captures lexical meaning in terms of conceptual
structures of conceptual primitives; and (b) explains how these conceptual
structures can be composed into full sentence meanings. We believe that it is
important to work out details of this composition of meaning in order to advance
and constrain hypotheses about the human conception of space and motion that
supports intelligent speech. The ancillary benefit from this work will be a more
useful and expressively powerful natural language interface to a VR system.
We begin with a description of the NLVR system in Section 2 followed by a
discussion of the situation of the viewer in the virtual environment in Section 3.
The authors have created a fully operational speech and natural language
interface to a real-time 3D virtual reality environment. The interface supports
navigation in a 3D landscape and is an interesting testbed for detailed semantic
interpretation of spatial and motional language.
The virtual environment software runs on a Silicon Graphics Octane
computer. We use both a wide screen desktop display (12 inches high by 19
inches wide) and a back-projected glass wall display (5 feet high by 6 and 1/2
feet wide) to show the virtual environment. A snapshot of a typical scene appears
in Figure 1. This is a view of the National Training Center in central California.
The terrain in this scene is modeled and displayed in real time (about 16 frames
per second) using the Virtual Graphic Information System (VGIS) (Koller et al.
1995). The terrain models are derived from elevation and imagery data.
The Spoken Language Navigation Task requires a user (whom we call the
viewer) to navigate through this environment using only spoken natural language.
In this application navigation consists in moving your point of view about in any
of six degrees of freedom: at various linear and angular speeds, in various
directions, for various distances, or to various locations. It is possible to create
complex motions and trajectories in this way. An interesting outcome of this
project has been the realization of the importance of perception of spatial motion
and how this can be interpreted and controlled through natural language — a
point we will return to below.
The theory of lexical meaning as conceptual structure and the composition
of these structures into sentence meaning that we present in this paper was
implemented in the Interpreter portion of the Navigation Expert Module. We will
not take space in this paper to discuss the other modules of the overall computa-
tional system, the architecture of which is discussed in (Gurney et al. 1998).
To use the NLVR system, a viewer faces the VGIS screen, wearing a micro-
phone headset, and utters commands which are interpreted by the system causing
actions in the VGIS system. One utterance will cause changes in the scene to
which the viewer may respond with a new utterance in a continuous processing
loop. Users quickly learn how utterances affect their movement through the
scene. Normal use of this system depends on what the viewer perceives along
with other knowledge brought to the task (e.g., notions of compass points) and
knowledge inferred. The viewer’s utterances and expectations of their effects
will depend on his beliefs or conceptions about these and other factors. We will
call all of this the viewer’s conception of the state of the environment.
We define the actual state of the environment as all of the information about
the virtual environment required for fully adequate computation of the meanings
of the viewer’s utterances. There is a well-defined and precise state of the virtual
environment that held true at the time of the snapshot in Figure 2. This picture
is probably confusing to the reader of this paper. However, the scene at the time
this snapshot was captured was not confusing to the viewer, because the viewer
perceives and represents scenes dynamically; this is essential to the viewer's
conception of even stationary objects and (in our case) the terrain. Figure 2 is
what the mountain looked like after the viewer had flown straight south from the
viewpoint in Figure 1 right up to the side of the mountain. Over this time period
the viewer spoke the following utterances into his microphone headset:
stop
crawl straight ahead
look to the southeast
look slowly down thirty degrees
roll
After observing the dynamic effects (motions and turnings) of his utterances the
viewer knew what he was perceiving in Figure 2. We will explain how the
interpretation of these utterances was computed in Sections 4 and 5. For now we
want to use this example of viewpoint navigation to say a few things about
system state in our environment. At Figure 2 we have the following values for
and how the meaning of a sentence is composed. We might say that in (1) the
verb and the particle invoke hidden references to geometric entities that are
available from the state of the environment. Note, finally, that these meanings
depend only on selected aspects of state. The compass bearing of the helicopter,
for example, is irrelevant to (1).
(Parse tree for turn around: IP dominating a pro subject, Infl and VP; within VP, V turn takes a PP complement whose head P around takes a pro object.)
We short-circuit this bit of reasoning by writing 180 degrees into the meaning of
around (see Section 5). Our plan for the NLVR project includes a more princi-
pled and adequate treatment of pragmatics in the future along the lines of
(Gurney et al. 1997).
(Figure 4. Parse tree for go up the hill: IP dominating a pro subject, Infl and VP; within VP, V go takes a PP complement whose head P up takes the NP the hill.)
Here we have replaced the particle up with the prepositional phrase up the hill.
The parse tree for (8) appears in Figure 4. As before, the word up refers to an
axis in a coordinate system, as well as the positive direction. The coordinate
system is still embedded in the object referred to by pro, that is, its origin is in
the object. But the vector referred to by up must be aligned with the intrinsic
coordinate system of the object of the preposition, the hill. The question is: which
coordinate system of the hill does up refer to? Unlike the viewer's viewpoint, a
hill has spatial size and structure. For this reason, none of the three coordinate
systems (gaze, gravitational, global) may suffice for what is meant by up the hill.
Elsewhere (Klipple and Gurney 1997) we proposed a family of curvilinear
coordinate systems that conforms to the outer surface of the hill. After referring
to one of these coordinate systems, up aligns its vector with the appropriate
z-axis within. Our current NLVR application does not implement cases like
these, mainly because we have no computations yet that can recognize or extract
proper hills or other such emergent terrain objects in our environment.
Sentences like (9), which refer to a goal of an action, have been at the center of
research in lexical semantics (Gruber 1965; Jackendoff 1983; Levin and
Rappaport Hovav 1996; Tenny 1987).
(9) drop down to the ground
(10) drop down
Example (9) requests a completed action. (10) requests an action that has no
definite end; any amount of dropping down would do.3 In a sense, these two
actions are of different kinds. The verb drop in (9) refers to the viewer’s local
gravitational coordinate system. It also refers to an axis in that coordinate system
(namely, the vertical axis) as well as a motion vector, pointing down, aligned
with that axis. In this way, the verb drop incorporates meaning that is typically
expressed by verb modifiers (like the particle down). In our implementation, the
particle down in (9) is redundant; explaining why it looks natural for it to appear
in (9) is beyond our current interest. Since it is redundant, its meaning composes
nicely with the meaning of drop; it also refers to the vertical axis and a vector
along it pointing down. On the other hand (11):
(11) drop up
would be uninterpretable. The verb and the particle refer to the same coordinate
system and axis but they also refer to opposite (therefore, incompatible) vectors.
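As an illustration (our sketch, not the system's actual check), this incompatibility could be detected with the direction property introduced in Section 5:

% "drop up" fails: verb and particle would assert opposing
% direction values for the same event E
compatible(E) :- \+ (direction(E, positive), direction(E, negative)).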
The prepositional phrase to the ground creates a goal in (9). This result
arises out of composition of meaning — as it must. The noun phrase the ground
refers to a piece of the terrain. The preposition to operates on this terrain to refer
to a location in the coordinate system referred to by the verb drop. This location
is the goal. So the original unbounded, atelic motion action becomes part of a
more complex telic action. Our views on the transformation of atelic to telic
spatial motion actions follow the lines originally proposed by (Pustejovsky 1992)
(which fails to account for spatial geometry, however). An adequate cognitive
theory of telicity for spatial motion actions has not been developed at this time.
For the NLVR implementation, we simply distinguish two versions for each
action — one atelic and one telic.
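Although the paper does not list the entries for drop, a plausible sketch in the Prolog notation of Section 5, following the rotate-act/rotateto-act pattern of the lexical entries given there, would be:

vm(drop, translate-act:E:TH).      % atelic: unbounded downward motion
vm(drop, translateto-act:E:TH).    % telic: motion bounded by a goal location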
There are several other ways that the meanings of words access conceptions of
the state of the environment. Examples of a few other kinds of sentence that fall
into this group are: start dropping a lot faster, veer off thirty six degrees to your
right, look straight down, back up three klicks.
We have said little about the mood of our sentences. All of the examples we
have discussed are in the imperative mood. The declarative mood of (12) and the
interrogative mood of (13) have rather different effects in our application, of
course.
(12) he turned around
(13) did he turn around
Meanings of words and phrases are the same no matter what the mood of the
sentences. Mood, however, takes scope over a sentence. It ranges over the
conceptual structure that was composed for a sentence and determines how a
sentence is to be understood and responded to.
Our natural language interpreter composes meanings for sentences that have been
parsed by REAP and translated into logical forms by the Logical Form (LF)
Module. The logical forms for some of the sentences we have discussed include:
LF for (3): [pro:X1, [turn:E:X1, [around:E:X2, pro:X2]]]
LF for (5): [pro:X1, [fly:E:X1, faster:E]]
LF for (9): [pro:X1, [drop:E:X1, [down:E:X2, pro:X2], [to:E:X3, [the:X3,
ground:X3]]]]
LF: [pro:X1, [veer:E:X1, [off:E:X2, pro:X2], [measure:E:X3, [number:thirty-
six:X3, degree:X3], [to:E:X4, [your:X4, right:X4]]]]]
Lexical entries for verbs, adverbs, particles, prepositions, nouns, and so on, must
specify structures of the appropriate state elements. These are the lexical
conceptual structures, LCSs.5 Thus, words decompose into structures of non-
linguistic primitives. A fully adequate set of LCSs for the lexicon of a natural
language such as English is a goal of ongoing research (Levin 1993; Pustejovsky 1993;
Klipple 1997). For our purposes, we have built an inventory of LCSs that aims
to be adequate for the NLVR navigation task.
Here in schematic form are a few of our lexical entries. An explanation of
terminology follows below. Some verbs:
vm(turn, rotate-act:E:TH).      % atelic rotation
vm(turn, rotateto-act:E:TH).    % telic rotation, bounded by a goal
vm(rotate, rotate-act:E:TH).
vm(rotate, rotateto-act:E:TH).
In the lexical entries for turn and rotate — which we do not distinguish in our
current system — E is the event (Davidson 1966) and TH is the theme (moving
object).6 Both words refer to the same basic (i.e., conceptual) action, rotate-act.
As mentioned above, we divide all basic motion acts into telic and atelic
versions, for example, rotate-act and rotateto-act.
The verb zoom can mean either an axial act like rotate-act or a linear act
like translate-act. This verb incorporates not only a reference to the act but also
one of the properties of a basic act, namely, speed, as shown in the following
lexical entries.
vm(zoom, [A, speed(A:E:TH, FS)]) :-
    typeact(A, axial), canon(A, TH, speed, S), FS is 4.0 * S.
vm(zoom, [A, speed(A:E:TH, FS)]) :-
    typeact(A, linear), canon(A, TH, speed, S), FS is 4.0 * S.
In the first entry for zoom, A is the basic motion act (it could be rotate-act), and
FS is the required speed, which is (perhaps arbitrarily) fixed at four times some
canonical speed for the object TH undergoing the act A.
The particle around is represented as:
partm(around, A:E:X, [coordsystem(E, CS), axis(Z, CS), embed(CS, X), goal(E, 180.0)]) :-
    typeact(A, axial), typeact(A, telic).
In this lexical entry for around, E is the event and X is some entity referred to
by the pro object of the particle in the parse tree. As we can see, the word
around refers to several properties of a basic action: coordsystem, axis, and goal.
Furthermore, this meaning for around can only apply to acts of type axial.7
In general, these words refer to actions, geometrical and other entities, or
their properties. There is a finite number of these conceptual items and there are
various restrictions on their types and application. Furthermore, the members of
this relatively small group of conceptual items are constituents of a relatively
large number of word meanings.
The basic actions, entities, and their properties that we employ in building the
lexicon are the types of things that also become the components for the state of
the virtual environment discussed in Section 3. Many of them are observable by
the viewer or can be easily brought into the viewer’s conception of the state.
Below we list a few examples of these conceptual primitives.
Basic actions occur in a taxonomy, a portion of which we display here.
rotate-act is a motion that is axial. translate-act is a motion that is linear.
motion(E) :- axial(E).
motion(E) :- linear(E).
axial(E) :- rotate-act(E).
linear(E) :- translate-act(E).
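For example (a schematic query in the same notation, assuming a hypothetical event constant e1 for which rotate-act(e1) holds):

% ?- axial(e1).    succeeds via axial(E) :- rotate-act(E).
% ?- motion(e1).   succeeds via motion(E) :- axial(E).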
The basic actions have structure, examples of which are specified by the
following two formulae. rotate-act has an axis A in a coordinate system C that
is embedded in some object X. E is a variable that will be bound to any particu-
lar instance of a rotate-act event.
rotate-act(E, X) :- axis(E, A), coordsystem(A, C), embed(C, X).
translate-act(E, X) :- vector(E, V), coordsystem(V, C), embed(C, X).
As illustrated in the lexical and basic action examples above, the basic actions
have various associated properties. In addition to axis, vector, etc., others are:
direction(E, D) which is the positive or negative direction of an axial act,
gaze(C) which is a type of coordinate system, and zaxis(A) which singles out an
axis in a coordinate system. In addition, there are: linear measure, angular
measure, coordinate points, and rates of change of quantities, among others.
Properties and quantities get fixed by value-setting and value-getting actions. For
example, for the imperative (3) (turn around) the following value-setting action
would be performed before executing the main action:
set(goal, E, 180.0).
And for the imperative (9) (drop down to the ground) there are two preliminary
value-setting actions and one preliminary value-getting action:
set(direction, E, negative),
get(elevation(H, terrain)),
set(goal, E, H).
Value-getting actions corresponding to the state variables include:
get(E, speed, S),
get(X, pitch, P),
rotating(X, YN).
The formula rotating(X, yes) means that X is currently rotating.
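Schematically (our assembly of the steps just listed, not a predicate from the system), the interpretation of (9) would chain the preliminary actions onto the telic linear act:

% hypothetical top-level goal for "drop down to the ground"
interpret_drop_to_ground(E, TH) :-
    set(direction, E, negative),    % drop/down: negative direction on the vertical axis
    get(elevation(H, terrain)),     % the ground supplies the goal elevation
    set(goal, E, H),                % makes the action telic
    execute(translateto-act:E:TH).  % hypothetical executor for the bounded act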
6. Conclusion
The lexical semantic constituents of the words we have been discussing refer to
pieces of the viewer’s conception of the state of the environment. For language
about space and motion, these include coordinate systems, axes, motion vectors,
direction vectors, the terrain and other objects, including the viewer. These types
of entities should be added to the ontology already assumed for LCSs. This is
necessary for two reasons: first, for an adequate treatment of the semantics; and
second, for grounding interpretation in real or artificial non-linguistic contexts.
Vagueness and other kinds of uncertainty must be accounted for by pragmatics
and inference from world knowledge, and values determined in this way must be
input to the algorithm that composes meaning.
Acknowledgments
The authors are indebted to: Joseph Garman of the University of Maryland for the Generative
Grammar parser REAP; Timothy Gregory of the Army Research Laboratory (ARL) for the
Navigation Application Program Interface to the VGIS system; and Michael Salonish and others for
the ARL version of VGIS.
Notes
1. (Jackendoff 1996) talks about coordinate systems as “frames of reference”. The same notion (or
a closely related one) is called “perspective” in (Levelt 1996; Tversky 1996).
2. If we were to extend normal Tarskian denotational semantics to this case we would say that the
verb denotes a set of vectors and the adverb denotes a subset of this set. See (Chierchia and
McConnell-Ginet 1990) for an example of Tarskian formal semantics applied to natural
language and see (Larson and Segal 1995) for a non-set-theoretic and, to our thinking, more
interesting version of formal semantics.
3. The lexical semantics of drop has to be compatible with telic (bounded) and atelic (unbounded)
meanings. This is not trivial, but we will not discuss it here. See (Dowty 1979; Tenny 1987;
Dorr and Olsen 1997).
4. The notation here and throughout Section 5 is Prolog.
5. Our theory of LCS is not identical to that of (Jackendoff 1983) or (Levin and Rappaport
Hovav 1995, 1996) but it is in the same spirit. We envision the theory of LCS being
supplemented by our proposals.
6. The verbs turn and rotate actually have significant but only partially overlapping distributions
in English.
7. This kind of restriction achieves for verb meaning what (Pustejovsky 1993) calls co-composi-
tion.
References
Levin, Beth and Malka Rappaport Hovav (1996). Lexical Semantics and Syntactic
Structure. In S. Lappin (Ed.), Handbook of Contemporary Semantic Theory (487–
508). Oxford: Blackwell.
Levinson, Stephen (1983). Pragmatics. Cambridge: Cambridge University Press.
Pustejovsky, James (1991). The Generative Lexicon. Computational Linguistics, 17(4).
Pustejovsky, James (1992). The Syntax of Event Structure. In B. Levin and S. Pinker
(Eds.), Lexical and Conceptual Semantics (47–82). Oxford: Blackwell.
Russell, Bertrand (1905). On Denoting. Mind, 14, 479–493.
Tenny, Carol (1987). Grammaticalizing Aspect and Affectedness. Ph.D. dissertation, MIT.
Tversky, Barbara (1996). Spatial Perspective in Descriptions. In P. Bloom et al. (Eds.),
Language and Space (463–492). Cambridge, MA: MIT Press.
Modelling Spatial Inferences in Text Understanding
1. Introduction
z-axis to the above/below axis of the body. The orientation of these axes is
arbitrary for objects without intrinsic axes. To describe relations, we follow an
approach that is used in robotics (Ambler and Popplestone 1975). A relation
between two objects A and B is described by the 4 × 4 matrix corresponding to
the rotation and translation needed to map the coordinate system of A onto that
of B. For the restricted case where only rotations around the vertical (z) axis are
allowed, this matrix has the form:
$$ T = \begin{pmatrix} & & & \Delta_x^{(A,B)} \\ & R & & \Delta_y^{(A,B)} \\ & & & \Delta_z^{(A,B)} \\ 0 & 0 & 0 & 1 \end{pmatrix} \quad\text{with}\quad R = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} $$
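As a worked instance (our example, using the notation just introduced): for a rotation of θ = 90° about the z-axis,

$$ R = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad T \begin{pmatrix} 1 \\ 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} \Delta_x^{(A,B)} \\ 1 + \Delta_y^{(A,B)} \\ \Delta_z^{(A,B)} \\ 1 \end{pmatrix}, $$

i.e. the point (1, 0, 0) of A's coordinate system is first rotated onto the y-axis and then translated by the Δ components.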
Our graphs will always be labelled with a transformation matrix and may also be
labelled with the name(s) of one or more defined relations.
In the following, we will use the example text to explain the algorithm we
have implemented. The psychological motivation for our program, and the
unsolved questions with respect to the construction of and the access to a mental
model are discussed in Claus et al. (1998). Recent results can also be found in
Hörnig et al. (1999).
When started, the program reads the definitions of objects and predefined relations
from two files. An object description contains the name of an object class (e.g.
person), the form (cylinder), the names of the dimensions (r and h) and optional-
ly upper and lower bounds on the possible extensions. The graph is initialized
with a subgraph representing a room object2 (see Figure 3). When sentence (1)
is read, a node for Torsten is created and an arc from Torsten to the room node
is introduced which is labelled with is_in_room and the generic transformation
matrix. The constraints for the is_in_room relation are stored in the constraint
table. The transformation matrix T^(room,To) describes the rotation of the axes and
the translation of the origin of the room coordinate system necessary to identify
both coordinate systems. When this matrix is multiplied with a vector representing
the location of an object A (or rather of its origin) in Torsten's coordinate
system, the result is the location of A with respect to the room coordinate
system. Therefore the arc points from Torsten to the room node. Figure 3 shows
the default graph for a room node and the default layout of a room.
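In homogeneous coordinates this mapping reads (our notation):

$$ p^{(room)} = T^{(room,To)}\, p^{(To)}, \qquad p = (x, y, z, 1)^{T}. $$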
Neither the room object nor any of its sub-objects have intrinsic front or right
sides. The coordinate system of room is positioned parallel to its walls. The
coordinate systems of the walls are positioned in such a way
that the relation is_in_room corresponds to a negative D(Wall,A)
x value for any
object A and any wall.
When sentence (2) is read, a node for the refrigerator is created, the arc for
the mentioned relation left_of (Torsten, refrigerator) is introduced, and the arc for
the relation is_in_room (refrigerator) is introduced (as part of our background
knowledge). The transformation matrix and constraints for the is_in_room
(refrigerator) arc are computed by multiplying the matrices for T^(room,To) and
T^(To,re) and propagating the constraints. In addition, the generic constraints for
the is_in_room relation are stored (they only state that an object must be com-
pletely inside the room). A possible situation for these two sentences is shown in
Figure 4. As the walls are not distinguished, we have chosen to orient Torsten so
that his coordinate system is parallel to the room coordinate system. This
corresponds to the transformation matrix
$$ T^{(room,To)} = \begin{pmatrix} 1 & 0 & 0 & \Delta_x^{(room,To)} \\ 0 & 1 & 0 & \Delta_y^{(room,To)} \\ 0 & 0 & 1 & To.h - room.h \\ 0 & 0 & 0 & 1 \end{pmatrix} $$

with the constraints

$$ -(room.w - To.r) \le \Delta_x^{(room,To)} \le room.w - To.r \quad\text{and}\quad -(room.d - To.r) \le \Delta_y^{(room,To)} \le room.d - To.r $$
The refrigerator has an intrinsic front (presumably where the door is) and is
oriented with its back side against the wall. The transformation matrices for
left_of (Torsten, refrigerator) and is_in_room (refrigerator) are
$$ T^{(To,re)} = \begin{pmatrix} 0 & 1 & 0 & \Delta_x^{(To,re)} \\ -1 & 0 & 0 & \Delta_y^{(To,re)} \\ 0 & 0 & 1 & re.h - To.h \\ 0 & 0 & 0 & 1 \end{pmatrix} $$

with the constraints

$$ \Delta_x^{(To,re)} \le -(To.r + re.d) $$
$$ -\big({-\Delta_x^{(To,re)}} - re.d - re.w\big) \le \Delta_y^{(To,re)} \le -\Delta_x^{(To,re)} - re.d - re.w $$

and

$$ T^{(room,re)} = \begin{pmatrix} 0 & 1 & 0 & \Delta_x^{(room,re)} \\ -1 & 0 & 0 & \Delta_y^{(room,re)} \\ 0 & 0 & 1 & re.h - room.h \\ 0 & 0 & 0 & 1 \end{pmatrix} $$
The constraints on Δy^(To,re) contain the variable Δx^(To,re) and are not used to
build further constraints for Δy^(room,re).
This strategy of immediately updating the is_in_room arcs corresponds to
the selection of the room coordinate system as a global reference frame. Though
we retain all arcs given by the propositions, we always compute the position
relative to the room. To use the coordinate system of the protagonist (e.g.
Torsten) would be cognitively more plausible, but as yet we have no conclusive
results about the building and updating of mental models and have opted for the
most efficient (from a computer science point of view) strategy.
Suppose that after sentence (3), we have the graph in Figure 5. In this graph
all and only the relations given in the text plus the implicit is_in_room arcs are
represented. Suppose further that we were asked the relationship between Torsten
and the bowl. To infer the relation between Torsten and bowl, we have to find a
directed path from bowl to Torsten. In the figure there is exactly one such path,
but — as stated above — we can easily compute the inverse relation, i.e. invert
the arc, by computing the inverse matrix. These inverted arcs are not introduced
into the graph, because we want to keep the number of arcs as small as possible,
but can be used for the inference. Now we get four possible paths: bowl –
refrigerator – Torsten, bowl – room – Torsten⁻¹, bowl – refrigerator – room –
Torsten⁻¹, and bowl – room – refrigerator⁻¹ – Torsten, where the raised ⁻¹
denotes inverting the preceding arc.
for every object ensures that the graph is always connected. Therefore we will
always find at least one path. Which path is chosen for the inference depends on
the search strategy. From a computational point of view, we might prefer to omit
the search altogether and take the path via the room node which is always a
shortest path. This is the default strategy we have implemented at present.3 If we
have found a path, we can multiply the transformation matrices at the arcs. For
the default path via the room node, we get the equation
$$ T^{(Torsten,bowl)} = \left(T^{(room,Torsten)}\right)^{-1} \times T^{(room,bowl)} $$

$$ = \begin{pmatrix} 1 & 0 & 0 & -\Delta_x^{(room,To)} \\ 0 & 1 & 0 & -\Delta_y^{(room,To)} \\ 0 & 0 & 1 & -(To.h - room.h) \\ 0 & 0 & 0 & 1 \end{pmatrix} \times \begin{pmatrix} 0 & 1 & 0 & \Delta_x^{(room,bowl)} \\ -1 & 0 & 0 & \Delta_y^{(room,bowl)} \\ 0 & 0 & 1 & 2re.h + bowl.h - room.h \\ 0 & 0 & 0 & 1 \end{pmatrix} $$

$$ = \begin{pmatrix} 0 & 1 & 0 & \Delta_x^{(room,bowl)} - \Delta_x^{(room,To)} \\ -1 & 0 & 0 & \Delta_y^{(room,bowl)} - \Delta_y^{(room,To)} \\ 0 & 0 & 1 & 2re.h + bowl.h - room.h - (To.h - room.h) \\ 0 & 0 & 0 & 1 \end{pmatrix} $$
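The default strategy can be summarised in a few lines of Prolog-style pseudocode (ours, not the system's code; arc/3, inverse/2 and compose/3 are hypothetical predicates, and the constraint propagation performed by the implemented system is omitted):

% path via the room node: every object has an is_in_room arc,
% so this path always exists
relation_via_room(A, B, T) :-
    arc(A, room, TRoomA),        % matrix T^(room,A) stored at the arc
    arc(B, room, TRoomB),        % matrix T^(room,B)
    inverse(TRoomA, TInv),
    compose(TInv, TRoomB, T).    % T^(A,B) = (T^(room,A))^-1 x T^(room,B)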
After simplifying the expressions as far as possible and propagating the con-
straints for the parameters in the matrix, we have to match the constraints for the
transformation matrix against the definitions of all pre-defined relations. For the
example above, it can easily be proved that left_of (Torsten, bowl) holds (and
none of the other relations, if we only provide the set of relations shown in Figure 2).
For the natural path bowl – refrigerator – Torsten, the computations would be
even simpler.
In general, this task is by no means trivial, because we can get nonlinear
equations involving trigonometric expressions. As shown above, a possible
simplification of the problem is the use of defaults. Currently, we restrict the
orientations of objects to multiples of 90˚ around the z-axis. This means that the
coordinate systems of all objects become parallel or orthogonal. This makes the
inferences easier without too much loss of expressive power.
3. Related Work
In this paper we have focussed on the program and have largely ignored the
psychological questions concerning the mental model. These questions are
discussed in Claus et al. (1998). The model as presented above is implemented
as a prototype SPACE/0 (Wiebrock et al. 2000). In this prototype, we only
consider intrinsic uses of spatial expressions. The graph we build contains the
explicitly given relations plus the implicit is_in_room arcs. After inference
(including constraint propagation) these arcs contain all available information
about an object’s position in the room. Therefore the most efficient inference
strategy is to use the path via the room node. This corresponds to a mixed
strategy of localizing all objects with respect to the global reference frame of the
room and still retaining the arcs for the explicitly given relations. As this
example shows, our (implemented) mental model is not equivalent to a single
perspective but allows different perspectives to be computed, if necessary. For the
sample text above, another plausible strategy would be to assume a protagonist’s
perspective and use the protagonist’s coordinate system as global reference
frame. We have preferred the room reference frame because a reorientation of
the protagonist would necessitate the recomputation of all adjacent arcs. There-
fore we tried to keep the number of those arcs as small as possible.
In the example above we never had to consider more than one mental
model. When using relation definitions with disjunctions (i.e. for the relation
beside), we may be forced to construct several pairwise incompatible models.
Another case is discussed in Wysotzki et al. (1997). In this paper, disjoint
models have to be constructed for the sentence “The oven stands at the wall.” As
there are four possible walls, the sentence is ambiguous. At encoding time it
cannot be decided which wall is meant. The situation is disambiguated with the
next sentence of the text, and three of the models can then be discarded.
In the future we plan to parameterize the model so that different strategies
both for the introduction of arcs — at encoding time — and the inferences at
access time can be modelled.
Acknowledgments
This research was supported by the Deutsche Forschungsgemeinschaft (DFG) in the project
“Modellierung von Inferenzen in Mentalen Modellen” (Wy 20/2–1) within the priority program on
spatial cognition (“Raumkognition”).
Notes
1. We omit introduction, filler sentences and object descriptions. In our experiments (Hörnig et al.
1996), we use German texts.
2. At present, room objects are the only structured objects we consider.
3. Recent results described in Hörnig et al. (1999) suggest that the initial position of the protago-
nist, e.g. Torsten may serve as global reference frame. This would correspond to another
organization of the graph.
References
1. Introduction
2. Computational background
Kogan (1963) have derived bipolar cognitive styles by contrasting scores on tests
designed to measure abilities at the two poles; they term this methodology
'constructed contrasts'.
This idea that styles have a ‘balance of advantage’ across tasks leads to
their third characteristic: they are taken to be pervasive rather than focussed,
affecting what people do across a wide variety of situations rather than in a
single situation. To continue with our previous example, if field independence
were only the capacity to perform extractions of geometrical figures from
complex grounds, then it would be a poor candidate for a style. Field indepen-
dence is of interest because this apparently ‘low level’ perceptual ability
correlates with abilities on superficially quite different tasks.
The contrast between ‘visual’ and ‘verbal’ thinking bears most of the
hallmarks of a cognitive style dimension. These are modes of processing which
certainly manifest in terms of strong personal preferences. For a wide range of
tasks there are choices between doing them in different ways which might at
least loosely be called visual or verbal. They are not obviously societally
evaluative abilities. To say of someone that they are not intelligent is negative:
to say they aren’t visual or aren’t verbal is more neutral, bearing the implication
that they have a profile of abilities at specific tasks where this style will advan-
tage some and disadvantage others.
Along with this growth in thinking about individual differences in terms of
styles rather than capacities, there has continued a not unrelated controversy
about whether visual or verbal ability ought to be conceived in terms of differen-
tial ability to represent information in one or other modality, as opposed to
different strategies for choosing and reasoning with representations.
There are many different ‘visual’ abilities — at least many different tests
of plausibly ‘visual’ tasks which yield less than totally correlated scores —
mental rotation, paper folding, embedded figures, Raven's matrices, the GRE
analytical reasoning subscale, etc. Setting aside for the moment the
issue of how to individuate these abilities, perhaps the most obvious explanation
of individual differences associated with visual/verbal tasks is in terms of
capacity to summon up, or maintain mental representations in the two modalities.
However, Lohman and Kyllonen (1983) have suggested that ‘high spatial’
subjects are to be characterised in terms of their flexible choice of strategies for
reasoning, rather than representational capacity per se.
Lohman & Kyllonen studied performance on the paper-folding test and
found that students adopted various strategies in solving the task. Those
students who scored highest on the test adapted their strategy from a
'spatial representation' for the easier problems to a more 'analytic' method for
one cannot escape strong intuitions that there can be a separation of ‘operative
skill’ with an algorithm, from ‘insight’ into why the algorithm is sound. The
graphically inclined students often spontaneously explain, in a non-technical
vocabulary, why the algorithm is as it is and how it maps on to the logical
model-theory for the domain. This might, for example, be couched as an
observation about how the algorithm ensures that ‘all cases are searched’. In
contrast, the graphically disinclined may have perfectly mastered the operation of
the ‘graphical calculator’, but in debriefing, it remains for them merely a
graphical rigmarole. Asked why some aspect of the graphical procedure is as it
is, the stereotypically graphically disinclined reasoner might reply: ‘Search me
guv, that’s how you said to do it!’
Conceiving of individual differences as styles, and attributing visual/verbal
differences to strategic adoption and deployment of methods are movements in
similar theoretical directions. It might be objected that moving to studies of
external representation use more or less guarantees that the data will be dominat-
ed by strategic differences rather than pure capacity to hold representations.
Perhaps the right rhetorical stance is to accept this truth, but to adopt a research
program which asks to what extent we can explain the phenomena of representa-
tion use (both internal and external) in terms of strategic differences. The
computational studies of the last half century have provided us with a rich
conceptual vocabulary and considerable understanding of the importance of
strategy in reasoning, and how strategies interact with representations. Psycho-
metric approaches have always suffered from their lack of process accounts of
the mental goings on which they measure. Applying computational concepts to
issues of individual difference would appear a promising program.
Roberts (1998) responds to studies revealing that individual differences are
replete with strategic influences by proposing a consistent progression in the
development of strategies as subjects gain expertise with a task. This progression
starts out with adoption of spatial strategies but develops in the direction of their
progressive optimisation by the intercalation of verbal ‘short cuts’ which increase
the efficiency of reasoning. The ‘cancellation’ of reciprocal spatial operations in
the compass directions task is a paradigm example of this kind of efficiency
gain. Spatial, or visual, subjects are the ones who are more adept at making these
modifications of strategies in the course of practice with a task.
At one level, our logic teaching observations reviewed above, the Hyperproof
results, the GRE workscratchings study, and the Euler Circles results are all
consistent with Roberts' line of reasoning.
terms of differences in subjects’ strategies, or their learning of strategies, are
more consistent with the data than explanations in terms of some fundamental
ability to represent.
However, at another level, the computational explanation is at least superfi-
cially opposite. Our approach to the issue of when graphical representation is
useful is predicated on the fact that graphical representations are weakly expressive
and therefore tractable to reason with, as long as the reasoning task can be
solved within the expressive resources on offer. So graphical reasoning is
efficient relative to reasoning in general expressive languages which suffer from
providing too many avenues of reasoning for computational tractability. By the
same token, graphical reasoning systems are also specialised. They may well
constrain reasoning in a way that means that some class of answers lies com-
pletely outside their range of expression. A reasoner operating within some
graphical system who was set such a problem would have to switch out of the
system, either into a more general linguistic one, or perhaps to an alternative
graphical system specialised for a range of problems which included the new one.
So there is a definite tension between Roberts’ idea that linguistic methods
arise by the optimisation of graphical ones, and our idea that graphical systems
are efficient because they are special purpose. What is the resolution of this
conflict? There is plenty of scope for resolution within a computational concept-
ualisation, if only because so little is specified, in general terms, about the
psychological tasks which crop up in this literature. Notably, what ‘verbal’
systems of representation are in play in these tasks is rather unclear. Although
languages of the kind whose computational properties are much explored are
highly expressive, there are of course other ‘verbal systems of representation’
which are highly restricted in their expressiveness. Particularly relevant to one of
our graphical examples above, categorial syllogistic logic is an extremely
restricted language which is exactly equivalent in expressive power to Euler’s
graphical system. The great difficulty that arises in understanding language use
in reasoning is exactly the question of what language is ‘in play’ at any given
point. The syllogism is a very tiny fragment of a natural language such as
English. When someone ‘does syllogisms’ verbally, are we to think of them as
‘doing them in English’ or ‘doing them in the syllogistic fragment of English’?
These are deep questions, but ones which are important for further develop-
ment of our understanding of human representational behaviour. This is an area
where understandings from AI and computer science may provide some help
with questions about how to conceptualise mental processes. Whereas an
must respect these general meanings, but in order to be efficient, they must
exploit limitations of the particular fragments at hand in the ways they exert their
extra-logical control of reasoning processes.
This is the kind of picture of inference we get from AI and computer
science. From this perspective, graphical reasoning systems can be seen as
bundling together language and theorem provers in ways that make their
separation a great deal less clean than it is in the linguistic case. This is the
reason why graphical systems are by their nature specialised and local and do not
include anything corresponding to the general languages so prominent in the
sentential systems. The jumps from graphical system to graphical system are
much more idiosyncratic (think of the change from Hyperproof’s blocks-world
diagrams to Euler’s Circles). Indeed, to invent an account of the space in which
these transitions take place we would be forced back onto thinking in terms of
some general logical language.
So coming from a logical/computational direction it is natural to think of
graphical systems as the specialisation of reasoning within localities of a space
defined by some general language. Roberts, in contrast, is thinking of subjects
who start their reasoning from some specification of a task which is often
naturally represented graphically (e.g. the compass directions tasks mentioned
above). Within such restricted systems, it is nevertheless often possible to impose
further restrictions which then allow yet simpler representations for reasoning.
So, for example, the cancellation strategy only works in a fragment of the
graphical system of compass directions if the task is restricted to unit distances
or the task is limited to computing the outcome direction of travel.
A rather richer and more interesting set of relationships between graphical
and sentential methods occur with syllogisms. Here it is possible to illustrate how
one can start with a graphical system of reasoning and transform it into the
equivalent of a sentential system by introducing optimisations which start out as
strategies for selecting graphical representations. In the ‘primitive’ interpretation
of Euler’s Circles, which is uniformly the one people initially adopt, one circle
diagram depicts one model of the syllogism. This leads to a combinatorial
explosion of diagrams if one solves syllogisms by drawing diagrams of all
possible models of a pair of premisses, and then searching to see whether some
statement holds of all such models (and is therefore a valid conclusion). This is
the crux of Johnson-Laird’s argument for rejecting Euler as the basis for
syllogistic reasoning, which prompted the invention of mental models notation.
This approach is hopelessly inefficient, but equally easy to improve on by
improving the strategy of diagram selection. Instead of exploring all possible
diagrams, the constraints of the domain allow the reasoner to adopt the ‘weakest
case’ diagram for each premiss, and to combine them into the ‘weakest case’
graphical unification. Adopting this strategy requires corresponding limitations to
be placed on drawing inferences from the final diagram, but with a suitable
strategy for conclusion drawing, the method becomes extremely efficient —
every problem can be solved by constructing a single diagram according to
general principles. Finally, the strategies of representational choice can be
reflected explicitly in the graphics by a simple notation given a systematic
semantics (for a fuller explanation see Stenning & Oberlander 1995). Graphical
representation system plus reasoning strategy thus yields a far more efficient
modified graphical system, through the incorporation of a theorem prover into a
primitive graphical system. This amounts to an example of Roberts' honing of a
representational system to gain reasoning efficiency, but takes place by graphical
modifications rather than linguistic ones. The incorporation of strategy into
representation must be a common method of increasing efficiency. It is in fact
not uncommon for subjects to invent something like this ‘cross notation’ to
augment Euler. We have seen our subjects do it, and Ford (1995) reports data
which include similar cases.
But the story about relations between graphical and linguistic systems
doesn’t end there. It turns out that the augmented Euler system is functionally
equivalent to a simple propositional sentential system (see Stenning & Yule
1997). What all these methods abstractly share is that they construct specifica-
tions of critical individuals which are minimal models of the premisses and
which specify the drawing of valid conclusions. The graphical method specifies
them in terms of sub-regions in the plane defined by closed curves representing
properties: the sentential method constructs conjunctions of propositions repre-
senting the same properties. The sentential method lends itself to an optimisation
which saves working memory by only representing the single critical individual,
though in doing this it incurs some extra overhead of a more complex theorem
prover. The graphical method more naturally represents all possible types of
individual, though focusing on a single one by the cross notation. Again it is the
focusing of a parallel strategy of reasoning onto a single serially constructed case
which transforms a graphical method into something in step-by-step correspon-
dence with a linguistic method. This more obviously accords with Roberts’
notion of honing graphical methods towards linguistic ones.
These examples give us a clue that strategy and representation may not be
quite as cleanly separable as our original discussion suggested. True, within a
reasoning system at a particular point in its development, what is strategy and
what is representation may be crisply distinguishable. But when we think of
systems of reasoning evolving during learning and optimisation of a new task,
we see that what is strategy at one point may become representation at another.
And we must think of these evolutions if we are to understand human reasoning.
Human reasoning is as much about developing new systems of reasoning as
about operating them.
This returns us to our observation of the independence of graphical ‘opera-
tive skill’ and understanding of the rationale behind it. The same gulf may well
exist for linguistic operation and understanding, but we have just seen some
reasons why such a gulf may be somewhat less crisply evident. The very nature
of linguistic rule systems is that their subsystems are less neatly demarcated and
therefore less susceptible to meta-logical demonstration. Be that as it may, the
graphical case suffices to illustrate an important point.
Even when systems of reasoning by graphical and sentential methods can be
shown to be internally identical, the properties of their representations may give
very different opportunities for meta-reasoning about their properties. The Euler
system is obviously self-consistent to any reasoner who understands intuitively
the basic plane geometry of closed convex curves. Proving the self-consistency
of the equivalent sentential system requires some sophistication. It is also less
likely to occur to a naive reasoner as a property of the system, if only because
the neighbouring fragments of the language do not have this property. If, as we
have argued throughout, we must understand much of human reasoning as
reasoning about systems as much as reasoning within them, then the accessibility
of their properties from the externals of the systems becomes critically important.
This points to the need for further understanding about how users of linguistic
systems structure their understandings of the various fragments which make up
the whole patchwork. Because the borders between one fragment and the next
are less clear, it is also less clear what knowledge is required to derive a theorem
prover to exploit constraints to solve a new task in a new domain.
Characterising cognitive styles computationally requires a better understand-
ing of how reasoners learn to get around in the spaces we have been discussing.
Getting around consists of preferring some strategies and representations for
reasoning to others. Perhaps even more, it consists in different approaches to
learning how to develop efficient ways of reasoning in new domains under new
sets of constraints. The trajectories from naivety to expertise within the space of
possibilities for a new domain and task may be more coherent than the particular
patterns of reasoning at any point along the trajectory. Styles are preferences for
systems of reasoning either more or less applicable across a wide ranges of tasks.
Styles will, of their nature, be more successful in some domains and for some
tasks than others.
It is customary to think of individual differences in terms of individuals
References
Background
construction also form one of the most important sources of empirical evidence
in their favour.
One substantial cost associated with constructing a mental model is that of
premise integration (Evans, Newstead & Byrne 1993). The idea behind this is
quite simple: a representation of the described situation has to integrate informa-
tion across two or more propositions in order to build an appropriate model. The
origin of this idea appears to date back at least as far as William James (1890),
but was articulated as a process theory by Hunter (1957). Hunter suggested that
people manipulate the premises of three-term series problems in order to place
them in a suitable form with which to reach a conclusion.1 He proposed that
problems of the form
Anna is happier than Barbara
Chris is sadder than Barbara
require conversion. In this case readers convert the sentence Chris is sadder than
Barbara into Barbara is happier than Chris. This means that the conclusion Anna
is happier than Chris can be reached without considering the middle term
Barbara (James termed this method the “axiom of skipped intermediaries”).
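As an illustration (our rendering, not Hunter's formalisation or the authors' code), the two operations can be stated as rewrite rules in Prolog:

% conversion: "X is sadder than Y" becomes "Y is happier than X"
convert(sadder(X, Y), happier(Y, X)).
% skipped intermediaries: chain two premises sharing a middle term
conclude(happier(A, B), happier(B, C), happier(A, C)).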
Similarly, problems of the form
Barbara is sadder than Anna
Chris is sadder than Barbara
require re-ordering. If we consider Chris is sadder than Barbara before Barbara
is sadder than Anna, the axiom of skipped intermediaries gives us Chris is sadder
than Anna as the conclusion. There is only one problem with the simple and
elegant theory that Hunter proposed: it makes the wrong pattern of predictions of
accuracy and reading time for three-term series problems. To take just one
example, Huttenlocher (1968), Clark (1969) and Sternberg (1980) all found that
problems like
A is better than B
B is better than C
took longer to answer than problems such as
A is better than B
C is worse than B
This goes against Hunter’s theory because the latter, but not the former, suppos-
edly require conversion (for a comprehensive review of the evidence see Evans
et al. 1993). Nevertheless, the evidence that premise integration occurs is
considered rather strong. For example, Maybery, Bain and Halford (1986)
showed that pairs of sentences which permit integration (those with a shared
term) take longer to read than control sentences (sentences with no shared term).
observation is that, according to Hunter (1957), sentence (2) does not require
conversion; the theories make different predictions about when conversion
occurs. Sentence (3) illustrates that the model view generalizes to two-dimension-
al situations not found in traditional three-term series problems:
A B C
D
The final proposition in the trace shows that conversion is not required when a
sentence describes new tokens in relation to old tokens:
[ D A below]
The construction processes described here carry with them the implication that
introducing old, given information before new information in a sentence imposes
a greater cost during premise integration. This prediction appears to contravene
the “given-new contract” proposed by psycholinguists (Haviland & Clark 1974).
The mental model account also suggests two other situations where construction
costs are increased. The first, and least interesting, is that the initial sentence of
the description should carry an increased processing burden because two tokens
are being added to the model (indicated by the term start). This prediction is
analogous to that of Gernsbacher and colleagues who argue that laying a
foundation for a mental structure increases reading times (Gernsbacher 1990;
Gernsbacher & Hargreaves 1988). The second additional prediction is that when
a description becomes indeterminate, processing should become more difficult.
For example, consider a fourth sentence
(4) B is to the left of E
In this case the position of E is indeterminate with respect to C and is indicated
in the episodic construction trace like so
[ E B right [ clash [ E C ] ] ]*
What happens after a clash proposition is recorded depends on how a reader
responds. If, as suggested by Mani and Johnson-Laird (1982), readers abandon
model construction, post-clash processing load may decrease. Some readers may
however attempt to resolve the indeterminacy by constructing one, or possibly
both, of the possible models entailed by the description (Brédart 1987; Johnson-
Laird 1983).
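Concretely, on one plausible rendering (ours) of the two alternatives, with E either between B and C or beyond C, the two models would be

A B E C        A B C E
D              D

(D positioned as in the earlier array).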
Evidence in the literature on spatial descriptions for increased processing
cost associated with old-relative-to-new sentences is mixed. It may be worth
noting that in Hunter’s own experiment 11-year olds found problems requiring
READING TIMES FOR SPATIAL DESCRIPTIONS 321
conversion (by his account) easier than those of the form given by sentence (1)
and (2) above (predicted to require conversion by the mental model account).
Ehrlich and Johnson-Laird (1982) concluded that there was little evidence to
suggest a difference in reading times (or accuracy) in favour of either the new-
old or old-new sentence forms in the three experiments they reported. There are
several reasons why such a difference might be hard to find: low statistical
power, materials effects, and differences in tasks. Reading times are a more
sensitive measure of difficulty than is accuracy, but are potentially influenced by
many different factors which need to be controlled for (e.g. word length,
orthography, frequency or concreteness). Tasks such as three-term series
problems are not the most powerful test of the model view outlined above and
two-dimensional spatial problems are considered more decisive (Byrne &
Johnson-Laird 1989). For example, the operations described by Hunter (1957) are
a viable strategy for three-term-series problems (particularly when both premises
are presented simultaneously), and may be used by some participants some of the
time. Three-term series problems with spatial or non-spatial adjectives are
probably also heavily influenced by markedness as suggested by Clark (1969).
The unmarked, positive form of adjectives like “taller” or “better” seem to be
encoded in a simpler, more accessible form than their marked counterparts
“shorter” or “worse” (Clark 1969, 1973). However, the role of markedness in
encoding spatial prepositions (such as those used in the experiments reported
here) is less clear; it is not obvious that “right” and “left” can be categorized as
marked and unmarked prepositional pairs. For instance, the marked form “worse”
implies both relative and absolute badness, whereas neither “to the left of” nor
“below” necessarily imply absolute location in physical space (though this does
not extend to metaphorical usage such as in politics).
This paper analyses a large data set of reading times for spatial descriptions.
The chief focus is on predictions which follow from the computational model
described by Johnson-Laird (1983). The main prediction is that when mental
model construction entails the conversion of a sentence in order to enter a new
token in an existing model, reading times will increase. A secondary prediction
is that when a description is rendered indeterminate (i.e. when a clash occurs)
reading times will increase. Last, the account also predicts that reading times for
the initial sentence will be greater than for subsequent sentences (because two
tokens have to be added to the model).
Data set
The reading time data were pooled across 118 participants from three different
experiments. The participants were recruited at Cardiff University and were paid
or given course credit for taking part. All three experiments were carried out in
two phases. In the first phase participants were presented with a series of spatial
descriptions to read. Each description consisted of four sentences and was
presented by computer, one line at a time. Immediately after each description
participants had to try to recall the description by using the mouse to place
items in an empty grid on the computer screen. Each participant read 8 different
descriptions (4 determinate and 4 indeterminate). Of the 4 indeterminate descrip-
tions half became indeterminate in the third sentence (indeterminate-at-S3) while
the remaining descriptions became indeterminate in the last sentence (indetermi-
nate-at-S4). For the determinate descriptions, exactly half of the relevant sentences
were of the new-old form (and therefore not predicted to require conversion).
For the indeterminate descriptions, 5 out of 12 of the relevant sentences were of
the new-old form. Each
participant read the eight descriptions in a random order (new random orders
were generated in turn for each participant prior to the experiment). In phase 2,
participants in the experiments carried out a surprise recognition test (not
necessarily the same test in each experiment). The analyses presented here focus
solely on the initial reading times of participants in phase 1.
Table 1. Sample determinate and indeterminate descriptions with their corresponding episodic construction (EC) traces
Determinate description: Corresponding EC Trace:
D1. The blouse is in front of the kilt [start [blouse kilt front]]
D2. The vest is in front of the blouse [vest blouse front]
D3. The shawl is to the left of the blouse [shawl blouse right]
D4. The overcoat is behind the shawl [overcoat shawl behind]
Indeterminate description: Corresponding EC Trace:
I1. The coke is in front of the lemonade [start [coke lemonade front]]
I2. The lemonade is to the right of the vodka [vodka lemonade left]*
I3. The lemonade is to the right of the scotch [scotch lemonade left [clash [scotch vodka]]]*
I4. The vodka is behind the brandy [brandy vodka front]*
Note: Sentences D2, D3 and D4 are examples of the new-old sentence form. I2, I3 and I4 are
examples of the old-new sentence form.
The three experiments were identical in phase 1 except for minor differenc-
es in the wording of the descriptions (the basic form of the descriptions remained
unchanged). The same basic structure was used for the eight descriptions in each
experiment. The eight descriptions used objects from eight different categories
(animals, birds, clothing, drinks, fruit, instruments, gems and vegetables). In each
experiment participants were randomly assigned to one of several different sets
of materials (different sets were used for each experiment). For each set every
description was randomly allocated to a category and the objects within that
description randomly selected from the five members of the allocated category.
Category members were approximately matched for Kucera-Francis frequency,
imagery and concreteness (Kucera & Francis 1967; Quinlan 1992). Table 1 gives
sample materials typical of those used in phase 1 of all three experiments.
Analyses
Mean reading times per syllable were calculated for those descriptions which
were correctly recalled during phase 1. (Reading times for incorrectly recalled
descriptions were not included in the analysis, but followed a similar overall
pattern, as did the reading times uncorrected for sentence length in syllables).
Two analyses were conducted. In the first analysis a 3 × 4 repeated measures
ANOVA was performed on reading times per syllable with determinacy (deter-
minate, indeterminate-at-S3, or indeterminate-at-S4) and sentence order (S1, S2,
S3 or S4) as the factors.
Figure 1 shows a graph of the means for each combination of determinacy
and sentence order for the 70 participants who contributed to the analysis
(participants with missing cells could not be included in the factorial design).
The interaction between determinacy and sentence order was significant,
F(6,420) = 8.09, MSE = 0.286, p < .0002.
Three planned comparisons were made (these analyses were carried out using
paired t tests on the whole data set, not just those included in the factorial
ANOVA). The first comparison tested the prediction that the initial sentences
would take longer to read than later sentences; this was tested by comparing the
mean reading times for the first and second sentences (this excludes consider-
ation of determinacy from the analysis, as all descriptions are determinate prior
to sentence 3). As predicted, reading times for sentence 1 were longer than for
sentence 2, t(112) = 2.51, SE = 0.051, two-sided p = .014. The remaining two
planned comparisons compared reading times for sentence 3 of the indetermi-
nate-at-S3 descriptions and sentence 4 for the indeterminate-at-S4 descriptions
with a control sentence (the corresponding sentence 3 or 4 of the determinate
descriptions). It took participants longer to read sentences which introduced an indeterminacy than to read the corresponding control sentences.

Figure 1. Mean reading times per syllable (in seconds) for each combination of determinacy (determinate, indeterminate-at-S3, indeterminate-at-S4) and sentence order (S1 to S4)
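For readers who wish to replay this analysis structure on data of their own, a minimal sketch in Python follows. The data frame layout, the column names and the file name are placeholders, not the original data set; the original analyses were of course run in a standard statistics package.

```python
# Sketch of the planned comparisons described above (scipy); the CSV
# file and column names are placeholders, not the original data.
import pandas as pd
from scipy import stats

rt = pd.read_csv("reading_times.csv")   # one row per participant; mean
                                        # reading time per syllable per cell

# Prediction: initial sentences are read more slowly than second
# sentences (determinacy is irrelevant before S3, so it is ignored).
t12, p12 = stats.ttest_rel(rt["s1"], rt["s2"])

# Indeterminate sentences versus the corresponding determinate controls.
t3, p3 = stats.ttest_rel(rt["indet_s3"], rt["det_s3"])
t4, p4 = stats.ttest_rel(rt["indet_s4"], rt["det_s4"])
print(f"S1 vs S2: t = {t12:.2f}, two-sided p = {p12:.3f}")
```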
Discussion
Reading times follow the pattern predicted by the operations required to construct
a spatial mental model. The initial sentence takes longer to process than compa-
rable later sentences. Sentences which introduce an indeterminacy result in a
sudden increase in reading times. Last, but not least, sentences of the form new-
old are easier to process than sentences of the form old-new. This is consistent
with the view that old-new sentences, but not new-old sentences, need to be
converted in order to integrate the new token into a spatial mental model.
Conclusions
These findings provide strong support for the three predictions derived from
the theory of mental models (Johnson-Laird 1983). In this paper we have used the
episodic construction trace to clarify the predictions about increased processing
cost during premise integration. These predictions follow from the idea that
people construct and manipulate mental models during reasoning and language
comprehension. All three predictions were confirmed in an analysis of reading
times, and therefore support the view that spatial mental models are constructed
when people read and understand spatial descriptions.
The findings reported here go against a simple interpretation of the given-
new strategy proposed in psycholinguistics. In at least one domain it appears that
people find the order new-given easier to read and understand. While this may
seem counterintuitive (particularly in the light of alternative accounts such as that
proposed by Hunter), it makes more sense if you consider the way common
spatial language terms are used. In general, people introduce new spatial
locations relative to known landmarks. Thus people seem to use spatial language
in a way which eliminates an unnecessary conversion cost on readers and
listeners. Natural conversation is much less constrained than the sentence
structures we have used in our experiment, and speakers are readily able to
separate the given-new status of information from the demands it makes on
model construction. For instance, a speaker might introduce a new token as a
potential topic prior to providing the information necessary to add the token to
a spatial mental model (“That new shop I told you about is behind the library”
rather than “The library is in front of that new shop I told you about”). From a
given-new perspective the given information is that there is a new shop (which
had previously been mentioned). The new information is that the shop is behind
the library. From a mental models perspective the new token in the model is the
shop and the old token is the library. This kind of separation between the given-
new contract and the demands of mental model construction requires writers and
speakers to draw on a rich well of general and individual knowledge about their
intended audience (e.g. what they might be interested in, what landmarks they
are familiar with and so on). One promising line of research in this area is to
consider language as “processing instructions” for constructing an appropriate
mental model (e.g. Gernsbacher 1990).
Acknowledgments
The authors wish to thank Keith Stenning and two anonymous reviewers for constructive comments
on earlier versions of this work. Preparation of this research was supported by the United Kingdom
Economic and Social Research Council (project grant number R000235641) and the Faculty of
Science, Loughborough University. Correspondence concerning this paper should be addressed to
Thom Baguley at the Department of Human Sciences, Loughborough University, Loughborough,
Leicestershire, LE11 3TU, United Kingdom (email: T.S.Baguley@lboro.ac.uk).
Notes
1. In a strict sense the theory proposed by Hunter might still be viewed as propositional because
it involves operations applied to the premises rather than to a mental model. In practice his
operational account can be considered a half-way house between propositional theories and later
image or model accounts.
References
Generating and Exploring Visual Mental Images

Mathias Bollaert
LIMSI-CNRS, Université de Paris-Sud
1. The experiment
In both conditions the material was presented three times consecutively, defining three stages of learning, in order to study the evolution of internal representations during learning.
Figure 2. Materials. Each configuration occupied six sites arranged in a 2 × 3 grid (A, B, C across the top row; D, E, F across the bottom). In the visual condition the configurations were presented pictorially; in the verbal condition they were described as follows:

Description 1: At the upper left, there is the brush. Below the brush, there is the teapot. To the right of the teapot, there is the candle. Above the candle, there is the envelope. To the right of the envelope, there is the knife. Below the knife, there is the clock.

Description 2: At the upper left, there are the scissors. Below the scissors, there is the bottle. To the right of the bottle, there is the pen. To the right of the pen, there is the hammer. Above the hammer, there is the cigar. To the left of the cigar, there is the bulb.
In the visual condition, the subjects learned two configurations. During the test,
they heard a first word and then had to mentally reconstruct the configuration in
which the corresponding object was present, as fast as possible, and focus their
attention on this object. Then they pressed a key, which provided a generation
time. A second word was given, and subjects had to decide whether or not the
corresponding object belonged to the same configuration as the first object, by
pressing the "Yes" or "No" key. This provided an exploration time. Subjects were not explicitly required to scan the mental image. Thus, the term "exploration" refers not only to the scanning of a mental image, but to any sub-process
involving the visuo-spatial internal representations which may provide the
decision. The verbal condition differed only in the way in which the configura-
tions were presented, that is, sentences describing the relative positions of pairs
of objects. After the three stages of learning in one condition, the subjects went
through the experiment again with the other condition, which involved new sets
of objects (half of the subjects started with each condition). At the end, they
were asked to answer a follow-up questionnaire.
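The two dependent measures fall directly out of this trial structure. As a schematic illustration (not the software used in the experiment; the presentation and response routines are stand-ins), the logic of one test trial is:

```python
# Schematic of one test trial: the first key press yields a generation
# time, the yes/no decision yields an exploration time. present_word and
# wait_for_key are placeholders for real stimulus/response routines.
import time

def present_word(word):
    print(word)

def wait_for_key():
    return input()                        # stand-in for a timed key press

def run_trial(first_cue, second_cue):
    present_word(first_cue)               # subject generates the configuration
    t0 = time.perf_counter()
    wait_for_key()                        # key press: image generated
    generation_time = time.perf_counter() - t0

    present_word(second_cue)              # same configuration or not?
    t1 = time.perf_counter()
    answer = wait_for_key()               # "yes" or "no" key
    exploration_time = time.perf_counter() - t1
    return generation_time, exploration_time, answer
```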
2. Results
Learning had a significant effect on the generation and exploration times, under
both conditions, with the longest response times occurring at the first stage of
learning, and the shortest ones during the last stage (Figure 3). The percentage
of correct responses also increased during the three stages, from 78%, to 87%,
to 90%. Both generation and exploration times were shorter when the informa-
tion was presented visually, but the effect was not significant (Figure 4). For the
generation times, the difference was marked for the first and the second stages,
and almost disappeared in the third stage. This may indicate that the processes
responsible for generating the internal representations from long term memory
became similar for the two conditions. The differences in response times
between conditions were smaller for the exploration times.
2.1 Generation
In the visual condition, the site of the object had a significant effect on genera-
tion times, with site B having significantly shorter generation times than site C
(Figure 5). Closer examination revealed that the objects in the center of the
configuration resulted in shorter generation times. The patterns of learning during
the three stages also differed significantly.
Figure 3. Learning
Figure 4. Conditions
2.2 Exploration
The main results for exploration times were quite surprising, as there was a
significant reversed mental scanning effect: the response times were shorter for
distant objects than for close ones. For visually-presented information, the
reverse scanning effect was significant for the second and third stages of trials,
and for the whole set of data (Figure 8). For verbally-presented information, the
effect was significant for all stages and for the whole set of data. However, in
this experiment, the material was not specifically designed for studying the
already well-known scanning effect, and the distant pairs occurred only between sites D and F.
As expected, we found an effect of the description path (Figure 9). The
time taken to answer the AB pair was longer with the loop-like path than with
the S-like path, although the difference was not significant (about 125 ms).
When we grouped together the AB and EF inferred pairs (in the S-like path)
instead of only the AB pairs, the effect was still present and remained non-
significant (about 50 ms less for the small inferences).
We have also investigated other effects which may explain the obtained results.
Here we present the clearest ones only. We studied how each site influences the
memorization of the item located at its position. Concerning exploration,
response times for the EF pair were significantly shorter than for the DE pair and
response times for the ED pair were also significantly shorter than for the FE
pair. Moreover, response times obtained for the site D or F as a target were
similar, independently of the distance between the two items. So what we called
the reverse scanning effect can be explained in terms of higher accessibility of
sites D and F. We also studied whether some directions were privileged during mental
exploration. For both conditions, the response times were shorter when the
second item was to the right of the first one than when it was to its left.
Moreover, we found that response times were significantly shorter when the
second item was below the first one than when it was above. This may be partly
responsible for better accessibility for sites D and F than for sites A and C.
3. Discussion
The conditions had an effect on both generation and exploration times. Thus, the
internal representations constructed from verbal descriptions seem at first glance
to differ from those constructed from visual presentation. But the path effect
during generation was probably due to the fact that subjects re-scanned the image
after generating it, following the path in the verbal condition (cf. Kosslyn 1994).
So the internal representations generated under both conditions might become
similar while the processes applied during exploration become more dissimilar,
resulting in response times that differed with the condition of presentation during
exploration as well as for the generation of the visuo-spatial internal representa-
tion. The unexpected position effect on generation times may be due to the fact
that subjects focused more on the center of the image when it was presented
visually, and thus better memorized the objects in the center. We referred above
to the high accessibility of sites D and F, independently of the distance from the
source item. This better accessibility during exploration seems to be due to the
conjunction of better accessibility of the objects in the corners and of better
learning of the target’s relative position when it is below the source. This may
indicate that two different subsystems are used for the memorization of the
spatial information: one storing the absolute position of the target, responsible for
better accessibility of objects in the corners, and the other one storing the relative
position between objects, being responsible for the path effect and better
accessibility when the target is below the source. The same idea of two subsys-
tems for shifting visual attention has been developed by Kosslyn, with one
subsystem for the categorical properties, more involved in the visual tasks related
to language, and one for the coordinate properties (Kosslyn 1994; Kosslyn &
Koenig 1992).
The results described here and those of earlier studies were used to build a
psychological model of the processes used by people to explore a mental image
(Figure 10). The model describes the various abstraction levels in the ventral
system, with three modules: the Object memory for higher visual areas (like
STPa), which is independent of the size of objects; the Feature memory for
intermediate visual areas, like CIT or AIT, which is size dependent; and the
Shape memory for retinotopically organized areas, which is equivalent to the
visual buffer in Kosslyn’s model. The internal representations in these three
modules slowly decay with time. The Attention shifting system is used to shift the
visual attention between the different parts of the Shape memory, and contains
two subsystems: a between-objects association for categorical spatial relations,
and an object-position association for coordinate spatial relations. The results of
both subsystems are summed. There is no competition algorithm between
subsystems as they are cooperative rather than competitive. However, there is a
competition after the summation, in the Attention zone, accounting for response
times proportional to the activation of the selected unit. Once the position of the
target is found in the Attention zone, the contrast of this region of the Shape
memory is increased, which permits the features of the item under examination
to be extracted in the Feature memory. As for the Shape memory, the Attention
zone increases the contrast and selects the part of the Feature memory to be
identified in the Object memory. The structure of the Feature memory is such that
it can be used as a visual store, without activating the Shape memory, allowing
less-visual strategies. The Spatial schema is a module used to extract spatial
information, or, more precisely, the positions of the different parts of the scene,
from the scene represented in the Shape memory. This module uses the spatial
information to generate the exploration paths used when scanning a mental
image, for instance from left to right, line by line. In the verbal condition, the
Spatial schema learns the description path, and uses it during the generation of
the visuo-spatial internal representations. We also assume that other associations
are learned: Position — Item associations between the Attention zone and the
Object memory. These associations allow subjects to use different strategies for
achieving the tasks, such as a purely spatial strategy.
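The structure just described can be condensed into a short sketch. The code below is our schematic reading of the model, with arbitrary vector sizes and an arbitrary decay constant; it is not Bollaert's implementation.

```python
# Schematic of the model's modules (illustrative values throughout).
import numpy as np

class DecayingStore:
    """One store (Object, Feature or Shape memory); its internal
    representations slowly decay with time."""
    def __init__(self, size, decay=0.95):      # decay rate is arbitrary
        self.act = np.zeros(size)
        self.decay = decay
    def step(self):
        self.act *= self.decay

def shift_attention(categorical_out, coordinate_out):
    """The two attention-shifting subsystems are cooperative: their
    outputs are summed, and competition applies only afterwards, in the
    Attention zone. Higher activation of the winning unit stands in for
    a faster response."""
    summed = categorical_out + coordinate_out   # cooperation, no competition
    winner = int(np.argmax(summed))             # competition after summation
    return winner, float(summed[winner])

# Example with six sites (A-F): both subsystems favour site D (index 3).
categorical = np.array([0.1, 0.0, 0.0, 0.7, 0.0, 0.2])
coordinate = np.array([0.0, 0.1, 0.0, 0.6, 0.1, 0.1])
site, activation = shift_attention(categorical, coordinate)
```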
Concerning the learning processes, we assume that the internal spatial
representations constructed during the learning period in the visual condition are
more activated than those constructed in the verbal condition. But the main
difference between conditions is that the spatial information is extracted and
identified from lower to higher areas during learning in the visual condition,
while the categorical spatial information is directly generated from the verbal
system to the visuo-spatial system in the verbal condition, and only the coordi-
nate spatial relations are identified from the Attention zone, as they are not
explicitly stated in the verbal description. We also assume that the categorical
spatial information extracted from the visual areas in the visual condition is more
activated when scanning from top to bottom, due to environmental factors. This
leads us to infer that activation of the categorical spatial information should be
greater for this direction of exploration in the visual condition, and thus should
result in shorter response times. Indeed, this is the case in the experiment:
response times in the visual condition are significantly shorter when scanning
from top to bottom than when scanning from bottom to top; the same tendency
can be found in the verbal condition, but it is not significant. Regarding the
coordinate spatial relations, we assume that they are more activated when
computed for objects in the corners, especially in the lower corners, and thus
better memorized. Thus, the absolute positions of objects in the corners are more
accessible than the absolute positions of objects in the center. As coordinate
spatial relations are identified in both conditions, the corners should be more
accessible in both conditions. This was indeed the case in the experiment. Lastly,
we assume that the internal spatial representation also decays with time. The time
to scan from one item to another following the description path in the verbal
condition is longer when the relative position of the two items has to be inferred
than when it is explicitly stated in the description. Therefore, the spatial internal
representation is less activated for inferred spatial relations, which results in
poorer memorization of the associations and thus in longer response times.
The units of the Shape memory are spatially organized (Figure 11). Reactivation
in the Shape memory is performed with recurrent connections from the Shape
selection to the Shape memory. The Attention zone is also spatially organized.
The Feature memory has two subsystems, a local subsystem which first identifies
the features in each region of the Shape memory, and a global subsystem, into which the activation of the local Feature memory is copied for the region where attention focuses. As for the Shape memory, the signal is reactivated via
recurrent connections linking the global to the local Feature memory. The
features are used to identify the item in the Object memory. The Object memory
has a second layer, a temporal buffer, identical to the first one, and the content
of the first layer is moved into the second layer at each step of propagation, to
simulate the progressive deactivation of the internal representation. During the
generation, the Spatial schema activates sequentially the units of the Attention
zone, following the description path in the verbal condition, and predefined paths
in the visual condition. The object associated with the current position is found
by the Item — Position associations. The features of this object are generated in
the global Feature memory, and then in the local Feature memory. During the
exploration, the spatial information associated with the items represented in the
Object memory is activated in the coordinate and the categorical attention shifting
subsystems. The categorical subsystem involves another module, which computes
the position of the target, given the position of the prime and the relative motion
to be made to join the position of the target. The implementation of this module
uses a temporal buffer of the Attention window, too. The responses of both
subsystems are filtered and then added to the Attention zone. We assume that the
visual system has little influence on this task, as it is more spatial than visual.
Thus, the activation level in the Attention zone is assumed to be mostly responsi-
ble for the response times and is used as the result of the simulations.
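The temporal-buffer mechanism lends itself to an equally compact sketch; the layer size is arbitrary and the copy-per-step scheme is our paraphrase of the description above, not the original code.

```python
# Sketch of the Object memory's temporal buffer: at each propagation
# step the first layer's content moves into an identical second layer,
# simulating the progressive deactivation of the representation.
import numpy as np

class ObjectMemory:
    def __init__(self, size):
        self.layer1 = np.zeros(size)      # current content
        self.buffer = np.zeros(size)      # temporal buffer (previous step)
    def propagate(self, new_pattern):
        self.buffer = self.layer1.copy()  # move current content into buffer
        self.layer1 = np.asarray(new_pattern, dtype=float)

om = ObjectMemory(6)
om.propagate([0, 0, 1, 0, 0, 0])   # item at site C becomes current
om.propagate([0, 1, 0, 0, 0, 0])   # previous item now sits in the buffer
```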
Learning occurs as follows: The dashed links are learned with a Hebbian
algorithm. The object identification and feature generation processes are learned,
in order to implement an exemplar memory system and to simulate the presenta-
tion of the items during the experiment. The learning patterns are constructed as
indicated in the psychological model: in the visual condition, we assume that
subjects scan their mental image using the Spatial schema; in the verbal condi-
tion, the visuo-spatial internal representations are generated from higher to lower
modules. Learning proceeds in three stages, as in the experiment. All the other
links represent the prior knowledge of the subjects. They are learned with a
backpropagation learning algorithm, except for the competition in the Attention
zone and the simpler sub-processes like deactivation, reactivation or copy from
one module to another, for which the weights were calculated. All the modules
were fully implemented except the Spatial schema. The specifications of this
module were used to produce the patterns for the dependent processes. These
patterns were some predefined exploration paths (in the visual condition) and the
description paths memorized during the learning period (in the verbal condition).
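For the learned (dashed) links, a generic Hebbian update of the following kind suffices to convey the idea; the outer-product form and the learning rate are illustrative assumptions, since the exact rule is not spelled out here.

```python
# Illustrative Hebbian update for learned associations, e.g. the
# Position-Item links between the Attention zone and the Object memory.
import numpy as np

def hebbian_update(weights, pre, post, eta=0.1):   # eta is arbitrary
    """Strengthen links between co-active units."""
    return weights + eta * np.outer(post, pre)

# Example: one presentation associating site D (position unit 3)
# with the object at that site (object unit 4).
positions = np.zeros(6); positions[3] = 1.0
objects = np.zeros(6); objects[4] = 1.0
W = hebbian_update(np.zeros((6, 6)), positions, objects)
```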
We do not assume that visual imagery involves retinotopic visual areas, a question which is still the subject of debate. The execution of the task relies more on spatial ability than on visual capacity (via the Attention zone). The model accounts for less-visual strategies with the Position — Item associations and the Feature memory. Thus, it is possible to manipulate rather rich visuo-spatial internal representations to perform the task without activating the retinotopic visual areas.
The simulations shown here are for a fully spatial strategy used during explora-
tion. This strategy was chosen because 52% of the subjects in the visual condi-
tion and 68% in the verbal condition declared in the follow-up questionnaire that
they did not really "see" the objects in the visual condition, but rather used a
spatial internal representation. In this spatial strategy, activation of the position of the
target in the Attention zone allowed the system to answer the question correctly,
using the Position — Item associations (only the positive answers were simulated).
Other simulations were also used to reproduce the generation in the Feature
memory. They reproduced some of the results found in the experiments, notably
better identification of the features of an object in the visual condition than in the
verbal condition, which results in shorter response times in the visual condition.
The linear trend in the verbal condition was also reproduced, due to the sequen-
tial generation of the items. The model also effectively generates, stores,
identifies and reactivates visual internal representations in the Feature memory,
but a strong visual similarity effect makes the results difficult to compare with
those from the experiment. The strength of this similarity effect seems to be
partly due to the small number of features used during the simulation (12) and to the low
resolution of the visual representation of the items, drawn on a 7 × 7 grid. But the
fact that our network shows a similarity effect is a clue to its validity, as this
effect is also found in human data (see e.g., Logie 1995).
All the following data are levels of activation in the Attention zone. Accord-
ing to the competition in this module, the response time is shorter when activa-
tion is greater. The competition is not performed here, as it mostly depends on
arbitrary parameters, and only excites the single activated unit. Thus, the
competition provides no new information about the simulation. The mean levels of
activation during exploration were 0.570 for the first stage of learning, 0.729 for
the second and 0.854 for the third, corresponding to decreasing response times
during learning. The difference was bigger between the first and second stages
than between the second and third, as in the experimental results. The activation
was greater for the visual condition (0.589, 0.766, 0.890) than for the verbal
condition (0.560, 0.710, 0.837), resulting in shorter response times for the visual
condition. The levels of activation were as expected from the results of the
experiment: in the visual condition, the mean activation of the unit selected in the
Attention window was lower for close objects (0.736) than for distant objects
(0.821); in the verbal condition, the mean activation for the large segments
(LSEGM) was greater (0.791) than the mean activation for the small segments
(SSEGM), the small inferences (SINFE) and the large inferences (LINFE), which
were 0.699, 0.665 and 0.629 respectively. In the visual condition, sites D (0.803)
and F (0.817) were more accessible than the other sites (all under 0.712). This
holds true for the verbal condition (sites D and F: 0.755 and 0.739; other sites all
under 0.699). Finally, we reproduced the effect of the direction of scanning,
particularly with better accessibility to the target object when it was below the
source (visual condition: 0.844; verbal condition: 0.722) than when the source
was below the target (visual condition: 0.679; verbal condition: 0.704).
7. Conclusion
The results of our experiment and data on the structure of the cortical areas of
vision were used to construct a modular network that gives patterns of results
that are compatible with the experimental results. The model is able to generate,
reactivate, store and identify visual internal representations. Although some
numerical results were obtained for the generation process, complete simulations were not carried out because of a strong similarity effect. We will extend the number of
features and the resolution of the image to reduce this effect. Concerning the
exploration, the results obtained with the model were very similar to those
obtained in the experiment. We successfully reproduced the effect of learning,
the effect of the condition, and all the main effects observed in exploration,
under both conditions. Hence this modular network provides a valid model of the
sub-processes involved in learning, generating and exploring visual mental
images.
References
Working Memory and Mental Synthesis

David G. Pearson and Robert H. Logie

Introduction
This chapter reviews a series of studies which have examined the role that verbal
and visuo-spatial working memory may play during the performance of mental
synthesis, in which visual imagery is used to manipulate and combine separate
components into new configurations. Mental synthesis has been shown to be
involved in a wide variety of different creative tasks, including the visual-
isation and development of scientific models (Miller 1984), architectural design
(Reed 1993), and many aspects of everyday problem-solving (Kaufmann 1988;
Finke 1990). Also, anecdotal reports from highly creative individuals such as
Einstein (1949), Watson (1968), Mozart or Poincaré (reported in Vernon 1970)
suggest that mental imagery can aid not only the conceptualisation of a problem,
but also the discovery of creative and original solutions. For example, mental
synthesis has been proposed as a useful reasoning technique in theoretical
physics (Finke 1990). Einstein advocated ‘combinatory play’ as a key feature of
his own scientific thinking, during which he mentally reproduced and combined
images so as to discover new insights and relationships between the constituent
parts (Gardner 1993). Mental synthesis has also been claimed to play a funda-
mental role during the concept phase of creative design (Purcell & Gero 1998;
Verstijnen et al. 1998). Suwa and Tversky (1996) collected protocols of design
sessions from student and expert architectural designers and found that both
groups used sketches based on the results of mental synthesis while designing,
although the expert designers were better at considering perceptual and functional
features and relations within potential designs. On a more general level, the
mental recombining of images has been implicated during everyday reasoning.
The issues raised above will be addressed within the theoretical framework of a
revised model of the visuo-spatial component of working memory (Logie 1995;
Pearson, Logie & Green 1996; Logie & Pearson 1997; Pearson, Logie &
Gilhooly, 1999). In this model the temporary storage of visual material is
achieved by a passive memory system, referred to as a 'visual cache', and the rehearsal of that material by an active mechanism referred to as the 'inner scribe'.
Figure 1. Example of response sheet for the creative synthesis task (presented shapes were
circle, D, square, d, eight). Data taken from Pearson, Logie and Gilhooly (1999).
These results are consistent with the hypothesis that the inner scribe
component of visuo-spatial working memory is utilised during the spatial
manipulation of imaged shapes. Such manipulation is necessary in order to
produce legitimate patterns during mental synthesis, and concurrent tapping
appears to interfere with this process, resulting in the pattern of disruption
observed. In contrast, the scribe would seem to have a much less significant role
in the retention of the presented parts themselves.
There is little evidence that the degree of correspondence between the verbal
labels and the drawings of the synthesised patterns is affected by concurrent
tapping. This suggests that the locus of the interference is during the actual
combination of the imaged parts, and that once synthesis is successfully achieved
there is no further disruption by concurrent tapping of participants' ability to
assign a verbal label to the completed pattern. This result is consistent with a
task analysis model of creative synthesis reported by Helstrup and Anderson
(1991), who divide mental synthesis into two phases: a construction phase, in
which the presented shapes are mentally assembled into a pattern, and a verbal
interpretation phase, in which the synthesised pattern is then given an appropriate
verbal description. The results of Pearson et al. suggest that the main involve-
ment of the inner scribe is during the construction phase of the synthesis task.
resources (i.e., Salway & Logie 1990; Logie & Salway 1995) and suggest that
the generation and maintenance of visual images within a visual buffer is best
viewed as a function of the central executive rather than the visual cache.
Instead, the cache operates primarily as a storage mechanism, which provides
temporary storage of visual material during imagery tasks if required. Crucially,
however, any material stored in the cache is deemed functionally separate from
material maintained via the central executive within the visual buffer. Hence,
there is a clear distinction in this revised model between the maintenance of
visual images by the operation of the central executive, and the rehearsal of
visual material within the visual cache by the operation of the inner scribe. Such
a separation is not without precedent, as a comparable distinction has been made
in the literature between auditory imagery and verbal rehearsal within the
phonological store component of verbal working memory (Baddeley & Logie
1992; Reisberg et al. 1991). In addition, the involvement of attentional resources
during the generation and maintenance of visual images is also a feature of many
current cognitive models of mental imagery (e.g., Kosslyn 1980, 1994).
Although mental imagery tasks have been traditionally assumed to draw primari-
ly upon visuo-spatial cognitive systems, a growing number of empirical studies
have demonstrated the important role that verbal encoding and rehearsal can play
during creative discoveries based on imagery. Brandimonte and colleagues
(Brandimonte, Hitch & Bishop 1992; Brandimonte & Gerbino 1993) have
reported that concurrent articulatory suppression can significantly improve
participants’ performance on image subtraction and reinterpretation tasks, but
only for those items which can be easily verbally encoded. In addition, Intons-
Peterson (1996) has shown that performance of image subtraction declines when
linguistic processing is encouraged, and increases when the visual aspects of the
task are emphasised. These results have been interpreted as demonstrating that
verbal representations can significantly impair the ability to make novel discover-
ies on the basis of visual imagery alone, and that concurrent articulatory suppres-
sion can remove this effect by preventing the use of verbal labelling during
imagery. However, this interpretation of the effect of articulatory suppression is
contentious, as it implies that suppression operates on a much deeper level of
verbal processing than has previously been assumed (Logie 1995; Reisberg 1996).
In terms of the task analysis model of mental synthesis proposed by
Helstrup and Anderson (1991), articulatory suppression could potentially interfere
with participants’ performance by disrupting the verbal interpretation phase of
the process. However, an alternative prediction is that articulatory suppression
will instead disrupt the verbal rehearsal of the presented shapes themselves, and
thereby lower performance through memory interference rather than at the verbal
labelling stage of the synthesis task. Pearson et al. (1999) addressed these issues
by examining the effect of concurrent articulatory suppression on performance
of the creative synthesis task. If the focus of any interference was in terms of
verbal rehearsal of the presented parts, then disruption should have been evident
for the number of trials in which all parts were correctly recalled, but absent in
terms of the degree of rated correspondence between the verbal label and
associated drawing. The results of the study supported the hypothesis that the
phonological loop component of working memory is involved during the
rehearsal of verbal representations during performance of the creative synthesis
task. Concurrent articulatory suppression not only significantly lowered the
number of legitimate patterns that participants produced, but also significantly
lowered the number of trials on which they were able to successfully recall all
of the presented shapes.
As there was a specific effect of suppression on participants’ memory for
the shapes themselves, this suggests they were continuing to rehearse the verbal
labels during the construction phase of the synthesis task. There was also a
significant increase in the variance of responses for articulatory suppression when
it was performed concurrently with creative synthesis, and this is also indicative
of mutual interference occurring between the two cognitive tasks. Finally, the
fact that there was no significant effect of suppression upon degree of rated
correspondence between verbal labels and associated drawings suggests that the
cause of the interference was based upon the verbal rehearsal of the labels for
the shapes, rather than during the verbal interpretation phase of the synthesis
task, in which a verbal label was assigned to the synthesised pattern.
Conclusions
Figure 3. Effect of concurrent oral random number generation (RNG) on mean number of legitimate patterns produced for three- and five-part trials, relative to synthesis alone (from Pearson, Logie & Gilhooly, 1998)
Transformations of images (such as changes in size or orientation) are also assumed to require the
operation of the inner scribe component. Concurrent spatial tasks have been
shown to interfere with mental rotation (Logie & Salway 1990), mental compari-
sons (Engelkamp, Mohr & Logie 1989), and mental animation (Sims & Hegarty
1997). Such a role for the inner scribe in manipulating imagery is also supported
by the finding discussed previously that concurrent spatial tapping results in
significantly fewer legitimate patterns being produced during mental synthesis
(Pearson et al., 1999).
The model depicted in Figure 2 also allows for the visual cache to store
material during synthesis as a back-up store for imagery, in a fashion analogous
to the way in which the phonological loop is used to maintain the identity of the
shapes being manipulated via mental imagery. However, as no studies have so
far demonstrated any significant interference from concurrent visual noise, any involvement of the visual cache during mental synthesis currently remains speculative.
In conclusion, the research findings reviewed in this chapter have demon-
strated the involvement of spatial working memory during the spatial manipula-
tion of material during mental synthesis, and have also demonstrated the
importance of verbal representations during what may initially appear to be
primarily a visuo-spatial task. Specifically, mental synthesis appears to occupy
the resources of the working memory system as a whole, rather than relying
solely on the operation of the visuo-spatial sketchpad component.
References
Subject Index

A
Aesopworld xii, 164
affordances x, 24, 25, 27, 30, 227
agent 150, 153, 164, 174, 177, 182-184, 257, 297
Archimedes xiii, 201, 208-210

B
Baddeley, A. 7, 17, 235, 245, 348, 349, 351-353, 356, 357, 359
Barwise, J. xiii, 303, 311
Bloomsday xiv
Boltzmann, L. 185
Bourdieu, P. 137, 141-145

C
central executive system 351, 355
Chameleon vi, xii, 149, 150, 152-156, 159, 162-165, 169
Chomsky, N. 273, 281
cognitive architecture 19, 21, 22, 29
cognitive map xi, xiv, 66, 105, 114, 125, 126
cognitive style 301, 302
cognitive vector system 49
Cohn, A. 83, 84, 86, 87, 103, 104
configurational knowledge 105-107, 110-112, 122-124
coupled architectures 23, 24
creative synthesis 349-351, 353-355, 358

D
Davidson, D. 278, 281
deictic reference 70, 152, 153, 159
deixis 73, 129, 169
describer 35-40, 42
diagrammatic reasoning vi, x, xii, 147, 209, 211, 312
dispositif 142, 143
Dowty, D. 281

E
educational software 185, 187
egocentric cognition x
Einstein, A. xii, 185, 347, 357
episodic construction trace xv, 319, 320, 325, 327
Etchemendy, J. 303, 311
Evans, G. 5, 8, 18, 33, 35, 42, 43, 318, 327
explorer 35-40, 42

F
Feldman, J. 163, 166, 211, 212
figure object 179, 251
Foucault, M. 141-145
frame of reference 4, 16, 33, 45, 254, 256, 294
Franklin, N. xi, 48, 49, 65, 71, 74, 75, 80, 81, 285, 287, 296
G
Gapp, K.-P. 265, 287, 296
Gardner, H. 186, 197, 347, 357
gender differences ix, 3-19, 122
Geometry Machine 200, 201
Gibson, J.J. 24, 30, 47, 66, 123, 124, 227, 229
GIS (Geographical Information Systems) vi, x, xii, xiii, 17, 83, 147, 213-216, 225, 227-232, 282, 297
Gulyas, B. 329, 346

H
habitus 141, 142
Hayes-Roth, B. 45, 67, 317, 327
HCI (Human-Computer interaction) 128, 173, 181
Herman, J. 106, 124
Herskovits, A. xiv, 136, 162, 166, 178, 179, 181, 184, 214, 220, 229, 250, 251, 252-257, 261, 264, 265
hippocampal place cells 106
hippocampus xi, 125
Hypergami xii, 187-192, 195, 197

I
image schema 215, 216, 218, 220
intelligent multimedia ix, x, 165, 166, 175
intellimedia x, xii, 149-153, 164, 165, 167
intersubjective cognition x
invariants xv, 21

J
Jackendoff, R. xiv, 250, 253, 256, 265, 271, 276, 281, 282
Johnson, M. xiii, 214-216, 220, 230
Johnson-Laird, P. 30, 70, 81, 230, 235, 245, 250, 265, 285, 297, 308, 317, 319, 320, 321, 325-327

K
Kamp, H. 282
Kant, I. 131
Kosslyn, S. 65, 66, 149, 167, 329, 338, 345, 348, 352, 357

L
landmarks 3, 21, 28, 29, 33, 34, 41, 43, 45, 46, 49-53, 55-59, 61-65, 105, 124, 325, 326
large-scale space 45-47, 58, 220
Lemon, O. 300, 312
locomotion 19, 23, 29, 124
Logic Theorist 199, 200
LTM (long term memory) x, 332

M
map reading 4-6, 11, 16-18
Marr, D. 250, 265
mass media ix
McLuhan, M. 144, 145
McNamara, T. 47, 58, 66, 317, 328, 329, 345
mental imagery xiii, 18, 168, 186, 197, 345, 347, 351, 352, 356, 357, 359
mental models v, xi, xv, 30, 69, 71, 73-75, 78-81, 173, 230, 235, 245, 265, 292, 296, 297, 300, 308, 319, 325-328
metric relations 214
Mozart, W. 347
multimedia documents vi, xiii, 233-235, 239, 245
multimodal vi, x, xii, 150, 156, 163, 164, 167, 168, 171, 172, 181, 312
multiple relations model 178

N
navigation ix, 4, 17, 18, 43, 45, 46, 67, 106, 123-126, 130, 141, 268, 270, 272, 273, 277, 278, 281, 282, 327
Nous Research ix, 171, 249
S
Schneiderman, B. 173
self-reference v, xi, 45, 48-50, 52, 61, 65
sex differences x, 18
Shephard, R. xiii
shortcuts 34, 35, 37, 39-42
Sonas vi, x, xi, xii, 171, 174, 175, 178, 179, 181-183, 264
spatial cognition 1, i, iii, v, vi, ix, 3, 4, 17, 19, 33, 34, 43, 66, 79, 80, 107, 124-127, 136, 149, 166, 184, 197, 227, 229, 231, 233, 235, 244, 250, 264, 265, 296, 297
spatial heuristics 196
spatial inferences vi, xv, 29, 80, 81, 226, 232, 285, 296, 329, 346
spatial language 163, 175, 249, 253, 264, 265, 272, 297, 325
spatial memory v, 45, 46, 58, 66, 124-126

T
Talmy, L. 251, 265
temporal scenario 233, 234
Tensors xv
tent-maze v, 105, 107, 108, 111, 120, 122, 123
territory 28, 142
theme ix, x, xi, 138, 177, 179, 186, 190, 197, 257, 272-274, 278
topological relations 103, 104, 214, 217, 228
topology v, ix, 83, 84, 86, 88, 93, 103, 104, 185, 197, 219, 231
transformation matrix 69, 286-290, 293
Tversky, B. xi, 14, 18, 34, 35, 38, 43, 46, 48, 65, 67, 71, 74, 75, 80, 81, 235, 243, 245, 271, 281, 283, 285, 296, 317, 328, 347, 359
V
verbal condition 330, 332-335, 337-339, 341, 342, 344
virtual reality ix, 138, 264, 267, 268
visual condition 330, 332, 334, 336, 339, 341, 342, 344
visual storage 351
Vitra xii, 163, 296

W
working memory vii, x, 7, 8, 11, 16-18, 47, 235, 236, 309, 345, 347, 348, 350-353, 355-359
WYSIWYG 234

Z
zoom and filter 173
In the series ADVANCES IN CONSCIOUSNESS RESEARCH (AiCR) the following titles
have been published thus far or are scheduled for publication:
1. GLOBUS, Gordon G.: The Postmodern Brain. 1995.
2. ELLIS, Ralph D.: Questioning Consciousness. The interplay of imagery, cognition, and
emotion in the human brain. 1995.
3. JIBU, Mari and Kunio YASUE: Quantum Brain Dynamics and Consciousness. An intro-
duction. 1995.
4. HARDCASTLE, Valerie Gray: Locating Consciousness. 1995.
5. STUBENBERG, Leopold: Consciousness and Qualia. 1998.
6. GENNARO, Rocco J.: Consciousness and Self-Consciousness. A defense of the higher-order
thought theory of consciousness. 1996.
7. MAC CORMAC, Earl and Maxim I. STAMENOV (eds): Fractals of Brain, Fractals of
Mind. In search of a symmetry bond. 1996.
8. GROSSENBACHER, Peter G. (ed.): Finding Consciousness in the Brain. A neurocognitive
approach. 2001.
9. Ó NUALLÁIN, Seán, Paul MC KEVITT and Eoghan MAC AOGÁIN (eds.): Two Sciences
of Mind. Readings in cognitive science and consciousness. 1997.
10. NEWTON, Natika: Foundations of Understanding. 1996.
11. PYLKKÖ, Pauli: The Aconceptual Mind. Heideggerian themes in holistic naturalism. 1998.
12. STAMENOV, Maxim I. (ed.): Language Structure, Discourse and the Access to Conscious-
ness. 1997.
13. VELMANS, Max (ed.): Investigating Phenomenal Consciousness. Methodologies and Maps.
2000.
14. SHEETS-JOHNSTONE, Maxine: The Primacy of Movement. 1999.
15. CHALLIS, Bradford H. and Boris M. VELICHKOVSKY (eds.): Stratification in Cogni-
tion and Consciousness. 1999.
16. ELLIS, Ralph D. and Natika NEWTON (eds.): The Caldron of Consciousness. Motivation,
affect and self-organization – An anthology. 2000.
17. HUTTO, Daniel D.: The Presence of Mind. 1999.
18. PALMER, Gary B. and Debra J. OCCHI (eds.): Languages of Sentiment. Cultural con-
structions of emotional substrates. 1999.
19. DAUTENHAHN, Kerstin (ed.): Human Cognition and Social Agent Technology. 2000.
20. KUNZENDORF, Robert G. and Benjamin WALLACE (eds.): Individual Differences in
Conscious Experience. 2000.
21. HUTTO, Daniel D.: Beyond Physicalism. 2000.
22. ROSSETTI, Yves and Antti REVONSUO (eds.): Beyond Dissociation. Interaction be-
tween dissociated implicit and explicit processing. 2000.
23. ZAHAVI, Dan (ed.): Exploring the Self. Philosophical and psychopathological perspectives
on self-experience. 2000.
24. ROVEE-COLLIER, Carolyn, Harlene HAYNE and Michael COLOMBO: The Develop-
ment of Implicit and Explicit Memory. 2000.
25. BACHMANN, Talis: Microgenetic Approach to the Conscious Mind. 2000.
26. Ó NUALLÁIN, Seán (ed.): Spatial Cognition. Selected papers from Mind III, Annual
Conference of the Cognitive Science Society of Ireland, 1998. 2000.
27. McMILLAN, John and Grant R. GILLETT: Consciousness and Intentionality. 2001.
28. ZACHAR, Peter: Psychological Concepts and Biological Psychiatry. A philosophical analy-
sis. 2000.
29. VAN LOOCKE, Philip (ed.): The Physical Nature of Consciousness. 2001.
30. BROOK, Andrew and Richard C. DeVIDI (eds.): Self-awareness and Self-reference. n.y.p.
31. RAKOVER, Sam S. and Baruch CAHLON: Face Recognition. Cognitive and computa-
tional processes. n.y.p.
32. VITIELLO, Giuseppe: My Double Unveiled. The dissipative quantum model of the brain.
n.y.p.
33. YASUE, Kunio, Mari JIBU and Tarcisio DELLA SENTA (eds.): No Matter, Never Mind.
Proceedings of Toward a Science of Consciousness: fundamental approaches, Tokyo 1999.
n.y.p.
34. FETZER, James H.(ed.): Consciousness Evolving. n.y.p.
35. Mc KEVITT, Paul, Seán Ó NUALLÁIN and Conn MULVIHILL (eds.): Language, Vision,
and Music. Selected papers from the 8th International Workshop on the Cognitive Science of
Natural Language Processing, Galway, 1999. n.y.p.