AI & Soc (2007) 21:567–605
DOI 10.1007/s00146-007-0103-8
ORIGINAL PAPER
Entrainment and musicality in the human system
interface
Satinder P. Gill
Received: 12 December 2006 / Accepted: 21 February 2007 / Published online: 25 May 2007
© Springer-Verlag London Limited 2007
Abstract What constitutes our human capacity to engage and be in the same
frame of mind as another human? How do we come to share a sense of what ‘looks
good’ and what ‘makes sense’? How do we handle differences and come to coexist
with them? How do we come to feel that we understand what someone else is
experiencing? How are we able to walk in silence with someone familiar and be
sharing a peaceful space? All of these aspects are part of human ‘interaction’. In
designing interactive technologies, designers have endeavoured to explicate, analyse
and simulate our capacity for social adaptation. Their motivations are mixed and
include the desires to improve efficiency, improve consumption, to connect people,
to make it easier for people to work together, to improve education and learning. In
these endeavours to explicate, analyse and simulate, there is a fundamental human
capacity that is beyond technology and that facilitates these aspects of being, feeling
and thinking with others. That capacity, we suggest, is human entrainment. This is
our ability to coordinate the timing of our behaviours and rhythmically synchronise
our attentional resources. Expressed within the movements of our bodies and voices,
it has a quality that is akin to music. In this paper, disparate domains of research
such as pragmatics, social psychology, behaviourism, cognitive science, computational
linguistics and gesture are brought together, and considered in light of the
developments in interactive technology, in order to shape a conceptual framework
for understanding entrainment in everyday human interaction.
S. P. Gill (✉)
School of Computing Science, Middlesex University,
Ravensfield House, The Burroughs, Hendon, London NW4 4BT, UK
e-mail: spg12@cam.ac.uk
Introduction
The last 20 years have seen a transition in the ways in which designers think about
human cognition, communication and the role of the body in their conception of
knowledge [from residing in an autonomous individual to being distributed across
individuals (e.g. distributed cognition, Hutchins 1995)], intelligence (from individual to social) and being human (from being a black box to being multi-modal and
multi-sensory, Sha 2002). This transition is a fundamental shift from a focus on
cognition as disembodied to being embodied (Andersen 2003), evident in the
definitions and inclusions of the social (embodied cognition, gesture) and emotion,
in the field of cognitive science and designs of interactive AI, Art and Robotics
technologies. Part of this shift has been motivated by the engagement of the artistic
and performance narrative domains that have reflected on the nature of public, home
and entertainment spaces by using technology to explore and extend our
engagement spaces (e.g. Topological Media lab,1 Sha 2005). Art demands a
consideration of our senses and the aesthetic, and the performance arts demand a
consideration of the communication environments and performance structures (e.g.
mixed media dance). Technology’s appropriation by the narrative domains has
created possibilities of analogue and digital symbiotics [e.g. Sponge, TGarden,
TG2001, http://sponge.org/projects/m3_tg_intro.html; Infomus Lab2 (Camurri)]
beyond the boundaries of the typewriter and TV screen metaphor that most of us
work with and communicate through.
There is one fundamental conception that frames the design of human–machine
interaction: the information-theoretic signal transmission model (Shannon
and Weaver 1949), in which a speaker is distinct from a listener at any
moment in time, and information is passed from the speaker and received by the
listener. This is a linear model of feedback. The limitations of this conception and its
implications for human–machine symbiosis will be explored in this paper, drawing
on a theory of Body Moves and the concept of entrainment. Entrainment occurs
when two oscillators come to oscillate together. In human interaction, entrainment is
the coordinating of the timing of our behaviours and the synchronising of our
attentional resources (Clayton et al. 2005; Cross 2007). Body Moves serve this
function (Gill et al. 2000). Body Moves are periodic rhythmic synchronisations of
the movements and modulations of our bodies and voices as we engage with each
other.
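As an illustration only, the oscillator sense of entrainment mentioned above can be sketched numerically. The paper itself gives no mathematical model, so everything in the sketch is an assumption: the sinusoidal (Kuramoto-style) coupling between two phase oscillators, the function names `simulate_phase_difference` and `phase_spread`, and all parameter values. It shows two oscillators with different natural frequencies settling into a constant phase relation when they are coupled strongly enough, and drifting apart when they are not coupled.

```python
import math


def simulate_phase_difference(freq_a=1.0, freq_b=1.2, coupling=2.0,
                              dt=0.01, steps=3000):
    """Euler-integrate two mutually coupled phase oscillators.

    freq_a, freq_b: natural frequencies in Hz (illustrative values).
    coupling: strength with which each oscillator is pulled towards the other.
    Returns the phase difference (b minus a), wrapped to (-pi, pi], per step.
    """
    omega_a = 2 * math.pi * freq_a
    omega_b = 2 * math.pi * freq_b
    theta_a, theta_b = 0.0, math.pi / 2  # start a quarter cycle apart
    differences = []
    for _ in range(steps):
        # Each oscillator advances at its own rate, nudged towards the other.
        rate_a = omega_a + coupling * math.sin(theta_b - theta_a)
        rate_b = omega_b + coupling * math.sin(theta_a - theta_b)
        theta_a += rate_a * dt
        theta_b += rate_b * dt
        raw = theta_b - theta_a
        differences.append(math.atan2(math.sin(raw), math.cos(raw)))
    return differences


def phase_spread(differences, tail=100):
    """Range of the phase difference over the last `tail` steps."""
    recent = differences[-tail:]
    return max(recent) - min(recent)


if __name__ == "__main__":
    coupled = simulate_phase_difference(coupling=2.0)
    uncoupled = simulate_phase_difference(coupling=0.0)
    print("spread when coupled:  ", phase_spread(coupled))
    print("spread when uncoupled:", phase_spread(uncoupled))
```

In this toy model the two oscillators lock only when twice the coupling strength is at least as large as the difference between their angular frequencies; below that threshold the phase difference keeps drifting. The sketch says nothing about how coupling arises between people, which is the question the paper pursues through Body Moves.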
Coordination and entrainment
In our everyday interactions with others, one of the least conscious yet most
powerful ways in which we engage is with the movements of our bodies and voices.
As we know, musical movement is not the same as displacement in physical space,
for example: ‘vocal gesture’. The complexity of these coordinations of body and
voice is quite astonishing and we are, in most cases, able to adapt to our individual
differences. We only become consciously aware of how we move and sound when
the coordination is not working well or when it is unwanted. A friend of mine
recounted an experience of trying to communicate with a person whom she was
quite sure must have been a very nice person, but because they could not get
the coordination of their speech and motion patterns quite right, she gave up trying
to do so. One case of unwanted coordination that we can all relate to is the
experience of walking down a street and finding yourself in step with a total stranger,
which creates a sense of closeness that feels awkward. You feel conscious of your
footsteps being in a ‘togetherness’ with this person’s, and try to get out of step.
Another scenario is when you are walking with a friend and suddenly become
aware that you are not in step, and try to get in step in order to be ‘with’ that person.
I will later in this paper refer to this particular moment of togetherness
as ‘parallel coordinated movement’.
1 http://topologicalmedialab.net/joomla/main/index.php
2 http://www.infomus.dist.unige.it/
In much work on human coordination and cooperation (e.g. conversation
analysis, Clark 1996; Schiffrin 1987; Coupland 1999), the focus is on speech and on
our capacity for sequential turn taking in conversations and joint tasks. Much of this
focus is rooted in a conception of information as signal, where information is passed
from one autonomous person to another autonomous person in their utterances, and
interpreted and responded to, etc. In other words, there is continuous feedback of
information transmission. In this picture, it becomes possible to describe human
knowledge as information that is transmitted, stored, processed and recalled
(orthodox cognitivism, computationalism: Fodor 1976, 1981; Pylyshyn 1984;
Simon 1969, 1982, 1983). The body is also understood as part of that communication
process and operates according to the same principles, as seen in work on gesture and
semiotics (e.g. Ekman and Friesen 1972, on illustrators and emblems). This
conception of human knowledge leads to the design of methods to elicit, represent
and simulate it for the design of technologies that fit this picture and serve, in turn, to
reinforce it (e.g. knowledge based/expert systems; cognitive engineering, Roth et al.
2001; multi-modal agent interfaces, Cassell et al. 2000).
However, our walking down the street with a friend, talking with a stranger and
trying to find common ground, involves the modulation, pitch, tempo, pulse and
rhythms of our voices and our bodies in a kind of coordinated autonomy. These
aspects are always there when we are engaging with each other or just being
co-present in physical space, and cannot simply be explained as having sequential
conversational turn-taking structures. This engagement has a complexity of sensing
and knowing in experienced time, and we will explore this as a kind of musicality or
dance, and as a-linguistic. At one time I considered this as paralinguistic but I am no
longer sure that this is the common frame underlying rhythmic synchronisation
(Kendon 1970, 1972; Condon and Ogston 1966) and find that ‘a-linguistic’ may be
the appropriate ascription. Support for this lies in work on mother–baby interactions
that have been described as rhythmically synchronised and musical (Trevarthen
2005), and where it is proposed that such rhythmical synchrony is important for
companionship, intersubjectivity and empathy.
A key feature of our engagement with each other is our capacity for entrainment
(Large and Jones 1999; Clayton et al. 2005) or collective action/performance, as
when we walk in step with each other. In collective action there is no differentiation
of speaker and listener, rather, we are both at the same time. This is the case with
music performance where in the making of the collective sound the musicians are
both performers and listeners simultaneously (Cross 2007). Such collective action is
considered as being critical for the processes of grounding attention and meaning
in the formation of shared knowledge or knowledge transformation in human
interaction (Gill 2004). One particular phenomenon that embodies this process has
been termed ‘Body Moves’ (Gill 2000). These are the rhythmic and pitch
movements of body and sound that operate to ground attention and meaning in
knowledge formation, and enable empathy, and intersubjectivity. They are part of
the tacit dimension of human knowing and raise questions about the information
transmission model of sender and receiver.
This model of information transmission is a linear model of information being
passed from one entity to another. This linearity of sending and receiving
information underlies the sequential model of speaker and listener in turn-taking
in linguistics. It influenced the conceptual articulation of the early work on ‘Body
Moves’ as these are composites of body and speech performance. I was working
with linguists (at NTT Communication Science Labs near Tokyo, and the Media
Lab at ATR near Kyoto) and had a background in discourse analysis, hence it is
not surprising that I found myself trying to find forms of expression that could
both allow for the bodily language and its accompanying speech. The dominance
of the sequential signal information-processing model however, caused some
difficulty in handling one particular kind of movement that did not fit this model.
This was termed the Parallel Coordinated Move (Gill 2002), and understanding it
has caused me to question the linguistic characterisation of the other Body Moves.
These are primarily periodic and rhythmic synchronisations that entrain the
movements of our bodies and voices during communication. They are joint
rhythms, and they take two forms and serve two functions. The first form
synchronises our joint attention to understand how and where we are in relation to
each other. The second form of rhythm synchronises our manifest intentions and
joint attention such that the movements of body and voice across both persons are
performed as one coherent collective movement, even though what is being
expressed are different ideas (i.e. this is not about imitation). I call this ‘Parallel
Coordinated Movement’ (Gill 2002) and have argued that this is critical for
intersubjectivity (Gill 2004; Polanyi 1966). Body Moves are forms of entrainment, of
coordinated autonomy.
Autonomy assumes that an individual is a self-sufficient system and can exist
without others. However, human survival depends on one’s existence with others.
The concept of ‘coordinated autonomy’ better describes our individuality as existing
within sociality.
In the phenomena of Body Moves, one form of rhythmic synchrony serves to
sustain our commitment to engage with each other, and the simultaneous form
serves to transform our states of tacit knowing such that we are able to arrive at
agreements and achieve topic shifts. The latter form cannot be arrived at without the
existence of the former and emerges from it. Both of these forms of Body Moves are
moments of empathic connection. Support for this empathic connection is found in
research on music and affect undertaken by Joel Swaine, in his work on ‘Rhythm in
Vocal Attending’ (ongoing Ph.D. Dissertation, Cambridge).
The existence of Body Moves, and this function of them, were demonstrated in 2001
in an experiment undertaken in the Stanford Interactive Workspaces Lab.
The larger goal of the experiment was to explore the role of the non-verbal in
collaborative activity and how the use of ‘collaborative tools’ could afford this. The
experiment revealed the intersubjective and entraining function of Body Moves. In
the experiment, I asked subjects to collaborate in a joint sketching task where they
had to engage at a large computer-based surface (an electronic whiteboard) that only
permitted one person to touch the surface at any time, in sequences. Imagine using
two mice on your desktop: the cursors will join up, giving a messy effect on the
screen. I expected that the constraints of sequential behaviour would deny them the
possibility to achieve Body Moves. And indeed, all the pairs of collaborators found
their rhythmic synchronous coordinations to be disrupted, and found that this
affected their commitment to communicate and be with each other in joint action
(Gill and Borchers 2004).
My present collaboration with musicians (Centre for Music and Science,
University of Cambridge) is providing me with a foundation for understanding the
temporal dimension of experiencing within body and voice rhythmic periodicity. It
is important to explain that Body Moves occur at particular moments in the
communication process, and are emergent from the other co-regulated movements
that are occurring. What distinguishes them is a different temporal structure that is
akin to qualities in music. It is for this reason that the last section of this paper is on
musicality of human interaction.
This paper is going to travel through these ideas by recounting a journey through
the technologies with their communication and computational paradigms, and I am
going to arrive at my interest in body movement and gesture in the context of my
interest in the tacit dimension of knowledge.
Expert knowledge and the expert system
The research on the tacit dimension of knowledge begins with the concept of the
expert or expertise and the limitations of representing human skill in propositional
forms. In 1987, I became involved with an interesting group of researchers (Bo
Goranzon and Ingela Josefson) from the Swedish Centre for Working Life in
Stockholm, who provided me with the opportunity for an apprenticeship into the
concept of the tacit dimension. As part of my learning, I reflected on the concept of
skill, the meaning of information, the role of imagination and reflection, and the use
of language to express oneself. The theoretical training involved my spending time
at the University of Bergen working with Kjell Johannessen and Tore Nordenstam in
the school of Wittgensteinian Philosophy, where I was introduced to Philosophical
Investigations (Wittgenstein 1953) and Polanyi’s (1966) ‘The Tacit Dimension’, as
being essential to hermeneutics. Much of this was new to me and enlightening. My
practical training involved visiting various researchers in Europe and bringing them
together in an international workshop on a Humanistic Approach to Technology and
Design at a major European conference called Language, Skill and Artificial
Intelligence (Stockholm 1988). This conference was significant in Scandinavia as it
reflected on a culmination of almost 10 years of the use of expert systems in
organisations within both the public and private sector. My project in that year was
to undertake a cultural comparative study of the concept of design between Britain
and Scandinavia and edit a collection of works on ‘Knowledge, Skill and AI’
(Goranzon and Josefson 1988).
What did I learn from all this? I learnt that there is a deep relationship between
that which is explicit and that which carries and embodies it, namely the tacit. I
learnt that the patterns of negotiation and organisational decision making are rooted
in cultural practice and cannot be considered outside of this performance of tacit
knowing that is socially embodied. Likewise, how people understand the meaning
of the words, ‘participate’, ‘cooperate’ and ‘dialogue’, is also located within cultural
practices. The Italian colleagues could not accept the use of the word ‘dialogue’ to
denote any lower act than communion with God; for them, any other form of
engagement is ‘communication’. I learnt that the British pragmatic culture was more
comfortable with the idea of cooperation, whereas the Swedish democratic culture
was more comfortable with the idea of participation. In the former, decisions can be
altered relatively quickly if circumstances change, and in the latter, decisions
undergo the rigor of the democratic process of participation for any alteration to be
considered. Decision-making itself is a culturally rooted communication process.
In particular, I was struck by a Swedish case study, undertaken by my mentor
Goranzon (1993), of mathematicians in the forestry industry. They had worked with
systems designers to build expert systems to assist them in their daily calculations.
However, over a period of years, they found themselves on occasion doubting their
own judgement in favour of that of the computer, and losing confidence in their
mathematical skills. This story has always troubled me.
If we know what information is being placed into a technology, and we know the
limitations of what that can achieve, and we are ourselves highly skilled, why when
we engage with this representation of our knowledge do we lose our confidence to
judge?
I decided that I needed to investigate the relationship between the tacit and the
explicit dimensions of human knowledge and human cognition. And I began with
the example of the expert system.
Expert systems and experiential knowing
Expert systems designers of the old school of cognitive science and information
systems (often referred to as ‘strong AI’) believed that it was possible to represent
someone’s skill as explicit representations and explicit processes. This assumed that
the tacit was merely the unformed explicit.
Two case studies in the beginning of my Ph.D. were to have a fundamental
impact on the way I reflected on the tacit dimension in relation to the explicit within
the expression or performance of expertise. Most important was my realisation that
the key to the tacit was ‘experiential knowing and the imagination’ and that the
‘experiencing’ was outside of linguistic patterns.
The first case study was about creating a knowledge base for consultancy
practice. As a result of my presentation of a paper, ‘Knowledge and Skill Transfer
through Expert Systems: British and Scandinavian Traditions’ at a British AI
conference (Expert Systems, Gill 1988), I was invited by the chairman of the
conference session to join a workshop on consultancy organised by his consultancy
company for a week in the country as they underwent a corporate identity
transformation. The company was a large multinational. At that time its members
did not have a corporate concept of ‘consultant’ and performed in their jobs as
‘experts’ in their domains. However, to keep up with the times and be competitive,
the company decided to develop the concept of consultancy and reconstruct its
identity as a consultancy company. This necessitated asking its highly skilled
experts to articulate themselves as consultants, a role in which they were not used to
seeing themselves. My role was to ‘elicit’ the tacit dimension of their knowledge
formation as they underwent this process. The outcome of this was to develop an
interactive and intelligent multi-modal knowledge based system for training experts
to become consultants. It was not possible to video or audio tape any of the
interactions due to sensitivity and confidentiality. During my week with these
experts from all domains of work (sales, engineering, marketing, top management,
computing, etc.), I experienced many forms of expression of practice and
experience being used to build the identity of ‘consultancy’, such as role play,
cartoons, metaphors and video film. My task was to record and provide an analysis
of the tacit knowledge of consultancy practice.
These days, the processes of apprenticeship and learning through practice may
seem alien to many researchers. However, under my Scandinavian mentors, I learnt
not to follow conventions of research methods but to develop my own, which is
what I did, and continued to do throughout my research.
To illustrate some findings about the tacit-explicit and experiential knowledge
processes, I will give one example of ‘consultancy’ performance that took place
during the week mentioned above. A group of four upper-middle and senior
management practitioners (experts in their fields) gave a presentation of about
20 min each. They were seated around a table and were provided with an overhead
projector if they needed one. Each was to present themselves to others as a
consultant and talk about what a consultant is, and sell the idea of consultancy to the
others. The group was chaired by another senior manager who sat at the table with
them.
The first ‘consultant’, dressed in a suit and tie, stood up and presented a ‘tool kit’
of consultancy, which essentially consisted of a list of propositional statements:
descriptors, definitions and rules. This consultant had to stop giving his presentation
after a few minutes saying he had lost the thread of information, i.e. the connection
between himself and the information he was presenting.
The second ‘consultant’, dressed more casually but smartly, spoke of how
consultants ‘pull rabbits out of hats’ and presented hand drawn overheads of a rabbit
being pulled out of a hat and one visual with the word ‘magic’ in large letters. His
forms of expression disturbed his ‘clients’ who accused him of mocking their
profession and expertise, portraying them as insincere or dishonest, and made them
unreceptive to his ‘content’.
The third consultant, also dressed casually but smartly, also stood up. He spoke
about rules of conduct, and emphasised the good things a consultant does. His
handwritten overheads were measured and consistently paced. He was perceived as
sincere and they felt he understood them and supported them.
All these three ‘consultants’ had stood and presented overheads, with the first
dressed in suit and tie and the next two dressed in more casual but smart clothes.
The fourth consultant sat on the other side of the table to the other three, facing
them, and began telling them a confidential story of some political rumblings at the
top of their corporation. This consultant was very high up in the organisational
structure hence he had an authentic voice on these matters. The others became
troubled and deeply involved in unravelling the story trying to find out as much as
they could and work out the nature of the problem. After 20 min, this fourth
consultant broke the illusion of reality, and told them it was all a story. This was a
disorienting experience for the others and they were very impressed by what he had
done with them. It was of great interest for me, for this consultant had fully engaged
them in the performance of practical knowledge, where their experiential knowing
was immersed with each other’s. It was powerful acting (or it was acting from
power) with audience co-performance.
The second consultant who had offended the others’ moral well-being had in fact
given a sound presentation at the level of content. The chairman of the consultancy
workshop called me a few days after this week was over, and showed me the copy
of the overheads by this consultant pointing out that there was actually nothing
offensive or wrong in what he was saying. The problem had lain in how he had
presented the content and how he was perceived as a person. This took me back to
my premise on culture and communication, that the performance of the forms of our
expression influences the perception and meaning of the content the forms carry.
The third consultant provided the feeling of safety and comfort in his use of
moral and ethical forms of expression and a calm, paced voice. He was described as
genuine.
In all these performances, the posture, position and clothing of the performers in
relation to their ‘clients’, set the stage. If the forms of expression were not
embodied, the performance failed, and if the forms of expression did not meet the
perceptions of self of the client, there was breakdown in the communication process.
As I reflected on this case study, I was reminded that I needed to study the tacit
dimension as a process within dialogue and not outside of it. This was reinforced by
my second case study which was an interview applying ideas about how to engage
with ‘eliciting’ the tacit through and within dialogue where dialogue is the method
and the observation.
I was invited in by some researchers who were developing a data base for
underwriters that could process applications for life insurance policies. The work on
the data base was becoming cumbersome and the processing of all the possible data
input categories was creating bottlenecks. I requested to be alone with the ‘experts’,
and had the opportunity to sit with a senior underwriter and a junior underwriter.
They had brought a set of application forms with them and were curious about my
presence. I told them that I was not there to extract any information but wanted to
learn about what they do and I spoke a bit about my interest in the tacit and
experiential dimension of human knowledge and skill. They began to chat about
their skill and began to go through each form, thinking aloud in order to explain to
me what they do. The session emerged as a senior expert teaching and imparting his
skill to the junior (and to me). As they worked through the information on the forms
they built up pictures of the people they had in front of them and imagined their past
and future lives. On the basis of this they formed judgements as to whether this was
someone who could or could not qualify for a certain type of life insurance policy.
The experiential knowledge and imagination of the senior underwriter was made
available to the junior who could then follow and work with him to understand the
personality and life-style of their applicants.
It was clear that there was no one salient procedure of data processing as each
person (each applicant) had a different picture of salient information, and it could be
problematic to assume predefined categories with predefined processing rules that
are rooted outside the meaning of ‘relevant’ data, i.e. outside how it is meaningful to
the underwriter in building a picture of a person. There are two problems and they
relate to the idea of not being able to see the wood for the trees. If one functions at
the levels of data and procedures, then one builds composites, but these composites
may not form a wholeness, instead they may simply remain a collection of parts. It
is the human who can make the wholeness by applying experiential knowing and
imagination but the skill of achieving that may become lost if the system automates
the expert’s creation of the applicant into the composition of parts. There are
undoubtedly corporate factors (around risk and profit), which shape the imaginative
construction of the client, but these are not the foci of the analysis. This experience
took me back to the early conversation with my Scandinavian mentors about what
role imagination and experience play in human knowledge.
These two studies were undertaken during my time in a school of computer
science. I realised that the limitations of the computational paradigm in relation to
the tacit and experiential dimension of human knowing lay in the necessity of
representation. Computation necessitates that human knowing is made explicit as
‘content’, ‘context’ and ‘procedure’. What I was seeking to do was to understand
human knowing as the inter-relations of the tacit and explicit within the performance
of content, context and procedure, and this was not a matter of representation, but of
communication.
Hence the next part of my story moves from representation to communication.
Mediation and distributed expertise
I decided to apply the analysis of the relation between the tacit and explicit
dimensions of knowing to the analysis of conversation. My case study was of
the meeting discussions of a design team who were creating an audio–visual
communications infrastructure in their corporate building. This involved participatory observation, which included making video and audio recordings, as well as
conducting informal interviews, and inhabiting the space as an affiliated researcher.
I found (Gill 1995) that the success and failure of the transfer of knowledge in
communication consist in the relationship between the tacit and explicit
dimensions of knowledge. This arose from critiquing the idea, prevalent in
computation and the information transmission model, that it is possible to represent
‘expertise’ or ‘knowledge’ in a data-base such that the ‘information’ can be coded
and logically processed. As we have discussed above, this frame assumes that ‘data’
or ‘information’ can be taken from its living context without losing its meaning, as
its meaning becomes redefined within a system of rules and abstract differences.
Hence, I decided to study what constitutes a ‘piece’ of ‘information’ within the
conversations of the audio–visual design team. At the time, two researchers based in
the organisation were analysing design meetings in their research, and in their
analyses they would list salient types of information that occurred within these
meetings. One such category of information was ‘topics’. I decided to identify
topics that were raised in the conversations of ‘my’ design team, and treated them as
‘information’. In analysing the nature of this information, I traced the path of each
topic and found that where a topic began was where there was a discrepancy in
knowledge amongst the conversants. The end of each discrepancy coincided with a
new piece of information being raised, i.e. a new topic, indicated by a move for
a topic shift. At that moment, the discrepancy had been resolved or at least a
consensus was reached such that the conversation could move forward.
Quite unexpectedly, the quest to understand the relationship between the tacit and
the explicit, required the analysis of the nature of discrepancies in communication
and the means by which we can accommodate to each other’s differences, either via
third party help or between ourselves. The participants needed to find some mutual
ground, involving their self and this is where the operation of mediation was critical.
The design team consisted of five persons, all with different skills or knowledge
domains, and each person was chosen by the Director to ensure they represented the
salient elements necessary for successful design of this technology. The topics that
I identified within the conversations involved discrepancies of very differing
structures. One example of a gap in knowledge between two people communicating,
is where one is talking from within his area of expertise and the other is trying to engage
with him outside of his expertise. This typically necessitates some third party person
who can bridge the gap. This scenario of mediation comes most easily to mind when
thinking of what mediation means. In the case of the design team, a mediator was
able to make the ‘expert’ understand that in fact he had not understood the nature of
design problem that he was supposed to be responsible for and this is what lay
behind the gap, not the lack of knowledge of the non-expert. Another, scenario
which is not so clear, but performs a similar function, is where many people are
recalling a design setting–e.g. ‘I cannot see a very clear view through that camera
lens’, ‘that area looks very dark’, ‘maybe the light is on too low’, etc... As their
experiences pour in and some of them contradict a statement they had made just a
while back, one of them says something pertinent such that the ‘expert’ in this
chattering of recalled autobiographies suddenly realises what the problem is, and
solves the matter of the camera lens.
In both scenarios, the ‘expert’ cannot see what the design problem is. It is
someone else, whom I term the ‘mediator’, who provides the key and does so with
the precision of the appropriateness of their utterance at the right time and in the
right style and with the right role. There are three points to be made here. The first is
the dismissal of the concept of an autonomous expert, the second is the existence of
distributed expertise, and the third is the relation between mediation and knowledge
transformation.
The mediator enables resolution in discrepancy and consensus in knowledge by
being empathic with the critical discrepancies. Empathy is here defined as
the compatibility to generate shared understanding with respect to a particular
combination of compatibilities such as role, level of knowledge, forms of
expression, personality, etc.3 Empathy is necessarily personal and involves emotion,
and is therefore the ability to share or generate understanding of knowledge (which
is necessarily personal, and can be propositional), role and personality.
The explicit and the tacit within this picture of mediated communication are
considered as being the expression and its background of meaning. When this is
unsuccessful in being communicated, mediation is needed to provide the bridge for
the particular discrepant aspects of the tacit and explicit dimensions to meet. This is
achieved by making the tacit nature of the discrepancy explicit to both participants
such that they both understand the background to their discrepancy. Once they
become aware, it is possible for them to begin to resolve it.
Hence, the success or failure of knowledge transfer depends upon the
relations between kinds of knowledge (content) and the processes that influence
its transfer, i.e. discourse dynamics. The categories of knowledge (content) are
propositional, experiential, personal, knowledge by familiarity and practical.
Propositional knowledge is expert (domain) knowledge, or knowledge that can
be expressed in the form of rules, made explicit, and is non-personal and
non-experiential. Experiential knowledge is that which comes from one’s own direct
experience of the knowledge one expresses, or it is cultural/social knowledge, or it
is knowledge of another’s experience, or of an event. Experiential knowledge
consists of personal knowledge. Personal knowledge is that of the individual
personality, expressed as values, beliefs, emotions. Experiential knowledge is the
relating of one’s experience. This may be either direct experience which is indicated
by the use of ‘I’, ‘we’, etc., or general knowledge (e.g. knowledge about a culture: ),
or generic knowledge4 (a frequent experience: ‘whenever I do...’), or episodic
knowledge5 (a specific experience: ‘the other day I was...’). Knowledge by
familiarity is about knowing ‘when’ to act. Practical knowledge is the skilled
communicative performance itself. It can be inferred but not made explicit:
decisions, judgements, analyses, indicate (point to) practical knowledge but do not
represent it.
3 It is akin to aesthetic emotion, e.g. our resonation to the structures, textures, forms and colours of a painting, as well as the theme presented by them. By empathy, I do not mean sympathy.
4 This is based on the idea of generic structures in memory, which summarise similar events, cf. Barsalou (1988).
5 This is as in episodic memory, cf. Tulving (1972).
Through dialogue, participants acquire knowledge or fail to do so, and other
possible outcomes are changes in group knowledge, achieving dynamically stable
knowledge6 and building trust. Knowledge acquisition is successful when the
communication between a speaker and a listener is consensual and compatible (at
this stage in my research, I was still following the model of speaker and listener).
Knowledge acquisition fails where no compatibility in communication can be
established between speaker and listener. Group knowledge denotes a new level of
group consensual knowledge: indicated by a qualitative difference over time, e.g.
from the beginning of the conversation, in this case design meeting, to the end. Trust
is both an aspect of group dynamics and a possible outcome of dialogue. The system
of interaction between content (knowledge) and processes (discourse dynamics) in
the communication is orthogonal. This is evident in the earlier two case studies of
the performance of consultancy practice, and of how underwriters make sense of the
data to imagine a person’s life. Knowledge is embodied in discourse dynamics, such
as goals, forms of expression and group dynamics.
The construction of knowledge categories for the analysis of communication
drew upon a range of discussions and research on tacit knowledge and human skill,
particularly in relationship to technology and its application, from human-centred
systems (Cooley 1987; Gill 1996). The categorisation of knowledge dimensions also
drew upon work in autobiographical memory research (Bekerian and Dennett 1990).
To return to mediation, the conventional model assigns a person to have that role
in dispute situations or in a meeting (e.g. chairperson). Hence, the mediation process
involves the mediator making a number of interventions. Sometimes they may not
succeed and sometimes they do. Where they do not, and other interventions disturb
the communication rather than ‘enable’ it, this may be perceived as a problem.
However, in the design meeting referred to above, the constant intervention,
sometimes with information irrelevant to the particular problem (rhetorical) by
various negotiators or participants, functioned to sustain the dialogue and clear up
the noise in order for the person who would be the mediator to act as such. In fact,
in a period of 5 min, there were three topics, each consisting of different kinds of
discrepancies in the knowledge of the participants and each with different persons in
different roles performing as mediators for different experts.
The study drew three basic requirements for a person to be a successful
negotiator or mediator:
(1) Understanding the other; understanding the situation of discrepancy between two (or more) parties;
(2) Knowledge of the gap between oneself and the other; knowledge of the gap between parties;
(3) Ability to express this understanding and knowledge to other or others, i.e. produce the bridge.
The first is a necessary condition, and this has to function in conjunction with the
next two. It is not sufficient to understand the nature of the discrepancy nor to have
the key knowledge; one needs to be able to convey this in a form that others can
‘receive’ (the receiver model was prevalent for me at this time). Personality also
played a role, as it influences the perceptions people have of each other and
affects their ‘reception’ of the information imparted. Being aware of or understanding the
other requires an understanding of how the other perceives you.
6 This denotes an individual’s ability to have acquired the knowledge such that they can use it in a sustainable manner; a kind of psychological state whereby someone can maintain their performance of the knowledge over time.
The analysis of the tacit and explicit dimensions of communication for
knowledge transformation clarified that embodiment of forms of expression was
critical to sustaining one’s own performance and engaging in co-performance with
others (what I termed above as dynamically stable knowledge). It took the analysis
from the consultancy study further by identifying the various kinds of knowing that
make for the tacit dimension in communication. It showed that the explicit is never
meaningful outside of experience, and that a critical factor in the formation of
knowledge in human communication is the process of mediation.
One aspect within this communication was missing from the analysis, namely,
the human body. Although all the initial analysis was made from video recordings,
the final analysis was made primarily from speech performance. This is why the
speaker–listener model of communication is prominent at this early stage of work
on the relation between tacit and explicit dimensions of knowledge in communication.
A reconsideration of this began when I moved from the conversation setting
to a setting involving mediation of communication using materials, where the body
and co-presence in the same physical space became important for developing an
understanding of the nature of experiential knowledge. The setting, which was of
landscape architectural practices, extended my interest in the tacit dimension to
analyse the aesthetic dimension in human interaction. The hermeneutic tradition that
I began my research within emphasised that the tacit had two arms, aesthetics and
ethics, neither of which is reducible to abstract propositions if it is to be meaningful
in human conduct. The place of empathy in conversation was defined as being
embodied in the mediation (see Fig. 1) process, and empathy was described in terms
of aesthetic emotion arising with the resonance of communication structures. This
conceptualisation of empathy is later extended with the work on the body and tacit
knowing in human interaction.
The methodology for understanding the tacit has so far drawn on dialogue,
discourse analysis, ethnomethodology and participatory observation, and now
extends to include ethnography.
Aesthetics, experiential knowing and the body
In the winter of 1996/1997, Gordon, an apprentice landscape architect with
company ‘BETA’, sent a set of completed coloured maps that he had made at the
company’s Welsh office, to John, a senior architect based at its headquarters located
in North England. The company was going to make a bid for project work to
reshape a major road in North Wales where the frequency of traffic accidents was
high, and these coloured maps were part of the depiction of the changes to the road
design and effects upon the landscape. For example, colours depicted old woodland
and new woodland. To Gordon’s surprise, John judged the colours that he had used
to be ‘wrong’ and said that the maps needed to be re-coloured correctly. Company
BETA had barely 2 weeks left to submit their bid and re-colouring all these maps
was no small task. John brought in other experienced landscape architects at his
branch to help, and asked Gordon to travel up from Wales and re-colour the maps
with them. It was felt that Gordon lacked experience and the only way he was going
to get it was by experiencing the doing of colouring in a shared practice.
Fig. 1 Framework. Diagram labels: Knowledge acquisition: knowledge transfer and formation; Design dialogue; Influences: knowledge, discourse dynamics (forms of expression, group dynamics, style); Outcomes: knowledge acquisition or failure at knowledge acquisition, group knowledge, dynamically stable knowledge, trust; empathy.
The problem of ‘seeing’ the colours was partly due to the company’s economic
condition. BETA was downsizing, as a result of which Gordon was the sole
landscape architect left at the Welsh branch. Architects, however, do not interpret
the material in isolation when they first handle it. In talking aloud and moving pens
over paper, they engage the other person(s) in their conceiving. This, it is suggested,
enables one person to adapt upon another person’s view, producing the conditions
for a coherent development of the design (Gill 1997), and a process for ‘seeing-as’
(interpretation) until they come to ‘see’ (unmediated understanding) (Tilghman
1988). The same holds for colouring activity: as the apprentice colours with the
team and more experienced architects, he/she learns how they select, for example, a
specific shade of blue to set against a particular shade of green (‘seeing-as’) to create
a ‘pleasing effect’ that ‘looks professional’ (Gill, op.cit).
Because of the distance between the two branches and because of their
commitments, John had been unable to visit Gordon and work with him. Instead, he
had sent him a set of previously coloured maps (examples of experience), colour
coded keys and a set of instructions. These are descriptive and propositional forms
of expression, all located in the experience of the architects at the North England
Branch. For Gordon, they are outside his experience, and he brings his own to bear
in interpreting these fragmented representations of practice.
In his study of how a team of geochemists judge when material fibres in a
reaction vat are jet black, Goodwin (1997) shows how simply saying ‘jet black’ is
not sufficient for helping an apprentice measure and make this judgement
competently. Rather, the ‘blackness of black’ is learnt through physically working
with the fibre, and in talking about the experience, ‘transforming private sensations
and hypotheses into public events that can be evaluated and confirmed by a more
competent practitioner’. Geochemists use their bodies as ‘media that experience the
material’ being worked with through a variety of modalities. In the case of the
apprentice, Gina, in Goodwin’s study, her interlocutor’s ability to recognise and
evaluate the sensation she is talking about requires co-participation in the same
activity.
The example of Gordon’s ‘failure’ to correctly interpret the forms of expression
sent to him is an example of how breakdown can take place when co-participation
is missing from the interpretation process, and how essential it is for repair within a
distributed apprenticeship setting. Knowledge clearly becomes more than a matter
of applying learnt rules; it is a matter of learning ‘rule-following’ (Johannessen 1988) within
the practices that constitute it. The need for him to colour with the other architects in
order to be able to correctly interpret any such future fragments that might be sent to
him shows that experiencing in co-presence carries powerful tacit information.
Gordon’s acquired knowledge will be evident in his skilful performance of these
forms of expression.
The equivalence in meaning of ‘forms of expression’ and ‘representations of
practice’ denotes a range of human action, artifacts, objects and tools.
Human action includes cues, which may be verbal, bodily, of interaction with a
physical material world (tools, e.g. pens, light tables, etc.), and construction of the
physical boundary objects (e.g. colour, maps, sketches, masterplan sketches,
masterplans, plans, functional descriptive sketches, photographs, written documents, etc.).
The dilemma of this distributed setting is that even in the future, any interpreting
or understanding that Gordon, as an apprentice, does of similar or different
fragments of knowledge, will still take place in isolation, and the feedback from his
local colleagues will be based on their ‘seeing-as’ (Tilghman 1988) (interpretation
based on their experience) and not ‘seeing’ (as they lack sufficient skill in this
domain to ‘understand’ without interpreting).
I asked John whether, supposing it were possible for his team and Gordon to colour
maps together in a distributed setting with the help of some hypothetical
computer-mediated technology, he would be interested in exploring this possibility. John
declared that this was not a matter for technology, but quite simply that Gordon
‘lacks experience’ and that the only way he will acquire it is by colouring with them
in the same space. His conviction made me reflect on what it means to share a space
and be present, as a precondition to acquiring experience; experience that would
have helped Gordon to interpret the examples of previously coloured maps for
similar bids, colour keys and instructions, that had all been sent to aid him in
understanding how to colour the maps.
Being present is a bodily experience, and involves all the human senses. In
various cultures we draw upon various levels of our senses. For instance, the Maori
rub noses in greeting each other, Russians kiss on the mouth, and in some Arab
cultures, they bring their faces close enough to smell the breath of the other. All
these acts are part of gauging one person’s sense of another, essential to building
trust that is required for committed engagement. Placing a glass pane between two
people in any of these situations would block their tacit ability to interpret their
relation to each other, and thereby comprehend each other’s meaning, through the
impacts between their bodies, and would require them to focus on the visual and
speech channels that have limited bandwidth for tacit knowing.
John was certain that once Gordon had this experience of colouring with him
in the same physical space, he would have no trouble in the future in aligning
his aesthetic ‘seeing-as’ with theirs when given such materials or representations
(exemplars) to interpret and ‘see’, wherever he might be. Seeing-as requires
interpretation, and Tilghman terms this, ‘mediated understanding’. Once you have
the skill to see, you can understand without interpretation, and just perform. The
tacit knowing that Gordon had acquired would be ‘retrieved and made active by
sensing’ (Reiner et al. 2004) in his act of seeing. The role of mind and imagination
is important for such retrieval in sensing that brings together past (memory), present
and future. In such sensing, our minds draw upon our bodies: ‘wherever some
process in our body gives rise to consciousness in us, our tacit knowing of the
process will make sense of it in terms of an experience to which we are attending’
(Polanyi, op.cit. p. 15).
When the maps in this example were finally coloured, the final result was
markedly different from those that Gordon had produced, and although this is
not the key point, they did look aesthetically more pleasing. However, my
impression was also one that had been shaped with others.
I now sought to investigate the aesthetics of ‘seeing’ by analysing the importance
of the body for experiential knowing in shared physical space, and the constraints of
disembodied communication upon distributed knowledge acquisition. During the
next phase of my research, the coordinated synchronised movement of the prosodic
qualities of the human body and voice became a focus of attention as part of the basic
level of the tacit dimension of human knowing, the nature of which, it is proposed,
enables intersubjectivity, knowledge acquisition and transformation. I developed an
analysis of how experiencing the performances of representations of practice (e.g.
gestures, deictics) and moving with these representations in a joint activity (such as
colouring together, sketching together), consists in specific types of behavioural
alignments between actors in an environment. I termed this Body Moves and have
been developing the relationship between Body Moves and tacit knowing. This is to
better understand, conceptually and practically, how the nonverbal movement of
body and voice facilitates knowledge transformation and knowledge acquisition,
and specifically the quality of that movement, which turned out to be rhythmic and
entraining.
The methodology to understand the tacit dimension would now be further
extended to include experiments and psycholinguistics, to analyse the pragmatics of
body prosody.
Embodied synchrony and multi-modal systems
I will begin by relating the technology scenario in which rhythm and entrainment
became visible. This next phase of research took me to Japan. At NTT Basic
Research Labs (1997–1999) (now the Communication Science Labs) and ATR
(1997) I became interested in the world of interactive multi-modal systems that used
various styles of human-like objects. For example, there were ‘talking heads’ and
‘talking eyes’, where the body would be reduced to a range of features, from the
most minimal (just two eyes on the screen) to the more complex, and would move with
the prosody of the voice, including backchannels and fillers.
In the last few years, work in gesture has shown many features of gesture-speech
coordination that are being fed into interactive agent technology design. Designs of
life-like agents, as in the case of the MIT estate agent called REA7 (Real Estate
Agent), are full bodied and gesture at the appropriate prosodic speech moments but
also follow patterns of gesture and speech cue timing that are being discovered in
gesture research. In the case of REA a camera was used to passively sense the user.
REA plays the role of a real estate salesperson who interacts with users to determine
their needs, show them around virtual properties, and attempt to sell them a house.
Cassell describes her Embodied Conversational Agent (ECA) as ‘a virtual human
capable of interacting with humans using both language and nonverbal behavior’8.
She introduced the rule-governed, autonomous generation of non-verbal conversational behaviours in animated characters. In a similar vein to the work at NTT and
ATR, the embodied conversational agent is capable of generating and ‘understanding’ both propositional components of speech and synchronised interactional
components such as back-channel speech, gestures and facial expressions. Work on
gesture has shown that we begin a gesture movement prior to our speech action, and
that our gestures have a structure called a ‘stroke’ (McNeill 1992; Kendon 2004)
that is quite precise in its phases and works with speech content. A gesture may
accompany a speech utterance and be its embodied representation. Work by former
members of the Chicago school of gesture, which was headed by David McNeill and
included Cassell and Kita, has investigated representational and conversational
gestures and produced findings that are important for understanding the relation
between hand and mind (McNeill 1992).
7 http://www.media.mit.edu/gnl/projects/humanoid/index.html
8 http://www.soc.northwestern.edu/justine/jc_research.htm
Other artistic approaches to gesture, such as Tosa’s interactive art/media
projects9, focus on the emotional sensation of movements of the body and voice in
creating attachment and care. When I met her at ATR, Tosa had developed an
emotion matrix that she used to develop her artworks of ‘neurobaby’ that moves and
coordinates its non-verbal body and voice actions in response to the movements
(prosody, pitch and modulation) of the sounds of our voices. In the case of
‘neurobaby’, I found myself experiencing anxiety when the baby (a collection of
moving coloured lines of eyebrows, eyes and mouth, and an outlined face) burst out
crying and I desperately tried to stop it by altering the pitch, rhythm, tone, phrasing
and modulation of my voice.
The domain of embodied agent designs is a large one, and I wish to focus on one
aspect, that of the representation of communication processes (procedures) and
semantics (content) of communication acts. The representation of the communication
process is based on sequential turn-taking structures and that of semantics is
based on the affordances of computational structures. The turn-taking acts are
pre-programmed responses to the feedback that the user provides (such as that of my
voice when trying to calm the crying neurobaby), and the user has to be predictable
in giving the feedback (otherwise the emotion structure of my voice would not be
recognised). The timing of any turn is bounded by the computational constraints
of processing the data input and responding to it. The complexity problem for
computing interaction/conversation is that if one tries to represent the human
interaction and replicate it, then this necessitates making human interaction explicit.
The problem of representation is similar to the problem faced by the bottleneck of
‘implicit’ or ‘tacit’ knowledge for the expert system. The interaction is definitely
made more engaging when using human-like agents with programmed human
voices as these elicit emotional (sensory) and imaginative responses that enable the
illusion of contact with the artificial.
At a basic level, all these designs are a further development upon the Eliza
system developed in the 1960s by Weizenbaum at MIT (1966, 1976), which asked
the kinds of questions that people expected to hear from a therapist and gave the
illusion of contact and understanding with the artificial. The multi-modal agent
system extends interaction from speech (content and organisation of content) to
include the body, gestures, and modulation of the voice, eliciting a stronger
embodied response. We now know from work on mirror neurons (Rizzolatti
et al. 2001; Ferrari et al. 2003) and work on music and motor neurons (Large and
Jones 1999, on dynamics of attending) that these trigger both kinds of neurons.
What interactive technologies would need to be able to achieve, in order to
perform as co-performers with us, is adaptability and grounded meaning in
communicative situations. This is not computationally possible and it is not clear if
this is desirable.
9 http://www.tosa.media.kyoto-u.ac.jp/
At the moment, feedback patterns are based on a variety of methods, one of
which is to aggregate and average the types and frequencies of an action, e.g. of a
gesture with a speech action, and another is to simulate the communication setting
and model it. An example of simulation is given further below, where the task
involves two people separated in different rooms, connected via a microphone, with
one person trying to find out where to eat out in Tokyo on a particular night, and the
other person searching a website to help them find an appropriate restaurant. The
aim of simulations can be to discover a grounded ideal patterning of query and
response by running through a large number of subjects who will entrain to the most
prominent coordinated pattern (this is never predictable as it depends on the
personalities and behaviours of the subjects) (e.g. Gill and Kawamori 2002). This
grounded pattern is then taken as the normative case for computational purposes,
and for human users to then engage with. Present work on learning algorithms seeks
to free the computational constraint on adaptation.
At NTT Basic Research Labs, where I was part of the Cognitive Science and
Dialogue Understanding Groups, my research projects on tacit knowing in dialogue,
in addition to multi-modal communication processes, included the study of
cross-cultural communication and perception, and the use of the Internet in Japanese
education (e.g. the 100 Schools Project, Miyazawa et al. 1999). The use of the
Internet had a fundamental impact on teaching in Japanese schools, giving rise to
the concept of the individual and representations of self in relation to others
including in cross-cultural communication. The relation of self to other in a
distributed setting is different from that of being face-to-face, but I still needed to
undertake a study that would provide more understanding of the limits of presence
in the distributed setting that had proven so problematic for the junior architect to
‘see’.
I decided, with my colleague, Kawamori, to analyse how multi-modality,
particularly the body, operates in communication and identify the contingencies of
adaptation for grounding meaning in a non-face-to-face setting where people cannot
see each other but can talk to each other through a microphone. This involved
exploring the synchronisation of body and speech movements (prosody) in a
non-face-to-face setting to further understand the functions of the body and voice for the
self and for the other in grounding meaning.
In gesture research on face-to-face communication, coordinative gestures had
already been identified (Bavelas 1994, 1995; Ekman and Friesen 1969) as
functioning as a signal for the interlocutor. Ekman and Friesen had produced a
semiotics of gesture, such as the ‘OK’ sign with the thumb, calling them illustrators
and emblems, whilst Bavelas, working on interactive coordination, showed, for
example, how we may gesture with a hand, pointing it to the person we are talking
with, to indicate something we are referring to in relation to that person. In contrast,
in the non-face-to-face setting, where speakers cannot see each other, there is no
visual information that such gestures can carry. This implies that their function, if
they occur, in this setting will differ from that of the face-to-face setting. Alibali and
Heath’s findings (1999) of reduced but still high levels of representational gestures
(topic based) in non-face-to-face settings, and their proposal that these may have a
function for the speaker which is independent of their function for the listener,
support the idea that there may be a difference in the functions of gestures in the
two settings.
If coordinative gestures (not just the kinds cited above that are more clearly
intended to be seen) require the presence of the interlocutor in a face-to-face setting,
it is unlikely that they would occur in the non-face-to-face communicative setting.
Bavelas (1994) had found that ‘interactive gestures’ do occur in non-face-to-face
settings but that they are significantly reduced. As gestures still occur in this
interaction, do they still presuppose the interlocutor, given that he or she cannot be
seen, or is this not a necessary condition for the act?
In order to answer these questions we conducted a couple of experiments (Gill
and Kawamori 2002). The gestures that we focused upon, nodding gestures, were
described as having a metacommunicative function, i.e. considered as signals for the
interlocutor, carrying information, and thereby enabling coordination (at this period,
the notion of signalled information transmission was still a dominant concept). We
expected that if these gestures still occurred in the non-face-to-face setting, then the
presupposition of the interlocutor is not a necessary condition, in which case we
proposed that they embody an additional self-oriented function.
Of the two experiments, one consisted of British subjects and the other of
Japanese subjects. The task in both cases was identical. The gesture that we focused
on most was the head nod, which is a frequent gesture in Japanese communication,
and we expected a cultural comparative analysis to identify common and essential
coordinative characteristics of this gesture.
The experiment concerned the communication of information between an information
provider and an information seeker. Note the speaker–listener model of information.
The task is a web searching task, where the web-searcher is the information
provider, and the person seeking information is the information-seeker. The
experiment takes place in a laboratory setting. Subjects are placed in separate sound
proof rooms. Four video cameras are used, two in each room. These take a
synchronised view of both subjects at two angles: of the upper torso, head and arm
movement from a frontal view; of hand movement on the desk from an overhead
view. All participants knew in advance that they were being video-taped. The main
topic is about eating out in the Tokyo area. Questions are asked by both subjects, for
example, about the price range of the restaurants, and the kind of food the subject
wants to eat, directions on how to reach the place, and contact details. Each session
lasts a few minutes.
Coding involved recording the timings of the picture frames of each nod, which
person is nodding at the time, at which place in an utterance or turn the nod occurs,
whether the person nodding is speaking at the moment they nod or the other person
is speaking. We also noted which utterances and nods occurred just prior to or just
after the nod coded. This included backchannels, endings of turns or phrases,
pauses, simultaneous nodding and prompts. Coding also involved functions, e.g.
question, inform, suggest, propose, self-expression, emphasis, evaluate, request,
listing, repair, confirm, topic shift, prompt, closure, initiate. This was to check for
any relations between the gesture and the function of the speech as well as the
category of the speech, for example, as filler or backchannel, etc. The frequency of
nods per session, and for information-provider and information-seeker, were also
coded.
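As an illustration only, the coding scheme described above could be represented as one record per nod. The field names, the frame rate of 30 frames per second and the sample values are assumptions made for this sketch; they are not taken from the original study.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical frame rate; the study does not state one.
FRAMES_PER_SECOND = 30


@dataclass(frozen=True)
class NodRecord:
    """One coded nod (all field names are illustrative assumptions)."""
    session: str
    start_frame: int
    end_frame: int
    nodder: str             # "provider" or "seeker"
    speaker: Optional[str]  # who is speaking during the nod; None if silent
    function: str           # e.g. "inform", "confirm", "prompt"
    category: str           # e.g. "backchannel", "filler", "none"

    @property
    def duration_seconds(self) -> float:
        return (self.end_frame - self.start_frame) / FRAMES_PER_SECOND

    @property
    def is_silent(self) -> bool:
        return self.speaker is None


def nod_frequency(records):
    """Count nods per (session, nodder) pair."""
    return Counter((r.session, r.nodder) for r in records)


# Invented sample data, for demonstration only.
records = [
    NodRecord("s1", 120, 135, "seeker", "provider", "confirm", "backchannel"),
    NodRecord("s1", 300, 330, "provider", "provider", "inform", "none"),
    NodRecord("s1", 410, 422, "seeker", None, "confirm", "backchannel"),
]
frequency = nod_frequency(records)
```

Keeping who nods separate from who speaks makes it possible to pick out silent nods and nods produced while the other person is speaking, which are the cases the findings below turn on.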
Other gestures among the British subjects were also coded, but we found the
most frequent one in this communication setting to be the nod, and nods serve a
coordinative function, which does not directly serve to engage the interlocutor by
the action itself. Three significant findings emerged which indicated a self-oriented
function of the gestures.
First, nods do occur in silence without the accompaniment of speech in a
backchannel-like action. Although nods also occur in conjunction with speech in
backchannel speech acts, the silent nod questions the claim that gesture and speech
form a composite whole in the backchannel where the gesture functions as a signal
for engaging the listener. In the case of silent nodding, this claim cannot be made.
Rather, the claim of a self-oriented function to this interactive but non-communicative silent nodding is more applicable. In this case, we propose that gestures that
do occur with back-channel speech are not performing the same function as speech.
Here begins a questioning of the signal as an appropriate ascriptor for the body.
Second, we found that nods occur with intonation and emphasis, when describing
something, during the utterance itself. This backed up work on synchrony and
coordination undertaken by Kendon (1990). They also occur with repetition. The
nodding action when describing something is rather like a rhythmic beat and tends
to occur at the beginning or middle of the specific word for British subjects speaking
in English, and at the end of the specific word for the Japanese subjects speaking in
Japanese. We had expected to find cultural differences in the prosody of body and
speech sounds in the different languages and cultures. For both groups of subjects,
nodding can occur at the end of a phrase. It does not appear to perform any
communicative function, but rather seems to be a necessary part of what we will
term the internal autopoietic rhythm of the speaker. The frequency and trajectory of
the nods vary with the speech rhythm.
Third, and surprisingly, given that the speakers cannot see each other,
simultaneous nodding takes place. This would appear to suggest that the internal
autopoietic rhythms of these two independent systems (speakers) are the same at
that moment. The situations of this simultaneous gesturing vary. They can involve
silent nods, or both information provider and information seeker speaking at the
same time, either producing exactly the same utterance or different utterances, and
can occur with simultaneous laughter. In the case of Japanese subjects, producing
the same utterance is more likely to result in a simultaneous and same gesture (nod),
than for the British subjects, who may produce differing gestures (for example, nod
and body sway, or nod and a sideways head movement). The frequency patterns
of the simultaneity for both experiments vary with differing situations. The
simultaneity does not appear to be arbitrary and it is synchronised. Whether it is
entrainment as in the case of the Parallel Coordinated Body Move, we were not
certain.
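One simple way to operationalise 'simultaneous nodding' from frame-coded data is to look for nods by the two participants whose frame intervals overlap. The following is a minimal sketch under that assumption; the interval representation, the function names and the sample values are hypothetical and do not describe the procedure used in the study. Overlap in time is at most an indicator of simultaneity and does not by itself establish entrainment.

```python
def overlap_frames(a, b):
    """Number of frames shared by two half-open (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def simultaneous_nods(provider_nods, seeker_nods, min_overlap=1):
    """Pair nods from the two participants that overlap in time.

    Each nod is a (start_frame, end_frame) tuple. The min_overlap
    threshold is an assumed parameter, not a value from the study.
    """
    pairs = []
    for p in provider_nods:
        for s in seeker_nods:
            if overlap_frames(p, s) >= min_overlap:
                pairs.append((p, s))
    return pairs


# Invented sample intervals, for demonstration only.
provider = [(100, 115), (400, 420)]
seeker = [(110, 125), (600, 610)]
pairs = simultaneous_nods(provider, seeker)
```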
These three cases of nodding would suggest that nodding entails an additional
function to the communicative function (signal for the interlocutor), which is
coordinative and interactive, and this is the self-orienting function essential for
sustaining the interaction. The face-to-face theories do not provide us with an
explanation for why such gestures take place in a non-face-to-face setting. These are
not iconic gestures, i.e. they are not representational. They occur as a person is
speaking in a rhythmic manner, or in silence whilst the other person is speaking, or
during a pause, and in a simultaneous manner. Hence the proposal for considering
an autopoietic (Maturana and Varela 1980) internal mechanism of each person,
which is coordinated by the feedback (but not information transmission) in the
communication, provides a plausible theory to explore for the functionality of these
gestures.
In comparing our findings with Bavelas’ taxonomy of interactive gestures,
we found that nodding did occur when giving information, seeking, and in turn-taking in the non-face-to-face situation. Nodding occurs when seeking to check if
the listener has understood and in turn the listener nods when confirming this. It
occurs when releasing and taking turns. However, although there is an apparent
similarity with the face-to-face situation, our findings suggest that the interactive
function of gestures in human interaction necessarily involves both a self and other
oriented dimension, which in the non-face-to-face condition is predominantly self
rather than other oriented. It has been suggested (Alibali and Heath 1999) that even
in the case of representational gestures, they may perform dual functions, for self
and for other. Hence our proposal for a self and other function in the case of
nodding, and other coordinating gestures has some precedent in gesture research.
Considering the idea of imaginative sociability (Fridlund 1991), it may be that
feeling the presence of another is a necessary condition for performing metacommunicative (signalled conveyance of information about the communication)
gestures. We could say that we perform certain gestures in the need to affirm the
communicative situation to ourselves, thereby reinforcing this feeling of presence,
and in this case, the expression ‘metacommunicative’ may not be suitable. This self-affirming may be a need on the part of the speaker as a self-referencing act. The
autopoietic theory proposes that a system maintains its own defining organisation,
and regenerates its components. The self-referential gesturing actions may be part of
the system’s (person’s) maintenance process.
In the autopoietic vision of communication, this is described as orienting
behaviour. The interactions orient the listener within their ‘cognitive domain’, and if
these oriented domains are similar on the part of both participants, consensual
orienting interactions are possible, as each becomes subservient to the maintenance
of both. Consensual orienting interactions are coordinating interactions. What we
term self-referencing may be what autopoietic theory terms orienting.
Autopoiesis opposes the idea that communication involves the transmission of
information, proposing instead that interaction serves to orient us within our
‘cognitive domain’. This would support the idea that the gestures in
the non-face-to-face setting are not for the interlocutor, but rather are a self-orienting
mechanism, triggered off by the verbal feedback. As we orient towards similar
cognitive domains, our gestures will be coordinated with our interaction. It would be
expected, then, that some gestures will occur simultaneously as a result of being
oriented in similar cognitive domains and not as a chance occurrence.
It has been suggested (Alibali and Heath 1999) that gestures may ‘signal’
disfluencies (‘uhm’, ‘uuh’) in the face-to-face condition, whereas verbal fillers are
greater in number in the non-face-to-face condition. Gesturing accompanies fillers
in the non-face-to-face situation, for example in the form of nodding and body
swaying, but the concept of signalling disfluencies to oneself does not seem to make
sense in the non-face-to-face situation, as we have already found above. It may
be more appropriate to consider the concept of self-regulation for synchronous
co-regulation with the other.
The idea from autopoietic theory of orientation within cognitive domains,
combined with the proposal of a dual functionality of coordinating gestures as both
self-referential and self-affirming, provides a framework for the coordination of
gestures in non-face-to-face communication. It also questions the idea of a signal
model of information processing to account for inter-individual gestural and speech
sound coordination, and it finds support in what we now know from neuroscientific
work on mirror neurons (Ferrari et al. 2003) and motor neurons (Large and Jones
1999).
At this point I would like to take a step back, and before embarking on the
explanation and examples of Body Moves, reflect on another technological space for
the engagement of the body and co-presence. This is the virtual space.
Virtual spaces
Whilst at Stanford Centre for the Study of Language and Information (2000–2003), I
ran a seminar series on Gesture and will give an example of engaging in a virtual
space from one of these occasions. Di Paola (2001) and his former colleagues had
developed and worked with an avatar-based 3D virtual community that emulates
many natural social metaphors but has extended and adapted many of these
metaphors as its very tight-knit community of users has evolved over the years.
OnLive Traveler (see footnote 10) and its communities use voice based emotive head avatars in
online 3D virtual environments of their own creation. The community has evolved a
very specific gesture and expression language that uses voice and 3D space in a
socially complex manner.
The original design goals of creating this system were for a commercial start-up,
and these goals succeeded and failed over the years, but ultimately the community
members independently evolved expressively rich conventions of gesture, expression and emotional creation for their own personal inter-relationship needs and were
running it as part of their daily lives.
In his demonstration of OnLive Traveler we had the chance to interact with the
community members online and bring their thoughts of expression and gesture into
our discussion. Being co-present in virtual space threw up some interesting issues,
about how we project onto the virtual space and are able to engage with
communication cues of proximity, of gestural nuances, of social performance cues
of courtesy, offence, emotion, with people who can simulate these using the
artificial theatre space. In interactive artwork this can become extended as we use
more of our physical body (gesture, body motion and voice) to cause the effect in
the virtual environment to create social and emotional contact within it with other
persons and their represented agency (of textures, colours, shapes, sounds, touch
sensations, smell).
10 http://www.dipaola.org/sig99/sld002.htm, http://www.dipaola.org/steve/vworlds.html
My research on how co-presence in virtual space can be mediated by
representations of communication (agents, the written word, cartoons, etc.) was
triggered by an experience recounted to me by a Japanese colleague who used
on-line methods for teaching distributed groups of long-distance students. These
students had never met face-to-face and she never met them. The on-line
discussions were mainly functional and the information expressed in a very narrow
bandwidth of modality, namely the written word within the context of the study task
set to the students. So there was no social discourse. After a year of teaching, she
made a decision to shuffle the members of the groups around without consulting
them, and was astonished at their emotional responses to this action. They expressed
disorientation and felt she should have at least informed them of her intention
and asked them about it. Awareness of another presence over time, regardless of
whether one can see the other or not, builds a feeling of sharing or inhabiting a space
together, of being co-present. The Japanese colleague did not recount the details of
the study activities of the student groups in the online space. However, there is no
doubt that the awareness over time must have involved forms of expression, style of
conveying expressions and timing of expressions, that gave form for identities or
personalities to be formed. Just the timing itself of another person’s actions can give
one a sense of the other in relation to oneself.
Now I would like to give another example of distributed communication, this
time, with video-conferencing. A colleague of mine from my group in NTT was sent
to work in the International division based in Tokyo, where he was involved in
numerous video-conference based meetings with clients and colleagues in the USA. My
colleague is one of those special people who you know understands what you are
saying and can finish off your half uttered sentences. But in the video conference
situation he was unable to understand what was being ‘said’. Yet his Japanese
colleagues had no problems. I was quite taken aback by his dilemma. Here was one
of the most brilliant of communicators with whom you could discuss Kant, having
difficulties in communicating about relatively functional and mundane matters. It
emerged in conversation that his colleagues spoke very good English whereas my
colleague’s English is not so perfect and his grammar is not great. However, he is
uncanny in building trust and rapport face-to-face. NTT did recognise this quality in
him and sent him to the USA to continue his work. The video conference had reduced
the bandwidth of communication to the ‘words’ and ‘grammatical fluency’. It had
not offered my colleague the other cues of co-present physical space that he uses to
be with another person when they speak and think. It was these ‘other cues’ that I
sought to better understand, for these must be the same kinds of cues that the junior
landscape architect, discussed further above, was missing when he tried to colour in
the maps with the explicit representations he had in front of him. It is these other
cues that are critical for gaining ‘experience’ of the other and building ‘tacit
knowing’ and ‘trust’ that goes deep for sustainable knowledge.
Curiously, it is also these cues that were created in the OnLive Traveler where we
used the representations of our selves within a virtual space. Could such a space be
used to acquire embodied experiential knowledge? The senior landscape architect
was adamant that no advance in technology could replace the face-to-face co-performance. This brings me back to the relationship between embodiment and the tacit
dimension in communication, and a return to Body Moves.
Body Moves and musicality of human interaction
Body moves are periodicities of rhythmic synchrony of body and speech
entrainment across persons who are engaged in interaction. During these periodic
moments the distinction between being a speaker and listener is not able to capture
the nature of the dynamics (Gill 2004). We have seen from the non-face-to-face
nodding and gestural coordination study that thinking in terms of signal and
information transmission (that underlies the speaker–listener distinction), does not
appropriately explain the function of such gestures for one’s own self-regulation or
self-synchrony in relation to the other person.
I mentioned at the beginning of this paper that my analysis of the movement
of the body in the transfer and formation of knowledge has two conceptual layers,
one from linguistics and one from my observations of the rhythmic synchrony
of entrainment. Both have importance in Body Moves, which are movements of
both the body and the voice. However, the linguistic layer inhibits the fuller
understanding of the operation of the other, to the extent that in my early work (Gill
et al. 2000), I set aside one of the movements (Parallel Coordinated Movement), which
I was convinced was significant but could not explain with linguistics, until I could
investigate it much later (Gill 2002).
In search of a coherent framework to talk about Body Moves in the early stages
and handle this multi-layered problematic picture, I drew upon joint action (Clark
and Schaefer 1989; Clark 1996) theory (pragmatics of communication) and this
seemed to work. However, joint action theory is based on information transfer and
signalled information and has a sequential turn-taking structure. I found that I still
had a problem to resolve. In work on gesture in joint action the gesture itself
becomes part of this sequential structure of human communication. Yet joint action
theory did seem helpful in two ways. Clark’s pragmatics acknowledges that there
are multiple layers of sounds and semantics that constitute a communicative act,
and proposes that in cooperating to communicate, we express commitment to
negotiate and understand each other, and this is necessary to arrive at grounded
meanings. I have now drawn on joint attention theory (Tomasello et al. 2005; Eilan
et al. 2005; Franco 1997) to help advance upon the problems of the linguistic
structure underlying joint action.
At this point it would help to describe Body Moves in some more detail.
Body Moves
The identification and conceptualisation of Body Moves emerged during a 1-month
period (1997) spent at ATR (Advanced Telecommunications Research) Media
Lab, located between Kyoto and Nara, with a group of computational linguists. This
group were designing various multi-modal conversational agents, and I sat in on
their analyses of feedback and fillers. Although I was not fluent in Japanese, I had
experienced enough of the sound of the language and contexts of use to feel the
meanings that were intended in the prosodic quality. My Japanese colleagues knew I
was concerned to understand what impinges on the flow and formation of
knowledge, in particular the relation between the tacit and explicit, and I had the
opportunity to reflect on the acquisition of skills in the distributed setting of the
landscape architecture study. I was struck when viewing the video data I had
collected by how the body operated differently in different design stages. For
example architects position themselves around a table and engage with their bodies
very differently when discussing what happened in a recent meeting with a client
than they do when they are sketching out a conceptual idea together on a sheet of
paper. Analysing presence would require me to select one of the design settings to
start with. As the latter kind of scenario (sketching together) revealed a great deal of
bodily activity and one could observe the expression of the architects’ ideas as they
used their hands, pens and bodies, I selected this design stage of their work as the
data for my analysis.
In order to begin the analysis, I reflected on Wittgenstein’s discussion of action-reaction, in the Philosophical Investigations, as the basis of language games. The
Japanese colleagues were looking into feedback in order to design their multi-modal
interactive agents, so we agreed that a good idea would be for me to explore how the
body performs feedback. This would be of interest to both our purposes. My
colleagues gave me Kita’s master’s thesis on gesture and cognition (1993), Traum’s
PhD thesis (1994) and a paper on conversation moves (Carletta et al. 1997). I rather
liked the idea of the word ‘move’ as something that moves the information and
conversation forwards. This gave rise to the expression ‘Body Moves’ as being what
I was looking for. Here is a description of the activity in that video clip of the
landscape architects sketching together. The background to the activity is explained
first.
The selected video excerpt is of the landscape architects working on one design
task of the daily practice in this firm. The senior architect is fully qualified and
director of the company, whilst the other is being trained and due to qualify in a
year’s time. They are both familiar with each other and share a mutual respect,
despite the difference in their status and experience (empathic relationship). Their
task is to produce a plan for the car park of a site, as well as the site itself. Some
time earlier, they had produced a sketch plan for the client. The site is to be
transformed from being an old derelict brewery to a headquarters of this client. The
client has produced a version of their sketch plan, largely following their ideas and
wants them to take this further. Part of the discussion between A (senior) and B
(junior) is whether to go for something radical or generally remain within the
bounds of what they have in front of them. They, or rather, A, decide that changing
it would not greatly improve on what they have. Hence they decide upon the latter
option.
There is a great deal of body interactions in this design activity. Their mutual
respect means that B is able to express disagreements and produce his own
suggestions. However, the discrepancy in status is evident in the take-turns and
keep-turns that A performs. This activity is illustrated in the following sketch of the
action unfolding, of gestures, movements and entrainment taking place:
The two architects are developing a sketch plan for a client, one is senior and one
is junior. They communicate at different levels of design–the senior architect is
focusing on the ‘conceptual structure’ of the entire landscape, whereas the junior
architect is focusing on ‘one position’ within that ‘conceptual structure’. Their
gestures and body movement correspond to the level of the design. The senior
architect makes large sweeping hand and arm gestures across the table. The junior
architect makes small finger and hand pointing gestures. They never ‘meet’
although they do try–as one enters the space the other leaves it or shifts their
position within it—until the senior architect ‘mimics’ the junior architect’s gesture
and posture, by focusing on one alternative location within the design to that which
was the focus of the junior architect. The moment the senior architect’s finger point
indicates this intention, the junior architect moves down into the space as well. The
moment the junior architect touches the paper with his finger and moves it in one
stroke across the area of his proposed position, the senior architect moves his finger
as well—back and forth—across his alternative proposal. During this parallel
coordinated action, they are finally synchronised and entrained within each other’s
body motion and voice prosody. The moment the junior architect leaves and lifts his
body out of the design space the senior architect’s hand motion continues across into
the junior architect’s space and in one pen stroke acknowledges his proposal. This
all happens within a period of three seconds. How we know that this has led to
knowledge transformation is that the junior architect then explicitly proposes a topic
shift and moves his body position priming this shift.
In the discussion that now follows, I will explain the Body Moves that are hinted
at above. These occur where I say the architects ‘try to’ meet and where the parallel
coordinated action takes place.
Body Moves are essentially coordinated rhythms of body, speech and silence,
performed by participants orienting within a shared activity. These rhythms create
‘contact’, i.e. a space of engagement between participants, and take two forms, sequential and
parallel. They are kinds of behavioural alignment (Scheflen 1974; Bateson 1955)
and interactional synchrony (Birdwhistell 1970; Kendon 1970), and are metapragmatic
(Mey 2001). Drawing upon the idea of the composite signal (Clark 1996; Engle
1998) that denotes an individual’s composite act of speech and gesture, Body Moves
were conceived as Composite Dialogue Acts where the idea of ‘composite’ denotes
the various possible combinations of gesture, speech and silence
(Gill et al. 2000) across the persons engaging with each other (i.e. not of the
individual’s act). And as they occur Body Moves indicate the construction/
establishment of mutual ground within a space of action. Ascribing an act as being a
‘Body Move’ does not refer to, and is not defined in terms of, the physical
movement, rather, it targets the act that the movement performs.
Body Moves that were identified in the 5-minute video clip of the two landscape
architects above are (for further details of these Body Moves, please see Gill et al. 2000):
Attempt-Contact
Dem-Ref
Take-turn, Keep-turn, Release-turn
Body-Check (B-check)
Acknowledge (Ack)
Focus
An example of the Body Move Attempt-Contact is:
Function: Draws person’s attention and involvement
Effect: Increases engagement or commitment
Gesture: ‘looking’ gesture, or hand and arm gesture
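For readers who wish to annotate data with these categories, the list and the Function/Effect/Gesture description above could be encoded as follows. This is a minimal sketch: the class and field names are assumptions introduced for illustration, and only the Attempt-Contact entry is filled in, because it is the only one described at this point in the text.

```python
from dataclasses import dataclass
from enum import Enum


class BodyMove(Enum):
    """Body Move categories as listed in the text."""
    ATTEMPT_CONTACT = "Attempt-Contact"
    DEM_REF = "Dem-Ref"
    TAKE_TURN = "Take-turn"
    KEEP_TURN = "Keep-turn"
    RELEASE_TURN = "Release-turn"
    BODY_CHECK = "B-check"
    ACKNOWLEDGE = "Ack"
    FOCUS = "Focus"


@dataclass(frozen=True)
class BodyMoveDescription:
    """Hypothetical record mirroring the Function/Effect/Gesture layout."""
    move: BodyMove
    function: str
    effect: str
    gesture: str


attempt_contact = BodyMoveDescription(
    move=BodyMove.ATTEMPT_CONTACT,
    function="Draws person's attention and involvement",
    effect="Increases engagement or commitment",
    gesture="'looking' gesture, or hand and arm gesture",
)
```

Because a Body Move is defined by the act a movement performs rather than by the physical movement itself, the gesture field is kept as free text and is not used to identify the category.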
The analysis of the body moves identified drew upon the concept of metacommunication (Allwood et al. 1991; Shimojima et al. 1997) in linguistics where
meta-communication is the conveyance of information about the conversation, as
opposed to being about the topic situation of the conversation itself (i.e. content).
Allwood et al.’s (1991) work was particularly helpful as it provided wider cognitive
or behavioural categorisations of prosodic information that other linguistic theories
could not account for, such as contact, perception, understanding and attitudinal
reactions. They stated that these are requirements of human communication
because they describe how we relate to each other when communicating, whether
we are committed, whether we are positive or negative, whether we are perceiving
the expression or are interpreting and showing how we understand, and what
assumptions we are making. In linguistics, the conveyance of information is
described as being triggered by cues that inform about the conversation situation, for
example, fillers and responsives (fillers: ‘uhh, mmm, uhm’; responsives: ‘uh huh,
ok’). These function as ‘discourse markers’ and can be identified by prosody and
‘phoricity’ (Kawamori et al. 1998). These interjections in speech determine
discourse structures and the nature of the coordination taking place (Schiffrin 1987;
Kawamori et al. 1998).
The idea of such conveyance cues or communication acts was applied to body
movements that are occurring in response to each other, whether this is related to a
verbal utterance or independent of it. However, such conveyance cues could not
apply to what was described at that time as the ‘exception’, the Parallel Coordinated
Move. In the context of gesture, Body Moves are distinct from representational or
iconic gestures of the verbal utterance as these serve primarily to illustrate it. Where
in conversation, a ‘move’ is described as being a verbal action that causes the
conversation to move forward (Carletta et al. 1997), the body move is a bodily
action, which initiates or responds to a bodily action or verbal utterance. However,
the body move is wider in its scope as it can be a response or an initiation, and in the
case of the Parallel Coordinated Move it is both in the same moment as the bodies
and speech of the participants move together. And we come back to this when
reflecting on the speaker–response model.
The building of the categories of Body Moves (BM) draws upon dialogue acts
and features of dialogue acts, which bear parallels to the phenomena observed. Some
features of dialogue acts are specific to speech and are not embodied in Body
Moves, such as intonation, and asking questions or making commands. Some Body
Moves have required the development of new terms, either in order to demarcate
between the bodily actions and their Communication Act (CA) counterpart, such as
‘check’, which as a body move becomes b-check, or because there appears no clear
counterpart in Communication Act theory, for instance dem-ref, attempt-contact and
focus. All the categories are explained below. As Body Moves are Composite
Dialogue Acts (CDAs) across more than one person, a BM can be accompanied by a
CA, or accompanied by no speech; and there may only be silence or stillness, e.g. as
in a pause.
It is significant that right at the outset, in distinction from conversation acts, it
was held that BMs cannot be said to embody specific intentions although they could
be said to embody an intention for communication itself (Gill et al. 2000). Body
Moves could now be described as expressions of mutually manifest intentions to
understand the other (drawing on Sperber and Wilson’s work 1986).
It was mentioned above that Body Moves create contact, a ‘space of engagement’.
An engagement space is the composite of the participants’ body fields of
engagement. Hence we can call the engagement space, the body field of engagement.
The body field of engagement is set as the communication opens and the bodies
indicate and signal a willingness to cooperate (this linguistic description draws on
Allwood 1995). The body field of engagement is a variable field and changes
when participants are comfortable or uncomfortable with each other. For instance,
in the case where one person moves their hand over into the other’s space, and that
person withdraws their hand, this indicates that the ‘contact’ between these persons
is disrupted. There are also examples where the participants hold their bodies back
from entering either’s field of engagement, indicating disagreement or discrepancy
in the communication, and distance rather than contact. The degree of contact or
nature of distance is described in terms of commitment and attitude. Hence an
immediate space of engagement involves a high degree of contact and commitment
to the communication situation, whereas a passive distance is less involved and
committed, and disagreement is very distanced, with commitment withheld.
Disagreement or discrepancy can necessitate a reconfiguration of the body field of
engagement.
Reconfiguration occurs when there is a disturbance in the relationship between
the speakers, i.e. there is a discrepancy between them which is expressed by the
bodies’ need to re-arrange their relationship to each other so that a feeling of sharing
an engagement space is re-established. In other words, there has been momentary
detachment or distance. The reconfiguration is a response move, which is almost
akin to a motor reaction. It is a rhythmic reconfiguration of the body space between
the participants to create a new engagement situation by reshaping the field of
engagement.
This category of action occurs because there is a problem in the overlap in one
body’s field of engagement with the other body’s. It is necessarily a reaction to the
other person moving into one’s body field, at that particular moment. Note that if
there is no problem in the overlap of their respective fields, the participants can
undertake parallel coordinated moves or collaborate in problem solving.
Within the space of engagement, bodies can move in a coordinated manner to
shift the level of focus within the communication situation, for instance upon a
specific point. Focus is actually categorised as a body move. It involves a movement
of the body towards the area the speaker is attending to, i.e. space of bodily
attention, and in response causes the listener or other party to move their body
towards the same focus. The category becomes significant as a dimension of body
interaction as it is about focus management.
The engagement space bears a relationship to the ‘o-space’ developed by Kendon
within the field of gesture, which I learnt of only later. In subsequent communication
with him, the relationship between the engagement space and Kendon’s ‘o-space’
was explored. Within research on space and communication, Kendon’s (1990)
F-formation system is a foundational theory of how we orient ourselves within
what he terms the joint transactional space or ‘o-space’. Each person, when acting
on their own, has a private space called a transactional segment. For example, when
watching the television, a person’s transactional segment is defined as the space that
he/she ‘looks (into) and speaks, and in which he/she reaches to handle objects’.
Other persons acknowledge and ‘respect this space, not entering or crossing into it’.
This transactional segment is managed by the person’s behaviour. When at least two
people get together to do something, they ‘arrange’ themselves such that their
transactional segments overlap and create a joint transactional space. These persons
‘agree to maintain joint jurisdiction and control over this space’, termed the ‘o-space’. Kendon cites the example of a group of people standing in a circle in a park,
talking.
The jointly constructed ‘o-space’ provides the ‘frame’ within which the ‘story
line’ of the interaction could develop. But the analysis does not deal with the spatial
structurings involved in the actual unfolding of that ‘story line’ (this term is
borrowed from Goffman, who uses it in his ‘frame analysis’ (1974)). The
engagement space, however, does. It is the dynamic, ever-changing space that a person creates
as he/she unfolds a line of activity within the ‘frame’ or ‘story line’ provided by the
overlapping transactional segments (the spatial structure for possible action shaped
by the posture and positioning of the bodies). This overlapping of segments or ‘o-space’ is the arena within which actions pertinent to the interaction can take place;
it sets the ‘field’ that I speak of, for whatever is to be done. The ‘engagement space’
is the space ‘consumed’ by actors within the o-space, and provides an understanding
of how participants who physically reach into each other’s engagement spaces
negotiate this unfolding. The bodily moves made within those spaces become
pertinent for the communicative process just because they are coordinated within
each other’s engagement spaces. It is significant that these moves are rhythmic and
synchronised.
The first form of rhythmic synchrony serves to sustain our commitment to engage
with each other, and the second form serves to transform our states of tacit knowing
such that we are able to arrive at agreements and achieve topic shifts. The latter
form cannot be arrived at without the existence of the former and emerges from it.
Both of these forms of Body Moves are moments of empathic connection.
In summary, there are two forms of Body Moves.
(a) Parallel Coordinated Moves, which are moments of simultaneous coordinated
autonomy. These are evidence of intersubjectivity, and are where knowledge
transformation occurs.
(b) Other Body Moves (which for present purposes shall still be called sequential), which
are moments of coordinated autonomy. They serve to build common ground,
and enable knowledge flow.
In this next section, the empathetic connection that the Body Moves embody is
analysed.
Body Moves and entrainment
The Body Moves have been described as being moments of empathetic connection.
Drawing on the musical term ‘accent’, these moments may be expressed as being
‘accents’ of affect (emotion), and the most heightened one is that of the Parallel
Coordinated Move where the accent is of empathy. Drawing on the concept of
entrainment, these accents may be described as moments of ‘convergence’ that
culminate in the strongest moment of convergence, where transformation takes
place. Body Moves are emergent beats from the interactive structures they are
embedded within. These beats are pulse periodicities. In the example of the two
landscape architects we can see the emergence and movement of these pulse
periodicities. The affective natures of Body Moves are indicative of how the
participants are relating to each other. We spoke of this dimension of affect when
talking of ‘contact’ in the engagement space that Body Moves shape. They are
expressive structures through which we sense our relationship to each other. In a
talk given at the Interacting Bodies conference in 2005, I described these emergent
beats as ‘salient phenomenal beats’. And I would like to explain what I mean by
this.
To do so, I have to reconsider the application of the word ‘signal’ to Body Moves
as denoting phenomena that have phenomenological experience already embodied
(Tolbert 2001). The origins of this embodiment may be seen as being shaped in
motherese, the interaction between a mother and her baby. Studies of motherese
(mother–infant interaction) show how the poetic sounds/rhythms of a mother’s
utterances and bodily engagement differ from ordinary adult interaction, by being
simplified, rhythmically repeated, exaggerated and elaborated. This, uttered
universally with a high, soft and breathy voice, and phonetic foregrounding of
salient utterances, attracts and sustains attention (Miall and Dissanayake 2003). The
mother exhibits alternative patterns of intimacy and observation, empathy and
commentary. The aesthetics in this poetic engagement facilitates emotional
attachment. They propose that the sensitivity of infants as young as
6–8 weeks old to indications (vocal, visual and kinesic) of social contingency of
mothers/fathers/partners is evidence of design in neural organisation. They argue
that this supports the view that mutuality or intersubjectivity—the coordinating of
behavioural–emotional states with another’s in temporally organised sequences—is
a primary human psychobiological endowment. Mutuality is dependent on a
fundamental dyadic timing matrix. Disorders of emotion and learning in early
childhood are traceable to faults in early brain growth of neural systems underlying
this capacity (Trevarthen and Aitken 1994, 2005).
Hence the expression ‘signal’ does not mean ‘information’, but an experientially
grasped iconicity of bodily states of the movements of body and sound. With the
origins rooted in motherese, each person’s experiential reference system is different
but there will be some shared cultural base. Predictability, or tacit knowing, lies in
sharing this cultural base, these points of reference, in order to come together in joint
attention (Tomasello et al. 2005). Music provides a framework for thinking about this.
Music’s model is that in order to grasp it you have to engage with it. Musical
concepts such as tempo may need to be considered to understand these dynamics, as
people have different personal time-frames, and it is possible they may articulate the
same temporal patterns at different rates (turn-taking, sequences) that periodically
come together (i.e. the Body Moves). Also, after they have reached the end of a point
of negotiation, as in the parallel coordinated move, the personal tempos are
synchronised. They are entrained. Entrainment is coordinating the timing of our
behaviours and rhythmically synchronising our attentional resources.
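The idea that personal tempos differ and then become synchronised can be made concrete with a small numerical sketch. The Python below is purely illustrative and is not part of the original analysis: it assumes two hypothetical ‘tappers’, each with their own preferred period, who after every tap nudge both the timing of their next tap (`alpha`) and their underlying period (`beta`) towards the other. The function name `entrain`, the parameter values and the linear correction rule are all assumptions chosen for simplicity.

```python
def entrain(period_a, period_b, alpha=0.25, beta=0.05, steps=200):
    """Toy model of two tappers adapting to each other (illustrative only).

    period_a, period_b: starting personal tempos, as inter-tap periods in ms.
    alpha: assumed strength of the timing (phase) correction per tap.
    beta:  assumed strength of the period (tempo) correction per tap.
    Returns a list of (asynchrony, period_a, period_b), one tuple per tap.
    """
    t_a, t_b = 0.0, 0.0  # times of the current taps, in ms
    history = []
    for _ in range(steps):
        # Positive asynchrony means tapper A tapped later than tapper B.
        asynchrony = t_a - t_b
        history.append((asynchrony, period_a, period_b))

        # Each tapper schedules the next tap, pulling it towards the other.
        next_t_a = t_a + period_a - alpha * asynchrony
        next_t_b = t_b + period_b + alpha * asynchrony

        # Each tapper also shifts their personal tempo a little.
        period_a -= beta * asynchrony
        period_b += beta * asynchrony

        t_a, t_b = next_t_a, next_t_b
    return history


if __name__ == "__main__":
    history = entrain(500.0, 560.0)
    for step in (0, 1, 5, 20, 199):
        asynchrony, p_a, p_b = history[step]
        print(f"tap {step:3d}: asynchrony={asynchrony:9.3f} ms, "
              f"periods=({p_a:.3f}, {p_b:.3f})")
```

With these assumed values the asynchrony between the two tappers shrinks towards zero and both settle on a common period midway between their starting tempos, because every correction applied to one tapper is mirrored in the other. Real interaction is far richer than this two-variable caricature; the sketch only shows that mutual adjustment of timing is sufficient for distinct tempos to converge.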
This proposal that personal tempos become synchronised would support the basis
on which I identified the Body Moves as being salient movements of bodies and
voices playing off each other and with each other, and described these moves as
being distinct from linguistic ‘turn-taking’. If one considers music performance, we
can further say these movements are distinct because the participants are co-performers, i.e. they are both speaking and listening to each other as performers at
exactly the same time, not in terms of turn-taking as in speech. This ‘musical’
communicative structure is essential to arrive at parallel coordinated action.
It is proposed that this synchronisation of temporal rhythms is how two people
can come to share an idea (or acknowledge each other’s recognition of a mutually
manifest state of affairs; Sperber and Wilson 1986), and through parallel and
coordinated motion, they reach a heightened form of co-regulation and convergence. The use of artefacts (interactive technologies, mediating technologies, pens
and paper, overhead slides, smart collaborative technologies) can influence these
rhythmic coordinations and even disrupt them, and this has implications for how we
manage knowledge transformation.
The examples given above of the architects coming to share an aesthetic
judgment about colours, and of the participants in the simulation of query-answer
web-search session of where to eat out in Tokyo, involved the shaping of a group
norm of ways of doing, talking, sensing and perceiving information. These shared
norms emerged from the entrainment of their individual tempo, pitch, phrasing, of
embodied experience, carried in their rhythmic synchronisation, that they brought to
bear on the situation, grounding and accommodating its meaning in relation to
themselves and each other. In a study of beat entrainment by Himberg (2006), where
two people are tapping to a metronome (the artificial beat), their tapping drifts from
the metronome as they tap to the beat of each other. After some time their tapping
realigns with the metronome, and then it drifts off again to each other’s beat. All this
happens with no awareness on the part of the ‘tappers’ who think they have been
tapping to the metronome (the artificial beat) all along. It is not an accident that each
branch of the distributed architects’ firm had its own shared sense of colour, as any
shared norm emerges from the movements of those that ground it.
‘‘Collective action is seen as being shaped by the movement between the
individual and social situation. We achieve collective action through our understanding of the performance of representations of the tacit dimension in our
communication, such as gestures, non-verbal cues, speech and pauses, as well as
artefacts of practice, including technology, evident in how we perform with them.
This understanding mediates the constant transition between individual and social
states (Gill 2004). The Body Moves are transitions in body and speech sound
movement and these transition states appear to be associated with different types of
rhythm where the sound and body motions of the participants become coupled, if
only for a very brief moment. This coupling is considered to be part of the process
of understanding, and it is expressed at a level of the social. Rhythm, tempo, pulse
are our personal expression of relating with another human being and our
environment, and are in that sense, social. The coupling in this picture of knowledge
formation is therefore considered as being part of the social understanding of tacit
knowing. It is akin to music performance. In a choir for example, singers and
musicians are ‘collectively engaged in the synchronous production and perception
of complex patterns of sound and music’ (Cross 2007; Arom 1991; Blacking 1976).
In collective musical behaviour, the individual behaviours are likely to be
coordinated with time and be more or less predictable in relation to each other.
This collective activity therefore has a high degree of coherence, which is likely to
help establish a strong sense of group identity (Stobart and Cross 2000; Cross 2007).
This work from music supports the idea of collective action as entrainment in the
case of Body Moves. These are points of convergence in interaction, where features
of musicality are drawn on in terms of interpersonal performative cues—each
participant ‘signalling’ to the other their mutuality of understanding by ‘sharing
time’—the mutual sense of shared meaning that is a feature of musicality in
interaction (Cross 2007) is foregrounded and confirmed in the sharing of time.
In musical terms, body moves (of body and vocal sound) are salient and explicit
and phenomenal ‘beats’, that are emergent from accentual structures in terms of the
articulated interactive structure (interactive gestural structure such as the engagement space), i.e. the structure of the interaction embodies cues as to structural accent
(some events are more salient, more differentiated, more referential in respect of the
dimensions that the interaction employs) that are experienced as phenomenal or
veridical accents. Each person has different phenomenological experience embodied.
Conclusion: Body Moves and knowledge transformation
Body Moves can be summarised as being rhythmic spontaneous coordinations of at
least two people that indicate the nature of contact, resonance and commitment
within a communication situation. They span more than one body (are collective).
They shape and constantly configure the engagement space of action. Body Moves
enable the formation and transformation of tacit knowing and intersubjectivity
(Polanyi 1964, 1966). In performing Body Moves we engage with the representations of the tacit dimension of another’s actions and move with them, for example,
in a design activity, or to form a shared identity. Action is the performance, whilst
its tacit dimension is its basis, which is sensed, grasped and responded to, rather like
music. Going back to the consultancy practice study and the colouring of the maps
to share aesthetic judgment, the representation of the tacit dimension of action is the
structure of the form of its expression. In engaging with the representation of the
tacit dimension of another’s actions, we are resonating with the communicative
structures being performed. In order to be able to do so, we both draw upon
experience and are experiencing in the same moment. The bodily dimension of this
resonance is critical for presence. In his work ‘Keeping Together in Time’,
McNeill (1995) speaks of visceral and emotional sensations that come with shared
movement, which he proposes endows groups with the capacity to cooperate’’. The
Body Moves are coordinated autonomy, essential for sociality.
In the ‘Tacit Dimension’, Polanyi described a relation between emergence and
comprehension, as existing when ‘an action creates new comprehensive entities’.
Parallel Coordinated Moves are multi-activity gestural coordinations, where
different but related projects are being expressed in the body actions of the
participants at the same time. This fusion provides the conditions for tacit
transformation in a new plane of understanding from the prior periodicities that
revolve around one idea, and as a result they create new comprehensive entities,
expressed in the simultaneous rhythmic synchronisation of bodies and speech. The
collaborative features of these moves enable the participants to negotiate and
engage in the formation of a common ground (Gill 2002) whilst expressing different
perspectives. In the ‘engagement space’ it is the one moment where the body fields
can overlap without disturbance and co-regulation is at a heightened affective state.
For Polanyi, the body is the ‘ultimate instrument of all external knowledge’, and
‘wherever some process in our body gives rise to consciousness in us, our tacit
knowing of the process will make sense of it in terms of an experience to which we
are attending’. In performing Body Moves, the ability to grasp and sense someone’s
motions, and respond to them appropriately (skilfully) is based on experience (tacit
knowing of the process) and experiencing (experience to which we are attending). It
is spontaneous action.
Hence the architects can come to see colours with shared (not same) aesthetic
judgements. Knowledge or a concept is an emergent property of being engaged with
another. It is a point in the knowledge flow where transformation occurs, and this
emerges through, for example, acknowledging and demonstrating understanding of
what someone is trying to say in the same ‘language’ expressing mutually shared
intention to understand. The concept of sharing an idea of a colour as being of a
‘right shade of blue’ to work with a particular shade of ‘green’ is borne in practice
with the other’s expression of experience in making judgements. The formation of
concepts has underlying it a temporal flow of prosodic and modulating events
(gestures, body motion, vocal sound) that is entrainment. Entrainment is coordinating the timing of our behaviours and rhythmically synchronising our attentional
resources.
The identification of Body Moves has contributed to the conception of ‘pragmatic
acts’ by widening the ‘narrow conception of strict natural language pragmatics’
(Gill et al. 2000; Mey 2001) and the analysis needs to be taken deeper to understand
the entrainment processes in human interaction.
Non-verbal communication has already gained strong ground in different
disciplines and in its significance for understanding the ‘human interface, the point
at which interaction occurs’ (Gill et al. 2000). As part of this shift from cognition
to communication, the international gesture society was founded in 2002. The
increasing focus on the non-verbal has been reflected in the design of interactive
multi-modal systems, some of which were referred to above. The performance arts
domains are exploring the relation between sounds (from the spectrum of
recognisable speech to recognisable music) and movement in dance performance,
and are taking this understanding beyond the ‘paralinguistic’.
Body Moves are being applied to performance environments, e.g. dance, where
they are described as a-linguistic pragmatics, emerging in the sound and visual
system of the performance space (Sha and Gill 2005). Although they were initially
coded (described or explained) within a linguistic context, this has proven to be
problematic as linguistics separates the speaker (sender) and the listener (receiver).
Sender and receiver are connected through information transmission, which allows
for the conception of the autonomous cognitive entity, the autonomous expert,
functioning with explicit knowledge. However, Body Moves are necessarily sensory
couplings of engaged persons, and by working with performance arts domains, their
analysis has evolved to discover their essential qualities as performance periodicities of the body that are self and other oriented.
Body Moves give us a further insight into the nature and operation of ‘co-presence’ (Good 1996), which is an essential component of human understanding,
denoting how we are present to each other, be this in the same physical space or
in differing physical spaces (e.g. computer mediated spaces, virtual, or mobile
technology mediated spaces, computer augmented performance spaces). Being
present is described as a precondition for committed communication, but the nature
of this precondition affects how we coordinate with each other and make sense and
meaning. Work on the body and experiential knowledge suggests that the design of
technology needs to work with these affordances of entrainment.
Reflections
The ideas that this paper has worked through have journeyed through numerous
forms of technology, and have explored the assumptions that lie behind their design
as well as the impacts these technologies have on us. This exploration has covered
theoretical foundations and methodologies from philosophy, social science,
psychology, linguistics, performance arts, interactive arts. The research investigation has evolved the methodology for the tacit dimension by bringing together the
subtle aspects that these disciplines afford for understanding the tacit dimension at
many levels of human engagement. Each theory and methodology on its own was
insufficient. The analysis provides the basis for a theoretical and methodological
design framework for interactive technologies to afford those human capacities and
qualities that are essential for human communication and engagement. Such design
requires conceiving of ‘interface’ as located in ‘dialogue’ and ‘personal’ and
‘experiential knowledge’ (Gill 1995).
The human–machine interface needs to afford us the resonance of structures in
communication operating at multiple dimensions. As we exist within our bodies, the
body may be seen as a mediating interface for the tacit and explicit interrelationship of human communication, carried through the body’s movement, breath
and vocal sound; gesture, silence and speech. This mediating interface needs to
operate in the human engagement sphere and the human–machine interface needs to
afford us this capacity necessary for entrainment.
It is important to root the methodology and design in a conceptual framework for
the relationship between entrainment, experiential knowledge, the body, tacit
knowing and communication. Such a framework is needed for analysing and
understanding the current and future impacts on our engagement with each other
and our environments when we interact with various forms of interactive
technologies. This framework will always be incomplete as each new form of
technology causes us to rediscover (in a long historical sense) what makes us
human. However, because it is about the very fundamental layers in which culture
and semantic meaning are rooted, it can evolve with the technological developments
and keep pace with them.
Acknowledgements I would like to thank the following people for their inspiration over the years in
supporting the development of the ideas: Bo Goranzon, Deborah Bekerian, Masahito Kawamori, Herb
Clark, Ian Cross, Terry Winograd, Ajit Narayanan, David Good, Mike Cooley, Seija Kulkki, Timo Saari,
Hisao Nojima, Yasuhiro Katagiri, Sotaro Kita, David Smith, Jeremy Potter, Jan Borchers, Sha Xin-Wei,
Liz Tolbert, Hubert Dreyfus, Adam Kendon and Karamjit S. Gill. I also thank Colin Tully, William Wong
and Martin Loomes of Middlesex University for their support of this research.
References
Alibali MW, Heath DC (1999) Effects of visibility between speaker and listener on gesture production:
some gestures are meant to be seen. Source of reference was this pre-published paper. Later
publication
Alibali MW, Heath DC, Myers HJ (2001) Effects of visibility between speaker and listener on gesture
production: some gestures are meant to be seen. J Mem Lang 44:169–188
Allwood J, Nivre J, Ahlsen E (1991) On the semantics and pragmatics of linguistic feedback (Tech. Rep.
No. 64). Gothenburg Papers. Theor Linguist
Anderson ML (2003) Embodied cognition: a field guide. Artif Intell 149(1):91–130
Arom S (1991) African polyphony and polyrhythm: musical structure and methodology. Cambridge
University Press, Cambridge
Barsalou LW (1988) The content and organisation of autobiographical memories. In: Neisser U,
Winograd E (eds) Remembering reconsidered: ecological and traditional approaches to the study of
memory, pp 193–243, Cambridge University Press, New York
Bateson G (1955) The Message. ‘This is the Play.’ In: Schaffner B (ed) Group processes, vol II. Macy,
New York
Bavelas JB (1994) Gestures as part of speech: methodological implications. Res Lang Soc Interact
27(3):201–221
Bavelas JB, Chovil N, Coates L, Rose L (1995) Gestures specialized for dialogue. PSPB 21(4):394–405
Bekerian DA, Dennett JL (1990) ‘Spoken and written recall of visual narratives’. Appl Cogn Psychol
4:175–187
Birdwhistell RL (1970) Kinesics and context. University of Pennsylvania Press, Philadelphia, PA
Blacking J (1976) How musical is man? Faber, London
Carletta J, Isard A, Isard S, Doherty-Sneddon G, Anderson A (1997) The reliability of a dialogue structure
coding system. Assoc Comput Linguist 23(1):13–31
Cassell J, Sullivan J, Prevost S, Churchill E (2000) Embodied conversational agents. MIT, Cambridge, MA
Clark HH, Schaefer EF (1989) Contributing to discourse. Cogn Sci 13:259–294
Clark HH (1996) Using Language. Cambridge University Press, Cambridge
Clayton M, Sager R, Will U (2005) In time with the music: the concept of entrainment and its significance
for ethnomusicology. In: European meetings in ethnomusicology 11 (ESEM Counterpoint 1),
pp 3–75
Condon WS, Ogston WD (1966) Sound film analysis of normal and pathological behavior patterns.
J Nerv Ment Dis 143:338–347
Cooley MJ (1987) Architect or bee? Hogarth Press, London
Coupland N (1999) The discourse reader. Routledge, London
Cross I (2007) The evolutionary nature of musical meaning. Musicae Scientiae (in press)
Di Paola S (2001) Gesture and narrative creation in avatar-based 3D virtual communities. Invited paper at
gesture and dialogue seminar, CSLI, Stanford University, Available at http://www.dipaola.org/
sig99/sld002.htm
Ekman P, Friesen WV (1969) The repertoire of nonverbal behaviour: categories, origins, usage, and
coding. Semiotica 1:49–98
Ekman P, Friesen WV (1972) Hand movements. J Commun 22:353–374
Eilan N, Hoerl C, McCormack T, Roessler J (2005) Joint attention: communication and other minds.
Issues in philosophy and psychology. OUP, Oxford
Engle RA (1998) Not channels but composite signals: speech, gesture, diagrams and object demonstrations
are integrated in multimodal explanations. In: Proceedings of the 20th annual conference of the
cognitive science society, pp 321–327, USA
Fodor JA (1976) The language of thought. The Harvester Press, Sussex
Fodor JA (1981) Representations: philosophical essays on the foundations of cognitive science. Harvester
Press, Brighton
Franco F (1997) The development of meaning in infancy: early communication and social understanding.
In: Hala S (ed) The development of social cognition. Psychology Press, Hove
Fridlund AJ (1991) Sociality of solitary smiling: potentiation by an implicit audience. J Pers Soc Psychol
60(2):229–240
Gill JH (2000) The tacit mode. Michael Polanyi’s postmodern philosophy. SUNY Press, New York
Gill KS (1996) The foundations of human-centred systems. In: Proceedings of the human machine
symbiosis: the foundations of human-centred systems design. Springer, London
Gill SP (1988) On two AI traditions, AI & Society, vol 2 No.4. Springer, London. NB. A version of this
Gill SP (1988) Knowledge and skill transfer through expert systems: British and Scandinavian
traditions. In: Research and development in Expert Systems V: proceedings of Expert Systems ’88,
Cambridge University Press, Cambridge
Gill SP (1995) Dialogue and tacit knowledge for knowledge transfer. Ph.D. Thesis, University of
Cambridge, Cambridge
Gill SP (1998) Body language: the unspoken dialogue of bodies in rhythm. In: Proceedings of the ESSLI
workshop on mutual knowledge, common ground and public information
Gill SP (1999) Mediation and communication of information in the cultural interface. In special issue on
science, technology and society. AI Soc 13:1–17
Gill SP, Kawamori M, Katagiri Y, Shimojima A (2000) The role of body moves in dialogue. RASK
12:89–114
Gill SP (2002) The parallel coordinated move: case of a conceptual drawing task. Published Working
Paper: CKIR, Helsinki. ISBN
Gill SP, Kawamori M (2002) Coordination of gestures in a non face-to-face setting. In: Rector M, Poggi I,
Trigo N (eds) Gestures, meaning and use. Fundacao Fernando Pessoa, Porto
Gill SP, Borchers J (2004) Knowledge in co-action: social intelligence in collaborative design activity. AI
Soc 17(3). This paper is an adaptation of a conference paper presented at social intelligence design
2003, Royal Holloway, London
Gill SP (2004) Body moves and tacit knowing. In: Gorayska B, Mey JL (eds) Cognition and technology.
John Benjamins, Amsterdam
Gill SP (2005) Pulse periodicity in paralinguistic coordination. Presented at international conference,
Interacting Bodies, Lyon
Goffman E (1974) Frame analysis: an essay on the organization of experience. Harper & Row, London
Good DA (1996) Pragmatics and presence. AI Soc J 10(3, 4):309–314
Goodwin C (1997) The blackness of black: colour categories as situated practice. In: Resnick LB, Roger
S, Clotilde P, Barbara B (eds) Discourse, tools and reasoning: essays on situated cognition. Springer,
New York, pp 111–140
Goodwin C (2003) Pointing as situated Practice. To appear In: Sotaro K (ed) Pointing: where language,
culture and cognition meet. Lawrence Erlbaum, London (in press)
Goranzon B (1993) The practical intellect: computers and skills (Artificial Intelligence and Society).
Springer, Heidelberg
Goranzon B, Josefson I (1988) Knowledge, skill and artificial intelligence. Springer-Verlag, London
Himberg T (2006) Co-operative tapping and collective time-keeping—differences of timing accuracy in
duet performance with human or computer partner. Presentation at 9th international conference on
music perception and cognition, Bologna, Italy
Hutchins E (1995) Cognition in the wild. MIT Press, Cambridge, MA
Johannessen KS (1988) Rule following and tacit knowledge. AI & Soc J 2:287–302
Kawamori M, Kawabata T, Shimazu A (1998) Discourse markers in spontaneous dialogue: a corpus
based study of Japanese and English. In: Proceedings of 17th international conference on
computational linguistics (COLING-ACL98)
Kendon A (1970) Movement coordination in social interaction: some examples described. Acta Psychol
32:100–125
Kendon A (1972) Some relationships between body motion and speech: an analysis of an example. In:
Siegman A, Pope B (eds) Studies in dyadic communication. Pergamon Press, Elmsford, NY
Kendon A (1990) Conducting Interaction. Cambridge University Press, Cambridge
Kendon A (2004) Gesture: visible action as utterance. CUP, Cambridge
Kita S (1993) Language and thought interface: A study of spontaneous gestures and Japanese mimetics.
Ph.D. Thesis. University of Chicago, Chicago, IL
Large E, Jones MR (1999) The dynamics of attending: how people track time-varying events. Psychol
Rev 106(1):119–159
Maturana H, Varela F (1980) Autopoiesis and cognition: the realisation of the living. D. Reidel, Dordrecht
McNeill D (1992) Hand and mind. University of Chicago Press, Chicago
McNeill WH (1995) Keeping together in time. Harvard University Press, London
Mey J (2001) Pragmatics. An introduction. Blackwell, Oxford
Miall DS, Dissanayake E (2003) The poetics of babytalk. Hum Nat 14:337–364
Miyazawa K, Gill SP, Nojima H (1999) The influence of the internet on Japanese education: case of the
100 schools networking project. In: IEEE proceedings of internet workshop’99, Osaka, Japan
McNeill D, Cassell J, McCullough K-E (1994) Communicative effects of speech-mismatched gestures.
Res Lang Soc Interact 27(3):223–237
Polanyi M (1964) Personal knowledge: towards a post critical philosophy. Harper and Row, NY
Polanyi M (1966) The tacit dimension. Doubleday. Reprinted version, 1983, Peter Smith, Gloucester,
Mass
Pylyshyn ZW (1984) Computation and cognition: towards a foundation for cognitive science. MIT Press,
Cambridge, MA
Reiner M, Gilbert J (2004) The symbiotic roles of empirical experimentation and thought experimentation
in the learning of physics. Int J Sci Educ 26:1819–1834
Rizzolatti G, Fogassi L, Gallese V (2001) Neurophysiological mechanisms underlying the understanding
and imitation of action. Nat Rev Neurosci 2:661–670
Ferrari PF, Fogassi L, Gallese V, Rizzolatti G (2003) Mirror neurons responding to the observation of
ingestive and communicative mouth actions in the monkey ventral premotor cortex. Eur J Neurosci
17(8):1703–1714
Roth EM, Patterson ES, Mumaw RJ (2001) Cognitive engineering: issues in user-centered system design.
In: Marciniak JJ (ed) Encyclopedia of software engineering. 2nd edn. Wiley, NY
Scheflen AE (1974) How behaviour means. Exploring the contexts of speech and meaning: kinesics,
posture, interaction, setting, and culture. Anchor Press/Doubleday, New York
Schiffrin D (1987) Discourse markers. Cambridge University Press, Cambridge
Sha XW (2002) Resistance is fertile: gesture and agency in the field of responsive media, in makeover:
writing the body into the posthuman technoscape. Configurations 10(3):439–472
Sha X-W, Gill SP (2005) Gesture and response in field-based performance. In: The ACM Proceedings
of creativity and cognition 2005, Goldsmiths College, London
Sha XW (2005) The TGarden performance research project. Modern Drama 48(3):585–608 (Fall, Special
issue: technology)
Shannon C, Weaver W (1949) The mathematical theory of communication. University of Illinois Press,
Urbana, IL
Shimojima A, Katagiri Y, Koiso H (1997) Scorekeeping for conversation-construction. In: Proceedings of
the Munich workshop on semantics and pragmatics of dialogue
Simon H (ed) (1982) Models of bounded rationality. Behavioral economics and business organization, vol
2. MIT, Cambridge, MA, pp 424–443
Simon H (1969) The sciences of the artificial, 1st edn. MIT Press, Cambridge, MA
Simon H (1983) Reason in human affairs. Stanford University Press, Stanford, CA
Sperber D, Wilson D (1986) Relevance: communication and cognition. Blackwell, Oxford
Stobart H, Cross I (2000) The Andean anacrusis? Rhythmic structure and perception in Easter songs of
Northern Potosí, Bolivia. Br J Ethnomusicol 9(2):63–94
Tolbert E (2001) Music and meaning: an evolutionary story. Psychol Music 29(1):84–94
Tilghman BR (1988) Seeing and seeing-as. AI Soc J 2(4):303–319
Tomasello M, Carpenter M, Call J, Behne T, Moll H (2005) Understanding and sharing intentions: the
origins of cultural cognition. Behav Brain Sci 28:675–691
Trevarthen C, Aitken KJ (1994) Brain development, infant communication, and empathy disorders:
intrinsic factors in child mental health. Dev Psychopathol 6:597–633 (cited in Trevarthen 1996)
Trevarthen C, Aitken KJ (2005) Disorganized rhythm and synchrony: early signs of autism and Rett
syndrome. Brain Dev 27:S25–S34
Trevarthen C, Malloch SN (2000) The dance of wellbeing: defining the musical therapeutic effect. Nord
J Music Ther 9(2):3–17
Trevarthen C (2005) First things first: infants make good use of the sympathetic rhythm of imitation,
without reason or language. J Child Psychother 31(1):91–113
Traum DT (1994) A computational theory of grounding in natural language conversation, Ph.D. Thesis.
The University of Rochester, Rochester, NY
Tulving E (1972) Episodic and semantic memory. In: Tulving E, Donaldson W (eds) Organisation of
memory. Academic, New York
Weizenbaum J (1966) ELIZA—a computer program for the study of natural language communication
between man and machine. Commun ACM 9(1):36–45
Weizenbaum J (1976) Computer power and human reason. WH Freeman, San Francisco
Wittgenstein L (1953) Philosophical investigations