AI & Soc (2007) 21:567–605
DOI 10.1007/s00146-007-0103-8

ORIGINAL PAPER

Entrainment and musicality in the human system interface

Satinder P. Gill

Received: 12 December 2006 / Accepted: 21 February 2007 / Published online: 25 May 2007
© Springer-Verlag London Limited 2007

S. P. Gill (✉)
School of Computing Science, Middlesex University, Ravensfield House, The Burroughs, Hendon, London NW4 4BT, UK
e-mail: spg12@cam.ac.uk

Abstract What constitutes our human capacity to engage and be in the same frame of mind as another human? How do we come to share a sense of what ‘looks good’ and what ‘makes sense’? How do we handle differences and come to coexist with them? How do we come to feel that we understand what someone else is experiencing? How are we able to walk in silence with someone familiar and be sharing a peaceful space? All of these aspects are part of human ‘interaction’. In designing interactive technologies designers have endeavoured to explicate, analyse and simulate our capacity for social adaptation. Their motivations are mixed and include the desires to improve efficiency, improve consumption, to connect people, to make it easier for people to work together, to improve education and learning. In these endeavours to explicate, analyse and simulate, there is a fundamental human capacity that is beyond technology and that facilitates these aspects of being, feeling and thinking with others. That capacity, we suggest, is human entrainment. This is our ability to coordinate the timing of our behaviours and rhythmically synchronise our attentional resources. Expressed within the movements of our bodies and voices, it has a quality that is akin to music. In this paper, disparate domains of research such as pragmatics, social psychology, behaviourism, cognitive science, computational linguistics and gesture are brought together, and considered in light of the developments in interactive technology, in order to shape a conceptual framework for understanding entrainment in everyday human interaction.

Introduction

The last 20 years have seen a transition in the ways in which designers think about human cognition, communication and the role of the body in their conception of knowledge [from residing in an autonomous individual to being distributed across individuals (e.g. distributed cognition, Hutchins 1995)], intelligence (from individual to social) and being human (from being a black box to being multi-modal and multi-sensory, Sha 2002). This transition is a fundamental shift from a focus on cognition as disembodied to being embodied (Andersen 2003), evident in the definitions and inclusions of the social (embodied cognition, gesture) and emotion, in the field of cognitive science and designs of interactive AI, Art and Robotics technologies. Part of this shift has been motivated by the engagement of the artistic and performance narrative domains that have reflected on the nature of public, home and entertainment spaces by using technology to explore and extend our engagement spaces (e.g. Topological Media Lab,¹ Sha 2005). Art demands a consideration of our senses and the aesthetic, and performance arts demand a consideration of the communication environments and performance structures (e.g. mixed media dance). Technology’s appropriation by the narrative domains has created possibilities of analogue and digital symbiotics [e.g. Sponge, TGarden, TG2001, http://sponge.org/projects/m3_tg_intro.html; Infomus Lab² (Camurri)] beyond the boundaries of the typewriter and TV screen metaphor that most of us work with and communicate through.

¹ http://topologicalmedialab.net/joomla/main/index.php
² http://www.infomus.dist.unige.it/

There is one fundamental conception that frames the design of human-machine interaction and that is the information theoretic signal transmission model (Shannon and Weaver 1949), which involves a speaker as being distinct from a listener at any moment in time, and where information is passed from speaker and received by listener. This is a linear model of feedback. The limitations of the conception and implications for human–machine symbiosis will be explored in this paper, drawing on a theory of Body Moves and the concept of entrainment. Entrainment occurs when two oscillators come to oscillate together. In human interaction, entrainment is the coordinating of the timing of our behaviours and the synchronising of our attentional resources (Clayton et al. 2005; Cross 2007). Body Moves serve this function (Gill et al. 2000). Body Moves are periodic rhythmic synchronisations of the movements and modulations of our bodies and voices as we engage with each other.

Coordination and entrainment

In our everyday interactions with others, one of the least conscious yet most powerful ways in which we engage is with the movements of our bodies and voices. As we know, musical movement is not the same as displacement in physical space, for example: ‘vocal gesture’. The complexity of these coordinations of body and voice is quite astonishing and we are, in most cases, able to adapt to our individual differences. We only become consciously aware of how we move and sound when the coordination is not working well or when it is unwanted. A friend of mine recounted an experience of trying to communicate with a person who, she was quite sure, must have been a very nice person, but because they could not get the coordination of their speech and motion patterns quite right, she gave up trying to do so.
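The definition of entrainment given above, two oscillators coming to oscillate together, can be illustrated with a toy numerical model. The sketch below is not part of the framework developed in this paper: it assumes a pair of mutually coupled phase oscillators in the style of the Kuramoto model, and the step rates, coupling values and the function name `phase_drift` are hypothetical choices made only for the example.

```python
import math


def phase_drift(freq_a_hz, freq_b_hz, coupling, duration_s=60.0, dt=0.001):
    """Return the average drift (rad/s) of the phase difference between two
    mutually coupled oscillators over the second half of the simulation.

    A drift near zero means the oscillators have locked, i.e. entrained.
    """
    omega_a = 2.0 * math.pi * freq_a_hz  # natural angular frequencies
    omega_b = 2.0 * math.pi * freq_b_hz
    theta_a, theta_b = 0.0, 1.0          # arbitrary starting phases (rad)

    steps = int(round(duration_s / dt))
    mid = steps // 2
    diff_mid = 0.0
    for i in range(steps):
        if i == mid:
            # Remember the phase difference at the halfway point.
            diff_mid = theta_b - theta_a
        # Each oscillator is pulled towards the other's phase (Euler step).
        pull = math.sin(theta_b - theta_a)
        theta_a += dt * (omega_a + 0.5 * coupling * pull)
        theta_b += dt * (omega_b - 0.5 * coupling * pull)

    elapsed = (steps - mid) * dt
    return ((theta_b - theta_a) - diff_mid) / elapsed


if __name__ == "__main__":
    # Two step rates of 1.8 and 2.0 steps per second (assumed values).
    for k in (0.0, 2.0):
        print(f"coupling={k}: drift={phase_drift(1.8, 2.0, k):.4f} rad/s")
```

With zero coupling the phase difference keeps drifting at the difference between the two natural frequencies. Once the coupling exceeds that difference, the drift falls to roughly zero: the two oscillators keep a fixed phase relation while each retains its own natural frequency, which is one simple way to picture walking in step.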
One case of unwanted coordination that we can all relate to is the experience of walking down a street and finding yourself in step with a total stranger, which creates a sense of closeness that feels awkward. You feel conscious of your footsteps being in a ‘togetherness’ with this person’s, and try to get out of step. Another scenario is when you are walking with a friend and suddenly become aware that you are not in step, and try to get in step in order to be ‘with’ that person. I will later in this paper refer to this particular moment of togetherness as ‘parallel coordinated movement’.

In much work on human coordination and cooperation (e.g. conversation analysis, Clark 1996; Schiffrin 1987; Coupland 1999), the focus is on speech and on our capacity for sequential turn taking in conversations and joint tasks. Much of this focus is rooted in a conception of information as signal, where information is passed from one autonomous person to another autonomous person in their utterances, and interpreted and responded to, etc. In other words, there is continuous feedback of information transmission. In this picture, it becomes possible to describe human knowledge as information that is transmitted, stored, processed and recalled (orthodox cognitivism, computationalism: Fodor 1976, 1981; Pylyshyn 1984; Simon 1969, 1982, 1983). The body is also understood as part of that communication process and operates according to the same principles, as seen in work on gesture and semiotics (e.g. Ekman and Friesen 1972, on illustrators and emblems). This conception of human knowledge leads to the design of methods to elicit, represent and simulate it for the design of technologies that fit this picture and serve, in turn, to reinforce it (e.g. knowledge based/expert systems; cognitive engineering, Roth et al. 2001; multi-modal agent interfaces, Cassell et al. 2000).
However, our walking down the street with a friend, talking with a stranger and trying to find common ground, involve the modulation, pitch, tempo, pulse and rhythms of our voices and our bodies in a kind of coordinated autonomy. These aspects are always there when we are engaging with each other or just being copresent in physical space, and cannot simply be explained as having sequential conversational turn-taking structures. This engagement has a complexity of sensing and knowing in experienced time, and we will explore this as a kind of musicality or dance, and as a-linguistic. At one time I considered this as paralinguistic but I am no longer sure that this is the common frame underlying rhythmic synchronisation (Kendon 1970, 1972; Condon and Ogston 1966) and find that ‘a-linguistic’ may be the appropriate ascription. Support for this lies in work on mother–baby interactions that have been described as rhythmically synchronised and musical (Trevarthen 2005), and where it is proposed that such rhythmical synchrony is important for companionship, intersubjectivity and empathy.

A key feature of our engagement with each other is our capacity for entrainment (Large and Jones 1999; Clayton et al. 2005) or collective action/performance, as when we walk in step with each other. In collective action there is no differentiation of speaker and listener; rather, we are both at the same time. This is the case with music performance where in the making of the collective sound the musicians are both performers and listeners simultaneously (Cross 2007). Such collective action is considered as being critical for the processes of grounding attention and meaning in the formation of shared knowledge or knowledge transformation in human interaction (Gill 2004). One particular phenomenon that embodies this process has been termed ‘Body Moves’ (Gill 2000).
These are the rhythmic and pitch movements of body and sound that operate to ground attention and meaning in knowledge formation, and enable empathy and intersubjectivity. They are part of the tacit dimension of human knowing and raise questions about the information transmission model of sender and receiver. This model of information transmission is a linear model of information being passed from one entity to another. This linearity of sending and receiving information underlies the sequential model of speaker and listener in turn-taking in linguistics. It influenced the conceptual articulation of the early work on ‘Body Moves’ as these are composites of body and speech performance. I was working with linguists (at NTT Communication Science Labs near Tokyo, and the Media Lab at ATR near Kyoto) and had a background in discourse analysis, hence it is not surprising that I found myself trying to find forms of expression that could allow for both the bodily language and its accompanying speech. The dominance of the sequential signal information-processing model, however, caused some difficulty in handling one particular kind of movement that did not fit this model. This was termed the Parallel Coordinated Move (Gill 2002), and understanding it has caused me to question the linguistic characterisation of the other Body Moves. These are primarily periodic and rhythmic synchronisations that entrain the movements of our bodies and voices during communication. They are joint rhythms, and they take two forms and serve two functions. The first form synchronises our joint attention to understand how and where we are in relation to each other. The second form of rhythm synchronises our manifest intentions and joint attention such that the movements of body and voice across both persons are performed as one coherent collective movement, even though what is being expressed are different ideas (i.e. this is not about imitation).
I call this ‘Parallel Coordinated Movement’ (Gill 2002) and have argued that this is critical for intersubjectivity (Gill 2004; Polanyi 1966).

Body Moves are forms of entrainment, of coordinated autonomy. Autonomy assumes that an individual is a self-sufficient system and can exist without others. However, human survival depends on one’s existence with others. The concept of ‘coordinated autonomy’ better describes our individuality as existing within sociality. In the phenomenon of Body Moves, one form of rhythmic synchrony serves to sustain our commitment to engage with each other, and the simultaneous form serves to transform our states of tacit knowing such that we are able to arrive at agreements and achieve topic shifts. The latter form cannot be arrived at without the existence of the former and emerges from it. Both of these forms of Body Moves are moments of empathic connection. Support for this empathic connection is found in research on music and affect undertaken by Joel Swaine, in his work on ‘Rhythm in Vocal Attending’ (ongoing Ph.D. Dissertation, Cambridge).

The demonstration of this function of the Body Moves, and of their existence, was undertaken in 2001 in an experiment in the Stanford Interactive Workspaces Lab. The larger goal of the experiment was to explore the role of the non-verbal in collaborative activity and how the use of ‘collaborative tools’ could afford this. The experiment revealed the intersubjective and entraining function of Body Moves. In the experiment, I asked subjects to collaborate in a joint sketching task where they had to engage at a large computer-based surface (an electronic whiteboard) that only permitted one person to touch the surface at any time, in sequences. Imagine using two mice on your desktop: the cursors will join up, giving a messy effect on the screen. I expected the constraints of sequential behaviour would deny them the possibility to achieve Body Moves.
And indeed, all the pairs of collaborators found their rhythmic synchronous coordinations to be disrupted, and found that this affected their commitment to communicate and be with each other in joint action (Gill and Borchers 2004).

My present collaboration with musicians (Centre for Music and Science, University of Cambridge) is providing me with a foundation for understanding the temporal dimension of experiencing within body and voice rhythmic periodicity. It is important to explain that Body Moves occur at particular moments in the communication process, and are emergent from the other co-regulated movements that are occurring. What distinguishes them is a different temporal structure that is akin to qualities in music. It is for this reason that the last section of this paper is on musicality of human interaction. This paper is going to travel through these ideas by recounting a journey through the technologies with their communication and computational paradigms, and I am going to arrive at my interest in body movement and gesture in the context of my interest in the tacit dimension of knowledge.

Expert knowledge and the expert system

The research on the tacit dimension of knowledge begins with the concept of the expert or expertise and the limitations of representing human skill in propositional forms. In 1987, I became involved with an interesting group of researchers (Bo Goranzon and Ingela Josefson) from the Swedish Centre for Working Life in Stockholm, who provided me with the opportunity for an apprenticeship into the concept of the tacit dimension. As part of my learning, I reflected on the concept of skill, the meaning of information, the role of imagination and reflection, and the use of language to express oneself.
The theoretical training involved my spending time at the University of Bergen working with Kjell Joannessen and Tore Nordenstam in the school of Wittgensteinian Philosophy, where I was introduced to Philosophical Investigations (Wittgenstein 1953) and Polanyi’s (1966) ‘The Tacit Dimension’ as being essential to hermeneutics. Much of this was new to me and enlightening. My practical training involved visiting various researchers in Europe and bringing them together in an international workshop on a Humanistic Approach to Technology and Design at a major European conference called Language, Skill and Artificial Intelligence (Stockholm 1988). This conference was significant in Scandinavia as it reflected on a culmination of almost 10 years of the use of expert systems in organisations within both the public and private sector. My project in that year was to undertake a cultural comparative study of the concept of design between Britain and Scandinavia and edit a collection of works on ‘Knowledge, Skill and AI’ (Goranzon and Josefson 1988).

What did I learn from all this? I learnt that there is a deep relationship between that which is explicit and that which carries and embodies it, namely the tacit. I learnt that the patterns of negotiation and organisational decision making are rooted in cultural practice and cannot be considered outside of this performance of tacit knowing that is socially embodied. Likewise, how people understand the meaning of the words ‘participate’, ‘cooperate’ and ‘dialogue’ is also located within cultural practices. The Italian colleagues could not accept the use of the word ‘dialogue’ to denote any lower act than the communion with God; for them, any other form of engagement is ‘communication’. I learnt that the British pragmatic culture was more comfortable with the idea of cooperation, whereas the Swedish democratic culture was more comfortable with the idea of participation.
In the former, decisions can be altered relatively quickly if circumstances change, and in the latter, decisions undergo the rigor of the democratic process of participation for any alteration to be considered. Decision-making itself is a culturally rooted communication process.

In particular, I was struck by a Swedish case study undertaken by my mentor, Goranzon (1993), of mathematicians in the Forestry industry. They had worked with systems designers to build expert systems to assist them in their daily calculations. However, over a period of years, they found themselves on occasion doubting their own judgement relative to that of the computer, which eroded their confidence in their mathematical skills. This story has always troubled me. If we know what information is being placed into a technology, and we know the limitations of what that can achieve, and we are ourselves highly skilled, why when we engage with this representation of our knowledge do we lose our confidence to judge? I decided that I needed to investigate the relationship between the tacit and the explicit dimensions of human knowledge and human cognition. And I began with the example of the expert system.

Expert systems and experiential knowing

Expert systems designers of the old school of cognitive science and information systems (often referred to as ‘strong AI’) believed that it was possible to represent someone’s skill as explicit representations and explicit processes. This assumed that the tacit was merely the unformed explicit. Two case studies in the beginning of my Ph.D. were to have a fundamental impact on the way I reflected on the tacit dimension in relation to the explicit within the expression or performance of expertise. Most important was my realisation that the key to the tacit was ‘experiential knowing and the imagination’ and that the ‘experiencing’ was outside of linguistic patterns.
The first case study was about creating a knowledge base for consultancy practice. As a result of my presentation of a paper, ‘Knowledge and Skill Transfer through Expert Systems: British and Scandinavian Traditions’, at a British AI conference (Expert Systems, Gill 1988), I was invited by the chairman of the conference session to join a workshop on consultancy organised by his consultancy company for a week in the country as they underwent a corporate identity transformation. The company was a large multinational. At that time its members did not have a corporate concept of ‘consultant’ and performed in their jobs as ‘experts’ in their domains. However, to keep up with the times and be competitive, the company decided to develop the concept of consultancy and reconstruct its identity as a consultancy company. This necessitated asking its highly skilled experts to articulate themselves as consultants, something they were unfamiliar with seeing themselves as. My role was to ‘elicit’ the tacit dimension of their knowledge formation as they underwent this process. The outcome of this was to develop an interactive and intelligent multi-modal knowledge based system for training experts to become consultants. It was not possible to video or audio tape any of the interactions due to sensitivity and confidentiality. During my week with these experts from all domains of work (sales, engineering, marketing, top management, computing, etc.), I experienced many forms of expression of practice and experience being used to build the identity of ‘consultancy’, such as role play, cartoons, metaphors and video film. My task was to record and provide an analysis of the tacit knowledge of consultancy practice. These days, the processes of apprenticeship and learning through practice may seem alien to many researchers.
However, under my Scandinavian mentors, I learnt not to follow conventions of research methods but to develop my own, which is what I did and continued to do throughout my research.

To illustrate some findings about the tacit-explicit and experiential knowledge processes, I will give one example of ‘consultancy’ performance that took place during the week mentioned above. A group of four upper-middle and senior management practitioners (experts in their fields) gave a presentation of about 20 min each. They were seated around a table and were provided with an overhead projector if they needed one. Each was to present themselves to others as a consultant and talk about what a consultant is, and sell the idea of consultancy to the others. The group was chaired by another senior manager who sat at the table with them.

The first ‘consultant’, dressed in a suit and tie, stood up and presented a ‘tool kit’ of consultancy, which essentially consisted of a list of propositional statements: descriptors, definitions and rules. This consultant had to stop giving his presentation after a few minutes, saying he had lost the thread of information, i.e. the connection between himself and the information he was presenting.

The second ‘consultant’, dressed more casually but smart, spoke of how consultants ‘pull rabbits out of hats’ and presented hand drawn overheads of a rabbit being pulled out of a hat and one visual with the word ‘magic’ in large letters. His forms of expression disturbed his ‘clients’, who accused him of mocking their profession and expertise, portraying them as insincere or dishonest, and made them unreceptive to his ‘content’.

The third consultant, also dressed casually but smart, also stood up. He spoke about rules or conduct, and emphasised the good things a consultant does. His handwritten overheads were measured and consistently paced. He was perceived as sincere and they felt he understood them and supported them.
All three of these ‘consultants’ had stood and presented overheads, with the first dressed in suit and tie and the next two dressed in more casual but smart clothes.

The fourth consultant sat on the other side of the table to the other three, facing them, and began telling them a confidential story of some political rumblings at the top of their corporation. This consultant was very high up in the organisational structure, hence he had an authentic voice on these matters. The others became troubled and deeply involved in unravelling the story, trying to find out as much as they could and work out the nature of the problem. After 20 min, this fourth consultant broke the illusion of reality, and told them it was all a story. This was a disorienting experience for the others and they were very impressed by what he had done with them. It was of great interest for me, for this consultant had fully engaged them in the performance of practical knowledge, where their experiential knowing was immersed with each other’s. It was powerful acting (or it was acting from power) with audience co-performance.

The second consultant, who had offended the others’ moral well-being, had in fact given a sound presentation at the level of content. The chairman of the consultancy workshop called me a few days after this week was over, and showed me the copy of the overheads by this consultant, pointing out that there was actually nothing offensive or wrong in what he was saying. The problem had lain in how he had presented the content and how he was perceived as a person. This took me back to my premise on culture and communication, that the performance of the forms of our expression influences the perception and meaning of the content the forms carry. The third consultant provided the feeling of safety and comfort in his use of moral and ethical forms of expression and a calm, paced voice. He was described as genuine.
In all these performances, the posture, position and clothing of the performers in relation to their ‘clients’ set the stage. If the forms of expression were not embodied, the performance failed, and if the forms of expression did not meet the perceptions of self of the client, there was breakdown in the communication process. As I reflected on this case study, I was reminded that I needed to study the tacit dimension as a process within dialogue and not outside of it.

This was reinforced by my second case study, which was an interview applying ideas about how to engage with ‘eliciting’ the tacit through and within dialogue, where dialogue is the method and the observation. I was invited in by some researchers who were developing a database for underwriters that could process applications for life insurance policies. The work on the database was becoming cumbersome and the processing of all the possible data input categories was creating bottlenecks. I requested to be alone with the ‘experts’, and had the opportunity to sit with a senior underwriter and a junior underwriter. They had brought a set of application forms with them and were curious about my presence. I told them that I was not there to extract any information but wanted to learn about what they do and I spoke a bit about my interest in the tacit and experiential dimension of human knowledge and skill. They began to chat about their skill and began to go through each form, thinking aloud in order to explain to me what they do. The session emerged as a senior expert teaching and imparting his skill to the junior (and to me). As they worked through the information on the forms they built up pictures of the people they had in front of them and imagined their past and future lives. On the basis of this they formed judgements as to whether this was someone who could or could not qualify for a certain type of life insurance policy.
The experiential knowledge and imagination of the senior underwriter was made available to the junior who could then follow and work with him to understand the personality and life-style of their applicants. It was clear that there was no one salient procedure of data processing as each person (each applicant) had a different picture of salient information, and it could be problematic to assume predefined categories with predefined processing rules that are rooted outside the meaning of ‘relevant’ data, i.e. outside how it is meaningful to the underwriter in building a picture of a person. There are two problems and they relate to the idea of not being able to see the wood for the trees. If one functions at the levels of data and procedures, then one builds composites, but these composites may not form a wholeness; instead they may simply remain a collection of parts. It is the human who can make the wholeness by applying experiential knowing and imagination, but the skill of achieving that may become lost if the system automates the expert’s creation of the applicant into the composition of parts. There are undoubtedly corporate factors (around risk and profit) which shape the imaginative construction of the client, but these are not the foci of the analysis. This experience took me back to the early conversation with my Scandinavian mentors about what role imagination and experience play in human knowledge.

These two studies were undertaken during my time in a school of computer science. I realised that the limitations of the computational paradigm in relation to the tacit and experiential dimension of human knowing lay in the necessity of representation. Computation necessitates that human knowing is made explicit as ‘content’, ‘context’ and ‘procedure’.
What I was seeking to do was to understand human knowing as the inter-relations of the tacit and explicit within the performance of content, context and procedure, and this was not a matter of representation, but of communication. Hence the next part of my story moves from representation to communication.

Mediation and distributed expertise

I decided to apply the analysis of the relation between the tacit and explicit dimensions of knowing to the analysis of conversation. My case study was of the meeting discussions of a design team who were creating an audio–visual communications infrastructure in their corporate building. This involved participatory observation, which included making video and audio recordings, as well as conducting informal interviews, and inhabiting the space as an affiliated researcher.

I found (Gill 1995) that the success and failure of the transfer of knowledge in communication consist in the relationship between the tacit and explicit dimensions of knowledge. This arose from critiquing the idea, prevalent in computation and the information transmission model, that it is possible to represent ‘expertise’ or ‘knowledge’ in a database such that the ‘information’ can be coded and logically processed. As we have discussed above, this frame assumes that ‘data’ or ‘information’ can be taken from its living context without losing its meaning, as its meaning becomes redefined within a system of rules and abstract differences. Hence, I decided to study what constitutes a ‘piece’ of ‘information’ within the conversations of the audio–visual design team. At the time, two researchers based in the organisation were analysing design meetings in their research, and in their analyses they would list salient types of information that occurred within these meetings. One such category of information was ‘topics’.
I decided to identify topics that were raised in the conversations of ‘my’ design team, and treated them as ‘information’. In analysing the nature of this information, I traced the path of each topic and found that where a topic began was where there was a discrepancy in knowledge amongst the conversants. The end of each discrepancy coincided with a new piece of information being raised, i.e. a new topic, indicated by a move for a topic shift. At that moment, the discrepancy had been resolved or at least a consensus was reached such that the conversation could move forward.

Quite unexpectedly, the quest to understand the relationship between the tacit and the explicit required the analysis of the nature of discrepancies in communication and the means by which we can accommodate to each other’s differences, either via third party help or between ourselves. The participants needed to find some mutual ground, involving their self, and this is where the operation of mediation was critical.

The design team consisted of five persons, all with different skills or knowledge domains, and each person was chosen by the Director to ensure they represented the salient elements necessary for successful design of this technology. The topics that I identified within the conversations involved discrepancies of very differing structures. One example of a gap in knowledge between two people communicating is where one is talking from within his area of expertise and the other is trying to engage with him outside of his expertise. This typically necessitates some third party person who can bridge the gap. This scenario of mediation comes most easily to mind when thinking of what mediation means. In the case of the design team, a mediator was able to make the ‘expert’ understand that in fact he had not understood the nature of the design problem that he was supposed to be responsible for, and this is what lay behind the gap, not the lack of knowledge of the non-expert.
Another scenario, which is not so clear but performs a similar function, is where many people are recalling a design setting, e.g. ‘I cannot see a very clear view through that camera lens’, ‘that area looks very dark’, ‘maybe the light is on too low’, etc. As their experiences pour in, and some of them contradict a statement they had made just a while back, one of them says something pertinent such that the ‘expert’ in this chattering of recalled autobiographies suddenly realises what the problem is, and solves the matter of the camera lens. In both scenarios, the ‘expert’ cannot see what the design problem is. It is someone else, whom I term the ‘mediator’, who provides the key and does so with the precision of the appropriateness of their utterance at the right time and in the right style and with the right role.

There are three points to be made here. The first is the dismissal of the concept of an autonomous expert, the second is the existence of distributed expertise, and the third is the relation between mediation and knowledge transformation. The mediator enables resolution in discrepancy and consensus in knowledge by being empathic with the critical discrepancies. Empathy is here defined as the compatibility to generate shared understanding with respect to a particular combination of compatibilities such as role, level of knowledge, forms of expression, personality, etc.3 Empathy is necessarily personal and involves emotion, and is therefore the ability to share or generate understanding of knowledge (which is necessarily personal, and can be propositional), role and personality.

The explicit and the tacit within this picture of mediated communication are considered as being the expression and its background of meaning. When this is unsuccessful in being communicated, mediation is needed to provide the bridge for the particular discrepant aspects of the tacit and explicit dimensions to meet.
This is achieved by making the tacit nature of the discrepancy explicit to both participants such that they both understand the background to their discrepancy. Once they become aware, it is possible for them to begin to resolve it. Hence, the success or failure of knowledge transfer is dependent upon the relations between kinds of knowledge (content) and the processes which influence its transfer, i.e. discourse dynamics.

The categories of knowledge (content) are propositional, experiential, personal, knowledge by familiarity and practical. Propositional knowledge is expert (domain) knowledge, or knowledge which can be expressed in the form of rules, made explicit, and is non-personal and non-experiential. Experiential knowledge is that which comes from one’s own direct experience of the knowledge one expresses, or it is cultural/social knowledge, or it is knowledge of another’s experience, or of an event. Experiential knowledge consists of personal knowledge. Personal knowledge is that of the individual personality, expressed as values, beliefs, emotions. Experiential knowledge is the relating of one’s experience. This may be either direct experience, which is indicated by the use of ‘I’, ‘we’, etc., or general knowledge (e.g. knowledge about a culture), or generic knowledge4 (a frequent experience: ‘whenever I do...’), or episodic knowledge5 (a specific experience: ‘the other day I was...’). Knowledge by familiarity is about knowing ‘when’ to act. Practical knowledge is the skilled communicative performance itself. It can be inferred but not made explicit: decisions, judgements and analyses indicate (point to) practical knowledge but do not represent it.

Through dialogue, participants acquire knowledge or fail to do so, and other possible outcomes are changes in group knowledge, achieving dynamically stable knowledge6 and building trust. Knowledge acquisition is successful when the communication between a speaker and a listener is consensual and compatible (at this stage in my research, I was still following the model of speaker and listener). Knowledge acquisition fails where no compatibility in communication can be established between speaker and listener. Group knowledge denotes a new level of group consensual knowledge, indicated by a qualitative difference over time, e.g. from the beginning of the conversation, in this case a design meeting, to the end. Trust is both an aspect of group dynamics and a possible outcome of dialogue.

3 It is akin to aesthetic emotion, e.g. our resonation to the structures, textures, forms and colours of a painting, as well as the theme presented by them. By empathy, I do not mean sympathy.
4 This is based on the idea of generic structures in memory, which summarise similar events, cf. Barsalou (1988).
5 This is as in episodic memory, cf. Tulving (1972).

The system of interaction between content (knowledge) and processes (discourse dynamics) in the communication is orthogonal. This is evident in the earlier two case studies of the performance of consultancy practice, and of how underwriters make sense of the data to imagine a person’s life. Knowledge is embodied in discourse dynamics, such as goals, forms of expression and group dynamics. The construction of knowledge categories for the analysis of communication drew upon a range of discussions and research on tacit knowledge and human skill, particularly in relationship to technology and its application, from human-centred systems (Cooley 1987; Gill 1996). The categorisation of knowledge dimensions also drew upon work in autobiographical memory research (Bekerian and Dennett 1990).

To return to mediation, the conventional model assigns a person to have that role in dispute situations or in a meeting (e.g. chairperson). Hence, the mediation process involves the mediator making a number of interventions. Sometimes they may not succeed and sometimes they do.
Where they do not, and other interventions disturb the communication rather than ‘enable’ it, this may be perceived as a problem. However, in the design meeting referred to above, the constant intervention, sometimes with information irrelevant to the particular problem (rhetorical), by various negotiators or participants functioned to sustain the dialogue and clear up the noise in order for the person who would be the mediator to act as such. In fact, in a period of 5 min, there were three topics, each consisting of different kinds of discrepancies in the knowledge of the participants and each with different persons in different roles performing as mediators for different experts.

The study drew three basic requirements for a person to be a successful negotiator or mediator:

(1) Understanding the other; understanding the situation of discrepancy between two (or more) parties;
(2) Knowledge of the gap between oneself and the other; knowledge of the gap between parties;
(3) Ability to express this understanding and knowledge to other or others, i.e. produce the bridge.

The first is a necessary condition, and this has to function in conjunction with the next two. It is not sufficient to understand the nature of the discrepancy nor to have the key knowledge; one needs to be able to convey this in a form that others can ‘receive’ (the receiver model was prevalent for me at this time). Personality also played a role, as this influences the perceptions people have of each other and affects their ‘reception’ of information imparted. Being aware of or understanding the other requires an understanding of how the other perceives you.

6 This denotes an individual’s ability to have acquired the knowledge such that they can use it in a sustainable manner; a kind of psychological state whereby someone can maintain their performance of the knowledge over time.
The analysis of the tacit and explicit dimensions of communication for knowledge transformation clarified that embodiment of forms of expression was critical to sustaining one’s own performance and engaging in co-performance with others (what I termed above as dynamically stable knowledge). It took the analysis from the consultancy study further by identifying the various kinds of knowing that make for the tacit dimension in communication. It showed that the explicit is never meaningful outside of experience, and that a critical factor in the formation of knowledge in human communication is the process of mediation.

One aspect within this communication was missing from the analysis, namely, the human body. Although all the initial analysis was made from video recordings, the final analysis was made primarily from speech performance. This is why the speaker–listener model of communication is prominent at this early stage of work on the relation between tacit and explicit dimensions of knowledge in communication. A reconsideration of this began when I moved from the conversation setting to a setting involving mediation of communication using materials, where the body and co-presence in the same physical space became important for developing an understanding of the nature of experiential knowledge. The setting, which was of landscape architectural practices, extended my interest in the tacit dimension to analyse the aesthetic dimension in human interaction. The hermeneutic tradition within which I began my research emphasised that the tacit had two arms, aesthetics and ethics, neither of which is reducible to abstract propositions if it is to be meaningful in human conduct. The place of empathy in conversation was defined as being embodied in the mediation process (see Fig. 1), and empathy was described in terms of aesthetic emotion arising with the resonance of communication structures.
This conceptualisation of empathy is later extended with the work on the body and tacit knowing in human interaction. The methodology for understanding the tacit has so far drawn on dialogue, discourse analysis, ethnomethodology and participatory observation, and now extends to include ethnography.

Fig. 1 Framework [diagram labels: Knowledge Acquisition: Knowledge transfer and formation; Design; Dialogue; Knowledge; Influences; Discourse Dynamics; Forms of expression; Group dynamics; Style; Outcomes; Knowledge acquisition; Failure at knowledge acquisition; Group knowledge; Dynamically stable knowledge; Trust; empathy]

Aesthetics, experiential knowing and the body

In the winter of 1996/1997, Gordon, an apprentice landscape architect with company ‘BETA’, sent a set of completed coloured maps that he had made at the company’s Welsh office to John, a senior architect based at its headquarters located in North England. The company was going to make a bid for project work to reshape a major road in North Wales where the frequency of traffic accidents was high, and these coloured maps were part of the depiction of the changes to the road design and effects upon the landscape. For example, colours depicted old woodland and new woodland. To Gordon’s surprise, John judged the colours that he had used to be ‘wrong’ and said that the maps needed to be correctly re-coloured. Company BETA had barely 2 weeks left to submit their bid and re-colouring all these maps was no small task. John brought in other experienced landscape architects at his branch to help, and asked Gordon to travel up from Wales and re-colour the maps with them. It was felt that Gordon lacked experience and the only way he was going to get it was by experiencing the doing of colouring in a shared practice.

The problem of ‘seeing’ the colours was partly due to the company’s economic condition. BETA was downsizing, as a result of which Gordon was the sole landscape architect left at the Welsh branch.
Architects, however, do not interpret the material in isolation when they first handle it. In talking aloud and moving pens over paper, they engage the other person(s) in their conceiving. This, it is suggested, enables one person to adapt to another person’s view, producing the conditions for a coherent development of the design (Gill 1997), and a process for ‘seeing-as’ (interpretation) until they come to ‘see’ (unmediated understanding) (Tilghman 1988). This is likewise with colouring activity: as the apprentice colours with the team and more experienced architects, he/she learns how they select, for example, a specific shade of blue to set against a particular shade of green (‘seeing-as’) to create a ‘pleasing effect’ that ‘looks professional’ (Gill, op.cit).

Because of the distance between the two branches and because of their commitments, John had been unable to visit Gordon and work with him. Instead, he had sent him a set of previously coloured maps (examples of experience), colour coded keys and a set of instructions. These are descriptive and propositional forms of expression, all located in the experience of the architects at the North England branch. For Gordon, they are outside his experience, and he brings his own to bear in interpreting these fragmented representations of practice.

In his study of how a team of geochemists judge when material fibres in a reaction vat are jet black, Goodwin (1997) shows how simply saying ‘jet black’ is not sufficient for helping an apprentice measure and make this judgement competently. Rather, the ‘blackness of black’ is learnt through physically working with the fibre, and in talking about the experience, ‘transforming private sensations and hypotheses into public events that can be evaluated and confirmed by a more competent practitioner’. Geochemists use their bodies as ‘media that experience the material’ being worked with through a variety of modalities.
In the case of the apprentice, Gina, in Goodwin’s study, her interlocutor’s ability to recognise and evaluate the sensation she is talking about requires co-participation in the same activity. The example of Gordon’s ‘failure’ to correctly interpret the forms of expression sent to him is an example of how breakdown can take place when co-participation is missing from the interpretation process, and how essential it is for repair within a distributed apprenticeship setting. Knowledge becomes clearly more than a matter of applying learnt rules; it is a matter of learning ‘rule-following’ (Johannessen 1988) within the practices that constitute it. The need for him to colour with the other architects, in order to be able to correctly interpret any such future fragments that might be sent to him, shows that experiencing in co-presence carries powerful tacit information. Gordon’s acquired knowledge will be evident in his skilful performance of these forms of expression.

The equivalence in meaning of ‘forms of expression’ and ‘representations of practice’ denotes a range of human action, artifacts, objects and tools. Human action includes cues, which may be verbal, bodily, of interaction with a physical material world (tools, e.g. pens, light tables, etc.), and construction of the physical boundary objects (e.g. colour, maps, sketches, masterplan sketches, masterplans, plans, functional descriptive sketches, photographs, written documents, etc.).

The dilemma of this distributed setting is that even in the future, any interpreting or understanding that Gordon, as an apprentice, does of similar or different fragments of knowledge will still take place in isolation, and the feedback from his local colleagues will be based on their ‘seeing-as’ (Tilghman 1988) (interpretation based on their experience) and not ‘seeing’ (as they lack sufficient skill in this domain to ‘understand’ without interpreting).
I asked John whether, supposing it were possible for his team and Gordon to colour maps together in a distributed setting with the help of some hypothetical computer mediated technology, he would be interested in exploring this possibility. John declared that this was not a matter for technology, but quite simply that Gordon ‘lacks experience’ and that the only way he will acquire it is by colouring with them in the same space. His conviction made me reflect on what it means to share a space and be present, as a precondition to acquiring experience; experience that would have helped Gordon to interpret the examples of previously coloured maps for similar bids, colour keys and instructions, that had all been sent to aid him in understanding how to colour the maps.

Being present is a bodily experience, and involves all the human senses. In various cultures we draw upon various levels of our senses. For instance, the Maori rub noses in greeting each other, Russians kiss on the mouth, and in some Arab cultures, they bring their faces close enough to smell the breath of the other. All these acts are part of gauging one person’s sense of another, essential to building the trust that is required for committed engagement. Placing a glass pane between two people in any of these situations would block their tacit ability to interpret their relation to each other, and thereby comprehend each other’s meaning, through the impacts between their bodies, and would require them to focus on the visual and speech channels that have limited bandwidth for tacit knowing.

John was certain that once Gordon had this experience of colouring with him in the same physical space, he would have no trouble in the future in aligning his aesthetic ‘seeing-as’ with theirs when given such materials or representations (exemplars) to interpret and ‘see’, wherever he might be. Seeing-as requires interpretation, and Tilghman terms this ‘mediated understanding’.
Once you have the skill to see, you can understand without interpretation, and just perform. The tacit knowing that Gordon had acquired would be ‘retrieved and made active by sensing’ (Reiner et al. 2004) in his act of seeing. The role of mind and imagination is important for such retrieval in sensing that brings together past (memory), present and future. In such sensing, our minds draw upon our bodies: ‘wherever some process in our body gives rise to consciousness in us, our tacit knowing of the process will make sense of it in terms of an experience to which we are attending’ (Polanyi, op.cit. p. 15).

When the maps in this example were finally coloured, the final result was markedly different from those that Gordon had produced, and although this is not the key point, they did look aesthetically more pleasing. However, my impression was also one that had been shaped with others.

I now sought to investigate the aesthetics of ‘seeing’ by analysing the importance of the body for experiential knowing in shared physical space, and the constraints of disembodied communication upon distributed knowledge acquisition. During the next phase of my research, the coordinated synchronised movement of the prosodic qualities of the human body and voice became a focus of attention as part of the basic level of the tacit dimension of human knowing, the nature of which, it is proposed, enables intersubjectivity, knowledge acquisition and transformation. I developed an analysis of how experiencing the performances of representations of practice (e.g. gestures, deictics) and moving with these representations in a joint activity (such as colouring together, sketching together) consists in specific types of behavioural alignments between actors in an environment. I termed this Body Moves and have been developing the relationship between Body Moves and tacit knowing.
This is to better understand, conceptually and practically, how the nonverbal movement of body and voice facilitates knowledge transformation and knowledge acquisition, and specifically the quality of that movement, which turned out to be rhythmic and entraining. The methodology to understand the tacit dimension would now be further extended to include experiments and psycholinguistics, to analyse the pragmatics of body prosody.

Embodied synchrony and multi-modal systems

I will begin by relating the technology scenario in which rhythm and entrainment became visible. This next phase of research took me to Japan. At NTT Basic Research Labs (1997–1999) (now the Communication Science Labs) and ATR (1997) I became interested in the world of interactive multi-modal systems that used various styles of human-like objects. For example, there were ‘talking heads’ and ‘talking eyes’, where the body would be reduced to a range of features, from the most minimal (just two eyes on the screen) to the more complex, and would move with the prosody of the voice, including backchannels and fillers.

In the last few years, work in gesture has shown many features of gesture–speech coordination that are being fed into interactive agent technology design. Designs of life-like agents, as in the case of the MIT estate agent called REA7 (Real Estate Agent), are full bodied and gesture at the appropriate prosodic speech moments, but also follow patterns of gesture and speech cue timing that are being discovered in gesture research. In the case of REA a camera was used to passively sense the user. REA plays the role of a real estate salesperson who interacts with users to determine their needs, show them around virtual properties, and attempt to sell them a house. Cassell describes her Embodied Conversational Agent (ECA) as ‘a virtual human capable of interacting with humans using both language and nonverbal behavior’8.
She introduced the rule-governed, autonomous generation of non-verbal conversational behaviours in animated characters. In a similar vein to the work at NTT and ATR, the embodied conversational agent is capable of generating and ‘understanding’ both propositional components of speech and synchronised interactional components such as back-channel speech, gestures and facial expressions.

Work on gesture has shown that we begin a gesture movement prior to our speech action, and that our gestures have a structure called a ‘stroke’ (McNeill 1992; Kendon 2004) that is quite precise in its phases and works with speech content. A gesture may accompany a speech utterance and be its embodied representation. Work by former members of the Chicago school of gesture, which was headed by David McNeill and included Cassell and Kita, has investigated representational and conversational gestures and produced findings that are important for understanding the relation between hand and mind (McNeill 1992).

Other artistic approaches to gesture, such as in Tosa’s interactive art/media projects9, focus on the emotional sensation of movements of the body and voice in creating attachment and care. When I met her at ATR, Tosa had developed an emotion matrix that she used to develop her artworks of ‘neurobaby’, which moves and coordinates its non-verbal body and voice actions in response to the movements (prosody, pitch and modulation) of the sounds of our voices. In the case of ‘neurobaby’, I found myself experiencing anxiety when the baby (a collection of moving coloured lines of eyebrows, eyes and mouth, and an outlined face) burst out crying and I desperately tried to stop it by altering the pitch, rhythm, tone, phrasing and modulation of my voice.

7 http://www.media.mit.edu/gnl/projects/humanoid/index.html
8 http://www.soc.northwestern.edu/justine/jc_research.htm
The domain of embodied agent designs is a large one, and I wish to focus on one aspect, that of the representation of communication processes (procedures) and semantics (content) of communication acts. The representation of the communication process is based on sequential turn-taking structures and that of semantics is based on the affordances of computational structures. The turn-taking acts are preprogrammed responses to the feedback that the user provides (such as that of my voice when trying to calm the crying neurobaby), and the user has to be predictable in giving the feedback (otherwise the emotion structure of my voice would not be recognised). The timing of any turn is bounded by the computational constraints of processing the data input and responding to it. The complexity problem for computing interaction/conversation is that if one tries to represent the human interaction and replicate it, then this necessitates making human interaction explicit. The problem of representation is similar to the problem faced by the bottleneck of ‘implicit’ or ‘tacit’ knowledge for the expert system.

The interaction is definitely made more engaging when using human-like agents with programmed human voices, as these elicit emotional (sensory) and imaginative responses that enable the illusion of contact with the artificial. At a basic level, all these designs are a further development upon the Eliza system developed in the 1960s by Weizenbaum at MIT (1966, 1976), which asked the kinds of questions that people expected to hear from a therapist and gave the illusion of contact and understanding with the artificial. The multi-modal agent system extends interaction from speech (content and organisation of content) to include the body, gestures and modulation of the voice in eliciting a stronger embodied response; we now know from work on mirror neurons (Rizzolatti et al. 2001; Ferrari et al. 2003) and from work on music and motor neurons (Large and Jones 1999, on the dynamics of attending) that these trigger both kinds of neurons. What interactive technologies would need to be able to achieve, in order to perform as co-performers with us, is adaptability and grounded meaning in communicative situations. This is not computationally possible and it is not clear if this is desirable.

9 http://www.tosa.media.kyoto-u.ac.jp/

At the moment, feedback patterns are based on a variety of methods, one of which is to aggregate and average the types and frequencies of an action, e.g. of a gesture with a speech action, and another is to simulate the communication setting and model it. An example of simulation is given further below, where the task involves two people separated in different rooms, connected via a microphone, with one person trying to find out where to eat out in Tokyo on a particular night, and the other person searching a website to help them find an appropriate restaurant. The aim of simulations can be to discover a grounded ideal patterning of query and response by running through a large number of subjects who will entrain to the most prominent coordinated pattern (this is never predictable as it depends on the personalities and behaviours of the subjects) (e.g. Gill and Kawamori 2002). This grounded pattern is then taken as the normative case for computational purposes, and for human users to then engage with. Present work on learning algorithms seeks to free the computational constraint on adaptation.

At NTT Basic Research Labs, where I was part of the Cognitive Science and Dialogue Understanding Groups, my research projects on tacit knowing in dialogue, in addition to multi-modal communication processes, included the study of cross-cultural communication and perception, and the use of the Internet in Japanese education (e.g. the 100 Schools Project, Miyazawa et al. 1999).
The use of the Internet had a fundamental impact on teaching in Japanese schools, giving rise to the concept of the individual and representations of self in relation to others, including in cross-cultural communication. The relation of self to other in a distributed setting is different from that of being face-to-face, but I still needed to undertake a study that would provide more understanding of the limits of presence in the distributed setting that had proven so problematic for the junior architect to ‘see’. I decided, with my colleague, Kawamori, to analyse how multi-modality, particularly the body, operates in communication and to identify the contingencies of adaptation for grounding meaning in a non-face-to-face setting where people cannot see each other but can talk to each other through a microphone. This involved exploring the synchronisation of body and speech movements (prosody) in a non-face-to-face setting to further understand the functions of the body and voice for the self and for the other in grounding meaning.

In gesture research on face-to-face communication, coordinative gestures had already been identified (Bavelas 1994, 1995; Ekman and Friesen 1969) as functioning as a signal for the interlocutor. Ekman and Friesen had produced a semiotics of gesture, such as the ‘OK’ sign with the thumb, calling them illustrators and emblems, whilst Bavelas, working on interactive coordination, showed, for example, how we may gesture with a hand, pointing it to the person we are talking with, to indicate something we are referring to in relation to that person. In contrast, in the non-face-to-face setting, where speakers cannot see each other, there is no visual information that such gestures can carry. This implies that their function, if they occur, in this setting will differ from that of the face-to-face setting.
Alibali and Heath’s findings (1999) of reduced but still high levels of representational gestures (topic based) in non-face-to-face settings, and their proposal that these may have a function for the speaker which is independent of their function for the listener, support the idea that there may be a difference in the functions of gestures in the two settings. If coordinative gestures (not just the kinds cited above that are more clearly intended to be seen) require the presence of the interlocutor in a face-to-face setting, it is unlikely that they would occur in the non-face-to-face communicative setting. Bavelas (1994) had found that ‘interactive gestures’ do occur in non-face-to-face settings but that they are significantly reduced. As gestures still occur in this interaction, do they still presuppose the interlocutor, given that he or she cannot be seen, or is this not a necessary condition for the act?

In order to answer these questions we conducted two experiments (Gill and Kawamori 2002). The gestures that we focused upon, nodding gestures, were described as having a metacommunicative function, i.e. considered as signals for the interlocutor, carrying information, and thereby enabling coordination (at this period, the notion of signalled information transmission was still a dominant concept). We expected that if these gestures still occurred in the non-face-to-face setting, then the presupposition of the interlocutor is not a necessary condition, in which case we proposed that they embody an additional self-oriented function.

Of the two experiments, one consisted of British subjects and the other of Japanese subjects. The task in both cases was identical. The gesture that we focused on most was the head nod, which is a frequent gesture in Japanese communication, and we expected a cultural comparative analysis to identify common and essential coordinative characteristics of this gesture.
The experiment was of the communication of information between information provider and information seeker. Note the speaker–listener model of information. The task is a web searching task, where the web-searcher is the information provider, and the person seeking information is the information-seeker. The experiment takes place in a laboratory setting. Subjects are placed in separate soundproof rooms. Four video cameras are used, two in each room. These take a synchronised view of both subjects at two angles: of the upper torso, head and arm movement from a frontal view; and of hand movement on the desk from an overhead view. All participants knew in advance that they were being video-taped. The main topic is about eating out in the Tokyo area. Questions are asked by both subjects, for example, about the price range of the restaurants, the kind of food the subject wants to eat, directions on how to reach the place, and contact details. Each session lasts a few minutes.

Coding involved recording the timings of the picture frames of each nod, which person is nodding at the time, at which place in an utterance or turn the nod occurs, and whether the person nodding is speaking at the moment they nod or the other person is speaking. We also noted which utterances and nods occurred just prior to or just after the nod coded. This included backchannels, endings of turns or phrases, pauses, simultaneous nodding and prompts. Coding also involved functions, e.g. question, inform, suggest, propose, self-expression, emphasis, evaluate, request, listing, repair, confirm, topic shift, prompt, closure, initiate. This was to check for any relations between the gesture and the function of the speech as well as the category of the speech, for example, as filler or backchannel, etc. The frequency of nods per session, and for information-provider and information-seeker, were also coded.
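As an illustration only, the coding scheme just described can be thought of as one record per nod. The sketch below is a hypothetical representation: the field names, the example label values and the helper `nods_per_role` are assumptions made for this illustration, and are not the coding instrument actually used in the study.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    PROVIDER = "information-provider"
    SEEKER = "information-seeker"


@dataclass(frozen=True)
class NodRecord:
    """One coded nod (all field names are hypothetical)."""
    session: str           # session identifier
    start_frame: int       # picture frame at which the nod begins
    end_frame: int         # picture frame at which the nod ends
    nodder: Role           # which person is nodding
    nodder_speaking: bool  # True if the nodder is speaking while nodding
    position: str          # place in the utterance or turn, e.g. "phrase-end"
    function: str          # speech function, e.g. "inform", "confirm"
    category: str          # speech category, e.g. "backchannel", "filler", "none"


def nods_per_role(records):
    """Count nods per (session, role) pair."""
    return Counter((r.session, r.nodder) for r in records)


# Invented example records, for illustration only.
example = [
    NodRecord("s1", 100, 112, Role.PROVIDER, True, "word-middle", "inform", "none"),
    NodRecord("s1", 240, 251, Role.SEEKER, False, "phrase-end", "confirm", "backchannel"),
    NodRecord("s1", 300, 309, Role.PROVIDER, False, "pause", "prompt", "none"),
]
counts = nods_per_role(example)
```

Keeping the speaker state and the speech category as separate fields makes it possible to ask, for instance, how many nods occur in silence, without committing to any interpretation of what those nods mean.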
Other gestures among the British subjects were also coded, but we found the most frequent one in this communication setting to be the nod, and nods serve a coordinative function, which does not directly serve to engage the interlocutor by the action itself. Three significant findings emerged which indicated a self-oriented function of the gestures. First, nods do occur in silence without the accompaniment of speech in a backchannel-like action. Although nods also occur in conjunction with speech in backchannel speech acts, the silent nod questions the claim that gesture and speech form a composite whole in the backchannel where the gesture functions as a signal for engaging the listener. In the case of silent nodding, this claim cannot be made. Rather, the claim of a self-oriented function to this interactive but non-communicative silent nodding is more applicable. In this case, we propose that gestures that do occur with backchannel speech are not performing the same function as speech. Here begins a questioning of the signal as an appropriate ascriptor for the body. Second, we found that nods occur with intonation and emphasis, when describing something, during the utterance itself. This backed up work on synchrony and coordination undertaken by Kendon (1990). Nods also occur with repetition. The nodding action when describing something is rather like a rhythmic beat and tends to occur at the beginning or middle of the specific word for British subjects speaking in English, and at the end of the specific word for the Japanese subjects speaking in Japanese. We had expected to find cultural differences in the prosody of body and speech sounds in the different languages and cultures. For both groups of subjects, nodding can occur at the end of a phrase. It does not appear to perform any communicative function, but rather seems to be a necessary part of what we will term the internal autopoietic rhythm of the speaker.
The frequency and trajectory of the nods vary with the speech rhythm. Third, and surprisingly, given that the speakers cannot see each other, simultaneous nodding takes place. This would appear to suggest that the internal autopoietic rhythms of these two independent systems (speakers) are the same at that moment. The situations of this simultaneous gesturing vary. They can involve silent nods, or both information provider and information seeker speaking at the same time, either producing exactly the same utterance or different utterances, and can occur with simultaneous laughter. In the case of Japanese subjects, producing the same utterance is more likely to result in a simultaneous and same gesture (nod), than for the British subjects, who may produce differing gestures (for example, nod and body sway, or nod and a sideways head movement). The frequency patterns of the simultaneity for both experiments vary with differing situations. The simultaneity does not appear to be arbitrary and it is synchronised. We were not certain whether it is entrainment, as in the case of the Parallel Coordinated Body Move. These three cases of nodding would suggest that nodding entails an additional function to the communicative function (signal for the interlocutor), which is coordinative and interactive, and this is the self-orienting function essential for sustaining the interaction. The face-to-face theories do not provide us with an explanation for why such gestures take place in a non-face-to-face setting. These are not iconic gestures, i.e. they are not representational. They occur as a person is speaking in a rhythmic manner, or in silence whilst the other person is speaking, or during a pause, and in a simultaneous manner.
Hence the proposal for considering an autopoietic (Maturana and Varela 1980) internal mechanism of each person, which is coordinated by the feedback (but not information transmission) in the communication, provides a plausible theory to explore for the functionality of these gestures. In comparing our findings with Bavelas’ taxonomy of interactive gestures we found that nodding did occur when giving information, seeking, and in turn-taking in the non-face-to-face situation. Nodding occurs when seeking to check if the listener has understood and in turn the listener nods when confirming this. It occurs when releasing and taking turns. However, although there is an apparent similarity with the face-to-face situation, our findings suggest that the interactive function of gestures in human interaction necessarily involves both a self and other oriented dimension, which in the non-face-to-face condition is predominantly self rather than other oriented. It has been suggested (Alibali and Heath 1999) that even in the case of representational gestures, they may perform dual functions, for self and for other. Hence our proposal for a self and other function in the case of nodding and other coordinating gestures has some precedent in gesture research. Considering the idea of imaginative sociability (Fridlund 1991), it may be that feeling the presence of another is a necessary condition for performing metacommunicative (signalled conveyance of information about the communication) gestures. We could say that we perform certain gestures in the need to affirm the communicative situation to ourselves, thereby reinforcing this feeling of presence, and in this case, the expression ‘metacommunicative’ may not be suitable. This self-affirming may be a need on the part of the speaker as a self-referencing act. The autopoietic theory proposes that a system maintains its own defining organisation, and regenerates its components.
The self-referential gesturing actions may be part of the system’s (person’s) maintenance process. In the autopoietic vision of communication, this is described as orienting behaviour. The interactions orient the listener within their ‘cognitive domain’, and if these oriented domains are similar on the part of both participants, consensual orienting interactions are possible, as each becomes subservient to the maintenance of both. Consensual orienting interactions are coordinating interactions. What we term self-referencing may be what autopoietic theory terms orienting. Autopoiesis opposes the idea that communication involves the transmission of information, proposing instead that interaction serves to orient us within our ‘cognitive domain’. This would support the idea that the gestures in the non-face-to-face setting do not function for the interlocutor, but rather are a self-orienting mechanism, triggered off by the verbal feedback. As we orient towards similar cognitive domains, our gestures will be coordinated with our interaction. It would be expected, then, that some gestures will occur simultaneously as a result of being oriented in similar cognitive domains and not as a chance occurrence. It has been suggested (Alibali and Heath 1999) that gestures may ‘signal’ disfluencies (‘uhm’, ‘uuh’) in the face-to-face condition, whereas verbal fillers are greater in number in the non-face-to-face condition. Gesturing accompanies fillers in the non-face-to-face situation, for example in the form of nodding and body swaying, but the concept of signalling disfluencies to oneself does not seem to make sense in the non-face-to-face situation, as we have already found above. It may be more appropriate to consider the concept of self-regulation for synchronous co-regulation with the other.
The idea from autopoietic theory of orientation within cognitive domains, combined with the proposal of a dual functionality of coordinating gestures as both self-referential and self-affirming, provides a framework for the coordination of gestures in non-face-to-face communication. It also questions whether the signal model of information processing can account for inter-individual gestural and speech sound coordination, and it finds support in what we now know from neuroscientific work on mirror neurons (Ferrari et al. 2003) and motor neurons (Large and Jones 1999). At this point I would like to take a step back, and before embarking on the explanation and examples of Body Moves, reflect on another technological space for the engagement of the body and co-presence. This is the virtual space.

Virtual spaces

Whilst at the Stanford Centre for the Study of Language and Information (2000–2003), I ran a seminar series on Gesture and will give an example of engaging in a virtual space from one of these occasions. Di Paola (2001) and his former colleagues had developed and worked with an avatar-based 3D virtual community that emulates many natural social metaphors but has extended and adapted many of these metaphors as its very tight-knit community of users has evolved over the years. OnLive Traveler (see footnote 10) and its communities use voice based emotive head avatars in online 3D virtual environments of their own creation. The community has evolved a very specific gesture and expression language that uses voice and 3D space in a socially complex manner. The original design goals of creating this system were for a commercial start-up, and these goals succeeded and failed over the years, but ultimately the community members independently evolved expressively rich conventions of gesture, expression and emotional creation for their own personal inter-relationship needs and were running it as part of their daily lives.
In his demonstration of OnLive Traveler we had the chance to interact with the community members online and bring their thoughts of expression and gesture into our discussion. Being co-present in virtual space threw up some interesting issues, about how we project onto the virtual space and are able to engage with communication cues of proximity, of gestural nuances, of social performance cues of courtesy, offence, emotion, with people who can simulate these using the artificial theatre space. In interactive artwork this can become extended as we use more of our physical body (gesture, body motion and voice) to cause the effect in the virtual environment to create social and emotional contact within it with other persons and their represented agency (of textures, colours, shapes, sounds, touch sensations, smell).

10 http://www.dipaola.org/sig99/sld002.htm, http://www.dipaola.org/steve/vworlds.html

My research on how co-presence in virtual space can be mediated by representations of communication (agents, the written word, cartoons, etc.) was triggered by an experience recounted to me by a Japanese colleague who used on-line methods for teaching distributed groups of long-distance students. These students had never met face-to-face and she never met them. The on-line discussions were mainly functional and the information was expressed in a very narrow bandwidth of modality, namely the written word within the context of the study task set to the students. So there was no social discourse. After a year of teaching, she made a decision to shuffle the members of the groups around without consulting them, and was astonished at their emotional responses to this action. They expressed disorientation and felt she should at least have informed them of her intention and asked them about it.
Awareness of another presence over time, regardless of whether one can see the other or not, builds a feeling of sharing or inhabiting a space together, of being co-present. The Japanese colleague did not recount the details of the study activities of the student groups in the online space. However, there is no doubt that the awareness over time must have involved forms of expression, style of conveying expressions and timing of expressions, that gave form for identities or personalities to be formed. Just the timing itself of another person’s actions can give one a sense of the other in relation to oneself. Now I would like to give another example of distributed communication, this time with video-conferencing. A colleague of mine from my group in NTT was sent to work in the International division based in Tokyo, where he was involved in numerous video-conference based meetings with clients and colleagues in the USA. My colleague is one of those special people who you know understands what you are saying and can finish off your half uttered sentences. But in the video conference situation he was unable to understand what was being ‘said’. Yet his Japanese colleagues had no problems. I was quite taken aback by his dilemma. Here was one of the most brilliant of communicators, with whom you could discuss Kant, having difficulties in communicating about relatively functional and mundane matters. It emerged in conversation that his colleagues spoke very good English whereas my colleague’s English is not so perfect and his grammar is not great. However, he is uncanny in building trust and rapport face-to-face. NTT did recognise this quality in him and sent him to the USA to continue his work. The video conference had reduced the bandwidth of communication to the ‘words’ and ‘grammatical fluency’. It had not offered my colleague the other cues of co-present physical space that he uses to be with another person when they speak and think.
It was these ‘other cues’ that I sought to better understand, for these must be the same kinds of cues that the junior landscape architect, discussed further above, was missing when he tried to colour in the maps with the explicit representations he had in front of him. It is these other cues that are critical for gaining ‘experience’ of the other and building ‘tacit knowing’ and ‘trust’ that goes deep for sustainable knowledge. Curiously, it is also these cues that were created in the OnLive Traveler where we used the representations of our selves within a virtual space. Could such a space be used to acquire embodied experiential knowledge? The senior landscape architect was adamant that no advance in technology could replace the face-to-face co-performance. This brings me back to the relationship between embodiment and the tacit dimension in communication, and a return to Body Moves.

Body Moves and musicality of human interaction

Body Moves are periodicities of rhythmic synchrony of body and speech entrainment across persons who are engaged in interaction. During these periodic moments the distinction between being a speaker and listener is not able to capture the nature of the dynamics (Gill 2004). We have seen from the non-face-to-face nodding and gestural coordination study that thinking in terms of signal and information transmission (that underlies the speaker–listener distinction) does not appropriately explain the function of such gestures for one’s own self-regulation or self-synchrony in relation to the other person. I mentioned at the beginning of this paper that my analysis of the movement of the body in the transfer and formation of knowledge has two conceptual layers, one from linguistics and one from my observations of the rhythmic synchrony of entrainment. Both have importance in Body Moves, which are movements of both the body and the voice.
However, the linguistic layer inhibits the fuller understanding of the operation of the other, to the extent that in my early work (Gill et al. 2000), I set aside one of the movements (Parallel Coordinated Movement), which I was convinced was significant but could not explain with linguistics, until I could investigate it much later (Gill 2002). In search of a coherent framework to talk about Body Moves in the early stages and handle this multi-layered problematic picture, I drew upon joint action (Clark and Schaefer 1989; Clark 1996) theory (pragmatics of communication) and this seemed to work. However, joint action theory is based on information transfer and signalled information and has a sequential turn-taking structure. I found that I still had a problem to resolve. In work on gesture in joint action the gesture itself becomes part of this sequential structure of human communication. Yet joint action theory did seem helpful in two ways. Clark’s pragmatics acknowledges that there are multiple layers of sounds and semantics that constitute a communicative act, and proposes that in cooperating to communicate, we express commitment to negotiate and understand each other, and this is necessary to arrive at grounded meanings. I have now drawn on joint attention theory (Tomasello et al. 2005; Eilan et al. 2005; Franco 1997) to help advance upon the problems of the linguistic structure underlying joint action. At this point it would help to describe Body Moves in some more detail.

Body Moves

The identification and conceptualisation of Body Moves emerged during a 1-month period (1997) spent at ATR (Advanced Telecommunications Research) Media Lab, located between Kyoto and Nara, with a group of computational linguists. This group were designing various multi-modal conversational agents, and I sat in on their analyses of feedback and fillers.
Although I was not fluent in Japanese, I had experienced enough of the sound of the language and contexts of use to feel the meanings that were intended in the prosodic quality. My Japanese colleagues knew I was concerned to understand what impinges on the flow and formation of knowledge, in particular the relation between the tacit and explicit, and I had the opportunity to reflect on the acquisition of skills in the distributed setting of the landscape architecture study. I was struck when viewing the video data I had collected by how the body operated differently in different design stages. For example, architects position themselves around a table and engage with their bodies very differently when discussing what happened in a recent meeting with a client than they do when they are sketching out a conceptual idea together on a sheet of paper. Analysing presence would require me to select one of the design settings to start with. As the latter kind of scenario (sketching together) revealed a great deal of bodily activity and one could observe the expression of the architects’ ideas as they used their hands, pens and bodies, I selected this design stage of their work as the data for my analysis. In order to begin the analysis, I reflected on Wittgenstein’s discussion of action-reaction, in the Philosophical Investigations, as the basis of language games. The Japanese colleagues were looking into feedback in order to design their multi-modal interactive agents, so we agreed that a good idea would be for me to explore how the body performs feedback. This would be of interest to both our purposes. My colleagues gave me Kita’s master’s thesis on gesture and cognition (1993), Traum’s PhD thesis (1994) and a paper on conversation moves (Carletta et al. 1997). I rather liked the idea of the word ‘move’ as something that moves the information and conversation forwards. This gave rise to the expression ‘Body Moves’ as being what I was looking for.
Here is a description of the activity in that video clip of the landscape architects sketching together. The background to the activity is explained first. The selected video excerpt is of the landscape architects working on one design task of the daily practice in this firm. The senior architect is fully qualified and director of the company, whilst the other is being trained and due to qualify in a year’s time. They are both familiar with each other and share a mutual respect, despite the difference in their status and experience (empathic relationship). Their task is to produce a plan for the car park of a site, as well as the site itself. Some time earlier, they had produced a sketch plan for the client. The site is to be transformed from being an old derelict brewery to a headquarters of this client. The client has produced a version of their sketch plan, largely following their ideas, and wants them to take this further. Part of the discussion between A (senior) and B (junior) is whether to go for something radical or generally remain within the bounds of what they have in front of them. They, or rather, A, decide that changing it would not greatly improve on what they have. Hence they decide upon the latter option. There is a great deal of body interaction in this design activity. Their mutual respect means that B is able to express disagreements and produce his own suggestions. However, the discrepancy in status is evident in the take-turns and keep-turns that A performs. This activity is illustrated in the following sketch of the action unfolding, of gestures, movements and entrainment taking place: The two architects are developing a sketch plan for a client, one is senior and one is junior. They communicate at different levels of design: the senior architect is focusing on the ‘conceptual structure’ of the entire landscape, whereas the junior architect is focusing on ‘one position’ within that ‘conceptual structure’.
Their gestures and body movement correspond to the level of the design. The senior architect makes large sweeping hand and arm gestures across the table. The junior architect makes small finger and hand pointing gestures. They never ‘meet’ although they do try (as one enters the space the other leaves it or shifts their position within it) until the senior architect ‘mimics’ the junior architect’s gesture and posture, by focusing on one alternative location within the design to that which was the focus of the junior architect. The moment the senior architect’s finger point indicates this intention, the junior architect moves down into the space as well. The moment the junior architect touches the paper with his finger and moves it in one stroke across the area of his proposed position, the senior architect moves his finger as well, back and forth, across his alternative proposal. During this parallel coordinated action, they are finally synchronised and entrained within each other’s body motion and voice prosody. The moment the junior architect leaves and lifts his body out of the design space the senior architect’s hand motion continues across into the junior architect’s space and in one pen stroke acknowledges his proposal. This all happens within a period of three seconds. How we know that this has led to knowledge transformation is that the junior architect then explicitly proposes a topic shift and moves his body position priming this shift. In the discussion that now follows, I will explain the Body Moves that are hinted at above. These occur where I say the architects ‘try to’ meet and where the parallel coordinated action takes place. Body Moves are essentially coordinated rhythms of body, speech and silence, performed by participants orienting within a shared activity. These rhythms create ‘contact’, i.e. a space of engagement between the participants, and take two forms, sequential and parallel.
They are kinds of behavioural alignments (Scheflen 1974; Bateson 1955) and interactional synchrony (Birdwhistell 1970; Kendon 1970), and are metapragmatic (Mey 2001). Drawing upon the idea of the composite signal (Clark 1996; Engle 1998) that denotes an individual’s composite act of speech and gesture, Body Moves were conceived as Composite Dialogue Acts where the idea of ‘composite’ denotes the various possible combinations of gesture, speech and silence (Gill et al. 2000) across the persons engaging with each other (i.e. not of the individual’s act). And as they occur, Body Moves indicate the construction/establishment of mutual ground within a space of action. Ascribing an act as being a ‘Body Move’ does not refer to, and is not defined in terms of, the physical movement; rather, it targets the act that the movement performs. Body Moves that were identified in the 5-minute video clip of the two landscape architects above are (for further details of these Body Moves, please see Gill et al. 2000):

Attempt-Contact
Dem-Ref
Take-turn, Keep-turn, Release-turn
Body-Check (B-check)
Acknowledge (Ack)
Focus

An example of the Body Move Attempt-Contact is:

Function: draws person’s attention and involvement
Effect: increases engagement or commitment
Gesture: ‘looking’ gesture, or hand and arm gesture

The analysis of the Body Moves identified drew upon the concept of metacommunication (Allwood et al. 1991; Shimojima et al. 1997) in linguistics, where metacommunication is the conveyance of information about the conversation, as opposed to being about the topic situation of the conversation itself (i.e. content). The work of Allwood et al. (1991) was particularly helpful as it provided wider cognitive or behavioural categorisations of prosodic information that other linguistic theories could not account for, such as contact, perception, understanding and attitudinal reactions.
Allwood et al. stated that these are requirements of human communication because they describe how we relate to each other when communicating, whether we are committed, whether we are positive or negative, whether we are perceiving the expression or are interpreting and showing how we understand, and what assumptions we are making. In linguistics, the conveyance of information is described as being triggered by cues that inform about the conversation situation, for example, fillers and responsives (fillers: ‘uhh, mmm, uhm’; responsives: ‘uh huh, ok’). These function as ‘discourse markers’ and can be identified by prosody and ‘phoricity’ (Kawamori et al. 1998). These interjections in speech determine discourse structures and the nature of the coordination taking place (Schiffrin 1987; Kawamori et al. 1998). The idea of such conveyance cues or communication acts was applied to body movements that are occurring in response to each other, whether this is related to a verbal utterance or independent of it. However, such conveyance cues could not apply to what was described at that time as the ‘exception’, the Parallel Coordinated Move. In the context of gesture, Body Moves are distinct from representational or iconic gestures of the verbal utterance as these serve primarily to illustrate it. Where in conversation a ‘move’ is described as being a verbal action that causes the conversation to move forward (Carletta et al. 1997), the body move is a bodily action, which initiates or responds to a bodily action or verbal utterance. However, the body move is wider in its scope as it can be a response or an initiation, and in the case of the Parallel Coordinated Move it is both in the same moment as the bodies and speech of the participants move together. We come back to this when reflecting on the speaker–response model. The building of the categories of Body Moves (BM) draws upon dialogue acts and features of dialogue acts, which bear parallels to the phenomena observed.
Some features of dialogue acts are specific to speech and are not embodied in Body Moves, such as intonation, and asking questions or making commands. Some Body Moves have required the development of new terms, either in order to demarcate between the bodily actions and their Communication Act (CA) counterpart, such as ‘check’, which as a body move becomes b-check, or because there appears to be no clear counterpart in Communication Act theory, for instance dem-ref, attempt-contact and focus. All the categories are explained below. As Body Moves are Composite Dialogue Acts (CDA) across more than one person, a BM can be accompanied by a CA, or accompanied by no speech; and there may only be silence or stillness, e.g. as in a pause. It is significant that right at the outset, in distinction from conversation acts, it was held that BMs cannot be said to embody specific intentions although they could be said to embody an intention for communication itself (Gill et al. 2000). Body Moves could now be described as expressions of mutually manifest intentions to understand the other (drawing on Sperber and Wilson’s work 1986). It was mentioned above that Body Moves create contact, a ‘space of engagement’. An engagement space is the composite of the participants’ body fields of engagement. Hence we can call the engagement space the body field of engagement. The body field of engagement is set as the communication opens and the bodies indicate and signal a willingness to cooperate (this linguistic description draws on Allwood 1995). The body field of engagement is a variable field and changes when participants are comfortable or uncomfortable with each other. For instance, in the case where one person moves their hand over into the other’s space, and that person withdraws their hand, this indicates that the ‘contact’ between these persons is disrupted.
There are also examples where the participants hold their bodies back from entering either’s field of engagement, indicating disagreement or discrepancy in the communication, and distance rather than contact. The degree of contact or nature of distance is described in terms of commitment and attitude. Hence an immediate space of engagement involves a high degree of contact and commitment to the communication situation, whereas a passive distance is less involved and committed, and disagreement is very distanced, with commitment withheld. Disagreement or discrepancy can necessitate a reconfiguration of the body field of engagement. Reconfiguration occurs when there is a disturbance in the relationship between the speakers, i.e. there is a discrepancy between them which is expressed by the bodies’ need to re-arrange their relationship to each other so that a feeling of sharing an engagement space is re-established. In other words, there has been momentary detachment or distance. The reconfiguration is a response move, which is almost akin to a motor reaction. It is a rhythmic reconfiguration of the body space between the participants to create a new engagement situation by reshaping the field of engagement. This category of action occurs because there is a problem in the overlap in one body’s field of engagement with the other body’s. It is necessarily a reaction to the other person moving into one’s body field, at that particular moment. Note that if there is no problem in the overlap of their respective fields, the participants can undertake parallel coordinated moves or collaborate in problem solving. Within the space of engagement, bodies can move in a coordinated manner to shift the level of focus within the communication situation, for instance upon a specific point. Focus is actually categorised as a body move. It involves a movement of the body towards the area the speaker is attending to, i.e.
space of bodily attention, and in response causes the listener or other party to move their body towards the same focus. The category becomes significant as a dimension of body interaction as it is about focus management. The engagement space bears a relationship to the ‘o-space’ developed by Kendon within the field of gesture, which I later learnt of. In subsequent communication with him, the relationship between the engagement space and Kendon’s ‘o-space’ was explored. Within research on space and communication, Kendon’s (1990) F-formation system is a foundational theory for how we orient ourselves within what he terms the joint transactional space or ‘o-space’. Each person, when acting on their own, has a private space called a transactional segment. For example, when watching the television, a person’s transactional segment is defined as the space that he/she ‘looks (into) and speaks, and in which he/she reaches to handle objects’. Other persons acknowledge and ‘respect this space, not entering or crossing into it’. This transactional segment is managed by the person’s behaviour. When at least two people get together to do something, they ‘arrange’ themselves such that their transactional segments overlap and create a joint transactional space. These persons ‘agree to maintain joint jurisdiction and control over this space’, termed the ‘o-space’. Kendon cites the example of a group of people standing in a circle in a park, talking. The jointly constructed ‘o-space’ provides the ‘frame’ within which the ‘story line’ of the interaction could develop. But the analysis does not deal with the spatial structurings involved in the actual unfolding of that ‘story line’ (this term is borrowed from Goffman, who uses it in his ‘frame analysis’ (1974)). However, the engagement space does.
It is the dynamic, ever-changing space that a person creates as he/she unfolds a line of activity within the ‘frame’ or ‘story line’ provided by the overlapping transactional segments (the spatial structure for possible action shaped by the posture and positioning of the bodies). This overlapping of segments or ‘o-space’ is the arena within which actions pertinent to the interaction can take place; it sets the ‘field’ that I speak of, for whatever is to be done. The ‘engagement space’ is the space ‘consumed’ by actors within the o-space, and provides an understanding of how participants who physically reach into each other’s engagement spaces negotiate this unfolding. The bodily moves made within those spaces become pertinent for the communicative process just because they are coordinated within each other’s engagement spaces. It is significant that these moves are rhythmic and synchronised. The first form of rhythmic synchrony serves to sustain our commitment to engage with each other, and the second form serves to transform our states of tacit knowing such that we are able to arrive at agreements and achieve topic shifts. The latter form cannot be arrived at without the existence of the former and emerges from it. Both of these forms of Body Moves are moments of empathic connection. In summary, there are two forms of Body Moves.
(a) Parallel Coordinated Moves, which are moments of simultaneous coordinated autonomy. These are evidence of intersubjectivity, and where knowledge transformation occurs.
(b) Other Body Moves (which for present purposes shall still be called sequential), which are moments of coordinated autonomy. They serve to build common ground, and enable knowledge flow.
In this next section, the empathetic connection that the Body Moves embody is analysed.
Body Moves and entrainment
The Body Moves have been described as being moments of empathetic connection.
Drawing on the musical term ‘accent’, these moments may be expressed as being ‘accents’ of affect (emotion), and the most heightened one is that of the Parallel Coordinated Move, where the accent is of empathy. Drawing on the concept of entrainment, these accents may be described as moments of ‘convergence’ that culminate in the strongest moment of convergence, where transformation takes place. Body Moves are emergent beats from the interactive structures they are embedded within. These beats are pulse periodicities. In the example of the two landscape architects we can see the emergence and movement of these pulse periodicities. The affective natures of Body Moves are indicative of how the participants are relating to each other. We spoke of this dimension of affect when talking of ‘contact’ in the engagement space that Body Moves shape. They are expressive structures through which we sense our relationship to each other. In a talk given at the Interacting Bodies conference in 2005, I described these emergent beats as ‘salient phenomenal beats’, and I would like to explain what I mean by this. To do so, I have to reconsider the application of the word ‘signal’ to Body Moves as denoting phenomena that have phenomenological experience already embodied (Tolbert 2001). The origins of this embodiment may be seen as being shaped in motherese, the interaction between a mother and her baby. Studies of motherese (mother–infant interaction) show how the poetic sounds/rhythms of a mother’s utterances and bodily engagement differ from ordinary adult interaction, by being simplified, rhythmically repeated, exaggerated and elaborated. This, uttered universally with a high, soft and breathy voice, and phonetic foregrounding of salient utterances, attracts and sustains attention (Miall and Dissanayake 2003). The mother exhibits alternative patterns of intimacy and observation, empathy and commentary. The aesthetics in this poetic engagement facilitates emotional attachment.
They propose that evidence of the sensitivity of infants as young as 6–8 weeks old to indications (vocal, visual and kinesic) of social contingency of mothers/fathers/partners, is evidence of design in neural organisation. They argue that this supports the view that mutuality or intersubjectivity (the coordinating of behavioural–emotional states with another’s in temporally organised sequences) is a primary human psychobiological endowment. Mutuality is dependent on a fundamental dyadic timing matrix. Disorders of emotion and learning in early childhood are traceable to faults in early brain growth of neural systems underlying this capacity (Trevarthen 1994, 2005). Hence the expression ‘signal’ does not mean ‘information’, but an experientially grasped iconicity of bodily states of the movements of body and sound. With the origins rooted in motherese, each person’s experiential reference system is different but there will be some shared cultural base. Predictability, or tacit knowing, lies in sharing this cultural base, these points of reference, in order to come together in joint attention (Tomasello et al. 2005). Music provides a framework for thinking about this. Music’s model is that in order to grasp it you have to engage with it. Musical concepts such as tempo may need to be considered to understand these dynamics, as people have different personal time-frames, and it is possible they may articulate the same temporal patterns at different rates (turn-taking, sequences), that periodically come together (i.e. the Body Moves). Also, after they have reached the end of a point of negotiation, as in the parallel coordinated move, the personal tempos are synchronised. They are entrained. Entrainment is coordinating the timing of our behaviours and rhythmically synchronising our attentional resources.
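The idea that two different personal tempos can come together may be illustrated with a toy numerical sketch. The model is an assumption introduced purely for illustration and is not taken from the studies cited: two phase oscillators stand in for the participants' personal tempos, and a hypothetical `coupling` parameter stands in for how strongly each attends to the other (a Kuramoto-style coupling). The function names `simulate` and `is_entrained`, the tempos of 2.0 and 2.2 beats per second, and the thresholds are all illustrative choices.

```python
import math


def simulate(freq_a_hz, freq_b_hz, coupling, seconds=30.0, dt=0.001):
    """Euler-integrate two mutually coupled phase oscillators.

    Each oscillator advances at its own personal tempo and is nudged
    towards the other's phase in proportion to `coupling` (an assumed,
    illustrative parameter). Returns the phase difference at each step.
    """
    omega_a = 2 * math.pi * freq_a_hz  # personal tempo of participant A (rad/s)
    omega_b = 2 * math.pi * freq_b_hz  # personal tempo of participant B (rad/s)
    theta_a, theta_b = 0.0, 0.0
    diffs = []
    for _ in range(int(seconds / dt)):
        d = theta_b - theta_a
        # A is pulled forward towards B; B is pulled back towards A.
        theta_a += (omega_a + coupling * math.sin(d)) * dt
        theta_b += (omega_b - coupling * math.sin(d)) * dt
        diffs.append(theta_b - theta_a)
    return diffs


def is_entrained(diffs, tail_fraction=0.2, tolerance=0.01):
    """Treat the pair as entrained if the phase difference has stopped
    changing over the final part of the run."""
    tail = diffs[int(len(diffs) * (1 - tail_fraction)):]
    return max(tail) - min(tail) < tolerance


if __name__ == "__main__":
    print("with coupling:   ", is_entrained(simulate(2.0, 2.2, coupling=2.0)))
    print("without coupling:", is_entrained(simulate(2.0, 2.2, coupling=0.0)))
```

In this sketch, with the coupling switched off the phase difference keeps growing, so the two tempos never come together; with sufficient coupling it settles to a constant, a crude analogue of personal tempos becoming entrained. The model does not capture the affective or experiential quality of the moves.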
This proposal that personal tempos become synchronised would support the basis on which I identified the Body Moves as being salient movements of bodies and voices playing off each other and with each other, and describing these moves as being distinct from linguistic ‘turn-taking’. If one considers music performance, we can further say these movements are distinct because the participants are co-performers, i.e. they are both speaking and listening to each other as performers at exactly the same time, not in terms of turn-taking as in speech. This ‘musical’ communicative structure is essential to arrive at parallel coordinated action. It is proposed that this synchronisation of temporal rhythms is how two people can come to share an idea (or acknowledge each other’s recognition of a mutually manifest state of affairs; Sperber and Wilson 1986), and through parallel and coordinated motion, they reach a heightened form of co-regulation and convergence. The use of artefacts (interactive technologies, mediating technologies, pens and paper, overhead slides, smart collaborative technologies) can influence these rhythmic coordinations and even disrupt them, and this has implications for how we manage knowledge transformation. The examples given above of the architects coming to share an aesthetic judgment about colours, and of the participants in the simulation of a query-answer web-search session on where to eat out in Tokyo, involved the shaping of a group norm of ways of doing, talking, sensing and perceiving information. These shared norms emerged from the entrainment of their individual tempo, pitch, phrasing, of embodied experience, carried in their rhythmic synchronisation, that they brought to bear on the situation, grounding and accommodating its meaning in relation to themselves and each other.
In a study of beat entrainment by Himberg (2006), where two people are tapping to a metronome (the artificial beat), their tapping drifts from the metronome as they tap to the beat of each other. After some time their tapping realigns with the metronome, and then it drifts off again to each other’s beat. All this happens with no awareness on the part of the ‘tappers’, who think they have been tapping to the metronome all along. It is not an accident that each branch of the distributed architects’ firm had its own shared sense of colour, as any shared norm emerges from the movements of those that ground it. “Collective action is seen as being shaped by the movement between the individual and social situation. We achieve collective action through our understanding of the performance of representations of the tacit dimension in our communication, such as gestures, non-verbal cues, speech and pauses, as well as artefacts of practice, including technology, evident in how we perform with them. This understanding mediates the constant transition between individual and social states (Gill 2004). The Body Moves are transitions in body and speech sound movement and these transition states appear to be associated with different types of rhythm where the sound and body motions of the participants become coupled, if only for a very brief moment. This coupling is considered to be part of the process of understanding, and it is expressed at a level of the social. Rhythm, tempo and pulse are our personal expression of relating with another human being and our environment, and are in that sense, social. The coupling in this picture of knowledge formation is therefore considered as being part of the social understanding of tacit knowing. It is akin to music performance.
In a choir, for example, singers and musicians are ‘collectively engaged in the synchronous production and perception of complex patterns of sound and music’ (Cross 2007; Arom 1991; Blacking 1976). In collective musical behaviour, the individual behaviours are likely to be coordinated in time and be more or less predictable in relation to each other. This collective activity therefore has a high degree of coherence, which is likely to help establish a strong sense of group identity (Stobart and Cross 2000; Cross 2007). This work from music supports the idea of collective action as entrainment in the case of Body Moves. These are points of convergence in interaction, where features of musicality are drawn on in terms of interpersonal performative cues: each participant is ‘signalling’ to the other their mutuality of understanding by ‘sharing time’. The mutual sense of shared meaning that is a feature of musicality in interaction (Cross 2007) is foregrounded and confirmed in the sharing of time. In musical terms, Body Moves (of body and vocal sound) are salient and explicit and phenomenal ‘beats’, that are emergent from accentual structures in terms of the articulated interactive structure (interactive gestural structure such as the engagement space), i.e. the structure of the interaction embodies cues as to structural accent (some events are more salient, more differentiated, more referential in respect of the dimensions that the interaction employs) that are experienced as phenomenal or veridical accents. Each person has different phenomenological experience embodied.
Conclusion: Body Moves and knowledge transformation
Body Moves can be summarised as being rhythmic spontaneous coordinations of at least two people that indicate the nature of contact, resonance and commitment within a communication situation. They span more than one body (are collective). They shape and constantly configure the engagement space of action.
Body Moves enable the formation and transformation of tacit knowing and intersubjectivity (Polanyi 1964, 1966). In performing Body Moves we engage with the representations of the tacit dimension of another’s actions and move with them, for example, in a design activity, or to form a shared identity. Action is the performance, whilst its tacit dimension is its basis that is sensed, grasped, responded to, rather like music. Going back to the consultancy practice study and the colouring of the maps to share aesthetic judgment, the representation of the tacit dimension of action is the structure of the form of its expression. In engaging with the representation of the tacit dimension of another’s actions, we are resonating with the communicative structures being performed. In order to be able to do so, we both draw upon experience and are experiencing in the same moment. The bodily dimension of this resonance is critical for presence. In his work ‘Keeping Together in Time’, McNeill (1995) speaks of visceral and emotional sensations that come with shared movement, which he proposes endow groups with the capacity to cooperate”. The Body Moves are coordinated autonomy, essential for sociality. In the ‘Tacit Dimension’, Polanyi described a relation between emergence and comprehension, as existing when ‘an action creates new comprehensive entities’. Parallel Coordinated Moves are multi-activity gestural coordinations, where different but related projects are being expressed in the body actions of the participants at the same time. This fusion provides the conditions for tacit transformation in a new plane of understanding from the prior periodicities that revolve around one idea, and as a result they create new comprehensive entities, expressed in the simultaneous rhythmic synchronisation of bodies and speech.
The collaborative features of these moves enable the participants to negotiate and engage in the formation of a common ground (Gill 2002) whilst expressing different perspectives. In the ‘engagement space’ it is the one moment where the body fields can overlap without disturbance and co-regulation is at a heightened affective state. For Polanyi, the body is the ‘ultimate instrument of all external knowledge’, and ‘wherever some process in our body gives rise to consciousness in us, our tacit knowing of the process will make sense of it in terms of an experience to which we are attending’. In performing Body Moves, the ability to grasp and sense someone’s motions, and respond to them appropriately (skilfully) is based on experience (tacit knowing of the process) and experiencing (experience to which we are attending). It is spontaneous action. Hence the architects can come to see colours with shared (not same) aesthetic judgements. Knowledge or a concept is an emergent property of being engaged with another. It is a point in the knowledge flow where transformation occurs, and this emerges through, for example, acknowledging and demonstrating understanding of what someone is trying to say in the same ‘language’, expressing a mutually shared intention to understand. The concept of sharing an idea of a colour as being of a ‘right shade of blue’ to work with a particular shade of ‘green’ is borne in practice with the other’s expression of experience in making judgements. Underlying the formation of concepts is a temporal flow of prosodic and modulating events (gestures, body motion, vocal sound) that is entrainment. Entrainment is coordinating the timing of our behaviours and rhythmically synchronising our attentional resources.
The identification of Body Moves has contributed to the conception of ‘pragmatic acts’ by widening the ‘narrow conception of strict natural language pragmatics’ (Gill et al.
2000; Mey 2001) and the analysis needs to be taken deeper to understand the entrainment processes in human interaction. Non-verbal communication has already gained strong ground in different disciplines and in its significance for understanding the ‘human interface, the point at which interaction occurs’ (Gill et al. 2000). As part of this shift from cognition to communication, the international gesture society was founded in 2002. The increasing focus on the non-verbal has been reflected in the design of interactive multi-modal systems, some of which were referred to above. The performance arts domains are exploring the relation between sounds (from the spectrum of recognisable speech to recognisable music) and movement in dance performance and take this understanding beyond the ‘paralinguistic’. Body Moves are being applied to performance environments, e.g. dance, where they are described as a-linguistic pragmatics, emerging in the sound and visual system of the performance space (Sha and Gill 2005). Although they were initially coded (described or explained) within a linguistic context, this has proven to be problematic as linguistics separates the speaker (sender) and the listener (receiver). Sender and receiver are connected through information transmission, which allows for the conception of the autonomous cognitive entity, the autonomous expert, functioning with explicit knowledge. However, Body Moves are necessarily sensory couplings of engaged persons, and by working with performance arts domains, their analysis has evolved to discover their essential qualities as performance periodicities of the body that are self and other oriented. Body Moves give us a further insight into the nature and operation of ‘copresence’ (Good 1996), which is an essential component of human understanding, denoting how we are present to each other, be this in the same physical space or in differing physical spaces (e.g.
computer mediated spaces, virtual, or mobile technology mediated spaces, computer augmented performance spaces). Being present is described as a precondition for committed communication, but the nature of this precondition affects how we coordinate with each other and make sense and meaning. Work on the body and experiential knowledge suggests that the design of technology needs to work with these affordances of entrainment.
Reflections
The ideas that this paper has worked through have journeyed through numerous forms of technology, and have explored the assumptions that lie behind their design as well as the impacts these technologies have on us. This exploration has covered theoretical foundations and methodologies from philosophy, social science, psychology, linguistics, performance arts and interactive arts. The research investigation has evolved the methodology for the tacit dimension by bringing together the subtle aspects that these disciplines afford for understanding the tacit dimension at many levels of human engagement. Each theory and methodology on its own was insufficient. The analysis provides the basis for a theoretical and methodological design framework for interactive technologies to afford those human capacities and qualities that are essential for human communication and engagement. Such design requires conceiving of ‘interface’ as located in ‘dialogue’ and ‘personal’ and ‘experiential knowledge’ (Gill 1995). The human–machine interface needs to afford us the resonance of structures in communication operating at multiple dimensions. As we exist within our bodies, the body may be seen as a mediating interface for the tacit and explicit interrelationship of human communication, carried through the body’s movement, breath and vocal sound; gesture, silence and speech.
This mediating interface needs to operate in the human engagement sphere and the human–machine interface needs to afford us this capacity necessary for entrainment. It is important to root the methodology and design in a conceptual framework for the relationship between entrainment, experiential knowledge, the body, tacit knowing and communication. Such a framework is needed for analysing and understanding the current and future impacts on our engagement with each other and our environments when we interact with various forms of interactive technologies. This framework will always be incomplete as each new form of technology causes us to rediscover (in a long historical sense) what makes us human. However, because it is about the very fundamental layers in which culture and semantic meaning are rooted, it can evolve with the technological developments and keep pace with them.
Acknowledgements
I would like to thank the following people for their inspiration over the years in supporting the development of the ideas: Bo Goranzon, Deborah Bekerian, Masahito Kawamori, Herb Clark, Ian Cross, Terry Winograd, Ajit Narayanan, David Good, Mike Cooley, Seija Kulkki, Timo Saari, Hisao Nojima, Yasuhiro Katagiri, Sotaro Kita, David Smith, Jeremy Potter, Jan Borchers, Sha Xin-Wei, Liz Tolbert, Hubert Dreyfus, Adam Kendon and Karamjit S. Gill. I also thank Colin Tully, William Wong and Martin Loomes of Middlesex University for their support of this research.
References
Alibali MW, Heath DC (1999) Effects of visibility between speaker and listener on gesture production: some gestures are meant to be seen. Source of reference was this pre-published paper. Later publication Alibali MW, Heath DC, Myers HJ (2001) Effects of visibility between speaker and listener on gesture production: some gestures are meant to be seen. J Mem Lang 44:169–188
Allwood J, Nivre J, Ahlsen E (1991) On the semantics and pragmatics of linguistic feedback (Tech. Rep. No. 64). Gothenburg Papers.
Theor Linguist
Anderson ML (2003) Embodied cognition: a field guide. Artif Intell 149(1):91–130
Arom S (1991) African polyphony and polyrhythm: musical structure and methodology. Cambridge University Press, Cambridge
Barsalou LW (1988) The content and organisation of autobiographical memories. In: Neisser U, Winograd E (eds) Remembering reconsidered: ecological and traditional approaches to the study of memory, pp 193–243, Cambridge University Press, New York
Bateson G (1955) The Message. ‘This is the Play.’ In: Schaffner B (ed) Group processes, vol II. Macy, New York
Bavelas JB (1994) Gestures as part of speech: methodological implications. Res Lang Soc Interact 27(3):201–221
Bavelas JB, Chovil N, Coates L, Rose L (1995) Gestures specialized for dialogue. PSPB 21(4):394–405
Bekerian DA, Dennett JL (1990) ‘Spoken and written recall of visual narratives’. Appl Cogn Psychol 4:175–187
Birdwhistell RL (1970) Kinesics and context. University of Pennsylvania, Philadelphia, PA
Blacking J (1976) How musical is man? Faber, London
Carletta J, Isard A, Isard S, Doherty-Sneddon G, Anderson A (1997) The reliability of a dialogue structure coding system. Assoc Comput Linguist 23(1):13–31
Cassell J, Sullivan J, Prevost S, Churchill E (2000) Embodied conversational agents. MIT, Cambridge, MA
Clark HH, Schaefer EF (1989) Contributing to discourse. Cogn Sci 13:259–294
Clark HH (1996) Using Language. Cambridge University Press, Cambridge
Clayton M, Sager R, Will U (2005) In time with the music: the concept of entrainment and its significance for ethnomusicology. In: European meetings in ethnomusicology 11 (ESEM Counterpoint 1), (2005), pp 3–75
Condon WS, Ogston WD (1966) Sound film analysis of normal and pathological behavior patterns. J Nerv Ment Dis 143:338–347
Cooley MJ (1987) Architect or bee? Hogarth Press, London
Coupland N (1999) The discourse reader. Routledge, London
Cross I (2007) The evolutionary nature of musical meaning.
Musicae Scientiae (in press)
Di Paola S (2001) Gesture and narrative creation in avatar-based 3D virtual communities. Invited paper at gesture and dialogue seminar, CSLI, Stanford University, Available at http://www.dipaola.org/sig99/sld002.htm
Ekman P, Friesen WV (1969) The repertoire of nonverbal behaviour: categories, origins, usage, and coding. Semiotica 1:49–98
Ekman P, Friesen WV (1972) Hand movements. J Commun 22:353–374
Eilan N, Hoerl C, McCormack T, Roessler J (2005) Joint attention: communication and other minds. Issues in philosophy and psychology. OUP, Oxford
Engle RA (1998) Not channels but composite signals: speech, gesture, diagrams and object demonstrations are integrated in multimodal explanations. In: Proceedings of the 20th annual conference of the cognitive science society, pp 321–327, USA
Fodor JA (1976) The language of thought. The Harvester Press, Sussex
Fodor JA (1981) Representations: philosophical essays on the foundations of cognitive science. Harvester Press, Brighton
Franco F (1997) The development of meaning in infancy: early communication and social understanding. In: Hala S (ed) The development of social cognition. Psychology Press, Hove
Fridlund AJ (1991) Sociality of solitary smiling: potentiation by an implicit audience. J Pers Soc Psychol 60(2):229–240
Gill JH (2000) The tacit mode. Michael Polanyi’s postmodern philosophy. SUNY Press, New York
Gill KS (1996) The foundations of human-centred systems. In: Proceedings of the human machine symbiosis: the foundations of human-centred systems design. Springer, London
Gill SP (1988) On two AI traditions, AI & Society, vol 2 No.4. Springer, London. NB. A version of this
Gill SP (1988) Knowledge and skill transfer through expert systems: British and Scandinavian traditions. In: Research and development in Expert Systems V: proceedings of Expert Systems ’88, Cambridge University Press, Cambridge
Gill SP (1995) Dialogue and tacit knowledge for knowledge transfer. Ph.D.
Thesis, University of Cambridge, Cambridge
Gill SP (1998) Body language: the unspoken dialogue of bodies in rhythm. In: Proceedings of the ESSLI workshop on mutual knowledge, common ground and public information.
Gill SP (1999) Mediation and communication of information in the cultural interface. In special issue on science, technology and society. AI Soc 13:1–17
Gill SP, Kawamori M, Katagiri Y, Shimojima A (2000) The role of body moves in dialogue. RASK 12:89–114
Gill SP (2002) The parallel coordinated move: case of a conceptual drawing task. Published Working Paper: CKIR, Helsinki. ISBN
Gill SP, Kawamori M (2002) Coordination of gestures in a non face-to-face setting. In: Rector M, Poggi I, Trigo N (eds) Gestures, meaning and use. Fundacao Fernando Pessoa, Porto
Gill SP, Borchers J (2004) Knowledge in co-action: social intelligence in collaborative design activity. AI Soc 17(3). This paper is an adaptation of a conference paper presented at social intelligence design 2003, Royal Holloway, London
Gill SP (2004) Body moves and tacit knowing. In: Gorayska B, Mey JL (eds) Cognition and technology. John Benjamins, Amsterdam
Gill SP (2005) Pulse periodicity in paralinguistic coordination. Presented at international conference, Interacting Bodies, Lyon
Goffman E (1974) Frame analysis: an essay on the organization of experience. Harper Row, London
Good DA (1996) Pragmatics and presence. AI Soc J 10(3, 4):309–314
Goodwin C (1997) The blackness of black: colour categories as situated practice. In: Resnick LB, Roger S, Clotilde P, Barbara B (eds) Discourse, tools and reasoning: essays on situated cognition. Springer, New York, pp 111–140
Goodwin C (2003) Pointing as situated Practice. To appear In: Sotaro K (ed) Pointing: where language, culture and cognition meet. Lawrence Erlbaum, London (in press)
Goranzon B (1993) The practical intellect: computers and skills (Artificial Intelligence and Society).
Springer, Heidelberg
Goranzon B, Josefson I (1988) Knowledge, skill and artificial intelligence. Springer-Verlag, London
Himberg T (2006) Co-operative tapping and collective time-keeping: differences of timing accuracy in duet performance with human or computer partner. Presentation at 9th international conference on music perception and cognition, Bologna, Italy
Hutchins E (1995) Cognition in the wild. MIT, MA
Johannessen KS (1988) Rule following and tacit knowledge. AI & Soc J 2:287–302
Kawamori M, Kawabata T, Shimazu A (1998) Discourse markers in spontaneous dialogue: a corpus based study of Japanese and English. In: Proceedings of 17th international conference on computational linguistics (COLING-ACL98)
Kendon A (1970) Movement coordination in social interaction: some examples described. Acta Psychol 32:100–125
Kendon A (1972) Some relationships between body motion and speech: an analysis of an example. In: Seigman A, Pope B (eds) Studies in dyadic communication. Pergamon Press, Elmsford, NY
Kendon A (1990) Conducting Interaction. Cambridge University Press, Cambridge
Kendon A (2004) Gesture: visible action as utterance. CUP, Cambridge
Kita S (1993) Language and thought interface: A study of spontaneous gestures and Japanese mimetics. Ph.D. Thesis. University of Chicago, Chicago, IL
Large E, Jones MR (1999) The dynamics of attending: how people track time-varying events. Psychol Rev 106(1):119–159
Maturana H, Varela F (1980) Autopoiesis and cognition: the realisation of the living. D. Reidel, Dordrecht
McNeill D (1992) Hand and mind. University of Chicago Press, Chicago
McNeill WH (1995) Keeping together in time. Harvard University Press, London
Mey J (2001) Pragmatics. An introduction. Blackwell, Oxford
Miall DS, Dissanayake E (2003) “The Poetics of Babytalk.” Human Nature 14:337–364
Miyazawa K, Gill SP, Nojima H (1999) The influence of the internet on Japanese education: case of the 100 schools networking project.
In: IEEE proceedings of internet workshop’99, Osaka, Japan
McNeill D, Cassell J, McCullough K-E (1994) Communicative effects of speech-mismatched gestures. Res Lang Soc Interact 27(3):223–237
Polanyi M (1964) Personal knowledge: towards a post critical philosophy. Harper and Row, NY
Polanyi M (1966) The tacit dimension. Doubleday. Reprinted version, 1983, Peter Smith, Gloucester, Mass
Pylyshyn ZW (1984) Computation and cognition: towards a foundation for cognitive science. MIT Press, Cambridge, MA
Reiner M, Gilbert J (2004) The symbiotic roles of empirical experimentation and thought experimentation in the learning of physics. Int J Sci Educ 26:1819–1834
Rizzolatti G, Fogassi L, Gallese V (2001) Neurophysiological mechanisms underlying the understanding and imitation of action. Nat Rev Neurosci 2:661–670
Ferrari PF, Fogassi L, Gallese V, Rizzolatti G (2003) Mirror neurons responding to the observation of ingestive and communicative mouth actions in the monkey ventral premotor cortex. Eur J Neurosci 17(8):1703–1714
Roth EM, Patterson ES, Mumaw RJ (2001) Cognitive engineering: issues in user-centered system design. In: Marciniak JJ (ed) Encyclopedia of software engineering. 2nd edn. Wiley, NY
Scheflen AE (1974) How behaviour means. Exploring the contexts of speech and meaning: kinesics, posture, interaction, setting, and culture. Anchor Press/Doubleday, New York
Schiffrin D (1987) Discourse Markers. Cambridge University Press, Cambridge
Sha XW (2002) Resistance is fertile: gesture and agency in the field of responsive media, in makeover: writing the body into the posthuman technoscape. Configurations 10(3):439–472
Sha X-W, Gill SP (2005) ‘Gesture and response in field-based performance’. In: The ACM Proceedings of creativity and cognition 2005, Goldsmiths College, London
Sha XW (2005) The TGarden performance research project.
Modern Drama 48(3): Fall, Special issue: technology 585–608
Shannon C, Weaver W (1949) The mathematical theory of communication. University of Illinois Press, Urbana, IL
Shimojima A, Katagiri Y, Koiso H (1997) Scorekeeping for conversation-construction. In: Proceedings of the Munich workshop on semantics and a pragmatics of dialogue
Simon H (ed) (1982) Models of bounded rationality. Behavioral economics and business organization, vol 2. MIT, Cambridge, MA, pp 424–443
Simon H (1969) The sciences of the artificial, 1st edn. MIT Press, Cambridge, MA
Simon H (1983) Reason in human affairs. Stanford University Press, Stanford, CA
Sperber D, Wilson D (1986) Relevance: communication and cognition. Blackwell, Oxford
Stobart H, Cross I (2000) The andean anacrusis? Rhythmic structure and perception in easter songs of Northern Potosí, Bolivia. Br J Ethnomusicol 9(2):63–94
Tolbert E (2001) Music and meaning: an evolutionary story. Psychol Music 29(1):84–94
Tilghman BR (1988) Seeing and seeing-as. AI Soc J 2(4):303–319
Tomasello M, Carpenter M, Call J, Behne T, Moll H (2005) Understanding and sharing intentions: the origins of cultural cognition. Behav Brain Sci 28:675–691
Trevarthen C, Aitken KJ (1994) Brain development, infant communication, and empathy disorders: intrinsic factors in child mental health. Dev Psychopathol 6:597–633 (cited in Trevarthen, 1996)
Trevarthen C, Aitken KJ (2005) Disorganized rhythm and synchrony: early signs of autism and Rett syndrome. Brain Dev 27:S25–S34
Trevarthen C, Colwyn S (2000) The dance of wellbeing: defining the music therapeutic effect. Nord J Music Ther 9(2):3–17
Trevarthen C (2005) First things first: infants make good use of the sympathetic rhythm of imitation, without reason or language. J Child Psychother 31(1):91–113
Traum DT (1994) A computational theory of grounding in natural language conversation, Ph.D. Thesis. The University of Rochester, Rochester, NY
Tulving E (1972) Episodic and semantic memory.
In: Tulving E, Donaldson W (eds) Organisation of memory. Academic, New York
Weizenbaum J (1966) ELIZA: a computer program for the study of natural language communication between man and machine. Commun ACM 9(1):36–45
Weizenbaum J (1976) Computer power and human reason. WH Freeman, San Francisco
Wittgenstein L (1953) Philosophical investigations