This document summarizes a study on cultural differences in the use of pauses in speech between Germany and Japan. The researchers analyzed pause occurrences in video recordings from both cultures to build a computational model for virtual agents. They found that Japanese subjects used significantly more pauses over 1 and 2 seconds than German subjects. The researchers aim to integrate cultural differences in pause usage into dialogue management for embodied conversational agents to enhance their believability when interacting with users from different cultures.
Copyright:
Attribution Non-Commercial (BY-NC)
Talk is silver, silence is golden: A cross cultural study on
the usage of pauses in speech
Birgit Endrass, Matthias Rehm, Elisabeth André
University of Augsburg
Eichleitnerstr. 30, D-86135 Augsburg, Germany
{endrass|rehm|andre}@informatik.uni-augsburg.de

Yukiko I. Nakano
Tokyo University of Agriculture and Technology
2-24-16 Nakacho, Koganei-shi, Tokyo 184-8588, Japan
nakano@cc.tuat.ac.jp
ABSTRACT
In this paper we examine the usage of pauses in speech. We concentrate on cultural differences, with the aim of later building a computational model for virtual agents. By adapting the agents' conversation management behavior to cultural background, we hope to achieve better acceptance in a given culture. We therefore take a closer look at the occurrence of pauses in speech and at features such as their length or placement. To ground our model in empirical data, we analyzed the occurrences of pauses in speech in the CUBE-G video corpus, recorded in the two participating cultures, Germany and Japan. In a preliminary study we observed the number of pauses that occurred in videos of approximately five minutes duration. First we took into account pauses that lasted for more than 1 second, and later only those among them that lasted for over 2 seconds. Comparing the two cultures, we found that Japanese subjects used significantly more pauses of both lengths than German subjects.

Author Keywords
Embodied conversational agent, Pauses in speech, cross-cultural communication

ACM Classification Keywords
H.5.2 [Information interfaces and presentation (e.g., HCI)]: User Interfaces: interaction styles, Natural language, Theory and methods;

INTRODUCTION
Knapp and Vangelisti [11] examine personal relationships and their impact on interpersonal communication. To describe how a friendship between males can be deepened through silence, they cite Roger Rosenblatt, who wrote an article for Time magazine called "The Silent Friendship of Men":

(…) Older Story: Wordsworth goes to visit Coleridge at his cottage, walks in, sits down and does not utter a word for three hours. Neither does Coleridge. Wordsworth then arises and, as he leaves, thanks his friend for a perfect evening. (…)

Would the same "conversation" have taken place if Mrs. Wordsworth and Mrs. Coleridge had met? Or if Wordsworth and Coleridge had never met before? There are differences in the usage of silence in speech. But where do they come from? Some are evoked by gender or age, others by personal relationships. The use of pauses also varies across cultures.

We want to use tendencies in the frequency of pauses in speech, described in the literature and confirmed by our corpus study, to adapt the dialogue model for embodied conversational agents (ECAs) to a specific cultural model. ECAs can be regarded as a special case of multimodal dynamic interaction systems. They support the idea that humans prefer to interact with an artefact that possesses some human-like qualities. In The Media Equation [15] the authors state that people respond to computers as if they were humans. Thus people might also build up social relationships with virtual agents. To enhance the believability of those agents, they could be extended with a cultural background. Following Hofstede [8], human behavior depends on human nature, culture and personality. Although cultural background plays an important role in human interaction and virtual agents communicate with the user in a natural way, so far little effort has been made to integrate cultural differences into technical systems.

We believe that realising culture specific dialogue management styles for ECAs could enhance their believability. As the usage of pauses in speech is one important aspect of dialogue management, we want to take a closer look at their occurrences to answer the following questions. How often and when do pauses take place? How long do they last? Who breaks the silence? What kind of speech acts are followed by pauses, and which utterances are used for start ups? As a starting point we concentrate on the number of pauses in a conversation, namely pauses that last for more than 1 second and 2 seconds respectively.

This paper is organized as follows: First we describe related work where ECAs already use silence in speech explicitly, although they do not use cultural differences. In the next section we give an overview of the usage of pauses in speech and their cultural differences. We then explain a feasible enhancement for virtual agents, which should also serve as a basis for further research. Then a preliminary study is described, in which the frequency of pauses in conversation is analysed in the two cultures, Japanese and German. At the end of this paper, we discuss our results and give an outlook on our future work.

RELATED WORK
Although ECAs communicate in an increasingly human manner, so far little effort has been made to integrate cultural context, for example the different usage of silence in speech. Pauses in speech do occur in dialog simulations, but they often arise due to a lack of speed in the speech components and thus appear distracting to the user. Nevertheless, pauses are used successfully to handle turn taking in some systems. So far a cultural aspect in the usage of silence has not been taken into account.

Sidner and colleagues [17] developed a model of engagement for a conversational robot, based on an analysis of human-human conversation. Engagement "is the process by which two (or more) participants establish, maintain and end their perceived connection during interactions they jointly undertake". The appropriate use and correct interpretation of engagement signals are necessary prerequisites for the success of an interaction. In particular, pauses are used to recognize inattentiveness of the user, which prompts the robot to show engagement behavior.

Pauses in speech are often used for grounding behavior in ECAs. Cassell and colleagues [4] present a Real Estate Agent (REA) that acts as a virtual realtor. In small talk situations she gains information about the user's preferences in buying a house. In [5] Cassell states that short pauses in speech lead to feedback behavior. Thus, the REA agent nods her head or emits a paraverbal (such as "Mmhmm") or a short statement (such as "Okay") in reaction to short pauses in the user's speech.

Nakano and colleagues [14] developed a grounding model for the kiosk agent Mack, which provides route descriptions for a paper map. The agent uses verbal and nonverbal grounding acts to update the state of the dialogue. They state that pauses influence the choice of following actions.

Traum and Heeman [19] also consider grounding behavior in dialogues. They examine the co-occurrence between turn-initial grounding acts and utterance unit signals, e.g. prosodic boundary tones and pauses. Silence was divided into two groups: short silence (less than half a second) and long silence (longer than half a second). Then correlations with boundary tones and relatedness markings were analysed. They found that long pauses are positively related to the previous utterance being grounded and that they seem to be an indicator of utterance unit completion.

Nakanishi and colleagues [13] describe a helper agent that plays the role of a party host in a virtual meeting space where different cultures meet. In this system silence is used to detect conversations that are going badly. When the helper agent detects a pause in speech, it directs a series of yes/no questions to both conversation partners in order to find a topic that is interesting for both. Although the agent is developed to help in intercultural encounters, the length of silence that triggers the agent is not adapted to culture. After analysing their results, the authors state that an adaptation to the user's cultural background would make the agent more efficient.

PAUSES IN SPEECH
According to Clark [3], pauses are powerful cues for what is happening in a conversation. To use them as a basis for analyzing culture specific behavior, we first have to check carefully what purposes pauses may serve in conversations and how their usage differs across cultures. As we want to build a computational model for Germany and Japan, those two cultures are of special interest.

In [6] Goodwin describes his research on gaze behavior and manipulation. According to him, gaze is used to manage turn taking and to signal understanding or attentiveness. If attention signals of the listener are missing, pauses are used by the speaker to regain attention. In this case the duration of the silence depends on the nonverbal signals of the hearer.

Pauses in speech can be used for the following purposes:

• cognitive processing
• control mechanism
• acceptance / refusal
• turn taking

Rochester [16] gives a brief history of studies dealing with filled and silent pauses. During a filled pause, sounds like uhmm and ahhm might occur, as well as nonverbal behaviors like head nods or gestures. In comparison a silent pause is, as the name suggests, silent. Rochester summarizes the history of research dealing with pauses in speech according to three models of the speaker.

In the first model pauses are supposed to reflect the strength or weakness of verbal habits; the second model extends the first and treats pauses as signalling cognitive decisions about both immediate and later speech. Here pauses are assumed to stand in a temporally proximal relationship to the choices to be made. Accordingly, two particular functions are supported: (a) pauses signal some word choices, and (b) they may reflect decisions at major constituent boundaries. A third function is semantic decision-making. The matter of content and the function of pauses for the speaker are examined here.

Until that point, the speaker is simply a language generator which pauses either in the course of normal decision-making operations or because of disruptions in those operations. However, the speaker can also be seen as a participant in the social act of speech. According to Rochester, "pauses and other phenomena of spontaneous speech should be functionally related to changes in the interpersonal situation and/or to changes in the responsiveness of the speaker, given a constant interpersonal situation". In his work he examines the theoretical implications of pause location. In addition, the functional significance of pauses is considered in terms of cognitive, affective-state, and social interaction variables. He found that two sorts of social interaction variables influence pausing in spontaneous speech:

• Mediating variables: e.g. changes in the audience situation and predispositional responsiveness to listeners, and
• Control variables: e.g. the number of potential speakers and the individual desire to speak.

In his work pauses in speech can be used either as a control mechanism that regulates the flow of the conversation or for cognitive processes such as decision making.

Another usage of pauses is described in [2], where politeness strategies are treated as an aspect of social interaction. The authors describe some parallelisms in the linguistic construction of utterances with which people express themselves in different languages and cultures. One motive for these parallels is isolated: politeness. They claim the existence of conversational structure sequences and, with it, the intentional usage of pauses for politeness purposes. Note that a carefully located pause can mean acceptance on the one hand and refusal on the other. In their example (where A is a man, and W his friend's new bride) the silence conveys acceptance:

A: Do you sing?
W: (silence)
A: Hooray! Give us a song!

Silence can also be a polite refusal, as in a situation where A writes to B for a favour and B does not reply.

Thus, pauses can be used to express refusal or acceptance in a polite way. But the interpretation of the pause remains a challenge to the interlocutor.

Another common use of pauses in conversations is to initiate turn taking behavior. Louis ten Bosch and colleagues [1] state that turn-taking is one of the basic mechanisms in all types of dialogues and that it is also a crucial mechanism in human-system interaction. They analysed the turn-taking mechanism in 93 telephone dialogues recorded in the Netherlands. Temporal phenomena of turn taking, such as the duration of pauses and overlaps of turns in dialogues, were investigated. Pauses were divided into pauses between turns and pauses between utterances within turns, and the average pause duration per dialogue was calculated. Their analysis shows that speakers adapt their turn-taking behaviour according to the average pause duration in the given conversation.

These results illustrate that people belonging to the same culture adapt their pause behavior in turn taking to each other. But the usage of silence in speech is also a well known difference between cultures [18]. This might lead to problems and misunderstandings in intercultural encounters.

CULTURE SPECIFIC DIFFERENCES
Hall, cited in [18], describes high- and low-context cultures. In high context (HC) communication little is encoded explicitly and the conversation relies mainly on physical context. We find HC communication in long lasting friendships, where conversations are difficult for outsiders to understand. Besides verbal utterances, meaning is transported through context (e.g. social roles or positions), situation, nonverbal cues (e.g. pauses, silence, tone) and cultural information. In contrast, low context (LC) communication encodes information explicitly. Therefore clear descriptions, unambiguous communication and a high degree of specificity are required.

The degree of context used in communication depends on culture. Germany is explicitly named in [7] as probably one of the lowest context cultures. Japan, however, like most Asian cultures, belongs to the high context cultures, where communication partners are expected to be able to encode the implicit intent of the verbal message. Hall (1983), cited in [7], claims that silence serves as a critical communication device in Japanese communication patterns. Pauses reflect the thoughts of the speaker and can contain strong contextual meaning. In European conversations pauses are often sensed as unpleasant. Thus we expect people belonging to the Japanese culture to use pauses more frequently than Germans.

As culture is a rather abstract concept, there have been several attempts at building a concrete model. Hofstede [8] explains culture as a dimensional concept. His theory is based on a broad empirical survey in which over 20 different cultures were categorized into a five-dimensional model. Each dimension has two extreme sides, for which he clearly defines stereotypical behavior norms. He defines a given culture as a point in a five-dimensional space, according to the dimensions.

One of these dimensions is the so-called identity dimension, with the two extreme sides individualism and collectivism. It defines the degree to which individuals are integrated into a group. On the individualist side, ties between individuals are loose, and people are expected to take care of themselves. On the collectivist side, people are integrated into strong, cohesive in-groups, often extended families, which continue protecting them in exchange for unquestioning loyalty.

According to Hofstede, Germany lies on the individualistic side of this dimension, whereas Japan is a collectivistic culture. In [9] he states that in collectivistic cultures silence may occur in conversations without creating tension. Thus we expect to find more pauses in the Japanese conversations than in German ones, as the latter should try to avoid embarrassing situations like silence, whereas the Japanese should not feel uncomfortable. Pauses are used as a means of conversation in Japan. But does this not hold for every culture? In [12], Morsbach warns not to read too much into the Japanese way of using silence and not to mystify it. He refers to the so-called "Rare-Zero Differential", which means that something is rare in one culture but completely nonexistent in another, and is thus taken as typical for the former. He refers to phenomena like kimonos or geishas, which tourists visiting Japan tend to see more often than nationals. But still he states that in specific situations there are differences in the usage of silence, e.g. mother-child relationships or female behavior and hiding of feelings. He also reveals that the Japanese are often regarded as "silent", whereas westerners tend to be revered by the Japanese as "verbose". He agrees that the average Japanese will use more pauses in speech than the average American, but additionally he states that there will be overlaps.

EMPIRICAL DATA
According to the literature overview given above, we hypothesize that in Japanese conversations pauses in speech will occur more frequently than in German conversations. To ground our expectations about culture specific dialogue management in empirical data, we additionally analysed the video corpus of the Cube-G project (CUlture-adaptive BEhavior Generation for interactions with embodied conversational agents). For this project around 20 hours of video material were collected in the two participating cultures, Japanese and German, with the aim of analysing nonverbal behavior. It is organized as follows. Subjects were told that they were taking part in a study by a well-known consulting company for the automobile industry. To attract their interest in the study, a monetary reward was granted depending on the outcome. One of the recorded scenes was a first time meeting, which is a variation of the standard first chapter of every language textbook. This includes a short introduction and small talk. We told our subjects that they should get to know each other slightly to be able to solve a task together later. This scenario takes about five minutes for every subject. The same design was used in Germany as well as in Japan. 21 subjects (10 female, 11 male) joined the study in Germany and 26 (13 female, 13 male) in Japan. To ensure that they all met the same conditions, we hired actors as communication partners. To control for gender effects, we had a male and a female actor, interacting with the same number of male and female subjects.

Figure 1. Details from the corpus collection in both cultures, which serves as empirical data for the analysis of conversation management described in this paper.

ANNOTATION
In order to analyse the corpus, it was annotated using the Anvil tool [10]. First, the video sequences had to be transliterated and translated into English to allow analysis in both cultures. Figure 2 shows an example with a German subject.

With the annotation of speech, we were able to calculate gaps between speech sequences. Time spans in which neither the subject nor the actor spoke were automatically computed and saved as pauses. Thus silent pauses and filled pauses that were filled by nonverbal cues were observed. To sort out short silences (like those while breathing or hesitating) we only observed pauses that last for more than one second. In a later analysis we restricted ourselves to pauses over 2 seconds. Please note that pauses over 2 seconds are also included in those that last for more than 1 second. As this paper only describes a preliminary study, we did not yet take into account the placement of pauses, but claim that further analysis of culture specific usage of pauses in speech seems to be a promising research field.

Figure 2. Example annotation with a German subject.

As a starting point for realizing culture specific communication management behaviors for virtual agents, we first need to take a closer look at intra-cultural communication, to answer the following question: Are the observed communication management behaviors typical for the given culture, or do they simply show up because of personality, age or gender? Thus we started with an in-depth analysis of the German samples, to compare the impact of gender combinations. As described above, the corpus was recorded in all gender constellations. The situation in which the conversation takes place also influences the communication, as does the given interlocutor. Therefore we restricted our analysis to one typical scene out of our video corpus. As it was recorded with students, participants are all in the same age group. Later we compare the German samples with the Japanese video recordings.

ANALYSIS
As a preliminary study we analysed eight German videos with four male and four female subjects. To fix as many conditions as possible, we chose to examine only videos from the first time meeting scenario. All gender combinations were observed, in order to analyse differences in the occurrence of pauses in mixed and same gender combinations respectively.

Table 1 shows an overview of the pauses in speech in the German videos. We found 7.1 pauses on average that lasted for more than one second, and only 1.3 pauses on average that lasted for more than 2 seconds in the 8 videos, which were all approximately 5 minutes long. A comparison of female and male subjects showed no significant difference in the usage of pauses (t-test), neither for pauses over 1 second (p=0.748) nor for pauses over 2 seconds (p=0.750). The same holds for pauses in videos with mixed gender combinations compared to those where both subjects had the same gender (p1sec=0.795; p2sec=0.578). An interesting point for further research is that pauses over 2 seconds which occurred after an utterance spoken by the male conversation partner were never broken by a female. All other combinations of breaking silence took place. These results have to be taken with care, as we only analysed eight video samples for the German culture.

Subject/Pauses   m    m    m    s-f  m    s-m  m    m
> 1 sec          14   1    7    4    4    12   12   3
> 2 sec          2    0    2    1    0    2    2    0

Table 1. Overview of the pauses in speech in the German video samples (where m=mixed gender; s=same gender; f=female; m=male)

As for the German video samples, we analysed eight Japanese videos with 4 female and 4 male subjects, where all gender combinations took place. Like the videos analysed above, the Japanese samples are from the first time meeting scenario and lasted about five minutes.

Table 2 shows an overview of the pauses used in the Japanese video recordings. We found 31 pauses on average per video that lasted over 1 second and 8.4 pauses on average per video that lasted for more than 2 seconds. As for the German videos, we found no significant difference in the usage of pauses between the genders (t-test; p1sec=0.770; p2sec=0.252). Again, different gender combinations showed no significant results compared with same gender constellations (p1sec=0.473; p2sec=0.425).

Subject/ s- s-
Pauses           s-f  m    s-f  s-f  m    m    m    m
> 1 sec          40   20   27   34   26   36   35   30
> 2 sec          12   4    6    7    10   10   10   8

Table 2. Overview of the pauses in speech in the Japanese video samples (where m=mixed gender; s=same gender; f=female; m=male)

Figure 3. The usage of short (left) and long pauses (right) in speech in the two cultures Germany and Japan.

Interestingly, in the Japanese videos, too, no situation was found where the female conversation partner broke a silence that was longer than two seconds when the male communication partner spoke the last utterance. All other combinations took place.

Comparing the flow of conversation between the two cultures, the results are promising. As suggested in the literature, the Japanese video samples contain noticeably more pauses. We found significant differences between the two cultures (t-test), both for pauses over 1 second (p<0.001) and for pauses over 2 seconds (p<0.001). Figure 3 shows the box plots for short (left) and long pauses (right), where the difference in the usage of pauses between the two cultures is shown graphically.

CONCLUSION AND FUTURE WORK
In this paper we gave a brief overview of the usage of pauses in speech and focused on differences caused by cultural background. By comparing the two cultures Germany and Japan in a preliminary study, we found promising results. As predicted in the literature, Japanese subjects showed significantly higher numbers of pauses between speech utterances than German subjects. Thus we emphasize this as a promising research field with the aim of integrating cultural differences in embodied conversational agents. Although the results are promising, we do not want to declare prototypes, but think we found interesting tendencies for further exploration. As future work, we need to analyse all videos recorded for the CUBE-G corpus, in order to strengthen our results. Additionally we want to take a closer look at the positions where pauses take place, to answer the following questions: Who breaks the silence? What kind of speech acts are followed by pauses, and which utterances are used for start ups? Therefore we need to categorise the speech utterances, which also allows an analysis of sequences of speech utterances that evoke pauses.

ACKNOWLEDGMENTS
The work described in this article is funded by the German Research Foundation (DFG) under research grant RE 2619/2-1 and the Japan Society for the Promotion of Science (JSPS) under a Grant-in-Aid for Scientific Research (C) (19500104). The authors would like to thank Prof. Toyoaki Nishida and Hung-Hsuan Huang for their support collecting the Japanese corpus and Afia Akhter Lipi, Yuji Yamaoka and Franziska Grüneberg for annotating the video samples.

REFERENCES
1. Bosch, ten L., Oostdijk, N., Ruiter, de J. P., Turn-taking in social talk dialogues: temporal, formal and functional aspects. In Proceedings SPECOM 2004.
2. Brown, P., & Levinson, S. C. (1987). Politeness: Some universals in language use. New York: Cambridge University Press.
3. Clark, H. H., Using Language. Cambridge, England: Cambridge University Press. 1996.
4. Cassell, J., Embodied conversational interface agents. In Communications of the ACM. Vol. 43, No. 4, April 2000.
5. Cassell, J., Nakano, Y., Bickmore, T., Sidner, C. L., and Rich, C. Non-verbal cues for discourse structure. In Proc. of the 39th Annual Meeting of the Association for Computational Linguistics (ACL), 2001.
6. Charles Goodwin. Conversational Organisation - Interaction between Speakers and Hearers. New York: Academic Press, 1981.
7. Hecht, M. L., Andersen, P. A., Ribeau, S. A., The Cultural Dimensions of Nonverbal Communication. In Asante, M. K., Gudykunst, W. B., Handbook of International and Intercultural Communication. (p. 163-185). London: Sage Publications. (1989).
8. Hofstede, G. Culture's Consequences: Comparing Values, Behaviors, Institutions, and Organizations Across Nations. Thousand Oaks, London: Sage Publications. (2001).
9. Hofstede, G. J., Pedersen, P. B., & Hofstede, G. Exploring Culture: Exercises, Stories, and Synthetic Cultures. Yarmouth: Intercultural Press. 2002.
10. Kipp, M. Gesture Generation by Imitation: From Human Behavior to Computer Character Animation. Universität des Saarlandes, PhD Thesis. 2003.
11. Knapp, M. L., Vangelisti, A. L., Interpersonal Communication and Human Relationships. 5th ed. Pearson Education. 2005.
12. Morsbach, H., The Importance of Silence and Stillness in Japanese Nonverbal Communication: A Cross-Cultural Approach. In Fernando Poyatos (Ed.), Cross-Cultural Perspectives in Nonverbal Communication. C.J. Hogrefe, 1988.
13. Nakanishi, H., Ishida, T., Isbister, K., and Nass, C., Designing a Social Agent for Virtual Meeting Space. In S. Payr & R. Trappl (Eds.), Agent Culture: Human-Agent Interaction in a Multicultural World (p. 245-266). London: Lawrence Erlbaum Associates. (2004).
14. Nakano, Y. I., Reinstein, G., Stocky, T., and Cassell, J., Towards a model of face-to-face grounding. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2003), pages 553-561, 2003.
15. Reeves, B., & Nass, C. The Media Equation: How People Treat Computers, Television, and New Media Like Real People and Places. Cambridge: Cambridge University Press. (1996).
16. Rochester, S. R., The Significance of Pauses in Spontaneous Speech. In Journal of Psycholinguistic Research, Vol. 2, No. 1, 1973.
17. Sidner, C. L., Kidd, C. D., Lee, C., and Lesh, N. Where to look: a study of human-robot engagement. In IUI '04: Proceedings of the 9th international conference on Intelligent user interface, pages 78-84, New York, NY, USA, 2004. ACM Press.
18. Ting-Toomey, S. Communicating across cultures. New York: The Guilford Press. 1999.
19. Traum, D. and P. Heeman. Utterance Units and Grounding in Spoken Dialogue. In ICSLP. 1996.
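
The pause counting described in the ANNOTATION section and the cross-cultural comparison from the ANALYSIS section can be illustrated with a short Python sketch. It is not the authors' tooling: the function names `find_pauses` and `welch_t` and the toy speech intervals are hypothetical. The paper does not say which t-test variant was used, so Welch's form is an assumption here (with equal group sizes it gives the same t statistic as the pooled form). Only the t statistic is computed, not a p-value.

```python
from math import sqrt
from statistics import mean, variance


def find_pauses(speech_intervals, min_length=1.0):
    """Return (start, end) gaps longer than min_length seconds in which nobody speaks.

    speech_intervals holds (start, end) times in seconds for every annotated
    utterance of either conversation partner, in any order.
    """
    pauses = []
    covered_until = None  # latest end time of any speech seen so far
    for start, end in sorted(speech_intervals):
        if covered_until is not None and start - covered_until > min_length:
            pauses.append((covered_until, start))
        covered_until = end if covered_until is None else max(covered_until, end)
    return pauses


def welch_t(sample_a, sample_b):
    """t statistic for two independent samples without assuming equal variances."""
    standard_error = sqrt(variance(sample_a) / len(sample_a)
                          + variance(sample_b) / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Hypothetical annotation of one short exchange (times in seconds);
# the first two utterances overlap, so no gap is counted between them.
toy_speech = [(0.0, 2.5), (2.0, 4.0), (5.5, 7.0), (9.5, 11.0)]
pauses_over_1s = find_pauses(toy_speech, min_length=1.0)
pauses_over_2s = find_pauses(toy_speech, min_length=2.0)

# Per-video counts of pauses over 1 second, copied from Tables 1 and 2.
german_over_1s = [14, 1, 7, 4, 4, 12, 12, 3]
japanese_over_1s = [40, 20, 27, 34, 26, 36, 35, 30]
t_over_1s = welch_t(japanese_over_1s, german_over_1s)

print(pauses_over_1s, pauses_over_2s, round(t_over_1s, 2))
```

Merging the speech of both partners before looking for gaps mirrors the definition of a pause as a time span in which neither the subject nor the actor spoke. It also makes the pauses over 2 seconds a subset of those over 1 second, as noted in the ANNOTATION section.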