Studying Mutual Phonetic Influence with a Web-Based Spoken Dialogue System

Raveh, Eran; Steiner, Ingmar; Gessinger, Iona; Möbius, Bernd

doi:10.1007/978-3-319-99579-3_57

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11096)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

This paper presents a study of mutual influences on speech variation in a human-computer setting. The study highlights behavioral patterns in data collected as part of a shadowing experiment and is performed using a novel end-to-end platform for studying phonetic variation in dialogue. The platform includes a spoken dialogue system capable of detecting and tracking the state of phonetic features in the user’s speech and adapting accordingly. It provides visual and numeric representations of the changes in real time, offers a high degree of customization, and can be used for simulating or reproducing speech variation scenarios. The replicated experiment presented in this paper, along with the analysis of the relationship between the human and non-human interlocutors, lays the groundwork for a spoken dialogue system with personalized speaking style, which we expect will improve the naturalness and efficiency of human-computer interaction.

Notes

1. As calculated by the kappa2 command of the irr R package (v0.84), https://cran.r-project.org/package=irr.
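
The kappa2 command named in this note computes Cohen's kappa, a chance-corrected agreement score for two raters. As a rough illustration of that statistic only, the unweighted form can be sketched in Python as below. The function name cohen_kappa and the example labels are hypothetical and are not taken from the paper, whose annotation data is not reproduced here.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty item list")

    n = len(rater_a)

    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal proportions, summed over all labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")

    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical labels: did the annotator hear convergence in each item?
    annotator_1 = ["conv", "conv", "none", "none"]
    annotator_2 = ["conv", "none", "none", "none"]
    print(cohen_kappa(annotator_1, annotator_2))
```

A value of 1 means perfect agreement and 0 means agreement no better than chance. The sketch does not implement the weighted variants that kappa2 also offers.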

Acknowledgments

Funded by the German Research Foundation (DFG) under grants STE 2363/1 and MO 597/6.

Author information

Authors and Affiliations

Language Science and Technology, Saarland University, Saarbrücken, Germany
Eran Raveh, Ingmar Steiner, Iona Gessinger & Bernd Möbius
Multimodal Computing and Interaction, Saarland University, Saarbrücken, Germany
Eran Raveh, Ingmar Steiner & Iona Gessinger
German Research Center for Artificial Intelligence (DFKI GmbH), Saarbrücken, Germany
Ingmar Steiner

Corresponding author

Correspondence to Eran Raveh.

Editor information

Editors and Affiliations

SPIIRAS, St. Petersburg, Russia
Alexey Karpov
Leipzig University of Telecommunications, Leipzig, Germany
Oliver Jokisch
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova

About this paper

Cite this paper

Raveh, E., Steiner, I., Gessinger, I., Möbius, B. (2018). Studying Mutual Phonetic Influence with a Web-Based Spoken Dialogue System. In: Karpov, A., Jokisch, O., Potapova, R. (eds) Speech and Computer. SPECOM 2018. Lecture Notes in Computer Science, vol 11096. Springer, Cham. https://doi.org/10.1007/978-3-319-99579-3_57

DOI: https://doi.org/10.1007/978-3-319-99579-3_57
Published: 25 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99578-6
Online ISBN: 978-3-319-99579-3
eBook Packages: Computer Science, Computer Science (R0)
