Abstract
This paper presents a study of mutual influences on speech variation in a human-computer setting. The study highlights behavioral patterns in data collected during a shadowing experiment and is performed using a novel end-to-end platform for studying phonetic variation in dialogue. The platform includes a spoken dialogue system capable of detecting and tracking the state of phonetic features in the user’s speech and adapting accordingly. It provides visual and numeric representations of the changes in real time, offers a high degree of customization, and can be used to simulate or reproduce speech variation scenarios. The replicated experiment presented in this paper, together with the analysis of the relationship between the human and non-human interlocutors, lays the groundwork for a spoken dialogue system with a personalized speaking style, which we expect will improve the naturalness and efficiency of human-computer interaction.
Notes
1. As calculated by the kappa2 command of the irr R package (v0.84), https://cran.r-project.org/package=irr.
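For orientation, kappa2 reports Cohen's kappa for two raters. The following is a minimal Python sketch of the unweighted statistic, given as an illustration only: the function name `cohen_kappa` and the example labels are hypothetical and come neither from the paper nor from the irr package.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: summed product of the raters' marginal label proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical annotations (assumption for illustration, not data from the paper).
labels_a = ["converged", "converged", "maintained", "converged", "maintained", "maintained"]
labels_b = ["converged", "maintained", "maintained", "converged", "maintained", "converged"]
kappa = cohen_kappa(labels_a, labels_b)
```

Kappa corrects the raw agreement rate for the agreement two raters would reach by chance given their label frequencies, which is why it is usually preferred over plain percent agreement.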
Acknowledgments
Funded by the German Research Foundation (DFG) under grants STE 2363/1 and MO 597/6.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Raveh, E., Steiner, I., Gessinger, I., Möbius, B. (2018). Studying Mutual Phonetic Influence with a Web-Based Spoken Dialogue System. In: Karpov, A., Jokisch, O., Potapova, R. (eds) Speech and Computer. SPECOM 2018. Lecture Notes in Computer Science, vol 11096. Springer, Cham. https://doi.org/10.1007/978-3-319-99579-3_57
DOI: https://doi.org/10.1007/978-3-319-99579-3_57
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99578-6
Online ISBN: 978-3-319-99579-3
eBook Packages: Computer Science, Computer Science (R0)