
Studying Mutual Phonetic Influence with a Web-Based Spoken Dialogue System

  • Conference paper
Speech and Computer (SPECOM 2018)

Abstract

This paper presents a study on mutual speech variation influences in a human-computer setting. The study highlights behavioral patterns in data collected as part of a shadowing experiment, and is performed using a novel end-to-end platform for studying phonetic variation in dialogue. It includes a spoken dialogue system capable of detecting and tracking the state of phonetic features in the user’s speech and adapting accordingly. It provides visual and numeric representations of the changes in real time, offering a high degree of customization, and can be used for simulating or reproducing speech variation scenarios. The replicated experiment presented in this paper along with the analysis of the relationship between the human and non-human interlocutors lays the groundwork for a spoken dialogue system with personalized speaking style, which we expect will improve the naturalness and efficiency of human-computer interaction.

Notes

  1. As calculated by the kappa2 command of the irr R package (v0.84), https://cran.r-project.org/package=irr.
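For readers unfamiliar with the statistic, the kappa2 command computes Cohen's kappa, a chance-corrected agreement score for two raters. The following is a minimal Python sketch of the unweighted form, not the irr implementation itself; the function name `cohen_kappa` and the example labels are hypothetical and are not taken from the paper's data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty item set")

    n = len(rater_a)

    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each category, the product of the two raters'
    # marginal proportions, summed over all categories.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both raters used one identical category throughout; kappa is undefined.
        return float("nan")

    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations, for illustration only.
labels_a = ["yes", "yes", "no", "no"]
labels_b = ["yes", "no", "no", "no"]
kappa = cohen_kappa(labels_a, labels_b)
```

Values near 1 indicate agreement well above chance, and values near 0 indicate agreement no better than chance.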

Acknowledgments

Funded by the German Research Foundation (DFG) under grants STE 2363/1 and MO 597/6.

Author information

Correspondence to Eran Raveh.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Raveh, E., Steiner, I., Gessinger, I., Möbius, B. (2018). Studying Mutual Phonetic Influence with a Web-Based Spoken Dialogue System. In: Karpov, A., Jokisch, O., Potapova, R. (eds) Speech and Computer. SPECOM 2018. Lecture Notes in Computer Science, vol 11096. Springer, Cham. https://doi.org/10.1007/978-3-319-99579-3_57

  • DOI: https://doi.org/10.1007/978-3-319-99579-3_57

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99578-6

  • Online ISBN: 978-3-319-99579-3

  • eBook Packages: Computer Science (R0)
