DOI: 10.1145/3405755.3406120

Persuasive Synthetic Speech: Voice Perception and User Behaviour

Published: 22 July 2020

Abstract

Previous research indicates that synthetic speech can be as persuasive as human speech. However, there is a lack of empirical validation on interactive, goal-oriented tasks. In our two-stage study (an online listening test and a lab evaluation), we compared participants' perception of the persuasiveness of synthetic voices created from speech in a debating style vs. speech from audio-books. Participants interacted with our Conversational Agent (CA) to complete four flight-booking tasks and were asked to evaluate the voice, the message, and perceived personal qualities. We found that participants who interacted with the CA using the voice created from debating-style speech rated it as significantly more truthful and more involved than the CA using the audio-book-based voice. However, there was no difference in how frequently each group followed the CA's recommendations. We hope our investigation will provoke discussion about the impact of different synthetic voices on users' perceptions of CAs in goal-oriented tasks.




Published In

CUI '20: Proceedings of the 2nd Conference on Conversational User Interfaces
July 2020, 271 pages
ISBN: 9781450375443
DOI: 10.1145/3405755

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Speech Perception
  2. Speech Synthesis
  3. User Behaviour

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CUI '20

Acceptance Rates

CUI '20 paper acceptance rate: 13 of 39 submissions (33%)
Overall acceptance rate: 34 of 100 submissions (34%)

Article Metrics

  • Downloads (last 12 months): 108
  • Downloads (last 6 weeks): 24
Reflects downloads up to 03 Oct 2024

Citations

Cited By

  • (2024) Cooking with Conversation: Enhancing User Engagement and Learning with a Knowledge-Enhancing Assistant. ACM Transactions on Information Systems 42(5), 1-29. https://doi.org/10.1145/3649500. Online publication date: 29-Apr-2024.
  • (2024) Voicecraft: Designing Task-specific Voice Assistant Personas. Proceedings of the 6th ACM Conference on Conversational User Interfaces, 1-3. https://doi.org/10.1145/3640794.3670000. Online publication date: 8-Jul-2024.
  • (2024) The Impact of Perceived Tone, Age, and Gender on Voice Assistant Persuasiveness in the Context of Product Recommendations. Proceedings of the 6th ACM Conference on Conversational User Interfaces, 1-15. https://doi.org/10.1145/3640794.3665545. Online publication date: 8-Jul-2024.
  • (2024) Examining Humanness as a Metaphor to Design Voice User Interfaces. Proceedings of the 6th ACM Conference on Conversational User Interfaces, 1-15. https://doi.org/10.1145/3640794.3665535. Online publication date: 8-Jul-2024.
  • (2024) Speech as Interactive Design Material (SIDM): How to design and evaluate task-tailored synthetic voices? Companion Proceedings of the 29th International Conference on Intelligent User Interfaces, 131-133. https://doi.org/10.1145/3640544.3645258. Online publication date: 18-Mar-2024.
  • (2024) Impact of Voice Fidelity on Decision Making: A Potential Dark Pattern? Proceedings of the 29th International Conference on Intelligent User Interfaces, 181-194. https://doi.org/10.1145/3640543.3645202. Online publication date: 18-Mar-2024.
  • (2024) CUI@CHI 2024: Building Trust in CUIs—From Design to Deployment. Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems, 1-7. https://doi.org/10.1145/3613905.3636287. Online publication date: 11-May-2024.
  • (2023) The Bot on Speaking Terms: The Effects of Conversation Architecture on Perceptions of Conversational Agents. Proceedings of the 5th International Conference on Conversational User Interfaces, 1-16. https://doi.org/10.1145/3571884.3597139. Online publication date: 19-Jul-2023.
  • (2023) "Begin with the End in Mind": Incorporating UX Evaluation Metrics into Design Materials of Participatory Design. Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, 1-7. https://doi.org/10.1145/3544549.3585664. Online publication date: 19-Apr-2023.
  • (2022) Conversational Agents Trust Calibration. Proceedings of the 4th Conference on Conversational User Interfaces, 1-6. https://doi.org/10.1145/3543829.3544518. Online publication date: 26-Jul-2022.
