DOI: 10.1145/3405755.3406120

Persuasive Synthetic Speech: Voice Perception and User Behaviour

Published: 22 July 2020

Abstract

Previous research indicates that synthetic speech can be as persuasive as human speech. However, there is a lack of empirical validation on interactive, goal-oriented tasks. In our two-stage study (an online listening test and a lab evaluation), we compared participants' perception of the persuasiveness of synthetic voices created from speech in a debating style vs. speech from audio-books. Participants interacted with our Conversational Agent (CA) to complete four flight-booking tasks and were asked to evaluate the voice, the message, and perceived personal qualities. We found that participants who interacted with the CA using the voice created from debating-style speech rated it as significantly more truthful and more involved than the CA using the audio-book-based voice. However, there was no difference in how frequently each group followed the CA's recommendations. We hope our investigation will provoke discussion about the impact of different synthetic voices on users' perceptions of CAs in goal-oriented tasks.




Published In

CUI '20: Proceedings of the 2nd Conference on Conversational User Interfaces
July 2020, 271 pages
ISBN: 9781450375443
DOI: 10.1145/3405755

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Speech Perception
  2. Speech Synthesis
  3. User Behaviour

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CUI '20

Acceptance Rates

CUI '20 paper acceptance rate: 13 of 39 submissions (33%)
Overall acceptance rate: 34 of 100 submissions (34%)

Article Metrics

  • Downloads (last 12 months): 108
  • Downloads (last 6 weeks): 24
Reflects downloads up to 03 Oct 2024

Citations

Cited By

  • (2024) Cooking with Conversation: Enhancing User Engagement and Learning with a Knowledge-Enhancing Assistant. ACM Transactions on Information Systems 42(5), 1-29. https://doi.org/10.1145/3649500. Online publication date: 29-Apr-2024.
  • (2024) Voicecraft: Designing Task-specific Voice Assistant Personas. Proceedings of the 6th ACM Conference on Conversational User Interfaces, 1-3. https://doi.org/10.1145/3640794.3670000. Online publication date: 8-Jul-2024.
  • (2024) The Impact of Perceived Tone, Age, and Gender on Voice Assistant Persuasiveness in the Context of Product Recommendations. Proceedings of the 6th ACM Conference on Conversational User Interfaces, 1-15. https://doi.org/10.1145/3640794.3665545. Online publication date: 8-Jul-2024.
  • (2024) Examining Humanness as a Metaphor to Design Voice User Interfaces. Proceedings of the 6th ACM Conference on Conversational User Interfaces, 1-15. https://doi.org/10.1145/3640794.3665535. Online publication date: 8-Jul-2024.
  • (2024) Speech as Interactive Design Material (SIDM): How to design and evaluate task-tailored synthetic voices? Companion Proceedings of the 29th International Conference on Intelligent User Interfaces, 131-133. https://doi.org/10.1145/3640544.3645258. Online publication date: 18-Mar-2024.
  • (2024) Impact of Voice Fidelity on Decision Making: A Potential Dark Pattern? Proceedings of the 29th International Conference on Intelligent User Interfaces, 181-194. https://doi.org/10.1145/3640543.3645202. Online publication date: 18-Mar-2024.
  • (2024) CUI@CHI 2024: Building Trust in CUIs—From Design to Deployment. Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems, 1-7. https://doi.org/10.1145/3613905.3636287. Online publication date: 11-May-2024.
  • (2023) The Bot on Speaking Terms: The Effects of Conversation Architecture on Perceptions of Conversational Agents. Proceedings of the 5th International Conference on Conversational User Interfaces, 1-16. https://doi.org/10.1145/3571884.3597139. Online publication date: 19-Jul-2023.
  • (2023) "Begin with the End in Mind": Incorporating UX Evaluation Metrics into Design Materials of Participatory Design. Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, 1-7. https://doi.org/10.1145/3544549.3585664. Online publication date: 19-Apr-2023.
  • (2022) Conversational Agents Trust Calibration. Proceedings of the 4th Conference on Conversational User Interfaces, 1-6. https://doi.org/10.1145/3543829.3544518. Online publication date: 26-Jul-2022.
