Combining Text-to-Speech Services with Conventional Voiceover for News Oralization

Afonso, Marcelo; Almeida, Pedro

doi:10.1007/978-3-031-45611-4_5

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1820)

Included in the following conference series:

Iberoamerican Conference on Applications and Usability of Interactive TV

Abstract

The surge in digital content consumption has, in many cases, posed challenges for media companies, reducing revenue and forcing them to reinvent their business models. The digitalization of content has introduced new consumption formats, and news podcasts are already an established part of this landscape. Although the format is relatively recent in journalism, its growing popularity makes it an appealing addition to the field. However, producing podcasts can be demanding in terms of time, resources, and technical expertise. This paper therefore focuses on facilitating the creation of news podcasts. To this end, we propose using Text-to-Speech (TTS) technologies for the oralization of journalistic texts in European Portuguese. We tested TTS services from Amazon Polly and Google Speech Cloud, with the Google Speech Cloud Wavenet services achieving the best results among potential users. Additionally, we developed three podcast models incorporating human voiceover and/or TTS to assess users' acceptance of each: one using only human voices, one using only TTS voices, and a hybrid integrating both types of voice. The presence of a human voice positively influenced the results, with the human-voice and hybrid models outperforming the TTS-only model. However, the differences between the models were not pronounced, and the results indicate that Text-to-Speech technology is accepted in the context of news podcasts. Nonetheless, continued technological advancement is still needed for synthesized speech to converge with human discourse.

Author information

Authors and Affiliations

Digimedia, University of Aveiro, 3810-193, Aveiro, Portugal
Marcelo Afonso & Pedro Almeida

Authors

Marcelo Afonso
Pedro Almeida

Corresponding author

Correspondence to Pedro Almeida.

Editor information

Editors and Affiliations

Faculty of Informatics, National University of La Plata, La Plata, Argentina
María José Abásolo
University of Córdoba, Córdoba, Spain
Carlos de Castro Lozano
ESPE, Sangolquí, Ecuador
Gonzalo F. Olmedo Cifuentes

About this paper

Cite this paper

Afonso, M., Almeida, P. (2023). Combining Text-to-Speech Services with Conventional Voiceover for News Oralization. In: Abásolo, M.J., de Castro Lozano, C., Olmedo Cifuentes, G.F. (eds) Applications and Usability of Interactive TV. jAUTI 2022. Communications in Computer and Information Science, vol 1820. Springer, Cham. https://doi.org/10.1007/978-3-031-45611-4_5

DOI: https://doi.org/10.1007/978-3-031-45611-4_5
Published: 18 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45610-7
Online ISBN: 978-3-031-45611-4
eBook Packages: Computer Science, Computer Science (R0)
