Abstract
The surge in digital content consumption has, in many cases, challenged media companies, reducing their revenue and forcing them to reinvent their business models. The digitalization of content has introduced new consumption formats, and news podcasts are already an established part of this landscape. Although the format is relatively recent in journalism, its growing popularity makes it an appealing addition to the field. However, podcast production can be demanding in terms of time, resources, and technical expertise. This paper therefore focuses on facilitating the creation of news podcasts. To this end, we propose using Text-to-Speech (TTS) technologies for the oralization of journalistic texts in European Portuguese. We tested TTS services from Amazon Polly and Google Speech Cloud, with the Google Speech Cloud WaveNet voices yielding better results among potential users. We also developed three podcast models to assess user acceptance: one using only human voiceover, one using only TTS voices, and a hybrid combining both. The presence of a human voice positively influenced the results, with the human-voice and hybrid models outperforming the TTS-only model. However, the differences between the models were not pronounced, and the results indicate that Text-to-Speech technology is accepted in the context of news podcasts. Nonetheless, continued technological advancement is needed to bring synthetic speech closer to human speech.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Afonso, M., Almeida, P. (2023). Combining Text-to-Speech Services with Conventional Voiceover for News Oralization. In: Abásolo, M.J., de Castro Lozano, C., Olmedo Cifuentes, G.F. (eds) Applications and Usability of Interactive TV. jAUTI 2022. Communications in Computer and Information Science, vol 1820. Springer, Cham. https://doi.org/10.1007/978-3-031-45611-4_5
DOI: https://doi.org/10.1007/978-3-031-45611-4_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45610-7
Online ISBN: 978-3-031-45611-4
eBook Packages: Computer Science, Computer Science (R0)