Abstract
Speech rate conversion in screen readers is a crucial human-computer interface that allows people with visual disabilities to access audiovisual media. However, its potential utility in audio description (AD) has not been researched extensively. This study investigated the effect of AD speech rate on information access and film aesthetics. The appropriate speech rate for human-narrated and synthesized ADs of short scenes from Japanese movies was evaluated with blind, partially blind, and sighted participants, using an adapted staircase procedure and purpose-built software that enabled participants to adjust the AD speech rate through keyboard operations. For blind and partially blind participants, the mean appropriate speech rates of single ADs, whether human-narrated or synthesized, were lower than the screen-reader rates reported in previous studies but higher than the standard spontaneous speech rate. The presence of film sound did not significantly affect the mean appropriate speech rate for either voice type, although it reduced the variability of the rate chosen specifically for the synthesized AD. In addition, a statistically significant positive correlation emerged between the appropriate speech rates determined for human and synthesized ADs, and this correlation strengthened in an experiment using short movie scenes containing multiple ADs. Comments from blind and partially blind participants, together with ratings from sighted participants, demonstrated that appropriate sequential changes in speech rate enhance the sense of presence and immersion in movies. Designing the speech rate of ADs thus offers an opportunity to make ADs more comfortable to listen to and to augment the movie experience.
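The staircase procedure mentioned above can be illustrated with a minimal sketch: the presented speech rate steps up or down according to the participant's keyboard response, and the "appropriate" rate is estimated from the rates at which the response direction reverses. All parameters here (starting rate, step size, number of reversals, mora/s units) are hypothetical illustrations; the abstract does not specify the authors' exact settings.

```python
# Minimal sketch of a simple up-down staircase for estimating an
# appropriate speech rate. Hypothetical parameters: the paper's actual
# step sizes, units, and stopping rule are not given in the abstract.

def staircase(respond, start_rate=8.0, step=0.5, max_reversals=6):
    """respond(rate) -> 'faster' or 'slower', e.g. from a keypress.

    Returns an estimate of the appropriate speech rate (e.g. mora/s)
    as the mean of the rates recorded at response reversals.
    """
    rate = start_rate
    last_direction = None
    reversal_rates = []
    while len(reversal_rates) < max_reversals:
        answer = respond(rate)
        direction = +1 if answer == 'faster' else -1
        # A change in response direction marks a reversal point.
        if last_direction is not None and direction != last_direction:
            reversal_rates.append(rate)
        last_direction = direction
        rate += direction * step
    return sum(reversal_rates) / len(reversal_rates)

# Simulated participant whose preferred rate is 10.0: the estimate
# converges to the reversal points straddling that preference.
estimate = staircase(lambda r: 'faster' if r < 10.0 else 'slower')
```

In a real experiment, `respond` would block on keyboard input while the AD plays at the current rate; the simulation above merely shows the convergence behavior of the procedure.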
Acknowledgements
We thank Takeya Naono for his assistance with software development and data collection.
Funding
This study was supported by a Grant-in-Aid for Scientific Research (C) 22K12330.
Author information
Contributions
S.N. wrote the main manuscript text. K.M. and N.O. reviewed the writing of the manuscript.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Ethical approval
This study was reviewed and approved by Akita University based on Article 12 (1) of the Code of Ethics for Human Subject Research. Informed consent was obtained from all participants before the commencement of this study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Nakajima, S., Okochi, N. & Mitobe, K. Enhancing movie experience by speech rate design of audio description. Univ Access Inf Soc (2024). https://doi.org/10.1007/s10209-024-01178-z