
Enhancing movie experience by speech rate design of audio description

  • Long Paper
  • Published:
Universal Access in the Information Society

Abstract

Speech rate conversion in screen readers is a crucial human-computer interface that allows persons with vision disabilities to access audiovisual media. However, its potential utility in audio descriptions (ADs) has not been researched extensively. This study investigated the effect of AD speech rates on information access and film aesthetics. The appropriate speech rate for human-narrated and synthesized ADs of short scenes from Japanese movies was evaluated with blind, partially blind, and sighted participants, using an arranged staircase procedure and software developed to allow AD speech rate adjustment through keyboard operation. The results obtained from blind and partially blind participants indicated that the mean appropriate speech rates of a single AD, with both human and synthesized voices, were lower for information access than the screen-reader rates reported in previous studies but higher than the standard spontaneous speech rate. The presence of the film sound did not significantly affect the average speech rate of either the human or the synthesized AD; however, the variability of the speech rate decreased specifically for the synthesized AD. Additionally, a statistically significant positive correlation emerged between the appropriate speech rates determined for human and synthesized ADs, which was further confirmed in an experiment with short movie scenes containing multiple ADs. Comments from blind and partially blind participants and ratings from sighted participants demonstrated the positive effect of appropriate sequential changes in speech rate on the sense of presence and immersion in movies. Speech rate design of ADs thus offers an opportunity to make ADs more comfortable to listen to and to augment the movie experience.
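The staircase procedure mentioned in the abstract converges on an appropriate speech rate by nudging the rate after each judgment and shrinking the adjustment step whenever the judgment direction reverses. Below is a minimal Python sketch of a simple up-down staircase of this kind. It is not the authors' procedure or software; the function names, step sizes, rate units, and response mapping are illustrative assumptions.

```python
# Minimal sketch (not the study's software) of an up-down staircase for
# estimating an "appropriate" AD speech rate. Step sizes, units, and the
# response mapping are illustrative assumptions, not values from the paper.

def staircase_speech_rate(judge, start_rate=7.0, step=1.0,
                          min_step=0.25, max_reversals=6):
    """Estimate an appropriate speech rate (e.g., morae per second).

    `judge(rate)` stands in for the participant's keyboard response:
    it returns "faster" if the AD at `rate` still feels too slow,
    or "slower" if it has become too fast.
    """
    rate = start_rate
    last_direction = None
    reversal_rates = []

    while len(reversal_rates) < max_reversals:
        response = judge(rate)
        direction = +1 if response == "faster" else -1

        # A change of direction is a reversal: record it and shrink the step.
        if last_direction is not None and direction != last_direction:
            step = max(step / 2.0, min_step)
            reversal_rates.append(rate)
        last_direction = direction

        rate += direction * step

    # Average the rates at the reversals as the appropriate-rate estimate.
    return sum(reversal_rates) / len(reversal_rates)


if __name__ == "__main__":
    # Toy participant whose preferred rate is 8.5 units.
    estimate = staircase_speech_rate(lambda r: "faster" if r < 8.5 else "slower")
    print(f"Estimated appropriate speech rate: {estimate:.2f}")
```

In an experimental setting such as the one described above, the `judge` callback would be driven by the participant's keyboard operations while the AD plays, with or without the film sound.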



Acknowledgements

We thank Takeya Naono for his assistance with software development and data collection.

Funding

This study was supported by a Grant-in-Aid for Scientific Research (C) 22K12330.

Author information


Contributions

S.N. wrote the main manuscript text. K.M. and N.O. reviewed the writing of the manuscript.

Corresponding author

Correspondence to Sawako Nakajima.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Ethical approval

This study was reviewed and approved by Akita University based on Article 12 (1) of the Code of Ethics for Human Subject Research. Informed consent was obtained from the participants before the commencement of this study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Nakajima, S., Okochi, N. & Mitobe, K. Enhancing movie experience by speech rate design of audio description. Univ Access Inf Soc (2024). https://doi.org/10.1007/s10209-024-01178-z

