Self-guided Spatial Composition as an Additional Layer of Information to Enhance Accessibility of Images for Blind Users

Published: 19 June 2024

Abstract

The spatial composition of an image or photograph can convey information beyond its depicted content. However, the usual approaches to making images accessible to blind people focus mainly on describing an image's content, without addressing other aspects such as spatial composition, colors, background, or known faces. As a result, much of the information present in the image but not included in the description is lost to a blind user. This work explores the combination of image captioning and object detection techniques with the final goal of making images more accessible to blind users. The approach is twofold: (1) state-of-the-art image captioning and object detection algorithms are combined so that blind users can visualize the spatial composition of a given image; and (2) blind users guide the exploration of the images themselves, so they can gather all the information in a personalized manner and form their own interpretation. We implemented a preliminary prototype based on requirements gathered from blind users and performed an evaluation that yielded promising results. Participants were reasonably satisfied with the usability of the prototype, and in several cases they obtained more information during their self-guided exploration of the images than from the general original description. However, some issues detected in the evaluation, and functionalities that could not be implemented, will be addressed in future work.
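The abstract describes combining object detection output with spatial composition so that blind users can explore where objects sit within an image. The paper's own implementation is not reproduced here; the following is a minimal, illustrative Python sketch of one way such spatial labels could be derived, assuming bounding boxes are already available from an off-the-shelf detector. The function names `spatial_label` and `describe_objects`, and the 3x3 grid granularity, are hypothetical choices for this sketch, not details taken from the paper.

```python
# Illustrative sketch: turn bounding boxes into coarse spatial labels
# by dividing the image into a 3x3 grid and locating each box's center.

def spatial_label(box, img_width, img_height):
    """Map a bounding box (x0, y0, x1, y1) to a label such as 'top left'."""
    x0, y0, x1, y1 = box
    cx = (x0 + x1) / 2  # horizontal center of the box
    cy = (y0 + y1) / 2  # vertical center of the box
    col = ["left", "center", "right"][min(int(3 * cx / img_width), 2)]
    row = ["top", "middle", "bottom"][min(int(3 * cy / img_height), 2)]
    return f"{row} {col}"

def describe_objects(detections, img_width, img_height):
    """Produce one spoken-style phrase per detected object.

    `detections` is a list of (label, box) pairs, e.g. the thresholded
    output of an object detector.
    """
    return [
        f"{label} in the {spatial_label(box, img_width, img_height)}"
        for label, box in detections
    ]

# Example: two hypothetical detections in a 900x600 image.
detections = [("dog", (50, 400, 250, 580)), ("person", (400, 100, 520, 550))]
for phrase in describe_objects(detections, 900, 600):
    print(phrase)
```

Phrases like these could then be read by a screen reader alongside the overall caption, letting the user query individual regions during self-guided exploration.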


Published In

Interacción '24: Proceedings of the XXIV International Conference on Human Computer Interaction, June 2024, 155 pages.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States


        Author Tags

        1. Accessible Images
        2. Blind Users
        3. Image Description
        4. Object Recognition
        5. Spatial Composition

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Funding Sources

• Agencia Estatal de Investigación, Spain

        Conference

        INTERACCION 2024

        Acceptance Rates

        Overall Acceptance Rate 109 of 163 submissions, 67%
