DOI: 10.1145/3301275.3302271
research-article
Open access

Scene text access: a comparison of mobile OCR modalities for blind users

Published: 17 March 2019

Abstract

We present a study with seven blind participants using three different mobile OCR apps to find text posted in various indoor environments. The first app considered was Microsoft SeeingAI in its Short Text mode, which reads any text in sight with a minimalistic interface. The second app was Spot+OCR, a custom application that separates the task of text detection from OCR proper. Upon detection of text in the image, Spot+OCR generates a short vibration; as soon as the user stabilizes the phone, a high-resolution snapshot is taken and OCR-processed. The third app, Guided OCR, was designed to guide the user in taking several pictures in a 360° span at the camera's maximum available resolution, with minimum overlap between pictures. Quantitative results (in terms of true positive ratios and traversal speed) were recorded. Together with qualitative observations and the outcomes of an exit survey, these results allow us to identify and assess the different strategies used by our participants, as well as the challenges of operating these systems without sight.
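
As a rough illustration of the Guided OCR idea of covering a 360° span with minimally overlapping pictures, the sketch below computes evenly spaced headings at which pictures could be taken. The abstract does not say how the app chooses its capture directions, so the function name `capture_headings`, its parameters, and the 60° field of view in the usage line are assumptions made for illustration only.

```python
import math
from typing import List


def capture_headings(fov_deg: float, min_overlap_deg: float = 0.0) -> List[float]:
    """Return headings (in degrees) for pictures that together cover 360 degrees.

    fov_deg is the camera's horizontal field of view (hypothetical input);
    min_overlap_deg is the smallest overlap wanted between neighbouring shots.
    """
    if not 0 < fov_deg <= 360:
        raise ValueError("fov_deg must be in (0, 360]")
    if not 0 <= min_overlap_deg < fov_deg:
        raise ValueError("min_overlap_deg must be in [0, fov_deg)")
    # Each extra picture adds at most (fov - overlap) degrees of new coverage,
    # so this is the fewest pictures that still close the full circle.
    count = math.ceil(360.0 / (fov_deg - min_overlap_deg))
    # Spread the pictures evenly; the real overlap may exceed the minimum.
    step = 360.0 / count
    return [i * step for i in range(count)]


print(capture_headings(60.0))
```

Spacing the headings evenly keeps the number of pictures as small as the requested overlap allows; a real app would also need orientation feedback to steer the user toward each heading, which this sketch leaves out.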

Supplementary Material

MP4 File (p197-neat.mp4)

Published In

IUI '19: Proceedings of the 24th International Conference on Intelligent User Interfaces
March 2019
713 pages
ISBN: 9781450362726
DOI: 10.1145/3301275

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. OCR
2. assistive technologies
3. text spotting


Acceptance Rates

IUI '19 paper acceptance rate: 71 of 282 submissions, 25%
Overall acceptance rate: 746 of 2,811 submissions, 27%
