DOI: 10.1145/3663548.3675605
Research Article

Enabling Uniform Computer Interaction Experience for Blind Users through Large Language Models

Published: 27 October 2024

Abstract

Blind individuals, who by necessity depend on screen readers to interact with computers, face considerable challenges in navigating the diverse and complex graphical user interfaces of different computer applications. The heterogeneity of application interfaces often forces blind users to remember different keyboard combinations and navigation methods for each application. To alleviate this significant interaction burden, we present Savant, a novel assistive technology powered by large language models (LLMs) that allows blind screen reader users to interact uniformly with any application interface through natural language. Notably, Savant can automate a series of tedious screen reader actions on the control elements of an application when prompted by a natural language command from the user. These commands are flexible: the user is not strictly required to specify the exact names of the control elements. A user study evaluation of Savant with 11 blind participants demonstrated significant improvements in interaction efficiency and usability compared to current practices.
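The abstract describes a pipeline in which a free-form user command is mapped to a sequence of actions on an application's control elements, without requiring exact control names. The sketch below illustrates that idea only in miniature: a toy keyword matcher stands in for the LLM, and all names (`Control`, `plan_actions`) and the matching logic are hypothetical illustrations, not the paper's actual method.

```python
from dataclasses import dataclass

@dataclass
class Control:
    """A UI control element exposed by an accessibility tree."""
    name: str   # accessible name, e.g. "Send Message"
    role: str   # e.g. "button", "edit", "checkbox"

def plan_actions(command: str, controls: list) -> list:
    """Map a free-form command to (action, control-name) steps.

    A simple word-overlap match stands in for the LLM described in the
    abstract, so the command need not quote a control's exact name.
    """
    words = set(command.lower().split())
    return [("invoke", c.name)
            for c in controls
            if words & set(c.name.lower().split())]

controls = [Control("Send Message", "button"),
            Control("Attach File", "button")]
print(plan_actions("please send my message", controls))
# [('invoke', 'Send Message')]
```

In the paper's setting, the real system would read controls from the platform accessibility API and use an LLM to resolve fuzzy references; the keyword overlap here is only a placeholder for that resolution step.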


Published In

ASSETS '24: Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility
October 2024, 1475 pages
ISBN: 9798400706776
DOI: 10.1145/3663548

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States


      Author Tags

      1. Accessibility
      2. Assistive technology
      3. Blind users
      4. Computer Interaction
      5. Large language models (LLMs)
      6. Uniform interaction

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

Conference

ASSETS '24

Acceptance Rates

Overall Acceptance Rate: 436 of 1,556 submissions, 28%
