DOI: 10.1145/1866029.1866080
Research article

VizWiz: nearly real-time answers to visual questions

Published: 03 October 2010

Abstract

The lack of access to visual information like text labels, icons, and colors can cause frustration and decrease independence for blind people. Current access technology uses automatic approaches to address some problems in this space, but the technology is error-prone, limited in scope, and quite expensive. In this paper, we introduce VizWiz, a talking application for mobile phones that offers a new alternative to answering visual questions in nearly real-time: asking multiple people on the web. To support answering questions quickly, we introduce a general approach for intelligently recruiting human workers in advance called quikTurkit so that workers are available when new questions arrive. A field deployment with 11 blind participants illustrates that blind people can effectively use VizWiz to cheaply answer questions in their everyday lives, highlighting issues that automatic approaches will need to address to be useful. Finally, we illustrate the potential of using VizWiz as part of the participatory design of advanced tools by using it to build and evaluate VizWiz::LocateIt, an interactive mobile tool that helps blind people solve general visual search problems.
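The pre-recruitment idea behind quikTurkit, keeping crowd workers engaged on placeholder tasks so that a real question is answered the moment it arrives, can be sketched as follows. This is a hypothetical illustration under assumed names (the `QuikTurkitPool` class and its methods are invented, and posting an actual Mechanical Turk HIT is stubbed out); it is not the authors' implementation.

```python
from collections import deque


class QuikTurkitPool:
    """Hypothetical sketch of the quikTurkit idea: keep a pool of crowd
    workers 'warmed up' on filler tasks before any real question arrives,
    so new questions can be answered in nearly real time."""

    def __init__(self, target_workers=3):
        self.target_workers = target_workers  # pool size to maintain
        self.active_workers = 0               # workers currently recruited
        self.questions = deque()              # pending real questions

    def recruit_if_needed(self):
        # Post enough placeholder tasks to bring the pool up to its
        # target size; returns how many tasks were posted. In a real
        # system each increment would post a HIT to Mechanical Turk.
        posted = 0
        while self.active_workers < self.target_workers:
            self.active_workers += 1
            posted += 1
        return posted

    def ask(self, question):
        # Queue a real question and top up the worker pool.
        self.questions.append(question)
        self.recruit_if_needed()

    def worker_poll(self):
        # A recruited worker requests work: hand out a queued real
        # question if one exists, otherwise a filler task that keeps
        # the worker engaged and available.
        if self.questions:
            return self.questions.popleft()
        return "filler-task"
```

The key design point the abstract describes is that recruitment cost is paid in advance: workers idle on cheap filler tasks, so the latency a blind user sees is only the time to answer, not the time to recruit.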



    Published In

    cover image ACM Conferences
UIST '10: Proceedings of the 23rd annual ACM symposium on User interface software and technology
    October 2010
    476 pages
    ISBN:9781450302715
    DOI:10.1145/1866029
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. blind users
    2. non-visual interfaces
    3. real-time human computation

    Qualifiers

    • Research-article

    Conference

    UIST '10

    Acceptance Rates

    Overall Acceptance Rate 842 of 3,967 submissions, 21%


Article Metrics

    • Downloads (last 12 months): 513
    • Downloads (last 6 weeks): 84

    Reflects downloads up to 04 Oct 2024

    Cited By

    • (2024) Revisiting the Key Components of Creativity Through Generative AI. Making Art With Generative AI Tools, 1-16. DOI: 10.4018/979-8-3693-1950-5.ch001. Online publication date: 19-Apr-2024.
    • (2024) A Multi-Modal Foundation Model to Assist People with Blindness and Low Vision in Environmental Interaction. Journal of Imaging, 10(5), 103. DOI: 10.3390/jimaging10050103. Online publication date: 26-Apr-2024.
    • (2024) Integration of Smart Cane with Social Media: Design of a New Step Counter Algorithm for Cane. IoT, 5(1), 168-186. DOI: 10.3390/iot5010009. Online publication date: 14-Mar-2024.
    • (2024) Knocking on doors: The use of blogging sites by visually impaired people in the US: A preliminary study. Convergence: The International Journal of Research into New Media Technologies. DOI: 10.1177/13548565241261963. Online publication date: 14-Jun-2024.
    • (2024) AI-Vision: A Three-Layer Accessible Image Exploration System for People with Visual Impairments in China. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 8(3), 1-27. DOI: 10.1145/3678537. Online publication date: 9-Sep-2024.
    • (2024) ChitChatGuide: Conversational Interaction Using Large Language Models for Assisting People with Visual Impairments to Explore a Shopping Mall. Proceedings of the ACM on Human-Computer Interaction, 8(MHCI), 1-25. DOI: 10.1145/3676492. Online publication date: 24-Sep-2024.
    • (2024) I Don't Want to Sound Rude, but It's None of Their Business: Exploring Security and Privacy Concerns around Assistive Technology Use in Educational Settings. ACM Transactions on Accessible Computing, 17(2), 1-30. DOI: 10.1145/3670690. Online publication date: 5-Jun-2024.
    • (2024) IMAGE: An Open-Source, Extensible Framework for Deploying Accessible Audio and Haptic Renderings of Web Graphics. ACM Transactions on Accessible Computing, 17(2), 1-17. DOI: 10.1145/3665223. Online publication date: 23-May-2024.
    • (2024) VQAsk: a multimodal Android GPT-based application to help blind users visualize pictures. Proceedings of the 2024 International Conference on Advanced Visual Interfaces, 1-5. DOI: 10.1145/3656650.3656677. Online publication date: 3-Jun-2024.
    • (2024) DIY Assistive Software: End-User Programming for Personalized Assistive Technology. ACM SIGACCESS Accessibility and Computing, 1-1. DOI: 10.1145/3654768.3654772. Online publication date: 1-Jan-2024.
