research-article
Open access

Guiding Novice Web Workers in Making Image Descriptions Using Templates

Published: 19 November 2015

Abstract

This article compares two methods of employing novice Web workers to author descriptions of science, technology, engineering, and mathematics images to make them accessible to individuals with visual and print-reading disabilities. The goal is to identify methods of creating image descriptions that are inexpensive, effective, and follow established accessibility guidelines. The first method explicitly presented the guidelines to the worker, then the worker constructed the image description in an empty text box and table. The second method queried the worker for image information and then used responses to construct a template-based description according to established guidelines. The descriptions generated through queried image description (QID) were more likely to include information on the image category, title, caption, and units. They were also more similar to one another, based on Jaccard distances of q-grams, indicating that their word usage and structure were more standardized. Last, the workers preferred describing images using QID and found the task easier. Therefore, explicit instruction on image-description guidelines is not sufficient to produce quality image descriptions when using novice Web workers. Instead, it is better to provide information about images, then generate descriptions from responses using templates.
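The abstract's similarity measure can be made concrete. The sketch below computes the Jaccard distance between the q-gram sets of two descriptions, the statistic the article uses to show that QID descriptions are more standardized. The q value and the example descriptions are illustrative assumptions, not the paper's actual parameters or data.

```python
def qgrams(text, q=3):
    """Split a string into its set of overlapping q-grams."""
    text = text.lower()
    return {text[i:i + q] for i in range(len(text) - q + 1)}

def jaccard_distance(a, b, q=3):
    """Jaccard distance of two strings' q-gram sets:
    1 - |A intersect B| / |A union B|. 0.0 means identical sets."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    if not ga and not gb:
        return 0.0
    return 1.0 - len(ga & gb) / len(ga | gb)

# Template-generated descriptions share wording and structure, so their
# pairwise distance is small compared to a free-form description.
d1 = "Bar graph titled Rainfall. Units are millimeters."
d2 = "Bar graph titled Rainfall. Units are inches."
d3 = "This picture shows how much rain fell each month."
print(jaccard_distance(d1, d2) < jaccard_distance(d1, d3))  # True
```

Lower average pairwise distance within a group of workers is what the article reads as more standardized word usage and structure.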

References

[1] Benetech. 2012. POET image description tool. Retrieved October 10, 2015, from http://diagramcenter.org/development/poet.html.
[2] Benetech and Touch Graphics. 2014. Decision tree: Image sorting tool. Retrieved October 10, 2015, from http://diagramcenter.org/decision-tree.html.
[3] Tim Berners-Lee, James Hendler, and Ora Lassila. 2001. The Semantic Web. Scientific American 284, 5, 28--37.
[4] Jeffrey P. Bigham, Chandrika Jayant, Hanjie Ji, Greg Little, Andrew Miller, Robert C. Miller, Robin Miller, Aubrey Tatarowicz, Brandyn White, Samuel White, and Tom Yeh. 2010a. VizWiz: Nearly real-time answers to visual questions. In Proceedings of the 23rd Annual Symposium on User Interface Software and Technology. ACM, New York, NY, 333--342.
[5] Jeffrey P. Bigham, Chandrika Jayant, Andrew Miller, Brandyn White, and Tom Yeh. 2010b. VizWiz: LocateIt - enabling blind people to locate objects in their environment. In Proceedings of the Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW'10). IEEE, Los Alamitos, CA, 65--72.
[6] Jeffrey P. Bigham, Richard E. Ladner, and Yevgen Borodin. 2011. The design of human-powered access technology. In Proceedings of the 13th International Conference on Computers and Accessibility (SIGACCESS'11). ACM, New York, NY, 3--10.
[7] Rune Haubo Bojesen Christensen, Hye-Seong Lee, and Per Bruun Brockhoff. 2012. Estimation of the Thurstonian model for the 2-AC protocol. Food Quality and Preference 24, 1, 119--128.
[8] Leonid Boytsov. 2011. Indexing methods for approximate dictionary searching: Comparative analysis. Journal of Experimental Algorithmics 16, 1.
[9] Sandra Carberry, Stephanie Elzer Schwartz, Kathleen McCoy, Seniz Demir, Peng Wu, Charles Greenbacker, Daniel Chester, Edward Schwartz, David Oliver, and Priscilla Moraes. 2012. Access to multimodal articles for individuals with sight impairments. ACM Transactions on Interactive Intelligent Systems 2, 4, 21.
[10] Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, and Rajeev Motwani. 2003. Robust and efficient fuzzy match for online data cleaning. In Proceedings of the International Conference on Management of Data (SIGMOD'03). ACM, New York, NY, 313--324.
[11] Daniel Dardailler. 1997. The ALT-Server ("An Eye for an Alt"). Retrieved October 10, 2015, from http://www.w3.org/WAI/altserv.htm.
[12] Seniz Demir, Sandra Carberry, and Kathleen F. McCoy. 2012. Summarizing information graphics textually. Computational Linguistics 38, 3, 527--574.
[13] Seniz Demir, David Oliver, Edward Schwartz, Stephanie Elzer, Sandra Carberry, Kathleen F. McCoy, and Daniel Chester. 2010. Interactive SIGHT: Textual access to simple bar charts. New Review of Hypermedia and Multimedia 16, 3, 245--279.
[14] Seniz Demir, Stephanie Elzer Schwartz, Richard Burns, and Sandra Carberry. 2013. What is being measured in an information graphic? In Computational Linguistics and Intelligent Text Processing. Springer, 501--512.
[15] Michel Dumontier, Leo Ferres, and Natalia Villanueva-Rosales. 2010. Modeling and querying graphical representations of statistical data. Web Semantics: Science, Services and Agents on the World Wide Web 8, 2, 241--254.
[16] Stephanie Elzer, Sandra Carberry, Ingrid Zukerman, Daniel Chester, Nancy Green, and Seniz Demir. 2005. A probabilistic framework for recognizing intention in information graphics. In Proceedings of the International Joint Conference on Artificial Intelligence, Vol. 19. 1042.
[17] Massimo Fasciano and Guy Lapalme. 1996. Postgraphe: A system for the generation of statistical graphics and text. In Proceedings of the 8th International Workshop on Natural Language Generation (INLG'96). 51--60.
[18] Yansong Feng and Mirella Lapata. 2010. How many words is a picture worth? Automatic caption generation for news images. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. 1239--1249.
[19] Leo Ferres, Gitte Lindgaard, Livia Sumegi, and Bruce Tsuji. 2013. Evaluating a tool for improving accessibility to charts and graphs. ACM Transactions on Computer-Human Interaction 20, 5, 28.
[20] Bryan Gould, Trisha O'Connell, and Geoffrey Freed. 2008. Guidelines for describing STEM images. Retrieved October 10, 2015, from http://ncam.wgbh.org/experience_learn/educational_media/stemdx/guidelines.
[21] Chandrika Jayant, Matt Renzelmann, Dana Wen, Satria Krisnandi, Richard Ladner, and Dan Comden. 2007. Automated tactile graphics translation: In the field. In Proceedings of the 9th International Conference on Computers and Accessibility (SIGACCESS'07). ACM, New York, NY, 75--82.
[22] Geoffrey Keppel and Thomas D. Wickens. 2004. Design and Analysis: A Researcher's Handbook (4th ed.). Pearson Education, Upper Saddle River, NJ.
[23] Richard E. Ladner, Melody Y. Ivory, Rajesh Rao, Sheryl Burgstahler, Dan Comden, Sangyun Hahn, Matthew Renzelmann, Satria Krisnandi, Mahalakshmi Ramasamy, Beverly Slabosky, Andrew Martin, Amelia Lacenski, Stuart Olsen, and Dmitri Groce. 2005. Automating tactile graphics translation. In Proceedings of the 7th International Conference on Computers and Accessibility (SIGACCESS'05). ACM, New York, NY, 150--157.
[24] Walter Lasecki, Christopher Miller, Adam Sadilek, Andrew Abumoussa, Donato Borrello, Raja Kushalnagar, and Jeffrey Bigham. 2012. Real-time captioning by groups of non-experts. In Proceedings of the 25th Annual Symposium on User Interface Software and Technology. ACM, New York, NY, 23--34.
[25] LimeSurvey Project Team/Carsten Schmitz. 2012. LimeSurvey: An Open Source Survey Tool. LimeSurvey Project, Hamburg, Germany. http://www.limesurvey.org.
[26] Kathleen F. McCoy, Sandra Carberry, Tom Roper, and Nancy Green. 2001. Towards generating textual summaries of graphs. In Proceedings of the International Conference on Universal Access in Human-Computer Interaction. 695--699.
[27] R Core Team. 2013. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org.
[28] Daisuke Sato, Masatomo Kobayashi, Hironobu Takagi, and Chieko Asakawa. 2010. Social accessibility: The challenge of improving Web accessibility through collaboration. In Proceedings of the 2010 International Cross Disciplinary Conference on Web Accessibility (W4A'10). ACM, New York, NY, 28.
[29] Hironobu Takagi, Susumu Harada, Daisuke Sato, and Chieko Asakawa. 2013. Lessons learned from crowd accessibility services. In Human-Computer Interaction - INTERACT 2013. Springer, 587--604.
[30] Esko Ukkonen. 1992. Approximate string-matching with q-grams and maximal matches. Theoretical Computer Science 92, 1, 191--211.
[31] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. 2014. Show and tell: A neural image caption generator. arXiv preprint arXiv:1411.4555.
[32] Luis Von Ahn and Laura Dabbish. 2004. Labeling images with a computer game. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, New York, NY, 319--326.
[33] Luis Von Ahn, Shiry Ginosar, Mihir Kedia, Ruoran Liu, and Manuel Blum. 2006. Improving accessibility of the Web with a computer game. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, New York, NY, 79--82.
[34] Peng Wu, Sandra Carberry, Stephanie Elzer, and Daniel Chester. 2010. Recognizing the intended message of line graphs. In Diagrammatic Representation and Inference. Springer, 220--234.


Index Terms

  1. Guiding Novice Web Workers in Making Image Descriptions Using Templates


      Reviews

      William Brinkman

      Making science, technology, engineering, and math (STEM) education available to all who are capable of learning is a moral imperative. Yet our educational system (including authors, textbook publishers, and college professors) struggles to provide appropriate access to the images and figures that are critical to STEM subject learning. Morash et al. envision a system that would allow nonexpert workers (recruited through a service such as Amazon's Mechanical Turk) to create high-quality accessible descriptions (also known as alt-text) of STEM images. The success of such a system could greatly reduce the cost of making STEM teaching materials accessible, and thereby greatly increase access to STEM education for people with visual impairments. Their main contribution is to demonstrate that the design of the system's user interface influences the completeness and uniformity of the resulting alt-text. Current web-based systems for this problem simply present the worker with an image and a set of instructions, and allow the worker to enter his or her description as free text. The authors have created a competing system (which they call a queried image description, QID) that uses an interactive survey tool to gather information from the worker, and then auto-generates the image description using a template. Web workers using QID are significantly less likely to omit key information (like captions, or units on graphs) than those using free text entry. There is also significantly less variation in descriptions generated by different workers when using QID as compared to free text entry, which should simplify quality control and reduce user confusion. While there is a well-founded hope that QID-generated alt-text will be more usable than free text, and comparable to alt-text created by experts, such usability testing is left as future work. 
Another notable aspect of this paper is the bringing together of "greatest hits" from several different areas of computer science research. Ukkonen's approximate string matching, Von Ahn et al.'s image labeling, and the Jaccard coefficient are all ideas that graduate students should see. This paper could therefore be the starting point for a nice seminar course. These ideas should find widespread adoption in the future, if it can be shown that such a system generates alt-text at a level of quality comparable to an expert. The pressures on colleges and textbook publishers to make STEM education accessible are intense, and this approach has the potential to solve one of the major barriers to doing so.

Online Computing Reviews Service
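The QID workflow the review describes - query the worker for structured fields, then auto-generate the description from a fixed template - can be sketched as follows. The field names, questions, and template here are illustrative assumptions, not the authors' actual questionnaire or templates.

```python
# Hypothetical QID-style flow: ask for structured fields, then fill a
# fixed template so every description carries the same information in
# the same order (category, title, summary, units).
QUESTIONS = {
    "category": "What kind of image is this (e.g., bar graph, diagram)?",
    "title": "What is the image's title?",
    "summary": "In one sentence, what does the image show?",
    "units": "What units are shown, if any?",
}

TEMPLATE = ('{category} titled "{title}". {summary} '
            'Values are given in {units}.')

def describe(answers):
    """Fill the fixed template from the worker's answers."""
    return TEMPLATE.format(**answers)

answers = {
    "category": "Bar graph",
    "title": "Monthly Rainfall",
    "summary": "It compares rainfall across twelve months.",
    "units": "millimeters",
}
print(describe(answers))
# Bar graph titled "Monthly Rainfall". It compares rainfall across
# twelve months. Values are given in millimeters.
```

Because the worker never writes free text for the structural parts, the template guarantees the fields the study found most often omitted (category, title, caption, units) appear in every description.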



      Published In

      ACM Transactions on Accessible Computing  Volume 7, Issue 4
      November 2015
      77 pages
      ISSN:1936-7228
      EISSN:1936-7236
      DOI:10.1145/2847216
      Issue’s Table of Contents
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 19 November 2015
      Accepted: 01 April 2015
      Revised: 01 April 2015
      Received: 01 September 2014
      Published in TACCESS Volume 7, Issue 4


      Author Tags

1. accessibility (blind and visually impaired)
      2. access technology
      3. crowdsourcing
      4. human computation
      5. image description

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      • Office of Special Education Programs
      • Department of Education
      • Cooperative Agreement
• Benetech's DIAGRAM Center initiative

      Article Metrics

      • Downloads (Last 12 months)196
      • Downloads (Last 6 weeks)34
      Reflects downloads up to 01 Nov 2024


      Cited By

• (2024) Evaluating the Effectiveness of STEM Images Captioning. Proceedings of the 21st International Web for All Conference, 150-159. DOI: 10.1145/3677846.3677863. Online publication date: 13-May-2024.
• (2024) MAIDR Meets AI: Exploring Multimodal LLM-Based Data Visualization Interpretation by and with Blind and Low-Vision Users. Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility, 1-31. DOI: 10.1145/3663548.3675660. Online publication date: 27-Oct-2024.
• (2024) Context-Aware Image Descriptions for Web Accessibility. Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility, 1-17. DOI: 10.1145/3663548.3675658. Online publication date: 27-Oct-2024.
• (2024) FigurA11y: AI Assistance for Writing Scientific Alt Text. Proceedings of the 29th International Conference on Intelligent User Interfaces, 886-906. DOI: 10.1145/3640543.3645212. Online publication date: 18-Mar-2024.
• (2024) Natural Language Dataset Generation Framework for Visualizations Powered by Large Language Models. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-22. DOI: 10.1145/3613904.3642943. Online publication date: 11-May-2024.
• (2024) MAIDR: Making Statistical Visualizations Accessible with Multimodal Data Representation. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-22. DOI: 10.1145/3613904.3642730. Online publication date: 11-May-2024.
• (2024) Designing Unobtrusive Modulated Electrotactile Feedback on Fingertip Edge to Assist Blind and Low Vision (BLV) People in Comprehending Charts. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-20. DOI: 10.1145/3613904.3642546. Online publication date: 11-May-2024.
• (2024) "It's Kind of Context Dependent": Understanding Blind and Low Vision People's Video Accessibility Preferences Across Viewing Scenarios. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-20. DOI: 10.1145/3613904.3642238. Online publication date: 11-May-2024.
• (2023) WATAA: Web Alternative Text Authoring Assistant for Improving Web Content Accessibility. Companion Proceedings of the 28th International Conference on Intelligent User Interfaces, 41-45. DOI: 10.1145/3581754.3584127. Online publication date: 27-Mar-2023.
• (2023) The Accessibility of Data Visualizations on the Web for Screen Reader Users: Practices and Experiences During COVID-19. ACM Transactions on Accessible Computing 16, 1, 1-29. DOI: 10.1145/3557899. Online publication date: 29-Mar-2023.
• Show More Cited By
