How the Alt Text Gets Made: What Roles and Processes of Alt Text Creation Can Teach Us About Inclusive Imagery

Authors:

Emory J. Edwards,

Michael Gilbert,

Emily Blank,

Stacy M. Branham

ACM Transactions on Accessible Computing, Volume 16, Issue 2

Article No.: 18, Pages 1 - 28

https://doi.org/10.1145/3587469

Published: 13 July 2023

Abstract

Many studies within Accessible Computing have investigated image accessibility, from what should be included in alternative text (alt text), to possible automated, human-in-the-loop, or crowdsourced approaches to alt text generation. However, the processes through which practitioners make alt text in situ have rarely been discussed. Through interviews with three artists and three accessibility practitioners working with Google, as well as 25 end users, we identify four processes of alt text creation used by this company—The User-Evaluation Process, The Lone Writer Process, The Team Write-A-Thon Process, and The Artist-Writer Process—and unpack their potential strengths and weaknesses as they relate to access and inclusive imagery. We conclude with a discussion of what alt text researchers and industry professionals can learn from considering alt text in situ, including opportunities to support user feedback, cross-contributor consistency, and organizational or technical changes to production processes.

1 Introduction

1.8 billion digital images were uploaded and shared every day in 2014 [Eveleth 2015]. The figure is likely much higher in 2023, given that the number of internet users continues to increase. And yet, only a fraction of the images being uploaded to social media sites are accessible to people with little or no sight [Guinness et al. 2018]. For screen reader users, visual information online needs to be rendered into alternative text (alt text) that can be included with an image, also sometimes known as an image description. With the relatively high number of papers within the accessible computing space focusing on accessibility for people who are blind or low vision (BLV)¹ [Mack et al. 2021b], it is not surprising that many methods for producing alt text have been explored. These methods include automated means (e.g., Carvalho and Freitas [2015]; Wu et al. [2017]), crowdsourcing (e.g., Bigham et al. [2010]; Gleason et al. [2020]; Salisbury et al. [2018]), friendsourcing or microvolunteering (e.g., Brady et al. [2013b]; Brady et al. [2015]), and other technological means to support user generated alt text (e.g., Mack et al. [2021a]; Williams et al. [2022]).

This research into various methods of alt text creation has been extremely valuable. However, almost all of these studies of alt text creation have been interventions. There is a deficit of research investigating current methods that are used to produce alt text. It is impossible to fully fix a problem, in this case a lack of alt text, until we understand the processes which produce the problem. Williams et al. have written the only paper—to the authors’ knowledge—to focus directly on workflows that produce alt text [Williams et al. 2022]. They focus on computing publications and figures being described in them, but there are still many contexts of alt text creation that they could not cover. By exploring current in situ production processes that lead to images being made accessible, we can begin to understand where those current processes break down or could be made more inclusive.

For our research, we focus on alt text production processes in the context of a large technology company. Investigating how accessibility is done by industry practitioners, rather than focusing only on idealized research conditions, is an ongoing area of study. This research area has included investigations of how accessibility guidelines fail to account for needs of developers [Law et al. 2010], organizational factors that affect accessible outward-facing experiences [Giannoumis and Nordli 2020], and the design of tools to help bridge the gap between accessibility issues and the designers and developers in charge of fixing them [Gaggi and Pederiva 2021; Scott et al. 2015]. Such papers allow researchers to better understand how accessibility is done at scale and how to better support the industrial or organizational contexts which produce far more technology than any single research study can.

Our own prior work involved collaborating with an artist, designers, and end users to perform a user evaluation and co-design alt text for inclusive images in the Avatar Project. The Avatar Project was a diverse set of representations of users, including users with disabilities, created by Google for internal use by designers and developers. For this type of imagery—that is, images that are both accessible to people with disabilities, including screen reader users, and depict disabilities and other identities in an inclusive manner—we use the term “accessible and inclusive imagery” or simply “inclusive imagery”. Because our collaboration focused on the Avatar Project, we can best speak to alt text creation for inclusive imagery. However, beyond Google, accessible and inclusive imagery is a growing area of interest for many companies, activists, and researchers alike.

While co-designing the alt text for the Avatar Project, we found through discussions with our collaborators that the alt text creation process we employed as researchers was uncommon, and that many other alt text creation methods were used in everyday industry practice by accessibility practitioners within Google. This study therefore addresses the question: how are accessible and inclusive images created and how are their attributes negotiated by various actors, including image creators, alt text creators, and end users? To answer this research question we first explored in depth the perspectives of end users through 28 interviews and focus groups where users with disabilities iteratively co-designed image descriptions for fictionalized representations of disability. We then expanded on these data by interviewing six professionals to get the perspective of two new stakeholder groups, image creators (i.e., artists) and alt text creators (i.e., accessibility practitioners).

This journal article is an expanded version of a paper presented at ASSETS 2021 [Edwards et al. 2021]. We summarize the methods, findings, and conclusions from this prior work in Section 3 (Study 1: User Evaluation of Inclusive Imagery and Alt Text Co-Design) to demonstrate what we are calling The User-Evaluation Process. In Section 5.1, we describe and contrast The User-Evaluation Process and three more types of alt text production processes revealed via new analysis of six interviews with artists and accessibility practitioners at a large technology company. In Section 5.2, we include three new themes arising from these data: use cases and intentionality, pragmatic considerations versus individual intuition, and relative importance of words, images, and alt text. This paper thereby expands on lingering questions from Edwards et al. [2021], and we end with a discussion of what can be learned about inclusive imagery, alt text as user experience, balancing stakeholders, and alt text production practices.

2 Related Work

We summarize here three areas of literature relevant to our study: current challenges to inclusive imagery, research on image accessibility, and research on accessibility in practice. Inclusive imagery, as a project, must grapple with the history of limited and stereotypical depictions of disability and other marginalized identities. Image accessibility overlaps with inclusive depictions of identity because images that depict disability are included in the overall need to make sure digital images are perceivable by screen reader users. We review existing strategies that have been explored for addressing widespread image inaccessibility. Lastly, we summarize the body of literature focusing on how accessibility is actually enacted by designers and developers in industry contexts.

2.1 Inclusive Imagery

As of 2014, the Census Bureau estimated that almost 30% of the U.S. population—over 85 million people—had a disability [Taylor 2018]. Yet, people with disabilities are underrepresented across many forms of mainstream media. For example, a survey of top-grossing movies found that out of over 4,000 named characters or characters with speaking roles, only 1.6% had a disability [Smith et al. 2020]. Similarly, advertisements rarely include disability, and when they do, they depict people with disabilities in highly stereotyped ways [Davis 2013]. As disabilities become more recognized by mainstream, non-disabled content creation structures, representations of disability are becoming more common. This comes with increased responsibility to people with disabilities to make sure the media is itself accessible. If disabilities are represented in images, for example, it is even more important to make sure those images have alt text. It also means the representations of the disability themselves must not be stereotypical or reductive. Thus, inclusive imagery must both depict diversity, and do so in respectful ways.

Within Media Studies and Disability Studies, there are discussions of how people with disabilities are written about in print news and depicted in advertisements [Haller 2010; Riley 2005] and how fictional characters with disabilities are portrayed in film, television, and literature [Davis 2013; Garland-Thomson 2017]. Within discussions of describing disabilities textually in the most inclusive way possible, there are a variety of resources for both general audiences (e.g., Wikipedia [2023]; APA [2022]) and HCI-specific audiences (e.g., Babinszki et al. [2019]; Hanson et al. [2015]). Existing studies of representations of disability in media largely agree on two points. First, representations of disability have historically been limited [Davis 2013]. Second, the representations of disability that do exist in media are often one-dimensional, stereotypical, patronizing, or otherwise offensive [Briant et al. 2013; Ellis et al. 2013; Houston 2019]. Structural ableism not only causes the marginalization of disabled people in the workplace and educational sphere, but also leads to ableist depictions of disability in media. There are, however, a growing number of examples of disability activists raising the visibility of, or adding depth to, conversations around depictions of disability, for example in the documentary Crip Camp [Newnham and Lebrecht 2020], videos by Haben Girma [Girma 2023], or in podcasts like Reid My Mind [Reid 2023]. The important topic of making sure media depicting disability is also itself accessible has been gaining more attention as well. Researchers have studied multimedia accessibility either in the form of professionally authored subtitles or audio descriptions for movies and TV [Duarte and Fonseca 2019], or multimedia approaches to making user generated content accessible (e.g., Kim et al. [2020]; dos Santos Marques et al. [2017]).

2.2 Image Accessibility

Screen reader users require digital images to be rendered into text so that it can be accessed via synthesized voice or Braille display. Image accessibility is therefore the process of creating and associating textual descriptions of visual information with the digital images they refer to. There are several terms used for different types of text related to image accessibility [Morris et al. 2018]—including alt text, image descriptions, long descriptions, and captions. These terms may overlap in some cases or have clear delineations based on differing definitions or familiarity with image accessibility. For the sake of this paper, we use the terms image description and alt text interchangeably.
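
On the web, the association between an image and its description is most commonly made through the alt attribute of the HTML img element. As an illustration only, the following minimal sketch uses Python's standard html.parser module to list the images in a page and report whether each has a description. The class name, function name, status labels, and sample file names are our own assumptions for this example and are not part of the study; the two described subjects echo examples mentioned in Section 3.1.

```python
from html.parser import HTMLParser


class AltTextAudit(HTMLParser):
    """Collect every <img> tag and classify its alt attribute."""

    def __init__(self):
        super().__init__()
        self.results = []  # list of (src, status) tuples, in document order

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src", "")
        if "alt" not in attributes:
            # No alt attribute at all: screen readers have nothing to announce.
            status = "missing"
        elif not (attributes["alt"] or "").strip():
            # alt="" conventionally marks an image as decorative.
            status = "empty"
        else:
            status = "described"
        self.results.append((src, status))


def audit_alt_text(html):
    """Return (src, status) for each image in the given HTML string."""
    parser = AltTextAudit()
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical sample page; file names are placeholders.
SAMPLE_PAGE = """
<img src="skateboarder.png" alt="A skateboarder with a prosthetic arm">
<img src="divider.png" alt="">
<img src="guide-dog.png">
"""

for src, status in audit_alt_text(SAMPLE_PAGE):
    print(f"{src}: {status}")
```

A check like this can only detect whether a description is present. It says nothing about whether the description is accurate, appropriately detailed, or respectful of the identities depicted, which are the questions this paper is concerned with.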

Image accessibility is an important aspect of overall digital accessibility and is listed first in W3C's list of website accessibility requirements [W3C 2019]. The prevalence of accessible images has overall improved in the past decade, depending on sites surveyed. Image description prevalence on popular websites increased from less than 40% in 2006 to 72% in 2018 [Bigham et al. 2006; Guinness et al. 2018]. However, some estimates put the share of images lacking alt text as of 2022 at closer to 55% [WebAIM 2022]. In some cases prevalence is much lower, with sites primarily composed of user generated content such as Twitter found to have alt text on only 0.1% of images [Gleason et al. 2020]. This leads to a large gap between the experiences and information available to sighted versus screen reader users [Aizpurua et al. 2016], with many screen reader users lacking important visual information on a variety of topics, while sighted internet users often are not even aware of image accessibility needs [Seixas Pereira et al. 2022]. To address this pervasive accessibility issue, the Accessible Computing field has explored many potential solutions, ranging from fully automated to highly social methods for creating and propagating image descriptions. Several approaches leverage computer vision or other automatic methods for labelling images (e.g., Carvalho and Freitas [2015]; Guinness et al. [2018]; Gupta and Mannem [2012]; Wu et al. [2017]). While these deliver scalability, research shows that automatically generated descriptions are not as high quality as human-generated descriptions [Bennett et al. 2021; Gleason et al. 2020; MacLeod et al. 2017], making automatic approaches less than ideal in some circumstances. Human-generated image description interventions have included studies of crowdsourcing [Bigham et al. 2010; Brady et al. 2013a; Salisbury et al. 2017], friendsourcing [Brady et al. 2013b; Mathur and Brady 2018], and social microvolunteering [Brady et al. 2015]. The primary concerns with human-generated methods are training writers properly to produce high quality alt text and dealing with lag times between requests for information and responses. Some researchers have attempted to combine multiple methods to achieve high quality results [Gleason et al. 2020].

To aid image description creation, there are numerous guidelines detailing how to write image descriptions. Some guidelines are based on feedback from BLV and/or screen reader users [Bennett et al. 2021; MacLeod et al. 2017; Petrie et al. 2005; Power 2012; Stangl et al. 2020]. There are also guidelines from a variety of non-computing and non-academic sources, including museums [Cooper Hewitt 2019], non-profit accessibility initiatives [DIAGRAM Center 2019], and individuals [Chen 2020; Lewis 2018]. Some of these guidelines deal with issues around describing illustrations [DIAGRAM Center 2019] or the identities of photograph subjects [Chen 2020], but none discuss artistic depictions of disability in depth [Cooper Hewitt 2019] or depictions of users in design contexts, both of which are relevant to The Avatar Project images that we conducted our user evaluation on.

Accessibility researchers have in recent years led investigations into the complexities of writing image descriptions for screen reader users in different contexts [Stangl et al. 2020] and with different marginalized identities [Bennett et al. 2021]. Stangl et al. [2020] noted that image descriptions must be tailored to different contexts of use. They also indicated that depending on the purpose behind posting, the image descriptions of subjects’ identities may need to be adjusted. Bennett et al. [2021] focused on the problem of describing marginalized identities in their study of screen reader users who also had additional marginalized identities. They investigated the ways their participants used non-visual methods for understanding identities of people online and the ways they went about describing themselves in images they posted. Our paper similarly includes interviews with people with disabilities on the topic of describing identity in images; however, we define our alt text consumers as disabled users who may be represented in the images rather than only screen reader users. We also contrast the in-depth research process to create image descriptions with image accessibility processes used by practitioners in industrial contexts.

2.3 Accessibility in Practice

The gap between HCI researcher communities and HCI practitioners has been investigated from many different angles (e.g., Buie et al. [2010]; Colusso et al. [2017]; Dalsgaard and Dindler [2014]; Gray et al. [2014]). Within the larger body of literature on HCI practitioners, there are an increasing number of studies looking at accessibility in practice. Some of these involve evaluating existing guidelines and accessibility evaluations for the challenges they pose to designers and developers who need to put them into practice (e.g., Colwell and Petrie [2001]; Law et al. [2006]; Steen-Hansen and Fagernes [2016]). Evaluations of existing guidelines often conclude that, although designers and developers are the “users” of the guidelines (that is, the people who need to understand and put into practice the information they contain), their pragmatic needs and perspectives are not properly considered in the design of accessibility guidelines. This has led to several studies focused on developing tools to bridge the gap between accessibility guidelines or evaluations and the practitioners who have to instrumentalize them (e.g., Gaggi and Pederiva [2021]; Scott et al. [2015]). Other studies focus on pedagogical considerations for students and early career HCI professionals and how they learn or understand accessibility (e.g., Carter and Fourne [2007]; Gray [2014]). There are many studies that directly consult industrial professionals to understand their approach to or understanding of accessibility; however, almost all of them are survey-based [Freire et al. 2008; Inal et al. 2020; Durdu and Yerlikaya 2020; Porter and Kientz 2013; Putnam et al. 2012; Trewin et al. 2010; Yesilada et al. 2012]. Interview studies or qualitative investigations of accessibility in practice are few and far between (exceptions include [Azenkot et al. 2021; Giannoumis and Nordli 2020; Vanderheiden and Tobias 2000]).

Among studies of accessibility, few research papers have focused on how image accessibility is enacted as part of industrial accessibility practices. Giannoumis and Nordli briefly describe the basic knowledge of image accessibility their participants demonstrated and how it was requested and delegated in the context of their case study of a Norwegian broadcasting company [Giannoumis and Nordli 2020]. Beyond industry contexts, most studies of image accessibility focus on interventions (either tools or new processes) into alt text creation, rather than investigating the existing practices in place. Few studies have discussed the current processes that users, researchers or disabled people undergo to create alt text (exceptions include [Bennett et al. 2021; Gleason et al. 2019; Mathur and Brady 2018; Williams et al. 2022]). Notably, Williams et al. [2022] describe the workflows that computing researchers used when creating alt text for their publications, noting issues with complicated figures, tight deadlines, and technical tools used to insert alt text. Given the small body of research discussing alt text creation processes and the even smaller overlap between studies of industrial accessibility and image accessibility in practice, our paper seeks to contribute descriptions of four initial image accessibility creation models connected to industrial accessibility practices and inclusive imagery.

3 Study 1: User Evaluation of Inclusive Imagery and Alt Text Co-Design

This section of the paper summarizes the methods, findings, and conclusions from our ASSETS ’21 conference paper [Edwards et al. 2021]. This initial study consisted of a user evaluation of the Avatar Project (a set of inclusive imagery, described in more detail below) and alt text co-design conducted with users with disabilities through a series of interviews and focus groups. We present the data from this study to contextualize the collaboration on which our current study (Study 2) is based, and to demonstrate an in-depth example of the steps and insights gained from one type of alt text creation process (see Section 5.1.1), which we then compare to the processes described by other stakeholders.

3.1 Background: The Avatar Project

The Avatar Project is a collaboration between scholars at University of California, Irvine and industrial researchers and designers at Google. The intention of the images was to function as reusable brand-compliant components that could be used by all designers (including blind designers and developers who often lack accessible tools [Storer et al. 2021]) to represent users in company contexts. They were created to help designers and engineers “think more critically about inclusivity, diversity and representation” [Mac 2020] and so were to be included in the company's design system used by designers and developers for internal documents such as mockups and personas. They would also be viewed by users in user testing environments and external advertisements. There have been four total artists commissioned to produce artwork for the Avatar Project, with the size, style, and content of the different artists’ contributed sets varying, despite all focusing on the basic goal of producing inclusive imagery.

Our collaboration began when the academic researchers were consulted for their expertise in accessibility. Prior to that, the industry authors had held a series of workshops with other designers at Google to discuss and brainstorm how to properly represent a more diverse set of users, including users with disabilities, in various design and marketing assets. The search for inclusive imagery had begun with using photographs of people with disabilities, who had given permission for their image to be used in such documents. However, concerns arose around branding and ethical implications of having photographs of real people used in contexts beyond their direct control. Thus, the three original artists were commissioned to create a diverse set of illustrations of fictional users. The fourth artist was commissioned later as the project developed.

Study 1 involved a subset of images from one of the Avatar Project artists. That artist's collection as a whole included representations of fictionalized people of many ages, races, genders, ethnicities, professions, sexualities, disabilities, and other characteristics. Across all Avatar Project sets were also non-human and abstract representations depicting ideas, emotions, or concepts related to identity. Six of the images assessed in Study 1 included representations of visible disabilities (e.g., a skateboarder with a prosthetic arm) or assistive aids (e.g., a guide dog).

3.2 Study 1 Methods

Our original interviews sought to answer the user side of the overall research question: how are accessible and inclusive images created and how are their attributes negotiated by various actors, including image creators, alt text creators, and end users? To understand this aspect of the question, we conducted nine focus groups and nineteen interviews, with 25 people who self-identified as having one or more disabilities. The study was reviewed and approved by the Institutional Review Board of the first author's university. Focus groups and interviews were conducted remotely via phone, Zoom, or Skype, and participants were compensated at a rate of $20 per hour. All sessions were video and audio recorded after obtaining written or verbal consent from participants. The first author attended all focus groups and interviews, usually with one or more additional researchers present.

3.2.1 User Participants.

We recruited 25 people with disabilities (Table 1) from previous participant pools and authors’ existing social networks, as well as through regional disability groups. We also used snowball sampling. We included participants based on their self-disclosure of having a disability in a pre-screening survey. We intentionally sought to recruit people who had additional marginalized racial or gender identities, to solicit thoughts regarding marginalized identities that were not disabilities.

ID	Session	Age	Gender	Race	Disability
U1	S1(+)	18–29	Non-binary (ve/ver/vis)	Black or African American	MIID
U2	S1	18–29	Non-binary (they/them/theirs)	Black or African American	MIID
U3	S1(+)	30–39	Woman	White	P, MIID, BLV*
U4	S1(+)	18–29	Man	Asian	DHH
U5	S2(+)	18–29	Woman	White	P, MIID∼
U6	S2(+)	18–29	Non-binary (they/them/theirs)	Asian	MIID
U7	S2	18–29	Non-binary (they/them/theirs)	Asian	MIID
U8	S3	18–29	Woman	White	MIID∼
U9	S3(+)	30–39	Woman	White	P, MIID
U10	S4(+)	30–39	Woman	White	BLV*
U11	S4(+)	18–29	Woman	White	P, MIID∼
U12	S5	40–49	Woman	White	P
U13	S5(+)	30–39	Woman	White	BLV^
U14	I6(+)	18–29	Woman	White	BLV^
U15	I7(+)	18–29	Woman	Hispanic or Latinx	BLV*
U16	S8(+)	18–29	Woman	Black or African American	P, MIID
U17	S8(+)	30–39	Man	White	P, BLV*
U18	S8	18–29	Woman	Asian, White	MIID
U19	I9	30–39	Woman	Black or African American, White	DHH, BLV*
U20	S10	30–39	Man	White	BLV*
U21	S10	30–39	Man	White	BLV*
U22	S11(+)	40–49	Man	Hispanic or Latinx	P
U23	S11(+)	18–29	Non-binary (they/them/theirs)	South Asian	MIID∼
U24	S12(+)	30–39	Woman	Hispanic or Latinx, White	BLV*
U25	S12(+)	18–29	Man	White	BLV*

Table 1. End User (U) IDs and Demographics

Key: DHH = D/deaf or Hard of Hearing, P = Physical Disability, MIID = Mental Illness or Invisible Disability, ∼ = Multiple disabilities within this category, (+) = Participated in a follow-up interview, * = uses screen reader some or all of the time, ^ = uses magnification.

3.2.2 Focus Groups and Initial Interviews.

We conducted nine focus groups and three of the interviews (I6, I7, I9) using the same protocol—lasting about one hour each—to get participants’ opinions and reactions to visual and written depictions of disability and other marginalized identities. We formed focus groups based on participants’ availability as indicated in the initial screening survey, scheduling a focus group when at least two participants were available. The three interviews included in this section were intended to be focus groups but due to cancellations only one participant attended.

Before the first of these sessions, we created an initial description of the images we had selected to evaluate (see Table 2). These descriptions were based on short user descriptions as they were received from the artist, combined with existing guidelines on creating alt text for art (such as [Cooper Hewitt 2019]) or other content types, as well as general guidelines for describing identities (such as [Scheuerman et al. 2020]).

Table 2.

Before the start of each of these initial sessions, to prompt thinking about representation in images and descriptions, we asked participants to provide either a sketch or one-sentence textual description of themselves from the shoulders up. At the start of each session, participants and researchers verbally shared their self-representation, as researchers displayed the illustrations and/or descriptions via the teleconferencing system's screen sharing and chat features. After introductions, we presented an example interface mockup that used three avatars, to explain to participants how avatars were meant to be used by designers. Next, we displayed four to six slides, each with one avatar's illustration and its associated image description. For the sake of time, we could only present a subset of the nine avatars in each session; our selection was such that participants would be shown avatars most closely related to their disability identity. For our participants who used screen readers, we pasted the image descriptions into the text chat as we introduced the image and gave participants time to listen to the description.²

For each image description, we asked participants what they thought about the level of detail, length, accuracy, specific terms used, organization, and whether it fairly represented the identities portrayed. We also asked how they felt about the image and description being used by a potentially non-disabled technology designer. At the conclusion of the session, we asked more generally about how social identifiers such as age, gender, race, and disability should be described in image descriptions; what should be considered when writing image descriptions and assessing quality; and whether invisible identities or disabilities should be included in the images or descriptions. As a form of theoretical sampling [Charmaz 2014], we asked participants in later sessions to reflect on the participant opinions and our interpretations from earlier in the study.

After each session, we used notes taken during the alt text discussion to update the descriptions of the relevant images with any additional details or wording changes suggested by participants. If the feedback was specific to one image, only that image's description was updated. If, however, a participant argued that a description was incorrect about, or lacking, a fundamental aspect of what alt text should include, then the descriptions of all the images were updated for consistency.
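
The update rule described above can be sketched in a few lines of Python. This is an illustrative reading of the procedure rather than a tool used in the study: the function name, the dictionary of descriptions, and both example revisions are hypothetical placeholders.

```python
def apply_feedback(descriptions, image_id, revise, fundamental=False):
    """Return a revised copy of the image descriptions.

    descriptions: dict mapping an image ID to its current description.
    image_id:     the image that the participant was discussing.
    revise:       function taking a description and returning the revision.
    fundamental:  True when the feedback concerns what alt text should
                  include in general, so every description is revised.
    """
    targets = list(descriptions) if fundamental else [image_id]
    updated = dict(descriptions)  # leave the caller's dictionary untouched
    for target in targets:
        updated[target] = revise(updated[target])
    return updated


# Hypothetical descriptions and feedback, for illustration only.
descriptions = {
    "skateboarder": "A skateboarder with a prosthetic arm.",
    "guide_dog": "A person walking with a guide dog.",
}


def add_detail(text):
    """Image-specific feedback: append one suggested detail."""
    return text + " They are mid-jump."


def add_medium(text):
    """Fundamental feedback: every description should name the medium."""
    return "Illustration: " + text


after_specific = apply_feedback(descriptions, "skateboarder", add_detail)
after_fundamental = apply_feedback(
    descriptions, "skateboarder", add_medium, fundamental=True
)
```

In practice the revision itself was a matter of human judgment informed by participants' comments; the sketch only captures the choice between updating one description and updating all of them.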

3.2.3 Follow-up Interviews.

Of the 25 participants who took part in the initial focus groups and interviews, 16 were able to participate in a follow-up interview lasting 30 to 60 minutes. The follow-ups gave us an opportunity to probe further the comments participants had made in their session, in a setting where discussing personal identities would potentially be more comfortable. The semi-structured interviews covered what social and personal identities our participants deemed important, whether their importance changed based on certain contexts, and whether these identities informed their reaction and responses to the illustrations and image descriptions shown in the previous session. We also discussed participants’ experiences with representations of their disability or disability in general in text or images.

3.2.4 Data Analysis.

Focus group and interview recordings were transcribed through institutionally authorized automatic transcription services and edited for accuracy by the researchers. The academic researchers analyzed the data through line-by-line inductive open coding. After creating an initial codebook, we developed an affinity diagram using MURAL, an online collaborative whiteboarding tool, to form axial codes and subsequent themes. Researchers met weekly during analysis to iterate on codes and themes. Though our investigation centers on fictional representations of disability in design contexts, social media was often a touchstone that participants used—implicitly or explicitly—to compare expectations of image descriptions in different contexts. We took these context switches into account during analysis.

3.3 Summary of Findings and Conclusions from Study 1

Study 1 feedback from users led overall to longer alt text (as seen in the examples shown in Table 3) partially due to the nature of co-designing through consensus, but primarily because participants believed that detailed descriptions of the images were warranted given their intended uses. In order for the images to properly invoke reflections on inclusivity as well as educate primarily non-disabled designers and developers, participants argued that these image descriptions should be highly detailed. This was part of the many areas of feedback we received from the Study 1 sessions, including comments on the sub-par nature of automatic alt text, detailed feedback about clarity of wording, and discussions on the tradeoff of including many visual details for low vision users versus making the image descriptions relevant for those who had no sighted frame of reference for things like colors of skin tone. These data were important for our overall alt text recommendations, in combination with the findings related to the role of intent, interpretation, and ableism in disability representation through alt text.

ID	Image Set Details	Background	Additional Notes
A1	Created largest set of Avatars; Included explicit depictions of disability; Colorful shaded style	Regarding racial inclusion, is of Asian descent but didn't explore that in the Avatar Project; Had family members with disabilities	Worked with a character designer on their images; had finished their illustrations over a year before interview
A2	Included explicit depictions of disability; Combination of lineless and thick-lined shapes with flat colors	Had experience as a teacher working with children with disabilities	Finished their illustrations well before interview; Based in Italy but spoke in English for interview
A3	More abstract style, focused on depicting/using everyday objects; Included images inspired by disability communities/concepts	Trained in Industrial design; Had created images for various US editorial pieces	Was in the midst of drafting and redrafting their set when interview was held

Table 3. Details about Artists/Image Creators

First, similar to the alignment between educational potential and level of detail, participants felt that the purpose or intent behind the image was important when deciding when and how to describe identities in image descriptions. Users located the origin of an image's purpose in the goals of the artist who created the fictionalized representations. If an artist intended to depict identity in a specific way then the alt text should reflect the intended type of representation and identity. Users also felt that context, including the intended audience and the use cases for the image, should shape the final alt text. Alt text describers, they felt, should consider the imagined readers’ familiarity with the depicted identity when deciding how much detail to include in the image description. If, for example, people without disabilities are the main audience and the images are being used for an educational purpose, then explicitly describing disability may be important in a way it is not important if images are presented to a familiar audience.

Describing disability was particularly difficult for users to create solid rules around. Their personal relationships with their disabilities varied. Thus, while they emphasized that accuracy in alt text was important, users found it hard to define how alt text writers could accurately describe disability without assigning identity labels to fictionalized representations that could not claim them. A third of the users we spoke to discussed difficulties labeling fictional subjects’ disabilities and approved of describing assistive aids instead, to avoid the fraught issue entirely. Alt text writers were cautioned to avoid describing identities for fictional subjects that could not claim them personally, and to balance this linguistic ambiguity with the level of conciseness needed for image descriptions.

Most importantly, users we spoke to understood image descriptions as sites of contested meaning. They argued that the subjective experience of alt text writers affected the final alt text and that if unchecked the conscious or unconscious biases of the image describers could have deep impacts on what a screen reader user understood about the final representation. Users also felt that a large part of the image description process was not just encoding certain biases or points of view into the alt text, but also the process by which alt text readers interpreted the description through their own lens. Essentially, users we spoke to described the process of creating alt text as a progressive layering of interpretations (Figure 1). First, the artist interpreted the basic identity through their perspective and goals, then the alt text writer interpreted the image with their own biases when creating the alt text, and finally the alt text reader interpreted the image description (alongside the image, if sighted) and decided on their final understanding of the identity representation present.

Fig. 1.

Additionally, users discussed the way image descriptions were part of a “very vision-centric” world (U10) where invisible disabilities were not considered and the gold standard of image descriptions was assumed to be the complete translation of the visual elements. This reinforced a type of ocularcentrism where BLV experiences were assumed to be inferior substitutes for sighted understanding.

4 Study 2 Methods

Study 2 focused specifically on image creators and alt text creators as part of the overall research question: how are accessible and inclusive images created and how are their attributes negotiated by various actors, including image creators, alt text creators, and end users? To do this, the first author conducted remote interviews with three of the four artists who had made images for the Avatar Project (see Section 3.1) and conducted three further remote interviews with accessibility practitioners embedded in the company. Two of the artist interviews were conducted over email with exchanges taking place across multiple months. All other interviews were conducted over video conferencing, either Zoom or Google Meet, and lasted between 30 and 45 minutes. Email interviews are an accepted qualitative method [Dahlin 2021; James 2016] and may even be the preferred method when, as was the case with the artists, respondents have differing scheduling and fluency needs or benefit from the time to think and reflect before responding [James 2016; Oltmann 2011]. Study 2 was approved by the academic researchers’ university and this journal article underwent internal review and approval by Google before being submitted to TACCESS.

In the artist interviews we discussed each artist's process for creating the images, particularly any images representing disability, and their artistic approach to the project. The first author asked how, if at all, their own personal identities or experiences influenced their process or artistic products. The artists were also asked whether and how they tried to explicitly depict marginalized identities or, alternatively, whether race, gender, or age were intended to be ambiguous. The interviews finished by asking the artists how they felt about receiving feedback from consumers of the images.

For the three industry practitioners, the first author asked about their general job title and duties, about how accessibility was done at the company from their perspective, how and to what degree accessibility was introduced as part of their job responsibilities, and their thoughts or experiences with alt text including processes they used to write alt text or used to determine or affirm what should be included in alt text. The accessibility practitioners were recruited through existing networks: the industry authors of the paper connected the first author to relevant stakeholders related to the Avatar Project, and one additional accessibility practitioner was recruited through snowball sampling. Tables 3 and 4 include a short summary of the personal or work details recorded on each image creator (Table 3) or alt text creator (Table 4) we spoke to, in order to give context for their comments in the findings section. All interviews were automatically transcribed and edited for clarity by the first author, who then performed line-by-line inductive coding and sorted data into overarching themes.

ID	Years in Job	Background/Job Focus	Experience With Accessibility
AP1	UX Writer for 3 years	Attending a technical writing program; In charge of crafting the words in the UI; Had overseen contractors on copy editing and putting data into the internal content management system (CMS)	Created image descriptions for a new launch of an existing product website including blog posts; Had previously organized a collaborative team activity to create alt text for figures that were missing it.
AP2	Program Manager for 15 years in multiple large tech companies	Background in computer science; Focuses particularly on engineering and technical problems; Also handles general project management tasks.	Is fully blind; Worked in accessibility outreach and consultancy; As a program manager, oversees various attempts to organize efforts to produce alt text as well as having personal experience as an alt text consumer.
AP3	UX Writer for 5 years	Before Google, had worked in human resources at a large institution working with internal tech teams; Has always been passionate about language; Sees job as understanding what a user is trying to accomplish and guiding them through it.	Had been introduced to alt text as a temp but didn't have extensive experience with it until she took responsibility for generating alt text for a large set of images that were going to become available on Google products.

Table 4. Details about Alt Text Writers/Accessibility Practitioners

5 Study 2 Findings

In Section 3, we³ described the process by which we created inclusive and accessible imagery in collaboration with designers at Google. However, as we interviewed stakeholders and accessibility practitioners in the company, we found that outside of research collaborations such as ours, in-depth iteration on alt text with feedback from users was rarely if ever used by practitioners. Section 5.1 will describe the four types of alt text creation described by practitioners or used in collaboration with Google: The User-Evaluation Process (used in Study 1), The Lone Writer Process, The Team Write-A-Thon Process, and The Artist-Writer Process. For each process we describe the key steps involved, how relevant actors (users, alt text writers, artists or designers, etc.) interact or inform the final alt text, and the best features or largest drawbacks of the creation processes compared to one another. Following that, we briefly describe three additional themes that point toward areas for improvement in future alt text production processes in industry.

5.1 Four Models of Alt Text Creation

Before we present the four different models of alt text creation, we begin by describing the general context in which the models our interviewees discussed were situated. The accessibility practitioners we spoke to had varying degrees of familiarity with organizational approaches to accessibility outside the specific projects and teams they had worked on. According to AP2, Google's approach to accessibility “[is] on multiple levels… There's a bottom-up approach; there is a bit of a top-to-bottom approach.” This top-to-bottom and bottom-up approach was demonstrated in the alt text creation processes as well. AP1 described the organization-level requirements for alt text passed down to product teams: “In any product launch, you have the accessibility review and a legal review, amongst others like engineering code reviews… [so] you have a [Quality Assurance] tester—accessibility tester—then going through the content [to assure image descriptions] make sense in the context of the task that they're trying to accomplish in order to test it.” In addition to these top-down processes, more bottom-up resources were used. For example, AP1 and AP3 both discussed being aware of or using a variety of alt text guidelines existing in different forms and formats across teams. The manifold nature of the organization–as a large tech company with many teams and divisions–required a similarly manifold approach to accessibility and alt text generation in particular, demonstrated in the differing methods that accessibility practitioners described.

The role of individual interest and advocacy as a complement to existing organizational structures was also an ongoing theme in discussions of accessibility practices with the practitioners. AP2 pointed out there were both formal and informal structures to support accessibility, with many individuals–particularly engineers–having their personal “curiosity” supported by provided training and networking opportunities: “Notwithstanding the fact that people can just connect with each other [through] bug bashes and workshops and events… there's plenty of opportunities to learn about accessibility.” As we will see in the next section, personal passion and dedication among accessibility practitioners were particularly valuable in some alt text generation methods such as the Lone Writer process (Section 5.1.2). This overarching context for the four alt text creation methods also highlights the importance of organizational support and formal policies in enabling robust alt text creation.

5.1.1 The User-Evaluation Process.

We begin with the alt text generation method we are calling The User-Evaluation process. We use this name because, as demonstrated in Section 3, the process involved the direct evaluation and revision of image descriptions with users. In our enactment of the process in Study 1 we particularly focused on centering the perspectives of users with disabilities through interviews and focus groups. However, the core feature of the User-Evaluation process could in theory be realized by including users in direct evaluation and revision of image descriptions through almost any method (e.g., surveys, usability tests, A-B testing) depending on the resources and time available or the amount of qualitative feedback desired.

An important element of the User-Evaluation method is how knowledge and responsibility are distributed. Acting as the alt text writers in Study 1, we based our original version of the alt text on the descriptions given by the artist, but otherwise had creative control over the alt text and how user feedback was integrated. User comments impacted the final alt text, but (at least in our case) did not filter back to the artists. A2 particularly said any feedback “would be a great enrichment and I would gladly change any part of my illustrations.” A1 was similarly open to changing the images based on user feedback. Because the User-Evaluation process took months, coming back to the artists and asking them to redraft the images was logistically impossible. Unlike the Artist-Writer process (Section 5.1.4), it can be hard to iterate on both imagery and alt text using this process. This matters because, as Study 1 shows, the final received understanding of the alt text is impacted not just by the alt text but by the imagery itself. Thus, a drawback of the User-Evaluation process may be the close collaboration and careful managing of timelines needed to properly distribute the feedback from users to all players in the imagery creation process.

The practical feasibility of the User-Evaluation process may be its largest drawback, since buy-in for the method, at least, was very high. User feedback was something all of the artists (A1–A3) and accessibility practitioners who had experience with directly creating alt text or images for user consumption (AP1 and AP3) discussed as valuable. AP3 explained that “If possible, it would be great to get some actual feedback from users… based on these descriptions that I wrote, how confident do they feel in selecting an image as their profile picture? …. Should I have been more detailed? … Or was it okay to be really spare?” The issue of user feedback may be particularly pertinent for AP3 because, as we'll discuss in Section 5.1.2, she used the Lone Writer process and may have felt more fully responsible for the overall alt text. She said, “I think if I were going to do a big set again, it would be good to actually get some feedback.” The enthusiasm for user feedback among our participants suggests that this process may be ideal for situations where buy-in from different stakeholders is important, such as image descriptions with broad cross-product or cross-organization applications.

5.1.2 The Lone Writer Process.

In contrast to the many potential stakeholders involved in the User-Evaluation process, the Lone Writer process is in some ways the most straightforward. The process is characterized by a body of images being passed to a single alt text writer, who creates the alt text for all the images, before they are passed down to users. The Lone Writer process, as it was described to us by AP3, was: “Working on a collection of 800 illustrations where I did write the accessibility descriptions for all of them.” Describing her process, AP3 said: “I sat down and just started going through as many as I could… I think I did a calculation that it would take me about a minute to write each one so whatever 800 minutes is… yeah, I ended up working a couple of weekends to get this done.” The Lone Writer process is defined by this single solitary workstream, where rather than collaboration within or across groups of actors (users, writers, artists) the content flows unidirectionally through the layers of alt text creation.

One of the features of this process was that AP3 was able, because she was the only writer of the alt text, to keep a certain level of consistency between images. “I dug up the guidelines that I already knew about from my first team, where I was a temp and read through them… Some of the guidelines for that, they were going to make things too long, because it was you know, like ‘10 words’ and I was like ‘Oh, I think, like six maybe for this’ just because I don't want [it] to take four hours for somebody to go through the whole library… one of my guidelines for myself was like, don't refer to colors [because they could be customized by the user]… I tried to limit myself to four to six words generally and using present tense.” These adaptations of existing guidelines–and the resulting similarities in length and tense across the 800 images–were simple to decide on and implement as AP3 was solely responsible for the alt text. Whether this degree of consistency is ultimately a positive in terms of user experience was not assessed.

However, one clear logistical drawback of the Lone Writer process is the potential, as seen in this example, for a high burden to be placed on one individual. AP3 at multiple points described the effort put into the 800 images over two weekends. When asked if this was a task she more or less assigned to herself, she agreed “it definitely was.” She explained that she worked on it over the weekends “just because it was really important to me to do it.” She felt personally responsible for creating more accessible images, to the point that she put in time outside of working hours. She said that if a similar situation happened in the future, “we've got to have these descriptions and I will write them even if I have to do it over two weekends. I think overall it's important to do this work.” The Lone Writer process by its nature means that a high degree of creative burden, both in terms of time and in terms of responsibility, falls on one person.

Comparing AP3’s experience with the Lone Writer process to our experience with the User-Evaluation process for the Avatar Project also suggests some potential tradeoffs of this method. AP3 characterized the images as including a wide range of subjects, including real life locations, objects arranged in a room, and humans or pets engaged in activities intended to be relatable to users. This is very different from the detailed portraits of individuals or concepts included in the Avatar Project, which were intended specifically to encourage reflection on inclusion and identity. This may partially explain the significantly shorter amount of time–approximately one minute per image–that AP3 took to create the alt text compared to the several months and more than half a dozen iterations the User-Evaluation method required. This suggests that for complex images the Lone Writer process may not scale, or conversely, that for large sets of images changes would have to be made to the User-Evaluation process as we conducted it.

5.1.3 The Team Write-A-Thon Process.

AP1 and AP2 both had experience with the third type of alt text creation model, which we are calling The Team “Write-A-Thon” Model. The essence of the Team Write-A-Thon process is a group of alt text writers collaboratively developing alt text. AP1 spoke to this model when she described a “fix-it” day she had organized for her co-workers to participate in alt text creation. She describes “fix-its” as “an engineering team concept… where you get everyone together and you have a list of tasks and you just like run through them all because that's the best and easier way to get the job done that has not been figured into anyone's priorities yet. No surprise, that's always alt text.” She said for the specific “fix-it” she ran she “got anyone, not just designers but anyone from our team, like forty people maybe, to sit down and describe the images and … then [co-worker] and I [would] then review it and add it to the CMS.” The fix-it approach was described as one way that multiple alt text writers can have a chance to sit down together and work on accessibility as a group.

The Team Write-A-Thon Process, as a collaborative approach, has the potential to reveal how varied people's opinions are about writing image descriptions. As AP1 described, “It was also a good experience if you want to see just how different two different people can think the alt text should be… someone is going to say, ‘this is a mobile UI with this, this, this.’ Someone else is gonna say like, ‘a news app and five buttons.’ And you're like, ‘I dunno? Would that be the same thing?’” The subjective nature of alt text production becomes more obvious when, as in AP1’s case, there are many writers describing images with “a lot of repeated constructs” but the alt text produced is very different. AP1 felt that the fix-it day resulted in understanding “a few different standardized ways of describing things… [Because] there's like three different ways of saying what really should be the same thing. So actually, consistency is a big chore across it all.” Thus, the Team Write-A-Thon model may also create a level of consistency similar to the Lone Writer model, but through consensus rather than a single person's decision. It may also have the advantage of placing less of a burden of work on just one person.

Although getting agreement from multiple people may result in clearer and more consistent understanding across the group, reaching that level of agreement can be very time consuming and laborious. AP2 describes some of the drawbacks of a collaborative approach: “When we're redesigning the [product] website… the question did become pretty quickly… ‘I'm going to be spending the next half a year labeling 1,000 images, or is there a better way we can spend our time?’… Now you've got 20 designers sitting in the room trying to say like ‘Well, this is the right way to say it. Do I give enough information? Do [I] not give enough information?’” Essentially, while a collaborative approach between many writers “[done] in a very standard fashion” may ideally lead to a distributed load with higher consistency, in practice it can lead to more time and effort being spent coordinating the writers and getting consensus from everyone.

5.1.4 The Artist-Writer Process.

This brings us to the last alt text production process we are going to describe: The Artist-Writer Process. For the prior processes, we did not discuss the role of the artist because it was always minimal, with a clear hand-off from the artist well before and completely separate from the alt text creation process. The Artist-Writer Process, in contrast, occurs when there is either direct collaboration between image creators and alt text creators, or when the image creator and the alt text creator are the same person. A key finding here was that who counts as an “image creator” is actually more complicated than assumed. Two of the three artists we interviewed explained that their role as an artist was not simply to create an illustration, but to collaborate with a person writing the words accompanying the image. A2 said very clearly: “There is always a lot of mediation and collaboration between the art director, the author, the illustrator and the reader.” A3 described “the traditional model” as being a collaboration between “the copywriter and the art-director.” When an artist becomes part of an Artist-Writer process, they become a more involved co-creator of the final meaning of the alt text.

The Artist-Writer process has two variations. The first is having the image creator and the alt text writer be the same person. AP1 explains this is, in her experience, rare: “[my colleague is] not like any other person who contributes imagery in that she does write the alt text herself.” The more common practice in AP1’s team before the Team Write-A-Thon method was attempted (see Section 5.1.3), was the second version of the Artist-Writer process. Under this version of the model, a single author wrote the text of the webpage, made decisions behind the images, and “[it] was not a complete submission until it has alt text, designers must contribute to alt text.” This meant there was still a single source of creative control over imagery and alt text, but that it took the form of one person overseeing the artist and writing the text, rather than actually making the imagery themselves.

Having a single author in charge of content, imagery, and alt text had drawbacks, however. AP1 explained: “The goal… is that the person who is responsible for the information about a given page or topic would also write the alt text for the image, [because they are] the subject matter expert on it. But in practice it never works.” The reason it didn't work, according to AP1, was that general writers did not have specialized knowledge of or experience with alt text: “I feel like the amount of training versus the frequency someone would do it… the overhead was too high. Like someone might actually only end up writing alt text once a year [so] they are not going to get good at it.” In theory, AP1 felt, “[it's] important education-wise, but they are not going to be a good contributor.” AP1 knew “I'm going to have to rewrite it.” Thus, the Artist-Writer process had tradeoffs to either quality or conservation of effort.

However, there are still valuable elements to the Artist-Writer approach. For one thing, the educational element of distributing alt text production labor across a large team should not be dismissed. There is also an implication that if a minimum alt text quality could be assured, it would avoid the time or manpower bottlenecks seen in the Lone Writer or the Team Write-A-Thon processes. Lastly, the unified authorial intent that the Artist-Writer process facilitated was closest to the understanding of “intent” we and end users had as our baseline understanding in Study 1. This may make it a process that could be easily integrated with elements of the User-Evaluation process because, as we'll discuss in the next section, clear delineations between the methods are not always necessary or entirely accurate to the in situ experience of the processes.

5.1.5 Deconstructing Delineations Between Types of Alt Text Creation.

Despite labeling and describing these four alt text creation processes as distinct, in practice the delineations between different types of alt text creation processes were not always so clear cut. AP3 explained that even in the Lone Writer process, she didn't work entirely alone: “There were a few images where it was a lot harder [to write alt text] … so I did set aside some of them [to give to a colleague] as like ‘I'm having trouble… What feedback can you give me?’” Even, or perhaps especially, writers who take full responsibility for alt text don't work in complete isolation, making the distinction in practice between the Lone Writer process and the Team Write-A-Thon process not always easy to determine. Similarly, the User-Evaluation process, as we practiced it in Study 1, involved writing the iterations of the alt text not just in collaboration with the users, but in collaboration with different researchers on the team, making it reminiscent of the Team Write-A-Thon approach.

Additionally, even in methods that, unlike the User-Evaluation process, did not directly elicit user feedback, artists and accessibility practitioners discussed prior knowledge and experiences with disabilities informing their approach to accessible imagery in a way that complicates the simple forward march of ideas from creators to end users. A1 explained: “Since my sister is autistic and is non-verbal, I was exposed early on to neurodiversity and different disabilities…. My uncle has a prosthetic leg so that was just a very normal sight as a child… These things really opened my eyes about the concept of accessibility for all different kinds of people as well as the constant daily challenges that exist everywhere because our society was built for neurotypical, able-bodied people.” AP3 similarly said that her personal relationship to assistive technology made her more passionate about making imagery accessible: “I don't have to use a lot of assistive features… but there are things that are helpful for me even though I don't need them, like automatic captioning of YouTube videos… if I want to watch a YouTube video in bed without waking up my spouse it's lovely to be able to actually do that without having to get out of bed and go find earbuds.” People with disabilities, even in more direct models where accessible imagery is not being created with direct consultation with users, still impact the way accessibility is done, because all actors in the imagery and alt text creation process are filtering information through their past experiences, including–for many–personal experiences with people with disabilities.

5.2 Three Further Themes from Inclusive and Accessible Imagery in Practice

In this section, we describe three additional themes from the interviews with image and alt text creators that ultimately shape the overall process of inclusive imagery creation in the studied company. First, we describe the way artists and accessibility practitioners fulfilled the expectations users in Study 1 had by considering the intended use case when creating imagery and alt text. Next, we describe how the alt text creation processes we have just described were impacted by the macro pragmatic and micro personal considerations of different actors, including the organization. Lastly, we describe how inclusive imagery as it was done at Google had clear champions for the words and the images but no set role to champion the alt text, leaving a potentially fillable gap in future alt text creation processes.

5.2.1 Intended Use Cases and Identity Depictions in Inclusive Imagery.

Just as users in Study 1 believed it ought to be, we found the intended use case for the images impacted the entire inclusive imagery and alt text production process. For example, AP3 thought about the use case when deciding how or if to explicitly mention race in alt text: “We have an image of a person riding a motorcycle… [and the alt text just says] ‘Person riding motorcycle.’ You're just trying to keep it pretty generic…so that these images could appeal to anyone.” Whether aracial images or image descriptions actually do appeal to anyone is a much larger question, but what is relevant is that AP3 felt that because these images were to be user icons, she needed to keep the descriptions of people ambiguous.

The artists working on the Avatar Project likewise said the imagery's intended use as an inclusive tool was an important factor when deciding how to depict identity. A2 summarized the project as “Google asked me to create a series of avatars that represented diversity… the goal was simply to represent the widest possible spectrum of human beings” including an image of disability that emphasized “an extremely shareable and ordinary moment” where “the lack of limb [was] visible but not overwhelming.” A1 similarly felt that they “didn't want our depictions to tokenize or degrade any person or specific disability, or flatten them to just that… it was the goal for each and every avatar to show a multi-dimensional story of the character… not centering a disability as a single identity.” Despite the inclusive purpose, artists discussed wanting to keep a degree of intentional ambiguity. A1 explains that “we had specific ages/genders/races in mind, but the final product keeps a lot of them pretty ambiguous… Just because I assigned a specific gender to a character in my mind, doesn't mean that people will/should see it the same way.” A3 felt that “any type of successful art of any kind has these layers… where it doesn't hit you over the head, but it kind of suggests something and leaves a little bit of room [for interpretation].” Inclusive imagery, in the minds of artists, did not always require explicit representation of identities.

What was interesting was that, despite the role of artists in depicting identities, there was not a clearly defined path of intent from artist to artwork. A1, talking about themselves and the character designer they worked with, said: “I feel our art was adapting to the purpose/subject matter… This art was for a client, and it had a specific purpose to represent people inclusively.” A2 similarly argued that “the job of an illustrator (at least in my case) is to represent an idea, a concept, a state of mind, that is usually someone else's, the writer of the text.” The pre-figured intent coming from the project sponsor therefore informed both the alt text and the artwork (see Figure 2).

Fig. 2.

5.2.2 Balancing Pragmatic Considerations versus Personal Intuition and Expertise.

Another tension we saw at play when artists and accessibility practitioners were creating accessible imagery was the need to balance the pragmatic concerns of the organization and users with the individual agency and expertise of the contributing image and alt text creators.

There were a host of pragmatic considerations that accessibility practitioners weighed when deciding how to create alt text. AP2 valued human generated alt text, but explained he would never be the person to say, “‘Okay let's leave everything else and just work on the alt text,’ because we've got 10 thousand pages full of images and we're going to be spending the next year labeling those images.” AP1 emphasized that pragmatically when getting new people up to speed on writing alt text, she tells them to focus on what about the image is “meaningful” or “most salient” which “[is] tricky because [what] you're asking… someone to do is say ‘What is this image's role in the [webpage]? What is it there for?’” Thus, both the scale of a large company and the complexity of producing useful alt text when writers are inexperienced affect the methods and approaches used to create alt text.

There are also personal levels of expertise that inform how individuals create inclusive and accessible imagery within the context of these larger constraints. For example, A3 explained that when they decide to move from researching a topic to creating the images themselves there is a certain amount of intuition that they have as an artist. “It's just a balance…I have to read about and at least sort of understand what's going on…, but I think it's just sort of intuitively knowing when… I've done enough of that.” AP1 has a similar level of expertise in her role as a UX writer. She explains that “[she] take[s] responsibility, happily, for the way language and image work in our product” and that in that role of responsibility, she understands that “oftentimes the secret is that someone made the image without a clear intent or without a purpose and that's when alt text becomes really hard.” For both artists and writers, personal experience and expertise in their field give them insight into how accessible and inclusive imagery needs to be produced, and this insight is balanced against tangible time constraints and pragmatic considerations of scale to create the final processes and deliverable user experiences.

5.2.3 Understanding the Relative Importance of Words, Images, and Alt Text.

Another element that influenced the approaches image creators and alt text creators took to creating accessible imagery was how each of the different players in the process valued words, images, and alt text. AP1 explained that asking someone “to only do the alt text is like clearly missing a big part. Because like what is the surrounding? The adjacent text is a big part of that experience too.” Describing the way images, surrounding text, and alt text work together, AP1 argues that: “A visual example, even if it just reinforces the concept is probably still useful…[but] the adjacent text is probably already saying exactly what I would want [in] the alt text…[it is] this in-between thing where… this is not so essential to meaning, [but] it's not decorative.” The parallel relationship between the written words, imagery, and alt text is shown in Figure 3. However, as AP1 shows, the written text, the imagery, and the alt text are not always clearly defined in relation to each other in practice. AP2 even gives the example of describing one's appearance at a conference to add an additional layer to the puzzle: “Some people will say… ‘oh you should describe yourself to other people’… I feel like if people hear my voice, I want them to experience just my voice without knowing how I look… There's a beauty in not knowing too much about [a] person.” Thus, image description processes, in practice, are about balancing the overlapping information sources that are available to different users and considering how to best combine words, images, and image descriptions to achieve the desired result.

Fig. 3.

Perhaps unsurprisingly, in cases where images, words, and alt text are all working together, the perspective from which you look at the problem determines what is seen as the most important. For example, A1 explains “it took a lot of thought to figure out what disabilities would translate visually and the most clearly.” AP1 explained one area where she put considerable time and effort was perfecting the words and images as part of the overall experience of the product she was working on:

“I will say that something I'm proud of with this site, even with the compressed time… We tried to create identities… even though alt text treats the imagery as though it were arbitrary, It's not. We were very deliberate in choosing a few different people, different ages, different identity, and then also building out a little mini world for them too because once you see someone's phone screen it implies like a whole social circle…this is a more complex representation of identity than the previous generation of our products… there's an intentional little story there… But now I'm wondering… the alt text sphere has no idea we made all these choices, right?... Sometimes the same thing is true for... for all users, which is that I wish there were ways of scaling its complexity, [making versions which are] more sophisticated… because right now I can only treat it as this thing where I have to say as little as possible, as efficiently as possible.”

Alt text was ultimately thought about last and least because there was not an expert on alt text or screen reader users’ experiences to weigh in. It never “figured into anyone's priorities” as AP1 said earlier. In the overall production process, images and words both have champions, the artists and the UX writers, but the alt text lacks the same kind of organizational champion role.

6 Discussion

Our findings described four types of alt text creation processes and three additional themes relating to accessible and inclusive imagery creation processes in practice. We identify four image accessibility processes that were seen or described by accessibility practitioners at Google: the User-Evaluation process, the Lone Writer process, the Team Write-A-Thon process, and the Artist-Writer process. These are by no means a comprehensive set of ways that alt text is created at Google, much less in the tech industry as a whole. Rather than claiming the generalizability of our findings, we describe these four processes to give a point of comparison for further explorations of alt text creation and as a first insight into how image accessibility is sometimes done in industrial practice.

From our interviews with industrial practitioners and artists, we also identified three new findings building on our prior work looking at inclusive imagery and alt text [Edwards et al. 2021]. First, image and alt text creators do consider purpose when creating inclusive imagery, but that purpose is often prefigured by a client or commissioner rather than being located in the desires of the artist. Second, alt text being produced in industry is shaped by both the personal experiences and expertise of individuals, as well as the larger logistical, social, and technical constraints of the system that writers and artists exist within. And third, written content and imagery have advocates within industry product launching processes, but alt text does not have the same level of prioritization within the system as alt text writers are not a delineated job role. We describe how this study complicates the layers of interpretation in alt text as we first described them in Edwards et al. [2021] and finish with a discussion of what the processes and themes we found can tell us about directions for image accessibility going forward.

6.1 Layers of Interpretation: More Complicated Processes

Our findings from Study 1 complicated the guidance suggested by Bennett et al.’s study, which focused on self-presentation in photography and argued that the subject of a photograph should be the final authority on how to describe their identities [Bennett et al. 2021]. We found that users saw fictionalized representations as defined by the intent of the image author who, alongside the users themselves, determined what the final received interpretation of disability identity was. Our study of industrial practices and inclusive imagery production processes complicated this further by exploring the more elaborate relationship that artists played in conveying intent through images.

Our findings originally suggested at least three layers at work in the final reception of an image description. First, an artist renders a fictional interpretation of disability or any other social identity. Then, the image describer interprets the fictional subject into text. Finally, the reader of the description interprets the text. Our user interviews revealed that the identities of the “interpreters” at each stage of the translation layered on their own biases, assumptions, or perceived meanings. However, artists are not actually alone in creating a representation of disability or other social identities. In fact, at least in the case of the Avatar Project, the artists were collaborators attempting to stay faithful to the inclusive intent of the imagery project as defined by their client, Google. Character identities originated from a combination of that foundational intent and the artists’ own personal experiences and inspirations. Even artists themselves did not always intend a specific identity to be depicted, but sometimes intentionally depicted figures ambiguously. And this imagery, no matter how well-conveyed or accurate, is often running in parallel to the main written content of the webpage, product, or app. Thus, the final reader of the alt text is basing their understanding of the representation on a gestalt of main written text and the layers of interpretation and intent at play in creating the accessible imagery (Figure 4).

Fig. 4.

6.2 Implications for Image Accessibility: What Can We Learn from Alt Text Production Practices at Work?

Beyond strengthening the model of alt text creation we proposed based only on Study 1, our initial analysis of alt text production at Google points toward some other interesting implications for image accessibility as a field of study and practice. The four creation processes we described highlight different areas where improvement in image accessibility is warranted.

6.2.1 Platforms are Needed for Direct User Feedback on Alt Text.

The wide support among accessibility practitioners and artists for the User-Evaluation method (Section 5.1.1) demonstrates that organizational stakeholders care about and want to prioritize user feedback. Alt text writers we spoke to understood that the text should respond to the intended use case of the images (Section 5.2.1) but didn't necessarily have the data to assure them that their intended message was being received. This is because, despite buy-in, it's not always possible to choose an alt text creation method based only on whether it produces the best-quality results. Instead, issues of scale, time needed to get new alt text writers up to speed, and incompatible deadlines all end up constraining what methods are practically available for any given project (Section 5.2.2). Thus, one potential route for either a research or practical contribution to image accessibility would be the creation of a central user research platform or other screen reader user feedback system. If there were a practical way to get feedback from screen reader users at the scale and with the short turn-around time that large companies like Google need, the quality of the alt text created by these large companies would no doubt leap forward.

6.2.2 Alt Text Creation is Part of UX Design, and Needs Similar Support.

The Lone Writer process (Section 5.1.2) demonstrates that alt text writing in industry can sometimes require high labor burdens be placed on individual workers, particularly those who personally care or feel responsible for accessibility being included in the final products. While the dedication of these workers is admirable, and potentially has some advantages in terms of consistency and turn-around time, this practice shows there is more work to be done on an organizational level. The Lone Writer process, as well as the dedicated advocates for images and user text that we discussed in Section 5.2.3, shows that alt text production is a part of the overall UX design process for a product and needs to be supported as such. This could mean dedicated job duties or even job titles focused on alt text or other accessibility design. Accessibility champions have a role in industry, both as the experts whom new practitioners can turn to when they are tasked with creating alt text despite not having experience, and as the people putting in the effort themselves to give alt text the fine-grained attention that imagery gets from art directors and UI text gets from UX Writers.

6.2.3 As Part of UX Design, Alt Text Creation Follows UX Principles.

The Team Write-A-Thon process (Section 5.1.3), as we described it here, points toward the fact that image accessibility research does not yet consider consistency to the degree it should. For large organizations, the issue is not simply assuring that individual images are described accurately or completely; there is also the additional challenge of making sure all team members or contributors understand and follow the same guidelines. Writing alt text is unavoidably subjective, and leaning into the advice that there is no single “right” way to write alt text can be valuable for individuals. But when an organization or team is trying to create a single cohesive user experience, an entirely other level of consensus and clarity of guidelines is needed to assure that repeated elements or figures are described with the same wording regardless of how many writers are involved in creating the alt text. Consistency is part of the established canon of UX principles for designers to follow [Lidwell et al. 2010]. Thus, considering what other UX principles can apply helpfully to alt text creation is an important future avenue for researchers and practitioners. Current alt text guidelines usually attempt to be generalizable, functioning as writing guides, but the Team Write-A-Thon process demonstrates that ultra-targeted guidelines for a single organization, team, or collective can be as valuable, if not more valuable, in practice. Alternatively, it may be valuable to consider how alt text guidelines could be rephrased or restructured to be followable in the same way UX guidelines are.

6.2.4 Just Like UX Design, Iterating Across Stakeholder Groups May Support Alt Text Creation.

The Artist-Writer process (Section 5.1.4), as it was described at Google, also demonstrates that at least in some contexts there can be additional stakeholders, layers of intent, or overarching frameworks that are invisible to users but which nonetheless impact the finished product. Users, as demonstrated in Study 1, found it easy to locate intent in the goals of the artists. This works well in one version of the Artist-Writer process, where the two roles are actually the same person, but in organizational practice a project goal may come well before any individual artists are involved. This potentially means that image accessibility can likewise be built into imagery projects from their earliest points of conception. There is an opportunity for final alt text to be created through an iterative back-and-forth between art director, writers, and artists, to the same degree as and in parallel with the creation of the art itself.

6.2.5 Guidelines and Automated Tools are Needed to Streamline Alt Text Experience Design.

Ultimately, our data only shows a tiny piece of the puzzle. All our participants came from the same organization, and even then, the different methods we identified were not always clearly delineated from each other (Section 5.1.5). The complexity we saw—even in this small sample of alt text creation methods in practice—demonstrates the way image accessibility can be anything from a negotiation between many actors, to a sole worker blazing through 800 images over the weekend. The way guidelines exist now, as sets of vague tips and rules directed towards writers rather than designers, cannot properly serve the needs of organizations that need to make many images accessible. The complexity of alt text creation, as a form of UX design that needs to be reactive to small differences in goals, context, constraints, team makeup, timelines, and the like, leads to a new question. How could we design guidelines, or even potentially create automated or semi-automated tools, to truly standardize the experience of creating alt text in the manifold ways a large tech company requires? How can we as practitioners and researchers take the staggering complexity of industrial image accessibility practice, and turn it into an asset for better user experience for alt text consumers?

7 Limitations and Future Work

We acknowledge the limitations of this research and describe some of them here to prompt reflection on our conclusions and potential future work to fill in the gaps we could not fill. First and foremost, this research does not claim and is not intended to be comprehensive. We spoke to a limited selection of artists and accessibility practitioners, with only White accessibility practitioners and Asian-American or White artists represented. We spoke to a total of six professionals involved in accessible imagery production. There is no way, from that, to draw conclusions about all types of accessible imagery processes at this large company, much less other companies. However, we believe the only way to begin charting the complexity of industry accessibility practices is to start somewhere and build on it. We therefore welcome and encourage future studies to examine accessibility processes at other companies, in other countries and non-Western contexts, and to understand the unique perspectives of marginalized accessibility practitioners.

We also focused largely on one project and only used direct reported practice as the basis for our understandings. While reported practice is an accepted approach to understanding accessibility practitioners [Azenkot et al. 2021], it remains to be seen if a full embedded ethnographic study of accessibility practices in industry would reveal more nuanced understandings of processes and where actual processes differ from reported ones. Focusing on one project was important for depth of information and familiarity, but academic research cycles are deeply out of sync with industrial practices and as user interviews, publications, and other scheduling conflicts took more and more time, it became increasingly difficult to collect data from previously highly engaged stakeholders. We support, therefore, industrial research teams continuing our work and examining industrial practices in a way that keeps pace with companies’ rhythms. If such work is conducted, we hope the results will be published externally for academic researchers to learn from as well.

8 Conclusion

This article presented a study of image accessibility production practices in industry. We interviewed three accessibility practitioners at Google and three artists who were contracted to work with Google on an inclusive imagery project. From these interviews we describe four types of alt text production processes, including one, a user evaluation and co-design study, that was previously documented in an ASSETS 2021 conference paper. We compared these four alt text creation processes and outlined three themes from the study of accessibility in practice. These themes covered the ways that users, artists, and alt text writers determine what should be included in inclusive and accessible imagery, including negotiating received and conveyed intention, balancing personal and pragmatic considerations, and the complicated and messy negotiations that come into play between different informational elements of a complete message. We concluded with limitations and areas for future study.

Acknowledgments

We would like to thank the organizational supports, research partners, and participants, without which this project would not have been possible. The first author would particularly like to thank Hannia Aguiar and Melanie Guerrero, whose work did not end up in this paper but whose willingness to help was deeply appreciated and helpful for keeping this project moving forward.

Footnotes

We acknowledge that not everyone who is blind or has low vision uses screen readers. In this article we use both terms, as some users we spoke to for this paper were low vision magnification users who nevertheless discussed the importance of detailed visual elements being explained in image descriptions included with the body text of a page.

Note: In Session 11, no image descriptions were shown, and instead visual representation of disability was discussed. That session and the associated follow up interviews with those participants were included in the dataset, despite not discussing image descriptions, because they provided general feelings about disability representation in media.

In the effort to assure anonymity for the artists given the total information included in prior and current publications, they are all referred to with the gender-neutral singular “they” pronoun. For the accessibility practitioners, gendered pronouns are retained.

References

[1] Amaia Aizpurua, Simon Harper, and Markel Vigo. 2016. Exploring the relationship between web accessibility and user experience. International Journal of Human-Computer Studies 91 (July 2016), 13–23.
