1 Introduction
Digital technologies have become an integral part of multimedia content creation. Myriad computer-based applications support skilled practices of creative work, such as graphics and 3D model design (Illustrator, AutoCAD, SolidWorks), photo editing (Photoshop, Lightroom), video editing (Final Cut Pro, Premiere Pro, DaVinci Resolve), and digital drawing and painting (Procreate, Fresco). Audio production is one such form of computer-supported skilled practice: it involves turning unedited audio tracks into professional-sounding content through time-consuming and complex processing workflows, namely editing, mixing, and mastering.
Becoming proficient in audio production requires understanding how to work with the medium of audio as well as various software tools that support audio production, such as Digital Audio Workstations or DAWs (e.g., Pro Tools, REAPER, Logic Pro, and GarageBand) and effect plugins (e.g., equalizer or reverb). These audio production tools incorporate complex graphical user interfaces that are heavily geared towards sighted users and often lack accessibility support [21, 30, 35]. As part of learning to use these complex tools, blind audio producers must figure out how to coordinate between screen reader software (e.g., VoiceOver, JAWS, NVDA), additional third-party accessibility scripts (e.g., Flo Tools, OSARA), and hardware tools to make DAW features accessible [35]. This steep learning curve is further exacerbated by a lack of accessible learning resources (e.g., tutorials, guides, and documentation) geared towards blind audio producers. Although many online audio and video tutorials exist to help people learn to use audio production tools, these are largely visual in nature and rooted in a sighted instructor’s experience with the tools, which can be dramatically different from that of a screen reader user.
In this paper, we focus on understanding and designing to support the creation of screen reader accessible learning resources for audio production tasks. Our work is grounded in interviews and observations with seven blind audio production experts who create their own written guides and audio-video tutorials as well as offer real-time training sessions to support screen reader users in learning audio production tasks in DAWs. Our formative work reveals that blind trainers create screen reader-centric learning resources to reduce challenges associated with widely available audio production tutorials made by sighted people and to facilitate structured and hands-on learning for novice blind learners, all while managing a complex workflow for recording and editing accessible tutorials. Drawing on these insights, we developed Tutoria11y, a macOS-based extension for GarageBand to support blind audio producers in creating accessible, interactive tutorials for teaching audio production tasks. Tutoria11y enables screen reader users to quickly create step-by-step instructions and specify actions required to perform custom audio production tasks. Once a tutorial is created, other screen reader users can access the interactive tutorial in GarageBand and receive step-by-step guidance and confirmation of their actions. We report results from exploratory design evaluation sessions with five blind audio production experts, which detail how participants used the system to generate interactive tutorials as well as how they reacted to the interactive playback experience.
Our work makes three contributions to HCI and accessible computing. First, we contribute new empirical evidence of the complexities screen reader users encounter in learning audio production practices and generating accessible learning resources, extending prior work that highlights how blind screen reader users engage in audio production and music composition tasks [29, 30, 35]. Second, we introduce new techniques to support blind screen reader users in creating and consuming interactive tutorials for audio production tasks. Our design exploration reveals new insights about ways to scaffold accessible learning and training practices among blind audio producers, complementing prior work that focuses on how sighted people can generate interactive guides for screen reader users [34] and other sighted users in various forms of computer-supported skilled work [4, 5, 10, 11, 19, 24, 39]. Third, we synthesize our findings across the two studies to highlight how we might rethink the role of technologies as a means to amplify, rather than replace, disabled experts’ knowledge in improving accessibility in computer-supported skilled work practices.
3 Formative Study: Method
To understand accessibility of audio production learning resources for screen reader users, we conducted interviews and observations with seven blind and visually impaired audio production experts who have experience with offering real-time training or creating tutorials and written guides for blind audio producers. This study was approved by the Institutional Review Board of our university.
3.1 Participants
We recruited seven participants, all of whom identified as men¹ and were advanced or expert users of two or more screen readers (JAWS, NVDA, VoiceOver, Narrator, TalkBack). Four participants were residents of the United States at the time of the research, while three lived in Europe. Most participants created audio/video tutorials and written guides, and some also offered one-on-one or group training, either on their own or through some institute. Participants shared their audio/video tutorials on their own websites or YouTube channels, online communities or blogs they administered, and WhatsApp groups or mailing lists of blind audio producers they were members of. See Table 1 for participants’ self-reported visual disability, type of audio production training they offered, and the Digital Audio Workstations (DAWs) they used.
3.2 Procedure
The sessions were conducted by the first author over Zoom between March and May of 2021. We started each session by collecting verbal consent from the participant or by using a GDPR-compliant online form for those residing in EEA countries. During the sessions, we first asked questions about participants’ instruction style, what kinds and formats of learning materials they generated, and their rationale behind preferring certain formats of learning materials over others, probing for challenges they encountered and strategies they developed to make tutorials accessible and beneficial for blind learners. Next, we remotely observed participants as they prepared an audio/video tutorial on an audio production task of their own choice. We asked participants to share their screen via Zoom (including computer sound). Participants used their preferred DAWs and screen readers to prepare the tutorials: Josh, Neil, and Owen used REAPER with NVDA screen reader; Phil and Rob used Logic Pro with VoiceOver screen reader; and Leo and Dylan used Pro Tools with VoiceOver. We paid attention to and asked for explanations on how participants performed various steps in their tutorial creation workflow—starting from setting up different software and hardware tools, narrating and enacting task steps during the recording phase, and editing and publishing the recorded tutorials. We ended the sessions with follow-up questions based on our observation of their tutorial recording and/or editing process, such as particular actions made within the recorded tutorials, presentation styles, etc. Each session lasted approximately 90 minutes. Participants received a US$60 Amazon Gift Card for their time and effort. All sessions were recorded and transcribed for analysis.
3.3 Data Analysis
We followed a thematic analysis approach [3] for data analysis. The initial coding of data was primarily led by the first author, who is sighted, has experience with multiple DAWs and screen readers, and has been conducting research on accessible audio production for four years. The first author performed open coding on the transcripts, focusing on the different forms of audio production training participants offered, their unique experiences with learning audio production in the past and how these experiences informed their current instruction style, and their software and hardware setups for tutorial recording. We periodically streamlined the open codes across different transcripts by merging similar or overlapping codes. The first and third authors regularly met to discuss these codes and refine them to resolve any disagreements, whereupon the first author grouped relevant codes to create a smaller set of axial codes. Through this iterative refinement process, we reworked the codes into three distinct themes that capture core aspects of creating accessible learning resources for screen reader users.
4 Formative Study: Findings
Our analysis revealed three important aspects of how blind audio production trainers create accessible learning resources for new learners with vision impairments. Below we detail the ways in which participants curate their workflow to support the unique needs of screen reader users in understanding audio production tasks and facilitate structured and hands-on learning among novice learners as well as how they manage the complex procedure of pre-processing, recording, and editing tutorials.
4.1 Supporting Screen Reader-Centric Understanding of Audio Production Tasks
Widely adopted audio production tools provide detailed official guides, documentation, and video tutorials² to help users get started with these complex tools. In addition, there is an extensive and growing number of user-generated video tutorials for audio production tasks on YouTube and other social media platforms. However, our formative interviews revealed that many of these resources are geared towards sighted people and do not always align with how screen reader users interact with audio production software. Rob explained that the primary challenge associated with these tutorials “comes down to the lack of descriptions,” where sighted tutorial creators do not mention specific names and types of GUI elements and their relative position with respect to nearby elements, all critical information for screen reader users. Additional challenges stemmed from “the language that gets used around driving software with a mouse” (Owen) and ambiguous visual-spatial deictic references such as “I got this plugin up here, so I’m gonna come down here” (Rob). In addition, a big hurdle in following verbal descriptions of elements in these tutorials is the mismatch between how an element appears visually to sighted tutorial makers versus how screen readers describe that element. Rob shared an example of this:
“What is described [by a sighted tutorial creator] as a dropdown menu, VoiceOver calls it a popup button. So some [blind] people are gonna hear ‘dropdown menu’ [in a tutorial], like ‘what’s that? I don’t see any dropdown menus in here.’”
The above excerpts highlight the different ways in which tutorials geared towards sighted people fall short in supporting blind audio producers. Recognizing these shortcomings, our participants created screen reader-centric audio production tutorials to reduce “this gap of figuring out accessible equivalents for the workflows that those [mainstream] tutorials are demonstrating” (Owen). Our participants drew upon their prior experience, personal workflows, and workarounds developed over time to narrate their tutorials in a way that accounts for the limitations present in visually-oriented tutorials, for example, by referring to GUI elements using the standard terminologies used by screen readers, or by teaching screen reader-centric GUI navigation that does not rely on visual, mouse-based actions such as drag-and-drop. Beyond that, these blind trainers also highlight in their tutorials the inconsistencies and eccentricities associated with interacting with DAWs using screen readers. For example, screen readers may describe UI elements inaccurately or fail to announce certain UI changes due to lack of screen reader support. Such a case of a screen reader failing to provide feedback appeared while Phil was recording a tutorial during our session, and he narrated this lack of feedback in his tutorial: “I’m pressing it (a keystroke) now. VoiceOver will not say anything.” Phil did this so that potential learners would know that this is not due to an error on the learners’ part. As another example, Phil clarified in his recorded tutorial how an element was described incorrectly by his screen reader: “Now this track is called [by the screen reader] ‘Komplete’. It’s actually a lie. It’s ‘electric piano.’ So ignore that.”
Beyond carefully attending to screen reader feedback in their tutorials, our participants also shared time-saving workarounds and strategies using screen reader navigation and keyboard shortcuts to accomplish otherwise lengthy or complex tasks. Such strategies stemming from their long experience with these tools can uniquely boost productivity for blind learners and are not commonly shared in tutorials made by sighted trainers who are not familiar with the unique challenges of screen reader navigation. During Phil’s session, we saw instances of him incorporating such experiential knowledge about screen reader use in the tutorial he recorded:
Phil: We’ll press N for Native Instruments because I have far too many [instruments on the list] and it will take years if I don’t.
Screen Reader: Native instruments, sub menu.
Phil: First letter navigation is a good rule to know because it makes your life a lot easier and speeds up the world.
Hence, our findings not only shed light on strategies for making audio production tutorials accessible for screen reader users but also underscore the importance of blind trainers’ experiential knowledge. Rob commented, “I’m a screen reader user myself, and I know what I’d wish to see in non-screen reader content (tutorials). So some of it is just innate to me because I’m using these tools in the manner that the people who are watching the content will hopefully want to use it.” Thus, their personal experience with learning and using these tools gives them first-hand insight into the challenges blind learners face and the kind of instructions from which they would benefit. This led us to our first design goal: centering the experiential knowledge of expert blind trainers.
4.2 Facilitating Hands-On Structured Learning
In addition to creating tutorial content that focuses on screen reader users, participants also structure their curriculum and education style in ways that resonate with blind learners. One of the primary considerations participants mentioned is to facilitate “auditory learning” (Phil) where “you’re listening to someone performing an action or a group of actions” (Phil). Listening to how a blind trainer performs a task reveals how their screen reader responds to the trainer’s different actions and how these actions change the sound or music being produced. Through this, the learner is “going to be able to vicariously have that experience initially” (Josh) before actually performing the task themselves. In addition to auditory learning, participants described supporting hands-on practice where the trainers are “telling you what keys to press, and you can just pause the video or podcast and do those things” (Phil). Dylan explained that the effectiveness of hands-on practice extends beyond audio tutorials and applies to one-on-one training sessions as well.
“When I teach, they (students) are the ones piloting VoiceOver, I’m not. So I’m literally listening to their VoiceOver as they’re going through Pro Tools, and I’m telling them what they’re hearing — so that it’s not just a one-on-one thing where I’m doing all the talking and navigating of the computer, and they’re just listening. They’re actively engaged... it’s like, they’re learning from themselves.”
When actions are described and executed by the trainer in rapid succession in audio/video tutorials, it can be overwhelming for new learners to follow along. Participants described trying to alleviate “that sense of being overwhelmed” (Owen) among new learners by regulating the pace of instructions, both in tutorials and one-on-one classes, to allow learners to “go at their own pace” (Rob). Furthermore, participants described “breaking stuff (tasks) down into its simplest form” (Owen) while narrating a tutorial using “clear, concise directions on what they’re looking for, what they need to click on, what they need to navigate to... I try to be extremely specific as to what I’m saying” (Rob). Dylan explained that some learners who are new computer users may “have to have everything written out exactly in a list.” In such situations, written guides were helpful. Dylan said he “needed to write bulleted lists of directions on how to do something. Step one… step two… press this, then this.”
In summary, expert trainers decomposed tasks into small, manageable steps and put them in a format that learners can follow on their own, one step at a time. This led to our second design goal: scaffolding hands-on guided practice for learners.
4.3 Managing a Complex Tutorial Recording and Editing Workflow
Our observational and interview data revealed blind audio trainers’ complex workflows for creating audio/video tutorials, which involve performing required setup and pre-processing steps, managing a number of tools to execute the recording tasks, and editing and post-processing recorded tutorials. Participants shared that they needed to juggle between a number of additional applications (e.g., BlackHole, Loopback, etc.) to make sure that their recording captures multiple audio streams including audio tracks on DAW, screen reader feedback, and their own narration. Not only do they have to capture these audio streams but they also need to make sure that there is no auditory overlap and the levels of various audio sources are discernible and understandable (e.g., by slowing down screen reader speech rate when creating a tutorial). In some cases, participants prepared a detailed script to follow and practiced the content of a tutorial several times to minimize potential errors during recording. Others try to “wing it” in an impromptu manner, as Rob explained. In either case, participants expend substantial time and effort editing the recorded tutorials to get the “bad bit out” to reduce any potential for confusion and “help make this thing more palatable but [also] more educational and informational” (Rob). For some, editing tutorials (particularly video content) is so difficult and time consuming that it is easier to re-record the entire tutorial. Phil said, “When I do my YouTube videos, I don’t edit because I don’t have the capability of taking out bits or adding bits in later. So if it’s not right the first time, then I have to do it all over again.”
Although these blind trainers are motivated to create accessible tutorials, doing so means mastering elaborate tools, managing complex recording workflows, and putting in time to edit content so that the tutorials are instructive and appealing. This led to our third design goal: streamlining the workflow for recording accessible tutorials.
5 System Design and Development
To address the three design goals we identified through our formative work, we developed Tutoria11y, a macOS application for recording and playing interactive tutorials in GarageBand.
5.1 Description of an Interactive Tutorial
Tutoria11y has two primary modes of use: recording custom interactive tutorials and playing the interactive tutorials. An interactive tutorial created using Tutoria11y contains voice instructions on how to perform a task on GarageBand, much like a regular audio tutorial. However, each interactive tutorial is divided into multiple sections or steps. When an interactive tutorial is experienced, it will first play the instructions associated with the first step, and it will wait for a learner to perform the actions described in the instructions. Once the learner performs these actions successfully, the tutorial automatically unpauses itself and plays the instructions for the next step. We define these points where a tutorial stays paused between two consecutive steps waiting for the learner to perform some actions as breakpoints.
Each interactive tutorial consists of two files: a .tutorial file and a companion GarageBand project file. The GarageBand project file reflects the starting state of the task, and learners will perform the actions described in the tutorial using this project file. As an example, if an interactive tutorial involves unmuting a track, the companion GarageBand project may include a single audio track that is muted. The .tutorial file contains the auditory instructions recorded by the trainer, breakpoint timestamps, and a list of actions needed on the learner’s part to complete each of the steps. In the previous example, the list of actions would include a single action: unchecking the mute button on the track.
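To make the contents of a .tutorial file concrete, the following is a minimal sketch of one way such a file could be modeled. The on-disk format is not specified here, so the class names, field names, element identifier ("Track 1/Mute"), and JSON serialization below are all illustrative assumptions rather than Tutoria11y's actual format.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class UIAction:
    """One UI change the learner must reproduce (hypothetical schema)."""
    element: str            # assumed identifier of a GarageBand UI element
    expected_value: object  # value the element should hold after the action


@dataclass
class Step:
    """Narration for one step, followed by a breakpoint."""
    audio_start: float  # seconds into the trimmed narration
    audio_end: float    # breakpoint: playback pauses at this timestamp
    actions: list = field(default_factory=list)


@dataclass
class Tutorial:
    narration_file: str  # assumed name of the recorded voice track
    steps: list = field(default_factory=list)

    def breakpoints(self):
        """Timestamps at which playback waits for the learner."""
        return [step.audio_end for step in self.steps]

    def to_json(self):
        """Serialize the tutorial; JSON is an assumption, not the real format."""
        return json.dumps(asdict(self), indent=2)


# The unmute example: one step whose only action is unchecking the mute button.
unmute = Tutorial(
    narration_file="unmute_track.m4a",
    steps=[Step(0.0, 6.5, [UIAction("Track 1/Mute", 0)])],
)
```

Keeping the actions attached to each step, rather than in one global list, mirrors the description above: every breakpoint carries its own list of required UI changes.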
5.2 Recording Experience
Figure 1 shows the different stages of the recording process. The recording process can be started in two ways: either by clicking on the ‘record’ button on Tutoria11y’s user interface or by pressing a global keyboard shortcut (command-control-R). The second option allows a user to start recording from within GarageBand without having to switch back and forth between Tutoria11y and GarageBand. Once the recording has started (stage 1), the trainer will narrate one step of the task first (stage 2) and then perform the actions associated with that particular step themselves (stage 3). Stages 2 and 3 will be repeated for each subsequent step of the task, until all the steps have been narrated and performed (stage 4). When the trainer wants to stop recording (stage 5), they can either press the same keyboard shortcut (command-control-R) from within GarageBand or switch to the Tutoria11y application to click on the ‘stop recording’ button. A ‘save file’ dialog box will appear, allowing the trainer to type in a name for the tutorial and save it.
5.3 Playback Experience
Figure 2 shows the different stages of the playback experience. To play an interactive tutorial, a learner first needs to open the companion GarageBand project. Then they will click the ‘choose file’ button on Tutoria11y’s user interface. An ‘open file’ dialog box appears, and the learner will need to choose a tutorial file from their computer storage, press the ‘open’ button, and switch back to the GarageBand window. Once the ‘open’ button has been clicked and the tutorial starts playing (stage 1), the instructions for the first step of the task will be played (stage 2) and playback will pause automatically (stage 3). Once the playback has paused, the user will need to perform the actions associated with this step on the companion GarageBand project (stage 4). Only when the user has successfully completed the actions associated with the first step will Tutoria11y resume playback and play the instructions for the second step. Stages 2-4 will repeat for each subsequent step, until the learner has successfully performed all the steps and completed the tutorial (stage 5).
5.4 Implementation Details
Tutoria11y is built using Objective-C and Swift. Tutoria11y’s user interface contains three elements: a ‘start recording/stop recording’ toggle button, a ‘choose file’ button to select and play a tutorial file, and a dropdown menu containing a list of DAWs. Using the dropdown menu, a user can choose which DAW the tutorial they are recording is intended for. The current version of Tutoria11y only supports GarageBand, since we intended to start with a free and basic DAW that comes included with macOS and has decent accessibility support that we could leverage for our system design.
The tutorial recording process requires access to three permissions from macOS: microphone access to record the trainer’s voice, macOS accessibility API access to keep track of changes made to the different UI elements within GarageBand, and speech recognition access to detect if the tutorial creator is speaking or not. When a trainer starts the recording process, Tutoria11y first takes a ‘snapshot’ of the accessibility hierarchy of GarageBand, which essentially saves the state or value of GarageBand UI elements at the beginning of recording. Whenever the trainer performs an action on GarageBand using their keyboard during the recording phase, Tutoria11y takes another snapshot of GarageBand’s accessibility hierarchy reflecting the most recent changes made to the UI. Each time a new snapshot is captured, Tutoria11y compares it with the previous snapshot to determine which GUI elements have been manipulated by the trainer between the previous and current snapshots and maintains a list of these UI changes and their timestamps. After the trainer completes recording the tutorial by pressing the ‘stop recording’ button or shortcut, Tutoria11y applies speech recognition to the trainer’s recorded voice to determine the timestamps of silences in-between the trainer’s narration. Furthermore, Tutoria11y also checks the list of UI element changes and their timestamps to determine if the tutorial creator performed any actions in the middle of a silence. If Tutoria11y finds any changes made to the GarageBand UI in the middle of a silence, Tutoria11y marks it as a breakpoint in the interactive tutorial. The silent regions that are associated with breakpoints are automatically trimmed out. Overall, the saved tutorial file contains the voice recording of the trainer, the timestamps of the implemented breakpoints, and the list of UI changes associated with each breakpoint.
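The snapshot comparison and breakpoint detection described above can be sketched as follows. This is a simplified illustration, not the system's code: Tutoria11y itself is written in Objective-C and Swift and reads GarageBand's accessibility hierarchy, whereas this sketch assumes each snapshot has already been flattened into a dictionary from a hypothetical element identifier to its value, and that speech recognition has already produced a list of silent intervals.

```python
def diff_snapshots(before, after):
    """Return {element: (old_value, new_value)} for every element that changed.

    Both arguments are assumed to be flat dicts mapping an element
    identifier to its current value.
    """
    changed = {}
    for element in before.keys() | after.keys():
        old, new = before.get(element), after.get(element)
        if old != new:
            changed[element] = (old, new)
    return changed


def find_breakpoints(silences, ui_changes):
    """Mark every silence that contains at least one UI change as a breakpoint.

    silences:   list of (start, end) times in seconds
    ui_changes: list of (timestamp, element, new_value) tuples
    """
    breakpoints = []
    for start, end in silences:
        actions = [(element, value)
                   for timestamp, element, value in ui_changes
                   if start <= timestamp <= end]
        if actions:  # silences with no UI activity are ordinary pauses
            breakpoints.append({"silence": (start, end), "actions": actions})
    return breakpoints


# Example: the trainer pauses twice but only unmutes the track in the second pause.
before = {"Track 1/Mute": 1, "Track 1/Volume": 0.8}
after = {"Track 1/Mute": 0, "Track 1/Volume": 0.8}
changes = diff_snapshots(before, after)

found = find_breakpoints(
    silences=[(2.0, 3.0), (6.5, 11.0)],
    ui_changes=[(8.2, "Track 1/Mute", 0)],
)
```

In this example only the second silence becomes a breakpoint, because the first contains no UI change; that matches the rule that a pause in narration alone does not split the tutorial into steps.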
When an interactive tutorial is opened for playback, the tutorial will automatically pause at each breakpoint and wait for the learner to replicate the exact UI changes associated with this breakpoint. At each breakpoint, whenever the learner performs an action on GarageBand using their keyboard, Tutoria11y takes a snapshot of GarageBand’s accessibility hierarchy and checks if the necessary GUI changes for the current breakpoint have been performed. After successful completion of a breakpoint, playback resumes immediately and Tutoria11y plays the next set of instructions without any silence in-between, since the silent portions associated with breakpoints are automatically trimmed out at the end of the recording process, thus allowing for a seamless playback experience.
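The pause-and-resume behavior during playback can be illustrated with a small sketch under the same assumptions as before: snapshots are flat dictionaries with hypothetical element identifiers, and each breakpoint carries a list of (element, expected value) pairs. The class and method names are invented for illustration, and the check here compares only the resulting UI state, which is one possible reading of "replicating the exact UI changes".

```python
def step_completed(snapshot, required_actions):
    """True when every required element holds its expected value."""
    return all(snapshot.get(element) == value
               for element, value in required_actions)


class Playback:
    """Tracks which breakpoint the learner is currently paused at."""

    def __init__(self, steps):
        self.steps = steps  # one list of required actions per breakpoint
        self.index = 0      # index of the breakpoint being waited on

    @property
    def finished(self):
        return self.index >= len(self.steps)

    def on_keyboard_action(self, snapshot):
        """Call with a fresh snapshot after each learner keystroke.

        Returns True when the current step is satisfied and narration
        for the next step should start playing.
        """
        if self.finished:
            return False
        if step_completed(snapshot, self.steps[self.index]):
            self.index += 1
            return True
        return False  # stay paused; the learner has not finished the step


# Two-step demo: unmute the track, then lower its volume.
playback = Playback([[("Track 1/Mute", 0)], [("Track 1/Volume", 0.5)]])
still_muted = playback.on_keyboard_action({"Track 1/Mute": 1, "Track 1/Volume": 0.8})
unmuted = playback.on_keyboard_action({"Track 1/Mute": 0, "Track 1/Volume": 0.8})
```

A check of this kind never rejects an incorrect action; it simply keeps waiting, which is consistent with the limitation noted for the current version that mistakes during playback are not handled explicitly.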
It is important to acknowledge that the current version of Tutoria11y does not account for situations where the learner makes a mistake during playback or the trainer performs an incorrect action during the recording phase; in such situations, the playback or recording needs to be restarted from the beginning. Implementing easier ways to rectify such mistakes is an important next step, and our participants also reflect on this in our findings.
6 Design Exploration: Method
We conducted exploratory evaluation sessions with five blind audio production trainers whose specializations ranged from offering real-time training to creating audio/video tutorials or written guides. Since Tutoria11y was our participants’ first time experiencing interactive tutorials of any kind, an exploratory evaluation approach allowed us to observe how each of them recorded their first interactive tutorials based on their own instruction styles. In addition, it allowed them to freely ask us questions and share feedback in real time as they participated in the recording and playback activities. Our overarching goal was to solicit feedback on the recording and playback experiences of Tutoria11y and learn how they envisioned using Tutoria11y and interactive tutorials in their own training process. Our study was approved by the Institutional Review Board at our university.
6.1 Participants
Participants were recruited through our research network and snowball sampling (aged 45-70, all identified as male). Four participants were residents of the United States at the time of this study. Three of them also took part in our formative interviews. For audio production software, all participants used both Logic and GarageBand, although Dylan and Max primarily used Pro Tools. Phil, Rob, and Dylan frequently created audio tutorials and offered professional training to blind learners pursuing audio production. Seth prepared text-based tutorials and written guides for GarageBand, although he did not record audio tutorials. Max did not prepare tutorials on a formal basis, although he provided expert suggestions (in both written and audio format) on online forums for blind audio producers. All participants used VoiceOver as their primary screen reader, although Max and Dylan were also proficient with JAWS and NVDA. See Table 1 for details of participants’ self-reported visual disability, type of training offered, and DAWs used.
6.2 Procedure
The first author conducted design exploration sessions with participants via Zoom between March and June 2022. Each session lasted approximately 90-120 minutes. The session with Seth was divided into two 90-minute sessions on the same day due to delays caused by technical difficulties.
We started each session by collecting consent from the participants and walking them through the setup procedure for Tutoria11y. Next, to give them an idea of how an interactive audio tutorial works on Tutoria11y, we asked them to open and play a pre-recorded demo tutorial we created that walked them through unmuting a muted audio track and decreasing the volume level of that track. We asked them to share their initial impressions and thoughts on experiencing the interactive audio tutorial.
During the session, participants’ main tasks were creating two interactive audio tutorials using Tutoria11y on GarageBand. For the first task, all participants recorded a tutorial on a pre-selected topic: demonstrating how to trim out a silent portion from the middle of a track. For the second task, participants could select any topic on their own, but we asked them to choose a basic audio production task that could be completed in 1-3 minutes. After completing the recording process, participants were invited to play back the interactive tutorials they recorded. While participants were recording the tutorials and experiencing the playback, we took notes on their reactions and remarks. See Table 2 for details about the tutorials created by participants.
Prior to conducting our sessions, we sent participants all the necessary files, including the executable file for Tutoria11y, the pre-recorded demo tutorial and its companion GarageBand project, and another GarageBand project file that contained the necessary audio track and initial GarageBand UI state for the first task. We also provided detailed instructions for installing Tutoria11y and required setup steps on GarageBand. All participants except Phil used Tutoria11y to record and play interactive tutorials on GarageBand. Phil did not have GarageBand installed and used Tutoria11y on Logic to record the audio tutorials; however, he could not play his recorded tutorials himself, since the playback functionality was not yet implemented for Logic. Instead, we played back on our end the demo tutorial and a pre-recorded tutorial on the first topic (created by the research team), while Phil listened to how the playback functionality on Tutoria11y worked through Zoom.
All participants successfully completed recording the tutorials for both tasks, and all participants who used GarageBand experienced the demo interactive tutorial on their computers successfully. In addition, Max and Seth experienced their own recorded tutorials for the first task and Dylan experienced his own tutorials for both tasks. Participants sometimes ran into issues while experiencing one or both of their own recorded tutorials due to memory overflow issues or because their configuration of GarageBand UI during recording (e.g., full-screened window) did not match the configuration during playback — a scenario that Tutoria11y did not account for at that time. Participants who did not experience their own tutorial for the first task instead experienced an interactive tutorial on the same topic but created by the research team.
We concluded the sessions with an overall debrief on the entire Tutoria11y system, probing participants for their thoughts on the recording process and playback experience of interactive tutorials, how they might incorporate a tool like Tutoria11y into their training and tutorial building workflow, how interactive tutorials might shape tutorial playback experiences of blind learners, potential use-cases, trade-offs and challenges that might arise with Tutoria11y in comparison with their current work practices, and their suggestions for further improvement. Participants were compensated with a US$60 gift card. All sessions were recorded via audio and screen capture and transcribed for analysis.
6.3 Data analysis
We analyzed our observational data by reviewing the recorded sessions and coding user interaction with the system, including completion time, errors or points of confusion, and topics of tutorials created (see Table 2). We analyzed their comments and reflections on the system following thematic analysis [3]. Our analysis involved a process of open and selective coding led by the first author, where we initially focused on our participants’ reactions to different aspects of the recording and playback experiences on Tutoria11y. The first and third authors met weekly to review initial codes and data together. Next, we analyzed their reflections on how the experience of recording and playing interactive tutorials on Tutoria11y compared to that of regular audio tutorials to identify how Tutoria11y could lower barriers to tutorial creation and scaffold accessible tutorial playback experiences for blind audio producers. Based on iterative refinement and examination of codes, we developed three distinct themes that capture Tutoria11y’s potential role in shaping accessible audio production training.
8 Discussion
Prior work within HCI and accessibility has called attention to the ways in which blind people must learn, navigate, and maintain a wide range of inaccessible tools to perform different forms of computer-supported skilled work (e.g., [2, 7, 8, 28]). As our study and prior research show [30, 35], blind audio producers piece together accessible workflows through years of experience with navigating mainstream audio production tools (many of which are inaccessible), leverage their experiential knowledge to create and share unofficial accessibility scripts that boost the efficiency of screen reader users, and advocate with software developers for improving accessibility in commercial tools. Moreover, they actively create access for others by passing on their knowledge through written guides, audio/video tutorials, and one-on-one and group teaching and by maintaining question-and-answer forums specifically geared towards blind audio producers. Despite the challenges associated with making accessible tutorials (such as having to buy and learn an extensive suite of software and hardware tools and spending hours recording and editing tutorials without financial remuneration), our participants deeply valued the joy and sense of purpose they received from sharing their knowledge with others. The design of Tutoria11y is meant to augment the existing efforts of blind audio producers and provide an accessible form of scaffolding to learn audio production for screen reader users. Below we discuss three key tensions in designing technologies that support the creation and use of accessible interactive tutorials for audio production and other computer-supported skilled practices more broadly, as well as potential areas for improvement and future research.
8.1 Balancing Automation with User Expertise and Control
One key benefit of Tutoria11y is that it automates tedious components of the tutorial creation workflow (e.g., by automatically trimming out silence and adding breakpoints). By doing so, the system facilitates conditions under which blind experts can focus more on narrating instructions and demonstrating corresponding actions and “not concentrating on the technology” (Dylan) for recording the tutorial, which could potentially save time so they can “get more content out” (Rob). On the surface, a deterministic view of accessibility may assert that automating the entire workflow for generating accessible tutorials is the ideal design goal. However, our work illustrates that technological interventions that automate the entire process of tutorial creation may attempt to replace blind trainers’ expertise and experiential knowledge, which are an integral component of what makes these learning materials accessible. Indeed, we observed how Phil, while recording the tutorial for an audio production task, explained idiosyncratic behavior of screen readers, alerted listeners when screen readers misrepresented a particular GUI element, detailed efficient ways for screen reader navigation (e.g., first-letter navigation), and provided descriptive instructions for interacting with the complex DAW interface. Without Phil’s screen reader-centric instructions, these tutorials would not have provided enough information required by novice blind learners [33] who are just getting started with audio production tasks. Put differently, blind trainers’ rich experience with learning and figuring out accessible ways of audio production as screen reader users themselves makes them uniquely suited to understand the challenges new learners with vision impairments face and accordingly tailor the learning resources they create for their target audience. While automating the process of tutorial creation based on user logs [5, 10, 12, 18, 39] or inviting sighted authors to generate interactive tutorials for screen reader users [34], as prior work has explored, could be one way forward, our work demonstrates another way of viewing the role of technology in accessible learning. In particular, we argue that for skilled practices like audio production, integrating disabled content creators’ knowledge is imperative to ensure accessibility of the learning resources and also honor their professional expertise, advocacy, and community efforts [35].
8.2 Managing Context Switching across Multiple Interfaces
Prior work has introduced a range of systems that generate interactive tutorials to provide contextual assistance, i.e., learners receive guidance in the actual task interface, as opposed to traditional audio-video or text-based tutorials that require learners to repeatedly switch between the task interface and the tutorial window [10, 27, 32, 39]. Our work demonstrates that the challenges with context switching are magnified for blind content creators who need to juggle between not only the task interface (e.g., Digital Audio Workstations or DAWs) and the tutorial playback interface (e.g., a browser window or a separate device for playback) but also manage screen readers as well as additional plugins and scripts (e.g., OSARA, Flo Tools) that are required to navigate inaccessible DAWs. Particularly in the context of audio production, rapidly shifting attention between spoken instructions in tutorials, auditory feedback from screen readers, and various audio tracks and effects in the DAWs can be cognitively overwhelming to screen reader users.
This experience is even more demanding for screen reader users who want to create tutorials for others. Our participants described managing as many as five different application interfaces at a time to make sure that their own voice through microphone, auditory feedback from screen reader, audio tracks on the DAWs, and also occasionally screencast videos or real-time interaction with students are all routed through appropriate channels, have discernible volume and speech rate, and are recorded properly in the resultant tutorials. Our expert participants have honed their skills over the years and are able to maintain this sophisticated workflow, but they highlighted how this complex process prohibits other blind audio producers from fully participating in creating accessible learning resources. While findings from our design exploration with Tutoria11y are promising, particularly the potential for a simplified and streamlined workflow, they point to a larger, systemic problem. Until content production tools become more widely accessible and easy to use with a screen reader, blind content creators must continue to put in extensive time and effort to learn, use, and share their knowledge of these tools with others.
8.3 Supporting Task-Based Learning vs Learning Higher-Level Skills
Prior work details how audio production for screen reader users is as much about learning how to navigate state-of-the-art digital audio tools as it is about learning the craft of audio production [35]. That is, knowing how to use pervasive audio production tools is a crucial part of what it means to be a skilled and proficient audio production engineer. The design of Tutoria11y addresses this challenge by introducing a new way to create step-by-step interactive instructions for blind learners. Following from prior work [27], guided tutorials can foster observational learning that enables novice blind users to learn by (auditorily) observing and replicating actions executed by expert blind audio producers. The aim is for interactive tutorials created through Tutoria11y to further scaffold novice learners by dividing complex instructions into “bite-sized,” concise steps, an approach taken in prior work as well [5, 10, 11, 39]. Such incremental, task-oriented learning can be particularly beneficial for new learners in developing self-efficacy, especially when they are just getting started [34]. Yet, the decompositional nature of Tutoria11y may have drawbacks in that it may encourage learners to focus narrowly on executing steps required for a task rather than understanding the process at a higher level. That is, the ability to follow steps in a tutorial does not necessarily mean that users are able to use the tools fluently on their own. Future work must examine the kinds of tasks that are best supported by this approach and how blind instructors envision creating tutorials that teach higher-level skills required for audio production.
8.4 Limitations and Future Work
We acknowledge several limitations in our present paper and possible directions for future research. First, the current version of Tutoria11y does not allow trainers to rectify incorrect actions or mistakes in the narrated instructions without re-recording the entire tutorial. As our participants noted, allowing trainers to manually edit and re-record individual steps of an existing interactive tutorial will be an important feature to implement in the future. Another notable limitation of Tutoria11y’s current version is that it does not notify learners when they take a deviant path, e.g., perform incorrect actions in a step. As such, to enable more effective learning experiences, future iterations need to look into ways to make learners aware of their mistakes and provide opportunities for course correction. This could be done by either pushing error alerts with earcons or spoken notifications from screen readers [34] or playing narrated instructions that trainers may have previously recorded in their own voice for deviant paths. Finally, to gain a deeper understanding of the effects of interactive tutorials on accessible learning of audio production compared to non-interactive tutorials, evaluation of Tutoria11y’s playback experience with visually impaired students and beginners is an important future step.
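To make the deviant-path idea concrete, the following is a minimal sketch of how a future iteration might decide what to announce when a learner's action does not match the current step. It is not Tutoria11y's implementation: the `Step` structure, the `check_action` function, and the action identifiers are all hypothetical names introduced for illustration, and the sketch assumes that recorded steps and learner actions can be compared as simple identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    """One step of a recorded tutorial (hypothetical structure, for illustration)."""
    expected_action: str                 # identifier of the action the trainer performed
    instruction: str                     # the trainer's narrated instruction, as text
    recovery_hint: Optional[str] = None  # optional trainer-recorded guidance for a wrong turn


def check_action(step: Step, observed_action: str) -> Optional[str]:
    """Return None when the learner is on track, otherwise a message to announce."""
    if observed_action == step.expected_action:
        return None
    # Prefer guidance the trainer recorded for deviant paths, when available.
    if step.recovery_hint is not None:
        return step.recovery_hint
    # Otherwise fall back to a generic alert that repeats the instruction.
    return f"That was not the expected action. {step.instruction}"


if __name__ == "__main__":
    step = Step("split_region", "Split the region at the playhead.")
    print(check_action(step, "split_region"))
    print(check_action(step, "delete_region"))
```

Preferring the trainer's own recovery hint over a generic alert is one way to keep the trainer's screen reader-centric knowledge in the loop; how the returned message is delivered (earcon, screen reader speech, or recorded narration) is left open here.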
9 Conclusion
With an overarching goal of supporting the creation of screen reader accessible learning resources for audio production tasks and grounded in interviews and observations with seven blind trainers, we developed Tutoria11y, an extension that supports blind audio producers in recording and experiencing accessible, interactive tutorials for GarageBand. Our design evaluation sessions with five blind trainers revealed the ways in which Tutoria11y could streamline and simplify accessible tutorial creation, augment and scaffold tutorial playback experiences for screen reader users, and complement real-time training sessions offered by our participants. Synthesis of our findings across both studies encourages rethinking the role of technology in accessible learning as one that supports, rather than automates or replaces, the knowledge of disabled trainers. Furthermore, we encourage future research to investigate how the lessons learned from Tutoria11y’s task-based approach could translate into accessible learning resources for the acquisition of higher-level skills among blind learners.