DOI: 10.1145/3613904.3642788

Voice Assistive Technology for Activities of Daily Living: Developing an Alexa Telehealth Training for Adults with Cognitive-Communication Disorders

Published: 11 May 2024

Abstract

Individuals with cognitive-communication disorders (CCDs) due to neurological conditions, such as traumatic brain injury and aphasia, experience difficulties in communication and cognition that impact their ability to perform activities of daily living, or ADLs (e.g., self-care, meal preparation, scheduling). Voice assistive technology (VAT) can support the independent performance of ADLs; however, there are limited VAT training programs that teach individuals with CCDs how to properly implement and use VAT for ADLs. The present study examined the implementation of an online training program using Alexa voice commands for five ADL domains (scheduling, entertainment, self-care, news & facts, and meal preparation). Using video analysis with seven adults with CCDs between ages 25 and 82 and interviews with five participants and three caregivers, we synthesized five weeks of training performance, analyzed participants' perceived benefits and challenges, and discussed opportunities and challenges for implementing VAT training on ADL skills for adults with CCDs.

1 Introduction

Cognitive-communication disorders (CCDs) can result from traumatic brain injury, aphasia, and dementia (caused by falls, stroke, or brain tumor) and occur in individuals across the lifespan [1, 17]. For example, traumatic brain injury (TBI), one of the most common neurological disorders, is ranked as the third largest burden contributor among all global diseases, impacting approximately 50-60 million people worldwide (including 1.5-2 million Americans and 2.5 million Europeans) each year [40, 41]. TBI also carries substantial societal and economic consequences for the global economy, with a burden as high as $400 billion annually [42]. Another type of CCD, aphasia, is an acquired multimodal language disorder. Aphasia leads to impaired cognition and an inability to produce or comprehend written and/or spoken words [2], with an estimated 180,000 new cases (one out of every 272 Americans) a year per reports from the U.S. National Institute on Deafness and Other Communication Disorders [3]. As the global population ages, CCDs remain a top cause of death and disability in both developed and developing countries, with rates rising among older adults, who are particularly vulnerable to falls, stroke, and other comorbidities (e.g., presbycusis, visual impairments, and cognitive difficulties) [4,14,19,31-33]. Inconsistency in care across centers, regions, and countries emphasizes the pressing need for comprehensive and uniform approaches to care for individuals with CCDs [4, 42].
Individuals with CCDs benefit from cognitive-communication rehabilitation to improve memory, attention, language, planning, and organization for performing activities of daily living (ADLs) at home [4,5,6]. These ADLs encompass essential tasks such as meal preparation, medication management, following routine activities and appointments, managing finances, and self-care and hygiene [7]. Performing these ADLs often requires a high level of cognitive function and sophisticated communication; therefore, difficulty executing these tasks independently can increase caregiver dependence, thereby diminishing the individual's sense of autonomy [7].
Voice assistive technology (VAT), such as Apple Siri, Google Assistant, and Amazon Alexa, allows users to issue voice commands for a wide range of ADLs: setting alarms, searching for information on the web, playing music, and obtaining weather and time updates [8, 9, 12-16]. While VAT adoption among older adults is expected to grow to 132 million users by 2030 [10, 11], existing VAT products only introduce updates to voice commands without offering the customized, functional support that would help individuals with CCDs implement specific voice commands to improve ADL skills in daily life. It is therefore critical to ensure that aging populations are educated on the proper use of VAT with support from clinicians [11,24]. Developing a clinically informed, functional VAT training curriculum could improve the quality of life of older adults with CCDs. This study is one of the first known studies to use a broad set of Amazon Alexa voice commands to support five critical areas of ADLs for community-dwelling adults with CCDs. We specifically selected Amazon Alexa for the VAT training due to its diverse variety of voice-based activities and its higher adoption per prior research [13,21,53]. By examining the implementation of VAT training and gathering the benefits and challenges of VAT, this study aims to:
1.
Investigate the implementation of a telehealth Alexa training curriculum for adults with CCDs to increase independence with ADLs in their homes.
2.
Examine the benefits and challenges associated with the VAT training for adults with CCDs and their caregivers.

2 Related Work

2.1 Benefits and challenges for VAT among adults with CCDs

Prior studies have reported VAT as easy-to-use and accessible tools for individuals with cognitive-communication impairments engaging in their daily activities [8, 16, 18-22, 48]. Goodwin and Brandtzaeg [49] used semi-structured interviews with neurologists and web developers to analyze the perceived benefits and challenges of conversational user interfaces (CUIs) for older adults with cognitive impairments, finding that CUIs enabled more engagement through human-like social interaction. Kulkarni and colleagues [24] described the use of Google Home among adults with aphasia for speech-to-text (57%), speech therapy practice (43%), and ADLs (28%) with instruction from a speech-language pathologist (SLP) in the clinical setting. Using Google Home helped simulate “real life” settings for information searching, functional writing (e.g., emails and texting), and performance of ADLs, leading to increased independence, sociability, and confidence. Other studies reported that older adults benefited from the use of VAT for increased activities of daily living, gaming for entertainment, and online connection with others [21, 22, 28, 43]. Prior work also found that older individuals with cognitive and visual impairments showed increased memory and task completion [18] and greater use of VAT voice commands when provided compensatory strategies to perform functional tasks [19, 20].
On the contrary, some studies found challenges for individuals with CCDs when using VAT [17, 48, 52]. Although many studies have explored the use of VAT among users with physical and visual impairments [22,33], due to a wide range of factors (e.g., different etiologies), individuals with cognitive and communication impairments may have complex communication needs that hinder their ability to use VAT [7,17,24]. While some studies suggested VATs such as Google Home and Amazon Echo were able to pick up most commands produced by individuals with TBI regardless of their cognitive and linguistic abilities [16], others reported critical usability challenges (e.g., pronunciation and phrasing errors due to speech recognition and natural language processing, timing errors due to response latency) [7] and accessibility issues (e.g., the setup process, stable internet connectivity) [15]. For instance, to interact with VAT, users must demonstrate an adequate level of speech and cognitive skills [7]; to engage in extended conversational exchanges, individuals should have the ability to interact with the device in a timely manner, remember specific keywords, and utter concise command sequences involving planning, executive functioning, attention, long-term memory, and working memory [8]. Therefore, accessibility and usability can be significantly affected for individuals with CCDs, and additional training and support are needed for these individuals.

2.2 Implementing a VAT training curriculum for the home environment

The existing clinical intervention for individuals with CCDs predominantly revolves around utilizing mobile apps, alarms, digital calendars, and remote patient monitoring systems to serve as memory aids [38]. Oyesanya et al. [23] found that individuals with TBI who return home from medical settings have difficulty regaining their cognitive functions; however, the current literature has not thoroughly explored the potential adoption and feasibility of employing VAT within the home environment to compensate for cognitive needs such as memory, attention, and executive functioning. In an exploratory study with persons with TBI and their caregivers on the use of technology to address health, wellness, and safety concerns, [24] reported that the persons with TBI and caregivers wished for technological solutions related to medication management, mobility, cognitive functioning, and social relationships. Based on the known affordances of VAT, it is possible that VAT could help individuals with CCDs not only improve their comprehension skills and ability to produce clear sentences but also execute functional daily tasks, which is a key emphasis of cognitive-communication rehabilitation [4-6]. Such training on communication and cognition, in turn, has been linked to increased independence and self-esteem and more opportunities to return to work for continued occupation [25]. These findings provide a strong rationale for developing clinically informed training around ADL skills for individuals with CCDs.
While VAT devices are becoming increasingly ubiquitous, their actual usage among adults with CCDs remains low. Existing literature has shown that within the older adult population, many individuals feel that technology has not developed to cater to their specific needs and levels of technology literacy [12, 25, 26]. Studies on the interaction between VAT and older adults with CCDs remain relatively sparse, and very few investigated VAT use in users' natural home environments [8, 12]. For example, [12] reported that older adults primarily used digital sources for objective health information but did not find voice assistants (VAs) such as Amazon Alexa to be adequately tailored to their medical requirements. While [19] and [25] studied the effects of VAT use by adults with cognitive-communication impairments, a limitation is that their studies did not mimic a natural home environment, given the controlled noise levels, distractions, and group interactions (e.g., multiple speakers). By 2050, it is estimated that 22% of the world's population will be over 60 years old (approximately 2 billion older adults) [45]. There is significant room for better design and implementation of VAT to fit the daily, functional needs of the growing older adult population among community-dwelling adults with CCDs [12,18].

3 Methods

3.1 Participants

This study encompassed two rounds of training sessions via two cohorts, with a total of seven individuals with CCDs, to ensure quality of training for these individuals in a group telehealth service delivery format. Table 1 presents the demographic details and attendance records for the participants in the VAT training sessions. Cohort One (P1, P2, and P3) participated in the study for a total of 6 weeks; Cohort Two (P4, P5, P6, and P7) participated for a total of 8 weeks, with the two extra weeks added to give participants additional breaks and time to complete pre-training baseline measures (Week 1) and a post-training debrief interview (Week 8). Recruitment for both cohorts was conducted through word-of-mouth referrals and social media (e.g., Facebook groups for TBI). To be eligible for the study, participants had to (1) be adults with neurological impairments aged 18 years or older, (2) primarily reside at home, at least 1-1.5 years post-injury, without reported difficulties with vision or hearing, (3) possess stable internet and computer access with a functioning web camera, (4) receive adequate home support (defined as having at least one caregiver available for technology setup and troubleshooting), (5) be capable of completing all training via video conferencing, and (6) have no prior VAT training on ADLs. Across both cohorts, a total of 12 participants were initially recruited, but only seven individuals completed all the pre-training testing required for enrollment. All participants received an Amazon Echo Show 5 device for use during the study period and kept it as study compensation after study completion. This device contains a 5.5” screen, providing multimodal output to increase accessibility for its users. Caregivers for P1, P3, and P5 were present during the VAT training. P3’s caregiver actively participated in each session, while the caregivers of P1 and P5 were present in the room but did not participate in the training. Cohort One primarily consisted of individuals with mild cognitive impairment resulting from TBI, spanning a more diverse age range (see Table 1). Conversely, participants in Cohort Two exhibited various disorders (e.g., TBI and aphasia) but generally fell within a similar age range.
Table 1:
Participant (Cohort) | Age | Gender | Location | Diagnosis | Medical Background | Attendance
P1 (C1) | 55 | F | California | TBI | 5 years post TBI | 5/6 sessions
P2 (C1) | 25 | F | Florida | TBI/Seizure | 5 years post TBI | 2/6 sessions
P3 (C1) | 82 | M | New Jersey | TBI | 6 years post TBI | 6/6 sessions
P4 (C2) | 69 | F | California | Stroke | 4 years post stroke | 6/8 sessions
P5 (C2) | 55 | F | California | TBI | 34 years post TBI | 7/8 sessions
P6 (C2) | 54 | M | California | Stroke | 10 years post stroke | 3/8 sessions
P7 (C2) | 66 | M | California | TBI/Aphasia | 4 years post stroke | 7/8 sessions
Table 1: Demographic information & training attendance for enrolled participants

3.2 Procedure and Data Analysis

3.2.1 Baseline Measures

To understand clients’ baseline language and cognitive ability to learn new voice commands, one licensed clinician gathered baseline measures (Table 2 & Table 3) using subtests from three clinical language and cognitive assessments: the Cognitive Linguistic Quick Test-Plus (CLQT+), the Western Aphasia Battery (WAB), and the Ross Information Processing Assessment-2nd Edition (RIPA-2). When the baseline measures were collected for both cohorts, it was noted that all participants demonstrated basic communication skills, characterized by the ability to comprehend simple verbal communication and respond cohesively (observed through informal conversations). These basic communication and cognitive skills are important to gather to ensure participants’ ability not only to engage in verbal turn-taking with VAT but also to participate in the online training for novel learning. Across all participants, P1, P2, P3, P4, and P6 demonstrated average comprehension and verbal language, as noted by scores on language subtests such as conversational questions from the spontaneous speech subtest and the auditory verbal comprehension subtest from the WAB, the story retelling subtest from the CLQT+, and the auditory processing subtest from the RIPA-2. However, as evident from the picture description portion of the WAB spontaneous speech tasks, P3 (8/10), P5 (7/10), and P7 (8/10) demonstrated difficulties producing novel spontaneous language. On the RIPA-2 subtests, P1, P2, P4, P5, and P7 scored lower on the immediate memory subtest (18/30, 19/30, 24/30, 21/30, and 18/30, respectively), suggesting difficulties with working memory and attention. P2 also scored lower on the temporal orientation (recent) subtest than on temporal orientation (remote), indicating difficulty with short-term recall. P7 scored significantly lower than others on all orientation and organization subtests from the RIPA-2 and on the story retelling subtest from the CLQT+. This indicated significant difficulties with word finding, sequencing, and possibly remote and recent memory. It is worth noting that, when standardized measures are combined with informal subjective observations, P7’s poor performance on the RIPA-2 subtests and CLQT+ story retelling could be attributed to expressive language deficits due to underlying aphasia.
Table 2a
Participant | Immediate Memory | Recent Memory | Temporal Orientation (Recent) | Temporal Orientation (Remote) | Orientation to the Environment | Organization | Auditory Processing | Total Score
P1 (C1) | 18/30 | 16/18 | 26/27 | 30/30 | 12/12 | 28/30 | 24/30 | 154/177
P2 (C1) | 19/30 | 18/18 | 19/27 | 25/30 | 11/12 | 27/30 | 30/30 | 149/177
P3 (C1) | 28/30 | 17/18 | 27/27 | 30/30 | 10/12 | 30/30 | 30/30 | 172/177
P4 (C2) | 20/30 | 18/18 | 27/27 | 30/30 | 12/12 | 29/30 | 27/30 | 163/177
P5 (C2) | 21/30 | 16/18 | 24/27 | 26/30 | 10/12 | 21/30 | 20/30 | 141/177
P6 (C2) | 25/30 | 16/18 | 27/27 | 29/30 | 12/12 | 22/30 | 27/30 | 158/177
P7 (C2) | 18/30 | 14/18 | 12/27 | 19/30 | 7/12 | 17/30 | 27/30 | 114/177
Table 2a: Scores collected from subtests from the RIPA-2
a Table 2 shows the participants’ scores from the RIPA-2 assessment. Seven subtests were administered by a licensed clinician. The immediate memory, recent memory, and orientation-related subtests from the RIPA-2 assist in determining each participant's working memory, short-term memory, and temporal and environmental orientation, needed for gauging orientation and the ability to recall novel or previously used voice commands.
Table 3a
Participant | WAB: Spontaneous Speech | WAB: Auditory Verbal Comprehension | WAB: Total Score | CLQT+: Story Retelling
P1 (C1) | 10/10 | 57/57 | 67/67 | 30/30
P2 (C1) | 9/10 | 57/57 | 66/67 | 25/30
P3 (C1) | 8/10 | 57/57 | 65/67 | 30/30
P4 (C2) | 9/10 | 57/57 | 66/67 | 11/18
P5 (C2) | 7/10 | 54/57 | 61/67 | 8/18
P6 (C2) | 10/10 | 57/57 | 67/67 | 12/18
P7 (C2) | 8/10 | 57/57 | 65/67 | 6/18
Table 3a: Scores collected from subtests from the WAB & CLQT+
a Table 3 displays the participants’ scores from three subtests of the WAB (spontaneous speech, auditory verbal comprehension, and total score) and one from the CLQT+ (story retelling). The spontaneous speech subtest from the WAB captured each participant's ability to engage in conversation and formulate cohesive sentences, needed for determining the ability to converse with the VA. The auditory verbal comprehension subtest from the WAB and the auditory processing subtest from the RIPA-2 indicate the ability to understand simple yes/no questions, determining the participant's ability to process information auditorily, needed for understanding the information provided by the VA. The story retelling subtest from the CLQT+ assessed each participant's attention, immediate recall, and sequencing of story components, essential for repeating novel commands, especially those involving more than single-step utterances.
Following the administration of standardized tests, an adapted version of the Lawton-Brody Instrumental Activities of Daily Living Scale [34] was administered before the training to determine the level of function each participant exhibited in specific areas of ADLs using the VA. On the adapted scale, each participant assessed their own ability to use the VA to complete tasks within each ADL area using specific voice commands, as reported in Table 4. Each participant could score each command within the ADL areas as 0 = unable to execute, needing a caregiver to execute the command; 1 = needs caregiver assistance for execution; or 2 = able to execute the command independently. Lower scores in certain areas across both cohorts could indicate reduced familiarity with the use of VAs for those ADLs or could be attributed to reduced cognitive-communication or language skills. P1 scored lower on scheduling/reminders, self-care/medical needs, and meal preparation, indicating that the participant needed assistance from caregivers to complete tasks in these areas. P5 and P7 scored significantly low across all ADL areas, indicating an increased need for caregiver assistance and/or reduced familiarity with the use of VAs for completing ADL tasks.
Table 4:
Participant | Scheduling & Reminders | Entertainment | Self-Care & Medical Needs | News, Facts, & Communication | Meal Preparation | Total Score
P1 | 50% (8/16) | 50% (8/16) | 50% (8/16) | 50% (8/16) | 50% (8/16) | 50% (8/16)
P2 | 100% (20/20) | 100% (20/20) | 100% (20/20) | 100% (20/20) | 100% (20/20) | 100% (20/20)
P3 | 59.38% (19/32) | 59.38% (19/32) | 59.38% (19/32) | 59.38% (19/32) | 59.38% (19/32) | 59.38% (19/32)
P4 | 87.5% (21/24) | 87.5% (21/24) | 87.5% (21/24) | 87.5% (21/24) | 87.5% (21/24) | 87.5% (21/24)
P5 | 50% (10/20) | 50% (10/20) | 50% (10/20) | 50% (10/20) | 50% (10/20) | 50% (10/20)
P6 | 69.64% (78/112) | 69.64% (78/112) | 69.64% (78/112) | 69.64% (78/112) | 69.64% (78/112) | 69.64% (78/112)
P7 | 87.5% (14/16) | 87.5% (14/16) | 87.5% (14/16) | 87.5% (14/16) | 87.5% (14/16) | 87.5% (14/16)
Table 4: Baseline scores collected for “Instrumental Activities of Daily Living Scale Powered by Voice Assistant”
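To make the scoring concrete, the sketch below shows how a percentage such as P1's 50% (8/16) arises from the 0/1/2 per-command self-ratings described above. This is a minimal illustration only: the function name and the sample ratings are hypothetical, not taken from the study's materials.

# Minimal sketch of the adapted IADL scoring described above.
# The sample ratings are illustrative, not the study's actual items.
# Each command is self-rated 0 (caregiver executes), 1 (caregiver assists),
# or 2 (independent); a score is points earned over points possible,
# reported as a percentage (e.g., 8/16 = 50%).

def iadl_percentage(ratings):
    """Convert per-command self-ratings (0/1/2) into a percentage score."""
    earned = sum(ratings)
    possible = 2 * len(ratings)  # every command is worth up to 2 points
    return f"{earned / possible:.2%} ({earned}/{possible})"

# Hypothetical example: eight commands rated for one ADL domain.
scheduling_ratings = [2, 1, 1, 0, 2, 1, 1, 0]  # illustrative values only
print(iadl_percentage(scheduling_ratings))      # prints "50.00% (8/16)"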

3.2.2 Study Procedure

All participants and caregivers received virtual training on five topics: scheduling/reminders, entertainment, self-care and medical needs, news and facts/information, and meal preparation. P1, P2, P3, and P3’s caregiver received training for six weeks, learning one topic per week, while P4, P5, P5’s caregiver, P6, and P7 received training for eight weeks, spending the first week on the remaining sections of the baseline measures and Weeks 4-5 on self-care/medical needs. All 1-hour weekly virtual sessions, held via Zoom, were delivered by graduate clinicians in speech-language pathology and supervised by a licensed speech-language pathologist. Participants were provided with a brief introduction to the training prior to the first set of skills. Once the training structure was discussed during the introductory phase, clinicians presented designed user scenarios that served as a collective exercise for participants to practice various Alexa commands related to the week's theme. To stimulate reflection, participants were prompted with open-ended questions such as, "What comes to mind when we think of self-care/hygiene?” After each week's skill was introduced to the group, participants were placed into assigned breakout rooms and first presented with the user scenario, followed by an instructional video on how the command is executed. Cohort One received training from a single graduate clinician, whereas Cohort Two received training from two graduate clinicians. The training was delivered using an errorless learning intervention, targeting cognitive and short-term memory skills to increase accuracy in executing voice commands on the device [35].
For Cohort One, the clinicians running the training asked each participant what they would use each command for, to demonstrate personalized interactions between Alexa and its user. Participants in Cohort Two were asked what they would use each command for during their private session in the breakout room. Within these individual sessions, participants engaged in hands-on practice of pre-written commands. Clinicians meticulously observed and documented participants' trials, noting any significant observations or challenges encountered during the exercises. Multimodal prompting included visual and verbal cues, such as showing the command by sharing the screen with participants and/or redirecting participants to say the appropriate command (e.g., breaking down the command). For instance, certain participants received verbal prompting from clinicians instructing them on what to ask Alexa, such as "Ask Alexa how to get a stain out." In contrast, other participants required visual prompts, which involved clinicians writing down the specific questions to pose to Alexa: "Ask Alexa about stain removal." Every session was video recorded with participants’ consent, and recorded sessions were emailed to participants (and participants' caregivers) immediately following the session for additional practice. Offering recorded videos and commands in a document allowed participants to return to any commands they may have missed during the live training. At the end of each training, participants and their caregivers engaged in a debrief interview session (see supplemental materials) to share their experiences of using VAs and provide feedback on the training.
When establishing a training curriculum for Cohort One, skills chosen were carefully orchestrated to align with the participants’ daily activities and needs. Each skill included specific commands that were selected based on their contribution to the skills being taught and the participants’ ability to effectively execute each command [35]. Following completion of the first training for Cohort One, clinicians decided to expand the training curriculum to span over eight weeks. This involved allocating two weeks for “Self-Care and Medical Needs” and dedicating the first week to the collection of baseline measures and introducing the training program. In addition to expansion, clinicians also decided to rearrange the taught commands to facilitate a step-by-step learning approach. For instance, when teaching participants commands for “Scheduling and Reminders,” participants were first taught single step commands such as “Alexa, set timer for one minute,” followed by teaching them multi-step commands such as “Alexa, set a reminder for Alexa trainings on Thursdays at 5:30pm.” This approach aimed to ensure increased familiarity with commands and promote increased engagement with the devices.
During the final week of each training, all participants and their caregivers engaged in a 1.5-hour debrief interview. This served as a platform for participants and their caregivers to discuss perceived benefits of and challenges to using the device for their ADLs. Questions were created to investigate participants' and caregivers' “device satisfaction” and “training satisfaction.” The primary aim of exploring these two aspects was to adapt our training methodologies based on the participants’ feedback, their performance, and their perceived benefits and challenges throughout the training. This comprehensive approach allowed us to identify areas for improvement, leading to the expansion of our curriculum. This expansion involved incorporating a wider range of voice commands, dedicating more time to instructing these commands, and critically evaluating our service delivery during the training process. The structured approach ensured that the training sessions were both informative and participatory, fostering a conducive learning environment for clinicians exploring the capabilities of voice assistive technology in their therapeutic practices.

3.2.3 Data Analysis

Data collected for each practiced command during training included the accuracy achieved when executing a given command (successful opportunities/total opportunities given), the type of prompting employed, and other relevant observations such as device limitations. Specifically, video analysis [36] was conducted as the unit of analysis for determining participant performance. For Cohort One data, two graduate clinicians analyzed all five video recordings, collecting data based on established criteria. A third student clinician, who was not present for the training, analyzed the same videos, incorporating any missed cues and/or opportunities. This resulted in an overall inter-rater reliability of 88.6% for accuracy of executing voice commands and 89.4% for prompting use [37]. For Cohort Two data, one clinician collected data during each weekly session while the other clinician took the lead. Four graduate clinicians analyzed videos of participants they were not directly working with, making additions or modifications to any cues and/or opportunities that were overlooked. This resulted in an overall inter-rater reliability of 73.47% for accuracy of execution and 92.60% for prompting use. Thematic analysis [37] was employed for the debrief interviews with five participants (P1, P3, P4, P5, and P7) and two caregivers (P3’s and P5’s). Two licensed SLPs coded the interviews, focusing on user and caregiver satisfaction, as well as reported challenges and recommendations for both the Amazon Alexa device and the VAT training.
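As a rough illustration of the two quantitative measures described above, the sketch below computes command-execution accuracy (successful opportunities over total opportunities) and agreement between two raters. The paper does not specify its reliability formula, so simple percent agreement is assumed here; all names and trial data are hypothetical.

# Sketch of the per-command accuracy metric and a simple percent-agreement
# check between two raters. Plain percent agreement is assumed, and the
# function names and sample data are illustrative, not the study's own.

def command_accuracy(successes, opportunities):
    """Accuracy = successful executions / total opportunities given."""
    return successes / opportunities

def percent_agreement(rater_a, rater_b):
    """Share of trials on which two raters recorded the same judgment."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

# Hypothetical trial-by-trial codes (1 = command executed successfully).
rater_a = [1, 1, 0, 1, 0, 1, 1, 1]
rater_b = [1, 1, 0, 1, 1, 1, 1, 1]
print(f"accuracy (rater A): {command_accuracy(sum(rater_a), len(rater_a)):.0%}")  # 75%
print(f"agreement: {percent_agreement(rater_a, rater_b):.1%}")                    # 87.5%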

4 Results

4.1 Overview of Participant Performance

Participants who attended all or almost all sessions demonstrated accuracy varying between 17% and 100% with minimal to maximal cueing, showing gradual improvement in the execution of commands, as summarized in Table 5. While P1 and P2 required assistance when finding specific information (i.e., a certain movie trailer), P4 stated that she required assistance when asking Alexa for a doctor nearby (i.e., a medical doctor). Due to difficulties with working and short-term memory, P1, P4, P5, and P7 expressed the most interest in using Alexa to help simplify the process of carrying out ADLs by having Alexa remind them of tasks that needed to be completed, such as “taking medications, putting a load [of laundry] in and hanging towels in the bathroom.” When asked what they used for reminders prior to VAT, P4 and P7 mentioned “writing down what they needed to do” on a piece of paper. P7 reported relying on his wife to remind him to write the task down first so that he could remember it later. In addition to using Alexa for reminders, P5 and P2 expressed interest in utilizing more of Alexa's functions independently, such as “asking for the weather,” asking for a specific doctor, and playing videos and music.
Throughout the training, each participant showed their own level of proficiency when executing commands. P1 and P2 maintained consistent accuracy each week, with P1 scoring an overall accuracy of 92% with minimal to no cueing and P2 scoring 100% accuracy with minimal to no cueing. P4 and P5 also maintained similar consistency, ranging from 75-76% accuracy, with P5 requiring moderate visual and verbal cues to include essential verbiage that allowed for better identification of context. For instance, P5 would require a verbal cue to include the appropriate time of day (i.e., a.m. or p.m.) and the desired hour, as in “Alexa, remind me at 7 o'clock to shower every evening.” P7 showed the most challenge and frustration with device functions, indicating that it “doesn't hear what he is saying.” He required maximal verbal and visual cues, showing an overall accuracy of only 40%. Various challenges were overcome through prompting and instruction from clinicians, who guided participants to troubleshoot the device or use intelligibility strategies such as slowing down, increasing volume, over-articulating, and condensing long utterances. Despite the challenges experienced, all participants expressed interest in pursuing more of Alexa's functions during their final debrief interview. Furthermore, cognitive-linguistic skills in the areas of memory and attention were targeted via various strategies embedded within the training sessions. Memory was targeted through the introduction of memory strategies such as setting alarms/reminders and repeating commands to reinforce recall. Sustained and alternating attention skills were targeted via aspects of the training that involved sequencing steps to troubleshoot the device. To reduce the length of verbose commands, clinicians modeled and presented concise commands written on the screen throughout both training sessions. These strategies were effective in guiding participants to execute appropriate commands for Alexa.
Table 5 a
Participant | Week 1: Scheduling & Reminders | Week 2: Entertainment | Week 3: Self-Care & Medical Needs | Week 4: News, Facts, & Communication | Week 5: Meal Preparation
Each cell: accuracy (successful cues/total cues given)
P1 | 100% (0) | 59% (1/2) | 100% (0) | N/A | 100% (0)
P2 | 86% (0) | N/A | N/A | 100% (1/1) | N/A
P3 | 59% (1/5) | 63% (3/7) | 44% (4/7) | 53% (6/10) | 66% (8/10)
P4 | 64% (9/15) | 63% (9/9) | 77% (9/9) | N/A | 85% (5/5)
P5 | 60% (6/12) | 74% (20/20) | 85.71% (1/1) | 97% (2/2) | 66% (17/22)
P6 | 44% (5/5) | 67% (1/1) | N/A | N/A | N/A
P7 | 18% (7/7) | 67% (10/11) | 33% (3/13) | 60% (3/3) | 44% (18/29)
Table 5a: Video data collected for weekly lessons of VAT training
a Table 5 represents collected data organized for each week by (1) accuracy, the percentage of successful opportunities, and (2) the number of successful cues out of total cues given. N/A indicates that a participant was not in attendance or that cues were not needed to execute commands. These scores do not reflect participants' level of cognition but their ability to execute commands efficiently and/or with assistance. Note that Week 3 content was delivered across two separate weeks for P4 vs. P5, P6, & P7 due to scheduling conflicts.

4.2 Week 1: Scheduling and Reminders

In Week One of training, participants received a brief introduction to the curriculum, facilitating collaboration between clinicians and participants to establish rapport and familiarity with devices. Participants were informed of skills to cover each week, time commitment and anticipated training outcomes. Following the introduction, the first lesson focused on scheduling events and setting reminders. Clinicians demonstrated setting a reminder for the training at 5:30pm every Thursday. Participants were then provided with a step-by-step “roadmap” for executing voice commands. It was essential for participants to be well-informed about the prompts for setting events or reminders. The roadmap served as a visual guide to help participants break down commands for scheduling. Figure 1 illustrates a notable divergence between participants' self-reported baseline ADL measures and their accuracy in executing voice commands during Week One of VAT training. The data suggests a potential inconsistency, with some participants reporting high ADL measures but demonstrating lower accuracy in voice command execution. This discrepancy may be attributed to participants still acclimating to the device, leading to larger gaps between self-reported capabilities and actual performance.
Figure 1: Comparison of self-reported baseline ADL measure and accuracy in executing voice commands during Week One VAT training “Scheduling and Reminders”
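For a concrete reading of this comparison, the short sketch below computes the self-report vs. performance gap for three participants, using the baseline percentages from Table 4 and the Week One accuracies from Table 5; the code structure itself is illustrative and not part of the study's analysis.

# Gap between self-reported baseline ADL score and observed Week One
# command accuracy, using values from Tables 4 and 5 above.
week1 = {
    # participant: (self-reported baseline %, observed accuracy %)
    "P1": (50.0, 100.0),
    "P3": (59.38, 59.0),
    "P7": (87.5, 18.0),
}

for pid, (reported, observed) in week1.items():
    gap = observed - reported  # positive = performed above self-report
    print(f"{pid}: self-report {reported:.1f}%, accuracy {observed:.1f}%, "
          f"gap {gap:+.1f} points")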

4.2.1 Cohort One Training and Performance

P1 and P2 executed all commands with minimal to no cues, achieving 100% and 86% accuracy, respectively (Table 5). While P1 and P3 relied on caregiver support, P2 expressed interest in exploring additional Alexa commands, such as finding recipes, listening to music, and watching TV. In contrast, P3 often spoke in discourse to Alexa (i.e., would say, “Alexa, please prepare dinner to be ready at 8 o'clock tonight and let me know when the dinner is ready” instead of “Set reminder for dinner”), requiring a total of five verbal cues to streamline his commands for detection and processing (Table 5). Challenges with lengthy commands were resolved through consistent provision of clinician modeling and visual supports including the commands written on the screen. P3’s caregiver, present throughout the training, assisted in cueing.

4.2.2 Cohort Two Training and Performance

With the exception of P4, all participants managed to articulate the complete command “Alexa set reminder for Alexa training for Thursdays at 5:30 pm” in one breath independently, but still relied on the visual roadmap to break down the provided command. It was noted that participants benefited from reading commands to assist in remembering what to ask (e.g., “Alexa set an appointment for Monday at 10 o'clock” for P4, “Set me a timer for 45 minutes” for P5). Although P4, P5, and P6 required minimal to moderate verbal cues when producing commands, with 44-60% accuracy, P7 was frustrated by continuous failed attempts to activate Alexa (Table 5). For instance, when P7 was given seven opportunities to produce the command “Alexa, delete everyday appointment for training at 5:30 pm,” he stated that Alexa continued to “cut him off” and “not hear what he was saying.” Despite P7’s frustration and unfamiliarity with the device, clinicians found P7 benefited most from having commands typed out using the whiteboard feature on Zoom. However, he continued to require moderate to maximal visual and verbal cues, achieving only 18% accuracy when executing all commands (Table 5). P5’s caregiver assisted in cueing throughout the training.

4.3 Week 2: Entertainment

During Week Two's training session, clinicians aimed to teach commands for navigating entertainment platforms, introducing participants to commands that would automatically open YouTube for sourcing entertainment and media. Using a visual roadmap, the session encompassed instruction on device controls, such as the pause, volume up/down, fast forward/go back, play from the beginning/restart, and go-home features. Figure 2 compares self-reported ADL measures and actual accuracy during Week Two, revealing disparities for some participants, such as P1 and P5. The figure provides insights into the alignment or variance between perceived ADL capabilities and actual performance in voice command execution.
Figure 2: Comparison of self-reported baseline ADL measure and accuracy in executing voice commands during Week Two VAT training “Entertainment.” P2 did not attend the VAT training and the missing data is indicated with a dotted bar graph.

4.3.1 Cohort One Training and Performance

In Cohort One, P1 effectively executed 59% of her commands with minimal to moderate verbal and visual prompting, while P3 achieved 63% accuracy given maximal visual and verbal cueing (Table 5). Both participants found it beneficial when the clinicians modeled the commands initially. Challenges included troubleshooting when the device didn't register a command (e.g., using alternative words and phrases) and voice recognition issues (e.g., mumbling when the movie title was too long or producing unintelligible speech), which were addressed through step-by-step verbal and visual tutorials, effectively assisting P1 and P3 in generating successful commands.

4.3.2 Cohort Two Training and Performance

For Cohort Two, all participants were present, revealing a noticeable alignment within the group. P4 and P5 exhibited parallel learning trajectories, excelling in key training aspects. P4 achieved 100% accuracy in various commands, demonstrating proficiency in video controls and touchscreen usage. P5's progress, influenced by a practice-and-model approach, resulted in consistent accuracy in video-related commands. P6 showcased independence and adaptability, achieving success in diverse command executions, while P7 demonstrated engagement and proficiency with personalized preferences. However, challenges were noted, such as difficulties in setting reminders for bill payments without visual cues. Clinicians observed participants having to repeat and rephrase commands, with variations in command effectiveness between participants.

4.4 Week 3: Selfcare and Medical Needs

Training in Week Three focused on self-care, hygiene, and medical needs, spanning two weeks for Cohort Two participants due to time conflicts. The clinician commenced by providing examples of self-care and hygiene, emphasizing Alexa's ability to assist in setting reminders for crucial appointments, locating COVID test and flu shot locations, and searching for nearby doctors, as well as hair and skin necessities. Additionally, the clinician showcased the touchscreen feature of Alexa, demonstrating to all participants how they could tap on the device and make selections using their fingers. Figure 3 provides a comparison between participants' self-reported baseline ADL measures and their accuracy in executing voice commands during Week Three of VAT training. Notably, participants like P1 reported a perfect ADL self-assessment, which aligned with her actual percent accuracy. Conversely, participants like P3 self-reported a higher ADL percentage but attained 44% accuracy, highlighting a potential inconsistency. The figure offers insights into the concordance or variance between participants' perceived ADL capabilities and their actual performance in utilizing voice commands during the third week of training.
Figure 3: Comparison of self-reported baseline ADL measure and accuracy in executing voice commands during Week Three VAT training “Self-care and Medical Needs.” P2 and P6 did not attend the VAT training and the missing data was indicated with a dotted bar graph.

4.4.1 Cohort One Training and Performance

The first user scenario presented was “Need to get the COVID vaccine?” Both P1 and P3 executed the command “Alexa, where can I get the COVID vaccine/test” in 100% of opportunities, given minimal visual cues. The next section of the training focused primarily on laundry needs, and the clinician encouraged participants to cater the commands toward their personal needs. The clinician also mentioned that Alexa could be used to sequence the steps for doing laundry, find out how to get stains out, and learn when to use fabric softener. Overall, P1 executed all the commands given minimal visual cues and shared that she benefits from extra support from her family to finish her tasks due to her need for frequent breaks. P3 accurately executed 44% of the commands, given maximal verbal cues and redirection (Table 5), and expressed that he wants to utilize Alexa to know the specifics of the types of laundry he is doing (e.g., when he has guests, what clothing has a particular texture, or washing temperatures). P3 continued to speak in discourse to Alexa (e.g., “Can you give me an address to each of those sites? 1st, 2nd, 3rd and 4th. May I have the address for address #1”) and required reminders from both clinician and caregiver to rephrase the command, speak louder, and make his commands more concise. To address P3’s difficulty with speaking in discourse, clinicians modified the training slides by typing the exact script P3 should state and modeled it accordingly. P3’s caregiver expressed concern that P3 required a lot of support to finish the commands because he would often get distracted.

4.4.2 Cohort Two Training and Performance

As an expansion to the training curriculum for self-care/hygiene needs, participants in Cohort Two had the opportunity to also practice commands on finding mental health tips, finding different healthcare providers for different needs and injuries (e.g., finding an audiologist for ear pains, finding a doctor for stomach pains, etc.) and where to get different medicine and prescriptions. In addition, participants in Cohort Two were also presented with the same commands from Cohort One and P4 and P5 executed the command for “need to get a COVID vaccine?” in 100% of opportunities, given minimal visual and verbal cues. P7 executed the command in one out of six opportunities and benefitted from maximum verbal prompting to emphasize the “A” in the word, “Alexa.” P4 and P5 demonstrated overall adequate progress in their ability to effectively utilize their devices and generate more spontaneous commands independently. For example, P4 would spontaneously produce commands on her own, such as “Alexa tell me a laundry joke,” or “Alexa do I use fabric softener for white clothes?” P5 also exhibited skill in modifying her commands, as evidenced by her ability to rephrase a provided command (e.g., changing "Alexa how do I treat a migraine?" to "Alexa how do I get rid of a migraine?"). Although P7 demonstrated gradual improvement in device usage and familiarity compared to previous weeks (Table 5), P7 achieved a 33% accuracy when executing the commands for this week's training, requiring a maximum number of verbal prompts and cues. P7 continued to benefit from reminders provided by clinicians to ensure his device was on before executing a command, overemphasize the “A” in “Alexa,” and to speak more slowly. P6 was not present for this portion of the training.

4.5 Week 4: News, Facts, and Communication

Week Four of training highlighted commands on News, Facts and Communication. P2, P3, P4, P5 and P7 were in attendance for this week's training, while P1 and P6 were not present. Participants encountered challenges related to device responsiveness and recognition, occasionally requiring device restarts to restore functionality. Clinicians addressed these difficulties by encouraging participants to say “Alexa, go home” or “Alexa, stop” whenever a breakdown occurred or if the device did not pick up their commands accurately. Figure 4 illustrates the comparison between self-reported baseline ADL measures and accuracy in executing voice commands during Week Four of VAT training, focusing on "News, Facts, and Communication." Notably, absent participants P1, P4, and P6 are indicated with dotted bars, while present participants, such as P2, exhibited a high correlation between reported ADL measures and accuracy in executing voice commands.
Figure 4:
Figure 4: Comparison of self-reported baseline ADL measure and accuracy in executing voice commands during Week Four VAT training “News, Facts, and Communication.” P1, P4, and P6 did not attend the VAT training and the missing data was indicated with a dotted bar graph.

4.5.1 Cohort One Training and Performance

In Cohort One, P3 practiced a total of 11 commands, 10 of which were executed properly. Overall, P3 needed a moderate amount of visual prompting and moderate verbal cueing for 7 of the 11 commands. The participant benefited from verbal cues and visual prompts from both clinician and caregiver to use specific and concise commands to improve accuracy. This participant also needed Alexa's response rate to be slowed down to reduce unwanted responses before he could complete the command. Of the ten cues given, six successfully assisted P3 in stating a working command (Table 5). P2 joined late and successfully executed 3/3 commands with minimal to no cueing. She did not rely on visual prompts on the screen.

4.5.2 Cohort Two Training and Performance

In Cohort Two, P4 was instructed on mitigating interference issues arising from the ‘Alexa’ wake word. Participants showed progress through moderate prompting, repeated instruction, and reminders. P5 demonstrated consistent cooperation throughout the session, achieving 97% accuracy in command execution with minimal cues. P5 interacted with commands related to weather, scheduling, reminders, and communication, demonstrating improved proficiency in rephrasing and customizing commands to meet her individual needs and communication style. P5’s engagement was particularly visible in the commands related to news, facts, and communication. Additionally, P5 adeptly used prompts to stop Alexa when needed and effectively managed the creation, modification, and cancellation of reminders and appointments. Although P7 faced challenges in utilizing the mute/unmute feature, clinicians provided an alternative of restarting the device whenever a breakdown occurred. P7’s accuracy in executing commands varied across different prompts. For example, P7’s commands spanned a range of topics, from weather inquiries to medical inquiries and event scheduling; however, P7 exhibited a preference for entertainment and news topics over self-care. P7’s ability to deliver clear commands showed marked improvement, potentially attributable to an increased device volume.

4.6 Week 5: Meal Preparation

Week Five's training module consisted of commands for utilizing Alexa to assist with meal preparation (e.g., finding recipes, creating a shopping list, inquiring about nutritional facts and food substitutions). Figure 5 depicts the comparison between self-reported baseline ADL measures and accuracy in executing voice commands during Week Five of VAT training, focused on “Meal Preparation.” Participants P2 and P6 were absent from the training, and their missing data are represented with dotted bars in the graph. Noteworthy observations included P1 reporting a 50% ADL measure but achieving 100% accuracy, and P7 reporting an ADL measure of 96% but achieving an accuracy rate of 44%.
Figure 5: Comparison of self-reported baseline ADL measure and accuracy in executing voice commands during Week 5 VAT training “Meal Preparation.” P2 and P6 did not attend the VAT training and the missing data was indicated with a dotted bar graph.

4.6.1 Cohort One Training and Performance

In Cohort One, P1 did not report any troubleshooting or difficulty with commands learned from the previous week; however, P3 expressed concerns regarding content from Week Two on asking Alexa to play movie trailers. P3 was offered a one-on-one session with the clinician outside of training to help execute the commands. P2 and P6 were not in attendance. P3’s and P5’s caregivers were in attendance and assisted with troubleshooting. Week Five training began by allowing all participants to ask Alexa about a recipe they were most interested in. Requests ranged from steak and fajitas to vegan cakes. P1 independently executed all commands from Week Five's training module. When P3 was given the opportunity to find a recipe for “steak,” he required verbal and visual cues to help him break down utterances. For example, P3 produced long utterances such as, “we need a recipe for a meal for an entree of steak” rather than, “find me a recipe for steak.” To provide verbal and visual cues for all participants, the commands were modeled by the clinician and written out on the screen. In addition, P3 was verbally cued to ask Alexa to “slow down” due to concerns over Alexa “not letting him finish.” Of the ten cues given to P3, eight (e.g., verbal cues from the caregiver, gestural and visual prompts) were successful in eliciting a working command (Table 5).

4.6.2 Cohort Two Training and Performance

In Cohort Two, P4 produced Week Five commands with 85% accuracy and required minimal verbal or visual cues (Table 5). P4 not only customized each taught command to her specific recipe but quickly followed up with commands that were not part of the training curriculum, inquiring about foods she ate on a regular basis (e.g., “Alexa, how much sodium is in a cookie? Burrito?”; “Alexa, what is used for gluten-free pizza crust?”). Since P5 had difficulty reading, written cues were not used to help elicit working commands. As an alternative, the clinician verbally modeled the exact script P5 could state to execute the command. The clinician had the participant first practice a few times without waking up the device before proceeding. P7 required maximal verbal and visual cues, producing taught commands with 51% accuracy (Table 5) due to technical difficulties (e.g., blue line not appearing, mute button being turned on, device getting stuck and needing to be reset).

4.7 Post-training Debrief Interview

4.7.1 Participant Interview

At the end of the VAT training, a post-training debrief session was conducted with five participants via Zoom video conferencing. Caregivers were invited to attend the debrief session for an interview about their experience of the VAT training along with the participants. Participants shared the perceived benefits and difficulties they faced throughout the training. Reported benefits of using VAT included an increased ability to independently perform information-seeking tasks for orientation to time and events (e.g., date, time, weather, movies) (P1), utilizing reminders (P3 and P5), setting alarms (P5), alerts for medication management (P3), and human-like interaction (P5). Additionally, P4, P5, and P7 stated they would continue to use Alexa for meal preparation and looking up recipes, medication-related reminders (P5 and P7), remembering appointments (P4), looking up news/facts (P5), and finding stores to buy particular hygiene and self-care products (P7). Commonly reported difficulties included: forgetfulness with medications as well as with the instructions to create and cancel medication reminders (P3); using accurate vocabulary, shorter commands, and device functions (P4); the multimodal interface, though only initially (P4); maintaining an appropriate rate of speech to communicate with Alexa (P3 and P3’s caregiver, P5’s caregiver); hearing Alexa clearly and intelligibly through hearing aids (P3); attending to the status of the Alexa device via function buttons (e.g., on/off, mute) (P3 and P7); challenges due to technology literacy (P3’s caregiver); frustration with Alexa's poor speech recognition (P1’s caregiver, P5 and P5’s caregiver, and P7); and increased sensitivity to the wake word (P5). To cope with these difficulties, P3 reported that “I will adjust the way I speak, the way I begin speaking, remember to address Alexa by name specifically every time and speak of what I'm going to tell her so that I think she can decipher the content, the way it's supposed to be expressed.”
Specific suggestions for modifying the VAT training curriculum were also documented during the interview. For example, to improve his ability to generate more accurate and intelligible voice commands for Alexa, P3 shared his personal learning preferences: “I do better if I have written instructions and something to read later as to how to do things or undo things and things of that nature. Because that's how I learned originally in life… If we kind of structure a command to reflect what the instructions said we should do, I, for one, do it better than I would otherwise.” Regarding feedback for future training after Cohort One, caregivers for both P1 and P3 mentioned that longer or ongoing training related to complex ADLs, such as meal preparation and medication, would be most helpful. Specifically, caregivers recommended using Alexa to offer a selection of menus and suggestions for step-by-step instructions during cooking. Meal preparation, including finding recipes and ingredients and going through the steps to cook a recipe, was added to Cohort Two after this feedback from Cohort One. Post training for Cohort Two, P5 mentioned “longer training” to practice complex skills using Alexa for further independence with the VA, and P4 mentioned a “cheat sheet,” meaning a list of all the commands practiced during each training session for continued practice at home. This participant and caregiver feedback indicated that a future VAT training curriculum could benefit from (1) more training focused on the operational competence of the VAT device, with written instructions in addition to training videos, (2) customized exercises on using functional communication strategies with Alexa, and (3) extended time and practice dedicated to teaching complex ADLs (e.g., meal preparation, medication management) via cognitive-communication strategies.

4.7.2 Caregiver Interview

Feedback on the VAT training was also collected from the caregivers of P1, P3, and P5, whose participants demonstrated different levels of independence in VAT use and caregiver assistance. During the caregiver interviews, P1’s caregiver reported that P1 was very independent in her process of learning to use VAT, which was consistent with P1’s own debrief feedback: “What I wanted to do was discontinue one of the things I had asked her to program, and I was wondering how to do that. And I think I was at a stage where I was thinking no you just ask her. But I don't know if it would work or not. So that's the only reference I would have to do. If I, if I ran into something.” On the other hand, P3’s caregiver mentioned that although P3 was actively using VAT at home over the period of the training, she had offered additional caregiver assistance to support P3, such as asking Alexa to remind their loved ones about medication and getting the mail or newspapers. Specifically, P3’s caregiver stated that “if I saw him giving orders to her or reminders, I think he will step up and do it after I do it a couple of times. And then he will have to do it.” P3 also continued to use a conventional paper-based calendar to write down his appointment reminders; therefore, P3’s caregiver commented that “If he [P3] had instructions, written instructions, he would feel more confident (in using VAT frequently).” The presence of caregivers appeared to positively impact participant performance. Caregiver training was not provided; rather, the caregivers were invited to observe, participate, and reinforce the clinician's instructions. In P3’s case, caregiver participation was beneficial in assisting with the navigation of device controls and reinforcement of clinician-instructed tasks. Attendance from caregivers was optional, as not all participants were trained alongside their caregivers. Caregiver participation was most beneficial to participants who were dependent in functional ADLs and required maximal assistance, whereas participants who were mainly independent in ADLs did not feel the need to attend the training sessions with a caregiver. The absence or presence of caregivers may have had minimal effects on participant performance; however, this variable warrants further study.

5 Discussion

To our knowledge, this is one of the first studies to investigate the implementation and preliminary efficacy of a VAT training curriculum targeting ADLs for adults with CCDs in the home setting via telehealth. It makes several contributions to both cognitive rehabilitation and assistive technology research for older adults. Whereas previous research has examined the usability and implementation of VAT for adults with CCDs [2, 11, 12, 14], this project examined how an online training curriculum can be used to teach older adults with CCDs to strategically use VAT for ADLs. Using a clinically informed training curriculum [35], this study offered a unique opportunity to simulate in-home VAT training support that personalizes voice commands and encourages individualized instruction between clinicians and individuals with CCDs. Drawing from clinically based baseline measures of language, cognition, and ADL skills and from the post-training debrief interviews [35], we established a better understanding of each participant's cognitive-communication rehabilitation needs and documented improvement after the training as evidence of preliminary efficacy.

5.1 Benefits of VAT: Increased independence and generalization of ADLs

This virtual training offered a platform for participants to acquire new skills within a communal, group environment. Participants gained insights from one another's experiences with the Alexa device and collaboratively engaged and problem-solved around one another's successes and challenges. Feedback gathered from debrief interviews with participants and caregivers indicated perceived benefits and utility of this VAT training, which led to positive perceptions among participants. Consistent with prior studies [24, 43], the training demonstrated that VAT can increase independence in ADLs and information seeking for adults with CCDs. Since P5 had difficulty reading, written cues were not used to help elicit working commands. Due to personal dietary restrictions, P5 expressed satisfaction after learning that she could ask Alexa whether the foods she ate regularly were relatively “healthy” or “unhealthy” for diabetes. P5 engaged with commands encompassing weather, scheduling, reminders, and communication; her ability to rephrase commands to suit her communication style was evident, and she showed confidence in tailoring commands to her requirements. Her engagement was particularly visible in the commands related to news, facts, and communication. Additionally, she adeptly used prompts to halt Alexa and effectively managed the creation, modification, and cancellation of reminders and appointments. VAT also addressed specific needs for P6, who inquired whether Alexa could create a specific timer (e.g., an egg-boiling timer for 11 minutes). In sum, the ability to independently execute these voice commands allowed participants to improve the independence and generalization of ADL skills through VAT, as noted in prior studies [24, 43]. Furthermore, caregivers who were present throughout the VAT training additionally reported perceived benefits regarding participants’ increased independence at home. From the caregiver perspective, there are an estimated 50 million unpaid family caregivers in the U.S., providing roughly $600 billion worth of care and largely performing roles for which they were never trained [29, 30]. One of the primary challenges faced by caregivers is their lack of awareness regarding how technology can assist them and their loved one at home [30]. By being part of the training and debrief sessions, caregivers gained knowledge and skills for using VAT to help their loved ones become more independent with ADLs.
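
As a concrete illustration of the reminder functionality participants practiced by voice, the sketch below shows how a clinician- or technologist-built custom skill could create a recurring medication reminder programmatically through Alexa's Reminders REST API. This example is ours, not part of the study: the token and timestamps are placeholders, the call is only valid from within a custom skill session whose user has granted reminders permission, and the exact recurrence schema has varied across API versions.

```python
import requests

# Illustrative sketch (not from the study): creating a daily 7 p.m.
# medication reminder via the Alexa Reminders REST API. In practice this
# request must be sent from within a skill session, using the apiEndpoint
# and apiAccessToken supplied in the skill's request context.
API_ENDPOINT = "https://api.amazonalexa.com"  # placeholder; use the endpoint from the request context
API_ACCESS_TOKEN = "<apiAccessToken from the skill request>"  # placeholder

reminder = {
    "requestTime": "2024-05-01T09:00:00",        # when the request is made
    "trigger": {
        "type": "SCHEDULED_ABSOLUTE",
        "scheduledTime": "2024-05-01T19:00:00",  # first occurrence, device-local time
        "recurrence": {"freq": "DAILY"},         # repeat every day
    },
    "alertInfo": {
        "spokenInfo": {
            "content": [
                {"locale": "en-US", "text": "Time to take your evening medication."}
            ]
        }
    },
    "pushNotification": {"status": "ENABLED"},   # also notify in the Alexa app
}

response = requests.post(
    f"{API_ENDPOINT}/v1/alerts/reminders",
    json=reminder,
    headers={"Authorization": f"Bearer {API_ACCESS_TOKEN}"},
)
response.raise_for_status()
print("Created reminder:", response.json().get("alertToken"))
```

A caregiver-facing skill along these lines could set up the recurring reminders described above once, leaving the participant to interact with them purely by voice.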

5.2 Challenges of VAT: Accessibility Issues

User accessibility challenges included difficulty with Alexa's speech recognition; a common complaint during the debrief interviews related to Alexa “cutting speech off.” This was particularly challenging given participants’ language and cognitive-communication deficits. Consistent with prior research [8], conversing with VAT requires users to produce specific words in a timely manner and to generate appropriate sequences of commands, which demands planning, execution, and working memory. Additionally, clinicians expressed concern regarding their own competency with the accessibility features of the device and of the companion application used to set up the Alexa account. The clinician who worked with P7 reported that Alexa would “cut him off” before he finished a command; despite these challenges, the clinician was unaware of how to extend Alexa's listening time (e.g., via accessibility settings such as Adaptive Listening, which gives speakers more time to finish) and agreed that further education on Alexa's accessibility functions was needed. Before learning skills with Alexa, learning how to set up the device using the Alexa app is a crucial part of this training. Teaching participants and clinicians to navigate the Alexa app and configure the device environment would allow for independent troubleshooting and better device management. These changes are expected to enhance participant engagement and improve the effectiveness of the training.

5.3 Customized VAT training for adults with other CCDs and/or comorbidities

Customized training strategies were incorporated for some participants, taking into account special needs and deficits due to comorbidities. Addressing low literacy with P5 involved reducing reliance on visual cues during therapy sessions; instead, instructions were delivered through other modalities, including verbal models. Before this training, searching for the same information through a graphical user interface (GUI) was very difficult for P5, likely because her limited literacy skills compounded her cognitive-communication difficulties; similar GUI challenges have been reported among individuals with visual impairments [13]. With multimodal supports and prompting from clinicians, P5 demonstrated consistent cooperation throughout the sessions, achieving 97% accuracy in command execution while requiring only minimal cues. Similarly, because P7 was diagnosed with both TBI and aphasia, we provided multimodal stimulation consisting of spoken and written commands: we reduced visual distractions from the screen and wrote the commands in a larger font using the whiteboard feature on Zoom or a Google Doc, so that P7 could repeat each command without confusion once the wake word was spoken. Additionally, Alexa was deactivated to facilitate verbal repetition practice. By piloting the study via telehealth with participants in their most naturalistic environments, we not only demonstrated the feasibility of offering easy access to such services but also identified several accessibility considerations (e.g., low literacy, low vision, and the need for instruction in multiple modalities).

6 Limitations and Future Work

Findings from this study suggest limitations, and corresponding modifications, that could improve the generalizability of the VAT training. Three major limitations related to time, sample size, and participant attrition could be addressed in future work. First, prior to the VAT training, participants needed to complete all measures establishing their baseline cognitive-communication skills (45 minutes) and self-reported ADL performance (25 minutes), which required at least two Zoom sessions. Instead of administering the entire self-reported ADL measure before starting the VAT training program, future work should consider simplifying the baseline battery to the most robust subtests and gathering the corresponding baseline ADL measures at the beginning of each weekly training session. Second, the small sample size and inconsistent attendance of some participants produced missing data that prevented a correlation analysis between participants’ baseline measures and their VAT training performance. The small sample size in both cohorts was a deliberate choice: clinicians wanted to devote more time to individualized training in separate breakout rooms, tailoring voice commands to each participant's needs and lifestyle given the wide range of severity and diversity in their cognitive-communication deficits. Due to personal and medical appointments (P2, P4, P6), scheduling conflicts, and forgetting to log in for the session (P6), some participants missed more sessions than others, leading to missing data and difficulty conducting a correlation analysis. In this study, separate graphs were provided comparing self-reported ADL measures to performance during VAT training for each ADL. Notably, participants who scored lower on the baseline cognitive and communication measures (e.g., P7) demonstrated the lowest accuracy in executing VAT commands for three of the five weekly ADL topics, namely Scheduling, Self-care, and Meal Preparation (Week 1, Week 3, and Week 5).
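
For future cohorts with complete attendance data, the intended correlation analysis could take a form like the following sketch. All values below are invented placeholders, not study data; the point is that a rank-based statistic such as Spearman's rho suits small ordinal samples, and that listwise deletion of participants with missed sessions quickly erodes statistical power:

```python
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical per-participant records: a baseline cognitive-communication
# score and mean accuracy (%) of executed voice commands across training.
data = pd.DataFrame({
    "participant": ["P1", "P2", "P3", "P4", "P5", "P6", "P7"],
    "baseline_score": [88, None, 76, 81, 70, None, 54],  # None = missed baseline
    "vat_accuracy": [95, 90, 85, 92, 97, None, 62],      # None = missed sessions
})

# Listwise deletion of incomplete cases; with attrition this quickly
# shrinks n below what a stable correlation estimate requires.
complete = data.dropna(subset=["baseline_score", "vat_accuracy"])

rho, p = spearmanr(complete["baseline_score"], complete["vat_accuracy"])
print(f"n={len(complete)}, Spearman rho={rho:.2f}, p={p:.3f}")
```
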
Future work with a larger cohort of participants should examine whether baseline measures predict successful completion of, and performance in, the VAT training. During a six-month follow-up with P1, P3, and P3’s caregiver, they reported continued difficulty utilizing the commands learned in training, owing to limited training time and a lack of self-instruction before each session. They also mentioned that the scheduled training time conflicted with their mealtime, inducing fatigue “after a long day” of engaging in daily activities and attending therapy sessions. To address these issues, it would be helpful to expand the training duration or offer make-up sessions covering desired topics with additional practice. Due to time constraints, no usability evaluation measures were conducted. Future work could build on prior studies such as Gagnon-Roy et al. [44] to reduce the number of tests used to assess cognitive-communication skills, and could integrate heuristic and usability evaluation via the Voice Usability Scale or user evaluation measures specific to Amazon Alexa to further examine participants’ usability and accessibility needs [46, 47, 54]. Additionally, this study carries implications for integrating VAT to increase access to cognitive-communication rehabilitation via telehealth. VAT has already been considered as a potentially reimbursable assistive technology for increasing quality of life [39]. Our training program could be integrated into existing synchronous service delivery models for group therapy via telehealth, followed by rigorous goal setting and baseline and outcome measures to further examine improvement in ADL independence over a longer period of time.

7 Conclusion

This study developed a six-week online training program and examined the experiences of seven individuals with CCDs using VAT to perform ADLs. Through home-based delivery in a group format, speech-language clinicians provided guided training in mastering voice commands, and we analyzed the benefits and challenges experienced by participants and caregivers. The study was multi-faceted (e.g., designed for various age groups, permitting multi-modal prompts via telehealth) and broadly analytical (e.g., considering implications for users, clinicians, and technologists) in both its methodology and its results. The tailored training curriculum allowed participants with CCDs to experience how VAT could strengthen their cognitive-communication skills and enhance their independence in completing daily activities within their natural home environments, even amidst background noise and distractions, without over-reliance on their caregivers. The study was designed to personalize the skills and commands taught based on each participant's level of independence with VAT and their individual needs within each training topic, informed by the collected baseline measures. Moving forward, this study may feasibly be replicated to measure whether VAT bolsters independence and technology literacy and improves cognitive skills for older adults with CCDs within their homes.

Acknowledgments

This study is funded by Monmouth University's Student Scholar Program and the Monmouth University Urban Coast Institute Heidi Lynn Sculthrope research grant. We thank the participants and caregivers who generously offered their time and support to participate in this study. We especially appreciate the graduate student clinicians (Keisha Jones, Gabrielle Uhrik, Nikita Ved) and the clinical educators and faculty mentors (Ann Marie Huszar, Jennifer Shubin, Lori Price, Aaron Rothbart) for their generous support.


Supplemental Material

MP4 File: Video Presentation (with transcript)
DOCX File: Appendix (debrief interview questions with study participants)

References

[1]
Centers for Disease Control and Prevention. 2016. Report to congress: traumatic brain injury in the United States. Retrieved from https://www.cdc.gov/traumaticbraininjury/pubs/tbi_report_to_congress.html 
[2]
Birgit Kvikne. 2022. Aphasia and Information Seeking. In Proceedings of the 2022 Conference on Human Information Interaction and Retrieval (CHIIR '22). Association for Computing Machinery, New York, NY, USA, 379–382. https://doi.org/10.1145/3498366.3505805
[3]
Huykien Le and Mickey Y. Lui. 2023. Aphasia. In StatPearls. StatPearls Publishing. https://www.ncbi.nlm.nih.gov/books/NBK559315/; National Institute on Deafness and Other Communication Disorders. 2015. NIDCD Fact Sheet: Aphasia. Retrieved from https://www.nidcd.nih.gov/sites/default/files/Documents/health/voice/Aphasia6-1-16.pdf
[4]
Kathryn Y Hardin and James P Kelly. 2019. The Role of Speech-Language Pathology in an Interdisciplinary Care Model for Persistent Symptomatology of Mild Traumatic Brain Injury. Seminars in Speech and Language 40, 1 (2019), 65–78.
[5]
Nigel V. Marsh. 2018. Cognitive Functioning Following Traumatic Brain Injury: The First 5 Years. https://doi.org/10.3233/NRE-182457
[6]
Loïc Caroux, Charles Consel, Lucile Dupuy and Hélène Sauzéon. 2014. Verification of daily activities of older adults: a simple, non-intrusive, low-cost approach. In Proceedings of the 16th international ACM SIGACCESS conference on Computers & accessibility (ASSETS '14). Association for Computing Machinery, New York, NY, USA, 43–50. https://doi.org/10.1145/2661334.2661360
[7]
Fabio Masina, Valeria Orso, Patrik Pluchino, Giulia Dainese, Stefania Volpato, Cristian Nelini, Daniela Mapelli, Anna Spagnolli and Luciano Gamberini. 2020. Investigating the accessibility of voice assistants with impaired users: Mixed Methods Study. Journal of Medical Internet Research 22, 9 (2020).
[8]
Kenneth Olmstead. 2017. Nearly half of Americans use digital voice assistants, mostly on their smartphones. Washington: Pew Internet and American Life Project; 2017. URL: https://www.pewresearch.org/fact-tank/2017/12/12/
[9]
Personal Tech and the Pandemic: Older Adults Are Upgrading for a Better Online Experience. AARP.org. Published September 2021. Retrieved May 3, 2023 from https://www.aarp.org/research/topics/technology/info-2021/2021-technology-trends-older-americans.html-CMP=RDRCT-PRI-TECH-040721/?cmp=RDRCT-907b618d-20210416.
[10]
Lauren Werner, Gaojian Huang and Brandon J. Pitts. 2023. Smart speech systems: A focus group study on older adult user and non-user perceptions of speech interfaces. International Journal of Human-Computer Interaction. 2023;39(5):1149-1161. 
[11]
Robin Brewer, Casey Pierce, Pooja Upadhyay and Leeseul Park. 2021. An Empirical Study of Older Adult's Voice Assistant Use for Health Information Seeking. ACM Trans. Interact. Intell. Syst. Just Accepted (August 2021). https://doi.org/10.1145/3484507
[12]
Katie J. Edwards, Ray B. Jones, Deborah Shenton, Toni Page, Inocencio Maramba, Alison Warren, Fiona Fraser, Tanja Križaj, Tristan Coombe, Hazel Cowls and Arunangsu Chatterjee. 2021. The Use of Smart Speakers in Care Home Residents: Implementation Study. Journal of Medical Internet Research23, 12 (2021).
[13]
Alexandar B. Kucharski and Sebastian Merkel. 2020. Smart speaker for older users – results of a rapid review. AAL Congress 2020, Leipzig. Retrieved June 23, 2022 from https://www.researchgate.net/profile/Sebastian-Merkel/publication/348607873_Smart_Speaker_for_Older_Users_-Results_of_a_Rapid_Review/links/60075a37299bf14088aa5205/Smart-Speaker-for-Older-Users-Results-of-a-Rapid-Review.pdf
[14]
Pranav Kulkarni, Orla Duffy, Jonathan Synnott, W George Kernohan and Roisin McNaney. 2021. Speech and language practitioners’ experiences of commercially available voice-assisted technology: Web-based survey study (preprint). (2021).
[15]
T. Wallace and J. Morris. 2018. Identifying barriers to usability: Smart speaker testing by military veterans with mild brain injury and PTSD. In Cambridge Workshop on Universal Access and Assistive Technology, 2018, 113–122. http://dx.doi.org/10.1007/978-3-319-75028-6_10
[16]
Sheila MacDonald. 2017. Introducing the model of cognitive-communication competence: A model to guide evidence-based communication interventions after brain injury, Brain Injury, 31:13-14, 1760-1780.
[17]
Aris Malapaschas. 2020. Accessibility and acceptability of voice assistants for people with acquired brain injury. SIGACCESS Access. Comput. 123, Article 4 (January 2019), 1 page. https://doi.org/10.1145/3386402.3386406
[18]
Alisha Pradhan, Kanika Mehta and Leah Findlater. 2018. "Accessibility Came by Accident": Use of Voice-Controlled Intelligent Personal Assistants by People with Disabilities. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI '18). Association for Computing Machinery, New York, NY, USA, Paper 459, 1–13. https://doi.org/10.1145/3173574.3174033
[19]
Kimia Tuz Zaman, Hasan Wordh UI, Juan Li and Cui Tao. 2023. Empowering Caregivers of Alzheimer's Disease and Related Dementias (ADRD) with a GPT-Powered Voice Assistant: Leveraging Peer Insights from Social Media. 2023 IEEE Symposium on Computers and Communications (ISCC), Gammarth, Tunisia, 2023, pp. 1-7.
[20]
Ali Abdolrahmani, Ravi Kuber, and Stacy M. Branham. 2018. "Siri Talks at You": An Empirical Investigation of Voice-Activated Personal Assistant (VAPA) Usage by Individuals Who Are Blind. In Proceedings of the 20th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '18), 249–258.
[21]
Eduardo Nacimiento-Garcia, Carina Gonzalez-González, Daniel Domínguez-Gutiérrez and Francisco Gutiérrez-Vela. 2021. Pervasive gaming experiences to promote active aging using the virtual voice assistant Alexa. In Ninth International Conference on Technological Ecosystems for Enhancing Multiculturality (TEEM'21). Association for Computing Machinery, New York, NY, USA, 189–194. https://doi.org/10.1145/3486011.3486444
[22]
Vanessa Abrahamson, Jan Jensen, Kate Springett and Mohamed Sakel. 2016. Experiences of patients with traumatic brain injury and their carers during transition from in-patient rehabilitation to the community: A qualitative study. Disability and Rehabilitation 39, 17 (2016), 1683–1694.
[23]
Tolu O. Oyesanya, Nicole Thompson, Karthik Arulselvam and Ronald T. Seel. 2019. Technology and TBI: Perspectives of Persons with TBI and their family caregivers on technology solutions to address health, Wellness and Safety Concerns. Assistive Technology 33, 4 (2019), 190–200.
[24]
Pranav Kulkarni, Orla Duffy, Jonathan Synnott, W George Kernohan and Roisin McNaney. 2022. Speech and Language Practitioners' Experiences of Commercially Available Voice-Assisted Technology: Web-Based Survey Study. JMIR Rehabilitation and Assistive Technologies 9, 1 (2022).
[25]
Jérémy Bauchet, Hélène Pigot, Sylvain Giroux, Dany Lussier-Desrochers, Yves Lachapelle and Mounir Mokhtari. 2009. Designing judicious interactions for cognitive assistance: the acts of assistance approach. In Proceedings of the 11th international ACM SIGACCESS conference on Computers and accessibility (Assets '09). Association for Computing Machinery, New York, NY, USA, 11–18. https://doi.org/10.1145/1639642.1639647 
[26]
Christina N. Harrington, Ben Jelen, Amanda Lazar, Aqueasha Martin-Hammond, Alisha Pradhan, Blaine Reeder and Katie Siek. 2021. Taking Stock of the Present and Future of Smart Technologies for Older Adults and Caregivers. Retrieved September 14, 2023 from https://cra.org/ccc/resources/ccc-led-whitepapers/#2020-quadrennial-papers
[27]
Rachel McCloud, Carly Perez, Mesfin Awoke Bekalu and K Viswanath. 2022. Using Smart Speaker Technology for Health and Well-being in an Older Adult Population- Pre-Post Feasibility Study. JMIR Aging 2022;5(2):1
[28]
Older Adults Embrace Tech for Entertainment and Day-to-Day Living. AARP.org. December 2021. Retrieved May 3, 2023 from https://www.aarp.org/research/topics/technology/info-2022/2022-technology-trends-older-americans.html
[29]
Mont. County Comm. on Aging presents Smart Homes, Smarter Care: Technology to Support Aging In Places. Video. (18 May  2023). Retrieved June 14, 2023 from https://www.youtube.com/watch?v=nQK1CSU1vfs 
[30]
Ageing. WHO. 2023. Retrieved June 14, 2023 from https://www.who.int/health-topics/ageing#tab=tab_1
[31]
Neurological Disorders Affect Millions Globally: WHO Report. World Health Organization. (February 2007). Retrieved June 14, 2023 from https://www.who.int/news/item/27-02-2007-neurological-disorders-affect-millions-globally-who-report
[32]
Falls. WHO. April 26, 2021. Retrieved June 14, 2023 from https://www.who.int/news-room/fact-sheets/detail/falls 
[33]
Ana Vitória Lachowski Volochtchuk, Higor Leite, and Alessandro Diogo Vieira. 2023. Voice Assistant Technology Applied to Populations with Developmental and Physical Disabilities. Behaviour & Information Technology, 25 (Jul. 2023), 1-23.
[34]
M.P. Lawton and E.M. Brody. 1969. Assessment of older people: Self-maintaining and instrumental activities of Daily Living. The Gerontologist 9, 3 Part 1 (1969), 179–186.
[35]
Claire O'Connor, Ginna Byun, Lauren Kim and Priyal Vora. 2023. Designing Voice-Assisted Technology (VAT) Training for Activities of Daily Living (ADLs) for Adults with Cognitive-Communication Needs (CCNs) at Home. In The 25th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '23). Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3597638.3615656
[36]
Hubert Knoblauch, René Tuma and Bernt Schnettler. 2014. Video Analysis and Videography. In The SAGE Handbook of Qualitative Data Analysis, Chapter 30. Retrieved June 20, 2022 from https://methods.sagepub.com/book/the-sage-handbook-of-qualitative-data-analysis/n30.xml
[37]
Virginia Braun and Victoria Clarke. 2006. Using thematic analysis in psychology. Qualitative Research in Psychology 3, 2 (2006), 77–101.
[38]
Matthew Jamieson, Breda Cullen, Marilyn McGee-Lennon, Stephen Brewster and Jonathan Evans. 2017. Technological memory aid use by people with acquired brain injury. Neuropsychological Rehabilitation 27 (Sep 2017), 919–936.
[39]
Emre Sezgin, Yungui Huang, Ujjwal Ramtekkar, and Simon Lin. 2020. Readiness for Voice Assistants to Support Healthcare Delivery during a Health Crisis and Pandemic. NPJ Digital Medicine 3 (2020), 122.
[40]
R Brock Frost, Thomas J Farrer, Mark Primosch, and Dawson W Hedges. 2013. Prevalence of traumatic brain injury in the general adult population: A meta-analysis. Neuroepidemiology, 40(3), 154-159.
[41]
C Lefevre-Dognin, M Cogné, V Perdrieau, A Granger, C Heslot, and P Azouvi. 2021. Definition and epidemiology of mild traumatic brain injury. Neurochirurgie, 67(3), 218-221.
[42]
Andrew I R Maas, David K Menon, Geoffrey T Manley, Mathew Abrams, Cecilia Åkerlund, Nada Andelic, Marcel Aries... and Roger Zemek. 2022. Traumatic brain injury: progress and challenges in prevention, clinical care, and research. The Lancet Neurology, 21(11), 1004-1060.
[43]
Amanda R Rabinowitz, George Collier, Monica Vaccaro, and Roberto Wingfield. 2021. Development of a conversational agent for promoting increased activity in users with traumatic brain injury. Archives of Physical Medicine and Rehabilitation, 102(10), e83.
[44]
Mireille Gagnon-Roy, Stéphanie Pinard, Carolina Bottari, Fanny Le Morellec, Catherine Laliberté, Rym Ben Lagha, Amel Yaddaden, Hélène Pigot, Sylvain Giroux, and Nathalie Bier. 2022. Smart assistive technology for cooking for people with cognitive impairments following a traumatic brain injury: user experience study. JMIR rehabilitation and assistive technologies, 9(1), e28701.
[45]
Ageing and Health. WHO. (October 2022). from https://www.who.int/news-room/fact-sheets/detail/ageing-and-health
[46]
Faruk Lawal Ibrahim Dutsinma, Debajyoti Pal, Suree Funilkul and Jonathan H. Chan  2022. A systematic review of voice assistant usability: An ISO 9241-11 approach. SN Computer Science, 3(4), 267. https://doi.org/10.1007/s42979-022-01172-3
[47]
Dilawar Shah Zwakman, Debajyoti Pal, and Chonlameth Arpnikanondt. 2021. Usability evaluation of artificial intelligence-based voice assistants: The case of Amazon Alexa. SN Computer Science, 2, 1-16.
[48]
Neeraja Murali Dharan, Muhammad Raisul Alam, and Alex Mihailidis. 2021. Speech-based prompting system to assist with activities of daily living: A feasibility study. Gerontechnology, 20(2), 1-12.
[50]
Kathrin Koebel, Martin Lacayo, Madhumitha Murali, Ioannis Tarnanas, and Arzu Coltekin. 2022. Expert Insights for Designing Conversational User Interfaces as Virtual Assistants and Companions for Older Adults with Cognitive Impairments. In A. Følstad, T. Araujo, S. Papadopoulos, E. L.-C. Law, E. Luger, M. Goodwin, & P. B. Brandtzaeg (Eds.), Chatbot Research and Design (pp. 23–38). Lecture Notes in Computer Science (LNCS), vol 13171. Springer International Publishing.
[51]
Judith Hocking, Anthony Maeder, David Powers, Lua Perimal-Lewis, Beverley Dodd, and Belinda Lange. 2023. Mixed methods, single case design, feasibility trial of a motivational conversational agent for rehabilitation for adults with traumatic brain injury. Clinical Rehabilitation. 2023;0(0).
[52]
Fabio Catania, Micol Spitale, and Franca Garzotto. 2023. "Conversational agents in therapeutic interventions for neurodevelopmental disorders: a survey." ACM Computing Surveys 55.10 (2023): 1-34. https://doi.org/10.1145/3564269
[53]
Nicole Ruggiano, Ellen L. Brown, Lisa Roberts, C. Victoria Framil Suarez, Yan Luo, Zhichao Hao, and Vagelis Hristidis. 2021. Chatbots to support people with dementia and their caregivers: systematic review of functions and quality. Journal of medical Internet research, 23, 6, e25006.
[54]
Raina Langevin, Ross J. Lordon, Thi Avrahami, Benjamin R. Cowan, Tad Hirsch, and Gary Hsieh. 2021. Heuristic Evaluation of Conversational Agents. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 1–15.
