1 Introduction
Breakthroughs in Artificial Intelligence (AI) and Machine Learning (ML) have considerably advanced the degree to which interactive systems can augment our lives [127, 182]. As black-box ML models are increasingly being employed, concerns about humans misusing AI and losing control have led to the need to make AI and ML algorithms easier for users to understand [25, 155]. This, in turn, has spurred rapidly growing interest in Explainable AI (XAI) within academia [11, 74, 130] and industry [2, 6, 22], and among regulatory entities [1, 93, 94].
Earlier XAI research aims to help AI/ML developers with model debugging (e.g., [106, 140, 178, 179, 240]) or to assist domain experts such as clinicians by revealing additional information such as causality and certainty (e.g., [75, 129, 217, 222]). Recently, a growing body of XAI research has focused on non-expert end-users [31, 74, 102]. Existing studies have found that XAI can help end-users resolve confusion and build trust [67, 170]. Industry practitioners have started to integrate XAI into everyday scenarios to improve user experiences, e.g., by displaying the match rate of point-of-interest suggestions in map applications [135].
Alongside the surge of interest in XAI, Augmented Reality (AR) is another technology making its way into everyday life [5, 8]. Advances in more lightweight, powerful, and battery-efficient Head-Mounted Displays (HMDs) have brought us closer to the vision of pervasive AR [91]. As AI techniques are needed to enable context-aware, intelligent, everyday AR [12, 58, 177], XAI will be essential because end-users will interact with the outcomes of AI systems. XAI could be used to make intelligent AR behavior interpretable, resolve confusion or surprise when users encounter unexpected AI outcomes, promote privacy awareness, and build trust. Therefore, we aim to answer the following research question:
How do we create effective XAI experiences for AR in everyday scenarios?

Researchers have developed several design spaces and frameworks to guide the design of XAI outside the context of AR [25, 67, 155, 217]. However, most previous work focused on identifying taxonomies of explanation types or generation techniques. It did not consider everyday-AR-specific factors such as the rich sensory information that AR technologies have about users and contexts, and AR's always-on, adaptive nature. These factors can not only support more personalized explanations but also affect the design of an explanation interface. For example, one could render in-place explanations on related objects (e.g., explaining a recipe recommendation by highlighting ingredients in the fridge). In this paper, we provide a framework to guide the design of XAI in AR.
To answer this research question, we used a design space analysis [149] to break the main research question down into three sub-questions: 1) When to explain?, 2) What to explain?, and 3) How to explain? Previous research in the XAI and HCI communities has focused on one or two of these sub-questions (e.g., when [164, 183], what [25, 130]). Although not conducted within the context of AR, many of these findings can inform the design of XAI in AR.
Therefore, we first summarized related literature to identify the most important dimensions under each sub-question, as well as the factors that determine the answers to these questions, such as users' goals for having explanations (i.e., why explain). We then conducted two complementary studies to obtain insights from the perspectives of end-users and experts. Specifically, we carried out a large-scale survey of over 500 end-users with different levels of AI knowledge to collect user preferences about the timing (related to When), content (related to What), and modality (related to How) of explanations in multiple AR scenarios. In addition, we ran three workshops with twelve experts (i.e., four experts per workshop) from different backgrounds, including algorithm developers, designers, UX professionals, and HCI researchers, to iterate on the dimensions and generate guidelines that answer the When/What/How questions.
Merging the insights obtained from these two studies, we developed the XAIR (eXplainable AI for Augmented Reality) framework (Fig. 1). The framework can serve as a comprehensive reference that connects multiple disciplines across XAI and HCI. It also provides a set of guidelines to assist in the development of XAI designs in AR. XAI researchers and designers can use the guidelines to enhance their design intuition and propose more effective and rigorous XAI designs for AR scenarios.
We further conducted two user studies to evaluate XAIR. To verify its utility in supporting designers, the first study focused on designers' perspectives: ten designers were invited to use XAIR to design XAI experiences for two real-life AR scenarios. To demonstrate its effectiveness in guiding the design of an actual AR system, a second study was conducted from the perspective of end-users. We implemented a real-time intelligent AR system based on the XAIR-informed proposals that designers produced in the first study, and measured the usability of the system with 12 end-users. The results indicated that XAIR provides meaningful and insightful support for designers in proposing effective XAI designs for AR, and that it can lead to an easy-to-use AR system that users find transparent and trustworthy.
The contributions of this research are:
•
We summarized literature from multiple domains and identified the important dimensions of the when/what/how questions in the problem space of designing XAI in AR.
•
Drawing on the results of a large-scale survey with over 500 users and an iterative workshop study with 12 experts, we developed XAIR, the first framework for XAI design in AR scenarios. We also proposed a set of guidelines to support designers in their design thinking process.
•
The results of design workshops with 10 designers indicated that XAIR could provide meaningful and insightful creativity support for designers. A study with 12 end-users who used a real-time AR system showed that XAIR led to the design of AR systems that were transparent and trustworthy.
6 Applications
To demonstrate how to leverage XAIR for XAI design, we present two examples that showcase potential workflows using XAIR for everyday AR applications (Fig. 7). More details can be found in Appendix B.1. After determining the key factors for a given scenario, we used the framework (Fig. 4-Fig. 6) to make design choices based on those factors.
6.1 Scenario 1: Route Suggestion while Jogging
Scene. Nancy (AI expert, high AI literacy) is jogging in the morning on a quiet trail. Since it is cherry-blossom season and Nancy loves cherry blossoms, her AR glasses display a map beside her and recommend a detour. Nancy is surprised because this route differs from her regular one, but she is happy to explore it. She is also curious to know why this new route was recommended.
When. Delivery. Nancy has enough cognitive capacity in this scenario. Her User Goal is Resolving Surprise. Therefore, an explanation is automatically triggered because the two conditions are met (G2).
What. Content. Other than the User Goal, the System Goal is User Intent Discovery (exploring a new route to see cherry blossoms). Considering Nancy's User Profile, she is an expert in AI, so the appropriate explanation content types (G3) are Input/Output (e.g., "This route is recommended based on seasons, your routine, and preferences.") and Why/Why-Not (e.g., "The route has cherry blossom trees that you can enjoy. The length of the route is appropriate and fits your morning schedule."). Examples for all seven explanation content types can be found in Appendix B.1.
Detail. The AR interface shows the Why by default (G4) and can be expanded to show both types in detail (G5). Nancy can slow down and click the "More" button to see more detailed explanations while standing or walking.
How. Modality. The explanation is presented visually, the same as the recommendation (G6).
Format. The default explanation uses text, while the detailed explanation contains cherry-blossom pictures of the new route to help explain the Why (G7).
Pattern. The explanation is shown explicitly within the route recommendation window (G8).
6.2 Scenario 2: Plant Fertilization Reminder
Scene. Sarah (general end-user, low AI literacy) was chatting with her neighbor about gardening. After she returned home and sat on the sofa, her AR glasses recommended plant fertilization instructions by showing a care icon on the plant. Sarah is concerned about technology invading her privacy and wants to know the reason behind the recommendation.
When. Delivery. Although Sarah has enough cognitive capacity, none of the three cases in the second condition of G2 are met (i.e., she was familiar with the recommendation and not confused, and the model didn’t make a mistake). Therefore, the explanation needs to be manually triggered (G2).
What. Content. In this case, the System Goal is Trust Building (clarifying the usage of data), and the User Goal is Privacy Awareness. Sarah’s User Profile indicates that she is not an expert in AI. According to G3, the explanation content type list contains Input/Output, Why/Why-Not, and How.
Detail. Considering Sarah’s concern, the default explanation merges Why and How: “The system scans the plant’s visual appearance. It has abnormal spots on the leaves, which indicate fungi or bacteria infection.” (G4). For the detailed explanation, the full content of the three types is presented in a drop-down list upon her request (G5).
How. Modality. Following G6, the visual modality is used for both the explanation and the manual trigger (a button beside the plant care icon).
Format. Other than using text as the primary format, the abnormal spots on the leaves are also highlighted via circles to provide an in-situ explanation (G7).
Pattern. Since the highlighting of the spots is compatible with the environment (shown directly on the leaves), it adopts the implicit pattern (G8). The rest of the explanation text uses the explicit pattern.
Our two examples demonstrate XAIR's ability to guide XAI design in AR across various scenarios. In Appendix B.2, we provide additional everyday AR scenarios to further illustrate its practicality.
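The two walkthroughs follow a mechanical pattern: identify the key factors, then apply the guidelines. As a purely illustrative sketch (the field names and rule encoding are ours, not XAIR's formal notation), the G2 delivery decision from the two scenarios could be expressed as:

```python
from dataclasses import dataclass

# Hypothetical encoding of a subset of XAIR's key factors; the field names
# are illustrative, not the framework's formal notation.
@dataclass
class Factors:
    cognitive_capacity: bool      # User State: can the user attend to an explanation?
    user_goal: str                # e.g., "resolve_surprise", "privacy_awareness"
    system_goal: str              # e.g., "intent_discovery", "trust_building"
    ai_expert: bool               # User Profile: AI literacy
    model_mistake: bool = False   # Contextual Information: did the model err?

def delivery(f: Factors) -> str:
    """Paraphrase of guideline G2: auto-trigger an explanation only when the
    user has spare cognitive capacity AND there is a concrete need (surprise,
    confusion, or a model mistake); otherwise require a manual trigger."""
    need = f.user_goal in {"resolve_surprise", "resolve_confusion"} or f.model_mistake
    return "auto" if f.cognitive_capacity and need else "manual"

# Scenario 1 (Nancy): surprised by a new route -> auto-triggered explanation.
nancy = Factors(True, "resolve_surprise", "intent_discovery", ai_expert=True)
# Scenario 2 (Sarah): familiar recommendation, no mistake -> manual trigger.
sarah = Factors(True, "privacy_awareness", "trust_building", ai_expert=False)
print(delivery(nancy))  # auto
print(delivery(sarah))  # manual
```

The sketch reproduces only the two example outcomes above; the full guideline involves richer conditions than this binary rule.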
8 Discussion
XAIR defines the structure of the problem space of XAI design in AR and details the relationships between the key factors and that problem space. By highlighting the key factors that designers need to consider and providing a set of design guidelines for XAI in AR, XAIR not only serves as a reference for researchers but also helps designers propose more effective XAI designs in AR scenarios. The two evaluation studies in Sec. 7 illustrated that XAIR can inspire designers with more design opportunities and lead to transparent and trustworthy AR systems.
In this section, we discuss how researchers and designers can apply XAIR, as well as potential future directions of the framework inspired by our studies. We also summarize the limitations of this work.
8.1 Applying XAIR to XAI Design for AR
Researchers and designers can make use of XAIR in their XAI design for AR scenarios by first using their intuition to propose an initial set of designs. Then, they can follow the framework to identify the five key factors: User State, Contextual Information, System Goal, User Goal, and User Profile. The example scenarios in Sec. 6 and Sec. 7 indicate how these factors can be specified. Based on these factors, they would then work through the eight guidelines of when, what, and how, using Fig. 4-Fig. 6 to inspect their initial design and make modifications where anything is inappropriate or missing. Low-fidelity storyboards or prototypes of the designs can then be tested in small-scale end-user evaluation studies. This is an iterative process. In the future, when sensing and AI technologies are more advanced, the procedures of identifying factors and checking guidelines could potentially be automated.
8.2 Towards An Automatic Design Recommendation Toolkit
In Study 3, more than one user mentioned the possibility of converting the framework into an automatic toolkit. For example, P3 thought aloud while using XAIR in the study: "If this framework is described as an algorithm, the five key factors can be viewed as the input of the algorithm... and the output is the design of the three questions." A few decision-making steps in the current framework still involve human intelligence. For example, when designing the default explanations in What (Detail), designers need to consider users' priorities in a given context to determine which explanation content type to highlight. When picking the appropriate visual format, designers need to determine whether the explanation content is better presented textually or graphically, and whether the content can be naturally embedded within the environment. Assuming future intelligent models can assist with these decisions, XAIR could be transformed into a design recommendation tool that enables designers and researchers to experiment with sets of User States, Contexts, System/User Goals, and so on. This could lead to a more advanced version of XAIR in which the framework is fully automated as an end-to-end model, determining the optimal XAI experience by inferring the five key factors in real time. This is an appealing direction. However, although factors such as Context and System Goal are relatively easy for a system to predict, the inference of User State/Goal is still at an early research stage [21, 71, 99]. Moreover, extensive research is needed to validate the adequacy and comprehensiveness of such an end-to-end algorithm. It also introduces the challenge of nested explanations in XAIR (i.e., explaining explanations) [154], which calls for further study.
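P3's "framework as an algorithm" framing can be illustrated with a toy sketch: the five key factors go in, and a when/what/how recommendation comes out. The rules below are loose paraphrases of guidelines G2, G3, and G6-G8 that we wrote for illustration; they are not XAIR's actual decision logic, and every key name is hypothetical.

```python
# Toy sketch of a design recommendation toolkit: factors in, design out.
# The rules loosely paraphrase guidelines G2, G3, and G6-G8; they are
# illustrative only, not XAIR's actual logic.
def recommend_design(factors: dict) -> dict:
    # G2 (paraphrased): auto-trigger only given capacity and a concrete need.
    need = factors["user_goal"] in {"resolve_surprise", "resolve_confusion"} \
        or factors.get("model_mistake", False)
    when = "auto" if factors["cognitive_capacity"] and need else "manual"

    # G3 (paraphrased): experts get Input/Output and Why/Why-Not; non-experts
    # additionally benefit from a How explanation.
    what = ["input_output", "why_why_not"]
    if not factors["ai_expert"]:
        what.append("how")

    # G6-G8 (paraphrased): match the modality of the AI outcome; embed the
    # explanation implicitly when it fits the environment, else explicitly.
    how = {
        "modality": factors.get("outcome_modality", "visual"),
        "pattern": "implicit" if factors.get("env_compatible") else "explicit",
    }
    return {"when": when, "what": what, "how": how}

# Scenario 2 (Sarah) as input:
design = recommend_design({
    "cognitive_capacity": True, "user_goal": "privacy_awareness",
    "ai_expert": False, "env_compatible": True,
})
print(design["when"])  # manual
print(design["what"])  # ['input_output', 'why_why_not', 'how']
```

The hard part, as noted above, is not this rule application but inferring the input factors (especially User State/Goal) reliably in real time.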
8.3 The Customized Configuration of XAI Experiences in AR
The experts in Study 2 and the designers in Study 3 brought up the need for end-users to control XAI experiences in AR, e.g., "XAIR can provide a set of default design solutions, and users could further customize the system" (P12, Study 2) and "I personally agree with the guidelines, but I can also imagine some users may want different design options. So there should be some way that allows them to select when/what/how... For example, a user may want the interface to be in an explicit dialogue window all the time [related to how]. We should support this." (P8, Study 3). This need for control suggests that, to achieve a personalized AR system, designers should provide users with ways to configure the system, so that they can set specific design choices and customize their XAI experience. Such personalization capabilities may also support people with accessibility needs (also mentioned by P2 in Study 3); e.g., visually impaired users could choose to always use the audio modality.
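One way to realize this control is a small per-user preference layer that overrides the framework's guideline-derived defaults. A minimal sketch, where all setting names and values are hypothetical rather than a XAIR specification:

```python
# Hypothetical per-user override layer for XAI settings in an AR system.
# Defaults would come from the framework's guidelines; users override any subset.
DEFAULTS = {
    "when": "auto",        # trigger policy: auto / manual
    "detail": "summary",   # default explanation length
    "modality": "visual",  # visual / audio
    "pattern": "implicit", # implicit (embedded in scene) / explicit (window)
}

def effective_settings(user_prefs: dict) -> dict:
    """Merge user overrides onto the defaults; unknown keys are rejected so
    that typos in a settings file fail loudly instead of being ignored."""
    unknown = set(user_prefs) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown XAI settings: {sorted(unknown)}")
    return {**DEFAULTS, **user_prefs}

# P8's example: a user who always wants an explicit dialogue window.
p8 = effective_settings({"pattern": "explicit"})
# P2's accessibility example: a visually impaired user preferring audio.
p2 = effective_settings({"modality": "audio"})
print(p8["pattern"], p2["modality"])  # explicit audio
```

The override-with-validation pattern keeps the guideline defaults authoritative while still honoring the explicit user choices the participants asked for.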
8.4 User-in-The-Loop and Co-Learning
During the iterative expert workshops (Study 2, Sec. 4.2), experts mentioned an interesting long-term co-learning process between the AR system and the user. On the one hand, based on a user's reactions to AI outcomes and explanations, the system can learn from the data and adapt to the user. Ideally, as the AR system better understands the user, the AI models become more accurate, reducing the need for mistake-related explanations (e.g., cases where the System Goal is Error Management). On the other hand, the user is also learning from the system: "Users' understanding of the system and AI literacy may change as they learn from explanations" (P4, Study 2). This may also affect the user's need for explanations. For example, the user may experience less confusion (User Goal as Resolving Surprise/Confusion) as they become more familiar with the system. Meanwhile, they may become more interested in exploring additional explanation types (User Goal as Informativeness). Such a long-term co-learning process is an interesting research question worth further exploration.
8.5 Limitations
There are a few limitations to this research. First, although we highlighted promising technical paths within the framework in Sec. 5, XAIR does not involve specific AR techniques. The real-time AR system in Study 4 implemented the ingredient recognition and recipe recommendation modules, but the detection of user state/goal was omitted.
Second, our studies might have some intrinsic biases. For example, Study 1 only involved AR recommendation cases. Since everyday AR HMDs are not yet widely adopted in daily life, we grouped the 500+ participants only based on AI experience rather than AR experience. The experts and designers in our studies were all employees of a technology company. Study 4 only evaluated two specific proposals from designers. Moreover, as there is no previous XAI design for AR, we could only compare our XAIR-based system against a baseline without explanations. Third, beyond when, what, and how, there could be more aspects to the problem space, e.g., who and where to explain. Moreover, XAIR mainly focuses on non-expert end-users; other potential users, such as developers or domain experts, were not included. The scope of the five key factors may also not be comprehensive. For example, we do not consider user trust in AI, a part of the User Profile that may change dynamically over the course of user-system interaction. These issues could limit the generalizability of our framework, but they also suggest a few potential directions for future work to expand and enhance XAIR.
9 Conclusion
In this paper, we proposed XAIR, a framework to guide XAI design in AR. Based on a literature review spanning multiple domains, we identified the problem space via three main questions, i.e., when to explain, what to explain, and how to explain. We combined the results of a large-scale survey with over 500 end-users (Study 1) and iterative workshops with 12 experts (Study 2) to develop XAIR and a set of eight design guidelines. Using the framework, we walked through example XAI designs in two everyday AR scenarios. To evaluate XAIR's utility, we conducted a study with 10 designers (Study 3), which revealed that designers found XAIR to be a helpful, comprehensive reference that could inspire new design ideas and back up their design intuitions. Moreover, to demonstrate the effectiveness of XAIR, we instantiated two design examples in a real-time AR system and conducted another user study with 12 end-users (Study 4). The results indicated excellent usability of the AR system. XAIR can thus help future designers and researchers achieve effective XAI designs in AR and explore new design opportunities.