1 Introduction
AI technologies are creating new opportunities to improve people’s lives worldwide, from healthcare to education to business. However, people sometimes over-trust or under-trust these technologies [
87,
92]. Under-trust can lead to under-reliance, and over-trust can lead to over-compliance, which can negatively impact the task. Hence, for AI systems to reach their potential, people need to have
appropriate levels of trust in these systems, not just trust. Although there are many ways to define appropriate trust [
118], in this article, we take this to mean that the trust a human has in a system needs to align with the actual trustworthiness of the system [
32].
It has only been in recent years that we have found research on appropriate trust in AI systems [
7,
99,
100,
118]. Appropriate trust is a complex topic, as it requires consideration of the influence of context, the goal-related characteristics of the agent, and the cognitive processes that govern the development and erosion of trust [
18]. In this work, we aim to contribute by studying how explanations given by the AI, which highlight different integrity-based principles (e.g., honesty, transparency, fairness), can influence trust and the appropriateness thereof.
Explainable AI (XAI) is meant to give insight into the AI’s internal model and decision-making [
112] and has been shown to help users understand how the system works [
16,
85]. Efforts to ensure that AI is trusted appropriately are often in the form of explanations [
7,
69,
120]. Intuitively, this makes sense, as understanding an AI system’s inner workings and decision-making should, in theory, also allow a user to better judge when to trust or not trust a system to perform a task. Many explanations focus on how the system works: what it can and cannot do [
69,
This is done in many different ways, such as highlighting essential features of a decision [
111], contrasting what would have happened if something was different [
91], or communicating how confident the system is about its answer [
121].
Typically, explanations focus on giving information about a system’s
ability in order to improve appropriate trust. However, the literature on how humans trust typically sees trust as more than a belief about ability. Therefore, it is helpful to expand our perspective on explanations as well. A useful starting point for understanding human trust is the
ABI (Ability, Benevolence, and Integrity) model from the organizational context by Mayer et al. [
73]. This model has been used extensively in modeling trust, such as by Lee and See [
64], Hoffman et al. [
39], and Wagner et al. [
108]. It defines human trust as “A trusts B if A believes B will act in A’s best interest and accept vulnerability to B’s action” [
73]. Moreover, it distinguishes three trustee characteristics that influence a trustor’s trust: belief in ability, benevolence, and integrity.
Ability indicates the skills and competencies to do something. Benevolence is about a willingness to do good to a specific trustor. Integrity is defined as the trustor’s perception that the trustee adheres to acceptable principles [
73]. One of the extensively studied factors in trust research is the ability of the system [
12,
17,
29,
45,
75,
104]. However, fewer studies have investigated the integrity and benevolence dimensions of trust [
123]. Benevolence is a specific attachment and emotional connection between the trustor and trustee, which builds over time [
73]. Human-agent interactions are often short-term, and the extent to which we form emotional connections with agents remains unclear. Therefore, more work on long-term social connections between humans and AI might be necessary before we can fully understand the role of benevolence in XAI and human-AI trust relationships.
Prior studies on integrity have linked it to conventional standards of morality—especially those of honesty and fairness [
46,
74]. XAI can be regarded as a way to enhance system integrity, i.e., the system being honest about how it makes decisions is a form of integrity. No matter the exact definition, it is clear that integrity is a concept that can play a role even in short-term interactions. Moreover, we follow Huberts in claiming that integrity is an essential concept for human-AI interaction [
46]. By applying Olaf’s principle,
1 integrity is a necessary and mandatory requirement of being true to oneself and to others [
74]. This aligns with the notion that, as AI is increasingly used to make autonomous decisions over time, the principles that underlie these decisions are highly relevant [
1]. Furthermore, lack of integrity could cause issues of bias and deception that have already started to impact humankind [
62].
Therefore, the question arises what effect explicitly referencing integrity-related principles in XAI would have on a user’s appropriate trust in the system. In human-human interactions, principles associated with integrity, such as accountability, transparency, and honesty, have been suggested as important for appropriate trust [
61]. Could XAI explicitly reference these principles in explanations, and how would this affect (the appropriateness of) trust in the system? More specifically, we consider three principles related to integrity to express through explanations:
(1)
Honesty about the system’s capabilities and confidence.
(2)
Transparency about the process of decision-making.
(3)
Fairness in terms of sharing which risks, such as biases, exist.
Honesty, transparency, and fairness appear in various studies as common elements of integrity in HCI, HRI, or human-AI interaction literature [
9,
25,
50,
57,
58] (see Section
3). Therefore, in this study, we propose to incorporate references to these principles of integrity in explanations and posit the following research questions:
RQ1:
How does the expression of different principles of integrity through explanations affect the appropriateness of a human’s trust in the AI agent?
RQ2:
How does human trust in the AI agent change given these different expressions of integrity principles?
RQ3:
How do these different expressions of integrity principles influence the human’s decision-making, and do people feel these explanations are useful in making a decision?
We conducted a user study with 160 participants in which they were asked to estimate the calories of different food dishes from an image of the food, with the help of an AI agent. In our user study, the first research question focuses on how different expressions of principles related to integrity (hereafter referred to as “conditions”) in explanations can affect appropriate trust in human-AI interaction.
In this article, we study RQ1 in the context of an exclusive choice: the decision to let either oneself or the agent complete the calorie estimation task. Moreover, to allow us to study this question, we formally define what it means for trust to be appropriate in this context. RQ2 aims at understanding how human trust in the AI agent changes over time under different expressions of integrity. Finally, RQ3 helps in understanding the effect of expressions of integrity on human decision-making and the effectiveness of explanations. Additionally, we were interested in exploring possible effects of covariates such as the propensity to trust.
Contributions: Specifically, our research contributes the following:
1: We present a measurable construct for appropriate trust in the context of a specific task by providing a formal definition.
2: We illustrate an approach for expressing the integrity of AI systems through explanations focusing on honesty, transparency, and fairness.
3: By conducting a user study with 160 participants aligned with our research questions, we show how explanations can help in building appropriate human trust in the AI system.
We believe our research holds significance for two main reasons. First, before we can investigate methods to establish suitable trust, it is crucial to have a clear understanding of its meaning. Second, the potential for conveying integrity-related principles through explanations remains largely unexplored. Through our contributions, we aim to broaden our comprehension of fostering appropriate trust between humans and AI, which is vital for effective human-AI interaction [
79].
4 Method
4.1 Participants
One hundred eighty-two participants (89 female, 93 male) were recruited for the study via the online crowdsourcing platform Prolific (mean age = 24.8 years, SD = 4.4 years) and a university student mailing list (mean age = 22.1 years, SD = 2.3 years). We recruited through two different channels because turnout from the mailing list was lower due to the long study completion time. There were no differences between the two samples in the responses we received.
A total of 121 participants took part through the crowdsourcing platform and 61 through the university mailing list. We chose the Prolific platform because it is an effective and reliable choice for running relatively complex and time-consuming interactive information retrieval studies [
99]. Participants were selected based on the following criteria: age 18 years or older; a fluent level of English, to ensure that participants could understand the instructions; and no eating disorder, to ensure minimal risk to participants from viewing different food items.
Thirty-five percent of the participants reported having studied computer science or a related field. Our participants came from 30 different countries, with most participants reportedly born in the United Kingdom (35), Germany (26), the USA (20), and India (20). Participants were informed about the nature of the task and the expected completion time of around 35 minutes. Those who accepted our task received brief instructions about the task and were asked to sign an informed consent form before beginning their task session.
The study was approved by the Human Research Ethics Review Board of Delft University of Technology (IRB #2021-1779). Prolific participants received an honorarium of £5.43/hr for their participation. All participants were given the option to enter a raffle for five 15 Euro Amazon gift vouchers.
4.2 Task Design
We aimed to establish human-in-the-loop collaboration in our experiment, i.e., a human making a decision with the assistance of an AI assistant. In our experiment, participants were asked to estimate the calories of different food dishes based on an image of the food. We designed this task around calories as an approachable domain for our participants. The food dishes in our experiment were specialized dishes from different countries around the globe. Participants can rarely judge all the food dishes well, although they are often good at judging their own cuisine. Therefore, we told participants that an AI assistant was available to help them identify the correct amount of calories.
During the authors’ brainstorming sessions, we decided to use the Food-pics database [
11] for selecting our dishes. We selected this database because it contains the dishes most popular among European and North-American populations, drawn from across the globe, along with detailed metadata for each dish. Fifteen randomly selected food dishes (referred to as “rounds” hereafter) were taken from this database for the main experiment. Each round consisted of five steps.
Steps of the task: At the first step, participants were shown an image of a food dish. They were asked to rate their confidence in correctly estimating the calories of the food dish. Specifically, we asked our participants, on a scale of 1–10, with 1 being “Not at all confident” and 10 being “Fully confident,” “How accurately can you estimate the calories of this food image?” (Q1). A zoom-in option was also provided so that participants could have a closer look at the different ingredients in the food image. Subsequently, they were asked to select the one of four options they believed to be closest to the correct amount of calories in the dish. One of the four options was always the correct answer, and the first step only involved guessing the correct answer.
At the
second step, the AI assistant made its own guess from the same options as in step one. The AI assistant provided the dish name and a list of ingredients that it believed to be part of the dish, along with confidence scores (
for details, refer to Figure 3) in real time. The AI assistant also explained the reasoning behind its answer. Additionally, at this step, participants were asked (Q2) to tick a checkbox if they believed that the AI assistant could estimate the calories better than themselves. At the
third step, participants selected their final decision by choosing between themselves or the AI assistant (Q3). At the
fourth step, participants rated their comfort level in making the decision (Q4) and usefulness of explanations (Q5). Finally, at the
fifth step, the correct answer was shown to the participants and participants were asked to adjust their trust level in the AI assistant. An overview of the above steps is visualized in Figure
1.
Scoring method: Each correct answer yielded +10 points, and an incorrect answer cost -10 points. We specifically applied the -10-point penalty for a wrong answer to introduce the risk factor associated with trust. Additionally, participants were informed that if they ended up among the top three scorers on the leaderboard, they would qualify to receive a 15 Euro gift voucher. The idea behind the leaderboard was to turn a single-player experience into a social competition and provide participants with a clear goal. Participants were only informed about the top scores on the leaderboard and their own rank once they finished the task. We did this to ensure that participants made informed selections until the end of the task to qualify for the prize. Based on our exit interviews, participants were careful with their selections, as they wanted to maximize their chance of winning the award.
4.3 Measures
We used two types of measures. First, subjective measures where users directly report their opinion (referred to as “subjective measurement” hereafter) (e.g., References [
21,
36,
119]). Second, behavioral measures (e.g., reliance [
14,
27] and trustworthiness, e.g., References [
32,
34,
50]). We used the wording “AI assistant” instead of “AI agent” for participants’ ease of understanding.
Subjective measures: Guided by the trust definition in the human communication research domain [
114], we measured participants’ trust, inspired by Yang et al. [
118] as four different measures: (1) cognitive trust to understand human estimation of AI agent capabilities [
52], (2) participant’s comfort level in making a decision [
118], (3) usefulness of the AI assistant explanation [
118], and (4) a global trust meter that captures changes in trust [
55].
First, human cognitive trust in following the AI assistant’s recommendation was measured via Q2: “Select this [check] box if you think that the AI agent can better estimate the calories than yourself.” We informed our participants that by selecting the checkbox they indicated a belief that the AI agent was better at the task than themselves.
Second, human comfort was measured by the question Q4: “How do you feel about your decision?” This question measured participants’ comfort in making a decision and was rated on a 10-point Likert scale from
Not at all comfortable (1) to Very comfortable (10) with a step size of 0.2, i.e., possible values were 1.0, 1.2, 1.4...9.8, 10.0. We included this question in our user study for two reasons: (1) based on recent work by Yang et al. [
118] indicating the importance of human comfort in decision-making and (2) based on our pilot study where participants often used the word “comfortable” to describe their decision, which also matches with prior work by Wangberg and Muchinsky [
109].
Third, the helpfulness of the AI assistant’s explanation was measured by the question Q5: “Was the explanation by the AI assistant helpful in making the decision?” This item was rated on a 10-point Likert scale from Not at all helpful (1) to Very helpful (10) with a step size of 0.2.
Finally, a linear “Trust Meter” ranged from complete distrust (0) to complete trust (+100), inspired by Khasawneh et al. [
55]. Participants were asked to adjust the trust meter after every round if their trust in the AI assistant changed. The trust meter was always visible to participants and carried over the previous round’s value into every new round. For the first round, the default value of the trust meter was set at 50.
Behavioral measures: For trustworthiness and reliance on the system, we looked at what the participant and the AI agent actually did. First, our trustworthiness (TW) measurement captured who was better at the task: the participant, the AI agent, or both. It was measured by how far the selected option was from the correct answer. No two of the four options were equally distant from each other. For example, if the available options are 25, 66, 97, and 143, of which 97 is the correct answer, and the human selects 66 while the AI agent selects 143, then the human’s TW is higher than the AI’s.
Second, participants were asked to “Select your final decision by selecting among the two options—yourself or the AI assistant’s guess” (Q3). With Q3, we measured reliance (distinct from trust, as we discussed in the introduction) by analyzing the behavior of the participants. If they followed the AI assistant’s advice and selected it, they were considered to rely on it. If they chose an answer other than the advised one, they did not. In case the two answers were the same, participants were still asked to decide based on the reasoning about the calories of the dish, the classification of ingredients, and the confidence levels. Their choice determined their reliance behavior towards the AI agent.
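For concreteness, the following is a minimal R sketch of this per-round coding; the helper and argument names are ours for illustration (they are not part of the study materials), and the example values are taken from the text above.

```r
# Illustrative sketch (hypothetical helper names): per-round coding of
# trustworthiness (TW) and reliance, following the description above.
tw_comparison <- function(correct, human_choice, ai_choice) {
  # TW is based on the absolute distance of the selected option from the
  # correct answer; a smaller distance means higher trustworthiness.
  d_human <- abs(human_choice - correct)
  d_ai    <- abs(ai_choice - correct)
  if (d_human < d_ai) {
    "human"
  } else if (d_ai < d_human) {
    "ai"
  } else {
    "both"   # equally trustworthy in this round
  }
}

relied_on_ai <- function(final_decision) {
  # Q3: reliance = TRUE if the participant selected the AI assistant's guess.
  final_decision == "ai"
}

# Example from the text: options 25/66/97/143, correct answer 97,
# human selected 66, AI agent selected 143 -> the human's TW is higher.
tw_comparison(correct = 97, human_choice = 66, ai_choice = 143)  # "human"
relied_on_ai("self")                                             # FALSE
```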
It is important to note that although trust and reliance are related concepts, they should be measured as independent concepts. In this work, we follow this distinction as articulated by Tolmeijer et al. [
98], where trust is the belief that “an agent will help achieve an individual’s goal in a situation characterized by uncertainty and vulnerability” [
64, p. 51], while reliance is “a discrete process of engaging or disengaging” [
64, p. 50] with the AI agent.
4.4 Experimental Setup
The study had a mixed between- and within-subject design. The within-subject factor was the repeated subjective ratings across rounds, and the between-subject factor was the integrity condition. This design choice was inspired by Hussein et al. [
47]. Participants were randomly assigned to one of four different experimental conditions (“Baseline,” “Honesty,” “Transparency,” and “Fairness”). Each condition had an equal number of participants. We did not manipulate other factors such as time [
81] and workload [
22], but we controlled reliability [
65] and risk factors [
77]. The advantage of this experimental setup, as stated by Miller [
82], is that we can perform detailed analysis on the relationship
Trustworthiness \(\rightarrow\) Perceived Trust, which in turn helps in understanding appropriate trust.
We utilized Clarifai Predict API with the “Food” model to recognize food items in images down to the ingredient level.
2 Our visual classifier returned a list of concepts (such as specific food items and visible ingredients) with corresponding probability scores indicating the likelihood that these concepts are contained within the image. The accuracy of the pre-trained classifier in our task was about 73% (11 of 15 rounds correct), roughly matching the classifier’s actual average accuracy of 72%. The list of ingredients along with their confidence scores was presented in the form of a table, as shown in Figure
3.
Sequence of trials: Each participant finished all 15 rounds, including a trial round. The number of rounds was decided to (1) compare with other experiments that studied trust (e.g., References [
99,
118]), (2) have enough trials to develop trust but prevent participants from memorizing the order (serial position effects [
44]), and (3) have sufficient data for all the integrity conditions.
In each condition, participants finished a sequence of trials. All the sequences had an identical order of correct/incorrect recommendations by the AI assistant. This identical order allowed us to compare the different conditions. We also ensured that the AI agent’s response in the trial round was always correct, to protect trust at an early stage and to avoid skewing it strongly towards distrust [
72]. Food dishes in the sequence were randomized, and the instances used for training and practice were excluded in the main trials. On completion, participants were asked to fill in a post-experiment questionnaire targeted towards (a) their overall experience, (b) possible reasons for their changes in trust meter, and (c) their decision to select themselves or the AI assistant.
Pilot Study and Pre-test of Explanations: We used a think-aloud protocol with three participants for a pilot study. The aim of the pilot study was to test the experiment design and check the explanation manipulations. In the pilot, participants were comfortable estimating the calories of food dishes based on their familiarity with the cuisine and often chose the AI agent when they were not confident. For example, a participant who identified himself as American often relied on the AI agent to guess a food dish from Myanmar. Similarly, another participant who identified herself as Asian often relied on the AI agent for a Mexican food dish. Based on these observations and UI layout feedback from the participants, we fine-tuned the questions and instructions. After the pilot was finished, we performed a manipulation check of the explanations: we asked the pilot participants to describe the principle of integrity they saw in the experiment using the note cards that we had used earlier with the explanation creators. All participants correctly identified the integrity principles from the note cards. This result confirmed our explanation manipulation and allowed us to start the main experiment. We excluded these three participants from the main experiment.
4.5 Procedure
After participants provided informed consent, they saw an overview of the experiment. As shown in Figure
2, participants were first asked to complete a pre-task questionnaire consisting of (i) demographic questions about their age and gender as well as (ii) the propensity to trust scale [
35] (Q6) and a balanced diet eating question (Q7) on a 10-point Likert scale from
“I don’t care of what I eat” to “I care a lot of what I eat.” At the beginning of the experiment, we told participants that they would work with an AI assistant and hinted that it could be wrong in its recommendations. They then took part in a trial session, read the instructions, saw an example of a food dish, and practiced using the trust meter. Participants then proceeded to the main session. For each
step, as explained in Section
4.3, participants first saw an introduction of what they could expect to see. In addition, they were asked to focus on the table generated by the AI assistant for specific food items and visible ingredients with corresponding probability scores. The screenshots of each step are in Figure
3.
5 Results
One hundred eighty-two participants took part in the user study, of which 19 (18 from Prolific and one from the university mailing list) did not pass our attention checks, leaving us with 163 participants. Furthermore, one participant always selected the AI agent, and two always selected themselves, with a total experiment time of only eight minutes, indicating potentially invalid data. Hence, we removed the data of those three participants. Thus, the results and analysis include the remaining 160 participants (female = 85, male = 75; mean age = 23.6 years, SD = 2.8 years). A power analysis of the mixed ANOVA with the G*Power tool [
30] revealed that with 40 participants per group, we have a power of 0.93 (considering a medium effect size of
f = 0.25,
\(\alpha _{new}\) < .046).
5.1 Effect of Different Principles of Integrity on Appropriate Trust
In this subsection, we analyzed how the expression of different principles of integrity through explanations affects the appropriateness of a human’s trust in the AI agent (
RQ1). For this analysis, we first computed descriptive statistics and then performed inferential statistics on the collected data to study the effect of the explanations. The post-experiment questionnaire responses were analyzed to support the results and are reported in Section
6.1.
The trust categories were assigned based on Table
1. Following the equations in the table, Higher TW was derived from the TW measurement (as described in Section
4.3). The value for Human trusts who? was based on the participant’s response to Q2, and the value for Human selection was based on Q3. Entering these values into Table
1 yielded our five different trust categories, as described in Section
2.2.
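Table 1 itself is not reproduced in this section. Purely as an illustration, one plausible coding along the lines of Section 2.2 is sketched below in R; the function and argument names are ours, and the authoritative mapping of inputs to categories is the one given in Table 1.

```r
# Hypothetical illustration only: one plausible mapping from the three table
# inputs (Higher TW, Human trusts who?, Human selection) to the five trust
# categories of Section 2.2. The authoritative mapping is defined in Table 1.
trust_category <- function(higher_tw,          # "human", "ai", or "both" (TW measure)
                           trusts_ai,          # Q2: participant believes the AI is better
                           selected_ai,        # Q3: participant selected the AI's answer
                           selection_better) { # was the selected answer the closer one?
  if (trusts_ai == selected_ai) {
    # Trust belief and selection are consistent.
    if (higher_tw == "both")                return("appropriate trust")
    if (trusts_ai  && higher_tw == "ai")    return("appropriate trust")
    if (!trusts_ai && higher_tw == "human") return("appropriate trust")
    if (trusts_ai)                          return("over-trust")  # AI trusted, human was better
    return("under-trust")                                         # self trusted, AI was better
  }
  # Trust belief and selection are inconsistent; the outcome decides the label.
  if (selection_better) "inconsistency (good outcome)" else "inconsistency (bad outcome)"
}
```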
Frequency Distribution: Table
3 shows the frequency distribution of different trust categories as observed for the explanations expressing different principles of integrity. For example, consider a participant who viewed explanations expressing honesty about uncertainty and who fell into the appropriate trust category seven times, inconsistency (good and bad outcome) two times each, under-trust three times, and over-trust once. Then, for the honesty condition in Table
3, we report appropriate trust as 0.46, inconsistency (good and bad outcome) as 0.13 each, under-trust as 0.20, and over-trust as 0.06 on a scale of 0–1. Each condition consists of data from 40 participants collected over 15 rounds, i.e., 600 data points per condition.
Effect of Integrity Expressions: We found a statistically significant effect of the integrity principles expressed through explanations on the trust categories. A chi-square test of independence
\({\chi }^2\)(12,
N = 40) = 55.11, p < .001,
\(\varphi _c\) = 0.30 showed that there is a significant relationship between trust categories and experimental conditions. We further analyzed our contingency table (
Table 3) as a mosaic plot [
40] to investigate relationships between different trust categories and conditions. While constructing the mosaic plot, we extracted Pearson residuals from the output of the
\({\chi }^2\) results.
We visualized the contribution of the Pearson residuals to the total chi-square score using a correlation plot (
for details, refer to Appendix A, Figure 6) as an exploratory analysis. In this plot, a Pearson residual of
3.45 was found between the “Fairness about risk” explanation and the appropriate trust category. Following Hong and Oh [
43], this value implies a strong association between the “Fairness about risk” explanation and the appropriate trust category.
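As a minimal sketch of this analysis pipeline, assuming a hypothetical long-format data frame df with one row per participant-round and columns condition and trust_category (these names are ours, not the study’s), the test, effect size, and residual plots could be produced as follows.

```r
# Hypothetical sketch of the RQ1 association analysis; `df` holds one row per
# participant-round with the columns `condition` and `trust_category`.
library(vcd)       # assocstats() for Cramer's V, mosaic() for the mosaic plot
library(corrplot)  # corrplot() to visualize the Pearson residuals

tab  <- table(df$condition, df$trust_category)  # 4 x 5 contingency table (cf. Table 3)
test <- chisq.test(tab)                         # chi-square test of independence, df = 12
assocstats(tab)$cramer                          # Cramer's V (reported above as 0.30)

# Pearson residuals, (observed - expected) / sqrt(expected): large positive
# values mark cells that occur more often than expected under independence.
round(test$residuals, 2)
mosaic(tab, shade = TRUE)                       # mosaic plot shaded by residuals
corrplot(test$residuals, is.corr = FALSE)       # residual plot (cf. Appendix A, Figure 6)
```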
We were also interested in understanding how different trust categories build up or remain relatively stable over time and how they are affected by wrong answers. Figure
4 illustrates the frequency distribution of appropriate trust across the 15 rounds. The figure shows that appropriate trust drops with the first wrong answer across the four conditions. However, this effect does not persist in later rounds. It is interesting to note that appropriate trust builds up over time (rounds 1 to 4) and recovers slowly after each wrong answer. We also provide a graph similar to Figure
4 for the other trust categories in the supplementary material.
Predictors for Trust Categories: The trust categories were binary variables in our study: either the participant achieved appropriate trust in a round or not. For this reason, we also conducted a multilevel logistic regression per category, predicting each of the five trust categories separately. In our model, each round was treated as one observation, i.e., each row was one observation, yielding 15 rows per participant.
Baseline Model: We first created a baseline model, which comprised a random intercept per participant and the different explanation conditions. Next, we added the “Wrong answers by the AI agent” as an additional fixed-effects factor to our baseline model. Our dependent measure indicated whether the behavior in a round was appropriate trust behavior or not (and similarly for the other trust categories). Furthermore, we added a lag factor as a fixed effect to observe the effect of the previous round’s answer on the trust outcome of the current round. The lag factor was coded as 1 if the previous trial was correct and 0 if not.
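A minimal sketch of this model in R with lme4 is given below, assuming the same hypothetical long-format data frame df with a binary outcome appropriate (1 if the round was coded as appropriate trust), plus condition, ai_wrong (1 if the AI answered incorrectly in that round), lag_correct (1 if the previous round was answered correctly), and participant_id; all column names are ours for illustration.

```r
# Hypothetical sketch of the per-category multilevel logistic regression.
library(lme4)

# Baseline model: random intercept per participant plus the explanation condition.
m0 <- glmer(appropriate ~ condition + (1 | participant_id),
            data = df, family = binomial)

# Add the AI's wrong answers and the lag factor as additional fixed effects.
m1 <- glmer(appropriate ~ condition + ai_wrong + lag_correct + (1 | participant_id),
            data = df, family = binomial)

anova(m0, m1)  # chi-square-based model comparison (as used for the covariate checks)
summary(m1)    # fixed-effect coefficients, e.g., the Fairness contrast
```

The same formula can be refit with each of the other four trust categories as the binary outcome.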
Baseline Model plus Covariates: We added three covariates, the “Care about eating” responses, the “Propensity to trust” responses, and human confidence in estimating the calories (Q1), to our baseline model one by one. Since the
\({\chi }^2\)-based ANOVA comparison showed no significant improvement in the goodness-of-fit of the model upon adding the covariates and none of the covariates were significant predictors of any trust category, we decided not to include them in the models, see Table
4. For comparing the models for goodness-of-fit, AIC and BIC values are provided in the Appendix
B, Table
11. We also report the marginal and conditional R-squared values, which indicate the variance explained by the fixed effects alone and by the fixed and random effects combined, respectively; see Table
8.
Appropriate Trust: For the appropriate trust category, the “Fairness about risk” explanation was the only statistically significant predictor. The coefficient value of “Fairness” (\(\beta\) = 0.591, p \(\lt\) .001) is positive. Thus, we can say that when a participant interacted with an AI agent whose explanations focused on fairness by exposing risk and bias, the participant was more likely to achieve an appropriate level of trust in the AI agent.
Inconsistency: For the inconsistency with a bad outcome trust category, we did not find any statistically significant predictor variable in our analysis. However, for the inconsistency with a good outcome trust category, the “Fairness about bias” explanation was again the only statistically significant predictor variable. The coefficient value of “Fairness about bias” (\(\beta\) = −0.526, p \(\lt\) .001) is negative. Thus, we can say that when participants interacted with an AI agent whose explanations focused on being fair by exposing bias and risk, the participants were less likely to end up in the inconsistency with a good outcome trust category.
Under-trust and Over-trust: For both the under-trust and over-trust categories, we did not find any statistically significant predictor variable in our analysis.
5.2 Effect of Different Principles of Integrity on Subjective Trust
In this subsection, we analyzed how human trust in the AI agent changes given the different expressions of integrity principles (
RQ2). For this analysis, we followed a similar approach as for RQ1: first descriptive statistics, followed by inferential statistics focused on a multilevel regression model. Here, too, post-experiment questionnaire responses were analyzed to support the results and are reported in Section
6.2.
Change in Trust Level Over Time: We used a global trust meter to capture changes in trust over time. First, we calculated changes in human trust towards the AI agent over time by taking the difference in trust meter values between every two subsequent rounds. As can be seen in Figure
5, trust in the AI agent dropped whenever the AI agent provided a wrong answer. We recorded an average drop of 15 points in trust score when a wrong answer was preceded by a right answer by the AI agent. The drop was more than twice as large, around 35 points, when there were two wrong answers in a row. These results seem to confirm that the AI agent’s accuracy influences trust.
Predictors of Subjective Trust Scores: Our raw dataset contained one row per participant and one column for each variable or measurement on that participant. For longitudinal analysis, each measurement in time needs a separate row of its own; therefore, we restructured the data into long format and analyzed it using a multilevel regression model following the instructions by Finch et al. [
33, Chapter 5].
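As a small illustration of this restructuring, assuming a hypothetical wide data frame df_wide with one trust-meter column per round named trust_r1 to trust_r15 (our naming, not the study’s), the wide-to-long conversion could be done with tidyr as follows.

```r
# Hypothetical sketch: reshape one trust-meter column per round (wide format)
# into one row per participant-round (long format) for multilevel modeling.
library(tidyr)
library(dplyr)

df_long <- df_wide %>%
  pivot_longer(cols = starts_with("trust_r"),  # trust_r1 ... trust_r15
               names_to = "round",
               names_prefix = "trust_r",
               values_to = "trust_score") %>%
  mutate(round = as.integer(round))
```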
Baseline Model: We analyzed the global trust meter responses as our dependent variable to test the effect of the different principles of integrity expressed through the explanations, using a multilevel regression model with a random intercept for trials. In addition, we added the current round’s correctness and the lag as additional factors in our baseline model to test their effect on the subjective trust scores. Since the outcome is continuous, we fitted a linear mixed-effects model (LMER) using the lmerTest package v3.1 [
60].
Baseline Model with a lag factor plus Covariates: We then added a fixed interaction effect between the correct/incorrect answer and the lag variable to the model. Furthermore, we also examined the two-way interaction effect between the correct/incorrect answer and the different explanation types. This model was significantly better than the other two models in goodness-of-fit,
\(Pr(\gt chisq)\lt 0.05\) (refer to Appendix
B, Table
9 and
10, for further details). Hence, we finalized this model as reported in Table
5.
Following the same procedure as for RQ1, we further explored adding the same covariates to our model. Adding these covariates did not improve our model; therefore, we did not include them in the final model. Finally, we added the human comfort and usefulness of explanations ratings to the model and found that only the usefulness of explanations improved the model.
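A hedged sketch of this final model is shown below, using our own hypothetical column names (trust_score for the trust meter value, correct for the current round’s correctness, lag_correct for the previous round’s correctness, usefulness for the Q5 rating) and a random intercept per participant as an illustration; the exact random-effects structure is the one specified above.

```r
# Hypothetical sketch of the baseline and final RQ2 models (cf. Table 5).
library(lmerTest)  # lmer() with p-values for the fixed effects

m_base  <- lmer(trust_score ~ condition + correct + lag_correct
                            + (1 | participant_id),
                data = df_long)

m_final <- lmer(trust_score ~ condition * correct    # correctness x explanation type
                            + correct * lag_correct  # correctness x previous-round lag
                            + usefulness             # helpfulness of explanations (Q5)
                            + (1 | participant_id),
                data = df_long)

anova(m_base, m_final)  # goodness-of-fit comparison (cf. Appendix B, Tables 9 and 10)
summary(m_final)        # e.g., the Honesty contrast and the interaction terms
```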
Based on the regression results, we can observe that the honesty explanation is a significant predictor of the trust score compared to other explanations expressing integrity (
\(\beta\) = 7.84,
p \(\lt\) .05), i.e., participants who saw the honesty explanation rated their subjective trust in the AI agent higher than the other conditions. Furthermore, as shown in Table
5, both the correct/incorrect answer and the lag variable are statistically significant predictors of the subjective trust ratings (
p \(\lt\) .05). This result confirms our intuition observed from Figure
5, where the effect of the correct/incorrect answer on the trust scores can be observed. Interestingly, the significance of the lag variable shows the effect of the previous round’s correctness on the current round’s trust score. In other words, just as it is important to study the effect of the current round’s correctness on the trust score, it is equally important to consider how the AI agent performed in the round before to understand changes in the trust score.
Additionally, our results show that the interaction effect between the correct/incorrect answer and the lag variable is significant (
\(\beta\) = −3.38,
p \(\lt\) .05). Given that the sign of the interaction coefficient is negative, we conclude that there is a buffering or inhibitory effect. Analyzing the correct/incorrect answer, the lag, and their interaction reveals the drop and restoration of the global trust ratings. For instance, two consecutive correct trials yield a combined contribution of 21.44, while a correct trial followed by an incorrect one drops to 7.77. Similarly, an incorrect trial followed by a correct one recovers to 17.05, almost reaching 21.44 again. Two consecutive incorrect trials cause a complete drop to 0, followed by a gradual recovery to 7.77, 17.05, and 21.44. These findings align with the results in Figure
5.
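To make the arithmetic behind these numbers explicit (our reading, under the assumption that 17.05 and 7.77 are the fixed-effect contributions of the current round’s correctness and of the lag, respectively), the combined value for two consecutive correct trials follows as:
\[
\underbrace{17.05}_{\text{correct}} + \underbrace{7.77}_{\text{lag}} + \underbrace{(-3.38)}_{\text{correct}\times\text{lag}} = 21.44 .
\]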
Moreover, there is a significant interaction effect between the correct/incorrect answer and the honesty explanation (
\(\beta\) = −4.63,
p \(\lt\) .05). This indicates that the impact of errors is smaller in the honesty condition, as depicted in Figure
5. Also, usefulness of explanations is a predictor of global trust ratings (
\(\beta\) = 9.45,
p \(\lt\) .0001). This result suggests that participants found the explanations helpful when adjusting their trust levels after each round.
5.3 Effect of Different Principles of Integrity on Human’s Decision-making and Usefulness of Explanations
In this subsection, we analyzed how the different expressions of integrity principles influence the human’s decision-making and whether people feel these explanations are useful in making a decision (RQ3). For this analysis, we followed a similar approach as for RQ2.
Descriptive statistics: We used the human comfort ratings (Q4) and the usefulness of explanations ratings (Q5) to analyze our responses for RQ3. These ratings were measured after each trial; therefore, we followed the same analysis method as for RQ2. For the human comfort ratings, we did not find any major differences among the four conditions; refer to Figure
7, Appendix A. The mean rating for the baseline condition was 6.178 (1.981), for honesty 6.285 (1.863), for transparency 6.246 (1.811), and for fairness 6.128 (1.948). Similarly, for the helpfulness of explanations ratings, we also did not find any major differences among the four conditions; refer to Figure
8, Appendix A. The mean rating for the baseline condition was 6.333 (2.053), for honesty 6.675 (1.764), for transparency 6.486 (1.845), and for fairness 6.423 (1.831).
Predictors of Comfort and Explanations Helpfulness: We analyzed the human comfort ratings and the usefulness of explanations responses as our dependent variables to test the effect of the different principles expressed through the explanations, using multilevel regression models with a random intercept for participants. We followed a similar model as for RQ2 in analyzing the results for this RQ. Adding the covariates from RQ1 did not improve either model (human comfort and explanation helpfulness). Also, adding the interactions as in Table
5 was not helpful in improving the model statistics. Therefore, we did not include them in our final models. We report the regression model of predicting the usefulness of explanations in Table
6 and human comfort in Table
7, Appendix
B.
Based on Table
6, we can observe that the trust score is a significant predictor of the usefulness of the explanations (
\(\beta\) = 0.02, p < .001), i.e., participants who rated their subjective trust in the AI agent higher may have found the explanations provided by it more helpful. Similarly, we found that human comfort in decision-making is another significant predictor of the usefulness of the explanations score (
\(\beta\) = 0.34, p < .001). None of the other covariates were found to be significant predictors of the human comfort score except the helpfulness of explanations (
\(\beta\) = 0.35, p < .001).
6 Discussion
Our results offer three major contributions for discussion in the field of human-AI interaction:
(1)
We can measure appropriate trust through a formal computation method in the context of a specific task.
(2)
Appropriate trust can be enhanced by providing expressions of the fairness principle of integrity in the context of human-AI interaction. Furthermore, appropriate trust builds up over time and recovers slowly if an AI agent provides an incorrect output.
(3)
Subjective trust builds up and recovers better when expressions of honesty are provided in human-AI interaction.
In the remainder of this section, we discuss how the explanations expressing different integrity principles influenced appropriate trust. Next, we discuss how participants perceived the AI agent’s advice and made their decisions, drawing on theories from psychology and the social sciences that may explain why they selected the AI agent. Finally, we discuss the limitations of our work and possible future directions.
6.1 Expressions of Integrity and Appropriate Trust
We found that the “Fairness about bias” explanations were the most effective for fostering appropriate trust in the AI agent. We know from previous work by Asan et al. [
4] that knowing about biases can influence human trust, which perhaps also explains why trust becomes more appropriate if humans can intervene in AI decision-making.
A closer look at our findings shows that, in our case, the explanations highlighting potential bias and risks actually improved appropriate trust by increasing trust rather than decreasing it. This makes intuitive sense, as fairness explanations could have triggered more cognitive effort, leading people to engage more analytically with the explanation [
13]. Furthermore, recent education research has shown that students’ actual learning and performance were better with more cognitively demanding instructions [
24]. Overall, our findings seem to support the proposition that we should be building explainable and bias-aware AI systems to facilitate rapid trust calibration, leading to building appropriate trust in human-AI interaction [
101].
Interestingly, irrespective of which integrity principle was highlighted, explanations seem to have helped our participants in correcting under-trust and over-trust (see Figure
3). In particular, being explicit about potential biases and risks actually decreased inconsistent behavior with a good outcome compared to the other explanation types in some cases (including cases where trust was appropriate). A possible reason is that these explanations exposed potential bias(es) in the data or the model, which could have convinced the participants to follow the AI agent. For example, P62 reported that
“If the AI Assistant says dataset is biased, then [it’s] true I suppose and it’s more trustworthy than my common sense because I haven’t seen the data, so I will stick to my initial trust decision” (P62, Fairness about bias condition). Similarly, P133 reported
“I feel like the results of [the model] were strange hence I went with my decision first but I was wrong, so next time for a similar round I choose the [AI] Assistant and it was right. Hence, I decided to follow him [AI Agent]!” (P133, Fairness about bias condition). Another finding of our study was that irrespective of which principle of integrity was expressed in the explanation, around 30% of the time participants ended up in the inconsistency (good or bad outcome) trust category. This shows that even when participants reported that they trusted the AI agent to be better than themselves, they still quite often chose not to rely on it. Based on our exit interviews, we found that participants acted inconsistently several times during the experiment to increase their score in the hope of winning the gift voucher. For example, P20 told us “I think [AI Agent] it is better in identifying this dish, but it was also wrong with a similar dish in one of the previous rounds, so I will choose myself because I do not want to lose any points.” Similarly, P77 said “Ahh, I was just checking if I say I trust [AI Agent] him but do not go with him then what will happen. If it turns out to be good, I will do this again to keep my score up.”
We found that neither of the covariates “Care about eating” and “Propensity to trust” was a predictor of the subjective trust score or of any trust category. For “Care about eating,” a potential reason could be that people who rated themselves higher on caring about their eating were more aware of the calorie levels of ingredients known to them, and vice versa. Given that the food items in our experiment were diverse, this could have limited their ability to judge the calories well. For example, P97, with a score of 10 for the “Care about eating” question, reported that “I am very picky about what I eat as I need my balanced diet. However, this task is not easy as it has many international cuisines!” For “Propensity to trust,” one possible explanation is that this dispositional covariate became less important as experience with the system increased. Alternatively, this covariate could influence trusting behaviors more than trusting beliefs. More research is needed on the effect of propensity to trust over time.
6.2 Subjective Trust, Helpfulness, and Comfort
Subjective trust is not the same as appropriate trust [
118]. Chen et al. [
18] identified in their study that participants’ objective trust calibration (proper use and correct rejections) improved as the intelligent agent became more transparent. However, their subjective trust did not significantly increase. The “Fairness about bias” explanation in our work helped in fostering appropriate trust in the AI agent. However, it did not necessarily improve participants’ (subjective) trust. This result is in line with Buçinca et al. [
13], who showed that there exists a tradeoff between subjective trust and preference in a system of human+AI decision-making.
From Figure
5 and Table
5, it is evident that the subjective trust ratings for the “Honesty” explanations are significantly higher compared to the other explanation types. This observation can be explained by the explicit references to honesty by the AI agent as reported by P102,
“It [AI Agent] mostly talks about being honest and based on all rounds—I think it is, so I trust it” (P102, Honesty condition). We can recall that the AI agent in the “Honesty” condition expressed its honesty by stating it cared about honesty and adding further information about uncertainty in the decision-making. This expression of honesty resonates with Wilson [
115], who argued that as long as communication is performed in an honest way, it produces ecological integrity affecting trust.
We also found an effect of the current and the previous round’s correctness on the subjective trust ratings; refer to Table
5. This result echoes a prior study by Tolmeijer et al. [
99], who showed that system accuracy influences trust development in an AI system. Furthermore, the previous round’s correctness, i.e., the lag in Table
5, influenced the trust score as well. This result indicates that trust is not only influenced by how the system is performing now but also by how it performed before. Human trust develops over time and depends on many factors. Also, each interaction with a system can alter the trust in that system. For example, Holliday et al. [
41] looked at trust formation within one user session; they found that the impression of system reliability at each time point shapes trust. Our results align with van't Wout et al. [
105], who showed that the outcome of a previous round (whether trust was repaid or abused) affected how much a participant trusted another participant to send money.
Turning to the transparency explanations, based on the post-experiment questionnaire responses, participants found the visual part of the explanation difficult to follow. For example, “I can see there is best, good and unsure match but I have no idea it really helps as everything looks almost same!” (P140, Transparency condition). Additionally, we believe that the combination of visual and textual explanations may have hampered understandability, as reported by P17: “That’s simply too much of information for me!” (P17, Transparency condition).
Overall, trust scores exhibit a consistent level of stability, particularly an initial overall level of trust that remains steady over time, except in cases where an error occurs (Figure
5). This is in line with our intuition of how trust works. Interestingly, while an increase in trust between rounds three and four was expected, trust also recovers to the same levels between rounds six and seven and between rounds nine and ten. A potential explanation is that early impressions of the AI agent positively influenced its perceived reliability, leading to increased trust even after inaccurate advice.
The result in Table
6 demonstrates no effect of the type of explanation on participants’ ratings of the usefulness of explanations. However, we found that participants’ trust and human comfort scores significantly predicted the usefulness of explanations ratings. We can interpret this result as follows: if an explanation was helpful, participants often rated their trust and their comfort in the decision-making process higher than for non-helpful explanations.
We also found that participants’ decision-making comfort levels were similar across conditions. However, the explanation helpfulness score significantly predicted participants’ comfort level. A potential reason might be that other individual factors influence the subjective notion of comfort in decision-making more strongly than the differences between our explanations. Another possible explanation is that the different types of explanations by the AI agent did not necessarily improve the comfort level but only assisted in decision-making. A previous study focusing on integrity among colleagues reported that showing integrity did not increase the comfort level of employees to rely on each other [
113]. This result aligns with our findings, where it is hard to establish human comfort by expressing principles related to integrity.
6.3 Understanding Human Psychology for AI Assistant’s Advice Utilization
Advice utilization has been studied in the literature of psychology to understand how humans utilize the advice given by others [
70]. Jodlbauer and Jonas [
51] found that while three different dimensions of trust (competence, benevolence, and integrity) mediate between advisor and listener, trust in the advisor’s integrity played the strongest mediating role for the acceptance of advice in human-human interaction.
Given that all the AI agents in our user study had the same competence level, the only difference was which principle of integrity was highlighted in the explanation of the AI agent. This difference partly explains why integrity expressions of fairness, through exposing potential bias and risk, helped foster appropriate trust in our study. Furthermore, it partly explains how integrity expressions of honesty about uncertainty in decision-making helped raise users’ subjective trust in our study.
The theory by Bazerman and Moore [
8] can help us partly understand why explanations exposing potential bias and risk were significantly different from the other explanations used in this study. They showed that humans are limited in their rationality and are often subject to cognitive bias. Furthermore, when decisions involve risk and people cannot weigh all the relevant information, decision-makers often use unbiased advice [
8] that helps in reducing their own bias. Therefore, participants’ trust in the “fairness about risk” condition was more appropriate compared to the other conditions. For example, P73 reported,
“I was not sure about different type of vegetables in the salad but the AI told me correctly that it was also not sure, hence I decided not to trust it and went with my best possible option—which was eventually correct!”.
6.4 Reflections on Design Considerations for Building Appropriate Trust
In prior research, appropriate trust is often linked with relying (or not relying) on the AI system when it makes a correct (or incorrect) decision. This notion of appropriate trust relies heavily on the capability of the AI system, leaving out other factors that can influence trust, such as integrity or benevolence. Here, our work serves as an example of how expressing different principles related to integrity through explanations can establish appropriate trust in human-AI interaction. Therefore, an essential focus when designing AI for fostering appropriate trust should be on both the capability and the integrity of the AI system. However, this comes with the challenge of obtaining accurate measurement information regarding the machine learning models’ performance, bias, fairness, and inclusion considerations.
Lord Kelvin promoted measurement with his memorable statement: “If you cannot measure it, you cannot improve it” [
54]. There is much discussion about AI systems needing to be appropriately trusted. However, there are very few suggestions for measuring appropriate trust. Part of this lack of literature on measurement is because trust is subjective in nature: what seems appropriate trust for person A may not be appropriate for person B. Nevertheless, it is also crucial for humans to calibrate their trust, recognizing that AI systems can never be 100% trustworthy. Accordingly, we made an attempt to capture trust in various categories (appropriate, over-/under-trust, inconsistency) through formal definitions.
We believe that our proposed formal definitions can help facilitate communication between researchers, practitioners, and stakeholders by providing a common language and understanding of what is meant by measuring appropriate trust. Furthermore, they can set clear expectations for how trust should be measured and promote a better understanding of what trust means and of what aspects of trust should be considered [
10]. We hope this work highlights the need for guidelines to incorporate a method to capture appropriate trust and develop an understanding of human decision-making with psychological theories such as advice utilization.
6.5 Limitations and Future Work
Our work limits itself to exclusive decision-making, which does not represent the full spectrum of possible human-AI interaction. Our task was inspired by scenarios in which a human needs to make a conscious choice to either follow the system’s advice or their own judgment, such as with the autopilot mode or cruise control in a car. Therefore, our findings may not generalize to every scenario, such as human-AI teaming, where the focus is more on collaboration. Additionally, in our definition of appropriate trust, we did not further explore the reasons for the selections made by the human. An interesting direction for further study is how our notion of appropriate trust can be influenced by the delegation of responsibility, focusing on the different choices people make when delegating. For example, people are more likely to delegate a choice when it affects someone else, so as not to take the blame if something goes wrong [
95].
In our user study, we used images of various food items for estimating food calories with the help of a machine learning model. In day-to-day situations, people hardly use such technological aids. Therefore, the level of realism can be further improved in future studies. Furthermore, our participants completed 15 trials in the same condition, which could have led to possible learning or fatigue effects, even though we provided a break after seven rounds. Also, the order of the wrong AI advice was the same across the conditions, which made it hard for us to control for possible fatigue effects.
We have utilized situation vignettes to craft our explanations. In our work, custom-built explanations highlighting different principles related to integrity were better suited to our user study, i.e., they explicitly revealed the importance of individual notions of integrity (honesty, transparency, and fairness) in a calorie estimation task. In this, we attempted to keep other variables (e.g., length) mostly the same, but, for instance, it was inevitable that the baseline explanation would be shorter. The style was controlled for to some extent by having the same authors write all explanations, but here, too, differences might exist between conditions. For instance, the “fairness about risk” explanation might have been a little more technical, as it explained where in the process risks could come from (e.g., bias in training data). Although we cannot exclude such influences, we would argue that such slight differences will always be inevitable when expressing different principles in explanations. More research on, e.g., writing style and length would be relevant to further control for such factors [
106].
Finally, our explanations express the related principles of integrity in one specific way, and different methods of expressing these might have different effects on trust than what we found. However, with this work, we show a method for the AI agent to express its integrity in the form of explanations, and our aim for this research was not to design effective explanations but to study how different expressions of integrity can help in building appropriate trust.
A future research direction to scale this work could look at how we can create vignettes by systematically combining actions of the agent based on the affect control theory [
38] in real time. For example, one could adopt ensemble machine learning methods, as they are shown to perform well and generalize better for generating action-based situations [
26]. One could also look at the PsychSim framework [
89], which combines two established agent technologies, decision-theoretic planning and recursive modeling, for crafting explanations using machine learning models.
Furthermore, the understandability of explanations might be further enhanced by design specialists and tested by crowdsourcing with a diverse demographic sampling. Broader findings would further enable designers to craft explanations to make AI systems more understandable and trustworthy. Finally, further work can explore trusting behavior targeting both integrity and benevolence as antecedents of trust.