1 Introduction
Systems designed to provide data-driven, algorithmic advice as input to human decisions are increasingly prevalent [13, 16, 88]. Such systems have the potential to outperform decisions made by algorithms or humans alone [8]. They have been used for a wide variety of applications such as identifying health conditions and suggesting appropriate care [13, 88] or predicting the likelihood of a defendant skipping bail to aid judges’ decision-making [37, 46]. While many names are used to describe these systems, we will refer to them as data-driven decision aids.
To maximize the value that data-driven decision aids can bring to individuals and society, people must be willing to adopt them. Prior work has illustrated, however, that ensuring adoption is not easy [34, 35, 36, 63]. Humans quickly lose trust in data-driven advice when it has inaccuracies—whether stated [3, 47] or observed [21, 23, 75, 94]. This holds true even when empirical results show that humans perform worse without the advice, a phenomenon called algorithm aversion [21]. In addition to the accuracy concerns that may deter people from using algorithmic systems, users may forgo or stop using systems designed to help them due to privacy concerns about the quantity of personal information these systems may collect to power their algorithms [25, 28, 55, 64, 78, 84].
Thus, we must understand how humans react to inaccuracy and lack of privacy in data-driven decision aids to ensure that future technological advances are not stymied by low adoption.
Background: COVID-19 Apps. The spread of the 2019 novel coronavirus disease (COVID-19) has spurred the development of a new set of data-driven decision aids, including COVID-19 mobile contact-tracing apps [26, 85]. Contact-tracing is a longstanding epidemiological tool for identifying at-risk individuals in order to contain the spread of a disease [24]. Contact-tracing apps—a new type of data-driven decision aid not widely implemented [20] until the COVID-19 pandemic—are designed to support the scaling of manual contact-tracing efforts. These apps work by tracking the shared contacts of all system users and by allowing validated reports of infection (e.g., using a passcode from a medical provider test) to be quickly input into the system. Contact-tracing apps use these two pieces of information to notify individuals who have come into contact with someone who has tested positive for the virus and who has input that information into the system. These notifications serve as advice for potentially-exposed individuals to consider getting tested and to self-isolate. For the remainder of the paper, we refer to this type of system as a ‘COVID-19 app’.
There are two primary design patterns for COVID-19 apps: centralized approaches with a trusted party, e.g., France’s StopCOVID app, and decentralized approaches that need no trusted party, e.g., DP3T and Google-Apple Exposure Notification Framework-based apps [5, 15, 18, 22, 83]. Choosing between these designs can be seen as an attempt to weigh functionality, accuracy, and privacy. Centralized approaches can leverage location information to deliver hotspot information or provide governments with epidemiological insights [68], but must store user data centrally and/or track their users’ movements. Decentralized COVID-19 apps aim to preserve users’ privacy, but may not be able to deliver all the same features. It is important to note, however, that the advantages of each approach are not absolute; centralized apps may still fail to provide accurate or actionable information (e.g., when mobile coverage is weak) and decentralized apps may still disclose private information (e.g., who has downloaded the app via Bluetooth beacon exploits, retrospective tracking of individuals who have tested positive, or information leakage from implementation bugs) [10, 66, 68, 83, 86].
The benefit of any data-driven decision aid is limited by its adoption rate. This is especially true for systems like COVID-19 contact-tracing apps, whose effectiveness scales quadratically with adoption [15]. As such, it is imperative that we understand the factors that impact users’ willingness to start using these systems. There are many considerations that may influence people’s willingness to adopt COVID-19 apps [68]. For example, a person may weigh the app’s features, the app’s benefits to themselves and their community [79], who is offering the app [39], how well the app will preserve the user’s privacy [15, 54, 83], and the app’s accuracy.
In this work, we focus on how the latter two considerations, privacy and accuracy, as well as the primary public health benefit of the app (reduction in infection rate), influence Americans’ intent to adopt COVID-19 apps. To do so, we ran two surveys: the first sampled to be census-representative of the United States population and the second using Amazon Mechanical Turk.
What sets our examination apart from prior work? First, we believe that we are the first to build predictive models to estimate decision-aid adoption on the basis of both privacy and accuracy. We take a quantitative approach – validated in prior work measuring user acceptance of different levels of accuracy [47] – and conduct large-scale human-subject experiments with 4,615 respondents. We utilize this data to develop statistical models of the impact of accuracy and privacy on decision-aid adoption rates, specifically predicting how numeric increases in benefits, such as a reduction in the false negative (FN) rate, relate to increases in intention to adopt. Our quantitative models allow us to better estimate user response to potential app designs. For example, Saxena et al. [76] have shown that using Bluetooth as a method for detecting proximity—the technique used by the Apple-Google infrastructure underlying many contact-tracing apps [5]—has an approximate error rate of 7–15%. Our empirical models allow us to estimate user adoption rates for apps with such error rates.
Second, we study decision aids that are broadly relevant to all members of society, significantly increasing the external validity of this work. Many prior experiments on data-driven decision aids asked survey respondents to imagine themselves as decision makers or in unfamiliar circumstances. However, respondents often have little experience making the kinds of decisions being studied, e.g., setting bail in the criminal justice system, and therefore have difficulty imagining themselves in the required context. This effect, called ego-centrism [27], makes it difficult for respondents to extract themselves sufficiently from their own life experiences and biases to give maximally realistic responses in hypothetical scenarios. This, in turn, limits the external validity of prior work examining how users will react to a novel data-driven decision aid. Unlike the decision aids often studied in prior work, the consequences of COVID-19 are very real for everyone, meaning respondents need not escape their ego-centrism and can respond based on their own personal experiences. Thus, COVID-19 provides a unique opportunity to study people’s willingness to adopt data-driven decision aids with high external validity: in a tangible, high-risk situation that is relevant to the entire population.
In summary, our work investigates three research questions:
(RQ1). Do accuracy (false negatives and false positives) and/or privacy influence whether people are willing to install a COVID-19 app?
(RQ2). When considering installing a COVID-19 app, do people with different sociodemographic characteristics weigh accuracy and/or privacy considerations differently?
(RQ3). How does the amount of public health benefit, accuracy, and/or privacy offered by a COVID-19 app influence people’s reported willingness to adopt COVID-19 apps?
While these are interesting questions to ask across various geographical and cultural contexts, we focus on answering them in the context of the United States. In sum, we find a significant, predictive relationship between the amount of public health benefit, false negative and false positive rate, and privacy risk offered by a particular app and respondents’ reported willingness to install (RQ3). We find that both privacy and accuracy significantly influence installation intent (RQ1) and that respondents with different sociodemographics (ages, genders, political leanings), experiences (knowing someone who died of COVID-19), and internet skills weigh accuracy vs. privacy concerns differently (RQ2).
By empirically developing statistical models of people’s willingness to adopt data-driven decision aids in the COVID-19 context, we take an important step towards anticipating how users will react to other data-driven decision aids. These results can inform the design of future, more complex data-driven decision aids, including machine learning-powered models, to maximize their uptake and impact. Our findings offer guidance on the development of future data-driven decision aids and a methodological template for future quantitative modeling of how accuracy rates and privacy risk may affect the adoption, and ultimately the efficacy, of new data-driven decision aids.
2 Related Work
Prior work has considered how accuracy [3, 47, 61, 75, 94, 95], privacy [3, 45, 52, 92, 96], and fairness [37, 50, 51, 74, 77, 81, 89] may impact societal acceptability and adoption of data-driven decision aids. Here, we review the relevant findings of this body of work.
Impact of Inaccuracies. There is a significant body of work studying the way inaccuracies impact users’ trust in machine learning systems. Dietvorst et al. [21], building off of earlier work by Dzindolet et al. [23], describe a phenomenon they term algorithm aversion, in which humans stop trusting an algorithm once they see it make a mistake, even if the algorithm outperforms humans on the task. Yin et al. [93] show that both a model’s stated accuracy on held-out data and its observed accuracy affect users’ trust in the model. Yu et al. [94, 95] explore the way that users’ trust in a model changes over time when observing errors, finding that system errors have an out-sized impact on user trust. Salem et al. [75] found that users are generally willing to work with robots that exhibit faulty behavior, although their trust in the efficacy of those robots significantly diminishes after observing errors. Panniello et al. [61] connect the accuracy of recommendation systems to users’ trust in the recommended services and products and their willingness to purchase these services and products. They find that accuracy has only a limited impact on purchasing behavior, but does increase consumers’ trust. Finally, Kay et al. [47] systematically investigate how users perceive accuracy errors and establish an instrument for measuring an individual user’s willingness to tolerate inaccuracies; we leverage this pre-validated methodology in our work.
Impact of Privacy Risks. There has been significant work on the impact of privacy risk on users’ willingness to use various technologies. Most relevant to our work are examinations of privacy concerns related to IoT, location-tracking, and medical technologies. For instance, Hsu et al. [45] study the impact of privacy risks on users’ willingness to adopt and use Internet-of-Things devices, finding that privacy risks have only a weak effect on adoption. Xu et al. [92] and Zhao [96] each study the ways the privacy risks of location-based services impact users’ willingness to adopt the services powered by this technology. The former work finds that the intention of providers to implement privacy protections increases trust in the service and reduces perceived risk. The latter highlights that privacy concerns over location-based service systems could suppress users’ willingness to adopt them. Li et al. [52] investigate similar questions in the context of wearable medical devices, showing that potential users perform a privacy calculus informed by the device’s benefits, the health information’s sensitivity, the user’s attitude toward emerging technology, policy protections, and perceived prestige. Angst et al. [3] investigate users’ hesitance to adopt electronic medical records due to perceived privacy risks, demonstrating that concern for information privacy is a driving force behind the decision. Despite this large body of relevant work, to our knowledge, no prior work – particularly in the health domain – has examined how both privacy and accuracy affect users’ willingness to adopt a decision aid.
COVID-19 Apps. There have been many studies of the acceptance of COVID-19 contact-tracing apps [39, 53, 68, 79, 87], with most studies finding that users are concerned about app privacy, accuracy, and costs (e.g., mobile data use). However, to our knowledge no prior work has quantified the impact of specific levels of accuracy and privacy on potential users’ intent to adopt these apps. End-user considerations for app adoption may vary based on the architecture of the contact-tracing app available to them [68]. Ahmed et al. [2] provide a comprehensive survey of COVID-19 app proposals and their underlying techniques. For a complete list of COVID-19 apps and proposals, we refer the reader to the citations contained within this work. Several important privacy-preserving COVID-19 app proposals include DP3T [83], PACT [15], BlueTrace [9], and the Google-Apple Exposure Notification Framework-based apps [5]. As described in Section 1, these apps generally broadcast rotating identifiers over Bluetooth. When someone tests positive for COVID-19, information is disseminated that allows all devices that the tested person was near to determine that their owners were likely exposed to COVID-19. We note that similar technology-based contact-tracing solutions were proposed to combat the 2014 Ebola outbreak, but were not widely deployed [20, 73].
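To make the decentralized approach concrete, the following sketch illustrates the general pattern these proposals share: a device broadcasts short-lived random identifiers, records the identifiers it observes nearby, and later checks those observations against identifiers re-derived from keys published by users who report a verified positive test. This is a simplified illustration under assumed placeholder parameters (key derivation, identifier length, and rotation schedule), not the DP3T, PACT, or Google-Apple specification.

```python
# Simplified sketch of decentralized, Bluetooth-based exposure notification.
# Illustrative only: real protocols use HKDF-based key schedules, fixed rotation
# windows, and signal-attenuation/duration thresholds for "contact".
import hashlib
import os


def daily_key() -> bytes:
    """A device generates a fresh random key each day (placeholder scheme)."""
    return os.urandom(16)


def rolling_identifiers(day_key: bytes, slots_per_day: int = 144) -> list[bytes]:
    """Derive the short-lived identifiers broadcast over Bluetooth (e.g., every ~10 min)."""
    return [hashlib.sha256(day_key + slot.to_bytes(2, "big")).digest()[:16]
            for slot in range(slots_per_day)]


class Device:
    def __init__(self) -> None:
        self.day_keys: list[bytes] = []
        self.observed: set[bytes] = set()   # identifiers heard from nearby devices

    def new_day(self) -> list[bytes]:
        key = daily_key()
        self.day_keys.append(key)
        return rolling_identifiers(key)     # broadcast these throughout the day

    def record_contact(self, identifier: bytes) -> None:
        self.observed.add(identifier)

    def report_positive(self) -> list[bytes]:
        """On a verified positive test, upload recent day keys to the server."""
        return self.day_keys

    def check_exposure(self, published_day_keys: list[bytes]) -> bool:
        """Re-derive identifiers from published keys and match against observations."""
        return any(self.observed.intersection(rolling_identifiers(key))
                   for key in published_day_keys)


# Example: Alice and Bob are near each other; Bob later tests positive.
alice, bob = Device(), Device()
bob_ids = bob.new_day()
alice.new_day()
alice.record_contact(bob_ids[0])        # Alice's phone hears one of Bob's identifiers
published = bob.report_positive()       # Bob uploads his day keys after a verified test
print(alice.check_exposure(published))  # True: Alice is notified of a likely exposure
```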
Fairness and the Social Context of Data-driven Decision Aids. A related problem to the one we study in this work is the social impact of outsourcing critical decisions to data-driven decision aids. Prior work has shown that data-driven decision aids, and artificial intelligence more broadly, disproportionately harm marginalized populations and reinforce existing social inequities. For instance, ProPublica found that the COMPAS algorithm used to predict recidivism in the United States consistently overstated the risks for racial minority groups [4]. More recent work has continued to highlight and explore the way automated decision making can harm both individuals and communities, e.g., [11, 12, 57]. That work has spurred research evaluating the fairness of these decision systems, e.g., [37, 50, 51, 74, 77, 81, 89]. Issues of fairness and potential harm are incredibly important and should be considered deeply when deploying data-driven decision aids such as those discussed in this work. These issues are orthogonal, but complementary, to the issues of privacy and accuracy we consider in this work.
3 Methodology
We conducted two surveys intended to answer our research questions. Our first survey examined how inaccuracies and/or privacy leaks impact respondents’ willingness to install a COVID-19 app (RQ1). Additionally, we gathered demographic information to understand how identity and life experience impact this decision (RQ2). With our second survey, we develop empirical models to predict how the amount of personal and communal health benefit, and degree of privacy risk, for a particular COVID-19 app affects people’s intent to install (RQ3).
Our questionnaires evaluate respondents’ willingness to install Bluetooth proximity-based contact-tracing apps, agnostic of app architecture (e.g., centralized vs. decentralized). As detailed in the Introduction, either centralized or decentralized apps may have accuracy issues or may experience privacy leaks [10, 66, 68, 83, 86]. For example, regarding privacy, the MIT Pathcheck Foundation—which creates decentralized apps for multiple government jurisdictions using the Google-Apple notification framework—states that “while [contact-tracing] apps aim to obscure the identity of the person who is infected, accidental release of information sufficient to identify the person can occur on rare occasions, similar to accidental release of protected health information” [66].
In this section, we briefly discuss our questionnaires, questionnaire validation, sampling approaches, analysis approaches, and the limitations of our work. All studies were approved by a federally-recognized institutional review board (IRB). The full content for our first survey, including question wording, is included in Table 1, and the content for our second survey is included in Table 2.
3.1 State of COVID-19 Pandemic During Our Surveys
In order to properly contextualize our surveys for the reader, we briefly recall the socio-political context in which they were conducted. Both surveys were conducted with respondents located in the United States during May 2020, just four months after the first recorded American COVID-19 case on January 21, 2020. COVID-19 apps, like the ones that we study in this work, were first discussed broadly in April 2020 as a supplement to traditional contact tracing. By early May, the first contact-tracing apps had been deployed, like the North Dakota app, which had serious privacy flaws [56, 58]. During this early phase of the pandemic, there was significant uncertainty about the practices and tools that would help mitigate the spread of COVID-19. For instance, the World Health Organization did not suggest that everyone should wear face masks until June 2020. When we conducted our surveys, most American states were in a state of emergency, and stay-at-home orders were starting to be lifted after several weeks. By the end of May, official statistics had recorded 100,000 American COVID-19 deaths [14]. Thus, our surveys were conducted during a moment of significant distress and fear, during which COVID-19 apps seemed like they might become a significant tool in suppressing the spread of COVID-19.
3.2 First Survey (RQ1 and RQ2)
In our first survey, we looked at how accuracy and/or privacy considerations might influence willingness to adopt (RQ1) and how the relative weight of these considerations might vary with respondent demographics (RQ2). We used a vignette survey [6] to examine these questions. In a vignette survey, respondents are given a short description of a hypothetical situation and then asked a series of questions about how they would respond. This initial situation can also be supplemented with condition-specific descriptions that augment the vignette shown to all respondents, which allows researchers to isolate specific, explicit differences between respondents in different branches. Vignette surveys have been shown to maximize external validity when examining hypothetical scenarios.
Questionnaire. We used a 2-by-2 between-subjects design to study accuracy and privacy concerns around contact-tracing apps that collect two different types of data. We framed our questionnaire by asking respondents to imagine that there exists a contact-tracing app that uses Bluetooth proximity information (Proximity Scenario) or GPS location information (Location Scenario) to identify contact, assigning half of the respondents to each of the two groups at random. Each participant was then randomly assigned to either the control branch or the experimental branch and given a series of branch-specific contexts. In the control branch, respondents were randomly assigned to one of the following three contexts: (1) the app has perfect accuracy, (2) the app has perfect privacy, or (3) the app has both perfect accuracy and perfect privacy. In the experimental branch, respondents were randomly assigned to one of three contexts: (1) the app has false negatives (“this app occasionally fails to notify you...”), (2) the app has false positives (“this app occasionally notifies you...when you actually have not been exposed”), and (3) the app might leak private information. For this final context, we asked about information leakage to four entities, drawn from prior work [39]: “non-profit institutions verified by the government”, “technology companies”, “the US government”, and “your employer.” In both branches, respondents were asked “Would you install this app?”, to which they could respond “Yes”, “No”, or, in the experimental branch only, “It depends on the [risk, chance of information being revealed, etc.]”. See Table 1 for a more detailed description of this questionnaire.
Validation. The questionnaire was validated through expert reviews with multiple researchers. Additionally, we included three attention-check questions—one general attention check and two scenario-specific attention checks—to ensure respondents understood the decision scenario.
Sample. We contracted with the survey research firm Cint to administer the survey to 789 Americans in May 2020; we quota sampled on age, gender, income and race to ensure the sample was representative of US population demographics.
Analysis. We answer RQ1 using \(\chi^2\) proportion tests, corrected for multiple testing errors using Bonferroni-Holm correction where appropriate, to compare responses to our different sets of questions. We answer RQ2 by constructing two mixed-effects binomial logistic regression models. Our dependent variable (DV) is willingness to install the app, with “Yes” and “It depends on the risk” as a positive outcome and “No” as a negative outcome. We model responses to the accuracy and privacy questions separately, controlling for data type and entity in the privacy model, and both data and accuracy type in the accuracy model. We include as additional variables the respondents’ age, gender, race, internet skill (as measured using the Web Use Skill Index [38]), level of educational attainment, party affiliation, and whether the respondent knows someone who died due to complications from COVID-19. We include a mixed-effects term to account for our repeated-measures design.
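To illustrate how such an analysis can be assembled, the sketch below shows hypothetical Python code for the \(\chi^2\) proportion comparisons with Holm correction and a simplified (fixed-effects) version of the installation-intent regression. The data file and column names are invented for illustration, and the paper’s actual models additionally include a per-respondent random effect for the repeated-measures design.

```python
# Hypothetical illustration of the Survey 1 analysis; file and column names are invented.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.multitest import multipletests
from statsmodels.stats.proportion import proportions_chisquare

df = pd.read_csv("survey1_responses.csv")   # placeholder file; would_install is 0/1

# Chi-squared proportion tests comparing willingness to install across conditions,
# corrected for multiple comparisons with Holm's method.
pvals = []
for cond_a, cond_b in [("false_negative", "false_positive"),
                       ("false_negative", "privacy_leak"),
                       ("false_positive", "privacy_leak")]:
    sub = df[df["condition"].isin([cond_a, cond_b])]
    counts = sub.groupby("condition")["would_install"].sum().to_numpy()
    nobs = sub.groupby("condition")["would_install"].count().to_numpy()
    result = proportions_chisquare(counts, nobs)
    pvals.append(result[1])                 # p-value is the second element
reject, p_adjusted, _, _ = multipletests(pvals, method="holm")

# Binomial logistic regression of willingness to install on condition and demographics.
# The paper uses a mixed-effects model with a per-respondent random intercept;
# a plain logit is shown here only to illustrate the fixed-effects structure.
model = smf.logit(
    "would_install ~ C(error_type) + C(data_type) + age + C(gender) + C(race) + "
    "internet_skill + C(education) + C(party) + knows_covid_death",
    data=df,
).fit()
print(model.summary())
```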
3.3 Second Survey (RQ3)
We use the data from this survey to predict people’s intent to install COVID-19 apps based on the amount of public health (infection rate reduction) and individual health (notification of at-risk status – e.g., accuracy) benefit, or privacy risk, of a hypothetical COVID-19 tracking app (RQ3).
Questionnaire. All questions were asked in the context of an architecture-agnostic Bluetooth proximity-based contact-tracing app, as the type of information compromised (location vs. contacted individuals), as well as the entity that could compromise the information (Employer, Government, etc.), had relatively little effect on willingness to install in our first survey (see Section 4.1).
Participants in this survey were randomly assigned to one of four branches: public benefit, false negative, false positive, or explicit privacy. As the degree of privacy risk for contact-tracing apps is not yet quantified, we drew estimates of privacy risk from respondents’ own perceptions. To this end, respondents in the first three branches were first asked to estimate the risk that information collected by this contact-tracing app would be leaked (implicit privacy assessment), using a modified version of the Paling Perspective scale [60] (see Table 2), which has been validated for use in eliciting digital security [42] as well as health risk perceptions. Then, each respondent was given a context about the benefit (or privacy risk) of the contact-tracing app. Each context provides a baseline of the benefit, or privacy risk, for individuals who do not install the app (see Table 2 for per-branch baselines). All survey respondents were then asked “Would you install this app?,” with answer options “Yes” and “No”.
In the public benefit branch, respondents were given the context that individuals with the app installed were infected X% less often (see Table 2 for values of X) than those who did not have the app installed. For both the false positive and false negative branches, respondents were told that individuals without the app would be notified of a COVID-19 exposure 1 in 100 times. Respondents in the false negative branch were told that the app has a false negative rate of X%; respondents in the false positive branch were told that the app had a false negative rate of \(0\%\), but a false positive rate of X%. Finally, respondents in the explicit privacy branch were told explicitly of the privacy risk of the app: X in 1,000 people who use this app will have their information compromised. They were then asked the same question as the false negative branch. See Table 2 for an overview.
Although we have used percentages above for brevity, no information in this survey was expressed in terms of percentages, following best practices from prior work on how to measure the acceptability of different levels of machine-learning system accuracy [47] and prior work in health risk communication and numeracy showing that people reason more accurately with frequency formats than with percentages [29, 30, 48, 72]. Similarly, we avoided technical terms like “false negative” and “false positive,” instead describing the practical ramifications of the situations. See Table 2 for questionnaire wording.
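As a small illustration of this framing choice, a rate can be rendered as a frequency statement (“X out of every 1,000 people”) rather than a percentage. The helper below is a hypothetical sketch of that conversion, not the wording of our instrument (which appears in Table 2).

```python
# Hypothetical helper rendering a rate in the frequency format used in the vignettes
# (e.g., "70 out of every 1,000 people") rather than as a percentage.
def frequency_format(rate: float, population: int = 1_000) -> str:
    count = round(rate * population)
    return f"{count:,} out of every {population:,} people"

# A 7% false negative rate, phrased for respondents:
print(frequency_format(0.07))   # "70 out of every 1,000 people"
```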
Validation. The questionnaire and respondent answers were validated in the same way as Survey 1.
Sample. We surveyed 3,826 Amazon Mechanical Turk workers located in the United States. These workers were split into different survey branches (see above), so all results sections note the number of responses analyzed. There are always concerns about the generalizability of crowdsourced results [41, 59, 70]. Recent work has shown that Amazon Mechanical Turk results generalize well [59, 62], including in the security and privacy domain [70]. To further address generalizability concerns in our application area in particular, we also conducted Survey 1 on Amazon Mechanical Turk. We found only one significant difference, with a small effect size, between the two samples. As the goal of our work is not to provide precise point estimates of phenomena in the entire U.S. population [7], and given the quantitative nature of the RQ3 survey, the required sample size, the lack of differences in the most relevant questions (about accuracy and privacy vs. leaks to particular entities), and prior work on the validity of crowdsourced results, we chose to proceed with Amazon Mechanical Turk.
Analysis. We develop predictive models using the data obtained in this survey via binomial logistic regression analysis. For benefits and accuracy, we construct models with willingness to install as the dependent variable, the amount of benefit or error (e.g., chance of FN) as the independent variable, and the respondents’ perceived implicit privacy risk for the app as a control variable. To evaluate the impact of privacy on decision making, we construct a binomial logistic regression model with willingness to install as the dependent variable and explicit privacy risk as the independent variable. To further evaluate the impact of privacy on decision making and to validate our measurement of implicit privacy perceptions, we use \(\chi^2\) proportion tests to compare the proportion of respondents willing to install given an FN rate in the implicit and explicit privacy conditions.
Model Validation. We did not have strong prior knowledge of how the outcomes would vary with the quantitative values of the benefit, error, or risk for the app. Thus, we considered models of varying complexity (polynomial degree) to account for the observed responses. 80% of the data were used to fit and select the best model based on the average RMSE estimates across 5-fold cross-validation at each of 10 potential polynomial degrees; final performance is quoted on the remaining 20% test set. We use first-degree models, which offered the lowest RMSE.
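The sketch below illustrates this procedure for one branch: fit a logistic model of installation intent on the branch’s error rate with implicit privacy risk as a control, compare polynomial degrees 1 through 10 by mean RMSE under 5-fold cross-validation on an 80% training split, and evaluate the selected (first-degree) model on the held-out 20%. The file and column names are hypothetical, and the code is an illustrative approximation rather than our exact analysis scripts.

```python
# Illustrative sketch of degree selection via cross-validation; names are hypothetical.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

df = pd.read_csv("survey2_fn_branch.csv")                 # placeholder file
X = df[["false_negative_rate", "implicit_privacy_risk"]]  # independent variable + privacy control
y = df["would_install"]                                   # 1 = yes, 0 = no

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

def cv_rmse(degree: int) -> float:
    """Average RMSE of predicted install probabilities over 5 folds."""
    rmses = []
    for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X_train):
        model = make_pipeline(PolynomialFeatures(degree), LogisticRegression(max_iter=1000))
        model.fit(X_train.iloc[train_idx], y_train.iloc[train_idx])
        probs = model.predict_proba(X_train.iloc[val_idx])[:, 1]
        rmses.append(np.sqrt(mean_squared_error(y_train.iloc[val_idx], probs)))
    return float(np.mean(rmses))

best_degree = min(range(1, 11), key=cv_rmse)              # first-degree won in our data
final = make_pipeline(PolynomialFeatures(best_degree), LogisticRegression(max_iter=1000))
final.fit(X_train, y_train)
print(best_degree, final.score(X_test, y_test))           # held-out classification accuracy
```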
Important Considerations for Interpreting Models of Human Behavior. Models of human behavior notoriously achieve relatively low accuracy compared to many models developed in computer science and related fields, with \(R^2\) for “good” models of human behavior approximating 30–40% explanation of variance and prediction accuracy between 60 and 70% [31, 33, 49, 69, 90]. This lower accuracy is due to a number of factors, including high levels of variance in human behavior, estimation of performance on single-period-in-time measurements—which under-count predictive power for repeated decisions such as app installation—and compounded behavioral and self-report biases (see Limitations for more detail on how we mitigate these biases) [1, 43].
3.4 Limitations
As with all surveys, the answers represented in these results are people’s self-reported intentions regarding how they would behave. As shown in prior literature on security, these intentions are likely to align directionally with actual behavior, but tend to over-estimate it [32, 71]. As described in the questionnaire validation sections above, we took multiple steps to minimize self-report biases. The goal of this work is to show how willingness to adopt may be influenced by privacy/accuracy considerations, and thus model results should not be interpreted as exact adoption estimates.
4 Results
4.1 Both Accuracy & Privacy Influence Whether People Want to Install a COVID App
Flaws in both accuracy and privacy significantly relate to respondents’ reported willingness to adopt COVID-19 apps, as shown in Figure 1. When considering apps purported to be perfect, we find that respondents do not significantly differentiate between perfect privacy vs. perfect accuracy vs. perfect accuracy & privacy (\(\chi^2\) prop. omnibus test, \(p = 0.178\)).
We find that significantly (\(\chi^2\) prop. test, \(p < 0.001\)) more people say their decision would depend on the amount of accuracy error than the amount of privacy risk (the yellow bars in Figure 1). We examine in more depth how the amount of accuracy error influences reported willingness to adopt below.
Respondents differentiate between types of accuracy error. When provided no information about the false positive rate, 8% fewer respondents reported being willing to install an app with false negatives compared to one with false positives or one with privacy leaks to any of the entities examined (\(\chi^2\) prop. tests, BH corrected, both with \(p < 0.01\)).
Finally, focusing on privacy leaks, we find that respondents’ reported willingness to install did not significantly differ (\(\chi^2\) prop. tests, BH corrected, all with \(p > 0.05\)) based on what data the app might leak to a particular entity, except for hypothetical leaks to the respondents’ employer. Only 23% of respondents were willing to install an app that might leak their locations to their employer, while 31% were willing to install an app that might leak information about who they have been near (their proximity data) to their employer.
4.2 Some Americans Weigh Accuracy or Privacy Considerations More Highly than Others
In order to examine whether some Americans weigh accuracy or privacy considerations more highly than others, we construct two mixed-effects logistic regression models as described in Section 3.2. We evaluate model fit by building our model with an 80:20 train-test split; however, we note that these models are intended to provide descriptive insight into how respondents weigh accuracy and privacy considerations and should be interpreted as such. In the remainder of this section we report descriptively on a model built on the full data set.
First, considering respondents’ willingness to install apps with accuracy errors (Table 3, left), we validate that even when controlling for demographic variance, respondents were more comfortable with false positives (spurious exposure notifications) than false negatives (missed notifications after an exposure). Additionally, emphasizing the relevance of ego-centricity [27] in people’s considerations of data-driven decision aids, we find that those who know someone who died from COVID-19 were more likely to report being willing to install an app with accuracy errors than those who do not.
Second, considering privacy risk (Table 3, right), we find that respondents were more comfortable installing an app with potential privacy leaks to a nonprofit organization verified by the government than an app with potential leaks to any other entity (their employer, a technology company, or the U.S. government). In this controlled model, we find that the type of data leaked (proximity vs. location data) did not significantly affect reported willingness to install (see the previous section for a more detailed examination of this point).
Women and younger respondents are less likely to report that they would install an app with privacy risks. The gender finding aligns with past work showing that women may be more privacy sensitive than men [44, 67]. Further, Democrats are more likely than Republicans to report intent to install an app with privacy risks, potentially reflecting the increasingly politicized nature of privacy [91] and of the COVID-19 pandemic itself at the time of the survey.
Finally, those who have higher internet skills are more willing to install an app that has either errors in accuracy or privacy leaks. This is in line with findings from prior work showing that those with higher skills are more willing to install COVID-19 apps in general [39], perhaps due to greater confidence in their ability to install and use these apps [19, 40].
4.3 Amount of Public Health and Individual Benefit Influence Willingness to Install
In the findings above, we validate that the individual considerations of accuracy and privacy both impact reported willingness to install. Further, we find that amount of accuracy error is especially important to people’s decision making about whether or not to use a data-driven decision aid: at least 30% of our respondents reported their installation decision depended on the amount of error.
Thus, in our second survey we examine whether we can predict how a quantified amount of public health benefit (i.e., infection rate reduction) or individual benefit (i.e., FN and FP rates) impacts intended adoption rates. Figure 2 provides an overview of these findings. To examine the relationship between amount of benefit and willingness to install beyond visual inspection, we construct logistic regression models as described in Section 3.3.
With respect to public health benefit, on 20% test data, we can predict reported willingness to install the app with 63.6% accuracy (null model accuracy: 54.0%, threshold = 0.5). We find that, for every 1% reduction in infection rate offered by the app, respondents are 4% more likely to report that they would install (O.R. 95% CI: [0.95, 0.98], \(p < 0.001\)).
With respect to accuracy, we can predict reported willingness to install a COVID-19 app based on the false negative rate with 70.0% accuracy (null model accuracy: 52.0%, threshold = 0.5) and based on false positive rate with 62.5% accuracy (null model accuracy: 41.0%, threshold = 0.5). For every 1% increase (O.R. 95% CI: [1.01, 1.02], \(p < 0.001\)) in app sensitivity, respondents are 1% more likely to report that they would install. For every 1% decrease (O.R. 95% CI: [0.98, 0.99], \(p < 0.001\)) in false positive rate, respondents are 1% more likely to report that they would install.
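For readers less familiar with odds-ratio phrasing, these percentage interpretations follow directly from the fitted logistic regression coefficients. Taking the public health benefit model as an illustrative example, and using \(\mathit{OR} \approx 0.96\) as an assumed point estimate (the midpoint of the reported 95% CI of [0.95, 0.98]) for a one-point increase in infection rate, a one-point reduction in infection rate multiplies the odds of reporting intent to install by
\[
e^{-\hat{\beta}} \;=\; \frac{1}{\mathit{OR}} \;\approx\; \frac{1}{0.96} \;\approx\; 1.04,
\]
i.e., roughly a 4% increase in the odds of installing; the accuracy and privacy models are interpreted analogously.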
4.4 Amount of Privacy Risk Also Influences Willingness to Install
Next, we model willingness to install based on explicit privacy risk. We can predict willingness to install based on privacy risk with 65.5% accuracy (null model accuracy: 34.5%, threshold = 0.5). For privacy risk (recall that the magnitude of privacy risk is far smaller than benefit rates or accuracy errors), we observe that a 52% decrease in privacy risk results in a 1% increase in intent to install (O.R. 95% CI: [0.3, 0.74], \(p < 0.01\)).
Further, we confirm the relevance of privacy risk, implicit or explicit, in respondent decision making – and validate the equivalency of explicitly stated privacy risk and our measurements of respondents’ implicit privacy perceptions – by comparing the proportion of respondents who were willing to install a COVID-19 app given an explicit statement of privacy risk (privacy risks were drawn from the portion of the implicit risk distribution reported by the majority of respondents) vs. their own implicit perception. We find no significant difference between the proportion of respondents who intend to install an app with a given false negative rate when relying on their own implicit privacy assumptions vs. an explicit statement of the risk of privacy leak.
Finally, further confirming the relevance of benefits, accuracy, and privacy: when modeling respondents’ responses to benefit/error rates (e.g., infection rate, FN rate, FP rate), including implicit privacy risk as a control in the regression significantly improves model fit in all three models (likelihood ratio tests [65], \(p < 0.05\) for all models).
5 Discussion and Conclusion
In this work we statistically analyzed two surveys of a total of 4,615 Americans to better understand people’s willingness to use data-driven decision aids when these systems are imperfect. Understanding how to encourage adoption of these systems is critical as the importance of these systems grows.
We specifically focus our study on Americans’ intent to adopt data-driven decision aids in the context of COVID-19. We do so for three reasons. First, virtually everyone has been personally impacted by the global pandemic. As such, it is easy for respondents to imagine themselves as decision makers. This overcomes limitations of prior work in which subjects were asked to imagine themselves in an unfamiliar situation [27]. Second, the accuracy and privacy risks we ask respondents to imagine are far from theoretical [56, 58]. Third, given the immense need for reductions in COVID-19 infections, it is critical to understand how well COVID-19 data-driven decision aids need to perform to meet both individual and public health needs.
Our results offer clear evidence that both inaccuracies and privacy risks can impede the adoption of important data-derived decision aids. Specifically, we find that:
• (RQ1). Users are more willing to install COVID-19 apps that have either perfect accuracy, perfect privacy, or both. This finding clearly motivates our other research questions, as it is natural to inquire how perfect the app must be.
• (RQ2). Respondents with different socio-demographic characteristics, life experiences, or technological comfort consider accuracy and privacy differently. Namely, we find that individuals with higher internet skill and those who knew someone who died from COVID-19 were more willing to tolerate accuracy errors. Additionally, women and younger respondents were less likely to install an app that risked disclosing private information; users with higher internet skill, on the other hand, were more likely to install such an app. These results remind developers that the life experiences and proclivities of certain groups can significantly impact the adoption rate of a data-driven decision aid like a COVID-19 app. For instance, the significantly higher tolerance for accuracy errors among respondents who knew someone who died from COVID-19 might indicate that these users considered the inconveniences that the app might pose to be worth it. Similarly, women’s lower tolerance for privacy risks is in line with prior work; this preference persists despite the pressing public health need.
• (RQ3). Finally, we found that people care about the amount of privacy risk, in addition to the amount of accuracy and/or benefit, but are largely neutral regarding to whom their data might leak. While proactively computing the risk of a data leak may be difficult when developing an app, our results indicate that increased efforts on the part of app developers can increase users’ trust and willingness to adopt. Put another way, it is important for developers to invest resources to decrease the chances of a data leak, even if they are not able to reduce the chances of data leakage to zero. Similarly, efforts to improve the accuracy of a system, or its efficacy in addressing its stated goals, even by a few percentage points, will reap benefits in adoption.
Interestingly, users are less willing to make use of these systems in the presence of false negatives than false positives. In our study context of COVID-19, false negatives represent a threat to personal and communal safety, whereas false positives are merely inconvenient (requiring additional testing or quarantine). Of course, not using a COVID-19 app will guarantee that the user will never see any possible exposure notifications that the app would have generated. In that sense, not installing a COVID-19 app gives the user a 100% false negative rate and a 0% false positive rate. However, respondents may have assumed that installing an additional app on their device might have had other side effects, like privacy risks or decreased battery performance, that would outweigh the benefit of installing an app that fails to accomplish its main goal of producing notifications.
Since we first conducted our surveys, the COVID-19 pandemic has started to slow in the United States. Despite the initial excitement, COVID-19 apps largely failed to live up to their promise [80, 82]. Reports indicate that accuracy problems may have contributed to the lack of uptake, along with significant logistical and policy failures [80]. While our work cannot make strong causal claims about why COVID-19 apps failed, our results indicate that lax privacy controls in early deployed apps [56, 58] and concerns over app accuracy or efficacy may have reduced enthusiasm for COVID-19 app adoption.
5.1 Implications Beyond COVID-19 Apps
While our results offer insight into data-driven decision aids in the context of a global pandemic, more broadly, they speak to the importance of both accuracy and privacy in modeling data-driven decision aid adoption. Although the goal is always to create perfect decision aids, inaccuracy is inherent. Moreover, because of the data-hungry nature of many machine-learning based decision aids, it is doubtful that invasive data collection will disappear in the near future. As such, our work suggests that decision aid designers should address these factors explicitly and consider them closely when deploying their systems. Critically, developers must remember that justifying low accuracy or high risk of privacy breaches with high utility and social good is unlikely to be a successful strategy for earning user trust. Especially for data-driven decision aids that need widespread adoption in order to be useful, ensuring that the system maintains appropriate accuracy thresholds and addresses privacy risk before deployment is critical to achieving and maintaining adoption rates.