1 Introduction

Healthcare is one of the human activities in which digital technologies, in particular artificial intelligence (AI), have great potential and are expected to lead to momentous changes in the near future. Healthcare is also one of the areas of human activity that give rise to the most extensive ethical demands and deliberations. AI is frequently described as a “disruptive technology”. In other areas, this term is often taken to have an at least partly positive connotation: it signals that even though there might be some initial negative effects, the new technology will lead to future improvements. In healthcare, however, such disruption has little appeal, despite the allure of long-term benefits. We would not accept a development that puts healthcare in a state of disarray, even if it would plausibly lead to a better future state. All this makes the ethical appraisal of potential future uses of digital technologies in healthcare particularly important. In this article, we provide a systematic overview of the major ethical issues that have to be dealt with in order to ensure that the introduction of AI, machine learning (ML) and other advanced digital technologies is beneficial to patients, both during the introduction phase and when these technologies have become an integrated part of the healthcare system.

We use artificial intelligence (AI) as a broad term, covering various machine-based methods for performing tasks that were previously thought to require human intelligence. Machine learning (ML) is one method for achieving AI. It uses a training set of texts, images or sounds, with and without some specific property, classified by experts. The program identifies features of these inputs and searches for correlations between the features it has identified and the classifications made by the experts. In doing so, the program develops an ability to make such classifications on its own. This ability is then tested on a separate set of similar texts or images. If the program is successful in these tests, it is considered capable of making such classifications.
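
As an illustration of this workflow, the following minimal sketch (in Python, using scikit-learn) trains a classifier on a set of expert-labelled cases and then tests it on held-out cases. All data, feature dimensions and labels are synthetic assumptions made purely for the purpose of illustration, not a description of any particular clinical system.

```python
# Minimal sketch of supervised classification on a hypothetical,
# expert-labelled dataset (e.g. feature vectors extracted from images).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# 500 synthetic cases with 20 numerical features each; label 1 means
# "has the property in question", 0 means "does not have it".
X = rng.normal(size=(500, 20))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

# The "training set" is used to learn correlations between features and the
# expert classifications; the held-out "test set" checks whether the learned
# classifier generalizes to similar, previously unseen cases.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("accuracy on unseen cases:", accuracy_score(y_test, model.predict(X_test)))
```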

We will divide our discussion into four parts. The first three cover the use of digital technologies in three major types of healthcare activities, namely diagnosis, treatment decisions, and treatments. The fourth part discusses the use of digital technologies in activities that occur both in healthcare and in other social sectors, such as administration, communication, and archiving. As this article is also an introduction to the TC ‘The Ethics of Digital Healthcare’, these four parts are followed by a brief conclusion and a summary of the other contributions to this special issue.

2 Diagnosis

Diagnosis and treatment decisions are closely related activities. In some acute situations, it can be difficult to draw a sharp line between the two since a diagnosis leads standardly and almost automatically to predetermined therapeutic measures. However, from an ethical point of view, there is an important distinction to be made, which relates to the patient-physician relationship. According to the modern ethical consensus, treatment decisions should be made, or at least consented to, by the patient (unless she lacks the capacity to do so). In contrast, diagnosis is the physician’s responsibility. Although information, proposals and queries from the patient should be carefully taken into account, diagnosis is not supposed to be negotiated with the patient. This is, for good reason, taken for granted in the biomedical literature, where diagnosis is treated as something to be disclosed to, not negotiated with, the patient (Beauchamp & Childress, 2012).

Improved diagnosis can be an important means to improve healthcare outcomes. For instance, it has been estimated that 5% of adult patients in the US receive an incorrect diagnosis, and that misdiagnosis contributes to about 10% of patient deaths in the US (National Academies of Sciences, Engineering, and Medicine, 2015). Some diagnostic procedures impose risks on the patient. These risks have to be weighed against the expected therapeutic gains from a correct diagnosis. In addition, the outcome of a diagnostic procedure can fail in two major ways: low sensitivity and low specificity. Low sensitivity gives rise to false negatives, which can deprive patients of treatment that they need. Low specificity gives rise to false positives, which can lead to unnecessary treatments with significant side effects, as well as unnecessary stress and anxiety. In the development of diagnostic methods and practices, it is essential to reduce these two risks as far as possible. Since reducing one of them often increases the other, it is also often necessary to strike a balance between the two objectives, based on their relative importance for the patient.
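
In standard terminology, these two failure modes correspond to the sensitivity and specificity of a diagnostic test. Writing TP, FN, TN and FP for true positives, false negatives, true negatives and false positives, the two measures are

\[
\text{sensitivity} = \frac{TP}{TP + FN}, \qquad \text{specificity} = \frac{TN}{TN + FP}.
\]

A test tuned to miss fewer true cases (higher sensitivity) will typically flag more healthy persons as diseased (lower specificity), which is the trade-off referred to above.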

The evaluation of diagnostic methods often rests on less certain ground than that of therapeutic methods. There is a gold standard for the evaluation of treatment methods, namely well-conducted clinical trials comparing otherwise similar groups of patients receiving different treatments (Hansson, 2014). For diagnostic methods, however, there is not always a clear-cut standard to calibrate against. In many cases, a relatively non-invasive diagnostic test, such as a blood test, can be calibrated against a more invasive test that is considered more reliable, such as a biopsy. In some cases, a diagnosis can be verified or falsified at a later stage of disease development, or post mortem. However, there are also diseases—not least in psychiatry—whose very definitions are based on rather uncertain diagnostic criteria that may not be well suited as standards against which to calibrate other diagnostic methods.

The ultimate purpose of diagnosis is to guide treatment decisions in a way that is beneficial for the patient. Therefore, the gold standard for a diagnostic method should arguably be whether it can be shown in clinical trials to lead to better medical outcomes. Such trials are seldom carried out, but it can be argued that they should be performed more often, in particular for AI-based diagnosis, which may differ from conventional methods in ways that are difficult to foresee.

We will focus on two types of AI-based diagnosis that are either already in clinical use, or expected to be taken into use in the near future, namely image interpretation and text interpretation.

2.1 Image Interpretation

Image interpretation is currently the most well-developed and most widely used AI application in healthcare. It is employed for instance in radiology, pathology, ultrasonography, dermatology, ophthalmology (diabetic retinopathy), and endoscopy (Sand et al., 2022; Grote & Berens, 2020; Topol, 2019). The images to be analyzed mostly depict the inside of the body, but there are also applications to the outside of the body, which can often be more privacy-invasive. For instance, videos of facial expressions have been used to detect pain in patients with dementia or cognitive impairment (Rogers et al., 2021; Hughes et al., 2022). All advanced medical image interpretation applications are based on ML, an area in which there was considerable progress in the 2010s (Mitchell, 2019). In several studies, ML programs have outperformed human experts in the radiological or pathological diagnosis of a number of specific diseases (Bulten et al., 2020; Bejnordi et al., 2017).

However, several problems arise in the clinical use of these technologies. First of all, currently available algorithms are only constructed to discover one particular disease (or a few diseases). Consequently, serious unexpected findings that a human expert would discover and report can go undetected. For instance, in one study a neural network outperformed radiologists in detecting pneumonia in frontal chest X-ray images (Wang et al., 2017). But since a radiologist’s task is much broader than detecting pneumonia when interpreting chest radiographs, such studies do not show that neural networks can do better than radiologists in their actual tasks. In addition, studies have shown that AI tools mainly tend to perform better than junior doctors and non-specialists (Shen et al., 2019; Brzezicki et al., 2020). For another example, consider a program that is capable of diagnosing diabetic retinopathy but cannot discover other eye diseases such as glaucoma or macular degeneration. Such a program cannot replace an inspection by an ophthalmologist. That said, it can nevertheless be useful for screening purposes in poor areas with no access to ophthalmologists (Rogers et al., 2021). Needless to say, we can expect diagnostic tools with less limited capabilities to become available in the future.

Generally speaking, an AI tool that outperforms human experts in a small, selected task, such as radiological diagnosis of one particular condition, need not automatically lead to improvements for patients if introduced into clinical practice. There must be a good plan for how to introduce it into the clinical workflow, and the effects of its introduction on patients have to be carefully evaluated (Topol, 2019).

One of the major problems in ML is that if the training data are biased, then the machine learning program will perpetuate the bias. For instance, an algorithm that is widely used in the US was shown to assign a lower risk level to African American patients than to white patients with the same condition (Obermeyer et al., 2019). On the other hand, properly trained AI programs can protect patients against the biases of healthcare professionals. For instance, people who are often exposed to facial expressions of pain tend to become “immunized” and fail to perceive the other person’s expression of pain. Such underestimation of patient pain can lead to failure to administer sufficient pain relief (Prkachin, 2011). An AI will not be sensitive to this effect, and can therefore contribute information that leads to more adequate treatment for these patients.
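
To make the mechanism concrete, the following minimal sketch uses synthetic data (all numbers and variable names are assumptions made for illustration) to show how a model trained on a biased proxy label, such as healthcare cost instead of healthcare need, reproduces that bias in its predictions, which is the kind of mechanism reported by Obermeyer et al. (2019).

```python
# Synthetic illustration: two groups with identical true health needs, but one
# group historically incurs lower healthcare costs (e.g. due to poorer access
# to care). A model trained to predict cost, used as a proxy for need, then
# assigns that group systematically lower "risk" scores.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 10_000
group = rng.integers(0, 2, size=n)              # group membership, 0 or 1
need = rng.normal(loc=5.0, scale=1.0, size=n)   # true health need, same distribution in both groups
cost = need * np.where(group == 1, 1.0, 0.6)    # group 0 generates lower costs for the same need
cost += rng.normal(scale=0.2, size=n)

# Features available to the model: a noisy health indicator plus group membership.
X = np.column_stack([need + rng.normal(scale=0.5, size=n), group])
model = LinearRegression().fit(X, cost)         # trained on the biased proxy label

pred = model.predict(X)
print("mean predicted 'risk', group 0:", round(pred[group == 0].mean(), 2))
print("mean predicted 'risk', group 1:", round(pred[group == 1].mean(), 2))
# Despite identical true needs, group 0 receives lower predicted risk scores.
```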

One of the most serious problems with ML algorithms is that when they fail, they can make mistakes of a kind that humans usually do not make. For instance, an image interpreter based on machine learning can suddenly confuse a bird with a car, or a butterfly with a washing machine (Alvarado, 2022; Hendrycks et al., 2021). The network is unable to discover and self-correct these errors, and it can therefore continue to yield absurd results. This is different from how humans fail, and it is therefore also a reason not to treat artificial “agents” of this kind in the same way as human experts. Overreliance on diagnoses obtained from ML algorithms can be dangerous, since these algorithms usually only have access to a limited data set, such as a set of radiological images. The physician, on the other hand, has access to additional information, such as findings from the physical examination of the patient, laboratory tests, and, not least, the patient’s own story. A diagnosis based on only one type of information, for instance radiological evidence, may of course be adequate in some cases, but there are also cases in which it is essential to take other information into account as well. One potential driver of overreliance on AI-based diagnosis is that physicians may fear legal repercussions if an AI diagnosis that they reject turns out to have been correct. This can lead to defensive medicine, i.e. medical interventions, including unnecessary diagnostic procedures, that are performed to protect the physician from legal problems rather than the patient from disease (Grote & Berens, 2022). However, it should also be emphasized that AI-based diagnosis systems can be constructed to avoid biases that physicians are prone to.

Due to the risk that the performance of ML systems deteriorates in the process of additional learning, regulatory agencies such as the FDA only approve “locked” systems, that is, systems that apply what they have learned previously but have no function for continuous additional learning during clinical use (Ursin et al., 2022). This is in line with how drugs are regulated: an approval of a drug does not cover future changes in the composition of that drug. A further reason to use only locked systems is that this can help keep energy consumption down. ML can be extremely energy-consuming and thus undermine sustainability commitments (Van Wynsberghe, 2021).

Like other diagnostic technologies, AI-based diagnosis can give rise to changes in diagnostic criteria. For instance, if AI is able to discover radiological signs of a disease at an earlier stage than radiologists can, then more patients may be diagnosed with the disease at an early stage. This can be either a positive or a negative effect, depending ultimately on its consequences for the health of these patients. If patients gain from being treated as early as possible, then the earlier diagnosis will be an advantage. However, there may also be cases in which the “early signs” do not indicate any need of treatment, and will develop into a disease needing treatment in only some of the patients. In such cases, diagnosis at an earlier stage can lead to unnecessary treatments, potentially with negative side effects as well as increased stress and anxiety for the patients.

Much of the ethical discussion on AI in medicine has focused on the interpretability and explainability of diagnoses obtained from ML programs such as image interpreters. A study of policy papers issued by radiological organizations showed that those organizations strongly emphasized the importance of explainability of AI-based radiological diagnoses. The reason for this seems to be that radiologists are responsible towards patients for being able to explain the grounds of a diagnosis and for being able to double-check the outcomes of automatic diagnostic processes (Ursin et al., 2022). It is interesting to contrast the demands for explainable AI with those placed on current non-AI diagnostic practices. Diagnostic methods do not in general have to be explained or explainable to be used. If it can be shown that persons with an elevated blood level of some protein have an increased risk of thrombosis, then that can justify preventive anticoagulant medication. If the frequency of thrombotic disease decreases in patients who receive this treatment, then measurement of the blood protein can be adopted as a diagnostic test, even if there is no explanation of its connection with thrombosis (London, 2019). It should also be noted that image interpretation by humans is not always fully explainable. Sometimes human radiologists can identify a region in an image as problematic, but cannot explain in what way it is problematic (Felder, 2021). This is quite similar to some ML programs, which indicate the area in the image that justifies the diagnosis, but give no further explanation of what is problematic. The explainability of human decisions often consists in our ability to rationalize them after the fact. In this sense, decisions by machine learning systems may also be explainable (Lipton, 2018; Rajkomar et al., 2018).

Based largely on analogies with non-AI diagnostic methods, some authors have proposed that AI-based diagnostic methods should not be required to be explained in order to be put to clinical use (London, 2019). However, criteria or standards for explainability of AI-based diagnostic methods need to be different for physicians and for patients (Grote & Berens, 2022). Patients usually do not need or ask for explanations of how tomography, gene sequencing, or measurement of intraocular pressure works. They do, however, expect to be told the implications of the diagnoses obtained with these procedures for future health and potential treatment. The introduction of AI-based diagnosis does not necessarily lead to patients needing more mechanistic knowledge. Many have a strong intuition that there is an (ethically) important difference between, on the one hand, low explicability in connection with AI and, on the other, low explicability in connection with a human physician who, for example, might not be able to explain exactly why a region in an image is problematic. It might, however, prove challenging to articulate exactly what the difference is.

Thomas Ploug and Søren Holm have proposed that emphasis should be put on one particular aspect of the explainability of healthcare technology, namely its contestability. This means that it should be possible for a patient to contest a diagnosis obtained with AI (just as they can if the diagnosis was made without the use of AI). In their view, “individuals have a right to protect themselves against discrimination, and therefore should be granted a right to contest bias in AI diagnostics” (Ploug & Holm, 2020, p. 3). To achieve contestability, they say, a certain degree of explainability is required. However, currently and in the foreseeable future, patients will not receive a diagnosis directly from an ML program. Instead, that program will be one of the sources used by the physician who makes the diagnosis. It is this overall diagnosis, rather than the output of one of the tools used by the physician, that the patient should have the right to contest (for instance by seeking a second opinion).

It has been argued that explainability for the patient is needed to achieve autonomy. This is because “autonomy requires to know about the situation one is in and why others have assessed it in such a way” (Ursin et al., 2022). However, no one seems to claim that for this purpose, the patient has to understand how tomography or machine learning works. What she needs to know is how reliable the information from a CT scan or an image interpreter based on machine learning is.

For physicians, requirements of explainability are obviously much higher than for patients. Physicians need to have knowledge of potential risks of misdiagnosis and other weaknesses in all diagnostic tools that they use, including AI-based tools. Such knowledge is difficult to achieve if the mechanisms of a diagnostic tool are unknown or undisclosed. It should be the responsibility of manufacturers of AI-based diagnostic tools to provide physicians with all attainable information that can be helpful in interpreting the diagnoses produced by these tools. Importantly, just like the outcomes of any other diagnostic procedure, the outcomes of AI-based diagnostic tools need to be interpreted. Physicians need training in how to do this. The common description of AI tools as “agents” might lead to an underestimation of this need. When a physician receives information from a human agent, such as another physician or a nurse, then a dialogue that helps in the interpretation of the information is possible. This cannot be expected from currently available “artificial agents”.

Diagnostic skill increases with training. This applies not least to image interpretation (Esserman et al., 2002). Unfortunately, the reverse relationship also holds: diagnostic skills decrease with decreased training. It can therefore be expected that the image interpretation skills of radiologists and pathologists will decline if image interpretation is completely transferred to AI. Such deskilling could be detrimental, since it would lead to a loss of the competence needed to reexamine and, if necessary, overrule the AI system’s diagnoses. It has therefore been proposed that radiologists should maintain even those image interpretation skills that cover tasks AI systems can perform reliably (Sand et al., 2022). The same need applies to other clinical specialists whose diagnostic tasks will be automated.

Some medical images are highly privacy-sensitive. This applies for instance to photos of disease-affected parts of the body, taken for dermatological assessment, and videos of facial expression that are used to diagnose pain in patients unable to tell healthcare personnel whether they feel pain (Hughes et al., 2022). Like all personal medical information, these images should be protected against unauthorized access.

2.2 Text Interpretation

Artificial intelligence can also be used for scanning and interpreting texts, such as medical records. This usually involves ML based on clinical records from a large number of patients. The process is similar to that applied to images: after analyzing a “training set” of medical records, the program recognizes patterns that make it capable of providing diagnoses or diagnostic proposals.
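
As a simple illustration of such a pipeline, the following sketch trains a text classifier on a handful of invented clinical notes with invented expert labels; the notes, labels and choice of methods (TF-IDF features with logistic regression) are assumptions for the purpose of illustration, not a description of any deployed system.

```python
# Minimal sketch of text-based classification on hypothetical clinical notes
# labelled by experts (1 = suspected cardiac condition, 0 = not).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

notes = [
    "patient reports chest pain radiating to left arm, sweating",
    "routine follow-up, no complaints, blood pressure normal",
    "shortness of breath on exertion, elevated troponin",
    "mild seasonal allergies, prescribed antihistamine",
]
labels = [1, 0, 1, 0]

# The pipeline turns free text into word-frequency features and learns
# correlations between those features and the expert labels.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(notes, labels)

print(model.predict(["complains of chest pain and shortness of breath"]))
```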

The need for a large number of patient records in the learning phase is problematic, since informed consent from each individual patient would be cumbersome to acquire. In one well-known example, a massive amount of patient data was transferred from the Royal Free Hospital to a private company without informed consent. The company then used the data to develop a commercially available program based on text analysis (Morley et al., 2020). It is a plausible scenario that patient data from public healthcare will be made available for free to private companies, which will then use these data to develop text-based diagnostic tools that they subsequently sell back to public healthcare at a high price. Another serious problem is that the use of patient data of low quality, for instance patient records with varying degrees of detail and inconsistent terminology, can result in the incorporation of errors and biases into AI tools that are based on these data, leading to the perpetuation of inaccurate diagnostic practices and the cementing of bias (Svensson & Jotterand, 2022).

2.3 Risk for Future Disease

One possible diagnostic use of AI is to predict a patient’s risk of future disease. Depending on the information available to the system, such as laboratory data and notes in the medical record about living conditions, occupation, previous diseases, etc., an AI program can produce risk assessments that may lead to individualized recommendations. Such advice can be communicated to the patient either by the system itself or by a healthcare professional. On the positive side, this can lead to improved preventive services for patients, but on the negative side it can also lead to intrusive communications that patients perceive as meddlesome, disrespectful and manipulative (Grote & Berens, 2020). The communication of such information to persons who have not asked for it is ethically problematic, in particular if the predicted risk is low and no effective preventive measures are known. An early diagnosis of a serious disease for which there is no therapy can harm patients by giving rise to depression and other psychiatric problems (Ursin et al., 2021).

AI-based scanning of social media has been shown to be capable of detecting persons with some types of psychiatric problems. In particular, persons at high risk of depression can be identified by automatic analysis of the pictures and texts on their social media accounts (Chancellor & De Choudhury, 2020). This can be a way to offer early treatment, for instance against depression, in cases that would otherwise not have been detected. However, the unsolicited scanning that this requires is highly problematic from an ethical point of view. Diagnosing a mental disorder is a medical intervention that requires consent from the concerned individual, and social media users cannot be assumed to consent to such procedures unless they have done so explicitly (Laacke et al., 2021).

3 Treatment Decisions

The ethical problems in diagnostic procedures that were discussed in the previous section are also present in treatment decisions. In addition, treatment decisions usually involve issues of informed consent, which can be more difficult to handle when AI is involved. Patient autonomy can only be achieved if patients receive sufficient information and support before giving or denying informed consent to a proposed medical intervention (Starke et al., 2021). It also seems reasonable to put higher demands on explainability for patients in treatment decisions than in diagnosis, since in treatment decisions, patients are the decision-makers. However, some researchers have questioned whether vulnerable patients who are already under significant psychological stress, such as psychiatric patients with paranoid symptoms, really stand to gain from receiving detailed information about ML algorithms. Perhaps it just undermines their overall well-being by fueling symptoms? This has, of course, to be decided on a case-by-case basis (Martinez-Martin et al., 2018).

McDougall has warned that the use of AI in treatment decisions can lead to a new form of paternalism, “computer knows best”, which involves treatment decisions based on the values programmed into the AI tool (McDougall, 2019). This could lead to treatment recommendations that are based on a single criterion programmed into the AI tool, such as maximized survival time, to the exclusion of other values of importance for patients, such as quality of life. For instance, IBM’s Watson for Oncology, which makes treatment recommendations for a selection of malignant disorders, builds its rankings of treatments only on survival rates (Debrabander & Mertes, 2022). One way to solve this is “value-flexible design” of the AI tool, in other words a tool designed so that it can take different patient values into account. For instance, such an AI tool should be able to adjust to different priorities between life years and quality of life (Debrabander & Mertes, 2022). However, patients can take many types of considerations into account when choosing between treatment options, and an ML algorithm cannot be expected to react adequately to all potential patient values. It may therefore be preferable for the ML tool to rank alternatives according to expected survival time, and for further considerations to be added in the discussion between physician and patient. Arguably, this can be a way to make the weighing of risks better match the preferences and values of the individual patient.

Several authors have described the use of health apps that reduce reliance on physicians as a form of “medical emancipation” and freedom from physicians’ paternalism (Sharon, 2017; Risling et al., 2017; Schmietow & Marckmann, 2019). However, emancipation should in this context arguably include freedom from disease, which means that health apps have to be efficiently health-promoting in order to be emancipating. It should also be noted that health apps are usually constructed by corporations seeking profit, and their economic interests need not always coincide with the user’s health interests (Segers & Mertes, 2022). Some health apps send data from a wearable device, mobile phone or computer to some corporate entity for analysis (Kühler, 2022). This raises important ethical issues, for instance concerning the security of the data and whether, and if so how, they will be used or sold for commercial purposes.

If health apps are used instead of consulting health professionals, then they can have negative effects on health, due to overreliance on, or misunderstanding of, the app’s recommendations.

The use of AI has often been compared to a “second opinion” for a patient, or to a physician’s discussion of a case with a colleague. This is misleading, since the usefulness of such consultations derives to a large degree from the actual dialogue and discussion, rather than just from a statement of an opinion. It is well known that we humans are bad at finding faults in our own arguments, but much better at finding errors in arguments put forward by others. The mere statement of an alternative standpoint is usually not sufficient for us to reconsider our own view in an adequate way (Samuelson & Zeckhauser, 1988; Kahneman et al., 1991). Therefore, a treatment recommendation by an AI agent should not be treated by a patient in the same way as a second opinion from another physician, or by a physician in the same way as a colleague’s opinion.

It is important to restrict the use of treatment-recommending ML systems to the specific tasks for which they have been empirically validated, and for which physicians have been trained to use them. Likely uses for which there is no such validation should be explicitly warned against (London, 2019). It is also important that clinicians are aware of their responsibility for decisions based on recommendations from AI tools.

It has been speculated that future “universal” systems for diagnosis and treatment recommendations may be able to diagnose and propose treatments for all diseases (Svensson & Jotterand, 2022). Such a system would likely give rise to difficult issues related to informed consent and physicians’ responsibilities. This is, however, very far from the AI tools that are available and under development today. In a recent article, Grote and Berens point out that while the technology will undoubtedly become more powerful, we should not expect the ethical issues to become less challenging. On the contrary, they are likely to become more complicated, due to the complexity and functioning of, for example, large language models (Grote & Berens, 2024).

4 Treatment

The ethical issues arising in the use of AI in actual treatment of disease are largely different from those arising in diagnosis and treatment decisions. They are also different in different types of treatments. In what follows, we will discuss four major uses of robotics and AI in treatments, namely robotic surgery, exoskeletons and wearable robots, virtual psychotherapy, and therapeutic robots.

4.1 Robotic Surgery

3D computer-assisted surgical planning has been used since the late 1980s. It has since been further developed, and can be used to plan and simulate cuts and to customize cutting guides and implants (Knoops et al., 2019). Augmented reality (AR) can now be used to superimpose images of structures on the surface of an organ, providing the surgeon with a better view, for instance, of a tumor to be removed (Gumbs et al., 2021). AR and virtual reality (VR) technologies are also used in the training of both medical students and surgeons. Robots are also used in actual surgery. Robotic surgery usually means that a surgeon uses a robotic tool that she controls with her own hand movements, but with elimination of tremor. Robotic surgery is particularly useful in minimally invasive surgery and in surgery requiring a high degree of precision.

The future of robotic surgery is often discussed in terms similar to those for self-driving vehicles, with a series of levels with decreasing human influence (O’Sullivan et al., 2019). Currently, robotic surgery is on a low level on this scale, with full continuous human control. However, some surgical subtasks, such as suturing and resection may be possible to automate at a higher level in the near future (Ficuciello et al., 2019).

Patient safety is an urgent issue in robotic surgery. One major concern is that if robotic surgery is performed at a distance (telesurgery), then disruption of the connection can endanger the patient (O’Sullivan et al., 2019). Plans and training for dealing with such situations are therefore a sine qua non. Another serious challenge is that robotic surgery can be hacked, potentially by intruders intending to harm the patient (Bonaci et al., 2015).

In the long run, if automated robotic surgery replaces much of conventional surgery, it may result in the deskilling of surgeons. This may have negative consequences for patients, just like the potential deskilling of radiologists and pathologists discussed above (Ficuciello et al., 2019). Future robotization of surgery may also necessitate a reconsideration of the responsibility for surgical failures, perhaps leading to a shift in responsibility from surgeons to manufacturers of surgical robots and associated software (O’Sullivan et al., 2019).

4.2 Exoskeletons and Wearable Robots

Exoskeletons and wearable robots are currently mostly used as training devices in gait training, balance training, and other training of the lower or upper limbs. For instance, an exoskeleton can help the patient with balance so that she can focus more on gait (Read et al., 2020). The exoskeleton can be used as a tool by a physiotherapist, or in some cases by the patient in a session not led by a physiotherapist (Monoscalco et al., 2022). In the future, exoskeletons or implants with the same function can potentially be used permanently, for instance to enable a patient to walk. Such permanently worn devices and exoskeletons can be essential for the patient’s ability to live a normal life. They can also have a large influence on the users’ perception of themselves. It will therefore be unethical to withdraw the patient’s access to the technology (Gilbert et al., 2023; Greenbaum, 2016; Bissolotti et al., 2018; Hansson, 2021). This can create problems in healthcare systems where a patient’s continued access to healthcare measures depends on her private economic situation. For instance, if unemployment, through such economic mechanisms, leads to loss of the ability to walk, then the healthcare system can be seen as lessening the patient’s chances of future employment as well as substantially decreasing the patient’s overall quality of life.

The safety of exoskeletons is a large and complex issue. There are considerable cybersecurity concerns. Furthermore, technical failures of wearable robots can lead to problematic movements; patients can even risk toppling over and injuring themselves and/or others. In addition, some users may develop technological over-reliance, which can lead to potentially dangerous movements (Kapeller et al., 2020). Finding satisfactory solutions to these and other safety issues is probably the most important obstacle that needs to be overcome in order to make the introduction of exoskeletons and wearable robots into routine healthcare ethically acceptable.

4.3 Virtual Psychotherapy

AI programs have been developed that function as virtual psychotherapists, programmed to help users to deal with their emotions and to reduce anxiety. Studies indicate that such programs can reduce depression and anxiety and support continued use of online exercises (Sachan, 2018; Fitzpatrick et al., 2017). However, criticism has been raised against non-human psychotherapy. Studies have shown that the relationship between patient and healthcare professional is particularly important in mental healthcare, and the quality of this relationship is important for treatment outcomes (Torous & Hsin, 2018). On the other hand, some authors have claimed that virtual psychotherapy can have advantages over human psychotherapists. It has been argued that there are “clear benefits of having a virtual or robotic therapist that is always accessible, has endless amounts of time and patience, never forgets what a patient has said, and does not judge” (Fiske et al., 2019, p. 4). There is an obvious need for a critical discussion of this list of advantages. For instance, it is not necessarily beneficial for a patient to spend “endless” time with a (human or robotic) therapist.

Virtual psychotherapy will probably be much less expensive than therapy provided by a human therapist. If virtual psychotherapy is offered only to patients for whom evidence-based criteria give reasons to expect a positive therapeutic outcome, then this use of artificial intelligence can make a positive contribution to healthcare. For example, it could allow more people to get help and shorten queues. There may also be patients who prefer virtual psychotherapy because they consider it shameful or stigmatizing to meet a human psychotherapist. However, there is also an obvious risk that virtual psychotherapy will be offered on overly broad criteria. In healthcare systems where access to care depends on the patient’s economic situation, there is a considerable risk that only affluent parts of the population will have access to a psychotherapist, whereas the poor will have access only to low-quality virtual psychotherapy.

Virtual psychotherapy will take place online, which means that electronic communications will be used. Such communications can be accessed through various forms of electronic eavesdropping. Therefore, all use of this technology requires careful management of security issues in order to uphold privacy and prevent unauthorized access to sensitive private information.

Unless strict regulations are introduced, unvalidated and low-quality psychotherapeutic or “self-help” services sold directly to consumers can become a mental health problem. Direct-to-consumer services give rise to ethical problems concerning safety, effectiveness, accountability and the protection of user data (Martinez-Martin & Kreitmair, 2018; Fiske et al., 2019). It should also be mentioned that if psychotherapeutic apps are provided on a market, then the providers of these apps will have an economic incentive to make users dependent on the product, so that they do not stop buying it. Such dependence can run contrary to the patient’s interest in reaching a stage at which psychotherapy is no longer needed. On the other hand, psychotherapy that combines the use of apps with consultations with a human psychotherapist seems to be a promising approach (SOPHIA project at Karolinska Institutet, https://sophia.ki.se/).

4.4 Therapeutic Robots

Robots are already in use for a wide range of therapeutic purposes. For instance, robots can remind patients of medicines to be taken and physical exercises to be performed. Anthropomorphic robots can also show the patient the movements in physical exercises. Positive results have been obtained for instance in orthopedic and cardiological rehabilitation (Vasco et al., 2022). Robots have also turned out to be useful in the rehabilitation of patients with cognitive problems. They can for instance perform highly repetitive cognitive tasks such as memory games (Yuan et al., 2021; Apostolova & Lanoix, 2022). Children with autism spectrum disorder have sometimes found it easier to interact with social robots than with humans. Possibly, robots can be used to help them develop social competence (Scassellati et al., 2012).

There are situations in which some patients prefer robots to human caregivers, in particular situations involving dignity, modesty or shame. This includes personal hygiene, but may also include help with eating (Palmer & Schwan, 2022). A patient who is unable to feed herself may refrain from eating on occasions when she would have liked to do so, in order not to have to ask for assistance (Palmer & Schwan, 2022). Using a robot can be less embarrassing than asking a caregiver for help. In general, robots and other assistive technology can reduce a person’s dependence on others for help. This can lead to increased independence, autonomy and self-determination (Deutscher Ethikrat, 2020).

However, the use of therapeutic robots can also give rise to ethical problems. Some patients, including many patients with severe dementia, may not be able to give informed consent to the use of a robot (Van Wynsberghe & Li, 2019). The use of robots can also be associated with considerable risks. Like all technology that makes use of digital communication, therapeutic robots come with cybersecurity vulnerabilities, such as the risk of privacy infringement and loss of control over the robot. A hacker who manages to take over control of a robot can potentially harm the patient (Monoscalco et al., 2022). It should also be observed that in contrast to industrial robots, which can be programmed to complete a movement irrespective of the resistance they encounter, therapeutic robots must be programmed to avoid any movements that can be harmful to humans. For instance, it must be impossible for a robot that feeds a patient or helps her with personal hygiene to run amok and harm the patient (Deutscher Ethikrat, 2020). Safety considerations can in some cases require that traditional programming rather than machine learning be used to program a robot.

Efficient use of therapeutic robots often requires that the robot monitors and documents the patient’s behaviour and activities in a detailed way. It is essential that patients know when and where their activities are recorded, and that they give informed consent to such monitoring (Deutscher Ethikrat, 2020). Robot parameters may also have to be personalized in order to adjust to the individual’s habits and capacities (Iosa et al., 2016). There is an obvious risk that such personalization can become manipulative, for instance by trying to persuade a patient to behave in ways she does not want to. A particularly sensitive issue is the extent to which a robot should report to the healthcare provider a patient’s non-compliance with recommendations regarding, for example, medication and physical exercise. Without informed consent, such reporting would seem to be a violation of patient autonomy that would be difficult to defend. In general, the personalization of robots should always be discussed with patients, and be subject to informed consent.

There is a risk that patients will overestimate the robot’s capacities. For instance, a patient who overestimates the medical knowledge of the robot may choose to tell only the robot, and not any human caregiver, about a health issue that needs to be attended to (Langer et al., 2019). It is essential that patients are well informed about the capacities and limitations of robots, and that robots are not programmed to behave and respond in ways that encourage overreliance. This is particularly important for users with cognitive disabilities, who may have difficulties in understanding what a robot is (Fiske et al., 2019). There is also a risk that patients develop feelings of attachment to a robot. To avoid this, anthropomorphic features should be limited to what is necessary for the purpose of the robot (for instance, the ability to demonstrate movements).

A special type of therapeutic robot is the robot pet. The most common versions are robotic cats, dogs and seals, all developed to resemble a cuddly animal. They move their heads and make endearing sounds in response to being touched (stroked or brushed) by the users. While studies have shown increased well-being and reduced anxiety, particularly in elderly patients with dementia, this technology is not without ethical risks. It can be difficult for a cognitively impaired patient to understand and remember that the “pet” is a robot. If a robot pet monitors and collects data about the patient, then this raises privacy-related questions. There is also a risk that the patient will get less human interaction, either because they are more comfortable with their robot or because care staff have to prioritize other tasks. This can lead to further isolation and undermine human dignity.

Therapeutic robots that communicate with users can to some extent enrich their lives. However, there is a risk that although robots have such immediate effects, they may have negative long-term effects by reducing less superficial contacts with humans (Apostolova & Lanoix, 2022). Long-term evaluations are essential. Individuals can potentially underestimate their need of human contact, and overestimate the degree to which it can be replaced by interaction with robots. It is essential that therapeutic robots are used to complement and improve the care that healthcare professionals offer, and not to replace it (Apostolova & Lanoix, 2022).

5 Documentation and Communication

In addition to the uses of digital technology that are specific to healthcare, which we have summarized above under the headings diagnosis, treatment decisions, and treatment, digital technologies are also used in healthcare for purposes that are similar to uses in other sectors, such as documentation, archiving, and communication. The ethical requirements on these activities are largely the same in healthcare as in other social sectors. For instance, cybersecurity and the protection of privacy-sensitive information are essential not only in healthcare but also in many other sectors of society. However, there are at least two aspects of medical records that relate specifically to clinical ethics, and do not seem to be common in sectors other than healthcare.

One of these is the proposal to create medical records automatically. Currently, physicians spend a large part of their time adding information to electronic health records. Reducing this workload would leave more time for work with patients. There are several ways in which the time spent on documentation can be reduced. Efficient speech recognition and transcription software can make a big difference. In the US, much physician time is also spent creating billing information that would not be needed in a single-payer system.

Recently, several authors have proposed the use of a “digital scribe” that captures the clinician-patient conversation and produces documentation based on the information obtained from that conversation (Quiroz et al., 2019; Lin et al., 2018). However, an account of the clinician’s conversation with the patient cannot replace the traditional medical record. In a medical record, notes are made of information that physicians may need in future encounters with the patient. This includes medical details that are of little or no interest to the patient, such as precise descriptions in medical language of findings in the physical examination. On the other hand, much of the conversation with the patient may consist of information on a disease and its treatment options that is common knowledge among physicians and need not be included in the medical record. Consequently, a digital scribe might retain both too much and too little information.

Pierson et al. identify a number of potential advantages of using large language models (LLMs) to, for example, detect biased language in clinical notes and other medical texts, and to propose ways to rewrite the texts so as to avoid further entrenching negative stereotypes: “LLMs have important equity-promoting applications in healthcare and more broadly: improving detection of bias, creating structured databases of equity-relevant information, increasing equity of access to information, and improving equity in matching systems.” (Pierson et al., 2023).

The other aspect of medical records has already been mentioned: the use of a large number of individual patient records as “big data” for machine learning. Potentially, useful AI tools for diagnosis and treatment recommendations can be created in this way. However, this gives rise to at least three important ethical issues. The first is informed consent. Should informed consent be required from all the patients whose data are used? If so, should consent be sought for each use of the data, or is a blanket consent for “medical research and development” sufficient? The second problem is the one-sided commercialization that has already been seen in the use of medical “big data”: commercial actors receive big data for free from public hospitals, but the hospitals (and ultimately the taxpayers) have to pay considerable sums for the tools that have been developed on the basis of the information that they provided for free. The third problem is privacy. Clinical records often contain information about age, current and previous occupations, residential area, etc. that can be used to identify individuals even if their names are removed from the data file. Much of this information can have aetiological relevance, and will therefore not be removed from the file. Consequently, data transferred to private companies for the training of AI tools will be highly privacy-sensitive. The experience we have of the handling of privacy-sensitive information in large commercial organizations is not reassuring (Hinds et al., 2020; Zuboff, 2019). Therefore, strict regulatory oversight is required to ensure that data transferred for use in the creation of AI tools are safely kept and not used or made available for any other purpose.
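
To illustrate the re-identification concern, the following sketch performs a simple k-anonymity style check on a hypothetical de-identified record set: it counts how many records share each combination of quasi-identifiers (here age, occupation and residential area, all invented for illustration). Records that are unique on this combination correspond to individuals who may be re-identifiable even though their names have been removed.

```python
# Minimal sketch of a re-identification (k-anonymity) check on a hypothetical
# de-identified data set with invented quasi-identifier columns.
import pandas as pd

records = pd.DataFrame({
    "age": [47, 47, 63, 63, 29],
    "occupation": ["nurse", "nurse", "welder", "welder", "teacher"],
    "area": ["North", "North", "South", "South", "North"],
    "diagnosis": ["asthma", "diabetes", "COPD", "asthma", "depression"],
})

quasi_identifiers = ["age", "occupation", "area"]
group_sizes = records.groupby(quasi_identifiers).size()

print("smallest group size (k):", group_sizes.min())
print("records unique on the quasi-identifiers:")
print(group_sizes[group_sizes == 1])
```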

6 Conclusion

There can be no doubt that AI and ML have the potential to improve healthcare in many ways. Such technology can provide us with quicker and better diagnosis, as well as improved and more personalized treatments. However, the clinical introduction of these technologies can also give rise to a wide variety of ethical problems, connected for instance with privacy, autonomy, informed consent, explainability and transparency, bias, justice, patient safety, security, dignity, human contact and empathy, responsibility, accountability, technology complacency, and risks of deskilling.

All these are issues that need to be carefully analyzed for each new diagnostic or therapeutic procedure before it is introduced into the clinic. They must also be followed up in post-introduction evaluations. To conclude, we would like to emphasize two overarching challenges that need to be considered not only by healthcare personnel but also by the decision-makers who allocate resources and make other decisions that determine patients’ access to healthcare.

The first of these challenges is to retain human contact, which is central to high-quality care. In the short term, money can often be saved by reducing the time that healthcare professionals spend with patients, and AI offers many opportunities to reduce that time. A robot instead of a physiotherapist can instruct and support a patient in performing various exercises. Psychotherapy can be performed by a virtual instead of a human psychotherapist. A robot instead of a human can help a patient with activities of daily life, such as eating and personal hygiene. Patients can be referred to a health app instead of being offered contact with a nurse or other health professional when they need advice. In these and many other cases, both the patients themselves and the caregivers can underestimate the need for personal contact and care. Since care needs and care situations are so different, it is impossible to generalize. What is clear, however, is that AI can both greatly improve and greatly undermine the quality of care. It has often been emphasized that AI programs should be tools used by healthcare professionals, rather than replacements for these professionals. This principle can only be upheld in practice if higher-level decision-makers realize the dangers of depersonalizing healthcare.

The other challenge is fairness in access to high-quality healthcare. There is an obvious risk that less resourceful people will use inexpensive health apps when they would have needed contact with a healthcare professional. If virtual psychotherapy becomes available at low cost, then this may lead to a development where virtual therapy is the only option for those with limited financial resources, whereas the well-to-do can afford to see a human psychotherapist. In poor countries, AI-based eye screening may have to be performed without sufficient access to an ophthalmologist, and AI-based skin screening without the possibility to refer patients to a dermatologist. It is important to realize that universal and fair access to healthcare is achievable in all countries, but nowhere does it come automatically. Resources have to be allocated, and mechanisms for ensuring high quality need to be in place (Hansson, 2022). In this perspective, it is essential that AI tools are introduced in ways that ensure that their use is determined by patients’ needs, and not by their status or resources.

7 Overview of the Special Issue

In “Justice and empowerment through digital health: ethical challenges and opportunities”, Philip Nickel, Iris Loosman, Lily Frank and Anna Vinnikova discuss what it takes to achieve justice and empowerment in the introduction of digital health technology. Based on ethical considerations and conceptual analysis, they warn against interpretations of empowerment that presuppose an unrealistic capacity to manage one’s own health. Instead, they investigate conceptualisations of empowerment that focus on improving health literacy and access for those who are now underprivileged in these respects. Their major conclusion is that in order to realize the ethical values underlying empowerment, it is necessary to focus on how AI can be used to benefit underserved and difficult-to-reach populations of patients for both care and prevention.

“No justice without (relational) autonomy? Rethinking the digital empowerment rhetoric,” is a comment by Michiel De Proost and Jesse Gray on the article by Nickel and coworkers. They argue that the concept of autonomy could be better understood in relational terms, and propose the use of Capability Sensitive Design as a means to achieve empowerment through digital health. This is followed by a response by Nickel et al.

In “Responsibility gaps and Black Box Healthcare AI: Shared Responsibilisation as a Solution”, Benjamin H. Lang, Sven Nyholm and Jennifer Blumenthal-Barby investigate the problems that can be associated with responsibility gaps in the use of AI in healthcare and its administration. A responsibility gap can emerge if responsibility for errors committed by an artificial agent cannot be assigned to any human agent. The authors argue that responsibility gaps are created by so-called black box AI in healthcare. In order to avoid the ethical problems that such gaps can give rise to, they propose that the relevant stakeholders should voluntarily responsibilize the gaps. This will mean that stakeholders take moral responsibility for events that they are not blameworthy for. Such responsibilization should be shared among several institutions and persons, such as companies producing the AI, healthcare institutions, programmers and healthcare personnel.

In “Examining ethical and social implications of digital mental health technologies through expert interviews and sociotechnical systems theory”, Jonathan Adams reports a study based on semi-structured interviews with experts in the United Kingdom working with digital mental health. Their expertise covers a wide variety of mental health technologies, including Internet- and app-based services, remote monitoring, and virtual reality for exposure therapy. The interviews explored how the interviewees assess the ethical and social implications of these technologies. Several of them emphasized that digital mental health can be counterproductive and aggravate serious risks such as that of suicide. Problems of responsibility were also seen as serious. Equity in access to mental health services was another important theme in these interviews. The respondents pointed out possible inequities originating both in bias inherent in the technologies and in the digital divide. Other topics often brought up in the interviews were information security and possible sale of individual data to insurance companies. Several respondents warned against potential infringement on individual rights that can arise when mental health is managed in public-private partnerships.

In “Commercial mHealth Apps and the Providers’ Responsibility for Hope”, Leon Walter Sebastian Rossmaier, Yashar Saghai, and Philip Brey study commercial mobile health apps. Such apps measure health-related parameters and give health-promoting advice. The authors show that marketing campaigns for some of these apps create inflated and/or false hopes of positive health effects. Vulnerable groups that are already underserved by healthcare institutions are particularly at risk. In some cases, the apps can even have negative health effects. For instance, weight reduction apps can contribute to eating disorders. The authors propose that providers of health apps should adopt a more responsible approach to their marketing messages and to the potential negative effects that their apps can have on users.

In “Digital Pathology: Scanners and Contextual Integrity”, Tom Sorell and Ricky Z. Li extend Helen Nissenbaum’s theory of privacy in information transfer and apply it to digitalized pathology. Their extension of the theory transforms it to a more general theory of ethically appropriate information transfer. The extended theory includes other moral aspects of information transfer than privacy, such as a data subject’s interest not to be targeted in marketing campaigns. Applying the extended theory to digitalized pathology, the authors distinguish between two uses of digitalization. One of these is the use of digital images in patient-by-patient clinical work. The other is the use of digital images from many patients as datasets for creating an algorithm. Due to anonymization of the data, the latter use is usually less problematic than the former with respect to privacy. However, it can give rise to other ethical problems for which a wide approach that goes beyond privacy concerns is necessary. Traditional medical ethics is focused on data from individual patients and needs to be amended to cover business transactions involving large sets of patient data.

In “Making AI’s impact on pathology visible: using ethnographic methods for ethical and epistemological insights”, Megan Milota, Jojanneke Drogt and Karin Jongsma show how ethnographic methods, including ethnographic filming, can be used to elucidate ethical and social aspects of the use of artificial intelligence in a digital diagnostic workflow. They have conducted an ethnographic study of a clinical pathology department in order to better understand how AI can impact the tasks and responsibilities of pathologists and laboratory technicians. Their results indicate that AI-supported diagnosis will continue to rely on the artisanal expertise of the professionals who process and analyze the tissue samples. They also raise the question whether ethnographic methodology can be useful in studies of AI in other branches of medicine.

In “Human-Curated Validation of Machine Learning Algorithms for Health Data”, Magnus Boman discusses how medical AI should be validated. He describes the two major perspectives that can be applied to this issue, namely that of the machine learning developer and that of the clinician. As Boman points out, many machine learning developers are not knowledgeable about the rather strict and specific standards of external validation that are required in medical research, including for instance randomized double-blinded trials and preregistered trial protocols. He also notes that many commercial products based on machine learning have been produced in ways that do not hold up to scientific scrutiny. Much work is needed to bridge the gap between the validation traditions in the two areas. Boman proposes a checklist (“laundry list”) for clinicians that can facilitate their decisions on validation and their communication with machine learning developers.

In “Policy guidelines for smart sanitation technology as a public health tool”, Maria Carnovale discusses the use of devices such as biosensors and visual sensors to analyze wastewater. This technology can be used to detect biomarkers of infectious diseases and some malignant diseases. It can also be employed to assess the consumption of both legal substances, such as antibiotics and other medical drugs, and illegal drugs, such as cannabis, opioids and cocaine. Such sensors can be placed in private homes or at places in the sewage system that receive wastewater from the general population of an area. This technology can potentially contribute to early discovery of outbreaks of infectious diseases and to early diagnosis of cancer. However, depending on the placement of the sensors, it can also result in problematic privacy intrusions and contribute to the stigmatization of people living, for instance, in areas with high use of illegal drugs. Carnovale proposes policy guidelines to ensure that the use of these technologies is ethically defensible.