

BJD

REVIEW ARTICLE British Journal of Dermatology

What is AI? Applications of artificial intelligence to dermatology*
X. Du-Harpur,1,2,3 F.M. Watt,1 N.M. Luscombe2,4 and M.D. Lynch1,3
1 Centre for Stem Cells and Regenerative Medicine, Faculty of Life Sciences and Medicine, King’s College London, 28th Floor, Tower Wing, Guy’s Hospital, London SE1 9RT, UK
2 The Francis Crick Institute, 1 Midland Road, London, UK
3 St John’s Institute of Dermatology, Guy’s Hospital, London, UK
4 Okinawa Institute of Science and Technology Graduate University, Okinawa 904-0495, Japan

Summary

Downloaded from https://academic.oup.com/bjd/article/183/3/423/6748151 by guest on 07 November 2024


In the past, the skills required to make an accurate dermatological diagnosis have required exposure to thousands of patients over many years. However, in recent years, artificial intelligence (AI) has made enormous advances, particularly in the area of image classification. This has led computer scientists to apply these techniques to develop algorithms that are able to recognize skin lesions, particularly melanoma. Since 2017, there have been numerous studies assessing the accuracy of algorithms, with some reporting that the accuracy matches or surpasses that of a dermatologist. While the principles underlying these methods are relatively straightforward, it can be challenging for the practising dermatologist to make sense of a plethora of unfamiliar terms in this domain. Here we explain the concepts of AI, machine learning, neural networks and deep learning, and explore the principles of how these tasks are accomplished. We critically evaluate the studies that have assessed the efficacy of these methods and discuss limitations and potential ethical issues. The burden of skin cancer is growing within the Western world, with major implications for both population skin health and the provision of dermatology services. AI has the potential to assist in the diagnosis of skin lesions and may have particular value at the interface between primary and secondary care. The emerging technology represents an exciting opportunity for dermatologists, who are the individuals best informed to explore the utility of this powerful novel diagnostic tool, and facilitate its safe and ethical implementation within healthcare systems.

What is already known about this topic?
• There is considerable interest in the application of artificial intelligence to medicine.
• Several publications in recent years have described computer algorithms that can diagnose melanoma or skin lesions.
• Multiple groups have independently evaluated algorithms for the diagnosis of melanoma and skin lesions.

What does this study add?
• We combine an introduction to the field with a summary of studies comparing dermatologists against artificial intelligence algorithms with the aim of providing a comprehensive resource for clinicians.
• This review will equip clinicians with the relevant knowledge to critically appraise future studies, and also assess the clinical utility of this technology.
• A better informed and engaged cohort of clinicians will ensure that the technology is applied effectively and ethically.

Correspondence
Xinyi Du-Harpur.
Email: xinyi.du@kcl.ac.uk

Accepted for publication
14 January 2020

Funding sources
X.D-H. is the recipient of an Accelerator Award from Cancer Research UK. F.M.W. gratefully acknowledges financial support from the UK Medical Research Council (MR/PO18823/1), the Biotechnology and Biological Sciences Research Council (BB/M007219/1) and the Wellcome Trust (206439/Z/17/Z). This work was supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (FC010110), the UK Medical Research Council (FC010110) and the Wellcome Trust (FC010110). N.M.L. is a Winton Group Leader in recognition of the Winton Charitable Foundation’s support towards the establishment of the Francis Crick Institute. N.M.L. is additionally funded by a Wellcome Trust Joint Investigator Award (103760/Z/14/Z), the MRC eMedLab Medical Bioinformatics Infrastructure Award (MR/L016311/1) and core funding from the Okinawa Institute of Science & Technology Graduate University. M.D.L. gratefully acknowledges financial support from the Wellcome Trust (211276/E/18/Z).

Conflicts of interest
X.D-H. has provided advice for the company Skin Analytics Ltd. F.M.W. is currently on secondment as Executive Chair of the Medical Research Council.

*Plain language summary available online

DOI 10.1111/bjd.18880

© 2020 The Authors. British Journal of Dermatology British Journal of Dermatology (2020) 183, pp423–430 423
published by John Wiley & Sons Ltd on behalf of British Association of Dermatologists
This is an open access article under the terms of the Creative Commons Attribution License, which permits use,
distribution and reproduction in any medium, provided the original work is properly cited.

In the last decade, a combination of novel computational approaches, increases in available computing capacity and availability of training data has facilitated the application of powerful mathematical algorithms in the field of artificial intelligence (AI). This has led to dramatic advances in the performance of computers in tasks that have previously only been possible for humans. Methods that can make predictions of data without direct human intervention in the training process are referred to as machine learning. Image classification has been at the forefront of machine learning research, and as visual pattern recognition plays a larger role in dermatology than perhaps in any other medical specialty, early clinical applications of machine learning have been within this specialty.

What are artificial intelligence and machine learning?

AI is difficult to define precisely. In Alan Turing’s seminal paper ‘Computing machinery and intelligence’, he proposed the well-known Turing test, whereby a machine is deemed intelligent if it is indistinguishable from a human in conversation by an impartial observer.1 In modern parlance, artificial general intelligence refers to the ability of a machine to communicate, reason and operate independently in both familiar and novel scenarios in a similar manner to a human. This remains far beyond the scope of current methods and is not what is being referred to when the term ‘AI’ is commonly used. The term ‘AI’ is now often used interchangeably with ‘machine learning’ or ‘deep learning’, the latter being a specific form of machine learning that is discussed in more detail below (see Table 1 for a glossary of terms). Machine learning refers to algorithms and statistical models that learn from labelled training data, from which they are able to recognize and infer patterns (Figure 1).

Generally, during the training of a machine learning model a subset of the data is ‘held back’ and then subsequently used for testing the accuracy of the trained model. The accuracy of the model is assessed on this test dataset according to its accuracy in correctly matching an image to its label, for example melanoma or benign naevus. In any classification system there will be a trade-off between sensitivity and specificity; for example, an AI system may output a probability score for melanoma between 0 and 1, and this would require the operator to set a threshold for the decision boundary. At a low threshold, a higher proportion of melanomas will be captured (high sensitivity) but there is a risk of classifying benign naevi as malignant (low specificity). As the threshold is increased, this would decrease the sensitivity, but increase the specificity (i.e. fewer benign naevi classified as melanoma). The behaviour of a machine learning classifier in response to changing the threshold can be visualized as a receiver operating characteristic (ROC) curve. The greater the area under the curve, the more accurate the classifier (Figure 2).

Table 1 Essential terminology in the field of machine learning and artificial intelligence

Term | Definition
Artificial intelligence (AI) | The ability of machines, such as computers, to simulate human intelligence
Machine learning | Algorithms and statistical models that are programmed to learn from data, therefore recognizing and inferring patterns within them. This enables computers to perform specific tasks without explicit instructions from a human operator
Supervised learning | Refers to machine learning tasks whereby the goal is to identify a function that best maps a set of inputs (e.g. image) to their correct output (label). This is based on learning or training on prematched pairs. This is in contrast to unsupervised learning, where novel patterns such as groups or ‘clusters’ are identified in data without influence from prior knowledge or labelling
Overfitting | A common problem in machine learning where the model has high accuracy when tested on data from the same source as its training data, but its performance does not generalize to novel sources of data
Neural network | A form of supervised machine learning inspired by biology whereby data pass through a series of interconnected neurons, which are individually weighted to make predictions. During training, the data pass through the network in an iterative manner and the weightings are continually adjusted to optimize its ability to match label to data
Deep learning | Refers to a neural network with multiple layers of ‘neurons’ that have adjustable weights (mathematical functions)
Convolutional neural network | Refers to a type of neural network whereby the layers apply filters for specific features to areas within an image

Deep learning and neural networks

Neural networks (Figure 3) pass input data through a series of interconnected nodes (analogous to biological neurons). Each node functions as a mathematical operation (addition, multiplication, etc.), and a group of interconnected nodes within the network is referred to as a ‘layer’, with the overall structure of the layers being referred to as the ‘architecture’. During training, every node is adjusted and optimized through an iterative process called ‘backpropagation’,2,3 allowing the neural network to improve its classification accuracy.

Neural networks with multiple ‘hidden layers’ of nodes (Figure 3) are referred to as ‘deep’ neural nets and perform ‘deep learning’. Although the concept of deep neural networks was described decades ago, lack of affordable and efficient


[Figure 1 panels: TRAINING (labelled training data, ‘Malignant’ and ‘Benign’) feeds the algorithm; TESTING (unlabelled test data) yields a predicted label, ‘Malignant’ or ‘Benign’.]


Figure 1 Schematic depicting how a machine learning algorithm trains on a large dataset to be able to match data to label (supervised learning),
the performance of which can then be assessed.
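
The train-then-test workflow in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method used in any of the studies reviewed: the data are synthetic (a single hypothetical numeric 'feature' per lesion rather than an image), the model is a single logistic unit trained by gradient descent rather than a deep network, and all function names are invented for the example.

```python
import math
import random


def sigmoid(z):
    """Squash a real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def make_synthetic_lesions(n, rng):
    """Hypothetical data: one numeric feature per lesion; label 1 = malignant."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Assumption: malignant features cluster around +2, benign around -2.
        feature = rng.gauss(2.0 if label else -2.0, 1.0)
        data.append((feature, label))
    return data


def train(examples, epochs=200, lr=0.1):
    """Fit a single weight and bias by repeated small corrections."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in examples:
            p = sigmoid(w * x + b)
            # Gradient of the log loss: (p - y) * x for w, (p - y) for b.
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b


def accuracy(examples, w, b):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum((sigmoid(w * x + b) >= 0.5) == bool(y) for x, y in examples)
    return correct / len(examples)


rng = random.Random(0)
data = make_synthetic_lesions(400, rng)
train_set, test_set = data[:300], data[300:]  # 100 examples are 'held back'
w, b = train(train_set)
test_accuracy = accuracy(test_set, w, b)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

The point of the split is that `test_accuracy` is computed only on examples the model never saw during training; as discussed later in the review, a test set drawn from the same source as the training set can still overstate real-world performance.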

[Figure 2 labels: sensitivity = true positive rate; specificity = true negative rate; receiver operating characteristic curve with area under the curve (AUROC); dashed line = perfect performance (100% sensitivity, 100% specificity); axis label: Sensitivity.]

Figure 2 Schematic of a receiver operating characteristic (ROC) curve, which is a way of visualizing the performance of a trained model’s sensitivity and specificity. Typically, machine learning studies will use ROC curves and calculations of the area under the curve (AUC or AUROC) to quantify accuracy. The dashed line represents the desired perfect performance, when sensitivity and specificity are both 100%; in this scenario, the AUC would be 1.0. In reality, there is a trade-off between sensitivity and specificity, which gives rise to a curve.
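
The relationship between threshold, sensitivity, specificity and AUC described in this caption can be made concrete with a short sketch. The scores and labels below are invented toy values (label 1 = melanoma, 0 = benign naevus), and the helper names are hypothetical; the area is estimated with the trapezoidal rule, which is one common choice rather than the method of any particular study.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Classify score >= threshold as melanoma and return (sensitivity, specificity)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(scores, labels):
    """Sweep the threshold from high to low, collecting (1 - specificity, sensitivity)."""
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        sens, spec = sensitivity_specificity(scores, labels, threshold)
        points.append((1.0 - spec, sens))
    return points


def area_under_curve(points):
    """Trapezoidal estimate of the area under the ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy model outputs (assumed values, not data from any study).
scores = [0.95, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

low = sensitivity_specificity(scores, labels, 0.25)   # low threshold
high = sensitivity_specificity(scores, labels, 0.75)  # high threshold
auc = area_under_curve(roc_points(scores, labels))
print(low, high, auc)
```

With these toy values the low threshold catches every melanoma at the cost of misclassifying half of the naevi, whereas the high threshold does the reverse; a classifier that ranked every melanoma above every naevus would give an area of 1.0.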

computing power was a major limitation in being able to train them effectively. However, in 2013 it was recognized that graphical processing units (GPUs), originally designed for three-dimensional graphics in computer games, could be repurposed to power the repetitive training required for neural networks.4,5 Of note, convolutional neural networks (CNNs) are a specific form of deep learning architecture that have proven effective for the classification of image data. CNNs have massively increased in popularity as a method for computer-based image classification after the victory of the GPU-powered CNN AlexNet in 2012, which won the ImageNet competition with a top 5 error rate of 15.3%, roughly 10 percentage points lower than that of the next best competitor.5

In the past few years, use of CNNs in classification tasks has exploded due to demonstrable and consistently superior efficacy and availability. Novel CNN architectures have been developed, improved and made available for public use by institutions with a high level of expertise and computational resources; examples of these include ‘Inception’ by Google and ‘ResNet’ by Microsoft. These architectures can be accessed using software such as TensorFlow (developed by Google) or PyTorch (developed by Facebook) and then trained further for a specific purpose or used in a novel application. A common approach would be to take a pretrained image recognition network architecture such as Inception, and specialize its application by inputting a specific type of image data. This process is referred to as transfer learning.

The application of convolutional deep learning in dermatology

Classifying data using CNNs is now relatively accessible, computationally efficient and inexpensive, hence the explosion in so-called ‘artificial intelligence’. In medicine to date, the main areas of application have been the visual diagnostic specialties of dermatology, radiology and pathology. Automating aspects of dermatology with computer-aided image classification has been attempted for over 30 years;6–8 however, previous efforts have achieved only limited accuracy. Although attempts have been made in recent years to use neural networks to diagnose or monitor inflammatory dermatoses,9–11 these have generally not been as successful or impressive as the networks constructed to diagnose skin lesions, particularly


[Figure 3 schematic: output classes ‘Melanoma’ and ‘Naevus’.]


Figure 3 Schematic depicting how classification tasks are performed in convolutional neural networks. Pixel data from an image are passed
through an architecture consisting of multiple layers of connecting nodes. In convolutional neural networks, these layers contain unique
‘convolutional layers’, which operate as filters. These filters work because it was recognized that the location of a feature within an image is often
less important than whether that feature is present or absent – an example might be (theoretically) the presence or absence of blue-grey veiling
within a melanoma. A convolutional ‘filter’ learns a particular feature of the image irrespective of where it occurs within the image (represented
by the black squares). The network is composed of a large number of hierarchical filters that learn increasingly high-level representations of the
image. These could in principle learn dermoscopic features similar to those described by clinicians, although in practice the precise features
recognized are likely to differ from classic diagnostic criteria.
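
The idea that a convolutional filter detects a feature irrespective of its position can be illustrated with a toy example. The sketch below is an assumption-laden simplification: the 'image' is a 6 × 6 grid of numbers, the 'feature' is a two-pixel diagonal, and the filter weights are fixed by hand, whereas a real CNN learns its filter weights during training. The function names are hypothetical.

```python
def convolve2d(image, kernel):
    """Slide the kernel over every position of the image ('valid' positions only).

    Strictly this is cross-correlation, which is the operation CNN layers apply.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def place_patch(size, patch, top, left):
    """Return a blank size x size image with the patch pasted at (top, left)."""
    image = [[0] * size for _ in range(size)]
    for a, row in enumerate(patch):
        for b, value in enumerate(row):
            image[top + a][left + b] = value
    return image


def strongest_response(feature_map):
    """Return (value, (row, column)) of the largest filter response."""
    return max(
        (value, (i, j))
        for i, row in enumerate(feature_map)
        for j, value in enumerate(row)
    )


diagonal_feature = [[1, 0], [0, 1]]    # toy 'feature' to be detected
diagonal_filter = [[1, -1], [-1, 1]]   # hand-set weights that respond to it

image_a = place_patch(6, diagonal_feature, top=1, left=1)
image_b = place_patch(6, diagonal_feature, top=3, left=2)

response_a = strongest_response(convolve2d(image_a, diagonal_filter))
response_b = strongest_response(convolve2d(image_b, diagonal_filter))
print(response_a, response_b)
```

The same filter gives the same peak response in both images; only the location of the peak changes. Later layers can therefore register that the feature is present without depending on where in the image it occurred.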

melanoma. Melanoma is therefore the focus of the remainder of this review, and Table S1 (see Supporting Information) summarizes these head-to-head comparison studies.12–21

In 2017, Esteva et al. published a landmark study in Nature that was notable for being the first to compare a neural network’s performance against dermatologists.14 They used a pretrained GoogLeNet Inception v3 architecture and fine-tuned the network (transfer learning) using a dataset of 127 463 clinical and dermoscopic images of skin lesions (subsequent studies have shown it is possible to train networks on significantly smaller datasets, numbering in the thousands). For testing, they selected a subset of clinical and dermoscopic images confirmed with biopsy and asked over 20 dermatologists for their treatment decisions. Dermatologists were presented with 265 clinical images and 111 dermoscopic images of ‘keratinocytic’ or ‘melanocytic’ nature, and asked whether they would: (i) advise biopsy or further treatment or (ii) reassure the patient. They inferred a ‘malignant’ or ‘benign’ diagnosis from these management decisions, and then plotted the dermatologists’ performance on the network’s ROC curves with regards to classifying the keratinocytic or melanocytic lesions (which were subdivided as dermoscopic or clinical) as ‘benign’ or ‘malignant’ (Figure 4a). In both ‘keratinocytic’ and ‘melanocytic’ categories, the average dermatologist performed at a level below the CNN ROC curves, with only one individual dermatologist performing better than the CNN ROC curve in each category. This suggests that in the context of this study, the CNN has superior accuracy to dermatologists.

A recently published large study detailed in two papers by Brinker et al.19,20 involved training a ‘ResNet’ model on the publicly available International Skin Imaging Collaboration (ISIC) database,22 which contains in excess of 20 000 labelled dermoscopic images and is required to meet some basic quality standards. This network was trained on over 12 000 images to perform two tasks: the first was to classify dermoscopic images of melanocytic lesions as benign or malignant (Figure 4b), and the second was to classify clinical images of melanocytic lesions as benign or malignant (Figure 4c). The dermatologists were assessed using 200 test images, with the decision requested mirroring that of the study of Esteva et al.: to biopsy/treat or to reassure. Additionally, the dermatologists’ demographic data, such as experience and training level, were requested.

The method used to quantify the relative performance also consisted of drawing a mean ROC curve by calculating the average predicted class probability for each test image (Figure 4b, c). The dermatologists’ performance for the same set of images was then plotted on the ROC curve. Barring a few individual exceptions, the dermatologists’ performance fell below the CNN ROC curves in both the clinical and dermoscopic image classifications. The authors also used a second approach, whereby they set the sensitivity of the CNN at the level of the attending dermatologists, and compared the mean specificity achieved at equivalent sensitivity. In the dermoscopic test, at a sensitivity of 74.1%, the dermatologists’ specificity was 60% whereas the CNN achieved a superior 86.5%.

As part of an international effort to produce technology for early melanoma diagnosis, in 2016 an annual challenge was established to test the performance of machine learning algorithms using the image database from the ISIC.22 A recent paper by Tschandl et al.21 summarizes the performance of the most recent competition in August to September 2018, and


[Figure 4 panels: (a) Esteva et al.; (b) Brinker et al., dermoscopic images; (c) Brinker et al., clinical images; (d) Tschandl et al.]

Figure 4 Receiver operating characteristic (ROC) curves from studies by Esteva et al.,14 Brinker et al.19,20 and Tschandl et al.21 Most often, the
dermatologists’ comparative ROC curves are plotted as individual data points. Lying below the curve means that their sensitivity and specificity,
and therefore accuracy, are considered inferior to those of the model in the study. The studies all demonstrate that, on average, dermatologists sit
below the ROC curve of the machine learning algorithm. It is noticeable that the performance of the clinicians in Brinker’s studies (b, c), for
example, is inferior to that of the clinicians in the Esteva study (a). Although there is a greater spread of clinical experience in the Brinker studies,
the discrepancy could also be related to how the clinicians were tested. In both Brinker’s and Tschandl’s studies, some individual data points represent performance that is significantly lower than real-world data would suggest, which could indicate that the assessments were biased against clinicians. AUC, area under the curve; CNN, convolutional neural network. All figures are reproduced with permission of
the copyright holders.
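
Whether a reader's data point lies below a model's ROC curve can be checked numerically by fixing the model at the reader's sensitivity and comparing specificities, in the spirit of the matched-sensitivity comparison described for the Brinker studies. The sketch below uses invented scores, labels and reader figures, and a hypothetical helper name; it is not the statistical procedure used in those papers.

```python
def specificity_at_sensitivity(scores, labels, target_sensitivity):
    """Find the highest threshold whose sensitivity reaches the target.

    Returns (threshold, sensitivity, specificity) for the model at that point.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    for threshold in sorted(set(scores), reverse=True):
        sensitivity = sum(1 for s in positives if s >= threshold) / len(positives)
        if sensitivity >= target_sensitivity:
            specificity = sum(1 for s in negatives if s < threshold) / len(negatives)
            return threshold, sensitivity, specificity
    raise ValueError("target sensitivity cannot be reached")


# Toy model outputs (assumed values): label 1 = malignant, 0 = benign.
scores = [0.95, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

# Hypothetical reader operating point, for illustration only.
reader_sensitivity, reader_specificity = 0.75, 0.50

threshold, model_sensitivity, model_specificity = specificity_at_sensitivity(
    scores, labels, reader_sensitivity
)
reader_below_curve = reader_specificity < model_specificity
print(threshold, model_sensitivity, model_specificity, reader_below_curve)
```

In this toy case the model reaches the reader's sensitivity of 0.75 while keeping a specificity of 0.75, against 0.50 for the reader, so the reader's point sits below the curve. The comparison says nothing about whether the test conditions were fair to the reader, which is the concern raised in this caption.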


[Figure 5 flow: History, Examination and Neural network assessment feed into a Decision: Refer urgently, Refer routinely or Reassure.]


Figure 5 Schematic showing hypothetical use of a machine learning algorithm to help nonexpert clinicians risk-stratify lesions to make clinical
decisions. Clinicians routinely weigh up both the benefits and limitations of common diagnostic aids such as prostate-specific antigen or D-dimers.
Currently, there are very few useful dermatological diagnostic decision aids available to nonexpert clinicians, as the diagnostic process is
dominated by image recognition. Convolutional neural networks could represent a new class of decision aid that could help nonexpert clinicians
triage appropriately and narrow down their differential diagnosis.

also compares the performance of the submitted algorithms against 511 human readers recruited from the World Dermoscopy Congress, who comprised a mixture of board-certified dermatologists, dermatology residents and general practitioners (Figure 4d). Test batches of 30 images were generated to compare the groups, with a choice of seven diagnoses as multiple-choice questions provided. When comparing all 139 algorithms against all dermatologists, dermatologists on average achieved 17 out of 30 on the image multiple-choice questions, whereas the algorithms on average achieved 19. As expected, years of experience improved the probability of making a correct diagnosis. Regardless, the top three algorithms in the challenge outperformed even experts with > 10 years of experience, and the ROC curves of these top three algorithms sit well above the average performance of the human readers.

Key biases, limitations and risks of automated skin lesion classification

Given that, remarkably, all of the published studies indicate superiority of machine learning algorithms over dermatologists, it is worth exploring the biases commonly found in these study designs. These can be categorized into biases that favour the networks and biases that disadvantage clinicians. With regards to the first category, it is first worth noting that in the studies described, the neural networks were generally trained and tested on the same dataset. This closed-loop system of training and testing highlights a common concern within machine learning, that of ‘generalizability’. On the occasions that generalizability has been tested, neural networks have often been found lacking. For example, Han et al. released their neural network, which was a Microsoft ResNet-152 architecture trained on nearly 20 000 skin lesion images from a variety of sources, as a web application.15 When Navarrete-Dechent et al. tested the network on data from the ISIC dataset, which the network had not previously been exposed to, its performance dropped from a reported area under the curve of 0.91, to achieving the correct diagnosis in only 29 out of 100 lesions, which would imply a far lower area under the curve.23 As algorithms are fundamentally a reflection of their training data, this means that if the input image dataset is biased in some way, this will have a direct impact on algorithmic performance, which will only be apparent when they are tested on completely separate datasets.

Another important limitation of the methodology used to compare AI models with dermatologists is that ROC curves, although a useful visual representation of sensitivity and specificity, do not address other important clinical risks. For example, in order to capture more melanomas (increased sensitivity), the algorithm may misclassify more benign naevi as malignant (false-positives). This could potentially lead to unnecessary biopsies for patients, which aside from patient harm would create additional demand on an already burdened healthcare system. There is evidence that dermatologists have improved ‘number needed to biopsy’ metrics for melanoma in comparison with nondermatologists.24 The reporting of number needed to biopsy would be a useful addition to studies such as that of Esteva et al.,14 as it would aid in the estimation of potential patient and health economic impact.

It is also worth noting that these datasets are retrospectively collated and repurposed for image classification training; this means that the images captured may not be representative in terms of the proportion of diagnoses, or in terms of having typical features. As neural networks are essentially a reflection of their labelled data input, this will undoubtedly have consequences on how they perform. However, given the lack of


‘real-world’ studies, it is difficult to know how significant this is. When it comes to assessing clinicians using images from these datasets, this may also introduce an element of bias that disadvantages clinicians too, as lesions that were deemed worthy of capturing via photograph or being biopsied may not be representative of the lesion type. As a result, the diagnostic sensitivity of clinicians may be lower than in a normal clinic. This hypothesis for discrepancy in diagnostic accuracy was borne out in a recent Cochrane review, where the diagnostic sensitivity of dermatologists examining melanocytic lesions with dermoscopy was 92%,25 which is significantly higher than typically found in neural network studies. For example, in Tschandl et al.’s web-based study of 511 clinicians, the sensitivity of experts was 81.2%.22 The manner in which clinical decisions are inferred as ‘benign’ or ‘malignant’ also makes some assumptions that may not be accurate; for example, a dermatologist’s decision to biopsy a lesion is a reflection of risk, not an outright ‘malignant’ classification.

From a safety perspective, there are two considerations that have yet to be addressed in the studies. Firstly, in order to ‘replace’ a dermatologist, an algorithm must be able to match the current gold standard for screening a patient’s skin lesions. Currently, this is a clinical assessment by a dermatologist, who examines the lesion in the context of patient history and the rest of their skin. Published studies do not compare neural networks against this standard of assessment; they are only compared with dermatologists presented with dermoscopic or clinical images, sometimes with limited additional clinical information. Not only does this bias the studies against dermatologists, who are not trained or accustomed to making diagnoses without this information, it also represents a limiting factor in justifying their deployment in a clinical setting as a replacement for dermatologists. Fundamentally, it has not yet been demonstrated that they are equivalent to the standard of dermatological care currently provided to patients. A second important consideration is the fact that training data lack sufficient quantities of certain types of lesions, particularly the rarer presentations of malignancy, such as amelanotic melanoma.15 It is not yet clear how algorithms will perform when presented with entirely novel, potentially malignant lesions; this has rare but significant safety implications for patients.

From a legal perspective, an issue that has yet to be fully addressed is the lack of explainability by neural networks. Currently, it is not possible to know what contributes to their decision-making process. This has led to criticisms and concerns that neural networks function as ‘black boxes’ with potential unanticipated and hard-to-explain failure modes. The European Union’s General Data Protection Regulation specifies explainability as a requirement for algorithmic decision making, which is currently not achievable.26,27 Algorithmic decision making also has uncertain status in the USA, where the Food and Drug Administration have advised that until there exists a body of evidence from clinical trials, clinical decisions suggested by AI ought to be considered AI guided, not AI provided, and liability would still rest with the clinician.28

The AI-integrated health service of the future?

There are attempts to deploy ‘AI’ technologies within the healthcare space within two main scenarios: direct to consumer or public, and as a decision aid for clinicians. The direct-to-consumer model already exists in some fashion; there are smartphone apps such as SkinVision, which enable individuals to assess and track their skin lesions. However, currently such apps do not make accountable diagnoses and usually explicitly state in their terms and conditions that they do not provide a diagnostic service, and do not intend to replace or substitute visits to healthcare providers. At present, it is not yet clear what the benefits and risks of such a tool are in terms of how frequently it provides false reassurance, and how frequently it recommends referral when this is not needed. Although health data democratization has benefits from the perspective of patient autonomy, it may be that this does not translate to better health outcomes and might instead lead to unnecessary concern and investigations. Moreover, fundamentally, healthcare is currently structured in such a way that responsibility and liability are carried by the provider and not the patient, and as such these apps do not have a clear-cut position in healthcare infrastructure.

The current social and legal framework of healthcare is better primed for incorporating AI as a decision aid for clinicians, particularly in enhancing decision making by nonspecialists (Figure 5). This could potentially be of great use in dermatology services due to the ever-growing burden of skin cancer. In the UK, there is a long-standing shortfall of consultant dermatologists, and current workforce planning is insufficient to address this. The volume of skin cancers has a knock-on effect on patients with chronic inflammatory skin diseases, essentially reducing their access to dermatologists.

Dermatologists are also aware that generally, a high proportion of referrals to dermatology with suspected skin cancer on the urgent ‘2-week wait’ pathway do not require further investigation and are actually immediately discharged. Many of the lesions falling into this category are easily recognized by dermatologists, but are not easily recognized by nonspecialists. One could hypothesize that CNN-based applications can aid a general practitioner service in triaging skin lesions more effectively, and ensure that patients are managed by the appropriate clinical services. Having a clinical user also mitigates many of the risks and limitations inherent to CNN-based technologies, improving both the safety profile and the patient experience.

The recently published Topol Review on ‘Preparing the healthcare workforce to deliver the digital future’ states that ‘to reap the benefits, the NHS must focus on building a digitally ready workforce that is fully engaged and has the skills and confidence to adopt and adapt new technologies in practice and in context’. It also concludes that ‘the adoption of technology should be used to give healthcare staff more time to care and interact directly with patients’.29 In the context of

dermatology, this very much holds true. Technology adoption could improve clinical pathways, and enable our neediest patients to access dermatology services more efficiently. It is unlikely that they will threaten our profession; in reality they represent an opportunity for personal learning, service improvement and leadership that could be transformative for our future healthcare system.

References
1 Turing AM. Computing machinery and intelligence. Mind 1950; LIX:433–60.
2 LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015; 521:436–44.
3 LeCun Y, Boser BE, Denker JS et al. Handwritten digit recognition with a back-propagation network. In: Advances in Neural Information Processing Systems 2 (Touretzky DS, ed.). Burlington, MA: Morgan-Kaufmann, 1990; 396–404.
4 Cireşan DC, Meier U, Gambardella LM, Schmidhuber J. Deep, big, simple neural nets for handwritten digit recognition. Neural Comput 2010; 22:3207–20.
5 Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Neural Inform Proc Systems 2012; 25:3065386.
6 Cascinelli N, Ferrario M, Tonelli T, Leo E. A possible new tool for clinical diagnosis of melanoma: the computer. J Am Acad Dermatol 1987; 16:361–7.
7 Rubegni P, Burroni M, Cevenini G et al. Digital dermoscopy analysis and artificial neural network for the differentiation of clinically atypical pigmented skin lesions: a retrospective study. J Invest Dermatol 2002; 119:471–4.
8 Rubegni P, Cevenini G, Flori ML et al. Relationship between minimal phototoxic dose and skin colour plus sun exposure history: a neural network approach. Photodermatol Photoimmunol Photomed 1998; 14:26–30.
9 Shrivastava VK, Londhe ND, Sonawane RS, Suri JS. A novel and robust Bayesian approach for segmentation of psoriasis lesions and its risk stratification. Comput Methods Programs Biomed 2017; 150:9–22.
10 Shen X, Zhang J, Yan C, Zhou H. An automatic diagnosis method of facial acne vulgaris based on convolutional neural network. Sci Rep 2018; 8:5839.
11 Han SS, Park GH, Lim W et al. Deep neural networks show an equivalent and often superior performance to dermatologists in onychomycosis diagnosis: automatic construction of onychomycosis datasets by region-based convolutional deep neural network. PLOS ONE 2018; 13:e0191493.
12 Marchetti MA, Codella NCF, Dusza SW et al. Results of the 2016 International Skin Imaging Collaboration International Symposium on Biomedical Imaging challenge: comparison of the accuracy of computer algorithms to dermatologists for the diagnosis of melanoma from dermoscopic images. J Am Acad Dermatol 2018; 78:270–7.
13 Haenssle HA, Fink C, Schneiderbauer R et al. Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Ann Oncol 2018; 29:1836–42.
14 Esteva A, Kuprel B, Novoa RA et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 2017; 542:115–18.
15 Han SS, Kim MS, Lim W et al. Classification of the clinical images for benign and malignant cutaneous tumors using a deep learning algorithm. J Invest Dermatol 2018; 138:1529–38.
16 Rezvantalab A, Safigholi H, Karimijeshni S. Dermatologist level dermoscopy skin cancer classification using different deep learning convolutional neural networks algorithms. Available at: https://arxiv.org/ftp/arxiv/papers/1810/1810.10348.pdf (last accessed 27 January 2020).
17 Fujisawa Y, Otomo Y, Ogata Y et al. Deep-learning-based, computer-aided classifier developed with a small dataset of clinical images surpasses board-certified dermatologists in skin tumour diagnosis. Br J Dermatol 2019; 180:373–81.
18 Tschandl P, Rosendahl C, Akay BN et al. Expert-level diagnosis of nonpigmented skin cancer by combined convolutional neural networks. JAMA Dermatol 2019; 155:58–65.
19 Brinker TJ, Hekler A, Enk AH et al. A convolutional neural network trained with dermoscopic images performed on par with 145 dermatologists in a clinical melanoma image classification task. Eur J Cancer 2019; 111:148–54.
20 Brinker TJ, Hekler A, Enk AH et al. Deep learning outperformed 136 of 157 dermatologists in a head-to-head dermoscopic melanoma image classification task. Eur J Cancer 2019; 113:47–54.
21 Tschandl P, Codella N, Akay BN et al. Comparison of the accuracy of human readers versus machine-learning algorithms for pigmented skin lesion classification: an open, web-based, international, diagnostic study. Lancet Oncol 2019; 20:938–47.
22 Tschandl P, Rosendahl C, Kittler H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci Data 2018; 5:180161.
23 Navarrete-Dechent C, Dusza SW, Liopyris K et al. Automated dermatological diagnosis: hype or reality? J Invest Dermatol 2018; 138:2277–9.
24 Shahwan KT, Kimball AB. Should we leave the skin biopsies to the dermatologists? JAMA Dermatol 2016; 152:371–2.
25 Dinnes J, Deeks JJ, Chuchu N et al. Dermoscopy, with and without visual inspection, for diagnosing melanoma in adults. Cochrane Database Syst Rev 2018; 12:CD011902.
26 Watson DS, Krutzinna J, Bruce IN et al. Clinical applications of machine learning algorithms: beyond the black box. BMJ 2019; 364:l886.
27 Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med 2019; 25:44–56.
28 Mattessich S, Tassavor M, Swetter SM, Grant-Kels JM. How I learned to stop worrying and love machine learning. Clin Dermatol 2018; 36:777–8.
29 NHS Health Education England. The Topol Review. Available at: https://topol.hee.nhs.uk (last accessed 27 January 2020).

Supporting Information
Additional Supporting Information may be found in the online version of this article at the publisher’s website.
Table S1 Comparative studies between artificial intelligence algorithms and dermatologists obtained from studies published up until June 2019.
Powerpoint S1 Journal Club Slide Set.
