

OPEN ACCESS

Review
Artificial intelligence in ophthalmology:
The path to the real-world clinic
Zhongwen Li,1,2,8,* Lei Wang,2,8 Xuefang Wu,3,8 Jiewei Jiang,4,8 Wei Qiang,1 He Xie,2 Hongjian Zhou,5 Shanjun Wu,1,7
Yi Shao,6,* and Wei Chen1,2,*
1Ningbo Eye Hospital, Wenzhou Medical University, Ningbo 315000, China
2School of Ophthalmology and Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
3Guizhou Provincial People’s Hospital, Guizhou University, Guiyang 550002, China
4School of Electronic Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, China
5Department of Computer Science, University of Oxford, Oxford, Oxfordshire OX1 2JD, UK
6Department of Ophthalmology, the First Affiliated Hospital of Nanchang University, Nanchang 330006, China
7Senior author
8These authors contributed equally

*Correspondence: li.zhw@qq.com (Z.L.), freebee99@163.com (Y.S.), chenwei@eye.ac.cn (W.C.)


https://doi.org/10.1016/j.xcrm.2023.101095

SUMMARY

Artificial intelligence (AI) has great potential to transform healthcare by enhancing the workflow and productivity of clinicians, enabling existing staff to serve more patients, improving patient outcomes, and reducing health disparities. In the field of ophthalmology, AI systems have shown performance comparable with, or even better than, that of experienced ophthalmologists in tasks such as diabetic retinopathy detection and grading. Despite these promising results, however, very few AI systems have been deployed in real-world clinical settings, which calls the true value of these systems into question. This review provides an overview of the current main AI applications in ophthalmology, describes the challenges that must be overcome before clinical implementation of AI systems, and discusses strategies that may pave the way to the clinical translation of these systems.

INTRODUCTION

In recent years, artificial intelligence (AI), including machine learning and deep learning (Figure 1), has made a great impact on society worldwide. This has been stimulated by the advent of powerful algorithms, exponential data growth, and advances in computing hardware.1,2 In the medical and healthcare fields, numerous studies have validated that AI exhibits robust performance in disease diagnosis and treatment response prediction.3–9 For example, a deep-learning system developed on computed tomography (CT) images can distinguish patients with COVID-19 pneumonia from patients with other common types of pneumonia and normal controls with an area under the curve (AUC) of 0.971.10 A machine-learning model trained using dual-energy CT radiomics provides significant additive value for response prediction (AUC = 0.75) in metastatic melanoma patients prior to immunotherapy.11

In ophthalmology, the application of AI is very promising, given that the diagnosis and therapeutic monitoring of ocular diseases often rely heavily on image recognition (Figure 2). Based on this technique, diabetic retinopathy (DR), glaucoma, and age-related macular degeneration (AMD) can be accurately detected from fundus images,6,12,13 and keratitis, pterygium, and cataract can be precisely identified from slit-lamp images.14–16 Detailed information describing the different imaging types used for different purposes in ophthalmology, and the corresponding AI applications, is summarized in Table S1. In addition, AI may support eye doctors in generating individualized views of patients along their care pathways and guide clinical decisions. For instance, the visual prognosis after 12 months in neovascular AMD patients receiving ranibizumab can be predicted by an AI-based model developed using their clinical data (e.g., optical coherence tomography [OCT] and best-corrected visual acuity [BCVA]) collected at baseline and in the first 3 months.17 This method may assist eye doctors in managing patients' expectations appropriately during their treatment process.

Although there are many reasons to be hopeful about this transformation brought on by AI, hurdles remain to the successful deployment of AI in real-world clinical settings. In this review, we first retrace the current main AI applications in ophthalmology. Second, we describe the major challenges of AI clinical translation. Third, we discuss avenues that could facilitate the real implementation of AI into clinical practice. By stressing issues in the context of present AI applications for clinical ophthalmology, we wish to provide concepts that help promote meaningful investigations that will finally translate to real-world clinical use.

Cell Reports Medicine 4, 101095, July 18, 2023 © 2023 The Authors.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Figure 1. Relationship between AI, machine learning, and deep learning
The diagram depicts machine learning (supervised learning, e.g., SVM, random forest, XGBoost; unsupervised learning, e.g., association rules, PCA, k-means; reinforcement learning) and deep learning (deep neural networks, e.g., AlexNet, ResNet, DenseNet, GoogleNet) as subfields of AI. SVM, support vector machine. PCA, principal-component analysis.

APPLICATION OF AI ALGORITHMS IN POSTERIOR-SEGMENT EYE DISEASES

DR
The prevalence of diabetes has tripled in the past two decades worldwide. It can cause microvascular damage and retinal dysfunction as a result of chronic exposure to hyperglycemia, and 34.6% of people with diabetes develop DR, which is a leading cause of vision loss in working-age adults (20–74 years).18–20 In 2019, approximately 160 million people suffered from some form of DR, of whom 47 million suffered from sight-threatening DR.18 By 2045, this number is projected to increase to 242 million for DR and 71 million for sight-threatening DR.18 Early identification and timely treatment of sight-threatening DR can reduce 95% of blindness from this cause.18 Therefore, DR screening programs for patients with diabetes are suggested by the World Health Organization.21 However, conducting these screening programs on a large scale often requires a great deal of manpower and material and financial resources, which is difficult to realize in many low-income and middle-income countries. For this reason, exploring an approach that can reduce costs and increase the efficiency of DR screening programs should be a high priority. The emergence of AI provides potential new solutions, such as applying AI to retinal imaging for automated DR screening and referral (Table 1).

Using deep learning, numerous studies have developed intelligent systems that can accurately detect DR from fundus images. Gulshan et al.7 developed a deep-learning system using 128,175 fundus images (69,573 subjects) and evaluated the system in two external datasets with 11,711 fundus images (5,871 subjects). Their system achieved an AUC over 0.99 in DR screening. Ting et al.6 reported a deep-learning system with an AUC of 0.936, a sensitivity of 90.5%, and a specificity of 91.6% in identifying referable DR, and an AUC of 0.958, a sensitivity of 100%, and a specificity of 91.1% in discerning sight-threatening DR. Tang et al.30 established a deep-learning system for detecting referable DR and sight-threatening DR from ultra-widefield fundus (UWF) images, which have a larger retinal field of view and contain more information about lesions (especially peripheral lesions) than traditional fundus images. The AUCs of this system for identifying both referable DR and sight-threatening DR were over 0.9 in the external validation datasets. Justin et al.31 trained a UWF-image-based deep-learning model using ResNet34, and this model had an AUC of 0.915 for DR detection in the test set. Their results indicated that a model developed on UWF images may be more accurate than one based on traditional fundus images because only the UWF-image-based model could detect peripheral DR lesions without pharmacologic pupil dilation.

In addition to screening for patients with referable DR (i.e., moderate and worse DR) and sight-threatening DR, detecting early-stage DR is also crucial. Evidence indicates that proper intervention to keep glucose, lipid profiles, and blood pressure under control at an early stage can significantly delay DR progression and even reverse mild non-proliferative DR (NPDR) to the DR-free stage.32 Dai et al.23 reported a deep-learning system named DeepDR with robust performance in detecting early to late stages of DR. Their system was developed on 466,247 fundus images (121,342 diabetic patients) that were graded for mild NPDR, moderate NPDR, severe NPDR, proliferative DR (PDR), and non-DR by a centered reading group including 133 certified ophthalmologists. The evaluation was performed on 209,322 images collected from three external datasets, the China National Diabetic Complications Study (CNDCS), Nicheng Diabetes Screening Project (NDSP), and Eye Picture Archive Communication System (EyePACS), and the average AUCs of the system in these datasets were 0.940, 0.944, and 0.943, respectively. Besides, predicting the onset and progression of DR is essential for mitigating the rising threat of DR. Bora et al.33 created a deep-learning system using image data obtained from 369,074 patients to predict the risk of patients with diabetes developing DR within 2 years, and the system achieved an AUC of 0.70 in the external validation set. This automated risk-stratification tool may help optimize DR screening intervals and decrease costs while improving vision-related outcomes. Arcadu et al.34 developed a predictive DR progression algorithm (AUC = 0.79) based on 14,070 stereoscopic seven-field fundus images to provide a timely referral for a fast-progressing DR patient, enabling initiation of treatment before irreversible vision loss occurs.

Glaucoma
Glaucoma, characterized by cupping of the optic disc and visual field impairment, is the most frequent cause of irreversible blindness, affecting more than 70 million people worldwide.35–37 Due to population growth and aging globally, the number of patients with glaucoma will increase to 112 million by 2040.36 Most vision loss caused by glaucoma can be prevented via early diagnosis and timely treatment.38 However, identifying glaucoma at an early stage, particularly for primary open-angle glaucoma (POAG), normal tension glaucoma (NTG), and chronic primary angle-closure glaucoma (CPACG), is challenging for the



Figure 2. Overall schematic diagram describing the practical application of AI in all common ophthalmic imaging modalities

following two reasons. First, POAG, NTG, and CPACG are often painless, and visual field defects are inconspicuous at an early stage. Therefore, self-detection of these types of glaucoma by affected people usually occurs at a relatively late stage, when central visual acuity is reduced.35,37 Second, the primary approach to detect glaucoma is examination of the optic disc and retinal nerve fiber layer by a glaucoma specialist through ophthalmoscopy or fundus images.39–41 Such manual optic disc assessment is time consuming and labor intensive, which is infeasible to implement in large populations. Accordingly, improvements in screening methods for glaucoma are necessary. AI may pave the way for cost-effective glaucoma screening programs, such as detecting glaucoma from fundus images or OCT images in an automated fashion (Table 2).

Li et al.29 reported a deep-learning system with excellent performance in detecting referable glaucomatous optic neuropathy (GON) from fundus images. Specifically, they adopted the Inception-v3 algorithm to train the system and evaluated it on 8,000 images. Their system achieved an AUC of 0.986 with a sensitivity of 95.6% and a specificity of 92.0% for discerning referable




Table 1. Major studies on the application of AI in DR

Study | Year | Study design | Data type | Data size | AI type | Task | Performance
Liu et al.22 | 2022 | retrospective | TFI | 1,171,365 | EfficientNet-B5 | detection of diabetic macular edema | AUC = 0.88–0.96, Sen = 71–100, Spec = 66–88
Dai et al.23 | 2021 | retrospective | TFI | 666,383 | ResNet and Mask R-CNN | DR grading | AUC = 0.92–0.97, Sen = 87.5–93.7, Spec = 74.3–90.0
Lee et al.24 | 2021 | retrospective | TFI | 311,604 | seven algorithms from five companies: OpthAI, AirDoc, Eyenuk, Retina-AI Health, and Retmarker | detection of referable DR | Sen = 51.0–85.9, Spec = 60.4–83.7
Araújo et al.25 | 2020 | retrospective | TFI | 103,062 | a novel Gaussian-sampling method built upon a multiple-instance learning framework | detection of referable DR | Cohen's quadratic weighted kappa (κ) = 0.71–0.84
Heydon et al.26 | 2020 | prospective | TFI | 120,000 | machine-learning-enabled software, EyeArt v2.1 | detection of referable DR | Sen = 95.7, Spec = 54.0
Natarajan et al.27 | 2019 | prospective | SFI | 56,986 | Medios AI, a deep-learning-based system | detection of referable DR | Sen = 95.9–100, Spec = 78.7–88.4
Gulshan et al.28 | 2019 | prospective | TFI | 5,762 | Inception-v3 | detection of referable DR | AUC = 0.96–0.98, Sen = 88.9–92.1, Spec = 92.2–95.2
Li et al.29 | 2018 | retrospective | TFI | 177,287 | Inception-v3 | detection of vision-threatening referable DR | AUC = 0.96–0.99, Sen = 92.5–97.0, Spec = 91.4–98.5
Ting et al.6 | 2017 | retrospective | TFI | 189,018 | adapted VGGNet | detection of referable and vision-threatening DR | for referable DR: AUC = 0.89–0.98, Sen = 90.5–100, Spec = 73.3–92.2; for vision-threatening DR: AUC = 0.96, Sen = 100, Spec = 91.1
Gulshan et al.7 | 2016 | retrospective | TFI | 140,426 | Inception-v3 | detection of referable DR | AUC = 0.99, Sen = 87.0–97.5, Spec = 93.9–98.5

AUC, area under the curve, representing an aggregate measure of AI performance across all possible classification thresholds; R-CNN, region-based convolutional neural network; Sen, sensitivity, representing the rate of positive samples correctly classified by an AI model; SFI, smartphone-based fundus images; Spec, specificity, representing the rate of negative samples correctly classified by an AI model; TFI, traditional fundus images.
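Table 1 reports performance as AUC, sensitivity, and specificity. As an illustrative aside (not code from any of the cited studies), the sketch below shows how these metrics are computed from a model's output scores; sensitivity and specificity depend on a chosen threshold, while AUC can be obtained from the rank-sum (Mann-Whitney) identity. All data values are invented.

```python
# Illustrative sketch: the three metrics in Table 1, from labels and scores.
def sensitivity_specificity(labels, scores, threshold=0.5):
    """labels: 1 = disease present, 0 = absent; scores: model probabilities.
    A case is called positive when its score reaches the threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count half) -- the Mann-Whitney formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
sen, spec = sensitivity_specificity(labels, scores)
print(sen, spec, auc(labels, scores))
```

Moving the threshold trades sensitivity against specificity, which is why studies often report both at a fixed operating point alongside the threshold-free AUC.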

GON. As fundus imaging is intrinsically a two-dimensional (2D) imaging modality observing the surface of the optic nerve head, whereas glaucoma is a three-dimensional (3D) disease with depth-resolved structural changes, fundus imaging may not be able to reach the level of accuracy that can be acquired through OCT, a 3D imaging modality.52,53 Ran et al.49 trained and tested a 3D deep-learning system using 6,921 spectral-domain OCT volumes of optic disc cubes from 1,384,200 2D cross-sectional scans. Their 3D system reached an AUC of 0.969 in detecting GON, significantly outperforming a 2D deep-learning system trained with fundus images (AUC, 0.921). This 3D system also had performance comparable with two glaucoma specialists with over 10 years of experience. The heatmaps indicated that the features leveraged by the 3D system for GON detection were similar to those leveraged by glaucoma specialists.

Primary angle-closure glaucoma is avoidable if the progress of angle closure can be stopped at the early stages.54 Fu et al. developed a deep-learning system using 4,135 anterior-segment OCT images from 2,113 individuals for automated angle-closure detection.55 The system achieved an AUC of 0.96 with a sensitivity of 0.90 and a specificity of 0.92, which were better than those of the qualitative feature-based system (AUC, 0.90; sensitivity, 0.79; specificity, 0.87). These results indicate that deep learning may mine a broader range of details of anterior-segment OCT images than the qualitative features (e.g., angle opening distance, angle recess area, and iris area) determined by clinicians.

AI can also be used to predict glaucoma progression. Yousefi et al.56 reported an unsupervised machine-learning method to identify longitudinal glaucoma progression based on visual fields from 2,085 eyes of 1,214 subjects. They found that this machine-learning analysis detected progressing eyes earlier (3.5 years) than other methods such as global mean deviation (5.2 years), region-wise (4.5 years), and point-wise (3.9 years) analyses. Wang et al.57 proposed an AI approach, the archetype method, to detect visual field progression in glaucoma with an accuracy of 0.77. Moreover, this AI approach had significantly higher agreement (kappa, 0.48) with the clinician assessment than




Table 2. Major studies on the application of AI in glaucoma

Study | Year | Study design | Data type | Data size | AI type | Task | Performance
Xiong et al.42 | 2022 | retrospective | VFs and OCT images | 2,463 pairs | multimodal AI algorithm, FusionNet | detection of GON | AUC = 0.87–0.92, Sen = 77.3–81.3, Spec = 84.8–90.6
Li et al.43 | 2022 | retrospective | TFI | 31,040 images and longitudinal data from 7,127 participants | U-Net and PredictNet | prediction of glaucoma incidence and progression | for predicting glaucoma incidence, AUC = 0.88–0.90; for predicting glaucoma progression, AUC = 0.87–0.91
Fan et al.44 | 2022 | retrospective | TFI and VFs | 66,715 | ResNet50 | detection of POAG in patients with ocular hypertension | AUC = 0.74–0.91, Sen = 76–86, Spec = 80–85
Li et al.45 | 2021 | retrospective | anterior-segment OCT images | 1.112 million images from 8,694 volume scans | 3D-ResNet-34 and 3D-ResNet-50 | detection of narrow iridocorneal angles (task 1) and peripheral anterior synechiae (task 2) in eyes with suspected PACG | task 1: AUC = 0.943, Sen = 86.7, Spec = 87.8; task 2: AUC = 0.902, Sen = 90.0, Spec = 89.0
Dixit et al.46 | 2021 | retrospective | a longitudinal dataset of merged VFs and clinical data | 672,123 VF results and 350,437 samples of clinical data | convolutional long short-term memory neural network | assessment of glaucoma progression | AUC = 0.89–0.93
Medeiros et al.47 | 2021 | retrospective | OCT images and TFI | 86,123 pairs | ResNet50 | detection of progressive GON damage | AUC = 0.86–0.96
Li et al.12 | 2020 | retrospective | UWFI | 22,972 | InceptionResNetV2 | detection of GON | AUC = 0.98–1.00, Sen = 97.5–98.2, Spec = 94.3–98.4
Yousefi et al.48 | 2020 | retrospective | VFs | 31,591 | PCA, manifold learning, and unsupervised clustering | monitoring glaucomatous functional loss | Sen = 77, Spec = 94
Ran et al.49 | 2019 | retrospective | OCT images | 6,921 | ResNet-based 3D system | detection of GON | AUC = 0.89–0.97, Sen = 78–90, Spec = 79–96
Martin et al.50 | 2018 | prospective | CLS parameters and initial IOP | 435 subjects | random forest | detection of POAG | AUC = 0.76
Li et al.29 | 2018 | retrospective | TFI | 48,116 | Inception-v3 | detection of referable GON | AUC = 0.99, Sen = 95.6, Spec = 92.0
Asaoka et al.51 | 2016 | retrospective | VFs | 279 | deep feedforward neural network | detection of preperimetric glaucoma | AUC = 0.93, Sen = 77.8, Spec = 90.0

CLS, contact lens sensor; GON, glaucomatous optic neuropathy; IOP, intraocular pressure; OCT, optical coherence tomography; PACG, primary angle-closure glaucoma; PCA, principal-component analysis; POAG, primary open-angle glaucoma; UWFI, ultra-widefield fundus images; VF, visual field.
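Several entries in Table 2 (e.g., Yousefi et al.48) rely on unsupervised clustering: eyes are grouped by visual-field patterns without any diagnostic labels. The fragment below is a minimal illustrative sketch of that idea, not the authors' pipeline: a plain one-dimensional k-means over invented mean-deviation (MD) values, where near-zero MD suggests a healthy field and strongly negative MD suggests advanced loss.

```python
# Illustrative sketch: 1-D k-means clustering of visual-field summary indices.
def kmeans_1d(values, centers, iters=20):
    """Assign each value to its nearest center, then move each center to the
    mean of its assigned values; repeat for a fixed number of iterations."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # empty clusters keep their previous center
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Invented MD values (dB) and rough initial centers for three severity groups.
md_values = [-0.5, -1.2, -0.8, -6.0, -7.1, -15.3, -14.8]
centers, clusters = kmeans_1d(md_values, centers=[0.0, -8.0, -20.0])
print(centers, clusters)
```

Real studies cluster far richer representations (point-wise sensitivities, PCA or manifold embeddings) rather than a single index, but the label-free grouping step is the same in spirit.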

other existing methods (e.g., the permutation of point-wise linear regression).

AMD
AMD, a disease that affects the macular area of the retina, often causes progressive loss of central vision.58 Age is the strongest risk factor for AMD, and almost all late AMD cases occur in people over 60 years of age.58 With the aging population, AMD will continue to be a major cause of vision impairment worldwide. The number of AMD patients will reach 288 million in 2040, denoting the substantial global burden of AMD.59 Consequently, screening for patients with AMD




Table 3. Major studies on the application of AI in AMD

Study | Year | Study design | Data type | Data size | AI type | Task | Performance
Potapenko et al.63 | 2022 | prospective | OCT images | 106,840 | temporal deep-learning model | detection of CNV activity in AMD patients | AUC = 0.90–0.98, Sen = 80.5–95.6, Spec = 84.6–95.9
Yellapragada et al.64 | 2022 | retrospective | TFI | 100,848 | self-supervised non-parametric instance discrimination | classification of AMD severity | Acc = 65–87
Rakocz et al.65 | 2021 | retrospective | 3D OCT images | 1,942 | SLIVER-net architecture | detection of AMD progression risk factors | AUC = 0.83–0.99
Yim et al.8 | 2020 | retrospective | 3D OCT images | 130,327 | 3D U-Net, 3D dense convolution blocks, and 3D prediction network | predicting conversion to wet AMD | AUC = 0.75–0.89
Hwang et al.66 | 2019 | retrospective | OCT images | 35,900 | VGG16, InceptionV3, and ResNet50 | differentiating normal macula and three AMD types (dry, inactive wet, and active wet AMD) | AUC = 0.98–0.99, Acc = 90.7–92.7
Keel et al.67 | 2019 | retrospective | TFI | 142,725 | Inception-v3 | detection of neovascular AMD | AUC = 0.97–1.00, Sen = 96.7–100, Spec = 93.4–96.4
Peng et al.61 | 2019 | retrospective | TFI | 59,302 | Inception-v3 | classification of patient-based AMD severity | AUC = 0.93–0.97
Grassmann et al.68 | 2018 | retrospective | TFI | 126,211 | a network ensemble of six different neural net architectures | predicting the AMD stage | quadratic weighted κ = 0.92, Acc = 63.3
Kermany et al.69 | 2018 | retrospective | OCT images | 208,130 | Inception-v3 | detection of referable AMD | AUC = 0.99–1.00, Sen = 96.6–97.8, Spec = 94.0–97.4
Burlina et al.62 | 2017 | retrospective | TFI | 133,821 | AlexNet | detection of referable AMD | AUC = 0.94–0.96, Sen = 72.8–88.4, Spec = 91.5–94.1

Acc, accuracy; CNV, choroidal neovascularization.
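Grassmann et al.68 above (and Araújo et al.25 in Table 1) report quadratic weighted kappa, an agreement measure for ordinal grades (such as AMD or DR stages) that penalizes larger disagreements more heavily than adjacent-grade disagreements. A self-contained illustrative sketch of the computation follows; the toy grade lists are invented, not data from the studies.

```python
# Illustrative sketch: quadratic weighted kappa for ordinal grading agreement.
def quadratic_weighted_kappa(rater_a, rater_b, n_classes):
    """Grades are integers 0..n_classes-1. Returns 1.0 for perfect agreement,
    0.0 for chance-level agreement, negative for worse than chance."""
    n = len(rater_a)
    # observed confusion matrix between the two graders
    obs = [[0.0] * n_classes for _ in range(n_classes)]
    for a, b in zip(rater_a, rater_b):
        obs[a][b] += 1
    # expected matrix under independence, from the two marginal histograms
    hist_a = [rater_a.count(k) for k in range(n_classes)]
    hist_b = [rater_b.count(k) for k in range(n_classes)]
    exp = [[hist_a[i] * hist_b[j] / n for j in range(n_classes)]
           for i in range(n_classes)]
    # quadratic disagreement weights: zero on the diagonal, largest in corners
    w = lambda i, j: (i - j) ** 2 / (n_classes - 1) ** 2
    num = sum(w(i, j) * obs[i][j] for i in range(n_classes) for j in range(n_classes))
    den = sum(w(i, j) * exp[i][j] for i in range(n_classes) for j in range(n_classes))
    return 1.0 - num / den

# Toy example: two graders assigning AMD stages 0-2 to four eyes.
print(quadratic_weighted_kappa([0, 1, 2, 2], [0, 1, 2, 2], n_classes=3))
```

Because of the quadratic weights, mislabeling a late-stage eye as stage 0 costs far more than confusing two adjacent stages, which matches the clinical cost of such errors.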

(especially neovascular AMD) and providing suitable medical interventions in a timely manner can reduce vision loss and improve patient visual outcomes.60

AI has the potential to facilitate the automated detection of AMD and prediction of AMD progression (Table 3). Peng et al.61 constructed and tested a deep-learning system (DeepSeeNet) using 59,302 fundus images from the longitudinal follow-up of 4,549 subjects from the Age-Related Eye Disease Study (AREDS). DeepSeeNet performed well on patient-based multi-class classification with AUCs of 0.94, 0.93, and 0.97 in detecting large drusen, pigmentary abnormalities, and late AMD, respectively. Burlina et al.62 reported a deep-learning system established by AlexNet based on over 130,000 fundus images from 4,613 patients to screen for referable AMD, and their system achieved an average AUC of 0.95. Referable AMD in their study refers to eyes with one of the following conditions: (1) large drusen (size larger than 125 μm); (2) multiple medium-sized drusen and pigmentation abnormalities; (3) choroidal neovascularization (CNV); (4) geographic atrophy.62

AI also has the potential to predict the possibility of progression to late AMD, guiding high-risk patients to start preventive care early (e.g., eating healthy food, abandoning smoking, and taking supplements) and assisting clinicians in deciding the interval of the patient's follow-up examinations. In patients diagnosed with wet AMD in one eye, Yim et al.8 introduced an AI system to predict conversion to wet AMD in the second eye. Their system was constructed from a segmentation network, a diagnosis network, and a prediction network based on 130,327 3D OCT images and corresponding automatic tissue maps for predicting progression to wet AMD within a clinically actionable time window (6 months). The system achieved 80% sensitivity at 55% specificity and 34% sensitivity at 90% specificity. As both genetic and environmental factors can affect the etiology of AMD, Yan et al.70 developed an AI approach with a modified deep convolutional neural network (CNN) using 52 AMD-associated genetic variants and 31,262 fundus images from 1,351 individuals from the AREDS to predict whether an eye would progress to late AMD. Their results showed that the approach based on both fundus images




and genotypes could predict late AMD progression with an AUC of 0.85, whereas the approach based on fundus images alone achieved an AUC of 0.81.

Other retinal diseases
Numerous studies have also found that AI could be applied to promote the automated detection of other retinal diseases from clinical images to provide timely referrals for positive cases, mitigating the issues caused by the unbalanced distribution of ophthalmic medical resources. Milea et al.71 developed a deep-learning system using 14,341 fundus images to detect papilledema. This system achieved an AUC of 0.96 in the external test dataset consisting of 1,505 images. Brown et al.72 established a deep-learning system based on 5,511 retinal images captured by RetCam to diagnose plus disease in retinopathy of prematurity (ROP), a leading cause of blindness in childhood. The AUC of their system was 0.98 with a sensitivity of 93% and a specificity of 93%. In terms of detecting peripheral retinal diseases, such as lattice degeneration and retinal breaks, Li et al.73 trained models with four different deep-learning algorithms (InceptionResNetV2, ResNet50, InceptionV3, and VGG16) using 5,606 UWF images. They found that InceptionResNetV2 had the best performance, achieving an AUC of 0.996 with 98.7% sensitivity and 99.2% specificity. In addition, AI has also been employed in the automated identification of retinal detachment,74 pathologic myopia,75 polypoidal choroidal vasculopathy,76 etc.

APPLICATION OF AI ALGORITHMS IN ANTERIOR-SEGMENT EYE DISEASES

Cataract
In the past 20 years, although the prevalence of cataracts has been decreasing due to increasing rates of cataract surgery driven by improved techniques and active surgical initiatives, cataract still affects 95 million people worldwide.77 Cataract remains the leading cause of blindness (accounting for 50% of blindness), especially in low-income and middle-income countries.77 Therefore, exploring a set of strategies to promote cataract screening and related ophthalmic services is imperative. Recent advancements in AI may help achieve this goal, such as diagnosis and quantitative classification of age-related cataract from slit-lamp images (Table 4).

Keenan et al.16 trained deep-learning models, named DeepLensNet, to detect and quantify nuclear sclerosis (NS) from 45° slit-lamp images and cortical lens opacity (CLO) and posterior subcapsular cataract (PSC) from retroillumination images. NS grading was considered on a 0.9–7.1 scale; CLO and PSC grading were both considered as percentages. In the full test set, mean squared error values for DeepLensNet were 0.23 for NS, 13.1 for CLO, and 16.6 for PSC. The results indicate that this framework can perform automated and quantitative classification of cataract severity with high accuracy, which has the potential to increase the accessibility of cataract evaluation globally. Apart from slit-lamp images, Tham et al.78 found that fundus images could also be used to develop an AI system for cataract screening. Based on 25,742 fundus images, they constructed a framework with a ResNet50 and XGBoost classifier for the automated detection of visually significant cataracts (BCVA < 20/60), achieving AUCs of 0.916–0.965 in three external test sets. One merit of this system is that it can screen for cataracts with a single imaging modality, unlike the traditional method, which requires slit-lamp and retroillumination images alongside BCVA measurement. The other merit is that this system can be readily integrated into existing fundus-image-based AI systems, allowing simultaneous screening for other posterior-segment diseases.

Other than cataract screening, AI can also offer real-time guidance for phacoemulsification cataract surgery (PCS). Nespolo et al.87 invented a computer-vision-based platform using a region-based CNN (Faster R-CNN) built on ResNet50, a k-means clustering technique, and optical-flow tracking to enhance the surgeon's experience during PCS. Specifically, this platform can receive frames from the video source, locate the pupil, discern the surgical phase being performed, and provide visual feedback to the surgeon in real time. The results showed that the platform achieved AUCs of 0.996, 0.972, 0.997, and 0.880 for capsulorhexis, phacoemulsification, cortex removal, and idle-phase recognition, respectively, with a Dice score of 90.23% for pupil segmentation and a mean processing speed of 97 frames per second. A usability survey suggested that most surgeons would be willing to perform PCS for complex cataracts with this platform and thought it was accurate and helpful.

Keratitis
Keratitis is a major global cause of corneal blindness, often affecting marginalized populations.88 The burden of corneal blindness on patients and the wider community can be huge, particularly as it tends to occur in people at a younger age than other blinding eye diseases such as AMD and cataracts.89 Keratitis can worsen quickly with time, which may lead to permanent visual impairment and even corneal perforation.90 Early detection and timely management of keratitis can halt the disease progression, resulting in a favorable prognosis.91

Li et al.15 found that AI had high accuracy in screening for keratitis and other corneal abnormalities from slit-lamp images. In terms of deep-learning algorithms, they used Inception-v3, DenseNet121, and ResNet50, with DenseNet121 performing best. Specifically, the optimal algorithm, DenseNet121, reached AUCs of 0.988–0.997, 0.982–0.990, and 0.988–0.998 for the classification of keratitis, other corneal abnormalities (e.g., corneal dystrophies, corneal degeneration, corneal tumors), and normal cornea, respectively, in three external test datasets. Interestingly, their system also performed well on cornea images captured by smartphone in super-macro mode, with an AUC of 0.967, a sensitivity of 91.9%, and a specificity of 96.9% in keratitis detection. This smartphone-based approach would be extremely cost-effective and convenient for proactive keratitis screening by high-risk people (e.g., farmers and contact lens wearers) if it can be applied in clinical practice. To give prompt and precise treatment to patients with infectious keratitis, Xu et al.92 proposed a sequential-level deep-learning system that could effectively discriminate among bacterial keratitis, fungal keratitis, herpes simplex virus stromal keratitis, and other corneal diseases (e.g., phlyctenular keratoconjunctivitis,




Table 4. Major studies on the application of AI in cataract

| Study | Year | Study design | Data type | Data size | AI type | Task | Performance |
|---|---|---|---|---|---|---|---|
| Keenan et al.16 | 2022 | retrospective | slit-lamp images | 18,999 | DeepLensNet | diagnosis and quantitative classification of age-related cataract | mean squared error = 0.23–16.6 |
| Tham et al.78 | 2022 | retrospective | TFI | 25,742 | ResNet50 and XGBoost classifier | detection of visually significant cataract | AUC = 0.92–0.97, Sen = 88.8–96.0, Spec = 81.1–90.3 |
| Xu et al.79 | 2021 | retrospective | TFI | 9,912 | global-local attention network | cataract diagnosis and grading | for cataract diagnosis: Acc = 90.7; for cataract grading: Acc = 83.5 |
| Lu et al.80 | 2021 | prospective | slit-lamp images | 847 | Faster R-CNN and ResNet50 | cataract grading | AUC = 0.80–0.98, Sen = 85.7–94.7, Spec = 63.6–93.2 |
| Lin et al.81 | 2020 | prospective | participants' demographic variables, birth conditions, family medical history, and environmental factors | 1,738 | random forest and adaptive boosting methods | detection of congenital cataracts | AUC = 0.82–0.96, Sen = 56–82, Spec = 78–98 |
| Xu et al.82 | 2020 | retrospective | TFI | 8,030 | hybrid global-local feature representation model | cataract grading | Acc = 86.2, Sen = 79.8–95.0, Spec = 83.3–88.4 |
| Wu et al.83 | 2019 | retrospective | slit-lamp images | 37,638 | ResNet50 | diagnosis of cataracts and detection of referable cataracts | for cataract diagnosis: AUC = 0.99–1.00; for detecting referable cataracts: AUC = 0.92–1.00 |
| Lin et al.84 | 2019 | prospective | slit-lamp images | 700 | AI platform, CC-Cruiser | diagnosis of childhood cataracts and provision of treatment recommendation | for cataract diagnosis: Acc = 87.4; for treatment determination: Acc = 70.8 |
| Zhang et al.85 | 2019 | retrospective | TFI | 1,352 | ResNet18, SVM, and FCNN | cataract grading | Acc = 92.7, Sen = 82.4–99.4, Spec = 81.3–98.5 |
| Gao et al.86 | 2015 | retrospective | slit-lamp images | 5,378 | convolutional-recursive neural network | cataract grading | MAE = 0.304 |

FCNN, fully connected neural network; SVM, support vector machine.
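Table 4 (like the other tables in this review) summarizes performance as AUC, sensitivity (Sen), and specificity (Spec). As a reminder of how these figures are derived from a classifier's outputs, here is a minimal pure-Python sketch; the toy labels, scores, and the 0.5 threshold are illustrative and not drawn from any cited study:

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores):
    """AUC as the Mann-Whitney statistic: the probability that a randomly
    chosen positive case receives a higher score than a negative one."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: four cataract images (label 1) and four normal images (label 0).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]
sen, spec = sensitivity_specificity(labels, preds)  # 0.75, 0.75
```

Note that sensitivity and specificity depend on the operating threshold, whereas AUC summarizes performance across all thresholds, which is why many of the cited studies report ranges for the former and a single value for the latter.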

acanthamoeba keratitis, corneal papilloma), with an overall diagnostic accuracy of 80%, outperforming the mean diagnostic accuracy (49.27%) achieved by 421 ophthalmologists. The strength of this system was that it could extract the detailed patterns of the cornea region and assign local features to an ordered set to conform to the spatial structure and thereby learn the global features of the corneal image to perform diagnosis, which achieved better performance than conventional CNNs. Major AI applications in keratitis diagnosis are described in Table 5.

Keratoconus
Keratoconus is a progressive corneal ectasia with central or paracentral stromal thinning and corneal protrusion, resulting in irreversible visual impairment due to irregular corneal astigmatism or the loss of corneal transparency.101 Early identification of keratoconus, especially subclinical keratoconus, and subsequent treatment (e.g., corneal crosslinking and intrastromal corneal ring segments) are crucial to stabilize the disease and improve the visual prognosis.101 Advanced keratoconus can be detected by classic clinical signs (e.g., Vogt's striae, Munson's sign, Fleischer ring) through slit-lamp examination or by corneal topographic characteristics such as increased corneal refractive power, steeper radial axis tilt, and inferior-superior (I-S) corneal refractive asymmetry from corneal topographic maps. However, the detection of subclinical keratoconus remains challenging.102

AI may accurately diagnose subclinical keratoconus and keratoconus and predict their progress trends (Table 6). Luna


Table 5. Major studies on the application of AI in keratitis

| Study | Year | Study design | Data type | Data size | AI type | Task | Performance |
|---|---|---|---|---|---|---|---|
| Redd et al.93 | 2022 | retrospective | corneal images | 980 | MobileNet | differentiation of BK and FK | AUC = 0.83–0.86; for detecting BK, Acc = 75; for detecting FK, Acc = 81 |
| Ren et al.94 | 2022 | prospective | conjunctival swabs | 149 | random forest | differentiation of BK, FK, and VK using the conjunctival bacterial microbiome characteristics | for referring microbiota composition, Acc = 96.3; for referring gene functional composition, Acc = 93.8 |
| Wu et al.95 | 2022 | prospective | tear Raman spectroscopy | 75 | CNN and RNN | detection of keratitis | Acc = 92.7–95.4, Sen = 93.8–95.6, Spec = 94.8–97.4 |
| Tiwari et al.96 | 2022 | retrospective | corneal images | 2,746 | VGG-16 | differentiation of active corneal ulcers from healed scars | AUC = 0.95–0.97, Sen = 78.2–93.5, Spec = 84.4–91.3 |
| Li et al.15 | 2021 | retrospective | slit-lamp and smartphone images | 13,557 | DenseNet121 | detection of keratitis | AUC = 0.97–1.00, Sen = 91.9–97.7, Spec = 96.9–98.2 |
| Ghosh et al.97 | 2021 | retrospective | slit-lamp images | 2,167 | ensemble learning | discrimination between BK and FK | AUC = 0.90, Sen = 77, F1 score = 0.83 |
| Wang et al.98 | 2021 | retrospective | slit-lamp and smartphone images | 6,073 | Inception-v3 | differentiation of BK, FK, and VK | AUC = 0.85–0.96, QWK = 0.54–0.91 |
| Xu et al.92 | 2020 | retrospective | slit-lamp images | 115,408 | deep sequential feature learning | differentiation of BK, FK, VK, and other types of infectious keratitis | Acc = 80.0 |
| Lv et al.99 | 2020 | retrospective | IVCM images | 2,088 | ResNet101 | diagnosis of FK | AUC = 0.98–0.99, Sen = 82.6–91.9, Spec = 98.3–98.9 |
| Gu et al.100 | 2020 | retrospective | slit-lamp images | 5,325 | Inception-v3 | detection of infectious and non-infectious keratitis | AUC = 0.88–0.95 |

BK, bacterial keratitis; CNN, convolutional neural network; FK, fungal keratitis; IVCM, in vivo confocal microscopy; QWK, quadratic weighted kappa; RNN, recurrent neural network; VK, viral keratitis.
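Table 5 also reports quadratic weighted kappa (QWK), which measures agreement on ordinal labels while penalizing large disagreements more heavily than small ones. A minimal sketch of the standard QWK computation; the six-sample two-grader example is invented for illustration:

```python
def quadratic_weighted_kappa(a, b, n_classes):
    """Agreement between two raters on ordinal labels 0..n_classes-1,
    with disagreements penalized by the squared class distance."""
    m = len(a)
    # Observed joint distribution of the two raters' labels.
    obs = [[0.0] * n_classes for _ in range(n_classes)]
    for i, j in zip(a, b):
        obs[i][j] += 1.0 / m
    # Expected distribution under independence (product of marginals).
    pa = [a.count(k) / m for k in range(n_classes)]
    pb = [b.count(k) / m for k in range(n_classes)]
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = (i - j) ** 2 / (n_classes - 1) ** 2  # quadratic penalty
            num += w * obs[i][j]
            den += w * pa[i] * pb[j]
    return 1.0 - num / den

# Two graders assign three ordinal grades to six images; they disagree
# once, by a single grade.
k = quadratic_weighted_kappa([0, 0, 1, 1, 2, 2], [0, 0, 1, 1, 2, 1], 3)
```

A QWK of 1 indicates perfect agreement and 0 indicates chance-level agreement, which is why it is a natural metric for graded tasks such as keratitis subtype or cataract severity.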

et al.103 reported machine-learning techniques, decision tree and random forest, for the diagnosis of subclinical keratoconus based on Pentacam topographic and Corvis biomechanical metrics, such as the flattest keratometry curvature, steepest keratometry curvature, stiffness parameter at the first flattening, and corneal biomechanical index. The optimal model achieved an accuracy of 89% with a sensitivity of 93% and a specificity of 86%. Meanwhile, they found that the stiffness parameter at the first flattening was the most important determinant in identifying subclinical keratoconus. Timemy et al.104 introduced a hybrid deep-learning construct for the detection of keratoconus. This model was developed using corneal topographic maps from 204 normal eyes, 215 keratoconus eyes, and 123 subclinical keratoconus eyes and was tested in an independent dataset including 50 normal eyes, 50 keratoconus eyes, and 50 subclinical keratoconus eyes. The proposed model reached an accuracy of 98.8% with an AUC of 0.99 and F1 score of 0.99 for the two-class task (normal vs. keratoconus) and an accuracy of 81.5% with an AUC of 0.93 and F1 score of 0.81 for the three-class task (normal vs. keratoconus vs. subclinical keratoconus).

Early and accurate prediction of progress trends in keratoconus is critical for the prudent and cost-effective use of corneal crosslinking and the determination of the timing of follow-up visits. García et al.106 reported a time-delay neural network to predict keratoconus progression using two prior tomography measurements from Pentacam. This network received six characteristics as input (e.g., average keratometry, the steepest radius of the front surface, and the average radius of the back surface) evaluated in two consecutive examinations, forecasted the future values, and obtained the result (stable or suspect progressive) leveraging the significance of the variation from the baseline. The average positive and negative predictive values of the network were 71.4% and 80.2%, indicating it had the potential


Table 6. Major studies on the application of AI in keratoconus

| Study | Year | Study design | Data type | Data size | AI type | Task | Performance |
|---|---|---|---|---|---|---|---|
| Junior et al.105 | 2022 | retrospective | corneal tomographic images | 2,893 | BESTi | detection of subclinical KC | AUC = 0.99, Sen = 87.0–97.5, Spec = 93.9–98.5 |
| Timemy et al.104 | 2021 | retrospective | corneal tomographic images | 4,844 | EfficientNet-b0 and SVM | detection of suspected KC and KC | AUC = 0.93–0.99, Acc = 81.5–98.8 |
| García106 | 2021 | retrospective | KC patients measured with Pentacam | 743 | TDNN | detection of KC progression | Sen = 70.8, Spec = 80.6 |
| Luna et al.103 | 2021 | retrospective | subjects with Pentacam topographic and Corvis biomechanical metrics | 81 | decision tree and random forest | diagnosis of subclinical KC | Acc = 89, Sen = 93, Spec = 86 |
| Xie et al.107 | 2020 | retrospective | corneal tomographic images | 6,465 | InceptionResNetV2 | detection of suspected irregular cornea and KC | AUC = 0.99, Sen = 91.9, Spec = 98.7 |
| Zéboulon et al.108 | 2020 | retrospective | subjects with corneal topography raw data | 3,000 | ResNet | detection of KC | Acc = 99.3, Sen = 100, Spec = 100 |
| Shi et al.109 | 2020 | retrospective | eyes with both corneal tomographic and UHR-OCT images | 121 | machine-learning-derived classifier | detection of subclinical KC | AUC = 0.93, Sen = 98.5, Spec = 94.7 |
| Cao et al.110 | 2020 | retrospective | eyes with complete Pentacam parameters | 267 | random forest | detection of subclinical KC | Acc = 98, Sen = 97, Spec = 98 |
| Issarti et al.111 | 2019 | retrospective | subjects with corneal elevation and thickness data | 851 | FNN and Grossberg-Runge Kutta architecture | detection of suspected KC | Acc = 96.6, Sen = 97.8, Spec = 95.6 |
| Hidalgo et al.112 | 2016 | retrospective | eyes with Pentacam parameters | 860 | SVM | detection of KC | Acc = 98.9, Sen = 99.1, Spec = 98.5 |

BESTi, boosted ectasia susceptibility tomography index; FNN, feedforward neural network; KC, keratoconus; TDNN, time-delay neural network; UHR-OCT, ultra-high-resolution optical coherence tomography.
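Among the topographic signs of keratoconus discussed above, the inferior-superior (I-S) refractive asymmetry is simple to compute from sector-averaged keratometry readings. A hedged sketch: the sample keratometry values are invented, and the 1.4 D suspicion cutoff, while a commonly quoted figure, is illustrative here rather than taken from any study in Table 6:

```python
def inferior_superior_asymmetry(inferior_k, superior_k):
    """I-S value: mean inferior keratometry minus mean superior
    keratometry (in diopters), sampled at corresponding points."""
    return sum(inferior_k) / len(inferior_k) - sum(superior_k) / len(superior_k)

# Five sample points per hemisphere (diopters) -- illustrative values only.
inferior = [46.1, 46.8, 47.2, 46.5, 46.4]
superior = [43.9, 44.2, 44.0, 44.3, 44.1]
is_value = inferior_superior_asymmetry(inferior, superior)  # 2.5 D

# Flag as a keratoconus suspect above a hedged cutoff (1.4 D is a
# commonly cited threshold; confirm against local clinical practice).
suspect = is_value > 1.4
```

Hand-crafted indices like this one are exactly the kind of feature that the tree-based models in Table 6 consume, in contrast to the CNN approaches that learn features directly from tomographic images.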

to assist clinicians to make a personalized management plan for patients with keratoconus.

Other anterior-segment diseases
A large number of studies have also demonstrated the feasibility of using AI to detect other anterior-segment diseases. For example, Chase et al.113 demonstrated that a deep-learning system, developed with a VGG19 network based on 27,180 anterior-segment OCT images, was able to identify dry eye disease with 84.62% accuracy, 86.36% sensitivity, and 82.35% specificity. The performance of this system was significantly better than some clinical dry eye tests, such as Schirmer's test and corneal staining, and was comparable with that of tear break-up time and the Ocular Surface Disease Index. In addition, Zhang et al.114 developed a deep-learning system for detecting obstructive meibomian gland dysfunction (MGD) and atrophic MGD using 4,985 in vivo laser confocal microscope images and validated the system on 1,663 images. The accuracy, sensitivity, and specificity of the system for obstructive MGD were 97.3%, 88.8%, and 95.4%, respectively; for atrophic MGD, 98.6%, 89.4%, and 98.4%, respectively; and for healthy controls, 98.0%, 94.5%, and 92.6%, respectively. Moreover, Li et al.3 introduced an AI system based on Faster R-CNN and DenseNet121 to detect malignant eyelid tumors from photographic images captured by ordinary digital cameras. In an external test set, the average precision score of the system was 0.762 for locating eyelid tumors and the AUC was 0.899 for discerning malignant eyelid tumors.

APPLICATION OF AI ALGORITHMS IN PREDICTING SYSTEMIC DISEASES BASED ON RETINAL IMAGES

AI has the potential to detect hidden information that clinicians are normally unable to perceive from digital health data. In ophthalmology, with the continuous advancement of AI technologies, the application of AI based on retinal images has extended from the detection of multiple fundus diseases to the screening for systemic diseases. These breakthroughs can be attributed to three reasons: (1) the unique anatomy of the eye offers an accessible "window" for the in vivo visualization of microvasculature and cerebral neurons; (2) retinal manifestations can be signs of many systemic diseases, such as


Figure 3. Medical AI translational challenges between system development and routine clinical application
There are five major challenges in the path of AI clinical translation: validity, generalizability, interpretability, longevity, and liability.

diabetes and heart disease; (3) retinal changes can be recorded through non-invasive digital fundus imaging, which is low cost and widely available in different levels of medical institutions.

Cardiovascular disease
Cardiovascular disease (CVD) is a leading cause of death globally, taking an estimated 17.9 million lives annually.115 Overt retinal vascular damage (such as retinal hemorrhages) and subtle changes (such as retinal arteriolar narrowing) are markers of CVD.116 To improve present risk-stratification approaches for CVD events, Rim et al.117 developed and validated a deep-learning-based cardiovascular risk-stratification system using 216,152 retinal images from five datasets from Singapore, South Korea, and the United Kingdom. This system achieved an AUC of 0.742 in predicting the presence of coronary artery calcium (a preclinical marker of atherosclerosis strongly associated with the risk of CVD). Poplin et al.118 reported that deep-learning models trained on data from 284,355 patients could extract new information from retinal images to predict cardiovascular risk factors, such as age (mean absolute error [MAE] within 3.26 years), gender (AUC = 0.97), systolic blood pressure (MAE within 11.23 mm Hg), smoking status (AUC = 0.71), and major adverse cardiac events (AUC = 0.70). Meanwhile, they demonstrated that the deep-learning models generated each prediction using anatomical features, such as the retinal vessels or the optic disc.

Chronic kidney disease and type 2 diabetes
Chronic kidney disease (CKD) is a progressive disease with high morbidity and mortality that occurs in the general adult population, particularly in people with diabetes and hypertension.119 Type 2 diabetes is another common chronic disease that accounts for nearly 90% of the 537 million cases of diabetes worldwide.120 Early diagnosis and proactive management of CKD and diabetes are critical in reducing microvascular and macrovascular complications and mortality burden. As CKD and diabetes have manifestations in the retina, retinal images can be used to detect and monitor these diseases. Zhang et al.121 reported that deep-learning models developed based on 115,344 retinal images from 56,672 patients were able to detect CKD and type 2 diabetes solely from retinal images or in combination with clinical metadata (e.g., age, sex, body mass index, and blood pressure) with AUCs of 0.85–0.93. The models can also be utilized to predict estimated glomerular filtration rates and blood-glucose levels, with MAEs of 11.1–13.4 mL/min/1.73 m2 and 0.65–1.1 mmol/L, respectively.121 Sabanayagam et al.122 established a deep-learning algorithm using 12,790 retinal images to screen for CKD. In this study, the model trained solely by retinal images achieved AUCs of 0.733–0.911 in validation and testing datasets, indicating the feasibility of employing retinal photography as an adjunctive screening tool for CKD in community and primary care settings.122

Alzheimer's disease
Alzheimer's disease (AD), a progressive neurodegenerative disease, is the most common type of dementia in the elderly worldwide and is becoming one of the most lethal, expensive, and burdensome diseases of this century.123 Diagnosis of AD is complex and normally involves expensive and sometimes invasive tests (such as amyloid positron emission tomography [PET] imaging and cerebrospinal fluid assays), which are not usually available outside of highly specialized clinical institutions. The retina is an extension of the central nervous system and offers a distinctively accessible insight into brain pathology. Research has found potentially measurable structural, vascular, and metabolic changes in the retina at the early stages of AD.124 Therefore, using noninvasive and low-cost retinal photography to detect AD is feasible. Cheung et al.125 demonstrated that a deep-learning model had the capability to identify AD from retinal images alone. They trained, validated, and tested the model using 12,949 retinal images from 648 AD patients and 3,240 individuals without the disease.125 The model had accuracies ranging from 79.6% to 92.1% and AUCs ranging from 0.73 to 0.91 for detecting AD in testing datasets. In the datasets with PET information, the model could also distinguish between participants who were β-amyloid positive and those who were β-amyloid negative, with accuracies ranging from 80.6% to 89.3% and AUCs ranging from 0.68 to 0.86. This study showed that a retinal-image-based deep-learning algorithm had high accuracy in detecting AD and that this approach could be used to screen for AD in a community setting.

Challenges in the AI clinical translation
Although AI systems have shown great performance in a wide variety of retrospective studies, relatively few of them have been translated into clinical practice. Many challenges, such as the generalizability of AI systems, still exist and stand in the path of true clinical adoption of AI tools (Figure 3). In this section, we highlight some critical challenges and the research that has already been conducted to tackle these issues.

VALIDITY OF AI SYSTEMS

Data issues in developing robust AI systems
Data sharing
Large datasets are required to facilitate the development of a robust AI system. The lack of high-quality public datasets that are truly representative of real-world clinical practice stands in


Figure 4. Framework of federated learning
Local hospitals are given a copy of a current global model from a federated server to train on their own datasets. After a certain number of iterations, the local hospitals send model updates back to the federated server and keep their datasets in their own secure infrastructure. The federated server aggregates the contributions from these hospitals. Then the updated global model is shared with the local hospitals, and they can continue local training. The main advantage of federated learning is that it establishes a global model without directly sharing datasets, preserving patient privacy across sites.
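The loop described in Figure 4 can be sketched in a few lines of Python. This is a minimal federated-averaging (FedAvg-style) illustration, in which the single-gradient-step "local training" and the toy hospital datasets stand in for real model training:

```python
def local_update(weights, data, lr=0.1):
    """One round of local training on a hospital's private data: here a
    single least-squares gradient step per (x, y) sample, standing in
    for full model training. The raw data never leave this function."""
    w = weights[:]
    for x, y in data:
        pred = sum(wi * xi for wi, xi in zip(w, x))
        err = pred - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def federated_average(client_weights, client_sizes):
    """Server step of FedAvg: average the client models, weighted by the
    number of local samples each client trained on."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(dim)]

# Three hospitals train locally; only model weights reach the server.
global_w = [0.0, 0.0]
hospitals = [
    [([1.0, 0.0], 2.0)],  # local data 1
    [([0.0, 1.0], 3.0)],  # local data 2
    [([1.0, 1.0], 5.0)],  # local data 3
]
for _ in range(5):  # communication rounds
    updates = [local_update(global_w, d) for d in hospitals]
    global_w = federated_average(updates, [len(d) for d in hospitals])
```

The key property, as the figure notes, is that only weight updates cross institutional boundaries; each hospital's dataset stays inside its own infrastructure.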

the path of clinical translation of AI systems. Data sharing might be a good solution but it generates ethical and legal challenges in general. Even if data are obtained in an anonymized manner, it can potentially put patient privacy at risk.126 Protecting patient privacy and acquiring approval for data use are important rules to comply with. Unfortunately, these rules may hinder data sharing among different medical research groups, not to mention making the data publicly available. The adoption of federated learning is a good alternative to training AI models with diverse data from multiple clinical institutions without the centralization of data.127 This strategy can address the issue that data reside in different institutions, removing barriers to data sharing and circumventing the problem of patient privacy (Figure 4). Meanwhile, federated learning can facilitate rapid data science collaboration and thus improve the robustness of AI systems.128 In addition, controlled-access data sharing, an approach that requires a request for access to datasets to be approved, is another alternative solution for researchers to acquire data to solve relevant research issues while protecting participant privacy.

Data annotation
Accurate clinical data annotations are crucial for the development of reliable AI systems,2,129 as annotation inconsistencies may lead to unpredictable clinical consequences, such as erroneous classifications.130 Several approaches have been leveraged to resolve disagreements between graders and obtain ground truth. One method consists of adopting the majority decision from a panel of three or more professional graders.7 Another consists of recruiting two or more professional graders to label data independently and then employing another senior grader to arbitrate disagreements, with the senior grader's decision used as the ground truth.131 Third, some data annotations can be conducted using the recognized gold standard. For example, Li et al.3 annotated images of benign and malignant eyelid tumors based on unequivocal histopathological diagnoses.

Normally, annotations can be divided into two categories: the annotation of interest regions in images (e.g., retinal hemorrhages, exudates, and drusen) and clinical annotations (e.g., disease classification, treatment response, vision prognosis). Conducting manual annotations for large-scale datasets before the model training is a considerably time-consuming and labor-intensive task that needs a lot of professional graders or ophthalmologists, hindering the construction of robust AI systems.132–135 Therefore, exploring techniques to promote the


efficient production of annotations is important, although manual annotations are still necessary. Training models using weakly supervised learning may be a good approach to reduce the workload of manual annotations.133,134 For image segmentation, weak supervision requires sparse manual annotations of small interest regions using dots via experts, whereas full supervision needs dense annotations, in which all pixels of images are manually labeled.132,134 Playout et al.132 have reported that weak supervision in combination with advanced learning in model training can achieve performance comparable with fully supervised models for retinal lesion segmentation in fundus images.

Standardization of clinical data collection
In the past two decades, health systems have heavily invested in the digitalization of every aspect of their operation. This transformation has resulted in unprecedented growth in the volume of medical electronic data, facilitating the development of AI-based medical devices.136 Although the size of datasets has increased, data collection is not done in a standardized manner, affecting the ready utilization of these data for AI model training and testing. This issue leads to a growing number of multicentric efforts to deal with the large variability in examination items, the timing of laboratory tests, image quality, etc. To improve the usability of data, standardization of clinical data collection should be implemented to generate high-quality data with complete and consistent information to support the development of robust medical AI products.137,138 For example, medical text data collection should include basic information such as age, gender, and examination date, and health examination records should have all examination items and complete results. Besides, as low-quality image data often result in a loss of diagnostic information and affect AI-based image analyses,139 image quality assessment is necessary at the stage of data allocation to filter out low-quality images, which could improve the performance of AI-based diagnostic models in real-world settings.140 To address this issue, Liu et al.141 developed a deep-learning-based flow-cytometry-like image quality classifier for the automated, high-throughput, and multidimensional classification of fundus image quality, which can detect low-quality images in real time and then guide a photographer to acquire high-quality images immediately. Li et al.142 reported a deep-learning-based image-quality-control system that could discern low-quality slit-lamp images. This system can be used as a prescreening tool to filter out low-quality images and ensure that only high-quality images will be transferred to the subsequent AI-based diagnostic systems. Shen et al.143 established a new multi-task domain adaptation framework for automated fundus image quality assessment. The proposed framework can offer interpretable quality assessment with both quantitative scores and quality visualization, which outperforms different state-of-the-art methods. Dai et al.23 demonstrated that AI-based image quality assessment could reduce the proportion of poor-quality images and significantly improve the accuracy of an AI model for DR diagnosis.

Real-world performance of AI systems
Recently, several reports showed that AI systems in practice were less helpful than the retrospective studies described.84,144 For example, Kanagasingam et al.144 evaluated their deep-learning-based system for DR screening based on retinal images in a real-world primary care clinic. They found that the system had a high false-positive rate (7.9%) and a low positive predictive value (12%). The possible reason for this unsatisfactory performance of the system is that the incidence of DR in the primary care clinic was 1%, whereas their AI system was developed using retrospective data in which the incidence of DR was much higher (33.3%). Long et al.145 developed an AI platform that had 98.25% accuracy in childhood cataract diagnosis and 92.86% accuracy in treatment suggestions in external datasets retrospectively collected from three hospitals. However, when they applied the platform to unselected real-world datasets prospectively obtained from five hospitals, accuracies decreased to 87.4% in cataract diagnosis and 70.8% in treatment determination.84 The possible explanation for this phenomenon is that retrospective datasets often undergo extensive filtering and cleaning, which makes them less representative of real-world clinical practice. Randomized controlled trials (RCTs) and prospective research can bridge such gaps between theory and practice, showing the true performance of AI systems in real healthcare settings and demonstrating how useful the systems are for the clinic.146

GENERALIZABILITY OF AI SYSTEMS TO CLINICAL PRACTICE

Although numerous studies reported that their AI systems showed robust performance in detecting eye diseases and had the potential to be applied in clinics, most AI-based medical devices have not yet been authorized for market distribution for clinical management of diseases such as AMD, glaucoma, and cataracts. One of the most important reasons for this is that the generalizability of AI systems to populations of different ethnicities and different countries, different clinical application scenarios, and images captured using different types of cameras remains uncertain. Many AI studies only evaluated their systems on data from a single source; hence, the systems often performed poorly in real-world datasets that had more sources of variation than the datasets utilized in research papers.84 To improve the generalizability of AI systems, first, we need to build large, multicenter, and multiethnic datasets for system development and evaluation. Milea et al.71 developed a deep-learning system for papilledema detection using data collected from 19 sites in 11 countries and evaluated the system in data obtained from five other sites in five countries. The AUCs of their system in internal and external test datasets were 0.99 and 0.96, verifying that the system had broad generalizability. In addition, transfer learning, a technique that aims to transfer knowledge from one task to a different but related task, can help decrease generalization errors of AI systems via reusing the weights of a pretrained model, particularly when faced with tasks with limited data.147 Kermany et al.69 demonstrated that AI systems trained with a transfer-learning algorithm had good performance and generalizability in the diagnosis of common diseases from different types of images, such as detecting diabetic macular edema from OCT images (accuracy = 98.2%) and pediatric pneumonia from chest X-ray images (accuracy = 92.8%). Third, the generalizability of AI networks can be improved by utilizing a data-augmentation (DA) strategy that creates more training samples for increasing the diversity of the training data.148 Zhou


et al.149 proposed an approach named DA-based feature alignment that could consistently and significantly improve the out-of-distribution generalizability (up to +16.3% mean of clean AUC) of AI algorithms in glaucoma detection from fundus images. Fourth, an AI algorithm trained based on lesion labels can broaden its generalizability in disease detection. Li et al.150 reported that the algorithm trained with the image-level classification labels and the anatomical and pathological labels displayed better performance and generalizability than that trained with only the image-level classification labels in diagnosing ophthalmic disorders from slit-lamp images (accuracies, 99.22%–79.47% versus 90.14%–47.19%).

INTERPRETABILITY OF AI SYSTEMS

AI systems are often described as black boxes due to the nature of these systems (being trained instead of being explicitly programmed).151 It is difficult for clinicians to understand the precise underlying functioning of the systems. As a result, correcting some erroneous behaviors might be difficult, and acceptance by clinicians as well as regulatory approval might be hampered. Decoding AI for clinicians can mitigate such uncertainty. This challenge has provided a stimulus for research groups and industries to focus on explainable AI, and techniques that enable a good understanding of the working principle of AI systems have been developed. For instance, Niu et al.152 reported a method that could enhance the interpretability of an AI system in detecting DR. To be specific, they first defined novel pathological descriptors leveraging activated neurons of the DR detector to encode both the appearance and spatial information of lesions. Then, they proposed a novel generative adversarial network (GAN), Patho-GAN, to visualize the signs that the DR detector identified as evidence to make a prediction. Xu et al.153 developed an explainable AI system for diagnosing fungal keratitis from in vivo confocal microscopy images based on gradient-weighted class activation mapping (Grad-CAM) and guided Grad-CAM techniques. They found that the assistance from the explainable AI system could boost ophthalmologists' performance beyond what was achievable by the ophthalmologist alone or with black-box AI assistance. Overall, these interpretation frameworks may facilitate AI acceptance for clinical usage.

LONGEVITY OF AI SYSTEMS

The performance of AI systems has the potential to degrade over time, as the characteristics of the world, such as disease distribution, population characteristics, health infrastructure, and cyber technologies, are changing all the time. This requires that AI systems have the ability of lifelong continuous learning to keep and even improve their performance over time. The continuous learning technique meta-learning, which aims to improve the AI algorithm itself, is a potential approach to address this issue.154

LIABILITY

Although medical AI systems can help physicians in clinics, such as with disease diagnosis, recommendations for treatment, and prognosis prediction, it is still unclear whether healthcare providers, developers, sellers, or regulators should be held accountable if an AI system makes mistakes in real-world clinical practice even after being thoroughly clinically validated. For example, AI systems may miss a retinal disease in a fundus image or recommend an incorrect treatment strategy. As a result, patients may be injured. In this case, we have to determine who is responsible for this incident. The allocation of liability makes clear not only whether and from whom patients acquire redress but also whether, potentially, AI systems will make their way into clinical practice.155 At present, the suggested solution is to treat medical AI systems as a confirmatory tool rather than as a source of ways to improve care.156 In other words, a physician should check every output from the medical AI systems to ensure the results meet and follow the standard of care. Therefore, the physician would be held liable if malpractice occurs due to using these systems. This strategy may minimize the potential value of medical AI systems, as some systems may perform better than even the best physicians, but the physicians would choose to ignore the AI recommendation when it conflicts with standard practice. Consequently, an approach that can balance the safety and innovation of medical AI needs to be further explored.

FUTURE DIRECTIONS

Generally, AI models have been directly trained using existing open-sourced machine-learning packages frequently utilized by others to address the issue of interest, without additional customization or refinement. This approach may limit the optimal performance of AI applications, as no generalized solution exists in most cases. To improve the performance of AI tools, in-depth knowledge of clinical problems as well as the features of AI algorithms is indispensable. Therefore, applicable customization of the algorithms should be conducted according to the specific challenges of each problem, which usually needs interdisciplinary collaboration among ophthalmologists, computer scientists (e.g., AI experts), policymakers, and others.

Although AI studies have seen enormous progress in the past decade, they are predominantly based on fixed datasets and stationary environments. The performance of AI systems is often fixed by the time they are developed. However, the world is not stationary, which requires that AI systems, like clinicians, have the ability to improve themselves constantly and evolve to thrive in dynamic learning settings. Continual learning techniques, such as gradient-based learning, modular neural networks, and meta-learning, may enable AI models to obtain specialized solutions without forgetting previous ones, namely learning over a lifetime, as a clinician does.157 These techniques may take AI to a higher level by improving learning efficiency and enabling knowledge transfer between related tasks.

In addition to current diagnostic and predictive tasks, AI methods can also be employed to support ophthalmologists with additional information impossible to obtain by sole visual inspection. For instance, the objective quantification of the area of a corneal ulcer via a combination of segmentation and detection techniques can assist ophthalmologists in precisely evaluating whether the treatment is effective on patients in follow-up visits.

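As a minimal sketch of this kind of follow-up quantification: the review does not provide code, so the example below assumes a hypothetical setting in which each visit yields a binary segmentation mask of the ulcer (1 = ulcer pixel) from some upstream network, and in which the imaging device's pixel spacing (mm per pixel) is known from calibration. The function names and the 0.02 mm spacing are illustrative assumptions, not values from the paper.

```python
import numpy as np

def ulcer_area_mm2(mask: np.ndarray, mm_per_pixel: float) -> float:
    """Area of the segmented ulcer in mm^2: pixel count x area per pixel."""
    return float(mask.sum()) * mm_per_pixel ** 2

def assess_follow_up(mask_before, mask_after, mm_per_pixel=0.02):
    """Compare ulcer area across two visits, as described in the text."""
    a0 = ulcer_area_mm2(mask_before, mm_per_pixel)
    a1 = ulcer_area_mm2(mask_after, mm_per_pixel)
    trend = "improved" if a1 < a0 else "worsened or unchanged"
    return a0, a1, trend

# Toy masks standing in for a segmentation network's output.
before = np.zeros((100, 100), dtype=np.uint8)
before[40:60, 40:60] = 1   # 400 ulcer pixels at the first visit
after = np.zeros((100, 100), dtype=np.uint8)
after[45:55, 45:55] = 1    # 100 ulcer pixels at follow-up

print(assess_follow_up(before, after))  # smaller area -> "improved"
```

In practice the masks would come from the segmentation/detection models discussed above, and the trend rule would likely use a tolerance rather than a strict inequality to absorb segmentation noise between visits.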
14 Cell Reports Medicine 4, 101095, July 18, 2023



If the area becomes smaller, it indicates that the condition has improved. Otherwise, it suggests that the condition has worsened and that treatment strategies may need to change.

To date, AI is not immune to the garbage-in, garbage-out weakness, even with big data. Appropriate data preprocessing to acquire high-quality training sets is critical to the success of AI systems.7,158 While AI systems perform well (e.g., in detecting corneal diseases) on high-quality images, they often perform poorly on low-quality images.159,160 By contrast, the performance of human doctors on low-quality images is better than that of an AI system, exposing a vulnerability of the AI system.159 As low-quality images are inevitable in real-world settings,142,161 approaches that improve the performance of AI systems on low-quality images are needed to enhance the robustness of AI-based products in clinical practice.

Many studies have drawn overly optimistic conclusions based on AI systems' good performance on external validation datasets. However, such results are not evidence of the clinical usefulness of AI systems.162 Well-conducted and well-reported prospective studies are essential to truly demonstrate the added value of AI systems in ophthalmology and pave the way to clinical implementation. Recent guidelines, such as the Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT)-AI extension, the Consolidated Standards of Reporting Trials (CONSORT)-AI extension, and the Standards for Reporting of Diagnostic Accuracy Studies (STARD)-AI, may improve the design, transparency, reporting, and nuanced conclusions of AI studies, rigorously validating the usefulness of medical AI and ultimately improving the quality of patient care.163–165 In addition, an international team established an evaluation framework termed Translational Evaluation of Healthcare AI (TEHAI), focusing on the assessment of translational aspects of AI systems in medicine.166 The evaluation components of TEHAI (e.g., capability, utility, and adoption) can be used at any stage of the development and deployment of medical AI systems.166

Patient privacy and data security are major concerns in medical AI development and application. Several approaches may help address these issues. First, sensitive data should be obtained and used in research with patient consent, and anonymization and aggregation strategies should be adopted to obscure personal details. Any clinical institution should handle patient data responsibly, for example, by utilizing appropriate security protocols. Second, differential privacy (DP), a data-perturbation-based privacy approach that retains the global information of a dataset while reducing information about any single individual, can be employed to reduce privacy risks and protect data security.167 With this approach, an outside observer cannot infer whether a specific individual's data were used in producing a result from the dataset. Third, homomorphic encryption, an encryption scheme that allows computation on encrypted data, is widely treated as a gold standard for data security. This approach has successfully been applied to AI algorithms and to data, enabling secure and joint computation.168

Recently, regulatory agencies, such as the Food and Drug Administration (FDA), have proposed a regulatory framework to evaluate the safety and effectiveness of AI-based medical devices during the initial premarket review.169 Specifically, manufacturers have to illustrate what aspects they intend to achieve through AI devices and how the devices will learn and change while remaining effective and safe, as well as strategies to reduce performance loss. This regulatory framework is good guidance for research groups to better develop and report their AI-based medical products.

Conclusions
AI in ophthalmology has made huge strides over the past decade. Plenty of studies have shown that the performance of AI is equal to, and even superior to, that of ophthalmologists in many diagnostic and predictive tasks. However, much work remains to be done before AI products can be deployed from bench to bedside. Issues such as the real-world performance, generalizability, and interpretability of AI systems are still insufficiently investigated and will require more attention in future studies. Solving data sharing, data annotation, and other related problems will facilitate the development of more robust AI products. Strategies such as customization of AI algorithms for a specific clinical task and utilization of continual learning techniques may further improve AI's ability to serve patients. RCTs and prospective studies following dedicated guidelines (e.g., the SPIRIT-AI extension, STARD-AI, and the FDA's guidance) can rigorously demonstrate whether AI devices bring a positive impact to real healthcare settings, contributing to the clinical translation of these devices. Although this field is not completely mature yet, we hope AI will play an important role in the future of ophthalmology, making healthcare more efficient, accurate, and accessible, especially in regions lacking ophthalmologists.

SUPPLEMENTAL INFORMATION

Supplemental information can be found online at https://doi.org/10.1016/j.xcrm.2023.101095.

ACKNOWLEDGMENTS

This study received funding from the National Natural Science Foundation of China (grant nos. 82201148, 62276210), the Natural Science Foundation of Zhejiang Province (grant no. LQ22H120002), the Medical and Health Science and Technology Project of Zhejiang Province (grant nos. 2022RC069, 2023KY1140), the Natural Science Foundation of Ningbo (grant no. 2023J390), the Natural Science Basic Research Program of Shaanxi (grant no. 2022JM-380), and the Ningbo Science & Technology Program (grant no. 2021S118). The funding organization played no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

AUTHOR CONTRIBUTIONS

Conception and design, Z.L., Y.S., and W.C.; identification of relevant literature, Z.L., L.W., X.W., J.J., W.Q., H.X., H.Z., S.W., Y.S., and W.C.; manuscript writing, all authors; final approval of the manuscript, all authors.

DECLARATION OF INTERESTS

The authors declare no competing interests.

REFERENCES

1. Esteva, A., Robicquet, A., Ramsundar, B., Kuleshov, V., DePristo, M., Chou, K., Cui, C., Corrado, G., Thrun, S., and Dean, J. (2019). A guide to deep learning in healthcare. Nat. Med. 25, 24–29.

2. LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature 521, 436–444.
3. Li, Z., Qiang, W., Chen, H., Pei, M., Yu, X., Wang, L., Li, Z., Xie, W., Wu, X., Jiang, J., and Wu, G. (2022). Artificial intelligence to detect malignant eyelid tumors from photographic images. NPJ Digit. Med. 5, 23.
4. Lotter, W., Diab, A.R., Haslam, B., Kim, J.G., Grisot, G., Wu, E., Wu, K., Onieva, J.O., Boyer, Y., Boxerman, J.L., et al. (2021). Robust breast cancer detection in mammography and digital breast tomosynthesis using an annotation-efficient deep learning approach. Nat. Med. 27, 244–249.
5. Landhuis, E. (2020). Deep learning takes on tumours. Nature 580, 551–553.
6. Ting, D.S.W., Cheung, C.Y.L., Lim, G., Tan, G.S.W., Quang, N.D., Gan, A., Hamzah, H., Garcia-Franco, R., San Yeo, I.Y., Lee, S.Y., et al. (2017). Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA 318, 2211–2223.
7. Gulshan, V., Peng, L., Coram, M., Stumpe, M.C., Wu, D., Narayanaswamy, A., Venugopalan, S., Widner, K., Madams, T., Cuadros, J., et al. (2016). Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316, 2402–2410.
8. Yim, J., Chopra, R., Spitz, T., Winkens, J., Obika, A., Kelly, C., Askham, H., Lukic, M., Huemer, J., Fasler, K., et al. (2020). Predicting conversion to wet age-related macular degeneration using deep learning. Nat. Med. 26, 892–899.
9. Li, Z., Guo, C., Nie, D., Lin, D., Cui, T., Zhu, Y., Chen, C., Zhao, L., Zhang, X., Dongye, M., et al. (2022). Automated detection of retinal exudates and drusen in ultra-widefield fundus images based on deep learning. Eye 36, 1681–1686.
10. Zhang, K., Liu, X., Shen, J., Li, Z., Sang, Y., Wu, X., Zha, Y., Liang, W., Wang, C., Wang, K., et al. (2020). Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography. Cell 182, 1360.
11. Brendlin, A.S., Peisen, F., Almansour, H., Afat, S., Eigentler, T., Amaral, T., Faby, S., Calvarons, A.F., Nikolaou, K., and Othman, A.E. (2021). A machine learning model trained on dual-energy CT radiomics significantly improves immunotherapy response prediction for patients with stage IV melanoma. J. Immunother. Cancer 9, e003261.
12. Li, Z., Guo, C., Lin, D., Nie, D., Zhu, Y., Chen, C., Zhao, L., Wang, J., Zhang, X., Dongye, M., et al. (2021). Deep learning for automated glaucomatous optic neuropathy detection from ultra-widefield fundus images. Br. J. Ophthalmol. 105, 1548–1554.
13. Dow, E.R., Keenan, T.D.L., Lad, E.M., Lee, A.Y., Lee, C.S., Loewenstein, A., Eydelman, M.B., Chew, E.Y., Keane, P.A., and Lim, J.I.; Collaborative Community for Ophthalmic Imaging Executive Committee and the Working Group for Artificial Intelligence in Age-Related Macular Degeneration (2022). From data to deployment: the collaborative community on ophthalmic imaging roadmap for artificial intelligence in age-related macular degeneration. Ophthalmology 129, e43–e59.
14. Ting, D.S.J., Foo, V.H., Yang, L.W.Y., Sia, J.T., Ang, M., Lin, H., Chodosh, J., Mehta, J.S., and Ting, D.S.W. (2021). Artificial intelligence for anterior segment diseases: emerging applications in ophthalmology. Br. J. Ophthalmol. 105, 158–168.
15. Li, Z., Jiang, J., Chen, K., Chen, Q., Zheng, Q., Liu, X., Weng, H., Wu, S., and Chen, W. (2021). Preventing corneal blindness caused by keratitis using artificial intelligence. Nat. Commun. 12, 3738.
16. Keenan, T.D.L., Chen, Q., Agrón, E., Tham, Y.C., Goh, J.H.L., Lei, X., Ng, Y.P., Liu, Y., Xu, X., Cheng, C.Y., et al. (2022). DeepLensNet: deep learning automated diagnosis and quantitative classification of cataract type and severity. Ophthalmology 129, 571–584.
17. Schmidt-Erfurth, U., Bogunovic, H., Sadeghipour, A., Schlegl, T., Langs, G., Gerendas, B.S., Osborne, A., and Waldstein, S.M. (2018). Machine learning to analyze the prognostic value of current imaging biomarkers in neovascular age-related macular degeneration. Ophthalmol. Retina 2, 24–30.
18. Burton, M.J., Ramke, J., Marques, A.P., Bourne, R.R.A., Congdon, N., Jones, I., Ah Tong, B.A.M., Arunga, S., Bachani, D., Bascaran, C., et al. (2021). The Lancet Global Health Commission on global eye health: vision beyond 2020. Lancet Global Health 9, e489–e551.
19. Cheung, N., Mitchell, P., and Wong, T.Y. (2010). Diabetic retinopathy. Lancet 376, 124–136.
20. Vujosevic, S., Aldington, S.J., Silva, P., Hernández, C., Scanlon, P., Peto, T., and Simó, R. (2020). Screening for diabetic retinopathy: new perspectives and challenges. Lancet Diabetes Endocrinol. 8, 337–347.
21. WHO (2020). Diabetic Retinopathy Screening: A Short Guide: Increase Effectiveness, Maximize Benefits and Minimize Harm. https://apps.who.int/iris/handle/10665/336660. (Accessed 10 April 2022).
22. Liu, X., Ali, T.K., Singh, P., Shah, A., McKinney, S.M., Ruamviboonsuk, P., Turner, A.W., Keane, P.A., Chotcomwongse, P., Nganthavee, V., et al. (2022). Deep learning to detect OCT-derived diabetic macular edema from color retinal photographs: a multicenter validation study. Ophthalmol. Retina 6, 398–410.
23. Dai, L., Wu, L., Li, H., Cai, C., Wu, Q., Kong, H., Liu, R., Wang, X., Hou, X., Liu, Y., et al. (2021). A deep learning system for detecting diabetic retinopathy across the disease spectrum. Nat. Commun. 12, 3242.
24. Lee, A.Y., Yanagihara, R.T., Lee, C.S., Blazes, M., Jung, H.C., Chee, Y.E., Gencarella, M.D., Gee, H., Maa, A.Y., Cockerham, G.C., et al. (2021). Multicenter, head-to-head, real-world validation study of seven automated artificial intelligence diabetic retinopathy screening systems. Diabetes Care 44, 1168–1175.
25. Araújo, T., Aresta, G., Mendonça, L., Penas, S., Maia, C., Carneiro, Â., Mendonça, A.M., and Campilho, A. (2020). DR|GRADUATE: uncertainty-aware deep learning-based diabetic retinopathy grading in eye fundus images. Med. Image Anal. 63, 101715.
26. Heydon, P., Egan, C., Bolter, L., Chambers, R., Anderson, J., Aldington, S., Stratton, I.M., Scanlon, P.H., Webster, L., Mann, S., et al. (2021). Prospective evaluation of an artificial intelligence-enabled algorithm for automated diabetic retinopathy screening of 30 000 patients. Br. J. Ophthalmol. 105, 723–728.
27. Natarajan, S., Jain, A., Krishnan, R., Rogye, A., and Sivaprasad, S. (2019). Diagnostic accuracy of community-based diabetic retinopathy screening with an offline artificial intelligence system on a smartphone. JAMA Ophthalmol. 137, 1182–1188.
28. Gulshan, V., Rajan, R.P., Widner, K., Wu, D., Wubbels, P., Rhodes, T., Whitehouse, K., Coram, M., Corrado, G., Ramasamy, K., et al. (2019). Performance of a deep-learning algorithm vs manual grading for detecting diabetic retinopathy in India. JAMA Ophthalmol. 137, 987–993.
29. Li, Z., Keel, S., Liu, C., He, Y., Meng, W., Scheetz, J., Lee, P.Y., Shaw, J., Ting, D., Wong, T.Y., et al. (2018). An automated grading system for detection of vision-threatening referable diabetic retinopathy on the basis of color fundus photographs. Diabetes Care 41, 2509–2516.
30. Tang, F., Luenam, P., Ran, A.R., Quadeer, A.A., Raman, R., Sen, P., Khan, R., Giridhar, A., Haridas, S., Iglicki, M., et al. (2021). Detection of diabetic retinopathy from ultra-widefield scanning laser ophthalmoscope images: a multicenter deep learning analysis. Ophthalmol. Retina 5, 1097–1106.
31. Engelmann, J., McTrusty, A.D., MacCormick, I.J.C., Pead, E., Storkey, A., and Bernabeu, M.O. (2022). Detecting multiple retinal diseases in ultra-widefield fundus imaging and data-driven identification of informative regions with deep learning. Nat. Mach. Intell. 4, 1143–1154.
32. Liu, Y., Wang, M., Morris, A.D., Doney, A.S.F., Leese, G.P., Pearson, E.R., and Palmer, C.N.A. (2013). Glycemic exposure and blood pressure influencing progression and remission of diabetic retinopathy: a longitudinal cohort study in GoDARTS. Diabetes Care 36, 3979–3984.
33. Bora, A., Balasubramanian, S., Babenko, B., Virmani, S., Venugopalan, S., Mitani, A., de Oliveira Marinho, G., Cuadros, J., Ruamviboonsuk, P.,
Corrado, G.S., et al. (2021). Predicting the risk of developing diabetic retinopathy using deep learning. Lancet. Digit. Health 3, e10–e19.
34. Arcadu, F., Benmansour, F., Maunz, A., Willis, J., Haskova, Z., and Prunotto, M. (2019). Deep learning algorithm predicts diabetic retinopathy progression in individual patients. NPJ Digit. Med. 2, 92.
35. Jonas, J.B., Aung, T., Bourne, R.R., Bron, A.M., Ritch, R., and Panda-Jonas, S. (2017). Glaucoma. Lancet 390, 2183–2193.
36. Tham, Y.C., Li, X., Wong, T.Y., Quigley, H.A., Aung, T., and Cheng, C.Y. (2014). Global prevalence of glaucoma and projections of glaucoma burden through 2040: a systematic review and meta-analysis. Ophthalmology 121, 2081–2090.
37. Weinreb, R.N., Aung, T., and Medeiros, F.A. (2014). The pathophysiology and treatment of glaucoma: a review. JAMA 311, 1901–1911.
38. GBD 2019 Blindness and Vision Impairment Collaborators; Vision Loss Expert Group of the Global Burden of Disease Study; Briant, P.S., Flaxman, S.R., Taylor, H.R.B., Jonas, J.B., Abdoli, A.A., Abrha, W.A., Abualhasan, A., Abu-Gharbieh, E.G., et al. (2021). Causes of blindness and vision impairment in 2020 and trends over 30 years, and prevalence of avoidable blindness in relation to VISION 2020: the Right to Sight: an analysis for the Global Burden of Disease Study. Lancet Global Health 9, e144–e160.
39. Yu, M., Lin, C., Weinreb, R.N., Lai, G., Chiu, V., and Leung, C.K.S. (2016). Risk of visual field progression in glaucoma patients with progressive retinal nerve fiber layer thinning: a 5-year prospective study. Ophthalmology 123, 1201–1210.
40. King, A., Azuara-Blanco, A., and Tuulonen, A. (2013). Glaucoma. BMJ 346, f3518.
41. Hollands, H., Johnson, D., Hollands, S., Simel, D.L., Jinapriya, D., and Sharma, S. (2013). Do findings on routine examination identify patients at risk for primary open-angle glaucoma? The rational clinical examination systematic review. JAMA 309, 2035–2042.
42. Xiong, J., Li, F., Song, D., Tang, G., He, J., Gao, K., Zhang, H., Cheng, W., Song, Y., Lin, F., et al. (2022). Multimodal machine learning using visual fields and peripapillary circular OCT scans in detection of glaucomatous optic neuropathy. Ophthalmology 129, 171–180.
43. Li, F., Su, Y., Lin, F., Li, Z., Song, Y., Nie, S., Xu, J., Chen, L., Chen, S., Li, H., et al. (2022). A deep-learning system predicts glaucoma incidence and progression using retinal photographs. J. Clin. Invest. 132, e157968.
44. Fan, R., Bowd, C., Christopher, M., Brye, N., Proudfoot, J.A., Rezapour, J., Belghith, A., Goldbaum, M.H., Chuter, B., Girkin, C.A., et al. (2022). Detecting glaucoma in the ocular hypertension study using deep learning. JAMA Ophthalmol. 140, 383–391.
45. Li, F., Yang, Y., Sun, X., Qiu, Z., Zhang, S., Tun, T.A., Mani, B., Nongpiur, M.E., Chansangpetch, S., Ratanawongphaibul, K., et al. (2022). Digital gonioscopy based on three-dimensional anterior-segment OCT: an international multicenter study. Ophthalmology 129, 45–53.
46. Dixit, A., Yohannan, J., and Boland, M.V. (2021). Assessing glaucoma progression using machine learning trained on longitudinal visual field and clinical data. Ophthalmology 128, 1016–1026.
47. Medeiros, F.A., Jammal, A.A., and Mariottoni, E.B. (2021). Detection of progressive glaucomatous optic nerve damage on fundus photographs with deep learning. Ophthalmology 128, 383–392.
48. Yousefi, S., Elze, T., Pasquale, L.R., Saeedi, O., Wang, M., Shen, L.Q., Wellik, S.R., De Moraes, C.G., Myers, J.S., and Boland, M.V. (2020). Monitoring glaucomatous functional loss using an artificial intelligence-enabled dashboard. Ophthalmology 127, 1170–1178.
49. Ran, A.R., Cheung, C.Y., Wang, X., Chen, H., Luo, L.Y., Chan, P.P., Wong, M.O.M., Chang, R.T., Mannil, S.S., Young, A.L., et al. (2019). Detection of glaucomatous optic neuropathy with spectral-domain optical coherence tomography: a retrospective training and validation deep-learning analysis. Lancet. Digit. Health 1, e172–e182.
50. Martin, K.R., Mansouri, K., Weinreb, R.N., Wasilewicz, R., Gisler, C., Hennebert, J., and Genoud, D.; Research Consortium (2018). Use of machine learning on contact lens sensor-derived parameters for the diagnosis of primary open-angle glaucoma. Am. J. Ophthalmol. 194, 46–53.
51. Asaoka, R., Murata, H., Iwase, A., and Araie, M. (2016). Detecting preperimetric glaucoma with standard automated perimetry using a deep learning classifier. Ophthalmology 123, 1974–1980.
52. Girard, M.J.A., and Schmetterer, L. (2020). Artificial intelligence and deep learning in glaucoma: current state and future prospects. Prog. Brain Res. 257, 37–64.
53. Hood, D.C., La Bruna, S., Tsamis, E., Thakoor, K.A., Rai, A., Leshno, A., de Moraes, C.G.V., Cioffi, G.A., and Liebmann, J.M. (2022). Detecting glaucoma with only OCT: implications for the clinic, research, screening, and AI development. Prog. Retin. Eye Res. 90, 101052.
54. Ran, A.R., Tham, C.C., Chan, P.P., Cheng, C.Y., Tham, Y.C., Rim, T.H., and Cheung, C.Y. (2021). Deep learning in glaucoma with optical coherence tomography: a review. Eye 35, 188–201.
55. Fu, H., Baskaran, M., Xu, Y., Lin, S., Wong, D.W.K., Liu, J., Tun, T.A., Mahesh, M., Perera, S.A., and Aung, T. (2019). A deep learning system for automated angle-closure detection in anterior segment optical coherence tomography images. Am. J. Ophthalmol. 203, 37–45.
56. Yousefi, S., Kiwaki, T., Zheng, Y., Sugiura, H., Asaoka, R., Murata, H., Lemij, H., and Yamanishi, K. (2018). Detection of longitudinal visual field progression in glaucoma using machine learning. Am. J. Ophthalmol. 193, 71–79.
57. Wang, M., Shen, L.Q., Pasquale, L.R., Petrakos, P., Formica, S., Boland, M.V., Wellik, S.R., De Moraes, C.G., Myers, J.S., Saeedi, O., et al. (2019). An artificial intelligence approach to detect visual field progression in glaucoma based on spatial pattern analysis. Invest. Ophthalmol. Vis. Sci. 60, 365–375.
58. Mitchell, P., Liew, G., Gopinath, B., and Wong, T.Y. (2018). Age-related macular degeneration. Lancet 392, 1147–1159.
59. Wong, W.L., Su, X., Li, X., Cheung, C.M.G., Klein, R., Cheng, C.Y., and Wong, T.Y. (2014). Global prevalence of age-related macular degeneration and disease burden projection for 2020 and 2040: a systematic review and meta-analysis. Lancet Global Health 2, e106–e116.
60. Heier, J.S., Khanani, A.M., Quezada Ruiz, C., Basu, K., Ferrone, P.J., Brittain, C., Figueroa, M.S., Lin, H., Holz, F.G., Patel, V., et al. (2022). Efficacy, durability, and safety of intravitreal faricimab up to every 16 weeks for neovascular age-related macular degeneration (TENAYA and LUCERNE): two randomised, double-masked, phase 3, non-inferiority trials. Lancet 399, 729–740.
61. Peng, Y., Dharssi, S., Chen, Q., Keenan, T.D., Agrón, E., Wong, W.T., Chew, E.Y., and Lu, Z. (2019). DeepSeeNet: a deep learning model for automated classification of patient-based age-related macular degeneration severity from color fundus photographs. Ophthalmology 126, 565–575.
62. Burlina, P.M., Joshi, N., Pekala, M., Pacheco, K.D., Freund, D.E., and Bressler, N.M. (2017). Automated grading of age-related macular degeneration from color fundus images using deep convolutional neural networks. JAMA Ophthalmol. 135, 1170–1176.
63. Potapenko, I., Thiesson, B., Kristensen, M., Hajari, J.N., Ilginis, T., Fuchs, J., Hamann, S., and la Cour, M. (2022). Automated artificial intelligence-based system for clinical follow-up of patients with age-related macular degeneration. Acta Ophthalmol. 100, 927–936.
64. Yellapragada, B., Hornauer, S., Snyder, K., Yu, S., and Yiu, G. (2022). Self-supervised feature learning and phenotyping for assessing age-related macular degeneration using retinal fundus images. Ophthalmol. Retina 6, 116–129.
65. Rakocz, N., Chiang, J.N., Nittala, M.G., Corradetti, G., Tiosano, L., Velaga, S., Thompson, M., Hill, B.L., Sankararaman, S., Haines, J.L., et al. (2021). Automated identification of clinical features from sparsely annotated 3-dimensional medical imaging. NPJ Digit. Med. 4, 44.
66. Hwang, D.K., Hsu, C.C., Chang, K.J., Chao, D., Sun, C.H., Jheng, Y.C., Yarmishyn, A.A., Wu, J.C., Tsai, C.Y., Wang, M.L., et al. (2019). Artificial intelligence-based decision-making for age-related macular degeneration. Theranostics 9, 232–245.
67. Keel, S., Li, Z., Scheetz, J., Robman, L., Phung, J., Makeyeva, G., Aung, collaborative management of cataracts. Br. J. Ophthalmol. 103,
K., Liu, C., Yan, X., Meng, W., et al. (2019). Development and validation of 1553–1560.
a deep-learning algorithm for the detection of neovascular age-related
84. Lin, H., Li, R., Liu, Z., Chen, J., Yang, Y., Chen, H., Lin, Z., Lai, W., Long,
macular degeneration from colour fundus photographs. Clin. Exp. Oph-
E., Wu, X., et al. (2019). Diagnostic efficacy and therapeutic decision-
thalmol. 47, 1009–1018.
making capacity of an artificial intelligence platform for childhood cata-
68. Grassmann, F., Mengelkamp, J., Brandl, C., Harsch, S., Zimmermann, racts in eye clinics: a multicentre randomized controlled trial. EClinical-
M.E., Linkohr, B., Peters, A., Heid, I.M., Palm, C., and Weber, B.H.F. Medicine 9, 52–59.
(2018). A deep learning algorithm for prediction of Age-Related eye dis-
85. Zhang, H., Niu, K., Xiong, Y., Yang, W., He, Z., and Song, H. (2019). Auto-
ease study severity scale for Age-Related macular degeneration from co-
matic cataract grading methods based on deep learning. Comput.
lor fundus photography. Ophthalmology 125, 1410–1420.
Methods Progr. Biomed. 182, 104978.
69. Kermany, D.S., Goldbaum, M., Cai, W., Valentim, C.C.S., Liang, H.,
86. Gao, X., Lin, S., and Wong, T.Y. (2015). Automatic feature learning to
Baxter, S.L., McKeown, A., Yang, G., Wu, X., Yan, F., et al. (2018). Iden-
grade nuclear cataracts based on deep learning. IEEE Trans. Biomed.
tifying medical diagnoses and treatable diseases by Image-Based deep
Eng. 62, 2693–2701.
learning. Cell 172, 1122–1131.e9.
87. Garcia Nespolo, R., Yi, D., Cole, E., Valikodath, N., Luciano, C., and Lei-
70. Yan, Q., Weeks, D.E., Xin, H., Swaroop, A., Chew, E.Y., Huang, H., Ding, Y.,
derman, Y.I. (2022). Evaluation of artificial Intelligence-Based intraopera-
and Chen, W. (2020). Deep-learning-based prediction of late Age-Related
tive guidance tools for phacoemulsification cataract surgery. JAMA Oph-
macular degeneration progression. Nat. Mach. Intell. 2, 141–150.
thalmol. 140, 170–177.
71. Milea, D., Najjar, R.P., Zhubo, J., Ting, D., Vasseneix, C., Xu, X., Aghsaei
88. Flaxman, S.R., Bourne, R.R.A., Resnikoff, S., Ackland, P., Braithwaite, T.,
Fard, M., Fonseca, P., Vanikieti, K., Lagrèze, W.A., et al. (2020). Artificial
Cicinelli, M.V., Das, A., Jonas, J.B., Keeffe, J., Kempen, J.H., et al. (2017).
intelligence to detect papilledema from ocular fundus photographs.
N. Engl. J. Med. 382, 1687–1695. Global causes of blindness and distance vision impairment 1990-2020: a
systematic review and meta-analysis. Lancet Global Health 5, e1221–
72. Brown, J.M., Campbell, J.P., Beers, A., Chang, K., Ostmo, S., Chan, R.V.P., e1234.
Dy, J., Erdogmus, D., Ioannidis, S., Kalpathy-Cramer, J., et al. (2018). Auto-
mated diagnosis of plus disease in retinopathy of prematurity using deep 89. Burton, M.J. (2009). Prevention, treatment and rehabilitation. Community
convolutional neural networks. JAMA Ophthalmol. 136, 803–810. Eye Health 22, 33–35.

73. Li, Z., Guo, C., Nie, D., Lin, D., Zhu, Y., Chen, C., Zhang, L., Xu, F., Jin, C., 90. Singh, P., Gupta, A., and Tripathy, K. (2020). Keratitis. StatPearls.
Zhang, X., et al. (2019). A deep learning system for identifying lattice 91. Lin, A., Rhee, M.K., Akpek, E.K., Amescua, G., Farid, M., Garcia-Ferrer,
degeneration and retinal breaks using ultra-widefield fundus images. F.J., Varu, D.M., Musch, D.C., Dunn, S.P., and Mah, F.S.; American
Ann. Transl. Med. 7, 618. Academy of Ophthalmology Preferred Practice Pattern Cornea and
74. Li, Z., Guo, C., Nie, D., Lin, D., Zhu, Y., Chen, C., Wu, X., Xu, F., Jin, C., External Disease Panel (2019). Bacterial keratitis preferred practice Pat-
Zhang, X., et al. (2020). Deep learning for detecting retinal detachment tern(R). Ophthalmology 126, P1–P55.
and discerning macular status using ultra-widefield fundus images. 92. Xu, Y., Kong, M., Xie, W., Duan, R., Fang, Z., Lin, Y., Zhu, Q., Tang, S.,
Commun. Biol. 3, 15. Wu, F., and Yao, Y.F. (2021). Deep sequential feature learning in clinical
75. Li, Y., Foo, L.L., Wong, C.W., Li, J., Hoang, Q.V., Schmetterer, L., Ting, image classification of infectious keratitis. Engineering 7, 1002–1010.
D.S.W., and Ang, M. (2023). Pathologic myopia: advances in imaging 93. Redd, T.K., Prajna, N.V., Srinivasan, M., Lalitha, P., Krishnan, T., Rajara-
and the potential role of artificial intelligence. Br. J. Ophthalmol. 107, man, R., Venugopal, A., Acharya, N., Seitzman, G.D., Lietman, T.M., et al.
600–606. (2022). Image-Based differentiation of bacterial and fungal keratitis using
76. Tsai, Y.Y., Lin, W.Y., Chen, S.J., Ruamviboonsuk, P., King, C.H., and deep convolutional neural networks. Ophthalmol. Sci. 2, 100119.
Tsai, C.L. (2022). Diagnosis of polypoidal choroidal vasculopathy from fluorescein angiography using deep learning. Transl. Vis. Sci. Technol. 11, 6.
77. Liu, Y.C., Wilkins, M., Kim, T., Malyugin, B., and Mehta, J.S. (2017). Cataracts. Lancet 390, 600–612.
78. Tham, Y.C., Goh, J.H.L., Anees, A., Lei, X., Rim, T.H., Chee, M.L., Wang, Y.X., Jonas, J.B., Thakur, S., Teo, Z.L., et al. (2022). Detecting visually significant cataract using retinal photograph-based deep learning. Nat. Aging 2, 264–271.
79. Xu, X., Li, J., Guan, Y., Zhao, L., Zhao, Q., Zhang, L., and Li, L. (2021). GLA-Net: a global-local attention network for automatic cataract classification. J. Biomed. Inf. 124, 103939.
80. Lu, Q., Wei, L., He, W., Zhang, K., Wang, J., Zhang, Y., Rong, X., Zhao, Z., Cai, L., He, X., et al. (2022). Lens Opacities Classification System III-based artificial intelligence program for automatic cataract grading. J. Cataract Refract. Surg. 48, 528–534.
81. Lin, D., Chen, J., Lin, Z., Li, X., Zhang, K., Wu, X., Liu, Z., Huang, J., Li, J., Zhu, Y., et al. (2020). A practical model for the identification of congenital cataracts using machine learning. EBioMedicine 51, 102621.
82. Xu, X., Zhang, L., Li, J., Guan, Y., and Zhang, L. (2020). A hybrid global-local representation CNN model for automatic cataract grading. IEEE J. Biomed. Health Inform. 24, 556–567.
83. Wu, X., Huang, Y., Liu, Z., Lai, W., Long, E., Zhang, K., Jiang, J., Lin, D., Chen, K., Yu, T., et al. (2019). Universal artificial intelligence platform for
94. Ren, Z., Li, W., Liu, Q., Dong, Y., and Huang, Y. (2022). Profiling of the conjunctival bacterial microbiota reveals the feasibility of utilizing a microbiome-based machine learning model to differentially diagnose microbial keratitis and the core components of the conjunctival bacterial interaction network. Front. Cell. Infect. Microbiol. 12, 860370.
95. Wu, W., Huang, S., Xie, X., Chen, C., Yan, Z., Lv, X., Fan, Y., Chen, C., Yue, F., and Yang, B. (2022). Raman spectroscopy may allow rapid noninvasive screening of keratitis and conjunctivitis. Photodiagnosis Photodyn. Ther. 37, 102689.
96. Tiwari, M., Piech, C., Baitemirova, M., Prajna, N.V., Srinivasan, M., Lalitha, P., Villegas, N., Balachandar, N., Chua, J.T., Redd, T., et al. (2022). Differentiation of active corneal infections from healed scars using deep learning. Ophthalmology 129, 139–146.
97. Ghosh, A.K., Thammasudjarit, R., Jongkhajornpong, P., Attia, J., and Thakkinstian, A. (2022). Deep learning for discrimination between fungal keratitis and bacterial keratitis: DeepKeratitis. Cornea 41, 616–622.
98. Wang, L., Chen, K., Wen, H., Zheng, Q., Chen, Y., Pu, J., and Chen, W. (2021). Feasibility assessment of infectious keratitis depicted on slit-lamp and smartphone photographs using deep learning. Int. J. Med. Inf. 155, 104583.
99. Lv, J., Zhang, K., Chen, Q., Chen, Q., Huang, W., Cui, L., Li, M., Li, J., Chen, L., Shen, C., et al. (2020). Deep learning-based automated diagnosis of fungal keratitis with in vivo confocal microscopy images. Ann. Transl. Med. 8, 706.

18 Cell Reports Medicine 4, 101095, July 18, 2023



100. Gu, H., Guo, Y., Gu, L., Wei, A., Xie, S., Ye, Z., Xu, J., Zhou, X., Lu, Y., Liu, X., and Hong, J. (2020). Deep learning for identifying corneal diseases from ocular surface slit-lamp photographs. Sci. Rep. 10, 17851.
101. Ferdi, A.C., Nguyen, V., Gore, D.M., Allan, B.D., Rozema, J.J., and Watson, S.L. (2019). Keratoconus natural progression: a systematic review and meta-analysis of 11 529 eyes. Ophthalmology 126, 935–945.
102. de Sanctis, U., Loiacono, C., Richiardi, L., Turco, D., Mutani, B., and Grignolo, F.M. (2008). Sensitivity and specificity of posterior corneal elevation measured by Pentacam in discriminating keratoconus/subclinical keratoconus. Ophthalmology 115, 1534–1539.
103. Castro-Luna, G., Jiménez-Rodríguez, D., Castaño-Fernández, A.B., and Pérez-Rueda, A. (2021). Diagnosis of subclinical keratoconus based on machine learning techniques. J. Clin. Med. 10, 4281.
104. Al-Timemy, A.H., Mosa, Z.M., Alyasseri, Z., Lavric, A., Lui, M.M., Hazarbassanov, R.M., and Yousefi, S. (2021). A hybrid deep learning construct for detecting keratoconus from corneal maps. Transl. Vis. Sci. Technol. 10, 16.
105. Almeida, J.G., Guido, R.C., Balarin, S.H., Brandao, C.C., Carlos, D.M.L., Lopes, B.T., Machado, A.P., and Ambrosio, R.J. (2022). Novel artificial intelligence index based on Scheimpflug corneal tomography to distinguish subclinical keratoconus from healthy corneas. J. Cataract Refract. Surg.
106. Jiménez-García, M., Issarti, I., Kreps, E.O., Ní Dhubhghaill, S., Koppen, C., Varssano, D., and Rozema, J.J.; The REDCAKE Study Group (2021). Forecasting progressive trends in keratoconus by means of a time delay neural network. J. Clin. Med. 10, 3238.
107. Xie, Y., Zhao, L., Yang, X., Wu, X., Yang, Y., Huang, X., Liu, F., Xu, J., Lin, L., Lin, H., et al. (2020). Screening candidates for refractive surgery with corneal tomographic-based deep learning. JAMA Ophthalmol. 138, 519–526.
108. Zéboulon, P., Debellemanière, G., Bouvet, M., and Gatinel, D. (2020). Corneal topography raw data classification using a convolutional neural network. Am. J. Ophthalmol. 219, 33–39.
109. Shi, C., Wang, M., Zhu, T., Zhang, Y., Ye, Y., Jiang, J., Chen, S., Lu, F., and Shen, M. (2020). Machine learning helps improve diagnostic ability of subclinical keratoconus using Scheimpflug and OCT imaging modalities. Eye Vis. 7, 48.
110. Cao, K., Verspoor, K., Chan, E., Daniell, M., Sahebjada, S., and Baird, P.N. (2021). Machine learning with a reduced dimensionality representation of comprehensive Pentacam tomography parameters to identify subclinical keratoconus. Comput. Biol. Med. 138, 104884.
111. Issarti, I., Consejo, A., Jiménez-García, M., Hershko, S., Koppen, C., and Rozema, J.J. (2019). Computer aided diagnosis for suspect keratoconus detection. Comput. Biol. Med. 109, 33–42.
112. Ruiz Hidalgo, I., Rodriguez, P., Rozema, J.J., Ní Dhubhghaill, S., Zakaria, N., Tassignon, M.J., and Koppen, C. (2016). Evaluation of a machine-learning classifier for keratoconus detection based on Scheimpflug tomography. Cornea 35, 827–832.
113. Chase, C., Elsawy, A., Eleiwa, T., Ozcan, E., Tolba, M., and Abou Shousha, M. (2021). Comparison of autonomous AS-OCT deep learning algorithm and clinical dry eye tests in diagnosis of dry eye disease. Clin. Ophthalmol. 15, 4281–4289.
114. Zhang, Y.Y., Zhao, H., Lin, J.Y., Wu, S.N., Liu, X.W., Zhang, H.D., Shao, Y., and Yang, W.F. (2021). Artificial intelligence to detect meibomian gland dysfunction from in-vivo laser confocal microscopy. Front. Med. 8, 774344.
115. GBD 2013 Mortality and Causes of Death Collaborators (2015). Global, regional, and national age-sex specific all-cause and cause-specific mortality for 240 causes of death, 1990-2013: a systematic analysis for the Global Burden of Disease Study 2013. Lancet 385, 117–171.
116. Seidelmann, S.B., Claggett, B., Bravo, P.E., Gupta, A., Farhad, H., Klein, B.E., Klein, R., Di Carli, M., and Solomon, S.D. (2016). Retinal vessel calibers in predicting long-term cardiovascular outcomes: the atherosclerosis risk in communities study. Circulation 134, 1328–1338.
117. Rim, T.H., Lee, C.J., Tham, Y.C., Cheung, N., Yu, M., Lee, G., Kim, Y., Ting, D.S.W., Chong, C.C.Y., Choi, Y.S., et al. (2021). Deep-learning-based cardiovascular risk stratification using coronary artery calcium scores predicted from retinal photographs. Lancet. Digit. Health 3, e306–e316.
118. Poplin, R., Varadarajan, A.V., Blumer, K., Liu, Y., McConnell, M.V., Corrado, G.S., Peng, L., and Webster, D.R. (2018). Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nat. Biomed. Eng. 2, 158–164.
119. Kalantar-Zadeh, K., Jafar, T.H., Nitsch, D., Neuen, B.L., and Perkovic, V. (2021). Chronic kidney disease. Lancet 398, 786–802.
120. Ahmad, E., Lim, S., Lamptey, R., Webb, D.R., and Davies, M.J. (2022). Type 2 diabetes. Lancet 400, 1803–1820.
121. Zhang, K., Liu, X., Xu, J., Yuan, J., Cai, W., Chen, T., Wang, K., Gao, Y., Nie, S., Xu, X., et al. (2021). Deep-learning models for the detection and incidence prediction of chronic kidney disease and type 2 diabetes from retinal fundus images. Nat. Biomed. Eng. 5, 533–545.
122. Sabanayagam, C., Xu, D., Ting, D.S.W., Nusinovici, S., Banu, R., Hamzah, H., Lim, C., Tham, Y.C., Cheung, C.Y., Tai, E.S., et al. (2020). A deep learning algorithm to detect chronic kidney disease from retinal photographs in community-based populations. Lancet. Digit. Health 2, e295–e302.
123. Scheltens, P., De Strooper, B., Kivipelto, M., Holstege, H., Chételat, G., Teunissen, C.E., Cummings, J., and van der Flier, W.M. (2021). Alzheimer's disease. Lancet 397, 1577–1590.
124. Gupta, V.B., Chitranshi, N., den Haan, J., Mirzaei, M., You, Y., Lim, J.K., Basavarajappa, D., Godinez, A., Di Angelantonio, S., Sachdev, P., et al. (2021). Retinal changes in Alzheimer's disease: integrated prospects of imaging, functional and molecular advances. Prog. Retin. Eye Res. 82, 100899.
125. Cheung, C.Y., Ran, A.R., Wang, S., Chan, V.T.T., Sham, K., Hilal, S., Venketasubramanian, N., Cheng, C.Y., Sabanayagam, C., Tham, Y.C., et al. (2022). A deep learning model for detection of Alzheimer's disease based on retinal photographs: a retrospective, multicentre case-control study. Lancet. Digit. Health 4, e806–e815.
126. Price, W.N., and Cohen, I.G. (2019). Privacy in the age of medical big data. Nat. Med. 25, 37–43.
127. Rieke, N., Hancox, J., Li, W., Milletarì, F., Roth, H.R., Albarqouni, S., Bakas, S., Galtier, M.N., Landman, B.A., Maier-Hein, K., et al. (2020). The future of digital health with federated learning. NPJ Digit. Med. 3, 119.
128. Dayan, I., Roth, H.R., Zhong, A., Harouni, A., Gentili, A., Abidin, A.Z., Liu, A., Costa, A.B., Wood, B.J., Tsai, C.S., et al. (2021). Federated learning for predicting clinical outcomes in patients with COVID-19. Nat. Med. 27, 1735–1743.
129. Krause, J., Gulshan, V., Rahimy, E., Karth, P., Widner, K., Corrado, G.S., Peng, L., and Webster, D.R. (2018). Grader variability and the importance of reference standards for evaluating machine learning models for diabetic retinopathy. Ophthalmology 125, 1264–1272.
130. Sylolypavan, A., Sleeman, D., Wu, H., and Sim, M. (2023). The impact of inconsistent human annotations on AI driven clinical decision making. NPJ Digit. Med. 6, 26.
131. Lin, D., Xiong, J., Liu, C., Zhao, L., Li, Z., Yu, S., Wu, X., Ge, Z., Hu, X., Wang, B., et al. (2021). Application of Comprehensive Artificial Intelligence Retinal Expert (CARE) system: a national real-world evidence study. Lancet. Digit. Health 3, e486–e495.
132. Playout, C., Duval, R., and Cheriet, F. (2019). A novel weakly supervised multitask architecture for retinal lesions segmentation on fundus images. IEEE Trans. Med. Imag. 38, 2434–2444.
133. Wang, J., Li, W., Chen, Y., Fang, W., Kong, W., He, Y., and Shi, G. (2021). Weakly supervised anomaly segmentation in retinal OCT images using an adversarial learning approach. Biomed. Opt. Express 12, 4713–4729.

134. Xing, R., Niu, S., Gao, X., Liu, T., Fan, W., and Chen, Y. (2021). Weakly supervised serous retinal detachment segmentation in SD-OCT images by two-stage learning. Biomed. Opt. Express 12, 2312–2327.
135. Cai, W., Xu, J., Wang, K., Liu, X., Xu, W., Cai, H., Gao, Y., Su, Y., Zhang, M., Zhu, J., et al. (2021). EyeHealer: a large-scale anterior eye segment dataset with eye structure and lesion annotations. Precis. Clin. Med. 4, 85–92.
136. Chen, D., Liu, S., Kingsbury, P., Sohn, S., Storlie, C.B., Habermann, E.B., Naessens, J.M., Larson, D.W., and Liu, H. (2019). Deep learning and alternative learning strategies for retrospective real-world clinical data. NPJ Digit. Med. 2, 43.
137. Yang, Y., Li, R., Xiang, Y., Lin, D., Yan, A., Chen, W., Li, Z., Lai, W., Wu, X., Wan, C., et al. (2021). Standardization of collection, storage, annotation, and management of data related to medical artificial intelligence. Intelligent Medicine.
138. Li, X.T., and Huang, R.Y. (2020). Standardization of imaging methods for machine learning in neuro-oncology. Neurooncol. Adv. 2, v49–v55.
139. Li, Z., and Chen, W. (2023). Solving data quality issues of fundus images in real-world settings by ophthalmic AI. Cell Rep. Med. 4, 100951.
140. Liu, R., Wang, X., Wu, Q., Dai, L., Fang, X., Yan, T., Son, J., Tang, S., Li, J., Gao, Z., et al. (2022). DeepDRiD: diabetic retinopathy-grading and image quality estimation challenge. Patterns (N Y) 3, 100512.
141. Liu, L., Wu, X., Lin, D., Zhao, L., Li, M., Yun, D., Lin, Z., Pang, J., Li, L., Wu, Y., et al. (2023). DeepFundus: a flow-cytometry-like image quality classifier for boosting the whole life cycle of medical artificial intelligence. Cell Rep. Med. 4, 100912.
142. Li, Z., Jiang, J., Chen, K., Zheng, Q., Liu, X., Weng, H., Wu, S., and Chen, W. (2021). Development of a deep learning-based image quality control system to detect and filter out ineligible slit-lamp images: a multicenter study. Comput. Methods Progr. Biomed. 203, 106048.
143. Shen, Y., Sheng, B., Fang, R., Li, H., Dai, L., Stolte, S., Qin, J., Jia, W., and Shen, D. (2020). Domain-invariant interpretable fundus image quality assessment. Med. Image Anal. 61, 101654.
144. Kanagasingam, Y., Xiao, D., Vignarajan, J., Preetham, A., Tay-Kearney, M.L., and Mehrotra, A. (2018). Evaluation of artificial intelligence-based grading of diabetic retinopathy in primary care. JAMA Netw. Open 1, e182665.
145. Long, E., Lin, H., Liu, Z., Wu, X., Wang, L., Jiang, J., An, Y., Lin, Z., Li, X., Chen, J., et al. (2017). An artificial intelligence platform for the multihospital collaborative management of congenital cataracts. Nat. Biomed. Eng. 1, 0024.
146. Rajpurkar, P., Chen, E., Banerjee, O., and Topol, E.J. (2022). AI in health and medicine. Nat. Med. 28, 31–38.
147. Wang, N., Cheng, M., and Ning, K. (2022). Overcoming regional limitations: transfer learning for cross-regional microbial-based diagnosis of diseases. Gut. 2022-328216.
148. Shorten, C., and Khoshgoftaar, T.M. (2019). A survey on image data augmentation for deep learning. J. Big Data 6, 60.
149. Zhou, C., Ye, J., Wang, J., Zhou, Z., Wang, L., Jin, K., Wen, Y., Zhang, C., and Qian, D. (2022). Improving the generalization of glaucoma detection on fundus images via feature alignment between augmented views. Biomed. Opt. Express 13, 2018–2034.
150. Li, W., Yang, Y., Zhang, K., Long, E., He, L., Zhang, L., Zhu, Y., Chen, C., Liu, Z., Wu, X., et al. (2020). Dense anatomical annotation of slit-lamp images improves the performance of deep learning for the diagnosis of ophthalmic disorders. Nat. Biomed. Eng. 4, 767–777.
151. Duran, J.M., and Jongsma, K.R. (2021). Who is afraid of black box algorithms? On the epistemological and ethical basis of trust in medical AI. J. Med. Ethics.
152. Niu, Y., Gu, L., Zhao, Y., and Lu, F. (2022). Explainable diabetic retinopathy detection and retinal image generation. IEEE J. Biomed. Health Inform. 26, 44–55.
153. Xu, F., Jiang, L., He, W., Huang, G., Hong, Y., Tang, F., Lv, J., Lin, Y., Qin, Y., Lan, R., et al. (2021). The clinical value of explainable deep learning for diagnosing fungal keratitis using in vivo confocal microscopy images. Front. Med. 8, 797616.
154. Hospedales, T.M., Antoniou, A., Micaelli, P., and Storkey, A.J. (2021). Meta-learning in neural networks: a survey. IEEE Trans. Pattern Anal. Mach. Intell.
155. Maliha, G., Gerke, S., Cohen, I.G., and Parikh, R.B. (2021). Artificial intelligence and liability in medicine: balancing safety and innovation. Milbank Q. 99, 629–647.
156. Price, W.N., Gerke, S., and Cohen, I.G. (2019). Potential liability for physicians using artificial intelligence. JAMA 322, 1765–1766.
157. Hadsell, R., Rao, D., Rusu, A.A., and Pascanu, R. (2020). Embracing change: continual learning in deep neural networks. Trends Cognit. Sci. 24, 1028–1040.
158. Li, Z., Jiang, J., Zhou, H., Zheng, Q., Liu, X., Chen, K., Weng, H., and Chen, W. (2021). Development of a deep learning-based image eligibility verification system for detecting and filtering out ineligible fundus images: a multicentre study. Int. J. Med. Inf. 147, 104363.
159. Li, Z., Jiang, J., Qiang, W., Guo, L., Liu, X., Weng, H., Wu, S., Zheng, Q., and Chen, W. (2021). Comparison of deep learning systems and cornea specialists in detecting corneal diseases from low-quality images. iScience 24, 103317.
160. Li, Z., Li, M., Wang, D., Hou, P., Chen, X., Chu, S., Chai, D., Zheng, J., Bai, J., Xu, F., et al. (2020). Deep learning from "passive feeding" to "selective eating" of real-world data. Cell Biosci. 10, 143.
161. Trucco, E., Ruggeri, A., Karnowski, T., Giancardo, L., Chaum, E., Hubschman, J.P., Al-Diri, B., Cheung, C.Y., Wong, D., Abràmoff, M., et al. (2013). Validating retinal fundus image analysis algorithms: issues and a proposal. Invest. Ophthalmol. Vis. Sci. 54, 3546–3559.
162. Nagendran, M., Chen, Y., Lovejoy, C.A., Gordon, A.C., Komorowski, M., Harvey, H., Topol, E.J., Ioannidis, J.P.A., Collins, G.S., and Maruthappu, M. (2020). Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. BMJ 368, m689.
163. Cruz Rivera, S., Liu, X., Chan, A.W., Denniston, A.K., and Calvert, M.J.; SPIRIT-AI and CONSORT-AI Working Group (2020). Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. Lancet. Digit. Health 2, e549–e560.
164. Liu, X., Cruz Rivera, S., Moher, D., Calvert, M.J., and Denniston, A.K.; SPIRIT-AI and CONSORT-AI Working Group (2020). Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. Nat. Med. 26, 1364–1374.
165. Sounderajah, V., Ashrafian, H., Aggarwal, R., De Fauw, J., Denniston, A.K., Greaves, F., Karthikesalingam, A., King, D., Liu, X., Markar, S.R., et al. (2020). Developing specific reporting guidelines for diagnostic accuracy studies assessing AI interventions: the STARD-AI Steering Group. Nat. Med. 26, 807–808.
166. Reddy, S., Rogers, W., Makinen, V.P., Coiera, E., Brown, P., Wenzel, M., Weicken, E., Ansari, S., Mathur, P., Casey, A., and Kelly, B. (2021). Evaluation framework to guide implementation of AI systems into healthcare settings. BMJ Health Care Inform. 28, e100444.
167. Zhu, T., Ye, D., Wang, W., Zhou, W., and Yu, P.S. (2022). More than privacy: applying differential privacy in key areas of artificial intelligence. IEEE Trans. Knowl. Data Eng. 34, 2824–2843.
168. Hesamifard, E., Takabi, H., and Ghasemi, M. (2019). Deep neural networks classification over encrypted data (ACM), pp. 97–108.
169. US Food and Drug Administration (FDA). Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD): Discussion Paper and Request for Feedback. https://www.fda.gov/files/medical%20devices/published/US-FDA-Artificial-Intelligence-and-Machine-Learning-Discussion-Paper.pdf (accessed 25 May 2022).
