OPEN ACCESS
Review
Artificial intelligence in ophthalmology:
The path to the real-world clinic
Zhongwen Li,1,2,8,* Lei Wang,2,8 Xuefang Wu,3,8 Jiewei Jiang,4,8 Wei Qiang,1 He Xie,2 Hongjian Zhou,5 Shanjun Wu,1,7
Yi Shao,6,* and Wei Chen1,2,*
1Ningbo Eye Hospital, Wenzhou Medical University, Ningbo 315000, China
2School of Ophthalmology and Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China
3Guizhou Provincial People’s Hospital, Guizhou University, Guiyang 550002, China
4School of Electronic Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, China
5Department of Computer Science, University of Oxford, Oxford, Oxfordshire OX1 2JD, UK
6Department of Ophthalmology, the First Affiliated Hospital of Nanchang University, Nanchang 330006, China
7Senior author
8These authors contributed equally
SUMMARY
Artificial intelligence (AI) has great potential to transform healthcare by enhancing the workflow and productivity of clinicians, enabling existing staff to serve more patients, improving patient outcomes, and reducing health disparities. In the field of ophthalmology, AI systems have shown performance comparable with or even better than experienced ophthalmologists in tasks such as diabetic retinopathy detection and grading. However, despite these promising results, very few AI systems have been deployed in real-world clinical settings, which calls the true value of these systems into question. This review provides an overview of the main current AI applications in ophthalmology, describes the challenges that must be overcome before these systems can be implemented clinically, and discusses the strategies that may pave the way to their clinical translation.
Cell Reports Medicine 4, 101095, July 18, 2023 © 2023 The Authors.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Figure 2. Overall schematic diagram describing the practical application of AI in all common ophthalmic imaging modalities
following two reasons. First, POAG, NTG, and CPACG are often painless, and visual field defects are inconspicuous at an early stage. Therefore, self-detection of these types of glaucoma by affected people usually occurs at a relatively late stage when central visual acuity is reduced.35,37 Second, the primary approach to detect glaucoma is the examination of the optic disc and retinal nerve fiber layer by a glaucoma specialist through ophthalmoscopy or fundus images.39–41 Such manual optic disc assessment is time consuming and labor intensive, making it infeasible to implement in large populations. Accordingly, improvements in screening methods for glaucoma are necessary. AI may pave the road for cost-effective glaucoma screening programs, such as detecting glaucoma from fundus images or OCT images in an automated fashion (Table 2).

Li et al.29 reported a deep-learning system with excellent performance in detecting referable glaucomatous optic neuropathy (GON) from fundus images. Specifically, they adopted the Inception-v3 algorithm to train the system and evaluated it in 8,000 images. Their system achieved an AUC of 0.986 with a sensitivity of 95.6% and a specificity of 92.0% for discerning referable GON.
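Performance figures like those reported above (AUC, sensitivity, specificity) are straightforward to compute from a model's outputs. A minimal pure-Python sketch, using invented labels and scores rather than any study data:

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores):
    """AUC via the Mann-Whitney view: the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical referable-GON labels and model scores (not study data)
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.92, 0.80, 0.35, 0.10, 0.45, 0.20, 0.75, 0.55]
preds = [1 if s >= 0.5 else 0 for s in scores]
sens, spec = sensitivity_specificity(labels, preds)
```

Note that sensitivity and specificity depend on the chosen decision threshold (0.5 here), whereas AUC summarizes performance over all thresholds.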
As fundus imaging is intrinsically a two-dimensional (2D) imaging modality observing the surface of the optic nerve head, whereas glaucoma is a three-dimensional (3D) disease with depth-resolved structural changes, fundus imaging may not be able to reach a level of accuracy that could be acquired through OCT, a 3D imaging modality.52,53 Ran et al.49 trained and tested a 3D deep-learning system using 6,921 spectral-domain OCT volumes of optic disc cubes from 1,384,200 2D cross-sectional scans. Their 3D system reached an AUC of 0.969 in detecting GON, significantly outperforming a 2D deep-learning system trained with fundus images (AUC, 0.921). This 3D system also had performance comparable with two glaucoma specialists with over 10 years of experience. The heatmaps indicated that the features leveraged by the 3D system for GON detection were similar to those leveraged by glaucoma specialists.

Primary angle-closure glaucoma is avoidable if the progress of angle closure can be stopped at the early stages.54 Fu et al. developed a deep-learning system using 4,135 anterior-segment OCT images from 2,113 individuals for automated angle-closure detection.55 The system achieved an AUC of 0.96 with a sensitivity of 0.90 and a specificity of 0.92, which were better than those of the qualitative feature-based system (AUC, 0.90; sensitivity, 0.79; specificity, 0.87). These results indicate that deep learning may mine a broader range of details of anterior-segment OCT images than the qualitative features (e.g., angle opening distance, angle recess area, and iris area) determined by clinicians.

AI can also be used to predict glaucoma progression. Yousefi et al.56 reported an unsupervised machine-learning method to identify longitudinal glaucoma progression based on visual fields from 2,085 eyes of 1,214 subjects. They found that this machine-learning analysis detected progressing eyes earlier (3.5 years) than other methods such as global mean deviation (5.2 years), region-wise (4.5 years), and point-wise (3.9 years). Wang et al.57 proposed an AI approach, the archetype method, to detect visual field progression in glaucoma with an accuracy of 0.77. Moreover, this AI approach had a significantly higher agreement (kappa, 0.48) with the clinician assessment than
other existing methods (e.g., the permutation of point-wise linear regression).
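The agreement statistic quoted above (kappa, 0.48) is Cohen's kappa, which discounts the agreement two raters would reach by chance. A small illustrative implementation; the progression calls below are invented, not study data:

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    p_obs = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    # Chance agreement from each rater's marginal label frequencies
    p_chance = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (p_obs - p_chance) / (1 - p_chance)

# Hypothetical progression calls ("P" = progressing, "S" = stable)
ai_calls = ["P", "P", "S", "S", "P", "S", "S", "P"]
clinician_calls = ["P", "S", "S", "S", "P", "S", "P", "P"]
kappa = cohens_kappa(ai_calls, clinician_calls)
```

Here the two raters agree on 6 of 8 eyes (0.75 observed agreement), but with balanced labels half of that agreement is expected by chance, so kappa is substantially lower than raw agreement.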
AMD
AMD, a disease that affects the macular area of the retina, often causes progressive loss of central vision.58 Age is the strongest risk factor for AMD, and almost all late AMD cases happen in people over 60 years of age.58 With the aging population, AMD will continue to be a major cause of vision impairment worldwide. The number of AMD patients will reach 288 million in 2040, denoting the substantial global burden of AMD.59 Consequently, screening for patients with AMD (especially neovascular AMD) and providing suitable medical interventions in a timely manner can reduce vision loss and improve patient visual outcomes.60

AI has the potential to facilitate the automated detection of AMD and prediction of AMD progression (Table 3). Peng et al.61 constructed and tested a deep-learning system (DeepSeeNet) using 59,302 fundus images from the longitudinal follow-up of 4,549 subjects from the Age-Related Eye Disease Study (AREDS). DeepSeeNet performed well on patient-based multi-class classification with AUCs of 0.94, 0.93, and 0.97 in detecting large drusen, pigmentary abnormalities, and late AMD, respectively. Burlina et al.62 reported a deep-learning system established by AlexNet based on over 130,000 fundus images from 4,613 patients to screen for referable AMD, and their system achieved an average AUC of 0.95. Referable AMD in their study refers to eyes with one of the following conditions: (1) large drusen (size larger than 125 μm); (2) multiple medium-sized drusen and pigmentation abnormalities; (3) choroidal neovascularization (CNV); (4) geographic atrophy.62

AI also has the potential to predict the possibility of progression to late AMD, guiding high-risk patients to start preventive care early (e.g., eating healthy food, abandoning smoking, and taking supplements) and assisting clinicians to decide the interval of the patient's follow-up examination. In patients diagnosed with wet AMD in one eye, Yim et al.8 introduced an AI system to predict conversion to wet AMD in the second eye. Their system was constructed by a segmentation network, diagnosis network, and prediction network based on 130,327 3D OCT images and corresponding automatic tissue maps for predicting progression to wet AMD within a clinically actionable time window (6 months). The system achieved 80% sensitivity at 55% specificity and 34% sensitivity at 90% specificity. As both genetic and environmental factors can affect the etiology of AMD, Yan et al.70 developed an AI approach with a modified deep convolutional neural network (CNN) using 52 AMD-associated genetic variants and 31,262 fundus images from 1,351 individuals from the AREDS to predict whether an eye would progress to late AMD. Their results showed that the approach based on both fundus images
and genotypes could predict late AMD progression with an AUC of 0.85, whereas the approach based on fundus images alone achieved an AUC of 0.81.

Other retinal diseases
Numerous studies also have found that AI could be applied to promote the automated detection of other retinal diseases from clinical images to provide timely referrals for positive cases, solving the issues caused by the unbalanced distribution of ophthalmic medical resources. Milea et al.71 developed a deep-learning system using 14,341 fundus images to detect papilledema. This system achieved an AUC of 0.96 in the external test dataset consisting of 1,505 images. Brown et al.72 established a deep-learning system based on 5,511 retinal images captured by RetCam to diagnose plus disease in retinopathy of prematurity (ROP), a leading cause of blindness in childhood. The AUC of their system was 0.98 with a sensitivity of 93% and a specificity of 93%. In terms of detecting peripheral retinal diseases, such as lattice degeneration and retinal breaks, Li et al.73 trained models with four different deep-learning algorithms (InceptionResNetV2, ResNet50, InceptionV3, and VGG16) using 5,606 UWF images. They found that InceptionResNetV2 had the best performance, achieving an AUC of 0.996 with 98.7% sensitivity and 99.2% specificity. In addition, AI has also been employed in the automated identification of retinal detachment,74 pathologic myopia,75 polypoidal choroidal vasculopathy,76 etc.

APPLICATION OF AI ALGORITHMS IN ANTERIOR-SEGMENT EYE DISEASES

Cataract
In the past 20 years, although the prevalence of cataracts has been decreasing due to the increasing rates of cataract surgery because of improved techniques and active surgical initiatives, it still affects 95 million people worldwide.77 Cataract remains the leading cause of blindness (accounting for 50% of blindness), especially in low-income and middle-income countries.77 Therefore, exploring a set of strategies to promote cataract screening and related ophthalmic services is imperative. Recent advancements in AI may help achieve this goal, such as diagnosis and quantitative classification of age-related cataract from slit-lamp images (Table 4).

Keenan et al.16 trained deep-learning models, named DeepLensNet, to detect and quantify nuclear sclerosis (NS) from 45° slit-lamp images and cortical lens opacity (CLO) and posterior subcapsular cataract (PSC) from retroillumination images. NS grading was considered on a 0.9–7.1 scale. CLO and PSC grading were both considered as percentages. In the full test set, mean squared error values for DeepLensNet were 0.23 for NS, 13.1 for CLO, and 16.6 for PSC. The results indicate that this framework can perform automated and quantitative classification of cataract severity with high accuracy, which has the potential to increase the accessibility of cataract evaluation globally. Apart from slit-lamp images, Tham et al.78 found that fundus images could also be used to develop an AI system for cataract screening. Based on 25,742 fundus images, they constructed a framework with ResNet50 and an XGBoost classifier for the automated detection of visually significant cataracts (BCVA < 20/60), achieving AUCs of 0.916–0.965 in three external test sets. One merit of this system is that it can screen for cataracts with a single imaging modality, which is different from the traditional method that requires slit-lamp and retroillumination images alongside BCVA measurement. The other merit is that this system can be readily integrated into existing fundus-image-based AI systems, allowing simultaneous screening for other posterior-segment diseases.

Other than cataract screening, AI can also offer real-time guidance for phacoemulsification cataract surgery (PCS). Nespolo et al.87 invented a computer vision-based platform using a region-based CNN (Faster R-CNN) built on ResNet50, a k-means clustering technique, and an optical-flow-tracking technology to enhance the surgeon experience during the PCS. Specifically, this platform can be used to receive frames from the video source, locate the pupil, discern the surgical phase being performed, and provide visual feedback to the surgeon in real time. The results showed that the platform achieved AUCs of 0.996, 0.972, 0.997, and 0.880 for capsulorhexis, phacoemulsification, cortex removal, and idle-phase recognition, respectively, with a dice score of 90.23% for pupil segmentation and a mean processing speed of 97 frames per second. A usability survey suggested that most surgeons would be willing to perform PCS for complex cataracts with this platform and thought it was accurate and helpful.

Keratitis
Keratitis is a major global cause of corneal blindness, often affecting marginalized populations.88 The burden of corneal blindness on patients and the wider community can be huge, particularly as it tends to occur in people at a younger age than other blinding eye diseases such as AMD and cataracts.89 Keratitis can get worse quickly with time, which may lead to permanent visual impairment and even corneal perforation.90 Early detection and timely management of keratitis can halt the disease progression, resulting in a favorable prognosis.91

Li et al.15 found that AI had high accuracy in screening for keratitis and other corneal abnormalities from slit-lamp images. In terms of the deep-learning algorithms, they used Inception-v3, DenseNet121, and ResNet50, with DenseNet121 performing best. To be specific, the optimal algorithm DenseNet121 reached AUCs of 0.988–0.997, 0.982–0.990, and 0.988–0.998 for the classification of keratitis, other corneal abnormalities (e.g., corneal dystrophies, corneal degeneration, corneal tumors), and normal cornea, respectively, in three external test datasets. Interestingly, their system also performed well on cornea images captured by smartphone under the super-macro mode, with an AUC of 0.967, a sensitivity of 91.9%, and a specificity of 96.9% in keratitis detection. This smartphone-based approach will be extremely cost-effective and convenient for proactive keratitis screening by high-risk people (e.g., farmers and contact lens wearers) if it can be applied to clinical practice. To give prompt and precise treatment to patients with infectious keratitis, Xu et al.92 proposed a sequential-level deep-learning system that could effectively discriminate among bacterial keratitis, fungal keratitis, herpes simplex virus stromal keratitis, and other corneal diseases (e.g., phlyctenular keratoconjunctivitis,
acanthamoeba keratitis, corneal papilloma), with an overall diagnostic accuracy of 80%, outperforming the mean diagnostic accuracy (49.27%) achieved by 421 ophthalmologists. The strength of this system was that it could extract the detailed patterns of the cornea region and assign local features to an ordered set to conform to the spatial structure and thereby learn the global features of the corneal image to perform diagnosis, which achieved better performance than conventional CNNs. Major AI applications in keratitis diagnosis are described in Table 5.

Keratoconus
Keratoconus is a progressive corneal ectasia with central or paracentral stroma thinning and corneal protrusion, resulting in irreversible visual impairment due to irregular corneal astigmatism or the loss of corneal transparency.101 Early identification of keratoconus, especially subclinical keratoconus, and subsequent treatment (e.g., corneal crosslinking and intrastromal corneal ring segments) are crucial to stabilize the disease and improve the visual prognosis.101 Advanced keratoconus can be detected by classic clinical signs (e.g., Vogt's striae, Munson's sign, Fleischer ring) through slit-lamp examination or by corneal topographical characteristics such as increased corneal refractive power, steeper radial axis tilt, and inferior-superior (I-S) corneal refractive asymmetry from corneal topographical maps. However, the detection of subclinical keratoconus remains challenging.102

AI may accurately diagnose subclinical keratoconus and keratoconus and predict their progress trends (Table 6). Luna
et al.103 reported machine-learning techniques, decision tree and random forest, for the diagnosis of subclinical keratoconus based on Pentacam topographic and Corvis biomechanical metrics, such as the flattest keratometry curvature, steepest keratometry curvature, stiffness parameter at the first flattening, and corneal biomechanical index. The optimal model achieved an accuracy of 89% with a sensitivity of 93% and a specificity of 86%. Meanwhile, they found that the stiffness parameter at the first flattening was the most important determinant in identifying subclinical keratoconus. Timemy et al.104 introduced a hybrid deep-learning construct for the detection of keratoconus. This model was developed using corneal topographic maps from 204 normal eyes, 215 keratoconus eyes, and 123 subclinical keratoconus eyes and was tested in an independent dataset including 50 normal eyes, 50 keratoconus eyes, and 50 subclinical keratoconus eyes. The proposed model reached an accuracy of 98.8% with an AUC of 0.99 and F1 score of 0.99 for the two-class task (normal vs. keratoconus) and an accuracy of 81.5% with an AUC of 0.93 and F1 score of 0.81 for the three-class task (normal vs. keratoconus vs. subclinical keratoconus).

Early and accurate prediction of progress trends in keratoconus is critical for the prudent and cost-effective use of corneal crosslinking and the determination of timing of follow-up visits. García et al.106 reported a time-delay neural network to predict keratoconus progression using two prior tomography measurements from Pentacam. This network received six characteristics as input (e.g., average keratometry, the steepest radius of the front surface, and the average radius of the back surface), evaluated in two consecutive examinations, forecasted the future values, and obtained the result (stable or suspect progressive) leveraging the significance of the variation from the baseline. The average positive and negative predictive values of the network were 71.4% and 80.2%, indicating it had the potential to assist clinicians to make a personalized management plan for patients with keratoconus.
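In grossly simplified form, the two-visit forecasting idea can be illustrated by linearly extrapolating each tomography index from two consecutive examinations. The visit times, keratometry values, and 1.0 D decision threshold below are illustrative assumptions, not the published network's criterion:

```python
def extrapolate(v1, v2, t1, t2, t_future):
    """Linearly extrapolate an index measured at visit times t1 and t2 (months)."""
    slope = (v2 - v1) / (t2 - t1)
    return v2 + slope * (t_future - t2)

def flag_progression(k_mean_v1, k_mean_v2, t1=0, t2=6, t_future=12, threshold=1.0):
    """Label an eye 'suspect progressive' if mean keratometry (diopters) is
    projected to rise more than `threshold` D above baseline (illustrative rule)."""
    projected = extrapolate(k_mean_v1, k_mean_v2, t1, t2, t_future)
    return "suspect progressive" if projected - k_mean_v1 > threshold else "stable"
```

For example, an eye steepening from 45.0 D to 45.8 D over six months projects to 46.6 D at one year and would be flagged, whereas a change from 44.0 D to 44.2 D would not. A neural network can learn a far richer, nonlinear version of this mapping across all six input characteristics.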
Other anterior-segment diseases
A large number of studies have also proved the possibility of using AI to detect other anterior-segment diseases. For example, Chase et al.113 demonstrated that a deep-learning system, developed with a VGG19 network based on 27,180 anterior-segment OCT images, was able to identify dry eye disease, with 84.62% accuracy, 86.36% sensitivity, and 82.35% specificity. The performance of this system was significantly better than some clinical dry eye tests, such as Schirmer's test and corneal staining, and was comparable with that of tear break-up time and the Ocular Surface Disease Index. In addition, Zhang et al.114 developed a deep-learning system for detecting obstructive meibomian gland dysfunction (MGD) and atrophic MGD using 4,985 in vivo laser confocal microscope images and validated the system on 1,663 images. The accuracy, sensitivity, and specificity of the system for obstructive MGD were 97.3%, 88.8%, and 95.4%, respectively; for atrophic MGD, 98.6%, 89.4%, and 98.4%, respectively; and for healthy controls, 98.0%, 94.5%, and 92.6%, respectively. Moreover, Li et al.3 introduced an AI system based on Faster R-CNN and DenseNet121 to detect malignant eyelid tumors from photographic images captured by ordinary digital cameras. In an external test set, the average precision score of the system was 0.762 for locating eyelid tumors and the AUC was 0.899 for discerning malignant eyelid tumors.

APPLICATION OF AI ALGORITHMS IN PREDICTING SYSTEMIC DISEASES BASED ON RETINAL IMAGES

AI has the potential to detect hidden information that clinicians are normally unable to perceive from digital health data. In ophthalmology, with the continuous advancement of AI technologies, the application of AI based on retinal images has extended from the detection of multiple fundus diseases to the screening for systemic diseases. These breakthroughs can be attributed to the following three reasons: (1) the unique anatomy of the eye offers an accessible "window" for the in vivo visualization of microvasculature and cerebral neurons; (2) retinal manifestations can be signs of many systemic diseases, such as
diabetes and heart disease; (3) retinal changes can be recorded through non-invasive digital fundus imaging, which is low cost and widely available at different levels of medical institutions.

[Figure 3. AI translational challenges]

Cardiovascular disease
Cardiovascular disease (CVD) is a leading cause of death globally, taking an estimated 17.9 million lives annually.115 Overt retinal vascular damage (such as retinal hemorrhages) and subtle changes (such as retinal arteriolar narrowing) are markers of CVD.116 To improve present risk-stratification approaches for CVD events, Rim et al.117 developed and validated a deep-learning-based cardiovascular risk-stratification system using 216,152 retinal images from five datasets from Singapore, South Korea, and the United Kingdom. This system achieved an AUC of 0.742 in predicting the presence of coronary artery calcium (a preclinical marker of atherosclerosis that is strongly associated with the risk of CVD). Poplin et al.118 reported that deep-learning models trained on data from 284,355 patients could extract new information from retinal images to predict cardiovascular risk factors, such as age (mean absolute error [MAE] within 3.26 years), gender (AUC = 0.97), systolic blood pressure (MAE within 11.23 mm Hg), smoking status (AUC = 0.71), and major adverse cardiac events (AUC = 0.70). Meanwhile, they demonstrated that the deep-learning models generated each prediction using anatomical features, such as the retinal vessels or the optic disc.

Chronic kidney disease and type 2 diabetes
Chronic kidney disease (CKD) is a progressive disease with high morbidity and mortality that occurs in the general adult population, particularly in people with diabetes and hypertension.119 Type 2 diabetes is another common chronic disease that accounts for nearly 90% of the 537 million cases of diabetes worldwide.120 Early diagnosis and proactive management of CKD and diabetes are critical in reducing microvascular and macrovascular complications and mortality burden. As CKD and diabetes have manifestations in the retina, retinal images can be used to detect and monitor these diseases. Zhang et al.121 reported that deep-learning models developed based on 115,344 retinal images from 56,672 patients were able to detect CKD and type 2 diabetes solely from retinal images or in combination with clinical metadata (e.g., age, sex, body mass index, and blood pressure) with AUCs of 0.85–0.93. The models can also be utilized to predict estimated glomerular filtration rates and blood-glucose levels, with MAEs of 11.1–13.4 mL min⁻¹ per 1.73 m² and 0.65–1.1 mmol L⁻¹, respectively.121 Sabanayagam et al.122 established a deep-learning algorithm using 12,790 retinal images to screen for CKD. In this study, the model trained solely by retinal images achieved AUCs of 0.733–0.911 in validation and testing datasets, indicating the feasibility of employing retinal photography as an adjunctive screening tool for CKD in community and primary care settings.122

Alzheimer's disease
Alzheimer's disease (AD), a progressive neurodegenerative disease, is the most common type of dementia in the elderly worldwide and is becoming one of the most lethal, expensive, and burdening diseases of this century.123 Diagnosis of AD is complex and normally involves expensive and sometimes invasive tests (such as amyloid positron emission tomography [PET] imaging and cerebrospinal fluid assays), which are not usually available outside of highly specialized clinical institutions. The retina is an extension of the central nervous system and offers a distinctively accessible insight into brain pathology. Research has found potentially measurable structural, vascular, and metabolic changes in the retina at the early stages of AD.124 Therefore, using noninvasive and low-cost retinal photography to detect AD is feasible. Cheung et al.125 demonstrated that a deep-learning model had the capability to identify AD from retinal images alone. They trained, validated, and tested the model using 12,949 retinal images from 648 AD patients and 3,240 individuals without the disease.125 The model had accuracies ranging from 79.6% to 92.1% and AUCs ranging from 0.73 to 0.91 for detecting AD in testing datasets. In the datasets with PET information, the model could also distinguish between participants who were β-amyloid positive and those who were β-amyloid negative, with accuracies ranging from 80.6% to 89.3% and AUCs ranging from 0.68 to 0.86. This study showed that a retinal-image-based deep-learning algorithm had high accuracy in detecting AD and that this approach could be used to screen for AD in a community setting.

Challenges in the AI clinical translation
Although AI systems have shown great performance in a wide variety of retrospective studies, relatively few of them have been translated into clinical practice. Many challenges, such as the generalizability of AI systems, still exist and stand in the path of true clinical adoption of AI tools (Figure 3). In this section, we highlight some critical challenges and the research that has already been conducted to tackle these issues.

VALIDITY OF AI SYSTEMS

Data issues in developing robust AI systems
Data sharing
Large datasets are required to facilitate the development of a robust AI system. The lack of high-quality public datasets that are truly representative of real-world clinical practice stands in
[Figure 4. Federated learning: each institution trains the model locally with privacy preserved; local updates are sent to a federated server and aggregated into a global model]
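The aggregation scheme depicted in Figure 4 can be illustrated with a toy federated-averaging (FedAvg-style) round, in which only model updates, never raw images, leave each site. All weights, gradients, and sample counts below are invented for illustration:

```python
def local_update(weights, gradient, lr=0.1):
    """One gradient step of local training at an institution; data never leave the site."""
    return [w - lr * g for w, g in zip(weights, gradient)]

def federated_average(updates, sizes):
    """Server-side aggregation: weight each institution's model by its sample count."""
    total = sum(sizes)
    dim = len(updates[0])
    return [
        sum(u[i] * s for u, s in zip(updates, sizes)) / total
        for i in range(dim)
    ]

# Toy round with three hospitals (illustrative numbers, 2-parameter model)
global_model = [0.0, 0.0]
grads = {"A": [1.0, -1.0], "B": [0.5, 0.5], "C": [2.0, 0.0]}
sizes = [100, 300, 100]  # samples held at A, B, C
locals_ = [local_update(global_model, g) for g in grads.values()]
new_global = federated_average(locals_, sizes)
```

Sites with more data contribute proportionally more to the new global model; the server never sees the underlying patient images.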
the path of clinical translation of AI systems. Data sharing might be a good solution, but it generates ethical and legal challenges in general. Even if data are obtained in an anonymized manner, it can potentially put patient privacy at risk.126 Protecting patient privacy and acquiring approval for data use are important rules to comply with. Unfortunately, these rules may hinder data sharing among different medical research groups, not to mention making the data publicly available. The adoption of federated learning is a good alternative to training AI models with diverse data from multiple clinical institutions without the centralization of data.127 This strategy can address the issue that data reside in different institutions, removing barriers to data sharing and circumventing the problem of patient privacy (Figure 4). Meanwhile, federated learning can facilitate rapid data science collaboration and thus improve the robustness of AI systems.128 In addition, controlled-access data sharing, an approach that requires a request for access to datasets to be approved, is another alternative solution for researchers to acquire data to solve relevant research issues while protecting participant privacy.

Data annotation
Accurate clinical data annotations are crucial for the development of reliable AI systems,2,129 as annotation inconsistencies may lead to unpredictable clinical consequences, such as erroneous classifications.130 Several approaches have been leveraged to resolve disagreements between graders and obtain ground truth. One method consists of adopting the majority decision from a panel of three or more professional graders.7 Another consists of recruiting two or more professional graders to label data independently and then employing another senior grader to arbitrate disagreements, with the senior grader's decision used as the ground truth.131 Third, some data annotations can be conducted using the recognized gold standard. For example, Li et al.3 annotated images of benign and malignant eyelid tumors based on unequivocal histopathological diagnoses.

Normally, annotations can be divided into two categories: the annotation of interest regions in images (e.g., retinal hemorrhages, exudates, and drusen) and clinical annotations (e.g., disease classification, treatment response, vision prognosis). Conducting manual annotations for large-scale datasets before the model training is a considerably time-consuming and labor-intensive task that needs many professional graders or ophthalmologists, hindering the construction of robust AI systems.132–135 Therefore, exploring techniques to promote the efficient production of annotations is important, although manual annotations are still necessary.
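The first two adjudication schemes described above, majority voting and senior arbitration, can be sketched in a few lines; the example labels are invented:

```python
from collections import Counter

def majority_label(labels):
    """Ground truth as the majority decision of a panel of graders;
    returns None if no label wins an outright majority."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None

def adjudicated_label(grader_1, grader_2, senior_grader):
    """Two graders label independently; a senior grader arbitrates disagreements,
    and the senior grader's decision becomes the ground truth."""
    return grader_1 if grader_1 == grader_2 else senior_grader
```

A panel of three graders voting ["GON", "GON", "normal"] yields "GON"; with two graders who disagree, the senior grader's call decides.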
Training models using weakly supervised learning may be a good approach to reduce the workload of manual annotations.133,134 For image segmentation, weak supervision requires only sparse manual annotations, in which experts mark small interest regions with dots, whereas full supervision needs dense annotations, in which all pixels of images are manually labeled.132,134 Playout et al.132 have reported that weak supervision in combination with advanced learning in model training can achieve performance comparable with fully supervised models for retinal lesion segmentation in fundus images.
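The contrast between sparse dot annotations and dense per-pixel labels can be made concrete with a toy mask; the 0-means-unlabeled convention here is an illustrative choice, not the cited method's exact encoding:

```python
def dots_to_weak_mask(shape, dots):
    """Build a weak-supervision mask from expert dot annotations:
    1 = annotated lesion pixel, 0 = unlabeled (ignored by the training loss).
    A fully supervised mask would instead assign a label to every pixel."""
    rows, cols = shape
    mask = [[0] * cols for _ in range(rows)]
    for r, c in dots:
        mask[r][c] = 1
    return mask

def annotated_fraction(mask):
    """Fraction of pixels the expert actually had to mark."""
    total = sum(len(row) for row in mask)
    return sum(v for row in mask for v in row) / total

# Two expert dots on a 4x4 image: only 12.5% of pixels need manual labeling
weak = dots_to_weak_mask((4, 4), [(0, 1), (2, 3)])
```

Even this tiny example shows the appeal: the annotation burden scales with the number of dots, not with the image resolution.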
Standardization of clinical data collection
In the past two decades, health systems have heavily invested in the digitalization of every aspect of their operation. This transformation has resulted in unprecedented growth in the volume of medical electronic data, facilitating the development of AI-based medical devices.136 Although the size of datasets has increased, data collection is not done in a standardized manner, affecting the ready utilization of these data for AI model training and testing. This issue leads to a growing number of multicentric efforts to deal with the large variability in examination items, the timing of laboratory tests, image quality, etc. To improve the usability of data, standardization of clinical data collection should be implemented to generate high-quality data with complete and consistent information to support the development of robust medical AI products.137,138 For example, medical text data collection should include basic information such as age, gender, and examination date, and health examination records should have all examination items and complete results. Besides, as low-quality image data often result in a loss of diagnostic information and affect AI-based image analyses,139 image quality assessment is necessary at the stage of data allocation to filter out low-quality images, which could improve the performance of AI-based diagnostic models in real-world settings.140 To address this issue, Liu et al.141 developed a deep-learning-based flow-cytometry-like image quality classifier for the automated, high-throughput, and multidimensional classification of fundus image quality, which can detect low-quality images in real time and then guide a photographer

[…]

real-world primary care clinic. They found that the system had a high false-positive rate (7.9%) and a low positive predictive value (12%). The possible reason for this unsatisfactory performance of the system is that the incidence of DR in the primary care clinic was 1%, whereas their AI system was developed using retrospective data in which the incidence of DR was much higher (33.3%). Long et al.145 developed an AI platform that had 98.25% accuracy in childhood cataract diagnosis and 92.86% accuracy in treatment suggestions in external datasets retrospectively collected from three hospitals. However, when they applied the platform to unselected real-world datasets prospectively obtained from five hospitals, accuracies decreased to 87.4% in cataract diagnosis and 70.8% in treatment determination.84 The possible explanation for this phenomenon is that the retrospective datasets often undergo extensive filtering and cleaning, which makes them less representative of real-world clinical practice. Randomized controlled trials (RCTs) and prospective research can bridge such gaps between theory and practice, showing the true performance of AI systems in real healthcare settings and demonstrating how useful the systems are for the clinic.146

GENERALIZABILITY OF AI SYSTEMS TO CLINICAL PRACTICE

Although numerous studies reported that their AI systems showed robust performance in detecting eye diseases and had the potential to be applied in clinics, most AI-based medical devices had not yet been authorized for market distribution for clinical management of diseases such as AMD, glaucoma, and cataracts. One of the most important reasons for this is that the generalizability of AI systems to populations of different ethnicities and different countries, different clinical application scenarios, and images captured using different types of cameras remains uncertain. Many AI studies only evaluated their systems on data from a single source; hence, the systems often performed poorly in real-world datasets that had more sources of variation than the datasets utilized in research papers.84 To improve the generalizability of AI systems, first, we need to build
to acquire high-quality images immediately. Li et al.142 reported large, multicenter, and multiethnic datasets for system develop-
a deep-learning-based image-quality-control system that could ment and evaluation. Milea et al.71 developed a deep-learning
discern low-quality slit-lamp images. This system can be used system for papilledema detection using data collected from 19
as a prescreening tool to filter out low-quality images and ensure sites in 11 countries and evaluated the system in data obtained
that only high-quality images will be transferred to the subsequent from five other sites in five countries. The AUCs of their system
AI-based diagnostic systems. Shen et al.143 established a new in internal and external test datasets were 0.99 and 0.96, veri-
multi-task domain adaptation framework for the automated fying that the system had broad generalizability. In addition,
fundus image quality assessment. The proposed framework can transfer learning, a technique that aims to transfer knowledge
offer interpretable quality assessment with both quantitative from one task to a different but related task, can help decrease
scores and quality visualization, which outperforms different generalization errors of AI systems via reusing the weights of a
state-of-the-art methods. Dai et al.23 demonstrated that the AI- pretrained model, particularly when faced with tasks with limited
based image quality assessment could reduce the proportion of data.147 Kermany et al.69 demonstrated that AI systems trained
poor-quality images and significantly improve the accuracy of with a transfer-learning algorithm had good performance and
an AI model for DR diagnosis. generalizability in the diagnosis of common diseases from
different types of images, such as detecting diabetic macular
Real-world performance of AI systems edema from OCT images (accuracy = 98.2%) and pediatric
Recently, several reports showed that AI systems in practice were pneumonia from chest X-ray images (accuracy = 92.8%). Third,
less helpful than retrospective studies described.144,84 For the generalizability of AI networks can be improved by utilizing a
example, Kanagasingam et al.144 evaluated their deep-learning- data-augmentation (DA) strategy that creates more training sam-
based system for DR screening based on retinal images in a ples for increasing the diversity of the training data.148 Zhou
et al.149 proposed an approach named DA-based feature align- prognosis prediction, it is still unclear whether healthcare pro-
ment that could consistently and significantly improve the out- viders, developers, sellers, or regulators should be held account-
of-distribution generalizability (up to +16.3% mean of clean able if an AI system makes mistakes in real-world clinical
AUC) of AI algorithms in glaucoma detection from fundus im- practice even after being thoroughly clinically validated. For
ages. Fourth, an AI algorithm trained based on lesion labels example, AI systems may miss a retinal disease in a fundus im-
can broaden its generalizability in disease detection. Li et al.150 age or recommend an incorrect treatment strategy. As a result,
reported that the algorithm trained with the image-level classifi- patients may be injured. In this case, we have to determine
cation labels and the anatomical and pathological labels dis- who is responsible for this incident. The allocation of liability
played better performance and generalizability than that trained makes clear not only whether and from whom patients acquire
with only the image-level classification labels in diagnosing redress but also whether, potentially, AI systems will make their
ophthalmic disorders from slit-lamp images (accuracies, way into clinical practice.155 At present, the suggested solution is
99.22%–79.47% versus 90.14%–47.19%). to treat medical AI systems as a confirmatory tool rather than as
a source of ways to improve care.156 In other words, a physician
INTERPRETABILITY OF AI SYSTEMS should check every output from the medical AI systems to
ensure the results that meet and follow the standard of care.
AI systems are often described as black boxes due to the nature Therefore, the physician would be held liable if malpractice oc-
of these systems (being trained instead of being explicitly pro- curs due to using these systems. This strategy may minimize
grammed).151 It is difficult for clinicians to understand the precise the potential value of medical AI systems as some systems
underlying functioning of the systems. As a result, correcting may perform better than even the best physicians but the physi-
some erroneous behaviors might be difficult, and acceptance cians would choose to ignore the AI recommendation when it
by clinicians as well as regulatory approval might be hampered. conflicts with standard practice. Consequently, the approach
Decoding AI for clinicians can mitigate such uncertainty. This that can balance the safety and innovation of medical AI needs
challenge has provided a stimulus for research groups and in- to be further explored.
dustries to focus on explainable AI. Techniques that enable a
good understanding of the working principle of AI systems are FUTURE DIRECTIONS
developed. For instance, Niu et al.152 reported a method that
could enhance the interpretability of an AI system in detecting Generally, AI models were directly trained using existing open-
DR. To be specific, they first define novel pathological descrip- sourced machine-learning packages frequently utilized by
tors leveraging activated neurons of the DR detector to encode others to address the issue of interest without additional custom-
both the appearance and spatial information of lesions. Then, ization or refinement. This approach may limit the optimal perfor-
they proposed a novel generative adversarial network (GAN), mance of AI applications as no generalized solution exists in
Patho-GAN, to visualize the signs that the DR detector identified most cases. To improve the performance of AI tools, in-depth
as evidence to make a prediction. Xu et al.153 developed an knowledge of clinical problems as well as the features of AI algo-
explainable AI system for diagnosing fungal keratitis from in vivo rithms is indispensable. Therefore, applicable customization of
confocal microscopy images based on gradient-weighted class the algorithms should be conducted according to the specific
activation mapping (Grad-CAM) and guided Grad-CAM tech- challenges of each problem, which usually needs interdisci-
niques. They found that the assistance from the explainable AI plinary collaboration among ophthalmologists, computer scien-
system could boost ophthalmologists’ performance beyond tists (e.g., AI experts), policymakers, and others.
what was achievable by the ophthalmologist alone or with the Although AI studies have seen enormous progress in the past
black-box AI assistance. Overall, these interpretation frame- decade, they are predominantly based on fixed datasets and
works may facilitate AI acceptance for clinical usage. stationary environments. The performance of AI systems is often
fixed by the time they are developed. However, the world is not
LONGEVITY OF AI SYSTEMS stationary, which requires that AI systems should have the ability
as clinicians to improve themselves constantly and evolve to
The performance of AI systems has the potential to degrade over thrive in dynamic learning settings. Continual learning tech-
time as the characteristics of the world, such as disease distribu- niques, such as gradient-based learning, modular neural
tion, population characteristics, health infrastructure, and cyber network, and meta-learning, may enable AI models to obtain
technologies, are changing all the time. This requires that AI sys- specialized solutions without forgetting previous ones, namely
tems should have the ability of lifelong continuous learning to learning over a lifetime, as a clinician does.157 These techniques
keep and even improve their performance over time. The contin- may take AI to a higher level by improving learning efficiency and
uous learning technique, meta-learning, which aims to improve enabling knowledge transfer between related tasks.
the AI algorithm itself, is a potential approach to address this In addition to current diagnostic and predictive tasks, AI
issue.154 methods can also be employed to support ophthalmologists
with additional information impossible to obtain by sole visual in-
LIABILITY spection. For instance, the objective quantification of the area of
corneal ulcer via a combination of segmentation and detection
Although medical AI systems can help physicians in clinics, such techniques can assist ophthalmologists in precisely evaluating
as disease diagnosis, recommendations for treatment, and whether the treatment is effective on patients in follow-up visits.
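Such area quantification reduces to pixel counting on the segmentation output. A minimal sketch of the idea, assuming a segmentation model has already produced a binary ulcer mask and that the image scale in millimeters per pixel is known from the camera's calibration (the function names and toy masks below are illustrative, not from any cited system):

```python
import numpy as np

def lesion_area_mm2(mask: np.ndarray, mm_per_pixel: float) -> float:
    """Convert a binary lesion mask to an area in mm^2 using the image scale."""
    return float(mask.sum()) * mm_per_pixel ** 2

def area_change(mask_baseline: np.ndarray, mask_followup: np.ndarray,
                mm_per_pixel: float) -> float:
    """Relative change in lesion area between two visits.

    Negative values mean the ulcer shrank (condition improving);
    positive values mean it grew (condition worsening).
    """
    baseline = lesion_area_mm2(mask_baseline, mm_per_pixel)
    followup = lesion_area_mm2(mask_followup, mm_per_pixel)
    return (followup - baseline) / baseline

# Toy masks standing in for segmentation-network output:
# a 4x4-pixel ulcer at baseline shrinking to 2x2 pixels at follow-up.
base = np.zeros((64, 64), dtype=np.uint8)
base[10:14, 10:14] = 1
follow = np.zeros((64, 64), dtype=np.uint8)
follow[10:12, 10:12] = 1

change = area_change(base, follow, mm_per_pixel=0.05)
print(f"ulcer area change: {change:+.0%}")  # -75%, i.e., the ulcer shrank
```

In practice, serial comparisons like this also assume the baseline and follow-up photographs are captured at comparable magnification, which is why a known per-image scale matters.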
If the area becomes smaller, it indicates that the condition has improved; otherwise, it denotes that the condition has worsened and treatment strategies may need to change.

To date, AI is not immune to the garbage-in, garbage-out weakness, even with big data. Appropriate data preprocessing to acquire high-quality training sets is critical to the success of AI systems.7,158 While AI systems have good performance (e.g., in detecting corneal diseases) on high-quality images, they often perform poorly on low-quality images.159,160 In contrast, the performance of human doctors on low-quality images is better than that of an AI system, exposing a vulnerability of the AI system.159 As low-quality images are inevitable in real-world settings,142,161 exploring approaches that can improve the performance of AI systems on low-quality images is needed to enhance the robustness of AI-based products in clinical practice.

Many studies have drawn overly optimistic conclusions based on AI systems' good performance on external validation datasets. However, such results are not evidence of the clinical usefulness of AI systems.162 Well-conducted and well-reported prospective studies are essential to truly demonstrate the added value of AI systems in ophthalmology and pave the way to clinical implementation. Recent guidelines, such as the Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT)-AI extension, the Consolidated Standards of Reporting Trials (CONSORT)-AI extension, and the Standards for Reporting of Diagnostic Accuracy Studies (STARD)-AI, may improve the design, transparency, reporting, and nuanced conclusions of AI studies, rigorously validating the usefulness of medical AI and ultimately improving the quality of patient care.163–165 In addition, an international team established an evaluation framework termed Translational Evaluation of Healthcare AI (TEHAI) focusing on the assessment of translational aspects of AI systems in medicine.166 The evaluation components (e.g., capability, utility, and adoption) of TEHAI can be used at any stage of the development and deployment of medical AI systems.166

Patient privacy and data security are major concerns in medical AI development and application. Several approaches may help address these issues. First, sensitive data should be obtained and used in research with patient consent, and anonymization and aggregation strategies should be adopted to obscure personal details. Any clinical institution should handle patient data responsibly, for example, by utilizing appropriate security protocols. Second, differential privacy (DP), a data-perturbation-based privacy approach that is able to retain the global information of a dataset while reducing information about a single individual, can be employed to reduce privacy risks and protect data security.167 Based on this approach, an outside observer cannot infer whether a specific individual was utilized for acquiring a result from the dataset. Third, homomorphic encryption, an encryption scheme that allows computation on encrypted data, is widely treated as a gold standard for data security. This approach has been successfully applied to AI algorithms and to data, allowing secure and joint computation.168

Recently, regulatory agencies, such as the Food and Drug Administration (FDA), have proposed a regulatory framework to evaluate the safety and effectiveness of AI-based medical devices during the initial premarket review.169 Specifically, manufacturers have to illustrate what aspects they intend to achieve through AI devices and how the devices will learn and change while remaining effective and safe, as well as strategies to reduce performance loss. This regulatory framework provides good guidance for research groups to better develop and report their AI-based medical products.

Conclusions

AI in ophthalmology has made huge strides over the past decade. Plenty of studies have shown that the performance of AI is equal to and even superior to that of ophthalmologists in many diagnostic and predictive tasks. However, much work remains to be done before AI products can be deployed from bench to bedside. Issues such as the real-world performance, generalizability, and interpretability of AI systems are still insufficiently investigated and will require more attention in future studies. Solving data-sharing, data-annotation, and other related problems will facilitate the development of more robust AI products. Strategies such as customization of AI algorithms for a specific clinical task and utilization of continual learning techniques may further improve AI's ability to serve patients. RCTs and prospective studies following dedicated guidelines (e.g., the SPIRIT-AI extension, STARD-AI, and the FDA's guidance) can rigorously demonstrate whether AI devices would bring a positive impact to real healthcare settings, contributing to the clinical translation of these devices. Although this field is not completely mature yet, we hope AI will play an important role in the future of ophthalmology, making healthcare more efficient, accurate, and accessible, especially in regions lacking ophthalmologists.

SUPPLEMENTAL INFORMATION

Supplemental information can be found online at https://doi.org/10.1016/j.xcrm.2023.101095.

ACKNOWLEDGMENTS

This study received funding from the National Natural Science Foundation of China (grant nos. 82201148, 62276210), the Natural Science Foundation of Zhejiang Province (grant no. LQ22H120002), the Medical and Health Science and Technology Project of Zhejiang Province (grant nos. 2022RC069, 2023KY1140), the Natural Science Foundation of Ningbo (grant no. 2023J390), the Natural Science Basic Research Program of Shaanxi (grant no. 2022JM-380), and the Ningbo Science & Technology Program (grant no. 2021S118). The funding organizations played no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

AUTHOR CONTRIBUTIONS

Conception and design, Z.L., Y.S., and W.C.; identification of relevant literature, Z.L., L.W., X.W., J.J., W.Q., H.X., H.Z., S.W., Y.S., and W.C.; manuscript writing, all authors; final approval of the manuscript, all authors.

DECLARATION OF INTERESTS

The authors declare no competing interests.

REFERENCES

1. Esteva, A., Robicquet, A., Ramsundar, B., Kuleshov, V., DePristo, M., Chou, K., Cui, C., Corrado, G., Thrun, S., and Dean, J. (2019). A guide to deep learning in healthcare. Nat. Med. 25, 24–29.
Corrado, G.S., et al. (2021). Predicting the risk of developing diabetic retinopathy using deep learning. Lancet Digit. Health 3, e10–e19.

34. Arcadu, F., Benmansour, F., Maunz, A., Willis, J., Haskova, Z., and Prunotto, M. (2019). Deep learning algorithm predicts diabetic retinopathy progression in individual patients. NPJ Digit. Med. 2, 92.

35. Jonas, J.B., Aung, T., Bourne, R.R., Bron, A.M., Ritch, R., and Panda-Jonas, S. (2017). Glaucoma. Lancet 390, 2183–2193.

36. Tham, Y.C., Li, X., Wong, T.Y., Quigley, H.A., Aung, T., and Cheng, C.Y. (2014). Global prevalence of glaucoma and projections of glaucoma burden through 2040: a systematic review and meta-analysis. Ophthalmology 121, 2081–2090.

37. Weinreb, R.N., Aung, T., and Medeiros, F.A. (2014). The pathophysiology and treatment of glaucoma: a review. JAMA 311, 1901–1911.

38. GBD 2019 Blindness and Vision Impairment Collaborators; Vision Loss Expert Group of the Global Burden of Disease Study; Briant, P.S., Flaxman, S.R., Taylor, H.R.B., Jonas, J.B., Abdoli, A.A., Abrha, W.A., Abualhasan, A., Abu-Gharbieh, E.G., et al. (2021). Causes of blindness and vision impairment in 2020 and trends over 30 years, and prevalence of avoidable blindness in relation to VISION 2020: the Right to Sight: an analysis for the Global Burden of Disease Study. Lancet Global Health 9, e144–e160.

39. Yu, M., Lin, C., Weinreb, R.N., Lai, G., Chiu, V., and Leung, C.K.S. (2016). Risk of visual field progression in glaucoma patients with progressive retinal nerve fiber layer thinning: a 5-year prospective study. Ophthalmology 123, 1201–1210.

40. King, A., Azuara-Blanco, A., and Tuulonen, A. (2013). Glaucoma. BMJ 346, f3518.

41. Hollands, H., Johnson, D., Hollands, S., Simel, D.L., Jinapriya, D., and Sharma, S. (2013). Do findings on routine examination identify patients at risk for primary open-angle glaucoma? The rational clinical examination systematic review. JAMA 309, 2035–2042.

42. Xiong, J., Li, F., Song, D., Tang, G., He, J., Gao, K., Zhang, H., Cheng, W., Song, Y., Lin, F., et al. (2022). Multimodal machine learning using visual fields and peripapillary circular OCT scans in detection of glaucomatous optic neuropathy. Ophthalmology 129, 171–180.

43. Li, F., Su, Y., Lin, F., Li, Z., Song, Y., Nie, S., Xu, J., Chen, L., Chen, S., Li, H., et al. (2022). A deep-learning system predicts glaucoma incidence and progression using retinal photographs. J. Clin. Invest. 132, e157968.

44. Fan, R., Bowd, C., Christopher, M., Brye, N., Proudfoot, J.A., Rezapour, J., Belghith, A., Goldbaum, M.H., Chuter, B., Girkin, C.A., et al. (2022). Detecting glaucoma in the ocular hypertension study using deep learning. JAMA Ophthalmol. 140, 383–391.

45. Li, F., Yang, Y., Sun, X., Qiu, Z., Zhang, S., Tun, T.A., Mani, B., Nongpiur, M.E., Chansangpetch, S., Ratanawongphaibul, K., et al. (2022). Digital gonioscopy based on three-dimensional anterior-segment OCT: an international multicenter study. Ophthalmology 129, 45–53.

46. Dixit, A., Yohannan, J., and Boland, M.V. (2021). Assessing glaucoma progression using machine learning trained on longitudinal visual field and clinical data. Ophthalmology 128, 1016–1026.

47. Medeiros, F.A., Jammal, A.A., and Mariottoni, E.B. (2021). Detection of progressive glaucomatous optic nerve damage on fundus photographs with deep learning. Ophthalmology 128, 383–392.

48. Yousefi, S., Elze, T., Pasquale, L.R., Saeedi, O., Wang, M., Shen, L.Q., Wellik, S.R., De Moraes, C.G., Myers, J.S., and Boland, M.V. (2020). Monitoring glaucomatous functional loss using an artificial intelligence-enabled dashboard. Ophthalmology 127, 1170–1178.

49. Ran, A.R., Cheung, C.Y., Wang, X., Chen, H., Luo, L.Y., Chan, P.P., Wong, M.O.M., Chang, R.T., Mannil, S.S., Young, A.L., et al. (2019). Detection of glaucomatous optic neuropathy with spectral-domain optical coherence tomography: a retrospective training and validation deep-learning analysis. Lancet Digit. Health 1, e172–e182.

50. Martin, K.R., Mansouri, K., Weinreb, R.N., Wasilewicz, R., Gisler, C., Hennebert, J., and Genoud, D.; Research Consortium (2018). Use of machine learning on contact lens sensor-derived parameters for the diagnosis of primary open-angle glaucoma. Am. J. Ophthalmol. 194, 46–53.

51. Asaoka, R., Murata, H., Iwase, A., and Araie, M. (2016). Detecting preperimetric glaucoma with standard automated perimetry using a deep learning classifier. Ophthalmology 123, 1974–1980.

52. Girard, M.J.A., and Schmetterer, L. (2020). Artificial intelligence and deep learning in glaucoma: current state and future prospects. Prog. Brain Res. 257, 37–64.

53. Hood, D.C., La Bruna, S., Tsamis, E., Thakoor, K.A., Rai, A., Leshno, A., de Moraes, C.G.V., Cioffi, G.A., and Liebmann, J.M. (2022). Detecting glaucoma with only OCT: implications for the clinic, research, screening, and AI development. Prog. Retin. Eye Res. 90, 101052.

54. Ran, A.R., Tham, C.C., Chan, P.P., Cheng, C.Y., Tham, Y.C., Rim, T.H., and Cheung, C.Y. (2021). Deep learning in glaucoma with optical coherence tomography: a review. Eye 35, 188–201.

55. Fu, H., Baskaran, M., Xu, Y., Lin, S., Wong, D.W.K., Liu, J., Tun, T.A., Mahesh, M., Perera, S.A., and Aung, T. (2019). A deep learning system for automated angle-closure detection in anterior segment optical coherence tomography images. Am. J. Ophthalmol. 203, 37–45.

56. Yousefi, S., Kiwaki, T., Zheng, Y., Sugiura, H., Asaoka, R., Murata, H., Lemij, H., and Yamanishi, K. (2018). Detection of longitudinal visual field progression in glaucoma using machine learning. Am. J. Ophthalmol. 193, 71–79.

57. Wang, M., Shen, L.Q., Pasquale, L.R., Petrakos, P., Formica, S., Boland, M.V., Wellik, S.R., De Moraes, C.G., Myers, J.S., Saeedi, O., et al. (2019). An artificial intelligence approach to detect visual field progression in glaucoma based on spatial pattern analysis. Invest. Ophthalmol. Vis. Sci. 60, 365–375.

58. Mitchell, P., Liew, G., Gopinath, B., and Wong, T.Y. (2018). Age-related macular degeneration. Lancet 392, 1147–1159.

59. Wong, W.L., Su, X., Li, X., Cheung, C.M.G., Klein, R., Cheng, C.Y., and Wong, T.Y. (2014). Global prevalence of age-related macular degeneration and disease burden projection for 2020 and 2040: a systematic review and meta-analysis. Lancet Global Health 2, e106–e116.

60. Heier, J.S., Khanani, A.M., Quezada Ruiz, C., Basu, K., Ferrone, P.J., Brittain, C., Figueroa, M.S., Lin, H., Holz, F.G., Patel, V., et al. (2022). Efficacy, durability, and safety of intravitreal faricimab up to every 16 weeks for neovascular age-related macular degeneration (TENAYA and LUCERNE): two randomised, double-masked, phase 3, non-inferiority trials. Lancet 399, 729–740.

61. Peng, Y., Dharssi, S., Chen, Q., Keenan, T.D., Agrón, E., Wong, W.T., Chew, E.Y., and Lu, Z. (2019). DeepSeeNet: a deep learning model for automated classification of patient-based age-related macular degeneration severity from color fundus photographs. Ophthalmology 126, 565–575.

62. Burlina, P.M., Joshi, N., Pekala, M., Pacheco, K.D., Freund, D.E., and Bressler, N.M. (2017). Automated grading of age-related macular degeneration from color fundus images using deep convolutional neural networks. JAMA Ophthalmol. 135, 1170–1176.

63. Potapenko, I., Thiesson, B., Kristensen, M., Hajari, J.N., Ilginis, T., Fuchs, J., Hamann, S., and la Cour, M. (2022). Automated artificial intelligence-based system for clinical follow-up of patients with age-related macular degeneration. Acta Ophthalmol. 100, 927–936.

64. Yellapragada, B., Hornauer, S., Snyder, K., Yu, S., and Yiu, G. (2022). Self-supervised feature learning and phenotyping for assessing age-related macular degeneration using retinal fundus images. Ophthalmol. Retina 6, 116–129.

65. Rakocz, N., Chiang, J.N., Nittala, M.G., Corradetti, G., Tiosano, L., Velaga, S., Thompson, M., Hill, B.L., Sankararaman, S., Haines, J.L., et al. (2021). Automated identification of clinical features from sparsely annotated 3-dimensional medical imaging. NPJ Digit. Med. 4, 44.

66. Hwang, D.K., Hsu, C.C., Chang, K.J., Chao, D., Sun, C.H., Jheng, Y.C., Yarmishyn, A.A., Wu, J.C., Tsai, C.Y., Wang, M.L., et al. (2019). Artificial intelligence-based decision-making for age-related macular degeneration. Theranostics 9, 232–245.

73. Li, Z., Guo, C., Nie, D., Lin, D., Zhu, Y., Chen, C., Zhang, L., Xu, F., Jin, C., Zhang, X., et al. (2019). A deep learning system for identifying lattice degeneration and retinal breaks using ultra-widefield fundus images. Ann. Transl. Med. 7, 618.

74. Li, Z., Guo, C., Nie, D., Lin, D., Zhu, Y., Chen, C., Wu, X., Xu, F., Jin, C., Zhang, X., et al. (2020). Deep learning for detecting retinal detachment and discerning macular status using ultra-widefield fundus images. Commun. Biol. 3, 15.

75. Li, Y., Foo, L.L., Wong, C.W., Li, J., Hoang, Q.V., Schmetterer, L., Ting, D.S.W., and Ang, M. (2023). Pathologic myopia: advances in imaging and the potential role of artificial intelligence. Br. J. Ophthalmol. 107, 600–606.

76. Tsai, Y.Y., Lin, W.Y., Chen, S.J., Ruamviboonsuk, P., King, C.H., and Tsai, C.L. (2022). Diagnosis of polypoidal choroidal vasculopathy from fluorescein angiography using deep learning. Transl. Vis. Sci. Technol. 11, 6.

77. Liu, Y.C., Wilkins, M., Kim, T., Malyugin, B., and Mehta, J.S. (2017). Cataracts. Lancet 390, 600–612.

78. Tham, Y.C., Goh, J.H.L., Anees, A., Lei, X., Rim, T.H., Chee, M.L., Wang, Y.X., Jonas, J.B., Thakur, S., Teo, Z.L., et al. (2022). Detecting visually significant cataract using retinal photograph-based deep learning. Nat. Aging 2, 264–271.

79. Xu, X., Li, J., Guan, Y., Zhao, L., Zhao, Q., Zhang, L., and Li, L. (2021). GLA-Net: a global-local attention network for automatic cataract classification. J. Biomed. Inf. 124, 103939.

80. Lu, Q., Wei, L., He, W., Zhang, K., Wang, J., Zhang, Y., Rong, X., Zhao, Z., Cai, L., He, X., et al. (2022). Lens Opacities Classification System III-based artificial intelligence program for automatic cataract grading. J. Cataract Refract. Surg. 48, 528–534.

81. Lin, D., Chen, J., Lin, Z., Li, X., Zhang, K., Wu, X., Liu, Z., Huang, J., Li, J., Zhu, Y., et al. (2020). A practical model for the identification of congenital cataracts using machine learning. EBioMedicine 51, 102621.

82. Xu, X., Zhang, L., Li, J., Guan, Y., and Zhang, L. (2020). A hybrid global-local representation CNN model for automatic cataract grading. IEEE J. Biomed. Health Inform. 24, 556–567.

83. Wu, X., Huang, Y., Liu, Z., Lai, W., Long, E., Zhang, K., Jiang, J., Lin, D., Chen, K., Yu, T., et al. (2019). Universal artificial intelligence platform for

90. Singh, P., Gupta, A., and Tripathy, K. (2020). Keratitis. StatPearls.

91. Lin, A., Rhee, M.K., Akpek, E.K., Amescua, G., Farid, M., Garcia-Ferrer, F.J., Varu, D.M., Musch, D.C., Dunn, S.P., and Mah, F.S.; American Academy of Ophthalmology Preferred Practice Pattern Cornea and External Disease Panel (2019). Bacterial keratitis Preferred Practice Pattern®. Ophthalmology 126, P1–P55.

92. Xu, Y., Kong, M., Xie, W., Duan, R., Fang, Z., Lin, Y., Zhu, Q., Tang, S., Wu, F., and Yao, Y.F. (2021). Deep sequential feature learning in clinical image classification of infectious keratitis. Engineering 7, 1002–1010.

93. Redd, T.K., Prajna, N.V., Srinivasan, M., Lalitha, P., Krishnan, T., Rajaraman, R., Venugopal, A., Acharya, N., Seitzman, G.D., Lietman, T.M., et al. (2022). Image-based differentiation of bacterial and fungal keratitis using deep convolutional neural networks. Ophthalmol. Sci. 2, 100119.

94. Ren, Z., Li, W., Liu, Q., Dong, Y., and Huang, Y. (2022). Profiling of the conjunctival bacterial microbiota reveals the feasibility of utilizing a microbiome-based machine learning model to differentially diagnose microbial keratitis and the core components of the conjunctival bacterial interaction network. Front. Cell. Infect. Microbiol. 12, 860370.

95. Wu, W., Huang, S., Xie, X., Chen, C., Yan, Z., Lv, X., Fan, Y., Chen, C., Yue, F., and Yang, B. (2022). Raman spectroscopy may allow rapid noninvasive screening of keratitis and conjunctivitis. Photodiagnosis Photodyn. Ther. 37, 102689.

96. Tiwari, M., Piech, C., Baitemirova, M., Prajna, N.V., Srinivasan, M., Lalitha, P., Villegas, N., Balachandar, N., Chua, J.T., Redd, T., et al. (2022). Differentiation of active corneal infections from healed scars using deep learning. Ophthalmology 129, 139–146.

97. Ghosh, A.K., Thammasudjarit, R., Jongkhajornpong, P., Attia, J., and Thakkinstian, A. (2022). Deep learning for discrimination between fungal keratitis and bacterial keratitis: DeepKeratitis. Cornea 41, 616–622.

98. Wang, L., Chen, K., Wen, H., Zheng, Q., Chen, Y., Pu, J., and Chen, W. (2021). Feasibility assessment of infectious keratitis depicted on slit-lamp and smartphone photographs using deep learning. Int. J. Med. Inf. 155, 104583.

99. Lv, J., Zhang, K., Chen, Q., Chen, Q., Huang, W., Cui, L., Li, M., Li, J., Chen, L., Shen, C., et al. (2020). Deep learning-based automated diagnosis of fungal keratitis with in vivo confocal microscopy images. Ann. Transl. Med. 8, 706.

100. Gu, H., Guo, Y., Gu, L., Wei, A., Xie, S., Ye, Z., Xu, J., Zhou, X., Lu, Y., Liu, X., and Hong, J. (2020). Deep learning for identifying corneal diseases from ocular surface slit-lamp photographs. Sci. Rep. 10, 17851.

101. Ferdi, A.C., Nguyen, V., Gore, D.M., Allan, B.D., Rozema, J.J., and Watson, S.L. (2019). Keratoconus natural progression: a systematic review and meta-analysis of 11,529 eyes. Ophthalmology 126, 935–945.

102. de Sanctis, U., Loiacono, C., Richiardi, L., Turco, D., Mutani, B., and Grignolo, F.M. (2008). Sensitivity and specificity of posterior corneal elevation measured by Pentacam in discriminating keratoconus/subclinical keratoconus. Ophthalmology 115, 1534–1539.

103. Castro-Luna, G., Jiménez-Rodríguez, D., Castaño-Fernández, A.B., and Pérez-Rueda, A. (2021). Diagnosis of subclinical keratoconus based on machine learning techniques. J. Clin. Med. 10, 4281.

104. Al-Timemy, A.H., Mosa, Z.M., Alyasseri, Z., Lavric, A., Lui, M.M., Hazarbassanov, R.M., and Yousefi, S. (2021). A hybrid deep learning construct for detecting keratoconus from corneal maps. Transl. Vis. Sci. Technol. 10, 16.

105. Almeida, J.G., Guido, R.C., Balarin, S.H., Brandao, C.C., Carlos, D.M.L., Lopes, B.T., Machado, A.P., and Ambrosio, R.J. (2022). Novel artificial intelligence index based on Scheimpflug corneal tomography to distinguish subclinical keratoconus from healthy corneas. J. Cataract Refract. Surg.

106. Jiménez-García, M., Issarti, I., Kreps, E.O., Ní Dhubhghaill, S., Koppen, C., Varssano, D., and Rozema, J.J.; The REDCAKE Study Group (2021). Forecasting progressive trends in keratoconus by means of a time delay neural network. J. Clin. Med. 10, 3238.

107. Xie, Y., Zhao, L., Yang, X., Wu, X., Yang, Y., Huang, X., Liu, F., Xu, J., Lin, L., Lin, H., et al. (2020). Screening candidates for refractive surgery with

117. Rim, T.H., Lee, C.J., Tham, Y.C., Cheung, N., Yu, M., Lee, G., Kim, Y., Ting, D.S.W., Chong, C.C.Y., Choi, Y.S., et al. (2021). Deep-learning-based cardiovascular risk stratification using coronary artery calcium scores predicted from retinal photographs. Lancet Digit. Health 3, e306–e316.

118. Poplin, R., Varadarajan, A.V., Blumer, K., Liu, Y., McConnell, M.V., Corrado, G.S., Peng, L., and Webster, D.R. (2018). Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nat. Biomed. Eng. 2, 158–164.

119. Kalantar-Zadeh, K., Jafar, T.H., Nitsch, D., Neuen, B.L., and Perkovic, V. (2021). Chronic kidney disease. Lancet 398, 786–802.

120. Ahmad, E., Lim, S., Lamptey, R., Webb, D.R., and Davies, M.J. (2022). Type 2 diabetes. Lancet 400, 1803–1820.

121. Zhang, K., Liu, X., Xu, J., Yuan, J., Cai, W., Chen, T., Wang, K., Gao, Y., Nie, S., Xu, X., et al. (2021). Deep-learning models for the detection and incidence prediction of chronic kidney disease and type 2 diabetes from retinal fundus images. Nat. Biomed. Eng. 5, 533–545.

122. Sabanayagam, C., Xu, D., Ting, D.S.W., Nusinovici, S., Banu, R., Hamzah, H., Lim, C., Tham, Y.C., Cheung, C.Y., Tai, E.S., et al. (2020). A deep learning algorithm to detect chronic kidney disease from retinal photographs in community-based populations. Lancet Digit. Health 2, e295–e302.

123. Scheltens, P., De Strooper, B., Kivipelto, M., Holstege, H., Chételat, G., Teunissen, C.E., Cummings, J., and van der Flier, W.M. (2021). Alzheimer's disease. Lancet 397, 1577–1590.

124. Gupta, V.B., Chitranshi, N., den Haan, J., Mirzaei, M., You, Y., Lim, J.K.,
corneal Tomographic-Based deep learning. JAMA Ophthalmol. 138, Basavarajappa, D., Godinez, A., Di Angelantonio, S., Sachdev, P., et al.
519–526. (2021). Retinal changes in Alzheimer’s disease- integrated prospects of
imaging, functional and molecular advances. Prog. Retin. Eye Res. 82,
108. Zéboulon, P., Debellemanière, G., Bouvet, M., and Gatinel, D. (2020).
100899.
Corneal topography raw data classification using a convolutional neural
network. Am. J. Ophthalmol. 219, 33–39. 125. Cheung, C.Y., Ran, A.R., Wang, S., Chan, V.T.T., Sham, K., Hilal, S., Ven-
109. Shi, C., Wang, M., Zhu, T., Zhang, Y., Ye, Y., Jiang, J., Chen, S., Lu, F., ketasubramanian, N., Cheng, C.Y., Sabanayagam, C., Tham, Y.C., et al.
and Shen, M. (2020). Machine learning helps improve diagnostic ability (2022). A deep learning model for detection of Alzheimer’s disease based
of subclinical keratoconus using Scheimpflug and OCT imaging modal- on retinal photographs: a retrospective, multicentre case-control study.
ities. Eye Vis. 7, 48. Lancet. Digit. Health 4, e806–e815.
110. Cao, K., Verspoor, K., Chan, E., Daniell, M., Sahebjada, S., and Baird, 126. Price, W.N., and Cohen, I.G. (2019). Privacy in the age of medical big
P.N. (2021). Machine learning with a reduced dimensionality representa- data. Nat. Med. 25, 37–43.
tion of comprehensive Pentacam tomography parameters to identify 127. Rieke, N., Hancox, J., Li, W., Milletarı̀, F., Roth, H.R., Albarqouni, S., Ba-
subclinical keratoconus. Comput. Biol. Med. 138, 104884. kas, S., Galtier, M.N., Landman, B.A., Maier-Hein, K., et al. (2020). The
111. Issarti, I., Consejo, A., Jiménez-Garcı́a, M., Hershko, S., Koppen, C., and future of digital health with federated learning. NPJ Digit. Med. 3, 119.
Rozema, J.J. (2019). Computer aided diagnosis for suspect keratoconus
128. Dayan, I., Roth, H.R., Zhong, A., Harouni, A., Gentili, A., Abidin, A.Z., Liu,
detection. Comput. Biol. Med. 109, 33–42.
A., Costa, A.B., Wood, B.J., Tsai, C.S., et al. (2021). Federated learning
112. Ruiz Hidalgo, I., Rodriguez, P., Rozema, J.J., Nı́ Dhubhghaill, S., Zakaria, for predicting clinical outcomes in patients with COVID-19. Nat. Med.
N., Tassignon, M.J., and Koppen, C. (2016). Evaluation of a Machine- 27, 1735–1743.
Learning classifier for keratoconus detection based on scheimpflug to-
129. Krause, J., Gulshan, V., Rahimy, E., Karth, P., Widner, K., Corrado, G.S.,
mography. Cornea 35, 827–832.
Peng, L., and Webster, D.R. (2018). Grader variability and the importance
113. Chase, C., Elsawy, A., Eleiwa, T., Ozcan, E., Tolba, M., and Abou Shou- of reference standards for evaluating machine learning models for dia-
sha, M. (2021). Comparison of autonomous AS-OCT deep learning algo- betic retinopathy. Ophthalmology 125, 1264–1272.
rithm and clinical dry eye tests in diagnosis of dry eye disease. Clin. Oph-
130. Sylolypavan, A., Sleeman, D., Wu, H., and Sim, M. (2023). The impact of
thalmol. 15, 4281–4289.
inconsistent human annotations on AI driven clinical decision making.
114. Zhang, Y.Y., Zhao, H., Lin, J.Y., Wu, S.N., Liu, X.W., Zhang, H.D., Shao, NPJ Digit. Med. 6, 26.
Y., and Yang, W.F. (2021). Artificial intelligence to detect meibomian
gland dysfunction from in-vivo laser confocal microscopy. Front. Med. 131. Lin, D., Xiong, J., Liu, C., Zhao, L., Li, Z., Yu, S., Wu, X., Ge, Z., Hu, X.,
8, 774344. Wang, B., et al. (2021). Application of Comprehensive Artificial intelli-
gence Retinal Expert (CARE) system: a national real-world evidence
115. GBD 2013 Mortality and Causes of Death Collaborators (2015). Global,
study. Lancet. Digit. Health 3, e486–e495.
regional, and national age-sex specific all-cause and cause-specific
mortality for 240 causes of death, 1990-2013: a systematic analysis for 132. Playout, C., Duval, R., and Cheriet, F. (2019). A novel weakly supervised
the Global Burden of Disease Study 2013. Lancet 385, 117–171. multitask architecture for retinal lesions segmentation on fundus images.
IEEE Trans. Med. Imag. 38, 2434–2444.
116. Seidelmann, S.B., Claggett, B., Bravo, P.E., Gupta, A., Farhad, H., Klein,
B.E., Klein, R., Di Carli, M., and Solomon, S.D. (2016). Retinal vessel cal- 133. Wang, J., Li, W., Chen, Y., Fang, W., Kong, W., He, Y., and Shi, G. (2021).
ibers in predicting Long-Term cardiovascular outcomes: the atheroscle- Weakly supervised anomaly segmentation in retinal OCT images using an
rosis risk in communities study. Circulation 134, 1328–1338. adversarial learning approach. Biomed. Opt Express 12, 4713–4729.
142. Li, Z., Jiang, J., Chen, K., Zheng, Q., Liu, X., Weng, H., Wu, S., and Chen, W. (2021). Development of a deep learning-based image quality control system to detect and filter out ineligible slit-lamp images: a multicenter study. Comput. Methods Progr. Biomed. 203, 106048.
143. Shen, Y., Sheng, B., Fang, R., Li, H., Dai, L., Stolte, S., Qin, J., Jia, W., and Shen, D. (2020). Domain-invariant interpretable fundus image quality assessment. Med. Image Anal. 61, 101654.
144. Kanagasingam, Y., Xiao, D., Vignarajan, J., Preetham, A., Tay-Kearney, M.L., and Mehrotra, A. (2018). Evaluation of artificial intelligence-based grading of diabetic retinopathy in primary care. JAMA Netw. Open 1, e182665.
145. Long, E., Lin, H., Liu, Z., Wu, X., Wang, L., Jiang, J., An, Y., Lin, Z., Li, X., Chen, J., et al. (2017). An artificial intelligence platform for the multihospital collaborative management of congenital cataracts. Nat. Biomed. Eng. 1, 0024.
146. Rajpurkar, P., Chen, E., Banerjee, O., and Topol, E.J. (2022). AI in health and medicine. Nat. Med. 28, 31–38.
147. Wang, N., Cheng, M., and Ning, K. (2022). Overcoming regional limitations: transfer learning for cross-regional microbial-based diagnosis of diseases. Gut. 2022-328216.
148. Shorten, C., and Khoshgoftaar, T.M. (2019). A survey on image data augmentation for deep learning. J. Big Data 6, 60.
149. Zhou, C., Ye, J., Wang, J., Zhou, Z., Wang, L., Jin, K., Wen, Y., Zhang, C., and Qian, D. (2022). Improving the generalization of glaucoma detection on fundus images via feature alignment between augmented views. Biomed. Opt. Express 13, 2018–2034.
150. Li, W., Yang, Y., Zhang, K., Long, E., He, L., Zhang, L., Zhu, Y., Chen, C., Liu, Z., Wu, X., et al. (2020). Dense anatomical annotation of slit-lamp images improves the performance of deep learning for the diagnosis of ophthalmic disorders. Nat. Biomed. Eng. 4, 767–777.
151. Duran, J.M., and Jongsma, K.R. (2021). Who is afraid of black box algorithms? On the epistemological and ethical basis of trust in medical AI. J. Med. Ethics.
152. Niu, Y., Gu, L., Zhao, Y., and Lu, F. (2022). Explainable diabetic retinopathy detection and retinal image generation. IEEE J. Biomed. Health Inform. 26, 44–55.
161. Trucco, E., Ruggeri, A., Karnowski, T., Giancardo, L., Chaum, E., Hubschman, J.P., Al-Diri, B., Cheung, C.Y., Wong, D., Abràmoff, M., et al. (2013). Validating retinal fundus image analysis algorithms: issues and a proposal. Invest. Ophthalmol. Vis. Sci. 54, 3546–3559.
162. Nagendran, M., Chen, Y., Lovejoy, C.A., Gordon, A.C., Komorowski, M., Harvey, H., Topol, E.J., Ioannidis, J.P.A., Collins, G.S., and Maruthappu, M. (2020). Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. BMJ 368, m689.
163. Cruz Rivera, S., Liu, X., Chan, A.W., Denniston, A.K., and Calvert, M.J.; SPIRIT-AI and CONSORT-AI Working Group (2020). Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. Lancet Digit. Health 2, e549–e560.
164. Liu, X., Cruz Rivera, S., Moher, D., Calvert, M.J., and Denniston, A.K.; SPIRIT-AI and CONSORT-AI Working Group (2020). Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. Nat. Med. 26, 1364–1374.
165. Sounderajah, V., Ashrafian, H., Aggarwal, R., De Fauw, J., Denniston, A.K., Greaves, F., Karthikesalingam, A., King, D., Liu, X., Markar, S.R., et al. (2020). Developing specific reporting guidelines for diagnostic accuracy studies assessing AI interventions: the STARD-AI Steering Group. Nat. Med. 26, 807–808.
166. Reddy, S., Rogers, W., Makinen, V.P., Coiera, E., Brown, P., Wenzel, M., Weicken, E., Ansari, S., Mathur, P., Casey, A., and Kelly, B. (2021). Evaluation framework to guide implementation of AI systems into healthcare settings. BMJ Health Care Inform. 28, e100444.
167. Zhu, T., Ye, D., Wang, W., Zhou, W., and Yu, P.S. (2022). More than privacy: applying differential privacy in key areas of artificial intelligence. IEEE Trans. Knowl. Data Eng. 34, 2824–2843.
168. Hesamifard, E., Takabi, H., and Ghasemi, M. (2019). Deep Neural Networks Classification over Encrypted Data. ACM, pp. 97–108.
169. US Food and Drug Administration (FDA). Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD): Discussion Paper and Request for Feedback. https://www.fda.gov/files/medical%20devices/published/US-FDA-Artificial-Intelligence-and-Machine-Learning-Discussion-Paper.pdf. (Accessed 25 May 2022).