Abstract
Differentiating between diabetic nephropathy (DN) and non-diabetic renal disease (NDRD) without a kidney biopsy remains a major challenge, often leading to missed opportunities for targeted treatments that could greatly improve NDRD outcomes. To reform the traditional biopsy-all diagnostic paradigm and avoid unnecessary biopsy, we developed a transformer-based deep learning (DL) system for detecting DN and NDRD based on non-invasive multi-modal data of fundus images and clinical characteristics. Our Trans-MUF achieved an AUC of 0.980 (95% CI: 0.979 to 0.980) on the internal retrospective set and also showed superior generalizability on a prospective dataset (AUC: 0.989, 95% CI: 0.987 to 0.990) and a multicenter, cross-machine and multi-operator dataset (AUC: 0.932, 95% CI: 0.931 to 0.939). Moreover, nephrologists' diagnostic accuracy can be improved by 21% through the visualization assistance of the DL system. This paper lays a foundation for automatically differentiating DN and NDRD without biopsy. (Registry name: Correlation Study Between Clinical Phenotype and Pathology of Type 2 Diabetic Nephropathy. ID: NCT03865914. Date: 2017-11-30).
Introduction
Diabetic nephropathy (DN) is the leading cause of end-stage renal disease worldwide, necessitating kidney transplantation or dialysis1. DN is also one of the most prevalent reasons for clinical nephrology referrals, affecting approximately half of the patients with type 2 diabetes mellitus (T2DM)2,3. In addition to DN, nondiabetic renal disease (NDRD) and the combination of NDRD and DN are also common in patients with concurrent T2DM and chronic kidney disease (CKD); in this paper, both NDRD alone and the combination of NDRD and DN are labeled as NDRD. In the clinic, the differential diagnosis of DN and NDRD is of great importance due to the significant difference in their risk of progressing to end-stage renal disease, i.e., risk ratios of 7.1 (95% CI: 2.46–20.49) and 0.89 (95% CI: 0.19–4.24) for DN and NDRD, respectively4. In addition, the treatment and prognosis of DN and NDRD are completely different5; e.g., treating NDRD requires individualized management and more medical resources. A missed diagnosis of NDRD therefore means a lost opportunity to identify specific methods to slow the progression of CKD. Accurately identifying patients with NDRD, either alone or combined with DN, is thus of great importance and necessitates renal biopsy for a precise pathological diagnosis. This enables the initiation of individualized management, such as immunosuppressive therapy for lupus nephritis or renal vasculitis, which is crucial for slowing CKD progression.
Kidney biopsy6,7 is the only way to obtain a reliable diagnosis of DN and NDRD. Unfortunately, performing kidney biopsies for a large number of T2DM patients with CKD remains a challenge due to the high cost and logistical difficulty. The primary clinical guidelines suggest first screening patients with DN or NDRD based on certain clinical characteristics, such as diabetes history and diabetic retinopathy (DR), and then performing kidney biopsy on the screened NDRD patients for the final diagnosis8. However, previous studies have shown that these guidelines exhibit a low specificity of only 40.63% in diagnosing NDRD9. Recently, some studies have benefited from the great success of deep learning10 to improve the diagnostic specificity with clinical characteristics. Nevertheless, simple clinical factors have intrinsically low data dimensionality, providing rather limited information for a precise and robust diagnosis11. For instance, current studies only consider the presence or absence of DR, ignoring the detailed association between DR and DN, which can be found in high-dimensional clinical data, e.g., fundus images. In fact, the potential impact of leveraging noninvasive optical imaging data has not been well explored in artificial intelligence (AI)-based NDRD screening12. In this scenario, it seems more interesting and reasonable to devise an AI system that integrates fundus images and clinical factors for both NDRD screening and timely nephrology referrals.
In this paper, we develop and validate a transformer-based deep learning system named Trans-MUF to classify DN and NDRD with nonbiopsy multimodal clinical data of fundus images and clinical characteristics. Figure 1a shows a schematic comparison between the traditional paradigm and our AI-based system in classifying DN and NDRD. As shown, our system is able to improve sensitivity for NDRD detection and consequently reduce the number of false-positive findings, which in turn leads to fewer biopsies compared with the straightforward biopsy-for-all approach. We extensively evaluate the superior performance of Trans-MUF with internal and external datasets, including both retrospective and prospective cases. Additionally, to provide interpretability and promote the clinical application of Trans-MUF, we further generate DN-related lesion masks and a visualization map showing the reasoning behind each decision. Interestingly, nephrologists' performance improved when they used the visualization maps as assistance for diagnosis. A statistical analysis of the decision visualization maps is also conducted, facilitating both model explainability and an in-depth understanding of DN and NDRD.
a Comparison between the traditional paradigm and our AI-based diagnostic paradigm for the diagnosis of DN and NDRD. In contrast with the conventional paradigm, the proposed AI system can greatly reduce the missed diagnosis rate of NDRD and the number of kidney biopsies. b The pipeline of our AI system. With a multimodal input of retinal images, lesion segmentation maps and risk factors, our AI system generates a predicted probability of DN/NDRD, lesion segmentation maps, subjective network visualization maps and a correlated quantitative analysis. It constructs a multimodal fusion network consisting of multiple models. c Personalized management analysis for our AI system. We simulated a scenario that assessed whether the AI system could correctly identify patients with DN from a mixed group of DN and NDRD to avoid biopsy for DN patients. This simulation included 55 patients from the external validation set and used a decision curve analysis methodology. d Composition of the data used in this study. e ROC curves of our AI model for different validation sets. AI, artificial intelligence; Seg, segmentation.
Results
AI system overview
The AI system (Figs. 1b and 2) devised in this paper was trained in a supervised learning manner, with multimodal input of fundus images and clinical factors, including proteinuria and hematuria. The detailed structure of the proposed AI system can be found in the Methods section. Our system mimics a clinical outpatient scenario where medical resources are very limited, that is, only nonbiopsy examinations are available for the majority of patients. Fundus images are first passed through the model to generate lesion and anatomic features, which are then leveraged as auxiliary information for the diagnostic model. Beyond the basic output of the predicted probability of DN and NDRD, our system also provides interpretability for clinical practice by presenting model decisions in the form of visualization maps and corresponding lesion attribution. That is, for each patient, the system visualizes and quantifies the important lesions that contribute to the detection of DN. The underlying deep learning model of the system is constructed with a transformer13, which is capable of modeling the correlation among various image features and, of note, has achieved state-of-the-art performance in the image classification field.
Taking the multi-modal input of fundus images and clinical factors, the Trans-MUF system outputs the DN/NDRD prediction result, as well as the auxiliary outputs of an interpretable visualization map and a pathology attribution score. The Trans-MUF system is composed of three subnets: a ImgLesion subnet, b Factor subnet and c Diagnosis subnet.
Patient data
Our Retina-DKD dataset used for model training and testing includes data from 246 diabetic kidney disease (DKD) patients who underwent kidney biopsy, fundus imaging (n = 934) and blood tests at the Chinese People's Liberation Army (PLA) General Hospital. For each patient, seventeen clinical factors (Table 1) were collected, and five factors (medical history of diabetes mellitus (DM), systolic blood pressure (SBP), estimated glomerular filtration rate (eGFR), hemoglobin (Hb), and glycosylated hemoglobin (HbA1c)) were selected and finally used for model development. Based on the microscopy of kidney specimens, each patient was labeled DN or NDRD (including NDRD and the combination of NDRD and DN) by senior nephrologists. The entire dataset was initially split into training and validation sets at a 4:1 ratio.
Moreover, we collected associated histopathological images, pathology reports, patient demographic data and clinical examination information (Fig. 1d, Table 1, Supplementary Fig. 2). In addition to patient-level annotation, fine-grained pixelwise annotation of 17 retinal lesion types in 529 fundus images, totaling 15,960 lesion annotations, was performed by a senior practitioner with more than 10 years of experience in ophthalmology. Detailed statistics of our lesion annotations are provided in Supplementary Table 1. The demographic information of the patients can be found in Table 1. As shown in Table 1, all patients had a long history of diabetes, with an average course of more than 80 months. In addition, the average eGFR in the nonstandard validation dataset is relatively low, whereas there is no significant difference in the average eGFR across the other four datasets.
We further validated our model performance with two multicohort datasets. The first dataset was prospectively collected at the Chinese PLA General Hospital and contains data from 41 DKD patients who underwent the same biopsy, imaging and blood tests as in our internal dataset. The second dataset was retrospectively collected from multiple external institutions and at multiple imaging angles. Detailed information about the datasets can be found in Table 1 and Methods. Of note, in terms of clinical characteristics, there is no significant difference in the average eGFR between the two multicohort datasets.
Primary model performance
For the internal validation set, i.e., Retina-DKD, our Trans-MUF system achieved an area under the receiver operating characteristic curve (AUC) of 0.973 [95% confidence interval (CI): 0.969–0.977] when classifying patients with retinal fundus images and biochemical examination results. The system was also evaluated with four other metrics, with the following results: 0.936 in accuracy, 0.962 in sensitivity, 0.900 in specificity and 0.944 in F1-score. A summary of the primary model performance is shown in Table 2. In addition, the receiver operating characteristic (ROC) curve of our Trans-MUF system is shown in Fig. 3a. Figure 3b presents a t-distributed stochastic neighbor embedding (t-SNE) visualization of the Retina-DKD dataset by our Trans-MUF system, clearly showing two clusters of fundus images and indicating the ability of our model to distinguish DN and NDRD.
a (Top row) The ROCs of our model, other compared models and human experts based on the three validation datasets. (Bottom row) The ROCs of our model and other ablated models based on the three validation datasets. b t-SNE visualization of the Trans-MUF system with the Retina-DKD validation set. c The performance analysis on the segmentation proportion in multi-modal concatenation (Top) and WAM block number (Bottom) during training. d (Left to right) Subjective visualization of lesion segmentation for the annotated multi-lesion maps, binarized lesion maps, baseline model (i.e., single layer fusion) and our proposed model. e The performance analysis on the number of fusion layers in our proposed lesion segmentation model.
To validate the generalizability of our AI system, we further evaluated our model with the prospective PLA dataset and the external multi-institution dataset. For the prospective PLA dataset, our system achieved 0.979 AUC (0.977–0.980) and 0.976 accuracy, as shown in Table 2. As seen in this table, which of the two validation sets performs better varies across evaluation metrics. For example, the prospective set performs better than the internal set in terms of accuracy (i.e., 96.3% of prospective versus 93.6% of internal) and AUC (i.e., 98.9% of prospective versus 98.0% of internal), yet remains inferior in some other metrics. It can therefore be concluded that the model's performance on these two datasets is essentially consistent, since both the prospective validation set and the internal validation set originate from the same hospital, albeit collected at different time points, and thus exhibit a certain degree of consistency. For the external multi-institution dataset, the AI system reached 0.876 AUC (0.867–0.893) and 0.919 accuracy (0.916–0.923). Detailed results of the model evaluation in terms of 5 metrics are presented in Table 2. As shown in Table 2, the classification performance slightly decreased when validated with the external dataset, which is likely due to the domain shift caused by differences in image acquisition devices. In addition, the ROC curves of our Trans-MUF system with the two datasets are shown in Fig. 3a.
Moreover, we simulated a scenario that assessed whether the AI system could correctly identify patients with DN from a mixed group of DN and NDRD to avoid biopsy for DN patients. This simulation included 55 patients from the external validation set and used a decision curve analysis methodology. Results are shown in Fig. 1c, validating the potential of our system for reducing unnecessary renal biopsies.
In addition, the model we developed was also compared with nephrologists. In our retrospective reader study, three board-certified nephrologists interpreted 44 cases sampled from the Retina-DKD validation set. The nephrologists achieved an average performance of 70.43% in accuracy, 59.72% in sensitivity and 83.33% in specificity at the patient level (Supplementary Table 2). In contrast, the patient-wise classification performance of our AI system on the reader study subset was 93.18% in accuracy, 100.00% in sensitivity and 85.00% in specificity, showing great superiority over the nephrologists. In addition, the performance of the three nephrologists is averaged and shown as the red working point in Fig. 3a. Our AI system performed far better than the nephrologists in classifying DN and NDRD, which is probably due to its advantage in capturing underlying diagnostic features of both images and clinical factors.
Model effectiveness and robustness analysis
We compared our model with four other commonly used image classification methods, namely DenseNet-12114, AlexNet15, VGG-1616 and DAFT17, on the task of multimodal disease classification. Notably, we slightly tailored the compared methods to the multimodal setting since they were not originally designed for multimodal data. Table 2 shows that Trans-MUF performed the best, achieving 93.6% accuracy, 100% sensitivity, 94.6% F1-score and 98.0% AUC, surpassing all the other models. This indicates that our model can effectively utilize multimodal information for classifying DN and NDRD. In addition, Fig. 3a plots the ROCs of all models, demonstrating the superior performance of our model over the comparison models.
An ablation study was designed to validate the effectiveness of the two modalities, i.e., images and clinical factors, in classifying DN and NDRD. Specifically, we removed the input of the fundus images and lesion segmentation maps, as well as the related model structure, and conducted disease classification experiments. Table 2 shows that the performance after removing the image modality decreases in all metrics, with reductions of 4.3%, 9.4%, 4.0% and 7.8% in accuracy, sensitivity, F1-score and AUC, respectively. Similar results for ablating the modality of clinical factors, as well as the convolutional neural network (CNN) part and transformer part of the proposed Trans-MUF, are observed in Table 2 and Supplementary Table 3 based on all the validation sets. The ROCs in Fig. 3a indicate the effectiveness of jointly using the two modalities and the usefulness of fusing the CNN and transformer networks.
We also validated the performance of our AI system in the auxiliary task of lesion segmentation with the internal Retina-DKD set. Our Trans-MUF system achieved 97.5% accuracy when performing pixelwise segmentation of retinal lesions based on fundus images. The system was also evaluated with three other metrics, that is, 69.0% in Dice similarity coefficient (DSC), 94.3% in area under the receiver operating characteristic curve (AUROC) and 79.9% in area under the precision-recall curve (AUPRC). A comparative analysis of the proposed multi-layer fusion segmentation mechanism against the referenced U-Net18 is shown in Fig. 3e and Supplementary Fig. 3. Additionally, Fig. 3d shows the subjective segmentation results of lesions generated by our AI system, further indicating the effectiveness of this system in the task of lesion segmentation.
Furthermore, a robustness analysis was conducted on fundus images taken at nonstandard angles. In practice, clinically acquired images are not always of good quality, posing great challenges for the real-world validation of AI systems. To validate the robustness of our AI system, we tested it with 42 images (n = 16 cases) with nonstandard imaging angles, together with the correlated clinical characteristics, from the PLA General Hospital. The diagnostic accuracy of our system decreased when using images with nonstandard angles; however, the AUC value still exceeded 80%, validating the robustness of our Trans-MUF system.
Model interpretability analysis
When making diagnoses with multimodal clinical data, clinicians and AI systems tend to pay different attention to different modalities19. To uncover the dependency of our AI system on the different modalities, we devised a modality weight quantification method (refer to the Methods section for details) and applied it to the internal retrospective validation set. Quantitative results are shown in Fig. 4a. As shown, lesion segmentation (41.98%) plays a more important role than raw fundus images (29.60%) and clinical characteristics (28.42%) in DN and NDRD classification.
a The weight of multimodal data in diagnosis. b The feature importance of clinical data. c Segmented lesion map and network visualization of DN and NDRD. d Violin plot of the average AUC between visualization maps and different lesion segmentation maps. The three lines in the violin plot from top to bottom are the 75% quartile, median and 25% quartile. e AI-assisted decision-making of nephrologists with network visualization.
In addition to the intermodality importance attribution, further quantitative attribution analysis within the two modalities of clinical factors and images was conducted. Specifically, for clinical factors, we obtained the relative importance among the five included clinical factors, observing a descending importance order of SBP, HbA1c, eGFR, DM and Hb (Fig. 4b). A quantitative study of lesion attribution was also performed based on a widely used feature visualization method20. Specifically, for each patient diagnosed by the system, we provide both the heatmaps and lesion segmentation of the fundus images as well as the dependency scores of the different lesions. The results are shown in Fig. 4c and Supplementary Fig. 4.
In addition to interpreting model decisions, the visualization maps were further used to assist nephrologists in clinical diagnosis. Specifically, with the help of ophthalmologists, one general principle of the visualization maps was observed: visualization maps for DN are more scattered and focus more on lesion regions, while those for NDRD are more concentrated and focus more on retinal vessels. In addition, when classifying DN, the top three important lesions are fiber-added membrane, pigmentation and hard exudates (Fig. 4d, e). The average AUC and lesion number of NDRD are shown in Supplementary Fig. 6. Three nephrologists then conducted DN and NDRD classification with the summarized principle based on fundus images, network visualization maps and segmented lesion maps. The experimental results are shown in Fig. 4e. As shown, the diagnostic accuracies of the three nephrologists improved to 93.2%, 86.4% and 93.2%, respectively (refer to Supplementary Table 2 and Supplementary Fig. 5 for details). The three nephrologists were also asked to diagnose using only the visualization maps (with retinal images); the diagnostic accuracy rates were 93.2%, 88.6% and 90.9%, respectively (Supplementary Table 2 and Supplementary Fig. 5). This indicates that the visualization maps (with retinal images) alone are sufficient for nephrologists to attempt a diagnosis.
Discussion
In this study, we developed, for the first time, a Trans-MUF model for predicting DN and NDRD based on fundus images and clinical data. We designed an integrated deep learning network that organically combines two subnets and two different modalities. We then validated the superior generalizability of our multimodal model using multicenter and multiangle datasets. More importantly, we visualized the multimodal data, and the interpretability of the model clarified the morphological areas on which the diagnosis focuses. Moreover, we extracted information from the model prediction process to guide doctors in making decisions, which can improve the accuracy of doctors' diagnoses.
Previous studies have shown that DN and NDRD each account for an average of 40–50% of DKD patients21,22,23. However, the treatment of NDRD and DN patients is completely different. For patients with DN, sodium-glucose cotransporter 2 (SGLT2) inhibitors and renin-angiotensin system (RAS) inhibitors are generally used to reduce glomerular hyperfiltration5, whereas the diagnosis of any NDRD requires initiating individualized management, such as immunosuppressive therapy for lupus nephritis or renal vasculitis. A missed diagnosis of NDRD represents a missed opportunity to identify specific methods to slow the progression of kidney disease. However, due to the large number of patients with DN, performing renal biopsy for all patients with diabetes combined with CKD is costly and wasteful. Using noninvasive methods to screen diabetic patients with NDRD for renal biopsy in advance would greatly reduce patients' pain and cost and save clinicians' time and energy. Therefore, we established a multimodal model for the preliminary screening of DN and NDRD patients.
The Kidney Disease Outcomes Quality Initiative (KDOQI) guidelines recommend the use of clinical characteristics to distinguish DN and NDRD24. However, in previous studies, insufficient specificity in distinguishing DN and NDRD has been observed when following the KDOQI guidelines9. Clinical characteristics such as DR, DM duration and Hb have helped to identify DN and NDRD and to establish diagnostic models in previous studies25,26. In a broader context, however, disease diagnosis is a multimodal problem driven by histology, clinical data, etc. Some studies have found that DR is closely related to the diagnosis of DN10,27, but no study has yet diagnosed DN based on fundus images, nor has the relationship between DR-related lesions and DN been explored. In this study, DN and NDRD were diagnosed based on fundus images, lesion segmentation images and clinical data, with the features of fundus images extracted by deep learning networks, without requiring ophthalmologists to interpret the fundus images. Our method thus avoids the subjectivity of ophthalmologists' readings and the diagnostic differences between doctors. The results show that this model performs slightly better on the validation set than previous studies for both diagnosed DN and NDRD patients (including NDRD and NDRD combined with DN patients)10,28, with AUCs of 0.97 vs 0.95 vs 0.93. For the external validation set, our model performance is significantly better than that of the above two models10,28.
The generalizability of a model is an important factor affecting its clinical application29,30. Considering the differences in instruments and fundus images across centers, we collected fundus images and clinical data from five hospitals to verify the generalizability of the model. The experimental results confirmed that our model has good generalizability and demonstrates superior performance on multicenter datasets from five hospitals. This may be related to our standardized preprocessing of the images, which improves the visibility of details and content in the input image31,32. Furthermore, considering the differences in photography techniques among ophthalmologists with different levels of work experience, especially novice ophthalmologists, it is not easy to ensure that all fundus photos are captured at standard angles. Therefore, we included a dataset of fundus images taken at different angles to verify the applicability of our model, and our model was tested with this dataset. The above two validation results indicate that our model performs well with fundus images from different centers and at different angles, which greatly reduces the limitations of our model in clinical use and is conducive to its widespread clinical application. A multimodal diagnostic model based on fundus images, clinical data and lesion segmentation images may provide more effective screening and auxiliary diagnosis for a wider range of DN and NDRD populations.
Of note, it is clinically reasonable for our Trans-MUF model to utilize the multimodal data of retinal images, lesion segmentations and clinical factors. The usage of multimodal data is consistent with the clinical diagnostic process, making our AI system both reasonable and clinically interpretable. Moreover, it is cost-effective to use such multimodal data in practice, because retinal imaging (for fundus images) and blood testing (for clinical factors) are both basic and routine clinical examinations for CKD patients. In addition, the lesion segmentation maps can be automatically generated by our proposed segmentation model and are thus free of expert annotation. Finally, compared with state-of-the-art multimodal classification methods, our model achieves a substantial performance improvement. For example, compared with DAFT, which uses all three modalities of retinal fundus images, lesion segmentations and clinical factors, our model achieves an approximately 5% improvement, from 93.3% to 98.0%, in terms of AUC. In addition, we conducted a further experiment with an ensemble of VGG-16 and a factor model. Results are shown in Supplementary Table 3. As shown in this table, the performance of the ensemble model remains inferior to our Trans-MUF on all the validation datasets; for instance, the ensemble model achieves 90.4% accuracy versus 96.3% for our model, demonstrating the effectiveness of the multimodal modeling of our Trans-MUF.
When making diagnoses with multimodal clinical data, clinicians and AI systems tend to pay different attention to different modalities. To understand the mechanism of Trans-MUF and minimize black-box effects, we employed various visualization techniques to display the focus of the different modal data. Our system relies more on lesion segmentation images when diagnosing DN than NDRD, which is consistent with the results of a meta-analysis indicating that DR is a high risk factor for DN33. This to some extent explains the superiority of our AI system over nephrologists in DN classification: in clinical practice, clinical characteristics are still the predominant diagnostic tools, whereas lesion localization information together with other fundus image features is much more diagnostically informative. In addition, our visualization heatmaps and AUC plot show that in DN patients, the model focuses on the lesion regions where hard exudates, soft exudates, neovascularization and other lesions are located. These lesions are also common manifestations of DR34. The visualization maps for NDRD are more concentrated and focus more on the retinal vessels. The probable reason is that DN and DR are essentially microvascular complications of T2DM, and vascular morphology is an important basis for distinguishing DN and NDRD1,35. As indicated by previous studies, an understandable AI tool not only increases clinicians' confidence in making or excluding a diagnosis but also provides educational feedback that benefits nonexperts such as nonophthalmologist clinicians or general nephrologists36,37. Visualization maps allow the clinicians in our studies to focus on the AI prediction information, which can then help guide their clinical decision-making.
Our study has some limitations. Although this was a multicenter study, the sample size was relatively small, and data from multiple ethnicities were not collected, in contrast to other published models for the detection of systemic diseases from ocular images. In addition, although we successfully identified DN and NDRD, our model did not reach an accuracy above 95%, and it remains important to establish a multimodal model based on fundus images to predict patient prognosis. We believe that larger follow-up datasets from different populations may improve the validity and generalizability of our model and enable greater advances, such as the elucidation of patient efficacy and outcomes.
In conclusion, the study has shown that it is possible to screen and identify DN and NDRD using models developed by deep learning based on multimodal data. More importantly, our model achieved superior performance with a multicenter validation set. In addition, we found that DR-related lesions also contain key diagnostic information, thus providing a new direction for the study of the pathophysiology of DN and creating new opportunities for disease diagnosis.
Methods
Datasets
This study was approved by the Ethics Committee of the Chinese PLA General Hospital (S2017-133-01), the People's Liberation Army Army Medical Center Ethics Committee (YY2019-36), the Dalian Medical University Affiliated First Hospital Ethics Committee (PJ-KS-KY-2019-74(X)) and the Ethics Committee of Tongren Hospital (TRECKY2018-027). All patients provided written informed consent. The study was conducted in accordance with the Declaration of Helsinki. This study (Correlation Study Between Clinical Phenotype and Pathology of Type 2 Diabetic Nephropathy) is registered on ClinicalTrials.gov (https://www.clinicaltrials.gov/) under the identifier NCT03865914 (2017-11-30).
All datasets were subject to the same inclusion and exclusion criteria. The inclusion criteria were as follows: 1. T2DM patients aged ≥ 18 years; 2. diagnosed with CKD through renal biopsy; 3. retinal images including the optic disc were collected. The exclusion criteria were as follows: 1. received kidney replacement therapy such as hemodialysis, peritoneal dialysis, or kidney transplantation; 2. pregnancy or malignancy; 3. poor-quality fundus images, such as a massive hemorrhage obscuring the optic disc, blurred images, or large dark shadows; 4. missing important clinical information, such as patient demographics and blood markers. Of note, there are only three possible scenarios for patients, i.e., DN, NDRD, or a combination of DN and NDRD, which is categorized as NDRD in our study.
According to the inclusion and exclusion criteria, 934 fundus images were gathered retrospectively as the Retina-DKD dataset from 246 T2DM patients with CKD who were admitted to the Department of Nephrology at the First Medical Centre of Chinese PLA General Hospital from May 2016 to April 2020 for kidney puncture. Notably, retinal imaging was conducted on both eyes of each patient, with an average of 3.8 images per case. Patients were then divided into training and validation sets at a ratio of 4:1, with each patient assigned exclusively to either the training or the validation set. The flowchart for inclusion and exclusion is shown in Supplementary Fig. 1. Moreover, to validate the clinical practicality of our model, we prospectively collected 136 fundus images of 41 patients with diabetes and CKD at the First Medical Centre of Chinese PLA General Hospital from May 2020 to December 2020 as a prospective validation dataset.
To study the generalizability of our model, the algorithm was applied to an external validation dataset. This dataset consists of 55 retinal fundus images from 28 patients at five other large hospitals in China collected between September 2022 and March 2023, including Daping Hospital in Chongqing City (n = 9 patients), The First Affiliated Hospital of Dalian Medical University in Dalian City (n = 4 patients), Beijing Tongren Hospital (n = 10 patients), The Third Medical Centre of Chinese PLA General Hospital (n = 3 patients) and The Fourth Medical Centre of Chinese PLA General Hospital (n = 2 patients), the latter two both in Beijing City. Importantly, 42 nonstandard-angle fundus images from 16 T2DM patients with CKD collected from the Department of Nephrology at the First Medical Centre of Chinese PLA General Hospital were also used to validate the performance of the model with images taken at different angles (Supplementary Fig. 3).
Clinical characteristics collection and statistical analysis
Additionally, the demographic and clinical data of all patients were collected for the above three datasets. We collected baseline characteristics and clinical parameters, including sex, age, body mass index (BMI), blood pressure (BP), and medical history of diabetes mellitus (DM). The obtained laboratory data included hemoglobin (Hb), glycated hemoglobin (HbA1c), serum creatinine (SCr), albumin (ALB), estimated glomerular filtration rate (eGFR), cholesterol, fibrinogen, triglyceride, blood uric acid (BUA), 24-hr proteinuria, hematuria and the patients' type of kidney disease (DN or NDRD). Quantitative data are expressed as the mean ± standard deviation. Qualitative data are expressed as percentages.
Then, we used univariate logistic regression to explore the factors associated with the diagnosis of DN and NDRD. Age and serum creatinine were not included in the univariate regression because they were used in calculating eGFR. As shown in Supplementary Table 4, five indicators for the diagnosis of DN and NDRD, i.e., DM, SBP, eGFR, Hb, and HbA1c, were statistically significant, with p values < 0.05. Therefore, we selected these five clinical variables for building the multimodal datasets. Univariate logistic regression was performed using SPSS 25.0 software.
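For readers who wish to reproduce this screening step outside SPSS, a minimal Python sketch is given below; the DataFrame layout, column names and significance threshold are illustrative assumptions, not the study's actual data schema.

```python
# A minimal sketch of the univariate screening, assuming a pandas DataFrame
# `df` with one row per patient, a binary `label` column (0 = DN, 1 = NDRD)
# and one column per candidate factor; column names are illustrative.
import pandas as pd
import statsmodels.api as sm

def univariate_screen(df: pd.DataFrame, factors, alpha: float = 0.05):
    """Fit one logistic regression per factor; keep factors with p < alpha."""
    selected = []
    for f in factors:
        X = sm.add_constant(df[[f]])               # intercept + one predictor
        fit = sm.Logit(df["label"], X).fit(disp=0)
        if fit.pvalues[f] < alpha:
            selected.append((f, fit.pvalues[f]))
    return selected

# e.g., univariate_screen(df, ["DM", "SBP", "eGFR", "Hb", "HbA1c", "BMI"])
```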
Retinal fundus image collection and annotation
Retinal fundus images from the First Medical Centre of the Chinese PLA General Hospital were collected with 45° nondilated-pupil color fundus cameras (Kowa VX-20 and Nonmyd 7), with at least two images (one for each eye) per patient. Additionally, 42 nonstandard-angle fundus images were collected to test the generalizability of the model. Retinal fundus images from the five other hospitals were captured using a variety of standard fundus cameras, including the Kowa Nonmyd WX, TOPCON TRC-NW8, TOPCON TRC-NW7SF and CANON CR-2.
The fundus lesions were labeled with the LabelMe software by two junior practitioners and reviewed by a senior practitioner with more than 10 years of experience. The labeling process for fundus lesions is shown in Supplementary Fig. 7. A total of 11 different lesion types were labeled, i.e., microvascular, soft exudates, hard exudates, hemorrhages, pigmentation, microaneurysms, retinal detachment, epiretinal membrane, fiber-added membrane, preretinal ocular membrane, and neovascularization.
Definition and criteria for disease diagnosis
The criteria for CKD diagnosis were an eGFR of more than 60 ml/min/1.73 m² with albuminuria, or an eGFR of less than 60 ml/min/1.73 m², confirmed on at least two visits separated by three months38. eGFR was calculated with the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation39, which has been validated in Chinese populations40. In our study, kidney biopsy was nonselectively performed in CKD patients with type 2 diabetes. The kidney tissue was examined by light microscopy, immunofluorescence and electron microscopy using standard procedures. Kidney specimens were stained with hematoxylin-eosin, periodic acid–Schiff and hexosamine-silver, and immunofluorescence and electron microscopy were used to confirm the diagnosis of DN or NDRD. All histopathological features were independently evaluated by three renal pathologists. The diagnosis of DN and NDRD was based on the Renal Pathology Society (RPS) classification standard41. For patients with multiple visits, we used the clinical indicators and retinal fundus images captured when the patient underwent kidney puncture during hospitalization. The labeling process for DN and NDRD is shown in Supplementary Fig. 8.
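For concreteness, the 2009 CKD-EPI creatinine equation referenced above39 can be written compactly as follows, with serum creatinine in mg/dL.

```python
# The 2009 CKD-EPI creatinine equation (Levey et al.); scr is serum
# creatinine in mg/dL, and the result is in ml/min/1.73 m^2.
def ckd_epi_egfr(scr: float, age: int, female: bool, black: bool = False) -> float:
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    egfr = (141
            * min(scr / kappa, 1.0) ** alpha
            * max(scr / kappa, 1.0) ** -1.209
            * 0.993 ** age)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr
```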
Structure of the Trans-MUF system
In our paper, the Trans-MUF system was proposed to predict the probability of DN or NDRD based on the multimodal input of fundus images and clinical factors, with the auxiliary output of segmented lesion maps. As illustrated in Fig. 2, the proposed Trans-MUF system consists of three subnets: the ImgLesion subnet, the factor subnet and the diagnosis subnet. In the ImgLesion subnet, the input fundus image is first processed to generate the optic disk and lesion segmentation maps, which are then incorporated with the fundus image for image feature extraction (of note, the performance analysis on the segmentation proportion in the multi-modal concatenation during training is shown in Fig. 3c). In addition, the clinical factors are fed into the factor subnet to further extract the factor features. Finally, based on the extracted multimodal features, the diagnosis subnet outputs the final diagnosis result. Detailed structures of Trans-MUF are provided in Supplementary Figs. 9–11.
ImgLesion subnet
The ImgLesion subnet was devised to extract diagnosis-relevant features from the image modality, including fundus images, lesion and optic disk segmented maps. Specifically, the ImgLesion subnet consists of five modules, i.e., the segmentation module, window attention mechanism-based (WAM) ResNet module, feature extracting module, transformer-based encoder module and fusion squeeze and excitation (SE) module. Details of the structures are described as follows.
The segmentation module was designed to generate segmented lesion and optic disc/cup maps based on fundus images. Studies have shown that segmentation information can provide positive guidance for models. First, we utilized a patch-based output space adversarial learning (pOSAL) framework42 for segmenting the optic disc/cup regions. In addition, we further devised a U-shape network for segmenting the lesion area from the background in fundus images. The U-shape network is designed on the basis of U-Net18, a commonly used image segmentation method. Specifically, we modified U-Net in a multi-scale feature extraction manner so that retinal lesions of different sizes can be effectively detected43.
The lesion segmentation task is conducted on the pixel-wise annotated part of the internal Retina-DKD set, comprising 529 fundus images with annotations of 17 lesion types. Specifically, we first screened out 22 images of low quality and/or with few lesions with the help of a senior ophthalmologist. The remaining 507 images were then divided into a training set (442 images) and a test set (65 images). Of note, to avoid information leakage from the segmentation task to the final DN and NDRD classification task, the training set for lesion segmentation is a subset of that for the classification task. To simplify the multi-lesion segmentation task, we treated all lesion types as one class and performed binary segmentation, i.e., lesions as the foreground and everything else as the background.
The structure of the lesion segmentation module is shown in Supplementary Fig. 9. In the lesion segmentation module, a U-shaped structure, composed of five down-transition units and four up-transition units, is designed to extract the features for precisely localizing retinal lesions. Specifically, the input fundus image is first fed into a multi-scale feature extraction layer for extracting features of multi-scale lesions. Then, the extracted features are progressively contracted and down-sampled through the five down-transition units, each followed by a max pooling layer with a stride of 2. In this way, the contextual information of the fundus image is captured in the output of the last down-transition unit, namely the contextual features. Subsequently, the contextual features are progressively expanded and up-sampled through the up-transition units. Note that skip connections are adopted between each up-transition unit and its corresponding down-transition unit in order to provide boundary information during the up-sampling process. The outputs of the last down-transition unit and each up-transition unit are further processed by convolution layers to generate the multi-scale intermediate segmentations. Assuming that \({\hat{\mathbf{S}}}_{i}\) is the segmentation result at the i-th scale, the final segmentation \(\hat{\mathbf{S}}\) is calculated as follows:

$$\hat{\mathbf{S}}=\sum_{i=1}^{3}\eta_i\,\mathrm{UP}\left(\hat{\mathbf{S}}_i,t_i\right).$$

In the above equation, \(\{\eta_i\}_{i=1}^{3}\) are the hyper-parameters balancing the intermediate segmentations at different scales, and UP(·, t) is the t-time upscale operation. During the training stage, each segmented lesion map \({\hat{\mathbf{S}}}_{i}\) is supervised by its corresponding ground-truth lesion map. To optimize the parameters of the network, we utilized two commonly used loss functions, i.e., Dice loss44 and Focal loss45.
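A minimal PyTorch sketch of this multi-scale fusion is given below; the specific η values and the assumption that the three intermediate predictions sit at full, half and quarter resolution are illustrative, not values taken from our experiments.

```python
# A minimal sketch of the multi-scale segmentation fusion, assuming three
# intermediate logit maps s1, s2, s3 at full, 1/2 and 1/4 resolution; the
# eta weights here are illustrative placeholders, not the study's values.
import torch
import torch.nn.functional as F

def fuse_multiscale(s1, s2, s3, etas=(0.6, 0.3, 0.1)):
    """Upsample the coarser predictions to full resolution and blend them."""
    target = s1.shape[-2:]                         # (H, W) of the finest scale
    s2_up = F.interpolate(s2, size=target, mode="bilinear", align_corners=False)
    s3_up = F.interpolate(s3, size=target, mode="bilinear", align_corners=False)
    return etas[0] * s1 + etas[1] * s2_up + etas[2] * s3_up
```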
Based on the classic ResNet46 structure, we proposed a modified version named WAM ResNet, which embeds a WAM block designed to increase the receptive field over the input image. In the WAM ResNet, the WAM block is cascaded four times for dimension alignment. Specifically, in the WAM block, we replaced the traditional single convolutional layer with multiple parallel convolutional layers, whose outputs are weighted by self-learned weight coefficients \(w_j\) via the following formulation:

$$\mathbf{X}=\sum_{j} w_j \mathbf{X}_j. \quad (2)$$

In Equation 2, \(\mathbf{X}_j\) represents the output of the j-th convolutional layer, while \(\mathbf{X}\) denotes the final output. Notably, ahead of the WAM block, a convolution layer is applied for size matching. The performance analysis on the WAM block number during training is shown in Fig. 3c, and the detailed structure of the WAM ResNet module is provided in Supplementary Fig. 10.
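The WAM block can be sketched as follows; the branch count, kernel sizes and the softmax normalization of the learned weights are illustrative assumptions rather than the exact configuration.

```python
# A sketch of a WAM-style block: parallel convolutions combined with
# self-learned weights w_j as in Eq. (2). Branch kernel sizes and the
# softmax over w are assumptions for illustration.
import torch
import torch.nn as nn

class WAMBlock(nn.Module):
    def __init__(self, channels: int, kernel_sizes=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2) for k in kernel_sizes
        )
        self.w = nn.Parameter(torch.ones(len(kernel_sizes)))  # one weight per branch

    def forward(self, x):
        weights = torch.softmax(self.w, dim=0)        # normalized branch weights
        outs = [branch(x) for branch in self.branches]
        return sum(w * o for w, o in zip(weights, outs))
```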
In addition to extracting local image features with convolutional layers, we further adopted a transformer-based method to obtain global information from the fundus images. Specifically, based on the vision transformer (ViT) model13, we proposed a transformer-based encoder module in which two transformer blocks are utilized to extract global features. Unlike the traditional ViT structure, the classification head is removed in our module so that the extracted global image features can be further fused with the local image features extracted in the WAM ResNet module. More details of the transformer-based encoder module can be found in Supplementary Fig. 11.
To accelerate the convergence of our system with limited training data, we added a feature extraction module before the transformer-based encoder module. In this module, we utilized the WAM block, working in a residual manner, to extract local image features from the input images. This helps the transformer blocks to fully recognize local image information. The fusion SE module was proposed to merge the local and global features extracted from the convolution- and transformer-based modules. Specifically, we leveraged the SE block47 to recalibrate the importance of each channel of the input features by enhancing the representation of channels with more informative features while suppressing less important ones.
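A sketch of this SE-based fusion is shown below; it assumes the global transformer features have already been reshaped onto the same spatial grid as the CNN features, and all channel sizes are placeholders.

```python
# A sketch of fusing local (CNN) and global (transformer) features with a
# squeeze-and-excitation block47; `channels` is the channel count after
# concatenation, and the reduction ratio is a common default.
import torch
import torch.nn as nn

class SEFusion(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, local_feat, global_feat):
        x = torch.cat([local_feat, global_feat], dim=1)   # (N, C, H, W)
        scale = self.fc(x.mean(dim=(2, 3)))               # squeeze: GAP over H, W
        return x * scale.unsqueeze(-1).unsqueeze(-1)      # excite: rescale channels
```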
Factor subnet
The purpose of the factor subnet is to generate high-level semantic feature vectors that contain the clinical information of patients. First, we identified the five indicators that are most relevant to DN and NDRD based on logistic regression and correlation analysis. Then, we categorized these five indicators into two groups, i.e., invasive and noninvasive groups, based on the varying levels of difficulty in obtaining them. Notably, for each risk factor, we performed data normalization before the feature extraction process. The two types of risk factors are processed separately by two different networks, whose outputs are then concatenated together.
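A minimal sketch of such a two-branch factor encoder is given below; the exact grouping of the five factors and the layer sizes are our assumptions for illustration.

```python
# A sketch of the factor subnet: two small MLPs encode the invasive and
# noninvasive factor groups separately, and their outputs are concatenated.
# The grouping (3 invasive, 2 noninvasive) and hidden size are assumptions.
import torch
import torch.nn as nn

class FactorSubnet(nn.Module):
    def __init__(self, n_invasive: int = 3, n_noninvasive: int = 2, hidden: int = 32):
        super().__init__()
        self.enc_inv = nn.Sequential(nn.Linear(n_invasive, hidden), nn.ReLU())
        self.enc_non = nn.Sequential(nn.Linear(n_noninvasive, hidden), nn.ReLU())

    def forward(self, invasive, noninvasive):
        # both inputs are assumed z-score normalized per factor beforehand
        return torch.cat([self.enc_inv(invasive), self.enc_non(noninvasive)], dim=1)
```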
Modality weight quantification
An attention mechanism-based modality weight quantification method is proposed for measuring the relative importance of the three input modalities, i.e., fundus images, lesion and optic disk/cup segmentation maps, and clinical factors, in the task of disease classification. First, the test accuracy of Trans-MUF is set as the baseline value. Next, we ablated each modality by setting the related part of the input to zero and quantified the modality importance by the resulting variation in model performance. To better assess the impact of the different modalities, we utilized a softmax function to normalize the results.
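The procedure can be summarized with the following sketch; `evaluate` and its `zero_out` argument are hypothetical helpers standing in for the evaluation pipeline.

```python
# A sketch of occlusion-based modality weighting: zero one modality at a
# time, measure the accuracy drop, and softmax-normalize the drops.
# `evaluate(model, loader, zero_out=...)` is a hypothetical helper.
import numpy as np

MODALITIES = ("image", "lesion_map", "factors")

def modality_weights(model, loader, evaluate):
    baseline = evaluate(model, loader)             # unablated test accuracy
    drops = np.array([
        baseline - evaluate(model, loader, zero_out=m) for m in MODALITIES
    ])
    weights = np.exp(drops) / np.exp(drops).sum()  # softmax normalization
    return dict(zip(MODALITIES, weights))
```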
Quantitative network visualization analysis based on fundus images
Quantitative network visualization was conducted to assess the importance of different retinal lesions within the fundus images in the task of DN and NDRD classification. We designed our visualization paradigm on the basis of a commonly used network visualization method, i.e., Grad-CAM20, for visualizing the Trans-MUF system and quantifying the relative importance of different lesions in disease classification. Specifically, we used the output layer of the WAM ResNet for the visualization of fundus images. We first computed the gradient of the predicted disease probability score \(y^c\) of class \(c\) with respect to the feature activation map \(A^k\), where \(A\) represents the output of the WAM ResNet through the activation layer and \(k\) denotes the \(k\)-th channel of \(A\). Then, we conducted global average pooling over each gradient map to obtain the neuron importance weights \(\alpha_k^c\) as follows:

$$\alpha_k^c=\frac{1}{H_A W_A}\sum_{i=1}^{H_A}\sum_{j=1}^{W_A}\frac{\partial y^c}{\partial A_{ij}^k},$$

where \(H_A\) and \(W_A\) represent the height and width of the feature activation map \(A^k\), respectively. Finally, the calculated \(\alpha_k^c\) is multiplied channel-wise with \(A\), and the result is further processed by the rectified linear unit (ReLU) activation layer48 to obtain the visualization map.
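A compact sketch of this computation is given below; in practice, the activations and gradients would be captured with forward and backward hooks on the chosen WAM ResNet layer.

```python
# A minimal Grad-CAM sketch: the alpha weights are the global-average-pooled
# gradients, and the map is the ReLU of the alpha-weighted activation sum.
import torch
import torch.nn.functional as F

def grad_cam(acts: torch.Tensor, grads: torch.Tensor, out_size) -> torch.Tensor:
    # acts, grads: (1, K, H_A, W_A) activations and d(y^c)/dA gradients
    alpha = grads.mean(dim=(2, 3), keepdim=True)   # GAP over spatial positions
    cam = F.relu((alpha * acts).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=out_size, mode="bilinear", align_corners=False)
    return cam / (cam.max() + 1e-8)                # normalize to [0, 1]
```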
Quantitative attribution of risk factors
To quantify the decision importance of the different risk factors, we conducted occlusion experiments on the factors. Specifically, we set each risk factor to zero in turn and fed the occluded input to the model to obtain the classification accuracy, which was then subtracted from the original accuracy to quantify the effect of that risk factor. Finally, we utilized a softmax function to normalize the calculated effects of all risk factors.
Performance evaluation for disease classification
The performance of DKD classification was evaluated in terms of accuracy, sensitivity, specificity, F1-score and area under the receiver operating characteristic (ROC) curve (AUC). Accuracy is the ratio of true predictions over all predictions. Sensitivity measures the percentage of positive samples that are correctly identified. Specificity measures the percentage of negative samples that are correctly identified. F1-score is a trade-off metric that balances the classification accuracy of both positive and negative samples. The AUC, defined as the area under the ROC curve, comprehensively measures the classification effect across the various threshold settings. They are defined as follows:

$$\mathrm{Accuracy}=\frac{TP+TN}{N},\qquad \mathrm{Sensitivity}=\frac{TP}{TP+FN},$$
$$\mathrm{Specificity}=\frac{TN}{TN+FP},\qquad \mathrm{F1\text{-}score}=\frac{2TP}{2TP+FP+FN}.$$

Here, N is the total number of samples. TP, TN, FP and FN stand for the numbers of samples with true-positive, true-negative, false-positive and false-negative classifications, respectively. Note that for all 5 metrics, higher scores indicate better classification performance.
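For concreteness, these metrics can be computed from the confusion counts as in the short sketch below, with NDRD taken as the positive class.

```python
# A sketch of the five evaluation metrics, assuming binary labels y and
# predicted probabilities p, with NDRD as the positive class.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def classification_metrics(y, p, thresh: float = 0.5):
    pred = (np.asarray(p) >= thresh).astype(int)
    tn, fp, fn, tp = confusion_matrix(y, pred, labels=[0, 1]).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "auc": roc_auc_score(y, p),     # threshold-free
    }
```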
Loss functions and optimization methods
Once the system is established, it needs to be optimized to obtain the optimal solution. We adopted the following weighted cross-entropy loss function for DN and NDRD classification:

$$\mathcal{L}=-\frac{1}{N}\sum_{i=1}^{N}\sum_{c}\alpha_{ic}\,y_{ic}\log\left(p_{ic}\right)+\lambda\|\omega\|_2^2.$$

In the given equation, \(\omega\) represents the model parameters, and \(\lambda\) is the L2 regularization coefficient; in this context, we chose \(\lambda = 0.005\). \(y_{ic}\) takes the value of 1 when the label of sample \(i\) matches class \(c\), \(p_{ic}\) is the predicted probability that sample \(i\) belongs to class \(c\), and \(\alpha_{ic}\) is a class-dependent weight coefficient. The purpose of the weight coefficients is to bias the optimization toward correct prediction of NDRD, thereby increasing the model's sensitivity for NDRD.
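In PyTorch, an objective of this form can be sketched as below; the class-weight values are illustrative, and the L2 term corresponds to the optimizer's weight decay with λ = 0.005.

```python
# A sketch of the weighted cross-entropy objective: per-class weights favor
# NDRD, and weight_decay supplies the lambda * ||w||^2 term. The weight
# values and classifier head are illustrative assumptions.
import torch
import torch.nn as nn

model = nn.Linear(64, 2)                          # placeholder classifier head
class_weights = torch.tensor([1.0, 2.0])          # [DN, NDRD]: up-weight NDRD
criterion = nn.CrossEntropyLoss(weight=class_weights)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.005)

logits = model(torch.randn(8, 64))                # dummy batch of 8 samples
loss = criterion(logits, torch.randint(0, 2, (8,)))
loss.backward()
optimizer.step()
```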
During model training, the Adam optimization algorithm49 is utilized for gradient descent. To accelerate model convergence and prevent it from getting trapped in local optima, the learning rate η is updated using the cosine annealing restart algorithm, as shown in the following formula:

$$\eta_t=\eta_{\min}^{i}+\frac{1}{2}\left(\eta_{\max}^{i}-\eta_{\min}^{i}\right)\left(1+\cos\left(\frac{T_e}{T_i}\pi\right)\right). \quad (8)$$

In Equation 8, \(\eta_t\) represents the current learning rate, \(\eta_{\max}^{i}\) and \(\eta_{\min}^{i}\) denote the range of the learning rate, \(T_e\) denotes the current epoch within the cycle, and \(T_i\) represents the length of the current cycle. After each completion of a cycle of \(T_e\), the cycle length is updated as specified by the following equation:

$$T_{i+1}=T_{mult}\times T_{i}.$$

When training our system, \(T_0\) and \(T_{mult}\) are set to 10 and 2, respectively. Overall, the choice of \(T_0\) and \(T_{mult}\) is partly based on empirical findings from ref. 50, with the final values further derived from a substantial number of experiments and application results. This strategy allows the learning rate to oscillate and decay, thereby helping to escape local optima and accelerate convergence.
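This schedule matches PyTorch's built-in warm-restart scheduler, so the training loop can be sketched as follows; the base learning rate is an illustrative assumption.

```python
# A sketch of cosine annealing with warm restarts using the stated
# T_0 = 10 and T_mult = 2; the base learning rate is an assumption.
import torch

model = torch.nn.Linear(64, 2)                    # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.005)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=10, T_mult=2)

for epoch in range(70):                           # cycles of 10, 20, 40 epochs
    # ... run one training epoch ...
    scheduler.step()                              # advance the cosine schedule
```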
Data availability
The involved Retina-DKD dataset, prospective validation dataset, multi-center validation dataset and nonstandard validation dataset, including clinical data, fundus images, lesion segmentation annotations and histological data, are openly available via Dropbox.
Code availability
Source code for our paper is available on GitHub in the Retina-DKD repository.
References
Thomas, M. C. et al. Diabetic kidney disease. Nat. Rev. Dis. Prim. 1, 1–20 (2015).
Thomas, M. C., Weekes, A. J., Broadley, O. J., Cooper, M. E. & Mathew, T. H. The burden of chronic kidney disease in Australian patients with type 2 diabetes (the NEFRON study). Med. J. Aust. 185, 140–144 (2006).
Dwyer, J. P. et al. Renal dysfunction in the presence of normoalbuminuria in type 2 diabetes: results from the DEMAND study. Cardiorenal Med. 2, 1–10 (2012).
Iwai, T. et al. Diabetes mellitus as a cause or comorbidity of chronic kidney disease and its outcomes: the Gonryo study. Clin. Exp. Nephrol. 22, 328–336 (2018).
Anders, H.-J., Davis, J. M. & Thurau, K. Nephron protection in diabetic kidney disease. N. Engl. J. Med. 375, 2096–2098 (2016).
Suarez, M. L. G., Thomas, D. B., Barisoni, L. & Fornoni, A. Diabetic nephropathy: is it time yet for routine kidney biopsy? World J. Diab. 4, 245 (2013).
Fiorentino, M. et al. Renal biopsy in patients with diabetes: a pooled meta-analysis of 48 studies. Nephrol. Dialysis Transplant. 32, 97–110 (2017).
Nelson, R. G. & Tuttle, K. R. The new KDOQI clinical practice guidelines and clinical practice recommendations for diabetes and CKD. Blood Purif. 25, 112–114 (2007).
Liu, X.-M. et al. Validation of the 2007 Kidney Disease Outcomes Quality Initiative clinical practice guideline for the diagnosis of diabetic nephropathy and nondiabetic renal disease in Chinese patients. Diab. Res. Clin. Pract. 147, 81–86 (2019).
Zhang, W. et al. New diagnostic model for the differentiation of diabetic nephropathy from non-diabetic nephropathy in Chinese patients. Front. Endocrinol. 13, 913021 (2022).
Lewis, S. M. et al. Spatial omics and multiplexed imaging to explore cancer biology. Nat. Methods 18, 997–1012 (2021).
Zhang, K. et al. Deep-learning models for the detection and incidence prediction of chronic kidney disease and type 2 diabetes from retinal fundus images. Nat. Biomed. Eng. 5, 533–545 (2021).
Dosovitskiy, A. et al. An image is worth 16×16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020).
Huang, G., Liu, Z., Van Der Maaten, L. & Weinberger, K. Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4700–4708 (2017).
Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25 (2012).
Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
Wolf, T. N., Pölsterl, S. & Wachinger, C. DAFT: a universal module to interweave tabular data and 3D images in CNNs. NeuroImage 260, 119505 (2022).
Ronneberger, O., Fischer, P. & Brox, T. U-Net: convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, 234–241 (Springer, 2015).
Jin, W., Li, X. & Hamarneh, G. Evaluating explainable AI on a multi-modal medical imaging task: can existing algorithms fulfill clinical requirements? In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, 11945–11953 (2022).
Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, 618–626 (2017).
Tong, X. et al. Insights into the role of renal biopsy in patients with T2DM: a literature review of global renal biopsy results. Diab. Ther. 11, 1983–1999 (2020).
Olsen, S. & Mogensen, C. How often is NIDDM complicated with non-diabetic renal disease? An analysis of renal biopsies and the literature. Diabetologia 39, 1638–1645 (1996).
Mak, S. et al. Clinical predictors of non-diabetic renal disease in patients with non-insulin dependent diabetes mellitus. Nephrol. Dialysis Transplant. 12, 2588–2591 (1997).
Levin, A. & Rocco, M. KDOQI clinical practice guidelines and clinical practice recommendations for diabetes and chronic kidney disease. Am. J. Kidney Dis. 49, S10–S179 (2007).
Zhou, J. et al. A differential diagnostic model of diabetic nephropathy and non-diabetic renal diseases. Nephrol. Dialysis Transplant. 23, 1940–1945 (2008).
Jiang, S. et al. Novel model predicts diabetic nephropathy in type 2 diabetes. Am. J. Nephrol. 51, 130–138 (2020).
Yin, L., Zhang, D., Ren, Q., Su, X. & Sun, Z. Prevalence and risk factors of diabetic retinopathy in diabetic patients: a community-based cross-sectional study. Medicine 99, e19236 (2020).
Yang, Z., Feng, L., Huang, Y. & Xia, N. A differential diagnosis model for diabetic nephropathy and non-diabetic renal disease in patients with type 2 diabetes complicated with chronic kidney disease. Diabetes Metab. Syndr. Obes. 1963–1972 (2019).
Sendra-Balcells, C. et al. Domain generalization in deep learning for contrast-enhanced imaging. Comput. Biol. Med. 149, 106052 (2022).
Jin, P., Lu, L., Tang, Y. & Karniadakis, G. E. Quantifying the generalization error in deep learning in terms of data distribution and neural network smoothness. Neural Netw. 130, 85–99 (2020).
Tsiknakis, N. et al. Deep learning for diabetic retinopathy detection and classification based on fundus images: a review. Comput. Biol. Med. 135, 104599 (2021).
Lecca, M. & Poiesi, F. Performance comparison of image enhancers with and without deep learning. JOSA A 39, 610–620 (2022).
Jiang, W. et al. Establishment and validation of a risk prediction model for early diabetic kidney disease based on a systematic review and meta-analysis of 20 cohorts. Diab. Care 43, 925–933 (2020).
Hammes, H., Lemmen, K. D. & Bertram, B. Diabetic retinopathy and maculopathy. Exp. Clin. Endocrinol. Diab. 122, 387–390 (2014).
Cole, J. B. & Florez, J. C. Genetics of diabetes mellitus and diabetes complications. Nat. Rev. Nephrol. 16, 377–390 (2020).
Lee, H. et al. An explainable deep-learning algorithm for the detection of acute intracranial haemorrhage from small datasets. Nat. Biomed. Eng. 3, 173–182 (2019).
Qian, X. et al. Prospective assessment of breast cancer risk from multimodal multiview ultrasound images via clinically applicable deep learning. Nat. Biomed. Eng. 5, 522–532 (2021).
de Boer, I. H. et al. KDIGO 2020 clinical practice guideline for diabetes management in chronic kidney disease. Kidney Int. 98, S1–S115 (2020).
Levey, A. S. et al. A new equation to estimate glomerular filtration rate. Ann. Intern. Med. 150, 604–612 (2009).
Bikbov, B. et al. Global, regional, and national burden of chronic kidney disease, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet 395, 709–733 (2020).
Tervaert, T. W. C. et al. Pathologic classification of diabetic nephropathy. J. Am. Soc. Nephrol. 21, 556–563 (2010).
Wang, S., Yu, L., Yang, X., Fu, C.-W. & Heng, P.-A. Patch-based output space adversarial learning for joint optic disc and cup segmentation. IEEE Trans. Med. Imaging 38, 2485–2495 (2019).
Wang, X. et al. Joint learning of multi-level tasks for diabetic retinopathy grading on low-resolution fundus images. IEEE J. Biomed. Health Inform. 26, 2216–2227 (2021).
Fidon, L. et al. Generalised Wasserstein Dice score for imbalanced multi-class segmentation using holistic convolutional networks. In Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: Third International Workshop, BrainLes 2017, Held in Conjunction with MICCAI 2017, Quebec City, QC, Canada, September 14, 2017, Revised Selected Papers 3, 64–76 (Springer, 2018).
Yeung, M., Sala, E., Schönlieb, C.-B. & Rundo, L. Unified focal loss: generalising Dice and cross-entropy-based losses to handle class imbalanced medical image segmentation. Computerized Med. Imaging Graph. 95, 102026 (2022).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778 (2016).
Hu, J., Shen, L. & Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7132–7141 (2018).
Agarap, A. F. Deep learning using rectified linear units (ReLU). arXiv preprint arXiv:1803.08375 (2018).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. In International Conference on Learning Representations (ICLR) (2015).
Loshchilov, I. & Hutter, F. SGDR: stochastic gradient descent with warm restarts. In Proceedings of the 5th International Conference on Learning Representations, 1–16 (2017).
Acknowledgements
This work was supported by the Capital's Funds for Health Improvement and Research (CFH 2024-1-5021), the Science and Technology Project of Beijing, China (No. Z221100007422121), the Natural Science Foundation of China (Nos. 32141005, 32000530 and 32371035), NSFC Grants 62250001 and 62231002, and Beijing Natural Science Foundation Grants L223021, L232122, 7244305 and L222133. We also thank Jijun Li (Department of Nephrology, Fourth Medical Center of Chinese PLA General Hospital, Beijing, China), Fangyuan Yang (Beijing Key Laboratory of Diabetes Research and Care, Beijing Diabetes Institute, Beijing Tongren Hospital, Capital Medical University), Jie Yang (Department of Nephrology, Daping Hospital, Army Medical University), Fenghua Yao (Department of Nephrology, Chinese PLA General Hospital), Ming Fang (The First Affiliated Hospital of Dalian Medical University), Lin Zhang (Dept. of Endocrinology, Dept. of Medical Record, Beijing Tongren Hospital, Capital Medical University), Jie Zhao (Senior Department of Ophthalmology, the Third Medical Center of PLA General Hospital), Yan Yu (The First Affiliated Hospital of Dalian Medical University), Xiang Ma (The First Affiliated Hospital of Dalian Medical University), and Jingying Zhou (The First Affiliated Hospital of Dalian Medical University) for their efforts in data collection.
Author information
Authors and Affiliations
Contributions
Conceptualization, XM.C., Z.D. and S.P.; methodology, X.W., S.P., T.W., Z.D., XM.C., M.X., XN.C., L.J., S.L. and Y.F.; formal analysis, Z.D., X.W., S.P., T.W. and XN.C.; investigation, Z.D., X.W., S.P., T.W. and S.J.; data curation, Z.D., S.P., S.J., X.C., Y.L., Z.W., XN.C., Q.W., P.C., G.C., L.Z., Y.W., J.W., L.T., J.Z., J.Y., Y.H., H.L., Y.G., Z.L., X.Y. and Y.Z.; writing (original draft preparation), S.P., Z.D., X.W., T.W., S.J. and XN.C.; writing (review and editing), XM.C., M.X. and L.W.; visualization, Z.D., X.W., S.P., T.W. and X.C.; supervision, XM.C., M.X. and L.W.; project administration, XM.C. and M.X.; funding acquisition, XM.C., M.X. and L.W. All authors have read and agreed to the published version of the manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Dong, Z., Wang, X., Pan, S. et al. A multimodal transformer system for noninvasive diabetic nephropathy diagnosis via retinal imaging. npj Digit. Med. 8, 50 (2025). https://doi.org/10.1038/s41746-024-01393-1