Journal Pre-Proof: Artificial Intelligence in Medicine
PII: S0933-3657(20)31266-5
DOI: https://doi.org/10.1016/j.artmed.2020.102001
Reference: ARTMED 102001
Please cite this article as: Md. Kamrul Hasan, Md. Ashraful Alam, Md. Toufick E Elahi,
Shidhartho Roy, Robert Martí, DRNet: Segmentation and Localization of Optic Disc and
Fovea from Diabetic Retinopathy Image, Artificial Intelligence in Medicine (2020),
doi: https://doi.org/10.1016/j.artmed.2020.102001
This is a PDF file of an article that has undergone enhancements after acceptance, such as
the addition of a cover page and metadata, and formatting for readability, but it is not yet the
definitive version of record. This version will undergo additional copyediting, typesetting and
review before it is published in its final form, but we are providing this version to give early
visibility of the article. Please note that, during the production process, errors may be
discovered which could affect the content, and all legal disclaimers that apply to the journal
pertain.
Abstract
In modern ophthalmology, automated Computer-aided Screening Tools (CSTs) are crucial non-intrusive diagnostic methods, where an accurate segmentation of the Optic Disc (OD) and localization of the OD and Fovea centers are substantial integral parts. However, designing such an automated tool remains challenging due to small dataset sizes; inconsistency in the spatial, texture, and shape information of the OD and Fovea; and the presence of different artifacts.
Methods
This article proposes an end-to-end encoder-decoder network, named DRNet, for the segmentation and localization of the OD and Fovea centers. In our DRNet, we propose a skip connection, named residual skip connection, to compensate for the spatial information lost due to pooling in the encoder. Unlike the earlier skip connection in the UNet, the proposed skip connection does not directly concatenate low-level feature maps from the encoder's beginning layers with the corresponding same-scale decoder. We validate DRNet using different publicly available datasets, such as IDRiD, RIMONE, DRISHTI-GS, and DRIVE for OD segmentation; IDRiD and HRF for OD center localization; and IDRiD for Fovea center localization.
Results
For OD segmentation, the proposed DRNet achieves mean Intersection over Union (mIoU) values of 0.845, 0.901, 0.933, and 0.920 for IDRiD, RIMONE, DRISHTI-GS, and DRIVE, respectively. Our OD segmentation result, in terms of mIoU, outperforms the state-of-the-art results for the IDRiD and DRIVE datasets, whereas it outperforms state-of-the-art results concerning mean sensitivity for the RIMONE and DRISHTI-GS datasets. The DRNet localizes the OD center with mean Euclidean Distance (mED) of 20.23 and 13.34 pixels for the IDRiD and HRF datasets, respectively; it outperforms the state-of-the-art by 4.62 pixels for the IDRiD dataset. The DRNet also successfully localizes the Fovea center with mED of 41.87 pixels for the IDRiD dataset, outperforming the state-of-the-art by 1.59 pixels on the same dataset.
Conclusion
As the proposed DRNet exhibits excellent performance even with limited training data and without intermediate intervention, it can be employed to design a better CST system to screen retinal images. Our source codes, trained models, and ground-truth heatmaps for OD and Fovea center localization will be made publicly available upon publication at GitHub¹.
¹ https://github.com/kamruleee51/DRNet_Segmentation_Localization_OD_Fovea
* Corresponding author.
Email addresses: m.k.hasan@eee.kuet.ac.bd (Md. Kamrul Hasan), alam1603001@stud.kuet.ac.bd (Md. Ashraful Alam), toufick1469@gmail.com (Md. Toufick E Elahi), swapno15roy@gmail.com (Shidhartho Roy), robert.marti@udg.edu (Robert Martí)
Affiliation: Department of Electrical and Electronic Engineering, Khulna University of Engineering & Technology, Khulna-9203, Bangladesh.
Preprint submitted to Artificial Intelligence in Medicine, December 10, 2020.

1. Introduction

Diabetic Retinopathy (DR) is a severe complication of diabetes, affecting the vasculature of the eye in roughly one-third of diabetes patients [9, 83]. In the United States (US)
alone, the number of US adults, having positive diabetes, is estimated to increase from 22.3
million (9.1 %) in 2014 to 39.7 million (13.9 %) in 2030, and to 60.6 million (17.9 %) in 2060
[45]. Another statistical study on diabetes, in 2017, predicted that the existing 451.0 million patients with diabetes worldwide would increase to 693.0 million by 2045 [15, 30]. DR, as a microvascular complication of Diabetes Mellitus (DM), appears to be the leading cause of acquired vision impairment worldwide in middle-aged subjects [39]. DR accounts for 4.8 % of blindness cases (37.0 million) worldwide [68]. A pooled analysis of 35 studies, covering 22,896 individuals with diabetes in the US, Australia, Europe, and Asia (between 1980 and 2008), found that the overall prevalence of DR (Type-1 and Type-2 DM) was 34.6 %, with 7.0 % having vision-threatening DR [93]. However, the severity of DR and the possibility of vision loss can be reduced significantly, by approximately 57.0 %, with early diagnosis [26]. For that reason, diabetic patients are strongly advised to undergo retinal screening once or twice a year [40]. The ophthalmologist or expert diagnoses DR via visual assessment of the retinal fundus images, looking for different lesions such as microaneurysms, hemorrhages, soft exudates, and hard exudates. However, the number of ophthalmologists worldwide is insufficient, currently estimated at around 1.0 per 29.0 million people [67].
This scenario is more severe in underdeveloped and developing countries, especially in rural areas, where most of the population is unaware of the importance of screening for an early diagnosis. Moreover, visual assessment of retinal images with the naked eye may introduce discrepancies, as eye lesions and healthy eye tissues may have similar characteristics [35]. Visual assessment by ophthalmologists is also a tedious, time-consuming, subjective, and error-prone task. Currently, CST systems are being developed to alleviate all the above-mentioned limitations, in addition to reducing the workload of ophthalmologists by providing an automated diagnostic tool. In that sense, OD segmentation and the localization of the Fovea and OD centers are integral parts of an automatic CST
system. The segmentation is an essential requirement in the CST systems as it extracts
the Region of Interest (ROI), which can be used to obtain detailed structural information
of various types of retinal lesions. Segmentation is also important to find signs of venous
bleeding, retinal neovascularization, retinal thickening, fluid accumulation, among others.
OD and Fovea are the most important anatomical structures used by ophthalmologists for
the assessment of retinal fundus images. For instance, around the OD, glaucoma severely
damages the optic nerves, which can be primarily detected from the segmented and localized
OD ROI. However, the automation of the OD segmentation and the localization of OD and
Fovea centers is highly challenging for the following reasons:
1. OD and Fovea are not always well visible, as they present low contrast compared to the background (see Fig. 1). Sometimes they are also confounded with other visually similar structures (see Fig. 1).
2. The Fovea shows large variability in terms of shape, structure, and boundary (see Fig. 1). It is usually small compared to the whole image and often imperceptible to the naked eye (see Fig. 1).
3. Different artifacts, such as dust particles, flash, diffused bright areas, lubrication on the camera lens, and reflections, are present in the retinal fundus images.
4. Only small numbers of manually annotated images are currently available, limiting the robustness of supervised learning systems.
Figure 1: Examples of challenging retinal fundus images in the Indian Diabetic Retinopathy Image dataset [59] for automatic segmentation and localization of OD and Fovea. It can be observed that a hard exudate is more visible than the OD (first row, first column) and that the OD is confounded with the background (second row, first column). The Fovea is also imperceptible to the naked eye (second row, second column).
1.2. Related work
Methods for segmentation. The authors of [72] applied iterative morphological operations to produce a region-enhanced retinopathy image. They also employed a 2-step automatic thresholding for obtaining the OD ROI, exploiting the confluence of blood vessels at the OD. Mittapalli and Kande [51] proposed an OD segmentation model based on a region-based active contour model [43], incorporating localized image information from different channel features such as intensity, color, and texture. However, the selection of robust initial contours is often tricky when the OD is similar to the background. A method based on mathematical morphology, the Circular Hough Transform (CHT), and the grow-cut algorithm [86] was proposed by Abdullah et al. [3]. Morphological operators were applied to enhance the OD region and remove the retinal vasculature. Subsequently, the CHT was used to approximate the center of the OD, and a grow-cut algorithm was finally used to segment the OD. A low-pass finite impulse response filter was proposed by Bharkad [13] to suppress the dominance of blood vessels and enhance the OD region. However, it requires selecting proper values of the filter design parameters to obtain a suitable frequency response with an optimized filter order. Finally, the OD region segmentation was performed using grayscale morphological dilation and median filtering. Sarathi et al. [73] proposed adaptive threshold-based region growing for
the OD segmentation, where they removed the vascular structure by precise inpainting of
blood vessels in the disc region. Tan et al. [81] developed a Convolutional Neural Network
(CNN) to segment OD, Fovea, and blood vessels. The authors extracted three channels
from the neighborhood points and forwarded the response to a network of 7-layers for every
point in a fundus image. Finally, the output layer consists of four neurons representing the
background, OD, Fovea, and blood vessels. A deep learning-based network for OD segmen-
tation, called MNet, was introduced by Fu et al. [21]. It includes a multiscale input layer,
U-shape CNN to learn the salient features, and a multi-label loss function. The authors
also proposed transforming the original image into a polar coordinate system to provide the
new representation of the original image and improve segmentation results. Porwal et al.
[58] used intensity profile analysis for OD segmentation with an L0 Gradient Minimization (L0-GM)-based technique [90], where the optimum threshold was chosen by applying Otsu's method. A method based on OD homogenization was proposed by
Naqvi et al. [52]. OD’s contour was achieved by inpainting the major vascular structure
after utilizing a local Laplacian filtering. Finally, the authors used the gradient-independent
active contour for the OD boundary extraction. Raj et al. [61] segmented the OD employing
the structural features such as brightness and contrast around the OD. The Harris corner
detection algorithm was used to extract the key features, called landmark points, followed
by a Voronoi image decomposition. In the end, the authors used Pratt’s circle fitting algo-
rithm for segmenting the OD based on those extracted landmark points. Zabihollahy and
Ukwatta [95] proposed a CNN for segmenting the OD, after localizing it using a
random forest classifier [97]. The saliency-based OD segmentation method, proposed by
Zou et al. [102], includes two stages: the OD localization and the saliency-based segmen-
tation. The authors estimated a region’s saliency by exploiting boundary and connectivity
priors, and the OD segmentation was obtained by thresholding and ellipse fitting. Kumar
et al. [42] proposed a pipeline for OD segmentation, which consists of preprocessing, blood
vessels detection, and finally segmenting the OD. They employed mathematical morphology
and watershed algorithms for preprocessing and segmentation, respectively. A Particle Swarm Optimization (PSO) [85]-based ensemble of deep neural networks was proposed by Zhang and Lim [98] for OD segmentation. The authors used an ensemble segmentation model to
avoid a single network’s bias, where final output masks were obtained from a voting mech-
anism. Abdullah et al. [2] proposed a fuzzy clustering method to extract the OD mask.
The authors chose the first channel of the input image for a preprocessing stage. Subse-
quently, their framework uses an active contour model based on a fuzzy clustering method
to obtain the final result. Lu and Chen [48] used the GrabCut [71] method for generating
pseudo-ground-truth images, which were used to train a modified UNet [70] model. Finally,
they utilized a smaller number of ground-truth images to fine-tune the model. A rough OD
boundary was extracted by Xie et al. [89] based on a UNet framework. Subsequently, the
authors proposed a SUNet and a Viterbi method to jointly obtain the segmentation masks.
Ramani and Shanthamalar [62] used different image preprocessing methods such as image
resizing, binary conversion and masking, erosion, mapping, and Gaussian filtering. Then,
they proposed a region-based pixel density calculation method for OD localization and an
improved CHT with Hough peak value selection using super-pixels from the red channel to
obtain the final OD segmentation.
Methods for localization. Gegundez-Arias et al. [24] localized the Fovea center in two
steps. Firstly, they obtained a pixel within the Fovea region using prior anatomical location
information related to the OD and vascular tree. Secondly, that pixel was used to crop a sub-
image containing the Fovea, and finally, the Fovea center was localized using thresholding
and feature extraction techniques. Wu et al. [88] presented a localization method based on
two-directional models. A global directional model was used to model the main vessels by using two parabolas sharing the same vertex but with different parameters. Then, a local directional
model was applied to characterize the local vessel convergence in the OD. In the end, both
models were integrated to localize the OD center. Kamble et al. [37] localized the OD and Fovea centers from
one-dimensional intensity profile analysis using time and frequency domain information.
OD’s final central landmark was localized using signal peak and valley analysis in both
time and frequency domains. With the previous OD location’s help, the final Fovea central
landmark was estimated by finding a valley point. Araújo et al. [8] proposed a UOLO
method for simultaneous localization and segmentation. They derived the UOLO model from
the well-known YOLO [65] and UNet networks. The authors then obtained the geometric
centers from the segmented masks as the final OD and Fovea positions. Babu et al. [10]
proposed a two-stage framework for the OD and Fovea localization. In the first stage,
they used an object detection framework such as faster-RCNN (FRCNN) for obtaining the
object bounding box [66]. This was used to select the ROI and, in a second stage, the
authors applied a two-layer regression CNN with batch normalization to finally estimate
the OD and Fovea coordinates. Li et al. [44] detected the OD and Fovea centers using a
multi-stage region-based CNN. Firstly, they employed a standard FRCNN and SVM [69]
for the segmentation. Secondly, the authors proposed a relative position information-based
FRCNN for the detection. The localization of Fovea and OD centers was proposed by Al-
Bander et al. [5] employing a deep multiscale sequential CNN. The authors converted and
rescaled all the images to grayscale and 256 × 256 pixels before feeding them to the network.
They then applied a contrast-limited adaptive histogram equalization technique to reduce
uneven illumination and enhance the images’ brightness. Lin et al. [46] implemented a
localization framework, where they incorporated pre-processing and main vessel extraction
steps based on morphological operations. Finally, a parabolic fitting algorithm was used to fit the main vessels by a least-squares method. Joshi et al. [36] localized the OD
from the segmented binary mask. The authors added motion to the fundus image for
extracting the bright regions, which were then used for OD segmentation by thresholding.
Finally, those binary masks were used to obtain the center of the disc. A low-rank semi-
supervised learning algorithm was presented by Zhou et al. [99] for automated disc and
Fovea localization. The authors used a low-rank representation for preserving the local
and global structures of the original fundus image. A template-supervised network, called
TNet, was proposed by Song et al. [78] to extract task-specific salient features. Firstly,
the authors obtained the templates from pixel-level annotations by down-sampling binary
masks of recognition targets according to specific tasks. Then, the encoding network was
trained under the supervision of the specific templates. Finally, the encoding network was
merged with a posterior network for upsampling to obtain the segmented masks and a region
proposal network for the OD and Fovea localization. Maiya and Mathur [49] proposed a
Naive Single Stacked Hourglass (NSSH) network to learn the spatial orientation and pixel
intensity contrast between OD and Fovea. The NSSH incorporates three salient design
decisions, such as hourglass geometry, convolutional layer stacking, and replacing ResNet
blocks [32]. El Kim et al. [20] utilized a pre-trained network to locate the OD in the fundus
images. The authors used transfer learning in the well-known AlexNet [41] with a linear
regression output for localization. A concentric circular sectional symmetry measure was
proposed by Guo et al. [28] for symmetry axis detection and extraction of ROI. A weighted
gradient accumulation map was also proposed for locating the intensity changes, which
mitigated the effect of noise and artifacts to refine the localization.
Table 1 presents numerous methods mentioned above for the segmentation of OD and
localization of OD and Fovea centers with their respective utilized datasets and correspond-
ing results. Different methods for the segmentation of OD and the localization of OD and
Fovea centers have been reviewed in the previous paragraphs, which can roughly be split into
image processing and deep learning-based approaches. However, a common characteristic
of both sets of methods is that they are highly dependent on parameter tuning, impacting
the robustness and generalizability of the methods. Notably, robust parameter selection is
often complicated due to the different challenging conditions in the fundus images (see Fig. 1),
as described earlier. Moreover, several localization methods, as presented in subsection 1.2,
firstly segment the OD and Fovea, then measure the geometric center from the binary mask
as their center. However, these approaches are primarily dependent on the shape of the ob-
ject in a binary mask. In the next subsection, we present our approach to alleviating the above-mentioned limitations for the segmentation and localization of OD and Fovea in retinal
fundus images.
In this article, we propose DRNet, an end-to-end encoder-decoder network for the segmentation of the OD and the localization of the centers of the OD and Fovea, which substantially alleviates the necessity of manual parameter tuning while being an
end-to-end system. The proposed DRNet has a U-shaped structure with newly proposed skip
connections, called residual skip connections, consisting of two shortcut paths. Those two
paths act as a regularizing path to each other (see details in subsection 3.2). The summed output of those two shortcut paths from the encoder is then concatenated, channel-wise, with the same-scale corresponding decoder.
Table 1: Several published methods for the segmentation of OD and the localization of OD and Fovea centers, with their utilized datasets and performances. Acc, mDSC, mED, mIoU, and mSn respectively denote accuracy, mean Dice similarity coefficient, mean Euclidean distance, mean intersection over union, and mean sensitivity.

Method | Year | Dataset | OD segmentation | OD localization | Fovea localization
Morphological reconstruction, Gaussian mixture model, and morphological post-processing [72] | 2015 | DRIVE | mIoU: 0.807, mSn: 0.878 | - | -
 | | DIARETDB1 | mIoU: 0.802, mSn: 0.882 | - | -
 | | DIARETDB0 | mIoU: 0.776, mSn: 0.866 | - | -
 | | CHASE DB1 | mIoU: 0.808, mSn: 0.896 | - | -
Global symmetry axis measurement, then finding the centroid of the blood vessels [56] | 2015 | DRIVE | - | Acc: 0.975 | -
 | | STARE | - | Acc: 0.975 | -
 | | HRF | - | Acc: 1.00 | -
Adaptive threshold-based region growing, Sarathi et al. [73] | 2016 | MESSIDOR | mIoU: 0.890 | - | -
 | | DRIVE | mIoU: 0.870 | - | -
Thresholding using grayscale morphological dilation and median filtering, Bharkad [13] | 2017 | DRIVE | mIoU: 0.626, mSn: 0.871 | mED: 9.12 | -
 | | DIARETDB0 | mIoU: 0.611, mSn: 0.746 | mED: 11.83 | -
 | | DIARETDB1 | mIoU: 0.586, mSn: 0.751 | mED: 13.00 | -
Edge detection followed by circular Hough transform [18] | 2018 | HRF | - | Acc: 0.933 | -
UOLO algorithm for simultaneous detection and segmentation [8] | 2018 | DRIVE | mIoU: 0.820, mDSC: 0.890 | mED: 8.13 | -
 | | MESSIDOR | mIoU: 0.880, mDSC: 0.930 | mED: 9.40 | mED: 10.44
A framework using an attention-based object relation module [10] | 2018 | IDRiD | - | mED: 60.32 | mED: 95.45
Faster-RCNN and SVM, then RPI-based faster-RCNN [44] | 2018 | IDRiD | - | mED: 32.6 | mED: 52.0
U-shaped architecture using fully-connected DenseNet [6] | 2018 | DRISHTI-GS | mIoU: 0.904, mDSC: 0.949 | - | -
 | | RIMONE | mIoU: 0.829, mDSC: 0.904 | - | -
L0-gradient minimization with OTSU thresholding [58] | 2018 | IDRiD | mIoU: 0.721, mDSC: 0.802 | - | -
 | | MESSIDOR | mIoU: 0.826, mDSC: 0.785 | - | -
GrabCut and UNet methods [55] | 2020 | RIMONE | mIoU: 0.880, mSn: 0.910 | - | -
Unlike earlier skip connections as in UNet [70], the proposed residual skip connection does not directly aggregate low-level features from the earlier layers of the encoder with the same-scale decoder, since the skipped features are passed through several convolutional layers and merged with the non-zero regularizing skipping path. Thus, the proposed DRNet better compensates for the spatial information lost due to pooling in the encoder. For localization, we generate and propose a 2D heatmap (see details in subsection 3.1), with a Gaussian-shaped intensity distribution whose peak is located at the given center of the OD or Fovea. Those 2D maps are then used to train the proposed DRNet as a regression network, which predicts the OD and Fovea centers from the predicted heatmap's peak intensity points. Thus, we propose a shape-independent localization method, which can provide higher localization accuracy than earlier methods (subsection 1.2) that are likely to be adversely affected by noise, low contrast, and artifacts. To the best of our knowledge, our proposed DRNet outperforms state-of-the-art results on five different datasets for the OD segmentation and the localization of OD and Fovea centers.
The remaining sections are organized as follows: section 2 describes the datasets used to carry out the experimentation. The proposed methodologies, along with the design of the proposed DRNet, are presented in section 3. Section 4 describes the obtained results, for both segmentation and localization, along with their interpretation and state-of-the-art comparisons. Finally, section 5 concludes the article.
2. Datasets
We evaluate our proposed DRNet on two different tasks: the segmentation of the OD and the localization of the OD and Fovea centers. Table 1 shows the different datasets and corresponding methods used in the recent literature for segmentation and localization. In this article,
we utilize five different datasets, from Table 1, in order to compare our method with current
state-of-the-art. We use the Indian Diabetic Retinopathy Image Dataset (IDRiD) [59],
the Retinal Image Database for Optic Nerve Evaluation (RIMONE) [22], the DRISHTI-
GS [77, 76], and the Digital Retinal Images for Vessel Extraction (DRIVE) [79] for OD
segmentation. IDRiD and High-Resolution Fundus (HRF) [14] datasets are used for OD
center localization, whereas the IDRiD dataset is adopted for Fovea center localization.
Table 2 shows a summary of the data distribution of all the datasets. However, a validation set, which is essential for supervised learning systems, is not provided with all the datasets. We have therefore applied a cross-validation technique to select the training, validation, and testing sets. Fig. 1 shows several challenging instances of retinal fundus images in the datasets.
In the following paragraphs, we briefly describe all the used datasets.
Table 2: Distribution and details of the five datasets used for evaluating the proposed DRNet for OD segmentation and OD and Fovea center localization.

Dataset | Resolution | FoV | Bit depth | OD centers | Fovea centers | OD masks
IDRiD [59] | 4288 × 2848 | 50° | 24-bit | 516 | 516 | 81
RIMONE [22] | 1072 × 1424 | - | 24-bit | - | - | 169
DRISHTI-GS [77, 76] | 2896 × 1944 | 30° | 24-bit | - | - | 101
DRIVE [79] | 786 × 584 | 30° | 8-bit | - | - | 40
HRF [14] | 3504 × 2336 | 45° | 24-bit | 45 | - | -
IDRiD. The images were acquired in India using a digital fundus camera (Kowa VX-10a) and manually annotated by a master's student using special software developed by ADCIS [4].
RIMONE. This dataset was acquired in Spain using a Nidek AFC-210 fundus camera with a Canon EOS 5D Mark II body of 21.1 megapixels. The OD and cup were manually segmented by five experts, and these markings were combined to form the ground-truth segmentation masks.
DRISHTI-GS. This dataset is generated by the medical image processing group, IIIT,
India. The ground-truth segmentation masks containing the optic disc were annotated by
combining the markings of four practitioners.
DRIVE. The images were acquired in The Netherlands using a Canon CR5 non-mydriatic
3CCD camera. The volunteers, who manually annotated the images, were instructed and
trained by an experienced ophthalmologist.
HRF. The images were acquired in Germany using a Canon CR-1 fundus camera, where the gold-standard ground-truth data was generated by a group of experts working in retinal image analysis and clinicians from the cooperating ophthalmology clinics.
3. Methodology
In this section, we describe the proposed methodology for segmentation and localization. In subsection 3.1, we present the preprocessing applied to all the images in the different datasets, as described in section 2, before feeding them to the proposed network. Subsection 3.2 presents the design of the proposed network. Finally, in subsection 3.3, we describe the training protocol for segmentation and regression employing the proposed DRNet. The proposed workflow, as shown in Fig. 2, serves two different tasks: (1) OD segmentation and (2) regression to generate heatmaps for OD and Fovea center localization. We describe each part of the proposed workflow, shown in Fig. 2, in the following subsections.

Figure 2: The proposed training-phase workflow, in which preprocessed images are fed to the DRNet for OD segmentation (Task-1) and heatmap regression for center localization (Task-2), where preprocessing is the crucial integral part of the workflow in the training phase.
3.1. Preprocessing
Preprocessing is a crucial integral step to train the proposed network. Generally, large
amounts of data are used to train a CNN to overcome overfitting. However, many medical
imaging domains, such as the one addressed in this paper, have access to small-sized datasets
as manually annotated training images are challenging to generate [29]. Data augmentation
is a commonly used preprocessing step for training a CNN, enhancing the size and quality of the training datasets to improve the generalization of the network [75]. In this article, we use several geometry-based augmentations: rotation (40° around the center of the image), height-width shift (10 %), and zooming (10 %). To reduce the computational burden, we resize all the images to 256 × 170 pixels using nearest-neighbor interpolation, keeping approximately the same aspect ratio (1.5) as in the original fundus images. Additionally, we standardize the images to zero mean and unit variance and rescale them to [0, 1].
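To make this concrete, the following is a minimal sketch of the described preprocessing, assuming a TensorFlow/Keras and OpenCV stack; the function names are ours, and the authors' released implementation may differ.

```python
import cv2  # assumed dependency for nearest-neighbor resizing
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def preprocess(image):
    """Resize to 256 x 170, standardize, and rescale one fundus image to [0, 1]."""
    image = cv2.resize(image, (256, 170), interpolation=cv2.INTER_NEAREST)
    image = image.astype(np.float32)
    image = (image - image.mean()) / (image.std() + 1e-8)               # zero mean, unit variance
    image = (image - image.min()) / (image.max() - image.min() + 1e-8)  # rescale to [0, 1]
    return image

# Geometry-based augmentations with the stated ranges.
augmenter = ImageDataGenerator(
    rotation_range=40,       # rotation around the image center (degrees)
    width_shift_range=0.10,  # height-width shift (10 %)
    height_shift_range=0.10,
    zoom_range=0.10,         # zooming (10 %)
)
```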
We train the proposed DRNet with 2D heatmaps, centered on the ground-truth coordinates, to localize the OD and Fovea center coordinates [53, 84, 92]. Such training drives DRNet to regress output heatmaps for the test images, where the predicted OD and Fovea center coordinates are obtained by computing the argmax of pixel values in the predicted heatmap. However, the utilized datasets provide only the coordinates of the OD and Fovea centers rather than 2D center heatmaps. For example, for the image shown in Fig. 3, the coordinate of the OD center is given as (2858, 1805). Therefore, we create 2D heatmaps with a Gaussian-shaped intensity distribution whose peak is placed at the given ground-truth center (see Fig. 3).
Figure 3: Generation of 2D heatmaps for regression to localize the OD and Fovea centers, where the heatmap peak is placed at the provided ground-truth coordinate (here, the OD center at (2858, 1805)).
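A hedged sketch of how such a heatmap can be generated is given below; the Gaussian standard deviation (sigma) is an assumption, as its value is not stated here.

```python
import numpy as np

def make_heatmap(height, width, center_xy, sigma=10.0):
    """Return an (height x width) map with a Gaussian peak at center_xy = (x, y)."""
    xs = np.arange(width)[None, :]   # column (x) coordinates
    ys = np.arange(height)[:, None]  # row (y) coordinates
    cx, cy = center_xy
    d2 = (xs - cx) ** 2 + (ys - cy) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))  # peak value 1.0 at the center

# Example: OD center given as (2858, 1805) in a 4288 x 2848 IDRiD image.
heatmap = make_heatmap(2848, 4288, (2858, 1805))
```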
3.2. The proposed DRNet

Medical image segmentation has achieved tremendous success, markedly since 2015, after the introduction of the UNet [70]. Nowadays, CNN-based segmentation networks are widely applied to medical images, outperforming traditional image processing methods relying on hand-crafted features [80]. The latter often require a significant amount of parameter and feature tuning, while the former are considered end-to-end approaches. A CNN-based segmentation network comprises two necessary components: the encoder and the decoder [70]. An encoder, also called the feature learner module, consists of convolutional and pooling layers. The convolutional layers are used to generate the feature maps,
while the pooling layers gradually reduce the dimension of those feature maps to capture higher semantic features with higher spatial invariance [31]. Such a reduction in resolution due to pooling also expands the Field of View (FoV) of the generated feature maps and reduces the computational cost [47]. The encoder part of our proposed DRNet, as presented in Fig. 4, has five convolutional layers with 3 × 3 kernels and four pooling layers with a stride of 2 × 2. The final feature map has a resolution of M/2^4 × N/2^4 for an input resolution of M × N.
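A Keras sketch mirroring that encoder description follows; the per-layer filter counts are illustrative assumptions (Fig. 4 suggests widths in the 8-64 range), not the authors' exact configuration.

```python
from tensorflow.keras import Input, Model, layers

def build_encoder(height, width, channels=3):
    """Five 3x3 conv layers with four 2x2 poolings: M x N -> roughly M/2^4 x N/2^4."""
    inp = Input((height, width, channels))
    x = inp
    for i, filters in enumerate([8, 16, 32, 32, 64]):  # assumed channel widths
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        if i < 4:  # four pooling stages in total
            x = layers.MaxPooling2D(pool_size=2, strides=2)(x)
    return Model(inp, x)

encoder = build_encoder(170, 256)  # resized input from the preprocessing step
```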
A decoder projects the lower-resolution learned features from the encoder onto a higher resolution, gradually recovering the spatial information and finally producing a semantically segmented mask at the input image resolution [47, 70, 31]. Most current segmentation networks contain almost similar encoders but differ significantly in the decoder mechanism used to regain the lost spatial information. After the encoder, the reduced feature maps often suffer from spatial resolution loss, which introduces coarseness, loss of edge information, checkerboard artifacts, and over- and under-segmentation in the final predicted masks [47, 70, 54]. Several methods have already been proposed to alleviate these limitations. Ronneberger et al. [70] introduced a skip
Figure 4: Structure of the proposed DRNet, for both segmentation and regression, employing the proposed residual skip connection. Each stage comprises convolutions with ReLU, batch normalization, pooling or upsampling, channel concatenation, and addition.
connection in their popular UNet, which allows the decoder to regain lost spatial information through the channel-wise concatenation of each same-scale stage of the encoder and decoder. However, as in UNet, skip connections indiscriminately aggregate the same-scale feature maps of the encoder and decoder. Such aggregation is regarded as a limitation of the UNet, as it forces an unnecessary fusion strategy, imposing aggregation only at the same-scale feature maps of the encoder and decoder [100]. Long et al. [47] fused features from different pooling layers of the encoder in their Fully Convolutional Network (FCN). Those fused maps were then upsampled using different up-scaling factors. A deconvolution overlap occurs when the kernel size is not divisible by the up-scaling factor. The number of low-resolution features across the high-resolution feature map is not constant due to the
deconvolution overlap, which introduces checkerboard artifacts in the final segmented masks
[54, 31]. Al-Masni et al. [7] proposed a Full resolution Convolution Network (FrCN) without
any pooling layers, which preserves the spatial information in the output segmented masks.
However, the pooling layers are highly desirable to learn the most salient features from dif-
ferent resolutions, as described earlier. In SegNet, Badrinarayanan et al. [11] preserved the
indices at each pooling layer in the encoder, which was utilized to upsample the correspond-
ing feature map in the decoder. However, the neighboring information was not considered
during the upsampling.
In the DRNet, we propose a residual skip connection, as depicted in Fig. 5, instead of the earlier skip connection of the well-known UNet.

Figure 5: A presentation of the proposed residual skip connection, generating a skipped output $X_{out}$ for the input $X_{in}$.

The proposed residual skip connection in our DRNet has two shortcut paths: the convolutional path and the regularizing path. The convolutional shortcut path cannot degrade the features' quality, as the non-zero regularizing path skips over it. On the other hand, the direct skipping of the non-zero regularizing path cannot hamper the performance either, as it is added to the learned features in
the convolutional shortcut path. For the $n$-th convolutional layer of the DRNet, the feature map $X_{in}^{n}$ is used to produce an output feature map $X_{out}^{n}$ by applying the proposed residual skip connection (see Fig. 5) as $X_{out}^{n} = F(X_{in}^{n}) + X_{in}^{n}$, where $F$ is the stack of convolutions in the convolutional shortcut path. The output of a residual skip connection, $X_{out}^{n} \in \mathbb{R}^{B \times H \times W \times E}$, is then concatenated with the same-scale $n$-th decoder output $Y_{out}^{n} \in \mathbb{R}^{B \times H \times W \times D}$ to regain the lost spatial information. Mathematically, $X_{C}^{n} = [X_{out}^{n} \mathbin{+\!\!+} Y_{out}^{n}]$, where $X_{C}^{n} \in \mathbb{R}^{B \times H \times W \times (E+D)}$ and $\mathbin{+\!\!+}$, $B$, $H$, $W$, $E$, and $D$ respectively denote channel concatenation, batch size, height, width, depth of $X_{out}^{n}$, and depth of $Y_{out}^{n}$. Such a residual skip connection does not directly aggregate the low-level encoder features with the corresponding same-scale decoder, since the features are passed through several convolutional layers and merged with the non-zero regularizing skipping path.
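A minimal Keras sketch of this block is shown below, assuming, as the element-wise addition requires, that $F$ preserves the channel count of its input; the depth of $F$ is our assumption.

```python
from tensorflow.keras import layers

def residual_skip(x_in, filters):
    """X_out = F(X_in) + X_in; `filters` must equal X_in's channel count."""
    f = layers.Conv2D(filters, 3, padding="same", activation="relu")(x_in)
    f = layers.BatchNormalization()(f)
    f = layers.Conv2D(filters, 3, padding="same", activation="relu")(f)
    f = layers.BatchNormalization()(f)
    return layers.Add()([f, x_in])  # the regularizing path skips over F

def merge_with_decoder(x_out, y_out):
    """X_C = [X_out ++ Y_out]: channel-wise concatenation with the decoder."""
    return layers.Concatenate(axis=-1)([x_out, y_out])
```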
Additionally, all the convolutions, in both the encoder and decoder, are followed by batch normalization layers [34], standardizing the layer inputs over each mini-batch to tackle the internal covariate shift. The number of convolutional layers and the corresponding feature depths in DRNet are kept small to learn fewer parameters, without sacrificing result quality, also considering the small size of the datasets used (see section 2). Such a network design makes the DRNet lightweight and more suitable for real-time CST systems. Moreover, the final 2D output feature map from the sigmoid activation function allows the DRNet to act as either a segmentation or a regression network. For segmentation, the output maps are thresholded to get binary masks of the OD. For localization, a 2D argmax on the regressed output heatmaps provides the spatial coordinates of the OD and Fovea centers.
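For illustration, recovering a center from a predicted heatmap then reduces to a 2D argmax mapped back to the original resolution; a sketch with our naming:

```python
import numpy as np

def heatmap_to_center(heatmap, original_hw):
    """Return (x, y) of the heatmap's peak, mapped to original-image pixels."""
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    scale_y = original_hw[0] / heatmap.shape[0]
    scale_x = original_hw[1] / heatmap.shape[1]
    return (col * scale_x, row * scale_y)
```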
The Xavier uniform distribution, also known as the Glorot uniform distribution [25], is used to initialize the kernels of all the convolutional layers in both the encoder and decoder. Such a distribution draws samples from a uniform distribution within $[-l, l]$, where $l = \sqrt{6/(F_{in} + F_{out})}$, and $F_{in}$ and $F_{out}$ denote the number of input and output units in the weight tensor, respectively. As the OD ROIs are much smaller than the background, a typical metric that equally weighs background and foreground pixels would be biased towards background pixels. Hence, we use IoU as the metric ($M_{seg}$) to be maximized:

$$M_{seg} = \frac{\sum_{i=1}^{N} y_i \hat{y}_i}{\sum_{i=1}^{N} \left( y_i + \hat{y}_i - y_i \hat{y}_i \right)}, \qquad (1)$$

where $y$, $\hat{y}$, and $N$ are the true label, predicted label, and the total number of pixels, respectively. The product of $y$ and $\hat{y}$ in Eq. 1 is the measure of similarity (intersection) between the true and predicted masks. On the other hand, binary cross-entropy and mean
squared errors are applied as the loss functions for segmentation and regression, respectively. For both tasks, the loss functions are optimized using the Adam optimizer [38] with an initial Learning Rate (LR) and exponential decay rates ($\beta_1$, $\beta_2$) of LR = 0.001, $\beta_1$ = 0.9, and $\beta_2$ = 0.999, respectively, without the AMSGrad variant. The LR is reduced by 10.0 % after 5 epochs if the validation loss stops improving. We set the initial number of epochs to 200 and stop the training using a callback function when the validation loss has stopped improving.
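A hedged Keras sketch of this training protocol follows; the metric of Eq. 1 is included, while the early-stopping patience and the reading of "reduced by 10.0 %" as a multiplicative factor of 0.9 are our assumptions.

```python
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.optimizers import Adam

def iou_metric(y_true, y_pred, eps=1e-7):
    """Soft IoU of Eq. 1 (epsilon added for numerical stability)."""
    inter = tf.reduce_sum(y_true * y_pred)
    union = tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) - inter
    return (inter + eps) / (union + eps)

optimizer = Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, amsgrad=False)
callbacks = [
    ReduceLROnPlateau(monitor="val_loss", factor=0.9, patience=5),  # -10 % LR
    EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True),
]

# Segmentation: model.compile(optimizer, loss="binary_crossentropy", metrics=[iou_metric])
# Regression:   model.compile(optimizer, loss="mse")
# model.fit(..., epochs=200, callbacks=callbacks)
```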
4. Results and Discussion
In this section, we present the results of different experiments. In subsection 4.1, we first present ablation studies on our proposed DRNet, specifically to evaluate the benefits of the proposed skip connection, using four different datasets (see subsection 4.1.1). We also show OD segmentation results on those datasets and compare them to state-of-the-art methods on the same datasets (see subsection 4.1.2). The localization results for the OD and Fovea centers on different datasets are presented in subsection 4.2, along with state-of-the-art comparisons. Finally, in subsection 4.3, we show several failing cases of the proposed DRNet.
4.1. Segmentation
We use mean sensitivity (mSn), mean accuracy (mAcc), and mean Intersection over Union (mIoU) as the metrics to evaluate
the segmented OD masks. The mSn and mAcc respectively assess the false-negative regions and the percentage of correctly classified pixels, whereas the mIoU quantifies the overlap between the true and predicted OD masks.
4.1.1. DRNet segmentation results
Fig. 6 shows comprehensive OD segmentation performances using our proposed DRNet and the equivalent traditional encoder-decoder network without our proposed skip connection on four different datasets. The experimental results demonstrate higher performance, in terms of IoU, with the proposed skip connection on all four datasets.

Figure 6: OD segmentation IoU with and without the proposed Skip Connection (SC) on the IDRiD, RIMONE, DRISHTI-GS, and DRIVE datasets.

The qualitative results in Fig. 7 also exhibit that the OD masks with the proposed skip connection have smoother boundaries, whereas
the other masks suffer from zigzag or coarse boundaries. The zigzag boundaries are due to checkerboard or blocky artifacts during upsampling with less spatial information. Such improved results experimentally reveal that the direct concatenation of low-level features to the corresponding decoder has an adverse effect on regaining, in semantic segmentation, the spatial information lost due to pooling in the encoder. Hence, the addition of the convolutional path and the regularizing path in our proposed skip connection successfully improves the OD segmentation results, exhibiting the better recovery capability of our proposed skip connection for semantic segmentation. In the next paragraphs, we further explore qualitative and quantitative OD segmentation results employing our proposed DRNet on
Figure 7: Qualitative results on two different datasets, RIMONE and DRISHTI-GS, with and without the Proposed Skip Connection (PSC).
different datasets, as the skip connection in our DRNet performs better than the earlier skip connection in the UNet.
Table 3: OD segmentation results (mean and median mSn, mAcc, and mIoU) employing the proposed DRNet on four different publicly available datasets.

Table 3 presents the mean and median values of different metrics for OD segmentation on four different datasets. On average, 89.9 %, 95.9 %, 96.0 %, and 96.2 % of pixels are correctly labeled as OD, with respective type-II errors of 10.1 %, 4.1 %, 4.0 %, and 3.8 %, for the IDRiD, RIMONE, DRISHTI-GS, and DRIVE datasets, respectively. It is also noteworthy from
the mIoU of the segmented OD masks by the proposed DRNet that approximately 84.5 %, 90.1 %, 93.3 %, and 92.0 % of the predicted disc pixels coincide with the true disc pixels for the IDRiD, RIMONE, DRISHTI-GS, and DRIVE datasets, respectively. The median values of the different metrics in Table 3 show that comparable performance holds for more than 50 % of the test samples.
The qualitative results of several fundus images from the four separate datasets, for OD segmentation employing the proposed DRNet, are presented in Fig. 8. The overlay results in Fig. 8 qualitatively depict that the DRNet generates OD masks with very high numbers of true-positive pixels compared to false-positive and false-negative pixels, which also shows the precision of the segmented OD boundary (green) compared to the true OD boundary (blue) for all the datasets. The segmented OD boundaries do not show the checkerboard or blocky effects seen in Fig. 7 (third and sixth columns), owing to the proposed skip connection's capability of compensating for the spatial information lost in the encoder. Although there are different types of lesions in some images, the DRNet can accurately extract the OD boundaries, with very negligible false-positive and false-negative regions, for all the datasets.
As summarized in Table 4, the proposed DRNet produces the best OD segmentation masks for seven out of the twelve cases while performing second-best in three cases on four different datasets. The proposed DRNet produces the best OD segmentation results in terms of mIoU, outperforming the recent state-of-the-art FCN [23] and DCF+CHT [19] by margins of 7.7 % and 4.7 % on the IDRiD and DRIVE datasets, respectively. DCF+CHT, proposed by Dharmawan et al. [19], relies solely on OD size estimation and center localization, which requires a significant amount of parameter tuning. The robust estimation of those two requirements is often highly challenging when the OD images are noisy (see Fig. 1 and subsection 1.1). On the other hand, the FCN implemented by Furtado [23], but originally proposed by Long et al. [47],
Figure 8: Qualitative OD segmentation results using the proposed DRNet on different publicly available datasets, including (a) IDRiD and (b) RIMONE, where the green and blue circles respectively denote the predicted and true OD boundaries. The IoU is also provided for quantitative evaluation.
Table 4: OD segmentation results for the proposed DRNet and other state-of-the-art methods on the IDRiD,
RIMONE, DRISHTI-GS, and DRIVE datasets. Bold-font and underline respectively denote the best and
second-best metrics. The acronyms of the methods are given in the table footnote.
Method | Year | IDRiD (mSn / mAcc / mIoU) | RIMONE (mSn / mAcc / mIoU) | DRISHTI-GS (mSn / mAcc / mIoU) | DRIVE (mSn / mAcc / mIoU)
DMS* [91] | 2019 | 0.952 / - / - | - / - / - | - / - / - | - / - / -
LARKIFCM* [82] | 2019 | - / - / - | 0.868 / 0.934 / 0.880 | 0.848 / 0.926 / 0.870 | - / - / -
GSOA* [60] | 2020 | - / - / - | - / 0.986 / - | - / - / - | - / 1.00 / -
GrabCut+UNet [48] | 2020 | - / - / - | 0.904 / 0.993 / 0.896 | 0.929 / 0.997 / 0.919 | - / - / -
DeepLab [23] | 2020 | - / - / 0.680 | - / - / - | - / - / - | - / - / -
FCN [23] | 2020 | - / - / 0.768 | - / - / - | - / - / - | - / - / -
UNet [23] | 2020 | - / - / 0.173 | - / - / - | - / - / - | - / - / -
ResNet+UNet [94] | 2020 | - / - / - | - / - / 0.925 | - / - / 0.949 | - / - / -
DCF+CHT* [19] | 2020 | - / - / - | - / - / - | - / - / - | 0.914 / - / 0.873
SSTL* [12] | 2020 | - / - / - | 0.873 / 0.995 / 0.882 | 0.954 / 0.996 / 0.931 | - / - / -
Proposed DRNet | 2020 | 0.899 / 0.997 / 0.845 | 0.959 / 0.962 / 0.901 | 0.960 / 0.998 / 0.933 | 0.962 / 0.999 / 0.920

* CHT+AT: Circular Hough Transform + Adaptive Thresholding; BAO: Bat Algorithm Optimization; PCA+OT: Principal Component Analysis + OTSU Thresholding; DMS: Deep Membrane System; LARKIFCM: Level Set-based Adaptively Regularized Kernel-based Intuitionistic Fuzzy C Means; GSOA: Glowworm Swarm Optimization Algorithm; DCF+CHT: Dolph-Chebyshev Filter + Circular Hough Transform; SSTL: Semi-supervised and Transfer Learning.
has several limitations in semantic segmentation, such as checkerboard artifacts and deconvolution overlapping, as shown in DSNet [31] and FrCN [7] for skin lesion segmentation. Moreover, our DRNet, with 1.0 million parameters, is more lightweight than the FCN of Furtado [23], as DRNet has 184 times fewer parameters. Furthermore, the mIoU of the OD segmentation using the traditional UNet [23], initially proposed by Ronneberger et al. [70], has been outperformed by our proposed DRNet by a margin of 67.2 % for the IDRiD dataset. These results indicate that the proposed residual skip connection better recovers the spatial information lost due to pooling in the encoder layers. Although the ResNet+UNet of Yu et al. [94] outperforms our network on DRISHTI-GS by a margin of 1.6 % in mIoU, DRNet achieves the second-best result. In terms of mSn and mAcc, DRNet outperforms all other methods, with SSTL [12] and Deep FCNN [6] being the second-best performing methods, respectively. Again, for the RIMONE dataset, DRNet beats the second-best OD segmentation method (CHT+AT
[96]) by a margin of 4.8 % in terms of mSn. The ResNet+UNet of Yu et al. [94] is a two-stage method for OD segmentation, which is computationally expensive, and a failure of the first stage hampers the performance of the second stage. In contrast, our DRNet is an end-to-end single-stage method without intermediate intervention, with approximately 33 times fewer parameters than the ResNet+UNet. The CHT+AT of Zahoor and Fraz [96] requires a large number of parameter selections, which is often less robust and error-prone due to the variability of the retinal images (see Fig. 1 and subsection 1.1). In conclusion, all the above discussions experimentally validate our proposed DRNet, with the proposed skip connection, for OD segmentation.
4.2. Localization
In this subsection, we show qualitative and quantitative results for OD and Fovea center localization. In subsection 4.2.1, the Euclidean Distance (ED), in pixels, between the true (blue) and predicted (green) markers is used as the quantitative metric. Subsequently, we compare the results of the proposed DRNet with state-of-the-art results on the same datasets in subsection 4.2.2.
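For clarity, the metric is simply the pixel-space Euclidean distance between the two markers, e.g.:

```python
import math

def euclidean_distance(true_xy, pred_xy):
    """ED (in pixels) between true and predicted center coordinates."""
    return math.hypot(true_xy[0] - pred_xy[0], true_xy[1] - pred_xy[1])
```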
4.2.1. DRNet localization results

For OD center localization, the mean and median ED values are 20.23 and 16.45 pixels for the IDRiD dataset; for the HRF dataset, the values are 13.34 and 13.60 pixels, respectively. Regarding Fovea center localization, the values are 41.87 and 22.36 pixels for the IDRiD dataset, respectively. Those median localization results show that 50.0 % of the predicted OD centers are located within 16.45 and 13.60 pixels of the true locations in the IDRiD and HRF datasets, respectively, with a larger value of 22.36 pixels for Fovea center localization in the IDRiD
dataset. Those results are generally regarded as satisfactory for center localization, especially
considering the high spatial resolution of the original fundus images of up to 4288 × 2848
and 3504 × 2336 pixels. Qualitative localization results are shown in Fig. 9, for both OD
and Fovea localization. For OD localization (see Fig. 9 (left) and Fig. 9 (middle)), the ED errors are 16.55 and 13.60 pixels. Visually, both predictions lie inside the OD, only slightly deviated from the true locations. Similarly, the detected Fovea center is also slightly deviated (22.02 pixels) from the true location (see Fig. 9 (right)) but still visually acceptable.
Figure 9: Three examples of qualitative results where true and predicted markers are denoted in blue and
green, respectively. The left and middle images show OD center estimation in the IDRiD and HRF datasets,
respectively, and the right image depicts the Fovea center localization in the IDRiD dataset.
Additional qualitative results for OD and Fovea center localization are presented in Fig. 10, showing results for different EDs and datasets under different conditions. The OD result in Fig. 10 (a) (first row, third column) shows that the Hard Exudate (HE) is more visible than the OD, but DRNet can still detect the OD center location with an ED of 16.4 pixels (less than the median value). The last column of Fig. 10 (a) shows a difficult case: although a large ED is obtained, the OD center is predicted inside the real OD region. Other complex examples are shown in Fig. 10 (b) for the HRF dataset, with similarly satisfactory OD localization. The DRNet also successfully locates the Fovea center in the complex retinal images of Fig. 10 (c), where results are satisfactory if the Fovea is well visible. As an extreme illustrative case, the last row and column of Fig. 10 (c) shows a case where the Fovea is barely visible, but DRNet can still determine the Fovea center.
Figure 10: Qualitative DRNet results for center localization: (a) OD centers in the IDRiD dataset, (b) OD centers in the HRF dataset, and (c) Fovea centers in the IDRiD dataset. The ROIs are zoomed to better visualize the true (blue) and predicted (green) markers. The ED metric is shown overlaid.
Table 5: The quantitative metric (mED in pixels) for different OD and Fovea localization methods, including the proposed DRNet. Bold-font and underline respectively denote the best and second-best metrics.

Method | OD (IDRiD) | Fovea (IDRiD) | OD (HRF)
UNet [78] | 32.61 | - | -
TNet [78] | 24.85 | - | -
Proposed DRNet (2020) | 20.23 | 41.87 | 13.34
ResNet and UNet were originally proposed by He et al. [33] and Ronneberger et al. [70], respectively, but implemented by Babu et al. [10] for OD and Fovea center localization and by Song et al. [78] for OD center localization. The proposed DRNet outperforms all methods, being better than the second-best methods, [78] and [10], by 18.59 % and 3.66 % for OD and Fovea center localization, respectively, on the IDRiD dataset. DRNet also outperforms the IDRiD challenge winner [101] by margins of 21.0 % and 8.76 %, respectively, for the OD
and Fovea centers. One should also note that the proposed DRNet, with the proposed skip connection, again outperforms the traditional UNet, which again experimentally shows the better capability of our proposed skip connection for regaining the lost spatial information. For OD center localization on the HRF dataset, Panda et al. [56], Devasia et al. [18], and Rathod et al. [63] achieved accuracies of 100 %, 93.3 %, and 95.0 %, respectively. In all those works, the accuracy criterion used defines a localization as correct if the detected OD center lies inside the OD region, without considering its distance from the true location. In that sense, to our knowledge, this is the first work to provide the OD center localization error in pixels for the HRF dataset. Using the same accuracy criterion, our DRNet achieved 100 % accuracy with a median ED of 13.60 pixels, which shows the accuracy and robustness of the DRNet for OD center localization on the HRF dataset.
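A sketch of that accuracy criterion as we read it (the helper name is ours): a prediction counts as correct when it falls inside the ground-truth OD mask.

```python
import numpy as np

def localization_correct(pred_xy, od_mask):
    """True if the predicted center lies inside the (H x W) boolean OD mask."""
    x, y = int(round(pred_xy[0])), int(round(pred_xy[1]))
    inside_image = 0 <= y < od_mask.shape[0] and 0 <= x < od_mask.shape[1]
    return bool(inside_image and od_mask[y, x])
```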
4.3. Limitation of DRNet
Despite the overall satisfactory results obtained by the DRNet (additional results are shown in the Appendix, Fig. A.12), Fig. 11 shows several cases where the proposed DRNet fails to generate precise results for the segmentation of the OD and the localization of the OD and Fovea centers.

Figure 11: Several examples of failing cases for the segmentation of OD (first row) and localization of OD and Fovea by the DRNet (second and third rows), where the true and predicted markers are denoted by the blue and green colors, respectively.
A possible explanation for those errors could be the relatively small size of the training datasets (see section 2), which could impact the generalization capabilities of the network. The additional fact that the network structure (depth, number of convolutional and subsampling layers, and other hyperparameters) has not been explicitly tuned for the datasets used could also explain the small number of failed cases. In the future, we will collect additional images to enhance the diversity of the training samples and search for better network configurations.
5. Conclusion
In future work, we will further tune the number of parameters, layers, and depth of the DRNet to achieve the highest possible performance. Since the proposed skip connection has a better capability of compensating for the spatial information lost due to pooling in the encoder, the proposed DRNet will also be applied to other medical imaging domains for segmentation and localization, where precise spatial information is crucial, to verify its versatility and generalizability.
Acknowledgements
References
[1] Abdullah, A.S., Özok, Y.E., Rahebi, J., 2018. A novel method for retinal optic disc detection using bat meta-heuristic algorithm. Medical & Biological Engineering & Computing 56, 2015–2024.
[2] Abdullah, A.S., Rahebi, J., Özok, Y.E., Aljanabi, M., 2020. A new and effective method for human
retina optic disc segmentation with fuzzy clustering method based on active contour model. Medical
& biological engineering & computing 58, 25–37.
[3] Abdullah, M., Fraz, M.M., Barman, S.A., 2016. Localization and segmentation of optic disc in retinal
images using circular hough transform and grow-cut algorithm. PeerJ 4, e2003.
[4] ADCIS, 2020. adcis: a team of imaging experts. http://www.adcis.net/en/home/ [Accessed: 12
November 2020].
[5] Al-Bander, B., Al-Nuaimy, W., Williams, B.M., Zheng, Y., 2018a. Multiscale sequential convolutional
neural networks for simultaneous detection of fovea and optic disc. Biomedical Signal Processing and
Control 40, 91–101.
[6] Al-Bander, B., Williams, B.M., Al-Nuaimy, W., Al-Taee, M.A., Pratt, H., Zheng, Y., 2018b. Dense
fully convolutional segmentation of the optic disc and cup in colour fundus for glaucoma diagnosis.
Symmetry 10, 87.
[7] Al-Masni, M.A., Al-Antari, M.A., Choi, M.T., Han, S.M., Kim, T.S., 2018. Skin lesion segmentation in
dermoscopy images via deep full resolution convolutional networks. Computer Methods and Programs in Biomedicine 162, 221–231.
[8] Araújo, T., Aresta, G., Galdran, A., Costa, P., Mendonça, A.M., Campilho, A., 2018. UOLO — automatic object detection and segmentation in biomedical images, in: Deep Learning in Medical
Image Analysis and Multimodal Learning for Clinical Decision Support. Springer, pp. 165–173.
[9] International Diabetes Federation (IDF), 2017. IDF Diabetes Atlas. Brussels, Belgium: International Diabetes Federation, 147.
[10] Babu, S.C., Maiya, S.R., Elango, S., 2018. Relation networks for optic disc and fovea localization in retinal images. arXiv:1812.00883.
[11] Badrinarayanan, V., Kendall, A., Cipolla, R., 2017. SegNet: A deep convolutional encoder-decoder
architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 39, 2481–2495.
[12] Bengani, S., Vadivel, S., et al., 2020. Automatic segmentation of optic disc in retinal fundus images
using semi-supervised deep learning. Multimedia Tools and Applications, 1–26.
[13] Bharkad, S., 2017. Automatic segmentation of optic disk in retinal images. Biomedical Signal Pro-
cessing and Control 31, 483–498.
[14] Budai, A., Bock, R., Maier, A., Hornegger, J., Michelson, G., 2013. Robust vessel segmentation in fundus images. International Journal of Biomedical Imaging 2013, 154860.
band filter. Computers in Biology and Medicine 56, 1–12.
[18] Devasia, T., Jacob, P., Thomas, T., 2018. Automatic optic disc localization in color retinal fundus
images. Adv. Comput. Sci. Technol 11, 1–13.
[19] Dharmawan, D.A., Ng, B.P., Rahardja, S., 2020. A new optic disc segmentation method using a
modified Dolph-Chebyshev matched filter. Biomedical Signal Processing and Control 59, 101932.
[20] El Kim, D., Hacisoftaoglu, R.E., Karakaya, M., 2020. Optic disc localization in retina images us-
ing deep learning frameworks (conference presentation), in: Disruptive Technologies in Information
Sciences IV, International Society for Optics and Photonics. p. 1141904.
[21] Fu, H., Cheng, J., Xu, Y., Wong, D.W.K., Liu, J., Cao, X., 2018. Joint optic disc and cup segmentation
based on multi-label deep network and polar transformation. IEEE Transactions on Medical Imaging 37, 1597–1605.
[22] Fumero, F., Alayón, S., Sanchez, J.L., Sigut, J., Gonzalez-Hernandez, M., 2011. RIM-ONE: An
open retinal image database for optic nerve evaluation, in: 2011 24th international symposium on
computer-based medical systems (CBMS), IEEE. pp. 1–6.
[23] Furtado, P., 2020. Deep semantic segmentation of diabetic retinopathy lesions: what metrics really
tell us, in: Medical Imaging 2020: Biomedical Applications in Molecular, Structural, and Functional
Imaging, International Society for Optics and Photonics. p. 113170O.
[24] Gegundez-Arias, M.E., Marin, D., Bravo, J.M., Suero, A., 2013. Locating the fovea center position
in digital fundus images using thresholding and feature extraction techniques. Computerized Medical
Imaging and Graphics 37, 386–393.
[25] Glorot, X., Bengio, Y., 2010. Understanding the difficulty of training deep feedforward neural networks,
in: Proceedings of the thirteenth international conference on artificial intelligence and statistics, pp.
249–256.
[26] Early Treatment Diabetic Retinopathy Study Research Group, 2020. Grading diabetic retinopathy
from stereoscopic color fundus photographs—an extension of the modified Airlie House classification:
ETDRS report number 10. Ophthalmology 127, S99–S119.
[27] Gu, Z., Cheng, J., Fu, H., Zhou, K., Hao, H., Zhao, Y., Zhang, T., Gao, S., Liu, J., 2019. CE-Net:
Context encoder network for 2D medical image segmentation. IEEE transactions on medical imaging
38, 2281–2292.
[28] Guo, X., Wang, H., Lu, X., Hu, X., Che, S., Lu, Y., 2020. Robust fovea localization based on symmetry
measure. IEEE journal of biomedical and health informatics .
[29] Harangi, B., 2018. Skin lesion classification with ensembles of deep convolutional neural networks.
Journal of biomedical informatics 86, 25–32.
[30] Hasan, M.K., Alam, M.A., Das, D., Hossain, E., Hasan, M., 2020a. Diabetes prediction using ensem-
bling of different machine learning classifiers. IEEE Access 8, 76516–76531.
[31] Hasan, M.K., Dahal, L., Samarakoon, P.N., Tushar, F.I., Martí, R., 2020b. DSNet: Automatic
dermoscopic skin lesion segmentation. Computers in Biology and Medicine , 103738.
[32] He, K., Zhang, X., Ren, S., Sun, J., 2016a. Deep residual learning for image recognition, in: Proceed-
ings of the IEEE conference on computer vision and pattern recognition, pp. 770–778.
[33] He, K., Zhang, X., Ren, S., Sun, J., 2016b. Identity mappings in deep residual networks, in: European
conference on computer vision, Springer. pp. 630–645.
[34] Ioffe, S., Szegedy, C., 2015. Batch normalization: Accelerating deep network training by reducing
internal covariate shift. arXiv:1502.03167 .
[35] Jones, O., Jurascheck, L., van Melle, M., Hickman, S., Burrows, N., Hall, P., Emery, J., Walter, F.,
2019. Dermoscopy for melanoma detection and triage in primary care: a systematic review. BMJ
open 9, e027529.
[36] Joshi, P., KS, R.R., Masilamani, V., Alike, J., Suresh, K., Kumaresh, K., 2019. Optic disc localization
using interference map and localized segmentation, in: 2019 IEEE 1st International Conference on
Energy, Systems and Information Processing (ICESIP), IEEE. pp. 1–4.
[37] Kamble, R., Kokare, M., Deshmukh, G., Hussin, F.A., Mériaudeau, F., 2017. Localization of optic
disc and fovea in retinal images using intensity based line scanning analysis. Computers in biology
and medicine 87, 382–396.
[38] Kingma, D.P., Ba, J., 2014. Adam: A method for stochastic optimization. arXiv:1412.6980 .
[39] Kobrin Klein, B.E., 2007. Overview of epidemiologic studies of diabetic retinopathy. Ophthalmic
Epidemiology 14, 179–183.
[41] Krizhevsky, A., Sutskever, I., Hinton, G.E., 2012. Imagenet classification with deep convolutional
neural networks, in: Advances in neural information processing systems, pp. 1097–1105.
[42] Kumar, S., Adarsh, A., Kumar, B., Singh, A.K., 2020. An automated early diabetic retinopathy
detection through improved blood vessel and optic disc segmentation. Optics & Laser Technology
121, 105815.
[43] Li, C., Kao, C.Y., Gore, J.C., Ding, Z., 2007. Implicit active contours driven by local binary fitting
energy, in: 2007 IEEE Conference on Computer Vision and Pattern Recognition, IEEE. pp. 1–7.
[44] Li, X., Shen, L., Duan, J., 2018. Optic disc and fovea detection using multi-stage region-based con-
volutional neural network, in: Proceedings of the 2nd International Symposium on Image Computing
and Digital Medicine, pp. 7–11.
[45] Lin, J., Thompson, T.J., Cheng, Y.J., Zhuo, X., Zhang, P., Gregg, E., Rolka, D.B., 2018a. Projection
of the future diabetes burden in the united states through 2060. Population health metrics 16, 9.
[46] Lin, J.W., Weng, Q., Yu, L., 2018b. Fast fundus optic disc localization based on main blood ves-
sel extraction, in: Proceedings of the 2018 10th International Conference on Machine Learning and
Computing, pp. 242–246.
[47] Long, J., Shelhamer, E., Darrell, T., 2015. Fully convolutional networks for semantic segmentation,
in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3431–3440.
[48] Lu, Z., Chen, D., 2020. Weakly supervised and semi-supervised semantic segmentation for optic disc
of fundus image. Symmetry 12, 145.
[49] Maiya, S.R., Mathur, P., 2020. Rethinking retinal landmark localization as pose estimation: Naïve
single stacked network for optic disk and fovea detection, in: ICASSP 2020-2020 IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE. pp. 1125–1129.
[50] Marin, D., Gegundez-Arias, M.E., Suero, A., Bravo, J.M., 2015. Obtaining optic disc center and pixel
region by automatic thresholding methods on morphologically processed fundus images. Computer
methods and programs in biomedicine 118, 173–185.
[51] Mittapalli, P.S., Kande, G.B., 2016. Segmentation of optic disk and optic cup from digital fundus
images for the assessment of glaucoma. Biomedical Signal Processing and Control 24, 34–46.
[52] Naqvi, S.S., Fatima, N., Khan, T.M., Rehman, Z.U., Khan, M.A., 2019. Automatic optic disk detection
and segmentation by variational active contour estimation in retinal fundus images. Signal, Image
and Video Processing 13, 1191–1198.
[53] Nibali, A., He, Z., Morgan, S., Prendergast, L., 2018. Numerical coordinate regression with convolu-
tional neural networks. arXiv:1801.07372 .
[55] … weakly supervised learning, in: 2020 Chinese Control And Decision Conference (CCDC), IEEE. pp.
4791–4794.
[56] Panda, R., Puhan, N.B., Panda, G., 2015. Global vessel symmetry for optic disc detection in reti-
nal images, in: 2015 Fifth National Conference on Computer Vision, Pattern Recognition, Image
Processing and Graphics (NCVPRIPG), IEEE. pp. 1–4.
[57] Pereira, C.S., Fernandes, H., Mendonça, A.M., Campilho, A., 2007. Detection of lung nodule can-
didates in chest radiographs, in: Iberian Conference on Pattern Recognition and Image Analysis,
Springer. pp. 170–177.
[58] Porwal, P., Pachade, S., Kadethankar, A., Joshi, A., Patwardhan, V., Kamble, R., Kokare, M., Meri-
audeau, F., 2018a. Automatic segmentation of optic disc by gradient minimization based approach,
in: 2018 International Conference on Intelligent and Advanced System (ICIAS), IEEE. pp. 1–5.
[59] Porwal, P., Pachade, S., Kamble, R., Kokare, M., Deshmukh, G., Sahasrabuddhe, V., Meriaudeau,
F., 2018b. Indian diabetic retinopathy image dataset (IDRiD): A database for diabetic retinopathy
screening research. Data 3, 25.
[60] Pruthi, J., Khanna, K., Arora, S., 2020. Optic Cup segmentation from retinal fundus images using
Glowworm Swarm Optimization for glaucoma detection. Biomedical Signal Processing and Control
60, 102004.
[61] Raj, P.K., Kumar, J.H., Jois, S., Harsha, S., Seelamantula, C.S., 2019. A structure tensor based
voronoi decomposition technique for optic cup segmentation, in: 2019 IEEE International Conference
on Image Processing (ICIP), IEEE. pp. 829–833.
[62] Ramani, R.G., Shanthamalar, J.J., 2020. Improved image processing techniques for optic disc seg-
mentation in retinal fundus images. Biomedical Signal Processing and Control 58, 101832.
[63] Rathod, D.D., Manza, R.R., Rajput, Y.M., Patwari, M.B., Saswade, M., Deshpande, N., 2014. Lo-
calization of optic disc and macula using multilevel 2-D wavelet decomposition based on haar wavelet
transform. International Journal of Engineering Research & Technology (IJERT) 3.
[64] Razeen, S.F.A., Rajinikanth, V., Tamizharasi, P., Varthini, B.P., et al., 2020. Examination of Optic
Disc Sections of Fundus Retinal Images—A Study with Rim-One Database, in: Intelligent Data
Engineering and Analytics. Springer, pp. 711–719.
[65] Redmon, J., Divvala, S., Girshick, R., Farhadi, A., 2016. You only look once: Unified, real-time object
detection, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp.
779–788.
[66] Ren, S., He, K., Girshick, R., Sun, J., 2015. Faster r-cnn: Towards real-time object detection with
region proposal networks, in: Advances in neural information processing systems, pp. 91–99.
[67] Resnikoff, S., Felch, W., Gauthier, T.M., Spivey, B., 2012. The number of ophthalmologists in practice
and training worldwide: a growing gap despite more than 200 000 practitioners. British Journal of
Ophthalmology 96, 783–787.
[68] Resnikoff, S., Pascolini, D., Etya’Ale, D., Kocur, I., Pararajasegaram, R., Pokharel, G.P., Mariotti,
S.P., 2004. Global data on visual impairment in the year 2002. Bulletin of the world health organization
82, 844–851.
[69] Ricci, E., Perfetti, R., 2007. Retinal blood vessel segmentation using line operators and support vector
classification. IEEE transactions on medical imaging 26, 1357–1365.
[71] Rother, C., Kolmogorov, V., Blake, A., 2004. GrabCut: Interactive foreground extraction using
iterated graph cuts. ACM transactions on graphics (TOG) 23, 309–314.
[72] Roychowdhury, S., Koozekanani, D.D., Kuchinka, S.N., Parhi, K.K., 2015. Optic disc boundary and
vessel origin segmentation of fundus images. IEEE journal of biomedical and health informatics 20,
1562–1574.
[73] Sarathi, M.P., Dutta, M.K., Singh, A., Travieso, C.M., 2016. Blood vessel inpainting based technique
for efficient localization and segmentation of optic disc in digital fundus images. Biomedical Signal
Processing and Control 25, 108–117.
[74] Sevastopolsky, A., 2017. Optic disc and cup segmentation methods for glaucoma detection with
modification of U-Net convolutional neural network. Pattern Recognition and Image Analysis 27,
618–624.
[75] Shorten, C., Khoshgoftaar, T.M., 2019. A survey on image data augmentation for deep learning.
Journal of Big Data 6, 60.
[76] Sivaswamy, J., Krishnadas, S., Chakravarty, A., Joshi, G., Tabish, A.S., et al., 2015. A comprehen-
sive retinal image dataset for the assessment of glaucoma from the optic nerve head analysis. JSM
Biomedical Imaging Data Papers 2, 1004.
[77] Sivaswamy, J., Krishnadas, S., Joshi, G.D., Jain, M., Tabish, A.U.S., 2014. Drishti-gs: Retinal image
dataset for optic nerve head (onh) segmentation, in: 2014 IEEE 11th international symposium on
biomedical imaging (ISBI), IEEE. pp. 53–56.
[78] Song, W., Liang, Y., Wang, K., He, L., 2020. T-net: A template-supervised network for task-specific
feature extraction in biomedical image analysis. arXiv:2002.08406 .
[79] Staal, J., Abràmoff, M.D., Niemeijer, M., Viergever, M.A., Van Ginneken, B., 2004. Ridge-based
vessel segmentation in color images of the retina. IEEE transactions on medical imaging 23, 501–509.
[80] Tajbakhsh, N., Jeyaseelan, L., Li, Q., Chiang, J.N., Wu, Z., Ding, X., 2020. Embracing imperfect
datasets: A review of deep learning solutions for medical image segmentation. Medical Image Analysis
, 101693.
[81] Tan, J.H., Acharya, U.R., Bhandary, S.V., Chua, K.C., Sivaprasad, S., 2017. Segmentation of optic
disc, fovea and retinal vasculature using a single convolutional neural network. Journal of Computa-
tional Science 20, 70–79.
[82] Thakur, N., Juneja, M., 2019. Optic disc and optic cup segmentation from retinal images using hybrid
approach.
[84] Tompson, J.J., Jain, A., LeCun, Y., Bregler, C., 2014. Joint training of a convolutional network and
a graphical model for human pose estimation, in: Advances in neural information processing systems,
pp. 1799–1807.
[85] Trelea, I.C., 2003. The particle swarm optimization algorithm: convergence analysis and parameter
selection. Information processing letters 85, 317–325.
[86] Vezhnevets, V., Konouchine, V., 2005. GrowCut: Interactive multi-label N-D image segmentation by
cellular automata, in: proc. of Graphicon, Citeseer. pp. 150–156.
[87] Wang, L., Liu, H., Lu, Y., Chen, H., Zhang, J., Pu, J., 2019. A coarse-to-fine deep learning framework
for optic disc segmentation in fundus images. Biomedical Signal Processing and Control 51, 82–89.
[88] Wu, X., Dai, B., Bu, W., 2016. Optic disc localization using directional models. IEEE Transactions
on Image Processing 25, 4433–4442.
[89] Xie, Z., Ling, T., Yang, Y., Shu, R., Liu, B.J., 2020. Optic disc and cup image segmentation utilizing
contour-based transformation and sequence labeling networks. Journal of Medical Systems 44, 1–13.
[90] Xu, L., Lu, C., Xu, Y., Jia, J., 2011. Image smoothing via L0 gradient minimization, in: Proceedings
of the 2011 SIGGRAPH Asia Conference, pp. 1–12.
[91] Xue, J., Yan, S., Qu, J., Qi, F., Qiu, C., Zhang, H., Chen, M., Liu, T., Li, D., Liu, X., 2019. Deep
membrane systems for multitask segmentation in diabetic retinopathy. Knowledge-Based Systems 183,
104887.
[92] Yang, W., Li, S., Ouyang, W., Li, H., Wang, X., 2017. Learning feature pyramids for human pose
estimation, in: proceedings of the IEEE international conference on computer vision, pp. 1281–1290.
[93] Yau, J.W., Rogers, S.L., Kawasaki, R., Lamoureux, E.L., Kowalski, J.W., Bek, T., Chen, S.J., Dekker,
J.M., Fletcher, A., Grauslund, J., et al., 2012. Global prevalence and major risk factors of diabetic
retinopathy. Diabetes care 35, 556–564.
[94] Yu, S., Xiao, D., Frost, S., Kanagasingam, Y., 2019. Robust optic disc and cup segmentation with
deep learning for glaucoma detection. Computerized Medical Imaging and Graphics 74, 61–71.
[95] Zabihollahy, F., Ukwatta, E., 2019. Fully-automated segmentation of optic disk from retinal images
using deep learning techniques, in: Medical Imaging 2019: Computer-Aided Diagnosis, International
Society for Optics and Photonics.
[97] Zhang, J., Chen, Y., Bekkers, E., Wang, M., Dashtbozorg, B., ter Haar Romeny, B.M., 2017. Retinal
vessel delineation using a brain-inspired wavelet transform and random forest. Pattern Recognition
69, 107–123.
[98] Zhang, L., Lim, C.P., 2020. Intelligent optic disc segmentation using improved particle swarm opti-
mization and evolving ensemble models. Applied Soft Computing , 106328.
[99] Zhou, W., Qiao, S., Yi, Y., Han, N., Chen, Y., Lei, G., 2020. Automatic optic disc detection using
low-rank representation based semi-supervised extreme learning machine. International Journal of
Machine Learning and Cybernetics 11, 55–69.
[100] Zhou, Z., Siddiquee, M.M.R., Tajbakhsh, N., Liang, J., 2019. Unet++: Redesigning skip connections
to exploit multiscale features in image segmentation. IEEE transactions on medical imaging 39, 1856–
1867.
[101] ZJU-BII-SGEX Group, 2018. Medical Image Analysis Group. https://idrid.grand-challenge.org/Leaderboard/ [Accessed: 18 May 2020].
[102] Zou, B., Liu, Q., Yue, K., Chen, Z., Chen, J., Zhao, G., 2019. Saliency-based segmentation of optic
disc in retinal images. Chinese Journal of Electronics 28, 71–75.
Appendix A. Additional Segmentation and Localization Results
Figure A.12: Additional results for OD segmentation (top), OD center localization (middle), and Fovea
center localization (bottom), where blue and green denote the ground truth and the prediction, respectively.
Major Highlights
1. Precise segmentation and localization of the optic disc and Fovea using the proposed DRNet
… training, and testing times)
4. Localization of the optic disc and Fovea centers by proposing 2D bell-shaped ground-truth heatmaps (see the sketch below)
5. A generic DRNet via image augmentation, although small datasets are utilized
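Highlight 4 refers to the 2D bell-shaped ground-truth heatmaps used for OD and Fovea center localization. The following minimal sketch, written in Python with NumPy, assumes a Gaussian bell and an illustrative spread sigma; the exact kernel shape, spread, and normalization used by DRNet may differ, and the function name is hypothetical.

import numpy as np

def center_heatmap(height, width, cy, cx, sigma=20.0):
    # Bell-shaped (Gaussian) target that peaks at the annotated center
    # (cy, cx) with value 1 and decays smoothly with distance.
    ys = np.arange(height, dtype=np.float32)[:, None]  # row indices as a column
    xs = np.arange(width, dtype=np.float32)[None, :]   # column indices as a row
    d2 = (ys - cy) ** 2 + (xs - cx) ** 2               # squared distance to the center
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Example: target for a Fovea center annotated at row 256, column 384 of a
# 512x512 fundus image; sigma = 20 pixels is an assumed value.
target = center_heatmap(512, 512, cy=256, cx=384)

# A predicted center is read out as the argmax of the network's output heatmap;
# its Euclidean distance to the annotated center gives the mED metric.
pred_cy, pred_cx = np.unravel_index(target.argmax(), target.shape)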
Graphical Abstract: In the training phase, preprocessed training images are used to train DRNet; at inference, a preprocessed query image is passed to the trained DRNet for Task-1 (OD segmentation) and Task-2 (OD and Fovea localization), as sketched below.
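As a rough sketch only of this two-task flow — preprocess, drnet_seg, and drnet_loc below are hypothetical stand-ins, not the names used in the released code, and the 0.5 mask threshold is an assumed value:

import numpy as np

def screen_retina(query_image, preprocess, drnet_seg, drnet_loc):
    # Hypothetical inference flow mirroring the graphical abstract: one
    # preprocessed query image feeds the trained DRNet for both tasks.
    x = preprocess(query_image)           # e.g., resize and intensity-normalize
    od_mask = drnet_seg(x) > 0.5          # Task-1: binary OD segmentation mask
    od_hm, fovea_hm = drnet_loc(x)        # Task-2: OD and Fovea center heatmaps
    od_center = np.unravel_index(od_hm.argmax(), od_hm.shape)
    fovea_center = np.unravel_index(fovea_hm.argmax(), fovea_hm.shape)
    return od_mask, od_center, fovea_center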