Abstract
Deep convolutional neural networks have shown outstanding performance in medical image segmentation tasks. A common problem when training supervised deep learning methods is the lack of labeled data, which is time-consuming and costly to obtain. In this paper, we propose a novel uncertainty guided semi-supervised learning method, based on a student-teacher approach, for training the segmentation network using a limited number of labeled samples and a large number of unlabeled images. First, a teacher segmentation model is trained from the labeled samples using Bayesian deep learning. The trained model is used to generate soft segmentation labels and an uncertainty map for the unlabeled set. The student model is then updated using the softly labeled samples and the corresponding pixel-wise confidence of the segmentation quality, estimated from the uncertainty of the teacher model, using a newly designed loss function. Experimental results on a retinal layer segmentation task show that the proposed method improves the segmentation performance in comparison to the fully supervised approach and is on par with the expert annotator. The proposed semi-supervised segmentation framework is a key contribution and is applicable to biomedical image segmentation across various imaging modalities where access to annotated medical images is challenging.
1 Introduction
Segmentation of anatomical regions in biomedical images such as optical coherence tomography (OCT) retinal scans is of great clinical significance, especially for disease diagnosis, progression analysis and treatment planning. For example, the progressive thinning of circumpapillary retinal nerve fiber layer (cpRNFL) thickness measured by OCT can be used to predict visual functional loss in patients with glaucoma [6].
In the past few years, methods based on Convolutional Neural Networks (CNNs), such as Unet [9] and Dense-Unet [4, 7], have achieved remarkable performance gains in medical and natural image segmentation. These networks are trained end-to-end, pixels-to-pixels, and exceed previous state-of-the-art semantic segmentation methods without further machinery. For example, such networks have been used for retinal structure segmentation in fundus [8] and OCT [10, 11] images. Fully supervised segmentation algorithms of this kind require a large number of annotated images to achieve reasonable robustness and accuracy. However, acquiring pixel-wise ground truth annotations can be time-consuming and costly in the medical imaging domain, where only experts can provide reliable annotations. This under-supply of labeled data motivates the need for effective machine learning methods that require limited supervision, such as semi-supervised learning.
Semi-supervised learning tackles this problem by leveraging a large number of readily available unlabeled images along with the limited labeled data to improve performance. For example, semi-supervised approaches have been applied to different medical imaging tasks such as MRI segmentation [1], lung nodule detection, and retinal vessel segmentation [14]. In other approaches, [2] uses auxiliary manifold learning in the latent space for an MS lesion segmentation task, and [12] uses feature embeddings obtained from unlabeled images to segment the optic cup in retinal fundus images.
In this paper, we propose a novel semi-supervised approach that leverages unlabeled images to segment retinal layers in OCT images. The proposed method consists of two components: (a) a student segmentation network, which is responsible for learning a suitable data representation and the main segmentation task, and (b) a teacher segmentation network, which controls the learning of the student network by modeling the unreliability of the segmentation predictions. First, the teacher model is trained on the labeled set using Bayesian deep learning to capture the uncertainty map and is used to generate soft segmentation labels for the unlabeled samples. The uncertainty map indicates the pixel-wise unreliability of the soft labels. Based on the uncertainty map, we further propose a novel loss function that guides the student model by adaptively down-weighting regions with unreliable soft labels to improve the final segmentation performance. Our proposed algorithm has been applied to the task of retinal layer segmentation in OCT images of the optic nerve head. Experimental results indicate that our algorithm improves the segmentation accuracy compared to state-of-the-art fully supervised OCT segmentation methods and is on par with the human expert.
2 Proposed Semi-supervised Segmentation Method
In this section, we describe our proposed uncertainty guided semi-supervised learning. We assume that we are given a large set of unlabeled images \(D_u=\{\mathbf {x}_i\}\) and a small set of high quality labeled images \(D_l=\left\{ (\mathbf {x}_{i},\mathbf {y}_{i})\right\} \), where \(\mathbf {x}_i\) is an image and \(\mathbf {y}_i\) is its segmentation annotation. As shown in Fig. 1(a), our approach involves two kinds of deep neural networks: the teacher segmentation network \(F_T(\mathbf {x})\) and the student segmentation network \(F_S(\mathbf {x})\). The teacher network is trained on the labeled dataset using Bayesian deep learning to output both a segmentation map and an uncertainty map. The teacher network is then applied to each image in the unlabeled set to obtain the segmentation label and the associated uncertainty map, yielding the softly labeled samples. The uncertainty map captures pixel-wise confidence values indicating the reliability of the segmentation output of the teacher network, and it is used to guide the training of the student network, as described in Sect. 2.2.
We use the DenseUNet architecture [4, 7] as the base model for both the teacher and the student networks. As shown in Fig. 1(b), the model consists of three dense blocks in the encoder and decoder and a bottleneck dense block, with Unet-like skip connections between the outputs of the encoder dense blocks and the inputs of the decoder dense blocks. Each dense block contains four convolution units, each comprising a \(3\times 3\) convolution with 8 filters, batch normalization (BN) and a ReLU layer, where the output of each unit is fed to the subsequent ones. Each dense block therefore produces 32 feature maps. The final prediction layer is a convolution layer whose number of channels equals the number of classes C, followed by a softmax activation function. We use a SpatialDropout [13] layer before every convolution layer with a dropout rate of 0.2.
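To make the block structure concrete, the following is a minimal PyTorch sketch of one such dense block; the class name and the exact ordering of dropout, convolution, BN and ReLU inside a unit are our assumptions rather than the authors' released code.

```python
# A minimal sketch of one dense block: four convolution units, each producing
# 8 feature maps that are concatenated with the outputs of all preceding
# units, so the block emits 4 x 8 = 32 new feature maps. Dropout2d plays the
# role of SpatialDropout [13].
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_channels: int, growth: int = 8, n_units: int = 4,
                 drop_rate: float = 0.2):
        super().__init__()
        self.units = nn.ModuleList()
        ch = in_channels
        for _ in range(n_units):
            self.units.append(nn.Sequential(
                nn.Dropout2d(drop_rate),               # spatial dropout before each conv
                nn.Conv2d(ch, growth, kernel_size=3, padding=1),
                nn.BatchNorm2d(growth),
                nn.ReLU(inplace=True),
            ))
            ch += growth                               # each unit sees all previous outputs

    def forward(self, x):
        features = [x]
        for unit in self.units:
            out = unit(torch.cat(features, dim=1))     # feed all preceding feature maps
            features.append(out)
        return torch.cat(features[1:], dim=1)          # the 32 newly produced maps
```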
2.1 Teacher Segmentation Network as a Bayesian Model
We model the teacher segmentation network \(F_T(\mathbf {x})\) using Bayesian deep learning to capture, with respect to the labeled data, the segmentation uncertainties that guide the student model. We adopt the approach introduced in [3], based on dropout variational inference, to compute the segmentation uncertainty. First, we train \(F_T(\mathbf {x})\) on the labeled sample set \(D_l\) using the class-weighted categorical cross entropy loss. For segmentation and uncertainty quantification, we enable dropout in the test phase, and the output predictive distribution is obtained by performing K stochastic forward passes through the network, i.e., \(\mathbf {y}^{k}=F_T^k(\mathbf {x}), k=1,\cdots , K\), where \(F_T^k\) is the effective network after the spatial dropout operation. In each forward pass, a fraction of the convolution feature maps (given by the dropout rate) is disabled and the segmentation score is computed using only the remaining feature maps. The segmentation score vector \(\mathbf {\bar{y}}\) is obtained by averaging the K samples via Monte Carlo integration:
$$\begin{aligned} \mathbf {\bar{y}} = \frac{1}{K}\sum _{k=1}^{K} \mathbf {y}^{k} \end{aligned}$$
(1)
The average score vector contains the probability score for each class, i.e., \(\mathbf {\bar{y}} = [\bar{y}_1,\cdots ,\bar{y}_C ] \). The overall segmentation uncertainty for each pixel can be obtained by computing the entropy of the average probability vector:
$$\begin{aligned} u = -\sum _{c=1}^{C} \bar{y}_c \log \bar{y}_c \end{aligned}$$
(2)
Higher segmentation uncertainty is obtained when the network assigns high probabilities to different classes across the forward passes. Conversely, for confident predictions, the network assigns a high probability to the true class in every forward pass, resulting in a lower uncertainty value.
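The computation in Eqs. 1 and 2 can be sketched in a few lines of PyTorch, as below; the function name, the number of passes K = 20, and the assumption that the network returns logits of shape (B, C, H, W) are ours.

```python
# A sketch of the teacher's MC-dropout inference [3]: only the spatial
# dropout layers are kept active at test time, K stochastic forward passes
# are averaged (Eq. 1), and the per-pixel entropy of the averaged softmax
# gives the uncertainty map (Eq. 2).
import torch
import torch.nn.functional as F

@torch.no_grad()
def mc_dropout_predict(teacher: torch.nn.Module, x: torch.Tensor,
                       K: int = 20, eps: float = 1e-8):
    teacher.eval()
    for m in teacher.modules():
        if isinstance(m, torch.nn.Dropout2d):
            m.train()                                  # enable dropout only
    probs = torch.stack([F.softmax(teacher(x), dim=1) for _ in range(K)])
    y_bar = probs.mean(dim=0)                          # Eq. 1: MC average, (B, C, H, W)
    u = -(y_bar * (y_bar + eps).log()).sum(dim=1)      # Eq. 2: entropy map, (B, H, W)
    return y_bar, u
```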
2.2 Uncertainty Guided Learning of Student Network
Here, we describe the process of learning the student segmentation network \(F_s(\mathbf {x})\) from both unlabeled and labeled data with guidance from the teacher segmentation network \(F_T(\mathbf {x})\). We first apply \(F_T(\mathbf {x})\) to the unlabeled images \(\mathbf {x}_u \in D_u\) to obtain the soft segmentation map \(\mathbf {z}\) using Eq. 1 and the associated segmentation uncertainty map \(\mathbf {u}\) using Eq. 2. Higher values in the uncertainty map denote regions where the generated soft labels are likely to be incorrect and need to be down-weighted while updating \(F_s(\mathbf {x})\). We convert the uncertainty map \(\mathbf {u}\) to a normalized confidence map as:
where \(\alpha \) is a positive scalar hyper-parameter and the confidence map \(\varvec{\omega } \in [0,1] \) provides the pixel-wise quality of the soft labels produced by \(F_T(\mathbf {x})\), such that higher uncertainty values produce lower quality scores and vice versa. The unlabeled loss is then formulated as the confidence-weighted cross entropy:
where
such that \(\zeta _{c}\) weights the contribution of each class to mitigate the effect of class imbalance in the soft labels under the confidence weighting; \(Z_{c}\) denotes the pixel region of the \(c^{th}\) class in the soft label \(\mathbf {z}\), and \(z_{c}^{t}\) is the softmax output of \(F_s(\mathbf {x})\) for the \(t^{th}\) pixel and \(c^{th}\) class. Eq. 5 sets \(\zeta _{c}=0\) when the effective number of pixels per class \(\sum _{t \in Z_{c}} \omega _{t} \le P\), which improves the stability of the unlabeled loss when the majority of pixels in \(Z_c\) are uncertain. We empirically set \(P=50\) for our retinal segmentation task. Finally, the semi-supervised loss is the sum of the labeled and unlabeled losses, \(L_{semi} = L_{lab} + L_{unlab}\),
where \(L_{lab}\) is the categorical cross entropy computed from the labeled mini-batch samples. The training steps of our method are shown in Algorithm 1.
The proposed semi-supervised loss function encourages the network to discard pixels with inaccurate soft labels generated by \(F_T(\mathbf {x})\). The hyper-parameter \(\alpha \) in Eq. 3 controls the information flow from \(F_T(\mathbf {x})\) to \(F_s(\mathbf {x})\). Intuitively, a small \(\alpha \) allows the student to follow the teacher blindly, whereas a larger \(\alpha \) moderates the learning by giving more emphasis to the teacher's uncertainty. For example, setting \(\alpha =0\) is equivalent to using all the soft labels, whereas setting \(\alpha >0\) allows probabilistic selection of the soft labels that are more certain. We set the value of \(\alpha \) empirically using the validation set, as described in Sect. 3.
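Since the display forms of Eqs. 3–5 are not reproduced above, the Python sketch below uses plausible stand-ins: an exponential mapping \(\varvec{\omega } = \exp (-\alpha \mathbf {u})\) for the confidence map and inverse effective-frequency class weights for \(\zeta _c\). Both satisfy the stated properties (\(\omega \in [0,1]\), \(\alpha =0\) keeps all soft labels, \(\zeta _c=0\) when the effective pixel count is at most P), but they should be read as assumptions, not the authors' exact formulas.

```python
# A hedged sketch of the confidence-weighted, class-balanced unlabeled loss.
# The exponential confidence map and the 1/eff class weights are assumed
# forms consistent with the surrounding text.
import torch

def unlabeled_loss(student_probs, soft_labels, u, alpha=2.0, P=50, eps=1e-8):
    """student_probs, soft_labels: (B, C, H, W); u: (B, H, W) entropy map."""
    omega = torch.exp(-alpha * u)                # assumed form of Eq. 3, in (0, 1]
    hard = soft_labels.argmax(dim=1)             # pixel regions Z_c from the soft labels
    C = student_probs.shape[1]
    loss = student_probs.new_zeros(())
    for c in range(C):
        mask = (hard == c).float()               # indicator of Z_c
        eff = (omega * mask).sum()               # effective number of pixels for class c
        if eff <= P:                             # Eq. 5: zeta_c = 0, skip unstable class
            continue
        zeta_c = 1.0 / eff                       # assumed inverse effective-frequency weight
        ce = -(student_probs[:, c].clamp_min(eps).log() * omega * mask).sum()
        loss = loss + zeta_c * ce
    return loss
```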
We train the student network for 40000 iterations, or until the validation loss converges, using mini-batch gradient descent with the Adam optimizer with momentum and a batch size of 1 for each of the labeled and unlabeled components. The learning rate is set to \(10^{-5}\) and is decreased by a factor of ten after 10000 training iterations. We augment the training images and the corresponding label maps through mirror-image reflection and random rotation within the range of \([-15, 15]\) degrees.
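As a rough end-to-end illustration, the sketch below combines the labeled cross entropy with the unlabeled loss above in one training iteration, following the optimizer settings stated in this section; the data loaders and the choice to generate the teacher's soft labels on the fly (rather than precomputing them) are our assumptions about the plumbing around Algorithm 1.

```python
# One training iteration of the student: labeled cross entropy plus the
# uncertainty-weighted unlabeled loss. Adam with lr = 1e-5, decayed by 10x
# after 10000 iterations; each loader is assumed to yield batches of size 1,
# with y_l given as integer class indices of shape (B, H, W).
import torch
import torch.nn.functional as F

optimizer = torch.optim.Adam(student.parameters(), lr=1e-5)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10000, gamma=0.1)

for it, ((x_l, y_l), x_u) in enumerate(zip(labeled_loader, unlabeled_loader)):
    if it >= 40000:                                      # 40000 iterations, as in the text
        break
    z, u = mc_dropout_predict(teacher, x_u)              # soft labels + uncertainty (Sect. 2.1)
    loss_lab = F.cross_entropy(student(x_l), y_l)        # labeled loss L_lab
    p_u = F.softmax(student(x_u), dim=1)
    loss = loss_lab + unlabeled_loss(p_u, z, u, alpha=2.0)   # L_semi = L_lab + L_unlab
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```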
3 Experiments
The dataset consists of 570 spectral-domain optical coherence tomography (OCT) optic nerve volumes acquired using a commercial OCT device (Cirrus HD-OCT; Zeiss). Each OCT volume consists of 200 B-scans of size \(1024 \times 200\). We take 700 B-scans sampled from 70 OCT volumes to create a labeled set, where the ground truth has been obtained by manual annotation of the nine boundaries of eight retinal layers [5]. We convert the layer boundaries to a probability map for the eight layer regions and the background region. Therefore, the number of classes is \(C=9\).
Out of the 700 labeled images, we select 490 images from 49 volumes as the labeled training set, 70 images from 7 volumes as the validation set, and 140 images from 14 volumes as the test set. Besides the annotations of expert E1 on the 700 labeled images, we also obtained a second set of annotations for the test set from a second expert E2 to compare with E1. We then use 10000 B-scans sampled from the remaining 500 volumes as the unlabeled set.
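For illustration, a small numpy sketch of the boundary-to-label conversion is given below; the annotation layout (one row index per boundary per A-scan column, boundaries ordered top to bottom) is an assumption about the format of [5].

```python
# Turn nine per-column boundary positions into a dense label map with eight
# layer classes plus background (C = 9): pixels between consecutive
# boundaries get the layer label, everything else stays background.
import numpy as np

def boundaries_to_labels(boundaries: np.ndarray, height: int) -> np.ndarray:
    """boundaries: (9, W) row indices, top to bottom; returns (H, W) labels,
    0 = background, 1..8 = retinal layers."""
    n_b, width = boundaries.shape
    rows = np.arange(height)[:, None]                  # (H, 1), broadcasts against (W,)
    labels = np.zeros((height, width), dtype=np.int64)
    for layer in range(n_b - 1):                       # 8 layers between 9 boundaries
        top, bottom = boundaries[layer], boundaries[layer + 1]
        labels[(rows >= top) & (rows < bottom)] = layer + 1
    return labels
```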
We compare our proposed uncertainty guided semi-supervised layer segmentation (U-SLS) method with the baseline fully supervised Dense-Unet (FS-DU) [4], which does not take unlabeled images into account, and with a plain semi-supervised learning (Plain-SL) method that blindly follows the teacher model without taking uncertainty into account, i.e., the case \(\alpha =0\) in Eq. 3. Figure 2(b)–(d) shows examples of soft labels and uncertainty maps produced by the teacher model \(F_T(\mathbf {x})\) on an unlabeled image. It can be seen that a teacher model trained on fewer labeled images produces less accurate soft labels, and that the corresponding uncertainty map correlates with the inaccuracies in the generated soft labels. For our method U-SLS, we set the optimal value \(\alpha =2\) using the validation set, as shown in Fig. 2(e).
Table 1 compares the average Dice coefficient (DC) between the ground truth and the segmentations generated by the proposed U-SLS, Plain-SL, and the fully supervised Dense-Unet. When trained on the full labeled training set and the unlabeled set, U-SLS resulted in an average DC of 0.90 for the RNFL layer and 0.82 across all eight layers, improving over both FS-DU and Plain-SL. When we reduce the number of labeled training samples, the improvement of the proposed method over both FS-DU and Plain-SL is even more pronounced. This demonstrates that the proposed approach improves segmentation performance when the number of labeled images is limited. Moreover, the lower performance of Plain-SL shows that the student model is corrupted by the soft labels when uncertainty is not taken into account, whereas U-SLS improves performance through uncertainty guided learning from the unlabeled samples.
Table 2 compares the performance of our method with that of the human annotator E2. The performance of our method is on par with the human expert for most of the layers, including RNFL, GCL+IPL, INL, ONL and OS. We also report a confident version of our method (U-SLS-Conf), evaluated on the pixels whose confidence score \(\omega > 0.5\) (computed using Eq. 3), which comprise \(95\%\) of the total number of pixels on average. As expected, U-SLS-Conf significantly improves over U-SLS, which shows that the uncertainty measure produced by our method correlates strongly with the segmentation inaccuracies.
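As an illustration of this evaluation protocol, the sketch below computes a per-layer Dice coefficient and the confident variant restricted to pixels with \(\omega > 0.5\); the function is our reading of the protocol, not released evaluation code.

```python
# Per-class Dice between an integer prediction map and the ground truth;
# pass mask = (omega > 0.5) to reproduce the U-SLS-Conf variant.
import numpy as np

def dice_per_class(pred: np.ndarray, gt: np.ndarray, c: int,
                   mask: np.ndarray = None) -> float:
    p, g = (pred == c), (gt == c)
    if mask is not None:                       # restrict to confident pixels
        p, g = p & mask, g & mask
    denom = p.sum() + g.sum()
    return 2.0 * np.logical_and(p, g).sum() / denom if denom > 0 else float('nan')
```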
Figure 3 (left) shows examples of the retinal layer segmentations and the generated uncertainty maps. Figure 3 (right) shows the precision-recall curve for the RNFL layer, comparing our method with the human annotator. It shows that the performance of U-SLS is comparable with the human expert, whereas the confident version, U-SLS-Conf, exceeds the human expert.
4 Conclusion
In this paper, we presented a novel and effective semi-supervised method, based on a student-teacher framework, for the segmentation of retinal OCT images. The proposed method leverages a large volume of unlabeled noisy data and incorporates uncertainty for improved segmentation of retinal structures, compared to state-of-the-art fully supervised and semi-supervised segmentation methods. We have demonstrated that the proposed uncertainty guided method can effectively transfer knowledge from the teacher to the student model for the segmentation task and can generate expert-level segmentations using a limited number of labeled samples. Our approach is therefore useful in clinical applications where access to the large volumes of annotated images needed by state-of-the-art fully supervised approaches is challenging. Although we have applied our approach to retinal image segmentation, we believe that our method is equally applicable to other modalities.
References
Bai, W., et al.: Semi-supervised learning for network-based cardiac MR image segmentation. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (eds.) MICCAI 2017. LNCS, vol. 10434, pp. 253–260. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66185-8_29
Baur, C., Albarqouni, S., Navab, N.: Semi-supervised deep learning for fully convolutional networks. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (eds.) MICCAI 2017. LNCS, vol. 10435, pp. 311–319. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66179-7_36
Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: ICML, pp. 1050–1059 (2016)
Jégou, S., Drozdzal, M., Vázquez, D., Romero, A., Bengio, Y.: The one hundred layers tiramisu: fully convolutional densenets for semantic segmentation. In: CVPR Workshops, pp. 1175–1183 (2017)
Lang, A., et al.: Retinal layer segmentation of macular OCT images using boundary classification. Biomed. Opt. Express 4(7), 1133–1152 (2013)
Leung, C.K., Cheung, C.Y., Weinreb, R.N., Qiu, K., Liu, S.: Evaluation of retinal nerve fiber layer progression in glaucoma: a study on optical coherence tomography guided progression analysis. Invest. Ophthalmol. Vis. Sci. 51(1), 217–222 (2010)
Li, X., Chen, H., Qi, X., Dou, Q., Fu, C., Heng, P.: H-DenseUNet: hybrid densely connected unet for liver and tumor segmentation from CT volumes. IEEE Trans. Med. Imaging 37(12), 2663–2674 (2018)
Maninis, K.-K., Pont-Tuset, J., Arbeláez, P., Van Gool, L.: Deep retinal image understanding. In: Ourselin, S., Joskowicz, L., Sabuncu, M.R., Unal, G., Wells, W. (eds.) MICCAI 2016. LNCS, vol. 9901, pp. 140–148. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46723-8_17
Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Roy, A.G., et al.: ReLayNet: retinal layer and fluid segmentation of macular optical coherence tomography using fully convolutional networks. Biomed. Opt. Express 8(8), 3627–3642 (2017)
Sedai, S., Antony, B., Mahapatra, D., Garnavi, R.: Joint segmentation and uncertainty visualization of retinal layers in optical coherence tomography images using Bayesian deep learning. In: Stoyanov, D., et al. (eds.) OMIA/COMPAY -2018. LNCS, vol. 11039, pp. 219–227. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00949-6_26
Sedai, S., Mahapatra, D., Hewavitharanage, S., Maetschke, S., Garnavi, R.: Semi-supervised segmentation of optic cup in retinal fundus images using variational autoencoder. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (eds.) MICCAI 2017. LNCS, vol. 10434, pp. 75–82. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66185-8_9
Tompson, J., Goroshin, R., Jain, A., LeCun, Y., Bregler, C.: Efficient object localization using convolutional networks. In: CVPR, pp. 648–656 (2015)
You, X., Peng, Q., Yuan, Y., Cheung, Y., Lei, J.: Segmentation of retinal blood vessels using the radial projection and semi-supervised approach. Pattern Recogn. 44(10–11), 2314–2324 (2011)