
1 Introduction

Dysplasia is characterized by alterations in cell features such as size, shape and brightness intensity. In developing countries, this anomaly is a common type of pre-cancerous lesion, classified as mild, moderate or severe [5]. Cancer is the second most common cause of death, and severe dysplasias have a 3% to 36% chance of progressing to this type of malignant lesion [12]. Usually, the diagnosis of these lesions is performed by analysing the size of the lesion and the intensity of morphological alterations in tissue nuclei. However, lesions of different sizes may have similar levels of nuclear alterations [12]. Thus, pathologists may assign different levels of dysplasia to lesions with similar levels of alteration [17]. Moreover, several malignant lesions are identified only by the intensity of nuclei alterations [12].

With the advance of digital systems applied to microscopes for histological analysis, specialists can obtain data that allow investigation using computational algorithms. The development of digital analysis tools can assist pathologists in decision making and reduce the error rate caused by subjectivity [3]. These systems, known as computer-aided diagnosis (CAD) systems, provide quantitative analysis of large amounts of data and features [7]. They comprise a set of steps ranging from signal-to-noise ratio improvement to segmentation, feature extraction and data classification. Segmentation is an important step that identifies the objects to be analyzed by feature descriptors and classified in subsequent steps [7]. This stage is mainly performed based on either discontinuity or similarity of pixel brightness intensity. Discontinuity-based segmentation detects abrupt variations of pixel intensity to delimit region borders. Similarity-based segmentation divides an image into regions with similar features according to established criteria [7].

In the literature, several studies investigate the segmentation of cellular structures of malignant lesions using computer vision algorithms that employ thresholding techniques, region merging and semantic information [11, 15]. With the goal of identifying nuclei in epithelial breast tissues, the authors in [10] proposed a method that employs color deconvolution and Otsu thresholding for nuclei segmentation. In [1], the K-means method was used to segment and identify cells in lymphoma images. In [15], the authors proposed a method that segments cell nuclei using neural networks for semantic segmentation, where each pixel is assigned to a class region present in the image. In the context of histological images, segmentation of epithelial nuclei is a complex task due to irregular characteristics shown by nuclei, such as dye variation. These nuclei may have color and appearance similar to other structures present in the tissue [11], which makes nuclei segmentation a major challenge in this area. In the case of dysplasia, this process can be even more complex due to the growth of the connective tissue, which can invade the epithelial tissue and hinder nuclei segmentation [12]. The proposals described above have not yet considered a method focused on the segmentation of epithelial nuclei as proposed in this work.

This work proposes a segmentation method based on region-based convolutional neural networks (R-CNN) to identify cell nuclei present in oral histological tissues. First, individual nuclei masks were generated to train the network. In the training step, bounding boxes were defined for candidate objects and combined with the masks. In the segmentation step, the R-CNN classifies each image pixel as nucleus or background based on its neighborhood. In the post-processing step, morphological dilation and erosion operations were combined with hole filling and small object removal to reduce false negatives and false positives. A dataset of mice images was used to evaluate the method. The results were compared against the gold standard and against methods used in the literature.

This paper is organized as follows. In Sect. 2 we detail the dataset, the methods used to train the network, the segmentation, the post-processing and the method for quantitative analysis. The obtained results and the comparison with other segmentation methods from the literature are shown in Sect. 3. Sect. 4 presents discussions and our conclusions.

2 Methodology

2.1 Image Dataset

The dataset was built from tongue slides of 30 mice previously submitted to a carcinogen during two experiments performed between 2009 and 2010, duly approved by the Ethics Committee on the Use of Animals under protocol number 038/09 at the Federal University of Uberlândia, Brazil.

The histological images were obtained using a Leica DM500 light microscope with a magnification of 400x and saved in TIFF format using the RGB color model at a resolution of \(2048 \times 1536\) pixels. In total, 43 images were obtained and, with the aid of a pathologist, classified into healthy tissue, mild dysplasia, moderate dysplasia and severe dysplasia. After digitization, these images were cropped into regions of interest (ROI) with a size of \(450 \times 250\) pixels, totalling 296 ROI, 74 for each class. In Fig. 1, some cases of dysplasia and a healthy region of histological images of the buccal cavity are shown. The lesions were manually marked by a specialist (gold standard) and automatically analyzed by the proposed method.

Fig. 1.

Histological images of oral epithelial tissues: (a) healthy tissue, (b) mild oral dysplasia, (c) moderate dysplasia and (d) severe dysplasia.

2.2 Segmentation via Mask R-CNN

The Mask R-CNN model is based on the Faster R-CNN model, which has two stages: the first is the region proposal network (RPN), which proposes bounding boxes for candidate objects [14]; in the second stage, bounding box regression is used to refine the area of the boxes [8].

Then, in the training step, the loss function is applied, defined by:

$$\begin{aligned} L = L_{cls} + L_{box} + L_{mask} \end{aligned}$$
(1)

In this context, we have \(L_{cls} = -\log p_u\), where p is the probability distribution of each ROI over the \(K+1\) possible classes and u is the gold-standard class of the ROI [6]. The bounding box loss is defined by:

$$\begin{aligned} L_{box}(t^u, v) = \sum _{i \in (x,y,w,h)} smooth_{L_{1}}(t_i^u - v_i), \end{aligned}$$
(2)

where:

$$\begin{aligned} smooth_{L_{1}}(x) = {\left\{ \begin{array}{ll} 0.5x^2, &{} \text {if } |x| < 1\\ |x| - 0.5, &{} \text {otherwise,} \end{array}\right. } \end{aligned}$$
(3)

where \(t^u\) is the regression of the boxes, v is the gold standard of the box regression, x and y are the coordinates of the upper left corner of each ROI, and h and w are height and width information for the region [6].
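As a concrete illustration, the box regression loss of Eqs. 2 and 3 can be sketched in a few lines of NumPy (the function names are ours, not those of a specific framework):

```python
import numpy as np

def smooth_l1(x):
    """Elementwise smooth L1 (Eq. 3): quadratic near zero, linear elsewhere."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < 1.0, 0.5 * x ** 2, np.abs(x) - 0.5)

def box_loss(t_u, v):
    """L_box (Eq. 2): sum of smooth L1 over the (x, y, w, h) offsets."""
    residual = np.asarray(t_u, dtype=float) - np.asarray(v, dtype=float)
    return float(np.sum(smooth_l1(residual)))
```

Small residuals are penalized quadratically while large ones grow only linearly, which makes the regression less sensitive to outliers than a pure L2 loss.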

The output of this step is a set of masks with size \(Km^2\), containing K binary masks of resolution \(m \times m\), one for each of the K classes. In this study, the \(L_{mask}\) adopted was the average binary cross-entropy loss, as described in the work of [8]. The main reason for using masks is that they allow the spatial representation of the objects. This information is extracted by pixel-by-pixel correspondence through convolution operations.
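For reference, the average binary cross-entropy used as \(L_{mask}\) can be written as follows (a minimal NumPy sketch; in the Mask R-CNN formulation it is applied only to the mask of the ground-truth class):

```python
import numpy as np

def mask_loss(pred, gt, eps=1e-7):
    """Average binary cross-entropy between predicted per-pixel mask
    probabilities and the gold-standard binary mask (L_mask)."""
    pred = np.clip(np.asarray(pred, dtype=float), eps, 1.0 - eps)
    gt = np.asarray(gt, dtype=float)
    return float(-np.mean(gt * np.log(pred) + (1 - gt) * np.log(1 - pred)))
```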

In this work, Mask R-CNN was applied with the ResNet-50 convolutional neural network model [9]. This network has 50 layers arranged in the following structure: an input layer; 16 blocks of convolutional layers organized into 4 groups called B1, B2, B3 and B4; and an output layer. The network structure can be seen in Fig. 2. Each block has 3 layers, with convolutions of sizes \(1 \times 1\), \(3 \times 3\) and \(1 \times 1\), respectively. The \(1 \times 1\) convolutions are responsible for reducing and restoring dimensions, leaving the \(3 \times 3\) convolution with smaller input and output dimensions. Between the first layer and block B1, a max pooling filter with a \(3 \times 3\) dimension and stride of 2 is applied, reducing the size of the input by half. To halve the input size between groups, the first layer of groups B2, B3 and B4 performs its convolution with a stride of 2. In the final step of the network, an average pooling filter and a fully connected layer are used for object classification. The network uses the rectified linear unit (ReLU) activation function.
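The successive halving of the feature maps described above can be checked with simple output-size arithmetic (a sketch assuming a \(224 \times 224\) input, the standard ImageNet resolution, and the usual \(7 \times 7\) stride-2 initial convolution of ResNet; neither value is stated explicitly in the text):

```python
def conv_out(size, kernel, stride, pad):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1

size = conv_out(224, kernel=7, stride=2, pad=3)    # initial conv  -> 112
size = conv_out(size, kernel=3, stride=2, pad=1)   # 3x3 max pool  -> 56
for group in ("B2", "B3", "B4"):                   # stride-2 first layer
    size = conv_out(size, kernel=1, stride=2, pad=0)
print(size)  # feature map side length before average pooling
```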

Fig. 2.

Proposed method workflow to illustrate the steps used for segmentation of histological images.

Given the set of 160 ROI obtained from the 4 classes of histological images of the oral cavity, 40 ROI were employed for network training. In this stage, 32 ROI were used to adjust the weights of the network and 8 ROI to evaluate each epoch. With the help of a pathologist, 1,220 individual nucleus masks were obtained from these ROI. Since the Mask R-CNN maps the masks over the ROI to extract nuclei features, our set consists of 1,220 nucleus masks, 1,027 for the training set and 193 for the test set. According to the authors in [14], since the RPN has an order of magnitude fewer parameters, it has less risk of overfitting on small datasets. Also, using the ResNet-50 model instead of ResNet-101 reduces overfitting overall, as explained by the authors in [9]. The network was pre-trained on the ImageNet dataset and fine-tuned using our dataset. For the training, a batch size of 9 masks and a learning rate of 0.001 were defined. The SGD optimizer was used with a momentum of 0.9. The fine-tuning of the network was performed using 40 epochs with 142 iterations each, where an iteration corresponds to one batch passed to the network. These parameters were empirically defined for the analyzed dataset. After this stage, the remaining 120 ROI were employed in the network evaluation stage. The CNN was trained on a computer with an eight-core AMD FX-8320 processor, 8 GB of RAM and an Nvidia GTX-1060 GPU with 6 GB of VRAM, using the TensorFlow library in the Python language.
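The training hyperparameters above can be summarized as a configuration fragment (the dictionary keys are illustrative, not the API of any specific Mask R-CNN implementation):

```python
# Values taken from the text; key names are illustrative only.
TRAIN_CONFIG = {
    "backbone": "resnet50",           # chosen over resnet101 to reduce overfitting
    "pretrained_weights": "imagenet",
    "batch_size": 9,                  # masks per batch
    "learning_rate": 0.001,
    "optimizer": "sgd",
    "momentum": 0.9,
    "epochs": 40,
    "iterations_per_epoch": 142,      # batches passed to the network per epoch
}
```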

2.3 Post-processing

The segmentation step results in a binary image with the regions of the identified nuclei. This image may contain incomplete regions and small artifacts. To fill the incomplete nuclei, the concept of morphological closing was applied. First, in order to close the contours of these nuclei, a dilation operation was performed using a cross-shaped structuring element with a size of \(3 \times 3\) pixels. Then, a hole-fill function was applied to the binary objects. Next, an erosion operation with the same structuring element was employed to eliminate noise still present in the images. Finally, objects with an area smaller than 30 pixels were classified as background.
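Assuming the binary segmentation output is available as a NumPy array, this post-processing sequence can be sketched with SciPy's morphology routines (a sketch of the steps described above; the original implementation may differ):

```python
import numpy as np
from scipy import ndimage

def post_process(mask, min_area=30):
    """Dilation with a 3x3 cross, hole filling, erosion with the same
    element, and removal of objects smaller than min_area pixels."""
    cross = ndimage.generate_binary_structure(2, 1)  # 3x3 cross-shaped element
    mask = ndimage.binary_dilation(mask, structure=cross)
    mask = ndimage.binary_fill_holes(mask)
    mask = ndimage.binary_erosion(mask, structure=cross)
    labels, n = ndimage.label(mask)
    areas = ndimage.sum(mask, labels, range(1, n + 1))
    keep = [i + 1 for i, a in enumerate(areas) if a >= min_area]
    return np.isin(labels, keep)
```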

2.4 Evaluation Metrics

The evaluation of a segmentation method can be performed by calculating the overlap between the regions of the segmented image and the regions of a reference image demarcated by a specialist (gold standard) [4]. In this stage, 30 histological ROI of each class were randomly chosen and first segmented by specialists to define the gold standard. Then, the following metrics were considered: accuracy (\(A_{CC}\)), sensitivity (\(S_E\)), specificity (\(S_P\)), correspondence rate (\(C_R\)) and Dice coefficient (\(D_C\)) [2, 13, 16].

The values of \(S_E\) and \(S_P\) were used to determine the proportion of pixels correctly marked as objects and as background, respectively. The metric \(A_{CC}\) measures the proportion of true positives and true negatives relative to all positives and negatives. The \(C_R\) metric evaluates the correspondence between the obtained result and the gold standard. Finally, the \(D_C\) measure was used to evaluate the similarity between the gold standard and the result.
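Given binary masks for the result and the gold standard, \(A_{CC}\), \(S_E\), \(S_P\) and \(D_C\) follow directly from the pixel-level confusion counts, as sketched below (\(C_R\) is omitted, since its exact formulation follows [16]):

```python
import numpy as np

def evaluate(seg, gold):
    """Pixel-level overlap metrics between a binary segmentation
    and the gold-standard binary mask."""
    seg, gold = np.asarray(seg, bool), np.asarray(gold, bool)
    tp = np.sum(seg & gold)      # object pixels correctly marked
    tn = np.sum(~seg & ~gold)    # background pixels correctly marked
    fp = np.sum(seg & ~gold)
    fn = np.sum(~seg & gold)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "dice": 2 * tp / (2 * tp + fp + fn),
    }
```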

3 Experimental Results

The results of the proposed segmentation method for images of mild, moderate and severe dysplasia are shown in Fig. 3. In an evaluation of the images of the mild class (see Figs. 3a, d and g), it can be noted that the method was able to detect and segment the regions with the presence of nuclei. In these figures, the red arrows indicate objects detected as false positives in relation to the marking made by the specialist (Fig. 3d). There are also regions with false negatives, that is, nuclei eliminated by the segmentation process (see the green arrows). However, the method was able to segment obscured and difficult-to-identify nuclei, obtaining close similarity to the gold standard, as seen in the region marked in blue in Fig. 3g. Similarly, the images of the moderate and severe dysplasia classes (see Figs. 3h and i) also showed some regions with false positives and false negatives.

Fig. 3.

Histological images of the oral cavity with the presence of dysplasia: (a) mild, (b) moderate and (c) severe. The regions marked by the pathologist: (d) marking of mild dysplasia, (e) marking of moderate dysplasia and (f) marking of severe dysplasia. Segmentation with the proposed method: (g) result for mild dysplasia, (h) result for moderate dysplasia and (i) result for severe dysplasia. (Color figure online)

To further investigate the method, algorithms based on semantic segmentation with SegNet [15], EM-GMM and K-means [11] were also applied to the image dataset. Some of these methods are extensively used as baselines for histological component segmentation [11]. The EM-GMM and K-means algorithms were performed using a cluster number of \(k=3\). The SegNet was trained using the same 10 ROI used for the proposed method and 10 ground-truth masks, each one containing all nuclei of the ROI.
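For reference, intensity-based K-means clustering with \(k=3\) can be sketched as follows (a minimal Lloyd's-algorithm sketch on gray levels with a deterministic quantile initialization, which is our choice and not necessarily that of [11]):

```python
import numpy as np

def kmeans_segment(gray, k=3, n_iter=20):
    """Cluster pixel intensities into k groups and return the label map,
    relabeled so that 0 is the darkest cluster (hematoxylin-stained
    nuclei are typically darker than the surrounding tissue)."""
    x = np.asarray(gray, dtype=float).ravel()
    centers = np.quantile(x, np.linspace(0, 1, k))  # deterministic init
    for _ in range(n_iter):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean()
    order = np.argsort(centers)        # sort clusters by mean intensity
    remap = np.empty(k, dtype=int)
    remap[order] = np.arange(k)
    return remap[labels].reshape(np.asarray(gray).shape)
```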

Figures 4a–c, d–f and g–i show the results of the EM-GMM, K-means and SegNet methods, respectively. These results also show flaws in relation to false positive regions (red arrows) and false negative regions (green arrows). In addition, there are results in which the nuclei structures were severely degraded (see Figs. 4a, h and i), where part of these structures was eliminated.

Fig. 4.

Results with the compared methods. EM-GMM algorithm: (a) mild dysplasia, (b) moderate dysplasia, and (c) severe dysplasia. The images after application of K-means: (d) mild dysplasia, (e) moderate dysplasia and (f) severe dysplasia. Segmentation with SegNet: (g) mild dysplasia, (h) moderate dysplasia and (i) severe dysplasia. (Color figure online)

The performance of the proposed method for tissue segmentation is shown in Table 1. In images with severe dysplasia, the \(S_E\) and \(A_{CC}\) values were smaller than those of the other lesion classes. This shows that the method has greater difficulty in identifying the nuclei of this class. This may occur because some nuclei of this class have high-intensity morphological alterations, which makes it difficult for the algorithm to identify them, classifying them as background regions [12].

Table 1. Evaluation of the proposed method in tissues with different levels of dysplasia.

The results of the algorithms from the literature are presented in Table 2. The proposed method obtained relevant results in relation to these methods (\(A_{CC} = 89.52\pm 0.04\% \), \(C_R = 0.76\pm 0.10\) and \( D_C = 0.84\pm 0.06\)). The K-means method obtained \(A_{CC} = 77.32\pm 0.05\%\), a difference of 12% in relation to the proposed method. The SegNet algorithm obtained \(C_R = 0.40\,\pm \,0.24\) and \(D_C = 0.60\,\pm \,0.16\), being 37% and 23% lower than the results of the proposed method. This behavior of lower results can also be noticed in those presented by the EM-GMM method (\(A_{CC} =72.91\pm 0.15\) and \(C_R = 0.35\pm 0.30\)).

Table 2. Comparison among the method and techniques presented in literature.

4 Conclusions

In this study, we put forward an approach for the automatic segmentation of nuclei in oral epithelial tissue images. In the literature, methods for the segmentation of oral dysplastic images are not yet widely explored, and this solution makes a contribution to specialists in the field. This work presented a segmentation algorithm for images of oral dysplasias based on a deep learning approach. The EM-GMM, K-means and SegNet methods were applied to images of the dataset for comparison purposes. Through qualitative and quantitative analyses, the combination of algorithms used in this approach reached more effective results than the compared techniques. In future studies, pre-processing algorithms such as color normalization will be investigated, aiming to improve the images in the initial processing stage.