Received: November 13, 2022. Revised: December 21, 2022.

Optimization Contrast Enhancement and Noise Reduction for Semantic Segmentation of Oil Palm Aerial Imagery

Maura Widyaningsih1,2, Tri Kuntoro Priyambodo3*, Muhammad Kamal4, Moh Edi Wibowo3

1 Doctoral Program, Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences, Gadjah Mada University, Yogyakarta, Indonesia
2 Department of Informatics Engineering, STMIK Palangkaraya, Palangka Raya, Indonesia
3 Department of Computer Science and Electronics, Faculty of Mathematics and Natural Sciences, Gadjah Mada University, Yogyakarta, Indonesia
4 Department of Geographic Information Science, Faculty of Geography, Gadjah Mada University, Yogyakarta, Indonesia
* Corresponding author's Email: mastri@ugm.ac.id

Abstract: Contrast improvement and noise reduction are needed to improve aerial images, which are prone to interference during data collection. Contrast limited adaptive histogram equalization (CLAHE) is a development of histogram equalization (HE) that is commonly used for contrast improvement, and median blur (MB) is a commonly used method for noise reduction. Combining these two techniques helps optimize pre-processing before semantic segmentation analysis is carried out on image maps. Testing experiments with U-Net VGG-16 and VGG-19 on image maps show a detailed representation of the predicted pixel classes. Comparison with state-of-the-art methods shows that the proposed contrast enhancement and noise reduction outperform previous methods. The highest average result for the combination CLAHE+MB was 76.5 with U-Net VGG-16 and 73.8 with VGG-19, and the highest accuracy for a single image sample was 87.94 with U-Net VGG-16.

Keywords: Noise reduction, Aerial imagery, Semantic segmentation.

1. Introduction

A supply of quality data supports the semantic segmentation process in producing map images. Enhancement aims to improve image quality during pre-processing for tasks such as segmentation, feature extraction, and image analysis [1, 2], and to make the image more informative [3]. The purpose of optimization during pre-processing is to preserve the image's edges, structure, and detail [4, 5]. A good map image can represent the results of segmenting different objects in a vegetation area, and its success is also influenced by the improvements made during pre-processing.

Aerial image interference can be caused by capture equipment of varying quality, for example, drones/unmanned aerial vehicles (UAVs) with differing capabilities. This can lead to overlap and loss of quality, such as defects (noise), contrast disturbance, loss of sharpness, and blur [6]. Other causes include limited user experience and inappropriate data collection times [7], along with the effects of light, weather, and viewing angle. These factors must therefore be addressed during pre-processing to improve image quality. The remaining pre-processing challenge is that the separation of objects must stay consistent with the resulting object segmentation representation. The difficulty of object separation during class prediction is that the images have considerable intra-class variability due to large-scale object/scene variations, which degrades class-separation results.
The histogram equalization (HE) technique has so far been used to overcome these challenges in aerial imagery, but it can still introduce image artifacts [1]. HE is the simplest method to adjust or enhance color contrast globally [8] by redistributing the histogram of pixel intensities. CLAHE is also an approach to reduce noise and increase contrast in images [9], although it can produce artificial boundaries [10]. CLAHE can also account for feature brightness [11] and reduce noise artifacts [12] that occur during aerial image acquisition. For noise reduction, median blur (MB) is among the most effective methods for removing image noise such as salt-and-pepper interference [4].

Color image segmentation is a color-space-dependent procedure; choosing a suitable color space improves the performance of segmentation and feature compatibility, in addition to applying contrast correction and noise reduction. The choice of color space also reduces the complexity of the segmentation process. The proposed method uses YUV color space conversion to emphasize intensity in segmentation correction.

The proposed approach to improving image quality uses contrast enhancement and noise reduction: histogram equalization (HE) and contrast-limited adaptive histogram equalization (CLAHE), each combined with median blur (MB) noise reduction. Combining these image improvement methods, together with distribution into the YUV color space, aims to optimize pixel characteristics for predictive learning on the proposed network and so produce a good map image. The segmentation method is based on pixel classification using the U-Net VGG-16 and VGG-19 architectures in Keras, which process the enhanced images.

Semantic segmentation generates map images using deep learning (DL) methods such as the fully convolutional network (FCN). The FCN extends the CNN framework to represent images on a map through a segmentation process [13, 14]. A CNN can build a good model by tuning the model's hyperparameters and defining the network structure [15, 16]. The modified CNN follows the general U-Net form with encoder and decoder layers. This segmentation network annotates pixels in the pixel-labeling process, so that the final labeling result is an image map.

The proposed image quality improvement combines HE+MB and CLAHE+MB with YUV color space processing to improve segmentation results. The correction improves the pixel intensities of aerial images, helping pixels to be assigned their class during segmentation. The proposed enhancement method is compared, in terms of segmentation accuracy, with other state-of-the-art methods: the primary contrast enhancement methods HE and CLAHE [17], noise reduction with MB [18], dynamic clip limit window size histogram equalization (DCWHE) [1], and adaptive gamma correction with weighted histogram distribution (AGCWHD) [19]. The segmentation in this experiment uses VGG-16 and VGG-19 in Keras. Besides comparing segmentation testing accuracy, the segmentation representations are also compared with ground-truth images.
Ground-truth images are produced by conventional, human-assisted annotation, labeling pixels with tools. The purpose of comparing the segmentation representations between annotation carried out by humans and by machine learning is to provide recommendations for supplying map images that meet the needs of further analysis over a large area. Conventional human pixel labeling has several weaknesses: the annotation must be thorough and takes a long time, and it cannot be done automatically, since the work goes through long process stages with tools for image quality improvement, annotation, and segmentation. The labeling tools are also licensed and require funding, and annotation errors may occur when marking pixels. In this study, therefore, the authors offer a segmentation process using the U-Net shape to produce map images.

2. Image quality enhancement method

Aerial images are susceptible to noise, contrast problems, blur, overlap, and differences in image size. Pre-processing supports good data representation for producing image maps in the segmentation process, improving the data and reducing poor predictions. The noise reduction approach corrects aerial imagery for noise caused by shooting, such as image blur, while contrast enhancement helps distribute the image's color intensity more precisely. One effective aerial imagery enhancement solution introduces enhanced local detail lighting [19, 20].

2.1 Contrast correction method

2.1.1. Histogram equalization (HE)

HE evenly distributes the most frequent pixel values (those with the largest counts in the histogram); apart from spreading the intensities, the density function also increases contrast in the gradient domain and then equalizes to a single value to adjust brightness [19]. Histogram equalization works well when the image histogram is confined to a particular region. HE changes the intensity values of the image so that their distribution is uniform, mapping each pixel's intensity level to a new level through a transformation function. HE performs poorly where intensity varies considerably and the histogram covers a large extent, i.e., where both light and dark pixels are present. HE starts by processing each pixel channel to correct contrast, first computing the histogram, or probability density function (PDF), distribution as [11]:

h(i) = n(i)  and  p(i) = h(i) / N    (1)

where i is the pixel intensity index, p(i) is the probability of each intensity index, n(i) is the number of pixels in the image with intensity i, and N is the image size expressed by height and width. Then the cumulative distribution function (CDF) is calculated:

C(i) = Σ_{j=0}^{i} p(j)    (2)

I_E = (L - 1) × C(i)    (3)

The final step of the HE process improves the pixel intensities of the image, where I_E is the enhanced intensity and L is the range of the color channel, with intensities between 0 and 255 expressed in each pixel.
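To make Eqs. (1)-(3) concrete, the following is a minimal NumPy sketch of HE applied to one 8-bit channel; the function name and layout are ours, for illustration only:

```python
import numpy as np

def equalize_histogram(channel: np.ndarray) -> np.ndarray:
    """Histogram equalization of one 8-bit channel, following Eqs. (1)-(3)."""
    L = 256
    # Eq. (1): histogram h(i) and probability p(i) = h(i) / N
    h, _ = np.histogram(channel, bins=L, range=(0, L))
    p = h / channel.size
    # Eq. (2): cumulative distribution C(i)
    c = np.cumsum(p)
    # Eq. (3): enhanced intensity I_E = (L - 1) * C(i), used as a lookup table
    lut = np.round((L - 1) * c).astype(np.uint8)
    return lut[channel]
```

A color image would be processed channel by channel with this function, which matches the per-channel processing described above.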
2.1.2. Contrast limited adaptive histogram equalization (CLAHE)

CLAHE improves contrast and reduces noise by creating multiple image histograms and using them to redistribute the image brightness. Two parameters must be set: the clip limit (CL) and the block size (BS). CL controls image quality by setting a threshold for contrast changes; it truncates the histogram depending on the histogram normalization factor and the size of the neighborhood region. BS regulates the number of tiles in the rows and columns [12]. The CLAHE method does not eliminate artifacts but at least reduces artifact interference [17]. The block size and clip limit are the parameters that control image quality; if they are set incorrectly, CLAHE results will not be as good as those of HE. In the CLAHE process, the original image is divided into sub-images of size M × N, the histogram of each sub-image is calculated, and each histogram is clipped at the limit. The clip limit β of a histogram can be defined as follows:

β = (M / N) (1 + (α / 100) (S_max - 1))    (4)

where M represents the region size, N the number of intensity values (0-255), α the clip factor that raises the histogram limit by a value between 0 and 100, and S_max the maximum slope. Histogram counts above the clip limit are treated as excess pixels and redistributed over the region; below the clip limit, the histogram keeps an even distribution of the leftover pixels [17].

2.2 Noise reduction method

2.2.1. Median blur (MB)

The median filter method uses the value produced by ordering the values under the filter. The effectiveness of noise reduction depends on the size and shape of the filter mask, while the algorithm's complexity depends on finding the median value [18]. The median filter is a nonlinear filter used to remove salt-and-pepper noise from an image and is usually applied in pre-processing before the data is segmented. The algorithm is conceptually simple, can effectively preserve edges while reducing impulsive noise [21], and can preserve image detail and improve retouching performance [7]. The method focuses on the median of the surrounding pixel values that affect the central pixel: it replaces each (noisy) center pixel value with the median of the sequence of neighboring pixels, obtained by first sorting the neighbors' values and then selecting the middle one. The output of the median filter is:

g(x, y) = med{ f(x - i, y - j) : i, j ∈ W }    (5)

where f(x, y) is the original image and g(x, y) is the median filter output; i and j range over the kernel filter. W is a two-dimensional mask of size n × n (where n is usually odd), such as 3×3 or 5×5, and the mask shape may be linear, square, circular, or cross [7]. The replacement of noisy values in the median blur technique is illustrated in Fig. 1: the 3×3 neighborhood is sorted, the median of the pixel sequence is taken, the value 86 replaces the value 61, and the process shifts across the whole image to replace noisy pixels.

Figure. 1 Illustration of calculating the value in an image with a kernel size of 3×3.
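As a concrete illustration of Eq. (5) and the Fig. 1 procedure, here is a naive, unoptimized median-filter sketch in Python; it is illustrative only, and in practice the OpenCV function used in section 3.1 is preferred:

```python
import numpy as np

def median_filter(image: np.ndarray, n: int = 3) -> np.ndarray:
    """Naive n x n median filter (n odd), implementing Eq. (5) directly."""
    pad = n // 2
    padded = np.pad(image, pad, mode="edge")  # replicate border pixels
    out = np.empty_like(image)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            window = padded[y:y + n, x:x + n]  # the mask W around (x, y)
            out[y, x] = np.median(window)      # middle of the sorted values
    return out
```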
3. Research method

The data samples for this study are aerial images of oil palm vegetation areas taken from the Malaysian oil palm plantation (MOPD) dataset (https://github.com/dongrunmin/oil_palm_data). The process begins with pre-processing: color space conversion, image quality improvement, size determination, and image normalization. Processing then continues to the semantic segmentation method. The final result is the calculation of class-pixel classification accuracy in each image and the representation of the segmentation process. The whole process can be seen in Fig. 2.

Figure. 2 Process of enhancement and semantic segmentation with U-Net shape (image acquisition of the oil palm area; pre-processing with YUV color space distribution, HE, CLAHE, MB, HE+MB, CLAHE+MB; distribution back to the BGR color space, resizing, and normalization; semantic segmentation with U-Net shape VGG-16 and VGG-19; resulting accuracy and image map).

3.1 Preprocessing

Color selection is one of the tasks supporting ideal color results in segmentation. A color space is a numerical model that represents color details as distinct color channels (components) in a three- or four-dimensional polar or Cartesian system. The experiment converts BGR to YUV because it is more suitable for recovering cloud-free images, since the luminance can change independently without affecting the chromatic information [22]. YUV models human perception of color more closely than the standard red-green-blue (RGB) model used in computer graphics hardware. In the YUV color space, Y is the luma, or brightness, component, ranging from 0 to 100% in most applications, and UV is the chrominance (the blue and red color components). The color space function helps sharpen the visualization of the image. The conversion from RGB to YUV is linear [23], and the standard formula is:

Y = 0.299 R + 0.587 G + 0.114 B
U = -0.1687 R - 0.3313 G + 0.5 B + 128    (6)
V = 0.5 R - 0.4187 G - 0.0813 B + 128

This experiment converts the input image from the blue-green-red (BGR) color space to YUV. The task continues with image quality improvement using the HE contrast improvement method as in Eqs. (1)-(3), CLAHE as in Eq. (4), MB as in Eq. (5), and the combinations HE+MB and CLAHE+MB. The HE and CLAHE methods are implemented with the functions available in OpenCV. The aerial images used as input still require quality improvement before being passed to segmentation using the methods discussed previously. The noise reduction procedure, with a median blur filter of kernel size 3×3, helps refine pixel values in more detail. For CLAHE, two parameters are set: a clip limit of 0.52, which is quite suitable for color images at large resolutions, and a 4×4 tile size. The pre-processing results are then converted back to BGR, resized to 512 × 512, and normalized by dividing each intensity value by 255, which yields a smaller pixel intensity distribution without reducing feature information and simplifies the segmentation process.
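Assuming the step order described above (BGR to YUV, contrast enhancement, back to BGR, median blur, resizing, normalization), the pre-processing chain can be sketched with OpenCV as follows. The parameter values are those stated in this section; the function layout and the choice to apply CLAHE to the luminance channel only are our reading of the method, not a verbatim implementation:

```python
import cv2
import numpy as np

def preprocess(bgr: np.ndarray) -> np.ndarray:
    """BGR aerial image -> CLAHE+MB enhanced, resized, normalized array."""
    # BGR -> YUV so contrast is corrected on the luminance channel (Eq. (6))
    yuv = cv2.cvtColor(bgr, cv2.COLOR_BGR2YUV)
    clahe = cv2.createCLAHE(clipLimit=0.52, tileGridSize=(4, 4))
    yuv[:, :, 0] = clahe.apply(yuv[:, :, 0])
    enhanced = cv2.cvtColor(yuv, cv2.COLOR_YUV2BGR)  # back to BGR
    enhanced = cv2.medianBlur(enhanced, 3)           # 3x3 noise reduction
    enhanced = cv2.resize(enhanced, (512, 512))
    return enhanced.astype(np.float32) / 255.0       # normalize to [0, 1]
```

The HE+MB variant would replace the CLAHE step with the per-channel equalization of Eqs. (1)-(3) (e.g., OpenCV's cv2.equalizeHist on each channel).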
Figure. 3 U-Net shape of image segmentation for the VGG-16 and VGG-19 architectures (input image tile, skip connections, bottleneck, output segmentation map; implemented in Keras).

Figure. 4 The process structure of VGG-16 (encoder convolution blocks of 64-512 filters with max pooling; decoder blocks of transpose convolution, ReLU, and batch normalization, concatenated with the encoder features).

3.2 Segmentation processing with U-Net shape

Image segmentation is an important stage in delineating vegetation objects, and the segmentation process also affects classification accuracy. Image segmentation is implemented to separate vegetation types; the challenge in this case lies in irregular shapes and varied patterns and colors [24]. The segmentation result is assessed quantitatively against the image map that is visually interpreted as a reference. Semantic segmentation with the U-Net shape architecture is suitable for handling multiclass problems, since the determination is based on the pixel class (not the entire image/segment). In other words, semantic segmentation classifies the object class of each pixel in an image, assigning a label to every pixel. The purpose of image segmentation is to partition an image into homogeneous and connected regions without using additional knowledge about the objects in the picture; regional homogeneity in color image segmentation involves natural colors and sometimes color textures [25]. The semantic segmentation method was developed by increasing the sampling resolution, involving full convolution (FC) in the network to recover every input detail and balancing the network's processing and invariances [26].

The U-Net shape architecture consists of two parts: the encoder and decoder layers. U-Net VGG-16 follows this encoder-decoder pattern, modifying the inference layers of the CNN with de-convolutional layers using the backbone [27]. VGG-19 is similar to VGG-16 and differs only in the convolutional encoder layers. The architecture consists of symmetric paths defined to capture the evolving context and allow precise localization. The network alternates convolution operations over the feature maps, down-sampling and up-sampling in turn. Convolution (the encoder side) produces a smaller output size, while de-convolution (the decoder side) derives from up-sampling, or transposing the convolution function, to obtain a larger output size, a fractional convolution step that produces the image map. Between the two layers there is a bottleneck layer, a convolution layer bridging the encoder and decoder networks. In the encoder, convolutional layers and successive down-sampling layers extract deep features with large receptive fields. The decoder up-samples the extracted deep features back to the input resolution for pixel-level semantic prediction. The high-resolution features at the encoder's different scales are combined through skip connections to reduce the loss of spatial information caused by down-sampling. The task of the decoder is to semantically project the discriminative (lower-resolution) features learned by the encoder into the (higher-resolution) pixel space to obtain a dense classification [26]. The general framework of the U-Net is illustrated in Fig. 3.
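As an illustration of this encoder-decoder arrangement, a compact Keras sketch of a U-Net shape over a VGG-16 backbone is given below. The skip-connection points and the exact block layout are plausible assumptions on our part, not the paper's verbatim configuration; the decoder sizes (128, 64, 32, 32, 16) match those reported in the next paragraph:

```python
from tensorflow.keras import Model, layers
from tensorflow.keras.applications import VGG16

def unet_vgg16(input_shape=(512, 512, 3), n_classes=3):
    """U-Net shape: VGG-16 encoder, transpose-convolution decoder with skips."""
    encoder = VGG16(include_top=False, weights="imagenet", input_shape=input_shape)
    # Skip connections: last convolution of each VGG block before pooling
    skips = [encoder.get_layer(name).output for name in
             ("block1_conv2", "block2_conv2", "block3_conv3", "block4_conv3")]
    x = encoder.get_layer("block5_conv3").output  # bottleneck features
    for filters, skip in zip((128, 64, 32, 32), reversed(skips)):
        x = layers.Conv2DTranspose(filters, 3, strides=2, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
        x = layers.Concatenate()([x, skip])  # merge encoder and decoder features
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(16, 3, padding="same", activation="relu")(x)
    out = layers.Conv2D(n_classes, 1, activation="softmax")(x)  # per-pixel classes
    return Model(encoder.input, out)
```

Such a model would be compiled with the categorical cross-entropy loss discussed below, e.g. model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"]).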
U-Net shapes VGG-16 and VGG-19 use the Keras standard for their implementation. In this experiment, we set the adjustable parameter functions, namely the activation function, the decoder layer arrangement, and the optimization function. The layer sizes specified in the decoder are 128, 64, 32, 32, and 16, which keep the number of parameters manageable and accelerate the segmentation process. The activation function uses ReLU for the convolutions and softmax for the final operation in the network, which supports the categorical cross-entropy loss function, rescaling the model output so that it has the right properties [26, 28]; this loss function is also very effective for segmentation tasks [29]. Up-sampling uses the transpose function, and normalization of the convolutions uses batch normalization (BN). The decoder layers are also set to convolution layer sizes with standard parameters that are manageable on the network. The softmax activation function represents more than two pixel classes, so that color is distributed according to the pixel class visualized in the image map.

Fig. 4 and Fig. 5 show the process structures of the VGG-16 and VGG-19 architectures used to perform segmentation testing on the pixel classes. The VGG-16 architecture in Keras has a total of 21,877,363 parameters, and VGG-19 consists of 27,187,059 parameters. The structures of VGG-16 and VGG-19 consist of convolution layers in the encoder, a bottleneck as the center layer, and de-convolution in the decoder. The concatenate function merges the encoder's convolution layers with the decoder.

Figure. 5 The process structure of VGG-19 (encoder convolution blocks of 64-512 filters with max pooling; decoder blocks of transpose convolution, ReLU, and batch normalization, concatenated with the encoder features).

3.3 Evaluation performance

Performance measurement determines the model's ability to train pixels to recognize their classes. The overall accuracy considers all categories entering the metric from the class-pixel prediction components:

OA = Σ_{k=1}^{N} TP_k / Σ_{k=1}^{N} (TP_k + FP_k + FN_k + TN_k)    (7)

where TP_k, FP_k, TN_k, and FN_k indicate true positives, false positives, true negatives, and false negatives, respectively, for objects indicated as class k [30]; the accuracy result is multiplied by 100 [31].
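As a reference for Eq. (7), and assuming integer-labeled prediction and ground-truth maps, the overall accuracy can be computed from per-class confusion counts as in the following sketch (function name and layout ours):

```python
import numpy as np

def overall_accuracy(pred: np.ndarray, truth: np.ndarray, n_classes: int) -> float:
    """Overall accuracy of Eq. (7), summing per-class confusion counts."""
    tp = fp = fn = tn = 0
    for k in range(n_classes):
        p, t = pred == k, truth == k
        tp += np.sum(p & t)    # pixels of class k predicted as k
        fp += np.sum(p & ~t)   # predicted k, actually another class
        fn += np.sum(~p & t)   # actually k, predicted otherwise
        tn += np.sum(~p & ~t)  # neither predicted nor labeled k
    return 100.0 * tp / (tp + fp + fn + tn)  # reported in percent
```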
4. Discussion and result

Pre-processing was carried out on six aerial image samples, improving image quality using contrast correction with the HE function and CLAHE, noise correction with MB, and the combinations HE+MB and CLAHE+MB. Overall, as shown in Fig. 4, the contrast equalization and noise reduction processes were applied to Image1, Image2, Image3, Image4, Image5, and Image6. The parameters applied in noise reduction and contrast improvement take into account the size of the input image and the picture's condition.

The HE parameter is aligned globally, in this implementation using the default values of the OpenCV function over the three color channels (RGB). For CLAHE, two parameters are used: a clip limit of 0.52, which sets the threshold for contrast, and a 4×4 tile grid size, which sets the number of kernels in rows and columns. The kernel size applied to MB is 3×3; this smallest kernel is chosen to support stricter filtering for pixel-based classification.

A comparison of the histogram results for each image shows that the original image processed with MB changes only slightly in intensity, while the visualization looks sharper than the original. For contrast processing with HE and CLAHE, the histograms of both processes show a more even distribution of color toward lighter intensities. The HE histograms show a more even contrast than CLAHE, with which some non-contrasted intensities remain. The visualizations of the HE and CLAHE results emphasize dark blue in areas of old vegetation, light blue in areas of young vegetation, and yellow and white in road or soil areas (non-vegetation objects). The histograms show better results after combining HE with MB and CLAHE with MB: both combinations achieve better color distribution and reduced image noise. The visualizations show changes of color intensity in particular pixel areas, and this color variation strengthens the detail of the pixel intensity values passed into the semantic segmentation network. Incorporating the retouching techniques makes the image sharper: contrast is increased and noise is noticeably reduced.

Figure. 6 Representative sample Image2: contrast results with HE and CLAHE and noise reduction with MB, with histogram graphs of the pixel-intensity distributions of the image representations.

Figure. 7 Representative sample Image4: contrast results with HE and CLAHE and noise reduction with MB, with histogram graphs of the pixel-intensity distributions of the image representations.

Fig. 6 shows the Image2 sample, a dense vegetation area, after pre-processing; the difference in contrast is visible after the HE and CLAHE processes, for example in the partial image representation marked with a circle. The HE histogram shows the green channel distributed more toward the contrast, while with CLAHE the red channel is more contrasted.
Table 1. Training semantic segmentation VGG-16 on six aerial images (accuracy, %)

Method               Image1  Image2  Image3  Image4  Image5  Image6
Original              50.13   39.97   42.01   26.87   15.33   43.01
HE                    47.43   58.75   46.28   38.97   48.73   37.50
CLAHE [17]            40.78   47.70   39.70   30.57   23.41   40.25
MB [18]               39.84   47.34   41.85   28.79   19.57   42.09
DCWHE [1]             43.80   43.55   41.36   31.23   22.83   37.96
AGCWHD [19]           31.96   39.57   36.93   37.73   42.51   40.07
Proposed HE+MB        70.11   64.69   65.91   67.63   74.18   58.80
Proposed CLAHE+MB     70.85   74.05   78.60   77.12   87.94   70.24

Table 2. Training semantic segmentation VGG-19 on six aerial images (accuracy, %)

Method               Image1  Image2  Image3  Image4  Image5  Image6
Original              35.16   42.04   38.51   32.50   33.21   22.71
HE                    44.90   54.06   46.35   48.53   51.36   20.61
CLAHE [17]            42.59   49.97   43.27   33.18   35.04   19.49
MB [18]               40.88   47.32   38.25   33.10   33.43   21.59
DCWHE [1]             41.86   47.24   34.59   30.62   35.03   20.79
AGCWHD [19]           34.45   30.53   32.82   33.59   34.66   40.52
Proposed HE+MB        66.74   67.07   62.86   69.09   72.26   65.40
Proposed CLAHE+MB     72.34   68.45   70.18   72.31   75.02   84.47

CLAHE shows some prominent intensities and distributions that differ from the HE results. However, combining noise reduction with MB gives good results for both the HE and CLAHE functions; the image representations show the pixel color distributions gaining additional intensity variation. The blue channel of the CLAHE+MB histogram is distributed higher than the red channel of HE+MB; in this case, the blue color confirms the vegetation area. Fig. 7 shows an area not densely covered by vegetation, where several other objects exist. The HE results compared with CLAHE show a different distribution of pixel intensity, as seen in the histogram graphs, and the pixel color representation also looks different in the marked area of the image. Combining HE+MB and CLAHE+MB yields different histograms in the even distribution of red and green, where the combination of the two colors emphasizes the non-vegetated object areas, namely roads or soil. In the image representation, CLAHE+MB retains the blue color element describing the vegetation object area, as seen in the marked circle.

After improving the image quality, the segmentation process continued using the U-Net VGG-19 and VGG-16 shapes. The parameters set in U-Net use the Keras standard; the parameters given to the decoder function are the layer sizes, the output activation, and the transpose process for up-sampling. Network performance is evaluated through the accuracy of the segmentation process on the six aerial image samples. U-Net shape network training is carried out with the categorical cross-entropy loss over 30 epochs and softmax activation for multiclass identification of pixels. The results show an increase when the enhancement approach is applied to the images, as shown in Table 1 and Table 2.

Table 1 and Table 2 report the prediction accuracy of the pixel classes with the HE, MB, CLAHE, DCWHE, AGCWHD, and proposed HE+MB and CLAHE+MB processes. The pixel testing accuracy of the segmentation process on the six sample images shows that the combinations HE+MB and CLAHE+MB perform better than HE and CLAHE without MB. Likewise, the comparison with state-of-the-art methods such as DCWHE and AGCWHD shows that the proposed methods give better results.
With the VGG-16 process, the highest accuracy was obtained by combining CLAHE+MB on the Image5 sample, with an accuracy of 87.94, while combining HE+MB reached 74.18, as shown in Table 1. In testing with VGG-19, the highest result was the combination CLAHE+MB on the Image6 sample, at 84.47, while HE+MB on the Image5 sample reached 72.26, as in Table 2. Overall, the highest accuracy when testing the segmentation results of the six images comes from the CLAHE+MB approach, compared with the other methods. The application of YUV color space conversion also supports the sharpening of the pixels' own characteristics. This approach helps improve image quality so that noise in the resulting image is mitigated.

Figure. 8 The highest accuracy of the pixel training using VGG-16 and the resulting predicted segmentations of Image5 (accuracy graph of VGG-16; original, ground truth, HE, CLAHE, MB, HE+MB, CLAHE+MB, DCWHE, AGCWHD).

Fig. 8 shows the highest-accuracy result, on the Image5 sample, of the six samples tested for pixel prediction in the image segmentation process with VGG-16, compared with the other methods at epoch 30. The accuracy graph shows that the highest accuracy is achieved with the quality improvement incorporating CLAHE+MB; second is the combination HE+MB, with the accuracy values shown in Table 1. The test graph shows that CLAHE+MB maintains stable accuracy from the start and then increases during testing, while HE+MB undergoes a more gradual increase in accuracy. The representations of the segmentation process are almost indistinguishable across several comparisons, but U-Net segmentation shows the level of detail of pixels within their class, whereas the conventionally produced ground truth leaves pixels working sub-optimally, so that some pixels are recognized as other objects. The difference between the ground truth and the U-Net segmentation results is that the segmented objects are more visible and distinct than in the ground truth. The map image shows clear differences between soil and vegetation areas. Segmentation from the CLAHE+MB and HE+MB merging processes shows the density of old vegetation objects marked in dark blue, young vegetation in light blue, and soil areas in yellow and orange. The difference from the other methods is not striking, but slight differences remain, namely in the density of vegetation intensity: old vegetation is somewhat blurred toward the younger color, and some areas of land are recognized as vegetation. In the ground truth, by contrast, land objects are detected as vegetation-class objects, as shown in the image area marked with a circle. The pixels trained in the U-Net network for the segmentation process can produce image maps that represent objects with precision and detail. Fig. 9 shows the pixel training in the segmentation process on the Image6 sample, which achieved the highest accuracy with VGG-19.
The accuracy graph shows the same pattern as in the previous approach: the combinations CLAHE+MB and HE+MB remain superior to the other methods. It can be seen from the ground-truth representation that all objects are identified as soil, while segmentation processing with U-Net shows a clear separation between land and vegetation areas. The two proposed methods show better representation results because some vegetation objects retain their class sufficiently and so are not recognized as soil objects. In other scenarios, some intensities are blurred, so that objects are still realized as other classes; for example, vegetation is identified as soil or vice versa.

Figure. 9 The highest accuracy of the pixel training using VGG-19 and the resulting predicted segmentations of Image6 (accuracy graph of VGG-19; original, ground truth, HE, CLAHE, MB, HE+MB, CLAHE+MB, DCWHE, AGCWHD).

Table 3 shows the average accuracy over the six sample images for the compared image quality improvement methods, namely HE, CLAHE, MB, DCWHE, AGCWHD, and the proposed combinations HE+MB and CLAHE+MB.

Table 3. Average testing accuracy over the six sample images with the U-Net shape (%)

Method                 VGG-16   VGG-19
Original                 36.2     34.0
HE                       46.3     44.3
CLAHE [17]               37.1     37.3
MB [18]                  36.6     35.8
DCWHE [1]                36.8     35.0
AGCWHD [19]              38.1     34.4
Proposed (HE+MB)         66.9     67.2
Proposed (CLAHE+MB)      76.5     73.8

The processing results show that VGG-16 performs better than VGG-19 on the six image samples in the segmentation testing process. Improving the image quality with the combined CLAHE+MB processing gave the highest average accuracy with VGG-16, namely 76.5, compared with 73.8 for VGG-19. The experimental results show that applying the YUV color space conversion helps increase the color intensity of pixels before the image problems are corrected. The role of YUV color space conversion before contrast enhancement and noise reduction is to strengthen pixel characteristics, thereby reducing the complexity of predicting pixel classes. An approach that combines contrast optimization and noise reduction refines the pixel intensities, so that when area segmentation is performed, the pixels can predict their class quite well.

Improving image quality helps prepare the data, supporting the quality of the pixel intensities before the segmentation process is carried out. The proposed approach combining contrast enhancement and noise reduction, as in CLAHE+MB and HE+MB, enhances the pixel features. The improved pixel characteristics help the pixel learning process in segmentation and can produce good-quality map images that separate objects in the vegetation area. Setting appropriate parameters for the clip limit and window size in the CLAHE process determines the results of the contrast quality improvement. Although the HE and CLAHE methods are commonly used and developed for image improvement, applying them alone to this aerial image data did not yield good results; likewise, noise reduction with MB alone did not give good accuracy. The other improvement methods compared, DCWHE and AGCWHD, also do not provide good accuracy relative to the two proposed approaches.
The segmentation process with U-Net performs labeling through machine learning of the pixels, and the process works automatically, including the image quality improvement. The map image representation from the segmentation test with U-Net as machine learning has better results than the ground-truth images. Pixels successfully trained in the U-Net network for the segmentation process can produce image maps representing objects with sufficient precision and detail. The ground-truth image representation itself does not give maximal results, since the segmentation is carried out by humans, and several classes can be seen to be recognized as other classes. Labeling done conventionally by humans still carries the risk of identifying pixels as another class, so the resulting map image may not display the appropriate segmentation. The segmentation process performed by machine learning helps pixel labeling work better. Comparison of the results shows that the machine learning process allows pixels to predict their class and generate map images that can distinguish objects from one another, such as differences between soil and old and young vegetation over a large area.

5. Conclusion

Overall, the proposed process of improving image quality by combining HE+MB and CLAHE+MB can improve segmentation results for aerial images, with the highest average accuracy on VGG-16. The highest accuracy of the VGG-16 process, combining CLAHE+MB on the Image5 sample, is 87.94. The average accuracy over the six sample data in the semantic segmentation process is 76.5 on VGG-16 and 73.8 on VGG-19. The settings still often experience weight changes, which usually affect the accuracy results of the two networks, VGG-16 and VGG-19; however, both still provide better and superior results with the combined image enhancement compared with the other image repair methods. The characteristics of the aerial image features are strengthened by the distribution into the YUV color space and the suggested approach of optimized contrast improvement coupled with noise reduction. The color space distribution is one of the supports in providing quality to the image pixel intensities. Good results also depend on using suitable CLAHE parameters, namely the clip limit and window size, and on the kernel size for MB.

Machine learning work on pixel annotation can provide immediate results, both in pre-processing and in the segmentation that works on pixels to produce map images. Pixel annotation with U-Net as machine learning for segmentation helps reduce labeling time while providing better image map data, because it works automatically. It also minimizes the weaknesses of the human annotation process, namely the possibility of errors during annotation that lead to unfavorable segmentation results. U-Net is the recommended network for segmenting and training pixels for class prediction; this network still needs further improvement and optimization research to provide even better segmentation results. Another remaining problem of aerial imagery is the difference in the characteristics of each image at the time of data collection.
Therefore, image repair should always be carried out to ensure the availability of good aerial imagery data before the segmentation process. Further development can improve the U-Net shape architectural model to provide better pixel machine learning results on enhanced image quality, thus producing good-quality map images.

Conflicts of interest

The authors declare there is no conflict of interest in this work.

Author contributions

The contributions of each author are described as follows: "Conceptualization and methodology, Priyambodo, Wibowo, Widyaningsih; implementation of the methodology, Widyaningsih; data validation, Kamal and Widyaningsih; formal analysis, Priyambodo, Wibowo, and Kamal; investigation, Priyambodo, Wibowo, Kamal and Widyaningsih; writing, original draft preparation, Widyaningsih; writing, review and editing, Priyambodo, Wibowo and Kamal; supervision, Priyambodo, Wibowo and Kamal; funding acquisition, Widyaningsih".

Acknowledgments

The authors would like to thank STMIK Palangkaraya (for data collection) and the Department of Computer Science and Electronics for supporting publication funds and the use of laboratories for experiments and testing.

References

[1] M. B. Mortatha, S. S. Thabit, H. R. A. Ameer, and R. R. Nuiaa, "Dynamic Clip Limit Window Size Histogram Equalization for Poor Information Images", International Journal of Intelligent Engineering and Systems, Vol. 15, No. 5, pp. 57-70, 2022, doi: 10.22266/ijies2022.1031.06.
[2] G. Tsagkatakis, A. Aidini, K. Fotiadou, M. Giannopoulos, A. Pentari, and P. Tsakalides, "Survey of deep learning approaches for remote sensing observation enhancement", Sensors (Switzerland), Vol. 19, No. 18, pp. 1-38, 2019.
[3] A. Shah, J. I. Bangash, A. W. Khan, I. Ahmed, A. Khan, A. Khan, and A. Khan, "Comparative analysis of median filter and its variants for removal of impulse noise from grayscale images", Journal of King Saud University - Computer and Information Sciences, Vol. 34, No. 3, pp. 505-519, 2022.
[4] U. Erkan, S. Enginoğlu, D. N. H. Thanh, and L. M. Hieu, "Adaptive frequency median filter for the salt and pepper denoising problem", IET Image Processing, Vol. 14, No. 7, pp. 1240-1247, 2020.
[5] P. Srestasathiern and P. Rakwatin, "Oil Palm Tree Detection with High Resolution Multi-Spectral Satellite Imagery", Remote Sensing, Vol. 6, No. 10, pp. 9749-9774, 2014.
[6] A. Mitra, S. Roy, S. Roy, and S. K. Setua, "Enhancement and restoration of non-uniform illuminated fundus image of retina obtained through thin layer of cataract", Computer Methods and Programs in Biomedicine, Vol. 156, pp. 169-178, 2018.
[7] B. Desai, S. Jha, and U. Kushwaha, "Image Filtering - Techniques, Algorithm and Applications", GIS Science Journal, Vol. 7, No. 11, pp. 970-975, 2020.
[8] M. J. Alwazzan, M. A. Ismael, and A. N. Ahmed, "A Hybrid Algorithm to Enhance Colour Retinal Fundus Images Using a Wiener Filter and CLAHE", Journal of Digital Imaging, Vol. 34, No. 3, pp. 750-759, 2021.
[9] P. Dai, H. Sheng, J. Zhang, L. Li, J. Wu, and M. Fan, "Retinal fundus image enhancement using the normalized convolution and noise removing", International Journal of Biomedical Imaging, pp. 1-12, 2016.
[10] Z. Zou and Z. Shi, "Random Access Memories: A New Paradigm for Target Detection in High Resolution Aerial Remote Sensing Images", IEEE Transactions on Image Processing, Vol. 27, No. 3, pp. 1100-1111, 2018.
[11] A. M. Pour, H. Seyedarabi, S. H. A. Jahromi, and A. Javadzadeh, "Automatic Detection and Monitoring of Diabetic Retinopathy Using Efficient Convolutional Neural Networks and Contrast Limited Adaptive Histogram Equalization", IEEE Access, Vol. 8, pp. 136668-136673, 2020.
[12] G. Ulutas and B. Ustubioglu, "Underwater image enhancement using contrast limited adaptive histogram equalization and layered difference representation", Multimedia Tools and Applications, Vol. 80, No. 10, pp. 15067-15091, 2021.
[13] M. Lan, Y. Zhang, L. Zhang, and B. Du, "Global Context based Automatic Road Segmentation via Dilated Convolutional Neural Network", Information Sciences, Vol. 535, pp. 156-171, 2020.
[14] Y. Bengio, A. Courville, and P. Vincent, "Representation Learning: A Review and New Perspectives", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 35, No. 8, pp. 1798-1828, 2013.
[15] Chrisantonius, T. K. Priyambodo, F. H. Raswa, and J. C. Wang, "Partial Fingerprint on Combined Evaluation using Deep Learning and Feature Descriptor", In: Proc. of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, pp. 1611-1614, 2021.
[16] K. Hidjah, M. Wibowo, A. Harjoko, and R. R. Shantiningsih, "Periapical Radiograph Texture Features for Osteoporosis Detection using Deep Convolutional Neural Network", International Journal of Advanced Computer Science and Applications, Vol. 13, No. 1, pp. 223-232, 2022.
[17] F. Hana and I. D. Maulida, "Analysis of contrast limited adaptive histogram equalization (CLAHE) parameters on finger knuckle print identification", Journal of Physics: Conference Series, Vol. 1764, No. 1, 2021.
[18] A. Pati, S. K. Sagarar, and P. R. Rajvanshi, "A New Improved Statistical Algorithm for Image Noise Reduction", International Journal of Research, Vol. 5, No. 12, pp. 2672-2679, 2018.
[19] M. Veluchamy and B. Subramani, "Image contrast and color enhancement using adaptive gamma correction and histogram equalization", Optik - International Journal for Light and Electron Optics, Vol. 183, pp. 329-337, 2019.
[20] X. Fu, J. Wang, D. Zeng, Y. Huang, and X. Ding, "Remote Sensing Image Enhancement Using Regularized-Histogram Equalization and DCT", IEEE Geoscience and Remote Sensing Letters, Vol. 12, No. 11, pp. 2301-2305, 2015.
[21] R. Sabah, R. Ngadiran, and D. A. Hammood, "Image denoising using wavelet and median filter based on raspberry Pi", Jurnal Informatika, Vol. 15, No. 2, pp. 91-102, 2021.
[22] X. Wen, Z. Pan, Y. Hu, and J. Liu, "Generative Adversarial Learning in YUV Color Space for Thin Cloud Removal on Satellite Imagery", Remote Sensing, Vol. 13, No. 1079, pp. 3-22, 2021.
[23] C. E. Prema, S. S. Vinsley, and S. Suresh, "Multi-Feature Analysis of Smoke in YUV Color Space for Early Forest Fire Detection", Fire Technology, Vol. 52, No. 5, pp. 1319-1342, 2016.
[24] S. M. U. Sinaga and M. Kamal, "Image segmentation for vegetation types extraction using WorldView-2: a case study in parts of Dieng Plateau, Central Java", In: Proc. of Sixth Geoinformation Science Symposium, p. 2, 2019.
[25] S. Tammina, "Transfer learning using VGG-16 with Deep Convolutional Neural Network for Classifying Images", International Journal of Scientific and Research Publications (IJSRP), Vol. 9, No. 10, p. 9420, 2019.
[26] S. Avalos and J. Ortiz, Convolutional Neural Networks Architecture: A Tutorial, Queen's University, pp. 159-168, 2019.
[27] V. Badrinarayanan, A. Kendall, and R. Cipolla, "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 39, No. 12, pp. 2481-2495, 2017.
[28] Y. Ho and S. Wookey, "The Real-World-Weight Cross-Entropy Loss Function: Modeling the Costs of Mislabeling", IEEE Access, Vol. 8, pp. 4806-4813, 2020.
[29] D. J. Doggalli and S. B. S. Kumar, "The Efficacy of U-Net in Segmenting Liver Tumors from Abdominal CT Images", International Journal of Intelligent Engineering and Systems, Vol. 15, No. 5, pp. 151-161, 2022, doi: 10.22266/ijies2022.1031.14.
[30] K. M. Ting, "Confusion Matrix", Encyclopedia of Machine Learning and Data Mining, p. 260, 2017.
[31] M. S. Madni and C. Vijaya, "Hand Gesture Recognition using Auto Encoder with Bi-direction Long Short Term Memory", International Journal of Intelligent Engineering and Systems, Vol. 14, No. 6, pp. 168-176, 2021, doi: 10.22266/ijies2021.1231.16.