Article

A Highly Robust Encoder–Decoder Network with Multi-Scale Feature Enhancement and Attention Gate for the Reduction of Mixed Gaussian and Salt-and-Pepper Noise in Digital Images

by Milan Tripathi, Waree Kongprawechnon * and Toshiaki Kondo
School of Information, Computer and Communication Technology, Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani 12120, Thailand
*
Author to whom correspondence should be addressed.
J. Imaging 2025, 11(2), 51; https://doi.org/10.3390/jimaging11020051
Submission received: 20 January 2025 / Revised: 4 February 2025 / Accepted: 7 February 2025 / Published: 10 February 2025
(This article belongs to the Special Issue Celebrating the 10th Anniversary of the Journal of Imaging)

Abstract

Image denoising is crucial for correcting distortions caused by environmental factors and technical limitations. We propose a novel and highly robust encoder–decoder network (HREDN) for effectively removing mixed salt-and-pepper and Gaussian noise from digital images. HREDN integrates a multi-scale feature enhancement block in the encoder, allowing the network to capture features at various scales and handle complex noise patterns more effectively. To mitigate information loss during encoding, skip connections transfer essential feature maps from the encoder to the decoder, preserving structural details. However, skip connections can also propagate redundant information. To address this, we incorporate attention gates within the skip connections, ensuring that only relevant features are passed to the decoding layers. We evaluate the robustness of the proposed method across facial, medical, and remote sensing domains. The experimental results demonstrate that HREDN excels in preserving edge details and structural features in denoised images, outperforming state-of-the-art techniques in both qualitative and quantitative measures. Statistical analysis further highlights the model’s ability to effectively remove noise in diverse, complex scenarios with images of varying resolutions across multiple domains.

1. Introduction

Image denoising is a critical aspect of image processing, focusing on the restoration of the original image through the reduction of noise in a noisy version. It aids in resolving additional image processing issues and is categorized into two primary methods, traditional filtering and deep learning [1,2,3,4,5,6]. The traditional filtering approach involves employing mathematical operations and spatial filters like mean or median filters, but its effectiveness is limited when confronted with complex situations. Deep learning leverages CNNs to acquire knowledge of noise patterns through pairs of noisy and clean images, leading to enhanced denoising with higher accuracy. Image denoising holds significant importance in the field of research and has the capacity to improve diverse image processing applications.
Traditional filters, such as Gaussian filter, median filter, bilateral filter, and BM3D, have been widely employed, but they do not achieve state-of-the-art denoising performance [1,2,3,4].
The availability of vast amounts of data and high processing power has led to the development of several neural network-based techniques for image processing. A CNN model was proposed by Ghose et al. for image denoising [7]. It achieved optimal metric scores compared to traditional filtering methods; however, the authors did not compare the proposed technique with current state-of-the-art methods, and the images were noised with limited noise levels. Ronneberger et al. proposed the U-Net architecture, which is an encoder–decoder-based model featuring a skip connection between the encoding and decoding layers [8]. This innovative design enabled selective data transfer, leading to improved image reconstruction. Oktay et al. introduced a novel attention gate (AG) model that automatically directs attention to target structures of varying sizes and shapes, eliminating the need for external localization modules [9]. This innovation enhances the accuracy of CNN architectures like U-Net without introducing significant computational overhead. The experimental results demonstrate that AGs enhance U-Net’s performance across various datasets and training sizes while maintaining computational efficiency. In their work, Zhang et al. introduced the application of feed-forward denoising convolutional neural networks [10]. They harness advancements in deep architecture, learning algorithms, and regularization techniques to address image denoising. To expedite the training process, the incorporation of residual learning and batch normalization has been employed, thereby resulting in an augmentation of the denoising performance. Cha et al. implemented a novel fully convolutional architecture, which greatly improves the base supervised model and introduces regularization methods for adaptive fine-tuning [11]. This results in a more potent and robust adaptivity. In their study, Tian et al. introduced an image denoising method based on an attention-guided denoising convolutional neural network [12]. This network comprises several components, namely a sparse block, a feature enhancement block, an attention block, and a reconstruction block. Tian et al. proposed a novel network called a batch renormalization denoising network (BRDNet) [13]. To obtain more features, the width of the network was increased, and batch renormalization was also used. Li et al. introduced a novel Gated Fully Fusion (GFF) architecture for semantic segmentation, where gates are employed in a fully connected manner to fuse features across multiple levels [14]. The proposed method exhibited state-of-the-art performance on four challenging scene parsing datasets. Gurrola-Ramos et al. introduced a residual dense neural network (RDUNet) for image denoising [15]. This encoder–decoder-based architecture features densely connected convolutional layers that enable local residual learning and the reuse of feature maps. The proposed method achieved competitive results without requiring prior knowledge of noise levels. In their work, Zhang et al. developed RatUNet, a refined deep convolutional U-Net architecture aimed at image denoising [16]. It enhances skip connections, downsampling, upsampling, and network depth, while also incorporating polarized and depthwise self-attention mechanisms. Mafi et al. introduced a deep convolutional neural network that combines batch normalization and regularization to effectively remove mixed Gaussian and impulse noise [17].
Their model demonstrates optimal structural metric performance when addressing both familiar and unfamiliar noise combinations. Khmag introduced a self-adjusting generative adversarial network designed for denoising digital images [18]. The method demonstrated quantitative performance and visual quality on par with machine learning approaches such as DnCNN and SuKro, while also offering faster processing speeds [10,19]. Khmag integrated second-generation wavelet transformation with a convolutional neural network to enhance the deblurring of digital images [20]. The proposed approach achieved superior results compared to BM3D, MLP, SSDA, and traditional neural networks [21,22,23]. Cheng et al. introduced a multi-domain encoder–decoder latent data assimilation (MEDLA) framework for dynamical systems [24]. This approach effectively reduces computational burden while enhancing assimilation accuracy. Zhou et al. proposed a novel multi-scale physics-constrained neural network (MSPCNN) for dynamical systems [25]. This innovative approach demonstrates noise robustness and enhances prediction accuracy when physical constraints are introduced in low-fidelity fields. Chen et al. introduced TransUNet, which combines Transformers with UNET to offer a dependable and efficient method for medical image segmentation [26]. The global self-attention mechanism of Transformers, integrated into UNET-based architecture, could inspire advancements in image denoising.
Although CNNs have become widely favored in the field of image processing, their performance tends to deteriorate as plain CNN architecture deepens, limiting their capacity for feature extraction. Some networks fail to effectively share features between shallow and deep layers. Additionally, many networks overlook edge information, leading to poor structural similarity between the denoised image and the original image.
In this paper, we propose a highly robust encoder–decoder architecture, HREDN, which integrates a multi-scale feature extraction block and an attention gate to address key challenges in image denoising. The framework employs an encoder–decoder architecture where downsampling extracts feature maps, and upsampling reconstructs a clean image. However, encoding can result in loss of information, reducing the decoder’s ability to preserve structural details. To mitigate this, skip connections transfer feature map information from the encoder to the decoder [8]. While effective, skip connections may also propagate redundant information from encoding to decoding layers. To address this, we incorporate attention gates within the skip connections, ensuring only relevant features are passed forward. Additionally, skip connections help to mitigate the vanishing gradient problem in deep networks [27]. To further enhance feature extraction capabilities, we integrate a multi-scale feature enhancement block into the encoder. This block captures features at varying scales, improving the network’s capacity to handle complex noise patterns. The experimental results demonstrate that HREDN surpasses state-of-the-art denoising methods across multiple datasets, achieving superior MSE, PSNR, SSIM, and IEF values [28,29]. HREDN can be used as a preprocessing step in facial image analysis, medical imaging, and remote sensing. Images in these fields are particularly prone to various types of noise. As a result, HREDN effectively removes noise and enhances image clarity, which in turn improves accuracy in tasks such as segmentation, classification, object detection, and more.
Although BM3D is highly effective at removing noise when the variation in an image is continuous, such as Gaussian noise, the addition of salt-and-pepper noise increases the sparsity of the noise, introducing random extreme values. This reduces the efficacy of BM3D, as clearly demonstrated in the results. Therefore, a convolutional neural network-based method offers a more practical solution for this scenario. DNCNN, FCAIDE, and BRDNET are well-known convolutional neural networks commonly used for image denoising. However, they all lack an effective attention mechanism, which can lead to the propagation of superfluous information into the deeper layers of the network. Additionally, these models do not include mechanisms such as multi-scale feature enhancement blocks, which are crucial for effectively handling complex noise. Although the Gated Fully Fusion (GFF) network facilitates the propagation of useful information from the encoder to the decoder, potentially reducing noise, the experimental results suggest that it is less effective than the attention gate. RDUNET, on the other hand, adopts an encoder–decoder architecture like HREDN, where information is transferred from the encoder to the decoder through skip connections. However, it does not incorporate attention gates within these skip connections, which are vital for ensuring that only relevant information is passed to the decoder. Furthermore, RDUNET lacks specific mechanisms for multi-scale feature extraction, limiting its ability to address complex noise effectively. In contrast, TransUNET and SwinUNET represent advanced Transformer-based architectures. These models leverage self-attention mechanisms to capture global context effectively. However, they lack an inherent capability to focus on spatially local details. Additionally, Transformer-based models are computationally intensive, making them challenging to train in resource-constrained environments. For this reason, we employed two different Transformer-based baselines in this research (TransUNet for facial image denoising, and the more memory-efficient SwinUNet for CT scan and remote sensing image denoising). Lastly, the HREDN model draws inspiration from both U-Net and Attention U-Net. It improves upon these designs by introducing advanced attention mechanisms and incorporating multi-scale feature extraction capabilities, making it particularly well-suited for denoising complex noise.

2. Materials and Methods

2.1. Image Noise

This research primarily focuses on Gaussian noise and salt-and-pepper noise. Gaussian noise occurs when an image is distorted by the introduction of a random Gaussian function. Meanwhile, salt-and-pepper noise is generated when bright and dark spots are randomly inserted into an image. Mixed noise arises when an image undergoes distortion caused by a combination of multiple noise types [17].
Mathematically, the process of adding mixed noise (Gaussian + salt-and-pepper) in an image is shown below:
I_n = f_{s\&p}(I_c) + w \times G        (1)
where I_n denotes the noised image, while I_c corresponds to its noise-free original image. The presence of Gaussian noise is represented by the variable G, with its noise factor denoted by w. Additionally, f_{s&p} is a function that randomly adds salt-and-pepper noise to the image.
In this study, images corrupted by five distinct types of mixed noise are used. An example of an image corrupted by these five noise types is shown in Figure 1.

2.2. Network Architecture

In this paper, we propose a highly robust encoder–decoder network (HREDN). The structure of the proposed network is illustrated in Figure 2.
As shown in Figure 2, our network adopts an encoder–decoder structure in which the encoder consists of four encoding modules. Together, these modules comprise four 3 × 3 convolutional layers, four multi-scale feature enhancement blocks, and four max-pooling layers with a stride of 2 × 2. Two 3 × 3 convolutional layers are included in the intermediate stage.
Meanwhile, the decoder consists of four 3 × 3 transposed convolutional layers with a stride of 2 × 2 and four pairs of 3 × 3 convolutional layers. At the end of the network, a 1 × 1 convolutional layer with a sigmoid activation function is added. In addition to this, ReLU is used as the activation function in all other convolutional layers to enhance the network’s nonlinearity representation [30].
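To make the layer configuration above concrete, the following sketch outlines one plausible Keras realization of the encoder–decoder backbone. It is a minimal illustration under assumed settings (base filter count, input size), not the exact published configuration; the MSFEB and attention-gate modules described in the next subsections are omitted here, and their insertion points are only marked in comments.

```python
# A minimal sketch (assumed configuration, not the authors' exact one) of the
# encoder-decoder backbone: four encoding stages of 3x3 convolution + 2x2 max
# pooling, an intermediate stage with two 3x3 convolutions, four decoding
# stages of 3x3 transposed convolution (stride 2) + 3x3 convolutions, and a
# final 1x1 convolution with a sigmoid activation.
import tensorflow as tf
from tensorflow.keras import layers, Model


def conv3x3(x, filters):
    return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)


def build_backbone(input_shape=(48, 48, 1), base_filters=32):
    inputs = layers.Input(shape=input_shape)
    x, skips = inputs, []
    for i in range(4):                       # encoder
        x = conv3x3(x, base_filters * 2 ** i)
        # an MSFEB (Section 2.2.2) would be applied here
        skips.append(x)                      # feature map sent over the skip connection
        x = layers.MaxPooling2D(2, strides=2)(x)
    x = conv3x3(x, base_filters * 16)        # intermediate stage
    x = conv3x3(x, base_filters * 16)
    for i in reversed(range(4)):             # decoder
        x = layers.Conv2DTranspose(base_filters * 2 ** i, 3, strides=2, padding="same")(x)
        # an attention gate (Section 2.2.3) would filter skips[i] before this step
        x = layers.Concatenate()([x, skips[i]])
        x = conv3x3(x, base_filters * 2 ** i)
        x = conv3x3(x, base_filters * 2 ** i)
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)
    return Model(inputs, outputs, name="hredn_backbone_sketch")


model = build_backbone()
```

In this layout, each skip connection carries the pre-pooling feature map of an encoder stage to the decoder stage operating at the same resolution, which is where the attention gate of Section 2.2.3 is applied in HREDN.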

2.2.1. Residual Encoder–Decoder Architecture

We use a residual encoder–decoder architecture as the backbone of our HREDN. When passing an image through the encoder block, there is a risk of information loss, which reduces the decoder’s capacity to retain structural details. To address this issue, skip connections are employed to transfer information from the feature maps of the encoder to the decoder.
The encoder–decoder architecture is symmetric, so skip connections can only be added between encoder and decoder layers at corresponding levels. However, skip connections can also transfer superfluous information from encoding layers to decoding layers. To prevent this, an attention gate is introduced within the skip connection. Additionally, in deep networks, skip connections help address the vanishing gradient problem [27].

2.2.2. Multi-Scale Feature Enhancement Block (MSFEB)

In this paper, a multi-scale feature enhancement block is constructed to extract features at varying scales. The multi-scale feature enhancement block designed in this paper is shown in Figure 3.
Convolutional layers with varying filter sizes and numbers are combined to enhance the network’s feature extraction capability. ReLU is applied in the convolutional layers to maintain nonlinearity. Additionally, a 1 × 1 convolutional kernel is used to reduce computational overhead.
Learning features at multiple scales allows the model to capture both fine details and broader patterns in the data. Varying the number of filters enables the network to detect more complex patterns, improving its performance in handling intricate noise. This approach boosts the model’s robustness and enhances its ability to represent diverse features.
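Since Figure 3 is not reproduced here, the block below is only a hedged sketch of how such a multi-scale feature enhancement block could be written in Keras: parallel ReLU convolutions with different kernel sizes, a 1 × 1 convolution to limit computational overhead, and concatenation of the branches. The kernel sizes and filter counts are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers


def msfeb(x, filters=32):
    """Hypothetical multi-scale feature enhancement block: a 1x1 convolution
    reduces computational overhead, parallel 3x3 and 5x5 ReLU convolutions
    extract features at different scales, and the branches are fused by
    concatenation followed by a 1x1 projection."""
    reduced = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
    branch3 = layers.Conv2D(filters, 3, padding="same", activation="relu")(reduced)
    branch5 = layers.Conv2D(filters, 5, padding="same", activation="relu")(reduced)
    fused = layers.Concatenate()([reduced, branch3, branch5])
    return layers.Conv2D(filters, 1, padding="same", activation="relu")(fused)
```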

2.2.3. Attention Gate

The attention gate in HREDN is inspired by the additive attention gate [9]. This attention gate is integrated into UNET architecture to enhance its performance. It selectively focuses on significant activations, enabling the extraction of the most relevant data features from the information passed through the skip connection. As a result, irrelevant and noisy responses are filtered out, leading to improved performance. The attention process occurs just before the concatenation operation, ensuring that only pertinent activations are combined. The proposed architecture of the attention block is shown in Figure 4.
As shown in Figure 4, two inputs are passed to the attention block. The input named O s is generated from the upper layer, producing a representation with higher dimensions, whereas O d is generated from the lower layer, resulting in a representation with lower dimensions. To maintain uniform dimensions, a relevant convolution operation is applied, and the resulting representations are then added.
The process of aligning the dimensions of the two input vectors, followed by the addition of these aligned representations, is shown in Equation (2).
O_{ds} = R_1(\mathrm{Conv}(O_d)) + R_2(\mathrm{Conv}(O_s))        (2)
where O_s represents the value gained from the skip connection, O_d represents the output of the decoder, and O_{ds} represents the output gained after the addition. Conv is a 1 × 1 convolution operator, and R_1 and R_2 are tensor reshape operators.
Afterward, a rectified linear unit (ReLU) is employed as an activation function, and the resulting tensor undergoes a convolutional layer with a filter count of 1. This produces a single-depth vector, representing the input’s weight. To maintain interpretability, the gained output undergoes additional handling through a sigmoid activation function.
The process of obtaining the weight vector is shown in Equation (3).
w = \mathrm{Sigmoid}\left( R_3\left( \mathrm{Conv}\left( \mathrm{ReLU}(O_{ds}) \right) \right) \right)        (3)
where R_3 is the tensor reshape operator, Sigmoid represents the sigmoid activation function, w represents the gained weight vector, Conv is a 1 × 1 convolution operator, and ReLU represents the rectified linear unit activation function.
The resulting weight vector is upsampled to match the dimensions of O_s. The vector O_s and the weight vector undergo element-wise multiplication, which scales the elements according to their relevance, effectively adjusting the vector. Finally, the output gained after multiplication is convolved and passed to the decoder.
The process of adjusting O_s with respect to the weight vector is shown in Equation (4).
O_a = R_4\left( \mathrm{Conv}\left( O_s \odot \mathrm{Upsample}(w) \right) \right)        (4)
where R_4 is a tensor reshape operator, Conv is a 1 × 1 convolution operator, Upsample is the upsampling layer, and O_a is the adjusted output that is passed to the decoder.
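A hedged Keras sketch of Equations (2)–(4) is given below. The channel counts, the use of a strided 1 × 1 convolution to bring O_s down to the resolution of O_d, and the bilinear upsampling of the weight map are assumptions made for illustration; only the overall additive-attention structure follows the description above.

```python
import tensorflow as tf
from tensorflow.keras import layers


def attention_gate(o_s, o_d, inter_channels=32):
    """Sketch of the additive attention gate of Equations (2)-(4).
    o_s is the skip-connection feature map (higher resolution) and o_d is
    the decoder feature map (assumed to be at half the spatial resolution)."""
    # Equation (2): 1x1 convolutions align the two inputs, which are then added;
    # a stride of 2 on o_s is one way to match the spatial size of o_d.
    o_s_proj = layers.Conv2D(inter_channels, 1, strides=2, padding="same")(o_s)
    o_d_proj = layers.Conv2D(inter_channels, 1, padding="same")(o_d)
    o_ds = layers.Add()([o_s_proj, o_d_proj])
    # Equation (3): ReLU, a single-filter 1x1 convolution, and a sigmoid
    # produce a one-channel weight map.
    w = layers.Activation("relu")(o_ds)
    w = layers.Conv2D(1, 1, padding="same")(w)
    w = layers.Activation("sigmoid")(w)
    # Equation (4): upsample the weights to the size of o_s, rescale o_s
    # element-wise, and apply a final 1x1 convolution before concatenation.
    w_up = layers.UpSampling2D(size=2, interpolation="bilinear")(w)
    o_a = layers.Multiply()([o_s, w_up])
    return layers.Conv2D(o_s.shape[-1], 1, padding="same")(o_a)
```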

2.3. Loss Function

To enhance the performance of the HREDN model, its parameters need to be optimized. This can be achieved by minimizing the loss function during the model training process. Equation (5) represents the loss function used in this research.
L(\theta) = \frac{1}{K} \sum_{i=1}^{K} \left\| f_{\mathrm{HREDN}}(I_N^i, \theta) - I_c^i \right\|^2        (5)
where K is the number of noisy images, θ denotes the parameters of HREDN, I_N^i is the i-th noisy image, and I_c^i is the i-th clean image.
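In Keras terms, Equation (5) is, up to a constant factor over the pixel dimension, the standard mean-squared-error objective; a minimal sketch is shown below.

```python
import tensorflow as tf


def hredn_loss(clean, denoised):
    """Equation (5): squared reconstruction error between the network output
    and the clean image, averaged over the batch (and, per Keras convention,
    over the pixels, which differs from Equation (5) only by a constant)."""
    return tf.reduce_mean(tf.square(denoised - clean))


# Usage sketch: model.compile(optimizer="adam", loss=hredn_loss)
```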

2.4. Quantitative Evaluation

To quantitatively evaluate the performance of the proposed models, mean square error (MSE), peak signal-to-noise ratio (PSNR) [28], structural similarity (SSIM) [29], and image enhancement factor (IEF) are used. They can be defined as follows:
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( I_{\mathrm{denoised},i} - I_{\mathrm{original},i} \right)^2
where I_{denoised,i} represents the denoised image at pixel i, I_{original,i} represents the original noise-free image at pixel i, and n denotes the total number of pixels in the image.
\mathrm{PSNR_{dB}} = 10 \log_{10}\left( \mathrm{MAX}_{\mathrm{Signal}}^2 / \mathrm{MSE} \right)
where MSE represents the mean square error over all pixels in the image and MAX_Signal represents the maximum possible pixel value.
S(x, y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \cdot \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2} \cdot \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3}
where μ_x and μ_y denote the means, σ_x and σ_y denote the standard deviations, σ_{xy} denotes the covariance, and C_1, C_2, and C_3 are stabilizing constants.
\mathrm{IEF} = \frac{\left\| I_{\mathrm{noisy}} - I_{\mathrm{original}} \right\|^2}{\left\| I_{\mathrm{denoised}} - I_{\mathrm{original}} \right\|^2}
where I_original represents the original image, I_noisy represents the noisy image, and I_denoised represents the denoised image.
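The sketch below shows how these four metrics could be computed with NumPy for images scaled to [0, 1]. The SSIM here is the global, single-window form of the expression above (in practice SSIM is usually computed over local windows and averaged), and the constants are illustrative assumptions.

```python
import numpy as np


def mse(denoised, original):
    return np.mean((denoised - original) ** 2)


def psnr(denoised, original, max_val=1.0):
    return 10.0 * np.log10(max_val ** 2 / mse(denoised, original))


def ief(noisy, denoised, original):
    return np.sum((noisy - original) ** 2) / np.sum((denoised - original) ** 2)


def ssim_global(x, y, c1=1e-4, c2=9e-4, c3=4.5e-4):
    # Global (single-window) form of S(x, y); the constants are assumptions.
    mu_x, mu_y = x.mean(), y.mean()
    s_x, s_y = x.std(), y.std()
    s_xy = np.mean((x - mu_x) * (y - mu_y))
    return ((2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
            * (2 * s_x * s_y + c2) / (s_x ** 2 + s_y ** 2 + c2)
            * (s_xy + c3) / (s_x * s_y + c3))
```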

3. Results

3.1. Dataset

To train our model for mixed Gaussian and salt-and-pepper noise denoising, our training set is constructed by using images from the Facial Expression Recognition 2013 Dataset (FER2013) [31]. The FER2013 dataset consists of 35,887 grayscale images, each with a resolution of 48 × 48 pixels. To train, validate, and test the deep learning model, 25,120 images (70%), 5383 images (15%), and 5384 images (15%) are designated, respectively. Furthermore, for additional testing, the CKPLUS-48 dataset is utilized, which comprises 750 grayscale images, also at a resolution of 48 × 48 pixels [32]. The deep learning model trained on the FER2013 dataset is employed to denoise the noisy CKPLUS images.
To diversify the dataset, the Curated COVID CT dataset is used [33]. It includes 17,104 grayscale CT images of 128 × 128 pixels, divided into 70% for training, 15% for validation, and 15% for testing.
To further evaluate the model, the NWPU-RESISC45 dataset is used [34]. This remote sensing image dataset contains 31,500 images spanning 45 scene classes, with 700 images per class. A subset of 17,500 grayscale images from 25 scene classes (700 images per class) is selected. Each image has a resolution of 128 × 128 pixels. The dataset is split into 70% for training, 15% for validation, and 15% for testing.
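All three datasets use the same 70%/15%/15% division; a minimal sketch of such a split is shown below (the shuffling procedure and random seed are assumptions, not details taken from the paper).

```python
import numpy as np


def split_70_15_15(images, seed=0):
    """Shuffle an array of images and divide it 70%/15%/15% into training,
    validation, and test sets (the shuffling and seed are assumptions)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(images))
    n_train = int(0.70 * len(images))
    n_val = int(0.15 * len(images))
    return (images[idx[:n_train]],
            images[idx[n_train:n_train + n_val]],
            images[idx[n_train + n_val:]])
```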

3.2. Mixed Noise Generation

To generate noisy images, a mixture of Gaussian and salt-and-pepper noise is used. Gaussian noise is applied with noise factors of 10, 30, 50, 70, and 90. Meanwhile, salt-and-pepper noise is randomly generated for each image in the dataset, with a replacement range of 1% to 50%. The range of salt-and-pepper noise is limited compared to Gaussian noise because the impact of small amounts of salt-and-pepper noise is equivalent to that of high-value Gaussian noise. This ensures that neither type of noise has a disproportionate influence.
As shown in Table 1, Gaussian noise with a noise factor of 30 and salt-and-pepper noise with 5% replacement yield similar metric values.
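A hedged NumPy sketch of this corruption process is given below. Treating the Gaussian noise factor as a standard deviation on the 0–255 intensity scale, and the order in which the two corruptions are applied, are assumptions made for illustration.

```python
import numpy as np


def add_mixed_noise(image, gaussian_factor, sp_fraction, rng=None):
    """Corrupt an 8-bit grayscale image with Gaussian noise (noise factor
    treated here as a standard deviation on the 0-255 scale) and then
    replace a random fraction of pixels with salt (1.0) or pepper (0.0)."""
    rng = rng or np.random.default_rng()
    img = image.astype(np.float32) / 255.0
    noisy = img + (gaussian_factor / 255.0) * rng.standard_normal(img.shape)
    mask = rng.random(img.shape) < sp_fraction          # pixels to replace
    salt = rng.random(img.shape) < 0.5                  # half salt, half pepper
    noisy[mask & salt] = 1.0
    noisy[mask & ~salt] = 0.0
    return np.clip(noisy, 0.0, 1.0)


# Example: G:30 + RSP, with a salt-and-pepper fraction drawn from 1-50%.
# sp = np.random.uniform(0.01, 0.50)
# noisy = add_mixed_noise(clean_image, gaussian_factor=30, sp_fraction=sp)
```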

3.3. Experiment Settings

We used the Adam optimizer with β_1 = 0.9, β_2 = 0.999, and ε = 10^{-7} to minimize the loss function for the training model [35]. The learning rate is 0.001, the number of training epochs is 100, and the batch size is 64. Early stopping with a patience value of 2 is used to monitor the validation loss and prevent overfitting. For CT scan and remote sensing image denoising, a batch size of 16 is used.
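The sketch below wraps this configuration in a Keras training call; the function and variable names are placeholders, and restoring the best weights after early stopping is an assumption rather than a stated detail.

```python
import tensorflow as tf


def train_hredn(model, x_noisy, x_clean, v_noisy, v_clean, batch_size=64):
    """Training configuration described above; `model` and the noisy/clean
    image arrays are placeholders supplied by the caller."""
    optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, beta_1=0.9,
                                         beta_2=0.999, epsilon=1e-7)
    early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=2,
                                                  restore_best_weights=True)
    model.compile(optimizer=optimizer, loss="mse")
    return model.fit(x_noisy, x_clean,
                     validation_data=(v_noisy, v_clean),
                     epochs=100, batch_size=batch_size,   # 16 for CT / remote sensing
                     callbacks=[early_stop])
```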
We used TensorFlow version 2.1.0, Keras version 2.3.1, and Python version 3.7.6 to train and test the models, and all experiments were run on Kaggle with an NVIDIA T4 (×2) GPU.

3.4. Analysis of Results

This section presents the experimental results. For facial image denoising, the results of HREDN are compared with several recent state-of-the-art methods, namely BM3D, DNCNN, FCAIDE, ADNET, BRDNET, GFF, RDUNET, and TransUNET [4,10,11,12,13,14,15,26]. For CT scan image denoising, SwinUNet is used instead of TransUNet [36]. TransUNet is computationally more expensive and challenging to train with the current setup, whereas SwinUNet offers more efficient memory usage.
Among the compared methods, BM3D is a representative model-based method, while SwinUNET and TransUNET are CNN-Transformer-based models, and the remaining methods are CNN-based denoising techniques. All compared models were re-implemented and trained following their original training procedures.

3.4.1. Facial Image Dataset

Figure 5 presents the qualitative results of each model after denoising a facial image corrupted by a mixture of Gaussian noise with a noise factor of 30 and random salt-and-pepper noise. From the figure, it is evident that BM3D fails to remove the noise completely. Noise remains clearly visible in the image produced by DNCNN as well. The denoised image produced by GFF is blurry, and the facial details are unclear. While ADNET and BRDNET successfully remove the noise, these methods fail to preserve structural details. FCAIDE, RDUNET, and TransUNET generate smoother and clearer images; however, the proposed HREDN demonstrates superior performance in preserving edge information and structural details. This is particularly noticeable in the edges of the cheek, where HREDN effectively retains the facial structure.
Table 2, Table 3, Table 4 and Table 5 present the metric values for each model after denoising the facial image. From the table, it is evident that the proposed HREDN achieves the lowest MSE, indicating that the denoised image closely matches the original in terms of pixel values. Additionally, HREDN exhibits the highest PSNR and SSIM scores, suggesting that it effectively removes noise while maintaining high image quality and preserving visual similarity to the original. HREDN also preserves key structural details. Finally, HREDN achieves the highest IEF score, demonstrating its ability to enhance edges and textures without introducing artifacts.
In addition to the FER2013 dataset, the performance of the proposed HREDN is also compared with state-of-the-art algorithms on the CKPLUS dataset, another facial image dataset. Since the FER2013 training data consist of facial images, evaluating on a dataset with similar image types is a logical choice. Models trained on the FER2013 dataset are applied to denoise noisy images from the CKPLUS dataset.
Figure 6 shows the qualitative results of each model after denoising a facial image corrupted by a mixture of Gaussian noise with a noise factor of 30 and random salt-and-pepper noise. The figure clearly demonstrates that BM3D is largely unable to remove the noise. Noise remains visible in the images produced by DNCNN and ADNET. The denoised image produced by GFF is blurry, and the facial features are not visible. The images generated by BRDNET and FCAIDE appear blurry. RDUNET and TransUNET successfully produce clear, noise-free images, but the edge information and structural details are better preserved in the image generated by HREDN. HREDN effectively reduces noise while maintaining finer details, such as facial features and the overall shape of the head.
Table 6, Table 7, Table 8 and Table 9 present the denoising performance of each model. HREDN stands out with the lowest MSE and highest PSNR, indicating superior noise reduction and image quality. It also excels in preserving structural details and enhancing edges, as evidenced by its highest SSIM and IEF scores.

3.4.2. CT Scan Dataset

To further evaluate the performance of HREDN in other domains, it is applied to denoise CT scan images.
Figure 7 shows the qualitative results of each model after denoising a CT scan image corrupted by a mixture of Gaussian noise (with a noise factor of 30) and random salt-and-pepper noise. To better observe the denoising performance of each model, a red-boxed region of interest is highlighted in the image. The figure clearly demonstrates that BM3D fails to remove the noise effectively. Noise remains visible in the image produced by RDUNET. The region of interest is blurry in the case of GFF, and the model is unable to preserve crucial details. Meanwhile, DNCNN, ADNET, and BRDNET struggle to preserve edge information in the region of interest. Although the images generated by FCAIDE and SwinUNET appear clearer, the structural information is better preserved in the image produced by the proposed HREDN. Additionally, the edge information in the HREDN-generated image is notably sharper.
Table 10, Table 11, Table 12 and Table 13 present the metric values for each model after denoising the CT scan images. From the table, it is evident that the proposed HREDN achieves the lowest MSE and the highest PSNR, SSIM, and IEF values. The proposed HREDN outperforms other state-of-the-art methods by a significant margin in nearly all mixed noise cases, achieving optimal metric scores.

3.4.3. NWPU-RESISC45 Dataset

To further evaluate the performance of HREDN in other domains, it is applied to denoise remote sensing images.
Figure 8 illustrates the qualitative results of each model after denoising a mountain image affected by a combination of Gaussian noise (with a noise factor of 30) and random salt-and-pepper noise. To enhance the evaluation of denoising performance, a red-boxed region of interest is highlighted in the image. The figure clearly demonstrates that the BM3D algorithm struggles to effectively remove noise. The denoised image produced by DNCNN still contains noticeable noise, while the region of interest appears blurry in the image processed by GFF. Although ADNET performs better than DNCNN, noise remains, and the region of interest lacks clarity. BRDNet significantly reduces noise, but the region of interest is still not well-defined. FCAIDE, RDUNet, and SwinUNET successfully remove noise while preserving edge information; however, the proposed HREDN outperforms them in retaining structural details. This is evident from both the obtained metric scores and the visualization of the region of interest.
Table 14, Table 15, Table 16 and Table 17 display the metric values for each model following the denoising of remote sensing images. The data clearly indicate that the proposed HREDN attains the lowest MSE and the highest PSNR, SSIM, and IEF values. In nearly all mixed noise scenarios, HREDN surpasses other state-of-the-art methods by a considerable margin, achieving superior metric scores.

3.4.4. Statistical Evaluation of HREDN’s Generalization Across Domains

To evaluate the robustness of HREDN across all domains, a statistical analysis was conducted. For the facial domain, only the FER2013 dataset results were used, as it was directly used for training the model. An ANOVA test was performed to determine whether HREDN’s mean performance differed across domains, while Levene’s test was conducted to assess the similarity of variances across groups. Together, these tests provide a statistically rigorous generalization analysis.
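A brief sketch of how such an analysis could be reproduced with SciPy is shown below; the function name and the choice of per-image scores as inputs are illustrative assumptions.

```python
from scipy import stats


def domain_consistency_tests(facial_scores, ct_scores, rs_scores):
    """One-way ANOVA (equality of means) and Levene's test (equality of
    variances) for one metric, e.g. per-image PSNR, across the three domains."""
    _, p_anova = stats.f_oneway(facial_scores, ct_scores, rs_scores)
    _, p_levene = stats.levene(facial_scores, ct_scores, rs_scores)
    return p_anova, p_levene
```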
As shown in Table 18, the ANOVA results indicate that there are no statistically significant differences across domains for PSNR (p = 0.1585), SSIM (p = 0.3641), and IEF (p = 0.1585), suggesting that the model performed consistently. While MSE showed some variation (p = 0.0963), it did not reach statistical significance. Additionally, Levene’s test confirmed that the variance across domains was homogeneous for all metrics (p > 0.05). These findings suggest that the model generalizes well across different image domains.
Compared to the medical and remote sensing domains, the facial images are significantly smaller, resulting in fewer pixels, each carrying more information. As a result, noise has a greater impact, leading to higher MSE variation in the facial domain. Nonetheless, the model effectively handles this with only small performance degradation.

3.5. Ablation Studies

Ablation experiments are conducted to assess the contribution of each module in the proposed network architecture. The experiments are performed under mixed noise conditions (G:30 + RSP) using the FER2013 dataset for facial image denoising.
  • ED: A basic encoder–decoder architecture with a skip connection.
  • ED + Attention: A basic encoder–decoder architecture with a skip connection enhanced by an attention gate.
  • ED + Attention + MSFEB: A basic encoder–decoder architecture with a skip connection enhanced by an attention gate and a multi-scale feature enhancement block.
Table 19 presents the average performance metric scores obtained by integrating the attention gate and MSFEB module into an encoder–decoder architecture. The attention gate improves the extraction of relevant features, leading to the enhanced performance of the model. Furthermore, the inclusion of multi-scale features through the MSFEB module results in a significant and more pronounced improvement in model performance. This underscores the crucial role of edge and structural information captured by MSFEB in enhancing the overall effectiveness of the network.
Table 20 presents a summary of the training time, number of parameters, and inference time obtained after integrating each module into the basic encoder–decoder architecture. The values shown in the table represent averages derived from all mixed noise cases. It is evident from the table that incorporating the attention mechanism has led to a reduction in both training and inference times. This suggests that the model prioritizes relevant input features while disregarding unnecessary information, though at the cost of an increased number of parameters. Additionally, the inclusion of MSFEB has resulted in a significant increase in training time, parameter count, and inference time, indicating a higher requirement for computational time and resources during model training.
The impact of adding attention and MSFEB modules is analyzed in Table 19 and Table 20 to understand their contributions to model performance and computational efficiency. While the addition of these modules has improved the model’s performance, it comes at the cost of increased computational burden. The main objective of this study is to denoise digital images in complex noise scenarios, which requires a sophisticated network architecture capable of capturing intricate noise patterns. However, for tasks involving denoising images with lower complexity, the model can be simplified.

4. Conclusions

In summary, a novel and highly robust encoder–decoder network (HREDN) for denoising mixed salt-and-pepper and Gaussian noise is proposed. HREDN uses a multi-scale feature enhancement block in the encoder to capture features at various scales. Skip connections transfer important feature maps from the encoder to the decoder to preserve structural details, while attention gates ensure that only relevant features are passed, eliminating redundant information. The experimental results show that the proposed method significantly outperforms existing approaches across all types of Gaussian and salt-and-pepper noise mixtures in facial, medical, and remote sensing image denoising. The method effectively preserves edge information and structural details in the denoised images, achieving superior performance in both qualitative and quantitative metrics compared to existing methods. The statistical analysis also demonstrates the model’s robustness in removing noise across multiple complex noise scenarios, with varying image resolutions across different domains.
In future work, we plan to investigate more advanced deep learning architectures and expand the proposed network to other domains. Additionally, we aim to explore a wider range of noise scenarios beyond mixed Gaussian and salt-and-pepper noise.

Author Contributions

M.T., W.K. and T.K. contributed equally to this article. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available in publicly accessible repositories that do not issue DOIs. The used data can be found here: the Facial Expression Recognition 2013 Dataset (FER2013) is available at https://www.kaggle.com/datasets/msambare/fer2013 (accessed on 31 December 2019), the Cohn-Kanade Plus 48 (CKPLUS-48) dataset is available at https://sites.pitt.edu/~emotion/ck-spread.htm (accessed on 15 April 2020), the Curated COVID CT dataset is available at https://www.kaggle.com/datasets/maedemaftouni/large-covid19-ct-slice-dataset (accessed on 22 May 2024), and the NWPU-RESISC25 dataset is available at https://www.kaggle.com/datasets/aaryaankurparikh/nwpu-resisc25 (accessed on 28 January 2025).

Acknowledgments

The authors acknowledge the Excellent Foreign Students (EFS) and Doctoral Research Scholarship (TU-PhD) awarded by Sirindhorn International Institute of Technology and Thammasat University.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
HREDN: Highly Robust Encoder–Decoder Network
CNN: Convolutional Neural Network
CT: Computed Tomography
MSE: Mean Squared Error
PSNR: Peak Signal-to-Noise Ratio
SSIM: Structural Similarity Index Measure
IEF: Image Enhancement Factor
G: Gaussian
RSP: Random Salt-and-Pepper
MSFEB: Multi-Scale Feature Enhancement Block
FER2013: Facial Expression Recognition 2013
CKPLUS-48: Cohn-Kanade Plus Dataset, 48 × 48 Resolution
BM3D: Block-Matching and 3D Filtering
DNCNN: Denoising Convolutional Neural Network
FCAIDE: Fully Convolutional Pixel Adaptive Image Denoiser
ADNET: Attention-Guided CNN for Image Denoising
BRDNET: Batch-Renormalization Denoising Network
RDUNET: Residual Dense U-Net Neural Network
ED: Encoder–Decoder
ReLU: Rectified Linear Unit

References

  1. Wang, M.; Zheng, S.; Li, X.; Qin, X. A New Image Denoising Method Based on Gaussian Filter. In Proceedings of the 2014 International Conference on Information Science, Electronics and Electrical Engineering, Sapporo, Japan, 26–28 April 2014; Volume 1. [Google Scholar]
  2. Li, X.; Ji, J.; Li, J.; He, S.; Zhou, Q. Research on Image Denoising Based on Median Filter. In Proceedings of the IMCEC 2021—IEEE 4th Advanced Information Management, Communicates, Electronic and Automation Control Conference, Chongqing, China, 18–20 June 2021. [Google Scholar]
  3. Bhonsle, D.; Chandra, V.; Sinha, G.R. Medical Image Denoising Using Bilateral Filter. Int. J. Image Graph. Signal Process. 2012, 4, 36. [Google Scholar] [CrossRef]
  4. Dabov, K.; Foi, A.; Katkovnik, V.; Egiazarian, K. Image Denoising with Block-Matching and 3D Filtering. In Proceedings of the Image Processing: Algorithms and Systems, Neural Networks, and Machine Learning, San Jose, CA, USA, 17 February 2006; Volume 6064. [Google Scholar]
  5. Tian, C.; Fei, L.; Zheng, W.; Xu, Y.; Zuo, W.; Lin, C.W. Deep Learning on Image Denoising: An Overview. Neural Netw. 2020, 131, 251–275. [Google Scholar] [CrossRef] [PubMed]
  6. Ilesanmi, A.E.; Ilesanmi, T.O. Methods for Image Denoising Using Convolutional Neural Network: A Review. Complex. Intell. Syst. 2021, 7, 2179–2198. [Google Scholar] [CrossRef]
  7. Ghose, S.; Singh, N.; Singh, P. Image Denoising Using Deep Learning: Convolutional Neural Network. In Proceedings of the Confluence 2020—10th International Conference on Cloud Computing, Data Science and Engineering, Noida, India, 29–31 January 2020. [Google Scholar]
  8. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), Munich, Germany, 5–9 October 2015; Lecture Notes in Computer Science, Volume 9351. [Google Scholar]
  9. Oktay, O.; Schlemper, J.; Le Folgoc, L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; Mcdonagh, S.; Hammerla, N.Y.; Kainz, B.; et al. Attention U-Net: Learning Where to Look for the Pancreas. arXiv 2018, arXiv:1804.03999. [Google Scholar]
  10. Zhang, K.; Zuo, W.; Chen, Y.; Meng, D.; Zhang, L. Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising. IEEE Trans. Image Process. 2017, 26, 3142–3155. [Google Scholar] [CrossRef] [PubMed]
  11. Cha, S.; Moon, T. Fully Convolutional Pixel Adaptive Image Denoiser. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
  12. Tian, C.; Xu, Y.; Li, Z.; Zuo, W.; Fei, L.; Liu, H. Attention-Guided CNN for Image Denoising. Neural Netw. 2020, 124, 117–129. [Google Scholar] [CrossRef] [PubMed]
  13. Tian, C.; Xu, Y.; Zuo, W. Image Denoising Using Deep CNN with Batch Renormalization. Neural Netw. 2020, 121, 461–473. [Google Scholar] [CrossRef]
  14. Li, X.; Zhao, H.; Han, L.; Tong, Y.; Yang, K. GFF: Gated Fully Fusion for Semantic Segmentation. Proc. AAAI 2019, 34, 11418–11425. [Google Scholar] [CrossRef]
  15. Gurrola-Ramos, J.; Dalmau, O.; Alarcón, T.E. A Residual Dense U-Net Neural Network for Image Denoising. IEEE Access 2021, 9, 31742–31754. [Google Scholar] [CrossRef]
  16. Zhang, H.; Lian, Q.; Zhao, J.; Wang, Y.; Yang, Y.; Feng, S. RatUNet: Residual U-Net Based on Attention Mechanism for Image Denoising. PeerJ Comput. Sci. 2022, 8, e970. [Google Scholar] [CrossRef] [PubMed]
  17. Mafi, M.; Izquierdo, W.; Martin, H.; Cabrerizo, M.; Adjouadi, M. Deep Convolutional Neural Network for Mixed Random Impulse and Gaussian Noise Reduction in Digital Images. IET Image Process 2020, 14, 3791–3801. [Google Scholar] [CrossRef]
  18. Khmag, A. Additive Gaussian Noise Removal Based on Generative Adversarial Network Model and Semi-Soft Thresholding Approach. Multimed. Tools Appl. 2023, 82, 7757–7777. [Google Scholar] [CrossRef]
  19. Dantas, C.F.; Da Costa, M.N.; Da Rocha Lopes, R. Learning Dictionaries as a Sum of Kronecker Products. IEEE Signal Process Lett. 2017, 24, 559–563. [Google Scholar] [CrossRef]
  20. Khmag, A.; Kamarudin, N. Natural Image Deblurring Using Recursive Deep Convolutional Neural Network (R-DbCNN) and Second-Generation Wavelets. In Proceedings of the 2019 IEEE International Conference on Signal and Image Processing Applications, ICSIPA, Kuala Lumpur, Malaysia, 17–19 September 2019. [Google Scholar]
  21. Hasan, M.; El-Sakka, M.R. Improved BM3D Image Denoising Using SSIM-Optimized Wiener Filter. EURASIP J. Image Video Process 2018, 2018, 25. [Google Scholar] [CrossRef] [PubMed]
  22. Min, C.; Wen, G.; Li, B.; Fan, F. Blind Deblurring via a Novel Recursive Deep CNN Improved by Wavelet Transform. IEEE Access 2018, 6, 69242–69252. [Google Scholar] [CrossRef]
  23. Xu, L.; Ren, J.S.J.; Liu, C.; Jia, J. Deep Convolutional Neural Network for Image Deconvolution. In Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS), Montreal, QC, Canada, 8–13 December 2014. [Google Scholar]
  24. Cheng, S.; Zhuang, Y.; Kahouadji, L.; Liu, C.; Chen, J.; Matar, O.K.; Arcucci, R. Multi-Domain Encoder–Decoder Neural Networks for Latent Data Assimilation in Dynamical Systems. Comput. Methods Appl. Mech. Eng. 2024, 430, 117201. [Google Scholar] [CrossRef]
  25. Zhou, H.; Cheng, S.; Arcucci, R. Multi-Fidelity Physics Constrained Neural Networks for Dynamical Systems. Comput. Methods Appl. Mech. Eng. 2024, 420, 116758. [Google Scholar] [CrossRef]
  26. Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation. arXiv 2021, arXiv:2102.04306. [Google Scholar]
  27. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  28. Korhonen, J.; You, J. Peak Signal-to-Noise Ratio Revisited: Is Simple Beautiful? In Proceedings of the 2012 4th International Workshop on Quality of Multimedia Experience, Melbourne, VIC, Australia, 5–7 July 2012. [Google Scholar]
  29. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image Quality Assessment: From Error Measurement to Structural Similarity. IEEE Trans. Image Process. 2004, 13, 600–613. [Google Scholar] [CrossRef]
  30. Nair, V.; Hinton, G.E. Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of the 27th International Conference on Machine Learning (ICML), Haifa, Israel, 21–24 June 2010. [Google Scholar]
  31. Goodfellow, I.J.; Erhan, D.; Luc Carrier, P.; Courville, A.; Mirza, M.; Hamner, B.; Cukierski, W.; Tang, Y.; Thaler, D.; Lee, D.H.; et al. Challenges in Representation Learning: A Report on Three Machine Learning Contests. Neural Netw. 2015, 64, 117–124. [Google Scholar] [CrossRef]
  32. Kanade, T.; Cohn, J.F.; Tian, Y. Comprehensive Database for Facial Expression Analysis. In Proceedings of the 4th IEEE International Conference on Automatic Face and Gesture Recognition, Grenoble, France, 28–30 March 2000. [Google Scholar]
  33. Maftouni, M.; Law, A.C.C.; Shen, B.; Kong Grado, Z.; Zhou, Y.; Yazdi, N.A. A Robust Ensemble-Deep Learning Model for COVID-19 Diagnosis Based on an Integrated CT Scan Images Database. In Proceedings of the IISE Annual Conference and Expo 2021, Virtual, 22–25 May 2021. [Google Scholar]
  34. Cheng, G.; Han, J.; Lu, X. Remote Sensing Image Scene Classification: Benchmark and State of the Art. Proc. IEEE 2017, 105, 1865–1883. [Google Scholar] [CrossRef]
  35. Kingma, D.P.; Ba, J.L. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
  36. Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-Unet: Unet-Like Pure Transformer for Medical Image Segmentation. In Proceedings of the Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Milan, Italy, 29 September–4 October 2024; Volume 13803 LNCS. [Google Scholar]
Figure 1. Images corrupted by mixed Gaussian (G) and random salt-and-pepper (RSP) noise: (a) G:10 + RSP, (b) G:30 + RSP, (c) G:50 + RSP, (d) G:70 + RSP, (e) G:90 + RSP.
Figure 2. Architecture of the proposed HREDN.
Figure 3. Architecture of the multi-scale feature enhancement block.
Figure 4. In-depth architecture of proposed attention block.
Figure 5. Visual comparison among eight facial image denoising methods on a single testing image from the FER2013 dataset with noise level (G:30 + RSP).
Figure 6. Visual comparison results for eight facial image denoising methods on single testing image from CKPLUS dataset with noise level (G:30 + RSP).
Figure 7. Visual comparison results for eight CT scan image denoising methods on single testing image from Curated COVID CT dataset with noise level (G:30 + RSP).
Figure 8. Visual comparison results for eight remote sensing image denoising methods on single testing image from NWPU-RESISC45 dataset with noise level (G:30 + RSP).
Table 1. PSNR and SSIM value gained after adding noise.
Original vs. Noisy Image
Noise Type | PSNR | SSIM
Gaussian (30) | 18.59 | 0.56
Salt-and-Pepper (5%) | 18.45 | 0.62
Table 2. Average MSE gained before and after denoising 5384 Fer2013 testing images.
Methods | G:10 + RSP | G:30 + RSP | G:50 + RSP | G:70 + RSP | G:90 + RSP
Original vs. Noisy | 0.0839 | 0.0932 | 0.1115 | 0.1389 | 0.1754
Original vs. Denoised:
BM3D | 0.0684 ± 0.059 | 0.0755 ± 0.060 | 0.0942 ± 0.059 | 0.1233 ± 0.054 | 0.1626 ± 0.049
DNCNN | 0.0031 ± 0.002 | 0.0051 ± 0.002 | 0.0111 ± 0.014 | 0.0109 ± 0.003 | 0.0187 ± 0.005
FCAIDE | 0.0011 ± 0.001 | 0.0031 ± 0.001 | 0.0052 ± 0.002 | 0.0073 ± 0.002 | 0.0087 ± 0.003
ADNET | 0.0027 ± 0.002 | 0.0041 ± 0.001 | 0.0075 ± 0.003 | 0.0095 ± 0.003 | 0.0121 ± 0.004
BRDNet | 0.0011 ± 0.001 | 0.0034 ± 0.001 | 0.0058 ± 0.002 | 0.0075 ± 0.003 | 0.0102 ± 0.003
GFF | 0.0119 ± 0.023 | 0.0110 ± 0.009 | 0.0153 ± 0.035 | 0.0163 ± 0.060 | 0.0134 ± 0.005
RDUNET | 0.0015 ± 0.001 | 0.0031 ± 0.001 | 0.0053 ± 0.002 | 0.0071 ± 0.002 | 0.0176 ± 0.006
TransUNET | 0.0012 ± 0.001 | 0.0029 ± 0.001 | 0.0054 ± 0.002 | 0.0068 ± 0.002 | 0.0090 ± 0.003
HREDN | 0.0009 ± 0.001 | 0.0027 ± 0.001 | 0.0047 ± 0.002 | 0.0066 ± 0.002 | 0.0085 ± 0.003
Table 3. Average PSNR gained before and after denoising 5384 Fer2013 testing images.
Methods | G:10 + RSP | G:30 + RSP | G:50 + RSP | G:70 + RSP | G:90 + RSP
Original vs. Noisy | 12.1564 | 11.2294 | 10.0542 | 8.8551 | 7.7067
Original vs. Denoised:
BM3D | 14.0295 ± 55.47 | 13.1275 ± 4.78 | 11.1517 ± 2.91 | 9.4634 ± 1.78 | 8.0671 ± 1.22
DNCNN | 26.0204 ± 2.72 | 23.1781 ± 1.54 | 20.6251 ± 2.46 | 19.7788 ± 1.14 | 17.4133 ± 1.07
FCAIDE | 29.8812 ± 1.96 | 5.3660 ± 1.41 | 23.0189 ± 1.41 | 21.5827 ± 1.40 | 20.8224 ± 1.45
ADNET | 26.7243 ± 2.94 | 24.0540 ± 11.32 | 21.5050 ± 1.46 | 20.4072 ± 1.30 | 19.3431 ± 1.26
BRDNet | 30.1462 ± 1.96 | 24.9166 ± 1.40 | 22.5217 ± 1.28 | 21.4418 ± 1.38 | 20.1332 ± 1.32
GFF | 20.9552 ± 3.03 | 20.1160 ± 1.95 | 19.3942 ± 2.39 | 19.1173 ± 2.08 | 18.9849 ± 1.48
RDUNET | 28.4322 ± 1.22 | 25.2940 ± 1.51 | 23.0055 ± 1.43 | 21.7566 ± 1.47 | 17.7947 ± 1.41
TransUNET | 29.6051 ± 1.93 | 25.5655 ± 1.48 | 22.8748 ± 1.35 | 21.9385 ± 1.43 | 20.6780 ± 1.38
HREDN | 30.6853 ± 1.92 | 25.8967 ± 1.52 | 23.5542 ± 1.48 | 22.0414 ± 1.53 | 20.9744 ± 1.50
Table 4. Average SSIM gained before and after denoising 5384 Fer2013 testing images.
Methods | G:10 + RSP | G:30 + RSP | G:50 + RSP | G:70 + RSP | G:90 + RSP
Original vs. Noisy | 0.3554 | 0.2895 | 0.2253 | 0.1746 | 0.1364
Original vs. Denoised:
BM3D | 0.4253 ± 0.22 | 0.3717 ± 0.20 | 0.2503 ± 0.11 | 0.1746 ± 0.07 | 0.1308 ± 0.06
DNCNN | 0.8763 ± 0.07 | 0.7800 ± 0.07 | 0.7202 ± 0.06 | 0.6540 ± 0.06 | 0.4632 ± 0.07
FCAIDE | 0.9581 ± 0.02 | 0.8833 ± 0.03 | 0.8128 ± 0.04 | 0.7589 ± 0.05 | 0.7123 ± 0.06
ADNET | 0.9060 ± 0.05 | 0.8282 ± 0.04 | 0.7323 ± 0.07 | 0.6827 ± 0.05 | 0.6134 ± 0.05
BRDNet | 0.9585 ± 0.02 | 0.8681 ± 0.03 | 0.7892 ± 0.04 | 0.7392 ± 0.05 | 0.6778 ± 0.05
GFF | 0.7303 ± 0.10 | 0.7169 ± 0.06 | 0.6828 ± 0.07 | 0.6468 ± 0.08 | 0.6323 ± 0.07
RDUNET | 0.9334 ± 0.02 | 0.8860 ± 0.03 | 0.8159 ± 0.04 | 0.7692 ± 0.05 | 0.5760 ± 0.06
TransUNET | 0.9394 ± 0.02 | 0.8483 ± 0.05 | 0.7318 ± 0.06 | 0.7099 ± 0.07 | 0.6337 ± 0.08
HREDN | 0.9632 ± 0.01 | 0.8951 ± 0.03 | 0.8330 ± 0.04 | 0.7759 ± 0.06 | 0.7336 ± 0.06
Table 5. Average IEF gained before and after denoising 5384 Fer2013 testing images.
Methods | G:10 + RSP | G:30 + RSP | G:50 + RSP | G:70 + RSP | G:90 + RSP
Original vs. Denoised:
BM3D | 11.0443 ± 661.91 | 3.2545 ± 105.03 | 1.3066 ± 0.38 | 1.1426 ± 0.08 | 1.0856 ± 0.03
DNCNN | 26.8849 ± 10.07 | 17.0285 ± 7.02 | 14.1173 ± 7.83 | 12.8376 ± 4.02 | 9.5817 ± 2.13
FCAIDE | 72.3963 ± 43.81 | 30.2600 ± 17.76 | 21.8154 ± 11.78 | 19.8386 ± 9.01 | 21.7623 ± 9.25
ADNET | 31.0252 ± 10.78 | 22.6659 ± 13.47 | 15.7620 ± 8.18 | 15.1775 ± 5.85 | 15.2315 ± 4.75
BRDNet | 76.4292 ± 45.00 | 26.9473 ± 15.02 | 19.4890 ± 9.62 | 19.2113 ± 7.75 | 19.2113 ± 7.75
GFF | 9.9257 ± 6.65 | 9.1453 ± 5.26 | 9.5543 ± 4.19 | 11.5889 ± 4.92 | 14.2739 ± 5.36
RDUNET | 53.2546 ± 32.30 | 29.6204 ± 17.28 | 21.7541 ± 10.68 | 20.7435 ± 8.76 | 10.6554 ± 3.19
TransUNET | 67.4204 ± 39.47 | 32.1680 ± 0.20 | 21.1724 ± 10.72 | 21.5955 ± 8.76 | 21.0163 ± 8.26
HREDN | 88.6590 ± 58.44 | 34.6337 ± 21.51 | 25.0132 ± 14.07 | 22.5668 ± 18.92 | 22.7099 ± 13.77
Table 6. Average MSE gained before and after denoising CKPLUS images.
Methods | G:10 + RSP | G:30 + RSP | G:50 + RSP | G:70 + RSP | G:90 + RSP
Original vs. Noisy | 0.0927 | 0.0992 | 0.1177 | 0.1422 | 0.1822
Original vs. Denoised:
BM3D | 0.0781 ± 0.067 | 0.0829 ± 0.064 | 0.1016 ± 0.061 | 0.1283 ± 0.056 | 0.1699 ± 0.051
DNCNN | 0.0034 ± 0.003 | 0.0051 ± 0.002 | 0.0121 ± 0.014 | 0.0113 ± 0.002 | 0.0197 ± 0.004
FCAIDE | 0.0011 ± 0.001 | 0.0030 ± 0.001 | 0.0053 ± 0.001 | 0.0073 ± 0.002 | 0.0090 ± 0.002
ADNET | 0.0027 ± 0.002 | 0.0044 ± 0.001 | 0.0080 ± 0.002 | 0.0104 ± 0.003 | 0.0141 ± 0.003
BRDNet | 0.0011 ± 0.001 | 0.0034 ± 0.001 | 0.0060 ± 0.001 | 0.0079 ± 0.002 | 0.0108 ± 0.003
GFF | 0.0104 ± 0.009 | 0.0113 ± 0.004 | 0.0790 ± 0.230 | 0.0139 ± 0.008 | 0.0143 ± 0.004
RDUNET | 0.0014 ± 0.000 | 0.0030 ± 0.001 | 0.0050 ± 0.001 | 0.0066 ± 0.002 | 0.0166 ± 0.006
TransUNET | 0.0012 ± 0.001 | 0.0028 ± 0.001 | 0.0057 ± 0.001 | 0.0065 ± 0.002 | 0.0091 ± 0.002
HREDN | 0.0009 ± 0.000 | 0.0026 ± 0.001 | 0.0044 ± 0.001 | 0.0063 ± 0.002 | 0.0083 ± 0.002
Table 7. Average PSNR gained before and after denoising CKPLUS images.
Methods | G:10 + RSP | G:30 + RSP | G:50 + RSP | G:70 + RSP | G:90 + RSP
Original vs. Noisy | 11.7807 | 11.0194 | 9.8346 | 8.7665 | 7.5486
Original vs. Denoised:
BM3D | 13.4348 ± 5.43 | 12.7514 ± 4.88 | 10.7952 ± 2.88 | 9.3025 ± 1.82 | 7.8825 ± 1.25
DNCNN | 25.7057 ± 2.96 | 23.1300 ± 1.47 | 20.1699 ± 2.40 | 19.5756 ± 0.89 | 17.1669 ± 0.97
FCAIDE | 29.8132 ± 1.70 | 25.4079 ± 1.13 | 22.9416 ± 1.16 | 21.4770 ± 1.08 | 20.6398 ± 1.20
ADNET | 26.6098 ± 2.86 | 23.7323 ± 1.04 | 21.1195 ± 1.11 | 19.9613 ± 1.13 | 18.6523 ± 1.09
BRDNet | 30.0759 ± 1.77 | 24.8131 ± 1.15 | 22.3180 ± 1.01 | 21.1683 ± 1.11 | 19.8075 ± 1.12
GFF | 20.5565 ± 2.32 | 19.7845 ± 1.65 | 17.1302 ± 5.55 | 18.8690 ± 1.50 | 18.6391 ± 1.34
RDUNET | 28.5416 ± 1.03 | 25.3700 ± 1.31 | 23.1461 ± 1.14 | 21.9464 ± 1.12 | 18.0431 ± 1.38
TransUNET | 29.6529 ± 1.68 | 25.6144 ± 1.16 | 22.5530 ± 1.08 | 21.9911 ± 1.14 | 20.5479 ± 1.10
HREDN | 30.8059 ± 1.57 | 25.9801 ± 1.19 | 23.7090 ± 1.16 | 22.1735 ± 1.17 | 20.9619 ± 0.04
Table 8. Average SSIM gained before and after denoising CKPLUS images.
Methods | G:10 + RSP | G:30 + RSP | G:50 + RSP | G:70 + RSP | G:90 + RSP
Original vs. Noisy | 0.3960 | 0.3429 | 0.2750 | 0.2225 | 0.1734
Original vs. Denoised:
BM3D | 0.4626 ± 0.21 | 0.4194 ± 0.19 | 0.3038 ± 0.11 | 0.2268 ± 0.08 | 0.1706 ± 0.06
DNCNN | 0.8979 ± 0.05 | 0.8367 ± 0.05 | 0.7740 ± 0.05 | 0.7253 ± 0.05 | 0.5403 ± 0.05
FCAIDE | 0.9707 ± 0.01 | 0.9201 ± 0.02 | 0.8702 ± 0.03 | 0.8258 ± 0.04 | 0.7870 ± 0.04
ADNET | 0.9260 ± 0.03 | 0.8651 ± 0.03 | 0.7932 ± 0.06 | 0.7417 ± 0.04 | 0.6602 ± 0.04
BRDNet | 0.9707 ± 0.01 | 0.9048 ± 0.02 | 0.8401 ± 0.03 | 0.8013 ± 0.04 | 0.7392 ± 0.04
GFF | 0.7716 ± 0.07 | 0.7603 ± 0.04 | 0.6769 ± 0.16 | 0.7098 ± 0.05 | 0.7019 ± 0.05
RDUNET | 0.9534 ± 0.01 | 0.9246 ± 0.02 | 0.8784 ± 0.03 | 0.8468 ± 0.03 | 0.6893 ± 0.05
TransUNET | 0.9610 ± 0.01 | 0.9047 ± 0.03 | 0.8087 ± 0.05 | 0.8116 ± 0.05 | 0.7482 ± 0.05
HREDN | 0.9752 ± 0.01 | 0.9310 ± 0.02 | 0.8910 ± 0.02 | 0.8502 ± 0.03 | 0.8136 ± 0.04
Table 9. Average IEF gained before and after denoising CKPLUS images.
Methods | G:10 + RSP | G:30 + RSP | G:50 + RSP | G:70 + RSP | G:90 + RSP
Original vs. Denoised:
BM3D | 1.5722 ± 0.70 | 1.6895 ± 1.12 | 1.2670 ± 0.24 | 1.1334 ± 0.07 | 1.0803 ± 0.03
DNCNN | 26.3715 ± 7.83 | 17.5795 ± 6.45 | 13.1186 ± 6.13 | 12.4262 ± 3.15 | 9.2680 ± 1.42
FCAIDE | 74.0230 ± 34.27 | 31.1236 ± 14.53 | 21.6118 ± 7.07 | 19.2882 ± 5.00 | 20.8350 ± 4.53
ADNET | 32.5192 ± 10.02 | 21.3975 ± 10.23 | 14.7096 ± 6.10 | 13.8772 ± 4.18 | 13.1702 ± 2.75
BRDNet | 77.9218 ± 35.03 | 26.9789 ± 12.25 | 18.8929 ± 6.69 | 17.9399 ± 4.58 | 17.1437 ± 3.39
GFF | 9.2810 ± 5.02 | 8.6178 ± 4.14 | 7.6207 ± 4.22 | 10.7420 ± 3.26 | 13.2688 ± 3.44
RDUNET | 58.6180 ± 32.03 | 30.2517 ± 12.97 | 22.6860 ± 7.58 | 21.4818 ± 5.51 | 11.5312 ± 2.72
TransUNET | 71.4136 ± 33.13 | 32.6235 ± 15.19 | 19.8365 ± 6.78 | 21.7372 ± 5.72 | 20.4034 ± 4.44
HREDN | 94.2450 ± 45.88 | 35.4397 ± 16.27 | 25.9209 ± 8.92 | 22.7043 ± 6.09 | 22.4347 ± 4.80
Table 10. Average MSE gained before and after denoising Curated COVID CT dataset.
Methods | G:10 + RSP | G:30 + RSP | G:50 + RSP | G:70 + RSP | G:90 + RSP
Original vs. Noisy | 0.0344 | 0.0426 | 0.0618 | 0.0886 | 0.1258
Original vs. Denoised:
BM3D | 0.0240 ± 0.038 | 0.0266 ± 0.038 | 0.0402 ± 0.040 | 0.0695 ± 0.038 | 0.1103 ± 0.038
DNCNN | 0.0119 ± 0.036 | 0.0031 ± 0.002 | 0.0140 ± 0.051 | 0.1205 ± 0.696 | 0.0113 ± 0.006
FCAIDE | 0.0007 ± 0.001 | 0.0014 ± 0.001 | 0.0021 ± 0.001 | 0.0033 ± 0.002 | 0.0037 ± 0.002
ADNET | 0.0020 ± 0.002 | 0.0045 ± 0.002 | 0.0070 ± 0.009 | 0.0064 ± 0.003 | 0.0098 ± 0.004
BRDNet | 0.0008 ± 0.001 | 0.0016 ± 0.001 | 0.0033 ± 0.002 | 0.0039 ± 0.002 | 0.0037 ± 0.002
GFF | 0.0104 ± 0.113 | 0.0220 ± 0.053 | 0.0066 ± 0.009 | 0.0079 ± 0.015 | 0.0105 ± 0.012
RDUNET | 0.0012 ± 0.001 | 0.0062 ± 0.001 | 0.0022 ± 0.001 | 0.0028 ± 0.002 | 0.0046 ± 0.002
SwinUNET | 0.0010 ± 0.001 | 0.0017 ± 0.001 | 0.0027 ± 0.002 | 0.0036 ± 0.002 | 0.0041 ± 0.003
HREDN | 0.0006 ± 0.001 | 0.0012 ± 0.001 | 0.0020 ± 0.001 | 0.0024 ± 0.001 | 0.0031 ± 0.002
Table 11. Average PSNR gained before and after denoising Curated COVID CT dataset.

| Methods | G:10 + RSP | G:30 + RSP | G:50 + RSP | G:70 + RSP | G:90 + RSP |
|---|---|---|---|---|---|
| Original vs. Noisy | 16.5459 | 14.6544 | 12.5505 | 10.7434 | 9.1307 |
| Original vs. Denoised | | | | | |
| BM3D | 19.729 ± 5.60 | 18.690 ± 5.17 | 15.141 ± 2.88 | 11.958 ± 1.62 | 9.7560 ± 1.16 |
| DNCNN | 24.2760 ± 5.23 | 25.4330 ± 1.58 | 22.1402 ± 3.44 | 20.2527 ± 5.42 | 19.8496 ± 1.67 |
| FCAIDE | 32.6582 ± 2.36 | 29.0439 ± 1.83 | 27.2311 ± 1.88 | 25.2667 ± 1.84 | 24.7140 ± 1.69 |
| ADNET | 27.6839 ± 2.29 | 23.7655 ± 1.58 | 23.0164 ± 2.83 | 22.2420 ± 1.51 | 20.3353 ± 1.34 |
| BRDNet | 31.7221 ± 2.38 | 28.4166 ± 1.92 | 25.1626 ± 1.58 | 24.5054 ± 1.78 | 24.6908 ± 1.69 |
| GFF | 25.1290 ± 3.41 | 22.2717 ± 5.41 | 23.6673 ± 3.25 | 23.0447 ± 3.22 | 21.1637 ± 2.93 |
| RDUNET | 29.4405 ± 1.39 | 22.1498 ± 0.68 | 27.1482 ± 1.84 | 25.9761 ± 1.84 | 23.7356 ± 1.62 |
| SwinUNET | 30.7141 ± 2.25 | 28.3595 ± 1.95 | 26.2044 ± 1.94 | 25.0414 ± 1.98 | 24.3525 ± 1.91 |
| HREDN | 33.0712 ± 2.44 | 29.8158 ± 2.04 | 27.5033 ± 1.83 | 26.6562 ± 1.83 | 25.5341 ± 1.73 |
Table 12. Average SSIM gained before and after denoising Curated COVID CT dataset.

| Methods | G:10 + RSP | G:30 + RSP | G:50 + RSP | G:70 + RSP | G:90 + RSP |
|---|---|---|---|---|---|
| Original vs. Noisy | 0.4942 | 0.3082 | 0.2031 | 0.1423 | 0.1019 |
| Original vs. Denoised | | | | | |
| BM3D | 0.5756 ± 0.17 | 0.5143 ± 0.15 | 0.2883 ± 0.06 | 0.1431 ± 0.04 | 0.0940 ± 0.03 |
| DNCNN | 0.7776 ± 0.09 | 0.8144 ± 0.04 | 0.6974 ± 0.06 | 0.5203 ± 0.08 | 0.5188 ± 0.04 |
| FCAIDE | 0.9543 ± 0.02 | 0.9103 ± 0.04 | 0.8795 ± 0.04 | 0.8348 ± 0.05 | 0.8178 ± 0.05 |
| ADNET | 0.8856 ± 0.03 | 0.6902 ± 0.07 | 0.7736 ± 0.12 | 0.5641 ± 0.08 | 0.5052 ± 0.06 |
| BRDNet | 0.9433 ± 0.03 | 0.8933 ± 0.04 | 0.8130 ± 0.04 | 0.8018 ± 0.05 | 0.8186 ± 0.05 |
| GFF | 0.8232 ± 0.08 | 0.6994 ± 0.19 | 0.8033 ± 0.09 | 0.7797 ± 0.07 | 0.7112 ± 0.11 |
| RDUNET | 0.8750 ± 0.02 | 0.6031 ± 0.05 | 0.8740 ± 0.04 | 0.8477 ± 0.05 | 0.7920 ± 0.05 |
| SwinUNET | 0.8958 ± 0.04 | 0.8326 ± 0.05 | 0.7731 ± 0.06 | 0.7337 ± 0.07 | 0.7115 ± 0.07 |
| HREDN | 0.9592 ± 0.02 | 0.9176 ± 0.03 | 0.8855 ± 0.04 | 0.8689 ± 0.04 | 0.8387 ± 0.05 |
Table 13. Average IEF gained before and after denoising Curated COVID CT dataset.

| Methods | G:10 + RSP | G:30 + RSP | G:50 + RSP | G:70 + RSP | G:90 + RSP |
|---|---|---|---|---|---|
| Original vs. Denoised | | | | | |
| BM3D | 2.2878 ± 1.07 | 3.2563 ± 2.88 | 1.8942 ± 0.67 | 1.3306 ± 0.16 | 1.1563 ± 0.06 |
| DNCNN | 6.6344 ± 2.77 | 12.9500 ± 5.82 | 10.2342 ± 3.03 | 11.5061 ± 4.18 | 12.1719 ± 3.12 |
| FCAIDE | 51.4136 ± 38.61 | 30.6735 ± 16.86 | 31.0313 ± 11.23 | 29.6048 ± 8.98 | 37.9663 ± 11.76 |
| ADNET | 14.5153 ± 6.13 | 8.7725 ± 3.72 | 11.7021 ± 3.35 | 14.3738 ± 2.70 | 13.5151 ± 3.01 |
| BRDNet | 39.8532 ± 26.14 | 26.2605 ± 13.43 | 18.9899 ± 5.89 | 24.6910 ± 6.73 | 37.6310 ± 11.10 |
| GFF | 9.9782 ± 6.98 | 7.2333 ± 3.28 | 14.1050 ± 4.56 | 19.3547 ± 8.08 | 17.6712 ± 7.24 |
| RDUNET | 24.9305 ± 17.65 | 6.9188 ± 5.29 | 30.5303 ± 11.70 | 35.2578 ± 12.59 | 30.2938 ± 9.68 |
| SwinUNET | 32.0464 ± 22.72 | 26.4815 ± 15.78 | 24.8669 ± 10.91 | 28.4843 ± 10.36 | 35.3092 ± 12.43 |
| HREDN | 56.4491 ± 42.18 | 36.9480 ± 21.78 | 33.4629 ± 14.28 | 41.1711 ± 14.35 | 46.1834 ± 15.93 |
Table 14. Average MSE gained before and after denoising NWPU-RESISC45 dataset.

| Methods | G:10 + RSP | G:30 + RSP | G:50 + RSP | G:70 + RSP | G:90 + RSP |
|---|---|---|---|---|---|
| Original vs. Noisy | 0.0463 | 0.0549 | 0.0732 | 0.1001 | 0.1379 |
| Original vs. Denoised | | | | | |
| BM3D | 0.0310 ± 0.048 | 0.0352 ± 0.049 | 0.0517 ± 0.049 | 0.0825 ± 0.046 | 0.1232 ± 0.046 |
| DNCNN | 0.0025 ± 0.002 | 0.0049 ± 0.013 | 0.0143 ± 0.093 | 0.0619 ± 0.332 | 0.0089 ± 0.005 |
| FCAIDE | 0.0009 ± 0.001 | 0.0017 ± 0.002 | 0.0026 ± 0.002 | 0.0034 ± 0.003 | 0.0044 ± 0.004 |
| ADNET | 0.0024 ± 0.001 | 0.0037 ± 0.004 | 0.0034 ± 0.003 | 0.0060 ± 0.004 | 0.0064 ± 0.005 |
| BRDNet | 0.0015 ± 0.001 | 0.0020 ± 0.002 | 0.0286 ± 0.024 | 0.0038 ± 0.003 | 0.0059 ± 0.005 |
| GFF | 0.0031 ± 0.004 | 0.0623 ± 0.158 | 0.1029 ± 0.522 | 2.5468 ± 16.767 | 0.0182 ± 0.031 |
| RDUNET | 0.0007 ± 0.001 | 0.0017 ± 0.002 | 0.0026 ± 0.003 | 0.0315 ± 0.006 | 0.0493 ± 0.009 |
| SwinUNET | 0.0008 ± 0.001 | 0.0018 ± 0.002 | 0.0028 ± 0.003 | 0.0035 ± 0.004 | 0.0043 ± 0.005 |
| HREDN | 0.0007 ± 0.001 | 0.0016 ± 0.001 | 0.0025 ± 0.002 | 0.0033 ± 0.003 | 0.0039 ± 0.004 |
Table 15. Average PSNR gained before and after denoising NWPU-RESISC45 dataset.

| Methods | G:10 + RSP | G:30 + RSP | G:50 + RSP | G:70 + RSP | G:90 + RSP |
|---|---|---|---|---|---|
| Original vs. Noisy | 15.3335 | 13.7215 | 11.9126 | 10.2763 | 8.7689 |
| Original vs. Denoised | | | | | |
| BM3D | 19.3923 ± 6.50 | 18.2901 ± 6.22 | 14.2067 ± 3.23 | 11.2872 ± 1.82 | 9.3137 ± 1.29 |
| DNCNN | 26.9817 ± 2.87 | 25.0395 ± 3.34 | 23.6504 ± 3.79 | 21.3229 ± 5.54 | 21.0595 ± 1.96 |
| FCAIDE | 31.4057 ± 2.60 | 28.8960 ± 3.14 | 26.9885 ± 3.05 | 26.0518 ± 3.18 | 24.8383 ± 3.14 |
| ADNET | 26.8334 ± 2.09 | 25.1769 ± 2.48 | 25.7991 ± 2.80 | 22.8546 ± 2.11 | 22.6764 ± 2.26 |
| BRDNet | 28.9446 ± 2.31 | 28.2385 ± 3.00 | 16.2222 ± 2.30 | 25.4023 ± 2.92 | 23.2963 ± 2.65 |
| GFF | 27.4096 ± 4.27 | 22.0400 ± 8.49 | 20.2756 ± 6.74 | 21.8686 ± 10.31 | 20.6686 ± 4.54 |
| RDUNET | 32.7698 ± 3.25 | 28.9744 ± 3.25 | 27.2181 ± 3.30 | 15.0838 ± 0.79 | 13.1487 ± 0.81 |
| SwinUNET | 32.4972 ± 3.14 | 28.5223 ± 2.94 | 26.7357 ± 3.08 | 26.0650 ± 3.42 | 25.3101 ± 3.67 |
| HREDN | 32.7986 ± 3.16 | 29.2580 ± 3.26 | 27.3235 ± 3.30 | 26.1450 ± 3.28 | 25.5101 ± 3.33 |
Table 16. Average SSIM gained before and after denoising NWPU-RESISC45 dataset.

| Methods | G:10 + RSP | G:30 + RSP | G:50 + RSP | G:70 + RSP | G:90 + RSP |
|---|---|---|---|---|---|
| Original vs. Noisy | 0.2836 | 0.1915 | 0.1265 | 0.0882 | 0.0634 |
| Original vs. Denoised | | | | | |
| BM3D | 0.4762 ± 0.24 | 0.3739 ± 0.22 | 0.1334 ± 0.09 | 0.0705 ± 0.06 | 0.0475 ± 0.05 |
| DNCNN | 0.7976 ± 0.07 | 0.7575 ± 0.08 | 0.6770 ± 0.09 | 0.6245 ± 0.10 | 0.5376 ± 0.07 |
| FCAIDE | 0.9126 ± 0.03 | 0.8566 ± 0.07 | 0.8087 ± 0.07 | 0.7798 ± 0.08 | 0.7565 ± 0.10 |
| ADNET | 0.8510 ± 0.05 | 0.7604 ± 0.07 | 0.7707 ± 0.08 | 0.6472 ± 0.06 | 0.5696 ± 0.07 |
| BRDNet | 0.8530 ± 0.04 | 0.8455 ± 0.06 | 0.2623 ± 0.08 | 0.7576 ± 0.08 | 0.6801 ± 0.09 |
| GFF | 0.8238 ± 0.09 | 0.6437 ± 0.27 | 0.6921 ± 0.18 | 0.6995 ± 0.20 | 0.6936 ± 0.15 |
| RDUNET | 0.9321 ± 0.04 | 0.8547 ± 0.07 | 0.8114 ± 0.08 | 0.2280 ± 0.08 | 0.1620 ± 0.07 |
| SwinUNET | 0.8557 ± 0.07 | 0.7271 ± 0.11 | 0.6499 ± 0.11 | 0.6040 ± 0.11 | 0.5704 ± 0.13 |
| HREDN | 0.9328 ± 0.04 | 0.8652 ± 0.06 | 0.8158 ± 0.08 | 0.7889 ± 0.09 | 0.7631 ± 0.10 |
Table 17. Average IEF gained before and after denoising NWPU-RESISC45 dataset.

| Methods | G:10 + RSP | G:30 + RSP | G:50 + RSP | G:70 + RSP | G:90 + RSP |
|---|---|---|---|---|---|
| Original vs. Denoised | | | | | |
| BM3D | 3.0925 ± 2.28 | 4.2752 ± 5.17 | 1.7778 ± 0.57 | 1.2686 ± 0.13 | 1.1348 ± 0.05 |
| DNCNN | 16.1654 ± 6.99 | 14.5267 ± 4.57 | 17.6271 ± 7.52 | 17.0917 ± 7.45 | 18.1111 ± 6.86 |
| FCAIDE | 55.2539 ± 0.03 | 43.7731 ± 42.69 | 39.1788 ± 29.14 | 45.7072 ± 30.07 | 47.6427 ± 26.99 |
| ADNET | 16.4873 ± 9.09 | 16.8464 ± 11.29 | 29.5449 ± 21.09 | 20.0266 ± 9.96 | 26.5407 ± 9.55 |
| BRDNet | 27.7074 ± 16.54 | 34.1582 ± 22.46 | 2.7167 ± 0.32 | 37.4862 ± 19.71 | 31.3183 ± 12.30 |
| GFF | 23.9413 ± 21.58 | 22.7844 ± 27.40 | 11.9775 ± 8.65 | 30.5923 ± 23.35 | 20.0421 ± 11.57 |
| RDUNET | 83.3028 ± 93.30 | 45.8627 ± 49.66 | 42.8922 ± 35.81 | 3.2826 ± 1.61 | 2.8931 ± 1.15 |
| SwinUNET | 76.9846 ± 85.22 | 41.7133 ± 52.54 | 38.3050 ± 33.11 | 47.4897 ± 34.78 | 58.0151 ± 43.56 |
| HREDN | 84.3325 ± 96.60 | 49.3666 ± 55.75 | 43.3928 ± 33.26 | 47.8326 ± 34.86 | 58.0657 ± 37.55 |
Table 18. Statistical evaluation of HREDN performance across domains.

| Metric | ANOVA p-Value | Levene's p-Value |
|---|---|---|
| MSE | 0.0963 | 0.0756 |
| PSNR | 0.1585 | 0.8916 |
| SSIM | 0.3641 | 0.4112 |
| IEF | 0.3520 | 0.7268 |
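Table 18 presumably compares HREDN's scores across the three evaluation domains (the exact grouping is not restated here); p-values above 0.05 for both tests are consistent with the claim that performance does not differ significantly in mean or variance across domains. A minimal sketch of such an analysis with SciPy is shown below; the score arrays are synthetic placeholders, not the paper's data.

```python
import numpy as np
from scipy import stats

# Placeholder per-domain score arrays; in practice these would hold HREDN's
# per-image PSNR (or MSE/SSIM/IEF) values on the facial, CT, and remote
# sensing test sets.
rng = np.random.default_rng(0)
psnr_face = rng.normal(26.0, 1.2, size=100)
psnr_ct = rng.normal(26.5, 1.8, size=100)
psnr_rs = rng.normal(26.2, 3.3, size=100)

# One-way ANOVA tests the null hypothesis of equal means across domains.
_, anova_p = stats.f_oneway(psnr_face, psnr_ct, psnr_rs)

# Levene's test checks homogeneity of variances across domains.
_, levene_p = stats.levene(psnr_face, psnr_ct, psnr_rs)

# p > 0.05 means the null hypothesis (equal means / equal variances) is not
# rejected at the 5% significance level.
print(f"ANOVA p = {anova_p:.4f}, Levene p = {levene_p:.4f}")
```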
Table 19. Impact of modules on network performance for denoising Fer2013 images with noise (G:30 + RSP).

| Methods | Average MSE | Average PSNR | Average SSIM | Average IEF |
|---|---|---|---|---|
| ED | 0.0028 ± 0.00 | 25.7280 ± 1.52 | 0.8919 ± 0.03 | 33.3376 ± 20.67 |
| ED + Attention | 0.0028 ± 0.00 | 25.7320 ± 1.55 | 0.8943 ± 0.03 | 33.2194 ± 20.42 |
| ED + Attention + MSFEB | 0.0027 ± 0.00 | 25.8967 ± 1.52 | 0.8951 ± 0.03 | 34.6337 ± 21.51 |
Table 20. Impact of modules on network performance for denoising Fer2013 images.

| Methods | Training Time (Minutes) | Number of Parameters (Millions) | Inference Time (Seconds) |
|---|---|---|---|
| ED | 39 | 38.38 | 20 |
| ED + Attention | 27 | 41.53 | 19 |
| ED + Attention + MSFEB | 63 | 60.68 | 27 |