

A Data-Centric Solution to NonHomogeneous Dehazing via Vision Transformer

Yangyi Liu¹, Huan Liu¹, Liangyan Li¹, Zijun Wu² and Jun Chen¹
¹McMaster University, Hamilton, Canada
²China Telecom Research Institute, Shanghai, China
{liu5, lil61, chenjun}@mcmaster.ca, liuh127@outlook.com, wuzj12@chinatelecom.cn

Figure 1. Our results on the NTIRE 2023 dehazing challenge, achieving the best performance in terms of PSNR, SSIM and LPIPS.

Abstract

Recent years have witnessed an increased interest in image dehazing. Many deep learning methods have been proposed to tackle this challenge, and have made significant accomplishments dealing with homogeneous haze. However, these solutions cannot maintain comparable performance when they are applied to images with non-homogeneous haze, e.g., the NH-HAZE23 dataset introduced by the NTIRE challenge. One of the reasons for such failures is that non-homogeneous haze does not obey one of the assumptions required for modeling homogeneous haze. In addition, a large number of pairs of non-homogeneous hazy images and their clean counterparts is required by traditional end-to-end training approaches, while the NH-HAZE23 dataset is of limited quantity. Although it is possible to augment the NH-HAZE23 dataset by leveraging other non-homogeneous dehazing datasets, we observe that it is necessary to design a proper data-preprocessing technique that reduces the distribution gap between the target dataset and the augmented one. This finding indeed aligns with the essence of data-centric AI. With a novel network architecture and a principled data-preprocessing approach that systematically enhances data quality, we present an innovative dehazing method. Specifically, we apply RGB-channel-wise transformations on the augmented datasets, and incorporate the state-of-the-art transformers as the backbone in the two-branch framework. We conduct extensive experiments and ablation studies to demonstrate the effectiveness of our proposed method. The source code is available at https://github.com/yangyiliu21/ntire2023_ITBdehaze.

1. Introduction

Recent years have witnessed an increased interest in image dehazing, which is categorized as one of the sub-tasks in image restoration. Haze naturally exists all over the world, and has become more frequent due to climate change. This common atmospheric phenomenon has drawn significant attention because of its potential risks to traffic safety, as both human observation and computer vision models are prone to fail in hazy scenes. These factors make image dehazing an important low-level vision task, and many methods have been proposed to tackle this challenge [10, 15, 18, 21–23, 29, 30, 35–37, 39, 42].

Among them, many neural network based approaches [10, 11, 21, 29, 39, 42] show remarkable performance in handling the image dehazing problem. Specifically, benefiting from powerful network modules and vast training data, these end-to-end approaches deliver promising results. However, as the distribution of haze becomes more complicated and non-homogeneous, many of them fail to achieve satisfying results. The reason for such failures is that the thickness of non-homogeneous haze is not determined entirely by the depth of the background scene.

Although researchers have made tremendous efforts collecting data with non-homogeneous haze, e.g., the NH-HAZE datasets [6–8], the quantity is still limited. A common belief is that models are prone to overfitting when a deep neural network is trained from scratch on such small datasets. A naive solution is to combine all the available non-homogeneous haze datasets to form a relatively larger one. However, due to the differences between datasets caused by a variety of factors, such as color distortion, object complexity and camera capability, it has been shown that a direct combination actually compromises the dehazing performance on individual datasets [22]. It remains a serious challenge to find a robust solution for the practical situation where both the quality and the quantity of the available data are limited.

To address the above-mentioned problems, we adopt the two-branch framework consisting of state-of-the-art backbone networks, with a novel data-preprocessing transformation applied to the NH-HAZE datasets from previous years. Motivated by the idea of data-centric AI, namely that machine learning has matured to a point where high-performance model architectures are widely available while approaches to engineering datasets have lagged [1, 27], we put much effort into engineering the data. Inspired by the promising performance of gamma correction [15, 37], we propose a simple yet effective RGB-channel-wise data-preprocessing approach. We demonstrate its suitability for this competition setting, and argue that it is a promising principle for augmenting similar datasets. Details of this data-centric-AI-inspired preprocessing approach are discussed in later sections. Regarding the network architecture, we design our model under the two-branch framework [15, 36, 37]. In the first branch, we adopt the Swin Transformer V2 model [24] pre-trained on the ImageNet dataset [12] as the encoder. The powerful Swin Transformer is accredited to supersede previous methods in many contexts of transfer learning, where the knowledge gained from a large-scale benchmark is adapted to task-specific datasets [20, 24]. Such pertinent features are of vital importance when dealing with small real-world non-homogeneous datasets [37]. Alongside a refined decoder and skip connections, the first branch extracts multi-level features of the hazy images. The second branch is introduced to complement the knowledge learned from the pre-trained model by exclusively working on the domain of the target data. For simplicity, we follow [37] to build the second branch with an RCAN [40]. Since there are no down-sampling and up-sampling operations in the second branch, we expect it to extract features distinct from the ones obtained by the first branch. Finally, a fusion tail aggregates the results from both branches and produces dehazed output images.

Overall, our contributions are summarized as follows. Firstly, we put forward a simple but effective data-preprocessing approach inspired by data-centric AI, leveraging extra data to significantly enhance our model. Secondly, we incorporate the state-of-the-art backbone in the two-branch framework. By carefully balancing the two branches, our model demonstrates promising results using limited-size datasets, and outperforms other current approaches adopting this pipeline. Finally, we conduct extensive experiments to demonstrate the competitive performance of our proposed method. With a substantial ablation study on different combinations of models and data, we hope to convince future competition participants to pay equal attention to model design and data engineering.

2. Related Works

In this section, we briefly review the literature on single image dehazing and learning with limited data.

Single Image Dehazing. Approaches proposed for single image dehazing are divided into two categories: prior-based methods and learning-based methods. To guarantee performance, prior-based methods require reasonable assumptions and knowledge about hazy images to obtain accurate estimations of the transmission map and atmospheric light intensity in ASM modeling [26]. Representative works in this category include [9, 14, 18, 34, 44]. Specifically, [34] observed that clear images have higher contrast compared to their hazy counterparts, and proposed a local contrast maximization method. Based on the assumption that image pixels in haze-free patches have intensity values close to zero in at least one color channel, [18] introduced the Dark Channel Prior (DCP). [44] presented a linear model adapting the color attenuation prior (CAP) to estimate the depth according to the difference between the brightness and the saturation of hazy images. Prior-based methods left a permanent mark on single image dehazing, but their vulnerability in variable scenes pivoted researchers to another direction, the learning-based methods. With the advances in neural networks, [10, 11, 21, 29, 39, 42] have proposed progressively more powerful models that are capable of directly recovering the clean image from the hazy image without estimating the transmission map and depth. The superiority of these methods in removing homogeneous haze is attributed to the availability of large training datasets. When applied to non-homogeneous haze, they fail to yield comparable results.

Figure 2. Comparison of RGB-wise distribution of datasets (GT) before and after being processed by our proposed method.

The limited quantity of existing non-homogeneous haze datasets prevents researchers from adopting simple end-to-end training methods.

Learning with Limited Data. Data is indispensable for all AI models. Many models demand a huge dataset for training, but a large dataset is not always available. Therefore, researchers are urged to find solutions for training with limited data. In terms of dehazing, a seemingly straightforward solution to the issues caused by small non-homogeneous training datasets is composing a relatively large dataset by combining several small ones. In terms of the NTIRE2023 challenge [8], this can be done by augmenting the NH-HAZE datasets (augmented dataset) [6, 7] with this year's data (target dataset). Surprisingly, against the common belief that a larger dataset is always better in deep learning, [22] observed that models perform better when training and testing are conducted on a single dataset (as opposed to the union of all datasets). This observation indicates that the augmented dataset lies in a different domain compared to the target data; direct aggregation introduces a domain shift problem within the dataset. Thereby, [22] proposed a test-time training strategy to mitigate the problem, while [15, 31, 37] chose to adjust the domains of the training data before sending them into the dehazing modules. Interestingly, the idea of focusing on improving the dataset rather than the model was introduced by the Data-Centric AI competition [1]. Data-centric AI is anticipated to deliver a set of approaches for dataset optimization, thereby enabling deep neural networks to be effectively trained using smaller datasets [27]. The proposed techniques range widely from simple ones to complex combinations [38]. Through our experiments and qualitative analysis, we find that a too-simple approach, such as the gamma correction adopted by [15, 37], fails to recover the color accurately. Nevertheless, a complicated method, like [31] applying domain adaptation to learn a separate neural network to translate the data, is infeasible due to the scarcity and the lack of depth information of the available data. In the next section, we introduce our innovative solution standing out in the NTIRE challenge settings.

3. Proposed Method

In this section, we introduce the details of our methodology following the order of the working pipeline. Firstly, we describe the data-preprocessing method inspired by the idea of data-centric AI. Secondly, details of our model architecture are presented, as well as the function of each component. Finally, we introduce the loss functions applied to train our proposed networks.

3.1. Data-Centric Engineering

Systematically engineering the data is a key requirement for training deep neural networks. The idea of data-centric AI moreover emphasizes assessing the data quality before deployment [38]. By comparing the NH-HAZE20 and 21 datasets [6, 7] to the data provided this year, both numerically and empirically, we notice an obvious color discrepancy. When evaluating on this year's test data, training on a direct combination of all data does not boost the score compared to training on this year's data only (see results in Section 4.3.1). Therefore, our goal is to propose an approach that reduces the color differences, and shifts the distribution of the augmented data towards that of the target data. Inspired by the successful application of gamma correction [15, 37] as a simple yet effective data-preprocessing technique, we propose a more systematic solution for data engineering. Instead of following the practice in [15, 37] of applying gray-scale gamma correction, we correct each of the R, G, B channels separately:

O_{R,G,B} = \left( \frac{I_{R,G,B}}{255} \right)^{\frac{1}{\gamma_{R,G,B}}},   (1)

where O and I are the output and input pixel intensities (∈ [0, 255]), respectively, and γ is the gamma factor. The subscripts R, G, B indicate that the values for different channels are unique.

Figure 3. An overview of our network. The model consists of two branches. The transfer learning branch is composed of a Swin Transformer based model. The data fitting branch consists of residual channel attention groups.

As for implementation, we first calculate the average pixel intensity of each channel of the three datasets; then, for each channel of the NH-HAZE20 or 21 dataset, we apply a transformation with a unique gamma value to all pixels, resulting in mean and variance values similar to those of the corresponding channel of the NH-HAZE23 dataset. In Figure 2, we present the histogram change with the corresponding γ values. From observation, our method adjusts the color of the NH-HAZE20 and 21 data to become much more similar to the NH-HAZE23 data. Numerically, the average pixel intensity of the 2023 data is 107.46 (R), 114.48 (G), 101.92 (B). After applying our method, the adjusted average pixel intensity of the NH-HAZE20 data is 107.77 (R), 114.33 (G), 102.08 (B), and that of the NH-HAZE21 data is 107.43 (R), 115.01 (G), 102.13 (B). Note that we apply this preprocessing not only to the clean ground truth images but also to the hazy images (as opposed to [15, 37], which only manipulate the ground truth images).
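To make the procedure concrete, below is a minimal sketch of this preprocessing step. It is not our released code: the helper names are illustrative, and the use of scalar root-finding (SciPy's brentq) to match channel means is one plausible way to obtain the per-channel gamma values; the text above only requires that the resulting channel means match.

    import numpy as np
    from scipy.optimize import brentq

    def apply_channel_gamma(img, gammas):
        # Eq. (1): O = (I / 255)^(1 / gamma), applied per R, G, B channel.
        norm = img.astype(np.float64) / 255.0
        out = np.stack([norm[..., c] ** (1.0 / gammas[c]) for c in range(3)],
                       axis=-1)
        return np.clip(out * 255.0, 0, 255).round().astype(np.uint8)

    def fit_channel_gammas(source_imgs, target_means):
        # Pick one gamma per channel so that the mean intensity of the
        # transformed source images matches the target channel mean.
        pixels = np.concatenate(
            [im.reshape(-1, 3).astype(np.float64) / 255.0 for im in source_imgs])
        gammas = []
        for c, target in enumerate(target_means):
            diff = lambda g: 255.0 * (pixels[:, c] ** (1.0 / g)).mean() - target
            gammas.append(brentq(diff, 0.2, 5.0))  # bracket assumed wide enough
        return gammas

    # NH-HAZE23 channel means (R, G, B) reported above:
    nh23_means = (107.46, 114.48, 101.92)

Both the hazy and the ground truth images of NH-HAZE20 and 21 would then be passed through apply_channel_gamma with their fitted gammas, mirroring the choice above to transform both sides of each pair.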
cessor, Vision Transformer (ViT) [13], which struggles with
With this novel data-preprocessing method, the distribu- high resolution images because its complexity is quadratic
tions of all three color channels of NH-HAZE20 and 21 data to the input size. The working pipeline of the Swin Trans-
are shifted closer to those of NH-HAZE23 dataset. Benefit- former is summarized as follows. First, Swin Transformer
ing from more in-distribution data, the models gain substan- splits an input image into non-overlapping patches with a
tial improvements. Being able to work with small but good patch splitting module. Through a linear embedding layer,
dataset, rather than a larger but internally diverged one helps the patches and their features are set as a concatenation of
us stand out in the competition. This indeed aligns with the the raw pixel RGB values, also referred to as “token”, and
idea of data-centric AI [27, 38]. For future competition par- then be projected to an arbitrary dimension. These tokens
ticipants, we elect this approach to be a good starting point are processed by a cascade of stages. Each stage consists
for data engineering. of a linear embedding layer and several Swin Transformer

Each stage consists of a linear embedding layer and several Swin Transformer Block (SwinT Block) modules. The SwinT Block uses cyclic shift with MSA modules to implement efficient batch computation for shifted window partitioning. From one stage to the next, the spatial dimension of the feature maps is effectively reduced, resulting in hierarchical feature maps. These modules compose the encoder part of the Transfer Learning Branch. As for the decoder part, we adopt the ideas from [15, 37]. With skip connections, the attention blocks and up-sampling layers gradually restore the hierarchical feature maps and produce an output with the same spatial dimension as the input.

Rest of the Model. We adopt the Data Fitting Branch from [40], which is based on the residual channel attention block [40]. Trained from scratch, this second branch complements the first one by exclusively working on the domain of the target data. With no down-sampling and up-sampling operations, this branch operates in full-resolution mode, and thus extracts features distinct from the ones obtained by the first branch. A simple yet insightful fusion tail consisting of a reflection padding layer, a 7 × 7 convolutional layer and the Tanh activation [37] combines the features from the two branches and produces dehazed images.
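The following schematic PyTorch sketch illustrates this two-branch layout. The two branch modules are placeholders (the real ones are the Swin-based encoder-decoder and the RCAN-style group stack); only the fusion tail, reflection padding followed by a 7 × 7 convolution and Tanh, follows the description above literally, and the channel width feat_ch is an assumed hyperparameter.

    import torch
    import torch.nn as nn

    class TwoBranchDehazer(nn.Module):
        def __init__(self, transfer_branch, fitting_branch, feat_ch=16):
            super().__init__()
            self.transfer_branch = transfer_branch  # pre-trained Swin encoder + decoder
            self.fitting_branch = fitting_branch    # full-resolution RCAN-style branch
            # Fusion tail: reflection pad + 7x7 conv + Tanh, as described above.
            self.tail = nn.Sequential(
                nn.ReflectionPad2d(3),
                nn.Conv2d(2 * feat_ch, 3, kernel_size=7),
                nn.Tanh(),
            )

        def forward(self, x):
            f1 = self.transfer_branch(x)  # features restored to the input resolution
            f2 = self.fitting_branch(x)   # features that never leave full resolution
            return self.tail(torch.cat([f1, f2], dim=1))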

3.3. Loss Functions

Since our method mainly focuses on data-centric engineering and on incorporating transformers, we follow [15, 37] and adopt a combination of several losses for training our model.

Smooth L1 Loss. For image fidelity reconstruction, the smooth L1 loss [17] has proved to be more robust than the MSE loss in various image restoration tasks [41]. The formulation follows:

L_{l_1} = \frac{1}{N} \sum_{i}^{N} \mathrm{smooth}_{L_1}(y_i - f_\theta(x_i)),   (2)

\mathrm{smooth}_{L_1}(z) = \begin{cases} 0.5 z^2 & \text{if } |z| < 1 \\ |z| - 0.5 & \text{otherwise}, \end{cases}   (3)

where x_i and y_i denote the i-th pixel of the hazy input and the clean ground truth image, respectively, N is the total number of pixels, and f_\theta(\cdot) represents the network.
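In PyTorch, Eqs. (2)-(3) correspond to the built-in smooth L1 loss with its default threshold of 1; a minimal sketch:

    import torch.nn as nn

    # Eqs. (2)-(3): averages smooth_L1(y_i - f_theta(x_i)) over all pixels.
    l1_criterion = nn.SmoothL1Loss(beta=1.0)

    def fidelity_loss(model, hazy, clean):
        return l1_criterion(model(hazy), clean)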
MS-SSIM Loss. Multi-Scale Structural Similarity (MS-SSIM) is based on the assumption that human eyes are adapted to extracting structural information, and therefore a metric evaluating structural similarity can provide a good approximation to perceived image quality. Let O and G represent two windows centered at the i-th pixel in the dehazed image and the ground truth image, respectively. Gaussian filters are applied to both windows, producing the corresponding means (μ_O, μ_G), standard deviations σ_O, σ_G, and covariance σ_{OG}. The SSIM formulation for the i-th pixel follows:

\mathrm{SSIM}(i) = \frac{2\mu_O \mu_G + C_1}{\mu_O^2 + \mu_G^2 + C_1} \cdot \frac{2\sigma_{OG} + C_2}{\sigma_O^2 + \sigma_G^2 + C_2},   (4)

where C_1 and C_2 help stabilize the division.
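The training loss uses the multi-scale extension of Eq. (4). We do not prescribe a particular implementation here; one convenient third-party option (an assumption on our part, not a dependency stated in this paper) is the pytorch_msssim package:

    from pytorch_msssim import MS_SSIM

    # MS-SSIM on images scaled to [0, 1]; the training term is 1 - MS-SSIM.
    ms_ssim_module = MS_SSIM(data_range=1.0, channel=3)

    def ms_ssim_loss(model, hazy, clean):
        return 1.0 - ms_ssim_module(model(hazy), clean)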
Perceptual Loss. Besides pixel-scale supervision, to improve perceptual quality we adopt an ImageNet [12] pre-trained VGG16 [32] to measure perceptual similarity, which helps reconstruct finer details [43]. Denoting x and y as the hazy input and the ground truth image respectively, the loss is defined as:

L_{perc} = \frac{1}{N} \sum_{j} \frac{1}{C_j H_j W_j} \left\| \phi_j(f_\theta(x)) - \phi_j(y) \right\|_2^2,   (5)

where f_\theta(x) is the dehazed image, \phi_j(\cdot) denotes the j-th feature map, and N denotes the number of features. We choose the L2 loss to measure the distances between the feature maps.
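A typical VGG16-based realization of Eq. (5) is sketched below. The specific feature maps (relu1_2, relu2_2, relu3_3) are an assumption, since the layers used are not listed above:

    import torch
    import torch.nn.functional as F
    from torchvision.models import vgg16

    class PerceptualLoss(torch.nn.Module):
        def __init__(self, layer_ids=(3, 8, 15)):  # relu1_2/2_2/3_3 (assumed)
            super().__init__()
            self.vgg = vgg16(pretrained=True).features[:max(layer_ids) + 1].eval()
            for p in self.vgg.parameters():
                p.requires_grad = False  # VGG16 is frozen; it only scores features
            self.layer_ids = set(layer_ids)

        def forward(self, dehazed, gt):
            loss, x, y = 0.0, dehazed, gt
            for i, layer in enumerate(self.vgg):
                x, y = layer(x), layer(y)
                if i in self.layer_ids:
                    # Eq. (5): mean squared distance between feature maps.
                    loss = loss + F.mse_loss(x, y)
            return loss / len(self.layer_ids)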
Adversarial Loss. To compensate for the risk that pixel-wise loss functions fail to provide sufficient supervision when training on a small dataset, we employ the adversarial loss [43]:

L_{adv} = \sum_{n=1}^{N} -\log D(f_\theta(x)),   (6)

where f_\theta(x) denotes the dehazed image and D(\cdot) represents the discriminator.

Total Loss. The total loss is a weighted sum of the aforementioned four components with pre-defined weights:

L = L_{l_1} + 0.5\, L_{\text{MS-SSIM}} + 0.01\, L_{perc} + 0.0005\, L_{adv}.   (7)
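Assembling Eq. (7) from the pieces sketched above (the discriminator disc is a separate network whose architecture is not detailed here; the small epsilon for numerical stability is our addition):

    import torch

    perceptual_criterion = PerceptualLoss()

    def total_loss(model, disc, hazy, clean):
        out = model(hazy)
        l_l1 = l1_criterion(out, clean)              # Eq. (2)
        l_msssim = 1.0 - ms_ssim_module(out, clean)  # multi-scale form of Eq. (4)
        l_perc = perceptual_criterion(out, clean)    # Eq. (5)
        l_adv = -torch.log(disc(out) + 1e-8).mean()  # Eq. (6)
        return l_l1 + 0.5 * l_msssim + 0.01 * l_perc + 0.0005 * l_adv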
4. Experiments

In this section, we first introduce the datasets used to conduct the experiments, along with implementation details. Then, we conduct ablation studies to verify the effectiveness of our model design and data-preprocessing method. Finally, we evaluate the dehazing results of our proposed method qualitatively and quantitatively, and compare with several state-of-the-art methods.

4.1. Datasets

O-HAZE. Produced with the help of a professional haze machine that generates real haze, O-HAZE [5] was published in 2018 and contains 45 clean and hazy image pairs in total. Each pair has a unique spatial resolution. We conduct our evaluation based on the official train and test split [2].

DENSE-HAZE. DENSE-HAZE [3, 4] was introduced with the NTIRE2019 challenge. It is characterized by dense and homogeneous haze. The dataset contains 45 training images, 5 validation images and 5 test images. All images are of the same 1600×1200 dimension. In our experiments, we follow the official train, val and test split.

Table 1. Ablation study for architectures and data-preprocessing techniques. The scores are evaluated using the NTIRE2023 online validation server.

                         Res2Net+RCAN       Ours
Data                     PSNR    SSIM       PSNR    SSIM
NH-HAZE23 only           20.68   0.678      21.54   0.682
NH-HAZE20+21+23          20.86   0.688      21.54   0.689
NH-HAZE20+21+23 GC       21.08   0.690      21.58   0.693
NH-HAZE20+21+23 RGB      21.26   0.693      21.94   0.697

Figure 4. Qualitative ablation study on the data-centric design: (a) results generated by the model trained on NH-HAZE20+21+23 GC, (b) results generated by the model trained on NH-HAZE20+21+23 RGB, and (c) ground truth.

NH-HAZE20 & NH-HAZE21. NH-HAZE20 and NH-HAZE21 were released in the NTIRE2020 [6] and NTIRE2021 [7] challenges. The haze pattern in these two datasets is non-homogeneous. The images in both datasets are of size 1600 × 1200. NH-HAZE20 contains 45 training pairs, 5 validation pairs and 5 testing pairs. We adopt the official train and test split to conduct experiments on NH-HAZE20. For NH-HAZE21, we take the first 20 training images as our training set, and the remaining 5 images are used for testing.

NH-HAZE23. Inheriting the non-homogeneous haze style from previous years, NTIRE2023 introduces 50 image pairs, each of a much higher resolution of 4000 × 6000. The increase in image size leads to a larger volume of training data and a greater demand for computational resources. As the ground truth images of the 5 validation pairs and 5 test pairs are not public so far, we can only make use of the 40 training pairs when not evaluating on the server. We adopt different train/test splitting strategies for the quantitative comparison with SOTA methods and for the ablation studies. For the methods comparison, we choose the first 35 images of the official training set as our training data, and the remaining 5 are used for testing. For the ablation study, we use all 40 pairs for training, and obtain testing scores using the online validation server of the challenge.

4.2. Implementation Details

The input images are randomly cropped to a size of 256 × 256, and augmented by several data augmentation strategies, including random rotation by 90, 180 or 270 degrees, horizontal flip, and vertical flip. Note that we do not apply any augmentation strategy related to brightness or color change, as we have no intention of jeopardizing the adjusted color distributions produced by our data-preprocessing method. We use AdamW [25] (β1 = 0.9, β2 = 0.999) as our optimizer. The learning rate is initially set to 1e−4 and decreased to 1e−6 with a cosine annealing strategy. We implement our method with the PyTorch library [28] on two Nvidia Titan XP GPUs. Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index (SSIM) are used as the two metrics for quantitative evaluation.
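These optimizer and schedule settings map directly onto PyTorch; in the sketch below, num_steps (the cosine annealing horizon) is a placeholder, since the total number of training iterations is not reported here:

    import torch

    # model: the two-branch network; num_steps: placeholder iteration count.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4,
                                  betas=(0.9, 0.999))
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=num_steps, eta_min=1e-6)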
4.3. Ablation Study

We conduct comprehensive ablation studies to analyze and demonstrate the effectiveness of our data-preprocessing method and proposed network architecture.

4.3.1 Importance of Data-Centric Design

In Section 3.1, we emphasize the importance of data for succeeding in non-homogeneous dehazing. To demonstrate this further, we conduct experiments on several datasets with different data-preprocessing methods. There are four sets of training data: 1) NH-HAZE23 only: only the data from the NTIRE2023 challenge is used; 2) NH-HAZE20+21+23: a direct combination of data from the NH-HAZE20, NH-HAZE21 and NH-HAZE23 datasets; 3) NH-HAZE20+21+23 GC: a combination of data from NH-HAZE20, NH-HAZE21 and NH-HAZE23, with the GT data from NH-HAZE20 and 21 processed by gray-scale gamma correction as in [15, 37]; 4) NH-HAZE20+21+23 RGB: a combination of data from NH-HAZE20, NH-HAZE21 and NH-HAZE23, with both the hazy and GT data from NH-HAZE20 and 21 processed by our proposed approach described in Section 3.1. We employ these sets of data on two different model architectures. The first model is from the NTIRE2021 challenge [37] (we refer to it as Res2Net+RCAN), where Res2Net [16] is adopted as the backbone. The second model is our proposed one, introduced in Section 3.2. In total, we conduct 8 individual experiments, and report their best results (in terms of the PSNR index) evaluated on the NTIRE2023 online validation server. The results are shown in Table 1.

Figure 5. Comparison of RGB-wise distribution of datasets (hazy) before and after being processed by our proposed method.

By comparing the first and second rows of Table 1, we find that the direct combination of all the available data yields limited improvements for both our model and [37]. By comparing the last two rows with the second row in Table 1, it can be observed that performing data-preprocessing is generally beneficial. Not surprisingly, the models trained on our dataset achieve the best performance. These results reinforce the importance of data-centric engineering.

To qualitatively evaluate the importance of data engineering, we show in Figure 4 the images generated by the models trained on NH-HAZE20+21+23 GC and NH-HAZE20+21+23 RGB, respectively. By comparing the two results with the ground truth, it is obvious that the model trained on our processed dataset generates more faithful results in terms of color and brightness. Specifically, the colors of the building and trees in our result are much more in line with those in the ground truth, while the compared one tends to generate green objects.

4.3.2 Effectiveness of Transformer

As noted in Section 3.2, our network is built upon the recent work [37]. The main difference is that we replace the Res2Net branch in [37] with the proposed Transformer-based structure. By checking the performance of our method and that of Res2Net+RCAN on the four datasets in Table 1, it can easily be observed that our method always outperforms Res2Net+RCAN by a significant margin. This illustrates the effectiveness of using a Transformer in non-homogeneous dehazing.

4.4. Further Analysis on Data-Centric Engineering

In Figure 2, we show the distribution change of the ground truths after applying the proposed RGB gamma correction. In Figure 5, we provide the distribution change of the NH-HAZE20 and 21 hazy images as a supplement. It can be observed that after our data-preprocessing, the distributions of the three image channels (RGB) of the NH-HAZE20 and 21 hazy images are more in line with those of the NH-HAZE23 data. Figure 6 further qualitatively illustrates the images before and after data-preprocessing. The results show that the colors of the processed NH-HAZE20 and NH-HAZE21 data are much more similar to those of NH-HAZE23. We emphasize that this data-centric engineering is the key that helps our method stand out in the competition. Based on the analysis in both this section and Section 4.3.1, we conclude that data quality is one of the determining factors, possibly the most important one, in the NTIRE dehazing challenges.

Figure 6. Examples from the NH-HAZE20 and NH-HAZE21 datasets that visually showcase the color correction.

4.5. Comparisons with the State-of-the-art Methods

For comparison, we select five state-of-the-art methods: DCP [18], AOD-Net [21], GCANet [11], FFA [29], and Res2Net+RCAN [37].

In Table 2, we report the best PSNR and SSIM indexes of each method on five different datasets. The methods adopting the two-branch framework perform generally well on all the datasets; as shown in Figure 7, Res2Net+RCAN and our method produce visually pleasing results on all datasets. They show significantly better performance when dealing with non-homogeneous haze patterns, as can be observed from the results on NH-HAZE20, 21 and 23. Therefore, the two-branch framework remains dominant in limited-data scenarios.

Figure 7. Qualitative evaluation on the four representative datasets, i.e., DENSE-HAZE, NH-HAZE20, NH-HAZE21 and NH-HAZE23. For DENSE-HAZE and NH-HAZE20, we follow the official train, val and test split. For NH-HAZE21 and NH-HAZE23, due to the unavailability of the test data, we split the released official training data into our training set and test set.

Table 2. Quantitative evaluation on the O-HAZE, DENSE-HAZE, NH-HAZE20, NH-HAZE21 and NH-HAZE23 datasets. The best results are marked in bold, and the second best are marked with underlines.

                  O-HAZE         DENSE-HAZE     NH-HAZE20      NH-HAZE21      NH-HAZE23
Methods           PSNR   SSIM    PSNR   SSIM    PSNR   SSIM    PSNR   SSIM    PSNR   SSIM
DCP               12.92  0.505   10.85  0.404   12.29  0.411   11.30  0.605   11.87  0.470
AOD               17.69  0.616   13.30  0.469   13.44  0.413   13.22  0.613   12.47  0.369
GCANet            19.50  0.660   12.42  0.478   17.58  0.594   18.76  0.768   16.36  0.512
FFA               22.12  0.768   16.26  0.545   18.51  0.637   20.40  0.806   18.09  0.585
Res2Net+RCAN      25.54  0.783   16.36  0.582   21.44  0.704   21.66  0.843   20.11  0.627
Ours              25.98  0.789   16.31  0.561   21.44  0.710   21.67  0.838   20.53  0.636

It is worth noting that our model substantially outperforms the Res2Net+RCAN model only on O-HAZE and NH-HAZE23. We argue that the reason is the huge increase in image resolution in the O-HAZE and NH-HAZE23 datasets. For example, the number of pixels in the NH-HAZE23 data is 6.25 times larger than that of the combination of the NH-HAZE20 and NH-HAZE21 datasets. Since our transformer-based model contains more learnable parameters, a larger training dataset can essentially alleviate the overfitting problem. This phenomenon further indicates that, in a limited-data setting, it is more critical to investigate in a data-centric manner than to simply improve the model's capacity.

5. Conclusion

In this paper, we propose a method targeting non-homogeneous dehazing. It consists of a data-preprocessing strategy inspired by data-centric AI and a Transformer-based two-branch model structure. Combining them, we construct a solution that outperforms the SOTA methods, which supports our advocacy of treating the model and the data as equally important. Additionally, extensive experimental results provide strong support for the effectiveness of our method.

References

[1] Data-centric AI competition submission guide, 2021.
[2] Cosmin Ancuti, Codruta O. Ancuti, and Radu Timofte. NTIRE 2018 challenge on image dehazing: Methods and results. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 891–901, 2018.
[3] Codruta O. Ancuti, Cosmin Ancuti, Mateu Sbert, and Radu Timofte. Dense-Haze: A benchmark for image dehazing with dense-haze and haze-free images. In IEEE International Conference on Image Processing (ICIP), 2019.
[4] C. O. Ancuti, C. Ancuti, R. Timofte, L. Van Gool, L. Zhang, M. Yang, T. Guo, X. Li, V. Cherukuri, V. Monga, H. Jiang, S. Yang, Y. Liu, X. Qu, P. Wan, D. Park, S. Y. Chun, M. Hong, J. Huang, Y. Chen, S. Chen, B. Wang, P. N. Michelini, H. Liu, D. Zhu, J. Liu, S. Santra, R. Mondal, B. Chanda, P. Morales, T. Klinghoffer, L. M. Quan, Y. Kim, X. Liang, R. Li, J. Pan, J. Tang, K. Purohit, M. Suin, A. N. Rajagopalan, R. Schettini, S. Bianco, F. Piccoli, C. Cusano, L. Celona, S. Hwang, Y. S. Ma, H. Byun, S. Murala, A. Dudhane, H. Aulakh, T. Zheng, T. Zhang, W. Qin, R. Zhou, S. Wang, J. Tarel, C. Wang, and J. Wu. NTIRE 2019 image dehazing challenge report. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 2241–2253, 2019.
[5] Codruta O. Ancuti, Cosmin Ancuti, Radu Timofte, and Christophe De Vleeschouwer. O-HAZE: A dehazing benchmark with real hazy and haze-free outdoor images. In IEEE Conference on Computer Vision and Pattern Recognition, NTIRE Workshop, 2018.
[6] Codruta O. Ancuti, Cosmin Ancuti, Florin-Alexandru Vasluianu, and Radu Timofte. NTIRE 2020 challenge on nonhomogeneous dehazing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 490–491, 2020.
[7] Codruta O. Ancuti, Cosmin Ancuti, Florin-Alexandru Vasluianu, and Radu Timofte. NTIRE 2021 nonhomogeneous dehazing challenge report. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 627–646, 2021.
[8] Codruta O. Ancuti, Cosmin Ancuti, Florin-Alexandru Vasluianu, and Radu Timofte. NTIRE 2023 challenge on nonhomogeneous dehazing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023.
[9] Dana Berman, Shai Avidan, et al. Non-local image dehazing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1674–1682, 2016.
[10] Bolun Cai, Xiangmin Xu, Kui Jia, Chunmei Qing, and Dacheng Tao. DehazeNet: An end-to-end system for single image haze removal. IEEE Transactions on Image Processing, 25(11):5187–5198, 2016.
[11] D. Chen, M. He, Q. Fan, J. Liao, L. Zhang, D. Hou, L. Yuan, and G. Hua. Gated context aggregation network for image dehazing and deraining. In 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1375–1383, 2019.
[12] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
[13] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
[14] Raanan Fattal. Dehazing using color-lines. ACM Transactions on Graphics (TOG), 34(1):1–14, 2014.
[15] Minghan Fu, Huan Liu, Yankun Yu, Jun Chen, and Keyan Wang. DW-GAN: A discrete wavelet transform GAN for nonhomogeneous dehazing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pages 203–212, June 2021.
[16] Shanghua Gao, Ming-Ming Cheng, Kai Zhao, Xin-Yu Zhang, Ming-Hsuan Yang, and Philip H. S. Torr. Res2Net: A new multi-scale backbone architecture. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
[17] Ross Girshick. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), December 2015.
[18] Kaiming He, Jian Sun, and Xiaoou Tang. Single image haze removal using dark channel prior. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(12):2341–2353, 2010.
[19] Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, and Geoffrey E. Hinton. Adaptive mixtures of local experts. Neural Computation, 3(1):79–87, 1991.
[20] Simon Kornblith, Jonathon Shlens, and Quoc V. Le. Do better ImageNet models transfer better? In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
[21] Boyi Li, Xiulian Peng, Zhangyang Wang, Jizheng Xu, and Dan Feng. AOD-Net: All-in-one dehazing network. In Proceedings of the IEEE International Conference on Computer Vision, pages 4770–4778, 2017.
[22] Huan Liu, Zijun Wu, Liangyan Li, Sadaf Salehkalaibar, Jun Chen, and Keyan Wang. Towards multi-domain single image dehazing via test-time training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5831–5840, 2022.
[23] Jing Liu, Haiyan Wu, Yuan Xie, Yanyun Qu, and Lizhuang Ma. Trident dehazing network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2020.
[24] Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, et al. Swin Transformer V2: Scaling up capacity and resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12009–12019, 2022.
[25] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.
[26] William Edgar Knowles Middleton. Vision Through the Atmosphere. University of Toronto Press, 1952.
[27] Mohammad Motamedi, Nikolay Sakharnykh, and Tim Kaldewey. A data-centric approach for training deep neural networks with less data, 2021.
[28] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. 2017.
[29] Xu Qin, Zhilin Wang, Yuanchao Bai, Xiaodong Xie, and Huizhu Jia. FFA-Net: Feature fusion attention network for single image dehazing. Proceedings of the AAAI Conference on Artificial Intelligence, 34(07):11908–11915, April 2020.
[30] Wenqi Ren, Lin Ma, Jiawei Zhang, Jinshan Pan, Xiaochun Cao, Wei Liu, and Ming-Hsuan Yang. Gated fusion network for single image dehazing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3253–3261, 2018.
[31] Yuanjie Shao, Lerenhan Li, Wenqi Ren, Changxin Gao, and Nong Sang. Domain adaptation for image dehazing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2808–2817, 2020.
[32] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[33] Chuanqi Tan, Fuchun Sun, Tao Kong, Wenchang Zhang, Chao Yang, and Chunfang Liu. A survey on deep transfer learning, 2018.
[34] Robby T. Tan. Visibility in bad weather from a single image. In 2008 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8. IEEE, 2008.
[35] Haiyan Wu, Jing Liu, Yuan Xie, Yanyun Qu, and Lizhuang Ma. Knowledge transfer dehazing network for nonhomogeneous dehazing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2020.
[36] Haiyan Wu, Jing Liu, Yuan Xie, Yanyun Qu, and Lizhuang Ma. Knowledge transfer dehazing network for nonhomogeneous dehazing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 478–479, 2020.
[37] Yankun Yu, Huan Liu, Minghan Fu, Jun Chen, Xiyao Wang, and Keyan Wang. A two-branch neural network for non-homogeneous dehazing via ensemble learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 193–202, 2021.
[38] Daochen Zha, Zaid Pervaiz Bhat, Kwei-Herng Lai, Fan Yang, and Xia Hu. Data-centric AI: Perspectives and challenges, 2023.
[39] He Zhang and Vishal M. Patel. Densely connected pyramid dehazing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3194–3203, 2018.
[40] Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, and Yun Fu. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), pages 286–301, 2018.
[41] Hang Zhao, Orazio Gallo, Iuri Frosio, and Jan Kautz. Loss functions for image restoration with neural networks. IEEE Transactions on Computational Imaging, 3(1):47–57, 2016.
[42] Zhaorun Zhou, Zhenghao Shi, Mingtao Guo, Yaning Feng, and Minghua Zhao. CGGAN: A context guided generative adversarial network for single image dehazing, 2020.
[43] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 2223–2232, 2017.
[44] Qingsong Zhu, Jiaming Mai, and Ling Shao. A fast single image haze removal algorithm using color attenuation prior. IEEE Transactions on Image Processing, 24(11):3522–3533, 2015.
