National Key Lab of Multispectral Information Intelligent Processing Technology, Huazhong University of Science and Technology, China
Email: {shengqi,sunrun,yichang,caoshuning,xiaoxueyao,yanluxin}@hust.edu.cn
https://shengqi77.github.io/RLR-AT.github.io/

Long-range Turbulence Mitigation: A Large-scale Dataset and A Coarse-to-fine Framework

Shengqi Xu Run Sun Yi Chang ^🖂 Shuning Cao Xueyao Xiao Luxin Yan

Abstract

Long-range imaging inevitably suffers from atmospheric turbulence, which causes severe geometric distortions due to the random refraction of light. The longer the distance, the more severe the disturbance. Although existing research has made great progress on short-range turbulence, much less attention has been paid to long-range turbulence with significant distortions. To address this gap and advance the field, we construct a large-scale real long-range atmospheric turbulence dataset (RLR-AT), including 1500 turbulence sequences spanning distances from 1 Km to 13 Km. Compared with existing datasets, RLR-AT offers turbulence with longer distances and higher diversity, and scenes with greater variety and larger scale. Moreover, most existing work adopts either registration-based or decomposition-based methods to address distortions through one-step mitigation. However, these methods fail to handle long-range turbulence effectively because of its significant pixel displacements. In this work, we propose a coarse-to-fine framework that cooperates dynamic turbulence and static background priors (CDSP) to handle severe distortions. On the one hand, we discover the pixel motion statistical prior of turbulence and propose a frequency-aware reference frame for better large-scale distortion registration, greatly reducing the burden of refinement. On the other hand, we take advantage of the static prior of the background and propose a subspace-based low-rank tensor refinement model to eliminate the misalignments inevitably left by registration while preserving details. The dynamic and static priors complement each other, allowing us to progressively mitigate long-range turbulence with severe distortions. Extensive experiments demonstrate that the proposed method outperforms SOTA methods on different datasets.

Figure 1: Visual examples of turbulence mitigation on proposed real-world long-range atmospheric turbulence benchmark RLR-AT. The proposed method could effectively handle the long-range turbulence with severe distortions.

1 Introduction

Seeing farther and more clearly is crucial for many military and civilian applications. Unfortunately, long-range imaging inevitably suffers from atmospheric turbulence, which causes severe geometric distortion due to the random refraction of light [49, 8, 17]. As light travels a long distance through a non-uniform atmospheric medium, more refractions accumulate during optical transmission, resulting in more severe pixel displacement. Although short-range atmospheric turbulence mitigation has made great progress in recent years [29, 39, 54, 58, 24, 26], very few studies have focused on long-range turbulence. In this work, our goal is to handle long-range turbulence with severe distortions, as shown in Fig. 1.

An atmospheric turbulence benchmark is key to evaluating turbulence mitigation methods. Existing datasets can be classified into synthetic datasets [56, 55] and real-world datasets [25, 1, 29, 4, 21, 41, 39]. Synthetic datasets are mainly constructed using turbulence simulators [37, 10, 12]. However, simulators cannot generate data that precisely match the features of real turbulence, since turbulence in real scenarios is complex in nature, especially long-range turbulence. Real turbulence is primarily influenced by temperature and imaging distance: greater distances and higher temperatures both lead to more severe turbulence. Typically, real turbulence can be divided into hot-air turbulence and long-range turbulence [37]. Most public real datasets consist of hot-air turbulence; TurbRecon [37] stands out by capturing 2 turbulence sequences of a building scene at a distance of 4 Km. As such, constructing a large-scale real-world long-range turbulence dataset with diverse scenes is highly necessary.

In this work, we construct a large-scale real-world long-range turbulence dataset, RLR-AT, for long-range turbulence mitigation. The strength of our benchmark is threefold. Firstly, RLR-AT contains long-range turbulence with longer distances and higher diversity, covering diverse distortions at distances ranging from 1 Km to 13 Km. Secondly, it consists of large-scale and diverse scenes, including 1500 turbulence sequences collected across various scenarios, such as text, object, building, etc. Last but not least, RLR-AT is collected by a telephoto camera at high resolution (1920*1080 pixels). Overall, RLR-AT can serve as a benchmark for future works targeting long-range turbulence mitigation.

The main difficulty of long-range turbulence mitigation lies in the severe distortions. To handle distortions, most existing methods can be classified into two categories: registration-based methods [7, 15, 26, 37, 48, 58, 53] and decomposition-based methods [14, 41, 24]. The former merely employs the zero-mean assumption of dynamic turbulence to construct a reference frame, and aligns the distorted frames with the reference frame using a registration technique. However, such a strategy struggles to handle long-range turbulence with severe distortions, as registration errors arise from the blurring of the reference frame.

In contrast, the latter treats the distortions among the frames as gross errors, and directly removes them through matrix decomposition by exploiting the low-rank prior of the static background. Though effective for mild distortions, this approach is theoretically less robust when handling severely corrupted observations [52, 36]. Thus, directly applying such a strategy to long-range turbulence with severe distortions would cause an unwanted loss of details. Overall, previous methods utilize either the zero-mean assumption of dynamic turbulence or the low-rank prior of the static background, and both struggle to handle long-range turbulence with severe distortions.

To handle long-range turbulence with severe distortions, we propose a coarse-to-fine framework that cooperates dynamic turbulence and static background priors (CDSP). On the one hand, we explore the pixel motion statistical prior of turbulence and discover that the pixel intensity occurring most frequently at a given position is most likely the closest to the original GT. This inspires us to propose a frequency-aware reference frame for better distortion registration, significantly reducing the burden of subsequent refinement. On the other hand, we further take advantage of the static prior of the background and propose a subspace-based low-rank tensor refinement model to refine the errors unavoidably left by registration while preserving details. The dynamic and static priors complement each other, allowing us to mitigate long-range turbulence with severe distortions. Overall, our main contributions are summarized as follows:

1. Our work focuses on a challenging yet practical task: long-range turbulence mitigation. We construct a large-scale real-world long-range atmospheric turbulence benchmark (RLR-AT). Compared to existing public real-world datasets, RLR-AT is the farthest (ranging from 1 Km to 13 Km) and largest-scale (1500 sequences at a high resolution of 1920*1080) turbulence dataset, with diverse turbulence levels and scenes. This dataset would be a good testbed for the community, especially for long-range turbulence mitigation.

2. We propose a coarse-to-fine framework for long-range turbulence mitigation, which cooperates dynamic and static priors. Specifically, we identify the pixel displacement statistical prior of dynamic turbulence and propose a frequency-aware reference frame for better registration, significantly reducing the burden of refinement. Moreover, we take advantage of the low-rank prior of the static background and propose a subspace-based low-rank tensor refinement model to remove the registration errors while preserving details. Unlike in existing methods, the dynamic and static priors complement each other, allowing us to address long-range turbulence with significant distortions.

3. We comprehensively compare CDSP with existing methods on the proposed real long-range turbulence dataset RLR-AT and on a synthetic dataset. Extensive experiments show that CDSP consistently outperforms SOTA methods, especially when handling long-range turbulence with severe distortions.

2 Related Work

Real Atmospheric Turbulence Benchmarks. Atmospheric turbulence is a fundamental issue in long-range imaging systems, mostly depending on the temperature and imaging distance. Higher temperatures and longer distances typically result in more severe turbulence. Generally, turbulence can be classified into two main categories: hot-air turbulence and long-range turbulence [37]. In Table 1, we provide a comprehensive summary of existing public benchmarks. EFF [25] provided two widely used hot-air turbulence samples. UG2+ TurbuText [1] and Heat Chamber [39] collected turbulence sequences at a distance of 20 meters. These hot-air turbulence datasets were collected with an artificial heat burner. Further, CLEAR [4], OTIS [21], Turbulence Text [39] and TSR-WGAN [29] collected turbulence in high-temperature environments near the surface. Most existing public datasets are composed of hot-air turbulence, and TurbRecon [37] captured 2 turbulence sequences at a distance of 4 Km. Moreover, existing datasets are still limited in terms of scale and diversity. In this work, we focus on the challenging problem of long-range turbulence mitigation and construct a large-scale real long-range turbulence dataset to advance this field.

Table 1: Summary of existing available real-world turbulence benchmarks.

| Venue | Hot-air | Temperature | Long-range | Distance | Scene Category | Scene | Avg Frames | Resolution |
|---|---|---|---|---|---|---|---|---|
| EFF [25] | ✓ | - | ✗ | $<$ 1Km | Building, Chimney | 2 | 100 | 240*240 |
| Heat Chamber [39] | ✓ | - | ✗ | 20m | Image | 400 | 100 | 440*440 |
| UG2+ TurbuText [1] | ✓ | - | ✗ | 20m | Text | 100 | 100 | 440*440 |
| CLEAR [4] | ✓ | $46^{\circ}$ | ✗ | $<$ 2Km | Building, Street | 3 | 53 | 250*180 |
| OTIS [21] | ✓ | - | ✗ | $<$ 1Km | Pattern, door | 21 | 276 | 256*256 |
| TSR-WGAN [29] | ✓ | $33^{\circ}$ | ✗ | $<$ 3Km | Street, Grassland | 21 | 233 | 960*540 |
| Turbulence Text [39] | ✓ | $30^{\circ}$ | ✗ | 300m | Text | 100 | 100 | 440*440 |
| TurbRecon [37] | ✓ | $30^{\circ}$ | ✓ | 4Km | Building, Chimney | 4 | 100 | 512*512 |
| RLR-AT | ✓ | $-6^{\circ}\sim 40^{\circ}$ | ✓ | 1Km-13Km | Building, Text $\ldots$ Car | 1500 | 800 | 1920*1080 |

Atmospheric Turbulence Mitigation. Lucky imaging is an intuitive way to mitigate turbulence [6, 2, 26, 11, 3, 5, 31]. Its key idea is to choose, from short-exposure frames, the lucky high-quality frames least affected by the atmosphere. Unfortunately, the lucky assumption no longer holds for long-range anisoplanatic turbulence, where severe distortions persist across all frames [43, 18, 37]. In recent years, data-driven methods have become popular due to their end-to-end simplicity [27, 39, 19, 54, 40, 32, 28, 16, 42, 51, 56, 55]. Their main idea is to train on paired clean-degraded synthetic data from turbulence simulators [12, 10, 38, 44, 22]. These learning-based methods achieve satisfactory results on simulated data but cannot generalize well to real turbulence due to the domain gap, especially for long-range turbulence with severe distortions. Considering that turbulence follows a clear physical process, namely light refraction and diffraction, Zhu et al. [58] proposed a classical multi-stage restoration framework that gradually performs distortion correction and deblurring. In this work, we follow this research line with clear physical foundations, and investigate how to cooperate the dynamic and static priors for better distortion correction.

Turbulence Distortion Correction. The main difficulty of long-range turbulence lies in the severe distortions. Most existing methods can be classified into two categories: registration-based methods [48, 58, 7, 53, 37, 26, 15] and decomposition-based methods [41, 14, 24]. Most existing registration-based methods employ the zero-mean assumption of dynamic turbulence to construct a reference frame, and suppress distortion by aligning each input frame to the reference frame using non-rigid registration. For example, the most common way to obtain a reference frame is to directly average the input frames [25, 58, 4, 7, 23]. Mao et al. [37] further presented a space-time non-local averaging method that adaptively assigns different weights to different frames, departing from uniform temporal averaging. However, the reference frames produced by these methods suffer from blur, leading to inaccurate registration, especially for long-range turbulence. In this work, we discover the pixel motion statistical prior of turbulence and propose a frequency-aware high-quality reference frame for better large-scale distortion registration.

The decomposition-based methods start from the matrix decomposition perspective, utilizing the low-rank prior of the static background to remove distortions. For example, Oreifej et al. [41] proposed a three-term matrix decomposition approach for simultaneous distortion removal and object detection. However, these matrix-based methods need to transform the 3-D video into a 2-D matrix, which damages the spatio-temporal structure. In this work, we propose a subspace-based low-rank tensor refinement model to refine the registration error while preserving the spatio-temporal details. We further integrate dynamic and static priors within a coarse-to-fine framework in a complementary manner to better handle long-range turbulence with severe distortions.

Figure 2: Illustration of the proposed dataset RLR-AT. (a) Long-range imaging with larger focal length lens through turbulence. (b) Typical turbulence with diverse long-distance conditions. (c) Statistics of distance and scene of the proposed benchmark.

3 Large-scale Real Long-range Turbulence Benchmark

Owing to the challenges in gathering long-range turbulence, current datasets mainly focus on hot-air turbulence, overlooking long-range turbulence with severe distortions. To fill this gap, we construct a large-scale long-range turbulence dataset for verification and analysis, named as RLR-AT. Note that RLR-AT also includes videos of dynamic scenes and turbulence coupled with haze, which can be used to study turbulence in dynamic scenes and multi-degradation restoration.

Benchmark Collection. In this work, we collect the long-range turbulence sequences with a telephoto camera (Nikon Coolpix P1000) with an equivalent lens focal length of 3000mm, sampled at 30 fps. The data collection process is illustrated in Fig. 2(a). Firstly, we stabilize the camera on a tripod to capture distant static scenes. Subsequently, we adjust the focal length until we discern the emergence of geometric distortion and blurring induced by the non-uniform indices of refraction associated with long-range turbulence. For each sequence, we record approximately 35 seconds and extract the steady middle 30 seconds into our dataset.

Benchmark Statistics. Table 1 presents the detailed statistical comparison between our proposed RLR-AT and existing turbulence benchmarks. Overall, our dataset contains 1500 sequences, each of which consists of approximately 800 frames, collected from diverse cities. Notably, our dataset comprehensively covers long-range turbulence across distances ranging from 1 km to 13 km. Moreover, over 19 typical scenes are captured, including the street, sports ground, factory, car and billboard, etc, offering a comprehensive range for long-range surveillance scenarios. To visualize the distribution of distances and scene categories, a bar chart and a sunburst chart are illustrated in Fig. 2(c).

Turbulence with Longer-distances and Higher-diversity. The key difference between RLR-AT and other datasets is that RLR-AT covers turbulence distortions with longer-distances and higher-diversity. Most existing public datasets are mainly composed of hot-air turbulence, and TurbRecon [37] stands out by capturing two turbulence sequences at a distance of 4 Km. In comparison, RLR-AT contains long-range turbulence images captured from longer and richer distances (1-13Km). In Fig. 2(b), we visualize some long-range turbulence images with increasing distances in RLR-AT. It can be observed that with increasing distance, turbulence degradation level is higher, leading to more severe distortions in the images. The bar chart in Fig. 2(c) displays the number of turbulence sequences in our dataset at each distance, ranging from 1 km to 13 km, which further highlights the diversity of distances within our RLR-AT.

Scenes with Larger-scale and Greater-variety. We are concerned not only with the distance diversity of long-range turbulence but also with the variety and scale of scenes. In Table 1, we can observe that existing real-world turbulence benchmarks are still limited in terms of the number and diversity of scenes. Most datasets focus on relatively common scenes, such as building, street or grassland, while UG2+ TurbuText [1] and Turbulence Text [39] both collect hot-air turbulence specifically for text scenes. In comparison, RLR-AT contains 1500 long-range turbulence sequences across 19 categories. Figure 2(b) shows some typical long-range turbulence images in various scenes, such as car, motorcycle, building and text, offering a comprehensive range for long-range surveillance scenarios. The sunburst chart in Fig. 2(c) illustrates the distribution of scene categories in RLR-AT, further showcasing the diversity of scenes within our dataset.

4 A Coarse-to-fine Framework for Long-range Turbulence

In this work, we propose a coarse-to-fine framework that cooperates dynamic turbulence and static background priors (CDSP) to handle long-range turbulence with severe distortions. On the one hand, we discover the pixel motion statistical prior of turbulence and propose a frequency-aware reference frame for better large-scale distortion registration, which greatly reduces the burden of refinement (Section 4.1). We then align the distorted frames to the proposed reference frame using a registration approach based on optical flow [34]. On the other hand, we take advantage of the static prior of the background and propose a subspace-based low-rank tensor refinement model to refine the errors unavoidably left by registration while preserving details (Section 4.2). The dynamic and static priors complement each other, allowing us to better eliminate the severe distortions. Finally, we employ a simple data-driven network to remove the residual blur; the paired deblurring data are generated with the proposed distortion correction framework. The details of blur removal are provided in the supplementary material.

4.1 Frequency-aware Reference Frame Construction

Most previous methods [25, 7, 23, 58] employ temporal averaging (Temp Avg) to construct a reference frame by naively applying the zero-mean assumption of turbulence. However, Temp Avg often suffers from blur, leading to imprecise registration when handling severe distortions, as shown in Fig. 3(a). In contrast, the proposed frequency-aware reference frame (FRF) achieves more accurate registration due to its superior quality. As illustrated in Fig. 3(b), Temp Avg assumes that all pixel intensities occurring at a given position have equal weight. However, its output differs from the original GT, as the non-original pixel intensities contribute negatively to the output at that position. In this work, we propose a frequency-aware method to construct a reliable reference frame based on the pixel motion statistical prior of turbulence.

Pixel Displacement Statistical Prior of Turbulence. To explore the pixel displacement statistical prior of turbulence, we conduct an analysis experiment using the corners of a checkerboard in Fig. 4. We take checkerboard turbulence images as the experimental data because corner motion can approximate pixel motion and checkerboard corner detection techniques are mature. We then apply a corner detector [20] to the collected data to detect the shifted corners. Note that we conduct extensive analysis experiments on long-range turbulence across various distances and scenes (e.g., buildings with wall corners). Please refer to the supplementary material for details.

In Fig. 4(a), we show the clean and distorted checkerboards and the corresponding motion field of the corners. It is evident that the corners of distorted frames exhibit noticeable motion, and that this motion varies across frames. To better explore the motion of corners in the temporal dimension, we further visualize the motion trajectories of the corners along the temporal axis in Fig. 4(b). The motion trajectory of each corner is distinct. We randomly select two corners and present their horizontal and vertical positions on the right. Although the displacements of the two corners differ, they consistently occur relatively close to their original positions, resembling a statistical rule. To further explore the statistics of pixel motion, we normalize the positions of all corners across 200 frames and perform a statistical analysis of their displacements in Fig. 4(c). The results show that the motions approximately follow a zero-mean Gaussian distribution, indicating that pixels most frequently occur at their original positions. This insight implies that the pixel intensity occurring most frequently at a given position is most likely the original GT, inspiring us to propose a frequency-aware reference frame.
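The displacement analysis described above can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the authors' analysis code: the function name `displacement_statistics`, the `(T, C, 2)` array layout, the histogram range and the synthetic Gaussian jitter (2-pixel standard deviation) are all hypothetical stand-ins for real detected corner tracks.

```python
import numpy as np

def displacement_statistics(tracks, reference, bins=21, span=10.0):
    """Summarize corner displacements (hypothetical helper).

    tracks:    (T, C, 2) detected corner positions over T frames
    reference: (C, 2) undistorted corner positions
    Returns the per-axis mean, standard deviation and most frequent displacement.
    """
    disp = np.asarray(tracks, dtype=float) - np.asarray(reference, dtype=float)[None]
    flat = disp.reshape(-1, 2)
    modes = []
    for axis in range(2):
        # Histogram the displacements and take the centre of the fullest bin.
        hist, edges = np.histogram(flat[:, axis], bins=bins, range=(-span, span))
        k = int(hist.argmax())
        modes.append(0.5 * (edges[k] + edges[k + 1]))
    return flat.mean(axis=0), flat.std(axis=0), np.array(modes)

# Synthetic stand-in for detected corners: 200 frames, 48 corners,
# zero-mean Gaussian jitter (assumed values, not measured data).
rng = np.random.default_rng(0)
reference = rng.uniform(0.0, 500.0, size=(48, 2))
tracks = reference[None] + rng.normal(0.0, 2.0, size=(200, 48, 2))
mean, std, mode = displacement_statistics(tracks, reference)
```

For zero-mean Gaussian motion, both the mean and the most frequent displacement sit near zero, which is the property the frequency-aware reference frame relies on.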

Frequency-aware Reference Frame. Given the distorted sequence, suppose there are $K$ different pixel intensities occurring at a certain position $\bm{\mathit{l}}=(x,y)$ along the temporal dimension. Let $I_{\bm{\mathit{l}},k}$ represent a pixel intensity occurring at this position in the distorted sequence and $k\in[1,K]$ . We first count the frequency of each pixel intensity occurring at the position along the temporal dimension: $N_{\bm{\mathit{l}},k}=count(I_{\bm{\mathit{l}},k})$ . Previous temporal average reference frame assigns a weight of one to each pixel intensity at the certain position, hence, the output $T_{\bm{\mathit{l}}}$ is obtained by summing all pixel intensities and dividing by the number of frames that equals the sum of the frequency of pixel intensities:

$$T_{\bm{\mathit{l}}}=\frac{\sum\limits_{k=1}^{K}N_{\bm{\mathit{l}},k}\times I_{\bm{\mathit{l}},k}}{\sum\limits_{k=1}^{K}N_{\bm{\mathit{l}},k}}. \tag{1}$$

Different from temporal averaging, we argue that the weight of pixel intensities is positively correlated with their frequency as shown in Fig. 3(b). Consequently, we construct a frequency-aware weight for each pixel intensity:

$$\omega_{\bm{\mathit{l}},k}=\mathrm{e}^{\sigma\times N_{\bm{\mathit{l}},k}}, \tag{2}$$

which is a function of the frequency $N_{\bm{\mathit{l}},k}$ , and $\sigma$ is a hyper-parameter controlling the growth rate of the weight. Next, the pixel value of the reference frame at the certain position $F_{\bm{\mathit{l}}}$ is constructed via weighted averaging based on frequency:

$$F_{\bm{\mathit{l}}}=\frac{\sum\limits_{k=1}^{K}N_{\bm{\mathit{l}},k}\times I_{\bm{\mathit{l}},k}\times\omega_{\bm{\mathit{l}},k}}{\sum\limits_{k=1}^{K}N_{\bm{\mathit{l}},k}\times\omega_{\bm{\mathit{l}},k}}. \tag{3}$$

Relationship between FRF and Temp Avg. We further discuss the relationship between Temp Avg and FRF, which is established through the parameter $\sigma$ . The $\sigma$ in Eq. (2) decides the sensitivity of weight function to frequency. When $\sigma=0$ , the weight is one for all intensities, and the Eq. (3) can be simplified as:

$$F_{\bm{\mathit{l}}}=\frac{\sum\limits_{k=1}^{K}N_{\bm{\mathit{l}},k}\times I_{\bm{\mathit{l}},k}}{\sum\limits_{k=1}^{K}N_{\bm{\mathit{l}},k}}, \tag{4}$$

which is the same as Eq. (1), illustrating that the average reference frame is a special case of the proposed FRF when $\sigma=0$. The reference frame constructed with $\sigma=0$ (Temp Avg) is shown in Fig. 3(a); it can be observed that the result suffers from severe blur. In contrast, when $\sigma>0$, the higher the frequency of a pixel intensity, the greater its weight, resulting in an output more similar to the original pixel. The result with $\sigma>0$ (FRF) in Fig. 3(a) possesses superior visual quality compared to Temp Avg, which is beneficial for registering severe distortions.
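To make Eqs. (1)-(3) concrete, the following is a minimal NumPy sketch of the frequency-aware weighting, not the authors' implementation. It assumes 8-bit grayscale frames stacked as a `(T, H, W)` integer array; the function name `frequency_aware_reference`, the `levels` parameter and the $\sigma$ values used are hypothetical. The exponent is shifted by the per-pixel maximum count purely for numerical stability, which leaves the ratio in Eq. (3) unchanged.

```python
import numpy as np

def frequency_aware_reference(frames, sigma=0.5, levels=256):
    """Per-pixel weighted average of Eq. (3) for integer frames of shape (T, H, W).

    Assumes sigma >= 0 and intensities in [0, levels). Hypothetical helper.
    """
    frames = np.asarray(frames)
    _, h, w = frames.shape
    # N_{l,k}: how often each intensity value occurs at each position over time.
    counts = np.zeros((levels, h, w), dtype=np.float64)
    for v in range(levels):
        counts[v] = (frames == v).sum(axis=0)
    values = np.arange(levels, dtype=np.float64)[:, None, None]
    # omega_{l,k} = exp(sigma * N_{l,k}), rescaled by exp(-sigma * max_k N_{l,k})
    # so that large counts do not overflow; the rescaling cancels in the ratio.
    weights = np.exp(sigma * (counts - counts.max(axis=0, keepdims=True)))
    numerator = (counts * values * weights).sum(axis=0)
    denominator = (counts * weights).sum(axis=0)
    return numerator / denominator

# sigma = 0 reproduces plain temporal averaging, Eq. (1) / Eq. (4).
rng = np.random.default_rng(0)
frames = rng.integers(0, 256, size=(20, 4, 5))
avg_like = frequency_aware_reference(frames, sigma=0.0)

# One pixel whose true intensity 10 appears in 6 of 10 frames.
pixel = np.array([10] * 6 + [200, 50, 90, 120]).reshape(10, 1, 1)
temp_avg_pixel = frequency_aware_reference(pixel, sigma=0.0)[0, 0]
frf_pixel = frequency_aware_reference(pixel, sigma=5.0)[0, 0]
```

In the single-pixel example, the temporal average is pulled far from the dominant intensity by the four outliers, whereas a positive $\sigma$ drives the output towards the most frequent value.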

4.2 Low-rank Tensor Distortion Refinement

Low-rank Prior of Static Background. Due to the severe distortions in long-range turbulence, achieving perfect pixel-level registration is impossible. Consequently, registration errors are unavoidable in the registered sequences. Considering the static nature of scene, we aim to utilize low-rank prior of static background to refine registration errors while preserving details. We utilize the section lines and singular values to analyze the low-rank property of the distorted, registered and refined sequence in Fig. 5. In Fig. 5(a), we randomly select 1D section lines from each sequence. It is observed that the registered section lines contain sparse noise, while the refined section lines exhibit smoothness along the temporal dimension. Figure 5(b) shows the curves of singular value, revealing that the refined sequence manifests the strongest low-rank property.
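The singular-value analysis above can be reproduced on toy data with the short sketch below. It is an illustration under assumptions rather than the procedure behind Fig. 5: the helper `temporal_singular_values`, the unfolding of the `(h, w, t)` sequence into an `(h*w, t)` matrix, and the use of horizontal shifts as a crude stand-in for misalignment are all hypothetical choices.

```python
import numpy as np

def temporal_singular_values(seq):
    """Singular values of the temporal unfolding of an (h, w, t) sequence,
    normalized by the largest one (hypothetical helper)."""
    h, w, t = seq.shape
    s = np.linalg.svd(seq.reshape(h * w, t), compute_uv=False)
    return s / s[0]

rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(16, 16))

# Perfectly static scene: every frame is identical, so the unfolding has rank one.
static_seq = np.repeat(image[:, :, None], 8, axis=2)

# Crude misalignment stand-in: each frame is shifted horizontally by a few pixels.
shifts = [0, 1, -1, 2, -2, 1, 0, -1]
jitter_seq = np.stack([np.roll(image, s, axis=1) for s in shifts], axis=2)

static_sv = temporal_singular_values(static_seq)
jitter_sv = temporal_singular_values(jitter_seq)
```

The static sequence has a single non-negligible singular value, while the shifted sequence spreads energy over several, mirroring the qualitative gap between refined and registered sequences described above.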

Subspace-based Low-rank Tensor Refinement Model. Previous methods directly utilized matrix decomposition to remove turbulence [41, 24], which requires transforming the 3-D video into a 2-D matrix and damages the spatio-temporal structure. In this work, we propose a subspace-based low-rank tensor refinement model (SLRTR) to rectify the misalignments while preserving details. To our knowledge, we are the first to introduce a tensor model into turbulence removal. Let the registered sequence be $\bm{\mathcal{R}}\in{{\mathbb{R}}^{h\times w\times t}}$, where $h$, $w$ and $t$ denote the image height, width, and number of frames, respectively. The key challenge lies in effectively reducing the registration error while preserving the spatio-temporal details. A registered sequence can be described by the following formula:

$${\bm{\mathcal{R}}}={\bm{\mathcal{B}}}+{\bm{\mathcal{E}}}+{\bm{\mathcal{N}}}, \tag{5}$$

where ${\bm{\mathcal{B}}}\in{{\mathbb{R}}^{h\times w\times t}}$ represents the refined sequence, ${\bm{\mathcal{E}}}\in{{\mathbb{R}}^{h\times w\times t}}$ is the registration error, and ${\bm{\mathcal{N}}}\in{{\mathbb{R}}^{h\times w\times t}}$ denotes the random noise. In this work, we formulate the refinement as an inverse problem under the maximum a posteriori (MAP) framework, as follows:

$$\min_{{\bm{\mathcal{B}}},{\bm{\mathcal{E}}}}\frac{1}{2}\|{\bm{\mathcal{B}}}+{\bm{\mathcal{E}}}-{\bm{\mathcal{R}}}\|_{F}^{2}+\alpha\Phi_{b}({\bm{\mathcal{B}}})+\beta\Phi_{e}({\bm{\mathcal{E}}}), \tag{6}$$

where $\Phi_{b}$ and $\Phi_{e}$ represent the prior knowledge for the background and error, respectively, $\alpha$ and $\beta$ are the corresponding hyper-parameters. For static scene turbulence videos, on the one hand, the refined sequence ${\bm{\mathcal{B}}}$ exhibits global low-rank property along the temporal dimension, with an ideal rank of one. On the other hand, it also has significant non-local low-rank property along the spatial dimension, due to the self-similarity widely employed in image restoration [13]. Hence, we effectively exploit a joint global-nonlocal prior across both spatial and temporal dimensions to enhance the representation of the static background ${\bm{\mathcal{B}}}$ :

$$\Phi_{b}({\bm{\mathcal{B}}})=\sum_{i}\left(\frac{1}{\lambda_{i}^{2}}\|{\bm{\mathcal{S}}}_{i}{\bm{\mathcal{B}}}\times_{3}O_{i}-{\bm{\mathcal{G}}}_{i}\|_{F}^{2}+\|{\bm{\mathcal{G}}}_{i}\|_{tnn}\right), \tag{7}$$

where ${\bm{\mathcal{S}}}_{i}{\bm{\mathcal{B}}}\in{\mathbb{R}}^{p^{2}\times n\times t}$ is the 3-D tensor constructed via non-local clustering of a sub-cube $u_{i}\in{\mathbb{R}}^{p\times p\times t}$ [9], $p$ and $n$ are the spatial size and the number of sub-cubes, respectively, $O_{i}\in{\mathbb{R}}^{d\times t}$ ($d\ll t$) is an orthogonal subspace projection matrix used to capture the temporal low-rank property, $\times_{3}$ is the tensor product along the temporal dimension [30], ${\bm{\mathcal{G}}}_{i}$ represents the low-rank approximation variable, $\|\cdot\|_{tnn}$ denotes the tensor nuclear norm [9], and $\lambda_{i}$ is the regularization parameter. As for the error ${\bm{\mathcal{E}}}$, we model it as a sparse error [52] via $L_{1}$ sparsity. Thus, Eq. (6) can be expressed as:

$$\begin{aligned}\left\{\hat{\bm{\mathcal{B}}},\hat{\bm{\mathcal{E}}},\hat{\bm{\mathcal{G}}}_{i},\hat{O}_{i}\right\}=\arg\min_{{\bm{\mathcal{B}}},{\bm{\mathcal{E}}},{\bm{\mathcal{G}}}_{i},O_{i}}\;&\frac{1}{2}\|{\bm{\mathcal{B}}}+{\bm{\mathcal{E}}}-{\bm{\mathcal{R}}}\|_{F}^{2}+\beta\|{\bm{\mathcal{E}}}\|_{1}\\&+\alpha\sum_{i}\left(\frac{1}{\lambda_{i}^{2}}\|{\bm{\mathcal{S}}}_{i}{\bm{\mathcal{B}}}\times_{3}O_{i}-{\bm{\mathcal{G}}}_{i}\|_{F}^{2}+\|{\bm{\mathcal{G}}}_{i}\|_{tnn}\right).\end{aligned} \tag{8}$$

We adopt the alternating minimization scheme [33] to solve Eq. (8) for each of the variables ${\bm{\mathcal{B}}},{\bm{\mathcal{E}}},{\bm{\mathcal{G}}}_{i},O_{i}$ in turn. Please refer to the appendix for the solution.
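The flavour of such an alternating scheme can be illustrated on a deliberately simplified problem. The sketch below is not SLRTR and not the solver from the appendix: it drops the non-local grouping, the subspace projection $O_{i}$ and the tensor nuclear norm, and instead alternates between a rank-constrained background and an $L_{1}$-sparse error on the matricized sequence, i.e. it minimizes $\frac{1}{2}\|B+E-R\|_{F}^{2}+\beta\|E\|_{1}$ subject to $\mathrm{rank}(B)\leq r$. The function names, the value of `beta`, the rank and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||.||_1 (element-wise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def refine_lowrank_sparse(registered, beta=0.3, rank=1, iters=20):
    """Simplified matrix stand-in for the refinement step (hypothetical helper).

    registered: (h, w, t) sequence. Returns (background, error), both (h, w, t).
    """
    h, w, t = registered.shape
    R = registered.reshape(h * w, t).astype(np.float64)
    E = np.zeros_like(R)
    for _ in range(iters):
        # B-step: best rank-`rank` approximation of R - E (truncated SVD).
        U, s, Vt = np.linalg.svd(R - E, full_matrices=False)
        B = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # E-step: closed-form L1 update, soft-thresholding the residual.
        E = soft_threshold(R - B, beta)
    return B.reshape(h, w, t), E.reshape(h, w, t)

# Toy data: a static background plus sparse spikes standing in for registration errors.
rng = np.random.default_rng(0)
background = rng.uniform(0.0, 1.0, size=(8, 8))
clean = np.repeat(background[:, :, None], 30, axis=2)
spikes = (rng.random(clean.shape) < 0.05) * rng.choice([-1.0, 1.0], size=clean.shape)
registered = clean + spikes

refined, error = refine_lowrank_sparse(registered)
err_registered = np.linalg.norm(registered - clean)
err_refined = np.linalg.norm(refined - clean)
```

Each step minimizes the simplified objective exactly over one block of variables, so the objective never increases. Note that this stand-in uses exactly the matricization that the tensor model is designed to avoid, so it only conveys the alternating structure, not the detail-preserving behaviour of the full model.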

5 Experimental Results

Table 2: Quantitative comparison with other methods on synthetic turbulence dataset at different distances. Methods are grouped into static-scene independent methods and static-scene dependent methods. ^∗ denotes single-frame based approach. Red text indicates the best performance, blue text indicates the second-best performance. $\Delta$ denotes the exact superiority of proposed CDSP over the second-best method. As the distance increases, our method outperforms the SOTA approach even more significantly.

| Distance | Metric | TurbNet^∗ | PiRN^∗ | TSR-WGAN | TMT | TurbRecon | NDL | CLEAR | SG | NDIR | CDSP | $\Delta$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 Km | PSNR $\uparrow$ | 23.58 | 25.93 | 23.98 | 26.54 | 27.27 | 24.93 | 26.19 | 24.53 | 24.29 | 27.94 | 0.67 |
| 2 Km | SSIM $\uparrow$ | 0.8155 | 0.8675 | 0.8215 | 0.8887 | 0.8995 | 0.8459 | 0.8815 | 0.8545 | 0.8147 | 0.9181 | 0.0186 |
| 4 Km | PSNR $\uparrow$ | 21.97 | 24.43 | 22.90 | 24.82 | 26.05 | 24.12 | 24.79 | 23.31 | 23.23 | 26.91 | 0.86 |
| 4 Km | SSIM $\uparrow$ | 0.7491 | 0.8168 | 0.7813 | 0.8556 | 0.8615 | 0.8111 | 0.8382 | 0.8026 | 0.7796 | 0.8893 | 0.0278 |
| 6 Km | PSNR $\uparrow$ | 21.22 | 23.42 | 21.89 | 23.73 | 24.93 | 23.37 | 23.58 | 22.50 | 22.59 | 25.74 | 0.81 |
| 6 Km | SSIM $\uparrow$ | 0.7132 | 0.7790 | 0.7394 | 0.8265 | 0.8311 | 0.7834 | 0.7925 | 0.7643 | 0.7561 | 0.8594 | 0.0283 |
| 8 Km | PSNR $\uparrow$ | 20.69 | 22.74 | 21.31 | 23.01 | 23.95 | 22.76 | 22.62 | 21.89 | 22.19 | 24.84 | 0.89 |
| 8 Km | SSIM $\uparrow$ | 0.6864 | 0.7518 | 0.7135 | 0.8012 | 0.8024 | 0.7592 | 0.7523 | 0.7335 | 0.7402 | 0.8319 | 0.0295 |

5.1 Datasets and Experimental Settings

Datasets. We conduct the experiments on various datasets, including a synthetic dataset, the proposed dataset RLR-AT and real hot-air turbulence dataset TurbuText [1]. Synthetic turbulence is simulated with varying distances on the ADE20K [57] employing the turbulence simulator P2S [38]. Further details of the simulator protocol are provided in the supplementary material.

Comparison Methods. We compare CDSP with (1) conventional turbulence removal methods: TurbRecon [37], SG [35], CLEAR [4] and NDL [58]; (2) deep learning based methods: TurbNet [39], TSR-WGAN [29], PiRN [27], NDIR [32] and TMT [56]. For a fair comparison, considering that the experimental datasets consist of static scenes, we categorize the methods into two groups, static-scene dependent and static-scene independent, and further classify them by input into multi-frame based and single-frame based. We employ the code and pre-trained models of TMT and TurbRecon designed for static scenes to ensure a fair comparison. All methods include a deblurring step, with NDIR and NDL using their default deblurring approach [45].

5.2 Qualitative and Quantitative Evaluation

Qualitative Evaluation on Real Turbulence. In Fig. 6, we compare CDSP with existing methods on RLR-AT. The single-frame based methods TurbNet [39] and PiRN [27] struggle with distortions because they do not model the temporal information of turbulence. The results of supervised methods such as TSR-WGAN [29] and TMT [56] still exhibit distortions due to the domain gap. CLEAR [4], SG [35] and NDL [58] repeatedly produce artifacts or distortions because they are not designed for long-range turbulence. Although TurbRecon [37] obtains results of comparable quality, they still contain misalignments. In comparison, CDSP consistently achieves more pleasing results at various distances, effectively addressing severe distortions while preserving details. We also conduct a comparison on hot-air turbulence, which is shown in the appendix.

Quantitative Evaluation on Synthetic Turbulence. We further evaluate CDSP and the other methods on synthetic turbulence in Table 2. Most multi-frame based methods perform better than single-frame based methods, since they take the temporal information of turbulence into consideration. CDSP and TurbRecon achieve the best performance in their respective categories. Note that the margin of CDSP over existing SOTA methods generally widens as the distance increases: the PSNR gain $\Delta$ grows from 0.67 at 2 Km to 0.89 at 8 Km, and the SSIM gain grows from 0.0186 to 0.0295. This further reveals the superiority of CDSP for long-range turbulence with severe distortions.
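For readers unfamiliar with the PSNR metric reported in Table 2, the following is a minimal sketch of its standard definition. It is not the authors' evaluation code: the function name `psnr`, the nested-list image representation, and the peak value of 255 (8-bit images) are assumptions made for illustration.

```python
import math


def psnr(reference, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images.

    Images are given as nested lists of pixel values; `peak` is the maximum
    possible pixel value (255 is an assumption for 8-bit images).
    """
    ref = [p for row in reference for p in row]
    out = [p for row in restored for p in row]
    if len(ref) != len(out):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy 2x2 example (hypothetical pixel values).
clean = [[10, 20], [30, 40]]
degraded = [[12, 18], [33, 40]]
print(round(psnr(clean, degraded), 2))
```

Higher PSNR means the restored image is closer to the ground truth, which is why the metric is marked with an upward arrow in Table 2.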

Table 3: Ablation study of FRF and SLRTR.

| FRF | SLRTR | PSNR | SSIM |
|---|---|---|---|
| ✗ | ✗ | 22.62 | 0.8051 |
| ✗ | ✓ | 23.02 | 0.8034 |
| ✓ | ✗ | 23.53 | 0.8154 |
| ✓ | ✓ | 23.96 | 0.8385 |

Table 4: Effectiveness of FRF when embedded into existing methods compared to others. The three middle columns are the reference frames.

| Methods | Metric | Temp avg [58] | Non-local avg [37] | FRF (Ours) | Gain |
|---|---|---|---|---|---|
| NDL [58] | PSNR | 22.95 | 22.88 | 23.29 | 0.34 |
| NDL [58] | SSIM | 0.7698 | 0.7650 | 0.7819 | 0.0121 |
| TurbRecon [37] | PSNR | 24.33 | 24.14 | 24.79 | 0.65 |
| TurbRecon [37] | SSIM | 0.8122 | 0.8082 | 0.8231 | 0.0149 |
| CDSP (Ours) | PSNR | 25.25 | 24.68 | 25.41 | |
| CDSP (Ours) | SSIM | 0.8391 | 0.8331 | 0.8462 | |

Table 5: Boosting performance of high-level text recognition task.

| Methods | CRNN | ASTERN | DAN |
|---|---|---|---|
| Distorted | 0.2553 | 0.3475 | 0.3759 |
| TurbNet | 0.2057 | 0.3546 | 0.3617 |
| PiRN | 0.2695 | 0.3546 | 0.3758 |
| SG | 0.2340 | 0.3971 | 0.4042 |
| NDL | 0.4539 | 0.4894 | 0.4681 |
| TMT | 0.4040 | 0.4893 | 0.4964 |
| TurbRecon | 0.4397 | 0.4965 | 0.4894 |
| CDSP | 0.5248 | 0.5319 | 0.5532 |

5.3 Ablation and Discussion

How does FRF Facilitate Severe Distortion Correction? We study the importance of FRF for distortion correction. As shown earlier in Fig. 3(a), the superior quality and sharp edges of FRF enable better registration, greatly reducing the burden of refinement. To verify this, we remove the FRF-based registration and directly employ SLRTR to handle severe distortions, which places a heavy burden on SLRTR. Table 3 shows that removing FRF leads to a noticeable performance drop (PSNR falls from 23.96 to 23.02 and SSIM from 0.8385 to 0.8034). Moreover, the result without FRF in Fig. 8(a) suffers from severe detail loss. Figure 8(b) and (c) show the visualization and distribution of the residuals from the SLRTR decomposition. The residual without FRF clearly contains more lost details. This indicates that FRF is indispensable for reducing the burden of refinement and preventing severe detail loss.

How does SLRTR Improve Severe Distortion Correction? We next examine the necessity of SLRTR for distortion correction by removing SLRTR and directly using the FRF-based registration result. However, perfect pixel-level registration is impossible because of the severe distortions in long-range turbulence. Table 3 shows an obvious performance drop without SLRTR (PSNR falls from 23.96 to 23.53 and SSIM from 0.8385 to 0.8154), and the result without SLRTR in Fig. 8(a) still exhibits misalignments, indicating that SLRTR is necessary for refining registration errors.

Effectiveness of Subspace and Self-similarity. We then illustrate how the subspace and self-similarity components contribute to the proposed SLRTR. The subspace is used to characterize the global low-rank property along the temporal dimension. In Fig. 8(b), the result without the subspace still exhibits distortions, indicating that relying solely on the prior of spatial self-similarity is insufficient to characterize the properties of the temporal dimension. The non-local prior is employed to explore the self-similarity of the spatial dimension. Figure 8(c) shows the result without self-similarity, which suffers from unexpected detail loss, implying that relying only on the temporal information is not enough.
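The intuition behind a temporal low-rank prior can be illustrated with a small sketch. It is not the authors' SLRTR model, which is a tensor formulation that also exploits non-local self-similarity. It only shows, with a plain truncated matrix SVD, that frames of a static scene lie close to a low-dimensional temporal subspace. The function name `temporal_low_rank`, the synthetic data, and the choice of rank 1 are assumptions made for illustration.

```python
import numpy as np


def temporal_low_rank(frames, rank=1):
    """Project a (T, H, W) frame stack onto a rank-`rank` temporal subspace.

    Each frame is unfolded into one row of a (T, H*W) matrix, and the matrix
    is replaced by its best rank-`rank` approximation (truncated SVD).
    """
    t, h, w = frames.shape
    unfolded = frames.reshape(t, h * w)
    u, s, vt = np.linalg.svd(unfolded, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return low_rank.reshape(t, h, w)


# Synthetic stand-in: one static background plus small per-frame perturbations.
rng = np.random.default_rng(0)
background = rng.random((8, 8))
frames = background[None] + 0.05 * rng.standard_normal((20, 8, 8))

restored = temporal_low_rank(frames, rank=1)
err_before = np.abs(frames - background).mean()
err_after = np.abs(restored - background).mean()
print(f"mean abs error before: {err_before:.4f}, after: {err_after:.4f}")
```

Because the static background is shared by every frame, the dominant temporal component captures it, while frame-specific perturbations are largely discarded. Real turbulence distortions are geometric rather than additive noise, so this toy only conveys the low-rank idea, not the behavior of the full model.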

Complementarity between SLRTR and FRF. We further discuss how FRF and SLRTR complement each other in Fig. 8(a). On the one hand, FRF-based registration notably mitigates distortions while introducing fewer corruptions, greatly reducing the burden of SLRTR. On the other hand, SLRTR effectively refines the residual errors unavoidably left by FRF-based registration. FRF and SLRTR thus complement each other to better remove severe distortions.

Effectiveness of Frequency-aware Reference Frame. To further illustrate the effectiveness of FRF, we embed existing reference frames (Temp avg [58], Non-local avg [37]) and the proposed FRF into existing frameworks, NDL [58], TurbRecon [37] and the proposed CDSP, on 5 Km synthetic turbulence. Table 4 shows that every framework obtains its highest PSNR/SSIM after integrating FRF, revealing the effectiveness of FRF. We also visualize the comparison between FRF and other reference frames in Fig. 9. FRF possesses superior quality and sharper edges, further demonstrating its reliability.
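The Gain column of Table 4 can be checked with a short arithmetic sketch. The reading used here is an assumption, since the text does not define the column: Gain is taken as the score with FRF minus the score with the reference frame cited to the same paper as the method (Temp avg [58] for NDL, Non-local avg [37] for TurbRecon). This reading reproduces the printed values. The dictionary and function names are hypothetical.

```python
# Scores copied from Table 4 (5 Km synthetic turbulence).
TABLE4 = {
    "NDL": {
        "PSNR": {"temp_avg": 22.95, "non_local_avg": 22.88, "frf": 23.29},
        "SSIM": {"temp_avg": 0.7698, "non_local_avg": 0.7650, "frf": 0.7819},
    },
    "TurbRecon": {
        "PSNR": {"temp_avg": 24.33, "non_local_avg": 24.14, "frf": 24.79},
        "SSIM": {"temp_avg": 0.8122, "non_local_avg": 0.8082, "frf": 0.8231},
    },
}

# Assumption: each method's original reference frame is the one cited
# to the same paper as the method itself.
DEFAULT_REFERENCE = {"NDL": "temp_avg", "TurbRecon": "non_local_avg"}


def frf_gain(method, metric):
    """Improvement from swapping the method's original reference frame for FRF."""
    scores = TABLE4[method][metric]
    digits = 2 if metric == "PSNR" else 4  # match the precision used in Table 4
    return round(scores["frf"] - scores[DEFAULT_REFERENCE[method]], digits)


for method in TABLE4:
    for metric in ("PSNR", "SSIM"):
        print(method, metric, frf_gain(method, metric))
```

Under this reading, the Gain entries are left empty for CDSP because FRF is already its own reference frame.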

Promotion for Downstream Recognition. We further evaluate the turbulence mitigation methods on text recognition using TurbuText [1]. We apply three text recognition methods (CRNN [46], ASTERN [47], DAN [50]) to the restoration results and report the accuracies in Table 5. CDSP achieves the highest accuracy with all three recognition methods (0.5248, 0.5319 and 0.5532, respectively).

Limitation. The proposed method effectively handles turbulence in static scenes, but dynamic scenes (e.g. scenes containing moving objects or camera shake) are more complex due to the coupling of turbulence, object motion and camera motion. Figure 10 shows the results on a dynamic scene. Although static regions are well processed, dynamic regions suffer from severe trailing since CDSP does not model object motion. We will address dynamic scene turbulence in future work.

6 Conclusion

Our work focuses on long-range turbulence mitigation. We construct a long-range turbulence dataset (RLR-AT) and propose a coarse-to-fine framework for long-range turbulence mitigation in which the dynamic and static priors cooperate. Within this framework, a frequency-aware reference frame enables better registration, and a low-rank tensor refinement model refines the registration errors while preserving details. Extensive experiments demonstrate that the proposed method outperforms SOTA methods on different datasets.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant 62371203. The computation is completed in the HPC Platform of Huazhong University of Science and Technology.

References

[1] Bridging the gap between computational photography and visual recognition: 6th ug2+ prize challenge, http://cvpr2023.ug2challenge.org/dataset23_t2.html, track 2
[2] Lucky imaging: high angular resolution imaging in the visible from the ground. Astronomy & Astrophysics 446(2), 739–745 (2006)
[3] Anantrasirichai, N., Achim, A., Bull, D.: Atmospheric turbulence mitigation for sequences with moving objects using recursive image fusion. In: ICIP. pp. 2895–2899 (2018)
[4] Anantrasirichai, N., Achim, A., Kingsbury, N.G., Bull, D.R.: Atmospheric turbulence mitigation using complex wavelet-based fusion. IEEE TIP 22(6), 2398–2408 (2013)
[5] Boehrer, N., Nieuwenhuizen, R.P., Dijk, J.: Turbulence mitigation in neuromorphic camera imagery. vol. 11540, pp. 43–58. SPIE (2020)
[6] Brandner, W., Hormuth, F.: Lucky imaging in astronomy. Astronomy at High Angular Resolution: A Compendium of Techniques in the Visible and Near-Infrared pp. 1–16 (2016)
[7] Caliskan, T., Arica, N.: Atmospheric turbulence mitigation using optical flow. In: ICPR. pp. 883–888 (2014)
[8] Chan, S.H.: Tilt-then-blur or blur-then-tilt? clarifying the atmospheric turbulence model. IEEE SPL 29, 1833–1837 (2022)
[9] Chang, Y., Yan, L., Zhong, S.: Hyper-laplacian regularized unidirectional low-rank tensor recovery for multispectral image denoising. In: CVPR. pp. 4260–4268 (2017)
[10] Chimitt, N., Chan, S.H.: Simulating anisoplanatic turbulence by sampling intermodal and spatially correlated zernike coefficients. OE 59(8), 083101–083101 (2020)
[11] Chimitt, N., Mao, Z., Hong, G., Chan, S.H.: Rethinking atmospheric turbulence mitigation. arXiv preprint arXiv:1905.07498 (2019)
[12] Chimitt, N., Zhang, X., Mao, Z., Chan, S.H.: Real-time dense field phase-to-space simulation of imaging through atmospheric turbulence. IEEE TCI 8, 1159–1169 (2022)
[13] Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.: Image denoising by sparse 3-d transform-domain collaborative filtering. IEEE TIP 16(8), 2080–2095 (2007)
[14] Deshmukh, A.S., Medasani, S.S., Reddy, G.R.: A fast hierarchical patch-based approach for mitigating atmospheric turbulence. In: ICACCI. pp. 1–7 (2013)
[15] Fazlali, H., Shirani, S., Bradford, M., Kirubarajan, T.: Atmospheric turbulence removal in long-range imaging using a data-driven-based approach. IJCV 130(4), 1031–1049 (2022)
[16] Feng, B.Y., Xie, M., Metzler, C.A.: Turbugan: An adversarial learning approach to spatially-varying multiframe blind deconvolution with applications to imaging through turbulence. IEEE JSAIT 3(3), 543–556 (2022)
[17] Fried, D.L.: Optical resolution through a randomly inhomogeneous medium for very long and very short exposures. JOSA 56(10), 1372–1379 (1966)
[18] Fried, D.L.: Anisoplanatism in adaptive optics. JOSA 72(1), 52–61 (1982)
[19] Gao, J., Anantrasirichai, N., Bull, D.: Atmospheric turbulence removal using convolutional neural network. arXiv preprint arXiv:1912.11350 (2019)
[20] Geiger, A., Moosmann, F., Car, Ö., Schuster, B.: Automatic camera and range sensor calibration using a single shot. In: ICRA. pp. 3936–3943 (2012)
[21] Gilles, J., Ferrante, N.B.: Open turbulent image set (otis). PRL 86, 38–41 (2017)
[22] Hardie, R.C., Power, J.D., LeMaster, D.A., Droege, D.R., Gladysz, S., Bose-Pillai, S.: Simulation of anisoplanatic imaging through optical turbulence using numerical wave propagation with new validation analysis. OE 56(7), 071502–071502 (2017)
[23] Hardie, R.C., Rucci, M.A., Dapore, A.J., Karch, B.K.: Block matching and wiener filtering approach to optical turbulence mitigation and its application to simulated and real imagery with quantitative error analysis. OE 56(7), 071503–071503 (2017)
[24] He, R., Wang, Z., Fan, Y., Feng, D.: Atmospheric turbulence mitigation based on turbulence extraction. In: ICASSP. pp. 1442–1446 (2016)
[25] Hirsch, M., Sra, S., Schölkopf, B., Harmeling, S.: Efficient filter flow for space-variant multiframe blind deconvolution. In: CVPR. pp. 607–614 (2010)
[26] Hua, X., Pan, C., Shi, Y., Liu, J., Hong, H.: Removing atmospheric turbulence effects via geometric distortion and blur representation. IEEE TGRS 60, 1–13 (2020)
[27] Jaiswal, A., Zhang, X., Chan, S.H., Wang, Z.: Physics-driven turbulence image restoration with stochastic refinement. In: ICCV. pp. 12170–12181 (2023)
[28] Jiang, W., Boominathan, V., Veeraraghavan, A.: Nert: Implicit neural representations for unsupervised atmospheric turbulence mitigation. In: CVPRW. pp. 4235–4242 (2023)
[29] Jin, D., Chen, Y., Lu, Y., Chen, J., Wang, P., Liu, Z., Guo, S., Bai, X.: Neutralizing the impact of atmospheric turbulence on complex scene imaging via deep learning. NMI 3(10), 876–884 (2021)
[30] Kolda, T.G., Bader, B.W.: Tensor decompositions and applications. SIAM review 51(3), 455–500 (2009)
[31] Lau, C.P., Lai, Y.H., Lui, L.M.: Restoration of atmospheric turbulence-distorted images via rpca and quasiconformal maps. Inverse Problems 35(7), 074002 (2019)
[32] Li, N., Thapa, S., Whyte, C., Reed, A.W., Jayasuriya, S., Ye, J.: Unsupervised non-rigid image distortion removal via grid deformation. In: ICCV. pp. 2522–2532 (2021)
[33] Lin, Z., Liu, R., Su, Z.: Linearized alternating direction method with adaptive penalty for low-rank representation. NeurIPS 24 (2011)
[34] Liu, C., et al.: Beyond pixels: exploring new representations and applications for motion analysis. Ph.D. thesis, Massachusetts Institute of Technology (2009)
[35] Lou, Y., Kang, S.H., Soatto, S., Bertozzi, A.L.: Video stabilization of atmospheric turbulence distortion. Citeseer IPI 7(3), 839–861 (2013)
[36] Lu, C., Feng, J., Chen, Y., Liu, W., Lin, Z., Yan, S.: Tensor robust principal component analysis: Exact recovery of corrupted low-rank tensors via convex optimization. In: CVPR. pp. 5249–5257 (2016)
[37] Mao, Z., Chimitt, N., Chan, S.H.: Image reconstruction of static and dynamic scenes through anisoplanatic turbulence. IEEE TCI 6, 1415–1428 (2020)
[38] Mao, Z., Chimitt, N., Chan, S.H.: Accelerating atmospheric turbulence simulation via learned phase-to-space transform. In: ICCV. pp. 14759–14768 (2021)
[39] Mao, Z., Jaiswal, A., Wang, Z., Chan, S.H.: Single frame atmospheric turbulence mitigation: A benchmark study and a new physics-inspired transformer model. In: ECCV. pp. 430–446. Springer (2022)
[40] Mei, K., Patel, V.M.: Ltt-gan: Looking through turbulence by inverting gans. IEEE JSTSP (2023)
[41] Oreifej, O., Li, X., Shah, M.: Simultaneous video stabilization and moving object detection in turbulence. IEEE TPAMI 35(2), 450–462 (2012)
[42] Rai, S.N., Jawahar, C.: Removing atmospheric turbulence via deep adversarial learning. IEEE TIP 31, 2633–2646 (2022)
[43] Roggemann, M.C., Welsh, B.M., Hunt, B.R.: Imaging through turbulence. CRC press (1996)
[44] Schwartzman, A., Alterman, M., Zamir, R., Schechner, Y.Y.: Turbulence-induced 2d correlated image distortion. In: ICCP. pp. 1–13 (2017)
[45] Shan, Q., Jia, J., Agarwala, A.: High-quality motion deblurring from a single image. ACM TOG 27(3), 1–10 (2008)
[46] Shi, B., Bai, X., Yao, C.: An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE TPAMI 39(11), 2298–2304 (2016)
[47] Shi, B., Yang, M., Wang, X., Lyu, P., Yao, C., Bai, X.: Aster: An attentional scene text recognizer with flexible rectification. IEEE TPAMI 41(9), 2035–2048 (2018)
[48] Shimizu, M., Yoshimura, S., Tanaka, M., Okutomi, M.: Super-resolution from image sequence under influence of hot-air optical turbulence. In: CVPR. pp. 1–8 (2008)
[49] Tatarski, V.I.: Wave propagation in a turbulent medium. Courier Dover Publications (2016)
[50] Wang, T., Zhu, Y., Jin, L., Luo, C., Chen, X., Wu, Y., Wang, Q., Cai, M.: Decoupled attention network for text recognition. In: AAAI. vol. 34, pp. 12216–12224 (2020)
[51] Wang, Y., Jin, D., Chen, J., Bai, X.: Revelation of hidden 2d atmospheric turbulence strength fields from turbulence effects in infrared imaging. NCS pp. 1–13 (2023)
[52] Wright, J., Ganesh, A., Rao, S., Peng, Y., Ma, Y.: Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization. NeurIPS 22 (2009)
[53] Xie, Y., Zhang, W., Tao, D., Hu, W., Qu, Y., Wang, H.: Removing turbulence effect via hybrid total variation and deformation-guided kernel regression. IEEE TIP 25(10), 4943–4958 (2016)
[54] Yasarla, R., Patel, V.M.: Learning to restore images degraded by atmospheric turbulence using uncertainty. In: ICIP. pp. 1694–1698 (2021)
[55] Zhang, X., Chimitt, N., Chi, Y., Mao, Z., Chan, S.H.: Spatio-temporal turbulence mitigation: A translational perspective. In: CVPR (2024)
[56] Zhang, X., Mao, Z., Chimitt, N., Chan, S.H.: Imaging through the atmosphere using turbulence mitigation transformer. IEEE TCI (2024)
[57] Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., Torralba, A.: Scene parsing through ade20k dataset. In: CVPR. pp. 633–641 (2017)
[58] Zhu, X., Milanfar, P.: Removing atmospheric turbulence via space-invariant deconvolution. IEEE TPAMI 35(1), 157–170 (2012)
