National Key Lab of Multispectral Information Intelligent Processing Technology, Huazhong University of Science and Technology, China
Email: {shengqi,sunrun,yichang,caoshuning,xiaoxueyao,yanluxin}@hust.edu.cn
https://shengqi77.github.io/RLR-AT.github.io/

Long-range Turbulence Mitigation: A Large-scale Dataset and A Coarse-to-fine Framework

Shengqi Xu    Run Sun    Yi Chang 🖂    Shuning Cao    Xueyao Xiao    Luxin Yan
Abstract

Long-range imaging inevitably suffers from atmospheric turbulence, which causes severe geometric distortions through random refraction of light. The longer the distance, the more severe the disturbance. Although existing research has made great progress in tackling short-range turbulence, less attention has been paid to long-range turbulence with significant distortions. To address this gap and advance the field, we construct a large-scale real long-range atmospheric turbulence dataset (RLR-AT), including 1500 turbulence sequences spanning distances from 1 Km to 13 Km. Compared to existing datasets, RLR-AT offers turbulence with longer distances and higher diversity, and scenes with greater variety and larger scale. Moreover, most existing work adopts either registration-based or decomposition-based methods to address distortions through one-step mitigation. However, these methods fail to handle long-range turbulence effectively due to its significant pixel displacements. In this work, we propose a coarse-to-fine framework to handle severe distortions, in which dynamic turbulence and static background priors cooperate (CDSP). On the one hand, we discover the pixel motion statistical prior of turbulence, and propose a frequency-aware reference frame for better large-scale distortion registration, greatly reducing the burden of refinement. On the other hand, we take advantage of the static prior of the background, and propose a subspace-based low-rank tensor refinement model to eliminate the misalignments inevitably left by registration while preserving details well. The dynamic and static priors complement each other, enabling us to progressively mitigate long-range turbulence with severe distortions. Extensive experiments demonstrate that the proposed method outperforms SOTA methods on different datasets.

Figure 1: Visual examples of turbulence mitigation on the proposed real-world long-range atmospheric turbulence benchmark RLR-AT. The proposed method can effectively handle long-range turbulence with severe distortions.

1 Introduction

Seeing farther and more clearly is crucial for many military and civilian applications. Unfortunately, long-range imaging inevitably suffers from atmospheric turbulence, which causes severe geometric distortion through random refraction of light [49, 8, 17]. As light travels over a long distance through a non-uniform atmospheric medium, more refractions accumulate during optical transmission, resulting in more severe pixel displacement. Although short-range atmospheric turbulence mitigation has made great progress in recent years [29, 39, 54, 58, 24, 26], very few studies have focused on long-range turbulence. In this work, our goal is to handle long-range turbulence with severe distortions, as shown in Fig. 1.

Atmospheric turbulence benchmarks are key to evaluating turbulence mitigation methods. Existing datasets can be classified into synthetic datasets [56, 55] and real-world datasets [25, 1, 29, 4, 21, 41, 39]. Synthetic datasets are mainly constructed using turbulence simulators [37, 10, 12]. However, simulators cannot generate data that precisely match the features of real turbulence, since turbulence in real scenarios is complex in nature, especially long-range turbulence. Real turbulence is primarily influenced by temperature and imaging distance: greater distances and higher temperatures both lead to more severe turbulence. Typically, real turbulence can be divided into hot-air turbulence and long-range turbulence [37]. Most public real datasets consist of hot-air turbulence; TurbRecon [37] stands out by capturing 2 turbulence sequences of a building scene at a distance of 4 Km. As such, constructing a large-scale real-world long-range turbulence dataset with diverse scenes is highly necessary.

In this work, we construct a large-scale real-world long-range turbulence dataset, RLR-AT, for long-range turbulence mitigation. The strength of our benchmark is threefold. Firstly, RLR-AT contains long-range turbulence with longer distances and higher diversity, covering diverse distortions at distances ranging from 1 Km to 13 Km. Secondly, it consists of large-scale and diverse scenes, including 1500 turbulence sequences collected across various scenarios, such as text, object, building, etc. Last but not least, RLR-AT is collected by a telephoto camera at high resolution (1920*1080 pixels, as listed in Table 1). Overall, RLR-AT can serve as a benchmark for future works targeting long-range turbulence mitigation.

The main difficulty of long-range turbulence mitigation lies in the severe distortions. To handle distortions, most existing methods can be classified into two categories: registration-based methods [7, 15, 26, 37, 48, 58, 53] and decomposition-based methods [14, 41, 24]. The former merely employ the zero-mean assumption of dynamic turbulence to construct a reference frame, and align the distorted frames with the reference frame using a registration technique. However, such a strategy struggles to handle long-range turbulence with severe distortions, as registration errors arise from the blur of the reference frame.

In contrast, the latter treat the distortions among the frames as gross errors, and directly remove them through matrix decomposition by exploiting the low-rank prior of the static background. Although this is effective for mild distortions, it is theoretically less robust when handling severely corrupted observations [52, 36]. Thus, directly applying such a strategy to long-range turbulence with severe distortions would cause unexpected loss of detail. Overall, previous methods utilize either the zero-mean assumption of dynamic turbulence or the low-rank prior of the static background, and both struggle to handle long-range turbulence with severe distortions.

To handle long-range turbulence with severe distortions, we propose a coarse-to-fine framework in which dynamic turbulence and static background priors cooperate (CDSP). On the one hand, we explore the pixel motion statistical prior of turbulence and discover that the pixel occurring most frequently at a given position is most likely the closest to the original GT. This inspires us to propose a frequency-aware reference frame for better distortion registration, significantly reducing the burden of subsequent refinement. On the other hand, we further take advantage of the static prior of the background and propose a subspace-based low-rank tensor refinement model to refine the errors unavoidably left by registration while preserving details well. The dynamic and static priors complement each other, enabling us to mitigate long-range turbulence with severe distortions. Overall, our main contributions are summarized as follows:

  1. Our work focuses on a challenging yet practical task: long-range turbulence mitigation. We construct a large-scale real-world long-range atmospheric turbulence benchmark (RLR-AT). Compared to existing public real-world datasets, RLR-AT is the farthest (ranging from 1 Km to 13 Km) and largest-scale (1500 sequences at a high resolution of 1920*1080, as listed in Table 1) turbulence dataset, with diverse turbulence levels and scenes. This dataset can serve as a good testbed for the community, especially for long-range turbulence mitigation.

  2. We propose a coarse-to-fine framework for long-range turbulence mitigation, in which dynamic and static priors cooperate. Specifically, we identify the pixel displacement statistical prior of dynamic turbulence and propose a frequency-aware reference frame for better registration, significantly reducing the burden of refinement. Moreover, we take advantage of the low-rank prior of the static background and propose a subspace-based low-rank tensor refinement model to remove the registration errors while preserving details well. Unlike in existing methods, the dynamic and static priors complement each other, enabling us to address long-range turbulence with significant distortions.

  3. We comprehensively compare CDSP with existing methods on the proposed real long-range turbulence dataset RLR-AT and on a synthetic dataset. Extensive experiments show that CDSP consistently outperforms SOTA methods, especially when handling long-range turbulence with severe distortions.

2 Related Work

Real Atmospheric Turbulence Benchmarks. Atmospheric turbulence is a fundamental issue in long-range imaging systems, and mostly depends on the temperature and imaging distance. Higher temperatures and longer distances typically result in more severe turbulence. Generally, turbulence can be classified into two main categories: hot-air turbulence and long-range turbulence [37]. In Table 1, we provide a comprehensive summary of existing public benchmarks. EFF [25] provided two widely used hot-air turbulence samples. UG2+ TurbuText [1] and Heat Chamber [39] collected turbulence sequences at a distance of 20 meters. These hot-air turbulence datasets were collected with an artificial heat burner. Further, CLEAR [4], OTIS [21], Turbulence Text [39] and TSR-WGAN [29] collected turbulence in high-temperature environments near the surface. Most existing public datasets are composed of hot-air turbulence, and TurbRecon [37] captured 2 turbulence sequences at a distance of 4 Km. Moreover, existing datasets are still limited in terms of scale and diversity. In this work, we focus on the challenging problem of long-range turbulence mitigation and construct a large-scale real long-range turbulence dataset to advance this field.

Table 1: Summary of existing available real-world turbulence benchmarks.

Venue                | Hot-air Temperature | Long-range Distance | Scene Category        | Scene | Avg Frames | Resolution
---------------------|---------------------|---------------------|-----------------------|-------|------------|-----------
EFF [25]             | -                   | <1Km                | Building, Chimney     | 2     | 100        | 240*240
Heat Chamber [39]    | -                   | 20m                 | Image                 | 400   | 100        | 440*440
UG2+ TurbuText [1]   | -                   | 20m                 | Text                  | 100   | 100        | 440*440
CLEAR [4]            | 46°                 | <2Km                | Building, Street      | 3     | 53         | 250*180
OTIS [21]            | -                   | <1Km                | Pattern, door         | 21    | 276        | 256*256
TSR-WGAN [29]        | 33°                 | <3Km                | Street, Grassland     | 21    | 233        | 960*540
Turbulence Text [39] | 30°                 | 300m                | Text                  | 100   | 100        | 440*440
TurbRecon [37]       | 30°                 | 4Km                 | Building, Chimney     | 4     | 100        | 512*512
RLR-AT               | -6° ~ 40°           | 1Km-13Km            | Building, Text ... Car | 1500  | 800        | 1920*1080

Atmospheric Turbulence Mitigation. Lucky imaging is an intuitive way to mitigate turbulence [6, 2, 26, 11, 3, 5, 31]. Its key idea is to choose, from short-exposure frames, the lucky high-quality frames least affected by the atmosphere. Unfortunately, the lucky assumption no longer holds for long-range anisoplanatic turbulence, where severe distortions persist across all frames [43, 18, 37]. In recent years, data-driven methods have become popular due to their end-to-end simplicity [27, 39, 19, 54, 40, 32, 28, 16, 42, 51, 56, 55]. Their main idea is to train on paired clean-degraded synthetic data from turbulence simulators [12, 10, 38, 44, 22]. These learning-based methods achieve satisfactory results on simulated data but cannot generalize well to real turbulence due to the domain gap, especially for long-range turbulence with severe distortions. Considering that turbulence follows a clear physical process, namely light refraction and diffraction, Zhu et al. [58] proposed a classical multi-stage restoration framework which performs distortion correction and deblurring step by step. In this work, we follow this research line with clear physical foundations, and investigate how to make the dynamic and static priors cooperate for better distortion correction.

Turbulence Distortion Correction. The main difficulty of long-range turbulence lies in the severe distortions. Most existing methods can be classified into two categories: registration-based methods [48, 58, 7, 53, 37, 26, 15] and decomposition-based methods [41, 14, 24]. Most existing registration-based methods employ the zero-mean assumption of dynamic turbulence to construct a reference frame, and suppress distortion by aligning each input frame to the reference frame using non-rigid registration. For example, the most common way to obtain a reference frame is to directly average the input frames [25, 58, 4, 7, 23]. Mao et al. [37] further presented a space-time non-local averaging method that adaptively assigns different weights to different frames, departing from uniform temporal averaging. However, the reference frames produced by these methods suffer from blur, leading to inaccurate registration, especially for long-range turbulence. In this work, we discover the pixel motion statistical prior of turbulence and propose a frequency-aware high-quality reference frame for better large-scale distortion registration.

The decomposition-based methods start from the matrix decomposition perspective, utilizing the low-rank prior of the static background to remove distortions. For example, Oreifej et al. [41] proposed a three-term matrix decomposition approach for simultaneous distortion removal and object detection. However, these matrix-based methods need to transform the 3-D video into a 2-D matrix, which damages the spatio-temporal structure. In this work, we propose a subspace-based low-rank tensor refinement model to refine the registration error while preserving the spatio-temporal details well. We further integrate dynamic and static priors within a coarse-to-fine framework in a complementary manner to better handle long-range turbulence with severe distortions.

Figure 2: Illustration of the proposed dataset RLR-AT. (a) Long-range imaging with larger focal length lens through turbulence. (b) Typical turbulence with diverse long-distance conditions. (c) Statistics of distance and scene of the proposed benchmark.

3 Large-scale Real Long-range Turbulence Benchmark

Owing to the challenges in gathering long-range turbulence, current datasets mainly focus on hot-air turbulence, overlooking long-range turbulence with severe distortions. To fill this gap, we construct a large-scale long-range turbulence dataset for verification and analysis, named RLR-AT. Note that RLR-AT also includes videos of dynamic scenes and of turbulence coupled with haze, which can be used to study turbulence in dynamic scenes and multi-degradation restoration.

Benchmark Collection. In this work, we collect the long-range turbulence sequences with a telephoto camera (Nikon Coolpix P1000) with an equivalent lens focal length of 3000mm, sampled at 30 fps. The data collection process is illustrated in Fig. 2(a). Firstly, we stabilize the camera on a tripod to capture distant static scenes. Subsequently, we adjust the focal length until we discern the emergence of geometric distortion and blurring induced by the non-uniform indices of refraction associated with long-range turbulence. For each sequence, we record approximately 35 seconds and extract the steady middle 30 seconds into our dataset.

Benchmark Statistics. Table 1 presents a detailed statistical comparison between our proposed RLR-AT and existing turbulence benchmarks. Overall, our dataset contains 1500 sequences, each of which consists of approximately 800 frames, collected from diverse cities. Notably, our dataset comprehensively covers long-range turbulence across distances ranging from 1 km to 13 km. Moreover, over 19 typical scenes are captured, including street, sports ground, factory, car, billboard, etc., offering a comprehensive range for long-range surveillance scenarios. To visualize the distribution of distances and scene categories, a bar chart and a sunburst chart are shown in Fig. 2(c).

Turbulence with Longer-distances and Higher-diversity. The key difference between RLR-AT and other datasets is that RLR-AT covers turbulence distortions with longer-distances and higher-diversity. Most existing public datasets are mainly composed of hot-air turbulence, and TurbRecon [37] stands out by capturing two turbulence sequences at a distance of 4 Km. In comparison, RLR-AT contains long-range turbulence images captured from longer and richer distances (1-13Km). In Fig. 2(b), we visualize some long-range turbulence images with increasing distances in RLR-AT. It can be observed that with increasing distance, turbulence degradation level is higher, leading to more severe distortions in the images. The bar chart in Fig. 2(c) displays the number of turbulence sequences in our dataset at each distance, ranging from 1 km to 13 km, which further highlights the diversity of distances within our RLR-AT.

Scenes with Larger-scale and Greater-variety. We are concerned not only with the distance diversity of long-range turbulence but also with the variety and scale of scenes. In Table 1, we can observe that the existing real-world turbulence benchmarks are still limited in terms of scene count and diversity. Most datasets focus on relatively common scenes, such as building, street or grassland, while UG2+ TurbuText [1] and Turbulence Text [39] both collect hot-air turbulence specifically for text scenes. In comparison, RLR-AT contains 1500 long-range turbulence sequences across 19 categories. Figure 2(b) shows some typical long-range turbulence images in various scenes, such as car, motorcycle, building and text, offering a comprehensive range for long-range surveillance scenarios. The sunburst chart in Fig. 2(c) illustrates the distribution of scene categories in RLR-AT, further showcasing the diversity of scenes within our dataset.

Figure 3: Comparison between temporal averaging (Temp Avg) and the proposed frequency-aware reference frame (FRF). (a) Comparison of registration performance between Temp Avg and FRF. Temp Avg fails to achieve precise registration since it suffers from blur, while FRF achieves more accurate registration. (b) Comparison between the construction of Temp Avg and FRF. Temp Avg assigns equal weight to all intensities at a certain position. However, the output is dissimilar to the original GT, since the less frequently occurring intensities make a negative contribution. In contrast, we argue that the higher the frequency, the greater the weight should be, as the most frequently occurring pixel at the position is closer to the original GT.

4 A Coarse-to-fine Framework for Long-range Turbulence

In this work, we propose a coarse-to-fine framework in which dynamic turbulence and static background priors cooperate (CDSP) to handle long-range turbulence with severe distortions. On the one hand, we discover the pixel motion statistical prior of turbulence and propose a frequency-aware reference frame for better large-scale distortion registration, which greatly reduces the burden of refinement (Section 4.1). We then align the distorted frames to the proposed reference frame using a registration approach based on optical flow [34]. On the other hand, we take advantage of the static prior of the background and propose a subspace-based low-rank tensor refinement model to refine the errors unavoidably left by registration while preserving details well (Section 4.2). The dynamic and static priors complement each other, enabling us to better eliminate the severe distortions. Finally, we employ a simple data-driven network to further remove the residual blur, where the paired deblurring data are generated with the proposed distortion correction framework. The details of blur removal are provided in the supplementary material.

4.1 Frequency-aware Reference Frame Construction

Most previous methods [25, 7, 23, 58] employ temporal averaging (Temp Avg) to construct a reference frame by naively applying the zero-mean assumption of turbulence. However, Temp Avg often suffers from blur, leading to imprecise registration when handling severe distortions, as shown in Fig. 3(a). In contrast, the proposed frequency-aware reference frame (FRF) achieves more accurate registration due to its superior quality. As illustrated in Fig. 3(b), Temp Avg assumes that all pixel intensities occurring at a certain position have equal weight. However, the output is dissimilar to the original GT, as the non-original pixel intensities make a negative contribution to the output at that position. In this work, we propose a frequency-aware method to construct a reliable reference frame based on the pixel motion statistical prior of turbulence.

Pixel Displacement Statistical Prior of Turbulence. To explore the pixel displacement statistical prior of turbulence, we conduct an analysis experiment using the corners of a checkerboard in Fig. 4. We take checkerboard turbulence images as the experimental data because corner motion can approximate pixel motion and because checkerboard corner detection techniques are mature. We then apply a corner detector [20] to the collected data to detect the shifted corners. Note that we conduct extensive analysis experiments on long-range turbulence across various distances and scenes (e.g., buildings with wall corners). Please refer to the supplementary material for details.

Figure 4: Analysis of pixel motion statistical priors using checkerboard corners. (a) visualizes the clean and distorted checkerboard and the motion field of corners. (b) shows the motion trajectory of corners along the temporal axis, with an extended display of two randomly selected corners across 200 frames. (c) presents the statistics of corner motions, which approximately conform to a zero-mean Gaussian distribution, indicating that corners most frequently occur in their original positions.

In Fig. 4(a), we show the clean and distorted checkerboard and the corresponding motion field of corners. It is evident that the corners of distorted frames exhibit noticeable motion, and the motion of corners varies across different frames. To better explore the motion of corners in the temporal dimension, we further visualize the motion trajectory of the corners along the temporal axis in Fig. 4(b). The motion trajectories of the corners are distinct from one another. We randomly select two corners and present their horizontal and vertical positions on the right. Although the displacements of the two corners differ, they share one property: both consistently stay relatively close to their original positions, which suggests a statistical rule. To further explore the statistics of pixel motion, we normalize the positions of all corners across 200 frames and perform a statistical analysis of their displacements in Fig. 4(c). The results show that the motions approximately follow a zero-mean Gaussian distribution, indicating that pixels most frequently occur in their original positions. This insight implies that the pixel occurring most frequently at a given position is most likely the original GT, inspiring us to propose a frequency-aware reference frame.
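The link between zero-mean displacements and the most frequent intensity can be checked on a toy example. The sketch below is only an illustration under simplifying assumptions that are not taken from the paper: a 1-D step edge, whole-pixel shifts drawn from a rounded zero-mean Gaussian, and hypothetical names and values (`simulate_pixel_observations`, `sigma_px`, the intensities 10 and 200).

```python
import random
from collections import Counter


def simulate_pixel_observations(signal, position, num_frames, sigma_px, rng):
    """Collect the intensity seen at `position` in each simulated frame.

    Toy model (an assumption for illustration): in every frame the content is
    displaced by a whole number of pixels obtained by rounding a zero-mean
    Gaussian sample, so the observed intensity comes from a neighbouring pixel.
    """
    last = len(signal) - 1
    observations = []
    for _ in range(num_frames):
        shift = round(rng.gauss(0.0, sigma_px))
        source = min(max(position + shift, 0), last)  # clamp at the borders
        observations.append(signal[source])
    return observations


rng = random.Random(0)                     # fixed seed for repeatability
signal = [10] * 5 + [200] * 5              # hypothetical step edge
position = 5                               # first bright pixel, true value 200

obs = simulate_pixel_observations(signal, position, 2000, 1.0, rng)
most_frequent, _ = Counter(obs).most_common(1)[0]
temporal_mean = sum(obs) / len(obs)

print("ground truth :", signal[position])
print("most frequent:", most_frequent)
print("temporal mean:", round(temporal_mean, 1))
```

In this toy setting the most frequent intensity coincides with the true value next to the edge, whereas the temporal mean is pulled toward the dark side, which mirrors the blur attributed to temporal averaging.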

Frequency-aware Reference Frame. Given the distorted sequence, suppose there are $K$ different pixel intensities occurring at a certain position $\bm{\mathit{l}}=(x,y)$ along the temporal dimension. Let $I_{\bm{\mathit{l}},k}$, with $k\in[1,K]$, represent a pixel intensity occurring at this position in the distorted sequence. We first count the frequency of each pixel intensity occurring at the position along the temporal dimension: $N_{\bm{\mathit{l}},k}=count(I_{\bm{\mathit{l}},k})$. The conventional temporal average reference frame assigns a weight of one to each pixel intensity at the position. Hence, its output $T_{\bm{\mathit{l}}}$ is obtained by summing all pixel intensities and dividing by the number of frames, which equals the sum of the frequencies of the pixel intensities:

$$T_{\bm{\mathit{l}}}=\Big(\sum\limits_{k=1}^{K}N_{\bm{\mathit{l}},k}\times I_{\bm{\mathit{l}},k}\Big)/\Big(\sum\limits_{k=1}^{K}N_{\bm{\mathit{l}},k}\Big), \tag{1}$$

Different from temporal averaging, we argue that the weight of pixel intensities is positively correlated with their frequency as shown in Fig. 3(b). Consequently, we construct a frequency-aware weight for each pixel intensity:

$$\omega_{\bm{\mathit{l}},k}=\mathrm{e}^{\sigma\times N_{\bm{\mathit{l}},k}}, \tag{2}$$

which is a function of the frequency $N_{\bm{\mathit{l}},k}$, and $\sigma$ is a hyper-parameter controlling the growth rate of the weight. Next, the pixel value of the reference frame at the certain position $F_{\bm{\mathit{l}}}$ is constructed via weighted averaging based on frequency:

$$F_{\bm{\mathit{l}}}=\Big(\sum\limits_{k=1}^{K}N_{\bm{\mathit{l}},k}\times I_{\bm{\mathit{l}},k}\times\omega_{\bm{\mathit{l}},k}\Big)/\Big(\sum\limits_{k=1}^{K}N_{\bm{\mathit{l}},k}\times\omega_{\bm{\mathit{l}},k}\Big). \tag{3}$$

Relationship between FRF and Temp Avg. We further discuss the relationship between Temp Avg and FRF, which is established through the parameter $\sigma$. The $\sigma$ in Eq. (2) determines the sensitivity of the weight function to frequency. When $\sigma=0$, the weight is one for all intensities, and Eq. (3) simplifies to:

$$F_{\bm{\mathit{l}}}=\Big(\sum\limits_{k=1}^{K}N_{\bm{\mathit{l}},k}\times I_{\bm{\mathit{l}},k}\Big)/\Big(\sum\limits_{k=1}^{K}N_{\bm{\mathit{l}},k}\Big), \tag{4}$$

which is the same as Eq. (1), illustrating that the average reference frame is a special case of the proposed FRF with $\sigma=0$. The reference frame constructed with $\sigma=0$ (Temp Avg) is shown in Fig. 3(a); it can be observed that the result suffers from severe blur. In contrast, when $\sigma>0$, the higher the frequency of a pixel intensity, the greater its weight, resulting in an output more similar to the original pixel. The FRF result in Fig. 3(a) possesses superior visual quality compared to Temp Avg, which is beneficial for severe distortion registration.
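Eqs. (1) to (3) can be sketched per pixel in a few lines. The code below is a minimal illustration rather than the authors' implementation; it assumes integer-quantized intensities so that repeated values can be counted, the function names are hypothetical, and the shift of the exponent by the largest count is a numerical-stability choice made here (it cancels in the ratio of Eq. (3)).

```python
import math
from collections import Counter


def frequency_aware_value(intensities, sigma):
    """Frequency-weighted average of the intensities seen at one position.

    intensities: values observed at a fixed position across all frames.
    sigma:       growth rate of the weight; sigma = 0 gives the plain mean.
    """
    counts = Counter(intensities)            # N_{l,k} for each distinct I_{l,k}
    max_count = max(counts.values())
    numerator = 0.0
    denominator = 0.0
    for intensity, n in counts.items():
        # exp(sigma * N) up to a common factor exp(-sigma * max_count),
        # which cancels between numerator and denominator.
        weight = math.exp(sigma * (n - max_count))
        numerator += n * intensity * weight
        denominator += n * weight
    return numerator / denominator


def frequency_aware_reference(frames, sigma):
    """Apply the per-pixel rule to a sequence of equally sized 2-D frames
    given as nested lists: frames[t][y][x]."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [frequency_aware_value([f[y][x] for f in frames], sigma)
         for x in range(width)]
        for y in range(height)
    ]


# Hypothetical observations at one pixel: the true value 200 appears in
# 7 of 10 frames, a displaced dark neighbour (10) in the other 3.
observations = [200] * 7 + [10] * 3
temp_avg = frequency_aware_value(observations, sigma=0.0)
frf = frequency_aware_value(observations, sigma=1.0)
print("Temp Avg:", temp_avg)
print("FRF     :", round(frf, 2))
```

With `sigma=0.0` the function reduces to the temporal mean of Eq. (1); as `sigma` grows, the output moves toward the most frequent intensity. The value of `sigma` used here is arbitrary and only serves the illustration.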

Figure 5: Low-rank property analysis of different sequences. (a) Visualization of selected section lines. (b) Singular value curves of corresponding sequences.

4.2 Low-rank Tensor Distortion Refinement

Low-rank Prior of Static Background. Due to the severe distortions in long-range turbulence, achieving perfect pixel-level registration is impossible. Consequently, registration errors are unavoidable in the registered sequences. Considering the static nature of the scene, we aim to utilize the low-rank prior of the static background to refine registration errors while preserving details. We use section lines and singular values to analyze the low-rank property of the distorted, registered and refined sequences in Fig. 5. In Fig. 5(a), we randomly select 1D section lines from each sequence. It is observed that the registered section lines contain sparse noise, while the refined section lines are smooth along the temporal dimension. Figure 5(b) shows the singular value curves, revealing that the refined sequence exhibits the strongest low-rank property.
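As a small worked illustration of why a static scene implies low rank, consider the idealized case (an assumption made here for clarity) in which every frame of a perfectly refined sequence equals the same non-zero vectorized image $\mathbf{b}\in\mathbb{R}^{hw}$. Stacking the $t$ frames as columns gives a matrix $\mathbf{B}$ (notation introduced only for this illustration), with $s_1$ denoting its largest singular value:

```latex
\mathbf{B} = [\,\mathbf{b},\ \mathbf{b},\ \ldots,\ \mathbf{b}\,]
           = \mathbf{b}\,\mathbf{1}_t^{\top} \in \mathbb{R}^{hw \times t},
\qquad
\operatorname{rank}(\mathbf{B}) = 1,
\qquad
s_1(\mathbf{B}) = \|\mathbf{b}\|_2\,\|\mathbf{1}_t\|_2 = \sqrt{t}\,\|\mathbf{b}\|_2 .
```

All remaining singular values are zero in this ideal case. Residual misalignments make the columns differ from one another and spread energy into further singular values, which is one way to read the comparison of singular value curves described above.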

Subspace-based Low-rank Tensor Refinement Model. Previous methods directly utilized matrix decomposition to remove turbulence [41, 24], which need to transform 3-D video into 2-D matrix, damaging the spatial-temporal structure. In this work, we propose a subspace-based low-rank tensor refinement model (SLRTR) to rectify the misalignments while preserving details. To our knowledge, we are the first to introduce tensor model into turbulence removal. Given the registered sequence 𝓡h×w×t𝓡superscript𝑤𝑡\bm{\mathcal{R}}\in{{\mathbb{R}}^{h\times w\times t}}bold_caligraphic_R ∈ blackboard_R start_POSTSUPERSCRIPT italic_h × italic_w × italic_t end_POSTSUPERSCRIPT, where hhitalic_h, w𝑤witalic_w and t𝑡titalic_t respectively denote the image height, width, and the number of frames. The key challenge lies in effectively reducing registration error while preserving the spatio-temporal details. A registered sequence can be described as the following formula:

𝓡=𝓑+𝓔+𝓝,𝓡𝓑𝓔𝓝{\bm{\mathcal{R}}}={\bm{\mathcal{B}}}+{\bm{\mathcal{E}}}+{\bm{\mathcal{N}}},bold_caligraphic_R = bold_caligraphic_B + bold_caligraphic_E + bold_caligraphic_N , (5)

where $\bm{\mathcal{B}}\in\mathbb{R}^{h\times w\times t}$ represents the refined sequence, $\bm{\mathcal{E}}\in\mathbb{R}^{h\times w\times t}$ is the registration error, and $\bm{\mathcal{N}}\in\mathbb{R}^{h\times w\times t}$ denotes the random noise. In this work, we formulate the refinement as an inverse problem under the maximum a posteriori (MAP) framework, as follows:

$$\min_{\bm{\mathcal{B}},\bm{\mathcal{E}}}\ \frac{1}{2}\|\bm{\mathcal{B}}+\bm{\mathcal{E}}-\bm{\mathcal{R}}\|_F^2+\alpha\Phi_b(\bm{\mathcal{B}})+\beta\Phi_e(\bm{\mathcal{E}}), \tag{6}$$

where $\Phi_b$ and $\Phi_e$ represent the prior knowledge for the background and the error, respectively, and $\alpha$ and $\beta$ are the corresponding hyper-parameters. For static-scene turbulence videos, on the one hand, the refined sequence $\bm{\mathcal{B}}$ exhibits a global low-rank property along the temporal dimension, with an ideal rank of one. On the other hand, it also has a significant non-local low-rank property along the spatial dimension, due to the self-similarity widely employed in image restoration [13]. Hence, we exploit a joint global-nonlocal prior across both spatial and temporal dimensions to enhance the representation of the static background $\bm{\mathcal{B}}$:

$$\Phi_b(\bm{\mathcal{B}})=\sum_i\left(\frac{1}{\lambda_i^2}\|\bm{\mathcal{S}}_i\bm{\mathcal{B}}\times_3 O_i-\bm{\mathcal{G}}_i\|_F^2+\|\bm{\mathcal{G}}_i\|_{tnn}\right), \tag{7}$$

where $\bm{\mathcal{S}}_i\bm{\mathcal{B}}\in\mathbb{R}^{p^2\times n\times t}$ is the 3-D tensor constructed via the non-local clustering of a sub-cube $u_i\in\mathbb{R}^{p\times p\times t}$ [9], $p$ and $n$ are the spatial size and the number of sub-cubes, respectively, $O_i\in\mathbb{R}^{d\times t}$ $(d\ll t)$ is an orthogonal subspace projection matrix used to capture the temporal low-rank property, $\times_3$ is the tensor product along the temporal dimension [30], $\bm{\mathcal{G}}_i$ represents the low-rank approximation variable, $\|\bullet\|_{tnn}$ denotes the tensor nuclear norm [9], and $\lambda_i$ is the regularization parameter. As for the error $\bm{\mathcal{E}}$, we model it as a sparse error [52] via $L_1$ sparsity. Thus, Eq. (6) can be expressed as:

$$\begin{aligned}
\left\{\hat{\bm{\mathcal{B}}},\hat{\bm{\mathcal{E}}},\hat{\bm{\mathcal{G}}}_i,\hat{O}_i\right\}=\arg\min_{\bm{\mathcal{B}},\bm{\mathcal{E}},\bm{\mathcal{G}}_i,O_i}\ &\frac{1}{2}\|\bm{\mathcal{B}}+\bm{\mathcal{E}}-\bm{\mathcal{R}}\|_F^2+\beta\|\bm{\mathcal{E}}\|_1\\
&+\alpha\sum_i\left(\frac{1}{\lambda_i^2}\|\bm{\mathcal{S}}_i\bm{\mathcal{B}}\times_3 O_i-\bm{\mathcal{G}}_i\|_F^2+\|\bm{\mathcal{G}}_i\|_{tnn}\right).
\end{aligned}\tag{8}$$
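To make the temporal projection $\bm{\mathcal{S}}_i\bm{\mathcal{B}}\times_3 O_i$ in Eq. (8) concrete, the following NumPy sketch applies a mode-3 product, following the definition in [30], to a toy patch group. The helper names `mode3_product` and `temporal_subspace`, the toy data, and the choice of estimating $O_i$ from an SVD of the temporal unfolding are assumptions for illustration; the paper's actual update of $O_i$ is given in its appendix.

```python
import numpy as np

def mode3_product(tensor, matrix):
    """Mode-3 product: out[i, j, k] = sum_l tensor[i, j, l] * matrix[k, l]."""
    return np.einsum("ijl,kl->ijk", tensor, matrix)

def temporal_subspace(tensor, d):
    """Return a d x t matrix with orthonormal rows (illustrative SVD-based choice)."""
    a, b, t = tensor.shape
    unfolded = tensor.reshape(a * b, t)  # rows: spatial positions, columns: frames
    _, _, vt = np.linalg.svd(unfolded, full_matrices=False)
    return vt[:d]  # top-d right singular vectors

rng = np.random.default_rng(0)
p, n, t, d = 4, 10, 16, 2

# Toy patch group of size p^2 x n x t: static content plus small perturbations.
group = rng.random((p * p, n, 1)) * np.ones((1, 1, t))
group = group + 0.01 * rng.normal(size=(p * p, n, t))

O = temporal_subspace(group, d)        # shape (d, t)
coeffs = mode3_product(group, O)       # shape (p^2, n, d)
approx = mode3_product(coeffs, O.T)    # back-projection, shape (p^2, n, t)
rel_err = np.linalg.norm(approx - group) / np.linalg.norm(group)
print(coeffs.shape, rel_err)
```

Because the toy group is nearly static, projecting $t=16$ frames onto $d=2$ temporal components loses very little energy, which is why working in the subspace with $d\ll t$ is attractive.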

To solve for $\bm{\mathcal{B}}$, $\bm{\mathcal{E}}$, $\bm{\mathcal{G}}_i$ and $O_i$, we adopt the alternating minimization scheme [33], solving Eq. (8) with respect to each variable in turn. Please refer to the appendix for the solution.
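The flavour of alternating minimization can be conveyed with a deliberately simplified sketch. It is not the solver of Eq. (8): the non-local tensor nuclear norm prior is replaced here by a hard rank-$d$ constraint on the temporal unfolding of $\bm{\mathcal{B}}$ (an assumption), so that each sub-problem has a closed form, namely a truncated SVD for $\bm{\mathcal{B}}$ and elementwise soft-thresholding for $\bm{\mathcal{E}}$. The function names, the parameter values and the toy data are hypothetical.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau * ||.||_1 (elementwise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def truncate_temporal_rank(seq, d):
    """Best rank-d approximation of the temporal unfolding (truncated SVD)."""
    h, w, t = seq.shape
    u, s, vt = np.linalg.svd(seq.reshape(h * w, t), full_matrices=False)
    return ((u[:, :d] * s[:d]) @ vt[:d]).reshape(h, w, t)

def toy_refine(registered, beta=0.1, d=1, n_iter=20):
    """Alternate between a low-rank background update and a sparse error update."""
    error = np.zeros_like(registered)
    background = registered.copy()
    for _ in range(n_iter):
        # B-step: fit the low-rank background to the error-compensated sequence.
        background = truncate_temporal_rank(registered - error, d)
        # E-step: minimize 0.5 * ||B + E - R||_F^2 + beta * ||E||_1 over E.
        error = soft_threshold(registered - background, beta)
    return background, error

rng = np.random.default_rng(0)
h, w, t = 24, 24, 30
truth = np.repeat(rng.random((h, w))[:, :, None], t, axis=2)
outliers = rng.random((h, w, t)) < 0.03
registered = (truth
              + outliers * rng.choice([-0.8, 0.8], size=(h, w, t))
              + 0.01 * rng.normal(size=(h, w, t)))

background_hat, error_hat = toy_refine(registered)
print(np.linalg.norm(registered - truth), np.linalg.norm(background_hat - truth))
```

On this toy input the recovered background is much closer to the static ground truth than the corrupted sequence, and the sparse term absorbs the outliers. The full model additionally exploits spatial self-similarity, which this sketch omits.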

5 Experimental Results

Figure 6: Visual comparisons of long-range turbulence mitigation at various distances on RLR-AT. The top row shows the results of static-scene independent methods, among which TurbNet and PiRN are single-frame based. The bottom row shows the results of static-scene dependent methods.
Table 2: Quantitative comparison with other methods on the synthetic turbulence dataset at different distances. TurbNet and PiRN are single-frame based approaches. Red text indicates the best performance, blue text indicates the second-best performance. $\Delta$ denotes the margin of the proposed CDSP over the second-best method. As the distance increases, our method outperforms the SOTA approach even more significantly.

Column groups, in order: Static-scene Independent Methods, Static-scene Dependent Methods, $\Delta$.

| Distance | Metric | TurbNet | PiRN | TSR-WGAN | TMT | TurbRecon | NDL | CLEAR | SG | NDIR | CDSP | $\Delta$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 Km | PSNR $\uparrow$ | 23.58 | 25.93 | 23.98 | 26.54 | 27.27 | 24.93 | 26.19 | 24.53 | 24.29 | 27.94 | 0.67 |
| 2 Km | SSIM $\uparrow$ | 0.8155 | 0.8675 | 0.8215 | 0.8887 | 0.8995 | 0.8459 | 0.8815 | 0.8545 | 0.8147 | 0.9181 | 0.0186 |
| 4 Km | PSNR $\uparrow$ | 21.97 | 24.43 | 22.90 | 24.82 | 26.05 | 24.12 | 24.79 | 23.31 | 23.23 | 26.91 | 0.86 |
| 4 Km | SSIM $\uparrow$ | 0.7491 | 0.8168 | 0.7813 | 0.8556 | 0.8615 | 0.8111 | 0.8382 | 0.8026 | 0.7796 | 0.8893 | 0.0278 |
| 6 Km | PSNR $\uparrow$ | 21.22 | 23.42 | 21.89 | 23.73 | 24.93 | 23.37 | 23.58 | 22.50 | 22.59 | 25.74 | 0.81 |
| 6 Km | SSIM $\uparrow$ | 0.7132 | 0.7790 | 0.7394 | 0.8265 | 0.8311 | 0.7834 | 0.7925 | 0.7643 | 0.7561 | 0.8594 | 0.0283 |
| 8 Km | PSNR $\uparrow$ | 20.69 | 22.74 | 21.31 | 23.01 | 23.95 | 22.76 | 22.62 | 21.89 | 22.19 | 24.84 | 0.89 |
| 8 Km | SSIM $\uparrow$ | 0.6864 | 0.7518 | 0.7135 | 0.8012 | 0.8024 | 0.7592 | 0.7523 | 0.7335 | 0.7402 | 0.8319 | 0.0295 |

Figure 7: Effectiveness of FRF and SLRTR. (a) Visual comparison of results w/o FRF-based registration, w/o SLRTR and w/ both. (b) Residual error from SLRTR decomposition. (c) Histogram of residual error.

Figure 8: Ablation study on SLRTR. (a) Input. (b) w/o subspace. (c) w/o self-similarity. (d) w/ both.

5.1 Datasets and Experimental Settings

Datasets. We conduct experiments on several datasets: a synthetic dataset, the proposed RLR-AT dataset, and the real hot-air turbulence dataset TurbuText [1]. The synthetic turbulence is simulated at varying distances on ADE20K [57] using the turbulence simulator P2S [38]. Further details of the simulator protocol are provided in the supplementary material.

Comparison Methods. We compare CDSP with (1) conventional turbulence removal methods: TurbRecon [37], SG [35], CLEAR [4] and NDL [58]; (2) deep learning based methods: TurbNet [39], TSR-WGAN [29], PiRN [27], NDIR [32] and TMT [56]. For a fair comparison, considering that the experimental datasets consist of static scenes, we categorize the methods into two groups, static-scene dependent and static-scene independent, and further classify them by input into multi-frame based and single-frame based. We employ the codes and pre-trained models of TMT and TurbRecon designed for static scenes to ensure a fair comparison. All methods incorporate deblurring, with NDIR and NDL using their default deblurring approach [45].

5.2 Qualitative and Quantitative Evaluation

Qualitative Evaluation on Real Turbulence. In Fig. 6, we compare with existing methods on RLR-AT. The single-frame based methods TurbNet [39] and PiRN [27] struggle with distortions because they do not model the temporal information of turbulence. The results of supervised methods such as TSR-WGAN [29] and TMT [56] still exhibit distortions due to the domain gap. CLEAR [4], SG [35] and NDL [58] consistently produce artifacts or distortions because they are not designed for long-range turbulence. Although TurbRecon [37] achieves results of comparable quality, they still contain misalignments. In comparison, CDSP consistently achieves more pleasing results at various distances, effectively addressing severe distortions while preserving details. We also conduct a comparison on hot-air turbulence, which is shown in the appendix.

Quantitative Evaluation on Synthetic Turbulence. We further evaluate the performance of CDSP and other methods on synthetic turbulence in Table 2. Most multi-frame based methods perform better than single-frame based methods, since they take the temporal information of turbulence into consideration. CDSP and TurbRecon achieve the best performance in their respective categories. Note that the margin of CDSP over the second-best method widens with distance, with the PSNR gain growing from 0.67 dB at 2 Km to 0.89 dB at 8 Km, further revealing the superiority of CDSP for long-range turbulence with severe distortions.
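For reference, the sketch below shows the standard PSNR definition and recomputes the PSNR margins $\Delta$ from the values reported in Table 2. The `data_range` argument is an assumption, since the peak value used in the paper's evaluation is not stated here, and the dictionary simply copies the TurbRecon and CDSP entries from the table.

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB; data_range is the assumed peak value."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)

# PSNR values copied from Table 2 (dB): distance in Km -> (TurbRecon, CDSP).
table2_psnr = {2: (27.27, 27.94), 4: (26.05, 26.91), 6: (24.93, 25.74), 8: (23.95, 24.84)}

# Margin of CDSP over the second-best method at each distance.
margins = {km: round(cdsp - second, 2) for km, (second, cdsp) in table2_psnr.items()}
print(margins)
```

The recomputed margins agree with the $\Delta$ column of Table 2. The gain is not strictly monotonic (it dips slightly at 6 Km), but it is clearly larger at 8 Km than at 2 Km.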

Table 3: Ablation study of FRF and SLRTR.

| FRF | SLRTR | PSNR | SSIM |
|---|---|---|---|
|  |  | 22.62 | 0.8051 |
|  |  | 23.02 | 0.8034 |
|  |  | 23.53 | 0.8154 |
|  |  | 23.96 | 0.8385 |

Table 4: Effectiveness of FRF when embedded into existing methods compared to others.

| Methods | Metric | Temp avg [58] | Non-local avg [37] | FRF (Ours) | Gain |
|---|---|---|---|---|---|
| NDL [58] | PSNR | 22.95 | 22.88 | 23.29 | 0.34 |
| NDL [58] | SSIM | 0.7698 | 0.7650 | 0.7819 | 0.0121 |
| TurbRecon [37] | PSNR | 24.33 | 24.14 | 24.79 | 0.65 |
| TurbRecon [37] | SSIM | 0.8122 | 0.8082 | 0.8231 | 0.0149 |
| CDSP (Ours) | PSNR | 25.25 | 24.68 | 25.41 |  |
| CDSP (Ours) | SSIM | 0.8391 | 0.8331 | 0.8462 |  |

Table 5: Boosting performance of high-level text recognition task.

| Methods | CRNN | ASTERN | DAN |
|---|---|---|---|
| Distorted | 0.2553 | 0.3475 | 0.3759 |
| TurbNet | 0.2057 | 0.3546 | 0.3617 |
| PiRN | 0.2695 | 0.3546 | 0.3758 |
| SG | 0.2340 | 0.3971 | 0.4042 |
| NDL | 0.4539 | 0.4894 | 0.4681 |
| TMT | 0.4040 | 0.4893 | 0.4964 |
| TurbRecon | 0.4397 | 0.4965 | 0.4894 |
| CDSP | 0.5248 | 0.5319 | 0.5532 |

Figure 9: Visual comparison of FRF with other reference frames.

5.3 Ablation and Discussion

How does FRF Facilitate Severe Distortion Correction? We study the importance of FRF for distortion correction. As shown earlier in Fig. 3(a), the superior quality and sharp edges of FRF enable better registration, greatly reducing the burden of refinement. To verify this, we remove the FRF-based registration and directly employ SLRTR to handle the severe distortions, which places a heavy burden on SLRTR. Table 3 shows that removing FRF leads to a noticeable performance drop. Moreover, the result without FRF in Fig. 7(a) suffers from severe detail loss. Figures 7(b) and (c) show the visualization and distribution of the residuals from the SLRTR decomposition. It is evident that the residual without FRF contains more lost details. This indicates that FRF is indispensable for reducing the burden of refinement and preventing severe detail loss.

How does SLRTR Improve Severe Distortion Correction? We next examine the necessity of SLRTR for distortion correction. We remove SLRTR and rely on FRF-based registration alone. However, achieving perfect pixel-level registration is impossible because of the severe distortions in long-range turbulence. Table 3 shows that the method suffers an obvious performance drop without SLRTR, and the result without SLRTR in Fig. 7(a) still contains misalignments, indicating that SLRTR is necessary for refining registration errors.

Effectiveness of Subspace and Self-similarity. We then illustrate the contribution of the subspace and the self-similarity to the proposed SLRTR. The subspace is used to characterize the global low-rank property along the temporal dimension. In Fig. 8(b), the result without the subspace still exhibits distortions, indicating that relying solely on the prior of spatial self-similarity is insufficient to characterize the properties of the temporal dimension. The non-local prior is employed to exploit the self-similarity of the spatial dimension. Figure 8(c) shows the result without self-similarity, which suffers from unexpected detail loss, implying that relying only on temporal information is not enough.

Complementarity between SLRTR and FRF. We further discuss how FRF and SLRTR complement each other in Fig. 7(a). On the one hand, FRF-based registration notably mitigates distortions with fewer corruptions, greatly reducing the burden of SLRTR. On the other hand, SLRTR effectively refines the residual errors unavoidably left by FRF-based registration. FRF and SLRTR thus complement each other to better remove severe distortions.

Effectiveness of Frequency-aware Reference Frame. To further illustrate the effectiveness of FRF, we embed existing reference frames (temporal average [58], non-local average [37]) and the proposed FRF into existing frameworks, namely NDL [58], TurbRecon [37] and the proposed CDSP, on 5 Km synthetic turbulence. Table 4 shows that every method obtains its highest PSNR/SSIM after integrating FRF, with PSNR gains of 0.34 dB for NDL and 0.65 dB for TurbRecon over their original reference frames, revealing the effectiveness of FRF. We also visualize the comparison between FRF and other reference frames in Fig. 9. FRF has superior quality and sharper edges, further demonstrating its reliability.

Promotion for Downstream Recognition. We further evaluate the turbulence mitigation methods on text recognition using TurbuText [1]. We apply three text recognition methods (CRNN [46], ASTERN [47], DAN [50]) to the restoration results and report the accuracies in Table 5. CDSP achieves the highest accuracy for all three recognition methods (0.5248, 0.5319 and 0.5532, respectively).

Limitation. The proposed method effectively handles static-scene turbulence, but dynamic scenes (e.g., scenes containing moving objects or camera shake) are more complex due to the coupling of turbulence, object motion and camera motion. Figure 10 shows the results on a dynamic scene. Although static regions are well processed, dynamic regions suffer from severe trailing since CDSP does not model object motion. We will address dynamic-scene turbulence in future work.

Figure 10: Limitation of the proposed CDSP. (a) Dynamic scene turbulence. (b) Result of CDSP. CDSP effectively handles static regions but fails to process dynamic regions.

6 Conclusion

Our work focuses on long-range turbulence mitigation. We construct a large-scale real long-range turbulence dataset (RLR-AT) and propose a coarse-to-fine framework for long-range turbulence mitigation that combines dynamic and static priors. Specifically, we propose a frequency-aware reference frame for better registration, and a low-rank tensor refinement model that refines the registration error while preserving details. Extensive experiments demonstrate that the proposed method outperforms SOTA methods on different datasets.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant 62371203. The computation was completed on the HPC Platform of Huazhong University of Science and Technology.

References

  • [1] Bridging the gap between computational photography and visual recognition: 6th ug2+ prize challenge, http://cvpr2023.ug2challenge.org/dataset23_t2.html, track 2
  • [2] Lucky imaging: high angular resolution imaging in the visible from the ground. Astronomy & Astrophysics 446(2), 739–745 (2006)
  • [3] Anantrasirichai, N., Achim, A., Bull, D.: Atmospheric turbulence mitigation for sequences with moving objects using recursive image fusion. In: ICIP. pp. 2895–2899 (2018)
  • [4] Anantrasirichai, N., Achim, A., Kingsbury, N.G., Bull, D.R.: Atmospheric turbulence mitigation using complex wavelet-based fusion. IEEE TIP 22(6), 2398–2408 (2013)
  • [5] Boehrer, N., Nieuwenhuizen, R.P., Dijk, J.: Turbulence mitigation in neuromorphic camera imagery. vol. 11540, pp. 43–58. SPIE (2020)
  • [6] Brandner, W., Hormuth, F.: Lucky imaging in astronomy. Astronomy at High Angular Resolution: A Compendium of Techniques in the Visible and Near-Infrared pp. 1–16 (2016)
  • [7] Caliskan, T., Arica, N.: Atmospheric turbulence mitigation using optical flow. In: ICPR. pp. 883–888 (2014)
  • [8] Chan, S.H.: Tilt-then-blur or blur-then-tilt? clarifying the atmospheric turbulence model. IEEE SPL 29, 1833–1837 (2022)
  • [9] Chang, Y., Yan, L., Zhong, S.: Hyper-laplacian regularized unidirectional low-rank tensor recovery for multispectral image denoising. In: CVPR. pp. 4260–4268 (2017)
  • [10] Chimitt, N., Chan, S.H.: Simulating anisoplanatic turbulence by sampling intermodal and spatially correlated zernike coefficients. OE 59(8), 083101–083101 (2020)
  • [11] Chimitt, N., Mao, Z., Hong, G., Chan, S.H.: Rethinking atmospheric turbulence mitigation. arXiv preprint arXiv:1905.07498 (2019)
  • [12] Chimitt, N., Zhang, X., Mao, Z., Chan, S.H.: Real-time dense field phase-to-space simulation of imaging through atmospheric turbulence. IEEE TCI 8, 1159–1169 (2022)
  • [13] Dabov, K., Foi, A., Katkovnik, V., Egiazarian, K.: Image denoising by sparse 3-d transform-domain collaborative filtering. IEEE TIP 16(8), 2080–2095 (2007)
  • [14] Deshmukh, A.S., Medasani, S.S., Reddy, G.R.: A fast hierarchical patch-based approach for mitigating atmospheric turbulence. In: ICACCI. pp. 1–7 (2013)
  • [15] Fazlali, H., Shirani, S., Bradford, M., Kirubarajan, T.: Atmospheric turbulence removal in long-range imaging using a data-driven-based approach. IJCV 130(4), 1031–1049 (2022)
  • [16] Feng, B.Y., Xie, M., Metzler, C.A.: Turbugan: An adversarial learning approach to spatially-varying multiframe blind deconvolution with applications to imaging through turbulence. IEEE JSAIT 3(3), 543–556 (2022)
  • [17] Fried, D.L.: Optical resolution through a randomly inhomogeneous medium for very long and very short exposures. JOSA 56(10), 1372–1379 (1966)
  • [18] Fried, D.L.: Anisoplanatism in adaptive optics. JOSA 72(1), 52–61 (1982)
  • [19] Gao, J., Anantrasirichai, N., Bull, D.: Atmospheric turbulence removal using convolutional neural network. arXiv preprint arXiv:1912.11350 (2019)
  • [20] Geiger, A., Moosmann, F., Car, Ö., Schuster, B.: Automatic camera and range sensor calibration using a single shot. In: ICRA. pp. 3936–3943 (2012)
  • [21] Gilles, J., Ferrante, N.B.: Open turbulent image set (otis). PRL 86, 38–41 (2017)
  • [22] Hardie, R.C., Power, J.D., LeMaster, D.A., Droege, D.R., Gladysz, S., Bose-Pillai, S.: Simulation of anisoplanatic imaging through optical turbulence using numerical wave propagation with new validation analysis. OE 56(7), 071502–071502 (2017)
  • [23] Hardie, R.C., Rucci, M.A., Dapore, A.J., Karch, B.K.: Block matching and wiener filtering approach to optical turbulence mitigation and its application to simulated and real imagery with quantitative error analysis. OE 56(7), 071503–071503 (2017)
  • [24] He, R., Wang, Z., Fan, Y., Feng, D.: Atmospheric turbulence mitigation based on turbulence extraction. In: ICASSP. pp. 1442–1446 (2016)
  • [25] Hirsch, M., Sra, S., Schölkopf, B., Harmeling, S.: Efficient filter flow for space-variant multiframe blind deconvolution. In: CVPR. pp. 607–614 (2010)
  • [26] Hua, X., Pan, C., Shi, Y., Liu, J., Hong, H.: Removing atmospheric turbulence effects via geometric distortion and blur representation. IEEE TGRS 60, 1–13 (2020)
  • [27] Jaiswal, A., Zhang, X., Chan, S.H., Wang, Z.: Physics-driven turbulence image restoration with stochastic refinement. In: ICCV. pp. 12170–12181 (2023)
  • [28] Jiang, W., Boominathan, V., Veeraraghavan, A.: Nert: Implicit neural representations for unsupervised atmospheric turbulence mitigation. In: CVPRW. pp. 4235–4242 (2023)
  • [29] Jin, D., Chen, Y., Lu, Y., Chen, J., Wang, P., Liu, Z., Guo, S., Bai, X.: Neutralizing the impact of atmospheric turbulence on complex scene imaging via deep learning. NMI 3(10), 876–884 (2021)
  • [30] Kolda, T.G., Bader, B.W.: Tensor decompositions and applications. SIAM review 51(3), 455–500 (2009)
  • [31] Lau, C.P., Lai, Y.H., Lui, L.M.: Restoration of atmospheric turbulence-distorted images via rpca and quasiconformal maps. Inverse Problems 35(7), 074002 (2019)
  • [32] Li, N., Thapa, S., Whyte, C., Reed, A.W., Jayasuriya, S., Ye, J.: Unsupervised non-rigid image distortion removal via grid deformation. In: ICCV. pp. 2522–2532 (2021)
  • [33] Lin, Z., Liu, R., Su, Z.: Linearized alternating direction method with adaptive penalty for low-rank representation. NeurIPS 24 (2011)
  • [34] Liu, C., et al.: Beyond pixels: exploring new representations and applications for motion analysis. Ph.D. thesis, Massachusetts Institute of Technology (2009)
  • [35] Lou, Y., Kang, S.H., Soatto, S., Bertozzi, A.L.: Video stabilization of atmospheric turbulence distortion. IPI 7(3), 839–861 (2013)
  • [36] Lu, C., Feng, J., Chen, Y., Liu, W., Lin, Z., Yan, S.: Tensor robust principal component analysis: Exact recovery of corrupted low-rank tensors via convex optimization. In: CVPR. pp. 5249–5257 (2016)
  • [37] Mao, Z., Chimitt, N., Chan, S.H.: Image reconstruction of static and dynamic scenes through anisoplanatic turbulence. IEEE TCI 6, 1415–1428 (2020)
  • [38] Mao, Z., Chimitt, N., Chan, S.H.: Accelerating atmospheric turbulence simulation via learned phase-to-space transform. In: ICCV. pp. 14759–14768 (2021)
  • [39] Mao, Z., Jaiswal, A., Wang, Z., Chan, S.H.: Single frame atmospheric turbulence mitigation: A benchmark study and a new physics-inspired transformer model. In: ECCV. pp. 430–446. Springer (2022)
  • [40] Mei, K., Patel, V.M.: Ltt-gan: Looking through turbulence by inverting gans. IEEE JSTSP (2023)
  • [41] Oreifej, O., Li, X., Shah, M.: Simultaneous video stabilization and moving object detection in turbulence. IEEE TPAMI 35(2), 450–462 (2012)
  • [42] Rai, S.N., Jawahar, C.: Removing atmospheric turbulence via deep adversarial learning. IEEE TIP 31, 2633–2646 (2022)
  • [43] Roggemann, M.C., Welsh, B.M., Hunt, B.R.: Imaging through turbulence. CRC press (1996)
  • [44] Schwartzman, A., Alterman, M., Zamir, R., Schechner, Y.Y.: Turbulence-induced 2d correlated image distortion. In: ICCP. pp. 1–13 (2017)
  • [45] Shan, Q., Jia, J., Agarwala, A.: High-quality motion deblurring from a single image. ACM TOG 27(3), 1–10 (2008)
  • [46] Shi, B., Bai, X., Yao, C.: An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE TPAMI 39(11), 2298–2304 (2016)
  • [47] Shi, B., Yang, M., Wang, X., Lyu, P., Yao, C., Bai, X.: Aster: An attentional scene text recognizer with flexible rectification. IEEE TPAMI 41(9), 2035–2048 (2018)
  • [48] Shimizu, M., Yoshimura, S., Tanaka, M., Okutomi, M.: Super-resolution from image sequence under influence of hot-air optical turbulence. In: CVPR. pp. 1–8 (2008)
  • [49] Tatarski, V.I.: Wave propagation in a turbulent medium. Courier Dover Publications (2016)
  • [50] Wang, T., Zhu, Y., Jin, L., Luo, C., Chen, X., Wu, Y., Wang, Q., Cai, M.: Decoupled attention network for text recognition. In: AAAI. vol. 34, pp. 12216–12224 (2020)
  • [51] Wang, Y., Jin, D., Chen, J., Bai, X.: Revelation of hidden 2d atmospheric turbulence strength fields from turbulence effects in infrared imaging. NCS pp. 1–13 (2023)
  • [52] Wright, J., Ganesh, A., Rao, S., Peng, Y., Ma, Y.: Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization. NeurIPS 22 (2009)
  • [53] Xie, Y., Zhang, W., Tao, D., Hu, W., Qu, Y., Wang, H.: Removing turbulence effect via hybrid total variation and deformation-guided kernel regression. IEEE TIP 25(10), 4943–4958 (2016)
  • [54] Yasarla, R., Patel, V.M.: Learning to restore images degraded by atmospheric turbulence using uncertainty. In: ICIP. pp. 1694–1698 (2021)
  • [55] Zhang, X., Chimitt, N., Chi, Y., Mao, Z., Chan, S.H.: Spatio-temporal turbulence mitigation: A translational perspective. In: CVPR (2024)
  • [56] Zhang, X., Mao, Z., Chimitt, N., Chan, S.H.: Imaging through the atmosphere using turbulence mitigation transformer. IEEE TCI (2024)
  • [57] Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., Torralba, A.: Scene parsing through ade20k dataset. In: CVPR. pp. 633–641 (2017)
  • [58] Zhu, X., Milanfar, P.: Removing atmospheric turbulence via space-invariant deconvolution. IEEE TPAMI 35(1), 157–170 (2012)