Affiliations: 1 Institute for AI Industry Research (AIR), Tsinghua University; 2 Tongji University; 3 Ocean University of China; 4 Duke Kunshan University; 5 Haomo.ai
Email: kevin729@tongji.edu.cn, zhengju@stu.ouc.edu.cn, zhaohao@air.tsinghua.edu.cn

SA-GS: Scale-Adaptive Gaussian Splatting for Training-Free Anti-Aliasing

Xiaowei Song* (1,2)    Jv Zheng* (1,3)    Shiran Yuan (1,4)    Huan-ang Gao (1)    Jingwei Zhao (5)    Xiang He (5)    Weihao Gu (5)    Hao Zhao (1)
Abstract

In this paper, we present a Scale-adaptive method for Anti-aliasing Gaussian Splatting (SA-GS). While the state-of-the-art method Mip-Splatting requires modifying the training procedure of Gaussian splatting, our method functions at test time and is training-free. Specifically, SA-GS can be applied to any pretrained Gaussian splatting field as a plugin to significantly improve the field’s anti-aliasing performance. The core technique is to apply 2D scale-adaptive filters to each Gaussian during test time. As pointed out by Mip-Splatting, observing Gaussians at different frequencies leads to mismatches between the Gaussian scales during training and testing. Mip-Splatting resolves this issue using 3D smoothing and 2D Mip filters, which are unfortunately not aware of the testing frequency. In this work, we show that a 2D scale-adaptive filter that is informed of the testing frequency can effectively match the Gaussian scale, thus keeping the Gaussian primitive distribution consistent across different testing frequencies. Once scale inconsistency is eliminated, sampling rates smaller than the scene frequency still result in conventional jaggedness, so we propose to integrate the projected 2D Gaussian within each pixel during testing. This integration is actually a limiting case of super-sampling, which significantly improves anti-aliasing performance over vanilla Gaussian Splatting. Through extensive experiments using various settings and both bounded and unbounded scenes, we show that SA-GS performs comparably with or better than Mip-Splatting. Note that super-sampling and integration are only effective when our scale-adaptive filtering is activated. Our code, data and models are available at https://github.com/zsy1987/SA-GS.

Keywords:
3D Vision, Novel View Synthesis, Rasterization, Scale Consistency, Super-Sampling
* Indicates Equal Contribution. Indicates Corresponding Author.
Figure 1: Under zoom-in, 3D Gaussian Splatting [10] (3DGS) exhibits significant erosion artefacts, while under zoom-out, it undergoes dramatic dilation. Mip-Splatting [24] utilizes 3D smoothing and 2D Mip filters to regularize primitives during training. In contrast, our method is training-free and maintains scale consistency using solely a single 2D scale-adaptive filter. Scale adaptation allows us to use super-sampling (named $SA\text{-}GS_{sup}$ later in the paper) and its limiting case, integration (named $SA\text{-}GS_{int}$ later in the paper), to obtain more accurate results when zooming out.

1 Introduction

Novel View Synthesis (NVS) has played an important role in fields such as visualization[12], simulation[21, 20], automation[27, 26], and VR/AR[19, 16]. The advent of Neural Radiance Fields (NeRFs) [13, 25] has significantly enhanced the quality of view synthesis while bypassing the need of reconstructing geometry, texture, material and lighting (which is typically a very under-determined inverse problem). Recently, another method, 3D Gaussian Splatting (3DGS)[10] has garnered attention from both academia and industry for its high synthesis quality and fast rendering speed. Gaussian primitive-based representation and its corresponding efficient CUDA implementation enable 3DGS[10] to render scenes in real-time with intricate details, greatly accelerating NVS systems intended for tasks such as gaming, simulation and multimedia.

Problem. Unlike implicit representations like NeRFs, 3DGS[10] utilizes Gaussian primitives to represent 3D scenes in an explicit manner. This is achieved by optimizing the position, scale, transparency, rotation and spherical harmonic coefficients of each Gaussian primitive to fit input images, producing a continuous 3D signal with a complex Gaussian mixture distribution. However, as pointed out by a recent study Mip-Splatting [24], there is a trick in 3DGS not mentioned in the paper, which is introduced to ensure numerical stability. Specifically, 2D dilation is added during training to expand the distribution over a planar region to eliminate the case in which the region is smaller than one single pixel (thus causing instability). This operation guarantees steady updating of the Gaussian primitives, but results in inconsistencies in the degree of dilation and the degree of change in Gaussian scale if the intrinsic and extrinsic parameters of the camera are not equal to the training situation, with artefacts illustrated in Fig. 1. This is due to the fact that 2D dilation is fixed on the pixel space and is not informed of the scale variation of the Gaussian, as illustrated in Fig. 3.

Cause & Solution. Training data is typically produced using consistent camera settings. Therefore, the use of a fixed dilation operation during the training phase does not result in variations in the dilated scale at the same location. In this case, the Gaussian primitives can still learn a reasonable 2D projection distribution. However, during the rendering phase, the Gaussian scene may be observed at various resolutions and distances. This can compromise the otherwise good 2D projection distribution, resulting in a different $\alpha$-blending process than during the training phase, which ultimately affects the rendering quality. In this paper, we name this phenomenon Gaussian scale mismatch; it is a property specific to 3DGS and absent in NeRFs. We believe that the 2D projection distribution of Gaussian primitives in the rendering phase should be consistent with that of the training phase. We correct the 2D dilation operation (in 3DGS) and the 3D smoothing + 2D Mip filter (in Mip-Splatting) via a 2D scale-adaptive filter to enforce scale consistency at different rendering parameter settings.

Anti-aliasing. When the projected 2D Gaussian distribution remains consistent with the training at different rendering settings, anti-aliasing is simplified to ensure the synergy of the sampling frequency and the scene frequency. As the sampling frequency decreases, the Nyquist sampling theorem[18, 15] may not be satisfied at a certain frequency level, resulting in aliasing effects in the image. Therefore, we introduce conventional anti-aliasing ideas, super-sampling and its limiting case integration, into 3DGS so that the Nyquist sampling theorem is satisfied when zooming out. Notably, super-sampling and integration only make sense after the Gaussian scale mismatch issue is addressed.

Significance. As shown in Fig. 1, our method can well address the artefacts of vanilla 3DGS while being training-free. Under the zoom-in and zoom-out settings, vanilla 3DGS shows severe visual quality degradation because of erosion and dilation. The quality degradation is actually caused by an intertwined effect of Gaussian scale mismatch and aliasing, which is different from the case that, in NeRFs, artefacts under zoom-in and zoom-out are solely caused by aliasing. While Mip-Splatting [24] can address these artefacts, our method is: (1) More flexible. Mip-Splatting needs to modify the training procedure of 3DGS, but our method is a training-free plugin; (2) More elegant. Mip-Splatting resolves the Gaussian scale mismatch issue with 3D smoothing and 2D Mip filter, but our method exploits a single 2D scale-adaptive filter; (3) More accurate. Our method performs comparably with Mip-Splatting but out-performs it under zoom-out because our scale-adaptive formulation can unleash the power of simple and effective strategies super-sampling (and its limiting case integration).

In summary, our contributions are as follows:

  1. We introduce a training-free approach that can be directly applied to the inference process of any pretrained 3DGS[10] model to resolve its visual artefacts at drastically changed rendering settings. The method as a whole is named scale-adaptive Gaussian splatting (SA-GS).

  2. Technically, we propose a 2D scale-adaptive filter that keeps the Gaussian projection scale consistent with the training phase scale at different rendering settings. This scale-adaptive filter also allows simple anti-aliasing techniques (super-sampling and its limiting case integration) to work effectively.

  3. Extensive qualitative and quantitative experiments were conducted on the Mip-NeRF 360[2] and Blender[13] datasets. Our method achieves superior or comparable performance compared to the state-of-the-art Gaussian anti-aliasing methods, while being training-free.

2 Related Works

2.1 Anti-aliased Neural Radiance Fields

Aliasing is a common issue in conventional computer graphics that occurs when the rendering frequency drastically changes, resulting in visual artefacts such as jagged edges or moiré patterns. These artefacts are caused by the discrete sampling of continuous physical signals. Neural rendering, which is represented by neural radiance fields (NeRFs) [13] or other techniques [4, 9, 11], is also troubled by aliasing. Anti-aliasing for neural radiance fields is an active research direction, with notable milestones like Mip-NeRF [1], Mip-NeRF 360 [2], Zip-NeRF [3] and Tri-MipRF [7]. Mip-NeRF [1] featurizes the 3D frustum with an approximate ellipse such that the view-dependent, distance-aware visual effects are captured by a closed-form geometric encoding of the (approximate) ellipse. Mip-NeRF 360 [2] accelerates Mip-NeRF [1] with a density proposal network and addresses unbounded scenes using a heuristic contraction rule. Zip-NeRF [3] designs a spiral sampling pattern, instead of volume encoding, for the anti-aliasing of radiance fields based on hash grids. Tri-MipRF [7] interpolates down-sampled tri-planes (at corresponding scales) to build a mipmap-like representation. Despite the various techniques designed for different NeRF backbones (MLP-based, triplane-based or grid-based), most of them are motivated by two old ideas: super-sampling and mipmapping. Our method SA-GS is also motivated by super-sampling, but as will be discussed in the next section, in Gaussian splatting fields, other issues need to be addressed before the power of super-sampling can be fully unleashed.

2.2 Gaussian Splatting

3D Gaussian Splatting (3DGS) [10] is a recently proposed neural rendering paradigm (footnote 1: this is somewhat inaccurate, because many people argue that 3DGS does not involve typical neural networks but only matrix multiplication) that is primitive-based (like Point-NeRF[23] or ADOP[17]), uses spherical harmonics as the color representation (like Plenoxels[5]), and renders at a very fast rate (faster than Instant-NGP[14]). While this new paradigm has recently seen much progress in terms of physical integration (PhysGaussian[22]) and geometrical alignment (SuGaR[6]), anti-aliasing for 3DGS[10] is not yet well addressed. We note that one fundamental challenge is that when 3DGS[10] is rendered at different distances (i.e., resolutions), the Gaussian scale might not match the training-time scale. This mismatch issue entangles with aliasing, making the problem extremely complicated. This has been pointed out by a recent study Mip-Splatting [24], showing that the dilation and erosion issues were caused by this mismatch. In this paper, we demonstrate that classical super-sampling (and its limiting case of integration) is effective for 3DGS[10] anti-aliasing, but only under the case of matched Gaussian scales. Unlike the heuristic and problematic Gaussian scale filtering techniques used by 3DGS[10] and Mip-Splatting[24], our adaptive solution well addresses the mismatch issue, thus fully unleashing the power of super-sampling for anti-aliasing.

3 Method

Figure 2: Paradigm Comparison of Gaussian Rasterization Process. All Gaussian Splatting methods share this framework for training and rendering, but different models use different strategies to process Gaussian primitives. During training, 3DGS[10] uses (c) in pixel space for training stability, but this results in scale inconsistencies at different rendering settings; Mip-Splatting[24] utilises (a) to restrict the Gaussian frequency upper bound in 3D space, and (b) to emulate box filtering in pixel space. However, Mip-Splatting[24] still suffers from scale inconsistency and needs to modify the training procedure of 3DGS. Our approach is training-free and only operates on the testing flow. We use (d) in pixel space to maintain the scale consistency of the Gaussian primitives, and further enhance the anti-aliasing capability of 3DGS by applying (e) and (f) to the $\alpha$-blending process. Note that (e) and (f) only make sense with (d) activated.

A paradigm comparison is presented in Fig. 2. Overall, our SA-GS method aims to mitigate the artefacts of 3DGS when rendered at different settings (e.g., $8\times$ zoom-in and $1/8\times$ zoom-out shown in Fig. 1). A notable fact is that in NeRFs [13, 1, 2, 3, 8], artefacts under zoom-in/out are caused by aliasing, while for 3DGS these artefacts are caused by the intertwined effects of Gaussian scale mismatch and aliasing. Thus, anti-aliasing techniques make sense only after the Gaussian scale mismatch issue is addressed. The root of Gaussian scale mismatch in vanilla 3DGS, as pointed out by [24], is not described in the paper of [10]. The root is the operation shown in Fig. 2-(c), which is designed to expand projected 2D Gaussians so that no projection covers a region smaller than a single pixel. To alleviate the mismatch, Mip-Splatting introduces 3D smoothing (Fig. 2-(a)) and a 2D Mip filter (Fig. 2-(b)) during training. Unfortunately, these heuristic methods (Fig. 2-(a/b/c)) do not address the mismatch issue in principle, so conventional anti-aliasing techniques fail to work for 3DGS and Mip-Splatting. Our SA-GS features a 2D scale-adaptive filter (Fig. 2-(d)) that resolves the mismatch issue in principle and is a training-free plugin. It also unleashes the power of simple anti-aliasing techniques like super-sampling (Fig. 2-(f)) and its limiting case integration (Fig. 2-(e)) to work for 3DGS.

Figure 3: Scale ambiguity. The heuristic 2D dilation process in vanilla 3DGS code (pointed out by [24]) operates on the pixel space and enlarges the projected 2D Gaussian by a fixed amount (around 1.64 pixel). However, a fixed 2D dilation (1.64 pixel) can result in scale ambiguities when representing the same scene at different rendering settings, as shown by the green expansion area. (a) When the Gaussian scale is held constant and the resolution changes, the dilation scale (green) changes inconsistently. (b) When the Gaussian scale changes and the resolution remains constant, the dilation scale (green) does not change with the Gaussian. Our 2D scale-adaptive filter ensures that the Gaussian scale remains consistent across different rendering settings, as shown by the red expansion area. This keeps the scale consistent with the training setup.

3.1 2D Scale-adaptive Filter

The dilation operation (heuristic and fixed at 1.64 pixels) used by 3DGS[10] during training introduces scale ambiguity to the 3D scene, as shown in Fig. 3. As mentioned above, it is crucial that the Gaussian scale at any rendering setting stays consistent with the scale in the training setup.

We propose a 2D scale-adaptive filter that bridges the scale gap between the rendering stage and the training setup, whose effects are shown in Fig. 4. In pixel space, a 2D Gaussian primitive can be expressed parametrically by its mean $\mathbf{p}_{k}$ and covariance $\boldsymbol{\Sigma}_{k}$ as follows:

$$\mathcal{G}^{2D}_{k}(\mathbf{x}) = e^{-\frac{1}{2}(\mathbf{x}-\mathbf{p}_{k})^{T}\boldsymbol{\Sigma}_{k}^{-1}(\mathbf{x}-\mathbf{p}_{k})} \tag{1}$$
Figure 4: 3DGS heuristic dilation and our scale-adaptive filter. Our 2D scale-adaptive filter maintains the structure of the scene at any resolution, whereas 3DGS[10] dilation (fixed at 1.64 pixel) leads to erroneous dilation at low frequencies and erosion at high frequencies. (Note dilation refers to method and artefacts in different contexts.)

Problem. During the training of vanilla 3DGS, a low-pass Gaussian kernel function $\mathcal{G}_{l}$ is applied for dilation. This is formally expressed as a convolution between two Gaussians and can eventually be written as $\mathcal{M}_{k}$:

$$\begin{aligned}
\mathcal{G}_{k}^{2D}(x)_{3DGS} &= \sqrt{\frac{|\boldsymbol{\Sigma}_{k}+\sigma_{l}\cdot\mathbf{I}|}{|\boldsymbol{\Sigma}_{k}|}}\,\big(\mathcal{G}_{k}(\mathbf{p}_{k},\boldsymbol{\Sigma}_{k})\ast\mathcal{G}_{l}(\mathbf{p}_{k},\sigma_{l}\cdot\mathbf{I})\big)(x) \\
&= \mathcal{M}_{k}(\mathbf{p}_{k},\boldsymbol{\Sigma}_{k}+\sigma_{l}\cdot\mathbf{I})(x)
\end{aligned} \tag{2}$$

Here, $\sigma_{l}$ is a fixed hyperparameter that controls the scale of $\mathcal{G}_{l}$, and $\mathbf{I}$ is the 2D identity matrix. The problem with 3DGS is that $\sigma_{l}$ is fixed at 0.3, which leads to a dilation of approximately $\sqrt{0.3}\times 3\approx 1.64$ pixels, as shown in Fig. 3.
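The arithmetic behind the fixed dilation can be checked with a short sketch. It assumes the 2D covariance is stored as a nested tuple `((a, b), (b, c))` in pixel units and that the dilation extent is read as three standard deviations of the added kernel; the helper names `dilate_cov2d` and `three_sigma_extent` are hypothetical and do not come from the 3DGS code base.

```python
import math

SIGMA_L = 0.3  # fixed variance added by vanilla 3DGS, as quoted in the text


def dilate_cov2d(cov, sigma_l=SIGMA_L):
    """Add sigma_l * I to a 2x2 covariance ((a, b), (b, c)), as in Eq. (2)."""
    (a, b), (_, c) = cov
    # Only the diagonal changes; the off-diagonal correlation term is untouched.
    return ((a + sigma_l, b), (b, c + sigma_l))


def three_sigma_extent(variance):
    """Extent (in pixels) covered by three standard deviations."""
    return 3.0 * math.sqrt(variance)


# The dilation amount quoted in the text: sqrt(0.3) * 3, roughly 1.64 pixels.
dilation_px = three_sigma_extent(SIGMA_L)
```

Because `SIGMA_L` is a constant in pixel space, `dilation_px` stays the same no matter how large or small the projected Gaussian is, which is the inconsistency illustrated in Fig. 3.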

Solution. During rendering, the scale of the Gaussian primitive in camera space should remain constant, regardless of any changes in rendering frequency. This is achieved by calculating the ratio $r=\frac{\Delta R_{p}}{\Delta D_{c}}$. $\Delta R_{p}$ is the resolution ratio between training and rendering, solving the problem described in Fig. 3-(a). $\Delta D_{c}$ is the distance (focal length) ratio between the rendering camera and the closest orientated training camera, solving the problem described in Fig. 3-(b).

$$\begin{aligned}
\mathcal{G}_{k}^{2D}(x,r)_{SA\text{-}GS} &= \sqrt{\frac{|\boldsymbol{\Sigma}_{k}+\sigma_{l}r^{2}\cdot\mathbf{I}|}{|\boldsymbol{\Sigma}_{k}|}}\,\big(\mathcal{G}_{k}(\mathbf{p}_{k},\boldsymbol{\Sigma}_{k})\ast r\,\mathcal{G}_{l}(\mathbf{p}_{k},\sigma_{l}\cdot\mathbf{I})\big)(x) \\
&= \mathcal{M}_{k}(\mathbf{p}_{k},\boldsymbol{\Sigma}_{k}+\sigma_{l}r^{2}\cdot\mathbf{I})(x)
\end{aligned} \tag{3}$$

Via this operation (named as 2D scale-adaptive filter), we can ensure a consistent scale and distribution of 2D Gaussian projections in camera space at different rendering settings, such that we can match the training settings.
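A minimal sketch of Eq. (3) follows, assuming the ratio $r$ has already been computed for the current view (how $\Delta R_{p}$ and $\Delta D_{c}$ are measured is left outside the sketch) and assuming $\mathcal{M}_{k}$ has the unnormalized exponential form of Eq. (1). The function names and the nested-tuple covariance layout are illustrative choices, not the released implementation.

```python
import math


def scale_adaptive_cov2d(cov, r, sigma_l=0.3):
    """Return Sigma_k + sigma_l * r^2 * I for a 2x2 covariance ((a, b), (b, c))."""
    (a, b), (_, c) = cov
    pad = sigma_l * r * r  # the added variance scales with r^2, as in Eq. (3)
    return ((a + pad, b), (b, c + pad))


def gaussian2d(x, mean, cov):
    """Evaluate exp(-0.5 * d^T cov^{-1} d) for a 2x2 symmetric covariance."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Closed-form quadratic form using the 2x2 inverse (1/det) * [[c, -b], [-b, a]].
    q = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q)


# With r = 1 the filter reduces to the fixed 3DGS dilation of Eq. (2).
cov_train = scale_adaptive_cov2d(((1.0, 0.0), (0.0, 1.0)), r=1.0)
# With r = 2 the added variance is four times larger.
cov_zoom = scale_adaptive_cov2d(((1.0, 0.0), (0.0, 1.0)), r=2.0)
```

Since the added variance is $\sigma_{l}r^{2}$, its three-sigma extent is $3\sqrt{\sigma_{l}}\,r\approx 1.64\,r$ pixels, so the dilation grows linearly with $r$ instead of staying fixed in pixel space.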

3.2 Making Conventional Anti-Aliasing Great Again for Gaussians

Figure 5: Super Sampling and Integration applied on a Gaussian primitive. Our super sampling method, denoted as (a), involves dividing each pixel into 9 sub-pixels when traversing the order-sorted Gaussians within a tile. Each sub-pixel independently undergoes $\alpha$-blending and weights the Gaussian spherical harmonic coefficient according to the sub-pixel sampling locations. (b) is our integration method that diagonalizes the Gaussian covariance matrix by pixel rotation. This decomposes the integration operation into the product of two marginal Gaussian distributions.

Our 2D scale-adaptive filter ensures that the Gaussian distribution remains consistent by matching arbitrary rendering settings with the training setting. Only after this scale adaptation can we tackle the aliasing issue. Specifically, according to the Nyquist sampling theorem[15], the image will show aliasing effects as the rendering frequency decreases. In the conventional graphics literature, there are two techniques that can be used to deal with aliasing: super-sampling and pre-filtering. 3DGS[10] cannot leverage these old techniques for anti-aliasing due to the aforementioned issue of Gaussian scale mismatch. Our method maintains a consistent Gaussian scale across different resolutions, allowing for effective removal of scene aliasing. We chose to use super-sampling and its limiting case, integration, instead of pre-filtering, because pre-filtering affects the $\alpha$-blending procedure of 3DGS; handling it may be a future pursuit.

3.2.1 Super-sampling

Given a pixel $P_{t}$, when traversing Gaussian primitives that have been order-sorted within a tile, we compute the distance between the centers of the $S\times S$ sub-pixels and the center of the Gaussian primitive separately, as shown in Fig. 5(a). These sub-pixels have independent $\alpha$-blending processes and cumulative transparency $T_{s}$. The color of a pixel $C_{t}$ is determined by averaging the color of these sampled sub-pixels:

$$\begin{aligned}
C_{t} &= \frac{1}{S^{2}}\sum_{i=1}^{G}\sum_{s=1}^{S^{2}}\alpha_{s}^{i}\times T_{s}^{i}\times\mathcal{F}(SH_{i}) \\
T_{s}^{i} &= \begin{cases} 1, & i=1 \\ \prod_{j=1}^{i-1}(1-\alpha_{s}^{j}), & i>1 \end{cases}
\end{aligned} \tag{4}$$

where $G$ is the number of Gaussian primitives on the z-buffer, $\alpha_{s}^{i}$ is the opacity calculated based on the distance between the $s$-th sub-pixel and the $i$-th Gaussian, and $SH_{i}$ is the spherical harmonic coefficient of the $i$-th Gaussian. The function $\mathcal{F}(\cdot)$ converts spherical harmonic coefficients to color. For fast convergence, we set $S=3$ in all experimental settings.
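Eq. (4) can be illustrated with a small CPU-side sketch. It assumes each Gaussian is a dictionary with hypothetical keys `mean`, `cov`, `opacity` and `color`, that the list is already depth-sorted, that the color $\mathcal{F}(SH_{i})$ has been precomputed as an RGB triple, and that a pixel is the unit square whose lower corner is `(px, py)`. It is a readable reference loop, not the tile-based CUDA rasterizer.

```python
import math


def gaussian2d(x, mean, cov):
    """Evaluate exp(-0.5 * d^T cov^{-1} d) for a 2x2 covariance ((a, b), (b, c))."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    q = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q)


def subpixel_centers(px, py, S=3):
    """Centers of the S x S sub-pixels of the unit pixel with lower corner (px, py)."""
    step = 1.0 / S
    return [(px + (i + 0.5) * step, py + (j + 0.5) * step)
            for j in range(S) for i in range(S)]


def supersampled_color(px, py, gaussians, S=3):
    """Average of independent front-to-back alpha blends, one per sub-pixel (Eq. 4)."""
    total = [0.0, 0.0, 0.0]
    for center in subpixel_centers(px, py, S):
        transmittance = 1.0  # T_s^1 = 1 for the first Gaussian
        for g in gaussians:
            alpha = g["opacity"] * gaussian2d(center, g["mean"], g["cov"])
            for ch in range(3):
                total[ch] += alpha * transmittance * g["color"][ch]
            transmittance *= 1.0 - alpha  # accumulate prod_j (1 - alpha_s^j)
    return [v / (S * S) for v in total]
```

Summing over sub-pixels in the outer loop rather than over Gaussians only reorders the double sum in Eq. (4); each sub-pixel still keeps its own transmittance.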

Figure 6: Single-scale Training and multi-scale testing on the Mip-NeRF 360 Dataset[2] for zoom-out effect. 3DGS[10] has dilation artefacts (red boxes) at low resolutions. Our 2D scale-adaptive filter maintains the Gaussian scale consistency at low resolutions, and the super-sampling and integration methods further remove the aliasing artefacts (yellow boxes), yielding results that surpass Mip-Splatting[24].

3.2.2 Integration

When the super-sampling hyperparameter $S$ goes to infinity, super-sampling becomes integration. Consider a single Gaussian’s projection on the 2D camera plane, a 2D Gaussian whose PDF we represent as $f(x,y)$, where $x$ and $y$ are coordinates on the camera plane. We take the axes of the 2D Gaussian projection as the coordinate system axes, thus making the correlation of the projected Gaussian zero. Because the correlation is zero, $f(x,y)$ can be factored into the product $g_{x}(x)g_{y}(y)$, where $g(t)=\frac{\exp(-\frac{1}{2}(\frac{t}{\sigma})^{2})}{\sqrt{2\pi}\sigma}$. Let $\Phi(t)=\int_{-\infty}^{t}g(t)\,dt$ (subscripts $x$ or $y$ omitted) be the Gaussian integral. Let the region inside the pixel be $P$. Hence, when calculating $\alpha$ during the traversal of the Gaussian z-buffer, we need to evaluate the following double integral:

$$\alpha=\iint\limits_{P}f(x,y)\,dx\,dy \tag{5}$$

Axis-aligned Case. When the axes of the pixel are parallel to the 2D Gaussian Projection’s axes, the evaluation of (5) is simple because it can be calculated as the product of two Gaussian single integrals:

$$\begin{aligned}
\iint\limits_{P}f(x,y)\,dx\,dy &= \Big(\int\limits_{P_{x}}g_{x}(x)\,dx\Big)\Big(\int\limits_{P_{y}}g_{y}(y)\,dy\Big) \\
&= (\Phi_{x}(P_{x\max})-\Phi_{x}(P_{x\min}))(\Phi_{y}(P_{y\max})-\Phi_{y}(P_{y\min}))
\end{aligned} \tag{6}$$

where $P_{x\min}$ and $P_{x\max}$ are the endpoints of the marginal Gaussian interval on the x-axis, and $P_{y\min}$ and $P_{y\max}$ are the same on the y-axis. However, when the pixel's sides are not aligned with the axes, $x$ and $y$ become coupled. In the evaluation of (5), this is reflected as the inner integral containing variable instead of constant limits, and therefore the factorization carried out in (6) is no longer applicable.
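As a minimal sketch of the axis-aligned case in (6), the snippet below evaluates the pixel integral as a product of two 1D CDF differences. It assumes that $g_x$ and $g_y$ are normalized 1D Gaussians and that $\Phi_x$ and $\Phi_y$ are their CDFs; the function names and the explicit mean parameters are illustrative choices, not part of the original implementation.

```python
import math

def normal_cdf(t, mean, sigma):
    """CDF of a 1D Gaussian with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((t - mean) / (sigma * math.sqrt(2.0))))

def axis_aligned_pixel_integral(x_min, x_max, y_min, y_max,
                                mean_x, mean_y, sigma_x, sigma_y):
    """Integral of a normalized, axis-aligned 2D Gaussian over a rectangle.

    Mirrors the factorization in (6): the double integral splits into
    the product of two 1D interval masses.
    """
    mass_x = normal_cdf(x_max, mean_x, sigma_x) - normal_cdf(x_min, mean_x, sigma_x)
    mass_y = normal_cdf(y_max, mean_y, sigma_y) - normal_cdf(y_min, mean_y, sigma_y)
    return mass_x * mass_y

if __name__ == "__main__":
    # A unit pixel offset from the Gaussian center (hypothetical values).
    print(axis_aligned_pixel_integral(0.5, 1.5, -0.5, 0.5, 0.0, 0.0, 1.0, 2.0))
```

Only four CDF evaluations are needed per pixel and Gaussian, which is why the axis-aligned case is cheap compared with a general double integral.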

Pixel Rotation. In our implementation, we solve this problem by rotating the pixel such that it aligns with the Gaussian's axes, as shown in Fig. 5(b). The integral can then be computed as described in (6). However, rotating the pixel causes a deviation between the integral region and the original pixel region. We prove that for any pixel close enough to the center of the Gaussian to be affected during $\alpha$-blending, there exists a theoretical upper bound for the error. We also verify through numerical experimentation that this is empirically a good approximation. Detailed proofs and experimental results are provided in the supplementary material.

| Method | PSNR↑ 1× | PSNR↑ 1/2× | PSNR↑ 1/4× | PSNR↑ 1/8× | PSNR↑ Avg. | SSIM↑ 1× | SSIM↑ 1/2× | SSIM↑ 1/4× | SSIM↑ 1/8× | SSIM↑ Avg. | LPIPS↓ 1× | LPIPS↓ 1/2× | LPIPS↓ 1/4× | LPIPS↓ 1/8× | LPIPS↓ Avg. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3DGS[10] | 29.26 | 26.84 | 22.16 | 19.63 | 24.47 | 0.877 | 0.863 | 0.726 | 0.612 | 0.769 | 0.185 | 0.148 | 0.198 | 0.223 | 0.189 |
| 3DGS(MS)[10] | 20.11 | 23.50 | 32.51 | 23.72 | 24.96 | 0.604 | 0.774 | 0.956 | 0.832 | 0.792 | 0.389 | 0.212 | 0.051 | 0.118 | 0.192 |
| Mip-Splatting[24] | 29.26 | 30.23 | 30.56 | 29.61 | 29.91 | 0.875 | 0.909 | 0.929 | 0.934 | 0.911 | 0.187 | 0.116 | 0.080 | 0.066 | 0.113 |
| $SA\text{-}GS_{fil}$ (ours) | 29.26 | 29.80 | 28.29 | 25.58 | 28.23 | 0.877 | 0.901 | 0.875 | 0.809 | 0.866 | 0.185 | 0.123 | 0.126 | 0.171 | 0.151 |
| $SA\text{-}GS_{int}$ (ours) | 29.14 | 30.06 | 30.13 | 28.81 | 29.53 | 0.873 | 0.905 | 0.921 | 0.919 | 0.904 | 0.188 | 0.118 | 0.086 | 0.078 | 0.118 |
| $SA\text{-}GS_{sup}$ (ours) | 29.26 | 30.45 | 31.75 | 32.53 | 31.00 | 0.876 | 0.912 | 0.938 | 0.951 | 0.919 | 0.186 | 0.114 | 0.073 | 0.053 | 0.106 |

Table 1: Single-scale training and multi-scale testing on the Mip-NeRF 360 Dataset[2]. Except for 3DGS(MS)[10], which is trained on multiple scales, all other methods are trained on the largest scale (1×) and evaluated across four scales (1×, 1/2×, 1/4×, and 1/8×), simulating zoom-out effect. $SA\text{-}GS_{fil}$ means we only use the 2D scale-adaptive filter. All our variants significantly surpass 3DGS at low resolutions, and $SA\text{-}GS_{sup}$ yields better results than Mip-Splatting[24].

4 Experiments

We first present the implementation details of SA-GS. We then evaluate its performance on the unbounded Mip-NeRF 360 dataset[2] and the bounded Blender dataset[13]. Finally, we discuss some limitations of our approach.

4.1 Implementation

In our SA-GS framework, an advantage over Mip-Splatting is that modifications are exclusively performed during the testing phase of 3DGS[10]. Therefore, in the training phase, we keep all the settings of the original Gaussian model, including training rounds, hyperparameter settings and densification strategy. In our super-sampling method, we synchronize all pixel threads (in 3DGS, each pixel is associated with a thread) within the same block before performing the alpha calculation to initialize the cumulative transparency in the shared memory. In our integration method, we project the pixel corner points onto the Gaussian axes to obtain the integration interval for each marginal distribution. To simulate the pixel rotation, we multiply the pixel region area by a weight of $\frac{1}{\sin\theta + \cos\theta}$ to compensate for the error caused by rotation, where $\theta$ is the angle between the long Gaussian axis and the x-axis of the pixel plane. Please refer to the supplementary material for details.
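The relationship between the super-sampling and integration variants can be illustrated on the CPU for a single Gaussian. The sketch below is not the CUDA kernel described above; it only shows, under the assumption of a normalized, zero-mean, axis-aligned 2D Gaussian, that averaging more and more sub-pixel samples approaches the pixel integral of (5). All function names and the sample layout (sub-cell centers) are illustrative.

```python
import math

def gaussian_2d(x, y, sigma_x, sigma_y):
    """Normalized, zero-mean 2D Gaussian whose axes match the coordinate axes."""
    norm = 2.0 * math.pi * sigma_x * sigma_y
    return math.exp(-0.5 * ((x / sigma_x) ** 2 + (y / sigma_y) ** 2)) / norm

def supersampled_alpha(x_min, y_min, side, sigma_x, sigma_y, n):
    """Average n*n samples taken at sub-cell centers, times the pixel area."""
    step = side / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            x = x_min + (i + 0.5) * step
            y = y_min + (j + 0.5) * step
            total += gaussian_2d(x, y, sigma_x, sigma_y)
    return total / (n * n) * side * side

def cdf(t, sigma):
    """CDF of a zero-mean 1D Gaussian with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(t / (sigma * math.sqrt(2.0))))

def integrated_alpha(x_min, y_min, side, sigma_x, sigma_y):
    """Closed-form integral over the square pixel (axis-aligned case)."""
    mass_x = cdf(x_min + side, sigma_x) - cdf(x_min, sigma_x)
    mass_y = cdf(y_min + side, sigma_y) - cdf(y_min, sigma_y)
    return mass_x * mass_y

if __name__ == "__main__":
    exact = integrated_alpha(0.2, -0.3, 1.0, 0.6, 0.4)
    for n in (1, 2, 4, 16):
        approx = supersampled_alpha(0.2, -0.3, 1.0, 0.6, 0.4, n)
        print(n, abs(approx - exact))
```

The cost of the sampled version grows with the square of the sub-sample count, whereas the closed-form version stays constant per pixel and Gaussian.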

4.2 Evaluation on the Mip-NeRF 360 Dataset

4.2.1 Single-scale Training and Multi-scale Testing:

We simulated the effects of zoom-out and zoom-in on this dataset and retrained all the baseline models for comparison, using the same setup as Mip-Splatting[24]. Specifically, for zoom-out, we trained 3DGS[10] using full-resolution images and then plugged in our method to test on progressively lower resolutions (1×, 1/2×, 1/4×, 1/8×). For zoom-in, we trained 3DGS[10] using 1/8× resolution images and then plugged in our method to test on progressively higher resolutions (1×, 2×, 4×, 8×). Table 1 and Table 2 show the quantitative results for these two experiments.

When zooming out, our method gives comparable results at the training resolution and better performance at lower resolutions. The (heuristic and fixed) dilation operation of 3DGS[10] leads to severe degradation at low resolutions. Mip-Splatting[24] replaces 3DGS’s dilation operation with 3D smoothing and a 2D Mip filter, but still suffers from scale ambiguity at different resolutions. As depicted in Figure 6, our 2D scale-adaptive filter ensures that the Gaussian distribution is consistent with the training settings. Our integration and super-sampling modules further enhance the anti-aliasing effect. The super-sampling version of our method gives the best results for this setting, which exceeds Mip-Splatting[24] by 1.1dB in PSNR (see Tab. 1).

When zooming in, 3DGS[10] shows severe erosion artefacts at high resolution. Mip-Splatting[24] uses the 3D smoothing filter to reduce the effects of scale ambiguity of 2D filtering. As depicted in Figure 7, our method keeps the Gaussian scales consistent and greatly reduces erosion artefacts. Note that integration and super-sampling are only designed to address the decrease in sampling frequency (zoom-out). The most significant contribution is made by the 2D scale-adaptive filter, which produces results comparable to Mip-Splatting[24].

Figure 7: Single-scale Training and multi-scale testing on the Mip-NeRF 360 Dataset[2] for zoom-in effect. 3DGS[10] sees erosion artefacts (red boxes) at high resolution. By using only 2D scale-adaptive filters, we achieve a stable quality improvement at high resolutions without any re-training.
| Method | PSNR↑ 1× | PSNR↑ 2× | PSNR↑ 4× | PSNR↑ 8× | PSNR↑ Avg. | SSIM↑ 1× | SSIM↑ 2× | SSIM↑ 4× | SSIM↑ 8× | SSIM↑ Avg. | LPIPS↓ 1× | LPIPS↓ 2× | LPIPS↓ 4× | LPIPS↓ 8× | LPIPS↓ Avg. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3DGS[10] | 33.96 | 22.47 | 18.69 | 17.32 | 23.11 | 0.974 | 0.747 | 0.514 | 0.460 | 0.674 | 0.028 | 0.204 | 0.410 | 0.504 | 0.286 |
| 3DGS(MS)[10] | 23.72 | 32.51 | 23.50 | 20.11 | 24.96 | 0.832 | 0.956 | 0.774 | 0.604 | 0.792 | 0.118 | 0.051 | 0.212 | 0.389 | 0.192 |
| Mip-Splatting[24] | 34.62 | 28.86 | 25.99 | 24.95 | 28.60 | 0.977 | 0.872 | 0.718 | 0.641 | 0.802 | 0.025 | 0.154 | 0.318 | 0.430 | 0.232 |
| $SA\text{-}GS_{fil}$ (ours) | 33.96 | 27.89 | 25.32 | 24.40 | 27.89 | 0.974 | 0.840 | 0.677 | 0.615 | 0.777 | 0.028 | 0.189 | 0.360 | 0.465 | 0.260 |

Table 2: Single-scale training and multi-scale testing on the Mip-NeRF 360 Dataset[2]. Except for 3DGS(MS)[10], which is trained on multiple scales, all other re-training methods are trained on the 1/8 scale (1×) and evaluated across four scales (1×, 2×, 4×, and 8×), simulating zoom-in effect. $SA\text{-}GS_{fil}$ means we only use the 2D scale-adaptive filter. Our method significantly surpasses 3DGS[10] at high resolutions and produces results comparable to Mip-Splatting[24]. Note that the performance of $SA\text{-}GS_{fil}$ is achieved without re-training.

4.3 Evaluation on the Blender Dataset

4.3.1 Multi-scale Training and Multi-scale Testing:

Following Mip-Splatting, all baseline models were trained using multi-scale data from the $train+test$ section of the dataset and evaluated with multi-scale data from the $val$ section. We follow the image sampling ratio in Mip-Splatting[24] to train 3DGS[10]. Our quantitative evaluation is shown in Table 3. Our approach yields results comparable to Mip-Splatting[24] while using vanilla 3DGS[10] with multi-scale training only, so the training procedure does not need to be modified. Meanwhile, our approach significantly outperforms 3DGS[10], demonstrating stable performance at different resolutions. The performance of 3DGS[10] degrades as resolution decreases, even when it is trained on multi-scale data.

Figure 8: Single-scale training and multi-scale testing on the Blender[13] Dataset for zoom-in and zoom-out effects. Our 2D scale-adaptive filter maintains the consistency of the 2D Gaussian projection when zooming out. Moreover, it alleviates erosion artefacts and does not modify the training procedure when zooming in. We use super-sampling and integration methods to further address aliasing.
| Method | PSNR↑ 1× | PSNR↑ 1/2× | PSNR↑ 1/4× | PSNR↑ 1/8× | PSNR↑ Avg. | SSIM↑ 1× | SSIM↑ 1/2× | SSIM↑ 1/4× | SSIM↑ 1/8× | SSIM↑ Avg. | LPIPS↓ 1× | LPIPS↓ 1/2× | LPIPS↓ 1/4× | LPIPS↓ 1/8× | LPIPS↓ Avg. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3DGS[10] | 31.51 | 32.66 | 31.21 | 28.25 | 30.91 | 0.962 | 0.972 | 0.968 | 0.945 | 0.962 | 0.050 | 0.031 | 0.030 | 0.045 | 0.039 |
| Mip-Splatting[24] | 32.81 | 34.49 | 35.45 | 35.50 | 34.56 | 0.967 | 0.977 | 0.983 | 0.988 | 0.979 | 0.035 | 0.019 | 0.013 | 0.010 | 0.019 |
| $SA\text{-}GS_{int}$ (ours) | 30.84 | 32.71 | 34.26 | 32.80 | 32.65 | 0.956 | 0.969 | 0.978 | 0.979 | 0.971 | 0.055 | 0.031 | 0.021 | 0.019 | 0.032 |
| $SA\text{-}GS_{sup}$ (ours) | 30.80 | 32.67 | 35.06 | 35.77 | 33.58 | 0.956 | 0.969 | 0.980 | 0.985 | 0.973 | 0.056 | 0.032 | 0.020 | 0.014 | 0.031 |

Table 3: Multi-scale training and multi-scale testing on the Blender dataset[13]. Our training-free approach yields results comparable to Mip-Splatting[24]. Meanwhile, our approach significantly outperforms 3DGS[10] and demonstrates stable performance at different resolutions.
| Method | PSNR↑ 1× | PSNR↑ 1/2× | PSNR↑ 1/4× | PSNR↑ 1/8× | PSNR↑ Avg. | SSIM↑ 1× | SSIM↑ 1/2× | SSIM↑ 1/4× | SSIM↑ 1/8× | SSIM↑ Avg. | LPIPS↓ 1× | LPIPS↓ 1/2× | LPIPS↓ 1/4× | LPIPS↓ 1/8× | LPIPS↓ Avg. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3DGS | 35.10 | 27.91 | 22.42 | 18.76 | 26.05 | 0.974 | 0.949 | 0.862 | 0.736 | 0.880 | 0.029 | 0.033 | 0.069 | 0.133 | 0.066 |
| 3DGS(MS) | 31.51 | 32.66 | 31.21 | 28.25 | 30.91 | 0.962 | 0.972 | 0.968 | 0.945 | 0.962 | 0.050 | 0.031 | 0.030 | 0.045 | 0.039 |
| Mip-Splatting | 34.59 | 35.11 | 31.98 | 28.14 | 32.45 | 0.973 | 0.979 | 0.975 | 0.952 | 0.970 | 0.032 | 0.019 | 0.019 | 0.029 | 0.025 |
| $SA\text{-}GS_{fil}$ (ours) | 34.60 | 34.33 | 31.02 | 27.59 | 31.89 | 0.973 | 0.977 | 0.968 | 0.947 | 0.966 | 0.031 | 0.022 | 0.036 | 0.067 | 0.039 |
| $SA\text{-}GS_{int}$ (ours) | 34.35 | 34.39 | 30.99 | 26.89 | 31.65 | 0.972 | 0.978 | 0.971 | 0.940 | 0.965 | 0.032 | 0.020 | 0.023 | 0.039 | 0.029 |
| $SA\text{-}GS_{sup}$ (ours) | 34.49 | 36.58 | 37.50 | 35.64 | 36.06 | 0.972 | 0.980 | 0.985 | 0.985 | 0.981 | 0.032 | 0.018 | 0.014 | 0.013 | 0.019 |

Table 4: Single-scale training and multi-scale testing on the Blender dataset[13] for zoom-out effect. We use the same experiment protocol and model naming as in the Mip-NeRF 360[2] experiment (of Table 1). Our method outperforms 3DGS[10], while $SA\text{-}GS_{sup}$ significantly surpasses all previous works.

4.3.2 Single-scale Training and Multi-scale Testing:

We maintain an experiment protocol consistent with the Mip-NeRF 360[2] experiment mentioned above (Section 4.2) to evaluate the zoom-out and zoom-in effects. We also keep the same data split as in the multi-scale training experiment described above. Table 4 and Table 5 show the quantitative results for zoom-out and zoom-in effects. The qualitative results are shown in Fig. 8.

For zoom-out, our method achieves performance close to 3DGS[10] at training resolution and a steady increase in performance at lower resolutions. The super-sampling version of our method, $SA\text{-}GS_{sup}$, significantly outperforms Mip-Splatting[24], with a gain of 3.61dB in average PSNR in this setting (see Tab. 4). For zoom-in, our 2D scale-adaptive filter achieves results comparable to Mip-Splatting[24], and our method is training-free.

| Method | PSNR↑ 1× | PSNR↑ 2× | PSNR↑ 4× | PSNR↑ 8× | PSNR↑ Avg. | SSIM↑ 1× | SSIM↑ 2× | SSIM↑ 4× | SSIM↑ 8× | SSIM↑ Avg. | LPIPS↓ 1× | LPIPS↓ 2× | LPIPS↓ 4× | LPIPS↓ 8× | LPIPS↓ Avg. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3DGS[10] | 36.97 | 24.33 | 21.01 | 19.63 | 25.44 | 0.988 | 0.886 | 0.820 | 0.821 | 0.879 | 0.013 | 0.065 | 0.130 | 0.159 | 0.092 |
| 3DGS(MS)[10] | 28.25 | 31.21 | 32.66 | 31.51 | 30.91 | 0.945 | 0.968 | 0.972 | 0.962 | 0.962 | 0.045 | 0.030 | 0.031 | 0.050 | 0.039 |
| Mip-Splatting[24] | 36.50 | 30.72 | 27.81 | 26.51 | 30.39 | 0.986 | 0.959 | 0.920 | 0.893 | 0.939 | 0.015 | 0.048 | 0.099 | 0.130 | 0.073 |
| $SA\text{-}GS_{fil}$ (ours) | 35.74 | 30.38 | 27.63 | 26.36 | 30.03 | 0.984 | 0.953 | 0.912 | 0.885 | 0.933 | 0.016 | 0.059 | 0.111 | 0.141 | 0.082 |

Table 5: Single-scale training and multi-scale testing on the Blender dataset[13] for zoom-in effect. We use the same experiment protocol and model naming as in the Mip-NeRF 360[2] experiment (of Table 2). Our method yields results comparable to Mip-Splatting[24]. $SA\text{-}GS_{fil}$ achieves this performance while being training-free.

5 Conclusion

We present SA-GS, a training-free framework that can seamlessly integrate with 3DGS[10] to enhance its anti-aliasing ability at arbitrary rendering frequencies. Specifically, we propose a 2D scale-adaptive filter, which maintains the 2D Gaussian projection scale's consistency under different rendering settings. In addition, we employ two conventional anti-aliasing techniques, super-sampling and integration, to significantly reduce image aliasing at lower sampling rates. SA-GS demonstrates superior or comparable performance to the state-of-the-art, as validated by extensive experiments on both bounded and unbounded scenarios.
Limitations. Our method adds no computational burden when zooming in, but when zooming out, the application of integration and super-sampling methods increases the rendering time. Due to shared memory, the elapsed time for super-sampling is comparable to that of integration, making it 15% to 20% slower than the vanilla 3DGS[10]. However, integration can still be optimized (approximation calculations or table lookups), leading to further speedups. Overall, our approach achieves a significant anti-aliasing performance boost with minimal trade-offs.

References

  • [1] Barron, J.T., Mildenhall, B., Tancik, M., Hedman, P., Martin-Brualla, R., Srinivasan, P.P.: Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 5855–5864 (2021)
  • [2] Barron, J.T., Mildenhall, B., Verbin, D., Srinivasan, P.P., Hedman, P.: Mip-nerf 360: Unbounded anti-aliased neural radiance fields. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 5470–5479 (2022)
  • [3] Barron, J.T., Mildenhall, B., Verbin, D., Srinivasan, P.P., Hedman, P.: Zip-nerf: Anti-aliased grid-based neural radiance fields. arXiv preprint arXiv:2304.06706 (2023)
  • [4] Chen, W., Ling, H., Gao, J., Smith, E., Lehtinen, J., Jacobson, A., Fidler, S.: Learning to predict 3d objects with an interpolation-based differentiable renderer. Advances in neural information processing systems 32 (2019)
  • [5] Fridovich-Keil, S., Yu, A., Tancik, M., Chen, Q., Recht, B., Kanazawa, A.: Plenoxels: Radiance fields without neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 5501–5510 (2022)
  • [6] Guédon, A., Lepetit, V.: Sugar: Surface-aligned gaussian splatting for efficient 3d mesh reconstruction and high-quality mesh rendering. arXiv preprint arXiv:2311.12775 (2023)
  • [7] Hu, W., Wang, Y., Ma, L., Yang, B., Gao, L., Liu, X., Ma, Y.: Tri-miprf: Tri-mip representation for efficient anti-aliasing neural radiance fields. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 19774–19783 (2023)
  • [8] Hu, W., Wang, Y., Ma, L., Yang, B., Gao, L., Liu, X., Ma, Y.: Tri-miprf: Tri-mip representation for efficient anti-aliasing neural radiance fields (2023)
  • [9] Kato, H., Ushiku, Y., Harada, T.: Neural 3d mesh renderer. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 3907–3916 (2018)
  • [10] Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3d gaussian splatting for real-time radiance field rendering (2023)
  • [11] Liu, S., Li, T., Chen, W., Li, H.: Soft rasterizer: A differentiable renderer for image-based 3d reasoning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 7708–7717 (2019)
  • [12] Mildenhall, B., Srinivasan, P.P., Ortiz-Cayon, R., Kalantari, N.K., Ramamoorthi, R., Ng, R., Kar, A.: Local light field fusion: Practical view synthesis with prescriptive sampling guidelines. ACM Transactions on Graphics (TOG) 38(4), 1–14 (2019)
  • [13] Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM 65(1), 99–106 (2021)
  • [14] Müller, T., Evans, A., Schied, C., Keller, A.: Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics (ToG) 41(4), 1–15 (2022)
  • [15] Nyquist, H.: Certain topics in telegraph transmission theory. Transactions of the American Institute of Electrical Engineers 47(2), 617–644 (1928)
  • [16] Peng, Z., Hu, W., Shi, Y., Zhu, X., Zhang, X., Zhao, H., He, J., Liu, H., Fan, Z.: Synctalk: The devil is in the synchronization for talking head synthesis. arXiv preprint arXiv:2311.17590 (2023)
  • [17] Rückert, D., Franke, L., Stamminger, M.: Adop: Approximate differentiable one-pixel point rendering. ACM Transactions on Graphics (ToG) 41(4), 1–14 (2022)
  • [18] Sainz, M., Pajarola, R.: Point-based rendering techniques. Computers & Graphics 28(6), 869–879 (2004)
  • [19] Straub, J., Whelan, T., Ma, L., Chen, Y., Wijmans, E., Green, S., Engel, J.J., Mur-Artal, R., Ren, C., Verma, S., Clarkson, A., Yan, M., Budge, B., Yan, Y., Pan, X., Yon, J., Zou, Y., Leon, K., Carter, N., Briales, J., Gillingham, T., Mueggler, E., Pesqueira, L., Savva, M., Batra, D., Strasdat, H.M., Nardi, R.D., Goesele, M., Lovegrove, S., Newcombe, R.: The Replica dataset: A digital replica of indoor spaces. arXiv preprint arXiv:1906.05797 (2019)
  • [20] Wei, Y., Wang, Z., Lu, Y., Xu, C., Liu, C., Zhao, H., Chen, S., Wang, Y.: Editable scene simulation for autonomous driving via collaborative llm-agents. arXiv preprint arXiv:2402.05746 (2024)
  • [21] Wu, Z., Liu, T., Luo, L., Zhong, Z., Chen, J., Xiao, H., Hou, C., Lou, H., Chen, Y., Yang, R., et al.: Mars: An instance-aware, modular and realistic simulator for autonomous driving. In: CAAI International Conference on Artificial Intelligence. pp. 3–15. Springer (2023)
  • [22] Xie, T., Zong, Z., Qiu, Y., Li, X., Feng, Y., Yang, Y., Jiang, C.: Physgaussian: Physics-integrated 3d gaussians for generative dynamics. arXiv preprint arXiv:2311.12198 (2023)
  • [23] Xu, Q., Xu, Z., Philip, J., Bi, S., Shu, Z., Sunkavalli, K., Neumann, U.: Point-nerf: Point-based neural radiance fields. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) pp. 5428–5438 (2022), https://api.semanticscholar.org/CorpusID:246210101
  • [24] Yu, Z., Chen, A., Huang, B., Sattler, T., Geiger, A.: Mip-splatting: Alias-free 3d gaussian splatting. arXiv preprint arXiv:2311.16493 (2023)
  • [25] Yuan, S., Zhao, H.: Slimmerf: Slimmable radiance fields. arXiv preprint arXiv:2312.10034 (2023)
  • [26] Zhou, Q., Li, W., Jiang, L., Wang, G., Zhou, G., Zhang, S., Zhao, H.: Pad: A dataset and benchmark for pose-agnostic anomaly detection. Advances in Neural Information Processing Systems 36 (2024)
  • [27] Zhu, Z., Chen, Y., Wu, Z., Hou, C., Shi, Y., Li, C., Li, P., Zhao, H., Zhou, G.: Latitude: Robotic global localization with truncated dynamic low-pass filter in city-scale nerf. In: 2023 IEEE International Conference on Robotics and Automation (ICRA). pp. 8326–8332. IEEE (2023)

SA-GS: Scale-Adaptive Gaussian Splatting for Training-Free Anti-Aliasing Supplementary Material

In this supplementary material, we first present the implementation details of the integration in Section F. Next, we prove the theoretical upper bound for the rotational error in Section G.1 and present the results of numerical experiments in Section G.2. Additionally, we present ablation studies of SA-GS in Section H. Finally, we report additional quantitative and qualitative results in Section I.

F Implementation Details of Integration

As stated in the main text, we simplify the integral operation by rotating the pixel. In the concrete implementation, the rotation is simulated by projection. Denote the two normalised eigenvectors of the 2D Gaussian distribution as $\vec{v}_{long}$ and $\vec{v}_{short}$, corresponding to the larger and smaller eigenvalues respectively. We project the four corner points of the pixel onto the directions of $\vec{v}_{long}$ and $\vec{v}_{short}$ using the dot product:

$$\begin{aligned}
x_{min} &= \min(P_{lu,lb,ru,rb} \cdot \vec{v}_{long}) \\
x_{max} &= \max(P_{lu,lb,ru,rb} \cdot \vec{v}_{long}) \\
y_{min} &= \min(P_{lu,lb,ru,rb} \cdot \vec{v}_{short}) \\
y_{max} &= \max(P_{lu,lb,ru,rb} \cdot \vec{v}_{short})
\end{aligned} \qquad (7)$$

where $P_{lu,lb,ru,rb}$ are the coordinates of the four corner points of the pixel, specifically the left upper, left lower, right upper and right lower. $x_{max} - x_{min}$ or $y_{max} - y_{min}$ is equivalent to the side length of the rotated pixel.
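The projection in (7) can be sketched in a few lines. The snippet below is an illustration only: it assumes the corner coordinates are expressed relative to the Gaussian's center and that both eigenvectors are unit length; the function name `project_pixel` is hypothetical.

```python
import math

def project_pixel(corners, v_long, v_short):
    """Project pixel corners onto the Gaussian's two unit eigenvectors.

    corners: four (x, y) tuples, assumed relative to the Gaussian center.
    Returns (x_min, x_max, y_min, y_max) in the Gaussian's own axes, as in (7).
    """
    along_long = [cx * v_long[0] + cy * v_long[1] for cx, cy in corners]
    along_short = [cx * v_short[0] + cy * v_short[1] for cx, cy in corners]
    return min(along_long), max(along_long), min(along_short), max(along_short)

if __name__ == "__main__":
    # Unit pixel centered on the Gaussian, with the long axis tilted by 45 degrees.
    s = math.sqrt(2.0) / 2.0
    corners = [(-0.5, 0.5), (-0.5, -0.5), (0.5, 0.5), (0.5, -0.5)]
    x_min, x_max, y_min, y_max = project_pixel(corners, (s, s), (-s, s))
    print(x_max - x_min, y_max - y_min)
```

The resulting interval endpoints are what the two 1D marginal integrals are evaluated between.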

However, when the pixel is not already aligned with the Gaussian's axes, rotating it in the above manner always increases its edge lengths, resulting in an area that is larger than the correct range. On the other hand, restricting the area of the rotated pixel to be inside the original pixel causes a loss of integral area. To balance these two effects, we scale the pixel before projection to ensure that the rotated pixel area is equal to the original pixel area. Specifically, we multiply the original pixel edge lengths by $\frac{1}{\sin\theta + \cos\theta}$, as illustrated in Fig. 10.
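A small numerical check makes the scaling factor concrete. For a square of side $l$ whose axes make an angle $\theta$ with the Gaussian's axes, projecting the corners as in (7) gives an extent of $l(\sin\theta + \cos\theta)$ for $\theta \in [0, \pi/2]$, so pre-scaling the side by $\frac{1}{\sin\theta + \cos\theta}$ restores an extent of $l$. The sketch below verifies this; the absolute values are an assumption added here so that the check also holds for angles outside that range, and the function names are illustrative.

```python
import math

def projected_extent(side, theta):
    """Extent of an origin-centered square along a unit axis tilted by theta."""
    half = side / 2.0
    axis = (math.cos(theta), math.sin(theta))
    corners = [(-half, -half), (-half, half), (half, -half), (half, half)]
    dots = [cx * axis[0] + cy * axis[1] for cx, cy in corners]
    return max(dots) - min(dots)

def compensated_side(side, theta):
    """Side length after scaling by 1 / (|sin(theta)| + |cos(theta)|)."""
    return side / (abs(math.sin(theta)) + abs(math.cos(theta)))

if __name__ == "__main__":
    for degrees in (0, 15, 30, 45, 60, 90):
        theta = math.radians(degrees)
        raw = projected_extent(1.0, theta)
        fixed = projected_extent(compensated_side(1.0, theta), theta)
        print(degrees, raw, fixed)
```

With the compensation applied, the rotated pixel has the same side length, and therefore the same area, as the original pixel at every angle.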

G Analysis of Rotational Errors

Although rotating the pixel simplifies the integral calculation, it inevitably introduces an error, even if the area of the pixel after rotation is equal to the original. We prove that for any pixel close enough to the center of the Gaussian to be affected during $\alpha$-blending, there exists a theoretical upper bound for the error. Additionally, we verify through numerical experimentation that this is empirically a good approximation.

Figure 10: Area scaling when rotating pixels. In the integration method, the pixel area is scaled before projection to ensure that the projected (rotated) pixel area is equal to the original pixel area. $\theta$ is the rotation angle of the pixel.

G.1 Theoretical Upper Bound

G.1.1 Normalization and Rotation

Let the center of the pixel be at coordinates $(x_c, y_c)$, and the side length of the pixel be $l$. Let the pixel have a counterclockwise tilt of $\theta$ with respect to the standard x-axis. The first problem we need to solve is to eliminate the tilt in order to simplify the double integral. To do so, we need to apply a rotation transformation to the Bivariate Gaussian Distribution without changing its main form. Hence we first normalize it into a Bivariate Normal Distribution via scaling.

We construct $x^{*}, y^{*} = \frac{x}{\sigma_x}, \frac{y}{\sigma_y}$ such that we have the Normal Distribution $g_x^{*}(x^{*}) = \frac{\exp(-\frac{1}{2}x^{*2})}{\sqrt{2\pi}} = \sigma_x g_x(x)$ ($g_y^{*}$ is analogous). Hence we have:

$$f^*(x^*, y^*) = \sigma_x \sigma_y\, g_x(x)\, g_y(y) = \sigma_x \sigma_y\, f(x, y) \tag{8}$$
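As a quick numerical sanity check of Eq. (8), the following Python sketch evaluates both sides at a single point. It assumes a zero-mean Gaussian whose principal axes coincide with the coordinate axes; the function names and the sample values are illustrative assumptions and are not part of the original derivation.

```python
import math

def gaussian_1d(t, sigma):
    """Zero-mean 1D Gaussian density with standard deviation sigma."""
    return math.exp(-0.5 * (t / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def f(x, y, sigma_x, sigma_y):
    """Axis-aligned bivariate Gaussian f(x, y) = g_x(x) * g_y(y)."""
    return gaussian_1d(x, sigma_x) * gaussian_1d(y, sigma_y)

def f_star(x_star, y_star):
    """Standard bivariate normal density in the scaled coordinates."""
    return gaussian_1d(x_star, 1.0) * gaussian_1d(y_star, 1.0)

# Arbitrary illustrative values (assumptions, not taken from the paper).
sigma_x, sigma_y = 0.7, 2.3
x, y = 0.4, -1.1

lhs = f_star(x / sigma_x, y / sigma_y)               # f*(x*, y*)
rhs = sigma_x * sigma_y * f(x, y, sigma_x, sigma_y)  # sigma_x * sigma_y * f(x, y)
print(abs(lhs - rhs) < 1e-12)
```

The factor $\sigma_x \sigma_y$ is the Jacobian of the scaling, which is why the same factor reappears as $\frac{1}{\sigma_x \sigma_y}$ when an integral over the scaled region is mapped back to the original pixel.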

The rotational symmetry of the Bivariate Normal Distribution now allows us to rotate the pixel clockwise by $\theta$ about the origin without changing the integral. The new pixel region, $P^*$, is a parallelogram with top and bottom edges parallel to the x-axis. Let the four corners (labeled in order analogously to quadrants) be $P_1(x_1, y_1)$, $P_2(x_2, y_1)$, $P_3(x_3, y_3)$, and $P_4(x_4, y_3)$, and let the slope of the edges $P_1P_4$ and $P_2P_3$ be $k$. WLOG let $y_1 \geq 0$ (otherwise conduct a reflection across the x-axis).

G.1.2 Double Integration

We now seek bounds for $\iint\limits_{P^*} f^*(x^*, y^*)\,dx^*\,dy^*$. If we keep the current boundary of $P^*$, we need to integrate $\Phi g$, which cannot be represented in closed form. To find the theoretical range of the error, we instead fix the sliced Gaussian distribution by fixing $y^*$. Combining this with Eq. (8), we readily obtain the following bounds:

$$\iint\limits_{P} f(x, y)\,dx\,dy > \frac{1}{\sigma_x \sigma_y} \int_{y_3}^{y_1} \int_{\frac{y^* - y_1}{k} + x_2}^{\frac{y - y_1}{k} + x_1} f^*(x, \max\{0, y_3\})\,dx\,dy \tag{9}$$

$$\iint\limits_{P} f(x, y)\,dx\,dy < \frac{1}{\sigma_x \sigma_y} \int_{y_3}^{y_1} \int_{\frac{y^* - y_1}{k} + x_2}^{\frac{y - y_1}{k} + x_1} f^*(x, \max\{y_1, |y_3|\})\,dx\,dy \tag{10}$$

Fig. 11 presents a visualisation of the theoretical analysis. When the coordinates on the pixel are bounded by a constant times the respective standard deviations, the coordinates of $P_i$ are also bounded by that constant. Further, $k$ is only affected by $\theta$, $\sigma_x$, and $\sigma_y$. Hence the original double integral is bounded in our approximation, and so is its error.

Figure 11: Visual demonstration of our theoretical analysis. We scale the general Gaussian distribution to a standard normal distribution to estimate an upper bound on the error between the rotated pixel and the original pixel.

G.2 Numerical Experiments

We verify the above theoretical analysis by numerical experiments. Adopting the notation established in G.1, we denote $(x_c, y_c)$ as the coordinates of the pixel centroid, $l$ as the pixel side length, $\theta$ as the angle defining the counterclockwise rotation of the pixel, and $\sigma_x$ and $\sigma_y$ as the standard deviations along the two principal eigenvectors of the Gaussian distribution, respectively.

We adopt the framework of parametric sensitivity analysis for our experimental setup, fixing certain parameters while sampling the others within reasonable ranges. This approach aims to quantify the differences between the original pixel integrals and their counterparts after rotation. Specifically, we set $l = 1$ and $x_c = 0$, and let $\theta$ and $y_c$ each take six uniformly spaced values from the intervals $[0, \frac{\pi}{4}]$ and $[0.05, 0.25]$, respectively, thereby generating 36 sub-tables, as depicted in Fig. 12. In each sub-table, $\sigma_x$ and $\sigma_y$ form the horizontal and vertical axes, respectively, with each parameter taking 30 uniformly spaced values from the interval $[0.15, 3.77]$. This interval encompasses the core portion of the Gaussian distribution represented in the tile.
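The setup described above can be sketched in Python as follows. This is a minimal illustration and not the authors' experiment code. It assumes a zero-mean Gaussian aligned with the coordinate axes, a pixel that is tilted about its own center, a closed-form `erf` product for the axis-aligned pixel, a midpoint rule for the tilted pixel, and a relative error measured against the tilted-pixel integral. All function names are hypothetical, and the sweep is reduced (one value of $y_c$, four values per standard deviation) so that it runs quickly.

```python
import math

def axis_aligned_integral(xc, yc, l, sigma_x, sigma_y):
    """Closed-form integral of a zero-mean, axis-aligned Gaussian over an
    axis-aligned square pixel with center (xc, yc) and side length l."""
    def cdf(t, sigma):
        return 0.5 * (1.0 + math.erf(t / (sigma * math.sqrt(2.0))))
    h = 0.5 * l
    return ((cdf(xc + h, sigma_x) - cdf(xc - h, sigma_x))
            * (cdf(yc + h, sigma_y) - cdf(yc - h, sigma_y)))

def tilted_integral(xc, yc, l, theta, sigma_x, sigma_y, n=60):
    """Midpoint-rule integral (n x n samples) over the same pixel tilted
    counterclockwise by theta about its center."""
    c, s = math.cos(theta), math.sin(theta)
    step = l / n
    total = 0.0
    for i in range(n):
        u = -0.5 * l + (i + 0.5) * step
        for j in range(n):
            v = -0.5 * l + (j + 0.5) * step
            x = xc + c * u - s * v   # rotate the local offset (u, v)
            y = yc + s * u + c * v
            total += math.exp(-0.5 * ((x / sigma_x) ** 2 + (y / sigma_y) ** 2))
    return total * step * step / (2.0 * math.pi * sigma_x * sigma_y)

def relative_error(xc, yc, l, theta, sigma_x, sigma_y):
    """Relative difference between the tilted and axis-aligned integrals."""
    tilted = tilted_integral(xc, yc, l, theta, sigma_x, sigma_y)
    aligned = axis_aligned_integral(xc, yc, l, sigma_x, sigma_y)
    return abs(tilted - aligned) / tilted

def linspace(a, b, num):
    """Uniformly spaced samples on [a, b], endpoints included."""
    return [a + (b - a) * i / (num - 1) for i in range(num)]

# Reduced sweep: l = 1, x_c = 0, y_c = 0.05 (an assumption for brevity).
thetas = linspace(0.0, math.pi / 4.0, 6)
sigmas = linspace(0.15, 3.77, 4)
errors = {
    (theta, sx, sy): relative_error(0.0, 0.05, 1.0, theta, sx, sy)
    for theta in thetas for sx in sigmas for sy in sigmas
}
print(max(errors.values()))
```

Restoring the full grid (six values of $y_c$ and 30 values per standard deviation) only requires changing the arguments of `linspace`, although a pure-Python double loop then becomes slow and a vectorised implementation would be preferable.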

The final numerical experiment gives an average relative error of 0.51%. Since most of the errors are 0 or close to 0, for ease of visualisation, we convert all the errors to the (0,1) range and widen the differences in the region close to 0 by $y = \frac{1}{1 + e^{-800x}}$, as shown in Fig. 12. It can be seen that the error increases as the anisotropy of the Gaussian distribution becomes more pronounced, and that the error range increases as $\theta$ increases. However, the overall error values are small, indicating that our method is a good approximation.
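The visualisation mapping can be illustrated with a few lines of Python. This is only a sketch of the logistic transform quoted above; the function name `stretch_error` and the sample inputs are assumptions chosen for illustration.

```python
import math

def stretch_error(x, gain=800.0):
    """Logistic map y = 1 / (1 + exp(-gain * x)) applied to a relative error x.
    Intended for non-negative x; large negative x would overflow math.exp."""
    return 1.0 / (1.0 + math.exp(-gain * x))

# Illustrative relative errors, including 0.0051 (the reported 0.51% average).
for err in (0.0, 0.001, 0.0051, 0.02):
    print(f"{err:.4f} -> {stretch_error(err):.4f}")
```

Because the relative errors are non-negative, the mapped values fall in $[0.5, 1)$, and most of that range is used up by errors below roughly 1%, which is what makes small differences visible in the plot.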

H Ablation

In this section, we evaluate the effectiveness of the 2D scale-adaptive filter (H.1) and the anti-aliasing methods (H.2). Additionally, we present the corresponding qualitative and quantitative results.

H.1 Effectiveness of the 2D Scale-adaptive Filter

To evaluate the effectiveness of the 2D scale-adaptive filter, we perform ablation studies with single-scale training and multi-scale testing (zoom-out and zoom-in) on both the Mip-NeRF 360 dataset and the Blender dataset. The quantitative results are presented in Table 7, Table 8, Table 9, and Table 10.

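Thanks to the scale consistency across rendering settings provided by the 2D scale-adaptive filter, we obtain a substantial improvement over 3DGS in both zoom-out and zoom-in scenarios. On the Mip-NeRF 360 dataset, for example, the average PSNR rises from 24.47 to 28.23 in the zoom-out setting (Table 7) and from 23.11 to 27.89 in the zoom-in setting (Table 8). Without the filter, the Gaussians in 3DGS expand or shrink at different rendering frequencies, which exacerbates the aliasing effect, as illustrated in Fig. 13 and Fig. 14.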

H.2 Effectiveness of the Anti-aliasing Methods

To evaluate the effectiveness of the anti-aliasing methods (integration and super-sampling), we perform ablation studies with single-scale training and multi-scale testing for the zoom-out effect on both the Mip-NeRF 360 dataset and the Blender dataset. The quantitative results are presented in Table 7 and Table 9. Note that the integration and super-sampling methods are intended solely for the case where the rendering frequency decreases (zoom-out). Therefore, we do not focus on analysing their performance in the zoom-in case. Table 8 and Table 10 show that, in that case, they perform comparably with 3DGS.

The integration and super-sampling methods are ineffective when the 2D scale-adaptive filter is not applied, because of the scale inconsistency in 3DGS. However, when the 2D scale-adaptive filter is active, these methods further enhance the anti-aliasing ability of the scene, as illustrated in Fig. 13. In summary, we conclude that conventional anti-aliasing methods alone do not make the 3DGS scene representation more robust, but our 2D scale-adaptive filter removes this limitation.
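The relationship between the two anti-aliasing methods can be illustrated in one dimension: averaging more and more sub-pixel samples of a Gaussian approaches its exact integral over the pixel footprint. The Python sketch below is a toy example under stated assumptions (a single unnormalized 1D Gaussian, a box-shaped pixel, hypothetical function names and values); it is not the renderer used in the paper.

```python
import math

def gaussian(t, sigma):
    """Unnormalized zero-mean 1D Gaussian."""
    return math.exp(-0.5 * (t / sigma) ** 2)

def pixel_integral(center, width, sigma):
    """Exact mean of the Gaussian over the pixel [center - w/2, center + w/2]."""
    a, b = center - 0.5 * width, center + 0.5 * width
    r = sigma * math.sqrt(2.0)
    area = sigma * math.sqrt(0.5 * math.pi) * (math.erf(b / r) - math.erf(a / r))
    return area / width

def super_sample(center, width, sigma, k):
    """Mean of k evenly spaced sub-pixel samples (k = 1 is the pixel center)."""
    a = center - 0.5 * width
    return sum(gaussian(a + (i + 0.5) * width / k, sigma) for i in range(k)) / k

# Illustrative values: a Gaussian narrower than the pixel, as in zoom-out.
center, width, sigma = 0.2, 1.0, 0.3
exact = pixel_integral(center, width, sigma)
for k in (1, 4, 16, 64):
    approx = super_sample(center, width, sigma, k)
    print(k, abs(approx - exact))
```

With one sample per pixel, the value depends strongly on where the pixel center falls relative to the Gaussian, which is the source of jaggedness. As `k` grows, the sampled mean approaches the integrated value.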

Figure 12: Numerical Experimental Results of Integration Error. We convert all the errors to the (0,1) range and widen the differences in the region close to 0. The average relative error is 0.51%, verifying that our method is a good estimation.
PSNR ↑

| Method | 1 Res. | 1/2 Res. | 1/4 Res. | 1/8 Res. | Avg. |
| --- | --- | --- | --- | --- | --- |
| 3DGS | 29.26 | 26.84 | 22.16 | 19.63 | 24.47 |
| 3DGS+Integration | 29.14 | 26.44 | 22.22 | 19.32 | 24.28 |
| 3DGS+Super-sampling | 29.26 | 26.88 | 22.79 | 19.91 | 24.71 |
| 3DGS+Adaptive Filter | 29.26 | 29.80 | 28.26 | 25.58 | 28.23 |
| Full Method ($SA\text{-}GS_{int}$) | 29.14 | 30.06 | 30.13 | 28.81 | 29.53 |
| Full Method ($SA\text{-}GS_{sup}$) | 29.26 | 30.45 | 31.75 | 32.53 | 31.00 |

SSIM ↑

| Method | 1 Res. | 1/2 Res. | 1/4 Res. | 1/8 Res. | Avg. |
| --- | --- | --- | --- | --- | --- |
| 3DGS | 0.877 | 0.863 | 0.726 | 0.612 | 0.769 |
| 3DGS+Integration | 0.873 | 0.850 | 0.726 | 0.588 | 0.759 |
| 3DGS+Super-sampling | 0.876 | 0.861 | 0.754 | 0.633 | 0.781 |
| 3DGS+Adaptive Filter | 0.877 | 0.901 | 0.875 | 0.809 | 0.866 |
| Full Method ($SA\text{-}GS_{int}$) | 0.873 | 0.905 | 0.921 | 0.919 | 0.904 |
| Full Method ($SA\text{-}GS_{sup}$) | 0.876 | 0.912 | 0.938 | 0.951 | 0.919 |

LPIPS ↓

| Method | 1 Res. | 1/2 Res. | 1/4 Res. | 1/8 Res. | Avg. |
| --- | --- | --- | --- | --- | --- |
| 3DGS | 0.185 | 0.148 | 0.198 | 0.223 | 0.189 |
| 3DGS+Integration | 0.188 | 0.158 | 0.197 | 0.238 | 0.196 |
| 3DGS+Super-sampling | 0.186 | 0.153 | 0.188 | 0.223 | 0.188 |
| 3DGS+Adaptive Filter | 0.185 | 0.123 | 0.126 | 0.171 | 0.151 |
| Full Method ($SA\text{-}GS_{int}$) | 0.188 | 0.118 | 0.086 | 0.078 | 0.118 |
| Full Method ($SA\text{-}GS_{sup}$) | 0.186 | 0.114 | 0.073 | 0.053 | 0.106 |

Table 7: Ablation studies for zoom-out effect on the Mip-NeRF 360 Dataset. All methods are trained on the largest scale ($1\times$) and evaluated across four scales ($1\times$, $\nicefrac{1}{2}\times$, $\nicefrac{1}{4}\times$, and $\nicefrac{1}{8}\times$). Our 2D scale-adaptive filter removes 3DGS bloat at low rendering frequencies. Additionally, our integration and super-sampling methods further enhance anti-aliasing ability, as illustrated in Fig. 13. It is important to note that the integration and super-sampling methods are only effective when the 2D scale-adaptive filter is active.
PSNR ↑

| Method | 1 Res. | 2 Res. | 4 Res. | 8 Res. | Avg. |
| --- | --- | --- | --- | --- | --- |
| 3DGS | 33.96 | 22.47 | 18.69 | 17.32 | 23.11 |
| 3DGS+Integration | 32.57 | 22.67 | 18.74 | 17.30 | 22.82 |
| 3DGS+Super-sampling | 33.05 | 22.65 | 18.77 | 17.36 | 22.96 |
| 3DGS+Adaptive Filter | 33.96 | 27.89 | 25.32 | 24.40 | 27.89 |

SSIM ↑

| Method | 1 Res. | 2 Res. | 4 Res. | 8 Res. | Avg. |
| --- | --- | --- | --- | --- | --- |
| 3DGS | 0.974 | 0.747 | 0.514 | 0.460 | 0.674 |
| 3DGS+Integration | 0.962 | 0.754 | 0.520 | 0.463 | 0.675 |
| 3DGS+Super-sampling | 0.966 | 0.753 | 0.520 | 0.464 | 0.676 |
| 3DGS+Adaptive Filter | 0.974 | 0.840 | 0.677 | 0.615 | 0.777 |

LPIPS ↓

| Method | 1 Res. | 2 Res. | 4 Res. | 8 Res. | Avg. |
| --- | --- | --- | --- | --- | --- |
| 3DGS | 0.028 | 0.204 | 0.410 | 0.504 | 0.286 |
| 3DGS+Integration | 0.040 | 0.204 | 0.407 | 0.502 | 0.288 |
| 3DGS+Super-sampling | 0.038 | 0.205 | 0.407 | 0.501 | 0.288 |
| 3DGS+Adaptive Filter | 0.028 | 0.189 | 0.360 | 0.465 | 0.260 |

Table 8: Ablation studies for zoom-in effect on the Mip-NeRF 360 Dataset. All methods are trained on the $\nicefrac{1}{8}$ scale ($1\times$) and evaluated across four scales ($1\times$, $2\times$, $4\times$, and $8\times$). Our 2D scale-adaptive filter eliminates erosion artefacts at high rendering frequencies, as illustrated in Fig. 14. The integration and super-sampling methods are not designed for this case and perform comparably to 3DGS.
PSNR ↑

| Method | 1 Res. | 1/2 Res. | 1/4 Res. | 1/8 Res. | Avg. |
| --- | --- | --- | --- | --- | --- |
| 3DGS | 35.10 | 27.91 | 22.42 | 18.76 | 26.05 |
| 3DGS+Integration | 34.49 | 28.06 | 22.78 | 19.12 | 26.11 |
| 3DGS+Super-sampling | 34.27 | 27.10 | 21.93 | 18.39 | 25.42 |
| 3DGS+Adaptive Filter | 34.60 | 34.33 | 31.02 | 27.59 | 31.89 |
| Full Method ($SA\text{-}GS_{int}$) | 34.35 | 34.39 | 30.99 | 26.89 | 31.65 |
| Full Method ($SA\text{-}GS_{sup}$) | 34.49 | 36.58 | 37.50 | 35.64 | 36.06 |

SSIM ↑

| Method | 1 Res. | 1/2 Res. | 1/4 Res. | 1/8 Res. | Avg. |
| --- | --- | --- | --- | --- | --- |
| 3DGS | 0.974 | 0.949 | 0.862 | 0.736 | 0.880 |
| 3DGS+Integration | 0.972 | 0.948 | 0.867 | 0.745 | 0.883 |
| 3DGS+Super-sampling | 0.972 | 0.939 | 0.848 | 0.719 | 0.869 |
| 3DGS+Adaptive Filter | 0.973 | 0.977 | 0.968 | 0.947 | 0.966 |
| Full Method ($SA\text{-}GS_{int}$) | 0.972 | 0.978 | 0.971 | 0.940 | 0.965 |
| Full Method ($SA\text{-}GS_{sup}$) | 0.972 | 0.980 | 0.985 | 0.985 | 0.981 |

LPIPS ↓

| Method | 1 Res. | 1/2 Res. | 1/4 Res. | 1/8 Res. | Avg. |
| --- | --- | --- | --- | --- | --- |
| 3DGS | 0.029 | 0.033 | 0.069 | 0.133 | 0.066 |
| 3DGS+Integration | 0.032 | 0.036 | 0.072 | 0.135 | 0.069 |
| 3DGS+Super-sampling | 0.032 | 0.038 | 0.078 | 0.146 | 0.074 |
| 3DGS+Adaptive Filter | 0.031 | 0.022 | 0.036 | 0.067 | 0.039 |
| Full Method ($SA\text{-}GS_{int}$) | 0.032 | 0.020 | 0.023 | 0.039 | 0.029 |
| Full Method ($SA\text{-}GS_{sup}$) | 0.032 | 0.018 | 0.014 | 0.013 | 0.019 |

Table 9: Ablation studies for zoom-out effect on the Blender Dataset. All methods are trained on the largest scale ($1\times$) and evaluated across four scales ($1\times$, $\nicefrac{1}{2}\times$, $\nicefrac{1}{4}\times$, and $\nicefrac{1}{8}\times$). Our 2D scale-adaptive filter removes 3DGS bloat at low rendering frequencies. Additionally, our integration and super-sampling methods further enhance anti-aliasing ability, as illustrated in Fig. 13. It is important to note that the integration and super-sampling methods are only effective when the 2D scale-adaptive filter is active.
PSNR ↑

| Method | 1 Res. | 2 Res. | 4 Res. | 8 Res. | Avg. |
| --- | --- | --- | --- | --- | --- |
| 3DGS | 36.97 | 24.33 | 21.01 | 19.63 | 25.44 |
| 3DGS+Integration | 34.12 | 24.48 | 21.09 | 19.67 | 24.84 |
| 3DGS+Super-sampling | 34.01 | 24.39 | 21.02 | 19.63 | 24.76 |
| 3DGS+Adaptive Filter | 35.74 | 30.38 | 27.63 | 26.36 | 30.03 |

SSIM ↑

| Method | 1 Res. | 2 Res. | 4 Res. | 8 Res. | Avg. |
| --- | --- | --- | --- | --- | --- |
| 3DGS | 0.988 | 0.886 | 0.820 | 0.821 | 0.879 |
| 3DGS+Integration | 0.980 | 0.886 | 0.819 | 0.820 | 0.876 |
| 3DGS+Super-sampling | 0.980 | 0.884 | 0.818 | 0.820 | 0.876 |
| 3DGS+Adaptive Filter | 0.984 | 0.953 | 0.912 | 0.885 | 0.933 |

LPIPS ↓

| Method | 1 Res. | 2 Res. | 4 Res. | 8 Res. | Avg. |
| --- | --- | --- | --- | --- | --- |
| 3DGS | 0.013 | 0.065 | 0.130 | 0.159 | 0.092 |
| 3DGS+Integration | 0.025 | 0.072 | 0.133 | 0.161 | 0.098 |
| 3DGS+Super-sampling | 0.022 | 0.069 | 0.132 | 0.161 | 0.096 |
| 3DGS+Adaptive Filter | 0.016 | 0.059 | 0.111 | 0.141 | 0.082 |

Table 10: Ablation studies for zoom-in effect on the Blender Dataset. All methods are trained on the $\nicefrac{1}{8}$ scale ($1\times$) and evaluated across four scales ($1\times$, $2\times$, $4\times$, and $8\times$). Our 2D scale-adaptive filter eliminates erosion artefacts at high rendering frequencies, as illustrated in Fig. 14. The integration and super-sampling methods are not designed for this case and perform comparably to 3DGS.

I Additional Results

In this section, we provide more qualitative and quantitative results on the Mip-NeRF 360 dataset (I.1) and the Blender dataset (I.2).

I.1 Mip-NeRF 360 Dataset

We further evaluate the effect of our method on zoom-out and zoom-in settings for each scene of this dataset. The quantitative results with per-scene metrics can be found in Table 11 and Table 12. Qualitative comparisons with state-of-the-art methods are provided in Fig. 15 and Fig. 16. Our method achieves superior or comparable performance relative to the state of the art, while being training-free.

I.2 Blender Dataset

We further evaluate the effect of our method on zoom-out and zoom-in settings for each scene of this dataset. The quantitative results with per-scene metrics can be found in Table 13 and Table 14. Qualitative comparisons with state-of-the-art methods are provided in Fig. 17 and Fig. 18. Our method achieves superior or comparable performance relative to the state of the art, while being training-free.

Figure 13: Single-scale Training and Multi-scale Testing for Zoom-out Effect. All methods are trained at full resolution ($1\times$) and evaluated at the smallest resolution ($\nicefrac{1}{8}\times$) to mimic the zoom-out case. 3DGS suffers from bloat or erosion artefacts at different rendering frequencies, which can exacerbate the aliasing effect. Our 2D scale-adaptive filter maintains the scale consistency of the Gaussians across different rendering settings. Additionally, our integration and super-sampling methods further enhance the anti-aliasing ability of Gaussian scenes. It is important to note that the integration and super-sampling methods are only effective in combination with the 2D scale-adaptive filter.
Figure 14: Single-scale Training and Multi-scale Testing for Zoom-in Effect. All methods are trained at the smallest resolution ($\nicefrac{1}{8}\times$) and evaluated at full resolution ($1\times$) to mimic the zoom-in case. 3DGS suffers from bloat or erosion artefacts at different rendering frequencies, which can exacerbate the aliasing effect. Our 2D scale-adaptive filter maintains the scale consistency of the Gaussians across different rendering settings. Note that the integration and super-sampling methods are designed only for the zoom-out case, so in the zoom-in case their results remain comparable to those obtained by adding the 2D scale-adaptive filter.
PSNR ↑

| Method | bonsai | bicycle | counter | garden | kitchen | room | stump | flowers | treehill | Avg. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3DGS | 27.06 | 21.16 | 26.13 | 23.33 | 26.69 | 29.15 | 25.00 | 20.38 | 21.37 | 24.47 |
| 3DGS(MS) | 22.27 | 26.82 | 26.66 | 21.52 | 24.62 | 25.64 | 30.17 | 24.55 | 22.38 | 24.96 |
| Mip-Splatting | 26.44 | 32.96 | 30.42 | 25.63 | 30.54 | 32.88 | 34.01 | 30.86 | 25.47 | 29.91 |
| $SA\text{-}GS_{fil}$ (ours) | 31.74 | 24.79 | 29.56 | 27.28 | 30.74 | 33.37 | 28.98 | 23.62 | 23.98 | 28.23 |
| $SA\text{-}GS_{int}$ (ours) | 32.63 | 26.21 | 29.41 | 30.22 | 32.51 | 33.85 | 30.58 | 25.25 | 25.14 | 29.53 |
| $SA\text{-}GS_{sup}$ (ours) | 34.13 | 27.74 | 31.25 | 31.97 | 34.19 | 35.63 | 31.89 | 26.14 | 26.01 | 31.00 |

SSIM ↑

| Method | bonsai | bicycle | counter | garden | kitchen | room | stump | flowers | treehill | Avg. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3DGS | 0.872 | 0.687 | 0.847 | 0.715 | 0.835 | 0.887 | 0.744 | 0.650 | 0.687 | 0.769 |
| 3DGS(MS) | 0.718 | 0.873 | 0.867 | 0.686 | 0.764 | 0.859 | 0.914 | 0.737 | 0.709 | 0.792 |
| Mip-Splatting | 0.876 | 0.962 | 0.942 | 0.820 | 0.934 | 0.959 | 0.960 | 0.916 | 0.836 | 0.912 |
| $SA\text{-}GS_{fil}$ (ours) | 0.940 | 0.813 | 0.921 | 0.835 | 0.900 | 0.946 | 0.874 | 0.774 | 0.788 | 0.866 |
| $SA\text{-}GS_{int}$ (ours) | 0.960 | 0.872 | 0.906 | 0.928 | 0.957 | 0.959 | 0.911 | 0.814 | 0.832 | 0.904 |
| $SA\text{-}GS_{sup}$ (ours) | 0.966 | 0.888 | 0.947 | 0.943 | 0.965 | 0.966 | 0.923 | 0.827 | 0.845 | 0.919 |

LPIPS ↓

| Method | bonsai | bicycle | counter | garden | kitchen | room | stump | flowers | treehill | Avg. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3DGS | 0.140 | 0.239 | 0.143 | 0.169 | 0.123 | 0.134 | 0.218 | 0.269 | 0.264 | 0.189 |
| 3DGS(MS) | 0.228 | 0.154 | 0.148 | 0.251 | 0.183 | 0.140 | 0.131 | 0.234 | 0.261 | 0.192 |
| Mip-Splatting | 0.136 | 0.086 | 0.094 | 0.183 | 0.061 | 0.060 | 0.088 | 0.118 | 0.185 | 0.112 |
| $SA\text{-}GS_{fil}$ (ours) | 0.109 | 0.183 | 0.120 | 0.135 | 0.108 | 0.112 | 0.154 | 0.215 | 0.225 | 0.151 |
| $SA\text{-}GS_{int}$ (ours) | 0.088 | 0.139 | 0.119 | 0.066 | 0.062 | 0.090 | 0.122 | 0.185 | 0.188 | 0.118 |
| $SA\text{-}GS_{sup}$ (ours) | 0.084 | 0.125 | 0.091 | 0.054 | 0.055 | 0.083 | 0.110 | 0.177 | 0.178 | 0.106 |

Table 11: Single-scale Training and Multi-scale Testing on the Mip-NeRF 360 Dataset. For each scene, we report the arithmetic mean of each metric averaged over the 4 scales ($1\times$, $\nicefrac{1}{2}\times$, $\nicefrac{1}{4}\times$, $\nicefrac{1}{8}\times$) used in the dataset. (MS) means multi-scale training.
PSNR ↑

| Method | bonsai | bicycle | counter | garden | kitchen | room | stump | flowers | treehill | Avg. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3DGS | 24.52 | 20.88 | 24.64 | 22.57 | 24.31 | 27.85 | 22.45 | 19.87 | 20.89 | 23.11 |
| 3DGS(MS) | 22.27 | 26.82 | 26.66 | 21.52 | 24.62 | 25.64 | 30.17 | 24.55 | 22.38 | 24.96 |
| Mip-Splatting | 25.85 | 30.67 | 29.43 | 25.17 | 28.37 | 30.22 | 33.10 | 28.95 | 25.69 | 28.60 |
| $SA\text{-}GS_{fil}$ (ours) | 30.12 | 25.07 | 28.80 | 27.83 | 29.29 | 32.16 | 28.23 | 24.42 | 25.12 | 27.89 |

SSIM ↑

| Method | bonsai | bicycle | counter | garden | kitchen | room | stump | flowers | treehill | Avg. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3DGS | 0.769 | 0.593 | 0.771 | 0.616 | 0.720 | 0.850 | 0.582 | 0.559 | 0.603 | 0.674 |
| 3DGS(MS) | 0.718 | 0.873 | 0.867 | 0.686 | 0.764 | 0.859 | 0.914 | 0.737 | 0.709 | 0.792 |
| Mip-Splatting | 0.733 | 0.898 | 0.877 | 0.704 | 0.747 | 0.828 | 0.921 | 0.779 | 0.728 | 0.802 |
| $SA\text{-}GS_{fil}$ (ours) | 0.878 | 0.705 | 0.862 | 0.716 | 0.790 | 0.910 | 0.748 | 0.673 | 0.707 | 0.777 |

LPIPS ↓

| Method | bonsai | bicycle | counter | garden | kitchen | room | stump | flowers | treehill | Avg. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3DGS | 0.250 | 0.316 | 0.238 | 0.292 | 0.266 | 0.207 | 0.326 | 0.336 | 0.345 | 0.286 |
| 3DGS(MS) | 0.228 | 0.154 | 0.148 | 0.251 | 0.183 | 0.140 | 0.131 | 0.234 | 0.261 | 0.192 |
| Mip-Splatting | 0.275 | 0.170 | 0.188 | 0.290 | 0.248 | 0.198 | 0.165 | 0.249 | 0.302 | 0.232 |
| $SA\text{-}GS_{fil}$ (ours) | 0.196 | 0.305 | 0.210 | 0.281 | 0.237 | 0.191 | 0.281 | 0.318 | 0.325 | 0.260 |

Table 12: Single-scale Training and Multi-scale Testing on the Mip-NeRF 360 Dataset. For each scene, we report the arithmetic mean of each metric averaged over the 4 scales ($1\times$, $2\times$, $4\times$, $8\times$) used in the dataset. (MS) means multi-scale training.
PSNR ↑

| Method | chair | drums | ficus | hotdog | lego | materials | mic | ship | Avg. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3DGS | 25.04 | 22.09 | 26.64 | 28.43 | 26.12 | 26.55 | 28.01 | 25.51 | 26.05 |
| 3DGS(MS) | 30.47 | 26.16 | 29.96 | 34.92 | 30.91 | 30.41 | 32.97 | 31.45 | 30.91 |
| Mip-Splatting | 32.57 | 26.74 | 32.92 | 34.44 | 33.39 | 31.53 | 35.69 | 32.36 | 32.45 |
| $SA\text{-}GS_{fil}$ (ours) | 32.13 | 26.71 | 32.48 | 35.66 | 33.05 | 30.90 | 33.78 | 30.39 | 31.89 |
| $SA\text{-}GS_{int}$ (ours) | 31.38 | 26.23 | 32.32 | 33.73 | 32.33 | 31.00 | 34.84 | 31.41 | 31.65 |
| $SA\text{-}GS_{sup}$ (ours) | 37.89 | 28.87 | 36.34 | 39.40 | 38.59 | 33.01 | 39.64 | 34.70 | 36.06 |

SSIM ↑

| Method | chair | drums | ficus | hotdog | lego | materials | mic | ship | Avg. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3DGS | 0.893 | 0.837 | 0.913 | 0.914 | 0.865 | 0.898 | 0.905 | 0.818 | 0.880 |
| 3DGS(MS) | 0.975 | 0.943 | 0.965 | 0.983 | 0.967 | 0.962 | 0.974 | 0.923 | 0.962 |
| Mip-Splatting | 0.984 | 0.950 | 0.983 | 0.981 | 0.979 | 0.973 | 0.981 | 0.925 | 0.970 |
| $SA\text{-}GS_{fil}$ (ours) | 0.977 | 0.953 | 0.981 | 0.985 | 0.975 | 0.972 | 0.981 | 0.904 | 0.966 |
| $SA\text{-}GS_{int}$ (ours) | 0.978 | 0.943 | 0.981 | 0.978 | 0.974 | 0.970 | 0.977 | 0.919 | 0.965 |
| $SA\text{-}GS_{sup}$ (ours) | 0.994 | 0.968 | 0.991 | 0.992 | 0.993 | 0.980 | 0.994 | 0.935 | 0.981 |

LPIPS ↓

| Method | chair | drums | ficus | hotdog | lego | materials | mic | ship | Avg. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3DGS | 0.047 | 0.090 | 0.059 | 0.040 | 0.073 | 0.051 | 0.044 | 0.124 | 0.066 |
| 3DGS(MS) | 0.022 | 0.057 | 0.033 | 0.019 | 0.034 | 0.036 | 0.027 | 0.083 | 0.039 |
| Mip-Splatting | 0.011 | 0.042 | 0.013 | 0.014 | 0.016 | 0.020 | 0.011 | 0.072 | 0.025 |
| $SA\text{-}GS_{fil}$ (ours) | 0.029 | 0.051 | 0.020 | 0.026 | 0.032 | 0.027 | 0.031 | 0.095 | 0.039 |
| $SA\text{-}GS_{int}$ (ours) | 0.016 | 0.046 | 0.014 | 0.018 | 0.021 | 0.021 | 0.015 | 0.077 | 0.029 |
| $SA\text{-}GS_{sup}$ (ours) | 0.006 | 0.035 | 0.008 | 0.010 | 0.007 | 0.017 | 0.006 | 0.066 | 0.019 |

Table 13: Single-scale Training and Multi-scale Testing on the Blender Dataset. For each scene, we report the arithmetic mean of each metric averaged over the 4 scales ($1\times$, $\nicefrac{1}{2}\times$, $\nicefrac{1}{4}\times$, $\nicefrac{1}{8}\times$) used in the dataset. (MS) means multi-scale training.
PSNR \uparrow
chair drums ficus hotdog lego materials mic ship Avg.
3DGS 24.26 22.14 23.72 27.87 23.99 25.45 29.22 26.86 25.44
3DGS(MS) 30.47 26.16 29.96 34.92 30.91 30.41 32.97 31.45 30.91
Mip-Splatting 30.11 25.76 28.51 34.65 29.54 30.05 33.79 30.66 30.39
SA-GSfil(ours)𝑆𝐴-𝐺subscript𝑆𝑓𝑖𝑙𝑜𝑢𝑟𝑠SA\text{-}GS_{fil}(ours)italic_S italic_A - italic_G italic_S start_POSTSUBSCRIPT italic_f italic_i italic_l end_POSTSUBSCRIPT ( italic_o italic_u italic_r italic_s ) 30.42 25.29 27.54 33.89 29.44 29.68 33.87 30.11 30.03
SSIM \uparrow
chair drums ficus hotdog lego materials mic ship Avg.
3DGS 0.891 0.857 0.903 0.915 0.834 0.883 0.921 0.823 0.879
3DGS(MS) 0.975 0.943 0.965 0.983 0.967 0.962 0.974 0.923 0.962
Mip-Splatting 0.951 0.921 0.947 0.971 0.930 0.951 0.965 0.880 0.939
SA-GS$_{fil}$ (ours) 0.949 0.912 0.933 0.967 0.925 0.946 0.965 0.871 0.933
LPIPS ↓
chair drums ficus hotdog lego materials mic ship Avg.
3DGS 0.071 0.106 0.068 0.073 0.122 0.084 0.063 0.146 0.092
3DGS(MS) 0.022 0.057 0.033 0.019 0.034 0.036 0.027 0.083 0.039
Mip-Splatting 0.052 0.091 0.059 0.048 0.088 0.060 0.049 0.138 0.073
SA-GS$_{fil}$ (ours) 0.057 0.102 0.074 0.055 0.091 0.068 0.052 0.154 0.082
Table 14: Single-scale Training and Multi-scale Testing on the Blender Dataset. For each scene, we report the arithmetic mean of each metric over the four scales (1×, 2×, 4×, 8×) used in the dataset. (MS) denotes multi-scale training.
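The averaging described in the table captions can be illustrated with a minimal sketch. It assumes that each per-scene entry is the unweighted mean of a metric over the four evaluated scales and that the Avg. column is the unweighted mean over the eight scenes; the latter assumption is consistent with the 3DGS and 3DGS(MS) PSNR rows of Table 14, whose values are copied below. The helper names `scene_score` and `average_column` and the per-scale example values are hypothetical and are not taken from the released code.

```python
from statistics import mean

SCENES = ["chair", "drums", "ficus", "hotdog", "lego", "materials", "mic", "ship"]

# Per-scene PSNR values copied from the 3DGS and 3DGS(MS) rows of Table 14.
PSNR_ROWS = {
    "3DGS": [24.26, 22.14, 23.72, 27.87, 23.99, 25.45, 29.22, 26.86],
    "3DGS(MS)": [30.47, 26.16, 29.96, 34.92, 30.91, 30.41, 32.97, 31.45],
}


def scene_score(per_scale_values):
    """Arithmetic mean of one metric over the evaluated scales of one scene."""
    return mean(per_scale_values)


def average_column(per_scene_values):
    """Arithmetic mean over scenes, rounded to two decimals as in the PSNR table."""
    return round(mean(per_scene_values), 2)


# Hypothetical per-scale PSNR values for a single scene (1x, 2x, 4x, 8x).
# These numbers are placeholders for illustration, not results from the paper.
example_scene = scene_score([30.0, 28.0, 26.0, 24.0])

if __name__ == "__main__":
    print("example scene score:", example_scene)
    for method, row in PSNR_ROWS.items():
        assert len(row) == len(SCENES)
        print(method, "Avg.:", average_column(row))
```

Under these assumptions the sketch gives 25.44 for 3DGS and 30.91 for 3DGS(MS), matching the Avg. column. Because the per-scene entries are themselves rounded, a recomputed average can differ from the published one in the last digit for other rows.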
Figure 15: Single-scale Training and Multi-scale Testing on the Mip-NeRF 360 Dataset. All methods are trained at full resolution (1×) and evaluated at the smallest resolution (1/8×) to mimic the zoom-out case. Our 2D scale-adaptive filter keeps the Gaussians consistent across different rendering settings. SA-GS$_{int}$ achieves results comparable to Mip-Splatting, while SA-GS$_{sup}$ surpasses Mip-Splatting and gives the best performance in this setting.
Figure 16: Single-scale Training and Multi-scale Testing on the Mip-NeRF 360 Dataset. All methods are trained at the smallest resolution (1/8×) and evaluated at full resolution (1×) to mimic the zoom-in case. SA-GS$_{fil}$ achieves performance comparable to Mip-Splatting, even though our 2D scale-adaptive filter is training-free and adds no computational burden.
Figure 17: Single-scale Training and Multi-scale Testing on the Blender Dataset. All methods are trained at full resolution (1×) and evaluated at the smallest resolution (1/8×) to mimic the zoom-out case. Our 2D scale-adaptive filter keeps the Gaussians consistent across different rendering settings. SA-GS$_{int}$ achieves results comparable to Mip-Splatting, while SA-GS$_{sup}$ surpasses Mip-Splatting and gives the best performance in this setting.
Figure 18: Single-scale Training and Multi-scale Testing on the Blender Dataset. All methods are trained at the smallest resolution (1/8×) and evaluated at full resolution (1×) to mimic the zoom-in case. SA-GS$_{fil}$ achieves performance comparable to Mip-Splatting, even though our 2D scale-adaptive filter is training-free and adds no computational burden.