Multi-Scale 3D Gaussian Splatting for Anti-Aliased Rendering
Abstract
3D Gaussians have recently emerged as a highly efficient representation for 3D reconstruction and rendering. Despite their high rendering quality and speed at high resolutions, both deteriorate drastically when the scene is rendered at lower resolutions or from distant camera positions. During low-resolution or distant rendering, the sampling frequency set by the pixel size can fall below the Nyquist frequency required by the screen size of each splatted 3D Gaussian, which leads to aliasing. The rendering is also drastically slowed down by the sequential alpha blending of more splatted Gaussians per pixel. To address these issues, we propose a multi-scale 3D Gaussian splatting algorithm, which maintains Gaussians at different scales to represent the same scene. Higher-resolution images are rendered with more small Gaussians, and lower-resolution images are rendered with fewer larger Gaussians. With similar training time, our algorithm achieves 13%-66% PSNR and 160%-2400% rendering speed improvements at 4×-128× scale rendering on the Mip-NeRF360 dataset compared to single-scale 3D Gaussian splatting. Project website: https://jokeryan.github.io/projects/ms-gs/.
1 Introduction
3D Gaussian Splatting [12] has recently emerged as a highly efficient representation for novel view synthesis. Compared to the time-consuming ray marching used in most neural radiance fields (NeRF) [15, 16, 2], a high-resolution image can be rendered in real time by rasterizing the splatted 3D Gaussians. However, this rasterization algorithm suffers from severe aliasing and speed deterioration when rendering the same scene at low resolution or from distant positions, as shown in Fig. 1. This limitation significantly constrains the application of the 3D Gaussian splatting algorithm in reconstructing and rendering large-scale scenes.
Aliasing is a consequence of a sampling frequency that is too low to capture the continuous signal accurately. In the context of rendering, image pixels are sampled with an interval of one pixel size. The signal can be considered as the 3D scene, represented implicitly as in NeRF or explicitly as in 3D Gaussians. When part of the 3D scene is represented in high detail but rendered at low resolution or from distant positions, the disparity between the low sampling frequency and the high signal frequency produces aliasing artifacts. A naive solution is to render at high resolution and subsequently downscale the rendered image to a lower resolution. However, this solution is not viable for scenes containing both near and far regions, which are very common. Because the 3D Gaussian splatting algorithm cannot accommodate varying resolutions within a single image, rendering the entire image at an even higher resolution for the sake of faraway regions is neither time nor memory efficient.
We postulate that the pronounced aliasing artifacts observed when rendering with 3D Gaussians, as opposed to other techniques such as NeRF, are primarily attributable to the splatting of small Gaussians. 3D regions with intricate details are represented with a large number of small Gaussians. When rendering these regions at low resolution or from a distant view, many splatted small Gaussians are crammed into one pixel, and the pixel color of this region is therefore dominated by the front-most Gaussian, even if this Gaussian is much smaller than the others and not at the center. This problem is further aggravated by the low-pass filter in [12, 19], which is applied to each individual Gaussian with the intention of mitigating aliasing on edges at high resolutions. This problem is explained in more detail in Sec. 3.2.
In addition to the aliasing artifacts, the rendering speed of 3D Gaussians is also affected at low resolution. The number of 3D Gaussians that need to be rendered remains constant at lower resolutions, but they are concentrated in fewer pixels. The Gaussians that are splatted to the same pixel cannot be rendered in parallel. This means that image rendering is even slower at lower resolution, whereas NeRF rendering time reduces linearly with decreasing resolution. Hence, although aliasing is not a problem exclusive to 3D Gaussian splatting, it is more prominent and more difficult to tackle.
Contributions
To mitigate the aliasing problem for 3D Gaussian splatting, we propose novel multi-scale 3D Gaussians that represent the scene at different levels of detail (LOD), as shown in Fig. 2. This is inspired by the mipmap and LOD algorithms widely used in computer graphics, which pre-compute textures and polygons at different scales to be rendered at different resolutions and distances. Similarly, we add larger, coarser Gaussians for lower resolutions by aggregating the smaller, finer Gaussians from higher resolutions. Depending on the pixel coverage of the splatted Gaussians during rendering, only a subset of the Gaussians is used. Put simply, the coarse Gaussians are used to render low-resolution images and the fine Gaussians are used to render high-resolution images. With fewer than 5% additional Gaussians and a similar training time, our method achieves 13%-66% PSNR and 160%-2400% rendering speed improvements at 4×-128× scale rendering on the Mip-NeRF360 dataset [2], while maintaining comparable rendering quality and speed at 1× scale.
2 Related Works
2.1 Anti-Aliasing in Computer Graphics
Aliasing is a long-standing problem in computer graphics when rendering a scene to a discrete image. Traditional anti-aliasing techniques primarily target mesh representations. Supersampling Anti-Aliasing (SSAA) [5] renders the scene at a higher resolution before downscaling, which demands significantly more time and memory and is therefore less used in real-time applications. The Multisample Anti-Aliasing (MSAA) [1, 5] algorithm selectively supersamples pixels on the edges, reducing resource and time consumption. This technique is not very suitable for 3D Gaussian splatting, which requires regular grids and lacks support for variable sampling resolution at different pixels. The more recent Fast Approximate Anti-Aliasing (FXAA) [14, 11] is a post-processing algorithm that smooths the jagged edges after the image is rendered. Unfortunately, this technique is also not suitable for the Gaussian representation, as the front-most Gaussian dominates the pixel color and produces chunky artifacts rather than the jagged artifacts seen in mesh rendering.
In contrast to the supersampling methods mentioned above, our method takes inspiration from hierarchical mipmap [18] and level of detail (LOD) [9, 7] algorithms to address aliasing for 3D Gaussians. Mipmap uses multi-scale textures for rendering at different resolutions or from different distances. The LOD algorithm represents the models in a scene with different complexity to be rendered at different distances. Both techniques not only mitigate the aliasing effect by reducing the complexity of the scene representation, but also enhance rendering speed, particularly for large-scale scenes.
2.2 Anti-Aliasing in Neural Representation
The recent success of neural representations, especially Neural Radiance Fields (NeRF) [15, 16, 6], has also inspired works that develop anti-aliasing algorithms for neural representations beyond the traditional mesh representation. Mip-NeRF [2, 3] employs low-pass filters on the positional encoding of the input spatial coordinates to reduce the scene signal frequency. Building on the hash grid representation used by InstantNGP [16], which has no positional encoding, Zip-NeRF [4] proposes a multi-sampling strategy in the conical frustum instead of along the camera ray, at the cost of 6× rendering time. Similar to the mipmap algorithm in mesh texture rendering, Tri-MipNeRF [10] and MipGrid [17] propose to use multi-scale feature grids for rendering at different resolutions or distances.
Conversely, 3D Gaussian splatting [12] presents unique anti-aliasing challenges due to its distinct scene representation. It does not have any positional encoding or feature grid, and its requirement for regular grids conflicts with the more flexible multi-sampling strategies. The concentration of small Gaussians in detail-rich regions exacerbates aliasing and speed issues, more so than in NeRF representations. To the best of our knowledge, we are the first to propose an anti-aliasing algorithm for scene reconstruction using 3D Gaussian splatting.
3 Preliminaries
3.1 3D Gaussian Splatting
3D Gaussian splatting was first proposed in EWA Splatting [19], and later used by [12] for scene reconstruction and novel view synthesis. The scene is represented by a set of 3D Gaussians, each with a covariance, a center, a density and a color. During rendering, the 3D Gaussians are splatted onto the 2D screen by the perspective transformation to form 2D Gaussians. The image is then divided into regular tiles of 16×16 pixels, and all 2D Gaussians touching each tile are sorted based on their original depth. The color of each pixel in the tile is then rasterized by sequentially alpha blending the 2D Gaussians from front to back.
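The front-to-back alpha blending described above can be sketched for a single pixel as follows. This is a minimal illustration rather than the rasterizer of [12]: the function name, the early-exit cutoff of `1e-4`, and the assumption that each opacity has already been multiplied by the 2D Gaussian falloff at the pixel are all choices made for this example.

```python
def composite_front_to_back(colors, alphas):
    """Blend depth-sorted samples for one pixel, nearest Gaussian first.

    colors: list of (r, g, b) tuples.
    alphas: list of opacities in [0, 1], assumed to already include the
            2D Gaussian falloff evaluated at this pixel.
    Returns the blended color and the remaining transmittance.
    """
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed
    for color, alpha in zip(colors, alphas):
        weight = alpha * transmittance
        for ch in range(3):
            pixel[ch] += weight * color[ch]
        # The transmittance depends on every Gaussian in front,
        # so this loop is inherently sequential per pixel.
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # hypothetical cutoff: pixel is effectively opaque
            break
    return tuple(pixel), transmittance
```

Because the transmittance is accumulated incrementally, the cost per pixel grows with the number of Gaussians that land on it, which is the source of the slowdown discussed in Sec. 3.2.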
3.2 Cause of Aliasing in 3D Gaussian Splatting
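Aliasing can occur when sampling a continuous signal $f(x)$ with a discrete sampling function $s(x) = \sum_{n=-\infty}^{\infty} \delta(x - nT)$, where $\delta$ is an impulse function and $T$ is the sampling interval. The result of the sampling in the spatial domain is:
$$f_s(x) = f(x)\, s(x) = \sum_{n=-\infty}^{\infty} f(nT)\, \delta(x - nT) \quad (1)$$
This sampled function, converted into the frequency domain using the Fourier transform operator $\mathcal{F}$, becomes:
$$F_s(u) = \mathcal{F}\{f(x)\, s(x)\} = F(u) \otimes \frac{1}{T} \sum_{k=-\infty}^{\infty} \delta\!\left(u - \frac{k}{T}\right) = \frac{1}{T} \sum_{k=-\infty}^{\infty} F\!\left(u - \frac{k}{T}\right) \quad (2)$$
where $F(u) = \mathcal{F}\{f(x)\}$ and $\otimes$ denotes convolution. When the highest frequency component $u_{\max}$ of the signal is greater than half of the sampling frequency $1/T$, the shifted copies $F(u - k/T)$ in the summation overlap with each other and cause the sampled signal to diverge from the actual signal. This phenomenon is the aliasing effect, and the minimum sampling frequency needed to avoid aliasing is $2u_{\max}$, known as the Nyquist frequency.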
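As a small numerical illustration of the sampling argument above, the sketch below folds a sinusoid's frequency into the band that sampling can represent and checks that an under-sampled cosine is indistinguishable from its alias. The function names are hypothetical and the one-sample-per-pixel rate is used only as an example.

```python
import math

def alias_frequency(signal_freq, sampling_freq):
    """Apparent frequency of a sinusoid after sampling at sampling_freq."""
    folded = signal_freq % sampling_freq
    # Frequencies above half the sampling rate fold back below it.
    return min(folded, sampling_freq - folded)

def sample_cosine(freq, sampling_freq, n):
    """First n samples of cos(2*pi*freq*x) taken at the given sampling rate."""
    return [math.cos(2.0 * math.pi * freq * k / sampling_freq) for k in range(n)]

# With one sample per pixel, 0.8 cycles/pixel exceeds the 0.5 limit
# and shows up as 0.2 cycles/pixel.
aliased = alias_frequency(0.8, 1.0)
```

Sampling `sample_cosine(0.8, 1.0, n)` and `sample_cosine(0.2, 1.0, n)` produces the same values, so the high-frequency detail is not merely lost but replaced by a false low-frequency pattern.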
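The EWA splatting [19] used by 3D Gaussian splatting [12] also tries to mitigate the aliasing problem by applying a low-pass filter to the rendered color. To approximate this efficiently, it applies a Gaussian kernel $h$ as the low-pass filter on each splatted 2D signal independently to produce a band-limited signal:
$$\hat{c}(\mathbf{x}) = \sum_{k} \alpha_k\, c_k\, T_k \int_{\boldsymbol{\eta} \in p} \mathcal{G}_k(\mathbf{x} - \boldsymbol{\eta})\, h(\boldsymbol{\eta})\, d\boldsymbol{\eta} \quad (3)$$
where $p$ is the range of one pixel, $\mathcal{G}_k$ is the 2D integrated Gaussian kernel, and $\alpha_k$, $c_k$, and $T_k$ are the opacity, color, and transmittance at each Gaussian, respectively. By combining the reconstruction Gaussian kernel and the low-pass Gaussian kernel, with covariance matrices $\Sigma_k$ and $\Sigma_h$, the band-limited function becomes:
$$\hat{c}(\mathbf{x}) = \sum_{k} w_k\, \mathcal{G}_{\Sigma_k + \Sigma_h}(\mathbf{x} - \boldsymbol{\mu}_k) \quad (4)$$
where $w_k$ represents all coefficients invariant of $\mathbf{x}$ at each Gaussian, $\boldsymbol{\mu}_k$ is the splatted center, and $\Sigma_h$ is determined by the screen pixel size. A simple understanding of this is that the covariance of each 3D Gaussian is increased based on the screen pixel size.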
This method of applying a low-pass filter to each 3D Gaussian independently helps to smooth the edges of the Gaussians when they are not too small compared to the pixel size. However, it also gives rise to two substantial issues at low resolutions:
1. The low-pass covariance added to the original covariance effectively increases the extent of each Gaussian, especially when it is large compared to the original covariance at low resolutions. Small Gaussians in the front dominate the color of the pixel and cause the severe artifacts shown in Fig. 7.
2. The number of Gaussians involved in the sequential alpha blending for each pixel increases with decreasing image resolution. Due to the incremental calculation of the transmittance, the rendering is even slower at lower resolutions.
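The first issue can be made concrete with a 1D sketch: convolving a Gaussian with a Gaussian low-pass kernel adds their variances, so a fixed low-pass variance inflates a Gaussian more the smaller it is on screen. The low-pass variance of `0.3` squared pixels and the function names below are assumptions chosen for illustration.

```python
import math

def dilated_std(std_px, lowpass_var=0.3):
    """Standard deviation (pixels) after adding a fixed low-pass variance."""
    return math.sqrt(std_px ** 2 + lowpass_var)

def growth_per_scale(std_px_full_res, downsample_scales, lowpass_var=0.3):
    """Relative growth of one Gaussian's extent at each downsampling scale.

    Returns (scale, on-screen std before filtering, growth factor) tuples.
    """
    rows = []
    for scale in downsample_scales:
        std = std_px_full_res / scale  # the footprint shrinks with resolution
        rows.append((scale, std, dilated_std(std, lowpass_var) / std))
    return rows

# A Gaussian with a 2-pixel std at full resolution, viewed at 1x to 64x.
rows = growth_per_scale(2.0, [1, 4, 16, 64])
```

Under these assumptions the Gaussian is barely changed at full resolution but is inflated many times over at the coarsest scale, which is how a tiny foreground Gaussian comes to dominate a pixel.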
4 Our Method
4.1 Multi-Scale Gaussians Based on Pixel Coverage
To mitigate the aliasing artifacts of 3D Gaussians [12] while avoiding the two problems of EWA splatting [19], we introduce multi-scale 3D Gaussians (cf. Fig. 2) that tackle the problem at the scene level instead of on each individual Gaussian. The 3D scene is represented with Gaussians from 4 levels of detail, corresponding to the 1×, 4×, 16×, and 64× downsampled resolutions. Small, finer-level Gaussians are aggregated to create larger Gaussians for coarser levels during training. Each 3D Gaussian belongs to one of the levels and is included or excluded independently during rendering based on its “pixel coverage”.
Pixel Coverage of Gaussian.
The “pixel coverage” of a Gaussian reflects the size of the Gaussian when splatted onto the screen space compared to the pixel size at the current rendering resolution. The “pixel coverage” of a splatted 2D Gaussian is defined as the length of its horizontal or vertical axis up to a low-opacity level set, whichever is smaller, as shown in Fig. 3. The pixel coverage is measured in pixels, and the level set is defined by a fixed opacity threshold.
The pixel coverage approximates the extent of a 2D splatted Gaussian in the spatial domain. During rendering from a given camera direction, the color of each splatted Gaussian is constant within this pixel coverage. As a result, the pixel coverage approximates the inverse of the highest frequency component in this region. Since rasterization samples the image once per pixel, a Gaussian that covers too few pixels carries a signal frequency high enough for the sampling frequency to fall below the Nyquist frequency needed to avoid aliasing.
Consequently, Gaussians whose pixel coverage is below a threshold should be filtered out during rendering to avoid aliasing. Since the 3D Gaussian representation does not encode signals of different frequencies in different Gaussians, naively filtering out the small Gaussians results in holes or missing parts in the scene, as shown in Fig. 4. To address this issue, we propose to aggregate the small Gaussians to form large Gaussians that encode the low-frequency signal. These large Gaussians appear when the small Gaussians are filtered out.
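One way to compute such a pixel coverage is sketched below. It rests on two assumptions that are not stated in the text here: the opacity threshold is taken as `1/255`, and the horizontal and vertical extents are measured along lines through the Gaussian center. The function name and argument layout are likewise hypothetical.

```python
import math

def pixel_coverage(cov2d, opacity, threshold=1.0 / 255.0):
    """Smaller of the horizontal and vertical extents, in pixels.

    cov2d: ((a, b), (b, c)) screen-space covariance in squared pixels.
    opacity: peak opacity of the splatted Gaussian.
    threshold: assumed opacity level that bounds the visible footprint.
    """
    (a, b), (_, c) = cov2d
    if opacity <= threshold:
        return 0.0  # never rises above the level set
    det = a * c - b * b
    # Diagonal entries of the inverse covariance.
    inv_xx = c / det
    inv_yy = a / det
    # Solve opacity * exp(-0.5 * inv * d^2) = threshold for the half-extent d.
    k = 2.0 * math.log(opacity / threshold)
    width = 2.0 * math.sqrt(k / inv_xx)
    height = 2.0 * math.sqrt(k / inv_yy)
    return min(width, height)
```

For an isotropic Gaussian with a standard deviation of one pixel and full opacity, this gives a coverage of roughly 6.7 pixels under the assumed threshold.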
Aggregate to Insert Large Gaussians.
All 3D Gaussians initialized from the input point cloud at the start of training belong to the finest level. They are densified by splitting and cloning as in [12], and all densified Gaussians inherit the same level. After the warm-up stage of the first 1,000 iterations, we introduce coarse-level Gaussians by aggregating fine-level Gaussians that are too small, as visualized in Fig. 5 and described in Algorithm 1. The procedure is outlined as follows:
1. For each coarse level, we render all 3D Gaussians from the finer levels at that level's downsampled resolution for all training images. All 3D Gaussians whose minimal “pixel coverage” is smaller than the filter threshold are chosen for aggregation.
2. The chosen 3D Gaussians are binned into a voxel grid based on their positions. The attributes of all Gaussians within each voxel, including position, scaling, opacity and color, are aggregated by average pooling to create a new Gaussian. More details are included in the supplementary.
3. Based on the average “pixel coverage” of the Gaussians in each voxel, the scaling of each new Gaussian is enlarged so that it is of a size suitable for rendering at that level's resolution. The new Gaussian belongs to that coarse level.
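The three steps above can be sketched in plain Python as follows. This is an illustrative approximation, not Algorithm 1 itself: the dictionary layout, the `threshold` and `voxel_size` arguments, and in particular the enlargement factor `threshold / mean coverage` are assumptions made for this example.

```python
from collections import defaultdict

def aggregate_small_gaussians(gaussians, coverages, threshold, voxel_size):
    """Merge Gaussians that are too small into one larger Gaussian per voxel.

    gaussians: list of dicts with 'position', 'scale', 'color' (3-tuples)
               and 'opacity' (float).
    coverages: minimal pixel coverage of each Gaussian over the training
               views, measured at the coarse level's resolution.
    """
    # Steps 1-2: keep only the small Gaussians and bin them by voxel.
    bins = defaultdict(list)
    for gaussian, coverage in zip(gaussians, coverages):
        if coverage < threshold:
            key = tuple(int(p // voxel_size) for p in gaussian["position"])
            bins[key].append((gaussian, coverage))

    def mean_vec(vectors):
        n = len(vectors)
        return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))

    merged = []
    for members in bins.values():
        group = [g for g, _ in members]
        mean_coverage = sum(c for _, c in members) / len(members)
        # Step 3 (assumed rule): enlarge so the coverage reaches the threshold.
        enlarge = threshold / max(mean_coverage, 1e-8)
        mean_scale = mean_vec([g["scale"] for g in group])
        merged.append({
            "position": mean_vec([g["position"] for g in group]),
            "scale": tuple(s * enlarge for s in mean_scale),
            "opacity": sum(g["opacity"] for g in group) / len(group),
            "color": mean_vec([g["color"] for g in group]),
        })
    return merged
```

Gaussians that already cover enough pixels never enter a bin, which matches the observation that large background Gaussians do not need to be aggregated.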
Not all Gaussians from the fine levels are small. Many Gaussians in the background or in the textureless regions are large and do not need to be aggregated. The number of Gaussians created is often fewer than 5% of the final total number of Gaussians.
Multi-Scale Training and Selective Rendering.
After the large Gaussians are added, the model is trained with both the original images and the downsampled images. A maximum and a minimum pixel coverage are stored for each Gaussian for selective rendering. If the rendering downsample scale equals the downsample scale at which the Gaussian was created, its stored maximum and minimum values are updated with the new pixel coverage:
(5) | ||||
where the two decay coefficients take empirical values.
During rendering at any resolution or camera distance, a Gaussian is selected for rendering if its pixel coverage on the screen satisfies the following condition:
(6) |
where the maximum and minimum relative pixel coverage thresholds take empirical values. If the pixel coverage of a Gaussian is much larger than its stored maximum, it is filtered out from rendering. Similarly, if it is much smaller than its stored minimum and also smaller than an absolute threshold, it is filtered out from rendering (cf. Fig. 6). The absolute threshold is used to preserve the large Gaussians from the lower scales, as they do not cause aliasing unless their screen size is sufficiently small. This selective rendering procedure is described in Algorithm 2. Additionally, Gaussians from the finest level are never filtered for being too large, and Gaussians from the coarsest level are never filtered for being too small, so that rendering remains possible beyond the maximum and below the minimum training resolutions.
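The selection rule described in this paragraph can be sketched as a single predicate. The multiplicative form of the relative thresholds, the parameter names, and any numeric values passed in are assumptions for illustration; the empirical values used in the paper are not reproduced here.

```python
def is_selected(coverage, stored_min, stored_max, rel_min, rel_max,
                abs_threshold, is_finest=False, is_coarsest=False):
    """Decide whether one Gaussian is rendered at the current view.

    coverage: pixel coverage of the Gaussian in the current view.
    stored_min, stored_max: coverage range recorded during training.
    rel_min, rel_max: assumed multiplicative tolerances on that range.
    abs_threshold: coverage above which a Gaussian is never "too small".
    """
    too_large = coverage > rel_max * stored_max
    too_small = coverage < rel_min * stored_min and coverage < abs_threshold
    if is_finest:
        too_large = False  # finest level is kept when zooming in past training
    if is_coarsest:
        too_small = False  # coarsest level is kept when zooming out past training
    return not (too_large or too_small)
```

For example, with hypothetical tolerances `rel_min=0.5`, `rel_max=2.0` and `abs_threshold=2.0`, a Gaussian trained with coverage between 20 and 40 pixels is still rendered at a coverage of 5 pixels, because it remains above the absolute threshold.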
The pixel coverage range of each Gaussian allows the model to maintain multi-scale Gaussians for different levels of detail. The appropriate subset of Gaussians is chosen for rendering at different resolutions and distances. More and smaller Gaussians encoding the high-frequency information are rendered at high resolution, while fewer and larger Gaussians encoding the low-frequency information are rendered at low resolution, which reduces aliasing and increases speed.
Method | 1x PSNR | 1x LPIPS | 1x Time | 4x PSNR | 4x LPIPS | 4x Time | 16x PSNR | 16x LPIPS | 16x Time | 64x PSNR | 64x LPIPS | 64x Time |
---|---|---|---|---|---|---|---|---|---|---|---|---|
3D Gaussian[12] | 27.52 | 0.142 | 10.5 | 22.50 | 0.137 | 9.3 | 17.79 | 0.149 | 27.9 | 15.23 | N.A. | 103.3 |
3DGS + MS Train | 27.35 | 0.155 | 11.3 | 23.50 | 0.126 | 7.7 | 20.21 | 0.115 | 22.8 | 19.38 | N.A. | 84.8 |
3DGS + Filter Small | 27.40 | 0.153 | 10.0 | 23.81 | 0.149 | 5.4 | 20.02 | 0.186 | 4.8 | 17.38 | N.A. | 4.6 |
3DGS + Insert Large | 18.02 | 0.604 | 9.7 | 18.75 | 0.531 | 2.5 | 20.23 | 0.256 | 2.7 | 21.53 | N.A. | 7.1 |
Our Full Method | 27.39 | 0.155 | 9.1 | 24.82 | 0.132 | 5.4 | 24.75 | 0.066 | 4.9 | 25.35 | N.A. | 4.9 |
Method | 1x PSNR | 1x LPIPS | 1x Time | 4x PSNR | 4x LPIPS | 4x Time | 16x PSNR | 16x LPIPS | 16x Time | 64x PSNR | 64x LPIPS | 64x Time |
---|---|---|---|---|---|---|---|---|---|---|---|---|
3D Gaussian[12] | 23.74 | 0.096 | 6.5 | 19.70 | 0.105 | 11.1 | 15.61 | 0.068 | 43.4 | 13.88 | N.A. | 82.6 |
3DGS + MS Train | 22.97 | 0.118 | 6.0 | 21.46 | 0.086 | 9.6 | 18.56 | 0.049 | 37.4 | 16.54 | N.A. | 71.7 |
3DGS + Filter Small | 23.78 | 0.100 | 5.6 | 20.12 | 0.107 | 4.5 | 17.41 | 0.072 | 4.4 | 14.95 | N.A. | 4.7 |
3DGS + Insert Large | 10.84 | 0.697 | 5.1 | 11.15 | 0.703 | 1.7 | 11.73 | 0.447 | 1.7 | 12.62 | N.A. | 2.5 |
Our Full Method | 23.46 | 0.111 | 7.6 | 21.92 | 0.087 | 4.7 | 20.91 | 0.034 | 4.8 | 19.67 | N.A. | 5.9 |
Method | 1x PSNR | 1x LPIPS | 1x Time | 4x PSNR | 4x LPIPS | 4x Time | 16x PSNR | 16x LPIPS | 16x Time | 64x PSNR | 64x LPIPS | 64x Time |
---|---|---|---|---|---|---|---|---|---|---|---|---|
3D Gaussian[12] | 29.65 | 0.094 | 8.6 | 27.48 | 0.066 | 7.5 | 22.06 | 0.067 | 20.7 | 17.75 | N.A. | 59.7 |
3DGS + MS Train | 29.46 | 0.102 | 6.6 | 28.18 | 0.062 | 5.3 | 24.13 | 0.055 | 14.3 | 20.03 | N.A. | 41.3 |
3DGS + Filter Small | 29.68 | 0.095 | 6.7 | 28.26 | 0.064 | 4.2 | 24.52 | 0.078 | 3.6 | 18.29 | N.A. | 3.2 |
3DGS + Insert Large | 20.59 | 0.379 | 4.6 | 20.83 | 0.336 | 1.6 | 21.29 | 0.143 | 2.1 | 20.10 | N.A. | 4.2 |
Our Full Method | 29.70 | 0.096 | 7.4 | 28.43 | 0.064 | 3.9 | 27.66 | 0.036 | 3.4 | 25.70 | N.A. | 3.4 |
5 Experiments
In this section, we present a comprehensive evaluation of our proposed model, which is built on the official release of the 3D Gaussian Splatting code. To achieve a training time similar to the baseline model, our models are trained for 40,000 iterations with all other hyper-parameters unchanged. All rendering speeds are measured on a single RTX 3090 GPU. We evaluate the performance of the vanilla 3D Gaussian Splatting [12] algorithm and our model on the multi-scale Mip-NeRF 360 [3], Tanks and Temples [13], and Deep Blending [8] datasets, in line with the data used by the original paper. These datasets cover a wide range of object-centric, indoor, and outdoor scenes.
Our evaluation focuses on the rendering quality and speed at multiple downsampling scales of 1x, 4x, 16x, and 64x derived from the test views. The rendering quality is measured in PSNR and LPIPS, while the speed is measured in per-image rendering time. This multi-scale evaluation is aimed at simulating the rendering performance in scenarios of low-resolution imaging or when captured from distant cameras. More detailed evaluations, including the results for more resolution scales and per-scene decomposition, are included in the supplementary materials due to the space constraint. Additionally, the supplementary materials include a video that offers an intuitive qualitative comparison of the two algorithms, vividly demonstrating the improvement of our algorithm in quality and speed from multiple viewpoints.
Quantitative Comparison.
As shown in Tab. 1, Tab. 2, and Tab. 3, our method achieves substantial quality and speed improvements over the original 3D Gaussian Splatting [12] at lower resolutions. The quality and speed improvements become more pronounced as the resolution reduces, with the most noticeable gains of 6-10 dB PSNR and 20-30× speed at the 64× resolution scale. As the resolution reduces, the original splatting algorithm slows down while our method accelerates. The rendering quality and speed at the original resolution (1×) remain comparable, indicating the effectiveness of our multi-scale Gaussians in representing both high and low resolutions together.
Qualitative Comparison.
We present the qualitative comparison with the original 3D Gaussian Splatting [12] in Fig. 7, Fig. 8, and Fig. 9. At higher resolutions (1×-8×), both our algorithm and the original can render the novel view rather faithfully. However, as the resolution reduces further (16×-64×), the original splatting algorithm produces severe artifacts, where the foreground becomes larger and larger and dominates the pixel colors, as explained in Sec. 3.2. In contrast, the images rendered by our method closely resemble the ground truth across all resolution scales.
Ablations.
To evaluate the effectiveness of the different components proposed, we present the ablation quantitative results in Tab. 1, Tab. 2, and Tab. 3 and the ablation qualitative results in the supplementary. The three ablation methods evaluated are named “3DGS+MS Train”, “3DGS+Filter Small”, and “3DGS+Insert Large”. “3DGS+MS Train” reports the result with multi-scale training on top of the original 3D Gaussian splatting. The “3DGS+Filter Small” reports the result with small Gaussian filtering using pixel coverage and the multi-scale training, which is needed to update the maximum and minimum pixel coverage. Similarly, the “3DGS+Insert Large” reports the result with large Gaussian insertion and the multi-scale training.
The ablation results show that multi-scale training marginally improves low-resolution rendering quality, but the rendering speed remains very slow. When filtering out the small Gaussians with multi-scale training, the speed at low resolution is increased by 20-30× with minimal rendering quality loss. The speed gain is due to the considerably fewer Gaussians rendered. When inserting the large Gaussians and training with multi-scale supervision but without the small Gaussian filtering, the rendering quality drops significantly because the details of the scene are completely covered by large Gaussians at all resolutions. However, when the large Gaussians are added together with the small Gaussian filtering, the rendering quality and speed at low resolution are enhanced significantly without jeopardizing the high-resolution quality. This indicates the effectiveness of all three components and of the full method proposed.
6 Limitations
Since all Gaussian filtering in our proposed method relies on the pixel coverage, it can only be done after the initial splatting process, when the coverage is calculated. Although the splatting of individual Gaussians is performed in parallel and does not take more time at lower resolution, it is still a considerable overhead when rendering at a very low resolution. Even if only a very small portion of the Gaussians is used for rendering in the end, all Gaussians still need to be splatted. This is the main reason why our rendering time does not decrease linearly as the resolution decreases. In future work, we will look into a more lightweight criterion to filter small and large Gaussians before splatting them onto the screen to achieve even faster rendering.
7 Conclusion
We analyzed the cause of the severe aliasing artifacts and speed degradation of the existing 3D Gaussian splatting. We identified that the key challenge in mitigating aliasing for 3D Gaussian splatting lies in representing the scene with Gaussians of appropriate scale. Based on this observation, we proposed to calculate the pixel coverage of 3D Gaussians during splatting and use it as a criterion for selective rendering. Gaussians that are too large or too small at the current rendering resolution are filtered out for anti-aliasing and speed improvements. We also proposed to insert large Gaussians by aggregating small Gaussians during training to preserve the low-frequency details and prevent missing parts. Our experiments on various datasets support the effectiveness of our algorithm in rendering quality and speed at both high and low resolutions, mitigating the severe aliasing artifacts of the original 3D Gaussian splatting.
Acknowledgement.
This work is supported by the Agency for Science, Technology and Research (A*STAR) under its MTC Programmatic Funds (Grant No. M23L7b0021).
Supplementary Material
8 Video Comparison
To better demonstrate the improvement of our algorithm in quality and speed at different resolutions, we include a video comparing our results with the original 3D Gaussian Splatting [12] on multiple scenes from different views and resolutions.
9 Details of Gaussian Aggregation Algorithm
Due to the space constraint of the main paper, some details of the Gaussian aggregation process are omitted. In this section, we will elaborate further with some examples to help the readers understand and reproduce our work. The process consists of the following steps:
Render at Lower Resolution.
Since we want to insert large Gaussians of appropriate size to be rendered at lower resolutions, we need to aggregate small Gaussians to form large Gaussians. Because pixel coverage is used to determine whether a Gaussian is too small, we first need to render all Gaussians to calculate their pixel coverage at all training cameras. For each coarse level, we render all Gaussians from the finer levels at that level's downsampled resolution. For example, we render all Gaussians from levels 1 to 3 at the level-4 downsampled resolution from all training cameras to add large Gaussians for level 4. A Gaussian whose pixel coverage at any of the training cameras is smaller than the filter threshold is considered too small and is included in the next step of aggregation.
10 Theoretical Anti-aliasing Effectiveness of Gaussian Aggregation for 1D Signals
Our algorithm eschews low-pass filters for individual Gaussians as they do not mitigate the slow rendering speed. Instead, as shown in Fig. 10, we substitute smaller Gaussians with fewer, larger ones, reducing both the signal bandwidth and the number of primitives rendered. Following the reviewer's suggestion, we analyze the anti-aliasing effect of our algorithm from first principles of signal processing. By the Nyquist sampling theorem, aliasing arises when a signal's bandwidth exceeds half the sampling frequency. Taking a mixture of 1D Gaussians as an example, we aim to show that our algorithm always substitutes them with a Gaussian whose 3dB bandwidth is below this aliasing threshold.
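According to our algorithm, the mixture of Gaussians is first aggregated into an average Gaussian with the average mean $\bar{\mu}$ and standard deviation $\bar{\sigma}$. Applying the Fourier transform converts it to the frequency domain, where its magnitude is again a Gaussian, $|G(f)| \propto \exp(-2\pi^2 \bar{\sigma}^2 f^2)$. The 3dB bandwidth is the frequency at which the magnitude is $1/\sqrt{2}$ of its peak magnitude. By solving
$$\exp(-2\pi^2 \bar{\sigma}^2 f^2) = \frac{1}{\sqrt{2}} \quad (7)$$
we find $f_{3\mathrm{dB}} = \sqrt{\ln 2} / (2\pi\bar{\sigma})$.
Our algorithm then scales the standard deviation up by $S_T / S$, where $S_T$ is the selective rendering threshold and $S$ is the pixel coverage of the Gaussian. We determine $S$ by calculating the size of the Gaussian at its low-opacity level set, solving $\exp(-(S/2)^2 / (2\bar{\sigma}^2)) = \epsilon$, where $\epsilon$ is the level chosen for 8-bit color images. This yields $S = 2\bar{\sigma}\sqrt{2\ln(1/\epsilon)}$, and thus the scaled standard deviation becomes $\bar{\sigma}' = \bar{\sigma} S_T / S = S_T / (2\sqrt{2\ln(1/\epsilon)})$, which no longer depends on $\bar{\sigma}$. Consequently, the 3dB bandwidth of the scaled Gaussian is:
$$f'_{3\mathrm{dB}} = \frac{\sqrt{\ln 2}}{2\pi\bar{\sigma}'} = \frac{\sqrt{2 \ln 2 \,\ln(1/\epsilon)}}{\pi S_T} \quad (8)$$
This indicates that the bandwidth of the scaled Gaussians is invariant to the attributes of the smaller Gaussians they replace, and is below half of the sampling frequency to avoid aliasing. While differing from traditional low-pass filtering, our method is equally effective in anti-aliasing but more efficient in rendering.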
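The invariance claimed above can be checked numerically with a short sketch. Two values are assumed here because they are not given in this text: the level set is taken at `1/255` (the smallest step of an 8-bit image) and the target coverage is taken as 2 pixels. The Gaussian is assumed to have unit peak, and all function names are hypothetical.

```python
import math

def bandwidth_3db(sigma):
    """3dB bandwidth (cycles per pixel) of a Gaussian with std sigma (pixels)."""
    return math.sqrt(math.log(2.0)) / (2.0 * math.pi * sigma)

def coverage_of(sigma, level=1.0 / 255.0):
    """Full width of a unit-peak 1D Gaussian at the given level set."""
    return 2.0 * sigma * math.sqrt(2.0 * math.log(1.0 / level))

def scaled_bandwidth(sigma, target_coverage=2.0, level=1.0 / 255.0):
    """Bandwidth after enlarging the Gaussian to the target pixel coverage."""
    scaled_sigma = sigma * target_coverage / coverage_of(sigma, level)
    return bandwidth_3db(scaled_sigma)

# The result is the same for any starting size.
results = [scaled_bandwidth(s) for s in (0.01, 0.1, 1.0)]
```

Under these assumed values the scaled bandwidth is about 0.44 cycles per pixel regardless of the starting standard deviation, which is below the limit of 0.5 cycles per pixel for one sample per pixel.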
Unbounded Scene Normalization.
The Gaussians can be located anywhere in $(-\infty, \infty)$ in unbounded scenes. This is not suitable for the subsequent voxelization, as only a limited number of voxels can be used. To normalize the unbounded space, the center region and the outer region are handled differently. The space bounded by an axis-aligned cube whose side length is defined by the span of all training cameras is considered the center region, and the rest is considered the outer region. To preserve the structure in the center region, its coordinates are scaled linearly. To normalize the unbounded outer region, its coordinates are scaled non-linearly into a bounded range. The exact normalization is as follows:
(9) |
Voxelization.
After the Gaussian positions are normalized to a bounded range, they need to be voxelized so that all Gaussians in one voxel are grouped together for the later aggregation. The size of the voxel increases as the resolution decreases because coarser levels require fewer, larger Gaussians. Specifically, when inserting large Gaussians for a given level, the voxel size is set to an empirical value that depends on that level. All Gaussians with their center in one voxel are grouped together for the next step. Although it is possible for a Gaussian to extend beyond the voxel while its center resides in the voxel, it is unlikely to reach too far, as large Gaussians are filtered out in the earlier procedure.
Average Pooling and Enlargement
After the small Gaussians are grouped in individual voxels, their parameters are averaged to create the large Gaussian. Specifically, the large Gaussian takes the average position, rotation, spherical harmonics features, opacity and scaling. However, a new Gaussian would be too small if it remained at this scaling. Consequently, we calculate the average pixel coverage of all the aggregated small Gaussians using their pixel coverage derived earlier. The scaling of the new Gaussian is then enlarged so that its pixel coverage approximately reaches the target coverage suitable for rendering at its level. This average pooling is not perfect, but it is simple and effective enough to produce a reasonable initialization for the multi-scale training later.
11 Qualitative Ablation Study
To better compare the effectiveness of each of our proposed modules qualitatively, we present the rendering results of our method and various ablation models in Fig. 11–15. The ablation model design follows the experiment section in the main paper. Specifically, the “+MS Train” model is trained using multi-scale images, but the Gaussians are only of a single scale as in 3D Gaussian Splatting [12]. The low-resolution performance is slightly improved, but the rendering speed is as slow as the original method. The “+Filter Small” model filters the small Gaussians based on the pixel coverage on top of the multi-scale training. It significantly accelerates the low-resolution rendering process, but parts of the scene are missing, as shown in the rendered images. The rendered image also has artifacts such as black dots at low resolutions, caused by the filtered small Gaussians. The “+Insert Large” model inserts the large Gaussians from aggregation on top of the multi-scale training. It has good rendering speed and quality at low resolutions, but the image rendered at high resolution is over-smoothed. This is because the finer-level Gaussians are not filtered out but optimized together with the inserted large Gaussians at low resolutions. Our “Full Method” overcomes the weaknesses of the ablation models and produces high-quality rendering at fast speed at both high and low resolutions. Filtering the small Gaussians improves the speed, and inserting the large Gaussians improves the quality at low resolutions. The qualitative ablation supports the effectiveness of our proposed components.
12 Quantitative Results on More Resolutions
We present the quantitative results of our method, the original 3D Gaussian Splatting [12], and the ablation models at additional downsampled resolutions. These include resolutions that are not used during training, which demonstrates the performance and robustness of our model. The experiments are conducted on the Mip-NeRF360 dataset [3] as shown in Tab. 4, the Tanks and Temples dataset [13] as shown in Tab. 5, and the Deep Blending dataset [8] as shown in Tab. 6.
Table 4: Mip-NeRF360 dataset [3], scales 1x to 8x.
Method | 1x PSNR | 1x LPIPS | 1x Time | 2x PSNR | 2x LPIPS | 2x Time | 4x PSNR | 4x LPIPS | 4x Time | 8x PSNR | 8x LPIPS | 8x Time |
---|---|---|---|---|---|---|---|---|---|---|---|---|
3D Gaussian[12] | 27.52 | 0.142 | 10.5 | 25.96 | 0.124 | 8.0 | 22.50 | 0.137 | 9.3 | 19.79 | 0.154 | 14.6 |
3DGS + MS Train | 27.35 | 0.155 | 11.3 | 26.33 | 0.128 | 7.3 | 23.50 | 0.126 | 7.7 | 21.38 | 0.131 | 12.1 |
3DGS + Filter Small | 27.40 | 0.153 | 10.0 | 26.42 | 0.129 | 6.8 | 23.81 | 0.149 | 5.4 | 21.73 | 0.175 | 5.1 |
3DGS + Insert Large | 18.02 | 0.604 | 9.7 | 18.28 | 0.593 | 3.4 | 18.75 | 0.531 | 2.5 | 19.39 | 0.419 | 2.2 |
Our Method | 27.39 | 0.155 | 9.1 | 26.44 | 0.134 | 6.3 | 24.82 | 0.132 | 5.4 | 24.44 | 0.112 | 5.1 |
Table 4 (continued): Mip-NeRF360 dataset [3], scales 16x to 128x.
Method | 16x PSNR | 16x LPIPS | 16x Time | 32x PSNR | 32x LPIPS | 32x Time | 64x PSNR | 64x LPIPS | 64x Time | 128x PSNR | 128x LPIPS | 128x Time |
---|---|---|---|---|---|---|---|---|---|---|---|---|
3D Gaussian[12] | 17.79 | 0.149 | 27.9 | 16.30 | 0.084 | 55.2 | 15.23 | N.A. | 103.3 | 14.55 | N.A. | 123.2 |
3DGS + MS Train | 20.21 | 0.115 | 22.8 | 19.80 | 0.060 | 45.6 | 19.38 | N.A. | 84.8 | 18.75 | N.A. | 100.1 |
3DGS + Filter Small | 20.02 | 0.186 | 4.8 | 18.81 | 0.090 | 4.4 | 17.38 | N.A. | 4.6 | 16.13 | N.A. | 4.8 |
3DGS + Insert Large | 20.23 | 0.256 | 2.7 | 21.17 | 0.081 | 4.6 | 21.53 | N.A. | 7.1 | 20.25 | N.A. | 9.4 |
Our Method | 24.75 | 0.066 | 4.9 | 25.06 | 0.025 | 4.7 | 25.35 | N.A. | 4.9 | 22.55 | N.A. | 5.0 |
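To make the Mip-NeRF360 numbers above easier to read, the following sketch computes the PSNR gain and the rendering-time ratio of our method over the original 3D Gaussian Splatting at three scales. The PSNR and Time values are copied from the table; the helper name `compare` and the data layout are illustrative only. The ratio assumes that the Time column is a per-frame rendering time where lower is better.

```python
# (PSNR, Time) pairs copied from the Mip-NeRF360 table above:
# scale -> (3D Gaussian [12], Our Method)
rows = {
    "1x": ((27.52, 10.5), (27.39, 9.1)),
    "16x": ((17.79, 27.9), (24.75, 4.9)),
    "128x": ((14.55, 123.2), (22.55, 5.0)),
}


def compare(baseline, ours):
    """Return (PSNR gain in dB, PSNR gain in percent, time ratio baseline/ours)."""
    base_psnr, base_time = baseline
    our_psnr, our_time = ours
    gain_db = our_psnr - base_psnr
    gain_pct = 100.0 * gain_db / base_psnr
    time_ratio = base_time / our_time  # > 1 means our method renders faster
    return gain_db, gain_pct, time_ratio


if __name__ == "__main__":
    for scale, (baseline, ours) in rows.items():
        gain_db, gain_pct, time_ratio = compare(baseline, ours)
        print(f"{scale}: PSNR {gain_db:+.2f} dB ({gain_pct:+.1f}%), "
              f"time ratio {time_ratio:.2f}x")
```

At 128x, for example, the table values give an 8.00 dB PSNR gain and a rendering time about 24.6 times shorter, while at 1x the two methods are close in both quality and speed.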
Table 5: Tanks and Temples dataset [13], scales 1x to 8x.
Method | 1x PSNR | 1x LPIPS | 1x Time | 2x PSNR | 2x LPIPS | 2x Time | 4x PSNR | 4x LPIPS | 4x Time | 8x PSNR | 8x LPIPS | 8x Time |
---|---|---|---|---|---|---|---|---|---|---|---|---|
3D Gaussian[12] | 23.74 | 0.096 | 6.5 | 22.55 | 0.080 | 7.1 | 19.70 | 0.105 | 11.1 | 17.34 | 0.117 | 21.5 |
3DGS + MS Train | 22.97 | 0.118 | 6.0 | 23.04 | 0.083 | 6.3 | 21.46 | 0.086 | 9.6 | 20.18 | 0.080 | 18.5 |
3DGS + Filter Small | 23.78 | 0.100 | 5.6 | 22.76 | 0.079 | 5.1 | 20.12 | 0.107 | 4.5 | 18.62 | 0.122 | 4.4 |
3DGS + Insert Large | 10.84 | 0.697 | 5.1 | 10.96 | 0.719 | 2.4 | 11.15 | 0.703 | 1.7 | 11.40 | 0.631 | 1.6 |
Our Method | 23.46 | 0.111 | 7.6 | 22.44 | 0.095 | 5.6 | 21.92 | 0.087 | 4.7 | 20.88 | 0.082 | 4.6 |
Table 5 (continued): Tanks and Temples dataset [13], scales 16x to 64x.
Method | 16x PSNR | 16x LPIPS | 16x Time | 32x PSNR | 32x LPIPS | 32x Time | 64x PSNR | 64x LPIPS | 64x Time |
---|---|---|---|---|---|---|---|---|---|
3D Gaussian[12] | 15.61 | 0.068 | 43.4 | 14.45 | N.A. | 70.9 | 13.88 | N.A. | 82.6 |
3DGS + MS Train | 18.56 | 0.049 | 37.4 | 17.41 | N.A. | 61.7 | 16.54 | N.A. | 71.7 |
3DGS + Filter Small | 17.41 | 0.072 | 4.4 | 16.05 | N.A. | 4.5 | 14.95 | N.A. | 4.7 |
3DGS + Insert Large | 11.73 | 0.447 | 1.7 | 12.14 | N.A. | 2.1 | 12.62 | N.A. | 2.5 |
Our Method | 20.91 | 0.034 | 4.8 | 21.01 | N.A. | 5.4 | 19.67 | N.A. | 5.9 |
Table 6: Deep Blending dataset [8], scales 1x to 8x.
Method | 1x PSNR | 1x LPIPS | 1x Time | 2x PSNR | 2x LPIPS | 2x Time | 4x PSNR | 4x LPIPS | 4x Time | 8x PSNR | 8x LPIPS | 8x Time |
---|---|---|---|---|---|---|---|---|---|---|---|---|
3D Gaussian[12] | 29.65 | 0.094 | 8.6 | 29.41 | 0.065 | 6.6 | 27.48 | 0.066 | 7.5 | 24.67 | 0.076 | 11.3 |
3DGS + MS Train | 29.46 | 0.102 | 6.6 | 29.42 | 0.069 | 4.8 | 28.18 | 0.062 | 5.3 | 26.15 | 0.065 | 8.0 |
3DGS + Filter Small | 29.68 | 0.095 | 6.7 | 29.53 | 0.064 | 4.9 | 28.26 | 0.064 | 4.2 | 26.51 | 0.082 | 3.8 |
3DGS + Insert Large | 20.59 | 0.379 | 4.6 | 20.67 | 0.381 | 2.2 | 20.83 | 0.336 | 1.6 | 21.07 | 0.263 | 1.7 |
Our Method | 29.70 | 0.096 | 7.4 | 29.58 | 0.065 | 4.8 | 28.43 | 0.064 | 3.9 | 27.59 | 0.063 | 3.6 |
Table 6 (continued): Deep Blending dataset [8], scales 16x to 64x.
Method | 16x PSNR | 16x LPIPS | 16x Time | 32x PSNR | 32x LPIPS | 32x Time | 64x PSNR | 64x LPIPS | 64x Time |
---|---|---|---|---|---|---|---|---|---|
3D Gaussian[12] | 22.06 | 0.067 | 20.7 | 19.74 | N.A. | 36.3 | 17.75 | N.A. | 59.7 |
3DGS + MS Train | 24.13 | 0.055 | 14.3 | 22.09 | N.A. | 24.8 | 20.03 | N.A. | 41.3 |
3DGS + Filter Small | 24.52 | 0.078 | 3.6 | 22.01 | N.A. | 3.3 | 18.29 | N.A. | 3.2 |
3DGS + Insert Large | 21.29 | 0.143 | 2.1 | 21.14 | N.A. | 2.8 | 20.10 | N.A. | 4.2 |
Our Method | 27.66 | 0.036 | 3.4 | 27.22 | N.A. | 3.3 | 25.70 | N.A. | 3.4 |
13 Per-Scene Quantitative Results
We present the per-scene breakdown of the quantitative results of our method and the original 3D Gaussian Splatting [12] at various resolutions. The experiments are carried out on the Mip-NeRF360 dataset [3] as shown in Tab. 7, the Tanks and Temples dataset [13] as shown in Tab. 8, and the Deep Blending dataset [8] as shown in Tab. 9. The tested scenes follow the experiments in the original 3D Gaussian Splatting paper [12].
Table 7: Per-scene results on the Mip-NeRF360 dataset [3].
Scene | Method | 1x PSNR | 1x LPIPS | 1x Time | 4x PSNR | 4x LPIPS | 4x Time | 16x PSNR | 16x LPIPS | 16x Time | 64x PSNR | 64x LPIPS | 64x Time | 128x PSNR | 128x LPIPS | 128x Time |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
garden | 3D-GS[12] | 27.27 | 0.070 | 15.0 | 20.42 | 0.136 | 14.4 | 16.74 | 0.166 | 48.8 | 14.92 | N.A. | 200.9 | 14.29 | N.A. | 245.0 |
garden | Ours | 27.16 | 0.080 | 11.8 | 23.99 | 0.112 | 7.8 | 26.41 | 0.044 | 7.5 | 24.79 | N.A. | 8.6 | 21.19 | N.A. | 9.6 |
flowers | 3D-GS[12] | 21.41 | 0.309 | 9.1 | 18.89 | 0.239 | 8.8 | 15.46 | 0.165 | 24.9 | 13.90 | N.A. | 93.2 | 13.69 | N.A. | 112.2 |
flowers | Ours | 21.11 | 0.333 | 8.1 | 20.83 | 0.234 | 5.7 | 21.97 | 0.093 | 5.1 | 22.69 | N.A. | 4.9 | 21.82 | N.A. | 5.0 |
treehill | 3D-GS[12] | 22.60 | 0.274 | 10.0 | 21.63 | 0.232 | 9.7 | 18.71 | 0.193 | 24.6 | 16.19 | N.A. | 90.6 | 15.52 | N.A. | 97.0 |
treehill | Ours | 22.64 | 0.291 | 8.7 | 22.31 | 0.239 | 5.8 | 23.55 | 0.072 | 5.4 | 24.28 | N.A. | 4.9 | 22.27 | N.A. | 5.0 |
bicycle | 3D-GS[12] | 25.15 | 0.164 | 18.8 | 19.71 | 0.178 | 15.5 | 16.27 | 0.215 | 43.9 | 14.99 | N.A. | 163.8 | 15.15 | N.A. | 187.0 |
bicycle | Ours | 24.44 | 0.210 | 13.4 | 24.76 | 0.131 | 7.4 | 25.00 | 0.081 | 6.4 | 26.02 | N.A. | 6.5 | 21.56 | N.A. | 6.9 |
counter | 3D-GS[12] | 29.15 | 0.099 | 7.5 | 24.81 | 0.084 | 6.4 | 17.94 | 0.101 | 19.2 | 14.32 | N.A. | 60.4 | 13.39 | N.A. | 74.6 |
counter | Ours | 29.17 | 0.100 | 6.6 | 26.77 | 0.076 | 3.3 | 23.44 | 0.057 | 2.8 | 24.59 | N.A. | 2.7 | 21.14 | N.A. | 2.7 |
kitchen | 3D-GS[12] | 31.70 | 0.064 | 9.3 | 23.95 | 0.081 | 8.5 | 18.50 | 0.093 | 35.4 | 15.00 | N.A. | 124.4 | 14.15 | N.A. | 150.3 |
kitchen | Ours | 31.64 | 0.064 | 8.1 | 25.93 | 0.089 | 4.2 | 24.16 | 0.049 | 3.9 | 25.35 | N.A. | 3.3 | 21.50 | N.A. | 3.2 |
room | 3D-GS[12] | 31.63 | 0.093 | 8.0 | 26.60 | 0.057 | 5.1 | 19.50 | 0.096 | 12.0 | 15.50 | N.A. | 49.2 | 14.37 | N.A. | 70.8 |
room | Ours | 31.51 | 0.094 | 6.6 | 28.95 | 0.053 | 3.1 | 28.15 | 0.025 | 2.9 | 25.77 | N.A. | 2.9 | 21.82 | N.A. | 2.9 |
stump | 3D-GS[12] | 26.75 | 0.138 | 10.6 | 22.24 | 0.152 | 10.1 | 18.57 | 0.188 | 26.5 | 17.33 | N.A. | 95.2 | 16.97 | N.A. | 114.0 |
stump | Ours | 26.59 | 0.152 | 12.9 | 23.52 | 0.150 | 8.2 | 25.22 | 0.112 | 7.2 | 29.22 | N.A. | 7.1 | 29.09 | N.A. | 7.2 |
bonsai | 3D-GS[12] | 32.04 | 0.065 | 6.0 | 24.23 | 0.075 | 5.3 | 18.43 | 0.126 | 15.4 | 14.95 | N.A. | 52.4 | 13.46 | N.A. | 57.9 |
bonsai | Ours | 32.27 | 0.067 | 5.5 | 26.32 | 0.106 | 3.3 | 24.87 | 0.062 | 2.8 | 25.40 | N.A. | 2.9 | 22.53 | N.A. | 2.8 |
Table 8: Per-scene results on the Tanks and Temples dataset [13].
Scene | Method | 1x PSNR | 1x LPIPS | 1x Time | 4x PSNR | 4x LPIPS | 4x Time | 16x PSNR | 16x LPIPS | 16x Time | 64x PSNR | 64x LPIPS | 64x Time |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
truck | 3D-GS[12] | 25.39 | 0.064 | 7.3 | 19.97 | 0.103 | 11.3 | 15.69 | 0.064 | 49.2 | 14.20 | N.A. | 89.1 |
truck | Ours | 24.94 | 0.078 | 9.0 | 23.67 | 0.059 | 5.4 | 22.62 | 0.024 | 6.0 | 19.99 | N.A. | 8.6 |
train | 3D-GS[12] | 22.09 | 0.129 | 5.8 | 19.42 | 0.108 | 10.9 | 15.54 | 0.072 | 37.6 | 13.57 | N.A. | 76.1 |
train | Ours | 21.98 | 0.144 | 6.2 | 20.17 | 0.114 | 3.9 | 19.21 | 0.044 | 3.5 | 19.36 | N.A. | 3.3 |
Table 9: Per-scene results on the Deep Blending dataset [8].
Scene | Method | 1x PSNR | 1x LPIPS | 1x Time | 4x PSNR | 4x LPIPS | 4x Time | 16x PSNR | 16x LPIPS | 16x Time | 64x PSNR | 64x LPIPS | 64x Time |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
drjohnson | 3D-GS[12] | 29.14 | 0.106 | 10.1 | 27.23 | 0.079 | 9.3 | 22.73 | 0.078 | 26.3 | 18.60 | N.A. | 67.6 |
drjohnson | Ours | 29.19 | 0.108 | 8.6 | 27.96 | 0.078 | 4.4 | 26.80 | 0.051 | 3.9 | 27.19 | N.A. | 3.8 |
playroom | 3D-GS[12] | 30.15 | 0.082 | 7.0 | 27.72 | 0.053 | 5.7 | 21.40 | 0.056 | 15.0 | 16.89 | N.A. | 51.8 |
playroom | Ours | 30.20 | 0.084 | 6.2 | 28.89 | 0.051 | 3.4 | 28.53 | 0.020 | 3.0 | 24.22 | N.A. | 3.0 |
References
- [1] Kurt Akeley. Reality engine graphics. In Proceedings of the 20th Annual Conference on Computer Graphics and Interactive Techniques, pages 109–116, New York, NY, USA, 1993. Association for Computing Machinery.
- [2] Jonathan T. Barron, Ben Mildenhall, Matthew Tancik, Peter Hedman, Ricardo Martin-Brualla, and Pratul P. Srinivasan. Mip-NeRF: A multiscale representation for anti-aliasing neural radiance fields. ICCV, 2021.
- [3] Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. Mip-NeRF 360: Unbounded anti-aliased neural radiance fields. CVPR, 2022.
- [4] Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. Zip-NeRF: Anti-aliased grid-based neural radiance fields. ICCV, 2023.
- [5] Kristof Beets and David L. Barron. Super-sampling anti-aliasing analyzed. 2000.
- [6] Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, and Hao Su. TensoRF: Tensorial radiance fields. In European Conference on Computer Vision (ECCV), 2022.
- [7] Carl Erikson. Polygonal simplification. Technical Report 96-016, 1996.
- [8] Peter Hedman, Julien Philip, True Price, Jan-Michael Frahm, George Drettakis, and Gabriel Brostow. Deep blending for free-viewpoint image-based rendering. ACM Trans. Graph., 37(6), 2018.
- [9] Tan Kim Heok and D. Daman. A review on level of detail. In Proceedings. International Conference on Computer Graphics, Imaging and Visualization, 2004. CGIV 2004., pages 70–75, 2004.
- [10] Wenbo Hu, Yuling Wang, Lin Ma, Bangbang Yang, Lin Gao, Xiao Liu, and Yuewen Ma. Tri-MipRF: Tri-mip representation for efficient anti-aliasing neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 19774–19783, 2023.
- [11] Jorge Jimenez, Diego Gutiérrez, Jason Yang, Alexander Reshetov, Pete Demoreuille, Tobias Berghoff, Cedric Perthuis, Henry Yu, Morgan Mcguire, Timothy Lottes, Hugh Malan, and Emil Persson. Filtering approaches for real-time anti-aliasing. ACM SIGGRAPH 2011 Courses, SIGGRAPH’11, 2011.
- [12] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3D Gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4), 2023.
- [13] Arno Knapitsch, Jaesik Park, Qian-Yi Zhou, and Vladlen Koltun. Tanks and Temples: Benchmarking large-scale scene reconstruction. ACM Transactions on Graphics, 36(4), 2017.
- [14] Timothy Lottes. FXAA. Technical report, Nvidia, 2011.
- [15] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. In ECCV, 2020.
- [16] Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph., 41(4):102:1–102:15, 2022.
- [17] Seungtae Nam, Daniel Rho, Jong Hwan Ko, and Eunbyung Park. Mip-Grid: Anti-aliased grid representations for neural radiance fields. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
- [18] Lance Williams. Pyramidal parametrics. SIGGRAPH Comput. Graph., 17(3):1–11, 1983.
- [19] M. Zwicker, H. Pfister, J. van Baar, and M. Gross. EWA volume splatting. In Proceedings Visualization, 2001. VIS ’01., pages 29–538, 2001.