2.1 Physically Based Rendering and Path Guiding
The radiance of a shading point can be calculated using the rendering equation [Kajiya 1986]:
\[
L_o(\mathbf{x}, \omega_o) = L_e(\mathbf{x}) + \int_{S^2} L_i(\mathbf{x}, \omega_i)\, f_s(\mathbf{x}, \omega_i, \omega_o)\, |\cos \theta_i| \,\mathrm{d}\omega_i, \tag{1}
\]
which consists of the emitted radiance
\(L_e(\mathbf {x})\) at the point and the integral over the sphere
\(S^2\), which aggregates the contributions of the incident radiance
\(L_i(\mathbf {x}, \omega _i)\) from all directions
\(\omega _i\). For each
\(\omega _i\), the product of incident radiance, the BSDF
\(f_s(\mathbf {x},\omega _i, \omega _o)\), and the geometry term
\(|\cos \theta _i|\) signifies its contribution to the outgoing radiance. In 3D scenes, because light bounces off surfaces, the incoming radiance
\(L_i\) can be considered as originating from another point on a different surface. This implies that
\(L_i\) also follows the integral form of the rendering equation (Equation (1)) and needs to be evaluated at the point from which the scattered light originates. This makes the problem highly complex, and in general no analytical solution exists.
Path tracers solve the rendering equation via the Monte Carlo method, which draws random direction samples \(\omega_{ik}^{\prime}\) and estimates the integral by their average:
\[
\langle L_o(\mathbf{x}, \omega_o) \rangle = L_e(\mathbf{x}) + \frac{1}{N} \sum_{k=1}^{N} \frac{L_i(\mathbf{x}, \omega_{ik}^{\prime})\, f_s(\mathbf{x}, \omega_{ik}^{\prime}, \omega_o)\, |\cos \theta_{ik}|}{p(\omega_{ik}^{\prime})}, \tag{2}
\]
where
\(p(\omega _{ik}^{\prime })\) is the
probability density function (PDF) from which the integrator samples at point
\(\mathbf {x}\). Due to the complexity of light transport over the 3D space, the variance of samples is high, and many importance sampling methods have been proposed to reduce this variance (e.g., Veach [
1998], Hanika et al. [
2015], and Conty Estevez and Kulla [
2018]). These methods usually focus on importance sampling individual components of the rendering equation; a universal technique that importance samples the whole product of the integrand can further improve sampling efficiency. Path guiding, which we describe next, is a family of methods developed in this direction.
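To make the estimator concrete, the following minimal sketch evaluates the Monte Carlo estimator above at a single shading point. The `scene` object, its methods, and the `guide_sample`/`guide_pdf` callables are hypothetical placeholders for a path tracer's interfaces, not part of any specific renderer.

```python
import numpy as np

def estimate_Lo(x, omega_o, scene, guide_sample, guide_pdf, n_samples=16):
    """Monte Carlo estimate of the outgoing radiance at x via importance sampling."""
    total = 0.0
    for _ in range(n_samples):
        omega_i = guide_sample(x)                      # omega_ik' ~ p(. | x)
        L_i = scene.incident_radiance(x, omega_i)      # recursively estimated
        f_s = scene.bsdf(x, omega_i, omega_o)          # BSDF value
        cos_theta = abs(float(np.dot(omega_i, scene.normal(x))))
        total += L_i * f_s * cos_theta / guide_pdf(x, omega_i)
    return scene.emitted(x) + total / n_samples
```

Any valid sampling density \(p\) can be plugged in here; the variance of the estimate depends on how closely \(p\) matches the integrand, which is exactly what importance sampling and path guiding try to improve.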
Path guiding methods aim to find a sufficient approximation of the incident radiance distribution, enabling importance sampling that follows the global illumination distribution rather than only the local BSDF distribution. Jensen [
1995] and Lafortune et al. [
1995] first proposed to fit the incident light distribution for more efficient indirect lighting sampling. Vorba et al. [
2014] fit a Gaussian mixture to model the distribution estimated by an explicit photon tracing pass. Müller et al. [2017, 2019] proposed the
Practical Path Guiding (PPG) algorithm to model the distribution using an SD-tree approach. Based on these techniques, full product guiding methods have also been proposed. Herholz et al. [
2016] calculated an approximation of the full product on top of an earlier work [Vorba et al.
2014], and Diolatzis et al. [
2020] calculated an approximation on top of another study [Müller et al.
2017] to achieve product guiding. Both methods incur higher overhead, since the approximated product must be computed multiple times from the corresponding learned incident radiance distribution. The above techniques share the same idea of partitioning space and progressively learning discrete distributions, with each discrete distribution shared among the points within a spatial partition. A major issue of this approach is parallax error: these techniques fail to accurately learn the distribution of close-distance incident radiance, where the incoming light changes rapidly within the same spatial partition. Ruppert et al. [2020] proposed a parallax-aware robust fitting method to address this issue within a discrete approach. We compare their approach with our network-based approach in Section
7.3. Recent research shows that good path guiding requires a proper blending weight between BSDF and product-driven distributions [Müller et al.
2020], and that the learned distribution should be variance-aware, since the samples themselves are not zero-variance [Rath et al. 2020]. These findings help in building a more robust path guiding framework.
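A common way to realize such blending is a one-sample mixture of the BSDF and the learned (guided) distribution. The sketch below illustrates the idea with hypothetical `bsdf_sampler` and `guide_sampler` objects and a blend weight `alpha`; it is a generic mixture-sampling sketch, not the exact scheme of Müller et al. [2020].

```python
def blended_sample_and_pdf(x, omega_o, alpha, bsdf_sampler, guide_sampler, rng):
    """Draw one direction from a mixture of BSDF and guided sampling.

    The returned density is the mixture PDF, so the estimator stays unbiased
    regardless of which component produced the sample.
    """
    if rng.random() < alpha:
        omega_i = bsdf_sampler.sample(x, omega_o, rng)
    else:
        omega_i = guide_sampler.sample(x, rng)
    pdf = (alpha * bsdf_sampler.pdf(x, omega_o, omega_i)
           + (1.0 - alpha) * guide_sampler.pdf(x, omega_i))
    return omega_i, pdf
```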
Recently, neural networks have been used to model the scattered radiance distribution; both online and offline learning methods have been proposed. Zhu et al. [
2021b] used neural networks to estimate a quad-tree representation of incident radiance distribution using nearest photons as input. Currius et al. [
2020] used convolutional neural networks to estimate incident radiance represented by spherical Gaussians; this work is for real-time rendering, but the estimated radiance distribution could be used for path guiding. Zhu et al. [
2021a] applied an offline-trained neural network to efficiently sample complex scenes with lamps. This technique requires training a U-Net for more than 10 hours for just a single light source, while the estimated distribution is only a
\(16\times 16\) 2D map, which is not sufficient for representing general indirect lighting distributions. A much higher resolution is required for robustly guiding over the entire 3D scene; for instance, PPG’s quad-tree approach can represent a resolution of
\(2^{16} \times 2^{16}\). These methods are categorized as offline learning, since they require training a network offline with massive numbers of training samples and ground-truth reference distributions.
Our framework, however, is categorized as online learning, which learns the distribution on the fly, without a ground-truth distribution as reference. Previously, Müller et al. [
2020] adopted normalizing flow [Kobyzev et al.
2021] to model the full product of the incident radiance distribution. However, with an implicit density model such as a normalizing flow, each sample or density evaluation requires a full forward pass through multiple neural networks, which introduces a heavy computation cost. In a modern path tracer with multiple importance sampling (typically, BSDF sampling and next event estimation (NEE)), this means a full forward pass must be executed at least twice for each surface sample. Moreover, the training process makes heavy use of differentiable transforms, which makes training slower than for regular neural networks. Indeed, Müller et al. [
2020] used two GPUs dedicated to the normalizing flow's neural network computation alongside a CPU-based path tracing implementation; nevertheless, the sampling speed was still only a quarter that of PPG. Our work, in contrast, proposes an explicitly parameterized, closed-form density model that can be learned using a small MLP. The neural network generates the parameters of the closed-form distribution rather than actual samples; therefore, after a single forward pass we can freely generate samples or evaluate the density. With careful implementation, we show that our method reaches a sampling rate similar to that of PPG while producing results with lower variance. In research parallel to ours, Dong et al. [2023] used a small neural network to estimate per-shading-point distributions, which is similar to our approach. However, we also propose the use of NASG, a novel anisotropic model, in place of the classic
von Mises-Fisher (vMF) distribution. NASG helps to further improve the fitting accuracy and guiding efficiency. We highlight the benefits through experiments in Section
7.
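The sketch below illustrates this workflow under simplifying assumptions: a hypothetical `mlp` maps a shading point to the parameters of an explicit spherical density, after which the guided sample and all density evaluations needed for multiple importance sampling reuse the cached parameters without further network passes. The names `closed_form_sample`, `closed_form_pdf`, `sample_bsdf`, and `sample_light` are placeholders, not our actual implementation.

```python
def gather_mis_quantities(x, omega_o, mlp, closed_form_sample, closed_form_pdf,
                          sample_bsdf, sample_light, rng):
    """Collect the directions and guide densities needed for MIS at one point."""
    params = mlp(x)  # single network forward pass; parameters cached afterwards

    # Guided sample drawn directly from the closed-form distribution.
    omega_g = closed_form_sample(params, rng)

    # MIS also needs the guide's density at the BSDF and NEE directions.
    # With an explicit model these are cheap closed-form evaluations of
    # `params`; an implicit flow would need one more network pass for each.
    omega_b = sample_bsdf(x, omega_o, rng)
    omega_n = sample_light(x, rng)
    guide_pdfs = {
        "guided": closed_form_pdf(params, omega_g),
        "bsdf": closed_form_pdf(params, omega_b),
        "nee": closed_form_pdf(params, omega_n),
    }
    return omega_g, omega_b, omega_n, guide_pdfs
```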
2.3 Density Models
Parametric density models have been extensively explored in statistics. The exponential family is a representative class, among which the Gaussian mixture is the most widely used density model. Vorba et al. [2014] used a 2D Gaussian mixture to model the incident radiance distribution; however, since the domain of a 2D Gaussian is the entire 2D plane, mapping it to the unit sphere requires discarding samples that fall outside the valid domain, which hurts computational efficiency. Dodik et al. [
2022] further proposed using 5D Gaussians to model the incident radiance distribution over space; their tangent-space formulation greatly reduces the number of discarded samples.
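The domain mismatch can be seen in a small rejection-sampling sketch, assuming the sphere is parameterized over the unit square and the mixture parameters (`weights`, `means`, `covs`) are hypothetical placeholders:

```python
import numpy as np

def sample_planar_gmm_on_square(weights, means, covs, rng, max_tries=64):
    """Rejection-sample a 2D Gaussian mixture restricted to the unit square.

    Samples falling outside [0, 1]^2 (the planar domain that is mapped onto
    the sphere) must be discarded, which is the efficiency cost noted above.
    """
    for _ in range(max_tries):
        k = rng.choice(len(weights), p=weights)          # pick a mixture component
        s = rng.multivariate_normal(means[k], covs[k])   # sample it on the plane
        if 0.0 <= s[0] <= 1.0 and 0.0 <= s[1] <= 1.0:
            return s                                     # keep in-domain samples
    return None                                          # all tries were rejected
```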
In the context of physically based rendering, spherical models have been widely adopted. The spherical Gaussian (SG) is a common choice for radiance representation and density modelling (e.g., Wang et al. [2009]); in fact, a normalized SG is equivalent to the vMF distribution in 3D. It has several desirable properties, such as computational efficiency and analytical tractability of its integrals. However, its expressiveness is limited by its isotropic nature, which restricts the shape of the distribution on the sphere. The Kent distribution [1982] presents a possible remedy by generalizing the 3D vMF model to a five-parameter anisotropic spherical exponential model, providing higher expressiveness. However, its application within our framework is impeded by three significant limitations: high computational cost, the absence of a direct sampling method, and issues with numerical precision, particularly when the concentration parameters are large. These limitations are elaborated upon in Section
4.1.
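For reference, a normalized SG/vMF lobe admits both a closed-form density and a simple inversion-based sampling routine. The sketch below uses the standard numerically stable formulation and assumes a unit-length mean direction `mu`; it is illustrative rather than our production code.

```python
import numpy as np

def vmf_pdf(omega, mu, kappa):
    """Density of a 3D von Mises-Fisher lobe (a normalized spherical Gaussian).

    Written in the numerically stable form
    kappa / (2*pi*(1 - exp(-2*kappa))) * exp(kappa * (dot(mu, omega) - 1)).
    """
    c = kappa / (2.0 * np.pi * (1.0 - np.exp(-2.0 * kappa)))
    return c * np.exp(kappa * (np.dot(mu, omega) - 1.0))

def vmf_sample(mu, kappa, rng):
    """Draw one direction from vMF(mu, kappa) via stable inverse-CDF sampling."""
    u1, u2 = rng.random(), rng.random()
    w = 1.0 + np.log(u1 + (1.0 - u1) * np.exp(-2.0 * kappa)) / kappa  # cos(theta)
    phi = 2.0 * np.pi * u2
    r = np.sqrt(max(0.0, 1.0 - w * w))
    local = np.array([r * np.cos(phi), r * np.sin(phi), w])
    # Rotate the local sample so that +z maps onto the lobe axis mu.
    helper = np.array([1.0, 0.0, 0.0]) if abs(mu[2]) > 0.9 else np.array([0.0, 0.0, 1.0])
    t = np.cross(helper, mu)
    t /= np.linalg.norm(t)
    b = np.cross(mu, t)
    return local[0] * t + local[1] * b + local[2] * mu
```

Note that the single concentration parameter \(\kappa\) controls only the lobe width, which is the isotropy limitation discussed above.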
Other recent works have introduced spherical models specifically for graphics. Xu et al. [
2013] proposed an anisotropic spherical Gaussian model to achieve a larger variety of shapes; however, no analytic solution was found for its integral, which is problematic for normalization.
Heitz et al. [
2016] proposed another anisotropic model with a closed-form expression, called linearly transformed cosines (LTC). However, the integral of an LTC requires computing the inverse matrix, and each component requires 12 scalars in total to parameterize.
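The role of the inverse matrix can be seen in a minimal density evaluation following the LTC construction (a clamped cosine pushed through a 3x3 linear transform). This is a simplified sketch rather than the implementation of Heitz et al. [2016].

```python
import numpy as np

def ltc_pdf(omega, M_inv):
    """Evaluate a linearly transformed cosine density at direction `omega`.

    The LTC is the clamped-cosine distribution D_o(w) = max(w_z, 0)/pi pushed
    through a 3x3 linear transform M; evaluating it requires the inverse
    matrix `M_inv`, which is the overhead mentioned in the text.
    """
    w = M_inv @ omega
    norm = np.linalg.norm(w)
    omega_o = w / norm                              # back-transformed direction
    D_o = max(omega_o[2], 0.0) / np.pi              # original clamped cosine
    jacobian = abs(np.linalg.det(M_inv)) / norm**3  # change-of-variables factor
    return D_o * jacobian
```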
Another family of density models widely adopted in the rendering context is the polynomial family, among which spherical harmonics are the most commonly used. Polynomial models have been extensively used in graphics [Sloan et al.
2002; Moon et al.
2016]. They can easily be mapped to the unit sphere to represent spherical distributions. However, polynomial models are generally limited to capturing low-frequency distributions: representing high-frequency distributions requires a very high degree, leading to a significant increase in the number of parameters. Furthermore, there is currently no efficient approach to directly sample a polynomial distribution, although some relatively expensive importance sampling schemes do exist (e.g., Jarosz et al. [2009]). These challenges, observed in our pilot study, led us to abandon polynomial models for our path guiding scheme.
Learning-based density models have been drawing more attention recently [Müller et al.
2020; Gilboa et al.
2021]. Normalizing-flow-based models can successfully learn complex distributions [Müller et al. 2020] but suffer from the implicitness and heavy computation mentioned earlier.
Marginalizable Density Model Approximation (MDMA) [Gilboa et al.
2021] has been proposed as a closed-form learning-based density model; it is essentially a linear combination of products of 1D distributions drawn from multiple sets. MDMA showed that closed-form normalization is an essential factor in learning density models, and, inspired by this work, we propose such a model for our application.
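As a minimal illustration of this idea, the 2D sketch below combines hypothetical normalized 1D densities `F = [f_i]` and `G = [g_j]` on [0, 1] with a non-negative weight matrix `A` that sums to one, so both the joint density and its marginals are available in closed form; it is a toy version of MDMA, not the model of Gilboa et al. [2021].

```python
import numpy as np

def mdma_pdf(u, v, A, F, G):
    """Evaluate an MDMA-style 2D density p(u, v) = sum_ij A_ij f_i(u) g_j(v).

    `F` and `G` are lists of normalized 1D densities on [0, 1] (callables),
    and `A` is a non-negative weight matrix summing to one, so p integrates
    to one in closed form, without any numerical normalization pass.
    """
    f_vals = np.array([f(u) for f in F])
    g_vals = np.array([g(v) for g in G])
    return float(f_vals @ A @ g_vals)

def mdma_marginal_u(u, A, F):
    """Closed-form marginal p(u) = sum_ij A_ij f_i(u), since each g_j integrates to 1."""
    f_vals = np.array([f(u) for f in F])
    return float(f_vals @ A.sum(axis=1))
```

The closed-form normalization and marginals are what make such models attractive for learning: the weights and 1D components can be optimized directly on radiance samples without an expensive normalization step.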