
Decorrelating ReSTIR Samplers via MCMC Mutations

Published: 05 January 2024

Abstract

Monte Carlo rendering algorithms often utilize correlations between pixels to improve efficiency and enhance image quality. For real-time applications in particular, repeated reservoir resampling offers a powerful framework to reuse samples both spatially in an image and temporally across multiple frames. While such techniques achieve equal-error up to 100× faster for real-time direct lighting [Bitterli et al. 2020] and global illumination [Ouyang et al. 2021; Lin et al. 2021], they are still far from optimal. For instance, spatiotemporal resampling often introduces noticeable correlation artifacts, while reservoirs holding more than one sample suffer from impoverishment in the form of duplicate samples. We demonstrate how interleaving Markov Chain Monte Carlo (MCMC) mutations with reservoir resampling helps alleviate these issues, especially in scenes with glossy materials and difficult-to-sample lighting. Moreover, our approach does not introduce any bias, and in practice, we find considerable improvement in image quality with just a single mutation per reservoir sample in each frame.

1 Introduction

The efficiency of rendering algorithms often hinges on their ability to effectively evaluate similar integrals by reusing samples across pixels [Ward et al. 1988; Jensen 1996; Veach and Guibas 1997; Keller 1997]. In real-time path tracing, sample reuse becomes more critical since tracing rays is computationally intensive even on high-end consumer GPUs [Kilgariff et al. 2018]. Moreover, while existing denoisers drastically improve image quality even at low sample counts [Chaitanya et al. 2017; Schied et al. 2017, 2018; Kozlowski and Cheblokov 2021; NVIDIA 2022], they are unable to reconstruct features missing from their input samples. Thus, sample reuse across pixels is often the only means to improve sampling quality given limited computational budgets. Compared to methods that generate independent samples, reuse is also at times the only practical approach to render challenging scenes with caustics and tricky lighting [Hachisuka and Jensen 2009; Veach and Guibas 1997].
Recent sampling algorithms for real-time ray tracing achieve massive speedups in scenes with complex illumination by sharing samples spatially within an image and temporally across frames [Bitterli et al. 2020; Ouyang et al. 2021; Lin et al. 2021, 2022]. These so-called ReSTIR-based techniques select N high-contribution samples from a larger streamed candidate pool of size M. They do so by reformulating resampled importance sampling (RIS) [Talbot et al. 2005] in terms of weighted reservoir sampling (WRS) [Chao 1982]. While RIS effectively produces samples in proportion to an arbitrary target function (e.g., the integrand of the rendering equation), WRS makes resampling efficient by reducing storage costs from \(O(M)\) to \(O(N)\). Repeated resampling across pixels then helps distribute important samples over several frames for estimation.
Though ReSTIR derives impressive efficiency gains from correlated sampling, the benefits of repeated resampling are not indefinite. When only a few high-contribution samples have been identified, iterative spatial reuse creates blotchy artifacts as several pixels reuse the same sample (Figure 12, top row). Such undersampling artifacts eventually fade away with temporal reuse over several frames, using a user-specified parameter to balance pixel error with correlations from sample reuse (Figure 4). Unfortunately, simply emphasizing error reduction via greater reuse adds lag under camera movement with dynamically changing lighting and geometry (Section 2.4), and introduces distracting low-frequency artifacts (Figures 2 and 3) akin to those in photon mapping [Hachisuka and Jensen 2009], Metropolis Light Transport (MLT) [Veach and Guibas 1997] and Virtual Point Light (VPL) methods [Dachsbacher et al. 2014].
Fig. 1. A single sample per pixel (spp) comparison of indirect illumination rendered using ReSTIR Path Tracing (PT) [Lin et al. 2022] with and without our sample mutations. By performing even a single mutation per sample, our approach can suppress correlation artifacts that may arise within ReSTIR samplers due to spatiotemporal reuse. Mutations improve visual fidelity of both rendered and denoised results (with the OptiX denoiser [NVIDIA 2017]) while leaving mean squared error unchanged.
Fig. 2. Glossy scenes with difficult-to-sample lighting rendered using ReSTIR PT often contain correlation artifacts irrespective of the selected shift mapping strategy (reviewed in Section 4.3 and Lin et al. [2022, Section 7]). Artifacts result from suboptimal importance sampling and over-enthusiastically sharing a few high-contribution samples between pixels.
Fig. 3. Reservoir resampling suffers from sample impoverishment as it becomes more difficult to sample light-carrying paths. Top row, left to right: The Veach Ajar scene rendered using ReSTIR PT (random replay shift) at 1 spp with the door’s angle decreasing. Bottom row: Heat maps visualize duplicate samples in \(20 \times 20\) pixel neighborhoods. Black represents no duplicates, while white indicates the number of identical samples in a neighborhood.
Fig. 4. Parameters for ReSTIR sample reuse can be difficult to set in a scene agnostic way. For instance, a small \(M_{\text{cap}}\) inadequately exploits prior samples, leading to noise (left), while a large \(M_{\text{cap}}\) value introduces correlations (right). Our approach offers greater leeway in setting parameter values that trade noise for correlation (see Figures 11 and 12).
Fig. 5. Our approach introduces Metropolis-Hastings mutations as an additional block into the larger ReSTIR algorithm for spatiotemporal sample reuse. Samples are mutated within each pixel after temporal resampling (Algorithm 2) to mitigate correlation artifacts and sample impoverishment.
Fig. 6. During temporal resampling, the hybrid shift in ReSTIR PT connects the offset path for frame t to the base path from frame \(t - 1\) when it encounters two consecutive diffuse vertices \(\boldsymbol {\mathbf {x}}_3, \boldsymbol {\mathbf {x}}_4\); prior to that it reuses random numbers from the base path to trace rays. Our reconnection vertex mutation then perturbs the reconnection vertex \(\boldsymbol {\mathbf {y}}_4\) in the offset path.
Fig. 7. Correlation artifacts often do not disappear simply by using more samples, justifying the overhead of performing mutations.
Fig. 8. Mutations mitigate sample impoverishment in ReSTIR by diversifying the sample population. The bottom row visualizes duplicate samples in \(20 \times 20\) pixel neighborhoods on the scene from Figure 3. The supplemental video shows improvements in an animation as the door’s angle is decreased.
Fig. 9. Sample mutations reduce short-range correlation artifacts produced by ReSTIR, with even 1-5 mutations providing noticeable improvements in image quality (measured in the bottom left using average radial covariance). Mutations typically have little impact on mean squared error (shown in the bottom right at equal spp), as we perturb samples only within each pixel.
Fig. 10. Reduction in covariance depends on the mutation strategy, as in any MCMC technique. Left: On the Kitchen (Figure 9), we get smaller covariance using the hybrid shift with partial path or reconnection vertex mutations as they minimize changes to paths selected by resampling. Right: In the Victorian house (Figure 12, bottom row), reconnection vertex mutations are less effective as fewer paths are reconnected due to a lack of consecutive diffuse vertices, whereas partial path mutations just perturb the entire path.
Fig. 11. By reducing correlation artifacts, mutations allow for use of larger \(M_{\text{cap}}\) values in ReSTIR to trade noise for correlation, yielding lower error in scenes with difficult to sample light-carrying paths (see Figure 4 for results with smaller \(M_{\text{cap}}\) values).
Fig. 12. ReSTIR PT excessively reuses a few paths and suffers from “boiling” in scenes with difficult-to-sample indirect lighting such as the Victorian house where paths must be reflected in through the windows (top left). Our mutations do not resolve this boiling by finding better paths (top right), but increasing \(M_{\text{cap}}\) does due to greater temporal reuse (bottom left). Mutations mitigate correlation artifacts caused by a large \(M_{\text{cap}}\) (bottom right).
As spatiotemporal correlations are difficult to quantify, resolving artifacts is challenging. For instance, popular denoisers that compute first- and second-order moments (e.g., Schied et al. [2017]) are less effective given imprecise variance estimates with correlated samples. For ReSTIR, trying to reduce such artifacts by increasing the reservoir size N is also ineffective, as resampling with replacement [Chao 1982] produces duplicate samples in the presence of strong correlations (see Wyman and Panteleev [2021, Figure 19]).
Inspired by work on Sequential Monte Carlo (SMC) [Doucet et al. 2001] and Population Monte Carlo (PMC) [Cappé et al. 2004], we demonstrate that interleaving MCMC mutations with reservoir resampling (Section 3) helps alleviate correlations and impoverishment, especially in scenes with glossy materials and difficult lighting. Unlike MLT where mutations drive information sharing across pixels, our mutations instead primarily mitigate artifacts caused by spatiotemporal reuse, with little-to-no visual impact in scenes where artifacts do not arise (Figure 13). Our approach highlights the complementary strengths of resampling and mutations for real-time rendering: resampling identifies samples with large contributions proportional to a pixel’s target distribution, while mutations diversify the resampled population by locally perturbing samples in proportion to the same target distribution. Furthermore, like Veach and Guibas [1997]’s bias elimination strategy for MLT, we show that resampling eliminates the need for any burn-in period with Metropolis–Hastings (MH) mutations [Metropolis et al. 1953; Hastings 1970] (Section 2.5, Appendix A). This drives considerable image quality improvements from even a single mutation per frame for each reservoir sample (Figures 1, 7, 9, and 11).
Fig. 13. Mutations typically do not reduce mean squared error, shown here on the Veach Ajar scene rendered using ReSTIR PT with the hybrid shift. This suggests that unlike Metropolis Light Transport, resampling (and not mutations) finds important light-carrying paths in ReSTIR. Compared to the random replay shift in Figure 8, resampling with the superior hybrid shift also introduces fewer correlations in this scene for mutations to resolve.
From an implementation perspective, our approach requires only simple additions to existing ReSTIR algorithms (see Algorithm 3)—we mutate reservoir samples using Metropolis–Hastings and an appropriate target function every frame after temporal reuse. This is immediately followed by an adjustment to each mutated sample’s contribution weight to maintain detailed balance and ensure unbiased estimation. Overall, our contributions include:
Demonstrating how to incorporate MCMC mutations within ReSTIR samplers to lessen correlation artifacts from spatiotemporal resampling.
Showing how to adjust the RIS contribution weight of mutated samples in an unbiased fashion for further resampling.
Situating ReSTIR in the broader family of techniques that jointly apply resampling and mutations to sampling problems, such as MLT, SMC, and PMC (see Table 2).
Table 1. Relative Reduction in Covariance and FLIP Scores from Mutations at 1 spp. We first average covariances over boxes of pixel radius 8, and then over the entire image. The ReSTIR DI scenes use \(M_{\text{cap}} = 20\) while the ReSTIR PT scenes use \(M_{\text{cap}} = 50\), except for the Victorian house where \(M_{\text{cap}} = 1000\).
Table 2. Overview of the Role of Resampling and Mutations in MLT, PMC, SMC, and ReSTIR
Though mutations can help produce images with better visual fidelity, we observe that, similar to blue-noise sampling [Mitchell 1987; Georgiev and Fajardo 2016; Heitz and Belcour 2019], they do not necessarily reduce error (Figure 9). This is true both in scenes with difficult-to-sample lighting (Figure 12, top row), as well as in scenes with ample lighting and diffuse materials where resampling easily finds important samples (Figure 13). Moreover, despite the effectiveness of mutations in scenes with strong correlations, we show in the supplemental document that with our current approach, even infinite mutations cannot eliminate correlations entirely.
We start with the key building blocks of our approach in the next section, and postpone discussion about related work to Section 6 for better context when comparing with our method.

2 Background

The rendering equation [Kajiya 1986] gives the outgoing radiance \(L_{\text{out}}\) leaving a point y in the direction \(\omega\). Expressed as an integral over directions, it is
\begin{equation} L_{\text{out}}(y, \omega) = L_{\text{e}}(y, \omega) + \int _{S^2} L_{\text{in}}(y, \omega _i)\ \rho (y, \omega , \omega _i)\ |\text{cos}\ \theta _i|\ \text{d}\omega _i. \end{equation}
(1)
Here \(L_{\text{e}}\) is the emitted radiance, \(L_{\text{in}}(y, \omega _i)\) is the incoming radiance from the direction \(\omega _i\), \(\rho (y, \omega , \omega _i)\) is the BSDF and \(\theta _i\) is the angle between \(\omega _i\) and the surface normal at y. Absent participating media, the incident radiance \(L_{\text{in}}\) is defined recursively as \(L_{\text{in}}(y, \omega _i) = L_{\text{out}}(t(y, \omega _i), -\omega _i)\); the function \(t(y, \omega _i)\) returns the point on the closest surface from y in direction \(\omega _i\). Integrating over the sphere of directions \(S^2\) then gives the total radiance scattered toward \(\omega\); the rendering equation can be estimated with Monte Carlo as
\begin{equation} L_{\text{out}}(y, \omega) \approx L_{\text{e}}(y, \omega) + \frac{1}{N}\sum _{i=1}^N \frac{L_{\text{in}}(y, \omega _i)\ \rho (y, \omega , \omega _i)\ |\text{cos}\ \theta _i|}{p(\omega _i)}, \end{equation}
(2)
where \(p(\omega _i)\) is the probability density function (PDF) with respect to solid angle used to sample incident directions \(\omega _i\).
As in Kajiya’s formulation, sometimes it is more convenient to reformulate Equation (1) over surfaces. To keep the discussion independent of the choice of formulation, we use \(\int _{\Omega } f(x)\ \text{d}x\) to generically represent the integral we want to evaluate with \(\Omega\) as its domain. This integral can likewise be estimated using
\begin{equation} \widehat{I}_{\text{MC}} := \frac{1}{N} \sum _{i=1}^N \frac{f(x_i)}{p(x_i)}, \end{equation}
(3)
where \(x_i\) are independent random samples drawn from any source PDF p that is non-zero on the support of f. In rendering, one often draws samples proportional to individual terms of the rendering equation to reduce variance (e.g., the BSDF \(\rho\)). To perform even better importance sampling, ReSTIR instead uses RIS to draw samples approximately proportional to the product of multiple terms in the integrand (e.g., \(L_{\text{in}} \cdot \rho \cdot |\text{cos}\ \theta |\)).
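As a concrete illustration of Equation (3), the following minimal Python sketch (all function names here are ours, not the paper's) estimates a simple 1D integral with a deliberately suboptimal source PDF, mirroring how rendering draws directions from an easy-to-sample density:

```python
import random

def mc_estimate(f, p, sample_p, n):
    """Importance-sampled Monte Carlo estimator (Equation (3)):
    averages f(x)/p(x) over n independent draws x ~ p."""
    total = 0.0
    for _ in range(n):
        x = sample_p()
        total += f(x) / p(x)
    return total / n

# Toy example: integrate f(x) = x^2 over [0, 1] (exact value 1/3)
# using the suboptimal source PDF p(x) = 2x, sampled by inversion.
random.seed(7)
f = lambda x: x * x
p = lambda x: 2.0 * x
sample_p = lambda: random.random() ** 0.5  # inverse CDF of p
estimate = mc_estimate(f, p, sample_p, 200_000)
```

Any p that is non-zero wherever f is non-zero yields an unbiased estimate; the closer p is to being proportional to f, the lower the variance.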
We review RIS and generalized RIS next (Sections 2.1 and 2.2); Section 2.3 discusses a streaming RIS implementation via reservoir sampling. Section 2.4 then describes how correlations arise within ReSTIR due to resampling. Section 2.5 reviews the Metropolis–Hastings algorithm we use in Section 3 to resolve correlation artifacts.

2.1 Resampled Importance Sampling (RIS)

RIS [Talbot et al. 2005; Lin et al. 2022] enables unbiased estimation and sample generation from a non-negative target function \(\hat{p}\) with an unknown normalization factor \(\int _{\Omega } \hat{p}(y)\ \text{d}y\). It does so by rewriting the standard Monte Carlo estimator from Equation (3) as
\begin{equation} \frac{1}{N} \sum _{i=1}^N \frac{f(x_i)}{\hat{p}(x_i)}\ \left(\int _{\Omega } \hat{p}(y)\ \text{d}y\right). \end{equation}
(4)
The normalization factor is estimated by generating \(M \ge 1\) candidate samples \({\bf y} = \lbrace y_1, \ldots , y_M\rbrace\) from a source PDF q that may be suboptimal but easy to sample from (e.g., \(q \propto \rho\)), yielding
\begin{equation} \frac{1}{N} \sum _{i=1}^N \frac{f(x_i)}{\hat{p}(x_i)}\ \left(\frac{1}{M} \sum _{j=1}^M \frac{\hat{p}(y_j)}{q(y_j)}\right). \end{equation}
(5)
The samples \({\bf x} = \lbrace x_1, \ldots , x_N\rbrace\) in turn are selected by randomly choosing an index \(j \in \lbrace 1, \ldots , M\rbrace\), N times, from the candidate pool \({\bf y}\) with discrete probabilities:
\begin{equation} \mathbb {P}(j\ |\ {\bf y}) = \frac{w(y_j)}{\sum _{k = 1}^M w(y_k)}. \end{equation}
(6)
Here the resampling weight w for each candidate \(y_j\) is given by
\begin{equation} w(y_j) = \frac{1}{M} \hat{p}(y_j) W(y_j), \end{equation}
(7)
where \(W(y_j) := 1/q(y_j)\) is called the (unbiased) contribution weight for \(y_j\). The selected samples \(x_i\) are likewise given contribution weights
\begin{equation} \widehat{W}(x_i) := \frac{1}{\hat{p}(x_i)} \left(\sum _{j=1}^M w(y_j)\right) , \end{equation}
(8)
that assume the role of a reciprocal PDF, though these weights are only unbiased estimates for elements of the resampled set \({\bf x}\). This is because the parenthesized term for the normalization factor of \(\hat{p}\) is itself an estimator that has variance. Each \(x_i\!\in \!{\bf x}\) is also distributed only approximately in proportion to \(\hat{p}\) (i.e., \(\hat{p}\) is sampled perfectly only in the limit as \(M \rightarrow \infty\)). Since we resample with replacement, the set \({\bf x}\) can contain duplicate samples, which reflects that samples are selected in proportion to \(\hat{p}\). With this setup, Talbot [2005] shows that the RIS estimator
\begin{equation} \widehat{I}_{\text{RIS}} := \frac{1}{N} \sum _{i=1}^N f(x_i) \widehat{W}(x_i) , \end{equation}
(9)
is unbiased as long as \(\hat{p}\) and q are non-zero on the support of f, i.e.,
\begin{equation} \mathbb {E}[\widehat{I}_{\text{RIS}}] = \int _{\Omega } f(x)\ \text{d}x. \end{equation}
(10)
Combining with Multiple Importance Sampling (MIS). There are often several reasonable sampling strategies available in rendering, e.g., BSDF or light sampling. MIS [Veach and Guibas 1995b] allows multiple strategies to be combined robustly within RIS [Talbot 2005]. When each candidate \(y_j\) has its own source PDF \(q_j\), then MIS weights generalize the parenthesized term in Equation (5) with
\begin{equation} \sum _{j=1}^M m_j(y_j) \frac{\hat{p}(y_j)}{q_j(y_j)}. \end{equation}
(11)
Here, \(m_j \ge 0\) is the MIS weight for the jth sampling technique. These weights must form a partition of unity, i.e., \(\sum _{j=1}^M m_j(y) = 1\). A common choice is the balance heuristic \(m_j(y) = q_j(y)/\sum _{k=1}^M q_k(y)\) [Veach and Guibas 1995b]. With MIS, the resampling weight in Equation (7) becomes:
\begin{equation} w(y_j) = m_j(y_j) \hat{p}(y_j) W(y_j), \quad \text{where}\quad W(y_j) = \frac{1}{q_j(y_j)}. \end{equation}
(12)
Notice we recover \(m_j = 1/M\) when source PDFs are the same for each sample \(y_j\). MIS weights play an important role in ReSTIR—beyond reducing noise in the resampling weights, they also remove bias when the supports of the source and target distributions do not match integrand f’s support (see Section 4 in Bitterli et al. [2020] and Section 5 in Lin et al. [2022] for further details).
In practice, using RIS with the balance heuristic is costly, as all sampling strategies (i.e., the source PDFs) must be evaluated for each candidate sample \(y_j\). Bitterli [2022, Chapter 9.1.3] provides a similarly robust but more efficient heuristic called Pairwise MIS, which only requires \(O(M)\) PDF evaluations over the entire candidate pool. We use pairwise MIS when the number of sampling strategies M is greater than 2 (e.g., during spatial resampling in ReSTIR; see Section 2.4); otherwise we use the balance heuristic.
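For intuition, the balance heuristic of Equation (11) reduces to a few lines; this sketch (function name ours) computes per-technique MIS weights and shows they form a partition of unity:

```python
def balance_heuristic_weights(y, q_pdfs):
    """Balance heuristic MIS weights at a sample y over the available
    source PDFs q_k: m_j(y) = q_j(y) / sum_k q_k(y). The weights form
    a partition of unity wherever at least one q_k(y) > 0."""
    vals = [q(y) for q in q_pdfs]
    total = sum(vals)
    return [v / total for v in vals]

# Two toy strategies on [0, 1]: uniform sampling (q = 1) and
# linear sampling (q = 2x). At y = 0.25, the uniform technique is
# twice as likely to produce the sample, so it gets twice the weight.
q_uniform = lambda x: 1.0
q_linear = lambda x: 2.0 * x
m = balance_heuristic_weights(0.25, [q_uniform, q_linear])
```

The cost issue noted above is visible here: evaluating \(m_j\) requires querying every source PDF at y, which is what pairwise MIS avoids.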

2.2 Generalized Resampled Importance Sampling (GRIS)

So far we assumed the resampling inputs \(y_j \sim q_j\) share a common integration domain \(\Omega\) with integrand f. This assumption may no longer hold when reusing spatially or temporally across an image (as in ReSTIR), and depends on the integral formulation used for the rendering equation. For instance, ReSTIR applied to global illumination [Ouyang et al. 2021; Lin et al. 2022] generates samples from PDFs with respect to solid angle. Reuse across pixels therefore requires a change of integration domain, necessitating a correction term in the resampling weights [Ouyang et al. 2021, Equation (11)]. ReSTIR for direct lighting [Bitterli et al. 2020] instead integrates over the surface of all lights, ensuring \(\Omega\) is fixed across samples.
Recent work by Lin et al. [2022] generalizes RIS to use candidate samples \(y_j\) originating from different domains \(\Omega _j\). It achieves this via shift mapping, i.e., a bijective transformation of samples from one pixel to corresponding samples on another pixel [Lehtinen et al. 2013]. In particular, if \(\Omega\) denotes the domain of integration for f, and \(S_j: \Omega _j \rightarrow \Omega\) are shifts that map \(y_j \in \Omega _j\) to the modified sample \(y^{\prime }_j \in \Omega\), then the resampling weight for \(y_j\) becomes
\begin{equation} w(y_j) = m_j(y^{\prime }_j) \hat{p}(y^{\prime }_j) W(y_j) \cdot \left|\frac{\partial y^{\prime }_j}{\partial y_j}\right|, \end{equation}
(13)
where the Jacobian determinant \(|\partial y^{\prime }_j / \partial y_j|\) accounts for the change of integration domain from \(\Omega _j\) to \(\Omega\). (Jacobians also appear in MIS weights \(m_j\); see Appendix B). The rest of the RIS procedure in Section 2.1 remains unchanged—substituting these resampling weights to Equation (8) provides the contribution weight for the selected \(y^{\prime }_j\).
Various shift mappings have been proposed to maximize the similarity between \(y^{\prime }_j\) and \(y_j\) such that \(|\partial y^{\prime }_j / \partial y_j| \approx 1\) [Hua et al. 2019, Section 3]. We describe the shift mappings we use in Section 4.

2.3 Weighted Reservoir Sampling (WRS)

WRS [Chao 1982] facilitates efficient RIS implementations using a single pass over elements in a stream \(\lbrace y_1, \ldots , y_M\rbrace\) to select a random sample. As in Section 2.1, each stream element has an associated resampling weight w. The basic idea is to process the stream one element at a time, and to select—from the \(m \lt M\) elements processed so far—a sample \(y_j\) with probability \(w(y_j)/\sum _{k=1}^m w(y_k)\). The next stream element \(y_{m+1}\) then replaces \(y_j\) with probability \(w(y_{m+1})/\sum _{k=1}^{m+1} w(y_k)\). The stream length M need not be known ahead of time, and WRS can be used to select \(N \gt 1\) samples if needed [Wyman 2021, Chapter 22.6].
WRS reduces the storage needed for resampling to \(O(N)\). A lightweight data structure called a reservoir is typically used to process the stream and store the selected samples, the stream length M and the weight sum \(\sum _{j=1}^M w(y_j)\); see Algorithm 1.
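The single-sample case can be sketched directly from the description above. This minimal reservoir (a hedged sketch in the spirit of Algorithm 1, not the paper's GPU implementation) processes a weighted stream in one pass and keeps each element with probability proportional to its resampling weight:

```python
import random

class Reservoir:
    """Minimal single-sample reservoir for WRS: stores the current
    sample, the running weight sum, and the stream length M."""
    def __init__(self):
        self.sample = None
        self.w_sum = 0.0
        self.M = 0

    def update(self, y, w, rng):
        # Replace the kept sample with probability w / w_sum; after the
        # whole stream, element y_j is kept with probability
        # w(y_j) / sum_k w(y_k).
        self.w_sum += w
        self.M += 1
        if self.w_sum > 0.0 and rng.random() < w / self.w_sum:
            self.sample = y

# Stream three elements with weights 1, 2, 7; element "c" should be
# selected roughly 70% of the time.
rng = random.Random(11)
counts = {"a": 0, "b": 0, "c": 0}
for _ in range(20_000):
    r = Reservoir()
    for y, w in [("a", 1.0), ("b", 2.0), ("c", 7.0)]:
        r.update(y, w, rng)
    counts[r.sample] += 1
```

Merging two reservoirs (as in temporal and spatial reuse) amounts to calling `update` with the other reservoir's sample and its total resampling weight.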

2.4 Reservoir-based Spatiotemporal Resampling

ReSTIR applies RIS and WRS in a chained fashion within and across pixels of an image. The first key idea is to approximately importance sample multiple terms in the rendering equation’s integrand through a per-pixel target function \(\hat{p}\). The second is to reuse samples from neighboring pixels to exploit the similarity between their target functions. The algorithm performs four steps every frame:
(1)
(Initial resampling) Select N samples from a candidate pool of M samples at each pixel. Equations (12) and (8) provide the resampling and contribution weights for the candidate and selected samples respectively. A reservoir stores the selected samples and their estimated contribution weights.
(2)
(Temporal resampling) Use Algorithm 2 to reuse samples across two corresponding pixels in consecutive frames t and \(t\!-\!1\). The resampling weight for each sample is computed using the contribution weight already stored in its reservoir.
(3)
(Spatial resampling) For each pixel, select K random reservoirs from a small spatial neighborhood and merge them into the pixel’s reservoir. This is similar to Algorithm 2 and can be repeated multiple times; for reference see Bitterli et al. [2020, Algorithm 4] and Ouyang et al. [2021, Algorithm 2].
(4)
(Final shading) Use Equation (9) to compute each pixel’s color.
Spatiotemporal reuse gives each pixel access to a large population of samples from its local neighborhood. As a result, ReSTIR quickly finds samples that make large contributions to pixels, using MIS weights and shift mappings to ensure unbiased estimation within the pixel where samples are reused. Nonetheless, gains from sharing samples are not indefinite, and correlation artifacts may arise from undersampling, imperfect shift mappings, and wrongly set parameters. For instance, performing multiple rounds of spatial resampling with too small a pixel radius can lead to blotchy artifacts. This happens when RIS identifies too few samples to effectively importance sample the integrand, e.g., due to difficult-to-sample lighting. Likewise, inadequately designed shift mappings may introduce geometric singularities into a sample’s resampling weight via the Jacobian determinant, causing the sample to be widely reused.
During temporal resampling, one must cap the stream length M of a temporally reused sample (Algorithm 2, line 3) to guarantee convergence—not doing so results in convergence to a wrong result [Lin et al. 2022, Section 6.4]. Unfortunately, the ideal \(M_{\text{cap}}\) cannot always be determined in a scene-agnostic way—small caps inadequately utilize the temporal history and result in higher variance (Lin et al. [2022, Figure 9]), while large caps increase correlation. In particular, increasing \(M_{\text{cap}}\) decreases the relative weight and hence selection probability of newly proposed samples, while artificially inflating a reservoir sample’s importance. As a result, an outlier reservoir sample’s estimated contribution weight must first decay to match a pixel’s actual value. Unfortunately, the outlier may be spread between neighboring pixels before it is replaced. This can lead to visible correlation artifacts and sample impoverishment over multiple frames (see Figure 4). We use the Metropolis-Hastings algorithm, described next, to address these issues in ReSTIR.

2.5 Metropolis–Hastings (MH)

Like RIS, the MH [Metropolis et al. 1953; Hastings 1970] algorithm generates a set of samples distributed proportionally to a non-negative and possibly unnormalized target function \(\hat{p}\). While RIS uses resampling to achieve this goal, MH instead constructs a Markov chain that has a stationary distribution proportional to \(\hat{p}\). In more detail, given an initial sample \(x^0 \in \Omega\), MH incrementally constructs a sequence of random samples \(x^0, x^1, x^2, ...\) as follows:
(1)
For \(k \ge 0\), generate a candidate sample \(z^k\) by applying a random mutation to the current sample \(x^k\) in the chain, i.e., sample \(z^k\) from a proposal density \(T(x^k \rightarrow z^k)\).
(2)
Compute an acceptance probability for the candidate \(z^k\):
\begin{equation} a(x^k \rightarrow z^k) := \min \left(1, \frac{\hat{p}(z^k)\ T(z^k \rightarrow x^k)}{\hat{p}(x^k)\ T(x^k \rightarrow z^k)}\right). \end{equation}
(14)
(3)
Set \(x^{k+1} = z^k\) with probability a; otherwise set \(x^{k+1} = x^k\).
The acceptance probability \(a(x^k \rightarrow z^k)\) ensures that samples are distributed proportional to the target function \(\hat{p}\). The detailed balance condition guarantees that the Markov chain's stationary distribution is proportional to \(\hat{p}\) by requiring the probability flow between any two sample values to be equal in both directions:
\begin{equation} \hat{p}(x^k) T(x^k \rightarrow z^k) a(x^k \rightarrow z^k) = \hat{p}(z^k) T(z^k \rightarrow x^k) a(z^k \rightarrow x^k). \end{equation}
(15)
To generate the correct distribution from all inputs, Markov chains must be ergodic. This can be guaranteed easily with mutations that always propose candidate samples over the entire support of \(\hat{p}\), i.e., \(T(x^k \rightarrow z^k) \gt 0\) for all \(x^k\) and \(z^k\) where \(\hat{p}(x^k) \gt 0\) and \(\hat{p}(z^k) \gt 0\). Even with this constraint, there is still much freedom in choosing mutation strategies—Section 4 describes the strategies we use.
Unlike RIS, MH does not estimate the value of integrals. It does however produce valid samples from its target function which can be used by a secondary estimator such as RIS for estimation (Section 3).
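The three MH steps above fit in a short routine. This sketch (our own toy, not one of the paper's mutation strategies) uses a symmetric Gaussian proposal, so the T terms in Equation (14) cancel and the acceptance probability reduces to min(1, \(\hat{p}(z)/\hat{p}(x)\)):

```python
import math
import random

def metropolis_hastings(phat, x0, steps, rng, sigma=1.0):
    """Metropolis-Hastings with a symmetric Gaussian proposal.
    Returns the chain x^1, ..., x^steps (rejections repeat the
    current state, as in step (3))."""
    x = x0
    chain = []
    for _ in range(steps):
        z = x + rng.gauss(0.0, sigma)               # propose via T(x -> z)
        a = min(1.0, phat(z) / phat(x)) if phat(x) > 0.0 else 1.0
        if rng.random() < a:                        # accept with probability a
            x = z
        chain.append(x)
    return chain

# Unnormalized target: an (unnormalized) standard Gaussian. The chain's
# empirical moments should approach those of the normalized target.
phat = lambda x: math.exp(-0.5 * x * x)
rng = random.Random(5)
chain = metropolis_hastings(phat, 0.0, 100_000, rng)
mean = sum(chain) / len(chain)
```

Because consecutive chain states are correlated, the effective sample size is smaller than the chain length; this is one reason MH alone is a poor fit for per-pixel estimation, and why ReSTIR uses it only to perturb already-resampled samples.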
Eliminating start-up bias. MH assumes the initial sample \(x^0\) is generated with probability density proportional to \(\hat{p}\); using a sample not from this distribution results in start-up bias. A typical solution runs the Markov chain for several iterations until the initial state is “forgotten”, i.e., discarding several early samples generated by MH. Sadly, the length of this burn-in period is tricky to determine as it depends on the initial sample value and its actual distribution. Veach [1998, Chapter 11.3.1] instead proposed resampling \(x^0\) from M candidate samples \({\bf y} = \lbrace y_1, \ldots , y_M\rbrace\) generated using an easy-to-sample source PDF (much like Section 2.1). Equations (6) and (7) then provide the discrete probabilities and resampling weights (resp.) needed to select a candidate, i.e., \(x^0 = y_j\) for some \(j \in \lbrace 1, \ldots , M\rbrace\). Contributions of mutated samples initialized from \(x^0\) are weighted by Equation (8) to guarantee unbiasedness. Our mutations likewise leverage ReSTIR’s built-in resampling to avoid start-up bias.

3 Method

Sample selection with RIS from a target distribution improves with larger populations of candidate samples. ReSTIR provides access to a sizable candidate pool for resampling through spatiotemporal reuse, enabling it to quickly identify high-contribution samples via RIS. However, at times ReSTIR extensively reuses a few samples over multiple frames due to imperfect importance sampling and suboptimal parameters, with no mechanism to easily diversify an existing population of high-contribution samples.
Inspired by Sequential and Population Monte Carlo techniques (Section 6), we interleave reservoir resampling with MCMC mutations to mitigate correlations and sample impoverishment caused by spatiotemporal reuse. Our key observation is that mutating reservoir samples with the same per-pixel target function as RIS helps to quickly decorrelate the resampled population, especially when it contains outliers. In Algorithm 3, we use Metropolis-Hastings to locally perturb temporal reservoir samples selected by Algorithm 2; interleaving with resampling then diversifies the samples ReSTIR shares between pixels. We discuss key aspects of our work next, starting with how to modify mutated samples’ contribution weights to guarantee unbiased results.
Modified contribution weights. A contribution weight \(\widehat{W}\) (Equation (8)) estimates the reciprocal value of the target PDF \(\hat{p}/\int _{\Omega } \hat{p}\) that a sample is approximately distributed according to. \(\widehat{W}\) is needed to compute resampling weights for combining reservoirs (Algorithm 2, lines 7 and 11) and to estimate per-pixel shading (Equation (9)).
Contribution weights are sample dependent. Thus, a sample that undergoes mutation cannot reuse the weight associated with its original state, i.e., a mutated sample’s contribution weight should provide an unbiased estimate for the sample’s reciprocal target PDF. Our key contribution is to show that the unbiased contribution weight for any mutated sample \(x^k\), from a Markov chain \(x^0, \ldots , x^k, \ldots\), can be computed via the relation
\begin{equation} \widehat{W}(x^k) = \frac{\hat{p}(x^0)}{\hat{p}(x^k)} \widehat{W}(x^0). \end{equation}
(16)
Equation (16) does not depend on samples between \(x^0\) and \(x^k\) in the Markov chain and imposes no constraints on computing \(\widehat{W}(x^0)\), which can arise from prior resampling, runs of MH, or a mix of the two. This provides flexibility in where and when to mutate samples during ReSTIR (as long as mutations are confined to a given pixel).
One can get an intuitive feel for Equation (16) by substituting in the expression for \(\widehat{W}(x^0)\) from Equation (8):
\begin{equation} \widehat{W}(x^k) = \frac{\hat{p}(x^0)}{\hat{p}(x^k)}\, \frac{1}{\hat{p}(x^0)} \sum _{i=1}^{M} w_i = \frac{1}{\hat{p}(x^k)} \sum _{i=1}^{M} w_i. \end{equation}
(17)
Notice that the estimated normalization factor for \(\hat{p}\), i.e., the sum of weights w, remains unchanged for both the initial and mutated samples \(x^0\) and \(x^k\). This normalization factor arises via RIS prior to performing mutations (e.g., Algorithm 2, lines 14–15). Meanwhile, MH treats the resampling weights as fixed, simply redistributing a reservoir’s sample population proportionally to the per-pixel target function \(\hat{p}\). Equation (16) then encodes any required correction to a sample’s contribution weight to account for the sample mutation—unlike temporal and spatial resampling which reuse samples across different pixels, this equation does not contain any MIS weight or shift mapping as there is no change of integration domain, with samples only mutated within \(\hat{p}\)’s support.
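To make this concrete, the following self-contained 1D toy (our own illustrative sketch, not the paper's implementation) runs RIS over uniform candidates, applies one symmetric Metropolis–Hastings mutation with the same target function \(\hat{p}\), and corrects the contribution weight via Equation (16); averaging \(f(x^k)\widehat{W}(x^k)\) then still estimates \(\int f\). The target function, candidate count, and proposal width are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(7)
p_hat = lambda x: 0.5 + x * x          # illustrative unnormalized target on [0, 1]
M, N = 8, 200_000                      # RIS candidates per trial, number of trials

# --- RIS: pick one sample per trial, candidates drawn from q = uniform ---
cand = rng.random((N, M))
w = p_hat(cand) / M                    # resampling weights w_i = p_hat(x_i) / (M q(x_i))
wsum = w.sum(axis=1)
u = rng.random(N) * wsum
pick = (np.cumsum(w, axis=1) >= u[:, None]).argmax(axis=1)
x0 = cand[np.arange(N), pick]
W0 = wsum / p_hat(x0)                  # unbiased contribution weight from RIS

# --- One symmetric MH mutation with the same per-pixel target p_hat ---
prop = (x0 + 0.1 * rng.standard_normal(N)) % 1.0   # wrapped proposal (symmetric)
accept = rng.random(N) < p_hat(prop) / p_hat(x0)   # min(1, ratio) implicitly
xk = np.where(accept, prop, x0)
Wk = p_hat(x0) / p_hat(xk) * W0        # Equation (16)

# Mutated samples stay unbiased: E[f(x^k) W(x^k)] = int f; here f(x) = x
est = np.mean(xk * Wk)                 # should land close to 0.5
```

Running this, `est` matches \(\int_0^1 x\,\text{d}x = 0.5\) up to Monte Carlo noise, despite the mutation redistributing samples after resampling.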
Start-up bias. Algorithm 3 does not require a burn-in period for mutations, even though the samples used to initialize MH are not distributed exactly according to \(\hat{p}\). This is because we use the unbiased contribution weights of mutated samples for subsequent steps in ReSTIR, including when computing shading and resampling weights for further reuse. This approach eliminates start-up bias completely for any mutated sample \(x^k\) and function f by ensuring
\begin{equation} \mathbb {E}[f(x^k) \widehat{W}(x^k)] = \int _{\Omega } f(x)\ \text{d}x. \end{equation}
(18)
Appendix A provides a formal proof. Note that avoiding start-up bias does not imply samples generated using MH are well-distributed according to \(\hat{p}\). However, since we initialize MH using reservoir samples that are already distributed roughly proportional to the target function from resampling, our method does not rely on MH to find important samples (see Figure 13)—rather it decorrelates and diversifies outlier samples by mutating them locally in proportion to \(\hat{p}\) and adjusting their estimated contribution weight accordingly.
When to perform mutations? Temporal reservoirs often contain stale samples, as ReSTIR assigns higher relative importance to existing samples. We therefore mutate samples output by Algorithm 2 within each pixel (Figure 5), using the same per-pixel target function as RIS for the current frame. Mutating samples randomly after temporal resampling diversifies the inputs to spatial resampling, protecting against possibly escalating amounts of sample impoverishment caused by repeated reuse.
Applying Algorithm 3 within each pixel to mutate samples after the initial or spatial resampling steps in ReSTIR (Section 2.4) is possible but not required. Like mutations, initial resampling serves to rejuvenate the sample population every frame (by introducing new independent samples into the population). Samples from spatial resampling are stored for future reuse; mutating them proportional to the current target function would cause them to lag by one frame.
Finally, Algorithm 3 places no restrictions on MH iteration count. To improve runtime performance, one could adaptively specify mutation counts per pixel (including no mutations) using, for instance, local correlation estimates. We leave development of such heuristics to future work and use a fixed, user-specified number of iterations.
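As a minimal sketch of the per-pixel mutation step with a fixed iteration count (function names and the symmetric-proposal interface are our own hypothetical choices, not the paper's code):

```python
import random

def mutate_reservoir_sample(x0, W0, p_hat, propose, n_iters=1, rng=random):
    """Apply n_iters Metropolis-Hastings mutations to one reservoir sample
    using the same per-pixel target function p_hat as RIS, then correct its
    contribution weight via Equation (16). `propose(x, rng)` is assumed to
    be a symmetric proposal, so the transition kernels cancel."""
    x = x0
    for _ in range(n_iters):
        y = propose(x, rng)
        # accept with probability min(1, p_hat(y) / p_hat(x))
        if p_hat(x) > 0 and rng.random() * p_hat(x) < p_hat(y):
            x = y
    W = (p_hat(x0) / p_hat(x)) * W0     # Equation (16)
    return x, W
```

Note that only the initial and final states enter the weight correction, mirroring the observation above that Equation (16) does not depend on intermediate chain states.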

4 Implementation Details

We perform mutations for both direct and indirect illumination in ReSTIR using Kelemen et al. [2002]’s primary sample space (PSS) parameterization. This conveniently allows applying mutations directly to random number sequences used to generate light-carrying paths, while constraining path vertices to remain on the scene manifold. Moreover, it simplifies the use of certain shift mappings in ReSTIR PT, e.g., the random replay shift [Lin et al. 2022, Section 7.2].
In this section, we represent samples with a path vertex notation \(\bar{\boldsymbol {\mathbf {x}}}= [\boldsymbol {\mathbf {x}}_0, \boldsymbol {\mathbf {x}}_1, \ldots , \boldsymbol {\mathbf {x}}_k] \in \Omega ^k(\mathcal {M})\), with \(\Omega ^k(\mathcal {M})\) the space of all paths of length k on the scene manifold \(\mathcal {M}\) (e.g., \(k = 2\) for direct lighting). Each path \(\bar{\boldsymbol {\mathbf {x}}}\) is uniquely determined2 by a vector of random numbers \(\bar{\boldsymbol {\mathbf {u}}}= [u_0, u_1, \ldots ] \in [0, 1]^{O(k)}\). We use S to denote a shift mapping from a base path \(\bar{\boldsymbol {\mathbf {x}}}\) in one pixel to an offset path \(\bar{\boldsymbol {\mathbf {y}}}\) in another pixel, i.e., \(S([\boldsymbol {\mathbf {x}}_0, \boldsymbol {\mathbf {x}}_1, \ldots , \boldsymbol {\mathbf {x}}_k]) = [\boldsymbol {\mathbf {y}}_0, \boldsymbol {\mathbf {y}}_1, \ldots , \boldsymbol {\mathbf {y}}_k]\). Mutated paths and random numbers are represented using \(\bar{\boldsymbol {\mathbf {z}}}\) and \(\bar{\boldsymbol {\mathbf {v}}}\) (resp.).

4.1 Primary Sample Space

The PSS parameterization reformulates the acceptance probability in Equation (14) in terms of a contribution function C as follows:
\begin{equation} a(\bar{\boldsymbol {\mathbf {u}}}\rightarrow \bar{\boldsymbol {\mathbf {v}}}) := \min \left(1, \frac{C(\bar{\boldsymbol {\mathbf {v}}})\ T(\bar{\boldsymbol {\mathbf {v}}}\rightarrow \bar{\boldsymbol {\mathbf {u}}})}{C(\bar{\boldsymbol {\mathbf {u}}})\ T(\bar{\boldsymbol {\mathbf {u}}}\rightarrow \bar{\boldsymbol {\mathbf {v}}})}\right). \end{equation}
(19)
For us \(C(\bar{\boldsymbol {\mathbf {u}}}) := \hat{p}(\bar{\boldsymbol {\mathbf {y}}}(\bar{\boldsymbol {\mathbf {u}}})) / q(\bar{\boldsymbol {\mathbf {y}}}(\bar{\boldsymbol {\mathbf {u}}}))\), where \(\hat{p}\) is the per-pixel target function (also used for resampling) and q is the sampling PDF for generating \(\bar{\boldsymbol {\mathbf {y}}}\) from the random numbers \(\bar{\boldsymbol {\mathbf {u}}}\) 3 (with mutated path \(\bar{\boldsymbol {\mathbf {z}}}\) likewise generated from \(\bar{\boldsymbol {\mathbf {v}}}\)). As suggested by Kelemen et al. [2002], we compute \(\bar{\boldsymbol {\mathbf {v}}}\) by perturbing each element of \(\bar{\boldsymbol {\mathbf {u}}}\) with Gaussian noise. We use \(s = s_2\exp (-\log (s_2/s_1)U)\) as our perturbation amount, where \(U\) is sampled uniformly from \([0, 1)\) so that \(s \in (s_1, s_2]\).
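A sketch of this per-coordinate perturbation (the default values for \(s_1, s_2\) below are the common choices from Kelemen et al. [2002], assumed here for illustration rather than taken from the paper):

```python
import math
import random

def perturb_pss(u, s1=1.0 / 1024.0, s2=1.0 / 64.0, rng=random):
    """Perturb each primary-sample-space coordinate by a step of size
    s = s2 * exp(-log(s2 / s1) * U) with U ~ [0, 1), so s lies in
    (s1, s2]; the step direction is random and values wrap to [0, 1)."""
    v = []
    for ui in u:
        s = s2 * math.exp(-math.log(s2 / s1) * rng.random())
        step = s if rng.random() < 0.5 else -s
        v.append((ui + step) % 1.0)
    return v
```

The log-scaled step size concentrates proposals near small perturbations while still occasionally taking steps close to \(s_2\), which keeps acceptance high without freezing the chain in place.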

4.2 Direct Lighting

Our ReSTIR DI mutations perturb the directions of reservoir samples via their random numbers. For direct lighting, path \(\bar{\boldsymbol {\mathbf {y}}}= [\boldsymbol {\mathbf {y}}_0, \boldsymbol {\mathbf {y}}_1, \boldsymbol {\mathbf {y}}_2]\) and its PDF \(q(\bar{\boldsymbol {\mathbf {y}}})\) equals \(p_{\rho }(\omega) |\text{cos}\ \theta | / |\boldsymbol {\mathbf {y}}_2 - \boldsymbol {\mathbf {y}}_1|^2\), where \(p_{\rho }\) is the PDF for importance sampling the BSDF \(\rho\), \(\omega\) is the unit vector from \(\boldsymbol {\mathbf {y}}_1\) to \(\boldsymbol {\mathbf {y}}_2\), and \(\theta\) is the angle between \(\omega\) and the geometric surface normal at \(\boldsymbol {\mathbf {y}}_2\). The PDF \(q(\bar{\boldsymbol {\mathbf {z}}})\) is defined analogously as \(p_{\rho }(\nu) |\text{cos}\ \phi | / |\boldsymbol {\mathbf {z}}_2 - \boldsymbol {\mathbf {y}}_1|^2\) with direction \(\nu\) pointing from \(\boldsymbol {\mathbf {y}}_1\) to mutated vertex \(\boldsymbol {\mathbf {z}}_2\); \(\phi\) is the angle between \(\nu\) and the surface normal at \(\boldsymbol {\mathbf {z}}_2\). Random numbers for the starting MH sample \(\boldsymbol {\mathbf {y}}_2\) are recovered by inverting the sampling procedure for direction \(\omega\) [Bitterli et al. 2017]. Since this mutation is symmetric, the transition kernels in Equation (19) cancel.

4.3 Indirect Illumination

For ReSTIR PT, our mutation strategies are built on shift mappings. Unlike a mutation, a shift mapping deterministically perturbs a base path \(\bar{\boldsymbol {\mathbf {x}}}\) through one pixel into an offset path \(\bar{\boldsymbol {\mathbf {y}}}\) through another pixel during resampling (e.g., Algorithm 2, line 9). For instance, a random replay (RR) shift reuses the random numbers that generate \(\bar{\boldsymbol {\mathbf {x}}}\) to trace \(\bar{\boldsymbol {\mathbf {y}}}\). Since tracing a full path is expensive, a reconnection is often used to connect the offset path to the base path at a given index i, i.e., \(\boldsymbol {\mathbf {y}}_{j} = \boldsymbol {\mathbf {x}}_{j}\) for \(j \ge i\). Connecting paths immediately with \(i = 2\) is called the reconnection (R) shift. Compared to random replay, reconnections are often better at producing paths with similar contributions for diffuse surfaces. But reconnecting \(\boldsymbol {\mathbf {y}}_{i-2}, \boldsymbol {\mathbf {y}}_{i-1}\) to \(\boldsymbol {\mathbf {x}}_i\) on a glossy surface can introduce paths with near-zero throughput, or introduce geometric singularities when \(\boldsymbol {\mathbf {y}}_{i-1}\) and \(\boldsymbol {\mathbf {x}}_i\) are too close.
We use Lin et al.’s [2022] hybrid (H) shift strategy (see Figure 6) to evaluate mutations in ReSTIR PT for all our results except Figure 8, where we use random replay. This shift mapping postpones reconnection using random replay until certain connectability conditions are met (e.g., surface roughness and distance between vertices).
Mutation strategies. As with direct lighting, one way to mutate a path is to perturb the random numbers used to generate it. Like a random replay shift, this approach is expensive, as it requires tracing a full path for each proposed mutation (which may be rejected). We refer to this mutation as a full path (FP) mutation.
A more computationally efficient approach mutates the offset path with random replay up to the reconnection vertex \(\boldsymbol {\mathbf {y}}_i = \boldsymbol {\mathbf {x}}_i\), and then connects to the base path starting at \(\boldsymbol {\mathbf {x}}_{i+1}\) instead (the full path is mutated if a reconnection is not possible). We observe that this partial path (PP) mutation strategy is not only faster, but also has higher acceptance (\(70\%\) vs. \(40\%\) on the scene from Figure 1) as it minimizes changes to the geometry of high-contribution paths selected via resampling. Moreover, its paths have similar contributions to the offset paths it mutates. Note that mutating path vertices with random replay until the reconnection to \(\boldsymbol {\mathbf {x}}_{i+1}\) can cause connectability conditions for the hybrid shift to fail. We reject such mutated samples by defining their transition PDF to be 0.
Taking a step further, our final strategy mutates only the reconnection vertex \(\boldsymbol {\mathbf {y}}_i\) (Figure 6) while keeping the rest of the offset path unchanged, i.e., \([\boldsymbol {\mathbf {z}}_0, \ldots , \boldsymbol {\mathbf {z}}_k] =\) \([\boldsymbol {\mathbf {y}}_0, \boldsymbol {\mathbf {y}}_1, \ldots , \boldsymbol {\mathbf {y}}_{i-1}, \boldsymbol {\mathbf {z}}_i, \boldsymbol {\mathbf {x}}_{i+1}, \ldots , \boldsymbol {\mathbf {x}}_k]\), where \(\boldsymbol {\mathbf {y}}_{i-1}\) connects to \(\boldsymbol {\mathbf {z}}_i\) with mutated random numbers. We found this reconnection vertex (RV) mutation only slightly less effective at reducing correlations. It is, however, significantly faster when performing multiple mutations, as only rays from \(\boldsymbol {\mathbf {y}}_{i-1}\) to \(\boldsymbol {\mathbf {z}}_i\) and \(\boldsymbol {\mathbf {z}}_i\) to \(\boldsymbol {\mathbf {x}}_{i+1}\) need to be traced. We use this mutation to generate results in Section 5, unless noted otherwise. Figure 10 compares the effectiveness of these mutation strategies.
Finally, note that the transition kernels \(T(\bar{\boldsymbol {\mathbf {v}}}\rightarrow \bar{\boldsymbol {\mathbf {u}}})\) and \(T(\bar{\boldsymbol {\mathbf {u}}}\rightarrow \bar{\boldsymbol {\mathbf {v}}})\) are no longer symmetric when offset paths contain a reconnection vertex. In Appendix C, we show that their ratio equals:
\begin{equation} \frac{T(\bar{\boldsymbol {\mathbf {v}}}\rightarrow \bar{\boldsymbol {\mathbf {u}}})}{T(\bar{\boldsymbol {\mathbf {u}}}\rightarrow \bar{\boldsymbol {\mathbf {v}}})} = \frac{|\text{cos}\ \phi |}{|\text{cos}\ \theta |} \frac{|\boldsymbol {\mathbf {y}}_{i+1} - \boldsymbol {\mathbf {y}}_{i}|^2}{|\boldsymbol {\mathbf {y}}_{i+1} - \boldsymbol {\mathbf {z}}_{i}|^2} \frac{p(\nu _{i-1}, \nu _i)}{p(\omega _{i-1}, \omega _i)} \frac{p(\nu _i, \omega _{i+1})}{p(\omega _i, \omega _{i+1})}, \end{equation}
(20)
where \(\omega _{i-1}, \omega _i\), and \(\omega _{i+1}\) are unit vectors from \(\boldsymbol {\mathbf {y}}_{i-1}\) to \(\boldsymbol {\mathbf {y}}_i\), \(\boldsymbol {\mathbf {y}}_i\) to \(\boldsymbol {\mathbf {y}}_{i+1} (= \boldsymbol {\mathbf {x}}_{i+1})\) and \(\boldsymbol {\mathbf {y}}_{i+1}\) to \(\boldsymbol {\mathbf {y}}_{i+2} (= \boldsymbol {\mathbf {x}}_{i+2})\) respectively, \(\nu _{i-1}\) and \(\nu _i\) are unit vectors from \(\boldsymbol {\mathbf {y}}_{i-1}\) to \(\boldsymbol {\mathbf {z}}_i\) and \(\boldsymbol {\mathbf {z}}_i\) to \(\boldsymbol {\mathbf {y}}_{i+1}\), \(\theta\) is the angle between \(\omega _{i}\) and the surface normal at \(\boldsymbol {\mathbf {y}}_{i+1}\), \(\phi\) is the angle between \(\nu _{i}\) and the surface normal at \(\boldsymbol {\mathbf {y}}_{i+1}\), and p is the solid angle PDF used to sample an outgoing direction. Any mutations applied to random numbers for the subpath \([\boldsymbol {\mathbf {y}}_0, \boldsymbol {\mathbf {y}}_1, \ldots , \boldsymbol {\mathbf {y}}_{i-1}]\) do not factor in the ratio as they are symmetric.
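Equation (20) can be transcribed directly. The sketch below (our own, with a hypothetical interface) assumes a caller-supplied solid-angle PDF `pdf(wi, wo)` for sampling outgoing direction `wo` given incoming direction `wi`:

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def rv_transition_ratio(y_prev, y_i, z_i, y_next, n_next, w_next, pdf):
    """T(v -> u) / T(u -> v) for a reconnection-vertex mutation (Equation (20)).
    y_prev, y_i, y_next: offset-path vertices y_{i-1}, y_i, y_{i+1};
    z_i: mutated reconnection vertex; n_next: surface normal at y_{i+1};
    w_next: unit direction omega_{i+1} from y_{i+1} to y_{i+2}."""
    w_im1, w_i = unit(y_i - y_prev), unit(y_next - y_i)    # omega_{i-1}, omega_i
    nu_im1, nu_i = unit(z_i - y_prev), unit(y_next - z_i)  # nu_{i-1}, nu_i
    cos_theta = abs(np.dot(w_i, n_next))
    cos_phi = abs(np.dot(nu_i, n_next))
    geom = np.sum((y_next - y_i) ** 2) / np.sum((y_next - z_i) ** 2)
    return (cos_phi / cos_theta) * geom \
        * (pdf(nu_im1, nu_i) / pdf(w_im1, w_i)) \
        * (pdf(nu_i, w_next) / pdf(w_i, w_next))
```

A useful sanity check on such an implementation: a mutation that leaves the reconnection vertex unchanged (\(\boldsymbol {\mathbf {z}}_i = \boldsymbol {\mathbf {y}}_i\)) must yield a ratio of exactly 1.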
Reservoir storage. Lin et al. [2022, Section 8.2] note ReSTIR PT stores additional data in the reservoir from Algorithm 1, specifically a seed for random replay and the resampled path’s reconnection vertex. For the full and partial path mutation strategies, we need the path’s entire random number sequence since PSS mutations transform this sequence—as a result, the sequence cannot be regenerated from its original seed. This increases the reservoir size as path length grows. Luckily, the reconnection vertex mutation avoids this overhead, only mutating random numbers that sample \(\boldsymbol {\mathbf {z}}_i\) from the fixed offset vertex \(\boldsymbol {\mathbf {y}}_{i-1}\). As in ReSTIR DI, we recover random numbers for \(\boldsymbol {\mathbf {y}}_i\) by inverting the sampling of direction \(\boldsymbol {\mathbf {y}}_i - \boldsymbol {\mathbf {y}}_{i-1}\). The only additional information we store is the offset vertex \(\boldsymbol {\mathbf {y}}_{i+1}\) (which connects to mutated vertex \(\boldsymbol {\mathbf {z}}_i\)).

5 Results and Discussion

We prototyped our method in the open-source Falcor rendering framework [Kallweit et al. 2022]. All results use a GeForce RTX 3090 GPU at 1920 \(\times\) 1080 resolution. Our direct lighting implementation uses the same settings as Bitterli et al. [2020], i.e., initial candidate samples \(M = 32\), spatial reuse radius of 30 pixels from the current pixel, and \(M_\text{cap} = 20\). For indirect illumination, we set \(M = 32\) and the spatial reuse radius to 20 similar to Lin et al. [2022], but use a longer temporal history with \(M_{\text{cap}} = 50\) (unless noted otherwise). Our supplementary video shows 1 spp results for all our scenes; Table 1 gives single frame timings.
As we show in Figure 1 and Figures 7–12, short-range correlation artifacts are noticeably reduced in scenes with glossy materials and difficult lighting with just 1–5 mutations; further mutations have diminishing returns in improving image quality (Figures 9 and 10). Mutation cost overhead is generally less than simply increasing sample count (Figure 7), and recent denoisers [NVIDIA 2017] provide considerably better results with our decorrelated samples (Figure 1). Figure 8 shows mutations greatly reduce sample impoverishment, with fewer reservoirs sharing the exact same sample realizations.
Compared to standard path tracing, ReSTIR is much faster at achieving equal-error via correlated sampling for real-time direct [Bitterli et al. 2020, Figure 8] and global illumination [Lin et al. 2022, Figure 13]. Mutations, however, provide only marginal improvements in mean squared error in ReSTIR samplers (Figures 9 and 13), without ever negatively impacting results. Akin to blue-noise dithering [Georgiev and Fajardo 2016; Heitz and Belcour 2019], our image quality improves despite errors having similar magnitudes. The reason is that mutating within a pixel leaves the sum of resampling weights unchanged in Equation (17), and these weights ultimately control RIS estimator variance (Equation (9)). Mutations do slightly reduce variance, as they indirectly alter resampling weights of future samples thanks to spatiotemporal reuse of the new, more diverse sample population; the supplementary document has more details. In Figures 11 and 12 we also ablate \(M_{\text{cap}}\) values to show the greater leeway our approach offers for this parameter, allowing the use of larger values to trade noise for correlation.
Since ReSTIR often suffers from correlation artifacts, we quantify improvements in correlation by computing sample covariance between pixels, which naturally generalizes sample variance. This metric measures the joint variability of two random variables (e.g., whether the error in two pixels varies similarly). For pixels i and j in image I, sample covariance \(\text{Cov}(i, j)\) between i and j is given by
\begin{equation} \text{Cov}(i, j) = \frac{1}{K-1} \sum _{k=1}^K (I_{ki} - \bar{I}_{i})(I_{kj} - \bar{I}_{j}), \end{equation}
(21)
where K is the number of images used to estimate covariance (we use \(K = 100\)), and \(\bar{I}\) is the average of K images. To capture the joint variability of a pixel with its local neighborhood, in our experiments, we average covariance estimates over boxes of a given radius centered at each pixel. We then further average over the entire image to get a single number. Figure 9 (bottom left) shows average radial covariance decreases with increasing spatial radius. This is expected as ReSTIR only reuses samples in local neighborhoods (so small-scale correlation artifacts are more pronounced); mutations reduce covariance in these short ranges.
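The metric can be sketched as follows (the wrap-around at image borders via `np.roll` is a simplification of our choosing, not necessarily the paper's edge handling):

```python
import numpy as np

def avg_box_covariance(images, radius):
    """Average sample covariance (Equation (21)) between each pixel and
    every pixel in the (2*radius+1)^2 box around it, then averaged over
    the whole image. `images` has shape (K, H, W): K renders of a scene."""
    K = images.shape[0]
    dev = images - images.mean(axis=0)               # I_k - I_bar
    total, count = 0.0, 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(dev, (dy, dx), axis=(1, 2))
            cov = (dev * shifted).sum(axis=0) / (K - 1)  # Cov(i, j) per pixel
            total += cov.mean()
            count += 1
    return total / count
```

With `radius = 0` this reduces to the mean per-pixel sample variance, which is why the metric is described as a natural generalization of variance.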
Table 1 lists the reduction in average covariance (with pixel radius equal to 8) and the FLIP weighted median score [Andersson et al. 2020] we observe on our scenes—FLIP tends to be marginally more sensitive to short range correlations than mean squared error. As correlations are typically localized, the reduction is even larger for the image insets in our figures compared to the results in Table 1. Ineffective shift mappings in ReSTIR often result in increased correlations; mutations compensate for this shortcoming. For instance, mutations typically have fewer correlation artifacts to resolve with a hybrid shift in ReSTIR PT compared to, e.g., random replay (Figures 13 and 8 respectively), which highlights the benefit of using good shift mappings. In contrast, mutations provide greater covariance reduction in the scenes rendered with ReSTIR DI (Figure 7), where higher covariance stems from vertex reconnections failing to preserve path contributions for low roughness surfaces.
Why do mutations help? The supplemental document details why mutations reduce covariance, simplifying down to the following, somewhat unintuitive, phenomenon: without mutations, covariance between pixels i and j stems from the mismatch between the distributions of input samples and the target functions at i and j (Equation (16) in the supplemental). However, in the limit of infinite mutations, covariance is determined by each sample’s mismatch with its own pixel’s target function (Equation (10) in the supplemental) due to the ratio \(\hat{p}(x^0)/\hat{p}(x^k)\) in the mutated contribution weight (Equation (16)); this mismatch tends to be smaller. Though our analysis predicts that covariance does not vanish completely even with infinite mutations, our results show covariance is often reduced with just one mutation.

6 Related Work

Our method builds directly on the recent ReSTIR family of algorithms for real-time direct [Bitterli et al. 2020] and global illumination [Ouyang et al. 2021; Lin et al. 2021; 2022]. We augment spatiotemporal reservoir resampling in ReSTIR with sample mutations, and demonstrate the complementary strengths of resampling and mutations in this framework. In graphics, our approach is most closely related to Metropolis Light Transport (MLT) [Veach and Guibas 1997] and associated techniques [Kelemen et al. 2002; Jakob and Marschner 2012; Lehtinen et al. 2013; Hachisuka et al. 2014; Otsu et al. 2018; Cline et al. 2005; Lai et al. 2007; 2009; Bashford-Rogers et al. 2021]. In the broader Monte Carlo landscape, our approach belongs to the class of algorithms that jointly use resampling and mutations for sampling problems, such as SMC [Doucet et al. 2001] and PMC [Cappé et al. 2004]. We discuss the relation to MLT, SMC and PMC in more detail next; Table 2 provides a summary. We refer the reader to Bitterli et al. [2020, Section 7] and Lin et al. [2022, Section 9.3] for comparisons between ReSTIR and other rendering algorithms that exploit path reuse and spatial correlations.
Metropolis Light Transport. MLT uses statistically correlated samples generated by Metropolis–Hastings to solve the rendering equation. Unlike algorithms using independent samples, MLT is effective at finding difficult light paths by locally exploring the path space. It reuses samples by mutating high-contribution paths over the image. Algorithmically, our method resembles MLT in various ways. Both techniques require secondary estimators, respectively RIS and bidirectional path tracing (BDPT) [Lafortune and Willems 1993; Veach and Guibas 1995a], to normalize the MH target function. Samples used by these estimators are resampled into a smaller set to initialize MH (our Section 3 and Veach [1998, Chapter 11.3.1]), and contributions of mutated samples are effectively weighted by the same weights (Equation (17)) to remain unbiased (our Appendix A and Veach [1998, Appendix 11.A]).
The crucial difference between our work and MLT lies in how samples are reused across pixels. MLT latches onto high-contribution paths and mutates them over the entire image while resampling just to eliminate start-up bias. Thus, MLT results often contain correlation artifacts caused by mutations, applying MH to both find important samples and redistribute them between pixels. In contrast, ReSTIR derives spatiotemporal reuse exclusively from resampling; in this article, we mutate samples within each pixel to mitigate correlations and sample impoverishment from spatiotemporal resampling. As a result, our method does not require numerous MH iterations, as the primary purpose of mutations is not finding important paths (Figures 12 and 13). Further, our approach suits real-time rendering as it integrates seamlessly into ReSTIR. MLT can be adapted to mutate temporally, but unlike our work, the entire animated sequence must be available in advance [Van de Woestijne et al. 2017].
Several features have recently been added to MLT, including sample stratification [Cline et al. 2005; Gruson et al. 2020], MIS [Hachisuka et al. 2014] and enhanced mutation strategies [Jakob and Marschner 2012; Kaplanyan et al. 2014; Bitterli et al. 2017; Otsu et al. 2018; Luan et al. 2020]. Though we mostly employ simple PSS mutations [Kelemen et al. 2002], many of these improvements can also be incorporated into our approach to reduce correlations faster.
Sequential Monte Carlo. SMC is a family of Monte Carlo methods used for filtering and tracking problems in Bayesian inference and signal processing [Doucet et al. 2001]. As shown in the inset, the goal is to maintain a population of weighted samples distributed roughly proportional to an evolving target distribution (with unknown normalization factor). Sample weights are adjusted every iteration to reflect each sample’s importance to the most recent distribution. Resampling discards samples with low weights and duplicates those with high weights. Mutations ensure the population does not contain identical samples. Unlike ReSTIR, which uses RIS, SMC methods use weighted importance sampling (WIS) to estimate correlated integrals in a chained fashion, i.e., the current step’s sample weights and normalization factors are defined incrementally based on corresponding quantities from earlier steps [Del Moral et al. 2006]. This allows temporally reusing samples for estimation, instead of generating new samples every frame.
SMC methods have found limited use in rendering: Hedman et al. [2016] maintain a temporally coherent distribution of VPLs for indirect illumination via a sampling method that moves as few of the VPLs between frames as possible. This work is only loosely inspired by SMC, in that it does not perform any sample mutations and generates biased results as it discards VPLs heuristically.
Ghosh et al. [2006] instead sample a sequence of per-pixel target functions for direct illumination of dynamic environment maps. Unlike ReSTIR, which derives its samples from spatiotemporally neighboring pixels, Ghosh et al. [2006] maintain a fixed sample population per pixel that is resampled and mutated to be updated for each frame. Large populations are needed for effective importance sampling, as high-contribution samples are not shared spatially between pixels; in contrast, ReSTIR often stores just a single sample per reservoir. SMC methods likely require MIS weights and shift mappings (like ReSTIR) to resolve bias and correctly derive effective spatiotemporal reuse from neighbors. Similar to our work, mutations mitigate sample impoverishment but do not provide reuse.
Population Monte Carlo. PMC methods also couple resampling and mutations to distribute weighted samples in proportion to a sequence of target functions [Cappé et al. 2004]. The main added feature is they sample using parametric mixture models with simple source PDFs. Mixture probabilities are tuned for each target function using previously generated samples and their importance.
In rendering, the PMC framework has been used for direct lighting [Fan et al. 2007; Lai et al. 2015], global illumination [Lai and Dyer 2007; Lai et al. 2007], and animation [Lai et al. 2009]. Lai et al.’s [2009] work is most relevant to ours: they derive sample reuse by mutating samples spatially and temporally across the image plane using Energy Redistribution Path Tracing (ERPT) [Cline et al. 2005]. Resampling serves to select high-contribution samples while discarding those with small weights; it is also used to refresh the sample population (much like initial resampling in ReSTIR) and eliminate start-up bias from mutations. Unlike our method, they require knowing animated sequences in advance, precluding most real-time applications.

7 Limitations and Future Work

In this article, we provide an unbiased mechanism leveraging MCMC mutations to diversify ReSTIR’s sample population. Often, just a single mutation per pixel effectively mitigates correlation artifacts in glossy scenes with complex lighting. However, as in most MCMC schemes, we cannot accurately predict the number of Metropolis–Hastings iterations needed to reduce correlations below a given threshold. Beyond the analysis in the supplemental document, further investigation is also needed to understand how mutations address sample impoverishment in ReSTIR—not just in terms of the number of duplicate samples (Figure 8), but also the discrepancy characteristics of the resulting sample population.
Mutations in ReSTIR have a non-negligible run-time overhead. Though we demonstrate improvements on an equal-time covariance metric with simple mutation strategies in both ReSTIR DI and PT (Figure 7 and Table 1), more sophisticated mutations [Jakob and Marschner 2012; Kaplanyan et al. 2014; Bitterli et al. 2017; Otsu et al. 2018; Luan et al. 2020] could provide further gains. Our decision to mutate only after temporal (but not spatial) resampling is informed in part by run-time considerations. As mentioned in Section 3, applying mutations selectively (i.e., not at each pixel every frame) based on local correlation heuristics could improve performance. As both mutations and ReSTIR’s initial path candidates serve to rejuvenate the sample population, it may also be worthwhile to carefully balance costs of per-pixel mutations and new path candidates.
Our proposed sample mutations reduce the correlation between nearby pixels, leading to an error distribution (likely) closer to white noise. But blue noise error distributions are often superior with respect to human perception [Mitchell 1987]; perhaps our mutations could be modified to more directly optimize for blue noise characteristics. For example, when deciding mutation acceptance, we might consider both the target function and the neighboring pixel samples, preferring mutations that introduce differing sample values. A further improvement might apply the insights of Heitz and Belcour [2019] to optimize the image-space distribution of error rather than solely considering the sample values.
Like Metropolis Light Transport, mutating samples across pixels potentially unlocks further amortization by sharing samples over the entire image (e.g., using the expected values technique [Veach 1998, Section 11.5]). We leave such “cross-pixel” mutations to future work as the change of integration domain from one pixel to another requires a further adjustment to a mutated sample’s contribution weight beyond Equation (16), i.e., as with temporal and spatial resampling, MIS weights and shift mappings are likely needed to correctly mutate a sample in proportion to the target function of a neighboring pixel whose support does not coincide entirely with its own. Alternatively, it is possible that the replica exchange approach adopted by Gruson et al. [2020] enables swapping of independent Markov chains from separate pixels in our framework as well.
Our mathematical presentation of resampling for ReSTIR in Section 2 is essentially unchanged from that of Lin et al. [2021] for fast volume rendering. Consequently, extending their approach to incorporate mutations from Section 3 likely requires no further modifications to the framework for volumes beyond the specific choice of mutation strategies, such as the scattering and propagation perturbations proposed by Pauly et al. [2000, Section 5.1].
More generally, by augmenting ReSTIR with mutations, our work establishes a closer correspondence between the RIS-based resampling techniques developed in graphics, and those in the broader statistics literature such as SMC and PMC. In particular, our approach stands to benefit from techniques such as annealed importance sampling [Neal 2001] used in SMC to reduce variance in the resampling weights [Ghosh et al. 2006, Section 4], as well as from adaptation strategies for mutation kernels developed in PMC to increase acceptance rates [Lai et al. 2007, Section 4.2]. Moreover, as in these fields, mutations in ReSTIR open the door not just to artifact-free integration (of the rendering equation), but also to tracking and filtering problems—for instance using well-distributed sample populations generated by our approach as training data for path guiding [Müller et al. 2017; Müller et al. 2019].

Acknowledgments

The authors thank Aaron Lefohn, Bill Dally and Keenan Crane for supporting this work. This work was generously supported by an NVIDIA Graduate Fellowship for the first author during his graduate studies at Carnegie Mellon University.

Footnotes

1
acronym for Reservoir-based Spatio-Temporal Importance Resampling.
2
As in Bitterli et al. [2017], we bijectively map between paths and their random numbers by padding paths with extra dimensions.
3
The starting unmutated path \(\bar{\boldsymbol {\mathbf {y}}}\) for MH could have been generated in ReSTIR from one of many sampling schemes (e.g., light or BSDF sampling), or over multiple rounds of resampling. Here, we do not require the random numbers \(\bar{\boldsymbol {\mathbf {u}}}\) that originally generated \(\bar{\boldsymbol {\mathbf {y}}}\); Sections 4.2 and 4.3 discuss the \(\bar{\boldsymbol {\mathbf {u}}}\) we use for mutations.

Supplementary Material

tog-22-0081-File004 (tog-22-0081-file004.m4v)
Supplementary video
3629166.supp (3629166.supp.pdf)
Supplementary material

A Unbiased Contribution Weights and Elimination of Startup Bias

Equation (16) shows how to update the contribution weight \(\widehat{W}(x^k)\) of a mutated sample \(x^k\) from the Markov chain \(x^0, \ldots , x^k, \ldots\), generated with target function \(\hat{p}\). Here we prove this rule yields an unbiased contribution weight for any mutated sample \(x^k\), i.e., for any f with the same or smaller support,
\begin{equation} \mathbb {E}[f(x^k) \widehat{W}(x^k)] = \int _{\Omega } f(x)\ \text{d}x. \end{equation}
(22)
We assume the sample \(x^0\) initializing the chain has the same support \(\Omega\) as the target function \(\hat{p}\). For us, this is guaranteed by chained applications of RIS with a valid shift map in Algorithm 2. Any \(x^0\) chosen by RIS is not distributed exactly in proportion to \(\hat{p}\) (unless we use infinitely many samples); however, its contribution weight \(\widehat{W}(x^0)\) is unbiased and satisfies Equation (22) [Lin et al. 2022]. Next, we show that access to \(\widehat{W}(x^0)\) is sufficient to eliminate any startup bias with MH.
Proof. To show Equation (22) holds, we first express the update rule for contribution weight \(\widehat{W}(x^k)\) (for \(k \gt 0\)) in terms of the previous sample \(x^{k-1}\) in the chain, as follows:
\begin{equation} \widehat{W}(x^k) = \frac{\hat{p}(x^{k-1})}{\hat{p}(x^k)} \widehat{W}(x^{k-1}). \end{equation}
(23)
This is equivalent to Equation (16), shown by recursively unfolding this relationship for all prior samples \(x^{k-1}\) to \(x^1\) in the chain. As in Section 2.5, we also assume a candidate mutation to \(x^{k-1}\) is generated using the proposal density \(T(x^{k-1} \rightarrow z^{k-1})\), with acceptance probability for candidate \(z^{k-1}\) given by Equation (14):
\begin{equation} a(x^{k-1} \rightarrow z^{k-1}) := \min \left(1, \frac{\hat{p}(z^{k-1})\ T(z^{k-1} \rightarrow x^{k-1})}{\hat{p}(x^{k-1})\ T(x^{k-1} \rightarrow z^{k-1})}\right). \end{equation}
(24)
Metropolis–Hastings sets \(x^k = z^{k-1}\) with probability a; otherwise \(x^k = x^{k-1}\). This lets us rewrite the expectation in Equation (22):
\begin{align} \mathbb {E}[f(x^k) \widehat{W}(x^k)] &= \mathbb {E}\left[f(x^k) \frac{\hat{p}(x^{k-1})}{\hat{p}(x^k)} \widehat{W}(x^{k-1})\right]\nonumber\\ &= \mathbb {E}\left[f(z^{k-1}) a(x^{k-1} \rightarrow z^{k-1}) \frac{\hat{p}(x^{k-1})}{\hat{p}(z^{k-1})} \widehat{W}(x^{k-1})\right]\nonumber\\ &\quad +\ \mathbb {E}[f(x^{k-1}) (1 - a(x^{k-1} \rightarrow z^{k-1})) \widehat{W}(x^{k-1})]. \end{align}
(25)
Rearranging the terms slightly yields
\begin{align} &\mathbb {E}[f(x^k) \widehat{W}(x^k)] = \mathbb {E}[f(x^{k-1}) \widehat{W}(x^{k-1})] \nonumber\\ &\quad + \mathbb {E}\left[f(z^{k-1}) a(x^{k-1} \rightarrow z^{k-1}) \frac{\hat{p}(x^{k-1})}{\hat{p}(z^{k-1})} \widehat{W}(x^{k-1})\right] \nonumber\\ &\quad - \mathbb {E}[f(x^{k-1}) a(x^{k-1} \rightarrow z^{k-1}) \widehat{W}(x^{k-1})]. \end{align}
(26)
We now write each expectation as an integral. First, assume an inductive hypothesis \(\mathbb {E}[f(x^{k-1}) \widehat{W}(x^{k-1})]\!=\! \int _{\Omega } f(x)\ \text{d}x\) for the \(k\!-\!1^{\textrm {st}}\) MH iteration. Base case \(k\!=\!1\) holds trivially, as \(\widehat{W}(x^0)\) is an unbiased contribution weight. Next, note that for any integrable function \(g(x^{k-1}, z^{k-1})\), its expectation \(\mathbb {E}[g(x^{k-1}, z^{k-1}) \widehat{W}(x^{k-1})]\) can be rewritten as a conditional expectation over candidate mutations:
\begin{align} &\mathbb {E}[g(x^{k-1}, z^{k-1}) \widehat{W}(x^{k-1})] = \mathbb {E}[E[g(x^{k-1}, z^{k-1}) \mid x^{k-1}]\ \widehat{W}(x^{k-1})] \nonumber\\ &\quad = \mathbb {E}\left[\left(\int _{\Omega } g(x^{k-1}, z) T(x^{k-1} \rightarrow z)\ \text{d}z\right) \widehat{W}(x^{k-1})\right], \end{align}
(27)
where T is the proposal density used for mutations. This lets us expand out Equation (26) as follows:
\begin{align} \mathbb {E}[f(x^k) \widehat{W}(x^k)] &= \int _{\Omega } f(x)\ \text{d}x \nonumber\\ &\quad + \int _{\Omega } \int _{\Omega } f(z) a(x \rightarrow z) \frac{\hat{p}(x)}{\hat{p}(z)} T(x \rightarrow z)\ \text{d}z\ \text{d}x \nonumber\\ &\quad -\int _{\Omega } \int _{\Omega } f(x) a(x \rightarrow z) T(x \rightarrow z)\ \text{d}z\ \text{d}x. \end{align}
(28)
Finally, we show that the two double integrals cancel each other out (resulting in \(\mathbb {E}[f(x^k) \widehat{W}(x^k)] = \int _{\Omega } f(x)\ \text{d}x\)) by invoking the detailed balance condition from Equation (15):
\begin{equation} \hat{p}(x) T(x \rightarrow z) a(x \rightarrow z) = \hat{p}(z) T(z \rightarrow x) a(z \rightarrow x) \end{equation}
(29)
and rewriting it as
\begin{equation} T(x \rightarrow z) a(x \rightarrow z) = \frac{\hat{p}(z)}{\hat{p}(x)} T(z \rightarrow x) a(z \rightarrow x). \end{equation}
(30)
Substituting for \(T(x \rightarrow z) a(x \rightarrow z)\) in the third line of Equation (28) yields the same integral as the second line, but with integration variables x and z swapped. Renaming x and z and swapping the integration order in the third line allows cancellation, simplifying to \(\int _{\Omega } f(x)\ \text{d}x\), yielding Equation (22), and giving a proof by induction.
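The inductive argument above can also be checked numerically. The following is a minimal 1D toy sketch (the target \(\hat{p}(x) = x^2 + 0.1\), the integrand, and all function names are hypothetical, not from the paper): RIS produces an initial sample \(x^0\) with an unbiased contribution weight, a few Metropolis–Hastings mutations with a symmetric uniform proposal then update the weight via Equation (23), and averaging \(f(x^k)\widehat{W}(x^k)\) over many independent trials recovers \(\int_\Omega f(x)\,\text{d}x\) regardless of how many mutations were applied.

```python
import random

def p_hat(x):
    """Hypothetical unnormalized target on Omega = [0, 1]."""
    return x * x + 0.1

def f(x):
    """Integrand; the exact integral of x over [0, 1] is 0.5."""
    return x

def ris_sample(M, rng):
    """RIS with M uniform candidates: returns (x0, W(x0)) with an unbiased W."""
    cands = [rng.random() for _ in range(M)]
    ws = [p_hat(x) for x in cands]          # w_i = p_hat(x_i) / q(x_i), q = 1
    total = sum(ws)
    r, x0 = rng.random() * total, cands[-1]
    for x, w in zip(cands, ws):             # pick one candidate with prob w_i / total
        r -= w
        if r <= 0.0:
            x0 = x
            break
    return x0, total / (M * p_hat(x0))      # W(x0) = (1 / p_hat(x0)) (1/M) sum w_i

def mutate(x, W, k, rng):
    """k MH steps with a symmetric uniform proposal; W updated via Eq. (23)."""
    for _ in range(k):
        z = rng.random()                    # T(x -> z) = T(z -> x) = 1
        if rng.random() < min(1.0, p_hat(z) / p_hat(x)):
            W *= p_hat(x) / p_hat(z)        # accepted: Eq. (23); on rejection the ratio is 1
            x = z
    return x, W

rng = random.Random(7)
N = 100_000
est = sum(f(x) * W for x, W in
          (mutate(*ris_sample(8, rng), 3, rng) for _ in range(N))) / N
print(est)   # ~0.5, matching the exact integral, for any number of mutations
```

Note the weight update runs inside the mutation loop whether or not the candidate is accepted; on rejection \(x^k = x^{k-1}\) and the ratio in Equation (23) is simply 1.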

B MIS Weights for Temporal Reuse

ReSTIR uses MIS weights during resampling (Equations (12) and (13)) to mitigate noise and bias from reusing samples across pixels. We provide explicit expressions for the MIS weights used in Algorithm 2 here; Lin et al. [2022, Equations (37)–(38)] provide similar expressions for Pairwise MIS weights needed for spatial resampling.
Let \(S_j\!:\!\Omega _j\!\rightarrow \!\Omega _i\) denote the shift map from pixel j to pixel i. Let \(x_i\) and \(x_j\) further represent the corresponding samples for these pixels, and \(S_j(x_j)\!=\!x_j^{\prime }\) and \(S_j^{-1}(x_i)\!=\!x_i^{\prime }\) the respective shift mapped values. The MIS weights for \(x_i\) and \(x_j^{\prime }\) are then given by
\begin{align} m_i(x_i) = \frac{M_i\ \hat{p}_i(x_i)}{M_i\ \hat{p}_i(x_i) + M_j\ \hat{p}_j(x_i^{\prime })\ |\partial x_i^{\prime } / \partial x_i|}, \end{align}
(31)
\begin{align} m_j(x_j^{\prime }) = \frac{M_j\ \hat{p}_j(x_j)\ |\partial x_j/\partial x_j^{\prime }|}{M_j\ \hat{p}_j(x_j)\ |\partial x_j/\partial x_j^{\prime }| + M_i\ \hat{p}_i(x_j^{\prime })}. \end{align}
(32)
We set \(m_i(x_i) = 1\) and \(m_j(x_j^{\prime }) = 0\) when valid shifts do not exist for \(x_i\) and \(x_j\) (resp.); Lin et al. [2022, Section 5.6] discusses properties of these MIS weights in detail.
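As a concrete illustration of Equations (31) and (32), the toy 1D sketch below (a hypothetical affine shift map and hypothetical target functions, not the paper's implementation) evaluates both weights. For an affine shift the Jacobians reduce to a constant \(|\text{d}S/\text{d}x|\), and when \(x_i\) and \(x_j\) reference the same underlying point the two pairwise weights sum to one.

```python
def mis_weights(p_i, p_j, M_i, M_j, S, S_inv, dS, x_i, x_j):
    """Evaluate Eqs. (31)-(32) in 1D; all argument names are hypothetical.

    S maps pixel j's domain to pixel i's, S_inv is its inverse, and
    dS(x) = |dS/dx| is its Jacobian at the point where S is applied.
    """
    xi_p = S_inv(x_i)        # x_i' = S_j^{-1}(x_i); |dx_i'/dx_i| = 1 / dS(xi_p)
    xj_p = S(x_j)            # x_j' = S_j(x_j);      |dx_j/dx_j'| = 1 / dS(x_j)
    m_i = (M_i * p_i(x_i)) / (M_i * p_i(x_i) + M_j * p_j(xi_p) / dS(xi_p))
    num_j = M_j * p_j(x_j) / dS(x_j)
    m_j = num_j / (num_j + M_i * p_i(xj_p))
    return m_i, m_j

# Hypothetical affine shift map and unnormalized targets.
S, S_inv, dS = (lambda x: 2.0 * x + 0.3), (lambda y: (y - 0.3) / 2.0), (lambda x: 2.0)
p_i, p_j = (lambda x: x + 1.0), (lambda x: 2.0 * x + 0.5)

x_i = 0.7
x_j = S_inv(x_i)             # pick x_j so both samples reference the same point
m_i, m_j = mis_weights(p_i, p_j, 4, 16, S, S_inv, dS, x_i, x_j)
print(m_i, m_j)              # the two weights sum to 1 at a shared point
```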

C Transition Kernel for Mutating A Reconnection Vertex

A mutation involving a reconnection vertex \(\boldsymbol {\mathbf {y}}_i\) requires modifying random numbers not just for \(\boldsymbol {\mathbf {y}}_i\), but also for the non-mutated vertices \(\boldsymbol {\mathbf {y}}_{i+1}\) and \(\boldsymbol {\mathbf {y}}_{i+2}\). This is because the solid angle PDFs used to sample outgoing directions \(\nu _{i}\) and \(\omega _{i+1}\) in Section 4.3 depend on the mutated incoming directions \(\nu _{i-1}\) and \(\nu _{i}\) (resp.). Here we derive Equation (20) by first noting the joint PDF for connecting the mutated reconnection vertex \(\boldsymbol {\mathbf {z}}_i\) to \(\boldsymbol {\mathbf {z}}_{i+1}\) and \(\boldsymbol {\mathbf {z}}_{i+1}\) to \(\boldsymbol {\mathbf {z}}_{i+2}\) in the surface area measure is:
\begin{equation*} p(\boldsymbol {\mathbf {z}}_{i+2}, \boldsymbol {\mathbf {z}}_{i+1} | [\boldsymbol {\mathbf {z}}_0, \boldsymbol {\mathbf {z}}_1, \ldots , \boldsymbol {\mathbf {z}}_i]) = \delta (\boldsymbol {\mathbf {z}}_{i+2} - \boldsymbol {\mathbf {y}}_{i+2}) \delta (\boldsymbol {\mathbf {z}}_{i+1} - \boldsymbol {\mathbf {y}}_{i+1}). \end{equation*}
This is a product of delta functions as \(\boldsymbol {\mathbf {z}}_{i+1}=\boldsymbol {\mathbf {y}}_{i+1}\) and \(\boldsymbol {\mathbf {z}}_{i+2}=\boldsymbol {\mathbf {y}}_{i+2}\) are the only valid vertex positions. In the PSS to path space mapping, the joint PDF for the mutated random numbers \(\bar{\boldsymbol {\mathbf {v}}}_{i+1}\) and \(\bar{\boldsymbol {\mathbf {v}}}_{i+2}\) (for vertices \(\boldsymbol {\mathbf {y}}_{i+1}\) and \(\boldsymbol {\mathbf {y}}_{i+2}\)) is related to \(p(\boldsymbol {\mathbf {z}}_{i+2}, \boldsymbol {\mathbf {z}}_{i+1}\ |\ [\boldsymbol {\mathbf {z}}_0, \boldsymbol {\mathbf {z}}_1, \ldots , \boldsymbol {\mathbf {z}}_i])\) via a Jacobian determinant:
\begin{equation*} p(\bar{\boldsymbol {\mathbf {v}}}_{i+2}, \bar{\boldsymbol {\mathbf {v}}}_{i+1} | [\bar{\boldsymbol {\mathbf {v}}}_0, \bar{\boldsymbol {\mathbf {v}}}_1, \ldots , \bar{\boldsymbol {\mathbf {v}}}_i]) = p(\boldsymbol {\mathbf {z}}_{i+2}, \boldsymbol {\mathbf {z}}_{i+1} | [\boldsymbol {\mathbf {z}}_0, \boldsymbol {\mathbf {z}}_1, \ldots , \boldsymbol {\mathbf {z}}_i]) \left|\frac{\partial {\bar{\boldsymbol {\mathbf {z}}}}}{\partial {\bar{\boldsymbol {\mathbf {v}}}}}\right| \end{equation*}
This PDF serves as our proposal density \(T(\bar{\boldsymbol {\mathbf {u}}}\rightarrow \bar{\boldsymbol {\mathbf {v}}})\) for mutations, which then yields:
\begin{align} \frac{T(\bar{\boldsymbol {\mathbf {v}}}\rightarrow \bar{\boldsymbol {\mathbf {u}}})}{T(\bar{\boldsymbol {\mathbf {u}}}\rightarrow \bar{\boldsymbol {\mathbf {v}}})} &= \left|\frac{\partial {\bar{\boldsymbol {\mathbf {v}}}}}{\partial {\bar{\boldsymbol {\mathbf {z}}}}}\right| \left|\frac{\partial {\bar{\boldsymbol {\mathbf {y}}}}}{\partial {\bar{\boldsymbol {\mathbf {u}}}}}\right|\frac{\delta (\boldsymbol {\mathbf {y}}_{i+2} - \boldsymbol {\mathbf {z}}_{i+2}) \delta (\boldsymbol {\mathbf {y}}_{i+1} - \boldsymbol {\mathbf {z}}_{i+1})}{\delta (\boldsymbol {\mathbf {z}}_{i+2} - \boldsymbol {\mathbf {y}}_{i+2}) \delta (\boldsymbol {\mathbf {z}}_{i+1} - \boldsymbol {\mathbf {y}}_{i+1})} \nonumber\\ &= \left|\frac{\partial {\bar{\boldsymbol {\mathbf {v}}}}}{\partial {\bar{\nu }}}\right| \left|\frac{\partial {\bar{\nu }}}{\partial {\bar{\boldsymbol {\mathbf {z}}}}}\right|\left|\frac{\partial {\bar{\boldsymbol {\mathbf {y}}}}}{\partial {\bar{\omega }}}\right|\left|\frac{\partial {\bar{\omega }}}{\partial {\bar{\boldsymbol {\mathbf {u}}}}}\right|\nonumber\\ &= \left|\frac{\partial {\bar{\nu }}}{\partial {\bar{\boldsymbol {\mathbf {z}}}}}\right|\left|\frac{\partial {\bar{\boldsymbol {\mathbf {y}}}}}{\partial {\bar{\omega }}}\right|\frac{p(\nu _{i-1}, \nu _i)}{p(\omega _{i-1}, \omega _i)} \frac{p(\nu _i, \omega _{i+1})}{p(\omega _i, \omega _{i+1})}\nonumber\\ &= \frac{|\text{cos}\ \phi |}{|\text{cos}\ \theta |} \frac{|\boldsymbol {\mathbf {y}}_{i+1} - \boldsymbol {\mathbf {y}}_{i}|^2}{|\boldsymbol {\mathbf {y}}_{i+1} - \boldsymbol {\mathbf {z}}_{i}|^2} \frac{p(\nu _{i-1}, \nu _i)}{p(\omega _{i-1}, \omega _i)} \frac{p(\nu _i, \omega _{i+1})}{p(\omega _i, \omega _{i+1})}. \end{align}
(33)
The delta functions in the first line cancel since they are symmetric. In the third line, we use the fact that the Jacobian determinant of a sampling scheme is the same as its inverse PDF [Kelemen et al. 2002, Section 2]. The final step substitutes in the definition of the Jacobians relating the solid angle and area measures.
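The final line of Equation (33) can be assembled directly. The sketch below (hypothetical names, standard library only) computes the geometric factor from the mutated and unmutated reconnection directions and takes the solid-angle PDF products as opaque inputs; as a sanity check, an identity mutation with unchanged PDFs gives a ratio of exactly 1.

```python
import math

def sub(a, b):  return [a[k] - b[k] for k in range(3)]
def dot(a, b):  return sum(a[k] * b[k] for k in range(3))
def norm(a):    return math.sqrt(dot(a, a))

def kernel_ratio(y_i, z_i, y_next, n_next, pdf_old, pdf_new):
    """Eq. (33): T(v -> u) / T(u -> v) for a mutated reconnection vertex.

    y_i, z_i : reconnection vertex before / after mutation
    y_next   : the fixed vertex y_{i+1} being reconnected to
    n_next   : unit surface normal at y_next
    pdf_old  : p(omega_{i-1}, omega_i) * p(omega_i, omega_{i+1})
    pdf_new  : p(nu_{i-1}, nu_i) * p(nu_i, omega_{i+1})
    (the solid-angle PDF products for the unmutated / mutated directions)
    """
    d_old, d_new = sub(y_next, y_i), sub(y_next, z_i)
    cos_theta = abs(dot(d_old, n_next)) / norm(d_old)   # |cos theta| at y_{i+1}
    cos_phi   = abs(dot(d_new, n_next)) / norm(d_new)   # |cos phi| after mutation
    geom = (cos_phi / cos_theta) * (dot(d_old, d_old) / dot(d_new, d_new))
    return geom * pdf_new / pdf_old

y_i    = [0.0, 0.0, 0.0]
y_next = [1.0, 0.0, 1.0]
n_next = [0.0, 0.0, -1.0]
r = kernel_ratio(y_i, y_i, y_next, n_next, pdf_old=0.8, pdf_new=0.8)
print(r)   # 1.0 for an identity mutation
```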

References

[1]
Pontus Andersson, Jim Nilsson, Tomas Akenine-Möller, Magnus Oskarsson, Kalle Åström, and Mark D. Fairchild. 2020. FLIP: A difference evaluator for alternating images. Proc. ACM Comput. Graph. Interact. Tech. 3, 2 (2020), 23 pages. DOI:
[2]
Thomas Bashford-Rogers, Luís Paulo Santos, Demetris Marnerides, and Kurt Debattista. 2021. Ensemble metropolis light transport. ACM Transactions on Graphics 41, 1 (2021), 1–15.
[3]
Benedikt Bitterli. 2022. Correlations and Reuse for Fast and Accurate Physically Based Light Transport. Dartmouth College Ph.D Dissertations. Retrieved from https://digitalcommons.dartmouth.edu/dissertations/77
[4]
Benedikt Bitterli, Wenzel Jakob, Jan Novák, and Wojciech Jarosz. 2017. Reversible jump Metropolis light transport using inverse mappings. ACM Transactions on Graphics 37, 1 (2017), 1–12.
[5]
Benedikt Bitterli, Chris Wyman, Matt Pharr, Peter Shirley, Aaron Lefohn, and Wojciech Jarosz. 2020. Spatiotemporal reservoir resampling for real-time ray tracing with dynamic direct lighting. ACM Transactions on Graphics 39, 4 (2020), Article 148.
[6]
Olivier Cappé, Arnaud Guillin, Jean-Michel Marin, and Christian P. Robert. 2004. Population Monte Carlo. Journal of Computational and Graphical Statistics 13, 4 (2004), 907–929.
[7]
Chakravarty R. Alla Chaitanya, Anton S. Kaplanyan, Christoph Schied, Marco Salvi, Aaron Lefohn, Derek Nowrouzezahrai, and Timo Aila. 2017. Interactive reconstruction of Monte Carlo image sequences using a recurrent denoising autoencoder. ACM Transactions on Graphics 36, 4 (2017), 1–12.
[8]
Min-Te Chao. 1982. A general purpose unequal probability sampling plan. Biometrika 69, 3 (1982), 653–656.
[9]
David Cline, Justin Talbot, and Parris Egbert. 2005. Energy redistribution path tracing. ACM Transactions on Graphics 24, 3 (2005), 1186–1195.
[10]
Carsten Dachsbacher, Jaroslav Křivánek, Miloš Hašan, Adam Arbree, Bruce Walter, and Jan Novák. 2014. Scalable realistic rendering with many-light methods. In Proceedings of the Computer Graphics Forum. Wiley Online Library, 88–104.
[11]
Pierre Del Moral, Arnaud Doucet, and Ajay Jasra. 2006. Sequential Monte Carlo samplers. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68, 3 (2006), 411–436.
[12]
Arnaud Doucet, Nando De Freitas, Neil James Gordon, et al. 2001. Sequential Monte Carlo Methods in Practice. Springer.
[13]
ShaoHua Fan, Yu-Chi Lai, Stephen Chenney, and Charles Dyer. 2007. Population Monte Carlo Samplers for Rendering. Technical Report. University of Wisconsin-Madison Department of Computer Sciences.
[14]
Iliyan Georgiev and Marcos Fajardo. 2016. Blue-noise dithered sampling. In Proceedings of the ACM SIGGRAPH 2016 Talks. 1–1.
[15]
Abhijeet Ghosh, Arnaud Doucet, and Wolfgang Heidrich. 2006. Sequential sampling for dynamic environment map illumination. In Proceedings of the Rendering Techniques. 115–126.
[16]
Adrien Gruson, Rex West, and Toshiya Hachisuka. 2020. Stratified Markov Chain Monte Carlo light transport. In Proceedings of the Computer Graphics Forum. Wiley Online Library, 351–362.
[17]
Toshiya Hachisuka and Henrik Wann Jensen. 2009. Stochastic progressive photon mapping. In ACM SIGGRAPH Asia 2009 papers. 1–8.
[18]
Toshiya Hachisuka, Anton S. Kaplanyan, and Carsten Dachsbacher. 2014. Multiplexed Metropolis light transport. ACM Transactions on Graphics 33, 4 (2014), 1–10.
[19]
W. K. Hastings. 1970. Monte carlo sampling methods using markov chains and their applications. Biometrika 57, 1 (1970), 97–109.
[20]
Peter Hedman, Tero Karras, and Jaakko Lehtinen. 2016. Sequential Monte Carlo instant radiosity. In Proceedings of the 20th ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games. 121–128.
[21]
Eric Heitz and Laurent Belcour. 2019. Distributing Monte Carlo errors as a blue noise in screen space by permuting pixel seeds between frames. In Proceedings of the Computer Graphics Forum. Wiley Online Library, 149–158.
[22]
Binh-Son Hua, Adrien Gruson, Victor Petitjean, Matthias Zwicker, Derek Nowrouzezahrai, Elmar Eisemann, and Toshiya Hachisuka. 2019. A survey on gradient-domain rendering. In Proceedings of the Computer Graphics Forum. Wiley Online Library, 455–472.
[23]
Wenzel Jakob and Steve Marschner. 2012. Manifold exploration: A Markov chain Monte Carlo technique for rendering scenes with difficult specular transport. ACM Transactions on Graphics 31, 4 (2012), 1–13.
[24]
Henrik Wann Jensen. 1996. Global illumination using photon maps. In Proceedings of the Eurographics Workshop on Rendering Techniques. Springer, 21–30.
[25]
James T. Kajiya. 1986. The rendering equation. In Proceedings of the 13th Annual Conference on Computer Graphics and Interactive Techniques. 143–150.
[26]
Simon Kallweit, Petrik Clarberg, Craig Kolb, Tomáš Davidovič, Kai-Hwa Yao, Theresa Foley, Yong He, Lifan Wu, Lucy Chen, Tomas Akenine-Möller, Chris Wyman, Cyril Crassin, and Nir Benty. 2022. The Falcor Rendering Framework. Retrieved from https://github.com/NVIDIAGameWorks/Falcor#citation
[27]
Anton S. Kaplanyan, Johannes Hanika, and Carsten Dachsbacher. 2014. The natural-constraint representation of the path space for efficient light transport simulation. ACM Transactions on Graphics 33, 4 (2014), 1–13.
[28]
Csaba Kelemen, László Szirmay-Kalos, György Antal, and Ferenc Csonka. 2002. A simple and robust mutation strategy for the Metropolis light transport algorithm. In Proceedings of the Computer Graphics Forum. Wiley Online Library, 531–540.
[29]
Alexander Keller. 1997. Instant Radiosity. In Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques.ACM Press/Addison-Wesley Publishing Co., 49–56. DOI:
[30]
Emmett Kilgariff, Henry Moreton, Nick Stam, and Brandon Bell. 2018. NVIDIA Turing architecture in-depth. Retrieved from https://developer.nvidia.com/blog/nvidia-turing-architecture-in-depth/
[31]
Pawel Kozlowski and Tim Cheblokov. 2021. ReLAX: A denoiser tailored to work with the ReSTIR algorithm. GPU Technology Conference (2021). Retrieved from https://www.nvidia.com/en-us/on-demand/session/gtcspring21-s32759/
[32]
Eric P. Lafortune and Yves D. Willems. 1993. Bi-directional path tracing. In Proceedings of 3rd International Conference on Computational Graphics and Visualization Techniques. 145–153.
[33]
Yu-Chi Lai, Hsuan-Ting Chou, Kuo-Wei Chen, and Shaohua Fan. 2015. Robust and efficient adaptive direct lighting estimation. The Visual Computer 31, 1 (2015), 83–91.
[34]
Yu-Chi Lai and Charles Dyer. 2007. Population Monte Carlo Path Tracing. Technical Report. University of Wisconsin-Madison Department of Computer Sciences.
[35]
Yu-Chi Lai, Shao Hua Fan, Stephen Chenney, and Charles Dyer. 2007. Photorealistic image rendering with population Monte Carlo energy redistribution. In Proceedings of the 18th Eurographics Conference on Rendering Techniques. 287–295.
[36]
Yu-Chi Lai, Feng Liu, and Charles Dyer. 2009. Physically-based Animation Rendering with Markov Chain Monte Carlo. Technical Report. University of Wisconsin-Madison Department of Computer Sciences.
[37]
Jaakko Lehtinen, Tero Karras, Samuli Laine, Miika Aittala, Frédo Durand, and Timo Aila. 2013. Gradient-domain Metropolis light transport. ACM Transactions on Graphics 32, 4 (2013), 1–12.
[38]
Daqi Lin, Markus Kettunen, Benedikt Bitterli, Jacopo Pantaleoni, Cem Yuksel, and Chris Wyman. 2022. Generalized resampled importance sampling: Foundations of ReSTIR. ACM Transactions on Graphics 41, 4 (2022), Article 75.
[39]
Daqi Lin, Chris Wyman, and Cem Yuksel. 2021. Fast volume rendering with spatiotemporal reservoir resampling. ACM Transactions on Graphics 40, 6 (2021), 1–18.
[40]
Fujun Luan, Shuang Zhao, Kavita Bala, and Ioannis Gkioulekas. 2020. Langevin Monte Carlo rendering with gradient-based adaptation. ACM Trans. Graph. 39, 4 (2020), 140.
[41]
Nicholas Metropolis, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller. 1953. Equation of state calculations by fast computing machines. The Journal of Chemical Physics 21, 6 (1953), 1087–1092.
[42]
Don P. Mitchell. 1987. Generating antialiased images at low sampling densities. In Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques.Association for Computing Machinery, New York, NY, 65–72. DOI:
[43]
Thomas Müller, Markus Gross, and Jan Novák. 2017. Practical path guiding for efficient light-transport simulation. In Proceedings of the Computer Graphics Forum. Wiley Online Library, 91–100.
[44]
Thomas Müller, Brian McWilliams, Fabrice Rousselle, Markus Gross, and Jan Novák. 2019. Neural importance sampling. ACM Trans. Graph. 38, 5 (2019), 19 pages. DOI:
[45]
Radford M. Neal. 2001. Annealed importance sampling. Statistics and Computing 11, 2 (2001), 125–139.
[46]
NVIDIA. 2017. NVIDIA OptiX AI-Accelerated Denoiser. Retrieved from https://developer.nvidia.com/optix-denoiser
[47]
NVIDIA. 2022. NVIDIA Real-time Denoisers (NRD). Retrieved from https://developer.nvidia.com/rtx/ray-tracing/rt-denoisers
[48]
Hisanari Otsu, Johannes Hanika, Toshiya Hachisuka, and Carsten Dachsbacher. 2018. Geometry-aware metropolis light transport. ACM Transactions on Graphics 37, 6 (2018), 1–11.
[49]
Yaobin Ouyang, Shiqiu Liu, Markus Kettunen, Matt Pharr, and Jacopo Pantaleoni. 2021. ReSTIR GI: Path resampling for real-time path tracing. In Proceedings of the Computer Graphics Forum. Wiley Online Library, 17–29.
[50]
Mark Pauly, Thomas Kollig, and Alexander Keller. 2000. Metropolis light transport for participating media. In Proceedings of the Rendering Techniques 2000: Proceedings of the Eurographics Workshop in Brno. Springer, 11–22.
[51]
Christoph Schied, Anton Kaplanyan, Chris Wyman, Anjul Patney, Chakravarty R. Alla Chaitanya, John Burgess, Shiqiu Liu, Carsten Dachsbacher, Aaron Lefohn, and Marco Salvi. 2017. Spatiotemporal variance-guided filtering: Real-time reconstruction for path-traced global illumination. In Proceedings of the High Performance Graphics. 1–12.
[52]
Christoph Schied, Christoph Peters, and Carsten Dachsbacher. 2018. Gradient estimation for real-time adaptive temporal filtering. Proceedings of the ACM on Computer Graphics and Interactive Techniques 1, 2 (2018), 1–16.
[53]
Justin Talbot, David Cline, and Parris Egbert. 2005. Importance resampling for global illumination. In Proceedings of the Eurographics Symposium on Rendering (2005). Kavita Bala and Philip Dutre (Eds.), The Eurographics Association. DOI:
[54]
Justin F. Talbot. 2005. Importance Resampling for Global Illumination. Brigham Young University.
[55]
Joran Van de Woestijne, Roald Frederickx, Niels Billen, and Philip Dutré. 2017. Temporal coherence for Metropolis light transport. In Proceedings of the Eurographics Symposium on Rendering-Experimental Ideas & Implementations. Eurographics Association, 55–63.
[56]
Eric Veach. 1998. Robust Monte Carlo Methods for Light Transport Simulation. Stanford University.
[57]
Eric Veach and Leonidas Guibas. 1995a. Bidirectional estimators for light transport. In Proceedings of the Photorealistic Rendering Techniques. Springer, 145–167.
[58]
Eric Veach and Leonidas J. Guibas. 1995b. Optimally combining sampling techniques for Monte Carlo rendering. In Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques. 419–428.
[59]
Eric Veach and Leonidas J. Guibas. 1997. Metropolis light transport. In Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques. 65–76.
[60]
Gregory J. Ward, Francis M. Rubinstein, and Robert D. Clear. 1988. A ray tracing solution for diffuse interreflection. In Proceedings of the 15th Annual Conference on Computer Graphics and Interactive Techniques. 85–92.
[61]
Chris Wyman. 2021. Weighted reservoir sampling: Randomly sampling streams. In Proceedings of the Ray Tracing Gems II. Springer, 345–349.
[62]
Chris Wyman and Alexey Panteleev. 2021. Rearchitecting spatiotemporal resampling for production. High-Performance Graphics - Symposium Papers, Nikolaus Binder and Tobias Ritschel (Eds.). The Eurographics Association. DOI:

Cited By

  • (2024) Online Neural Denoising with Cross-Regression for Interactive Rendering. ACM Transactions on Graphics 43, 6 (2024), 1–12. DOI: 10.1145/3687938
  • (2024) ReSTIR Subsurface Scattering for Real-Time Path Tracing. Proceedings of the ACM on Computer Graphics and Interactive Techniques 7, 3 (2024), 1–19. DOI: 10.1145/3675372
  • (2024) Area ReSTIR: Resampling for Real-Time Defocus and Antialiasing. ACM Transactions on Graphics 43, 4 (2024), 1–13. DOI: 10.1145/3658210
  • (2023) Markov Chain Mixture Models for Real-Time Direct Illumination. Computer Graphics Forum 42, 4 (2023). DOI: 10.1111/cgf.14881


Published In

ACM Transactions on Graphics  Volume 43, Issue 1
February 2024
211 pages
EISSN:1557-7368
DOI:10.1145/3613512
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 January 2024
Online AM: 19 October 2023
Accepted: 12 October 2023
Revised: 20 September 2023
Received: 06 October 2022
Published in TOG Volume 43, Issue 1


Author Tags

  1. Real-time rendering
  2. resampled importance sampling
  3. weighted reservoir sampling
  4. Markov Chain Monte Carlo
