Dynamic LiDAR Re-simulation using Compositional Neural Fields

Hanfeng Wu1,2  Xingxing Zuo2,*  Stefan Leutenegger2
Or Litany3,4  Konrad Schindler1  Shengyu Huang1,*
1 ETH Zurich  2 TU Munich  3 Technion  4 NVIDIA
* Corresponding authors
Abstract

We introduce DyNFL, a novel neural field-based approach for high-fidelity re-simulation of LiDAR scans in dynamic driving scenes. DyNFL processes LiDAR measurements from dynamic environments, accompanied by bounding boxes of moving objects, to construct an editable neural field. This field, comprising separately reconstructed static background and dynamic objects, allows users to modify viewpoints, adjust object positions, and seamlessly add or remove objects in the re-simulated scene. A key innovation of our method is the neural field composition technique, which effectively integrates reconstructed neural assets from various scenes through a ray drop test, accounting for occlusions and transparent surfaces. Our evaluation with both synthetic and real-world environments demonstrates that DyNFL substantially improves dynamic scene LiDAR simulation, offering a combination of physical fidelity and flexible editing capabilities. [project page]

Figure 1: Overview of DyNFL. Our method takes LiDAR scans and tracked bounding boxes of dynamic vehicles as input. DyNFL first decomposes the scene into a static background and $N$ dynamic vehicles, each modelled using a dedicated neural field. These neural fields are then composed to re-simulate LiDAR scans in dynamic scenes. Our composition technique supports various scene edits, including altering object trajectories, removing and adding reconstructed neural assets between scenes.

1 Introduction

We introduce a neural representation for reconstructing and manipulating LiDAR scans of dynamic driving scenes. Counterfactual re-simulation is an emerging application in autonomous driving that offers a way to examine "what if" scenarios. It involves creating a reconstruction of a real-world event, termed a digital twin, and then applying various modifications to it. These could include altering the environmental conditions, changing the actions of some agents, or introducing additional scene elements. Analyzing the outcomes of these edited scenarios provides insights into the functioning of the perception system; moreover, the edited scenarios can be used to obtain training data for rare situations.

The essence of counterfactual re-simulation is the capability to authentically recreate variations of the original, factual observation. We address this challenge in the context of LiDAR on autonomous vehicles (AV). Existing approaches to LiDAR re-simulation have important limitations. Conventional simulators such as CARLA [9] and NVIDIA DRIVE Sim are capable of modeling LiDAR sensors. However, their reliance on manually designed 3D simulation assets requires significant human effort. LiDARsim [27] aims to remedy this by reconstructing vehicles and scenes from real measurements. While producing encouraging results, its two-stage LiDAR modeling lacks realism, particularly in terms of physical effects like multi-returns and reflected intensity, which were shown to matter for downstream processing [15]. Following NeRF’s [28] success in camera view synthesis, some works have applied neural fields for LiDAR modeling [19, 47, 62]. In particular, Neural LiDAR Fields (NFL) [19] offer a physically inspired LiDAR volumetric rendering scheme that accounts for two-way transmittance and beam width, allowing faithful recovery of secondary returns, intensity, and ray drops. These models are, however, limited to static scenes that do not change while multiple input views are scanned, and are thus of limited use for re-simulation in the presence of moving traffic. Recently, UniSim [58] followed Neural Scene Graph [32] in modeling road scenes as sets of movable NeRF instances on top of a static background. UniSim introduced a unified synthesis approach for camera and LiDAR sensors, but ignored physical sensor properties like two-way transmittance and beam width [19].

We present DyNFL, a novel approach for re-simulating LiDAR views of driving scenarios. Our method builds upon a neural SDF that enables an accurate representation of scene geometry, while at the same time enforcing physical accuracy by modeling two-way transmittance, like NFL [19]. Our primary contribution is a method for compositing neural fields that accurately integrates LiDAR measurements from individual fields representing different scene assets. With the help of a ray drop test, we effectively manage occlusions and transparent surfaces. This not only ensures physical accuracy, but also facilitates the inclusion of assets reconstructed from a variety of static and dynamic scenes, thereby enhancing control over the simulated content. Our method bridges the gap between the physical fidelity of the re-simulation and the flexible editing of dynamic scenes. We validate DyNFL with both synthetic and real-world data, focusing on three key areas: (i) high-quality view synthesis, (ii) perceptual fidelity, and (iii) asset manipulation. We find that our approach outperforms baseline models in terms of both range and intensity estimates. Its synthetic outputs also show higher agreement with real scans on object detection and segmentation tasks. DyNFL enables not only removal, duplication and repositioning of assets within the same scene, but also the inclusion of assets reconstructed in other scenes, paving the way for new applications.

2 Related work

2.1 Neural radiance fields and volume rendering

Neural Radiance Fields (NeRF) [28] have demonstrated remarkable success in novel-view image synthesis through neural volume rendering. These fields are characterized by the weights of Multilayer Perceptrons (MLPs), which enable the retrieval of volume density and RGB colors at any specified point within the field for image compositing via volume rendering. Several studies [2, 3, 49, 7, 12] have subsequently advanced NeRF’s rendering quality by addressing challenges such as reducing aliasing artifacts [2], scaling to unbounded large-scale scenarios [3], and capturing specular reflections on glossy surfaces [49]. Certain works [7, 12, 29, 20] have explored more effective representations of radiance fields. TensoRF [7] employs multiple compact low-rank tensor components, such as vectors and matrices, to represent the radiance field. Plenoxels [12] accelerates NeRF training by replacing MLPs with explicit plenoptic elements stored in sparse voxels and factorizing appearance through spherical-harmonic functions. Müller et al. [29] achieved a substantial acceleration in rendering speed by employing a representation that combines trainable multi-resolution hash encodings (MHE) with shared shallow MLP networks. Kerbl et al. [20] introduce a novel volume rendering method that uses 3D Gaussians to represent the radiance field and renders images via visibility-aware splatting of these Gaussians.

2.2 Dynamic neural radiance fields

Neural fields [56] can be extended to represent dynamic scenes. On top of the canonical scene representation, some methods [35, 33, 34, 61] additionally model 4D deformation fields. Other works learn a space-time correlated [38, 24, 1, 26] or decomposed [48, 55, 57] neural field to encode 4D scenes, achieving fine-grained reconstruction of geometry and appearance. A further line of work decomposes the scene into static and dynamic parts and models each dynamic actor with a dedicated neural field. Neural Scene Graph [32] and Panoptic Neural Fields [22] treat every dynamic object in the scene as a node, and synthesize photo-realistic RGB images by jointly rendering from both dynamic nodes and static background. UniSim [58] employs the neural SDF representation to model dynamic scenes in driving scenarios, and renders them in a similar way as Neural Scene Graph [32].

2.3 Neural surface representation

A fundamental challenge for NeRF and its variants is to accurately recover the underlying 3D surface from the implicit radiance field. Surfaces obtained by thresholding on the volume density of NeRF often exhibit noise [50, 59]. To address this, implicit surface representations like occupancy [30, 31] and signed distance functions (SDF) [50, 59, 60, 41, 51, 63, 25, 52] in grid maps are commonly integrated into neural volume rendering techniques.

NeuS [50] introduces a neural SDF representation for surface reconstruction, proposing an unbiased weight function for the appearance composition process in volume rendering. Similarly, VolSDF [59] models scenes with a neural SDF and incorporates the SDF into the volume rendering process, advocating a sampling strategy for the viewing rays to bound opacity approximation error. Neuralangelo [25] improves surface reconstruction using multi-resolution hash encoding (MHE) [29] and SDF-based volume rendering [50]. While these methods often deliver satisfactory surface reconstructions, their training is time-consuming, taking hours for a single scene. Voxurf [54] offers a faster surface reconstruction method through a two-stage training procedure, recovering the coarse shape first and refining details later. Wang et al. [52] expedite NeuS training to several minutes by predicting SDFs through a pipeline composed of MHE and shallow MLPs.

Many works also incorporate distances measured by LiDAR as auxiliary information to constrain the radiance field. For instance, [6, 53] render depth by accumulating volume density and minimizing depth discrepancies between LiDAR and rendered depth during training. Rematas et al. [37] enforce empty space between the actual surface and the ray origin.

2.4 LiDAR simulation

While simulators like CARLA [9] and AirSim [39] can simulate LiDAR data, they require expensive human annotations and suffer from a noticeable sim-to-real gap due to limited rendering quality. Generative model-based methods for LiDAR synthesis [5, 64] offer an alternative but often lack control and produce distorted geometry [23]. Learning-based approaches [23, 11, 27] try to enhance realism by transferring real scan properties to simulations. For example, [15] uses a RINet trained on RGB and real LiDAR data to enhance simulated scans. LiDARsim [27] employs ray-surfel casting with explicit disk surfels for more accurate simulation. Huang et al. [19] have proposed Neural LiDAR Fields (NFL), combining neural fields with a physical LiDAR model for high-quality synthesis, but NFL is limited to static scenes and can produce noisy outputs due to its unconstrained volume density representation. UniSim [58] constructs neural scene representations from realistic LiDAR and camera data, using SDF-based volume rendering to generate sensor measurements from novel viewpoints.

3 Dynamic neural scene representation

Problem statement.

Consider a set of LiDAR scans $\mathcal{X}=\{\mathbf{X}_t\}_{t=1}^{T}$ that have been compensated for ego-motion, along with tracked bounding boxes for moving vehicles $\mathcal{B}=\{\mathbf{B}_t^v\}_{v=1}^{N}$ (we assume that these are available), where $T$ is the total number of LiDAR scans and $N$ is the number of moving vehicles. Each scan $\mathbf{X}_t$ is composed of $n_t$ rays, and each ray $\mathbf{r}$ is described by the tuple $(\mathbf{o},\mathbf{d},\zeta,e,p_d)$, where $\mathbf{o}$ and $\mathbf{d}$ denote the ray’s origin and direction, $\zeta$ and $e$ represent the range and intensity values, and $p_d\in\{0,1\}$ indicates whether the ray is dropped due to insufficient returned radiant power.

The goal is to reconstruct the scene with a static-dynamic decomposed neural representation that enables rendering a LiDAR scan $\mathbf{X}_{\text{tgt}}$ from a novel viewpoint $\mathbf{T}_{\text{tgt}}$. This setup also facilitates various object manipulations, including altering object trajectories and inserting or removing objects from the scene. See Fig. 1.

3.1 Neural scene decomposition

We leverage the inductive bias that driving scenes can be decomposed into a static background and $N$ rigidly moving dynamic components [18, 13]. Consequently, we establish $N+1$ neural fields. The neural field $\mathbf{F}_{\text{static}}$ is designated for the static component of the scene, capturing the unchanging background elements. Concurrently, the set of neural fields $\{\mathbf{F}^v\}_{v=1}^{N}$ is used to model the $N$ dynamic entities, specifically the vehicles in motion.

Neural field for static background.

The static background is encoded into a neural field $\mathbf{F}_{\text{static}}:(\mathbf{x},\mathbf{d})\mapsto(s,e,p_d)$ that estimates the signed distance $s$, intensity $e$, and ray drop probability $p_d\in[0,1]$ given the point coordinates $\mathbf{x}$ and the ray direction $\mathbf{d}$. In practice, we first use a multi-resolution hash encoding (MHE) [29] to map each point to its positional feature $\mathbf{f}_{\text{pos}}\in\mathbb{R}^{32}$, and project the view direction onto the first 16 coefficients of the spherical harmonics basis, resulting in $\mathbf{f}_{\text{dir}}$. Subsequently, we use three Multilayer Perceptrons (MLPs) to estimate the scene properties as follows:

$$(s,\mathbf{f}_{\text{geo}})=f_s(\mathbf{f}_{\text{pos}}),\quad e=f_e(\mathbf{f}_{\text{ray}}),\quad p_d=f_{\text{drop}}(\mathbf{f}_{\text{ray}}). \tag{1}$$

Here, $f_s$, $f_e$, and $f_{\text{drop}}$ are three MLPs, and $\mathbf{f}_{\text{ray}}\in\mathbb{R}^{31}$ is the ray feature, constructed by concatenating the per-point geometric feature $\mathbf{f}_{\text{geo}}\in\mathbb{R}^{15}$ and the directional feature. For implementation details please refer to the supplementary material.
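As a rough illustration of Eq. 1, the following NumPy sketch wires up the feature dimensions stated above: a 32-dimensional positional feature, a 15-dimensional geometric feature, and a 16-dimensional directional feature that concatenate into the 31-dimensional ray feature. Only the dimensions and the spherical harmonics basis (degrees 0 to 3) follow the text. The positional encoder `encode_position` is a hypothetical sinusoidal placeholder rather than a hash encoding, the MLP widths and random weights are arbitrary, and squashing the ray drop output with a sigmoid is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def sh_basis_deg3(d):
    """Real spherical harmonics of degree 0..3 (16 values) for unit directions d, shape (M, 3)."""
    x, y, z = d[:, 0], d[:, 1], d[:, 2]
    xx, yy, zz = x * x, y * y, z * z
    pi = np.pi
    c0 = 0.5 * np.sqrt(1 / pi)
    c1 = np.sqrt(3 / (4 * pi))
    c2a, c2b, c2c = 0.5 * np.sqrt(15 / pi), 0.25 * np.sqrt(5 / pi), 0.25 * np.sqrt(15 / pi)
    c3a, c3b = 0.25 * np.sqrt(35 / (2 * pi)), 0.5 * np.sqrt(105 / pi)
    c3c, c3d, c3e = 0.25 * np.sqrt(21 / (2 * pi)), 0.25 * np.sqrt(7 / pi), 0.25 * np.sqrt(105 / pi)
    return np.stack([
        c0 * np.ones_like(x),
        c1 * y, c1 * z, c1 * x,
        c2a * x * y, c2a * y * z, c2b * (2 * zz - xx - yy), c2a * x * z, c2c * (xx - yy),
        c3a * y * (3 * xx - yy), c3b * x * y * z, c3c * y * (4 * zz - xx - yy),
        c3d * z * (2 * zz - 3 * xx - 3 * yy), c3c * x * (4 * zz - xx - yy),
        c3e * z * (xx - yy), c3a * x * (xx - 3 * yy),
    ], axis=-1)

P_POS = rng.normal(size=(3, 16))  # fixed random projection, placeholder only

def encode_position(x):
    """Hypothetical stand-in for the hash encoding: (M, 3) -> (M, 32)."""
    proj = x @ P_POS
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

def init_mlp(sizes):
    """Random, untrained weights; layer sizes are arbitrary choices for this sketch."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n)) for m, n in zip(sizes[:-1], sizes[1:])]

def run_mlp(params, x):
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers
    return x

f_s = init_mlp([32, 64, 1 + 15])   # signed distance + 15-d geometric feature
f_e = init_mlp([31, 64, 1])        # intensity head
f_drop = init_mlp([31, 64, 1])     # ray drop head

def static_field(x, d):
    f_pos = encode_position(x)                       # (M, 32)
    f_dir = sh_basis_deg3(d)                         # (M, 16)
    out = run_mlp(f_s, f_pos)
    s, f_geo = out[:, 0], out[:, 1:]                 # (M,), (M, 15)
    f_ray = np.concatenate([f_geo, f_dir], axis=-1)  # (M, 31)
    e = run_mlp(f_e, f_ray)[:, 0]
    logit = run_mlp(f_drop, f_ray)[:, 0]
    p_d = 1.0 / (1.0 + np.exp(-logit))               # assumed sigmoid to keep p_d in [0, 1]
    return s, e, p_d

x = rng.uniform(-1.0, 1.0, size=(5, 3))
d = rng.normal(size=(5, 3))
d /= np.linalg.norm(d, axis=1, keepdims=True)
s, e, p_d = static_field(x, d)
print(s.shape, e.shape, p_d.shape)
```

Note that only $f_s$ sees the position directly; the intensity and ray drop heads depend on position through $\mathbf{f}_{\text{geo}}$ and on the viewing direction through the spherical harmonics feature.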

Neural fields for dynamic vehicles.

LiDAR scans collected over time are often misaligned due to the motion of both the sensor and other objects in the scene. Despite applying ego-motion for aligning static background points, dynamic objects remain blurred along their trajectories. Our approach to constructing a dynamic neural scene representation is grounded in the assumption that each dynamic object only undergoes rigid motion. Therefore, we can first align them over time and reconstruct them in their canonical coordinate frame, and then render them over time by reversing the alignment of the neural field.

Specifically, consider a dynamic vehicle $v$ occurring in LiDAR scans $\{\mathbf{X}^v_t\}_{t=1}^{T}$ along with the associated bounding boxes $\{\mathbf{B}^v_t\in\mathbb{R}^{3\times 8}\}_{t=1}^{T}$ in the world coordinate frame. Here each bounding box is defined by its eight corners, and the first bounding box $\mathbf{B}^v_1$ is considered as the canonical box.
We estimate the relative transformations $\{\mathbf{T}_t\in\text{SE}(3)\}_{t=2}^{T}$ between the remaining $T-1$ bounding boxes and the canonical one, expressed as $\mathbf{B}_1^v=\mathbf{T}_t\mathbf{B}_t^v$, where $\mathbf{T}\mathbf{B}=\mathbf{R}\mathbf{B}+\mathbf{t}$ with $\mathbf{R}$, $\mathbf{t}$ the rotation/translation components of $\mathbf{T}$. Subsequently, all LiDAR measurements on the object are transformed and accumulated in its canonical coordinate frame. The vehicle $v$ is then reconstructed in its canonical space, akin to the static background, using a neural field $\mathbf{F}^v$. To render the dynamic vehicle at timestamp $t$, the corresponding rigid transformation is applied to the queried rays. The dynamic neural field can thus be expressed as: $\mathbf{F}^v_t:(\mathbf{T}_t\mathbf{x},\mathbf{T}_t\mathbf{d})\mapsto(s,e,p_d)$.
The rendering process for $\mathbf{F}^v$ is the same as for the static neural field $\mathbf{F}_{\text{static}}$.
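The alignment step can be sketched as follows. The text does not say how $\mathbf{T}_t$ is estimated; this sketch assumes a least-squares fit over the eight corresponding box corners (the Kabsch algorithm). The helper names `box_corners`, `estimate_rigid_transform` and `to_canonical`, as well as the box size and motion in the demo, are illustrative.

```python
import numpy as np

def box_corners(length, width, height):
    """Eight corners of an origin-centred box as a (3, 8) matrix."""
    xs, ys, zs = np.meshgrid([-length / 2, length / 2],
                             [-width / 2, width / 2],
                             [-height / 2, height / 2], indexing="ij")
    return np.stack([xs.ravel(), ys.ravel(), zs.ravel()])

def estimate_rigid_transform(src, dst):
    """Least-squares R, t such that dst ~= R @ src + t, for (3, K) matched points (Kabsch)."""
    src_c = src.mean(axis=1, keepdims=True)
    dst_c = dst.mean(axis=1, keepdims=True)
    H = (src - src_c) @ (dst - dst_c).T          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    sign = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, sign]) @ U.T
    t = dst_c - R @ src_c
    return R, t

def to_canonical(points, R, t):
    """Map (3, M) world-frame points observed at time t into the canonical frame."""
    return R @ points + t

# Demo: the canonical box B_1 and the same box after the vehicle moved (B_t).
B1 = box_corners(4.5, 1.8, 1.5)
yaw = np.deg2rad(30.0)
Rz = np.array([[np.cos(yaw), -np.sin(yaw), 0.0],
               [np.sin(yaw),  np.cos(yaw), 0.0],
               [0.0,          0.0,         1.0]])
shift = np.array([[12.0], [3.0], [0.0]])
Bt = Rz @ B1 + shift

R, t = estimate_rigid_transform(Bt, B1)   # T_t with B_1 = T_t B_t
aligned = to_canonical(Bt, R, t)
print(np.abs(aligned - B1).max())
```

The same `R, t` would be applied to every LiDAR point that falls on the vehicle at time $t$ before accumulating it in the canonical frame. For ray directions, a natural reading is that only the rotation applies; that is an interpretation made here, not something the text states.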

4 Neural rendering of the dynamic scene

In this section, we present the methodology for rendering LiDAR scans from the neural scene representation. We begin by revisiting the density-based volume rendering formulation for active sensors [19] in Sec. 4.1. Subsequently, we explore the extension of that formulation to the SDF-based neural scene representation in Sec. 4.2. Finally, we discuss in detail how to render LiDAR measurements from individual neural fields in Sec. 4.3 and how to compose the results from different neural fields in Sec. 4.4.

4.1 Volume rendering for active sensor

LiDAR utilizes laser pulses to determine the distance to the nearest reflective surface by analyzing the waveform profile of the returned radiant power. The radiant power $P(\zeta)$ from range $\zeta$ is the result of a convolution between the pulse power $P_e(t)$ and the impulse response $H(\zeta)$, defined as [16, 17, 19]:

$$P(\zeta)=\int_{0}^{2\zeta/c}P_e(t)\,H\!\left(\zeta-\frac{ct}{2}\right)dt. \tag{2}$$

The impulse response $H(\zeta)$ is a product of the target and sensor impulse responses, $H(\zeta)=H_T(\zeta)\cdot H_S(\zeta)$, and the individual components are expressed as:

$$H_T(\zeta)=\frac{\rho}{\pi}\cos(\theta)\,\delta(\zeta-\bar{\zeta}),\quad H_S(\zeta)=T^2(\zeta)\frac{A_e}{\zeta^2}, \tag{3}$$

where $\rho$ represents the surface reflectance, $\theta$ denotes the incidence angle, $\bar{\zeta}$ is the ground-truth distance to the nearest reflective surface, and $T(\zeta)$ and $A_e$ describe the transmittance at range $\zeta$ and the effective sensor area, respectively. Due to the discontinuity introduced by the indicator function $\delta(\zeta-\bar{\zeta})$, Eq. 2 is non-differentiable and is thus not suitable for solving the inverse problem. NFL [19] addresses this by extending Eq. 2 into a probabilistic formulation given by:

$$P(\zeta)=C\cdot\frac{T^2(\zeta)\cdot\sigma_\zeta\rho_\zeta}{\zeta^2}\cos(\theta). \tag{4}$$

Here, $C$ accounts for the constant values, and $\sigma_\zeta$ represents the density at range $\zeta$. The radiant power can be reconstructed using the volume rendering equation:

$$P=\sum_{j=1}^{N}\int_{\zeta_j}^{\zeta_{j+1}}C\,\frac{T^2(\zeta)\cdot\sigma_\zeta\rho_\zeta}{\zeta^2}\cos(\theta_j)\,d\zeta=\sum_{j=1}^{N}w_j\rho'_{\zeta_j}, \tag{5}$$

where the weights are $w_j=2\alpha_{\zeta_j}\cdot\prod_{i=1}^{j-1}(1-2\alpha_{\zeta_i})$. Here $\alpha_{\zeta_j}$ is the discrete opacity at range $\zeta_j$. Please refer to [19] for more details.
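The weight formula is a product over all earlier samples, which is convenient to evaluate with an exclusive cumulative product. The sketch below is a minimal NumPy version; the function name `two_way_weights` and the example opacities are illustrative. It assumes opacities in $[0, 0.5]$ so that every factor $1-2\alpha$ stays non-negative.

```python
import numpy as np

def two_way_weights(alpha):
    """w_j = 2 * alpha_j * prod_{i<j} (1 - 2 * alpha_i), for opacities alpha in [0, 0.5]."""
    alpha = np.asarray(alpha, dtype=float)
    trans = np.cumprod(1.0 - 2.0 * alpha)                 # product up to and including j
    trans_before = np.concatenate([[1.0], trans[:-1]])    # product over i < j only
    return 2.0 * alpha * trans_before

alpha = np.array([0.0, 0.1, 0.5, 0.2])
w = two_way_weights(alpha)
print(w)  # [0.  0.2 0.8 0. ]
```

Because each weight equals the drop in the running product, the weights telescope to $1-\prod_j(1-2\alpha_{\zeta_j})$. In the example the third sample has $\alpha=0.5$, so the running product falls to zero and no later sample can receive any weight.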

4.2 SDF-based volume rendering for active sensor

A neural scene representation based on probabilistic density often results in surfaces with noticeable noise due to insufficient surface regularization [50]. To address this, we opt for a signed distance-based scene representation and establish the LiDAR volume rendering formulation within that framework. Building upon SDF-based volume rendering for passive sensors [50], we compute the opaque density $\tilde{\sigma}_{\zeta_i}$ as follows:

$$\tilde{\sigma}_{\zeta_i}=\max\left(\frac{-\frac{\mathrm{d}\Phi_s}{\mathrm{d}\zeta_i}(f(\zeta_i))}{\Phi_s(f(\zeta_i))},0\right), \tag{6}$$

where $\Phi_s(\cdot)$ represents the Sigmoid function and $f(\zeta)$ evaluates the signed distance to the surface at range $\zeta$ along the ray $\mathbf{r}$. Next, we substitute the density $\sigma$ in Eq. 5 with the opaque density from Eq. 6 and re-evaluate the radiant power and weights as:

$$P=\sum_{j=1}^{N}\mathcal{T}^2_{\zeta_j}\tilde{\alpha}_{\zeta_j}\rho'_{\zeta_j},\quad \tilde{w}_j=2\tilde{\alpha}_{\zeta_j}\cdot\prod_{i=1}^{j-1}(1-2\tilde{\alpha}_{\zeta_i}). \tag{7}$$

In this context, $\tilde{\alpha}_{\zeta_j}$ is computed as:

$$\tilde{\alpha}_{\zeta_j}=\max\left(\frac{\Phi_s(f(\zeta_j))^2-\Phi_s(f(\zeta_{j+1}))^2}{2\,\Phi_s(f(\zeta_j))^2},0\right). \tag{8}$$

Please refer to the supplementary material for more details.
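
As a minimal sketch of Eqs. 7 and 8, the snippet below turns SDF values sampled along one ray into the weights $\tilde{w}_{j}$. It assumes that $\Phi_{s}$ is a logistic sigmoid with a sharpness parameter and that $f(\zeta_{j})$ is the SDF value at range $\zeta_{j}$; both the function names and the sharpness value are illustrative and not taken from the paper.

```python
import math


def sigmoid(x: float, sharpness: float = 10.0) -> float:
    """Logistic sigmoid with a sharpness factor (an ASSUMED form of Phi_s)."""
    return 1.0 / (1.0 + math.exp(-sharpness * x))


def active_alphas(sdf: list[float], sharpness: float = 10.0) -> list[float]:
    """Eq. 8: alpha_j from the squared sigmoid at consecutive samples."""
    phi2 = [sigmoid(v, sharpness) ** 2 for v in sdf]
    return [
        max((phi2[j] - phi2[j + 1]) / (2.0 * phi2[j]), 0.0)
        for j in range(len(sdf) - 1)
    ]


def active_weights(alphas: list[float]) -> list[float]:
    """Eq. 7: w_j = 2 * alpha_j * prod_{i<j} (1 - 2 * alpha_i)."""
    weights, transmittance = [], 1.0
    for a in alphas:
        weights.append(2.0 * a * transmittance)
        transmittance *= 1.0 - 2.0 * a
    return weights


# Toy SDF values along a ray that crosses a surface between samples 2 and 3.
sdf_samples = [0.5, 0.3, 0.1, -0.1, -0.3, -0.5]
alphas = active_alphas(sdf_samples)
weights = active_weights(alphas)
```

Because $1-2\tilde{\alpha}_{\zeta_{i}}$ equals the ratio of consecutive squared sigmoid values wherever the SDF decreases along the ray, the product telescopes and the weights concentrate on the interval that contains the zero crossing.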

4.3 Volume rendering for LiDAR measurements

To render the LiDAR measurements from a single neural field, we employ the hierarchical sampling technique [50] to sample a total of $N_{s}=N_{u}+N_{i}$ points along each ray, where $N_{u}$ points are uniformly sampled and $N_{i}$ points are probabilistically sampled based on the weights along the ray, for denser sampling in the proximity of the surface. Subsequently, we compute the weights for the $N_{s}$ points following Eqs. 7 and 8. The rendering of range, intensity, and ray drop for each ray can be expressed through volume rendering as follows: $y_{\text{est}}=\sum_{j=1}^{N_{s}}w_{j}y_{j}$, where $y\in\{\zeta,e,p_{d}\}$.
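
The two steps can be sketched as follows, under simplifying assumptions: the coarse weights are given per uniform bin, the additional samples are drawn by inverse-transform sampling of those weights, and the rendered quantity is the weighted sum over samples. The helper names, bin layout, and toy weights are hypothetical.

```python
import bisect
import random


def render(weights, values):
    """y_est = sum_j w_j * y_j for one ray and one quantity (range, intensity, ...)."""
    return sum(w * y for w, y in zip(weights, values))


def importance_sample(bin_edges, weights, n_samples, rng):
    """Draw extra ranges by inverse-transform sampling of the per-bin weights."""
    total = sum(weights)
    cdf, acc = [], 0.0
    for w in weights:
        acc += w / total
        cdf.append(acc)
    samples = []
    for _ in range(n_samples):
        u = rng.random()
        # First bin whose cumulative weight exceeds u (clamped for safety).
        k = min(bisect.bisect_right(cdf, u), len(weights) - 1)
        lo, hi = bin_edges[k], bin_edges[k + 1]
        samples.append(lo + (hi - lo) * rng.random())
    return sorted(samples)


rng = random.Random(0)
n_u, n_i = 8, 16                      # illustrative sample counts
near, far = 0.0, 40.0                 # illustrative ray bounds in metres
edges = [near + (far - near) * k / n_u for k in range(n_u + 1)]
mids = [(a + b) / 2.0 for a, b in zip(edges[:-1], edges[1:])]
coarse_w = [0.0, 0.0, 0.05, 0.9, 0.05, 0.0, 0.0, 0.0]  # toy weights, surface in bin 3

fine = importance_sample(edges, coarse_w, n_i, rng)    # denser samples near the surface
range_est = render(coarse_w, mids)                     # rendered range from coarse samples
```

In a full pipeline the weights would be recomputed on the union of the uniform and importance samples before rendering; the sketch renders from the coarse bins only to keep the arithmetic easy to follow.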

4.4 Neural rendering for multiple fields

Our full neural scene representation comprises $N+1$ neural fields as discussed in Sec. 3.1. Rendering from all these fields for each ray during inference is computationally intensive. To address this, we implement a two-stage method. In the first stage, we identify $k\geq 0$ dynamic fields that are likely to intersect with a given ray, plus the static background. The second stage renders LiDAR measurements from these selected fields individually and then integrates them into a unified set of measurements.

Ray intersection test.

As outlined in Sec. 3.1, each dynamic neural field is reconstructed in its unique canonical space, defined by a corresponding canonical box. To determine the neural fields that intersect a given ray at inference time, we begin by estimating the transformations $\{\mathbf{T}_{t}^{v}\}_{v=1}^{N}$, which convert coordinates from the world frame to each vehicle's canonical space at timestamp $t$. These transformations are determined by interpolating the training set transformations using spherical linear interpolation (SLERP) [40]. Then, we apply the transformations to the queried ray and run intersection tests with the canonical boxes of the fields.
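
The following sketch illustrates this test for a single vehicle. It assumes that rotations are stored as unit quaternions $(w,x,y,z)$, that translations are interpolated linearly between two keyframes (an assumption; the text only specifies SLERP), and that the canonical box is axis-aligned and centred at the origin of the canonical space. All names are hypothetical.

```python
import math


def slerp(q0, q1, u):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:                      # take the shorter arc
        q1, dot = tuple(-c for c in q1), -dot
    if dot > 0.9995:                   # nearly parallel: normalised linear blend
        q = [a + u * (b - a) for a, b in zip(q0, q1)]
    else:
        theta = math.acos(dot)
        s0 = math.sin((1.0 - u) * theta) / math.sin(theta)
        s1 = math.sin(u * theta) / math.sin(theta)
        q = [s0 * a + s1 * b for a, b in zip(q0, q1)]
    norm = math.sqrt(sum(c * c for c in q))
    return tuple(c / norm for c in q)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def rotate(q, v):
    """Rotate 3-vector v by unit quaternion q."""
    w, r = q[0], q[1:]
    t = tuple(2.0 * c for c in cross(r, v))
    rt = cross(r, t)
    return tuple(v[i] + w * t[i] + rt[i] for i in range(3))


def interpolate_pose(key0, key1, t):
    """World-to-canonical pose at time t from two (time, quaternion, translation) keyframes."""
    (t0, q0, p0), (t1, q1, p1) = key0, key1
    u = (t - t0) / (t1 - t0)
    return slerp(q0, q1, u), tuple(a + u * (b - a) for a, b in zip(p0, p1))


def to_canonical(q, p, origin, direction):
    """Map a world-frame ray into the canonical space of one vehicle."""
    o = tuple(a + b for a, b in zip(rotate(q, origin), p))
    return o, rotate(q, direction)


def ray_hits_box(origin, direction, half_extent):
    """Slab test against an axis-aligned box centred at the canonical origin."""
    t_near, t_far = 0.0, math.inf
    for o, d, h in zip(origin, direction, half_extent):
        if abs(d) < 1e-12:             # ray parallel to this pair of faces
            if abs(o) > h:
                return False
            continue
        t0, t1 = sorted(((-h - o) / d, (h - o) / d))
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


half = math.sqrt(0.5)
key_a = (0.0, (1.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0))   # identity at t = 0
key_b = (1.0, (half, 0.0, 0.0, half), (-2.0, 0.0, 0.0))  # 90 degree yaw at t = 1
q_mid, p_mid = interpolate_pose(key_a, key_b, 0.5)
origin_c, dir_c = to_canonical(q_mid, p_mid, (-10.0, 0.0, 0.0), (1.0, 0.0, 0.0))
hit = ray_hits_box(origin_c, dir_c, (2.5, 1.0, 1.0))
```

Only the fields whose box is hit need to be queried in the second stage, which is what keeps the per-ray cost low when a scene contains many vehicles.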

Neural rendering from multiple neural fields.

After identifying the $k+1$ neural fields that potentially intersect with a ray, we run volume rendering on each field separately, yielding $k+1$ distinct sets of LiDAR measurements. Next, we evaluate the ray drop probabilities across these fields. A ray is deemed dropped if all neural fields indicate a drop probability $p_{d}>0.5$. For rays not classified as dropped, we sort the estimated ranges in ascending order and select the nearest one as our final range prediction. The corresponding intensity value at the closest range is extracted from the same neural field.
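
A compact sketch of this composition rule is given below. One detail is an assumption on our side: when only some fields vote for a drop, the nearest range is selected among the remaining fields. The `FieldRender` container and the `compose` helper are hypothetical names.

```python
from typing import NamedTuple, Optional


class FieldRender(NamedTuple):
    """Per-field rendering result for one ray."""
    range_m: float
    intensity: float
    p_drop: float


def compose(renders: list[FieldRender], threshold: float = 0.5) -> Optional[FieldRender]:
    """Return the composed measurement, or None if the ray is dropped."""
    # The ray is dropped only if every field predicts a drop.
    if all(r.p_drop > threshold for r in renders):
        return None
    # ASSUMPTION: fields that vote "drop" are ignored when picking the nearest range.
    kept = [r for r in renders if r.p_drop <= threshold]
    # Range and intensity are taken from the same (nearest) field.
    return min(kept, key=lambda r: r.range_m)


background = FieldRender(range_m=30.0, intensity=0.10, p_drop=0.1)
vehicle = FieldRender(range_m=12.0, intensity=0.30, p_drop=0.2)
grazing = FieldRender(range_m=5.0, intensity=0.05, p_drop=0.9)

result_occluding = compose([background, vehicle])   # vehicle occludes the background
result_grazing = compose([background, grazing])     # ray misses the vehicle surface
result_dropped = compose([grazing])                 # all fields vote for a drop
```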

5 Neural scene optimisation

Given the set of LiDAR scans and the associated tracked bounding boxes of the dynamic vehicles, we optimise our neural scene representation by minimising the loss:

$$\mathcal{L}=w_{\zeta}\mathcal{L}_{\zeta}+w_{s}\mathcal{L}_{s}+w_{\text{eik}}\mathcal{L}_{\text{eik}}+w_{e}\mathcal{L}_{e}+w_{\text{drop}}\mathcal{L}_{\text{drop}}, \tag{9}$$

where the $w_{*}$ denote the weights of the individual loss terms $\mathcal{L}_{*}$, explained below.

Range reconstruction loss.

For range estimation, we employ an L1 loss, defined as: $\mathcal{L}_{\zeta}=\frac{1}{|\mathcal{R}|}\sum_{\mathbf{r}\in\mathcal{R}}|\zeta_{est}-\zeta_{gt}|$, where $\mathcal{R}$ denotes the set of LiDAR rays, and $\zeta_{est}$ and $\zeta_{gt}$ are the estimated and actual ranges, respectively.

Surface point SDF regularisation.

Acknowledging that LiDAR points mostly come from the actual surface, we introduce an additional SDF regularisation term $\mathcal{L}_{s}$ that penalizes non-zero SDF values at surface points: $\mathcal{L}_{s}=\frac{1}{|\mathcal{P}|}\sum_{\mathbf{p}\in\mathcal{P}}|s(\mathbf{p})|$. Here $\mathcal{P}$ denotes the set of surface points and $s(\mathbf{p})$ represents the SDF value of the point $\mathbf{p}$.
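
Both terms introduced so far are plain means of absolute values. The sketch below evaluates them on toy data, using an analytic unit-sphere SDF as a stand-in for the learned field; the function names and the sample values are illustrative only.

```python
import math


def range_l1_loss(range_est, range_gt):
    """L_zeta: mean absolute range error over a batch of rays."""
    return sum(abs(a - b) for a, b in zip(range_est, range_gt)) / len(range_est)


def surface_sdf_loss(sdf, surface_points):
    """L_s: mean absolute SDF value at measured surface points."""
    return sum(abs(sdf(p)) for p in surface_points) / len(surface_points)


def unit_sphere_sdf(p):
    """Analytic SDF of a unit sphere, used here in place of a learned field."""
    return math.sqrt(sum(c * c for c in p)) - 1.0


on_surface = [(1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, 1.0)]
off_surface = [(1.2, 0.0, 0.0), (0.0, -0.9, 0.0)]

loss_range = range_l1_loss([10.0, 20.0, 30.5], [10.1, 20.0, 30.0])
loss_on = surface_sdf_loss(unit_sphere_sdf, on_surface)     # consistent field
loss_off = surface_sdf_loss(unit_sphere_sdf, off_surface)   # field disagrees with the points
```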

Eikonal constraint.

Following [14], we utilize the Eikonal loss $\mathcal{L}_{\text{eik}}$ to regularize the SDF level set. This ensures the gradient norm of the SDF is approximately one at any queried point. The loss is computed as: $\mathcal{L}_{\text{eik}}=\frac{1}{|\mathcal{Z}|}\sum_{\mathbf{p}\in\mathcal{Z}}(\|\nabla s(\mathbf{p})\|_{2}-1)^{2}$, where $\mathcal{Z}$ is the set of all the sampled points. To stabilise the training procedure, we adopt a numerical approach [25] to compute $\nabla s(\mathbf{p})$ as:

$$\nabla s(\mathbf{p})=\frac{s\left(\mathbf{p}+\boldsymbol{\epsilon}\right)-s\left(\mathbf{p}-\boldsymbol{\epsilon}\right)}{2\epsilon}, \tag{10}$$

where the numerical step size $\epsilon$ is set to be $10^{-3}$ meters.
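
To make the numerical gradient concrete, the sketch below applies Eq. 10 once per coordinate axis and evaluates the Eikonal term on an analytic sphere SDF, for which the gradient norm is exactly one. The per-axis interpretation of $\boldsymbol{\epsilon}$ and the helper names are assumptions made for illustration.

```python
import math


def numerical_gradient(sdf, p, eps=1e-3):
    """Central differences (Eq. 10), applied along each coordinate axis."""
    grad = []
    for axis in range(3):
        plus, minus = list(p), list(p)
        plus[axis] += eps
        minus[axis] -= eps
        grad.append((sdf(plus) - sdf(minus)) / (2.0 * eps))
    return grad


def eikonal_loss(sdf, points, eps=1e-3):
    """Mean squared deviation of the gradient norm from one."""
    total = 0.0
    for p in points:
        g = numerical_gradient(sdf, p, eps)
        total += (math.sqrt(sum(c * c for c in g)) - 1.0) ** 2
    return total / len(points)


def sphere_sdf(p):
    """A valid SDF: distance to the unit sphere."""
    return math.sqrt(sum(c * c for c in p)) - 1.0


def scaled_sdf(p):
    """Not a valid SDF: its gradient norm is two everywhere."""
    return 2.0 * sphere_sdf(p)


samples = [(0.5, 0.2, -0.1), (1.5, -0.3, 0.4), (-0.7, 0.9, 0.2)]
loss_valid = eikonal_loss(sphere_sdf, samples)
loss_invalid = eikonal_loss(scaled_sdf, samples)
```

The valid SDF yields a loss close to zero, while the scaled field is penalised with a loss close to one, which is the behaviour the regulariser relies on.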

Intensity loss.

For intensity reconstruction, we apply the L2 loss, defined as: $\mathcal{L}_{e}=\frac{1}{|\mathcal{R}|}\sum_{\mathbf{r}\in\mathcal{R}}(e_{est}-e_{gt})^{2}$.

Ray drop loss.

We follow [19] to supervise the ray drop estimation with a combination of a binary cross entropy loss $\mathcal{L}_{bce}$ and a Lovász loss $\mathcal{L}_{ls}$ [4] as:

$$\mathcal{L}_{\text{drop}}=\frac{1}{|\mathcal{R}|}\sum_{\mathbf{r}\in\mathcal{R}}\left(\mathcal{L}_{bce}(p_{d,est},p_{d,gt})+\mathcal{L}_{ls}(p_{d,est},p_{d,gt})\right). \tag{11}$$

Notably, in the context of dynamic neural field training, we include all LiDAR rays that intersect with the objects’ bounding boxes in a scene. A ray is classified as dropped either if it is labeled as such in the dataset or if it does not intersect with the actual surface of a dynamic vehicle (e.g., rays close to and parallel with the surfaces). This approach enhances the accuracy and realism of the reconstructed dynamic neural fields, improving the rendering fidelity at inference time.
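
The labelling rule and the cross-entropy part of Eq. 11 can be sketched as follows. The Lovász term is deliberately omitted to keep the example short, and the boolean inputs (`dataset_dropped`, `hits_surface`) are hypothetical per-ray flags that a data loader would have to provide.

```python
import math


def drop_label(dataset_dropped: bool, hits_surface: bool) -> float:
    """Ground-truth drop label for a ray that intersects a vehicle's bounding box."""
    # Dropped if the dataset says so, or if the ray misses the actual surface.
    return 1.0 if (dataset_dropped or not hits_surface) else 0.0


def bce(p_est: float, p_gt: float, eps: float = 1e-7) -> float:
    """Binary cross entropy for one ray, with clamping for numerical safety."""
    p = min(max(p_est, eps), 1.0 - eps)
    return -(p_gt * math.log(p) + (1.0 - p_gt) * math.log(1.0 - p))


def drop_bce_loss(p_ests, labels):
    """Mean BCE over a batch of rays (the Lovasz term of Eq. 11 is NOT included)."""
    return sum(bce(p, y) for p, y in zip(p_ests, labels)) / len(p_ests)


labels = [
    drop_label(dataset_dropped=False, hits_surface=True),    # regular return
    drop_label(dataset_dropped=True, hits_surface=True),     # dropped in the dataset
    drop_label(dataset_dropped=False, hits_surface=False),   # grazes the box, misses the surface
]
loss_good = drop_bce_loss([0.05, 0.95, 0.90], labels)
loss_bad = drop_bce_loss([0.95, 0.05, 0.10], labels)
```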

6 Experiments

Figure 2: Qualitative comparison of range estimation on Waymo Dynamic dataset. Dynamic vehicles are zoomed in, and points are color-coded by range errors (-100 to 100 cm).

6.1 Datasets and evaluation protocol

Real-world dynamic scenes.

We construct the Waymo Dynamic dataset by selecting four representative scenes from Waymo Open [42], each containing multiple moving vehicles. Each scene comprises 50 consecutive frames. For evaluation, every 5th frame is held out for testing and the remaining 40 frames are used for training.
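
The split can be written down in a few lines. Which frame of each group of five is held out is not specified here, so the offset used below (the last frame of each group) is an assumption.

```python
n_frames = 50
frames = list(range(n_frames))

# ASSUMPTION: the held-out frame is the 5th, 10th, ... frame (0-based index 4, 9, ...).
test_frames = [i for i in frames if i % 5 == 4]
train_frames = [i for i in frames if i % 5 != 4]
```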

Real-world static scenes.

We also evaluate our method on the four static scenes introduced in [19]. There are two settings: Waymo Interp applies the same evaluation protocol as Waymo Dynamic, while Waymo NVS employs a dedicated closed-loop evaluation to validate the real novel view synthesis performance. Please refer to NFL [19] for more details about this setting.

Synthetic static scenes.

TownClean and TownReal are synthetic static scenes introduced in NFL [19]. They consist of 50 simulated LiDAR scans of urban street environments, using idealised and diverging beams, respectively.

Evaluation metrics.

To evaluate the LiDAR range accuracy, we employ a suite of four metrics: mean absolute error (MAE [cm]), median absolute error (MedAE [cm]), Chamfer distance (CD [cm]) and MedAE for dynamic vehicles (MedAE Dyn [cm]). For intensity evaluation, we report the root mean square error (RMSE). In addition to our primary evaluations, we assess the realism of the re-simulated LiDAR scans through two auxiliary tasks: object detection and semantic segmentation. For object detection, we calculate the detection agreement [27], both for all vehicles (Agg. [%]) and specifically for dynamic vehicles (Agg. Dyn. [%]). Regarding semantic segmentation, we report recall, precision, and intersection-over-union (IoU [%]). It is important to note that the predictions on the original LiDAR scans serve as our ground truth, against which we compare the results obtained from the re-simulated scans.
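
For reference, the range and intensity metrics can be sketched as below. The Chamfer distance is implemented as the average of the two mean nearest-neighbour distances, which is one common convention and an assumption here; the brute-force search is only suitable for small point sets.

```python
import math
import statistics


def mae(est, gt):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(est, gt)) / len(est)


def medae(est, gt):
    """Median absolute error."""
    return statistics.median(abs(a - b) for a, b in zip(est, gt))


def rmse(est, gt):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(est, gt)) / len(est))


def chamfer(points_a, points_b):
    """Symmetric Chamfer distance (ASSUMED convention: mean of both directions)."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return 0.5 * (one_way(points_a, points_b) + one_way(points_b, points_a))


range_est = [10.0, 20.0, 30.5]
range_gt = [10.1, 20.0, 30.0]
scan_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
scan_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5)]

print(mae(range_est, range_gt), medae(range_est, range_gt), rmse(range_est, range_gt))
print(chamfer(scan_a, scan_b))
```

The example values are in metres; the tables report range errors in centimetres, so a conversion factor of 100 would be applied before comparing numbers.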

Baseline methods.

Regarding LiDAR simulation on static scenes, NFL [19] and LiDARsim [27] are the two closest baselines to compare to. Additionally, we include i-NGP [29], DS-NeRF [8], and URF [37] for comparison. As for simulation on dynamic scenes, we compare to LiDARsim [27] and UniSim [58]. We re-implement both LiDARsim and UniSim, since they are not open-sourced; please refer to the supplementary material for implementation details.

| Method | MAE ↓ | MedAE ↓ | CD ↓ | MedAE Dyn ↓ | Intensity RMSE ↓ |
| --- | --- | --- | --- | --- | --- |
| LiDARsim [27] | 170.1 | 11.5 | 31.1 | 16.0 | 0.10 |
| UniSim [58] | 35.6 | 6.1 | 14.3 | 14.3 | 0.05 |
| Ours | 30.8 | 3.0 | 10.9 | 8.5 | 0.05 |

Table 1: Evaluation of LiDAR NVS on Waymo Dynamic dataset.

| Method | TownClean MAE ↓ | TownClean MedAE ↓ | TownClean CD ↓ | TownReal MAE ↓ | TownReal MedAE ↓ | TownReal CD ↓ | Waymo Interp MAE ↓ | Waymo Interp MedAE ↓ | Waymo Interp CD ↓ | Waymo NVS MAE ↓ | Waymo NVS MedAE ↓ | Waymo NVS CD ↓ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| i-NGP [29] | 42.2 | 4.1 | 17.4 | 49.8 | 4.8 | 19.9 | 26.4 | 5.5 | 11.6 | 30.4 | 7.3 | 15.3 |
| DS-NeRF [8] | 41.7 | 3.9 | 16.6 | 48.9 | 4.4 | 18.8 | 28.2 | 6.3 | 14.5 | 30.4 | 7.2 | 16.8 |
| URF [37] | 43.3 | 4.2 | 16.8 | 52.1 | 5.1 | 20.7 | 28.2 | 5.4 | 12.9 | 43.1 | 10.0 | 21.2 |
| LiDARsim [27] | 159.6 | 0.8 | 23.5 | 162.8 | 3.8 | 27.4 | 116.3 | 15.2 | 27.6 | 160.2 | 16.2 | 34.7 |
| NFL [19] | 32.0 | 2.3 | 9.0 | 39.2 | 3.0 | 11.5 | 30.8 | 5.1 | 12.1 | 32.6 | 5.5 | 13.2 |
| Ours | 26.7 | 0.7 | 6.7 | 33.9 | 2.1 | 10.4 | 28.3 | 4.7 | 12.5 | 28.6 | 4.9 | 13.0 |

Table 2: Evaluation of LiDAR NVS on static scenes.

Figure 3: ECDF plots showcasing range errors across all the points (left) and specifically for points on dynamic vehicles (right). Our composition of neural fields outperforms LiDARsim [27] and UniSim [58], especially when it comes to dynamic vehicles.

Figure 4: Qualitative results of range estimation. Regions with gross errors (-100 to 100 cm) are highlighted.

6.2 LiDAR novel view synthesis evaluation

LiDAR NVS in dynamic scenes.

Quantitative comparisons with baseline methods are detailed in Tab. 1. DyNFL reconstructs more accurate range than LiDARsim [27] and UniSim [58]. The improvement is largely due to our SDF-based neural scene representation, which incorporates the physical aspects of LiDAR sensing. Additionally, our method employs a ray drop test when rendering multiple neural fields, leading to a more accurate reconstruction of dynamic vehicles, as evidenced in Fig. 2 and further supported by the errors shown in Fig. 3.

LiDAR NVS in static scenes.

In addition to dynamic scenes, we evaluate DyNFL against baseline methods in static scenarios (Tab. 2 and Fig. 4). DyNFL excels in reconstructing crisp geometry. A key observation is its superior performance on planar regions (e.g., the ground shown in Fig. 4), especially when compared to NFL [19], which also uses a neural field for surface representation. This improvement is largely due to the enhanced surface regularization provided by SDF-based surface modeling.

| Datasets | MAE ↓ | MedAE ↓ | CD ↓ |
| --- | --- | --- | --- |
| TownClean | 26.7 (-1.5) | 0.7 (-0.2) | 6.7 (-0.5) |
| Waymo Interp | 28.3 (0.1) | 4.7 (-0.2) | 12.5 (-0.1) |
| Waymo Dynamic | 30.8 (-0.3) | 3.0 (-0.2) | 10.9 (-0.3) |

Table 3: Ablation study of volume rendering for active sensing.

| Datasets | MAE ↓ | MedAE ↓ | CD ↓ |
| --- | --- | --- | --- |
| TownReal | 33.9 (-3.3) | 2.1 (-0.0) | 10.4 (-1.2) |
| Waymo Interp | 28.3 (-0.3) | 4.7 (-0.1) | 12.5 (-0.3) |

Table 4: Ablation study of the surface points' SDF regularisation.

Figure 5: Qualitative results on Waymo Dynamic dataset. Our model equipped with a ray drop module effectively composites multiple neural fields, re-simulating LiDAR scans of high quality.

Figure 6: Object detection results on Waymo Dynamic dataset. The ground truth and predicted bounding boxes are marked in red and blue, respectively.

Figure 7: LiDAR novel view synthesis by changing sensor elevation angle ($\theta$), poses ($x,y,z$) and number of beams on Waymo Dynamic dataset. The points are color-coded by the intensity values (0 to 0.25).

| Threshold | GT AP ↑ | Ours AP ↑ | Ours Agg. ↑ | Ours Agg. Dyn. ↑ | LiDARsim [27] AP ↑ | LiDARsim [27] Agg. ↑ | LiDARsim [27] Agg. Dyn. ↑ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| IoU > 0.7 | 0.85 | 0.86 | 0.77 | 0.71 | 0.90 | 0.76 | 0.68 |
| IoU > 0.5 | 0.98 | 0.96 | 0.87 | 0.76 | 0.95 | 0.86 | 0.76 |

Table 5: Object detection results on Waymo Dynamic dataset.

| Method | Vehicle Recall ↑ | Vehicle Precision ↑ | Vehicle IoU ↑ | Background Recall ↑ | Background Precision ↑ | Background IoU ↑ |
| --- | --- | --- | --- | --- | --- | --- |
| i-NGP [29] | 91.8 | 83.6 | 78.1 | 97.9 | 99.2 | 97.1 |
| DS-NeRF [8] | 89.3 | 84.8 | 77.3 | 98.1 | 98.8 | 97.0 |
| URF [36] | 86.9 | 79.8 | 72.0 | 97.7 | 98.5 | 96.2 |
| LiDARsim [27] | 89.6 | 68.9 | 64.0 | 94.5 | 98.9 | 93.5 |
| NFL [19] | 94.5 | 84.8 | 80.9 | 97.8 | 99.4 | 97.3 |
| Ours | 90.5 | 88.4 | 81.1 | 98.5 | 98.7 | 97.3 |

Table 6: Semantic segmentation results on Waymo NVS dataset.

Figure 8: Qualitative results of object removal and insertion. DyNFL seamlessly inserts the neural asset (truck) into a new scene attributed to our superior compositional rendering scheme. In contrast, UniSim [58] struggles to accurately model geometry.

Figure 9: Qualitative results of object trajectory manipulation. The truck can be successfully detected after manipulation, indicating high-realism LiDAR re-simulation achieved by DyNFL.

Figure 10: Object detections on noisy re-simulated LiDAR scans.

6.3 Ablation study

SDF-based volume rendering for active sensing.

We begin by assessing the efficacy of our SDF-based volume rendering for active sensors; results are shown in Tab. 3. When compared to our baseline that uses the SDF-based volume rendering for passive sensing, DyNFL demonstrates enhanced performance in both synthetic (TownClean) and real-world (Waymo Interp and Waymo Dynamic) datasets, indicating the importance of considering the physical LiDAR sensing process when addressing the inverse problem.

Neural fields composition.

To validate the efficacy of our two-stage neural field composition approach, we compare it with the alternative approach used in UniSim [58]. The results are shown in Tab. 1. UniSim [58] blends different neural fields by sampling points from all intersected neural fields, followed by a single evaluation of volume rendering to produce the final LiDAR scan. In contrast, our method independently renders from each intersecting neural field first, and then combines these measurements into a final output using a ray drop test (cf. Fig. 5). This approach leads to markedly improved geometry reconstruction, exemplified by our method halving the median absolute error (MedAE) across all points, from 6.1 cm to 3.0 cm. This enhancement is even more evident when focusing solely on points related to dynamic vehicles (cf. Fig. 3).

Surface point SDF constraint.

We examine the importance of constraining the SDF at surface points, cf. Sec. 5, with the TownReal and Waymo Interp datasets. The results shown in Tab. 4 suggest that our method yields improved geometry reconstruction by explicitly enforcing SDF values near zero at the LiDAR points.

6.4 Auxiliary task evaluations

To assess the fidelity of our neural re-simulation and gauge the domain gap between re-simulated and real scans, we evaluate their applicability in two downstream tasks: object detection and semantic segmentation.

Object detection.

We run the pre-trained FSDv2 [10] object detector on re-simulated LiDAR scans of the Waymo Dynamic dataset. Our results are compared against those from LiDARsim [27], with the findings detailed in Tab. 5 and Fig. 6. In summary, detections in scans generated with DyNFL exhibit better agreement with those in real LiDAR scans. In Fig. 10 we additionally show detection results in the noisy background. The detections in the synthetic scans almost perfectly match those in the original scans. This indicates a high fidelity of our re-simulations, with a low domain gap to actual scans.

Semantic segmentation.

We segment synthetic scans into semantic classes with the pre-trained SPVNAS model [45] and show results in Tab. 6. DyNFL improves over baseline methods according to most evaluation metrics, underscoring the realism of our re-simulated LiDAR scans.
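
The per-class scores can be computed from two label sequences as sketched below, where the reference labels are the predictions on the original scan and the candidate labels are the predictions on the re-simulated scan. The sketch assumes a one-to-one correspondence between points of the two scans, which is a simplification; the function name and the toy labels are hypothetical.

```python
def class_metrics(pred, ref, cls):
    """Recall, precision and IoU of one class, given per-point labels."""
    tp = sum(1 for p, r in zip(pred, ref) if p == cls and r == cls)
    fp = sum(1 for p, r in zip(pred, ref) if p == cls and r != cls)
    fn = sum(1 for p, r in zip(pred, ref) if p != cls and r == cls)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return recall, precision, iou


ref_labels = ["vehicle", "vehicle", "background", "background", "vehicle"]
pred_labels = ["vehicle", "background", "background", "vehicle", "vehicle"]

vehicle_scores = class_metrics(pred_labels, ref_labels, "vehicle")
background_scores = class_metrics(pred_labels, ref_labels, "background")
```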

6.5 Scene editing

Beyond LiDAR novel view synthesis by adjusting the sensor configuration (cf. Fig. 7), we also demonstrate the practicality of our compositional neural fields approach with two scene editing applications.
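
Changing the sensor configuration amounts to querying the neural fields with a different set of rays. The sketch below generates ray directions for an idealised spinning LiDAR from a beam count, an azimuth resolution, and an elevation range; the function name, the uniform beam spacing, and the numeric values are illustrative assumptions rather than the parameters of any specific sensor.

```python
import math


def lidar_directions(n_beams, n_azimuth, elev_min_deg, elev_max_deg, elev_offset_deg=0.0):
    """Unit ray directions of an idealised spinning LiDAR (uniform beam spacing ASSUMED)."""
    directions = []
    for b in range(n_beams):
        frac = b / (n_beams - 1) if n_beams > 1 else 0.5
        elev = math.radians(elev_min_deg + frac * (elev_max_deg - elev_min_deg) + elev_offset_deg)
        for a in range(n_azimuth):
            azim = 2.0 * math.pi * a / n_azimuth
            directions.append((
                math.cos(elev) * math.cos(azim),
                math.cos(elev) * math.sin(azim),
                math.sin(elev),
            ))
    return directions


# Illustrative configurations: original beam count, halved beam count, tilted sensor.
rays_full = lidar_directions(64, 360, -18.0, 2.0)
rays_sparse = lidar_directions(32, 360, -18.0, 2.0)
rays_tilted = lidar_directions(64, 360, -18.0, 2.0, elev_offset_deg=5.0)
sensor_origin = (0.0, 0.0, 2.0)   # shifting this origin changes the sensor pose (x, y, z)
```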

Inserting objects from other scenes.

Our explicit scene decomposition and flexible composition technique enable seamless insertion and removal of neural assets across scenes. As demonstrated in Fig. 8, we are able to replace a car from one scene with a truck from another scene, achieving accurate reconstruction of both geometry and intensity. In contrast, UniSim [58] struggles to preserve high-quality geometry. This highlights the potential of our approach to generate diverse and realistic LiDAR scans of autonomous driving scenarios.

Manipulating object trajectories.

DyNFL also facilitates the manipulation of moving objects' trajectories, by simply adjusting their relative poses to the canonical bounding box. Representative results are shown in Fig. 9. The high realism of our re-simulation is also indicated by the successful detection of inserted virtual objects.
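
As a simplified illustration of such an edit, the sketch below shifts a tracked trajectory sideways, as in a lane change. It assumes planar poses $(x, y, \text{yaw})$ in the world frame instead of full 3D rigid transforms, and the helper name and offset are hypothetical.

```python
import math


def shift_laterally(trajectory, offset_m):
    """Move each (x, y, yaw) pose along the object's left-pointing axis."""
    edited = []
    for x, y, yaw in trajectory:
        # For heading yaw, the left-pointing unit vector is (-sin(yaw), cos(yaw)).
        edited.append((x - offset_m * math.sin(yaw), y + offset_m * math.cos(yaw), yaw))
    return edited


# A vehicle driving straight along the x-axis, moved one lane (3.5 m) to the left.
original = [(float(i), 0.0, 0.0) for i in range(5)]
edited = shift_laterally(original, 3.5)
```

The edited poses would then replace the tracked ones when computing the world-to-canonical transformations used in the ray intersection test.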

7 Conclusion and future work

We have presented DyNFL, a compositional neural fields approach for LiDAR re-simulation. Our method surpasses prior art in both static and dynamic scenes, offering powerful scene editing capabilities with the potential to synthesize diverse and high-quality scenes, e.g., to evaluate a perception system trained only on real data in closed-loop mode.

Despite state-of-the-art performance, there are still limitations we aim to address in future work. First, DyNFL has difficulties in synthesizing moving vehicles from unseen view angles. The task is even more challenging than pure shape completion, because the learned prior must also be able to simulate intensity, ray drop patterns, etc. in the unseen region. Second, our method currently relies on bounding boxes and trajectories of the moving objects, and its performance may be compromised when bounding boxes are inaccurate. Overcoming this dependency by exploring 4D representations, while retaining scene editing flexibility, stands out as a main challenge for future research.

Acknowledgments

Or Litany is a Taub fellow and is supported by the Azrieli Foundation Early Career Faculty Fellowship.

References

  • Attal et al. [2023] Benjamin Attal, Jia-Bin Huang, Christian Richardt, Michael Zollhoefer, Johannes Kopf, Matthew O’Toole, and Changil Kim. HyperReel: High-fidelity 6-DoF video with ray-conditioned sampling. In CVPR, 2023.
  • Barron et al. [2021] Jonathan T Barron, Ben Mildenhall, Matthew Tancik, Peter Hedman, Ricardo Martin-Brualla, and Pratul P Srinivasan. Mip-NeRF: A multiscale representation for anti-aliasing neural radiance fields. In ICCV, 2021.
  • Barron et al. [2022] Jonathan T Barron, Ben Mildenhall, Dor Verbin, Pratul P Srinivasan, and Peter Hedman. Mip-nerf 360: Unbounded anti-aliased neural radiance fields. In CVPR, 2022.
  • Berman et al. [2018] Maxim Berman, Amal Rannen Triki, and Matthew B Blaschko. The Lovász-softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks. In CVPR, 2018.
  • Caccia et al. [2019] Lucas Caccia, Herke Van Hoof, Aaron Courville, and Joelle Pineau. Deep generative modeling of lidar data. In IROS. IEEE, 2019.
  • Chang et al. [2023] MingFang Chang, Akash Sharma, Michael Kaess, and Simon Lucey. Neural radiance field with LiDAR maps. In ICCV, 2023.
  • Chen et al. [2022] Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, and Hao Su. TensoRF: Tensorial radiance fields. In ECCV. Springer, 2022.
  • Deng et al. [2022] Kangle Deng, Andrew Liu, Jun-Yan Zhu, and Deva Ramanan. Depth-supervised NeRF: Fewer views and faster training for free. In CVPR, 2022.
  • Dosovitskiy et al. [2017] Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. Carla: An open urban driving simulator. In Conference on robot learning. PMLR, 2017.
  • Fan et al. [2023] Lue Fan, Feng Wang, Naiyan Wang, and Zhaoxiang Zhang. FSD V2: Improving fully sparse 3d object detection with virtual voxels. arXiv preprint arXiv:2308.03755, 2023.
  • Fang et al. [2020] Jin Fang, Dingfu Zhou, Feilong Yan, Tongtong Zhao, Feihu Zhang, Yu Ma, Liang Wang, and Ruigang Yang. Augmented lidar simulator for autonomous driving. IEEE Robotics and Automation Letters, 5(2):1931–1938, 2020.
  • Fridovich-Keil et al. [2022] Sara Fridovich-Keil, Alex Yu, Matthew Tancik, Qinhong Chen, Benjamin Recht, and Angjoo Kanazawa. Plenoxels: Radiance fields without neural networks. In CVPR, 2022.
  • Gojcic et al. [2021] Zan Gojcic, Or Litany, Andreas Wieser, Leonidas J Guibas, and Tolga Birdal. Weakly supervised learning of rigid 3d scene flow. In CVPR, 2021.
  • Gropp et al. [2020] Amos Gropp, Lior Yariv, Niv Haim, Matan Atzmon, and Yaron Lipman. Implicit geometric regularization for learning shapes. In Proceedings of Machine Learning and Systems 2020, pages 3569–3579. 2020.
  • Guillard et al. [2022] Benoît Guillard, Sai Vemprala, Jayesh K Gupta, Ondrej Miksik, Vibhav Vineet, Pascal Fua, and Ashish Kapoor. Learning to simulate realistic lidars. In IROS, pages 8173–8180. IEEE, 2022.
  • Hahner et al. [2021] Martin Hahner, Christos Sakaridis, Dengxin Dai, and Luc Van Gool. Fog simulation on real LiDAR point clouds for 3d object detection in adverse weather. In CVPR, 2021.
  • Hahner et al. [2022] Martin Hahner, Christos Sakaridis, Mario Bijelic, Felix Heide, Fisher Yu, Dengxin Dai, and Luc Van Gool. LiDAR snowfall simulation for robust 3d object detection. In CVPR, 2022.
  • Huang et al. [2022] Shengyu Huang, Zan Gojcic, Jiahui Huang, Andreas Wieser, and Konrad Schindler. Dynamic 3d scene analysis by point cloud accumulation. In ECCV, 2022.
  • Huang et al. [2023] Shengyu Huang, Zan Gojcic, Zian Wang, Francis Williams, Yoni Kasten, Sanja Fidler, Konrad Schindler, and Or Litany. Neural lidar fields for novel view synthesis. In ICCV, 2023.
  • Kerbl et al. [2023] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4):1–14, 2023.
  • Kingma and Ba [2014] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • Kundu et al. [2022] Abhijit Kundu, Kyle Genova, Xiaoqi Yin, Alireza Fathi, Caroline Pantofaru, Leonidas Guibas, Andrea Tagliasacchi, Frank Dellaert, and Thomas Funkhouser. Panoptic neural fields: A semantic object-aware neural scene representation. In CVPR, 2022.
  • Li et al. [2023a] Chenqi Li, Yuan Ren, and Bingbing Liu. Pcgen: Point cloud generator for lidar simulation. In ICRA, pages 11676–11682. IEEE, 2023a.
  • Li et al. [2021] Zhengqi Li, Simon Niklaus, Noah Snavely, and Oliver Wang. Neural scene flow fields for space-time view synthesis of dynamic scenes. In CVPR, 2021.
  • Li et al. [2023b] Zhaoshuo Li, Thomas Müller, Alex Evans, Russell H Taylor, Mathias Unberath, Ming-Yu Liu, and Chen-Hsuan Lin. Neuralangelo: High-fidelity neural surface reconstruction. In CVPR, 2023b.
  • Liu et al. [2023] Yu-Lun Liu, Chen Gao, Andreas Meuleman, Hung-Yu Tseng, Ayush Saraf, Changil Kim, Yung-Yu Chuang, Johannes Kopf, and Jia-Bin Huang. Robust dynamic radiance fields. In CVPR, 2023.
  • Manivasagam et al. [2020] Sivabalan Manivasagam, Shenlong Wang, Kelvin Wong, Wenyuan Zeng, Mikita Sazanovich, Shuhan Tan, Bin Yang, Wei-Chiu Ma, and Raquel Urtasun. LiDARsim: Realistic LiDAR simulation by leveraging the real world. In CVPR, 2020.
  • Mildenhall et al. [2020] Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. In ECCV, 2020.
  • Müller et al. [2022] Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph., 41(4):102:1–102:15, 2022.
  • Niemeyer et al. [2020] Michael Niemeyer, Lars Mescheder, Michael Oechsle, and Andreas Geiger. Differentiable volumetric rendering: Learning implicit 3d representations without 3d supervision. In CVPR, 2020.
  • Oechsle et al. [2021] Michael Oechsle, Songyou Peng, and Andreas Geiger. Unisurf: Unifying neural implicit surfaces and radiance fields for multi-view reconstruction. In ICCV, 2021.
  • Ost et al. [2021] Julian Ost, Fahim Mannan, Nils Thuerey, Julian Knodt, and Felix Heide. Neural scene graphs for dynamic scenes. In CVPR, 2021.
  • Park et al. [2021a] Keunhong Park, Utkarsh Sinha, Jonathan T. Barron, Sofien Bouaziz, Dan B Goldman, Steven M. Seitz, and Ricardo Martin-Brualla. Nerfies: Deformable neural radiance fields. In ICCV, 2021a.
  • Park et al. [2021b] Keunhong Park, Utkarsh Sinha, Peter Hedman, Jonathan T. Barron, Sofien Bouaziz, Dan B Goldman, Ricardo Martin-Brualla, and Steven M. Seitz. HyperNeRF: A higher-dimensional representation for topologically varying neural radiance fields. ACM Transactions on Graphics, 40(6), 2021b.
  • Pumarola et al. [2021] Albert Pumarola, Enric Corona, Gerard Pons-Moll, and Francesc Moreno-Noguer. D-nerf: Neural radiance fields for dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10318–10327, 2021.
  • Rematas et al. [2021] Konstantinos Rematas, Andrew Liu, Pratul P Srinivasan, Jonathan T Barron, Andrea Tagliasacchi, Thomas Funkhouser, and Vittorio Ferrari. Urban radiance fields. arXiv preprint arXiv:2111.14643, 2021.
  • Rematas et al. [2022] Konstantinos Rematas, Andrew Liu, Pratul P Srinivasan, Jonathan T Barron, Andrea Tagliasacchi, Thomas Funkhouser, and Vittorio Ferrari. Urban radiance fields. In CVPR, 2022.
  • Fridovich-Keil et al. [2023] Sara Fridovich-Keil, Giacomo Meanti, Frederik Rahbæk Warburg, Benjamin Recht, and Angjoo Kanazawa. K-planes: Explicit radiance fields in space, time, and appearance. In CVPR, 2023.
  • Shah et al. [2018] Shital Shah, Debadeepta Dey, Chris Lovett, and Ashish Kapoor. Airsim: High-fidelity visual and physical simulation for autonomous vehicles. In International Conference on Field and Service Robotics, pages 621–635. Springer, 2018.
  • Shoemake [1985] Ken Shoemake. Animating rotation with quaternion curves. In Conference on Computer Graphics and Interactive Techniques, pages 245–254. Association for Computing Machinery, 1985.
  • Sun et al. [2022] Jiaming Sun, Xi Chen, Qianqian Wang, Zhengqi Li, Hadar Averbuch-Elor, Xiaowei Zhou, and Noah Snavely. Neural 3d reconstruction in the wild. In ACM SIGGRAPH, 2022.
  • Sun et al. [2020] Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, et al. Scalability in perception for autonomous driving: Waymo open dataset. In CVPR, 2020.
  • Tagliasacchi and Mildenhall [2022] Andrea Tagliasacchi and Ben Mildenhall. Volume rendering digest (for nerf). arXiv preprint arXiv:2209.02417, 2022.
  • Tancik et al. [2023] Matthew Tancik, Ethan Weber, Evonne Ng, Ruilong Li, Brent Yi, Justin Kerr, Terrance Wang, Alexander Kristoffersen, Jake Austin, Kamyar Salahi, Abhik Ahuja, David McAllister, and Angjoo Kanazawa. Nerfstudio: A modular framework for neural radiance field development. In ACM SIGGRAPH 2023 Conference Proceedings, 2023.
  • Tang et al. [2020] Haotian Tang, Zhijian Liu, Shengyu Zhao, Yujun Lin, Ji Lin, Hanrui Wang, and Song Han. Searching efficient 3d architectures with sparse point-voxel convolution. In ECCV, 2020.
  • Tang et al. [2022] Haotian Tang, Zhijian Liu, Xiuyu Li, Yujun Lin, and Song Han. Torchsparse: Efficient point cloud inference engine. Proceedings of Machine Learning and Systems, 4:302–315, 2022.
  • Tao et al. [2023] Tang Tao, Longfei Gao, Guangrun Wang, Peng Chen, Dayang Hao, Xiaodan Liang, Mathieu Salzmann, and Kaicheng Yu. Lidar-nerf: Novel lidar view synthesis via neural radiance fields. arXiv preprint arXiv:2304.10406, 2023.
  • Turki et al. [2023] Haithem Turki, Jason Y Zhang, Francesco Ferroni, and Deva Ramanan. Suds: Scalable urban dynamic scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12375–12385, 2023.
  • Verbin et al. [2022] Dor Verbin, Peter Hedman, Ben Mildenhall, Todd Zickler, Jonathan T Barron, and Pratul P Srinivasan. Ref-nerf: Structured view-dependent appearance for neural radiance fields. In CVPR, 2022.
  • Wang et al. [2021] Peng Wang, Lingjie Liu, Yuan Liu, Christian Theobalt, Taku Komura, and Wenping Wang. Neus: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. In NeurIPS, 2021.
  • Wang et al. [2022] Yiqun Wang, Ivan Skorokhodov, and Peter Wonka. Hf-neus: Improved surface reconstruction using high-frequency details. In NeurIPS, 2022.
  • Wang et al. [2023a] Yiming Wang, Qin Han, Marc Habermann, Kostas Daniilidis, Christian Theobalt, and Lingjie Liu. Neus2: Fast learning of neural implicit surfaces for multi-view reconstruction. In ICCV, 2023a.
  • Wang et al. [2023b] Zian Wang, Tianchang Shen, Jun Gao, Shengyu Huang, Jacob Munkberg, Jon Hasselgren, Zan Gojcic, Wenzheng Chen, and Sanja Fidler. Neural fields meet explicit geometric representations for inverse rendering of urban scenes. In CVPR, 2023b.
  • Wu et al. [2022a] Tong Wu, Jiaqi Wang, Xingang Pan, Xudong Xu, Christian Theobalt, Ziwei Liu, and Dahua Lin. Voxurf: Voxel-based efficient and accurate neural surface reconstruction. arXiv preprint arXiv:2208.12697, 2022a.
  • Wu et al. [2022b] Tianhao Wu, Fangcheng Zhong, Andrea Tagliasacchi, Forrester Cole, and Cengiz Oztireli. D^2nerf: Self-supervised decoupling of dynamic and static objects from a monocular video. Advances in Neural Information Processing Systems, 35:32653–32666, 2022b.
  • Xie et al. [2022] Yiheng Xie, Towaki Takikawa, Shunsuke Saito, Or Litany, Shiqin Yan, Numair Khan, Federico Tombari, James Tompkin, Vincent Sitzmann, and Srinath Sridhar. Neural fields in visual computing and beyond. In Computer Graphics Forum, pages 641–676, 2022.
  • Yang et al. [2023a] Jiawei Yang, Boris Ivanovic, Or Litany, Xinshuo Weng, Seung Wook Kim, Boyi Li, Tong Che, Danfei Xu, Sanja Fidler, Marco Pavone, and Yue Wang. Emernerf: Emergent spatial-temporal scene decomposition via self-supervision. arXiv preprint arXiv:2311.02077, 2023a.
  • Yang et al. [2023b] Ze Yang, Yun Chen, Jingkang Wang, Sivabalan Manivasagam, Wei-Chiu Ma, Anqi Joyce Yang, and Raquel Urtasun. Unisim: A neural closed-loop sensor simulator. In CVPR, 2023b.
  • Yariv et al. [2021] Lior Yariv, Jiatao Gu, Yoni Kasten, and Yaron Lipman. Volume rendering of neural implicit surfaces. In NeurIPS, 2021.
  • Yu et al. [2022] Zehao Yu, Songyou Peng, Michael Niemeyer, Torsten Sattler, and Andreas Geiger. MonoSDF: Exploring monocular geometric cues for neural implicit surface reconstruction. In NeurIPS, 2022.
  • Yuan et al. [2021] Wentao Yuan, Zhaoyang Lv, Tanner Schmidt, and Steven Lovegrove. Star: Self-supervised tracking and reconstruction of rigid objects in motion with neural rendering. In CVPR, 2021.
  • Zhang et al. [2023] Junge Zhang, Feihu Zhang, Shaochen Kuang, and Li Zhang. Nerf-lidar: Generating realistic lidar point clouds with neural radiance fields. arXiv preprint arXiv:2304.14811, 2023.
  • Zuo et al. [2023] Xingxing Zuo, Nan Yang, Nathaniel Merrill, Binbin Xu, and Stefan Leutenegger. Incremental dense reconstruction from monocular video with guided sparse feature volume fusion. IEEE Robotics and Automation Letters, 2023.
  • Zyrianov et al. [2022] Vlas Zyrianov, Xiyue Zhu, and Shenlong Wang. Learning to generate realistic lidar point clouds. In ECCV. Springer, 2022.

In this supplementary material, we first provide additional information about the datasets used in our evaluation and the implementation details of our method in Sec. A. Next, we present further qualitative and quantitative results in Sec. B; the supplementary video showcases additional results. Finally, we provide the complete derivation of SDF-based volume rendering for active sensors in Sec. C.

A Datasets and implementation details

A.1 Datasets

Waymo Dynamic.

The Waymo Dynamic dataset consists of 4 scenes from the Waymo Open Dataset [42], each containing multiple moving vehicles. We take 50 consecutive frames from each scene for our evaluation. A vehicle is deemed dynamic if its speed exceeds $1\,$m/s in any of the 50 frames. The Waymo Open Dataset IDs of the selected scenes are as follows:

| Scene | Scene ID |
|---|---|
| Scene 1 | 1083056852838271990_4080_000_4100_000 |
| Scene 2 | 13271285919570645382_5320_000_5340_000 |
| Scene 3 | 10072140764565668044_4060_000_4080_000 |
| Scene 4 | 10500357041547037089_1474_800_1494_800 |
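
The dynamic-vehicle criterion described above can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes each tracked vehicle comes with per-frame box centers in world coordinates and frame timestamps, and it estimates speed by finite differences between consecutive frames. The function name `is_dynamic`, the array layout, and the 10 Hz toy tracks are hypothetical.

```python
import numpy as np

def is_dynamic(centers, timestamps, speed_threshold=1.0):
    """Flag a tracked vehicle as dynamic if its speed exceeds the threshold
    (in m/s) between any pair of consecutive frames.

    centers:    (F, 3) array of box centers in world coordinates, in meters.
    timestamps: (F,) array of frame times in seconds, strictly increasing.
    """
    centers = np.asarray(centers, dtype=float)
    timestamps = np.asarray(timestamps, dtype=float)
    displacement = np.linalg.norm(np.diff(centers, axis=0), axis=1)
    dt = np.diff(timestamps)
    speed = displacement / dt
    return bool(np.any(speed > speed_threshold))

# Hypothetical 10 Hz tracks: a parked car and a car moving at 5 m/s along x.
t = np.arange(50) * 0.1
parked = np.zeros((50, 3))
moving = np.stack([5.0 * t, np.zeros(50), np.zeros(50)], axis=1)
```

Finite differencing is sensitive to noise in the tracked boxes, so a practical pipeline might smooth the trajectory first; the source does not say how speed is obtained.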

Waymo Dynamic NVS.

The Waymo Dynamic NVS dataset uses the same 4 scenes as Waymo Dynamic. We change the evaluation paradigm, following Waymo NVS [19]: we first train a model on all 50 consecutive LiDAR frames and synthesize 50 novel LiDAR frames with a sensor shift of 2 meters. We then train a new model on these 50 synthetic LiDAR scans and evaluate it against the original 50 LiDAR scans.

A.2 Implementation details

DyNFL.

Our model is implemented on top of nerfstudio [44]. For the static neural field, we sample $N_s = 512$ points in total, with $N_u = 256$ uniformly sampled points and $N_i = 256$ weighted sampled points drawn in 8 upsample steps. In each upsample step, 32 points are sampled based on the weight distribution of the previously sampled points. For each dynamic neural field, we sample $N_s = 128$ points in total, with $N_u = 64$ uniformly sampled points and $N_i = 64$ weighted sampled points drawn in 4 upsample steps. During training, we minimize the loss function using the Adam [21] optimiser with an initial learning rate of 0.005, which decays linearly to 0.0005 towards the end of training. For the loss weights, we use $w_{\zeta} = 3$, $w_e = 50$, $w_{\text{drop}} = 0.15$, $w_s = 1$, and $w_{\text{eik}} = 0.3$. The batch size is 4096, and we train the model for 60000 iterations on a single RTX 3090 GPU with float32 precision.
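
The sampling budget above can be illustrated with a small NumPy sketch. It assumes inverse-transform sampling from a piecewise-constant distribution defined by per-interval weights, which is a common way to implement weighted sampling in NeRF-style pipelines; the source does not specify the exact procedure. The function names, the near and far bounds, the toy weight function, and the choice of 16 points per step for the dynamic field (64 points over 4 steps) are assumptions.

```python
import numpy as np

def sample_pdf(bins, weights, n_samples, rng):
    """Draw n_samples ranges from the piecewise-constant PDF given by
    `weights` over the intervals between consecutive `bins`."""
    pdf = (weights + 1e-5) / np.sum(weights + 1e-5)
    cdf = np.concatenate([[0.0], np.cumsum(pdf)])
    cdf[-1] = 1.0  # guard against round-off so every u falls inside a bin
    u = rng.random(n_samples)
    idx = np.searchsorted(cdf, u, side="right") - 1
    idx = np.clip(idx, 0, len(weights) - 1)
    frac = (u - cdf[idx]) / (cdf[idx + 1] - cdf[idx])
    return bins[idx] + frac * (bins[idx + 1] - bins[idx])

def sample_ray(weight_fn, near, far, n_uniform, n_steps, n_per_step, rng):
    """Uniform samples followed by n_steps rounds of weighted upsampling."""
    z = np.linspace(near, far, n_uniform)
    for _ in range(n_steps):
        mid = 0.5 * (z[:-1] + z[1:])
        w = weight_fn(mid)  # one weight per interval between samples
        new = sample_pdf(z, w, n_per_step, rng)
        z = np.sort(np.concatenate([z, new]))
    return z

def toy_weights(z):
    """Toy weights peaked around a hypothetical surface at 20 m."""
    return np.exp(-0.5 * ((z - 20.0) / 0.5) ** 2)

rng = np.random.default_rng(0)
z_static = sample_ray(toy_weights, 0.5, 75.0, 256, 8, 32, rng)   # 256 + 8 * 32 = 512
z_dynamic = sample_ray(toy_weights, 0.5, 75.0, 64, 4, 16, rng)   # 64 + 4 * 16 = 128
```

In an actual model the weights would come from the rendering weights of the neural field at the current samples rather than from a fixed function.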

LiDARsim.

We re-implement LiDARsim [27] as one of our baselines. First, we estimate point-wise normal vectors by considering all training-set points within a 20 cm radius ball. We then apply voxel down-sampling [46] with a 4 cm voxel size and reconstruct an individual disk surfel at each remaining point. The surfel orientation is defined by the estimated normal vector. During inference, we apply a ray-surfel intersection test to determine the intersection points, and thus the range and intensity values. We select a fixed surfel radius of 6 cm for the Waymo dataset and 12 cm for the Town dataset. To handle dynamic vehicles, we follow LiDARsim [27] and aggregate the LiDAR points of each vehicle from all training frames, representing them in the canonical frame of that vehicle. During inference, we transform all aggregated vehicle points from their canonical frames to the world frame and run the ray-surfel intersection test.
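
A ray-surfel intersection test of the kind described above can be sketched as a ray-disk intersection. This is a brute-force illustration under stated assumptions, not the re-implementation used in the paper: it loops over all surfels at once with NumPy, omits any acceleration structure and the intensity lookup, and the function name and toy surfels are hypothetical.

```python
import numpy as np

def ray_surfel_intersection(origin, direction, centers, normals, radius):
    """Return the range of the closest disk surfel hit by the ray, or None.

    origin, direction: (3,) arrays; direction is assumed to be unit length.
    centers, normals:  (N, 3) arrays of surfel centers and unit normals.
    radius:            disk radius in meters (e.g. 0.06 for 6 cm).
    """
    denom = normals @ direction                      # (N,)
    valid = np.abs(denom) > 1e-8                     # skip rays parallel to a disk
    safe_denom = np.where(valid, denom, 1.0)
    # Range at which the ray crosses the plane of each surfel.
    t = np.einsum("ij,ij->i", centers - origin, normals) / safe_denom
    hit_points = origin + t[:, None] * direction     # (N, 3)
    inside = np.linalg.norm(hit_points - centers, axis=1) <= radius
    hits = valid & (t > 0.0) & inside
    if not np.any(hits):
        return None
    return float(np.min(t[hits]))

# Two hypothetical surfels facing the sensor at 10 m and 5 m along +x.
centers = np.array([[10.0, 0.0, 0.0], [5.0, 0.02, 0.0]])
normals = np.array([[-1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
origin = np.zeros(3)
range_hit = ray_surfel_intersection(origin, np.array([1.0, 0.0, 0.0]),
                                    centers, normals, 0.06)
range_miss = ray_surfel_intersection(origin, np.array([0.0, 1.0, 0.0]),
                                     centers, normals, 0.06)
```

Here the ray along +x hits both disks and the closer one at 5 m determines the range, while the ray along +y is parallel to both disks and returns no hit.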

UniSim.

We re-implement UniSim's [58] rendering process for LiDAR measurements by replacing our ray-drop-test-based neural field composition method with its joint rendering method. For every ray $\mathbf{r}(\mathbf{o}, \mathbf{d})$, we begin by conducting an intersection test with all dynamic bounding boxes in the scene to identify the near and far limits. We then uniformly sample 512 points along each ray, assigning each point to either a dynamic neural field, if it falls within a dynamic bounding box, or to the static neural field otherwise. After sampling, we query the SDF and intensity values from the relevant neural fields. Finally, using the SDF-based volume rendering formula in Eq. 51 for active sensors, we calculate the weights and perform the rendering. Note that we use the same neural field architecture as in our method.
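
The box intersection and sample assignment steps above can be sketched with a slab test. This is a simplified illustration: it assumes axis-aligned boxes, whereas tracked vehicle boxes are oriented, so a real implementation would first transform the ray into each box frame. The function names and the toy box are hypothetical.

```python
import numpy as np

def ray_box_interval(origin, direction, box_min, box_max, eps=1e-12):
    """Slab test against an axis-aligned box.
    Returns (t_near, t_far) along the ray, or None if the ray misses the box."""
    t_near, t_far = 0.0, np.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < eps:
            if o < lo or o > hi:   # parallel to this slab and outside it
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        t_near = max(t_near, min(t0, t1))
        t_far = min(t_far, max(t0, t1))
    if t_near > t_far:
        return None
    return t_near, t_far

def assign_samples(origin, direction, boxes, near, far, n_samples=512):
    """Uniformly sample ranges along the ray and label each sample with the
    index of the dynamic box containing it, or -1 for the static field."""
    z = np.linspace(near, far, n_samples)
    labels = np.full(n_samples, -1, dtype=int)
    for k, (box_min, box_max) in enumerate(boxes):
        interval = ray_box_interval(origin, direction, box_min, box_max)
        if interval is not None:
            inside = (z >= interval[0]) & (z <= interval[1])
            labels[inside & (labels == -1)] = k   # first matching box wins
    return z, labels

# One hypothetical vehicle box spanning 10 m to 14 m along the x axis.
boxes = [(np.array([10.0, -1.0, -1.0]), np.array([14.0, 1.0, 1.0]))]
origin = np.zeros(3)
z, labels = assign_samples(origin, np.array([1.0, 0.0, 0.0]), boxes, 0.0, 75.0)
```

Samples labelled with a box index would be queried in the corresponding dynamic neural field, and samples labelled -1 in the static field.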

B Additional results

B.1 Waymo Dynamic NVS evaluation

To demonstrate the robustness of our method, we extend the evaluation beyond interpolation performance. We adopt the Waymo Dynamic NVS evaluation introduced in Sec. A, which focuses on closed-loop novel view synthesis. As shown in Tab. 7, our method outperforms LiDARsim and UniSim on all metrics.

B.2 Future frame generation

We train DyNFL on the first 40 frames and evaluate it on the last 10 frames. Results on the Waymo Dynamic dataset are presented in Tab. 8 and Fig. 11. Unsurprisingly, the performance is inferior to the original setting (cf. Tab. 1), as this task requires extrapolation beyond the observed environment, and thus again a (possibly learned) scene prior. Nevertheless, DyNFL continues to outperform LiDARsim. The degradation on dynamic vehicles is marginal, which we attribute to our precise pose interpolation and high-quality asset reconstruction. We will incorporate these findings in the final version.

Figure 11: Qualitative results of LiDAR future frame simulation.

| Method | MAE ↓ | MedAE ↓ | CD ↓ | MedAE Dyn ↓ | Intensity RMSE ↓ |
|---|---|---|---|---|---|
| LiDARsim [27] | 448.4 | 55.1 | 77.0 | 38.7 | 0.13 |
| UniSim [58] | 115.1 | 9.7 | 33.5 | 24.3 | 0.19 |
| Ours | 72.9 | 3.8 | 22.9 | 14.0 | 0.07 |

Table 7: Evaluation of LiDAR NVS on Waymo Dynamic NVS.

| Method | MAE ↓ | MedAE ↓ | CD ↓ | MedAE Dyn ↓ |
|---|---|---|---|---|
| LiDARsim | 333.3 | 25.3 | 67.8 | 13.0 |
| Ours | 81.8 | 8.6 | 26.4 | 9.3 |

Table 8: Results of future frame simulation.

B.3 Runtime analysis

DyNFL training takes approximately 7 hours on average on a single RTX 3090 GPU with fp16 precision and 16 hours with fp32 precision. Inference takes 2.2 seconds per LiDAR scan with fp16 precision and 7 seconds with fp32 precision. The envisioned offline use for counterfactual re-simulation prioritizes realism over efficiency. Runtime can potentially be improved for high-throughput applications by reducing rendering complexity.

B.4 More qualitative results

In this section, we provide more qualitative results. Fig. 12 shows the 4 scenes from the Waymo Dynamic dataset, and Fig. 13 shows additional scene editing results. Please check the supplementary videos for more qualitative results.

Figure 12: Visualization of the 4 selected scenes from the Waymo Dynamic dataset. For each scene, we aggregate 50 frames. In the first row, points are color-coded by intensity value (color scale from 0 to 0.25). In the second row, dynamic vehicles are painted yellow.

Figure 13: Visualization of scene editing capabilities. We showcase 3 kinds of scene editing: vehicle removal (left), trajectory manipulation (middle), and vehicle insertion (right). The first row shows the original scenes, the second row the scenes after editing. All points are color-coded by intensity value (color scale from 0 to 0.25).

C SDF-based LiDAR volume rendering

In this section, we first introduce the preliminaries of NeRF [28], following the terminology of [43]. We then provide the full derivation of SDF-based volume rendering for active sensors.

C.1 Preliminary

Density.

For a ray emitted from the origin $\mathbf{o} \in \mathbb{R}^3$ in direction $\mathbf{d} \in \mathbb{R}^3$, the density $\sigma_\zeta$ at range $\zeta$ indicates the likelihood of light interacting with particles at the point $\mathbf{r}_\zeta = \mathbf{o} + \zeta\mathbf{d}$. This interaction can include absorption or scattering of light. In passive sensing, the density $\sigma$ is a critical factor in determining how much light from the scene's illumination is likely to reach the sensor after passing through the medium.

Transmittance

quantifies the likelihood of light traveling through a given portion of the medium without being scattered or absorbed. Density is closely tied to the transmittance function $\mathcal{T}(\zeta)$, which indicates the probability of a ray traveling over the interval $[0, \zeta)$ without hitting any particles. The probability $\mathcal{T}(\zeta + d\zeta)$ of not hitting a particle when taking a differential step $d\zeta$ is then equal to $\mathcal{T}(\zeta)$, the likelihood of the ray reaching $\zeta$, times $(1 - d\zeta \cdot \sigma(\zeta))$, the probability of not hitting anything during the step:

\begin{align}
\mathcal{T}(\zeta + d\zeta) &= \mathcal{T}(\zeta) \cdot (1 - d\zeta \cdot \sigma(\zeta)) \tag{12} \\
\frac{\mathcal{T}(\zeta + d\zeta) - \mathcal{T}(\zeta)}{d\zeta} &\equiv \mathcal{T}'(\zeta) = -\mathcal{T}(\zeta) \cdot \sigma(\zeta)\;. \tag{13}
\end{align}

We solve the differential equation as follows:

\begin{align}
\mathcal{T}'(\zeta) &= -\mathcal{T}(\zeta) \cdot \sigma(\zeta) \tag{14} \\
\frac{\mathcal{T}'(\zeta)}{\mathcal{T}(\zeta)} &= -\sigma(\zeta) \tag{15} \\
\int_a^b \frac{\mathcal{T}'(\zeta)}{\mathcal{T}(\zeta)}\, d\zeta &= -\int_a^b \sigma(\zeta)\, d\zeta \tag{16} \\
\left. \log \mathcal{T}(\zeta) \right|_a^b &= -\int_a^b \sigma(\zeta)\, d\zeta \tag{17} \\
\mathcal{T}(a \rightarrow b) \equiv \frac{\mathcal{T}(b)}{\mathcal{T}(a)} &= \exp\left(-\int_a^b \sigma(\zeta)\, d\zeta\right)\;. \tag{18}
\end{align}

Hence, for a ray segment between $\zeta_0$ and $\zeta$, transmittance is given by:

\begin{equation}
\mathcal{T}_{\zeta_0 \rightarrow \zeta} \equiv \frac{\mathcal{T}_{\zeta}}{\mathcal{T}_{\zeta_0}} = \exp\left(-\int_{\zeta_0}^{\zeta} \sigma_t\, dt\right)\;, \tag{19}
\end{equation}

which leads to the following factorization of the transmittance:

\begin{equation}
\mathcal{T}_{\zeta} = \mathcal{T}_{0 \rightarrow \zeta_0} \cdot \mathcal{T}_{\zeta_0 \rightarrow \zeta}\;. \tag{20}
\end{equation}
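
The transmittance relations in Eqs. 12, 18 and 20 can be checked numerically. The sketch below is only a sanity check under assumptions: the density `sigma` is an arbitrary smooth toy function, the integral is approximated with the trapezoidal rule, and the differential relation of Eq. 12 is applied repeatedly with a small but finite step.

```python
import numpy as np

def transmittance(sigma_fn, a, b, n=20001):
    """T(a -> b) = exp(-integral_a^b sigma(z) dz), via the trapezoidal rule."""
    z = np.linspace(a, b, n)
    s = sigma_fn(z)
    integral = np.sum(0.5 * (s[:-1] + s[1:]) * np.diff(z))
    return float(np.exp(-integral))

def sigma(z):
    """Arbitrary smooth, non-negative toy density."""
    return 0.3 + 0.2 * np.sin(z) ** 2

# Eq. 18 over the full segment, and the factorization of Eq. 20 at zeta_0 = 2.
T_full = transmittance(sigma, 0.0, 5.0)
T_split = transmittance(sigma, 0.0, 2.0) * transmittance(sigma, 2.0, 5.0)

# Repeated application of Eq. 12 with a finite step as a cross-check.
z = np.linspace(0.0, 5.0, 200001)
dz = z[1] - z[0]
T_euler = float(np.prod(1.0 - dz * sigma(z[:-1])))
```

All three values agree up to discretization error, with the step-wise product converging to the exponential form as the step size shrinks.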

Opacity

is the complement of transmittance and represents the fraction of light that is either absorbed or scattered in the medium. In a homogeneous medium with constant density $\sigma$, the opacity of a segment $[\zeta_j, \zeta_{j+1}]$ of length $\Delta\zeta$ is given by $\alpha_{\zeta_j} = 1 - \exp(-\sigma \cdot \Delta\zeta)$.
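
As a small worked example of the opacity formula, the sketch below computes per-segment opacities for a few homogeneous segments and checks that the product of their complements reproduces the transmittance through all of them. The densities and segment lengths are arbitrary toy values.

```python
import numpy as np

def segment_opacity(sigma, delta):
    """Opacity of a homogeneous segment: alpha = 1 - exp(-sigma * delta)."""
    return 1.0 - np.exp(-sigma * delta)

sigmas = np.array([0.0, 0.5, 2.0, 50.0])   # toy densities per segment
deltas = np.array([1.0, 1.0, 0.5, 0.2])    # segment lengths
alphas = segment_opacity(sigmas, deltas)

# Transmittance through all segments is the product of (1 - alpha_j),
# which equals exp(-sum_j sigma_j * delta_j) by the factorization of T.
T_total = float(np.prod(1.0 - alphas))
T_expected = float(np.exp(-np.sum(sigmas * deltas)))
```

An empty segment has zero opacity, while a dense segment approaches an opacity of one even when it is short.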

C.2 SDF-based volume rendering for active sensor

NeuS [50] derives the opaque density from the SDF as:

\begin{equation}
\begin{split}
\sigma_{\zeta_i} &= \max\left(\frac{-\frac{d\Phi_s}{d\zeta_i}(f(\zeta_i))}{\Phi_s(f(\zeta_i))}, 0\right) \\
&= \max\left(\frac{-(\nabla f(\zeta_i) \cdot \mathbf{v})\, \phi_s(f(\zeta_i))}{\Phi_s(f(\zeta_i))}, 0\right)\;,
\end{split} \tag{21}
\end{equation}

where $\Phi_s$ is the sigmoid function, $\phi_s$ is its derivative, $\mathbf{v}$ denotes the ray direction (written $\mathbf{d}$ above), and $f$ is the SDF function that maps a range $\zeta$ to the SDF value at the point $\mathbf{o} + \zeta\mathbf{d}$. Note that the antiderivative of the first argument of the $\max$ is

\begin{equation}
\int \frac{-(\nabla f(\zeta) \cdot \mathbf{v})\, \phi_s(f(\zeta))}{\Phi_s(f(\zeta))}\, d\zeta = -\ln(\Phi_s(f(\zeta))) + C\;. \tag{22}
\end{equation}
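
Eqs. 21 and 22 can be checked numerically on a toy example. The sketch below assumes a planar surface hit at a fixed incidence angle, so that the SDF along the ray is linear in the range, and a sigmoid with sharpness `S`. All constants and function names are hypothetical choices made for this check.

```python
import numpy as np

S = 20.0          # hypothetical sharpness of the sigmoid
COS_THETA = 0.8   # hypothetical cosine of the incidence angle
SURFACE = 10.0    # hypothetical range at which the ray crosses the surface

def sdf(z):
    """SDF along the ray for a planar surface: positive in front, negative behind."""
    return (SURFACE - z) * COS_THETA

def Phi(x):
    """Sigmoid with sharpness S."""
    return 1.0 / (1.0 + np.exp(-S * x))

def phi(x):
    """Derivative of Phi with respect to its argument."""
    p = Phi(x)
    return S * p * (1.0 - p)

def opaque_density(z):
    """Eq. 21, with grad f . v = -COS_THETA for the planar toy SDF."""
    grad_dot_v = -COS_THETA
    return np.maximum(-grad_dot_v * phi(sdf(z)) / Phi(sdf(z)), 0.0)

# Integrate the density across the surface and compare with Eq. 22.
a, b = 9.0, 11.0
z = np.linspace(a, b, 200001)
s = opaque_density(z)
numeric = float(np.sum(0.5 * (s[:-1] + s[1:]) * np.diff(z)))
closed_form = float(-np.log(Phi(sdf(b))) + np.log(Phi(sdf(a))))
```

Because the ray enters the surface here, the clamped term is positive everywhere on the segment and the numerical integral matches the closed-form antiderivative.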

We extend density-based volume rendering for active sensors to an SDF-based formulation. Starting from the passive SDF-based volume rendering [50], we substitute the density $\tilde{\sigma}$ with the opaque density in Eq. 21 and evaluate the radiant power integrated over a ray segment $[a, b]$ with constant reflectivity $\rho_a$.

Consider the case where $-(\nabla f(\zeta) \cdot \mathbf{v}) > 0$ within the ray segment $[a, b]$. We have

\begin{align}
P(a \rightarrow b) &= \int_a^b \mathcal{T}^2(a \rightarrow t) \cdot \tilde{\sigma}_t \cdot \rho(t)\, dt \tag{23} \\
&= \rho_a \int_a^b \mathcal{T}^2(a \rightarrow t) \cdot \tilde{\sigma}_t\, dt \tag{24} \\
&= \rho_a \int_a^b \exp\left(-\int_a^t 2\tilde{\sigma}(u)\, du\right) \cdot \tilde{\sigma}_t\, dt \tag{25} \\
&= \rho_a \int_a^b \exp\left(-2\int_a^t \tilde{\sigma}(u)\, du\right) \cdot \tilde{\sigma}_t\, dt \tag{26} \\
&= \rho_a \int_a^b \exp\left(\left. 2\ln(\Phi_s(f(u))) \right|_a^t\right) \cdot \tilde{\sigma}_t\, dt\;. \tag{27}
\end{align}

Let $\Omega_x = \Phi_s(f(x))$, then

\begin{align}
P(a \rightarrow b) &= \rho_a \int_a^b \exp\left(2\ln(\Omega_t) - 2\ln(\Omega_a)\right) \cdot \tilde{\sigma}_t\, dt \tag{28} \\
&= \rho_a \int_a^b \frac{{\Omega_t}^2}{{\Omega_a}^2} \cdot \tilde{\sigma}_t\, dt \tag{29} \\
&= \frac{\rho_a}{{\Omega_a}^2} \int_a^b {\Omega_t}^2 \cdot \tilde{\sigma}_t\, dt \tag{30} \\
&= \frac{\rho_a}{{\Omega_a}^2} \int_a^b -\frac{d\Phi_s}{dt}(f(t)) \cdot \Phi_s(f(t))\, dt \tag{31} \\
&= \frac{\rho_a}{{\Omega_a}^2} \left(\left. -\frac{1}{2}{\Phi_s(f(t))}^2 \right|_a^b\right) \tag{32} \\
&= \frac{\rho_a}{{\Omega_a}^2} \left(\frac{1}{2}{\Phi_s(f(a))}^2 - \frac{1}{2}{\Phi_s(f(b))}^2\right) \tag{33} \\
&= \frac{{\Phi_s(f(a))}^2 - {\Phi_s(f(b))}^2}{2{\Phi_s(f(a))}^2} \cdot \rho_a\;. \tag{34}
\end{align}

Consider the case where (f(ζ)𝐯)<0𝑓𝜁𝐯0-(\nabla f(\zeta)\cdot\mathbf{v})<0- ( ∇ italic_f ( italic_ζ ) ⋅ bold_v ) < 0 within the ray segment [a,b]𝑎𝑏[a,b][ italic_a , italic_b ]. Then

P(a \rightarrow b) = \int_a^b \mathcal{T}^2(a \rightarrow t) \cdot \tilde{\sigma}_t \cdot \rho(t) \, dt \quad (35)
= \int_a^b \mathcal{T}^2(a \rightarrow t) \cdot 0 \cdot \rho(t) \, dt \quad (36)
= 0 \,. \quad (37)

Hence we conclude

P(a \rightarrow b) = \max\left( \frac{\Phi_s(f(a))^2 - \Phi_s(f(b))^2}{2 \Phi_s(f(a))^2}, 0 \right) \cdot \rho_a \,. \quad (38)

Volume rendering of piecewise constant data.

Combining the above, we can evaluate the volume rendering integral through a medium with piecewise constant reflectivity:

P(\zeta_{N+1}) = \sum_{n=1}^{N} \int_{\zeta_n}^{\zeta_{n+1}} \mathcal{T}^2(\zeta) \cdot \tilde{\sigma}_{\zeta} \cdot \rho_{\zeta_n} \, d\zeta \quad (39)
= \sum_{n=1}^{N} \int_{\zeta_n}^{\zeta_{n+1}} \mathcal{T}^2_{\zeta_n} \cdot \mathcal{T}^2(\zeta_n \rightarrow \zeta) \cdot \tilde{\sigma}_{\zeta} \cdot \rho_{\zeta_n} \, d\zeta
= \sum_{n=1}^{N} \mathcal{T}^2_{\zeta_n} \int_{\zeta_n}^{\zeta_{n+1}} \mathcal{T}^2(\zeta_n \rightarrow \zeta) \cdot \tilde{\sigma}_{\zeta} \cdot \rho_{\zeta_n} \, d\zeta \quad (40)
= \sum_{n=1}^{N} \mathcal{T}^2_{\zeta_n} \, P(\zeta_n \rightarrow \zeta_{n+1}) \quad (41)
= \sum_{n=1}^{N} \mathcal{T}^2_{\zeta_n} \cdot \tilde{\alpha}_{\zeta_n} \cdot \rho_{\zeta_n} \,, \quad (42)

where

\tilde{\alpha}_{\zeta_n} \equiv \max\left( \frac{\Phi_s(f(\zeta_n))^2 - \Phi_s(f(\zeta_{n+1}))^2}{2 \Phi_s(f(\zeta_n))^2}, 0 \right) . \quad (43)
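As an illustration of Eq. (43), the following minimal sketch evaluates the per-segment opacity from signed distance samples along one ray. It assumes that $\Phi_s$ is the logistic sigmoid with sharpness $s$, which is not restated in this section. The names `sigmoid`, `alpha_tilde`, `sdf` and the default `s=10.0` are hypothetical and chosen only for this example.

```python
import numpy as np

def sigmoid(x, s):
    # Assumed form of Phi_s: logistic sigmoid with sharpness s (an assumption).
    return 1.0 / (1.0 + np.exp(-s * x))

def alpha_tilde(sdf, s=10.0):
    """Per-segment opacity of Eq. (43) from SDF samples f(zeta_1), ..., f(zeta_{N+1})."""
    phi = sigmoid(np.asarray(sdf, dtype=float), s)
    phi_a, phi_b = phi[:-1], phi[1:]  # values at the start and end of each segment
    # max(., 0) zeroes out segments where Phi_s increases along the ray.
    return np.maximum((phi_a**2 - phi_b**2) / (2.0 * phi_a**2), 0.0)
```

For N+1 samples the function returns N values. Each value lies in [0, 0.5], and it is zero wherever the signed distance increases along the ray.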

The discrete accumulated transmittance $\mathcal{T}$ can be calculated as follows:

Consider the case where $-(\nabla f(\zeta) \cdot \mathbf{v}) > 0$ in each segment $[\zeta_i, \zeta_{i+1}]$:

\mathcal{T}_{\zeta_n} = \prod_{i=1}^{n-1} \exp\left( -\int_{\zeta_i}^{\zeta_{i+1}} \tilde{\sigma}_{\zeta} \, d\zeta \right) \quad (44)
= \prod_{i=1}^{n-1} \frac{\Phi_s(f(\zeta_{i+1}))}{\Phi_s(f(\zeta_i))} \quad (45)
\mathcal{T}^2_{\zeta_n} = \prod_{i=1}^{n-1} \frac{\Phi_s(f(\zeta_{i+1}))^2}{\Phi_s(f(\zeta_i))^2} \quad (46)
= \prod_{i=1}^{n-1} (1 - 2 \tilde{\alpha}_{\zeta_i}) \,. \quad (47)
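The last step of this case follows directly from the definition of $\tilde{\alpha}$ in Eq. (43). As a short worked check, for a single segment with endpoints $a$ and $b$ and $\Phi_s(f(a)) \geq \Phi_s(f(b))$ (the generic endpoint names $a$, $b$ are used here only for brevity):

```latex
1 - 2\tilde{\alpha}
  = 1 - \frac{\Phi_s(f(a))^2 - \Phi_s(f(b))^2}{\Phi_s(f(a))^2}
  = \frac{\Phi_s(f(b))^2}{\Phi_s(f(a))^2}
```

Each factor of the squared transmittance product is therefore exactly one factor of the form $1 - 2\tilde{\alpha}$.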

Consider the case where $-(\nabla f(\zeta) \cdot \mathbf{v}) < 0$ in each segment $[\zeta_i, \zeta_{i+1}]$, where $\tilde{\sigma}_{\zeta} = 0$ and hence $\tilde{\alpha}_{\zeta_i} = 0$:

\mathcal{T}_{\zeta_n} = \prod_{i=1}^{n-1} \exp\left( -\int_{\zeta_i}^{\zeta_{i+1}} \tilde{\sigma}_{\zeta} \, d\zeta \right) = \prod_{i=1}^{n-1} 1 \quad (48)
\mathcal{T}^2_{\zeta_n} = \prod_{i=1}^{n-1} 1^2 = \prod_{i=1}^{n-1} (1 - 2 \tilde{\alpha}_{\zeta_i}) \,. \quad (49)

In conclusion, the radiant power can be reformulated as:

P(\zeta_{N+1}) = \sum_{n=1}^{N} \mathcal{T}^2_{\zeta_n} \cdot \tilde{\alpha}_{\zeta_n} \cdot \rho_{\zeta_n} \,, \quad (50)

where $\mathcal{T}^2_{\zeta_n} = \prod_{i=1}^{n-1} (1 - 2 \tilde{\alpha}_{\zeta_i})$.
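The discrete sum in Eq. (50) can be sketched with a cumulative product. This is an illustrative sketch, not the authors' implementation: the function name `radiant_power` and its arguments `alpha` (per-segment $\tilde{\alpha}_{\zeta_n}$) and `rho` (per-segment reflectivity $\rho_{\zeta_n}$) are hypothetical.

```python
import numpy as np

def radiant_power(alpha, rho):
    """Discrete radiant power of Eq. (50) for one ray."""
    alpha = np.asarray(alpha, dtype=float)
    rho = np.asarray(rho, dtype=float)
    # T^2 at segment n is the product of (1 - 2*alpha_i) over i < n;
    # the leading 1.0 is the empty product for the first segment.
    t2 = np.concatenate(([1.0], np.cumprod(1.0 - 2.0 * alpha[:-1])))
    return float(np.sum(t2 * alpha * rho))
```

A segment with $\tilde{\alpha} = 0.5$ drives the squared transmittance of all later segments to zero, so samples behind a fully opaque surface do not contribute.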

Depth volume rendering of piecewise constant data.

Note that $\tilde{\alpha}_{\zeta_n} \in [0, 0.5]$, $\mathcal{T}^2_{\zeta_n} \in [0, 1]$, and $\sum_{n=1}^{N} \mathcal{T}^2_{\zeta_n} \cdot \tilde{\alpha}_{\zeta_n} = 0.5$. For depth volume rendering, we therefore scale the weights by 2 so that they sum to one:

\zeta = \sum_{n=1}^{N} 2 \cdot \mathcal{T}^2_{\zeta_n} \cdot \tilde{\alpha}_{\zeta_n} \cdot \zeta_n = \sum_{n=1}^{N} w_n \cdot \zeta_n \,, \quad (51)

where $w_n = 2 \tilde{\alpha}_{\zeta_n} \cdot \prod_{i=1}^{n-1} (1 - 2 \tilde{\alpha}_{\zeta_i})$.
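A minimal sketch of the depth rendering in Eq. (51), assuming the per-segment opacities $\tilde{\alpha}_{\zeta_n}$ have already been computed. The function name `render_depth` and its arguments are hypothetical and serve only to illustrate the weighting.

```python
import numpy as np

def render_depth(alpha, zeta):
    """Expected depth of Eq. (51) and the weights w_n for one ray."""
    alpha = np.asarray(alpha, dtype=float)
    zeta = np.asarray(zeta, dtype=float)
    # Squared transmittance before each segment (empty product = 1 for n = 1).
    t2 = np.concatenate(([1.0], np.cumprod(1.0 - 2.0 * alpha[:-1])))
    # The factor 2 rescales the weights, whose unscaled sum is 0.5 per the text.
    w = 2.0 * alpha * t2
    return float(np.sum(w * zeta)), w

depth, weights = render_depth([0.0, 0.25, 0.5], [1.0, 2.0, 3.0])
```

In this toy call the first segment is empty space, so its weight is zero, and the remaining two segments share the full weight of one between them.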