License: CC BY 4.0
arXiv:2312.09093v3 [cs.CV] 24 Jan 2024

Aleth-NeRF: Illumination Adaptive NeRF with Concealing Field Assumption

Ziteng Cui1,2, Lin Gu3,1, Xiao Sun2, Xianzheng Ma4, Yu Qiao2, Tatsuya Harada1,3 Corresponding Author
Abstract

The standard Neural Radiance Fields (NeRF) paradigm employs a viewer-centered methodology, entangling illumination and material reflectance into emission from 3D points alone. This simplified rendering approach makes it difficult to accurately model images captured under adverse lighting conditions, such as low light or over-exposure. Motivated by the ancient Greek emission theory, which posits that visual perception results from rays emanating from the eyes, we slightly refine the conventional NeRF framework so that it can be trained under challenging lighting conditions and generate normal-light novel views without supervision. We introduce the concept of a “Concealing Field,” which assigns transmittance values to the surrounding air to account for illumination effects. In dark scenarios, we assume that object emissions maintain a standard lighting level but are attenuated as they traverse the air during the rendering process. The Concealing Field thus compels NeRF to learn reasonable density and colour estimations for objects even in dimly lit situations. Similarly, the Concealing Field can mitigate over-exposed emissions during the rendering stage. Furthermore, we present a comprehensive multi-view dataset captured under challenging illumination conditions for evaluation. Our code and proposed dataset are available at https://github.com/cuiziteng/Aleth-NeRF.

Figure 1: Utilizing the Concealing Field assumption, Aleth-NeRF is capable of processing both low-light & over-exposed multi-view images as inputs and generating novel views with natural illumination.

Introduction

Neural Radiance Field (NeRF) (Mildenhall et al. 2020) has been demonstrated to effectively understand 3D scenes from 2D images and generate novel views. However, NeRF and its follow-up variants assume that the captured images are taken under normal light, and they often fail under low-light (Mildenhall et al. 2021) or over-exposure scenarios. This is because vanilla NeRF is viewer-centered: it models the amount of light emitted from a location to the viewer without disentangling illumination and material (Fig. 1(a)) (Lyu et al. 2022). As a result, NeRF interprets a dark scene as insufficient radiation from the 3D object particles, which corrupts the estimation of the object’s material and geometry. In practical applications, images are often taken under challenging lighting conditions. Therefore, this paper aims to slightly modify vanilla NeRF for under- and over-exposed scenes. As shown in Fig. 1(c, d), the proposed Aleth-NeRF renders normal-light novel views despite the severely degraded input images.

The rendering process in NeRF (Fig. 1(b)) is similar to the viewer-centered emission theory held by the ancient Greeks. Emission theory ignores the incident light and instead postulates that visual rays emitted from the eye travel in straight lines and interact with objects to form visual perception. Therefore, the darkness of an entity is solely caused by the particles between the object and the eye. In other words, all objects are visible by default unless concealed. Inspired by this worldview, we adopt a simple but NeRF-friendly assumption: it is the concealing fields (gray particles in Fig. 1(c)) along the viewing direction that attenuate the emission and make the viewer see a low-light scene. This is in contrast to the standard NeRF setting, where the density of air (white particles in Fig. 1(a)) is usually zero. Introducing the Concealing Field, which assigns transmittance values to the air particles, allows NeRF to accurately estimate the colour and density of objects (yellow particles in Fig. 1(c)) in low-light conditions. Therefore, when removing the concealing fields, or Aletheia (αληθεια, normally translated as “unconcealedness”, “disclosure” or “revealing” (Heidegger, Stambaugh, and Schmidt 2010)), we are able to render novel views under normal light. Conversely, for an over-exposed scene, deliberately adding the concealing fields in the rendering stage can correct the exposure.

Our proposed method, Aleth-NeRF, takes low-light or over-exposed images as inputs to train the model and learns the volumetric representation jointly with the concealing fields. As shown in Fig. 1(b), we jointly train NeRF with concealing fields between the object and the viewer. For the low-light scenario, we remove the concealing fields during the rendering stage (Fig. 1(c)). When dealing with over-exposed images, Aleth-NeRF adds concealing fields to suppress overly bright regions (Fig. 1(d)). Our contributions are summarized as follows:

  • We propose Aleth-NeRF, which trains under low-light and over-exposure conditions and generates novel views under normal light. Inspired by ancient Greek philosophy, we naturally extend the transmittance function in vanilla NeRF by modelling concealing fields between the objects and the viewer to interpret lightness degradation.

  • We contribute a challenging-illumination multi-view dataset with paired sRGB low-light, normal-light, and over-exposure images; the dataset will also be made public.

  • We compare with various image enhancement and exposure correction methods as well as a previous NeRF-based method (Mildenhall et al. 2021). Extensive experiments show that Aleth-NeRF achieves satisfactory enhancement quality and multi-view consistency.

Related Works

Novel View synthesis with NeRF

NeRF (Mildenhall et al. 2020) was proposed for novel view synthesis from a collection of posed input images. The unique advantage of NeRF models lies in preserving 3D geometric consistency thanks to their physical volume rendering scheme. In addition, several methods have been proposed to speed up and improve NeRF training (Barron et al. 2021; Sara Fridovich-Keil and Alex Yu et al. 2022; Lindell, Martel, and Wetzstein 2021; Yu et al. 2021; Jain, Tancik, and Abbeel 2021; Deng et al. 2022; Müller et al. 2022).

Many later works focus on improving NeRF’s performance under various degradation conditions, such as blur (Ma et al. 2021), noise (Pearl, Treibitz, and Korman 2022), reflections (Guo et al. 2022), glossy surfaces (Verbin et al. 2022), and underwater scenes (Levy et al. 2023), or use NeRF to handle super-resolution (Wang et al. 2021a; Bahat et al. 2022) and HDR reconstruction (Xin et al. 2021; Jun-Seong et al. 2022) in 3D space. Another line of research extends NeRF for lightness editing in 3D space. Some works, like NeRF-W (Martin-Brualla et al. 2021), focus on rendering NeRF from uncontrolled in-the-wild images, while relighting works (Srinivasan et al. 2021; Rudnev et al. 2022; Zhang et al. 2021b) rely on known illumination conditions and introduce additional physical elements (e.g. normals, light, albedo), along with complex parametric modeling of these elements. However, these methods are not specifically designed for low-light and over-exposure conditions.

Among these, RAW-NeRF (Mildenhall et al. 2021) is closest to our work. It proposes to render NeRF in the HDR RAW domain and then post-process the rendered scene with an image signal processor (ISP). RAW-NeRF has shown a preliminary ability to enhance scene light but requires HDR RAW data for training, which makes it hard to generalize to commonly used sRGB images. In contrast, our Aleth-NeRF can be trained directly on under- and over-exposed sRGB images and injects unsupervised enhancement into 3D space through the concealing fields.

Enhancement in challenging light conditions

Challenging lighting can arise from multiple sources, encompassing natural lighting variations (such as low-light situations and overly bright scenes) as well as human-induced factors (such as incorrect camera exposure settings). To tackle these challenging lighting conditions, numerous techniques for image enhancement and exposure correction have been proposed.

Image Enhancement & Exposure Correction:

Image enhancement methods aim to enhance images with poor illumination. Traditional methods usually rely on Retinex theory (Land 1986; Guo, Li, and Ling 2017) or histogram equalization (Gonzalez and Woods 2006). Currently, deep neural network (DNN) based methods have become the mainstream solution, and a series of CNN- and Transformer-based methods have been developed (Wei et al. 2018; Moran et al. 2020; Wang et al. 2021b, 2022; Jiang et al. 2021; Guo et al. 2020; Jin, Yang, and Tan 2022; Ma et al. 2022; Yang et al. 2023). Meanwhile, several exposure correction methods have been proposed to handle both under- and over-exposure conditions (Afifi et al. 2021; Nsampi, Hu, and Wang 2021; Cui et al. 2022; Huang et al. 2023), aiming to correct under-exposed and over-exposed images to normal-light conditions. However, image enhancement and exposure correction methods are almost all built on 2D image-space operations, which fail to exploit the 3D geometry of the scene and cannot deal with multi-view inputs.

Video Enhancement & Burst Enhancement:

Beyond the above techniques that focus on single images, video enhancement methods have been proposed to optimize the temporal consistency between adjacent frames, ensuring stability when processing different frames. These methods employ various approaches such as 3D convolution (Lv et al. 2018), optical flow (Zhang et al. 2021a), and event guidance (Liu et al. 2023). Burst enhancement also plays a crucial role in modern computational photography (Delbracio et al. 2021; Hasinoff et al. 2016), where multiple frames are captured during exposure and processed using an “align-merge-enhance” approach to produce a single output frame. In recent work, deep neural networks have been employed to replace the traditional hand-crafted algorithms in these methods (Godard, Matzen, and Uyttendaele 2018; Mildenhall et al. 2018; Dudhane et al. 2022).

However, existing image, video, and burst enhancement methods primarily focus on enhancing images in their original views rather than generating coherent 3D scenes with novel views. For comparison, we therefore have to combine these enhancement methods with NeRF (see Table 2 and Table 3). In contrast, Aleth-NeRF is capable of directly synthesizing novel views under challenging lighting conditions while achieving state-of-the-art enhancement quality.

Methods

Figure 2: Trained on adverse lighting condition images $C^{adv}$, Aleth-NeRF performs unsupervised lightness correction by (a) removing concealing fields in low-light conditions and (b) adding concealing fields in over-exposure conditions.

Neural Radiance Field Revisited

A radiance field is defined as the density $\sigma$ and RGB colour value $c$ of a 3D location $\mathbf{x}$ under a 2D viewing direction $\mathbf{d}$. The density $\sigma$, on the one hand, represents the radiation capacity of the particle itself at $\mathbf{x}$, and on the other hand, controls how much radiance is absorbed when other light passes through $\mathbf{x}$.

When rendering an image, a camera ray $\mathbf{r}(t)=\mathbf{o}+t\cdot\mathbf{d}$ ($\mathbf{r}\in\mathbf{R}$) is cast from the given camera position $\mathbf{o}$ towards direction $\mathbf{d}$. All the radiance is accumulated along the ray to render its corresponding pixel value $\mathbf{C}(\mathbf{r})$. Formally,

$$\mathbf{C}(\mathbf{r})=\int_{t_n}^{t_f}T(\mathbf{r}(t))\,\sigma(\mathbf{r}(t))\,c(\mathbf{r}(t),\mathbf{d})\,dt, \tag{1}$$

where

$$T(\mathbf{r}(t))=\exp\left(-\int_{t_n}^{t}\sigma(\mathbf{r}(s))\,ds\right), \tag{2}$$

is known as the accumulated transmittance, which denotes the radiance decay rate of the particle at $\mathbf{r}(t)$ when it is occluded by particles closer to the camera (at $\mathbf{r}(s)$, $s<t$). The integrals are computed by a discrete approximation over sampled 3D points along the ray $\mathbf{r}$ (see Fig. 1(b)). The discrete forms of Eq. 1 and Eq. 2 are as follows:

$$\mathbf{C}(\mathbf{r})=\sum_{i=1}^{N}T(\mathbf{r}(i))\,\bigl(1-\exp(-\sigma(\mathbf{r}(i))\cdot\delta)\bigr)\cdot c(\mathbf{r}(i),\mathbf{d}), \tag{3}$$

$$T(\mathbf{r}(i))=\exp\left(-\sum_{j=1}^{i-1}\sigma(\mathbf{r}(j))\cdot\delta\right). \tag{4}$$

The value of $\sigma(\mathbf{r}(i))$ reflects the object occupancy at $\mathbf{r}(i)$. $\delta$ is the constant distance between adjacent sample points under uniform sampling.
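The discrete rendering in Eq. 3 and Eq. 4 can be illustrated with a minimal pure-Python sketch for a single ray. The function names and the toy sample values are illustrative assumptions, not taken from the released code, and uniform sampling with spacing `delta` is assumed.

```python
import math

def transmittance(sigmas, delta):
    """Eq. 4: T_i = exp(-sum_{j<i} sigma_j * delta) for each sample i."""
    values, optical_depth = [], 0.0
    for sigma in sigmas:
        values.append(math.exp(-optical_depth))  # depends on samples j < i only
        optical_depth += sigma * delta
    return values

def render_ray(sigmas, colours, delta):
    """Eq. 3: accumulate per-sample colours into one RGB pixel value."""
    pixel = [0.0, 0.0, 0.0]
    for t_i, sigma, colour in zip(transmittance(sigmas, delta), sigmas, colours):
        weight = t_i * (1.0 - math.exp(-sigma * delta))
        pixel = [p + weight * ch for p, ch in zip(pixel, colour)]
    return pixel

# Toy ray (hypothetical values): two empty-air samples, then a dense red surface.
sigmas = [0.0, 0.0, 1000.0]
colours = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
pixel = render_ray(sigmas, colours, delta=0.1)
```

In this toy ray the air samples have zero density, so the transmittance stays at 1 until the surface and the pixel takes the surface colour almost entirely.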

Figure 3: Along the camera ray $\mathbf{r}$ ($z$ axis), concealing fields and density $\sigma$ exhibit a negative correlation; $(x,y)$ denotes the training images’ width and height.
Figure 4: Taking a low-light “bike” scene as an example, we show an ablation analysis of the effectiveness of different loss functions.

For the network structure, NeRF learns two multilayer perceptron (MLP) networks, a density MLP $F_{\sigma}$ and a colour MLP $F_{c}$, to map the 3D location $\mathbf{r}(i)$ and 2D viewing direction $\mathbf{d}$ to its density $\sigma$ and colour $c$. Specifically:

$$F_{\sigma}(\mathbf{r}(i))\rightarrow\sigma(\mathbf{r}(i)),\mathbf{h} \tag{5}$$

$$F_{c}(\mathbf{h},\mathbf{d})\rightarrow c(\mathbf{r}(i),\mathbf{d}), \tag{6}$$

where $\mathbf{h}$ is a hidden feature vector that is sent to the colour MLP $F_{c}$. The colour $c(\mathbf{r}(i),\mathbf{d})$ and density $\sigma(\mathbf{r}(i))$ are further activated by Sigmoid and ReLU functions to restrict their ranges to $[0,1)$ and $[0,\infty)$ respectively. Given ground truth images $\mathbf{C}$, NeRF is optimised by minimising the MSE loss between predicted images $\hat{\mathbf{C}}$ and ground truth images $\mathbf{C}$:

$$\mathcal{L}_{mse}=\sum_{\mathbf{r}}^{\mathbf{R}}||\hat{\mathbf{C}}(\mathbf{r})-\mathbf{C}(\mathbf{r})||^{2}. \tag{7}$$

For further details, such as positional encoding and hierarchical volume sampling, we refer readers to the NeRF paper (Mildenhall et al. 2020).

Aleth-NeRF with Concealing Field

Given adverse lighting condition images $\mathbf{C}^{adv}(\mathbf{r})$ taken in low-light or over-exposure conditions, our goal is to generate novel views $\mathbf{C}^{nor}(\mathbf{r})$ in normal-light conditions. The key idea is that Aleth-NeRF assumes $\mathbf{C}^{adv}$ and $\mathbf{C}^{nor}$ are rendered with the same underlying density field $\sigma$ (yellow particles in Fig. 1) along each camera ray $\mathbf{r}$, but with or without the proposed concealing fields (gray particles in Fig. 1).

We model two types of concealing fields to reduce light transport in the volume rendering stage: a local Concealing Field, denoted as $\Omega$, at the voxel level, and a global Concealing Field, denoted as $\Theta_{G}$, at the scene level. Multiplying the two fields $\Omega$ and $\Theta_{G}$ gives the final concealing field value, which reduces the accumulated transmittance. Fig. 2 shows an overview of the Aleth-NeRF training strategy. Fig. 3 shows the distribution of the concealing fields $\Omega\cdot\Theta_{G}$.

Local Concealing Field

The local Concealing Field, denoted as $\Omega(\mathbf{r}(i))$, defines an extra light-concealing capacity of a particle at 3D location $\mathbf{r}(i)$. As depicted in Fig. 2, $\Omega$ is individually learned for each 3D position and is generated using the density MLP $F_{\sigma}$. To create the local concealing field $\Omega$, we introduce an additional large-kernel convolution layer (with a size of 7) built upon $F_{\sigma}$. This convolution establishes spatial relationships between pixels and enriches the Concealing Field with predominantly light-related information rather than structural information (Zamir et al. 2020). The larger kernel also effectively suppresses noise and contributes to smoother rendering outcomes.

$$\mathrm{conv}_{size:7}(F_{\sigma}(\mathbf{r}(i)))\rightarrow\Omega(\mathbf{r}(i)) \tag{8}$$

Global Concealing Field

The global Concealing Field, denoted as $\Theta_{G}(i)$, is defined as a set of learnable parameters corresponding to the camera distance $i$ for all camera rays in $\mathbf{R}$. $\Theta_{G}$ remains constant within a scene and is independent of voxels, as we posit that a specific degree of lighting influence remains consistent across the same scene. In our experiments, we initialize $\Theta_{G}(i)$ with a value of 0.5 for each camera distance $i$. With the global and local concealing fields, the accumulated transmittance $T$ in Eq. 4 is attenuated by $\Omega$ and $\Theta_{G}$ to mimic the process of light suppression (see Fig. 1(b)), as follows:

$$T^{conceal}(\mathbf{r}(i))=\exp\left(-\sum_{j=1}^{i-1}\sigma(\mathbf{r}(j))\cdot\delta\right)\cdot\prod_{j=1}^{i-1}\Omega(\mathbf{r}(j))\,\Theta_{G}(j). \tag{9}$$
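To make Eq. 9 concrete, the following minimal sketch computes the concealed transmittance for one ray in pure Python. The function name and sample values are hypothetical, and the sketch assumes that both $\Omega$ and $\Theta_{G}$ take values in $(0,1]$ so that their product can only reduce the transmittance.

```python
import math

def concealed_transmittance(sigmas, omegas, thetas, delta):
    """Eq. 9: density-based transmittance times the running product of
    Omega(r(j)) * Theta_G(j) over all samples j < i."""
    values, optical_depth, conceal = [], 0.0, 1.0
    for sigma, omega, theta in zip(sigmas, omegas, thetas):
        values.append(math.exp(-optical_depth) * conceal)
        optical_depth += sigma * delta  # contributes from the next sample on
        conceal *= omega * theta        # contributes from the next sample on
    return values

# Empty air (sigma = 0) with Omega = 1 and Theta_G at its initial value 0.5:
# each air sample halves the light that reaches the samples behind it.
t_conceal = concealed_transmittance(
    sigmas=[0.0, 0.0, 0.0],
    omegas=[1.0, 1.0, 1.0],
    thetas=[0.5, 0.5, 0.5],
    delta=0.1,
)
```

Setting every $\Omega$ and $\Theta_{G}$ value to 1 recovers the plain transmittance of Eq. 4, which is what removing the concealing fields amounts to in this sketch.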

Train and Test Schemes

The train and test schemes of Aleth-NeRF for both low-light and over-exposure scenes are illustrated in Fig. 2. For low-light conditions (blue arrow in Fig. 2), Aleth-NeRF incorporates $T^{conceal}$ during the volume rendering stage to generate low-light images $\hat{\mathbf{C}}^{adv}$, while removing the concealing fields to produce normal-light images $\hat{\mathbf{C}}^{nor}$ with the original $T$ in Eq. 4. Conversely, for over-exposure scenes (yellow arrow in Fig. 2), Aleth-NeRF employs $T$ to render over-bright images $\hat{\mathbf{C}}^{adv}$ and incorporates the concealing fields through $T^{conceal}$ to render normal-light images $\hat{\mathbf{C}}^{nor}$.
The NeRF regression loss compares the predicted adverse-light images $\hat{\mathbf{C}}^{adv}$ with the corresponding ground truth $\mathbf{C}^{adv}$, while unsupervised lightness correction losses constrain the predicted normal-light images $\hat{\mathbf{C}}^{nor}$; please refer to the next section for more details.
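The two schemes differ only in which transmittance renders which image. A minimal single-ray sketch of that switch is given below; `render`, `render_pair`, the `scene` flag, and all sample values are hypothetical names and numbers chosen for illustration, and `conceal` stands for the per-sample product of the local and global concealing fields.

```python
import math

def render(sigmas, colours, delta, conceal=None):
    """Render one ray. `conceal` optionally holds the per-sample product
    Omega * Theta_G; None means the concealing fields are removed."""
    pixel, optical_depth, attenuation = [0.0, 0.0, 0.0], 0.0, 1.0
    for i, (sigma, colour) in enumerate(zip(sigmas, colours)):
        weight = math.exp(-optical_depth) * attenuation
        weight *= 1.0 - math.exp(-sigma * delta)
        pixel = [p + weight * ch for p, ch in zip(pixel, colour)]
        optical_depth += sigma * delta
        if conceal is not None:
            attenuation *= conceal[i]
    return pixel

def render_pair(sigmas, colours, delta, conceal, scene):
    """Return (adverse-light prediction, normal-light prediction)."""
    plain = render(sigmas, colours, delta)
    concealed = render(sigmas, colours, delta, conceal)
    if scene == "low_light":       # adv uses T^conceal, nor uses T
        return concealed, plain
    if scene == "over_exposure":   # adv uses T, nor uses T^conceal
        return plain, concealed
    raise ValueError(f"unknown scene type: {scene}")

# Toy ray: two air samples, then a dense gray surface of emission 0.8.
sigmas = [0.0, 0.0, 1000.0]
colours = [[0.0] * 3, [0.0] * 3, [0.8] * 3]
adv, nor = render_pair(sigmas, colours, 0.1, [0.5, 0.5, 0.5], "low_light")
```

In the low-light case the two concealing air samples dim the surface to a quarter of its emission in the adverse-light rendering, while the normal-light rendering recovers the full emission.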

Analysis of Concealing Fields

An analysis of the concealing fields is shown in Fig. 3, where we examine the distribution of concealing fields and density along the camera ray $\mathbf{r}(i)$ ($z$ axis); $x$ and $y$ are the training images’ width and height. In Fig. 3, a brighter region denotes a higher probability that concealing fields (gray line) or density (red line) exist there. We find that concealing fields exist at locations $\mathbf{r}(i)$ with lower density values $\sigma(\mathbf{r}(i))$. The Pearson correlation coefficients between concealing fields and density $\sigma$ are also negative along both the $x$-$z$ axes ($-0.589$) and the $y$-$z$ axes ($-0.694$). This indicates that concealing fields are separated from density and thus rarely participate in scene rendering. Concealing fields exist mostly at locations $\mathbf{r}(i)$ with sparse density, i.e. in the air outside the objects.
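As a reminder of how such a coefficient is obtained, here is a minimal sketch of the Pearson correlation between per-sample density and concealing strength along a ray. The two toy profiles are invented for illustration only and are not measurements from the paper; how the concealing strength is read out of $\Omega\cdot\Theta_{G}$ is left open here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

# Hypothetical profiles along one ray: air samples first, object samples last.
density = [0.0, 0.1, 0.0, 2.0, 3.0]
concealing_strength = [0.9, 0.8, 0.9, 0.1, 0.0]
corr = pearson(density, concealing_strength)
```

A negative `corr` for profiles like these mirrors the qualitative finding above: concealing is concentrated where the density is low.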

Effective Losses for Unsupervised Training

In this section, we present the loss functions that guide the unsupervised lightness correction of Aleth-NeRF, facilitating improved novel view synthesis and enhancement in low-light and over-exposure conditions. An overview of the unsupervised training strategy is shown in Fig. 2.

NeRF Regression Loss:

Directly applying NeRF’s MSE loss function $\mathcal{L}_{mse}$ in low-light and over-bright scenes would result in an imbalance of pixel weights. Dark pixels with small values would contribute minimally, while bright pixels with larger values would dominate the loss. To rectify this imbalance and ensure effective NeRF training under non-standard lighting conditions, we incorporate an inverse tone curve (Brooks et al. 2019; Cui et al. 2021), denoted as $\Phi$. This curve re-balances the pixel weights. Additionally, we introduce a small value $\epsilon=10^{-3}$ within the regression loss to facilitate more accurate estimations. The resulting inverse tone MSE loss $\mathcal{L}_{it-mse}$ for NeRF training is as follows:

$$\begin{aligned}\Phi(x)&=\frac{1}{2}-\sin\left(\frac{\sin^{-1}(1-2x)}{3}\right),\\ \mathcal{L}_{it-mse}&=\sum_{\mathbf{r}}^{\mathbf{R}}||\Phi(\hat{\mathbf{C}}(\mathbf{r})+\epsilon)-\Phi(\mathbf{C}(\mathbf{r})+\epsilon)||^{2}.\end{aligned} \tag{10}$$

The comparison of $\mathcal{L}_{mse}$ and $\mathcal{L}_{it-mse}$ is shown in Fig. 4(a): the inverse tone MSE loss $\mathcal{L}_{it-mse}$ enhances scene clarity while also suppressing noise to a certain extent.
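A minimal sketch of the inverse tone curve and the resulting loss in Eq. 10 is given below, operating on flat lists of channel values. Clamping the input to $[0,1]$ before the arcsine is an assumption added here, because $x+\epsilon$ can slightly exceed 1 and leave the arcsine's domain; the function names are likewise hypothetical.

```python
import math

EPS = 1e-3  # the small constant epsilon from the regression loss

def inverse_tone(x):
    """Phi(x) = 1/2 - sin(arcsin(1 - 2x) / 3), with x clamped to [0, 1]
    (the clamp is an assumption that keeps arcsin inside its domain)."""
    x = min(max(x, 0.0), 1.0)
    return 0.5 - math.sin(math.asin(1.0 - 2.0 * x) / 3.0)

def it_mse(pred, target, eps=EPS):
    """Sum of squared differences after the inverse tone curve."""
    return sum(
        (inverse_tone(p + eps) - inverse_tone(t + eps)) ** 2
        for p, t in zip(pred, target)
    )

# The same 0.01 gap is stretched in dark tones and compressed in mid tones.
dark_gap = inverse_tone(0.02) - inverse_tone(0.01)
mid_gap = inverse_tone(0.51) - inverse_tone(0.50)
```

Because the curve is steep near zero, an error between two dark pixel values produces a larger residual than the same error between two mid-tone values, which is the re-balancing effect described above.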

Degree & Contrast Loss:

In order to regulate the level of enhancement, we impose a degree loss $\mathcal{L}_{de}$ on the predicted normal-light images $\hat{\mathbf{C}}^{nor}$. $\mathcal{L}_{de}$ involves a hyperparameter $\mathbf{e}$, which specifies the enhancement degree of Aleth-NeRF. To compute $\mathcal{L}_{de}$, a global average pooling operation is applied to the $\hat{\mathbf{C}}^{nor}$ of each training batch. We then minimize the degree loss by comparing the pooled $\hat{\mathbf{C}}^{nor}$ with the specified degree value $\mathbf{e}$. We set $\mathbf{e}$ to 0.45 in all experiments; an ablation analysis of $\mathbf{e}$ is shown in Fig. 4(b).

$$\mathcal{L}_{de}=||\mathrm{avgpool}(\hat{\mathbf{C}}^{nor}(\mathbf{r}))-\mathbf{e}||^{2}. \tag{11}$$
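Eq. 11 can be sketched in a few lines. The sketch assumes that the global average pooling reduces all pixels and all colour channels of a batch to a single scalar; the function name and the toy batches are hypothetical.

```python
def degree_loss(pred_nor, e=0.45):
    """Squared gap between the mean predicted brightness and the target
    degree e. `pred_nor` is a batch of RGB pixels (a list of 3-element lists)."""
    values = [channel for pixel in pred_nor for channel in pixel]
    pooled = sum(values) / len(values)  # global average pooling (assumed scalar)
    return (pooled - e) ** 2

# A batch already at the target brightness incurs (almost) no penalty,
# while a dark batch is pushed towards e.
on_target = degree_loss([[0.45, 0.45, 0.45]] * 4)
too_dark = degree_loss([[0.05, 0.05, 0.05]])
```

Since only the pooled mean is constrained, this term sets the overall brightness level and leaves the local structure to the other losses.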

We additionally design a contrast degree loss $\mathcal{L}_{co}$ between adjacent sampled pixels of the predicted normal-light images $\hat{\mathbf{C}}^{nor}$ and the ground-truth adverse-light images $\mathbf{C}^{adv}$:

$$\mathcal{L}_{co}=\sum_{k\in[+1,-1]}(\hat{\mathbf{C}}^{nor}(\mathbf{r})-\hat{\mathbf{C}}^{nor}(\mathbf{r}+k))-\mathbf{e}\cdot\eta\cdot(\mathbf{C}^{adv}(\mathbf{r})-\mathbf{C}^{adv}(\mathbf{r}+k)), \tag{12}$$

The contrast degree loss maintains the structural similarity between $\hat{\mathbf{C}}^{nor}$ and $\mathbf{C}^{adv}$, while controlling the contrast degree of $\hat{\mathbf{C}}^{nor}$ through the parameter $\eta$. An ablation analysis of $\eta$ in a low-light scene is shown in Fig. 4: when the enhancement degree $\mathbf{e}$ is fixed, a higher $\eta$ yields higher contrast in $\hat{\mathbf{C}}^{nor}$. In over-exposure scenes, $\mathbf{e}\cdot\eta$ is fixed to $0.5$.
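The contrast term in Eq. 12 can be sketched on a 1D run of adjacent sampled pixels. Two points are assumptions made only for this sketch: the residual is squared so that the loss is non-negative (Eq. 12 as printed shows no norm), and pixels are single intensities rather than RGB triples. The value of `eta` used below is a hypothetical example, not a setting from the paper.

```python
def contrast_loss(pred_nor, gt_adv, e=0.45, eta=4.0):
    """Match neighbour differences of the normal-light prediction to the
    neighbour differences of the adverse-light input, scaled by e * eta."""
    total, n = 0.0, len(pred_nor)
    for r in range(n):
        for k in (+1, -1):
            if 0 <= r + k < n:  # skip neighbours outside the sampled run
                diff_nor = pred_nor[r] - pred_nor[r + k]
                diff_adv = gt_adv[r] - gt_adv[r + k]
                total += (diff_nor - e * eta * diff_adv) ** 2  # squared: assumption
    return total

gt_adv = [0.00, 0.10, 0.20]  # a dim gradient in the input
scale = 0.45 * 4.0
stretched = [0.2 + scale * v for v in gt_adv]  # same structure, more contrast
flat = [0.5, 0.5, 0.5]                         # structure lost

loss_stretched = contrast_loss(stretched, gt_adv)
loss_flat = contrast_loss(flat, gt_adv)
```

A prediction that keeps the input's edges but scales them by $\mathbf{e}\cdot\eta$ reaches zero loss, whereas a flat prediction that erases the structure is penalised.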

Color Constancy Loss:

Photos taken in low-light &\&& over exposure conditions would easily lose their color information, directly remove &\&& add the concealing fields may cause color imbalance. To regularize the color of the predict normal-light images 𝐂^norsuperscript^𝐂𝑛𝑜𝑟\hat{\textbf{C}}^{nor}over^ start_ARG C end_ARG start_POSTSUPERSCRIPT italic_n italic_o italic_r end_POSTSUPERSCRIPT, we introduce color constancy loss ccsubscript𝑐𝑐\mathcal{L}_{cc}caligraphic_L start_POSTSUBSCRIPT italic_c italic_c end_POSTSUBSCRIPT on 𝐂^norsuperscript^𝐂𝑛𝑜𝑟\hat{\textbf{C}}^{nor}over^ start_ARG C end_ARG start_POSTSUPERSCRIPT italic_n italic_o italic_r end_POSTSUPERSCRIPT. Here we assume that 𝐂^norsuperscript^𝐂𝑛𝑜𝑟\hat{\textbf{C}}^{nor}over^ start_ARG C end_ARG start_POSTSUPERSCRIPT italic_n italic_o italic_r end_POSTSUPERSCRIPT obey the gray-world assumption (Buchsbaum 1980; Guo et al. 2020; Jin, Yang, and Tan 2022), as follows:

$$\mathcal{L}_{cc}=\sum_{p,q}\left(\hat{\mathbf{C}}^{nor}(\mathbf{r})^{p}-\hat{\mathbf{C}}^{nor}(\mathbf{r})^{q}\right)^{2},\tag{13}$$

where $(p,q)\in\{(R,G),(G,B),(B,R)\}$ ranges over the pairs of color channels. An ablation of the color constancy loss $\mathcal{L}_{cc}$ can be found in Fig. 4(c).
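As a minimal sketch of the color constancy loss, the snippet below sums the squared differences over the three channel pairs (R,G), (G,B) and (B,R). It assumes, as is common for gray-world losses, that the per-channel values are means over a batch of rendered rays; the function names and the batch averaging are illustrative assumptions, not the authors' implementation.

```python
def mean_color(colors):
    """Average a batch of rendered (R, G, B) colors channel by channel."""
    if not colors:
        raise ValueError("need at least one color")
    n = len(colors)
    return tuple(sum(c[i] for c in colors) / n for i in range(3))


def color_constancy_loss(rgb):
    """Sum of squared differences over the pairs (R,G), (G,B), (B,R)."""
    r, g, b = rgb
    return (r - g) ** 2 + (g - b) ** 2 + (b - r) ** 2


# Hypothetical batch of predicted normal-light colors with a red cast.
batch = [(0.8, 0.4, 0.4), (0.6, 0.2, 0.2)]
loss = color_constancy_loss(mean_color(batch))
```

A perfectly gray mean color gives a loss of zero, and any channel imbalance increases it.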

In summary, Aleth-NeRF’s loss function $\mathcal{L}$ comprises four components: the NeRF rendering loss $\mathcal{L}_{it-mse}$, along with three unsupervised lightness correction losses ($\mathcal{L}_{de}$, $\mathcal{L}_{co}$, and $\mathcal{L}_{cc}$). The overall training loss is:

$$\mathcal{L}=\mathcal{L}_{it-mse}+\lambda_{1}\cdot\mathcal{L}_{de}+\lambda_{2}\cdot\mathcal{L}_{co}+\lambda_{3}\cdot\mathcal{L}_{cc},\tag{14}$$

where $\lambda_{1}$, $\lambda_{2}$ and $\lambda_{3}$ are three non-negative weights that balance the loss terms, which we set to $10^{-3}$, $10^{-3}$ and $10^{-8}$ respectively.
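The weighted sum in Eq. 14 can be sketched as follows. The four loss values are taken as plain numbers here, and the constant and function names are hypothetical; only the three weights come from the text.

```python
# Weights reported in the text for the de, co and cc terms.
LAMBDA_DE = 1e-3
LAMBDA_CO = 1e-3
LAMBDA_CC = 1e-8


def total_loss(l_it_mse, l_de, l_co, l_cc):
    """Combine the rendering loss with the three weighted correction losses."""
    return l_it_mse + LAMBDA_DE * l_de + LAMBDA_CO * l_co + LAMBDA_CC * l_cc


# Hypothetical per-batch loss values, for illustration only.
example = total_loss(l_it_mse=0.02, l_de=1.5, l_co=3.0, l_cc=2.0e4)
```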

Figure 5: A comparison of enhancement results and model efficiency with RAW-NeRF (Mildenhall et al. 2021). Note that RAW-NeRF takes 16-bit HDR images as inputs, while NeRF and Aleth-NeRF take 8-bit LDR images as inputs.

Experiments

In this section, we first introduce our collected dataset of multi-view images paired across exposure levels, then present the novel view synthesis results of Aleth-NeRF under both low-light and over-exposure settings. For practical implementation, our framework builds on the open-source PyTorch toolbox NeRF-Factory (https://github.com/kakaobrain/nerf-factory). We utilize the Adam optimizer with an initial learning rate of $5\times10^{-4}$ and employ a cosine learning rate decay strategy every 2500 iterations. The training batch size is set to 4096 for a total of 62500 iterations. To ensure fairness, all comparative experiments on the proposed dataset use the same training strategies and parameter configurations.
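One possible reading of this schedule is sketched below: the learning rate follows a single cosine curve over the 62500 iterations and is refreshed once every 2500 iterations. This interpretation, the final learning rate of zero, and the function name `cosine_lr` are assumptions made for illustration; the text does not specify these details.

```python
import math


def cosine_lr(iteration, base_lr=5e-4, total_iters=62500,
              step_every=2500, min_lr=0.0):
    """Cosine-decayed learning rate, held constant within each 2500-iteration step."""
    n_steps = total_iters // step_every          # 25 decay steps in total
    step = min(iteration // step_every, n_steps)
    progress = step / n_steps                    # 0.0 at the start, 1.0 at the end
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Learning rate at a few hypothetical checkpoints.
schedule = [cosine_lr(i) for i in (0, 2500, 30000, 62500)]
```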

Challenging Illumination Multi-view Dataset

In this section, we introduce our collected multi-view dataset of paired low-light, normal-light and over-exposure images, named the LOM dataset. Previous work (Mildenhall et al. 2021) also includes some low-light scenes, but their dataset concentrates on RAW denoising rather than sRGB low-light enhancement and does not include normal-light sRGB ground truth, making it hard to evaluate enhancement performance in novel view synthesis.

| scene | buu | chair | sofa | bike | shrub |
| --- | --- | --- | --- | --- | --- |
| collected views | 25 | 48 | 33 | 40 | 35 |
| training views | 22 | 43 | 29 | 36 | 30 |
| evaluation views | 3 | 5 | 4 | 4 | 5 |

Table 1: Details of the dataset split for LOM.

In our proposed LOM dataset, we collected 5 real-world scenes (“buu”, “chair”, “sofa”, “bike”, “shrub”). Each scene includes 25 to 48 views. For each view, we generate low-light, normal-light and over-exposure image pairs by adjusting the exposure time and ISO while keeping the other camera configurations fixed. We capture multi-view images by moving and rotating the camera tripod. Images are collected at a resolution of 3000 × 4000. For convenience, we down-sample them by a factor of 8 to 375 × 500, and generate the ground-truth view and angle information by running COLMAP (Schönberger and Frahm 2016; Schönberger et al. 2016) on the normal-light images. For the dataset split, in each scene we choose 3 to 5 images as the testing set, 1 image as the validation set, and the remaining images as the training set. Details of the training and evaluation view split are shown in Table 1. For more details of our dataset, please refer to the supplementary part.
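The down-sampling arithmetic and the per-scene view counts of Table 1 can be written down as a small sketch. The dictionary layout and variable names are illustrative assumptions; the numbers are copied from the text and Table 1.

```python
FULL_RES = (3000, 4000)   # captured resolution
FACTOR = 8                # down-sampling ratio

# Integer division is exact here: 3000 / 8 = 375 and 4000 / 8 = 500.
low_res = tuple(side // FACTOR for side in FULL_RES)

# View counts per scene, as listed in Table 1.
splits = {
    "buu":   {"collected": 25, "train": 22, "eval": 3},
    "chair": {"collected": 48, "train": 43, "eval": 5},
    "sofa":  {"collected": 33, "train": 29, "eval": 4},
    "bike":  {"collected": 40, "train": 36, "eval": 4},
    "shrub": {"collected": 35, "train": 30, "eval": 5},
}

for scene, s in splits.items():
    print(f"{scene}: {s['train']} training + {s['eval']} evaluation "
          f"of {s['collected']} views at {low_res[0]} x {low_res[1]}")
```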

Figure 6: Novel view synthesis comparison in low-light and over-exposure conditions.

| Method | buu | chair | sofa | bike | shrub | mean |
| --- | --- | --- | --- | --- | --- | --- |
| | PSNR/ SSIM/ LPIPS | PSNR/ SSIM/ LPIPS | PSNR/ SSIM/ LPIPS | PSNR/ SSIM/ LPIPS | PSNR/ SSIM/ LPIPS | PSNR/ SSIM/ LPIPS |
| NeRF | 7.51/ 0.291/ 0.448 | 6.04/ 0.147/ 0.594 | 6.28/ 0.210/ 0.568 | 6.35/ 0.072/ 0.623 | 8.03/ 0.031/ 0.680 | 6.84/ 0.150/ 0.582 |
| ① NeRF + Image Enhancement Methods | | | | | | |
| NeRF + RetiNexNet | 11.64/ 0.646/ 0.395 | 11.04/ 0.631/ 0.561 | 12.85/ 0.738/ 0.527 | 17.79/ 0.653/ 0.550 | 11.85/ 0.211/ 0.583 | 13.03/ 0.576/ 0.523 |
| NeRF + Zero-DCE | 17.81/ 0.833/ 0.357 | 12.44/ 0.684/ 0.547 | 14.43/ 0.787/ 0.539 | 10.16/ 0.468/ 0.557 | 12.58/ 0.282/ 0.540 | 13.48/ 0.610/ 0.488 |
| NeRF + SCI | 7.84/ 0.660/ 0.562 | 12.07/ 0.699/ 0.584 | 10.25/ 0.737/ 0.626 | 18.84/ 0.637/ 0.565 | 12.38/ 0.358/ 0.587 | 12.27/ 0.618/ 0.585 |
| NeRF + IAT | 14.03/ 0.656/ 0.453 | 19.08/ 0.800/ 0.565 | 10.49/ 0.528/ 0.678 | 17.16/ 0.657/ 0.534 | 16.15/ 0.344/ 0.573 | 15.38/ 0.597/ 0.561 |
| ② Image Enhancement Methods + NeRF | | | | | | |
| RetiNexNet + NeRF | 16.19/ 0.780/ 0.396 | 16.89/ 0.756/ 0.543 | 16.98/ 0.807/ 0.577 | 18.00/ 0.707/ 0.482 | 14.86/ 0.284/ 0.518 | 16.58/ 0.667/ 0.503 |
| Zero-DCE + NeRF | 17.90/ 0.858/ 0.376 | 12.58/ 0.721/ 0.460 | 14.45/ 0.831/ 0.419 | 10.39/ 0.518/ 0.464 | 12.32/ 0.308/ 0.481 | 13.53/ 0.649/ 0.432 |
| SCI + NeRF | 7.76/ 0.692/ 0.525 | 19.77/ 0.802/ 0.674 | 10.08/ 0.772/ 0.520 | 13.44/ 0.658/ 0.435 | 18.16/ 0.503/ 0.475 | 13.84/ 0.689/ 0.510 |
| IAT + NeRF | 14.46/ 0.705/ 0.386 | 18.70/ 0.780/ 0.665 | 17.88/ 0.829/ 0.547 | 13.65/ 0.616/ 0.528 | 13.87/ 0.317/ 0.536 | 15.71/ 0.649/ 0.532 |
| ③ Video Enhancement Methods + NeRF | | | | | | |
| MBLLEN + NeRF | 22.39/ 0.877/ 0.353 | 23.59/ 0.788/ 0.559 | 19.99/ 0.836/ 0.542 | 14.09/ 0.636/ 0.525 | 13.17/ 0.501/ 0.555 | 18.65/ 0.728/ 0.507 |
| LLVE + NeRF | 19.97/ 0.848/ 0.393 | 15.17/ 0.764/ 0.610 | 18.17/ 0.855/ 0.465 | 13.84/ 0.638/ 0.492 | 15.35/ 0.287/ 0.577 | 16.50/ 0.678/ 0.507 |
| ④ Our Proposed End-to-end Method | | | | | | |
| Aleth-NeRF | 20.22/ 0.859/ 0.315 | 20.93/ 0.818/ 0.468 | 19.52/ 0.857/ 0.354 | 20.46/ 0.727/ 0.499 | 18.24/ 0.511/ 0.448 | 19.87/ 0.754/ 0.417 |

Table 2: We assess novel view synthesis performance in low-light settings by comparing generated images with ground truth normal-light views. Our evaluation metrics include PSNR ↑, SSIM ↑ and LPIPS ↓. Bold denotes the best result, Underline denotes the second best result.
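The "mean" column appears to be the plain arithmetic mean over the five scenes. The sketch below checks this for the Aleth-NeRF row of Table 2; the averaging rule is an assumption that this row happens to be consistent with, and the helper name `scene_mean` is hypothetical.

```python
# Aleth-NeRF low-light results per scene: (PSNR, SSIM, LPIPS), from Table 2.
aleth_nerf = {
    "buu":   (20.22, 0.859, 0.315),
    "chair": (20.93, 0.818, 0.468),
    "sofa":  (19.52, 0.857, 0.354),
    "bike":  (20.46, 0.727, 0.499),
    "shrub": (18.24, 0.511, 0.448),
}


def scene_mean(rows):
    """Average each metric over all scenes."""
    n = len(rows)
    return tuple(sum(r[i] for r in rows.values()) / n for i in range(3))


psnr_mean, ssim_mean, lpips_mean = scene_mean(aleth_nerf)
print(round(psnr_mean, 2), round(ssim_mean, 3), round(lpips_mean, 3))
```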

| Method | buu | chair | sofa | bike | shrub | mean |
| --- | --- | --- | --- | --- | --- | --- |
| | PSNR/ SSIM/ LPIPS | PSNR/ SSIM/ LPIPS | PSNR/ SSIM/ LPIPS | PSNR/ SSIM/ LPIPS | PSNR/ SSIM/ LPIPS | PSNR/ SSIM/ LPIPS |
| NeRF | 7.12/ 0.674/ 0.499 | 11.05/ 0.741/ 0.418 | 10.22/ 0.783/ 0.475 | 9.65/ 0.698/ 0.416 | 9.96/ 0.405/ 0.480 | 9.60/ 0.660/ 0.457 |
| ① NeRF + Exposure Correction Methods | | | | | | |
| NeRF + HE | 14.34/ 0.613/ 0.673 | 15.37/ 0.661/ 0.590 | 16.69/ 0.733/ 0.558 | 15.32/ 0.627/ 0.458 | 11.97/ 0.468/ 0.556 | 14.74/ 0.620/ 0.567 |
| NeRF + IAT | 14.11/ 0.780/ 0.433 | 19.24/ 0.810/ 0.491 | 16.60/ 0.837/ 0.459 | 17.73/ 0.760/ 0.394 | 14.05/ 0.381/ 0.499 | 16.35/ 0.714/ 0.455 |
| NeRF + MSEC | 16.13/ 0.800/ 0.427 | 15.60/ 0.786/ 0.472 | 16.56/ 0.807/ 0.495 | 12.60/ 0.716/ 0.465 | 13.66/ 0.332/ 0.509 | 14.91/ 0.688/ 0.473 |
| ② Exposure Correction Methods + NeRF | | | | | | |
| HE + NeRF | 14.65/ 0.743/ 0.519 | 15.55/ 0.736/ 0.497 | 17.11/ 0.781/ 0.477 | 15.77/ 0.692/ 0.367 | 12.61/ 0.506/ 0.622 | 15.13/ 0.692/ 0.496 |
| IAT + NeRF | 16.22/ 0.815/ 0.486 | 18.98/ 0.799/ 0.503 | 18.45/ 0.849/ 0.478 | 19.63/ 0.776/ 0.408 | 15.63/ 0.434/ 0.477 | 17.78/ 0.734/ 0.470 |
| MSEC + NeRF | 15.53/ 0.817/ 0.499 | 16.95/ 0.758/ 0.580 | 19.60/ 0.817/ 0.498 | 18.90/ 0.725/ 0.483 | 15.48/ 0.400/ 0.499 | 17.29/ 0.703/ 0.512 |
| ③ Our Proposed End-to-end Method | | | | | | |
| Aleth-NeRF | 16.78/ 0.805/ 0.611 | 20.08/ 0.820/ 0.499 | 17.85/ 0.852/ 0.458 | 19.85/ 0.773/ 0.392 | 15.91/ 0.477/ 0.483 | 18.09/ 0.745/ 0.488 |

Table 3: We assess novel view synthesis performance in over-exposure settings by comparing generated images with ground truth normal-light views. Our evaluation metrics include PSNR ↑, SSIM ↑ and LPIPS ↓. Bold denotes the best result, Underline denotes the second best result.
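For reference, the PSNR metric used in Tables 2 and 3 follows the standard definition $10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$. The sketch below assumes pixel values normalized to $[0, 1]$ and images flattened into lists; it is a generic illustration, not the evaluation code used for the tables.

```python
import math


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio (dB) between two equally sized flat images."""
    if len(pred) != len(target) or not pred:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


# A uniform error of 0.1 on a [0, 1] scale corresponds to about 20 dB.
value = psnr([0.0, 0.0, 0.0], [0.1, 0.1, 0.1])
```

Under this definition, a PSNR near 7 dB, as for vanilla NeRF trained on low-light images, corresponds to a root-mean-square error of roughly 0.45 on a $[0, 1]$ scale.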

Novel View Synthesis in Low-light Condition

In this section, we show novel view synthesis results in low-light scenes and compare the predicted normal-light views with the ground truth normal-light views (see Table 2). We design multiple comparison experiments on the proposed dataset as follows:

We start by training vanilla NeRF (Mildenhall et al. 2020) under low-light conditions as a baseline (first row in Table 2). ① Subsequently, we apply various state-of-the-art enhancement methods to post-process the rendered NeRF results (denoted as “NeRF + *” in Table 2). ② Then we pre-process the low-light images with enhancement methods and train NeRF on the enhanced images (denoted as “* + NeRF” in Table 2). Notably, we observe that the “NeRF + *” methods tend to underperform the “* + NeRF” methods in the low-light setting, especially in perceptual similarity (LPIPS ↓). This potentially arises from NeRF producing poor-quality images in dark scenes, which makes the enhancement methods struggle when applied to the dark views rendered by NeRF. ③ Next, we apply two video enhancement methods, MBLLEN (Lv et al. 2018) and LLVE (Zhang et al. 2021a), to pre-process the low-light images and then train NeRF on the results. ④ Finally, our Aleth-NeRF achieves the best or second-best performance on most scenes (see Table 2). At the same time, Aleth-NeRF does not require any training data as prior knowledge and is trained end-to-end without any pre-processing or post-processing.

Beyond enhancement methods, RAW-NeRF (Mildenhall et al. 2021) also conducted experiments under low-light conditions. Because their model is trained on RAW data and is hard to run on our dataset, we instead compare with RAW-NeRF on a scene from their proposed dataset, as shown in Fig. 5. Although NeRF and Aleth-NeRF are trained with 8-bit LDR inputs while RAW-NeRF is trained with 16-bit HDR inputs, Aleth-NeRF still achieves pleasing enhancement results while requiring much less data storage and training time.

Fig. 6 shows qualitative results on the “sofa” and “shrub” scenes of the LOM dataset, compared against the current SOTA image enhancement methods Zero-DCE (Guo et al. 2020) and SCI (Ma et al. 2022) and the video enhancement method LLVE (Zhang et al. 2021a). Aleth-NeRF produces more vivid enhancement results that are closer to the ground truth, and also achieves much better multi-view consistency (see depth maps). For more visualization results, please refer to the supplementary part.

Novel View Synthesis in Over-Exposure Condition

We then show novel view synthesis results in over-exposure scenes. As with the low-light scenes, we compare with 3 mainstream exposure correction methods: the traditional histogram equalization (HE) method (Gonzalez and Woods 2006) and two current SOTA deep network methods, MSEC (Afifi et al. 2021) and IAT (Cui et al. 2022).
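To illustrate the traditional HE baseline, the following is a minimal grayscale sketch of textbook histogram equalization: each intensity is remapped through the normalized cumulative histogram. How HE is applied to color images in the experiments is not specified here, so this single-channel version and the name `equalize_histogram` are illustrative assumptions.

```python
def equalize_histogram(pixels, levels=256):
    """Remap integer intensities so their cumulative histogram becomes roughly linear."""
    if not pixels:
        return []
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    # Cumulative distribution: number of pixels with intensity <= v.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:
        return list(pixels)  # constant image, nothing to stretch
    scale = levels - 1
    return [round((cdf[v] - cdf_min) * scale / (n - cdf_min)) for v in pixels]


# Hypothetical over-exposed pixels squeezed into the bright end of the range.
bright = [200, 220, 240, 250]
stretched = equalize_histogram(bright)
```

In this toy case the four bright values are spread across the full 0 to 255 range.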

As shown in Table 3, we use vanilla NeRF trained on over-exposure images as a baseline. ① Then we adopt exposure correction methods to post-process the results generated by NeRF (denoted as “NeRF + *”). ② We also adopt exposure correction methods to pre-process the over-exposure images and then train NeRF on the processed images (denoted as “* + NeRF”). ③ Our proposed Aleth-NeRF achieves the best results in PSNR and SSIM, but its LPIPS performance is less competitive. Some visualization results are shown in Fig. 6: Aleth-NeRF suppresses over-bright regions while maintaining multi-view consistency.

One major reason for the weaker LPIPS results on over-exposed images may be that Aleth-NeRF is trained directly on degraded over-exposed images, so unsupervised learning struggles to compensate for the lost information. Aleth-NeRF’s assumption about darkness is grounded in the Concealing Field present in the air, and we believe that a more effective modeling of darkness might offer improved solutions for handling over-exposure.

Conclusion

In this paper, we propose the Aleth-NeRF model. Inspired by the wisdom of the ancient Greeks, we introduce the concept of the Concealing Field. By incorporating concealing fields into NeRF, we can effectively address novel view synthesis in scenes with low-light and over-exposure conditions. Additionally, we propose several unsupervised loss functions to constrain the generation of the concealing fields, enabling Aleth-NeRF to achieve SOTA results on novel view synthesis under low-light and over-exposure conditions.

One limitation is that Aleth-NeRF must be trained separately for each scene, as with vanilla NeRF. Besides, Aleth-NeRF may fail in scenes with non-uniform lighting conditions (Wang, Xu, and Lau 2022) or shadow conditions (Qu et al. 2017). We believe that these are valuable research topics for future exploration.

Acknowledgements

This work was done during the first author’s internship at Shanghai Artificial Intelligence Laboratory. This work was partially supported by JST Moonshot R&D Grant Number JPMJPS2011, CREST Grant Number JPMJCR2015 and Basic Research Grant (Super AI) of Institute for AI and Beyond of the University of Tokyo. This work is also supported in part by the National Key R&D Program of China (NO. 2022ZD0160100).

Meanwhile, many thanks to the colleagues from the University of Tokyo MIL Laboratory and Shanghai AI Lab for their active discussions in this work. Additionally, special thanks to Tianhan Xu for his valuable suggestions during the dataset collection phase of this work.

References

  • Afifi et al. (2021) Afifi, M.; Derpanis, K. G.; Ommer, B.; and Brown, M. S. 2021. Learning Multi-Scale Photo Exposure Correction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 9157–9167.
  • Bahat et al. (2022) Bahat, Y.; Zhang, Y.; Sommerhoff, H.; Kolb, A.; and Heide, F. 2022. Neural Volume Super-Resolution.
  • Barron et al. (2021) Barron, J. T.; Mildenhall, B.; Tancik, M.; Hedman, P.; Martin-Brualla, R.; and Srinivasan, P. P. 2021. Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields. arXiv:2103.13415.
  • Brooks et al. (2019) Brooks, T.; Mildenhall, B.; Xue, T.; Chen, J.; Sharlet, D.; and Barron, J. T. 2019. Unprocessing Images for Learned Raw Denoising. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  • Buchsbaum (1980) Buchsbaum, G. 1980. A spatial processor model for object colour perception. Journal of the Franklin Institute, 310(1): 1–26.
  • Cui et al. (2022) Cui, Z.; Li, K.; Gu, L.; Su, S.; Gao, P.; Jiang, Z.; Qiao, Y.; and Harada, T. 2022. You Only Need 90K Parameters to Adapt Light: a Light Weight Transformer for Image Enhancement and Exposure Correction. In 33rd British Machine Vision Conference 2022, BMVC 2022, London, UK, November 21-24, 2022. BMVA Press.
  • Cui et al. (2021) Cui, Z.; Qi, G.-J.; Gu, L.; You, S.; Zhang, Z.; and Harada, T. 2021. Multitask AET With Orthogonal Tangent Regularity for Dark Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2553–2562.
  • Delbracio et al. (2021) Delbracio, M.; Kelly, D.; Brown, M. S.; and Milanfar, P. 2021. Mobile Computational Photography: A Tour. Annual Review of Vision Science, 7(1): 571–604. PMID: 34524880.
  • Deng et al. (2022) Deng, K.; Liu, A.; Zhu, J.-Y.; and Ramanan, D. 2022. Depth-supervised NeRF: Fewer Views and Faster Training for Free. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
  • Dudhane et al. (2022) Dudhane, A.; Zamir, S. W.; Khan, S.; Khan, F. S.; and Yang, M.-H. 2022. Burst image restoration and enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5759–5768.
  • Godard, Matzen, and Uyttendaele (2018) Godard, C.; Matzen, K.; and Uyttendaele, M. 2018. Deep Burst Denoising. In Proceedings of the European Conference on Computer Vision (ECCV).
  • Gonzalez and Woods (2006) Gonzalez, R. C.; and Woods, R. E. 2006. Digital Image Processing (3rd Edition). USA: Prentice-Hall, Inc. ISBN 013168728X.
  • Guo et al. (2020) Guo, C. G.; Li, C.; Guo, J.; Loy, C. C.; Hou, J.; Kwong, S.; and Cong, R. 2020. Zero-reference deep curve estimation for low-light image enhancement. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), 1780–1789.
  • Guo, Li, and Ling (2017) Guo, X.; Li, Y.; and Ling, H. 2017. LIME: Low-Light Image Enhancement via Illumination Map Estimation. IEEE Transactions on Image Processing, 26(2): 982–993.
  • Guo et al. (2022) Guo, Y.-C.; Kang, D.; Bao, L.; He, Y.; and Zhang, S.-H. 2022. NeRFReN: Neural Radiance Fields With Reflections. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 18409–18418.
  • Hasinoff et al. (2016) Hasinoff, S. W.; Sharlet, D.; Geiss, R.; Adams, A.; Barron, J. T.; Kainz, F.; Chen, J.; and Levoy, M. 2016. Burst photography for high dynamic range and low-light imaging on mobile cameras. ACM Transactions on Graphics (ToG), 35(6): 1–12.
  • Heidegger, Stambaugh, and Schmidt (2010) Heidegger, M.; Stambaugh, J.; and Schmidt, D. 2010. Being and Time. SUNY Series in Contemporary Co. State University of New York Press.
  • Huang et al. (2023) Huang, J.; Zhao, F.; Zhou, M.; Xiao, J.; Zheng, N.; Zheng, K.; and Xiong, Z. 2023. Learning Sample Relationship for Exposure Correction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 9904–9913.
  • Jain, Tancik, and Abbeel (2021) Jain, A.; Tancik, M.; and Abbeel, P. 2021. Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 5885–5894.
  • Jiang et al. (2021) Jiang, Y.; Gong, X.; Liu, D.; Cheng, Y.; Fang, C.; Shen, X.; Yang, J.; Zhou, P.; and Wang, Z. 2021. Enlightengan: Deep light enhancement without paired supervision. IEEE Transactions on Image Processing, 30: 2340–2349.
  • Jin, Yang, and Tan (2022) Jin, Y.; Yang, W.; and Tan, R. T. 2022. Unsupervised Night Image Enhancement: When Layer Decomposition Meets Light-Effects Suppression. arXiv preprint arXiv:2207.10564.
  • Jun-Seong et al. (2022) Jun-Seong, K.; Yu-Ji, K.; Ye-Bin, M.; and Oh, T.-H. 2022. HDR-Plenoxels: Self-Calibrating High Dynamic Range Radiance Fields. In ECCV.
  • Land (1986) Land, E. H. 1986. An Alternative Technique for the Computation of the Designator in the Retinex Theory of Color Vision. Proceedings of the National Academy of Sciences of the United States of America.
  • Levy et al. (2023) Levy, D.; Peleg, A.; Pearl, N.; Rosenbaum, D.; Akkaynak, D.; Korman, S.; and Treibitz, T. 2023. SeaThru-NeRF: Neural Radiance Fields in Scattering Media. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 56–65.
  • Lindell, Martel, and Wetzstein (2021) Lindell, D. B.; Martel, J. N. P.; and Wetzstein, G. 2021. AutoInt: Automatic Integration for Fast Neural Volume Rendering. In Proc. CVPR.
  • Liu et al. (2023) Liu, L.; An, J.; Liu, J.; Yuan, S.; Chen, X.; Zhou, W.; Li, H.; Wang, Y. F.; and Tian, Q. 2023. Low-Light Video Enhancement with Synthetic Event Guidance. Proceedings of the AAAI Conference on Artificial Intelligence, 37(2): 1692–1700.
  • Lv et al. (2018) Lv, F.; Lu, F.; Wu, J.; and Lim, C. 2018. MBLLEN: Low-light Image/Video Enhancement Using CNNs. In British Machine Vision Conference.
  • Lyu et al. (2022) Lyu, L.; Tewari, A.; Leimkuehler, T.; Habermann, M.; and Theobalt, C. 2022. Neural Radiance Transfer Fields for Relightable Novel-view Synthesis with Global Illumination. In ECCV.
  • Ma et al. (2021) Ma, L.; Li, X.; Liao, J.; Zhang, Q.; Wang, X.; Wang, J.; and Sander, P. V. 2021. Deblur-NeRF: Neural Radiance Fields from Blurry Images. arXiv preprint arXiv:2111.14292.
  • Ma et al. (2022) Ma, L.; Ma, T.; Liu, R.; Fan, X.; and Luo, Z. 2022. Toward Fast, Flexible, and Robust Low-Light Image Enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5637–5646.
  • Martin-Brualla et al. (2021) Martin-Brualla, R.; Radwan, N.; Sajjadi, M. S. M.; Barron, J. T.; Dosovitskiy, A.; and Duckworth, D. 2021. NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections. In CVPR.
  • Mildenhall et al. (2018) Mildenhall, B.; Barron, J. T.; Chen, J.; Sharlet, D.; Ng, R.; and Carroll, R. 2018. Burst denoising with kernel prediction networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, 2502–2510.
  • Mildenhall et al. (2021) Mildenhall, B.; Hedman, P.; Martin-Brualla, R.; Srinivasan, P. P.; and Barron, J. T. 2021. NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images. arXiv.
  • Mildenhall et al. (2020) Mildenhall, B.; Srinivasan, P. P.; Tancik, M.; Barron, J. T.; Ramamoorthi, R.; and Ng, R. 2020. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. In ECCV.
  • Moran et al. (2020) Moran, S.; Marza, P.; McDonagh, S.; Parisot, S.; and Slabaugh, G. 2020. DeepLPF: Deep Local Parametric Filters for Image Enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
  • Müller et al. (2022) Müller, T.; Evans, A.; Schied, C.; and Keller, A. 2022. Instant Neural Graphics Primitives with a Multiresolution Hash Encoding. ACM Trans. Graph., 41(4): 102:1–102:15.
  • Nsampi, Hu, and Wang (2021) Nsampi, N. E.; Hu, Z.; and Wang, Q. 2021. Learning exposure correction via consistency modeling. In Proc. Brit. Mach. Vision Conf.
  • Pearl, Treibitz, and Korman (2022) Pearl, N.; Treibitz, T.; and Korman, S. 2022. NAN: Noise-Aware NeRFs for Burst-Denoising. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 12672–12681.
  • Qu et al. (2017) Qu, L.; Tian, J.; He, S.; Tang, Y.; and Lau, R. W. H. 2017. DeshadowNet: A Multi-context Embedding Deep Network for Shadow Removal. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2308–2316.
  • Rudnev et al. (2022) Rudnev, V.; Elgharib, M.; Smith, W.; Liu, L.; Golyanik, V.; and Theobalt, C. 2022. NeRF for Outdoor Scene Relighting. In European Conference on Computer Vision (ECCV).
  • Sara Fridovich-Keil and Alex Yu et al. (2022) Sara Fridovich-Keil and Alex Yu; Tancik, M.; Chen, Q.; Recht, B.; and Kanazawa, A. 2022. Plenoxels: Radiance Fields without Neural Networks. In CVPR.
  • Schönberger and Frahm (2016) Schönberger, J. L.; and Frahm, J.-M. 2016. Structure-from-Motion Revisited. In Conference on Computer Vision and Pattern Recognition (CVPR).
  • Schönberger et al. (2016) Schönberger, J. L.; Zheng, E.; Pollefeys, M.; and Frahm, J.-M. 2016. Pixelwise View Selection for Unstructured Multi-View Stereo. In European Conference on Computer Vision (ECCV).
  • Srinivasan et al. (2021) Srinivasan, P. P.; Deng, B.; Zhang, X.; Tancik, M.; Mildenhall, B.; and Barron, J. T. 2021. Nerv: Neural reflectance and visibility fields for relighting and view synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7495–7504.
  • Verbin et al. (2022) Verbin, D.; Hedman, P.; Mildenhall, B.; Zickler, T.; Barron, J. T.; and Srinivasan, P. P. 2022. Ref-NeRF: Structured View-Dependent Appearance for Neural Radiance Fields. CVPR.
  • Wang et al. (2021a) Wang, C.; Wu, X.; Guo, Y.-C.; Zhang, S.-H.; Tai, Y.-W.; and Hu, S.-M. 2021a. NeRF-SR: High-Quality Neural Radiance Fields using Supersampling. arXiv.
  • Wang, Xu, and Lau (2022) Wang, H.; Xu, K.; and Lau, R. W. 2022. Local Color Distributions Prior for Image Enhancement. In Proceedings of the European Conference on Computer Vision (ECCV).
  • Wang et al. (2022) Wang, T.; Zhang, K.; Shen, T.; Luo, W.; Stenger, B.; and Lu, T. 2022. Ultra-High-Definition Low-Light Image Enhancement: A Benchmark and Transformer-Based Method. arXiv preprint arXiv:2212.11548.
  • Wang et al. (2021b) Wang, Y.; Wan, R.; Yang, W.; Li, H.; Chau, L.-P.; and Kot, A. C. 2021b. Low-Light Image Enhancement with Normalizing Flow. arXiv preprint arXiv:2109.05923.
  • Wei et al. (2018) Wei, C.; Wang, W.; Yang, W.; and Liu, J. 2018. Deep Retinex Decomposition for Low-Light Enhancement. In British Machine Vision Conference.
  • Xin et al. (2021) Xin, H.; Qi, Z.; Ying, F.; Hongdong, L.; Xuan, W.; and Qing, W. 2021. HDR-NeRF: High Dynamic Range Neural Radiance Fields. arXiv preprint arXiv:2111.14451.
  • Yang et al. (2023) Yang, S.; Ding, M.; Wu, Y.; Li, Z.; and Zhang, J. 2023. Implicit Neural Representation for Cooperative Low-light Image Enhancement. arXiv:2303.11722.
  • Yu et al. (2021) Yu, A.; Li, R.; Tancik, M.; Li, H.; Ng, R.; and Kanazawa, A. 2021. PlenOctrees for Real-time Rendering of Neural Radiance Fields. In ICCV.
  • Zamir et al. (2020) Zamir, S. W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F. S.; Yang, M.-H.; and Shao, L. 2020. CycleISP: Real Image Restoration via Improved Data Synthesis. In CVPR.
  • Zhang et al. (2021a) Zhang, F.; Li, Y.; You, S.; and Fu, Y. 2021a. Learning Temporal Consistency for Low Light Video Enhancement From Single Images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 4967–4976.
  • Zhang et al. (2021b) Zhang, X.; Srinivasan, P. P.; Deng, B.; Debevec, P.; Freeman, W. T.; and Barron, J. T. 2021b. NeRFactor: Neural Factorization of Shape and Reflectance under an Unknown Illumination. ACM Trans. Graph., 40(6).

Dataset Details

We captured the scenes using a DJI Osmo Action 3 camera (Fig. 7(b)). To obtain multi-view images, we moved and rotated the camera tripod (Fig. 7(a)). Paired low-light, normal-light, and over-exposure images were created by adjusting the exposure time and ISO value while keeping the other camera configurations fixed. All sRGB images are generated by the camera’s default ISP. For precision, each image also features a DSLR color checker (Fig. 7(c)) to ensure accurate color representation and facilitate a comprehensive assessment of both Aleth-NeRF-generated images and comparison methods.

As shown in Fig. 8, we divide the dataset into training views, comprising low-light and over-exposure images, and evaluation views, representing the ground truth under normal lighting conditions. The number of training and testing views for each scene can be found in Table 1. The ground truth of the evaluation views is used solely for assessing the quality of model generation. In practical usage, Aleth-NeRF is capable of generating seamless and coherent normal-light views for a given camera position. The Y-channel pixel distributions of the low-light, normal-light and over-exposure images are shown at the bottom of Fig. 8, where the x-axis is the pixel value, ranging from 0 to 255, and the y-axis is the probability of the corresponding pixel value.
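A Y-channel distribution of this kind can be computed as sketched below. The text does not state which luma conversion is used, so the BT.601 weights (0.299, 0.587, 0.114) and the function names are assumptions for illustration.

```python
def luma(rgb):
    """Y (luma) of an 8-bit RGB pixel using BT.601 weights (assumed)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def y_histogram(pixels):
    """Probability of each Y value in 0..255 over a list of RGB pixels."""
    if not pixels:
        raise ValueError("need at least one pixel")
    counts = [0] * 256
    for rgb in pixels:
        y = min(255, max(0, round(luma(rgb))))
        counts[y] += 1
    total = len(pixels)
    return [c / total for c in counts]


# Hypothetical two-pixel image: one white pixel and one black pixel.
hist = y_histogram([(255, 255, 255), (0, 0, 0)])
```

A low-light image concentrates its probability mass near 0 in such a histogram, while an over-exposed image concentrates it near 255.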

Aleth-NeRF on Other Framework

The Concealing Field assumption of Aleth-NeRF is a broad concept, allowing Aleth-NeRF to be applied to other NeRF (Mildenhall et al. 2020) follow-up variants like Instant-NGP (Müller et al. 2022) to expedite training. To explore this, we conducted an experiment by integrating Aleth-NeRF into the Instant-NGP framework.

A comparison on the low-light “buu” scene is shown in Table 4. Within the Instant-NGP framework, the training time for Aleth-NeRF is significantly reduced, although the resulting performance is slightly inferior to that of the architecture based solely on NeRF.

Figure 7: Dataset collection equipment.
Table 4: Aleth-NeRF on different frameworks, using the low-light “buu” scene as an example.

| Framework | PSNR ↑ / SSIM ↑ / LPIPS ↓ | Training Time |
| --- | --- | --- |
| + NeRF | 19.14 / 0.839 / 0.306 | 2.2 hours / 4 GPUs |
| + Instant-NGP | 18.09 / 0.806 / 0.344 | 4.8 min / 1 GPU |

More Visualization Results

We show more novel view generation examples of the “buu” scene in Fig. 9 and the “chair” scene in Fig. 10. These examples cover rendering results under both low-light and over-exposure conditions. Panel (a) shows the results of vanilla NeRF rendering. Panels (b, d, f) correspond to the “NeRF + *” strategies in Tables 2 and 3 of the main text, where “*” denotes enhancement methods [(b): RetiNexNet (Wei et al. 2018), (d): Zero-DCE (Guo et al. 2020), (f): LLVE (Zhang et al. 2021a)] in low-light scenes and exposure correction methods [(b): histogram equalization (Gonzalez and Woods 2006), (d): MSEC (Afifi et al. 2021), (f): IAT (Cui et al. 2022)] in over-exposure scenes. Panels (c, e, g) illustrate the corresponding “* + NeRF” strategies, which pre-process the training views with enhancement techniques and then train NeRF on the improved views. Panel (h) shows the rendering results of our proposed Aleth-NeRF, and the final panel shows the ground truth normal-light novel view.

We find that the “NeRF + *” methods produce worse image quality than the “* + NeRF” methods, especially in the low-light comparison. This discrepancy implies that NeRF training is susceptible to disruptions caused by low-light and over-exposure conditions, which makes the enhancement methods ineffective on the generated views. The “* + NeRF” methods, which pre-process the multi-view images with 2D enhancement models, sometimes fail to ensure 3D consistency, leading to blurriness as depicted in Fig. 9 and Fig. 10.

Future Research Directions

We see significant potential in exploring novel view synthesis in challenging lighting conditions. This task involves synthesizing new views and enhancing our understanding of the physical world. Here are some feasible directions:

  • Manually collecting camera coordinates is labour-intensive, and algorithms like COLMAP (Schönberger and Frahm 2016; Schönberger et al. 2016) often fail under abnormal lighting conditions. Therefore, the exploration and development of coordinate-free NeRFs are essential.

  • Similar to the original version of NeRF, Aleth-NeRF requires separate training for each scene. Therefore, a key challenge lies in designing a scene-generalizable NeRF that performs well in adverse lighting conditions.

  • Aleth-NeRF can achieve unsupervised lightness correction, while works like NeRF-W (Martin-Brualla et al. 2021) can balance NeRF for different lighting conditions. Therefore, exploring how to design more effective physical modeling to simultaneously accomplish both tasks is a valuable direction.

Figure 8: We collected a dataset comprising 5 scenes. For each scene, we split the images into low-light and over-exposure training views used for model training and ground-truth normal-light novel views used for performance evaluation.
Figure 9: Comparison of novel view generation on the “buu” scene under low-light and over-exposure conditions.
Figure 10: Comparison of novel view generation on the “chair” scene under low-light and over-exposure conditions.