Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

WISER: multimodal variational inference for full-waveform inversion without dimensionality reduction

Ziyi Yin1, Rafael Orozco1, Felix J. Herrmann1
1Georgia Institute of Technology
1Correspondence: ziyi.yin@gatech.edu
Abstract

We present a semi-amortized variational inference framework designed for computationally feasible uncertainty quantification in 2D full-waveform inversion to explore the multimodal posterior distribution without dimensionality reduction. The framework is called WISER, short for full-Waveform variational Inference via Subsurface Extensions with Refinements. WISER leverages the power of generative artificial intelligence to perform approximate amortized inference that is low-cost albeit showing an amortization gap. This gap is closed through non-amortized refinements that make frugal use of acoustic wave physics. Case studies illustrate that WISER is capable of full-resolution, computationally feasible, and reliable uncertainty estimates of velocity models and imaged reflectivities.

1 Introduction

Full-waveform inversion (FWI) aims to estimate unknown multi-dimensional (D2D2\mathrm{D}\geq 2roman_D ≥ 2) velocity models, denoted as 𝐱𝐱\mathbf{x}bold_x, from noisy seismic data, 𝐲𝐲\mathbf{y}bold_y, by inverting the nonlinear forward operator, \mathcal{F}caligraphic_F, which relates 𝐱𝐱\mathbf{x}bold_x and 𝐲𝐲\mathbf{y}bold_y via 𝐲=(𝐱)+ϵ𝐲𝐱bold-italic-ϵ\mathbf{y}=\mathcal{F}(\mathbf{x})+\boldsymbol{\epsilon}bold_y = caligraphic_F ( bold_x ) + bold_italic_ϵ with ϵbold-italic-ϵ\boldsymbol{\epsilon}bold_italic_ϵ measurement noise [1]. FWI poses significant challenges, as it requires solving a high-dimensional, non-convex, and ill-posed inverse problem, with a computationally demanding forward operator in multiple dimensions. In addition, the inherent nonuniqueness of FWI results leads to multiple possible Earth models compatible with the observed data, underscoring the need for uncertainty quantification (UQ) to handle this multimodality.

The trade-off between accuracy and computational cost is a critical consideration in any high-dimensional inference routine with expensive forward operators [2], including FWI. To circumvent the costs associated with global optimization, several approaches have attempted localized UQ [3, 4, 5, 6] based on the Laplace approximation. However, these approaches may not capture the full complexities of multimodal parameter spaces. In contrast, a Bayesian inference approach offers a costly but comprehensive resolution of the posterior distribution, p(𝐱𝐲)𝑝conditional𝐱𝐲p(\mathbf{x}\mid\mathbf{y})italic_p ( bold_x ∣ bold_y ).

Bayesian inference algorithms are broadly categorized into two groups. The first, sampling-based methods, like Markov-chain Monte Carlo [McMC, 7], struggle with high-dimensional parameter spaces. To meet this challenge, they often rely on too restrictive low-dimensional parameterizations to reduce the number of sampling iterations [8, 9, 10, 11, 12, 13], which could bias the inference results, rendering them impractical for multi-D UQ studies especially when solutions are nonunique.

The second category, optimization-based methods, like variational inference [VI, 14], seek to approximate the posterior distribution using classes of known parameterized distributions. VI can be subdivided into amortized and non-amortized methods. Amortized VI involves a computationally intensive offline training phase, leveraging advances in generative artificial intelligence (genAI), particularly with models like conditional diffusion [15] and conditional normalizing flows [CNFs, 16]. After training, amortized VI provides rapid sampling during inference [17, 18], exemplified by the WISE framework [19] for multi-D FWI problems. However, these methods may suffer from an amortization gap — implying that the amortized networks may only deliver suboptimal inference for a single observation at inference time, particularly when trained with limited examples or when there exists a discrepancy between training and inference [20]. Conversely, non-amortized VI dedicates entire computational resources to the online inference [21, 22, 23]. They result in more accurate inference, but the costly optimization has to be carried out repeatedly for new observations, and integrating realistic priors can be challenging since the prior needs to be embedded involving density evaluations [24].

This paper introduces WISER as a semi-amortized VI framework to facilitate computationally feasible and reliable UQ for multi-D FWI without dimensionality reduction. Building on WISE, we train CNFs for efficient, suboptimal amortized inference, but then follow up with a crucial refinement step that only needs frugal use of the forward operator and its gradient. The refinement step aligns the posterior samples with the observation during inference, effectively bridging the amortization gap and enhancing inference accuracy.

Our contributions are organized as follows. We begin by outlining WISER in Algorithm 1. We explore the algorithm by explaining amortized VI with WISE, followed by computationally feasible multi-D physics-based refinement. The performance of WISER is demonstrated through realistic synthetic 2D case studies using the Compass model [25], showcasing improvements over WISE for both in- and out-of-distribution scenarios.

Algorithm 1 WISER: full-Waveform variational Inference via Subsurface Extensions with Refinements
1:Offline training phase
2:
3:Dataset generation
4:for i=1:N:𝑖1𝑁i=1:Nitalic_i = 1 : italic_N do
5:     𝐱(i)p(𝐱)similar-tosuperscript𝐱𝑖𝑝𝐱\mathbf{x}^{(i)}\sim p(\mathbf{x})bold_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ∼ italic_p ( bold_x )
6:     ϵ(i)p(ϵ)similar-tosuperscriptbold-italic-ϵ𝑖𝑝bold-italic-ϵ\boldsymbol{\epsilon}^{(i)}\sim p(\boldsymbol{\epsilon})bold_italic_ϵ start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ∼ italic_p ( bold_italic_ϵ )
7:     𝐲(i)=(𝐱(i))+ϵ(i)superscript𝐲𝑖superscript𝐱𝑖superscriptbold-italic-ϵ𝑖\mathbf{y}^{(i)}=\mathcal{F}(\mathbf{x}^{(i)})+\boldsymbol{\epsilon}^{(i)}bold_y start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT = caligraphic_F ( bold_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ) + bold_italic_ϵ start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT
8:     𝐲¯(i)=¯(𝐱0)𝐲(i)superscript¯𝐲𝑖¯superscriptsubscript𝐱0topsuperscript𝐲𝑖\overline{\mathbf{y}}^{(i)}=\overline{\nabla\mathcal{F}}(\mathbf{x}_{0})^{\top% }\mathbf{y}^{(i)}over¯ start_ARG bold_y end_ARG start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT = over¯ start_ARG ∇ caligraphic_F end_ARG ( bold_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_y start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT
9:end for
10:
11:Network training
12:𝜽=argmin𝜽1Ni=1N(12f𝜽(𝐱(i);𝐲¯(i))22log|det𝐉f𝜽|)superscript𝜽𝜽argmin1𝑁superscriptsubscript𝑖1𝑁12superscriptsubscriptnormsubscript𝑓𝜽superscript𝐱𝑖superscript¯𝐲𝑖22subscript𝐉subscript𝑓𝜽\displaystyle\boldsymbol{\theta}^{\ast}=\underset{\boldsymbol{\theta}}{% \operatorname{argmin}}\quad\frac{1}{N}\sum_{i=1}^{N}\left(\frac{1}{2}\|f_{% \boldsymbol{\theta}}\left(\mathbf{x}^{(i)};\overline{\mathbf{y}}^{(i)}\right)% \|_{2}^{2}-\log\left|\det\mathbf{J}_{f_{\boldsymbol{\theta}}}\right|\right)bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT = underbold_italic_θ start_ARG roman_argmin end_ARG divide start_ARG 1 end_ARG start_ARG italic_N end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT ( divide start_ARG 1 end_ARG start_ARG 2 end_ARG ∥ italic_f start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT ( bold_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ; over¯ start_ARG bold_y end_ARG start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - roman_log | roman_det bold_J start_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT end_POSTSUBSCRIPT | )
13:
14:Online inference phase
15:
16:𝐲¯obs=¯(𝐱0)𝐲obssubscript¯𝐲obs¯superscriptsubscript𝐱0topsubscript𝐲obs\overline{\mathbf{y}}_{\mathrm{obs}}=\overline{\nabla\mathcal{F}}(\mathbf{x}_{% 0})^{\top}\mathbf{y}_{\mathrm{obs}}over¯ start_ARG bold_y end_ARG start_POSTSUBSCRIPT roman_obs end_POSTSUBSCRIPT = over¯ start_ARG ∇ caligraphic_F end_ARG ( bold_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_y start_POSTSUBSCRIPT roman_obs end_POSTSUBSCRIPT
17:for i=1:M:𝑖1𝑀i=1:Mitalic_i = 1 : italic_M do
18:     𝐳iN(𝟎,𝐈)similar-tosubscript𝐳𝑖N0𝐈\mathbf{z}_{i}\sim\mathrm{N}(\mathbf{0},\mathbf{I})bold_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∼ roman_N ( bold_0 , bold_I )
19:     𝐱i=f𝜽1(hϕ(𝐳i);𝐲¯obs)subscript𝐱𝑖superscriptsubscript𝑓superscript𝜽1subscriptbold-italic-ϕsubscript𝐳𝑖subscript¯𝐲obs\mathbf{x}_{i}=f_{\boldsymbol{\theta}^{\ast}}^{-1}\left(h_{\boldsymbol{\phi}}% \left(\mathbf{z}_{i}\right);\overline{\mathbf{y}}_{\mathrm{obs}}\right)bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = italic_f start_POSTSUBSCRIPT bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_h start_POSTSUBSCRIPT bold_italic_ϕ end_POSTSUBSCRIPT ( bold_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ; over¯ start_ARG bold_y end_ARG start_POSTSUBSCRIPT roman_obs end_POSTSUBSCRIPT )
20:end for
21:
22:Physics-based refinements
23:for ii=1:maxiter1:𝑖𝑖1subscriptmaxiter1ii=1:\mathrm{maxiter}_{1}italic_i italic_i = 1 : roman_maxiter start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT do
24:     for i=1:M:𝑖1𝑀i=1:Mitalic_i = 1 : italic_M do
25:         𝐠i=𝐱i[12σ2(𝐱i)𝐲obs22+12γ2𝐱if𝜽1(hϕ(𝐳i);𝐲¯obs)22]subscript𝐠𝑖subscriptsubscript𝐱𝑖12superscript𝜎2superscriptsubscriptnormsubscript𝐱𝑖subscript𝐲obs2212superscript𝛾2superscriptsubscriptnormsubscript𝐱𝑖superscriptsubscript𝑓superscript𝜽1subscriptbold-italic-ϕsubscript𝐳𝑖subscript¯𝐲obs22\displaystyle\mathbf{g}_{i}=\nabla_{\mathbf{x}_{i}}\left[\frac{1}{2\sigma^{2}}% \|\mathcal{F}(\mathbf{x}_{i})-\mathbf{y}_{\mathrm{obs}}\|_{2}^{2}+\frac{1}{2% \gamma^{2}}\|\mathbf{x}_{i}-f_{\boldsymbol{\theta}^{\ast}}^{-1}\left(h_{% \boldsymbol{\phi}}\left(\mathbf{z}_{i}\right);\overline{\mathbf{y}}_{\mathrm{% obs}}\right)\|_{2}^{2}\right]bold_g start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = ∇ start_POSTSUBSCRIPT bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ divide start_ARG 1 end_ARG start_ARG 2 italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ∥ caligraphic_F ( bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) - bold_y start_POSTSUBSCRIPT roman_obs end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + divide start_ARG 1 end_ARG start_ARG 2 italic_γ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ∥ bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - italic_f start_POSTSUBSCRIPT bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_h start_POSTSUBSCRIPT bold_italic_ϕ end_POSTSUBSCRIPT ( bold_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ; over¯ start_ARG bold_y end_ARG start_POSTSUBSCRIPT roman_obs end_POSTSUBSCRIPT ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ]
26:         𝐱i=𝐱iτ𝐠isubscript𝐱𝑖subscript𝐱𝑖𝜏subscript𝐠𝑖\mathbf{x}_{i}=\mathbf{x}_{i}-\tau\mathbf{g}_{i}bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - italic_τ bold_g start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT
27:     end for
28:     for iii=1:maxiter2:𝑖𝑖𝑖1subscriptmaxiter2iii=1:\mathrm{maxiter}_{2}italic_i italic_i italic_i = 1 : roman_maxiter start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT do
29:         (ϕ)=i=1M[12γ2𝐱if𝜽1(hϕ(𝐳i);𝐲¯obs)22+12hϕ(𝐳i)22log|det𝐉hϕ|]bold-italic-ϕsuperscriptsubscript𝑖1𝑀delimited-[]12superscript𝛾2superscriptsubscriptnormsubscript𝐱𝑖superscriptsubscript𝑓superscript𝜽1subscriptbold-italic-ϕsubscript𝐳𝑖subscript¯𝐲obs2212superscriptsubscriptnormsubscriptbold-italic-ϕsubscript𝐳𝑖22subscript𝐉subscriptbold-italic-ϕ\displaystyle\mathcal{L}(\boldsymbol{\phi})=\sum_{i=1}^{M}\left[\frac{1}{2% \gamma^{2}}\|\mathbf{x}_{i}-f_{\boldsymbol{\theta}^{\ast}}^{-1}\left(h_{% \boldsymbol{\phi}}\left(\mathbf{z}_{i}\right);\overline{\mathbf{y}}_{\mathrm{% obs}}\right)\|_{2}^{2}+\frac{1}{2}\|h_{\boldsymbol{\phi}}\left(\mathbf{z}_{i}% \right)\|_{2}^{2}-\log\left|\det\mathbf{J}_{h_{\boldsymbol{\phi}}}\right|\right]caligraphic_L ( bold_italic_ϕ ) = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_M end_POSTSUPERSCRIPT [ divide start_ARG 1 end_ARG start_ARG 2 italic_γ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ∥ bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - italic_f start_POSTSUBSCRIPT bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_h start_POSTSUBSCRIPT bold_italic_ϕ end_POSTSUBSCRIPT ( bold_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ; over¯ start_ARG bold_y end_ARG start_POSTSUBSCRIPT roman_obs end_POSTSUBSCRIPT ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + divide start_ARG 1 end_ARG start_ARG 2 end_ARG ∥ italic_h start_POSTSUBSCRIPT bold_italic_ϕ end_POSTSUBSCRIPT ( bold_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - roman_log | roman_det bold_J start_POSTSUBSCRIPT italic_h start_POSTSUBSCRIPT bold_italic_ϕ end_POSTSUBSCRIPT end_POSTSUBSCRIPT | ]
30:         ϕADAM((ϕ))bold-italic-ϕADAMbold-italic-ϕ\boldsymbol{\phi}\leftarrow\mathrm{ADAM}\left(\mathcal{L}(\boldsymbol{\phi})\right)bold_italic_ϕ ← roman_ADAM ( caligraphic_L ( bold_italic_ϕ ) )
31:     end for
32:end for
33:
34:Output: {f𝜽1(hϕ(𝐳i);𝐲¯obs)}i=1Msuperscriptsubscriptsuperscriptsubscript𝑓superscript𝜽1subscriptbold-italic-ϕsubscript𝐳𝑖subscript¯𝐲obs𝑖1𝑀\{f_{\boldsymbol{\theta}^{\ast}}^{-1}\left(h_{\boldsymbol{\phi}}\left(\mathbf{% z}_{i}\right);\overline{\mathbf{y}}_{\mathrm{obs}}\right)\}_{i=1}^{M}{ italic_f start_POSTSUBSCRIPT bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_h start_POSTSUBSCRIPT bold_italic_ϕ end_POSTSUBSCRIPT ( bold_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ; over¯ start_ARG bold_y end_ARG start_POSTSUBSCRIPT roman_obs end_POSTSUBSCRIPT ) } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_M end_POSTSUPERSCRIPT as samples of p(𝐱|𝐲obs)𝑝conditional𝐱subscript𝐲obsp(\mathbf{x}|\mathbf{y}_{\mathrm{obs}})italic_p ( bold_x | bold_y start_POSTSUBSCRIPT roman_obs end_POSTSUBSCRIPT )

2 Amortized VI with WISE (lines 1—20)

WISER starts with an offline training phase that leverages conditional generative models to approximate the posterior distribution. This is achieved by WISE [19], which involves generating a training dataset (lines 3—9 of Algorithm 1) and training the CNFs (line 11—12).

2.1 Dataset generation (lines 3—9)

We begin by drawing N𝑁Nitalic_N velocity models from the prior distribution, denoted by p(𝐱)𝑝𝐱p(\mathbf{x})italic_p ( bold_x ) (line 5). For each sample, 𝐱(i)superscript𝐱𝑖\mathbf{x}^{(i)}bold_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT, we simulate the observed data, 𝐲(i)superscript𝐲𝑖\mathbf{y}^{(i)}bold_y start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT, by performing the wave modeling and adding a random noise term (lines 6—7). Next, we compute common-image gathers [CIGs, 26] for each observed data with an initial smooth 1D migration-velocity model, 𝐱0subscript𝐱0\mathbf{x}_{0}bold_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, which can be rather inaccurate. These CIGs, represented by 𝐲¯(i)superscript¯𝐲𝑖\overline{\mathbf{y}}^{(i)}over¯ start_ARG bold_y end_ARG start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT, are produced by applying the adjoint of the extended migration operator, ¯(𝐱0)¯superscriptsubscript𝐱0top\overline{\nabla\mathcal{F}}(\mathbf{x}_{0})^{\top}over¯ start_ARG ∇ caligraphic_F end_ARG ( bold_x start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT, to the observed data. Using CIGs as the set of physics-informed summary statistics not only preserves information from the observed seismic data [27] but also enhances the training of CNFs in the next stage [28, 29], as they help to decode the wave physics, translating prestack data to the image (subsurface-offset) domain.

2.2 Network training (lines 11—12)

CNFs are trained with pairs of velocity models and CIGs via minimization of the objective in line 12. The symbol f𝜽subscript𝑓𝜽f_{\boldsymbol{\theta}}italic_f start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT denotes the CNFs, characterized by their network weights, 𝜽𝜽\boldsymbol{\theta}bold_italic_θ, and the Jacobian, 𝐉f𝜽subscript𝐉subscript𝑓𝜽\mathbf{J}_{f_{\boldsymbol{\theta}}}bold_J start_POSTSUBSCRIPT italic_f start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT end_POSTSUBSCRIPT. The term “normalizing” within CNFs implies their functionality to transform realizations of velocity models, 𝐱(i)superscript𝐱𝑖\mathbf{x}^{(i)}bold_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT, into Gaussian noise from a standard multivariate normal distribution (as defined by the 2subscript2\ell_{2}roman_ℓ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT norm), conditioned on the summary statistics (CIGs).

2.3 Online inference (lines 14—20)

The aforementioned data generation and CNF training procedures conclude the offline training phase. During online inference, amortized VI is enabled by leveraging the inherent invertibility of CNFs. For a given observation, 𝐲obssubscript𝐲obs\mathbf{y}_{\mathrm{obs}}bold_y start_POSTSUBSCRIPT roman_obs end_POSTSUBSCRIPT, the online cost is merely generation of a single set of CIGs (line 16). Subsequently, the posterior samples are generated by applying the inverse of the CNFs to Gaussian noise realizations, conditioned on these CIGs (lines 18—19)111We slightly abuse the notation to assume hϕsubscriptbold-italic-ϕh_{\boldsymbol{\phi}}italic_h start_POSTSUBSCRIPT bold_italic_ϕ end_POSTSUBSCRIPT as an identity operator here..

3 Physics-based refinements (lines 22—32)

Consider a single observation, 𝐲obssubscript𝐲obs\mathbf{y}_{\mathrm{obs}}bold_y start_POSTSUBSCRIPT roman_obs end_POSTSUBSCRIPT, and its corresponding posterior samples, 𝐱ip(𝐱𝐲¯obs)similar-tosubscript𝐱𝑖𝑝conditional𝐱subscript¯𝐲obs\mathbf{x}_{i}\sim p(\mathbf{x}\mid\overline{\mathbf{y}}_{\mathrm{obs}})bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∼ italic_p ( bold_x ∣ over¯ start_ARG bold_y end_ARG start_POSTSUBSCRIPT roman_obs end_POSTSUBSCRIPT ). The latent representations generated by the trained CNFs, 𝐳^i=f𝜽(𝐱i;𝐲¯obs)subscript^𝐳𝑖subscript𝑓superscript𝜽subscript𝐱𝑖subscript¯𝐲obs{\hat{\mathbf{z}}_{i}}=f_{\boldsymbol{\theta}^{\ast}}(\mathbf{x}_{i};\overline% {\mathbf{y}}_{\mathrm{obs}})over^ start_ARG bold_z end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = italic_f start_POSTSUBSCRIPT bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT ( bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ; over¯ start_ARG bold_y end_ARG start_POSTSUBSCRIPT roman_obs end_POSTSUBSCRIPT ), may not conform exactly to the standard Gaussian distribution during inference. To address this issue, we follow Siahkoohi et al. [17] to apply latent space corrections to fine-tune the trained CNFs. This involves integrating a shallower, yet invertible, network222For linear inverse problems in seismic imaging, Siahkoohi et al. [17] show that an elementwise scaling and shift mechanism is adequate to bridge the gap. However, given the complex, non-convex nature of FWI, we employ hϕsubscriptbold-italic-ϕh_{\boldsymbol{\phi}}italic_h start_POSTSUBSCRIPT bold_italic_ϕ end_POSTSUBSCRIPT as a generic invertible network., specifically trained to map realizations of true Gaussian noise to the corresponding latent codes, 𝐳^isubscript^𝐳𝑖{\hat{\mathbf{z}}_{i}}over^ start_ARG bold_z end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. Adhering to a transfer learning approach, we maintain the weights of the trained CNFs while solely updating the weights of the shallower network by minimizing the following objective:

minimizeϕ𝔼𝐳N(𝟎,𝐈)[12σ2f𝜽1(hϕ(𝐳);𝐲¯obs)𝐲obs22+12hϕ(𝐳)22log|det𝐉hϕ|].bold-italic-ϕminimizesubscript𝔼similar-to𝐳N0𝐈delimited-[]12superscript𝜎2superscriptsubscriptnormsuperscriptsubscript𝑓superscript𝜽1subscriptbold-italic-ϕ𝐳subscript¯𝐲obssubscript𝐲obs2212superscriptsubscriptnormsubscriptbold-italic-ϕ𝐳22subscript𝐉subscriptbold-italic-ϕ\underset{\boldsymbol{\phi}}{\operatorname{minimize}}\quad\mathbb{E}_{\mathbf{% z}\sim\mathrm{N}(\mathbf{0},\mathbf{I})}\left[\frac{1}{2\sigma^{2}}\|\mathcal{% F}\circ f_{\boldsymbol{\theta}^{\ast}}^{-1}\left(h_{\boldsymbol{\phi}}\left(% \mathbf{z}\right);\overline{\mathbf{y}}_{\mathrm{obs}}\right)-\mathbf{y}_{% \mathrm{obs}}\|_{2}^{2}+\frac{1}{2}\|h_{\boldsymbol{\phi}}\left(\mathbf{z}% \right)\|_{2}^{2}-\log\left|\det\mathbf{J}_{h_{\boldsymbol{\phi}}}\right|% \right].underbold_italic_ϕ start_ARG roman_minimize end_ARG blackboard_E start_POSTSUBSCRIPT bold_z ∼ roman_N ( bold_0 , bold_I ) end_POSTSUBSCRIPT [ divide start_ARG 1 end_ARG start_ARG 2 italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ∥ caligraphic_F ∘ italic_f start_POSTSUBSCRIPT bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_h start_POSTSUBSCRIPT bold_italic_ϕ end_POSTSUBSCRIPT ( bold_z ) ; over¯ start_ARG bold_y end_ARG start_POSTSUBSCRIPT roman_obs end_POSTSUBSCRIPT ) - bold_y start_POSTSUBSCRIPT roman_obs end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + divide start_ARG 1 end_ARG start_ARG 2 end_ARG ∥ italic_h start_POSTSUBSCRIPT bold_italic_ϕ end_POSTSUBSCRIPT ( bold_z ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - roman_log | roman_det bold_J start_POSTSUBSCRIPT italic_h start_POSTSUBSCRIPT bold_italic_ϕ end_POSTSUBSCRIPT end_POSTSUBSCRIPT | ] . (1)

Here, the refinement network, hϕsubscriptbold-italic-ϕh_{\boldsymbol{\phi}}italic_h start_POSTSUBSCRIPT bold_italic_ϕ end_POSTSUBSCRIPT, mitigates the amortization gap by adjusting the latent variable 𝐳𝐳\mathbf{z}bold_z before feeding it to the inverse of the trained CNFs, f𝜽1superscriptsubscript𝑓superscript𝜽1f_{\boldsymbol{\theta}^{\ast}}^{-1}italic_f start_POSTSUBSCRIPT bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT. Intuitively, minimizing the first terms ties the posterior samples closer to the observed data. The second and third terms prevent the corrected latent space from being far from the Gaussian distribution, which implicitly takes advantage of the prior information existing in the amortized training phase.

Equation 1 offers a fine-tuning approach that leverages the full multi-D wave physics to refine the amortized VI framework for a single observation at inference phase. However, it introduces notable computational demands because it necessitates the coupling of the modeling operator and the networks. Specifically, every update to the network weights, ϕbold-italic-ϕ\boldsymbol{\phi}bold_italic_ϕ, requires costly wave modeling operations. Given that network training typically involves numerous iterations, these computational demands can render it impractical for realistic FWI applications.

To relieve this computational burden, we adopt a strategy from Siahkoohi et al. [30] to reformulate Equation 1 into a weak form by allowing the network output to be only weakly enforced (in an 2subscript2\ell_{2}roman_ℓ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT sense) to be the corrected velocity models. The objective function for this weak formulation reads:

minimize𝐱1:M,ϕ1Mi=1N[12σ2(𝐱i)𝐲obs22+12γ2𝐱if𝜽1(hϕ(𝐳i);𝐲¯obs)22+12hϕ(𝐳i)22log|det𝐉hϕ|].subscript𝐱:1𝑀bold-italic-ϕminimize1𝑀superscriptsubscript𝑖1𝑁delimited-[]12superscript𝜎2superscriptsubscriptnormsubscript𝐱𝑖subscript𝐲obs2212superscript𝛾2superscriptsubscriptnormsubscript𝐱𝑖superscriptsubscript𝑓superscript𝜽1subscriptbold-italic-ϕsubscript𝐳𝑖subscript¯𝐲obs2212superscriptsubscriptnormsubscriptbold-italic-ϕsubscript𝐳𝑖22subscript𝐉subscriptbold-italic-ϕ\underset{\mathbf{x}_{1:M},\boldsymbol{\phi}}{\operatorname{minimize}}\quad% \frac{1}{M}\sum_{i=1}^{N}\left[\frac{1}{2\sigma^{2}}\|\mathcal{F}(\mathbf{x}_{% i})-\mathbf{y}_{\mathrm{obs}}\|_{2}^{2}+\frac{1}{2\gamma^{2}}\|\mathbf{x}_{i}-% f_{\boldsymbol{\theta}^{\ast}}^{-1}\left(h_{\boldsymbol{\phi}}\left(\mathbf{z}% _{i}\right);\overline{\mathbf{y}}_{\mathrm{obs}}\right)\|_{2}^{2}+\frac{1}{2}% \|h_{\boldsymbol{\phi}}\left(\mathbf{z}_{i}\right)\|_{2}^{2}-\log\left|\det% \mathbf{J}_{h_{\boldsymbol{\phi}}}\right|\right].start_UNDERACCENT bold_x start_POSTSUBSCRIPT 1 : italic_M end_POSTSUBSCRIPT , bold_italic_ϕ end_UNDERACCENT start_ARG roman_minimize end_ARG divide start_ARG 1 end_ARG start_ARG italic_M end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT [ divide start_ARG 1 end_ARG start_ARG 2 italic_σ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ∥ caligraphic_F ( bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) - bold_y start_POSTSUBSCRIPT roman_obs end_POSTSUBSCRIPT ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + divide start_ARG 1 end_ARG start_ARG 2 italic_γ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ∥ bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - italic_f start_POSTSUBSCRIPT bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( italic_h start_POSTSUBSCRIPT bold_italic_ϕ end_POSTSUBSCRIPT ( bold_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ; over¯ start_ARG bold_y end_ARG start_POSTSUBSCRIPT roman_obs end_POSTSUBSCRIPT ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + divide start_ARG 1 end_ARG start_ARG 2 end_ARG ∥ italic_h start_POSTSUBSCRIPT bold_italic_ϕ end_POSTSUBSCRIPT ( bold_z start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - roman_log | roman_det bold_J start_POSTSUBSCRIPT italic_h start_POSTSUBSCRIPT bold_italic_ϕ end_POSTSUBSCRIPT end_POSTSUBSCRIPT | ] . (2)

We strategically decouple the computationally expensive forward operator, \mathcal{F}caligraphic_F, from the more cheap-to-evaluate networks, f𝜽subscript𝑓superscript𝜽f_{\boldsymbol{\theta}^{\ast}}italic_f start_POSTSUBSCRIPT bold_italic_θ start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT and hϕsubscriptbold-italic-ϕh_{\boldsymbol{\phi}}italic_h start_POSTSUBSCRIPT bold_italic_ϕ end_POSTSUBSCRIPT. This is achieved in a penalty form with the assumption that the misfit between the network outputs and the posterior samples adheres to a Gaussian distribution, N(𝟎,γ2𝐈)N0superscript𝛾2𝐈\mathrm{N}(\mathbf{0},\gamma^{2}\mathbf{I})roman_N ( bold_0 , italic_γ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT bold_I ), where γ𝛾\gammaitalic_γ is a hyperparameter dictating the trade-off between data misfit and regularization. Setting γ𝛾\gammaitalic_γ to infinity equates this weak formulation to the constrained formulation in Equation 1. This weak formulation also supports optimization strategies for updating the velocity models with physical constraints [31, 32] and multiscale optimization techniques [33].

WISER takes full computational advantage of this weak formulation by employing a nested loop structure. The outer loop is dedicated to updating M𝑀Mitalic_M velocity models, 𝐱isubscript𝐱𝑖\mathbf{x}_{i}bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, through costly gradient descent steps (lines 24—27 of Algorithm 1), while the inner loop (lines 28—31) focuses on more updates (with the ADAM optimizer) to network weights, ϕbold-italic-ϕ\boldsymbol{\phi}bold_italic_ϕ, without computationally expensive physics modeling. To achieve a balance, we conduct maxiter2=128subscriptmaxiter2128\mathrm{maxiter}_{2}=128roman_maxiter start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = 128 iterations in the inner loop. After refinements, WISER first evaluates the refined network on the latent variables to obtain refined latent codes. Subsequently, the amortized network uses the refined codes conditioned on the CIGs to compute the corrected posterior samples (line 34).

4 Case studies

Evaluation of WISER is conducted through synthetic case studies utilizing 2D slices of the Compass model and 2D acoustic wave physics. The parameter of interest is discretized into 512×256512256512\times 256512 × 256 gridpoints with a spatial resolution of 12.5m12.5m12.5\,\mathrm{m}12.5 roman_m, resulting in over 105superscript10510^{5}10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT degrees of freedom. The forward operator, \mathcal{F}caligraphic_F, simulates acoustic data with absorbing boundaries. A Ricker wavelet with a central frequency of 15Hz15Hz15\,\mathrm{Hz}15 roman_Hz and an energy cut below 3Hz3Hz3\,\mathrm{Hz}3 roman_Hz is employed. We use 512512512512 sources towed at 12.5m12.5m12.5\,\mathrm{m}12.5 roman_m depth and 64646464 ocean-bottom nodes (OBNs) located at jittered sampled horizontal positions [34]. We employ source-receiver reciprocity during the modeling and sensitivity calculations. The observed data, 𝐲obssuperscript𝐲obs\mathbf{y}^{\mathrm{obs}}bold_y start_POSTSUPERSCRIPT roman_obs end_POSTSUPERSCRIPT, is perturbed with band-limited Gaussian noise to achieve a signal-to-noise ratio (S/N) of 12dB12dB12\,\mathrm{dB}12 roman_dB. The training of the CNFs uses N=800𝑁800N=800italic_N = 800 pairs of velocity models and CIGs. To demonstrate WISER’s efficacy in mitigating the amortization gap, we compare results from WISE and WISER under two scenarios during inference:

  1. (i)

    observed shot data is generated using an in-distribution velocity model with the same forward operator;

  2. (ii)

    observed shot data is produced by an out-of-distribution (OOD) velocity model and also a slightly altered forward operator.

4.1 Case 1: in distribution

The ground-truth velocity model is an unseen 2D slice from the Compass model, shown in Figure 1a. Following Algorithm 1, we initiate WISER by drawing M=16𝑀16M=16italic_M = 16 Gaussian noise realizations to create the initial set of 16161616 velocity models, depicted in Figure 1b. To minimize computational demands, stochastic gradients [35] are calculated in line 25 of Algorithm 1. Each particle’s gradient is estimated using only 1111 randomly selected OBN gather from the observed data. We also add box constraints to the velocity models to restrict their range to 1.481.481.481.48 to 4.44km/s4.44kms4.44\,\mathrm{km/s}4.44 roman_km / roman_s. Following maxiter1=80subscriptmaxiter180\mathrm{maxiter}_{1}=80roman_maxiter start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = 80 outer iterations—equivalent to 20202020 data passes or 2560256025602560 PDE solves3331 PDE solve means solving the wave equation for a single source. A gradient requires 2 PDE solves (forward and adjoint).—–we obtain the posterior samples from WISER in Figure 1c.

Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Refer to caption
(d)
Refer to caption
(e)
Refer to caption
(f)
Refer to caption
(g)
Figure 1: Comparison between WISE and WISER for an in-distribution case. (a) Unseen ground-truth velocity model. (b) Estimated velocity models from WISE. The conditional mean estimate (CM) is shown in the center. For posterior samples, horizontal traces at Z=2.7km𝑍2.7kmZ=2.7\,\mathrm{km}italic_Z = 2.7 roman_km and vertical traces at X=3.6km𝑋3.6kmX=3.6\,\mathrm{km}italic_X = 3.6 roman_km are displayed on the top and on the right, respectively. (d) Imaged reflectivity samples from WISE. (f) Zoom-in views of (d) overlaying on the CM of WISE. (c)(e)(g) are the counterparts from WISER, showcasing significant improvements.

4.1.1 Observations

The conditional mean estimate (CM) from WISE lacks finer details, particularly beneath the unconformity at depths below 2.4km2.4km2.4\,\mathrm{km}2.4 roman_km (in red). This is attributed to the excessive variability in structural details of the posterior samples, visible on the right panel of Figure 1b.

In contrast, WISER generates more consistent and accurate posterior samples. In Figure 1c, the right panel shows that the uncertainty from WISER is reduced below the unconformity. The upper panel illustrates that the uncertainty is more focused on the dipping events at the unconformity, highlighting areas of poor illumination.

4.1.2 Impact on imaging

To assess the impact of uncertainty in velocity models on downstream tasks, we conduct high-frequency imaging using a Ricker wavelet with central frequency of 30Hz30Hz30\,\mathrm{Hz}30 roman_Hz and compare the imaged reflectivities derived from the posterior samples of both WISE and WISER, shown in Figure 1d and Figure 1e, respectively.

The imaged reflectivities produced by CM from WISER exhibit superior continuity and a better correlation with the CM migration-velocity model, particularly noticeable in Figure 1g under the unconformity. Also, reflectivity samples produced by WISER demonstrate improved alignment among themselves compared to those produced by WISE. In addition, notable vertical shifts observed in the imaged reflectivities from WISE to WISER indicate significant adjustments in the positioning of subsurface reflectors, underlining the necessity of the refinement procedure for precisely estimating migration-velocity models that locate subsurface reflectors more accurately.

4.2 Case 2: out of distribution

To test the robustness and adaptability of WISER when faced with unexpected variations at inference, we also evaluate WISER’s performance under OOD scenarios. We introduce alterations to the velocity model depicted in Figure 1a through an elementwise perturbation shown in Figure 2a. This manipulation modifies the velocity values across different depth levels, resulting in a significant shift in their statistical distribution, illustrated in Figure 2b. We use the perturbed velocity as the unseen ground-truth velocity model in this case study, shown in Figure 2c. To further expand the amortization gap, we modify the encoding of the forward operator by introducing a higher amplitude of band-limited Gaussian noise (S/N 0dB0dB0\,\mathrm{dB}0 roman_dB).

These complexities present substantial challenges for WISE, leading to biased inference results as depicted in Figure 2d. The yellow histograms in Figure 2b show that the velocity values of the posterior samples from WISE closely resemble those of the original velocity model, despite the different distribution of the ground-truth velocity model. This indicates that WISE tends to incorporate an inductive bias from the training samples. In WISER, we conduct maxiter1=160subscriptmaxiter1160\mathrm{maxiter}_{1}=160roman_maxiter start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT = 160 outer iterations, using M=16𝑀16M=16italic_M = 16 particles and 1111 OBN per gradient. We also employ the frequency continuation method [33] to compute the gradient in line 25 of Algorithm 1, transitioning gradually from low-frequency to high-frequency data. This results in 40404040 datapasses or 5120512051205120 PDE solves in total.

Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Refer to caption
(d)
Refer to caption
(e)
Refer to caption
(f)
Refer to caption
(g)
Refer to caption
(h)
Refer to caption
(i)
Refer to caption
(j)
Refer to caption
(k)
Figure 2: OOD case study. (a) Curves for velocity-value perturbations; (b) histograms of values at the depth of 0.5km0.5km0.5\,\mathrm{km}0.5 roman_km and 2.8km2.8km2.8\,\mathrm{km}2.8 roman_km in the original velocity model (Figure 1a), perturbed velocity model (Figure 2c), posterior samples of WISE, and WISER, shown in red, blue, yellow and green color, respectively. (c)—(i) Comparison between WISE and WISER. The ordering remains the same as in Figure 1. (j) FWI result starting with a posterior sample from WISE. (k) A posterior sample from WISER.

4.2.1 Observations

WISER produces more accurate posterior samples shown in Figure 2e. Furthermore, the statistical distribution of the velocity values in the WISER posterior samples (green histogram in Figure 2b) aligns better with the distribution of the unseen ground-truth velocity values (blue histogram in Figure 2b), demonstrating WISER’s robustness against potential distribution shifts during inference.

To further showcase WISER’s robustness under severe measurement noise, we compare a posterior sample from WISER (Figure 2k) to a velocity model estimated by FWI (Figure 2j), derived by minimizing only the data likelihood (the first term in line 25 of Algorithm 1), while starting from the same initial model as WISER. The FWI result is significantly impacted by noise, while the posterior samples from WISER remain relatively noise-free and capture all pertinent geological structures.

4.2.2 Impact on imaging

The imaging results from WISE (Figure 2f) and WISER (Figure 2g) reveal noticeable discrepancies in quality. The CM migration-velocity model from WISE leads to discontinuities in the imaged reflectivities, particularly at the horizontal layer around 1.8km1.8km1.8\,\mathrm{km}1.8 roman_km depth and more so below the unconformity. In contrast, the CM from WISER significantly improves the continuity of the imaged reflectivities across the entire seismic section. The imaged reflectivity samples from WISER not only show better consistency among themselves but also align more accurately with the estimated CM migration-velocity model, particularly visible in Figure 2i.

5 Discussion and conclusions

The primary contribution of WISER is to leverage both genAI and physics to achieve a semi-amortized VI framework for scalable (D2D2\mathrm{D}\geq 2roman_D ≥ 2) and reliable UQ for FWI even in situations where local approximations are unsuitable. At its core, WISER harnesses the strengths of both amortized and non-amortized VI: the amortized posterior obtained through offline training provides a low-fidelity but fast mapping, and the physics-based refinements offer reliable and accurate inference. Both approaches benefit from information preservation exhibited by CIGs, rendering our inference successful where conventional FWI fails due to cycle skipping.

Compared to McMC methods that rely on low-dimensional parameterizations, WISER does not impose intrinsic dimensionality reductions or simplifications of the forward model. Therefore, WISER is capable of delivering full-resolution UQ for realistic multi-D FWI problems. CNFs are primed for large-scale 3D inversion thanks to their invertibility, which allows for memory-efficient training and inference [36].

Compared to non-amortized VI methods, WISER also requires significantly less computational resources during inference. This is because WISE, as a precursor of WISER, already provides near-accurate posterior samples, making the refinement procedure computationally feasible. Zhang et al. [22] show that non-amortized VI, without access to realistic prior information, requires O(106)Osuperscript106\mathrm{O}(10^{6})roman_O ( 10 start_POSTSUPERSCRIPT 6 end_POSTSUPERSCRIPT ) to O(108)Osuperscript108\mathrm{O}(10^{8})roman_O ( 10 start_POSTSUPERSCRIPT 8 end_POSTSUPERSCRIPT ) PDE solves, while WISER only needs O(103)Osuperscript103\mathrm{O}(10^{3})roman_O ( 10 start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT ) PDE solves. Apart from the computational cost reduction, WISER ensures that the posterior samples realistically resemble Earth models, thanks to the integration of the conditional prior information from WISE. Contrary to non-amortized VI, which needs density evaluations to embed the prior (i.e., p(𝐱)𝑝𝐱p(\mathbf{x})italic_p ( bold_x ) in line 5 of Algorithm 1) to produce realistic Earth models, WISER only needs access to samples of the prior distribution (i.e., 𝐱(i)superscript𝐱𝑖\mathbf{x}^{(i)}bold_x start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT in line 5).

Opportunities for future research remain. Although case 2 demonstrates the robustness of WISER concerning OOD issues, these issues could be fundamentally addressed by diversifying the training set of WISE through a foundation model [37]. Also, our OOD case study has not yet explored scenarios where the likelihood term is more pathologically misspecified, such as the presence of unremoved shear wave energy outside the range of the forward operator, which calls for further investigations. Our approach will also benefit from calibration of the estimated posterior on which the authors report elsewhere, including application of WISE(R) in 3D.

In conclusion, this paper sets the stage for deploying genAI models to facilitate high-dimensional Bayesian inference with computationally intensive multi-D forward operators. Deep learning and AI have been criticized for their reliance on realistic training samples, but WISER alleviates this reliance and still offers computationally feasible and reliable inference through a blend of offline training and online frugal physics-based refinements, preparing our approach for large 3D deployment.

6 Acknowledgement

This research was carried out with the support of Georgia Research Alliance and partners of the ML4Seismic Center.

7 Related material

The scripts to reproduce the experiments are available on the SLIM GitHub page \seqsplithttps://github.com/slimgroup/WISER.jl.

8 Declaration of generative AI and AI-assisted technologies in the writing process

During the preparation of this work, the authors used ChatGPT to refine sentence structures and improve the readability of the manuscript. After using this service, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

References

  • Virieux and Operto [2009] J. Virieux and S. Operto. An overview of full-waveform inversion in exploration geophysics. GEOPHYSICS, 74(6):WCC1–WCC26, 11 2009. doi: 10.1190/1.3238367. URL http://dx.doi.org/10.1190/1.3238367.
  • Cranmer et al. [2020] Kyle Cranmer, Johann Brehmer, and Gilles Louppe. The frontier of simulation-based inference. Proceedings of the National Academy of Sciences, 117(48):30055–30062, 2020.
  • Fang et al. [2018] Zhilong Fang, Curt Da Silva, Rachel Kuske, and Felix J Herrmann. Uncertainty quantification for inverse problems with weak partial-differential-equation constraints. Geophysics, 83(6):R629–R647, 2018.
  • Keating and Innanen [2021] Scott D Keating and Kristopher A Innanen. Null-space shuttles for targeted uncertainty analysis in full-waveform inversion. Geophysics, 86(1):R63–R76, 2021.
  • Izzatullah et al. [2023] Muhammad Izzatullah, Matteo Ravasi, and Tariq Alkhalifah. Physics reliable frugal uncertainty analysis for full waveform inversion. arXiv preprint arXiv:2305.07921, 2023.
  • Hoffmann et al. [2024] Alexandre Hoffmann, Romain Brossier, Ludovic Métivier, and Alizia Tarayoun. Local uncertainty quantification for 3-d time-domain full-waveform inversion with ensemble kalman filters: application to a north sea obc data set. Geophysical Journal International, 237(3):1353–1383, 2024.
  • Cowles and Carlin [1996] Mary Kathryn Cowles and Bradley P Carlin. Markov chain monte carlo convergence diagnostics: a comparative review. Journal of the American statistical Association, 91(434):883–904, 1996.
  • Fang et al. [2020] Zhilong Fang, Hongjian Fang, and Laurent Demanet. Deep generator priors for bayesian seismic inversion. In Advances in geophysics, volume 61, pages 179–216. Elsevier, 2020.
  • Liang et al. [2023] Zhouji Liang, Florian Wellmann, and Omar Ghattas. Uncertainty quantification of geologic model parameters in 3d gravity inversion by hessian-informed markov chain monte carlo. Geophysics, 88(1):G1–G18, 2023.
  • Wei et al. [2023] Xiaolong Wei, Jiajia Sun, and Mrinal K Sen. Quantifying uncertainty of salt body shapes recovered from gravity data using trans-dimensional markov chain monte carlo sampling. Geophysical Journal International, 232(3):1957–1978, 2023.
  • Wei et al. [2024a] Xiaolong Wei, Jiajia Sun, and Mrinal Sen. 3d monte carlo geometry inversion using gravity data. Geophysics, 89(3):1–62, 2024a.
  • Wei et al. [2024b] Xiaolong Wei, Jiajia Sun, and Mrinal K Sen. Reconstruction of multiple target bodies using trans-dimensional bayesian inversion with different constraints. IEEE Transactions on Geoscience and Remote Sensing, 2024b.
  • Dhabaria and Singh [2024] Nirmit Dhabaria and Satish C Singh. Hamiltonian monte carlo based elastic full-waveform inversion of wide-angle seismic data. Geophysical Journal International, 237(3):1384–1399, 2024.
  • Zhang et al. [2021] Xin Zhang, Muhammad Atif Nawaz, Xuebin Zhao, and Andrew Curtis. An introduction to variational inference in geophysical inverse problems. In Advances in geophysics, volume 62, pages 73–140. Elsevier, 2021.
  • Baldassari et al. [2024] Lorenzo Baldassari, Ali Siahkoohi, Josselin Garnier, Knut Solna, and Maarten V de Hoop. Conditional score-based diffusion models for bayesian inference in infinite dimensions. Advances in Neural Information Processing Systems, 36, 2024.
  • Winkler et al. [2019] Christina Winkler, Daniel Worrall, Emiel Hoogeboom, and Max Welling. Learning likelihoods with conditional normalizing flows. arXiv preprint arXiv:1912.00042, 2019.
  • Siahkoohi et al. [2023] Ali Siahkoohi, Gabrio Rizzuti, Rafael Orozco, and Felix J. Herrmann. Reliable amortized variational inference with physics-based latent distribution correction. GEOPHYSICS, 88(3):R297–R322, 04 2023. doi: 10.1190/geo2022-0472.1. URL http://dx.doi.org/10.1190/geo2022-0472.1.
  • Orozco et al. [2023a] Rafael Orozco, Mathias Louboutin, Ali Siahkoohi, Gabrio Rizzuti, Tristan van Leeuwen, and Felix Johan Herrmann. Amortized normalizing flows for transcranial ultrasound with uncertainty quantification. In Medical Imaging with Deep Learning, 2023a. URL https://openreview.net/forum?id=LoJG-lUIlk.
  • Yin et al. [2024] Ziyi Yin, Rafael Orozco, Mathias Louboutin, and Felix J Herrmann. Wise: full-waveform variational inference via subsurface extensions. Geophysics, 89(4):1–31, 2024.
  • Marino et al. [2018] Joe Marino, Yisong Yue, and Stephan Mandt. Iterative amortized inference. In International Conference on Machine Learning, pages 3403–3412. PMLR, 2018.
  • Zhao et al. [2022] Xuebin Zhao, Andrew Curtis, and Xin Zhang. Bayesian seismic tomography using normalizing flows. Geophysical Journal International, 228(1):213–239, 2022.
  • Zhang et al. [2023] Xin Zhang, Angus Lomas, Muhong Zhou, York Zheng, and Andrew Curtis. 3-d bayesian variational full waveform inversion. Geophysical Journal International, 234(1):546–561, 2023.
  • Zhang and Curtis [2024] Xin Zhang and Andrew Curtis. Bayesian variational time-lapse full waveform inversion. Geophysical Journal International, 237(3):1624–1638, 2024.
  • Kruse et al. [2021] Jakob Kruse, Gianluca Detommaso, Ullrich Köthe, and Robert Scheichl. Hint: Hierarchical invertible neural transport for density estimation and bayesian inference. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 8191–8199, 2021.
  • Jones et al. [2012] CE Jones, JA Edgar, JI Selvage, and H Crook. Building complex synthetic models to evaluate acquisition geometries and velocity inversion technologies. In 74th EAGE Conference and Exhibition incorporating EUROPEC 2012, pages cp–293. European Association of Geoscientists & Engineers, 2012.
  • Hou and Symes [2016] Jie Hou and William W. Symes. Accelerating extended least-squares migration with weighted conjugate gradient iteration. GEOPHYSICS, 81(4):S165–S179, 07 2016. doi: 10.1190/geo2015-0499.1. URL http://dx.doi.org/10.1190/geo2015-0499.1.
  • ten Kroode [2023] Fons ten Kroode. An omnidirectional seismic image extension. Inverse Problems, 39(3):035003, 2023.
  • Radev et al. [2020] Stefan T Radev, Ulf K Mertens, Andreas Voss, Lynton Ardizzone, and Ullrich Köthe. Bayesflow: Learning complex stochastic models with invertible neural networks. IEEE transactions on neural networks and learning systems, 33(4):1452–1466, 2020.
  • Orozco et al. [2023b] Rafael Orozco, Ali Siahkoohi, Gabrio Rizzuti, Tristan van Leeuwen, and Felix Herrmann. Adjoint operators enable fast and amortized machine learning based bayesian uncertainty quantification. Medical Imaging 2023: Image Processing, 04 2023b. doi: 10.1117/12.2651691. URL http://dx.doi.org/10.1117/12.2651691.
  • Siahkoohi et al. [2020] Ali Siahkoohi, Gabrio Rizzuti, and Felix J Herrmann. Weak deep priors for seismic imaging. In SEG Technical Program Expanded Abstracts 2020, pages 2998–3002. Society of Exploration Geophysicists, 2020.
  • Esser et al. [2018] Ernie Esser, Lluis Guasch, Tristan van Leeuwen, Aleksandr Y Aravkin, and Felix J Herrmann. Total variation regularization strategies in full-waveform inversion. SIAM Journal on Imaging Sciences, 11(1):376–406, 2018.
  • Peters et al. [2019] Bas Peters, Brendan R Smithyman, and Felix J Herrmann. Projection methods and applications for seismic nonlinear inverse problems with multiple constraints. Geophysics, 84(2):R251–R269, 2019.
  • Bunks et al. [1995] Carey Bunks, Fatimetou M Saleck, S Zaleski, and G Chavent. Multiscale seismic waveform inversion. Geophysics, 60(5):1457–1473, 1995.
  • Hennenfent and Herrmann [2008] Gilles Hennenfent and Felix J Herrmann. Simply denoise: Wavefield reconstruction via jittered undersampling. Geophysics, 73(3):V19–V28, 2008.
  • Herrmann et al. [2013] Felix J Herrmann, Ian Hanlon, Rajiv Kumar, Tristan van Leeuwen, Xiang Li, Brendan Smithyman, Haneet Wason, Andrew J Calvert, Mostafa Javanmehri, and Eric Takam Takougang. Frugal full-waveform inversion: From theory to a practical algorithm. The Leading Edge, 32(9):1082–1092, 2013.
  • Orozco et al. [2023c] Rafael Orozco, Philipp Witte, Mathias Louboutin, Ali Siahkoohi, Gabrio Rizzuti, Bas Peters, and Felix J Herrmann. Invertiblenetworks.jl: A julia package for scalable normalizing flows. arXiv preprint arXiv:2312.13480, 2023c.
  • Sheng et al. [2023] Hanlin Sheng, Xinming Wu, Xu Si, Jintao Li, Sibio Zhang, and Xudong Duan. Seismic foundation model (sfm): a new generation deep learning model in geophysics. arXiv preprint arXiv:2309.02791, 2023.