Neural Born Series Operator for Biomedical Ultrasound Computed Tomography

Zhijun Zeng These authors contributed equally to this work. Department of Mathematical Sciences, Tsinghua University, Beijing, China. DP Technology, Ltd., Beijing, China. Yihang Zheng College of Future Technology, Peking University, Beijing, China. National Biomedical Imaging Center, Peking University, Beijing, China. Youjia Zheng College of Future Technology, Peking University, Beijing, China. National Biomedical Imaging Center, Peking University, Beijing, China. Yubing Li Institute of Acoustics, Chinese Academy of Sciences, Beijing, China. Zuoqiang Shi Yau Mathematical Sciences Center, Tsinghua University, Beijing, China. Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing, China. He Sun Corresponding author: hesun@pku.edu.cn College of Future Technology, Peking University, Beijing, China. National Biomedical Imaging Center, Peking University, Beijing, China.

Abstract

Ultrasound Computed Tomography (USCT) provides a radiation-free option for high-resolution clinical imaging. Despite its potential, the computationally intensive Full Waveform Inversion (FWI) required for tissue property reconstruction limits its clinical utility. This paper introduces the Neural Born Series Operator (NBSO), a novel technique designed to speed up wave simulations, thereby facilitating a more efficient USCT image reconstruction process through an NBSO-based FWI pipeline. Thoroughly validated on comprehensive brain and breast datasets, simulated under experimental USCT conditions, the NBSO proves to be accurate and efficient in both forward simulation and image reconstruction. This advancement demonstrates the potential of neural operators in facilitating near real-time USCT reconstruction, making the clinical application of USCT increasingly viable and promising.

1 Introduction

Ultrasound Computed Tomography (USCT) emerges as an innovative radiation-free imaging approach with exceptional potential for high-resolution clinical imaging of human tissues [33, 9, 17]. Unlike conventional B-mode ultrasound, which demands manual operation by physicians and relies solely on reflected signals for image formation, USCT employs specialized configurations such as annular, cylindrical, or hemispherical transducer arrays [40, 9, 5]. This arrangement enables the simultaneous acquisition of both transmitted and reflected signals from wave-tissue interactions by sequentially emitting waves from each transducer and measuring signals using the remaining ones [12, 41, 49].

Refer to caption — Figure 1: Schematic diagram of a USCT system: (a) The imaging target is positioned within an annular transducer array, where each transducer emits waves in sequence, while the remaining transducers serve as receivers; (b) This setup results in measurement data forming a square matrix in the frequency domain, with each row representing the signals received from a single source.

Leveraging this approach, USCT holds the potential to retrieve 2D/3D structural information of biological tissues [30, 2, 9, 47, 28] akin to medical X-ray computed tomography (CT). However, unlike X-ray CT, the ultrasonic wavelength is comparable to the target’s scale, rendering the straight-ray approximation invalid in USCT [45]. Consequently, wavefields modulated by tissues show significant scattering effects. To enhance image quality, USCT typically employs partial differential equations (PDEs) to model ultrasonic wave propagation and solves a PDE-constrained nonlinear inverse problem to reconstruct tissue properties, such as attenuation and sound speed [3, 33]. This technology is commonly referred to as full waveform inversion (FWI). FWI, while effective, requires multiple iterations of PDE simulations for a single reconstruction. Due to the computationally intensive nature of numerical PDE solvers, achieving quasi-real-time FWI image reconstruction remains a challenge [2, 23]. This limitation restricts the broader application of FWI and USCT in clinical diagnosis, particularly in first-aid scenarios like promptly identifying stroke types (ischemic or hemorrhagic) [9].

Neural operators have defined a novel deep learning framework for efficient PDE-based physics simulations [24, 19, 20, 25]. Grounded in the operator theory, they approximate complex mappings between PDE parameters and their physical field outputs within function spaces [15]. This methodology effectively merges the efficiency of neural networks and the modeling accuracy of PDEs, demonstrating remarkable potential in diverse applications. However, despite their computational speed, current neural operators often struggle to achieve sufficient accuracy when solving acoustic equations, including time-domain wave equations and frequency-domain Helmholtz equations, particularly in practical imaging scenarios. In biomedical imaging, for instance, the requirement for a large field of view (FOV), often exceeding 50 wavelengths, results in highly oscillatory solutions that challenge existing neural operator models.

In this paper, we present the Neural Born Series Operator (NBSO), a novel operator learning method tailored for efficiently solving large-scale Helmholtz equations. Drawing inspiration from the advanced numerical Helmholtz equation solver, Convergent Born Series (CBS) [31, 14], the NBSO leverages the intrinsic structure of the Helmholtz equation in its neural network architecture design. Rigorously trained and validated on USCT datasets that simulate a real-world system, NBSO proves to be a highly accurate and efficient surrogate model for addressing biomedical FWI. This paper makes three key contributions to the field:

•

Large-Scale USCT Dataset Construction: We create a comprehensive Helmholtz equation dataset based on real USCT system parameters, featuring over 600,000 pairs of sound speed and wavefield for brain and breast tissues.
•

Innovative NBSO Framework: We introduce the NBSO, achieving unparalleled precision among neural operators for solving the highly oscillatory Helmholtz equations prevalent in biomedical imaging.
•

NBSO-based in Biomedical FWI: We apply NBSO to biomedical FWI and demonstrate its potential to reconstruct high-quality breast and brain images at a speed 100 times faster than conventional methods, paving the way for quasi-real-time USCT imaging.

2 Related Works

2.1 Full Waveform Inversion

FWI addresses a PDE-constrained nonlinear inverse problem in imaging, offering a breakthrough solutions in reconstructing properties of complex media. Initially prominent in geophysical exploration for oil and gas reservoir imaging[36, 22, 11], FWI integrates wave physics and multiple scattering effects for data interpretation, surpassing traditional ray approximation-based methods [43] in imaging quality. FWI’s utility has recently expanded to the biomedical imaging sector, with notable applications in optical diffraction tomography (ODT) for imaging thick biological samples [16], and USCT for high-resolution imaging of breasts [28, 41], limbs [3, 8], and brains [9, 18]. Despite these advancements, FWI’s translation to clinical settings remains limited due to its significant computational demands, particularly in the scenarios requiring large field of view. Presently, single reconstruction tasks may take hours or even days on high-performance computing clusters [9, 16]. Accelerating FWI is essential for its broader adoption and future progress, especially in biomedical imaging.

2.2 Neural Operators

Neural operators defines a novel deep-learning paradigm that approximates mappings between infinite-dimensional function spaces, in contrast to conventional neural networks restricted to finite-dimensional Euclidean spaces. This capability positions them as a promising solution for complex, large-scale PDEs, historically challenging in practical applications. A notable implementation in this domain is the Deep Operator Network (DeepONet) [24, 25, 4, 29, 7, 21], featuring a distinctive “branch and trunk” network structure. Alternatively, the Fourier Neural Operator (FNO) [20, 19, 32, 39, 10] creatively parameterizes the integral kernel in Fourier space, enhancing a neural network’s expressiveness and efficiency for operator learning. Both techniques have demonstrated reasonable accuracy, high inference efficiency, and robustness against noise and generalization errors [26]. Spurred by these initial successes, various specialized forms have also been proposed for solving PDE of specific forms. For example, Learned Born Series (LBO)[35] and Born FNO (BFNO)[48] have been proposed for handling Helmholtz equations by modifying FNO based on Born series. Despite their effectiveness in managing complex PDEs, neural operators typically lag behind traditional methods in computational accuracy, which warrants further investigation.

2.3 Neural Operators for PDE Inverse Problems

Thanks to their impressive approximation capabilities and negligible computational speed, neural operators are increasingly employed to solve PDE-constrained inverse problems in diverse fields such as turbulent flows [20], solid mechanics [38], weather forecasting [32], material designs [27] and nano-photonics [37]. There are also growing interests in neural operator-based FWI, with many studies focusing on directly learning the mapping from observed data to parametric images of physical properties, as seen in geophysics [46, 50] and ultrasound imaging [23]. However, this data-to-image mapping is often non-unique due to limited observations, leading to generalization challenges. An alternative is to use neural operators as surrogate forward models to accelerate PDE solving, mapping physical properties to data, followed by inversion using a gradient-based optimizer. This approach enhances generalization and robustness, demonstrated in seismic imaging [44], electromagnetic imaging [48], and time-of-flight tomography in ultrasound [6]. Yet, most of these methods focus on limited computational domains (under 30 wavenumbers) to ease learning highly-oscillatory features. The potential of neural operator-based FWI, particularly in biomedical imaging applications like USCT with larger computational domains (around 100 wavenumbers), remains unexplored.

3 Problem Definition

3.1 Image Formation Model of USCT

USCT acquires data by first activating point sources with known locations and subsequently measuring the corresponding scattered acoustic signals that have been modulated by internal biological tissues. In steady-state conditions, the propagation of ultrasound waves can be accurately described by a heterogeneous Helmholtz equation, assuming negligible shear motion and nonlinear effects,

{\left[\nabla^{2}+\left(\frac{\omega}{c(x)}\right)^{2}\right]u(x)}=-\rho(x),

(1)

where $r=|x|$ , $\omega$ is the angular frequency of ultrasound waves, $c(x)$ is the spatial distribution of sound speed, $\rho(x)$ is the source term and $u(x)$ is the complex acoustic field. In the context of USCT, the heterogeneity in sound speed $c(x)$ is confined to a specific domain of interest (DOI), while the sound speed remains constant $c_{0}$ outside of this domain. To account for this boundary condition, the Helmholtz equation must satisfy the Sommerfeld radiation condition:

\lim_{r\rightarrow\infty}r^{\frac{n-1}{2}}\left(\frac{\partial}{\partial r}-i% \frac{\omega}{c_{0}}\right)u(x)=0.

(2)

A full USCT observation employs a sequential activation scheme, where point sources, represented by $\rho_{k}(x)$ ,are activated one at a time using $N$ transducers arranged in an annular configuration. The corresponding scattered acoustic fields are measured at the same $M$ transducer locations, resulting in a measurement dataset denoted as $y=\{y_{k};y_{k}=u_{k}(x_{f})+n_{k}\}$ , where $k=1,\ldots,M$ represents the point source index, $x_{f}\in\mathbb{R}^{M}$ defines the locations of all transducers, and $n_{k}$ is the measurement noise.

3.2 Inverse Problem: Frequency-domain FWI

The central goal of USCT is to reconstruct the spatial distribution of sound speed within biological tissues, denoted by $c(x)$ , using the measurements obtained from the transducers, represented by $y$ . This inverse problem can be formulated as a PDE-constrained optimization problem:

\begin{split}\underset{c(x),u_{k}(x)}{\min}&\ L=\sum_{k=1}^{M}L_{k}=\sum_{k=1}% ^{M}\|y_{k}-u_{k}(x_{f})\|_{2}^{2}\\ s.t.&{\left[\nabla^{2}+\left(\frac{\omega}{c(x)}\right)^{2}\right]u_{k}(x)}=-% \rho_{k}(x).\end{split}

(3)

Since the Helmholtz equation governs steady-state wave propagation, namely the wavefield in the frequency domain, this inverse problem is commonly referred to as frequency-domain full waveform inversion (FD-FWI). Akin to many other inverse problems, FD-FWI is typically solved using gradient-based optimizers. A prevalent approach for computing the gradient, $\partial L_{k}/\partial c$ , in FWI is the adjoint method,

\begin{split}{\left[\nabla^{2}+\left(\frac{\omega}{c(x)}\right)^{2}\right]u_{k% }(x)}&=-\rho_{k}(x),\\ {\left[\nabla^{2}+\left(\frac{\omega}{c(x)}\right)^{2}\right]\lambda_{k}(x)}&=% \sum_{i=1}^{M}[u_{k}(x_{f}^{(i)})-y_{k}^{(i)}]\delta(x_{f}^{(i)}),\\ \frac{\partial L_{k}}{\partial c}(x)&=-2\omega^{2}\frac{\lambda_{k}^{\star}(x)% u_{k}(x)}{c(x)^{3}},\end{split}

(4)

where $i$ denotes the index of USCT transducers and $\delta(\cdot)$ defines a normalized point source at a specific transducer location. The gradient is proportional to the product of two wavefields, where $u_{k}(x)$ is the forward simulation result for source $\rho_{k}$ and $\lambda_{k}(x)$ arises from the backward simulation where the source term is defined based on the discrepancies between forward predictions and measured data. (See Supplementary Material for theoretical details.) Based Eq. 4, a gradient update requires solving the Helmholtz equation twice for each source, in total $2M$ times for a full observation. The computational efficiency of Helmholtz equation solvers is the primary time bottleneck for FWI.

4 Method

The earlier discussions have highlighted the challenges involved in solving large-scale Helmholtz equations for biomedical FWI. In response, this section outlines our approach which employs a novel neural operator to tackle these challenges. Section 4.1 introduces an AI-assisted pipeline for creating USCT datasets. These datasets are distinctive for their anatomically accurate targets and incorporation of real-world experimental hardware configurations. Following this, Section 4.2 introduces the NBSO, our novel neural network architecture tailored to approximate the complex, highly oscillatory solutions of Helmholtz equations. Finally, Section 4.3 details our approach for applying the NBSO to solve frequency-domain FWI.

4.1 USCT Dataset with Realistic Anatomy and Experimental Scenario

Current studies on neural operators for Helmholtz equations often employ oversimplified setups. These studies typically create heterogeneous sound speeds for operator learning based on public datasets, such as MNIST [42], or Gaussian/Uniform random fields[48, 35]. Additionally, their solution domains are confined to fewer than 20-30 wavelengths, much smaller than the real-world field of view (FOV) in biomedical USCT applications. To develop a more effective neural operator for USCT FWI, we have created a USCT dataset featuring anatomically realistic phantoms and corresponding wavefields simulated using parameters from an actual USCT system. Our dataset comprises 9590 human organ phantoms, including 8000 breasts and 1590 brains. For each phantom, we simulate wavefields from 64 different sources, resulting in a total of 613,760 input-output data pairs. Our dataset will be publicly released upon request.

4.1.1 Generation of Realistic Breast Phantoms

The breast phantoms in our study are created using a specialized tool from the Virtual Imaging Clinical Trial for Regulatory Evaluation (VICTRE) project at the US Food and Drugs Administration (US-FDA). This tool is capable of generating diverse tissue maps of breast anatomy. Subsequently, we follow [17] to build sound speed distributions based on different breast density types, including all-fatty, fibroglandular, heterogeneous, and extremely dense breasts. The breast data generation process consists of three steps: 1) First, we use the VICTRE tool to generate ten 3D breast models. 2) Second, the 3D models are converted into 2D phantoms through a process of random slicing. 3) Finally, the area surrounding the breast models is filled with water to replicate experimental conditions. This methodology has enabled us to generate a comprehensive dataset of 8000 anatomically realistic breast samples.

4.1.2 Generation of Realistic Brain Phantoms

In contrast to the breast phantoms, our brain phantoms are derived by adapting data from another prevalent transcranial imaging modality, MRI. An open-source brain MRI dataset, FastMRI [1], combined with physics-inspired style transfer techniques, is utilized to generate anatomically accurate brain sound speeds. This process also involves three steps: 1) First, we conduct semantic segmentation of T2-weighted MRI images using the SAM vision foundational model. 2) Second, each segmented MRI region is mapped to a sound speed range that aligns with physical reality. 3) Finally, the surrounding medium is assigned the sound speed of water. Employing this method, we have effectively generated 1590 brain samples. Due to MRI’s limited capacity to capture bone structures, the skull element has been intentionally excluded from our current brain phantom creation. We will incorporate the skull in data generation in our future work, as it is the most important scattering interface in brain.

4.1.3 Wavefield Simulation using Experimental USCT Settings

Our dataset’s wavefields are simulated using the experimental parameters of a real annular USCT system [41]. This system comprises 512 transducers uniformly distributed around a ring with a diameter of 22 centimeters. The system operates within a frequency range of 0.4 MHz to 1.2 MHz, corresponding to an acoustic wavelength of 1.25 mm to 3.75 mm. In our simulations, we focus on wave propagation at a fixed frequency of 0.5 MHz, so the FOV contains around 80 wavenumbers. For each target within our study, 64 wavefields are simulated. We achieve this by selecting source locations at uniform intervals, utilizing every 8th transducer in the array. The numerical solutions of Helmholtz equations are generated using the Convergent Born Series (CBS) algorithm (see Sec.4.2.1 for details).

4.2 Neural Born Series Operator

In this section, we introduce the NBSO, a proficient tool designed to accurately and efficiently approximate the implicit mapping $\mathcal{S}:(c,\rho)\mapsto u$ , as defined by the Helmholtz equation in Eq. 1. Drawing inspiration from the CBS solver for the Helmholtz equation, the NBSO utilizes an iterative scheme in developing its neural network architecture. Moreover, the inputs and outputs within each iterative block of the NBSO are carefully designed based on physical concepts, thereby enhancing the operator’s ability to approximate and generalize effectively.

4.2.1 Born Series and Convergent Born Series

By defining a scattering potential $v=\left(\frac{\omega}{c(x)}\right)^{2}-\kappa^{2}-i\epsilon$ with $\kappa^{2},\epsilon>0$ , we can reformulate the Helmholtz equation in Eq. 1 as:

\mathcal{S}^{\prime}u(x)\triangleq\left[\nabla^{2}+\kappa^{2}+i\epsilon\right]% u(x)=-\rho(x)-v(x)u(x).

(5)

This leads to the derivation of the Born series for the Helmholtz equation:

u=\mathcal{G}\rho+\mathcal{G}vu\implies u=\sum_{n=0}^{\infty}(\mathcal{G}v)^{n% }\mathcal{G}\rho,

(6)

where $\mathcal{G}=\mathcal{F}^{-1}\circ(p^{2}-\kappa^{2}-i\epsilon)^{-1}\circ% \mathcal{F}$ is the Green operator of $\mathcal{S}^{\prime}$ , $\mathcal{F}$ is the Fourier transform operator and $p^{2}=p_{x}^{2}+p_{y}^{2}$ is the squared magnitude of the Fourier transformed coordinates. While the Born Series has been effective for solving Helmholtz equations with weak scattering potentials, its convergence is limited in the presence of strongly scattering samples. This poses challenges for practical wave simulations, including biomedical USCT.

To address this limitation, Osnabrugge et al. [31] proposed a modification to the Born series by introducing a preconditioner $q=iv/\epsilon$ . This approach involves multiplying both sides of Eq. 6 with $q$ , leading to a new formulation

\begin{split}qu=q\mathcal{G}\rho+q\mathcal{G}vu\implies\quad\quad\quad\quad\\ u=\sum_{n=0}^{\infty}(\mathcal{M})^{n}(q\mathcal{M}\rho),\mathcal{M}=q\mathcal% {G}v+(1-q),\end{split}

(7)

Here, the newly introduced operator $\mathcal{M}$ maintains a spectral radius smaller than 1 with a suitable choice of $\epsilon$ and $\kappa_{0}$ , ensuring convergence for strong scattering potentials. Consequently, this modified Born series is termed the Convergent Born Series (CBS).

4.2.2 Network Architecture

The NBSO’s network architecture is crafted based on the iterative structure of the CBS method. As depicted in Fig. 3, the NBSO comprises three distinct phases: 1) encoding the sound speed $c$ and the source $\rho$ into a latent space, 2) iteratively updating the latent wavefield states $\hat{u}_{n}$ using NBSO layers, and 3) projecting the final latent state back into the physical space. This architecture is mathematically represented as follows:

\begin{split}\text{Phase $1$:}&\quad v_{c}=\mathcal{E}_{\theta_{c}}(c),\quad% \hat{u}_{0}=w_{0}=\mathcal{E}_{\theta_{\text{init}}}(c,\rho),\\ \text{Phase $2$:}&\quad w_{n+1}=\mathcal{M}_{\theta_{n}}(w_{n},v_{c}),\quad% \hat{u}_{n+1}=\hat{u}_{n}+w_{n+1},\\ \text{Phase $3$:}&\quad u_{\text{out}}=\mathcal{D}_{\theta_{\text{Pred}}}(\hat% {u}_{N}).\end{split}

(8)

In this model, $v_{c}$ is the latent state of sound speed, $\hat{u}_{n}$ is the latent state of the wavefield, $w_{n}$ is the additive correction per NBSO layer, $u_{\text{out}}$ is the output wavefield in the physical space, with $n$ ranging from $0$ to $N$ . The encoders $\mathcal{E}_{\theta_{c}}$ and $\mathcal{E}_{\theta_{\text{init}}}$ process sound speed and the initial wavefield, respectively, while $\mathcal{M}_{\theta_{n}}$ acts as a CBS-like operator to update the latent wavefield. This operator is further expanded as

\mathcal{M}_{\theta_{n}}(w_{n},v_{c})=\Phi_{\theta_{n}}\left(\mathcal{F}^{-1}% \circ R_{\theta_{n}}\circ\mathcal{F}(w_{n}v_{c})\right),

(9)

where $\Phi_{\theta_{n}}$ is a neural network and $R_{\theta_{n}}$ represents the trainable Fourier modes. Each NBSO layer is assigned independent weights to enhance the modeling capacity. The decoder $\mathcal{D}_{\theta_{\text{Pred}}}$ then predicts the wavefield in physical space.

Our NBSO aligns closely with the CBS method, differing from other Born series inspired methods in that the initial iteration state depends on both $c$ and $\rho$ , not just $\rho$ . This design significantly enhances convergence and accuracy, especially in scenarios involving strongly scattering media.

Moreover, our studies suggest that learning the complex scattering amplitude is much more efficient than approximating the entire wavefield directly. In this methodology, the complete wavefield prediction is constructed as the product of the neural network’s output (namely scattering amplitude), $u_{\text{out}}$ , and the homogeneous field for the source $\rho$ in water, $u_{\text{in}}(\rho)$ ,

u_{\text{Pred}}=(1+u_{\text{out}})u_{\text{in}}(\rho).

(10)

This approach significantly alleviates the challenge for the neural operator in terms of learning the high-frequency oscillations that are characteristic of the homogeneous field.

4.3 NBSO-based USCT FWI

Having established the NBSO as a precise and efficient surrogate model for the Helmholtz equation, we are well-prepared to compute the FWI gradient via the adjoint method, as specified in Eq. 4. This computation involves running the NBSO twice for each transducer source, which includes conducting a forward simulation followed by residual back-propagation. Armed with the calculated gradient, we then employ the Limited-memory BFGS algorithm is to reconstruct the sound speed of target biological tissues.

5 Results

In this section, we evaluate the efficacy of our proposed method for both forward simulations and inverse problems in USCT. In Sec. 5.1, we begin by comparing NBSO against three esteemed neural network baselines—FNO [20], BFNO [48], and UNet [34]—in solving forward Helmholtz equations. Subsequently, in Sec. 5.2, we delve into the performance of NBSO in FWI reconstructions across different scenarios. These outcomes are also benchmarked against results from other neural networks and conventional CBS solvers. All comparisons utilize datasets from both breast and brain USCT datasets. Implementation details is provided in the supplementary material. The code will be made publicly available upon request.

5.1 USCT Forward Simulation

In this section, we conduct extensive testing of NBSO’s prediction accuracy on our realistic USCT dataset, as described in Sec. 4.1. We implement two distinct settings for our analysis: 1) we train the neural networks using data with a fixed source located on the left-most transducer, and 2) we train neural networks utilizing data from 64 distinct sources. Essentially, the first setting assesses NBSO’s proficiency in mapping sound speed to wavefield, whereas the second evaluates its generalization capability in solving the Helmholtz Equation with varied source locations. We measure wavefield prediction accuracy quantitatively using the relative rooted mean squared error (RRMSE) between the neural network’s predicted wavefield $\hat{u}(x)$ and the true wavefield $u(x)$

RRMSE(u(x),\hat{u}(x))=\sqrt{\frac{\int|u(x)-\hat{u}(x)|^{2}dx}{\int|u(x)|^{2}% dx}}

(11)

Both breast and brain datasets are used to test all baseline methods. Considering the unequal number of breast and brain phantoms, we ensure comparable metrics by training and validating neural networks using a subset of breasts (2400 for training, 240 for validation) and the entire set of brains (1500 for training, 90 for validation).

	Single-source		Multi-source
	Brain	Breast	Brain	Breast
NBSO	19.2	20.8	19.0	20.7
FNO	20.8	21.9	19.6	23.2
BFNO	19.7	21.5	19.1	21.7
UNet	22.6	31.8	23.1	27.2
NBSO ${}^{\star}$	19.7	22.1	-	-
FNO ${}^{\star}$	21.8	22.2	-	-
BFNO ${}^{\star}$	20.1	22.2	-	-
UNet ${}^{\star}$	24.6	32.5	-	-

Table 1: Comparison of the prediction error (RRMSE) between NBSO and baseline neural networks (FNO, BFNO, UNet). Each row displays results from a specific network architecture, while each column specifies the dataset and source configuration for neural network training. Methods marked by a star do not incorporate scattering amplitude learning. NBSO consistently outperforms other baseline neural networks in prediction accuracy, especially in the multi-source setting.

The RRMSE results for NBSO and other baseline neural networks are presented in Table 1. In our experiments, we not only adhere to the original formulations (marked by superscripted star) of FNO, BFNO, and UNet, but also test their modified versions (without star) integrated with our scattering amplitude learning methodology. As demonstrated in the left half of Table 1, the scattering amplitude learning improves all frameworks in the fixed source scenario. Consequently, in the experiments with multiple sources, network performance is evaluated only with the scattering amplitude incorporated. As demonstrated in Table 1, NBSO surpasses all other baselines in both breast and brain datasets, regardless of the source configurations. Fig. 4 provides a visual comparison of the predicted wavefields for a representative breast phantom, as generated by various baselines, all trained on breast datasets and assuming multiple sources. NBSO captures high-frequency features with greater accuracy than other methods.

5.2 USCT Full Waveform Inversion

In this section, we evaluate the reconstruction quality of USCT FWI using the NBSO alongside various baseline methods. Our experimental setup simulates the USCT system with 256 transducers uniformly distributed around the ring, tasked with source activation and signal collection. As detailed in Sec. 4.3, image reconstruction for all methods is conducted using the adjoint method in conjunction with the LBFGS optimizer, and the quality of image reconstruction is assessed using SSIM and PSNR metrics. Given that our USCT FWI requires computing wavefields from multiple sources for gradient calculation, only neural networks trained in the multi-source setting are used in this section.

Table. 2 showcases the USCT FWI’s performance on 30 breast phantoms using NBSO, FNO, UNet, BFNO, and CBS solvers at different SNR levels (noise-free, 10dB, 5dB) for observed data. NBSO consistently outperforms all neural network-based methods in reconstruction quality, regardless of noise levels. This is further illustrated in Fig. 5, where we display the reconstruction results for breast phantoms of two different types, all-fatty and fibroglandular. The NBSO-based FWI effectively recovers key anatomical features in both cases, whereas other neural networks occasionally introduce significant artifacts. The superior performance of NBSO over other baselines is especially evident when applied to all-fatty breast tissues, where the higher contrast of sound speed indicates stronger scattering effects. Remarkably, the addition of measurement noise does not substantially degrades NBSO’s reconstructions, suggesting that the neural operator might possess some implicit prior knowledge about the target, acquired from training on anatomically realistic data. However, it is also noted that NBSO’s FWI results appear blurrier compared to the ground-truth and CBS-FWI, indicating NBSO’s potential shortfall in capturing high-frequency wavefield features.

SNR	Metrics	CBS	NBSO	FNO	BFNO	UNet
Clean	PSNR	33.1	27.8	27.0	26.1	26.1
Clean	SSIM	0.958	0.907	0.901	0.892	0.890
10dB	PSNR	31.3	27.5	26.7	26.00	26.1
10dB	SSIM	0.886	0.894	0.894	0.888	0.888
5dB	PSNR	28.2	27.2	26.6	25.9	25.9
5dB	SSIM	0.796	0.889	0.881	0.882	0.884

Table 2: Comparison of FWI reconstruction quality across 30 breast phantoms. Each row indicates the reconstruction PSNR & SSIM at a specific SNR level, and each column corresponds to a different neural network architecture. NBSO consistently exhibits superior reconstruction quality compared to other neural networks.

	Quantity	Single	Batch
CBS(CPU)	mean	0.197s	47.994s
CBS(CPU)	std	0.005s	1.26s
CBS(GPU)	mean	0.036s	7.160s
CBS(GPU)	std	0.003s	0.82s
NBSO(GPU)	mean	0.004s	0.135s
NBSO(GPU)	std	0.0003s	0.015s

Table 3: Comparison of computation speeds between CBS and NBSO. NBSO is one to two orders of magnitude faster than CBS, with even greater performance in batch-processing scenarios.

	Metrics	Breast	Brain
Mix	PSNR	26.0	26.8
training	SSIM	0.882	0.875
Breast	PSNR	26.5	17.6
only	SSIM	0.886	0.644
Brain	PSNR	21.5	23.5
only	SSIM	0.829	0.771

Table 4: Comparison of FWI reconstruction quality with NBSO trained on different datasets. Each row specifies the training set, while each column corresponds to different validation set.

While NBSO’s reconstruction quality may slightly lag behind that of CBS, its computational efficiency is substantially higher, as demonstrated in Table 3. We use Intel i7-13700KF and NVIDIA RTX4080ti for speed evaluations. NBSO surpasses GPU-accelerated CBS by an order of magnitude and CPU-based CBS by two orders of magnitude for a single source. Given the capability of neural networks for batch processing—predicting the wavefields of multiple inputs parallelly—the efficiency gap between NBSO and CBS becomes even more pronounced. In summary, NBSO presents a well-balanced solution, offering reasonable reconstruction quality alongside superior efficiency.

We further investigate the adaptability of NBSO-based FWI to out-of-distribution targets. As reported in Table 4 and illustrated in Fig. 6, we assess the reconstruction quality on both breast and brain phantoms, using NBSO trained on breast data, brain data, or a combination of both. NBSO demonstrates competent performance on targets that are within its training distribution. However, it shows a marked decrease in effectiveness for targets significantly divergent from its training set. Notably, training NBSO on a mixed dataset enhances its generalizability across datasets, yielding better image reconstruction quality than training on a single data type for brain imaging.

6 Conclusion

In this paper, we present the NBSO, a novel operator learning method designed to solve the Helmholtz equation with high accuracy and efficiency. Building on this, we developed an NBSO-based FWI pipeline for biomedical USCT image reconstruction. Our approach, rigorously trained and validated on breast and brain datasets under experimental USCT conditions, surpasses leading neural network baselines such as FNO, BFNO, and UNet. This success points to the feasibility of quasi-real-time USCT reconstruction using neural networks, opening new avenues for their clinical applications.

References

zbo [2019] fastMRI: An open dataset and benchmarks for accelerated MRI. arXiv, 2019.
Bachmann and Tromp [2020] Etienne Bachmann and Jeroen Tromp. Source encoding for viscoacoustic ultrasound computed tomography. The Journal of the Acoustical Society of America, 147(5):3221--3235, 2020.
Bernard et al. [2017] Simon Bernard, Vadim Monteiller, Dimitri Komatitsch, and Philippe Lasaygues. Ultrasonic computed tomography based on full-waveform inversion for bone quantitative imaging. Physics in Medicine and Biology, 62(17):7011--7035, 2017.
Cai et al. [2021] Shengze Cai, Zhicheng Wang, Lu Lu, Tamer A. Zaki, and George Em Karniadakis. DeepM&mnet: Inferring the electroconvection multiphysics fields based on operator approximation by neural networks. Journal of Computational Physics, 436:110296, 2021.
Cueto et al. [2022] Carlos Cueto, Lluis Guasch, Javier Cudeiro, Òscar Calderón Agudo, Thomas Robins, Oscar Bates, George Strong, and Meng-Xing Tang. Spatial response identification enables robust experimental ultrasound computed tomography. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, 69(1):27--37, 2022.
Dai et al. [2023] Haocheng Dai, Michael Penwarden, Robert M. Kirby, and Joshi Sarang. Neural Operator Learning for Ultrasound Tomography Inversion. arXiv, 2023.
Di Leoni et al. [2021] P. Clark Di Leoni, L. Lu, C. Meneveau, G. Karniadakis, and T. A. Zaki. DeepONet prediction of linear instability waves in high-speed boundary layers. arXiv, 2021.
Fincke et al. [2022] Jonathan Fincke, Xiang Zhang, Bonghun Shin, Gregory Ely, and Brian W. Anthony. Quantitative sound speed imaging of cortical bone and soft tissue: Results from observational data sets. IEEE Transactions on Medical Imaging, 41(3):502--514, 2022.
Guasch et al. [2020] Lluís Guasch, Oscar Calderón Agudo, Meng-Xing Tang, Parashkev Nachev, and Michael Warner. Full-waveform inversion imaging of the human brain. npj Digital Medicine, 3(1):28, 2020.
Guibas et al. [2022] John Guibas, Morteza Mardani, Zongyi Li, Andrew Tao, Anima Anandkumar, and Bryan Catanzaro. Adaptive fourier neural operators: Efficient token mixers for transformers. arXiv, 2022.
He et al. [2023] Weiguang He, Yubing Li, and Lu Liu. Elastic vertically transversely isotropic full-waveform inversion: From synthetic to field data. Geophysics, 88(3):1MJ--V289, 2023.
Huang et al. [2014] Lianjie Huang, Youzuo Lin, Zhigang Zhang, Yassin Labyed, Sirui Tan, Nghia Q. Nguyen, Kenneth M. Hanson, Daniel Sandoval, and Michael Williamson M.d. Breast ultrasound waveform tomography: using both transmission and reflection data, and numerical virtual point sources. In Medical Imaging 2014: Ultrasonic Imaging and Tomography, pages 187--198. SPIE, 2014.
Kirillov et al. [2023] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick. Segment anything. arXiv, 2023.
Kleinman et al. [2020] R. Kleinman, G. Roach, and P Van Den Berg. Convergent born series improves the accuracy of numerical solution of timeindependent photoacoustic wave equation. 67(9):849–--855, 2020.
Kovachki et al. [2021] Nikola Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural Operator: Learning Maps Between Function Spaces. arXiv, 2021.
Lee et al. [2022] Moosung Lee, Hervé Hugonnet, and YongKeun Park. Inverse problem solver for multiple light scattering using modified born series. Optica, 9(2):177--182, 2022.
Li et al. [2022] Fu Li, Umberto Villa, Seonyeong Park, and Mark A. Anastasio. 3-d stochastic numerical breast phantoms for enabling virtual imaging trials of ultrasound computed tomography. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, 69(1):135--146, 2022.
Li et al. [2023] Yubing Li, Jian Wang, Chang Su, Weijun Lin, Xiuming Wang, and Yi Luo. Quantitative ultrasound brain imaging with multiscale deconvolutional waveform inversion. Chinese Physics B, 32(1), 2023.
Li et al. [2020] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural operator: Graph kernel network for partial differential equations. arXiv, 2020.
Li et al. [2021] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. In ICLR, 2021.
Lin et al. [2021] Chensen Lin, Zhen Li, Lu Lu, Shengze Cai, Martin Maxey, and George Em Karniadakis. Operator learning for predicting multiscale bubble growth dynamics. J Chem Phys, 154(10):104118, 2021.
Lin and Huang [2014] Youzuo Lin and Lianjie Huang. Acoustic- and elastic-waveform inversion using a modified total-variation regularization scheme. Geophysical Journal International, 200(1):489--502, 2014.
Lozenski et al. [2023] Luke Lozenski, Hanchen Wang, Fu Li, Mark A Anastasio, Brendt Wohlberg, Youzuo Lin, and Umberto Villa. Learned Full Waveform Inversion Incorporating Task Information for Ultrasound Computed Tomography. arXiv, 2023.
Lu et al. [2019] Lu Lu, Pengzhan Jin, and George Karniadakis. DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators. arXiv, 2019.
Lu et al. [2021] Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George EM Karniadakis. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3, 2021.
Lu et al. [2022a] Lu Lu, Xuhui Meng, Shengze Cai, Zhiping Mao, Somdatta Goswami, Zhongqiang Zhang, and George EM Karniadakis. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Computer Methods in Applied Mechanics and Engineering, 393, 2022a.
Lu et al. [2022b] Lu Lu, Raphaël Pestourie, Steven G. Johnson, and Giuseppe Romano. Multifidelity deep neural operators for efficient learning of partial differential equations with application to fast inverse design of nanoscale heat transport. Phys. Rev. Res., 4(2):023210, 2022b. Publisher: American Physical Society.
Lucka et al. [2021] Felix Lucka, Mailyn Pérez-Liva, Bradley E Treeby, and Ben T Cox. High resolution 3d ultrasonic breast imaging by time-domain full waveform inversion. Inverse Problems, 38(2):025008, 2021.
Mao et al. [2021] Zhiping Mao, Lu Lu, Olaf Marxen, Tamer A. Zaki, and George Em Karniadakis. DeepM&mnet for hypersonics: Predicting the coupled flow and finite-rate chemistry behind a normal shock using neural-network approximation of operators. Journal of Computational Physics, 447:110698, 2021.
Martiartu et al. [2020] Naiara Korta Martiartu, Christian Boehm, and Andreas Fichtner. 3-d wave-equation-based finite-frequency tomography for ultrasound computed tomography. IEEE Trans. Ultrason., Ferroelect., Freq. Contr., 67(7):1332--1343, 2020.
Osnabrugge et al. [2016] Gerwin Osnabrugge, Saroch Leedumrongwatthanakun, and Ivo M. Vellekoop. A convergent born series for solving the inhomogeneous helmholtz equation in arbitrarily large media. Journal of Computational Physics, 322:113--124, 2016.
Pathak et al. [2022] Jaideep Pathak, Shashank Subramanian, Peter Harrington, Sanjeev Raja, Ashesh Chattopadhyay, Morteza Mardani, Thorsten Kurth, David Hall, Zongyi Li, Kamyar Azizzadenesheli, Pedram Hassanzadeh, Karthik Kashinath, and Animashree Anandkumar. FourCastNet: A global data-driven high-resolution weather model using adaptive fourier neural operators. arXiv, 2022.
Pérez-Liva et al. [2017] M. Pérez-Liva, J. L. Herraiz, J. M. Udías, E. Miller, B. T. Cox, and B. E. Treeby. Time domain reconstruction of sound speed and attenuation in ultrasound computed tomography using full wave inversion. J Acoust Soc Am, 141(3):1595, 2017.
Ronneberger et al. [2015] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention--MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, pages 234--241. Springer, 2015.
Stanziola et al. [2023] Antonio Stanziola, Simon Arridge, Ben T Cox, and Bradley E Treeby. A learned born series for highly-scattering media. JASA Express Letters, 3(5), 2023.
Virieux and Operto [2009] Jean Virieux and Stéphane Operto. An overview of full-waveform inversion in exploration geophysics. Geophysics, 74(6):WCC1--WCC26, 2009.
Wang et al. [2021a] Ning Wang, Wei Yan, Yurui Qu, Siqi Ma, Stan Z Li, and Min Qiu. Intelligent designs in nanophotonics: from optimization towards inverse creation. PhotoniX, 2(1):1--35, 2021a.
Wang et al. [2021b] Sifan Wang, Mohamed Aziz Bhouri, and Paris Perdikaris. Fast PDE-constrained optimization via self-supervised operator learning. 2021b.
Wen et al. [2022] Gege Wen, Zongyi Li, Kamyar Azizzadenesheli, Anima Anandkumar, and Sally M. Benson. U-FNO—an enhanced fourier neural operator-based deep-learning model for multiphase flow. Advances in Water Resources, 163:104180, 2022.
Wiskin et al. [2017] J. W. Wiskin, D. T. Borup, E. Iuanow, J. Klock, and Mark W. Lenox. 3-d nonlinear acoustic inverse scattering: Algorithm and quantitative results. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, 64(8):1161--1174, 2017.
Wu et al. [2023] Xiaoqing Wu, Yubing Li, Chang Su, Panpan Li, Xiangda Wang, and Weijun Lin. Ultrasound computed tomography based on full waveform inversion with source directivity calibration. Ultrasonics, 132:107004, 2023.
Xiao et al. [2017] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv, 2017.
Xu et al. [2022] Dixiang Xu, Weiguang He, Lu Liu, and Yubing Li. Point cloud-based high-dimensional optimal transport for full waveform inversion. IEEE Transactions on Geoscience and Remote Sensing, 60:1--10, 2022.
Yang et al. [2023] Yan Yang, Angela F Gao, Kamyar Azizzadenesheli, Robert W Clayton, and Zachary E Ross. Rapid seismic waveform modeling and inversion with neural operators. IEEE Transactions on Geoscience and Remote Sensing, 61:1--12, 2023.
Yuan et al. [2023] Yu Yuan, Yue Zhao, Yang Xiao, Jing Jin, Naizhang Feng, and Yi Shen. Optimization of reconstruction time of ultrasound computed tomography with a piecewise homogeneous region-based refract-ray model. Ultrasonics, 127:106837, 2023.
Zeng et al. [2022] Qili Zeng, Shihang Feng, Brendt Wohlberg, and Youzuo Lin. InversionNet3d: Efficient and scalable learning for 3-d full-waveform inversion. IEEE Transactions on Geoscience and Remote Sensing, 60:1--16, 2022.
Zhang et al. [2012] Zhigang Zhang, Lianjie Huang, and Youzuo Lin. Efficient implementation of ultrasound waveform tomography using source encoding. In Medical Imaging 2012: Ultrasonic Imaging, Tomography, and Therapy, pages 22--31. SPIE, 2012.
Zhao et al. [2023] Qingqing Zhao, Yanting Ma, Petros Boufounos, Saleh Nabi, and Hassan Mansour. Deep born operator learning for reflection tomographic imaging. In ICASSP, 2023.
Zhou et al. [2023] Chenchen Zhou, Kailiang Xu, and Dean Ta. Frequency-domain full-waveform inversion-based musculoskeletal ultrasound computed tomography. The Journal of the Acoustical Society of America, 154(1):279--294, 2023.
Zhu et al. [2023] Min Zhu, Shihang Feng, Youzuo Lin, and Lu Lu. Fourier-DeepONet: Fourier-enhanced deep operator networks for full waveform inversion with improved accuracy, generalizability, and robustness. Computer Methods in Applied Mechanics and Engineering, 416:116300, 2023.

\thetitle

Supplementary Material

7 Adjoint Method for Frequency-domain FWI

The frequency-domain FWI can be formulated as a PDE-constrained optimization problem:

\begin{split}\underset{c(x),u_{k}(x)}{\min}&\ L=\sum_{k=1}^{M}L_{k}=\sum_{k=1}% ^{M}\|y_{k}-u_{k}(x_{f})\|_{2}^{2}\\ s.t.&{\left[\nabla^{2}+\left(\frac{\omega}{c(x)}\right)^{2}\right]u_{k}(x)}=-% \rho_{k}(x).\end{split}

(12)

A prevalent approach for computing the gradient, $\partial L_{k}/\partial c$ , in FWI is the adjoint method. Using the method of Lagrange multipliers, the problem can be converted into an unconstrained form

\begin{split}\underset{c(x),u_{k}(x),\lambda_{k}(x)}{\min}\mathcal{L}=\sum_{k=% 1}^{M}\mathcal{L}_{k}=\sum_{k=1}^{M}\|y_{k}-u_{k}(x_{f})\|_{2}^{2}\\ -\sum_{k=1}^{M}\left\langle\lambda_{k}(x),\mathcal{S}_{c}u_{k}(x)+\rho_{k}(x)% \right\rangle\end{split}

(13)

where $\mathcal{L}$ is the Lagrangian function, $\left\langle f,g\right\rangle$ denotes the real part of inner product of function $f$ and $g$ in $L^{2}(\mathbb{C})$ , $y_{k}$ denotes the measurement obtained by transducer for source $\rho_{k}$ , and $\mathcal{S}_{c}$ is the differential operator

\mathcal{S}_{c}(\cdot)=\left[\nabla^{2}+\left(\frac{\omega}{c(x)}\right)^{2}% \right](\cdot),

(14)

We then calculate the partial derivatives of $\mathcal{L}_{k}$ with respect to $\lambda_{k},u_{k},c$ , respectively. Setting $\frac{\partial\mathcal{L}_{k}}{\partial\lambda_{k}}(x)=0$ yields the Helmholtz equations itself. Similarly, enforcing $\frac{\partial\mathcal{L}_{k}}{\partial u_{k}}(x)=0$ leads to the derivation of the adjoint equation,

\mathcal{S}_{c}\lambda_{k}(x)=\sum_{i=1}^{M}[u_{k}(x_{f}^{(i)})-y_{k}^{(i)}]% \delta(x_{f}^{(i)}),

(15)

where $i$ denotes the index of USCT transducers and $\delta(\cdot)$ defines a normalized point source at a specific transducer location. Substituting Eq. 14 and Eq. 15 into $\partial\mathcal{L}_{k}/\partial c$ results in

\begin{split}\frac{\partial\mathcal{L}_{k}}{\partial c}(x)&=\frac{\partial% \mathcal{S}_{c}}{\partial c}(x)\lambda_{k}^{\star}(x)u_{k}(x)\\ &=-2\omega^{2}\frac{\lambda_{k}^{\star}(x)u_{k}(x)}{c(x)^{3}},\end{split}

(16)

The gradient is proportional to the product of two wavefields, where $u_{k}(x)$ is the forward simulation result for source $\rho_{k}$ and $\lambda_{k}(x)$ arises from the backward simulation whose source term is defined by the discrepancies between forward predictions and measured data.

8 Implementation Details

8.1 Numerical Settings for Data Generation

We refer to the Wavesim [31] for the details of the governing equations. We simulate the numerical solution on a square with side length of 24 centimeters. The step size of grid is 0.05cm thus the grid shape is $480\times 480$ . For boundary condition, we use an absorbing boundary layer (type ARL in Wavesim) whose boundary width is $[75,75]$ grid outside the solving region. We choose the points per wavelenght $PPW=6$ for the absorbing boundary layer.

8.2 Hyperparameters of Baseline Models

FNO

We use a vanilla FNO model with 6 FNO layers whose modes are $\{[30]\times 6\}$ and width is 20 in the single-source setting. For multi-source problem, we increase the layer number to 8 and set the modes and width as $\{[50]\times 9\}$ and 40, respectively, to enlarge the representative ability.

BFNO

BFNO[48] is a modified version of FNO, so we use the same layer number and mode number as FNO for both settings.

UNet

We implement UNet using the same structure as [34] but a comparable model size to FNO for the sake of fairness.

•

Downsample Blocks: Each downsample block contains two Conv2d layer with LeakyReLU activation and GroupNorm. The first layer maps channel $c_{\text{in}}$ to $c_{\text{out}}$ with stride $2$ and kernel size $3$ , while the second layer preserves the channel number but changes the stride number to $1$ .
•

Upsample Blocks: The implementation of upsample blocks is the same as downsample blocks but with a skip channel in the first Conv2d layer.

We use the Unet structure with resolution size sequence $\{[60]\times 6,[120]\times 5,[240]\times 5,[480]\times 4\}$ and 4 skip channels for Upsample block. An input block with the downsample structure using stride 1 is added to the beginning.

Problem	Quantity	NBSO	FNO	BFNO	UNet
Single-source	$\#$ PARAM	10.1M	10.1M	10.1M	20.0M
Single-source	$\#$ MEM	16.4G	12.0G	16.4G	11.8G
Multi-source	$\#$ PARAM	144M	144M	144M	36.0M
Multi-source	$\#$ MEM	35.5G	31.9G	35.5G	12.0G

Table 5: Number of trainable parameters (1st and 3rd row) and memory used (2nd and 4th row) of neural network models in single-source and multi-source setting.

8.3 Training of Neural Networks

The training of neural network models, including the baselines (FNO, BFNO, UNet), is performed with the ADAM optimizer, with a learning rate $\eta=10^{-2}$ , for 50 epochs in both single-source and multi-source settings. The training objective is to minimize the relative $L_{2}$ loss function, incorporating a weight decay rate of $10^{-5}$ . We also implement a Cosine-Annealing learning rate scheduler with a step size of $10$ and a minimum learning rate of $\eta_{\min}=2\times 10^{-4}$ . In the multi-source setting, models are trained using 4 GPUs in the DeepSpeed framework with a batch size of 10, while single-source models are trained on a single GPU with the same batch size.

9 Results

9.1 Forward Simulation for Brain Phantoms

Fig. 7 offers a visual comparison of wavefields predicted by NBSO for a representative brain phantom, alongside those generated by baseline models trained on the same brain datasets and assuming multiple sources. NBSO captures high-frequency features with greater accuracy than other methods.

9.2 USCT FWI for Brain Phantoms

Table. 6 showcases the USCT FWI’s performance on 30 brain phantoms using NBSO, FNO, UNet, BFNO, and CBS solvers at different SNR levels (noise-free, 10dB, 5dB) for observed data. NBSO consistently outperforms all neural network-based methods in reconstruction quality, regardless of noise levels. This is further illustrated in Fig. 8, where we display the reconstruction results for a brain phantom.

SNR	Metrics	CBS	NBSO	FNO	BFNO	UNet
Clean	PSNR	28.3	23.8	23.0	22.6	21.0
Clean	SSIM	0.883	0.783	0.765	0.760	0.711
10dB	PSNR	27.5	23.5	22.9	22.5	21.0
10dB	SSIM	0.818	0.762	0.757	0.758	0.709
5dB	PSNR	25.7	23.1	22.7	22.4	20.9
5dB	SSIM	0.685	0.738	0.734	0.733	0.706

Table 6: Comparison of FWI reconstruction quality across 30 brain phantoms. Each row indicates the reconstruction PSNR & SSIM at a specific SNR level, and each column corresponds to a different neural network architecture. NBSO consistently exhibits superior reconstruction quality compared to other neural networks.