Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

HTML conversions sometimes display errors due to content that did not convert correctly from the source. This paper uses the following packages that are not yet supported by the HTML conversion tool. Feedback on these issues are not necessary; they are known and are being worked on.

  • failed: epic

Authors: achieve the best HTML results from your LaTeX submissions by following these best practices.

License: CC BY 4.0
arXiv:2312.15575v1 [eess.IV] 25 Dec 2023

Neural Born Series Operator for Biomedical Ultrasound Computed Tomography

Zhijun Zeng These authors contributed equally to this work. Department of Mathematical Sciences, Tsinghua University, Beijing, China. DP Technology, Ltd., Beijing, China. Yihang Zheng College of Future Technology, Peking University, Beijing, China. National Biomedical Imaging Center, Peking University, Beijing, China. Youjia Zheng College of Future Technology, Peking University, Beijing, China. National Biomedical Imaging Center, Peking University, Beijing, China. Yubing Li Institute of Acoustics, Chinese Academy of Sciences, Beijing, China. Zuoqiang Shi Yau Mathematical Sciences Center, Tsinghua University, Beijing, China. Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing, China. He Sun Corresponding author: hesun@pku.edu.cn College of Future Technology, Peking University, Beijing, China. National Biomedical Imaging Center, Peking University, Beijing, China.
Abstract

Ultrasound Computed Tomography (USCT) provides a radiation-free option for high-resolution clinical imaging. Despite its potential, the computationally intensive Full Waveform Inversion (FWI) required for tissue property reconstruction limits its clinical utility. This paper introduces the Neural Born Series Operator (NBSO), a novel technique designed to speed up wave simulations, thereby facilitating a more efficient USCT image reconstruction process through an NBSO-based FWI pipeline. Thoroughly validated on comprehensive brain and breast datasets, simulated under experimental USCT conditions, the NBSO proves to be accurate and efficient in both forward simulation and image reconstruction. This advancement demonstrates the potential of neural operators in facilitating near real-time USCT reconstruction, making the clinical application of USCT increasingly viable and promising.

1 Introduction

Ultrasound Computed Tomography (USCT) emerges as an innovative radiation-free imaging approach with exceptional potential for high-resolution clinical imaging of human tissues [33, 9, 17]. Unlike conventional B-mode ultrasound, which demands manual operation by physicians and relies solely on reflected signals for image formation, USCT employs specialized configurations such as annular, cylindrical, or hemispherical transducer arrays [40, 9, 5]. This arrangement enables the simultaneous acquisition of both transmitted and reflected signals from wave-tissue interactions by sequentially emitting waves from each transducer and measuring signals using the remaining ones [12, 41, 49].

Refer to caption
Figure 1: Schematic diagram of a USCT system: (a) The imaging target is positioned within an annular transducer array, where each transducer emits waves in sequence, while the remaining transducers serve as receivers; (b) This setup results in measurement data forming a square matrix in the frequency domain, with each row representing the signals received from a single source.

Leveraging this approach, USCT holds the potential to retrieve 2D/3D structural information of biological tissues [30, 2, 9, 47, 28] akin to medical X-ray computed tomography (CT). However, unlike X-ray CT, the ultrasonic wavelength is comparable to the target’s scale, rendering the straight-ray approximation invalid in USCT [45]. Consequently, wavefields modulated by tissues show significant scattering effects. To enhance image quality, USCT typically employs partial differential equations (PDEs) to model ultrasonic wave propagation and solves a PDE-constrained nonlinear inverse problem to reconstruct tissue properties, such as attenuation and sound speed [3, 33]. This technology is commonly referred to as full waveform inversion (FWI). FWI, while effective, requires multiple iterations of PDE simulations for a single reconstruction. Due to the computationally intensive nature of numerical PDE solvers, achieving quasi-real-time FWI image reconstruction remains a challenge [2, 23]. This limitation restricts the broader application of FWI and USCT in clinical diagnosis, particularly in first-aid scenarios like promptly identifying stroke types (ischemic or hemorrhagic) [9].

Neural operators have defined a novel deep learning framework for efficient PDE-based physics simulations [24, 19, 20, 25]. Grounded in the operator theory, they approximate complex mappings between PDE parameters and their physical field outputs within function spaces [15]. This methodology effectively merges the efficiency of neural networks and the modeling accuracy of PDEs, demonstrating remarkable potential in diverse applications. However, despite their computational speed, current neural operators often struggle to achieve sufficient accuracy when solving acoustic equations, including time-domain wave equations and frequency-domain Helmholtz equations, particularly in practical imaging scenarios. In biomedical imaging, for instance, the requirement for a large field of view (FOV), often exceeding 50 wavelengths, results in highly oscillatory solutions that challenge existing neural operator models.

In this paper, we present the Neural Born Series Operator (NBSO), a novel operator learning method tailored for efficiently solving large-scale Helmholtz equations. Drawing inspiration from the advanced numerical Helmholtz equation solver, Convergent Born Series (CBS) [31, 14], the NBSO leverages the intrinsic structure of the Helmholtz equation in its neural network architecture design. Rigorously trained and validated on USCT datasets that simulate a real-world system, NBSO proves to be a highly accurate and efficient surrogate model for addressing biomedical FWI. This paper makes three key contributions to the field:

  • Large-Scale USCT Dataset Construction: We create a comprehensive Helmholtz equation dataset based on real USCT system parameters, featuring over 600,000 pairs of sound speed and wavefield for brain and breast tissues.

  • Innovative NBSO Framework: We introduce the NBSO, achieving unparalleled precision among neural operators for solving the highly oscillatory Helmholtz equations prevalent in biomedical imaging.

  • NBSO-based in Biomedical FWI: We apply NBSO to biomedical FWI and demonstrate its potential to reconstruct high-quality breast and brain images at a speed 100 times faster than conventional methods, paving the way for quasi-real-time USCT imaging.

2 Related Works

2.1 Full Waveform Inversion

FWI addresses a PDE-constrained nonlinear inverse problem in imaging, offering a breakthrough solutions in reconstructing properties of complex media. Initially prominent in geophysical exploration for oil and gas reservoir imaging[36, 22, 11], FWI integrates wave physics and multiple scattering effects for data interpretation, surpassing traditional ray approximation-based methods [43] in imaging quality. FWI’s utility has recently expanded to the biomedical imaging sector, with notable applications in optical diffraction tomography (ODT) for imaging thick biological samples [16], and USCT for high-resolution imaging of breasts [28, 41], limbs [3, 8], and brains [9, 18]. Despite these advancements, FWI’s translation to clinical settings remains limited due to its significant computational demands, particularly in the scenarios requiring large field of view. Presently, single reconstruction tasks may take hours or even days on high-performance computing clusters [9, 16]. Accelerating FWI is essential for its broader adoption and future progress, especially in biomedical imaging.

2.2 Neural Operators

Neural operators defines a novel deep-learning paradigm that approximates mappings between infinite-dimensional function spaces, in contrast to conventional neural networks restricted to finite-dimensional Euclidean spaces. This capability positions them as a promising solution for complex, large-scale PDEs, historically challenging in practical applications. A notable implementation in this domain is the Deep Operator Network (DeepONet) [24, 25, 4, 29, 7, 21], featuring a distinctive “branch and trunk” network structure. Alternatively, the Fourier Neural Operator (FNO) [20, 19, 32, 39, 10] creatively parameterizes the integral kernel in Fourier space, enhancing a neural network’s expressiveness and efficiency for operator learning. Both techniques have demonstrated reasonable accuracy, high inference efficiency, and robustness against noise and generalization errors [26]. Spurred by these initial successes, various specialized forms have also been proposed for solving PDE of specific forms. For example, Learned Born Series (LBO)[35] and Born FNO (BFNO)[48] have been proposed for handling Helmholtz equations by modifying FNO based on Born series. Despite their effectiveness in managing complex PDEs, neural operators typically lag behind traditional methods in computational accuracy, which warrants further investigation.

2.3 Neural Operators for PDE Inverse Problems

Thanks to their impressive approximation capabilities and negligible computational speed, neural operators are increasingly employed to solve PDE-constrained inverse problems in diverse fields such as turbulent flows [20], solid mechanics [38], weather forecasting [32], material designs [27] and nano-photonics [37]. There are also growing interests in neural operator-based FWI, with many studies focusing on directly learning the mapping from observed data to parametric images of physical properties, as seen in geophysics [46, 50] and ultrasound imaging [23]. However, this data-to-image mapping is often non-unique due to limited observations, leading to generalization challenges. An alternative is to use neural operators as surrogate forward models to accelerate PDE solving, mapping physical properties to data, followed by inversion using a gradient-based optimizer. This approach enhances generalization and robustness, demonstrated in seismic imaging [44], electromagnetic imaging [48], and time-of-flight tomography in ultrasound [6]. Yet, most of these methods focus on limited computational domains (under 30 wavenumbers) to ease learning highly-oscillatory features. The potential of neural operator-based FWI, particularly in biomedical imaging applications like USCT with larger computational domains (around 100 wavenumbers), remains unexplored.

3 Problem Definition

3.1 Image Formation Model of USCT

USCT acquires data by first activating point sources with known locations and subsequently measuring the corresponding scattered acoustic signals that have been modulated by internal biological tissues. In steady-state conditions, the propagation of ultrasound waves can be accurately described by a heterogeneous Helmholtz equation, assuming negligible shear motion and nonlinear effects,

[2+(ωc(x))2]u(x)=ρ(x),delimited-[]superscript2superscript𝜔𝑐𝑥2𝑢𝑥𝜌𝑥{\left[\nabla^{2}+\left(\frac{\omega}{c(x)}\right)^{2}\right]u(x)}=-\rho(x),[ ∇ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ( divide start_ARG italic_ω end_ARG start_ARG italic_c ( italic_x ) end_ARG ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] italic_u ( italic_x ) = - italic_ρ ( italic_x ) , (1)

where r=|x|𝑟𝑥r=|x|italic_r = | italic_x |, ω𝜔\omegaitalic_ω is the angular frequency of ultrasound waves, c(x)𝑐𝑥c(x)italic_c ( italic_x ) is the spatial distribution of sound speed, ρ(x)𝜌𝑥\rho(x)italic_ρ ( italic_x ) is the source term and u(x)𝑢𝑥u(x)italic_u ( italic_x ) is the complex acoustic field. In the context of USCT, the heterogeneity in sound speed c(x)𝑐𝑥c(x)italic_c ( italic_x ) is confined to a specific domain of interest (DOI), while the sound speed remains constant c0subscript𝑐0c_{0}italic_c start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT outside of this domain. To account for this boundary condition, the Helmholtz equation must satisfy the Sommerfeld radiation condition:

limrrn12(riωc0)u(x)=0.subscript𝑟superscript𝑟𝑛12𝑟𝑖𝜔subscript𝑐0𝑢𝑥0\lim_{r\rightarrow\infty}r^{\frac{n-1}{2}}\left(\frac{\partial}{\partial r}-i% \frac{\omega}{c_{0}}\right)u(x)=0.roman_lim start_POSTSUBSCRIPT italic_r → ∞ end_POSTSUBSCRIPT italic_r start_POSTSUPERSCRIPT divide start_ARG italic_n - 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT ( divide start_ARG ∂ end_ARG start_ARG ∂ italic_r end_ARG - italic_i divide start_ARG italic_ω end_ARG start_ARG italic_c start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_ARG ) italic_u ( italic_x ) = 0 . (2)

A full USCT observation employs a sequential activation scheme, where point sources, represented by ρk(x)subscript𝜌𝑘𝑥\rho_{k}(x)italic_ρ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x ),are activated one at a time using N𝑁Nitalic_N transducers arranged in an annular configuration. The corresponding scattered acoustic fields are measured at the same M𝑀Mitalic_M transducer locations, resulting in a measurement dataset denoted as y={yk;yk=uk(xf)+nk}𝑦subscript𝑦𝑘subscript𝑦𝑘subscript𝑢𝑘subscript𝑥𝑓subscript𝑛𝑘y=\{y_{k};y_{k}=u_{k}(x_{f})+n_{k}\}italic_y = { italic_y start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ; italic_y start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT ) + italic_n start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT }, where k=1,,M𝑘1𝑀k=1,\ldots,Mitalic_k = 1 , … , italic_M represents the point source index, xfMsubscript𝑥𝑓superscript𝑀x_{f}\in\mathbb{R}^{M}italic_x start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_M end_POSTSUPERSCRIPT defines the locations of all transducers, and nksubscript𝑛𝑘n_{k}italic_n start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT is the measurement noise.

3.2 Inverse Problem: Frequency-domain FWI

The central goal of USCT is to reconstruct the spatial distribution of sound speed within biological tissues, denoted by c(x)𝑐𝑥c(x)italic_c ( italic_x ), using the measurements obtained from the transducers, represented by y𝑦yitalic_y. This inverse problem can be formulated as a PDE-constrained optimization problem:

minc(x),uk(x)L=k=1MLk=k=1Mykuk(xf)22s.t.[2+(ωc(x))2]uk(x)=ρk(x).formulae-sequence𝑐𝑥subscript𝑢𝑘𝑥𝐿superscriptsubscript𝑘1𝑀subscript𝐿𝑘superscriptsubscript𝑘1𝑀superscriptsubscriptdelimited-∥∥subscript𝑦𝑘subscript𝑢𝑘subscript𝑥𝑓22𝑠𝑡delimited-[]superscript2superscript𝜔𝑐𝑥2subscript𝑢𝑘𝑥subscript𝜌𝑘𝑥\begin{split}\underset{c(x),u_{k}(x)}{\min}&\ L=\sum_{k=1}^{M}L_{k}=\sum_{k=1}% ^{M}\|y_{k}-u_{k}(x_{f})\|_{2}^{2}\\ s.t.&{\left[\nabla^{2}+\left(\frac{\omega}{c(x)}\right)^{2}\right]u_{k}(x)}=-% \rho_{k}(x).\end{split}start_ROW start_CELL start_UNDERACCENT italic_c ( italic_x ) , italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x ) end_UNDERACCENT start_ARG roman_min end_ARG end_CELL start_CELL italic_L = ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_M end_POSTSUPERSCRIPT italic_L start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_M end_POSTSUPERSCRIPT ∥ italic_y start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL italic_s . italic_t . end_CELL start_CELL [ ∇ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ( divide start_ARG italic_ω end_ARG start_ARG italic_c ( italic_x ) end_ARG ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x ) = - italic_ρ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x ) . end_CELL end_ROW (3)

Since the Helmholtz equation governs steady-state wave propagation, namely the wavefield in the frequency domain, this inverse problem is commonly referred to as frequency-domain full waveform inversion (FD-FWI). Akin to many other inverse problems, FD-FWI is typically solved using gradient-based optimizers. A prevalent approach for computing the gradient, Lk/csubscript𝐿𝑘𝑐\partial L_{k}/\partial c∂ italic_L start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT / ∂ italic_c, in FWI is the adjoint method,

[2+(ωc(x))2]uk(x)=ρk(x),[2+(ωc(x))2]λk(x)=i=1M[uk(xf(i))yk(i)]δ(xf(i)),Lkc(x)=2ω2λk(x)uk(x)c(x)3,formulae-sequencedelimited-[]superscript2superscript𝜔𝑐𝑥2subscript𝑢𝑘𝑥subscript𝜌𝑘𝑥formulae-sequencedelimited-[]superscript2superscript𝜔𝑐𝑥2subscript𝜆𝑘𝑥superscriptsubscript𝑖1𝑀delimited-[]subscript𝑢𝑘superscriptsubscript𝑥𝑓𝑖superscriptsubscript𝑦𝑘𝑖𝛿superscriptsubscript𝑥𝑓𝑖subscript𝐿𝑘𝑐𝑥2superscript𝜔2superscriptsubscript𝜆𝑘𝑥subscript𝑢𝑘𝑥𝑐superscript𝑥3\begin{split}{\left[\nabla^{2}+\left(\frac{\omega}{c(x)}\right)^{2}\right]u_{k% }(x)}&=-\rho_{k}(x),\\ {\left[\nabla^{2}+\left(\frac{\omega}{c(x)}\right)^{2}\right]\lambda_{k}(x)}&=% \sum_{i=1}^{M}[u_{k}(x_{f}^{(i)})-y_{k}^{(i)}]\delta(x_{f}^{(i)}),\\ \frac{\partial L_{k}}{\partial c}(x)&=-2\omega^{2}\frac{\lambda_{k}^{\star}(x)% u_{k}(x)}{c(x)^{3}},\end{split}start_ROW start_CELL [ ∇ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ( divide start_ARG italic_ω end_ARG start_ARG italic_c ( italic_x ) end_ARG ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x ) end_CELL start_CELL = - italic_ρ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x ) , end_CELL end_ROW start_ROW start_CELL [ ∇ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ( divide start_ARG italic_ω end_ARG start_ARG italic_c ( italic_x ) end_ARG ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] italic_λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x ) end_CELL start_CELL = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_M end_POSTSUPERSCRIPT [ italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ) - italic_y start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ] italic_δ ( italic_x start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ) , end_CELL end_ROW start_ROW start_CELL divide start_ARG ∂ italic_L start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_c end_ARG ( italic_x ) end_CELL start_CELL = - 2 italic_ω start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT divide start_ARG italic_λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT ( italic_x ) italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x ) end_ARG start_ARG italic_c ( italic_x ) start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT end_ARG , end_CELL end_ROW (4)

where i𝑖iitalic_i denotes the index of USCT transducers and δ()𝛿\delta(\cdot)italic_δ ( ⋅ ) defines a normalized point source at a specific transducer location. The gradient is proportional to the product of two wavefields, where uk(x)subscript𝑢𝑘𝑥u_{k}(x)italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x ) is the forward simulation result for source ρksubscript𝜌𝑘\rho_{k}italic_ρ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT and λk(x)subscript𝜆𝑘𝑥\lambda_{k}(x)italic_λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x ) arises from the backward simulation where the source term is defined based on the discrepancies between forward predictions and measured data. (See Supplementary Material for theoretical details.) Based Eq. 4, a gradient update requires solving the Helmholtz equation twice for each source, in total 2M2𝑀2M2 italic_M times for a full observation. The computational efficiency of Helmholtz equation solvers is the primary time bottleneck for FWI.

4 Method

The earlier discussions have highlighted the challenges involved in solving large-scale Helmholtz equations for biomedical FWI. In response, this section outlines our approach which employs a novel neural operator to tackle these challenges. Section 4.1 introduces an AI-assisted pipeline for creating USCT datasets. These datasets are distinctive for their anatomically accurate targets and incorporation of real-world experimental hardware configurations. Following this, Section 4.2 introduces the NBSO, our novel neural network architecture tailored to approximate the complex, highly oscillatory solutions of Helmholtz equations. Finally, Section 4.3 details our approach for applying the NBSO to solve frequency-domain FWI.

Refer to caption
Figure 2: USCT dataset construction pipeline: (a) Breast: 3D breast phantoms are created, assigned sound speeds, and then sliced into 2D maps. (b) Brain: MRI datasets are processed with the SAM vision model [13] for semantic segmentation, mapping each region to a sound speed range. Wavefields for both are simulated using the CBS solver with experimental parameters.

4.1 USCT Dataset with Realistic Anatomy and Experimental Scenario

Current studies on neural operators for Helmholtz equations often employ oversimplified setups. These studies typically create heterogeneous sound speeds for operator learning based on public datasets, such as MNIST [42], or Gaussian/Uniform random fields[48, 35]. Additionally, their solution domains are confined to fewer than 20-30 wavelengths, much smaller than the real-world field of view (FOV) in biomedical USCT applications. To develop a more effective neural operator for USCT FWI, we have created a USCT dataset featuring anatomically realistic phantoms and corresponding wavefields simulated using parameters from an actual USCT system. Our dataset comprises 9590 human organ phantoms, including 8000 breasts and 1590 brains. For each phantom, we simulate wavefields from 64 different sources, resulting in a total of 613,760 input-output data pairs. Our dataset will be publicly released upon request.

4.1.1 Generation of Realistic Breast Phantoms

The breast phantoms in our study are created using a specialized tool from the Virtual Imaging Clinical Trial for Regulatory Evaluation (VICTRE) project at the US Food and Drugs Administration (US-FDA). This tool is capable of generating diverse tissue maps of breast anatomy. Subsequently, we follow [17] to build sound speed distributions based on different breast density types, including all-fatty, fibroglandular, heterogeneous, and extremely dense breasts. The breast data generation process consists of three steps: 1) First, we use the VICTRE tool to generate ten 3D breast models. 2) Second, the 3D models are converted into 2D phantoms through a process of random slicing. 3) Finally, the area surrounding the breast models is filled with water to replicate experimental conditions. This methodology has enabled us to generate a comprehensive dataset of 8000 anatomically realistic breast samples.

4.1.2 Generation of Realistic Brain Phantoms

In contrast to the breast phantoms, our brain phantoms are derived by adapting data from another prevalent transcranial imaging modality, MRI. An open-source brain MRI dataset, FastMRI [1], combined with physics-inspired style transfer techniques, is utilized to generate anatomically accurate brain sound speeds. This process also involves three steps: 1) First, we conduct semantic segmentation of T2-weighted MRI images using the SAM vision foundational model. 2) Second, each segmented MRI region is mapped to a sound speed range that aligns with physical reality. 3) Finally, the surrounding medium is assigned the sound speed of water. Employing this method, we have effectively generated 1590 brain samples. Due to MRI’s limited capacity to capture bone structures, the skull element has been intentionally excluded from our current brain phantom creation. We will incorporate the skull in data generation in our future work, as it is the most important scattering interface in brain.

4.1.3 Wavefield Simulation using Experimental USCT Settings

Our dataset’s wavefields are simulated using the experimental parameters of a real annular USCT system [41]. This system comprises 512 transducers uniformly distributed around a ring with a diameter of 22 centimeters. The system operates within a frequency range of 0.4 MHz to 1.2 MHz, corresponding to an acoustic wavelength of 1.25 mm to 3.75 mm. In our simulations, we focus on wave propagation at a fixed frequency of 0.5 MHz, so the FOV contains around 80 wavenumbers. For each target within our study, 64 wavefields are simulated. We achieve this by selecting source locations at uniform intervals, utilizing every 8th transducer in the array. The numerical solutions of Helmholtz equations are generated using the Convergent Born Series (CBS) algorithm (see Sec.4.2.1 for details).

4.2 Neural Born Series Operator

In this section, we introduce the NBSO, a proficient tool designed to accurately and efficiently approximate the implicit mapping 𝒮:(c,ρ)u:𝒮maps-to𝑐𝜌𝑢\mathcal{S}:(c,\rho)\mapsto ucaligraphic_S : ( italic_c , italic_ρ ) ↦ italic_u, as defined by the Helmholtz equation in Eq. 1. Drawing inspiration from the CBS solver for the Helmholtz equation, the NBSO utilizes an iterative scheme in developing its neural network architecture. Moreover, the inputs and outputs within each iterative block of the NBSO are carefully designed based on physical concepts, thereby enhancing the operator’s ability to approximate and generalize effectively.

4.2.1 Born Series and Convergent Born Series

By defining a scattering potential v=(ωc(x))2κ2iϵ𝑣superscript𝜔𝑐𝑥2superscript𝜅2𝑖italic-ϵv=\left(\frac{\omega}{c(x)}\right)^{2}-\kappa^{2}-i\epsilonitalic_v = ( divide start_ARG italic_ω end_ARG start_ARG italic_c ( italic_x ) end_ARG ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - italic_κ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - italic_i italic_ϵ with κ2,ϵ>0superscript𝜅2italic-ϵ0\kappa^{2},\epsilon>0italic_κ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , italic_ϵ > 0, we can reformulate the Helmholtz equation in Eq. 1 as:

𝒮u(x)[2+κ2+iϵ]u(x)=ρ(x)v(x)u(x).superscript𝒮𝑢𝑥delimited-[]superscript2superscript𝜅2𝑖italic-ϵ𝑢𝑥𝜌𝑥𝑣𝑥𝑢𝑥\mathcal{S}^{\prime}u(x)\triangleq\left[\nabla^{2}+\kappa^{2}+i\epsilon\right]% u(x)=-\rho(x)-v(x)u(x).caligraphic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT italic_u ( italic_x ) ≜ [ ∇ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_κ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_i italic_ϵ ] italic_u ( italic_x ) = - italic_ρ ( italic_x ) - italic_v ( italic_x ) italic_u ( italic_x ) . (5)

This leads to the derivation of the Born series for the Helmholtz equation:

u=𝒢ρ+𝒢vuu=n=0(𝒢v)n𝒢ρ,𝑢𝒢𝜌𝒢𝑣𝑢𝑢superscriptsubscript𝑛0superscript𝒢𝑣𝑛𝒢𝜌u=\mathcal{G}\rho+\mathcal{G}vu\implies u=\sum_{n=0}^{\infty}(\mathcal{G}v)^{n% }\mathcal{G}\rho,italic_u = caligraphic_G italic_ρ + caligraphic_G italic_v italic_u ⟹ italic_u = ∑ start_POSTSUBSCRIPT italic_n = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT ( caligraphic_G italic_v ) start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT caligraphic_G italic_ρ , (6)

where 𝒢=1(p2κ2iϵ)1𝒢superscript1superscriptsuperscript𝑝2superscript𝜅2𝑖italic-ϵ1\mathcal{G}=\mathcal{F}^{-1}\circ(p^{2}-\kappa^{2}-i\epsilon)^{-1}\circ% \mathcal{F}caligraphic_G = caligraphic_F start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ∘ ( italic_p start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - italic_κ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - italic_i italic_ϵ ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ∘ caligraphic_F is the Green operator of 𝒮superscript𝒮\mathcal{S}^{\prime}caligraphic_S start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT, \mathcal{F}caligraphic_F is the Fourier transform operator and p2=px2+py2superscript𝑝2superscriptsubscript𝑝𝑥2superscriptsubscript𝑝𝑦2p^{2}=p_{x}^{2}+p_{y}^{2}italic_p start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = italic_p start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_p start_POSTSUBSCRIPT italic_y end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT is the squared magnitude of the Fourier transformed coordinates. While the Born Series has been effective for solving Helmholtz equations with weak scattering potentials, its convergence is limited in the presence of strongly scattering samples. This poses challenges for practical wave simulations, including biomedical USCT.

To address this limitation, Osnabrugge et al. [31] proposed a modification to the Born series by introducing a preconditioner q=iv/ϵ𝑞𝑖𝑣italic-ϵq=iv/\epsilonitalic_q = italic_i italic_v / italic_ϵ. This approach involves multiplying both sides of Eq. 6 with q𝑞qitalic_q, leading to a new formulation

qu=q𝒢ρ+q𝒢vuu=n=0()n(qρ),=q𝒢v+(1q),\begin{split}qu=q\mathcal{G}\rho+q\mathcal{G}vu\implies\quad\quad\quad\quad\\ u=\sum_{n=0}^{\infty}(\mathcal{M})^{n}(q\mathcal{M}\rho),\mathcal{M}=q\mathcal% {G}v+(1-q),\end{split}start_ROW start_CELL italic_q italic_u = italic_q caligraphic_G italic_ρ + italic_q caligraphic_G italic_v italic_u ⟹ end_CELL end_ROW start_ROW start_CELL italic_u = ∑ start_POSTSUBSCRIPT italic_n = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∞ end_POSTSUPERSCRIPT ( caligraphic_M ) start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT ( italic_q caligraphic_M italic_ρ ) , caligraphic_M = italic_q caligraphic_G italic_v + ( 1 - italic_q ) , end_CELL end_ROW (7)

Here, the newly introduced operator \mathcal{M}caligraphic_M maintains a spectral radius smaller than 1 with a suitable choice of ϵitalic-ϵ\epsilonitalic_ϵ and κ0subscript𝜅0\kappa_{0}italic_κ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, ensuring convergence for strong scattering potentials. Consequently, this modified Born series is termed the Convergent Born Series (CBS).

4.2.2 Network Architecture

The NBSO’s network architecture is crafted based on the iterative structure of the CBS method. As depicted in Fig. 3, the NBSO comprises three distinct phases: 1) encoding the sound speed c𝑐citalic_c and the source ρ𝜌\rhoitalic_ρ into a latent space, 2) iteratively updating the latent wavefield states u^nsubscript^𝑢𝑛\hat{u}_{n}over^ start_ARG italic_u end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT using NBSO layers, and 3) projecting the final latent state back into the physical space. This architecture is mathematically represented as follows:

Phase 1:vc=θc(c),u^0=w0=θ𝚒𝚗𝚒𝚝(c,ρ),Phase 2:wn+1=θn(wn,vc),u^n+1=u^n+wn+1,Phase 3:u𝚘𝚞𝚝=𝒟θ𝙿𝚛𝚎𝚍(u^N).formulae-sequenceformulae-sequencePhase 1:subscript𝑣𝑐subscriptsubscript𝜃𝑐𝑐subscript^𝑢0subscript𝑤0subscriptsubscript𝜃𝚒𝚗𝚒𝚝𝑐𝜌formulae-sequencePhase 2:subscript𝑤𝑛1subscriptsubscript𝜃𝑛subscript𝑤𝑛subscript𝑣𝑐formulae-sequencesubscript^𝑢𝑛1subscript^𝑢𝑛subscript𝑤𝑛1Phase 3:subscript𝑢𝚘𝚞𝚝subscript𝒟subscript𝜃𝙿𝚛𝚎𝚍subscript^𝑢𝑁\begin{split}\text{Phase $1$:}&\quad v_{c}=\mathcal{E}_{\theta_{c}}(c),\quad% \hat{u}_{0}=w_{0}=\mathcal{E}_{\theta_{\text{init}}}(c,\rho),\\ \text{Phase $2$:}&\quad w_{n+1}=\mathcal{M}_{\theta_{n}}(w_{n},v_{c}),\quad% \hat{u}_{n+1}=\hat{u}_{n}+w_{n+1},\\ \text{Phase $3$:}&\quad u_{\text{out}}=\mathcal{D}_{\theta_{\text{Pred}}}(\hat% {u}_{N}).\end{split}start_ROW start_CELL Phase 1 : end_CELL start_CELL italic_v start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT = caligraphic_E start_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_c ) , over^ start_ARG italic_u end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = italic_w start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = caligraphic_E start_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT init end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_c , italic_ρ ) , end_CELL end_ROW start_ROW start_CELL Phase 2 : end_CELL start_CELL italic_w start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT = caligraphic_M start_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_w start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , italic_v start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) , over^ start_ARG italic_u end_ARG start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT = over^ start_ARG italic_u end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT + italic_w start_POSTSUBSCRIPT italic_n + 1 end_POSTSUBSCRIPT , end_CELL end_ROW start_ROW start_CELL Phase 3 : end_CELL start_CELL italic_u start_POSTSUBSCRIPT out end_POSTSUBSCRIPT = caligraphic_D start_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT Pred end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( over^ start_ARG italic_u end_ARG start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT ) . end_CELL end_ROW (8)

In this model, vcsubscript𝑣𝑐v_{c}italic_v start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT is the latent state of sound speed, u^nsubscript^𝑢𝑛\hat{u}_{n}over^ start_ARG italic_u end_ARG start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT is the latent state of the wavefield, wnsubscript𝑤𝑛w_{n}italic_w start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT is the additive correction per NBSO layer, u𝚘𝚞𝚝subscript𝑢𝚘𝚞𝚝u_{\text{out}}italic_u start_POSTSUBSCRIPT out end_POSTSUBSCRIPT is the output wavefield in the physical space, with n𝑛nitalic_n ranging from 00 to N𝑁Nitalic_N. The encoders θcsubscriptsubscript𝜃𝑐\mathcal{E}_{\theta_{c}}caligraphic_E start_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT end_POSTSUBSCRIPT and θ𝚒𝚗𝚒𝚝subscriptsubscript𝜃𝚒𝚗𝚒𝚝\mathcal{E}_{\theta_{\text{init}}}caligraphic_E start_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT init end_POSTSUBSCRIPT end_POSTSUBSCRIPT process sound speed and the initial wavefield, respectively, while θnsubscriptsubscript𝜃𝑛\mathcal{M}_{\theta_{n}}caligraphic_M start_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT acts as a CBS-like operator to update the latent wavefield. This operator is further expanded as

θn(wn,vc)=Φθn(1Rθn(wnvc)),subscriptsubscript𝜃𝑛subscript𝑤𝑛subscript𝑣𝑐subscriptΦsubscript𝜃𝑛superscript1subscript𝑅subscript𝜃𝑛subscript𝑤𝑛subscript𝑣𝑐\mathcal{M}_{\theta_{n}}(w_{n},v_{c})=\Phi_{\theta_{n}}\left(\mathcal{F}^{-1}% \circ R_{\theta_{n}}\circ\mathcal{F}(w_{n}v_{c})\right),caligraphic_M start_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_w start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT , italic_v start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) = roman_Φ start_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( caligraphic_F start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ∘ italic_R start_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∘ caligraphic_F ( italic_w start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT italic_v start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ) ) , (9)

where ΦθnsubscriptΦsubscript𝜃𝑛\Phi_{\theta_{n}}roman_Φ start_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT is a neural network and Rθnsubscript𝑅subscript𝜃𝑛R_{\theta_{n}}italic_R start_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT end_POSTSUBSCRIPT represents the trainable Fourier modes. Each NBSO layer is assigned independent weights to enhance the modeling capacity. The decoder 𝒟θ𝙿𝚛𝚎𝚍subscript𝒟subscript𝜃𝙿𝚛𝚎𝚍\mathcal{D}_{\theta_{\text{Pred}}}caligraphic_D start_POSTSUBSCRIPT italic_θ start_POSTSUBSCRIPT Pred end_POSTSUBSCRIPT end_POSTSUBSCRIPT then predicts the wavefield in physical space.

Our NBSO aligns closely with the CBS method, differing from other Born series inspired methods in that the initial iteration state depends on both c𝑐citalic_c and ρ𝜌\rhoitalic_ρ, not just ρ𝜌\rhoitalic_ρ. This design significantly enhances convergence and accuracy, especially in scenarios involving strongly scattering media.

Moreover, our studies suggest that learning the complex scattering amplitude is much more efficient than approximating the entire wavefield directly. In this methodology, the complete wavefield prediction is constructed as the product of the neural network’s output (namely scattering amplitude), u𝚘𝚞𝚝subscript𝑢𝚘𝚞𝚝u_{\text{out}}italic_u start_POSTSUBSCRIPT out end_POSTSUBSCRIPT, and the homogeneous field for the source ρ𝜌\rhoitalic_ρ in water, u𝚒𝚗(ρ)subscript𝑢𝚒𝚗𝜌u_{\text{in}}(\rho)italic_u start_POSTSUBSCRIPT in end_POSTSUBSCRIPT ( italic_ρ ),

u𝙿𝚛𝚎𝚍=(1+u𝚘𝚞𝚝)u𝚒𝚗(ρ).subscript𝑢𝙿𝚛𝚎𝚍1subscript𝑢𝚘𝚞𝚝subscript𝑢𝚒𝚗𝜌u_{\text{Pred}}=(1+u_{\text{out}})u_{\text{in}}(\rho).italic_u start_POSTSUBSCRIPT Pred end_POSTSUBSCRIPT = ( 1 + italic_u start_POSTSUBSCRIPT out end_POSTSUBSCRIPT ) italic_u start_POSTSUBSCRIPT in end_POSTSUBSCRIPT ( italic_ρ ) . (10)

This approach significantly alleviates the challenge for the neural operator in terms of learning the high-frequency oscillations that are characteristic of the homogeneous field.

Refer to caption
Figure 3: Overview of NBSO architecture: The NBSO closely follows the CBS solver process, involving: 1) Encoding sound speed c𝑐citalic_c and source ρ𝜌\rhoitalic_ρ into latent states, vcsubscript𝑣𝑐v_{c}italic_v start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT, u^0subscriptnormal-^𝑢0\hat{u}_{0}over^ start_ARG italic_u end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and w0subscript𝑤0w_{0}italic_w start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT; 2) Iteratively updating these latent states using NBSO layers; 3) Projecting the final latent state back into physical space to form the wavefield.

4.3 NBSO-based USCT FWI

Having established the NBSO as a precise and efficient surrogate model for the Helmholtz equation, we are well-prepared to compute the FWI gradient via the adjoint method, as specified in Eq. 4. This computation involves running the NBSO twice for each transducer source, which includes conducting a forward simulation followed by residual back-propagation. Armed with the calculated gradient, we then employ the Limited-memory BFGS algorithm is to reconstruct the sound speed of target biological tissues.

Refer to caption
Figure 4: Wavefield predictions for a breast by NBSO, FNO, BFNO, and UNet (real-part displayed). The CBS result serves as the ground-truth reference. NBSO reveals superior accuracy in capturing high-frequency features relative to the other neural network approaches.

5 Results

In this section, we evaluate the efficacy of our proposed method for both forward simulations and inverse problems in USCT. In Sec. 5.1, we begin by comparing NBSO against three esteemed neural network baselines—FNO [20], BFNO [48], and UNet [34]—in solving forward Helmholtz equations. Subsequently, in Sec. 5.2, we delve into the performance of NBSO in FWI reconstructions across different scenarios. These outcomes are also benchmarked against results from other neural networks and conventional CBS solvers. All comparisons utilize datasets from both breast and brain USCT datasets. Implementation details is provided in the supplementary material. The code will be made publicly available upon request.

5.1 USCT Forward Simulation

In this section, we conduct extensive testing of NBSO’s prediction accuracy on our realistic USCT dataset, as described in Sec. 4.1. We implement two distinct settings for our analysis: 1) we train the neural networks using data with a fixed source located on the left-most transducer, and 2) we train neural networks utilizing data from 64 distinct sources. Essentially, the first setting assesses NBSO’s proficiency in mapping sound speed to wavefield, whereas the second evaluates its generalization capability in solving the Helmholtz Equation with varied source locations. We measure wavefield prediction accuracy quantitatively using the relative rooted mean squared error (RRMSE) between the neural network’s predicted wavefield u^(x)^𝑢𝑥\hat{u}(x)over^ start_ARG italic_u end_ARG ( italic_x ) and the true wavefield u(x)𝑢𝑥u(x)italic_u ( italic_x )

RRMSE(u(x),u^(x))=|u(x)u^(x)|2𝑑x|u(x)|2𝑑x𝑅𝑅𝑀𝑆𝐸𝑢𝑥^𝑢𝑥superscript𝑢𝑥^𝑢𝑥2differential-d𝑥superscript𝑢𝑥2differential-d𝑥RRMSE(u(x),\hat{u}(x))=\sqrt{\frac{\int|u(x)-\hat{u}(x)|^{2}dx}{\int|u(x)|^{2}% dx}}italic_R italic_R italic_M italic_S italic_E ( italic_u ( italic_x ) , over^ start_ARG italic_u end_ARG ( italic_x ) ) = square-root start_ARG divide start_ARG ∫ | italic_u ( italic_x ) - over^ start_ARG italic_u end_ARG ( italic_x ) | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_d italic_x end_ARG start_ARG ∫ | italic_u ( italic_x ) | start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_d italic_x end_ARG end_ARG (11)

Both breast and brain datasets are used to test all baseline methods. Considering the unequal number of breast and brain phantoms, we ensure comparable metrics by training and validating neural networks using a subset of breasts (2400 for training, 240 for validation) and the entire set of brains (1500 for training, 90 for validation).

Single-source Multi-source
Brain Breast Brain Breast
NBSO 19.2 20.8 19.0 20.7
FNO 20.8 21.9 19.6 23.2
BFNO 19.7 21.5 19.1 21.7
UNet 22.6 31.8 23.1 27.2
NBSO{}^{\star}start_FLOATSUPERSCRIPT ⋆ end_FLOATSUPERSCRIPT 19.7 22.1 - -
FNO{}^{\star}start_FLOATSUPERSCRIPT ⋆ end_FLOATSUPERSCRIPT 21.8 22.2 - -
BFNO{}^{\star}start_FLOATSUPERSCRIPT ⋆ end_FLOATSUPERSCRIPT 20.1 22.2 - -
UNet{}^{\star}start_FLOATSUPERSCRIPT ⋆ end_FLOATSUPERSCRIPT 24.6 32.5 - -
Table 1: Comparison of the prediction error (RRMSE) between NBSO and baseline neural networks (FNO, BFNO, UNet). Each row displays results from a specific network architecture, while each column specifies the dataset and source configuration for neural network training. Methods marked by a star do not incorporate scattering amplitude learning. NBSO consistently outperforms other baseline neural networks in prediction accuracy, especially in the multi-source setting.
Refer to caption
Figure 5: USCT FWI reconstruction of breasts using CBS, NBSO, FNO, BFNO, and UNet. Results of two breast types ((a) fibroglandular and (b) all-fatty) and three SNR levels (Noise-Free, 10dB, 5dB) are reported. 256 transducers are involved for emitting and receiving. NBSO consistently outperforms other neural network-based methods in reconstruction quality, especially evident for the all-fatty breast.

The RRMSE results for NBSO and other baseline neural networks are presented in Table 1. In our experiments, we not only adhere to the original formulations (marked by superscripted star) of FNO, BFNO, and UNet, but also test their modified versions (without star) integrated with our scattering amplitude learning methodology. As demonstrated in the left half of Table 1, the scattering amplitude learning improves all frameworks in the fixed source scenario. Consequently, in the experiments with multiple sources, network performance is evaluated only with the scattering amplitude incorporated. As demonstrated in Table 1, NBSO surpasses all other baselines in both breast and brain datasets, regardless of the source configurations. Fig. 4 provides a visual comparison of the predicted wavefields for a representative breast phantom, as generated by various baselines, all trained on breast datasets and assuming multiple sources. NBSO captures high-frequency features with greater accuracy than other methods.

5.2 USCT Full Waveform Inversion

In this section, we evaluate the reconstruction quality of USCT FWI using the NBSO alongside various baseline methods. Our experimental setup simulates the USCT system with 256 transducers uniformly distributed around the ring, tasked with source activation and signal collection. As detailed in Sec. 4.3, image reconstruction for all methods is conducted using the adjoint method in conjunction with the LBFGS optimizer, and the quality of image reconstruction is assessed using SSIM and PSNR metrics. Given that our USCT FWI requires computing wavefields from multiple sources for gradient calculation, only neural networks trained in the multi-source setting are used in this section.

Table. 2 showcases the USCT FWI’s performance on 30 breast phantoms using NBSO, FNO, UNet, BFNO, and CBS solvers at different SNR levels (noise-free, 10dB, 5dB) for observed data. NBSO consistently outperforms all neural network-based methods in reconstruction quality, regardless of noise levels. This is further illustrated in Fig. 5, where we display the reconstruction results for breast phantoms of two different types, all-fatty and fibroglandular. The NBSO-based FWI effectively recovers key anatomical features in both cases, whereas other neural networks occasionally introduce significant artifacts. The superior performance of NBSO over other baselines is especially evident when applied to all-fatty breast tissues, where the higher contrast of sound speed indicates stronger scattering effects. Remarkably, the addition of measurement noise does not substantially degrades NBSO’s reconstructions, suggesting that the neural operator might possess some implicit prior knowledge about the target, acquired from training on anatomically realistic data. However, it is also noted that NBSO’s FWI results appear blurrier compared to the ground-truth and CBS-FWI, indicating NBSO’s potential shortfall in capturing high-frequency wavefield features.

SNR Metrics CBS NBSO FNO BFNO UNet
Clean PSNR 33.1 27.8 27.0 26.1 26.1
SSIM 0.958 0.907 0.901 0.892 0.890
10dB PSNR 31.3 27.5 26.7 26.00 26.1
SSIM 0.886 0.894 0.894 0.888 0.888
5dB PSNR 28.2 27.2 26.6 25.9 25.9
SSIM 0.796 0.889 0.881 0.882 0.884
Table 2: Comparison of FWI reconstruction quality across 30 breast phantoms. Each row indicates the reconstruction PSNR & SSIM at a specific SNR level, and each column corresponds to a different neural network architecture. NBSO consistently exhibits superior reconstruction quality compared to other neural networks.
Quantity Single Batch
CBS(CPU) mean 0.197s 47.994s
std 0.005s 1.26s
CBS(GPU) mean 0.036s 7.160s
std 0.003s 0.82s
NBSO(GPU) mean 0.004s 0.135s
std 0.0003s 0.015s
Table 3: Comparison of computation speeds between CBS and NBSO. NBSO is one to two orders of magnitude faster than CBS, with even greater performance in batch-processing scenarios.
Metrics Breast Brain
Mix PSNR 26.0 26.8
training SSIM 0.882 0.875
Breast PSNR 26.5 17.6
only SSIM 0.886 0.644
Brain PSNR 21.5 23.5
only SSIM 0.829 0.771
Table 4: Comparison of FWI reconstruction quality with NBSO trained on different datasets. Each row specifies the training set, while each column corresponds to different validation set.
Refer to caption
Figure 6: USCT FWI reconstruction using NBSO trained on different datasets. In descending order, the rows correspond to the ground-truth alongside FWI results from NBSO trained on mixed, breast-only, and brain-only data. NBSO-based FWI works well in reconstructing in-distribution data but struggles with out-of-distribution images. Mixed dataset training enhances generalizability, yielding the best performance in brain imaging.

While NBSO’s reconstruction quality may slightly lag behind that of CBS, its computational efficiency is substantially higher, as demonstrated in Table 3. We use Intel i7-13700KF and NVIDIA RTX4080ti for speed evaluations. NBSO surpasses GPU-accelerated CBS by an order of magnitude and CPU-based CBS by two orders of magnitude for a single source. Given the capability of neural networks for batch processing—predicting the wavefields of multiple inputs parallelly—the efficiency gap between NBSO and CBS becomes even more pronounced. In summary, NBSO presents a well-balanced solution, offering reasonable reconstruction quality alongside superior efficiency.

We further investigate the adaptability of NBSO-based FWI to out-of-distribution targets. As reported in Table 4 and illustrated in Fig. 6, we assess the reconstruction quality on both breast and brain phantoms, using NBSO trained on breast data, brain data, or a combination of both. NBSO demonstrates competent performance on targets that are within its training distribution. However, it shows a marked decrease in effectiveness for targets significantly divergent from its training set. Notably, training NBSO on a mixed dataset enhances its generalizability across datasets, yielding better image reconstruction quality than training on a single data type for brain imaging.

6 Conclusion

In this paper, we present the NBSO, a novel operator learning method designed to solve the Helmholtz equation with high accuracy and efficiency. Building on this, we developed an NBSO-based FWI pipeline for biomedical USCT image reconstruction. Our approach, rigorously trained and validated on breast and brain datasets under experimental USCT conditions, surpasses leading neural network baselines such as FNO, BFNO, and UNet. This success points to the feasibility of quasi-real-time USCT reconstruction using neural networks, opening new avenues for their clinical applications.

References

  • zbo [2019] fastMRI: An open dataset and benchmarks for accelerated MRI. arXiv, 2019.
  • Bachmann and Tromp [2020] Etienne Bachmann and Jeroen Tromp. Source encoding for viscoacoustic ultrasound computed tomography. The Journal of the Acoustical Society of America, 147(5):3221--3235, 2020.
  • Bernard et al. [2017] Simon Bernard, Vadim Monteiller, Dimitri Komatitsch, and Philippe Lasaygues. Ultrasonic computed tomography based on full-waveform inversion for bone quantitative imaging. Physics in Medicine and Biology, 62(17):7011--7035, 2017.
  • Cai et al. [2021] Shengze Cai, Zhicheng Wang, Lu Lu, Tamer A. Zaki, and George Em Karniadakis. DeepM&mnet: Inferring the electroconvection multiphysics fields based on operator approximation by neural networks. Journal of Computational Physics, 436:110296, 2021.
  • Cueto et al. [2022] Carlos Cueto, Lluis Guasch, Javier Cudeiro, Òscar Calderón Agudo, Thomas Robins, Oscar Bates, George Strong, and Meng-Xing Tang. Spatial response identification enables robust experimental ultrasound computed tomography. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, 69(1):27--37, 2022.
  • Dai et al. [2023] Haocheng Dai, Michael Penwarden, Robert M. Kirby, and Joshi Sarang. Neural Operator Learning for Ultrasound Tomography Inversion. arXiv, 2023.
  • Di Leoni et al. [2021] P. Clark Di Leoni, L. Lu, C. Meneveau, G. Karniadakis, and T. A. Zaki. DeepONet prediction of linear instability waves in high-speed boundary layers. arXiv, 2021.
  • Fincke et al. [2022] Jonathan Fincke, Xiang Zhang, Bonghun Shin, Gregory Ely, and Brian W. Anthony. Quantitative sound speed imaging of cortical bone and soft tissue: Results from observational data sets. IEEE Transactions on Medical Imaging, 41(3):502--514, 2022.
  • Guasch et al. [2020] Lluís Guasch, Oscar Calderón Agudo, Meng-Xing Tang, Parashkev Nachev, and Michael Warner. Full-waveform inversion imaging of the human brain. npj Digital Medicine, 3(1):28, 2020.
  • Guibas et al. [2022] John Guibas, Morteza Mardani, Zongyi Li, Andrew Tao, Anima Anandkumar, and Bryan Catanzaro. Adaptive fourier neural operators: Efficient token mixers for transformers. arXiv, 2022.
  • He et al. [2023] Weiguang He, Yubing Li, and Lu Liu. Elastic vertically transversely isotropic full-waveform inversion: From synthetic to field data. Geophysics, 88(3):1MJ--V289, 2023.
  • Huang et al. [2014] Lianjie Huang, Youzuo Lin, Zhigang Zhang, Yassin Labyed, Sirui Tan, Nghia Q. Nguyen, Kenneth M. Hanson, Daniel Sandoval, and Michael Williamson M.d. Breast ultrasound waveform tomography: using both transmission and reflection data, and numerical virtual point sources. In Medical Imaging 2014: Ultrasonic Imaging and Tomography, pages 187--198. SPIE, 2014.
  • Kirillov et al. [2023] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick. Segment anything. arXiv, 2023.
  • Kleinman et al. [2020] R. Kleinman, G. Roach, and P Van Den Berg. Convergent born series improves the accuracy of numerical solution of timeindependent photoacoustic wave equation. 67(9):849–--855, 2020.
  • Kovachki et al. [2021] Nikola Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural Operator: Learning Maps Between Function Spaces. arXiv, 2021.
  • Lee et al. [2022] Moosung Lee, Hervé Hugonnet, and YongKeun Park. Inverse problem solver for multiple light scattering using modified born series. Optica, 9(2):177--182, 2022.
  • Li et al. [2022] Fu Li, Umberto Villa, Seonyeong Park, and Mark A. Anastasio. 3-d stochastic numerical breast phantoms for enabling virtual imaging trials of ultrasound computed tomography. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, 69(1):135--146, 2022.
  • Li et al. [2023] Yubing Li, Jian Wang, Chang Su, Weijun Lin, Xiuming Wang, and Yi Luo. Quantitative ultrasound brain imaging with multiscale deconvolutional waveform inversion. Chinese Physics B, 32(1), 2023.
  • Li et al. [2020] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural operator: Graph kernel network for partial differential equations. arXiv, 2020.
  • Li et al. [2021] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. In ICLR, 2021.
  • Lin et al. [2021] Chensen Lin, Zhen Li, Lu Lu, Shengze Cai, Martin Maxey, and George Em Karniadakis. Operator learning for predicting multiscale bubble growth dynamics. J Chem Phys, 154(10):104118, 2021.
  • Lin and Huang [2014] Youzuo Lin and Lianjie Huang. Acoustic- and elastic-waveform inversion using a modified total-variation regularization scheme. Geophysical Journal International, 200(1):489--502, 2014.
  • Lozenski et al. [2023] Luke Lozenski, Hanchen Wang, Fu Li, Mark A Anastasio, Brendt Wohlberg, Youzuo Lin, and Umberto Villa. Learned Full Waveform Inversion Incorporating Task Information for Ultrasound Computed Tomography. arXiv, 2023.
  • Lu et al. [2019] Lu Lu, Pengzhan Jin, and George Karniadakis. DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators. arXiv, 2019.
  • Lu et al. [2021] Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George EM Karniadakis. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3, 2021.
  • Lu et al. [2022a] Lu Lu, Xuhui Meng, Shengze Cai, Zhiping Mao, Somdatta Goswami, Zhongqiang Zhang, and George EM Karniadakis. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Computer Methods in Applied Mechanics and Engineering, 393, 2022a.
  • Lu et al. [2022b] Lu Lu, Raphaël Pestourie, Steven G. Johnson, and Giuseppe Romano. Multifidelity deep neural operators for efficient learning of partial differential equations with application to fast inverse design of nanoscale heat transport. Phys. Rev. Res., 4(2):023210, 2022b. Publisher: American Physical Society.
  • Lucka et al. [2021] Felix Lucka, Mailyn Pérez-Liva, Bradley E Treeby, and Ben T Cox. High resolution 3d ultrasonic breast imaging by time-domain full waveform inversion. Inverse Problems, 38(2):025008, 2021.
  • Mao et al. [2021] Zhiping Mao, Lu Lu, Olaf Marxen, Tamer A. Zaki, and George Em Karniadakis. DeepM&mnet for hypersonics: Predicting the coupled flow and finite-rate chemistry behind a normal shock using neural-network approximation of operators. Journal of Computational Physics, 447:110698, 2021.
  • Martiartu et al. [2020] Naiara Korta Martiartu, Christian Boehm, and Andreas Fichtner. 3-d wave-equation-based finite-frequency tomography for ultrasound computed tomography. IEEE Trans. Ultrason., Ferroelect., Freq. Contr., 67(7):1332--1343, 2020.
  • Osnabrugge et al. [2016] Gerwin Osnabrugge, Saroch Leedumrongwatthanakun, and Ivo M. Vellekoop. A convergent born series for solving the inhomogeneous helmholtz equation in arbitrarily large media. Journal of Computational Physics, 322:113--124, 2016.
  • Pathak et al. [2022] Jaideep Pathak, Shashank Subramanian, Peter Harrington, Sanjeev Raja, Ashesh Chattopadhyay, Morteza Mardani, Thorsten Kurth, David Hall, Zongyi Li, Kamyar Azizzadenesheli, Pedram Hassanzadeh, Karthik Kashinath, and Animashree Anandkumar. FourCastNet: A global data-driven high-resolution weather model using adaptive fourier neural operators. arXiv, 2022.
  • Pérez-Liva et al. [2017] M. Pérez-Liva, J. L. Herraiz, J. M. Udías, E. Miller, B. T. Cox, and B. E. Treeby. Time domain reconstruction of sound speed and attenuation in ultrasound computed tomography using full wave inversion. J Acoust Soc Am, 141(3):1595, 2017.
  • Ronneberger et al. [2015] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention--MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, pages 234--241. Springer, 2015.
  • Stanziola et al. [2023] Antonio Stanziola, Simon Arridge, Ben T Cox, and Bradley E Treeby. A learned born series for highly-scattering media. JASA Express Letters, 3(5), 2023.
  • Virieux and Operto [2009] Jean Virieux and Stéphane Operto. An overview of full-waveform inversion in exploration geophysics. Geophysics, 74(6):WCC1--WCC26, 2009.
  • Wang et al. [2021a] Ning Wang, Wei Yan, Yurui Qu, Siqi Ma, Stan Z Li, and Min Qiu. Intelligent designs in nanophotonics: from optimization towards inverse creation. PhotoniX, 2(1):1--35, 2021a.
  • Wang et al. [2021b] Sifan Wang, Mohamed Aziz Bhouri, and Paris Perdikaris. Fast PDE-constrained optimization via self-supervised operator learning. 2021b.
  • Wen et al. [2022] Gege Wen, Zongyi Li, Kamyar Azizzadenesheli, Anima Anandkumar, and Sally M. Benson. U-FNO—an enhanced fourier neural operator-based deep-learning model for multiphase flow. Advances in Water Resources, 163:104180, 2022.
  • Wiskin et al. [2017] J. W. Wiskin, D. T. Borup, E. Iuanow, J. Klock, and Mark W. Lenox. 3-d nonlinear acoustic inverse scattering: Algorithm and quantitative results. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, 64(8):1161--1174, 2017.
  • Wu et al. [2023] Xiaoqing Wu, Yubing Li, Chang Su, Panpan Li, Xiangda Wang, and Weijun Lin. Ultrasound computed tomography based on full waveform inversion with source directivity calibration. Ultrasonics, 132:107004, 2023.
  • Xiao et al. [2017] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv, 2017.
  • Xu et al. [2022] Dixiang Xu, Weiguang He, Lu Liu, and Yubing Li. Point cloud-based high-dimensional optimal transport for full waveform inversion. IEEE Transactions on Geoscience and Remote Sensing, 60:1--10, 2022.
  • Yang et al. [2023] Yan Yang, Angela F Gao, Kamyar Azizzadenesheli, Robert W Clayton, and Zachary E Ross. Rapid seismic waveform modeling and inversion with neural operators. IEEE Transactions on Geoscience and Remote Sensing, 61:1--12, 2023.
  • Yuan et al. [2023] Yu Yuan, Yue Zhao, Yang Xiao, Jing Jin, Naizhang Feng, and Yi Shen. Optimization of reconstruction time of ultrasound computed tomography with a piecewise homogeneous region-based refract-ray model. Ultrasonics, 127:106837, 2023.
  • Zeng et al. [2022] Qili Zeng, Shihang Feng, Brendt Wohlberg, and Youzuo Lin. InversionNet3d: Efficient and scalable learning for 3-d full-waveform inversion. IEEE Transactions on Geoscience and Remote Sensing, 60:1--16, 2022.
  • Zhang et al. [2012] Zhigang Zhang, Lianjie Huang, and Youzuo Lin. Efficient implementation of ultrasound waveform tomography using source encoding. In Medical Imaging 2012: Ultrasonic Imaging, Tomography, and Therapy, pages 22--31. SPIE, 2012.
  • Zhao et al. [2023] Qingqing Zhao, Yanting Ma, Petros Boufounos, Saleh Nabi, and Hassan Mansour. Deep born operator learning for reflection tomographic imaging. In ICASSP, 2023.
  • Zhou et al. [2023] Chenchen Zhou, Kailiang Xu, and Dean Ta. Frequency-domain full-waveform inversion-based musculoskeletal ultrasound computed tomography. The Journal of the Acoustical Society of America, 154(1):279--294, 2023.
  • Zhu et al. [2023] Min Zhu, Shihang Feng, Youzuo Lin, and Lu Lu. Fourier-DeepONet: Fourier-enhanced deep operator networks for full waveform inversion with improved accuracy, generalizability, and robustness. Computer Methods in Applied Mechanics and Engineering, 416:116300, 2023.
\thetitle

Supplementary Material

7 Adjoint Method for Frequency-domain FWI

The frequency-domain FWI can be formulated as a PDE-constrained optimization problem:

minc(x),uk(x)L=k=1MLk=k=1Mykuk(xf)22s.t.[2+(ωc(x))2]uk(x)=ρk(x).formulae-sequence𝑐𝑥subscript𝑢𝑘𝑥𝐿superscriptsubscript𝑘1𝑀subscript𝐿𝑘superscriptsubscript𝑘1𝑀superscriptsubscriptdelimited-∥∥subscript𝑦𝑘subscript𝑢𝑘subscript𝑥𝑓22𝑠𝑡delimited-[]superscript2superscript𝜔𝑐𝑥2subscript𝑢𝑘𝑥subscript𝜌𝑘𝑥\begin{split}\underset{c(x),u_{k}(x)}{\min}&\ L=\sum_{k=1}^{M}L_{k}=\sum_{k=1}% ^{M}\|y_{k}-u_{k}(x_{f})\|_{2}^{2}\\ s.t.&{\left[\nabla^{2}+\left(\frac{\omega}{c(x)}\right)^{2}\right]u_{k}(x)}=-% \rho_{k}(x).\end{split}start_ROW start_CELL start_UNDERACCENT italic_c ( italic_x ) , italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x ) end_UNDERACCENT start_ARG roman_min end_ARG end_CELL start_CELL italic_L = ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_M end_POSTSUPERSCRIPT italic_L start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_M end_POSTSUPERSCRIPT ∥ italic_y start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL italic_s . italic_t . end_CELL start_CELL [ ∇ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ( divide start_ARG italic_ω end_ARG start_ARG italic_c ( italic_x ) end_ARG ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x ) = - italic_ρ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x ) . end_CELL end_ROW (12)

A prevalent approach for computing the gradient, Lk/csubscript𝐿𝑘𝑐\partial L_{k}/\partial c∂ italic_L start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT / ∂ italic_c, in FWI is the adjoint method. Using the method of Lagrange multipliers, the problem can be converted into an unconstrained form

minc(x),uk(x),λk(x)=k=1Mk=k=1Mykuk(xf)22k=1Mλk(x),𝒮cuk(x)+ρk(x)𝑐𝑥subscript𝑢𝑘𝑥subscript𝜆𝑘𝑥superscriptsubscript𝑘1𝑀subscript𝑘superscriptsubscript𝑘1𝑀superscriptsubscriptdelimited-∥∥subscript𝑦𝑘subscript𝑢𝑘subscript𝑥𝑓22superscriptsubscript𝑘1𝑀subscript𝜆𝑘𝑥subscript𝒮𝑐subscript𝑢𝑘𝑥subscript𝜌𝑘𝑥\begin{split}\underset{c(x),u_{k}(x),\lambda_{k}(x)}{\min}\mathcal{L}=\sum_{k=% 1}^{M}\mathcal{L}_{k}=\sum_{k=1}^{M}\|y_{k}-u_{k}(x_{f})\|_{2}^{2}\\ -\sum_{k=1}^{M}\left\langle\lambda_{k}(x),\mathcal{S}_{c}u_{k}(x)+\rho_{k}(x)% \right\rangle\end{split}start_ROW start_CELL start_UNDERACCENT italic_c ( italic_x ) , italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x ) , italic_λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x ) end_UNDERACCENT start_ARG roman_min end_ARG caligraphic_L = ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_M end_POSTSUPERSCRIPT caligraphic_L start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_M end_POSTSUPERSCRIPT ∥ italic_y start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT ) ∥ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL - ∑ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_M end_POSTSUPERSCRIPT ⟨ italic_λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x ) , caligraphic_S start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x ) + italic_ρ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x ) ⟩ end_CELL end_ROW (13)

where \mathcal{L}caligraphic_L is the Lagrangian function, f,g𝑓𝑔\left\langle f,g\right\rangle⟨ italic_f , italic_g ⟩ denotes the real part of inner product of function f𝑓fitalic_f and g𝑔gitalic_g in L2()superscript𝐿2L^{2}(\mathbb{C})italic_L start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( blackboard_C ), yksubscript𝑦𝑘y_{k}italic_y start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT denotes the measurement obtained by transducer for source ρksubscript𝜌𝑘\rho_{k}italic_ρ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, and 𝒮csubscript𝒮𝑐\mathcal{S}_{c}caligraphic_S start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT is the differential operator

𝒮c()=[2+(ωc(x))2](),subscript𝒮𝑐delimited-[]superscript2superscript𝜔𝑐𝑥2\mathcal{S}_{c}(\cdot)=\left[\nabla^{2}+\left(\frac{\omega}{c(x)}\right)^{2}% \right](\cdot),caligraphic_S start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ( ⋅ ) = [ ∇ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + ( divide start_ARG italic_ω end_ARG start_ARG italic_c ( italic_x ) end_ARG ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] ( ⋅ ) , (14)

We then calculate the partial derivatives of ksubscript𝑘\mathcal{L}_{k}caligraphic_L start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT with respect to λk,uk,csubscript𝜆𝑘subscript𝑢𝑘𝑐\lambda_{k},u_{k},citalic_λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_c, respectively. Setting kλk(x)=0subscript𝑘subscript𝜆𝑘𝑥0\frac{\partial\mathcal{L}_{k}}{\partial\lambda_{k}}(x)=0divide start_ARG ∂ caligraphic_L start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG ( italic_x ) = 0 yields the Helmholtz equations itself. Similarly, enforcing kuk(x)=0subscript𝑘subscript𝑢𝑘𝑥0\frac{\partial\mathcal{L}_{k}}{\partial u_{k}}(x)=0divide start_ARG ∂ caligraphic_L start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG ( italic_x ) = 0 leads to the derivation of the adjoint equation,

𝒮cλk(x)=i=1M[uk(xf(i))yk(i)]δ(xf(i)),subscript𝒮𝑐subscript𝜆𝑘𝑥superscriptsubscript𝑖1𝑀delimited-[]subscript𝑢𝑘superscriptsubscript𝑥𝑓𝑖superscriptsubscript𝑦𝑘𝑖𝛿superscriptsubscript𝑥𝑓𝑖\mathcal{S}_{c}\lambda_{k}(x)=\sum_{i=1}^{M}[u_{k}(x_{f}^{(i)})-y_{k}^{(i)}]% \delta(x_{f}^{(i)}),caligraphic_S start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT italic_λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x ) = ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_M end_POSTSUPERSCRIPT [ italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ) - italic_y start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ] italic_δ ( italic_x start_POSTSUBSCRIPT italic_f end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_i ) end_POSTSUPERSCRIPT ) , (15)

where i𝑖iitalic_i denotes the index of USCT transducers and δ()𝛿\delta(\cdot)italic_δ ( ⋅ ) defines a normalized point source at a specific transducer location. Substituting Eq. 14 and Eq. 15 into k/csubscript𝑘𝑐\partial\mathcal{L}_{k}/\partial c∂ caligraphic_L start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT / ∂ italic_c results in

kc(x)=𝒮cc(x)λk(x)uk(x)=2ω2λk(x)uk(x)c(x)3,subscript𝑘𝑐𝑥subscript𝒮𝑐𝑐𝑥superscriptsubscript𝜆𝑘𝑥subscript𝑢𝑘𝑥2superscript𝜔2superscriptsubscript𝜆𝑘𝑥subscript𝑢𝑘𝑥𝑐superscript𝑥3\begin{split}\frac{\partial\mathcal{L}_{k}}{\partial c}(x)&=\frac{\partial% \mathcal{S}_{c}}{\partial c}(x)\lambda_{k}^{\star}(x)u_{k}(x)\\ &=-2\omega^{2}\frac{\lambda_{k}^{\star}(x)u_{k}(x)}{c(x)^{3}},\end{split}start_ROW start_CELL divide start_ARG ∂ caligraphic_L start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_c end_ARG ( italic_x ) end_CELL start_CELL = divide start_ARG ∂ caligraphic_S start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_c end_ARG ( italic_x ) italic_λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT ( italic_x ) italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x ) end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL = - 2 italic_ω start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT divide start_ARG italic_λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⋆ end_POSTSUPERSCRIPT ( italic_x ) italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x ) end_ARG start_ARG italic_c ( italic_x ) start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT end_ARG , end_CELL end_ROW (16)

The gradient is proportional to the product of two wavefields, where uk(x)subscript𝑢𝑘𝑥u_{k}(x)italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x ) is the forward simulation result for source ρksubscript𝜌𝑘\rho_{k}italic_ρ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT and λk(x)subscript𝜆𝑘𝑥\lambda_{k}(x)italic_λ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ( italic_x ) arises from the backward simulation whose source term is defined by the discrepancies between forward predictions and measured data.

8 Implementation Details

8.1 Numerical Settings for Data Generation

We refer to the Wavesim [31] for the details of the governing equations. We simulate the numerical solution on a square with side length of 24 centimeters. The step size of grid is 0.05cm thus the grid shape is 480×480480480480\times 480480 × 480. For boundary condition, we use an absorbing boundary layer (type ARL in Wavesim) whose boundary width is [75,75]7575[75,75][ 75 , 75 ] grid outside the solving region. We choose the points per wavelenght PPW=6𝑃𝑃𝑊6PPW=6italic_P italic_P italic_W = 6 for the absorbing boundary layer.

8.2 Hyperparameters of Baseline Models

FNO

We use a vanilla FNO model with 6 FNO layers whose modes are {[30]×6}delimited-[]306\{[30]\times 6\}{ [ 30 ] × 6 } and width is 20 in the single-source setting. For multi-source problem, we increase the layer number to 8 and set the modes and width as {[50]×9}delimited-[]509\{[50]\times 9\}{ [ 50 ] × 9 } and 40, respectively, to enlarge the representative ability.

BFNO

BFNO[48] is a modified version of FNO, so we use the same layer number and mode number as FNO for both settings.

UNet

We implement UNet using the same structure as [34] but a comparable model size to FNO for the sake of fairness.

  • Downsample Blocks: Each downsample block contains two Conv2d layer with LeakyReLU activation and GroupNorm. The first layer maps channel c𝚒𝚗subscript𝑐𝚒𝚗c_{\text{in}}italic_c start_POSTSUBSCRIPT in end_POSTSUBSCRIPT to c𝚘𝚞𝚝subscript𝑐𝚘𝚞𝚝c_{\text{out}}italic_c start_POSTSUBSCRIPT out end_POSTSUBSCRIPT with stride 2222 and kernel size 3333, while the second layer preserves the channel number but changes the stride number to 1111.

  • Upsample Blocks: The implementation of upsample blocks is the same as downsample blocks but with a skip channel in the first Conv2d layer.

We use the Unet structure with resolution size sequence {[60]×6,[120]×5,[240]×5,[480]×4}delimited-[]606delimited-[]1205delimited-[]2405delimited-[]4804\{[60]\times 6,[120]\times 5,[240]\times 5,[480]\times 4\}{ [ 60 ] × 6 , [ 120 ] × 5 , [ 240 ] × 5 , [ 480 ] × 4 } and 4 skip channels for Upsample block. An input block with the downsample structure using stride 1 is added to the beginning.

Problem Quantity NBSO FNO BFNO UNet
Single-source ##\## PARAM 10.1M 10.1M 10.1M 20.0M
##\## MEM 16.4G 12.0G 16.4G 11.8G
Multi-source ##\## PARAM 144M 144M 144M 36.0M
##\## MEM 35.5G 31.9G 35.5G 12.0G
Table 5: Number of trainable parameters (1st and 3rd row) and memory used (2nd and 4th row) of neural network models in single-source and multi-source setting.

8.3 Training of Neural Networks

The training of neural network models, including the baselines (FNO, BFNO, UNet), is performed with the ADAM optimizer, with a learning rate η=102𝜂superscript102\eta=10^{-2}italic_η = 10 start_POSTSUPERSCRIPT - 2 end_POSTSUPERSCRIPT, for 50 epochs in both single-source and multi-source settings. The training objective is to minimize the relative L2subscript𝐿2L_{2}italic_L start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT loss function, incorporating a weight decay rate of 105superscript10510^{-5}10 start_POSTSUPERSCRIPT - 5 end_POSTSUPERSCRIPT. We also implement a Cosine-Annealing learning rate scheduler with a step size of 10101010 and a minimum learning rate of ηmin=2×104subscript𝜂2superscript104\eta_{\min}=2\times 10^{-4}italic_η start_POSTSUBSCRIPT roman_min end_POSTSUBSCRIPT = 2 × 10 start_POSTSUPERSCRIPT - 4 end_POSTSUPERSCRIPT. In the multi-source setting, models are trained using 4 GPUs in the DeepSpeed framework with a batch size of 10, while single-source models are trained on a single GPU with the same batch size.

9 Results

9.1 Forward Simulation for Brain Phantoms

Fig. 7 offers a visual comparison of wavefields predicted by NBSO for a representative brain phantom, alongside those generated by baseline models trained on the same brain datasets and assuming multiple sources. NBSO captures high-frequency features with greater accuracy than other methods.

Refer to caption
Figure 7: Wavefield predictions for a brain by NBSO, FNO, BFNO, and UNet (real-part displayed). The CBS result serves as the ground-truth reference. NBSO reveals superior accuracy in capturing high-frequency features relative to the other neural network approaches.

9.2 USCT FWI for Brain Phantoms

Table. 6 showcases the USCT FWI’s performance on 30 brain phantoms using NBSO, FNO, UNet, BFNO, and CBS solvers at different SNR levels (noise-free, 10dB, 5dB) for observed data. NBSO consistently outperforms all neural network-based methods in reconstruction quality, regardless of noise levels. This is further illustrated in Fig. 8, where we display the reconstruction results for a brain phantom.

SNR Metrics CBS NBSO FNO BFNO UNet
Clean PSNR 28.3 23.8 23.0 22.6 21.0
SSIM 0.883 0.783 0.765 0.760 0.711
10dB PSNR 27.5 23.5 22.9 22.5 21.0
SSIM 0.818 0.762 0.757 0.758 0.709
5dB PSNR 25.7 23.1 22.7 22.4 20.9
SSIM 0.685 0.738 0.734 0.733 0.706
Table 6: Comparison of FWI reconstruction quality across 30 brain phantoms. Each row indicates the reconstruction PSNR & SSIM at a specific SNR level, and each column corresponds to a different neural network architecture. NBSO consistently exhibits superior reconstruction quality compared to other neural networks.
Refer to caption
Figure 8: USCT FWI reconstruction of brains using CBS, NBSO, FNO, BFNO, and UNet. Results of three SNR levels (Noise-Free, 10dB, 5dB) are reported. 256 transducers are involved for emitting and receiving. NBSO consistently outperforms other neural network-based methods in reconstruction quality.