Systematic Review

Reviewing Material-Sensitive Computed Tomography: From Handcrafted Algorithms to Modern Deep Learning

Institute for Technologies and Management of Digital Transformation, University of Wuppertal, 42119 Wuppertal, Germany
* Authors to whom correspondence should be addressed.
NDT 2024, 2(3), 286-310; https://doi.org/10.3390/ndt2030018
Submission received: 23 May 2024 / Revised: 10 July 2024 / Accepted: 22 July 2024 / Published: 30 July 2024

Abstract

Computed tomography (CT) is a widely utilised imaging technique in both clinical and industrial applications. CT scan results, presented as a volume revealing linear attenuation coefficients, are intricately influenced by scan parameters and the sample’s geometry and material composition. Accurately mapping these coefficients to specific materials is a complex task. Traditionally, material decomposition in CT relied on classical algorithms using handcrafted features based on X-ray physics. However, there is a rising trend towards data-driven approaches, particularly deep learning, which offer promising improvements in accuracy and efficiency. This survey explores the transition from classical to data-driven approaches in material-sensitive CT, examining a comprehensive corpus of literature identified through a detailed and reproducible search using Scopus. Our analysis addresses several key research questions: the origin and generation of training datasets, the models and architectures employed, the extent to which deep learning methods reduce the need for domain-specific expertise, and the hardware requirements for training these models. We explore the implications of these findings on the integration of deep learning into CT practices and the potential reduction in the necessity for extensive domain knowledge. In conclusion, this survey highlights a significant shift towards deep learning in material-resolving CT and discusses the challenges and opportunities this presents. The transition suggests a future where data-driven approaches may dominate, offering enhanced precision and robustness in material-resolving CT while potentially transforming the role of domain experts in the field.

1. Introduction

Clinical diagnosis and industrial engineering have undergone a revolutionary transformation since the advent of X-ray computed tomography (CT), beginning when Sir Godfrey Hounsfield performed the first clinical CT scan in 1971 using a scanner of his own invention [1]. The relevance and importance of this invention are evident in the fact that it earned him the Nobel Prize in Physiology or Medicine in 1979, which he shared with Allan Cormack [2]. Cormack proposed the mathematical background for the recombination of all X-ray projections, which is essentially the reconstruction algorithm [2]. CT, traditionally rooted in medical imaging, has found its way into diverse fields, such as aerospace, automotive, geology, and archaeology, proving its versatility. With CT scanners, it becomes possible to visualise structures within a volume by measuring X-ray projections through an object and subsequently reconstructing the structure of the object, resulting in a so-called volume. This allows for both qualitative inspection, such as flaw detection, and quantitative analysis, including metrology and material quantification. Essentially, the projections acquired by the detector represent the attenuation line integral along the X-ray's path through the scanned object. Following the scan, a reconstruction algorithm, e.g., the practical cone-beam algorithm originally proposed by Feldkamp et al. [3], computes a volume that reveals linear attenuation coefficients μ_L(x, y, z) per volume pixel (voxel). For a material-resolving algorithm, the goal is to discover a function that maps these attenuation coefficients to specific outputs. These outputs might be an atomic number, a density, or fractions of predetermined materials. We differentiate between algorithms that (a) decompose the volume into fractions of basis materials and (b) identify effective materials. Since the literature lacks a clear naming convention, we define identification and decomposition separately as follows. Algorithms that operate without prior knowledge of materials are referred to as material identification algorithms, and their purpose is to determine an effective material. Counterintuitively, the method called Z-Rho decomposition, which computes atomic numbers and densities, is therefore a material identification method. In the case of composites, however, these algorithms typically identify the average value, which may lead to misinterpretations of the outcomes in terms of atomic number and density. Conversely, algorithms that explicitly incorporate prior knowledge, particularly when using basis materials as the linearly independent basis, are termed material decomposition algorithms because they inherently decompose single voxels into fractions of these basis materials.
In this paper, we look for a methodical paradigm shift in material-resolving CT induced by modern deep learning trends. Depending on the specific task, it has been shown in a large number of application domains that deep learning is faster, more precise, and more robust compared to classical algorithms. Especially for computer vision tasks such as image processing, detection, classification, and segmentation, deep learning offers major benefits [4,5]. Post-processing CT data, which essentially consists of reconstructed volumes and projections that are images, can be viewed as an image-to-image translation task in computer vision, subject to the physical constraints of X-ray attenuation physics. This raises the main question of whether a comparable paradigm shift from classical feature engineering to deep learning can be found or expected soon in the field of CT. Such a paradigm shift might be driven by robust foundation models that are not limited to clinical or industrial CT, which is why we consider both domains in this paper. Therefore, we conduct a structured literature search and review, guided by four main research questions (RQs). The answers to these research questions may provide other researchers with a comprehensive overview of the current state of research and highlight possible next steps.
Training a deep learning model requires exemplary datasets, which significantly influence the model’s prediction quality. Thus, the origin or simulation method of these datasets is highly important, which will be the scope of RQ1. Next, we will examine the deep learning models in RQ2 regarding their architectures, training strategies, and whether they are specifically designed for material decomposition or identification tasks. As mentioned above, in computer vision, deep learning models have replaced manual (classical) feature engineering, almost entirely removing the need for extensive domain knowledge. The elimination of domain knowledge needed for material-resolving CT when using deep learning methods will be discussed in RQ3. Last but not least, we take a look at the hardware used for training the neural networks in RQ4. Our main contribution is an analysis of the expected paradigm shift in material-resolving CT with reference to the research questions:
RQ1: Are there common datasets or simulation techniques used for deep learning in material-resolving CT?
RQ2: Which deep learning models are used, and are they specifically engineered for material-sensitive CT applications?
RQ3: Does employing deep learning eliminate the necessity for expert domain knowledge in material-sensitive CT?
RQ4: What are the hardware requirements for training and executing the models?
We will answer these questions in the following sections, which are structured as follows. In Section 2, the physics background is provided, followed by a brief historical overview describing the roots of classical algorithms for material decomposition and material identification in computed tomography. The general methodology of the publication search and the selection criteria for the in-depth review are described in Section 3. Section 4 provides the in-depth review. The research questions mentioned above are answered in Section 5, which also provides a summary and an outlook.

2. Physics Background and History

For a number of photons I_0(E) at a certain energy E, the number of photons penetrating an object of intersection length s and linear attenuation coefficient μ_L(E) can be expressed by

$$I(E) = I_0(E) \cdot e^{-\mu_L(E) \cdot s} \quad (1)$$

which is known as the Beer–Lambert law. In combination with a reconstruction algorithm, a CT scanner is a machine used to measure the linear attenuation coefficients inside an object. The linear attenuation coefficients strongly depend on the object's material, as well as its density and the photon energies of the X-ray beam. Figure 1 shows the interaction processes that contribute to the measured linear attenuation coefficient for iron and photon energies up to 6 MeV.
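To make the relationship concrete, the following minimal sketch evaluates the Beer–Lambert law numerically for a single energy; the attenuation coefficient and thickness are purely illustrative values, not tabulated data for a specific material.

```python
import numpy as np

# Beer-Lambert law (Equation (1)) for a single photon energy.
# mu_L and s are illustrative values, not tabulated material data.
mu_L = 0.5       # linear attenuation coefficient in 1/cm at some energy E
s = 2.0          # intersection length through the object in cm
I0 = 1_000_000   # incident photons at energy E

I = I0 * np.exp(-mu_L * s)   # photons reaching the detector
print(f"transmitted fraction: {I / I0:.3f}")
```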
For energies up to 50 keV , the attenuation process is dominated by photoabsorption. The incoming photon is absorbed by an electron. If the energy is high enough, the electron is ejected from the atom. The steep increase in the cross-section of photoabsorption in the innermost energy levels creates a characteristic absorption peak in the attenuation spectrum visible at 7.1 keV , as indicated by the dashed line in Figure 1, which is called the K-edge. This process is called K-edge absorption, and it reveals characteristic information about the material from which the electron is ejected. The relaxation process of the ionised atom emits a characteristic photon, which may be detectable and used for material decomposition since its energy is unique to the material. This method is called X-ray fluorescence.
At greater photon energies, Compton scattering predominantly governs the attenuation process. This involves an inelastic scattering interaction between a photon from the X-ray beam and an electron within the material, causing attenuation. The photon's scattering angle is directly correlated with its energy loss, which is described by the Klein–Nishina equation. Starting at 1022 keV, the linear attenuation coefficient reveals a new type of attenuation known as pair production. The photon energy is high enough to produce an electron–positron pair close to a nucleus, which is mandatory to conserve overall momentum. At even higher energies, the photon can interact with an electron and produce an electron–positron pair, which is called triplet production. Pair production is significant primarily in the context of LINAC-CT, a form of computed tomography that employs a linear accelerator (LINAC) as the source of the high-energy electrons used to generate the X-ray beam.

2.1. Motivation and Strategies for Multi-Energy CT

X-ray sources apply an acceleration voltage to an electron beam, which determines the upper energy for the photons being produced. The left plot in Figure 2 shows a typical photon spectrum emitted by an X-ray source. For such a photon spectrum, one can define a quantity called the effective energy, E_e, which is the energy of a monochromatic X-ray beam exhibiting the same penetration as the polychromatic beam through an object [6]. Typical values for the effective energy range from 30% to 40% of the maximum photon energy in a polychromatic spectrum [6]. Still, the effective energy depends heavily on the shape of the photon spectrum itself, as well as on the attenuating material. In Figure 2 (left), the effective energy for each spectrum is highlighted by a dashed line. Qualitatively speaking, and interpreting the CT scanner as a machine to measure linear attenuation coefficients, the two photon spectra sample the linear attenuation curve from Figure 1 at different energies. Therefore, multiple CT scans using distinct photon spectra (multi-energy CT) reveal more information about the entirety of the linear attenuation behaviour of the scanned sample.
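The definition of the effective energy can be turned into a short numerical sketch: compute the polychromatic transmission of a spectrum through an object and find the monochromatic energy with the same transmission. The spectrum weights and aluminium attenuation values below are illustrative placeholders, not tabulated data.

```python
import numpy as np

# Illustrative (not tabulated) values: a coarse 30-150 keV spectrum and a
# monotonically decreasing linear attenuation coefficient of aluminium in 1/mm.
energies = np.array([30., 50., 70., 90., 110., 130., 150.])       # keV
weights  = np.array([0.05, 0.20, 0.30, 0.25, 0.12, 0.06, 0.02])   # relative photon counts
mu_al    = np.array([0.30, 0.15, 0.10, 0.08, 0.07, 0.065, 0.06])  # 1/mm (illustrative)
s = 1.0  # mm aluminium

# Polychromatic transmission of the whole spectrum (Beer-Lambert per energy bin).
T_poly = np.sum(weights * np.exp(-mu_al * s)) / np.sum(weights)

# Effective energy: the monochromatic energy whose transmission matches T_poly.
T_mono = np.exp(-mu_al * s)                # increases with energy since mu decreases
E_eff = np.interp(T_poly, T_mono, energies)
print(f"effective energy ~ {E_eff:.1f} keV")
```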
In the case of two measurements, or more precisely, energy channels, the method is called dual-energy CT (DECT). The acquisition of dual-energy scan data can be carried out using different detector technologies:
  • Energy-integrating detectors can be used in multiple setups to measure two distinct energy channels. The easiest setup is called a dual scan, where the source’s photon spectrum is varied by changing the pre-filtration and the acceleration voltage of the electrons in the X-ray tube. Due to the two subsequent, individual scans, a dual scan is technically possible on almost every CT scanner on the market. Since the subsequent scans are time-consuming and may reveal critical motion artefacts (e.g., patient motion in clinical CT), other scan setups may be preferred. Fast kV switching rapidly alternates the acceleration voltage of the X-ray tube in order to collect two distinct energy projections in quick succession for a given position during a scan. Dual-source setups include two X-ray tubes and two detectors to acquire the projections from two energy channels simultaneously. Sandwich detectors consist of two (or more) stacked layers, where the different layers measure the projections at different energy channels.
  • Energy-discriminating photon-counting detectors (PCD) originate from clinical CT, where ideally, every single photon is detected not only spatially on the detector but also with energy information. The counting event is then sorted into predefined energy channels, which are defined by thresholds. For an energy channel with a threshold of t_1 = 20 keV, a photon detected with E_γ = 40 keV will be counted in this energy channel.
For a comprehensive overview of dual-energy CT acquisition techniques and their modelling in a simulation, see Faby et al. [7]. Both detector technologies may output multiple energy channels, but the underlying mechanics of building up the signal are fundamentally different. Figure 2 shows two incident photon spectra in a dual-scan setup (left). For example, when using an EID, two subsequent scans can be conducted using 80 kV and 120 kV to generate a dual-energy scan. Energy-integrating detectors, as indicated by their name, integrate the incoming photon spectrum along the energy axis. With increasing energy, the photons contribute more to the overall signal measured by the detector. Note that the spectral proximity of the effective energies from the two spectra on the left side of Figure 2 decreases the contrast between the dual-energy volumes and, therefore, decreases the strength of the linear independence of the dual-energy measurement. On the right side of Figure 2, the counting probabilities for a photon-counting detector with two energy channels are shown. First, each photon creates a pulse in the detector, which is counted as an event. The height of the pulse is translated into the photon's energy, which enables the subsequent categorisation into a certain energy channel predefined using thresholds t. In Figure 2, the right plot shows the counting probabilities for two thresholds, t_1 and t_2, which separate the photon spectrum into two energy channels, C_1 and C_2. Specifically, the thresholds define the channels t_1 → C_1 = [30 keV, ∞[ and t_2 → C_2 = [60 keV, ∞[. In post-processing, these energy channels can be mapped into distinct energy bins. The low and high energy bins, B_low and B_high, are defined as

$$B_{\mathrm{low}} = C_1 \setminus C_2 \quad \text{and} \quad B_{\mathrm{high}} = C_2 \quad (2)$$
For the example in Figure 2, the calculation was performed for a 3 mm cadmium telluride detector using a method described by Schlomka et al. [8]. In contrast to the EID, the PCD ideally does not aggregate the energy of all photons. If the photon flux is high enough to exceed the maximum counting rate of the PCD, multiple photons can be falsely counted together, which is known as pile-up. Each photon’s individual energy is then summed, resulting in a counting event of one very high-energy photon. Depending on the specifications, the counting mechanism may differentiate more than two energy thresholds, yielding more energy channels. As indicated by Faby et al., additional energy channels may be employed to reduce noise in CT volumes [7].
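The thresholding and binning described above can be illustrated with a toy count-based sketch that ignores the detector response function and pile-up; the thresholds and the flat spectrum are assumptions chosen purely for illustration.

```python
import numpy as np

# Toy photon-counting binning, assuming an ideal detector (no response function,
# no pile-up): every photon is counted in each channel whose threshold it exceeds.
rng = np.random.default_rng(0)
photon_energies = rng.uniform(10, 120, size=100_000)  # keV, illustrative flat spectrum

t1, t2 = 30.0, 60.0                  # channel thresholds in keV
C1 = np.sum(photon_energies >= t1)   # counts in channel C1 = [30 keV, inf)
C2 = np.sum(photon_energies >= t2)   # counts in channel C2 = [60 keV, inf)

# Post-processing into disjoint energy bins as in Equation (2):
B_low  = C1 - C2   # corresponds to C1 \ C2 = [30 keV, 60 keV)
B_high = C2        # corresponds to C2      = [60 keV, inf)
print(B_low, B_high)
```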
Figure 2. Two photon spectra illuminating an energy-integrating detector (left). The 120 kV spectrum is prefiltered using 0.5 mm copper. The effective energies, drawn dashed, are evaluated using a 1 mm aluminium block. On the right, the photon-counting detection probabilities (response functions) of a 3 mm cadmium telluride (CdTe) detector up to 120 keV . The response functions for the CdTe-based detector are calculated using the model proposed by Schlomka et al. [8]. Initial X-ray spectra are simulated using a reflection target in aRTist [9]. The shown spectra are attenuated by 1 mm aluminium, which approximates the X-ray tube’s window.
As described and depicted for iron in Figure 1, each material has its own characteristic linear attenuation curve ( μ L depending on the photon energy E). If we were able to measure this entire curve or its characteristic regions/features for a given sample of an unknown material, which can also be a composite, the determination of the material would be clear. The most obvious feature is the K-edge mentioned above. This characteristic discontinuity appears at an energy that is absolutely unique for each material, or more precisely, each element on the periodic table. Even for multi-elemental mixtures, such as barium iodide ( BaI 2 ), two individual absorption peaks can be observed: a barium peak at 37.44 keV and an iodine peak at 33.17 keV . By adjusting the peak voltages of the X-ray spectra to produce effective energies that are slightly below and above a known K-edge energy, two images (or, after reconstruction, a dual-energy volume) can be generated that exhibit pronounced contrast in the material possessing this K-edge.
In this paper, this method of material-sensitive computed tomography is called K-edge imaging, and it may be a promising approach for clinical CT with high-Z contrast agents such as gadolinium, ytterbium, tantalum, tungsten, gold, and bismuth, as summarised by Jost et al. [10]. Currently, K-edge imaging is not available for many use cases, particularly when the acceleration voltage (and, thus, the maximum photon energy) reaches several hundred kilovolts, which is the typical energy domain of industrial CT scanners.
In terms of Figure 1, photons that undergo K-edge absorption are completely absorbed. Therefore, almost no photons from this energy range reach the detector to contribute to its signal. The photons that do reach the detector originate from higher energies, where no K-edge information is available. Acquiring a dual-energy image is still useful even when the linear attenuation coefficient behaves in a more or less continuous, almost linear fashion.

2.2. History

The hunt for a function mapping linear attenuation coefficients in the reconstructed volume to materials started almost 50 years ago. In 1976, Alvarez et al. described the linear attenuation coefficient μ_L (see Equation (1)) as a linear combination of basis functions with energy-independent constants a_1 and a_2, given by

$$\mu_L(x,y,z)(E) = \underbrace{a_1(x,y,z) \cdot \frac{1}{E^3}}_{\text{photo effect}} + \underbrace{a_2(x,y,z) \cdot f_{\mathrm{KN}}(E)}_{\text{Compton scattering}} \quad (3)$$

where the first part of the equation is an approximation of the photoabsorption, which is proportional to the inverse of E³ for a fixed material, and f_KN is the Klein–Nishina function describing the inelastic photon scattering (Compton effect) with electrons [11]. Two spectrally independent measurements are needed to solve Equation (3) and to determine a_1 and a_2. For a fixed scan setup, Alvarez described values of a_1 and a_2 to differentiate between human fat and brain tissue (see Table 1).
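A minimal sketch of how Equation (3) can be solved is given below, assuming idealised monochromatic measurements of μ_L at two energies; the measured values and energies are illustrative, and f_KN is the standard Klein–Nishina total cross-section shape.

```python
import numpy as np

# Sketch of an Alvarez-style basis decomposition (Equation (3)), assuming
# idealised monochromatic measurements of mu_L at two energies E1 and E2.
def f_kn(E_keV):
    """Klein-Nishina total cross-section shape (unit-free), E in keV."""
    a = E_keV / 511.0  # photon energy in units of the electron rest energy
    return ((1 + a) / a**2 * (2 * (1 + a) / (1 + 2 * a) - np.log(1 + 2 * a) / a)
            + np.log(1 + 2 * a) / (2 * a) - (1 + 3 * a) / (1 + 2 * a) ** 2)

E1, E2 = 40.0, 80.0      # keV
mu1, mu2 = 0.45, 0.21    # measured linear attenuation coefficients (illustrative)

# Two equations, two unknowns: mu(E) = a1 / E^3 + a2 * f_KN(E)
A = np.array([[E1**-3, f_kn(E1)],
              [E2**-3, f_kn(E2)]])
a1, a2 = np.linalg.solve(A, np.array([mu1, mu2]))
print(a1, a2)
```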
The purpose of their publication was to enhance clinical diagnostic abilities through the use of energy-selective reconstructions while keeping radiation exposure levels within normal ranges. Consequently, while its primary focus was not directly on identifying materials, it stands as a cornerstone in the field of material-sensitive computed tomography.
More than 25 years later, in 2003, Heismann et al. published a paper revisiting the idea behind Alvarez's spectral separation as a linear combination of a constant proportional to the photoelectric effect and a constant proportional to the Klein–Nishina equation [12]. Heismann et al. applied the same idea but with the goal of calculating the atomic number Z and the density ρ (Z-ρ decomposition). They derived a function to calculate Z and ρ directly from two given attenuation coefficients μ_1 and μ_2:

$$F(Z) = \frac{f_1(Z)}{f_2(Z)} = \frac{\mu_1}{\mu_2} \quad \text{with} \quad f_n(Z) = \int w_n(E) \left(\frac{\kappa}{\rho}\right)(E, Z)\, dE \quad \text{and} \quad n \in \{1, 2\} \quad (4)$$

where w_n(E) are weighting functions depending on the source and detector characteristics, and the mass attenuation coefficient (κ/ρ) is taken from numerical tables. In the example shown by Heismann, μ_1 and μ_2 originate from a polychromatic CT scan. The inverse function F⁻¹ from Equation (4) can be found by interpolation since F(Z) is monotonically rising for 1 ≤ Z ≤ 30, as shown by Heismann [12]. Heismann applied his algorithm to different chemical samples such as NH₃ and NaOH dissolved in aqueous solutions.
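The inversion by interpolation can be sketched as follows; the tabulated F(Z) values below are an assumed monotonic placeholder curve, which in practice would be computed from Equation (4) using tabulated mass attenuation coefficients and the measured spectral weightings. Only the atomic number part of the Z-ρ identification is shown.

```python
import numpy as np

# Sketch of the Z identification idea from Equation (4): F(Z) is tabulated once
# for a known source/detector weighting, then inverted by interpolation.
# The F values below are purely illustrative stand-ins for the integrals f1/f2.
Z_table = np.arange(1, 31)                     # 1 <= Z <= 30
F_table = 1.0 + 0.02 * (Z_table - 1) ** 1.5    # monotonically rising (assumed shape)

def identify_Z(mu1, mu2):
    """Invert F(Z) = mu1/mu2 by linear interpolation on the monotonic table."""
    ratio = mu1 / mu2
    return np.interp(ratio, F_table, Z_table)

print(identify_Z(mu1=0.52, mu2=0.31))  # effective atomic number estimate
```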
The major difference from Alvarez’s method is that Heismann’s method computes effective values instead of discrete fractions. In the case of two-material mixtures with known materials, it is possible to calculate the individual fractions. For three-material systems with known constituents, the exact decomposition may be ambiguous depending on the prior information.
The publications of Alvarez and Heismann appear as foundational works in the field of material-sensitive computed tomography. Alvarez showed material-separating reconstructions for increased inter-material contrast using prior knowledge. Heismann proposed a method to identify materials in a given dual-energy volume using extensive knowledge about the X-ray source and detector. As we will see, most of the referenced publications below also mention Alvarez or Heismann and build on their foundational works.

3. Methodology

Finding research work, e.g., articles or conference proceedings, for a systematic literature search can be quite challenging, as described by vom Brocke et al. [13]. We strive to follow the recommendations of vom Brocke et al., as well as the PRISMA guidelines [14,15], to conduct an effective literature search, which is reproducible, comprehensive, and adds value to its specific scientific community. A PRISMA flow diagram for our literature search can be found in the Supplementary Materials. In the following, the method of finding publications, which is the backbone of this review, is explained in detail.
We use Scopus to find publications with the queries described below. By searching with highly specialised queries, we can extract information, such as historical developments and modern trends, from a large database provided by Elsevier through Scopus.
Using Scopus, one can browse the publication database by simple keywords or with detailed filters and operators. We use a filter called TITLE-ABS-KEY(<str>), which searches for the string <str> in all document titles, abstracts, and keywords. Other search strings, i.e., those given without an explicit operator, are surrounded by quotation marks, which creates a loose coupling between the words. Scopus searches for matches closely related to the given search string, including slight variations, such as UK and American spelling, and plural forms [16]. For further insights, we use a filter called REF(<str>), which searches for the string <str> explicitly in the document's references. Since the REF filter also includes the REFPUBYEAR filter, which represents the referenced publication's year, we are able to filter for publications such as REF(Alvarez 1976). The base search query for our Scopus search consists of the following keywords:
1. TITLE-ABS-KEY(computed tomography OR ct);
2. TITLE-ABS-KEY(material);
3. TITLE-ABS-KEY(dual-energy OR multi-energy OR photon counting).
While the first two elements, computed tomography (CT) and material, are obvious, we add specific keywords for filtering energy-resolving approaches, which are needed for material-sensitive approaches (see Section 2). This base query can be refined with keywords like “artificial intelligence” without any further processing steps to search directly in the paper’s contents. The resulting number of publications found with Scopus is shown in Figure 3.
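For reference, the three filters can be combined into a single advanced-search string; the exact quoting below is our own rendering and an assumption about the query syntax, not a verbatim copy of the executed query.

```python
# Assembling the base query and two refinements used in this section.
# The quoting/operators are illustrative assumptions, not the exact string.
base_query = (
    'TITLE-ABS-KEY("computed tomography" OR ct) '
    'AND TITLE-ABS-KEY(material) '
    'AND TITLE-ABS-KEY("dual-energy" OR "multi-energy" OR "photon counting")'
)
ai_query = base_query + ' AND TITLE-ABS-KEY("artificial intelligence")'
alvarez_query = base_query + " AND REF(Alvarez 1976)"
print(ai_query)
```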
A generally rising trend can be observed in the base query (blue line). Beginning in 2018, AI techniques have become increasingly pertinent in the field of material-sensitive CT. The two red lines show the number of publications found using the base query, filtered by publications referencing Alvarez et al. [11] or Heismann et al. [12], which are described as fundamental works for material decomposition and material identification in Section 2. Obviously, the work of Alvarez et al. is widely known and used for further advancements. Disregarding statistical fluctuations, the rising trend of Alvarez’s citations seems to have stagnated since AI began to be used around 2020. This situation may have arisen because Alvarez’s method of effect-based decomposition is primarily applied to handcrafted classical algorithms, whereas data-driven models learn directly from the datasets provided.
Given this large set of publications, our selection process, based on vom Brocke et al. [13], is described as follows. A publication has to meet the following criteria in order to be accepted for our corpus:
1. It must be accessible through one of the major publishers;
2. It must propose a new (original) approach or a substantial improvement in terms of the following:
   2.1. X-ray physics modelling (for classical approaches);
   2.2. Data-driven architecture/model or training strategies.
3. It must outline a reproducible setup for CT scanning;
4. It must utilise at least two (energy) channels since this is a condition for material-resolving CT given by Alvarez [11] and Heismann [12].
To remove some degree of arbitrariness from the selection process, we try to identify relevant technological improvements using criterion 2. Since it is impossible to remove arbitrariness entirely, we try to filter for classical algorithmic approaches that incorporate new components from X-ray physics, such as the extension of Heismann's approach to higher energies using an approximation for pair production in the photon's attenuation process. For deep learning approaches, we look for new architectures and training strategies. Nevertheless, drawing a line between major and minor refinements of a deep learning architecture is quite challenging. For this specific application in material-resolving CT, which lacks well-known benchmarks, a direct performance comparison of models is impossible. Therefore, we look for physics-related improvements in the architecture, such as precision learning with the reconstruction operator, as presented by Su et al. [17]. In edge cases, we err on the side of inclusion and add the publication to our corpus. Criterion 3 is important to make the approach comparable since different X-ray detection technologies, namely energy-integrating detectors and photon-counting detectors, offer different advantages and disadvantages depending on the use case. Another element of CT acquisition is the tube voltage, which is the primary determinant of the emitted photon spectrum, in conjunction with the anode material (mostly tungsten) and physical dimensions, as well as the pre-filtration applied to the beam. The photon spectrum is particularly vital because it defines the overall image contrast. Criterion 4 is given by Alvarez et al. [11] and Heismann et al. [12]. Both publications provide explanations of why material decomposition (Alvarez) and material identification (Z-Rho decomposition; Heismann) require a minimum of two energy channels. Additionally, for publications describing data-driven approaches, the publication needs to contain information about the following:
1. The model's architecture;
2. The training data's origin (simulation, labelled).
The model’s architecture should be reproducible by anyone with foundational expertise in computer vision and deep learning who reads the paper. The same applies to the datasets used for training, which should be reproducible or at least well described. Some research uses manually labelled data from real scans or simulated data. However, some datasets come from clinical cases, making them partially closed-source, which is accepted as long as the datasets are thoroughly described.
Using these criteria, we identify 24 publications for the paper corpus, including 3 from industrial CT. We address this significant imbalance between clinical and industry-related publications by further considering the ICT conference proceedings from 2016 up to and including 2024. While not indexed in Scopus, this conference is one of the largest gatherings for industrial computed tomography in Europe, and most of the relevant contributions to industrial CT are likely to be published there. We identify two more publications from the ICT conference proceedings suitable for this survey.
Finally, the corpus consists of 26 publications. We divide this corpus into clinical and industrial use cases on the one hand and classical algorithms and data-driven approaches on the other. Figure 4 shows the individual shares of these categories.
Table 2 lists the publications based on data-driven or deep learning approaches, and Table 3 lists the publications utilising classical algorithms.

4. Analysis of Selected Publications

Historically, computed tomography was developed for clinical applications, as described above. Figure 5 shows a timeline revealing the papers from the corpus together with technological milestones.
For example, the first commercial clinical CT system using a photon-counting detector was released in 2021 by Siemens Healthineers.
The publication corpus consists predominantly of medical publications (see Figure 4). Specifically, we look at 21 clinical publications and 5 industrial publications, of which two are based on classical algorithms [36,40] and three are driven by deep learning [21,25,32].
Classical and data-driven approaches can be analysed similarly regarding the photon energies, X-ray detectors (Section 4.1), and materials (Section 4.2) involved in the decomposition or identification process. Subsequently, the classical algorithms are described from a methodical point of view (Section 4.3). Focusing on the datasets employed for training and the specific architectures utilised for each approach, the data-driven approaches are examined in Section 4.4.

4.1. Photon Energies and Detectors

In order to collect the spectral information necessary for material sensitivity, different acquisition strategies can be pursued. As described in Section 2 (especially Figure 2), energy-integrating detectors and photon-counting detectors can be used to acquire dual- or multi-energy CT data. Photon-counting detectors efficiently register nearly every incoming photon as an individual event with a specific photon energy and categorise this event into predetermined energy bins. We identified five publications that applied classical material-sensitive algorithms to CT data from photon-counting detectors with multiple energy bins [35,36,37,38,39]. Roessl et al. used a PCD with eight equidistant energy channels ranging from 10 keV to 80 keV and a tube with 90 keV [37]. A two-channel PCD and a 125 kV tube were used by Son et al., who categorised the photons into a 20– 60 keV and a 60– 125 keV bin [38]. A very low peak voltage of 50 kV can be found in the works of Wang et al. and Firsching et al. [35,39]. Wang et al. used a four-channel PCD, while Firsching et al. used a Medipix2 detector (256-pixel width and 256-pixel height), detecting 41 photon energy thresholds from 7.3 keV to 53.3 keV . Jumanazarov et al. experimented with 2, 6, and 15 energy bins in a photon-counting detector. They found that six energy bins delivered the best results for their use case.
Looking at data-driven approaches, six publications described methods that rely on PCD data [18,19,24,27,29,30]. Shi et al. used two (energy) channels [29], Shi et al. used three channels [30], Bussod et al. used four channels [19], Abascal et al. used five channels [18], and Long et al. and Guo et al. used twelve channels [24,27].
Another strategy to obtain two or more energy channels is a dual scan with an energy-integrating detector and two different X-ray spectra in subsequent scans. The dual-scan strategy was applied by Heismann et al., who used 80 kV for the low-energy scan and 140 kV for the high-energy scan [12]. For an industrial application, Xing et al. conducted a dual scan with 3 MVp and 6 MVp using an energy-integrating detector built from a cesium iodide scintillator [40].
We identified seven publications focusing on data-driven methods using EID data [17,20,21,25,26,31,32]. Three publications mentioned hybrid approaches using both EID and PCD data [22,23,28]. Usually, approaches employing both detector technologies utilised PCD data as inputs and EID data as outputs (ground truth) to their models. In clinical CT, EID scans expose patients to significantly higher radiation doses, which motivated the authors to employ PCDs in clinical settings and generate high-quality data with their models [22,28].
Figure 6 shows an overview of all publications in the paper corpus resolved by the peak acceleration voltage of the X-ray tube. Evidently, photon-counting detectors are not used for LINAC-CT in the given paper corpus [21,32,40]. Drawing a line between clinical and industrial methods based on tube voltage is relatively straightforward, since industrial applications need higher photon energies in order to penetrate the samples. EIDs and PCDs are both used for clinical and industrial applications, as shown in Figure 6.
In summary, data-driven approaches appear to be compatible with both detector technologies and are not limited to specific photon spectra.

4.2. Materials

The materials to be discriminated or identified are closely related to the application's domain: clinical or industrial. Figure 7 shows the materials mentioned by the publications in the corpus. Materials commonly used in clinical settings include water and various types of tissue, which are all low-Z materials, as shown in the lower-left corner of the figure. Bones, containing calcium in a composite called hydroxyapatite, are frequently described as matrix composites with a mean density of around 1.2 g/cm³, which means they are still relatively transparent for X-ray photons from 50 keV to 80 keV.
The outliers in the clinical low-Z material cluster shown in Figure 7 are iodine and gadolinium. Since iodine and gadolinium exhibit K-edge absorption at E_K,I = 33.17 keV and E_K,Gd = 50.24 keV, their characteristic linear attenuation coefficients μ_L may be visible in a clinical CT scan, depending on the actual setup (see [10]). For example, the injection of a contrast agent increases the visibility of blood vessels.
The attenuation coefficients of these contrast agents are very high; however, because they are used in small proportions within a solution, the total attenuation is sufficient to generate contrast while minimising the creation of artefacts. Representing materials used exclusively in industrial CT applications, Xing et al. worked with iron, aluminium, copper, and lead [40]. While aluminium is located next to the clinical material cluster in the lower-left corner of the figure and can be identified using acceleration voltages below 140 kV, as shown by Heismann [12], iron, copper, and lead need much higher voltages to be identified. If the acceleration voltage, or photon energy, is too low, the sample absorbs all incoming photons, leaving no useful information that the detector can capture. Therefore, Xing et al. used a linear accelerator, producing up to 6 MeV X-ray photons (see Figure 6). Regarding aluminium, iron, and copper, Weiss et al. showed an end-to-end material identification architecture for LINAC-CT using a deep learning approach [32]. The materials silicon and magnesium described by Busi et al. [19] and lead shown by Xing et al. [40] were not found in the data-driven publications from this paper's corpus.
In summary, despite the different materials used in clinical and industrial CT, data-driven methods are most likely not restricted to specific materials, as indicated in Figure 7. Compared to the periodic table of elements, many elements are missing from Figure 7, likely due to the limited range of clinically and technically relevant materials. From a technical point of view, the majority of the approaches presented in the following should generalise to almost any material, as long as the CT scan is of good quality, which can be difficult for materials with heavy X-ray attenuation.

4.3. Classical Algorithmic Approaches

For the scope of this work, classical algorithmic approaches or algorithms refer to more or less complex procedures of data processing that generate a certain output by using extensive expert knowledge in the field of computed tomography. Such algorithms typically rely on the underlying photon physics. In Section 2, Figure 1 highlights the constituents that comprise the linear attenuation coefficient μ_L. For most material decomposition or identification applications, photoelectric absorption and Compton scattering are the most important attenuation factors, which have been thoroughly documented by Alvarez et al. and Heismann et al. [11,12].
In 2009, Firsching published his PhD thesis describing a modern method of algorithmic material decomposition [35]. He summarised the ambiguous decomposition of multi-material samples nicely: For a basis material decomposition, as described by Alvarez et al., the underlying materials act as linearly independent basis vectors. To achieve this linear independence for a given material system, characteristic discontinuities, namely the K-edges of an element, need to be exploited. The K-edge energies of clinically relevant elements are extremely low, and photons in that energy range, typically below 20 keV, do not penetrate the human body and, therefore, are not detectable. To provide higher K-edges, contrast agents, such as iodine or gadolinium, are used, which exhibit K-edges at 33.2 keV and 50.2 keV [35]. Additionally, Firsching measured the spectral properties of the X-ray source, as well as the detector's response function, with subsequent Monte Carlo modelling.
Roessl et al. described a similar basis material decomposition approach using the photo effect and gadolinium, which together form a linearly independent basis due to the K-edge of gadolinium. They aimed to measure local gadolinium densities, which enables separation (or, more practically, contrast) between the gadolinium contrast agent and calcified plaque, which may block blood vessels.
For an industrial security application, Jumanazarov et al. published results for atomic number identification in the range 6 ≤ Z ≤ 15, also employing basis-material decomposition.
Regarding modern classical Z-ρ decomposition, Son et al. built on Heismann's work by proposing a stoichiometric calibration for photon-counting detectors in order to achieve better effective atomic number and electron density estimates [38]. Another Z-ρ decomposition algorithm for industrial applications was proposed by Xing et al. [40], who also built on the work of Heismann and Alvarez. They extended Alvarez's decomposition of the linear attenuation coefficient by adding a term for pair production to enhance the basis-material decomposition into the MeV range. A linear accelerator served as their photon source, providing photons up to 6 MeV. Water and carbon were chosen as the basis materials, and the algorithm was applied to copper, iron, aluminium, and the basis materials themselves. They noted that elements with a higher atomic number, like copper, as well as noise and CT artefacts, posed challenges that the algorithm could not adequately address [40].
To summarise, classical algorithms used to decompose or identify materials rely heavily on modelling the underlying X-ray physics. The choice of an application-specific basis is crucial for achieving good contrast provided by the basis vectors. In general, for N materials to be identified, the basis needs N linearly independent basis vectors. For N ≥ 3, identifying an appropriate basis can be significantly challenging or even impossible, which is why contrast agents with a detectable K-edge are utilised to provide another linearly independent vector (dimensionality) to the basis.
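The basis decomposition of a single voxel can be written as a small linear system: the measured multi-energy attenuation vector is expressed as a non-negative combination of the basis materials' attenuation vectors. The following sketch uses illustrative attenuation values and non-negative least squares; it is not taken from any of the reviewed publications.

```python
import numpy as np
from scipy.optimize import nnls

# Sketch of a per-voxel basis-material decomposition: express the measured
# multi-energy attenuation as non-negative fractions of N basis materials.
# All attenuation values below are illustrative, not tabulated data.
basis = np.array([
    [0.35, 0.22, 0.18],   # basis material 1 sampled at three energy channels (1/cm)
    [1.10, 0.55, 0.40],   # basis material 2 sampled at the same channels (1/cm)
]).T                       # shape (channels, materials)

voxel_mu = np.array([0.72, 0.38, 0.29])      # measured attenuation in the three channels

fractions, residual = nnls(basis, voxel_mu)  # non-negative fraction of each basis material
print(fractions, residual)
```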

4.4. Data-Driven Approaches

This section analyses the publications from our corpus focusing on material-sensitive CT using data-driven methods. First, common datasets are described in Section 4.4.1, and the models used are described in Section 4.4.2.

4.4.1. Datasets

A handful of datasets were generated using the XCAT phantom [43,44]. Developed by Segars et al., the XCAT phantom is presented as software that models human anatomy, including organs, bones, and blood vessels. The software can generate voxel volumes based on specified anatomical parameters, which are immediately ready for CT simulation to produce realistic data corresponding to these parameters [44].
Shi et al. simulated 12 samples/humans with the XCAT phantom and extracted 140 slices from the chest-abdomen region of each sample [30]. Other authors also used the XCAT phantom for the generation of training and test data for clinical applications [22,29]. Targeting industrial CT applications, Fang et al. used the XCAT phantom, virtually filled with industrially relevant materials, such as iron, magnesium, Teflon, and many more, to generate their training data together with a self-built simulation [21]. Another publicly available phantom similar to the XCAT phantom, the FORBILD phantom [45], was used by Cao et al. [20] to generate a phantom consisting of soft tissue, lung-inflated tissue, spongiosa, water and contrast agents with gadolinium or iodine. Nevertheless, for simulation, the authors needed to implement a realistic X-ray physics process in order to generate the training data.
From another perspective, a radiologist can create a dataset from real CT scans by annotating these scans manually. The X-ray physics is implicitly included in the CT scanning process, while the radiologist’s experience contributes to integrating material information. This method was used by Wang et al., Li et al., Long et al., and Guo et al. [24,26,27,31]. Especially for clinical datasets, extensive prior knowledge about human anatomy may simplify the annotation process. Gong et al. used iodine as a contrast agent, which could be separated quite easily due to its high attenuation compared to bones and tissue [23].
Sidky et al. published a challenge dataset consisting of simulated dual-energy CT volumes and corresponding material maps (labels) [46]. In general, the dataset was designed to simulate a human breast CT scan with a focus on detecting calcifications. Figure 8 shows a dual-energy input data tuple, which suffers from sparse angular coverage of CT angles (sparse-angle CT) and, therefore, exhibits streaking artefacts [46]. Alongside the validation data, Sidky et al. published 1000 training tuples for data-driven training approaches. The dataset features an image resolution of 512 × 512 and includes two energy channels for input data along with three material channels for output data.
Krebbers et al. used energy-dispersive X-ray spectroscopy (EDX) to generate labels for scanned graphite samples [25]. When conducting EDX scans with an electron microscope, the penetration depth of electrons may be quite shallow compared to the sample's dimensions, which generally makes EDX a measurement technique for surfaces. Weiss et al. created virtual samples using a region-growing algorithm on randomly initialised seed points [32].

4.4.2. Models

In 2015, Ronneberger et al. proposed their U-Net architecture for clinical segmentation tasks [41]. U-Net is a deep neural network that makes dense predictions on images. It takes an image with C channels, height H, and width W (C × H × W) as input and produces a pixel-wise output with C′ output channels (C′ × H × W). For example, an RGB image contains C = 3 channels. Broadly, the model consists of an encoder and a decoder, as shown in Figure 9. For each encoder layer, a combination of 3 × 3 convolutions, ReLU activation, and a final 2 × 2 max pooling is applied. After the encoding, the feature space is decoded using transpose convolutions and 3 × 3 convolutions. Some intermediate features from the encoder are concatenated into certain decoder layers to enhance the decoding performance of spatial information, which is the main contribution of the U-Net architecture.
The original U-Net architecture comprised four max pool layers, decreasing the spatial dimensions by a factor of 2⁴ = 16. Regarding convolutions, the input data, provided as a grayscale image with one channel, is convolved to 64 channels in the first layer and doubled in each subsequent layer, resulting in 1024 feature channels in the feature space. We use the notation U-Net(depth 4, ftrs 64) in the following to describe the specific architecture ablated from the original U-Net.
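The notation U-Net(depth 4, ftrs 64) corresponds to the following compact PyTorch sketch. It follows the encoder–decoder structure with skip connections described above, but uses padded convolutions so that input and output share the same spatial size (the original architecture used unpadded convolutions and cropping); all choices beyond depth and feature count are our own assumptions.

```python
import torch
import torch.nn as nn

class DoubleConv(nn.Module):
    """Two padded 3x3 convolutions, each followed by a ReLU."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class UNet(nn.Module):
    """U-Net(depth 4, ftrs 64): four pooling steps, 64 features in the first layer."""
    def __init__(self, c_in=1, c_out=3, ftrs=64, depth=4):
        super().__init__()
        chans = [ftrs * 2**i for i in range(depth + 1)]        # 64, 128, 256, 512, 1024
        self.encoders = nn.ModuleList()
        prev = c_in
        for c in chans[:-1]:                                   # encoder path
            self.encoders.append(DoubleConv(prev, c))
            prev = c
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = DoubleConv(chans[-2], chans[-1])
        self.upconvs = nn.ModuleList()
        self.decoders = nn.ModuleList()
        for c in reversed(chans[:-1]):                         # decoder path
            self.upconvs.append(nn.ConvTranspose2d(c * 2, c, 2, stride=2))
            self.decoders.append(DoubleConv(c * 2, c))         # skip concat doubles channels
        self.head = nn.Conv2d(ftrs, c_out, 1)                  # pixel-wise 1x1 output layer

    def forward(self, x):
        skips = []
        for enc in self.encoders:
            x = enc(x)
            skips.append(x)                                    # keep features for skip connections
            x = self.pool(x)
        x = self.bottleneck(x)
        for up, dec, skip in zip(self.upconvs, self.decoders, reversed(skips)):
            x = dec(torch.cat([skip, up(x)], dim=1))           # upsample, concatenate, convolve
        return self.head(x)

# One dual-energy slice in, three material maps out (random data, shapes only).
net = UNet(c_in=2, c_out=3)
print(net(torch.rand(1, 2, 256, 256)).shape)                   # torch.Size([1, 3, 256, 256])
```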
Beyond uses of the exact channel count and spatial pooling specified by Ronneberger et al., the general idea of the U-Net architecture is frequently adopted by other authors for material decomposition and identification. As indicated in Table 4, the majority of authors use an architecture derived from U-Net. Cao et al. used the original U-Net for sinogram completion since their scan setup provided sparse angular coverage, which is usually the case in clinical applications [20]. For material decomposition, they used three separate U-Nets that share the feature space after each U-Net's encoding path. Li et al. also used the original U-Net to calculate water and iodine basis images from a single-energy scan [26]. Shi et al. did the same in the projection domain [29]. Subsequently, they applied multiple MLPs to check whether the output of the U-Net was consistent with the X-ray physics given by prior knowledge about the CT acquisition process. Fang et al. [21], Abascal et al. [18], and Bussod et al. [19] used shallow versions of U-Net (with fewer features and fewer depth layers) compared to the architecture proposed by Ronneberger. Nadkarni et al. used a modified version of a shallow U-Net to directly process three-dimensional (volumetric) data [28].
Taking the structural idea behind U-Net, several authors have tried to enhance performance by replacing some parts of the architecture. Long et al. used a model called FC-PRNet (Fully Convolutional Pyramidal Residual Network), which is almost identical to U-Net [27]. Shi et al. introduced a graph edge-conditioned U-Net that utilises 3 × 3, 5 × 5, and 7 × 7 convolutions in each encoder layer in parallel [30]. This is followed by local and non-local feature aggregation (LNFA). The purpose of this structure is to effectively capture local features at different spatial scales. Weiss et al. replaced the encoder in U-Net with a vision transformer (original work by Dosovitskiy et al. [42]) while retaining the decoder in the original U-Net [32]. Krebbers et al. reported that a model known as sensor3D outperformed U-Net in their graphite decomposition application [25]. However, we could not locate the architecture of this network due to the absence of citations and diagrams in the publication.
Su et al. proposed an architecture called DIRECT-Net, which resembles U-Net, including skip connections, but lacks spatial pooling layers. Their approach is unique compared to all other approaches mentioned in the scope of this paper since they used projections as input data and material maps in the image (reconstructed volumes) as output. Generally, most information in a CT scan can be found in the projections. However, due to the mathematical nature of the reconstruction algorithm, some information is lost, resulting in the reconstructed volume containing less information than the original projections. Therefore, Su et al. preprocessed the dual-energy sinograms, which contained one pixel row from each projection, with their DIRECT-Net to obtain eight “sinogram-like” feature maps [17]. These sinogram-like features were reconstructed using a basic backprojection reconstruction algorithm, which was implemented in TensorFlow, providing tensor gradients natively. The results included eight images (volumes) that were fed through another DIRECT-Net to calculate the material decomposition maps. The advantage of this approach is that it enables the entire model, from inputting sinograms (projection slices) to producing material maps, to be trained in a comprehensive end-to-end manner, incorporating the reconstruction as a predetermined operator, which supports backpropagation during training. Maier et al. defined the integration of known operators mapping from one space to another into deep learning models as precision learning [47].
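The principle of precision learning, i.e., embedding a known, fixed operator between trainable networks so that gradients flow through the entire chain, can be sketched schematically as follows. The fixed linear map below is only a stand-in for the backprojection operator used by Su et al.; it is not an implementation of DIRECT-Net.

```python
import torch
import torch.nn as nn

# Schematic of "precision learning": a fixed, differentiable operator (here a
# placeholder linear map standing in for backprojection) is sandwiched between
# two trainable parts so the whole chain trains end-to-end.
class FixedOperator(nn.Module):
    def __init__(self, n_in, n_out):
        super().__init__()
        op = torch.randn(n_out, n_in) / n_in**0.5   # stand-in for a known operator
        self.register_buffer("op", op)              # registered, but not trainable

    def forward(self, x):                           # x: (batch, n_in)
        return x @ self.op.T                        # gradients still flow through

model = nn.Sequential(
    nn.Linear(64, 64), nn.ReLU(),    # trainable "sinogram-domain" part
    FixedOperator(64, 128),          # known operator, kept fixed during training
    nn.Linear(128, 3),               # trainable "image-domain" part
)
out = model(torch.randn(4, 64))
print(out.shape)                     # torch.Size([4, 3])
```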
Gong et al. used a CNN with inception blocks to map multi-energy volumes to material maps, revealing soft tissue, iodine, and bone material [23]. They showed that their model outperformed U-Net in this use case.
Across the selected publications, three authors used a generative adversarial network (GAN). While originating from unsupervised learning applications, GANs are also used nowadays for transfer learning [48]. In a nutshell, GANs consist of a generator and a discriminator. The generator creates output images, known as candidates, which are expected to align with the distribution defined by the training data. The discriminator’s role is to determine whether these candidates are artificially produced by the generator or whether they are from the actual training dataset. Both the generator and discriminator are trained simultaneously, resulting in the generator producing increasingly refined images, while the discriminator improves its ability to differentiate between artificial and real images.
Wang et al. proposed a generator built from U-Net with a small transformer layer, including multi-head self-attention and an MLP, in the feature space subsequent to the encoder [31]. Their discriminator tries to determine whether the images (water and iodine maps) are artificial or real. They defined a hybrid loss as
$$\mathcal{L} = \mathcal{L}_1 + \mathcal{L}_{\text{adversarial}} + \mathcal{L}_{\text{VGG}} \quad (5)$$
where L_1 is the L1 loss, L_adversarial is the discriminator's loss, and L_VGG is known as the perceptual loss [49]. Guo et al. used a modified U-Net as a generator to produce material maps from multi-energy volumes [24]. Subsequently, the input data and the predicted images are fed into the discriminator (a two-layer CNN with pooling) to decide whether the data are artificial or real (from the training dataset). Geng et al. applied the same technique to separate a physically inserted needle from a human torso (background) in projection images without reconstruction [22].
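A sketch of how a hybrid loss of this form might be assembled is shown below; all module and tensor names are placeholders (including the tiny stand-in feature extractor for the VGG-based perceptual term), and the equal weighting of the three terms follows Equation (5).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of the hybrid loss in Equation (5). `fake`/`real` are generated and
# ground-truth material maps, `d_fake` are discriminator logits for the fake
# images, and `features` is a frozen feature extractor standing in for VGG.
def hybrid_loss(fake, real, d_fake, features):
    l1 = F.l1_loss(fake, real)
    adversarial = F.binary_cross_entropy_with_logits(
        d_fake, torch.ones_like(d_fake))            # generator wants "real" labels
    perceptual = F.mse_loss(features(fake), features(real))
    return l1 + adversarial + perceptual

# Minimal usage with placeholder tensors and a tiny stand-in feature extractor.
features = nn.Sequential(nn.Conv2d(2, 8, 3, padding=1), nn.ReLU())
fake, real = torch.rand(1, 2, 64, 64), torch.rand(1, 2, 64, 64)
d_fake = torch.zeros(1, 1)
print(hybrid_loss(fake, real, d_fake, features))
```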
Sidky et al. [46] organised a material identification challenge with a corresponding dataset, which was mentioned in the previous section. An associated report about the challenge’s results by Sidky et al. [46] summarised the top ten best-performing methods. We examined the three top-performing deep learning models that secured first, third, and fourth positions. Second place was claimed by a classical X-ray representation method that fits iteratively by applying X-ray physics to the provided data, which can be interpreted as machine learning. All models reported take a dual-energy volume as input and predict three material channels. Team GM_CNU won the challenge with a two-step model. In the first step, an algebraic reconstruction is used to generate two material-basis images (volumes) from the given dual-energy sinograms. A CNN (RED-CNN [50], very similar to U-Net) is subsequently used to calculate the three material maps from the basis images. The second step is an iterative module starting with a simulation, which projects the material images into dual-energy sinograms, which are in the same data domain as the inputs. Reconstruction of these artificial sinograms, as well as the difference from the actual sinograms, yields residual images. The residuals and the predicted material maps are fed into a CNN that calculates an increment for the material maps to adapt them in a physically constrained manner [46]. This approach is quite similar to that of Li et al. [26] and uses a projection operator to check whether the predictions of the material decomposition fit the input data.
Team MIR used four U-Nets. The first calculates the sum of all material constituents per pixel. The second finds pixels where calcifications are present (binary). The third computes calcium and adipose maps. The fourth refines the calcification and adipose maps [46]. Taking fourth place in the challenge, team WashUDEAM used an algorithm called dual-energy alternating minimisation. First, the dual-energy images are directly decomposed into two basis materials (AB) from the three present materials (ABC), which is done for all combinations, such as AB, AC, and BC. The outcome consists of six basis material maps in different combinations, which are input into a U-Net that converts them into the targeted three material maps [46].
Summarising the models described in this section, Table 4 reveals an abundance of U-Nets or modified versions of them across clinical and industrial applications for material resolution. Hybrid architectures do exist that extend the capabilities of U-Net by checking its output in terms of X-ray physics and consistency with the input data. We also note that the transformer architecture, although promising in other computer vision disciplines, is still relatively new and has not yet been established in the field of computed tomography. As mentioned, the first approach using a vision transformer was published by Weiss et al. [32].
From a timeline perspective, basic one-step models are increasingly being replaced with multi-step models, as described by Sidky et al. [46]. For example, after the first material decomposition or identification step, an X-ray simulator can project the material data back into the input data's space. This explicitly verifies whether the material data matches the original input data.

4.4.3. Computational Considerations

The ascent of data-driven algorithms is primarily fuelled by advancements in computer chip technology. Moore’s law is a famous rule of thumb that roughly states that the number of transistors on a microchip doubles every two years. It was quite precise for the latter decades of the 20th century; however, its continued applicability is now being questioned due to the unresolved challenge of manufacturing electronic structures smaller than a certain critical physical size. For graphics processing units (GPUs), some authors mention Huang’s law, which postulates an even faster performance growth for GPUs compared to Moore’s law [51]. The most interesting metric for such an evaluation may be the rapid increase in FLOPS per dollar over time. Even for pure academic researchers without economic pressure, computational resources remain constrained. The accessibility to very large and complex deep learning architectures increases along with the FLOPS per dollar. Since we are rapidly reaching limitations in the manufacturing process of traditional computer chips, researchers are looking for other concepts, such as neuromorphic processors, as shown by IBM with the NorthPole chip [52].
It is not uncommon for authors to omit the specific hardware they used to train their models. However, it can reasonably be assumed that most U-Nets without significant modifications can be trained on commercially available GPU hardware; a major advantage of U-Nets is that they fit on fairly small setups, such as a single gaming GPU, a small workstation, or even a laptop. Three papers from the corpus provide information about the hardware used for model training: Abascal et al. [18] used an Nvidia GTX 1080Ti, and Shi et al. [29] used an Nvidia RTX 2080. In contrast, Weiss et al. trained their U-Net, modified with a vision transformer, on an Nvidia DGX A100 system [32].
From this point of view, it is unclear whether the authors used U-Nets because they were simply good enough for the purpose or because larger and more complex architectures exceeded their computational resources. Larger models generally require more data, which can be another limitation, particularly for CT data, as it is costly to collect such data in the real world (see Section 4.4.1).
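As a rough plausibility check of the claim that vanilla U-Nets fit on consumer hardware, the following sketch estimates the parameter count and weight memory of a small U-Net-style network. The channel widths, depth, and kernel sizes are illustrative assumptions, not values reported in the reviewed publications.

```python
# Rough parameter count for a vanilla U-Net-style network with 3x3 convolutions.
# Channel widths and depth are illustrative assumptions, not values from the corpus.

def conv_params(c_in, c_out, k=3):
    return c_in * c_out * k * k + c_out       # weights + biases

def double_conv(c_in, c_out):
    return conv_params(c_in, c_out) + conv_params(c_out, c_out)

channels = [64, 128, 256, 512]                # encoder widths per level
in_ch, out_ch = 2, 3                          # dual-energy input, three material maps

params = double_conv(in_ch, channels[0])
for c_prev, c in zip(channels, channels[1:]):
    params += double_conv(c_prev, c)          # encoder blocks
for c, c_prev in zip(reversed(channels[1:]), reversed(channels[:-1])):
    params += c * c_prev * 2 * 2 + c_prev     # 2x2 transposed conv (up-sampling)
    params += double_conv(2 * c_prev, c_prev) # decoder block after skip concatenation
params += conv_params(channels[0], out_ch, k=1)  # final 1x1 projection

print(f"~{params / 1e6:.1f} M parameters")
print(f"~{params * 4 / 1e6:.0f} MB of weights in float32 (excluding activations)")
```

Under these assumptions, the weight memory amounts to only a few tens of megabytes, which supports the observation that training is feasible on single consumer GPUs, with activation memory and batch size being the main remaining constraints.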

5. Summary and Future Trends

This section provides answers to the research questions in Section 1 by summarising the results from the in-depth review in Section 4. Additionally, we provide a forecast for future trends regarding data-driven methods for material-resolving CT.

5.1. RQ1: Are There Common Datasets or Simulation Techniques Used for Deep Learning in Material-Resolving CT?

Data-driven methods rely on large datasets to train the underlying models. These datasets can originate from either simulations or real CT scans, with the latter requiring manual annotation. In an ideal scenario, the data distribution generated by a certain simulation perfectly matches the data distribution generated by a real CT scanner. We identified four publications that generated their training data using the XCAT phantom, which is described in Section 4.4.1 [22,29,30]. Multiple authors have manually annotated real scans, with some utilising alternative measurement techniques to replace or support the annotation process [31]. For example, Bussod et al. used synchrotron (monoenergetic) CT scans to easily separate the materials in eight clinical cases of knee osteoarthritis [19]. Regarding open datasets and benchmarks, Sidky et al. provided a dataset in the context of their clinical material decomposition challenge [46]. Since Sidky's dataset is based on a human breast scan that includes adipose tissue and calcifications, it is not suitable for industrial CT neural networks.
In general, real data with accurate annotations are highly preferred for model training since they come from the same distribution as the data that will be used for inference. However, collecting real data can be expensive and time-consuming, leading many scientists to use simulations instead.
With respect to the question at hand, the answer differs by application domain. For medical applications, the dataset from Sidky et al. represents the first publicly available dataset suitable for benchmarking deep learning approaches in this field; however, it is constrained in scope and needs to be further augmented. Conversely, for industrial applications with elevated energies, neither publicly available datasets nor suitable complementary phantom creation tools are currently available.
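To illustrate how simulation can supply labelled training pairs where annotated real scans are scarce, the following sketch generates simple dual-energy image-domain samples for a random two-material phantom. The attenuation values and geometry are placeholder assumptions; a realistic pipeline would rely on tabulated attenuation data, polychromatic spectra, projection, and reconstruction.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed linear attenuation coefficients (1/cm) per (energy bin, material).
# Placeholder values only; a real pipeline would use tabulated data and spectra.
mu = np.array([[0.40, 1.20],    # low-energy bin:  material A, material B
               [0.25, 0.70]])   # high-energy bin: material A, material B

def random_phantom(size=64):
    """Create a label slice: 0 = air, 1 = material A, 2 = material B."""
    labels = np.zeros((size, size), dtype=np.int64)
    yy, xx = np.mgrid[0:size, 0:size]
    for material in (1, 2):
        cy, cx, r = rng.integers(8, size - 8, 2).tolist() + [rng.integers(5, 12)]
        labels[(yy - cy) ** 2 + (xx - cx) ** 2 < r ** 2] = material
    return labels

def dual_energy_images(labels):
    """Map labels to per-voxel attenuation for each energy bin (simplified, noiseless)."""
    imgs = np.zeros((2, *labels.shape))
    for material in (1, 2):
        mask = labels == material
        imgs[0][mask] = mu[0, material - 1]
        imgs[1][mask] = mu[1, material - 1]
    return imgs

# One (input, label) training pair; repeat to build a dataset.
labels = random_phantom()
inputs = dual_energy_images(labels)
inputs += 0.02 * rng.standard_normal(inputs.shape)   # simple noise model (assumption)
print(inputs.shape, labels.shape)                    # (2, 64, 64) (64, 64)
```

The gap between such simplified synthetic data and real scanner data is exactly the sim-to-real problem discussed above, which transfer learning approaches attempt to bridge.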

5.2. RQ2: Which Deep Learning Models Are Used and Are They Specifically Engineered for Material-Sensitive CT Applications?

Table 4 summarises the models used in the publications analysed within the scope of this work. The U-Net architecture proposed by Ronneberger et al. [41] is currently of high importance for material-sensitive CT (see Section 4.4). We found publications that used essentially the original U-Net [20,26,29], while others used shallow and narrow versions of U-Net, i.e., with fewer features and pooling layers [18,19,21]. The strong locality imposed by fixed kernel sizes in U-Nets was extended to larger receptive fields by Shi et al., who used a local and non-local feature aggregation approach with differently sized convolutions, and by Weiss et al., who used a vision transformer on large local patches [30,32]. In order to verify and refine the results produced by neural networks, Li et al. [26] and team GM_CNU from the AAPM challenge [46] implemented a subsequent module that physically projects the material maps (volumes) back into the projection space and refines them based on the observed residuals. Furthermore, we identified three publications that implemented generative adversarial networks [22,24,31]. All of them used the discriminator of the GAN architecture to verify whether the generated materials, whether in the reconstructed volume or in the projection space, aligned with the training data distribution or, from a physical standpoint, appeared physically reasonable.
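For orientation, the following PyTorch sketch shows a deliberately shallow U-Net-style encoder-decoder that maps a dual-energy image (two channels) to three material maps. Layer widths and depth are illustrative assumptions and do not reproduce any specific architecture from the corpus.

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    """Two 3x3 convolutions with ReLU, as in the original U-Net building block."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyMaterialUNet(nn.Module):
    """Shallow U-Net-style network: dual-energy input -> three material maps."""

    def __init__(self, in_ch=2, out_ch=3, width=32):
        super().__init__()
        self.enc1 = block(in_ch, width)
        self.enc2 = block(width, 2 * width)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(2 * width, width, 2, stride=2)
        self.dec1 = block(2 * width, width)          # after skip concatenation
        self.head = nn.Conv2d(width, out_ch, 1)      # per-pixel material outputs

    def forward(self, x):
        e1 = self.enc1(x)                 # full-resolution features
        e2 = self.enc2(self.pool(e1))     # half-resolution features
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))
        return self.head(d1)

model = TinyMaterialUNet()
dummy = torch.randn(1, 2, 128, 128)       # one dual-energy slice
print(model(dummy).shape)                 # torch.Size([1, 3, 128, 128])
```

The shallow and narrow variants mentioned above essentially correspond to reducing the width and the number of pooling levels in such a skeleton.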
The second question can be answered accordingly. Current approaches are essentially based on the U-Net architecture, which was originally developed for biomedical image segmentation [41]. The contribution by Weiss et al. was the first to employ a vision transformer in this field [32]. However, the answer to the first research question in particular points to an obstacle: newer-generation models require access to extensive training datasets in order to achieve their full potential, and such datasets are not currently available in the material-sensitive CT domain. The availability of enhanced and verified simulation models could provide a solution by enabling the generation of sufficient training data and allowing transfer learning approaches to bridge the gap between simulation and reality.

5.3. RQ3: Does Employing Deep Learning Eliminate the Necessity for Expert Domain Knowledge in Material-Sensitive CT?

In Figure 3, we noted a stagnating number of citations of Alvarez et al. coinciding with the rise of AI methods for material-sensitive CT. Does this mean that researchers using deep learning to solve a certain problem no longer engage with non-AI alternatives? We evaluate this question for material-sensitive CT. In order to train a data-driven model, an exemplary dataset is needed. In material-resolving CT in particular, all publications in our corpus conducted supervised training, which requires both inputs and outputs in the dataset. Technically, if a sufficiently large and accurate dataset is available, a researcher needs only AI-related knowledge to build and train a model, without any understanding of computed tomography, merely fitting a mapping function from some inputs to some outputs. We described three methods that used an open dataset in the scope of the material decomposition challenge posed by Sidky et al. [46]. These models performed best on the given decomposition problem and were far from being vanilla computer vision models such as U-Net. All of them used techniques or components from classical algorithms: X-ray simulations to verify the model's outputs by projecting them back into the original data space, or feature vectors derived from classical basis vector decomposition, as proposed by Alvarez et al. [11]. Evidently, the researchers required extensive CT domain knowledge to build such architectures.
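As a reminder of the kind of classical domain knowledge these models build on, the following sketch performs a per-voxel two-basis decomposition by solving a small linear system from two effective attenuation measurements, in the spirit of basis vector decomposition. The coefficient matrix contains assumed placeholder values rather than the empirical coefficients of Alvarez and Macovski.

```python
import numpy as np

# Assumed effective linear attenuation (1/cm) of two basis materials
# at two effective energies; placeholder values for illustration only.
#                basis 1  basis 2
M = np.array([[0.40,    1.20],    # low-energy measurement
              [0.25,    0.70]])   # high-energy measurement

def decompose(mu_low, mu_high):
    """Solve M @ [f1, f2] = [mu_low, mu_high] for basis-material fractions."""
    return np.linalg.solve(M, np.array([mu_low, mu_high]))

# A voxel containing 30% basis 1 and 60% basis 2 (by volume fraction).
f_true = np.array([0.3, 0.6])
mu_low, mu_high = M @ f_true

print(decompose(mu_low, mu_high))   # -> approximately [0.3, 0.6]
```

Feature vectors of this kind, computed per voxel, were used by the challenge participants as physically motivated inputs alongside the learned components.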
On the other hand, we found that some authors used vanilla U-Nets (including shallow and narrow versions) on simulated datasets they created. They also needed CT domain knowledge for the creation and meaningful parametrisation of the simulations. Su et al. used vanilla U-Nets but incorporated CT domain knowledge through the implementation of the CT reconstruction operator as a module in their deep learning architecture.
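The idea of embedding a known CT operator as a fixed, non-trainable module, in the spirit of precision learning [47], can be sketched as follows. The operator here is a random placeholder matrix standing in for a discretised reconstruction operator; it is registered as a buffer so that the optimiser never updates it, while trainable layers refine its output.

```python
import torch
import torch.nn as nn

class KnownOperator(nn.Module):
    """Fixed linear operator (e.g., a discretised reconstruction) with no trainable weights."""

    def __init__(self, operator: torch.Tensor):
        super().__init__()
        # register_buffer keeps the matrix on the module (device moves, state_dict)
        # without exposing it to the optimiser.
        self.register_buffer("A", operator)

    def forward(self, sinogram: torch.Tensor) -> torch.Tensor:
        # sinogram: (batch, n_rays) -> image: (batch, n_voxels)
        return sinogram @ self.A.T

n_rays, n_voxels = 96, 64
recon = KnownOperator(torch.randn(n_voxels, n_rays) * 0.01)   # placeholder operator
post_net = nn.Sequential(nn.Linear(n_voxels, n_voxels), nn.ReLU(),
                         nn.Linear(n_voxels, n_voxels))        # trainable refinement

sino = torch.randn(4, n_rays)
image = post_net(recon(sino))
print(image.shape)                                   # torch.Size([4, 64])
print(sum(p.numel() for p in recon.parameters()))    # 0 trainable parameters
```

Keeping the physics operator fixed reduces the number of parameters to learn and constrains the network to behaviour consistent with the known measurement process.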
In summary, there was no evident loss of CT domain knowledge in the publications utilising AI methods for material-sensitive CT. The authors, likely experts in the CT field, employed deep learning as an additional tool to enhance material decomposition and identification. As indicated by the results of the material decomposition challenge posed by Sidky et al., data-driven approaches have begun to replace classical manual feature engineering in this discipline. It is important to note that the CT domain, like other domains, is undergoing a transformation. While this process is occurring at a slower pace compared to other domains, such as classic computer vision, where deep learning approaches are currently dominant, it has demonstrably begun (see Figure 3). The success of this transformation will undoubtedly depend on the future development of simulations and data availability. It is evident today that deep learning methods are suitable for learning complex mapping functions in the CT domain.

5.4. RQ4: What Are the Hardware Requirements for Training and Executing Models?

It is challenging to provide a comprehensive answer to the fourth question, as the majority (more than 85%) of the papers in the corpus provide no information on the training hardware used. As previously stated, only indirect conclusions about the requirements can be drawn. The models employed were either U-Nets or architectures based on them, with only a small number of publications deviating from this; the transformer architecture, now prevalent elsewhere, has thus far been employed in only one instance. The reported hardware requirements for U-Nets are low, particularly in relation to the training data volumes involved. Conversely, it cannot be concluded that modest hardware is sufficient in general, as limited resources may have been precisely what influenced the choice of these models. Nevertheless, given that the CT domain lags behind current developments in deep learning, as discussed in this study, it seems reasonable to conclude that the hardware requirements can also be met to a satisfactory extent in the future.

5.5. Future Trends

The observed trend of replacing manually crafted features with features learned implicitly by deep learning models is also evident within the CT domain. The investigation of AI-driven models has been on the rise since 2018, and there is no indication that this trend will abate in the near future. A notable advantage is that existing models for dense image prediction, such as U-Net, already address a multitude of challenges within the domain very effectively. Nevertheless, the most successful outcomes in domain-specific tasks have been achieved by integrating domain-specific knowledge into these foundational models, thereby leveraging features that are particularly significant for the specific task. It is anticipated that this development will continue, with an increasing diversity of deep learning models being employed in all areas of pre- and post-processing of CT data.
Further, over the past decade, there has been a notable increase in the number of milestone models, accompanied by a corresponding rise in model complexity and computational cost [53]. This trend is not currently observed in models used for computed tomography, likely due to the necessity for parameter efficiency, as gathering real-world CT data is both expensive and time-consuming. Therefore, it is both sensible and necessary to collect real CT data and consolidate these data into a common dataset. This will promote comparability between approaches and make current deep learning models applicable to the CT domain. This is also true for simulations, which are highly suitable as training foundations but require parallelisability and cost-effective, preferably free, licensing models.
Besides the data availability issue discussed above, multiple challenges are currently slowing down the development, and especially the deployment, of artificial intelligence in production environments. Artificial intelligence, particularly deep learning, is still in its early developmental stages, making it essential to continue discussions on its legal status. AI models in real production environments often struggle to generalise to new data distributions that differ from their training data [54]. A general lack of governing standards for the employment of AI creates a diffuse view of compliance and security. For example, a model trained with sensitive data may not be secure against malicious attacks and reverse engineering of the underlying data distribution [54]. Combining these challenges and projecting them onto an exemplary AI model working with clinical CT data, it may be unclear whether a certain model trained with clinical CT data of European citizens (a) can generalise to CT data of American or Asian citizens without revealing unexpected and unexplainable behaviour originating from anatomical differences (e.g., bone mineral densities [55]) and (b) is compliant with the data regulations given by European laws. In early 2024, the European Union published a regulation called the AI Act in response to these challenges [56]. The AI Act differentiates AI applications by their risk and, for example, prohibits the employment of AI for social scoring and certain forms of biometric categorisation [56]. As AI for clinical CT applications falls into the high-risk category, the overall transformation towards the new techniques has slowed down dramatically. The British research institute Manufacturing Technology Centre published a document that captures the current state of trustworthy AI and provides key points on how to develop and deploy trustworthy AI [54]. In Germany, the German Institute for Standardisation is working on AI standardisation, which is intended to lead to a certification process for AI systems labelled with "KI made in Europe" [57].
In an ideal scenario, the entire lifecycle of an AI tool as described by Qarout et al. [54], from the definition of requirements through data collection and training to successful deployment and, eventually, the retirement of the tool, is certified and fulfils rigorous standards. In reality, it is quite difficult to define a certification process before the tool's underlying research work has been completed. When pushing into new technologies, there is usually a development phase free of regulations in which new possibilities are explored and tested; the regulations, in this case the process behind building a trustworthy AI system, follow as soon as the technology readiness level (TRL) surpasses the proof-of-concept stage. In the scope of this review, we highlighted the ongoing paradigm shift from classical algorithms built by domain experts to deep learning methods. The majority of approaches in the corpus are located above the proof-of-concept stage but far below the completed and qualified stage (TRL 8), since such qualification processes, especially for high-risk AI, do not exist as of today. A domain-specific qualification and certification for AI in the world of computed tomography will enhance trustworthiness, thereby facilitating the successful introduction of this new technology into the market.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/ndt2030018/s1, PRISMA 2020 flow diagram for new systematic reviews which included searches of databases and registers only.

Author Contributions

Conceptualisation, M.W. and T.M.; methodology, M.W. and T.M. We acknowledge the use of ChatGPT and DeepL for assistance in refining English grammar and wording, since both authors are not native speakers. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The methodology used to generate the paper corpus is described and can be reproduced using Scopus. All publications mentioned are available for download from established publishers.

Conflicts of Interest

Moritz Weiss is conducting his PhD research at the University of Wuppertal and is employed as an AI Research Scientist at diondo GmbH. There is no financial dependency between the chair of Meisen at the University of Wuppertal and diondo GmbH. This work was not influenced by diondo GmbH since it was created as part of Moritz Weiss’ PhD research. Diondo GmbH sells industrial CT systems. The other author, Tobias Meisen, declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

References

  1. Richmond, C. Sir Godfrey Hounsfield. BMJ 2004, 329, 687. [Google Scholar] [CrossRef]
  2. Nobel Prize Outreach. The Nobel Prize in Physiology or Medicine 1979. Available online: https://www.nobelprize.org/prizes/medicine/1979/press-release/ (accessed on 21 May 2024).
  3. Feldkamp, L.A.; Davis, L.C.; Kress, J.W. Practical Cone-Beam Algorithm. J. Opt. Soc. Am. A 1984, 1, 612. [Google Scholar] [CrossRef]
  4. Dong, S.; Wang, P.; Abbas, K. A Survey on Deep Learning and Its Applications. Comput. Sci. Rev. 2021, 40, 100379. [Google Scholar] [CrossRef]
  5. Ofir, N.; Nebel, J.C. Classic versus Deep Learning Approaches to Address Computer Vision Challenges. arXiv 2021, arXiv:2101.09744. [Google Scholar]
  6. Sprawls, P. Physical Principles of Medical Imaging (Web Version), 2nd ed., 1993. Available online: http://www.sprawls.org/ppmi2/RADPEN/ (accessed on 21 May 2024).
  7. Faby, S.; Kuchenbecker, S.; Sawall, S.; Simons, D.; Schlemmer, H.P.; Lell, M.; Kachelrieß, M. Performance of Today’s Dual Energy CT and Future Multi Energy CT in Virtual Non-Contrast Imaging and in Iodine Quantification: A Simulation Study. Med. Phys. 2015, 42, 4349–4366. [Google Scholar] [CrossRef]
  8. Schlomka, J.P.; Roessl, E.; Dorscheid, R.; Dill, S.; Martens, G.; Istel, T.; Bäumer, C.; Herrmann, C.; Steadman, R.; Zeitler, G.; et al. Experimental Feasibility of Multi-Energy Photon-Counting K-edge Imaging in Pre-Clinical Computed Tomography. Phys. Med. Biol. 2008, 53, 4031–4047. [Google Scholar] [CrossRef]
  9. Bellon, C. aRTist—Analytical RT Inspection Simulation Tool; BAM Federal Institute for Materials Research and Testing: Berlin, Germany, 2007. [Google Scholar]
  10. Jost, G.; McDermott, M.; Gutjahr, R.; Nowak, T.; Schmidt, B.; Pietsch, H. New Contrast Media for K-Edge Imaging With Photon-Counting Detector CT. Investig. Radiol. 2023, 58, 515–522. [Google Scholar] [CrossRef]
  11. Alvarez, R.E.; Macovski, A. Energy-Selective Reconstructions in X-ray Computerised Tomography. Phys. Med. Biol. 1976, 21, 733–744. [Google Scholar] [CrossRef]
  12. Heismann, B.J.; Leppert, J.; Stierstorfer, K. Density and Atomic Number Measurements with Spectral X-Ray Attenuation Method. J. Appl. Phys. 2003, 94, 2073–2079. [Google Scholar] [CrossRef]
  13. Vom Brocke, J.; Simons, A.; Riemer, K.; Niehaves, B.; Plattfaut, R.; Cleven, A. Standing on the Shoulders of Giants: Challenges and Recommendations of Literature Search in Information Systems Research. Commun. Assoc. Inf. Syst. 2015, 37, 9. [Google Scholar] [CrossRef]
  14. Page, M.J.; Moher, D.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. PRISMA 2020 Explanation and Elaboration: Updated Guidance and Exemplars for Reporting Systematic Reviews. BMJ 2021, 372, n160. [Google Scholar] [CrossRef]
  15. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 Statement: An Updated Guideline for Reporting Systematic Reviews. BMJ 2021, 372, n71. [Google Scholar] [CrossRef]
  16. University College London. Scopus: Techniques for Searching. Available online: https://library-guides.ucl.ac.uk/scopus/search-techniques (accessed on 24 April 2024).
  17. Su, T.; Sun, X.; Yang, J.; Mi, D.; Zhang, Y.; Wu, H.; Fang, S.; Chen, Y.; Zheng, H.; Liang, D.; et al. DIRECT-Net: A Unified Mutual-domain Material Decomposition Network for Quantitative Dual-energy CT Imaging. Med. Phys. 2022, 49, 917–934. [Google Scholar] [CrossRef]
  18. Abascal, J.F.P.J.; Ducros, N.; Pronina, V.; Rit, S.; Rodesch, P.A.; Broussaud, T.; Bussod, S.; Douek, P.; Hauptmann, A.; Arridge, S.; et al. Material Decomposition in Spectral CT Using Deep Learning: A Sim2Real Transfer Approach. IEEE Access 2021, 9, 25632–25647. [Google Scholar] [CrossRef]
  19. Bussod, S.; Abascal, J.F.; Arridge, S.; Hauptmann, A.; Chappard, C.; Ducros, N.; Peyrin, F. Convolutional Neural Network for Material Decomposition in Spectral CT Scans. In Proceedings of the 2020 28th European Signal Processing Conference (EUSIPCO), Amsterdam, The Netherlands, 24–28 August 2021; pp. 1259–1263. [Google Scholar] [CrossRef]
  20. Cao, W.; Shapira, N.; Maidment, A.; Daerr, H.; Noël, P.B. Hepatic Dual-Contrast CT Imaging: Slow Triple kVp Switching CT with CNN-based Sinogram Completion and Material Decomposition. J. Med. Imaging 2022, 9, 014003. [Google Scholar] [CrossRef]
  21. Fang, W.; Li, L. Attenuation Image Referenced (AIR) Effective Atom Number Image Calculation for MeV Dual-Energy Container CT Using Image-Domain Deep Learning Framework. Results Phys. 2022, 35, 105406. [Google Scholar] [CrossRef]
  22. Geng, M.; Tian, Z.; Jiang, Z.; You, Y.; Feng, X.; Xia, Y.; Yang, K.; Ren, Q.; Meng, X.; Maier, A.; et al. PMS-GAN: Parallel Multi-Stream Generative Adversarial Network for Multi-Material Decomposition in Spectral Computed Tomography. IEEE Trans. Med. Imaging 2021, 40, 571–584. [Google Scholar] [CrossRef]
  23. Gong, H.; Tao, S.; Rajendran, K.; Zhou, W.; McCollough, C.H.; Leng, S. Deep-learning-based Direct Inversion for Material Decomposition. Med. Phys. 2020, 47, 6294–6309. [Google Scholar] [CrossRef]
  24. Guo, X.; He, P.; Lv, X.; Ren, X.; Li, Y.; Liu, Y.; Lei, X.; Feng, P.; Shan, H. Material Decomposition of Spectral CT Images via Attention-Based Global Convolutional Generative Adversarial Network. Nucl. Sci. Tech. 2023, 34, 45. [Google Scholar] [CrossRef]
  25. Krebbers, L.T.; Grozmani, N.; Lottermoser, B.G.; Schmitt, R.H. Application of Multispectral Computed Tomography for the Characterisation of Natural Graphite. J. Nondestruct. Test. 2023, 28, 3. [Google Scholar] [CrossRef] [PubMed]
  26. Li, Y.; Tie, X.; Li, K.; Zhang, R.; Qi, Z.; Budde, A.; Grist, T.M.; Chen, G.H. A Quality-checked and Physics-constrained Deep Learning Method to Estimate Material Basis Images from single-kV Contrast-enhanced Chest CT Scans. Med. Phys. 2023, 50, 3368–3388. [Google Scholar] [CrossRef]
  27. Long, Z.; Feng, P.; He, P.; Wu, X.; Guo, X.; Ren, X.; Chen, M.; Gao, J.; Wei, B.; Cong, W. Fully Convolutional Pyramidal Residual Network for Material Discrimination of Spectral CT. IEEE Access 2019, 7, 167187–167194. [Google Scholar] [CrossRef]
  28. Nadkarni, R.; Allphin, A.; Clark, D.; Badea, C. Material Decomposition from Photon-Counting CT Using a Convolutional Neural Network and Energy-Integrating CT Training Labels. In Proceedings of the 7th International Conference on Image Formation in X-ray Computed Tomography, Baltimore, MD, USA, 12–16 June 2022; Stayman, J.W., Ed.; SPIE: Baltimore, MD, USA, 2022; p. 14. [Google Scholar] [CrossRef]
  29. Shi, Z.; Li, H.; Li, J.; Wang, Z.; Cao, Q. Raw-Data-Based Material Decomposition Using Modified U-Net for Low-Dose Spectral CT. In Proceedings of the 2019 12th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), Suzhou, China, 19–21 October 2019; pp. 1–5. [Google Scholar] [CrossRef]
  30. Shi, Z.; Kong, F.; Cheng, M.; Cao, H.; Ouyang, S.; Cao, Q. Multi-Energy CT Material Decomposition Using Graph Model Improved CNN. Med. Biol. Eng. Comput. 2024, 62, 1213–1228. [Google Scholar] [CrossRef]
  31. Wang, G.; Liu, Z.; Huang, Z.; Zhang, N.; Luo, H.; Liu, L.; Shen, H.; Che, C.; Niu, T.; Liang, D.; et al. Improved GAN: Using a Transformer Module Generator Approach for Material Decomposition. Comput. Biol. Med. 2022, 149, 105952. [Google Scholar] [CrossRef] [PubMed]
  32. Weiss, M.; Brierley, N.; Von Schmid, M.; Meisen, T. End-To-End Deep Learning Material Discrimination Using Dual-Energy LINAC-CT. J. Nondestruct. Test. 2024, 29. [Google Scholar] [CrossRef]
  33. Azevedo, S.G.; Martz, H.E.; Aufderheide, M.B.; Brown, W.D.; Champley, K.M.; Kallman, J.S.; Roberson, G.P.; Schneberk, D.; Seetho, I.M.; Smith, J.A. System-Independent Characterization of Materials Using Dual-Energy Computed Tomography. IEEE Trans. Nucl. Sci. 2016, 63, 341–350. [Google Scholar] [CrossRef]
  34. Busi, M.; Mohan, K.A.; Dooraghi, A.A.; Champley, K.M.; Martz, H.E.; Olsen, U.L. Method for System-Independent Material Characterization from Spectral X-ray CT. NDT E Int. 2019, 107, 102136. [Google Scholar] [CrossRef]
  35. Firsching, M. Material Reconstruction in X-ray Imaging. Ph.D. Thesis, University of Erlangen-Nürnberg, Erlangen, Germany, 2009; pp. 1–104. Available online: https://ecap.nat.fau.de/wp-content/uploads/2017/05/2009_Firsching_Dissertation.pdf (accessed on 21 May 2024).
  36. Jumanazarov, D.; Alimova, A.; Abdikarimov, A.; Koo, J.; Poulsen, H.F.; Olsen, U.L.; Iovea, M. Material Classification Using Basis Material Decomposition from Spectral X-ray CT. Nucl. Instrum. Methods Phys. Res. Sect. A Accel. Spectrometers Detect. Assoc. Equip. 2023, 1056, 168637. [Google Scholar] [CrossRef]
  37. Roessl, E.; Proksa, R. K-Edge Imaging in x-Ray Computed Tomography Using Multi-Bin Photon Counting Detectors. Phys. Med. Biol. 2007, 52, 4679–4696. [Google Scholar] [CrossRef] [PubMed]
  38. Son, K.; Kim, D.; Lee, S. Improving the Accuracy of the Effective Atomic Number (EAN) and Relative Electron Density (RED) with Stoichiometric Calibration on PCD-CT Images. Sensors 2022, 22, 9220. [Google Scholar] [CrossRef] [PubMed]
  39. Wang, S.; Cai, A.; Wu, W.; Zhang, T.; Liu, F.; Yu, H. IMD-MTFC: Image-Domain Material Decomposition via Material-Image Tensor Factorization and Clustering for Spectral CT. IEEE Trans. Radiat. Plasma Med. Sci. 2023, 7, 382–393. [Google Scholar] [CrossRef]
  40. Xing, Y.; Zhang, L.; Duan, X.; Cheng, J.; Chen, Z. A Reconstruction Method for Dual High-Energy CT With MeV X-rays. IEEE Trans. Nucl. Sci. 2011, 58, 537–546. [Google Scholar] [CrossRef]
  41. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597. [Google Scholar]
  42. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2021, arXiv:2010.11929. [Google Scholar]
  43. Segars, W.P.; Sturgeon, G.; Mendonca, S.; Grimes, J.; Tsui, B.M.W. 4D XCAT Phantom for Multimodality Imaging Research. Med. Phys. 2010, 37, 4902–4915. [Google Scholar] [CrossRef]
  44. Segars, W.P.; Mahesh, M.; Beck, T.J.; Frey, E.C.; Tsui, B.M.W. Realistic CT Simulation Using the 4D XCAT Phantom: Realistic CT Simulation Using the 4D XCAT Phantom. Med. Phys. 2008, 35, 3800–3808. [Google Scholar] [CrossRef]
  45. Yu, Z.; Noo, F.; Dennerlein, F.; Wunderlich, A.; Lauritsch, G.; Hornegger, J. Simulation Tools for Two-Dimensional Experiments in x-Ray Computed Tomography Using the FORBILD Head Phantom. Phys. Med. Biol. 2012, 57, N237–N252. [Google Scholar] [CrossRef]
  46. Sidky, E.Y.; Pan, X. Report on the AAPM Deep-Learning Spectral CT Grand Challenge. Med. Phys. 2024, 51, 772–785. [Google Scholar] [CrossRef]
  47. Maier, A.; Schebesch, F.; Syben, C.; Würfl, T.; Steidl, S.; Choi, J.H.; Fahrig, R. Precision Learning: Towards Use of Known Operators in Neural Networks. arXiv 2018, arXiv:1712.00374. [Google Scholar]
  48. Li, B.; François-Lavet, V.; Doan, T.; Pineau, J. Domain Adversarial Reinforcement Learning. arXiv 2021, arXiv:2102.07097. [Google Scholar]
  49. Johnson, J.; Alahi, A.; Fei-Fei, L. Perceptual Losses for Real-Time Style Transfer and Super-Resolution. arXiv 2016, arXiv:1603.08155. [Google Scholar]
  50. Chen, H.; Zhang, Y.; Kalra, M.K.; Lin, F.; Chen, Y.; Liao, P.; Zhou, J.; Wang, G. Low-Dose CT With a Residual Encoder-Decoder Convolutional Neural Network. IEEE Trans. Med. Imaging 2017, 36, 2524–2535. [Google Scholar] [CrossRef] [PubMed]
  51. Trends in GPU Price-Performance. 2022. Available online: https://epochai.org/blog/trends-in-gpu-price-performance (accessed on 17 May 2024).
  52. Modha, D.S.; Akopyan, F.; Andreopoulos, A.; Appuswamy, R.; Arthur, J.V.; Cassidy, A.S.; Datta, P.; DeBole, M.V.; Esser, S.K.; Otero, C.O.; et al. Neural Inference at the Frontier of Energy, Space, and Time. Science 2023, 382, 329–335. [Google Scholar] [CrossRef] [PubMed]
  53. Sevilla, J.; Heim, L.; Ho, A.; Besiroglu, T.; Hobbhahn, M.; Villalobos, P. Compute Trends Across Three Eras of Machine Learning. In Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 18–23 July 2022; pp. 1–8. [Google Scholar] [CrossRef]
  54. Qarout, Y.; Begg, M.; Fearon, L.; Russell, D. Trustworthy AI Framework; Manufacturing Technology Centre: Coventry, UK, 2024. [Google Scholar]
  55. Zengin, A.; Pye, S.; Cook, M.; Adams, J.; Wu, F.; O’Neill, T.; Ward, K. Ethnic Differences in Bone Geometry between White, Black and South Asian Men in the UK. Bone 2016, 91, 180–185. [Google Scholar] [CrossRef]
  56. European Commission. AI Act. 2024. Available online: https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai (accessed on 27 June 2024).
  57. German Institute for Standardisation. Second Edition of the German Standardization Roadmap AI. 2022. Available online: https://www.din.de/de/forschung-und-innovation/themen/kuenstliche-intelligenz/fahrplan-festlegen (accessed on 27 June 2024).
Figure 1. Linear attenuation coefficient, μL, of iron and its components: photoelectric effect, Rayleigh scattering, Compton scattering, and pair production, up to 6 MeV. The characteristic K-edge of iron at 7.1 keV is indicated by a dashed line.
Figure 3. Number of publications found with the Scopus search engine on 4 April 2024 using the query described above and further filter keywords. The filter REF(Alvarez 1976) selects all publications found using the query that reference Alvarez's publication [11] from 1976. The same holds for REF(Heismann 2003), referencing Heismann et al.'s publication [12].
Figure 4. Overview of the paper corpus used in this work. Clinical publications are highlighted in blue/green, and industrial publications are highlighted in red.
Figure 5. Timeline of innovations (bottom) [2,41,42] and publications from the paper corpus (top) [11,12,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40]. The bottom shows innovations in CT technology, as well as milestone architectures in computer vision. Publications with the annotation AI employ deep learning techniques. The timeline is not true to scale.
Figure 6. Peak acceleration voltages in the publications from the paper corpus [11,12,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40], colour-coded by domain. Dashed lines represent the use of photon-counting detectors, illustrating multiple energy bins distributed across a photon spectrum (not quantitative). If a certain method utilises both PCD and EID data, the corresponding bar is solid (EID). AI labels emphasise the implementation of data-driven methods.
Figure 7. Materials mentioned in the publications from the corpus, visualised by atomic number and density. Materials exclusively described in classical algorithms are indicated in red. Blue atomic symbols represent materials described by both classical and data-driven techniques. Additionally, the atomic symbols indicated in grey are not mentioned in the publications from the corpus. Several authors reference the same materials, such as water, but these materials are plotted only once as a representative example.
Figure 8. Input data of the material decomposition challenge by Sidky et al. The image is taken from their publication [46] (Creative Commons license) reporting on the results of the challenge.
Figure 9. Schematic showing the core concept of the U-Net architecture, originally proposed by Ronneberger et al. [41].
Table 1. Empirical values for the coefficients a1 and a2 according to Equation (3) for brain and fat tissue calculated by Alvarez et al. [11].

Tissue | a1 (keV³ cm⁻¹) | a2 (cm⁻¹)
brain | 4792 | 0.1694
fat | 2868 | 0.1784
Table 2. Data-driven publications in the paper corpus.

Authors | Year
[18] | 2021
[19] | 2021
[20] | 2022
[21] | 2022
[22] | 2021
[23] | 2020
[24] | 2023
[25] | 2023
[26] | 2023
[27] | 2019
[28] | 2022
[29] | 2019
[30] | 2024
[17] | 2022
[31] | 2022
[32] | 2024
Table 3. Classical algorithmic publications in the paper corpus.

Authors | Year
[11] | 1976
[33] | 2016
[34] | 2019
[35] | 2009
[12] | 2003
[36] | 2023
[37] | 2007
[38] | 2022
[39] | 2023
[40] | 2011
Table 4. Overview of the models and datasets from the corpus. The works of Nadkarni [28] and Krebbers [25] lack descriptions of the sizes of their datasets.

Author | Model Family | Dataset Origin | Dataset Size | Data Domain
Long 2019 [27] | FC-PRNet | scan | ≈200 | image
Shi 2019 [29] | U-Net | simulation | 140 | projections
Bussod 2021 [19] | U-Net | scan (synchrotron) | 450 K | projections
Gong 2020 [23] | U-Net + InceptNet | scan | 110 K | image
Geng 2021 [22] | PMS-GAN | sim + scan | 124 + 124 | projections
Abascal 2021 [18] | U-Net | sim on real data | 5400 | image + projection
Su 2022 [17] | U-Net | sim | 10 K | projection + image
Fang 2022 [21] | U-Net | sim | 300 | image
Nadkarni 2022 [28] | U-Net | scan | - | image
Wang 2022 [31] | GAN | scan | 8159 | image
Li 2023 [26] | U-Net + MLP | scans | 7218 | image
Cao 2022 [20] | CNN | sim | ≈12 K | image
Guo 2023 [24] | GAN + U-Net | scan | 1 K | image
Shi 2024 [30] | U-Net | simulation | 35 K | image
Krebbers 2023 [25] | sensor3D | scan + XRD | - | image
Weiss 2024 [32] | U-Net | simulation | 64 K | image
