Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

[1,2]\fnmStefan P. \surHein

\equalcont

These authors contributed equally to this work.

\equalcont

These authors contributed equally to this work.

1]\orgdivDepartment of Nuclear Medicine, \orgnameTechnical University of Munich, \orgaddress\cityMunich, \postcode81675, \countryGermany

2]\orgdivChair of Biomedical Physics, Department of Physics, \orgnameTUM School of Natural Sciences, Technical University of Munich, \orgaddress\cityGarching, \postcode85748, \countryGermany

3]\orgdivMunich Institute of Biomedical Engineering, \orgnameTechnical University of Munich, \orgaddress\cityGarching, \postcode85748, \countryGermany

4]\orgdivDivision of Nuclear Medicine and Molecular Imaging, The Russell H. Morgan Department of Radiology and Radiological Science, \orgnameJohns Hopkins University School of Medicine, \orgaddress\cityBaltimore, \postcode21205, \stateMD, \countryUSA

Towards AI Lesion Tracking in PET/CT Imaging: A Siamese-based CNN Pipeline applied on PSMA PET/CT Scans

stefan.hein@tum.de    \fnmManuel \surSchultheiss    \fnmAndrei \surGafita    \fnmRaphael \surZaum    \fnmFarid \surYagubbayli    \fnmIsabel \surRauscher    \fnmMatthias \surEiber    \fnmFranz \surPfeiffer    \fnmWolfgang A. \surWeber [ [ [ [
Abstract

Assessing tumor response to systemic therapies is one of the main applications of PET/CT. Routinely, only a small subset of index lesions out of multiple lesions is analyzed. However, this operator dependent selection may bias the results due to possible significant inter-metastatic heterogeneity of response to therapy. Automated, AI based approaches for lesion tracking hold promise in enabling the analysis of many more lesions and thus providing a better assessment of tumor response. This work introduces a Siamese CNN approach for lesion tracking between PET/CT scans.

Our approach is applied on the laborious task of tracking a high number of bone lesions in full-body baseline and follow-up [68Ga]Ga- or [18F]F-PSMA PET/CT scans after two cycles of [177Lu]Lu-PSMA therapy of metastatic castration resistant prostate cancer patients. Data preparation includes lesion segmentation and affine registration. Our algorithm extracts suitable lesion patches and forwards them into a Siamese CNN trained to classify the lesion patch pairs as corresponding or non-corresponding lesions. Experiments have been performed with different input patch types and a Siamese network in 2D and 3D. The CNN model successfully learned to classify lesion assignments, reaching a lesion tracking accuracy of 83 % in its best configuration with an AUC = 0.91. For remaining lesions the pipeline accomplished a re-identification rate of 89 %. We proved that a CNN may facilitate the tracking of multiple lesions in PSMA PET/CT scans. Future clinical studies are necessary if this improves the prediction of the outcome of therapies.

keywords:
PET/CT, Lesion Tracking, AI, Siamese, CNN, Bone Lesion, PSMA, Metastatic Castration Resistant Prostate Cancer

1 Introduction

The American Cancer Society statistics of 2023 describing a general rise of cancer incidence [1] and cancer still representing one of the most frequent causes of death, show the high importance of targeted therapies and both specific and sensitive diagnostics. The reliable assessment of tumor response to systemic therapy is essential for drug development and patient management. Especially the combination of Positron Emission Tomography (PET) and Computed Tomography (CT) in nuclear medicine serves as a superior technique for sensitive lesion detection[2] applicable for a high number of various cancer types (e.g. [3],[4],[5],[6]).

With prostate cancer being predicted to be the most common malignancy, surpassing breast cancer by 2030[7], the extensive research in the recent years has shown new successful ways for prostate cancer diagnostics and therapy, and their combination - theranostics. The success of prostate-specific membrane antigen (PSMA)-targeted radioligands labeled with positron- or γ𝛾\gammaitalic_γ-emitting isotopes for imaging or with β𝛽\betaitalic_β- or α𝛼\alphaitalic_α-emitting isotopes for therapy has rapidly led to new standards in prostate cancer management.[8] While PSMA-based PET/CT-imaging has already become the main diagnostic tool for response monitoring in prostate cancer cases in Europe [9], in the USA the usage of [68Ga]Ga-PSMA-11[10] is a rather young method, approved by the American Food and Drug Administration (FDA) in December 2020[11]. The same ligand type used for therapy as [177Lu]Lu-PSMA-617 (PluvictoTMTM{}^{\text{TM}}start_FLOATSUPERSCRIPT TM end_FLOATSUPERSCRIPT) has just recently been approved by the FDA in March 2022[12] and the European Medicines Agency in December 2022[13] for metastatic castration resistant prostate cancer patients.

Refer to caption
Figure 1: PSMA-PET/CT Scans showing a Male 73 year old Patient with Partial Response to [177Lu]Lu-PSMA Therapy: The baseline scan prior to therapy is shown on the left side, the follow-up scan after two therapy cycles on the right side: A&H Maximum intensity projections (MIP), B&E axial CT slices of the upper thorax, C&F corresponding axial PET slices, D&G PET/CT overlay. The MIP show a clear reduction of tumor burdon with less bone lesions. Physiologic PSMA-uptake can be found in the lacrimal glands, salivatory glands, liver, spleen, small bowel, kidneys and bladder. The white arrows indicate resolved lesions, the red arrows remaining lesions. A complete comparative 3D analysis of all lesions is laborious and could be facilitated by an automated tracking algorithm.

The high sensitivity and specificity of PSMA-PET scans in detecting lesions together with the high anatomical contrast of the CT offers an enormous opportunity for accurate response monitoring and the detection of disease progression or regression in early stages. However, this creates new challenges to be solved. Especially prostate cancer patients frequently show diffuse bone lesion patterns with a high number of tracer uptake foci to be examined.[14] Each tumor response assessment requires the comparison of lesions in baseline and follow-up scan. Since detailed manual lesion tracking is laborious and error-prone, routinely, only a small subset of index lesions out of multiple lesions is analyzed.[15] However, the operator dependent selection of index lesions may bias the results due to possible significant inter-metastatic heterogeneity of response to therapy. Imaging with PSMA-radioligands might require a deeper and broader lesion analysis.

New automated and AI-based approaches for lesion tracking may enable the analysis of many more lesions providing a better assessment of tumor response. Also, the acquired data may offer the differentiation of therapy response for single body regions and show the impact of lesions in specific regions on the outcome of the disease. With the possibly highest number of PSMA-uptake foci and consequently the highest difficulty, automated lesions tracking in PET/CT scans seems to find its best use case for bone lesions of prostate cancer patients. For this reason, we focus on prostate cancer bone lesions for our presented new tracking approach.

So far, only few approaches exist for automated lesion tracking specifically in PET/CT scans. As the anatomy surrounding lesions is mostly similar, the techniques rely on image registration.[16][17] To our best knowledge, there is yet no tracking algorithm for PET/CT applying artificial intelligence, whereas several deep learning approaches are applied for segmentation or localization of lesions in PET scans[18][19]. Opfer et al. were among the first to suggest a tracking algorithm for lesions in PET/CT scans. It is based on a global rigid registration of consecutive CT scans and applies block matching and SUV region growing after a click on a selected baseline lesion.[16] Fox et al. also proposes comparative analysis approach including a registration-based propagation of a selected region into the follow-up PET/CT scan, which has to be accepted or rejected by the reader.[17]

In contrast to PET/CT scans, lesion tracking has been more studied for other imaging modalities, as ultrasound or stand-alone CT. There, registration-based approaches are common as well. Tan et al. use a deformable multiresolution B-Spline registration for evaluating treatment response in CT images of ovarial cancer patients.[20] Within ultrasound and CT images, however, CNNs (Convolutional Neural Networks) have already been applied for lesion tracking. In Dankerl et al.[21] a pattern recognition network detects landmarks in the follow-up scan and localizes follow-up lesions based on a patient specific graphical network. Furthermore, on the basis of a three-step-registration algorithm, Hering et al. use a nnU-Net for baseline CT lesion segmentation and perform lesion tracking by segmenting lesions in the propagated ROI of the follow-up CT scan.[22]

In case deep learning is directly applied for lesion tracking, it is mostly in the shape of Siamese networks. In ultrasound images, they are applied by Gomariz et al.[23] and Liu et al.[24] for liver landmark and liver respiratory motion tracking. In CT scans, Rafael-Palou et al.[25] uses a 3D Siamese network for lung nodule tracking and Cai et al.[26] extend a 3D Siamese network in depth for CT lesion tracking.

In this work, we aim to fill the gap of an AI lesion tracker for PET/CT scans, by applying a Siamese CNN structure in two and three dimensions on bone lesions in PSMA PET/CT images of metastatic castration resistant prostate cancer patients. With this, we select one of the most difficult lesion tracking applications.

We created a new fully-automated algorithm, tracking all existing lesions in baseline and follow-up scan by assigning corresponding lesions as well as recognizing resolved, new or fused lesions. After applying lesion segmentation and whole-body affine registration, a case distinction extracts lesion patches from suitable positions within the 3D dataset that serve as input data for the Siamese network. We performed several experiments with different multi-channel patch types, inserting combinations of PET, CT or segmentation data into either a 2D or 3D Siamese architecture.

2 Methods

2.1 Workflow Overview

Each baseline and follow-up full-body CT and PET scan in DICOM format was processed by the qPSMA software[27], to create a lesion segmentation mask. This algorithm extracts bone lesions by applying an SUV threshold within a segmented bone mask, taking into account a possible misalignment of PET and CT data.

As the patient’s position can be shifted in follow-up scans, a rough alignment is required for relocalization during later patch extraction. Therefore an affine registration is applied (section 2.2).

In a next step image patches are cropped around every lesion and its surrounding region in the baseline and the follow-up scan. The suitable extraction positions are determined by a case distinction algorithm (section 2.3).

The Siamese CNN (section 2.4) compares pairs of baseline and follow-up lesion patches and decides, whether the patch pair shows corresponding lesions. The approach narrows the tracking down to a classification problem: true patch pair for corresponding lesions, false patch pair for non-corresponding lesions.

We performed our experiments in two and three dimensions. Therefore, we created a related 2D and 3D Siamese network and respectively cropped all patches in a 2D and 3D version.

Refer to caption
Figure 2: Processing workflow: Schematic presentation of the proposed pipeline for AI-based PET/CT lesion tracking. Data preparation of the baseline and follow-up scans include a binary bone lesion segmentation (Seg.) and an affine image registration. Patches are extracted around every lesion and then analyzed by a siamese CNN, which assigns the lesions in the baseline and follow-up PET/CT scans.

2.2 Image Registration

Since the body shape of the patients possibly changes in the course of therapy, an affine registration is performed that also allows for shearing and scaling in addition to the rotation and translation of a rigid transformation. The follow-up CT scan is registered towards the baseline CT scan. As a result we obtain a shear and rotation matrix 𝐀=(a11,a33)𝐀subscript𝑎11subscript𝑎33\mathbf{A}=(a_{11},...\ a_{33})bold_A = ( italic_a start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT , … italic_a start_POSTSUBSCRIPT 33 end_POSTSUBSCRIPT ) and the translation vector 𝐭=(t1,t2,t3)𝐭subscript𝑡1subscript𝑡2subscript𝑡3\mathbf{t}=(t_{1},t_{2},t_{3})bold_t = ( italic_t start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_t start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_t start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ), along with the center of rotation 𝐜=(c1,c2,c3)𝐜subscript𝑐1subscript𝑐2subscript𝑐3\mathbf{c}=(c_{1},c_{2},c_{3})bold_c = ( italic_c start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_c start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , italic_c start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT ), defined as the center of the baseline scan[ElastixManual]. The transformation is then applied on the follow-up PET scan as well as on its 3D bone segmentation mask.

For the relocalization during patch extraction, positions 𝐱𝐱\mathbf{x}bold_x in the baseline and follow-up original scans can be projected into the respective other scan using the forward transformation 𝐓(𝐱)𝐓𝐱\mathbf{T}(\mathbf{x})bold_T ( bold_x ) or the reverse transformation 𝐓𝟏(𝐱)superscript𝐓1𝐱\mathbf{T^{-1}}(\mathbf{x})bold_T start_POSTSUPERSCRIPT - bold_1 end_POSTSUPERSCRIPT ( bold_x ) by applying the inverse shear and rotation matrix 𝐀1superscript𝐀1\mathbf{A}^{-1}bold_A start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT.

𝐓:𝐱followup=𝐀(𝐱baseline𝐜)+𝐭+𝐜:𝐓subscript𝐱𝑓𝑜𝑙𝑙𝑜𝑤𝑢𝑝𝐀subscript𝐱𝑏𝑎𝑠𝑒𝑙𝑖𝑛𝑒𝐜𝐭𝐜\mathbf{T}:\ \mathbf{x}_{follow-up}=\mathbf{A}\cdot\left(\mathbf{x}_{\ % baseline}-\mathbf{c}\right)+\mathbf{t}+\mathbf{c}bold_T : bold_x start_POSTSUBSCRIPT italic_f italic_o italic_l italic_l italic_o italic_w - italic_u italic_p end_POSTSUBSCRIPT = bold_A ⋅ ( bold_x start_POSTSUBSCRIPT italic_b italic_a italic_s italic_e italic_l italic_i italic_n italic_e end_POSTSUBSCRIPT - bold_c ) + bold_t + bold_c (1)
𝐓𝟏:𝐱baseline=𝐀1(𝐱followup𝐜𝐭)+𝐜:superscript𝐓1subscript𝐱𝑏𝑎𝑠𝑒𝑙𝑖𝑛𝑒superscript𝐀1subscript𝐱𝑓𝑜𝑙𝑙𝑜𝑤𝑢𝑝𝐜𝐭𝐜\mathbf{T^{-1}}:\ \mathbf{x}_{\ baseline}=\mathbf{A}^{-1}\cdot\left(\mathbf{x}% _{follow-up}-\mathbf{c}-\mathbf{t}\right)+\mathbf{c}bold_T start_POSTSUPERSCRIPT - bold_1 end_POSTSUPERSCRIPT : bold_x start_POSTSUBSCRIPT italic_b italic_a italic_s italic_e italic_l italic_i italic_n italic_e end_POSTSUBSCRIPT = bold_A start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ⋅ ( bold_x start_POSTSUBSCRIPT italic_f italic_o italic_l italic_l italic_o italic_w - italic_u italic_p end_POSTSUBSCRIPT - bold_c - bold_t ) + bold_c (2)

For image registration SimpleElastix[28] was used.

2.3 Patch Extraction

The patches in this study were obtained by cropping axial layers from a whole-body dataset. For a 2D patch around a point of interest p(x,y,z)𝑝𝑥𝑦𝑧p(x,y,z)italic_p ( italic_x , italic_y , italic_z ), the z-position selects the axial layer of the dataset, which consists of a CT scan, PET scan, and binary lesion segmentation. The scans are cropped around the x-y-position within a 50×\times×50 pixel frame, as shown in figure 3. In 3D, several axial slices are included in the patch. For the experiments, 3D patch sizes of 50×\times×50×\times×5 pixels and 50×\times×50×\times×11 pixels were chosen. The extraction position is determined in the same way as for 2D patches, with the only difference being that for 50×\times×50×\times×5 pixel patches, two axial layers above and below the determined z-position are added to the patch, and for 50×\times×50×\times×11 pixel patches, respectively, five axial layers above and below. As described in 3.1, the resolution in x- and y-direction is higher than in z-direction and thus a pixel in an x-y-plane displays a smaller area than those in z-direction.

Refer to caption
Figure 3: 2D lesion patch extraction: An axial 50×\times×50 pixel patch is cropped around the determined extraction point p(x,y,z)𝑝𝑥𝑦𝑧p(x,y,z)italic_p ( italic_x , italic_y , italic_z ).

When creating a two-dimensional or a small three-dimensional patch out of a three-dimensional lesion, the position of the extraction is of high importance. Extracting the patch only at the Center of Mass (CoM) of each lesion and then creating patch pairs can lead to unsuitable true patch pairs that do not show the same anatomical region, even though they represent corresponding lesions. This is the case when a baseline lesion divides into several small follow-up lesions, as shown in figure 4A. When comparing a relatively small to a large lesion, the two centers of mass are on different axial layers, even though they are corresponding lesions. Therefore, the ROI of the larger lesion, within the comparison with the smaller one, is not its CoM. Instead, its ROI is represented by the anatomical environment of the smaller lesion. It is, thus, necessary to transfer the z-coordinate of the second CoM into the large lesion and extract a patch around the larger lesion at the adjusted axial layer. Hence, the lesion patches cannot be extracted independently. It is always necessary to extract a patch pair together, as the patch position for a lesion depends both on its own structure and on that of the compared lesion.

An algorithm uses hierarchical cases applied sequentially to determine the patch extraction point (figure 4 B). The 3D CoM of a lesion is used as a starting point and centered within the 2D axial slice of the lesion.

Case I projects the corrected CoM of the follow-up lesion into the baseline lesion using 𝐓𝟏(𝐱)superscript𝐓1𝐱\mathbf{T^{-1}}(\mathbf{x})bold_T start_POSTSUPERSCRIPT - bold_1 end_POSTSUPERSCRIPT ( bold_x ), mostly used if the baseline lesion is larger and has either shrunk in volume or has divided into several smaller follow-up lesions.

Case II is implemented if Case I is not successful, projecting the corrected CoM of the baseline lesion into the follow-up lesion (𝐓(𝐱)𝐓𝐱\mathbf{T}(\mathbf{x})bold_T ( bold_x )), mostly applied with the baseline lesion being the smaller one.

Case III is utilized if due to shifts between the baseline and follow-up lesions the transferred CoMs cannot be identified within the lesions, mostly due to minor registration errors. To address this challenge, the algorithm locates the intersection of both lesions and designates the center of the overlap as the patch point for each lesion. The overlap is then transformed using 𝐓(𝐱)𝐓𝐱\mathbf{T}(\mathbf{x})bold_T ( bold_x ) to determine its coordinates in the follow-up scan.

Even though Cases A to C mostly extract true lesion pairs, sometimes, due to registration errors, they may yield false patch pairs. Therefore, the use of the Siamese CNN approach remains relevant.

Case IV is designed for lesions that do not overlap, which can happen when corresponding lesions are not accurately registered. This method is also utilized for most false patch pairs. For the smaller lesion in the pair, the patch point is set at the axial layer of its CoM. For the larger lesion, the shape of the smaller lesion is projected onto the nearest end of the larger lesion in the z-direction. The point is then taken at the layer where the CoM of the projected smaller lesion would be positioned. Mathematically, the z-coordinate of the extraction point of the larger lesion zextr,lsubscript𝑧𝑒𝑥𝑡𝑟𝑙z_{extr,l}italic_z start_POSTSUBSCRIPT italic_e italic_x italic_t italic_r , italic_l end_POSTSUBSCRIPT can be expressed as follows:

zextr,l={zmax,lΔzs/2ifzCoM,s>zCoM,lzmin,l+Δzs/2ifzCoM,s<zCoM,l .subscript𝑧𝑒𝑥𝑡𝑟𝑙casessubscript𝑧𝑚𝑎𝑥𝑙Δsubscript𝑧𝑠2ifsubscript𝑧𝐶𝑜𝑀𝑠subscript𝑧𝐶𝑜𝑀𝑙subscript𝑧𝑚𝑖𝑛𝑙Δsubscript𝑧𝑠2ifsubscript𝑧𝐶𝑜𝑀𝑠subscript𝑧𝐶𝑜𝑀𝑙 .z_{extr,l}=\begin{cases}z_{max,l}-\Delta z_{s}/2&\ \ \text{if}\ z_{CoM,s}>z_{% CoM,l}\\ z_{min,l}+\Delta z_{s}/2&\ \ \text{if}\ z_{CoM,s}<z_{CoM,l}\text{ .}\end{cases}italic_z start_POSTSUBSCRIPT italic_e italic_x italic_t italic_r , italic_l end_POSTSUBSCRIPT = { start_ROW start_CELL italic_z start_POSTSUBSCRIPT italic_m italic_a italic_x , italic_l end_POSTSUBSCRIPT - roman_Δ italic_z start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT / 2 end_CELL start_CELL if italic_z start_POSTSUBSCRIPT italic_C italic_o italic_M , italic_s end_POSTSUBSCRIPT > italic_z start_POSTSUBSCRIPT italic_C italic_o italic_M , italic_l end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_z start_POSTSUBSCRIPT italic_m italic_i italic_n , italic_l end_POSTSUBSCRIPT + roman_Δ italic_z start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT / 2 end_CELL start_CELL if italic_z start_POSTSUBSCRIPT italic_C italic_o italic_M , italic_s end_POSTSUBSCRIPT < italic_z start_POSTSUBSCRIPT italic_C italic_o italic_M , italic_l end_POSTSUBSCRIPT . end_CELL end_ROW (3)

Here, ΔzsΔsubscript𝑧𝑠\Delta z_{s}roman_Δ italic_z start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT represents the height in the z-direction of the smaller lesion, and zmax,lsubscript𝑧𝑚𝑎𝑥𝑙z_{max,l}italic_z start_POSTSUBSCRIPT italic_m italic_a italic_x , italic_l end_POSTSUBSCRIPT or zmin,lsubscript𝑧𝑚𝑖𝑛𝑙z_{min,l}italic_z start_POSTSUBSCRIPT italic_m italic_i italic_n , italic_l end_POSTSUBSCRIPT indicates the z-coordinate of the highest or lowest point of the larger lesion. zCoM,ssubscript𝑧𝐶𝑜𝑀𝑠z_{CoM,s}italic_z start_POSTSUBSCRIPT italic_C italic_o italic_M , italic_s end_POSTSUBSCRIPT and zCoM,lsubscript𝑧𝐶𝑜𝑀𝑙z_{CoM,l}italic_z start_POSTSUBSCRIPT italic_C italic_o italic_M , italic_l end_POSTSUBSCRIPT denote the z-coordinates of the CoMs of the two lesions.

For the test set, a lesion from the baseline scan is compared to all follow-up lesions in a specific ROI around the margins of the projected baseline lesion (𝐓(𝐱)𝐓𝐱\mathbf{T}(\mathbf{x})bold_T ( bold_x ), eq. 1). We defined it with Δx=10Δ𝑥10\Delta x=10roman_Δ italic_x = 10, Δy=10Δ𝑦10\Delta y=10roman_Δ italic_y = 10 and Δz=5Δ𝑧5\Delta z=5roman_Δ italic_z = 5. In z-direction the interval is smaller as its pixel size is usually larger than in x- and y- direction. This ROI was chosen as it contains 99% of the corresponding lesions, when tested on the train and validation set.

Refer to caption
Figure 4: Principle of the patch extraction algorithm:
(A) Unsuitable patch pair after extraction at center of mass: A large baseline lesion shrank and divided into three smaller lesions in the follow up-scan. The example shows a baseline and follow-up lesion patch of the left scapula extracted at the lesions’ center of mass (CoM). Patch extraction at each lesion’s CoM can lead to patch pairs showing different anatomical environments, even though they show corresponding lesions. For this reason, a detailed patch extraction algorithm determines the suitable patch extraction point p(x,y,z)𝑝𝑥𝑦𝑧p(x,y,z)italic_p ( italic_x , italic_y , italic_z ).
(B) Patch extraction cases: In the algorithm a case distinction hierarchically applies the four cases A-D to find suitable points p(x,y,z)𝑝𝑥𝑦𝑧p(x,y,z)italic_p ( italic_x , italic_y , italic_z ) within the baseline and follow-up lesion for the patch pair extraction. If one case does not lead to a result, the algorithm passes on to the next one. For each case, the illustration shows the baseline lesion(s) on the left side and the follow-up lesion(s) on the right side. The gray cross-sectional areas indicate the final extracted axial patches. 𝐓𝐓\mathbf{T}bold_T and 𝐓𝟏superscript𝐓1\mathbf{T^{-1}}bold_T start_POSTSUPERSCRIPT - bold_1 end_POSTSUPERSCRIPT indicate point transfers to the respective other scan (eq. 1, eq. 2).

2.4 Siamese CNN

A Siamese network consisting of two parallel CNN branches was chosen to process the input of patch pairs (fig. 5). It uses shared weights for the two branches and was inspired by Koch et al. [29], who used a similar network for one-shot image recognition. . Each CNN branch has three convolutional layers with ReLU activation and three pooling layers. Batch normalization is applied before every convolutional ayer. The output of a branch again undergoes batch normalization, is flattened to a vector 𝐡(𝐗)𝐡𝐗\mathbf{h}(\mathbf{X})bold_h ( bold_X ) and is merged in a custom-defined 𝐋𝟏subscript𝐋1\mathbf{L_{1}}bold_L start_POSTSUBSCRIPT bold_1 end_POSTSUBSCRIPT-layer, which takes the element-wise absolute difference of the feature vectors 𝐡(𝐗𝟏)𝐡subscript𝐗1\mathbf{h}(\mathbf{X_{1}})bold_h ( bold_X start_POSTSUBSCRIPT bold_1 end_POSTSUBSCRIPT ) and 𝐡(𝐗𝟐)𝐡subscript𝐗2\mathbf{h}(\mathbf{X_{2}})bold_h ( bold_X start_POSTSUBSCRIPT bold_2 end_POSTSUBSCRIPT ):

𝐋𝟏(𝐗𝟏,𝐗𝟐)i=𝐡(𝐗𝟏)i𝐡(𝐗𝟐)i1 .subscript𝐋1subscriptsubscript𝐗1subscript𝐗2𝑖subscriptnorm𝐡subscriptsubscript𝐗1𝑖𝐡subscriptsubscript𝐗2𝑖1 .\mathbf{L_{1}}(\mathbf{X_{1}},\mathbf{X_{2}})_{i}=||\mathbf{h}(\mathbf{X_{1}})% _{i}-\mathbf{h}(\mathbf{X_{2}})_{i}||_{1}\text{ .}bold_L start_POSTSUBSCRIPT bold_1 end_POSTSUBSCRIPT ( bold_X start_POSTSUBSCRIPT bold_1 end_POSTSUBSCRIPT , bold_X start_POSTSUBSCRIPT bold_2 end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = | | bold_h ( bold_X start_POSTSUBSCRIPT bold_1 end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT - bold_h ( bold_X start_POSTSUBSCRIPT bold_2 end_POSTSUBSCRIPT ) start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT | | start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT . (4)

The output 𝐋𝟏(𝐗𝟏,𝐗𝟐)subscript𝐋1subscript𝐗1subscript𝐗2\mathbf{L_{1}}(\mathbf{X_{1}},\mathbf{X_{2}})bold_L start_POSTSUBSCRIPT bold_1 end_POSTSUBSCRIPT ( bold_X start_POSTSUBSCRIPT bold_1 end_POSTSUBSCRIPT , bold_X start_POSTSUBSCRIPT bold_2 end_POSTSUBSCRIPT ) is fed to a final fully-connected decision layer (dense layer) and a softmax function distributing the final network output to the two classes true patch pair and wrong patch pair.

Refer to caption
Figure 5: Structure of the siamese network: Patches are processed by two parallel CNN branches with shared weights, whose output is merged by a 𝐋𝟏subscript𝐋1\mathbf{L_{1}}bold_L start_POSTSUBSCRIPT bold_1 end_POSTSUBSCRIPT-layer. Architecture details of the branches are shown in table 1.

To process the different patch dimensions (2D, 3D), the Siamese network was created in 2D and 3D. Also, for the differently sized 3D patches with z=5 and z=11, the kernel size of the CNN had to be adapted as a larger patch can be further scaled down by larger kernels. Details of the branch architecture are displayed in table 1.

Table 1: Architecture Details of the used Siamese Branches
Layer Size Size Kernels Stride Output Output
2D 3D z5||||z11 2D (3D) 2D 3D z5||||z11
Input 50×\times×50×\times×1* 50×\times×50×\times×5||||11×\times×1*
Convolutional 5×\times×5×\times×1† 5×\times×5×\times×2||||4×\times×1† 20 1×\times×1(×\times×1) 46×\times×46×\times×20 46×\times×46×\times×4||||8×\times×20
Pooling 3×\times×3 3×\times×3×\times×1||||2 2×\times×2(×\times×1) 22×\times×22×\times×20 22×\times×22×\times×4||||7×\times×20
Convolutional 3×\times×3×\times×20 3×\times×3×\times×2||||3×\times×20 40 1×\times×1(×\times×1) 22×\times×22×\times×40 22×\times×22×\times×3||||5×\times×40
Pooling 2×\times×2 2×\times×2×\times×1||||2 2×\times×2(×\times×1) 11×\times×11×\times×40 11×\times×11×\times×3||||5×\times×40
Convolutional 3×\times×3×\times×40 3×\times×3×\times×2||||3×\times×40 50 1×\times×1(×\times×1) 9×\times×9×\times×50 9×\times×9×\times×2||||3×\times×50
Pooling 2×\times×2 2×\times×2×\times×1||||2 2×\times×2(×\times×1) 4×\times×4×\times×50 4×\times×4×\times×2||||2×\times×50
\botrule
00footnotetext: The table shows the structure of the layers for a 2D Siamese branch and for a 3D Siamese branch, which exists in two different versions for patches with z=5 and patches with z=11. Annotation example for the kernel size of the first 3D convolutional layer: 5×\times×5×\times×2×\times×1 for patches with z=5 and 5×\times×5×\times×4×\times×1 for patches with z=11.00footnotetext: *1-channel patches as input. For 2-channel patches, the input shape is 50×\times×50×\times×2 (2D) and 50×\times×50×\times×5||||11×\times×2 (3D), respectively.
00footnotetext: †First layer structure for 1-channel patches. For 2-channel patches, only the kernel size of first layer is extended to 5×\times×5×\times×2 (2D)and 5×\times×5×\times×4||||2×\times×2 (3D), respectively.

Applying the Adam optimizer[30], which already effectively adapts the learning rate due to momentum optimization, for our network case, we achieved best performance with an initial learning rate of 1×1041superscript1041\times 10^{-4}1 × 10 start_POSTSUPERSCRIPT - 4 end_POSTSUPERSCRIPT. Nevertheless, an additional combination of learning rate scheduling together with momentum optimization has shown good results.[31] In several siamese networks, different learning rate decay have been applied.[29][32] For this reason, we chose performance scheduling for the learning rate with a patience of 5555 epochs, a scheduling factor of 0.20.20.20.2 and a minimum learning rate of 1×1051superscript1051\times 10^{-5}1 × 10 start_POSTSUPERSCRIPT - 5 end_POSTSUPERSCRIPT.

As the lesion tracking task requires the comparison of different scans with possibly changed morphologies during therapy, a high grade of generalization is necessary. In order to achieve this and prevent overfitting, regularization is applied in various ways. In addition to the batch normalization, 2superscript2\ell^{2}roman_ℓ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT-parameter regularization is used with a penalty factor of 5×1045superscript1045\times 10^{-4}5 × 10 start_POSTSUPERSCRIPT - 4 end_POSTSUPERSCRIPT as recommended for CNNs.[33] Following further suggestions for CNNs, a dropout rate of 40% is applied to the top three layers.[34]

3 Experiments and Results

3.1 Dataset

In this project, the processed dataset consists of 36 patients with metastatic castration resistant prostate cancer with a high number of bone lesions requiring follow-up tracking. The patients were treated with [177Lu]Lu-PSMA-I&T (imaging and therapy)[35] and a total number of 188 cycles (median 5 cycles, range 1-31). Eligibility criteria were previous treatment with abiraterone or enzalutamide, previous taxane-based chemotherapy or chemoineligibility, and positive PSMA-ligand uptake at PET scan. The [177Lu]Lu-PSMA-I&T was given 6-8 weekly with an activity of 7.4 GBq in up to six cycles in a row with a potential continuation after a therapy break. Within the selected patient group, prostate-specific antigen decline of >50%absentpercent50>50\%> 50 % was achieved in 15 patients (43 %), median PSA-progression-free survival was 5.3 months, and median overall survival (OS) was 14.2 months. All patients gave written consent for evaluation of their data.

For each patient a full-body baseline and a follow-up [68Ga]Ga-PSMA-11[10] or [18F]F-rhPSMA-7[36][37] PET/CT scan after two cycles of [177Lu]Lu-PSMA-I&T[35] therapy are analyzed, adding up to 2111 baseline lesions and 2658 separately segmented foci in the follow-up scans. The baseline and follow-up lesions show 1490 true lesion assignments.

The PET/CT tracers were administered in compliance with the German Medicinal Products Act, AMG x13(2b), and in accordance with the responsible regulatory body (Government of Oberbayern). They were synthesized and all scans performed at our institution as decribed in Eiber et al.[38] and Kroenke et al. [39]. For the image acquisition the scanners Siemens Biograph mCT and Siemens Biograph Vision were used. After image reconstruction, the transaxial pixel size was 4.07 mm for PET and 1.52 mm for CT, each with a 5 mm or 3mm slice thickness.

Ground truth lesion pairs were manually assigned by an experienced nuclear medicine physician using an in-house software. The dataset was splitted into 70% training set, 15% validation set and 15% test set. The distribution was applied in a way that ensured this ratio for the patients as well as for the lesions. In addition, data augmentation in form of patch rotation was implemented to further increase the size of the training dataset.

3.2 Experimental Setup

Lesions may evolve along different patterns, which have to be recognized by the pipeline. A lesion can be remaining in both scans, split into several lesions or fuse with neighboring lesions. Also, a baseline lesion can resolve completely or a new lesion can develop in the follow-up.

The Siamese CNN is trained with balanced ground truth data containing equal numbers of true patch pairs and false patch pairs. For the application of the network on a test set, all possibly relevant lesion combinations are fed to the CNN in the shape of patch pairs. The predictions are used to assign the lesions. If all patch pairs of a baseline lesion are classified as false patch pairs, this lesion can be declared as resolved and no follow-up lesions are assigned. The same situation for a lesion from the follow-up scan suggests that it is a new lesion.

As described above (sec. 2.3 and 2.4), experiments have been performed with a 2D network and two 3D network variations. Respectively, all extracted patches exist in a 2D and two 3D versions.

Since whole-body imaging data is available from CT, PET and binary lesion segmentation, all of it can serve as an input for the patch extraction. Therefore, we processed four different patch types in the network, single-channel CT patches and several two-channel patches that combined CT & PET, CT & Binary Lesion Segmentation and CT & Segmented CT data. As Segmented CT we define the element-wise product of the Binary Lesion Segmentation patch with the CT patch, showing the CT information only within the segmented lesion.

All runs have been carried out several times with different seeds to show statistical stability.

3.3 Reference Model

To validate the learning ability of the used network, the performance of the trained CNN is compared to that of a non-trainable model, which only compares the intensities of two 2D CT patches by subtracting them and taking the pixel-wise absolute value of the difference. This is equal to directly applying the 𝐋𝟏subscript𝐋1\mathbf{L_{1}}bold_L start_POSTSUBSCRIPT bold_1 end_POSTSUBSCRIPT-layer (eq. 4) without any CNN branches. The reference model is incapable of learning to compare anatomical structures. The decision output y𝑦yitalic_y of the intensity model with the input patches 𝐗𝟏subscript𝐗1\mathbf{X_{1}}bold_X start_POSTSUBSCRIPT bold_1 end_POSTSUBSCRIPT and 𝐗𝟐subscript𝐗2\mathbf{X_{2}}bold_X start_POSTSUBSCRIPT bold_2 end_POSTSUBSCRIPT of shape (N,N)𝑁𝑁(N,N)( italic_N , italic_N ) can be described as:

y=11N2i,j=1N𝐋𝟏i,j(𝐗𝟏,𝐗𝟐)𝑦11superscript𝑁2superscriptsubscript𝑖𝑗1𝑁subscriptsubscript𝐋1𝑖𝑗subscript𝐗1subscript𝐗2y=1-\frac{1}{N^{2}}\sum_{i,j=1}^{N}\mathbf{L_{1}}_{i,j}(\mathbf{X_{1}},\mathbf% {X_{2}})italic_y = 1 - divide start_ARG 1 end_ARG start_ARG italic_N start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG ∑ start_POSTSUBSCRIPT italic_i , italic_j = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT bold_L start_POSTSUBSCRIPT bold_1 end_POSTSUBSCRIPT start_POSTSUBSCRIPT italic_i , italic_j end_POSTSUBSCRIPT ( bold_X start_POSTSUBSCRIPT bold_1 end_POSTSUBSCRIPT , bold_X start_POSTSUBSCRIPT bold_2 end_POSTSUBSCRIPT ) (5)

3.4 Siamese Network Performance

Every training was carried out for 400 epochs. We chose the lowest validation loss as criteria to select the epoch with the best weights. During testing, as default, a threshold of 0.5 is applied for the classification problem with the two classes true patch pair and false patch pair. For each combination of 2D or 3D Siamese CNN and the different patch types, table 2 shows the performance result on the validation set during training and on the test set.

Table 2: Performance of the Siamese network
Dim. Patch Size Patch Type Validation Acc Validation Loss Test Accuracy
2D 50×\times×50 CT 0.790±0.015plus-or-minus0.7900.0150.790\pm 0.0150.790 ± 0.015 0.508±0.030plus-or-minus0.5080.0300.508\pm 0.0300.508 ± 0.030 0.830±0.003plus-or-minus0.8300.0030.830\pm 0.0030.830 ± 0.003
2D 50×\times×50 CT/PET 0.800±0.008plus-or-minus0.8000.0080.800\pm 0.0080.800 ± 0.008 0.521±0.016plus-or-minus0.5210.0160.521\pm 0.0160.521 ± 0.016 0.806±0.012plus-or-minus0.8060.0120.806\pm 0.0120.806 ± 0.012
2D 50×\times×50 CT/Seg 0.781±0.010plus-or-minus0.7810.0100.781\pm 0.0100.781 ± 0.010 0.530±0.011plus-or-minus0.5300.0110.530\pm 0.0110.530 ± 0.011 0.795±0.001plus-or-minus0.7950.0010.795\pm 0.0010.795 ± 0.001
2D 50×\times×50 CT/Seg CT 0.795±0.011plus-or-minus0.7950.0110.795\pm 0.0110.795 ± 0.011 0.515±0.013plus-or-minus0.5150.0130.515\pm 0.0130.515 ± 0.013 0.788±0.011plus-or-minus0.7880.0110.788\pm 0.0110.788 ± 0.011
3D 50×\times×50×\times×5 CT 0.804±0.008plus-or-minus0.8040.0080.804\pm 0.0080.804 ± 0.008 0.486±0.008plus-or-minus0.4860.0080.486\pm 0.0080.486 ± 0.008 0.820±0.008plus-or-minus0.8200.0080.820\pm 0.0080.820 ± 0.008
3D 50×\times×50×\times×5 CT/PET 0.802±0.002plus-or-minus0.8020.0020.802\pm 0.0020.802 ± 0.002 0.499±0.007plus-or-minus0.4990.0070.499\pm 0.0070.499 ± 0.007 0.786±0.019plus-or-minus0.7860.0190.786\pm 0.0190.786 ± 0.019
3D 50×\times×50×\times×5 CT/Seg 0.819±0.005plus-or-minus0.8190.0050.819\pm 0.0050.819 ± 0.005 0.472±0.010plus-or-minus0.4720.0100.472\pm 0.0100.472 ± 0.010 0.790±0.022plus-or-minus0.7900.0220.790\pm 0.0220.790 ± 0.022
3D 50×\times×50×\times×5 CT/Seg CT 0.821±0.011plus-or-minus0.8210.0110.821\pm 0.0110.821 ± 0.011 0.475±0.023plus-or-minus0.4750.0230.475\pm 0.0230.475 ± 0.023 0.803±0.012plus-or-minus0.8030.0120.803\pm 0.0120.803 ± 0.012
3D 50×\times×50×\times×11 CT 0.829±0.013plus-or-minus0.8290.0130.829\pm 0.0130.829 ± 0.013 0.441±0.016plus-or-minus0.4410.0160.441\pm 0.0160.441 ± 0.016 0.786±0.007plus-or-minus0.7860.0070.786\pm 0.0070.786 ± 0.007
3D 50×\times×50×\times×11 CT/PET 0.817±0.004plus-or-minus0.8170.0040.817\pm 0.0040.817 ± 0.004 0.485±0.006plus-or-minus0.4850.0060.485\pm 0.0060.485 ± 0.006 0.761±0.017plus-or-minus0.7610.0170.761\pm 0.0170.761 ± 0.017
3D 50×\times×50×\times×11 CT/Seg 0.830±0.004plus-or-minus0.8300.0040.830\pm 0.0040.830 ± 0.004 0.463±0.007plus-or-minus0.4630.0070.463\pm 0.0070.463 ± 0.007 0.791±0.017plus-or-minus0.7910.0170.791\pm 0.0170.791 ± 0.017
3D 50×\times×50×\times×11 CT/Seg CT 0.836±0.007plus-or-minus0.8360.0070.836\pm 0.0070.836 ± 0.007 0.465±0.016plus-or-minus0.4650.0160.465\pm 0.0160.465 ± 0.016 0.790±0.018plus-or-minus0.7900.0180.790\pm 0.0180.790 ± 0.018
\botrule
00footnotetext: Seg: Binary Lesion Segmentation; Seg CT: Segmented CT

On the test set, the 2D Siamese CNN outperforms for most of the patch types. Within the same network type, training with the single-channel CT patches reaches the best test accuracy, especially for the 2D network (83 %) and the 3D Siamese CNN with 50×\times×50×\times×5 patches (82 %). Considering the standard deviation, the CT patches also work amongst the best in the 3D network with 50×\times×50×\times×11 patches.

For the 3D networks training, the two-channel CT/Segmented CT patches rank second and yield good test accuracy values with e.g. 80% for 50×\times×50×\times×5 patches. Furthermore, the direct comparison between the two 3D Siamese networks shows better test results for the 50×\times×50×\times×5 architecture.

Since the accuracy depends on the applied threshold of the network, we analyzed the results with ROC curves and the respective AUC. The ROC curves of the differently trained networks are shown in figure 6 and confirm the described results. As a reference model, all graphs show the ROC of the non-trainable intensities model (sec. 3.3). All trained networks outperform the intensity model proving a successful training process. Clearly, the 2D Siamese network trained with single-channel CT patches reaches the best result with an AUC=0.91.

Refer to caption
(a) 2D Siamese CNN with 50×\times×50 Patches
Refer to caption
(b) 3D Siamese CNN with 50×\times×50×\times×5 Patches
Refer to caption
(c) 3D Siamese CNN with 50×\times×50×\times×11 Patches
Figure 6: ROC curves of the siamese networks trained with different patch types: The AUC values for each curve are given in the legend. The best network configuration is reached with a 2D Siamese network trained with single-channel CT patches (AUC=0.91). The non-trainable reference intensity model (sec. 3.3) reaches an AUC=0.82, which is outperformed by all networks and patch types.

To obtain the best model indicated by the upper left of the ROC, we performed threshold optimization by maximizing Gmeansubscript𝐺meanG_{\text{mean}}italic_G start_POSTSUBSCRIPT mean end_POSTSUBSCRIPT. Table 3 displays the statistical parameters before and after threshold optimization. The final model reaches an 83% accuracy, a 88% sensitivity, a 80% specificity and a 76% precision. This results in a Gmeansubscript𝐺meanG_{\text{mean}}italic_G start_POSTSUBSCRIPT mean end_POSTSUBSCRIPT of 0.84 and a F1subscript𝐹1F_{1}italic_F start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT-score of 0.82. The resulting confusion matrix with a total number of 524 decision cases in the test set is shown in figure 7.

Table 3: Statistical parameters for the performance of the 2D Siamese network trained with 50×\times×50 CT patches
Threshold Accuracy Sensitivity Specificity Precision
Default 0.5 0.830±0.003plus-or-minus0.8300.0030.830\pm 0.0030.830 ± 0.003 0.731±0.039plus-or-minus0.7310.0390.731\pm 0.0390.731 ± 0.039 0.910±0.032plus-or-minus0.9100.0320.910\pm 0.0320.910 ± 0.032 0.850±0.037plus-or-minus0.8500.0370.850\pm 0.0370.850 ± 0.037
Gmeansubscript𝐺meanG_{\text{mean}}italic_G start_POSTSUBSCRIPT mean end_POSTSUBSCRIPT optimized 0.833±0.006plus-or-minus0.8330.0060.833\pm 0.0060.833 ± 0.006 0.881±0.021plus-or-minus0.8810.0210.881\pm 0.0210.881 ± 0.021 0.800±0.016plus-or-minus0.8000.0160.800\pm 0.0160.800 ± 0.016 0.755±0.012plus-or-minus0.7550.0120.755\pm 0.0120.755 ± 0.012
\botrule
Refer to caption
Figure 7: Confusion matrix for lesion tracking: The figure illustrates the number of true positives (upper left), false positives(upper right), false negatives (lower left) and true negatives (lower right) among a cumulative total of 524 decisions by the 2D siamese network trained with 50×\times×50 CT patches. Threshold optimization was performed by Gmeansubscript𝐺meanG_{\text{mean}}italic_G start_POSTSUBSCRIPT mean end_POSTSUBSCRIPT maximization.

As a last step of the pipeline, the results of the network decisions are used to determine if baseline lesions remain, resolve or newly appear in the follow-up. The pipeline performance on the detection of the different lesion development cases are shown in figure 8. Remaining lesions are tracked successfully in 89.3% of the cases. Amongst them, baseline lesions persisting as single lesions in the follow-up are recognized with a 93.2% success rate, whereas splitting cases are recognized in 67.6% and fused lesions in 44.4% of the cases. Resolved baseline lesions are correctly identified in 40.2% and new follow-up lesions in 38.6% of the instances.

An application example of Siamese CNN lesion tracking in two consecutive scans of a patient is shown in figure 9, where the CNN in one lesion tracking case even outperforms the nuclear medicine specialist (fig. 9 C).

Refer to caption
Figure 8: Pipeline performance on different lesion development cases: Remaining lesion cases, with the subcategories of lesions remaining as single lesions, split lesion cases and fused lesion cases, and finally the cases of resolved (disappeared) lesions and newly developed lesions.
Refer to caption
Figure 9: Application of CNN lesion tracking on PSMA-PET/CT scans of a prostate cancer patient: Comparison of the nuclear medicine specialist’s ground truth lesion assignment (left column) to the result of CNN lesion tracking (right column), with each the baseline scan on the left side of the dotted line and the follow-up on the right side. Baseline and follow-up lesions with the same color have been assigned as corresponding lesions, fused lesions are marked yellow, resolved lesions green and new lesions red. (A) 3D overview of the lesion distribution and the tracking result: Exemplary, the coronal, sagittal and axial view display planes of the lesion indicated with the white cross. All shown lesions except one (B) have been correctly tracked by the CNN. (B) Resolved lesion closely located to a remaining lesion: The CNN has wrongly classified the resolved lesion as a fusion with the remaining lesion due to its close anatomic spatial relation. Often, these cases even show a high inter-observer variability amongst nuclear medicine specialists. (C) CNN outperforms nuclear medicine specialist: The ground truth set shows a wrongly annotated split lesion. However, the CNN suggested the second follow-up lesion to be a new lesion. After reviewing the PET dataset, the CNN’s decision has been proven correct.

4 Discussion

In our project, for the first time, an AI-based pipeline proved its ability to successfully and automatically track bone lesions in PSMA-PET/CT scans by using the two imaging modalities. The high sensitivity and specificity of the PET enables clear lesion segmentation and positioning, whereas the high anatomic contrast of the CT facilitates accurate position tracking. In contrast to mere CT imaging, where lesions have to be located in the follow-up without knowing their shape in advance, in multimodal PET/CT, the follow-up lesion is already presegmented by the PET and only has to be assigned with the help of its CT position. This is a main advantage for the analysis of the high number of neighboring lesions in the PSMA-PET/CT scans. Our technique models the way of a physician relating baseline and follow-up.

This assumption has been proved in our experiments. The analysis revealed that PET data and its binary lesion segmentation are crucial for localization during patch extraction. However, the PET and segmentation information within patches are not consistent enough for lesion tracking by comparing these patches. This is supported by the fact that CT patches in our pipeline outperformed all other patch types with two channels, including the PET, binary lesion segmentation data, or the segmented CT, which also depends on the PET lesion segmentation. While the anatomical environment of the lesions represented by the CT data is roughly maintained during therapy, the shape of the PET signal at its segmentation can change strongly. This data in a second channel does not provide added information for the Siamese CNN but might even mislead it.

Even with single-channel CT patches for the Siamese network, the overall procedure is a PET/CT lesion tracking approach. Our project differs from other mere CT lesion tracking approaches as the lesions are selected and cropped to patches on the basis of the PET scan instead of the CT scan. In our approach, the position of the CT patches contains the information given by the PET data.

Given the lesion tracking task as a three-dimensional problem, the result of our experiments with the best performance achieved by a 2D Siamese CNN instead of a 3D network seems surprising. However, the 3D positional relation of the lesions is already processed by the case distinction algorithm during patch extraction (sec. 2.3) passing the information on in form of extracted patches and their lesion IDs. The 3D patches contain more additional information about the environment, which is partly overlapping with the information handled by the case distinction algorithm. Even though the larger patches processed by 3D networks with far more parameters partly reaches similar results as the 2D network, the larger input does not represent an added value to the pipeline. In our CNN approach, the additional information might even lead to reduced performance. The fact that the 3D network trained with 50×\times×50×\times×5 patches achieves better results than the one processing 50×\times×50×\times×11 inputs might have the same origin. Processing 2D lesion patches by a 2D Siamese network seems to be the more efficient and effective way for our PET/CT lesion tracking approach.

As a further result of our experiments, we found the added value of the 50×\times×50×\times×5 CT/Segmented CT patches amongst the two-channel patches for the 3D networks. The additional segmented CT information helps positioning the lesion within the patch, which is more relevant in the larger 3D patches and less in 2D patches.

Comparing the result of our AI PET/CT lesion tracking with an accuracy of 83.3% to other CNN approaches for lesion tracking in CT scans, we reach a similar performance level, even tough in our cases have the additional challenge of a high number of closely neighboring bone lesions that have to be distinguished. For a lower number of neighboring lesions, Hering et al. [22] achieve 80% test accuracy during whole-body soft-tissue lesion tracking and Rafael-Palou et al. [25] re-identify 88.8% of pulmonary nodules in follow-up scans. Within the group of remaining lesions, we even reach re-identification rate of 89.3%.

Decoupling the segmentation and tracking made it possible to recognize resolved, new, remaining, fused and split lesions. Especially split lesions can cause problems in a combined approach, which runs the tracking based on follow-up lesion segmentation.[22] In our pipeline, the position of the lesion contained in the segmentation is processed by the patch extraction case distinction. By being fed possible patch pairs, the Siamese network only classifies between true and false patch pairs and decides independently of the fact that the same baseline lesion might already have been assigned to other follow-up lesions. With this, several assignments or no assignment of a lesion are possible, enabling the lesion tracking to identify resolved, new, split, and fused lesions.

The fact that we used a heterogeneous dataset for the training and testing of our pipeline, including images acquired by different scanners and with even different slice thicknesses in z-direction shows the robustness of the approach. It is capable of handling clinical data. As our developed principle of AI lesion tracking is independent of bone lesions in PSMA-PET/CT scans, it can also be applied on other types of PET/CT scans.

Those were only an application example.

Our findings are encouraging for the use in multimodal imaging. Nevertheless, the following limitations should be noted. In our approach, inaccuracies in the segmentation by the qPSMA software[27] are forwarded to the tracking result. Especially, in cases of disease progress and growing lesions, the segmentation software wrongfully draws large, fused follow-up lesions spreading in several bones instead of segmenting several uptake foci separately. In this case, the completely separated lesion segmentation and tracking causes errors. This is still a challenge for our pipeline and lowers the detection rate of fused lesion cases. Due to the segmentation method [27] being specialized to the segmentation of bone lesions in PSMA-PET/CT scans, we chose those as an application example. However, a new and more pipeline-optimized segmentation approach is currently in development. In addition, our test set contained a high number of resolved baseline and new follow-up lesions right beside remaining lesion pairs. Even tough our pipeline handles most of the neighboring lesions well, this leads to a lower success rate for resolved and new lesion detection, which is to be improved in future works. A possibility could be to integrate a transformer network structure into our pipeline, as proposed by Tang et al.[40]. The network might increase the anatomical precision for the described cases. In this context, it has be to noted that for the closely located lesions human readers reach their limits, too, concerning the differentiation of split and new, or resolved and fused lesions. For this reason, the gold standard is far from being perfect. This is demonstrated by figure 9C, where a CNN is outperforming a nuclear physician’s decision, proving the ground truth dataset wrong in this case.

5 Conclusion

In this paper, we successfully fill the gap of AI lesion tracking in PET/CT scans, using a pipeline of segmentation, registration, a case distinction algorithm, and a Siamese Convolutional Neural Network. The AI approach is capable of handling the high amount of lesions detected by the new technology of PSMA-PET/CT.

Different training configurations of 2D and 3D network approaches were examined, achieving an overall tracking accuracy of 83% with an AUC=0.91 and a re-identification rate for remaining lesions of 89%. This result is comparable to tracking approaches in other imaging modalities, even though PSMA-PET/CT scans show a relatively high number of closely located lesions.

The advantage of our project is the fully automated approach. Instead of having the user select single lesions to track, our pipeline automatically tracks all existing lesions in the scans, making it applicable for patients with a high number of lesions. With this, our approach enables a new way to process higher amounts of data for tumor response assessment. The presented approach is not limited to PSMA-PET/CT scans but is applicable for every oncological PET/CT with any other radiopharmaceutical. It could revolutionize the speed and depth of PET/CT analysis. Future clinical studies are planned to evaluate its impact.

M.S. is currently employed by Siemens Healthineers but was not affiliated with the company during his contributions to the research phase of this manuscript.

References

  • [1] Siegel RL, Miller KD, Wagle NS, Jemal A. Cancer statistics, 2023. CA Cancer J Clin. 2023;73:17–48.
  • [2] Weber WA, Figlin R. Monitoring cancer treatment with PET/CT: Does it make a difference?. J Nucl Med. 2007;48:36–45.
  • [3] Kasamon YL, Jones RJ, Wahl RL. Integrating PET and PET/CT into the risk-adapted therapy of lymphoma. J Nucl Med. 2007;48:19–27.
  • [4] Van den Abbeele AD. The lessons of GIST—PET and PET/CT: A new paradigm for imaging. Oncologist. 2008;13:8–13.
  • [5] Roedl JB, Halpern EF, Colen RR, Sahani DV, Fischman AJ, Blake MA. Metabolic tumor width parameters as determined on PET/CT predict disease-free survival and treatment response in squamous cell carcinoma of the esophagus. Mol Imaging Biol. 2009;11:54–60.
  • [6] Rauscher I, Maurer T, Fendler WP, Sommer WH, Schwaiger M, Eiber M. 68Ga-PSMA ligand PET/CT in patients with prostate cancer: How we review and report. Cancer Imaging. 2016;16:1–10.
  • [7] Quante AS, Ming C, Rottmann M, et al. Projections of cancer incidence and cancer-related deaths in Germany by 2020 and 2030. Cancer Med. 2016;5:2649–2656.
  • [8] Weber WA, Barthel H, Bengel F, Eiber M, Herrmann K, Schäfers M. What is theranostics?. J Nucl Med. 2023;64:669–670.
  • [9] Eiber M, Herrmann K, Calais J, et al. Prostate cancer molecular imaging standardized evaluation (PROMISE): Proposed miTNM classification for the interpretation of PSMA-ligand PET/CT. J Nucl Med. 2018;59:469–478.
  • [10] Eder M, Schäfer M, Bauder-Wüst U, et al. 68Ga-complex lipophilicity and the targeting property of a urea-based PSMA inhibitor for PET imaging. Bioconjug Chem. 2012;23:688–697.
  • [11] Hennrich U, Eder M. [68Ga]Ga-PSMA-11: The first FDA-approved 68 Ga-radiopharmaceutical for PET imaging of prostate cancer. Pharmaceuticals. 2021;14:1–12.
  • [12] Hennrich U, Eder M. [177Lu]Lu-PSMA-617 (PluvictoTM): The first FDA-approved radiotherapeutical for treatment of prostate cancer. Pharmaceuticals. 2022;15:1–13.
  • [13] Pluvicto. European Medicines Agency website. https://www.ema.europa.eu/en/medicines/human/EPAR/pluvicto. Accessed on April 15, 2024.
  • [14] Carlin BI, Andriole GL. The natural history, skeletal complications, and management of bone metastases in patients with prostate carcinoma. Cancer. 2000;88:2989–2994.
  • [15] Wahl RL, Jacene H, Kasamon Y, Lodge MA. From RECIST to PERCIST: Evolving considerations for PET response criteria in solid tumors. J Nucl Med. 2009;50:122–150.
  • [16] Opfer R, Brenner W, Carlsen I, Renisch S, Sabczynski J, Wiemker R. Automatic lesion tracking for a PET/CT based computer aided cancer therapy monitoring system. In: Proc Medical Imaging 2008: Computer-Aided Diagnosis. SPIE; 2008;6915:691513.
  • [17] Fox JJ, Autran-Blanc E, Morris MJ, et al. Practical Approach for Comparative Analysis of Multilesion Molecular Imaging Using a Semiautomated Program for PET/CT. J Nucl Med. 2011;52:1727–1732.
  • [18] Seifert R, Herrmann K, Kleesiek J, et al. Semiautomatically quantified tumor volume using 68Ga-PSMA-11 PET as a biomarker for survival in patients with advanced prostate cancer. J Nucl Med. 2020;61:1786–1792.
  • [19] Sibille L, Seifert R, Avramovic N, et al. (18)F-FDG PET/CT uptake classification in lymphoma and lung cancer by using deep convolutional neural networks. Radiology. 2020;294:445–452.
  • [20] Tan M, Li Z, Qiu Y, et al. A new approach to evaluate drug treatment response of ovarian cancer patients based on deformable image registration. IEEE Trans Med Imag. 2016;35:316–325.
  • [21] Dankerl P, Cavallaro A, Dietzel M, et al. Clinical evaluation of semi-automatic landmark-based lesion tracking software for CT-scans. Cancer Imaging. 2014;14:1–7.
  • [22] Hering A, Peisen F, Amaral T, et al. Whole-body soft-tissue lesion tracking and segmentation in longitudinal CT imaging studies. In: Proc of Machine Learning Research. ICLR; 2021;Vol. 143:312–326.
  • [23] Gomariz A, Li W, Ozkan E, Tanner C, Goksel O. Siamese networks with location prior for landmark tracking in liver ultrasound sequences. In: IEEE 16th International Symposium on Biomedical Imaging. IEEE; 2019:1757–1760.
  • [24] Liu F, Liu D, Tian J, Xie X, Yang X, Wang K. Cascaded one-shot deformable convolutional neural networks: Developing a deep learning model for respiratory motion estimation in ultrasound sequences. Med Image Anal. 2020;65.
  • [25] Rafael-Palou X, Aubanell A, Bonavita I, et al. Re-identification and growth detection of pulmonary nodules without image registration using 3D siamese neural networks. Med Image Anal. 2021;67.
  • [26] Cai J, Tang Y, Yan K, et al. Deep lesion tracker: Monitoring lesions in 4D longitudinal imaging studies. In: Proc IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE; 2021:15154–15164.
  • [27] Gafita A, Bieth M, Krönke M, et al. qPSMA: Semiautomatic software for whole-body tumor burden assessment in prostate cancer using (68)Ga-PSMA11 PET/CT.. J. Nucl. Med.. 2019;60:1277–1283.
  • [28] Marstal K, Berendsen F, Staring M, Klein S. SimpleElastix: A user-friendly, multi-lingual library for medical image registration. In: Proc IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. IEEE; 2016:574–582.
  • [29] Koch G, Zemel R, Salakhutdinov R. Siamese neural networks for one-shot image recognition. In: ICML Deep Learning Workshop. ICLM; 2015;Vol. 2.
  • [30] Kingma DP, Ba JL. Adam: A method for stochastic optimization. In: 3rd International Conference on Learning Representations. ICLR; 2015.
  • [31] Senior A, Heigold G, Yang K, Inc G. An empirical study of learning rates in deep neural networks for speech recognition. In: Proc IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE; 2013:6724–6728.
  • [32] Zagoruyko S, Komodakis N. Learning to compare image patches via convolutional neural networks. In: Proc IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE; 2015:4353–4361.
  • [33] Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Comm ACM. 2017;60:84–90.
  • [34] Szepesi P, Szilágyi L. Detection of pneumonia using convolutional neural networks and deep learning. Biocybern Biomed Eng. 2022;42:1012–1022.
  • [35] Karimzadeh A, Heck M, Tauber R, et al. 177Lu-PSMA-I&T for treatment of metastatic castration-resistant prostate cancer: Prognostic value of scintigraphic and clinical biomarkers. J Nucl Med. 2023;64:402–409.
  • [36] Wurzer A, Carlo DD, Schmidt A, et al. Radiohybrid ligands: A novel tracer concept exemplified by 18F- or 68Ga-labeled rhPSMA inhibitors. J Nucl Med. 2020;61:735–742.
  • [37] Wurzer A, Parzinger M, Konrad M, et al. Preclinical comparison of four [18F, natGa]rhPSMA-7 isomers: influence of the stereoconfiguration on pharmacokinetics. EJNMMI Research. 2020;10.
  • [38] Eiber M, Maurer T, Souvatzoglou M, et al. Evaluation of hybrid 68Ga-PSMA ligand PET/CT in 248 patients with biochemical recurrence after radical prostatectomy. J Nucl Med. 2015;56:668–674.
  • [39] Kroenke M, Schweiger L, Horn T, et al. Validation of 18F-rhPSMA-7 and 18F-rhPSMA-7.3 PET imaging results with histopathology from salvage surgery in patients with biochemical recurrence of prostate cancer. J Nucl Med. 2022;63:1809–1814.
  • [40] Tang W, Kang H, Zhang H, Yu P, Arnold CW, Zhang R. Transformer Lesion Tracker. In: Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. Springer; 2022;LNCS Vol. 13436:196–206.