
TauAD: MRI-free Tau Anomaly Detection in PET Imaging via Conditioned Diffusion Models

Lujia Zhong1,3, Shuo Huang1,2, Jiaxin Yue1,3,
Jianwei Zhang1,3, Zhiwei Deng1,3, Wenhao Chi1,3, Yonggang Shi1,2,3
1 Stevens Neuroimaging and Informatics Institute, Keck School of Medicine,
University of Southern California
2 Alfred E. Mann Department of Biomedical Engineering, Viterbi School of Engineering,
University of Southern California
3 Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering,
University of Southern California
Abstract

The emergence of tau PET imaging over the last decade has enabled Alzheimer’s disease (AD) researchers to examine tau pathology in vivo and more effectively characterize the disease trajectories of AD. Current tau PET analysis methods, however, typically perform inferences on large cortical ROIs and are limited in the detection of localized tau pathology that varies across subjects. Furthermore, conventional tau PET analysis requires a high-resolution MRI, which is not commonly acquired in clinical practice and may be unavailable for many elderly patients with dementia due to strong motion artifacts, claustrophobia, or certain metal implants. In this work, we propose a novel conditional diffusion model to perform MRI-free anomaly detection from tau PET imaging data. By including individualized conditions and two complementary loss maps from pseudo-healthy and pseudo-unhealthy reconstructions, our model computes an anomaly map across the entire brain area, on which a simple support vector machine (SVM) can be trained to classify disease severity. We train our model on ADNI subjects (n=534) and evaluate its performance on a separate dataset from the preclinical subjects of the A4 clinical trial (n=447). We demonstrate that our method outperforms baseline generative models and the conventional Z-score-based method in anomaly localization without mis-detecting off-target bindings in sub-cortical and out-of-brain areas. By classifying the A4 subjects according to their anomaly map using the SVM trained on ADNI data, we show that our method can successfully group preclinical subjects with significantly different cognitive functions, which further demonstrates the effectiveness of our method in capturing biologically relevant anomalies in tau PET imaging.

1 Introduction

There is an explosion of interest in tau pathology in AD studies because of its strong correlation with clinical symptoms [20] and the increasing availability of tau PET imaging [23]. Previous tau PET studies typically rely on canonical cortical parcellation to perform ROI-level inferences of tau pathology [23], and there is a lack of robust tools for the automated detection of localized tau pathology that varies across subjects. In addition, the reliance on high-resolution MRI in popular tau PET analysis tools poses another limitation in clinical practice [7]. First, elderly patients with dementia are much more prone to movement artifacts than healthy volunteers during the MRI scanning process, which often results in low quality images unusable for downstream analysis [29]. Second, more importantly, high-resolution MRI scans are typically not collected in clinical settings where only scans with thick slices are acquired for radiology reads [12]. Third, elderly patients with claustrophobia or certain metal implants that cannot go through the MRI scanning process will be excluded from studies if high-resolution MRIs are required. To advance the state-of-the-art in tau PET analysis, we propose a novel diffusion model-based pipeline for the automated and MRI-free detection of tau pathology from PET imaging data in this work.

For the localized detection of anomalies in tau PET imaging data, Z-score-based normative analysis can be applied in a common space [1], but the varied cortical folding and resulting misalignment compromise the accurate detection of tau pathology. Recent works utilizing VAEs [5, 9], flow-based models [33], and GANs [8, 24, 2, 34] hold the promise of more accurate individualized reconstruction and anomaly localization. Denoising diffusion probabilistic models (DDPM) [11] have recently been applied to medical anomaly detection because of their superior generation ability and controllability [32, 14, 3]. The application of these methods to tau pathology detection, however, is limited by suboptimal utilization of data information and reconstruction of pseudo-healthy data.

In this work, we propose a conditioned diffusion model that extends the DDPM model with individualized conditional information for localized tau anomaly detection in PET imaging. Our method also has the desired MRI-free property [17] for tau PET analysis, which not only allows the inclusion of tau PET data from subjects without MRI scans, but can also simplify the workflow in clinical practice where high-resolution MRI is not typically acquired. Different from previous generative methods for anomaly detection, our proposed method reconstructs both pseudo-healthy and pseudo-unhealthy data to compute two loss maps that provide complementary information for the improved detection of anomalies due to tau pathology. Another desired property of our model is that it can automatically differentiate real anomalies in tau PET images from common off-target bindings. We train our conditioned diffusion models on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) data (n=534) [19] and apply the trained model to preclinical subjects from the Anti-Amyloid Treatment in Asymptomatic Alzheimer’s Disease study (A4) (n=447) [26]. Compared with baseline models of GAN, VAE, DDPM, DDIM [32] and the conventional Z-score-based method, the results demonstrate the superiority of our method in anomaly localization without false positive detection of off-target bindings. By applying an SVM trained on the anomaly maps of ADNI subjects to the A4 subjects with no clinical AD symptoms, we show that the tau anomaly detected by our method can successfully group preclinical subjects with significantly different cognitive functions.

2 Preliminary

Diffusion models [25, 11] have recently gained increasing attention due to their ability to model complex data distributions and provide better controllability during the stochastic sampling process. A diffusion model starts by adding random noise to a given data point $\mathbf{x}_0$ to transform it into a simple distribution, such as a standard Gaussian distribution $\mathcal{N}(0, I)$, and then gradually transforms it back into the more complex distribution $q(\mathbf{x}_0)$ where the original data is located. It comprises a predefined forward process and a learned backward process.

2.1 Forward process

The forward diffusion process is designed to gradually corrupt the data by adding noise over a series of time steps. Specifically, given a data point $\mathbf{x}_0$ from the data distribution $q(\mathbf{x}_0)$, the forward diffusion process produces a sequence of noisy data points $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_T$, where $\mathbf{x}_T$ is closest to a pure standard Gaussian distribution. At each time step $t$, noise is added according to a predefined variance schedule $\beta_t$:

$$q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}\left(\mathbf{x}_t; \sqrt{1-\beta_t}\,\mathbf{x}_{t-1}, \beta_t \mathbf{I}\right) \tag{1}$$

where $\mathcal{N}$ denotes a Gaussian distribution, and $\beta_t$ controls the amount of noise added at each step. This noising process can be simplified as a single step directly from $\mathbf{x}_0$ to $\mathbf{x}_t$. The marginal distribution of $\mathbf{x}_t$ given the initial data point $\mathbf{x}_0$ can be expressed as:

$$q(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}\left(\mathbf{x}_t; \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0, (1-\bar{\alpha}_t)\mathbf{I}\right) \tag{2}$$

where $\bar{\alpha}_t = \prod_{s=1}^{t}(1-\beta_s)$.
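As a minimal sketch of Eq. 2, the closed-form marginal lets one sample $\mathbf{x}_t$ directly from $\mathbf{x}_0$ as $\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}$. The linear schedule, the value T = 1000, and the function names below are illustrative assumptions, not settings reported in this paper.

```python
import numpy as np

def make_alpha_bar(betas):
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for t = 1..T."""
    return np.cumprod(1.0 - np.asarray(betas, dtype=np.float64))

def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x_0) in a single step (Eq. 2); t is 1-indexed."""
    eps = rng.standard_normal(x0.shape)
    a_t = alpha_bar[t - 1]
    x_t = np.sqrt(a_t) * x0 + np.sqrt(1.0 - a_t) * eps
    return x_t, eps

# Assumed schedule: linear betas over T = 1000 steps (illustrative values).
betas = np.linspace(1e-4, 2e-2, 1000)
alpha_bar = make_alpha_bar(betas)

rng = np.random.default_rng(0)
x0 = np.ones((4, 4))                      # toy stand-in for a PET slice
x_t, eps = q_sample(x0, t=500, alpha_bar=alpha_bar, rng=rng)
```

Because $\bar{\alpha}_t$ decreases monotonically with $t$, larger $t$ keeps less of the original signal in $\mathbf{x}_t$.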

2.2 Reverse process

The goal of the reverse diffusion process is to generate new data samples by reversing the forward process, involving transforming a sample from a simple distribution, usually a Gaussian noise distribution, back into a data point in the original data distribution.

The reverse process can be described as:

$$p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t) = \mathcal{N}\left(\mathbf{x}_{t-1}; \mu_\theta(\mathbf{x}_t, t), \Sigma_\theta(\mathbf{x}_t, t)\right) \tag{3}$$

where $\mu_\theta$ and $\Sigma_\theta$ are the mean and covariance parameters of the Gaussian distribution, typically modeled by a neural network with parameters $\theta$. The objective is to learn the parameters $\theta$ such that the reverse process approximates the true data distribution. This is typically done by maximizing the evidence lower bound (ELBO) [18] on the log-likelihood, which follows from Jensen’s inequality:

$$\log p_\theta(\mathbf{x}_0) = \log \mathbb{E}_{q(\mathbf{x}_{1:T} \mid \mathbf{x}_0)}\left[\frac{p_\theta(\mathbf{x}_{0:T})}{q(\mathbf{x}_{1:T} \mid \mathbf{x}_0)}\right] \geq \mathbb{E}_{q(\mathbf{x}_{1:T} \mid \mathbf{x}_0)}\left[\log \frac{p_\theta(\mathbf{x}_{0:T})}{q(\mathbf{x}_{1:T} \mid \mathbf{x}_0)}\right] \tag{4}$$

2.3 Training objective

The final training objective for diffusion models involves minimizing the KL divergence between the posterior forward and reverse processes. In practice, the objective function can be simplified with reparameterization techniques as:

$$L_{\text{simplified}} = \mathbb{E}_{t, \mathbf{x}_0, \boldsymbol{\epsilon}}\left[\left\|\boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\right\|_2^2\right] \tag{5}$$

where $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ is standard Gaussian noise and $\boldsymbol{\epsilon}_\theta$ is a neural network parameterized by $\theta$ that predicts the added noise.
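The objective in Eq. 5 can be estimated by sampling a timestep and a noise tensor per example, as in the sketch below. The noise schedule and the two stand-in predictors (`zero_model`, `oracle_model`) are hypothetical and only illustrate the loss; they are not the network used in this work.

```python
import numpy as np

def simplified_loss(eps_model, x0, alpha_bar, rng):
    """One Monte Carlo estimate of Eq. 5 for a batch x0 of shape (B, H, W)."""
    batch = x0.shape[0]
    t = rng.integers(1, len(alpha_bar) + 1, size=batch)    # t ~ Uniform{1..T}
    eps = rng.standard_normal(x0.shape)                     # eps ~ N(0, I)
    a_t = alpha_bar[t - 1].reshape(batch, 1, 1)
    x_t = np.sqrt(a_t) * x0 + np.sqrt(1.0 - a_t) * eps      # forward step, Eq. 2
    residual = eps - eps_model(x_t, t)
    # Squared L2 norm per example, averaged over the batch.
    return float(np.mean(np.sum(residual ** 2, axis=(1, 2))))

alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 2e-2, 1000))  # assumed schedule
x0 = np.random.default_rng(1).random((8, 16, 16))            # toy batch

def zero_model(x_t, t):
    """Stand-in predictor that always outputs zero noise."""
    return np.zeros_like(x_t)

def oracle_model(x_t, t):
    """Stand-in predictor that inverts Eq. 2 using the known x0."""
    a_t = alpha_bar[t - 1].reshape(-1, 1, 1)
    return (x_t - np.sqrt(a_t) * x0) / np.sqrt(1.0 - a_t)

loss_zero = simplified_loss(zero_model, x0, alpha_bar, np.random.default_rng(2))
loss_oracle = simplified_loss(oracle_model, x0, alpha_bar, np.random.default_rng(2))
```

A perfect noise predictor drives the loss to zero, while the zero predictor leaves a loss close to the number of pixels per example, since each noise entry has unit variance.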

3 Method

In this work, we train a conditioned diffusion model on data from both cognitively normal (CN) subjects and patients with mild cognitive impairment (MCI) / Alzheimer’s disease (AD). Our model is trained on 2D slices with additional constraints of edge, slice position, and class labels (0 for normal and 1 for anomalous data) for better anatomy alignment and more accurate anomaly localization with “MRI-free” tau PET data preprocessing.

Figure 1: Overview of our proposed method, including three phases. (a) Preprocessing of 3D data by selecting slice, calculating edge map, and adding noise; (b) training phase with conditions of edge, position (Pos.), class label (Cls.), and timestep (t); (c) inference stage of diffusion models with two different class labels.

The overall architecture of our proposed method is illustrated in Fig. 1 and consists of three phases: (a) we preprocess the 3D data by selecting a slice and calculating its edge map, then obtain a noisy version of the slice by adding Gaussian noise according to Eq. 2; (b) we train a diffusion model on the 2D slice conditioned on the edge map, slice position, and class label; (c) with the trained diffusion model, inference is applied to any given slice to predict its pseudo-healthy and pseudo-unhealthy versions with class labels of 0 and 1, respectively. The pseudo-unhealthy loss map is then obtained by subtracting the original slice from the pseudo-unhealthy slice, and the pseudo-healthy loss map by subtracting the pseudo-healthy slice from the original slice, in both cases without taking absolute values.

3.1 Conditions

Our method models the conditional distribution of 2D tau PET images given the class label, edge map, and an additional positional embedding, for better reconstruction and slice consistency within the 3D data. During training, each slice is assigned an integer denoting its absolute position within the 3D volume (0, 1, 2, …), and the edge map is calculated for each slice separately. Edge maps are concatenated with the noisy slices along the channel dimension to form the model inputs. To encode the slice positions, we map them to a higher dimension using sinusoidal positional embedding [28]. The class label serves as a lookup index and is fed into a learnable embedding layer.
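The three conditions can be assembled roughly as sketched below. The edge operator is not specified in the text, so a simple gradient magnitude is used here as an assumption; the embedding dimension of 128, the slice index 40, and the randomly initialized class table are likewise illustrative placeholders.

```python
import numpy as np

def sinusoidal_embedding(position, dim):
    """Sinusoidal embedding of an integer slice index; dim must be even."""
    i = np.arange(dim // 2)
    freqs = 1.0 / (10000.0 ** (2.0 * i / dim))
    emb = np.empty(dim)
    emb[0::2] = np.sin(position * freqs)
    emb[1::2] = np.cos(position * freqs)
    return emb

def edge_map(slice_2d):
    """Gradient-magnitude edge map (assumed operator, not from the paper)."""
    gy, gx = np.gradient(slice_2d.astype(np.float64))
    return np.hypot(gx, gy)

rng = np.random.default_rng(0)
noisy_slice = rng.standard_normal((160, 160))   # stand-in for x_t
clean_slice = rng.random((160, 160))            # stand-in for x_0

# The noisy slice and the edge map are stacked along a leading channel axis.
model_input = np.stack([noisy_slice, edge_map(clean_slice)], axis=0)

pos_emb = sinusoidal_embedding(position=40, dim=128)

# The class label indexes a lookup table; in a real model the table is learned.
class_table = rng.standard_normal((2, 128))
cls_emb = class_table[1]                        # 1 = anomalous, 0 = normal
```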

Figure 2: Impact of edge map and class labels on data reconstruction. (a) data with anomaly and (b) data without anomaly.

3.2 Anomaly localization

To reconstruct the pseudo-unhealthy and pseudo-healthy data, we add 500 steps of noise to the original data during inference, which keeps partial data information and allows us to enhance or suppress the anomaly conditioned on different class labels. In addition, the edge map contains the high-frequency information of the original data, including anomaly areas with salient intensity changes. In this context, as demonstrated in Fig. 2, the conditioned generation process is jointly controlled by class labels and edge maps. For data from unhealthy subjects (MCI or AD), the edge map characterizes anomaly information, which can be identified by diffusion models, leading to enhanced and suppressed anomalies in reconstructed data conditioned on class one and class zero, respectively. For data from healthy subjects, where both edge maps and input data lack any indication of anomaly, reconstructions conditioned on both class labels appear anomaly-free because, from the perspective of the diffusion model, there is no evident anomaly in the original data to enhance or suppress, leading to near-zero values in the resulting anomaly map. Therefore, we add the loss maps from both class labels to obtain a robust anomaly map of tau pathology.
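The combination of the two signed loss maps can be sketched as follows, assuming the two reconstructions are already available. The toy 3×3 values are made up for illustration and any post-processing the authors may apply is not shown.

```python
import numpy as np

def anomaly_map(original, pseudo_healthy, pseudo_unhealthy):
    """Sum of the two signed loss maps described in the text."""
    loss_unhealthy = pseudo_unhealthy - original   # enhancement under class 1
    loss_healthy = original - pseudo_healthy       # suppression under class 0
    return loss_healthy + loss_unhealthy

# Toy 3x3 slice with one elevated pixel (values are made up).
original = np.full((3, 3), 1.0)
original[1, 1] = 1.5
pseudo_healthy = np.full((3, 3), 1.0)
pseudo_healthy[1, 1] = 1.1          # anomaly suppressed
pseudo_unhealthy = np.full((3, 3), 1.0)
pseudo_unhealthy[1, 1] = 1.8        # anomaly enhanced

amap = anomaly_map(original, pseudo_healthy, pseudo_unhealthy)
```

With signed differences, the sum reduces algebraically to the pseudo-unhealthy reconstruction minus the pseudo-healthy one, so pixels where both reconstructions agree contribute zero.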

3.3 Anomaly detection and scoring

By concatenating the anomaly maps of all 2D slices in the tau PET scan of a subject, we can calculate a subject-level anomaly score by averaging the anomaly values within the brain area. Furthermore, we train an SVM classifier on the 3D anomaly map to classify the disease severity of each subject. The features for training the SVM classifier are the mean anomaly scores in a set of predefined ROIs, where all features are derived from the same training dataset and associated class labels used for training the conditioned diffusion models. The ROIs in each tau PET scan are obtained by registering each scan to a tau PET-based template space, where 112 anatomical ROIs are defined following the FreeSurfer protocol [6].
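A minimal sketch of the subject-level score and the ROI features is given below. The label volume, the two ROI ids, and the toy anomaly values are hypothetical; in practice there would be 112 ROIs, and the resulting feature vectors would be passed to an SVM implementation of choice.

```python
import numpy as np

def subject_score(anomaly_3d, brain_mask):
    """Mean anomaly value over all voxels inside the brain mask."""
    return float(anomaly_3d[brain_mask].mean())

def roi_features(anomaly_3d, roi_labels, roi_ids):
    """Mean anomaly value per ROI; each ROI is assumed to be non-empty."""
    return np.array([anomaly_3d[roi_labels == r].mean() for r in roi_ids])

# Toy data: four 2x2 slice-level anomaly maps stacked into a 3D map.
slices = [np.full((2, 2), v) for v in (0.0, 0.2, 0.4, 0.6)]
anomaly_3d = np.stack(slices, axis=0)

roi_labels = np.zeros((4, 2, 2), dtype=int)     # 0 = background
roi_labels[0:2] = 1                             # hypothetical ROI 1
roi_labels[2:4] = 2                             # hypothetical ROI 2
brain_mask = roi_labels > 0

score = subject_score(anomaly_3d, brain_mask)
features = roi_features(anomaly_3d, roi_labels, [1, 2])
```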

4 Experiments

4.1 Datasets

We demonstrate the efficacy of our method with datasets from the ADNI and A4 studies. We use the ADNI dataset exclusively for model training and validation and the A4 dataset for final evaluation.

All tau PET scans used in this work are acquired with the AV1451/Flortaucipir (18F) tracer. For “MRI-free” preprocessing, similar to [27], we first build a template by choosing a healthy ADNI subject (Subject ID: 002_S_0413) who has both tau PET and T1-weighted MRI scans. After processing this subject with FreeSurfer [6], we use the PET data in the subject’s MRI space as the template. All ADNI and A4 tau PET data were nonlinearly registered to this template with Elastix [15], followed by Gaussian smoothing and the calculation of the standardized uptake value ratio (SUVR), which divides the intensity at each voxel by the mean value within the inferior cerebellum. The total preprocessing time for a single subject is within 1 minute, allowing fast and robust preprocessing of tau PET imaging without the overhead of high-resolution MRI scanning, T1 MRI preprocessing, and cross-modality registration.
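The SUVR step amounts to a voxel-wise division by the mean uptake in the reference region. The sketch below assumes a boolean inferior cerebellum mask is already available in template space; registration and smoothing are not shown, and the toy values are made up.

```python
import numpy as np

def to_suvr(pet, reference_mask):
    """Divide every voxel by the mean intensity inside the reference region."""
    ref_mean = float(pet[reference_mask].mean())
    if ref_mean <= 0:
        raise ValueError("reference region mean must be positive")
    return pet / ref_mean

# Toy 2x2 "image"; the top row plays the role of the reference region.
pet = np.array([[2.0, 4.0], [6.0, 3.0]])
reference_mask = np.array([[True, True], [False, False]])
suvr = to_suvr(pet, reference_mask)
```

By construction, the mean SUVR inside the reference region is 1, so values elsewhere read as uptake relative to that region.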

ADNI subjects with tau PET scans were screened for inclusion in our study. We used a tau positivity criterion defined as follows: 95th percentile of tau SUVR > 1.4 [31]. In total, 380 tau-negative CN subjects, 111 tau-positive MCI, and 43 tau-positive AD subjects were included in our work to train the conditioned diffusion model and SVM classifier.
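The positivity rule can be written as a one-line check. The text does not state which voxels enter the percentile, so the sketch assumes it is taken over a supplied mask; the toy arrays are illustrative.

```python
import numpy as np

def is_tau_positive(suvr, mask, threshold=1.4, percentile=95.0):
    """True if the given percentile of SUVR inside the mask exceeds the threshold."""
    return bool(np.percentile(suvr[mask], percentile) > threshold)

mask = np.ones(100, dtype=bool)     # assumed region of interest
low = np.full(100, 1.0)             # no elevated voxels
high = low.copy()
high[:10] = 2.0                     # 10% of voxels elevated
few = low.copy()
few[:3] = 2.0                       # only 3% of voxels elevated
```

Here `is_tau_positive(high, mask)` is true, while `low` and `few` are negative because fewer than 5% of their voxels are elevated.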

The A4 cohort includes 447 subjects with tau PET scans from the A4 study. All A4 subjects have a global clinical dementia rating (CDR) score of zero. The A4 data are exclusively used for final evaluation without being used for model training or validation in any form.

In total, we have 534 subjects from ADNI for model training and validation and 447 subjects from A4 for final performance evaluation.

4.2 Implementation Details

We resize all data to 160×160×160 and use only the middle axial slices that cover the brain area, from the 33rd to the 110th slice, giving 78 slices per subject. Because most of the SUVR values are less than 2, we empirically normalize the data by dividing them by 2. The conditioned diffusion models, with an architecture similar to [11], are trained on the 534 ADNI subjects, totaling 41,652 slices, for 100,000 iterations with a batch size of 8 and the AdamW optimizer with a learning rate of 2e-4. We choose a training-to-validation data split ratio of 9:1. Following [22], we use v-prediction in all our experiments. Training takes about 40 hours on a single NVIDIA RTX A5000 GPU. To compare anomaly localization with the Z-score and deep learning baselines, we build mean and standard deviation 3D maps as templates from the 380 CN subjects in ADNI, and the GAN/VAE/DDPM/DDIM baselines are trained on only the CN subjects. DDPM and our method share the same backbone, a U-Net with ResNet blocks [10] following the implementation of [35]; the GAN and VAE baselines are also constructed with ResNet blocks, following the implementations of [36] and [21], respectively, to allow a fair comparison. For the DDIM baseline, we directly adapt the trained DDPM model with the same noising and denoising process of DDIM for anomaly localization as described in [32]. We find the best number of noising steps for DDIM on tau PET images to be 150 instead of the 500 used in [32]. Therefore, for a fair comparison, we use 150 DDIM steps so that the baseline achieves its best anomaly localization performance.
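The slice selection, intensity normalization, and the Z-score baseline can be sketched as follows. The slice range is read as 1-indexed and inclusive, and the axial axis is assumed to be the first array axis; both are assumptions, as is the small `eps` added to the standard deviation.

```python
import numpy as np

FIRST_SLICE, LAST_SLICE = 33, 110    # inclusive range from the text, read as 1-indexed
SUVR_SCALE = 2.0                     # empirical normalization constant from the text

def training_slices(volume, axial_axis=0):
    """Select the middle axial slices and scale SUVR values by 1/2."""
    vol = np.moveaxis(volume, axial_axis, 0)
    return vol[FIRST_SLICE - 1:LAST_SLICE] / SUVR_SCALE

def z_score_map(volume, cn_volumes, eps=1e-8):
    """Voxel-wise Z-score against the mean/std maps of the CN subjects."""
    mean = cn_volumes.mean(axis=0)
    std = cn_volumes.std(axis=0)
    return (volume - mean) / (std + eps)

slices_per_subject = LAST_SLICE - FIRST_SLICE + 1   # 78
total_slices = 534 * slices_per_subject             # 41,652

volume = np.ones((160, 160, 160), dtype=np.float32)  # toy SUVR volume
slices = training_slices(volume)                     # shape (78, 160, 160)
```

The slice count reproduces the figures stated above: 78 slices per subject and 41,652 slices for 534 subjects.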

To evaluate the classification results, we use the following cognitive assessments of the A4 study: (1) The COGDIGIT test requires a subject to match numbers with symbols on a sheet of paper according to a given look-up table, and has been shown to be sensitive to cognitive functioning changes [13]; (2) The COGFCSR16 test can identify mild dementia in patients with AD. A subject is administered three trials of free recall (no cues) of 16 pictures with interference of counting backward for 20 seconds, where the pictures were studied before the trials [16]; (3) The COGLOGIC test measures subjects’ ability to immediately recall a story verbatim that was orally presented. After 20 or 30 minutes, recall without cues is again tested (delayed recall) [16].

4.3 Results

Table 1: Image reconstruction results on A4. The metrics of the conditioned DDPM (class labels) are calculated between the original data and the reconstruction under the SVM-predicted label. The pos. represents slice position condition.
| Model | SSIM ↑ | MSE ↓ | PSNR ↑ |
|---|---|---|---|
| GAN (ResNet backbone) | 0.8747 ± 0.0635 | 0.0068 ± 0.0122 | 25.1129 ± 1.4051 |
| VAE (ResNet backbone) | 0.9280 ± 0.0470 | 0.0018 ± 0.0044 | 32.5514 ± 2.2528 |
| DDPM | 0.8006 ± 0.0915 | 0.0096 ± 0.0059 | 25.2951 ± 2.0427 |
| Conditioned DDPM+pos. | 0.8047 ± 0.0969 | 0.0076 ± 0.0050 | 26.3203 ± 1.9530 |
| Conditioned DDPM+pos.+edge (ours) | 0.8913 ± 0.0745 | 0.0028 ± 0.0031 | 30.5090 ± 2.4281 |

To demonstrate the effectiveness of our method, we first illustrate the improvement of reconstruction performance in Table 1, the superiority of precise reconstruction on anomaly detection in Fig. 3, and the robustness of our sampling process in Fig. 4. Then, we compare the anomaly localization of our method with the traditional Z-score method and state-of-the-art deep learning competitors of GAN/VAE/DDPM/DDIM in Fig. 5. Finally, we further validate our anomaly localization by analyzing the pathology-related anomaly distribution amongst four main lobes in Fig. 6 and classifying the preclinical subjects into two groups, which shows significant cognitive function differences in Fig. 7.

Figure 3: Reconstruction of two subjects (a) and (b) in brain regions where the cortical folding typically differs across subjects. The pseudo-healthy data for the Z-score is the CN template. White dashed circles show false positives from anatomical misalignment.
Figure 4: Pseudo-healthy reconstruction results at different time steps ($L$). The last column shows the standard deviation map over 10 runs with 900 timesteps. White dashed circles indicate inconsistent reconstruction.

Most anomaly detection methods based on GAN or VAE rely fundamentally on individual-based reconstruction. Because of the stochastic nature of the sampling process, diffusion models usually reconstruct the original input data less faithfully. Here, we mitigate this problem by adding slice positional encoding and an anatomical constraint with edge maps, which yields better individualized reconstruction and, therefore, superior anomaly localization. Table 1 presents the ablation study on the reconstruction performance of the GAN/VAE/DDPM baselines and our methods according to the structural similarity index measure (SSIM) [30], mean squared error (MSE), and peak signal-to-noise ratio (PSNR). With the slice position and edge conditions, our model shows markedly higher SSIM/PSNR and lower MSE than the unconditioned DDPM (SSIM 0.8913 vs. 0.8006, PSNR 30.51 vs. 25.30, MSE 0.0028 vs. 0.0096), reaching values comparable to the previous state-of-the-art GAN/VAE anomaly detection models and indicating better anatomy alignment and slice consistency. Even though our improved reconstruction is not yet as good as that of the VAE, accurate reconstruction is a necessary but not sufficient property for anomaly localization. Both GAN/VAE and our method reconstruct well enough for precise anatomy alignment and individualized anomaly detection, but GAN/VAE are worse at the generative modeling that is necessary for anomaly localization.
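For reference, MSE and PSNR can be computed as in the sketch below. The data range of 1.0 is an assumption (the text does not state the range used), so the resulting values need not match Table 1; SSIM is more involved and is omitted here.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.mean((a - b) ** 2))

def psnr(a, b, data_range=1.0):
    """Peak signal-to-noise ratio in dB; data_range is an assumed peak value."""
    err = mse(a, b)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / err))

reference = np.zeros((8, 8))
reconstruction = reference + 0.1    # uniform error of 0.1 per pixel
```

For this toy pair the MSE is 0.01, which corresponds to a PSNR of 20 dB at a data range of 1.0.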

Figure 5: Comparison of anomaly map with Z-score method and GAN/VAE/DDPM/DDIM baselines. The results come from two subjects in the A4 datasets: (a) B14479344 (subject ID) without anomaly and (b) B30470679 with anomaly areas. White dashed circles show areas of false positives.

The Z-score method and DDPM baseline inherently perform worse on anomaly localization because of their suboptimal brain anatomy alignment, as shown in Fig. 3 and Fig. 4, respectively. In Fig. 3, our reconstructed pseudo-healthy data is closely consistent with the input data, whereas the Z-score method’s CN template (mean of CNs) cannot capture individual differences, especially in gyral folding, leading to false positives. In addition, diffusion models suffer from unstable reconstructions: too few denoising steps leave the anomaly only partially removed and therefore cause false negatives, while too many steps result in misaligned reconstruction and therefore false positives [32]. As shown in Fig. 4 (middle), the brain anatomy reconstructed by DDPM is not faithful to the input (white dashed circles) because it lacks edge constraints. The standard deviation map (upper right) shows a large variance across ten DDPM runs. Our method is consistent across multiple generations and remains robust to changes in sampling steps, with the added benefit that the number of denoising steps does not need to be fine-tuned.

Based on superior anatomy alignment, robust generation, and better data distribution modeling, our method outperforms baselines in subject-level localized anomaly detection as shown in Fig. 5, where the Z-score maps wrongly capture the background distribution shifts between A4 and ADNI datasets as anomaly regions, leading to large red areas in the background. For common off-target bindings in sub-cortical and out-of-brain areas (e.g., white dashed circles in Fig. 5 (a)), the Z-score method and baselines of GAN, DDPM, and DDIM cannot distinguish them from tauopathy-related anomalies, leading to false positives on the anomaly maps. Even though the VAE and DDPM baselines show more robustness with more true positives compared to GAN, they still cannot correctly detect all anomalies (e.g., white dashed circles in Fig. 5 (b)). This demonstrates the necessity of utilizing pseudo-healthy and pseudo-unhealthy reconstruction as a complement to better detect anomalies. Overall, our method shows superiority for localized tau pathology detection by modeling the underlying data distribution with precise individual anatomy reconstruction.

Figure 6: Boxplot of mean anomaly score in ROIs for A4 and ADNI datasets. The x-axis and y-axis denote the ROI region name and mean anomaly score, respectively. P-values < 0.05 (*), < 0.01 (**), < 0.001 (***), < 0.0001 (****).
Figure 7: Boxplot of overall anomaly score and cognitive assessments for A4 dataset. The x-axis in the middle and right plots denotes cognitive assessment names. DIGITTOTAL comes from the COGDIGIT test and is the total number of correctly matched digits; FCFREE1, FCFREE2, and FCFREE3 are from the COGFCSR16 test and represent the number of recalled pictures in the three trials; LIMMTOTAL and LDELTOTAL are the evaluation scores of immediate recall and delayed recall, respectively, in the COGLOGIC test. In all these assessments, lower scores mean worse cognitive functionality. P-values < 0.05 (*), < 0.01 (**), < 0.001 (***), < 0.0001 (****).

To validate the anomaly detection of our method, we divide the A4 dataset into two groups (positive and negative) according to the SVM classification results and analyze the distributions of anomaly scores and cognitive assessment scores between the two groups. Fig. 6 shows the mean anomaly score within four brain lobes, where tau commonly deposits, in both the ADNI and A4 datasets. The positive group presents significantly higher anomaly scores, demonstrating the effectiveness of our classification. The ADNI cohort exhibits a more pronounced anomaly, which is expected given that ADNI subjects have more severe tau pathology, while the A4 cohort comprises preclinical individuals with mild tau deposits. Additionally, the more severe anomaly located in the temporal and parietal lobes aligns with the typical progression of Alzheimer’s disease, where these regions are most significantly affected [4]. In terms of preclinical cognitive assessments, as illustrated in Fig. 7, the positive group exhibits statistically significant differences (p-value $\ll$ 0.05) from the negative group, with notably lower scores across all cognitive assessments and higher subject-level anomaly scores. This suggests that our method succeeds in grouping preclinical subjects with significantly different cognitive functionality by identifying biologically relevant anomalies, and highlights its potential to assist in preventing the progression of Alzheimer’s disease in its early stages.

4.4 Limitation

Like many previous methods, our method is trained on 2D slices of 3D scans, which leads to inconsistency between slices within a 3D volume. This problem is largely suppressed by the added edge map conditions and can be further reduced by training 3D models in future work. Another limitation compared to the prior GAN/VAE methods is that diffusion models perform hundreds of denoising steps to generate pseudo-data, which leads to longer inference time in clinical situations. This limitation is not severe, because medical image preprocessing usually takes a few hours and our method takes only 30 minutes to process a single 3D volume on an NVIDIA RTX A5000 GPU. It can nevertheless be eased by faster and even more accurate sampling strategies, which have been extensively explored in many recent works. We leave this for future work as well.

5 Conclusion

In this paper, we presented a novel weakly supervised approach for tau anomaly detection in PET imaging. The approach adopts an “MRI-free” preprocessing pipeline, implicitly conditions the model on supplementary information for individualized reconstruction, and uses both pseudo-healthy and pseudo-unhealthy reconstructions, which yields more effective detection of localized tau pathology than the Z-score and GAN/VAE/DDPM/DDIM baselines. Training and evaluation are conducted on two separate datasets, showing the robustness of our method in handling out-of-distribution data and its contribution to staging cognitive impairment at early stages.

Acknowledgments and Disclosure of Funding

This work was supported by the National Institutes of Health (NIH) under grants RF1AG077578, RF1AG064584, R01EB022744, R21AG064776, R01AG062007, U19AG078109, and P30AG066530.

References

  • [1] Akamatsu, G., Ikari, Y., Ohnishi, A., Matsumoto, K., Nishida, H., Yamamoto, Y., Senda, M., Initiative, J.A.D.N.: Voxel-based statistical analysis and quantification of amyloid pet in the japanese alzheimer’s disease neuroimaging initiative (j-adni) multi-center study. EJNMMI research 9, 1–9 (2019)
  • [2] Baydargil, H.B., Park, J.S., Kang, D.Y.: Anomaly analysis of alzheimer’s disease in pet images using an unsupervised adversarial deep learning model. Applied Sciences 11(5), 2187 (2021)
  • [3] Behrendt, F., Bhattacharya, D., Krüger, J., Opfer, R., Schlaefer, A.: Patched diffusion models for unsupervised anomaly detection in brain mri. In: Medical Imaging with Deep Learning. pp. 1019–1032. PMLR (2024)
  • [4] Blanc, F., Colloby, S.J., Philippi, N., de Petigny, X., Jung, B., Demuynck, C., Phillipps, C., Anthony, P., Thomas, A., Bing, F., et al.: Cortical thickness in dementia with lewy bodies and alzheimer’s disease: a comparison of prodromal and dementia stages. PloS one 10(6), e0127396 (2015)
  • [5] Choi, H., Ha, S., Kang, H., Lee, H., Lee, D.S.: Deep learning only by normal brain pet identify unheralded brain anomalies. EBioMedicine 43, 447–453 (2019)
  • [6] Fischl, B.: Freesurfer. Neuroimage 62(2), 774–781 (2012)
  • [7] Greve, D.N., Salat, D.H., Bowen, S.L., Izquierdo-Garcia, D., Schultz, A.P., Catana, C., Becker, J.A., Svarer, C., Knudsen, G.M., Sperling, R.A., Johnson, K.A.: Different partial volume correction methods lead to different conclusions: An 18f-fdg-pet study of aging. NeuroImage 132, 334–343 (2016)
  • [8] Han, C., Rundo, L., Murao, K., Noguchi, T., Shimahara, Y., Milacski, Z.Á., Koshino, S., Sala, E., Nakayama, H., Satoh, S.: Madgan: Unsupervised medical anomaly detection gan using multiple adjacent brain mri slice reconstruction. BMC bioinformatics 22(2), 1–20 (2021)
  • [9] Hassanaly, R., Brianceau, C., Solal, M., Colliot, O., Burgos, N.: Evaluation of pseudo-healthy image reconstruction for anomaly detection with deep generative models: Application to brain fdg pet. arXiv preprint arXiv:2401.16363 (2024)
  • [10] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 770–778 (2016)
  • [11] Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. Advances in neural information processing systems 33, 6840–6851 (2020)
  • [12] Iglesias, J.E., Billot, B., Balbastre, Y., Magdamo, C., Arnold, S.E., Das, S., Edlow, B.L., Alexander, D.C., Golland, P., Fischl, B.: Synthsr: A public ai tool to turn heterogeneous clinical brain scans into high-resolution t1-weighted images for 3d morphometry. Science advances 9(5), eadd3607 (2023)
  • [13] Jaeger, J.: Digit symbol substitution test: the case for sensitivity over specificity in neuropsychological testing. Journal of clinical psychopharmacology 38(5), 513 (2018)
  • [14] Kascenas, A., Sanchez, P., Schrempf, P., Wang, C., Clackett, W., Mikhael, S.S., Voisey, J.P., Goatman, K., Weir, A., Pugeault, N., et al.: The role of noise in denoising models for anomaly detection in medical images. Medical Image Analysis 90, 102963 (2023)
  • [15] Klein, S., Staring, M., Murphy, K., Viergever, M.A., Pluim, J.P.: Elastix: a toolbox for intensity-based medical image registration. IEEE transactions on medical imaging 29(1), 196–205 (2009)
  • [16] Knight Alzheimer’s Disease Research Center: Psychometric codebook. https://knightadrc.wustl.edu/wp-content/uploads/2021/07/Psychometric-Codebook-7-22-19.pdf (2019), accessed: 2024-02-16
  • [17] Landau, S.M., Ward, T.J., Murphy, A., Iaccarino, L., Harrison, T.M., La Joie, R., Baker, S., Koeppe, R.A., Jagust, W.J., Initiative, A.D.N.: Quantification of amyloid beta and tau pet without a structural mri. Alzheimer’s & Dementia 19(2), 444–455 (2023)
  • [18] Luo, C.: Understanding diffusion models: A unified perspective. arXiv preprint arXiv:2208.11970 (2022)
  • [19] Mueller, S., Weiner, M., Thal, L., Petersen, R., Jack, C., Jagust, W., Trojanowski, J., Toga, A., Beckett, L.: The alzheimer’s disease neuroimaging initiative. Neuroimaging Clinics of North America 15(4), 869–877 (11 2005)
  • [20] Nelson, P.T., Alafuzoff, I., Bigio, E.H., Bouras, C., Braak, H., Cairns, N.J., Castellani, R.J., Crain, B.J., Davies, P., Tredici, K.D., et al.: Correlation of alzheimer disease neuropathologic changes with cognitive status: a review of the literature. Journal of Neuropathology & Experimental Neurology 71(5), 362–381 (2012)
  • [21] Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 10684–10695 (2022)
  • [22] Salimans, T., Ho, J.: Progressive distillation for fast sampling of diffusion models. arXiv preprint arXiv:2202.00512 (2022)
  • [23] Schöll, M., Lockhart, S.N., Schonhaut, D.R., O’Neil, J.P., Janabi, M., Ossenkoppele, R., Baker, S.L., Vogel, J.W., Faria, J., Schwimmer, H.D., Rabinovici, G.D., Jagust, W.J.: PET imaging of tau deposition in the aging human brain. Neuron 89(5), 971–982 (2016)
  • [24] Siddiquee, M.M.R., Zhou, Z., Tajbakhsh, N., Feng, R., Gotway, M.B., Bengio, Y., Liang, J.: Learning fixed points in generative adversarial networks: From image-to-image translation to disease detection and localization. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 191–200 (2019)
  • [25] Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., Ganguli, S.: Deep unsupervised learning using nonequilibrium thermodynamics. In: International conference on machine learning. pp. 2256–2265. PMLR (2015)
  • [26] Sperling, R.A., Donohue, M.C., Raman, R., Sun, C.K., Yaari, R., Holdridge, K., Siemers, E., Johnson, K.A., Aisen, P.S., for the A4 Study Team: Association of Factors With Elevated Amyloid Burden in Clinically Normal Older Individuals. JAMA Neurology 77(6), 735–745 (06 2020)
  • [27] Sun, X., Liang, S., Fu, L., Zhang, X., Feng, T., Li, P., Zhang, T., Wang, L., Yin, X., Zhang, W., et al.: A human brain tau pet template in mni space for the voxel-wise analysis of alzheimer’s disease. Journal of Neuroscience Methods 328, 108438 (2019)
  • [28] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Advances in neural information processing systems 30 (2017)
  • [29] Versluis, M., Peeters, J., van Rooden, S., van der Grond, J., van Buchem, M., Webb, A., van Osch, M.: Origin and reduction of motion and f0 artifacts in high resolution t2*-weighted magnetic resonance imaging: Application in alzheimer’s disease patients. NeuroImage 51(3), 1082–1088 (2010)
  • [30] Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing 13(4), 600–612 (2004)
  • [31] Weigand, A.J., Maass, A., Eglit, G.L., Bondi, M.W.: What’s the cut-point?: a systematic investigation of tau pet thresholding methods. Alzheimer’s research & therapy 14(1), 49 (2022)
  • [32] Wolleb, J., Bieder, F., Sandkühler, R., Cattin, P.C.: Diffusion models for medical anomaly detection. In: International Conference on Medical image computing and computer-assisted intervention. pp. 35–45. Springer (2022)
  • [33] Xiong, Z., Ding, Q., Zhao, Y., Zhang, X.: Pet-3dflow: A normalizing flow based method for 3d pet anomaly detection. In: International Workshop on Computational Mathematics Modeling in Cancer Analysis. pp. 91–100. Springer (2023)
  • [34] Yang, X., Chin, B., Silosky, M., Litwiller, D., Ghosh, D., Xing, F.: Learning with synthesized data for generalizable lesion detection in real pet images. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 116–126. Springer (2023)
  • [35] Zhang, L., Rao, A., Agrawala, M.: Adding conditional control to text-to-image diffusion models. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 3836–3847 (2023)
  • [36] Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE international conference on computer vision. pp. 2223–2232 (2017)

NeurIPS Paper Checklist

  1. Claims

    Question: Do the main claims made in the abstract and introduction accurately reflect the paper’s contributions and scope?

    Answer: [Yes]

    Justification: Contributions and scope are clearly stated in the abstract and introduction.

    Guidelines:

    • The answer NA means that the abstract and introduction do not include the claims made in the paper.

    • The abstract and/or introduction should clearly state the claims made, including the contributions made in the paper and important assumptions and limitations. A No or NA answer to this question will not be perceived well by the reviewers.

    • The claims made should match theoretical and experimental results, and reflect how much the results can be expected to generalize to other settings.

    • It is fine to include aspirational goals as motivation as long as it is clear that these goals are not attained by the paper.

  2. Limitations

    Question: Does the paper discuss the limitations of the work performed by the authors?

    Answer: [Yes]

    Justification: One subsection discusses the limitations of the work.

    Guidelines:

    • The answer NA means that the paper has no limitation while the answer No means that the paper has limitations, but those are not discussed in the paper.

    • The authors are encouraged to create a separate "Limitations" section in their paper.

    • The paper should point out any strong assumptions and how robust the results are to violations of these assumptions (e.g., independence assumptions, noiseless settings, model well-specification, asymptotic approximations only holding locally). The authors should reflect on how these assumptions might be violated in practice and what the implications would be.

    • The authors should reflect on the scope of the claims made, e.g., if the approach was only tested on a few datasets or with a few runs. In general, empirical results often depend on implicit assumptions, which should be articulated.

    • The authors should reflect on the factors that influence the performance of the approach. For example, a facial recognition algorithm may perform poorly when image resolution is low or images are taken in low lighting. Or a speech-to-text system might not be used reliably to provide closed captions for online lectures because it fails to handle technical jargon.

    • The authors should discuss the computational efficiency of the proposed algorithms and how they scale with dataset size.

    • If applicable, the authors should discuss possible limitations of their approach to address problems of privacy and fairness.

    • While the authors might fear that complete honesty about limitations might be used by reviewers as grounds for rejection, a worse outcome might be that reviewers discover limitations that aren’t acknowledged in the paper. The authors should use their best judgment and recognize that individual actions in favor of transparency play an important role in developing norms that preserve the integrity of the community. Reviewers will be specifically instructed to not penalize honesty concerning limitations.

  3. Theory Assumptions and Proofs

    Question: For each theoretical result, does the paper provide the full set of assumptions and a complete (and correct) proof?

    Answer: [N/A]

    Justification: This paper does not include theoretical results.

    Guidelines:

    • The answer NA means that the paper does not include theoretical results.

    • All the theorems, formulas, and proofs in the paper should be numbered and cross-referenced.

    • All assumptions should be clearly stated or referenced in the statement of any theorems.

    • The proofs can either appear in the main paper or the supplemental material, but if they appear in the supplemental material, the authors are encouraged to provide a short proof sketch to provide intuition.

    • Inversely, any informal proof provided in the core of the paper should be complemented by formal proofs provided in appendix or supplemental material.

    • Theorems and Lemmas that the proof relies upon should be properly referenced.

  4. Experimental Result Reproducibility

    Question: Does the paper fully disclose all the information needed to reproduce the main experimental results of the paper to the extent that it affects the main claims and/or conclusions of the paper (regardless of whether the code and data are provided or not)?

    Answer: [Yes]

    Justification: This paper clearly states implementation details to reproduce results.

    Guidelines:

    • The answer NA means that the paper does not include experiments.

    • If the paper includes experiments, a No answer to this question will not be perceived well by the reviewers: Making the paper reproducible is important, regardless of whether the code and data are provided or not.

    • If the contribution is a dataset and/or model, the authors should describe the steps taken to make their results reproducible or verifiable.

    • Depending on the contribution, reproducibility can be accomplished in various ways. For example, if the contribution is a novel architecture, describing the architecture fully might suffice, or if the contribution is a specific model and empirical evaluation, it may be necessary to either make it possible for others to replicate the model with the same dataset, or provide access to the model. In general, releasing code and data is often one good way to accomplish this, but reproducibility can also be provided via detailed instructions for how to replicate the results, access to a hosted model (e.g., in the case of a large language model), releasing of a model checkpoint, or other means that are appropriate to the research performed.

    • While NeurIPS does not require releasing code, the conference does require all submissions to provide some reasonable avenue for reproducibility, which may depend on the nature of the contribution. For example:

      (a) If the contribution is primarily a new algorithm, the paper should make it clear how to reproduce that algorithm.

      (b) If the contribution is primarily a new model architecture, the paper should describe the architecture clearly and fully.

      (c) If the contribution is a new model (e.g., a large language model), then there should either be a way to access this model for reproducing the results or a way to reproduce the model (e.g., with an open-source dataset or instructions for how to construct the dataset).

      (d) We recognize that reproducibility may be tricky in some cases, in which case authors are welcome to describe the particular way they provide for reproducibility. In the case of closed-source models, it may be that access to the model is limited in some way (e.g., to registered users), but it should be possible for other researchers to have some path to reproducing or verifying the results.

  5. Open access to data and code

    Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material?

    Answer: [No]

    Justification: The code will be provided upon acceptance of this paper.

    Guidelines:

    • The answer NA means that paper does not include experiments requiring code.

    • Please see the NeurIPS code and data submission guidelines (https://nips.cc/public/guides/CodeSubmissionPolicy) for more details.

    • While we encourage the release of code and data, we understand that this might not be possible, so “No” is an acceptable answer. Papers cannot be rejected simply for not including code, unless this is central to the contribution (e.g., for a new open-source benchmark).

    • The instructions should contain the exact command and environment needed to run to reproduce the results. See the NeurIPS code and data submission guidelines (https://nips.cc/public/guides/CodeSubmissionPolicy) for more details.

    • The authors should provide instructions on data access and preparation, including how to access the raw data, preprocessed data, intermediate data, and generated data, etc.

    • The authors should provide scripts to reproduce all experimental results for the new proposed method and baselines. If only a subset of experiments are reproducible, they should state which ones are omitted from the script and why.

    • At submission time, to preserve anonymity, the authors should release anonymized versions (if applicable).

    • Providing as much information as possible in supplemental material (appended to the paper) is recommended, but including URLs to data and code is permitted.

  6. Experimental Setting/Details

    Question: Does the paper specify all the training and test details (e.g., data splits, hyperparameters, how they were chosen, type of optimizer, etc.) necessary to understand the results?

    Answer: [Yes]

    Justification: The experimental settings and datasets are clearly specified for understanding.

    Guidelines:

    • The answer NA means that the paper does not include experiments.

    • The experimental setting should be presented in the core of the paper to a level of detail that is necessary to appreciate the results and make sense of them.

    • The full details can be provided either with the code, in appendix, or as supplemental material.

  7. Experiment Statistical Significance

    Question: Does the paper report error bars suitably and correctly defined or other appropriate information about the statistical significance of the experiments?

    Answer: [Yes]

    Justification: Statistical significance tests are conducted and reported.

    Guidelines:

    • The answer NA means that the paper does not include experiments.

    • The authors should answer "Yes" if the results are accompanied by error bars, confidence intervals, or statistical significance tests, at least for the experiments that support the main claims of the paper.

    • The factors of variability that the error bars are capturing should be clearly stated (for example, train/test split, initialization, random drawing of some parameter, or overall run with given experimental conditions).

    • The method for calculating the error bars should be explained (closed form formula, call to a library function, bootstrap, etc.)

    • The assumptions made should be given (e.g., Normally distributed errors).

    • It should be clear whether the error bar is the standard deviation or the standard error of the mean.

    • It is OK to report 1-sigma error bars, but one should state it. The authors should preferably report a 2-sigma error bar than state that they have a 96% CI, if the hypothesis of Normality of errors is not verified.

    • For asymmetric distributions, the authors should be careful not to show in tables or figures symmetric error bars that would yield results that are out of range (e.g. negative error rates).

    • If error bars are reported in tables or plots, The authors should explain in the text how they were calculated and reference the corresponding figures or tables in the text.

  8. Experiments Compute Resources

    Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments?

    Answer: [Yes]

    Justification: Sufficient information is provided in the implementation details subsection.

    Guidelines:

    • The answer NA means that the paper does not include experiments.

    • The paper should indicate the type of compute workers CPU or GPU, internal cluster, or cloud provider, including relevant memory and storage.

    • The paper should provide the amount of compute required for each of the individual experimental runs as well as estimate the total compute.

    • The paper should disclose whether the full research project required more compute than the experiments reported in the paper (e.g., preliminary or failed experiments that didn’t make it into the paper).

  9. Code Of Ethics

    Question: Does the research conducted in the paper conform, in every respect, with the NeurIPS Code of Ethics https://neurips.cc/public/EthicsGuidelines?

    Answer: [Yes]

    Justification: The NeurIPS Code of Ethics is abided by throughout the research.

    Guidelines:

    • The answer NA means that the authors have not reviewed the NeurIPS Code of Ethics.

    • If the authors answer No, they should explain the special circumstances that require a deviation from the Code of Ethics.

    • The authors should make sure to preserve anonymity (e.g., if there is a special consideration due to laws or regulations in their jurisdiction).

  10. Broader Impacts

    Question: Does the paper discuss both potential positive societal impacts and negative societal impacts of the work performed?

    Answer: [Yes]

    Justification: This tau PET anomaly detection research will boost clinical applications for Alzheimer’s disease without negative societal impacts.

    Guidelines:

    • The answer NA means that there is no societal impact of the work performed.

    • If the authors answer NA or No, they should explain why their work has no societal impact or why the paper does not address societal impact.

    • Examples of negative societal impacts include potential malicious or unintended uses (e.g., disinformation, generating fake profiles, surveillance), fairness considerations (e.g., deployment of technologies that could make decisions that unfairly impact specific groups), privacy considerations, and security considerations.

    • The conference expects that many papers will be foundational research and not tied to particular applications, let alone deployments. However, if there is a direct path to any negative applications, the authors should point it out. For example, it is legitimate to point out that an improvement in the quality of generative models could be used to generate deepfakes for disinformation. On the other hand, it is not needed to point out that a generic algorithm for optimizing neural networks could enable people to train models that generate Deepfakes faster.

    • The authors should consider possible harms that could arise when the technology is being used as intended and functioning correctly, harms that could arise when the technology is being used as intended but gives incorrect results, and harms following from (intentional or unintentional) misuse of the technology.

    • If there are negative societal impacts, the authors could also discuss possible mitigation strategies (e.g., gated release of models, providing defenses in addition to attacks, mechanisms for monitoring misuse, mechanisms to monitor how a system learns from feedback over time, improving the efficiency and accessibility of ML).

  11. Safeguards

    Question: Does the paper describe safeguards that have been put in place for responsible release of data or models that have a high risk for misuse (e.g., pretrained language models, image generators, or scraped datasets)?

    Answer: [N/A]

    Justification: This research does not pose such risks.

    Guidelines:

    • The answer NA means that the paper poses no such risks.

    • Released models that have a high risk for misuse or dual-use should be released with necessary safeguards to allow for controlled use of the model, for example by requiring that users adhere to usage guidelines or restrictions to access the model or implementing safety filters.

    • Datasets that have been scraped from the Internet could pose safety risks. The authors should describe how they avoided releasing unsafe images.

    • We recognize that providing effective safeguards is challenging, and many papers do not require this, but we encourage authors to take this into account and make a best faith effort.

  12. Licenses for existing assets

    Question: Are the creators or original owners of assets (e.g., code, data, models), used in the paper, properly credited and are the license and terms of use explicitly mentioned and properly respected?

    Answer: [Yes]

    Justification: The citations are included for all existing assets used in this paper.

    Guidelines:

    • The answer NA means that the paper does not use existing assets.

    • The authors should cite the original paper that produced the code package or dataset.

    • The authors should state which version of the asset is used and, if possible, include a URL.

    • The name of the license (e.g., CC-BY 4.0) should be included for each asset.

    • For scraped data from a particular source (e.g., website), the copyright and terms of service of that source should be provided.

    • If assets are released, the license, copyright information, and terms of use in the package should be provided. For popular datasets, paperswithcode.com/datasets has curated licenses for some datasets. Their licensing guide can help determine the license of a dataset.

    • For existing datasets that are re-packaged, both the original license and the license of the derived asset (if it has changed) should be provided.

    • If this information is not available online, the authors are encouraged to reach out to the asset’s creators.

  13. New Assets

    Question: Are new assets introduced in the paper well documented and is the documentation provided alongside the assets?

    Answer: [N/A]

    Justification: This paper does not release new assets.

    Guidelines:

    • The answer NA means that the paper does not release new assets.

    • Researchers should communicate the details of the dataset/code/model as part of their submissions via structured templates. This includes details about training, license, limitations, etc.

    • The paper should discuss whether and how consent was obtained from people whose asset is used.

    • At submission time, remember to anonymize your assets (if applicable). You can either create an anonymized URL or include an anonymized zip file.

  14. Crowdsourcing and Research with Human Subjects

    Question: For crowdsourcing experiments and research with human subjects, does the paper include the full text of instructions given to participants and screenshots, if applicable, as well as details about compensation (if any)?

    Answer: [N/A]

    Justification: This paper does not involve crowdsourcing nor research with human subjects.

    Guidelines:

    • The answer NA means that the paper does not involve crowdsourcing nor research with human subjects.

    • Including this information in the supplemental material is fine, but if the main contribution of the paper involves human subjects, then as much detail as possible should be included in the main paper.

    • According to the NeurIPS Code of Ethics, workers involved in data collection, curation, or other labor should be paid at least the minimum wage in the country of the data collector.

  15. Institutional Review Board (IRB) Approvals or Equivalent for Research with Human Subjects

    Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or institution) were obtained?

    Answer: [N/A]

    Justification: This paper does not involve crowdsourcing nor research with human subjects.

    Guidelines:

    • The answer NA means that the paper does not involve crowdsourcing nor research with human subjects.

    • Depending on the country in which research is conducted, IRB approval (or equivalent) may be required for any human subjects research. If you obtained IRB approval, you should clearly state this in the paper.

    • We recognize that the procedures for this may vary significantly between institutions and locations, and we expect authors to adhere to the NeurIPS Code of Ethics and the guidelines for their institution.

    • For initial submissions, do not include any information that would break anonymity (if applicable), such as the institution conducting the review.