License: CC BY 4.0
arXiv:2303.07540v2 [cs.LG] 06 Apr 2024
¹ Department of Computer Science, University of Sheffield, Sheffield, UK
² Centre for Machine Intelligence, University of Sheffield, Sheffield, UK
³ Department of Infection, Immunity and Cardiovascular Disease, University of Sheffield, Sheffield, UK
⁴ Department of Clinical Radiology, Sheffield Teaching Hospitals, Sheffield, UK
⁵ INSIGNEO, Institute for in Silico Medicine, University of Sheffield, Sheffield, UK
Email: {p.c.tripathi(✉), m.suvon, laschobs1, shuo.zhou, s.alabed, a.j.swift, h.lu}@sheffield.ac.uk

Tensor-based Multimodal Learning for Prediction of Pulmonary Arterial Wedge Pressure from Cardiac MRI

Prasun C. Tripathi(✉)¹, Mohammod N. I. Suvon¹, Lawrence Schobs¹, Shuo Zhou¹,², Samer Alabed³,⁴,⁵, Andrew J. Swift³,⁴,⁵, Haiping Lu¹,²,⁵

Abstract

Heart failure is a severe and life-threatening condition that can lead to elevated pressure in the left ventricle. Pulmonary Arterial Wedge Pressure (PAWP) is an important surrogate marker indicating high pressure in the left ventricle. PAWP is determined by Right Heart Catheterization (RHC) but it is an invasive procedure. A non-invasive method is useful in quickly identifying high-risk patients from a large population. In this work, we develop a tensor learning-based pipeline for identifying PAWP from multimodal cardiac Magnetic Resonance Imaging (MRI). This pipeline extracts spatial and temporal features from high-dimensional scans. For quality control, we incorporate an uncertainty-based binning strategy to identify poor-quality training samples. We leverage complementary information by integrating features from multimodal data: cardiac MRI with short-axis and four-chamber views, and cardiac measurements. The experimental analysis on a large cohort of 1346 subjects who underwent the RHC procedure for PAWP estimation indicates that the proposed pipeline has a diagnostic value and can produce promising performance with significant improvement over the baseline in clinical practice (i.e., ΔAUC = 0.10, ΔAccuracy = 0.06, and ΔMCC = 0.39). The decision curve analysis further confirms the clinical utility of our method. The source code can be found at: https://github.com/prasunc/PAWP.

Keywords: Cardiac MRI · Multimodal Learning · Pulmonary Arterial Wedge Pressure

1 Introduction

Heart failure is usually characterized by the inability of the heart to supply enough oxygen and blood to other organs of the body [4]. It is a major cause of mortality and hospitalization [14]. Elevated Pulmonary Arterial Wedge Pressure (PAWP) is indicative of raised left ventricular filling pressure and reduced contractility of the heart. In the absence of mitral valve or pulmonary vasculature disease, PAWP correlates with the severity of heart failure and risk of hospitalization [1]. While PAWP can be measured by invasive and expensive Right Heart Catheterization (RHC), simpler and non-invasive techniques could aid in better monitoring of heart failure patients. Cardiac Magnetic Resonance Imaging (MRI) is an effective tool for identifying various heart conditions, and its ability to detect disease and predict outcomes has been further improved by machine learning techniques [3]. For instance, Swift et al. [17] introduced a machine-learning pipeline for identifying Pulmonary Arterial Hypertension (PAH). Recently, Uthoff et al. [18] developed geodesically smoothed tensor features for predicting mortality in PAH.

Cardiac MRI scans contain high-dimensional spatial and temporal features generated throughout the cardiac cycle. The small number of samples compared to the high-dimensional features poses a challenge for machine learning classifiers. To address this issue, Multilinear Principal Component Analysis (MPCA) [11] utilizes a tensor-based approach to reduce feature dimensions while preserving the information for each mode, i.e. spatial and temporal information in cardiac MRI. Hence, the MPCA method is well-suited for analyzing cardiac MRI scans. The application of the MPCA method to predict PAWP might further increase the diagnostic yield of cardiac MRI in heart failure patients and help to establish cardiac MRI as a non-invasive alternative to RHC. Existing MPCA-based pipelines for cardiac MRI [17, 18, 2] rely on manually labeled landmarks that are used for aligning heart regions in cardiac MRI. The manual labeling of landmarks is a cumbersome task for physicians and impractical for analyzing large cohorts. Moreover, even small deviations in the landmark placement may significantly impact the classification performance of automatic pipelines [16]. To tackle this challenge, we leverage automated landmarks with uncertainty quantification [15] in our pipeline. We also extract complementary information from multimodal data: short-axis and four-chamber cardiac MRI, and Cardiac Measurements (CM). We use the CM features (i.e., left atrial volume and left ventricular mass) identified in the baseline work by Garg et al. [5] for PAWP prediction.

Our main contributions are summarized as follows: 1) Methodology: We developed a fully automatic pipeline for PAWP prediction using cardiac MRI data, which includes automatic landmark detection with uncertainty quantification, an uncertainty-based binning strategy for training sample selection, tensor feature learning, and multimodal feature integration. 2) Effectiveness: Extensive experiments on the cardiac MRI scans of 1346 patients with various heart diseases validated our pipeline with a significant improvement (ΔAUC = 0.1027, ΔAccuracy = 0.0628, and ΔMCC = 0.3917) over the current clinical baseline. 3) Clinical utility: Decision curve analysis indicates the diagnostic value of our pipeline, which can be used in screening high-risk patients from a large population.

2 Methods

Figure 1: The schematic overview of the PAWP prediction pipeline including preprocessing, tensor feature learning, and performance analysis. The blocks in gray color are explained in more detail in Section 2.

As shown in Fig. 1, the proposed pipeline for PAWP prediction comprises three components: preprocessing, tensor feature learning, and performance analysis.

Cardiac MRI Preprocessing: The preprocessing of cardiac MRI consists of (1) normalization of scans, (2) automatic landmark detection, (3) inter-subject registration, and (4) in-plane downsampling. We standardize cardiac MRI intensity levels using Z-score normalization [7] to eliminate inter-subject variations. We then detect landmarks automatically, as explained in the next paragraph. We perform affine registration to align the heart regions of different subjects to a target image space. Finally, we carry out in-plane scaling of scans by max-pooling with factors of 2, 4, 8, and 16, obtaining down-sampled resolutions of 128 × 128, 64 × 64, 32 × 32, and 16 × 16, respectively.
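
The normalization and downsampling steps can be illustrated with a minimal NumPy sketch. It is not the PyKale implementation used in the experiments. The function names are hypothetical, the scan is synthetic, and a 256 × 256 in-plane size is assumed only because it makes the stated pooling factors and output resolutions line up; landmark detection and registration are omitted.

```python
import numpy as np


def zscore_normalize(scan: np.ndarray) -> np.ndarray:
    """Standardize one scan to zero mean and unit variance."""
    std = scan.std()
    # Assumption for this sketch: a constant image is mapped to all zeros.
    if std == 0:
        return np.zeros_like(scan, dtype=float)
    return (scan - scan.mean()) / std


def max_pool_in_plane(scan: np.ndarray, factor: int) -> np.ndarray:
    """Downsample the two spatial axes of a (height, width, frames) scan
    by non-overlapping max-pooling; the temporal axis is left untouched."""
    h, w, t = scan.shape
    if h % factor or w % factor:
        raise ValueError("spatial size must be divisible by the pooling factor")
    blocks = scan.reshape(h // factor, factor, w // factor, factor, t)
    return blocks.max(axis=(1, 3))


rng = np.random.default_rng(0)
# Synthetic stand-in for a registered scan (assumed size: 256 x 256, 20 frames).
scan = rng.normal(loc=100.0, scale=20.0, size=(256, 256, 20))
normalized = zscore_normalize(scan)
pyramid = {f: max_pool_in_plane(normalized, f) for f in (2, 4, 8, 16)}
```

Each entry of `pyramid` corresponds to one of the four scales used in the later experiments.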

Landmark Detection and Uncertainty-based Sample Binning: We utilize supervised learning to automate landmark detection using an ensemble of Convolutional Neural Networks (CNNs) for each modality (short-axis and four-chamber). We use a U-Net-like architecture and the same training regime as in [15]. We employ the Ensemble Maximum Heatmap Activation (E-MHA) strategy [15], which incorporates an ensemble of five models for each modality. We utilize three landmarks for each modality, with the short-axis modality using the inferior hinge point, superior hinge point, and inferolateral inflection point of the right ventricular apex, and the four-chamber modality using the left ventricular apex and the mitral and tricuspid annulus. E-MHA produces an associated uncertainty estimate for each landmark prediction, representing the model’s epistemic uncertainty as a continuous scalar value.

A minor error in landmark prediction can result in incorrect image registration [16]. We hypothesize that incorrectly preprocessed samples resulting from inaccurate landmarks can introduce ambiguity during model training. For quality control, it is crucial to identify and effectively handle such samples. In this study, we leverage predicted landmarks and epistemic uncertainties to tackle this problem using uncertainty-based binning. To this end, we partition the training scans based on the uncertainty values of the landmarks. The predicted landmarks are divided into $K$ quantiles, i.e., $Q = \{q_1, q_2, \ldots, q_K\}$, based on the epistemic uncertainty values. We then iteratively filter out training samples, starting from the most uncertain quantile. A sample is discarded if the uncertainty of any of its landmarks lies in the quantile $q_k$ being removed, where $k \in \{1, 2, \ldots, K\}$. The samples are discarded iteratively until there is no improvement in the validation performance, as measured by the area under the curve (AUC), for two subsequent iterations.
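
The binning procedure described above can be sketched as follows. This is one possible reading of the text, not the released code: the function names are hypothetical, quantile edges are computed over all landmark uncertainties pooled together (an assumption), and `validation_auc` is a placeholder callback standing in for training on the kept samples and scoring on validation data.

```python
import numpy as np


def quantile_bin_index(uncertainty: np.ndarray, n_bins: int) -> np.ndarray:
    """Assign every landmark uncertainty to a quantile bin 0..n_bins-1,
    where bin n_bins-1 holds the most uncertain landmarks."""
    edges = np.quantile(uncertainty, np.linspace(0.0, 1.0, n_bins + 1))
    # Only interior edges are used, so values map to 0..n_bins-1.
    return np.searchsorted(edges[1:-1], uncertainty, side="right")


def keep_mask(uncertainty: np.ndarray, n_bins: int, n_removed: int) -> np.ndarray:
    """`uncertainty` has shape (n_samples, n_landmarks). A sample is dropped
    when ANY of its landmarks falls in one of the n_removed most uncertain bins."""
    bins = quantile_bin_index(uncertainty.ravel(), n_bins).reshape(uncertainty.shape)
    worst_bin_per_sample = bins.max(axis=1)
    return worst_bin_per_sample < n_bins - n_removed


def select_bins_to_remove(uncertainty, n_bins, validation_auc, patience=2):
    """Remove bins one at a time, most uncertain first, and stop after
    `patience` consecutive iterations without improvement in validation AUC.
    `validation_auc(mask)` is a hypothetical callback, not part of any library."""
    best_auc = validation_auc(keep_mask(uncertainty, n_bins, 0))
    best_removed, stale = 0, 0
    for n_removed in range(1, n_bins):
        auc = validation_auc(keep_mask(uncertainty, n_bins, n_removed))
        if auc > best_auc:
            best_auc, best_removed, stale = auc, n_removed, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_removed
```

The returned count is the number of most-uncertain bins whose removal gave the best validation AUC before the stopping rule triggered.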

Tensor Feature Learning: To extract features from processed cardiac scans, we employ tensor feature learning, i.e. Multilinear Principal Component Analysis (MPCA) [11], which learns multilinear bases from cardiac MRI stacks to obtain low-dimensional features for prediction. Suppose we have $M$ scans as third-order tensors in the form of $\{\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_M \in \mathbb{R}^{I_1 \times I_2 \times I_3}\}$. The low-dimensional tensor features $\{\mathcal{Y}_1, \mathcal{Y}_2, \ldots, \mathcal{Y}_M \in \mathbb{R}^{P_1 \times P_2 \times P_3}\}$ are extracted by learning three ($N = 3$) projection matrices $\{U^{(n)} \in \mathbb{R}^{I_n \times P_n}, n = 1, 2, 3\}$ as follows:

$$\mathcal{Y}_m = \mathcal{X}_m \times_1 U^{(1)^T} \times_2 U^{(2)^T} \times_3 U^{(3)^T}, \quad m = 1, 2, \ldots, M, \tag{1}$$

where $P_n < I_n$, and $\times_n$ denotes a mode-wise product. Therefore, the feature dimensions are reduced from $I_1 \times I_2 \times I_3$ to $P_1 \times P_2 \times P_3$. We optimize the projection matrices $\{U^{(n)}\}$ by maximizing the total scatter $\psi_{\mathcal{Y}} = \sum_{m=1}^{M} \|\mathcal{Y}_m - \bar{\mathcal{Y}}\|_F^2$, where $\bar{\mathcal{Y}} = \frac{1}{M} \sum_{m=1}^{M} \mathcal{Y}_m$ is the mean tensor feature and $\|\cdot\|_F$ is the Frobenius norm [10]. We solve this problem using an iterative projection method. In MPCA, $\{P_1, P_2, P_3\}$ can be determined by the explained variance ratio, which is a hyperparameter. Furthermore, we apply Fisher discriminant analysis to select the most significant features based on their Fisher score [8]. We select the top $k$-ranked features and employ a Support Vector Machine (SVM) for classification.
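
To make the mode-wise projection in Eq. (1) and the total scatter concrete, the following NumPy sketch applies three projection matrices to a set of synthetic third-order tensors. It does not learn the MPCA bases: the matrices are random orthonormal placeholders, and the function names and tensor sizes are assumptions chosen for illustration.

```python
import numpy as np


def mode_n_product(tensor: np.ndarray, matrix: np.ndarray, mode: int) -> np.ndarray:
    """Multiply `tensor` by `matrix` along axis `mode`.
    `matrix` has shape (new_dim, tensor.shape[mode])."""
    moved = np.moveaxis(tensor, mode, -1)   # bring the chosen mode last
    product = moved @ matrix.T              # contract over that mode
    return np.moveaxis(product, -1, mode)   # restore the axis order


def project(x: np.ndarray, factors: list) -> np.ndarray:
    """Eq. (1): Y = X x_1 U1^T x_2 U2^T x_3 U3^T, with U_n of shape (I_n, P_n)."""
    y = x
    for mode, u in enumerate(factors):
        y = mode_n_product(y, u.T, mode)
    return y


def total_scatter(features: np.ndarray) -> float:
    """Sum of squared Frobenius distances to the mean tensor (axis 0 = samples)."""
    centered = features - features.mean(axis=0)
    return float(np.sum(centered ** 2))


rng = np.random.default_rng(0)
scans = rng.normal(size=(8, 16, 16, 10))   # M = 8 synthetic scans, I = (16, 16, 10)
dims = [(16, 4), (16, 4), (10, 3)]         # (I_n, P_n) for each mode
# Placeholder orthonormal bases; MPCA would learn these by maximizing the scatter.
factors = [np.linalg.qr(rng.normal(size=d))[0] for d in dims]
features = np.stack([project(x, factors) for x in scans])
```

With orthonormal columns, the projected scatter can never exceed the scatter of the input tensors, which is why MPCA searches for the bases that retain as much of it as possible.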

Multimodal Feature Integration: To enhance performance, we perform multimodal feature integration using features extracted from the short-axis, four-chamber, and Cardiac Measurements (CM). We adopt two strategies for feature integration, namely the early and late fusion of features [6]. In early fusion, the features are fused at the input level without any transformation. We concatenate features from the short-axis and four-chamber to perform this fusion. We then apply MPCA [11] on the concatenated tensor, enabling the selection of multimodal features. In late fusion, the integration of features is performed in a common latent space, which allows the fusion of features that have different dimensionalities. In this way, we can perform a late fusion of CM features with short-axis and four-chamber features. However, we cannot perform an early fusion of CM features with short-axis and four-chamber features, because the scalar CM features do not share the dimensionality of the image tensors.
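
The difference between the two fusion strategies can be shown with a small shape-level sketch. Several details here are assumptions rather than statements about the released code: early fusion is implemented as concatenation along a spatial axis, late fusion as concatenation of column-standardized feature vectors, and all function names and array sizes are hypothetical.

```python
import numpy as np


def early_fusion(sa: np.ndarray, fc: np.ndarray, axis: int = 1) -> np.ndarray:
    """Concatenate two image tensors of shape (n, H, W, T) before feature learning.
    Concatenating along a spatial axis is an assumption made for this sketch."""
    return np.concatenate([sa, fc], axis=axis)


def late_fusion(*feature_blocks: np.ndarray) -> np.ndarray:
    """Concatenate per-sample feature vectors of possibly different lengths.
    Column-wise standardization is an assumed way to bring blocks to a common scale."""
    scaled = []
    for block in feature_blocks:
        block = block.reshape(block.shape[0], -1).astype(float)
        std = block.std(axis=0)
        std[std == 0] = 1.0  # leave constant columns unscaled
        scaled.append((block - block.mean(axis=0)) / std)
    return np.hstack(scaled)


rng = np.random.default_rng(0)
sa = rng.normal(size=(5, 64, 64, 20))      # synthetic short-axis stacks
fc = rng.normal(size=(5, 64, 64, 20))      # synthetic four-chamber stacks
stacked = early_fusion(sa, fc)             # one tensor per subject, fed to MPCA

image_features = rng.normal(size=(5, 210)) # stand-in for selected tensor features
cm = rng.normal(size=(5, 2))               # stand-in for LV mass and LA volume
fused = late_fusion(image_features, cm)    # one feature vector per subject
```

The two-column CM block has no spatial or temporal axes, so it can only enter through `late_fusion`, which mirrors the restriction stated in the text.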

Performance Evaluation: In this paper, we use three primary metrics to evaluate the performance of the proposed pipeline: Area Under Curve (AUC), accuracy, and Matthews Correlation Coefficient (MCC). Decision Curve Analysis (DCA) is also conducted to demonstrate the clinical utility of our methodology.
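
For reference, the three metrics can be written out directly from their standard definitions. The sketch below is a plain NumPy version with hypothetical function names; the experiments themselves may well have used library implementations. The toy labels reuse the class sizes of the cohort (940 low and 406 high PAWP) only to show how accuracy and MCC diverge on imbalanced data.

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient; defined as 0 when the denominator vanishes."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative
    (ties count as one half)."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


# A classifier that always predicts "low PAWP" on a 940 / 406 split.
labels = np.array([0] * 940 + [1] * 406)
always_low = np.zeros_like(labels)
majority_accuracy = accuracy(labels, always_low)
majority_mcc = mcc(labels, always_low)
```

Such a trivial classifier reaches an accuracy of 940/1346 while its MCC is zero, which illustrates why MCC is reported alongside accuracy.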

Table 1: Baseline characteristics of included patients. p-values were obtained using the t-test [20].

| Characteristic | Low PAWP (≤ 15) | High PAWP (> 15) | p-value |
|---|---|---|---|
| Number of patients | 940 | 406 | - |
| Age (in years) | 64.8 ± 14.2 | 70.5 ± 10.6 | < 0.01 |
| Body Surface Area (BSA) | 1.88 ± 0.28 | 1.93 ± 0.24 | < 0.01 |
| Heart Rate (bpm) | 73.9 ± 15.5 | 67.6 ± 15.9 | < 0.01 |
| Left Ventricle Mass (LVM) | 92.3 ± 25 | 106 ± 33.1 | < 0.01 |
| Left Atrial Volume (ml²) | 72.2 ± 33.7 | 132.2 ± 56.7 | < 0.01 |
| PAWP (mmHg) | 10.3 ± 3.1 | 21.7 ± 4.96 | < 0.01 |

3 Experimental Results and Analysis

Study Population: Patients with suspected pulmonary hypertension were identified after institutional review board approval and ethics committee review. A total of 1346 patients who underwent Right Heart Catheterization (RHC) and cardiac MRI scans within 24 hours were included. Of these patients, 940 had normal PAWP (≤ 15 mmHg), while 406 had elevated PAWP (> 15 mmHg). Table 1 summarizes baseline patient characteristics. RHC was performed using a balloon-tipped 7.5 French thermodilution catheter.

Cardiac MRI and measurement: MRI scans were obtained using a 1.5 Tesla whole-body GE HDx MRI scanner (GE Healthcare, Milwaukee, USA) equipped with 8-channel cardiac coils and retrospective electrocardiogram gating. Two cardiac MRI protocols, short-axis and four-chamber, were employed, following standard clinical protocols to acquire cardiac-gated multi-slice steady-state sequences with a slice thickness of 8 mm, a field of view of 48 × 43.2, a matrix size of 512 × 512, a bandwidth of 125 kHz, and TR/TE of 3.7/1.6 ms. Following [5], left ventricle mass and left atrial volume were selected as cardiac measurements.

Experimental Design: We conducted experiments on short-axis and four-chamber scans across four scales. To determine the optimal parameters, we performed 10-fold cross-validation on the training set. From MPCA, we selected the top 210 features. We applied both early and late fusion to the short-axis and four-chamber scans, while CM features were fused using only the late fusion strategy. We divided the data into a training set of 1081 cases and a testing set of 265 cases. To simulate a real testing scenario, we designed the experiments such that patients diagnosed in the early years were part of the training set, while patients diagnosed in recent years were part of the testing set. We also partitioned the test set into 5 parts based on the diagnosis time to perform different runs of the methods and report their standard deviations in the comparison results. For SVM, we selected the optimal hyper-parameters from {0.001, 0.01, 0.1, 1} using grid search. The code for the experiments was implemented in Python (version 3.9). We used the cardiac MRI preprocessing pipeline and MPCA from the Python library PyKale [9] and the SVM implementation from scikit-learn [12].
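
The feature selection step (ranking by Fisher score and keeping the top 210 features) can be sketched as below. The score follows the common between-class over within-class scatter definition; whether the experiments used exactly this variant is an assumption, the function names are hypothetical, and the data are synthetic. The SVM and its grid search are left out of the sketch.

```python
import numpy as np


def fisher_scores(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Per-feature Fisher score: between-class scatter over within-class scatter."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    # Small floor avoids division by zero for constant features.
    return between / np.maximum(within, 1e-12)


def top_k_features(X: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k highest-scoring features, best first."""
    return np.argsort(fisher_scores(X, y))[::-1][:k]


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 500))   # synthetic flattened tensor features
y = np.repeat([0, 1], 100)        # synthetic binary labels
X[y == 1, :3] += 3.0              # make the first three features informative
selected = top_k_features(X, y, k=210)
X_selected = X[:, selected]       # input to the downstream classifier
```

To avoid leaking test information, the scores would normally be computed on the training portion only and the resulting indices reused for validation and test data.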

Figure 2: Performance comparison of removing a different number of bins of training data on 10-fold cross-validation.

Figure 3: The effect of combining CM features on short-axis and four-chamber. (a) Short-axis versus short-axis and CM. (b) Four-chamber versus four-chamber and CM. SA: Short-axis; FC: Four-chamber.

Table 2: Performance comparison using three metrics (with best in bold and second best underlined). FC: Four-Chamber features; SA: Short-Axis features; CM: Cardiac Measurement features. The standard deviations of methods were obtained by dividing the test set into 5 parts based on the diagnosis time.

| Modality | Resolution | AUC | Accuracy | MCC |
|---|---|---|---|---|
| Unimodal (CM) [5] | - | 0.7300 ± 0.04 | 0.7400 ± 0.03 | 0.1182 ± 0.03 |
| Unimodal (SA) [17] | 64 × 64 | 0.7391 ± 0.05 | 0.7312 ± 0.07 | 0.3604 ± 0.02 |
| | 128 × 128 | 0.7495 ± 0.05 | 0.7321 ± 0.04 | 0.3277 ± 0.01 |
| Unimodal (FC) [17] | 64 × 64 | 0.8034 ± 0.02 | 0.7509 ± 0.04 | 0.4240 ± 0.02 |
| | 128 × 128 | 0.8100 ± 0.04 | 0.7925 ± 0.05 | 0.4666 ± 0.02 |
| Bi-modal (SA and FC): Early fusion | 64 × 64 | 0.7998 ± 0.01 | 0.7698 ± 0.03 | 0.4185 ± 0.03 |
| | 128 × 128 | 0.7470 ± 0.02 | 0.7283 ± 0.02 | 0.3512 ± 0.02 |
| Bi-modal (SA and FC): Late fusion | 64 × 64 | 0.8028 ± 0.04 | 0.7509 ± 0.03 | 0.3644 ± 0.01 |
| | 128 × 128 | 0.8122 ± 0.03 | 0.7547 ± 0.03 | 0.3594 ± 0.02 |
| Bi-modal (SA and CM): Late fusion | 64 × 64 | 0.7564 ± 0.04 | 0.7585 ± 0.02 | 0.3825 ± 0.02 |
| | 128 × 128 | 0.7629 ± 0.03 | 0.7434 ± 0.03 | 0.3666 ± 0.03 |
| Bi-modal (FC and CM): Late fusion | 64 × 64 | 0.8061 ± 0.03 | 0.7709 ± 0.02 | 0.4435 ± 0.02 |
| | 128 × 128 | 0.8135 ± 0.02 | 0.7925 ± 0.02 | 0.4999 ± 0.03 |
| Tri-modal (FC, SA, and CM): Hybrid fusion | 64 × 64 | 0.8146 ± 0.04 | 0.7774 ± 0.03 | 0.4460 ± 0.02 |
| | 128 × 128 | **0.8327 ± 0.06** | **0.8038 ± 0.05** | **0.5099 ± 0.04** |
| Tri-modal Hybrid fusion without uncertainty binning | 64 × 64 | 0.7892 ± 0.04 | 0.7513 ± 0.05 | 0.4278 ± 0.02 |
| | 128 × 128 | 0.8036 ± 0.03 | 0.7820 ± 0.04 | 0.4779 ± 0.01 |

Uncertainty-Based Sample Binning: To improve the quality of training data, we used quantile binning to remove training samples with uncertain landmarks. The landmarks were divided into 50 bins, which were then removed one bin at a time in descending order of uncertainty. Figure 2 depicts the results of binning using 10-fold cross-validation on the training set, where the performance improves consistently across the four scales as long as at most 5 bins are removed. Based on these results, we removed 5 bins (129 out of 1081 samples) from the training set and used the remaining 952 training samples for the following experiments.

Unimodal Study: Table 2 reports the performance of three single-modality models: short-axis (SA), four-chamber (FC), and cardiac measurements (CM), with the CM-based unimodal model serving as the baseline. The FC-based unimodal model improves over the baseline by ΔAUC = 0.0800, ΔAccuracy = 0.0527, and ΔMCC = 0.3484, which indicates that tensor-based features have diagnostic value.

Bi-modal Study: In this experiment, we compared the performance of bi-modal models. As shown in Table 2, the bi-modal model (four-chamber and CM) performs best among the bi-modal models (AUC = 0.8135, Accuracy = 0.7925, and MCC = 0.4999). Next, we investigated the effect of fusing CM features with short-axis and four-chamber modalities in Fig. 3. It can be observed from these figures that the fusion of CM features enhances the diagnostic power of cardiac MRI modalities at all scales. The bi-modal (four-chamber and CM) model improved over the unimodal (four-chamber) model by ΔAUC = 0.0035 and ΔMCC = 0.0333.

Effectiveness of Tri-modal: In this experiment, we performed a fusion of CM features with the bi-modal models to create two tri-modal models. The first tri-modal is tri-modal late (CM with a late fusion of short-axis and four-chamber) and the second tri-modal is a tri-modal hybrid (CM with an early fusion of short-axis and four-chamber). As shown in Fig. 4, CM features enhance the performance of bi-modal models and tri-modal hybrid outperforms all. The tri-modal hybrid obtained the best performance (Table 2, where AUC = 0.8327, Accuracy = 0.8038, and MCC = 0.5099) and a significant improvement of ΔAUC = 0.1027, ΔAccuracy = 0.0628, and ΔMCC = 0.3917 over the baseline method.

Figure 4: The effect of combining CM features on the bi-modals including early and late fusion of four-chamber and short-axis. Early fusion: early fusion of short-axis and four-chamber; late fusion: late fusion of short-axis and four-chamber.
Figure 5: Evaluating clinical utility of our method using Decision Curve Analysis (DCA) [19]. “Treat All” means treating all patients, regardless of their actual disease status, while “Treat None” means treating no patients at all. Our predictive model’s net benefit is compared with the net benefit of treating everyone or no one to determine its overall utility.

Decision Curve Analysis (DCA) [19, 13] on the performance suggests the potential clinical utility of the proposed method. As shown in Fig. 5, the Tri-modal model outperformed the baseline method for most possible benefit/harm preferences, where benefit indicates a positive net benefit (i.e. correct diagnosis) and harm indicates a negative net benefit (i.e. incorrect diagnosis). The tri-modal model (the best model) obtained a higher net benefit between decision threshold probabilities of 0.30 and 0.70, which implies that our method has a diagnostic value and can be used in screening high-risk patients from a large population.
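
For readers unfamiliar with DCA, the net benefit plotted in such curves follows the standard definition of Vickers and Elkin [19]: true positives per patient minus false positives per patient weighted by the odds of the threshold probability. The sketch below uses hypothetical function names and toy predictions, not the study data.

```python
import numpy as np


def net_benefit(y_true, prob, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    y_true = np.asarray(y_true, dtype=bool)
    treat = np.asarray(prob, dtype=float) >= threshold
    n = y_true.size
    tp = np.sum(treat & y_true)
    fp = np.sum(treat & ~y_true)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating everyone; "Treat None" is always zero."""
    prevalence = np.mean(np.asarray(y_true, dtype=bool))
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy example: four patients with predicted probabilities of elevated PAWP.
y = np.array([1, 1, 0, 0])
p = np.array([0.9, 0.6, 0.4, 0.2])
thresholds = np.linspace(0.30, 0.70, 5)
model_curve = [net_benefit(y, p, t) for t in thresholds]
treat_all_curve = [net_benefit_treat_all(y, t) for t in thresholds]
```

A model is considered useful at a given threshold when its net benefit lies above both the treat-all curve and the treat-none line at zero.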

Feature contributions: Our model is interpretable. The highly-weighted features were detected in the left ventricle and interventricular septum in cardiac MRI. For cardiac measurements, left atrial volume (0.778/1) contributed more than left ventricular mass (0.222/1) to the prediction.

4 Conclusions

This paper proposed a tensor learning-based pipeline for PAWP classification. We demonstrated that: 1) tensor-based features have a diagnostic value for PAWP, 2) the integration of CM features improved the performance of unimodal and bi-modal methods, 3) the pipeline can be used to screen a large population, as shown using decision curve analysis. However, the current study is limited to single institutional data. In the future, we would like to explore the applicability of the method for multi-institutional data using domain adaptation techniques.

Acknowledgment

The study was supported by the Wellcome Trust grants 215799/Z/19/Z and 205188/Z/16/Z.

References

  • [1] Adamson, P.B., Abraham, W.T., Bourge, R.C., Costanzo, M.R., Hasan, A., Yadav, C., Henderson, J., Cowart, P., Stevenson, L.W.: Wireless pulmonary artery pressure monitoring guides management to reduce decompensation in heart failure with preserved ejection fraction. Circulation: Heart Failure 7(6), 935–944 (2014)
  • [2] Alabed, S., Uthoff, J., Zhou, S., Garg, P., Dwivedi, K., Alandejani, F., Gosling, R., Schobs, L., Brook, M., Shahin, Y., et al.: Machine learning cardiac-MRI features predict mortality in newly diagnosed pulmonary arterial hypertension. European Heart Journal-Digital Health 3(2), 265–275 (2022)
  • [3] Assadi, H., Alabed, S., Maiter, A., Salehi, M., Li, R., Ripley, D.P., Van der Geest, R.J., Zhong, Y., Zhong, L., Swift, A.J., et al.: The role of artificial intelligence in predicting outcomes by cardiovascular magnetic resonance: a comprehensive systematic review. Medicina 58(8), 1087 (2022)
  • [4] Emdin, M., Vittorini, S., Passino, C., Clerico, A.: Old and new biomarkers of heart failure. European Journal of Heart Failure 11(4), 331–335 (2009)
  • [5] Garg, P., Gosling, R., Swoboda, P., Jones, R., Rothman, A., Wild, J.M., Kiely, D.G., Condliffe, R., Alabed, S., Swift, A.J.: Cardiac magnetic resonance identifies raised left ventricular filling pressure: prognostic implications. European Heart Journal 43(26), 2511–2522 (2022)
  • [6] Huang, S.C., Pareek, A., Zamanian, R., Banerjee, I., Lungren, M.P.: Multimodal fusion with deep neural networks for leveraging CT imaging and electronic health record: a case-study in pulmonary embolism detection. Scientific Reports 10(1), 1–9 (2020)
  • [7] Jain, A., Nandakumar, K., Ross, A.: Score normalization in multimodal biometric systems. Pattern Recognition 38(12), 2270–2285 (2005)
  • [8] Li, J., Cheng, K., Wang, S., Morstatter, F., Trevino, R.P., Tang, J., Liu, H.: Feature selection: A data perspective. ACM Computing Surveys (CSUR) 50(6), 94 (2018)
  • [9] Lu, H., Liu, X., Zhou, S., Turner, R., Bai, P., Koot, R.E., Chasmai, M., Schobs, L., Xu, H.: PyKale: Knowledge-aware machine learning from multiple sources in Python. In: Proceedings of the 31st ACM International Conference on Information & Knowledge Management. pp. 4274–4278 (2022)
  • [10] Lu, H., Plataniotis, K.N., Venetsanopoulos, A.: Multilinear subspace learning: dimensionality reduction of multidimensional data. CRC Press (2013)
  • [11] Lu, H., Plataniotis, K.N., Venetsanopoulos, A.N.: MPCA: Multilinear principal component analysis of tensor objects. IEEE Transactions on Neural Networks 19(1), 18–39 (2008)
  • [12] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830 (2011)
  • [13] Sadatsafavi, M., Adibi, A., Puhan, M., Gershon, A., Aaron, S.D., Sin, D.D.: Moving beyond AUC: decision curve analysis for quantifying net benefit of risk prediction models. European Respiratory Journal 58(5) (2021)
  • [14] Savarese, G., Becher, P.M., Lund, L.H., Seferovic, P., Rosano, G.M., Coats, A.J.: Global burden of heart failure: a comprehensive and updated review of epidemiology. Cardiovascular Research 118(17), 3272–3287 (2022)
  • [15] Schöbs, L., Swift, A.J., Lu, H.: Uncertainty estimation for heatmap-based landmark localization. IEEE Transactions on Medical Imaging (2022)
  • [16] Schobs, L., Zhou, S., Cogliano, M., Swift, A.J., Lu, H.: Confidence-quantifying landmark localisation for cardiac MRI. In: 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI). pp. 985–988. IEEE (2021)
  • [17] Swift, A.J., Lu, H., Uthoff, J., Garg, P., Cogliano, M., Taylor, J., Metherall, P., Zhou, S., Johns, C.S., Alabed, S., et al.: A machine learning cardiac magnetic resonance approach to extract disease features and automate pulmonary arterial hypertension diagnosis. European Heart Journal-Cardiovascular Imaging 22(2), 236–245 (2021)
  • [18] Uthoff, J., Alabed, S., Swift, A.J., Lu, H.: Geodesically smoothed tensor features for pulmonary hypertension prognosis using the heart and surrounding tissues. In: 23rd International Conference Medical Image Computing and Computer Assisted Intervention–MICCAI 2020. pp. 253–262 (2020)
  • [19] Vickers, A.J., Elkin, E.B.: Decision curve analysis: a novel method for evaluating prediction models. Medical Decision Making 26(6), 565–574 (2006)
  • [20] Welch, B.L.: The generalization of ‘Student’s’ problem when several different population variances are involved. Biometrika 34(1-2), 28–35 (1947)