Goldstein is supported by National Institutes of Health (NIH) K08EB030120 and R01EB030130. Cooper is supported by NIH U24NS133949, R01LM013523, and U01CA220401. Infrastructure used by this work, including REDCap, is supported by UL1TR001422. Cooper also reports honoraria from Risk Appraisal Forum, Lynn Sage Breast Cancer Foundation, Jayne Koskinas Ted Giovanis Foundation for Health & Policy, and consultation for Tempus. The other authors have no relevant financial interest in the products or companies described in this article.

Corresponding author ORCID: https://orcid.org/0000-0001-6666-2179

Affiliations:
[1] Department of Pathology, Northwestern University, 750 North Lake Shore Dr., Chicago, Illinois 60611, USA
[2] Department of Urology, Northwestern University, 675 North St Clair St., Chicago, Illinois 60611, USA
[3] CZ Biohub, Chicago, Illinois, USA

Machine learning identification of maternal inflammatory response and histologic chorioamnionitis from placental membrane whole slide images

Abhishek Sharma (as711@northwestern.edu), Ramin Nateghi, Marina Ayad, Lee A.D. Cooper, Jeffery A. Goldstein
Abstract

Introduction: The placenta forms a critical barrier to infection through pregnancy, labor, and delivery. Inflammatory processes in the placenta have short-term and long-term consequences for offspring health. Digital pathology and machine learning can play an important role in understanding placental inflammation, yet there have been very few investigations into methods for predicting and understanding Maternal Inflammatory Response (MIR). This work investigates the potential of machine learning to understand MIR from whole slide images (WSI) and establishes early benchmarks.
Methods: We use a Multiple Instance Learning (MIL) framework with three feature extractors, the ImageNet-based EfficientNet-v2s and two histopathology foundation models, UNI and Phikon, to investigate the predictability of MIR stage from histopathology WSIs. We interpret the predictions using the attention maps learned by these models. We also use the MIL framework to predict white blood cell count (WBC) and maximum fever temperature ($T_{max}$).
Results: Attention-based MIL models classify MIR with a balanced accuracy of up to 88.5% and a Cohen’s kappa ($\kappa$) of up to 0.772. The pathology foundation models UNI and Phikon achieve balanced accuracies of 87.2% and 88.5% and $\kappa$ of 0.751 and 0.772, respectively, compared with the ImageNet-based feature extractor (EfficientNet-v2s), which achieves a balanced accuracy of 83.7% and $\kappa$ of 0.724. For WBC and $T_{max}$ prediction, we found mild correlation between actual values and those predicted from histopathology WSIs.
Discussion: We used the MIL framework to predict MIR stage from WSIs and compared the effectiveness of foundation models as feature extractors with that of an ImageNet-based model. We further investigated model failure cases and found them to be either edge cases prone to interobserver variability, examples of pathologist’s overreach, or cases mislabeled due to processing errors.

Keywords: Placenta; Maternal Inflammatory Response; Whole Slide Images; Multiple Instance Learning; Attention Models

Highlights

A machine learning model can identify maternal inflammatory response in placental membranes from whole slide images with high accuracy.

Medical imaging foundation models are better feature extractors for MIR identification than a model pretrained for ImageNet classification.

As an exploratory analysis, we found white blood cell count and maximum fever temperature to be mildly predictable from placental whole slide images.

1 Introduction

Maternal Inflammatory Response: Definitions and Pathogenesis

Acute Placental Inflammation (API) is categorized as Maternal Inflammatory Response (MIR) or Fetal Inflammatory Response (FIR) depending on the origin of the inflammatory cells. MIR is characterized by extravasation of maternal neutrophils, which move towards the chorionic layer, then across the amnion and into the amniotic space. MIR is divided into 3 stages, defined according to the Amsterdam criteria [35]. Stage 1 corresponds to subchorionitis, where maternal neutrophils are limited to the cellular chorion in the membranes and subchorionic fibrin in the chorionic plate. Stage 2, chorioamnionitis, is characterized by neutrophilic migration into the fibrous chorion and amniotic connective tissue. Stage 3, chorioamnionitis with amnion necrosis, is characterized by the presence of neutrophil karyorrhectic debris, abscess formation, and thickened basement membranes. MIR is classically seen in the presence of ascending infection by Group B Streptococcus, E. coli, and other pathogenic bacteria [39, 41]. However, organisms are not always identified, and some authorities argue that MIR may be the result of sterile inflammation [51, 42, 21, 23, 38, 32].

Immediate and long-term implications of MIR

MIR is associated with several adverse health outcomes. Bacteria causing MIR may enter the maternal or fetal circulation, causing puerperal fever or early onset neonatal sepsis, respectively. When MIR co-occurs with FIR, there is an increased risk of neonatal death [25, 27]. MIR2 is a risk factor for recurrent wheeze [22, 30], asthma [22, 16, 12], and chronic lung disease [12]. MIR has also been associated with lower mental development index [50], and increased risk of autism spectrum disorder [46]. In a small case-control study, severe MIR has also been linked to cerebral palsy albeit indirectly [37].

Challenges in analysing MIR

The stages of MIR are broadly defined, with significant variability in how different pathologists classify these patterns and where they place the cut-off between stages [17, 36]. A study by Redline et al. [36] showed a moderate kappa agreement of 0.58 for the diagnosis of no MIR (MIR0) vs. any MIR (MIR1-3). An older study using a previous definition of histologic chorioamnionitis showed a kappa of 0.72 [43]. A study using secondary review of generalist pathologist diagnoses by an expert placental pathologist showed a kappa for any MIR of 0.6 [13]. This variability creates challenges for analysis and may explain the relatively loose association between histologic chorioamnionitis and clinical chorioamnionitis or markers of maternal systemic inflammation [34, 32, 23, 28].

Machine Learning and Multiple Instance Learning

Machine learning, together with digital pathology, allows investigations to be conducted on larger datasets [5, 10, 44]. Consequently, several studies in recent years have used machine learning to build tools for pathological diagnosis and to analyse digital pathology datasets. These include both local-level tasks like nuclei segmentation and global-level tasks like prostate cancer detection [45].

Whole slide images are up to 100,000 × 60,000 pixels, too large for efficient computation. The earliest studies in computational pathology randomly selected patches from WSIs in the hope that they would carry diagnostic material from the slide. Convolutional neural networks (ResNet, EfficientNet) or attention-based models (ViT) were then finetuned on the resulting dataset [24]. Later, multiple instance learning (MIL) methods were proposed that take all the patches into account and learn to ascribe importance to each patch. The importance scores can be calculated from the probability of belonging to the positive class [4], as a proxy for selecting highly important patches. Another approach, proposed in [26], uses an attention mechanism to force the model to ascribe importance to patches during classification. These attention scores are calculated from features extracted by a pretrained model, e.g., ResNet50, rather than from raw patches directly. Recently, several groups have released pathology foundation models. Unlike ResNet or EfficientNet, these models use a transformer architecture and have been trained on tens of thousands to millions of slides [15, 6, 53]. These models show superior performance on benchmarks. However, the training sets lack placental slides, and the benchmarks are nearly all on detection, classification, and mutation identification of neoplasia.

Recently, several studies have employed machine learning and MIL to analyze the placenta [29]. Chen et al. [7] used CNNs to design a system for morphological characterization and clinically meaningful feature analysis of placentas from photos. Zhang et al. [52] proposed a Cycle-GAN with an attention module and saliency constraint to enable cross-domain image segmentation by translating target-domain placenta pictures (from one hospital) into the source domain (a different hospital) and adapting a pretrained segmentation model to segment them. Clymer et al. [9] designed multiresolution CNNs to classify decidual vasculopathy in placental membranes. Mobadersany et al. [31] used the MIL framework to improve prediction of gestational age from placental WSIs. Andreasen et al. [3] proposed a deep learning method for placenta segmentation from obstetric ultrasound. Goldstein et al. [18] used the MIL framework for classification of placental villous infarction, perivillous fibrin deposition, and intervillous thrombus. Patnaik et al. [33] used a pretrained ResNet-18 to extract features from placental histopathology images and classify maternal vascular malperfusion. Irmakci et al. [19] investigated the challenges posed to machine learning models by the presence of tissue contaminants, e.g., floaters, in WSIs. Studies have also been conducted to classify cells and regions in the placental disc [14, 49].

Recently, Chou et al. [8] used machine learning and other quantitative techniques to characterize maternal inflammatory response (MIR) in placental membranes.

In this study, we investigate MIR stage prediction from WSI.

2 Methods

Dataset: Patients and Placentas

The study was approved by our Institutional Review Board as STU00214052 and operated under waiver of consent. Inclusion criteria were singleton placentas examined at our institution between 2010 and 2024. We selected cases with and without MIR for retrieval and scanning. Slides were digitized on a Leica GT450 scanner at 40× objective magnification (0.263 µm per pixel). Patient demographic, laboratory, and placental pathology information was extracted from our electronic data warehouse. Slides were linked to pathology reports via optical character recognition of the labels with human review. MIR stage involving the membranes was identified from reports using regular expressions [48]. Unfortunately, our site does not grade MIR. Highest fever ($T_{max}$) was defined as the highest maternal temperature recorded in the 72 hours before delivery. White blood cell count (WBC) was defined as the highest maternal white blood cell count in samples collected in the 72 hours before delivery.
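The report-parsing step can be illustrated with a minimal sketch. The report wording, the pattern, and the function name `extract_mir_stage` are all hypothetical; the text does not give the actual templates or expressions used, so this only shows the general idea of pulling a stage number out of free text.

```python
import re

# Hypothetical pattern: assumes reports phrase the finding as
# "maternal inflammatory response ... stage N" within one sentence.
MIR_PATTERN = re.compile(
    r"maternal\s+inflammatory\s+response[^.]*?stage\s*([1-3])",
    flags=re.IGNORECASE,
)


def extract_mir_stage(report_text: str) -> int:
    """Return the highest MIR stage mentioned in a report, or 0 if none is found."""
    stages = [int(match.group(1)) for match in MIR_PATTERN.finditer(report_text)]
    return max(stages, default=0)
```

In this sketch a report with no matching phrase is treated as MIR0, which conflates "no inflammation" with "finding not mentioned"; a real pipeline would probably need to distinguish the two.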

Data processing for MIR prediction

We simplified the MIR stage classification problem by combining MIR0 and MIR1 into one class and MIR2 and MIR3 into another. Non-overlapping patches of size 224 × 224 were extracted at 20× from whole slide images of placental membranes from each patient. We used three different feature extractors. EfficientNetV2S [47] was trained on the ImageNet dataset of everyday (non-medical) objects [11]. Phikon [15] and UNI [6] are pathology foundation models trained on diverse datasets of histology images, though not including placenta. The extracted patch-level features were stored in ’tfrecords’ files for the aggregation and classification step of the MIL pipeline.
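The tiling step can be sketched as a coordinate grid. This is a minimal illustration, assuming that 20× patches are obtained by reading a region twice as wide from the 40× scan and downsampling it; the function `patch_grid` and its parameters are hypothetical, and any tissue-detection or background-filtering step is left out because the text does not describe one.

```python
def patch_grid(slide_width, slide_height, patch_size=224, scan_mag=40, target_mag=20):
    """Yield (x, y, read_size) for non-overlapping patches in scan-resolution pixels.

    Each patch covers read_size x read_size pixels of the 40x scan and would be
    resized to patch_size x patch_size to represent the 20x view.
    """
    downsample = scan_mag // target_mag  # 40x -> 20x gives a factor of 2
    read_size = patch_size * downsample  # 448 scan pixels per 224-pixel patch
    for y in range(0, slide_height - read_size + 1, read_size):
        for x in range(0, slide_width - read_size + 1, read_size):
            yield x, y, read_size
```

Under these assumptions a slide of 100,000 × 60,000 pixels yields roughly 30,000 candidate patches, which is why slide-level labels are learned over bags of patch features instead of over the raw image.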

The dataset was split into training, validation, and test sets: 20% of the data (678 samples) was held out as the test set, and the training set (2436 samples) and validation set (271 samples) were used for training and hyperparameter optimization, respectively.

Data processing for white blood cell count and fever prediction

For white blood cell count and $T_{max}$ prediction, we used UNI and Phikon to extract features, which were then aggregated using learned attention scores and passed to the regression network for prediction. The training, validation, and test splits were created in the same manner as for MIR prediction. However, fewer data points are available with WBC (1320 training, 142 validation, and 370 test samples) and $T_{max}$ (2256 training, 251 validation, and 632 test samples) information.

Machine Learning Models

We used an attention-based multiple instance learning architecture similar to [26]. The architecture splits information processing into two stages. The first stage extracts patch-level features using a pretrained network. Because only the information retained by this network is available to the second stage, the representations learned by the feature extractor can play a crucial role in the final predictions. The second stage aggregates the patch-level features to make the final prediction. The architecture uses attention to weigh the patch features, and the attention weights are themselves learned during training.

Figure 1: (A) Model architecture for MIR stage classification. Each MIR class has a corresponding set of attention weights for the patch embeddings [26]. The aggregated embeddings for each class are processed by fully connected classification layers followed by a softmax layer to convert to class probabilities. The class with the highest probability is the model prediction. Attention maps are visualized by coloring each patch based on the corresponding attention weight. (B) Model architecture for white blood cell count (WBC) and maternal highest temperature ($T_{max}$) prediction.
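The attention-based aggregation in the second stage can be sketched in a few lines. This is a simplified, single-branch, ungated form of attention pooling written in plain Python for readability; the model in Figure 1 uses one attention branch per class and may use a different attention variant. The weights `V` and `w` stand in for learned parameters and are hypothetical.

```python
import math


def attention_pool(patch_features, V, w):
    """Aggregate patch embeddings into one slide embedding with attention.

    patch_features: list of K patch embeddings, each a list of D floats.
    V: hidden projection, a list of H rows of D floats (stand-in for learned weights).
    w: attention vector of H floats (stand-in for learned weights).
    Returns (slide_embedding, attention_weights).
    """
    scores = []
    for h in patch_features:
        hidden = [math.tanh(sum(v_j * h_j for v_j, h_j in zip(row, h))) for row in V]
        scores.append(sum(w_i * u_i for w_i, u_i in zip(w, hidden)))
    # Softmax over patches, shifted by the maximum score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attention = [e / total for e in exps]
    dim = len(patch_features[0])
    slide_embedding = [
        sum(a * h[d] for a, h in zip(attention, patch_features)) for d in range(dim)
    ]
    return slide_embedding, attention
```

The returned attention weights sum to one across patches, so they can be mapped back to patch locations to draw a heatmap of the kind described in the figure caption.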

Foundation models for feature extraction:

As a first step in the MIL pipeline, we extracted patch-level features from each slide using different feature extractors. EfficientNet-v2s [47], trained on the ImageNet dataset (a non-medical dataset) [11], served as our performance baseline. We compared features from two pathology foundation models, Phikon [15] and UNI [6], against this baseline.

Loss functions:

For MIR stage prediction, we used hinge loss. We also investigated the effect of class weighting with this loss. For white blood cell count prediction and fever prediction, we used mean squared error (MSE) and weighted mean squared error (wMSE). For the weighted mean squared error, we first split the ground truth values into 10 bins, then assigned each bin a weight inversely proportional to the number of samples in that bin. The squared error of each sample is then weighted according to the bin its ground truth belongs to.
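The binned weighting can be sketched as follows. Two details are assumptions, since the text does not specify them: the bins are taken to be equal-width over the range of the ground truth, and the weights are rescaled so that they average to one. The function names are hypothetical.

```python
def bin_weights(targets, n_bins=10):
    """Per-sample weights inversely proportional to the size of each target's bin."""
    lo, hi = min(targets), max(targets)
    width = ((hi - lo) / n_bins) or 1.0  # guard against all-equal targets
    # Equal-width bins (an assumption); the maximum value falls in the last bin.
    bins = [min(int((t - lo) / width), n_bins - 1) for t in targets]
    counts = [0] * n_bins
    for b in bins:
        counts[b] += 1
    raw = [1.0 / counts[b] for b in bins]
    scale = len(raw) / sum(raw)  # rescale so weights average to 1 (an assumption)
    return [r * scale for r in raw]


def weighted_mse(targets, predictions, weights):
    """Mean of per-sample squared errors, each scaled by its weight."""
    total = sum(w * (t - p) ** 2 for t, p, w in zip(targets, predictions, weights))
    return total / len(targets)
```

With nine targets at 0.0 and one at 10.0, the single rare sample receives nine times the weight of each common one, so errors on unusual values such as high fevers contribute more to the loss than they would under plain MSE.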

Evaluation Metrics

For MIR prediction, we used area under the receiver operating characteristic curve (AUROC), balanced accuracy, Matthews correlation coefficient (MCC), and Cohen’s kappa ($\kappa$) to evaluate the classification task. To evaluate white blood cell count prediction and fever prediction, we used mean squared error (MSE), mean absolute error (MAE), and $R^{2}$ score.
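For reference, the threshold-based classification metrics can all be computed from a binary confusion matrix. The sketch below uses the standard definitions; the function name is hypothetical and the counts in the usage note are illustrative, not taken from this study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Balanced accuracy, MCC, and Cohen's kappa from confusion-matrix counts."""
    n = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    balanced_accuracy = (sensitivity + specificity) / 2

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0

    # Kappa compares observed agreement with agreement expected by chance.
    p_observed = (tp + tn) / n
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (p_observed - p_expected) / (1 - p_expected)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
        "mcc": mcc,
        "kappa": kappa,
    }
```

For example, `binary_metrics(tp=40, fp=5, tn=45, fn=10)` gives a sensitivity of 0.8, a specificity of 0.9, and therefore a balanced accuracy of 0.85. Balanced accuracy is the mean of sensitivity and specificity, which makes it less sensitive to class imbalance than plain accuracy.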

3 Results

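Table 1 summarizes the patient characteristics. Patients with MIR0,1 and MIR2,3 had similar median maternal age at delivery and similar median gestational age, although gestational age varied more widely in the MIR2,3 group (IQR 27–39 vs. 34–39 weeks). Patients with MIR2,3 were more likely to be nulliparous and less likely to have hypertension in pregnancy than patients with MIR0,1. Diabetes in pregnancy was also less frequent in the MIR2,3 group, but the difference was not statistically significant (p = 0.088).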

Table 1: Median (IQR) values for maternal age and gestational age, and count (proportion) for parous, diabetes, and hypertension. p-values are from the Mann-Whitney U rank test for maternal and gestational age, and the Chi-squared test for parous, diabetes, and hypertension.

| Characteristic | MIR 0,1 (n = 2786) | MIR 2,3 (n = 599) | p-value (MIR 0,1 vs. MIR 2,3) |
| --- | --- | --- | --- |
| Maternal age (years) | 33 (30–36) | 32 (28–36) | 6.74E-07 |
| Gestational age (weeks) | 37 (34–39) | 37 (27–39) | 0.000275 |
| Parous | 1352 (0.49) | 221 (0.38) | 6.15E-05 |
| Diabetes in pregnancy | 333 (0.12) | 56 (0.09) | 0.088 |
| Hypertension in pregnancy | 764 (0.27) | 80 (0.13) | 3.97E-10 |

MIR stage prediction: Balanced accuracy, MCC, Cohen’s kappa

Table 2 shows AUROC, balanced accuracy, Matthews correlation coefficient (MCC), and Cohen’s kappa ($\kappa$) values for EfficientNet, Phikon, and UNI features. All three models achieved a balanced accuracy of at least 83.7% and a $\kappa$ of at least 0.724.

Table 2: Comparison of performance between feature extractors in terms of AUROC, balanced accuracy, MCC, Cohen’s kappa ($\kappa$), specificity, and sensitivity. Values reported are for the test set (N=676).

| Feature extractor | AUROC | Balanced accuracy | MCC | $\kappa$ | Specificity | Sensitivity |
| --- | --- | --- | --- | --- | --- | --- |
| EfficientNet | 0.837 | 83.7% | 0.729 | 0.724 | 0.97 | 0.70 |
| Phikon | 0.885 | 88.5% | 0.772 | 0.772 | 0.96 | 0.81 |
| UNI | 0.872 | 87.2% | 0.751 | 0.751 | 0.96 | 0.79 |

Figure 2: Test set confusion matrices for EfficientNet (top), UNI (bottom-left), Phikon (bottom-right).

Comparison between feature extractors

Table 2 compares a feature extractor trained on ImageNet (EfficientNet-v2s) with two pathology foundation models (UNI and Phikon). EfficientNet makes more false negative predictions than UNI and Phikon (sensitivity 0.70 versus 0.79 and 0.81) and is therefore worse at identifying MIR2,3. Both foundation models achieve higher balanced accuracy than EfficientNet (87.2% and 88.5% versus 83.7%). In addition, Phikon is marginally better than UNI.

Visualizing slide-level and patch-level features

Figure 3 shows the t-SNE visualization of slide-level features from the test set, aggregated by the pooling layer of the Phikon MIL model. The slides separate into two major clusters. Data points with MIR0 and MIR1 are shown in dark blue and light blue, respectively, and those with MIR2 and MIR3 in light red and dark red, respectively. The aggregated embeddings show definite but imperfect separation between the MIR0,1 and MIR2,3 slides. The embeddings also show a gradient, with MIR0 slides (dark blue) farthest to the left and MIR3 slides (dark red) mostly to the right. MIR1 and MIR2 are placed in between, with MIR1 closer to MIR0 than to MIR3 and MIR2 closer to MIR3.

Figure 4 shows the t-SNE visualization of the top-1 attention patches from all the WSIs. The weighted average feature vectors encode information relevant to the MIR stage diagnosis, as reflected by separate blue and red clusters for MIR0,1 and MIR2,3, respectively.

Figure 3: t-SNE space of aggregated features from Phikon in the test set. Colors represent the true class. MIR0 is shown in dark blue, MIR1 in light blue, MIR2 in light red, and MIR3 in dark red.
Figure 4: t-SNE embeddings of the top-1 attended patches for attention branch 1 (see model architecture in Fig. 1) from each slide in the test set. Randomly sampled image patches are shown with outlines indicating the original diagnosis (MIR stage) of the corresponding WSI. Blue patches show largely normal placenta or some neutrophil infiltration. The red patches show stroma with greater neutrophil infiltration. All the embeddings are shown in the inset.

Investigating model failure cases: False positives and negatives.

To understand how the model functions or malfunctions, we investigated a few cases where the model gives false-positive or false-negative results. Specifically, we looked at the extreme false positive (rightmost blue points in Figure 3), and extreme false negative (leftmost red points in Figure 3). The attention heatmap, and top-10 attention patches for a false positive and false negative are shown in Fig. 5 and 6 respectively.

Figure 5: An example of a false positive. (A) WSI, (B) Attention Heatmap, (C) Top-10 attention patches from attention branch 1 (See model architecture in Fig. 1). This is a case with severe MIR1 with decidual necrosis, but does not meet criteria for MIR2.
Figure 6: An example of false negative. (A) WSI, (B) Attention Heatmap, (C) Top-10 attention patches from attention branch 1 (See model architecture in Fig. 1). Review of high attention patches showed amnion with activated-appearing macrophages. Manual review of the whole slide showed scant inflammation, which may not be classified as MIR2 by all observers.

White blood cell count prediction

Table 3 shows the performance of UNI and Phikon-based regression networks for white blood cell count prediction. UNI performs better than Phikon in terms of RMSE, MAE, and $R^{2}$ score. However, both models show weak correlation between predicted and actual values of WBC.

Figure 7: UNI predicts white blood cell count (WBC) with an $R^{2}$ of 0.071. Phikon predicts WBC with an $R^{2}$ of 0.043 (see Table 3).

Maternal highest temperature prediction

Table 3 shows the performance of UNI and Phikon-based regression networks for fever prediction ($T_{max}$). UNI performs slightly better than Phikon in terms of RMSE, MAE, and $R^{2}$ score. Both models show mild correlation between predicted and actual values of $T_{max}$.

Figure 8: UNI predicts maximum fever temperature ($T_{max}$) with an $R^{2}$ of 0.396 and RMSE of 1.021. Phikon predicts $T_{max}$ with an $R^{2}$ of 0.291 and RMSE of 1.125 (see Table 3).

Table 3: Performance for white blood cell count (WBC) prediction and fever ($T_{max}$) prediction using UNI and Phikon features.

| Target | Model | RMSE | MAE | $R^{2}$ | Slope |
| --- | --- | --- | --- | --- | --- |
| WBC | UNI | 5.370 | 4.045 | 0.071 | 0.123 |
| WBC | Phikon | 7.744 | 6.392 | 0.043 | 0.186 |
| $T_{max}$ | UNI | 1.021 | 0.856 | 0.132 | 0.396 |
| $T_{max}$ | Phikon | 1.125 | 0.999 | 0.123 | 0.291 |
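The regression metrics in Table 3 can be computed as in the sketch below. The definitions of RMSE, MAE, and $R^{2}$ are standard. The "slope" column is not defined in the text, so the sketch assumes it is the least-squares slope of predicted values regressed on actual values; the function name is hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """RMSE, MAE, R^2, and slope of predicted-on-actual for paired values."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n

    mean_actual = sum(actual) / n
    mean_pred = sum(predicted) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot

    # Assumed definition: least-squares slope of predictions against actual values.
    covariance = sum(
        (a - mean_actual) * (p - mean_pred) for a, p in zip(actual, predicted)
    )
    slope = covariance / ss_tot
    return {"rmse": rmse, "mae": mae, "r2": r2, "slope": slope}
```

Under this assumed definition, a slope well below 1 means the predictions are compressed towards the mean: the model under-predicts high values and over-predicts low ones even when the average error is small.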

4 Discussion

Automated analysis of MIR

We investigated the feasibility of predicting MIR stages (MIR0,1 and MIR2,3) from whole slide images. We found that attention-based MIL models are able to classify MIR with a balanced accuracy of up to 88.5% and a Cohen’s $\kappa$ of up to 0.772. Furthermore, the pathology foundation models (UNI and Phikon) both achieve higher performance, with balanced accuracies of 87.2% and 88.5% and Cohen’s $\kappa$ of 0.751 and 0.772, respectively, than the ImageNet-based feature extractor (EfficientNet-v2s). This is despite the fact that neither pathology foundation model was pretrained on any placental data.

We also investigated prediction of white blood cell count (WBC) and fever ($T_{max}$) from WSIs. The regression model with the UNI feature extractor shows a moderate correlation between predicted and actual $T_{max}$, indicating a moderate ability to predict fever from histopathological images. Prediction of white blood cell count shows a much weaker correlation.

Comparison between feature extractors

In this study, we found that features from the histology foundation models UNI and Phikon outperform ImageNet features for MIR stage prediction, despite the foundation models not being pretrained on any placenta data. In addition, we found Phikon to be slightly better than UNI.

Interobserver variability

Of the cases misclassified by our model, many showed borderline findings between MIR1 and MIR2. This highlights the challenges that interobserver variability poses for building and assessing models. In a different context, we found that training on noisy labels from multiple observers can yield expert-level performance [2]. Future studies could show similar improvement.

Clinical chorioamnionitis is defined by maternal fever, fetal or maternal tachycardia, uterine tenderness, and foul discharge, and is often accompanied by elevated WBC count [1, 40, 20]. As an exploratory study, we tested whether we could estimate $T_{max}$ or WBC using placental histology. We found weak but statistically significant correlations.

Strengths and limitations:

One of the major strengths of this study is the large (N = 3,385) real-world dataset. Further, we investigated the use of medical imaging foundation models (Phikon and UNI) as feature extractors in the MIL pipeline and demonstrated that, for MIR classification, they perform better than features extracted from a model pretrained on images of everyday objects, i.e., ImageNet (EfficientNet-v2s), even though no placental images were used to train Phikon and UNI. Phikon and UNI achieve similar performance on the MIR prediction task, which suggests that finetuning on placental data may be needed for these models to extract finer details from placenta images and achieve further performance gains. We also investigated the model’s ability to predict fever from histopathological images and found moderate correlation between predicted and actual $T_{max}$.

One of the weaknesses of this study is that the dataset was sourced from a single site. The models might not generalize to datasets sourced from other sites. Further limitations include that other maternal or fetal signs of inflammation were not considered. The rarity of neonatal sepsis makes it challenging to predict from our data. Also, due to the nature of our dataset, long-term neonatal outcomes could not be investigated.

Contributions

Conception of the work: J.A.G., L.A.D.C., A.S.; Patient selection/scanning: J.A.G.; Technique and tool development: A.S., R.N., L.A.D.C., M.A.; Data preparation: A.S.; Experiments: A.S.; Analysis: A.S., J.A.G.; Manuscript: A.S.; Editing: A.S., J.A.G., L.A.D.C.

Declaration of competing interest

The authors have no relevant disclosures.

References

  • Ajayi et al. [2022] Ajayi, S.O., Morris, J., Aleem, S., Pease, M.E., Wang, A., Mowes, A., Welles, S.L., Anday, E.K., Bhandari, V., 2022. Association of clinical signs of chorioamnionitis with histological chorioamnionitis and neonatal outcomes. The Journal of Maternal-Fetal & Neonatal Medicine 35, 10337–10347.
  • Amgad et al. [2022] Amgad, M., Atteya, L.A., Hussein, H., Mohammed, K.H., Hafiz, E., Elsebaie, M.A., Alhusseiny, A.M., AlMoslemany, M.A., Elmatboly, A.M., Pappalardo, P.A., et al., 2022. NuCLS: A scalable crowdsourcing approach and dataset for nucleus classification and segmentation in breast cancer. GigaScience 11, giac037.
  • Andreasen et al. [2023] Andreasen, L.A., Feragen, A., Christensen, A.N., Thybo, J.K., Svendsen, M.B.S., Zepf, K., Lekadir, K., Tolsgaard, M.G., 2023. Multi-centre deep learning for placenta segmentation in obstetric ultrasound with multi-observer and cross-country generalization. Scientific Reports 13, 2221.
  • Campanella et al. [2019] Campanella, G., Hanna, M.G., Geneslaw, L., Miraflor, A., Werneck Krauss Silva, V., Busam, K.J., Brogi, E., Reuter, V.E., Klimstra, D.S., Fuchs, T.J., 2019. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nature medicine 25, 1301–1309.
  • Chang et al. [2019] Chang, H.Y., Jung, C.K., Woo, J.I., Lee, S., Cho, J., Kim, S.W., Kwak, T.Y., 2019. Artificial intelligence in pathology. Journal of pathology and translational medicine 53, 1–12.
  • Chen et al. [2024] Chen, R.J., Ding, T., Lu, M.Y., Williamson, D.F., Jaume, G., Chen, B., Zhang, A., Shao, D., Song, A.H., Shaban, M., et al., 2024. Towards a general-purpose foundation model for computational pathology. Nature Medicine .
  • Chen et al. [2020] Chen, Y., Zhang, Z., Wu, C., Davaasuren, D., Goldstein, J.A., Gernand, A.D., Wang, J.Z., 2020. Ai-plax: Ai-based placental assessment and examination using photos. Computerized Medical Imaging and Graphics 84, 101744.
  • Chou et al. [2024] Chou, T., Senkow, K.J., Nguyen, M.B., Patel, P.V., Sandepudi, K., Cooper, L.A., Goldstein, J.A., 2024. Quantitative modeling to characterize maternal inflammatory response of histologic chorioamnionitis in placental membranes. American Journal of Reproductive Immunology 92, e13944. URL: https://onlinelibrary.wiley.com/doi/abs/10.1111/aji.13944, doi:https://doi.org/10.1111/aji.13944, arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1111/aji.13944.
  • Clymer et al. [2020] Clymer, D., Kostadinov, S., Catov, J., Skvarca, L., Pantanowitz, L., Cagan, J., LeDuc, P., 2020. Decidual vasculopathy identification in whole slide images using multiresolution hierarchical convolutional neural networks. The American Journal of Pathology 190, 2111--2122.
  • Cui and Zhang [2021] Cui, M., Zhang, D.Y., 2021. Artificial intelligence and computational pathology. Laboratory Investigation 101, 412--422.
  • Deng et al. [2009] Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L., 2009. Imagenet: A large-scale hierarchical image database, in: 2009 IEEE conference on computer vision and pattern recognition, Ieee. pp. 248--255.
  • Dessardo et al. [2019] Dessardo, N.S., Mustać, E., Banac, S., Dessardo, S., 2019. Paths of causal influence from prenatal inflammation and preterm gestation to childhood asthma symptoms. J Asthma 56, 823--832.
  • Ernst et al. [2023] Ernst, L.M., Basic, E., Freedman, A.A., Price, E., Suresh, S., 2023. Comparison of placental pathology reports from spontaneous preterm births finalized by general surgical pathologists versus perinatal pathologist: A call to action. The American Journal of Surgical Pathology 47, 1116--1121.
  • Ferlaino et al. [2018] Ferlaino, M., Glastonbury, C.A., Motta-Mejia, C., Vatish, M., Granne, I., Kennedy, S., Lindgren, C.M., Nellåker, C., 2018. Towards deep cellular phenotyping in placental histology. arXiv preprint arXiv:1804.03270 .
  • Filiot et al. [2023] Filiot, A., Ghermi, R., Olivier, A., Jacob, P., Fidon, L., Mac Kain, A., Saillard, C., Schiratti, J.B., 2023. Scaling self-supervised learning for histopathology with masked image modeling. medRxiv , 2023--07.
  • Getahun et al. [2010] Getahun, D., Strickland, D., Zeiger, R.S., Fassett, M.J., Chen, W., Rhoads, G.G., Jacobsen, S.J., 2010. Effect of chorioamnionitis on early childhood asthma. Archives of pediatrics & adolescent medicine 164, 187--192.
  • Goldstein et al. [2020] Goldstein, J.A., Gallagher, K., Beck, C., Kumar, R., Gernand, A.D., 2020. Maternal-fetal inflammation in the placenta and the developmental origins of health and disease. Frontiers in immunology 11, 531543.
  • Goldstein et al. [2023] Goldstein, J.A., Nateghi, R., Irmakci, I., Cooper, L.A., 2023. Machine learning classification of placental villous infarction, perivillous fibrin deposition, and intervillous thrombus. Placenta 135, 43--50.
  • Irmakci et al. [2024] Irmakci, I., Nateghi, R., Zhou, R., Vescovo, M., Saft, M., Ross, A.E., Yang, X.J., Cooper, L.A., Goldstein, J.A., 2024. Tissue contamination challenges the credibility of machine learning models in real world digital pathology. Modern Pathology 37, 100422.
  • Jung et al. [2023] Jung, E., Romero, R., Suksai, M., Gotsch, F., Chaemsaithong, P., Erez, O., Conde-Agudelo, A., Gomez-Lopez, N., Berry, S.M., Meyyazhagan, A., et al., 2023. Clinical chorioamnionitis at term: definition, pathogenesis, microbiology, diagnosis, and treatment. American journal of obstetrics and gynecology.
  • Kim et al. [2015] Kim, C.J., Romero, R., Chaemsaithong, P., Chaiyasit, N., Yoon, B.H., Kim, Y.M., 2015. Acute chorioamnionitis and funisitis: definition, pathologic features, and clinical significance. American journal of obstetrics and gynecology 213, S29--S52.
  • Kumar et al. [2008] Kumar, R., Yu, Y., Story, R.E., Pongracic, J.A., Gupta, R., Pearson, C., Ortiz, K., Bauchner, H.C., Wang, X., 2008. Prematurity, chorioamnionitis, and the development of recurrent wheezing: a prospective birth cohort study. Journal of Allergy and Clinical Immunology 121, 878--884.
  • Lagodka et al. [2022] Lagodka, S., Petrucci, S., Moretti, M.L., Cabbad, M., Lakhi, N.A., 2022. Fetal and maternal inflammatory response in the setting of maternal intrapartum fever with and without clinical and histologic chorioamnionitis. American Journal of Obstetrics & Gynecology MFM 4, 100539.
  • Laleh et al. [2022] Laleh, N.G., Muti, H.S., Loeffler, C.M.L., Echle, A., Saldanha, O.L., Mahmood, F., Lu, M.Y., Trautwein, C., Langer, R., Dislich, B., et al., 2022. Benchmarking weakly-supervised deep learning pipelines for whole slide classification in computational pathology. Medical image analysis 79, 102474.
  • Lau et al. [2005] Lau, J., Magee, F., Qiu, Z., Houbé, J., Von Dadelszen, P., Lee, S.K., 2005. Chorioamnionitis with a fetal inflammatory response is associated with higher neonatal mortality, morbidity, and resource use than chorioamnionitis displaying a maternal inflammatory response only. American journal of obstetrics and gynecology 193, 708--713.
  • Lu et al. [2021] Lu, M.Y., Williamson, D.F., Chen, T.Y., Chen, R.J., Barbieri, M., Mahmood, F., 2021. Data-efficient and weakly supervised computational pathology on whole-slide images. Nature biomedical engineering 5, 555--570.
  • Lynch et al. [2018] Lynch, A.M., Berning, A.A., Thevarajah, T.S., Wagner, B.D., Post, M.D., McCourt, E.A., Cathcart, J.N., Hodges, J.K., Mandava, N., Gibbs, R.S., et al., 2018. The role of the maternal and fetal inflammatory response in retinopathy of prematurity. American Journal of Reproductive Immunology 80, e12986.
  • Maki et al. [2022] Maki, Y., Sato, Y., Furukawa, S., Sameshima, H., 2022. Histological severity of maternal and fetal inflammation is correlated with the prevalence of maternal clinical signs. Journal of Obstetrics and Gynaecology Research 48, 1318--1327.
  • Marletta et al. [2023] Marletta, S., Pantanowitz, L., Santonicco, N., Caputo, A., Bragantini, E., Brunelli, M., Girolami, I., Eccher, A., 2023. Application of digital imaging and artificial intelligence to pathology of the placenta. Pediatric and Developmental Pathology 26, 5--12.
  • McDowell et al. [2016] McDowell, K.M., Jobe, A.H., Fenchel, M., Hardie, W.D., Gisslen, T., Young, L.R., Chougnet, C.A., Davis, S.D., Kallapur, S.G., 2016. Pulmonary morbidity in infancy after exposure to chorioamnionitis in late preterm infants. Annals of the American Thoracic Society 13, 867--876.
  • Mobadersany et al. [2021] Mobadersany, P., Cooper, L.A., Goldstein, J.A., 2021. GestaltNet: aggregation and attention to improve deep learning of gestational age from placental whole-slide images. Laboratory Investigation 101, 942--951.
  • Oh et al. [2017] Oh, K.J., Kim, S.M., Hong, J.S., Maymon, E., Erez, O., Panaitescu, B., Gomez-Lopez, N., Romero, R., Yoon, B.H., 2017. Twenty-four percent of patients with clinical chorioamnionitis in preterm gestations have no evidence of either culture-proven intraamniotic infection or intraamniotic inflammation. American journal of obstetrics and gynecology 216, 604.e1.
  • Patnaik et al. [2024] Patnaik, P., Khodaee, A., Vasam, G., Mukherjee, A., Salsabili, S., Ukwatta, E., Grynspan, D., Chan, A.D., Bainbridge, S., 2024. Automated detection of microscopic placental features indicative of maternal vascular malperfusion using machine learning. Placenta 145, 19--26.
  • Rallis et al. [2022] Rallis, D., Lithoxopoulou, M., Pervana, S., Karagianni, P., Hatziioannidis, I., Soubasi, V., Tsakalidis, C., 2022. Clinical chorioamnionitis and histologic placental inflammation: association with early-neonatal sepsis. The Journal of Maternal-Fetal & Neonatal Medicine 35, 8090--8096.
  • Redline [2006] Redline, R.W., 2006. Inflammatory responses in the placenta and umbilical cord, in: Seminars in Fetal and Neonatal Medicine, Elsevier. pp. 296--301.
  • Redline et al. [2022] Redline, R.W., Vik, T., Heerema-McKenney, A., Jamtoy, A.H., Ravishankar, S., Ton Nu, T.N., Vogt, C., Ng, P., Nelson, K.B., Lydersen, S., et al., 2022. Interobserver reliability for identifying specific patterns of placental injury as defined by the Amsterdam classification. Archives of Pathology & Laboratory Medicine 146, 372--378.
  • Redline et al. [2000] Redline, R.W., Wilson-Costello, D., Borawski, E., Fanaroff, A.A., Hack, M., 2000. The relationship between placental and other perinatal risk factors for neurologic impairment in very low birth weight children. Pediatric research 47, 721--726.
  • Roberts et al. [2012] Roberts, D.J., Celi, A.C., Riley, L.E., Onderdonk, A.B., Boyd, T.K., Johnson, L.C., Lieberman, E., 2012. Acute histologic chorioamnionitis at term: nearly always noninfectious. PLoS One 7, e31819.
  • Romero et al. [2016a] Romero, R., Chaemsaithong, P., Docheva, N., Korzeniewski, S.J., Kusanovic, J.P., Yoon, B.H., Kim, J.S., Chaiyasit, N., Ahmed, A.I., Qureshi, F., et al., 2016a. Clinical chorioamnionitis at term VI: acute chorioamnionitis and funisitis according to the presence or absence of microorganisms and inflammation in the amniotic cavity. Journal of perinatal medicine 44, 33--51.
  • Romero et al. [2016b] Romero, R., Chaemsaithong, P., Korzeniewski, S.J., Kusanovic, J.P., Docheva, N., Martinez-Varea, A., Ahmed, A.I., Yoon, B.H., Hassan, S.S., Chaiworapongsa, T., et al., 2016b. Clinical chorioamnionitis at term III: how well do clinical criteria perform in the identification of proven intra-amniotic infection? Journal of perinatal medicine 44, 23--32.
  • Romero et al. [2019] Romero, R., Gomez-Lopez, N., Winters, A.D., Jung, E., Shaman, M., Bieda, J., Panaitescu, B., Pacora, P., Erez, O., Greenberg, J.M., et al., 2019. Evidence that intra-amniotic infections are often the result of an ascending invasion--a molecular microbiological study. Journal of perinatal medicine 47, 915--931.
  • Romero et al. [2014] Romero, R., Miranda, J., Chaiworapongsa, T., Korzeniewski, S.J., Chaemsaithong, P., Gotsch, F., Dong, Z., Ahmed, A.I., Yoon, B.H., Hassan, S.S., et al., 2014. Prevalence and clinical significance of sterile intra-amniotic inflammation in patients with preterm labor and intact membranes. American journal of reproductive immunology 72, 458--474.
  • Simmonds et al. [2004] Simmonds, M., Jeffery, H., Watson, G., Russell, P., 2004. Intraobserver and interobserver variability for the histologic diagnosis of chorioamnionitis. American journal of obstetrics and gynecology 190, 152--155.
  • Song et al. [2023] Song, A.H., Jaume, G., Williamson, D.F., Lu, M.Y., Vaidya, A., Miller, T.R., Mahmood, F., 2023. Artificial intelligence for digital and computational pathology. Nature Reviews Bioengineering 1, 930--949.
  • Srinidhi et al. [2021] Srinidhi, C.L., Ciga, O., Martel, A.L., 2021. Deep neural network models for computational histopathology: A survey. Medical image analysis 67, 101813.
  • Straughen et al. [2017] Straughen, J.K., Misra, D.P., Divine, G., Shah, R., Perez, G., VanHorn, S., Onbreyt, V., Dygulska, B., Schmitt, R., Lederman, S., et al., 2017. The association between placental histopathology and autism spectrum disorder. Placenta 57, 183--188.
  • Tan and Le [2021] Tan, M., Le, Q., 2021. EfficientNetV2: Smaller models and faster training, in: International conference on machine learning, PMLR. pp. 10096--10106.
  • Van Rossum [2020] Van Rossum, G., 2020. The Python Library Reference, release 3.8.2. Python Software Foundation.
  • Vanea et al. [2022] Vanea, C., Džigurski, J., Rukins, V., Dodi, O., Siigur, S., Salumäe, L., Meir, K., Parks, W., Hochner-Celnikier, D., Fraser, A., et al., 2022. HAPPY: A deep learning pipeline for mapping cell-to-tissue graphs across placenta histology whole slide images.
  • Xiao et al. [2018] Xiao, D., Zhu, T., Qu, Y., Gou, X., Huang, Q., Li, X., Mu, D., 2018. Maternal chorioamnionitis and neurodevelopmental outcomes in preterm and very preterm neonates: a meta-analysis. PLoS One 13, e0208302.
  • Zaidi et al. [2020] Zaidi, H., Lamalmi, N., Lahlou, L., Slaoui, M., Barkat, A., Alamrani, S., Alhamany, Z., 2020. Clinical predictive factors of histological chorioamnionitis: case-control study. Heliyon 6.
  • Zhang et al. [2020] Zhang, Z., Davaasuren, D., Wu, C., Goldstein, J.A., Gernand, A.D., Wang, J.Z., 2020. Multi-region saliency-aware learning for cross-domain placenta image segmentation. Pattern recognition letters 140, 165--171.
  • Zimmermann et al. [2024] Zimmermann, E., Vorontsov, E., Viret, J., Casson, A., Zelechowski, M., Shaikovski, G., Tenenholtz, N., Hall, J., Fuchs, T., Fusi, N., et al., 2024. Virchow 2: Scaling self-supervised mixed magnification models in pathology. arXiv preprint arXiv:2408.00738.