Abstract
Fréchet Inception Distance (FID) is a widely used metric for assessing synthetic image quality. It relies on an ImageNet-based feature extractor, making its applicability to medical imaging unclear. A recent trend is to adapt FID to medical imaging through feature extractors trained on medical images. Our study challenges this practice by demonstrating that ImageNet-based extractors are more consistent and aligned with human judgment than their RadImageNet counterparts. We evaluated sixteen StyleGAN2 networks across four medical imaging modalities and four data augmentation techniques, with Fréchet distances (FDs) computed using eleven ImageNet- or RadImageNet-trained feature extractors. Comparison with human judgment via visual Turing tests revealed that ImageNet-based extractors produced rankings consistent with expert evaluations, with the FD derived from the ImageNet-trained SwAV extractor correlating significantly with those evaluations. In contrast, RadImageNet-based rankings were volatile and inconsistent with human judgment. Our findings challenge prevailing assumptions, providing novel evidence that medical image-trained feature extractors do not inherently improve FDs and can even compromise their reliability. Our code is available at https://github.com/mckellwoodland/fid-med-eval.
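For context, every FD reported in the study is the closed-form Fréchet (2-Wasserstein) distance between two Gaussians fitted to the real and synthetic feature sets; only the feature extractor producing those sets varies. The sketch below is a minimal NumPy/SciPy illustration of that formula, not the authors' released implementation (see the repository above); the function name and array shapes are assumptions.

```python
import numpy as np
from scipy import linalg


def frechet_distance(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_real, feats_fake: (N, D) arrays of extractor features, e.g.
    Inception-v3 pool features for the standard FID. Illustrative only.
    """
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma1 = np.cov(feats_real, rowvar=False)
    sigma2 = np.cov(feats_fake, rowvar=False)

    # Matrix square root of the covariance product; tiny imaginary
    # components from numerical error are discarded.
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real

    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

Swapping the feature extractor (Inception-v3, SwAV, a RadImageNet backbone, and so on) changes only how feats_real and feats_fake are produced; the distance computation itself is fixed, which is why the choice of extractor is the paper's central variable.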
Acknowledgments
Research reported in this publication was supported in part by resources of the Image Guided Cancer Therapy Research Program (IGCT) at The University of Texas MD Anderson Cancer Center, a generous gift from the Apache Corporation, the National Institutes of Health/NCI under award number P30CA016672, and the Tumor Measurement Initiative through the MD Anderson Strategic Initiative Development Program (STRIDE). We thank the NIH Clinical Center for the ChestX-ray14 dataset, Dr. Rishi Agrawal and Dr. Carol Wu for their feedback on the generative modeling, Dr. Vikram Haheshri and Dr. Oleg Igoshin for the discussion that led to the hypothesis testing contribution, and Erica Goodoff, Senior Scientific Editor in the Research Medical Library at MD Anderson, for editing this article. GPT-4 was used in the proofreading stage of this manuscript.
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Woodland, M. et al. (2024). Feature Extraction for Generative Medical Imaging Evaluation: New Evidence Against an Evolving Trend. In: Linguraru, M.G., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15012. Springer, Cham. https://doi.org/10.1007/978-3-031-72390-2_9
DOI: https://doi.org/10.1007/978-3-031-72390-2_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72389-6
Online ISBN: 978-3-031-72390-2
eBook Packages: Computer Science, Computer Science (R0)