
Feature Extraction for Generative Medical Imaging Evaluation: New Evidence Against an Evolving Trend

  • Conference paper
Medical Image Computing and Computer Assisted Intervention – MICCAI 2024 (MICCAI 2024)

Abstract

Fréchet Inception Distance (FID) is a widely used metric for assessing synthetic image quality. It relies on an ImageNet-based feature extractor, making its applicability to medical imaging unclear. A recent trend is to adapt FID to medical imaging through feature extractors trained on medical images. Our study challenges this practice by demonstrating that ImageNet-based extractors are more consistent and aligned with human judgment than their RadImageNet counterparts. We evaluated sixteen StyleGAN2 networks across four medical imaging modalities and four data augmentation techniques with Fréchet distances (FDs) computed using eleven ImageNet or RadImageNet-trained feature extractors. Comparison with human judgment via visual Turing tests revealed that ImageNet-based extractors produced rankings consistent with human judgment, with the FD derived from the ImageNet-trained SwAV extractor significantly correlating with expert evaluations. In contrast, RadImageNet-based rankings were volatile and inconsistent with human judgment. Our findings challenge prevailing assumptions, providing novel evidence that medical image-trained feature extractors do not inherently improve FDs and can even compromise their reliability. Our code is available at https://github.com/mckellwoodland/fid-med-eval.
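As background, the Fréchet distances (FDs) discussed above reduce to a closed-form distance between Gaussians fitted to the real and synthetic feature sets: d² = ‖μ₁ − μ₂‖² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^½). The following is a minimal NumPy/SciPy sketch of that computation; the function name and inputs are illustrative, not the authors' released code (see the linked repository for that).

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussian fits to two feature sets.

    Each input is an (n_samples, n_features) array of extracted features,
    e.g. from an ImageNet- or RadImageNet-trained backbone.
    """
    mu1, mu2 = feats_a.mean(axis=0), feats_b.mean(axis=0)
    sigma1 = np.cov(feats_a, rowvar=False)
    sigma2 = np.cov(feats_b, rowvar=False)

    diff = mu1 - mu2
    # Matrix square root of the covariance product; small imaginary
    # components from numerical error are discarded.
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real

    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

Identical feature sets yield a distance near zero, and the distance grows with the separation between the two feature distributions, which is what makes the choice of feature extractor so consequential for the resulting rankings.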


Notes

  1. https://sliver07.grand-challenge.org/.

  2. https://nihcc.app.box.com/v/ChestXray-NIHCC.

  3. http://medicaldecathlon.com/, CC-BY-SA 4.0 license.

  4. https://www.creatis.insa-lyon.fr/Challenge/acdc/databases.html.


Acknowledgments

Research reported in this publication was supported in part by resources of the Image Guided Cancer Therapy Research Program (IGCT) at The University of Texas MD Anderson Cancer Center, a generous gift from the Apache Corporation, the National Institutes of Health/NCI under award number P30CA016672, and the Tumor Measurement Initiative through the MD Anderson Strategic Initiative Development Program (STRIDE). We thank the NIH Clinical Center for the ChestX-ray14 dataset, Dr. Rishi Agrawal and Dr. Carol Wu for their generative modeling feedback, Dr. Vikram Haheshri and Dr. Oleg Igoshin for the discussion that led to the hypothesis testing contribution, and Erica Goodoff, Senior Scientific Editor in the Research Medical Library at MD Anderson, for editing this article. GPT-4 was used in the proofreading stage of this manuscript.

Author information

Correspondence to McKell Woodland.

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 56 KB)


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Woodland, M. et al. (2024). Feature Extraction for Generative Medical Imaging Evaluation: New Evidence Against an Evolving Trend. In: Linguraru, M.G., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15012. Springer, Cham. https://doi.org/10.1007/978-3-031-72390-2_9


  • DOI: https://doi.org/10.1007/978-3-031-72390-2_9


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72389-6

  • Online ISBN: 978-3-031-72390-2

  • eBook Packages: Computer Science (R0)
