
DiffuX2CT: Diffusion Learning to Reconstruct CT Images from Biplanar X-Rays

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15101)


Abstract

Computed tomography (CT) is widely used in clinical settings because it delivers detailed 3D images of the human body. However, performing CT scans is not always feasible, owing to radiation exposure and the constraints of certain surgical environments. Reconstructing CT images from ultra-sparse X-rays offers a valuable alternative and has attracted significant interest in scientific research and medical applications. Yet the task is inherently ill-posed and often compromised by artifacts arising from overlapping structures in X-ray images. In this paper, we propose DiffuX2CT, which models CT reconstruction from orthogonal biplanar X-rays as a conditional diffusion process. DiffuX2CT is built on a 3D global-coherence denoising model with a new implicit conditioning mechanism, realized by a newly designed tri-plane decoupling generator and an implicit neural decoder. By doing so, DiffuX2CT achieves structure-controllable reconstruction, recovering 3D structural information from 2D X-rays and thereby producing faithful textures in CT images. As an extra contribution, we collect a real-world lumbar CT dataset, called LumbarV, as a new benchmark for verifying the clinical significance and performance of CT reconstruction from X-rays. Extensive experiments on this dataset and three more publicly available datasets demonstrate the effectiveness of our proposal.
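The conditional diffusion formulation in the abstract can be illustrated with a minimal toy sketch: two orthogonal 2D projections are lifted into a crude 3D conditioning volume, and a DDPM-style reverse process denoises a random volume toward it. Everything here is a hypothetical placeholder; `encode_biplanar` is not the paper's tri-plane decoupling generator, and the linear "predicted noise" stands in for a trained denoising network.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_biplanar(x_front, x_side):
    """Toy stand-in for conditioning on biplanar X-rays: broadcast the two
    orthogonal projections into three axis-aligned feature planes and sum
    them into a conditioning volume (a proxy, not the actual architecture)."""
    d = x_front.shape[0]
    plane_xy = np.broadcast_to(x_front[:, :, None], (d, d, d))  # frontal view
    plane_yz = np.broadcast_to(x_side[None, :, :], (d, d, d))   # lateral view
    plane_xz = 0.5 * (plane_xy + plane_yz)                      # fused plane
    return plane_xy + plane_yz + plane_xz

def denoise_step(v_t, t, cond, betas):
    """One DDPM-style reverse step on a 3D volume; the predicted noise is a
    placeholder linear function of the conditioning volume."""
    beta = betas[t]
    alpha = 1.0 - beta
    alpha_bar = np.prod(1.0 - betas[: t + 1])
    eps_hat = (v_t - cond) * 0.1  # placeholder for the denoising network
    mean = (v_t - beta / np.sqrt(1.0 - alpha_bar) * eps_hat) / np.sqrt(alpha)
    noise = rng.standard_normal(v_t.shape) if t > 0 else 0.0
    return mean + np.sqrt(beta) * noise

# Toy reconstruction: an 8x8x8 volume from two 8x8 "X-rays".
d, T = 8, 10
betas = np.linspace(1e-4, 0.02, T)
cond = encode_biplanar(rng.random((d, d)), rng.random((d, d)))
v = rng.standard_normal((d, d, d))  # start from pure noise
for t in reversed(range(T)):
    v = denoise_step(v, t, cond, betas)
print(v.shape)  # (8, 8, 8)
```

The sketch only shows the shape of the computation: orthogonal 2D views jointly constrain a 3D sample at every reverse-diffusion step, which is the structure-controllable conditioning the paper describes.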



Acknowledgements

The work was supported by the National Key Research and Development Program of China (Grant No. 2023YFC3300029). This research was also supported by the Zhejiang Provincial Natural Science Foundation of China under Grant No. LD24F020007, the Beijing Natural Science Foundation under Grant No. L223024, the National Natural Science Foundation of China under Grant Nos. 62076016, 62176068, and 12201024, the "One Thousand Plan" projects in Jiangxi Province (Jxsg2023102268), the Beijing Municipal Science & Technology Commission and Administrative Commission of Zhongguancun Science Park under Grant No. Z231100005923035, and the Taiyuan City "Double Hundred Research Action" (2024TYJB0127).

Author information


Corresponding authors

Correspondence to Juan Zhang or Xiantong Zhen.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 8308 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Liu, X. et al. (2025). DiffuX2CT: Diffusion Learning to Reconstruct CT Images from Biplanar X-Rays. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15101. Springer, Cham. https://doi.org/10.1007/978-3-031-72775-7_26

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-72775-7_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72774-0

  • Online ISBN: 978-3-031-72775-7

  • eBook Packages: Computer Science; Computer Science (R0)
