Abstract
Computed tomography (CT) is widely used in clinical settings because it delivers detailed 3D images of the human body. However, CT scans are not always feasible due to radiation exposure and constraints in certain surgical environments. Reconstructing CT images from ultra-sparse X-rays offers a valuable alternative and has attracted significant interest in scientific research and medical applications. Yet it poses great challenges: the problem is inherently ill-posed and often compromised by artifacts arising from overlapping structures in X-ray images. In this paper, we propose DiffuX2CT, which models CT reconstruction from orthogonal biplanar X-rays as a conditional diffusion process. DiffuX2CT is built on a 3D globally coherent denoising model with a new implicit conditioning mechanism, realized by a newly designed tri-plane decoupling generator and an implicit neural decoder. By doing so, DiffuX2CT achieves structure-controllable reconstruction, recovering 3D structural information from 2D X-rays and thereby producing faithful textures in CT images. As an extra contribution, we collect a real-world lumbar CT dataset, called LumbarV, as a new benchmark to verify the clinical significance and performance of CT reconstruction from X-rays. Extensive experiments on this dataset and three more publicly available datasets demonstrate the effectiveness of our method.
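The conditional diffusion process described above follows the standard DDPM formulation: a learned denoiser is conditioned on features extracted from the biplanar X-rays and iteratively refines a noisy volume into a CT image. As a rough, framework-agnostic sketch of that ancestral sampling loop (the noise schedule, shapes, and `eps_model` placeholder are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule and cumulative alpha products (standard DDPM)."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def sample(eps_model, cond, shape, T=1000, seed=0):
    """Ancestral DDPM sampling of a volume x_0, conditioned on X-ray
    features `cond`. `eps_model(x_t, t, cond)` predicts the added noise."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(T)
    x = rng.standard_normal(shape)                # x_T ~ N(0, I)
    for t in reversed(range(T)):
        eps = eps_model(x, t, cond)               # conditional noise prediction
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise      # posterior sample x_{t-1}
    return x

# Toy usage with a dummy denoiser standing in for the conditional 3D network.
vol = sample(lambda x, t, c: np.zeros_like(x), cond=None, shape=(4, 4, 4), T=10)
print(vol.shape)  # (4, 4, 4)
```

In DiffuX2CT the conditioning signal `cond` would come from the tri-plane decoupling generator and implicit neural decoder applied to the two orthogonal X-rays; here it is left abstract to keep the sketch self-contained.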
Acknowledgements
The work was supported by the National Key Research and Development Program of China (Grant No. 2023YFC3300029). This research was also supported by the Zhejiang Provincial Natural Science Foundation of China under Grant No. LD24F020007, the Beijing Natural Science Foundation under Grant No. L223024, the National Natural Science Foundation of China under Grant Nos. 62076016, 62176068, and 12201024, the "One Thousand Plan" projects in Jiangxi Province under Grant No. Jxsg2023102268, the Beijing Municipal Science & Technology Commission and Administrative Commission of Zhongguancun Science Park under Grant No. Z231100005923035, and the Taiyuan City "Double Hundred Research Action" under Grant No. 2024TYJB0127.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, X. et al. (2025). DiffuX2CT: Diffusion Learning to Reconstruct CT Images from Biplanar X-Rays. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15101. Springer, Cham. https://doi.org/10.1007/978-3-031-72775-7_26
DOI: https://doi.org/10.1007/978-3-031-72775-7_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72774-0
Online ISBN: 978-3-031-72775-7
eBook Packages: Computer Science; Computer Science (R0)