Abstract
Computed tomography (CT) is widely used in clinical settings because it delivers detailed 3D images of the human body. However, CT scans are not always feasible due to radiation exposure and constraints in certain surgical environments. Reconstructing CT images from ultra-sparse X-rays offers a valuable alternative and has attracted significant interest in scientific research and medical applications. Yet it poses great challenges: the problem is inherently ill-posed and often compromised by artifacts arising from overlapping structures in X-ray images. In this paper, we propose DiffuX2CT, which models CT reconstruction from orthogonal biplanar X-rays as a conditional diffusion process. DiffuX2CT is built on a 3D globally coherent denoising model with a new implicit conditioning mechanism, realized by a newly designed tri-plane decoupling generator and an implicit neural decoder. By doing so, DiffuX2CT achieves structure-controllable reconstruction, recovering 3D structural information from 2D X-rays and thereby producing faithful textures in CT images. As an extra contribution, we collect a real-world lumbar CT dataset, called LumbarV, as a new benchmark to verify the clinical significance and performance of CT reconstruction from X-rays. Extensive experiments on this dataset and three more publicly available datasets demonstrate the effectiveness of our method.
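The conditional diffusion process described above follows the standard DDPM formulation: a learned denoiser is conditioned on features extracted from the biplanar X-rays and iteratively refines a noisy volume into a CT image. As a rough, framework-agnostic sketch of that ancestral sampling loop (the noise schedule, shapes, and `eps_model` placeholder are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def make_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule and cumulative alpha products (standard DDPM)."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def sample(eps_model, cond, shape, T=1000, seed=0):
    """Ancestral DDPM sampling of a volume x_0, conditioned on X-ray
    features `cond`. `eps_model(x_t, t, cond)` predicts the added noise."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(T)
    x = rng.standard_normal(shape)                # x_T ~ N(0, I)
    for t in reversed(range(T)):
        eps = eps_model(x, t, cond)               # conditional noise prediction
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise      # posterior sample x_{t-1}
    return x

# Toy usage with a dummy denoiser standing in for the conditional 3D network.
vol = sample(lambda x, t, c: np.zeros_like(x), cond=None, shape=(4, 4, 4), T=10)
print(vol.shape)  # (4, 4, 4)
```

In DiffuX2CT the conditioning signal `cond` would come from the tri-plane decoupling generator and implicit neural decoder applied to the two orthogonal X-rays; here it is left abstract to keep the sketch self-contained.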
Acknowledgements
The work was supported by the National Key Research and Development Program of China (Grant No. 2023YFC3300029). This research was also supported by the Zhejiang Provincial Natural Science Foundation of China under Grant No. LD24F020007, the Beijing Natural Science Foundation under Grant No. L223024, the National Natural Science Foundation of China under Grant Nos. 62076016, 62176068, and 12201024, the "One Thousand Plan" projects in Jiangxi Province under Grant No. Jxsg2023102268, the Beijing Municipal Science & Technology Commission and Administrative Commission of Zhongguancun Science Park under Grant No. Z231100005923035, and the Taiyuan City "Double Hundred Research Action" under Grant No. 2024TYJB0127.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, X. et al. (2025). DiffuX2CT: Diffusion Learning to Reconstruct CT Images from Biplanar X-Rays. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15101. Springer, Cham. https://doi.org/10.1007/978-3-031-72775-7_26
DOI: https://doi.org/10.1007/978-3-031-72775-7_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72774-0
Online ISBN: 978-3-031-72775-7
eBook Packages: Computer Science; Computer Science (R0)