Abstract
Accurate geometric quantification of the human heart is a key step in the diagnosis of numerous cardiac diseases and in the management of cardiac patients. Ultrasound imaging is the primary modality for cardiac imaging; however, acquisition requires high operator skill, and interpretation and analysis are difficult due to artifacts. Reconstructing cardiac anatomy in 3D can enable discovery of new biomarkers and make imaging less dependent on operator expertise, yet most ultrasound systems only have 2D imaging capabilities. We propose both a simple alteration to the Pix2Vox++ networks that sizeably reduces memory usage and computational complexity, and a pipeline to reconstruct 3D anatomy from standard 2D cardiac views, effectively enabling 3D anatomical reconstruction from limited 2D data. We evaluate our pipeline on synthetically generated data, achieving accurate 3D whole-heart reconstructions (peak intersection over union score \(> 0.88\)) from just two standard anatomical 2D views of the heart. We also show preliminary results using real echo images.
D. Stojanovski—This work was supported by the Wellcome/EPSRC Centre for Medical Engineering [WT203148/Z/16/Z], by the British Heart Foundation [TG/17/3/33406] and the National Institute for Health Research (NIHR) Biomedical Research Centre at Guy’s and St Thomas’ NHS Foundation Trust and King’s College London. Pablo Lamata holds a Wellcome Trust Senior Research Fellowship [209450/Z/17/Z]. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.
1 Introduction
1.1 Motivation and Background
The most common imaging modality for cardiac assessment worldwide is ultrasound (US) [2]: it is more affordable and safer than other modalities, and it offers higher temporal resolution. Its main limitation, particularly in cardiac applications (echocardiography), is the reliance on operator skill to acquire high-quality standard views.
Most US systems only have 2D capabilities, and most echo protocols are limited to 2D modes [13]. 3D data is desirable for both anatomical and functional understanding of the heart. The heart is a complex 3D structure, and consequently most features cannot be fully captured within 2D planes: ventricular and atrial walls are curved surfaces, and valve hinges do not sit on a 2D circle but rather on a 3D saddle shape. As a result, current protocols require many standard planes to assess most features, whereas a high-quality 3D reconstruction would capture them all. Cardiac function is quantified primarily via tissue motion and blood flow. Blood quantity is volumetric, but 2D approximations are usually used, e.g. 2D left ventricular delineation on a 4-chamber view to compute ejection fraction; this has been common both in classical clinical disease detection and in deep learning methods [10, 15]. Tissue motion is analysed using 2D components of the motion (e.g. longitudinal, radial), yet cardiac motion is complex and combines torsion, vertical motion and displacement, which are fundamentally 3D phenomena. Capturing only 2D views can also impede further analysis if view quality is degraded by factors such as foreshortening and artifacts.
3D volumes would allow more accurate quantification of cardiac geometry, and in turn allow clinicians to reduce the rate of disease misdiagnosis caused by incorrect quantification from 2D data. 3D ultrasound probes are available for clinical use; however, they are very prone to artifacts, particularly in moving anatomies such as the heart, and as a result are rarely used in practice [9].
The ability to quantify cardiac function using standard 2D US systems, without the need for external trackers to define a spatial geometry, can be of great clinical use, but is limited by lack of paired US and native 3D ground truth anatomy. The aim of this work is to develop a method for reconstructing a full 3D representation of the heart from 2D image views.
1.2 Related Works
The advent of Deep Learning (DL) has led to the development of 3D image reconstruction methods that have been shown to successfully learn shape and structure from partial observations, achieving state-of-the-art (SOTA) performance in natural and medical imaging.
2D to 3D Reconstruction in Natural Images. Current SOTA methods using DL allow accurate 2D to 3D reconstruction of objects from one or more RGB images, without the aid of camera positioning calibration information. A popular standardized dataset used to compare reconstruction techniques is ShapeNetCore [4], which covers 55 object categories with over 51,300 unique 3D models. In 2019, SOTA reconstruction accuracy on ShapeNetCore was achieved by Xie et al., who reached 0.706 Intersection over Union (IoU) with the Pix2Vox++ (PiVox) network [16]. Two variants were proposed, Pix2Vox++/Fast (PiVox/Fast) and Pix2Vox++/Accurate (PiVox/Accurate), with PiVox/Fast being a lighter-weight, albeit lower-performing, version of the full PiVox/Accurate network. However, PiVox/Fast remains expensive to train and run inference with, severely limiting the reconstruction resolution achievable on currently available single Graphics Processing Units (GPUs).
Ultrasound Specific 2D to 3D Reconstruction. A number of US-specific 3D reconstruction algorithms have been developed showing effective performance, but they have not been applied to cardiac reconstruction. Cerrolaza et al. [1] performed fetal skull reconstruction using a hierarchical conditional generative network based on Variational Auto-Encoders (VAEs), achieving a Dice Coefficient (DC) of 0.91 from three orthogonal US views. In contrast to our goal of cardiac reconstruction, their reconstruction target has a generally regular shape that does not deform during imaging.
Prevost et al. [12] combined DL with an Inertial Measurement Unit (IMU), feeding two consecutive frames plus an optical flow channel into the network, along with the Euler angles provided by the IMU. On a phantom they achieved minimum, median and maximum drifts of 1.70, 18.30 and 36.90 mm respectively.
1.3 Contributions of This Study
No previous work has addressed 3D cardiac shape reconstructions from untracked 2D echo views. As a first step towards this goal, we present an exploratory investigation using synthetic echo data: 1) A pipeline to synthesize 2D echo views with corresponding 3D ground truth; 2) A demonstration that PiVox can produce accurate 2D to 3D reconstructions of realistic synthetic hearts; 3) A simple modification to both PiVox networks which vastly reduces memory and computational expense, while still achieving high reconstruction accuracy.
2 Methods
To explore reconstruction of a 3D heart from 2D data, we used two types of synthetic data: 1) A segmentation dataset, containing binary tissue masks (segmentations) on 2D standard views. Masks are simulated by slicing 3D computational models (which allows using 3D ground truths for evaluation); 2) A synthetic US dataset, containing synthetic 2D standard US views, generated from the tissue masks. We used both for model training and testing. An overview of the proposed pipeline is shown in Fig. 1.
2.1 Synthetic 2D Segmentations from 3D Heart Models
We obtain synthetic 2D segmentations by slicing 3D mesh models of the heart at standard echocardiographic views. These are used both to assess how the 2D to 3D reconstruction networks are affected by the large variation present in real ultrasound images, and to provide insight into the feasibility of training on synthetic data and testing on real ultrasound data. If training on purely synthetic data and testing on real data proves feasible, it would allow generation of large training datasets and greatly reduce the amount of paired 3D CT/2D ultrasound data required to validate methodologies.
Our slice extraction technique utilizes a combination of the Visualization Toolkit (VTK) [8] and the PyVista Python packages [3]. The mesh data used was a set of 1000 synthetic cardiac meshes created by Rodero et al. [14].
The first step in extracting the standard plane cardiac views is to define, for each view, either three landmarks to which a plane can be fit, or two landmarks and a projection along a vector. We defined these automatically calculated landmarks based on their relation to various cardiac structures present in the meshes/segmentations (cf. Table 1). A trained sonographer examined 10 sets of segmentations to confirm that the chosen landmarks resulted in suitably realistic slices extracted from our synthetic meshes.
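A minimal sketch of the plane-fitting and slicing step is shown below, assuming three landmark coordinates are already available; the filename and coordinate values are illustrative placeholders, not those used in our pipeline:

```python
import numpy as np
import pyvista as pv

# Load one synthetic heart mesh (placeholder filename; in practice we use
# meshes from the Rodero et al. [14] cohort).
mesh = pv.read("heart_model.vtk")

# Three landmarks defining a standard view (illustrative coordinates;
# in our pipeline these are computed automatically, cf. Table 1).
p1 = np.array([0.0, 0.0, 0.0])
p2 = np.array([40.0, 0.0, 10.0])
p3 = np.array([20.0, 35.0, 5.0])

# Fit a plane through the landmarks and slice the mesh with it.
normal = np.cross(p2 - p1, p3 - p1)
normal = normal / np.linalg.norm(normal)
view_slice = mesh.slice(normal=normal, origin=p1)  # 2D cross-section (PolyData)
```

The resulting cross-section is then rasterized into a binary tissue mask for training.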
The left ventricular apex (LVA) was found by ray casting from the centre of mass of the mitral valve to all mesh cell faces in the LV mesh. If there was a pair of intersection points along the ray (i.e. the ray penetrated both the endo- and epicardial wall), the distance between these intersection points was calculated. The ray with the shortest distance was chosen as defining the LVA, locating the thinnest part of the cardiac wall and minimizing apical foreshortening.
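The following is an illustrative sketch of this search (not the exact repository implementation), assuming `lv_mesh` is a closed LV surface containing both endo- and epicardial walls and `mv_center` is the mitral valve centre of mass:

```python
import numpy as np
import pyvista as pv

def find_lv_apex(lv_mesh: pv.PolyData, mv_center: np.ndarray) -> np.ndarray:
    """Return the epicardial point where the LV wall is thinnest."""
    targets = lv_mesh.cell_centers().points  # one ray per cell face
    best_point, best_thickness = None, np.inf
    for target in targets:
        # Extend the ray beyond the wall so both surfaces can be hit.
        end = mv_center + 2.0 * (target - mv_center)
        points, _ = lv_mesh.ray_trace(mv_center, end)
        if len(points) == 2:  # a pair of endo-/epicardial intersections
            thickness = np.linalg.norm(points[1] - points[0])
            if thickness < best_thickness:
                best_thickness, best_point = thickness, points[1]
    return best_point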
2.2 Synthetic Ultrasound Images
Synthetic ultrasound images (i.e. with realistic appearance) are generated from the same 3D heart models, to study feasibility on realistic echo data while retaining an exact ground truth. We adapted the technique proposed by Gilbert et al. [5] using the CAMUS dataset [7]. In brief, pseudo-images, i.e. tissue masks corresponding to 2- and 4-chamber views, with added noise and Gaussian blurring, are used as input to a CycleGAN network [17] trained with unpaired US images to create the final synthetic images, as exemplified in Fig. 2.
Producing high quality synthetic US images required small variations in 1) the noise parameters, 2) the sequence of additive noise and blurring, and 3) the size of the Gaussian blur kernel. These parameters were changed to accommodate a different input image resolution and our requirement to not perform any geometric transforms, e.g. changing of input anatomy dimensions.
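A minimal sketch of the pseudo-image step is given below; the intensity mapping, noise level, operation ordering and kernel size are illustrative placeholders rather than our tuned values:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def make_pseudo_image(label_map: np.ndarray,
                      noise_std: float = 0.15,
                      blur_sigma: float = 3.0,
                      seed: int = 0) -> np.ndarray:
    """Turn a 2D tissue mask into a blurred, noisy pseudo-image."""
    rng = np.random.default_rng(seed)
    img = label_map.astype(np.float32) / max(label_map.max(), 1)  # labels -> [0, 1]
    img = img + rng.normal(0.0, noise_std, size=img.shape)        # additive noise...
    img = gaussian_filter(img, sigma=blur_sigma)                  # ...then Gaussian blur
    return np.clip(img, 0.0, 1.0)
```

The resulting pseudo-images are then passed to the CycleGAN to obtain the final US-like appearance.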
2.3 Efficient Pix2Vox++
The general Pix2Vox++ (PiVox) architecture comprises parallel encoder and decoder branches for each input view, whose outputs are then passed into the fusion and refiner modules. The decoder in particular is memory- and compute-intensive due to its 3D deconvolutional kernels. This greatly limits the resolution of the 3D volumes that can be used, motivating a more efficient network that retains high reconstruction accuracy. The decoder also contains a very large number of learnable parameters, in turn requiring a very large training set.
Xie et al. proposed a lighter weight, albeit lower performing, variant of the PiVox/Accurate network, referred to as PiVox/Fast, which reduces memory and computational complexity by 1) using ResNet-18 instead of ResNet-50, 2) decreasing de/convolution kernel sizes, and 3) removing the refiner module.
We propose a simple adaptation of the PiVox networks, referred to as E-PiVox, that greatly decreases both memory usage and computational expense with minimal impact on performance. We achieve this by adding a 3D convolution at the end of the encoder module, along the input image dimension, to reduce the decoder to a single branch for any number of input images. All information propagated through the parallel decoder branches of the PiVox networks passes through a compressed latent space, which in theory allows a single decoder branch to perform almost as well as multiple branches.
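The PyTorch sketch below illustrates the idea; the feature-map shapes and kernel size are assumptions for illustration, and the exact implementation is in our released code:

```python
import torch
import torch.nn as nn

class ViewFusion(nn.Module):
    """Collapse per-view encoder features to a single decoder input."""

    def __init__(self, n_views: int):
        super().__init__()
        # The view axis is treated as the Conv3d channel axis, so the
        # convolution mixes information across views while its kernel
        # slides over the (C, H, W) feature dimensions.
        self.fuse = nn.Conv3d(n_views, 1, kernel_size=3, padding=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, n_views, C, H, W) stacked per-view encoder outputs
        return self.fuse(feats).squeeze(1)  # (batch, C, H, W)

feats = torch.randn(4, 2, 256, 8, 8)        # 2 views; feature size assumed
single_branch_input = ViewFusion(2)(feats)  # (4, 256, 8, 8)
```

In this sketch the number of views is fixed at construction time, so one fused tensor replaces the per-view decoder branches with a single branch regardless of how many views are fed in at that size.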
2.4 2D to 3D Reconstruction of Synthetic Hearts
Implementation Details.
The dataset of 1000 cardiac meshes was divided into a 70/15/15\(\%\) train/validation/test split, which remained unchanged across training runs. The models were trained on 1) the binary segmentation masks, 2) the label maps converted to realistic echo-like images using the CycleGAN network, and 3) the ShapeNet dataset. All training was performed using PyTorch 1.9.1 [11] on an Nvidia RTX 3090 for 200 epochs. The Adam optimizer was used with \(\beta _1 = 0.9\) and \(\beta _2 = 0.999\) [6]. The code is available at https://github.com/david-stojanovski/E-Pix2Vox-reconstruction
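For reference, this optimizer configuration corresponds to the PyTorch call sketched below; the learning rate is an assumption for illustration (it is not stated in this section; see the released code for the exact setting), and the model is a stand-in:

```python
import torch

model = torch.nn.Linear(8, 8)  # stand-in for the (E-)PiVox network
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3,             # assumed value, not reported in this section
    betas=(0.9, 0.999),  # beta_1 and beta_2 as reported above
)
```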
3 Experiments and Results
Reconstruction accuracy was assessed using the thresholded Intersection over Union (IoU) [16].
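Following [16], this metric can be written as shown below, where \(\hat{p}_{ijk}\) is the predicted occupancy probability of voxel \((i,j,k)\), \(p_{ijk} \in \{0,1\}\) the ground-truth occupancy, \(t\) the binarization threshold, and \(\mathbb{I}(\cdot)\) the indicator function:

\[
\mathrm{IoU} = \frac{\sum_{i,j,k} \mathbb{I}\left(\hat{p}_{ijk} > t\right)\, \mathbb{I}\left(p_{ijk} = 1\right)}{\sum_{i,j,k} \mathbb{I}\left[\, \mathbb{I}\left(\hat{p}_{ijk} > t\right) + \mathbb{I}\left(p_{ijk} = 1\right) > 0 \,\right]}
\]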
The final reconstruction results of the ShapeNet and cardiac training runs are shown in Tables 2 and 3 respectively. Table 2 shows that PiVox/Accurate was consistently the best performing network; however, E-PiVox/Fast and E-PiVox/Accurate were consistently within \(1.2\%\) and \(2\%\) respectively of PiVox/Fast and PiVox/Accurate, while being far more memory- and compute-efficient.
It can be seen in Table 2 that our E-PiVox networks were able to come very close to the performance of the much more computationally expensive PiVox networks. The relationship between computational expense, memory usage and the number of input views is shown in Fig. 3, highlighting the importance of efficient network architectures when using a larger number of views (such as in video recordings) or higher resolutions (e.g. full-resolution CT).
Table 3 shows the results when using both binary segmentations and realistic CycleGAN-generated ultrasound images as input. As shown in Table 3, both E-PiVox networks generally outperformed the PiVox networks in the relative comparisons. Reconstruction accuracy dropped most in areas where the heart walls are thin, generally in the atrial region. Small errors appearing over the entire structure seem to arise from the discretization of exact voxel locations.
We present examples of 3D heart reconstructions in Fig. 4, including preliminary results using real 2D echo images from the CAMUS dataset. Synthetically trained reconstructions show close correspondence with the ground truth anatomy (black wireframe). However, a reduced number of input views resulted in decreased accuracy and clear non-physiological holes. These phenomena were exacerbated when using US-like images (see Fig. 4C). Larger differences were observed in the LVA and valve planes for all models.
4 Discussion and Conclusions
This work provides a proof of concept that a complex geometry like the human heart can be reconstructed with reasonable accuracy from a limited number of standard 2D anatomical views, and that this can be achieved with efficient training and inference.
Results in Table 3 show that the PiVox network can successfully reconstruct a full heart with a peak IoU of 0.903 from 9 segmented views, with a minimal decrease in performance to 0.881 for just 2 views. The 0.741 IoU achieved with just 2 synthetic US images represents acceptable performance given the much more challenging input data, which emulates real-world acquisitions.
It is important to note that real US images present great variability in contrast and appearance. The preliminary results using real data (panel D in Fig. 4) show that reconstructions are possible, albeit in need of further improvement. Given a sufficiently large set of real US images, accurate 3D reconstructions could be possible from a small number of standard clinical 2D views.
The keystone of this contribution is the synthesis of training data (i.e. 2D views) with idealized 3D ground truth. This approach addresses limitations in medical data availability, data privacy, and the cost of expert annotation. Preliminary results show promise in applying models trained on synthetic data to real data, but further research and evidence are needed.
The main limitation of our study is that it is based on the anatomical variability of a synthetic cohort of healthy subjects. As such, the performance could dramatically drop in the presence of disease.
In conclusion, this work demonstrates, in a synthetic workbench, the feasibility of 3D cardiac reconstruction from standard 2D views.
References
Cerrolaza, J.J., et al.: 3D fetal skull reconstruction from 2DUS via deep conditional generative networks. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11070, pp. 383–391. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00928-1_44
Braga, J.R., Leong-Poi, H., Rac, V.E., Austin, P.C., Ross, H.J., Lee, D.S.: Trends in the use of cardiac imaging for patients with heart failure in Canada. JAMA Netw. Open 2(8), 1–13 (2019). https://doi.org/10.1001/jamanetworkopen.2019.8766
Castro, D.d.l.I., et al.: daavoo/pyntcloud: v0.1.6 (2022). https://doi.org/10.5281/ZENODO.5841822
Chang, A.X., et al.: ShapeNet: an information-rich 3D model repository (2015). arXiv:1512.03012
Gilbert, A., Marciniak, M., Rodero, C., Lamata, P., Samset, E., McLeod, K.: Generating synthetic labeled data from existing anatomical models: an example with echocardiography segmentation. IEEE Trans. Med. Imaging 40(10), 2783–2794 (2021). https://doi.org/10.1109/TMI.2021.3051806
Kingma, D.P., Ba, J.L.: Adam: A method for stochastic optimization. In: 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, pp. 1–15 (2015)
Leclerc, S., et al.: Deep learning for segmentation using an open large-scale dataset in 2D echocardiography. IEEE Trans. Med. Imaging 38(9), 2198–2210 (2019). https://doi.org/10.1109/TMI.2019.2900516
Lowekamp, B.C., Chen, D.T., Ibáñez, L., Blezek, D.: The design of SimpleITK. Front. Neuroinform. 7, 1–14 (2013). https://doi.org/10.3389/fninf.2013.00045
Nelson, T.R., Pretorius, D.H., Hull, A., Riccabona, M., Sklansky, M.S., James, G.: Sources and impact of artifacts on clinical three-dimensional ultrasound imaging. Ultrasound Obstet. Gynecol. 16(4), 374–383 (2000). https://doi.org/10.1046/j.1469-0705.2000.00180.x
Ouyang, D., et al.: Video-based AI for beat-to-beat assessment of cardiac function. Nature 580(7802), 252–256 (2020). https://doi.org/10.1038/s41586-020-2145-8
Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. In: Advances in Neural Information Processing Systems 32 (NeurIPS 2019)
Prevost, R., et al.: 3D freehand ultrasound without external tracking using deep learning. Med. Image Anal. 48, 187–202 (2018). https://doi.org/10.1016/j.media.2018.06.003
Robinson, S.: A practical guideline for performing a comprehensive transthoracic echocardiogram in adults: the British Society of Echocardiography minimum dataset. Echo Res. Pract. 7(4), G59–G93 (2020). https://doi.org/10.1530/ERP-20-0026
Rodero, C., et al.: Linking statistical shape models and simulated function in the healthy adult human heart. PLoS Comput. Biol. 17(4), 1–28 (2021). https://doi.org/10.1371/journal.pcbi.1008851
Upton, R., et al.: Automated echocardiographic detection of severe coronary artery disease using artificial intelligence. JACC Cardiovasc. Imaging, 1–13 (2022). https://doi.org/10.1016/j.jcmg.2021.10.013
Xie, H., Yao, H., Sun, X., Zhou, S., Zhang, S.: Pix2Vox: context-aware 3D reconstruction from single and multi-view images. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 2690–2698 (2019). https://doi.org/10.1109/ICCV.2019.00278
Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks (2017). https://doi.org/10.48550/ARXIV.1703.10593. arXiv:1703.10593
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2022 The Author(s)