License: CC BY 4.0
arXiv:2307.14288v4 [cs.CV] 04 Mar 2024

US & MRI Image Fusion Based on Markerless Skin Registration

Martina Paccini, Giacomo Paschina, Stefano De Beni, Andrei Stefanov,
Velizar Kolev, Giuseppe Patanè
Abstract

This paper presents an innovative automatic fusion imaging system that combines 3D CT/MR images with real-time ultrasound (US) acquisition. The system eliminates the need for external physical markers and complex training, making image fusion feasible for physicians with different experience levels. The integrated system involves a portable 3D camera for patient-specific surface acquisition, an electromagnetic tracking system, and US components. The fusion algorithm comprises two main parts: skin segmentation and rigid co-registration, both integrated into the US machine. The co-registration software aligns the surface extracted from CT/MR images with patient-specific coordinates, facilitating rapid and effective fusion. Experimental testing in different settings validates the system's accuracy, computational efficiency, noise robustness, and operator independence. The co-registration error remains within the acceptable range (i.e., under 1 cm).

1 Introduction

Medical imaging offers many image acquisition techniques, which allow us to obtain information related to different tissues with various settings, such as signal-to-noise ratio, contrast, and resolution. Generally, high-resolution imaging has the drawback of requiring extended image acquisition time, thus making these techniques unsuitable for real-time image processing and analysis. For instance, surgical tool guidance requires monitoring and guiding the insertion of a biopsy needle in real time. Similarly, in cardiological imaging, ongoing monitoring of organ functional reactions is crucial for evaluating heart function. In contrast to high-resolution imaging (e.g., CT, PET, MRI), US imaging allows real-time acquisition. It assists physicians in various interventional applications, from simple biopsies to more thorough procedures like mini-invasive tumour treatment or neurosurgery. However, US has a reduced field of view compared to other imaging techniques and lower image quality, e.g., in resolution and in the depiction of certain kinds of tissue, such as soft tissues. Therefore, medical imaging is moving toward combining real-time US with other acquisition modalities. This image combination, or fusion, is applied in several contexts, such as diagnostics and mini-invasive surgical interventions. The core concept in fusion imaging is accurately registering the patient's anatomy across different medical imaging data, such as US, CT, or MR images, which implies aligning the different acquisitions into a common reference system. This process often leads to systems that are overcomplicated for actual clinical application. Image fusion (Sect. 2) is carried out by tracking the US probe's position, orientation, and displacements during the acquisition within a reference system common to the US images and the other modalities considered. This setting implies the use of probe trackers of different natures: (i) optical trackers can simultaneously track several objects with high precision, but require a line of sight that is difficult to guarantee in an interventional room; (ii) electromagnetic (EM) trackers are highly sensitive to metals and require one wire for each object to be tracked. Moreover, EM tracking systems typically require selecting markers on the patient with a wand (or a needle guide wire or catheter) and simultaneously selecting the same markers on the pre-procedural CT/MRI image, or leveraging fiducial patches on that image [AJKK+12].

This paper introduces an innovative image fusion system (Fig. 1) that combines 3D CT/MR images with US acquisition in real time (Sect. 3). The system consists of hardware and software components designed to integrate seamlessly into a clinical environment, particularly in interventional radiology, with a direct approach that does not require specific training. This efficient and intuitive integration is achieved through a 3D depth camera that acquires a 3D surface of the subject undergoing the US exam as quickly as taking a photograph [Dep]. Indeed, the surface obtained by the 3D camera bridges the 3D anatomy acquired by MR/CT and the US images, allowing their fast fusion. The highly portable 3D camera can be introduced into an operating room without compromising the pre-existing set-up.
The other hardware components of the image fusion system include a US system and a simplified EM tracking system that does not require the placement of physical markers or fiducial patches. The tracking system comprises an electronic unit, a mid-range transmitter, and sensors that track the position and orientation of the 3D camera and the US probe. The transmitter generates an electromagnetic field and can simultaneously track up to four sensors 70 times per second. The sensors are placed on the components to be tracked to establish their spatial relationship within the fusion imaging setup (Fig. 1). One sensor is connected to the US probe, and the other is associated with the 3D camera, allowing the representation of the US image and the 3D surface acquired by the camera in a unique coordinate system (i.e., the tracking coordinates). The software components can be divided into tracking, surface co-registration, and visualisation software. The system segments the skin surface from the CT/MRI and generates a 3D surface overlaid on the patient's 3D rendering. Then, the clinician acquires the 3D skin with the 3D camera, and the co-registration software enables the image fusion within a few seconds, together with the visualisation of the registration error, facilitating the identification of potential mismatches in the scan area. Upon successful registration, the system presents the MRI or CT images alongside the US image in various visualisation modes. This allows the clinician to access the corresponding anatomical information and real-time US data during the examination. Differently from state-of-the-art methods, where the registration techniques rely mainly on the doctor's ability and require a long learning curve, the proposed image fusion is suitable even for radiologists with lower experience levels. Moreover, the proposed method avoids external physical markers, which, despite interesting results [SPR+18], may not comply with the existing hospital workflow, since marker positioning requires time and can be error-prone. The skin segmentation method developed is highly general and can be applied to different anatomical regions and image types. In contrast, recent AI-based methods for automatic liver or vessel-tree segmentation require sequences that are not always present in the patient data set and suffer from the high variability of the images from series to series. The proposed image fusion system has been tested considering different aspects: the co-registration accuracy between MR/CT and US images (millimetric error), the computational cost for real-time applications (in seconds), noise robustness, and independence from the operator and setting (Sects. 4 and 5).

2 Related work

Fusion imaging

Fusion imaging is included in the guidelines of several clinical procedures, such as targeted prostate biopsy, where it provides more comfort for the patient and more reliable tissue sampling [BMWT16, GvdAB+16]. In abdominal applications, fusion imaging is widely used for liver tumour treatment with ablation techniques based on radiofrequency (RF) needles, microwave (MW) antennas, laser fibres, or cryoprobes. All these techniques require placing ablation electrodes or applicators into the lesion and deploying energy until the tissue reaches a temperature above 65°C (RF, MW, laser) or below -18°C (cryoablation), causing cellular death. The main challenge for fusion imaging is reaching millimetric accuracy between the target in the MRI/CT and in the real-time US imaging, since several factors influence the result. Firstly, physical phenomena can reduce the accuracy depending on the nature of the tracking system: an optical tracker can produce errors if the line of sight is not maintained, and EM trackers can be affected by metal distortion. Furthermore, the different posture of the patient between the CT/MR acquisition and the current position during the percutaneous procedure under US guidance can negatively affect the fusion accuracy. Less predictable movements due to the patient's breathing and the different position of the organ between the CT/MR acquisition and the current examination must also be taken into account. Considering the liver, the patient's breathing generally induces a mismatch between the inspiration and expiration phases, generating a targeting error of up to 5 cm [Lee14].

Different clinical procedures aim to reduce the error sources by either driving the patient's breathing during the second-modality examination or controlling (manually or aided by a breathing machine) the breathing movements during the procedure. Fusion imaging systems generally calibrate the organ directly to minimise possible errors. This solution fits well for those applications (neuro-oncology, musculoskeletal) where the organ is not subject to deformation. The solutions introduced for deforming organs apply registration based on elastic fusion, trying to deform the original image following the organ's shape or the vessel tree depicted by a US image or volume. Unfortunately, all these solutions are palliative, since the deformation is applied without knowing the organ's rigidity and under the assumption that rigidity is homogeneous inside the organ, whereas the nodule is generally harder than the parenchyma, as elastography confirms. Additional sensors are also used to track the patient's breathing and synchronise the two image modalities at their best match during the breathing phase [LLM+19, SGL+20, MHvN+23, YDK+15].

Skin segmentation

In breast image analysis, a few works have segmented the skin as part of their pipelines. For the diagnosis of breast diseases with dynamic contrast-enhanced MRI (DCE-MRI), the segmentation of the breast's first skin layer has been obtained by pre-processing the image with median filters and mathematical morphology, followed by the identification of the upper boundary of the breast, which is the skin boundary [LCCC18]. However, the method used to identify the upper boundary is not explicitly described and focuses only on DCE-MRI images. Breast skin identification on classical CT and MRI can also be obtained through thresholding followed by morphological filters [JGM+13] or a 3D vector-based connected-component algorithm [WPI+12]. Thresholds have also been leveraged in other anatomical districts, applied to the raw image [BYK+12] or after a pre-processing step aimed at edge enhancement [AnA]. These works apply thresholding to the whole image to classify each pixel as background (black) or body (white) and then use other filtering methods to clean the obtained result. Skin segmentation is generally applied to CT images, since the skin's Hounsfield Unit (HU) value is known [BYK+12], and cannot be applied directly to other imaging modalities. In [BYM+16], the watershed transform from markers has been applied to a gradient image containing light-to-dark transitions obtained from T1-weighted MRI. Among deep learning approaches, previous work [WKK+19, WQT+17] focused on body composition analysis, segmenting the image into different body structures, including subcutaneous adipose tissue and the external skin edge. In [OMB+19], a combination of the Canny filter, boundary selection, and local regression has been applied to delimit the different skin layers in 3T MRI with a T2-weighted sequence. All these works developed skin segmentation as part of a larger pipeline, thus focusing on one imaging modality and leveraging the properties of that specific image.

3D Rigid registration

3D rigid registration refers to aligning two 3D surfaces or point clouds. The Iterative Closest Point (ICP) algorithm [WZ17] iteratively searches for the closest points between two point clouds and computes a rigid transformation to align them. Robust Point Matching (RPM) [RCM+97] applies a probabilistic approach to estimate the correspondences between points in two 3D point clouds and is less sensitive to noise and outliers than ICP. Coherent Point Drift (CPD) [MS10] applies a Gaussian mixture model to describe the probability distribution of the point clouds; it supports rigid and non-rigid deformations and is more versatile than ICP and RPM. Deep learning methods, such as PointNetLK [AGSL19], built on PointNet [QSMG17], apply neural networks to learn features from 3D point clouds and perform registration. Deep learning methods are highly efficient in terms of the time required for registration after training, and thus valuable for real-time applications. However, learning methods must be trained on large data sets to avoid biases, and collecting large and varied data sets in medical applications is still a challenge.

Figure 1: (a) Hardware and software components of the system and their mutual interaction for the image fusion. (b) US system integrated into a testing setup that mimics the clinical environment.

3D Sensors for medical applications

The value of 3D camera trackers, depth sensors, and LiDAR scanners has been firmly established across various fields. Medical imaging, radiology, and surgery are among the domains that can benefit significantly from their integration. In recent years, extensive research efforts have been directed towards enhancing the resolution, accuracy, and speed of these sensor technologies [Sak02]. Leveraging the data captured by 3D sensors holds the potential for substantial improvements in several medical applications. These advancements span diverse areas, ranging from surgical navigation and robot-assisted surgery, where 3D sensors facilitate image-guided procedures and enable robots to assist surgeons, to rehabilitation and physical therapy, where these sensors are already employed to monitor patient movements and offer valuable feedback to both patients and therapists. Radiology and imaging, in particular, emerge as a pivotal field poised to leverage the capabilities of 3D sensors [vHBW+21]. Their integration enhances the quality of medical images, including CT and MRI scans, by providing real-time feedback on patient positioning and motion throughout the imaging process. Within this area, our research centres on the challenge of leveraging a 3D camera tracker in radiology, which requires harmonising heterogeneous data sources, notably the volumetric pre-operative image and the 3D surface data derived from the camera.

Figure 2: Description of the segmentation method on a slice. The toy image is on the left of each step, and the mock-up grid is on the right. The red pixels are under evaluation, and the orange pixels are inserted in the "visited pixel" list by the current evaluation step. The lighter orange pixels were inserted in the list by previous steps. (a) The algorithm's initialisation (every pixel in the mock-up grid has a value of 2), evaluation of the first pixel, identification of its neighbourhood, and consequent assignment of the background value on the mock-up grid. (b) The second step of the algorithm, with the same considerations as the previous one. (c) Identification of a pixel above the threshold: the value of the mock-up pixel changes to 1, and the neighbourhood of the pixel is not inserted in the list. (d) Final segmentation.

3 MR & US fusion system

Figure 3: Co-registration steps. (a) The selected anterior portion of the segmented surface. (b) PCA alignment and translation on the reference points. (c) Surface sub-regions for the first ICP run. (d) Surface sub-region tuning for the second ICP step (only 80% of the surface considered in the first ICP run is kept in the second, refinement run).
Figure 4: (a) Co-registration pipeline result. Error distribution (b) on the segmented surface and (c) on the camera mesh. The unit of measure of the colourmap is mm.

The novel co-registration is divided into two main software components integrated into the image fusion system. The first segments the external skin surface of the patient from the MR/CT acquisition, while the second co-registers the segmented skin surface with the 3D surface obtained by the camera.

Skin segmentation

The segmentation of the 3D surface representing the patient's skin bridges the heterogeneous data sources involved in the system. Indeed, extracting the body surface from the volumetric imaging facilitates subsequent analyses and enables the processing of lighter data. The segmentation of the external body surface is computed according to the Hounsfield value (CT) or intensity level (MR), set as a default parameter, and is represented as a triangle mesh. Given a CT/MR image paired with the skin iso-value, the proposed segmentation identifies the subject's skin surface, which is used as input for the co-registration. The general idea is to leverage the difference in intensity between the air and the body surface. The segmentation proceeds one slice at a time, starting from a background pixel. Then, the growth of the background region proceeds iteratively based on pixel adjacency and stops when it encounters a pixel whose grey value is higher than or equal to the skin iso-value. Through this region-growing algorithm, the evaluation expands only where air is present; the body is segmented as a whole object by exclusion.

Fig. 2 describes the segmentation method. We create a mock-up grid with the same dimensions as each slice; each element of the grid is related to the pixel at the same location in the original slice (Fig. 2(a)). Initially, all the elements of the mock-up grid are assigned the same initial value of 2. On the image slice, the starting pixel must belong to the background, e.g., a corner pixel, since the subject's body is typically located in the centre of the image. Once the starting pixel has been selected, we check whether it belongs to the background through its intensity level in the original slice. A pixel is considered background if its intensity remains below the skin iso-value; in contrast, if its intensity is above the iso-value, we have encountered the body edge. If the pixel belongs to the background, the corresponding element in the mock-up grid is set to 0, and the pixel is marked as visited. Then, we select its neighbouring pixels, which initialise the list of pixels to be subsequently visited (Fig. 2(b)). Iteratively, we check whether the first pixel of the list belongs to the background. If it does, we proceed as before and update the list by removing the just-visited pixel, marking it as visited, and adding its neighbours; pixels that have already been visited must not be inserted in the list again (Fig. 2(c)). If the pixel in the original image is above the skin iso-value, we assign the value 1 to the corresponding element in the mock-up grid and do not insert the pixel's neighbours in the list (Fig. 2(d)). At the end of the process, the mock-up grid has a value of 0 in the background, 2 inside the subject's body (i.e., the initial value of all the elements in the mock-up grid), and 1 in correspondence with the skin. The same procedure is applied to all the slices in the volume. Then, the segmented volume undergoes the marching cubes algorithm [LC87] to extract a 3D surface mesh of the segmented skin, which is the input for the co-registration phase that matches the MRI and the 3D surface acquired by the camera.

To improve the segmentation, we add padding around each slice, filled with the minimum value appearing in the image, so that it is guaranteed to be considered background; in this way, the algorithm proceeds through the padding pixels up to the slice borders. The padding is helpful in case the MR/CT bed has been acquired with the patient. To reduce the overall computational time for skin segmentation, we can sub-sample each slice and the slice set in the case of high-resolution MR/CT images. The intensity value for the skin, i.e., the iso-value needed as input parameter to the segmentation algorithm, is easily retrievable from the specifications of the MR/CT acquisition machine; indeed, each manufacturer usually provides standard values for each imaging modality.
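As a concrete illustration of the region-growing step described above, the following Python sketch labels one slice with a breadth-first flood fill that starts from a corner pixel and then extracts the skin mesh with marching cubes. It assumes the volume is a NumPy array and uses scikit-image's marching cubes as a stand-in for the mesh-extraction step [LC87]; the function names and data layout are illustrative, not the system's actual implementation.

```python
# Minimal sketch of the slice-wise region-growing segmentation (assumptions:
# the volume is a NumPy intensity array and the skin iso-value is known).
from collections import deque

import numpy as np
from skimage import measure  # marching cubes for the final mesh extraction


def segment_slice(slice_img, iso_value):
    """Label one slice: 0 = background (air), 1 = skin edge, 2 = body interior."""
    h, w = slice_img.shape
    # Pad with the minimum intensity so the flood fill can travel all around the
    # body (useful when the scanner bed has been acquired with the patient).
    padded = np.pad(slice_img, 1, constant_values=slice_img.min())
    labels = np.full(padded.shape, 2, dtype=np.uint8)  # mock-up grid, initialised to 2
    visited = np.zeros(padded.shape, dtype=bool)

    queue = deque([(0, 0)])      # start from a corner, assumed to be background
    visited[0, 0] = True
    while queue:
        r, c = queue.popleft()
        if padded[r, c] >= iso_value:
            labels[r, c] = 1     # body edge reached: mark as skin, do not expand further
            continue
        labels[r, c] = 0         # background pixel: keep growing the region
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h + 2 and 0 <= nc < w + 2 and not visited[nr, nc]:
                visited[nr, nc] = True
                queue.append((nr, nc))
    return labels[1:-1, 1:-1]    # drop the padding


def segment_skin(volume, iso_value):
    """Label every slice, then extract the skin surface as a triangle mesh."""
    labelled = np.stack([segment_slice(s, iso_value) for s in volume])
    # An iso-surface at 0.5 separates the background (0) from skin/body (1, 2).
    verts, faces, _, _ = measure.marching_cubes(labelled.astype(np.float32), level=0.5)
    return verts, faces
```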

Figure 5: Robustness of the skin segmentation method to image subsampling. Skin surface extracted from the segmentation of the input image at (a) high and (b) low resolution. (c) Distance distribution between the two surfaces; the colourmap scale goes from 0 mm (blue) to 5 mm (red).
Figure 6: Skin segmentation of (a) a head MRI, (b) an abdomen MRI, (c) a whole-body CT.
Figure 7: Co-registration and error distribution on a phantom acquired by a 3D camera at different distances: (a,e) 20 cm, (b,f) 25 cm, (c,g) 30 cm, (d,h) 35 cm. The error visualisation scale in the colourmap goes from 0 mm (blue) to 10 mm (red).

Skin co-registration

To align the MR/CT image with the US probe, the 3D surface acquired by the camera, which lies in the same reference system as the US probe and the magnetic tracking, is rigidly co-registered with the patient's skin segmented from the MR/CT images. The output of the co-registration is a translation vector and a rotation matrix that co-register the segmented surface to the 3D surface acquired by the camera (i.e., the Intel RealSense in the experimental setup), minimising the corresponding misalignment. The co-registration takes as input the segmented surface extracted from the anatomical images (MRI/CT), the 3D surface acquired by the camera, and a reference virtual landmark. The 3D surface must be acquired by the 3D camera with a frontal view, following the guidance provided by the camera, to minimise acquisition errors. The segmented surface is oriented consistently (i.e., head-feet, right-left) to avoid errors related to body symmetries. Through an intuitive interface, the operator manually selects one corresponding landmark point on each input surface to align the two surfaces; the landmark is exclusively virtual and does not require any external physical placement. A pipeline composed of orientation adjustment through Principal Component Analysis (PCA), surface sub-region selection and tuning (region of interest), and successive co-registration refinements leveraging the Iterative Closest Point (ICP) algorithm allows for accurate alignment of the segmented surface with the surface acquired by the camera (Fig. 3). Then, the computed roto-translation is applied to the volumetric data to bring the MR/CT images into the same reference system as the US probe, thus enabling the fusion of the MR/CT image with the US image, since the tracking system tracks both the 3D camera and the US probe. This result allows the radiologist/surgeon to navigate the MR and US images simultaneously during the US examination or preoperatively. To optimise the time required to rigidly register the segmented surface with the 3D surface acquired by the camera, the segmented surface is cut to keep only the front part of the body, so that only the relevant part of the segmented surface undergoes the registration. To cut the surface, we consider the angle between the normal at each surface vertex and the sagittal axis: the vertices associated with an angle smaller than 90° are selected as part of the front surface.
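A minimal sketch of this registration stage is given below, assuming both surfaces are available as N × 3 point arrays already translated onto the shared virtual landmark: PCA provides a rough initial alignment, and a trimmed ICP refines it. The trimming fraction mirrors the 80% refinement of Fig. 3(d), while the helper names and parameter values are ours, not the product code.

```python
# Sketch of the rigid co-registration: PCA initialisation followed by trimmed ICP.
import numpy as np
from scipy.spatial import cKDTree


def pca_align(source, target):
    """Rough initial alignment: match centroids and principal axes of the two clouds."""
    src_c, tgt_c = source.mean(0), target.mean(0)
    src_axes = np.linalg.svd(source - src_c, full_matrices=False)[2].T  # principal directions
    tgt_axes = np.linalg.svd(target - tgt_c, full_matrices=False)[2].T
    R = tgt_axes @ src_axes.T
    if np.linalg.det(R) < 0:            # avoid reflections
        tgt_axes[:, -1] *= -1
        R = tgt_axes @ src_axes.T
    return R, tgt_c - R @ src_c


def icp(source, target, R, t, iters=50, keep=0.8):
    """Trimmed ICP: refine (R, t) using the closest `keep` fraction of point pairs."""
    tree = cKDTree(target)
    for _ in range(iters):
        moved = source @ R.T + t
        dist, idx = tree.query(moved)                       # closest target point per source point
        order = np.argsort(dist)[: int(keep * len(dist))]   # drop the worst matches
        src_k, tgt_k = source[order], target[idx[order]]
        src_c, tgt_c = src_k.mean(0), tgt_k.mean(0)
        # Kabsch: optimal rotation between the trimmed, centred point pairs.
        U, _, Vt = np.linalg.svd((src_k - src_c).T @ (tgt_k - tgt_c))
        R_new = (U @ Vt).T
        if np.linalg.det(R_new) < 0:
            Vt[-1] *= -1
            R_new = (U @ Vt).T
        R, t = R_new, tgt_c - R_new @ src_c
    return R, t
```

Under these assumptions, the PCA output seeds the first ICP run, and a second run with a stricter trimming fraction reproduces the refinement step of Fig. 3(d).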

Figure 8: Co-registration between the (textured) surface acquired by the camera and the segmented surface (green) on different subjects.

Visualising the registration error between the segmented and acquired skin (Fig. 4) gives valuable insight into whether a more accurate surface acquisition from the camera is necessary to improve the co-registration or whether the results are already accurate enough. The co-registration error between the segmented skin and the skin acquired by the camera is computed as the Hausdorff distance between the co-registered surfaces. Calling the segmented surface $\mathbf{X}_1$ and the 3D surface acquired by the camera $\mathbf{X}_2$, the co-registration error is their Hausdorff distance $d(\mathbf{X}_1,\mathbf{X}_2):=\max\{d_{\mathbf{X}_1}(\mathbf{X}_2),\,d_{\mathbf{X}_2}(\mathbf{X}_1)\}$, where $d_{\mathbf{X}_1}(\mathbf{X}_2):=\max_{\mathbf{x}\in\mathbf{X}_1}\{\min_{\mathbf{y}\in\mathbf{X}_2}\{\|\mathbf{x}-\mathbf{y}\|_2\}\}$. The minimum distances are computed through a kd-tree structure. The distance distribution is mapped to RGB colours, and each vertex is assigned the colour corresponding to its distance from the other surface. To better analyse the distance distribution in the relevant portion of the surface, vertices whose distance is equal to or greater than 5 mm are coloured red, vertices at null distance are shown in blue, and the remaining distances are mapped to the shades in between. If the error is located in areas relevant to the structure under analysis, it may prompt reconsidering the data acquisition process. Conversely, if errors are primarily present in regions not critical to the examination, the medical professional can confidently proceed with analysing the fused MR/CT and US images.
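The sketch below reproduces this error computation under the assumption that the co-registered surfaces are given as vertex arrays: the symmetric Hausdorff distance is evaluated with kd-trees, and the per-vertex distances are mapped to a blue-to-red ramp clipped at 5 mm, as in the figures. Function names are illustrative.

```python
# Sketch of the co-registration error: kd-tree Hausdorff distance and colour mapping.
import numpy as np
from scipy.spatial import cKDTree


def hausdorff(X1, X2):
    """d(X1, X2) = max{ d_X1(X2), d_X2(X1) } with Euclidean nearest-neighbour distances."""
    d12 = cKDTree(X2).query(X1)[0]   # for each vertex of X1, distance to the closest point of X2
    d21 = cKDTree(X1).query(X2)[0]
    return max(d12.max(), d21.max()), d12, d21


def error_colours(distances, clip_mm=5.0):
    """Map per-vertex distances to RGB: 0 mm -> blue, >= clip_mm -> red."""
    t = np.clip(distances / clip_mm, 0.0, 1.0)[:, None]
    blue, red = np.array([0.0, 0.0, 1.0]), np.array([1.0, 0.0, 0.0])
    return (1.0 - t) * blue + t * red
```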

Figure 9: Co-registration error on the same subject with different camera view angles: left, 45 degrees; right, 90 degrees. The colourmap scale goes from 0 mm (blue) to 5 mm (red). The acquisition quality becomes less accurate when the camera is kept at 90 degrees with respect to the subject (right) and more accurate when the camera is inclined at 45 degrees, as specified in the datasheet (left).

4 Experimental results and validation

We discuss the results on skin segmentation, the robustness of the skin co-registration to noise and selected parameters (e.g., HU value, virtual landmark), and the accuracy of the image fusion.

Figure 10: Rotation angles with respect to the X, Y, and Z axes when a misalignment in the marker selection is present: (a) misalignment in the X direction, (b) in the Z direction, and (c) in a diagonal direction (X and Y). The rotation angles remain the same; thus, the algorithm is robust to errors in the virtual landmark selection.

Skin segmentation

The segmentation (i.e., the voxel labelling) and the mesh extraction have a computational cost linear in the number of voxels composing the volumetric image. Table 1 reports the timing of each algorithm step on an 8-core 11th-generation Intel Core i7-11700K. To better integrate the approach with the existing clinical workflow, we tested its robustness to subsampling: given the linear computational cost of the method, even a light subsampling drastically reduces the time required for the segmentation. Fig. 5 shows how the skin segmentation remains clean and accurate for an image and its subsampled version, obtained by subsampling the volume by a factor of two in each direction; the only difference is the resolution of the surface, a direct consequence of the lower resolution of the subsampled image. To confirm that the segmentation accuracy is maintained at different image resolutions, we computed the distance distribution between the surfaces extracted from a volume image and its subsampled version. In this case, the higher distances correspond to the slices and pixels missing in the subsampled version of the image, and the value of the distance is coherent with the changed dimension of the voxels. Contrary to AI methods, the 3D skin segmentation does not require any training and, consequently, neither a large data set nor acquisitions of diverse imaging modalities, which contributes to the method's generality. The skin segmentation has been designed to be as general as possible regarding the anatomical area scanned (e.g., head, breast, whole body, and abdomen) and the acquisition modality (e.g., MR, CT, PET). We tested the quality of the segmentation on different anatomical volume images, such as CTs and MRIs (Fig. 6), obtaining satisfactory results.

Table 1: Computing time of the 3D skin segmentation algorithm on various anatomical districts and imaging modalities.

Imaging modality   Volume size       District     Volume reading   Segmentation   Skin extraction
MRI T1             260 × 52 × 72     Abdomen      3 s              28 s           4 s
MRI T2             184 × 256 × 30    Abdomen      1 s              5 s            0.2 s
MRI                384 × 384 × 50    Breast       3 s              45 s           3 s
MRI T2             384 × 384 × 46    Breast       2 s              42 s           2 s
CT                 512 × 512 × 247   Whole body   2 s              304 s          16 s
CT                 256 × 256 × 160   Head         1 s              46 s           1 s
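As an illustration of the subsampling test discussed above, the sketch below decimates the volume by a factor of two along each axis with NumPy striding, segments both versions, and compares the two extracted surfaces. It reuses the hypothetical segment_skin and hausdorff helpers sketched earlier (collected here in a hypothetical module) and is only indicative of the procedure, not the evaluation code used for Fig. 5.

```python
# Illustrative subsampling robustness check; `segment_skin` and `hausdorff` are the
# hypothetical helpers sketched in the previous sections, gathered in a module.
from skin_fusion_sketch import segment_skin, hausdorff  # hypothetical module


def subsample_and_compare(volume, iso_value):
    """Segment a volume and its factor-2 decimated version, then compare the surfaces."""
    sub = volume[::2, ::2, ::2]                    # factor-2 decimation along each axis
    verts_hi, _ = segment_skin(volume, iso_value)  # full-resolution skin surface
    verts_lo, _ = segment_skin(sub, iso_value)     # low-resolution skin surface
    verts_lo = verts_lo * 2.0                      # bring voxel indices back to the full-resolution grid
    return hausdorff(verts_hi, verts_lo)           # worst-case and per-vertex distances
```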

Co-registration

The co-registration experimental tests were performed both on an abdominal phantom and on real subjects. The error on a phantom is related mainly to the camera position, the limited dimension of the acquired area, and its symmetric shape. In the phantom tests, the accuracy inside the volume is satisfactory, since the error remains within an acceptable range (i.e., under 1 cm), and, if needed, the physician can quickly correct the residual error through a fast manual tuning. To verify the robustness of the co-registration, the skin surface was captured by placing the 3D camera at different distances from the skin (Fig. 7). The co-registration remains stable with respect to the acquisition distance; at larger distances, the noise acquired by the camera increases, confirming the algorithm's robustness to noise and to the symmetries that are typical of the phantom but unlikely in real subjects. Fig. 8 shows the co-registration on CT/MRI images of real subjects. We also tested how much tilting the 3D camera at various angles during the acquisition affects the co-registration. According to the camera's specifications, the ideal acquisition angle is 45 degrees to avoid distortions in the acquired 3D surface. Fig. 9 shows the co-registration when the camera is tilted by 45 degrees, which yields the best co-registration due to the higher quality of the acquired surface, and when the camera view is perpendicular to the patient's surface. The additional noise acquired at the perpendicular (90-degree) view does not interfere with the co-registration, which includes the automatic identification of regions of interest on both the segmented surface and the 3D surface acquired by the camera; these regions of interest identify corresponding areas on the two surfaces and notably reduce the inclusion of noise from the 3D camera. To verify the influence of the selection of the corresponding virtual landmarks on the co-registration, we select slightly displaced landmarks along the X axis, the Z axis, and a diagonal (X and Y) direction, and measure the changes in the rotation angles of the corresponding co-registration matrices. For increasing displacements of the selected landmark, no significant differences in the rotation angles were found (Fig. 10). The robustness of the co-registration to a misplacement of the reference virtual landmarks confirms that the algorithm is not user-dependent, i.e., users can apply different approaches to select the landmark.
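The landmark-robustness analysis of Fig. 10 compares the rotation angles of registrations obtained with displaced landmarks; a minimal sketch of such a comparison, using SciPy's rotation utilities, is given below. The helper names and the input layout are assumptions, not the analysis code used for the figure.

```python
# Sketch of the landmark-robustness check: compare the Euler angles of the rotation
# matrices obtained with the reference landmark and with displaced landmarks.
from scipy.spatial.transform import Rotation


def rotation_angles(R):
    """Decompose a 3x3 rotation matrix into X, Y, Z Euler angles (degrees)."""
    return Rotation.from_matrix(R).as_euler("xyz", degrees=True)


def compare_registrations(R_reference, displaced_results):
    """Per-axis angle differences of each displaced run w.r.t. the reference registration."""
    ref = rotation_angles(R_reference)
    return {name: rotation_angles(R) - ref for name, R in displaced_results.items()}
```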

Figure 11: (a) MRI image of a phantom of the abdominal district. (b) From left to right: phantom surface acquired at 0 degrees, image fusion results with CT, and accuracy error of 4.3 mm.
Figure 12: US co-registration with an MR image: accuracy error of 5.3 mm. The skin surface has been acquired at 0 degrees.

Phantom tests

The accuracy of the US/MR image fusion has been tested on an abdominal phantom (CIRS Model 057). The phantom tests have been conducted on CT (Fig. 11) and MR (Fig. 12) images. The skin surface has been obtained by segmenting the CT acquisition of the phantom, while the 3D surface was captured by moving the camera around the phantom to simulate the best possible clinical configuration, where the EM transmitter and the camera are placed around the patient bed. The accuracy is better at 0 and 180 degrees than at the lateral views (around 90 degrees). In the worst-case scenario, the accuracy error varies from 4.3 mm to 13 mm.

5 Conclusions and future work

This paper presents a method for fusing volumetric anatomical images (MRI/CT) with US images through a 3D depth sensor. The main novelty in the fusion between the two images is the co-registration between the skin surface extracted from the volumetric image and the skin surface acquired by the 3D camera. This co-registration, together with the magnetic tracking system and the 3D sensors placed on the probe and camera, allows the fusion of the MRI/CT image with real-time US acquisitions without using external physical markers. The co-registration has satisfactory accuracy and robustness to noise, virtual landmark misalignment, camera acquisition distance, and camera tilting during the acquisition phase.

The precision achieved in tests of the complete system integrated within a US system is of the order of a millimetre. In some cases, the data set has limitations due to shape and acquisition conditions (e.g., the symmetry of the phantom, anisotropic room illumination).

Future work will focus on reducing the computational time by sub-sampling the volumetric image and by acquiring the patient's skin with a lower-resolution 3D camera. The skin reference has demonstrated great relevance in many other applications, such as breast tissue, vessel and blood evaluation [JGM+13, WPI+12, LCCC18], the neurological field for surgical navigation system optimisation and registration [AnA, BYM+16], and the abdominal district [WKK+19, BYK+12]. Thus, future work will focus on improving the skin segmentation, which currently provides promising results in accuracy and generality, and its visualisation for clinical applications such as surgical intervention planning. Moreover, we will focus on an augmented system to visualise the error and to represent possible misalignments in the volume image as well as on the surface.

A potential avenue for further improvement of the co-registration involves exploring camera registration with the patient partially dressed. Further enhancements could include making the image fusion system independent of the patient's position and breathing phase, presenting opportunities for continued refinement in future iterations.

Acknowledgments

This work has been supported by the European Commission, NextGenerationEU, Missione 4 Componente 2, “Dalla ricerca all’impresa”, Innovation Ecosystem RAISE “Robotics and AI for Socio-economic Empowerment”, ECS00000035.

References

  • [AGSL19] Yasuhiro Aoki, Hunter Goforth, Rangaprasad Arun Srivatsan, and Simon Lucey. Pointnetlk: Robust & efficient point cloud registration using pointnet. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7163–7172, 2019.
  • [AJKK+12] Nadine Abi-Jaoudeh, Jochen Kruecker, Samuel Kadoury, Hicham Kobeiter, Aradhana M Venkatesan, Elliot Levy, and Bradford J Wood. Multimodality image fusion–guided procedures: technique, accuracy, and applications. Cardiovascular and interventional radiology, 35:986–998, 2012.
  • [AnA] An automatic algorithm for skin surface extraction from mr scans. https://cds.ismrm.org/ismrm-2000/PDF3/0672.pdf. (Accessed on 23/12/2023).
  • [BMWT16] Marc A Bjurlin, Neil Mendhiratta, James S Wysock, and Samir S Taneja. Multiparametric mri and targeted prostate biopsy: Improvements in cancer detection, localization, and risk assessment. Central European Journal of Urology, 69(1):9, 2016.
  • [BYK+12] Thomas Baum, Samuel P Yap, Dimitrios C Karampinos, Lorenzo Nardo, Daniel Kuo, Andrew J Burghardt, Umesh B Masharani, Ann V Schwartz, Xiaojuan Li, and Thomas M Link. Does vertebral bone marrow fat content correlate with abdominal adipose tissue, lumbar spine bone mineral density, and blood biomarkers in women with type 2 diabetes mellitus? Journal of Magnetic Resonance Imaging, 35(1):117–124, 2012.
  • [BYM+16] Richard Beare, Joseph Yuan-Mou Yang, Wirginia J Maixner, A Simon Harvey, Michael J Kean, Vicki A Anderson, and Marc L Seal. Automated alignment of perioperative mri scans: A technical note and application in pediatric epilepsy surgery. Technical report, Wiley Online Library, 2016.
  • [Dep] Depth camera d415 – intel® realsense™ depth and tracking cameras. https://www.intelrealsense.com/depth-camera-d415/. (Accessed on 03/09/2023).
  • [GvdAB+16] Maudy Gayet, Anouk van der Aa, Harrie P Beerlage, Bart Ph Schrier, Peter FA Mulders, and Hessel Wijkstra. The value of magnetic resonance imaging and ultrasonography (mri/us)-fusion biopsy platforms in prostate cancer detection: A systematic review. BJU international, 117(3):392–400, 2016.
  • [JGM+13] Michael Jermyn, Hamid Ghadyani, Michael A Mastanduno, Wes Turner, Scott C Davis, Hamid Dehghani, and Brian W Pogue. Fast segmentation and high-quality three-dimensional volume mesh creation from medical images for diffuse optical tomography. Journal of biomedical optics, 18(8):086007–086007, 2013.
  • [LC87] William E Lorensen and Harvey E Cline. Marching cubes: A high resolution 3d surface construction algorithm. ACM Siggraph Computer Graphics, 21(4):163–169, 1987.
  • [LCCC18] Chia-Yen Lee, Tzu-Fang Chang, Nai-Yun Chang, and Yeun-Chung Chang. An automated skin segmentation of breasts in dynamic contrast-enhanced magnetic resonance imaging. Scientific Reports, 8(1):6159, 2018.
  • [Lee14] Min Woo Lee. Fusion imaging of real-time ultrasonography with ct or mri for hepatic intervention. Ultrasonography, 33(4):227, 2014.
  • [LLM+19] Xinzhou Li, Yu-Hsiu Lee, Samantha Mikaiel, James Simonelli, Tsu-Chin Tsao, and Holden H Wu. Respiratory motion prediction using fusion-based multi-rate kalman filtering and real-time golden-angle radial mri. IEEE Transactions on Biomedical Engineering, 67(6):1727–1738, 2019.
  • [MHvN+23] Bruno Madore, Aaron T Hess, Adam MJ van Niekerk, Daniel C Hoinkiss, Patrick Hucker, Maxim Zaitsev, Onur Afacan, and Matthias Günther. External hardware and sensors, for improved mri. Journal of Magnetic Resonance Imaging, 57(3):690–705, 2023.
  • [MS10] Andriy Myronenko and Xubo Song. Point set registration: Coherent point drift. Transactions on Pattern Analysis and Machine Intelligence, 32(12):2262–2275, 2010.
  • [OMB+19] Julien Ognard, Jawad Mesrar, Younes Benhoumich, Laurent Misery, Valerie Burdin, and Douraied Ben Salem. Edge detector-based automatic segmentation of the skin layers and application to moisturization in high-resolution 3 tesla magnetic resonance imaging. Skin Research and Technology, 25(3):339–346, 2019.
  • [QSMG17] Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 652–660, 2017.
  • [RCM+97] Anand Rangarajan, Haili Chui, Eric Mjolsness, Suguna Pappu, Lila Davachi, Patricia Goldman-Rakic, and James Duncan. A robust point-matching algorithm for autoradiograph alignment. Medical Image Analysis, 1(4):379–398, 1997.
  • [Sak02] Georgios Sakas. Trends in medical imaging: From 2d to 3d. Computers & Graphics, 26(4):577–587, 2002.
  • [SGL+20] Francesco Santini, Laura Gui, Orane Lorton, Pauline C Guillemin, Gibran Manasseh, Myriam Roth, Oliver Bieri, Jean-Paul Vallée, Rares Salomir, and Lindsey A Crowe. Ultrasound-driven cardiac mri. Physica Medica, 70:161–168, 2020.
  • [SPR+18] Marco Solbiati, Katia M Passera, Alessandro Rotilio, Francesco Oliva, Ilaria Marre, S Nahum Goldberg, Tiziana Ierace, and Luigi Solbiati. Augmented reality for interventional oncology: Proof-of-concept study of a novel high-end guidance system platform. European Radiology Experimental, 2:1–9, 2018.
  • [vHBW+21] Felix von Haxthausen, Sven Böttger, Daniel Wulff, Jannis Hagenah, Verónica García-Vázquez, and Svenja Ipsen. Medical robotics for ultrasound imaging: Current systems and future trends. Current Robotics Reports, 2:55–71, 2021.
  • [WKK+19] Alexander D Weston, Panagiotis Korfiatis, Timothy L Kline, Kenneth A Philbrick, Petro Kostandy, Tomas Sakinis, Motokazu Sugimoto, Naoki Takahashi, and Bradley J Erickson. Automated abdominal segmentation of ct scans for body composition analysis using deep learning. Radiology, 290(3):669–679, 2019.
  • [WPI+12] Lei Wang, Bram Platel, Tatyana Ivanovskaya, Markus Harz, and Horst K Hahn. Fully automatic breast segmentation in 3d breast mri. In 2012 9th IEEE International Symposium on Biomedical Imaging (ISBI), pages 1024–1027. IEEE, 2012.
  • [WQT+17] Yunzhi Wang, Yuchen Qiu, Theresa Thai, Kathleen Moore, Hong Liu, and Bin Zheng. A two-step convolutional neural network based computer-aided detection scheme for automatically segmenting adipose tissue volume depicting on ct images. Computer Methods and Programs in Biomedicine, 144:97–104, 2017.
  • [WZ17] Fang Wang and Zijian Zhao. A survey of iterative closest point algorithm. In 2017 Chinese Automation Congress (CAC), pages 4395–4399. IEEE, 2017.
  • [YDK+15] Minglei Yang, Hui Ding, Jingang Kang, Lei Zhu, and Guangzhi Wang. Subject-specific real-time respiratory liver motion compensation method for ultrasound-mri/ct fusion imaging. International Journal of Computer Assisted Radiology and Surgery, 10:517–529, 2015.