Oral and Maxillofacial Surgery
https://doi.org/10.1007/s10006-018-0719-5

ORIGINAL ARTICLE

A novel noise filtered and occlusion removal: navigational accuracy in augmented reality-based constructive jaw surgery

Bijaya Raj Basnet (1), Abeer Alsadoon (1), Chandana Withana (1), Anand Deva (2), Manoranjan Paul (1)

(1) School of Computing and Mathematics, Charles Sturt University, Sydney Campus, Sydney, Australia
(2) Faculty of Medicine and Health Sciences, Macquarie University, Sydney, Australia
Corresponding author: Chandana Withana, cwithana@studygroup.com

Received: 16 March 2018 / Accepted: 28 August 2018
© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Abstract

Purpose: Augmented reality-based constructive jaw surgery has been facing various limitations, such as noise in real-time images, navigational error of implants and jaw, image overlay error, and occlusion handling, which have limited the implementation of augmented reality (AR) in corrective jaw surgery. This research aimed to improve the navigational accuracy, through noise and occlusion removal, during positioning of an implant in relation to the jaw bone to be cut or drilled.

Method: The proposed system consists of a weighting-based de-noising filter and depth mapping-based occlusion removal for removing any occluding object, such as surgical tools, the surgeon's body parts, and blood.

Results: The maxillary (upper jaw) and mandibular (lower jaw) jaw bone sample results show that the proposed method can achieve an image overlay error (video accuracy) of 0.23~0.35 mm and a processing time of 8-12 frames per second, compared to 0.35~0.45 mm and 6-11 frames per second for the existing best system.

Conclusion: The proposed system concentrates on removing the noise from the real-time video frame and on removing the occlusion. Thus, this study provides surgeons with an acceptable range of accuracy and processing time for carrying out a smooth surgical flow.

Keywords: Augmented reality navigation · 3D-2D matching · Image registration · Occlusion handling · Noise removal

Introduction

Corrective jaw surgery can be defined as a surgical procedure that is performed on the jaw bones to correct dental misalignment. It may involve various surgical procedures, such as drilling, cutting, resection, and implantation. The main problem of this surgery is the limited viewing space in the mouth of the patient [1], and there is always a high risk of surgeons damaging the nerve channels or tooth roots during dental surgery [1, 2]. In the traditional method of performing jaw surgery, surgeons used the CT scan report to plan the surgical procedure manually [3] and were required to identify the nerve channels and root canals manually with the use of the CT scan report [3]. Due to the limitations of the traditional method, such as the difficulty of identifying the nerves and the accurate drilling position during the surgical procedure, a 2D virtual video-guided system was developed, which helped the surgeon by displaying a virtual video on the monitor; augmented reality has since emerged as the latest technology in medical surgery [3]. Figure 1 shows traditional, video-guided, and augmented reality-based surgery. Augmented reality (AR)-based surgery uses both virtual images from the pre-surgery phase and the real-time image during surgery to create the augmented view for the user [4].
Augmented reality-based surgeries superimpose the virtual jaw onto the real jaw during surgery, which provides the surgeon with a 3D view in real time. Augmented reality provides the surgeons in the surgical environment with more realistic and intuitive information during surgery, which can guide them through the surgical procedure [4]. It provides the surgeon with information about the cutting lines and drilling positions in the jaw bone and also helps to find the nerve channel and the location of disease [3]. The 3D view is provided by augmented reality by superimposing the various virtual images onto the real-time images [4, 5].

(Fig. 1: a Traditional surgery. b Video-guided surgery. c AR-guided surgery. These images were downloaded using the Google search engine; the images are free to use, share, or modify, even commercially.)

Augmented reality has been providing a huge benefit in the medical field. It is generally used in the surgery of complicated and sensitive areas like the heart, kidney, brain, pelvis, breast, arteries, and jaw, but its implementation has been limited in jaw surgery. A substantial amount of research has been conducted, past and present, in the field of corrective jaw surgery. AR in jaw surgery has been facing various limitations, such as image registration, occlusion, noise in real-time images, high processing time, and poor occlusion handling, which have limited the implementation of AR in corrective jaw surgery [4]. Hence, 3D view accuracy and processing time play a vital role in augmented reality-based surgery. The best system should be able to provide better accuracy, low processing time, and better occlusion handling capacity. It is necessary to provide the surgeons with accurate real-time navigational guidance for higher precision and accuracy in surgery through accurate object tracking, navigation, and a real-time registration process [6]. In the current context of augmented reality technologies in the medical sector, video-based display, see-through display, and projection-based display are the main categories of augmented reality technology [4].

This paper aims to improve real-time video accuracy by removing the noise in the real-time video caused by a range of factors, such as machine vibration, camera movement, and image sensors, and also by removing the occlusion caused by surgical instruments, the surgeon's hands, etc. Noise removal is necessary for augmented reality-based constructive jaw surgery because noise deteriorates the image edges, which impacts negatively on image registration, navigation, and image overlay. The modified kernel non-local means (MKNLM) filter is used to de-noise the real-time video images. This filter is used in the de-noising process because it is less sensitive to outliers and produces consistent results, while other filters are outlier-sensitive and tend to produce incorrect results. Tracking-learning-detection (TLD) cannot handle occlusion and eventually fails if occlusion is present [7]; this requires re-initialization of the TLD, and failure of the TLD eventually results in image registration failure. This research proposes a new TLD system with depth mapping-based occlusion removal to improve the image tracking and image registration (overlay). A significant body of research exists that focuses on increasing accuracy and lowering processing time in augmented reality-based surgery.
[8] proposed a portable surgical navigation device and technique to reduce the bone resection error. This solution proposes a resection plane that automatically computes the resection margin, with an error of 1.02 mm. The solution used markers but failed to consider the deformities caused by the patient's movement during surgery. [9] proposed a projection-based augmented reality solution to eliminate the necessity of monitoring several display monitors and coordinates during surgery. They proposed a technique for projecting the pre-surgical images (virtual image) onto the body of the patient, but they did not improve the accuracy (range 1.4-7.4 mm) and failed to consider the patient's movements and the occlusion present during surgery. Therefore, these solutions do not provide possibilities for further improvement.

[10] proposed the concept of a differential map to determine the shape changes during bone tumor resection surgery, to allow the surgeons to visualize the remains of a tumor to be resected (cut) and to provide depth information through a graphical overlay. However, the researchers used fiducial markers, which could change their position with the movement of the patient. Furthermore, the authors did not consider the occlusion that is present due to surgical tools and blood. [11] presented an optical see-through head-mounted display-based augmented reality system for navigation, intended to improve accuracy and reliability using an optical tracking system and surface-based registration. The solution improved the accuracy but was not able to address the processing time. A further limitation is that the latency occurring during anatomical structure movement, which decreases the real-time performance of the system, was not addressed. In addition, the weight of the head-mounted display could cause problems for the surgeon during long surgeries. Thus, these solutions offer no major possibilities for improvement, either in accuracy or in processing time.

[12] conducted a study to evaluate the navigational accuracy of implants in an augmented reality-based navigational system for zygomatic implant placement. The study concluded that the real-time navigation-based surgery demonstrated higher accuracy, but it did not consider the presence of saliva and blood, which could cause occlusion. Furthermore, deviations in the accuracy analysis that could influence implant failure, generally caused by the invasion of other anatomical structures, were not taken into consideration. [13] proposed a method to track a patient-specific 3D-printed implant during the intra-operative placement process with the use of point-based [14] and surface-based registration [15]. This solution was able to increase the accuracy of implant placement but ignored the deformities caused by soft tissues, patient movement, and noise from the breathing of the patient while registering the patient's body position on the 3D image set. Thus, these solutions offer no major possibilities for improvement in accuracy or processing time.

[4] proposed a marker-less registration solution with the use of a stereo camera and a half-silvered mirror for depth perception. Even though the burden of marker usage was eliminated, this solution failed to improve the processing time, as integral videography has a high processing time. The researchers also did not address the impact of blood and other fluids, which could cause inaccuracy in contouring and decrease the registration accuracy.
[1] also proposed a solution with a stereo camera and a half-silvered mirror for tracking the surgical instruments, patients' movements, and contours, with ICP (iterative closest point) for patient-image registration. However, the proposed framework still has issues in the initial registration process, where there are chances of errors that could lead to surgical inaccuracies and inconsistencies. Furthermore, the use of a stereo camera, which has to be re-calibrated and maintained for high accuracy, causes difficulties in daily clinical use. Thus, these solutions offer no major possibilities for improvement in accuracy or processing time.

[16] proposed a marker-less registration system that enables AR visualization by projecting the CT image directly onto the real patient's body. The authors used Kinect-based surface segmentation, a two-phase registration process (initial and fine registration), color image fusing, and CT data for the AR view. However, repetitive initial registration (manual registration) is required in case of movement of the object or the camera. Furthermore, the solution was developed for the forensic field, which means that it works only for non-deforming objects. In addition, the overlay error and the processing time of this solution are relatively higher than those of the other solutions presented. Thus, this method offers no major possibilities for improvement in accuracy or processing time.

[6] also conducted research into the use of stereo cameras and a translucent mirror, with a 3D calibration model in integral imaging, to remove the initial registration error and display undistorted 3D images. However, even though the processing time improved in this solution, the accuracy remained unchanged, with an additional limitation arising from a lack of consideration of occlusion. Thus, this solution offers no major possibilities for improvement in accuracy or processing time.

[17] proposed wafer-less maxillary positioning with the help of an interactive IGV (image-guided visualization) display complemented by surgical navigation, which can offer an alternative approach to the use of arbitrary splints and 2D orthognathic planning. However, this model did not reduce the surgical time, which was high due to the technical set-up and recording process. Thus, this solution offers no major possibilities for improvement in accuracy or processing time.

[18] introduced a video see-through system that uses a hierarchy of images, TLD tracking (frame to frame) proposed by [7], and the iterative closest point (ICP) algorithm developed by [19] for 3D pose refinement. Ulrich's method [20] is used for initial registration. A bounding box is used for object tracking, which reduces the object matching time by limiting the search area, and ICP is used to refine the 3D pose for higher registration accuracy [21, 22]. Limitations arise from the fact that the solution failed to address depth perception and the occlusion present in surgical procedures due to surgical tools and blood. Thus, this solution offers no major possibilities for improvement in accuracy or processing time.

[3] proposed a rotational matrix and translation vector algorithm to improve the geometric accuracy in oral and maxillofacial surgery in the Wang model [18]. This solution addressed depth perception using two stereo cameras. Similar to the Wang model, this solution uses an aspect graph to create multiple models to be matched in real time.
Tracking-learning-detection, developed by [7], is used to track the object in the video frame with the use of a bounding box, which decreases the search area. Ulrich's method is used for initial registration, and enhanced ICP [3] is used for final pose refinement with the use of a novel rotational matrix and translation vector algorithm that reduces the geometric error. This system reduced the overlay error to 0.30~0.40 mm and processed 10-13 frames per second. However, this system failed to consider the time consumed by the use of the 3D stereo camera, the noise in the real-time video due to machine vibration, patient movements, and image sensors, and also the occlusion caused by the surgical tools, the surgeon's body parts, blood, etc. The noise issue was addressed with the use of the modified kernel non-local means (MKNLM) filter by [23]. There are various other noise removal filters, but this filter is less sensitive to outliers and provides more consistent results when compared to the other filters [23]. The addition of this feature to the model mentioned above can improve the image registration accuracy, and hence this feature is a significant addition to improve the quality of the system.

The model proposed by [3] addresses accuracy and processing time and has a lower image overlay error and processing time in comparison to other proposed systems. This research focuses on this model to improve the results produced, for a better augmented reality view. This paper works on the model proposed by [3] and particularly focuses on the tracking stage of the tracking-learning-detection (TLD) algorithm. The paper illustrates that better results can be achieved by removing the occlusion caused by surgical tools, the surgeon's body parts, blood, etc. during surgery.

The paper is organized into three parts. The first part contains a "System overview" that discusses the current best model proposed by [3]; it also includes the description of the proposed system, the associated flowchart, and the pseudocode for the proposed formula. The second part presents the "Results", where the proposed system is tested with a range of samples of maxillary and mandibular jaw bones. This is followed by a "Discussion" comparing the results of the state of the art and the proposed system, and a conclusion is provided.

System overview

State of the art

This section describes the current state-of-the-art solution (Fig. 2) with its limitations (highlighted in red in Fig. 2). The model proposed by [3] provides a better image overlay with the use of a rotational matrix and translation vector (RMaTV) algorithm. This system has higher accuracy through a lower image overlay error (0.35~0.45 mm) and the best processing speed of 10-13 frames per second. The model is divided into pre-operative, intra-operative, and pose refinement phases (Fig. 2).

(Fig. 2: The state-of-the-art AR system, with limitations highlighted in red.)

Pre-operative environment

The pre-operative planning of the surgery is done with the use of the CT image of the patient, which is segmented, and an aspect graph (hierarchy of the model) is created in the offline phase, as shown in Fig. 2. This permits matching of the different models of the segmented CT scan in the online phase against the real-time video frame.
Intra-operative environment

Two 3D stereo cameras are used for capturing the real-time surgical video, with a translucent mirror for visualizing the augmented reality view. Video frames are generated from the real-time video, and a hierarchy of the video frame is created based on its resolution. The image with the lowest resolution is used for tracking and detecting the region of interest (ROI). However, this solution did not consider the need for regular re-calibration and maintenance of the stereo camera to maintain performance at levels of high accuracy, which is not possible in a real-time scenario. Furthermore, this solution failed to consider the processing time required to convert the 3D stereo video frames to 2D video frames for tracking the object of interest using the tracking-learning-detection (TLD) algorithm [18]. Further limitations come from the fact that the need for a strict viewing angle for the stereo camera was not considered, which may result in image overlay inaccuracies if the correct viewing angle is not achieved [18]. In addition, the solution failed to consider the noise present in the real-time video due to vibrations from the machinery and optical sensors. This noise results in a deterioration of the image edges and may also lead to contour leakage, which would then negatively affect the image overlay and registration accuracy.

The tracking-learning-detection (TLD) algorithm is used for tracking the region of interest (surgical area). The TLD uses a bounding box to match the object of interest with the aspect graph created during the offline phase. The search for the bounding box is carried out from the top level of the video frame hierarchy (lowest resolution) to the lowest level (highest resolution). Once a match is found, the 2D image is overlaid onto the real-time video, creating an accurate 2D model. The initial registration is performed using a method proposed by [20], also known as "Ulrich's method", which uses shape similarity matching and online matching [18]. After the initial registration, the ICP (iterative closest point) algorithm is used for pose refinement to achieve an accurate 3D model. A rotational matrix and translation vector (RMaTV) algorithm, proposed by [3], is used to remove the geometric error. The refined 3D model, along with the real-time video, is projected onto the translucent mirror, creating an augmented reality view for the surgeon. A minimal sketch of the coarse-to-fine bounding-box search over the resolution hierarchy follows.
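The coarse-to-fine search can be illustrated with a short Python/OpenCV sketch. This is a minimal sketch, not the authors' implementation: the multi-level pyramid follows the text, while the use of cv2.matchTemplate as the matching score and the 3x-template search window are illustrative assumptions standing in for the TLD detector.

import cv2
import numpy as np

def build_pyramid(image, levels=5):
    # Resolution hierarchy: level 0 is full resolution, the last level the lowest.
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(cv2.pyrDown(pyramid[-1]))
    return pyramid

def coarse_to_fine_search(frame, template, levels=5):
    # Locate the object of interest at the lowest resolution first, then
    # refine the bounding box while descending to the full-resolution frame.
    pyr_frame = build_pyramid(frame, levels)
    pyr_tmpl = build_pyramid(template, levels)
    x = y = 0
    for lvl in range(levels - 1, -1, -1):      # coarsest -> finest
        img, tmpl = pyr_frame[lvl], pyr_tmpl[lvl]
        h, w = tmpl.shape[:2]
        if lvl == levels - 1:
            x0, y0, roi = 0, 0, img            # full search only once
        else:
            x, y = 2 * x, 2 * y                # up-scale previous estimate
            x0, y0 = max(0, x - w), max(0, y - h)
            roi = img[y0:y0 + 3 * h, x0:x0 + 3 * w]
            if roi.shape[0] < h or roi.shape[1] < w:
                x0, y0, roi = 0, 0, img        # near the border: fall back
        score = cv2.matchTemplate(roi, tmpl, cv2.TM_CCOEFF_NORMED)
        _, _, _, best = cv2.minMaxLoc(score)
        x, y = x0 + best[0], y0 + best[1]
    h, w = template.shape[:2]
    return x, y, w, h                          # full-resolution bounding box

Restricting every level but the first to a small window around the up-scaled previous estimate is what keeps the online matching fast, which is the purpose of the hierarchy described above.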
Tracking an object can be defined as the estimation of the displacement of the object between two image frames [7]. Tracking is necessary, as failure to track the object of interest results in an incorrect image overlay of pre-surgical and intra-surgical images: the quality of the image overlay depends on how well the object of interest has been tracked. First, the object needs to be tracked and then detected before the image overlay and registration can take place. The TLD uses Lukas-Kanade median flow (LKMF) for tracking an object of interest [7]. It uses feature point flow estimation for tracking (Fig. 3). The TLD uses the LKMF tracker with failure-detection features that detect TLD failure based on the median displacement of the feature points being tracked. A TLD failure is established if the median displacement of the object feature points is greater than the threshold (Fig. 3). This tracker is highly susceptible and prone to occlusion: the Lukas-Kanade median flow tracker fails once the object gets occluded, because it cannot track the feature points of the object of interest and computes the median displacement as greater than the threshold, which results in failure of the TLD. With the use of this tracker, the model achieved an accuracy of 0.35~0.45 mm, but it fails when an occlusion occurs because it cannot compute and track the feature points in the object once the object is occluded. The Lucas-Kanade median flow tracker is presented in Table 1 and as a flowchart in Fig. 3. The Lucas-Kanade median optical flow for tracking is calculated as Eq. (1):

V = u + d    (1)

where V is the final location, u is the 2D image point in the first image frame, and d is the image velocity vector (optical flow), which minimizes the residual function calculated as Eq. (2):

\epsilon(d) = \epsilon(d_x, d_y) = \sum_{x = u_x - \omega_x}^{u_x + \omega_x} \sum_{y = u_y - \omega_y}^{u_y + \omega_y} \left( I(x, y) - J(x + d_x, y + d_y) \right)^2    (2)

where d_x and d_y are the x and y components of the optical flow vector, \epsilon is the residual function, (u_x, u_y) is the image point u, \omega_x and \omega_y are two integers defining the integration window, I and J are the two gray-scaled images, and I(x, y) is the gray-scaled image I at point (x, y).

(Fig. 3: Tracking in TLD using the Lukas-Kanade optical flow tracker. The first image frame I is selected from the hierarchy of images, feature points are selected, and the optical flow and median displacement are calculated. If the median displacement does not exceed the threshold, the bounding box is displayed and the feature point positions are calculated in the next image frame J; otherwise, an occlusion has occurred and tracking-learning-detection fails.)

Table 1 Lukas-Kanade optical flow tracker
Algorithm: Lucas-Kanade method to track the object of interest.
Input: two image frames, image1 (I) and image2 (J), from the hierarchy of images created. Image1 (I) is the current image frame in TLD in which the feature points are tracked; image2 (J) is the next image frame in which the feature points are to be tracked.
Output: optical flow (d), the estimated displacement of a feature point between the two images, i.e., from image I to image J.
BEGIN
Step 1: Create the image hierarchy.
Step 2: Select the feature points to be tracked in the object of interest.
Step 3: Get the 2D image position of the feature point in image I (u).
Step 4: Calculate the optical flow (d) and the median displacement of each coordinate of the point.
Step 5: Check whether the median displacement is greater than the threshold.
Step 6: If the median displacement is greater than the threshold, discard the point; else calculate the new position of the image point in image J: V = u + d.
Step 7: Apply steps 3-6 to each selected feature point.
END

A minimal sketch of this tracking-and-failure-detection loop follows.
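The loop of Table 1 can be sketched in a few lines of Python using OpenCV's pyramidal Lucas-Kanade tracker. This is a minimal sketch, not the authors' code: the 10-pixel threshold follows the text below, while the use of cv2.goodFeaturesToTrack for feature selection and the default flow parameters are assumptions.

import cv2
import numpy as np

def track_or_fail(frame_i, frame_j, threshold=10.0):
    # Track feature points from frame I to frame J and declare a TLD
    # failure (suspected occlusion) when the median residual displacement
    # exceeds the threshold, as in Table 1.
    gray_i = cv2.cvtColor(frame_i, cv2.COLOR_BGR2GRAY)
    gray_j = cv2.cvtColor(frame_j, cv2.COLOR_BGR2GRAY)

    pts_i = cv2.goodFeaturesToTrack(gray_i, maxCorners=100,
                                    qualityLevel=0.01, minDistance=7)
    if pts_i is None:
        return None, True

    # Pyramidal Lucas-Kanade optical flow: estimated positions in frame J.
    pts_j, status, _ = cv2.calcOpticalFlowPyrLK(gray_i, gray_j, pts_i, None)
    ok = status.ravel() == 1
    u = pts_i.reshape(-1, 2)[ok]
    d = pts_j.reshape(-1, 2)[ok] - u               # optical flow vectors d
    if d.size == 0:
        return None, True

    # Residual of each point against the median flow: |d_i - median(d)|.
    residuals = np.linalg.norm(d - np.median(d, axis=0), axis=1)
    if np.median(residuals) > threshold:
        return None, True                          # tracker would fail here

    return u + d, False                            # new positions V = u + d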
RMaTV algorithm-based pose refinement

A rotational matrix and translation vector (RMaTV) algorithm, proposed by [3], is used to eliminate the geometric error with the help of rotational and translation vectors. The ICP used to register the images produces higher image overlay accuracy with the use of the RMaTV algorithm. The RMaTV algorithm helps to eliminate the estimation of a wrong pose and hence improves the image overlay accuracy.

Proposed solution

A range of techniques and models from existing augmented reality-based surgery have been analyzed and reviewed in depth to identify strengths and weaknesses. The main problems relating to augmented reality-based surgery are accuracy, processing time, noise, and occlusion handling. Most models have primarily focused on accuracy and processing time and, to a lesser extent, on noise removal and occlusion handling. The [3] model has been selected as the base model for the proposed solution, which includes a range of features from the base model. In addition, it proposes a noise-filtered video frame and occlusion removal based on an enhanced TLD algorithm to overcome the noise and occlusion problems in augmented reality-based corrective jaw surgery. This improves the tracking of the jaw, which in turn improves the image registration through better tracking and detection of the region of interest (jaw).

Furthermore, features from the second-best solution were adapted to improve the processing time through a high-performance optical camera, as shown in Fig. 4 [18], eliminating the need for regular re-calibration and maintenance of the stereo camera (Fig. 2). This also improves the processing time by removing the conversion of 3D stereo video frames to the 2D video frames used in TLD (Fig. 4). Using the optical camera improves the viewing angle for the surgeon by not limiting the view of the object of interest (jaw) to only one defined angle. The optical camera captures the real-time surgical video with a single high-definition camera, and the remainder of the state-of-the-art solution is followed.

(Fig. 4: The proposed AR solution.)

We propose an enhanced video frame with noise removal and an enhanced TLD with an occlusion removal system to remove noise in the real-time video frames and occlusion in the tracking and detecting phase. This improves the tracking accuracy and also the augmented accuracy by reducing the overlay error to 0.23~0.35 mm compared to 0.35~0.45 mm. The processing time was improved from 6~11 frames per second to 8~12 frames per second.

TLD algorithm history

TLD, also known as tracking-learning-detection, was developed by [7]. The recorded 2D video, in the intra-operative phase, needs to be sent through a TLD algorithm to find and segment the exact location of the surgical area, with bounding-box tracking used to reduce the search area and speed up the process. However, TLD cannot handle full occlusion or movement of the object of interest out of the frame, and it terminates when these two factors occur [7]. The TLD has in recent years undergone significant re-development to improve it, yet the TLD algorithm remains a major subject for research. The TLD algorithm has three parts: tracking, learning, and detection.

Area of improvement

The proposed modification focuses on the first stage of TLD, tracking, to improve the tracking accuracy of the object of interest for image registration. So far, the TLD algorithm has failed if the object of interest is occluded or exits the video frame: it cannot deal with the loss of the object of interest, proposes a wrong estimate of the object's position, and finally fails. The TLD algorithm requires re-initiation after failure, because the feature points are lost once the tracker fails and need to be assigned again for tracking. This slows the algorithm, as the search for feature points takes longer when the object is occluded.

The proposed system consists of three major parts, as shown in Fig. 4: the pre-operative environment, the intra-operative environment with noise removal and tracking with occlusion removal, and RMaTV algorithm-based pose refinement.

Pre-operative environment

In the pre-operative environment, a CT scan of the patient is taken and segmented. A hierarchy of the segmented image model is created so that the real-time images can be matched with these aspect graph images from different angles and perspectives. The CT scan is used because it is superior to other medical images in providing details and information about bones and nerves. An illustrative sketch of such a segmentation step follows.
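As an illustration of the segmentation step, bone can be roughly separated from a CT volume by thresholding in Hounsfield units and keeping the largest connected component. This is a minimal sketch under assumed conventions (a NumPy volume already calibrated in Hounsfield units, and a 300 HU bone threshold); the paper does not specify which segmentation method was used.

import numpy as np
from scipy import ndimage

def segment_bone(ct_hu, threshold=300):
    # Bone is radiodense, so a Hounsfield-unit threshold gives a first mask;
    # keeping the largest connected component suppresses isolated speckle.
    mask = ct_hu >= threshold
    labels, n = ndimage.label(mask)
    if n == 0:
        return mask
    sizes = ndimage.sum(mask, labels, index=range(1, n + 1))
    return labels == (1 + int(np.argmax(sizes)))

The resulting binary volume would then be cropped to the jaw and rendered from multiple viewpoints to build the aspect graph described above.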
Intra-operative environment using the optical camera

An optical camera is used to capture real-time videos during surgery. This reduces the processing time by eliminating the need to convert the 3D video frames into the 2D video frames used in TLD tracking, and it also eliminates the necessity of regular re-calibration and maintenance of the stereo camera (Fig. 4). With the use of the optical camera, the viewing angle onto the object of interest improves, as the view is not restricted in the way it is with the stereo camera.

When the video frames are generated from the video, they contain noise from the vibration of surgical machinery, the image sensors used, the sensors used to monitor the patient's health condition, and the movements of the patient. This noise deteriorates the quality of the image, especially the object edges present in the video, which eventually affects the image registration accuracy. Thus, the modified kernel non-local means (MKNLM) noise filter proposed by [23] is used to eliminate the noise from the real-time video frames. The MKNLM filter is robust: it removes the noise from the image frame because it is insensitive to outliers and produces accurate results on a consistent basis. The use of the MKNLM filter in the system removes the noise from the image frame and preserves the image edges from deterioration, which plays a vital role in the image overlay process (Appendix). A small sketch of non-local means de-noising follows.
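To show the role of the weighting kernel in non-local means de-noising, the following is a minimal pixel-wise sketch in which patch differences are weighted by a Gaussian kernel before the similarity weights are computed. It illustrates the general non-local means scheme only; the specific modified kernel of [23] is not reproduced here, and the patch size, search radius, and filtering parameter h are illustrative assumptions.

import numpy as np

def nlm_denoise(img, patch=3, search=7, h=10.0):
    # Pixel-wise non-local means on a float grayscale image: each pixel is
    # replaced by a weighted average of pixels in its search window, where
    # the weights fall off with the kernel-weighted distance between the
    # surrounding patches, so repeated structures are averaged while edges
    # are preserved.
    pad = patch + search
    padded = np.pad(img.astype(np.float64), pad, mode='reflect')
    out = np.zeros_like(img, dtype=np.float64)

    # Gaussian weighting kernel over the patch; this is the component that
    # the MKNLM filter of [23] modifies.
    ax = np.arange(-patch, patch + 1)
    gx, gy = np.meshgrid(ax, ax)
    kernel = np.exp(-(gx ** 2 + gy ** 2) / (2.0 * patch ** 2))
    kernel /= kernel.sum()

    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            ci, cj = i + pad, j + pad
            ref = padded[ci - patch:ci + patch + 1, cj - patch:cj + patch + 1]
            weights, values = [], []
            for di in range(-search, search + 1):
                for dj in range(-search, search + 1):
                    ni, nj = ci + di, cj + dj
                    cand = padded[ni - patch:ni + patch + 1,
                                  nj - patch:nj + patch + 1]
                    d2 = np.sum(kernel * (ref - cand) ** 2)
                    weights.append(np.exp(-d2 / (h * h)))
                    values.append(padded[ni, nj])
            w = np.asarray(weights)
            out[i, j] = np.dot(w, np.asarray(values)) / w.sum()
    return out

This direct implementation is slow but shows why the method is insensitive to outliers: an outlier patch produces a large kernel-weighted distance d2 and therefore a near-zero weight, so it barely contributes to the average.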
A hierarchy model of the image is created with five levels of images with respect to resolution. The highest-level image (lowest resolution) is sent to the TLD for tracking and detecting the object of interest for online matching. The lowest-resolution image is used in TLD to decrease the processing time, as high-resolution images take more time to process than low-resolution images. The TLD searches for the bounding box and matches it with the aspect graph from the offline phase. The bounding box is used to decrease the search area for the object of interest within the image frame and hence speed up the online matching. It also reduces the possibility of matching with an object that is outside the bounding box. This matching process continues until the lowest-level image is reached. However, if the bounding box cannot be found, the occlusion removal technique described below is applied.

Occlusion removal using an image reconstruction-based technique

The TLD method of tracking and detecting an object of interest uses the feature points within the bounding box from the initial frame to track the same feature points in the next image frame with the help of a Lukas-Kanade median flow tracker [22]. This tracker fails if the object of interest is occluded. With the proposed solution, the failure of the tracker can be prevented by reconstructing the occluded object of interest, as shown in Eq. (3) below.

Proposed equation

Image reconstruction-based occlusion removal uses a technique based on image pixel classification by [21]. The output of the image pixel classification indicates whether a pixel belongs to the object of interest or not, and it is the key to achieving a high-quality reconstructed image after occlusion has occurred. The inputs are the superposition of the shifted elemental images, the center position of the image sensor, the number of pixels from the object used to reconstruct the occluded pixels, and the weight of each image pixel, either 1 or 0 depending on whether the pixel belongs to the object of interest or not. The weight is restricted to 1 or 0 to eliminate the use of any pixels from the occluding object, as the use of such pixels would result in a wrong reconstruction of the occluded object of interest, leading to a higher image overlay error. A pixel is determined to be an object pixel when its statistical variance is below the threshold, and its weight is then set to 1. For each occluded pixel position that belongs to the object of interest, the following equation can be used to reconstruct the occluded image (J), hence removing the occlusion, as in Eq. (3):

J = \frac{1}{H} \sum_{i=1}^{N} \sum_{j=1}^{M} E\left(x + \frac{1}{M_0} C_x, y + \frac{1}{M_0} C_y\right) \, W\left(x + \frac{1}{M_0} C_x, y + \frac{1}{M_0} C_y\right)    (3)

where E is the superposition of the shifted elemental images, C_x and C_y are the center position of the image sensor in x and y, and H is the number of pixels from the object class, calculated as Eq. (4) using the technique proposed by [21]:

H = \sum_{i=1}^{N} W\left(x + \frac{1}{M_0} C_x, y + \frac{1}{M_0} C_y\right)    (4)

where W is the weight in {1, 0} (1 if the pixel belongs to the object, else 0), (x, y) is a point belonging to the object, i and j index the rows and columns of the sensor, and N and M are the numbers of intensity samples of the point. Equation (5) is used to calculate the weight of a pixel in terms of 0 and 1:

w = 1 if v < t, else w = 0    (5)

where v is the statistical variance of the pixel being calculated, given as Eq. (6), and t is the pre-defined threshold [21]:

v = \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} (I - E)^2    (6)

where E is the statistical mean of the pixel being calculated, which is given as Eq. (7) [21]:

E = \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} I    (7)

where I is the intensity of the pixel being calculated.

Table 2 Filtered video frame and enhanced TLD with occlusion removal
Algorithm: proposed filtered video frame and enhanced TLD with occlusion removal.
Input: two image frames, image1 (I) and image2 (J).
Output: noise-free and occlusion-removed image frame.
BEGIN
Step 1: Get the lowest-resolution image frame from the hierarchy of images.
Step 2: Apply the MKNLM noise removal filter to the image frame.
Step 3: Get the 2D image position of the feature point in image I: u = (x, y), where u is the image point, x its position on the x-axis, and y its position on the y-axis.
Step 4: Calculate the optical flow (d) and the median displacement of each coordinate of the point.
Step 5: Check whether the median displacement is greater than the threshold: |d_i - d̄| > 10 pixels (threshold), where d_i is the displacement of a single point in the optical flow and d̄ is the median displacement.
Step 6: If the median displacement is greater than the threshold, apply the occlusion removal algorithm of Eqs. (3)-(7).
Step 7: Calculate the new position of the image point in image J: V = u + d, where V is the new position of the feature point, u is the original position of the feature point in image I, and d is the optical flow of the feature point.
Step 8: Repeat steps 3-7 for each feature point.
END

Why image reconstruction-based occlusion removal?

Image reconstruction-based occlusion removal is a simple technique for reconstructing the occluded object of interest: the occlusion is removed by classifying each image pixel as belonging to the object of interest or not. With the aid of this algorithm, the occluded object is reconstructed so that the TLD does not fail. The current Lucas-Kanade median optical flow tracker (Eq. 2) cannot handle the occlusion of the object of interest. A schematic sketch of the weighted reconstruction follows.
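The pixel classification of Eqs. (5)-(7) and the weighted superposition of Eqs. (3)-(4) can be illustrated with the following NumPy sketch. It is schematic, not the method of [21] verbatim: it assumes a stack of elemental images already shifted so that the object plane is aligned, and, for simplicity, it classifies each sample by its deviation from the per-pixel median rather than by the variance test of Eq. (6); the threshold is illustrative.

import numpy as np

def remove_occlusion(shifted, t=30.0):
    # shifted: array of shape (N, M, H, W), an N x M sensor grid of
    # elemental images already shifted so the object of interest is aligned
    # across the stack. Samples that deviate strongly from the per-pixel
    # consensus come from occluders at other depths and get weight 0
    # (cf. Eq. 5); the remaining samples are averaged (cf. Eqs. 3-4).
    n, m, height, width = shifted.shape
    stack = shifted.reshape(n * m, height, width).astype(np.float64)

    consensus = np.median(stack, axis=0)             # robust object estimate
    w = (np.abs(stack - consensus) < t).astype(np.float64)  # binary weights W
    h_count = np.maximum(w.sum(axis=0), 1.0)         # H: object-class count
    return (stack * w).sum(axis=0) / h_count         # weighted reconstruction

Feeding the reconstructed, occlusion-free patch back to the tracker lets the feature points be found again before the median-displacement test of Table 2 forces a TLD failure.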
The current TLD employs the Lukas-Kanade median flow tracker for tracking the object of interest. The tracker fails when it cannot track the displacement of the feature points it is tracking in the object of interest. This results in a failure of the entire tracking-learning-detection process and requires re-initialization, slowing down the whole process. The additional time required for the tracker to search for the feature points after the occlusion occurs, plus the re-initialization time after failure, results in an increase in the processing time.

The proposed system detects tracker failure by calculating the median displacement of the feature points. If the median displacement is greater than the threshold (generally 10 pixels), the tracker detects an occlusion and would fail. After the detection of the occlusion, the proposed system (Table 2), based on image reconstruction-based occlusion removal, is executed and removes the occlusion before the tracker fails. This not only improves the tracking and detecting accuracy, it ultimately improves the registration accuracy. It further reduces the processing time by removing the prolonged search for feature points after an occlusion occurs and the re-initialization time after a failure. The processing speed of the system also increases with the use of the optical camera, as it removes the overhead of converting 3D images into 2D image frames for tracking and detection. In addition, the image overlay accuracy increases with the help of noise removal by the modified kernel non-local means (MKNLM) filter proposed by [23].

With the proposed method (Table 2 explains the steps involved), the MKNLM filter removes the noise from the live image frame, and the occlusion is removed through image reconstruction-based occlusion removal after the tracker detects the occlusion; the state-of-the-art solution, in contrast, has no noise removal technique for live video frames, and its TLD method has no ability to remove the occlusion and thus fails when the occlusion occurs. The proposed system can produce an image overlay error of 0.23~0.35 mm compared to the 0.35~0.45 mm produced by the state of the art in a jaw image overlay. Furthermore, the proposed system is able to achieve a processing speed of 8-12 frames per second compared to the 6-11 frames per second achieved by the state of the art. The proposed filtered video frame and enhanced TLD with occlusion removal system are presented in Table 2, and the flowchart is illustrated in Fig. 5.

(Fig. 5: Flowchart for the proposed algorithm. The first image frame is selected, noise is removed with the MKNLM filter, feature points are selected, and the optical flow and median displacement are calculated. If the median displacement exceeds the threshold, an occlusion has occurred and is removed using the image reconstruction technique; otherwise, the bounding box is displayed and the feature point positions are calculated in the next image frame.)

RMaTV algorithm-based pose refinement

The rotational matrix and translation vector (RMaTV) algorithm proposed by [3] is used to eliminate the geometric error with the help of rotational and translation vectors. The ICP used to register the images produces a higher image overlay accuracy with the use of the RMaTV algorithm. The RMaTV algorithm helps to eliminate the estimation of a wrong pose and hence improves the image registration accuracy. A sketch of the rigid-pose estimation underlying such refinement follows.
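For context, the core of such pose refinement is the closed-form estimation of a rotation matrix and translation vector that best align corresponding 3D point sets, the step that ICP repeats at every iteration. The sketch below shows only this standard SVD-based step; it is not the enhanced RMaTV algorithm of [3], whose details are given in that paper.

import numpy as np

def estimate_rigid_pose(src, dst):
    # Least-squares rigid transform (R, t) mapping Nx3 points src onto
    # their correspondences dst, via SVD of the cross-covariance matrix.
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    h = (src - c_src).T @ (dst - c_dst)       # 3x3 cross-covariance
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))    # guard against reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T   # rotation matrix
    t = c_dst - r @ c_src                     # translation vector
    return r, t

One ICP pass would re-pair each transformed source point with its nearest destination point and call estimate_rigid_pose again, stopping when the alignment error no longer decreases.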
Table 3 Results for the mandibular and maxillary jaw bone samples. For each of the ten samples, the original video and the processed sample are shown, and the accuracy (image overlay error, mm) and processing time (frames per second) of the current solution and the proposed solution are reported at four stages: image registration, image overlay, patient movement ("If patient moves?"), and surgical tool movement ("If surgical tools move?"). The samples are: (1) lower left mandible (age 27, male); (2) lower right mandible (age 37, male); (3) lower frontal mandible (age 42, male); (4) lower posterior mandible (age 17, male); (5) upper frontal maxilla (age 7, male); (6) upper right maxilla (age 18, male); (7) upper frontal maxilla (age 15, male); (8) upper left maxilla (age 32, female); (9) upper anterior maxilla (age 67, male); (10) upper posterior maxilla (age 42, female). Across the samples, the image registration stage shows an error of 0.7 mm at about 7 frames per second for both systems, while at the remaining stages the current solution produces overlay errors of roughly 0.35~0.45 mm at 6-11 frames per second against roughly 0.22~0.36 mm at 8-12 frames per second for the proposed solution.

Results

The proposed model was implemented in MATLAB R2017b [24]. The model was tested with the use of 10 video samples and 10 CT scan samples from various age groups, covering the maxillary and mandibular jaw bones (Table 3). The videos and CT scan samples were gathered from various sources that are available online for research and study purposes. The image overlay accuracy and the processing time were calculated to measure the performance of the proposed system. The jaw is divided into the lower left mandible, lower right mandible, lower front mandible, upper right maxilla, upper left maxilla, and upper frontal maxilla (Fig. 6).

(Fig. 6: Grouping of the jaw bone.)

The proposed system works in three main stages: the pre-operative stage, the intra-operative stage with noise removal and tracking (with occlusion removal), and pose refinement with RMaTV.
In the pre-operative phase, the CT scan images of the patient are collected and segmented as per the object of interest (jaw). These segmented images are used to create the aspect graph, which contains images of the various models used to match the online images from different perspectives (angles). The generation of the aspect graph depends on the type of camera parameters used and the jaw model used (Fig. 6). In our case, the aspect graph generation took less than 45 s.

In the intra-operative phase with noise removal and tracking with occlusion removal, optical cameras are used to capture the live surgical video. The video frames are generated from the real-time video, and the MKNLM filter is used to remove the noise from the image, improving the edges of the objects and preventing contour leakage through de-noising. A pyramid of the video frame images is created based on resolution. The highest-level image (the image with the lowest resolution) is taken by the TLD for tracking and detecting the object of interest. The tracking phase of the TLD algorithm uses the Lucas-Kanade median flow tracker to track the object of interest and display the bounding box. The Lucas-Kanade median flow tracker uses feature points from the object of interest and calculates their displacement (optical flow) to track the same feature points in the next image frame. But if the object of interest is occluded, then the Lucas-Kanade tracker cannot track the feature points and will eventually fail, resulting in failure of the TLD as well and requiring re-initialization after this process. Thus, the image reconstruction-based occlusion removal, which uses pixel classification to identify the object of interest (or not), is used to remove the occlusion and improve the quality of the reconstructed image.

A bounding box is detected which defines the object of interest through the hierarchy of images. The bounding box helps to decrease the search area and processing time. Ulrich's method is used for the initial alignment. This method eliminates the need for initial manual registration and decreases the possibility of human error. This helps to generate the 2D pose of the object of interest. Once the best image is found, it is sent for 3D pose refinement with the use of the ICP algorithm. In the pose refinement with RMaTV phase, a rotational matrix and a translation vector are used to eliminate the geometric error. This algorithm is used to eliminate the possibility of wrong pose selection, hence improving the image overlay and registration accuracy (Fig. 7).

(Fig. 7: a Real-time image. b Image registration. c Image overlay.)

Sample videos and images were used in MATLAB R2017b [24] to simulate the state-of-the-art system and the proposed system. A range of reports and graphs were generated in terms of accuracy and speed to evaluate and compare the state of the art with the proposed system. The comparison graphs are displayed below. Figure 8 compares features of the mandibular jaw in terms of image overlay accuracy, and Fig. 9 draws a comparison in terms of the processing time. Similarly, Figs. 10 and 11 compare the maxillary jaw bones in terms of accuracy and processing time between the state-of-the-art and the proposed systems.

(Fig. 8: Accuracy results in mandibular jaw bone samples: overlay error (mm) at the image registration, image overlay, patient movement, and instrument movement stages, for the state of the art and the proposed solution.)

(Fig. 9: Processing time results in mandibular jaw bone samples: frames per second at the same four stages, for the state of the art and the proposed solution.)
(Fig. 10: Accuracy results in maxillary jaw bone samples: overlay error (mm) at the image registration, image overlay, patient movement, and instrument movement stages, for the state of the art and the proposed solution.)

(Fig. 11: Processing time results in maxillary jaw bone samples: frames per second at the same four stages, for the state of the art and the proposed solution.)

Discussion

The main factors deciding the accuracy and speed of an augmented reality-based system are the image overlay error and the processing time. The image overlay error is the difference in the superimposition of the projected offline 2D image onto the real-time video. The processing time of the augmented reality system is the total number of image frames processed by the system in a given timeframe; we calculated the speed of the system in frames per second. The collected samples were simulated in MATLAB R2017b using both the state-of-the-art system and the proposed system. This simulation was carried out for different age groups, ranging from 7 to 67, and a variety of jaw bones were used (Fig. 6) to simulate and test the accuracy and processing time of both systems. During the simulation, the proposed system was able to achieve a lower image overlay error by approximately 0.12 mm and to improve the processing time by 3-4 frames per second. The test data are presented as bar graphs to compare the accuracy and processing time of the state-of-the-art and the proposed systems.

Table 3 represents the comparison between the accuracy and processing time achieved by the state-of-the-art solution and the proposed solution. The results for accuracy and processing time are compared in terms of registration, image overlay, patient movement, and surgical tool movement. The result achieved by the implementation of the proposed system, in terms of overlay error, was 0.23~0.35 mm, in comparison to the 0.35~0.45 mm achieved by the state of the art. Furthermore, the use of an optical camera reduces the system processing time by eliminating the necessity of converting the 3D video frames to 2D video frames in the online phase, leading to an increase in frames per second (8-12) in comparison to the 6~11 frames per second achieved by the state-of-the-art solution.

An augmented reality system is a combination of a range of techniques and methods that work simultaneously to provide a better AR result and view; when improved techniques are combined, they form a superior AR system. Our proposed system uses an optical camera to reduce the time overhead and a noise removal technique that improves the quality of the live video frames. The aspect graphs help to match the object in the real-time videos from various perspectives (angles, rotation, etc.). Initial registration through Ulrich's method eliminates the human error possible when the initial registration is done manually. The use of the TLD helps with long-term tracking and detection in real-time videos, and the use of the bounding box reduces the search area in the real-time video, reducing the processing time. The use of RMaTV removes the geometric error, improves the registration accuracy, and reduces the image overlay error.

A significant body of research exists in the field of augmented reality-based surgery, especially in constructive jaw surgery, but, to date, accuracy and processing time remain an area of concern. This study aimed to improve the current best solution, which produced an image overlay accuracy of 0.35~0.45 mm and a processing time of 6~11 frames per second. The proposed method of noise removal and image reconstruction-based occlusion removal in the TLD was simulated in MATLAB to demonstrate that the proposed method can reduce the image overlay error while positively affecting the processing time achieved by the state-of-the-art solution. The current method removes the noise from the real-time video frame and improves tracking and detection through occlusion removal, which improves the image overlay to 0.23~0.35 mm and achieves a processing speed of 8~12 frames per second. The two evaluation metrics can be made concrete with the short sketch below.
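As a concrete illustration of the two metrics, the sketch below computes an overlay error as the mean distance between corresponding projected and reference landmark points, and the processing speed as frames processed per second. The landmark-correspondence formulation is an illustrative assumption; the paper does not state how the overlay error was measured on the samples.

import time
import numpy as np

def overlay_error(projected_pts, reference_pts):
    # Mean Euclidean distance (e.g., in mm) between corresponding landmarks
    # of the projected virtual image and the real-time reference.
    return float(np.mean(np.linalg.norm(projected_pts - reference_pts, axis=1)))

def processing_speed(process_frame, frames):
    # Frames per second achieved by a frame-processing function.
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    return len(frames) / (time.perf_counter() - start)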
Future research

Future research may be able to improve the other stages of the TLD, namely learning and detection. Furthermore, the image reconstructed through the image reconstruction-based occlusion removal process could be improved, further improving the accuracy of the system.

Acknowledgements This work was supported in part by Study Support Manager Angelika Maag from the Sydney Study Centre of Charles Sturt University, Sydney, Australia.

Compliance with ethical standards

Conflict of interest The authors declare that they have no conflict of interest.

Ethical approval Not applicable.

Informed consent Not applicable.

Appendix

Table 4 Abbreviations for the terms used in the paper
AR: Augmented reality
ICP: Iterative closest point algorithm
TLD: Tracking-learning-detection algorithm
CT: Computed tomography
LKMF: Lukas-Kanade median flow tracker
RMaTV: Rotational matrix and translation vector algorithm
MKNLM: Modified kernel non-local means filter
3D: Three-dimensional
2D: Two-dimensional

References

1. Wang J, Suenaga H, Hoshi K, Yang L, Kobayashi E, Sakuma I, Liao H (2014) Augmented reality navigation with automatic marker-free image registration using 3-D image overlay for dental surgery. IEEE Trans Biomed Eng 61(4):1295-1304
2. Bruellmann D, Tjaden H, Schwanecke U, Barth P (2012) An optimized video system for augmented reality in endodontics: a feasibility study. Clin Oral Investig 17(2):441-448
3. Murugesan Y, Alsadoon A, Paul M, Prasad P (2018) A novel rotational matrix and translation vector (RMaTV) algorithms: geometric accuracy for augmented reality (AR) in oral and maxillofacial surgeries. Int J Med Rob Comput Assisted Surg 14:e1889. https://doi.org/10.1002/rcs.1889
4. Suenaga H, Tran H, Liao H, Masamune K, Dohi T, Hoshi K, Takato T (2015) Vision-based markerless registration using stereo vision and an augmented reality surgical navigation system: a pilot study. BMC Med Imaging 15(1):51. https://doi.org/10.1186/s12880-015-0089-5
5. Sielhorst T, Feuerstein M, Navab N (2008) Advanced medical displays: a literature review of augmented reality. J Disp Technol 4(4):451-467. https://doi.org/10.1109/jdt.2008.2001575
6. Wang J, Suenaga H, Liao H, Hoshi K, Yang L, Kobayashi E, Sakuma I (2015) Real-time computer-generated integral imaging and 3D image calibration for augmented reality surgical navigation. Comput Med Imaging Graph 40:147-159. https://doi.org/10.1016/j.compmedimag.2014.11.003
7. Kalal Z, Mikolajczyk K, Matas J (2012) Tracking-learning-detection. IEEE Trans Pattern Anal Mach Intell 34(7):1409-1422. https://doi.org/10.1109/tpami.2011.239
8. Choi H, Park Y, Lee S, Ha H, Kim S, Cho H, Hong J (2017) A portable surgical navigation device to display resection planes for bone tumor surgery. Minim Invasive Ther Allied Technol 26(3):144-150. https://doi.org/10.1080/13645706.2016.1274766
9. Wu J, Wang M, Liu K, Hu M, Lee P (2014) Real-time advanced spinal surgery via visible patient model and augmented reality system. Comput Methods Prog Biomed 113(3):869-881. https://doi.org/10.1016/j.cmpb.2013.12.021
10. Nakao M, Endo S, Nakao S, Yoshida M, Matsuda T (2016) Augmented endoscopic images overlaying shape changes in bone cutting procedures. PLoS One 11(9):e0161815. https://doi.org/10.1371/journal.pone.0161815
11. Chen X, Xu L, Wang Y, Wang H, Wang F, Zeng X, Wang Q, Egger J (2015) Development of a surgical navigation system based on augmented reality using an optical see-through head-mounted display. J Biomed Inform 55:124-131. https://doi.org/10.1016/j.jbi.2015.04.003
12. Hung K, Wang F, Wang H, Zhou W, Huang W, Wu Y (2017) Accuracy of a real-time surgical navigation system for the placement of quad zygomatic implants in the severe atrophic maxilla: a pilot clinical study. Clin Implant Dent Relat Res 19(3):458-465. https://doi.org/10.1111/cid.12475
13. Chen X, Xu L, Wang Y, Hao Y, Wang L (2016) Image-guided installation of 3D-printed patient-specific implant and its application in pelvic tumor resection and reconstruction surgery. Comput Methods Prog Biomed 125:66-78. https://doi.org/10.1016/j.cmpb.2015.10.020
14. Fitzpatrick J, West J, Maurer C (1998) Predicting error in rigid-body point-based registration. IEEE Trans Med Imaging 17(5):694-702. https://doi.org/10.1109/42.736021
15. Schicho K, Figl M, Seemann R, Donat M, Pretterklieber M, Birkfellner W et al (2007) Comparison of laser surface scanning and fiducial marker-based registration in frameless stereotaxy. J Neurosurg 106(4):704-709. https://doi.org/10.3171/jns.2007.106.4.704
16. Kilgus T, Heim E, Haase S, Prüfer S, Müller M, Seitel A, Fangerau M, Wiebe T, Iszatt J, Schlemmer HP, Hornegger J, Yen K, Maier-Hein L (2014) Mobile markerless augmented reality and its application in forensic medicine. Int J Comput Assist Radiol Surg 10(5):573-586. https://doi.org/10.1007/s11548-014-1106-9
17. Zinser M, Mischkowski R, Dreiseidler T, Thamm O, Rothamel D, Zöller J (2013) Computer-assisted orthognathic surgery: waferless maxillary positioning, versatility, and accuracy of an image-guided visualisation display. Br J Oral Maxillofac Surg 51(8):827-833. https://doi.org/10.1016/j.bjoms.2013.06.014
18. Wang J, Suenaga H, Yang L, Kobayashi E, Sakuma I (2016) Video see-through augmented reality for oral and maxillofacial surgery. Int J Med Rob Comput Assisted Surg 13(2):e1754. https://doi.org/10.1002/rcs.1754
19. Gold S, Rangarajan A, Lu C, Pappu S, Mjolsness E (1998) New algorithms for 2D and 3D point matching. Pattern Recogn 31(8):1019-1031. https://doi.org/10.1016/s0031-3203(98)80010-1
20. Ulrich M, Wiedemann C, Steger C (2012) Combining scale-space and similarity-based aspect graphs for fast 3D object recognition. IEEE Trans Pattern Anal Mach Intell 34(10):1902-1914. https://doi.org/10.1109/tpami.2011.266
21. Xiao J, Gerke M, Vosselman G (2012) Building extraction from oblique airborne imagery based on robust façade detection. ISPRS J Photogramm Remote Sens 68:65-68
22. Kalal Z, Mikolajczyk K, Matas J (2010) Forward-backward error: automatic detection of tracking failures. In: 2010 20th International Conference on Pattern Recognition. https://doi.org/10.1109/icpr.2010.675
23. Kazemi M, Mohammadi E, Sadeghi P, Menhaj M (2017) A non-local means approach for Gaussian noise removal from images using a modified weighting kernel. In: Iranian Conference on Electrical Engineering (ICEE), Tehran
24. He C, Liu Y, Wang Y (2016) Sensor-fusion based augmented-reality surgical navigation system. [Online]. Available: http://ieeexplore.ieee.org/document/7520404. Accessed 7 Jan 2018