4.1. Experimental Setup
Datasets. We validate our method on three datasets: the indoor datasets 3DMatch and 3DLoMatch, and the outdoor dataset KITTI. The 3DMatch dataset contains 1672 pairs of point clouds, while the more challenging 3DLoMatch dataset contains 1781 pairs with an overlap ratio of only 10% to 30%. For the KITTI dataset, we follow [22,23] and use 555 pairs of point clouds for testing.
Evaluation Criteria. We report registration recall (RR), the fraction of point cloud pairs registered within given error thresholds, as the main evaluation criterion. Additionally, following [29], we measure registration accuracy using the rotation error (RE) and translation error (TE). For the 3DMatch and 3DLoMatch datasets, a registration is considered successful when RE ≤ 15° and TE ≤ 30 cm; for the KITTI dataset, when RE ≤ 5° and TE ≤ 60 cm. Furthermore, following [29], because failed registrations can produce very large errors, the RE and TE reported in the experimental tables are computed over successfully registered pairs only.
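The criteria above can be sketched as follows. This is a minimal NumPy sketch that assumes the commonly used geodesic rotation error and Euclidean translation error. The function names (`rotation_error_deg`, `translation_error`, `summarize`) are hypothetical, and translations are assumed to be in metres, so the 3DMatch threshold of 30 cm becomes `te_thresh=0.30`.

```python
import numpy as np


def rotation_error_deg(R_est, R_gt):
    """Geodesic angle between the estimated and ground-truth rotations, in degrees."""
    cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def translation_error(t_est, t_gt):
    """Euclidean distance between translation vectors (same unit as the input)."""
    return float(np.linalg.norm(np.asarray(t_est) - np.asarray(t_gt)))


def summarize(results, re_thresh=15.0, te_thresh=0.30):
    """results: list of (R_est, t_est, R_gt, t_gt) with translations in metres.

    Returns RR in percent and the mean RE / TE over successful pairs only.
    """
    re_ok, te_ok = [], []
    for R_est, t_est, R_gt, t_gt in results:
        re = rotation_error_deg(R_est, R_gt)
        te = translation_error(t_est, t_gt)
        if re <= re_thresh and te <= te_thresh:  # pair counts as registered
            re_ok.append(re)
            te_ok.append(te)
    rr = 100.0 * len(re_ok) / len(results)
    mean_re = float(np.mean(re_ok)) if re_ok else float("nan")
    mean_te = float(np.mean(te_ok)) if te_ok else float("nan")
    return rr, mean_re, mean_te
```

For KITTI, the same function would be called with `re_thresh=5.0` and `te_thresh=0.60`. Averaging RE and TE only over successful pairs mirrors the reporting convention described above.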
Implementation Details. All input correspondence sets are generated with either the traditional FPFH descriptor or the learning-based FCGF descriptor. Following the clique-like construction methods described in Section 3.3, we define five types of clique-like and select their initial nodes. Following [22], the smallest clique-like contains 3 nodes, and following [23], the largest contains 20 nodes; the five clique-like sizes are thus 20, 15, 10, 5, and 3 in decreasing order. We then apply the clique-like sampling enhancement method proposed in Section 3.3 with a node demotion ratio of 0.5. Finally, hypotheses are generated from the correspondences in each clique-like using instance-equal SVD and evaluated with MAE and FS-TCD. All experiments are conducted on a machine with an Intel i9-12900K CPU (Intel Corporation, Santa Clara, CA, USA) and a single NVIDIA RTX 3090 GPU (NVIDIA, Santa Clara, CA, USA).
4.2. Results on 3DMatch Dataset
In this experiment, we first evaluate on the 3DMatch dataset. As shown in Table 1, with correspondences generated by the traditional FPFH descriptor, our CL-PCR outperforms all compared methods. In terms of RR, the most important evaluation metric for registration, our method improves on the state-of-the-art MAC algorithm by 4.38%, on SC2-PCR++ by 1.30%, and on the deep learning method PointDSC by 10.91%. Note that RE and TE are computed only over successful registrations. As a result, methods with a high RR tend to report larger errors, because they include more difficult-to-align pairs in their error statistics than methods with a low RR [23]. Nevertheless, our RE and TE remain competitive. When the FCGF descriptor is used, the correspondence inlier ratio is higher than with FPFH, as expected, which improves the performance of all correspondence-based methods. As shown in Table 1, our method still performs strongly in this setting, with an RR slightly lower than that of SC2-PCR++ but 0.18% higher than that of MAC. Notably, in subsequent experiments with the proposed optimization models, our Fast-CL-PCRv1 achieves the highest registration recall.
4.5. Analysis Experiments
In this section, we conduct ablation experiments on the 3DMatch and 3DLoMatch datasets using the FPFH descriptor and compare several algorithms introduced in Section 3. To eliminate the randomness introduced by the clique-like sampling enhancement method, we select the initial nodes sequentially, so that every set of ablation experiments receives identical inputs. This allows us to isolate the impact of each strategy choice from the variability of random node selection. The results for different combinations of methods are presented in Table 4 and Table 5. Additionally, we analyze the selection rate and recall rate of the five types of clique-like in our method, with the findings shown in Table 6. To further improve efficiency, we propose three optimization models, whose results are displayed in Table 7 and Table 8. Finally, we compare the performance upper bound of our method with that of SC2-PCR++ in Table 9.
Clique-like construction methods. We test the three clique-like construction methods described in Section 3. As shown in Table 4 and Table 5 (rows 1 to 3), the clique-like sampling enhancement method with the safe demotion strategy achieves the highest RR, improving on the normal construction method by 0.56% on the 3DMatch dataset and 0.17% on the 3DLoMatch dataset. These results indicate that demoting some nodes to smaller clique-like subsets, while retaining the original larger subsets, effectively mitigates outlier penetration and improves overall registration performance. In contrast, the sampling enhancement method with the default demotion strategy performs worse than the non-enhanced method, indicating that forcibly pruning certain subsets can be counterproductive.
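To make the difference between the two demotion strategies concrete, here is a hypothetical Python sketch. Section 3.3 is not reproduced here, so the mechanism is an assumption: each initial node is paired with one of the five clique-like sizes from Section 4.1, a random fraction `ratio` of the pairs (in expectation) is demoted to the next smaller size, and `safe=True` keeps the original larger entry while `safe=False` replaces it. The name `demote` and its parameters are illustrative only.

```python
import random

SIZES = [20, 15, 10, 5, 3]  # clique-like sizes listed in Section 4.1


def demote(assignments, ratio=0.5, safe=True, seed=0):
    """assignments: list of (initial_node, size) pairs.

    Hypothetical sketch: each pair (except those at the smallest size) is
    demoted to the next smaller size with probability `ratio`.
    safe=True keeps the original larger clique-like as well;
    safe=False replaces it (forced pruning).
    """
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    out = []
    for node, size in assignments:
        idx = SIZES.index(size)
        can_demote = idx + 1 < len(SIZES)
        if can_demote and rng.random() < ratio:
            if safe:
                out.append((node, size))  # retain the larger subset
            out.append((node, SIZES[idx + 1]))  # add the demoted subset
        else:
            out.append((node, size))
    return out
```

With `safe=True` the candidate set only grows, so no hypothesis from a larger clique-like is lost, whereas `safe=False` discards some larger subsets, which corresponds to the forced pruning discussed above.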
Choices of graph matrix construction. In this paper, the graph matrix is used twice. The first use is for sorting the initial nodes, where we use the more relaxed first-order matrix (FM) by default. The second use is during clique-like construction, where we compare the effectiveness of different graph matrices. As shown in Table 4 and Table 5 (rows 3 and 4), using the second-order matrix (SM) instead of FM improves RR by 2.47% on the 3DMatch dataset and by 6.85% on the 3DLoMatch dataset. This suggests that SM distinguishes inliers from outliers more effectively when identifying neighboring nodes for the initial node, which yields more robust clique-likes and better registration results.
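A minimal NumPy sketch of the two graph matrices and of growing a clique-like around an initial node follows. It assumes a soft length-consistency score for FM (zero once the distance discrepancy reaches `tau`) and the SC2-PCR-style second-order form SM = FM ⊙ (FM · FM); the exact kernels used by CL-PCR are defined in Section 3 and may differ. Treating a clique-like as the initial node plus its highest-scoring neighbours is likewise an assumption, and all function names are hypothetical.

```python
import numpy as np


def first_order_matrix(src, dst, tau):
    """src, dst: (N, 3) arrays; correspondence i is src[i] <-> dst[i].

    Assumed soft length-consistency score, zero when the discrepancy >= tau.
    """
    d_src = np.linalg.norm(src[:, None, :] - src[None, :, :], axis=-1)
    d_dst = np.linalg.norm(dst[:, None, :] - dst[None, :, :], axis=-1)
    diff = np.abs(d_src - d_dst)
    fm = np.clip(1.0 - diff**2 / tau**2, 0.0, None)
    np.fill_diagonal(fm, 0.0)  # no self-compatibility
    return fm


def second_order_matrix(fm):
    """Second-order compatibility: FM weighted by shared compatible neighbours."""
    return fm * (fm @ fm)


def build_clique_like(graph, seed_node, size):
    """Seed node plus its (size - 1) highest-scoring neighbours in `graph`."""
    order = np.argsort(-graph[seed_node])
    neighbours = [int(j) for j in order if j != seed_node][: size - 1]
    return [seed_node] + neighbours
```

Calling `build_clique_like(sm, seed, k)` for k in (20, 15, 10, 5, 3) would produce five nested candidate sets for one initial node. The second-order product rewards pairs that share many mutually compatible neighbours, which is one way to see why it separates inliers from outliers more sharply than FM.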
Weighted SVD vs. instance-equal SVD. We compare instance-equal SVD with weighted SVD in Table 4 and Table 5 (rows 3 and 8). With instance-equal SVD, our method achieves a 0.3% higher RR on the 3DMatch dataset and a 0.4% higher RR on the 3DLoMatch dataset than with weighted SVD. Although weighted SVD is widely used in SC2-PCR and other state-of-the-art methods, the unweighted approach is more effective for hypotheses generated from small clique-like subsets with few nodes, because it reduces the interference of global information and thereby facilitates the generation of correct hypotheses.
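Both variants solve the same weighted least-squares alignment and differ only in the per-correspondence weights. The sketch below is the standard Kabsch/Umeyama closed form (without scaling) in NumPy; passing `weights=None` gives the instance-equal variant, while any non-negative weight vector (for example, compatibility scores, which is an assumption here) gives the weighted variant. The function name `kabsch` is our own label.

```python
import numpy as np


def kabsch(src, dst, weights=None):
    """Rigid (R, t) minimising sum_i w_i * ||R @ src_i + t - dst_i||^2.

    weights=None -> instance-equal SVD (every correspondence counts the same).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    w = np.ones(len(src)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    mu_s = w @ src  # weighted centroids
    mu_d = w @ dst
    H = (src - mu_s).T @ (w[:, None] * (dst - mu_d))  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_d - R @ mu_s
    return R, t
```

For a 3-node clique-like, the instance-equal variant lets all three correspondences contribute equally to the estimated transform.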
Evaluation metrics selection. For hypothesis evaluation, we compare three model selection metrics, IC, MAE, and MSE, together with the recently proposed reselection metric FS-TCD. As shown in Table 4 and Table 5 (rows 3 and 7), our method performs best with MAE + FS-TCD. Adding FS-TCD reselection improves RR by 4.87% on the 3DMatch dataset and by 4.27% on the 3DLoMatch dataset, indicating that more advanced metrics are better at evaluating hypotheses generated from the fewer but more reliable nodes of our clique-likes. When using reselection, as shown in Table 4 and Table 5 (rows 3, 5, and 6), MAE yields a 0.99% higher RR than MSE on the 3DMatch dataset and a 0.23% higher RR on the 3DLoMatch dataset. MAE also matches IC in RR on the 3DMatch dataset while reducing RE by 0.02° and TE by 0.02 cm, and on the 3DLoMatch dataset it outperforms IC in RR by 0.96%.
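The three single-stage metrics can be viewed as scores over the residuals of a hypothesis. The NumPy sketch below assumes the truncated forms popularised by MAC: IC counts residuals below a threshold `tau`, MAE sums (tau - e)/tau over those residuals, and MSE sums the squared version. FS-TCD reselection is not sketched because its definition is not given in this section; all function names are hypothetical.

```python
import numpy as np


def residuals(R, t, src, dst):
    """Per-correspondence alignment error ||R @ src_i + t - dst_i||."""
    return np.linalg.norm(src @ R.T + t - dst, axis=1)


def score_ic(e, tau):
    """Inlier count: number of residuals below tau."""
    return float(np.sum(e < tau))


def score_mae(e, tau):
    """Truncated MAE-style score (assumed form): sum of (tau - e) / tau over inliers."""
    m = e < tau
    return float(np.sum((tau - e[m]) / tau))


def score_mse(e, tau):
    """Truncated MSE-style score (assumed form): sum of ((tau - e) / tau)^2 over inliers."""
    m = e < tau
    return float(np.sum(((tau - e[m]) / tau) ** 2))


def select_hypothesis(hypotheses, src, dst, tau, score=score_mae):
    """Return the index and score of the best (R, t) hypothesis."""
    scores = [score(residuals(R, t, src, dst), tau) for R, t in hypotheses]
    best = int(np.argmax(scores))
    return best, scores[best]
```

Compared with IC, the MAE score rewards hypotheses whose inliers have small residuals rather than merely counting them, and it penalizes moderate residuals less sharply than MSE.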
Clique-like model selection analysis. To further assess the effectiveness of our method across different clique-like sizes, we evaluate the selection and recall of the five clique-like sizes on the 3DMatch and 3DLoMatch datasets using the FPFH and FCGF descriptors. The results are summarized in Table 6. The AS statistic shows that the five clique-like models with different node counts contribute evenly to the final model selection, indicating that each size is useful and that no single size dominates. The AP statistic reflects the recall of each clique-like model; notably, the model with only three nodes achieves the highest recall. This finding highlights the effectiveness of our subset mining process in identifying valuable hypotheses.
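Table 6 is not reproduced here, and AS and AP are not formally defined in this section. Assuming that AS is the share of pairs whose final hypothesis comes from each clique-like size and that AP is the fraction of pairs each size alone would register correctly, the two statistics could be tallied as in this hypothetical Python sketch (`selection_share` and `per_size_recall` are illustrative names).

```python
from collections import Counter

SIZES = (20, 15, 10, 5, 3)  # clique-like sizes from Section 4.1


def selection_share(selected_sizes):
    """Assumed AS: fraction of pairs whose final hypothesis came from each size."""
    counts = Counter(selected_sizes)
    n = len(selected_sizes)
    return {k: counts[k] / n for k in SIZES}


def per_size_recall(success_by_size):
    """Assumed AP: success_by_size is a list of dicts {size: bool}, one per pair,
    telling whether the best hypothesis of that size alone registers the pair."""
    n = len(success_by_size)
    return {k: sum(rec[k] for rec in success_by_size) / n for k in SIZES}
```

An even AS distribution means no size is redundant, while AP shows how much each size could achieve on its own.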
Model optimization analysis. In this experiment, we optimize several components of CL-PCR to improve efficiency. FS-TCD is particularly useful when the inlier count metric becomes unreliable, which happens mainly for point cloud pairs with low overlap; we regard an inlier count below 100 as no longer fully reliable. Moreover, smaller clique-likes are more effective in extreme cases, as shown in Figure 2, whereas evaluating all hypotheses is time-consuming for simpler pairs. We therefore favor smaller clique-likes for deeper mining when the inlier count is below 30, and reduce the resources allocated to simpler pairs. The inlier count is estimated by the highest MAE score among the hypotheses. Based on these observations, we introduce three optimization models built on different algorithms, which substantially reduce computation time without compromising accuracy. Their results are presented in Table 7 and Table 8.
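The thresholding rule described above can be written as a small dispatch function. This Python sketch is hypothetical: it only encodes the two thresholds stated in the text (100 and 30) and uses the best MAE score as the inlier-count proxy. The returned flag names are illustrative and do not describe the actual configuration of the three optimization models.

```python
def plan_for_pair(best_mae_score, reliable=100, hard=30):
    """Choose how much effort to spend on a pair (hypothetical policy).

    best_mae_score: highest MAE score among the hypotheses, used as a
    proxy for the inlier count.
    """
    if best_mae_score >= reliable:
        # Easy pair: the inlier count is trustworthy, keep the MAE choice
        # and skip the more expensive steps.
        return {"reselect_with_fs_tcd": False, "extra_small_clique_likes": False}
    if best_mae_score >= hard:
        # Inlier count no longer fully reliable: reselect with FS-TCD.
        return {"reselect_with_fs_tcd": True, "extra_small_clique_likes": False}
    # Extreme case: also mine smaller clique-likes more deeply.
    return {"reselect_with_fs_tcd": True, "extra_small_clique_likes": True}
```

Because most pairs in easy scenes exceed the upper threshold, a rule of this kind spends extra computation only on the pairs that need it.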
Performance upper bound analysis. Previous experiments show that, although FS-TCD is a strong metric, RR can increase after pruning, which suggests that FS-TCD does not always select the best hypothesis. In this section, we therefore assess the performance upper bound of our method under ideal conditions. We define an ideal evaluation metric as one that always selects a correct hypothesis whenever one is generated, so that registration success depends only on hypothesis generation. This allows us to compare our method with SC2-PCR++ in terms of their upper bounds.
We compare different thresholds on the number of correct hypotheses to gauge the reliability of our method relative to SC2-PCR++. The results in Table 9 show that our method outperforms SC2-PCR++ at all tested thresholds. Specifically, in the success-1 experiment, our method achieves a registration recall of 96.00% and 98.34% on the 3DMatch dataset using the FPFH and FCGF descriptors, respectively, surpassing SC2-PCR++ by 4.5% and 1.54%. On the 3DLoMatch dataset, our method achieves 70.19% and 87.37% recall with FPFH and FCGF, respectively, exceeding SC2-PCR++ by 20.84% and 10.33%. These results indicate that our method generates correct hypotheses for most point cloud pairs in the 3DMatch dataset, and that its advantage is even larger on the challenging low-overlap 3DLoMatch dataset. Although current evaluation metrics do not yet allow our method to reach this upper bound, the success-1 results show that it has a higher performance ceiling than SC2-PCR++, so improved evaluation metrics should translate into further gains.
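Under the ideal-metric assumption, the upper bound reduces to counting correct hypotheses per pair. The Python sketch below assumes that success-k means at least k correct hypotheses were generated for a pair, and that a hypothesis is correct when it meets the same RE/TE thresholds as a successful registration (15° and 30 cm for 3DMatch, with TE in metres). `count_correct` and `upper_bound_recall` are hypothetical names.

```python
def count_correct(hyp_errors, re_thresh=15.0, te_thresh=0.30):
    """hyp_errors: list of (RE in degrees, TE in metres), one per hypothesis."""
    return sum(1 for re, te in hyp_errors if re <= re_thresh and te <= te_thresh)


def upper_bound_recall(correct_counts, k):
    """Percentage of pairs with at least k correct hypotheses (success-k)."""
    hits = sum(1 for c in correct_counts if c >= k)
    return 100.0 * hits / len(correct_counts)
```

Raising k makes the bound stricter, since a pair then needs several correct hypotheses rather than a single lucky one.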
4.6. Real Data Experiments
In this experiment, we use three real point cloud datasets collected with a 3D laser scanner equipped with a 3-megapixel industrial camera. These datasets capture various object parts, as shown in Figure 4, and the processing workflow is illustrated in Figure 5. The 3D camera first captures the object’s 3D information, which is converted into point cloud format. To make registration more challenging, we crop the point clouds and add noise. We then downsample the point clouds with a 1 cm voxel size and extract FPFH features to generate correspondences. Finally, we perform registration with our Faster-CL-PCR method.
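The cropping and voxel downsampling steps can be sketched in plain NumPy. This sketch rests on assumptions and is not the authors' pipeline: `voxel_downsample` replaces the points in each voxel by their centroid (a common convention, also used by Open3D), `crop_box` keeps an axis-aligned box, and coordinates are assumed to share the voxel's unit (cm here). FPFH extraction and matching are omitted, and both function names are hypothetical.

```python
import numpy as np


def voxel_downsample(points, voxel):
    """Replace all points that fall into the same voxel by their centroid."""
    keys = np.floor(points / voxel).astype(np.int64)  # integer voxel indices
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # 1-D across NumPy versions
    n = int(inverse.max()) + 1
    sums = np.zeros((n, 3))
    np.add.at(sums, inverse, points)  # accumulate coordinates per voxel
    counts = np.bincount(inverse, minlength=n)
    return sums / counts[:, None]


def crop_box(points, lo, hi):
    """Keep only the points inside the axis-aligned box [lo, hi]."""
    mask = np.all((points >= lo) & (points <= hi), axis=1)
    return points[mask]
```

A larger voxel yields fewer points and therefore fewer correspondences, which is the trade-off examined with Figure 6b,c.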
In this section, we compare the registration performance of our method with that of the state-of-the-art method SC2-PCR++ on real datasets with varying overlap rates. The results of this comparison are presented in Table 10. Additionally, we investigate the impact of different sampling degrees and noise levels on registration performance, with the results illustrated in Figure 6.
Effect of different overlap rates on results. To further validate the effectiveness and robustness of our method in low-overlap scenarios, we test overlap rates ranging from 0.1 to 1 on the real datasets. As shown in Table 10, our method performs consistently well across overlap rates, with most errors remaining below 1, indicating that it is robust on real data even at low overlap. While SC2-PCR++ performs admirably on public datasets, our method yields smaller errors across the different overlap rates in these real scenarios. In addition, our method registers point clouds faster than SC2-PCR++, further demonstrating its practical advantages.
PCR processing capability of our algorithm. To further analyze how our algorithm handles real point clouds, we focus on part (a) in Figure 4 with an overlap rate of 0.4. We first investigate the impact of different noise levels. The original point cloud of the train component in part (a) contains 1,501,616 points, with a minimum bounding box of 86 cm × 53 cm × 86 cm and an average nearest-neighbor distance of 0.036 cm. We add Gaussian noise with a standard deviation ranging from 0 to 20 times this distance, which serves as the noise-level unit. The error curves of the registration results are shown in Figure 6a. Our algorithm remains highly robust despite substantial noise: when the noise level is below 10, the rotation error stays under 1° and the translation error below 1 cm (as indicated by the reference line). Notably, with no added noise, our method achieves an error close to 0, showing that it copes well with the noise already present in the raw data.
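The noise protocol can be sketched as follows. In this NumPy sketch the noise unit is the mean nearest-neighbour distance and `level` is the multiplier (0 to 20 in the experiment). The brute-force distance matrix is only practical for small clouds; for a cloud of 1.5 million points a KD-tree (for example `scipy.spatial.cKDTree`) would be needed. The function names are hypothetical.

```python
import numpy as np


def mean_nn_distance(points):
    """Average distance from each point to its nearest neighbour (brute force, O(N^2))."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # ignore each point's distance to itself
    return float(d.min(axis=1).mean())


def add_noise(points, level, rng=None):
    """Add isotropic Gaussian noise with std = level * mean NN distance."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = level * mean_nn_distance(points)
    return points + rng.normal(scale=sigma, size=points.shape)
```

Expressing the noise in units of point spacing makes the noise levels comparable across scans with different resolutions.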
We also investigate how the sampling rate affects the number of correspondences, time consumption, and errors. Figure 6b,c present the results for voxel sizes ranging from 0.4 cm to 3.0 cm. As shown in Figure 6b, the error curves fluctuate noticeably at a 1.5 cm voxel size, but the errors remain below the reference line. Figure 6c indicates that when the number of correspondences exceeds the first reference line, both the preprocessing time (Pre_time) and the registration time (PCR_time) increase substantially, with the total time exceeding 1 s. Combining the results in Figure 6b,c, a 1 cm voxel size is optimal for the real datasets in this study. In terms of the number of correspondences, our algorithm achieves an effective balance between speed and accuracy with approximately 1000 to 8000 correspondences, the range between the two reference lines in Figure 6c.