1. Introduction
Tunnels are an essential component of modern transportation and urban infrastructure. Their environmental characteristics, such as low lighting, humidity, and dust, pose significant challenges for fire rescue. In tunnel applications, typical sensors such as LiDAR and optical cameras offer dense point clouds and rich semantic information, but they are easily hindered by environmental factors such as smoke and dust. Under such adverse conditions, these sensors cannot obtain continuous data, making it difficult to perceive the surrounding environment robustly. Millimeter-wave radar can overcome constraints posed by conditions such as rain and snow and can collect continuous data even in harsh environments, but it suffers from sparse point clouds and weak semantic information. Existing rail robots equipped with millimeter-wave radar and optical cameras move in a straight line along the tracks at a fixed observation angle. They use the millimeter-wave radar to gather continuous point-cloud data and the optical cameras to capture intermittent geometric features. Using millimeter-wave radar odometry, they perceive the environment and estimate their own position, but their trajectories struggle to form closed loops [1].
Existing radar-visual fusion SLAM systems typically integrate both a front end and a back end. Current SLAM front-end frameworks mainly fall into three categories: feature-matching-based, correlation-based, and point-cloud-registration-based methods. Representative front-end pose estimation algorithms such as CSM [2], as cited in [3], can estimate the sensor's own pose and map the surrounding environment. However, owing to radar observation errors and coordinate transformation errors during pose estimation, errors accumulate over prolonged measurement periods, causing the pose trajectory to drift gradually in the direction perpendicular to the range. Therefore, back-end optimization is necessary to correct the long-term cumulative errors produced by the front end.
In back-end optimization, most optimization-oriented SLAM techniques consist of two subsystems. The first subsystem uses sensor data to find matches between newly added observations and the map, establishing the constraints of the problem. The second subsystem computes or corrects the sensor's pose (as well as previous poses) and the map according to these constraints to achieve a consistent whole [4]. Existing fusion methods, such as those combining LiDAR and cameras or LiDAR and an IMU, largely rely on the loop closure detection of back-end optimization to reduce cumulative errors. These methods use the geometric features of LiDAR point clouds or the similarity of visual features to detect loop closures and correct poses. A LiDAR-visual SLAM back end is introduced in [5], which uses both LiDAR geometric features and visual features to achieve loop closure detection. It constructs a Bag-of-Words (BoW) model that describes visual similarity to aid loop closure detection, followed by point-cloud rematching to verify the loop closure and complete graph optimization. The back end of DVLSLAM [6] optimizes the pose graph using graph optimization techniques, detecting whether there is a loop between the current frame and previous keyframes; if a loop is detected, it is added as an edge to the pose graph. LIV-LAM [7] integrates LiDAR-based odometry with monocular-camera-based object detection and associates it with loop closure detection through pose graph optimization. LC-LVF [8] proposes a new error function that takes both scan and image data as constraints for pose graph optimization, uses g2o for further optimization, and ultimately employs a Bag-of-Words-based method for revisited-place detection. These loop-closure-based methods require the sensor's trajectory to close a loop and relatively continuous data, so they are difficult to apply in scenes with a linear motion trajectory and discontinuous visual geometry.
In the tunnel scenes addressed in this paper, track robots make only a single pass through the fire scene, which is not conducive to loop closure correction. Furthermore, cameras struggle to capture continuous geometric features because of obstructions such as smoke and dust. This paper therefore introduces a new back-end correction framework that incorporates discontinuous visual line features. These features are matched with line features in the occupancy grid map built by the millimeter-wave radar odometry, and the observed measurements are corrected to reduce the cumulative error of front-end pose estimation. However, given the sparsity of the millimeter-wave radar point cloud, the precision of matching line features in the occupancy grid map with visual line features is low, leading to mismatches and redundancies. Moreover, the radar is prone to producing clutter points during observation. These points often possess location information similar to that of the real point cloud, making it difficult to detect and identify radar line features accurately, which degrades the precision of their match with visual line features.
In response to the aforementioned challenges, this paper takes advantage of the stable information that line features provide in low-texture environments and their consistency across different viewpoints, and proposes a CSM pose estimation method that integrates visual and radar line features. The main contributions of this paper are as follows:
Addressing the cumulative error of a single radar odometer during extended measurement periods, this paper proposes a pose estimation framework that fuses visual and radar line features. By matching visual line features with global grid map line features, an error constraint function is constructed, and the Levenberg–Marquardt (LM) algorithm is employed to solve for the optimized pose.
Considering the significant matching error between the front-end global grid map and visual line features, making it challenging to construct a visual line feature set in the global coordinate system, this paper proposes an adaptive Hough transform line detection method based on prior radar grid map projections. Using the principle of homography, the global grid is mapped onto optical images, endowing the optical images with prior map distance information. Based on this prior information, we identify “areas of interest”, excluding non-matching Hough transform line detection results to enhance line feature matching accuracy.
Given the interference of radar clutter points with line feature points, traditional Gaussian mixture models (GMMs) that cluster on position information alone find it difficult to isolate line features from the global grid map, making it challenging to match them with the visual line feature set. Addressing this issue, this paper proposes an RCS-based GMM clustering method (R-GMM), enhancing the matching accuracy between visual line features and global grid map line features and thereby obtaining the parameters required for pose correction.
The second section of this paper introduces the coordinate system definition, related work, and a pose estimation framework that integrates visual and radar line features. The third section describes the principle of the adaptive Hough transform line detection method based on the projection of the radar’s prior grid map, establishing the visual line feature set. The fourth section presents the principle of the GMM clustering model based on RCS. The fifth section is dedicated to experiments and result analysis, where several comparative experiments demonstrate the effectiveness of the method. The final section offers a summary.
3. Adaptive Hough Transform Line Detection Method Based on Prior Radar Grid Map Projection
Commonly used line detection algorithms include the Hough transform [16], LSWMS [17], Edline [18], and LSD [19]. Most of these methods rely on image gradients or edge information, and different methods have different sensitivities to lines. The Hough transform is widely used in image processing and is particularly adept at detecting incomplete or noise-corrupted geometric shapes, with line detection being the most common case. Its basic idea is to transform the image-space representation into a parameter space; for line detection, this parameter space is typically polar coordinates. Points in the parameter space where the number of intersecting curves exceeds a certain threshold correspond to lines in the image. Through the Hough transform, lines can still be detected even if they are broken or partially occluded by other objects, making it suitable for tunnel scenes where continuous data are hard to obtain.
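To make the detection step concrete, the following minimal sketch (assuming OpenCV is available; the file name and thresholds are placeholders, not values from this paper) runs a probabilistic Hough transform on an edge map. The adaptive variant proposed later additionally restricts the search to projected "regions of interest".

```python
import cv2
import numpy as np

# Hypothetical tunnel frame; any grayscale image works for this sketch.
img = cv2.imread("tunnel_frame.png", cv2.IMREAD_GRAYSCALE)

# Edge map feeds the Hough accumulator (the rho/theta parameter space).
edges = cv2.Canny(img, 50, 150)

# Probabilistic Hough transform: accumulator cells whose vote count
# exceeds `threshold` are reported as line segments (x1, y1, x2, y2).
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                        threshold=60, minLineLength=40, maxLineGap=10)

if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        print(f"segment ({x1},{y1}) -> ({x2},{y2})")
```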
Because the LiDAR point cloud is dense, its point and line features are very distinctive; mapping the point cloud onto optical images yields a high match with detection algorithms sensitive to straight lines [20,21], such as the LSD and Edline algorithms. The millimeter-wave radar point cloud, however, is sparse, resulting in a low match between the global grid map line features and the results of Hough transform line detection, which easily causes mismatches and redundancies. Thus, this paper proposes an adaptive Hough transform line detection method based on the prior radar grid map projection, establishing a visual line feature set and excluding the interference of non-matching lines.
First, a homography transformation is used to construct the mapping between grid coordinates and pixel coordinates. The global grid map is then mapped onto the optical image corresponding to the submap frame as a prior distance constraint, and the "regions of interest" for Hough transform line detection are determined. This realizes adaptive Hough transform line detection and increases the match between the global grid map line features and the visual line features.
Figure 6 shows the flowchart of the A-Hough method:
3.1. Estimation of Mapping Parameters between Local and Global Coordinate Systems
The mapping parameters between the local and global coordinate systems are obtained through the front-end rigid transformation. Let the pose of the first frame be $\text{pose} = (t_x, t_y, \theta)$, where $t_x$ is the translation along the X-axis of the global coordinate system, $t_y$ is the translation along the Y-axis of the global coordinate system, and $\theta$ is the rotation angle. Let the rotation relationship between the point-cloud coordinates in the local coordinate system and the grid coordinates in the global coordinate system be $R$. The mapping relationship between the two coordinate systems is shown in Equation (2).
From this, we can determine the mapping relationship between the global and local coordinate systems. In other words, after the point-cloud coordinates of the first frame are captured in the local coordinate system, the SLAM front end estimates the pose of that initial frame, and the transformation between the global and local coordinate systems is established through a rigid transformation. As the number of frames increases, the local coordinate system of each subsequent frame is mapped to the global coordinate system according to its motion relationship; the frame number links the local and global coordinate systems.
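As a minimal numpy sketch of this rigid mapping, assuming Equation (2) has the standard 2D form $p_g = R\,p_l + t$ with $R$ built from $\theta$ (function and variable names are illustrative):

```python
import numpy as np

def local_to_global(points_local, tx, ty, theta):
    """Map Nx2 local point-cloud coordinates to the global frame
    using the frame pose (tx, ty, theta) estimated by the front end."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    t = np.array([tx, ty])
    return points_local @ R.T + t

# Example: first-frame points expressed in the global coordinate system.
pts = np.array([[1.0, 0.5], [2.0, -0.3]])
print(local_to_global(pts, tx=0.2, ty=0.1, theta=np.deg2rad(5)))
```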
3.2. Estimation of Mapping Parameters between Local Coordinate System and Pixel Coordinate System
The mapping parameters between the local coordinate system and the pixel coordinate system are obtained through the calibration of the millimeter-wave radar and the optical camera. Suppose that when the millimeter-wave radar performs target detection, the same target is observed as a point in the radar's local coordinate system and as a pixel in the image captured by the camera. The target detected by the radar lies on the radar acquisition plane, and the target detected by the camera lies on the pixel plane; thus an object simultaneously detected by both sensors has one position in the local coordinate system and a corresponding position in the pixel coordinate system. Based on the principle of homography, the transformation between the radar and the camera is the mapping between a target point in the local coordinate system and its corresponding point on the pixel plane. According to the principle of homography, the relationship between the local coordinate system and the pixel coordinate system is shown in Equation (3).
From this, a homography matrix representing the mapping relationship between the local coordinate system and the pixel coordinate system can be obtained. Here, S is a constant scaling factor that ensures equality on both sides of the equation.
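A sketch of how such a homography can be estimated from calibration data, assuming at least four radar/camera point correspondences on the two planes are available (cv2.findHomography is one common choice; the coordinate values below are placeholders):

```python
import cv2
import numpy as np

# Placeholder calibration pairs: radar targets in the local (metric) plane
# and the same targets located in the image (pixel plane).
radar_pts = np.array([[2.0, 0.5], [4.0, 0.5], [4.0, -0.5], [2.0, -0.5]],
                     dtype=np.float32)
pixel_pts = np.array([[320, 240], [360, 230], [365, 260], [325, 270]],
                     dtype=np.float32)

# Homography H maps local-plane coordinates to pixel coordinates up to a
# scale factor S, i.e. S * [u, v, 1]^T = H * [x, y, 1]^T.
H, _ = cv2.findHomography(radar_pts, pixel_pts)

def project_to_pixels(pt_local, H):
    """Apply the homography and divide out the scale factor."""
    x, y = pt_local
    u, v, s = H @ np.array([x, y, 1.0])
    return u / s, v / s

print(project_to_pixels((3.0, 0.0), H))
```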
3.3. Estimation of Mapping Parameters between the Global Coordinate System and the Pixel Coordinate System
The mapping parameters between the global coordinate system and the pixel coordinate system are obtained through the principle of homography transformation. The target point coordinates in the global coordinate system are extracted, and the extracted coordinates are paired with the target points in the local coordinate system to form a set of matching points. Through the homography transformation relationship between the local coordinate system and the pixel coordinate system, the pixel coordinates of the target points can be derived. Then, using the local coordinate system as a bridge, the mapping relationship between the global coordinate system and the pixel coordinate system is determined based on the principle of homography transformation, as shown in Equation (4).
Through the aforementioned steps, a mapping relationship between the global coordinate system and the pixel coordinate system can be constructed. This allows for the projection of the global grid map onto optical images, endowing the optical images with distance information.
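The chained mapping can be sketched as below: the front-end pose gives the global-to-local rigid transform, and the calibration homography then takes local coordinates to pixels. The names H_local_to_pixel and the pose values are assumed to come from the previous steps; this is a sketch, not the paper's exact Equation (4).

```python
import numpy as np

def global_to_pixel_homography(H_local_to_pixel, tx, ty, theta):
    """Compose the pixel homography with the inverse of the local-to-global
    rigid transform, yielding a single 3x3 mapping from global grid
    coordinates to pixel coordinates."""
    c, s = np.cos(theta), np.sin(theta)
    T_local_to_global = np.array([[c, -s, tx],
                                  [s,  c, ty],
                                  [0,  0,  1]])
    T_global_to_local = np.linalg.inv(T_local_to_global)
    return H_local_to_pixel @ T_global_to_local

def project_grid_cells(cells_xy, H_gp):
    """Project Nx2 occupied-cell coordinates (global frame) into the image."""
    ones = np.ones((cells_xy.shape[0], 1))
    uvw = (H_gp @ np.hstack([cells_xy, ones]).T).T
    return uvw[:, :2] / uvw[:, 2:3]

# Example with an identity homography and a small pose (illustrative only).
H_gp = global_to_pixel_homography(np.eye(3), tx=0.2, ty=0.1, theta=0.05)
print(project_grid_cells(np.array([[1.0, 0.0], [2.0, 0.0]]), H_gp))
```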
In a given frame, the pixel coordinates in the optical image mapped from the line features in the global grid map are denoted as $(u_n, v_n)$, where $u_n$ is the horizontal coordinate of the $n$th pixel point converted from grid coordinates into the pixel coordinate system, and $v_n$ is the corresponding vertical coordinate. By traversing all the mapped line feature points, the maximum and minimum coordinates along both the x-axis and the y-axis are obtained, as shown in Equations (5)–(8). Here, $u_{\min}$ and $u_{\max}$ are the minimum and maximum horizontal coordinates of the global grid map line feature points mapped to pixel points in the optical image, and $v_{\min}$ and $v_{\max}$ are the corresponding minimum and maximum vertical coordinates. Based on the distance information projected onto the optical image from the prior map, the "region of interest" for line detection can be identified as $[u_{\min}, u_{\max}] \times [v_{\min}, v_{\max}]$. Subsequently, the Hough transform is applied exclusively within this region. As time progresses, the detection range for each frame of the optical image is determined from this distance information, thus achieving adaptive Hough transform line detection.
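Putting these pieces together, a sketch of the per-frame region-of-interest selection (the projected points, margin, and Hough thresholds are assumed inputs, not values from the paper):

```python
import cv2
import numpy as np

def adaptive_hough(image_gray, projected_uv, margin=10):
    """Detect lines only inside the bounding box of the grid-map line
    features projected into the image (the 'region of interest')."""
    u_min, v_min = projected_uv.min(axis=0).astype(int) - margin
    u_max, v_max = projected_uv.max(axis=0).astype(int) + margin

    # Clamp the box to the image and crop the region of interest.
    h, w = image_gray.shape
    u_min, v_min = max(u_min, 0), max(v_min, 0)
    u_max, v_max = min(u_max, w - 1), min(v_max, h - 1)
    roi = image_gray[v_min:v_max, u_min:u_max]

    edges = cv2.Canny(roi, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=60,
                            minLineLength=40, maxLineGap=10)
    if lines is None:
        return np.empty((0, 4))
    # Shift segment endpoints back to full-image pixel coordinates.
    return lines[:, 0] + np.array([u_min, v_min, u_min, v_min])
```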
The collection of line features detected in the pixel coordinate system is recorded as a set of line segments, and the pixel coordinates of each line feature are extracted. Using the inverse homography transformation, the line features in the pixel coordinate system are converted into the global coordinate system, as shown in Equation (9), and the visual line feature set is thereby constructed, with each line feature represented by its coordinates in the global coordinate system.
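A sketch of this inverse mapping, assuming H_gp is the global-to-pixel homography composed in the previous subsection (the placeholder matrix and endpoints below are illustrative only):

```python
import numpy as np

def pixels_to_global(pixel_pts, H_global_to_pixel):
    """Map Nx2 pixel coordinates back to global coordinates by applying
    the inverse homography and normalizing the scale factor."""
    H_inv = np.linalg.inv(H_global_to_pixel)
    ones = np.ones((pixel_pts.shape[0], 1))
    xyw = (H_inv @ np.hstack([pixel_pts, ones]).T).T
    return xyw[:, :2] / xyw[:, 2:3]

# Example: endpoints of one detected segment and a placeholder homography.
segment_px = np.array([[320.0, 240.0], [365.0, 260.0]])
H_gp = np.array([[120.0, 0.0, 300.0],
                 [0.0, -120.0, 400.0],
                 [0.0, 0.0, 1.0]])
print(pixels_to_global(segment_px, H_gp))
```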
By projecting the global grid map with distance information onto the optical image and determining the “regions of interest”, adaptive Hough transformation is achieved. This enhances the matching accuracy between the line features of the global grid map and the detected line features, serving as a foundation for back-end optimization.
4. RCS-Based Gaussian Mixture Model Clustering Method
Millimeter-wave radar includes both a transmitter and a receiver. The transmitter can generate and emit high-frequency electromagnetic waves within a certain frequency range (typically between 30 and 300 GHz). When the emitted electromagnetic waves encounter an object, they are reflected by targets such as vehicles, walls, fences, etc. The size, shape, and material of the target all influence the characteristics of the reflected wave, with the most critical factor being the RCS, which describes the ability of a target to reflect radar waves. The receiver captures the reflected electromagnetic waves. By analyzing the time delay, frequency shift, and intensity of these echoes, it determines the target’s distance, speed, and size. However, radar echoes often experience clutter interference from the ground, buildings, etc., producing a large number of irrelevant data points in the scene, possessing location information similar to the “points of interest”. Particularly when slender linear features like fences exist in the scene, this clutter makes these features difficult to detect and recognize accurately. Clutter usually originates from various irregular reflectors, so their RCS values tend to be variable with a broad distribution. For instance, ground clutter might encompass reflectors ranging from pebbles to large buildings. However, genuine targets usually have similar RCS characteristics, such as fences, vehicles, airplanes, etc. This is because their shape, material, and size are typically consistent. For example, a metal fence, in a specific direction, will always have a relatively fixed RCS value.
Clustering is one of the important methods in data mining. By exploiting the similarity between data, it groups data objects with similar properties into the same class. Through clustering, point clouds can be classified, thereby isolating the line features that match the global grid map. The GMM can be used for clustering data in unsupervised learning. It is a statistical model that assumes an observed dataset is generated by a mixture of several Gaussian distributions. A GMM can be expressed as

$$p(x) = \sum_{i=1}^{M} \pi_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i),$$

where $x$ is the data point, $M$ is the number of Gaussian distributions, $\pi_i$ is the mixture weight of the $i$th distribution, satisfying $\sum_{i=1}^{M} \pi_i = 1$, and $\mathcal{N}(x \mid \mu_i, \Sigma_i)$ is the Gaussian density function with mean $\mu_i$ and covariance $\Sigma_i$.
In the application of GMM clustering, each Gaussian distribution can be viewed as a cluster within the data. The power of the GMM lies in its ability to describe the structure of data by combining multiple Gaussian distributions. One of the strengths of the GMM is its ability to capture clusters of various shapes, not just spherical ones but also elongated shapes, thanks to the covariance matrix of each Gaussian distribution. Compared to traditional K-means clustering, the GMM takes into account not only the center of the data but also its covariance structure, capturing a richer structural feature of the data. Its parameters include the mean (center) and covariance (which describes the shape and directional spread) of each component.
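For reference, a plain position-only GMM clustering of 2D radar points could look like the sketch below (scikit-learn's GaussianMixture; the synthetic data stands in for a real sweep). As discussed next, this baseline tends to let clutter dominate the fitted components.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic scene: a thin "fence-like" line of points plus diffuse clutter.
line = np.column_stack([np.linspace(0, 10, 60),
                        2.0 + 0.05 * rng.standard_normal(60)])
clutter = rng.uniform([0, 0], [10, 4], size=(200, 2))
points = np.vstack([line, clutter])

gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
labels = gmm.fit_predict(points)
print(np.bincount(labels))   # cluster sizes
print(gmm.means_)            # fitted component centers
```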
However, in the global grid map, due to the clutter points having similar positional features to the “points of interest”, the GMM might treat these clutter points as significant data features, allocating one or more Gaussian distributions to fit these clutter points. This is because the GMM’s goal is to minimize the overall error, and the clutter may dominate much of the data. At the same time, the points forming line features might be sparse in comparison to the entire dataset. Thus, when the GMM attempts to fit the large number of clutter points with Gaussian distributions, it might “overshadow” the sparse line features.
Traditional clustering approaches struggle to address the issue of the influence of clutter points. Hence, this paper proposes the incorporation of RCS as a weight to distinguish between line feature points and clutter points, increasing the distance in the ellipsoidal feature space, ensuring points with similar structures cluster together. A schematic diagram is shown in
Figure 7. The blue box represents the clustering results after incorporating RCS, while the yellow box shows the GMM clustering results. Both algorithms cluster the same line.
During the clustering process, a weight is assigned to each point-cloud data point. This weight is based on the point's RCS value; specifically, data points with a larger RCS value are given a higher weight.
Let the set of data points be $X = \{x_1, x_2, \ldots, x_N\}$ and their corresponding RCS values be $\{\mathrm{RCS}_1, \mathrm{RCS}_2, \ldots, \mathrm{RCS}_N\}$. Each point's RCS value is normalized so that the sum of all weights is one, as shown in Equation (11), where $N$ is the total number of data points in the set and $w_k$ is the normalized RCS weight of the $k$th data point. After introducing the RCS, the GMM probability distribution with RCS weighting is shown in Equation (12).
When the RCS weight is introduced, each data point has an associated weight. However, these weights do not directly influence the probability distribution of individual data points; they mainly play a role in the likelihood function and parameter estimation.
The likelihood function is shown in Equation (14), where $X$ is the set of data points, $W$ is the set of RCS weights, $w_k$ is the RCS weight of the $k$th data point, $\pi_i$ is the mixing weight of the $i$th Gaussian distribution, $M$ is the number of Gaussian distributions, $N$ is the number of data points, and $\mathcal{N}(x \mid \mu_i, \Sigma_i)$ is the Gaussian density function with mean $\mu_i$ and covariance $\Sigma_i$.
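For concreteness, one plausible form of such an RCS-weighted likelihood, written in log form and following the standard weighted-likelihood pattern (the exact expression of Equation (14) may differ), is

$$\ln L(\Theta \mid X, W) \;=\; \sum_{k=1}^{N} w_k \,\ln\!\left( \sum_{i=1}^{M} \pi_i \, \mathcal{N}\!\left(x_k \mid \mu_i, \Sigma_i\right) \right).$$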
The likelihood function still serves the purpose of describing the probability of observing the entire dataset given the model parameters. However, the aforementioned likelihood function is influenced by the RCS weights. Specifically, data points with higher RCS weights will have a greater weight in the likelihood function. This means that the model will pay more attention to these data points, effectively filtering out the interference from clutter points and segregating point clouds with similar characteristics.
Since this paper mainly focuses on the region in the global grid map that matches with the visual line features, during the R-GMM clustering, this region is set to cluster into three categories. Centered around the line feature region, each side of the line feature is clustered into one category. A schematic diagram is shown in
Figure 8.
Next, the EM (expectation-maximization) algorithm is used to estimate the model parameters.
The EM algorithm is an iterative method used to find parameter estimates when there are hidden variables, aiming to maximize the likelihood function of the observed data. For the Gaussian mixture model, these hidden variables indicate which Gaussian distribution the data points come from. The EM algorithm mainly consists of two steps: the E-step (expectation step) and the M-step (maximization step).
E-step: Calculate the posterior probability that data point x belongs to the mth Gaussian distribution, as shown in Equation (15). In this step, the expected posterior probability of each data point belonging to each Gaussian distribution is computed, that is, the probability that the data point comes from a particular Gaussian distribution given the observed data and the current parameter estimates.
M-step: Based on the posterior probabilities obtained from the E-step, update the parameters of the Gaussian Mixture Model to maximize the data likelihood.
The updated mean is shown in Equation (16). Equation (16) calculates the mean of the mth Gaussian mixture component with the RCS weight included. The RCS weight adjusts the contribution of each data point's position: when a data point has a larger RCS weight, its contribution to the mean of mixture component m is greater.
The updated covariance matrix is shown in Equation (17).
Equation (17) computes the new covariance of the mth Gaussian mixture component. The RCS weight adjusts each data point's contribution to the component's shape and orientation: data points with a larger RCS weight have a greater influence on the covariance of the mth mixture component.
The updated mixture weights are shown in Equation (18).
The RCS weight assigns different importance to data points to eliminate the interference of clutter points. During the M-step update of GMM parameters, the RCS weight of a data point determines its significance in calculating the new weights, means, and covariances. Data points with a larger RCS weight will have a more significant impact on the update of model parameters. This allows the GMM to prioritize those more crucial data points (based on RCS weight) to adjust the model parameters.
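A compact numpy sketch of the RCS-weighted EM update described above (Equations (15)–(18)); the function names, initialization scheme, and regularization are illustrative, and the update formulas follow the standard weighted-EM pattern under the stated assumptions rather than reproducing the paper's exact equations:

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Multivariate normal density evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    cov_inv = np.linalg.inv(cov)
    norm = np.sqrt(((2 * np.pi) ** d) * np.linalg.det(cov))
    expo = -0.5 * np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.exp(expo) / norm

def r_gmm(X, rcs, M=3, n_iter=50, seed=0):
    """RCS-weighted GMM clustering sketch (R-GMM)."""
    N, d = X.shape
    w = rcs / rcs.sum()                          # normalized RCS weights (Eq. (11))
    rng = np.random.default_rng(seed)
    means = X[rng.choice(N, M, replace=False)]   # random initial centers
    covs = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(M)])
    pis = np.full(M, 1.0 / M)

    for _ in range(n_iter):
        # E-step (Eq. (15)): posterior of each point under each component.
        dens = np.column_stack([pis[m] * gaussian_pdf(X, means[m], covs[m])
                                for m in range(M)])
        gamma = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)

        # M-step (Eqs. (16)-(18)): RCS weights scale each point's contribution.
        wg = gamma * w[:, None]                  # per-point, per-component weight
        Nm = wg.sum(axis=0)                      # effective weighted counts
        means = (wg.T @ X) / Nm[:, None]
        for m in range(M):
            diff = X - means[m]
            covs[m] = (wg[:, m, None] * diff).T @ diff / Nm[m] + 1e-6 * np.eye(d)
        pis = Nm / Nm.sum()

    labels = gamma.argmax(axis=1)
    return labels, means, covs, pis

# Usage (with points and rcs_values as Nx2 and N arrays, respectively):
# labels, means, covs, pis = r_gmm(points, rcs_values, M=3)
```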
Using the R-GMM method, both the global grid map and the visual line feature set are clustered, yielding two line feature classes together with their means and covariance matrices. Suppose the mean of the visual line feature class clustered in the global coordinate system is $\mu_v$ with covariance matrix $\Sigma_v$, and the mean of the line feature class of the global grid map is $\mu_g$ with covariance matrix $\Sigma_g$.
The covariance matrix provides information on the direction and magnitude of the cluster's shape. Singular value decomposition (SVD) is used to decompose this matrix and obtain the rotation matrix. Specifically, the column vectors of the rotation matrix are the basis vectors of the new coordinate system, which are the eigenvectors of the covariance matrix. These eigenvectors determine the class's orientation and typically represent the principal directions of the covariance matrix, i.e., the directions in which the data have the most significant variance. The covariance matrix of the visual line feature class is decomposed using SVD, $\Sigma_v = U D V^{\mathsf T}$, as shown in Equation (19).
$U$ is an orthogonal matrix whose columns are the eigenvectors of the covariance matrix $\Sigma_v$. These eigenvectors determine the primary directions, or directions of variation, of the line features. $D$ is a diagonal matrix whose diagonal elements are the eigenvalues of the covariance matrix $\Sigma_v$. These eigenvalues represent the amount of variation, or "spread", of the data along the corresponding eigenvectors: a large eigenvalue implies that the line feature points vary significantly in that direction, while a small eigenvalue indicates that their variation in that direction is minimal. $V^{\mathsf T}$ is the transpose of $V$. The rotation matrix of the visual line feature class is given by Equation (20). This rotation matrix represents the rotation of the visual line feature relative to the coordinate axes of the global coordinate system; it is an orthogonal matrix whose column vectors are unit vectors indicating directions in that coordinate system, and it describes the orientation of the visual line feature in the global coordinate system.
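A sketch of this decomposition step, assuming the rotation is taken from the left singular vectors of the clustered covariance (Equation (20) may define it differently); the covariance values and the final relative-rotation computation are illustrative:

```python
import numpy as np

def rotation_from_covariance(cov):
    """SVD of a 2x2 covariance matrix; the columns of U are the principal
    directions of the clustered line feature, used here as its rotation."""
    U, D, Vt = np.linalg.svd(cov)
    # Force a proper rotation (determinant +1) in case of a reflection.
    if np.linalg.det(U) < 0:
        U[:, -1] *= -1
    return U

# Placeholder covariances of the visual and grid-map line feature classes.
cov_visual = np.array([[4.0, 1.5], [1.5, 1.0]])
cov_grid = np.array([[3.8, 1.2], [1.2, 0.9]])

R_visual = rotation_from_covariance(cov_visual)
R_grid = rotation_from_covariance(cov_grid)

# One way a relative rotation between the two line feature classes could
# feed the back-end pose correction.
R_rel = R_grid @ R_visual.T
print(np.degrees(np.arctan2(R_rel[1, 0], R_rel[0, 0])))
```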