1. Introduction
Simultaneous Localization and Mapping (SLAM) is often considered one of the main challenges in the fields of robotics and autonomous vehicles [1,2]. The aim of SLAM is to build a map of an unknown environment while simultaneously determining the location of the vehicle within this map. Neither the map nor the vehicle position is known in advance. However, a kinematic model of the vehicle motion is assumed to be known a priori, and the unknown environment is populated with artificial or natural landmarks.
There is a large body of literature on solving the SLAM problem for various autonomous vehicle applications. Many of the solutions are formulated using a Bayesian filter that recursively propagates the distribution of the vehicle’s dynamic states (and sometimes the map features). Davison [3] proposed a solution based on Extended Kalman Filters (EKFs) to track the location of each map feature extracted from the sensor measurements. To improve the accuracy of this method, Thrun et al. [4] proposed the Sparse Extended Information Filter, the idea being to optimize the calculations required by the EKF by taking advantage of sparse matrices. Montemerlo et al. [5] proposed a Rao-Blackwellised particle filter-based method called FastSLAM. In this method, the odometry uncertainties are rectified by adding random offsets to the odometry data for each particle.
In such solutions, to simultaneously estimate both vehicle and landmark locations, the Bayesian filter needs to employ two models: an observation model and a motion model. With regard to the observations, the vehicle must be equipped with a sensor or set of sensors that produce measurements related to the surrounding landmark locations. The most common examples are LIDAR [6,7,8], RGB camera [9,10,11,12], RGB-D camera [13,14] and sonar [15] sensors. The sensory measurements usually have range limitations and contain measurement noise. The raw measurements are normally processed further to extract point features. An on-board sensory system on the vehicle is usually in place to collect the features in real time.
With the advancement of sensor technology and the decreasing cost of LIDAR scanners, the generation of an adequate 3D point cloud is now possible. A LIDAR works by sending a laser pulse towards an object and measuring the time it takes the pulse to reflect off the object and return to the sensor. Because of its high accuracy and ease of use, the LIDAR sensor has become a common choice for SLAM purposes. From the generated point cloud, it is possible to extract features that are more complex (and convey more information) than point features. Employing such information-rich features as observations in a Bayesian filter within a SLAM algorithm can increase the algorithm’s efficiency and reduce the computational cost. For example, employing planar features instead of 3D points can considerably enhance efficiency, as the number of planes is significantly smaller than the number of points in a typical 3D point cloud. However, in order to properly use these features in the Bayesian framework, an appropriate motion model needs to be devised.
In human-inhabited environments, buildings and large facilities are almost ubiquitous, and their geometric profiles comprise a large number of plane shapes. The major ones naturally create “obstacles” or define “edges” beyond which vehicles cannot protrude, and these planes can be extracted and employed as features or landmarks in SLAM applications. Indeed, due to the geometric simplicity of plane shapes and their abundance in human-inhabited environments, planar features have attracted increasing attention from both the computer graphics and robotics communities in recent years [16,17,18]. Smaller plane shapes may come from the profiles of small objects or even moving objects such as vehicles. These small planes can be readily excluded from the set of landmarks by applying a threshold to the plane dimensions.
This paper presents, for the first time, the idea of using planar features extracted from a 3D point cloud, instead of point features, in Bayesian SLAM filters. The major contribution of this study lies in proposing a novel transition model that predicts the parameters of planar features extracted from a 3D point cloud. With Bayesian filters, the first step is prediction, where the object state densities are propagated to the next time step based on a stochastic transition model (motion model). We present how such a transition model can be developed, and propose a solution for state prediction. In the simulation studies, using a dataset of measurements acquired from real vehicle sensors, we apply the model to predict the planar features at the next time step. The results show reasonable accuracy and efficiency for statistical filtering-based SLAM applications.
The proposed transition model consists of two sub-models: the plane transition model and the vehicle transition model. The latter is associated with localization; namely, it predicts the vehicle states at time k+1 using the state information at time k. The former is associated with mapping; it predicts the feature parameters at time k+1 based on the feature information at time k as well as the predicted vehicle states resulting from the latter model. The accuracy and effectiveness of the proposed transition model are verified using real-world point cloud measurements from the KITTI dataset [19]. The graphical and numerical simulation results show that the predicted planar features resulting from the proposed transition model closely match the measured features (i.e., segmented 3D planes) at time k+1, in terms of the plane dimensions (plane area and vertex distance) and the plane orientation (normal vector).
The rest of the paper is organized as follows.
Section 2 provides relevant theoretical background.
Section 3 introduces the details of the proposed transition model which comprises two sub-models: the plane transition model and the vehicle transition model.
Section 4 explains the detailed procedure for implementation of the proposed transition model.
Section 5 provides verification of the proposed transition model by means of graphical and numerical simulation results.
Section 6 concludes the paper.
3. Proposed Transition Model
As explained in
Section 1, a transition model is required in SLAM filters to predict the states of the features at the next time step, based on the state information at the current time step. Specifically, in this study, we are interested in constructing a transition model for predicting the plane parameters at the next time step, using the current plane parameters estimated by a SLAM filter. Note that the planes involved in the transition model are expressed in the vehicle coordinate system (fixed to the vehicle), as the point cloud measurements are acquired by a laser scanner mounted on top of the experimental vehicle. Thus the plane parameters vary with time, as long as the experimental vehicle is in motion. On the other hand, if seen from the global coordinate system (fixed to the ground), the plane parameters are invariant since the planes are all static in this coordinate system.
We emphasize that the vehicle is assumed to move in a static environment. If this is not the case, we make the practical assumption that planar surfaces on other moving objects are small enough to be excluded from the set of “plane observations” extracted from the 3D point cloud. That set is expected to include only relatively large planar surfaces, such as the walls of buildings along the road.
3.1. Plane Transition Model
To facilitate the design of the transition model, the two aforementioned coordinate systems are established, as shown in Figure 1. The global coordinate system is fixed to the ground, and the vehicle coordinate system is attached to the vehicle Centre of Mass and moves along with the vehicle. P is a point on a plane segmented from the 3D point cloud obtained from the on-board laser scanner. Its position can be expressed by position vectors in both the global and the vehicle coordinate systems, while the position vector of the vehicle centre of mass (i.e., origin o) locates the vehicle in the global coordinate system. The global coordinates of P are invariant, since the plane containing P is static as seen from the global coordinate system. However, the local coordinates of P, expressed in the vehicle coordinate system, vary with time due to the motion of the vehicle. As a result, the plane parameters expressed in the vehicle coordinate system also evolve with time, and a transition model is thus needed to predict the change of plane parameters for the purpose of accurate and effective SLAM.
The plane equation, expressed in the vehicle coordinate system, is assumed to take the following form in this study:

ax + by + cz + d = 0, (23)

where x, y and z are the coordinates of a point on the plane, and a, b, c and d are the plane parameters. Note that other forms of plane equations are also available in the literature [21], but all forms of plane equations can eventually lead to Equation (23) after simple algebraic manipulation. We may rewrite Equation (23) in the following vector form:

a^T p = 0, (24)

where a = [a, b, c, d]^T is the plane parameter vector and p = [x, y, z, 1]^T is the homogeneous coordinate vector of a point on the plane.
Assuming that the vehicle only performs planar motion (which is normally the case in common flat-road urban driving scenarios), the coordinates (x_g, y_g, z_g) of point P in the global coordinate system and the coordinates (x, y, z) of point P in the vehicle coordinate system are related according to the following equations describing the kinematics of the vehicle [22,23]:

x_g = x cos θ − y sin θ + X, (25)

and

y_g = x sin θ + y cos θ + Y, (26)

with z_g = z under planar motion, where θ represents the heading angle of the vehicle (with respect to the global X axis), and X and Y denote the coordinates of the vehicle centre of mass (origin o) in the global coordinate system. The vehicle states X, Y and θ are graphically shown in Figure 2.
Equations (25) and (26) can be rewritten as follows, for time k:

p̄_g = H_k p̄_k, (27)

and

H_k =
[ cos θ_k   −sin θ_k   0   X_k ]
[ sin θ_k    cos θ_k   0   Y_k ]
[ 0          0         1   0   ]
[ 0          0         0   1   ], (28)

where p̄_k = [x_k, y_k, z_k, 1]^T is the homogeneous coordinate vector of P in the vehicle coordinate system at time k, p̄_g is the corresponding vector in the global coordinate system, and θ_k, X_k and Y_k are the vehicle heading angle and centre-of-mass coordinates at time k. Similarly, we have the following equations for time k+1:

p̄_g = H_{k+1} p̄_{k+1}, (29)

and

H_{k+1} =
[ cos θ_{k+1}   −sin θ_{k+1}   0   X_{k+1} ]
[ sin θ_{k+1}    cos θ_{k+1}   0   Y_{k+1} ]
[ 0              0             1   0       ]
[ 0              0             0   1       ], (30)

where p̄_{k+1} = [x_{k+1}, y_{k+1}, z_{k+1}, 1]^T.

The vector p̄_g is invariant with time, since it is the position vector of P in the global coordinate system. As a result, one can obtain the following equation by combining Equations (27) and (29):

p̄_{k+1} = H_{k+1}^{−1} H_k p̄_k. (31)

Note that Equation (24) can be rewritten as follows, for time k:

a_k^T Q^{−1} Q p̄_k = 0, (32)

where a_k is the plane parameter vector at time k and Q represents any 4-by-4 invertible matrix. We may let Q = H_{k+1}^{−1} H_k and combine Equations (31) and (32); then we arrive at:

a_k^T H_k^{−1} H_{k+1} p̄_{k+1} = 0. (33)

The plane Equation (24) for time k+1 can be expressed as follows:

a_{k+1}^T p̄_{k+1} = 0. (34)

Combining Equations (33) and (34) leads to the following transition model for plane parameters:

a_{k+1} = Φ_k a_k, (35)

where Φ_k = (H_k^{−1} H_{k+1})^T.

This plane transition model indicates that the plane parameters at time k+1 can be calculated based on the plane parameter information at time k and a vehicle-motion-dependent 4-by-4 matrix Φ_k. Note that the computation of Φ_k requires the inverse of H_k. It can be easily proven that the matrix H_k, expressed by Equation (28), has full rank, so its inverse exists at all times. Thus, Φ_k can be computed as long as H_{k+1} is available. However, this matrix (as indicated by Equation (30)) requires the vehicle state information at the future time k+1, which is not available in real-time SLAM applications. To tackle this issue, in the following section we introduce a new vehicle transition model and demonstrate how the vehicle states at time k+1 can be predicted.
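To make the propagation concrete, the plane-parameter update can be sketched in a few lines of numpy. This is an illustrative sketch (the function and variable names are our own): it builds the 4-by-4 homogeneous transforms for planar vehicle poses at two consecutive time steps, and maps a plane coefficient vector [a, b, c, d] through the inverse-transpose of the frame-to-frame point transform, which is the standard way plane coefficients transform when homogeneous points are mapped between frames:

```python
import numpy as np

def pose_matrix(x, y, heading):
    """4x4 homogeneous transform from the vehicle frame to the global
    frame for a planar pose (an Equation (28)-style matrix)."""
    c, s = np.cos(heading), np.sin(heading)
    return np.array([[c, -s, 0.0, x],
                     [s,  c, 0.0, y],
                     [0.0, 0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])

def predict_plane(plane_k, pose_k, pose_k1):
    """Propagate plane parameters [a, b, c, d] (plane: a x + b y + c z + d = 0,
    expressed in the vehicle frame) from time k to time k+1.
    A point transforms as p_{k+1} = H_{k+1}^{-1} H_k p_k, so the plane
    coefficients transform with the inverse-transpose of that matrix."""
    H_k = pose_matrix(*pose_k)
    H_k1 = pose_matrix(*pose_k1)
    M = np.linalg.inv(H_k1) @ H_k      # point transform: frame k -> frame k+1
    return np.linalg.inv(M).T @ plane_k
```

Note that the pose at time k+1 is not available in real time; in practice it would itself come from the vehicle transition model of Section 3.2.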
3.2. Vehicle Transition Model
As mentioned above, the transformation matrix for the plane parameters requires unknown information from the future time k+1. In order to obtain this matrix, in this section a vehicle transition model is introduced to predict the future vehicle states at time k+1, using the available information at time k.
In Bayesian filtering, the Constant Turn (CT) model [24,25,26], which describes the maneuvering of the vehicle, is commonly used. This model is formulated based on the assumption that the vehicle maneuvers with a (nearly) constant speed and a (nearly) constant angular rate [24]. Based on the CT model, an Extended Constant Turn (ECT) model is proposed in this work, which can be mathematically expressed as follows:

X_{k+1} = X_k + (v_k / ω_k) [sin(θ_k + ω_k T) − sin θ_k],
Y_{k+1} = Y_k − (v_k / ω_k) [cos(θ_k + ω_k T) − cos θ_k], (36)
θ_{k+1} = θ_k + ω_k T,

with

v_k = v̄_k + n_v,   ω_k = ω̄_k + n_ω,

where X_k, Y_k, X_{k+1} and Y_{k+1} denote the vehicle longitudinal and lateral displacements (i.e., the coordinates of the vehicle centre of mass as introduced in Section 3.1) in the global coordinate system at time k and k+1, θ_k and θ_{k+1} represent the vehicle heading angles at time k and k+1, v̄_k and ω̄_k are the nominal vehicle velocity and yaw rate, n_v and n_ω are the vehicle velocity and yaw rate noise components, and T stands for the sampling period. By means of this vehicle transition model (36), the required vehicle state information at time k+1 can be predicted, and in turn the matrices needed by the plane transition model of Section 3.1 can be calculated.
Note that the plane transition model proposed in
Section 3.1 (i.e., Equation (35)), along with the above ECT vehicle transition model, constitutes a complete state transition model that propagates the state densities of the planar features to the next time step. This state transition model plays a key role in the prediction step for prospective planar-feature-based SLAM filters.
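One prediction step of such a constant-turn-style vehicle model can be sketched as follows. This is a minimal illustrative implementation under the assumptions above; the additive-Gaussian noise handling and the straight-line guard for a near-zero yaw rate are our own numerical choices:

```python
import numpy as np

def ect_predict(x, y, heading, v, omega, T,
                rng=None, sigma_v=0.0, sigma_omega=0.0):
    """One constant-turn prediction step for the planar vehicle states.
    v and omega may be perturbed by Gaussian noise (pass a numpy
    Generator as rng); with rng=None the nominal prediction is returned."""
    if rng is not None:
        v = v + rng.normal(0.0, sigma_v)
        omega = omega + rng.normal(0.0, sigma_omega)
    if abs(omega) < 1e-9:
        # straight-line limit of the turn equations as omega -> 0
        x1 = x + v * T * np.cos(heading)
        y1 = y + v * T * np.sin(heading)
    else:
        x1 = x + (v / omega) * (np.sin(heading + omega * T) - np.sin(heading))
        y1 = y - (v / omega) * (np.cos(heading + omega * T) - np.cos(heading))
    return x1, y1, heading + omega * T
```

For example, with unit speed and a yaw rate of π/2 rad/s over one sampling period, the predicted pose traces a quarter of a circle of radius 2/π.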
4. Implementation
In this section, we introduce the Sequential Monte Carlo (SMC) implementation of the proposed transition model, for the recently developed Labeled Multi-Bernoulli (LMB) filter. The LMB filter has been used in many applications such as target tracking [27,28], SLAM [29], visual tracking [30], and sensor control [31,32]. In the SMC implementation, the LMB posterior for the map features (planes) at each time step is represented by a set of particles. These particles are propagated using the proposed transition model to obtain the LMB posterior at the next time step.
Consider the following LMB posterior at time k:

π_k = { (r^(ℓ), p^(ℓ)) : ℓ ∈ L },

where ℓ represents the label of a plane, r^(ℓ) denotes the existence probability of the plane with label ℓ, p^(ℓ)(·) is the spatial distribution of its parameters, and L stands for the label space. In an SMC implementation, the spatial distribution is represented by a set of weighted samples [28], namely:

p^(ℓ)(a) = Σ_{i=1}^{N^(ℓ)} w_i^(ℓ) δ(a − a_i^(ℓ)),

where N^(ℓ) represents the number of particles, w_i^(ℓ) denotes the weight of the i-th particle, a_i^(ℓ) is the i-th particle (a sampled plane parameter vector), and δ(·) is the Dirac delta function.
For each particle at time k, we can calculate the corresponding predicted particle at time k+1 based on the transition model proposed in Section 3. As a result, after the prediction stage, we end up with a new set of particles carrying the same weights as before. Then, based on the new predicted particles at time k+1, the plane parameter estimate can be obtained from the distribution particles as the Expected A Posteriori (EAP) estimate (i.e., the weighted average), or as the Maximum A Posteriori (MAP) estimate (i.e., taking the particle with the largest weight as the estimate).
The above implementation procedure is illustrated step by step in Algorithm 1, in the form of pseudo-code.
Algorithm 1 Step-by-Step Implementation of the Proposed Transition Model

Require: vehicle states at time k, and particles representing the distribution of each plane with label ℓ, for all labels
1: compute the vehicle-frame-to-global transformation matrix at time k ▹ use Equation (28)
2: generate a noise sample ▹ covariance: diag(noise variances)
3: predict the vehicle states at time k+1 ▹ use Equation (36)
4: compute the vehicle-frame-to-global transformation matrix at time k+1 ▹ use Equation (30)
5: compute the plane parameter transition matrix
6: for each plane label do
7:  for each particle do
8:   propagate the particle through the plane transition model ▹ use Equation (35)
9:   retain the particle weight
10:  end for
11:  compute the weighted average of the predicted particles ▹ the EAP estimate
12: end for
5. Simulation Results
In this section, we demonstrate how well the proposed transition model performs, compared with the actual measured results from the KITTI dataset. Specifically, we generate the predicted planes at time k+1 using the plane information at time k, by means of the proposed transition model, and we segment planes from the point cloud data in the KITTI dataset at time k+1, by means of the Modified Selective Statistical Estimator (MSSE) [33]. Then, the predicted planes at time k+1 are compared with the segmented planes at time k+1, and both graphical and numerical results are presented to show the closeness of the predicted planes and the segmented planes.
The point cloud data from the folder “2011_09_26_drive_0005_sync” in the KITTI dataset are employed in the simulation studies for plane segmentation. The details for accessing and using the KITTI dataset are available in [19]. The folder “2011_09_26_drive_0005_sync” includes 154 time steps, and the points obtained at four of these time steps are used in the simulation. The numbers of points at these four time steps are 123,334, 123,316, 118,950 and 118,572, respectively.
5.1. Graphical Results
Figure 3 and Figure 4 demonstrate both the predicted planes and the segmented planes at time k+1 for one Monte Carlo (MC) run. The predicted planes are in orange, and are generated by the proposed transition model using the LIDAR measurements and vehicle states at time k. The segmented planes are in blue, and are plotted based on the point cloud measurements at time k+1. Apart from the planes themselves, the predicted and measured point clouds at time k+1 are also plotted in Figure 4, on top of the corresponding planes.

Figure 3 shows that, by means of the transition model and the available information at time k, three planes are predicted to exist at time k+1. Three segmented planes are also present in this figure, based on the LIDAR measurements at time k+1. It is clearly observed in Figure 3 that each predicted plane is associated with one segmented plane. Namely, these six planes constitute three pairs of planes, with each pair composed of one predicted plane and one segmented plane. Moreover, we see that for each pair of orange and blue planes, the plane areas are fairly close and the distances between each pair of plane vertexes are rather small (“each pair of plane vertexes” refers to a vertex on the predicted plane and its closest counterpart on the segmented plane).

Figure 4 demonstrates the three pairs of planes from different angles. Again, we observe the closeness of the predicted and segmented planes in terms of plane areas and vertex distances, when viewed from different angles. Figure 4c shows a ‘side view’ of the three pairs of planes, from which we clearly see that the predicted and segmented planes (and their normal vectors) almost coincide. Similar results are observed in Figure 4d, where a ‘top view’ of the planes is shown.
In addition to the above time step, our simulation studies have covered a large range of other time steps. Figure 5 and Figure 6 provide the results for another time step. The remaining time steps present similar results and are not shown in this paper for brevity. In the following section, the above graphical results are further supplemented and clarified by means of detailed numerical results.
5.2. Numerical Results
In Figure 3, Figure 4, Figure 5 and Figure 6, we have graphically demonstrated the comparison between the predicted and segmented planes, in terms of three important indicators: plane area, vertex distance and normal vector. In this section, we provide the results of our numerical simulation over 100 MC runs, in order to quantify exactly how close these planes are.

The first two indicators, plane area and vertex distance, are mainly used to quantify the closeness in terms of plane dimensions (sizes). Table 1 shows the areas of the predicted and segmented planes for the time step considered in Figure 3 and Figure 4 (averaged over 100 MC runs), and Table 2 gives the distances between each pair of plane vertexes for the same time step (averaged over 100 MC runs). We see in these tables that the areas of the predicted and segmented planes are close, and the distances between each pair of vertexes are small. These numerical results are consistent with the graphical results shown in Section 5.1.

The third indicator, the normal vector, is employed to evaluate the closeness in terms of plane orientation. Table 3 shows the angles between each pair of normal vectors for the same time step (averaged over 100 MC runs) (“each pair of normal vectors” refers to the normal vector of the predicted plane and its counterpart on the segmented plane). The small magnitudes of these angles imply that the predicted and segmented planes are very close in terms of orientation. This explains the coincidence of the predicted and segmented planes seen in Figure 4c,d.
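For reference, the orientation and dimension indicators used in this section can be computed as in the following sketch (illustrative names; the vertex pairing follows the closest-counterpart convention described above):

```python
import numpy as np

def normal_angle_deg(n1, n2):
    """Angle (degrees) between two plane normal vectors,
    ignoring the sign ambiguity of the normals."""
    n1 = np.asarray(n1, float) / np.linalg.norm(n1)
    n2 = np.asarray(n2, float) / np.linalg.norm(n2)
    c = abs(float(np.dot(n1, n2)))
    return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))

def mean_vertex_distance(pred_vertices, seg_vertices):
    """For each vertex of the predicted plane, take the distance to its
    closest counterpart on the segmented plane, then average."""
    pred = np.asarray(pred_vertices, float)
    seg = np.asarray(seg_vertices, float)
    d = np.linalg.norm(pred[:, None, :] - seg[None, :, :], axis=2)
    return float(d.min(axis=1).mean())
```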
Besides the above three closeness indicators, the execution time of the program is also recorded. The proposed transition model is implemented based on Algorithm 1, and the program is executed on a laptop equipped with an Intel i7-5600U CPU and 8 GB of RAM. For this time step, the execution time is 0.0375 s (averaged over 100 MC runs).
Apart from the above time step, the numerical results for another time step are also presented in Table 4, Table 5 and Table 6. Similarly, these results show that the predicted planes (obtained using the information at the previous time step) are fairly close to the segmented planes, in terms of plane areas, vertex distances and normal vectors. In addition, the execution time of the program for this time step is 0.0369 s (averaged over 100 MC runs). For brevity, the numerical results for other time steps are omitted in this paper.