We achieve our goal by proposing a hierarchical planning method based on IB-PRM.
Figure 3 illustrates the system framework of the method proposed by us. Whenever the unmanned ground vehicle reaches a new location, a PRM graph is randomly sampled within the perception range, with the current position as the center. Then, using the cubic B-spline control point search algorithm, the shortest path from the starting node to each node in the graph is obtained (corresponding to the green dashed line part in
Figure 3,
Section 4.1), and a new local graph structure is established (corresponding to the red dashed line part in
Figure 3,
Section 4.2). After clustering the nodes in the local graph, they are combined as new frontiers with the historical frontiers already stored in the global graph to solve the Traveling Salesman Problem and obtain the globally optimal path traversing the best nodes (corresponding to the blue dashed line part in
Figure 3,
Section 4.3). Meanwhile, the frontiers in the local graph will also be added to the global graph in the form of paths. Finally, the TEB algorithm is used to generate a locally smooth path suitable for the unmanned ground vehicle to execute (corresponding to the yellow dashed line part in
Figure 3,
Section 4.4), which is then output to the controller to complete one exploration motion. Iterate the process until the conditions for completing the full-graph exploration as set by us are met. The B-spline control point search algorithm proposed by us, which serves as the foundation and main innovation point for the subsequent content, will be first introduced in
Section 4.1. The design and theoretical derivation of the local trajectory planner will be detailed in
Section 4.4.
4.1. B-Spline Control Points Search
B-splines possess excellent properties such as locality, continuity, convex hull characteristics, and geometric invariance. Building upon the B-spline search [
15,
16], we propose a B-spline control point search algorithm utilizing a probabilistic roadmap. On one hand, most planning algorithms adopt a hierarchical framework, utilizing low-dimensional global planning paths as references and incorporating time dimensions using numerical optimization methods while satisfying motion and collision constraints. The exploration process of autonomous vehicles similarly follows this approach. However, the exploration process lacks fixed target points; its primary objective is to guide the vehicle to unknown areas. In this scenario, a fast and feasible path search algorithm is needed to acquire paths to reach frontiers. On the other hand, with map updates, the exploration target points of the vehicle change, leading to corresponding changes in global paths. Frequent changes in global paths can cause discontinuous vehicle motion, prolonging exploration time. Hence, an algorithm considering the current orientation of the vehicle is essential to ensure continuity throughout the global planning. We have designed a B-spline control point search algorithm based on PRM to meet the aforementioned requirements. In comparison to traditional search algorithms, using PRM to represent the map significantly reduces the number of nodes while ensuring probabilistic completeness. Expanding neighboring nodes through a B-spline search leads to a high-order continuous global path.
Let
be the control point representing the optimal trajectory. According to the locality of B-spline curves, for a
k-degree B-spline curve, every
control point determines a segment of the curve, denoted as
. To accurately quantify the cost of this segment, an appropriate cost function is needed. Mellinger and Kumar [
17] proposed that the quadratic integral of trajectory derivatives can reflect the smoothness of the trajectory and the cost of control to some extent, but calculating high-order derivatives of the trajectory is complex. Therefore, Usenko [
18] used the control points of the B-spline to calculate the integrals of the required high-order derivatives of the trajectory and optimized them as the objective function of the curve. The cost function of a cubic B-spline curve is as follows:
where
represents the basis function matrix of the curve, and
represents the integration of the interval vector’s derivatives. Their expressions are as follows:
where
represents the interval of the B-spline parameter
t,
represents the normalized result of the original B-spline parameter equation, i.e.,
.
From this, it can be seen that the cost of the curve
is not only controlled by two simple starting and ending points, but collectively determined by the
control points
. In this case, we define these
control points as nodes
of a new graph
, where edges
represent the connectivity between the new node
.
Figure 4 simply depicts the correspondence between the graphs
and the original graph
when
. The nodes
in the graph form an initial graph structure
, with
representing the edges between nodes. Based on this, the graph
is constructed with 4 nodes as a new node
, where
,
,
, and
form a new node
,
,
,
, and
form a new node
. Since
is connected to
in graph
,
and
share
,
, and
,
is connected to
, and, similarly,
is connected to
. As a result, the new node
jointly defined by the four original nodes
P represents a segment of a high-order continuous curve in geometric terms, thereby transforming the original path
into a new set of paths
.
Next, we draw on the idea of the Dijkstra algorithm. First, we randomly sample points in the map where the distance to obstacles is greater than to generate the probabilistic roadmap . Subsequently, we use B-spline control points to search the entire region’s graph structure, finding the shortest path from the starting point to each node in the graph. The algorithm flow is shown in Algorithm 1. Initially, we maintain three container structures: OPEN, LIST, and CLOSE. OPEN and LIST structures help in finding the node with the minimum cost value, while CLOSE determines if the node has already found the minimum value for the path. We calculate the cost value of the starting node using Equation (2). To differentiate between the expanded nodes, we use the function to encode each node. Considering that the orientation of the initial node of each segment of a cubic spline curve is controlled only by the first and third nodes, we generate an index value for each new node based on encoding the first node and the third node among the four nodes.
We use the function
to expand the next node while checking the feasibility of the node. First, we obtain the tail node
of the current node
in the original PRM graph. By using the connectivity in the PRM graph, we obtain its neighboring node
and combine node
with the three nodes
,
, and
in
to form a new node
. To reduce computational complexity, the feasibility of the node is checked by simply detecting that there is no collision between the control points. In the absence of a target point, the entire search process continues until the
set is empty, thus obtaining the shortest paths from the starting point
to each point
. After obtaining the shortest paths, precise collision detection is performed on the curve interpolation. We adopt the method from [
19], by inserting new control points on the line connecting the control points corresponding to the collision position to increase the weight in the direction away from obstacles, ultimately obtaining a safe collision-free path.
Algorithm 1: B-spline Node Search |
1: | function (,k,,) |
1: | function (,k,,) |
2: | ;//init priority queue |
3: |
|
4: |
|
5: |
|
6: |
|
7: | while do |
8: |
|
9: | for do |
10: |
|
11: | if not then |
12: |
|
13: |
end if |
14: | if then |
15: |
|
16: |
|
17: |
end if |
18: |
end for |
19: |
end while |
20: | end function |
4.2. Local Graph Construction
At the beginning of the exploration, the global exploration graph is initialized using the starting node. Then, the current pose
of the vehicle is obtained. Taking the current position of the vehicle as the center and the perception length
as the radius, a local PRM is constructed. As shown in Algorithm 2, new nodes are sampled within the perception range
. Subsequently, using
KD-TREE, the nearest node in the graph is found, and nodes are added at a distance of step size
. Collision detection is performed using lazy check. Connection edges are formed between collision-free sampling nodes and neighboring nodes, and it is determined whether there are connectable edges within a distance of
around the new node. The construction of the local PRM graph
continues until the number of nodes or edges in the entire graph exceeds the set thresholds
and
.
Algorithm 2: Build Local PRM |
1: | function BuildLocalGraph ( ) |
2: |
|
3: |
|
4: | while and do |
5: |
|
6: |
|
7: | if CollisionFree then |
8: |
|
9: |
|
10: | for all do |
11: | if CollisionFree then |
12: |
|
13: |
end if |
14: |
end for |
15: |
end if |
16: |
end while |
17: | return |
18: | end function |
Subsequently, the cubic B-spline control point search algorithm in Algorithm 1 is used to obtain the optimal path from the starting point to each node. The B-spline graph search algorithm, based on graph , combines 4 adjacent nodes , , , and to form a new node and stores it in the tail node to establish a new local graph . This combination ensures that even if the tail node is the same among the four nodes, i.e., even if reaching the same node during the expansion process, the different expansion paths result in a different index for the new node . After the search, each node in stores a large number of new nodes . Based on this special graph structure, suitable frontier sets are selected through two layers of operations: filtering and clustering.
- (1)
Filtering: The primary goal of exploration is to expand the unknown areas on the map. Therefore, we use the ray-casting algorithm to calculate the information gain value
of nodes in the local graph for filtering. Unlike algorithms [
9,
12] where the node with the highest information gain value is selected as the target point, we do not rank the information gain values of nodes. Considering that the vehicle’s perception range is a circular area with a radius of
, the orientation of nodes during exploration does not affect the magnitude of information gain value. The calculation of information gain value is solely for filtering out frontiers and non-frontiers. We set a threshold
as a boundary, where points
are considered frontiers, completing the first filtering step.
- (2)
Clustering: After filtering, there are still many frontiers in the area, and many of these frontiers are close in position and orientation. Directly merging these frontiers with frontiers in the global graph for solving the Traveling Salesman Problem undoubtedly increases computational complexity. Therefore, we use the Dynamic Time Warping algorithm (DTW) [
20] to cluster the remaining nodes. This algorithm compares the similarity between different paths and cluster nodes with similar and close positions based on the temporal distance between paths. The number of frontiers significantly reduces after DTW clustering.
4.3. Global Graph Construction and TSP Solution
To preserve the generated frontiers, we establish and maintain a global graph structure that is not reinitialized with each iteration of the exploration algorithm but incrementally expands as the map is updated. During the expansion process, new frontiers generated in the local graph are added to the global graph in a path-like manner using a method similar to Algorithm 2. This ensures connectivity and allows the utilization of the global graph to find frontiers further away from the vehicle, providing retreat routes when the vehicle encounters dead ends in local areas. Additionally, the gradually constructed global graph only incorporates paths selected from the local graph, reducing the search burden and laying the groundwork for handling large, complex environments.
The Traveling Salesman Problem involves finding the shortest path for a single traveler to depart from a starting point, visit all target nodes, and return to the origin. The purpose of exploration is to quickly traverse the entire graph. In contrast to algorithms [
9] and [
12], where only the node with the lowest local cost is selected as the target point, leading to potential local optima, a comprehensive consideration of all node paths results in shorter and more effective exploration paths. In each frame of the algorithm loop, multiple frontiers are generated. Inspired by [
5], this paper converts these into a Traveling Salesman Problem, incorporating improvements based on the characteristics of unmanned vehicles.
When new frontiers are generated in the local graph, after filtering and clustering, combined with existing frontiers in the global graph, the traversal problem of frontiers is transformed into a Traveling Salesman Problem for resolution. When the information gain value for each node in the local area is less than a threshold , indicating the completion of the local area exploration, only historical frontiers stored in the global graph are used to solve the Traveling Salesman Problem to obtain escape routes.
Assuming the existence of
frontiers denoted as
, where
represents the node where the vehicle is currently located, the starting point of the TSP. In existing algorithms, drones, due to their few constraints, adopt an approximation of circular flying during exploration. While this method reduces exploration path length and time, it is not suitable for vehicles with more constraints. Additionally, cyclically solving the TSP and executing it consumes significant computational resources during exploration. As robots can only move to one node at a time during each TSP loop, the computation cost of calculating the true path cost for all nodes is high. Therefore, considering the relatively low movement cost of nodes in the local graph, prioritizing the newly generated frontiers in the local PRM is likely to yield better results. In the calculation of the cost from the starting point to each frontier, the cost from the starting point
to frontiers outside the local graph is set to a large value, while the cost between other nodes is calculated using the normal formula. The formula is as follows:
In the equation,
represents the cost from frontier
to frontier
, which is the maximum value of the path length obtained from the Dijkstra search considering obstacle avoidance constraints and the B-spline curve length ignoring obstacles. The design of the equation is inspired by the design concept of the hybrid A* heuristic function [
21], utilizing the search speed of the Dijkstra algorithm to obtain the collision-free shortest path for each node. Combining the locality and continuity of the B-spline curve (ignoring obstacles) to find a smooth trajectory suitable for unmanned vehicle execution, the maximum value between the two is taken as the cost between nodes, avoiding the excessive computational cost of considering both kinematic and obstacle avoidance constraints for each node. The cost of the B-spline curve between nodes is directly retrieved from the global graph without the need for separate solving.
Considering that the TSP requires returning to the starting point, while exploration does not require such a return, this paper sets the distance from each frontier to the starting point
as 0. After solving the TSP, the order of the shortest paths for traversing each node is obtained, with the first node to be traversed selected as the current target point. Additionally, considering the uncertainty in the vehicle’s movement process due to modules such as localization leads to ongoing oscillation in the vehicle’s selection of target points, preventing the stable movement towards the target point and the rolling optimization of the arrival path. We only call the LKH library [
22] to solve the TSP when the change in information gain value at the target frontier is less than a certain threshold
.
4.4. Local Path Planning Based on TEB
To ensure the quick identification of frontiers in the map, we adopt a strategy of non-uniform step extension, which increases the density of the graph, resulting in unequal distances between nodes in the path. When the distance is large, the local trajectory appears rough and it is difficult to guarantee optimality. Therefore, trajectory optimization is required for the global path that includes B-spline curve segments obtained from the search. The TEB algorithm [
23] divides the trajectory into segments consisting of line segments and circular arcs, capable of generating the optimal executable trajectory while satisfying obstacle avoidance and dynamic constraints.
The traditional Elastic Band (EB) algorithm uses the vehicle’s pose
for path optimization and defines the entire optimization variables as follows:
TEB then adds the corresponding time interval
on the basis of EB, with the formula as follows:
where
represents the time needed to transform posture
to posture
, and combining the two equations gives the optimization variable of TEB as follows:
As shown in
Figure 5a,
,
, and
are three poses on the time elastic band, with
and
representing the time intervals between
and
and
and
, respectively. The principle is to discretize the entire path, treating the entire path as an elastic band, with path points serving as nodes of the elastic band. The constraints and objective function of the path are considered as external forces of the elastic band, and appropriate paths are found by modifying nodes based on external forces. This algorithm combines local planning with control quantities, taking the global path as input and motor speed and wheel angle as outputs. Interpolation is performed on the basis of the global path
to obtain an initial path
, where both
and
represent the pose of the vehicle.
Given the trajectory of a path, combined with Equation (1), the motor speed
and the equivalent front wheel steering angle
at each moment
can be derived as follows:
where
indicates the direction of linear velocity, which is positive for forward and negative for backward.
When planning the local trajectory, we need to consider various constraints such as the global path, obstacles, vehicle kinematics, and dynamics. Firstly, there are kinematic constraints. As shown in
Figure 5b, let
and
be two consecutive poses in the vehicle trajectory,
and
be the velocity directions of the two poses,
,
and
be the angles between
and
and
and
, respectively. Since the two poses are on the same constant curvature arc with curvature
, the vehicle moves from
to
only under the action of a constant control quantity
. Combining Equation (8) leads to the derivation of the vehicle kinematic constraints.
In addition, the constraints of maximum curvature, maximum velocity, and maximum acceleration are, respectively:
where
, angular velocity
, and acceleration
.
For obstacle constraints, we utilize the discrete nature of trajectory points to consider obstacle avoidance by taking into account the poses near obstacles. Firstly, find the trajectory points in the entire global path within a certain range of obstacles
and their corresponding obstacles. Utilize the scalar
to constrain their distances, with the constraint inequality as follows:
where
represents the minimum distance from pose
to obstacle
, and scalar
represents the minimum distance that can be operated with the obstacle.
At the initial stage, the TEB will be initialized using the global planning path, and as the vehicle moves, the optimized state variables will be dynamically removed and added, with the adjustment intervals as follows:
the
represents the time interval that
should satisfy, while
serves as a lagging parameter to avoid oscillation of the optimization results, ensuring the density of the entire optimization results. The trajectory nodes are associated with obstacles and waypoints near the trajectory, and each trajectory point corresponds to the respective obstacles. The algorithm transforms the optimization problem into a hyper-graph, which is different from a basic two-dimensional graph. In this hyper-graph, the nodes connected by an edge are unrestricted, and the constraints are the edges of the hyper-graph. The optimization variables related to it are the nodes connected by the edge. This transformation allows constraints to act only on local optimization variables, such as the velocity constraint in
Figure 6, which is only related to the front and back poses of the vehicle,
,
, and their time interval
. Therefore, the hyper-graph has three nodes and one edge. Thus, the entire optimization problem can be represented as a sparse matrix, and the open-source framework
g2o is used to solve this optimization problem. Once the optimal trajectory
is obtained, on the one hand, using Equation (8), the required motor speed
and equivalent front wheel steering angle
for the wheel controller are calculated. On the other hand, leveraging the algorithm framework, with the movement of the vehicle, after inserting and deleting states, re-optimization is carried out. This iterative optimization strategy greatly improves computational efficiency and ensures trajectory quality.
In contrast to directly selecting the minimum motion cost, we use the total time of the trajectory as the objective function to improve exploration efficiency:
By combining the above constraints and objective functions, we transform the nonlinear programming problem with hard constraints into a quadratic optimization problem to improve the efficiency of solving. Simultaneously, we utilize the sparsity of the problem and transform constraints into penalty terms. For the equality constraint in Equation (9), it is transformed into quadratic constraints, as follows:
where
is the identity matrix, and
is the penalty value for this constraint.
For inequality constraints
such as (10), (11), (12), (13), they can be converted into one-sided quadratic constraints with penalty value
:
We transform the final objective function into the following form:
Here, represents the time interval between two poses, is the kinematic constraint, is the curvature constraint, and are the velocity and acceleration constraints, represents the distance constraint from obstacles, and , , , , and are penalty matrices. This structure provides convenience for adding new constraint conditions in the future.
The overall algorithm flow is shown in
Figure 7. Firstly, based on the global reference path obtained from solving the TSP, the initial path is interpolated. Dynamically insert or delete optimized state variables according to the vehicle’s motion, then match the trajectory nodes with the obstacles and waypoints near the trajectory, find the corresponding obstacle for each trajectory point, establish a hyper-graph using the objective function in Equation (18), and optimize to obtain the optimal trajectory
; when the trajectory is feasible, calculate the vehicle’s control inputs based on Equation (8) and send them to the lower-level controller.