1. Introduction
Multi-UAV RL and navigation not only play an increasingly crucial role in civil and military fields [
1,
2], but have also achieved notable successes in commerce, agriculture, and medical rescue [
3,
4,
5]. Among them, UAV positioning technology [
6] is a core element ensuring its safe and efficient operation. By controlling the circumnavigation of UAVs around a central point, the system [
7] can achieve precise positioning and navigation in complex environments. In the application of circumnavigation, a multi-UAV RL system [
8] must be capable of responding to changing environments and mission requirements in real time. Achieving a good real-time performance may necessitate more complex algorithms and hardware. Therefore, in the design and implementation of a multi-UAV RL system, it is essential to comprehensively address the aforementioned shortcomings and identify corresponding solutions to enhance the robustness and adaptability of the system.
The most common methods include external positioning systems, such as the Global Positioning System (GPS) [
9,
10] and anchor-based ultra-wideband (UWB) positioning [
11,
12], which have notably enhanced positioning accuracies. However, in specific environments, such as urban canyons, indoors, or during severe weather conditions, satellite signals may be obstructed or interfered with, thereby limiting the reliability and accuracy of traditional Global Navigation Satellite Systems (GNSSs) [
13,
14] in these situations. Furthermore, the deployment of an external positioning system can lead to complexities in maintenance and updating. In particular, in systems necessitating long-term operation, the maintenance of both hardware and software can pose challenges.
To address these challenges, researchers are integrating other positioning technologies into UAV systems, including vision sensors [
15,
16,
17]. The integrated use of these technologies enables UAVs to operate in more intricate and demanding environments, accomplishing tasks such as navigating through urban buildings or conducting search and rescue operations in forests. While this approach eliminates the need for infrastructure, visual localization often entails extensive image processing and computational tasks, demanding high-performance hardware and sophisticated algorithms. This can pose challenges in meeting real-time requirements, particularly on resource-constrained UAV platforms.
On the other hand, there are also examples of infrastructure-independent solutions, but they may be inadequate in certain aspects. In [
18], a method for RL using radar is proposed. Radar is capable of providing high-precision distance measurements and is very useful for precise RL. However, its equipment cost is high, and it is sensitive to ambient light and transparent objects. Radio Frequency Identification (RFID) systems are suitable for tag recognition at short distances and are applicable for indoor positioning [
19]. However, for applications with large-range, high-precision RL requirements, the accuracy of the RFID system may be low. In [
20], an RL algorithm based on the fusion of relative navigation sensors was proposed. It mainly involves combining information from different sensors, such as inertial navigation systems and magnetometers, to improve the robustness of RL. Nevertheless, the design and calibration of sensor fusion algorithms are relatively complex and can be susceptible to sensor errors. In [
21], an RL algorithm based on visual–inertial navigation fusion was proposed. Combining visual and inertial navigation can overcome the sensor switching problem when transitioning between indoors and outdoors, providing more comprehensive RL information. However, covering large areas may require additional devices to capture enough feature points. The main idea of [
22] is to present a weight matrix, simplifying the average consensus algorithm over mobile wireless sensor networks and thereby prolonging the network lifetime as well as ensuring the proper operation of the algorithm. However, the majority of wireless sensor networks discussed in this article are depicted as undirected graphs, which may not adequately address the complexities and fluctuations present in real-world environments.
More recently, a consensus-based leader–follower algorithm was developed in mobile sensor networks, where the goal for the entire network is to converge to the state of the leader [
23]. However, this article does not address the positioning issue. A control strategy for a quadrotor elliptical target orbit based on uncertain and non-periodic updates of angle measurements was proposed in [
24]. At the translational level, only orientation data are utilized, without incorporating the target’s prior position and velocity information. A position estimator is developed for locating unknown targets. However, the influence of measurement noise is not addressed. In [
25], a UAV group circumnavigation control strategy is proposed, in which the UAV circumnavigation orbit is an ellipse whose size can be adjusted arbitrarily. However, the article does not include any research on RL-related aspects. Unlike the work in [
26], earlier research focused on utilizing multiple agents to localize targets under fixed topology assumptions. This paper addresses the challenge of cooperative localization in a time-varying topology without an infrastructure, such as anchor points.
In this paper, a directed graph model is employed to represent exchanged information. As measurement failures may occur or a UAV could move beyond the sensing range of its neighbors, the directed graph describing the information flow relationship is time-varying. When a UAV strays beyond the operation area, it triggers a return-to-base protocol, where the UAV autonomously navigates back to the designated area using predefined waypoints or by following a path generated by the control system. To address the RL problem in this scenario, a discrete-time RL direct estimation is proposed for each UAV. The estimation of the relative position concerning its neighbors leverages distance and rate-of-change measurements, the angle of arrival, and the velocities of both itself and its neighbor. The displacements of neighbors during the intervals when distance and rate of change measurements are lost are also taken into account. Moreover, an RL fusion estimation method is devised for each UAV. This fusion estimation involves fusing the relative position estimates of the UAV concerning UGV0 and its neighbors. This approach enables UAVs without direct distance or angle measurements to locate UGV0 with the assistance of their neighbors. Subsequently, we apply the proposed RL fusion estimation algorithm to circumnavigation. In contrast to [
25], our system integrates RL fusion estimations with a circumnavigation controller.
The primary contributions are outlined as follows:
(1) When the information flow graph between adjacent UAVs is unidirectional and time-varying, this paper proposes a distributed state observer with state switching to dynamically estimate the positions of UGV0. Only local measurements and limited information exchanges between nearby UAVs are used to estimate the relative coordinates of a group of UAVs concerning a single UGV0. The RL direct estimation error is bounded even in the presence of measurement noise.
(2) To enhance the robustness of RL, consensus-based RL fusion estimation is proposed. The boundedness of the RL fused estimation error is analyzed, and the experimental results demonstrate the effectiveness of the proposed method. The proposed RL method enables each UAV to continuously estimate its relative coordinates to UGV0, even in the absence of any relative measurements concerning UGV0 or its neighbors.
(3) The effectiveness of the entire system was demonstrated through numerical simulations of UAVs using RL fusion estimation for circumnavigation. The system integrates RL into circumnavigation control through UWB ranging and communication networks. The RL scheme proposed in this article applies not only to two-dimensional space but also to three-dimensional space.
The remainder of this article is structured as follows:
Section 2 presents the problem formulation.
Section 3 proposes an indirect RL method and consistency-based RL fusion estimation.
Section 4 discusses the use of RL fusion estimation for circumnavigation.
Section 5 conducts simulation experiments, and the article is summarized in
Section 6.
2. Problem Formulation
This paper considers a network consisting of a single dynamic UGV and
N UAVs, labeled 0 and
, respectively. Define the position of each UAV as
. If
, then in the local coordinate system of UAV
i, the relative position of
j is
. Simultaneously, each UAV uses these relative estimates for circumnavigation. Let
represent the neighbors of UAV or UGV0. When
, UAV
i can obtain the distance
of UAV
j. Utilizing the obtained angle measurement information
, UAV
i can deduce the relative position
under UAV
j, as illustrated in
Figure 1. Subsequently, leveraging the relative position estimates of its neighbors, each UAV generates corresponding circumnavigation control commands. During circumnavigation, the neighboring UAV
j is effectively within the sensing radius; i.e.,
is less than the sensing radius. Assuming a sampling period of
T, and denoting the sampling time as
k, for simplicity,
k is used to represent
.
Assuming UAVs follow a standard particle model with UAV speeds denoted as
, the relationship between relative speed and position is given by
, where
represents the relative speed of UAV
j in the
i coordinate system. The angle measurement of neighbor
j is denoted as
, and this neighbor can be the target UGV0. It is assumed that each UAV
i has access to its own speed; distance, represented by
; and angle measurement
in its own inertial frame. The reference frame is denoted as
and is the same as
. This can be achieved if the UAV is equipped with a compass. Furthermore, assume that each UAV
i is equipped with airborne sensors, allowing it to obtain the distance measurement
and change rate
of its neighbor UAV
j, or the distance measurement
and change rate
of UGV0. As shown in
Figure 1,
and
can be obtained. The goal is to develop an estimator so that each UAV can estimate its relative coordinate
in UGV0’s frame
. With these RL estimates and measurements of distance and orientation between UAVs, the next goal is to integrate RL into circumnavigation control. Next, let us introduce graph theory.
If each UAV is viewed as a node, their interrelationships can be represented by a directed graph denoted as , where is the set of all nodes and u corresponds to the set of N UAVs. If , then there is a corresponding arc in the directed graph, and UAVi can measure the distance, angle, and its corresponding rate of change. To study RL problems (e.g., estimating the position relative to UAVj), another weighted directed graph is also considered. Here, is the set of all nodes, is the set of all arcs in the graph, is the weighted adjacency matrix, and each element of the matrix is positive. Given , if there is an arc in graph ℓ, then ; otherwise, . It is worth noting that ℓ may be time-varying due to possible interruptions in the measurement. Assume that the set comprises all ordinary UAVs, referred to as the original set. UGV0 is introduced as the source point to form a new set , known as the expanded set. Here, , where denotes the edge set comprising UGV0 and its surrounding neighbor UAVs. signifies that UAVi and UAVj can exchange speed and data packets.
Define the Laplacian matrix of the weighted directed graph ℓ as , and the diagonal matrix as the degree matrix of ℓ, where the diagonal element is . To investigate the RL problem, a system comprising N UAVs and UGV0 is associated with another graph. The graph ℓ composed of N UAVs is a subgraph of . Let represent the set of neighbor nodes of node i in , which may include UGV0. If UAVi can obtain direct observation of the UGV0 distance or angle measurement , then ; otherwise, . Define a diagonal matrix as the target adjacency matrix associated with , with diagonal element . If UAVj is the neighbor of UAVi, and UAVi can directly obtain the distance and angle measurement of UAVj, then ; otherwise, . For , if there exists a pathway from UGV0 to UAVi, we consider UGV0 to be jointly reachable.
3. Cooperative RL Algorithm
In this section, we propose a distributed RL algorithm based on mixed measurements. It addresses the challenge of estimating the relative coordinates of a UAV in the local frame of its neighbors. Through this approach, if UGV0 is a neighbor of the UAV, the UAV can acquire relative measurements of UGV0. Subsequently, the UAV can directly estimate its coordinates in the local frame of UGV0, termed a direct estimate. Conversely, if UGV0 is not a neighbor of the UAV, the UAV cannot utilize the relative measurements between UGV0 and itself to estimate its coordinates concerning UGV0. In this scenario, if the UAV can access the relative coordinates of its neighbors and the neighbors know the relative coordinates of UGV0, the UAV can deduce the relative coordinates of UGV0 through its neighbors. Nevertheless, multiple neighbors could potentially aid the UAV in establishing the relative coordinates of UGV0 through this method. To avoid dependence on unique neighbors, it is essential to combine multiple estimates. Furthermore, even with the availability of direct estimation, combining indirect estimation can enhance accuracy and expedite convergence. Hence, a consensus-based fusion estimation method is devised for each UAV in the second part of this section to fuse both direct estimates and all accessible indirect estimates.
3.1. RL Direct Estimation Relies on Persistent Excitation
As we all know, UAVs can experience measurement and communication losses due to harsh environments or sensor failures. In this subsection, we assume that UAVi () can communicate with UGV0 and has access to distance measurement and the rate of change in certain time intervals, in addition to angle measurement and . A direct estimator is designed to estimate the relative coordinates of UAVi in the local frame of UAVj.
Due to unreliability in local relative measurements, assume that UAV
i obtains measurements relative to UAV
j,
and
, or angle measurements
and
at time
, with a measurement break at
. An indicator function, denoted as
, is defined to represent the status. When
, UAV
i can acquire local relative measurements
and
or angle measurement
and
; otherwise,
. As illustrated in
Figure 2,
. Taking the derivatives on both sides of
we can obtain
. Two unit vectors are constructed from the angle measurement information
: the unit vector
pointing from UAV
i to UAV
j, and the vector
obtained by rotating it 90° counterclockwise. Because vectors
and
have the same direction, and
and
are perpendicular to each other, the constraint equation
can be obtained. When
, UAV
i can obtain the estimation algorithm of
through information from measurements and communication. Estimates can become inaccurate due to UAV motion, as the sensors are subject to interference. Assume that the estimated value
before the interruption is initially employed. Once communication and measurement are restored, that is, when
, UAV
i will correct the deviation caused by the estimated value in the case of
. Considering sensor noise, the RL direct estimation of UAV
i in the local coordinate system of UAV
j at time
k is estimated as follows:
where
represent the measurement noise of
at time
k, respectively.
K is a tunable constant gain.
represents the coordinate of UAV
i in the local coordinate system
of UAV
j. When
, it indicates the distance measurement is normal, and when
, it indicates the angle measurement is normal. It will be demonstrated later that the RL direct estimation error is bounded in the presence of noise.
Let the estimation error of the above observer (1) be denoted as
, and the dynamic equation of the estimation error is
Let us begin by introducing the concept of persistent excitation [
27] before moving on to the discussion of the convergence of error system (2). There exists a positive integer
m,
such that for any
, there is
where
T is the sampling period. Next, let us analyze the physical meaning of persistent excitation. Assuming that the speed of each UAV is continuously differentiable and bounded, the upper bound obviously holds. Therefore, the main focus is on (3), the lower bound. Expanding Equation (
3), we can obtain
The two components of
are represented as
and
, which are linearly independent of
. According to the inequality of Cauchy–Bunyakovsky, for any
, the following formula
holds. Both sides of the above inequality are equal if and only if
and
are linearly related.
Theorem 1. According to Equation (1), when persistent excitation (3) holds for UAVi and if the sampling period T satisfies the given criterionthen the estimation error of system (2) is bounded. The upper bound of the error is determined by a specific value , ensuring that the error satisfies , the specified condition. To proceed, suppose there exists a constant such that Proof. Firstly, consider the system related to (2):
Construct the Lyapunov function
, and the difference in the Lyapunov function within
m time steps can be expressed as
In addition, applying the induction method to system (5), the
f-step RL direct estimation error range
is obtained. Now, (6) can be expressed as
Observing (2) and (5), we can see that if , then for any is satisfied. In the interval, remains unchanged, while is time-varying. However, it can be seen from (2) that when , . To sum up, it can be deduced that as long as , then for any , .
In addition, according to the definition of , when , . When , . It can be seen from (3) that the RL direct estimation error of (5) is bounded in the case of noise interference. That is, there exists that satisfies for any , if and only if there exists , and (3) holds for any . To now, the RL direct estimation error bound can be expressed as follows. Combining (4) and (7) leads to , where . Therefore, when . It can be proven that when ; the proof of Theorem 1 is complete. □
3.2. Fusion-Based RL Estimation
In the previous
Section 3.1, we assumed that local relative measurements (
, rate of change
, and
) are unreliable, with measurement values available only at certain moments. An estimator was then designed for the UAV
i to position UGV0. If UGV0 is a neighbor of UAV
i, then UAV
i can obtain its direct estimate
in the local coordinate system of UGV0. However, due to harsh environments or temporary sensor failures, a UAV may lose its local relative measurements. Worse still, some UAVs may not always have relative measurements with respect to UGV0 because UGV0 is outside their sensing range. In this case, cooperation among neighbor UAVs is needed to help UGV0 localize. In this subsection, an RL indirect estimation method is developed for each UAV
i to estimate its relative coordinates with respect to UAV
j, even when it may lack direct measurements for UGV0 or experience measurement failures over time.
If UAV
i can estimate its coordinate
relative to neighboring UAV
r, UAV
r can also share its estimate
with UAV
i. As a result, UAV
i can indirectly obtain its estimate relative to UAV
j through UAV
r, as illustrated in
Figure 3. The formula is as follows:
Expanding on both direct and indirect RL algorithms, we also explore RL fusion estimation between UAVs to achieve target positioning. RL fusion estimation serves a dual purpose. It aids UAVs lacking direct target measurements, enhancing their ability to locate targets. Simultaneously, it bolsters the robustness of relying on RL direct estimation. Utilizing information gathered by UAVs and integrating both direct (1) and indirect (8) RL estimators, an RL fusion estimation method is proposed. Each UAV
i employs the following estimation method to update its final estimate, as expressed in the formula:
Consequently, each UAV
i updates its fused estimate using (9), irrespective of its ability to directly obtain relative measurements about UGV0.
Theorem 2. If the conditions of Theorem 1 are met; if it is assumed that, for every node pair in that appears infinitely many times, the persistent excitation (3) is satisfied; if each node in is uniformly jointly reachable from UGV0; and if the fusion weight satisfies , then the fusion estimate of each UAVi asymptotically converges to its true coordinates. In the presence of measurement noise, the RL fused estimation error is bounded.
Definition 1 (Stability of a discrete system input state [
28]).
Consider a nonlinear system given by equation . If there exists a function class : and a class function γ such that for any control input and any , the following formula holds for any positive integer k, then the system in Definition 1 is said to be input-to-state stable (ISS). Lemma 1 ([
28]).
If A is a Schur matrix, then the discrete system is ISS.
Lemma 2 ([
29]).
Matrix is positively stable, indicated by eigenvalues with positive real parts, if and only if UAVj is jointly reachable in Γ.
Proof. For any given
, let
. Then, (9) can be rewritten as
where the
part serves as the input signal. Aggregating all the equations in
, (10) can be expressed in matrix form as
where
p is two or three, determined by the dimensions of the environmental space.
represents the weighted Laplacian matrix of the graph
, and
is the adjacency matrix of
j associated with
.
Firstly, considering the unforced system in
it can be verified that all eigenvalues of
strictly lie within the unit circle centered at the origin. Therefore,
is a Schur matrix. As UAV
i and UGV0 are uniformly and jointly reachable, according to Lemmas 1 and 2, when
, all components of the solution of (12) uniformly converge exponentially to a certain common value. It can be concluded that when
(
c is a constant), indicating the exponential stability of (12).
Consider (11) next. According to Definition 1, for any . According to Theorem 1, for any positive constant . Since is a function of class, it follows that when . Because is bounded and is a function of class, when . Therefore, when . The proof of Theorem 2 is complete. □
4. Integrated Solutions for RL and Circumnavigation
In this section, we propose an integrated solution combining RL and circumnavigation to facilitate the rotation of multi-UAVs around UGV0 while maintaining a circular formation. This capability is particularly valuable in practical scenarios like surrounding and entrapping a hostile target. As depicted in
Figure 4, it relies on distance
and angle measurements
. An adaptive estimator will be formulated to attain relative positioning with the assistance of a specifically designed bounded input
. Let
denote the relative position to the UGV0.
is expressed as an estimate of
. Note that
should also satisfy
to address the circumnavigation problem (13), where
U is the maximum velocity.
Given UGV0 at any location
, the objective is to enable the UAV to orbit around UGV0. The formula is as follows:
where
represents the position of UAV
i at time step
k. For trajectory planning purposes, consider a discrete-time integrator model with bounded velocity:
where
T is the sampling period.
Based on this, a circumnavigation control law involving RL fusion estimation is proposed:
where
is any non-zero real scalar and
E is the rotation matrix.
represents the estimated value of
. Additionally, a constant positive scalar
is introduced. By integrating the previously developed RL algorithm (9) with the circumnavigation control algorithm (14), we propose an integrated RL fusion estimation and circumnavigation control algorithm.